With the advent of legislation and regulations such as NIS2 and DORA, a lot of ink is being spilt about digital resilience. It is a key concept for complying with the new rules. From a network perspective, it is about the ability of organizations to respond to disruptions in their digital infrastructure. We talk to Joe Vaccaro, VP & GM of the ThousandEyes division at Cisco, about what this means for organizations.
The network is not the easiest part of the infrastructure to make and keep digitally resilient. After all, organizations do not only use networks that they manage themselves. A large part of the networks they rely on are located outside their own walls. Vaccaro: “Networks used to be completely self-managed. If there was a problem, you would reboot the router. However, the world has changed towards an internet-centered architecture.”
With the above reality as a basis, we dive into the world of digital resilience of the network with Vaccaro. From Cisco’s perspective, digital resilience consists of three components: security, observability and assurance. These components must all be in order and work together to make organizations truly digitally resilient, Vaccaro explains. In other words, “you need a well-secured network with strong observability capabilities and end-to-end assurance to fully understand all aspects of digital resilience”.
Resilience requires a different perspective
One of the core concepts Vaccaro regularly returns to in our conversation is assurance: the certainty organizations want that their network is set up and operates in such a way that they are optimally digitally resilient.
When we ask Vaccaro for his definition of assurance, he indicates that it revolves around the promise of an outcome: the delivery of an excellent digital experience. Security and observability obviously play an important role in this. However, it also requires a fundamentally different conceptual approach from organizations. As already indicated, a network does not consist of a single segment, but of a chain of components and connections. These connections sit beyond your direct control, yet you rely on them to deliver your services. It is therefore important that we look at it end-to-end, Vaccaro argues.
On paper, the above approach sounds very logical in the year 2025. After all, the internet is the new network. Yet it is not necessarily easy to become optimally digitally resilient. Vaccaro indicates that we need a “fundamental shift”. “We must move from find and fix, which you can do within your owned network that you have control over, to evidence and escalate, which is the new reality of working with third-party providers, leveraging data to quickly get on the same page in terms of whose portion of the network is impacted by an issue, and therefore responsible for a fix.”
We need a different operational attitude
This new reality requires a different operational attitude within organizations. Vaccaro gives an example to clarify this. “BGP routing that’s blackholing traffic requires evidence,” he says. This is important because otherwise it is and will remain completely unclear where the problem lies. “And then we’re in finger-pointing mode,” he says, predicting how the response to such an incident will go without evidence.
Without insight into (hidden) dependencies within the distributed network, it is virtually impossible to figure out where things go wrong. In fact, many hidden dependencies only become clear during incidents.
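The “evidence and escalate” idea can be illustrated with a minimal Python sketch. It assumes you already have per-hop packet-loss measurements along a path, each labeled with the party that operates that segment. The `Hop` type, the hop names, the owners and the loss figures are all hypothetical, and this is not how ThousandEyes itself works. The sketch looks for the point from which every remaining hop, including the destination, loses traffic. That is the typical signature of a blackhole, and it tells you who to escalate to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hop:
    name: str        # hypothetical label for a hop on the path
    owner: str       # party assumed to operate this segment
    loss_pct: float  # packet loss measured toward this hop


def locate_blackhole(
    path: List[Hop], threshold: float = 100.0
) -> Optional[Tuple[Optional[Hop], Hop]]:
    """Return (last healthy hop, first failing hop), or None if the
    destination is still reachable.

    Walks backwards from the destination and finds the start of the
    trailing run of hops whose loss is at or above the threshold.
    """
    first_bad_index = None
    for i in range(len(path) - 1, -1, -1):
        if path[i].loss_pct >= threshold:
            first_bad_index = i
        else:
            break
    if first_bad_index is None:
        return None
    last_good = path[first_bad_index - 1] if first_bad_index > 0 else None
    return last_good, path[first_bad_index]


# Hypothetical measurements, for illustration only.
path = [
    Hop("office-router", "enterprise", 0.0),
    Hop("isp-edge", "primary provider", 0.0),
    Hop("peering-link", "upstream provider", 100.0),
    Hop("saas-frontend", "SaaS provider", 100.0),
]

evidence = locate_blackhole(path)
if evidence is not None:
    last_good, first_bad = evidence
    print(f"Traffic is lost at or before {first_bad.name}; escalate to: {first_bad.owner}")
```

The result is a boundary rather than a verdict: the fault may sit on the failing hop itself or on the link leading to it. Even so, it gives both parties the same data to start from.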
Vaccaro gives an anonymized example of a hidden dependency that caused major problems. A large company suffered a loss of one million dollars per minute during an outage, but had no idea where the problem was because it had no insight into its broader network. After six hours (that is 360 million dollars) the company knocked on ThousandEyes’ door. In less than five minutes, the problem was traced to an upstream provider, specifically to a peering connection upstream of the company’s primary provider.
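The arithmetic behind that figure is easy to check. The short Python sketch below uses only the numbers from the example; the function name is ours. The five-minute comparison is purely illustrative and assumes the same loss rate had applied if the cause had been found that quickly.

```python
def outage_cost(duration_minutes: int, loss_per_minute: int) -> int:
    """Total loss for an outage at a constant loss rate per minute."""
    return duration_minutes * loss_per_minute


LOSS_PER_MINUTE = 1_000_000  # dollars, from the example

six_hours = 6 * 60  # 360 minutes
cost_six_hours = outage_cost(six_hours, LOSS_PER_MINUTE)

# Illustrative comparison: the same rate over a five-minute diagnosis.
cost_five_minutes = outage_cost(5, LOSS_PER_MINUTE)

print(f"Six hours:    ${cost_six_hours:,}")
print(f"Five minutes: ${cost_five_minutes:,}")
```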
A new approach to network resilience
The example above illustrates Vaccaro’s point: we need to look at network resilience differently. The network is no longer just yours, and in the event of disruptions, the crucial question is: is it only my problem, or is there a broader problem, for example with the cloud provider hosting my services? ThousandEyes can provide collective intelligence (across enterprise, Internet and cloud networks) and synthetic monitoring to deliver visibility into both the networks you own and the ones you don’t, so that you can determine exactly where an issue is happening.
A network problem is often not the result of an error or malfunction but of a misconfiguration of a component in the entire network chain, Vaccaro indicates. That makes it even more important to have a continuous and good understanding of how the network functions. A malfunction in a part of the chain usually generates a notification somewhere. This is much less often the case with an incorrect configuration, because what is an incorrect configuration for one thing is not necessarily a misconfiguration for something else. If the network components have no context for their configurations, they will not indicate that something is wrong.
End-to-end topology
To tackle the challenges that arise from (common or uncommon) misconfigurations and other network problems, we need an end-to-end topology, Vaccaro reiterates. ThousandEyes (and Cisco as a whole) have recently put a lot of extra work into this. We saw a good example of this recently during Mobile World Congress. There, ThousandEyes announced Connected Devices. This is intended for service providers and extends their insight into the performance of their customers’ networks in their home environments.
The goal, as Vaccaro describes it, is to help service providers see deeper so that they can catch an outage or other disruption quickly, before it impacts customers who might be streaming their favorite show or getting on a work call. Digital resilience in this way means providers can deliver a better connected experience to their customers and improve Net Promoter Score, for example.
What it all comes down to is that an end-to-end topology such as the one built by ThousandEyes should make it possible to quickly see what goes wrong, but also provide insight into what will happen. This makes it possible to predict where a malfunction will occur.
DORA and third parties
The Digital Operational Resilience Act (DORA) will be no news to readers who are active in the financial world. You can see DORA as a kind of advanced NIS2, except that it is an EU regulation that applies directly in all member states (so no local ‘translation’ of the legislation). It is a collection of best practices that many financial institutions must adhere to. Most of it is fairly obvious. In fact, we would call it basic hygiene when it comes to resilience. However, one component under DORA will have caused financial institutions some stress and will continue to do so: they must now adhere to new expectations when it comes to the services they provide and the resilience of their third-party ICT dependencies.
This third-party dependency component of DORA raises a lot of additional questions that need answering. These dependencies must be clearly identified. Vaccaro gives an example: “Maybe your organization uses Twilio, but do you know that this service runs on AWS?” You really should know that. An adjustment to one of the third-party services that an organization uses can have an effect on the entire organization. “If SecOps adjusts a security group in the AWS load balancer, it can have consequences for other components,” he explains.
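Mapping such dependencies comes down to walking a graph. The Python sketch below is a minimal illustration, assuming you maintain an inventory of which service directly relies on which. The Twilio-on-AWS link comes from Vaccaro’s example, while `payments-app`, `internal-db` and the inventory format are hypothetical. It separates the dependencies you chose yourself from the ones you inherit through a third party.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical inventory: service -> services it relies on directly.
DEPENDS_ON: Dict[str, List[str]] = {
    "payments-app": ["Twilio", "internal-db"],
    "Twilio": ["AWS"],  # the link from Vaccaro's example
    "internal-db": [],
    "AWS": [],
}


def transitive_dependencies(service: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Everything the service relies on, directly or indirectly (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


def hidden_dependencies(service: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Dependencies reached only through a third party, not chosen directly."""
    return transitive_dependencies(service, graph) - set(graph.get(service, []))


print(sorted(hidden_dependencies("payments-app", DEPENDS_ON)))
```

In practice the hard part is filling in the inventory, since third parties rarely publish what they run on. That is the gap the measurement-based approach described here is meant to close.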
The point Vaccaro obviously wants to make here is that ThousandEyes makes these hidden dependencies visible. You can also use it to detect a lot of technical debt, or technology that you should actually update or stop using, in the infrastructure. Only when an organization has mapped out all the dependencies can it start tackling problems and move from a reactive to a proactive stance. It then also becomes possible to select the right third parties from the outset. Vaccaro calls this the final step in assurance: not just avoiding the impact of an outage, but planning and organizing the third parties you want to use for optimized performance.
ThousandEyes and Cisco
ThousandEyes has been part of Cisco for about five years. It has now been integrated into almost all parts of the portfolio. In recent years, Cisco has been talking a lot about the quality of the employee digital experience, for example. ThousandEyes naturally plays an important role in this. The quality of the network is often of great importance to employees. After all, it is what enables them to use the tools they depend on for their jobs. It is therefore very important for organizations that they can solve problems as quickly as possible, rather than searching for six hours first (and still finding nothing), like the company in the example above.
ThousandEyes can also add value in the area of security. For example, it is integrated with Cisco Secure Access, the company’s SSE offering. Among other things, it can see in some detail the health of the endpoint on which Secure Access runs. It can also quickly establish correlations between endpoint performance and SaaS application performance. Customers using Splunk can use OpenTelemetry and REST APIs to get everything in their dashboard.
When it comes to the Cisco platform, ThousandEyes obviously also benefits from its enormous size. Cisco has by far the largest market share in networking. “A lot of the world runs on Cisco,” in Vaccaro’s words. If there is a problem somewhere in a Cisco network, you can solve it quickly because there is so much of the aforementioned collective intelligence present.
Conclusion: resilience starts with insight
The story from Vaccaro, ThousandEyes and, more generally, Cisco is clear. Without insight into the network infrastructure, it is very difficult, if not impossible, to respond adequately to things that go wrong or could go wrong. However, it is important not to focus too narrowly on a single characteristic. “To achieve digital resilience across the entire digital footprint is always a matter of combining assurance, observability and security,” according to Vaccaro.
ThousandEyes wants to support the way in which organizations should make decisions by making a meaningful contribution to all three components. Is that all there is to it? Certainly not. Organizations must be set up in such a way that they can make use of this contribution. If they are not, it doesn’t matter how much data you collect: you will never have the knowledge you need to take advantage of it. With legislation such as NIS2 and DORA in force, that is not a wise position to be in.
Opening image via Cisco ThousandEyes