This use case is an example of a general category of use cases related to the migration of Virtual Network Functions. Here, the network functions are SDN controllers, each running in a virtual machine. In contrast to the many existing works on controller placement, we focus on the control plane dynamics that react to varying network conditions (e.g., load) in order to obtain a meaningful flexibility measurement. We have therefore accurately modeled the controller migration dynamics as a complex function chain consisting of (a) the migration of one or more controllers and (b) the reassignment of one or more switches to controllers.
We consider the following scenario. For more details see .
This use case is an example of a category of use cases related to the routing of flows in a network system that reacts to events. Here, routing reacts to link failures. We illustrate the network flexibility of different routing methods, including purely reactive systems and systems where the resilience to failures is proactively pre-planned (protection).
Note that in this use case, we are able to measure network flexibility with a complete set of challenges (= requests) as we measure the recovery success rate for all possible connections (all source-destination pairs) exposed to all possible single and link-pair failures, hence we do not draw the challenges from a distribution.
We consider the following scenario. For more details see .
Remember, network flexibility of a flow routing method represents its ability to respond to network failures under a recovery time threshold T and a cost threshold C.
A failure request is supported if normal operation can be restored within T; it is not supported if recovery takes longer than T. An evaluation run consists of calculating the recovery time of every failure request that disrupts the connection.
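As a minimal sketch of this definition, flexibility can be computed as the fraction of failure requests whose recovery time does not exceed T. The function name and the sample recovery times below are hypothetical, the cost threshold C is left out for brevity, and an unrecoverable request is assumed to be represented by an infinite recovery time.

```python
def flexibility(recovery_times_ms, threshold_ms):
    """Fraction of failure requests that are recovered within threshold_ms."""
    if not recovery_times_ms:
        raise ValueError("at least one failure request is required")
    # A request is supported if its recovery time does not exceed T.
    supported = sum(1 for t in recovery_times_ms if t <= threshold_ms)
    return supported / len(recovery_times_ms)


# Hypothetical recovery times in ms; float("inf") marks an unrecoverable request.
sample_times = [0.0, 42.5, 55.0, float("inf")]
print(flexibility(sample_times, 50.0))  # two of the four requests are supported
```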
For the network flexibility evaluation, you can choose and vary the following two parameters: the topology and the ratio of link-pair failures (default: Abilene with a ratio of 0.0, i.e., all single link failures and no link-pair failures). We plot the flexibility of three systems that differ in their reaction to failures (1+1 protection, 1:1 protection, and restoration) over the time threshold T, which represents the recovery time limit.
The available options are as follows (default: Abilene with 0.0).
1. Topology: Abilene, Tiscali, Sprintlink, Germany 50
2. Ratio of link-pair failures (all single link failures are always part of the request set): 0.0, 0.1, 0.5, 0.9, 1.0
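One way such a request set could be assembled is sketched below. How the ratio is applied in the actual evaluation is not stated here, so the sketch assumes that the ratio is the share of all possible link pairs and that the pairs are sampled uniformly at random. The function name, the seed parameter, and the three-link ring are hypothetical.

```python
import itertools
import random


def build_failure_requests(links, pair_ratio, seed=0):
    """Return all single link failures plus a sampled share of link-pair failures."""
    if not 0.0 <= pair_ratio <= 1.0:
        raise ValueError("pair_ratio must be between 0.0 and 1.0")
    singles = [frozenset([link]) for link in links]
    pairs = [frozenset(pair) for pair in itertools.combinations(links, 2)]
    # Assumption: the ratio is the share of all link pairs, sampled uniformly.
    count = round(pair_ratio * len(pairs))
    return singles + random.Random(seed).sample(pairs, count)


# Hypothetical three-link ring, not one of the evaluated topologies.
ring = [("A", "B"), ("B", "C"), ("C", "A")]
print(len(build_failure_requests(ring, 0.0)))  # single link failures only
print(len(build_failure_requests(ring, 1.0)))  # single plus all link-pair failures
```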
With 1+1 protection, the recovery time is quasi-instantaneous because this approach duplicates resources (i.e., bandwidth and flow rules) in the form of a disjoint path pair and uses both paths simultaneously. Hence, T_recovery = T_switching, where the switching time depends on the technology applied. We used 0 ms in the evaluation.
1:1 protection reserves backup resources in advance, but they do not carry data while the primary path is operational. Hence, failure detection and notification are required to re-route the traffic from the failed primary path to the backup path. Therefore, T_recovery = T_detection + T_notify_source + T_switching, where T_detection is 40 ms using bidirectional failure detection. The time to notify the source and target nodes of the connection to switch from the working path to the protection path (we assume bidirectional connections that follow the same path in both directions) is calculated with a propagation delay of 5 μs per km of cable. We assume that the switching time is negligible (0 ms) compared to the other tasks.
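The 1:1 recovery time above can be sketched as follows. The constants are the values stated in the text; the function name and its parameters are hypothetical. Because both end nodes must be notified, the sketch assumes that the notification time is governed by the farther of the two end nodes, which the text does not state explicitly.

```python
PROPAGATION_US_PER_KM = 5.0  # propagation delay per km of cable
DETECTION_MS = 40.0          # detection time used for 1:1 protection
SWITCHING_MS = 0.0           # switching time assumed negligible


def one_to_one_recovery_ms(km_to_source, km_to_target):
    """T_recovery = T_detection + T_notify_source + T_switching, in ms."""
    # Assumption: notification completes when the farther end node is reached.
    notify_ms = max(km_to_source, km_to_target) * PROPAGATION_US_PER_KM / 1000.0
    return DETECTION_MS + notify_ms + SWITCHING_MS


# Hypothetical failure located 1000 km from the source and 400 km from the target.
print(one_to_one_recovery_ms(1000.0, 400.0))  # 40 ms detection + 5 ms notification
```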
With restoration, no protection resources are planned or reserved until a failure occurs. Hence, the after-failure tasks are failure detection, notification of the controller, recovery path computation, and deployment of the modified flow rules at every switch along the new path. Formally, T_recovery = T_detection + T_notify_controller + T_calculation + T_installrules, where the failure detection time is 50 ms using loss-of-signal detection and the calculation time is set to 1 ms. The notification of the controller and the propagation delay from the controller to the switches (installing new flow rules and resending data on the new path) are calculated with a propagation delay of 5 μs per km of cable.
We selected topologies with corresponding node coordinates and distance values from the Topology Zoo. In each network, we selected a random node as the controller location and investigated failure scenarios that combine all single link failures with a varying ratio of link-pair failures.
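As a rough illustration of the restoration recovery time defined above, the sketch below adds up the four terms. The constants are the values stated in the text; the function name and its parameters are hypothetical. It assumes that the controller installs the flow rules at all switches in parallel, so that T_installrules is determined by the switch farthest from the controller; this is an assumption of the sketch and is not confirmed by the text.

```python
PROPAGATION_US_PER_KM = 5.0  # propagation delay per km of cable
DETECTION_MS = 50.0          # loss-of-signal detection time
CALCULATION_MS = 1.0         # recovery path computation time


def restoration_recovery_ms(km_failure_to_controller, km_controller_to_switches):
    """T_detection + T_notify_controller + T_calculation + T_installrules, in ms."""
    notify_ms = km_failure_to_controller * PROPAGATION_US_PER_KM / 1000.0
    # Assumption: rules are installed in parallel, so the farthest switch dominates.
    install_ms = max(km_controller_to_switches) * PROPAGATION_US_PER_KM / 1000.0
    return DETECTION_MS + notify_ms + CALCULATION_MS + install_ms


# Hypothetical distances: failure 800 km from the controller, new path with
# switches 200, 600, and 1200 km from the controller.
print(restoration_recovery_ms(800.0, [200.0, 600.0, 1200.0]))
```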
We calculated the recovery time in each failure scenario for all possible source-target pairs in the topology. For each source-target pair, we considered only the failure requests that disrupt the connection between the given end nodes. Hence, the presented flexibility values (i.e., the ratio of connections recoverable within time T) are averages over all possible service disruptions in the selected topology. The curves show the ratio of recoverable connections for the selected link-pair failure ratios while T is varied.
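The selection of failure requests that disrupt a given connection could look like the sketch below. It assumes that a connection is described by the node sequence of its working path, that links are undirected, and that a request disrupts the connection when it contains at least one link of that path. All names and the sample data are hypothetical.

```python
def path_links(path):
    """Undirected links traversed by a path given as a sequence of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def disrupting_requests(path, failure_requests):
    """Keep only the failure requests that hit at least one link of the path."""
    links = path_links(path)
    return [request for request in failure_requests if links & request]


# Hypothetical working path A-B-C and three failure requests (sets of links).
ab, bc, cd = frozenset("AB"), frozenset("BC"), frozenset("CD")
requests = [frozenset([ab]), frozenset([cd]), frozenset([bc, cd])]
print(len(disrupting_requests(["A", "B", "C"], requests)))  # the cd-only request is skipped
```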
On the one hand, the additional signaling delay makes restoration less flexible for recovery time thresholds up to 50 ms, the limit usually required in carrier-grade networks. However, if the recovery time threshold is above 70 ms, restoration becomes the most flexible choice in the above setting: it can recover more connections than the other two systems and can accommodate practically all link-pair failures as long as the topology remains connected.
On the other hand, 1+1 protection shows very high flexibility regardless of T (although at an increased bandwidth cost). This is because the simultaneous primary and backup flows provide immediate connectivity after an arbitrary single failure, or after a link-pair failure that affects only the primary path. However, 1+1 protection cannot recover from failures that affect both the primary and the backup path.
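The survivability condition for 1+1 protection described above can be written as a small check. This is a sketch with hypothetical names and sample links: a connection is assumed to survive unless the failure request hits both the primary and the backup path, which are taken to be link-disjoint.

```python
def survives_one_plus_one(primary_links, backup_links, failed_links):
    """True unless the failure hits both the primary and the backup path."""
    hits_primary = bool(primary_links & failed_links)
    hits_backup = bool(backup_links & failed_links)
    return not (hits_primary and hits_backup)


# Hypothetical disjoint path pair, each path given as a set of link names.
primary = {"A-B", "B-D"}
backup = {"A-C", "C-D"}
print(survives_one_plus_one(primary, backup, {"A-B"}))         # single failure
print(survives_one_plus_one(primary, backup, {"A-B", "B-D"}))  # pair on primary only
print(survives_one_plus_one(primary, backup, {"A-B", "C-D"}))  # pair on both paths
```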