Network Design - Chapter 14: Survivable Network Design - University of Pittsburgh

  1. Survivable Network Design
     David Tipper, Associate Professor
     Department of Information Science and Telecommunications, University of Pittsburgh
     Telcom 2110, Slides 14

     Motivation
     • Why do communication networks need to be survivable?
     • Communication networks are Critical Infrastructure (CI) (PCCIP 1996): the systems, assets and services upon which society and the economy depend
     • Communication infrastructure is often considered the most important CI because other infrastructures rely on it
       – banking and finance, government services
       – power grid SCADA, etc.
     • Increasing impact and rate of failures
       – Increased bandwidth of links (WDM technology in fiber optic networks)
       – Increased societal dependence
       – Multiple network operators and vendor equipment
  2. Causes of Network Outages
     • According to Sprint, a link outage occurs in the IP backbone every 30 min on average
     • Accidents – cable cuts, car wrecks, etc.
       – According to AT&T, 4.39 cable cuts / year / 1000 km
     • Human errors – incorrect maintenance, installation
     • Environmental hazards – fire, flood, etc.
     • Sabotage – physical, electronic
     • Operational disruptions – scheduled upgrades, maintenance, power outages
     • Hardware/software failures – line card failure, faulty laser, software crash, etc.

     IP Backbone Failure
     [Figure: pie chart of IP backbone failure causes. Slices: Other/Unknown 9%; Router Failures 23% (software failures, hardware failures, DoS attacks); Link Failure and Router Operations account for the 32% and 36% slices. Remaining labels: time to recover from Layer 1 failure, congestion, software upgrade, hardware upgrade, configuration errors. Source: University of Michigan, 2000]
  3. Network Survivability
     • Definition
       – Ability of the network to support the committed Quality of Service (QoS) continuously in the presence of various failure scenarios
     • Survivability components
       – Analysis: understand failures and system functionality after failures
       – Design: adopt network procedures and architecture to prevent and minimize the impact of failures/attacks on network services
       – Goal: maintain service for certain scenarios at reasonable cost
     • Self-healing network
  4. Survivable Network Design
     • Three steps towards a survivable network
       1. Prevention
          – Robust equipment and architecture (e.g., backup power supplies)
          – Security (physical, electronic), intrusion detection, etc.
       2. Topology design and capacity allocation
          – Design the network with enough resources in an appropriate topology
          – Spare capacity allocation, to recover from failure
       3. Network management and traffic restoration procedures
          – Detect the failure and reroute traffic around it using the redundant capacity

     Survivability – Basic Concepts
     • Working path and backup path (recovery path)
       – Working path: carries traffic under normal operation
       – Backup path: an alternate path that carries the traffic in case of failure
     [Figure: working route and backup route between customers A and B across DCS nodes 1 to 4, with a failure (X) marked on the working route]
  5. Survivability – Basic Concept
     • To survive a network failure, the working path and the backup path must be disjoint, so that both paths are not lost at the same time
     • What "disjoint" means depends on the failure assumption
       – Link disjoint
       – Node disjoint
       – Shared Risk Link Group (SRLG) disjoint
     [Figure: link-disjoint and node-disjoint examples, each showing paths labeled AP and BP between source and destination]

     Shared Risk Link Group (SRLG)
     • Two fiber cables share the same duct or other common physical structure (such as a bridge crossing)
     • The two cables can fail simultaneously
     [Figure: logical intent versus actual routing over physical cables between nodes A, B and C]
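The three notions of disjointness can be checked mechanically. The following Python sketch is illustrative only: the node names, the `srlg_of` mapping and the "duct-1" risk group are hypothetical and do not come from the slides. Paths are given as node sequences.

```python
def links(path):
    """Return the set of undirected links traversed by a node-sequence path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def link_disjoint(wp, bp):
    """True if the two paths share no link."""
    return not (links(wp) & links(bp))

def node_disjoint(wp, bp):
    """True if the paths share no link and no intermediate node.
    The end points are shared by definition, so they are excluded."""
    return link_disjoint(wp, bp) and not (set(wp[1:-1]) & set(bp[1:-1]))

def srlg_disjoint(wp, bp, srlg_of):
    """True if no risk group contains a link of both paths.
    srlg_of maps a link (frozenset of two nodes) to its set of risk groups."""
    def risks(path):
        groups = set()
        for link in links(path):
            groups |= srlg_of.get(link, set())
        return groups
    return not (risks(wp) & risks(bp))

# Hypothetical example: two routes that look independent but leave S in one duct.
wp = ["S", "a", "D"]
bp = ["S", "b", "D"]
bp2 = ["S", "c", "a", "d", "D"]   # shares intermediate node "a" with wp
srlg_of = {
    frozenset(("S", "a")): {"duct-1"},
    frozenset(("S", "b")): {"duct-1"},
}
```

Here `wp` and `bp` are link- and node-disjoint but not SRLG-disjoint, while `bp2` is link-disjoint from `wp` without being node-disjoint.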
  6. Classification of Survivability Techniques
     • Path-based (global) versus link-based (local)
     • Protection versus restoration
     • Dedicated-backup versus shared-backup capacity
     • Ring versus mesh topology
     • Dual homing
     • p-cycle

     Path-based versus Link-based
     • Path-based scheme (global)
       – Disjoint alternate routes are provided between the source and destination nodes
     [Figure: six-node network (nodes 1 to 6) showing a working path and a backup path]
  7. Path-based versus Link-based
     • Link-based scheme (local)
       – Alternate routes are provided between the end nodes of the failed link
     [Figure: six-node network (nodes 1 to 6) showing a working path and a backup path]

     Partial Path Scheme
     • Partial path scheme
       – Alternate routes run from the upstream node to the destination node, or from the downstream node to the source node
     [Figure: two six-node networks (nodes 1 to 6), each showing a working path and a backup path]
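To make the difference between the path-based and link-based schemes concrete, here is a minimal Python sketch. The six-node topology, the working path and the failed link are assumptions chosen for illustration, since the slide figures are not recoverable from the text. Breadth-first search stands in for whatever route computation a real network would use.

```python
from collections import deque

def path_links(path):
    """Undirected links of a node-sequence path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def shortest_path(adj, src, dst, banned_links=frozenset()):
    """Fewest-hop path from src to dst that avoids banned_links (BFS)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned_links:
                prev[v] = u
                queue.append(v)
    return None

# Hypothetical topology (not the one drawn on the slides).
edges = [(1, 2), (2, 3), (3, 6), (1, 4), (4, 5), (5, 6), (2, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

working = [1, 2, 3, 6]
failed = (1, 2)

# Path-based: an end-to-end route that is link-disjoint from the working path.
path_backup = shortest_path(adj, 1, 6, banned_links=path_links(working))

# Link-based: a detour between the end nodes of the failed link only.
detour = shortest_path(adj, failed[0], failed[1], banned_links={frozenset(failed)})
i = working.index(failed[0])
restored = working[:i] + detour + working[i + 2:]
```

With this topology the path-based backup is 1-4-5-6, while the link-based detour 1-4-5-2 rejoins the working path at node 2, giving the longer restored route 1-4-5-2-3-6.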
  8. Path-based versus Link-based

                   Bandwidth efficient   Simpler   Faster recovery speed
     Path-based            ✓
     Link-based                             ✓                ✓

     Protection versus Restoration
     • When are the backup paths established?
     • Protection
       – Backup paths are fully set up before a failure occurs
       – When a failure occurs, no additional signaling is needed to establish the backup path
       – Faster recovery time
     • Restoration
       – Backup paths are established after a failure occurs
       – More flexible with regard to failure scenarios
         • backup paths are set up after the location of the failure is known
       – More capacity efficient
         • due to its shared-backup nature
         • can use any spare capacity available in the network
       – But cannot guarantee 100% restorability after failures
     [Figure: working path (WP) and backup path (BP)]
  9. Protection
     • Protection variants
       – 1+1 protection (dedicated protection)
         • Traffic is duplicated and transmitted over both the working and backup paths
         • Fastest recovery speed, but not bandwidth efficient
       – 1:1 protection (dedicated protection with extra traffic)
         • During normal (failure-free) operation, traffic is transmitted only over the working path; the backup path can carry extra (low-priority) traffic, which gives better bandwidth utilization
         • When the working path fails, the extra traffic is preempted and traffic is switched to the backup path
     [Figure: working path (WP) and backup path (BP) between source and destination]

     Protection
       – 1:N protection (shared recovery with extra traffic)
         • One protection entity for N working entities
     [Figure: Node 1 and Node 2 connected by one protection channel and working channels 1 to n]
       – M:N protection (M ≤ N)
         • M protection entities for N working entities
       – Self-healing rings are a form of protection
  10. Types of Self-healing Rings
      [Figure: two four-ADM rings, each with a working ring and a protection ring: a 1:1 uni-directional self-healing ring (USHR) and a 1:1 bi-directional self-healing ring (BSHR)]

      Ring – Availability
      • Availability of the 4-node self-healing ring network:
        $A_{ring} = A^4 + 4A^3(1 - A)$
      [Figure: ring of four ADMs]
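The ring availability expression has the form of a "3 out of 4" model: the first term is the probability that all four elements are up, the second that exactly one of the four is down. The Python sketch below evaluates it and cross-checks it against a general k-out-of-n sum. It assumes four identical, independent ring elements with availability A; the slide text does not state which elements these are, and the function names and the sample value 0.99 are illustrative.

```python
from math import comb

def ring_availability(a):
    """A_ring = A^4 + 4 A^3 (1 - A): all four elements up, or exactly one down."""
    return a**4 + 4 * a**3 * (1 - a)

def at_most_failed(a, n, max_failed):
    """Probability that at most max_failed of n independent elements are down,
    each element being up with probability a."""
    return sum(comb(n, f) * a ** (n - f) * (1 - a) ** f
               for f in range(max_failed + 1))

a = 0.99                                  # illustrative element availability
ring = ring_availability(a)
general = at_most_failed(a, n=4, max_failed=1)
```

Both expressions agree, which confirms that the formula describes a ring that tolerates any single element failure under these assumptions.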
  11. Dedicated versus Shared Backup
      • Dedicated-backup capacity
        – A backup resource can be used only by one particular working path
      • Shared-backup capacity
        – A backup resource can be shared between several working paths
        – Rule: a backup resource can be shared only when the corresponding working paths are not expected to fail at the same time
        – More capacity efficient
      [Figure: eight-node network (nodes 1 to 8). WP1 carries 5 units of traffic with backup path BP1; WP2 carries 10 units with backup path BP2. On link 5-7: dedicated spare capacity = 15 units, shared spare capacity = 10 units]

      Ring vs Mesh Architectures
      • Advantages of rings
        – More cost efficient at low traffic volumes
        – Fast protection switching, some capacity sharing
      • Advantages of mesh
        – More cost efficient at high traffic volumes
        – Facilitates capacity- and cost-efficient mesh restoration
        – More flexible channel reconfiguration
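The link 5-7 figures in the dedicated versus shared example follow from simple arithmetic: dedicated spare capacity is the sum of the backup demands on the link, while shared spare capacity is the largest demand over the failure scenarios considered. A short Python sketch, assuming both backup paths cross link 5-7 and that WP1 and WP2 never fail together (the function and variable names are illustrative):

```python
def spare_capacity(backup_demands, failure_scenarios):
    """Spare capacity needed on one link.

    backup_demands: {working path: units its backup path needs on this link}
    failure_scenarios: list of sets of working paths that fail together
    Returns (dedicated, shared).
    """
    dedicated = sum(backup_demands.values())
    shared = max(sum(backup_demands[wp] for wp in scenario)
                 for scenario in failure_scenarios)
    return dedicated, shared

demands_on_5_7 = {"WP1": 5, "WP2": 10}

# Single-failure assumption: the two working paths do not fail at the same time.
dedicated, shared = spare_capacity(demands_on_5_7, [{"WP1"}, {"WP2"}])

# If both working paths could fail together, sharing saves nothing.
_, shared_joint = spare_capacity(demands_on_5_7, [{"WP1", "WP2"}])
```

This reproduces the 15 and 10 units of the example, and shows why the sharing rule matters: when the working paths can fail simultaneously, the shared requirement rises back to 15 units.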
  12. Mesh Network Restoration
      • WDM optical networks – lightpath
      • MPLS networks – LSP
      • Example:
      [Figure: 13-node mesh network showing a working path, a link restoration/protection backup path, a path restoration backup path, and a path protection backup path]

      What Does Survivability Get You?
      [Figure: working path (WP) and backup path (BP) between source (S) and destination (D)]
      • $A_i$ is the availability of link $i$
      • Availability of a connection between S and D:
        $A_{no\text{-}protection} = \prod_{i \in WP} A_i$
        $A_{protection} = \prod_{i \in WP} A_i + \prod_{i \in BP} A_i - \prod_{i \in WP \cup BP} A_i$
      • Given $A_i = 0.998297$: $A_{no\text{-}protection} = 0.996597$, $A_{protection} = 0.999983$
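The availability figures above can be reproduced numerically. The Python sketch below assumes a two-link working path and a three-link, link-disjoint backup path; the slide text does not give the hop counts, and these values were chosen because they match the quoted results. For disjoint paths the product over WP ∪ BP equals the product of the two path availabilities.

```python
from math import prod

def path_availability(link_availabilities):
    """Availability of a path: every link on it must be up."""
    return prod(link_availabilities)

def protected_availability(wp_links, bp_links):
    """Availability with a link-disjoint backup: WP up, or BP up, minus both up."""
    a_wp = path_availability(wp_links)
    a_bp = path_availability(bp_links)
    return a_wp + a_bp - a_wp * a_bp

A_LINK = 0.998297
wp_links = [A_LINK] * 2      # assumed hop count of the working path
bp_links = [A_LINK] * 3      # assumed hop count of the backup path

a_no_protection = path_availability(wp_links)
a_protection = protected_availability(wp_links, bp_links)

# Expected downtime per year, in hours, for each case.
HOURS_PER_YEAR = 365 * 24
downtime_unprotected = (1 - a_no_protection) * HOURS_PER_YEAR
downtime_protected = (1 - a_protection) * HOURS_PER_YEAR
```

Under these assumptions, protection cuts expected downtime from roughly 30 hours per year to well under one hour.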
  13. Dual-homing and Multi-homing
      • Dual-homing
        – The customer host is connected to two switched hubs
        – Traffic may be split between the primary and secondary paths connecting to the hubs
        – Each path serves as a backup for the other
      • Multi-homing
        – The customer host is connected to more than two switched hubs
        – Greater protection against a failure

      Dual/Multi-homing Topologies
      [Figure: dual-homing topology and multi-homing topology, each showing a customer host connected to switches]
  14. Dual-homing in Telephone Network
      [Figure: Class 5 local network and Class 4 toll network switches. Labels: transmission network with SDH/SONET facility protection; diverse switch locations, small radius of damage; transmission network with multiple routes between offices, small radius of service loss]

      Dual-homing in Data Network
      [Figure: Customer Edge (CE) router and Provider Edge (PE) routers]
  15. Backbone PoP Design
      • A PoP typically has some redundancy
      • Multi-homed access routers
      • Partial mesh between PoP routers
      • Parallel links between adjacent PoPs on different fiber runs

      P Cycles
      • Protection (p) cycle
        – Closed cycles are formed in the mesh network
        – Affected traffic is rerouted along these cycles
        – A large network will have a number of p-cycles
      [Figure: (a) a pre-configured cycle; (b) a link on the cycle fails; (c) a link not on the cycle fails; (d) another link not on the cycle fails]
  16. P-Cycles: Basics
      • For meshed networks
      • Pre-reserved protection paths (before failure)
      • Based on cycles, like rings
      • Also protects against straddling failures, unlike rings
      • Local protection action, adjacent to the failure (on the order of some 10 milliseconds)
      • Shared capacity
      • "Pre-configured protection cycles" → p-cycles
      • Developed in Canada at
      [Figure: (c) a link not on the cycle fails]

      P-Cycles: Basics
      • A single p-cycle in a network:
      [Figure: a single p-cycle in a network]
  17. p-Cycles: Basics
      • Protected spans:
        – 9 "on-cycle" (1 protection path)
        – 8 "straddling" (2 protection paths)
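The split into on-cycle and straddling spans depends only on where a span's end nodes sit relative to the cycle. The Python sketch below classifies spans for a small hypothetical network (the slide's own topology is not recoverable from the text) and then applies the counts quoted above. It assumes that each protection path restores one unit of working capacity per unit of p-cycle capacity.

```python
def classify_spans(cycle, spans):
    """Classify spans relative to a p-cycle.

    cycle: list of nodes in cycle order (the last node connects back to the first)
    spans: iterable of (u, v) node pairs
    """
    n = len(cycle)
    cycle_links = {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}
    cycle_nodes = set(cycle)
    result = {"on_cycle": [], "straddling": [], "unprotected": []}
    for u, v in spans:
        if frozenset((u, v)) in cycle_links:
            result["on_cycle"].append((u, v))        # one path: the rest of the cycle
        elif u in cycle_nodes and v in cycle_nodes:
            result["straddling"].append((u, v))      # two paths: both arcs of the cycle
        else:
            result["unprotected"].append((u, v))     # an end node is off the cycle
    return result

def protected_units(n_on_cycle, n_straddling):
    """Working units protected per unit of p-cycle capacity (assumption above)."""
    return 1 * n_on_cycle + 2 * n_straddling

# Hypothetical example network.
cycle = [1, 2, 3, 4, 5]
spans = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3), (2, 5), (3, 6)]
classes = classify_spans(cycle, spans)

# Counts quoted on the slide: 9 on-cycle and 8 straddling spans.
slide_total = protected_units(9, 8)
```

With the slide's counts, one unit of capacity on each of the 9 cycle spans would protect 25 working units under this assumption, which illustrates why p-cycles can be more capacity efficient than rings.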
  18. Mesh Survivability Techniques
      • Protection
        – Dedicated-backup protection
          • Path-based
          • Link-based
        – Shared-backup protection
          • Path-based
          • Link-based
          • P-cycle
      • Restoration
        – Path-based restoration
        – Link-based restoration

      Survivability Technique Metrics
      • Scope of failure coverage – single link failure, single node/link failure, multiple failures, etc.
      • Recovery time – 50 ms in a SONET ring
      • Backup capacity requirement (redundancy): $R_r = \frac{\text{amount of spare capacity}}{\text{amount of working capacity}}$
      • Guaranteed bandwidth
      • Reordering and duplication – switching between WP and BP
      • Additive latency and jitter – quality of backup path, backup path length, congestion on backup path
      • State overhead
      • Scalability
      • Signaling requirements
      • Notion of recovery class (QoP) – different levels of connection availability, restorability and recovery time
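As a small worked example of the redundancy metric, the Python sketch below computes the ratio of spare to working capacity summed over all links. The per-link capacities are invented for illustration and the function name is hypothetical.

```python
def redundancy(working, spare):
    """Network redundancy: total spare capacity divided by total working capacity.

    working, spare: {link name: capacity units}
    """
    return sum(spare.values()) / sum(working.values())

# Hypothetical per-link capacities for a three-link network.
working = {"a-b": 10, "b-c": 10, "a-c": 5}
spare = {"a-b": 5, "b-c": 5, "a-c": 10}

r = redundancy(working, spare)   # 20 spare units over 25 working units
```

A value of 0.8 means 80% extra capacity is reserved for recovery. Dedicated schemes that reserve at least one backup unit per working unit, such as 1+1 protection, cannot go below 1.0 on this measure.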
  19. Transport Survivability
      • A number of techniques exist
        – APS
        – Multi-homing (with or without trunk diversity)
        – Link restoration
        – Path restoration
        – Self-healing rings
        – p-cycles
      • Real networks use a mixture of techniques
      • Usually little or no survivability at the far edge (CPE, last mile)
      [Figure: access, core, access]

      Implementation
      • Multi-layered:
        – Demand topology
        – Logical transport topology
        – Fiber/optical topology
      • Survivability techniques can be implemented at each layer
      • Need to consider
        – Failure propagation
        – Alarm setting
        – Speed of recovery
        – Cost
        – Management
        – Traffic grooming
        – Etc.
  20. Traffic Restoration Capabilities
      • A survivability scheme and spare capacity do not accomplish restoration by themselves; they must be used in conjunction with dynamic restoration techniques
      • The network needs to detect the failure and rearrange paths, given that there is enough spare capacity
      • For example, a dual-homing approach guarantees surviving connectivity, but it does not by itself restore the circuits/connections
      • Network management procedures are needed to perform path rearrangement

      Steps in Traffic Recovery
      [Figure: sequence of recovery steps. Labels: detection, notification, fault isolation, identification, path selection, rerouting, repair, normalization; groupings: reconfiguration process, repair process]
  21. IP Survivability Options
      • Several techniques improve survivability in IP networks
      • IP layer
        – adjust link weights and timers for faster failure recovery
        – pre-store second shortest paths, etc.
      • Adopt optical transport techniques from telco operators (survivable rings, APS, path restoration, etc.)
      • MPLS logical-layer restoration

      IP Dynamic Routing
      [Figure: network between San Francisco and New York; the link failure is flooded and a new path is computed]
      • OSPF or IS-IS computes the path
      • If a link or node fails, a new path is computed
      • Response times: typically a few seconds
        – Can be tuned to ~1000's of milliseconds
        – According to Sprint data, usually ~7 s to recover
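The recomputation step in IP dynamic routing can be sketched with Dijkstra's algorithm, which link-state protocols such as OSPF and IS-IS use for shortest-path computation. In this Python sketch, only the San Francisco and New York end points come from the slide; the intermediate cities, link weights and failed link are hypothetical. It omits failure detection and flooding, which account for much of the recovery time quoted above.

```python
import heapq

def dijkstra(graph, src, dst):
    """Least-cost path in graph {node: {neighbor: weight}}.
    Returns (cost, path), or (inf, None) if dst is unreachable."""
    dist = {src: 0}
    prev = {}
    done = set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return float("inf"), None

def without_link(graph, a, b):
    """Copy of the graph with the undirected link a-b removed."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}

# Hypothetical backbone with link weights.
graph = {
    "SF":      {"Denver": 1, "Dallas": 2},
    "Denver":  {"SF": 1, "Chicago": 1},
    "Chicago": {"Denver": 1, "NY": 1, "Dallas": 2},
    "Dallas":  {"SF": 2, "Chicago": 2, "NY": 4},
    "NY":      {"Chicago": 1, "Dallas": 4},
}

before = dijkstra(graph, "SF", "NY")
after = dijkstra(without_link(graph, "Denver", "Chicago"), "SF", "NY")
```

Before the failure the route is SF, Denver, Chicago, NY at cost 3; once the Denver-Chicago link is removed, every router that has learned of the failure computes SF, Dallas, Chicago, NY at cost 5.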
  22. Backup Label Switched Paths
      [Figure: primary LSP and backup LSP between San Francisco and New York; the error is signaled]
      • Primary (working) LSP and backup LSPs are established a priori
      • If the primary fails: signal to the head end, use the backup
      • Faster response, but requires wide-area signaling

      MPLS Fast Reroute
      • Increasing demand for "APS-like" redundancy
        – MPLS resilience to link/node failures
        – Control-plane protection required
        – Avoid the cost of SONET APS protection
      • Solution: MPLS Fast Reroute
        – RSVP extensions define Fast Reroute
        – LSPs can be set up, a priori, to back up:
          • one LSP across a link and optionally the next node, or
          • all LSPs across a particular link
      [Figure labels: detour, primary, LSR]
  23. 1:1 Protection
      • For each LSP, for each node
        – Set up one LSP as backup
        – Merge into the primary LSP further downstream
        – Backs up the link and the downstream node

      1:1 LSP Protection
      [Figure: the link fails, traffic uses the detour LSP and is merged downstream]
  24. 1:N Link Protection
      • For each link, for each neighbor
        – Set up one detour LSP to back up the link as a whole
        – Uses LSP hierarchy to back up all LSPs that were using the failed link
      [Figure: multiple primary LSPs on the same link; one detour LSP for the link]

      1:N Link Protection
      [Figure: the link fails; primary LSPs are multiplexed over one detour LSP and demultiplexed at the next node]
  25. 1:N Link and Node Protection
      • For each link
        – For each node 2 hops away
          • Detour LSP backs up the link and the intermediate node
          • Uses LSP hierarchy to back up all LSPs to that node
          • If there are two 2-hop paths to that node, set up two detour LSPs
        – For each node 1 hop away
          • Detour LSP backs up LSPs ending at that node

      MPLS Fast Reroute
      • Provides fast recovery from LSP failure
        – Based on detour LSPs set up a priori as backups
        – (e.g., ~5 milliseconds for tens of LSPs with 1:1)
      • There are significant tradeoffs between the approaches
        – Number of LSPs required
        – Whether node failures are protected
        – Ability to reserve resources for backup LSPs
        – Optimality of routes
  26. Summary of MPLS Methods
      • End-to-end disjoint backup LSP
        – one per working LSP in the network
      • MPLS Fast Reroute
        – 1:1 LSP link or link + node protection
        – 1:N link protection
        – 1:N link plus node protection
      • All of these are interoperable, based on IETF standards
      • Sink trees are under study
      • Does MPLS solve all the problems?

      Multilayer Networks
      • Backbone networks have multiple technology layers
      • Converging toward IP/MPLS/WDM
      • Multiple layers present several survivability challenges
      • Coordination of recovery actions at different layers
        – Which layer is responsible for fault recovery?
      • Spare capacity allocation (SCA)
        – How to prevent over-allocation when each layer provides spare resources?
      • Failure propagation
        – A lower-layer failure can affect multiple higher-layer links
      [Figure: MPLS connections among nodes 1, 3 and 5, mapped onto a WDM physical path through nodes 1 to 5]
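Failure propagation between layers can be illustrated by mapping each logical connection onto the physical route that carries it. In the Python sketch below, the node numbers follow the figure labels (MPLS nodes 1, 3 and 5 over WDM nodes 1 to 5), but the specific routing of each connection is an assumption, since the figure itself is not recoverable from the text.

```python
def affected_connections(routing, failed_link):
    """Names of upper-layer connections whose physical route uses failed_link.

    routing: {connection name: physical route as a node sequence}
    failed_link: (u, v) pair of physical nodes
    """
    failed = frozenset(failed_link)
    return sorted(
        name for name, route in routing.items()
        if failed in {frozenset(hop) for hop in zip(route, route[1:])}
    )

# Assumed mapping of MPLS connections onto a WDM chain 1-2-3-4-5.
routing = {
    "1-3": [1, 2, 3],
    "1-5": [1, 2, 3, 4, 5],
    "3-5": [3, 4, 5],
}

hit_by_2_3 = affected_connections(routing, (2, 3))
hit_by_4_5 = affected_connections(routing, (4, 5))
```

A single fiber cut between WDM nodes 2 and 3 takes down two MPLS connections at once, so links that look independent at the MPLS layer may share a physical risk.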