Network Design - Chapter 14: Survivable Network Design - University of Pittsburgh


Survivable Network Design
David Tipper
Associate Professor
Department of Information Science and Telecommunications
University of Pittsburgh
Telcom 2110 Slides 14

Motivation
• Why do communications networks need to be survivable?
• Communication networks are Critical Infrastructure (CI) (PCCIP 1996): the systems, assets and services upon which society and the economy depend
• Communication infrastructure is often considered the most important CI because other infrastructures rely on it
  – banking and finance, government services
  – power grid SCADA, etc.
• Increasing impact and rate of failures
  – Increased bandwidth of links (WDM technology in fiber optic networks)
  – Increased societal dependence
  – Multiple network operators and vendor equipment
Causes of Network Outages
• According to Sprint, a link outage occurs in the IP backbone every 30 min on average
• Accidents – cable cuts, car wrecks, etc.
  – According to AT&T, 4.39 cable cuts / year / 1000 km
• Human errors – incorrect maintenance, installation
• Environmental hazards – fire, flood, etc.
• Sabotage – physical, electronic
• Operational disruptions – scheduled upgrades, maintenance, power outage
• Hardware/software failures – line card failure, faulty laser, software crash, etc.

[Figure: IP Backbone Failure. Pie chart with slices for Link Failure, Router Operations, Router Failures and Other/Unknown. Sub-causes named in the chart: time to recover from Layer 1 failure, software upgrade, hardware upgrade, configuration errors, congestion, software failures, hardware failures, DOS attacks. Slice sizes shown in the chart: 9%, 23%, 32%, 36%. Source: University of Michigan, 2000]
Network Survivability
• Definition
  – Ability of the network to support the committed Quality of Service (QoS) continuously in the presence of various failure scenarios
• Survivability Components
  – Analysis: understand failures and system functionality after failures
  – Design: adopt network procedures and architecture to prevent and minimize the impact of failures/attacks on network services
  – Goal: maintain service for certain scenarios at reasonable cost
• Self-healing network
Survivable Network Design
• Three steps towards a survivable network
  1. Prevention
     – Robust equipment and architecture (e.g., backup power supplies)
     – Security (physical, electronic), intrusion detection, etc.
  2. Topology Design and Capacity Allocation
     – Design the network with enough resources in an appropriate topology
     – Spare capacity allocation to recover from failure
  3. Network Management and Traffic Restoration Procedures
     – Detect the failure and reroute traffic around it using the redundant capacity

Survivability – Basic Concepts
• Working path and backup path (recovery path)
  – Working path: carries traffic under normal operation
  – Backup path: an alternate path that carries the traffic in case of failures

[Figure: working route and backup route through a four-node DCS network between customers A and B, with a failure marked X]
Survivability – Basic Concept
• To survive a network failure, the working path and the backup path must be disjoint, so that both paths are not lost at the same time
• Disjoint = ? (depends on the failure assumption)
  – Link disjoint
  – Node disjoint
  – Shared Risk Link Group (SRLG) disjoint

[Figure: link-disjoint and node-disjoint examples, each showing the paths labeled AP and BP between source and destination]

Shared Risk Link Group (SRLG)
• Two fiber cables share the same duct or other common physical structure (such as a bridge crossing)
• The two cables can fail simultaneously

[Figure: logical intent versus actual routing of the physical cables between nodes A, B and C]
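The three notions of disjointness can be checked mechanically. The following is a minimal Python sketch, not part of the slides: paths are assumed to be lists of node names, and the function names and the `srlg_of` mapping (link to risk groups) are hypothetical.

```python
def links(path):
    """Undirected links along a path given as a list of nodes."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def link_disjoint(wp, bp):
    """True if working and backup paths share no link."""
    return links(wp).isdisjoint(links(bp))


def node_disjoint(wp, bp):
    """True if the paths share no link and no intermediate node.

    Source and destination are common to both paths, so only the
    interior nodes are compared.
    """
    return link_disjoint(wp, bp) and set(wp[1:-1]).isdisjoint(bp[1:-1])


def srlg_disjoint(wp, bp, srlg_of):
    """True if the paths share no link and no shared risk link group.

    srlg_of (hypothetical) maps a link to the set of risk groups, such
    as ducts or bridge crossings, that the link runs through.
    """
    def groups(path):
        found = set()
        for link in links(path):
            found |= srlg_of.get(link, set())
        return found

    return link_disjoint(wp, bp) and groups(wp).isdisjoint(groups(bp))


# Hypothetical example: two routes from S to D whose first hops share a duct.
wp = ["S", "a", "D"]
bp = ["S", "b", "D"]
srlg_of = {
    frozenset(("S", "a")): {"duct-1"},
    frozenset(("S", "b")): {"duct-1"},
}
print(link_disjoint(wp, bp), node_disjoint(wp, bp), srlg_disjoint(wp, bp, srlg_of))
```

In this example the two routes are link-disjoint and node-disjoint but not SRLG-disjoint, because a single duct cut takes out the first hop of both.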
Classification of Survivability Techniques
• Path-based (Global) versus Link-based (Local)
• Protection versus Restoration
• Dedicated-Backup versus Shared-Backup Capacity
• Ring versus Mesh topology
• Dual homing
• P-cycle

Path-based versus Link-based
• Path-based Scheme (Global)
  – Disjoint alternate routes are provided between the source and destination nodes

[Figure: six-node network (nodes 1 to 6) showing a working path and a disjoint backup path]
Path-based versus Link-based
• Link-based Scheme (Local)
  – Alternate routes are provided between the end nodes of the failed link

[Figure: six-node network (nodes 1 to 6) showing a working path and a backup path around the failed link]

Partial Path Scheme
• Alternate routes run from the upstream node to the destination node, or from the downstream node to the source node

[Figure: two six-node examples showing the working path and the partial backup path]
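The difference between the path-based, link-based and partial path schemes shows up in which end points the backup route is computed between. The sketch below illustrates this with a fewest-hop search; the six-node topology, the working path and all function names are assumptions made for the example, since the slide figures do not give the link list.

```python
from collections import deque


def links(path):
    """Undirected links along a path given as a list of nodes."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def shortest_path(adj, src, dst, banned=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids banned links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in prev and frozenset((node, nxt)) not in banned:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no surviving route


def path_based(adj, wp):
    """End-to-end backup that shares no link with the working path."""
    return shortest_path(adj, wp[0], wp[-1], links(wp))


def link_based(adj, wp, failed):
    """Detour between the end nodes of the failed link, spliced into wp."""
    u, v = failed
    detour = shortest_path(adj, u, v, {frozenset(failed)})
    if detour is None:
        return None
    i = wp.index(u)
    return wp[:i] + detour + wp[i + 2:]


def partial_path(adj, wp, failed):
    """New route from the node upstream of the failure to the destination."""
    u, _ = failed
    tail = shortest_path(adj, u, wp[-1], {frozenset(failed)})
    if tail is None:
        return None
    return wp[:wp.index(u)] + tail


# Hypothetical six-node topology and working path 1-2-3-6.
edges = [(1, 2), (2, 3), (3, 6), (1, 4), (4, 5), (5, 6), (2, 4), (3, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

wp = [1, 2, 3, 6]
failed = (2, 3)  # given in the direction of the working path
print(path_based(adj, wp))
print(link_based(adj, wp, failed))
print(partial_path(adj, wp, failed))
```

The sketch does no loop removal: on other topologies the spliced link-based or partial route may revisit a node of the working path, which a real scheme would trim.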
Path-based versus Link-based

              Bandwidth efficient   Simpler   Faster recovery speed
Path-based            ✓
Link-based                             ✓               ✓

Protection versus Restoration
• When to establish the backup paths?
• Protection
  – Backup paths are fully set up before a failure occurs
  – When a failure occurs, no additional signaling is needed to establish the backup path
  – Faster recovery time
• Restoration
  – Backup paths are established after a failure occurs
  – More flexible with regard to the failure scenarios
    • Backup paths are set up after the location of the failure is known
  – More capacity efficient
    • Due to its shared-backup nature
    • Utilizes any spare capacity available in the network
  – But cannot guarantee 100% restorability after failures

[Figure: working path (WP) and backup path (BP)]
Protection
• Protection Variants
  – 1+1 Protection (dedicated protection)
    • Traffic is duplicated and transmitted over both working and backup paths
    • Fastest recovery speed, but not bandwidth efficient
  – 1:1 Protection (dedicated protection with extra traffic)
    • During normal (failure-free) operation, traffic is transmitted only over the working path; the backup path can be used to transmit extra (low-priority) traffic → better bandwidth utilization
    • When the working path fails, extra traffic is preempted and traffic is switched to the backup path

[Figure: WP and BP between source and destination]

Protection
  – 1:N Protection (shared recovery with extra traffic)
    • One protection entity for N working entities
  – M:N Protection (M ≤ N)
    • M protection entities for N working entities
  – Self-healing rings are a form of protection

[Figure: Node 1 and Node 2 connected by working channels 1 to n and one protection channel]
Types of Self-healing Rings
[Figure: 1:1 uni-directional self-healing ring (USHR) and 1:1 bi-directional self-healing ring (BSHR), each with four ADMs, a working ring and a protection ring]

Ring – Availability
• Availability of the 4-node self-healing ring network:
  $A_{\text{ring}} = A^{4} + 4A^{3}(1 - A)$

[Figure: ring of four ADMs]
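The ring availability expression can be read as "all four spans up, or exactly one span down". The sketch below checks that reading numerically, under the assumptions (not stated on the slide) that A is the availability of each ring span, that spans fail independently, and that the ring survives any single span failure; the function names are illustrative.

```python
from itertools import product


def ring_availability(a, n=4):
    """Closed form: all n spans up, or exactly one span down."""
    return a**n + n * a**(n - 1) * (1 - a)


def ring_availability_enumerated(a, n=4):
    """Same quantity, summing the probability of every surviving state."""
    total = 0.0
    for state in product((True, False), repeat=n):  # True = span is up
        if state.count(False) <= 1:                 # at most one span down
            p = 1.0
            for up in state:
                p *= a if up else (1 - a)
            total += p
    return total


a = 0.99  # illustrative per-span availability
print(ring_availability(a), ring_availability_enumerated(a))
```

Both functions agree, which confirms that the closed form counts exactly the failure states a single-fault-tolerant ring can ride through.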
Dedicated versus Shared Backup
• Dedicated-Backup Capacity
  – Backup resource can be used only by a particular working path
• Shared-Backup Capacity
  – Backup resource can be shared between several working paths
  – Rule: backup resource can be shared only when the corresponding working paths are not expected to fail at the same time
  – More capacity efficient

[Figure: eight-node network with WP1 (traffic 5 units) and WP2 (traffic 10 units), whose backup paths BP1 and BP2 both use link 5-7. Link 5-7: dedicated spare capacity = 15 units, shared spare capacity = 10 units]

Ring vs Mesh Architectures
Advantages of Rings:
• More cost efficient at low traffic volumes
• Fast protection switching, some capacity sharing
Advantages of Mesh:
• More cost efficient at high traffic volumes
• Facilitates capacity- and cost-efficient mesh restoration
• More flexible channel re-configuration
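The 15-unit versus 10-unit figures for link 5-7 in the dedicated versus shared backup example follow from summing versus taking the worst single failure. A minimal sketch, assuming single-link failures only; the exact routes of WP1, BP1, WP2 and BP2 are hypothetical (the figure only shows that both backups cross link 5-7), as are the function names.

```python
def links(path):
    """Undirected links along a path given as a list of nodes."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def dedicated_spare(demands, link):
    """Every backup path crossing the link reserves its own capacity."""
    link = frozenset(link)
    return sum(units for wp, bp, units in demands if link in links(bp))


def shared_spare(demands, link):
    """Reserve only what the worst single working-link failure needs."""
    link = frozenset(link)
    working_links = set()
    for wp, _, _ in demands:
        working_links |= links(wp)
    worst = 0
    for failed in working_links:
        needed = sum(units for wp, bp, units in demands
                     if failed in links(wp) and link in links(bp))
        worst = max(worst, needed)
    return worst


# (working path, backup path, traffic units); routes are assumed.
demands = [
    ([4, 6], [4, 5, 7, 6], 5),    # WP1 / BP1
    ([3, 8], [3, 5, 7, 8], 10),   # WP2 / BP2
]
print(dedicated_spare(demands, (5, 7)), shared_spare(demands, (5, 7)))
```

Because WP1 and WP2 share no link, no single failure activates both backups, so the shared reservation is the larger of the two demands rather than their sum.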
Mesh Network Restoration
• WDM optical networks – lightpath
• MPLS networks – LSP
• Example:

[Figure: 13-node mesh network showing a working path, a link restoration/protection backup path, a path restoration backup path and a path protection backup path]

What Does Survivability Get You?
[Figure: WP and BP between Source (S) and Destination (D)]
• $A_i$ is the availability of link $i$
• Availability of a connection between S and D:
  $A_{\text{no-protection}} = \prod_{i \in WP} A_i$
  $A_{\text{protection}} = \prod_{i \in WP} A_i + \prod_{i \in BP} A_i - \prod_{i \in WP \cup BP} A_i$
• Given $A_i = 0.998297$: $A_{\text{no-protection}} = 0.996597$, $A_{\text{protection}} = 0.999983$
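The two availability formulas are easy to evaluate directly. The sketch below does so in Python; the hop counts (a 2-link working path and a 3-link backup path) are an assumption, chosen because they reproduce the quoted figures for a per-link availability of 0.998297, and the link labels and function names are hypothetical.

```python
from math import prod


def path_availability(path_links, avail):
    """Product of the availabilities of the given links."""
    return prod(avail[link] for link in path_links)


def unprotected(wp, avail):
    """Connection is up only if every working-path link is up."""
    return path_availability(wp, avail)


def protected(wp, bp, avail):
    """P(WP up) + P(BP up) - P(both up), assuming independent links."""
    both = set(wp) | set(bp)
    return (path_availability(wp, avail)
            + path_availability(bp, avail)
            - path_availability(both, avail))


A = 0.998297
wp = ["w1", "w2"]          # assumed: working path of 2 links
bp = ["b1", "b2", "b3"]    # assumed: disjoint backup path of 3 links
avail = {link: A for link in wp + bp}

print(round(unprotected(wp, avail), 6), round(protected(wp, bp, avail), 6))
```

Taking the product over the union of the two link sets, rather than multiplying the two path availabilities, keeps the formula correct even if the paths were to share a link.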
Dual-homing and Multi-homing
• Dual-homing
  – Customer host is connected to two switched hubs
  – Traffic may be split between the primary and secondary paths connecting to the hubs
  – Each path serves as a backup for the other
• Multi-homing
  – Customer host is connected to more than two switched hubs
  – Greater protection against a failure

Dual/Multi-homing Topologies
[Figure: dual-homing topology and multi-homing topology, each showing a customer host connected to switches]
Dual-homing in Telephone Network
[Figure: Class 5 local network connected to Class 4 toll network switches at diverse locations. Labels: transmission network with SDH/SONET facility protection, multiple routes between offices, small radius of damage, small radius of service loss]

Dual-homing in Data Network
[Figure: Customer Edge (CE) router and Provider Edge (PE) routers]
Backbone PoP Design
• A PoP typically has some redundancy
• Multi-homed access routers
• Partial mesh between PoP routers
• Parallel links between adjacent PoPs on different fiber runs

P-Cycles
Protection (P) Cycle
– Closed cycles are formed in the mesh network
– Affected traffic is rerouted along these cycles
– A large network will have a number of p-cycles

[Figure: (a) a pre-configured cycle, (b) a link on the cycle fails, (c) a link not on the cycle fails, (d) another link not on the cycle fails]
P-Cycles: Basics
• For meshed networks
• Pre-reserved protection paths (before failure)
• Based on cycles, like rings
• Also protects against straddling failures, unlike rings
• Local protection action, adjacent to the failure (on the order of some 10 milliseconds)
• Shared capacity
• "Pre-configured protection cycles" → p-cycles
• Developed in Canada

P-Cycles: Basics
• A single p-cycle in a network:

[Figure: a single p-cycle drawn on a mesh network]
p-Cycles: Basics
• Protected spans:
  – 9 "on-cycle" (1 protection path)
  – 8 "straddling" (2 protection paths)
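The one-versus-two protection paths rule for on-cycle and straddling spans can be sketched as follows. The six-node cycle and the spans tested are hypothetical (the 9 on-cycle and 8 straddling spans of the slide's figure are not recoverable from the text), and `protection_paths` is an illustrative name.

```python
def protection_paths(cycle, span):
    """Return the protection paths a p-cycle offers for a failed span.

    cycle is the list of nodes visited by the p-cycle (closed implicitly,
    so the last node connects back to the first); span is a pair of
    nodes. Returns [] if the span has an end node off the cycle.
    """
    u, v = span
    if u not in cycle or v not in cycle:
        return []
    i, j = sorted((cycle.index(u), cycle.index(v)))
    arc_a = cycle[i:j + 1]              # one way around the cycle
    arc_b = cycle[j:] + cycle[:i + 1]   # the other way around
    if len(arc_a) == 2:                 # on-cycle span: arc_a is the failed span
        return [arc_b]
    if len(arc_b) == 2:                 # on-cycle span closing the cycle
        return [arc_a]
    return [arc_a, arc_b]               # straddling span: both arcs survive


cycle = [1, 2, 3, 4, 5, 6]              # hypothetical p-cycle
for span in [(2, 3), (6, 1), (1, 4), (3, 7)]:
    print(span, protection_paths(cycle, span))
```

A straddling span gets two paths because the failed span carries no part of the cycle, so both halves of the cycle remain intact; this is why each unit of p-cycle capacity can protect two units of working capacity on such a span.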
Mesh Survivability Techniques
• Protection
  – Dedicated-backup Protection
    • Path-based
    • Link-based
  – Shared-backup Protection
    • Path-based
    • Link-based
    • P-cycle
• Restoration
  – Path-based Restoration
  – Link-based Restoration

Survivability Technique Metrics
• Scope of failure coverage
  – single link failure, single node/link failure, multiple failures, etc.
• Recovery time
  – 50 ms in SONET ring
• Backup capacity requirement (redundancy):
  $R_r = \dfrac{\text{amount of spare capacity}}{\text{amount of working capacity}}$
• Guaranteed bandwidth
• Reordering and duplication
  – switching between WP and BP
• Additive latency and jitter
  – quality of backup path, backup path length, congestion on backup path
• State overhead
• Scalability
• Signaling requirements
• Notion of recovery class (QoP)
  – Different levels of connection availability, restorability and recovery time
Transport Survivability
• A number of techniques exist
  – APS
  – Multi-homing (with or without trunk diversity)
  – Link restoration
  – Path restoration
  – Self-healing rings
  – p-cycles
• Real networks use a mixture of techniques
• Usually little or no survivability at the far edge (CPE – last mile)

[Figure: Access – Core – Access]

Implementation
• Multi-layered:
  – Demand Topology
  – Logical Transport Topology
  – Fiber/Optical Topology
• Can implement survivability techniques at each layer
• Need to consider
  – Failure propagation
  – Alarm setting
  – Speed of recovery
  – Cost
  – Management
  – Traffic grooming
  – Etc.
Traffic Restoration Capabilities
• A survivability scheme with spare capacity does not accomplish restoration by itself; it must be used in conjunction with dynamic restoration techniques
• Need to detect the failure and rearrange paths, given that there is enough spare capacity in the network
• For example, a dual-homing approach guarantees surviving connectivity, but it does not restore the circuits/connections by itself
• Need network management procedures to perform path rearrangement

Steps in Traffic Recovery
[Figure: three phases (detection process, reconfiguration process, repair process) covering the steps notification, fault isolation, identification, path selection, rerouting, repair and normalization]
IP Survivability Options
• Several techniques to improve survivability in IP networks
• IP layer
  – adjust link weights and timers for faster failure recovery
  – prestore second shortest paths, etc.
• Adopt optical transport techniques from telco operators (survivable rings, APS, path restoration, etc.)
• MPLS logical layer restoration

IP Dynamic Routing
[Figure: link failure flooded and a new path computed between San Francisco and New York]
• OSPF or IS-IS computes the path
• If a link or node fails, a new path is computed
• Response times: typically a few seconds
  – Can be tuned to ~1000's of milliseconds
  – According to Sprint data, usually ~7 s to recover
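The recompute-after-failure behavior of a link-state protocol can be illustrated with a shortest-path-first calculation run twice, once before and once after a link is removed. This is only a sketch of the path computation, not of OSPF or IS-IS themselves (no flooding, timers or areas); the intermediate cities and link weights are invented for the example.

```python
import heapq


def spf(graph, src, dst):
    """Dijkstra shortest path; graph maps node -> {neighbor: weight}."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def without_link(graph, u, v):
    """Copy of the topology with the failed link u-v removed."""
    copy = {node: dict(nbrs) for node, nbrs in graph.items()}
    copy[u].pop(v, None)
    copy[v].pop(u, None)
    return copy


# Hypothetical backbone; only the two end cities come from the slide.
graph = {
    "San Francisco": {"Denver": 1, "Dallas": 2},
    "Denver": {"San Francisco": 1, "Chicago": 1},
    "Chicago": {"Denver": 1, "New York": 1},
    "Dallas": {"San Francisco": 2, "New York": 3},
    "New York": {"Chicago": 1, "Dallas": 3},
}

print(spf(graph, "San Francisco", "New York"))
failed = without_link(graph, "Denver", "Chicago")
print(spf(failed, "San Francisco", "New York"))
```

The recovery times quoted on the slide are dominated by failure detection and flooding of the topology change rather than by this computation, which is why tuning timers helps.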
Backup Label Switched Paths
[Figure: primary LSP and backup LSP between San Francisco and New York, with the error signaled after a failure]
• Primary (working) LSP and backup LSPs are established a priori
• If the primary fails: signal to the head end, use the backup
• Faster response, but requires wide-area signaling

MPLS Fast Reroute
• Increasing demand for "APS-like" redundancy
  – MPLS resilience to link/node failures
  – Control-plane protection required
  – Avoid cost of SONET APS protection
• Solution: MPLS Fast Reroute
  – RSVP extensions define Fast Reroute
  – LSPs can be set up, a priori, to back up:
    • One LSP across a link and optionally the next node, or
    • All LSPs across a particular link

[Figure: primary LSP with a detour around an LSR]
1:1 Protection
• For each LSP, for each node
  – Set up one LSP as backup
  – Merge into the primary LSP further downstream
  – Backs up the link and the downstream node

1:1 LSP Protection
[Figure: the link fails, traffic uses the detour LSP and is merged downstream]
1:N Link Protection
• For each link, for each neighbor
  – Set up one detour LSP to back up the link as a whole
  – Uses LSP hierarchy to back up all LSPs that were using the failed link

[Figure: multiple primary LSPs on the same link, one detour LSP for the link]

1:N Link Protection
[Figure: the link fails; primary LSPs are multiplexed over one detour LSP and demultiplexed at the next node]
1:N Link and Node Protection
• For each link
  – For each node 2 hops away
    • Detour LSP backs up the link and the intermediate node
    • Uses LSP hierarchy to back up all LSPs to that node
    • If there are two 2-hop paths to that node, set up two detour LSPs
  – For each node 1 hop away
    • Detour LSP backs up LSPs ending at that node

MPLS Fast Reroute
• Provides fast recovery from LSP failure
  – Based on a priori setup of detour LSPs
  – (e.g., ~5 milliseconds for tens of LSPs with 1:1)
• There are significant tradeoffs between the approaches
  – Number of LSPs required
  – Whether node failures are protected
  – Ability to reserve resources for backup LSPs
  – Optimality of routes
Summary of MPLS Methods
• End-to-end disjoint backup LSP
  – one per working LSP in the network
• MPLS Fast Reroute
  – 1:1 LSP link or link + node protection
  – 1:N link protection
  – 1:N link plus node protection
• All of these are interoperable, based on IETF standards
• Sink trees are under study
• Does MPLS solve all the problems?

Multilayer Networks
• Backbone networks have multiple technology layers
• Converging toward IP/MPLS/WDM
• Multiple layers present several survivability challenges
• Coordination of recovery actions at different layers
  – Which layer is responsible for fault recovery?
• Spare Capacity Allocation (SCA)
  – How to prevent over-allocation when each layer provides spare resources?
• Failure propagation
  – A lower-layer failure can affect multiple higher-layer links

[Figure: MPLS connections mapped onto the WDM physical path]