Network Design - Chapter 15: Survivable Network Design Continued - University of Pittsburgh

pdf 23 trang hoanguyen 3910
Bạn đang xem 20 trang mẫu của tài liệu "Network Design - Chapter 15: Survivable Network Design Continued - University of Pittsburgh", để tải tài liệu gốc về máy bạn click vào nút DOWNLOAD ở trên

Tài liệu đính kèm:

  • pdfnetwork_design_chapter_15_survivable_network_design_continue.pdf

Nội dung text: Network Design - Chapter 15: Survivable Network Design Continued - University of Pittsburgh

  1. Survivable Network Design David Tipper Associate Professor Department of Information Science and Telecommunications University of Pittsburgh Telcom 2110 Slides 15 Survivable Network Design • Spare Capacity Allocation (SCA) Problem: – given working paths and network (or virtual network) topology – provision spare capacity and find backup routes for fault tolerance –Goal: minimum spare capacity or cost • Survivable Mesh Networks – Consider preplanned protection in mesh networks • STM - DCS, ATM - VP, WDM, MPLS, etc – determine routing/capacity allocation for normal demand – find location and amount of spare capacity for failure scenarios – spare capacity required depends on restoration/survivability technique 1
  2. Classification of Survivability Techniques • Path-based (Global) versus Link-based (Local) • Failure Dependent vs. Failure Independent • Protection versus Restoration • Dedicated-Backup versus Shared- Backup Capacity • Ring versus Mesh topology • Dual and multi-homing • P cycle •Etc. Failure Dependent vs. Failure Independent • Failure Dependent – the backup path depends on which device fails – need a set of paths one for each failure case • Failure Independent – backup path link and node disjoint with working path - one backup path per working path • Example: 7 9 Failure Dependent backup path 13 for link 2-3 failure 1 2 3 Working path 6 12 10 5 Failure Dependent backup 4 path for link 1-2 failure 11 8 Failure Independent backup path 2
  3. SCA Problem • SCA for Failure Independent Shared Backup Path Restoration • Notation r = 1,2, , D set of demands (source-destination pairs) p = 1,2, , Pr set of possible paths for demand pair r l = 1,2, , L set of network links • Input parameters (constants) α r offered traffic load of demand pair r cl unit cost of capacity on link l r,p δl = 1 if l belongs to path p realizing demand r = 0, otherwise f set of link failure scenarios • variables x r,p flow of demand r on path p sl spare capacity on link l SCA Path-flow model r,p Find sl and x , which Total spare capacity minimize ∑cl ⋅ sl l∈L Single backup path for each flow r, p s.t. ∑ x =1,∀r ∈ D p∈Pr Enough spare capacity on each link r r, p r, p ∑∑α δ l ⋅ x ≤ sl , ∀l ∈ L −{ f },∀f , f ∈ L r r∈D f p∈P 3
  4. Matrix Based Formulation of SCA • Matrix Based formulation of Optimization model for FID shared backup path restoration* • Consider path incident matrices P and Q for working and backup paths where each matrix has number of rows = number of flows in the network number of columns = number of links in the network – row i in the matrix P corresponds to the set of links used by flow i – where pij = 1 if flow i uses link j it is 0 otherwise – similary row i in the matrix Q corresponds to the set of backup path links used by flow i where qij = 1 if flow i uses link j it is 0 otherwise • Relate to spare provision matrix G, and spare capacity reservation s T – G = Q P, element Gij gives required spare capacity on link i when link j fails – s = max(G), or s ≥ G , spare capacity reservations are the maximum spare capacity for any single link failure • * Y.Liu, D.Tipper, and P. Siripongwutikorn, “Approximating Optimal Spare Capacity Allocation by Successive Survivable Routing,'' ACM/IEEE Transactions on Networking, Vol. 13., No. 1, pp. 198-211, Feb., 2005 Example From working and T Link i1234567 backup paths, G= Q P Backup path link incident matrix s G QT From G, 1 202 11101 001101100 0 2 2 2 0 2 1 1 0 0 110001100 0 s=maxG 3 1 0 1 0 0 0 1 1 001000001 0 4 1 1 1 1 0 0 1 0 100110001 0 5 2 1 1 1 0 0 0 2 011000000 1 6 1 0 0 1 0 1 0 1 000010010 1 Working path 7 2 1 0 2 0 2 0 0 010001010 0 Backup path 11 Flows 1 2 3 4 5678910 src dst 100000 01 ab P 101000 02 ac b 3 c 0 1 000013 ad 1 0 1 000004 ae Working path link 001000 05 bc a 4 6 5 incident matrix 0 0 1 0 1 0 0 6 bd An example: 000100 07 be 2 when link 2 fails, 000010 08 cd how much capacity is e 7 d 000001 09 ce 000000 110 de need on link 1 ? 4
  5. Matrix Based SCA for Link Failures min S = eT s Total spare capacity Q,s s.t. s ≥ G Enough spare capacity on each link G = QT M P Calculation of spare provision matrix P + Q ≤ 1 Link-disjointed backup paths T QB = D (mod 2) Flow conservation of backup Q is a binary matrix Integer programming Decision variable: Q, s Given: M – traffic demand matrix P – working path link incidence matrix B and D – node-link & flow-node incidence matrices Another way to find G T G = Σr Gr, where Gr = qr pr , pr and qr are vectors for working and backup paths of flow r Links G Failures G 1 Q G2 Flows P GR-1 GR 5
  6. Find spare capacity s Link i1234567 T Total working Spare path and link From Q and wi=sum P(*,i )2 2 3 1 2 1 2 13 incident matrix QT s =max(G) G=QT P P, get G 1 202 11 1 0 1 0011011000 2 2 2 0 2 1 1 0 0 1100011000 3 1 0 1 0 0 0 1 1 0010000010 4 1 1 1 1 0 0 1 0 1001100010 From G, get s 5 2 1 1 1 0 0 0 2 0110000001 6 1 0 0 1 0 1 0 1 0000100101 7 2 1 0 2 0 2 0 0 0100010100 b 3 c Total spare 11 Flows 12345678910 1 src dst 10000 0 01 ab a 4 6 5 10100 0 02 ac r = 2 0 1 00 0 0 13 ad G2 2 0 1 00 0 0 04 ae 0000000 Working path and link 0 0 1 0 0 0 0 5 bc 1010000 e 7 d incident matrix 0 0 1 0 1 0 0 6 bd 0000000 P 00010 0 07 be 0000000 00001 0 08 cd 1010000 00000 1 09 ce 0000000 For r =2, find 00000 0 110 de 1010000 T G2=qr pr Approximation algorithm • Decomposition – multi-commodity flow Æ multiple single flows • Using shortest path algorithm for each flow to – route link-disjointed backup paths – using spare provision matrix G to calculate link cost – incremental spare reservation vr ; •Flows successively update their backup paths Æ termed: successive survivable routing (SSR) 6
  7. Link cost and local objective • Goal: Each flow seeks a new backup path with minimal additional reservation • Additional reservation as link cost: + T + + –Let G = (e-pr) pr , and s = max (G +G) •(e - pr) assumes that a backup path uses all possible links • s+ is a temporary spare capacity reservation vector + – Additional spare reservation vr = s - s • vr tells how much additional spare capacity needed if a link is used on a new backup path • Run shortest path with link cost vr Example of link cost Link i1234567 Total working Spare path and link wi =sum P(*,i )2 2 3 1 2 1 2 13 incident matrix Assume backup path are QT using all possible links s=max(G) G=QT.P + qr = (e-pr) 1 2 0 2 1 1 1 0 1 0011011000 0 2 2 2 0 2 1 1 0 0 1100011000 1 3 1 0 1 0 0 0 1 1 0010000010 1 Find the contribution 4 1 1 1 1 0 0 1 0 1001100010 1 G+ = (e-p )T p 5 2 1 1 1 0 0 0 2 0110000001 1 r r 6 1 0 0 1 0 1 0 1 0000100101 1 + 7 2 1 0 2 0 2 0 0 0100010100 1 Find vr=s – s, Total spare 11 where s+ = max(G+G+) Flows 1 2 34567891011 src dst m 1000 0 0 01 ab1 Find qr, using shortest path 1010 0 0 02 ac1 r = 11 routing with link metric vr 0100 0 0 13 ad1 G+ v11 q11 0100 0 0 0 ae1 0000000 0 4 ∞ 0 ∞ b 3 c Working path and link 0 0 1 0 0 0 0 5 bc1 1 000000 1 1 1 incident matrix 0 0 1 0 1 0 0 6 bd1 1000000 0 1 1 0 0 P 0001 0 0 0 be1 1 000000 7 1 0 a 4 6 5 0000 1 0 08 cd1 1000000 0 0 1 0000 0 1 09 ce1 1000000 0 1 2 0000 0 0 110 de1 1000000 0 0 0 0 0 0 1 0 0 0 12 b e 1 e 7 d 1 0 0 0 0 0 0 11 a b 1 7
  8. SSR flowchart of flow r • On source node of flow r: 1. Given pr and dr – pr , qr : working and backup path vectors 2. Periodically update G – dr : destination node – G, s: spare provision matrix 3. Calculate v r and spare reservation vector – v : incremental spare 4. Update q using v r r r reservations as link cost 5. Update s, and G • Stop after no backup path update on the network Complexity • Polynomial running time – shortest path algorithm for each flow, O(N2) – Limited backup path update iterations for each flow • Polynomial space complexity – Advertised information in O(L2) –No per-flow based information 8
  9. Numerical comparison • Compare different algorithms and bounds – RAFT: Resource aggregation fault tolerance – SPI: Sharing with partial information – SR: Survivable routing (SSR without iteration) – SSR : Successive survivable routing – SA: Simulated annealing – BB: Branch and bound on a path-flow model – optimal – LP: Linear programming lower bound • Metrics: – % Redundancy = spare capacity/working capacity, – execution time Experiment networks 12 34 1 2 9 7 1 13 10 13 9 7 17 4 2 3 4 5 2 12 11 9 4 2 3 6 6 12 3 6 12 10 4 5 8 7 8 3 10 4 5 16 6 11 8 4 7 14 9 10 11 8 5 15 56 781 2 4 9 14 17 8 13 26 46 13 2 3 4 5 6 7 3 8 8 6 5 1 3 12 16 2 1 21 20 47 18 45 17 16 11 16 7 13 20 22 21 38 5 9 6 25 5 23 15 48 8 10 15 17 22 9 14 1 24 14 18 39 15 18 19 23 10 19 49 50 3 17 19 27 2 12 6 24 9 44 7 12 11 4 25 11 20 23 28 40 4 16 43 18 7 10 26 15 14 13 12 11 10 22 29 41 32 42 36 30 21 33 34 35 31 37 Network node degree ranges from 2.31 to 4.4 Consider balanced mesh load case 9
  10. Redundancy versus Time on Network 3 75 • SSR, SR, have 64 Network 3 70 Fast response Worse solutions, random cases with fast different flow orders 65 • Range of solutions 60 RAFT Near optimal 55 SPI • Time is the sum of solutions, fast 50 Better solutions, time to compute all (%) Redundancy slow, not scalable 64 cases 45 40 SR SSR SA BB 35 LP Infeasible 30 -2 0 2 4 10 10 10 10 Time (second) Typical SSR results 100 LP BB SA SSR SR SPI RAFT 90 80 70 60 50 Redundancy 40 30 20 10 0 12345678 Networks 10
  11. Multilayer Networks • Backbone networks have multiple technology layers • Converging toward IP/MPLS/WDM • Typically have survivability at each layer • Multiple Layers present several survivability challenges • Coordination of recovery actions at different layers – Which layer is responsible for fault recovery? • Spare Capacity Allocation (SCA) – How to prevent over allocation, when each layer provides spare resources? • Failure Propagation – Lower layer failure can affect multiple higher layer links! 3 1 MPLS connections 5 WDM Physical Path 2 3 1 45 Model Arbitrary Failures • To model failure propagation need to model multiple link failures th • F failure matrix, fik =1 Î link k fails in the i failure scenario • U flow failure incidence matrix - how flows affect by ith failure scenario U = P ☼ FT , ☼ binary multiply, capture logical relations • T flow tabu-link matrix of links avoid in determining backup T = U ☼ F ⎡1 0 ⎤ ⎡1000000 ⎤ ⎢0 0 ⎥ Scenario 1 b 3 c ⎡1000000 ⎤ ⎢ ⎥ ⎡10 ⎤ ⎢ ⎥ ⎢0 0 ⎥ 1 0100001 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 0100001 ⎢ ⎥ 01 a P = U = ⎢ ⎥ 0 0 = ⎢ ⎥ 4 6 5 ⎢0010100 ⎥ ⎢0010100 ⎥ ⎢ ⎥ ⎢01 ⎥ 2 ⎢ ⎥ ⎢0 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎣0100010 ⎦ ⎢ ⎥ ⎣00 ⎦ e 7 d ⎣0100010 ⎦ ⎢0 0 ⎥ ⎢ ⎥ Scenario 2 ⎣0 1 ⎦ ⎡1000000 ⎤ ⎡10 ⎤ ⎡1000000 ⎤ F = ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣0000101⎦ ⎢01⎥⎡1000000 ⎤ ⎢0000101 ⎥ T = ⎢ ⎥ = ⎢01⎥⎣0000101⎦ ⎢0000101 ⎥ ⎢ ⎥ ⎢ ⎥ ⎣00⎦ ⎣0000000 ⎦ 11
  12. SCA model for arbitrary failures min S = eT s Total spare capacity Q,s s.t. s ≥ G Enough spare capacity on each link G = QT M U Calculation of spare provision matrix T + Q 1 ≤ Failure-disjointed backup paths QBT = D (mod 2) Q is a binary matrix Flow conservation of backup U = P ☼ FT Integer programming T = U ☼ F Path failure incident matrix Decision variable: Q, s Path tabu-link incident matrix Given: M – traffic demand matrix P – working path link incidence matrix B and D – node-link & flow-node incidence matrices Solve SCA model using Branch and Bound algorithm – NP hard Heuristic Solution Algorithm • Successive survivable routing algorithm* – Decompose multi-commodity flow Æ multiple single flows – Goal: Each flow seeks a backup path with minimal additional spare capacity – Using shortest path algorithm for each flow to • route link-disjointed backup paths • using spare provision matrix G to calculate link cost – incremental spare reservation vr ; •Flows successively update their backup paths Æ termed: successive survivable routing (SSR) • Randomly order flows for successively updating. • Fast computation find near optimal solution • *Apparatus and Method for Spare Capacity Allocation, Y. Liu and D. Tipper , U.S. Patent 6,744,727 B2, June 1, 2004 • Presented in part (IEEE Infocom 2001, IEEE Trans. On Networking Feb.,2005) 12
  13. Why Optical Layer Protection? • Backbone networks have multiple technology layers • Converging toward IP/MPLS/WDM • Typically have survivability at each layer • Optical layer provides lightpath services to its client layers (e.g., SONET, IP, ATM) • Protection mechanisms exist in the client layers, so why need protection in optical layer? – IP and ATM networks don’t have extensive protection functions as SONET – Capacity efficient due to protection capacity sharing across multiple pairs of client layer equipments – Significant savings in equipment cost – Handle fiber cuts more efficiently than the client layers – Provide an additional degree of resilience (e.g., protect against multiple failures) – Can use mesh-based protection schemes that require significantly less protection capacity than ring-based schemes Limitations of Optical Layer Protection • Can’t handle all failures: client equipment failures must be dealt with by the client layer • Protect traffic in units of lightpaths: can’t protect only part of the traffic within a lightpath • Need pay careful attention to interworking of protection schemes between different layers 13
  14. Where are Optical Networks Used? • Access networks: – Fat pipes for transporting traffic between multiple end users and POPs • Metro networks: – Fat pipes for interconnecting multiple access networks, and providing access to backbone networks • Long haul backbone networks: – Fat pipes for transporting aggregates of traffic in the backbone • Grid networks (e-science): Traffic – Fat pipes for transferring large files Aggregation • Storage networks: – Fat pipes for transferring large files Optical Network Evolution Interconnected rings OXC and mesh topologies Evolution of the OXC Optical Network OXC OADM Architecture OADM OXC OADM OADM OXC WDM rings OADM OADM OADM OADM OADM OADM OADM OADM MSPP WDM linear add/drop OADM OADM WDM point-to-point SONET/SDH ADM 1995 2000 2005 14
  15. Optical Layer Protection Schemes • Optical channel (OCh) layer (or path layer) protection schemes – Restore one lightpath at a time – Need demultiplex all wavelengths • Optical multiplex section (OMS) layer (or line layer) protection schemes – Restore the entire group of lightpaths on a link – Require less equipment • OLTs and OADMs can provide both OCh and OMS layer protection in linear or ring configurations • OXCs can provide OCh layer protection in linear, ring, and mesh configurations • Backbone networks: use unprotected WDM point-to-point systems and rely on OXCs to perform the protection functions • Metropolitan networks: use WDM line terminals and OADMs to perform protection functions Optical Layer Protection Schemes • OMS layer protection schemes –1+1 –1:1 – OMS-Dedicated Protection Ring (DPR) –OMS-SPRing • OCh layer protection schemes –1+1 –OCh-SPRing –OCh-Mesh 15
  16. 1:1 OMS Protection • Shared protection in point-to-point links • The composite WDM signal is transmitted over only the working fiber – Use a switch at the transmitter, instead of a splitter • If the working fiber is cut, both ends switch over to the protection fiber – Need an APS protocol • Support low priority traffic on the protection fiber • Allow N working systems to share a single protection system OMS-DPRing • Dedicated protection ring • Two fibers operate in opposite directions • Each node transmits on both directions of the ring – Different nodes must transmit at different wavelengths • Normal operation: the ring functions as a bus, with one pair of amplifiers turned off and all the others turned on • When a link fails: an amplifier pair next to the failed link are turned off and the ones that were originally inactive are turned on • Equivalent to a Sonet USHR 16
  17. OMS-SPRing • Shared protection ring • Four fibers, analogous to a SONET BLSR/4 (BSHR) • The two protection fibers do not have attached WDM equipment • Use either span switch or ring switch • Two-fiber version – Dedicate half the wavelengths on each fiber for protection purposes – Make the protection wavelengths on one fiber correspond to the working wavelengths on the other fiberÎsignals can be rerouted w/o requiring wavelength conversion 1+1 OCh Dedicated Protection • Works in point-to-point, ring (OCh-DPRing), and mesh configurations • Two lightpaths on disjoint routes are setup for each client connection • The client signal is split at the input, the destination selects the better of the two lightpaths • No signaling requiredÎfast restoration • No protection bandwidth sharingÎbandwidth inefficient 17
  18. OCh-SPRing • Shared protection ring • Similar to SONET BLSR/4, but operate at the optical channel layer • Working lightpaths are set up on the shortest path along the ring • When a working lightpath fails, it’s restored using span switch or ring switch • Non-overlapping lightpaths in the ring can share a single wavelength around the ring for protectionÎmore efficient than OCh-DPRing OCh-Mesh Protection • Backbone networks are meshed • For mesh networks, OCh-mesh protection schemes are more bandwidth-efficient than rings – Efficiency improvements range from 20% to 60% 18
  19. Path-Based v.s Link-Based Protection • Path-based: the connection is rerouted end to end on an alternate path – Need notify the source node upon a failure • Link-based: the connection is rerouted on an alternate path around the failed link – Need not notify the source node upon a failure – Enable faster restoration than path-based schemes Offline v.s Online Protection • Offline protection – Protection path and wavelengths are reserved at the time of connection setup • In path-based scheme, a link-disjoint protection path is reserved • In link-based scheme, protection paths are reserved around each link of the working lightpath – Fast and guaranteed restoration • Online protection – Search for protection paths using the spare capacity in the network upon a failure – Capacity efficient – Slow and no guarantee of restoration 19
  20. Dedicated v.s Shared Protection • An offline scheme can use either dedicated protection or shared protection • Dedicated protection: each working lightpath is assigned its own dedicated protection bandwidth • Shared protection: if two working lightpaths are link-disjoint, they can share protection bandwidth – More capacity efficient than dedicated protection – Protection lightpaths are set up after a failure occurs Classification of OCh-Mesh Protection Schemes OCh-mesh protection schemes offline online Dedicated Shared Link-based Path-based Link-basedPath-based Link-based Path-based 20
  21. Internetworking between Layers • Need coordination between protection mechanisms in different layers • Bottom-up sequential approach: start at the layer where the failure occurs, let the layer try to restore service first, then let the higher layer try – Option 1: have the restoration in the lower layer happen so quickly that the upper layer doesn’t detect the failure – Option 2: impose an additional hold-off time in the higher layer before it attempts restoration Challenges in All Optical Networks • Dense wavelength division multiplexing (DWDM) Provide an enormous bandwidth by allowing simultaneous transmission of data on multiple wavelength in one fiber – Concern: a large gap between the capacity of one wavelength channel and the bandwidth requirement for a connection request – Solution: traffic grooming – Concern: a fiber cut or link failure can result in a loss of a large volume of data. – Solution: mesh restoration • Wavelength continuity constraint A signal must stay on the same wavelength from its source to its destination. Wavelength conversion is too expensive. – Concern: channel network resource utilization is reduced. – Solution: Wavelength retuning 21
  22. The Traffic Grooming Problem • Number of wavelengths per fiber = 4 -100+ • Per wavelength capacity = 2.5 Gbps to10 Gbps • Bandwidth requirements of most applications << 2.5 Gbps ∴ Group several sessions on the same wavelength channel in order to better utilize the available bandwidth Traffic Grooming: The intelligent allocation of traffic demands onto an available set of wavelengths in a way that reduces the overall cost of the network. The Traffic Grooming Problem: CapEx • Dominant cost factor: Electronic layer multiplexing; number of electronic layer Line Terminating Equipment (LTs): – SONET/SDH ADMs 3-4 times as expensive as OXC – IP/MPLS router ports ports • Solution: Assign the traffic such that minimum number of LTs is used 22
  23. Traffic Grooming Problems • Network design problem: dimensioning and network provisioning – Reduce capital and operational expenditure – Maximize revenue NP-Complete Problem • Solution types: – Exact solutions (based on ILP or MILP) – Heuristic and approximate solutions – Bounds Traffic Grooming for Ring Networks: Heuristics Paper Topology Traffic Result Simmons et al ‘98 BLSR Uniform all-to-all Heuristic Chiu and Modiano ’00 BLSR Uniform all-to-all Heuristic Cho et al. ‘01 Hubbed, and Uniform all-to-all SA single hop Li et al ‘00 BLSR, single Arbitrary 10/9 approximation hub Wan et al. ‘00 BLSR Arbitrary Heuristic Zhang and Qiao ‘00 UPSR/BLSR Arbitrary Heuristic Xu et al. UPSR Arbitrary GA Wang et al. ‘01 BLSR Arbitrary SA Mustafa & Kamal ’03 UPSR/BLSR Arbitrary Heuristic 23