Lecture: Computer Architecture - Chapter 1: Technology & Performance evaluation

PDF · 34 pages · Gia Huy · 16/05/2022 · 6090

  1. COMPUTER ARCHITECTURE • Chapter 1: Technology & Performance evaluation • Computer Engineering – CSE – HCMUT
  2. TECHNOLOGY REVIEW
  3. The computer revolution • The third revolution along with agriculture and industry • Progress in computer technology – Underpinned by Moore’s Law • Makes novel applications feasible – Computers in automobiles – Cell phones – Human genome project – World Wide Web – Search Engines • Computers are pervasive Computer Architecture (c) Cuong Pham-Quoc/HCMUT 3
  4. Moore’s Law • Gordon Moore, Intel co-founder • The number of transistors integrated in a chip has doubled every 18-24 months (1975)
  5. Intel processors • As of Q2/2020 – Comet Lake (10th generation) – 14 nm technology
  6. History • The first computer in the world
  7. History • ENIAC: Electronic Numerical Integrator and Computer • Facts about ENIAC: – 30+ tons – 1,500+ square feet (140 square meters) – 18,000+ vacuum tubes – 140+ kW power – 5,000+ additions per second
  8. A Brief History of Computers • The first generation (1946-1955): vacuum tubes • The second generation (1955-1965): transistors • The third generation (1965-1980): integrated circuits • The current generation (1980-present): personal computers • What’s next? – Quantum computers? – Memristors?
  9. Classes of Computers • Personal computers – General purpose, variety of software – Subject to cost/performance tradeoff • Server computers – Network based – High capacity, performance, reliability – Range from small servers to building-sized installations • Supercomputers – High-end scientific and engineering calculations – Highest capability, but a small fraction of the overall computer market • Embedded computers – Hidden as components of systems – Stringent power/performance/cost constraints
  10. Post-PC era • [Figure: number of devices shipped (millions); source: statista.com]
  11. Modern computer components • Same components for all kinds of computers • Components – Processor: datapath, controller – Memory: main memory, cache – Input/Output: user interface, network, storage
  12. Below your program • Application software – Written in high-level language (HLL) • System software – Compiler: translates HLL code to machine code – Operating System: service code (handling input/output; managing memory and storage; scheduling tasks and sharing resources) • Hardware – Processor, memory, I/O controllers
  13. Levels of Program Code • High-level language – Level of abstraction closer to problem domain – Provides for productivity and portability • Assembly language – Textual representation of instructions • Hardware representation – Binary digits (bits) – Encoded instructions and data
  14. Technology trends • Thanks to electronics and material technologies – Increased capacity and performance – Reduced cost • [Figure: DRAM capacity]
      Year   Technology                   Relative performance/cost
      1951   Vacuum tube                  1
      1965   Transistor                   35
      1975   Integrated circuit (IC)      900
      1995   Very large scale IC (VLSI)   2,400,000
      2013   Ultra large scale IC         250,000,000,000
  15. PERFORMANCE EVALUATION
  16. Defining performance • Which airplane has the best performance? • [Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
  17. Response Time and Throughput • Response time – How long it takes to do a task • Throughput – Total work done per unit time, e.g., tasks or transactions per hour • How are response time and throughput affected by – Replacing the processor with a faster version? – Adding more processors? • We’ll focus on response time for now
  18. Relative performance
      $\text{Performance} = \dfrac{1}{\text{Execution time}}$
      • Computer X is n times faster than Computer Y:
      $\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
      • Example: time taken to run a program – 10 s on A and 15 s on B – A is 1.5× faster than B because
      $\dfrac{\text{Execution time}_B}{\text{Execution time}_A} = \dfrac{15\,\text{s}}{10\,\text{s}} = 1.5$
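The relative-performance ratio is simple enough to check in a few lines. The following is a minimal Python sketch; the function name `speedup` is an illustrative choice, not something from the slides.

```python
def speedup(time_slower: float, time_faster: float) -> float:
    """Return n such that the faster machine is n times faster.

    Performance is the reciprocal of execution time, so the performance
    ratio equals the inverse ratio of the execution times.
    """
    return time_slower / time_faster


# Slide example: the program takes 10 s on A and 15 s on B.
n = speedup(time_slower=15.0, time_faster=10.0)
print(f"A is {n:.1f}x faster than B")  # prints "A is 1.5x faster than B"
```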
  19. Measuring time • Elapsed time – Total response time, including all aspects (processing, I/O, OS overhead, idle time) – Determines system performance • CPU time – Time spent processing a given job (discounts I/O time and other jobs’ shares) – Comprises user CPU time and system CPU time – Different programs are affected differently by CPU and system performance
  20. Measuring CPU time • Operations of digital hardware (including the CPU/processor) are governed by a constant-rate clock • [Figure: clock timing diagram showing the clock period; each cycle performs data transfer and computation, then updates state] • Clock period (T): duration of a clock cycle – s, ms, μs, ns • Clock rate/frequency ($F = 1/T$): the number of cycles per second – Hz, kHz, MHz, GHz
  21. CPU time • Performance is improved by – Reducing the number of clock cycles – Increasing the clock rate • Hardware designers must often trade off clock rate against cycle count
      $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \dfrac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$
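  22. CPU time example • Computer A: 2 GHz clock, 10 s CPU time • Designing Computer B – Aim for 6 s CPU time – Can use a faster clock, but this causes 1.2× as many clock cycles • How fast must Computer B’s clock be?
      $\text{CPU Time}_A = \dfrac{\text{CPU Clock Cycles}_A}{\text{Clock Rate}_A} = \dfrac{\text{CPU Clock Cycles}_A}{2.0\,\text{GHz}} = 10\,\text{s}$
      $\text{CPU Time}_B = \dfrac{\text{CPU Clock Cycles}_B}{\text{Clock Rate}_B} = \dfrac{1.2 \times \text{CPU Clock Cycles}_A}{\text{Clock Rate}_B} = 6\,\text{s}$
      $\Rightarrow \text{CPU Clock Cycles}_A = 10\,\text{s} \times 2.0\,\text{GHz} = 20 \times 10^9$, so $\text{Clock Rate}_B = \dfrac{1.2 \times 20 \times 10^9}{6\,\text{s}} = 4.0\,\text{GHz}$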
  23. Instruction count & CPI
      $\text{Clock Cycles} = \text{Instruction count} \times \text{Cycles per Instruction}$
      $\text{CPU Time} = \text{Instruction count} \times \text{Cycles per Instruction} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction count} \times \text{Cycles per Instruction}}{\text{Clock Rate}} = \dfrac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}$
      • Instruction Count (IC) for a program – Determined by program, ISA and compiler • Average cycles per instruction (CPI) – Determined by CPU hardware – If different instructions have different CPI, the average CPI is affected by the instruction mix
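  24. Example • Which is faster, and by how much? – Computer A: Cycle Time = 250 ps, CPI = 2.0 – Computer B: Cycle Time = 500 ps, CPI = 1.2 – Same ISA and compiler, so both execute the same instruction count IC
      $\text{CPU Time}_A = \text{IC} \times \text{CPI}_A \times \text{Cycle Time}_A = \text{IC} \times 2.0 \times 250\,\text{ps} = \text{IC} \times 500\,\text{ps}$
      $\text{CPU Time}_B = \text{IC} \times \text{CPI}_B \times \text{Cycle Time}_B = \text{IC} \times 1.2 \times 500\,\text{ps} = \text{IC} \times 600\,\text{ps}$
      $\Rightarrow \dfrac{\text{CPU Time}_B}{\text{CPU Time}_A} = \dfrac{\text{IC} \times 600\,\text{ps}}{\text{IC} \times 500\,\text{ps}} = 1.2$, so A is 1.2× faster than B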
  25. Mixed instructions CPI • CPI for instructions/operations may vary – e.g., multiplication takes more cycles than addition • A more precise count of CPU clock cycles takes instruction types into account:
      $\text{Clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction count}_i)$
      • Weighted average CPI:
      $\text{CPI} = \dfrac{\text{Clock cycles}}{\text{Instruction count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \dfrac{\text{Instruction count}_i}{\text{Instruction count}}\right)$
  26. Example • Question: two implementations of an application use instructions in classes A, B, and C as follows. Which one is better? – Implementation 1 uses 2 A, 1 B, and 2 C – Implementation 2 uses 4 A, 1 B, and 1 C – CPIs for A, B, and C are 1, 2, and 3, respectively • Answer:
      – Implementation 1: $\text{Clock cycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$; IC = 5, CPI = 2.0
      – Implementation 2: $\text{Clock cycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$; IC = 6, CPI = 1.5
      – Implementation 2 needs fewer clock cycles, so it is faster at the same clock rate, even though it executes more instructions
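The per-class cycle count and weighted average CPI from this example can be computed mechanically. The sketch below is one possible Python formulation; the helper names `total_cycles` and `average_cpi` are assumptions made for illustration.

```python
def total_cycles(counts: dict, cpi: dict) -> float:
    """Clock cycles = sum over classes of (CPI_i x instruction count_i)."""
    return sum(counts[cls] * cpi[cls] for cls in counts)


def average_cpi(counts: dict, cpi: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts, cpi) / sum(counts.values())


# CPI per instruction class, from the slide
cpi = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two implementations, from the slide
impl1 = {"A": 2, "B": 1, "C": 2}
impl2 = {"A": 4, "B": 1, "C": 1}

for name, counts in (("Implementation 1", impl1), ("Implementation 2", impl2)):
    print(name,
          "cycles =", total_cycles(counts, cpi),
          "IC =", sum(counts.values()),
          "CPI =", average_cpi(counts, cpi))
```

Comparing total cycles (not IC or CPI alone) is what decides which implementation is faster when both run at the same clock rate.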
  27. Exercise • A program is executed on a 2 GHz CPU. The program consists of 1000 instructions, of which: – 30% are load/store instructions, CPI = 2.5 – 10% are jump instructions, CPI = 1 – 20% are branch instructions, CPI = 1.5 – The rest are arithmetic instructions, CPI = 2.0 a) What is the execution time (CPU time) of the program? b) What is the weighted average CPI of the program? c) If load/store instructions are improved so that their execution time is reduced by a factor of 2, what is the speed-up of the system?
  28. Performance summary • The BIG picture (take-home message):
      $\text{CPU time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$
      • Performance depends on – Algorithm: IC, possibly CPI – Programming language: IC, CPI – Compiler: IC, CPI – Instruction set architecture: IC, CPI, T
  29. Power trends • In CMOS technology:
      $\text{Power}\ (P) = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Clock rate}$
      • [Figure: clock frequency (MHz) and power (W) of Intel processors from the 80286 (1982) to the Core i5 Skylake (2015)]
  30. Reducing power • Suppose a new CPU has – 85% of the capacitive load of the old CPU – 15% voltage and 15% frequency reduction
      $\dfrac{P_{new}}{P_{old}} = \dfrac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$
      • The power wall – We can’t reduce voltage further – We can’t remove more heat • How else can we improve performance?
  31. Multiprocessors
  32. Benchmark • Programs used to measure performance – Supposedly typical of actual workload • Standard Performance Evaluation Corp (SPEC) – Develops benchmarks for CPU, I/O, Web, ... • SPEC CPU2006 – Elapsed time to execute a selection of programs (negligible I/O, so it focuses on CPU performance) – Normalize relative to a reference machine – Summarize as the geometric mean of performance ratios • CINT2006 (integer) and CFP2006 (floating-point)
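The "normalize, then take the geometric mean" step can be sketched as follows. This Python example uses made-up reference and measured times, not real SPEC CPU2006 data, and the helper names are illustrative assumptions.

```python
import math


def performance_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio = reference machine time / measured time (higher is better)."""
    return reference_time_s / measured_time_s


def geometric_mean(values: list) -> float:
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference time, measured time) pairs in seconds, one per program
timings = [(1000.0, 100.0), (2000.0, 100.0), (4000.0, 100.0)]

ratios = [performance_ratio(ref, measured) for ref, measured in timings]
summary = geometric_mean(ratios)

print("Ratios:", ratios)
print(f"Geometric mean: {summary:.2f}")
```

The geometric mean is used because it gives the same relative ranking of machines regardless of which machine is chosen as the reference, which an arithmetic mean of ratios does not.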
  33. Example • Intel Core i7 920 results with CINT2006
  34. Concluding remarks • Cost/performance is improving – Due to underlying technology development • Hierarchical layers of abstraction – In both hardware and software • Instruction set architecture – The hardware/software interface • Execution time: the best performance measure • Power is a limiting factor – Use parallelism to improve performance