1. | EXECUTIVE SUMMARY |
1.1. | What is AI? |
1.2. | What are AI chips for data center and cloud? |
1.3. | AI chips must improve as performance of AI models outstrips Moore's Law |
1.4. | Large AI models require scaling across more AI chips
1.5. | Market dynamics and strategic shifts in AI hardware |
1.6. | Layers to designing an AI chip |
1.7. | Types of AI Chips |
1.8. | Technology readiness of AI chip technologies |
1.9. | AI chip technologies benchmarked |
1.10. | AI chip landscape - Chip designers |
1.11. | Graphics Processing Units (GPUs) |
1.12. | Trends in high-performance data center GPUs |
1.13. | ASICs used by major cloud service providers for accelerating AI workloads |
1.14. | Trends in GPU alternatives for AI data center |
1.15. | AI chip key workloads for training and inference |
1.16. | Hardware demands for training and inference |
1.17. | Inference benchmarks show real-time performance of top GPUs
1.18. | Performance of common AI chips: FP16/BF16 precisions |
1.19. | Trends in advanced process nodes and energy efficiency in the last decade |
1.20. | Key players: AI chip supply chain |
1.21. | Government industrial policy and funding for semiconductor industry |
1.22. | US sanctions on AI chips to China
1.23. | Market size forecast of AI chips: 2025-2035 |
1.24. | Annotated market size forecast of GPUs: 2025-2035 |
1.25. | Drivers and challenges for AI chip adoption |
1.26. | Access More With an IDTechEx Subscription |
2. | INTRODUCTION TO AI MODELS AND AI CHIPS |
2.1.1. | What is AI? |
2.1.2. | What is an AI chip? |
2.1.3. | AI acceleration |
2.1.4. | Types of AI chip product categories |
2.1.5. | Overview of major AI chip markets |
2.1.6. | Cloud and data center computing |
2.1.7. | Users, procurement and partnerships of cloud and data center compute |
2.1.8. | Cloud AI |
2.1.9. | Enterprise core |
2.1.10. | Telecom edge |
2.1.11. | Edge vs Cloud characteristics |
2.1.12. | Key players: AI chip supply chain |
2.2. | Fundamentals of AI |
2.2.1. | Fundamentals of AI: Algorithms, Data, and Hardware |
2.2.2. | Training and inference |
2.2.3. | AI chips use low-precision computing |
2.2.4. | Common number representations in AI chips |
2.2.5. | Parallel computing: Data parallelism and model parallelism |
2.2.6. | Deep learning: how an AI algorithm is implemented |
2.2.7. | Neural networks explained |
2.2.8. | Types of Neural Networks |
2.2.9. | Types of neural networks and use cases |
2.3. | Large AI Models |
2.3.1. | Notable AI models increasing performance at a rate of 4.5x a year since 2010 |
2.3.2. | Transformers used for LLMs replace RNNs for natural language processing
2.3.3. | Language, computer vision, and multimodal AI models are the most popular |
2.3.4. | Reasons for AI performance outpacing Moore's Law |
2.3.5. | Key drivers for continued growth of AI models |
2.3.6. | Scale-up and scale-out systems |
2.3.7. | Training AI models is very energy intensive |
2.3.8. | Hardware design and energy inefficiencies of compute |
2.3.9. | MLPerf Power: Power ranges for various AI chip types and applications |
3. | TECHNOLOGY OVERVIEW |
3.1. | AI Chips Hardware Design Overview |
3.1.1. | History of computer hardware |
3.1.2. | Progression of AI hardware |
3.1.3. | Trends in AI chips to expect |
3.1.4. | Layers to designing an AI chip |
3.2. | Instruction Set Architectures |
3.2.1. | Introduction to Instruction Set Architectures (ISAs) for AI workloads |
3.2.2. | CISC and RISC ISAs for AI accelerators |
3.3. | Programming Models and Execution Models |
3.3.1. | Programming model vs execution model |
3.3.2. | Flynn's taxonomy and programming models |
3.3.3. | Important execution models and programming models for AI chips |
3.3.4. | Introduction to Von Neumann Architecture |
3.3.5. | Von Neumann compared with common programming models |
3.4. | Hardware Architectures |
3.4.1. | ASICs, FPGAs, and GPUs used for neural network architectures |
3.4.2. | Benchmarking capabilities of AI chips |
3.4.3. | Types of AI Chips |
3.4.4. | TRL of AI chip technologies |
3.4.5. | Pros and cons of commercial AI chips |
3.4.6. | Pros and cons of emerging AI chips |
3.4.7. | Technologies found in general-purpose processors |
3.4.8. | Special-purpose resources |
3.4.9. | Accelerator taxonomy |
3.5. | Transistors |
3.5.1. | How transistors operate: p-n junctions |
3.5.2. | Moore's law |
3.5.3. | Gate length reductions pose challenges to planar FETs below 20nm |
3.5.4. | Increasing Transistor Count |
3.5.5. | Planar FET to FinFET |
3.5.6. | GAAFET, MBCFET, RibbonFET |
3.5.7. | TSMC's leading-edge nodes roadmap |
3.5.8. | Intel Foundry's leading-edge nodes roadmap |
3.5.9. | Samsung Foundry's leading-edge nodes roadmap |
3.5.10. | CFETs to be used beyond GAAFET scaling |
3.5.11. | Device architecture roadmap (I) |
3.5.12. | Scaling technology roadmap overview |
3.6. | Advanced Semiconductor Packaging |
3.6.1. | Progression from 1D to 3D semiconductor packaging |
3.6.2. | Key metrics for advanced semiconductor packaging performance |
3.6.3. | Overview of interconnection techniques in semiconductor packaging
3.6.4. | Overview of 2.5D packaging structure |
3.6.5. | 2.5D advanced semiconductor packaging technology portfolio |
3.6.6. | 2.5D advanced semiconductor packaging used in top AI chips |
3.6.7. | Overcoming die size limitations |
3.6.8. | Integrated heterogeneous systems |
3.6.9. | Case study: AMD MI300A CPU/GPU heterogeneous integration
3.6.10. | Future system-in-package architecture |
3.6.11. | For more information on advanced semiconductor packaging |
4. | AI-CAPABLE CENTRAL PROCESSING UNITS (CPUS) |
4.1. | Technology Overview of CPUs |
4.1.1. | CPU introduction |
4.1.2. | Core architecture of an HPC and AI CPU
4.1.3. | Key CPU requirements for HPC and AI workloads (1) |
4.1.4. | Key CPU requirements for HPC and AI workloads (2)
4.1.5. | AVX-512 vector extensions for the x86-64 instruction set
4.2. | Intel CPUs |
4.2.1. | Intel: Xeon CPUs for data center |
4.2.2. | Intel: Advanced Matrix Extensions in CPUs for built-in AI acceleration |
4.2.3. | Intel: 4th Gen Xeon Scalable Processor performance with AMX |
4.3. | AMD CPUs |
4.3.1. | AMD: EPYC CPUs for data center |
4.4. | IBM CPUs |
4.4.1. | IBM: Power CPUs for data center |
4.5. | Arm CPUs |
4.5.1. | Arm licenses core designs with its RISC-based ISAs |
4.5.2. | Arm CPUs for data center |
4.5.3. | CPU outlook |
5. | GRAPHICS PROCESSING UNITS (GPUS) |
5.1. | Market Overview of GPUs |
5.1.1. | Types of AI GPUs |
5.1.2. | Historical background of GPUs |
5.1.3. | GPUs' popularity since the 2010s
5.1.4. | Data center GPU player landscape by region |
5.1.5. | Commercial activity of key US and Chinese data center GPU manufacturers |
5.1.6. | Drivers: Technology advancements and market opportunities |
5.1.7. | Drivers: Energy efficiency, performance, incentives, and brand strength |
5.1.8. | Barriers: Monopolization, competition, and product complexity |
5.1.9. | Barriers: R&D, competition from customers, market consolidation |
5.1.10. | How startups can compete with GPU market leaders
5.2. | GPU Technology Breakdown |
5.2.1. | Key architectural differences between CPUs and GPUs |
5.2.2. | Architecture breakdown of high-performance data center GPUs |
5.2.3. | Key features of data center GPUs
5.2.4. | NVIDIA and AMD data center GPUs benchmark |
5.2.5. | Consumer GPUs as cloud compute |
5.2.6. | Workstation / professional GPUs as cloud compute |
5.2.7. | Pricing of GPUs by type |
5.2.8. | Form factor options for GPUs |
5.2.9. | Pricing of data center GPU form factors |
5.2.10. | Threads show how latency and throughput are handled by GPUs and CPUs
5.2.11. | NVIDIA and AMD software |
5.2.12. | Trends in high-performance data center GPUs (I)
5.2.13. | Trends in high-performance data center GPUs (II)
5.3. | NVIDIA GPUs |
5.3.1. | NVIDIA: Tensor mathematics |
5.3.2. | NVIDIA: Tensor cores |
5.3.3. | NVIDIA: NVIDIA CUDA and tensor cores |
5.3.4. | NVIDIA: Data center GPU product timeline |
5.3.5. | NVIDIA: Ampere GPUs |
5.3.6. | NVIDIA: Hopper GPUs |
5.3.7. | NVIDIA: Blackwell GPUs (I) |
5.3.8. | NVIDIA: Blackwell GPUs (II)
5.3.9. | NVIDIA: Rack-scale solutions |
5.4. | AMD GPUs |
5.4.1. | AMD: CDNA 3 Architecture and Compute Units for GPU Compute |
5.4.2. | AMD: MI325X GPU |
5.4.3. | AMD: Instinct GPU and competitive positioning |
5.4.4. | AMD: MI300A CPU/GPU memory coherency with heterogeneous integration
5.5. | Intel GPUs |
5.5.1. | Intel: Intel GPU Max and the Xe-HPC Architecture |
5.5.2. | Intel: Future ASIC and general-purpose GPU |
5.6. | Chinese GPUs |
5.6.1. | Biren Technologies: Chinese GPGPU |
5.6.2. | Biren Technologies: BR100 and BR104 Chinese GPGPUs
5.6.3. | Moore Threads: MTT S4000 Chinese GPU |
5.6.4. | MetaX: MXC500 Chinese GPGPU |
5.6.5. | Iluvatar CoreX: Tianyuan 100 and Zhikai 100 Chinese GPGPUs |
5.6.6. | GPU Outlook |
6. | CUSTOM AI ASICS FOR CLOUD SERVICE PROVIDERS (CSPS) |
6.1. | Market Overview of Custom AI ASICs for CSPs |
6.1.1. | Introduction to custom application-specific integrated circuits (ASICs) |
6.1.2. | AI ASICs based on application |
6.1.3. | Custom ASICs enter the market to compete with GPUs |
6.1.4. | Drivers for investment, and challenges for custom ASICs |
6.1.5. | CSP custom ASIC player landscape by region |
6.1.6. | ASICs used by major cloud service providers for accelerating AI workloads |
6.1.7. | AI ASIC companies' capabilities |
6.2. | Hardware Breakdown of Custom AI ASICs for CSPs |
6.2.1. | GPU and ASIC comparison |
6.2.2. | Cloud service provider ASICs have similar architectures, using systolic arrays |
6.2.3. | Systolic arrays in ASICs are an alternative to tensor cores in GPUs
6.2.4. | "Systolic array lock-in" |
6.3. | Key Players |
6.3.1. | Google TPU |
6.3.2. | Amazon: Trainium and Inferentia |
6.3.3. | Amazon: Trainium and Inferentia chip components and packaging |
6.3.4. | Microsoft: Maia |
6.3.5. | Meta: MTIA |
6.3.6. | Future US ASIC players |
6.3.7. | Chinese ASIC players and Chinese AI chips from cloud service providers |
6.3.8. | Outlook |
7. | OTHER AI CHIPS |
7.1.1. | Introduction to other architectures: Chapter Overview |
7.1.2. | Other AI chips player landscape by region |
7.2. | Heterogeneous Matrix-Based AI Accelerators
7.2.1. | Heterogeneous matrix-based AI accelerators
7.2.2. | Heterogeneous matrix-based AI accelerator architectures
7.2.3. | Habana: Gaudi |
7.2.4. | Intel: Gaudi2 |
7.2.5. | Intel: Greco |
7.2.6. | Intel: Gaudi3 |
7.2.7. | Cambricon Technologies: Siyuan 370 is China's tensor-based AI chip
7.2.8. | Huawei: Ascend 910 |
7.2.9. | Huawei: Da Vinci architecture |
7.2.10. | Baidu: Kunlun and XPU |
7.2.11. | Qualcomm: Cloud AI 100 |
7.2.12. | Qualcomm: AI core |
7.2.13. | Summary of key players |
7.3. | Spatial AI Accelerators |
7.3.1. | Spatial AI accelerators |
7.3.2. | Cerebras: Wafer-scale processors as a competitor to GPUs |
7.3.3. | Cerebras: WSE-3 |
7.3.4. | SambaNova: Reconfigurable dataflow processors as substitute to GPUs |
7.3.5. | SambaNova: SN40L Reconfigurable Dataflow Unit (RDU) |
7.3.6. | Graphcore: Second-generation Colossus™ MK2 IPU processor |
7.3.7. | Graphcore: Bow IPU and Pods |
7.3.8. | Groq: Natural language processor designed for AI inference |
7.3.9. | Groq: Performance and technology |
7.3.10. | Untether AI: SpeedAI240 uses at-memory computation |
7.3.11. | Key players summary (I) |
7.3.12. | Key players summary (II) |
7.4. | Coarse-Grained Reconfigurable Arrays (CGRAs) |
7.4.1. | CGRAs could be a future contender for mainstream compute fabrics |
7.4.2. | CGRA breakdown |
7.4.3. | Future outlook - the search for flexible architectures with high energy efficiency and performance |
7.4.4. | CGRAs vs dataflow vs manycore |
7.4.5. | Trends in GPU alternatives for AI data center |
7.4.6. | Trends in other AI chips |
8. | BENCHMARKS AND HARDWARE TRENDS |
8.1. | Benchmarking AI Chips |
8.1.1. | MLPerf by MLCommons for benchmarking AI chips |
8.1.2. | MLCommons benchmarks: Training and inference key workloads and models |
8.1.3. | AI chip capabilities (I) |
8.1.4. | AI chip capabilities (II) |
8.1.5. | Training benchmarking |
8.1.6. | Inference benchmarking |
8.1.7. | AI chip technologies benchmarked |
8.2. | Performance and Scalability |
8.2.1. | MLPerf Inference: Data Center: Tokens per second |
8.2.2. | MLPerf Training: Natural Language Processing performance |
8.2.3. | MLPerf Training: NVIDIA performance |
8.2.4. | MLPerf Training: Scalability of Google TPUs |
8.2.5. | NVIDIA and AMD data center GPU throughput with OpenCL benchmark |
8.2.6. | Neocloud giants: GPU inference performance and GPU scalability |
8.2.7. | Performance of common AI chips: FP16/BF16 precisions |
8.2.8. | Performance of common AI chips: Comparing different precisions |
8.3. | Energy Efficiency |
8.3.1. | Performance per watt for different AI chips |
8.3.2. | Trends in advanced process nodes and energy efficiency in the last decade |
8.4. | Memory and Memory Bandwidth |
8.4.1. | Key challenge: The memory wall |
8.4.2. | Illustrating the memory wall: Memory hierarchy latency bottleneck |
8.4.3. | Memory bandwidths of different chip types |
8.4.4. | High bandwidth memory (HBM) and comparison with other DRAM technologies |
8.4.5. | Evolution of HBM generations and transition to HBM4 |
8.4.6. | Benchmarking of HBM technologies in the market from key players (1) |
8.4.7. | Benchmarking of HBM technologies in the market from key players (2) |
8.4.8. | Memory bandwidth trends |
8.4.9. | Memory capacity trends |
8.5. | Considerations for Evaluating Performance of New AI Accelerators |
8.5.1. | Evaluating performance of AI accelerators |
8.5.2. | Performance of accelerators must be measured across various metrics |
8.5.3. | Latency must be optimized through various strategies |
8.5.4. | Fundamentals of abundant-data computing systems using the Roofline Model
8.5.5. | Peak throughput is limited by DNN accelerator design constraints |
8.5.6. | Hardware design and energy inefficiencies of compute |
8.5.7. | Flexibility is key for handling a wide range of DNNs
8.5.8. | Network on Chip - example from academia showing flexibility |
9. | SUPPLY CHAIN, INVESTMENTS, AND TRADE RESTRICTIONS |
9.1. | Supply Chain |
9.1.1. | IC supply chain player categories |
9.1.2. | Integrated circuit supply chain models |
9.1.3. | Supply chain by production process |
9.1.4. | Concentration of AI chip supply chain |
9.1.5. | Populated supply chain for AI chips |
9.1.6. | Populated supply chain for AI chips by component |
9.1.7. | AI chip landscape - Chip designers |
9.1.8. | Populated supply chain for custom integrated circuits |
9.1.9. | IDM fabrication capabilities |
9.1.10. | Foundry capabilities |
9.1.11. | AI cloud categories and players |
9.1.12. | US hyperscalers capital expenditure |
9.2. | Investments |
9.2.1. | Government industrial policy and funding for semiconductor industry |
9.2.2. | Government investments in US and European advanced packaging |
9.2.3. | Government investments in Asian packaging and the TSMC supply chain
9.3. | Trade Restrictions |
9.3.1. | US policy regarding advanced semiconductors in China and other nations |
9.3.2. | Oct 7th, 2022, US sanctions on Chinese technologies
9.3.3. | Oct 17th, 2023, US sanctions on AI chips (I) |
9.3.4. | Oct 17th, 2023, US sanctions on AI chips (II) |
9.3.5. | Sanctions-compliant AI chips in China
9.3.6. | Dec 2nd, 2024, further controls on advanced computing and semiconductor manufacture |
9.3.7. | Restrictions on High-Bandwidth Memory (HBM) |
9.3.8. | Jan 13th, 2025, AI Diffusion Framework (US worldwide export controls) (I) |
9.3.9. | Jan 13th, 2025, AI Diffusion Framework (US worldwide export controls) (II) |
9.3.10. | NVIDIA revenues by geography, affected by US restrictions |
10. | FORECASTS |
10.1.1. | Forecast methodology |
10.1.2. | Forecast assumptions and outlook |
10.1.3. | Market size forecast of AI chips: 2025-2035 |
10.1.4. | Market share forecast of AI chips: 2025-2035 |
10.1.5. | Annotated market size forecast of GPUs: 2025-2035 |
10.1.6. | IDTechEx outlook for GPUs |
10.1.7. | Custom AI ASIC market value |
10.1.8. | Annotated market size forecast of custom AI ASICs: 2025-2035 |
10.1.9. | IDTechEx outlook for custom AI ASIC chips |
10.1.10. | Annotated market size forecast of other AI chips: 2025-2035 |
10.1.11. | IDTechEx outlook for other AI chip architectures |