Jump to content

Template:NvidiaDgxAccelerators

From Wikipedia, the free encyclopedia

Comparison of accelerators used in DGX:[1][2][3]

Model Architecture Socket FP32
CUDA
cores
FP64 cores
(excl. tensor)
Mixed
INT32/FP32
cores
INT32
cores
Boost
clock
Memory
clock
Memory
bus width
Memory
bandwidth
VRAM Single
precision
(FP32)
Double
precision
(FP64)
INT8
(non-tensor)
INT8
dense tensor
INT32 FP4
dense tensor
FP16 FP16
dense tensor
bfloat16
dense tensor
TensorFloat-32
(TF32)
dense tensor
FP64
dense tensor
Interconnect
(NVLink)
GPU L1 Cache L2 Cache TDP Die size Transistor
count
Process Launched
B200 Blackwell SXM6 N/A N/A N/A N/A N/A 8 Gbit/s HBM3e 8192-bit 8 TB/sec 192 GB HBM3e N/A N/A N/A 4.5 POPS N/A 9 PFLOPS N/A 2.25 PFLOPS 2.25 PFLOPS 1.2 PFLOPS 40 TFLOPS 1.8 TB/sec GB100 N/A N/A 1000 W N/A 208 B TSMC 4NP Q4 2024 (expected)
B100 Blackwell SXM6 N/A N/A N/A N/A N/A 8 Gbit/s HBM3e 8192-bit 8 TB/sec 192 GB HBM3e N/A N/A N/A 3.5 POPS N/A 7 PFLOPS N/A 1.98 PFLOPS 1.98 PFLOPS 989 TFLOPS 30 TFLOPS 1.8 TB/sec GB100 N/A N/A 700 W N/A 208 B TSMC 4NP
H200 Hopper SXM5 16896 4608 16896 N/A 1980 MHz 6.3 Gbit/s HBM3e 6144-bit 4.8 TB/sec 141 GB HBM3e 67 TFLOPS 34 TFLOPS N/A 1.98 POPS N/A N/A N/A 990 TFLOPS 990 TFLOPS 495 TFLOPS 67 TFLOPS 900 GB/sec GH100 25344 KB (192 KB × 132) 51200 KB 1000 W 814 mm2 80 B TSMC 4N Q3 2023
H100 Hopper SXM5 16896 4608 16896 N/A 1980 MHz 5.2 Gbit/s HBM3 5120-bit 3.35 TB/sec 80 GB HBM3 67 TFLOPS 34 TFLOPS N/A 1.98 POPS N/A N/A N/A 990 TFLOPS 990 TFLOPS 495 TFLOPS 67 TFLOPS 900 GB/sec GH100 25344 KB (192 KB × 132) 51200 KB 700 W 814 mm2 80 B TSMC 4N Q3 2022
A100 80GB Ampere SXM4 6912 3456 6912 N/A 1410 MHz 3.2 Gbit/s HBM2e 5120-bit 1.52 TB/sec 80 GB HBM2e 19.5 TFLOPS 9.7 TFLOPS N/A 624 TOPS 19.5 TOPS N/A 78 TFLOPS 312 TFLOPS 312 TFLOPS 156 TFLOPS 19.5 TFLOPS 600 GB/sec GA100 20736 KB (192 KB × 108) 40960 KB 400 W 826 mm2 54.2 B TSMC N7 Q1 2020
A100 40GB Ampere SXM4 6912 3456 6912 N/A 1410 MHz 2.4 Gbit/s HBM2 5120-bit 1.52 TB/sec 40 GB HBM2 19.5 TFLOPS 9.7 TFLOPS N/A 624 TOPS 19.5 TOPS N/A 78 TFLOPS 312 TFLOPS 312 TFLOPS 156 TFLOPS 19.5 TFLOPS 600 GB/sec GA100 20736 KB (192 KB × 108) 40960 KB 400 W 826 mm2 54.2 B TSMC N7
V100 32GB Volta SXM3 5120 2560 N/A 5120 1530 MHz 1.75 Gbit/s HBM2 4096-bit 900 GB/sec 32 GB HBM2 15.7 TFLOPS 7.8 TFLOPS 62 TOPS N/A 15.7 TOPS N/A 31.4 TFLOPS 125 TFLOPS N/A N/A N/A 300 GB/sec GV100 10240 KB (128 KB × 80) 6144 KB 350 W 815 mm2 21.1 B TSMC 12FFN Q3 2017
V100 16GB Volta SXM2 5120 2560 N/A 5120 1530 MHz 1.75 Gbit/s HBM2 4096-bit 900 GB/sec 16 GB HBM2 15.7 TFLOPS 7.8 TFLOPS 62 TOPS N/A 15.7 TOPS N/A 31.4 TFLOPS 125 TFLOPS N/A N/A N/A 300 GB/sec GV100 10240 KB (128 KB × 80) 6144 KB 300 W 815 mm2 21.1 B TSMC 12FFN
P100 Pascal SXM/SXM2 N/A 1792 3584 N/A 1480 MHz 1.4 Gbit/s HBM2 4096-bit 720 GB/sec 16 GB HBM2 10.6 TFLOPS 5.3 TFLOPS N/A N/A N/A N/A 21.2 TFLOPS N/A N/A N/A N/A 160 GB/sec GP100 1344 KB (24 KB × 56) 4096 KB 300 W 610 mm2 15.3 B TSMC 16FF+ Q2 2016
  1. ^ Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech.
  2. ^ Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  3. ^ "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.