Comparing Tesla Dojo and Nvidia Blackwell

Supercomputer performance really does advance by the day.

I.T. 9 遊戲日誌
Mar 21, 2024

2021/2022 Tesla Dojo

Source: https://www.youtube.com/watch?v=j0z4FweCy4M, https://www.youtube.com/watch?v=ODSJsviD_SU

  • D1 Chip and Training Tile
  • Supporting hardware
  • Dojo Cabinet and ExaPOD

1 D1 chip [362 TFLOPS, 442.5 MB SRAM]

25 D1 chips = 1 Training Tile [9 PFLOPS, 11 GB SRAM]

12 Tiles + 1,280 GB high-bandwidth DRAM + 8 TB host DRAM + power supply = 1 rack/cabinet [108 PFLOPS, 200 kW]

10 racks = 1 cluster/ExaPOD [1.08 EFLOPS, 1.3 TB SRAM, 12.5 TB high-speed DRAM, 2 MW]

Floating-point precision is BF16/CFP8.

Each edge of a D1 chip has 4 TB/s of communication bandwidth.
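To sanity-check the chip → tile → rack → ExaPOD arithmetic, here's a minimal Python sketch of my own (not anything from Tesla's slides); it only multiplies the per-chip figures quoted above, so small gaps against the round numbers are just rounding.

```python
# Rebuild Dojo's tile/rack/ExaPOD figures from the per-chip numbers quoted above.
D1_TFLOPS = 362       # BF16/CFP8 throughput per D1 chip
D1_SRAM_MB = 442.5    # SRAM per D1 chip

tile_pflops = D1_TFLOPS * 25 / 1000        # 25 chips per tile -> ~9 PFLOPS
tile_sram_gb = D1_SRAM_MB * 25 / 1024      # -> ~10.8 GB ("11 GB")

rack_pflops = tile_pflops * 12             # 12 tiles per rack -> ~108.6 PFLOPS
exapod_eflops = rack_pflops * 10 / 1000    # 10 racks -> ~1.09 EFLOPS
exapod_sram_tb = tile_sram_gb * 12 * 10 / 1024   # -> ~1.27 TB ("1.3 TB")

print(f"Tile:   {tile_pflops:.2f} PFLOPS, {tile_sram_gb:.1f} GB SRAM")
print(f"Rack:   {rack_pflops:.1f} PFLOPS")
print(f"ExaPOD: {exapod_eflops:.2f} EFLOPS, {exapod_sram_tb:.2f} TB SRAM")
```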

2024 Nvidia Blackwell

Source: https://www.youtube.com/watch?v=Y2F8yisiS6E

1 Blackwell GPU [10 PFLOPS, 192 GB DRAM]

2 Blackwell GPUs + 1 Grace CPU = 1 GB200 Blackwell Superchip [20 PFLOPS, 384 GB DRAM + 384 GB "fast memory"]

2 GB200 Blackwell Superchips = 1 node [40 PFLOPS, 1.7 TB DRAM]

18 nodes + assorted NVLink gear = 1 GB200 NVL72 rack [720 PFLOPS, 30.6 TB DRAM, 120 kW]

8 GB200 NVL72 racks + 10 racks of who-knows-what = 1 cluster [5.76 EFLOPS]

Floating-point precision here is FP8 (the "AI Performance" figures on the slides refer to FP4; a later slide shows FP8 running at half that speed).

Each Blackwell GPU is made up of two dies, with 10 TB/s of communication bandwidth between them.
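The same back-of-the-envelope check works on the Blackwell side; again, this is just my own sketch multiplying the figures quoted above, not anything from Nvidia.

```python
# Rebuild the NVL72 and cluster figures from the per-GPU number quoted above.
GPU_PFLOPS = 10                           # per Blackwell GPU, as listed

superchip_pflops = GPU_PFLOPS * 2         # 2 GPUs + 1 Grace CPU -> 20 PFLOPS
node_pflops = superchip_pflops * 2        # 2 superchips per node -> 40 PFLOPS
rack_pflops = node_pflops * 18            # 18 nodes (72 GPUs) -> 720 PFLOPS
cluster_eflops = rack_pflops * 8 / 1000   # 8 NVL72 racks -> 5.76 EFLOPS

print(superchip_pflops, node_pflops, rack_pflops, cluster_eflops)
```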

Comparison

Assume for now that one BF16 operation counts as two FP8 operations; then, measured in FP8… (see the sketch after the list below)

Per rack:

  • Tesla Dojo: 216 PFLOPS, 200 kW, 1.25 TB DRAM, 0.13 TB SRAM
  • Nvidia Blackwell: 720 PFLOPS, 120 kW, 30.6 TB DRAM
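Here's that per-rack comparison as a tiny sketch, including a performance-per-watt figure I derived myself from the numbers above (under the same 2× BF16→FP8 assumption):

```python
# Per-rack FP8 comparison, assuming Dojo's FP8 rate is 2x its quoted BF16 rate.
dojo_fp8_pflops = 108 * 2      # 216 PFLOPS (by assumption)
nvidia_fp8_pflops = 720        # GB200 NVL72, as quoted

print(f"FP8/rack:  Dojo {dojo_fp8_pflops} vs NVL72 {nvidia_fp8_pflops} PFLOPS")
print(f"FP8/power: Dojo {dojo_fp8_pflops / 200:.2f} vs "
      f"NVL72 {nvidia_fp8_pflops / 120:.2f} PFLOPS per kW")
```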

Moore's Law says things double every 18 months or whatever… and that looks about right here. (To be exact, it's the number of transistors that fit on an integrated circuit that roughly doubles every 18 months, not performance.)
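To put a number on "looks about right": the gap between these two generations is roughly two to two-and-a-half years (my own assumption; the post doesn't pin it down), so the implied doubling period works out like this:

```python
# Implied doubling period of per-rack FP8 throughput between the two generations.
import math

ratio = 720 / 216                          # ~3.33x growth per rack
for months in (24, 30):                    # assumed generation gap
    doubling = months / math.log2(ratio)
    print(f"{months}-month gap -> doubles every ~{doubling:.0f} months")
# ~14-17 months, i.e. the same ballpark as "every 18 months"
```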

Note: the unit prefixes from small to large are kilo, mega, giga, tera, peta, exa.

Other ramblings

Nvidia's Omniverse ecosystem looks quite impressive (and has for years; it didn't just appear this year); for digital twins or simulation it looks very capable.

That said, generative AI training and inferencing don't need Omniverse, so even lacking an equivalent, AMD should still have a shot at catching up on market share.

As for Intel, let's not even bother. 🤪
