Eos triples the number of H100 GPUs that have been bound into a single machine. That threefold increase achieved a 2.8-fold performance improvement, or 93 percent scaling efficiency. Efficient scaling is key to continued improvement of generative AI, which has been growing tenfold every year.

The GPT-3 benchmark Eos tackled is not a complete training of GPT-3, because MLPerf wanted it to be within reach of many companies. Instead, it involves training the system to a certain checkpoint that proves the training would have reached the needed accuracy given enough time. Extrapolating from Eos's 4 minutes means it would take eight days to complete the training, and that's on what might be the most powerful AI supercomputer yet built. A computer of more reasonable size, 512 H100s, would take four months.

Intel submitted results for systems using the Gaudi 2 accelerator chip and for those that had no accelerator at all, relying only on its fourth-generation Xeon CPU. The big change from the last set of training benchmarks was that the company had enabled Gaudi 2's 8-bit floating-point (FP8) capabilities. The use of lower-precision numbers, such as FP8, has been responsible for most of the improvement in GPU performance over the last 10 years.
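The 93 percent figure is simply the measured speedup divided by the ideal linear speedup. Here is a minimal sketch of that arithmetic, assuming the cluster grew from 3,584 to 10,752 H100s (those counts are not stated above; they are the publicly reported sizes behind the "triples" claim):

```python
# Scaling efficiency = measured speedup / ideal (linear) speedup.
# GPU counts are assumed, matching Nvidia's reported Eos submissions.
prev_gpus, eos_gpus = 3_584, 10_752
measured_speedup = 2.8                  # from the benchmark results
ideal_speedup = eos_gpus / prev_gpus    # 3.0x if scaling were perfect
efficiency = measured_speedup / ideal_speedup
print(f"{efficiency:.0%}")              # -> 93%
```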
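The training-time comparison follows the same proportional reasoning. A rough consistency check, again assuming 10,752 GPUs for Eos and perfectly linear scaling (a deliberate simplification; efficiency in practice varies with cluster size):

```python
# Naive linear extrapolation of the 8-day full-training estimate
# from Eos (assumed 10,752 H100s) down to a 512-GPU machine.
eos_gpus, small_gpus = 10_752, 512
eos_days = 8
linear_days = eos_days * eos_gpus / small_gpus   # 168 days
print(f"{linear_days:.0f} days (~{linear_days / 30:.1f} months)")
# The four-month figure quoted above is shorter than this naive
# estimate, consistent with a 512-GPU job running at higher
# per-GPU efficiency than a 10,752-GPU one.
```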
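For context on what FP8 actually does to the numbers, the common E4M3 variant spends 4 bits on the exponent and 3 on the mantissa, so every tensor value is rounded to one of only 256 representable codes. Below is a minimal simulation of that rounding, assuming E4M3's usual parameters (exponent bias 7, maximum normal value 448); the special NaN encodings are ignored:

```python
import numpy as np

def quantize_fp8_e4m3(x):
    """Round to the nearest FP8 E4M3 value (assumed format: 4 exponent
    bits, 3 mantissa bits, bias 7, max normal 448, min subnormal 2**-9).
    Special NaN encodings are ignored for simplicity."""
    x = np.clip(np.asarray(x, dtype=np.float64), -448.0, 448.0)
    mag = np.abs(x)
    # Exponent of each value, clamped to E4M3's representable range.
    exp = np.clip(np.floor(np.log2(np.where(mag > 0, mag, 1.0))), -6, 8)
    step = 2.0 ** (exp - 3)   # spacing between adjacent FP8 values
    return np.round(x / step) * step

weights = np.array([0.0123, -1.7, 3.14159, 100.0])
print(quantize_fp8_e4m3(weights))   # coarse 8-bit approximations
```

Because each value is half the width of FP16, hardware with native FP8 support can roughly double arithmetic throughput and halve memory traffic, which is broadly where such speedups come from.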