Meta Builds AI Supercomputer
Written by Lucy Black   
Thursday, 27 January 2022

Meta, formerly known as Facebook, has announced that its researchers have designed and built the AI Research SuperCluster (RSC), which they believe is among the fastest AI supercomputers running today and will be the fastest in the world when it is fully built in mid-2022.

Announcing the new supercomputer, Kevin Lee, Technical Program Manager, and Shubho Sengupta, Software Engineer at Meta, said that Meta researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters.

The need for the supercomputer is driven by the creation of increasingly large, complex, and adaptable models that are being trained in areas including vision, speech, and language, as well as for critical use cases such as identifying harmful content.

Like other AI supercomputers, the Meta machine has been built by combining multiple GPUs into compute nodes, which are then connected by a high-performance network fabric to allow fast communication between those GPUs. RSC today comprises 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs. RSC’s storage tier has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.
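The GPU total follows from the node count if you assume eight A100 GPUs per DGX A100 system, which is the standard configuration of that machine but is not stated in the announcement as quoted here. A minimal Python sketch of the arithmetic, using illustrative names, which also sums the three storage tiers:

```python
# Figure quoted in the article.
DGX_A100_NODES = 760

# Assumption (not stated in the article): each NVIDIA DGX A100
# system contains 8 A100 GPUs.
GPUS_PER_NODE = 8

# Storage tiers in petabytes, as quoted in the article.
STORAGE_PB = {
    "Pure Storage FlashArray": 175,
    "Penguin Computing Altus (cache)": 46,
    "Pure Storage FlashBlade": 10,
}


def total_gpus(nodes, gpus_per_node):
    """Return the GPU count for a cluster of identical nodes."""
    return nodes * gpus_per_node


def total_storage_pb(tiers):
    """Return the combined capacity of all storage tiers, in petabytes."""
    return sum(tiers.values())


if __name__ == "__main__":
    print("GPUs:", total_gpus(DGX_A100_NODES, GPUS_PER_NODE))
    print("Storage (PB):", total_storage_pb(STORAGE_PB))
```

Under that assumption the node count reproduces the 6,080 GPUs quoted above, and the three storage tiers add up to 231 petabytes.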

The researchers say that early benchmarks on RSC, compared with Meta’s legacy production and research infrastructure, show it runs computer vision workflows up to 20 times faster, runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. That means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.
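The three-week figure is simply the nine-week baseline divided by the quoted threefold speedup. A small Python sketch of that calculation, with a hypothetical helper name and the simplifying assumption that the speedup applies uniformly to the whole training run:

```python
def training_time_after_speedup(baseline_weeks, speedup):
    """Estimate training time given a baseline and a speedup factor.

    Assumes the speedup applies uniformly to the entire run, which is
    a simplification for illustration only.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_weeks / speedup


if __name__ == "__main__":
    # Figures quoted in the article: nine weeks before, three times faster.
    print(training_time_after_speedup(9, 3), "weeks")
```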

Training such models requires real-world data from Meta's production systems, which raises questions about privacy and security. The researchers say these are addressed by isolating RSC from the wider internet: it has no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers.

They say:

"To meet our privacy and security requirements, the entire data path from our storage systems to the GPUs is end-to-end encrypted"

The data is also anonymized, and it is decrypted at only one endpoint.

More Information

Meta AI Blog

Related Articles

AWS And Facebook Launch PyTorch Tools

Facebook Releases Detectron2

Facebook Open Sources Natural Language Processing Model

Facebook Open Sources Two Technologies

RocksDB - Facebook's Database Now Open Source 

 

Last Updated ( Thursday, 27 January 2022 )