Startec

MTIA v1: Meta’s first-generation AI inference accelerator

May 18, 6:39 PM


In 2020, we initiated the Meta Training and Inference Accelerator (MTIA) family of chips to support our evolving AI workloads, starting with an inference accelerator ASIC for deep learning recommendation models (DLRMs).

The MTIA software (SW) stack aims to provide developer efficiency and high performance. It integrates fully with PyTorch, providing a familiar developer experience. Using PyTorch with MTIA is as easy as using PyTorch for CPUs or GPUs. The MTIA SW stack benefits from the flourishing PyTorch developer ecosystem and tooling. The compiler performs model-level transformations and optimizations using PyTorch FX IR and low-level optimizations using LLVM IR, with extensions to support the custom architecture and ISA of the MTIA accelerator.
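As a rough illustration of what a model-level transformation looks like, the sketch below runs a fusion pass over a toy operator list. It does not use PyTorch FX or any MTIA API. The `Op` type, the `fuse_linear_relu` pass, and the `linear_relu` fused operator name are all hypothetical, chosen only to show the shape of a graph rewrite that a compiler might apply before low-level code generation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """One node in a toy operator list (hypothetical IR, not PyTorch FX)."""
    name: str                # operator kind, e.g. "linear" or "relu"
    output: str              # name of the value this op produces
    inputs: Tuple[str, ...]  # names of the values this op consumes


def fuse_linear_relu(ops: List[Op]) -> List[Op]:
    """Replace a linear op directly followed by a relu of its output
    with one fused op, provided nothing else reads the intermediate value."""
    fused: List[Op] = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        # Count how many ops consume the intermediate value.
        consumers = sum(cur.output in op.inputs for op in ops)
        if (
            cur.name == "linear"
            and nxt is not None
            and nxt.name == "relu"
            and nxt.inputs == (cur.output,)
            and consumers == 1
        ):
            # The fused op keeps the linear inputs and the relu output name.
            fused.append(Op("linear_relu", nxt.output, cur.inputs))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


graph = [
    Op("linear", "h0", ("x", "w0")),
    Op("relu", "h1", ("h0",)),
    Op("linear", "y", ("h1", "w1")),
]
optimized = fuse_linear_relu(graph)
print([op.name for op in optimized])  # ['linear_relu', 'linear']
```

A real FX-based pass would operate on graph nodes rather than a flat list, but the pattern is the same: match a subgraph, check that the rewrite is safe, and substitute a cheaper equivalent.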

The PyTorch runtime for MTIA manages on-device execution and provides features such as MTIA tensors, memory management, and the APIs for scheduling operators on the accelerator. The runtime and firmware handle communication with the accelerator device. The SW stack supports different modes of execution, such as eager mode and graph mode, and allows workloads to be partitioned across multiple accelerator cards. For partitioned workloads, the SW stack also provides the necessary synchronization and communication between the cards.
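The text does not say how workloads are partitioned across cards. As one plausible illustration only, the sketch below greedily assigns embedding tables to whichever card currently holds the least data. The function name `partition_tables`, the table names, and the sizes are all hypothetical and do not reflect MTIA's actual strategy.

```python
import heapq
from typing import Dict, List


def partition_tables(table_sizes: Dict[str, int], num_cards: int) -> List[List[str]]:
    """Greedy largest-first placement: each table goes to the least-loaded card.

    This is a generic load-balancing heuristic, assumed here for illustration.
    """
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    # Min-heap of (size assigned so far, card index).
    heap = [(0, card) for card in range(num_cards)]
    heapq.heapify(heap)
    placement: List[List[str]] = [[] for _ in range(num_cards)]
    # Place the largest tables first; break ties by name so results are deterministic.
    for name, size in sorted(table_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, card = heapq.heappop(heap)
        placement[card].append(name)
        heapq.heappush(heap, (load + size, card))
    return placement


# Hypothetical table sizes in arbitrary units.
tables = {"user_id": 8, "item_id": 6, "category": 3, "country": 2, "device": 1}
placement = partition_tables(tables, num_cards=2)
print(placement)  # [['user_id', 'country'], ['item_id', 'category', 'device']]
```

In this example both cards end up holding 10 units. A production partitioner would also need to weigh compute cost and inter-card communication, which this sketch ignores.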

The MTIA software stack.

There are multiple ways to author compute kernels that run on the accelerator: PyTorch, C/C++ (for hand-tuned, highly optimized kernels), and a new domain-specific language called KNYFE. KNYFE takes a short, high-level description of an ML operator as input and generates optimized, low-level C++ kernel code that implements the operator for MTIA.

Low-level code generation and optimization rely on the open source LLVM compiler toolchain with MTIA extensions. LLVM handles the next level of optimization and code generation, producing efficient executables that run on the processor cores within the processing elements (PEs).

As part of the SW stack, we have also developed a library of hand-tuned, highly optimized kernels for performance-critical ML operators, such as fully connected and embedding-bag operators. The higher levels of the SW stack can choose to instantiate and use these kernels during compilation and code generation.
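For readers unfamiliar with the embedding-bag operator, the sketch below is a pure-Python reference for its sum-pooling variant. It follows the `indices`/`offsets` convention of PyTorch's `torch.nn.EmbeddingBag`. It is meant only to show what the operator computes, not how an optimized MTIA kernel implements it, and the function name `embedding_bag_sum` is an assumption made for this example.

```python
from typing import List, Sequence


def embedding_bag_sum(
    table: Sequence[Sequence[float]],
    indices: Sequence[int],
    offsets: Sequence[int],
) -> List[List[float]]:
    """Sum the embedding rows belonging to each bag.

    Bag b covers indices[offsets[b]:offsets[b + 1]]; the last bag runs
    to the end of `indices`.
    """
    dim = len(table[0]) if table else 0
    out: List[List[float]] = []
    for b, start in enumerate(offsets):
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        pooled = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]
            for d in range(dim):
                pooled[d] += row[d]
        out.append(pooled)
    return out


table = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
indices = [0, 2, 1, 1, 3]  # row lookups for all bags, concatenated
offsets = [0, 2]           # bag 0 starts at position 0, bag 1 at position 2
result = embedding_bag_sum(table, indices, offsets)
print(result)  # [[4.0, 3.0], [0.5, 4.5]]
```

The irregular, data-dependent row lookups in the inner loop are what make this operator memory-bound, which is one reason a hand-tuned kernel can matter for recommendation models.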

The MTIA SW stack continues to evolve through integration with PyTorch 2.0, which is faster and more Pythonic, yet as dynamic as ever. This integration will enable new features such as TorchDynamo and TorchInductor. We are also extending the Triton DSL to support MTIA accelerators and using MLIR for internal representations and advanced optimizations.

