Startec

Meta AI Introduces MTIA v1: Its First-Generation AI Inference Accelerator

May 21 at 18:07

At Meta, AI workloads are everywhere, serving as the foundation for applications such as content understanding, Feeds, generative AI, and ad ranking. These workloads run on PyTorch, which offers seamless Python integration, eager-mode programming, and straightforward APIs. Deep learning recommendation models (DLRMs) in particular are vital to improving user experiences across Meta’s products and services. As these models grow in size and complexity, the underlying hardware must supply more memory and compute without sacrificing efficiency.

GPUs aren’t always the best option for running Meta’s recommendation workloads efficiently at scale. To address this, the Meta team developed a family of application-specific integrated circuits (ASICs) called the “Meta Training and Inference Accelerator” (MTIA). The first-generation ASIC was designed with the needs of next-generation recommendation models in mind and is integrated with PyTorch to form a fully optimized ranking system. Keeping developers productive is an ongoing effort, and the team continues to support PyTorch 2.0, which dramatically improves PyTorch’s compiler-level performance.

In 2020, the team designed the first-generation MTIA ASIC for Meta’s internal workloads. The inference accelerator is part of a full-stack solution in which the silicon, PyTorch, and the recommendation models are co-designed. Fabricated on TSMC’s 7nm process, it runs at 800 MHz and delivers 102.4 TOPS at INT8 precision and 51.2 TFLOPS at FP16 precision. Its thermal design power (TDP) is 25 W.
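
The quoted figures can be sanity-checked with simple arithmetic. The sketch below derives performance per watt and operations per PE per clock cycle from the numbers in this article (the 64-PE count comes from the grid description). The function names are illustrative, and the even split of peak throughput across PEs is an assumption, not something the source states.

```python
# Back-of-the-envelope checks on the MTIA v1 figures quoted in the article.
# Function names are illustrative; dividing peak throughput evenly across
# all PEs is an assumption made only for this estimate.

CLOCK_HZ = 800e6      # 800 MHz clock
INT8_TOPS = 102.4     # peak INT8 throughput, tera-operations per second
FP16_TFLOPS = 51.2    # peak FP16 throughput, tera-FLOPs per second
TDP_WATTS = 25.0      # thermal design power
NUM_PES = 64          # 8 x 8 grid of processing elements


def tera_ops_per_watt(tera_ops: float, watts: float) -> float:
    """Peak throughput divided by TDP."""
    return tera_ops / watts


def ops_per_pe_per_cycle(tera_ops: float, clock_hz: float, num_pes: int) -> float:
    """Operations each PE must complete per cycle to reach the peak rate."""
    return tera_ops * 1e12 / (clock_hz * num_pes)


if __name__ == "__main__":
    print(f"INT8: {tera_ops_per_watt(INT8_TOPS, TDP_WATTS):.2f} TOPS/W")
    print(f"FP16: {tera_ops_per_watt(FP16_TFLOPS, TDP_WATTS):.2f} TFLOPS/W")
    print(f"INT8 ops per PE per cycle: "
          f"{ops_per_pe_per_cycle(INT8_TOPS, CLOCK_HZ, NUM_PES):.0f}")
    print(f"FP16 ops per PE per cycle: "
          f"{ops_per_pe_per_cycle(FP16_TFLOPS, CLOCK_HZ, NUM_PES):.0f}")
```

Under these assumptions the chip works out to roughly 4.1 TOPS/W at INT8, and each PE would need to sustain about 2,000 INT8 operations per cycle at peak.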

At a high level, the accelerator consists of processing elements (PEs), on-chip and off-chip memory resources, and interconnects, organized in a grid. A dedicated control subsystem within the accelerator runs the system firmware. The firmware coordinates the execution of jobs on the accelerator, manages the available compute and memory resources, and communicates with the host through a dedicated host interface. The memory subsystem uses LPDDR5 for off-chip DRAM and can scale up to 128 GB. The chip’s 128 MB of on-chip SRAM is shared among all the PEs and provides higher bandwidth and much lower latency for frequently accessed data and instructions.
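
To make the memory hierarchy concrete, here is a minimal capacity-only sketch that reports the smallest tier able to hold a working set. The tier sizes come from this article (including the 128 KB of local memory per PE); the helper names and the example embedding-table sizes are hypothetical, and the use of binary units (1 KB = 1024 bytes) is an assumption. How the firmware and compiler actually place data is not described in the source.

```python
# Capacity-only view of the MTIA v1 memory tiers described in the article.
# Helper names and example table sizes are hypothetical; binary units
# (1 KB = 1024 bytes) are assumed.

KIB, MIB, GIB = 1024, 1024**2, 1024**3

# Ordered from smallest/fastest to largest/slowest.
TIERS = [
    ("PE local SRAM", 128 * KIB),        # per processing element
    ("shared on-chip SRAM", 128 * MIB),  # shared among all PEs
    ("off-chip LPDDR5", 128 * GIB),      # maximum DRAM capacity
]


def embedding_table_bytes(rows: int, dim: int, bytes_per_value: int) -> int:
    """Size of a dense embedding table in bytes."""
    return rows * dim * bytes_per_value


def smallest_tier_that_fits(num_bytes: int):
    """Return the name of the smallest tier that can hold num_bytes, or None."""
    for name, capacity in TIERS:
        if num_bytes <= capacity:
            return name
    return None


if __name__ == "__main__":
    small = embedding_table_bytes(rows=1_000_000, dim=64, bytes_per_value=1)
    large = embedding_table_bytes(rows=100_000_000, dim=64, bytes_per_value=2)
    print(small, smallest_tier_that_fits(small))
    print(large, smallest_tier_that_fits(large))
```

In this example, a 64 MB INT8 table fits in the shared on-chip SRAM, while a 12.8 GB FP16 table has to live in off-chip DRAM.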

The grid contains 64 PEs arranged in an 8 by 8 layout. Each PE has 128 KB of local SRAM for fast data storage and processing. A mesh network connects the PEs to one another and to the memory banks. The whole grid can be used to run a single job, or it can be split into multiple subgrids that each run their own job. Each PE contains two processor cores and several fixed-function units optimized for key operations such as matrix multiplication, accumulation, data movement, and nonlinear function calculation. The processor cores are based on the RISC-V ISA and have been heavily customized to perform the required compute and control tasks. The architecture is designed to maximize two essentials of efficient workload execution: parallelism and data reuse.
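
The idea of splitting the grid into subgrids can be illustrated with a short sketch. The `split_grid` function below is hypothetical: it tiles the 8 by 8 grid into equal rectangular subgrids and returns the PE coordinates in each one. The source does not say how MTIA actually partitions the grid, so equal rectangular tiles are an assumption.

```python
# Illustrative partitioning of a PE grid into equal rectangular subgrids.
# split_grid is a hypothetical helper; the article does not describe the
# real partitioning scheme.

def split_grid(rows: int, cols: int, sub_rows: int, sub_cols: int):
    """Tile a rows x cols grid into sub_rows x sub_cols subgrids.

    Returns a list of subgrids, each a list of (row, col) PE coordinates.
    """
    if rows % sub_rows or cols % sub_cols:
        raise ValueError("subgrid shape must evenly divide the grid")
    subgrids = []
    for r0 in range(0, rows, sub_rows):
        for c0 in range(0, cols, sub_cols):
            subgrids.append([
                (r, c)
                for r in range(r0, r0 + sub_rows)
                for c in range(c0, c0 + sub_cols)
            ])
    return subgrids


if __name__ == "__main__":
    # Four 4 x 4 subgrids, each of which could run an independent job.
    for i, pes in enumerate(split_grid(8, 8, 4, 4)):
        print(f"subgrid {i}: {len(pes)} PEs, corner PE {pes[0]}")
```

Calling `split_grid(8, 8, 8, 8)` returns a single subgrid that covers all 64 PEs, which corresponds to using the whole grid for one job.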

The researchers compared MTIA with an NNPI accelerator and a GPU. The results show that MTIA handles the small shapes and batch sizes of low-complexity models more efficiently. Medium- and high-complexity models use larger shapes that are significantly better optimized in the GPU’s software stack, and the MTIA software stack is being actively optimized to reach similar levels of performance on them.

To optimize performance for Meta’s workloads, the team is now focused on striking the right balance between compute power, memory capacity, and interconnect bandwidth in order to build a better and more efficient solution.



Tanushree Shenwai

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and she is passionate about exploring new advancements in technology and their real-life applications.

