MSVP: Meta’s first ASIC for video transcoding

May 18, 18:39

Here are the primary stages of the transcoding process:

  • Decoding: support for a variety of elementary input video stream formats, including H.264, HEVC, VP9, and AV1, covering all profiles, 8/10-bit pixel sample depths, and the YUV420 color format
  • Pre-processing: format conversion, including video overlays; frame resampling from arbitrary input resolutions to multiple output resolutions (up to 4x) with high precision and wide filters; and shot detection
  • Encoding: support for the H.264 (AVC) and VP9 coding standards, with 8-bit pixel sample depth and the YUV420 color format
  • Quality measurement: full-reference metrics (SSIM, MS-SSIM, VIF, and PSNR on both luma and chroma), no-reference blurriness, and quality measurement (QM) at multiple viewport resolutions

These stages are implemented as memory-to-memory operations, meaning intermediate buffers are stored back to DRAM and refetched as needed by the downstream operation.
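This memory-to-memory organization can be illustrated with a toy sketch (not MSVP's actual interfaces): each stage fetches its input buffer from a dict standing in for DRAM, computes, and stores its output back for the downstream stage to refetch.

```python
# Toy illustration of memory-to-memory staging: "dram" stands in for
# off-chip memory; every stage reads a buffer and writes one back.
dram = {"bitstream": "h264:frame"}

def run_stage(src, dst, fn):
    dram[dst] = fn(dram[src])     # fetch from DRAM, compute, store back

run_stage("bitstream", "raw",    lambda b: b.split(":")[1])   # decode
run_stage("raw",       "scaled", lambda f: f + "@720p")       # resample
run_stage("scaled",    "out",    lambda f: "vp9:" + f)        # encode
print(dram["out"])  # → vp9:frame@720p
```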

Power and performance

Each MSVP ASIC can offer a peak transcoding performance of 4K at 15fps at the highest quality configuration with 1-in, 5-out streams and can scale up to 4K at 60fps at the standard quality configuration. Performance scales linearly with resolution. This performance is achieved at ~10W of PCIe module power. We achieved a throughput gain of ~9x for H.264 when compared against libx264 SW encoding. For VP9, we achieved a throughput gain of ~50x when compared with libVPX speed 2 preset.


faster throughput for H.264 compared to libx264 SW


faster throughput for VP9 compared to libVPX speed 2

In video coding, we deploy a method to assess and compare compression efficiency called the Bjontegaard delta rate (BD-Rate), which estimates the percentage of bits saved (when the BD-Rate is negative) to deliver the same objective quality for a given video relative to a baseline configuration.
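The classic BD-Rate computation fits a cubic polynomial of log-bitrate as a function of quality for each codec and integrates the gap between the two curves over their overlapping quality range. A minimal sketch (the sample rate/PSNR numbers below are illustrative, not MSVP data):

```python
import numpy as np

def bd_rate(rate_anchor, q_anchor, rate_test, q_test):
    """Average % bitrate difference at equal quality (negative = bits saved)."""
    # Fit cubic polynomials of log-rate as a function of quality.
    p_a = np.polyfit(q_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(q_test, np.log(rate_test), 3)
    # Integrate each curve over the overlapping quality range.
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100

# Illustrative numbers: the test codec hits the same PSNR with 10% fewer bits.
anchor_rate = [1000, 2000, 4000, 8000]   # kbps
test_rate   = [900, 1800, 3600, 7200]
psnr        = [34.0, 36.5, 39.0, 41.5]   # dB, same curve for both
print(round(bd_rate(anchor_rate, psnr, test_rate, psnr), 1))  # → -10.0
```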

MSVP’s video encoding algorithms

The MSVP encoder has two main goals: to be highly power efficient and to deliver video quality that matches or exceeds that of software encoders. There are existing video encoder IPs, but most of them target mobile devices with tight area/power constraints and cannot meet the quality bar set by current software encoders.

Because software encoders offer very flexible control and fast evolution over time, it is quite challenging for ASIC video encoders to meet the same performance bar as software encoders.

Here’s a simplified version of the data flow of modern hybrid video encoders (“hybrid” refers to the combination of motion-compensated prediction and transform coding, not to a hardware/software split):

Simplified video encoder modules.

These encoders use intra-coding to reduce spatial redundancy and inter-coding to remove temporal redundancy. Different stages of motion estimation are applied in inter-coding to find the best prediction among all possible block positions in the available reference frames. Entropy coding is the lossless compression stage that squeezes out the statistical redundancy of all syntax elements, including coding modes, motion vectors, and quantized residual coefficients.

For MSVP’s algorithms to perform the way we wanted, we had to find hardware-friendly alternatives for each of the above key modules. We mainly focused on three levels: block level, frame level, and group of picture (GOP) level.

At the block level, we looked for coding tools with the highest return on investment, that were easy/economical (in terms of silicon area and power requirements) to implement in hardware, and that met our performance targets while maximizing compression efficiency. At frame level, we studied the best algorithms to make intelligent frame type decisions among I/P/B frames, and the best rate-control algorithms based on statistics collected from hardware. And at the GOP level, we had to figure out whether to use multiple-pass encoding with look-ahead, or to insert intra (key) frames at a given shot boundary.

Motion estimation

Motion estimation is one of the most computationally intensive algorithms in video encoding. To find accurate motion vectors that closely match the block currently being encoded, a full motion estimation pipeline often includes a multistage search to balance among large search range, computing complexity, and accuracy.

MSVP’s motion search algorithm must identify which candidate neighboring blocks contribute most to quality and search only around highly correlated neighbors within a limited cycle budget. Although we lack the flexibility of iterative software motion search algorithms, such as diamond or hexagon patterns, hardware motion estimation can search multiple blocks in parallel. This allows us to search more candidates, cover a larger search range and more reference frames in both single-direction and bidirectional modes, and evaluate all supported block partition shapes in parallel.
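As a reference point for what any motion search must compute, here is a minimal brute-force block-matching sketch (sum of absolute differences over a square search window). A hardware pipeline would evaluate many of these candidate positions in parallel rather than in a Python loop:

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences between two blocks.
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_search(ref, cur, bx, by, bs=16, rng=8):
    """Brute-force block matching: return the (dx, dy) minimizing SAD."""
    h, w = ref.shape
    block = cur[by:by + bs, bx:bx + bs]
    best_mv, best_cost = (0, 0), sad(block, ref[by:by + bs, bx:bx + bs])
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= h - bs and 0 <= x <= w - bs:
                cost = sad(block, ref[y:y + bs, x:x + bs])
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

# Shift a random frame's content by (+3, +1); the matching block in the
# reference frame then sits at motion vector (-3, -1) with zero cost.
gen = np.random.default_rng(0)
ref = gen.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(np.roll(ref, 1, axis=0), 3, axis=1)
mv, cost = motion_search(ref, cur, 16, 16)
print(mv, cost)  # → (-3, -1) 0
```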

Rate distortion optimization (RDO)

Achieving high video encoding quality also requires RDO support. Since there are so many decisions to make in video encoding (intra/inter modes, partition block size, transform block types/sizes, etc.), RDO is one of the best practices in video compression to determine which mode is optimal given the current rate or quality target.

MSVP supports exhaustive RDO at almost all mode decision stages. Distortion calculation is intensive but both straightforward and easily parallelizable. The unique challenge is bit rate estimation. Entropy coding for the final bitstream is sequential in nature, and each context model depends on the previously encoded symbols. In a hardware encoder implementation, the rate distortion (RD) cost for different blocks/partitions may be evaluated in parallel, so exact bit rate estimation is impossible. Instead, we implemented a reasonably accurate bit rate estimation model in MSVP. The model is hardware friendly, in that it makes it easy to evaluate multiple coding modes in parallel.
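The core RDO decision can be sketched as picking the candidate that minimizes the Lagrangian cost J = D + λ·R. The mode names, distortion values, and bit counts below are made-up illustrative numbers, not MSVP internals:

```python
# Pick the coding mode minimizing the Lagrangian cost J = D + lambda * R.
def best_mode(candidates, lmbda):
    """candidates: list of (name, distortion, estimated_bits) tuples."""
    return min(candidates, key=lambda c: c[1] + lmbda * c[2])

# Made-up candidate modes trading distortion (e.g., SSD) against bits.
modes = [
    ("intra_16x16", 1200, 40),   # high distortion, very cheap in bits
    ("inter_16x16", 400, 90),
    ("inter_8x8", 250, 180),     # lowest distortion, most bits
]
print(best_mode(modes, lmbda=0.5)[0])   # → inter_8x8   (rate is cheap)
print(best_mode(modes, lmbda=20.0)[0])  # → intra_16x16 (rate is expensive)
```

Raising λ shifts the winner toward cheap-to-code modes, which is exactly how the rate or quality target steers the per-block decisions.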

Smart quantization

Quantization is the only lossy part of video compression, and it is also the dominant bit rate control knob in any video coding standard. The corresponding parameter is called the quantization parameter (QP), and it is inversely related to quality: Low QP values result in small quantization errors, creating low distortion levels and, subsequently, high quality at the expense of higher bit rates. By making smart quantization choices, encoding bits can be allocated to areas that impact visual quality the most. We perform smart quantization using optimal QP selection and rounding decisions.

Modern video coding standards allow different QP values to be applied to different coding units. In MSVP’s hardware encoder, block-level QP values are determined adaptively based on both spatial and temporal characteristics.

In spatial adaptive QP (AQP) selection, since the human visual system is less sensitive to quality loss at high texture or high motion areas, a larger QP value can be applied to these coding blocks. In temporal AQP, coding blocks that are referenced more in the future can be quantized with a lower QP to get higher quality, such that future coding blocks that reference these blocks will benefit from it.

Smart rounding performs a joint optimization of the rounding decisions for all coefficients in each coding block. Since the rounding choices at different coefficient positions depend on one another, we need algorithms that remove the dependency while maintaining rounding accuracy. To reduce compute cost, we apply smart rounding only in the final stage, after the coding mode for each block has been determined. This feature alone achieves a ~1 to 2 percent BD-Rate improvement.
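The effect of a rounding decision can be seen in a minimal uniform quantizer (an illustrative sketch, not MSVP's actual quantizer): shrinking the rounding offset biases small coefficients toward zero, saving bits at a small distortion cost.

```python
import numpy as np

def quantize(coeffs, qstep, rounding=0.5):
    """Uniform quantizer with a configurable rounding offset."""
    signs = np.sign(coeffs)
    return (signs * np.floor(np.abs(coeffs) / qstep + rounding)).astype(int)

c = np.array([13.0, -7.0, 2.5, 1.4])
print(quantize(c, qstep=4.0, rounding=0.5).tolist())   # → [3, -2, 1, 0]
print(quantize(c, qstep=4.0, rounding=0.33).tolist())  # → [3, -2, 0, 0]
```

With the smaller offset, the borderline coefficient 2.5 is zeroed out; a smart rounding pass makes such keep-or-zero choices jointly across the whole block rather than per coefficient.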


The frame-level algorithm for the MSVP H.264 encoder can be configured as either two-pass or one-pass, depending on whether the use case is VOD or live streaming. In the high-quality (longer-latency) VOD two-pass mode, MSVP looks ahead N frames and collects statistics, such as intra/inter cost and motion vectors, from these frames. Based on these statistics, frame-level control applies back-propagation over the reference tree in the look-ahead buffers to assign an importance to each reference frame. The accumulated reference importance of the frame to be coded then modulates the temporal AQP of each block. Finally, the resulting delta QP map is passed to the final encoding pass to be used as the encoding QP, which is also captured in the output bitstream.

MSVP H.264 encoder frame level control flow.
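The look-ahead back-propagation idea can be sketched with a toy model. The reference structure, propagation weight, and QP scaling below are illustrative assumptions, not MSVP's actual algorithm:

```python
# refs[i] = index of the frame that frame i predicts from (None = key frame).
def propagate_importance(refs, n):
    importance = [1.0] * n
    for i in range(n - 1, -1, -1):           # back-propagate, newest first
        if refs[i] is not None:
            importance[refs[i]] += 0.5 * importance[i]
    return importance

def delta_qp(importance, scale=4.0):
    # More-referenced frames get a negative delta QP (i.e., higher quality).
    base = sum(importance) / len(importance)
    return [round(-scale * (imp - base) / base) for imp in importance]

refs = [None, 0, 0, 2, 2]                    # frame 0 is the key frame
imp = propagate_importance(refs, 5)
print(imp)            # → [2.5, 1.0, 2.0, 1.0, 1.0]
print(delta_qp(imp))  # → [-3, 1, -1, 1, 1]
```

Frames that many later frames lean on (here frames 0 and 2) accumulate importance and receive lower QPs, so the quality they carry propagates forward through prediction.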


In MSVP’s VP9 encoder, multiple-pass encoding is also enabled for high-quality VOD use cases. An analysis pass (the first pass) is performed up front to capture the video characteristics into a set of statistics, and the statistics are used to determine the frame level parameters for filtering and encoding. Since VP9’s frame type is different from H.264’s, the strategy for making frame level decisions is also different, as shown in the following figure:

VP9 encoder frame level algorithm flow.
