
As a parent, I long ago concluded that there were no words that could come out of my mouth that would change my daughter’s views on certain topics. As a marketer, I feel some of that same dynamic: no words that come out of my keyboard will shake the negative beliefs about ASICs held by staunch software-encoding supporters.
So, don’t take our word that these beliefs are outdated; consider the results from the world’s largest video producer, YouTube. The following slides and observations are from a Google presentation by Aki Kuusela and Clint Smullen on the Argos ASIC-based transcoder at Hot Chips 33 back in August 2021. The slides are available here, and the video here.
In the presentation, the speakers discussed why YouTube developed its own ASIC and the performance and power efficiency achieved during the first 16 months of deployment. Their comments go a long way toward dispelling the myths identified above and make for interesting reading.
In discussing why Google created its own encoder, Kuusela explained that video was getting harder to compress, not only from a codec perspective but from a resolution and frame rate perspective. Here’s Kuusela (all quotes grabbed from the YouTube video and lightly edited for readability).
Reviewing Figure 1, note that while few engineers use VP9 as extensively as YouTube does, if you swap HEVC for VP9, the complexity jump relative to H.264 is much the same. Beyond the higher resolutions and frame rates engineers must support to remain competitive, the need for hardware becomes even more apparent when you consider the demands of live production.
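To put the resolution and frame-rate side of that growth in rough terms, here’s a minimal Python sketch comparing raw pixel throughput across a few rungs of a modern encoding ladder. The specific rungs chosen are illustrative assumptions, not figures from the presentation.

```python
# Rough, illustrative comparison of the raw pixel throughput an encoder
# must process per second; the ladder rungs here are assumptions, not
# figures from the Google presentation.
ladders = {
    "1080p30": (1920, 1080, 30),
    "1080p60": (1920, 1080, 60),
    "4K60":    (3840, 2160, 60),
}

base = 1920 * 1080 * 30  # pixels per second for 1080p30

for name, (w, h, fps) in ladders.items():
    pixels_per_sec = w * h * fps
    print(f"{name}: {pixels_per_sec/1e6:.0f} Mpixels/s "
          f"({pixels_per_sec/base:.1f}x 1080p30)")
```

Going from 1080p30 to 4K60 alone is roughly an 8x increase in pixels per second, before any codec-level complexity increase is considered.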
One consistent concern about ASICs has been quality, which admittedly lagged in early hardware generations. However, Google’s comparison shows that properly designed hardware can deliver near parity with software-based transcoding.
Kuusela doesn’t spend a lot of time on the slide shown in Figure 2, merely stating that “we also wanted to be able to optimize the compression efficiency of the video encoder based on the real-time requirements and time available for each encoder and to have full access to all quality control algorithms such as bitrate allocation and group of picture selection. So, we could get near parity to software-based encoding quality with our no-compromises implementation.”
NETINT’s data more than supports this claim. For example, Table 1 compares the NETINT Quadra VPU with various x265 presets. Depending upon the test configuration, Quadra delivers quality on par with the x265 medium preset. When you consider that software-based live production often forces the veryfast or ultrafast preset just to achieve workable throughput, Quadra’s quality far exceeds what software-based transcoding delivers in practice.
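If you want to see that preset/throughput trade-off for yourself, the rough Python sketch below times a short libx265 encode at a few presets. The source file name and bitrate are placeholders, and it assumes an FFmpeg build with libx265 on the path.

```python
# Times a short libx265 encode at several presets to show the throughput
# gap that pushes live software workflows toward veryfast/ultrafast.
# "source.mp4" is a placeholder clip; requires FFmpeg built with libx265.
import subprocess
import time

SOURCE = "source.mp4"  # placeholder test clip

for preset in ("medium", "veryfast", "ultrafast"):
    start = time.time()
    subprocess.run(
        ["ffmpeg", "-y", "-i", SOURCE,
         "-c:v", "libx265", "-preset", preset,
         "-b:v", "4M", "-an",
         f"out_{preset}.mp4"],
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    print(f"x265 {preset}: {time.time() - start:.1f} s")
```

Faster presets trade away quality for speed, which is exactly the compromise the hardware approach avoids.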
Another concern about ASIC-based transcoders is that they can’t be upgraded and will therefore become obsolete quickly. Proper ASIC design balances encoding tasks among hardware, firmware, and control software to preserve upgradeability.
Figure 3 shows how the bitrate efficiency of VP9 and H.264 continued to improve relative to software in the months after the product launch, even without changes to the firmware or kernel driver. The second Google presenter, Clint Smullen, attributed this to a hybrid hardware/software design, commenting that “Using a software approach was critical both to supporting the quality and feature development in the video core as well as allowing customer teams to iteratively improve quality and performance.”
The NETINT Codensity G4 ASIC inside the T408 and the NETINT Codensity G5 ASIC that powers our Quadra family of VPUs both use a hybrid design that distributes critical functions among the ASIC, driver software, and firmware.
We optimize our ASIC designs to maximize functional longevity, as explained in this discussion of the role of firmware in ASIC implementations: “The functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher-level parts of the video codecs are in firmware and driver software and can still be changed.”
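To make that division of labor concrete, here is a purely illustrative Python sketch of the hybrid split. Every function and parameter name is hypothetical; it stands in for the real ASIC, firmware, and driver interfaces rather than describing them.

```python
# Illustrative-only sketch of the hardware/firmware split described above.
# hw_encode_frame() is a stand-in for a fixed-function ASIC call; the rest
# is hypothetical host-side logic that could be updated in software.
from dataclasses import dataclass

@dataclass
class FrameParams:
    frame_type: str   # "I" or "P": GOP decision made in software
    target_bits: int  # rate-control decision made in software

def hw_encode_frame(raw_frame: bytes, params: FrameParams) -> bytes:
    """Placeholder for the fixed-function encode path baked into silicon."""
    return raw_frame[: params.target_bits // 8]  # dummy "bitstream"

def software_rate_control(frame_index: int, bitrate_bps: int, fps: int) -> FrameParams:
    # Upgradable policy: GOP structure and bit allocation live in software,
    # so they can keep improving long after the silicon ships.
    frame_type = "I" if frame_index % 60 == 0 else "P"
    budget = bitrate_bps // fps
    target = budget * 4 if frame_type == "I" else budget
    return FrameParams(frame_type, target)

# Toy encode loop over fake frames
for i in range(5):
    params = software_rate_control(i, bitrate_bps=4_000_000, fps=30)
    bitstream = hw_encode_frame(bytes(1920 * 1080), params)
    print(i, params.frame_type, params.target_bits, len(bitstream))
```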
As Google’s experience and NETINT’s data show, well-designed ASICs can continue improving in quality and functionality long after deployment.
Few engineers question the throughput and power efficiency of ASICs, and Google’s data bears this out. Commenting on Figure 4, Smullen stated, “For H.264 transcoding, a single VCU matches the speed of the baseline system while using about one-tenth of the system-level power. For VP9, a single 20-VCU machine replaces multiple racks of CPU-only systems.”
NETINT ASICs deliver similar results. For example, a single T408 transcoder (H.264 and HEVC) delivers roughly the same throughput as a 16-core server encoding with software while drawing only about 7 watts, compared to 250+ watts for the server. NETINT Quadra draws 20 watts and delivers roughly 4x the throughput of the T408 for H.264, HEVC, and AV1. In one implementation, a single 1RU server loaded with ten Quadras can deliver 320 1080p streams or 200 720p cloud gaming sessions, which, like Argos, replaces multiple racks of CPU-based servers.
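As a back-of-the-envelope check on those figures, here’s a small Python sketch that turns the quoted power and throughput numbers into rough encodes-per-watt ratios. Note that, like the comparison above, it sets card-level power against whole-system power, so treat the ratios as rough rather than as a formal benchmark.

```python
# Back-of-the-envelope comparison using the figures quoted above.
# Software baseline: a 16-core server drawing ~250 W for throughput T.
SW_SERVER_WATTS = 250          # "250+ for the computer"
T408_WATTS = 7                 # T408 draw, ~1x the software throughput
QUADRA_WATTS = 20              # Quadra draw, ~4x the T408 throughput

t408_ratio = SW_SERVER_WATTS / T408_WATTS            # same work, less power
quadra_ratio = (4 * SW_SERVER_WATTS) / QUADRA_WATTS  # 4x the work per card

print(f"T408:   ~{t408_ratio:.0f}x the encodes per watt of the software server")
print(f"Quadra: ~{quadra_ratio:.0f}x the encodes per watt of the software server")
```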