
Mobile cloud gaming and technology suppliers
Video games are a huge market segment, projected to reach US$221.4 billion in 2023, expanding to an estimated US$285 billion by 2027. Of that,
As a parent, I long ago concluded that there were no words that could come out of my mouth that would change my daughter’s views on certain topics. As a marketer, I feel some of that same dynamic, that no words can come out of my keyboard that would shake the negative beliefs about ASICs from staunch software-encoding supporters.
So, don’t take our word that these beliefs are outdated; consider the results from the world’s largest video producer, YouTube. The following slides and observations are from a Google presentation by Aki Kuusela and Clint Smullen on the Argos ASIC-based transcoder at Hot Chips 33 back in August 2021. The slides are available here, and the video here.
In the presentation, the speakers discussed why YouTube developed its own ASIC and the performance and power efficiency achieved during the first 16 months of deployment. Their comments go a long way toward dispelling the myths identified above and make for interesting reading.
In discussing why Google created its own encoder, Kuusela explained that video was getting harder to compress, not only from a codec perspective but from a resolution and frame rate perspective. Here’s Kuusela (all quotes grabbed from the YouTube video and lightly edited for readability).
Reviewing Figure 1, it should be noted that though few engineers use VP9 as extensively as YouTube, if you swap HEVC for VP9, the complexity difference between H.264 is the same. Beyond the higher resolutions and frame rates engineers must support to remain competitive, the need for hardware becomes even more apparent when you consider the demands of live production.
One consistent concern about ASICs has been quality, which admittedly lagged in early hardware generations. However, Google’s comparison shows that properly designed hardware can deliver near-parity to software-based transcoding.
Kuusela doesn’t spend a lot of time on the slide shown in Figure 2, merely stating that “we also wanted to be able to optimize the compression efficiency of the video encoder based on the real-time requirements and time available for each encoder and to have full access to all quality control algorithms such as bitrate allocation and group of picture selection. So, we could get near parity to software-based encoding quality with our no-compromises implementation.”
NETINT’s data more than supports this claim. For example, Table 1 compares the NETINT Quadra VPU with various x265 presets. Depending upon the test configuration, Quadra delivers quality on par with the x265 medium preset. When you consider that software-based live production often necessitates using the veryfast or ultrafast preset to achieve marginal throughput, Quadra’s quality far exceeds that of software-based transcoding.
Another concern about ASIC-based transcoders is the inability to upgrade, and accelerated obsolescence. Proper ASIC design allows ASICs to balance encoding tasks between hardware, firmware, and control software to ensure continued upgradeability.
Figure 3 shows how the bitrate of VP9 and H.264 continued to improve compared to software in the months after the product launch, even without changes to the firmware or kernel driver. The second Google presenter, Clint Smullen attributed this to a hybrid hardware/software design, commenting that “Using a software approach was critical both to supporting the quality and feature development in the video core as well as allowing customer teams to iteratively improve quality and performance.”
The NETINT Codensity G4 ASIC included in the T408 and the NETINT Codensity G5 ASIC that powers our Quadra family of VPUs, both use a hybrid design that distributes critical functions between the ASIC, driver software, and firmware.
We optimize ASIC design to maximize functional longevity as explained here on the role of firmware in ASIC implementations, “The functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher levels parts of the video codecs are in firmware and driver software and can still be changed.”
As Google’s experience and NETINT’s data show, well-designed ASICs can continue improving in quality and functionality long after deployment.
Few engineers question the throughput and power efficiency of ASICs, and Google’s data bears this out. Commenting on Figure 4, Smullen stated, “For H.264 transcoding a single VCU matches the speed of the baseline system while using about one-tenth of the system level power. For VP9, a single 20 VCU machine replaces multiple racks of CPU-only systems.”
NETINT ASICs deliver similar results. For example, a single T408 transcoder (H.264 and HEVC) delivers roughly the same throughput as a 16-core computer encoding with software and draws only about 20 watts compared to 250+ for the computer. NETINT Quadra draws 70 watts and delivers roughly 4x the performance of the T408 for H.264, HEVC, and AV1. In one implementation, a single 1RU rack of ten Quadras can deliver 320 1080p streams or 200 720p cloud gaming sessions, which like Argos, replaces multiple racks of CPUs.
Video games are a huge market segment, projected to reach US$221.4 billion in 2023, expanding to an estimated US$285 billion by 2027. Of that,
Even in 2023, many high-volume streaming producers continue to rely on software-based transcoding, despite the clear CAPEX, OPEX, and environmental benefits of ASIC-based transcoding.
The thing about FFmpeg is that there are almost always multiple ways to accomplish the same basic function. In this post, we look at four
Power consumption is a priority for NETINT customers and a passion for NETINT engineers and technicians. Matthew Ariho, a system engineer in SoC Engineering at