Video Transcoder vs. Video Processing Unit (VPU)

When choosing a product for live stream processing, half the battle is knowing what to search for. Do you want a live transcoder, a video processing unit (VPU), a video coding unit (VCU), a scalable video processor (SVP), or something else? If you’re not quite sure what these terms mean and how they relate, this short article will explain them in four minutes or less.

In the Beginning, There Were Transcoders

Simply stated, a transcoder is any technology, software or hardware, that can input a compressed stream (decode) and output a compressed stream (encode). FFmpeg is a transcoder, and for low-volume video-on-demand applications, it works fine.

For live applications, particularly high-volume live interactive applications (think Twitch), you’ll probably need a hardware transcoder to achieve the necessary cost per stream (CAPEX), operating cost per stream (OPEX), and density.

For example, the NETINT Video Transcoding Server, a single 1RU server with ten NETINT T408 Video Transcoders, can deliver up to 80 H.264/HEVC 1080p30 streams while drawing under 250 watts. Performed in software using only the CPU, this same output could take up to ten separate 1RU servers, each drawing well over 250 watts.
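One way to sanity-check this comparison is to work out watts per stream. The short Python sketch below takes the paragraph’s figures at face value: 250 watts is treated as the ceiling for the ASIC server and as the per-server floor for the CPU servers, and it assumes the worst-case count of ten CPU servers. The variable names are illustrative.

```python
# Figures quoted above for the NETINT Video Transcoding Server.
ASIC_STREAMS = 80          # H.264/HEVC 1080p30 streams from one 1RU server
ASIC_SERVER_WATTS = 250    # stated as "under 250 watts" (a ceiling)
CPU_SERVERS = 10           # "up to ten separate 1RU servers" (worst case used here)
CPU_SERVER_WATTS = 250     # stated as "well over 250 watts" each (a floor)

def watts_per_stream(total_watts: float, streams: int) -> float:
    """Average power draw attributable to each output stream."""
    return total_watts / streams

asic_w = watts_per_stream(ASIC_SERVER_WATTS, ASIC_STREAMS)
cpu_w = watts_per_stream(CPU_SERVERS * CPU_SERVER_WATTS, ASIC_STREAMS)
cpu_streams_per_server = ASIC_STREAMS / CPU_SERVERS

print(f"ASIC server: {asic_w:.3f} W per stream (upper bound)")
print(f"CPU servers: {cpu_w:.2f} W per stream, {cpu_streams_per_server:.0f} streams per server")
print(f"Power ratio under these assumptions: {cpu_w / asic_w:.0f}x")
```

Because the CPU servers draw "well over" 250 watts each, the real gap under these assumptions would be wider than the ratio printed here.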

The NETINT T408 Video Transcoder.

Speaking of the T408, if Webster’s defined a transcoder (it doesn’t), its entry might show a picture of the T408 as the perfect example. Based on custom transcoding ASICs, the T408 is inexpensive ($400), capable (4K @ 60 FPS or 4x 1080p60 streams), flexible (H.264 and HEVC), and exceptionally efficient (only 7 watts).
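These specs translate directly into the per-stream metrics that matter for live deployments. The Python sketch below computes cost per stream, watts per stream, and streams per server from the T408 figures just quoted, assuming the ten-card server described above. The `Card` class and `density` helper are hypothetical names, and the numbers cover the cards only, not the host server’s own price or power draw, so they understate real per-stream totals.

```python
from dataclasses import dataclass

@dataclass
class Card:
    """A transcoder card described by price, power, and concurrent stream count."""
    name: str
    price_usd: float
    watts: float
    streams: int  # concurrent streams at one output profile

    def cost_per_stream(self) -> float:
        return self.price_usd / self.streams

    def watts_per_stream(self) -> float:
        return self.watts / self.streams

def density(card: Card, cards_per_server: int) -> int:
    """Streams per server, ignoring any host-CPU or I/O limits."""
    return card.streams * cards_per_server

# T408 figures from the article: $400, 7 W, 4x 1080p60 streams.
t408 = Card("T408", price_usd=400, watts=7, streams=4)

print(t408.cost_per_stream())   # 100.0
print(t408.watts_per_stream())  # 1.75
print(density(t408, 10))        # 40
```

Forty 1080p60 streams per server lines up with the 80 1080p30 streams quoted for the ten-card server, since each 1080p60 stream carries twice as many frames.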

What doesn’t the T408 do? Well, that leads us to the difference between a transcoder and a VPU.

The difference between a transcoder and a Video Processing Unit (VPU)

First, the T408 doesn’t scale video. If you’re building a full encoding ladder from a high-resolution source, all the scaling for the lower rungs is performed by the host CPU. In addition, the T408 doesn’t perform overlay in hardware. So, if you insert a logo or other bug over your videos, again, the CPU does the heavy lifting.
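To make the scaling workload concrete, here is a Python sketch that assembles an FFmpeg command for a three-rung ladder. It uses FFmpeg’s standard `split` and `scale` filters inside `-filter_complex`; on a transcoder without a hardware scaler, every `scale` step in that graph is work for the host CPU. The rung sizes, file names, and helper functions are hypothetical, and `libx264` is only a software placeholder for whatever encoder a hardware card exposes.

```python
import shlex
from typing import List, Tuple

# Hypothetical three-rung ladder; sizes are illustrative, not from the article.
LADDER: List[Tuple[int, int]] = [(1920, 1080), (1280, 720), (640, 360)]

def ladder_filtergraph(rungs: List[Tuple[int, int]]) -> str:
    """Split the decoded input once, then scale one copy per rung.
    Without a hardware scaler, each scale filter runs on the host CPU."""
    n = len(rungs)
    labels = "".join(f"[s{i}]" for i in range(n))
    steps = [f"[0:v]split={n}{labels}"]
    for i, (w, h) in enumerate(rungs):
        steps.append(f"[s{i}]scale={w}:{h}[out{i}]")
    return ";".join(steps)

def ffmpeg_args(src: str, rungs: List[Tuple[int, int]]) -> List[str]:
    """Assemble an FFmpeg argument list with one output file per rung."""
    args = ["ffmpeg", "-i", src, "-filter_complex", ladder_filtergraph(rungs)]
    for i in range(len(rungs)):
        # libx264 is a software placeholder for the card's own encoder.
        args += ["-map", f"[out{i}]", "-c:v", "libx264", f"rung{i}.mp4"]
    return args

# shlex.join quotes the filtergraph so the ';' and brackets survive a shell.
print(shlex.join(ffmpeg_args("input.mp4", LADDER)))
```

Even if encoding is offloaded to a card, the split and per-rung scaling in this graph still land on the host unless the device can scale in hardware.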

Finally, the T408 was launched in 2019, the first ASIC-based transcoder to ship in quite a long time. So, it’s not surprising that it doesn’t incorporate any artificial intelligence processing capabilities.

What is a Video Processing Unit (VPU)?

What’s a Video Processing Unit? A hardware device that does all that extra stuff: scaling, overlay, and AI. You see this in the transcoding pipeline shown below, which is for the NETINT Quadra.

When it came to labeling the Quadra, you see the problem: it does much more than a video transcoder. Not only does it outperform the T408 by a factor of four, it adds AV1 output and all the additional hardware functionality. It’s not a simple video transcoder; it’s a video processing unit (VPU).

As much as we’d like to lay claim to the acronym, it existed before we applied it to the Quadra. That’s not surprising, since it follows the terminology for CPU (central processing unit) and GPU (graphics processing unit). And if Webster’s defined VPU (it doesn’t)... oh, you get the point. Here’s the required Quadra glamour shot.

The NETINT Quadra Video Processing Unit.

VCUs and MSVPs

While NETINT was busy developing ASIC-based transcoders and VPUs for the mass market, large video publishers like YouTube and Meta produced their own ASICs to achieve similar benefits (and produce more acronyms). In 2021, when Google shipped their own ASIC-based transcoder called Argos, they labeled it a Video Coding Unit, or VCU.

As with the T408 and Quadra, the benefits of this ASIC-based technology are profound; as reported by CNET, “Argos handles video 20 to 33 times more efficiently than conventional servers when you factor in the cost to design and build the chip, employ it in Google’s data centers, and pay YouTube’s colossal electricity and network usage bills.” Interestingly, despite YouTube’s heavy usage of the AV1 codec, Argos encodes only H.264 and VP9, not AV1.

In May 2023, Meta released its own ASIC, which, like Argos, outputs H.264 and VP9 but not AV1. Called the Meta Scalable Video Processor (MSVP), the unit delivered impressive results, including “a throughput gain of ~9x for H.264 when compared against libx264 SW encoding…[and] a throughput gain of ~50x when compared with libVPX speed 2 preset.” Meta also noted that the unit drew only 10 watts of power, which is skimpy but still about 43% higher than the T408.
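The 43% figure is simple arithmetic on the two stated power draws, sketched below in Python. Note that it compares power per device, not per stream: the two units have different throughput, so this number alone says nothing about relative efficiency.

```python
T408_WATTS = 7.0   # T408 power draw cited earlier in this article
MSVP_WATTS = 10.0  # MSVP power draw as reported by Meta

def percent_higher(value: float, baseline: float) -> float:
    """How much larger value is than baseline, as a percentage of baseline."""
    return (value - baseline) / baseline * 100

print(f"MSVP draws {percent_higher(MSVP_WATTS, T408_WATTS):.1f}% more power per device")
```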

Of course, neither Google nor Meta sells its ASIC to third parties, so if you want the CAPEX and OPEX efficiencies that ASIC-based VPUs deliver, you’ll have to buy from NETINT. The bottom line is that whether you call it a transcoder, VPU, VCU, or MSVP, you’ll get the highest throughput and lowest power consumption if it’s powered by an ASIC.

HARD QUESTIONS ON HOT TOPICS:
ASIC-based Video Transcoder versus Video Processing Unit (VPU)
Watch the full conversation on YouTube: https://youtu.be/iO7ApppgJAg

Hardware Transcoding: What it Is, How it Works, and Why You Care

What is Transcoding?

Like most terms relating to streaming, transcoding is defined more by practice than by a dictionary. In fact, transcoding isn’t in Webster’s or many other dictionaries. That said, it’s generally accepted that transcoding means converting a file from one format to another. More particularly, it’s typically used within the context of a live-streaming application.

As an example, suppose you were watching a basketball game on NBA.tv. Assuming that the game is produced on-site, somewhere in the arena, a video mixer pulls together all video, audio, and graphics. The output would typically be fed into a device that compresses it to high-bitrate H.264 or another compressed format and sends it to the cloud. You would typically call this live encoding; if the encoder is hardware-based, it would be hardware-based live encoding.

In the cloud, the incoming stream is transcoded to lower-resolution H.264 streams for delivery to mobile and other devices, or to HEVC for delivery to smart TVs. This can be done in software but is typically performed using a hardware transcoder because it’s more efficient. More on this below.

Looking further at production workflows and how streaming terminology is commonly used: during or after the event, a video editor might create short highlights from the original H.264 video to share on social media. After editing the clip, they would encode it to H.264 or another compressed format to upload to Instagram or Facebook. Rendering the output from the software editor would typically be called encoding, not transcoding, even though the software converts H.264 input to H.264 output, just like the transcoder.

HARD QUESTIONS ON HOT TOPICS: Transcoding versus Encoding.
Watch the full conversation on YouTube: https://youtu.be/BcDVnoxMBLI

Boiling all this down in terms of common usage:

  • You encode a live stream from video input and send it to the cloud for distribution, using a live encoder that may be hardware or software.
  • In the cloud, you transcode the incoming stream to multiple resolutions or different formats using a hardware or software transcoder.
  • When outputting video for video-on-demand (VOD) deployment, you typically call this encoding (and not transcoding), even if you’re working from the same compressed format as the transcoding device.

Hardware Transcoding Alternatives

Anyone who has ever encoded a file knows that it’s a demanding process for your computer. When producing for VOD, time matters, but if the process takes a moment or two longer than planned, no one really notices. Live, of course, is different; if the video stream slows or is interrupted, viewers notice and may click to another website or change channels.

This is why hardware transcoding is typically deployed for high-volume transcoding applications. You can encode with a CPU and software, but CPUs perform multiple functions within the computer and are not optimized for transcoding. This means that a CPU-only server can produce fewer streams than a server equipped with hardware transcoders, which translates to higher CAPEX and power consumption.

As the name suggests, hardware-based transcoding uses hardware devices other than the CPU to transcode the video. One alternative is the graphics processing unit (GPU), which is highly optimized for graphics-intensive applications like gaming. GPUs support transcoding with dedicated hardware circuits, but the vast majority of their circuits are for graphics and other non-transcoding functions. While GPUs are more efficient than CPUs for transcoding, they are expensive and consume significant power.

ASIC-Based Transcoding

Which takes us to ASICs. Application-Specific Integrated Circuits (ASICs) are designed for a specific task or application, like video transcoding. Because they’re designed for this task, they are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.

Because they’re designed for this task, Application-Specific Integrated Circuits (ASICs) are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.

ALEX LIU, Co-Founder,
COO at NETINT Technologies Inc.

ASICs are also very compact, so you can pack more ASICs into a server than GPUs or CPUs, increasing the output from that server. This means that fewer servers can deliver the same number of streams as with GPU- or CPU-based transcoding, which saves the cost of additional servers, the space to house them, and their maintenance.

While we’re certainly biased, if you’re looking for a cost-effective and power-efficient hardware alternative for high-volume transcoding applications, ASIC transcoders are the way to go. Don’t take our word for it; you can read here how YouTube converted much of its production operation to the ASIC-based Argos VCU (video coding unit). Meta also recently released its own encoding ASIC. Of course, neither is for sale to the public; the primary vendor for ASIC-based transcoders is NETINT.

ASICs – The Time is Now

A brief review of the history of encoding ASICs reveals why they have become the technology of choice for high-volume video streaming services and cloud-gaming platforms.

As in all markets, new entrants will loudly announce products for maximum PR effect, promising delivery at some point in the future. But to date, outside of Google’s internal YouTube ASIC project called Argos and the recent Meta (Facebook) ASIC, also for internal use only, NETINT is the only commercial company building ASIC-based transcoders for immediate delivery.

“ASICs are the future of high-volume video transcoding as NETINT, Google, and Meta have proven. NETINT is the only vendor that offers its product for sale and immediate delivery making the T408 and Quadra safe bets.”

Delaying a critical technology decision always carries risk. The risk is that you miss an opportunity or that your competitors move ahead of you. However, waiting to consider an announced and not yet shipping product means that you ALSO assume the manufacturing, technology, and supply chain risk of THAT product.

What if you delay only to find out that the announced delivery date was optimistic at best? Or what if the vendor actually delivers, only for you to find out that its performance claims were not real? There are so many what-ifs when you wait that delaying is rarely the right decision when a viable product is available.

Now let’s review the rebirth of ASICs for video encoding and see how they’ve become the technology of choice for high-volume transcoding operations.  

The Rebirth of ASICs for Video Encoding

An ASIC is an application-specific integrated circuit, purpose-built to perform a small number of tasks with high efficiency. The history of video encoding ASICs can be traced back to the initial applications of digital video and the adoption of the MPEG-2 standard for satellite and cable transmission.

Most production MPEG-2 encoders were ASIC-based.

As is the case for most new codec standards, the first implementations of MPEG-2 compression were CPU-based. Given the cost of encoding high-quality video with commodity servers and software, dedicated hardware soon became necessary to handle the processing requirements cost-effectively.

This led to the development and application of video encoding ASICs, which are specialized integrated circuits designed to perform the processing tasks required for video encoding. Encoding ASICs provide the necessary processing power to handle the demands of high-quality video encoding while being more cost-effective than CPU-based solutions.

With the advent of the internet, the demand for digital video continued to increase. The rise of on-demand and streaming video services, such as YouTube and Netflix, led to a shift towards CPU-based encoding solutions. This was due in part to the fact that streaming video required a more flexible approach to encoding, including agile deployment in the cloud and the ability to adjust encoding parameters based on available bandwidth and device capabilities.

As the demand for live streaming services increased, the limitations of CPU-based encoding solutions became apparent. Live services such as cloud gaming and real-time interactive video, like conferencing, require processing millions of live interactive streams simultaneously at scale. This has led to a resurgence in the use of encoding ASICs for live-streaming applications. The rebirth of ASICs is upon us, and it’s a technology trend that shouldn’t be ignored, even if you work in a more traditional entertainment streaming environment.

NETINT: Leading the Resurgence

NETINT has been at the forefront of the ASIC resurgence. In 2019, the company introduced its Codensity T408 ASIC-based transcoder. This device was designed to handle 8 simultaneous HEVC or H.264 1080p video streams, making it ideal for live-streaming applications.

The T408 was well received by the market, and NETINT continued to innovate. In 2021, the company introduced its Quadra series. These devices can handle up to 32 simultaneous 1080p video streams, making them even more powerful than the T408, and they add the anticipated AV1 codec.


As described by Dylan Patel, editor of the Semianalysis blog, in his article Meet NETINT: The Startup Selling Datacenter VPUs To ByteDance, Baidu, Tencent, Alibaba, And More, “NETINT has racked up a number of major wins including major names such as ByteDance, Baidu, Tencent, Alibaba, Kuaishou, and a similar sized US-based global platform.”

NETINT Quadra T1U Video Processing Unit
– NETINT’s second generation of shipping ASIC-based transcoders.

Patel also reported that, using the HEVC codec, NETINT video transcoders and VPUs crushed NVIDIA’s T4 GPU, which is widely assumed to be the default choice when moving to a hardware encoder for the data center. The density and power efficiency achievable with a video ASIC are unmatched by CPUs and GPUs.

Patel commented further, “The comparison using AV1 is even more powerful… NETINT is the leader in merchant video encoding ASICs.”


ASIC Advantages

ASICs are designed to perform a specific task, such as encoding video, with a high degree of efficiency and speed. CPUs and GPUs are designed to perform a wide range of general-purpose computing tasks. As evidence, today’s primary application for GPUs has nothing to do with video encoding. In fact, just 5-10% of the silicon real estate on some of the most popular GPUs on the market is dedicated to video encoding or processing. Highly compute-intensive tasks like AI inferencing are the most common GPU workloads today.

The key advantage of ASICs for video encoding is that they are optimized for this specific task, with a much higher percentage of gates on the chip dedicated to encoding than CPUs and GPUs. ASICs can encode much faster and with higher quality than CPUs and GPUs, while using less power and generating less heat.

“ASICs can encode much faster and with higher quality than CPUs and GPUs while using less power and generating less heat.”

-Dylan Patel

Additionally, because ASICs are designed for a specific task, they can be more easily customized and optimized for specific use cases. Though some assume that ASICs are inflexible, with a properly designed ASIC, the target function can be tuned more finely than it could be on a general-purpose computing platform. This can lead to even greater efficiency gains and improved performance.

The key takeaway is that ASICs are a superior choice for video encoding due to their application-specific design, which allows for faster and more efficient processing compared to general-purpose CPUs and GPUs.

Confirmation from Google and Meta

Recent industry announcements from Google and Meta confirm these conclusions. When Google announced the ASIC-based Argos VCU (Video Coding Unit) in 2021, the trade press rightfully applauded. CNET announced that “Google supercharges YouTube with a custom video chip.” Ars Technica reported that Argos brought “up to 20-33x improvements in compute efficiency compared to… software on traditional servers.” SemiAnalysis reported that Argos “Replaces 10 Million Intel CPUs.”

Google’s Argos confirms the value of encoding ASICs
(and shipped 2 years after the NETINT T408).

As described in the article “Argos dispels common myths about encoding ASICs” (bit.ly/ASIC_myths), Google’s experience highlights the benefits of ASIC-based transcoders. That is, while many streaming engineers still rely on software-based transcoding, ASIC-based transcoding offers clear CAPEX, OPEX, and environmental sustainability advantages. The article also addresses outdated concerns about the shortcomings of ASICs, including sub-par quality and lack of upgradeability.

The article discusses several key findings from Google’s presentation on the Argos ASIC-based transcoder at Hot Chips 33, including:

  • Encoding time has grown 8,000-fold due to increased complexity from newer codecs, higher resolutions, and higher frame rates. ASIC-based transcoding is necessary to keep video services running smoothly.
  • ASICs can deliver near parity with software-based transcoding quality when the hardware is properly designed.
  • ASIC quality and functionality can be improved and changed long after deployment.
  • ASICs deliver unparalleled throughput and power efficiency, with Google reporting a 90% reduction in power consumption.

Though much less is known about the Meta ASIC, its announcement prompted Facebook’s Director of Video Encoding, David Ronca, to proclaim, “I propose that there are two types of companies in the video business. Those that are using Video Processing ASICs in their workflows, and those that will.”


Meta proudly announces its encoding ASIC
(3 years after NETINT’s T408 ships).

Unlike the ASICs from Google and Meta, you can actually buy ASIC-based transcoders from NETINT; in fact, tens of thousands of units are operating today in some of the largest hyperscaler networks and video streaming platforms. The fact that two of the biggest names in tech are investing in ASICs for video encoding is a clear indication of the growing trend toward application-specific hardware in video. With increasing demand for high-quality video streaming across a variety of devices and platforms, ASICs provide the speed, efficiency, and customization needed to meet it.

Avoiding Shiny New Object Syndrome

The rise of ASICs as the best method for transcoding high volumes of live video has not gone unnoticed, so you should expect product announcements promising “availability later this year.” When these occur around prominent trade shows, they can indicate a rushed announcement made for the show, and the promised availability may actually be “much later…”

It’s useful to remember that while waiting for a new product from a third-party supplier to become available, companies face three distinct risks: manufacturing, technology, and supply chain.

Manufacturing Risk:

One of the biggest risks of waiting for a new product is manufacturing risk: there is always a chance that the manufacturing process will encounter unexpected problems, causing delays and increasing costs. For example, Intel faced manufacturing issues with its 10nm processors, which delayed its upcoming processors. As a result, Intel lost market share to competitors such as AMD and NVIDIA, who were able to release their products earlier.

Technology Risk:

Another risk of waiting for a new product is technology risk: the product may not conform to the expected specifications, leading to performance issues, security concerns, or other problems. For example, NVIDIA’s RTX 2080 Ti graphics card was highly anticipated, but upon release, many users reported issues with its performance, including crashes, artifacting, and overheating. This led to a delay in the release of the RTX 3080, as NVIDIA had to address these issues before releasing the new product. Similarly, AMD’s Radeon RX 7900 XTX graphics card has been plagued with claims of overheating.

Supply Chain Risk:

The third risk associated with waiting for a new product is supply chain risk. This means that the company may be unable to get the product manufactured and shipped on time due to issues in the supply chain. For example, AMD faced supply chain issues with its Radeon RX 6800 XT graphics card, leading to limited availability and higher prices.

The reality is that any company building and launching a cloud gaming or streaming service is assuming its own technology and market risks. Compounding that risk by waiting for a product that “might” deliver minor gains in quality or performance (but equally might not) is a highly questionable decision, particularly in a market where even minor launch delays can tank a new service before it’s even off the ground.

Clearly, ASICs are the future of high-volume video transcoding; NETINT, Google, and Meta have all proven this. NETINT is the only vendor of the three that actually offers its product for sale and immediate delivery; in fast-moving markets like interactive streaming and cloud gaming, this makes NETINT’s shipping transcoders, the T408 and Quadra, the safest bets of all.

ASICs, A Preferred Technology for High Volume Transcoding

The video presented below (and the transcript) is from a talk I gave for the Streaming Video Alliance entitled The Nine Events that Shook the Codec World on March 30, 2023. During the talk, I discussed the events occurring over the previous 12-18 months that impacted codec deployment and utility.

Not surprisingly, number 1 was Google Chrome starting to play HEVC. Number 8 was Meta announcing its own ASIC-based transcoder. Given that both Google and Meta are now using ASICs in their encoding workflows, it was an important signal that ASICs were now the preferred technology for high-volume streaming.

In this excerpt from the presentation, I discuss the history of ASIC-based encoding from the MPEG-2 days of satellite and cable TV to current-day deployments in cloud gaming and other high-volume live interactive video services. Spend about 4 minutes reading the transcript or watching the video and you’ll understand why ASICs have become the preferred technology for high-volume transcoding. 

Here’s the transcript; the video is below. I will say that I heavily edited the transcript to remove the ums, ahs, and other miscues.

Historically, you can look at ASIC usage in three phases. Back when digital video was primarily deployed on satellite and cable TV in MPEG-2 format, almost all encoders were ASIC-based. That was because the CPUs at the time weren’t powerful enough to produce MPEG-2 in real time.

Then starting in around 2012 or so and ending around 2018, video processing started moving to the cloud. CPUs were powerful enough to support real-time encoding or transcoding of H.264, and ASIC usage decreased significantly.


At the time, I was writing for Streaming Media Magazine. In 2012 or 2013, Elemental came out and really hyped the fact that they had compression-centric hardware appliances for encoding. Later on, discussing the same hardware, they transitioned to what they called software-defined video processing, and that’s how they got bought by AWS. AWS now runs most of its Elemental encoding on its own Graviton CPUs.

ASICs - the latest phase

Now the latest phase. We’re seeing a lot of high-volume interactive use like gambling, auctions, high-volume UGC and other live videos, and cloud gaming. 

Codecs are also getting more complex. As we move from H.264 to HEVC to AV1 and soon to VVC and perhaps LCEVC and EVC, GPUs and CPUs can’t keep up.

At the same time, power consumption and density are becoming critical factors. Everybody’s talking about cost of power, and power consumption in data centers, and using CPUs and GPUs is just very, very inefficient.

And this is where ASICs emerge as the best solution on a cost-per-stream, watts-per-stream, and density basis. Density means how many streams we can output from a single server.

And we saw headlines like this: “Google Replaces Millions of Intel’s CPUs With Its Own Homegrown Chips.” Those homegrown chips were encoding ASICs. And then we saw Meta.

ASICs - significance

These deployments legitimize encoding ASICs as the preferred technology for high-volume transcoding, implicitly and explicitly. 


I say explicitly because of the following comments made by David Ronca, who was director of video encoding at Netflix before moving to Meta two or three years ago. Announcing Meta’s new ASIC, he said, “There are two types of companies in the video business. Those using Video Processing ASICs in their workflows, and those that will.”

Usage by Google and Meta (Facebook) gives ASICs a lot more credibility than my saying it, since NETINT obviously makes encoding ASICs. These deployments legitimize our technology. The technologies themselves are different: Meta made their own chips, Google made their own chips, and we have our own chips. But the whole technology is legitimized by the usage of these premier services.


Watch the full presentation on YouTube:
https://youtu.be/-4sJ0We0hro

Argos dispels common myths about encoding ASICs


Even in 2023, many high-volume streaming producers continue to rely on software-based transcoding, despite the clear CAPEX, OPEX, and environmental benefits of ASIC-based transcoding. Part of the inertia relates to outdated concerns about the shortcomings of ASICs, including sub-par quality and lack of flexibility to add features or codec enhancements.

As a parent, I long ago concluded that there were no words that could come out of my mouth that would change my daughter’s views on certain topics. As a marketer, I feel some of that same dynamic, that no words can come out of my keyboard that would shake the negative beliefs about ASICs from staunch software-encoding supporters.

So, don’t take our word that these beliefs are outdated; consider the results from the world’s largest video producer, YouTube. The following slides and observations are from a Google presentation by Aki Kuusela and Clint Smullen on the Argos ASIC-based transcoder at Hot Chips 33 back in August 2021. The slides are available here, and the video here.

In the presentation, the speakers discussed why YouTube developed its own ASIC and the performance and power efficiency achieved during the first 16 months of deployment. Their comments go a long way toward dispelling the myths identified above and make for interesting reading.

Advanced Codecs Mean Encoding Time Has Grown 8,000-Fold Since H.264

In discussing why Google created its own encoder, Kuusela explained that video was getting harder to compress, not only from a codec perspective but from a resolution and frame rate perspective. Here’s Kuusela (all quotes grabbed from the YouTube video and lightly edited for readability).

“In order to sustain the higher resolutions and frame rate requirements of video, we have to develop better video compression algorithms with improved compression efficiency. However, this efficiency comes with greatly increased complexity. For example, if we compare VP9 from 2013 to the decade-older H.264, the time to encode videos in software has grown 10x. The more recent AV1 format from 2018 is already 200 times more time-consuming than the H.264 standard.

If we further compound this effect with the increase in resolution and frame rate for top-quality video, we can see that the time to encode a video from 2003 to 2018 has grown eight thousand-fold. It is very obvious that the CPU performance improvement has not kept up with this massive complexity growth, and to keep our video services running smoothly, we had to consider warehouse scale acceleration. We also knew things would not get any better with the next generation of compression.”
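Kuusela doesn’t break the eight-thousand-fold figure into its parts, but if the codec and resolution/frame-rate effects multiply, as his word “compound” suggests, the remaining factor is easy to back out. This is a worked reading of the quoted numbers under that multiplicative assumption, not a breakdown Google published. It also shows why an 8,000-fold increase is far larger than an 8,000% increase.

```latex
\begin{aligned}
G_{\text{total}} &= G_{\text{codec}} \times G_{\text{res,fps}} \approx 8000 \\
G_{\text{codec}}\ (\text{AV1 vs. H.264}) &\approx 200 \\
\Rightarrow\ G_{\text{res,fps}} &\approx \frac{8000}{200} = 40 \\
\text{equivalent percentage increase} &= (G_{\text{total}} - 1) \times 100\% \approx 799{,}900\%
\end{aligned}
```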

Figure 1. Google moved to hardware
to address skyrocketing encoding times.

Reviewing Figure 1, note that though few engineers use VP9 as extensively as YouTube does, if you swap in HEVC for VP9, the complexity increase relative to H.264 is roughly the same. Beyond the higher resolutions and frame rates engineers must support to remain competitive, the need for hardware becomes even more apparent when you consider the demands of live production.

“Near Parity” with Software Encoding Quality

One consistent concern about ASICs has been quality, which admittedly lagged in early hardware generations. However, Google’s comparison shows that properly designed hardware can deliver near parity with software-based transcoding.

Kuusela doesn’t spend a lot of time on the slide shown in Figure 2, merely stating that “we also wanted to be able to optimize the compression efficiency of the video encoder based on the real-time requirements and time available for each encoder and to have full access to all quality control algorithms such as bitrate allocation and group of picture selection. So, we could get near parity to software-based encoding quality with our no-compromises implementation.”

Figure 2. Argos delivers “near-parity” with software encoders.

NETINT’s data strongly supports this claim. For example, Table 1 compares the NETINT Quadra VPU with various x265 presets. Depending upon the test configuration, Quadra delivers quality on par with the x265 medium preset. Software-based live production often requires the veryfast or ultrafast preset just to achieve marginal throughput, so in practice Quadra’s quality far exceeds that of software-based transcoding.

Table 1. Quadra HEVC quality compared to x265 in a high-quality, latency-tolerant configuration.

ASIC Performance Can Improve After Deployment

Another concern about ASIC-based transcoders is that they can’t be upgraded and therefore become obsolete quickly. Proper ASIC design avoids this by distributing encoding tasks among hardware, firmware, and control software, which preserves upgradeability.

Figure 3 shows how the bitrate efficiency of Argos’s VP9 and H.264 encoding continued to improve relative to software in the months after the product launch, even without changes to the firmware or kernel driver. The second Google presenter, Clint Smullen, attributed this to a hybrid hardware/software design, commenting that “Using a software approach was critical both to supporting the quality and feature development in the video core as well as allowing customer teams to iteratively improve quality and performance.”

Figure 3. Argos continued to improve after deployment without changes to firmware or the kernel driver.

The NETINT Codensity G4 ASIC included in the T408 and the NETINT Codensity G5 ASIC that powers our Quadra family of VPUs both use a hybrid design that distributes critical functions between the ASIC, driver software, and firmware.

We optimize our ASIC designs for functional longevity. As we’ve explained elsewhere regarding the role of firmware in ASIC implementations, “The functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher-level parts of the video codecs are in firmware and driver software and can still be changed.”

As Google’s experience and NETINT’s data show, well-designed ASICs can continue improving in quality and functionality long after deployment. 

90% Reduction in Power Consumption

Few engineers question the throughput and power efficiency of ASICs, and Google’s data bears this out. Commenting on Figure 4, Smullen stated, “For H.264 transcoding a single VCU matches the speed of the baseline system while using about one-tenth of the system level power. For VP9, a single 20 VCU machine replaces multiple racks of CPU-only systems.”

Figure 4. Throughput and comparative efficiency of Argos vs. software-only transcoding.

NETINT ASICs deliver similar results. For example, a single T408 transcoder (H.264 and HEVC) delivers roughly the same throughput as a 16-core computer encoding in software, yet it draws only about 7 watts, compared to 250+ watts for the computer. NETINT Quadra draws 20 watts, delivers roughly 4x the throughput of the T408, and adds AV1 to H.264 and HEVC. In one implementation, a single 1RU server with ten Quadras can deliver 320 1080p streams or 200 720p cloud gaming sessions. Like Argos, it replaces multiple racks of CPU-based servers.
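To see what these numbers mean per watt, here is a minimal Python sketch that uses only the figures in the paragraph above. It treats the 16-core software server as the 1.0 throughput baseline and assumes the T408 matches it exactly. Because the server draws “250+” watts, the sketch uses 250 watts, so the computed gains are conservative. The `Encoder` class and helper names are illustrative and not part of any NETINT API.

```python
from dataclasses import dataclass


@dataclass
class Encoder:
    name: str
    relative_throughput: float  # 1.0 = a 16-core server encoding in software
    watts: float

    def throughput_per_watt(self) -> float:
        return self.relative_throughput / self.watts


# Figures from the text. 250 W is the low end of "250+ watts",
# so the efficiency gains computed below are lower bounds.
SOFTWARE_SERVER = Encoder("16-core software server", 1.0, 250.0)
T408 = Encoder("T408", 1.0, 7.0)
QUADRA = Encoder("Quadra", 4.0, 20.0)


def efficiency_gain(enc: Encoder, baseline: Encoder = SOFTWARE_SERVER) -> float:
    """Throughput per watt of `enc` relative to the software baseline."""
    return enc.throughput_per_watt() / baseline.throughput_per_watt()


for enc in (T408, QUADRA):
    print(f"{enc.name}: ~{efficiency_gain(enc):.0f}x the throughput per watt of software")

# Ten Quadras delivering 320 1080p streams works out to 32 streams per card.
print(f"1080p streams per Quadra: {320 // 10}")
```

On these assumptions, the T408 delivers roughly 36x the software server’s throughput per watt, and Quadra delivers roughly 50x.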

Time to Reconsider?

As Google’s experience with YouTube and Argos shows, ASICs deliver unparalleled throughput and power efficiency in high-volume publishing workflows. If you haven’t considered ASICs for your workflow, it’s time for another look.

2022: Opportunities and Challenges for the Streaming Video Industry

As 2022 comes to a close, those in the streaming video industry will remember it as a turbulent year, but also one marked by new opportunities, including the emergence of new video platforms and services.

2022 started off with Meta’s futuristic vision of the internet known as the Metaverse. The Metaverse can be described as a combination of virtual reality, augmented reality, and video in which users interact within a digital universe. It continues to evolve alongside a trend toward unique, individual, one-to-one video streaming experiences, in contrast to the one-to-many video streaming services that are commonplace today.

Recent surveys have shown that two-thirds of consumers are planning to cut back on streaming subscriptions due to rising costs and diminishing discretionary income. As consumers become more value-conscious and price-sensitive, Netflix and other platforms have introduced new subscriber models. In addition to SVOD (Subscription Video on Demand), Netflix’s subscription offering now includes an ad-supported tier, AVOD (Advertising Video on Demand).

Netflix shows the way

This new ad-supported tier targets the most price-sensitive customers, and AVOD growth is projected to outpace SVOD growth by 3x in 2023. Netflix could potentially earn over $4B in advertising revenue, making it the second-largest ad-supported platform after YouTube. This year also saw Netflix make big moves into mobile cloud gaming with the purchase of its sixth gaming studio. Adding gaming to its product portfolio serves at least two purposes: it expands the number of platforms that can access its game titles, and it gives existing subscribers another reason to stay.

These new services and platforms are a small sample of the continued growth in streaming video, where business opportunities abound for video platforms willing to innovate and take risks.

Stop data center expansion

The new streaming video industry landscape requires platforms to provide innovative new services to highly cost-sensitive customers in a regulatory environment that discourages data center expansion. To prosper in 2023 and beyond, and to keep adding services and subscribers, video platforms must address three key issues:

  • Controlling data center sprawl – new services and extra capacity can no longer be contingent on the creation of new and larger data centers.
  • Controlling OPEX and CAPEX – in the current global economic climate, costs must be contained to keep prices in check and drive subscriber growth. In addition, given today’s economic uncertainty, access to the financing and capital needed to fund data center expansion cannot be assumed.
  • Reducing energy consumption and environmental impact – the two are intrinsically linked, and both must be reduced. Governments are now enacting environmental regulations, and platforms that ignore green policies do so at their own peril.

Application Specific Integrated Circuit

For a vision of what needs to be done to address these issues, one need only look to the recent past at YouTube’s Argos VCU (Video Coding Unit). Argos is YouTube’s in-house ASIC (Application Specific Integrated Circuit) encoder that, among other objectives, enabled YouTube to reduce its encoding costs, server footprint, and power consumption. YouTube encodes over 500 hours (about three weeks) of content every minute.
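To put that workload in perspective, here is a quick Python check that uses only the 500-hours-per-minute rate quoted above. The function names are illustrative.

```python
HOURS_PER_MINUTE = 500  # upload/encode rate quoted in the text


def days_of_content_per_minute(hours_per_minute: float) -> float:
    """Convert hours of video arriving per minute into days of video per minute."""
    return hours_per_minute / 24


def hours_of_content_per_day(hours_per_minute: float) -> float:
    """Total hours of video arriving over a 24-hour day (60 min x 24 h)."""
    return hours_per_minute * 60 * 24


print(f"{days_of_content_per_minute(HOURS_PER_MINUTE):.1f} days of video every minute")
print(f"{hours_of_content_per_day(HOURS_PER_MINUTE):,.0f} hours of video every day")
```

At about 20.8 days of video per minute, the “about three weeks” figure holds. Over a full day, that adds up to 720,000 hours of new video to encode.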

To stay ahead of this workload, Google designed its own ASIC, which enabled it to eliminate millions of Intel CPUs. Obviously, not everyone has an in-house ASIC development team. However, whether you operate a hyperscale, commercial, institutional, or government video platform, NETINT’s Codensity ASIC-powered video processing units are available to you.

To enable faster adoption, NETINT partnered with Supermicro, the global leader in green server solutions. The NETINT Video Transcoding Server is based on a 1RU Supermicro server equipped with ten NETINT T408 ASIC-based video transcoder modules. With its ASIC encoding engine, the NETINT Video Transcoding Server enables a 20x reduction in operational costs compared to CPU/software-based encoding. These massive operational savings offset the CAPEX associated with upgrading to the NETINT Video Transcoding Server.

Supermicro and T408 Server Bundle

In addition to these extraordinary cost savings, ASIC encoding can reduce the server footprint by 25x or more. This brings a corresponding reduction in power consumption and, as a bonus, a 25x reduction in carbon emissions. As a result, video platforms can expand encoding capacity without increasing their server or carbon footprints, avoiding potential regulatory setbacks.

In need of environmentally friendly technologies

2022 has seen the emergence of many new opportunities with the launch of innovative video services and platforms. To ensure the business success of these services in light of global economic uncertainty and geopolitical unrest, video platforms must rethink how these services are deployed and embrace new cost-efficient, environmentally friendly technologies.