Video Transcoder vs. Video Processing Unit (VPU)

When choosing a product for live stream processing, half the battle is knowing what to search for. Do you want a live transcoder, a video processing unit (VPU), a video coding unit (VCU), a Meta Scalable Video Processor (MSVP), or something else? If you’re not quite sure what these terms mean and how they relate, this short article will educate you in four minutes or less.

In the Beginning, There Were Transcoders

Simply stated, a transcoder is any technology, software or hardware, that can input a compressed stream (decode) and output a compressed stream (encode). FFmpeg is a transcoder, and for low-volume video-on-demand applications, it works fine.

For live applications, particularly high-volume live interactive applications (think Twitch), you’ll probably need a hardware transcoder to achieve the necessary cost per stream (CAPEX), operating cost per stream, and density.

For example, the NETINT Video Transcoding Server, a single 1RU server with ten NETINT T408 Video Transcoders, can deliver up to 80 H.264/HEVC 1080p30 streams while drawing under 250 watts. Produced in software using only the CPU, this same output could require up to ten separate 1RU servers, each drawing well over 250 watts.
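To put those density numbers in perspective, here’s a back-of-the-envelope watts-per-stream calculation in Python. The ASIC figures come from the server specs above; the software-side figure is a rough assumption drawn from the ten-server comparison, not a measured result.

```python
# Rough watts-per-stream comparison based on the figures above.
# ASIC numbers come from the NETINT Video Transcoding Server spec;
# the software-side figure assumes ten 1RU servers at ~250 W each.

STREAMS = 80  # 1080p30 H.264/HEVC streams

asic_watts = 250            # one 1RU server with ten T408s
software_watts = 10 * 250   # up to ten CPU-only 1RU servers

asic_wps = asic_watts / STREAMS
software_wps = software_watts / STREAMS

print(f"ASIC:     {asic_wps:.2f} W/stream")      # 3.12 W/stream
print(f"Software: {software_wps:.2f} W/stream")  # 31.25 W/stream
print(f"Power savings: {1 - asic_watts / software_watts:.0%}")  # 90%
```

Even with generous assumptions about the software side, the per-stream power gap is roughly an order of magnitude.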

The NETINT Codensity ASIC-based T408 Video Transcoder.

Speaking of the T408, if Webster’s defined a transcoder (it doesn’t), it might have a picture of the T408 as the perfect example of a transcoder. Based on custom transcoding ASICs, the T408 is inexpensive ($400), capable (4K @ 60 FPS or 4x 1080p60 streams), flexible (H.264 and HEVC), and exceptionally efficient (only 7 watts).

What doesn’t the T408 do? Well, that leads us to the difference between a transcoder and a VPU.

The difference between a transcoder and a Video Processing Unit (VPU)

First, the T408 doesn’t scale video. If you’re building a full encoding ladder from a high-resolution source, all the scaling for the lower rungs is performed by the host CPU. In addition, the T408 doesn’t perform overlay in hardware. So, if you insert a logo or other bug over your videos, again, the CPU does the heavy lifting.
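To illustrate what “all the scaling for the lower rungs” involves, here’s a minimal Python sketch that derives a hypothetical encoding ladder from a 1080p source. The rung resolutions and bitrates are illustrative assumptions, not NETINT recommendations; the point is that with a transcoder like the T408, each of these downscaling operations runs on the host CPU before the frames reach the card for encoding.

```python
# Hypothetical encoding ladder derived from a 1080p source.
# With a transcoder-only device, the per-rung downscaling runs on the
# host CPU; only the encode itself is offloaded to the hardware.

SOURCE = (1920, 1080)

# (height, bitrate_kbps) pairs -- illustrative values only
LADDER = [(1080, 4500), (720, 2500), (540, 1200), (360, 700)]

def rung_resolution(source, target_height):
    """Scale the source to the target height, preserving aspect ratio."""
    src_w, src_h = source
    # Keep the width even, as most codecs require even dimensions.
    width = round(src_w * target_height / src_h / 2) * 2
    return width, target_height

for height, kbps in LADDER:
    w, h = rung_resolution(SOURCE, height)
    print(f"{w}x{h} @ {kbps} kbps")
```

With a VPU, the same per-rung scaling (and any overlay) happens on the device itself, freeing the host CPU.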

Finally, the T408 was launched in 2019, the first ASIC-based transcoder to ship in quite a long time. So, it’s not surprising that it doesn’t incorporate any artificial intelligence processing capabilities.

What is a Video Processing Unit (VPU)?

What’s a Video Processing Unit? A hardware device that does all that extra stuff: scaling, overlay, and AI. You see this in the transcoding pipeline shown below, which is for the NETINT Quadra.

When it came to labeling the Quadra, you see the problem: it does much more than a video transcoder. Not only does it outperform the T408 by a factor of four, it also adds AV1 output and all the additional hardware functionality. It’s much more than a simple video transcoder; it’s a video processing unit (VPU).

As much as we’d like to lay claim to the acronym, it actually existed before we applied it to the Quadra. That’s not surprising; it follows the terminology for CPU (central processing unit) and GPU (graphics processing unit). And, if Webster’s defined VPU (it doesn’t)... oh, you get the point. Here’s the required Quadra glamour shot.

The NETINT Codensity ASIC-based Quadra T1A Video Processing Unit.

VCUs and MSVPs

While NETINT was busy developing ASIC-based transcoders and VPUs for the mass market, large video publishers like YouTube and Meta produced their own ASICs to achieve similar benefits (and produce more acronyms). In 2021, when Google shipped their own ASIC-based transcoder called Argos, they labeled it a Video Coding Unit, or VCU.

Like the T408 and Quadra, the benefits of this ASIC-based technology are profound; as reported by CNET, “Argos handles video 20 to 33 times more efficiently than conventional servers when you factor in the cost to design and build the chip, employ it in Google’s data centers, and pay YouTube’s colossal electricity and network usage bills.” Interestingly, despite YouTube’s heavy usage of the AV1 codec, Argos encodes only H.264 and VP9, not AV1.

In May 2023, Meta released their own ASIC, which, like Argos, outputs H.264 and VP9, but not AV1. Called the Meta Scalable Video Processor (MSVP), the unit delivered impressive results, including “a throughput gain of ~9x for H.264 when compared against libx264 SW encoding…[and] a throughput gain of ~50x when compared with libVPX speed 2 preset.” Meta also noted that the unit drew only 10 watts of power, which is modest, though still about 43% higher than the T408’s 7 watts.
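That 43% figure is easy to verify; the quick Python calculation below uses the 10-watt MSVP number from Meta’s announcement and the T408’s 7-watt draw cited earlier.

```python
# Power-draw comparison: Meta MSVP vs. NETINT T408.
msvp_watts = 10  # from Meta's MSVP announcement
t408_watts = 7   # T408 spec cited earlier in this article

increase = (msvp_watts - t408_watts) / t408_watts
print(f"MSVP draws {increase:.0%} more power than the T408")  # 43% more
```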

Of course, neither Google nor Meta sells their ASICs to third parties, so if you want the CAPEX and OPEX efficiencies that ASIC-based VPUs deliver, you’ll have to buy from NETINT. The bottom line is that whether you call it a transcoder, VPU, VCU, or MSVP, you’ll get the highest throughput and lowest power consumption if it’s powered by an ASIC.

HARD QUESTIONS ON HOT TOPICS:
ASIC-based Video Transcoder versus Video Processing Unit (VPU)
Watch the full conversation on YouTube: https://youtu.be/iO7ApppgJAg

Hardware Transcoding: What it Is, How it Works, and Why You Care

What is Transcoding?

Like most terms relating to streaming, transcoding is defined more by practice than by a dictionary; in fact, transcoding isn’t in Webster’s or many other dictionaries. That said, it’s generally accepted that transcoding means converting a file from one format to another. More particularly, the term is typically used within the context of a live-streaming application.

As an example, suppose you were watching a basketball game on NBA.tv. Assuming that the game is produced on-site, somewhere in the arena, a video mixer pulls together all video, audio, and graphics. The output would typically be fed into a device that compresses it to a high-bitrate H.264 or another compressed format and sends it to the cloud. You would typically call this live encoding; if the encoder is hardware-based, it would be hardware-based live encoding.

In the cloud, the incoming stream is transcoded to lower-resolution H.264 streams for delivery to mobile and other devices, or to HEVC for delivery to a smart TV. This can be done in software but is typically performed using a hardware transcoder because it’s more efficient. More on this below.

Looking further at common uses of streaming terminology: during the event or after, a video editor might create short highlights from the original H.264 video to share on social media. After editing the clip, they would encode it to H.264 or another compressed format to upload to Instagram or Facebook. Rendering the output from the editing software is typically called encoding, not transcoding, even though the software converts an H.264 input file to H.264 output, just like the transcoder does.

HARD QUESTIONS ON HOT TOPICS: Transcoding versus Encoding.
Watch the full conversation on YouTube: https://youtu.be/BcDVnoxMBLI

Boiling all this down in terms of common usage:

  • You encode a live stream from video input, in software or in hardware, to send it to the cloud for distribution. You use a live encoder, either hardware or software, for this.
  • In the cloud, you transcode the incoming stream to multiple resolutions or different formats using a hardware or software transcoder.
  • When outputting video for video-on-demand (VOD) deployment, you typically call this encoding (and not transcoding), even if you’re working from the same compressed format as the transcoding device.

Hardware Transcoding Alternatives

Anyone who has ever encoded a file knows that it’s a demanding process for your computer. When producing for VOD, time matters, but if the process takes a moment or two longer than planned, no one really notices. Live, of course, is different; if the video stream slows or is interrupted, viewers notice and may click to another website or change channels.

This is why hardware transcoding is typically deployed for high-volume transcoding applications. You can encode with a CPU and software, but CPUs perform multiple functions within the computer and are not optimized for transcoding. This means that a single CPU-only server can produce far fewer streams than one equipped with hardware transcoders, which translates to higher CAPEX and power consumption.

As the name suggests, hardware-based transcoding uses hardware devices other than the CPU to transcode the video. One alternative is the graphics processing unit (GPU), which is highly optimized for graphics-intensive applications like gaming. Transcoding is supported by dedicated hardware circuits in the GPU, but the vast majority of circuits are for graphics and other non-transcoding functions. While GPUs are more efficient than CPUs for transcoding, they are expensive and consume significant power.

ASIC-Based Transcoding

Which takes us to ASICs. Application-Specific Integrated Circuits (ASICs) are designed for a specific task or application, like video transcoding. Because they’re designed for this task, they are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.

“Because they’re designed for this task, Application-Specific Integrated Circuits (ASICs) are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.”

ALEX LIU, Co-Founder,
COO at NETINT Technologies Inc.

ASICs are also very compact, so you can pack more ASICs into a server than GPUs or CPUs, increasing the output from that server. This means that fewer servers can deliver the same number of streams as with GPU- or CPU-based transcoding, which further reduces server, space, and maintenance costs.

While we’re certainly biased, if you’re looking for a cost-effective and power-efficient hardware alternative for high-volume transcoding applications, ASIC transcoders are the way to go. But don’t take our word for it; you can read here how YouTube converted much of its production operation to the ASIC-based Argos VCU (Video Coding Unit). Meta also recently released its own encoding ASIC. Of course, neither of these is for sale to the public; the primary vendor for ASIC-based transcoders is NETINT.

ASICs – The Time is Now

A brief review of the history of encoding ASICs reveals why they have become the technology of choice for high-volume video streaming services and cloud-gaming platforms.

As in all markets, there will be new entrants that make loud announcements for maximum PR effect, promising delivery at some time in the future. But, to date, outside of Google’s internal YouTube ASIC project called Argos and the recent Meta (Facebook) ASIC, also for internal use only, NETINT is the only commercial company building ASIC-based transcoders for immediate delivery.

“ASICs are the future of high-volume video transcoding as NETINT, Google, and Meta have proven. NETINT is the only vendor that offers its product for sale and immediate delivery making the T408 and Quadra safe bets.”

Delaying a critical technology decision always carries risk. The risk is that you miss an opportunity or that your competitors move ahead of you. However, waiting to consider an announced and not yet shipping product means that you ALSO assume the manufacturing, technology, and supply chain risk of THAT product.

What if you delay only to find out that the announced delivery date was optimistic at best? Or, what if the vendor actually delivers, only for you to find out that their performance claims were not real? There are so many “what ifs” when you wait that delaying is rarely the right decision when a viable product is already available.

Now let’s review the rebirth of ASICs for video encoding and see how they’ve become the technology of choice for high-volume transcoding operations.  

The Rebirth of ASICs for Video Encoding

An ASIC is an application-specific integrated circuit, purpose-built to do a small number of tasks with high efficiency. The history of video encoding ASICs can be traced back to the initial applications of digital video and the adoption of the MPEG-2 standard for satellite and cable transmission.

Most production MPEG-2 encoders were ASIC-based.

As is the case for most new codec standards, the first implementations of MPEG-2 compression were CPU-based. But given the cost of using commodity servers and software, dedicated hardware was necessary to handle the processing requirements of high-quality video encoding cost-effectively.

This led to the development and application of video encoding ASICs, which are specialized integrated circuits designed to perform the processing tasks required for video encoding. Encoding ASICs provide the necessary processing power to handle the demands of high-quality video encoding while being more cost-effective than CPU-based solutions.

With the advent of the internet, the demand for digital video continued to increase. The rise of on-demand and streaming video services, such as YouTube and Netflix, led to a shift towards CPU-based encoding solutions. This was due in part to the fact that streaming video required a more flexible approach to encoding including implementation agility with the cloud and an ability to adjust encoding parameters based on the available bandwidth and device capabilities.

As the demand for live streaming services increased, the limitations of CPU-based encoding solutions became apparent. Live streaming services, such as cloud gaming and real-time interactive video like conferencing, require the processing of millions of live interactive streams simultaneously at scale. This has led to a resurgence in the use of encoding ASICs for live-streaming applications. Thus, the rebirth of ASICs is upon us, and it’s a technology trend that should not be ignored, even if you are working in a more traditional entertainment streaming environment.

NETINT: Leading the Resurgence

NETINT has been at the forefront of the ASIC resurgence. In 2019, the company introduced its Codensity T408 ASIC-based transcoder. This device was designed to handle 8 simultaneous HEVC or H.264 1080p video streams, making it ideal for live-streaming applications.

The T408 was well received by the market, and NETINT continued to innovate. In 2021, the company introduced its Quadra series. These devices can handle up to 32 simultaneous 1080p video streams, making them even more powerful than the T408, while also adding the much-anticipated AV1 codec.

“NETINT has racked up a number of major wins including major names such as ByteDance, Baidu, Tencent, Alibaba, Kuaishou, and a US-based global entertainment service.”

As described by Dylan Patel, editor of the Semianalysis blog, in his article Meet NETINT: The Startup Selling Datacenter VPUs To ByteDance, Baidu, Tencent, Alibaba, And More, “NETINT has racked up a number of major wins including major names such as ByteDance, Baidu, Tencent, Alibaba, Kuaishou, and a similar sized US-based global platform.”

NETINT Quadra T1U Video Processing Unit
– NETINT’s second generation of shipping ASIC-based transcoders.

Patel also reported that using the HEVC codec, NETINT video transcoders and VPUs crushed Nvidia’s T4 GPU, which is widely assumed to be the default choice when moving to a hardware encoder for the data center. The density and power consumption that can be achieved with a video ASIC is unmatched compared to CPUs and GPUs.

Patel commented further, “The comparison using AV1 is even more powerful… NETINT is the leader in merchant video encoding ASICs.”

“The comparison using AV1 is even more powerful…NETINT is the leader in video encoding ASICs.”

-Dylan Patel

ASIC Advantages

ASICs are designed to perform a specific task, such as encoding video, with a high degree of efficiency and speed, while CPUs and GPUs are designed to perform a wide range of general-purpose computing tasks. As evidence, today the primary application for GPUs has nothing to do with video encoding; in fact, just 5-10% of the silicon real estate on some of the most popular GPUs in the market is dedicated to video encoding or processing. Highly compute-intensive tasks like AI inferencing are the most common workload for GPUs today.

The key advantage of ASICs for video encoding is that they are optimized for this specific task, with a much higher percentage of gates on the chip dedicated to encoding than CPUs and GPUs. ASICs can encode much faster and with higher quality than CPUs and GPUs, while using less power and generating less heat.

“ASICs can encode much faster and with higher quality than CPUs and GPUs while using less power and generating less heat.”

-Dylan Patel

Additionally, because ASICs are designed for a specific task, they can be more easily customized and optimized for specific use cases. Though some assume that ASICs are inflexible, in reality, with a properly designed ASIC, the function it’s designed for can be tuned more highly than if the same function ran on a general-purpose computing platform. This can lead to even greater efficiency gains and improved performance.

The key takeaway is that ASICs are a superior choice for video encoding due to their application-specific design, which allows for faster and more efficient processing compared to general-purpose CPUs and GPUs.

Confirmation from Google and Meta

Recent industry announcements from Google and Meta confirm these conclusions. When Google announced the ASIC-based Argos VCU (Video Coding Unit) in 2021, the trade press rightfully applauded. CNET announced that “Google supercharges YouTube with a custom video chip.” Ars Technica reported that Argos brought “up to 20-33x improvements in compute efficiency compared to… software on traditional servers.” SemiAnalysis reported that Argos “Replaces 10 Million Intel CPUs.”

Google’s Argos confirms the value of encoding ASICs
(and shipped 2 years after the NETINT T408).

As described in the article “Argos dispels common myths about encoding ASICs” (bit.ly/ASIC_myths), Google’s experience highlights the benefits of ASIC-based transcoders. That is, while many streaming engineers still rely on software-based transcoding, ASIC-based transcoding offers a clear advantage in terms of CAPEX, OPEX, and environmental sustainability benefits. The article goes on to address outdated concerns about the shortcomings of ASICs, including sub-par quality and the lack of upgradeability.

The article discusses several key findings from Google’s presentation on the Argos ASIC-based transcoder at Hot Chips 33, including:

  • Encoding time has grown by 8000% due to increased complexity from higher resolutions and frame rates. ASIC-based transcoding is necessary to keep video services running smoothly.
  • ASICs can deliver near-parity to software-based transcoding quality with properly designed hardware.
  • ASICs quality and functionality can be improved and changed long after deployment.
  • ASICs deliver unparalleled throughput and power efficiency, with Google reporting a 90% reduction in power consumption.

Though much less is known about the Meta ASIC, its announcement prompted Facebook’s Director of Video Encoding, David Ronca, to proclaim, “I propose that there are two types of companies in the video business. Those that are using Video Processing ASICs in their workflows, and those that will.”

“…there are two types of companies in the video business. Those that are using Video Processing ASICs in their workflows, and those that will.”

Meta proudly announces its encoding ASIC
(3 years after NETINT’s T408 ships).

Unlike the ASICs from Google and Meta, you can actually buy ASIC-based transcoders from NETINT; in fact, tens of thousands of units are operating in some of the largest hyperscaler networks and video streaming platforms today. The fact that two of the biggest names in the tech industry are investing in ASICs for video encoding is a clear indication of the growing trend towards application-specific hardware in the video field. With the increasing demand for high-quality video streaming across a variety of devices and platforms, ASICs provide the speed, efficiency, and customization needed to meet these needs.

Avoiding Shiny New Object Syndrome

The status of ASICs as the best method for transcoding high volumes of live video has not gone unnoticed, so you should expect product announcements promising “availability later this year.” When these occur around prominent trade shows, it can indicate a rushed announcement made for the show, and that the later availability may actually be “much later…”

It’s useful to remember that while waiting for a new product from a third-party supplier to become available, companies face three distinct risks: manufacturing, technology, and supply chain.

Manufacturing Risk:

One of the biggest risks associated with waiting for a new product is manufacturing risk: the product may have issues in manufacturing. There is always a chance that the manufacturing process will encounter unexpected problems, causing delays and increasing costs. For example, Intel faced manufacturing issues with its 10nm process, which resulted in delays for its upcoming processors. As a result, Intel lost market share to competitors such as AMD and NVIDIA, who were able to release their products earlier.

Technology Risk:

Another risk associated with waiting for a new product is technology risk: the product may not conform to the expected specifications, leading to performance issues, security concerns, or other problems. For example, NVIDIA’s RTX 2080 Ti graphics card was highly anticipated, but upon release, many users reported issues with its performance, including crashes, artifacting, and overheating. This led to a delay in the release of the RTX 3080, as NVIDIA had to address these issues before releasing the new product. Similarly, AMD’s Radeon RX 7900 XTX graphics card has been plagued with claims of overheating.

Supply Chain Risk:

The third risk associated with waiting for a new product is supply chain risk. This means that the company may be unable to get the product manufactured and shipped on time due to issues in the supply chain. For example, AMD faced supply chain issues with its Radeon RX 6800 XT graphics card, leading to limited availability and higher prices.

The reality is that any company building and launching a cloud gaming or streaming service is assuming its own technology and market risks. Compounding that risk by waiting for a product that “might” deliver minor gains in quality or performance (but equally might not) is a highly questionable decision, particularly in a market where even minor delays in launch dates can tank a new service before it’s even off the ground.

Clearly, ASICs are the future of high-volume video transcoding; NETINT, Google, and Meta have all proven this. NETINT is the only vendor of the three that actually offers its product for sale and immediate delivery; in fast-moving markets like interactive streaming and cloud gaming, this makes NETINT’s shipping transcoders, the T408 and Quadra, the safest bets of all.

ASICs, A Preferred Technology for High Volume Transcoding

The video presented below (and the transcript) is from a talk I gave for the Streaming Video Alliance entitled The Nine Events that Shook the Codec World on March 30, 2023. During the talk, I discussed the events occurring over the previous 12-18 months that impacted codec deployment and utility.

Not surprisingly, number 1 was Google Chrome starting to play HEVC. Number 8 was Meta announcing their own ASIC-based transcoder. Given that both Google and Meta are now using ASICs in their encoding workflows, it was an important signal that ASICs were now the preferred technology for high-volume streaming.

In this excerpt from the presentation, I discuss the history of ASIC-based encoding from the MPEG-2 days of satellite and cable TV to current-day deployments in cloud gaming and other high-volume live interactive video services. Spend about 4 minutes reading the transcript or watching the video and you’ll understand why ASICs have become the preferred technology for high-volume transcoding. 

Here’s the transcript; the video is below. I will say that I heavily edited the transcript to remove the ums, ahs, and other miscues.

Historically, you can look at ASIC usage in three phases. Back when digital video was primarily deployed on satellite and cable TV in MPEG-2 format, almost all encoders were ASIC-based. That was because the CPUs at the time weren’t powerful enough to produce MPEG-2 in real time.

Then, starting in around 2012 and ending around 2018, video processing started moving to the cloud. CPUs were powerful enough to support real-time encoding or transcoding of H.264, and ASIC usage decreased significantly.

At the time, I was writing for Streaming Media Magazine. Elemental came out in 2012 or 2013 and really hyped the fact that they had compression-centric hardware appliances for encoding. Later on, discussing the same hardware, they transitioned to what they called software-defined video processing, and that’s how they got bought by AWS. AWS now does most of the encoding in Elemental products with their own Graviton CPUs.

ASICs - the latest phase

Now the latest phase. We’re seeing a lot of high-volume interactive use like gambling, auctions, high-volume UGC and other live videos, and cloud gaming. 

Codecs are also getting more complex. As we move from H.264 to HEVC to AV1 and soon to VVC and perhaps LCEVC and EVC, GPUs and CPUs can’t keep up.

At the same time, power consumption and density are becoming critical factors. Everybody’s talking about cost of power, and power consumption in data centers, and using CPUs and GPUs is just very, very inefficient.

And this is where ASICs emerge as the best solution on a cost-per-stream, watts-per-stream, and density basis. Density means how many streams we can output from a single server.

And we saw this headline: “Google Replaces Millions of Intel’s CPUs With Its Own Homegrown Chips.” Those homegrown chips were encoding ASICs. And then we saw Meta.

ASICs - significance.

These deployments legitimize encoding ASICs as the preferred technology for high-volume transcoding, implicitly and explicitly. 

“There are two types of companies in the video business. Those using Video Processing ASICs in their workflows, and those that will”.

– David Ronca

I say explicitly because of the following comments made by David Ronca, who was director of video encoding at Netflix and then moved to Meta two or three years ago. Announcing Meta’s new ASIC, he said, “There are two types of companies in the video business. Those using Video Processing ASICs in their workflows, and those that will be.”

Usage by Google and Facebook (now Meta) gives ASICs a lot more credibility than my saying it does, as obviously, NETINT makes encoding ASICs, and these deployments legitimize our technology. The technologies themselves are different: Meta made their own chips, Google made their own chips, and we have our own chips. But the whole approach is legitimized by the usage of these premier services.


Watch the full presentation on YouTube:
https://youtu.be/-4sJ0We0hro