World’s First AV1 Live Streaming CDN powered by VPUs

RealSprint’s vision for Vindral, its live-streaming CDN, is to deliver the quality of HLS and the latency of WebRTC. Early trials revealed that CPU-only transcoding lacked scalability, and GPUs used excessive power and proved challenging to configure.

Implementing NETINT’s ASIC-based Quadra delivered the required quality and latency in a low-power, simple-to-configure package with H.264, HEVC, and AV1 output. As a result, Quadra became a “preferred component” of the Vindral setup.

The RealSprint Story

RealSprint is a tech company founded in 2013 and based in Umeå, Sweden. Since its inception, RealSprint has delivered industry-defining solutions that drive real business value. Its flagship solution, Vindral live CDN, combines ultra-low latency streaming with 4K support, sync, and absolute stability. The latest addition, Composer, streamlines the setup for live video compositing, effects, and encoding.

In explaining RealSprint’s goals to Streaming Media Magazine, RealSprint CEO Daniel Alinder stated that part of the company’s goal is “to disrupt, spur innovation, and ensure high-end streaming experiences.” This focus, and RealSprint’s painstaking execution, has brought customers like Sotheby’s, Hong Kong Jockey Club, and IcelandAir into RealSprint’s client roster.

Figure 1. Check out this Vindral demo at https://demo.vindral.com/?4k

Finding the Ideal Transcoder for Vindral

The Vindral live CDN is transforming the landscape for live streaming, offering high-quality streaming at low latency with synchronized playout. This makes Vindral a strong fit for verticals such as live sports, iGaming, live auctions, and entertainment, where the desired latency is around one second and stability is imperative, even at high video quality.

Alinder explains, “It is, of course, possible to configure for 0.5-second latency as well, but none of our clients has chosen to go that low. More common focus areas are image quality and synchronized playout. A game show with host-crowd interaction does not require real-time latency. Keeping all viewers in sync, around 1 second, while maintaining full-HD quality is a common request that we see.”

Elaborating on Alinder’s comments, Niclas Åström, founder and Chief Product Officer at RealSprint, adds, “We call it the Sweet Spot. Vindral is built to put clients in charge of their own sweet spot in terms of buffer and quality. While we are highly impressed by technologies such as WebRTC, we aim to pave the way for a new mainstream in which latency is only one of the parameters.”

Expanding upon Vindral’s target use cases, Alinder details, “A typical use case is live auctions. The usual setup for live auctions is 1080P, and you want below one second of latency because people are bidding online. There are also people bidding in the actual auction house, so there’s the fairness aspect of it as well.”

“Clients typically configure around a 700-millisecond buffer, and even that small of a buffer makes such a huge difference in quality and reliability. What we see in our metrics is that, basically, 99% of the viewers watch the highest quality stream across all markets. That’s a huge deal.”

HARD QUESTIONS ON HOT TOPICS:
World’s first AV1 live streaming CDN powered by NETINT’s Quadra VPU
Watch on YouTube: https://youtu.be/Qhe6wuJoOX0

Exploring Transcoder Options

To provide this flexible latency, Vindral depends on a transcoder that produces the streams with minimal latency and a vendor-agnostic hybrid content delivery network (CDN) that delivers them. The transcoder ingests the incoming stream from the live source and produces multiple outputs for viewers watching on different devices and connections.

Choosing the transcoder is obviously a critical decision for Vindral and RealSprint. When exploring its transcoder options, RealSprint considered multiple criteria, including cost per stream, power, output quality, format support, latency, and density.

According to CTO Per Mafrost, “We started using only CPUs but quickly concluded that we needed better scalability. We moved on to using GPUs, but the hardware setups got a bit more troublesome and more energy-demanding. A year back, we got in touch with NETINT to test their ASICs and were pleased with our findings.”

Figure 2. The NETINT Quadra T2 VPU.

Quadra Fills the Gap

Specifically, Vindral implemented NETINT’s Quadra Video Processing Unit (VPU), which is driven by the Codensity G5 ASIC (Application-Specific Integrated Circuit). For transcoding, Quadra inputs H.264, HEVC, and VP9 video and outputs H.264, HEVC, and AV1, all at sub-frame latencies, meaning less than one frame time, or under about 33 milliseconds for a 30-fps input stream. Quadra is called a VPU rather than a transcoder because, in addition to audio and video transcoding, it offers onboard scaling and overlay and houses two Deep Neural Network engines capable of 18 Trillion Operations per Second (TOPS).

According to Alinder, Quadra delivers both top quality and the necessary low latency. “We’ve found that the quality when using ASICs is fantastic. It’s all depending on what you want to do. Because we need to understand we’re talking about low latency here. Everything needs to work in real time. Our requirement on encoding is that it takes a frame to encode, and that’s all the time that you get.”

Quadra’s AV1 output was another key consideration. As Alinder explained, “We’re seeing markers that our clients are going to want AV1. And there are several reasons why that is the case. One of which is, of course, it’s license free. If you’re a content owner, especially if you’re a content owner with a large crowd with many subscribers to your content, that’s a game-changer. Because the cost of licensing a codec can grow to become a significant part of your business expenses.”

Density and Power Consumption

Density refers to the number of streams a device or server can output. Because ASICs are purpose-built for video transcoding, they are extremely efficient transcoders that deliver maximum density at very low power consumption. Speaking to Quadra’s density, Alinder commented, “That is a huge game changer because ASICs are unmatched in terms of the number of streams per rack unit.”

Of course, power consumption is also critical, particularly in Europe. As Alinder detailed, “If you look at the energy crisis and how things are evolving, I’d say [power consumption] is very, very important. The typical offer you’ll be getting from the data center is: we’re going to charge you 2x the electrical bill. In Germany, the energy price peaked in August 2022 at 0.7 Euros per kilowatt hour.”
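
To put Alinder’s numbers in perspective, here is a minimal Python sketch of the arithmetic behind a data-center power bill. The 0.7 EUR/kWh price and the 2x markup come from the quote above; the constant 325 W draw is a hypothetical input chosen for illustration, not a RealSprint or NETINT measurement, and the function name is our own.

```python
# Minimal sketch: yearly electricity cost for a server running 24/7.
# The inputs in the example call are illustrative assumptions, not measured values.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_power_cost_eur(avg_watts: float,
                          price_eur_per_kwh: float,
                          markup: float = 2.0) -> float:
    """Return the yearly cost of a constant load, including a data-center markup.

    markup=2.0 models the "2x the electrical bill" offer quoted above.
    """
    kwh_per_year = avg_watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * price_eur_per_kwh * markup


if __name__ == "__main__":
    # Hypothetical 325 W server at the August 2022 German peak price.
    cost = annual_power_cost_eur(avg_watts=325, price_eur_per_kwh=0.7)
    print(f"about {cost:,.0f} EUR per year")
```

Under these assumptions the script prints about 3,986 EUR per year for a single server, and every watt shaved off a 24/7 node is worth roughly 12 EUR a year at that price.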

Beyond the data center, Vindral can also reduce energy use and carbon emissions in some cases by making travel unnecessary. As Alinder explained, “We have a Norwegian company that we’re working with that is doing remote inspections of ships. They were the first company in the world to do that. Instead of flying in an inspector, the ship owner, and two divers to the location, there’s only one operator of an underwater drone that is on the location. Everybody else is just connected. That’s obviously a good thing for the environment.”

Linear Load

One final characteristic set Quadra apart: a predictable “linear load” pattern. As CTO Mafrost described, “In choosing between different alternatives, the usual suspects such as cost, power, quality, and density were our main criteria. But another seldom-mentioned topic set NETINT ASICs apart from CPUs and many GPUs: linear load. Specifically, it was relatively easy to create a solution where we could feel safe when calculating the load and expected capacity for transcoder nodes. The density, cost/stream, and quality are bonuses.”
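
Linear load is what turns capacity planning into simple arithmetic: if every additional stream adds the same, predictable share of a node’s load, the safe stream count per node and the node count for a target audience follow directly. The sketch below assumes perfectly linear load, a per-stream load percentage you have measured yourself, and a 10% safety margin; the function names and the 1.25% example figure are hypothetical, not NETINT or RealSprint numbers.

```python
import math


def max_streams_per_node(load_pct_per_stream: float,
                         headroom_pct: float = 10.0) -> int:
    """Streams one node can carry when each stream adds a fixed load share.

    load_pct_per_stream: measured load of one stream, as a percent of the node.
    headroom_pct: capacity held back as a safety margin.
    Only valid if load really grows linearly with the number of streams.
    """
    if load_pct_per_stream <= 0:
        raise ValueError("load per stream must be positive")
    usable_pct = 100.0 - headroom_pct
    return math.floor(usable_pct / load_pct_per_stream)


def nodes_needed(total_streams: int, streams_per_node: int) -> int:
    """Integer ceiling division: nodes required to carry total_streams."""
    return -(-total_streams // streams_per_node)


if __name__ == "__main__":
    per_node = max_streams_per_node(load_pct_per_stream=1.25)
    print(per_node, "streams per node")
    print(nodes_needed(500, per_node), "nodes for 500 streams")
```

Run as a script, this prints 72 streams per node and 7 nodes for 500 streams. If load were not linear, extrapolating from a single-stream measurement like this could over- or underestimate capacity, and you would have to benchmark each node at many load levels instead.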

RealSprint began deploying NETINT Quadra VPUs in 2022. As Mafrost concluded, “Since then, ASICs have started to be a preferred component of our setup.”

Figure 3. NETINT Quadra has become a “preferred component” of Vindral.

The NETINT View

NETINT Technologies is an innovator of ASIC-based video processing solutions for low-latency video transcoding. Users of NETINT solutions realize a 10X increase in encoding density and a 20X reduction in carbon emissions compared to CPU-based software encoding solutions. NETINT makes it seamless to move from software to hardware-based video encoding so that hyper-scale services and platforms can unlock the full potential of their computing infrastructure.

Regarding Vindral’s use of Quadra, NETINT’s COO Alex Liu commented, “Live streaming video platforms demand more efficient and cost-effective video encoding solutions due to the emergence of new interactive video applications which can only be met with ASIC hardware encoding. Vindral, the industry’s first 4K AV1 streaming platform, powered by NETINT’s Quadra T2 real-time, low-latency 4K AV1 encoder, is a game changer. We are really excited about the amazing video experiences that Vindral users will bring to their customers as a result of this breakthrough in latency and quality.”

Figure 4. Streaming Media Magazine discussing Vindral with RealSprint CEO Daniel Alinder. https://youtu.be/xJ2Zfo2r7SM

The Industry Takes Notice

The potent combination of Vindral and Quadra has the industry taking notice. For example, in this Streaming Media interview, respected contributing editor Tim Siglin interviewed Alinder about Vindral, summarizing “the fact that [Quadra] is an ASIC that does more transcodes at a lower power consumption means that it gives you a better viability.” 

NETINT was the first company to ship AV1-based ASIC transcoders and has shipped tens of thousands of transcoders and VPUs, producing over 200 billion streams in 2022. In fact, NETINT has shipped more ASIC-based transcoders than any other supplier to the cloud gaming, broadcast, and similar live-streaming markets.

Validating NETINT’s approach, Google launched its own ASIC-based transcoder, called Argos, in 2021, and Meta followed with its own in 2022. Both products are used exclusively in-house by their respective companies.

The best way to leverage the benefits of encoding ASICs is to contact NETINT.

Hardware Transcoding: What it Is, How it Works, and Why You Care

What is Transcoding?

Like most terms relating to streaming, transcoding is defined more by practice than by a dictionary. In fact, transcoding isn’t in Webster’s or many other dictionaries. That said, it’s generally accepted that transcoding means converting a file from one format to another. More particularly, the term is typically used in the context of live-streaming applications.

As an example, suppose you were watching a basketball game on NBA.tv. Assuming that the game is produced on-site, somewhere in the arena, a video mixer pulls together all video, audio, and graphics. The output would typically be fed into a device that compresses it to a high-bitrate H.264 or another compressed format and sends it to the cloud. You would typically call this live encoding; if the encoder is hardware-based, it would be hardware-based live encoding.

In the cloud, the incoming stream is transcoded to lower resolution H.264 streams for delivery to mobile and other devices or HEVC for delivery to a smart TV. This can be done in software but is typically performed using a hardware transcoder because it’s more efficient. More on this below.
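
As a concrete illustration of one-in, many-out transcoding, the Python sketch below assembles (but does not run) a single FFmpeg command that decodes one input once and encodes it to several resolutions. It uses FFmpeg’s software encoders (libx264, or libx265 for HEVC); a hardware transcoder would substitute the vendor’s own FFmpeg encoder, whose name should come from the vendor documentation. The ladder rungs, bitrates, and file names are hypothetical, and audio is omitted for brevity.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One output of the encoding ladder (hypothetical values below)."""
    width: int
    height: int
    kbps: int


LADDER = [
    Rung(1920, 1080, 4500),
    Rung(1280, 720, 2500),
    Rung(640, 360, 800),
]


def build_ladder_cmd(src: str, ladder: list[Rung],
                     codec: str = "libx264") -> list[str]:
    """Return an FFmpeg argv that decodes src once and writes one file per rung."""
    cmd = ["ffmpeg", "-hide_banner", "-i", src]
    for r in ladder:
        # In FFmpeg, options placed before an output file apply to that output only.
        cmd += [
            "-map", "0:v:0",                       # first video stream of input 0
            "-c:v", codec,
            "-vf", f"scale={r.width}:{r.height}",  # per-output resize
            "-b:v", f"{r.kbps}k",                  # target video bitrate
            f"out_{r.height}p.mp4",
        ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_ladder_cmd("input.ts", LADDER)))
```

Passing codec="libx265" yields the HEVC variant of the same ladder; the structure of the command, one input followed by several fully specified outputs, is what distinguishes transcoding to a ladder from a single encode.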

Beyond live delivery, during or after the event, a video editor might create short highlights from the original H.264 video to share on social media. After editing the clip, they would encode it to H.264 or another compressed format to upload to Instagram or Facebook. You would typically call rendering the output from the software editor encoding, not transcoding, even though the software converts the H.264 input file to H.264 output, just like the transcoder.

HARD QUESTIONS ON HOT TOPICS: Transcoding versus Encoding.
Watch the full conversation on YouTube: https://youtu.be/BcDVnoxMBLI

Boiling all this down in terms of common usage:

  • You encode a live stream from video input, in software or in hardware, to send it to the cloud for distribution. You use a live encoder, either hardware or software, for this.
  • In the cloud, you transcode the incoming stream to multiple resolutions or different formats using a hardware or software transcoder.
  • When outputting video for video-on-demand (VOD) deployment, you typically call this encoding (and not transcoding), even if you’re working from the same compressed format as the transcoding device.

Hardware Transcoding Alternatives

Anyone who has ever encoded a file knows that it’s a demanding process for your computer. When producing for VOD, time matters, but if the process takes a moment or two longer than planned, no one really notices. Live, of course, is different; if the video stream slows or is interrupted, viewers notice and may click to another website or change channels.

This is why hardware transcoding is typically deployed for high-volume transcoding applications. You can encode with a CPU and software, but CPUs perform multiple functions within the computer and are not optimized for transcoding. As a result, a CPU-only server produces fewer streams than a server equipped with hardware transcoders, which translates to higher CAPEX and power consumption per stream.

As the name suggests, hardware-based transcoding uses hardware devices other than the CPU to transcode the video. One alternative is the graphics processing unit (GPU), which is highly optimized for graphics-intensive applications like gaming. GPUs support transcoding with dedicated hardware circuits, but the vast majority of their circuits are for graphics and other non-transcoding functions. While GPUs are more efficient than CPUs for transcoding, they are expensive and consume significant power.

ASIC-Based Transcoding

This brings us to ASICs. Application-Specific Integrated Circuits (ASICs) are designed for a specific task or application, like video transcoding. Because they’re designed for this task, they are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.

Because they’re designed for this task, Application-Specific Integrated Circuits (ASICs) are more efficient than CPU- or GPU-based encoding, more affordable, and more power-efficient.

ALEX LIU, Co-Founder,
COO at NETINT Technologies Inc.

ASICs are also very compact, so you can pack more ASICs into a server than GPUs or CPUs, increasing the output from that server. This means fewer servers can deliver the same number of streams as with GPU- or CPU-based transcoding, which saves on server, rack-space, and maintenance costs.

While we’re certainly biased, if you’re looking for a cost-effective and power-efficient hardware alternative for high-volume transcoding applications, ASIC transcoders are the way to go. Don’t take our word for it; you can read here how YouTube converted much of its production operation to the ASIC-based Argos VCU (for video compression unit). Meta recently also released its own encoding ASIC. Of course, neither of these is for sale to the public; the primary vendor for ASIC-based transcoders is NETINT.

NETINT Video Transcoding Server – ASIC technology at its best

Many high-volume streaming platforms and services still deploy software-only transcoding, but high energy prices for private data centers and escalating public cloud costs make the OPEX, carbon footprint, and dismal scalability unsustainable. Engineers looking for solutions to this challenge are actively exploring hardware that can integrate with their existing workflows and deliver the quality and flexibility of software with the performance and operational cost efficiency of purpose-built hardware. 

If this sounds like you, the USD $8,900 NETINT Video Transcoding Server could be the ideal solution. The server combines the Supermicro 1114S-WN10RT AMD EPYC 7543P-powered 1RU server with ten NETINT T408 video transcoders that draw just 7 watts each. Encoding HEVC and H.264 at normal or low latency, you can control transcoding operations via FFmpeg, GStreamer, or a low-level API. This makes the server a drop-in replacement for a traditional x264 or x265 FFmpeg-based or GPU-powered encoding stack.

Due to the performance advantage of ASICs compared to software running on x86 CPUs, the server can perform the equivalent work of roughly 10 separate machines running a typical open-source FFmpeg and x264 or x265 configuration. Specifically, the server can simultaneously transcode twenty 4Kp30 streams, and up to 80 1080p30 live streams. In ABR mode, the server transcodes up to 30 five-rung H.264 encoding ladders from 1080p to 360p resolution, and up to 28 four-rung HEVC encoding ladders. For engineers delivering UHD, the server can output seven 6-rung HEVC encoding ladders from 4K to 360p resolution, all while drawing less than 325 watts of total power.

This review begins with a technical description of the server and transcoding hardware and the options available to drive the encoders, including the resource manager that distributes jobs among the ten transcoders. Then we’ll review performance results for one-to-one streaming and then H.264 and HEVC ladder generation, and finish with a look at the server’s ultra-efficient power consumption.

Hardware Specs

Built on the Supermicro 1114S-WN10RT 1RU server platform, the NETINT Video Transcoding Server features ten NETINT Codensity ASIC-powered T408 video transcoders and runs Ubuntu 20.04.05 LTS. The server ships with 128 GB of DDR4-3200 RAM and a 400 GB M.2 SSD, and provides three PCIe slots and ten NVMe bays that house the ten U.2 T408 video transcoders.

You can buy the server with any of three AMD EPYC processors with 8 to 64 cores. We performed the tests for this review on the 32-core AMD EPYC 7543P CPU that doubles to 64 threads with multithreading. The server configured with the AMD EPYC 7713P processor with 64-cores and 128-threads sells for USD $11,500, and the economical AMD EPYC 7232P processor-based server with 8-cores and 16-threads lists for USD $7,000.

Regarding the server hardware, Supermicro is a leading server and storage vendor that designs, develops, and manufactures primarily in the United States. Supermicro adheres to high-quality standards, with a quality management system certified to the ISO 9001:2015 and ISO 13485:2016 standards and an environmental management system certified to the ISO 14001:2015 standard. Supermicro is also a leader in green computing and reducing data center footprints (see the white paper Green Computing: Top Ten Best Practices for a Green Data Center). As you’ll see below, this focus has resulted in an extremely power-efficient machine when operated with NETINT video transcoders.

With this as background, let’s explore the system. Once up and running in Ubuntu, you can check T408 status via the ni_rsrc_mon_logan command, which reveals the number of T408s installed and their status. Looking at Figure 1, the top table shows the decoder performance of the installed T408s, while the bottom table shows the encoding performance.

Figure 1. Tracking the operation of the T408s, decode on top, encode on the bottom.

About the T408

T408s have been in service since 2019 and are being used extensively in hyper-scale platforms and cloud gaming applications. To date, more than 200 billion viewer minutes of live video have been encoded using the T408. This makes it one of the bestselling ASIC-based encoders on the market.

The NETINT T408 is powered by the Codensity G4 ASIC technology and is available in both PCIe and U.2 form factors. The T408s installed in the server are the U.2 form factor plugged into ten NVMe bays. The T408 supports closed caption passthrough and EIA CEA-708 encode/decode, along with High Dynamic Range in HDR10 and HDR10+ formats.

“To date, more than 200 billion viewer minutes of live video have been encoded using the T408. This makes it one of the bestselling ASIC-based encoders on the market.” 

ALEX LIU, Co-Founder,
COO at NETINT Technologies Inc.

The T408 decodes and encodes H.264 and HEVC on board but performs all scaling and overlay operations via the host CPU. For one-to-one same-resolution transcoding, users can select an option called YUV Bypass that sends the video decoded by the T408 directly to the T408 encoder. This eliminates high-bandwidth trips through the bus to and from system memory, reducing the load on the bus and CPU. As you’ll see, in pure 1:1 transcode applications without overlay, CPU utilization is very low, so the T408 and server are very efficient for cloud gaming and other same-resolution, low-latency interactive applications.
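
To see why YUV Bypass matters, it helps to estimate how much raw video a non-bypass path moves across the bus. The sketch assumes 8-bit planar YUV 4:2:0 frames with no padding, and one trip to host memory plus one trip back for every decoded frame; real drivers may use other pixel formats or strides, so treat the result as an order-of-magnitude estimate. The function names are our own.

```python
def yuv420_frame_bytes(width: int, height: int, bit_depth: int = 8) -> int:
    """Size of one planar YUV 4:2:0 frame: full-res luma plus two quarter-res chroma planes."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2  # >8-bit samples usually stored in 16 bits
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


def host_round_trip_bytes_per_sec(width: int, height: int, fps: int,
                                  bit_depth: int = 8) -> int:
    """Bus traffic if every decoded frame goes card -> host memory -> card."""
    return 2 * yuv420_frame_bytes(width, height, bit_depth) * fps


if __name__ == "__main__":
    per_stream = host_round_trip_bytes_per_sec(1920, 1080, 30)
    print(f"{per_stream / 1e6:.1f} MB/s per 1080p30 stream")
    print(f"{80 * per_stream / 1e9:.1f} GB/s for 80 streams")
```

Under these assumptions, a 1080p30 stream generates about 186.6 MB/s of round-trip traffic, or roughly 14.9 GB/s across 80 streams, which is the traffic a same-resolution YUV Bypass transcode keeps on the card.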

Figure 2. The T408 is powered by the Codensity G4 ASIC.

Testing Overview

We tested the server with FFmpeg and GStreamer. As you’ll see, performance was similar in most operations. In some simple transcoding applications, FFmpeg pulled ahead, while in more complex encoding ladder productions, particularly 4K ladders, GStreamer proved more performant, especially for low-latency output.

Figure 3. The software architecture for controlling the server.  

Operationally, both GStreamer and FFmpeg communicate with the libavcodec layer that functions between the T408 NVME interface and the FFmpeg software layer. This allows existing FFmpeg and GStreamer-based transcoding applications to control server operation with minimal changes.

To allocate jobs to the ten T408s, the T408 device driver software includes a resource management module that tracks T408 capacity and usage load to present inventory and status on available resources and enable resource distribution. There are several modes of operation, including auto, which automatically distributes the work among the available resources.

Alternatively, you can manually assign decoding and encoding tasks to different T408 devices in the command line or application and control which streams are decoded by the host CPU or a T408. With these and similar controls, you can efficiently balance the overall transcoding load between the T408s and host CPU to maximize throughput. We used auto distribution for all tests.
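
The internals of NETINT’s resource manager aren’t described here, so the following is only a hypothetical sketch of one common way an “auto” mode can work: send each new job to the device with the most remaining capacity, and refuse the job if no device can take it. The Device class, the abstract load units, and the tie-breaking rule are all our assumptions, not the T408 driver’s actual behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Device:
    """A transcoder card tracked by a (hypothetical) resource manager."""
    index: int
    capacity: float = 100.0  # abstract load units; 100 = fully busy
    load: float = 0.0
    jobs: list[str] = field(default_factory=list)

    def headroom(self) -> float:
        return self.capacity - self.load


def assign(devices: list[Device], job: str, job_load: float) -> Optional[int]:
    """Least-loaded placement: pick the device with the most headroom.

    Ties go to the lowest device index. Returns None if no device fits the job.
    """
    candidates = [d for d in devices if d.headroom() >= job_load]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: (d.headroom(), -d.index))
    best.load += job_load
    best.jobs.append(job)
    return best.index


if __name__ == "__main__":
    cards = [Device(i) for i in range(10)]  # ten cards, as in the server reviewed here
    placements = [assign(cards, f"stream{n}", 25.0) for n in range(12)]
    print(placements)
```

Run as a script, the example prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]: the first ten streams spread across all ten cards before any card receives a second one. Manual assignment, as described above, amounts to bypassing this choice and naming the device directly.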

Testing Procedures

We tested using Server version 1.0, running FFmpeg v4.3.1, GStreamer v1.18, and T408 release 3.2.0. We tested with two use cases in mind. The first is single stream in, single stream out, either at the same resolution as the incoming stream or at a lower resolution. This mode of operation is used in many interactive applications like cloud gaming, real-time gaming, and auctions where the absolute lowest latency is required. We also tested scaling performance since many interactive applications scale the input to a lower resolution.

The second use case is ABR, where a single input stream is transcoded to a full encoding ladder. In both modes, we tested normal and low-latency performance. To simulate live streaming and minimize file I/O as a drag on system performance, we retrieved the source file from a RAM drive on the server and delivered the encoded file to RAM.

HARD QUESTIONS ON HOT TOPICS
All you need to know about NETINT Transcoding Server powered by ASICs
Watch the full conversation on YouTube: https://youtu.be/6j-dbPbmejw

One-to-One Performance

Table 1 shows transcoding results for 4K, 1080p, and 720p in latency tolerant and low-delay modes. Instances is the number of full frame rate outputs produced by the system, with CPU utilization shown for reference. These results are most relevant for cloud gaming and similar applications that input a single stream, transcode the stream at full resolution, and distribute it.

As you can see, 4K results peak at 20 streams for all codecs, though results differ by the software program used to generate the streams. The number of 1080p outputs ranges from 70 to 80, while 720p outputs range from 140 to 170. As you would expect, CPU utilization is extremely low for all test cases because the T408s shoulder the complete decoding and encoding role. This means that performance is limited by T408 throughput, not the CPU, and that the 64-core CPU probably wouldn’t produce any extra streams in this use case. For pure transcoding operations, the 8-core server would likely suffice, though given the minimal price differential between the 8-core and 32-core systems, opting for the higher-end model is a prudent investment.

Latency

As for latency, in normal mode, latency averaged around 45 ms for 4K transcoding and 34 ms for 1080p and 720p transcoding. In low-delay mode, this dropped to around 24 ms for 4K, 7 ms for 1080p, and 3 ms for 720p, all at 30 fps and measured with FFmpeg. For reference, at 30 fps, each frame is displayed for 33.33 ms. Even in latency-tolerant mode, latency is about 1.35 frames for 4K and roughly one frame for 1080p and 720p. In low-delay mode, all resolutions are under a single frame of latency.
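
Converting milliseconds to frames is simple arithmetic: one frame lasts 1000/fps milliseconds, so latency in frames equals latency_ms × fps / 1000. This small Python helper (our own, for illustration) applies that conversion to the 30 fps FFmpeg measurements quoted above.

```python
def frame_duration_ms(fps: float) -> float:
    """How long each frame is displayed, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Express a latency as a number of frame durations."""
    return latency_ms * fps / 1000.0


if __name__ == "__main__":
    measured_ms = {  # 30 fps measurements quoted in the text above
        "4K normal": 45,
        "1080p/720p normal": 34,
        "4K low delay": 24,
        "1080p low delay": 7,
        "720p low delay": 3,
    }
    print(f"frame time at 30 fps: {frame_duration_ms(30):.2f} ms")
    for label, ms in measured_ms.items():
        print(f"{label}: {latency_in_frames(ms, 30):.2f} frames")
```

With these figures the script reports 1.35 frames for 4K and 1.02 frames for 1080p/720p in normal mode, and 0.72, 0.21, and 0.09 frames in low-delay mode.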

It’s worth noting that while software performance would drop significantly from H.264 to HEVC, hardware performance does not. Thus questions of codec performance for more advanced standards like HEVC do not apply when using ASICs. This is good news for engineers adopting HEVC, and those considering HEVC in the future. It means you can buy the server, comfortable in the knowledge that it will perform equally well (if not better) for HEVC encoding or transcoding.

Table 1. Full-resolution transcodes with FFmpeg and GStreamer in regular and low-delay modes.

Table 2 shows the performance when scaling from 4K to 1080p and from 1080p to 720p, again by the different codecs in and out. Since scaling is performed by the host CPU, CPU usage increases significantly, particularly on the higher-volume 1080p to 720p output. Still, given that CPU utilization never exceeds 35%, the gating factor to system performance appears to be T408 throughput. Again, while the 8-core system might produce similar output, if your application involves scaling, the 32-core system is the safer choice.

In these tests, latency was slightly higher than pure transcoding. In normal mode, 4K > 1080p latencies topped out at 46 ms and dropped to 39 ms for 1080p > 720p scaling, just over a single frame of latency. In low latency mode, these results dropped to 10 ms for 4K > 1080p and 10 ms for 1080p > 720p. As before, these latency results are for 30fps and were measured with FFmpeg.

Table 2: Performance while scaling from 4K to 1080p and 1080p to 720p.

The final set of tests involves transcoding to the AVC and HEVC encoding ladders shown in Table 3. These results will be most relevant to engineers distributing full encoding ladders in HLS, DASH, or CMAF containers.

Here we see the most interesting discrepancies between FFmpeg and GStreamer, particularly in low delay modes and in 4K results. In the 1080p AVC tests, FFmpeg produced 30 5-rung encoding ladders in normal mode but dropped to nine in low-delay mode. GStreamer produced 30 encoding ladders in both modes using substantially lower CPU resources. You see the same pattern in the 1080p four-rung HEVC output where GStreamer produced more ladders than FFmpeg using lower CPU resources in both modes.

Table 3. Full encoding ladders output in the listed modes.

FFmpeg produced very poor results in 4K testing, particularly in low latency mode, and it was these results that drove the testing with GStreamer. As you can see, GStreamer produced more streams in both modes and CPU utilization again remained very low. As with the previous results, the low CPU utilization means that the results reflect the encoding limits of the T408. For this reason, it’s unlikely that the higher end server would produce more encoding ladders.

In terms of latency, in normal mode, latency was 59 ms for the H.264 ladder, 72 ms for the four-rung 1080p HEVC ladder, and 52 ms for the 4K HEVC ladder. These numbers dropped to 5 ms, 7 ms, and 9 ms for the respective configurations in low-latency mode.

Power Consumption

Power consumption is an obvious concern for all video engineers and operations teams. To assess system power consumption, we tested using the IPMI Tool. When running completely idle, the system consumed 154 watts, while at maximum CPU, the unit averaged 400 watts with a peak of 425 watts.

We measured consumption during the three basic operations tested (pure transcoding, transcoding with scaling, and ladder creation), in each case testing the GStreamer scenario that produced the highest recorded CPU usage. You see the results in Table 4.

When you consider that CPU-only transcoding would yield a fraction of the outputs shown while consuming 25-30% more power, you can see that the T408 is exceptionally efficient when it comes to power consumption. The Watts/Output figure provides a useful comparison for other competitive systems, whether CPU or GPU-based.
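
The Watts/Output metric is system power divided by the number of outputs. The sketch below computes it two ways, all-in and incremental above idle, using illustrative inputs only: the 154 W idle reading reported above, a 325 W full-load draw, and 80 1080p30 outputs (the server’s stated maximum for that resolution). Pairing those numbers is our assumption, not a measured result, and the helper name is our own.

```python
def watts_per_output(system_watts: float, outputs: int,
                     idle_watts: float = 0.0) -> float:
    """Average power per output stream.

    With idle_watts=0 this is the all-in figure; passing the idle draw
    gives the incremental power attributable to the transcoding work.
    """
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return (system_watts - idle_watts) / outputs


if __name__ == "__main__":
    # Illustrative inputs, not a measured pairing.
    print(f"all-in:      {watts_per_output(325, 80):.2f} W per stream")
    print(f"incremental: {watts_per_output(325, 80, idle_watts=154):.2f} W per stream")
```

With these inputs the all-in figure is roughly 4.1 W per stream and the incremental figure about 2.1 W; comparing systems only makes sense when resolution, frame rate, and latency mode match.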

Table 4. Power consumption during the specified operation.

Conclusion

With impressive density, low power consumption, and multiple integration options, the NETINT Video Transcoding Server is the new standard to beat for live streaming applications. With a lower price model available for pure encoding operations, and a more powerful model for CPU-intensive operations, the NETINT server family meets a broad range of requirements.

ASICs – The Time is Now

A brief review of the history of encoding ASICs reveals why they have become the technology of choice for high-volume video streaming services and cloud-gaming platforms.

As in all markets, there will be new entrants that loudly announce products for maximum PR effect, promising delivery at some time in the future. But, to date, outside of Google’s internal YouTube ASIC project called ARGOS and the recent Meta (Facebook) ASIC, also for internal use only, NETINT is the only commercial company building ASIC-based transcoders for immediate delivery.

Delaying a critical technology decision always carries risk: you may miss an opportunity, or your competitors may move ahead of you. However, if you wait for an announced but not-yet-shipping product, you ALSO assume the manufacturing, technology, and supply chain risk of THAT product.

What if you delay, only to find out that the announced delivery date was optimistic at best? Or what if the vendor actually delivers, only for you to find out that its performance claims were not real? There are so many “what ifs” when you wait that delaying is rarely the right decision when a viable product is available.

Now let’s review the rebirth of ASICs for video encoding and see how they’ve become the technology of choice for high-volume transcoding operations.  

The Rebirth of ASICs for Video Encoding

An ASIC (application-specific integrated circuit) is a chip purpose-built to perform a small number of tasks with high efficiency. The history of video encoding ASICs can be traced back to the first applications of digital video and the adoption of the MPEG-2 standard for satellite and cable transmission.

Most production MPEG-2 encoders were ASIC-based.

As is the case for most new codec standards, the first implementations of MPEG-2 compression were CPU-based. However, given the cost of using commodity servers and software, dedicated hardware was necessary to handle the processing requirements of high-quality video encoding cost-effectively.

This led to the development and application of video encoding ASICs, which are specialized integrated circuits designed to perform the processing tasks required for video encoding. Encoding ASICs provide the necessary processing power to handle the demands of high-quality video encoding while being more cost-effective than CPU-based solutions.

With the advent of the internet, the demand for digital video continued to increase. The rise of on-demand and streaming video services, such as YouTube and Netflix, led to a shift towards CPU-based encoding solutions. This was partly because streaming video required a more flexible approach to encoding, including agile implementation in the cloud and the ability to adjust encoding parameters based on available bandwidth and device capabilities.

As the demand for live streaming services increased, the limitations of CPU-based encoding solutions became apparent. Live services such as cloud gaming and real-time interactive video like conferencing require processing millions of live interactive streams simultaneously, at scale. This has led to a resurgence in the use of encoding ASICs for live-streaming applications. The rebirth of ASICs is upon us, and it’s a technology trend that should not be ignored, even if you work in a more traditional entertainment streaming environment.

NETINT: Leading the Resurgence

NETINT has been at the forefront of the ASIC resurgence. In 2019, the company introduced its Codensity T408 ASIC-based transcoder. This device was designed to handle 8 simultaneous HEVC or H.264 1080p video streams, making it ideal for live-streaming applications.

The T408 was well received by the market, and NETINT continued to innovate. In 2021, the company introduced its Quadra series. These devices can handle up to 32 simultaneous 1080p video streams, making them even more powerful than the T408, and they add support for the anticipated AV1 codec.

As described by Dylan Patel, editor of the SemiAnalysis blog, in his article Meet NETINT: The Startup Selling Datacenter VPUs To ByteDance, Baidu, Tencent, Alibaba, And More, “NETINT has racked up a number of major wins including major names such as ByteDance, Baidu, Tencent, Alibaba, Kuaishou, and a similar sized US-based global platform.”

NETINT Quadra T1U Video Processing Unit
– NETINT’s second generation of shipping ASIC-based transcoders.

Patel also reported that, using the HEVC codec, NETINT video transcoders and VPUs crushed NVIDIA’s T4 GPU, which is widely assumed to be the default choice when moving to a hardware encoder in the data center. The density and power efficiency achievable with a video ASIC are unmatched by CPUs and GPUs.

Patel commented further, “The comparison using AV1 is even more powerful… NETINT is the leader in merchant video encoding ASICs.”

ASIC Advantages

ASICs are designed to perform a specific task, such as encoding video, with a high degree of efficiency and speed, whereas CPUs and GPUs are designed to perform a wide range of general-purpose computing tasks. As evidence, the primary application for GPUs today has nothing to do with video encoding. In fact, just 5-10% of the silicon real estate on some of the most popular GPUs on the market is dedicated to video encoding or processing. Highly compute-intensive tasks like AI inferencing are the most common GPU workloads today.

The key advantage of ASICs for video encoding is that they are optimized for this specific task, with a much higher percentage of gates on the chip dedicated to encoding than CPUs and GPUs. ASICs can encode much faster and with higher quality than CPUs and GPUs, while using less power and generating less heat.

Additionally, because ASICs are designed for a specific task, they can be more easily customized and optimized for specific use cases. Though some assume that ASICs are inflexible, in reality, a properly designed ASIC allows the function it’s designed for to be tuned more finely than if that function ran on a general-purpose computing platform. This can lead to even greater efficiency gains and improved performance.

The key takeaway is that ASICs are a superior choice for video encoding due to their application-specific design, which allows for faster and more efficient processing compared to general-purpose CPUs and GPUs.

Confirmation from Google and Meta

Recent industry announcements from Google and Meta confirm these conclusions. When Google announced the ASIC-based Argos VCU (Video Coding Unit) in 2021, the trade press rightfully applauded. CNET announced that “Google supercharges YouTube with a custom video chip.” Ars Technica reported that Argos brought “up to 20-33x improvements in compute efficiency compared to… software on traditional servers.” SemiAnalysis reported that Argos “Replaces 10 Million Intel CPUs.”

Google’s Argos confirms the value of encoding ASICs
(and shipped 2 years after the NETINT T408).

As described in the article “Argos dispels common myths about encoding ASICs” (bit.ly/ASIC_myths), Google’s experience highlights the benefits of ASIC-based transcoders. That is, while many streaming engineers still rely on software-based transcoding, ASIC-based transcoding offers clear CAPEX, OPEX, and environmental sustainability benefits. The article goes on to address outdated concerns about the shortcomings of ASICs, including sub-par quality and the lack of upgradeability.

The article discusses several key findings from Google’s presentation on the Argos ASIC-based transcoder at Hot Chips 33, including:

  • Encoding time has grown by 8000% due to increased complexity from higher resolutions and frame rates. ASIC-based transcoding is necessary to keep video services running smoothly.
  • ASICs can deliver near-parity to software-based transcoding quality with properly designed hardware.
  • ASIC quality and functionality can be improved and changed long after deployment.
  • ASICs deliver unparalleled throughput and power efficiency, with Google reporting a 90% reduction in power consumption.

Though much less is known about the Meta ASIC, its announcement prompted Facebook’s Director of Video Encoding, David Ronca, to proclaim, “I propose that there are two types of companies in the video business. Those that are using Video Processing ASICs in their workflows, and those that will.”

Meta proudly announces its encoding ASIC
(3 years after NETINT’s T408 shipped).

Unlike the ASICs from Google and Meta, you can actually buy ASIC-based transcoders from NETINT; in fact, tens of thousands of units are operating in some of the largest hyperscaler networks and video streaming platforms today. The fact that two of the biggest names in the tech industry are investing in ASICs for video encoding is a clear indication of the growing trend towards application-specific hardware in the video field. With the increasing demand for high-quality video streaming across a variety of devices and platforms, ASICs provide the speed, efficiency, and customization needed to meet these needs.

Avoiding Shiny New Object Syndrome

The emergence of ASICs as the best method for transcoding high volumes of live video has not gone unnoticed, so you should expect product announcements promising “availability later this year.” When these occur around prominent trade shows, they may be rushed announcements made for the show, and the promised availability may actually be “much later…”

It’s useful to remember that while waiting for a new product from a third-party supplier to become available, companies face three distinct risks: manufacturing, technology, and supply chain.

Manufacturing Risk:

One of the biggest risks associated with waiting for a new product is manufacturing risk: there is always a chance that the manufacturing process will encounter unexpected problems, causing delays and increasing costs. For example, Intel faced manufacturing issues with its 10nm processors, which delayed its upcoming processors. As a result, Intel lost market share to competitors such as AMD and NVIDIA, who were able to release their products earlier.

Technology Risk:

Another risk associated with waiting for a new product is technology risk: the product may not conform to the expected specifications, leading to performance issues, security concerns, or other problems. For example, NVIDIA’s RTX 2080 Ti graphics card was highly anticipated, but upon release, many users reported issues with its performance, including crashes, artifacting, and overheating. This led to a delay in the release of the RTX 3080, as NVIDIA had to address these issues before releasing the new product. Similarly, AMD’s Radeon RX 7900 XTX graphics card has been plagued with claims of overheating.

Supply Chain Risk:

The third risk associated with waiting for a new product is supply chain risk. This means that the company may be unable to get the product manufactured and shipped on time due to issues in the supply chain. For example, AMD faced supply chain issues with its Radeon RX 6800 XT graphics card, leading to limited availability and higher prices.

The reality is that any company building and launching a cloud gaming or streaming service is assuming its own technology and market risks. Compounding that risk by waiting for a product that “might” deliver minor gains in quality or performance (but equally might not) is a highly questionable decision, particularly in a market where even minor delays in launch dates can tank a new service before it’s even off the ground.

Clearly, ASICs are the future of high-volume video transcoding; NETINT, Google, and Meta have all proven this. NETINT is the only vendor of the three that actually offers its product for sale and immediate delivery; in fast-moving markets like interactive streaming and cloud gaming, this makes NETINT’s shipping transcoders, the T408 and Quadra, the safest bets of all.

ASICs, A Preferred Technology for High Volume Transcoding

The video presented below (and the transcript) is from a talk I gave for the Streaming Video Alliance entitled The Nine Events that Shook the Codec World on March 30, 2023. During the talk, I discussed the events occurring over the previous 12-18 months that impacted codec deployment and utility.

Not surprisingly, number 1 was Google Chrome starting to play HEVC. Number 8 was Meta announcing its own ASIC-based transcoder. Given that both Google and Meta are now using ASICs in their encoding workflows, it was an important signal that ASICs were now the preferred technology for high-volume streaming.

In this excerpt from the presentation, I discuss the history of ASIC-based encoding from the MPEG-2 days of satellite and cable TV to current-day deployments in cloud gaming and other high-volume live interactive video services. Spend about 4 minutes reading the transcript or watching the video and you’ll understand why ASICs have become the preferred technology for high-volume transcoding. 

Here’s the transcript; the video is below. Note that I heavily edited the transcript to remove the ums, ahs, and other miscues.

Historically, you can look at ASIC usage in three phases. Back when digital video was primarily deployed on satellite and cable TV in MPEG-2 format, almost all encoders were ASIC-based. And that was because the CPUs at the time weren’t powerful enough to produce MPEG-2 in real time.

Then starting in around 2012 or so and ending around 2018, video processing started moving to the cloud. CPUs were powerful enough to support real-time encoding or transcoding of H.264, and ASIC usage decreased significantly.

Back then, I was writing for Streaming Media Magazine, and in 2012 or 2013, Elemental really hyped the fact that they had compression-centric hardware appliances for encoding. Later on, discussing the same hardware, they transitioned to what they called software-defined video processing, and that’s how they got bought by AWS. AWS now does most of its Elemental encoding on its own Graviton CPUs.

ASICs - the latest phase

Now the latest phase. We’re seeing a lot of high-volume interactive use like gambling, auctions, high-volume UGC and other live videos, and cloud gaming. 

Codecs are also getting more complex. As we move from H.264 to HEVC to AV1 and soon to VVC and perhaps LCEVC and EVC, GPUs and CPUs can’t keep up.

At the same time, power consumption and density are becoming critical factors. Everybody’s talking about the cost of power and power consumption in data centers, and using CPUs and GPUs is just very, very inefficient.

And this is where ASICs emerge as the best solution on a cost-per-stream, watts-per-stream, and density basis. Density means how many streams we can output from a single server.

And we saw this, “Google Replaces Millions of Intel’s CPUs With Its Own Homegrown Chips.” Those homegrown chips were encoding ASICs. And then we saw Meta. 

ASICs - significance

These deployments legitimize encoding ASICs as the preferred technology for high-volume transcoding, implicitly and explicitly. 

I say explicitly because of the following comment by David Ronca, who was director of video encoding at Netflix before moving to Meta two or three years ago. Announcing Meta’s new ASIC, he said, “There are two types of companies in the video business. Those using Video Processing ASICs in their workflows, and those that will be.”

Usage by Google and Meta (Facebook) gives ASICs a lot more credibility than you get from me saying it, as obviously, NETINT makes encoding ASICs. These deployments legitimize our technology. The technologies themselves are different: Meta made its own chips, Google made its own chips, and we have our own chips. But the whole technology is legitimized by the usage of these premier services.


Watch the full presentation on YouTube:
https://youtu.be/-4sJ0We0hro

Cloud Gaming Economic Factors and Technical Considerations

Cloud Gaming Economic Factors

The gaming industry has come a long way. In 2022 it played host to an estimated 3.2 billion players worldwide, generating a total revenue of $184.4 billion, according to Newzoo.

One of the most remarkable developments in recent years has been the accessibility and affordability of gaming. Players can now enjoy gameplay on almost any device connected to the Internet via subscription services in addition to traditional PC and console games.

Game publishers have made great strides in adopting the latest graphics and hardware technologies. However, delaying the move from console-based approaches to cloud gaming could open the door for disruption from subscription video platforms like Netflix. Just as Netflix disrupted the home entertainment rental ecosystem with its always-available subscription streaming service, it could do the same with gaming.

Cloud gaming platforms operate in a highly competitive environment with narrow margins. In the United States, popular cloud gaming platforms like Amazon Luna start at $4.99 per month. This makes choosing the right GPU for game graphics rendering, and the right video encoder, essential for profitability and competitiveness. Cloud gaming platforms specifying video encoders should consider four key factors: CAPEX, OPEX, quality, and not funding their competitors.

Lowest Cost Per Stream

For a cloud gaming platform, cost per stream reflects the initial investment required to set up the platform, including the cost of servers and encoders, divided by the number of streams that hardware can deliver. For a managed service like a cloud gaming platform, cost per stream affects profitability to the point of determining whether the entire business model is viable.

ASICs are the secret to making a cloud gaming service viable. With an ASIC-based encoder like the NETINT Quadra T2 VPU (Video Processing Unit) coupled with a GPU from AMD, a single server can deliver as many as 200 simultaneous 720p60 gameplay sessions. This beats the previous high-water mark of 48 gameplay sessions using eight GPUs in a single server chassis.

Lowest Possible OPEX Per Stream

OPEX (Operating Expense) represents the ongoing costs of running the platform, including electricity, bandwidth, and maintenance. Energy (electricity) costs are a significant part of OPEX, and they are increasing in many regions. This makes power consumption a key consideration when choosing an encoder.

NETINT VPUs are the ideal hedge against rising energy costs, ensuring the platform remains viable despite uncertain energy and economic conditions.

Compared to CPU-based encoding with software, the Quadra T2 VPU consumes 10 to 20 times less energy, drawing only 40 watts while delivering the same throughput. Depending on the host server configuration, as many as ten VPUs may be installed, making each server the functional equivalent of ten to twenty high-end servers.

Rack space requirements should also be considered. With colocation prices ranging from $50 to $300 per server per month, the 19 additional servers needed in a software-only implementation would cost up to an extra $5,700 per month for 200 gamers (colocation costs only). While costs may be lower if housed in your own facility, you still need racks, cooling, and maintenance for 20 servers compared to one.
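
The $5,700 figure follows from the 20-versus-one server comparison: 19 extra servers at the top colocation rate. A small worked sketch using only the numbers in the paragraph (the function name is illustrative):

```python
def extra_colocation_cost(servers_needed: int, servers_baseline: int,
                          monthly_rate: float) -> float:
    """Monthly colocation cost of the servers beyond the baseline deployment."""
    extra = max(servers_needed - servers_baseline, 0)
    return extra * monthly_rate


# 20 software-only servers vs. one VPU server for 200 gamers
low = extra_colocation_cost(20, 1, 50)    # 19 servers at $50
high = extra_colocation_cost(20, 1, 300)  # 19 servers at $300
print(f"${low:,.0f} to ${high:,.0f} per month")  # $950 to $5,700 per month
```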

With subscriber rates starting at $4.99 per month, and in some cases lower, margins are razor-thin, making high-density transcoding and efficient power usage essential to profitability. This should put ASIC-based transcoders on the shortlist of every cloud gaming service.

Quality Considerations

A long-lingering misconception about ASICs is that their quality cannot match that of software encoders. Of course, video quality depends on the configuration options and the mode in which the encoder is operated. Internal tests show that the HEVC output quality of NETINT VPUs is quite competitive with software and other hardware transcoders, especially when run in their lowest-latency mode. See Table 1.

For example, the Quadra VPU produced better output quality than NVENC, the popular encoder available on NVIDIA’s more recent GPUs, and than x265 at presets up to medium. x265’s medium preset produces quality close to VOD, but it is not commonly used for live streaming because of the computing power it requires.

Most live streaming engineers use the x265 veryfast or superfast presets. Compared to the x265 superfast preset, the Quadra VPU produced the same quality at a 25% lower bitrate, which translates to significant savings.

Table 1. BD-Rate PSNR quality comparisons between Quadra, x265, and the NVIDIA RTX 3090 encoder in low latency settings.

At the extreme right, you see that Quadra matched the quality of the NVIDIA RTX 3090 HEVC encoder with up to an 11.57% bitrate reduction. ASICs producing quality that rivals software encoding is not unusual. As discussed earlier, Google has also achieved near-software quality with its ASIC-based Argos transcoder. This clearly shows that you do not need to compromise on quality to achieve the density and efficiency benefits of ASIC-based transcoding.
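
Table 1 reports BD-Rate (PSNR). For readers unfamiliar with the metric, here is a minimal sketch of the standard Bjøntegaard delta-rate calculation: fit a cubic polynomial of log-bitrate against PSNR for each encoder, integrate both over the PSNR range they share, and convert the average log-rate difference into a percentage. This is the textbook formulation, not necessarily the exact tool used for Table 1; a negative result means the test encoder needs less bitrate for the same quality.

```python
import numpy as np


def bd_rate(ref_rates, ref_psnr, test_rates, test_psnr) -> float:
    """Bjontegaard delta rate (%) of the test encoder versus the reference.

    Each encoder needs at least four (bitrate, PSNR) points for the cubic fit.
    """
    ref_log = np.log(np.asarray(ref_rates, dtype=float))
    test_log = np.log(np.asarray(test_rates, dtype=float))
    ref_psnr = np.asarray(ref_psnr, dtype=float)
    test_psnr = np.asarray(test_psnr, dtype=float)

    # Fit log(rate) as a cubic function of PSNR for each encoder.
    ref_poly = np.polyfit(ref_psnr, ref_log, 3)
    test_poly = np.polyfit(test_psnr, test_log, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(ref_psnr.min(), test_psnr.min())
    hi = min(ref_psnr.max(), test_psnr.max())
    ref_int = np.polyint(ref_poly)
    test_int = np.polyint(test_poly)
    ref_area = np.polyval(ref_int, hi) - np.polyval(ref_int, lo)
    test_area = np.polyval(test_int, hi) - np.polyval(test_int, lo)

    # Average log-rate difference, converted back to a percentage.
    avg_diff = (test_area - ref_area) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

On this convention, a result of about -11.57 against the RTX 3090 curve would correspond to the 11.57% bitrate reduction described above.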

HARD QUESTIONS ON HOT TOPICS
Cloud Gaming Economic Factors and Technical Considerations
Watch the full conversation on YouTube: https://youtu.be/PM5Ts9Ko7DA

Hidden Costs of GPU

Evaluating the cost of hardware is relatively straightforward if the primary factors are easily understood and defined. However, with GPUs, there are hidden costs that are not always recognized or acknowledged. For example, as tech platforms expand their offerings, cloud gaming platforms could find that they are funding potential competitors.

As an illustration, the US Federal Trade Commission is attempting to block Microsoft’s acquisition of Activision, partly because the Azure cloud platform gives Microsoft a cost advantage over cloud gaming platforms without similar infrastructure.

Presumably, Amazon, with AWS, has the same advantage. Similarly, this article describes the cost advantage that NVIDIA derives from other services that buy its GPUs for game rendering.

Another hidden cost lies in the complexity of procuring GPUs. Due to the supply chain issues triggered by COVID and the incredible spike in demand for GPUs, simply having the opportunity to buy the quantity needed was far from certain. Even then, your negotiating strength could significantly sway the price or delivery schedule you received. Put simply, anyone needing to buy GPUs in the quantities a cloud gaming platform requires cannot assume that all they need is a P.O.

Finally, there’s a significant loss of negotiating leverage once a gaming platform chooses a GPU vendor, and this is particularly true when the GPU performs double duty in rendering frames and encoding them for streaming. Once a platform chooses a GPU vendor, their technical architecture is essentially locked with that selection, so they can’t switch to another GPU vendor without significant development time and cost. This puts the platform at a disadvantage when negotiating with the selected vendor as they have limited bargaining power.

Often, GPU vendors abuse this leverage by charging expensive license/API fees or refusing to make improvements for their customers. In other cases, this lack of bargaining power could lock platforms into using a GPU-based encoder that delivers uncompetitive quality compared to third-party options. Some GPU vendors may even refuse to undertake enhancements that would enable the use of third-party transcoders, even if this would improve throughput and quality and reduce OPEX for the game platform.

By implementing a dedicated transcoding unit separate from the GPU, a cloud gaming platform can decouple its design into standalone GPU and VPU modules. This makes it simpler for the platform to switch vendors, providing significant leverage in negotiations with all vendors.
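
One way to picture this decoupling: the game session talks to the encoder only through a narrow interface, so changing the encoder vendor means writing a new adapter rather than re-architecting the platform. The sketch below is hypothetical; the class and method names are illustrative stand-ins and do not correspond to any vendor SDK.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VideoEncoder(Protocol):
    """Minimal vendor-neutral encoder interface (hypothetical)."""

    def encode(self, frame: bytes) -> bytes: ...


@dataclass
class VpuEncoder:
    """Stand-in for an adapter wrapping an ASIC VPU SDK (hypothetical)."""
    codec: str = "hevc"

    def encode(self, frame: bytes) -> bytes:
        # A real adapter would hand the frame to the vendor SDK here;
        # this stub just tags the bytes so the data flow is visible.
        return b"vpu:" + self.codec.encode() + b":" + frame


@dataclass
class GpuEncoder:
    """Stand-in for an adapter wrapping a GPU's built-in encoder (hypothetical)."""
    codec: str = "h264"

    def encode(self, frame: bytes) -> bytes:
        return b"gpu:" + self.codec.encode() + b":" + frame


@dataclass
class GameSession:
    """Rendering happens elsewhere; the session depends only on the interface."""
    encoder: VideoEncoder
    sent: list = field(default_factory=list)

    def push_frame(self, rendered: bytes) -> None:
        self.sent.append(self.encoder.encode(rendered))
```

Swapping vendors then comes down to constructing `GameSession(encoder=VpuEncoder("av1"))` instead of `GameSession(encoder=GpuEncoder())`; the rendering path is untouched.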

The Cloud Gaming Opportunity

According to Newzoo, cloud gaming is one of the fastest-growing gaming industry segments, with a CAGR of 50.9% from 2020 to 2023, accounting for 49% of the global gaming market. Cloud gaming is a benefit to players in all regions and it opens up new entertainment experiences for many people without access to expensive consoles or who cannot afford the newest games.
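
A 50.9% CAGR compounds quickly. As a quick check on what that rate implies, treating 2020 to 2023 as three compounding periods:

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth multiple implied by a compound annual growth rate."""
    return (1.0 + cagr) ** years


# 50.9% CAGR over the three years from 2020 to 2023
print(round(growth_multiple(0.509, 3), 2))  # 3.44
```

In other words, the segment would be roughly 3.4 times larger at the end of the period than at the start.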

For others, access to high-quality gaming is a way to extend the entertainment experience outside of the home. Also, it offers a way for mobile gamers to access games that they may be unable to play on their mobile devices due to hardware limitations.

The business and market outlook makes cloud gaming a growth driver that should not be ignored. With NETINT VPUs, you can profitably deliver a premium experience. Reach out, and we’ll happily show you how to move forward on this exciting trend.

The Components That Make Cloud Gaming Production Affordable (or Not)

CPUs, GPUs, and ASICs - major cost elements of cloud gaming platforms with commercial examples of hardware combinations and stream output. Normalizing comparisons on a single form factor is essential.

If you’ve made it past the title, you know that cloud gaming platforms operate in a highly competitive environment with narrow margins. This makes the purchase and operating costs per stream critical elements to system success.

This brief article will lay out the major cost elements of cloud gaming platforms and cite some commercial examples of hardware combinations and stream output. We’ve created a table you can use to collect the critical data points while looking at potential solutions around the NAB show, or if you’re simply browsing around the web. If you are at NAB, come by and see us at booth W1672 to discuss the NETINT solution shown in the table.

At their core, cloud gaming production systems perform three functions: game logic, graphics rendering, and video encoding (Figure 1). Most systems process the game logic on a CPU and the graphics on a GPU. Encoding can be performed by the host CPU, the GPU, or a separate transcoder like NETINT’s ASIC-based Quadra, which outputs H.264, HEVC, and AV1.

Figure 1. The three core functions of a cloud gaming system.

Given the different components and configurations, identifying the cost per stream is critical to comparison analysis. Obviously, a $25,000 system that outputs 200 720p60 streams (cost/stream = $125) is more affordable than a $10,000 system that outputs 25 720p60 streams (cost/stream = $400).

Power consumption per stream is also a major cost contributor. Assuming a five-year expected life, even a small difference between two systems will be multiplied by 60 months of power bills and will significantly impact TCO, not to mention environmental and regulatory considerations.
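
To see how a wattage gap compounds over a 60-month life, here is a small sketch. The electricity price is a hypothetical input (substitute your own tariff), and the calculation ignores cooling overhead, which would increase the real figure.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def lifetime_power_cost(watts: float, price_per_kwh: float,
                        months: int = 60) -> float:
    """Electricity cost of a constant load over the system's service life."""
    kwh = watts / 1000.0 * HOURS_PER_MONTH * months
    return kwh * price_per_kwh


# A 100 W difference between two systems at a hypothetical $0.15/kWh
print(f"${lifetime_power_cost(100, 0.15):,.0f}")  # $657
```

Multiply that per-system gap by the number of servers in a deployment and the impact on TCO becomes clear.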

Finally, normalizing comparisons on a single form factor, like a 1RU or 2RU server, is also essential. Beyond the power cost of a system, rack space costs money, whether in colocation fees or your own in-house costs. The other side of this coin is system maintenance; it costs less to maintain five servers that deliver 1,000 streams than 20 servers that deliver the same output.
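
Normalizing on one form factor makes the server count, and therefore rack space and maintenance, a direct function of per-server density. A minimal sketch of the 1,000-stream example; the 200 and 50 streams-per-server figures are simply what the 5- and 20-server counts imply.

```python
import math


def servers_needed(total_streams: int, streams_per_server: int) -> int:
    """Servers of one form factor required to deliver a target stream count."""
    return math.ceil(total_streams / streams_per_server)


print(servers_needed(1000, 200))  # 5
print(servers_needed(1000, 50))   # 20
```

Rounding up matters: a server that is only partly used still occupies a full rack slot.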

HARD QUESTIONS ON HOT TOPICS
Get the cost per stream with the proper mix of GPU, CPU, and ASIC-based VPU
Watch the full conversation on YouTube: https://youtu.be/xaSRL847eIs

Comparing Systems

Enough talk; let’s compare some systems. Let’s agree up front that any comparison is unavoidably subjective, with results changing with the games tested and game configurations. You’ll almost certainly complete your own tests before buying, and at that point, you can ensure an apples-to-apples comparison. Use this information and the data you collect on your own to form a high-level impression of the value proposition delivered by each hardware configuration.

Table 1 details three systems: a reference design from NETINT that is in mass production, one from an established mobile cloud gaming platform, and one from Supermicro based on an Ampere Arm processor and four NVIDIA A16 GPUs.

Table 1. System configurations.

To compute the pricing information for the systems shown in Table 2, we priced each component on the web and took maximum power consumption data from each manufacturer. Pricing and power consumption shown are for the components listed, not the entire system. The number of 720p outputs comes from each manufacturer, including NETINT.

Table 2. Component cost and power usage, total and on a cost-per-stream basis.

From there, it’s simple math; divide the cost and total watts by the 720p stream count to determine the cost per stream and watts per stream. Again, this is only for the core components identified, but the computer and other components should be relatively consistent irrespective of the CPU, GPU, and VPU that you use. 
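
The division described above is easy to script while collecting quotes. A minimal sketch; the field names are illustrative, the $25,000/200-stream and $10,000/25-stream cost figures are the examples given earlier in this article, and the wattages are placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass
class SystemQuote:
    name: str
    component_cost_usd: float  # CPU + GPU + VPU components only
    max_watts: float           # manufacturer maximum power for those parts
    streams_720p60: int        # vendor-reported 720p60 output count

    def cost_per_stream(self) -> float:
        return self.component_cost_usd / self.streams_720p60

    def watts_per_stream(self) -> float:
        return self.max_watts / self.streams_720p60


quotes = [
    SystemQuote("System A", 25_000, 1_000, 200),  # wattage is a placeholder
    SystemQuote("System B", 10_000, 600, 25),     # wattage is a placeholder
]
for q in quotes:
    print(f"{q.name}: ${q.cost_per_stream():.0f}/stream, "
          f"{q.watts_per_stream():.1f} W/stream")
```

Sorting the list by either metric gives a quick first-pass ranking before any hands-on testing.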

ASIC-based transcoders plus GPUs are the most cost-effective configuration to deliver a profitable and high-quality game streaming experience.
We are happy to share our data and sources so you can confirm independently.

As you walk the NAB show floor or check proposed solutions on the web, beware of bespoke architectures built entirely on one vendor’s proprietary solutions (e.g., all Intel, all NVIDIA, all AMD). Each company has demos that showcase its technology, but not operational competitiveness. None of these systems can meet the OPEX or CAPEX needed for a competitive and profitable cloud gaming solution.

We challenge you to get your own numbers and compare them!

ASIC vs. CPU-Based Transcoding: A Comparison of Capital and Operating Expenses

As the title suggests, this post compares CAPEX and OPEX costs for live streaming using ASIC-based transcoding and CPU-based transcoding. The bottom line?

Figure 1. The 1 RU Deep Edge Appliance with ten NETINT T408 U.2 transcoders.

Jet-Stream is a global provider of live-streaming services, platforms, and products. One such product is Jet-Stream’s Deep Edge OTT server, an ultra-dense scalable OTT streaming transcoder, transmuxer, and edge cache that incorporates ten NETINT T408 transcoders. In this article, we’ll briefly review how Deep Edge compared financially to a competitive product that provided similar functionality but used CPU-based transcoding.

About Deep Edge

Jet-Stream Deep Edge is an OTT edge transcoder and cache server solution for telcos, cloud operators, compounds, and enterprises. Each Deep Edge appliance converts up to 80 1080p30 television channels to OTT HLS and DASH video streams, with a built-in cache enabling delivery to thousands of viewers without additional caches or CDNs.

Each Deep Edge appliance can run individually, or you can group multiple systems into a cluster, automatically load-balancing input channels and viewers per site without the need for human operation. You can operate and monitor Deep Edge appliances and clusters from a cloud interface for easy centralized control and maintenance. In the case of a backlink outage, the edge will keep working autonomously.

Figure 2. Deep Edge operating schematic.

Optionally, producers can stream access logs in real-time to the Jet-Stream cloud service. The Jet-Stream Cloud presents the resulting analytics in a user-friendly dashboard so producers can track data points like the most popular channels, average viewing time, devices, and geographies in real-time, per day, week, month, and year, per site, and for all the sites.

Deep Edge appliances can also act as a local edge for both the internal OTT channels and Jet-Stream Cloud’s live and VOD streaming cloud and CDN services. Each Deep Edge appliance or cluster can be linked to an IP address, IP range, AS number, country, or continent, so local requests to Jet-Stream Cloud from a cell tower, mobile network, compound, football stadium, ISP, city, or country are directed to the local edge cache. Each Deep Edge site can be added to a dynamic mix of multiple backup global CDNs to tune scale, availability, and performance and to manage costs.
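One common way to implement this kind of IP-range mapping is longest-prefix matching on the client address. The sketch below illustrates the idea with Python’s standard `ipaddress` module; it is an assumption about the technique, not Jet-Stream’s implementation, and the prefixes and site names are hypothetical. AS-number and geographic matching are left out.

```python
import ipaddress

# Hypothetical mapping of client IP ranges to local edge sites.
EDGE_MAP = {
    ipaddress.ip_network("10.20.0.0/16"): "compound-a-edge",
    ipaddress.ip_network("10.20.5.0/24"): "stadium-edge",
}
DEFAULT_TARGET = "global-cdn"  # hypothetical fallback when no local edge matches


def route(client_ip: str) -> str:
    """Return the edge whose range most specifically contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in EDGE_MAP if addr in net]
    if not matches:
        return DEFAULT_TARGET
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return EDGE_MAP[best]


for ip in ["10.20.5.7", "10.20.9.1", "192.0.2.10"]:
    print(ip, "->", route(ip))
# → 10.20.5.7 -> stadium-edge
# → 10.20.9.1 -> compound-a-edge
# → 192.0.2.10 -> global-cdn
```

Preferring the most specific range lets a stadium inside a larger compound network be served by its own edge while the rest of the network uses the broader one.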

Under the Hood

Each Deep Edge appliance incorporates ten NETINT T408 transcoders into a 1RU form factor driven by a 32-core CPU with 128 GB of RAM. This ASIC-based acceleration is over 20x more efficient than encoding software on CPUs, decreasing operational cost and CO2 footprint by an order of magnitude. For example, at full load, the Deep Edge appliance draws under 240 watts.

The software stack on each appliance incorporates a Kubernetes-based container architecture designed for production workloads in unattended, resource-constrained, remote locations. The architecture enables automated deployment, scaling, recovery, and orchestration to provide autonomous operation and reduced operational load and costs.

The integrated Jet-Stream Maelstrom transcoding software provides complete flexibility in encoding tuning, enabling multi-bit-rate transcoding in various profiles per individual channel.

Each channel is transcoded and transmuxed in an isolated container, and in the event of a crash, affected processes are restarted instantly and automatically.

HARD QUESTIONS ON HOT TOPICS
 ASIC vs. CPU-Based Transcoding: A Comparison of Capital and Operating Expenses
Watch the full conversation on YouTube: https://youtu.be/pXcBXDE6Xnk

Deep Edge Proposal

Recently, Jet-Stream submitted a bid to a company with a contract to provide local streaming services to multiple compounds in the Middle East. The prospective customer was fully transparent and shared the costs associated with a CPU-based solution against which Deep Edge competed.

In producing these projections, Jet-Stream assumed an electricity cost of €0.20 per kilowatt-hour and assumed that the software-based server would draw 400 watts while Deep Edge would draw 220 watts. These numbers are consistent with lab testing we’ve performed at NETINT; each T408 draws only 7 watts, and because the T408s transcode the incoming signal onboard, host CPU utilization is typically minimal.
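To make the energy side of these projections concrete, here is a minimal sketch that converts a constant power draw into an electricity bill at €0.20 per kWh. It assumes round-the-clock operation at the stated draw and ignores cooling and other facility overhead, which Jet-Stream’s figures may or may not include.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 24/7 operation, leap days ignored


def energy_cost_eur(watts: float, years: float, eur_per_kwh: float = 0.20) -> float:
    """Electricity cost of running one device at a constant power draw."""
    kwh = watts * HOURS_PER_YEAR * years / 1000  # watt-hours -> kilowatt-hours
    return kwh * eur_per_kwh


cpu_server = energy_cost_eur(400, years=5)  # one software-based server
deep_edge = energy_cost_eur(220, years=5)   # one Deep Edge appliance
print(f"CPU server: €{cpu_server:,.2f}, Deep Edge: €{deep_edge:,.2f}")
# → CPU server: €3,504.00, Deep Edge: €1,927.20
```

Because one appliance replaces several CPU servers in each scenario below, the energy gap multiplies accordingly; energy is only one component of the totals, which also include CAPEX.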

Jet-Stream produced three sets of comparisons: a single appliance, a two-appliance cluster, and ten sites with two-appliance clusters. Here are the comparisons. Note that the Deep Edge cost includes all the software necessary to deliver the standard functionality detailed above. In contrast, the CPU-based server cost is hardware-only and doesn’t include the licensing cost of the software needed to match this functionality.

Single Appliance

A single Deep Edge appliance can produce 80 streams, which would require five separate servers for CPU-based transcoding. Considering both CAPEX and OPEX, the five-year savings was €166,800.

Table 1. CAPEX/OPEX savings for a single Deep Edge appliance over CPU-based transcoding.

A Two-Appliance Cluster

Two Deep Edge appliances can produce 160 streams, which would require nine CPU-based encoding servers to produce. Considering both CAPEX and OPEX, the five-year savings for this scenario was €293,071.

Table 2. CAPEX/OPEX savings for a dual-appliance Deep Edge cluster over CPU-based transcoding.

Ten Sites with Two-Appliance Clusters

Supporting ten sites with 180 channels would require 20 Deep Edge appliances and 90 servers for CPU-based encoding. Over five years, the CPU-based option would cost over €2.9 million more than Deep Edge.

Table 3. CAPEX/OPEX savings for ten dual-appliance Deep Edge clusters over CPU-based transcoding.

While these numbers border on unbelievable, they are actually quite similar to what we computed in this comparison, How to Slash CAPEX, OPEX, and Carbon Emissions with T408 Video Transcoder, which compared T408-based servers to CPU-only on-premises and AWS instances.

The bottom line is that if you’re transcoding with CPU-based software, you’re paying far too much in both CAPEX and OPEX, and your carbon footprint is unnecessarily high. If you’d like to explore how many T408s you would need to handle your current transcoding workload, and how long it would take to recoup your costs via lower energy costs, check out our online calculators.
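As a rough sketch of those two questions (not NETINT’s calculator), the code below assumes eight 1080p30 channels per T408, derived from the Deep Edge figure of 80 channels across ten T408s, and computes a simple payback period as hardware cost divided by annual energy savings. Real capacity depends on resolution, frame rate, and encoding ladder, so treat the constant as a placeholder.

```python
import math

# Assumption: 80 1080p30 channels per ten-T408 Deep Edge appliance -> 8 per card.
CHANNELS_PER_T408 = 8


def t408s_needed(channels_1080p30: int) -> int:
    """Whole T408 cards needed for a given number of 1080p30 channels."""
    return math.ceil(channels_1080p30 / CHANNELS_PER_T408)


def payback_years(hardware_cost: float, annual_energy_savings: float) -> float:
    """Simple payback: years of energy savings needed to repay the hardware."""
    if annual_energy_savings <= 0:
        return math.inf  # no savings means the hardware never pays back
    return hardware_cost / annual_energy_savings


print(t408s_needed(180))  # → 23
# Hypothetical inputs: $4,000 of cards and a $1,000/year lower energy bill.
print(payback_years(4_000, 1_000))  # → 4.0
```

A simple payback ignores financing and the CAPEX avoided by retiring CPU servers, so it tends to understate the benefit.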

Voices of Video: Building Localized OTT Networks
Watch the full conversation on YouTube: https://youtu.be/xP1U2DGzKRo

The Evolution of Video Codecs: AV1 and HEVC Take the Lead


For years, H.264 has remained dominant because it plays everywhere, but as videos grow larger, faster, and deeper in color, the cost of distributing H.264 has become too high.

AV1 has leap-frogged VP9 in the so-called “open-source” horse race, while HEVC is the clear successor to H.264 in standards-based codecs, at least for the next 3-4 years as VVC slowly matures.

AV1 and HEVC have each had a well-known Achilles’ heel: AV1 in the living room and on Apple devices, and HEVC in browsers. The last few months have brought critical movement and new data on all these platforms that will fundamentally change how we use these codecs.

AV1 in the Living Room

HEVC has dominated Smart TVs and OTT dongles since 4K and High Dynamic Range (HDR) became must-haves for premium content producers. However, in late 2021, Netflix began distributing AV1 video to this market, and device support has burgeoned since then. As Bitmovin reported in this blog post, AV1 runs on smart TVs running Android TV and Google TV operating systems, including Sony Google TV models from 2021 and forward and many Amazon Fire TV models as far back as 2020. Starting in late 2020, most Samsung TVs have hardware AV1 decoders, with LG extending support to some TVs.

Figure 1. Netflix started the migration of living room content towards AV1. 

Regarding OTT dongles, the Amazon Fire TV Stick 4K Max, the Roku Streaming Stick 4K, and other Roku models support AV1 playback, as do the PlayStation 4 Pro and Xbox One.

The one caveat is that AV1 support for dynamic HDR metadata is nascent. The HDR10+ AV1 Metadata Handling Specification was finalized on December 7, 2022, so it will take a while for encoders and decoders to support it fully and reliably. Meanwhile, Dolby Vision still supports only H.264 and HEVC, and with Google’s Project Caviar proposing a royalty-free alternative to Dolby Vision, it may never support AV1.

To be clear, YouTube supports HDR with AV1, so it’s technically feasible today. But standards like the HDR10+ AV1 Metadata Handling Specification promote the broad playback compatibility that most publishers need before adopting AV1 for HDR. For example, when Netflix first started streaming AV1 video to smart TVs in 2021, it was Standard Dynamic Range only, and that’s still the case. Besides, if you’re already encoding your video to HEVC for HDR delivery to the living room, it may not make economic sense to re-encode to AV1 for slightly more efficient delivery to a market you already serve.

HARD QUESTIONS ON HOT TOPICS – EVOLUTION OF VIDEO CODECS – WHEN IS AV1 READY?
Watch the full conversation on YouTube: https://youtu.be/wbMojTl_cpA

HEVC Plays in Chrome

Browser playback has been a traditional strength of AV1 since it first launched. Not surprising, given that all major browser developers are members of the Alliance for Open Media. For the same reason, it’s also no surprise that browsers like Chrome and Firefox never supported HEVC, even when hardware or software on the computer or device did support HEVC playback.

This changed in September 2022, when Google “fixed a bug” and enabled HEVC playback in Chrome whenever hardware HEVC decoding is available on the system. As the story goes, Bitmovin reported the lack of HEVC playback as a bug in 2015; on September 19, 2022, years later, Google responded, “Enabled by default on all releases.” Within weeks, browser support for HEVC, as reported by CanIUse, jumped from the low 20s to 86.49%, well ahead of AV1 at around 73%.

This could be a massive benefit to streaming sites that deliver primarily to computers and mobile devices and have avoided HEVC because of the lack of Chrome playback. In a straightforward bugfix, Google enabled HEVC playback on all supported platforms with existing decoders, including Windows, Mac, iOS, and Android.

A caveat exists here, as well, specifically that “HEVC with Widevine DRM is not supported at this point.” This obviously limits the benefit of Chrome support for premium content producers.

Apple May Start Supporting AV1

Apple has a checkered history with the Alliance for Open Media. When Apple joined in 2018, it bigfooted its way in as a “founding member,” even though the organization had been formed over two years earlier. Despite this aggressive posturing, Apple has never supported AV1 playback in its operating systems or browsers and has been a massive supporter of HEVC.

Figure 2. Apple is now supporting AV1 playback in Safari 16.4.

At least with respect to AV1, this may be about to change. With Safari 16.4, Apple added AV1 support to the Media Capabilities API and WebRTC support for hardware AV1 decoding on supported device configurations. In addition, the dav1d software AV1 decoder is already included in the updated WebKit engine used in Apple Safari Technology Preview 161.

Apple is dipping its toes in the AV1 waters; this could mean that it intends to support AV1 playback via software in the short term or that it may unlock previously unannounced hardware playback capabilities in existing CPUs. It could also mean hardware AV1 support will be added in future CPUs. Whatever the strategy, it’s probably safe to assume that Safari will play AV1 at some point in the future, hopefully sooner rather than later.

That said, the major data point that recently surfaced was a ScientiaMobile report indicating that while 86.60% of smartphones had HEVC hardware support, only 2.52% had AV1 hardware support. Since hardware support guarantees full-frame-rate playback at minimal power draw, HEVC will likely remain the format of choice for mobile devices for the next 12-24 months.

Figure 3. HEVC currently enjoys much greater hardware support in mobile devices than AV1.

Whether you decide to stay with H.264 for your live transcodes or transition to AV1 or HEVC, NETINT has you covered. Our G4-based products (T408, T432) transcode to H.264 and HEVC, while the G5-based Quadra line (T1, T1A, T2A) supports H.264, HEVC, and AV1. All products deliver competitive video quality, market-leading density, a highly affordable cost per stream, and the lowest possible power consumption and OPEX.

All You Need to Know About the NETINT Product Line


This article will introduce you to the NETINT product line and Codensity ASIC generations. We will focus primarily on the hardware differences, since all products share a common software architecture and feature set, which are briefly described at the end of the article.


Codensity G4-Powered Video Transcoder Products

The Codensity G4 was the first encoding ASIC developed by NETINT. There are two G4-based transcoders: the T408 (Figure 1), which is available in a U.2 form factor and as an add-in card, and the T432 (Figure 2), which is available as an add-in card. The T408 contains a single G4 ASIC and draws 7 watts under full load, while the T432 contains four G4 ASICs and draws 27 watts.

The T408 costs $400 in low volumes, while the T432 costs $1,500. The T432 delivers 4x the raw performance of the T408.
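Using only the figures above, a short calculation normalizes price and power to one T408’s worth of throughput. It assumes the T432 delivers exactly 4x the T408 as stated and ignores host-system costs.

```python
# Figures from the text: low-volume price (USD), full-load power (W),
# and raw throughput relative to a single T408.
PRODUCTS = {
    "T408": {"price_usd": 400, "watts": 7, "relative_throughput": 1},
    "T432": {"price_usd": 1_500, "watts": 27, "relative_throughput": 4},
}


def per_t408_equivalent(name: str) -> tuple[float, float]:
    """Return (price, watts) per unit of T408-equivalent throughput."""
    p = PRODUCTS[name]
    return p["price_usd"] / p["relative_throughput"], p["watts"] / p["relative_throughput"]


for name in PRODUCTS:
    price, watts = per_t408_equivalent(name)
    print(f"{name}: ${price:,.2f} and {watts:.2f} W per T408-equivalent")
# → T408: $400.00 and 7.00 W per T408-equivalent
# → T432: $375.00 and 6.75 W per T408-equivalent
```

The per-unit figures are close, so the choice between the two is driven mostly by form factor and how many cards a given host can hold.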

Figure 1. The NETINT T408 is powered by a single Codensity G4 ASIC.

T408 and T432 decode and encode H.264 and HEVC on the device but perform all scaling, overlay, and deinterlacing on the host CPU.

If you’re buying your own host, the selected CPU should reflect the extent of processing that it needs to perform and the overhead requirements of the media processing framework that is running the transcode function. 

When transcoding inputs without scaling, as in a cloud gaming or conferencing application, a modest CPU can suffice. If you are creating standard encoding ladders, deinterlacing multiple streams, or frequently scaling incoming videos, you’ll need a more capable CPU. For a turn-key solution, check out the NETINT Logan Video Server options.

Figure 2. The NETINT T432 includes four Codensity G4 ASICs.

The T408 and T432 run on multiple versions of Ubuntu and CentOS; see here for more detail about those versions and recommendations for configuring your server.

The NETINT Logan Video Server

The NETINT Video Transcoding Server includes ten T408 U.2 transcoders. It is targeted for high-volume transcoding applications as an affordable turn-key replacement for existing hardware transcoders or where a drop-in solution to a software-based transcoder is preferred.

The lowest-priced model costs $7,000 and is built on the Supermicro 1114S-WN10RT server platform, powered by an AMD EPYC 7232P processor with eight CPU cores and 16 threads running Ubuntu 20.04.05 LTS. The server ships with 128 GB of DDR4-3200 RAM, a 400 GB M.2 SSD, three PCIe slots, and ten NVMe slots that house the ten T408 transcoders. At full transcoding capacity, the server draws 220 watts while encoding or transcoding up to ten 4Kp60 streams or as many as 160 720p60 video streams.

The server is also offered with two more powerful CPUs, the AMD EPYC 7543P Server Processor (32-cores/64-threads, $8,900) and the AMD EPYC 7713P Server Processor (64-cores/128-threads, $11,500). Other than the CPU, the hardware specifications are identical.
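Applying the cost-per-stream arithmetic to the three Logan configurations gives a quick comparison. This sketch assumes every CPU option reaches the same maximum stream counts, since those come from the ten T408s; the text lists power draw only for the base model, so watts per stream is omitted.

```python
# Logan server prices for each CPU option and maximum stream counts, from the text.
PRICES_USD = {
    "EPYC 7232P (8 cores)": 7_000,
    "EPYC 7543P (32 cores)": 8_900,
    "EPYC 7713P (64 cores)": 11_500,
}
MAX_STREAMS = {"720p60": 160, "4Kp60": 10}  # assumed identical for all CPU options


def cost_per_stream(price_usd: float, streams: int) -> float:
    """Server price divided by the number of simultaneous streams."""
    return price_usd / streams


for cpu, price in PRICES_USD.items():
    row = ", ".join(
        f"${cost_per_stream(price, n):,.2f}/{res}" for res, n in MAX_STREAMS.items()
    )
    print(f"{cpu}: {row}")
```

The extra CPU cores buy headroom for host-side work such as scaling and deinterlacing, which the G4 hardware leaves to the CPU, rather than more encoder capacity.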

Figure 3. The NETINT Video Transcoding Server.

All Codensity G4-based products support HDR10 and HDR10+ for H.264 and H.265 encode and decode, as well as EIA CEA-708 closed captions for H.264 and H.265 encode and decode. In low-latency mode, all products support sub-frame latency. Other features include region-of-interest encoding, a customizable GOP structure with eight presets, and forced IDR frame inserts at any location.

The T408, T432, and NETINT Server are targeted toward high-volume interactive applications that require inexpensive, low-power, and high-density transcoding using the H.264 and HEVC codecs.

Codensity G5-Powered Live Transcoder Products

The Codensity G5 is our second-generation ASIC. In addition to roughly quadrupling the H.264 and HEVC throughput of the Codensity G4, it adds AV1 encode support, VP9 decode support, onboard scaling, cropping, padding, and graphical overlay, plus an 18 TOPS (trillions of operations per second) artificial intelligence engine that runs the most common frameworks natively in silicon.

Codensity G5 also includes audio DSP engines for encoding and decoding audio codecs such as MP3, AAC-LC, and HE-AAC. All this onboard processing minimizes the role of the CPU, allowing Quadra products to operate effectively in systems with modest CPUs.

Where the G4 ASIC is primarily a transcoding engine, the G5 incorporates much more onboard processing for even greater video processing acceleration. For this reason, NETINT labels Codensity G4-based products as Video Transcoders and Codensity G5-based products as Video Processing Units or VPUs.

The Codensity G5 is available in three products (Figure 4): the U.2-based Quadra T1 and the PCIe-based Quadra T1A, which each include one Codensity G5 ASIC, and the PCIe-based Quadra T2, which includes two Codensity G5 ASICs. Pricing for the T1 starts at $1,500.

In terms of power consumption, the T1 draws 17 Watts, the T1A 20 Watts, and the T2 draws 40 Watts.

Figure 4. The Quadra line of Codensity G5-based products.

All Codensity G5-based products provide the same HDR and closed caption support as the Codensity G4-based products. They have also been tested on Windows, macOS, Linux, and Android, with support for virtual machine and container virtualization, including Single Root I/O Virtualization (SR-IOV).

From a quality perspective, the Codensity G4-based transcoder products offer no configuration options to optimize quality vs. throughput. Quadra Codensity G5-powered VPUs offer features like lookahead and rate-distortion optimization that allow users to customize quality and throughput for their particular applications.

HARD QUESTIONS ON HOT TOPICS – WHAT DO YOU NEED TO UNDERSTAND ABOUT NETINT PRODUCTS LINE
Watch the full conversation on YouTube: https://youtu.be/qRtnwjGD2mY

AI-Based Video Processing

Beyond VP9 ingest and AV1 output, and superior on-board processing, the Codensity G5 AI engine is a game changer for many current and future video processing applications. Each Codensity G5 ASIC includes two onboard Neural Processing Units (NPUs). Combined with Quadra’s integrated decoding, scaling, and transcoding hardware, this creates an integrated AI and video processing architecture that requires minimal interaction from the host CPU.

Today, in early 2023, the AI-enabled processing market is nascent, but Quadra already supports several applications like AI-based region of interest filter, background removal (see Quadra App Note APPS553), and others. Additional features under development include an automatic facial ID for video conferencing, license plate detection and OCR for security, object detection for a range of applications, and voice-to-text.

Quadra includes an AI Toolchain workflow that enables importing models from AI frameworks like Caffe, TensorFlow, Keras, and Darknet for deployment on Quadra. So, in addition to the basic models that NETINT provides, developers can design their own applications and easily implement them on Quadra.

Like NETINT’s Codensity G4-based products, Quadra VPUs are ideal for interactive applications that require low CAPEX and OPEX. Quadra VPUs offer increased onboard processing that enables lower-cost host systems and the ability to customize throughput and quality, deliver AV1 output, and deploy AI video applications.

The NETINT Quadra 100 Video Server

The NETINT Quadra 100 Video Server includes ten Quadra T1 U.2 VPUs and is targeted for ultra high-volume transcoding applications and for services seeking to deliver AV1 stream output.  

The Quadra 100 Video Server costs $20,000 and is built on the Supermicro 1114S-WN10RT server platform, powered by an AMD EPYC 7543P Server Processor (32 cores/64 threads) running Ubuntu 20.04.05 LTS. The server ships with 128 GB of DDR4-3200 RAM, a 400 GB M.2 SSD, three PCIe slots, and ten NVMe slots that house the ten T1 U.2 VPUs. At full transcoding capacity, the server draws around 500 watts while encoding or transcoding up to 20 8Kp30 streams or as many as 640 720p30 video streams.

The Quadra server is also offered with two different CPUs, the AMD EPYC 7232P Server Processor (8-cores/16-threads, price TBD) and the AMD EPYC 7713P Server Processor (64-cores/128-threads, price TBD). Other than the CPU, the hardware specifications are identical.

Media Processing Frameworks - Driving NETINT Hardware

In addition to SDKs for both hardware generations, NETINT offers highly efficient FFmpeg and GStreamer SDKs that allow operators to apply an FFmpeg/libavcodec or GStreamer patch to complete the integration.

In the FFmpeg implementation, the libavcodec patch on the host server functions between the NETINT hardware and FFmpeg software layer, allowing existing FFmpeg-based video transcoding applications to control hardware operation with minimal changes.
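Because the patch works at the libavcodec layer, a hardware encoder is selected like any other FFmpeg encoder, through `-c:v`. The sketch below builds such a command line using only standard FFmpeg options. The encoder name in the usage line (`h265_ni_quadra_enc`) is an assumption; list the encoders your patched build actually exposes with `ffmpeg -encoders` before relying on it.

```python
import shlex


def build_transcode_cmd(src: str, dst: str, encoder: str, bitrate: str = "4M") -> list[str]:
    """Build an FFmpeg argument list that re-encodes the video stream with `encoder`.

    Only standard FFmpeg options are used; the hardware path is chosen purely
    by encoder name, so existing FFmpeg workflows need minimal changes.
    """
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", src,          # input file or stream URL
        "-c:v", encoder,    # video encoder (software or hardware)
        "-b:v", bitrate,    # target video bitrate
        "-c:a", "copy",     # pass audio through untouched
        dst,
    ]


# Assumed NETINT encoder name; verify with `ffmpeg -encoders` on your build.
cmd = build_transcode_cmd("input.mp4", "output.mp4", "h265_ni_quadra_enc")
print(shlex.join(cmd))
# → ffmpeg -hide_banner -y -i input.mp4 -c:v h265_ni_quadra_enc -b:v 4M -c:a copy output.mp4
```

To run it, pass the list to `subprocess.run(cmd, check=True)`; swapping the encoder argument for `libx265` gives an equivalent software transcode for comparison.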

The NETINT hardware device driver software includes a resource management module that tracks hardware capacity and usage load to present inventory and status on available resources and enable resource distribution. User applications can build their own resource management schemes on top of this resource pool or let the NETINT server automatically distribute the decoding and encoding tasks.

In automatic mode, users simply launch multiple transcoding jobs, and the device driver automatically distributes the decode/encode/processing tasks among the available resources. Alternatively, users can assign different hardware tasks to different NETINT devices and even control which streams are decoded by the host CPU and which by NETINT hardware. With these and similar controls, users can efficiently balance the overall transcoding load between the NETINT hardware and the host CPU to maximize throughput.
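The driver’s actual distribution policy isn’t documented here, so the following is only a hypothetical illustration of how an application might build its own scheme on top of the resource pool: least-utilized-first placement, with `None` signaling that a job should fall back to the host CPU. Device names, capacities, and job costs are invented units.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical transcoding device with capacity in abstract work units."""
    name: str
    capacity: int
    load: int = 0
    jobs: list[str] = field(default_factory=list)

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def assign(job: str, cost: int, devices: list[Device]) -> Device | None:
    """Place `job` on the least-utilized device that still has room for it.

    Returns None when no device can take the job, so the caller can fall back
    to the host CPU or queue the job.
    """
    candidates = [d for d in devices if d.load + cost <= d.capacity]
    if not candidates:
        return None
    target = min(candidates, key=lambda d: d.utilization)  # ties go to the first device
    target.load += cost
    target.jobs.append(job)
    return target


pool = [Device("card0", capacity=4), Device("card1", capacity=4)]
for job in ["a", "b", "c", "d", "e"]:
    placed = assign(job, cost=2, devices=pool)
    print(job, "->", placed.name if placed else "host CPU")
# → a -> card0, b -> card1, c -> card0, d -> card1, e -> host CPU (one per line)
```

Balancing on utilization rather than raw load keeps the policy sensible when a system mixes devices of different capacity, such as T408 and Quadra cards in the same host.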

In all interfaces, the syntax and command structure are similar for T408 and Quadra units, which simplifies migrating from G4-based products to Quadra hardware. It is also possible to operate T408 and Quadra hardware together in the same system.

That’s the overview. For more information on any product, please check the NETINT product pages.
