AV1 Capped CRF Encoding with Quadra VPU

We’ve previously reported results for capped CRF encoding for H.264 and HEVC using NETINT Quadra video processing units (VPU). This post will detail AV1 performance, including both 1080p and 4K data.

For those with limited time, here’s what you need to know: capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same or lower cost than CBR. NETINT VPUs are the first hardware video encoders to adopt capped CRF across the three most popular codecs in use today: AV1, HEVC, and H.264.

You can read a quick description of capped CRF here and get a deep dive into H.264 and HEVC performance results here.

CAPPED CRF OVERVIEW

Briefly, capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bitrate cap. Unlike variable bitrate encoding (VBR) and constant bitrate encoding (CBR), which target specific bitrates, capped CRF targets a specific quality level, controlled by the CRF value. You also set a bitrate cap, which kicks in whenever the encoder can’t deliver the target quality below that cap.

On easy-to-encode videos, the CRF value sets the quality level, which the encoder can usually achieve below the bitrate cap. In these cases, capped CRF typically delivers bitrate savings over CBR-encoded footage at similar quality. For harder-to-encode footage, the bitrate cap usually controls, and capped CRF delivers close to the same quality and bitrate as CBR.
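To make the mechanics concrete, here’s a minimal sketch in Python that builds a capped CRF command line. It uses FFmpeg’s libx264, whose CRF-plus-cap recipe is well documented, as a stand-in; NETINT’s FFmpeg integration follows the same pattern with its own encoder names and options, and the file names below are hypothetical.

```python
# Capped CRF sketch: a CRF quality target plus a hard bitrate cap.
# libx264 stands in here; the Quadra encoders expose the same concept
# through NETINT's FFmpeg integration (vendor-specific options not shown).
import subprocess

def capped_crf_encode(src: str, dst: str, crf: int = 23, cap_kbps: int = 4500) -> None:
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),                # quality target (lower CRF = higher quality)
        "-maxrate", f"{cap_kbps}k",      # the cap: 4.5 Mbps, as in the tests below
        "-bufsize", f"{2 * cap_kbps}k",  # VBV buffer; 2x the cap is a common choice
        dst,
    ]
    subprocess.run(cmd, check=True)

# Hypothetical file names, for illustration only.
capped_crf_encode("football_1080p.mp4", "football_capped_crf.mp4")
```

The two knobs mirror the description above: the CRF value chases quality on easy content, and the maxrate/bufsize pair takes over when a scene would otherwise blow past the cap.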

The value proposition is clear: lower bitrates and good quality during easy scenes, and similar to CBR in bitrate and quality for harder scenes. I’m not addressing VBR because NETINT’s focus is live streaming, where CBR usage dominates. If you’re analyzing capped CRF for VOD, you would compare against 2-pass VBR as well as potentially CBR.

One last detail. CRF values have an inverse relationship to quality and bitrate; the higher the CRF value, the lower the quality and bitrate. In general, video engineers select a CRF value that delivers their target quality level. For premium content, you might target an average VMAF score of 95. For user-generated content or training videos, you might target 93 or even lower. As you’ll see, the lower the quality score, the greater the bandwidth savings.
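If a worked example helps, the sketch below shows one way an engineer might automate that CRF selection. The measure_vmaf() helper is hypothetical; assume it encodes the source at the given CRF and returns the resulting VMAF score (for instance, via FFmpeg’s libvmaf filter).

```python
def pick_crf(src, measure_vmaf, target_vmaf=95.0, candidates=(31, 27, 23, 21, 19)):
    """Return the highest CRF (lowest bitrate) that still meets the target.

    Because CRF and quality are inversely related, candidates are tried
    from the cheapest (highest CRF) downward; the first one that clears
    the quality floor wins.
    """
    for crf in candidates:
        if measure_vmaf(src, crf) >= target_vmaf:
            return crf
    return min(candidates)  # fall back to the highest-quality setting

# e.g., pick_crf("clip.mp4", measure_vmaf, target_vmaf=93.0) for UGC
```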

1080p RESULTS

We show 1080p results in Table 1, which is divided between easy-to-encode and hard-to-encode content. We encoded the CBR clips to 4.5 Mbps and applied the same cap for capped CRF encoding.

Table 1. 1080p results using Quadra VPU and capped CRF encoding.

You can see that in CBR mode, Quadra VPUs do not hit the target bitrate as accurately as in capped CRF mode. This won’t degrade viewer quality of experience since the VMAF scores exceed 95; undershooting the target simply saves bandwidth with no visual quality penalty.

In this comparison, bitrate savings are minimal, particularly at CRF 19 and 21, because the capped CRF encodes of the hard-to-encode content carry a higher bitrate than their CBR counterparts (4,419 and 4,092 kbps versus 3,889 kbps). Not surprisingly, CRF 19 and 21 deliver little bandwidth savings and slightly higher quality than CBR.
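For reference, the savings figures in this section reduce to simple arithmetic over average bitrates. This snippet reproduces the hard-to-encode comparison just quoted:

```python
def bitrate_savings(cbr_kbps: float, capped_crf_kbps: float) -> float:
    """Percent bandwidth saved by capped CRF vs. CBR (negative = costs more)."""
    return 100.0 * (cbr_kbps - capped_crf_kbps) / cbr_kbps

# Hard-to-encode averages from Table 1: CRF 19 and 21 actually exceed CBR,
# which is why meaningful savings only show up at CRF 23 and above.
print(f"{bitrate_savings(3889, 4419):.1f}%")  # ≈ -13.6% at CRF 19
print(f"{bitrate_savings(3889, 4092):.1f}%")  # ≈ -5.2% at CRF 21
```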

At CRF 23, things get interesting, with an overall bandwidth savings of 16.1% and a negligible quality delta from CBR. With a VMAF score of around 95, CRF 23 might be the target for engineers delivering premium content. Engineers targeting slightly lower quality can choose CRF 27 and achieve a bitrate savings of 43%, with an efficient 2.4 Mbps average bitrate for hard-to-encode footage. At CRF 27, Quadra VPUs encoded the hard-to-encode Football clip at 3,999 kbps with an impressive VMAF score of 93.39.

Note that as with H.264 and HEVC, AV1 capped CRF does reduce throughput. Specifically, a single Quadra VPU installed in a 32-core workstation outputs 23 simultaneous streams using CBR encoding. This dropped to 18 streams with capped CRF, a reduction of roughly 22%.

4K RESULTS

Many engineers encoding with AV1 are delivering UHD content, so we ran similar tests with the Quadra and 4K30 8-bit content with a CBR target and bitrate cap of 16 Mbps. We used four clips, ranging from a 4K version of the high-motion Football clip to much less dynamic content like Netflix’s Meridian and the Blender Foundation’s Sintel.

Table 2. 4K results for the Quadra VPU and capped CRF encoding.

In CBR mode, the Quadra VPU hit the bitrate target much more accurately at 4K than at 1080p, so even at CRF 19, the VPU delivered a 13% bitrate savings with a VMAF score of 96.23. CRF 23 again landed very close to a VMAF score of 95 with 45% savings over CBR: an overall VMAF score of 94.87 for these 4K clips at 7.78 Mbps, and that’s with the Football clip weighing in at 14.3 Mbps.

Of course, these savings relate directly to the cap and the CBR target. It’s certainly fair to argue that 16 Mbps is excessive for 4K AV1-encoded content, though Apple recommends 16.8 Mbps for 8-bit 4K content with HEVC.

The point is, when you encode with CBR, you’re limiting quality to control bandwidth costs. With capped CRF, you can set the cap higher than your CBR target, knowing that all content contains easy-to-encode regions that will balance out the impact of the higher cap and deliver similar or lower bandwidth costs. With these comparative settings, capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same or lower cost than CBR.


World’s First AV1 Live Streaming CDN powered by VPUs


RealSprint’s vision for Vindral, its live-streaming CDN, is to deliver the quality of HLS and the latency of WebRTC. Early trials revealed that CPU-only transcoding lacked scalability, and GPUs used excessive power and proved challenging to configure.

Implementing NETINT’s ASIC-based Quadra delivered the required quality and latency in a low-power, simple-to-configure package with H.264, HEVC, and AV1 output. As a result, Quadra became a “preferred component” of the Vindral setup.


The RealSprint Story

RealSprint is a tech company founded in 2013 and based in Umeå, Sweden. Since its inception, RealSprint has delivered industry-defining solutions that drive real business value. Its flagship solution, the Vindral live CDN, combines ultra-low latency streaming with 4K support, sync, and absolute stability. The latest addition, Composer, streamlines the setup for live video compositing, effects, and encoding.

In explaining RealSprint’s goals to Streaming Media Magazine, RealSprint CEO Daniel Alinder stated that part of the company’s goal is “to disrupt, spur innovation, and ensure high-end streaming experiences.” This focus, and RealSprint’s painstaking execution, has brought customers like Sotheby’s, Hong Kong Jockey Club, and IcelandAir into RealSprint’s client roster.


Figure 1. Check out this Vindral demo at https://demo.vindral.com/?4k

Finding the Ideal Transcoder for Vindral

The Vindral live CDN is transforming the landscape for live streaming, offering high-quality streaming at low latency with synchronized playout. As a result, Vindral is highly optimized for verticals such as live sports, iGaming, live auctions, and entertainment, where the desired latency is around one second and stability is imperative, even at high video quality.

Alinder explains, “It is, of course, possible to configure for 0.5-second latency as well, but none of our clients has chosen to go that low. More common focus areas are image quality and synchronized playout. A game show with host-crowd interaction does not require real-time latency. Keeping all viewers in sync, around 1 second, while maintaining full-HD quality is a common request that we see.”

Elaborating on Alinder’s comments, Niclas Åström, founder and Chief Product Officer at RealSprint, adds, “We call it the Sweet Spot. Vindral is built to put clients in charge of their own sweet spot in terms of buffer and quality. While we are highly impressed by technologies such as WebRTC, we aim to pave the way for a new mainstream in which latency is only one of the parameters.”

Expanding upon Vindral’s target use cases, Alinder details, “A typical use case is live auctions. The usual setup for live auctions is 1080p, and you want below one second of latency because people are bidding online. There are also people bidding in the actual auction house, so there’s the fairness aspect of it as well.”

“Clients typically configure around a 700-millisecond buffer, and even that small of a buffer makes such a huge difference in quality and reliability. What we see in our metrics is that, basically, 99% of the viewers watch the highest quality stream across all markets. That’s a huge deal.”

HARD QUESTIONS ON HOT TOPICS:
World’s first AV1 live streaming CDN powered by NETINT’s Quadra VPU
Watch on YouTube: https://youtu.be/Qhe6wuJoOX0

Exploring Transcoder Options

To provide this flexible latency, Vindral depends upon a transcoder to produce the streams with minimal latency, and a vendor-agnostic hybrid content delivery network (CDN) to deliver the streams. To explain, the transcoder inputs the incoming stream from the live source and produces multiple outputs to deliver to viewers watching on different devices and connections.

Choosing the transcoder is obviously a critical decision for Vindral and RealSprint. When exploring its transcoder options, RealSprint considered multiple criteria, including cost per stream, power, output quality, format support, latency, and density.

According to CTO Per Mafrost, “We started using only CPUs but quickly concluded that we needed better scalability. We moved on to using GPUs, but the hardware setups got a bit more troublesome and more energy-demanding. A year back, we got in touch with NETINT to test their ASICs and were pleased with our findings.”

Figure 2. The NETINT Quadra T2 VPU.

“We’ve found that the quality when using ASICs is fantastic.”

RealSprint CEO Daniel Alinder

Quadra Fills the Gap

Specifically, Vindral implemented NETINT’s Quadra Video Processing Unit (VPU), driven by the Codensity G5 ASIC (Application Specific Integrated Circuit). In terms of transcoding, Quadra inputs H.264, HEVC, and VP9 video and outputs H.264, HEVC, and AV1, all at sub-frame latency, meaning under one frame time, or about 33 ms, for a 30-fps input stream. Quadra is called a VPU rather than a transcoder because, in addition to audio and video transcoding, it offers onboard scaling and overlay and houses two Deep Neural Network engines capable of 18 Trillion Operations per Second (TOPS).
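As a quick sanity check on that claim, sub-frame latency just means finishing each frame inside one frame period, so the encode budget is the reciprocal of the frame rate:

```python
# Per-frame encode budget at common frame rates.
for fps in (24, 30, 50, 60):
    budget_ms = 1000.0 / fps  # one frame period in milliseconds
    print(f"{fps} fps -> budget of {budget_ms:.1f} ms per frame")
# 30 fps -> 33.3 ms, matching the ~33 ms figure cited above.
```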

According to Alinder, Quadra delivers both top quality and the necessary low latency. “We’ve found that the quality when using ASICs is fantastic. It’s all depending on what you want to do. Because we need to understand we’re talking about low latency here. Everything needs to work in real time. Our requirement on encoding is that it takes a frame to encode, and that’s all the time that you get.”

Quadra’s AV1 output was another key consideration. As Alinder explained, “we’re seeing markers that our clients are going to want AV1. And there are several reasons why that is the case. One of which is, of course, it’s license free. If you’re a content owner, especially if you’re a content owner with a large crowd with many subscribers to your content, that’s a game-changer. Because the cost of licensing a codec can grow to become a significant part of your business expenses.”

“That is a huge game changer because ASICs are unmatched in terms of the number of streams per rack unit.”

RealSprint CEO Daniel Alinder

Density and Power Consumption

Density refers to the number of streams a device or server can output. Because ASICs are purpose-built for video transcoding, they’re extremely efficient, providing maximum density at very low power consumption. Speaking to Quadra’s density, Alinder commented, “That is a huge game changer because ASICs are unmatched in terms of the number of streams per rack unit.”

Of course, power consumption is also critical, particularly in Europe. As Alinder detailed, “If you look at the energy crisis and how things are evolving, I’d say [power consumption] is very, very important. The typical offer you’ll be getting from the data center is: we’re going to charge you 2x the electrical bill. In Germany, the energy price peaked in August 2022 at 0.7 Euros per kilowatt hour.”

To be clear, in some instances, Vindral can reduce power consumption and carbon emissions by making travel unnecessary. As Alinder explained, “We have a Norwegian company that we’re working with that is doing remote inspections of ships. They were the first company in the world to do that. Instead of flying in an inspector, the ship owner, and two divers to the location, there’s only one operator of an underwater drone that is on the location. Everybody else is just connected. That’s obviously a good thing for the environment.”

“Another seldom mentioned topic set NETINT ASICs apart from CPUs and many GPUs: linear load. Specifically, it was relatively easy to create a solution where we could feel safe when calculating the load and expected capacity for transcoder nodes. The density, cost/stream, and quality are bonuses.”

RealSprint CTO Per Mafrost

Linear Load

One final characteristic that set Quadra apart was a predictable “linear load” pattern. As described by CTO Mafrost, “In choosing between different alternatives, the usual suspects such as cost, power, quality, and density were our main criteria. But another seldom mentioned topic set NETINT ASICs apart from CPUs and many GPUs: linear load. Specifically, it was relatively easy to create a solution where we could feel safe when calculating the load and expected capacity for transcoder nodes. The density, cost/stream, and quality are bonuses.”

RealSprint began deploying NETINT Quadra VPUs in 2022. As Mafrost concluded, “Since then, ASICs have started to be a preferred component of our setup.”

Figure 3. NETINT Quadra has become a “preferred component” of Vindral.

The NETINT View

NETINT Technologies is an innovator of ASIC-based video processing solutions for low-latency video transcoding. Users of NETINT solutions realize a 10X increase in encoding density and a 20X reduction in carbon emissions compared to CPU-based software encoding solutions. NETINT makes it seamless to move from software to hardware-based video encoding so that hyper-scale services and platforms can unlock the full potential in their computing infrastructure.

Regarding Vindral’s use of Quadra, NETINT’s COO Alex Liu commented, “Live streaming video platforms demand more efficient and cost-effective video encoding solutions due to the emergence of new interactive video applications, which can only be met with ASIC hardware encoding. Vindral, the industry’s first 4K AV1 streaming platform, powered by NETINT’s Quadra T2 real-time, low-latency 4K AV1 encoder, is a game changer. We are really excited about the amazing video experiences that Vindral users will bring to their customers as a result of this breakthrough in latency and quality.”


Figure 4. Streaming Media Magazine discussing Vindral with RealSprint CEO Daniel Alinder. https://youtu.be/xJ2Zfo2r7SM

The Industry Takes Notice

The potent combination of Vindral and Quadra has the industry taking notice. For example, in this Streaming Media interview, respected contributing editor Tim Siglin interviewed Alinder about Vindral, summarizing “the fact that [Quadra] is an ASIC that does more transcodes at a lower power consumption means that it gives you a better viability.” 


NETINT was the first company to ship AV1-based ASIC transcoders and has shipped tens of thousands of transcoders and VPUs, producing over 200 billion streams in 2022. In fact, NETINT has shipped more ASIC-based transcoders than any other supplier to the cloud gaming, broadcast, and similar live-streaming markets.

Validating NETINT’s approach, in 2021, Google launched its own ASIC-based transcoder, called Argos, as did Meta in 2022. Both products are used exclusively in-house by the respective companies.

The best way to leverage the benefits of encoding ASICs is to contact NETINT.

The Evolution of Video Codecs: AV1 and HEVC Take the Lead


For years, H.264 has remained dominant because it plays everywhere; but as videos grow larger, faster, and deeper in color, the cost of distributing H.264 has become too high.

AV1 has leap-frogged VP9 in the so-called “open-source” horse race, while HEVC is the clear successor to H.264 in standards-based codecs, at least for the next 3-4 years as VVC slowly matures.

AV1 and HEVC have had their well-known Achilles’ heels: AV1 in the living room and on Apple devices, and HEVC in browsers. The last few months have seen critical movement and new data on all these platforms that will fundamentally change how we use them.

AV1 in the Living Room

HEVC has dominated Smart TVs and OTT dongles since 4K and High Dynamic Range (HDR) became must-haves for premium content producers. However, in late 2021, Netflix began distributing AV1 video to this market, and device support has burgeoned since then. As Bitmovin reported in this blog post, AV1 runs on smart TVs running the Android TV and Google TV operating systems, including Sony Google TV models from 2021 forward and many Amazon Fire TV models as far back as 2020. Starting in late 2020, most Samsung TVs have hardware AV1 decoders, with LG extending support to some of its TVs.

Figure 1. Netflix started the migration of living room content towards AV1. 

Regarding OTT dongles, the Amazon Fire TV Stick 4K Max, the Roku Streaming Stick 4K, and other Roku models support AV1 playback, as do the PlayStation 4 Pro and Xbox One.

The one caveat is that AV1 support for dynamic metadata is nascent. The HDR10+ AV1 Metadata Handling Specification was finalized on December 7, 2022, so it will take a while for encoders and decoders to fully and reliably support it. Meanwhile, Dolby Vision still supports only H.264 and HEVC, and since Google’s Project Caviar is proposing a royalty-free alternative to Dolby Vision, it may never support AV1.

To be clear, YouTube supports HDR with AV1, so it’s technically feasible today. But standards like the HDR10+ Metadata Handling Specification promote the broad playback compatibility that most publishers need before adopting a format. For example, when Netflix first started streaming AV1 to smart TVs in 2021, it was Standard Dynamic Range only, and that’s still the case. Besides, if you’re already encoding your video to HEVC for living room delivery in HDR, it may not make economic sense to re-encode to AV1 for slightly more efficient delivery to a market that you’re already serving.

HARD QUESTIONS ON HOT TOPICS – EVOLUTION OF VIDEO CODECS – WHEN IS AV1 READY?
Watch the full conversation on YouTube: https://youtu.be/wbMojTl_cpA

HEVC Plays in Chrome

Browser playback has been a traditional strength of AV1 since it first launched, which is not surprising given that all major browser developers are members of the Alliance for Open Media. For the same reason, it’s also no surprise that browsers like Chrome and Firefox never supported HEVC, even when hardware or software on the computer or device did support HEVC playback.

This changed in September 2022, when Google “fixed a bug” and enabled HEVC support whenever hardware HEVC playback was available on the system. As the story goes, Bitmovin reported the lack of HEVC playback as a bug in 2015. On September 19, 2022, nearly seven years later, Google responded, “Enabled by default on all releases.” Within weeks, browser support for HEVC, as reported on CanIUse, jumped from the low 20s to 86.49%, well ahead of AV1 at around 73%.

This could be a massive benefit to streaming sites that deliver primarily to computers and mobile devices and have avoided HEVC because of the lack of Chrome playback. In a straightforward bugfix, Google enabled HEVC playback on all supported platforms with existing decoders, including Windows, Mac, iOS, and Android.

A caveat exists here, as well, specifically that “HEVC with Widevine DRM is not supported at this point.” This obviously limits the benefit of Chrome support for premium content producers.

Apple May Start Supporting AV1

Apple has a checkered history with the Alliance for Open Media. When Apple joined in 2018, they big footed their way in as a “founding member,” even though the organization was formed over two years earlier. Despite this aggressive posturing, Apple has never supported AV1 playback in its operating systems or browsers and was a massive supporter of HEVC.

Figure 2. Apple is now supporting AV1 playback in Safari 16.4.

At least with respect to AV1, this may be about to change. With Safari 16.4, Apple added AV1 support to the media capabilities API and WebRTC support for hardware AV1 decoding on supported device configurations. It turns out that the dav1d software AV1 decoder is already included in the updated WebKit engine used in Apple Safari Technology Preview 161.

Apple is dipping its toes in the AV1 waters. This could mean that it intends to support AV1 playback via software in the short term, that it may unlock previously unannounced hardware playback capabilities in existing CPUs, or that hardware AV1 support will be added in future CPUs. Whatever the strategy, it’s probably safe to assume that Safari will play AV1 at some point, hopefully sooner rather than later.

That said, the major data point that recently surfaced was a ScientiaMobile report indicating that while 86.60% of smartphones had hardware HEVC support, only 2.52% had hardware AV1 support. Since hardware support guarantees full-frame-rate playback at minimal power draw, HEVC will likely remain the format of choice for mobile devices for the next 12-24 months.

Figure 3. HEVC currently enjoys much greater hardware support in mobile devices than AV1.

Whether you decide to stay with H.264 for your live transcodes or transition to AV1 or HEVC, NETINT has you covered. Our G4-based line of products (T408, T432) transcodes to H.264 and HEVC, while the G5-based Quadra line (T1, T1A, T2A) supports H.264, HEVC, and AV1. All products deliver competitive video quality, market-leading density, a highly affordable cost per stream, and the lowest possible power consumption and OPEX.

Insights from the Bitmovin Video Developer Report


The Bitmovin Video Developer Report, now in its 6th edition, is one of the most far-reaching and useful documents available to streaming professionals (now with no registration required). It’s a report that I happily download each December and generally refer to frequently during the next twelve months.

Like the proverbial elephant, what you find important in the report depends upon your interests. I typically zero in on video codec usage, encoding practices, and the most important problems and opportunities facing streaming developers. As discussed below, this year’s edition has some surprises, like the fact that more respondents are currently working with H.266/VVC than AV1.

Beyond this, the report also tracks details on development frameworks, content distribution, monetization practices, DRM, video analytics, and many other topics. This makes it extraordinarily valuable to anyone needing a finger on the pulse of streaming industry practices.

Let’s start with some details about how Bitmovin compiles the data and then jump to what I found most interesting.

Gathering the Data

Bitmovin collected the data between June and September 2022. A total of 424 respondents from over 80 countries answered the survey. Geographically, EMEA led the charge with 43%, followed by North America (34%), APAC (14%), and Latin America (8%). Regarding job function, 34% of respondents were manager/CEO/VP level, 23% developer/engineer, 14% technical manager, 10% product manager, 9% architect/consultant, 7% in R&D, and 3% in sales and marketing.

A quarter of respondents worked in OTT streaming services, 21% in online video platforms, 15% for broadcasters, 12% for integrators, 7% for publishers, 6% for telcos, and 5% for social media sites, with 10% other. In terms of company size, 35% worked in companies with 300+ employees, 17% with 101-300, 19% with 51-100, and 29% with 1-50. In other words, a very useful cross-section of geography, industry, job function, and company size.

To be clear, the results are not actual data from Bitmovin’s cloud encoding facility, which would be useful in its own right. Rather, the respondents answered questions about their current practices and future plans in each of the listed topics.

Current and Planned Codec Usage

Figure 1 shows current and planned codec usage for live encoding, with current usage in blue and planned usage in red. The numbers exceed 100% (of course) because most respondents use multiple codecs.

It’s always a surprise to see H.264 at less than 100%, but there’s 78% clear as day. Even given the breadth of industries that responded to the survey, it’s tough to imagine any publisher not supporting H.264.

Figure 1. Answers to the question, “Which streaming formats are you using in production for distribution and which ones are you planning to introduce within the next year?”

HEVC was next at 40%, with AV1 fifth at 18%, bracketed by VP8 (19%) and VP9 (17%), presumably more for WebRTC than OTT. These are the codecs most likely to be used to actually publish video in 2022. Other codecs, presumably implemented by infrastructure providers, included H.266/VVC, a surprising third at 19%, with LCEVC and EVC both at 16%.

Looking ahead, HEVC looks most likely to succeed in 2023, with 43% of respondents planning to implement it, followed by AV1 at 34%, H.264/AVC at 33%, and VVC at 20%. Given that CanIUse lists AV1 support at 73% while VVC isn’t even listed, you’d have to assume that actual AV1 deployments in the near term will dwarf H.266/VVC, but you can’t ignore the interest this standards-based codec is receiving from the industry. VOD encoding tracks these results fairly closely for both current and planned usage.

Video Quality Related Findings

Quality is a constant concern for video professionals, and quality-related data appeared in several questions. In terms of challenges faced by respondents, “finding the root cause of quality issues” ranked fifth at 23%, while “quality of experience” ranked ninth at 19%.

Interestingly, in response to the question, “For which of the following video use cases do you expect to use machine learning (ML) or artificial intelligence (AI) to improve the video experience for your viewers,” 33% cited “video quality optimization,” which ranked third, while 30% cited “quality of experience (QoE),” which ranked fourth.

With so many respondents looking for futuristic means to improve quality, it was ironic that so many ignored content-aware encoding (CAE), a proven method of improving both quality and quality of experience. Specifically, only 33% of respondents were currently using CAE, with 35% planning to implement CAE within the next 12 months. If you’re not in either of these camps, consider yourself scolded.

Live Encoding Practices

Lastly, I focused on live encoding practices, finding that 53% of respondents used commercial encoders, which presumably include both hardware and software. In comparison, 34% encode via open source, which is all software. What’s interesting is how poorly this group dovetails with both the most significant challenge faced by respondents and the largest opportunity for innovation perceived by respondents.

Figure 2. Answers to the question, “Where do you encode video?”

Specifically, controlling cost was the most significant challenge in the report, selected by 33% of respondents. On a cost-per-stream basis, considering both CAPEX and OPEX, software encoding is far more expensive than encoding with hardware, particularly ASICs.

The most significant opportunity for innovation reported by respondents was live streaming at scale, again at 33%. In this regard, the same lack of throughput that makes CPU-driven open-source encoding the most expensive solution makes it the least scalable. Simply stated, publishers currently encoding with CPU-driven open-source codecs can help address both their biggest challenge and their most significant opportunity by switching to ASIC-based transcoding.

Figure 3. Responses to the question, “Where do you see the most opportunity for innovation in your service?”

Curious? Download our white paper, How to Slash CAPEX, OPEX, and Carbon Emissions Using the NETINT T408 Video Transcoder here. Or, compute how long it will take to recoup your investment in ASIC-based encoding through reduced power costs via calculators available here.

And don’t forget to download the Bitmovin Video Developer Report, here.

Vindral CDN Against Dinosaurs’ Agreement


One thing is the bill that you're getting, the other thing is the bill we're leaving to our children...”

WATCH FULL CONVERSATION HERE: https://youtu.be/tNPFpXPVpxI

We’re going to talk about Vindral – but first, tell us a little bit about RealSprint?

RealSprint, we’re a Swedish company based in Northern Sweden, which is kind of a great place to be running a tech company. When you’re in a university town, any time after September it gets dark outside for most of the day, which means people generally try to find things to do inside. So, it’s a good place to have a tech business because you’ll have people spending a lot of time in front of their screens, creating things. RealSprint is a heavily culture-focused team, with the majority located in Northern Sweden and a few based in Stockholm and in the U.S.

The company started around 10 years ago as a really small team that did not have the end game figured out yet. All they knew was that they wanted to do something around video, broadcasting, and streaming. From there it’s grown, and today we’re 30 people.

At a high level, what is Vindral?

Vindral is actually a product family. There is a live CDN, as you mentioned, and there’s also a video compositing software. As for the live CDN, it’s been around five or six years that it’s been running 24/7.

The product was born because we got questions from our clients about latency and quality. ‘Why do I have to choose if I want low latency or if I want high quality’. There are solutions on both ends of that spectrum, but when we got introduced to the problem, there weren’t really any good ones. We started looking into real-time technologies, like webRTC, in its current state and quickly found that it’s not really suitable if you want high quality. It’s amazing in terms of latency. But the client’s reality requires more. You can’t go all in on only one aspect of a solution. You need something that’s balanced.

Draw us a block diagram. So, you’ve got your encoder, you’ve got your CDN, you’ve got software…

We can take a typical client in entertainment or gaming. So, they have their content, and they want to broadcast that to a global audience. What they generally do is they ingest one signal to our endpoint, which is the most standard way of using our CDN. And there are several ways of ingesting multiple transfer protocols.

The first thing that happens on our end is we create the ABR ladder. We transcode all the qualities that are needed since network conditions vary between markets. Even in places that are well connected, the home Wi-Fi alone can be so bad at times, with a lot of jitter and latency.

After the ABR ladder is created, the next box fans out to the places in the world where there are potential viewers. And from there, we also have edge software as one part of this. Lastly, the signal is received by the player instanced on the device.

That’s basically it.

You’ve got an encoder in the middle of things creating the encoding ladder. Then you’ve got the CDN distributing. What about the software that you’ve contributed? How does that work? Do I log into some kind of portal and then administrate through there?

Exactly. Take a typical client in gaming, for example. They’re running 50 or 100 channels. And they want to see what’s going on in their operations, understand how much data is flowing through the system, and things like that. There is a portal where they can log in, see their usage, and see all of the channel information that they would need. It’s a very important part, of course, of any mature system that the client understands what’s going on.

Encoding is particularly important for us to solve because we have loads of channels running 24/7. So, that’s different. If you’re running a CDN and your typical client is broadcasting for 20 minutes a month, then, of course, the encoding load is much lower. In our case, yes, we do have those types (minimal usage), but many of our clients are heavy users, and they own a lot of content rights. Therefore, the encoding side sees several hundred terabytes ingested monthly, and that’s just one quality for each stream on the ingest side.

You’re encoding ABR. Which codecs are you supporting? And which endpoints are you supporting?

So, codec-wise, everybody does H.264, of course. That’s the standard when it comes to live streaming with low latency. We have recently added AV1 as well, which was something we announced as a world first. We weren’t the world’s first with AV1, but we were the world’s first with AV1 at what many would call real-time. We call it low latency.

We chose to add it because there’s a market pointing to AV1.

Which devices are you targeting? Is it TV? Smart TV? Mobile? The whole gamut?

I would say the whole gamut. That list of devices is steadily growing. I’m trying to think of any devices that we don’t support. Essentially, as long as it’s using the internet, we deliver to it. Any desktop or mobile browser, including iOS as well.

iOS is, basically, the hardest one. If you’re delivering to iOS, browsers are all running iOS Safari, and we’re getting the same performance on iOS Safari. And then Apple TV, Google Chromecast, Samsung, LG TVs, and Android TVs. There’s a plethora of different devices that our clients require us to support.

4K? 1080p? HDR? SDR?

Yes, we support all of them. One of the very important things for us is to prove that you can get quality on low latency.

Take a typical client. They’re broadcasting sports and their viewers are used to watching this on their television, maybe a 77-inch or 85-inch TV. You don’t want that user to get a 720p stream. This is where the configurable latency really comes into play, allowing the client to pick a second of latency or 800 milliseconds, with 4K to be maintained on that latency. That is one of the use cases where we shine.

There’s also a huge market for lower qualities as well, where that’s important.

So, you mentioned ABR ladders, and yes, there are markets where you get 600 kilobits per second on the last mile. You need a solution for that as well.

Your system is the delivery side, the encoding side. Which types of encoders did you consider when you chose the encoder to fit into Vindral?

There are actually two steps to consider depending on whether we’re doing it on-prem or off, like a cloud solution. The client often has their own encoders. Many of our clients use Elemental or something similar just to push the material to us. But on the transcoding, where we generate the ladder, unless we’re passing all qualities through (which is also a possibility), there are, of course, different ways and different directions to go for different scenarios. For example, you could take an Intel CPU-based server and use software to encode. That is a viable option in some scenarios, but not in all.

There’s an Nvidia GPU, for example, which you could use in some scenarios, since there are many factors coming into play when making that decision.

The highest priority of all is something that our business generally does badly: maintaining business viability. You want to make sure that any client that is using the system can pay and make their business work. Now, if we have channels that are running 24/7, as we do, and if it’s in a region where it’s not impossible to allocate bare metal or colocation space, then that is a fantastic option in many ways.

CPU-based, GPU-based, and ASICs are all different and make up the three different ones that we’ve looked into.

So, how do you differentiate? You talked about software being a good option in some instances. When is it not a good option?

No option is good or bad in a sense, but if you compare them, both the GPU and the ASIC outperform the software encoding when it comes to heavier use.

The software option is useful when you need to spin it up, spin it down, and you need to move things. You need it to be flexible which is, usually, in the lower revenue parts of the markets.

When it comes to the big broadcaster and the large rights holders, the use case is heavier with many channels, and large usage over time, then the GPU and especially the ASIC make a lot of sense.

You’re talking there about density. What is the quality picture?
A lot of people think software quality is going to be better than ASIC and GPUs. How do they compare?

It might be in some instances. We’ve found that the quality when using ASICs is fantastic. It’s all depending on what you want to do. Because we need to understand we’re talking about low latency here. We don’t have the option of passing encoding or anything like that. Everything needs to work in real time. Our requirement on encoding is that it takes a frame to encode, and that’s all the time that you get.

You mentioned density, but there are a lot of other things coming into play, quality being one.

If you’re looking at ASICs, you’re comparing that to GPUs. In some scenarios we’ve had for the past two years, the decision could have been based on the availability factor – there’s a chip shortage. What can I get my hands on? In some cases, we’ve had a client banging on the door, and they want to go live right away.

Going back to the density part. That is a huge game changer because the ASIC is unmatched in terms of the number of streams per rack unit. If you just measure that KPI, and you’re willing to do the job of building your CDN in co-location spaces, which not everybody is, then that’s it. You have to ask yourself, though, who’s going to manage this? You don’t want to bloat when you’re managing this type of solution. If you have thousands of channels running, then cost is one thing when it comes to not having to take up a lot of rack space, but also, you don’t want it to bloat too much.

How formal of analysis did you make in choosing between the two hardware alternatives? Did you bring it down to cost per stream and power per stream?
Did you do any of that math? How did you make that decision between those two options?

Well, in a way, yes. But, on that particular metric, we need to look at the two options and say well, this is at a tenth of the cost. So I’m not going to give you the number, because I know it’s so much smaller.

We’re well aware of what costs are involved, but the cost per stream depends on profiles, etc. Just comparing them. We’ve, naturally, looked at things like started encoding streams, especially in AV1. We look at what the actual performance is, how much load there is, and what’s happening on the cards, and how much you can put on them before they start giving in… But then… there’s such a big difference…

Take, for example, a GPU. A great piece of hardware. But it’s also kind of like buying a car for the sound system. Because the GPU… If I’m buying an NVIDIA GPU to encode video, then I might not even be using the actual rendering capabilities. That is the biggest job that the GPU is typically built for. So, that’s one of the comparisons to make, of course.

Take, for example, a GPU. A great piece of hardware. But it's also kind of like buying a car for the sound system.”

What about the power side? How important is power consumption to either you yourself or your customers?

If you look at the energy crisis and how things are evolving I’d say it is very, very important. The typical offer you’ll be getting from the data center is: we’re going to charge you 2x the electrical bill. And that’s never been something that’s been charged because they don’t even bother. Only now, we’re seeing the first invoices coming in where the electrical bill is part of it. In Germany, the energy price peaked in August at 0.7 Euros per kilowatt hour.

Frankfurt, Germany, is one of the major exchanges that is extremely important. If you want performance streaming, you need to have something in Frankfurt. There’s another part of it as well, which is, of course, the environmental aspect of it. One thing is the bill that you’re getting. The other thing is the bill we’re leaving to our children.

It’s kind of contradictory because many of our clients  make travel unnecessary. We have a Norwegian company that we’re working with that is doing remote inspections of ships. They were the first company in the world to do that. Instead of flying in an inspector, the ship owner, and two divers to the location, there’s only one operator of an underwater drone that is on the location. Everybody else is just connected. That’s obviously a good thing for the environment. But what are we doing?

Why did you decide to lead with AV1?

That’s a really good question. There are several reasons why we decided to lead with AV1. It is very compelling as soon as you can do it in real time. We had to wait for somebody to make it viable, which we found with NETINT’s ASIC.

Viable means at high quality and with latency and reliability that we could use, and also, of course, with throughput. We don’t have to buy too much hardware to get it working.

We’re seeing markers that our clients are going to want AV1. And there are several reasons why that is the case. One of which is, of course, it’s license free. If you’re a content owner, especially if you’re a content owner with a large crowd with many subscribers to your content, that’s a game-changer. Because the cost of licensing a codec can grow to become a significant part of your business expenses.

Look at what’s happening with fast, free, ad-supported television. There you’re trying to get even more viewers. And you have lower margins so what you’re doing is creating eyeball minutes. And then, if you have codec and license costs, that’s a bit of an issue. It’s better if it’s free.

Is this what you’re hearing from your customers? Or is this what you’re assuming they’re thinking about?

That’s what we’re hearing from our customers, and that’s why we started implementing it.

For us, there’s also the bandwidth-to-quality aspect, which is great. I believe that it will explode in 2023. For example, if you look at what happened one month ago, Google made hardware decoding mandatory for Android 14 devices. That’s both phones and tablets. It opens so many possibilities.

We were not expecting to get business on it yet, but we are, and I’m happy about that. There are already clients reaching out because of the licensing aspect, as some of them are transmitting petabytes a month. If you can bring down the bandwidth while retaining the quality, that’s a good deal.

You mentioned before that your systems allow the user to dial in the latency and the quality. Could you explain how that works?

It’s important to make a difference between the user and the broadcaster. Our client is the broadcaster that owns the content, and they can pick the latency.

Vindral’s live CDN doesn’t work on a ‘fetch your file’ basis. The way it works is we’re going to push the file to you, and you’re going to play it out. And this is how much you’re going to buffer. Once you have that setup, and, of course, a lot of sync algorithms and things like that at work, then the stream is not allowed to drift.

A typical use case is live auctions, for example. The typical setup for live auctions is 1080p, and you want below one second of latency because people are bidding. There are also people bidding in the actual auction house, so there’s the fairness aspect of it as well.

What we typically see is they configure maybe a 700-millisecond buffer, and it makes it possible. Even that small of a buffer makes such a huge difference. What we see in our metrics is that, basically, 99% of the viewers are getting the highest quality stream across all markets. That’s a huge deal.

How much does the quality drop off? What’s the lowest latency you support, and how much does the quality drop off at that latency compared to one or two seconds?

I would say that the lowest that we would maybe recommend somebody to use our system for is 500 milliseconds. That would be about 250 milliseconds slower than a webRTC-based real-time solution. And why do I say that? It’s because other than that, I see no reason to use our approach. If you don’t want a buffer, you may as well use something else.

Actually, we don’t have that many clients trying that out, because for most of them, 500 milliseconds is the lowest anybody sets. And they’ve been like, ‘this is so quick, we don’t need anything more’. And it retains 4K at that latency.

How does the pitch work against webRTC?
If I’m a potential customer of yours and you come in and talk about your system compared to webRTC, what are the pros and cons of each? It’s an interesting technological decision. I know that webRTC is going to be potentially lower latency, but it might only be one stream, may not come with captioning, and it’s not going to be ABR. It’s interesting to hear how you differentiate.

Let’s look from the perspective of when you should be using which. If you need to have a two-way voice conversation, you should use webRTC. There are actually studies that have been made proving that if you bring the latency up above 200 milliseconds, the conversation starts feeling awkward. If you have half a second, it is possible, but it’s not good. So, if that’s an ultimate requirement, then webRTC all day long.

Both technologies are actually very similar. The main difference I would point out is that we have added this buffer that the platform owner can set. So, the player’s instance is at that buffer level. WebRTC currently does not support that. And even if it did, we might even implement that as an option. And it might go that way at some point. Today, it’s not.

On the topic of differences, then. If 700 or 600 milliseconds of latency is good for you and quality is still important, then you should be using a buffer and using our solution. When you’re considering different vendors, the feature set, and what you’re actually getting in the package, there are huge differences. For some vendors, on their lower-tier products, ABR is not included. Things like that. Where the obvious thing is – you should be using ABR. Definitely.

You talked about the shortest. What’s the longest latency you see people dialing in?

We’ve actually had one use case in Hong Kong where they chose to set the latency at 3.7 seconds. That was because the television broadcast was at 3.7 seconds.

That’s the other thing. We talk a lot about latency. Latency is a hot topic, but honestly, many of our clients value synchronization even above latency. Not all clients, but some of them.

If you have a game show where you want to react to the chat and have some sort of interactivity… Maybe you have 1.5 seconds. That’s not a big issue if it’s at 1.5 seconds of latency. You will, naturally, get a little bit more stability since you’re increasing the buffer. Some of our clients have chosen to do that.

But around 3.5… That’s actually the only client we’ve had that has done that. But I think there could be more in the future. Especially in sports. If you have the satellite broadcast… It is at seven seconds of latency. We can match it to within hundreds of milliseconds.

Latency is a hot topic, but honestly, many of our clients value synchronization even above latency.”

And the advantage of higher latency is going to be stream stability and quality.
Do you know what’s the quality difference is going to be?

Definitely. However, as soon as you’re above even one second, the returns are diminishing. It’s not like it unlocks this whole universe of opportunities. In extreme markets, it might, but I would think that if you’re going above two seconds, you’re kind of done. There is no need to go higher. At least our clients have not found that need. The markets are basically from East Asia to South America and South Africa because we’ve expanded our CDN into those parts.

You’ve spoken a couple of times about where you install your equipment, and you’re talking about co-locating and things like that. What does your typical server look like? How many encoders are you putting in it? And what type of density are you expecting from that?

In general, it would be something like one server can do 10 times as many streams if you’re using the ASIC. Then if you’re using GPUs, like Nvidia, for example, it’s probably just the one. I’m not stating any numbers, because my IT guys are going to tell me that I was wrong.

What is the cost of low latency? If I decide to go to the smallest setting, what is that going to cost me? I guess there’s going to be a quality answer, and there’s going to be a stability answer… Is there a hard economic answer?

My hope is that there shouldn’t be a cost difference, depending on regions. The way we’ve chosen to operate is about the design paradigm of the product that you’ve created. We have competitors that are going with one partner. They’ve picked cloud vendor X, and they’re running everything in their cloud. And then what they can do is limited to the deal with that cloud vendor.

For example, we had an AV1 request from Greece. Huge egress for an internet TV channel that I was blown away by, and they mentioned their pricing. They wanted to save costs by cutting their traffic using AV1. What we did with that request is we went out to our partners and vendors and asked them, can you help us match this? And we did. From a business perspective, it might, in some cases, cost more. But there is also a perception that plagues the low-latency business of high cost, and that is because many of these companies have not considered their power consumption or their form factors.

Actually, being willing to take a CAPEX investment instead of just running in the cloud and paying as you go. Many of those things that we’ve chosen to put the time into so that there will not be that big a difference.

Take, for example, Tata Communications, one of our biggest partners, and their pricing. They’re running our software stack in their environments to run their VDM, and it’s on a cost parity. So that’s something that should always be the aim. Then, I’m not going to say it’s always going to be like that, but that’s just a short version when you’re talking about the business implications.

We’re often getting requests where the potential client has this notion that it’s going to be a very high cost. Then they find that this makes sense, and we can build a business.

Are you seeing companies moving away from the cloud towards creating their own co-located servers with encoders and producing that way, as opposed to paying cents per minute to different cloud providers?

I would say I’m seeing the opposite. We’re doing both, just to be clear. I think the way to go is to do a hybrid.

For some clients, they’re going to be broadcasting 20 minutes a month. Cloud is awesome for that. You spin it up when you need it, and you kill it when it’s done. But that’s not always going to cut it. But if you’re asking me what motion I’m seeing in the market? There are more and more of these companies that are deploying across one cloud. And that’s where it resides. There are also types of offerings that you can instance yourself in third-party clouds, which is also an option. But again, it’s the design choice that it’s a cloud service that uses underlying cloud functions. It’s a shame that it’s not more of both. It creates an opportunity for us, though.

What are the big trends that you’re chasing for 2023 and beyond? What are you seeing? What forces are going to impact your business? The new features you’re going to be picking up? What are the big technology directions you’re seeing?

I mean, for us on our roadmap, we have been working hard on our partner strategy, and we’ve been seeing a higher demand for white-label solutions, which is what we’re working on with some partners.

We’ve done a few of those installs, and that’s where we are putting a lot of effort into it because we’re running our own CDN. But we can also enable others to do it, even as a managed service. You have these telcos that have maybe an edge or less offering since before, and they’re sitting on tons of equipment and fiber. So that’s one thing.

If we’re making predictions, there are two things worth a mention. I would expect the sports betting markets, especially in the US, to explode. That’s something we are definitely keeping our eyes on.

Maybe live shopping becomes a thing outside of China. Many of the big players, the big retailers, and even financial companies, are working on their own offerings and live shopping.


The dinosaurs’ agreement?

Have I told you about the dinosaurs’ agreement? It’s comparable to a gentleman’s agreement. This might be provocative to some. And I get that it’s complicated in many cases.

There is, among some of the bigger players and also among independent consultants that have different stakes, a sort of mutual agreement to keep asking the question – do we really need low latency? Or do we really need synchronization?

As long as the bigger brands are not creating the experience that the audience is waiting for them to create, nobody's going to have to move.”

And while a valid question it’s also kind of a self-fulfilling prophecy. Because as long as the bigger brands are not creating the experience that the audience is waiting for them to create, nobody’s going to have to move. So that is what I’m calling the dinosaurs here. They’re holding on to the thing that they’ve always been doing. And they’re optimizing that, but not moving on to the next generation. And the problem they’re going to be facing, hopefully, is that when it reaches critical mass, the viewers are going to start expecting it, and that’s when things might start changing.

There are many workflow considerations, of course. There are tech legacy considerations. There are cost considerations and different aspects when it comes to scaling. However, saying that you don’t need low latency is a bit of an excuse.


Meta AV1 Delivery Presentation: Six Key Takeaways


One of the most gracious things that large companies like Meta and Netflix do is to share their knowledge with others in the community. On November 3, Meta hosted Video @Scale Fall 2022 which featured multiple speakers from Meta and other companies. If you’re unfamiliar with the event, here’s the description, “Designed for engineers that develop or manage large-scale video systems serving millions of people.”

Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

One talk drew my attention: Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta. Watch above or use this link: https://bit.ly/Lei_AV1

For perspective, where Netflix has focused AV1 distribution on Smart TVs, Meta’s focus is mobile. Briefly, the company started delivering “AV1-encoded FB/IG Reels videos to selected iPhone and Android devices” in 2022. Lei’s talk included encoding, decoding, and some observations about the bandwidth savings, improved MOS scores, and increased viewing time that AV1 delivered.

Here are my top 6 takeaways from Lei’s excellent presentation.

1. Meta Finds that AV1 is 30% More Efficient than HEVC/VP9

As you’ll learn later in this article, Meta relies upon software playback on iOS and Android platforms. Since both platforms support HEVC decoding, iOS in hardware (since 2017) and Android mostly in hardware but also in software, it’s reasonable to ask why Meta didn’t just use HEVC.

The answer is that in Meta’s own tests, AV1 proved 30% more efficient than both VP9 and HEVC, about 21% lower than the 38% efficiency advantage I found in this study for Streaming Media (the 8-percentage-point gap works out to roughly a 21% relative difference). Lei didn’t discuss HEVC in his presentation, but you’d have to guess that Meta chose AV1 over HEVC because AV1’s superior quality outweighed the potential impact of software playback on mobile device battery life.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

2. Meta Encodes with SVT-AV1 For Video On Demand (VOD)

The chart shown below tracks the encoding time and quality levels of the open-source encoders shown on the upper right: libaom-av1 (AV1), libvpx (VP9), x265 (HEVC), x264 (AVC), vvenc (VVC), and SVT-AV1 (AV1).

Here’s how Lei interpreted this data: “From this graph, we see that SVT-AV1 maintains a consistent performance across a wide range of complexity levels. No matter for an encoding efficiency or compute efficiency point of view, SVT-AV1 always achieves the most optimal results among open-source encoders.” Again, these results track my own findings, at least as they relate to SVT-AV1 compared to libaom.

Interestingly, the chart only tracks software encoders, not hardware encoders, which present a completely different quality/encoding-time curve. You’ll see why this is important at the end of this post.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

3. Meta Creates Their Encoding Ladder Using the Convex Hull

There are many forms of per-title encoding. Some, like YouTube’s, are based on machine learning, while others, like Netflix’s, are based on multiple encodes to find the convex hull. Since Meta’s encoding task is much closer to YouTube’s than Netflix’s (high-volume UGC), you might assume that Meta uses AI as well.

However, Meta actually uses the convex hull, a brute-force technique that involves encoding at multiple resolutions and multiple bitrates to find the combinations that make up the convex hull for that video. In the example shown below, Meta encoded at seven resolutions and five CRF levels, a total of 35 encodes. To compute the convex hull, Meta plots the 35 data points and then draws a line connecting the points on the upper-left boundary. The points on the convex hull are the optimal encoding configurations for that video; a minimal sketch of this selection step follows.
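To make that concrete, here’s a minimal Python sketch, assuming each trial encode has been summarized as a (bitrate, VMAF, label) tuple; the function name and sample numbers below are mine, not Meta’s.

```python
# Minimal sketch: pick the encodes on the upper-left (rate-quality) convex
# hull from a set of trial encodes. All sample data here is hypothetical.

def upper_left_hull(encodes):
    """encodes: list of (bitrate_kbps, vmaf, label) tuples, one per encode."""
    def cross(o, a, b):
        # Z-component of the cross product; >= 0 means no clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for p in sorted(encodes):          # left to right by bitrate
        # Pop any point that falls on or below the line to the new point,
        # so only the upper boundary (best quality per bit) survives.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hypothetical trial encodes: (bitrate_kbps, vmaf, "rung")
trials = [(350, 62.1, "360p"), (600, 71.4, "480p"), (900, 74.0, "480p"),
          (1400, 82.3, "720p"), (2600, 89.5, "1080p"), (4200, 93.8, "1080p")]
for bitrate, vmaf, rung in upper_left_hull(trials):
    print(f"{rung}: {bitrate} kbps, VMAF {vmaf}")
```

In Meta’s case the input would be all 35 points (seven resolutions times five CRF values), but the selection logic is the same.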

As Lei points out, “the complexity of this process is quite high.” To reduce it, Meta computes the convex hull using high-speed presets and then encodes the selected resolution and CRF points with higher-quality presets for final delivery. Lei noted that although this hybrid approach involves more encodes, since the optimal configurations are encoded twice, it reduces overall encoding time, as sketched below.
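Here’s a rough sketch of that hybrid flow, assuming an ffmpeg build with libsvtav1; the presets, CRF values, and file names are placeholders rather than Meta’s actual settings.

```python
# Two-stage convex-hull encoding: search the 7x5 grid with a fast SVT-AV1
# preset, then re-encode only the hull points with a slow preset for delivery.
import subprocess

def encode(src, out, height, crf, preset):
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",   # -2 keeps the width even
        "-c:v", "libsvtav1", "-preset", str(preset), "-crf", str(crf),
        out,
    ], check=True)

# Stage 1 would run encode(..., preset=12) across the full grid and feed the
# results to upper_left_hull() from the sketch above. Stage 2 re-encodes the
# winning points with a slow preset; these rungs are hypothetical hull picks.
for height, crf in [(1080, 35), (720, 45), (480, 50)]:
    encode("source.mp4", f"final_{height}p_crf{crf}.mp4", height, crf, preset=4)
```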

Just to state the obvious, this approach only works for video on demand, not live. Even with the fastest hardware encoders, you can’t produce 35 iterations to identify the optimal five. This indicates that Meta uses a different scheme for live transcoding, which Lei doesn’t address.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

4. Meta Uses the Convex Hull Computed for AVC for VP9 and AV1

Like most large publishers, Meta encodes using multiple codecs like H.264, VP9, and AV1 to deliver to different devices. One surprising revelation was that Meta uses the convex hull computed for H.264 to guide the convex hull implementations for the VP9 and AV1 encodes.

Lei didn’t explain how this works. As you can see in the figure below, the resolutions and bitrates for the three codecs are obviously different, which is what you would expect, so there must be some kind of interpolation of the convex hull information from one codec to another; I sketch one speculative possibility after the slide. Either way, VP9 delivers a 48% bitrate savings over the top H.264 ladder rung, while AV1 delivers 65%.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.
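For what it’s worth, here’s one speculative possibility, consistent with the savings in the slide above: keep the H.264 hull’s resolution switch points and scale each rung’s bitrate by an assumed codec-efficiency factor. Nothing here comes from Lei; the factors simply echo the 48% and 65% figures.

```python
# Speculative sketch: derive VP9/AV1 ladders from an H.264 convex hull by
# scaling bitrates with assumed efficiency factors (fractions of H.264 bits).
EFFICIENCY = {"vp9": 0.52, "av1": 0.35}   # 48% and 65% savings, assumed

def derive_ladder(h264_hull, codec):
    """h264_hull: list of (bitrate_kbps, height) rungs on the H.264 hull."""
    factor = EFFICIENCY[codec]
    return [(round(kbps * factor), height) for kbps, height in h264_hull]

h264_hull = [(4200, 1080), (2600, 720), (1400, 480)]      # hypothetical rungs
print(derive_ladder(h264_hull, "av1"))   # [(1470, 1080), (910, 720), (490, 480)]
```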

5. Apple and Android Phones Present Completely Different Challenges

Again, no surprise. There are many fewer Apple devices, and all are premium high-performance models. In contrast, there’s a much greater range of Android devices, from low-cost/low-performance options to models that rival Apple in cost and performance.

Lei shared that Facebook tests Android devices to determine eligibility for AV1 videos. As you can see in the slide below, Meta delivers much different quality to iOS and Android devices.

It was clear from Lei’s talk that delivering AV1 to Apple phones was relatively simple compared to sending AV1 video to Android phones. This is the reverse of what you might expect, since iOS doesn’t support AV1 natively while Android does. Though you can deliver AV1 via an app to iOS devices, as Meta does, Safari doesn’t support it. And even though Android supports AV1 playback natively, you’ll have to implement some type of testing protocol, as Meta did, to ensure smooth playback until AV1 hardware support becomes pervasive, which probably won’t happen until 2024 or beyond. A hypothetical sketch of such an eligibility gate follows the slide below.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.
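Meta hasn’t published its eligibility criteria, but a gate of this kind might look something like the sketch below; every field name and threshold is invented for illustration.

```python
# Hypothetical eligibility gate: serve AV1 only to Android devices with a
# hardware decoder, or whose software-decode telemetry shows real-time
# headroom. None of these names or thresholds come from Lei's talk.
HEADROOM = 1.2   # require 20% over real-time decode speed, assumed

def av1_eligible(profile, telemetry):
    """profile: client-reported device info; telemetry: past decode stats."""
    if profile.get("hw_av1_decoder"):
        return True
    stats = telemetry.get(profile.get("model"))
    if stats is None:
        return False   # unknown device: fall back to H.264/VP9
    return stats["decode_fps"] >= stats["content_fps"] * HEADROOM
```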

6. AV1 Has Delivered on Several Key Metrics

Integrating a new codec into your encoding and delivery pipeline isn’t trivial. So, the big question is, was AV1 worth it? The slide below displays three graphs. Sorry that the quality in the original slide is suboptimal, but here’s the net/net.

The graph on the top left shows the week-over-week playback MOS for all videos played on an iPhone, an improvement of about 0.6 MOS points. Since MOS (Mean Opinion Score) is usually computed on a scale from 1 to 5, 0.6 is a significant number. The second graph, on the upper right, shows the bitrate of all videos delivered, down about 12%.

The bottom chart presents the average iPhone watch time for the different codecs used in Facebook Reels and shows that AV1’s share of watch time rose to about 70% within the first week after rollout. This doesn’t mean that AV1 increased total watch time; rather, it shows that a significant number of devices were able to play AV1, which is how AV1 delivered the MOS improvement and bitrate reduction shown in the top two charts.

SLIDE FROM Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

Lei’s talk was about 18 minutes long and contains much more useful data and many more observations than I’ve presented here. Again, here’s the link: https://bit.ly/Lei_AV1. If you’re considering deploying AV1 for VOD encoding in your organization, you’ll find the encoding-related portions of Lei’s talk illuminating.


What about live? Lei didn’t address it, but you can take some guidance from the fact that Meta recently announced their own Video Processing ASIC. After the announcement, David Ronca, Director, Video Encoding at Meta, commented that “ASICs are able to deliver video quality on par with SW encoders with significantly improved power efficiency. Because of the rapid commoditization of video processing, rising energy costs, and pollution concerns, Video Processing ASICS are inevitable.”

At NETINT, we’ve been shipping transcoders based upon custom encoding ASICs since 2019 and have real market validation of Ronca’s comments. While software encoding may be appropriate for VOD, ASIC-based transcoders are superior, if not essential, for live transcoding.

Returning to Lei’s talk: whether you’re distributing VOD or live AV1 streams, his description of the challenges of AV1 delivery to mobile will be instructive to all.