Revolutionizing Online Media Distribution and Delivery

Advancements in Streaming

Streaming technologies have revolutionized the digital media landscape, transforming how content is distributed and delivered to audiences worldwide. One pioneering figure in this field is Alex Zambelli, whose career at Microsoft was closely intertwined with the rise of streaming as the dominant digital media distribution method. Zambelli’s work with NBC Sports, particularly during the 2008 Beijing Olympics and subsequent events, was pivotal in advancing online streaming capabilities and earned industry recognition. This article, based on Jan Ozer’s conversation with Alex Zambelli during Voices of Video, explores Zambelli’s contributions to streaming technologies, the implementation of multi-view camera angles in Sunday Night Football, and key livestreaming lessons drawn from Olympic events.

Evolution of Streaming Technologies

Alex Zambelli’s career at Microsoft coincided with the transition from physical media to streaming as the dominant method of distributing digital media. Around 2007, streaming started gaining momentum, gradually overtaking CDs, DVDs, and Blu-rays. Zambelli’s focus on streaming technologies led him to work on Microsoft’s Silverlight, a competitor to Flash that facilitated rich web experiences and premium media delivery, including digital rights management (DRM). This technology was a significant milestone in the evolution of streaming.

Zambelli’s collaboration with NBC Sports began with the 2008 Beijing Olympics, where the team aimed to pioneer online streaming of all Olympics content. Initially, they utilized Windows Media and Silverlight, incorporating adaptive streaming capabilities. The subsequent transition to Microsoft’s Smooth Streaming technology for the 2010 Vancouver Olympics marked a significant advancement, offering on-demand and live streams in high definition and providing viewers with an immersive, seamless experience. These groundbreaking endeavors earned Zambelli and the team industry recognition, including Sports Emmy nominations.

Multi-View Camera Angles in Sunday Night Football

The implementation of Smooth Streaming technology played a crucial role in enabling seamless transitions between camera angles in Sunday Night Football broadcasts. By placing all four camera angles in a single manifest, switching between views became as smooth as switching between bitrates in modern streaming protocols like DASH or HLS. This capability, developed with the broadcast team, allowed viewers to watch multiple camera angles simultaneously, enhancing the overall viewing experience.

Key Considerations in Livestreaming: Insights from Olympic Events

Livestreaming presents unique challenges compared to on-demand streaming due to its real-time nature. Issues such as packet loss, segment loss, blackouts, and ad insertions demand immediate attention and resolution. Unlike on-demand streaming, where there is some leeway to address content or delivery chain issues over time, livestreaming requires constant vigilance. Even a brief interruption or technical problem can significantly impact the viewer experience.

Successful livestreaming events often involve collaborative efforts from multiple companies, including Microsoft, NBC, Akamai, and iStreamPlanet. These events require dedicated teams ready to address and resolve any issues that arise in real time. The nature of livestreaming necessitates a higher level of focus and attention compared to on-demand streaming. It is crucial to prioritize and allocate sufficient resources to ensure the seamless execution of live events. The potential for unexpected issues or failures makes constant monitoring and immediate troubleshooting essential, as even a minor disruption can have significant consequences.

VOICES OF VIDEO
Scalable distribution in the age of DRM: Key Challenges and Implications.
Watch the full conversation on YouTube: https://youtu.be/s_afoa71muM

Evolution of Video Codecs and Streaming Protocols

The evolution of video codecs and streaming protocols has played a vital role in shaping the streaming landscape. In the late 2000s, the dominant video codecs for streaming were VC-1 (supported by Silverlight) and H.264 (supported by Flash). However, the introduction of HTML5 posed challenges for streaming solutions, as the HTML specification lacked the APIs needed to provide the level of control and functionality that streaming requires.

Silverlight and Flash emerged as proprietary plugins that advanced streaming technology beyond what HTML could offer at the time. They provided opportunities to overcome HTML’s limitations and introduced features such as media stream sources and content protection (DRM) to enhance the streaming experience. Silverlight’s media stream source concept, which later influenced HTML’s media source extensions, allowed developers to handle their own segment downloading and parsing, passing the video and audio streams to a media buffer for decoding and rendering. Content protection was a crucial aspect addressed by Silverlight and Flash, as HTML lacked a robust solution for DRM.

Around 2011-2012, Silverlight and Flash began to be phased out as HTML5 matured, offering the APIs necessary to implement streaming protocols like DASH, HLS, and Smooth Streaming within the browser while incorporating DRM capabilities. HTML5 overcame its initial growing pains and established itself as the predominant platform for streaming. By 2014-2015, HTML5 had evolved sufficiently to support basic streaming functionality and content protection with DRM.

Optimizing Encoding Quality and Cost

Achieving optimal encoding quality while considering cost is a crucial concern for content creators and distributors. At Warner Brothers Discovery, the x264 and x265 encoders are commonly used for transcoding, employing the slow or slower presets to achieve higher-quality outputs. This approach balances encoding cost against desired video quality.

Recent discussions within the organization have prompted exploration into the idea of customizing presets based on specific resolutions and content complexities. The focus is on optimizing encoding efficiency by adjusting presets according to the intricacy of the content and the resolution being processed. Different resolutions have varying encoding requirements, and applying the very slow preset to all resolutions may result in unnecessary computational overhead for lower resolutions. Similarly, content complexity plays a role in determining the appropriate preset, as not all content requires the very slow preset. Customizing presets based on resolution and content characteristics allows for more efficient allocation of computational resources.

The popularity and viewership of specific content also factor into the choice of preset. Content with a larger audience may benefit from the slower preset due to potential CDN savings resulting from improved video quality. On the other hand, smaller-scale content with fewer viewers may not necessitate the same level of complexity in encoding. Balancing encoding quality and cost requires thoughtful consideration of these factors.
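
To make this concrete, here’s a minimal FFmpeg sketch of per-rung preset assignment; the rungs, presets, and CRF values below are illustrative assumptions, not Warner Brothers Discovery’s actual settings:

# Spend cycles where they matter: a slower preset for the high-resolution rung, faster presets for the cheaper low-resolution rungs.
ffmpeg -y -i source.mp4 -vf scale=1920:1080 -c:v libx265 -preset slower -crf 21 -an rung_1080p.mp4 -vf scale=1280:720 -c:v libx265 -preset slow -crf 22 -an rung_720p.mp4 -vf scale=640:360 -c:v libx265 -preset medium -crf 24 -an rung_360p.mp4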

Adaptive Encoding Ladders: Variations, Frame Rates, and Device Considerations

Adaptive encoding ladders play a crucial role in delivering content based on source resolution and frame rate. At Warner Brothers Discovery, these encoding ladders consist of approximately six to eight different variations, allowing flexibility in content delivery. The source resolution determines the stopping point within the UHD ladder, minimizing the need for multiple permutations of the ladders themselves.

Variations in frame rates necessitate different encoding ladders. The introduction of high frame rates, especially with reality TV content, requires separate encoding ladders to preserve the temporal resolution. Encoding ladders also differ for SDR and HDR content, with distinctions made between HDR10 and Dolby Vision 5, offering specific encoding settings for each.

While currently the same encoding ladders are used for all devices, specific subsets of the ladder may be delivered to certain devices to accommodate their capabilities. Device differentiation is particularly important for high frame rates or resolutions above 1080p. By intentionally capping the manifest delivered to devices that cannot handle certain capabilities, compatibility and optimal viewing experiences can be ensured. Differentiating encoding ladders for various devices is essential for maintaining consistent quality across different devices.

VBR Control, Per-Title Encoding, and DRM Considerations in Video Encoding

Video encoding involves crucial considerations such as VBR control, per-title encoding, and DRM integration. At Warner Brothers Discovery, the x264 and x265 encoders are run with CRF (Constant Rate Factor) rate control plus a bitrate and buffer cap, producing capped VBR (Variable Bit Rate) output. This approach ensures control over codec levels, peak rates, and overall encoding quality.

VBR control is achieved using the VBV (Video Buffering Verifier) buffer size and VBV max rate parameters. These parameters cap the peak bitrate of the stream, while CRF keeps the average bitrate below the specified max rate in most cases. This method enables per-title encoding, achieving CDN savings without compromising quality. Differentiating encoding ladders based on resolutions, frame rates, and HDR formats is essential to conform to content licensing agreements and compatibility requirements.
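
Here’s a minimal sketch of what capped-CRF VBR looks like in FFmpeg terms; the numbers are illustrative, not Warner Brothers Discovery’s production values:

# x265: CRF picks the quality level; VBV max rate and buffer size cap the peaks for codec-level compliance.
ffmpeg -y -i source.mp4 -c:v libx265 -preset slow -crf 21 -x265-params vbv-maxrate=6000:vbv-bufsize=12000 -an out_hevc.mp4
# x264 equivalent: -maxrate and -bufsize set the same VBV parameters.
ffmpeg -y -i source.mp4 -c:v libx264 -preset slow -crf 21 -maxrate 6M -bufsize 12M -an out_avc.mp4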

DRM has a significant impact on the encoding ladder. Licensing agreements often demand different security levels for various resolutions, necessitating the assignment of different encryption keys and playback policies to different security groups. The use of hardware-backed DRM, such as Widevine L1 and PlayReady SL3000, is often required for higher resolutions. The trend in the industry is moving towards increased use of DRM across the entire encoding ladder, with a focus on stricter requirements for HDR content. Content licensing agreements are evolving to require comprehensive DRM implementation for improved content protection.

Exploring Hardware and Software DRM: Implementation and Impact on Video Streaming

The choice between hardware and software DRM implementations has implications for video streaming security and performance. Hardware DRM involves integrating DRM clients into the secure video path of the system, tightly coupling with the hardware decoder. This ensures secure decoding and decryption of video streams, preventing unauthorized access to the content. Hardware-based DRM establishes a secure video path or secure media path, where the decrypted and decoded bits cannot be retrieved or accessed by applications. This level of security is achieved through close integration with the hardware decoder, ensuring protection throughout the entire decoding process.

On the other hand, software DRM performs decoding and decryption in software, introducing a potential vulnerability where the decoded bits could be compromised or accessed by unauthorized parties. Software DRM lacks the same level of hardware integration and security provided by hardware-based DRM.

The limitations of software-based DRM can impact the resolution of premium content when viewing it on certain platforms or browsers without hardware support. For example, Chrome’s support for Widevine DRM is limited to L3, the software-based implementation. This can result in inferior video quality compared to browsers like Edge or Safari, which support hardware DRM, allowing for a more secure video path and higher quality streaming.

Unifying Packaging Formats: HLS, DASH, and CMAF in Video Streaming

Standardizing packaging formats is crucial for compatibility and interoperability in video streaming. Warner Brothers Discovery and Hulu have been utilizing both HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) for content distribution. HLS is predominantly used for Apple devices, while DASH is employed for other devices.

The commonality between HLS and DASH lies in their use of the CMAF (Common Media Application Format) standard. CMAF is a standardized profile of fragmented MP4 (fMP4), specifying the required boxes and how encryption is applied to the fMP4 media segments used in HLS and DASH. CMAF is not a streaming protocol itself; rather, it encompasses two components.

Firstly, it defines a refined version of fMP4 for HLS and DASH, establishing a more precise set of guidelines for compatibility. Many existing HLS and DASH implementations using fMP4 media segments are already CMAF-compliant.

Secondly, CMAF specifies a hypothetical logical media presentation model, outlining the relationship between tracks, segments, fragments, and chunks. This model closely resembles HLS or DASH without explicitly using those terms. It provides a framework for addressing different levels of the media presentation.

HLS and DASH can be considered the physical implementations of the logical media presentation model described by CMAF. The HLS-DASH interoperability specification, CTA-5005, relies heavily on CMAF as its unifying model, describing how both HLS and DASH map onto it. This unification allows similar concepts to be described across both formats, enhancing compatibility and simplifying the streaming ecosystem.
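
As a rough illustration of this unification, recent FFmpeg builds can write a single set of CMAF-style fMP4 segments and describe them with both a DASH MPD and an HLS playlist. This is a sketch assuming FFmpeg’s dash muxer options; production packagers offer finer control:

# One set of fMP4 segments, referenced by both manifest formats.
ffmpeg -y -i mezzanine.mp4 -c:v libx264 -c:a aac -f dash -seg_duration 4 -hls_playlist 1 manifest.mpd
# Writes manifest.mpd (DASH) plus master.m3u8 (HLS) over the same segments.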

Streamlining Content Publishing and Compatibility: The Role of the CTA

The streaming industry faces challenges related to content publishing and compatibility across diverse platforms and devices. The Consumer Technology Association (CTA) plays a crucial role in addressing these challenges and streamlining content publishing processes. The CTA is actively working to enhance interoperability within the streaming industry, allowing publishers to focus primarily on content development rather than compatibility concerns.

The CTA’s WAVE initiative serves as a platform for fostering efforts to streamline content publishing and compatibility. One major challenge in the streaming landscape is the sheer number of application development platforms. Within Warner Brothers Discovery, for example, roughly a dozen to sixteen different application development platforms are used for the streaming service, with some overlap between platforms such as Android TV and Fire TV.

Developers thus face the unusual scenario of building multiple versions of the same application in various programming languages using different platform APIs. This complexity arises from the diversity of devices and platforms requiring tailored applications, a situation unlike most other industries, where a web app, an iOS app, and an Android app typically cover the majority of development needs.

The multitude of application development platforms poses challenges in areas such as encoding and packaging. Determining device capabilities becomes arduous without a standardized specification or set of APIs that can provide consistent and reliable information across different platforms.

The standardization of device media capabilities detection APIs is a crucial step towards enhancing compatibility in the streaming industry. Efforts within the World Wide Web Consortium (W3C) to define these APIs in HTML are underway. However, it is important to note that not all platforms utilize HTML, necessitating the presence of similar APIs across all platforms. Once standardized APIs for media capabilities detection are established, developing a standardized method for signaling these capabilities to servers becomes essential. This facilitates targeting specific devices based on their capabilities and enables actions such as manifest filtering.

Standardization efforts are vital for simplifying content publishing and enhancing compatibility in the streaming industry. By establishing standardized specifications and APIs, the industry can overcome compatibility challenges and streamline the development and distribution of streaming content.

Leveraging These Advancements Is Imperative

The evolution of streaming technologies has brought about significant advancements in digital media distribution and delivery. Pioneers like Alex Zambelli have played a crucial role in driving innovation and pushing the boundaries of what is possible in online streaming. The implementation of multi-view camera angles, considerations in livestreaming, advancements in video codecs and streaming protocols, and optimization of encoding quality and cost are key areas that shape the streaming landscape. Standardization efforts, hardware and software DRM implementations, and the role of organizations like the CTA further contribute to enhancing compatibility and simplifying content publishing in the streaming industry. As the streaming industry continues to evolve, leveraging these advancements and best practices is imperative to deliver high-quality, seamless streaming experiences to audiences worldwide.

How Scaling Method and Technique Impacts Quality and Throughput

The thing about FFmpeg is that there are almost always multiple ways to accomplish the same basic function. In this post, we look at four approaches to scaling to reveal how the scaling method and techniques used impact quality and throughput.

We found that if you’re scaling using the default -s function (-s 1280x720), you’re leaving a bit of quality on the table compared to other methods. How much depends upon the metric you prefer; about ten percent if you’re a VMAF (hand raised here) or SSIM fan, much less if you still bow to the PSNR gods. More importantly, if you’re chasing throughput via cascaded scaling with fast scaling algorithms (flags=fast_bilinear), you’re probably losing quality without a meaningful throughput increase.

That’s the TL/DR; here’s the backstory.

The Backstory

NETINT sells ASIC-based hardware transcoders. One key advantage over software-only/CPU-based encoding is throughput, so we perform lots of hardware vs. software benchmarking. Fairness dictates that we use the most efficient FFmpeg command string when benchmarking software-only encoding.

In addition, the NETINT T408 transcoder scales in software using the host CPU, so we have a vested interest in techniques that increase throughput for T408 transcodes. In contrast, the NETINT Quadra scales and performs overlays in hardware and provides an AI engine, which is why it’s designated a Video Processing Unit (VPU) rather than a transcoder.

One proposed scaling technique for accelerating both software-only and T408 processing is cascaded scaling, where you create a filter complex that starts at full resolution, scales to the next lower resolution, then uses that lower resolution as the source for the next lower resolution, and so on. Here’s an example.

filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=fast_bilinear,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=fast_bilinear,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=fast_bilinear,split=2[out720p][in720p];[in720p]scale=640:360:flags=fast_bilinear[out360p]"

So, rather than performing each scale from the full-resolution source to the target (4K > 2K, 4K > 1080p, 4K > 720p, 4K > 360p), you’re performing successive scales from progressively lower-resolution sources (4K > 2K > 1080p > 720p > 360p). The theory was that this would reduce CPU cycles and improve throughput, particularly when coupled with a fast scaling algorithm. Even assuming a performance increase (which turned out to be a bad assumption), the obvious concern is quality: how much does quality degrade because the lower-resolution transcodes are working from a lower-resolution source?

In contrast, if you’ve read this far, you know that the typical scaling technique used by most beginning FFmpeg producers is the -s option (-s 1280x720). For each rung below 4K, FFmpeg scales the source footage down to the target resolution using the default bicubic scaling algorithm.
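
In other words, the two commands below should produce essentially the same result, since -s simply inserts a scale filter with FFmpeg’s default bicubic flags (a simplified sketch for illustration):

# Shorthand: -s inserts a scale filter using the default (bicubic) algorithm.
ffmpeg -y -i source.mp4 -s 1280x720 -c:v libx265 out_s.mp4
# Explicit equivalent:
ffmpeg -y -i source.mp4 -vf scale=1280:720:flags=bicubic -c:v libx265 out_vf.mp4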

So, we had two proposed methods which I expanded to four, as follows.

  • Default (-s 1280x720)
  • Cascade using fast bilinear
  • Cascade using Lanczos
  • Video filter using Lanczos (-vf scale=1280:720 -sws_flags lanczos)

I tested the following encoding ladder using the HEVC codec.

  • 4K @ 12 Mbps
  • 2K @ 7 Mbps
  • 1080p @ 3.5 Mbps
  • 1080p @ 1.8 Mbps
  • 720p @ 1 Mbps
  • 360p @ 500 kbps

I encoded two 3-minute 4Kp30 files, excerpts from the Netflix Meridian and Harmonic Football test clips, using the x265 codec and ultrafast preset. You can see the full command strings at the end of the article. I measured throughput in frames per second and measured the quality of the 2K through 360p rungs with VMAF, PSNR, and SSIM, compiling the results into BD-Rate comparisons in Excel.

I tested on a Dell Precision 7820 tower driven by two 2.9 GHz Intel Xeon Gold 6226R CPUs running Windows 10 Pro for Workstations with 64 GB of RAM. I tested with FFmpeg 5.0, a version downloaded from www.gyan.dev on December 15, 2022.
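
For readers who want to reproduce the metric side, this is the general shape of the measurement (a sketch, not the exact commands used for this article): upscale each encode back to the 4K source resolution, then score it against the source.

# VMAF: the first input is the distorted encode, the second is the reference.
ffmpeg -i Fball_x265_720p_1M_default.mp4 -i football_4K30_all_264_short.mp4 -lavfi "[0:v]scale=3840:2160:flags=lanczos[dist];[dist][1:v]libvmaf" -f null -
# PSNR and SSIM follow the same pattern via the psnr and ssim filters.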

Performance

Table 1. FPS by scaling method.

Table 1 shows that cascading delivered negligible performance benefits with the two test files and the selected encoding parameters. I asked the engineer who suggested the cascading scaling approach why we saw no throughput increase. Here’s a brief exchange. 

Engineer: It’s not going to make any performance difference in your example anyways but it does reduce the scaling load

       Me: Why wouldn’t it make a performance difference if it reduces the scaling load?

Engineer: Because, as your example has shown, the x265 encoding load dominates. It would make a very small difference

       Me: Ah, so the slowest, most CPU-intensive process controls overall performance.

Engineer: Yes, when you compare 1000+1 with 1000+10 there is not too much difference.

What this means, of course, is that these results may vary by the codec. If you’re encoding with H.264, which is much faster, cascading scaling might increase throughput. If you’re encoding with AV1 or VVC, almost certainly not.

Given that the T408 transcoder is multiple times faster than real-time, I’m now wondering if cascaded scaling might increase throughput when producing with the T408. You probably wouldn’t attempt this approach if quality suffered, but what if cascaded scaling improved quality? Sound far-fetched? Read on.

Quality Results

Table 2 shows the combined VMAF results for the two clips. Read this by choosing a row and moving from column to column. As you would suspect, green is good, and red is bad. So, for the Default row, that technique produces the same quality as Cascade – Fast Bilinear with a bitrate reduction of 18.55%. However, you’d have to boost the bitrate by 12.89% and 11.24%, respectively, to match the quality of Cascade – Lanczos and Video Filter – Lanczos.

Table 2. BD-Rate comparisons for the four techniques using the VMAF metric.

From a quality perspective, the Cascade approach combined with the fast bilinear algorithm was the clear loser, particularly compared to either method using the Lanczos algorithm. Even if there was a substantial performance increase, which there wasn’t, it’s hard to see a relevant use case for this algorithm.

The most interesting takeaway was that cascading scaling with the Lanczos algorithm produced the best results, slightly higher than using a video filter with Lanczos. The same pattern emerged for PSNR, where Cascade – Lanc was green in all three columns, indicating the highest-quality approach. 

Table 3. BD-Rate comparisons for the four techniques using the PSNR metric.

Ditto for SSIM.

Table 4. BD-Rate comparisons for the four techniques using the SSIM metric.

The cascading approach delivering better quality than the video filter was an anomaly. Not surprisingly, the engineer noted:

Engineer: It is odd that cascading with Lanczos has better quality than direct scaling. I’m not sure why that would be.

       Me: Makes absolutely no sense. Is anything funky in the two command strings?

Engineer: Nothing obvious but I can look some more.

Later analysis yielded no epiphanies. Perhaps one will come from a reader.

The Net Net

First, the normal caveats: your mileage may vary by codec and content. My takeaways are:

  • Try cascading scaling with Lanczos with the T408.
  • For software encodes, never use -s again; use cascading scaling or the simpler video filter approach instead.
  • With most software-based encoders, faster scaling methods may not deliver performance increases but could degrade quality.

Further, as we all know, there are several, if not dozens of, additional approaches to scaling; if you have meaningful results that prove one is substantially better, please share them with me via email.

Finally, taking a macro view, it’s worth remembering that a $12,000+ workstation could manage only 25 fps when producing a live 4K ladder to HEVC using x265’s ultrafast preset. Sure, there are faster software encoders available. Still, hardware encoding is the best answer for affordable live 4K transcoding from both an OPEX and a CAPEX perspective.

Command Strings:

Default:

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_12M_default.mp4 ^

-s 2560x1440 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_7M_default.mp4  ^

-s 1920x1080 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_3_5M_default.mp4 ^

-s 1920x1080 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_1_8M_default.mp4 ^

-s 1280x720  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_1M_default.mp4 ^

-s 640x360  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_500K_default.mp4

Cascade – Fast Bilinear

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=fast_bilinear,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=fast_bilinear,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=fast_bilinear,split=2[out720p][in720p];[in720p]scale=640:360:flags=fast_bilinear[out360p]" ^

-map [out4k] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_cascade_12M_fast_bi.mp4 ^

-map [out1440p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_cascade_7M_fast_bi.mp4  ^

-map [out1080p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_3_5M_fast_bi.mp4 ^

-map [out1080p2] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_1_8M_fast_bi.mp4 ^

-map [out720p]  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_8_bit_cascade_1M_fast_bi.mp4 ^

-map [out360p]  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_8_bit_cascade_500K_fast_bi.mp4

Cascade – Lanczos

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=lanczos,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=lanczos,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=lanczos,split=2[out720p][in720p];[in720p]scale=640:360:flags=lanczos[out360p]" ^

-map [out4k] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_cascade_12M_lanc.mp4 ^

-map [out1440p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_cascade_7M_lanc.mp4  ^

-map [out1080p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_3_5M_lanc.mp4 ^

-map [out1080p2] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_1_8M_lanc.mp4 ^

-map [out720p]  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_8_bit_cascade_1M_lanc.mp4 ^

-map [out360p]  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_cascade_500K_lanc.mp4

Video Filter – Lanczos

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_12M_filter_lanc.mp4 ^

-vf scale=2560:1440 -sws_flags lanczos -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_7M_filter_lanc.mp4  ^

-vf scale=1920:1080 -sws_flags lanczos  -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_3_5M_filter_lanc.mp4 ^

-vf scale=1920:1080 -sws_flags lanczos  -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_1_8M_filter_lanc.mp4 ^

-vf scale=1280:720 -sws_flags lanczos -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_1M_filter_lanc.mp4 ^

-vf scale=640:360 -sws_flags lanczos  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_500K_filter_lanc.mp4

Meta AV1 Delivery Presentation: Six Key Takeaways

One of the most gracious things that large companies like Meta and Netflix do is share their knowledge with the broader community. On November 3, Meta hosted Video @Scale Fall 2022, which featured multiple speakers from Meta and other companies. If you’re unfamiliar with the event, here’s the description: “Designed for engineers that develop or manage large-scale video systems serving millions of people.”

Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta.

One talk drew my attention: Meta’s Ryan Lei speaking on Scaling AV1 End-To-End Delivery at Meta. Watch above or use this link: https://bit.ly/Lei_AV1

For perspective, where Netflix has focused AV1 distribution on Smart TVs, Meta’s focus is mobile. Briefly, the company started delivering “AV1-encoded FB/IG Reels videos to selected iPhone and Android devices” in 2022. Lei’s talk included encoding, decoding, and some observations about the bandwidth savings, improved MOS scores, and increased viewing time that AV1 delivered.

Here are my top 6 takeaways from Lei’s excellent presentation.

1. Meta Finds that AV1 is 30% More Efficient than HEVC/VP9

As you’ll learn later in this article, Meta relies upon software playback on iOS and Android platforms. Since both platforms support HEVC decoding, iOS in hardware (since 2017) and Android mostly in hardware but also in software, it’s reasonable to ask why Meta didn’t just use HEVC.

The answer is that in Meta’s own tests, AV1 proved 30% more efficient than both VP9 and HEVC, about 21% lower than the 38% efficiency advantage I found in this study by Streaming Media. Lei didn’t discuss HEVC in his presentation, but you’d have to guess that Meta chose AV1 over HEVC because the superior quality AV1 was able to deliver outweighed the potential impact of software playback on mobile device battery life.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

2. Meta Encodes with SVT-AV1 For Video On Demand (VOD)

The chart shown below tracks the encoding time and quality levels of the open-source encoders shown on the upper right, which include libaom-av1 (AV1), libvpx (VP9), x265 (HEVC), x264 (AVC), vvenc (VVC), and SVT-AV1 (AV1).

Here’s how Lei interpreted this data. “From this graph, we see that SVT-AV1 maintains a consistent performance across a wide range of complexity levels. No matter for an encoding efficiency or compute efficiency point of view, SVT-AV1 always achieves the most optimal results among open-source encoders.” Again, these results track my own findings, at least as they relate to SVT-AV1 compared to libaom.

Interestingly, the chart only tracks software encoders, not hardware encoders, which present a completely different quality/encoding-time curve. You’ll see why this is important at the end of this post.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

3. Meta Creates Their Encoding Ladder Using the Convex Hull

There are many forms of per-title encoding. Some, like YouTube’s, are based on machine learning, while others, like Netflix’s, are based on multiple encodes to find the convex hull. Since Meta’s encoding task is much closer to YouTube’s than Netflix’s (high-volume UGC), you might assume that Meta uses AI as well.

However, Meta actually uses the convex hull, a brute force technique that involves encoding at multiple resolutions and multiple bitrates to find the combination that comprises the convex hull for that video. In the example shown below, Meta encoded at seven resolutions and five CRF levels, a total of 35 encodes. To compute the convex hull, Meta plots the 35 data points and then draws a line connecting the points on the upper left boundary. The points on the convex hull are the optimal encoding configuration for that video.

As Lei points out, “the complexity of this process is quite high.” To reduce the complexity, Meta uses techniques like computing the convex hull with high-speed presets, and then encoding the selected resolution and CRF points using higher-quality presets for final delivery. Lei noted that though there are more encodes using this hybrid approach, as the optimal configurations are encoded twice, overall encoding time is reduced. 
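
Here’s a minimal bash sketch of the brute-force sweep Lei describes, assuming a recent FFmpeg with libsvtav1’s -crf option; the resolutions, CRF values, and preset are illustrative, not Meta’s production settings:

# Encode every (resolution, CRF) pair with a fast preset.
for H in 2160 1440 1080 720 540 360 270; do
  for CRF in 23 28 33 38 43; do
    ffmpeg -y -i src.mp4 -vf scale=-2:$H -c:v libsvtav1 -preset 12 -crf $CRF "enc_${H}_${CRF}.mp4"
  done
done
# Then score each encode (e.g., with VMAF), plot bitrate against quality, and keep only the points on the upper-left boundary (the convex hull), re-encoding those winners with a slower preset for final delivery.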

Just to state the obvious, this approach only works for video on demand, not live. Even with the fastest hardware encoders, you can’t produce 35 iterations to identify the optimal five. This indicates that Meta uses a different schema for live transcoding, which Lei doesn’t address.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

4. Meta Uses the Convex Hull Computed for AVC for VP9 and AV1

Like most large publishers, Meta encodes using multiple codecs like H.264, VP9, and AV1 to deliver to different devices. One surprising revelation was that Meta uses the convex hull computed for H.264 to guide the convex hull implementations for the VP9 and AV1 encodes.

Lei didn’t explain how this works. As you can see in the figure below, the resolutions and bitrates for the three codecs are obviously different, which is what you would expect, so there must be some kind of interpolation of the convex hull information from one codec to another. Whatever the mechanism, you can see that VP9 delivers a 48% bitrate savings over the top H.264 ladder rung, while AV1 delivers 65%.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

5. Apple and Android Phones Present Completely Different Challenges

Again, no surprise. There are many fewer Apple devices, and all are premium high-performance models. In contrast, there’s a much greater range of Android devices, from low-cost/low-performance options to models that rival Apple in cost and performance.

Lei shared that Facebook tests Android devices to determine eligibility for AV1 videos. As you can see in the slide below, Meta delivers much different quality to iOS and Android devices.

It was clear from Lei’s talk that delivering AV1 to Apple phones was relatively simple compared to sending AV1 video to Android phones. This is actually the reverse of what you might expect, as iOS doesn’t support AV1 natively while Android does. Though you can deliver video via an app to iOS devices, as Meta does, Safari doesn’t support it. And even though Android does support AV1 playback natively, you’ll have to implement some type of testing protocol—like Meta—to ensure smooth playback until AV1 hardware support becomes pervasive, which probably won’t be until 2024 or beyond.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

6. AV1 has Delivered in Several Key Metrics

Integrating a new codec into your encoding and delivery pipeline isn’t trivial. So, the big question is, was AV1 worth it? The slide below displays three graphs. Sorry that the quality in the original slide is suboptimal, but here’s the net/net.

The graph on the top left shows the week-over-week playback MOS for all videos played on an iPhone, revealing about a 0.6-point improvement. Since MOS (Mean Opinion Score) is usually computed on a scale from 1 to 5, 0.6 is a significant number. The second graph, on the upper right, charts the bitrate of all videos delivered and shows about a 12% bitrate reduction.

The bottom chart presents the average iPhone watch time for the different codecs used in Facebook Reels and shows that AV1’s share of watch time rose to about 70% within the first week after rollout. This doesn’t mean that AV1 increased watch time; rather, it shows that a significant number of devices were able to play AV1, which is how AV1 delivered the MOS improvement and bitrate reduction shown in the top two charts.

Slide from Ryan Lei’s presentation, Scaling AV1 End-To-End Delivery at Meta.

Lei’s talk was about 18 minutes long, and there are many more useful data points and observations than I’ve presented here. Again, here’s the link: https://bit.ly/Lei_AV1. If you’re considering deploying AV1 for VOD encoding in your organization, you’ll find the encoding-related portions of Lei’s talk illuminating.


What about live? Lei didn’t address it, but you can take some guidance from the fact that Meta recently announced their own Video Processing ASIC. After the announcement, David Ronca, Director, Video Encoding at Meta, commented that “ASICs are able to deliver video quality on par with SW encoders with significantly improved power efficiency. Because of the rapid commoditization of video processing, rising energy costs, and pollution concerns, Video Processing ASICS are inevitable.”

At NETINT, we’ve been shipping transcoders based upon custom encoding ASICs since 2019 and have real market validation of Ronca’s comments. While software encoding may be appropriate for VOD, ASIC-based transcoders are superior, if not essential, for live transcoding.

Returning to Lei’s talk: whether you’re distributing VOD or live AV1 streams, his descriptions of the challenges of AV1 delivery to mobile will be instructive to all.