AV1 Capped CRF Encoding with Quadra VPU

Jan Ozer-AV1 Capped CRF-featured image-B

We’ve previously reported results for capped CRF encoding for H.264 and HEVC using NETINT Quadra video processing units (VPU). This post will detail AV1 performance, including both 1080p and 4K data.

For those with limited time, here’s what you need to know: Capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same cost or lower than CBR. NETINT VPUs are the first hardware video encoders to adopt Capped CRF across the three most popular codecs in use today, AV1, HEVC, and H.264.

You can read a quick description of capped CRF here and get a deep dive with H.264 and HEVC performance results here

CAPPED CRF OVERVIEW

Briefly, capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bitrate cap. Unlike variable bitrate encoding (VBR) and constant bitrate encoding (CBR), which target specific bitrates, capped CRF targets a specific quality level, which is controlled by the CRF value. You also set a bitrate cap, which is applied if the encoder can’t meet the quality level below the bitrate cap.

On easy-to-encode videos, the CRF value sets the quality level, which it can usually achieve below the bitrate cap. In these cases, capped CRF typically delivers bitrate savings over CBR-encoded footage while delivering similar quality. For harder-to-encode footage, the bitrate cap usually controls, and capped CRF delivers close to the same quality and bitrate as CBR.

The value proposition is clear: lower bitrates and good quality during easy scenes, and similar to CBR in bitrate and quality for harder scenes. I’m not addressing VBR because NETINT’s focus is live streaming, where CBR usage dominates. If you’re analyzing capped CRF for VOD, you would compare against 2-pass VBR as well as potentially CBR.

One last detail. CRF values have an inverse relationship to quality and bitrate; the higher the CRF value, the lower the quality and bitrate. In general, video engineers select a CRF value that delivers their target quality level. For premium content, you might target an average VMAF score of 95. For user-generated content or training videos, you might target 93 or even lower. As you’ll see, the lower the quality score, the greater the bandwidth savings.

1080p RESULTS

We show 1080p results in Table 1, which is divided between easy-to-encode and hard-to-encode content. We encoded the CBR clips to 4.5 Mbps and applied the same cap for capped CRF encoding.

Jan Ozer-AV1 Capped CRF-1
Table 1. 1080p results using Quadra VPU and capped CRF encoding.

You see that in CBR mode, Quadra VPUs do not reach the target rate as accurately as when using capped CRF mode. This won’t degrade viewer quality of experience since the VMAF scores exceed 95, so this missing on the low side saves excess bandwidth with no visual quality detriment.

In this comparison, bitrate savings is minimized, particularly at CRF 19 and 21, as the capped CRF clips in the hard-to-encode content have a higher bitrate than the CBR counterparts (4,419 and 4,092 to 3,889). Not surprisingly, CRF 19 and 21 deliver little bandwidth savings and a slighly higher quality than CBR.

At CRF 23, things get interesting, with an overall bandwidth savings of 16.1% with a negligible quality delta from CBR. With a VMAF score of around 95, CRF 23 might be the target for engineers delivering premium content. Engineers targeting slightly lower quality can choose CRF 27 and achieve a bitrate savings of 43%, and an efficient 2.4 Mbps bit rate for hard-to-encode footage. At CRF 27, Quadra VPUs encoded the hard-to-encode Football clip at 3,999 kbps with an impressive VMAF score of 93.39.

Note that as with H.264 and HEVC, AV1 capped CRF does reduce throughput. Specifically, a single Quadra VPU installed in a 32-core workstation outputs 23 simultaneous CBR streams using CBR encoding. This dropped to eighteen for capped CRF, a reduction of 22%.

4K RESULTS

Many engineers encoding with AV1 are delivering UHD content, so we ran similar tests with the Quadra and 4K30 8-bit content with a CBR target and bitrate cap of 16 Mbps. Using four clips, including a 4K version of the high-motion Football clip to much less dynamic content like Netflix’s Meridian clip and Blender Foundation’s Sintel.

Table 2. 4K results for the Quadra VPU and capped CRF encoding.

In CBR mode, the Quadra VPU hit the bitrate target much more accurately at 4K than 1080p, so even at CRF 19, the VPU delivered a 13% bitrate savings with a VMAF score of 96.23. Again, CRF 23 delivered a VMAF score of very close to 95, with 45% savings over CBR. Impressively, at CRF 23, Quadra delivered an overall VMAF score of 94.87 for these 4K clips at 7.78 Mbps, and that’s with the Football clip weighing in at 14.3 Mbps.

Of course, these savings directly relate to the cap and CBR target. It’s certainly fair to argue that 16 Mbps is excessive for 4K AV1-encoded content, though Apple recommends 16.8 for 8-bit 4K content with HEVC here.

The point is, when you encode with CBR, you’re limiting quality to control bandwidth costs. With capped CRF, you can set the cap higher than your CBR target, knowing that all content contains easy-to-encode regions that will balance out the impact of the higher cap and deliver similar or lower bandwidth costs. With these comparative settings, capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same cost or lower than CBR.

DENSER / LEANER / GREENER : Symposium on Building Your Own Streaming Cloud

Get Free CAE on NETINT VPUs with Capped CRF

Capped CRF
NETINT recently added capped CRF to the rate control mechanism across our Video Processing Unit (VPU) product lines. With the wide adoption of content-adaptive encoding techniques (CAE), constant rate factor (CRF) encoding with a bit rate cap gained popularity as a lightweight form of CAE to reduce the bitrate of easy-to-encode sequences, saving delivery bandwidth with constant video quality. It’s a mode that we expect many of our customers to use, and this document will explain what it is, how it works, and how to get the most use from the feature.

In addition to working with H.264, HEVC, and AV1 on the Quadra VPU line, capped CRF works with H.264 and HEVC on the T408 and T432 video transcoders. This document details how to encode with capped CRF using the H.264 and HEVC codecs on Quadra VPUs, though most application scenarios apply to all codecs across the NETINT VPU lines.

What is Capped CRF and How Does it Work?

Capped CRF is a bitrate control technique that combines constant rate factor (CRF) encoding with a bit rate cap. Multiple codecs and software encoders support it, including x264 and x265 within FFmpeg. In contrast to CBR and VBR encoding, which encode to a specified target bitrate (and ignore output quality), CRF encodes to a specified quality level and ignores the bitrate.

CRF values range from 0-51, with lower numbers delivering higher quality at higher bitrates (less savings) and higher CRF values delivering lower quality levels at lower bitrates (more bitrate savings). Many encoding engineers will utilize values spanning 21 to 23. Which is right for you? As you will read below, your desired quality and bitrate savings balance determines the best value for your use case.

For example, with the x264 codec, if you transcode to CRF 23, the encoder typically outputs a file with a VMAF quality of 93-95. If that file is a 4K60 soccer match, the bitrate might be 30 Mbps. If it’s a 1080p talking head, it might be 1.2 Mbps. Because CRF delivers a known quality level, it’s ideal for creating archival copies of videos. However, since there’s no bitrate control, in most instances, CRF alone is unusable for streaming delivery.

When you combine CRF with a bit rate cap, you get the best of both worlds, a bit rate reduction with consistent quality for easy-to-encode clips and similar to CBR quality and bitrate or more complex clips.

Here’s how capped CRF could be used with the Quadra VPU:

ffmpeg -i input crf=23:vbvBufferSize=1000:bitrate=6000000 output

The relevant elements are:

  • CRF=23 – sets the quality target at around 95 VMAF

  • vbvBufferSize=1000 – sets the VBV buffer to one second (1000 ms)

  • bitrate=6000000 – caps the bitrate at 6 Mbps.

These commands would produce a file that targets close to 95 VMAF quality but, in all cases, peaks at around 6 Mbps.

For a simple-to-encode talking head clip, Quadra produced a file with an average bitrate of 1,274 kbps and a VMAF score of 95.14. Figure 1 shows this output in a program called Bitrate Viewer. Since the entire file is under the 6 Mbps cap, the CRF value controls the bitrate throughout.

Encoding this clip with Quadra using CBR at 6 Mbps produced a file with a bit rate of 5.4 Mbps and a VMAF score of 97.50. Multiple studies have found that VMAF scores above 95 are not perceptible by viewers, so the extra 2.26 VMAF score doesn’t improve the viewer’s quality of experience (QoE). In this case, capped CRF reduces your bandwidth cost by 76% without impacting QoE.

Figure 1. Capped CRF encoding a simple-to-encode video in Bitrate Viewer.

You see this in Figure 2, showing the capped CRF frame with a VMAF score of 94.73 on the left and the CBR frame with a VMAF score of 97.2 on the right. The video on the right has a bit rate over 4 Mbps larger than the video on the left, but the viewer wouldn’t notice the difference.

Figure 2. Frames from the talkinghead clip. Capped CRF at 1.23 Mbps on the left,
CBR at 5.4 Mbps on the right. No viewer would notice the difference.

Figure 3 shows capped CRF operation with a hard-to-encode American football clip. The average bitrate is 5900 kbps, and the VMAF score is 94.5. You see that the bitrate for most of the file is pushing against the 6 Mbps cap, which means that the cap is the controlling element. In the two regions where there are slight dips, the CRF setting controls the quality.

Figure 3. Capped CRF encoding a hard-to-encode video in Bitrate Viewer.

In contrast, the CBR encode of the football clip produced a bit rate of 6,013 kbps and a VMAF score of  94.73. Netflix has stated that most viewers won’t notice a VMAF differential under 6 points, so a viewer would not perceive the .25 VMAF delta between the CBR and capped CRF file. In this case, capped CRF reduced delivery bandwidth by about 2% without impacting QoE.

Of course, as shown in Figure 2, the two-minute segment tested was almost all high motion. The typical sports broadcast contains many lower-motion sequences, including some commercials, cutting to the broadcasters, or during timeouts and penalty calls. In most cases, you would expect many more dips like those shown in Figure 2 and more substantial savings.

So, the benefits of capped CRF are as follows:

  • You can use a single ladder for all your content, automatically saving bitrate on easy-to-encode clips and delivering the equivalent QoE on hard-to-encode clips.
  • Even if you modify your ladder by type of content, you should save bandwidth on easy-to-encode regions within all broadcasts without impacting QoE.
  • Provides the benefit of CAE without the added integration complexity or extra technology licensing cost. Capped CRF is free across all NETINT VPU and video transcoder products.

Producing Capped CRF

Using the NETINT Quadra VPU series, the following commands for H.264 capped CRF will optimize video quality and deliver a file or stream with a fully compliant VBV buffer. As noted previously, this command string with the appropriate modifications to codec value will work across the entire NETINT product line. For example, to output HEVC, change -c:v h264_ni_quadra_enc to -c:v h265_ni_quadra_enc.

Here’s the command string.

ffmpeg -y -i input.mp4 -y -c:v h264_ni_quadra_enc -xcoder-params “gopPresetIdx=5:RcEnable=0:crf=23:intraPeriod=120:lookAheadDepth=10:cuLevelRCEnable=1:v
bvBufferSize=1000:bitrate=6000000:tolCtbRcInter=0:tolCtbRcIntra=0:zeroCopyMode=0″ output.mp4

Here’s a brief explanation of the encoding-related switches.

  • -c:v h264_ni_quadra_enc -xcoder-params – Selects Quadra’s H.264 codec and identifies the codec commands identified below.

  • gopPresetIdx=5 – this chooses the Group of Pictures (GOP) pattern, or the mixture of B-frame and P-frames within each GOP. You should be able to adjust this without impacting capped CRF performance.

  • RcEnable=0 – this disables rate control. You must use this setting to enable capped CRF.

  • crf=23 – this chooses the CRF value. You must include a CRF value within your command string to enable capped CRF.

  • intraPeriod=120 – This sets the GOP size to four seconds which we used for all tests. You can adjust this setting to your normal target without impacting CRF operation.

  • lookAheadDepth=10 – This sets the lookahead to 10 frames. You can adjust this setting to your normal target without impacting CRF operation.

  • cuLevelRCEnable=1 – this enables coding unit-level rate control. Do not adjust this setting without verifying output quality and VBV compliance.

  • vbvBufferSize=1000 – This sets the VBV buffer size. You must set this to trigger capped CRF operation.

  • bitrate=6000000 – This sets the bitrate. You must set this to trigger capped CRF operation. You can adjust this setting to your target without impacting CRF operation.

  • tolCtbRcInter=0 – This defines the tolerance of CU-level rate control for P-frames and B-frames. Do not adjust this setting without verifying output quality and VBV compliance.

  • tolCtbRcIntra=0 – This sets the tolerance of CU level rate control for I-frames. Do not adjust this setting without verifying output quality and VBV compliance.

  • zeroCopyMode=0 – this enables or disables the libxcoder zero copy feature. Do not adjust this setting without verifying output quality and VBV compliance.

You can access additional information about these controls in the Quadra Integration and Programming Guide.

Choosing the CRF Value and Bitrate Cap – H.264

Deploying capped CRF involves two significant decisions, choosing the CRF value and setting the bitrate cap. Choosing the CRF value is the most critical decision, so let’s begin there.

Table 1 shows the bitrate and VMAF quality of ten files encoded with the H.264 codec using the CRF values shown with a 6 Mbps cap and using CBR encoding with a 6 Mbps cap. The table presents the easy-to-encode files on top, showing clip-specific results and the average value for the category. The Delta from CBR shows the bitrate and VMAF differential from the CBR score. Then the table does the same for hard-to-encode clips, showing clip-specific results and the average value for the category. The bottom two rows present the overall average bitrate and VMAF values and the overall savings and quality differential from CBR.

Capped CRF - Table 1. CBR and capped CRF bitrates and VMAF scores for H.264 encoded clips.
Table 1. CBR and capped CRF bitrates and VMAF scores for H.264 encoded clips.

As mentioned, with CRF, lower values produce higher quality. In the table, CRF 19 produces the highest quality (and lowest bitrate savings), and CRF 27 delivers the lowest quality (and highest bitrate savings). What’s the right CRF value? The one that delivers the target VMAF score for your typical clips for your target audience.

For the test clips shown, CRF 19 produces an average quality of well over 95; as mentioned above, VMAF scores beyond 95 aren’t perceivable by the average viewer, so the extra bandwidth needed to deliver these files is wasted. Premium services should choose CRF values between 21-23 to achieve the top rung quality of around 95 VMAF scores. These deliver more significant bandwidth savings than CRF 19 while preserving the desired quality level. In contrast, commodity services should experiment with higher values like 25-27 to deliver slightly lower VMAF scores while achieving more significant bandwidth savings.

What bitrate cap should you select? CRF sets quality, while the bitrate cap sets the budget. In most cases, you should consider using your existing cap. As we’ve seen, with easy-to-encode clips, capped CRF should deliver about the same quality of experience with the potential for bitrate savings. For hard-to-encode clips, capped CRF should deliver the same QoE with the potential for some bitrate savings on easy-to-encode sections of your broadcast.

Note that identifying the optimal CRF value will vary according to the complexity of your video files, as well as frame rate, resolution, and bitrate cap. If you plan to implement capped CRF with Quadra or any encoder, you should run similar tests on your standard test clips using your encoding parameters and draw your own conclusions.

Now let’s examine capped CRF and HEVC.

Choosing the CRF Value and Bitrate Cap – HEVC

Table 2 shows the results of HEVC encodes using CBR at 4.5 Mbps and the specified CRF values with a cap of 4.5 Mbps. With these test clips and encoding parameters, Quadra’s CRF values produce nearly the same result, with CRF values 21-23 appropriate for premium services and 25 – 27 good settings for UGC content.

Capped CRF - Table 2. CBR and capped CRF bitrates and VMAF scores for HEVC encoded clips.
Table 2. CBR and capped CRF bitrates and VMAF scores for HEVC encoded clips.

Again, the cap is yours to set; we arbitrarily reduced the H.264 bitrate cap of 6 Mbps by 25% to determine the 4.5 Mbps cap for HEVC.

Capped CRF Performance

Note that as currently tested, capped CRF comes with a modest performance hit, as shown in Table 3. Specifically, in CBR mode, Quadra output twenty 1080p30 H.264-encoded streams. This dropped to sixteen using capped CRF, a reduction of 20%.

For HEVC, throughput dropped from twenty-three to eighteen 1080p30 streams, a reduction of about 22%. We performed all tests using CRF 21, with a 6 Mbps cap for H.264 and 4.5 Mbps for HEVC. Note that these are early days in the CRF implementation, and it may be that this performance delta is reduced or even eliminated over time.

Capped CRF - Table 3. 1080p30 outputs produced using the techniques shown.
Table 3. 1080p30 outputs produced using the techniques shown.

We installed the Quadra in a workstation powered by a 3.6 GHz AMD Ryzen 5 5600X 6-Core Processor running Ubuntu 18.04.6 LTS with 16 GB of RAM. As you can see in the table, we also tested output for the x264 codec in FFmpeg using the medium and veryfast presets, producing two and five 1080p30 outputs, respectively. For x265, we tested using the medium and ultrafast presets and the workstation produced one and three 1080p30 streams.

Even at the reduced throughput, Quadra’s CRF output dwarfs the CPU-only output. When you consider that the NETINT Quadra Video Server packs ten Quadra VPUs into a single 1RU form factor, you get a sense of how VPUs offer unparalleled density and the industry’s lowest cost per stream and power consumption per stream.

Bandwidth is one of the most significant costs for all live-streaming productions. In many applications, capped CRF with the NETINT Quadra delivers a real opportunity to reduce bandwidth cost with no perceived impact on viewer quality of experience.