AV1 Capped CRF Encoding with Quadra VPU


We’ve previously reported results for capped CRF encoding for H.264 and HEVC using NETINT Quadra video processing units (VPUs). This post will detail AV1 performance, including both 1080p and 4K data.

For those with limited time, here’s what you need to know: Capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same or lower cost than CBR. NETINT VPUs are the first hardware video encoders to adopt capped CRF across the three most popular codecs in use today: AV1, HEVC, and H.264.

You can read a quick description of capped CRF here, and get a deep dive with H.264 and HEVC performance results here.

CAPPED CRF OVERVIEW

Briefly, capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bitrate cap. Unlike variable bitrate (VBR) and constant bitrate (CBR) encoding, which target specific bitrates, capped CRF targets a specific quality level, which is controlled by the CRF value. You also set a bitrate cap, which is applied if the encoder can’t meet the quality level below the bitrate cap.

On easy-to-encode videos, the CRF value sets the quality level, which the encoder can usually achieve below the bitrate cap. In these cases, capped CRF typically delivers bitrate savings over CBR-encoded footage while delivering similar quality. For harder-to-encode footage, the bitrate cap usually controls, and capped CRF delivers close to the same quality and bitrate as CBR.
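To make this concrete, here is a minimal sketch of both approaches using FFmpeg’s software SVT-AV1 encoder. The flag names are FFmpeg’s, CRF scales differ between encoders, and NETINT’s Quadra integration uses its own encoder names and parameters, so treat this as an illustration of the technique rather than a Quadra command line.

# Capped CRF: target CRF-level quality, but never exceed the 4.5 Mbps cap
# (requires a recent FFmpeg/SVT-AV1 build with capped CRF support)
ffmpeg -i input.mp4 -c:v libsvtav1 -crf 23 -maxrate 4500k -bufsize 9000k capped_crf.mp4

# CBR-style comparison point at the same 4.5 Mbps target
ffmpeg -i input.mp4 -c:v libsvtav1 -b:v 4500k -maxrate 4500k -bufsize 9000k cbr.mp4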

The value proposition is clear: lower bitrates and good quality during easy scenes, and similar bitrate and quality to CBR for harder scenes. I’m not addressing VBR because NETINT’s focus is live streaming, where CBR usage dominates. If you’re analyzing capped CRF for VOD, you would compare against 2-pass VBR as well as potentially CBR.

One last detail. CRF values have an inverse relationship to quality and bitrate; the higher the CRF value, the lower the quality and bitrate. In general, video engineers select a CRF value that delivers their target quality level. For premium content, you might target an average VMAF score of 95. For user-generated content or training videos, you might target 93 or even lower. As you’ll see, the lower the quality score, the greater the bandwidth savings.
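If you’re calibrating CRF values against a VMAF target, FFmpeg’s libvmaf filter (assuming your build includes it) offers a simple way to score an encode against its source:

# Score an encode against the original; the first input is the distorted
# file, the second is the reference
ffmpeg -i capped_crf.mp4 -i source.mp4 -lavfi libvmaf -f null -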

1080p RESULTS

We show 1080p results in Table 1, which is divided between easy-to-encode and hard-to-encode content. We encoded the CBR clips to 4.5 Mbps and applied the same cap for capped CRF encoding.

Table 1. 1080p results using Quadra VPU and capped CRF encoding.

You see that in CBR mode, Quadra VPUs do not reach the target bitrate as accurately as in capped CRF mode. This won’t degrade the viewer’s quality of experience since the VMAF scores exceed 95; undershooting on the low side simply saves bandwidth with no visual quality detriment.

In this comparison, bitrate savings are minimal at CRF 19 and 21 because the capped CRF clips in the hard-to-encode content have a higher bitrate than their CBR counterparts (4,419 and 4,092 kbps versus 3,889 kbps). Not surprisingly, CRF 19 and 21 deliver little bandwidth savings and slightly higher quality than CBR.

At CRF 23, things get interesting, with an overall bandwidth savings of 16.1% and a negligible quality delta from CBR. With a VMAF score of around 95, CRF 23 might be the target for engineers delivering premium content. Engineers targeting slightly lower quality can choose CRF 27 and achieve a bitrate savings of 43%, at an efficient overall average bitrate of 2.4 Mbps. At CRF 27, Quadra VPUs encoded the hard-to-encode Football clip at 3,999 kbps with an impressive VMAF score of 93.39.

Note that as with H.264 and HEVC, AV1 capped CRF does reduce throughput. Specifically, a single Quadra VPU installed in a 32-core workstation outputs 23 simultaneous streams when encoding with CBR. This dropped to 18 streams with capped CRF, a reduction of 22%.

4K RESULTS

Many engineers encoding with AV1 are delivering UHD content, so we ran similar tests with the Quadra and 4K30 8-bit content, with a CBR target and bitrate cap of 16 Mbps. We used four clips, ranging from a 4K version of the high-motion Football clip to much less dynamic content like Netflix’s Meridian clip and Blender Foundation’s Sintel.
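For reference, here’s the same illustrative software-encoder sketch adjusted for these 4K tests; again, this shows the technique, not the actual Quadra command line:

# 4K capped CRF: quality target plus the 16 Mbps ceiling used in these tests
ffmpeg -i input_4k.mp4 -c:v libsvtav1 -crf 23 -maxrate 16M -bufsize 32M capped_4k.mp4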

Table 2. 4K results for the Quadra VPU and capped CRF encoding.

In CBR mode, the Quadra VPU hit the bitrate target much more accurately at 4K than at 1080p, so even at CRF 19, the VPU delivered a 13% bitrate savings with a VMAF score of 96.23. Again, CRF 23 delivered a VMAF score very close to 95, with 45% savings over CBR. Impressively, at CRF 23, Quadra delivered an overall VMAF score of 94.87 for these 4K clips at 7.78 Mbps, and that’s with the Football clip weighing in at 14.3 Mbps.

Of course, these savings directly relate to the cap and CBR target. It’s certainly fair to argue that 16 Mbps is excessive for 4K AV1-encoded content, though Apple recommends 16.8 Mbps for 8-bit 4K content with HEVC here.

The point is, when you encode with CBR, you’re limiting quality to control bandwidth costs. With capped CRF, you can set the cap higher than your CBR target, knowing that all content contains easy-to-encode regions that will balance out the impact of the higher cap and deliver similar or lower bandwidth costs. With these comparative settings, capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same or lower cost than CBR.

DENSER / LEANER / GREENER: Symposium on Building Your Own Streaming Cloud

Choosing a Co-Location Facility


Edgio is the result of the 2022 merger between Limelight Networks and EdgeCast, which produced a company with over 20 years of experience choosing and installing its own equipment in co-location facilities.

With customers like Disney, ESPN, Amazon, and Verizon, Edgio has had to manage both explosive growth and exceptionally high expectations.

So, there’s no better source to help you learn to choose a co-location provider than Kyle Faber, Head of CDN Product Delivery Management at Edgio. He’s got experience, and as you’ll see below, the pictures to prove it.

Kyle starts with a description of the math involved in deciding whether co-location is the right direction for your organization, and then works through must-have and nice-to-have co-location features. He covers the value of certifications and the importance of redundancy and temperature management, explores connectivity, support, and cost considerations, and finishes with a look at sustainability. It’s a deep and comprehensive look at choosing a co-location provider, and information that anyone facing this decision will find invaluable.


NAVIGATE THE COMPLEXITIES OF PRIVATE COLOCATION DECISIONS

Kyle started by addressing the considerations video engineers should prioritize when contemplating the shift to private co-location. In the context of modern public cloud computing platforms, he asserted that the decision to opt for private colocation requires a higher level of scrutiny due to the advanced capabilities of cloud offerings. While some enterprises rely solely on public cloud solutions for their production stack, there are compelling reasons to explore private colocation options.


He outlined his talk as follows:

  • First, he detailed a methodology for considering your financial break-even.
  • Then, he identified the “must-have” features that a co-location provider must offer.
  • Next, he related the nice-to-have but not essential features that are potentially negotiable based on your organization’s goals.
  • He concluded with insight into how to balance the cloud vs. co-location decision, sharing that “it’s not a zero-sum game.”

As you’ll see, throughout the talk, Kyle provided practical insights to help video engineers navigate the complexities of private colocation decisions. He emphasized understanding the factors influencing these choices and making informed decisions based on an organization’s unique circumstances.

UNDERSTANDING THE MATH AND BREAKEVEN PRINCIPLES


Kyle started the economic discussion with the economics of minimum load and its relevance to private co-location decisions for video engineers. Using an everyday analogy, Kyle drew parallels between buying a car for daily use and opting for ride-sharing services. He noted that the expenses associated with car ownership accumulate rapidly, but they eventually stabilize.

The convenience of controlling usage and trip frequency often leads to a reduced cost per ride compared to ride-sharing services over time. This analogy illustrated the dynamics of yearly co-location contracts, where minimum load drives efficiencies and potential gains.

Kyle then shifted to a scenario involving short-term heavy needs, like vacation car rentals. He noted that car rentals offer flexibility for unpredictable schedules without the commitment of ownership. This aligns with the flexibility provided by bare metal service providers, who offer diverse options within predefined parameters. This approach maintains efficiency while operating within certain boundaries.

Concluding his analogy, Kyle compared on-demand and public cloud offerings to ride-sharing services. He emphasized their ease of access, requiring just a few clicks to summon a driver or server, without concerns regarding operational aspects like insurance, maintenance, and updates.

By illustrating these relatable scenarios, Kyle underscored the importance of understanding the economics of minimum load in the context of private co-location decisions, specifically catering to the considerations of video engineers.

NAVIGATE THE ECONOMICS OF MINIMUM LOAD


Kyle next elaborated on the strategic approach required to navigate the economics of minimum load in the context of private co-location decisions. He emphasized the significance of aligning different models with specific data center demands.

Drawing from personal experiences, Kyle illustrated the concept using relatable scenarios. He contrasted his friend’s experience of living near a rail line in Seattle, which made car ownership unnecessary, with his own situation in Scottsdale, Arizona, where car ownership was essential due to logistical challenges.

Translating this to the business realm, Kyle pointed out that various companies have unique server requirements. Some prioritize flexible load management over specialized hardware needs and prefer to maintain a lean staff without extensive server administration roles. For Edgio, a content delivery network, private co-location globally was the optimal choice to meet their specific requirements.

Kyle then began a cost analysis, acknowledging that while the upfront cost of private co-location might seem daunting compared to public cloud prices, cloud server-hour costs accumulate rapidly. He referenced AWS’s substantial revenue as evidence of the premium customers pay for convenience, and highlighted the necessity of considering hidden costs, including human capital requirements and logistical factors.

Addressing executive leaders, Kyle cautioned against assuming that software developers skilled with code are also adept at running data centers. He emphasized the importance of having dedicated data center and server administration experts to maximize cost savings and avoid potential disasters.

Looking toward the future, Kyle advised mid-sized companies to consider their future needs and focus on maintaining nimbleness. He shared his insights into the challenges of hardware logistics and the value of proper tracking and clarity to identify breakeven points. In this comprehensive overview, Kyle provided practical insights into the economics of minimum load, offering a pragmatic perspective on private co-location decisions for video engineers.
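To illustrate the break-even arithmetic with deliberately invented numbers (substitute quotes from your own vendors), consider a comparable cloud instance against an owned server in a co-location rack:

# Cloud: $1.50/hour for a comparable instance, running 24/7
#   1.50 x 24 x 365 = $13,140 per year, per instance
# Co-lo: $12,000 server amortized over 3 years, plus $250/month rack share
#   (12,000 / 3) + (250 x 12) = $7,000 per year, per server
# At steady 24/7 load the owned server wins; at a few hours a day, the cloud wins.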

MUST-HAVE CO-LOCATION FEATURES


With the economics covered, Kyle shifted to identifying the must-have features in any co-location service, suggesting that certifications play a crucial role in evaluating co-location providers. ISO 9000 and SOC 2 (Type I and Type II) were cited as common minimum standards, with additional regional and industry-specific variations. Kyle recommended requesting certifications from potential vendors and conducting thorough research to understand the significance of these certifications.

Kyle explained that by obtaining certifications, you can move beyond basic questions about construction methods, power backup systems, and operational standards. Instead, you can focus on more nuanced inquiries, like power sources, security standards for visitors, and the training and responsiveness of remote hands teams. This transition allows for a more informed assessment of vendors’ capabilities and suitability for specific needs.

THE SIGNIFICANCE OF ON-SITE VISITS


Kyle underscored the significance of on-site visits in the colocation decision-making process, sharing three images that highlighted the insights gained from physical visits to data center facilities. The first image depicted service cabling that entered a data center. While the front of the building seemed pristine, the back revealed potential issues lurking in the shadows. Kyle stressed that some problems can only be identified through close inspection.

The second image showed a fiber distribution panel, showcasing the low level of professionalism evident in the data center’s installations. This reinforced the idea that visual assessments can reveal the quality of a facility’s infrastructure.

The third image illustrated a unique scenario. During construction, a new fiber channel was being laid, but the basement entry of the fiber trench was left unsealed. An overnight rainstorm resulted in the trench filling with water. Because the basement access hole was uncapped, water flowed downhill into a room with valuable equipment. This real-life example served as a reminder of the importance of thorough inspection and due diligence in the colocation industry.

These visuals underscore the importance of physically visiting data centers to identify potential challenges and make informed decisions.

REDUNDANCY AND TEMPERATURE MANAGEMENT


Kyle also shared that temperature management is particularly important to data centers. For example, Edgio emphasizes cooling speed, temperature regulation, and high-density heat rejection technology. It’s not merely about achieving lower temperatures; it’s about effectively managing and dissipating heat.

Kyle explained that even a slight temperature fluctuation can trigger far-reaching consequences, so maintaining a precise temperature of 76 degrees Fahrenheit is paramount. The utilization of advanced heat rejection technology ensures that any deviations from this optimal point can be promptly corrected, guaranteeing peak performance for their installations.


Paradoxically, economic success complicates temperature maintenance. Over the past eight years, Kyle reported, Edgio achieved a 30% improvement in server power efficiency, coupled with a 760% surge in server density. However, since the laws of physics remain steadfast, this density surge brings elevated heat generation within a smaller space.

CONNECTIVITY, SUPPORT, AND COST CONSIDERATIONS


Kyle’s discussion then shifted to connectivity, support, and cost considerations, with a focus on where to place each factor in your decision-making scorecard.

Emphasizing the critical role of connectivity in businesses, Kyle noted that vendors often claim, and usually deliver, constant uptime and availability, so they differentiate themselves through their access to the wider internet. When choosing a co-location provider, all organizations should reflect on their unique requirements. For instance, he suggested that businesses intending to connect with a CDN like Edgio might require a local data center partner that facilitates data transformation and transcoding but might not need the extensive infrastructure for global data distribution.

Kyle then addressed the significance of remote support, especially during initial installations where a swift response to issues is crucial. While tools like iDRAC and remote Out-of-Band server access provide control, Kyle highlighted the importance of real-time assistance during other critical moments, such as identifying server issues.

Addressing costs, Kyle acknowledged their pivotal role in decision-making, a sentiment particularly relevant given the current technology landscape. He urged a balance between cost-effectiveness and quality, drawing parallels between daily personal choices and those made in professional spheres. He referenced Terry Pratchett’s “boots” theory of economics, emphasizing the inevitability of change and the need for proactive lifecycle management. “Even the best boots will not last forever,” Kyle paraphrased, “and you need to plan lifecycle management.”

A FEW WORDS ABOUT SUSTAINABILITY


Kyle urged all participants and readers to consider sustainability, transcending its status as a mere buzzword. “Sustainability is more than a buzzword,” he declared, “It is a commitment.”

He illuminated the staggering energy appetite of data centers, exemplified by Amazon’s permits for generators in Virginia capable of producing a remarkable 4.6 gigawatts of backup power, enough to power New York City for a day. Kyle underscored the industry’s responsibility to reevaluate energy sources, citing the rising importance of the Environmental, Social, and Governance (ESG) movement. Organizations are now compelled to report their environmental impact to stakeholders and investors, making transparency essential.

When considering colocation facilities, Kyle recommended evaluating their sustainability reports, which reveal critical information from energy-sourcing practices to governance approaches. By aligning operational needs with global responsibilities, businesses can make conscientious choices that resonate with their core values and forge meaningful partnerships with data center providers.

GET INTIMATELY ACQUAINTED WITH THE UNPREDICTABLE


While you should perform a comprehensive needs analysis and service comparison to choose your provider, Kyle also highlighted that data centers are intimately acquainted with the unpredictable. Construction activities, often beyond the data center provider’s control, persistently surround these facilities.

The photo above, taken a mile away from a facility, exemplifies the unforeseen challenges. A construction crew, possibly misinformed or negligent, drove an auger into the ground at an incorrect location, inadvertently ensnaring cabling and yanking dozens of meters of fiber from the earth.

The incident’s specifics remain unclear, yet the lesson is evident – despite meticulous planning, unpredictability is an integral facet of this landscape. As Kyle summarized, “It’s a stark reminder that despite our best plans, unpredictability has to be part of this landscape, so always be prepared for the unexpected.”

NO ONE-SIZE-FITS-ALL SOLUTION


In closing, Kyle addressed the intricate decisions surrounding ownership, rental, and on-demand data center services, emphasizing that there’s no one-size-fits-all solution. He presented the choice between owning servers, renting them, or opting for on-demand cloud services as a complex decision driven by factors such as an organization’s average minimum load and its strategic objectives.

Kyle cautioned that navigating this intricate landscape demands a nuanced perspective. The decision requires a well-thought-out plan that not only accommodates an organization’s goals and growth but also anticipates the evolving trends of the industry. This approach ensures that the chosen path resonates seamlessly with an organization’s aspirations, offering stability for the journey ahead.

GO FROM A PURE OPEX MODEL TO A CAPEX MODEL


Before wrapping up, Kyle answered one question from the audience: “How does someone begin to approach a transition? Is it even possible to go from a pure OPEX model to a CAPEX model? Any suggestions, ideas, insights?”

Kyle noted that when you assess an OPEX model, you’re essentially looking at linear costs. These costs offer a clear breakdown of your system expenses, which can be projected into the future.

While there might be some pricing fluctuations as public cloud providers compete, you can treat entire segments as a transition unit. It might not be feasible to buy just one server and place it in isolation, but you can transition comprehensive sections in one concerted effort.

So, you might build a small encoding farm, allowing for a gradual shift while maintaining flexibility across various cloud instances like AWS, Azure, or GCP. This phased approach grants greater control, cost benefits, and a smoother transition into the new paradigm.

ON-DEMAND: Kyle Faber - Choosing a Co-Location Facility

Norsk and NETINT: Elevating Live Streaming Efficiency

With the growing demand for high-quality viewing experiences and the heightened attention on cost efficiency and environmental impact, hardware acceleration plays an ever-more-crucial role in live streaming. Here at NETINT, we want users to take full advantage of our transcoding hardware, so we’re pleased to announce that id3as NORSK now offers exceptionally efficient support for NETINT’s T408 and Quadra video processing unit (VPU) modules.


Using NETINT VPUs, users can leverage the Norsk low-code live streaming SDK to achieve higher throughput and greater efficiency compared to running software on CPUs in on-prem or cloud configurations. Combined with Norsk’s proven high-availability track record, this makes it easy to deliver exceptional services with maximum reliability and performance at a never-before-available OPEX.

NORSK AND NETINT

Norsk also takes advantage of Quadra’s hardware acceleration and onboard scaling to achieve complex compositions like picture-in-picture and resizing directly on the card. Even better, Norsk’s built-in ability to “do the right thing” also means that it knows when it can take advantage of hardware acceleration and when it can’t.  


For example, if you’re running Norsk on the T408, decoding will take place on the card, but Norsk will automatically utilize the host CPU for functions like picture-in-picture and resizing that the T408 doesn’t natively support, before returning the enriched media to the card for encoding. (Scaling and resizing functions are native to Quadra VPUs, so they are performed onboard without involving the host CPU.)


“As founding members of Greening of Streaming, we’re keenly aware of the pressing need to focus on energy efficiency at every point of the video stack,” says Norsk CEO Adrian Roe. “By utilizing the Quadra and T408 VPU modules, users can reduce energy usage while achieving maximum performance even on compute-intensive tasks. With Norsk seamlessly running on NETINT hardware, live streaming services can consume as little energy as possible while delivering a fantastic experience to their customers.” 


“Id3as has proven expertise in helping its customers produce polished, high-volume, compelling productions, and as a product, Norsk makes that expertise widely accessible,” commented Alex Liu, NETINT founder and COO. “With Norsk’s deep integration with our T408 and Quadra products, this partnership makes NETINT’s proven ASIC-based technology available to any video engineer seeking to create high-quality productions at scale.” 


Both Norsk and NETINT will be at IBC in Amsterdam, September 15-18. Click to request a meeting with Norsk or NETINT, and/or visit NETINT at booth 5.A86.

ON-DEMAND: Adrian Roe - Make Live Easy with NORSK SDK

Save Bandwidth with Capped CRF

Video engineers are constantly seeking ways to deliver high-quality video more efficiently and cost-effectively. Among the innovative techniques gaining traction is capped Constant Rate Factor (CRF) encoding, a form of Content-Adaptive Encoding (CAE), which NETINT recently introduced across our Video Processing Unit (VPU) product lines for H.264 and HEVC. In this blog, we explore why capped CRF is essential for engineers seeking to streamline video delivery and save on bandwidth costs.

Capped CRF - The Efficient Encoding Solution

Capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bitrate cap. Unlike variable bitrate encoding (VBR) and constant bitrate encoding (CBR), which target specific bitrates, capped CRF targets a specific quality level controlled by the CRF value, with a bitrate cap applied if the encoder can’t meet the quality level below the cap.

A typical capped CRF command string might look like this:

-crf 21 -maxrate 6M

This tells the encoder to encode to CRF 21 quality but not to exceed 6 Mbps. Let’s see how this might work with the football video shown in the figure, which compares capped CRF at these parameters with a CBR file encoded to 6 Mbps.


With the x264 encoder, CRF 21 typically delivers a VMAF score of around 95. With easy-to-encode sideline shots, the CRF value would control the encoding, delivering 95 VMAF quality at 2 Mbps, a substantial savings over CBR at 6 Mbps.

During actual plays, the 6 Mbps bitrate cap would control, delivering the same quality as CBR at 6 Mbps. So, capped CRF saves bandwidth on easy-to-encode scenes while delivering CBR-equivalent quality on hard-to-encode scenes.
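Here’s what that looks like as a complete command, sketched with FFmpeg’s software x264 encoder; as noted above, NETINT’s VPU integration uses its own encoder flags, so this is conceptual rather than product-specific:

# Capped CRF with x264: CRF 21 quality, capped at 6 Mbps
ffmpeg -i football.mp4 -c:v libx264 -crf 21 -maxrate 6M -bufsize 12M -c:a copy football_capped.mp4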

Ease of Integration

As implemented within the NETINT product line, capped CRF requires no additional technology licensing or complex integration – you simply upgrade your products and change your encoding command string. This means that you can seamlessly implement the feature across NETINT’s VPUs without extensive adjustments or additional investments.

NETINT’s capped CRF is compatible with H.264 and HEVC, with AV1 support coming (Quadra only), so you can use the feature across different codec options to suit your specific project requirements. Regardless of the codec used, capped CRF delivers consistent video quality with the potential for bandwidth savings, making it a valuable tool for optimizing video delivery.

A Game Changer

By deploying capped CRF, engineers can efficiently deliver high-quality video streams, enhance viewer experiences, and reduce operational expenses. As the demand for video streaming continues to grow, capped CRF emerges as a game-changer for engineers striving to stay at the forefront of video delivery optimization.

You can read more about how capped CRF works here. You can read more about Quadra VPUs here, and T408 transcoders here.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

Evaluating Low-Latency Video Streaming Protocols, Technologies and Best Practices

Low-latency video is a priority for many streaming engineers. This article reviews three schemas for producing low-latency video, identifies their pros and cons, and summarizes the questions engineers should consider when choosing a low-latency technology and/or service provider.

Architectures:

From an architectural perspective, let’s start with what we all know: HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) are the de facto stream delivery protocols in most regions of the world. Technically, HLS is a draft standard of the IETF largely controlled by Apple and is now in its 12th version. In contrast, MPEG-DASH is a true international standard for streaming media, ratified by the Moving Picture Experts Group (MPEG). Both enjoy nearly universal support from encoders, packagers, DRM providers, players, and other streaming infrastructure providers.

By way of background, HLS and DASH were originally developed to enable streaming video delivery without a streaming server. Prior to their creation, media servers like Adobe’s Flash Media server maintained a connection between the server and player to deliver media to the Flash Player using the RTMP protocol. Since Adobe charged a license fee for the servers, and each server could only manage a finite number of connections, large-scale streaming events were expensive to produce.

HLS and DASH were the two most significant HTTP-based Adaptive Streaming (HAS) techniques that supplanted the Adobe servers (Adobe and Microsoft had HAS technologies that some operators still use). Like all HAS technologies, HLS and DASH operate using a combination of media segments and metadata files stored on standard HTTP web servers. You see this in Figure 1. During playback, players retrieve the metadata files to identify the location of the media files and then download the files via HTTP as needed to play the video. All logic resides in the player; the server just stores the files.

Figure 1. All HAS technologies operate using metadata files and media segments.
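For readers who haven’t looked inside one of these metadata files, here’s a minimal, hand-written HLS media playlist; it’s nothing more than segment durations and the URLs the player fetches over HTTP:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts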

From a latency perspective, most HAS techniques used ten-second segments, and most players buffered up to three segments before starting to play. Do the math, and you get a minimum of thirty seconds of latency, long enough for your neighbor watching via satellite to see the goal, cheer the goal, hug his or her significant other, and grab another beverage from the fridge, leaving you uncomfortably wondering “what just happened?”

While the latency is awful, consider all the benefits that HAS techniques deliver. Because they transfer via HTTP, HAS technologies are supported by all browsers and virtually all video players in smart TVs, OTT dongles, smartphones, tablets, and other devices. The players retrieve standard HTTP packets that don’t need a special media server, are firewall-friendly, and can be delivered by normal CDNs, just like other web data.

Adaptive bitrate delivery with multiple files and bandwidths for different clients is standard, with caption and multiple language support, promoting a very high quality of experience (QoE). Advertising insertion is available, as well as studio-quality digital rights management (DRM) via techniques like Widevine, PlayReady, FairPlay, and the newest entrant, Huawei WisePlay.

So, except for latency, HAS techniques are quite effective for both the viewer and the publisher. Note that engineers seeking to reduce latency from thirty seconds to five seconds or so can cut segment duration to one to two seconds, but going below this could start to degrade video quality. To reduce your latency to sub-three seconds or so, you must switch to a low-latency variety of HLS or DASH.
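If you go the shorter-segment route, segment duration is set in the packager. For example, FFmpeg’s HLS muxer exposes it directly (an illustrative command with placeholder filenames):

# Two-second HLS segments instead of the traditional ten
ffmpeg -i input.mp4 -c copy -f hls -hls_time 2 playlist.m3u8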

Table 1. Feature sets of the different technology options.

HLS, DASH, AND CMAF

Let’s briefly explore what CMAF is and how it relates to DASH and HLS. Where DASH and HLS are adaptive streaming protocols, CMAF, which stands for the Common Media Application Format, is a container format for streaming media. More specifically, CMAF combines a single set of media files and multiple sets of manifest files to enable one group of files to serve multiple targets.

Today, many engineers use CMAF to combine one set of media files with separate DASH and HLS manifests. This saves encoding costs since you produce one set of files rather than two and halves the footprint on the streaming server. Technically, you don’t deliver via CMAF, you deliver via HLS or DASH using media packaged in a CMAF container.

Obviously, if you can package one set of media files for delivery via DASH or HLS, they must work similarly. That’s why I’m bundling HLS and DASH into their own column. Since CMAF is a container format, not a streaming protocol, I’m ignoring that altogether, though I could have just as easily included CMAF in the HLS/DASH column.
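As a concrete sketch, FFmpeg’s DASH muxer can emit one set of fMP4/CMAF segments along with both a DASH manifest and an HLS playlist; the option names below come from that muxer, but verify them against your build:

# One set of segments, two manifests (manifest.mpd plus a master.m3u8)
ffmpeg -i input.mp4 -c copy -f dash -seg_duration 4 -hls_playlist 1 out/manifest.mpd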

Figure 2. LL DASH uses chunks to reduce latency. (www.theoplayer.com/blog/low-latency-dash).

Rather than buffering three complete segments before playing, the LL DASH or LL HLS player typically buffers three to four chunks. In operation, this can reduce latency to perhaps three to five seconds, often more, depending upon multiple factors.

In terms of compatibility, most, but not all, current players support LL HLS/DASH. Obviously, if you’re going to deploy either technology, you should verify player compatibility. Both technologies are backwards compatible so that legacy players that don’t support low latency can simply play the streams at normal latency. Otherwise, LL HLS/DASH retains all the other benefits of HAS delivery, including standard CDNs, ABR delivery, DRM, captions, advertising, and multiple language support.
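To show how chunked delivery surfaces in the metadata, here’s an abbreviated LL HLS playlist excerpt modeled on Apple’s published examples: each full segment is advertised as a series of short parts that the player can fetch before the segment completes:

#EXT-X-PARTINF:PART-TARGET=0.334
#EXT-X-PART:DURATION=0.334,URI="fileSequence267.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.334,URI="fileSequence267.1.mp4"
#EXTINF:4.0,
fileSequence267.mp4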

THEO High-Efficiency Streaming Protocol (HESP)

THEO Technologies invented HESP, which is managed by the HESP Alliance, whose members provide valuable infrastructure support for HESP, including EZDRM and BuyDRM (digital rights management), MediaMelon (optimization and analytics), Ceeblue (transcoding), and multiple turnkey service providers. Like Apple HLS, HESP has been made available as a draft informational specification via the Internet Engineering Task Force (IETF).

Technically, HESP is a HAS technique that offers lower latency while retaining all the benefits of HAS delivery. HESP works by creating two streams for each live event (Figure 3).

First is the initialization stream, which contains only keyframes. This allows the player to start playback on any frame, not just at the start of a group of pictures, which can often be one to two seconds long. The second stream is the continuation stream, which is encoded normally and is the stream viewed by the player.

When a viewer joins the stream, the player first loads a frame from the initialization stream, which plays immediately, reducing startup latency to less than a second. Then, the player retrieves subsequent frames from the continuation stream. You see this in Figure 3. When a player starts to play the stream, it retrieves a keyframe from the initialization stream (C1). Then, it continues to play frames from the continuation stream (d1 and so on).

Figure 3. The two streams used for the High-Efficiency Streaming Protocol. From here: https://www.hespalliance.org/hesp-technical-deck (registration and download required)

If multiple videos are available, when the viewer switches streams, the same thing happens. The player grabs a frame from the initialization stream of the second video (G2) to enable the immediate switch and then retrieves additional frames from the Continuation stream (h2, i2, j2) to deliver normal quality.

As shown in Figure 4, you can implement HESP using standard production techniques, a standard encoder, and a regular CDN, but you need a HESP-compatible packager and player. Interestingly, the HESP continuation stream is backwards compatible with LL HLS and LL DASH, so if you don’t have an HESP player, you fall back to LL HLS or LL DASH latency. Once implemented, HESP delivers all the benefits of HAS technologies, including ABR delivery, DRM support, and advertising support.

Figure 4. Implementing HESP requires a HESP-compatible packager and player. From here: https://www.hespalliance.org/hesp-technical-deck (registration and download required)

Unlike all the other technologies discussed, HESP is royalty-bearing, with royalties assessed on usage volume, subscriptions, and hardware devices. The annual cap for software and subscription is $2.5 million, with a $25 million annual cap on devices sold. Click here for more details on the HESP royalties.

To be fair, MPEG LA did attempt to form a patent pool on DASH-related technology in 2015, but it closed in 2019. Later in 2019, two companies sued Showtime, Vudu, and Crackle on the same patents, though the patent upon which the claims were made was found invalid in 2022.

WebRTC

Google created WebRTC to enable real-time audio/video communications between browsers without plug-ins. Today, WebRTC is an open-source project and standard published by both the IETF and W3C. These standards mean near-universal support by computer and mobile browsers, though support in smart TVs, OTT dongles, and game devices is not nearly as pervasive as support for HLS and DASH.

Originally used for simple peer-to-peer communications, WebRTC was later used to power conferencing applications and has been extended into one-to-many live streaming applications, primarily because it offers sub-500 ms latency. It’s useful to consider WebRTC from two perspectives: what you get out of the box and what product and service providers can build around it to make it more useful. Let’s explore how WebRTC works and then return to this observation.

Figure 5 shows the basic WebRTC operating schema in peer-to-peer mode. Two peers wishing to connect meet through a signaling server. Once they connect, they exchange audio-video data directly in real time using the User Datagram Protocol (UDP) as opposed to HTTP. These architectural differences from HAS technologies have multiple implications.

First, using direct streaming via UDP rather than chunks and segments via HTTP means lower latency than HAS technologies. That’s the object of the exercise, so that’s a good thing.

However, the matching and other roles that servers play in WebRTC mean that WebRTC events can’t scale beyond a certain size without adding servers, which adds to the cost. UDP delivery means that the audio/video data can’t be delivered by standard CDNs, adding further to the cost and potentially limiting your reach. UDP packets also may not be able to get through some corporate firewalls, limiting access by some viewers, though several techniques can mitigate this risk.

Figure 5. Low Latency WebRTC schema for simple peer-to-peer applications.

Originally designed for browser-to-browser communications, WebRTC has gained a reputation for low-quality video because most traditional applications involve low-quality webcam video compressed by the low-quality encoder in a browser. WebRTC has also been rightfully criticized for the lack of true adaptive streaming, studio DRM, and features like advertising insertion and captions. If you were to develop your own WebRTC application for large-scale live event streaming from scratch, you would have to work around all these limitations.

Fortunately, you don’t have to start from scratch. Multiple vendors like Ant Media, Red5, Wowza, and others offer servers that can ingest and transcode high-quality audio/video streams and automatically add servers and WebRTC-capable CDN capacity to meet viewer demand.

For those seeking a more turnkey experience, multiple vendors like Ant Media, Dolby.io, Wowza, Phenix, and many others also offer WebRTC packages that you can deploy by providing a live feed and writing a check.

Though different providers offer different feature sets, several WebRTC-based services provide true ABR delivery and other HAS-like features, though you may have to deploy a service-specific player. In addition, EZDRM and castLabs have shown working DRM implementations for WebRTC, so if studio DRM is essential, that’s now available.

Going forward, two new protocols, WHIP and WHEP, may further simplify deploying WebRTC for large-scale live streaming. WHIP stands for the WebRTC-HTTP Ingest Protocol, and it simplifies high-quality, ultra-low latency ingest into WebRTC, much like RTMP does for typical live streaming. WHEP stands for WebRTC-HTTP Egress Protocol, and it standardizes how WebRTC data can be downloaded to a non-WebRTC client, like a smart TV without WebRTC support.
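Under the hood, WHIP is just an HTTP exchange of SDP. Here’s a hypothetical sketch (the endpoint URL and token are placeholders) of a WHIP ingest request:

# POST an SDP offer; the response body carries the SDP answer, and the
# Location header identifies the session resource (DELETE it to end the stream)
curl -X POST "https://example.com/whip/endpoint" \
  -H "Authorization: Bearer MY_TOKEN" \
  -H "Content-Type: application/sdp" \
  --data-binary @offer.sdp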

WebRTC clearly offers lower latency than HAS-based technologies. The only questions are how much extra you’ll pay to achieve that latency and what features, if any, you won’t be able to access.

Other Low Latency Technologies

There are other technologies used for low latency. For example, Nanocosmos offers a low-latency service called nanoStream Cloud that uses WebSockets with LL HLS rather than WebRTC. Like WebRTC, WebSockets offers lower latency than any HAS approach but, as a pure technology, can’t match the features of LL HLS/DASH. However, as with WebRTC, developers like Nanocosmos have built around WebSockets to create fully featured services and technologies worth considering for many types of projects.

Figure 6. Nanocosmos’ nanoStream Cloud service is built around WebSockets.

Questions to Ask

If you’re choosing a turnkey service, you probably care more about performance, features, and cost than the technology underlying the service. Here’s a list of questions to ask when choosing your technology and/or service provider.

  • What’s the maximum latency you can tolerate? The lower the latency, the greater the cost. Lower latency also means decreased playback robustness since little, if any, of the stream is pre-buffered.
  • What’s the overall projected cost of your event? If WebRTC or other low-latency service increases the cost, do the economics of the project justify that increased price?
  • What latency will the service deliver at your scale and geographic distribution?
  • What devices are supported (computers, mobile, and living room)?
  • What are the critical features that you need, and can the service deliver them? Captions? Advertising? Studio quality DRM? Multiple language support?
  • Can the system maintain synchronization with all viewers to support auctions and sports gambling?
  • What codecs are available, and what bandwidths can the system support?
  • Does the system support dynamic adaptive streaming, which means that it switches streams during the event to adjust to changing bandwidth conditions (rather than sending a single quality stream)?
  • If the system transmits UDP packets, what features are available to avoid firewall blocking?
  • Can you use your own player, or do you need to deploy a custom player? If custom, what are the design and branding opportunities?
  • What encoders does the system support, and at what quality level?

NETINT’s Low Latency VPUs

All low-latency productions start with low-latency transcoding. For a look at the latency produced by NETINT’s Quadra Video Processing Unit in normal and low-latency modes, check out this review: Unveiling the Quadra Server: The Epitome of Power and Scalability.

Author’s note: The author would like to thank Barry Owen from Wowza and Pieter-Jan Speelmans from THEO Technologies for the high-level tech read. I appreciate you both sharing your knowledge and insights. Any errors in the article are solely attributable to the author.

Cloud services are an effective way to begin live streaming. Still, once you reach a particular scale, it’s common to realize that you’re paying too much and can save significant OPEX by deploying transcoding infrastructure yourself. The question is, how to get started?

NETINT’s Build Your Own Live Streaming Platform symposium gathers insights from the brightest engineers and game-changers in the live-video processing industry on how to build and deploy a live-streaming platform.

Simplify Building Your Own Streaming Cloud with GPAC

Romain Bouqueau is CEO of Motion Spell and one of the principal architects of the GPAC open-source software, one of the three software alternatives presented in the symposium. He spoke about the three challenges facing his typical customers: features, cost, and flexibility, and identified how GPAC delivers on each challenge.

Then, he illustrated these concepts with three impressive case studies: Synamedia/Quortex, Instagram, and Netflix. Overall, Romain made a strong case for GPAC as the transcoding/packaging element of your live streaming cloud.



Romain began his talk with an excellent summary of the situation facing many live-streaming engineers. “It’s a pleasure to discuss the challenges of building your own live-streaming cloud. Cloud services are convenient, but once you scale, you may realize that you’re paying too much and you are not as flexible as you’d like to be. I hope to convince you that the cost of customization that you have when using GPAC is actually an investment with a very interesting ROI if you make the right choices. That’s what we’re going to talk about.”

Figure 1. About Romain, GPAC, and Motion Spell.

Then, he briefly described his background as a principal architect of the GPAC open-source software, which he has contributed to for over 15 years. In this role, Romain is known for his advocacy of open source and open standards and as a media streaming entrepreneur. His primary focus has been on GPAC, a multimedia framework recognized for its emphasis on modularity and standards compliance.

He explained that GPAC offers tools for media content processing, inspection, packaging, streaming, playback, and interaction. Unlike many multimedia frameworks that cater to 2D TV-like experiences, GPAC is characterized by versatility, controlled latency, and the ability to support various scenarios, including hybrid broadcast-broadband setups, interactivity, scripting, virtual reality, and 3D scenes.

Romain’s notable achievements include streamlining the MPEG ISO Base Media File Format used in formats like MP4, CMAF, DASH, and HLS. His work earned recognition through a Technology and Engineering Emmy Award. To facilitate the wider use of GPAC, Romain established Motion Spell, which serves as a bridge between GPAC and its practical applications. Motion Spell provides consulting, support, and training, acting as the exclusive commercial licenser of GPAC.

During his introduction, Romain discussed challenges faced by companies in choosing between commercial solutions and open source for video encoding and packaging. He posited that many companies often lack the confidence and necessary skills to fully implement GPAC but emphasized that despite this, the implementation process is both achievable and simpler than commonly assumed.

He shared that his customers face three major challenges: features, cost, and flexibility. He addressed each in turn.

Features

Figure 2. The three challenges facing those building their live streaming cloud.

The first challenge Romain highlighted relates to features and capabilities. He advised the audience to create a comprehensive list that encompasses the needed capabilities, including codecs, formats, containers, DRMs, captions, and metadata management.

He also underscored the importance of seamless integration with the broader ecosystem, which involves interactions with external players, analytics probes, and specific content protocols. Romain noted that while some solutions offer user-friendly graphical interfaces, deeper configuration details often need to be addressed to accommodate diverse codecs, parameters, and use cases, especially at scale.

Highlighting Netflix’s usage of GPAC, Romain emphasized that GPAC is well-equipped to handle features and innovation, given its research and standardization foundation. He acknowledged that while GPAC is often a step ahead in the industry, it cannot implement everything alone. Thus, sponsorship and contributions from the industry are crucial for the continued development of this open-source software.

Romain explained that GPAC’s compatibility with the ecosystem is a result of its broad availability. Its role as a reference implementation, driven by standardization efforts, makes it a favored choice. Additionally, he mentioned that Motion Spell’s efforts have led to GPAC becoming part of numerous plugin systems across the industry.

Cost

The second challenge highlighted by Romain is cost optimization. He explained that costs are typically divided into Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). He noted that GPAC, being written in the efficient C programming language, benefits from rigorous scrutiny from the open-source community, making it highly efficient. He acknowledged that while GPAC offers various features, each use case varies, leading to questions about resource allocation. Romain encouraged considerations like the need for CDNs for all channels and premium encoders for all content.

Regarding CAPEX, Romain mentioned integration costs associated with open-source software, emphasizing that some costs might be challenging to evaluate, such as error handling. He referenced the Synamedia/Quortex architecture as an example of efficient error management. Romain also addressed the misconception that open source implies free software, referencing a seminar he participated in that compared the costs of different options.

He shared an example of a broadcaster with a catalog of 100,000 videos and 500 concurrent streams. The CAPEX for packaging ranged from $100,000 to $200,000, depending on factors like developer rates and location, with running costs being relatively low compared to transcoding costs.

Romain revealed that, based on his research, open source consistently ranked as the most cost-efficient option or a close competitor across different use cases. He concluded that combining GPAC with Motion Spell’s professional services and efficient encoding appliances like NETINT’s aligns well with the industry’s efficiency challenges.

Flexibility

The final challenge discussed by Romain was flexibility, emphasizing the importance of moving swiftly in a fast-paced environment. He described how Netflix successfully transitioned from SVOD to AVOD, adapted from on-demand to live streaming, switched from H.264 to newer codecs, and consolidated multiple containers into one over short time frames, contributing to their profitability. Romain underlined the potential for others to achieve similar success using GPAC.

He introduced a new application within GPAC called “gpac”, designed to build customized media pipelines. In contrast to historical GPAC applications that offered fixed media pipelines, the new “gpac” application enables users to create tailored pipelines that address specific requirements. This includes transcoding, packaging, content protection, networking, and, in general, any feature you need for your private cloud.
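To give a flavor of the syntax, here’s a minimal filter-chain sketch based on GPAC’s documented conventions; option names vary by version, so verify against your GPAC build before relying on it:

# Re-encode to AVC at 3 Mbps and package as live DASH with 2-second segments
gpac -i input.mp4 c=avc:b=3M @ -o dash/live.mpd:segdur=2:profile=live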

The Synamedia/Quortex “just-in-time everything” paradigm

Figure 3. Motion Spell’s work with Quortex, which was acquired by Synamedia.

Romain then moved on to the Synamedia/Quortex use case, which illustrated how GPAC addresses the features challenge. He described Quortex’s innovative “just-in-time everything” paradigm for media pipelines.

Unlike the traditional 24/7 transcoder that is designed to never fail and requires backup solutions for seamless switching, Quortex divides the media pipeline into small components that can fail and be relaunched when necessary. This approach is particularly effective for live streaming scenarios, offering low latency.

Romain highlighted that the Quortex approach is highly adaptable as it can run on various instances, including cloud instances that are cost-effective but might experience interruptions. The system generates content on-demand, meaning that when a user wants to watch specific content on a device, it’s either cached, already generated, or created just-in-time. This includes packaging, transcoding, and other media processing tasks.

Romain attributed the success of the development project to Quortex’s vision and talented teams, as well as the strategic partnership with Motion Spell. He also shared that after project completion, Synamedia acquired Quortex.

Instagram

Figure 4. GPAC helped Instagram cut compute times by 94%.

The second use case addressed the challenge of cost and involved Instagram, a member of the Meta Group. According to Romain, Instagram utilized GPAC’s MP4Box to reduce video compute times by an impressive 94%. This strategic decision helped prevent a capacity shortage within just twelve months, ensuring the platform’s ability to provide video uploads for all users.

Romain presented Instagram’s approach as noteworthy because it emphasizes the importance of optimizing costs based on content usage patterns. The platform decided to prioritize transmission and packaging of content over transcoding, recognizing that a significant portion of Instagram’s content is watched only a few times. In this scenario, the cost of transcoding outweighs the savings on distribution expenses. As Romain explained, “It made more sense for them to package and transmit most content instead of transcoding it, because most of Instagram’s content is watched only a few times. The cost of transcoding, in their case, outweighs the savings on the distribution cost.”

According to Romain, this strategy aligns with the broader efficiency trend in the media tech industry. By adopting a combined approach, Instagram used lower quality and color profiles for less popular content, while leveraging higher quality encoders for content requiring better compression. This optimization was possible because Instagram controls its own encoding infrastructure, which underscores the value of open-source solutions in providing control and flexibility to organizations.

The computational complexity of GPAC’s packaging is close to a bit-for-bit copy, contributing to the 94% reduction in compute times. Romain felt that Instagram’s successful outcome exemplifies how open-source solutions like GPAC can empower organizations to make significant efficiency gains while retaining control over their systems.
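To give a sense of how lightweight such a repackaging pass is, here’s an illustrative MP4Box command (filenames are placeholders, not Instagram’s actual pipeline) that fragments an upload and writes a DASH manifest without re-encoding the video:

# Package without transcoding: 4-second segments, live profile
MP4Box -dash 4000 -profile live -out dash/manifest.mpd upload.mp4#video upload.mp4#audio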

Netflix

Figure 5. GPAC helped Netflix transition from SVOD to AVOD,
from On-Demand to live, and from H.264 to newer codecs.

The final use case addresses the challenge of flexibility and involves a significant collaboration between GPAC, Motion Spell, and Netflix. According to Romain, this collaboration had a profound impact on Netflix’s video encoding and packaging platform, and contributed to an exceptional streaming experience for millions of viewers globally.

At the NAB Streaming Summit, Netflix and Motion Spell took the stage to discuss the successful integration of GPAC’s open-source software into Netflix’s content operations. During the talk, Netflix highlighted the ubiquity of ISO BMFF (the ISO Base Media File Format) in their workflows and emphasized their commitment to open standards and innovation. The alignment between GPAC and Netflix’s goals allowed them to leverage GPAC’s innovations for free, thanks to sponsorships and prior implementations.

Romain explained how Netflix’s transformation from SVOD to AVOD, from On-Demand to live, and from H.264 to newer codecs was facilitated by GPAC’s ease of integration and efficiency in operations. In this fashion, he asserted, the collaboration between Motion Spell and Netflix exemplifies the capacity of open-source solutions to drive innovation and adaptability.

Romain further described how GPAC’s rich feature set, rooted in research and standardization, offers capabilities beyond most publishers’ current needs. The unified “gpac” executable simplifies deployment, making it accessible for service implementation. Leveraging open-source principles, GPAC proves to be cost-competitive and easy to integrate. Motion Spell’s role in helping organizations maximize GPAC’s potential, as demonstrated with Netflix, underscores the practical benefits of the collaboration.

Romain summarized how GPAC’s flexibility empowers organizations to optimize and differentiate themselves rapidly. Examples like Netflix’s interactive Bandersnatch, intelligent previews, exceptional captioning, and accessibility enhancements showcase GPAC’s adaptability to evolving demands. Looking forward, Romain described how user feedback continues to shape GPAC’s evolution, ensuring its continued improvement and relevance in the media tech landscape.

With a detailed description of GPAC’s features and capabilities, underscored by very relevant case studies, Romain clearly demonstrated how GPAC can help live streaming publishers overcome infrastructure-related challenges. He invited those who would like to learn more, or who need support integrating GPAC into their workflows, to contact him directly.


ON-DEMAND:
Romain Bouqueau, Deploying GPAC for Transcoding and Packaging

Simplify Building Your Own Streaming Cloud with Wowza

Transcoding and packaging software is a key component of any live-streaming cloud, and one of the most functional and flexible programs available is the Wowza Streaming Engine. During the symposium, Barry Owen, Chief Solutions Architect at Wowza, detailed how to create a scalable streaming infrastructure using the Wowza Streaming Engine (WSE).

He started by discussing Wowza’s history, from its formation in 2005 to its recent acquisition of FlowPlayer. After defining the typical live streaming production pipeline, Barry detailed how WSE can serve as an origin server, transcoder, and packager, ensuring optimal viewer experience. He discussed WSE’s adaptability, including its ability to scale through GPU- and VPU-based transcoding, and emphasized WSE’s deployment options, which range from on-premises to cloud-based infrastructures. He then outlined Wowza’s infrastructure for distributing to audiences large and small.

Barry concluded by validating the session title, getting WSE up and running in under five minutes using Docker in a demo that you can watch below, at the end of this article.

Simplify Building Your Own Streaming Cloud with WOWZA

Start Streaming in Minutes with Wowza Streaming Engine

The focus of Barry’s talk was how to create a highly scalable streaming infrastructure with Wowza Streaming Engine (WSE). He began by recounting Wowza’s history. Established in 2005, the company launched its inaugural product, the Wowza Media Server, in 2007. This was later complemented by the Wowza Cloud, a SaaS solution, in 2013. Since its inception, Wowza has grown to support over 6,000 customers in 170 countries and boasts more than 35,000 streaming implementations. Their products are responsible for 38 million video transcoding hours each month. Recently, the company acquired FlowPlayer, adding a premier video player to its product lineup.

Barry emphasized Wowza’s commitment to providing streaming solutions that are reliable, scalable, and adaptable. He noted the importance of customization in the streaming sector and highlighted the company’s robust support team and services, which are designed to ensure customer success.

Wowza Streaming Engine Functionality

Barry then moved to the heart of his talk, which he set up by illustrating the streaming pipeline, beginning with video capture from sources like cameras, encoders, or mobile devices (Figure 1). Within this pipeline, WSE serves as a comprehensive media server capable of functioning as an origin server, transcoder, and packager in a single system.

In this role, WSE offers real-time encoding and transcoding, producing multiple bitrate streams for an optimal viewer experience. It also performs real-time packaging into formats like HLS and DASH for compatibility across devices, along with ancillary functions like DRM, captioning, ad insertion, and metadata handling. Once processed, the stream is ready for delivery to a vast audience through one or multiple CDNs, depending on the desired scale and workflow.

Figure 1. The role WSE plays in the streaming pipeline.

Then Barry dug deeper into the capabilities of the Wowza Streaming Engine, emphasizing its comprehensive nature as an end-to-end media server. These capabilities include:

  • Input Protocols: The Streaming Engine can ingest almost any input protocol, including RTSP, RTMP, SRT, WebRTC, HLS, and more.
  • Transcoding: WSE offers just-in-time, real-time transcoding with minimal latency. It also supports features like compositing and overlays, preparing the stream for packaging.
  • Packaging: WSE supports commonly used formats like HLS and DASH, as well as more specialized formats such as WebRTC, RTSP, and MPEG-TS.
  • Delivery: Wowza supports both push and pull models for stream delivery. It can integrate with multiple CDN vendors, including its own, and allows syndication to platforms like Facebook and LinkedIn.
  • Extensibility: A significant feature of the Streaming Engine is its flexibility. It offers a complete Java API for custom processing and a REST API for system command and control (see the sketch below). WSE’s user interface (Streaming Engine Manager) is built on this REST API, demonstrating its functionality.
  • Configuration and Control: The Streaming Engine Manager allows users to manage one or more Streaming Engine instances from one web interface. Advanced users can also programmatically edit configurations to integrate with their systems.

Barry underscored WSE’s adaptability, highlighting its ability to cater to custom workflows, from complex ad insertions to machine learning applications. He also mentioned the availability of GitHub libraries with examples and encouraged exploring the Streaming Engine Manager for system configuration and monitoring.
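
As a quick illustration of the REST API mentioned above, the sketch below lists the applications on a local WSE instance. Treat the details as assumptions to verify against Wowza’s REST API documentation for your version; the API defaults to port 8087 with digest authentication, and the credentials here are placeholders:

curl --digest -u admin:password \
  "http://localhost:8087/v2/servers/_defaultServer_/vhosts/_defaultVHost_/applications"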

Deploying Wowza Streaming Engine
Figure 2. WSE deployment options.

Barry next discussed the deployment options for the Wowza Streaming Engine. These include:

  • On-Premises: WSE can be deployed on-premises, offering cost-effective and efficient solutions, especially in high-density scenarios or when you have access to your own data center.
  • Managed Hardware Platforms: WSE can be set up on platforms like Linode, providing access to bare metal in a managed environment.
  • Public Clouds: Pre-built images are available for major cloud platforms, allowing quick setup. Users can choose from marketplace images or standard ones, where they bring their own license key. Pre-configurations for common use cases are also provided.
  • Docker: Wowza offers Docker images, and Barry emphasized Docker’s significance in automating deployment, scaling, and ensuring high availability in modern infrastructure setups.

Barry emphasized WSE’s adaptability to various deployment needs, from traditional setups to modern cloud-based infrastructures.

Scaling Wowza Streaming Engine
Figure 3. Scaling stream processing with GPUs and VPUs (ASICs).

Barry shifted the discussion to scaling and stream processing, emphasizing the different approaches and addressing their pros and cons. For stream processing, WSE supports CPU-, GPU-, and VPU-based transcoding. Here’s a brief discussion of each option.

CPU-Based Transcoding:

Barry highlighted the traditional approach of using software CPU-based transcoding. The Wowza Streaming Engine can efficiently leverage the processing power of CPUs to handle video streams. This method is straightforward and can be scaled by adding more servers or opting for higher-capacity CPUs.

He shared that CPU-based transcoding offers a wide range of adaptability, allowing for various encoding and decoding combinations. Given that CPUs are a standard component in servers, there’s no need for specialized hardware. On the other hand, he pointed out CPUs aren’t the best option for achieving high density or low power consumption.

GPU-Based Transcoding:

Regarding GPU-based transcoding, Barry stated that GPUs can handle a significant number of streams, and take on the heavy lifting from the CPU, ensuring smoother operation. However, they are expensive, and not exclusively designed for video processing, which can lead to higher power consumption.

VPU-Based Transcoding:

Barry expressed considerable enthusiasm for the capabilities of Video Processing Units (VPUs), or ASIC-based transcoders. Unlike general-purpose CPUs and GPUs, VPUs are purpose-built for video processing, which allows them to handle video streams with remarkable efficiency. In recent years, VPUs have emerged as a promising solution, especially when it comes to achieving high-density streaming. Barry noted that these units not only offer a competitive price per channel but also boast minimal power consumption.

The Evolution Towards Specialization:

Drawing from his insights, Barry seemed to suggest a trend in the streaming industry: a move towards more specialized solutions. While CPUs and GPUs have been stalwarts in the industry, the rise of VPUs indicates a shift towards tools and technologies tailored specifically for streaming. This specialization promises not only enhanced performance but also greater efficiency in terms of cost and energy consumption.

Distributing Your Streams

Barry concluded his talk by discussing the distribution options available from Wowza. He emphasized the importance of adaptability when it comes to scaling outputs, especially given the diverse audience sizes that streaming services might cater to. WSE offers multiple distribution options to ensure that content reaches its intended audience efficiently, regardless of its size.

On-Premises Scaling:

One of the primary methods Barry discussed was scaling on-premises. By simply adding more servers to the existing infrastructure, streaming services can handle a larger load. This method is particularly useful for organizations that already have a significant on-premises setup and are looking to leverage that infrastructure.

CDN (Content Delivery Network):

For those expecting a vast number of viewers, Barry recommended using a content delivery network, or CDN. CDNs are designed to handle large-scale content delivery, distributing the content across a network of servers to ensure smooth and efficient delivery to a global audience. By offloading the streaming to a CDN, services can ensure that their content reaches viewers without any hitches, even during peak times.

Hybrid Approaches:

Barry found the hybrid model particularly intriguing. This approach combines the strengths of both on-premises scaling and CDNs. For instance, an organization could use its on-premises setup for regular streaming to a smaller audience. However, during events or times when a larger audience is expected, they could “burst” to the cloud, leveraging the power of CDNs to handle the increased load. This model offers both cost efficiency and scalability, ensuring that services are not overextending their resources during regular times but are also prepared for peak events.

In essence, Barry underscored the importance of flexibility in scaling. The ability to choose between on-premises, CDN, or a hybrid approach ensures that streaming services can adapt to meet any audience size.

Figure 4. Options for distributing to various audience sizes.

Start Streaming in Minutes with WSE: The Demonstration

Figure 5. Click the image to run Barry’s demo.

Barry then ran a recorded demonstration to illustrate the simplicity of setting up the Wowza Streaming Engine using Docker – you can watch it below. He ran the demo using Docker Desktop and Docker Compose, and the objective was to launch two containers: one for the Wowza Streaming Engine and another for its manager.

He began by activating the services using the command ‘docker compose up’. Since he recorded the demo on an M1 Mac, he noted that the process might be slightly slower due to the Rosetta translation layer. As the services initialized, Barry explained the YAML file he used to provision these services. The file contained configurations for both the Streaming Engine and its Manager, detailing aspects like image sources, environment variables, and port settings.
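
As a sketch of what such a file might contain, here’s a minimal docker-compose.yml. The image names, ports, and environment variables are assumptions modeled on Wowza’s published Docker images (the separate manager image in particular is hypothetical), so check the current Wowza Docker documentation before using it:

# docker-compose.yml (illustrative only; verify images, ports, and variables against Wowza's docs)
services:
  wse:
    image: wowzamedia/wowza-streaming-engine-linux:latest
    ports:
      - "1935:1935"   # RTMP ingest and HTTP streaming
      - "8087:8087"   # REST API
    environment:
      - WSE_LICENSE_KEY=${WSE_LICENSE_KEY}   # bring your own license key
  wse-manager:
    image: wowzamedia/wowza-streaming-engine-manager:latest   # hypothetical manager image
    ports:
      - "8088:8088"   # Streaming Engine Manager web UI

With the file in place, ‘docker compose up’ launches both containers. Docker Compose can also run multiple engine instances via its --scale flag, though each instance would then need its own port mappings.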

With the services up and running, Barry navigated to Docker Desktop to monitor the performance of the two launched services, observing metrics like CPU and memory usage. He then accessed the Streaming Engine Manager via a web browser. Barry highlighted the versatility of Docker Compose, mentioning that it can manage multiple service instances, which can be beneficial for scalability, high availability, or clustering.

Upon accessing the manager, Barry logged in to view the server’s health snapshot, providing insights into its status. He then navigated to a pre-configured application named ‘live’ to stream content. Using Open Broadcaster Software (OBS) on his system, Barry set it up to stream to the server, pointing out the server’s recognition of the incoming stream and its subsequent packaging.

Returning to the manager, Barry verified the incoming stream’s presence and details. He then extracted the HLS URL for the stream, which he opened in a Safari browser tab to demonstrate live playback. The stream played seamlessly, underscoring the efficiency and ease of the entire process.
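
If you want to replicate those last two steps without OBS, FFmpeg can push a test stream and ffplay can pull the HLS rendition. This is a sketch only: the application and stream names are illustrative, and the HLS URL pattern follows Wowza’s documented defaults, so verify both against your configuration.

# push a stream to the pre-configured 'live' application (standing in for OBS)
ffmpeg -re -i input.mp4 -c copy -f flv rtmp://localhost:1935/live/myStream

# play back the HLS packaging of that stream
ffplay http://localhost:1935/live/myStream/playlist.m3u8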

The demo showcased how, in a matter of minutes, you can configure, initiate, and stream using the Wowza Streaming Engine. You can get started yourself by downloading a trial version of WSE here.

ON-DEMAND:
Barry Owen, Start Streaming in Minutes with Wowza Streaming Engine

Simplify Building Your Own Streaming Cloud with NORSK SDK

Adrian Roe from id3as discussed Norsk, a technology designed to simplify the building of large-scale media workflows. id3as, originally a consultancy-led organization, works with major clients like DAZN and Nasdaq and is now pivoting to concentrate on Norsk, which it sells as an SDK. The technology underlying Norsk is responsible for delivering hundreds of thousands of live events annually and offers extensive expertise in low-latency and early-adoption technologies.

Adrian emphasized the company’s commitment to reliability, especially during infrastructure failures, and its initiatives in promoting energy-efficient streaming, including founding the Greening of Streaming organization. He also highlighted that about half of their deployments are cloud-based, suitable for fluctuating workloads, while the other half are on-premises or hybrid models, often driven by the need for high density at low cost and low energy consumption.


Simplify Building Your Own Streaming Cloud with Norsk SDK

Encoding Infrastructure is Simpler and Cheaper than Ever Before

The focus of the symposium was building your own encoding infrastructure, and Adrian next described how new technologies are simplifying this and making it more affordable. For example, Adrian mentioned that advancements like NETINT’s Quadra video processing units (VPUs) are changing the game, allowing some clients to consider shifting back to on-premises solutions.

Then, he described a recent server purchase to highlight the advancements in computing hardware capabilities. The server, which is readily available off-the-shelf and not particularly expensive, boasts impressive specs with 256 physical cores, 512 logical cores, and room for 24 U.2 cards like NETINT’s T408 or Quadra T1U.

Adrian then shared that during load testing, the server’s CPU profile was so extensive that it exceeded the display capacity of his screen, joking that “it gives you an excuse to file an expense report for a new monitor.” This anecdote emphasized the enormous processing capacity now available in relatively affordable hardware setups. The server occupies just 2U of rack space, and Adrian speculated that it could potentially deliver hundreds of channels in a fully loaded configuration, showcasing the leaps in efficiency and power in modern server hardware.


Figure 2. Infrastructure is getting cheaper and more capable.

Why Norsk?

Adrian then shifted his focus to Norsk. He emphasized that Norsk is designed to cater to large broadcasters and systems integrators who require more than just off-the-shelf solutions. These clients often need specialized functionalities, like the ability to make automated decisions such as switching to a backup source if the primary one fails, without the need for human intervention.

They may also require complex multi-camera setups and dynamic overlays of scores or graphics during live events. Norsk is engineered to simplify these historically challenging tasks, enabling clients to easily put together sophisticated streaming solutions.

Figure 3. Why Norsk in a nutshell.

He also pointed out that while some existing solutions may offer these features out of the box, creating such capabilities from scratch usually requires a significant engineering effort and demands professionals with advanced skills and a deep understanding of media technology, including intricate details of different video and container formats and how to handle them.

According to Adrian, Norsk eliminates these complexities, making it easier for companies to implement advanced streaming functionalities without the need for specialized knowledge. In short, Norsk fills the gap in the market for large broadcasters and systems integrators who require customized, automated decision-making capabilities in their streaming solutions.

Norsk In Action

Adrian then began demonstrating Norsk’s operation. He started by showing Figure 4 as an example of an output that Norsk might produce. This involved multiple inputs and overlays of scores or graphics that might need to update dynamically.

Figure 4. A typical production with multiple inputs and overlays
that needed to change dynamically.

Figure 5 shows, in its entirety, the code Norsk uses to produce this output via its “low-code” approach. Parsing through the code, in the top section you see the inputs, outputs, and transformation nodes. In this example, Norsk ingests RTMP and SRT (and also a logo from a file) and publishes the output over WHEP, the WebRTC-HTTP Egress Protocol. However, with Norsk it is easy to accommodate any of the common formats; for example, to change the output to (Low Latency) HLS, you would simply replace the “whep” output with HLS, and you’d be done.

Figure 5. Norsk’s low-code approach to the production shown in Figure 4.

The next section of code directs how the media flows between the nodes. Compose takes the video from the various inputs, while the audio mixer combines the audio from inputs 1 and 2. Finally, the WHEP output subscribes to the outputs of the audio mixer and compose nodes. That’s all the code needed to create a complex picture-in-picture.

Adrian then went over the building blocks of how Norsk solutions can be constructed.  This started with an example of a pass-through setup where an RTMP input is published as a WebRTC output (Figure 6). With Norsk, all that’s needed is to specify that the output should get its audio and video from a particular input node, in this case, RTMP.

He then shared that Norsk is designed to be format-agnostic so that if the input node changes to another format like SRT or SDI, everything else in the setup will continue to function seamlessly. This ease of use allows for the quick development of sophisticated streaming solutions without requiring deep technical expertise.

Figure 6. A simple example of an RTMP input published as WebRTC.

Adrian then described how Norsk handles potential incompatibilities that might arise in a workflow. In the above example he noted that WebRTC supports only the Opus audio codec, which is not supported by RTMP.

In these cases, Norsk automatically identifies the incoming audio codec (in this case, AAC) and transcodes it to Opus for compatibility with WebRTC. It also changes the encapsulation of the H.264 video for WebRTC compatibility. These automated adjustments showcase Norsk’s ability to simplify complex streaming workflows by making intelligent decisions to ensure compatibility and functionality.


Figure 7. Norsk will automatically adjust your workflow to make it work;
in this case, converting AAC to Opus and encapsulating the H.264 encoded video
for WebRTC output.

Next in the quick tour of “building blocks,” Adrian showed how easy it is to build a source switcher, allowing the user to switch dynamically between two camera inputs (Figure 8). He explained how id3as’ low-code approach makes it easy and natural to extend this, for example, to handle an unknown number of sources that might come and go during a live event.

Figure 8. A simple production with two cameras, a source switcher, and WebRTC output.

According to Adrian, this simplicity allows engineers building solutions with Norsk to focus on the user experience they want to deliver to their customers as well as how to automate and simplify operations. They can focus on the intended result, not on the highly complex media technology required to deliver that result. This puts their logic into a very transparent context and simplifies building an application that delivers what’s intended.

Visualizing Productions

To better manage and control operations, Norsk supports visualizations via an OpenTelemetry API, which enables real-time data retrieval and input into a decisioning system for monitoring. Beyond simple integration with these monitoring tools, Norsk also includes the visualizer shown in Figure 9, which renders this data as an easy-to-understand flow of media between nodes. You’ll see two more examples of this below.

Figure 9. Norsk’s workflow visualization makes it simple
to understand the media flow within an application.

Adrian then returned to the picture-in-picture application shown earlier to illustrate how the effect was created. It’s very easy to position, size, and control each of the three elements, so the engineer can focus on the desired output rather than on the underlying media handling.

Figure 10. Integrating three production inputs into a picture-in-picture presentation in Norsk.

Adrian highlighted the convenience and flexibility of Norsk’s low-code approach by describing how the system handles dynamic updates and configurations using code. He emphasized that the entire process of making configuration changes, like repositioning embedded areas or switching sources, involves just a few lines of code. This approach allows users to easily build complex functionalities like a video mixer with minimal engineering efforts.

Additionally, Adrian described how overlays are seamlessly integrated into the workflow. He explained that a browser overlay is treated as just another source which can be transformed and composed alongside other sources. By combining and outputting these elements, a sophisticated output with overlays can be achieved with minimal code.

Adrian emphasized that the features he demonstrated are sufficient to build a comprehensive live production system using Norsk, like that shown in Figure 11. With Norsk’s low-code approach, he asserted, there are no additional complex calls required to achieve the level of sophistication demonstrated. With Norsk, he reiterated, engineers building media applications can focus on creating the desired user experience rather than dealing with intricate technical details.

Figure 11. Norsk enables productions like this with just a few lines of code.

Taking a big-picture view of how productions are created and refined, Adrian shared how the entire process of describing media requirements and building proof of concepts is streamlined with Norsk’s approach. With just a few lines of code, proof of concepts can be developed in a matter of hours or days. This leads to shorter feedback cycles with potential users, enabling quicker validation of whether the solution meets their needs. In this manner, Adrian noted that Norsk enables rapid feature development and allows for quicker feature launches to the market.

Integrating Encoding Hardware

Adrian then shifted his focus to integrations with encoding hardware, noting that many customers have production hardware that utilizes transcoders and VPUs like those supplied by NETINT to achieve high-scale performance. However, the development teams might not have the same production setup for testing and development purposes. Norsk addresses this challenge by providing an easy way for developers to work productively on their applications without requiring the exact production hardware.

You see this in Figure 12, an example of how developers can configure different settings for different environments. For instance, in a production or QA environment, the output settings could be configured for 1080p at 60 frames per second with specific Quadra configurations.

In contrast, in a development environment, the output settings might be configured for the x264 codec outputting 720p with different parameters, like using the ultrafast preset and zero latency. This approach allows engineers to have a productive development experience while not requiring the same processing power or hardware as the production setup.

Figure 12. Norsk can use one set of transcoding parameters
for development (on the right), and another for production.

Adrian then described how Norsk exploits the capabilities of third-party acceleration hardware to optimize performance, sharing that with the NETINT cards, Norsk outperformed FFmpeg. For example, when using hardware transcoders, it’s generally more efficient to keep the processing on the hardware as much as possible to avoid unnecessary data transfers.

Adrian provided a comparison between scenarios where hardware acceleration is used and scenarios where it’s not. In one example, he showed how a NETINT T408 was used for hardware decoding, but some manipulations like picture-in-picture and resizing weren’t natively supported by the hardware. In this case, Norsk pulled the content to the CPU, performed the necessary manipulations, and then sent it back to the hardware for encoding (Figure 13).

Figure 13. Working with the T408, Norsk had to scale and overlay via the host CPU.

In contrast, with a Quadra card that does support onboard scaling and overlay, Norsk performed these functions on the hardware, remarkably, using exactly the same application code as the T408 version (Figure 14). This way, Adrian emphasized, Norsk maximized the efficiency of the hardware transcoder and optimized overall system performance.

Figure 14. Norsk was able to scale an overlay on the Quadra using the same code as on the T408.

Adrian also highlighted the practicality of using Norsk by offering trial licenses for users to experience its capabilities. The trial license allows users to explore Norsk’s features and benefits, showcasing how it leverages emerging hardware technologies in the market to deliver high-density, high-availability, and energy-efficient media experiences. He noted that the trial software was fully capable, though no single session can exceed 20 minutes in duration.

Adrian then took a question from the audience, addressing Norsk’s support for SCTE-35. Adrian highlighted that Norsk is capable of SCTE-35 insertion to signal events such as ad insertion and program switching. Additionally, he noted that Norsk allows the insertion of tags into HLS and DASH manifest files, which can trigger specific events in downstream systems. This functionality enables seamless integration and synchronization with various parts of the media distribution workflow.

Adrian also mentioned that Norsk offers integration with digital rights management (DRM) providers. This means that after content is processed and formatted, it can be securely packaged to ensure that only authorized viewers have access to it. Norsk’s background in the broadcast industry has enabled it to incorporate these capabilities that are essential for delivering content to the right audiences while maintaining content protection and security.

For more information about Norsk, contact the company via their website or request a meeting. And if you’ll be at IBC, you can set up a meeting with them HERE.

ON-DEMAND: Adrian Roe, CEO at id3as | Make Live Easy with NORSK SDK

Beyond Traditional Transcoding: NETINT’s Pioneering Technology for Today’s Streaming Needs

Welcome to our here’s-what’s-new-since-last-IBC-so-you-should-schedule-a-meeting-with-us blog post. I know you’ve got many of these to wade through, so I’ll be brief.

First, a brief introduction. We’re NETINT, the ASIC-based transcoding company. We sell standalone products like our T408 video transcoder and Quadra VPUs (video processing units), as well as servers with ten of either device installed. All offer exceptional throughput at an industry-low cost per stream and power consumption per stream. Our products are denser, leaner, and greener than any competitive technology.

They’re also more innovative. The first-generation T408 was the first new ASIC-based hardware transcoder to ship in at least a decade, and the second-generation Quadra was the first hardware transcoder with AV1 and AI processing. Our Quadra shipped before Google and Meta shipped their first-generation ASIC-based transcoders, and they still don’t support AV1.

That’s us; here’s what’s new.

Capped CRF Encoding

We’ve added capped CRF encoding to our Quadra products for H.264, HEVC, and AV1, with capped CRF coming for the T408 and T432 (H.264/HEVC). By way of background, with the wide adoption of content-adaptive encoding techniques (CAE), constant rate factor (CRF) encoding with a bit rate cap gained popularity as a lightweight form of CAE to reduce the bitrate of easy-to-encode sequences, saving delivery bandwidth and delivering CBR-like quality on hard-to-encode sequences. Capped CRF encoding is a mode that we expect many of our customers to use.

Figure 1 shows capped CRF operation on a theoretical football clip. The relevant switches in the command string would look something like this:

-crf 21 -maxrate 6M

This directs FFmpeg to deliver at least the quality of CRF 21, which for H.264 typically equals around a 95 VMAF score. However, the maxrate switch ensures that the bitrate never exceeds 6 Mbps.

As shown in the figure, in operation, the Quadra VPU transcodes the easy-to-encode sideline shots at CRF 21 quality, producing a bitrate of around 2 Mbps. Then, during actual high-motion game footage, the 6 Mbps cap takes control, and the VPU delivers the same quality as CBR. In this fashion, capped CRF saves bandwidth on easy-to-encode scenes while delivering CBR-equivalent quality on hard-to-encode scenes.
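
Assembled into a complete command line, a capped CRF encode might look like the sketch below. The encoder name follows NETINT’s FFmpeg integration for Quadra, but treat it, along with the assumption that the patch exposes capped CRF through the standard -crf and -maxrate options, as something to verify against your SDK documentation:

# capped CRF on Quadra: CRF 21 quality, never exceeding 6 Mbps (encoder name assumed)
ffmpeg -i input.mp4 -c:v h264_ni_quadra_enc -crf 21 -maxrate 6M output.mp4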

Figure 1. Capped CRF in operation. Relatively low-motion sideline shots are encoded to CRF 21 quality (~95 VMAF), while the 6 Mbps bitrate cap controls during high-motion game footage.

By deploying capped CRF, engineers can efficiently deliver high-quality video streams, enhance viewer experiences, and reduce operational expenses. As the demand for video streaming continues to grow, capped CRF emerges as a game-changer for engineers striving to stay at the forefront of video delivery optimization.

You can read more about capped CRF operation and performance in Get Free CAE on NETINT VPUs with Capped CRF.

Peer-to-Peer Direct Memory Access (DMA) for Cloud Gaming

Peer-to-peer DMA is a feature that makes the NETINT Quadra VPU ideal for cloud gaming. By way of background, in a cloud-gaming workflow, the GPU is primarily used to render frames from the game engine output. Once rendered, these frames are encoded with codecs like H.264 and HEVC.

Many GPUs can render frames and transcode to these codecs, so it might seem most efficient to perform both operations on the same GPU. However, encoding demands a significant chunk of the GPU’s resources, which in turn reduces overall system throughput. It’s not the rendering engine that’s stretched to its limits but the encoder.

What happens when you introduce a dedicated video transcoder into the system using normal techniques? The host CPU manages the frame transfer between the GPU and the transcoder, which can create a bottleneck and slow system performance.

Figure 2. Peer-to-peer DMA enables up to 200 720p60 game streams from a single 2RU server.

In contrast, peer-to-peer DMA allows the GPU to send frames directly to the transcoder, eliminating CPU involvement in data transfers (Figure 2). With peer-to-peer DMA enabled, the Quadra supports latencies as low as 8ms, even under heavy loads. It also unburdens the CPU from managing inter-device data transfers, freeing it to handle other essential tasks like game logic and physics calculations. This optimization enhances the overall system performance, ensuring a seamless gaming experience.

Some NETINT customers are using Quadra and peer-to-peer DMA to produce 200 720p60 game streams from a single 2RU server, and that number will increase to 400 before year-end. If you’re currently assembling an infrastructure for cloud gaming, come see us at IBC.

Logan Video Server

NETINT started out selling standalone PCIe and U.2 transcoding devices, which our customers installed into servers. In late 2022, customers started requesting a prepackaged solution: a server with ten transcoders installed. The Logan Video Server is our first response.

Logan refers to NETINT’s first-generation G4 ASIC, which transcodes to H.264 and HEVC. The Logan Video Server, which launched in the first quarter of 2023, includes a Supermicro server with a 32-core AMD CPU running Ubuntu 20.04 LTS and ten NETINT T408 U.2 transcoder cards (which cost $300 each) for $8,900. There’s also a 64-core option available for $11,500 and an 8-core option for $7,000.

The value proposition is simple. You get a break on price because of volume commitments and don’t have to install the individual cards, which is generally simple but still can take an hour or two. And the performance with ten installed cards is stunning, given the price tag.

You can read about the performance of the 32-core server model in my review here, which also discusses the software architecture and operation. We’ll share one table, which shows one-to-one transcoding of 4K, 1080p, and 720p inputs with FFmpeg and GStreamer.

At the $8,900 cost, the server delivers a cost per stream as low as $445 for 4K (that’s 20 simultaneous streams), $111.25 for 1080p (80 streams), and just over $50 for 720p at normal and low latency. Since each T408 only draws 7 watts and CPU utilization is so low, power consumption is also exceptionally low.

Table 1. One-to-one transcoding performance for 4K, 1080p, and 720p.

With impressive density, low power consumption, and multiple integration options, the NETINT Video Transcoding Server is the new standard to beat for live streaming applications. With a lower-priced model available for pure encoding operations and a more powerful model for CPU-intensive operations, the NETINT Logan server family meets a broad range of requirements.

Quadra Video Server

Once the Logan Video Server became available, customers started asking about a similarly configured server for NETINT’s Quadra line of video processing units (VPUs), which adds AV1 output, onboard scaling and overlay, and two AI processing engines. So, we created the Quadra Video Server.

This model uses the same Supermicro chassis as the Logan Video Server and the same Ubuntu operating system but comes with ten Quadra T1U U.2 form factor VPUs, which retail for $1,500 each. Each T1U offers roughly four times the throughput of the T408, performs on-board scaling and overlay, and can output AV1 in addition to H.264 and HEVC.

The CPU options are the same as the Logan server, with the 8-core unit costing $19,000, the 32-core unit costing $21,000, and the 64-core model costing $24,000. That’s 4X the throughput at just over 2x the price.

You can read my review of the 32-core Quadra Video Server here. I’ll again share one table, this time reporting encoding ladder performance at 1080p for H.264 (120 ladders), HEVC (140), and AV1 (120), and 4K for HEVC (40) and AV1 (30).

In comparison, running FFmpeg using only the CPU, the 32-core system only produced nineteen H.264 1080p ladders, five HEVC 1080p ladders, and six AV1 1080p ladders. Given this low-volume throughput at 1080p, we didn’t bother trying to duplicate the 4K results with CPU-only transcoding.

Table 2. Encoding ladder performance of the Quadra Video Server.

Beyond sheer transcoding performance, the review also details AI-based operations and performance for tasks like region of interest transcoding, which can preserve facial quality in security and other relatively low-quality videos, and background removal for conferencing applications.

Where the Logan Video Server is your best low-cost option for high volume H.264 and HEVC transcoding, the Quadra Video Server quadruples these outputs, adds AV1 and onboard scaling and overlay, and makes AI processing available.

Come See Us at the Show

We now return to our normally scheduled IBC pitch. We’ll be in Stand 5.A86 and you can book a meeting by clicking here.

Figure 3. Book a meeting.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

NETINT Buyer’s Guide. Choosing the Right VPU & Server for Your Workflow.

This guide is designed to help you choose the optimum NETINT Video Processing Unit (VPU) for your encoding workflow.

As an overview, note that all NETINT hardware products run the same basic software controlled via FFmpeg and GStreamer patches or an SDK. This includes load balancing of all encoding resources in a server. In addition, both generations are similar in terms of latency and HDR support.

Question 1. Which ASIC Architecture: Codensity G4 (Logan) or Codensity G5 (Quadra)?

Tables 1 and 2 show the similarities and differences between Codensity G4 ASIC-powered products (T408 and T432) and Codensity G5-based products (Quadra T1U, T1A, T2A). Both architectures are available in either the U.2 or AIC form factor, the latter in half-height half-length (HHHL) configurations.

From a codec perspective, the main difference is that G5-based products support AV1 encoding and VP9 decoding. In terms of throughput, G5-based products offer four times the throughput but cost roughly three times more than G4, making the cost per output stream similar but with greater stream densities per host server. G5 power consumption is roughly 3x higher per ASIC than G4, but the throughput is 4x, making power consumption per stream lower.

Table 1. Codec support, throughput, and power consumption.

Table 2 covers other hardware features. From an encoding perspective, G5-based products enable tuning of quality and throughput to match your applications, while quality and throughput are fixed for G4-based products. The G5’s quality ceiling is higher than the G4, at the cost of throughput, and the quality floor is lower, with an option for higher throughput.

G5-based products are much more capable hardware-wise, performing scaling, overlay, and audio compression and offer AI processing of 15 TOPS for T1U and 18 TOPS for T1A (36 TOPS T2A). In contrast, G4-based products scale, overlay, and encode audio via the host CPU and offer no AI processing. You can read about Quadra’s AI capability here.

Peer-to-peer DMA, available only on G5-based products, allows the VPU to communicate directly with certain GPUs, which is particularly useful in cloud gaming. Learn about peer-to-peer DMA here.

Note that G4 and G5-based devices can co-exist on the same server, so you can add G5 devices to a server with G4 devices already installed and vice versa.

Table 2. Advanced hardware functionality.

Observations:

  • Codensity G4 and G5-based VPUs offer similar cost-per-stream, with Quadra slightly more efficient on a watts-per-stream basis. Both products transcode to H.264 and HEVC formats (G5 encodes to AV1 and decodes VP9).

  • Choose G4-based products for:
    • The absolute lowest overall cost
    • Compatibility with existing G4-based encoding stacks
    • Interactive same resolution-in/out productions (minimum scaling and overlay)

  • Choose G5-based products for:
    • AV1 output
    • AI integration
    • Applications that need quality and throughput tuning
    • Applications that involve scaling and overlay
    • Maximum throughput from a single server
    • Cloud gaming

Question 2: Which G4-based Product?

This section discusses your G4-based options shown in Figure 1, with the U.2-based T408 in the background and AIC-form factor T432 in the foreground. These products are designated as Transcoders since this is their primary hardware-based function.

Figure 1. The NETINT T408 in the back, T432 in the front.

Table 3 identifies the key differences between NETINT’s two G4-based VPUs, the T408, which includes a single G4 ASIC in a U.2 form factor, and the T432, which includes four G4 ASICS in an AIC half-height half-length configuration.

Table 3. NETINT’s two G4-based products.

Observations:

  • The U.2-based T408 offers the best available density for installing units into a 1RU server.
  • The AIC-based T432 is the best option for computers without U.2 connections and for maximum server chassis density.

Question 3: Which G5-based Product?

Figure 2 identifies the three Quadra G5-based products, with the U.2-based T1U in the back, the AIC-based T1A in the middle, and the AIC-based T2A in the front. These products are designated Video Processing Units, or VPUs, because their hardware functionality extends far beyond simple transcoding.

Figure 2. The Quadra T1U in the back, T1A in the middle, and T2A in front.

Table 4 identifies the key differences between NETINT’s three G5-based VPUs:

  • The T1U includes a single G5 ASIC in a U.2 form factor.
  • The T1A includes a single G5 ASIC in an AIC half-height half-length configuration.
  • The T2A includes two G5 ASICs in an AIC half-height half-length configuration.

Table 4. NETINT’s three G5-based VPUs.

Observations:

  • The U.2-based Quadra T1U offers the best density for installing in a 1RU server.
  • The Quadra T2A offers the best density for AIC-based installation and is ideal for cloud gaming servers that need peer-to-peer DMA communication with GPUs.
  • The AIC-based Quadra T1A is the most affordable AIC option for installs that don’t need maximum density.

Question 4: VPU or Server?

NETINT offers two video servers that use the same Supermicro 1114S-WN10RT server chassis; the Logan Video Server contains ten T408 U.2 transcoders, while the Quadra Video Server contains ten Quadra T1U VPUs. Servers offer a turnkey option for fast and simple deployment.

An advantage of buying a NETINT Video Server is that all components, including CPU, RAM, hard drive, OS, and software versions, have been extensively tested for compatibility, stability, and performance, making them the easiest and fastest way to transition from software to hardware encoding.

As for the choice between servers, your answer to question 1 should guide your selection.

If you have any questions about any products, please contact us here.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud