Understanding the Economics of Transcoding

Whether your business model is FAST or subscription-based premium content, your success depends upon your ability to deliver a high-quality viewing experience while relentlessly reducing costs. Transcoding is one of the most expensive production-related costs and the ultimate determinant of video quality, so it plays a huge role on both sides of this equation. This article identifies the most relevant metrics for ascertaining the true cost of transcoding and then uses these metrics to compare the relative cost of the available methods for live transcoding.

Economics of Transcoding: Cost Metrics

There are two potential cost categories associated with transcoding: capital costs and operating costs. Capital costs arise when you buy your own transcoding gear, while operating costs apply when you operate this equipment or use a cloud provider. Let’s discuss each in turn.

Economics of Transcoding: CAPEX

The simplest way to compare transcoders is to normalize capital and operating costs using the cost per stream or cost per ladder, which simplifies comparing disparate systems with different costs and throughput. The cost per stream applies to services inputting and delivering a single stream, while the cost per ladder applies to services inputting a single stream and outputting an encoding ladder.

We’ll present real-world comparisons once we introduce the available transcoding options, but for the purposes of this discussion, consider the simple example in Table 1. The top line shows that System B costs twice as much as System A, while line 2 shows that it also offers 250% of the capacity of System A. On a cost-per-stream basis, System B is actually cheaper.

TABLE 1: A simple cost-per-stream analysis.

The next few lines use this data to compute the number of required systems for each approach and the total CAPEX. Assuming that your service needs 640 simultaneous streams, the total CAPEX for System A dwarfs that of System B. Clearly, just because a particular system costs more than another doesn’t make it the more expensive option.
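The arithmetic behind this kind of comparison can be sketched in a few lines of Python. The prices and stream counts below are hypothetical placeholders chosen only to match the "twice the price, 250% of the capacity" relationship; they are not the values from Table 1, and the function name is our own.

```python
import math

def capex(price_per_system, streams_per_system, required_streams):
    """Return (cost_per_stream, systems_needed, total_capex) for one system type."""
    cost_per_stream = price_per_system / streams_per_system
    # You can't buy a fraction of a server, so round up.
    systems_needed = math.ceil(required_streams / streams_per_system)
    return cost_per_stream, systems_needed, systems_needed * price_per_system

# Hypothetical figures: System B costs 2x System A and offers 2.5x the streams.
system_a = capex(price_per_system=10_000, streams_per_system=32, required_streams=640)
system_b = capex(price_per_system=20_000, streams_per_system=80, required_streams=640)

print(system_a)  # (312.5, 20, 200000)
print(system_b)  # (250.0, 8, 160000)
```

Under these assumed numbers, the pricier System B wins on both cost per stream and total CAPEX, which is the point of normalizing by throughput.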

For the record, the throughput of a particular server is also referred to as density, and it obviously impacts OPEX charges. System B delivers over six times the streams from the same 1RU rack as System A, so is much more dense, which will directly impact both power consumption and storage charges.

Details Matter

Several factors complicate the otherwise simple analysis of cost per stream. First, you should analyze using the output codec or codecs, current and future. Many systems output H.264 quite competently but choke considerably with the much more complex HEVC codec. If AV1 may be in your future plans, you should prioritize a transcoder that outputs AV1 and compare cost per stream against all alternatives.

The second requirement is to use consistent output parameters. Some vendors quote throughput at 30 fps, some at 60 fps. Obviously, you need to use the same value for all transcoding options. As a rough rule of thumb, if a vendor quotes 60 fps, you can double the throughput for 30 fps, so a system that can output 8 1080p60 streams can likely output 16 1080p30 streams. Obviously, you should verify this before buying.
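As a minimal sketch of that rule of thumb, the helper below rescales a quoted stream count to a common 30 fps basis. It assumes throughput scales linearly with frame rate, which is only an approximation, and the function name is hypothetical.

```python
def streams_at_30fps(quoted_streams, quoted_fps):
    """Rescale a vendor's quoted stream count to a 30 fps basis.

    Assumes throughput scales linearly with frame rate (a rough rule of thumb).
    """
    return quoted_streams * quoted_fps / 30

print(streams_at_30fps(8, 60))   # 16.0
print(streams_at_30fps(16, 30))  # 16.0
```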

If a vendor quotes in streams and you’re outputting encoding ladders, it’s more complicated. Encoding ladders involve scaling to lower resolutions for the lower-quality rungs. If the transcoder performs scaling on-board, throughput should be greater than systems that scale using the host CPU, and you can deploy a less capable (and less expensive) host system.

The last consideration involves the concept of “operating point,” or the encoding parameters that you would likely use for your production, and the throughput and quality at those parameters. To explain, most transcoders include encoding options that trade off quality vs throughput much like presets do for x264 and x265. Choosing the optimal setting for your transcoding hardware is often a balance of throughput and bandwidth costs. That is, if a particular setting saves 10% bandwidth, it might make economic sense to encode using that setting even if it drops throughput by 10% and raises your capital cost accordingly. So, you’d want to compute your throughput numbers and cost per stream at that operating point.
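To make the tradeoff concrete, here is a rough sketch that weighs annual bandwidth savings against the extra hardware needed when throughput drops. All dollar figures are hypothetical, and the model ignores power and housing for the additional systems.

```python
def net_annual_benefit(bandwidth_cost_per_year, bandwidth_saving,
                       capex, throughput_drop, lifetime_years):
    """Annual bandwidth savings minus the amortized cost of extra hardware."""
    saved = bandwidth_cost_per_year * bandwidth_saving
    # A 10% throughput drop means 1 / 0.9 as many systems, about 11% more.
    extra_capex = capex * (1 / (1 - throughput_drop) - 1)
    return saved - extra_capex / lifetime_years

# Hypothetical service: $500,000/year bandwidth, $160,000 CAPEX, five-year life.
benefit = net_annual_benefit(500_000, 0.10, 160_000, 0.10, 5)
print(round(benefit, 2))  # 46444.44
```

A positive result suggests the higher-quality setting pays for itself; a negative one suggests staying with the faster setting.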

In addition, many transcoders produce lower throughput when operating in low latency mode. If you’re transcoding for low-latency productions, you should ascertain whether the quoted figures in the spec sheets are for normal or low latency.

For these reasons, completing a thorough comparison requires a two-step analysis. Use spec sheet numbers to identify transcoders that you’d like to consider and acquire them for further testing. Once you have them in your lab, you can identify the operating point for all candidates, test at these settings, and compare them accordingly.

Economics of Transcoding: OPEX - Power

Now, let’s look at OPEX, which has two components: power and storage costs. Table 2 continues our example, looking at power consumption.

Unfortunately, ascertaining power consumption may be complicated if you’re buying individual transcoders rather than a complete system. That’s because while transcoding manufacturers often list the power consumption utilized by their devices, you can only run these devices in a complete system. Within the system, power consumption will vary by the number of units configured in the system and the specific functions performed by the transcoder.

Note that the most significant contributor to overall system power consumption is the CPU. Referring back to the previous section, a transcoder that scales onboard will require lower CPU contribution than a system that scales using the host CPU, reducing overall CPU consumption. Along the same lines, a system without a hardware transcoder uses the CPU for all functions, maxing out CPU utilization and likely consuming about the same energy as a system loaded with transcoders that collectively might consume 200 watts.

Again, the only way to achieve a full apples-to-apples comparison is to configure the server as you would for production and measure power consumption directly. Fortunately, as you can see in Table 2, stream throughput is a major determinant of overall power consumption. Even if you assume that Systems A and B both consume the same power, System B’s throughput makes it much cheaper to operate over a five-year expected life, and much kinder to the environment.

TABLE 2. Computing the watts per stream of the two systems.
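The watts-per-stream calculation can be sketched as follows. The wattage, stream counts, and electricity price are hypothetical and do not reproduce Table 2; both systems are assumed to draw the same power to mirror the scenario described above.

```python
HOURS_PER_YEAR = 24 * 365

def power_cost(system_watts, streams_per_system, required_streams,
               usd_per_kwh, years):
    """Return (watts_per_stream, total_power_cost) for a 24/7 service."""
    watts_per_stream = system_watts / streams_per_system
    total_kw = watts_per_stream * required_streams / 1000
    return watts_per_stream, total_kw * HOURS_PER_YEAR * years * usd_per_kwh

# Hypothetical: both systems draw 500 W; electricity costs $0.15 per kWh.
a_wps, a_cost = power_cost(500, 32, 640, 0.15, 5)
b_wps, b_cost = power_cost(500, 80, 640, 0.15, 5)

print(a_wps, round(a_cost))  # 15.625 65700
print(b_wps, round(b_cost))  # 6.25 26280
```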

Economics of Transcoding: Storage Costs

Once you purchase the systems, you’ll have to house them. While these costs are easiest to compute if you’re paying for a third-party co-location service, you’ll have to estimate costs even for in-house data centers. Table 3 continues the five-year cost estimates for our two systems, and the denser System B proves much cheaper to house as well as power.

TABLE 3: Computing the storage costs for the two systems.

Economics of Transcoding: Transcoding Options

These are the cost fundamentals. Now let’s explore them within the context of different encoding architectures.

There are three general transcoding options: CPU-only, GPU, and ASIC-based. There are also FPGA-based solutions, though these will probably be supplanted by cheaper-to-manufacture ASIC-based devices over time. Briefly:

  • CPU-based transcoding, also called software-based transcoding, relies on the host central processing unit, or CPU, for all transcoding functions.
  • GPU-based transcoding refers to Graphics Processing Units, which are developed primarily for graphics-related functions but may also transcode video. These are added to the server as add-in PCIe cards.
  • ASICs are Application-Specific Integrated Circuits designed specifically for transcoding. These are added to the server as add-in PCIe cards or devices that conform to the U.2 form factor.

Economics of Transcoding: Real-World Comparison

NETINT manufactures ASIC-based transcoders and video processing units. Recently, we published a case study where a customer, Mayflower, rigorously and exhaustively compared these three alternatives, and we’ll share the results here.

By way of background, Mayflower’s use case needed to input 10,000 incoming simultaneous streams and distribute over a million outgoing simultaneous streams worldwide at a latency of one to two seconds. Mayflower hosts a worldwide service available 24/7/365.

Mayflower started with 80-core bare metal servers and tested CPU-based transcoding, then GPU-based transcoding, and then two generations of ASIC-based transcoding. Table 4 shows the net/net of their analysis, with NETINT’s Quadra T2 delivering the lowest cost per stream and the greatest density, which contributed to the lowest co-location and power costs.

RESULTS: COST AND POWER

TABLE 4. A real-world comparison of the cost per stream and OPEX associated with different transcoding techniques.

As you can see, the T2 delivered an 85% reduction in CAPEX and roughly 90% reductions in OPEX compared to CPU-based transcoding. CAPEX savings compared to the NVIDIA T4 GPU were about 57%, with OPEX savings of around 70%.

Table 5 shows the five-year cost of the Mayflower T2-based solution using the cost per kWh in Cyprus of $0.335. As you can see, the total is $2,225,241, a number we’ll return to in a moment.

TABLE 5: Five-year cost of the Mayflower transcoding facility.

Just to close a loop, Tables 1, 2, and 3 compare the cost and performance of a Quadra Video Server equipped with ten Quadra T1U VPUs (Video Processing Units) with CPU-based transcoding on the same server platform. You can read more details on that comparison here.

Table 6 shows the total cost of both solutions. In terms of overall outlay, meeting the transcoding requirements with the Quadra-based System B costs 73% less than the CPU-based system. If that sounds like a significant savings, keep reading. 

TABLE 6: Total cost of the CPU-based System A and Quadra T2-based System B.

Economics of Transcoding: Cloud Comparison

If you’re transcoding in the cloud, all of your costs are OPEX. With AWS, you have two alternatives: producing your streams with Elemental MediaLive or renting EC2 instances and running your own transcoding farm. We considered the MediaLive approach here, and it appears economically unviable for 24/7/365 operation.

Using Mayflower’s numbers, the CPU-only approach required 500 80-core Intel servers running 24/7. The closest CPU in the Amazon EC2 pricing calculator was the 64-core c6i.16xlarge, which, under the EC2 Instance Savings plan, with a 3-year commitment and no upfront payment, costs $1,125.84/month.

FIGURE 1. The annual cost of the Mayflower system if using AWS.

We used Amazon’s pricing calculator to roll these numbers out to 12 months and 500 simultaneous servers, and you see the annual result in Figure 1. Multiply this by five to get to the five-year cost of $33,775,056, which is 15 times the cost of the Quadra T2 solution, as shown in Table 5.
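For readers who want to check the arithmetic, the short sketch below recomputes the five-year figure from the quoted monthly instance rate. A straight multiplication lands within a few hundred dollars of the calculator's total; we assume the small gap comes from how the calculator counts hours or rounds, which the article doesn't specify.

```python
monthly_per_instance = 1125.84   # c6i.16xlarge, 3-year plan, no upfront (from the text)
servers = 500
quoted_five_year = 33_775_056    # five-year AWS total quoted in the text
quadra_five_year = 2_225_241     # five-year Quadra T2 total from Table 5

annual = monthly_per_instance * 12 * servers
five_year = annual * 5
ratio = quoted_five_year / quadra_five_year

print(round(annual))      # 6755040
print(round(five_year))   # 33775200
print(round(ratio, 1))    # 15.2
```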

We ran the same calculation on the 13 systems required for the Quadra Video Server analysis shown in Tables 1-3, which was powered by a 32-core AMD CPU. Assuming a c6a.8xlarge instance with a 3-year commitment and no upfront payment, this produced an annual charge of $79,042.95, or roughly $395,215 for the five-year period, which is about 8 times more costly than the Quadra-based solution.

FIGURE 2: The annual cost of an AWS system per the example schema presented in Tables 1-3.

Cloud services are an effective means for getting services up and running, but are vastly more expensive than building your own encoding infrastructure. Service providers looking to achieve or enhance profitability and competitiveness should strongly consider building their own transcoding systems. As we’ve shown, building a system based on ASICs will be the least expensive option.

In August, NETINT held a symposium on Building Your Own Live Streaming Cloud. The on-demand version is available for any video engineer seeking guidance on which encoder architecture to acquire, the available software options for transcoding, where to install and run your encoding servers, and progress made on minimizing power consumption and your carbon footprint.

ON-DEMAND: Building Your Own Live Streaming Cloud

Demystifying the live-streaming setup

Demystifying the live-streaming setup w Stef van der Ziel from Jet-Stream (NETINT Symposium on Building Your Own Streaming Cloud) - featured image

Stef van der Ziel, our keynote speaker, has been in the streaming industry since 1994, and as founder of Jet-Stream, oversaw the development of Jet-Stream Cloud, a European-based streaming platform. He discussed the challenges associated with creating your own encoding infrastructure, how to choose the best transcoding technology, and the cost savings available when you build your own platform.

Stef started by recounting the evolution and significance of transcoding in the streaming industry. To help set the stage, he described the streaming process, starting with a feed from a source like a camera. This feed is encoded and then transcoded into various qualities. This is followed by origin creation, packaging, and, finally, delivery via a CDN.

Stef emphasized the distinction between encoding and transcoding, noting that the latter is mission-critical too. If errors occur during transcoding, the entire stream can fail, leading to poor quality or buffering issues for viewers.

He then related that quality and viewer experience are paramount for transcoding services, regardless of whether they are cloud-based or on-premises. However, cost management is equally crucial.

Beyond the direct costs of transcoding, incorrect settings can lead to increased bandwidth and storage costs. Stef noted the often-overlooked human operational costs associated with managing a streaming platform, especially in the realm of transcoding. Expertise is essential, necessitating either an in-house team or hiring external experts.

Stef observed that while traffic prices have decreased significantly over the years, transcoding costs have remained relatively high. However, he noted a current trend of decreasing transcoding costs, which he finds exciting.

Lastly, in line with the theme of sustainable streaming, Stef emphasized the importance of green practices at every step of the streaming process. He mentioned that Jet-Stream has practiced green streaming since 2004 and that the intense computational demands of transcoding and analytics make them resistant to green practices.

Demystifying the live-streaming setup w Stef van der Ziel from Jet-Stream (NETINT Symposium on Building Your Own Streaming Cloud) - slide 2

CHOOSING TRANSCODING OPTIONS

In discussing transcoding options, Stef related that CPU-based encoding can deliver very good quality, but that it’s costly in terms of CPU and energy usage. He noted that GPU-based encoding delivered lower quality than CPU-based encoding and was less cost- and power-efficient than ASICs.

FIGURE 1. Stef found CPU and ASIC-based transcoding quality superior to GPU-based transcoding.

The real game-changer, according to Stef, is ASIC-based encoding. ASICs not only offer superior quality but also minimal latency, a crucial factor for specific low-latency use cases.

Compared to software transcoding, ASICs are also much more power efficient. For instance, while CPU-based transcoding could consume anywhere from 2,800 to 9,000 watts for transcoding 80 OTT channels to HD, ASIC-based hardware transcoding required only 308 watts for the same task. This translates to an energy saving of at least 89%.
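The "at least 89%" figure follows directly from the wattages Stef quoted, as this short check shows. The helper name is our own.

```python
def energy_saving(baseline_watts, new_watts):
    """Fractional reduction in power draw relative to the baseline."""
    return 1 - new_watts / baseline_watts

low_end = energy_saving(2800, 308)    # against the most efficient CPU setup
high_end = energy_saving(9000, 308)   # against the least efficient CPU setup

print(round(low_end, 2))    # 0.89
print(round(high_end, 3))   # 0.966
```

So the saving ranges from about 89% to nearly 97%, depending on which CPU configuration you compare against.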

Beyond energy efficiency, ASICs also shine in terms of scalability. Stef explained that the power constraints of CPU encoding might limit the capacity of a single rack to 200 full HD channels. In contrast, a rack populated with ASIC-based transcoders could handle up to 2,400 channels concurrently. This capability means increased density, optimized use of rack space, and overall heightened efficiency.

Not surprisingly, given these insights, Stef positioned ASIC-based transcoding as a clear frontrunner over CPU- and GPU-based encoding methods.

OTHER FEATURES TO CONSIDER

Once you’ve chosen your transcoding technology, and implemented basic transcoding functions, you need to consider additional features for your encoding facility. Drawing from his experience with Jet-Stream’s own products and services, Stef identified some to consider.

  • Containerize operation in Kubernetes containers so any crash, however infrequent, is self-contained and easily replaceable, often without viewers’ noticing.
  • Stack multiple machines to build a microcloud and implement automatic scaling and pooling.
  • Combine multiple technologies like decoding, filtering, origin, and edge serving, into a single server. That way, a single server can provide a complete solution in many different scenarios.

Demystifying the live-streaming setup w Stef van der Ziel from Jet-Stream (NETINT Symposium on Building Your Own Streaming Cloud) - slide 20

BEYOND THE BASICS

Beyond these basics, Stef also explained the need to add a flexible and capable interface to your system and to add new features continually, as Jet-Stream does. For example, you may want to burn in a logo or add multi-language audio to your stream, particularly in Europe. You may want or need to support subtitles and offer speech-to-text transcription.

If you’re supporting multiple channels with varying complexity, you may need different encoding profiles tuned for each content type. Another option might be capped CRF encoding to minimize bandwidth costs, which is now standard on all NETINT VPUs and transcoders. On the distribution side, you may need your system to support multiple CDNs for optimized distribution in different geographic regions and auto-failover.

Finally, as your service grows, you’ll need interfaces for health and performance status. Some of the performance indicators that Jet-Stream systems track include bandwidth per stream, viewers per stream, total bandwidth, and many others.

The key point is that you should start with a complete list of necessary features for your system and estimate the development and implementation costs for each. Knowledge of sophisticated products and services like those offered by Jet-Stream will help you understand what’s essential. But you really need a clear-eyed view of the development cost and time before you undertake creating your own encoding infrastructure.

COST AND ENERGY SAVINGS

Fortunately, it’s clear that building your own system can be a huge cost saver. According to Stef, on AWS, a typical full HD channel would cost roughly 2,400 euros per month. By creating his own encoding infrastructure, Jet-Stream reduced this to 750 euros per month.

FIGURE 2. Running your own system can deliver significant savings over AWS.

Obviously, the savings scale as you grow, so “if you do this times 12 months, times five years, times 80 channels, you’re saving almost 8 million euros.” If you run the same math on energy consumption, you’ll save 22,000 euros on energy costs alone.
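Stef's math is easy to reproduce from the per-channel figures quoted above (2,400 euros per month on AWS versus 750 euros per month on Jet-Stream's own infrastructure):

```python
aws_eur_per_channel_month = 2400
own_eur_per_channel_month = 750
months, years, channels = 12, 5, 80

monthly_saving = aws_eur_per_channel_month - own_eur_per_channel_month
total_saving = monthly_saving * months * years * channels

print(monthly_saving)  # 1650
print(total_saving)    # 7920000
```

That is 7.92 million euros over five years, or "almost 8 million."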

By running the transcoding setup on-premises, the cost savings can even be doubled. On-premises is a popular choice to bring more control over core streaming processes back in house.

Overall, Stef’s keynote effectively communicated that while creating your own encoding infrastructure will involve significant planning and development time and cost, the financial reward can be very substantial.

Demystifying the live-streaming setup w Stef van der Ziel from Jet-Stream (NETINT Symposium on Building Your Own Streaming Cloud) - slide 46

ON-DEMAND: Stef van der Ziel - Demystifying the live-streaming setup

The LESS Accord and Energy-Efficient Streaming

The goal of our recent Building Your Own Live Streaming Cloud symposium was to help live video engineers learn how to build and house their own transcoding infrastructure while minimizing power consumption and carbon footprint. Accordingly, we invited Barbara Lange from the Greening of Streaming to speak at the symposium. This article relates the key points of her talk, particularly describing the short-term goals of the Low Energy Sustainable Streaming (LESS) Accord.

By way of background, Barbara is a Volunteer Secretariat for the Greening of Streaming and the principal and CEO of Kibo121, a consultancy dedicated to guiding the media tech sector towards sustainability. Barbara described the Greening of Streaming as a member organization formed roughly two years ago. Its primary focus is on the end-to-end energy efficiency of the technical supply chain that supports streaming services.

The organization has an international membership and is dedicated to addressing the energy implications of the streaming sector. Their mission is to provide the global internet streaming industry with a platform to enhance engineering practices and promote collaboration throughout the supply chain. One core belief is that as streaming increases in scope, understanding the true energy costs, backed by real-world data, is paramount. Barbara mentioned that the organization’s monthly membership meetings are now open to the public, with the next meeting scheduled for October 11 at 11:00 Eastern.

Barbara then described the organization’s structure, highlighting its nine current working groups, which focus on diverse pursuits like defining terminology, organizing industry outreach, and identifying best practices. One notable initiative was the measurement of energy consumption during an English Premier League soccer match. The organization also explores power consumption in audio streaming, compression/decompression, and the standardization of energy data.

A newly formed group is dedicated to understanding the energy costs associated with end-user devices. Barbara emphasized the importance of collaboration with academic and other industry groups to avoid duplication of effort and to ensure consistent and effective communication across the industry.

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-1

LESS ACCORD

With this as background, Barbara focused on the LESS Accord. She began by addressing a common misconception: contrary to some media reports, there’s almost no direct correlation between internet traffic, measured in gigabytes, and energy consumption, measured in kilowatt-hours. This realization emerged from discussions within Working Group Six, which is responsible for examining compression-related issues. This group initiated the LESS Accord.

The LESS Accord’s mission statement is to define best practices for employing compression technologies in streaming video workflows. The goal is to optimize energy efficiency while ensuring a consistently high-quality viewing experience for users. These guidelines target energy reduction throughout the entire streaming process, from the initial encoding for distribution to the decoding and display on consumer devices for all video delivery services.

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-4

As Barbara reported, over the past six months, the group has actively engaged with industry professionals, engineers, and experts. They’ve sought insights and suggestions on how to enhance energy efficiency across all workflow and system stages. The essence of the Accord is to foster a collaborative environment where various, sometimes contrasting, initiatives from recent years can be harmonized.

The ultimate goal is to refine testing objectives and pinpoint organizations that can form project groups. Barbara detailed the first four projects designated in the LESS Accord’s mission statement.

PROJECT ONE: INTELLIGENT DISTRIBUTION MODEL SHIFTING

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-6

Project one involves determining the most energy-efficient distribution model at any given time and enabling content delivery networks (CDNs) to seamlessly transition between these models. The three distribution models to be considered are:

  • Unicast: The dominant model in today’s internet streaming.
  • Peer-to-peer: Typically used for video on demand distribution.
  • Network layer multicast: Often deployed for IPTV.

While each model has traditionally served a specific purpose, the group believes that all three could be viable options in various contexts. The hypothesis is that if these models can be provisioned almost spontaneously, there should be an underlying heuristic that facilitates the shift from one model to another. If energy efficiency is the primary concern, this shift could allow the CDN to meet that objective.

The main goal of this project is to design a workflow that incorporates energy measurements for the involved systems. The aim is to discern when an operator should transition from one model to another, with energy consumption of the entire system being the primary driver, without compromising the end user’s experience.
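As a purely illustrative sketch of what such a heuristic might look like, the snippet below picks the lowest-energy distribution model among those that still preserve the viewing experience. The data structure, field names, and wattage figures are all hypothetical; the LESS Accord has not published a specific algorithm.

```python
from dataclasses import dataclass

@dataclass
class ModelEstimate:
    name: str          # "unicast", "peer-to-peer", or "multicast"
    watts: float       # estimated whole-system power draw (hypothetical)
    meets_qoe: bool    # True if the end user's experience is not compromised

def pick_model(estimates):
    """Return the name of the lowest-energy model that preserves viewer experience."""
    viable = [e for e in estimates if e.meets_qoe]
    if not viable:
        raise ValueError("no distribution model meets the experience requirement")
    return min(viable, key=lambda e: e.watts).name

# Hypothetical measurements for a large live event.
choice = pick_model([
    ModelEstimate("unicast", 12_000.0, True),
    ModelEstimate("peer-to-peer", 9_500.0, False),
    ModelEstimate("multicast", 10_200.0, True),
])
print(choice)  # multicast
```

Note that peer-to-peer has the lowest assumed draw here but is excluded because it fails the experience check, reflecting the project's constraint that energy savings can't come at the viewer's expense.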

PROJECT TWO: THE "GOOD ENOUGH" CONCEPT

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-7

Barbara then described the second project, which involves potential energy savings through codec choices and optimization. The central question is whether energy can be conserved by allowing consumers to opt for a streaming experience that prioritizes energy efficiency.

The concept suggests introducing a “green button” on streaming media player devices or applications. By pressing this button, users would choose an experience optimized for energy conservation. Drawing a parallel, Barbara mentioned that many televisions come equipped with an “ECO” mode, which many users tend to disable or overlook. Project two will explore whether consumers might be more inclined to select the energy-efficient option if the energy consumption differences between modes were better communicated.

Taking the idea further, this project will explore consumer behavior if the devices defaulted to this ECO or green mode, and users had the choice to upgrade to a “gold mode” for a potentially enhanced quality. Or, if the default setting prioritized energy efficiency, would this lead to a more energy-conserving streaming system?

The project aims to explore these questions, especially considering that many users currently avoid ECO modes, possibly due to perceived concerns about service quality. In short, this project seeks to understand user behavior and preferences in the context of energy-efficient streaming.

PROJECT THREE: ENERGY MEASUREMENT THROUGHOUT WORKFLOWS

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-8

Barbara then described the third project, which she acknowledged as particularly intricate. The central challenge is to measure energy consumption at every stage of the streaming workflow. This initiative originated from Working Group Four, which has been exploring methods to monitor and probe systems to determine the energy costs associated with each step of the process.

The overarching question is: how much energy is required to deliver a stream to the consumer? While answering this question would be invaluable for economic, marketing, and feedback purposes, it’s a complex endeavor.

The proposed approach involves tracking energy consumption from start to finish in the streaming process. When a video file is created on a computer and encoding begins, an energy reading in kilowatt-hours could be taken. This process would be repeated at each subsequent production, delivery, and playback stage. The idea is to tag the video file with “energy breadcrumbs” or metadata that gets updated as the file progresses through the workflow. By the end, these breadcrumbs would provide a comprehensive view of the energy costs associated with the entire streaming process.
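One way to picture the "energy breadcrumbs" idea is as a small metadata record that each workflow stage appends to. The sketch below is our own illustration; the class, field names, and kilowatt-hour readings are hypothetical and don't reflect any format defined by the Greening of Streaming.

```python
from dataclasses import dataclass, field

@dataclass
class EnergyBreadcrumbs:
    asset_id: str
    stages: list = field(default_factory=list)   # (stage_name, kwh) pairs

    def add(self, stage, kwh):
        """Record the energy reading taken at one workflow stage."""
        self.stages.append((stage, kwh))

    def total_kwh(self):
        """Sum the readings across all recorded stages."""
        return sum(kwh for _, kwh in self.stages)

crumbs = EnergyBreadcrumbs("match-feed-001")
crumbs.add("encode", 0.5)       # hypothetical readings
crumbs.add("package", 0.25)
crumbs.add("delivery", 1.0)

print(crumbs.total_kwh())  # 1.75
```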

Barbara emphasized the ambitious nature of this project, noting that while it’s uncertain if they can fully realize this vision, they are committed to exploring it. She believes that this project, if successful, could have the most significant impact in terms of understanding energy consumption in the streaming sector.

PROJECT FOUR: TRANSITIONING WORKFLOWS FOR ENERGY EFFICIENCY

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-9

Barbara introduced the fourth project, which will explore how to adapt various technologies to transition existing workflows to hardware environments that are more energy-efficient. Some initial areas of exploration include:

  • Optimization between different silicon environments: Examining how different hardware platforms can be more energy-efficient.
  • Immersion cooling: Comparing traditional air cooling systems with alternative cooling methods in streaming environments. This includes processes like encoding, packaging, caching, and even playback in consumer electronics.
  • Deploying tasks to renewable energy infrastructures: Specifically, relocating non-time-sensitive encoding tasks to infrastructures powered by surplus renewable energy. An exciting development in this area is the interest shown by Scottish Enterprise, which aims to test the relocation of non-critical transcoding workloads to a wind-powered facility in Scotland.

ENERGY-EFFICIENT STREAMING

Energy-efficient streaming - Barbara-Lange-The-LESS-Accord-and-its-Energy-Savings-Drive-2

Barbara emphasized that all these projects were established during a Greening of Streaming event in June and are currently in progress. She invited interested parties to join these projects and noted that a member meeting was held on September 13, with the next one scheduled for October 11.

Additionally, at IBC in September, the Greening of Streaming plans to present these projects to a broader audience, kick off the work in the fourth quarter, and continue into the next year. By the NAB event in April 2024, the organization hopes to discuss the projects in-depth and share test results.

ON-DEMAND: Barbara Lange - Empowering a Greener Tomorrow: The LESS Accord and its Energy Savings Drive

AV1 Capped CRF Encoding with Quadra VPU

We’ve previously reported results for capped CRF encoding for H.264 and HEVC using NETINT Quadra video processing units (VPU). This post will detail AV1 performance, including both 1080p and 4K data.

For those with limited time, here’s what you need to know: Capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same cost or lower than CBR. NETINT VPUs are the first hardware video encoders to adopt Capped CRF across the three most popular codecs in use today, AV1, HEVC, and H.264.

You can read a quick description of capped CRF here and get a deep dive with H.264 and HEVC performance results here.

CAPPED CRF OVERVIEW

Briefly, capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bitrate cap. Unlike variable bitrate encoding (VBR) and constant bitrate encoding (CBR), which target specific bitrates, capped CRF targets a specific quality level, which is controlled by the CRF value. You also set a bitrate cap, which is applied if the encoder can’t meet the quality level below the bitrate cap.

On easy-to-encode videos, the CRF value sets the quality level, which it can usually achieve below the bitrate cap. In these cases, capped CRF typically delivers bitrate savings over CBR-encoded footage while delivering similar quality. For harder-to-encode footage, the bitrate cap usually controls, and capped CRF delivers close to the same quality and bitrate as CBR.

The value proposition is clear: lower bitrates and good quality during easy scenes, and similar to CBR in bitrate and quality for harder scenes. I’m not addressing VBR because NETINT’s focus is live streaming, where CBR usage dominates. If you’re analyzing capped CRF for VOD, you would compare against 2-pass VBR as well as potentially CBR.

One last detail. CRF values have an inverse relationship to quality and bitrate; the higher the CRF value, the lower the quality and bitrate. In general, video engineers select a CRF value that delivers their target quality level. For premium content, you might target an average VMAF score of 95. For user-generated content or training videos, you might target 93 or even lower. As you’ll see, the lower the quality score, the greater the bandwidth savings.
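
The selection process described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already run test encodes at several CRF values and measured VMAF and bitrate for each. The `Trial` type, the `pick_crf` helper, and all of the measurements are hypothetical and are not taken from the tables in this post.

```python
from typing import NamedTuple, Optional


class Trial(NamedTuple):
    crf: int             # CRF value used for the test encode
    vmaf: float          # average VMAF measured for that encode
    bitrate_kbps: float  # average bitrate of that encode


def pick_crf(trials: list[Trial], target_vmaf: float) -> Optional[Trial]:
    """Return the trial with the highest CRF whose VMAF still meets the target.

    Higher CRF means lower quality and lower bitrate, so the highest
    qualifying CRF is the one that saves the most bandwidth.
    """
    meeting = [t for t in trials if t.vmaf >= target_vmaf]
    return max(meeting, key=lambda t: t.crf) if meeting else None


# Hypothetical measurements for illustration only.
trials = [
    Trial(crf=19, vmaf=97.0, bitrate_kbps=4200),
    Trial(crf=23, vmaf=95.1, bitrate_kbps=3300),
    Trial(crf=27, vmaf=93.2, bitrate_kbps=2300),
]

premium = pick_crf(trials, target_vmaf=95.0)  # premium content target
ugc = pick_crf(trials, target_vmaf=93.0)      # user-generated content target
print(premium.crf, ugc.crf)
```

With these made-up numbers, the premium target selects CRF 23 and the lower target selects CRF 27, which mirrors the trade-off described above: relaxing the quality target allows a higher CRF and a lower bitrate.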

1080p RESULTS

We show 1080p results in Table 1, which is divided between easy-to-encode and hard-to-encode content. We encoded the CBR clips to 4.5 Mbps and applied the same cap for capped CRF encoding.

Jan Ozer-AV1 Capped CRF-1
Table 1. 1080p results using Quadra VPU and capped CRF encoding.

You see that in CBR mode, Quadra VPUs do not reach the target rate as accurately as when using capped CRF mode. This won’t degrade viewer quality of experience since the VMAF scores exceed 95, so undershooting the target saves bandwidth with no visual quality detriment.

In this comparison, bitrate savings is minimized, particularly at CRF 19 and 21, as the capped CRF clips in the hard-to-encode content have a higher bitrate than the CBR counterparts (4,419 and 4,092 to 3,889). Not surprisingly, CRF 19 and 21 deliver little bandwidth savings and a slightly higher quality than CBR.

At CRF 23, things get interesting, with an overall bandwidth savings of 16.1% with a negligible quality delta from CBR. With a VMAF score of around 95, CRF 23 might be the target for engineers delivering premium content. Engineers targeting slightly lower quality can choose CRF 27 and achieve a bitrate savings of 43%, and an efficient 2.4 Mbps bit rate for hard-to-encode footage. At CRF 27, Quadra VPUs encoded the hard-to-encode Football clip at 3,999 kbps with an impressive VMAF score of 93.39.

Note that as with H.264 and HEVC, AV1 capped CRF does reduce throughput. Specifically, a single Quadra VPU installed in a 32-core workstation outputs 23 simultaneous streams using CBR encoding. This dropped to 18 for capped CRF, a reduction of about 22%.
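
To see what this throughput drop means for capacity planning, here is a small Python sketch. The stream counts (23 and 18) come from the test above; the 100-channel service size and the `vpus_needed` helper are assumptions added purely for illustration.

```python
import math


def percent_reduction(baseline: float, new: float) -> float:
    """Percentage drop going from baseline to new."""
    return (baseline - new) / baseline * 100.0


def vpus_needed(channels: int, streams_per_vpu: int) -> int:
    """Whole number of VPUs required to carry the given channel count."""
    return math.ceil(channels / streams_per_vpu)


cbr_streams = 23         # simultaneous 1080p AV1 streams per VPU, CBR
capped_crf_streams = 18  # simultaneous 1080p AV1 streams per VPU, capped CRF

drop = percent_reduction(cbr_streams, capped_crf_streams)
print(f"Throughput reduction: {drop:.1f}%")

channels = 100  # hypothetical service size
print("VPUs for CBR:       ", vpus_needed(channels, cbr_streams))
print("VPUs for capped CRF:", vpus_needed(channels, capped_crf_streams))
```

The takeaway is that the extra hardware needed for capped CRF should be weighed against the bandwidth savings it produces.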

4K RESULTS

Many engineers encoding with AV1 are delivering UHD content, so we ran similar tests with the Quadra and 4K30 8-bit content with a CBR target and bitrate cap of 16 Mbps. We used four clips, ranging from a 4K version of the high-motion Football clip to much less dynamic content like Netflix’s Meridian clip and Blender Foundation’s Sintel.

Table 2. 4K results for the Quadra VPU and capped CRF encoding.

In CBR mode, the Quadra VPU hit the bitrate target much more accurately at 4K than 1080p, so even at CRF 19, the VPU delivered a 13% bitrate savings with a VMAF score of 96.23. Again, CRF 23 delivered a VMAF score of very close to 95, with 45% savings over CBR. Impressively, at CRF 23, Quadra delivered an overall VMAF score of 94.87 for these 4K clips at 7.78 Mbps, and that’s with the Football clip weighing in at 14.3 Mbps.

Of course, these savings directly relate to the cap and CBR target. It’s certainly fair to argue that 16 Mbps is excessive for 4K AV1-encoded content, though Apple recommends 16.8 Mbps for 8-bit 4K content with HEVC here.

The point is, when you encode with CBR, you’re limiting quality to control bandwidth costs. With capped CRF, you can set the cap higher than your CBR target, knowing that all content contains easy-to-encode regions that will balance out the impact of the higher cap and deliver similar or lower bandwidth costs. With these comparative settings, capped CRF delivers higher quality video during hard-to-encode regions than CBR, similar quality during all other scenes, and improved quality of experience at the same cost or lower than CBR.

DENSER / LEANER / GREENER : Symposium on Building Your Own Streaming Cloud

Choosing a Co-Location Facility

Edgio is the result of the merger between Limelight Networks and EdgeCast in 2022, which produced a company with over 20 years of experience choosing and installing their own equipment into co-location facilities.

With customers like Disney, ESPN, Amazon, and Verizon, Edgio has had to manage both explosive growth and exceptionally high expectations.

So, there’s no better source to help you learn to choose a co-location provider than Kyle Faber, Head of CDN Product Delivery Management at Edgio. He’s got experience, and as you’ll see below, the pictures to prove it.

Kyle starts with a description of the math involved in deciding whether co-location is the right direction for your organization, and then works through must-have and nice-to-have co-location features. He covers the value of certifications, the importance of redundancy and temperature management, explores connectivity, support, and cost considerations, and finishes with a look at sustainability. It’s a deep and comprehensive look at choosing a co-location provider and information that anyone facing this decision will find invaluable.

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-1

NAVIGATE THE COMPLEXITIES OF PRIVATE COLOCATION DECISIONS

Kyle started by addressing the considerations video engineers should prioritize when contemplating the shift to private co-location. In the context of modern public cloud computing platforms, he asserted that the decision to opt for private colocation requires a higher level of scrutiny due to the advanced capabilities of cloud offerings. While some enterprises rely solely on public cloud solutions for their production stack, there are compelling reasons to explore private colocation options.

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-2

He outlined his talk as follows:

  • First, he detailed a methodology for considering your financial break-even.
  • Then, he identified the “must have” features that a co-location provider must offer.
  • Then he related the nice-to-have, but not essential features that are potentially negotiable based on your organization’s goals.
  • He concluded with insight into how to balance the cloud vs. co-location decision, sharing that “it’s not a zero-sum game.”

As you’ll see, throughout the talk, Kyle provided practical insights to help video engineers navigate the complexities of private colocation decisions. He emphasized understanding the factors influencing these choices and making informed decisions based on an organization’s unique circumstances.

UNDERSTANDING THE MATH AND BREAKEVEN PRINCIPLES

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-3

Kyle started the economic discussion with the concept of the economics of minimum load and its relevance to private co-location decisions for video engineers. Using an everyday analogy, Kyle drew parallels between choosing to buy a car for daily use versus opting for ride-sharing services. He noted that the expenses associated with car ownership accumulate rapidly, but they eventually stabilize.

The convenience of controlling usage and trip frequency often leads to a reduced cost per ride compared to ride-sharing services over time. This analogy illustrated the dynamics of yearly co-location contracts, where minimum load drives efficiencies and potential gains.

Kyle then shifted to a scenario involving short-term heavy needs, like vacation car rentals. He noted that car rentals offer flexibility for unpredictable schedules without the commitment of ownership. This aligns with the flexibility provided by bare metal service providers, who offer diverse options within predefined parameters. This approach maintains efficiency while operating within certain boundaries.

Concluding his analogy, Kyle compared on-demand and public cloud offerings to ride-sharing services. He emphasized their ease of access, requiring just a few clicks to summon a driver or server, without concerns regarding operational aspects like insurance, maintenance, and updates.

By illustrating these relatable scenarios, Kyle underscored the importance of understanding the economics of minimum load in the context of private co-location decisions, specifically catering to the considerations of video engineers.

NAVIGATE THE ECONOMICS OF MINIMUM LOAD

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-4

Kyle next elaborated on the strategic approach required to navigate the economics of minimum load in the context of private co-location decisions. He emphasized the significance of aligning different models with specific data center demands.

Drawing from personal experiences, Kyle illustrated the concept using relatable scenarios. He contrasted his friend’s experience of living near a rail line in Seattle, which made car ownership unnecessary, with his own situation in Scottsdale, Arizona, where car ownership was essential due to logistical challenges.

Translating this to the business realm, Kyle pointed out that various companies have unique server requirements. Some prioritize flexible load management over specialized hardware needs and prefer to maintain a lean staff without extensive server administration roles. For Edgio, a content delivery network, private co-location globally was the optimal choice to meet their specific requirements.

Kyle then began a cost analysis, acknowledging that while the upfront cost of private co-location might seem daunting compared to public cloud prices, the cumulative server hour costs can accumulate rapidly. He referenced AWS’s substantial revenue from convenience as an example. He highlighted the necessity of considering hidden costs, including human capital requirements and logistical factors.

Addressing executive leaders, Kyle cautioned against assuming that software developers skilled with code are also adept at running data centers. He emphasized the importance of having dedicated data center and server administration experts to maximize cost savings and avoid potential disasters.

Looking toward the future, Kyle advised mid-sized companies to consider their future needs and focus on maintaining nimbleness. He shared his insights into the challenges of hardware logistics and the value of proper tracking and clarity to identify breakeven points. In this comprehensive overview, Kyle provided practical insights into the economics of minimum load, offering a pragmatic perspective on private co-location decisions for video engineers.
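
The breakeven idea can be made concrete with a short Python sketch. This is a simplified model, not Kyle’s actual methodology: it assumes a one-time hardware purchase, a flat monthly co-location bill (space, power, staff), and a cloud bill that scales linearly with the hours you actually use. Every figure below is hypothetical.

```python
import math
from typing import Optional

HOURS_PER_MONTH = 730  # average hours in a month


def cloud_monthly_cost(servers: int, hourly_rate: float, utilization: float) -> float:
    """Cloud bill for one month; you pay only for the hours you run."""
    return servers * hourly_rate * HOURS_PER_MONTH * utilization


def breakeven_months(capex: float, colo_monthly: float,
                     cloud_monthly: float) -> Optional[int]:
    """First whole month in which cumulative co-location cost is at or
    below cumulative cloud cost; None if co-location never catches up."""
    monthly_saving = cloud_monthly - colo_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Hypothetical inputs for illustration only.
CAPEX = 120_000        # servers and installation, paid up front
COLO_MONTHLY = 4_000   # rack space, power, remote hands, staff share

steady = cloud_monthly_cost(servers=10, hourly_rate=1.50, utilization=1.0)
bursty = cloud_monthly_cost(servers=10, hourly_rate=1.50, utilization=0.3)

print("Always-on load:", breakeven_months(CAPEX, COLO_MONTHLY, steady))
print("30% utilization:", breakeven_months(CAPEX, COLO_MONTHLY, bursty))
```

With these made-up inputs, a steady, always-on load breaks even in a year and a half, while a load that runs only 30% of the time never does. That is the minimum load argument in miniature: co-location pays off when there is a predictable baseline to keep the hardware busy.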

MUST-HAVE CO-LOCATION FEATURES

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-5

With the economics covered, Kyle shifted to identifying the must-have features in any co-location service, suggesting that certifications play a crucial role in evaluating co-location providers. ISO 9000 and SOC 2 (Type 1 and Type 2) were cited as common minimum standards, with additional regional and industry-specific variations. Kyle recommended requesting certifications from potential vendors and conducting thorough research to understand the significance of these certifications.

Kyle explained that by obtaining certifications, you can move beyond basic questions about construction methods, power backup systems, and operational standards. Instead, you can focus on more nuanced inquiries, like power sources, security standards for visitors, and the training and responsiveness of remote hands teams. This transition allows for a more informed assessment of vendors’ capabilities and suitability for specific needs.

THE SIGNIFICANCE OF ON-SITE VISITS

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-6

Kyle underscored the significance of on-site visits in the colocation decision-making process, sharing three images that highlighted the insights gained from physical visits to data center facilities. The first image depicted service cabling that entered a data center. While the front of the building seemed pristine, the back revealed potential issues lurking in the shadows. Kyle stressed that some problems can only be identified through close inspection.

The second image showed a fiber distribution panel, showcasing the low level of professionalism evident in the data center’s installations. This reinforced the idea that visual assessments can reveal the quality of a facility’s infrastructure.

The third image illustrated a unique scenario. During construction, a new fiber channel was being laid, but the basement entry of the fiber trench was left unsealed. An overnight rainstorm resulted in the trench filling with water. Because the basement access hole was uncapped, water flowed downhill into a room with valuable equipment. This real-life example served as a reminder of the importance of thorough inspection and due diligence in the colocation industry.

These visuals underscore the importance of physically visiting data centers to identify potential challenges and make informed decisions.

TEMPERATURE MANAGEMENT

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-8

Kyle also shared that temperature management is particularly important to data centers. For example, Edgio emphasizes cooling speed, temperature regulation, and high-density heat rejection technology. It’s not merely about achieving lower temperatures; it’s about effectively managing and dissipating heat.

Kyle explained that even a slight temperature fluctuation can trigger far-reaching consequences, so maintaining a precise temperature of 76 degrees Fahrenheit is paramount. The utilization of advanced heat rejection technology ensures that any deviations from this optimal point can be promptly corrected, guaranteeing peak performance for their installations.

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-9

Paradoxically, economic success complicates temperature maintenance. Over the past eight years, Kyle reported that Edgio achieved a 30% improvement in server power efficiency, coupled with a 760% surge in server density metrics. However, since the laws of physics remain steadfast, this density surge brings with it an elevated heat generation within a smaller space.

CONNECTIVITY, SUPPORT, AND COST CONSIDERATIONS

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-10

Kyle’s discussion then shifted to connectivity, sustainability, and environmental considerations with a focus on where to place each factor in your decision-making scorecard.

Emphasizing the critical role of connectivity in businesses, Kyle noted that vendors often claim constant uptime and availability, and usually deliver this, so they differentiate themselves through their access to the wider internet. When choosing a co-location provider, all organizations should reflect on their unique requirements. For instance, he suggested that businesses intending to connect with a CDN like Edgio might require a local data center partner that facilitates data transformation and transcoding but might not need the extensive infrastructure for global data distribution.

Kyle then addressed the significance of remote support, especially during initial installations where a swift response to issues is crucial. While tools like iDRAC and remote Out-of-Band server access provide control, Kyle highlighted the importance of real-time assistance during other critical moments, such as identifying server issues.

Addressing costs, Kyle acknowledged their pivotal role in decision-making, a sentiment particularly relevant given the current technology landscape. Kyle urged a balance between cost-effectiveness and quality, drawing parallels between daily personal choices and those made in professional spheres. He referenced Terry Pratchett’s boot theory of economics, emphasizing the inevitability of change and the need for proactive lifecycle management. “Even the best boots will not last forever,” Kyle paraphrased, “and you need to plan lifecycle management.”

A FEW WORDS ABOUT SUSTAINABILITY

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-11

Kyle urged all participants and readers to consider sustainability, transcending its status as a mere buzzword. “Sustainability is more than a buzzword,” he declared, “It is a commitment.”

He illuminated the staggering energy appetite of data centers, exemplified by Amazon’s permits for generators in Virginia, capable of producing a remarkable 4.6 gigawatts of backup power – enough to illuminate New York City for a day. Kyle underscored the industry’s responsibility to reevaluate energy sources, citing the rising importance of Environmental, Social, and Governance (ESG) movements. He emphasized that organizations are now compelled to report their environmental impact transparently to stakeholders and investors.

When considering colocation facilities, Kyle recommended evaluating their sustainability reports, which reveal critical information from energy-sourcing practices to governance approaches. By aligning operational needs with global responsibilities, businesses can make conscientious choices that resonate with their core values and forge meaningful partnerships with data center providers.

GET INTIMATELY ACQUAINTED WITH THE UNPREDICTABLE

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-12

While you should perform a comprehensive needs analysis and service comparison to choose your provider, Kyle also highlighted that data centers are intimately acquainted with the unpredictable. Construction activities, often beyond the data center provider’s control, persistently surround these facilities.

The photo above, taken a mile away from a facility, exemplifies the unforeseen challenges. A construction crew, possibly misinformed or negligent, drove an auger into the ground at an incorrect location, inadvertently ensnaring cabling, and yanking dozens of meters of fiber from the earth.

The incident’s specifics remain unclear, yet the lesson is evident – despite meticulous planning, unpredictability is an integral facet of this landscape. As Kyle summarized, “It’s a stark reminder that despite our best plans, unpredictability has to be part of this landscape, so always be prepared for the unexpected.”

NO ONE-SIZE-FITS-ALL SOLUTION

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-13

In closing, Kyle addressed the intricate decisions surrounding ownership, rental, and on-demand data center services, emphasizing that there’s no one-size-fits-all solution. He presented the choice between owning servers, renting them, or opting for on-demand cloud services as a complex tapestry woven with factors such as the unique average minimum load and an organization’s strategic objectives.

Kyle cautioned that navigating this intricate landscape demands a nuanced perspective. The decision requires a well-thought-out plan that not only accommodates an organization’s goals and growth but also anticipates the evolving trends of the industry. This approach ensures that the chosen path resonates seamlessly with an organization’s aspirations, offering stability for the journey ahead.

GO FROM A PURE OPEX MODEL TO A CAPEX MODEL

NETINT Technologies | Building your own streaming cloud - online symposium | Kyle-Faber-Choosing-a-Co-Location-Facility-14

Before wrapping up, Kyle answered one question from the audience: “How does someone begin to approach a transition? Is it even possible to go from a pure OPEX model to a CAPEX model? Any suggestions, ideas, insights?”

Kyle noted that when you assess an OPEX model, you’re essentially looking at linear costs. These costs offer a clear breakdown of your system expenses, which can be projected into the future.

While there might be some pricing fluctuations as public cloud providers compete, you can treat entire segments as a transition unit. It might not be feasible to buy just one server and place it in isolation, but you can transition comprehensive sections in one concerted effort.

So, you might build a small encoding farm, allowing for a gradual shift while maintaining flexibility across various cloud instances like AWS, Azure, or GCP. This phased approach grants greater control, cost benefits, and a smoother transition into the new paradigm.

ON-DEMAND: Kyle Faber - Choosing a Co-Location Facility

Norsk and NETINT: Elevating Live Streaming Efficiency

With the growing demand for high-quality viewing experiences and the heightened attention on cost efficiency and environmental impact, hardware acceleration plays an ever-more-crucial role in live streaming.

Here at NETINT, we want users to take full advantage of our transcoding hardware, so we’re pleased to announce that id3as NORSK now offers exceptionally efficient support for NETINT’s T408 and Quadra video processing unit (VPU) modules.

Using NETINT VPUs, users can leverage the Norsk low-code live streaming SDK to achieve higher throughput and greater efficiency compared to running software on CPUs in on-prem or cloud configurations. Combined with Norsk’s proven high-availability track record, this makes it easy to deliver exceptional services with maximum reliability and performance at a never-before-available OPEX.

Norsk and NETINT.

Norsk also takes advantage of Quadra’s hardware acceleration and onboard scaling to achieve complex compositions like picture-in-picture and resizing directly on the card. Even better, Norsk’s built-in ability to “do the right thing” also means that it knows when it can take advantage of hardware acceleration and when it can’t.

For example, if you’re running Norsk on the T408, decoding will take place on the card, but Norsk will automatically utilize the host CPU for functions like picture-in-picture and resizing that the T408 doesn’t natively support, before returning the enriched media to the card for encoding. (Scaling and resizing functions are native to Quadra VPUs, so they are performed onboard without the host CPU.)

“As founding members of Greening of Streaming, we’re keenly aware of the pressing need to focus on energy efficiency at every point of the video stack,” says Norsk CEO Adrian Roe. “By utilizing the Quadra and T408 VPU modules, users can reduce energy usage while achieving maximum performance even on compute-intensive tasks. With Norsk seamlessly running on NETINT hardware, live streaming services can consume as little energy as possible while delivering a fantastic experience to their customers.” 

“Id3as has proven expertise in helping its customers produce polished, high-volume, compelling productions, and as a product, Norsk makes that expertise widely accessible,” commented Alex Liu, NETINT founder and COO. “With Norsk’s deep integration with our T408 and Quadra products, this partnership makes NETINT’s proven ASIC-based technology available to any video engineer seeking to create high-quality productions at scale.” 

Both Norsk and NETINT will be at IBC in Amsterdam, September 15-18. Click to request a meeting with Norsk, or NETINT, and/or visit NETINT at booth 5.A86.

ON-DEMAND: Adrian Roe - Make Live Easy with NORSK SDK

Save Bandwidth with Capped CRF

Video engineers are constantly seeking ways to deliver high-quality video more efficiently and cost-effectively. Among the innovative techniques gaining traction is capped Constant Rate Factor (CRF) encoding, a form of Content-Adaptive Encoding (CAE), which NETINT recently introduced across our Video Processing Unit (VPU) product lines for H.264 and HEVC. In this blog, we explore why capped CRF is essential for engineers seeking to streamline video delivery and save on bandwidth costs.

Capped CRF - The Efficient Encoding Solution

Capped CRF is a smart bitrate control technique that combines the benefits of CRF encoding with a bit rate cap. Unlike Variable Bitrate Encoding (VBR) and Constant Bitrate Encoding (CBR), which target specific bitrates, capped CRF targets a specific quality level controlled by the CRF value, with a bitrate cap applied if the encoder can’t meet the quality level below the bitrate cap.

A typical capped CRF command string might look like this:

-crf 21 -maxrate 6M

This tells the encoder to encode to CRF 21 quality but not to exceed 6 Mbps. Let’s see how this might work with the football video shown in the figure, which compares capped CRF at these parameters with a CBR file encoded to 6 Mbps.

NETINT - Bitrate Comparison - Capped CRF

With the x264 codec, CRF 21 typically delivers a VMAF score of around 95. With easy-to-encode sideline shots, the CRF value would control the encoding, delivering 95 VMAF quality at 2 Mbps, a substantial savings over CBR at 6 Mbps.

During actual plays, the 6 Mbps bitrate cap would control, delivering the same quality as CBR at 6 Mbps. So, capped CRF saves bandwidth with easy-to-encode scenes while delivering equivalent to CBR quality with hard-to-encode scenes.
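
The behavior described above can be modeled in a few lines of Python. This is a simplified sketch of the rate control logic, not NETINT’s implementation: each scene gets whichever is lower, the bitrate CRF 21 would need or the cap. The 2 Mbps sideline figure and the 6 Mbps cap come from the example above; the scene durations and the bitrates that plays would demand without a cap are hypothetical.

```python
CAP_KBPS = 6000  # bitrate cap from the example
CBR_KBPS = 6000  # CBR target used for comparison


def capped_crf_bitrate(crf_demand_kbps: float, cap_kbps: float) -> float:
    """Bitrate actually spent: the CRF demand, limited by the cap."""
    return min(crf_demand_kbps, cap_kbps)


# (label, duration in seconds, kbps that CRF 21 would need with no cap)
# Durations and the demand during plays are hypothetical.
scenes = [
    ("sideline", 20, 2000),
    ("play", 10, 9000),
    ("sideline", 15, 2000),
    ("play", 15, 8000),
]

total_seconds = sum(duration for _, duration, _ in scenes)
total_kbits = sum(
    duration * capped_crf_bitrate(demand, CAP_KBPS)
    for _, duration, demand in scenes
)

average_kbps = total_kbits / total_seconds
savings_pct = (CBR_KBPS - average_kbps) / CBR_KBPS * 100.0

print(f"Capped CRF average: {average_kbps:.0f} kbps")
print(f"Savings vs. {CBR_KBPS} kbps CBR: {savings_pct:.1f}%")
```

The savings depend entirely on how much of the content is easy to encode. A clip that is all action would spend the full cap throughout and save nothing over CBR.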

Ease of Integration

As implemented within the NETINT product line, capped CRF requires no additional technology licensing or complex integration – you simply upgrade your products and change your encoding command string. This means that you can seamlessly implement the feature across NETINT’s VPUs without extensive adjustments or additional investments.

NETINT’s capped CRF is compatible with H.264 and HEVC, and AV1 coming (Quadra only), so you can use the feature across different codec options to suit your specific project requirements. Regardless of the codec used, capped CRF delivers consistent video quality with the potential for bandwidth savings, making it a valuable tool for optimizing video delivery.

A Game Changer

By deploying capped CRF, engineers can efficiently deliver high-quality video streams, enhance viewer experiences, and reduce operational expenses. As the demand for video streaming continues to grow, Capped CRF emerges as a game-changer for engineers striving to stay at the forefront of video delivery optimization.

You can read more about how capped CRF works here. You can read more about Quadra VPUs here, and T408 transcoders here.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

Evaluating Low-Latency Video Streaming Protocols, Technologies and Best Practices

Low-latency video is a priority for many streaming engineers. This article reviews three schemas for producing low-latency video, identifies their pros and cons, and summarizes the questions engineers should consider when choosing a low-latency technology and/or service provider.

Architectures:

From an architectural perspective, let’s start with what we all know: HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming Over HTTP (DASH) are the de facto stream delivery protocols in most regions of the world. Technically, HLS is a draft standard of the IETF largely controlled by Apple and is now in its 12th version. In contrast, MPEG-DASH is a true international standard for streaming media, ratified by the Moving Picture Experts Group (MPEG). Both enjoy nearly universal support from encoders, packagers, DRM providers, players, and other streaming infrastructure providers.

By way of background, HLS and DASH were originally developed to enable streaming video delivery without a streaming server. Prior to their creation, media servers like Adobe’s Flash Media server maintained a connection between the server and player to deliver media to the Flash Player using the RTMP protocol. Since Adobe charged a license fee for the servers, and each server could only manage a finite number of connections, large-scale streaming events were expensive to produce.

HLS and DASH were the two most significant HTTP-based Adaptive Streaming (HAS) techniques that supplanted the Adobe servers (Adobe and Microsoft had HAS technologies that some operators still use). Like all HAS technologies, HLS and DASH operate using a combination of media segments and metadata files stored on standard HTTP web servers. You see this in Figure 1. During playback, players retrieve the metadata files to identify the location of the media files and then download the files via HTTP as needed to play the video. All logic resides in the player; the server just stores the files.

Figure 1. All HAS technologies operate using metadata files and media segments.
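
To make the metadata-plus-segments model concrete, here is a minimal Python sketch of what a player does with an HLS media playlist: read the metadata file, then work out which media files to request over HTTP. The playlist uses standard HLS tags, but the segment names and the base URL are made up for illustration, and the parser handles only the handful of tags shown.

```python
from urllib.parse import urljoin

# A minimal HLS media playlist with three ten-second segments.
PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:10.0,
segment2.ts
"""


def parse_media_playlist(text: str) -> list[tuple[str, float]]:
    """Return (uri, duration in seconds) for each segment in the playlist."""
    segments = []
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>" precedes each segment URI.
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            segments.append((line, duration))
            duration = None
    return segments


# Hypothetical playlist location; segment URIs resolve relative to it.
BASE_URL = "https://example.com/live/stream.m3u8"

segments = parse_media_playlist(PLAYLIST)
urls = [urljoin(BASE_URL, uri) for uri, _ in segments]
for url, (_, seconds) in zip(urls, segments):
    print(f"{seconds:>5.1f}s  {url}")
```

Note that nothing here requires a media server. The playlist and segments are ordinary files fetched with ordinary HTTP requests, which is why standard web servers and CDNs can deliver them.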

From a latency perspective, most HAS techniques used ten-second segments, and most players buffered up to three segments before starting to play. Do the math, and you get a minimum of thirty seconds of latency, long enough for your neighbor watching via satellite to see the goal, cheer the goal, hug his or her significant other, and grab another beverage from the fridge, leaving you uncomfortably wondering “what just happened?”

While the latency is awful, consider all the benefits that HAS techniques deliver. Because they transfer via HTTP, HAS technologies are supported by all browsers and virtually all video players in smart TVs, OTT dongles, smartphones, tablets, and other devices. The players retrieve standard HTTP packets that don’t need a special media server, are firewall-friendly, and can be delivered by normal CDNs, just like other web data.

Adaptive bitrate delivery with multiple files and bandwidths for different clients is standard, with caption and multiple language support, promoting a very high quality of experience (QoE). Advertising insertion is available, as well as studio-quality digital rights management (DRM) via techniques like Widevine, PlayReady, FairPlay, and the newest entrant, Huawei WisePlay.

So, except for latency, HAS techniques are quite effective for both the viewer and the publisher. Note that engineers seeking to reduce latency from thirty seconds to five seconds or so can cut the segment sizes to one-to-two-second segments, but going below this could start to degrade video quality. To effectively reduce your latency to sub-3 seconds or so, you must switch to a low-latency variety of HLS or DASH.
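
The segment math is simple enough to sketch in Python. This is a rough lower bound only, assuming the player buffers three full segments before starting playback; it ignores encoding, packaging, and CDN delays, which add to the real-world figure. The `min_buffer_latency` helper is a name introduced here for illustration.

```python
def min_buffer_latency(segment_seconds: float, buffered_segments: int = 3) -> float:
    """Seconds of media the player holds before it starts playing."""
    return segment_seconds * buffered_segments


for segment_seconds in (10, 2, 1):
    latency = min_buffer_latency(segment_seconds)
    print(f"{segment_seconds:>2}-second segments -> at least {latency:.0f} s of latency")
```

Ten-second segments yield the thirty-second floor discussed above, while one- and two-second segments bring the buffer down to three to six seconds, in line with the five seconds or so that traditional HLS and DASH can reach.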

Table 1. Feature sets of the different technology options.

HLS, DASH, AND CMAF

Let’s briefly explore what CMAF is and how it relates to DASH and HLS. Where DASH and HLS are adaptive streaming protocols, CMAF, which stands for the Common Media Application Format, is a container format for streaming media. More specifically, CMAF combines a single set of media files and multiple sets of manifest files to enable one group of files to serve multiple targets.

Today, many engineers use CMAF to combine one set of media files with separate DASH and HLS manifests. This saves encoding costs since you produce one set of files rather than two and halves the footprint on the streaming server. Technically, you don’t deliver via CMAF, you deliver via HLS or DASH using media packaged in a CMAF container.

Obviously, if you can package one set of media files for delivery via DASH or HLS, they must work similarly. That’s why I’m bundling HLS and DASH into their own column. Since CMAF is a container format, not a streaming protocol, I’m ignoring that altogether, though I could have just as easily included CMAF in the HLS/DASH column.

Figure 2. LL DASH uses chunks to reduce latency. (www.theoplayer.com/blog/low-latency-dash).

Rather than buffering three complete segments before playing, the LL DASH or LL HLS player typically buffers three to four chunks. In operation, this can reduce latency to perhaps three to five seconds, though often more, depending upon multiple factors.

In terms of compatibility, most, but not all, current players support LL HLS/DASH. Obviously, if you’re going to deploy either technology, you should verify player compatibility. Both technologies are backwards compatible so that legacy players that don’t support low latency can simply play the streams at normal latency. Otherwise, LL HLS/DASH retains all the other benefits of HAS delivery, including standard CDNs, ABR delivery, DRM, captions, advertising, and multiple language support.

THEO High-Efficiency Streaming Protocol (HESP)

THEO Technologies invented HESP and it’s managed by the HESP Alliance, which includes multiple members that provide valuable infrastructure support for HESP, including EZDRM and BuyDRM (digital rights management), MediaMelon (optimization and analytics), Ceeblue (transcoding), and multiple turnkey service providers. Like Apple HLS, HESP has been made available as a draft information specification via the Internet Engineering Task Force (IETF).

Technically, HESP is a HAS technique that offers lower latency while retaining all the benefits of HAS delivery. HESP works by creating two streams for each live event (Figure 3).

First is the initialization stream, which contains only keyframes. This allows the player to start playback on any frame, not just at the start of a group of pictures, which is often one to two seconds long. The second stream is the continuation stream, which is encoded normally and is the stream viewed by the player.

When a viewer joins the stream, the player first loads a frame from the initialization stream, which plays immediately, reducing latency to less than a second in the best case. Then, the player retrieves subsequent frames from the continuation stream. You see this in Figure 3. When a player starts to play the stream, it retrieves a keyframe from the initialization stream (C1). Then, it continues to play frames from the continuation stream (d1 and so on).

Figure 3. The two streams used for the High-Efficiency Streaming Protocol. From here: https://www.hespalliance.org/hesp-technical-deck (registration and download required)

If multiple videos are available, when the viewer switches streams, the same thing happens. The player grabs a frame from the initialization stream of the second video (G2) to enable the immediate switch and then retrieves additional frames from the Continuation stream (h2, i2, j2) to deliver normal quality.
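
The join and switch behavior can be sketched in a few lines. This is a toy model that only reproduces the frame labels used in Figure 3 (an uppercase letter for a keyframe from the initialization stream, lowercase letters for continuation frames); the function name and indexing scheme are my own and are not part of the HESP specification.

```python
from string import ascii_lowercase


def hesp_frames(stream_id: int, join_index: int, count: int) -> list[str]:
    """Labels of the frames a player fetches when joining a stream mid-GOP."""
    # The first frame comes from the initialization stream (keyframes only).
    first = f"{ascii_lowercase[join_index].upper()}{stream_id}"
    # Every following frame comes from the normally encoded continuation stream.
    rest = [
        f"{ascii_lowercase[i]}{stream_id}"
        for i in range(join_index + 1, join_index + count)
    ]
    return [first] + rest


print(hesp_frames(1, 2, 3))  # ['C1', 'd1', 'e1']
print(hesp_frames(2, 6, 4))  # ['G2', 'h2', 'i2', 'j2']
```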

As shown in Figure 4, you can implement HESP using standard production techniques, a standard encoder, and a regular CDN, but you need a HESP-compatible packager and player. Interestingly, the HESP continuation stream is backwards compatible with LL HLS and LL DASH, so if you don’t have a HESP player, you fall back to LL HLS or LL DASH latency. Once implemented, HESP delivers all the benefits of HAS technologies, including ABR delivery, DRM support, and advertising support.

Figure 4. Implementing HESP requires a HESP-compatible packager and player. From here: https://www.hespalliance.org/hesp-technical-deck (registration and download required)

Unlike all the other technologies discussed, HESP is royalty-bearing, with royalties assessed on usage volume, subscriptions, and hardware devices. The annual cap for software and subscription is $2.5 million, with a $25 million annual cap on devices sold. Click here for more details on the HESP royalties.

To be fair, MPEG LA did attempt to form a patent pool on DASH-related technology in 2015, but it closed in 2019. Later in 2019, two companies sued Showtime, Vudu, and Crackle on the same patents, though the patent upon which the claims were made was found invalid in 2022.

WebRTC

Google created WebRTC to enable real-time audio/video communications between browsers without plug-ins. Today, WebRTC is an open-source project and standard published by both the IETF and W3C. These standards mean near-universal support by computer and mobile browsers, though support in smart TVs, OTT dongles, and game devices is not nearly as pervasive as support for HLS and DASH.

Originally used for simple peer-to-peer communications, WebRTC was later used to power conferencing applications and has been extended into one-to-many live streaming applications, primarily because it offers sub-500 ms latency. It’s useful to consider WebRTC from two perspectives: what you get out of the box and what product and service providers can build around it to make it more useful. Let’s explore how WebRTC works and then return to this observation.

Figure 5 shows the basic WebRTC operating schema in peer-to-peer mode. Two peers wishing to connect meet through a signaling server. Once they connect, they exchange audio-video data directly in real time using the User Datagram Protocol (UDP) as opposed to HTTP. These architectural differences from HAS technologies have multiple implications.

First, using direct streaming via UDP rather than chunks and segments via HTTP means lower latency than HAS technologies. That’s the object of the exercise, so that’s a good thing.

However, the matching and other roles that servers play in WebRTC mean that WebRTC events can’t scale beyond a certain size without adding servers, which adds to the cost. UDP delivery means that the audio/video data can’t be delivered by standard CDNs, adding further to the cost and potentially limiting your reach, and UDP packets may not be able to get through some corporate firewalls, limiting access by some viewers, though there are several techniques that can mitigate this risk.

Figure 5. Low Latency WebRTC schema for simple peer-to-peer applications.

Originally designed for browser-to-browser communications, WebRTC has gained a reputation for low-quality video because most traditional applications involve low-quality webcam video compressed by the low-quality encoder in a browser. WebRTC has also been rightfully criticized for the lack of true adaptive streaming, studio DRM, and features like advertising insertion and captions. If you were to develop your own WebRTC application for large-scale live event streaming from scratch, you would have to work around all these limitations.

Fortunately, you don’t have to start from scratch. Multiple vendors like Ant Media, Red5, Wowza, and others offer servers that can ingest and transcode high-quality audio/video streams and automatically add servers and WebRTC-capable CDN capacity to meet viewer demand.

For those seeking a more turnkey experience, multiple vendors like Ant Media, Dolby.io, Wowza, Phenix, and many others also offer WebRTC packages that you can deploy by providing a live feed and writing a check.

Though different providers offer different feature sets, several WebRTC-based services provide true ABR delivery and other HAS-like features, though you may have to deploy a service-specific player. In addition, EZDRM and Castlabs have shown working DRM implementations for WebRTC, so if studio DRM is essential, that’s now available.

Going forward, two new protocols, WHIP and WHEP, may further simplify deploying WebRTC for large-scale live streaming. WHIP stands for the WebRTC-HTTP Ingest Protocol, and it simplifies high-quality, ultra-low latency ingest into WebRTC, much like RTMP does for typical live streaming. WHEP stands for WebRTC-HTTP Egress Protocol, and it standardizes how WebRTC data can be downloaded to a non-WebRTC client, like a smart TV without WebRTC support.
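
To give a feel for why WHIP simplifies ingest, the sketch below builds the single HTTP POST that carries a WebRTC session offer (SDP) to an ingest endpoint. It only constructs the request and does not send it. The endpoint URL and the truncated placeholder offer are hypothetical, and a real client would generate the offer from its WebRTC stack, usually add an authorization header, and then apply the SDP answer returned by the server.

```python
import urllib.request


def build_whip_request(endpoint: str, sdp_offer: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST that carries a WebRTC SDP offer."""
    return urllib.request.Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        headers={"Content-Type": "application/sdp"},
        method="POST",
    )


# Hypothetical endpoint and a truncated placeholder offer, for illustration only.
request = build_whip_request("https://ingest.example.com/whip/live", "v=0\r\n")
print(request.get_method(), request.full_url)
```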

WebRTC clearly offers lower latency than HAS-based technologies. The only questions are how much extra you’ll pay to achieve that latency over HAS technologies and what, if any, features you won’t be able to access.

Other Low Latency Technologies

There are other technologies used for low latency. For example, Nanocosmos offers a low-latency service called nanoStream Cloud using a technology called WebSockets with LL HLS rather than WebRTC. Like WebRTC, WebSockets offers lower latency than any HAS approach, but as a pure technology, can’t match the features of LL HLS/DASH. However, as with WebRTC, developers like Nanocosmos have built around WebSockets to create fully featured services and technologies worth considering for many types of projects.

Figure 6. Nanocosmos’ nanoStream Cloud service is built around WebSockets.

Questions to Ask

If you’re choosing a turnkey service, you probably care more about performance, features, and cost than the technology underlying the service. Here’s a list of questions to ask when choosing your technology and/or service provider.

  • What’s the maximum latency you can tolerate? The lower the latency, the greater the cost. Lower latency also means decreased playback robustness since little, if any, of the stream is pre-buffered.
  • What’s the overall projected cost of your event? If WebRTC or other low-latency service increases the cost, do the economics of the project justify that increased price?
  • What latency will the service deliver at your scale and geographic distribution?
  • What devices are supported (computers, mobile, and living room)?
  • What are the critical features that you need, and can the service deliver them? Captions? Advertising? Studio quality DRM? Multiple language support?
  • Can the system maintain synchronization with all viewers to support auctions and sports gambling?
  • What codecs are available, and what bandwidths can the system support?
  • Does the system support dynamic adaptive streaming, which means that it switches streams during the event to adjust to changing bandwidth conditions (rather than sending a single quality stream)?
  • If the system transmits UDP packets, what features are available to avoid firewall blocking?
  • Can you use your own player, or do you need to deploy a custom player? If custom, what are the design and branding opportunities?
  • What encoders does the system support, and at what quality level?
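
The first question on the list lends itself to a quick filter. The sketch below maps a latency target to the technology families discussed in this article, using the approximate best-case figures quoted above. Treat the numbers as rough guides rather than guarantees, since actual latency depends on scale, geography, and implementation.

```python
# Approximate best-case latency (seconds) quoted in this article; results vary.
BEST_CASE_LATENCY = {
    "HLS/DASH (10-second segments)": 30.0,
    "HLS/DASH (1-2 second segments)": 5.0,
    "LL HLS/LL DASH": 3.0,
    "HESP": 1.0,
    "WebRTC": 0.5,
}


def candidates(max_latency_seconds: float) -> list[str]:
    """Technologies whose best-case latency meets the stated target."""
    return [
        name
        for name, latency in BEST_CASE_LATENCY.items()
        if latency <= max_latency_seconds
    ]


print(candidates(4))  # ['LL HLS/LL DASH', 'HESP', 'WebRTC']
```

Latency is only the first filter; the remaining questions on cost, devices, and features narrow the shortlist further.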

NETINT’s Low Latency VPUs

All low-latency productions start with low-latency transcoding. For a look at the latency produced by NETINT’s Quadra Video Processing Unit in normal and low-latency modes, check out this review: Unveiling the Quadra Server: The Epitome of Power and Scalability.

Author’s note: The author would like to thank Barry Owen from Wowza and Pieter-Jan Speelmans from THEO Technologies for the high-level tech read. I appreciate you both sharing your knowledge and insights. Any errors in the article are solely attributable to the author.

Cloud services are an effective way to begin live streaming. Still, once you reach a particular scale, it’s common to realize that you’re paying too much and can save significant OPEX by deploying transcoding infrastructure yourself. The question is, how to get started?
NETINT’s Build Your Own Live Streaming Platform symposium gathers insights from the brightest engineers and game-changers in the live-video processing industry on how to build and deploy a live-streaming platform.

Simplify Building Your Own Streaming Cloud with GPAC

Romain Bouqueau is CEO of Motion Spell and one of the principal architects of the GPAC open-source software, one of the three software alternatives presented in the symposium. He spoke about the three challenges facing his typical customers: features, cost, and flexibility, and identified how GPAC delivers on each challenge.

Then, he illustrated these concepts with three impressive case studies: Synamedia/Quortex, Instagram, and Netflix. Overall, Romain made a strong case for GPAC as the transcoding/packaging element of your live streaming cloud.

Romain began his talk with an excellent summary of the situation facing many live-streaming engineers. “It’s a pleasure to discuss the challenges of building your own live-streaming cloud. Cloud services are convenient, but once you scale, you may realize that you’re paying too much and you are not as flexible as you’d like to be. I hope to convince you that the cost of customization that you have when using GPAC is actually an investment with a very interesting ROI if you make the right choices. That’s what we’re going to talk about.”

Figure 1. About Romain, GPAC, and Motion Spell.

Then, he briefly described his background as a principal architect of the GPAC open-source software, which he has contributed to for over 15 years. In this role, Romain is known for his advocacy of open source and open standards and as a media streaming entrepreneur. His primary focus has been on GPAC, a multimedia framework recognized for its emphasis on modularity and standards compliance.

He described that GPAC offers tools for media content processing, inspection, packaging, streaming playback, and interaction. Unlike many multimedia frameworks that cater to 2D TV-like experiences, GPAC is characterized by versatility, controlled latency, and the ability to support various scenarios, including hybrid broadcast broadband setups, interactivity, scripting, virtual reality, and 3D scenes.

Romain’s notable achievements include streamlining the ISO Base Media File Format (ISOBMFF) used in formats like MP4, CMAF, DASH, and HLS. His work earned recognition through a Technology & Engineering Emmy Award. To facilitate the wider use of GPAC, Romain established Motion Spell, which serves as a bridge between GPAC and its practical applications. Motion Spell provides consulting, support, and training, acting as the exclusive commercial licensor of GPAC.

During his introduction, Romain discussed challenges faced by companies in choosing between commercial solutions and open source for video encoding and packaging. He posited that many companies often lack the confidence and necessary skills to fully implement GPAC but emphasized that despite this, the implementation process is both achievable and simpler than commonly assumed.

He shared that his customers face three major challenges: features, cost, and flexibility. He addressed each in turn.

Features

Figure 2. The three challenges facing those building their live streaming cloud.

The first challenge Romain highlighted relates to features and capabilities. He advised the audience to create a comprehensive list that encompasses the needed capabilities, including codecs, formats, containers, DRMs, captions, and metadata management.

He also underscored the importance of seamless integration with the broader ecosystem, which involves interactions with external players, analytics probes, and specific content protocols. Romain noted that while some solutions offer user-friendly graphical interfaces, deeper configuration details often need to be addressed to accommodate diverse codecs, parameters, and use cases, especially at scale.

Highlighting Netflix’s usage of GPAC, Romain emphasized that GPAC is well-equipped to handle features and innovation, given its research and standardization foundation. He acknowledged that while GPAC is often a step ahead in the industry, it cannot implement everything alone. Thus, sponsorship and contributions from the industry are crucial for the continued development of this open-source software.

Romain explained that GPAC’s compatibility with the ecosystem is a result of its broad availability. Its role as a reference implementation, driven by standardization efforts, makes it a favored choice. Additionally, he mentioned that Motion Spell’s efforts have led to GPAC becoming part of numerous plugin systems across the industry.

Cost

The second challenge highlighted by Romain is cost optimization. He explained that costs are typically divided into Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). He noted that GPAC, being written in the efficient C programming language, benefits from rigorous scrutiny from the open-source community, making it highly efficient. He acknowledged that while GPAC offers various features, each use case varies, leading to questions about resource allocation. Romain encouraged publishers to ask, for example, whether every channel needs a CDN and whether all content needs a premium encoder.

Regarding CAPEX, Romain mentioned integration costs associated with open-source software, emphasizing that some costs might be challenging to evaluate, such as error handling. He referenced the Synamedia/Quortex architecture as an example of efficient error management. Romain also addressed the misconception that open source implies free software, referencing a seminar he participated in that compared the costs of different options.

He shared an example of a broadcaster with a catalog of 100,000 videos and 500 concurrent streams. The CAPEX for packaging ranged from $100,000 to $200,000, depending on factors like developer rates and location, with running costs being relatively low compared to transcoding costs.

Romain revealed that, based on his research, open source consistently ranked as the most cost-efficient option or a close competitor across different use cases. He concluded that combining GPAC with Motion Spell’s professional services and efficient encoding appliances like NETINT‘s aligns well with the industry’s efficiency challenges.

Flexibility

The final challenge discussed by Romain was flexibility, emphasizing the importance of moving swiftly in a fast-paced environment. He described how Netflix successfully transitioned from SVOD to AVOD, adapted from on-demand to live streaming, switched from H.264 to newer codecs, and consolidated multiple containers into one over short time frames, contributing to their profitability. Romain underlined the potential for others to achieve similar success using GPAC.

He introduced a new application within GPAC called “gpac”, designed to build customized media pipelines. In contrast to historical GPAC applications that offered fixed media pipelines, this new “gpac” application enables users to create tailored pipelines to address specific requirements. This includes transcoding, packaging, content protection, networking, and in general, any feature you need for your private cloud.
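
As a minimal sketch of what driving the “gpac” application from your own tooling might look like, the snippet below assembles a command line with an input (`-i`) and an output (`-o`). The file names are hypothetical, and the example assumes the packaging format is inferred from the output extension; consult the GPAC documentation for the filters and options your workflow actually needs.

```python
import shlex


def build_gpac_command(source: str, destination: str) -> list[str]:
    """Build a `gpac` command line: `-i` names the input, `-o` the output."""
    return ["gpac", "-i", source, "-o", destination]


# Hypothetical file names, for illustration only.
cmd = build_gpac_command("source.mp4", "dash/live.mpd")
print(shlex.join(cmd))  # gpac -i source.mp4 -o dash/live.mpd
```

The resulting list can be passed to `subprocess.run` on a machine where GPAC is installed.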

The Synamedia/Quortex “just-in-time everything” paradigm

Figure 3. Motion Spell’s work with Quortex, which was acquired by Synamedia.

Romain then moved on to the Synamedia/Quortex use case, which illustrated how GPAC addresses the features challenge. He described Quortex’s innovative “just-in-time everything” paradigm for media pipelines.

Unlike the traditional 24/7 transcoder that is designed to never fail and requires backup solutions for seamless switching, Quortex divides the media pipeline into small components that can fail and be relaunched when necessary. This approach is particularly effective for live streaming scenarios, offering low latency.

Romain highlighted that the Quortex approach is highly adaptable as it can run on various instances, including cloud instances that are cost-effective but might experience interruptions. The system generates content on-demand, meaning that when a user wants to watch specific content on a device, it’s either cached, already generated, or created just-in-time. This includes packaging, transcoding, and other media processing tasks.
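
The idea can be sketched as a small cache in front of a generator that is allowed to fail. This is an illustrative toy, not Quortex’s or GPAC’s implementation: the class name, the retry count, and the use of `RuntimeError` to signal a failed worker are all assumptions made for the example.

```python
from typing import Callable


class JustInTimeOrigin:
    """Serve a segment from cache, or generate it on request, retrying failures."""

    def __init__(self, generate: Callable[[str], bytes], max_attempts: int = 3):
        self._generate = generate
        self._max_attempts = max_attempts
        self._cache: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        if key in self._cache:  # cached or already generated
            return self._cache[key]
        last_error = None
        for _ in range(self._max_attempts):
            try:
                data = self._generate(key)  # small component that may fail
            except RuntimeError as error:
                last_error = error  # relaunch the component and try again
                continue
            self._cache[key] = data
            return data
        raise RuntimeError(f"could not generate {key}") from last_error
```

Because any single generation step can be retried, the workers can run on inexpensive instances that may be interrupted, which is the cost advantage described above.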

Romain attributed the success of the development project to Quortex’s vision and talented teams, as well as the strategic partnership with Motion Spell. He also shared that after project completion, Synamedia acquired Quortex.

Instagram

Figure 4. GPAC helped Instagram cut compute times by 94%.

The second use case addressed the challenge of cost and involved Instagram, a member of the Meta Group. According to Romain, Instagram utilized GPAC’s MP4Box to reduce video compute times by an impressive 94%. This strategic decision helped prevent a capacity shortage within just twelve months, ensuring the platform’s ability to provide video uploads for all users.

Romain presented Instagram’s approach as noteworthy because it emphasizes the importance of optimizing costs based on content usage patterns. The platform decided to prioritize transmission and packaging of content over transcoding, recognizing that a significant portion of Instagram’s content is watched only a few times. In this scenario, the cost of transcoding outweighs the savings on distribution expenses. As Romain explained, “It made more sense for them to package and transmit most content instead of transcoding it, because most of Instagram’s content is watched only a few times. The cost of transcoding, in their case, outweighs the savings on the distribution cost.”

According to Romain, this strategy aligns with the broader efficiency trend in the media tech industry. By adopting a combined approach, Instagram used lower quality and color profiles for less popular content, while leveraging higher quality encoders for content requiring better compression. This optimization was possible because Instagram controls its own encoding infrastructure, which underscores the value of open-source solutions in providing control and flexibility to organizations.
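
The trade-off reduces to a break-even test: transcoding pays off only when the delivery savings summed over all views exceed the cost of the encode. The sketch below expresses that test. The cost figures in the usage lines are invented for illustration and are not Instagram’s numbers.

```python
def should_transcode(expected_views: int,
                     transcode_cost: float,
                     delivery_saving_per_view: float) -> bool:
    """Transcode only when total delivery savings exceed the encode cost."""
    return expected_views * delivery_saving_per_view > transcode_cost


# Hypothetical costs in dollars: $0.05 per encode, $0.001 saved per view.
print(should_transcode(5, 0.05, 0.001))     # False: package and transmit as-is
print(should_transcode(1000, 0.05, 0.001))  # True: the better encode pays off
```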

The computational complexity of GPAC’s packaging is close to a bit-for-bit copy, contributing to the 94% reduction in compute times. Romain felt that Instagram’s successful outcome exemplifies how open-source solutions like GPAC can empower organizations to make significant efficiency gains while retaining control over their systems.

Netflix

Figure 5. GPAC helped Netflix transition from SVOD to AVOD, from On-Demand to live, and from H264 to newer codecs.

The final use case addresses the challenge of flexibility and involves a significant collaboration between GPAC, Motion Spell, and Netflix. According to Romain, this collaboration had a profound impact on Netflix’s video encoding and packaging platform, and contributed to an exceptional streaming experience for millions of viewers globally.

At the NAB Streaming Summit, Netflix and Motion Spell took the stage to discuss the successful integration of GPAC’s open-source software into Netflix’s content operations. During the talk, Netflix highlighted the ubiquity of ISO BMFF (the ISO Base Media File Format) in their workflows and emphasized their commitment to open standards and innovation. The alignment between GPAC and Netflix’s goals allowed them to leverage GPAC’s innovations for free, thanks to sponsorships and prior implementations.

Romain explained how Netflix’s transformation from SVOD to AVOD, from On-Demand to live, and from H264 to newer codecs was facilitated by GPAC’s ease of integration and efficiency in operations. In this fashion, he asserted, the collaboration between Motion Spell and Netflix exemplifies the capacity of open-source solutions to drive innovation and adaptability.

Romain further described how GPAC’s rich feature set, rooted in research and standardization, offers capabilities beyond most publishers’ current needs. The unified “gpac” executable simplifies deployment, making it accessible for service implementation. Leveraging open-source principles, GPAC proves to be cost-competitive and easy to integrate. Motion Spell’s role in helping organizations maximize GPAC’s potential, as demonstrated with Netflix, underscores the practical benefits of the collaboration.

Romain summarized how GPAC’s flexibility empowers organizations to optimize and differentiate themselves rapidly. Examples like Netflix’s interactive Bandersnatch, intelligent previews, exceptional captioning, and accessibility enhancements showcase GPAC’s adaptability to evolving demands. Looking forward, Romain described how user feedback continues to shape GPAC’s evolution, ensuring its continued improvement and relevance in the media tech landscape.

With a detailed description of GPAC’s features and capabilities, underscored by very relevant case studies, Romain clearly demonstrated how GPAC can help live streaming publishers overcome any infrastructure-related challenge. And for those who would like to learn more, or need support or assistance integrating GPAC into their workflows, he invited them to contact him directly.

ON-DEMAND:
Romain Bouqueau, Deploying GPAC for Transcoding and Packaging

Simplify Building Your Own Streaming Cloud with Wowza

Transcoding and packaging software is a key component of any live-streaming cloud, and one of the most functional and flexible programs available is the Wowza Streaming Engine. During the symposium, Barry Owen, Chief Solutions Architect at Wowza, detailed how to create a scalable streaming infrastructure using the Wowza Streaming Engine (WSE).

He started by discussing Wowza’s history, from its formation in 2005 to its recent acquisition of FlowPlayer. After defining the typical live streaming production pipeline, Barry detailed how WSE can serve as an origin server, transcoder, and packager, ensuring optimal viewer experience. He discussed WSE’s adaptability, including its ability to scale through GPU- and VPU-based transcoding, and emphasized WSE’s deployment options, which range from on-premises to cloud-based infrastructures. He then outlined Wowza’s infrastructure for distributing to audiences large and small.

Barry concluded by validating the session title by getting WSE up and running in under five minutes using Docker in a demo that you can watch below, at the end of this article.

Start Streaming in Minutes with Wowza Streaming Engine

The focus of Barry’s talk was how to create a highly scalable streaming infrastructure with Wowza Streaming Engine (WSE). He began by recounting Wowza’s history. Established in 2005, the company launched its inaugural product, the Wowza Media Server, in 2007. This was later complemented by the Wowza Cloud, a SaaS solution, in 2013. Since its inception, Wowza has grown to support over 6,000 customers in 170 countries and boasts more than 35,000 streaming implementations. Their products are responsible for 38 million video transcoding hours each month. Recently, the company acquired FlowPlayer, adding a premier video player to its product lineup.

Barry emphasized Wowza’s commitment to providing streaming solutions that are reliable, scalable, and adaptable. He noted the importance of customization in the streaming sector and highlighted the company’s robust support team and services, which are designed to ensure customer success.

Wowza Streaming Engine Functionality

Barry then moved to the heart of his talk, which he set up by illustrating the streaming pipeline, which begins with video capture from sources like cameras, encoders, or mobile devices (Figure 1). Within this pipeline, WSE serves as a comprehensive media server that’s capable of functioning as an origin server, transcoder, and packager in a single system.

In this role, WSE offers real-time encoding and transcoding, producing multiple-bitrate streams for optimal viewer experience. It also performs real-time packaging into formats like HLS and DASH, facilitating compatibility across devices, and ancillary functions like adding DRM and captions, ad insertion, and metadata handling. Once processed, the stream is ready for delivery to a vast audience through one or multiple CDNs, depending on the desired scale and workflow.

Figure 1. The role WSE plays in the streaming pipeline.

Then Barry dug deeper into the capabilities of the Wowza Streaming Engine, emphasizing its comprehensive nature as an end-to-end media server. These capabilities include:

  • Input Protocols: The Streaming Engine can ingest almost any input protocol, including RTSP, RTMP, SRT, WebRTC, HLS, and more.
  • Transcoding: WSE offers just-in-time, real-time transcoding with minimal latency. It also supports features like compositing and overlays, preparing the stream for packaging.
  • Packaging: WSE supports commonly used formats like HLS and DASH, as well as more specialty formats such as WebRTC, RTSP, and MPEG-TS.
  • Delivery: Wowza supports both push and pull models for stream delivery. It can integrate with multiple CDN vendors, including its own, and allows syndication to platforms like Facebook and LinkedIn.
  • Extensibility: A significant feature of the Streaming Engine is its flexibility. It offers a complete Java API for custom processing and a REST API for system command and control. WSE’s user interface (Streaming Engine Manager) is built on this REST API, demonstrating its functionality.
  • Configuration and Control: This Streaming Engine Manager allows users to manage one or more Streaming Engine instances from one web interface. Advanced users can also programmatically edit configurations to integrate with their systems.

Barry underscored WSE’s adaptability, highlighting its ability to cater to custom workflows, from complex ad insertions to machine learning applications. He also mentioned the availability of GitHub libraries with examples and encouraged exploring the Streaming Engine Manager for system configuration and monitoring.

Deploying Wowza Streaming Engine
Figure 2. WSE deployment options.

Barry next discussed the deployment options for the Wowza Streaming Engine. These include:

  • On-Premises: WSE can be deployed on-premises, offering cost-effective and efficient solutions, especially in high-density scenarios or when access to a personal data center is available.
  • Managed Hardware Platforms: WSE can be set up on platforms like Linode, providing access to bare metal in a managed environment.
  • Public Clouds: Pre-built images are available for major cloud platforms, allowing quick setup. Users can choose from marketplace images or standard ones, where they bring their own license key. Pre-configurations for common use cases are also provided.
  • Docker: Wowza offers Docker images. Barry emphasized Docker's significance in automating deployment, scaling, and ensuring high availability in modern infrastructure setups.

Barry emphasized WSE’s adaptability to various deployment needs, from traditional setups to modern cloud-based infrastructures.

Scaling Wowza Streaming Engine
Figure 3. Scaling stream processing with GPUs and VPUs (ASICs).

Barry shifted the discussion to scaling and stream processing, emphasizing the different approaches and addressing their pros and cons. For stream processing, WSE can deploy CPU, GPU, and VPU-based transcoding. Here’s a brief discussion of each option.

CPU-Based Transcoding:

Barry highlighted the traditional approach of using software CPU-based transcoding. The Wowza Streaming Engine can efficiently leverage the processing power of CPUs to handle video streams. This method is straightforward and can be scaled by adding more servers or opting for higher-capacity CPUs.

He shared that CPU-based transcoding offers a wide range of adaptability, allowing for various encoding and decoding combinations. Given that CPUs are a standard component in servers, there’s no need for specialized hardware. On the other hand, he pointed out CPUs aren’t the best option for achieving high density or low power consumption.

GPU-Based Transcoding:

Regarding GPU-based transcoding, Barry stated that GPUs can handle a significant number of streams and offload the heavy lifting from the CPU, ensuring smoother operation. However, they are expensive and are not designed exclusively for video processing, which can lead to higher power consumption.

VPU-Based Transcoding:

Barry expressed considerable enthusiasm for the capabilities of Video Processing Units (VPUs), or ASIC-based transcoders. Unlike general-purpose CPUs and GPUs, VPUs are purpose-built for video processing, which allows them to handle video streams with remarkable efficiency. In recent years, VPUs have emerged as a promising solution, especially for high-density streaming. Barry noted that these units offer not only a competitive price per channel but also minimal power consumption.

The Evolution Towards Specialization:

Drawing from his insights, Barry seemed to suggest a trend in the streaming industry: a move towards more specialized solutions. While CPUs and GPUs have been stalwarts in the industry, the rise of VPUs indicates a shift towards tools and technologies tailored specifically for streaming. This specialization promises not only enhanced performance but also greater efficiency in terms of cost and energy consumption.
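
To make the density, price-per-channel, and power trade-offs concrete, here is a minimal Python sketch that normalizes three transcoding options to per-channel figures. Every number in it is a made-up placeholder chosen only to show the arithmetic; none comes from Barry's talk or from any benchmark, and the class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TranscodeOption:
    name: str
    unit_cost: float        # PLACEHOLDER: cost of one server, USD
    channels_per_unit: int  # PLACEHOLDER: simultaneous channels per server
    watts_per_unit: float   # PLACEHOLDER: power draw of one server

    def units_needed(self, channels: int) -> int:
        # Round up: a partially used server still has to be bought.
        return math.ceil(channels / self.channels_per_unit)

    def cost_per_channel(self) -> float:
        return self.unit_cost / self.channels_per_unit

    def watts_per_channel(self) -> float:
        return self.watts_per_unit / self.channels_per_unit


# Illustrative values only, NOT measurements of any real hardware.
options = [
    TranscodeOption("CPU", unit_cost=10_000, channels_per_unit=10, watts_per_unit=500),
    TranscodeOption("GPU", unit_cost=20_000, channels_per_unit=40, watts_per_unit=800),
    TranscodeOption("VPU", unit_cost=15_000, channels_per_unit=100, watts_per_unit=400),
]

target_channels = 200
for opt in options:
    print(f"{opt.name}: {opt.units_needed(target_channels)} servers, "
          f"${opt.cost_per_channel():.0f}/channel, "
          f"{opt.watts_per_channel():.0f} W/channel")
```

Substituting your own measured densities, prices, and power draw turns this into a simple way to compare options on a per-channel basis rather than on sticker price.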

Distributing Your Streams

Barry concluded his talk by discussing the distribution options available from Wowza. He emphasized the importance of adaptability when it comes to scaling outputs, especially given the diverse audience sizes that streaming services might cater to. WSE offers multiple distribution options to ensure that content reaches its intended audience efficiently, regardless of its size.

On-Premises Scaling:

One of the primary methods Barry discussed was scaling on-premises. By simply adding more servers to the existing infrastructure, streaming services can handle a larger load. This method is particularly useful for organizations that already have a significant on-premises setup and are looking to leverage that infrastructure.

CDN (Content Delivery Network):

For those expecting a vast number of viewers, Barry recommended using a content delivery network, or CDN. CDNs are designed to handle large-scale content delivery, distributing the content across a network of servers to ensure smooth and efficient delivery to a global audience. By offloading the streaming to a CDN, services can ensure that their content reaches viewers without any hitches, even during peak times.

Hybrid Approaches:

Barry found the hybrid model particularly intriguing. This approach combines the strengths of both on-premises scaling and CDNs. For instance, an organization could use its on-premises setup for regular streaming to a smaller audience. However, during events or times when a larger audience is expected, they could “burst” to the cloud, leveraging the power of CDNs to handle the increased load. This model offers both cost efficiency and scalability, ensuring that services are not overextending their resources during regular times but are also prepared for peak events.

In essence, Barry underscored the importance of flexibility in scaling. The ability to choose between on-premises, CDN, or a hybrid approach ensures that streaming services can adapt to meet any audience size.
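
The hybrid "burst" idea can be sketched in a few lines of Python: serve viewers from on-premises capacity first and send only the overflow to a CDN. This is an illustrative assumption about how such a split might be planned, not a description of how Wowza implements it; the function name and capacity figures are hypothetical.

```python
def plan_delivery(expected_viewers: int, on_prem_capacity: int) -> dict:
    """Split an expected audience between on-premises servers and a CDN.

    ASSUMPTION: on-premises capacity is filled first and only the
    overflow bursts to the CDN. Real deployments may instead route
    all viewers through the CDN during large events.
    """
    if expected_viewers < 0 or on_prem_capacity < 0:
        raise ValueError("viewer counts and capacity must be non-negative")
    on_prem = min(expected_viewers, on_prem_capacity)
    return {"on_prem": on_prem, "cdn": expected_viewers - on_prem}


# A regular day fits on-premises; a large event bursts to the CDN.
print(plan_delivery(500, on_prem_capacity=2_000))
print(plan_delivery(50_000, on_prem_capacity=2_000))
```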

Figure 4. Options for distributing to various audience sizes.

Start Streaming in Minutes with WSE: The Demonstration

Figure 5. Click the image to run Barry’s demo.

Barry then ran a recorded demonstration (Figure 5) to illustrate the simplicity of setting up the Wowza Streaming Engine using Docker. He ran the demo using Docker Desktop and Docker Compose, with the objective of launching two containers: one for the Wowza Streaming Engine and another for its manager.

He began by starting the services with the command ‘docker compose up’. Since he recorded the demo on an M1 Mac, he noted that the process might be slightly slower due to the Rosetta translation layer. As the services initialized, Barry explained the YAML file he used to provision them. The file contained configurations for both the Streaming Engine and its Manager, detailing aspects like image sources, environment variables, and port settings.

With the services up and running, Barry navigated to Docker Desktop to monitor the performance of the two launched services, observing metrics like CPU and memory usage. He then accessed the Streaming Engine Manager via a web browser. Barry highlighted the versatility of Docker Compose, mentioning that it can manage multiple service instances, which can be beneficial for scalability, high availability, or clustering.

Upon accessing the manager, Barry logged in to view the server’s health snapshot, which provides insight into its status. He then navigated to a pre-configured application named ‘live’ to stream content. He configured Open Broadcaster Software (OBS), a live streaming program running on his system, to stream to the server, pointing out that the server recognized the incoming stream and then packaged it.

Returning to the manager, Barry verified the incoming stream’s presence and details. He then extracted the HLS URL for the stream, which he opened in a Safari browser tab to demonstrate live playback. The stream played seamlessly, underscoring the efficiency and ease of the entire process.
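
The two addresses involved in this step, the ingest address given to the encoder and the HLS playback URL opened in the browser, can be sketched as follows. The URL layout (port 1935, the application name followed by the stream name, and a playlist.m3u8 suffix) is an assumption based on Wowza's commonly documented defaults rather than something read off the demo, and the host and stream name used here are hypothetical.

```python
from urllib.parse import quote


def rtmp_ingest_url(host: str, application: str = "live", port: int = 1935) -> str:
    """Server address an encoder such as OBS would publish to.

    ASSUMPTION: RTMP ingest on port 1935; the stream name is supplied
    separately in the encoder as the stream key.
    """
    return f"rtmp://{host}:{port}/{application}"


def hls_playback_url(host: str, stream_name: str,
                     application: str = "live", port: int = 1935) -> str:
    """HLS playlist URL for a live stream (assumed default URL layout)."""
    return f"http://{host}:{port}/{application}/{quote(stream_name)}/playlist.m3u8"


# Hypothetical local setup matching the 'live' application from the demo.
print(rtmp_ingest_url("localhost"))
print(hls_playback_url("localhost", "myStream"))
```

In practice the Streaming Engine Manager displays the exact playback URLs for each stream, so treat these helpers as a convenience for scripting rather than a source of truth.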

The demo showcased how, in a matter of minutes, you can configure, initiate, and stream using the Wowza Streaming Engine. You can get started yourself by downloading a trial version of WSE here.

ON-DEMAND:
Barry Owen, Start Streaming in Minutes with Wowza Streaming Engine