Reducing Power Consumption in Data Centers: A Response to the European Energy Crisis


Encoding technology refreshes are seldom CFO-driven. For European data centers, over the next few years, they may need to be, as reducing power consumption in data centers becomes a primary focus.

Few European consumers or businesses need to be reminded that they are in the midst of a power crisis. But a recent McKinsey & Company article entitled “Four themes shaping the future of the stormy European power market” provides interesting insights into the causes of the crisis and its expected duration. Engineering and technical leaders, don’t stop reading: this crisis will impact the architecture and technology decisions you may be making.

The bottom line, according to McKinsey? Buckle up, Europe, “With the frequency of high-intensity heat waves expected to increase, additional outages of nuclear facilities planned in 2023, and further expected reductions in Russian gas imports, we expect that wholesale power prices may not reduce substantially (defined as returning to three times higher than pre-crisis levels) until at least 2027.” If you haven’t been thinking about steps your organization should take to reduce power consumption and carbon emissions, now is the time.

HARD QUESTIONS ON HOT TOPICS – EUROPEAN ENERGY CRISIS AS PER MCKINSEY REPORT
WATCH THE FULL CONVERSATION ON YOUTUBE: https://youtu.be/yiYSoUB4yXc

The Past

The war in Ukraine is the most obvious contributor to the energy crisis, but McKinsey identifies multiple additional contributing factors. Significantly, even before the War, Europe was in the midst of “structural challenges” caused by its transition from carbon-emitting fossil fuels to cleaner and more sustainable sources like wind, solar, and hydroelectric.

Then, in 2022, the shock waves began. Prior to the invasion of Ukraine in February, Russia supplied 30 percent of Europe’s natural gas; those imports dropped by as much as 50% in 2022 and are expected to decline further. This was exacerbated by a 19% drop in hydroelectric power caused by drought and a 14% drop in nuclear power caused by required maintenance that closed 32 of France’s 56 reactors. As a result, “wholesale prices of both electricity and natural gas nearly quadrupled from previous records in the third quarter of 2022 compared with 2021, creating concerns for skyrocketing energy costs for consumers and businesses.”

Figure 1. As most European consumers and businesses know, prices skyrocketed in 2022
and are expected to remain high through 2027 and beyond.

Four key themes

Looking ahead, McKinsey identifies four key themes it expects to shape the market’s evolution over the next five years.

  • Increase in Required Demand

McKinsey sees power usage increasing from 2,900 terawatt-hours (TWh) in 2021 to 3,700 TWh in 2030, driven by multiple factors. For example, the switch to electric cars and other modes of transportation will increase power consumption by 14% annually. In addition, the manufacturing sector, which needs power for electrolysis, will increase its consumption to 200 TWh by 2030.

  • The Rise of Intermittent Renewable Energy Sources

By 2030, wind and solar power will provide 60% of Europe’s energy, double the share in 2021. This will require significant new construction but could also face challenges like supply chain issues, material shortages, and a scarcity of suitable land and talent.

  • Balancing Intermittent Energy Sources

McKinsey sees the energy market diverging into two types of sources: intermittent sources like solar, wind, and hydroelectric, and dispatchable sources like coal, natural gas, and nuclear that can be turned on and off to meet peak requirements. Over the next several years, McKinsey predicts that “a gap will develop between peak loads and the dispatchable power capacity that can be switched on to meet it.”

To close the gap, Europe has been aggressively developing clean energy sources of dispatchable capacity, including utility-scale battery systems, biomass, and hydrogen. In particular, hydrogen is set to play a key role in Europe’s energy future, as a source of dispatchable power and as a means to store energy from renewable sources.

All these sources must be further implemented and massively scaled, with “build-outs remaining highly uncertain due to a reliance on supportive regulations, the availability of government incentives, and the need for raw materials that are in short supply, such as lithium ion.”

  • New and Evolving Markets and Rules

Beyond temporary measures designed to reduce costs for energy consumers, European policymakers are considering several options to reform how the EU energy market operates. These include:

  • A central buyer model: A single EU or national regulatory agency would purchase electricity from dispatchable sources at fixed prices under long-term contracts and sell it to the market at average cost prices.
  • Decoupled day-ahead markets: Separate zero marginal cost energy resources (wind, solar) and marginal cost resources (coal) into separate markets to prioritize dispatching of renewables.
  • Capacity remuneration mechanism: The grid operator provides subsidies to producers based on the forecast cost of keeping power capacity in the market, ensuring a steady supply of dispatchable electricity and protecting consumers.

McKinsey closes on a positive note, “Although the European power market is experiencing one of its most challenging periods, close collaboration among stakeholders (such as utilities, suppliers, and policy makers) can enable Europe’s green-energy transition to continue while ensuring a stable supply of power.”

The future of the European power market is complex and subject to many challenges, but policymakers and stakeholders are working to address them and find solutions to ensure a stable and affordable energy system for consumers and businesses.

In the meantime, the mandate for data centers isn’t new: video engineers are already being asked to reduce power consumption to save OPEX, to cut carbon footprint so the company hits its ESG metrics, and to minimize the potential disruption of energy instability.

If you’re in this mode, NETINT’s ASIC-based transcoders can help by offering the lowest available power draw of any silicon solution (CPU, GPU, or FPGA) and thus the highest possible density.

Cloud or on-premise – streaming publisher’s dilemma


Processing your media in the cloud or on-premises is one of the most critical decisions facing a streaming video service. Two recent articles provide strong opinions and insights on this decision and are worthy of review. Our take? Do the math and make your own decision.

The first article is “Why we’re leaving the cloud” by David Heinemeier Hansson.

By way of background, Hansson is co-owner and CTO of software developer 37signals, the developer of the project management platform Basecamp and the premium email service Hey.

After running the two platforms on AWS for a number of years, Hansson commented that “renting computers is (mostly) a bad deal for medium-sized companies like ours with stable growth. The savings promised in reduced complexity never materialized.” As an overview, he asserts that the cloud excels at two ends of the spectrum: 1) simple and low-traffic applications and 2) highly irregular loads with wild swings or towering peaks in usage.

When Hey first launched, running in AWS allowed the new service to seamlessly onboard the 300,000 users that signed up in the first three weeks, wildly exceeding the forecast of 30,000 in six months. However, since then, Hansson reported, these capacity spikes never recurred, and by “continuing to operate in the cloud, we’re paying an at times almost absurd premium for the possibility that [they] could.”

In abandoning the cloud, Hansson had to stare down two common beliefs. The first is that the cloud simplifies systems and computer management. As it relates to his own businesses, he reports that “anyone who thinks running a major service like HEY or Basecamp in the cloud is “simple” has clearly never tried. Some things are simpler, others more complex, but on the whole, I’ve yet to hear of organizations at our scale being able to materially shrink their operations team, just because they moved to the cloud.”

He also tackles perceptions regarding the complexity of running equipment on-premise. “Up until very recently, everyone ran their own servers, and much of the progress in tooling that enabled the cloud is available for your own machines as well. Don’t let the entrenched cloud interests dazzle you into believing that running your own setup is too complicated. Everyone and their dog did it to get the internet off the ground, and it’s only gotten easier since.”


In “Media Processing in the Cloud or On-Prem—Which Is Right for You?”, Alex Emmermann, Director of Business Development for Cloud Products at Telestream, takes a more moderate view (as you would expect).

Emmermann starts by pointing out where the cloud makes sense, zeroing in on the same capacity swings as Hansson. “A typical painful example is when capacity requirements shift underneath you, such as a service becoming more popular than you had initially allocated resources for. For example, when running a media services operation, there are many situations that can stress systems... In media processing, full-catalog licenses, mergers, or content migrations can cause enormous capacity requirements for transcoding and QC.”

Emmermann also introduces the concept of hybrid operations. “For many companies, a wholesale move may feel too risky, so a hybrid approach works well by allowing excess capacity requirements to burst into the cloud as required. This allows run rate systems to continue functioning while taking immediate advantage of cloud scaling when and if required. Depending on the needs of the service, a hybrid setup could continue to run indefinitely and very cost-effectively if on-prem CapEx resources have already been spent and the resources are in place to keep them running.”

In terms of companies that should operate on premises, Emmermann cites two examples. First are companies with significant CAPEX investments in encoding gear. “For the many thousands of busy on-premises servers processing run-rate media workflows throughout the world, they’re efficiently and cheaply doing what they need to do and will no doubt continue to do so for a long time.” He also mentions that inexpensive and reliable connectivity is an absolute requirement, and “there are certain places on the planet that may not have reliable interconnectivity to a cloud provider.”

All told, Emmermann concludes, “There’s no question that any media company investing in new services or wanting to have the capacity to say yes to any customer request will want to do this with a public cloud provider… On the other hand, any steady-state, on-premises service that is happily functioning as designed and only occasionally requires a small capital refresh will be happy to stay the course.”

Our Take? Do the Math

HARD QUESTIONS ON HOT TOPICS – CLOUD OR ON PREMISES, HOW TO DO THE MATH?
Watch the full conversation on YouTube: https://youtu.be/GSQsa4oQmCA

Anyone who has ever provisioned an EC2 instance from AWS and paid the hourly rate has wondered, “How does that compare to buying your own system?” We’re certainly not immune.

Given the impetus of this article, we decided to put pencil to paper, or keyboard to spreadsheet. We recently launched the NETINT Video Transcoding Server, which costs $7,000 and includes ten T408 transcoders that can output H.264 and HEVC. In benchmarking, the entry-level system produced 21 five-rung H.264 ladders and 27 four-rung HEVC ladders. What would it cost to produce the same number of streams in AWS?

We checked the MediaLive price list here and confirmed it with the pricing calculator estimate here (Figure 3 shows HEVC). While a single hour of H.264 live streaming costs only $0.46, this adds up to $4,004.17 per year. The rate jumps to $1.527 per hour for HEVC, or $13,375.55 per year. Both figures are for a single encoding ladder.

Figure 3. Yearly cost for streaming a single five-rung HEVC encoding ladder.

To compare this to our streaming server, we multiplied the annual cost of each ladder by the number of ladders the server could produce and extended all calculations out to five years. This translates to a five-year cost of $420,441 for H.264 and a staggering $1,805,712 for HEVC.

To compute the same five-year cost for the server, we added $69/month for colocation charges to the $7,000 base price. This came to $11,140 for either format.
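For readers who want to check the arithmetic, here’s a minimal sketch of the five-year math in Python; the small differences from the totals above reflect rounding in the per-ladder rates.

# Five-year cost comparison: AWS MediaLive vs. the NETINT server.
# Per-ladder annual rates and ladder counts are taken from the text above.
H264_PER_LADDER_YEAR = 4_004.17    # one H.264 ladder, 24/7, per year
HEVC_PER_LADDER_YEAR = 13_375.55   # one HEVC ladder, 24/7, per year
H264_LADDERS, HEVC_LADDERS = 21, 27
YEARS = 5

aws_h264 = H264_PER_LADDER_YEAR * H264_LADDERS * YEARS
aws_hevc = HEVC_PER_LADDER_YEAR * HEVC_LADDERS * YEARS
server = 7_000 + 69 * 12 * YEARS   # $7,000 base price plus $69/month colocation

print(f"AWS H.264, 5 years: ${aws_h264:,.0f}")   # about $420,438
print(f"AWS HEVC, 5 years:  ${aws_hevc:,.0f}")   # about $1,805,699
print(f"NETINT server:      ${server:,}")        # $11,140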

Table 1. Five-year cost comparison, AWS MediaLive pricing compared to the NETINT server.

This comparison brought to mind Hansson’s comment that “Amazon, in particular, is printing profits renting out servers at obscene margins.” Surely, no streaming publisher is using MediaLive for 24/7/365 operations.

Taking a step back, it’s tough not to agree with the key points from both authors. The cloud does make the most sense when you need instant capacity for peak encoding. For steady-state operations, owning your own gear is always going to be cheaper.

All that said, run the numbers no matter what you’re doing in the cloud. While the results probably won’t be as startling as those shown in Table 1, you won’t know until you do the math.

Maximizing Cloud Gaming Performance with ASICs


Ask ten cloud gamers what an acceptable level of latency is for cloud gaming, and you’ll get ten different answers. However, they will all agree that lower latency is better.

At NETINT, we understand. As a supplier of encoders to the cloud gaming market, our role is to supply the lowest possible latency at the highest possible quality and the greatest encoding density with the lowest possible power consumption. While this sounds like a tall order, because our technology is ASIC based, it’s what we do for cloud gaming and high-volume video streaming workloads of all types.

In this article, we’ll take a quick look at the technology stack for cloud gaming and the role of compression. Then we’ll discuss the performance of the NETINT Quadra VPU (video processing unit) series using the four measuring sticks of latency, density, video quality, and power consumption.

The Cloud Gaming Technology Stack

Figure 1 illustrates the different elements of the cloud gaming technology stack, particularly how the various transfer, compute, rendering, and encoding activities contribute to overall latency.

At the heart of every cloud gaming center is a game engine that typically runs the operating system native to the game, usually Android or Windows, though Linux and macOS are not uncommon (see here for Meta’s dual-OS architecture).

Since most games rely on GPUs for rendering, all cloud gaming data centers have a healthy dose of GPU resources. These functions are incorporated in the cloud compute and graphics engine shown on the left, which creates the frames sent to the encode function for encoding and transmission to the gamer.

As illustrated in Figure 1, Nokia budgets 100 ms for total latency. Inside the data center, which is shown on the left, Nokia allows 15 ms to receive the data, 40 ms to process the input and render the frame, 5 ms to encode the frame, and 15 ms to return it to the remote player. That’s a lot to do in the time it takes a sound wave to travel just 100 feet.

Figure 1. Cloud gaming latency budget from Nokia.
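As a quick sanity check, here’s Nokia’s budget summed in a trivial Python sketch; the values are those from Figure 1.

# Nokia's cloud gaming latency budget (Figure 1), values in milliseconds.
data_center = {"receive input": 15, "process and render": 40,
               "encode": 5, "return frame": 15}
inside_dc = sum(data_center.values())  # 75 ms spent inside the data center
remaining = 100 - inside_dc            # 25 ms left for network and client
print(inside_dc, remaining)            # 75 25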

NETINT’s Quadra VPU series is ideal for the standalone encode function. All Quadra VPUs are powered by the NETINT Codensity G5 ASIC. It’s called a video processing unit because, in addition to H.264, HEVC, and VP9 decode and H.264, HEVC, and AV1 encode, Quadra VPUs offer onboard scaling, overlay, and an 18 TOPS AI engine (per chip).

Quadra is available in several single-chip solutions (T1 and T1A) and a dual-chip solution (T2) and starts at $1,500 in low quantities. Depending upon the configuration that you purchase, you can install up to ten Quadra VPUs in a single 1RU server and twenty Quadra VPUs in a 2RU server.

Cloud Gaming Latency and Density

Table 1 reports latency and density for a single Quadra VPU. As you would expect, latency depends on video resolution by way of the required network bandwidth and, to a much lesser degree, on the number of jobs being processed.

Game producers understand the resolution/latency tradeoff and design the experience around it. So, a cloud gaming vendor might deliver a first-person shooter game at 720p to minimize latency while providing a better UX on medium-bandwidth connections, and a slower-paced role-playing or strategy game at higher resolutions to optimize the visual experience. As you can see, a single Quadra VPU can service both scenarios, with 4K latency under 20 ms and 720p latency around 4 ms at extremely high stream counts.

Table 1. Quadra throughput and average latency for AVC and HEVC.

In terms of density, the jobs shown in Table 1 are for a single Quadra VPU. Though multiple units won’t scale linearly, performance will increase substantially as you install additional units into a server. Because the Quadra is focused solely on video processing and encoding operations, it outperforms most general-purpose GPUs, CPUs, and even FPGA-based encoders from a density perspective.

Quadra Output Quality

From a quality perspective, hardware transcoders are typically benchmarked against the x264 and x265 codecs running in FFmpeg. Though FFmpeg’s throughput is orders of magnitude lower, these codecs represent well known and accepted quality levels. NETINT recently compared Quadra quality against x264 and x265 in a low latency configuration using a CGI-based data set.

Table 2 shows the results for H.264, with Rate-Distortion Optimized Quantization (RDOQ) enabled and disabled. Enabling RDOQ increases quality slightly but decreases throughput. Quadra exceeded x264 quality in both configurations using the veryfast preset, which is typical for live streaming.

Table 2. The NETINT Quadra VPU series delivers better H.264 quality
than the x264 codec using the veryfast preset.

For HEVC, Table 3 shows the equivalent x265 preset with RDOQ disabled (the high-throughput, lower-quality option) at three Rate Distortion Optimization (RDO) levels, which also trade off quality for throughput. Even with RDOQ disabled and RDO set to 1 (low quality, high throughput), Quadra delivers the equivalent of x265 medium quality. Note that most live streaming engineers use superfast or ultrafast to produce even a modest number of HEVC streams in a software-only encoding scenario.

Table 3. The NETINT Quadra VPU series delivers better quality
than the x265 codec using the medium preset.

Low Power Transcoding for Cloud Gaming

At full power, Quadra T1 draws 70 watts. Though some GPUs offer similar power consumption, they typically deliver far fewer streams.

In this comparison with the NVIDIA T4, the Quadra T1 drew 0.71 watts per 1080p stream, about 81% less than the 3.7 watts per stream required by the T4. This translates directly into an equivalent reduction in energy costs and carbon emissions per stream. In terms of CAPEX, Quadra costs $53.57 per 1080p stream, 63% cheaper than the T4’s $144 per stream.
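Here’s the per-stream arithmetic as a minimal Python sketch, using the figures above.

# Per-stream power and CAPEX comparison, Quadra T1 vs. NVIDIA T4.
quadra_watts, t4_watts = 0.71, 3.7      # watts per 1080p stream
print(f"{1 - quadra_watts / t4_watts:.0%} less power per stream")    # 81%

quadra_capex, t4_capex = 53.57, 144.00  # dollars per 1080p stream
print(f"{1 - quadra_capex / t4_capex:.0%} lower CAPEX per stream")   # 63%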

When it comes to gameplay, most gamers prioritize latency and quality. In addition to delivering these two key QoE elements, cloud gaming vendors must also focus on CAPEX, OPEX, and sustainability. By all these metrics, the ASIC-based Quadra is the ideal encoder for any cloud gaming production workflow.

Argos dispels common myths about encoding ASICs


Even in 2023, many high-volume streaming producers continue to rely on software-based transcoding, despite the clear CAPEX, OPEX, and environmental benefits of ASIC-based transcoding. Part of the inertia relates to outdated concerns about the shortcomings of ASICs, including sub-par quality and lack of flexibility to add features or codec enhancements.

As a parent, I long ago concluded that there were no words that could come out of my mouth that would change my daughter’s views on certain topics. As a marketer, I feel some of that same dynamic, that no words can come out of my keyboard that would shake the negative beliefs about ASICs from staunch software-encoding supporters.

So, don’t take our word that these beliefs are outdated; consider the results from the world’s largest video producer, YouTube. The following slides and observations are from a Google presentation by Aki Kuusela and Clint Smullen on the Argos ASIC-based transcoder at Hot Chips 33 back in August 2021. The slides are available here, and the video here.

In the presentation, the speakers discussed why YouTube developed its own ASIC and the performance and power efficiency achieved during the first 16 months of deployment. Their comments go a long way toward dispelling the myths identified above and make for interesting reading.

Advanced Codecs Mean Encoding Time Has Grown 8,000x Since H.264

In discussing why Google created its own encoder, Kuusela explained that video was getting harder to compress, not only from a codec perspective but also from a resolution and frame rate perspective. Here’s Kuusela (all quotes grabbed from the YouTube video and lightly edited for readability).

“In order to sustain the higher resolutions and frame rate requirements of video, we have to develop better video compression algorithms with improved compression efficiency. However, this efficiency comes with greatly increased complexity. For example, if we compare VP9 from 2013 to the decade-older H.264, the time to encode videos in software has grown 10x. The more recent AV1 format from 2018 is already 200 times more time-consuming than the H.264 standard.

If we further compound this effect with the increase in resolution and frame rate for top-quality video, we can see that the time to encode a video from 2003 to 2018 has grown eight thousand-fold. It is very obvious that the CPU performance improvement has not kept up with this massive complexity growth, and to keep our video services running smoothly, we had to consider warehouse scale acceleration. We also knew things would not get any better with the next generation of compression.”

Figure 1. Google moved to hardware
to address skyrocketing encoding times.

Reviewing Figure 1, it should be noted that though few engineers use VP9 as extensively as YouTube, if you swap HEVC for VP9, the complexity difference relative to H.264 is roughly the same. Beyond the higher resolutions and frame rates engineers must support to remain competitive, the need for hardware becomes even more apparent when you consider the demands of live production.

“Near Parity” with Software Encoding Quality

One consistent concern about ASICs has been quality, which admittedly lagged in early hardware generations. However, Google’s comparison shows that properly designed hardware can deliver near-parity to software-based transcoding.

Kuusela doesn’t spend a lot of time on the slide shown in Figure 2, merely stating that “we also wanted to be able to optimize the compression efficiency of the video encoder based on the real-time requirements and time available for each encoder and to have full access to all quality control algorithms such as bitrate allocation and group of picture selection. So, we could get near parity to software-based encoding quality with our no-compromises implementation.”

Figure 2. Argos delivers “near-parity”
with software encoders.

NETINT’s data more than supports this claim. For example, Table 1 compares the NETINT Quadra VPU with various x265 presets. Depending upon the test configuration, Quadra delivers quality on par with the x265 medium preset. When you consider that software-based live production often necessitates using the veryfast or ultrafast preset to achieve even marginal throughput, Quadra’s quality far exceeds that of software-based transcoding.

Table 1. Quadra HEVC quality compared to x265
in high-quality latency tolerant configuration.

ASIC Performance Can Improve After Deployment

Another concern about ASIC-based transcoders is the inability to upgrade and the resulting accelerated obsolescence. Proper ASIC design balances encoding tasks between hardware, firmware, and control software to ensure continued upgradeability.

Figure 3 shows how the bitrate savings of VP9 and H.264 continued to improve compared to software in the months after the product launch, even without changes to the firmware or kernel driver. The second Google presenter, Clint Smullen, attributed this to a hybrid hardware/software design, commenting that “Using a software approach was critical both to supporting the quality and feature development in the video core as well as allowing customer teams to iteratively improve quality and performance.”

Figure 3. Argos continued to improve after deployment
without changes to firmware or the kernel driver.

The NETINT Codensity G4 ASIC included in the T408 and the NETINT Codensity G5 ASIC that powers our Quadra family of VPUs both use a hybrid design that distributes critical functions between the ASIC, driver software, and firmware.

We optimize ASIC design to maximize functional longevity. As explained here in our discussion of the role of firmware in ASIC implementations, “The functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher-level parts of the video codecs are in firmware and driver software and can still be changed.”

As Google’s experience and NETINT’s data show, well-designed ASICs can continue improving in quality and functionality long after deployment. 

90% Reduction in Power Consumption

Few engineers question the throughput and power efficiency of ASICs, and Google’s data bears this out. Commenting on Figure 4, Smullen stated, “For H.264 transcoding, a single VCU matches the speed of the baseline system while using about one-tenth of the system-level power. For VP9, a single 20-VCU machine replaces multiple racks of CPU-only systems.”

Figure 4. Throughput and comparative efficiency
of Argos vs software-only transcoding.

NETINT ASICs deliver similar results. For example, a single T408 transcoder (H.264 and HEVC) delivers roughly the same throughput as a 16-core computer encoding with software and draws only about 7 watts compared to 250+ for the computer. NETINT Quadra draws 20 watts and delivers roughly 4x the performance of the T408 for H.264, HEVC, and AV1. In one implementation, a single 1RU rack of ten Quadras can deliver 320 1080p streams or 200 720p cloud gaming sessions, which, like Argos, replaces multiple racks of CPUs.
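For the back-of-envelope inclined, here’s that math in a short Python sketch. Note that host server power is excluded from the watts-per-stream figure, so treat it as a transcoding-silicon number rather than a full-system number.

# Power math from the paragraph above.
t408_watts, software_server_watts = 7, 250
print(f"T408 vs. software encoding: {1 - t408_watts / software_server_watts:.0%} less power")  # 97%

quadras, watts_each, streams_1080p = 10, 20, 320
print(f"{quadras * watts_each} W for {streams_1080p} 1080p streams: "
      f"{quadras * watts_each / streams_1080p:.2f} W/stream")  # 0.63 W/stream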

Time to Reconsider?

As Google’s experience with YouTube and Argos shows, ASICs deliver unparalleled throughput and power efficiency in high-volume publishing workflows. If you haven’t considered ASICs for your workflow, it’s time for another look.

How Scaling Method and Technique Impacts Quality and Throughput


The thing about FFmpeg is that there are almost always multiple ways to accomplish the same basic function. In this post, we look at four approaches to scaling to reveal how the scaling method and techniques used impact quality and throughput.

We found that if you’re scaling using the default -s function (-s 1280x720), you’re leaving a bit of quality on the table compared to other methods. How much depends upon the metric you prefer: about ten percent if you’re a VMAF (hand raised here) or SSIM fan, much less if you still bow to the PSNR gods. More importantly, if you’re chasing throughput via cascaded scaling with fast scaling algorithms (flags=fast_bilinear), you’re probably losing quality without a meaningful throughput increase.

That’s the TL/DR; here’s the backstory.

The Backstory

NETINT sells ASIC-based hardware transcoders. One key advantage over software-only/CPU-based encoding is throughput, so we perform lots of hardware vs. software benchmarking. Fairness dictates that we use the most efficient FFmpeg command string when deriving the command string for software-only encoding.

In addition, the NETINT T408 transcoder scales in software using the host CPU, so we have a vested interest in techniques that increase throughput for T408 transcodes. In contrast, the NETINT Quadra scales and performs overlays in hardware and provides an AI engine, which is why it’s designated a Video Processing Unit (VPU) rather than a transcoder.

One proposed scaling technique for accelerating both software-only and T408 processing is cascaded scaling, where you create a filter complex that starts at full resolution, scales to the next lower resolution, and then uses that lower resolution to scale down to the next one. Here’s an example.

-filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=fast_bilinear,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=fast_bilinear,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=fast_bilinear,split=2[out720p][in720p];[in720p]scale=640:360:flags=fast_bilinear[out360p]"

So, rather than performing each scale from the full-resolution source (4K > 2K, 4K > 1080p, 4K > 720p, 4K > 360p), you’re performing each scale from the next higher rung (4K > 2K > 1080p > 720p > 360p). The theory was that this would reduce CPU cycles and improve throughput, particularly when coupled with a fast scaling algorithm. Even assuming a performance increase (which turned out to be a bad assumption), the obvious concern is quality: how much does quality degrade because the lower-resolution transcodes are working from a lower-resolution source?

If you’ve read this far, you know that the typical scaling technique used by most beginning FFmpeg producers is the -s command (-s 1280x720). For all rungs below 4K, FFmpeg scales the source footage down to the target resolution using the bicubic scaling algorithm.

So, we had two proposed methods, which I expanded to four, as follows.

  • Default (-s 1280x720)
  • Cascade using fast bilinear
  • Cascade using Lanczos
  • Video filter using Lanczos (-vf scale=1280:720 -sws_flags lanczos)

I tested the following encoding ladder using the HEVC codec.

  • 4K @ 12 Mbps
  • 2K @ 7 Mbps
  • 1080p @ 3.5 Mbps
  • 1080p @ 1.8 Mbps
  • 720p @ 1 Mbps
  • 360p @ 500 kbps

I encoded two 3-minute 4Kp30 files, excerpts from the Netflix Meridian and Harmonic Football test clips, using the x265 codec and ultrafast preset. You can see the full command strings at the end of the article. I measured throughput in frames per second and measured the quality of the 2K through 360p rungs with VMAF, PSNR, and SSIM, compiling the results into BD-Rate comparisons in Excel.
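For reference, here’s a minimal sketch of how a single rung can be scored against the 4K source with FFmpeg’s libvmaf filter, driven from Python; the file names come from the command strings below, and the upscale-to-source-resolution step reflects standard practice for cross-resolution comparisons (my exact measurement workflow may have differed).

import subprocess

# Score one encoded rung against the 4K source with FFmpeg's libvmaf filter.
# PSNR and SSIM can be collected the same way with the psnr and ssim filters.
distorted = "Fball_x265_720p_1M_default.mp4"
reference = "football_4K30_all_264_short.mp4"

subprocess.run([
    "ffmpeg", "-i", distorted, "-i", reference,
    "-lavfi",
    "[0:v]scale=3840:2160:flags=bicubic[d];"   # upscale rung to source resolution
    "[d][1:v]libvmaf=log_fmt=json:log_path=vmaf.json",
    "-f", "null", "-",
], check=True)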

I tested on a Dell Precision 7820 tower driven by two 2.9 GHz Intel Xeon Gold (6226R) CPUs running Windows 10 Pro for Workstations with 64 GB of RAM. I tested with FFmpeg 5.0, a version downloaded from www.gyan.dev on December 15, 2022.

Performance

Table 1. FPS by scaling method.

Table 1 shows that cascading delivered negligible performance benefits with the two test files and the selected encoding parameters. I asked the engineer who suggested the cascading scaling approach why we saw no throughput increase. Here’s a brief exchange. 

Engineer: It’s not going to make any performance difference in your example anyways but it does reduce the scaling load

       Me: Why wouldn’t it make a performance difference if it reduces the scaling load?

Engineer: Because, as your example has shown, the x265 encoding load dominates. It would make a very small difference

       Me: Ah, so the slowest, most CPU-intensive process controls overall performance.

Engineer: Yes, when you compare 1000+1 with 1000+10 there is not too much difference.

What this means, of course, is that these results may vary by the codec. If you’re encoding with H.264, which is much faster, cascading scaling might increase throughput. If you’re encoding with AV1 or VVC, almost certainly not.

Given that the T408 transcoder is multiple times faster than real-time, I’m now wondering if cascaded scaling might increase throughput when producing with the T408. You probably wouldn’t attempt this approach if quality suffered, but what if cascaded scaling improved quality? Sound far-fetched? Read on.

Quality Results

Table 2 shows the combined VMAF results for the two clips. Read this by choosing a row and moving from column to column. As you would suspect, green is good, and red is bad. So, reading the Default row, that technique produces the same quality as Cascade – Fast Bilinear with 18.55% less bitrate, but you’d have to boost its bitrate by 12.89% and 11.24%, respectively, to produce the same quality as Cascade – Lanczos and Video Filter – Lanczos.

Table 2. BD-Rate comparisons for the four techniques using the VMAF metric.
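If you want to reproduce the BD-Rate computation outside of Excel, here’s a standard Bjøntegaard-delta sketch in Python; the rate-quality pairs at the bottom are placeholders for illustration, not my test data.

import numpy as np
from numpy.polynomial import polynomial as P

def bd_rate(rates_a, scores_a, rates_b, scores_b):
    # Average % bitrate change of curve B vs. curve A at equal quality,
    # via cubic fits of log-rate against the quality metric.
    fit_a = P.polyfit(scores_a, np.log(rates_a), 3)
    fit_b = P.polyfit(scores_b, np.log(rates_b), 3)
    lo = max(min(scores_a), min(scores_b))
    hi = min(max(scores_a), max(scores_b))
    int_a, int_b = P.polyint(fit_a), P.polyint(fit_b)
    avg = (P.polyval(hi, int_b) - P.polyval(lo, int_b)
           - P.polyval(hi, int_a) + P.polyval(lo, int_a)) / (hi - lo)
    return (np.exp(avg) - 1) * 100

# Placeholder (bitrate kbps, VMAF) points; a negative result means curve B
# needs less bitrate than curve A for the same quality.
print(bd_rate([500, 1000, 1800, 3500], [70, 80, 86, 92],
              [500, 1000, 1800, 3500], [72, 82, 88, 93]))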

From a quality perspective, the Cascade approach combined with the fast bilinear algorithm was the clear loser, particularly compared to either method using the Lanczos algorithm. Even if there was a substantial performance increase, which there wasn’t, it’s hard to see a relevant use case for this algorithm.

The most interesting takeaway was that cascading scaling with the Lanczos algorithm produced the best results, slightly higher than using a video filter with Lanczos. The same pattern emerged for PSNR, where Cascade – Lanc was green in all three columns, indicating the highest-quality approach. 

Table 3. BD-Rate comparisons for the four techniques using the PSNR metric.

Ditto for SSIM.

Table 4. BD-Rate comparisons for the four techniques using the SSIM metric.

The cascading approach delivering better quality than the video filter was an anomaly. Not surprisingly, the engineer noted:

Engineer: It is odd that cascading with Lanczos has better quality than direct scaling. I’m not sure why that would be.

       Me: Makes absolutely no sense. Is anything funky in the two command strings?

Engineer: Nothing obvious but I can look some more.

Later analysis yielded no epiphanies. Perhaps they can come from a reader.

The Net Net

First, the normal caveats: your mileage may vary by codec and content. My takeaways are:

  • Try cascaded scaling with Lanczos with the T408.
  • For software encodes, never use -s again.
  • Use cascade or the simpler video filter approach.
  • With most software-based encoders, faster scaling methods may not deliver performance increases but could degrade quality.

Further, as we all know, there are several, if not dozens, of additional approaches to scaling; if you have meaningful results that prove one is substantially better, please share them with me via THIS email.

Finally, taking a macro view, it’s worth remembering that a $12,000+ workstation could only produce 25 fps when producing a live 4K ladder in HEVC using x265’s ultrafast preset. Sure, there are faster software encoders available. Still, hardware encoding is the best answer for affordable live 4K transcoding from both an OPEX and CAPEX perspective.

Command Strings:

Default:

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_12M_default.mp4 ^

-s 2560x1440 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_7M_default.mp4  ^

-s 1920x1080 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_3_5M_default.mp4 ^

-s 1920x1080 -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_1_8M_default.mp4 ^

-s 1280x720  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_1M_default.mp4 ^

-s 640x360  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_500K_default.mp4

Cascade – Fast Bilinear

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=fast_bilinear,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=fast_bilinear,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=fast_bilinear,split=2[out720p][in720p];[in720p]scale=640:360:flags=fast_bilinear[out360p]" ^

-map [out4k] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_cascade_12M_fast_bi.mp4 ^

-map [out1440p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_cascade_7M_fast_bi.mp4  ^

-map [out1080p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_3_5M_fast_bi.mp4 ^

-map [out1080p2] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_1_8M_fast_bi.mp4 ^

-map [out720p]  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_8_bit_cascade_1M_fast_bi.mp4 ^

-map [out360p]  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_8_bit_cascade_500K_fast_bi.mp4

Cascade – Lanczos

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-filter_complex "[0:v]split=2[out4k][in4k];[in4k]scale=2560:1440:flags=lanczos,split=2[out1440p][in1440p];[in1440p]scale=1920:1080:flags=lanczos,split=3[out1080p][out1080p2][in1080p];[in1080p]scale=1280:720:flags=lanczos,split=2[out720p][in720p];[in720p]scale=640:360:flags=lanczos[out360p]" ^

-map [out4k] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_8_bit_cascade_12M_lanc.mp4 ^

-map [out1440p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_8_bit_cascade_7M_lanc.mp4  ^

-map [out1080p] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_3_5M_lanc.mp4 ^

-map [out1080p2] -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_8_bit_cascade_1_8M_lanc.mp4 ^

-map [out720p]  -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_8_bit_cascade_1M_lanc.mp4 ^

-map [out360p]  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_cascade_500K_lanc.mp4

Video Filter – Lanczos

c:\ffmpeg\bin\ffmpeg -y -i  football_4K30_all_264_short.mp4 -y ^

-c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 12M -maxrate 12M  -bufsize 24M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_4K_12M_filter_lanc.mp4 ^

-vf scale=2560:1440 -sws_flags lanczos -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 7M -maxrate 7M  -bufsize 14M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_2K_7M_filter_lanc.mp4  ^

-vf scale=1920:1080 -sws_flags lanczos  -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 3.5M -maxrate 3.5M  -bufsize 7M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_3_5M_filter_lanc.mp4 ^

-vf scale=1920:1080 -sws_flags lanczos  -c:v libx265 -an -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1.8M -maxrate 1.8M  -bufsize 3.6M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_1080p_1_8M_filter_lanc.mp4 ^

-vf scale=1280:720 -sws_flags lanczos -c:v libx265 -an  -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v 1M -maxrate 1M  -bufsize 2M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 Fball_x265_720p_1M_filter_lanc.mp4 ^

-vf scale=640:360 -sws_flags lanczos  -c:v libx265 -force_key_frames expr:gte^(t,n_forced*2^) -tune psnr -b:v .5M -maxrate .5M  -bufsize 1M -preset ultrafast  -x265-params open-gop=0:b-adapt=0:aq-mode=0:rc-lookahead=16 -report Fball_x265_360p_500K_filter_lanc.mp4

Insights from the Bitmovin Video Developer Report


The Bitmovin Video Developer Report, now in its 6th edition, is one of the most far-reaching and useful documents available to streaming professionals (now with no registration required). It’s a report that I happily download each December and generally refer to frequently during the next twelve months.

Like the proverbial elephant, what you find important in the report depends upon your interests. I typically zero in on video codec usage, encoding practices, and the most important problems and opportunities facing streaming developers. As discussed below, this year’s edition has some surprises, like the fact that more respondents are currently working with H.266/VVC than AV1.

Beyond this, the report also tracks details on development frameworks, content distribution, monetization practices, DRM, video analytics, and many other topics. This makes it extraordinarily valuable to anyone needing a finger on the pulse of streaming industry practices.

Let’s start with some details about how Bitmovin compiles the data and then jump to what I found most interesting.

Gathering the Data

Bitmovin collected the data between June and September 2022. A total of 424 respondents from over 80 countries answered the survey. Geographically, EMEA led the charge with 43%, followed by North America (34%), APAC (14%), and Latin America (8%). Regarding job function, 34% of respondents were manager/CEO/VP level, 23% developer/engineer, 14% technical manager, 10% product manager, 9% architect/consultant, 7% in R&D, and 3% in sales and marketing.

A quarter of respondents worked in OTT streaming services, 21% in online video platforms, 15% for broadcasters, 12% for integrators, 7% for publishers, 6% for telcos, 5% for social media sites, with 10% other. In terms of company size, 35% worked in companies with 300+ employees, 17% 101-300, 19% 51 – 100, and 29% 1 – 50. In other words, a very useful cross-section of geography, industry, job function, and company size.

To be clear, the results are not actual data from Bitmovin’s cloud encoding facility, which would be useful in its own right. Rather, the respondents answered questions about their current practices and future plans in each of the listed topics.

Current and Planned Codec Usage

Figure 1 shows current and planned codec usage for live encoding, with current usage in blue and planned usage in red. The numbers exceed 100% (of course) because most respondents use multiple codecs.

It’s always a surprise to see H.264 at less than 100%, but there’s 78% clear as day. Even given the breadth of industries that responded to the survey, it’s tough to imagine any publisher not supporting H.264.

Figure 1. Answers to the question, “Which streaming formats are you using in production for distribution and which ones are you planning to introduce within the next year?”

HEVC was next at 40%, with AV1 fifth at 18%, bracketed by VP8 (19%) and VP9 (17%), presumably more for WebRTC than OTT. These are the codecs most likely to be used to actually publish video in 2022. Other codecs, presumably implemented by infrastructure providers, include H.266/VVC, a surprising third at 19%, with LCEVC and EVC both at 16%.

Looking ahead, HEVC looks most likely to succeed in 2023, with 43% of respondents planning to implement it, followed by AV1 at 34%, H.264/AVC at 33%, and VVC at 20%. Given that CanIUse lists AV1 browser support at 73% while VVC isn’t even listed, you’d have to assume that actual AV1 deployments in the near term will dwarf H.266/VVC, but you can’t ignore the interest this standards-based codec is receiving from the industry. VOD encoding tracks these results fairly closely for both current and planned usage.

Video Quality Related Findings

Quality is a constant concern for video professionals, and quality-related data appeared in several questions. In terms of challenges faced by respondents, “finding the root cause of quality issues” ranked fifth with 23%, while “quality of experience” ranked ninth, with 19%.

Interestingly, in response to the question, “For which of the following video use cases do you expect to use machine learning (ML) or artificial intelligence (AI) to improve the video experience for your viewers,” 33% cited “video quality optimization,” which ranked third, while 30% cited “quality of experience (QoE),” which ranked fourth.

With so many respondents looking to futuristic means of improving quality, it was ironic that so many ignored content-aware encoding (CAE), a proven method of improving both quality and quality of experience. Specifically, only 33% of respondents were currently using CAE, with 35% planning to implement it within the next 12 months. If you’re not in either of these camps, consider yourself scolded.

Live Encoding Practices

Lastly, I focused on live encoding practices, finding that 53% of respondents use commercial encoders, which presumably include both hardware and software, while 34% encode via open source, which is all software. What’s interesting is how poorly the open-source approach dovetails with both the most significant challenge faced by respondents and the largest opportunity for innovation they perceive.

Figure 2. Answers to the question, “Where do you encode video?”

Specifically, controlling cost was the most significant challenge in the report, selected by 33% of respondents. On a cost-per-stream basis, considering both CAPEX and OPEX, software encoding is far more expensive than encoding with hardware, particularly ASICs.

The most significant opportunity for innovation reported by respondents was live streaming at scale, again at 33%. In this regard, the same lack of throughput that makes CPU-driven open-source encoding the most expensive solution makes it the least scalable. Simply stated, publishers currently encoding with CPU-driven open-source codecs can help address both their biggest challenge and their most significant opportunity by switching to ASIC-based transcoding.

Figure 3. Responses to the question, “Where do you see the most opportunity for innovation in your service?”

Curious? Download our white paper, How to Slash CAPEX, OPEX, and Carbon Emissions Using the NETINT T408 Video Transcoder here. Or, compute how long it will take to recoup your investment in ASIC-based encoding through reduced power costs via calculators available here.

And don’t forget to download the Bitmovin Video Developer Report, here.

How NETINT enables ASIC upgradeability with Software


ASICs provide tremendous energy efficiency yet suffer from being fixed-function with limited programmability. This was a core engineering challenge that we addressed in developing the Codensity ASIC family: upgradeable firmware that can be used for a variety of purposes, including adding new features and improving coding performance and functionality.

To explore these capabilities, we spoke with two members of the NETINT development team: Neil Gunn, NETINT’s Video Firmware Tech Lead, and Savio Lam, a firmware engineer. In this short discussion, they describe how firmware allows Codensity video transcoders and VPUs to evolve and improve long after leaving the foundry.

This conversation focuses mainly on our Codensity G4 ASIC; however, the capability to upgrade firmware applies to all of our ASIC platforms, including the Codensity G5.

What do you do with NETINT?

Neil Gunn

I am a firmware architect and also develop the firmware and, to a lesser extent, the host-side software (libxcoder and FFmpeg) for NETINT transcoding ASICs. I started at NETINT in 2018 working on T408 (Codensity G4 based) firmware development. Then, I moved to Quadra (Codensity G5 based) as a software architect and firmware/software developer. I continue to support the T408 in the background.

Savio Lam

I am a firmware engineer working on our video transcoding products.

What did you do on the T408?

Neil Gunn

I implemented a number of video features in the firmware, such as 10-bit transcoding, closed captions, HDR10, HDR10+, HLG10, Dolby Vision, HRD, Region of Interest, encoder parameter change, etc. I also worked on bug fixes and customer issues.

Savio Lam

I worked on the system design and integration. I mainly developed code that controls how video data comes in and out of our transcoder in the most efficient and reliable way.

What is firmware in an ASIC?

Neil Gunn

The firmware is software that runs on embedded CPUs within the ASIC. The firmware provides a high-level interface to the low-level encoding and decoding hardware. The firmware does a lot of the high-level bitstream processing, such as creating VPS, SPS, and PPS headers, and SEI processing, leaving the ASIC hardware to do the low-level number crunching. Functions that consume a lot of processing and are likely not to change are implemented in hardware.

Savio Lam

To add to what Neil has already described, the firmware in our T408 ASIC manages several significant functions. For example, it comprises code responsible for the NVMe protocol, which allows us to efficiently receive and return up to 8GB/s of video input and output data. To properly consume and process the video data, the firmware sets up and schedules tasks to the appropriate hardware blocks.

Our firmware is also the brain that oversees the bigger picture part of the rate control. In this role, it’s part of a feedback loop that inputs subpicture data from low-level hardware blocks and uses that data to make better decisions that improve picture quality.

To sum up, the firmware is the brain that controls all the hardware blocks in the ASIC and gives instructions to each of them to perform their tasks as efficiently as possible.

How is firmware different from the gates burned into the chip?

Neil Gunn

Firmware, like all software, can be changed, unlike actual gates in a chip. It’s called firmware because it’s a little harder to change than software. Firmware is stored in Flash memory, which can be reprogrammed through an upgrade process. A T408 firmware release typically consists of new host-side software and firmware that must be version-matched for proper operation. Software provided to our customers with the release simplifies the upgrade for one or more T408s in a system.

Savio Lam

There is logic in our T408 ASIC that could have been designed as part of the hardware for better performance. However, that would significantly limit us from adding and improving certain product features to suit different customer needs. We believe we have found the right balance in deciding what should be implemented in firmware versus hardware.

What functions can you adjust and/or improve within firmware?

Neil Gunn

Things like the codec headers, SEIs, and, to a certain extent, rate control can be adjusted and/or improved within the firmware. Some lower-level rate control features are fixed in the hardware, as are the lower-level parts of the encoding standard, since these require a lot of processing and are unlikely to change.

Savio Lam

As Neil said, we are quite flexible when it comes to adding or improving support for different video metadata. And as we both explained earlier, since the firmware is also part of the brain that operates the picture rate control for encoding, we can continue to improve quality to a certain degree post-ASIC development.

Do you have any examples of significant improvements with the T408?

Neil Gunn

We significantly reduced codec delay on both the encoder and decoder. Our low delay mode removes all frame buffering and encodes and decodes a single frame at a time. Our encoder uses a low delay GOP and sets flags in the bitstream appropriately so that another decoder knows that it doesn’t need to add any delay while decoding.

Savio Lam

Based on feedback from different customers, we have made several improvements (or fixes) to our rate control through firmware updates, which improved or resolved some of the video quality-related problems they encountered.

When you hear people say ASICs are obsolete the day they come out of the foundry, what’s your response?

Neil Gunn

It’s not true. It is true that the hardware is fixed in an ASIC. Still, the functions implemented in the hardware are typically the lower-level parts of a video codec standard that do not change over time, so the hardware does not need to be updated. The higher-level parts of the video codecs are in firmware and driver software and can still be changed. For example, the T408 encoder hardware is designed for H.264 and H.265. We cannot add new codecs to the T408, but we can add new features to the existing codecs.

Savio Lam

There is a fine balance between what needs to be implemented in hardware for performance and what needs to be implemented in firmware for flexibility (programmability). We think we struck the perfect balance with the Codensity G4, which is what makes it a great ASIC.

This conversation focuses mainly on our Codensity G4 ASIC; however, the capability to upgrade firmware applies to all of our ASIC platforms, including the Codensity G5.

Computing Payback Period on T408s

One of the most power-hungry processes performed in data centers is software-based live transcoding, which can be performed much more efficiently with ASIC-based transcoders. With power costs soaring and carbon emissions an ever-increasing concern, data centers that perform high-volume live transcoding should strongly consider switching to ASIC-based transcoders like the NETINT T408. Computing the Payback Period is easy with this calculator.

To assist in this transition, NETINT recently published two online calculators that measure the cost savings and payback period for replacing software-based transcoders with T408s. This article describes how to use these calculators and shows that data centers can recover their investment in T408 transcoders in just a few months, even less if you can repurpose servers previously used for encoding for other uses. Most of the data shown are from a white paper that you can access here.

About the T408

Briefly, NETINT designs, develops, and sells ASIC-powered transcoders like the T408, which is a video transcoder in a U.2 form factor containing a single ASIC. Operating in x86 and ARM-based servers, T408 transcoders output H.264 or HEVC at up to 4Kp60 or 4x 1080p60 streams per T408 module and draw only 7 watts.

Simply stated, a single T408 can produce roughly the same output as a 32-core workstation encoding in software while drawing anywhere from 250 – 500 watts of power. You can install up to 24 T408s in a single workstation, which essentially replaces 20 – 24 standalone encoding workstations, slashing power costs and the associated carbon emissions.
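To make those wattage claims concrete, here is a minimal sketch of the annual electricity arithmetic, assuming a 350-watt software encoding workstation (the midpoint of the 250 – 500 watt range above), the T408’s 7-watt draw, and an illustrative power price of $0.25/kWh:

```python
# Illustrative comparison only; the wattages and power price are assumptions
# drawn from the figures cited in this article.
HOURS_PER_YEAR = 24 * 365
COST_PER_KWH = 0.25  # illustrative power price, $/kWh

def annual_cost(watts: float) -> float:
    """Annual electricity cost for a device drawing `watts` continuously."""
    return watts / 1000 * HOURS_PER_YEAR * COST_PER_KWH

print(f"Software workstation: ${annual_cost(350):,.0f}/year")  # ~$766/year
print(f"Single T408:          ${annual_cost(7):,.0f}/year")    # ~$15/year
```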

In a nutshell, these savings are why large video publishers like YouTube and Meta are switching to ASICs. By deploying NETINT’s T408s, you can achieve the same benefits without the associated R&D and manufacturing costs. The new calculators will help you quantify the savings.

Determining the Required Number of T408s

The first calculator, available here, computes the number of T408s required for your production. There are two steps; first, enter the rungs of your encoding ladder into the table as shown. If you don’t know the details of your ladder, you can click the Insert Sample HD or 4K Ladder buttons to insert sample ladders.

After entering your ladder information, insert the number of encoding ladders that you need to produce simultaneously, which in the table is 100. Then press the Compute button (not shown in the Figure but obvious on the calculator).

Calculator 1: Computing the number of required T408 transcoders.

This yields a total of 41 T408s. For perspective, the calculator should be very accurate for streams that don’t require scaling, like 1080p inputs output to 1080p. However, while the T408 decodes and transcodes in hardware, it relies on the host CPU for scaling. If you’re processing full encoding ladders, as we are in this example, throughput will be impacted by the power of the host CPU.

As designed, the calculator assumes that your T408 server is driven by a 32-core host CPU. On an 8-16 core system, expect perhaps 5 – 10% lower throughput. On a 64-core system, throughput could increase by 15 – 20%. Accordingly, please consider the output from this calculator as a good rough estimate accurate to about plus or minus 20%.
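The article doesn’t spell out the calculator’s internal model, but a rough sanity check is easy to sketch: scale each rung’s pixel rate against the T408’s stated capacity of 4x 1080p60 per module. The ladder below is a hypothetical example, and the result carries the same plus-or-minus 20% caveat discussed above:

```python
# Rough approximation of Calculator 1, not its published model: estimate load
# as pixels per second and divide by the T408's stated 4x 1080p60 capacity.
import math

P1080P60 = 1920 * 1080 * 60  # pixels/second for one 1080p60 stream

# Hypothetical HD ladder: (width, height, frames per second) per rung.
ladder = [
    (1920, 1080, 60),
    (1280, 720, 60),
    (854, 480, 30),
    (640, 360, 30),
]

def t408s_required(ladder, simultaneous_ladders, per_t408=4 * P1080P60):
    load = sum(w * h * fps for (w, h, fps) in ladder) * simultaneous_ladders
    return math.ceil(load / per_t408)

print(t408s_required(ladder, 100))  # 40 for this ladder; treat as +/- 20%
```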

To compute the payback period, click the Compute Payback Period button shown in Figure 1. To restart the calculation, refresh your browser.

Computing Payback Period

Computing the payback period requires significantly more information, which is summarized in the following graphic.

Calculator 2: Information needed to compute the payback period.

Step by step

  1. Choose your currency in the drop-down list.

  2. Enter your current cost per kWh. The $0.25/kWh is the approximate UK cost as of March 2022 from this source, which you can also access by clicking the information button to the right of this field. This information button also contains a link to US power costs here.

  3. Enter the number of encoders currently transcoding your live streams. In the referenced white paper, 34 was the number of required servers needed to produce 100 H.264 encoding ladders.

  4. Enter the power consumption per encoding server. The 289 watts shown was the actual power consumption measured for the referenced white paper. If you don’t know your power consumption, click the Info button for some suggested values.

  5. Enter the number of encoding servers that can be repurposed. The T408s will dramatically improve encoding density; for example, in the white paper, it took 34 servers transcoding with software to produce the same streams as five servers with ten T408s each. Since you won’t need as many encoding servers, you can shift them to other applications, which has an immediate economic benefit. If you won’t be able to repurpose any existing servers for some reason, enter 0 here.

  6. Enter the current cost of the encoding servers that can be repurposed. This number will be used to compute the economic benefit of repurposing servers for other functions rather than buying new servers for those functions. You should use the current replacement cost for these servers rather than the original price.

  7. Enter the number of T408s required. If you start with the first calculator, this number will be auto-filled.

  8. Enter your cost for the T408s. $400 is the retail price of the T408 in low quantities. To request pricing for higher volumes, please check with a NETINT sales representative. You can arrange a meeting HERE. 

  9. Enter the power consumption for each T408. The T408 draws 7 watts, which should be auto-filled.

  10. Enter the number of computers needed to host the T408s. You can deploy up to ten T408s in a 1RU server and up to 24 T408s in a 2RU server. We assumed that you would deploy using the first option (10 T408s in a single 1RU) and auto-filled this entry with that calculation. If the actual number is different, enter the number of computers you anticipate buying for the T408s.

  11. Enter the price for computers purchased to run T408s (USD). If you need to purchase new computers to house the T408, enter the cost here. Note that since the T408 decodes incoming H.264 and HEVC streams and transcodes on-board to those formats, most use cases work fine on workstations with 8-16 cores, though you’ll need a U.2 expansion chassis to house the T408s. Check this link for more information about choosing a server to house the T408s. We assumed $3,000 because that was the cost for the server used in the white paper.

    If you’re repurposing existing hardware, enter the current cost, similar to number 6.

  12. Enter the power consumption for the servers (watts). As mentioned, you won’t need a very powerful computer to run the T408s, and CPU utilization and power consumption should be modest because the T408s are doing most of the work. This number is the base power consumption of the computer itself; the power utilized by the T408s is added separately.

When you’ve entered all the data, press the Calculate button.

Interpreting the Results

The calculator computes the payback period under three assumptions:

  • Simple: Payback Period on T408 Purchases
  • Simple: Payback Period on T408 + New Computers
  • Comprehensive: Consider all costs

Figure 3. Simple payback on T408 purchases.

This result divides the cost of the T408 purchases by the monthly savings and shows a payback period of around 11 months. That said, if five servers with T408s essentially replaced 34 servers, unless you’re discarding the 29 servers, the third result is probably a more accurate reflection of the actual economic impact.

Figure 4. Simple: Payback Period on T408 + New Computers

This result includes the cost of the servers necessary to run the T408s, which extends the payback period to about 20.5 months. Again, however, if you’re able to reallocate existing encoding servers to other roles, the third calculation is a more accurate reflection.

Figure 5. Comprehensive: consider all costs.

This result incorporates all economic factors. In this case, the value of the repurposed computers ($145,000) exceeds the costs of the T408s and the computers necessary to house them ($103,600), so you’re ahead the day you make the switch.
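For readers who want to see the arithmetic behind all three results, here is an illustrative sketch built from the white paper figures cited in this article (34 software servers at 289 watts replaced by five 1RU hosts with ten T408s each), plus an assumed $5,000 replacement value per repurposed server. The actual calculator includes inputs not modeled here, so its outputs will differ somewhat:

```python
# Illustrative approximation of the three payback results; not the
# calculator's published model, so the numbers will not match exactly.
HOURS_PER_MONTH = 24 * 30
COST_PER_KWH = 0.25  # UK rate cited above, $/kWh

def monthly_power_cost(units: int, watts: float) -> float:
    """Monthly electricity cost for `units` devices drawing `watts` each, 24/7."""
    return units * watts / 1000 * HOURS_PER_MONTH * COST_PER_KWH

old_cost = monthly_power_cost(34, 289)                             # software servers
new_cost = monthly_power_cost(5, 289) + monthly_power_cost(50, 7)  # hosts + T408s
savings = old_cost - new_cost                                      # monthly savings

t408_capex = 50 * 400   # T408 purchases at $400 each
host_capex = 5 * 3000   # new 1RU host servers at $3,000 each
repurposed = 29 * 5000  # assumed replacement value of freed-up servers

print(f"Simple, T408s only:    {t408_capex / savings:.1f} months")
print(f"Simple, T408s + hosts: {(t408_capex + host_capex) / savings:.1f} months")
net = t408_capex + host_capex - repurposed
print("Comprehensive:", "paid back immediately" if net <= 0
      else f"{net / savings:.1f} months")
```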

However you run the numbers, data centers driving high-volume live transcoding operations will find that ASIC-based transcoders will pay for themselves in a matter of months. If power costs keep rising, the payback period will obviously shrink even further.

2022-Opportunities and Challenges for the Streaming Video Industry

For those in the streaming video industry, 2022 will be remembered as a turbulent year marked by new opportunities, including the emergence of new video platforms and services.

2022 started off with Meta’s futuristic vision of the internet known as the Metaverse. The Metaverse can be described as a combination of virtual reality, augmented reality, and video where users interact within a digital universe. The Metaverse continues to evolve toward unique, individual, one-to-one video streaming experiences, in contrast to the one-to-many video streaming services that are commonplace today.

Recent surveys have shown that two-thirds of consumers are planning to cut back on streaming subscriptions due to rising costs and diminishing discretionary income. With consumers becoming more value-conscious and price-sensitive, Netflix and other platforms have introduced innovative new subscriber models. Netflix’s subscription offering, in addition to SVOD (Subscription Video on Demand), now includes an ad-based tier, AVOD (Advertising Video on Demand).

Netflix shows the way

This new ad-based tier targets the most price-sensitive customers, and AVOD growth is projected to outpace SVOD growth by 3x in 2023. Netflix can potentially earn over $4B in advertising revenue, making it the second-largest ad-supported platform after YouTube. This year also saw Netflix make big moves into mobile cloud gaming with the purchase of its sixth gaming studio. Adding gaming to its product portfolio serves at least two purposes: it expands the number of platforms that can access its game titles, and it provides another service to retain existing subscribers.

These new services and platforms are a small sample of the continued growth in new streaming video services where business opportunities abound for video platforms willing to innovate and take risks.

Stop data center expansion

The new streaming video industry landscape requires platforms to provide innovative new services to highly cost-sensitive customers in a regulatory environment that discourages data center expansion. To prosper in 2023 and beyond, video platforms must address three key issues as they add services and subscribers.

  • Controlling data center sprawl – new services and extra capacity can no longer be contingent on the creation of new and larger data centers.
  • Controlling OPEX and CAPEX – in the current global economic climate, costs need to be controlled to keep prices under control and drive subscriber growth. In addition, amid today’s economic uncertainty, access to financing and capital to fund data center expansion cannot be assumed.
  • Energy consumption and environmental impact are intrinsically linked, and both must be reduced. Governments are now enacting environmental regulations, and platforms that do not adopt green policies do so at their own peril.

Application Specific Integrated Circuit

For a vision of what needs to be done to address these issues, one only needs to glimpse into the recent past at YouTube’s Argos VCU (Video Coding Unit). Argos is YouTube’s in-house designed ASIC (Application Specific Integrated Circuit) encoder that, among other objectives, enabled YouTube to reduce its encoding costs, server footprint, and power consumption. YouTube encodes over 500 hours of content per minute.

To stay ahead of this workload, Google designed its own ASIC, which enabled it to eliminate millions of Intel CPUs. Obviously, not everyone has an in-house ASIC development team, but whether you are a hyperscale, commercial, institutional, or government video platform, NETINT’s Codensity ASIC-powered video processing units are available to you.

To enable faster adoption, NETINT partnered with Supermicro, the global leader in green server solutions. The NETINT Video Transcoding Server is based on a 1RU Supermicro server powered with 10 NETINT T408 ASIC-based video transcoder modules. The NETINT Video Transcoding Server, with its ASIC encoding engine, enables a 20x reduction in operational costs compared to CPU/software-based encoding. The massive savings in operational costs offset the CAPEX associated with upgrading to the NETINT video transcoding server.

Supermicro and T408 Server Bundle

In addition to the extraordinary cost savings, ASIC encoding enables a reduction in server footprint by a factor of 25x or more, with a corresponding reduction in power consumption and, as a bonus, a 25x reduction in carbon emissions. This enables video platforms to expand encoding capacity without increasing their server or carbon footprints, avoiding potential regulatory setbacks.

In need of environmentally friendly technologies

2022 has seen the emergence of many new opportunities with the launch of innovative new video services and platforms. To ensure the business success of these services, in light of global economic uncertainty and geopolitical unrest, video platforms must rethink how these services are deployed and embrace new cost-efficient, environmentally friendly technologies.

Vindral CDN Against Dinosaurs’ Agreement

“One thing is the bill that you’re getting, the other thing is the bill we’re leaving to our children...”

WATCH FULL CONVERSATION HERE: https://youtu.be/tNPFpXPVpxI

We’re going to talk about Vindral – but first, tell us a little bit about RealSprint?

RealSprint: we’re a Swedish company based in Northern Sweden, which is kind of a great place to be running a tech company. When you’re in a university town, any time after September it gets dark outside for most of the day, which means people generally try to find things to do inside. So, it’s a good place to have a tech business because you’ll have people spending a lot of time in front of their screens, creating things. RealSprint is a heavily culture-focused team, with the majority located in Northern Sweden and a few based in Stockholm and in the U.S.

The company started around 10 years ago as a really small team that did not have the end game figured out yet. All they knew was that they wanted to do something around video, broadcasting, and streaming. From there it’s grown, and today we’re 30 people.

At a high level, what is Vindral?

Vindral is actually a product family. There is a live CDN, as you mentioned, and there’s also a video compositing software. As for the live CDN, it’s been running 24/7 for around five or six years.

The product was born because we got questions from our clients about latency and quality: ‘Why do I have to choose between low latency and high quality?’ There are solutions on both ends of that spectrum, but when we got introduced to the problem, there weren’t really any good ones. We started looking into real-time technologies, like WebRTC, in its current state and quickly found that it’s not really suitable if you want high quality. It’s amazing in terms of latency. But the client’s reality requires more. You can’t go all in on only one aspect of a solution. You need something that’s balanced.

Draw us a block diagram. So, you’ve got your encoder, you’ve got your CDN, you’ve got software…

We can take a typical client in entertainment or gaming. They have their content, and they want to broadcast it to a global audience. What they generally do is ingest one signal to our endpoint, which is the most standard way of using our CDN. And there are several transfer protocols they can use for ingest.

The first thing that happens on our end is we create the ABR ladder. We transcode all the qualities that are needed since network conditions vary between markets. Even in places that are well connected, the home Wi-Fi alone can be so bad at times, with a lot of jitter and latency.

After the ABR ladder is created, the next box fans out to the places in the world where there are potential viewers. And from there, we also have edge software as one part of this. Lastly, the signal is received by the player instanced on the device.

That’s basically it.

You’ve got an encoder in the middle of things creating the encoding ladder. Then you’ve got the CDN distributing. What about the software that you’ve contributed? How does that work? Do I log into some kind of portal and then administrate through there?

Exactly. Take a typical client in gaming, for example. They’re running 50 or 100 channels. And they want to see what’s going on in their operations, understand how much data is flowing through the system, and things like that. There is a portal where they can log in, see their usage, and see all of the channel information that they would need. It’s a very important part, of course, of any mature system that the client understands what’s going on.

Encoding is particularly important for us to solve because we have loads of channels running 24/7. So, that’s different. If you’re running a CDN and your typical client is broadcasting for 20 minutes a month, then, of course, the encoding load is much lower. In our case, yes, we do have those types (minimal usage), but many of our clients are heavy users, and they own a lot of content rights. Therefore, the encoding side sees several hundred terabytes ingested monthly, and that’s with only one quality per stream on the ingest side.
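As a back-of-envelope illustration of that scale (the 5 Mbps ingest bitrate below is an assumption for the example, not a figure from the interview), a single 24/7 channel adds up quickly:

```python
# One always-on channel at an assumed 5 Mbps ingest bitrate.
bitrate_mbps = 5
seconds_per_month = 30 * 24 * 3600
tb_per_month = bitrate_mbps / 8 * seconds_per_month / 1e6  # MB/s * s -> TB

print(f"{tb_per_month:.2f} TB/month per channel")  # ~1.62 TB/month
```

At that rate, a few hundred always-on channels reach the hundreds of terabytes per month described above.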

You’re encoding ABR. Which codecs are you supporting? And which endpoints are you supporting?

So, codec-wise, everybody does H.264, of course. That’s the standard when it comes to live streaming with low latency. We have recently added AV1 as well, which was something we announced as a world first. We weren’t the world’s first with AV1, but we were the world’s first with AV1 at what many would call real-time. We call it low latency.

We chose to add it because there’s a market pointing to AV1.

Which devices are you targeting? Is it TV? Smart TV? Mobile? The whole gamut?

I would say the whole gamut. That list of devices is steadily growing. I’m trying to think of any devices that we don’t support. Essentially, as long as it’s using the internet, we deliver to it. Any desktop or mobile browser, including iOS as well.

iOS is, basically, the hardest one. If you’re delivering to browsers on iOS, they’re all running iOS Safari, and we’re getting the same performance on iOS Safari. And then Apple TV, Google Chromecast, Samsung and LG TVs, and Android TVs. There’s a plethora of different devices that our clients require us to support.

4K? 1080p? HDR? SDR?

Yes, we support all of them. One of the very important things for us is to prove that you can get quality on low latency.

Take a typical client. They’re broadcasting sports, and their viewers are used to watching this on their television, maybe a 77-inch or 85-inch TV. You don’t want that user to get a 720p stream. This is where the configurable latency really comes into play, allowing the client to pick a second of latency, or 800 milliseconds, with 4K maintained at that latency. That is one of the use cases where we shine.

There’s also a huge market for lower qualities as well, where that’s important.

So, you mentioned ABR ladders, and yes, there are markets where you get 600 kilobits per second on the last mile. You need a solution for that as well.

Your system is the delivery side, the encoding side. Which types of encoders did you consider when you chose the encoder to fit into Vindral?

There are actually two steps to consider, depending on whether we’re doing it on-prem or off, like a cloud solution. The client often has their own encoders. Many of our clients use Elemental or something similar just to push the material to us. But on the transcoding side, where we generate the ladder, unless we’re passing all qualities through (which is also a possibility), there are, of course, different ways and different directions to go for different scenarios. For example, you can take an Intel CPU-based server and use software to encode. That is a viable option in some scenarios, but not in all.

There’s the Nvidia GPU, for example, which you could use in some scenarios; there are many factors coming into play when making that decision.

The highest priority of all is something that our business generally does badly: maintaining business viability. You want to make sure that any client that is using the system can pay and make their business work. Now, if we have channels that are running 24/7, as we do, and if it’s in a region where it’s not impossible to allocate bare metal or colocation space, then that is a fantastic option in many ways.

CPU-based, GPU-based, and ASIC-based encoding are the three different options that we’ve looked into.

So, how do you differentiate? You talked about software being a good option in some instances. When is it not a good option?

No option is good or bad in a sense, but if you compare them, both the GPU and the ASIC outperform the software encoding when it comes to heavier use.

The software option is useful when you need to spin it up, spin it down, and move things around. You need it to be flexible, which is usually the case in the lower-revenue parts of the market.

When it comes to big broadcasters and large rights holders, where the use case is heavier, with many channels and large usage over time, the GPU and especially the ASIC make a lot of sense.

You’re talking there about density. What about the quality picture? A lot of people think software quality is going to be better than ASICs and GPUs. How do they compare?

It might be in some instances. We’ve found that the quality when using ASICs is fantastic. It all depends on what you want to do. Because we need to understand we’re talking about low latency here. We don’t have the option of two-pass encoding or anything like that. Everything needs to work in real time. Our requirement on encoding is that it takes a frame to encode, and that’s all the time that you get.

You mentioned density, but there are a lot of other things coming into play, quality being one.

If you’re looking at ASICs, you’re comparing them to GPUs. In some scenarios we’ve had for the past two years, the decision could have been based on the availability factor: there’s a chip shortage, so what can I get my hands on? In some cases, we’ve had a client banging on the door, and they want to go live right away.

Going back to the density part. That is a huge game changer because the ASIC is unmatched in terms of the number of streams per rack unit. If you just measure that KPI, and you’re willing to do the job of building your CDN in colocation spaces, which not everybody is, then that’s it. You have to ask yourself, though, who’s going to manage this? If you have thousands of channels running, cost is one thing when it comes to not taking up a lot of rack space, but you also don’t want the operation to bloat too much.

How formal an analysis did you make in choosing between the two hardware alternatives? Did you bring it down to cost per stream and power per stream? Did you do any of that math? How did you make the decision between those two options?

Well, in a way, yes. But on that particular metric, we can look at the two options and say, well, this one is at a tenth of the cost. So I’m not going to give you the exact number, because I know it’s so much smaller.

We’re well aware of what costs are involved, but the cost per stream depends on profiles, etc. Just comparing them, we’ve naturally looked at things like the number of encoding streams we can start, especially in AV1. We look at what the actual performance is, how much load there is, what’s happening on the cards, and how much you can put on them before they start giving in… But then… there’s such a big difference…

Take, for example, a GPU. A great piece of hardware. But it’s also kind of like buying a car for the sound system. Because if I’m buying an NVIDIA GPU to encode video, I might not even be using the actual rendering capabilities, which is the biggest job that the GPU is typically built for. So, that’s one of the comparisons to make, of course.

What about the power side? How important is power consumption to either you yourself or your customers?

If you look at the energy crisis and how things are evolving, I’d say it is very, very important. The typical offer you’ll be getting from the data center is: we’re going to charge you 2x the electricity bill. Historically that was never actually charged, because they didn’t even bother. Only now are we seeing the first invoices coming in where the electricity bill is part of it. In Germany, the energy price peaked in August at 0.7 euros per kilowatt-hour.

Frankfurt, Germany, is one of the major exchanges that is extremely important. If you want performance streaming, you need to have something in Frankfurt. There’s another part of it as well, which is, of course, the environmental aspect. One thing is the bill that you’re getting. The other thing is the bill we’re leaving to our children.

It’s kind of contradictory because many of our clients make travel unnecessary. We have a Norwegian company that we’re working with that is doing remote inspections of ships. They were the first company in the world to do that. Instead of flying in an inspector, the ship owner, and two divers to the location, there’s only one operator of an underwater drone on location. Everybody else is just connected. That’s obviously a good thing for the environment. But what are we doing?

Why did you decide to lead with AV1?

That’s a really good question. There are several reasons why we decided to lead with AV1. It is very compelling as soon as you can do it in real time. We had to wait for somebody to make it viable, which we found with NETINT’s ASIC.

Viable as in high quality, with latency and reliability that we could use, and also, of course, with throughput, so we don’t have to buy too much hardware to get it working.

We’re seeing markers that our clients are going to want AV1, and there are several reasons why that is the case. One of which is, of course, that it’s royalty-free. If you’re a content owner, especially a content owner with a large crowd, with many subscribers to your content, that’s a game-changer, because the cost of licensing a codec can grow to become a significant part of your business expenses.

Look at what’s happening with FAST: free, ad-supported television. There you’re trying to get even more viewers, and you have lower margins, so what you’re doing is creating eyeball minutes. And then, if you have codec license costs, that’s a bit of an issue. It’s better if it’s free.

Is this what you’re hearing from your customers? Or is this what you’re assuming they’re thinking about?

That’s what we’re hearing from our customers, and that’s why we started implementing it.

For us, there’s also the bandwidth-to-quality aspect, which is great. I believe that it will explode in 2023. For example, if you look at what happened one month ago, Google made AV1 hardware decoding mandatory for Android 14 devices. That’s both phones and tablets. It opens so many possibilities.

We were not expecting to get business on it yet, but we are, and I’m happy about that. There are already clients reaching out because of the licensing aspect, as some of them are transmitting petabytes a month. If you can bring down the bandwidth while retaining the quality, that’s a good deal.

You mentioned before that your systems allow the user to dial in the latency and the quality. Could you explain how that works?

It’s important to distinguish between the user and the broadcaster. Our client is the broadcaster that owns the content, and they can pick the latency.

Vindral’s live CDN doesn’t work on a ‘fetch your file’ basis. The way it works is: we’re going to push the stream to you, you’re going to play it out, and this is how much you’re going to buffer. Once you have that set up, and, of course, a lot of sync algorithms and things like that at work, the stream is not allowed to drift.

A typical use case is live auctions, for example. The typical setup for live auctions is 1080p, and you want below one second of latency because people are bidding. There are also people bidding in the actual auction house, so there’s the fairness aspect of it as well.

What we typically see is they configure maybe a 700-millisecond buffer, and it makes it possible. Even that small of a buffer makes such a huge difference. What we see in our metrics is that, basically, 99% of the viewers are getting the highest quality stream across all markets. That’s a huge deal.

How much does the quality drop off? What’s the lowest latency you support, and how much does the quality drop off at that latency compared to one or two seconds?

I would say that the lowest we would recommend using our system for is 500 milliseconds. That would be about 250 milliseconds slower than a WebRTC-based real-time solution. And why do I say that? Because below that, I see no reason to use our approach. If you don’t want a buffer, you may as well use something else.

Actually, we don’t have that many clients trying that out, because for most of them, 500 milliseconds is the lowest anybody sets. And they’ve been like, ‘this is so quick, we don’t need anything more.’ And it retains 4K at that latency.

How does the pitch work against WebRTC? If I’m a potential customer of yours and you come in and talk about your system compared to WebRTC, what are the pros and cons of each? It’s an interesting technological decision. I know that WebRTC is potentially going to be lower latency, but it might be only one stream, it may not come with captioning, and it’s not going to have ABR. It’s interesting to hear how you differentiate.

Let’s look at it from the perspective of when you should be using which. If you need to have a two-way voice conversation, you should use WebRTC. There are actually studies showing that if you bring the latency above 200 milliseconds, the conversation starts feeling awkward. If you have half a second, it is possible, but it’s not good. So, if that’s an ultimate requirement, then WebRTC all day long.

Both technologies are actually very similar. The main difference I would point out is that we have added this buffer that the platform owner can set, so the player instance runs at that buffer level. WebRTC currently does not support that. If it did, we might even implement it as an option, and it might go that way at some point. Today it’s not there.

On the topic of differences, then: if 700 or 600 milliseconds of latency is good for you and quality is still important, then you should be using a buffer and using our solution. When you’re considering different vendors, the feature set, and what you’re actually getting in the package, there are huge differences. For some vendors, on their lower-tier products, ABR is not included. Things like that, where the obvious thing is: you should be using ABR. Definitely.

You talked about the shortest. What’s the longest latency you see people dialing in?

We’ve actually had one use case in Hong Kong where they chose to set the latency at 3.7 seconds. That was because the television broadcast was at 3.7 seconds.

That’s the other thing. We talk a lot about latency. Latency is a hot topic, but honestly, many of our clients value synchronization even above latency. Not all clients, but some of them.

If you have a game show where you want to react to the chat and have some sort of interactivity… Maybe you have 1.5 seconds. That’s not a big issue if it’s at 1.5 seconds of latency. You will, naturally, get a little bit more stability since you’re increasing the buffer. Some of our clients have chosen to do that.

But around 3.5… That’s actually the only client we’ve had that has done that. But I think there could be more in the future. Especially in sports. If you have the satellite broadcast… It is at seven seconds of latency. We can match it to within hundreds of milliseconds.

And the advantage of higher latency is going to be stream stability and quality. Do you know what the quality difference is going to be?

Definitely. However, as soon as you’re above even one second, the returns are diminishing. It’s not like it unlocks this whole universe of opportunities. In extreme markets, it might, but I would think that if you’re going above two seconds, you’re kind of done. There is no need to go higher. At least our clients have not found that need. The markets basically range from East Asia to South America and South Africa, because we’ve expanded our CDN into those parts.

You’ve spoken a couple of times about where you install your equipment, and you’re talking about colocating and things like that. What does your typical server look like? How many encoders are you putting in it? And what type of density are you expecting from that?

In general, it would be something like: one server can do 10 times as many streams if you’re using the ASIC than if you’re using GPUs, like Nvidia’s, for example. I’m not stating exact numbers, because my IT guys are going to tell me that I was wrong.

What is the cost of low latency? If I decide to go to the smallest setting, what is that going to cost me? I guess there’s going to be a quality answer, and there’s going to be a stability answer… Is there a hard economic answer?

My hope is that there shouldn’t be a cost difference, depending on the region. The way we’ve chosen to operate comes down to the design paradigm of the product you’ve created. We have competitors that are going with one partner: they’ve picked cloud vendor X, and they’re running everything in that cloud. And then what they can do is limited by the deal with that cloud vendor.

For example, we had an AV1 request from Greece: huge egress for an internet TV channel that I was blown away by, and they mentioned their pricing. They wanted to save costs by cutting their traffic using AV1. What we did with that request is we went out to our partners and vendors and asked them: can you help us match this? And we did. From a business perspective, it might, in some cases, cost more. But there is also a perception of high cost that plagues the low latency business, and that is because many of these companies have not considered their power consumption or their form factors.

It’s about being willing to take a CAPEX investment instead of just running in the cloud and paying as you go. Those are the things we’ve chosen to put the time into so that there will not be that big a difference.

Take, for example, Tata Communications, one of our biggest partners, and their pricing. They’re running our software stack in their environments to run their VDM, and it’s at cost parity. So that’s something that should always be the aim. I’m not going to say it’s always going to be like that, but that’s the short version when you’re talking about the business implications.

We’re often getting requests where the potential client has this notion that it’s going to be a very high cost. Then they find that this makes sense, and we can build a business.

Are you seeing companies moving away from the cloud towards creating their own colocated servers with encoders and producing that way, as opposed to paying cents per minute to different cloud providers?

I would say I’m seeing the opposite. We’re doing both, just to be clear. I think the way to go is to do a hybrid.

Some clients are going to be broadcasting 20 minutes a month. Cloud is awesome for that: you spin it up when you need it, and you kill it when it’s done. But that’s not always going to cut it. If you’re asking what motion I’m seeing in the market, there are more and more of these companies that are deploying across one cloud, and that’s where everything resides. There are also offerings that you can instance yourself in third-party clouds, which is also an option. But again, it’s a design choice that it’s a cloud service built on underlying cloud functions. It’s a shame that it’s not more of both. It creates an opportunity for us, though.

What are the big trends that you’re chasing for 2023 and beyond? What are you seeing? What forces are going to impact your business? The new features you’re going to be picking up? What are the big technology directions you’re seeing?

I mean, for us on our roadmap, we have been working hard on our partner strategy, and we’ve been seeing a higher demand for white-label solutions, which is what we’re working on with some partners.

We’ve done a few of those installs, and that’s where we’re putting a lot of effort, because we’re running our own CDN, but we can also enable others to do it, even as a managed service. You have these telcos that may have had an edge offering before, and they’re sitting on tons of equipment and fiber. So that’s one thing.

If we’re making predictions, there are two things worth mentioning. I would expect the sports betting markets, especially in the US, to explode. That’s something we are definitely keeping our eyes on.

Maybe live shopping becomes a thing outside of China. Many of the big players, the big retailers, and even financial companies are working on their own live shopping offerings.

The dinosaurs’ agreement?

Have I told you about the dinosaurs’ agreement? It’s comparable to a gentleman’s agreement. This might be provocative to some. And I get that it’s complicated in many cases.

There is, among some of the bigger players and also among independent consultants that have different stakes, a sort of mutual agreement to keep asking the question: do we really need low latency? Or do we really need synchronization?

And while it’s a valid question, it’s also kind of a self-fulfilling prophecy. Because as long as the bigger brands are not creating the experience that the audience is waiting for them to create, nobody’s going to have to move. So that is what I’m calling the dinosaurs here. They’re holding on to the thing that they’ve always been doing. And they’re optimizing that, but not moving on to the next generation. And the problem they’re going to be facing, hopefully, is that when it reaches critical mass, the viewers are going to start expecting it, and that’s when things might start changing.

There are many workflow considerations, of course. There are tech legacy considerations. There are cost considerations and different aspects when it comes to scaling. However, saying that you don’t need low latency is a bit of an excuse.
