Understanding the Economics of Transcoding

Whether your business model is FAST or subscription-based premium content, your success depends on your ability to deliver a high-quality viewing experience while relentlessly reducing costs. Transcoding is one of the most expensive production-related costs and the ultimate determinant of video quality, so it plays a huge role on both sides of this equation. This article identifies the most relevant metrics for ascertaining the true cost of transcoding and then uses these metrics to compare the relative cost of the available methods for live transcoding.

Economics of Transcoding: Cost Metrics

There are two potential cost categories associated with transcoding: capital costs and operating costs. Capital costs arise when you buy your own transcoding gear, while operating costs apply when you operate this equipment or use a cloud provider. Let’s discuss each in turn.

Economics of Transcoding: CAPEX

The simplest way to compare transcoders is to normalize capital and operating costs using the cost per stream or cost per ladder, which simplifies comparing disparate systems with different costs and throughput. The cost per stream applies to services inputting and delivering a single stream, while the cost per ladder applies to services inputting a single stream and outputting an encoding ladder.

We’ll present real-world comparisons once we introduce the available transcoding options, but for the purposes of this discussion, consider the simple example in Table 1. The top line shows that System B costs twice as much as System A, while line 2 shows that it also offers 250% of the capacity of System A. On a cost-per-stream basis, System B is actually cheaper.

TABLE 1: A simple cost-per-stream analysis.

The next few lines use this data to compute the number of required systems for each approach and the total CAPEX. Assuming that your service needs 640 simultaneous streams, the total CAPEX for System A dwarfs that of System B. Clearly, just because a particular system costs more than another doesn’t make it the more expensive option.
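The arithmetic behind this kind of comparison can be sketched in a few lines of Python. The prices and stream counts below are hypothetical placeholders rather than the Table 1 values; they only preserve the relationship described above (System B costs twice as much as System A and delivers 250% of its capacity), along with the 640-stream requirement.

```python
import math

def capex_summary(price, streams_per_system, required_streams):
    """Return (cost per stream, systems needed, total CAPEX) for one system type."""
    cost_per_stream = price / streams_per_system
    # You can't buy a fraction of a server, so round up.
    systems_needed = math.ceil(required_streams / streams_per_system)
    total_capex = systems_needed * price
    return cost_per_stream, systems_needed, total_capex

# Hypothetical figures: B costs 2x as much as A and has 2.5x the capacity.
system_a = capex_summary(price=10_000, streams_per_system=16, required_streams=640)
system_b = capex_summary(price=20_000, streams_per_system=40, required_streams=640)

for name, (per_stream, count, total) in (("A", system_a), ("B", system_b)):
    print(f"System {name}: ${per_stream:,.0f}/stream, {count} systems, ${total:,} total")
```

With these placeholder numbers, the more expensive system still produces the lower total CAPEX because fewer units are needed to reach 640 streams.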

For the record, the throughput of a particular server is also referred to as density, and it directly affects OPEX. System B delivers over six times the streams from the same 1RU rack as System A, so it is much denser, which will directly impact both power consumption and storage charges.

Details Matter

Several factors complicate the otherwise simple analysis of cost per stream. First, you should analyze using the output codec or codecs, current and future. Many systems output H.264 quite competently but choke considerably with the much more complex HEVC codec. If AV1 may be in your future plans, you should prioritize a transcoder that outputs AV1 and compare cost per stream against all alternatives.

The second requirement is to use consistent output parameters. Some vendors quote throughput at 30 fps, some at 60 fps. You need to use the same value for all transcoding options. As a rough rule of thumb, if a vendor quotes 60 fps, you can double the throughput for 30 fps, so a system that can output 8 1080p60 streams can likely output 16 1080p30 streams. You should verify this before buying.
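As a minimal sketch of that rule of thumb, the helper below rescales a quoted stream count to a common frame rate. It assumes throughput scales linearly with frames per second, which is only an approximation; the function name is made up for this example.

```python
def normalize_streams(quoted_streams, quoted_fps, target_fps):
    """Rescale a vendor's quoted stream count to a common frame rate.

    Assumes throughput is proportional to total frames per second,
    which is a rule of thumb rather than a guarantee.
    """
    return quoted_streams * quoted_fps // target_fps

# The example from the text: 8 streams quoted at 1080p60, compared at 30 fps.
streams_at_30 = normalize_streams(quoted_streams=8, quoted_fps=60, target_fps=30)
print(streams_at_30)
```

Treat the result as a starting estimate for spec-sheet comparisons, not a substitute for testing.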

If a vendor quotes in streams and you’re outputting encoding ladders, it’s more complicated. Encoding ladders involve scaling to lower resolutions for the lower-quality rungs. If the transcoder performs scaling on-board, throughput should be greater than systems that scale using the host CPU, and you can deploy a less capable (and less expensive) host system.

The last consideration involves the concept of “operating point,” or the encoding parameters that you would likely use for your production, and the throughput and quality at those parameters. To explain, most transcoders include encoding options that trade off quality vs throughput much like presets do for x264 and x265. Choosing the optimal setting for your transcoding hardware is often a balance of throughput and bandwidth costs. That is, if a particular setting saves 10% bandwidth, it might make economic sense to encode using that setting even if it drops throughput by 10% and raises your capital cost accordingly. So, you’d want to compute your throughput numbers and cost per stream at that operating point.
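To make the trade-off concrete, here is a rough sketch that compares two operating points over five years. All of the dollar figures are hypothetical, and the model is deliberately simple: it only considers CAPEX and bandwidth. Note that a setting that cuts throughput by 10% increases the number of systems you need by about 11% (1 / 0.9), not exactly 10%.

```python
def five_year_cost(capex, annual_bandwidth_cost, throughput_factor=1.0,
                   bandwidth_factor=1.0, years=5):
    """Total CAPEX plus bandwidth cost at one operating point.

    throughput_factor < 1 means fewer streams per system, so CAPEX
    scales up by 1 / throughput_factor to keep the same capacity.
    bandwidth_factor < 1 means the setting lowers delivery bandwidth.
    """
    scaled_capex = capex / throughput_factor
    scaled_bandwidth = annual_bandwidth_cost * bandwidth_factor * years
    return scaled_capex + scaled_bandwidth

# Hypothetical inputs, not figures from this article.
baseline = five_year_cost(capex=320_000, annual_bandwidth_cost=500_000)
slower_setting = five_year_cost(capex=320_000, annual_bandwidth_cost=500_000,
                                throughput_factor=0.9, bandwidth_factor=0.9)

print(f"baseline: ${baseline:,.0f}")
print(f"10% slower, 10% less bandwidth: ${slower_setting:,.0f}")
```

Whether the slower setting wins depends entirely on how large your bandwidth bill is relative to your hardware spend, which is why the operating point has to be chosen per service.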

In addition, many transcoders produce lower throughput when operating in low latency mode. If you’re transcoding for low-latency productions, you should ascertain whether the quoted figures in the spec sheets are for normal or low latency.

For these reasons, completing a thorough comparison requires a two-step analysis. Use spec sheet numbers to identify transcoders that you’d like to consider and acquire them for further testing. Once you have them in your labs you can identify the operating point for all candidates, test at these settings, and compare them accordingly.

Economics of Transcoding: OPEX - Power

Now, let’s look at OPEX, which has two components: power and storage costs. Table 2 continues our example, looking at power consumption.

Unfortunately, ascertaining power consumption may be complicated if you’re buying individual transcoders rather than a complete system. That’s because while transcoder manufacturers often list the power consumed by their devices, you can only run these devices in a complete system. Within the system, power consumption will vary by the number of units configured in the system and the specific functions performed by the transcoder.

Note that the most significant contributor to overall system power consumption is the CPU. Referring back to the previous section, a transcoder that scales onboard requires less work from the host CPU than a system that scales using the host CPU, reducing overall CPU power consumption. Along the same lines, a system without a hardware transcoder uses the CPU for all functions, maxing out CPU utilization and likely consuming about the same energy as a system loaded with transcoders that collectively might consume 200 watts.

Again, the only way to achieve a full apples-to-apples comparison is to configure the server as you would for production and measure power consumption directly. Fortunately, as you can see in Table 2, stream throughput is a major determinant of overall power consumption. Even if you assume that Systems A and B both consume the same power, System B’s throughput makes it much cheaper to operate over a five-year expected life, and much kinder to the environment.

TABLE 2. Computing the watts per stream of the two systems.
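A watts-per-stream calculation of this kind can be sketched as follows. The wattage, stream counts, and electricity price are hypothetical stand-ins, not the Table 2 values; the sketch assumes both systems draw the same power and run 24/7 for five years.

```python
HOURS_PER_YEAR = 24 * 365

def power_metrics(system_watts, streams_per_system, price_per_kwh, years=5):
    """Return (watts per stream, power cost per stream over the period)."""
    watts_per_stream = system_watts / streams_per_system
    kwh = system_watts * HOURS_PER_YEAR * years / 1000
    cost_per_stream = kwh * price_per_kwh / streams_per_system
    return watts_per_stream, cost_per_stream

# Hypothetical inputs: identical 400 W draw, different stream counts.
a_watts, a_cost = power_metrics(400, streams_per_system=16, price_per_kwh=0.20)
b_watts, b_cost = power_metrics(400, streams_per_system=40, price_per_kwh=0.20)

print(f"System A: {a_watts:.1f} W/stream, ${a_cost:.2f} per stream over five years")
print(f"System B: {b_watts:.1f} W/stream, ${b_cost:.2f} per stream over five years")
```

Because the power draw is held constant, the per-stream cost difference comes entirely from density.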

Economics of Transcoding: Storage Costs

Once you purchase the systems, you’ll have to house them. While these costs are easiest to compute if you’re paying for a third-party co-location service, you’ll have to estimate costs even for in-house data centers. Table 3 continues the five-year cost estimates for our two systems, and the denser System B proves much cheaper to house as well as power.

TABLE 3: Computing the storage costs for the two systems.

Economics of Transcoding: Transcoding Options

These are the cost fundamentals. Now let’s explore them within the context of different encoding architectures.

There are three general transcoding options: CPU-only, GPU, and ASIC-based. There are also FPGA-based solutions, though these will probably be supplanted by cheaper-to-manufacture ASIC-based devices over time. Briefly:

  • CPU-based transcoding, also called software-based transcoding, relies on the host central processing unit, or CPU, for all transcoding functions.
  • GPU-based transcoding refers to Graphics Processing Units, which are developed primarily for graphics-related functions but may also transcode video. These are added to the server as add-in PCIe cards.
  • ASICs are Application-Specific Integrated Circuits designed specifically for transcoding. These are added to the server as add-in PCIe cards or devices that conform to the U.2 form factor.

Economics of Transcoding: Real-World Comparison

NETINT manufactures ASIC-based transcoders and video processing units. Recently, we published a case study where a customer, Mayflower, rigorously and exhaustively compared these three alternatives, and we’ll share the results here.

By way of background, Mayflower’s use case needed to input 10,000 incoming simultaneous streams and distribute over a million outgoing simultaneous streams worldwide at a latency of one to two seconds. Mayflower hosts a worldwide service available 24/7/365.

Mayflower started with 80-core bare metal servers and tested CPU-based transcoding, then GPU-based transcoding, and then two generations of ASIC-based transcoding. Table 4 shows the net/net of their analysis, with NETINT’s Quadra T2 delivering the lowest cost per stream and the greatest density, which contributed to the lowest co-location and power costs.

RESULTS: COST AND POWER

TABLE 4. A real-world comparison of the cost per stream and OPEX associated with different transcoding techniques.

As you can see, the T2 delivered an 85% reduction in CAPEX and roughly 90% reductions in OPEX compared to CPU-based transcoding. CAPEX savings compared to the NVIDIA T4 GPU were about 57%, with OPEX savings of around 70%.

Table 5 shows the five-year cost of the Mayflower T2-based solution using the cost per kWh in Cyprus of $0.335. As you can see, the total is $2,225,241, a number we’ll return to in a moment.

TABLE 5: Five-year cost of the Mayflower transcoding facility.

Just to close a loop, Tables 1, 2, and 3 compare the cost and performance of a Quadra Video Server equipped with ten Quadra T1U VPUs (Video Processing Units) with CPU-based transcoding on the same server platform. You can read more details on that comparison here.

Table 6 shows the total cost of both solutions. In terms of overall outlay, meeting the transcoding requirements with the Quadra-based System B costs 73% less than the CPU-based system. If that sounds like a significant savings, keep reading. 

TABLE 6: Total cost of the CPU-based System A and Quadra T2-based System B.

Economics of Transcoding: Cloud Comparison

If you’re transcoding in the cloud, all of your costs are OPEX. With AWS, you have two alternatives: producing your streams with Elemental MediaLive or renting EC2 instances and running your own transcoding farm. We considered the MediaLive approach here, and it appears economically unviable for 24/7/365 operation.

Using Mayflower’s numbers, the CPU-only approach required 500 80-core Intel servers running 24/7. The closest CPU in the Amazon EC2 pricing calculator was the 64-core c6i.16xlarge, which, under the EC2 Instance Savings Plan, with a 3-year commitment and no upfront payment, costs $1,125.84/month.

FIGURE 1. The annual cost of the Mayflower system if using AWS.

We used Amazon’s pricing calculator to roll these numbers out to 12 months and 500 simultaneous servers, and you see the annual result in Figure 1. Multiply this by five to get to the five-year cost of $33,775,056, which is 15 times the cost of the Quadra T2 solution, as shown in Table 5.
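If you want to sanity-check this arithmetic, the sketch below rebuilds the five-year figure from the monthly rate and computes the ratio to the Table 5 total. All inputs are the numbers quoted in this article; multiplying out the rounded monthly rate lands within a few hundred dollars of the calculator total, presumably because the calculator bills by the hour.

```python
MONTHLY_RATE_USD = 1_125.84        # c6i.16xlarge rate quoted above
SERVERS = 500                      # Mayflower's CPU-only server count
YEARS = 5
AWS_FIVE_YEAR_USD = 33_775_056     # calculator-based total quoted above
QUADRA_FIVE_YEAR_USD = 2_225_241   # Table 5 total quoted above

# Rough reconstruction from the rounded monthly rate.
estimate = MONTHLY_RATE_USD * 12 * SERVERS * YEARS

# Ratio of the quoted AWS total to the quoted Quadra T2 total.
ratio = AWS_FIVE_YEAR_USD / QUADRA_FIVE_YEAR_USD

print(f"estimate from monthly rate: ${estimate:,.0f}")
print(f"AWS vs. Quadra T2 five-year cost: {ratio:.1f}x")
```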

We ran the same calculation on the 13 systems required for the Quadra Video Server analysis shown in Tables 1-3, which was powered by a 32-core AMD CPU. Assuming a c6a.8xlarge instance with a 3-year commitment and no upfront payment, this produced an annual charge of $79,042.95, or $395,214.75 for the five-year period, which is about 8 times more costly than the Quadra-based solution.

FIGURE 2: The annual cost of an AWS system per the example schema presented in Tables 1-3.

Cloud services are an effective means for getting services up and running, but are vastly more expensive than building your own encoding infrastructure. Service providers looking to achieve or enhance profitability and competitiveness should strongly consider building their own transcoding systems. As we’ve shown, building a system based on ASICs will be the least expensive option.

In August, NETINT held a symposium on Building Your Own Live Streaming Cloud. The on-demand version is available for any video engineer seeking guidance on which encoder architecture to acquire, the available software options for transcoding, where to install and run your encoding servers, and progress made on minimizing power consumption and your carbon footprint.

ON-DEMAND: Building Your Own Live Streaming Cloud

ASIC vs. CPU-Based Transcoding: A Comparison of Capital and Operating Expenses

As the title suggests, this post compares CAPEX and OPEX costs for live streaming using ASIC-based transcoding and CPU-based transcoding. The bottom line?

Figure 1. The 1 RU Deep Edge Appliance with ten NETINT T408 U.2 transcoders.

Jet-Stream is a global provider of live-streaming services, platforms, and products. One such product is Jet-Stream’s Deep Edge OTT server, an ultra-dense scalable OTT streaming transcoder, transmuxer, and edge cache that incorporates ten NETINT T408 transcoders. In this article, we’ll briefly review how Deep Edge compared financially to a competitive product that provided similar functionality but used CPU-based transcoding.

About Deep Edge

Jet-Stream Deep Edge is an OTT edge transcoder and cache server solution for telcos, cloud operators, compounds, and enterprises. Each Deep Edge appliance converts up to 80 1080p30 television channels to OTT HLS and DASH video streams, with a built-in cache enabling delivery to thousands of viewers without additional caches or CDNs.

Each Deep Edge appliance can run individually, or you can group multiple systems into a cluster, automatically load-balancing input channels and viewers per site without the need for human operation. You can operate and monitor Edge appliances and clusters from a cloud interface for easy centralized control and maintenance. In the case of a backlink outage, the edge will autonomously keep working.

Figure 2. Deep Edge operating schematic.

Optionally, producers can stream access logs in real-time to the Jet-Stream cloud service. The Jet-Stream Cloud presents the resulting analytics in a user-friendly dashboard so producers can track data points like the most popular channels, average viewing time, devices, and geographies in real-time, per day, week, month, and year, per site, and for all the sites.

Deep Edge appliances can also act as a local edge for both the internal OTT channels and Jet-Stream Cloud’s live streaming and VOD streaming Cloud and CDN services. Each Deep Edge appliance or cluster can be linked to an IP-address, IP-range, AS-number, country, or continent, so local requests from a cell tower, mobile network, compound, football stadium, ISP, city, or country to Jet-Stream Cloud are directed to the local edge cache. Each Deep Edge site can be added to a dynamic mix of multiple backup global CDNs, to tune scale, availability, and performance and manage costs.

Under the Hood

Each Deep Edge appliance incorporates ten NETINT T408 transcoders into a 1RU form factor driven by a 32-core CPU with 128 GB of RAM. This ASIC-based acceleration is over 20x more efficient than encoding software on CPUs, decreasing operational cost and CO2 footprint by an order of magnitude. For example, at full load, the Deep Edge appliance draws under 240 watts.

The software stack on each appliance incorporates a Kubernetes-based container architecture designed for production workloads in unattended, resource-constrained, remote locations. The architecture enables automated deployment, scaling, recovery, and orchestration to provide autonomous operation and reduced operational load and costs.

The integrated Jet-Stream Maelstrom transcoding software provides complete flexibility in encoding tuning, enabling multi-bit-rate transcoding in various profiles per individual channel.

Each channel is transcoded and transmuxed in an isolated container, and in the event of a crash, affected processes are restarted instantly and automatically.

HARD QUESTIONS ON HOT TOPICS
Watch the full conversation on YouTube: https://youtu.be/pXcBXDE6Xnk

Deep Edge Proposal

Recently, Jet-Stream submitted a bid to a company with a contract to provide local streaming services to multiple compounds in the Middle East. The prospective customer was fully transparent and shared the costs associated with a CPU-based solution against which Deep Edge competed.

In producing these projections, Jet-Stream used an electricity cost of €0.20 per kilowatt-hour and assumed that the software-based server would draw 400 watts while Deep Edge would draw 220 watts. These numbers are consistent with lab testing we’ve performed at NETINT; each T408 draws only 7 watts of power, and because they transcode the incoming signal onboard, host CPU utilization is typically minimal.
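To see what these assumptions mean for the energy bill alone, here is a rough sketch using the 400-watt and 220-watt figures and the €0.20 rate. It assumes constant draw at those levels around the clock for five years, and it covers electricity only, so it will not reproduce Jet-Stream's tables, which also include CAPEX and other costs. The function name is made up for this example.

```python
HOURS_PER_YEAR = 24 * 365

def energy_cost_eur(watts, units=1, eur_per_kwh=0.20, years=5):
    """Electricity cost for `units` machines drawing `watts` each, 24/7."""
    kwh = watts * units * HOURS_PER_YEAR * years / 1000
    return kwh * eur_per_kwh

cpu_server = energy_cost_eur(400)            # one software-based server
deep_edge = energy_cost_eur(220)             # one Deep Edge appliance
five_cpu_servers = energy_cost_eur(400, units=5)

print(f"one CPU server:   EUR {cpu_server:,.2f}")
print(f"one Deep Edge:    EUR {deep_edge:,.2f}")
print(f"five CPU servers: EUR {five_cpu_servers:,.2f}")
```

The five-server line matters because matching the 80 streams of a single Deep Edge appliance takes five CPU-based servers.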

Jet-Stream produced three sets of comparisons: a single appliance, a two-appliance cluster, and ten sites with two-appliance clusters. Here are the comparisons. Note that the Deep Edge cost includes all software necessary to deliver the functionality detailed above for standard features. In contrast, the CPU-based server cost is hardware-only and doesn’t include the licensing cost of software needed to match this functionality.

Single Appliance

A single Deep Edge appliance can produce 80 streams, which would require five separate servers for CPU-based transcoding. Considering both CAPEX and OPEX, the five-year savings was €166,800.

Table 1. CAPEX/OPEX savings for a single Deep Edge appliance over CPU-based transcoding.

A Two-Appliance Cluster

Two Deep Edge appliances can produce 160 streams, which would require nine CPU-based encoding servers to produce. Considering both CAPEX and OPEX, the five-year savings for this scenario was €293,071.

Table 2. CAPEX/OPEX savings for a dual-appliance Deep Edge cluster over CPU-based transcoding.

Ten Sites with Two-Appliance Clusters

Supporting ten sites with 180 channels would require 20 Deep Edge appliances and 90 servers for CPU-based encoding. Over five years, the CPU-based option would cost over €2.9 million more than Deep Edge.

Table 3. CAPEX/OPEX savings for ten dual-appliance Deep Edge clusters over CPU-based transcoding.

While these numbers border on unbelievable, they are actually quite similar to what we computed in this comparison, How to Slash CAPEX, OPEX, and Carbon Emissions with T408 Video Transcoder, which compared T408-based servers to CPU-only on-premises and AWS instances.

The bottom line is that if you’re transcoding with CPU-based software, you’re paying way too much for both CAPEX and OPEX, and your carbon footprint is unnecessarily high. If you’d like to explore how many T408s you would need to assume your current transcoding workload, and how long it would take to recoup your costs via lower energy costs, check out our calculators here.

Voices of Video: Building Localized OTT Networks
Watch the full conversation on YouTube: https://youtu.be/xP1U2DGzKRo

Mobile cloud gaming and technology suppliers

Video games are a huge market segment, projected to reach US$221.4 billion in 2023, expanding to an estimated US$285 billion by 2027. Of that, cloud gaming grossed an estimated US$3 billion+ in 2022 and is projected to produce over US$12 billion in revenue by 2026.

While the general video game market generates minimal revenue from encoder sales, cloud gaming is the perfect application for ASIC-based transcoding. NETINT products were designed, in part, for cloud gaming and are extensively deployed in cloud gaming overseas. We expect to announce some high-profile domestic design wins in 2023.

If you’re not a gamer, you may not be familiar with what cloud gaming is and how it’s different from PC or console-based gaming. This is the first of several introductory articles to get you up to speed on what cloud gaming is, how it works, who the major players are, and why it’s projected to grow so quickly. 

What is cloud gaming

Figure 1, from this article, illustrates the difference between PC/console gaming and cloud gaming. On top is traditional gaming, where the gamer needs an expensive, high-performance console or game computer to process the game logic and render the output. To the extent that there is a cloud component, say for multiple players, the online server tracks and reports the interactions, but all computational and rendering heavy lifting is performed locally.

Figure 1. The difference between traditional and cloud gaming. From this article.

On the bottom is cloud gaming. As you can see, all you need on the consumer side is a screen and game controller. All of the game logic and rendering are performed in the cloud, along with encoding for delivery to the consumer.

Cloud gaming workflow

Figure 2 shows a high-level cloud workflow – we’ll dig deeper into the cloud gaming technology stack in future articles, but this should help you grasp the concept. As shown, the gamer’s inputs are sent to the cloud, where a virtual instance of the game interprets, executes, and renders the input. The resultant frames are captured, encoded, and transmitted back to the consumer, where the frames are decoded and displayed. 

Figure 2. A high-level view of the cloud side of cloud gaming from this seminal article.

Cloud gaming and consumers' benefits

Cloud gaming services incorporate widely different business models, pricing levels, available games, performance envelopes, and compatible devices. In most cases, however, consumers benefit because:

  • They don’t need a high-performance PC or game console to play games – they can play on most connected devices. This includes some Smart TVs for a true, big-screen experience.
  • They don’t need to download, install, or maintain games on their game platform.
  • They don’t need to buy expensive games to get started.
  • They can play the same game on multiple platforms, from an expensive gaming rig or console to a smartphone or tablet, with all ongoing game information stored in the cloud so they can immediately pick up where they left off.

Publishers benefit because they get instant access to users on all platforms, not just the native platforms the games were designed for. So, console and PC-based games are instantly accessible to all players, even those without the native hardware. Since games aren’t downloaded during cloud gaming, there’s no risk of piracy, and the cloud negates the performance advantages long held by those with the fastest hardware, leveling the playing field for gameplay.

Gaming experience

Speaking of performance, what’s necessary to achieve a traditional local gameplay experience? Most cloud platforms recommend a 10 Mbps download speed at a minimum for mobile, with a wired Ethernet connection recommended for computers and smart TVs. As you would expect, your connection speed dictates performance, with 4K ultra-high frame rate games requiring faster connection speeds than 1080p@30fps gameplay.

As mentioned at the top, cloud gaming is expected to capture an increasing share of overall gameplay revenue going forward, both from existing gamers who want to play new games on new platforms and new gamers. Given the revenue numbers involved, this makes cloud gaming a critical market for all related technology suppliers. 

Is power consumption your company’s priority?

Power consumption is a priority for NETINT customers and a passion for NETINT engineers and technicians. Matthew Ariho, a system engineer in SoC Engineering at NETINT, recently answered some questions about:

  • How to test power consumption
  • Which computer components draw the most power
  • Why using older computers is bad for your power bills, and
  • The best way for video-centric data centers to reduce power consumption.

What are the different ways to test power consumption (and cost)?

Matthew Ariho

There are software and hardware-based solutions to this problem. I use one of each as a means of confirming any results.

One software tool is the IPMItool Linux package, which provides a simple command-line interface to IPMI-enabled devices through a Linux kernel driver. This tool polls the instantaneous, average, peak, and minimum power draw of the system over a sampling period.
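As an illustration, the sketch below wraps ipmitool in a small Python helper. It assumes the readings come from the `ipmitool dcmi power reading` subcommand, which is one common way to get these values but is not necessarily the exact command used in our lab. The sample output is illustrative text rather than a real measurement, and the exact layout can vary by ipmitool version and BMC.

```python
import re
import subprocess

# Illustrative output only; not a measurement from any real system.
SAMPLE_OUTPUT = """
    Instantaneous power reading:                   212 Watts
    Minimum during sampling period:                148 Watts
    Maximum during sampling period:                224 Watts
    Average power reading over sample period:      205 Watts
"""

WATTS_LINE = re.compile(r"^[ \t]*(.+?):[ \t]+(\d+)[ \t]+Watts[ \t]*$", re.MULTILINE)

def parse_power_reading(text):
    """Turn every '<label>: <n> Watts' line into a {label: watts} entry."""
    return {label: int(watts) for label, watts in WATTS_LINE.findall(text)}

def read_power():
    """Query the BMC; requires ipmitool, IPMI hardware, and usually root."""
    result = subprocess.run(
        ["ipmitool", "dcmi", "power", "reading"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_reading(result.stdout)

readings = parse_power_reading(SAMPLE_OUTPUT)
print(readings)
```

Calling `read_power()` in a loop and appending the results to a file gives you the logging that a plug-in meter lacks.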

On the hardware side of things, you can use different forms of multimeters; the Kill-A-Watt meter and a 208VAC power bar are examples of such devices available in our lab.

What are their pros and cons (and accuracy)?

Matthew Ariho

The IPMItool is great because it provides a lot of information and is fairly simple to set up and use. There is a question of reliability, however: because it is software-based, it depends on readings whose source I’m not familiar with.

The multimeters (like the Kill-A-Watt meter), while also simple to use, do not have any logging capabilities, which makes values like average or steady-state power draw difficult to obtain. Both methods have a resolution of 1W, which is not ideal but more than sufficient for our use cases.

What activities do you run when you test power consumption?

Matthew Ariho

We run multi-instances that mimic streaming workloads but only to the point that each of those instances is performing up to par with our standards (for example, 30 fps).

What’s the range of power consumption you’ve seen?

Matthew Ariho

I’ve seen reports of power consumption of up to 450 watts, but personally never tested a unit that drew that much. Typically, without any load on the T408 devices, the power consumption hovers around 150W, which increases to 210 to 220W during peak periods.

What’s the difference between Power Supply rating and actual power consumption (and are they related)?

Matthew Ariho

Power supplies take in 120VAC or 208VAC and convert to various DC voltages (12V, 5V, 3.3V) to power different devices in a computer. This conversion process inherently has several inefficiencies. The extent of these inefficiencies depends on the make of the power supply and the quality of components used.

Power supplies are offered with an efficiency rating that certifies how efficiently a power supply will function at different loads. Because of these conversion losses, power consumption measured at the wall will always be greater than the power supplied to the components within a computer.
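The relationship between the two numbers is simple division: wall draw equals the DC load divided by the power supply's efficiency at that load. The sketch below uses a hypothetical 200W DC load and illustrative efficiency values, not figures from any specific rating table.

```python
def wall_power(dc_load_watts, efficiency):
    """Power drawn at the wall for a given DC load and PSU efficiency (0-1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return dc_load_watts / efficiency

# Hypothetical 200 W load at two illustrative efficiency levels.
at_80_percent = wall_power(200, 0.80)
at_85_percent = wall_power(200, 0.85)

print(f"80% efficient: {at_80_percent:.1f} W at the wall")
print(f"85% efficient: {at_85_percent:.1f} W at the wall")
print(f"difference: {at_80_percent - at_85_percent:.1f} W")
```

A few percentage points of efficiency translate into a constant wattage penalty that runs 24/7, which is why the rating is worth checking.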

What are the hidden sources of excessive power that most people don’t know about?

Matthew Ariho

The operating system of a computer can consume a lot of power performing background tasks, though this has become less of a problem with more efficient CPUs on the market. Another source of excessive power consumption is bloatware: usually unnecessary programs that run in the background.

What distinguishes a power-hungry computer from an efficient one – what should the reader look for?

Matthew Ariho

The power supply rating is something to watch. Small variations in the power supply rating make significant differences in efficiency. The difference between a PSU rated at 80 PLUS and a PSU rated at 80 PLUS Bronze is about 2% to 5% depending on the load. This number only grows with better rated PSUs.

Other factors include the components of the computer. Recently, newer devices (CPUs, GPUs, and motherboards) have delivered significant generational improvements in efficiency. A top-of-the-line computer from 3 years ago simply cannot compete with some mid-range computers in terms of either power efficiency or performance. So, while sourcing older but cheaper components may have been a good decision in the past, nowadays it’s not as clear-cut.

Which components draw the most power?

Matthew Ariho

CPUs and GPUs. Even consumer CPUs can draw over 200W sustained. GPUs on the lower end consume around 150W and, more recently, some draw over 400W.

How does the number of cores in a computer impact power usage?

Matthew Ariho

I’m really not an expert on server components, and it is hard to say without having examples. There are too many options to draw a conclusion about a clear trend. There are AMD 64-core server CPUs that pull about 250 to 270W and 12- to 38-core Intel server CPUs that do about the same. Ultimately, architectural advantages and features determine performance and efficiency when comparing CPUs across manufacturers or even CPUs from the same manufacturer.

You can't manage what you don't measure.

One famous quote attributed to Peter Drucker is that you can’t manage what you don’t measure. As power consumption becomes increasingly important, it’s incumbent upon all of us to both measure and manage it.