Maximizing Video Quality with Zero-Cost Encoding: api.video's Revolutionary Approach

Boosting Video Streaming: Efficient Transcoding for Better Performance & Cost

This case study by Emilien Gobillot outlines api.video’s approach to video transcoding. It describes the company’s strategic pivot from CPU-based to ASIC-based VPU transcoding, using NETINT’s G4 and G5 ASICs for improved power efficiency and cost-effectiveness. As discussed below, switching to ASICs slashed api.video’s encoding costs, enabling the company to offer free transcoding to all customers.

About api.video

We founded api.video with the mission to empower video engineers to effortlessly integrate video into their applications or services. To achieve this, we’ve built and provided a global video backbone with an easy-to-use API for video delivery, transcoding, and live streaming. With our API, our customers can upload their videos at will and count on us to make them quickly available for delivery.

Meeting this demand became a formidable challenge when our customers uploaded their entire video catalogs at once. In tackling high-volume transcoding in the past, we relied on the most common solution: CPUs. However, we found CPUs inefficient for video transcoding, especially with new video codecs like HEVC. To affordably meet our increasing encoding demand, we began considering other transcoding alternatives, including GPUs, FPGAs, and ASICs.

FPGA vs GPU vs ASICs: A Comparison

When we started our research on FPGAs, GPUs, and ASICs, we gathered information from technical marketing presentations on some products (e.g., Intel Flex series cards, which were not yet available on the market at that time) and ran benchmarks in our lab. Mixing measured and marketing figures could have been misleading. Even so, the Codensity G4 and G5 ASICs from NETINT looked to be the clear winners, by an order of magnitude, in both power efficiency and performance per dollar.

FIGURE 1. Comparing power efficiency and performance per dollar of transcoding solutions.

Switching From CPU to ASICs

We followed the theoretical study with real benchmarks using NETINT VPUs (video processing units) and transcoders to validate this decision. Our team found the performance of NETINT’s G4-powered T408 transcoders satisfactory and concluded that we could have gone into production with these cards. However, NETINT’s Quadra G5-based transcoders were even more appealing because of their AV1 support and higher densities. The AI functions associated with the G5 ASIC on the Quadra T1U cards also gave the second-generation product bonus points.

Refocusing on Quadra, we restarted our benchmarks with the G5-based Quadra T1U cards to assess their performance. We also tested different transcoding strategies to maximize transcoding speed and find the optimal density per server. The results matched our expectations, and we formulated a new transcoding strategy adapted to our transcoding video pipeline.

This decision marked a turning point in our infrastructure, opening doors to greater efficiency. With increased capacity and advanced encoding capabilities, we went into production.

How Quadra VPUs Reduced Our Encoding Costs

We purchased and deployed NETINT G5-based Quadra T1U cards and used the following procedure to ascertain comparative transcoding costs. Note that NETINT calls the Quadra a Video Processing Unit, or VPU, a term used in the following discussion.

First, we assumed that the incoming source video was H.264-encoded and that the T1U would transcode the incoming stream into five H.264 renditions for adaptive bitrate (ABR) distribution at these configurations: 1080p30, 720p30, 480p30, 360p30, and 240p30.

To compute the load this transcode job imposed on the VPU, we converted each rendition into megapixels per second (MP/s) and summed the results. The 1080p30, 720p30, 480p30, 360p30, and 240p30 renditions represent roughly 62 MP/s, 28 MP/s, 12 MP/s, 7 MP/s, and 3 MP/s, respectively, for a total of 112 MP/s. Since each VPU can encode 2,000 MP/s to H.264, a single VPU can encode all five renditions of about 17 minutes of video per minute (2000/112 ≈ 17.9, taken as 17).
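The load arithmetic above can be reproduced with a short Python sketch. The frame widths are an assumption (common 16:9 sizes), since the text names only the rendition heights and frame rate, and the function names are hypothetical. The 2,000 MP/s capacity is the figure quoted above.

```python
# Assumed (width, height, fps) per rendition. The widths are NOT stated in the
# text; they are common 16:9 frame sizes chosen for illustration.
RENDITIONS = {
    "1080p30": (1920, 1080, 30),
    "720p30": (1280, 720, 30),
    "480p30": (854, 480, 30),
    "360p30": (640, 360, 30),
    "240p30": (426, 240, 30),
}

# H.264 encode capacity of one VPU, as quoted in the text.
VPU_H264_CAPACITY_MPS = 2000


def megapixels_per_second(width, height, fps):
    """Pixel throughput of one rendition, in megapixels per second."""
    return width * height * fps / 1_000_000


def ladder_load_mps(renditions=RENDITIONS):
    """Total load of encoding every rendition of one source in real time."""
    return sum(megapixels_per_second(*dims) for dims in renditions.values())


def realtime_multiple(capacity_mps=VPU_H264_CAPACITY_MPS, renditions=RENDITIONS):
    """Minutes of source video one VPU can fully transcode per minute."""
    return capacity_mps / ladder_load_mps(renditions)


if __name__ == "__main__":
    for name, dims in RENDITIONS.items():
        print(f"{name}: {megapixels_per_second(*dims):.1f} MP/s")
    print(f"total: {ladder_load_mps():.1f} MP/s")
    print(f"video minutes per VPU-minute: {realtime_multiple():.2f}")
```

With these assumed frame sizes the total comes to about 112 MP/s, and the ratio is a little under 18, which is consistent with the 17 minutes of video per minute used in the cost calculation.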

On the cost side, each server with four T1Us costs $10,000, which we amortized over three years. Power cost was approximately $1,000 per year (420 W at $0.30/kWh) at full consumption.

Using this data, we calculated the average cost of one minute of a 1080p30 video transcoded to five renditions using different usage levels for the server. You see these results in Table 1.

TABLE 1.  Transcoding cost per minute based upon varying usage percentages.

If the server is used only 0.5% of the time (i.e., about 7 minutes per day fully loaded), the cost per minute would be $0.020. If, on average, the server is loaded at 50% (12 hours a day fully loaded), the cost per minute would be 1/100 of that figure. Even at 5% utilization, the cost per minute of video is immaterial, allowing us to make the strategic decision to offer transcoding services at no charge to our customers.
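A rough Python sketch of this kind of cost model follows. It is an illustration built from the figures quoted above, not the exact method behind Table 1: it assumes the yearly cost is fixed (amortized hardware plus power charged at full consumption all year) and that each of the four VPUs handles 17 minutes of video per minute when loaded. The function and parameter names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def cost_per_video_minute(
    utilization,
    server_price=10_000.0,           # USD, server with four T1Us (from the text)
    amortization_years=3,            # from the text
    power_cost_per_year=1_000.0,     # USD at full consumption (from the text)
    vpus_per_server=4,               # from the text
    video_minutes_per_vpu_minute=17, # from the load calculation in the text
):
    """Estimated cost of transcoding one minute of video into five renditions.

    Assumption: the whole yearly cost is treated as fixed, so power is charged
    at full consumption regardless of utilization. `utilization` is the
    fraction of the time the server is fully loaded, between 0 and 1.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    annual_cost = server_price / amortization_years + power_cost_per_year
    video_minutes_per_year = (
        MINUTES_PER_YEAR
        * utilization
        * vpus_per_server
        * video_minutes_per_vpu_minute
    )
    return annual_cost / video_minutes_per_year


if __name__ == "__main__":
    for utilization in (0.005, 0.05, 0.5):
        cost = cost_per_video_minute(utilization)
        print(f"{utilization:.1%} utilization: ${cost:.5f} per video minute")
```

Under these assumptions the 0.5% case comes out at roughly two cents per video minute, and because the yearly cost is fixed, the 50% case is exactly one hundredth of that. The published table may rest on slightly different assumptions, for example power that scales with load.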

What About Video Quality?

In the past, hardware encoders were associated with lower-quality video, but our tests indicate that this isn’t the case with the Quadra VPU. Specifically, we encoded a 1080p30 version of the movie Caminandes 2 via our API and compared the output quality with that of a video encoded with FFmpeg and the x264 encoder using the following command string.

ffmpeg -r 30 -y -i input.mp4 -c:v libx264 -force_key_frames 'expr:gte(t,n_forced*4)' -b:v 4400k -preset medium -c:a copy output.mp4

As shown in Table 2, we compared the two videos using the well-known quality metrics VMAF, PSNR, and SSIM. The scores are all extremely close, with the VPU-encoded video edging out FFmpeg in VMAF and the reverse for PSNR and SSIM. Given how close the scores are, a viewer is unlikely to notice any difference.

TABLE 2.  Quality comparisons with the video Caminandes 2.

(Note: NETINT performed more extensive quality comparisons that you can read here and here).
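For readers unfamiliar with the metrics, PSNR is the simplest of the three. The sketch below shows the standard definition applied to two equal-length sequences of 8-bit sample values. It is a minimal illustration only, not the tooling used for the comparison above, and the function name is hypothetical.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio, in dB, between two equal-length sample sequences.

    Uses the standard definition 10 * log10(max_value^2 / MSE). Identical
    inputs have zero error, so the result is infinite in that case.
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    encoded = [101, 99, 101, 99]  # every sample is off by one
    print(f"PSNR: {psnr(original, encoded):.2f} dB")
```

Higher values mean the encoded samples are closer to the reference. In practice the metric is computed per frame over the decoded video and then averaged.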

To summarize, we built our own infrastructure to choose the optimal transcoding technology and offer our customers the best possible value proposition. By deploying NETINT’s Quadra T1Us, we reduced power consumption and the number of servers by an order of magnitude while offering video quality that matched CPU-based transcoding. That’s how VPUs enabled us to offer free video encoding to our customers.


Emilien Gobillot

Emilien Gobillot, the Head of Infrastructure at api.video, has been immersed in the video streaming industry since 2013. In a previous role, he led a performance engineering team with a mission to optimize the entire video streaming solution, covering all aspects of the video workflow from transcoding to CDN.
During this experience, Emilien recognized the advantages of using GPUs to boost transcoder density. Upon joining api.video, the team identified a significant challenge: transcoding videos at scale while keeping costs low and performance high.
