Choosing Transcoding Hardware: Deciphering the Superiority of ASIC-based Technology

Which technology reigns supreme in transcoding: CPU-only, GPU, or ASIC-based? Kenneth Robinson’s incisive analysis from the recent symposium makes a compelling case for ASIC-based transcoding hardware, particularly NETINT’s Quadra. Robinson’s metrics prioritized viewer experience, power efficiency, and cost. While CPU-only systems appear initially economical, they falter with advanced codecs like HEVC. NVIDIA’s GPU transcoding offers more promise, but the Quadra system still outclasses both in quality, cost per stream, and power consumption. Furthermore, Quadra’s adaptability allows a seamless switch between H.264 and HEVC without incurring additional costs. Independent assessments, such as Ilya Mikhaelis‘, echo Robinson’s conclusions, cementing ASIC-based transcoding hardware as the optimal choice.

Choosing transcoding hardware

During the recent symposium, Kenneth Robinson, NETINT’s manager of Field Application Engineering, compared three transcoding technologies: CPU-only, GPU, and ASIC-based transcoding hardware. His analysis, which incorporated quality, throughput, and power consumption, is useful as a template for testing methodology and for the results. You can watch his presentation here and download a copy of his presentation materials here.

Figure 1. Overall savings from ASIC-based transcoding (Quadra) over GPU (NVIDIA) and CPU.
Figure 1. Overall savings from ASIC-based transcoding (Quadra) over GPU (NVIDIA) and CPU.

As a preview of his findings, Kenneth found that when producing H.264, ASIC-based hardware transcoding delivered CAPEX savings of 86% and 77% compared to CPU and GPU-based transcoding, respectively. OPEX savings were 95% vs. CPU-only transcoding and 88% compared to GPU.

For the more computationally complex HEVC codec, the savings were even greater. As compared to CPU-based transcoding, ASICs saved 94% on CAPEX and 98% on OPEX. As compared to GPU-based transcoding, ASICs saved 82% on CAPEX and 90% on OPEX. These savings are obviously profound and can make the difference between a successful and profitable service and one that’s mired in red ink.

Let’s jump into Kenneth’s analysis.

Determining Factors

Digging into the transcoding alternatives, Kenneth described the three options. First are CPUs from manufacturers like AMD or Intel. Second are GPUs from companies like NVIDIA or AMD. Third are ASICs, or Application Specific Integrated Circuits, from manufacturers like NETINT. Kenneth noted that NETINT calls its Quadra devices Video Processing Units (VPU), rather than transcoders because they perform multiple additional functions besides transcoding, including onboard scaling, overlay, and AI processing.

He then outlined the factors used to determine the optimal choice, detailing the four factors shown in Figure 2. Quality is the average quality as assessed using metrics like VMAF, PSNR, or subjective video quality evaluations involving A/B comparisons with viewers. Kenneth used VMAF for this comparison. VMAF has been shown to have the highest correlation with subjective scores, which makes it a good predictor of viewer quality of experience.

Choosing transcoding hardware - Determining Factors
Figure 2. How Kenneth compared the technologies.

Low-frame quality is the lowest VMAF score on any frame in the file. This is a predictor for transient quality issues that might only impact a short segment of the file. While these might not significantly impact overall average quality, short, low-quality regions may nonetheless degrade the viewer’s quality of experience, so are worth tracking in addition to average quality.

Server capacity measures how many streams each configuration can output, which is also referred to as throughput. Dividing server cost by the number of output streams produces the cost per stream, which is the most relevant capital cost comparison. The higher the number of output streams, the lower the cost per stream and the lower the necessary capital expenditures (CAPEX) when launching the service or sourcing additional capacity.

Power consumption measures the power draw of a server during operation. Dividing this by the number of streams produced results in the power per stream, the most useful figure for comparing different technologies.

Detailing his test procedures, Kenneth noted that he tested CPU-only transcoding on a system equipped with an AMD Epic 32-core CPU. Then he installed the NVIDIA L4 GPU (a recent release) for GPU testing and NETINT’s Quadra T1U U.2 form factor VPU for ASIC-based testing.

He evaluated two codecs, H.264 and HEVC, using a single file, the Meridian file from Netflix, which contains a mix of low and high-motion scenes and many challenging elements like bright lights, smoke and fog, and very dark regions. If you’re testing for your own deployments, Kenneth recommended testing with your own test footage.

Kenneth used FFmpeg to run all transcodes, testing CPU-only quality using the x264 and x265 codecs using the medium and very fast presets. He used FFmpeg for NVIDIA and NETINT testing as well, transcoding with the native H.264 and H.265 codec for each device.

H.264 Average, Low-Frame, and Rolling Frame Quality

The first result Kenneth presented was average H.264 quality. As shown in Figure 3, Kenneth encoded the Meridian file to four output files for each technology, with encodes at 2.2 Mbps, 3.0 Mbps, 3.9 Mbps, and 4.75 Mbps. In this “rate-distortion curve” display, the left axis is VMAF quality, and the bottom axis is bitrate. In all such displays, higher results are better, and Quadra’s blue line is the best alternative at all tested bitrates, beating NVIDIA and x264 using the medium and very fast presets.

Figure 3. Quadra was tops in H.264 quality at all tested bitrates.
Figure 3. Quadra was tops in H.264 quality at all tested bitrates.

Kenneth next shared the low-frame scores (Figure 4), noting that while the NVIDIA L4’s score was marginally higher than the Quadra’s, the difference at the higher end was only 1%. Since no viewer would notice this differential, this indicates operational parity in this measure.

Figure 4. NVIDIA’s L4 and the Quadra achieve relative parity in H.264 low-frame testing.
Figure 4. NVIDIA’s L4 and the Quadra achieve relative parity in H.264 low-frame testing.

The final H.264 quality finding displayed a 20-second rolling average of the VMAF score. As you can see in Figure 5, the Quadra, which is the blue line, is consistently higher than the NVIDIA L4 or medium or very fast. So, even though the Quadra had a lower single-frame VMAF score compared to NVIDIA, over the course of the entire file, the quality was predominantly superior.

Figure 5. 20-second rolling frame quality over file duration.
Figure 5. 20-second rolling frame quality over file duration.

HEVC Average, Low-Frame, and Rolling Frame Quality

Kenneth then related the same results for HEVC. In terms of average quality (Figure 6), NVIDIA was slightly higher than the Quadra, but the delta was insignificant. Specifically, NVIDIA’s advantage starts at 0.2% and drops to 0.04% at the higher bit rates. So, again, a difference that no viewer would notice. Both NVIDIA and Quadra produced better quality than CPU-only transcoding with x265 and the medium and very fast presets.

Figure 6. Quadra was tops in H.264 quality at all tested bitrates.
Figure 6. Quadra was tops in H.264 quality at all tested bitrates.

In the low-frame measure (Figure 7), Quadra proved consistently superior, with NVIDIA significantly lower, again a predictor for transient quality issues. In this measure, Quadra also consistently outperformed x265 using medium and very fast presets, which is impressive.

Figure 7. NVIDIA’s L4 and the Quadra achieve relative parity in H.264 low-frame testing.
Figure 7. NVIDIA’s L4 and the Quadra achieve relative parity in H.264 low-frame testing.

Finally, HEVC moving average scoring (Figure 8) again showed Quadra to be consistently better across all frames when compared to the other alternatives. You see NVIDIA’s downward spike around frame 3796, which could indicate a transient quality drop that could impact the viewer’s quality of experience.

Figure 8. 20-second rolling frame quality over file duration.
Figure 8. 20-second rolling frame quality over file duration.

Cost Per Stream and Power Consumption Per Stream - H.264

To measure cost and power consumption per stream, Kenneth first calculated the cost for a single server for each transcoding technology and then measured throughput and power consumption for that server using each technology. Then, he compared the results, assuming that a video engineer had to source and run systems capable of transcoding 320 1080p30 streams.

You see the first step for H.264 in Figure 9. The baseline computer without add-in cards costs $7,100 but can only output fifteen 1080p30 streams using an average of the medium and veryfast presets, resulting in a cost per stream was $473. Kenneth installed two NVIDIA L4 cards in the same system, which boosted the price to $14,214, but more than tripled throughput to fifty streams, dropping cost per stream to $285. Kenneth installed ten Quadra T1U VPUs in the system, which increased the price to $21,000, but skyrocketed throughput to 320 1080p30 streams, and a $65 cost per stream.

This analysis reveals why computing and focusing on the cost per stream is so important; though the Quadra system costs roughly three times the CPU-only system, the ASIC-fueled output is over 21 times greater, producing a much lower cost per stream. You’ll see how that impacts CAPEX for our 320-stream required output in a few slides.

Figure 9. Computing system cost and cost per stream.
Figure 9. Computing system cost and cost per stream.

Figure 10 shows the power consumption per stream computation. Kenneth measured power consumption during processing and divided that by the number of output streams produced. This analysis again illustrates why normalizing power consumption on a per-stream basis is so necessary; though the CPU-only system draws the least power, making it appear to be the most efficient, on a per-stream basis, it’s almost 20x the power draw of the Quadra system.

Figure 10. Computing power per stream for H.264 transcoding.
Figure 10. Computing power per stream for H.264 transcoding.

Figure 11 summarizes CAPEX and OPEX for a 320-channel system. Note that Kenneth rounded down rather than up to compute the total number of servers for CPU-only and NVIDIA. That is, at a capacity of 15 streams for CPU-only transcoding, you would need 21.33 servers to produce 320 streams. Since you can’t buy a fractional server, you would need 22, not the 21 shown. Ditto for NVIDIA and the six servers, which, at 50 output streams each, should have been 6.4, or actually 7. So, the savings shown are underrepresented by about 4.5% for CPU-only and 15% for NVIDIA. Even without the corrections, the CAPEX and OPEX differences are quite substantial.

Figure 11. CAPEX and OPEX for 320 H.264 1080p30 streams.
Figure 11. CAPEX and OPEX for 320 H.264 1080p30 streams.

Cost Per Stream and Power Consumption Per Stream - HEVC

Kenneth performed the same analysis for HEVC. All systems cost the same, but throughput of the CPU-only and NVIDIA-equipped systems both drop significantly, boosting their costs per stream. The ASIC-powered Quadra outputs the same stream count for HEVC as for H.264, producing an identical cost per stream.

Figure 12. Computing system cost and cost per stream.
Figure 12. Computing system cost and cost per stream.

The throughput drop for CPU-only and NVIDIA transcoding also boosted the power consumption per stream, while Quadra’s remained the same.

Figure 13. Computing power per stream for H.264 transcoding.
Figure 13. Computing power per stream for H.264 transcoding.

Figure 14 shows the total CAPEX and OPEX for the 320-channel system, and this time, all calculations are correct. While CPU-only systems are tenuous–at best– for H.264, they’re clearly economically untenable with more advanced codecs like HEVC. While the differential isn’t quite so stark with the NVIDIA products, Quadra’s superior quality and much lower CAPEX and OPEX are compelling reasons to adopt the ASIC-based solution.

Figure 14. CAPEX and OPEX for 320 1080p30 HEVC streams.
Figure 14. CAPEX and OPEX for 320 1080p30 HEVC streams.

As Kenneth pointed out in his talk, even if you’re producing only H.264 today, if you’re considering HEVC in the future, it still makes sense to choose a Quadra-equipped system because you can switch over to HEVC with no extra hardware cost at any time. With a CPU-only system, you’ll have to more than double your CAPEX spending, while with NVIDIA,  you’ll need to spend another 25% to meet capacity.

The Cost of Redundancy

Kenneth concluded his talk with a discussion of full hardware and geo-redundancy. He envisioned a setup where one location houses two servers (a primary and a backup) for full hardware redundancy. A similar setup would be replicated in a second location for geo-redundancy. Using the Quadra video server, four servers could provide both levels of redundancy, costing a total of $84,000. Obviously, this is much cheaper than any of the other transcoding alternatives.

NETINT’s Quadra VPU proved slightly superior in quality to the alternatives, vastly cheaper than CPU-only transcoding, and very meaningfully more affordable than GPU-based transcoders. While these conclusions may seem unsurprising – an employee at an encoding ASIC manufacturer concludes that his ASIC-based technology is best — you can check Ilya Mikhaelis’ independent analysis here and see that he reached the same result.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

From Cloud to Control. Building Your Own Live Streaming Platform

Cloud services are an effective way to begin live streaming. Still, once you reach a particular scale, it’s common to realize that you’re paying too much and can save significant OPEX by deploying transcoding infrastructure yourself. The question is, how to get started?

NETINT’s Build Your Own Live Streaming Platform symposium gathers insights from the brightest engineers and game-changers in the live-video processing industry on how to build and deploy a live-streaming platform.

In just three hours, we’ll cover the following:

  • Hardware options for live transcoding and encoding to cut costs by as much as 80%.
  • Software options for producing, delivering, and playing your live video streams.
  • Co-location selection criteria to achieve cloud-like performance with on-premise affordability.

You’ll also hear from two engineers who will demystify the process of assembling a live-streaming facility, how they identified and solved key hurdles, along with real costs and performance data.

Cloud? Or your own hardware?

It’s clear to many that producing live streams via a public cloud like AWS can be vastly more expensive than owning your hardware. (You can learn more by reading “Cloud or On-Premises? The Streaming Dilemma” and “How to Slash CAPEX, OPEX, and Carbon Emissions Using the NETINT T408 Video Transcoder”). 

To quote serial entrepreneur David Hansson, who recently migrated two SaaS services from the cloud to on-premise, “Don’t let the entrenched cloud interests dazzle you into believing that running your own setup is too complicated. Everyone and their dog did it to get the internet off the ground, and it’s only gotten easier since.” 

For those who have only operated in the cloud, there’s fear of the unknown. Fear buying hardware transcoders, selecting the right software, and choosing the best colocation service. So, we decided to fight fear with education and host a symposium to educate streaming engineers on all these topics.  

“Building Your Own Live Streaming Cloud” will uncover how owning your encoding stack can slash operating costs and boost performance with minimal CAPEX.

Learn to select the optimal transcoding hardware, transcoding and packaging software, and colocation facilities. We’ll also discuss strategies to reduce carbon emissions from your transcoding engine. 

This FREE virtual event takes place on August 17th, from 11:00 AM – 2:15 PM EST.

Five issues tackled by nine experts:

Transcoding Hardware Options:

Learn the pros and cons of CPU, GPU, and ASIC-based transcoding via detailed throughput and cost examples shared by Kenneth Robinson, Manager of Field Application Engineers at NETINT Technologies. Then Ilya Mikhaelis, Streaming Backend Tech Lead at Mayflower, will describe his company’s journey from CPU to GPU to ASICs, covering costs, power consumption, latency, and density metrics.

Software Options:

Jan Ozer from NETINT will identify the three categories of transcoding software: multimedia frameworks, media servers, and other tools. Then, you’ll hear from experts in each category, starting with Romain Bouqueau, founder of Motion Spell, who will discuss the capabilities of the GPAC multimedia framework. Barry Owen, Chief Solutions Architect at Wowza, will discuss Wowza Streaming Engine’s suitability for private clouds. Lastly, Adrian Roe, Director at Id3as, developer of Norsk, will demonstrate Norsk’s simple, scripting-based operation, and extensive production and transcoding features.

Housing Options:

Once you select your hardware and software, the next step is finding the right co-location facility to house your live streaming infrastructure. Kyle Faber, with experience in building Edgio’s video streaming infrastructure, will guide you through the essential factors to consider when choosing a co-location facility.

Minimizing the Environmental Impact:

As responsible streaming professionals, it’s essential to address the environmental impact of our operations. Barbara Lange, Secretariat of Greening of Streaming, will outline actionable steps video engineers can take to minimize power consumption when acquiring and deploying transcoding servers.

Pulling it All Together:

Stef van der Ziel, founder of live-streaming pioneer Jet-Stream, will share lessons learned from his experience in creating both Jet-Stream’s private cloud and cloud transcoding solutions for customers. In his closing talk, Stef will demystify the process of choosing hardware, software, and a hosting facility, bringing all the previous discussions together into a cohesive plan.

Full Agenda:

11:00 am. – 11:10 am EST

Introduction (10 minutes):
Mark Donnigan, Head of Strategic Marketing at NETINT Technologies
Welcome, overview, and what you will learn.

 

11:10 am. – 11:40 am EST

Choosing transcoding hardware (30 minutes):
Kenneth Robinson, Manager of Field Application Engineers at NETINT Technologies
You have three basic approaches to transcoding, CPU-only, GPU, and ASICs. Kenneth outlines the pros and cons of each approach with extensive throughput and CAPEX and OPEX examples for each.

 

11:40 am. – 12:00 pm EST

From CPU to GPU to ASIC: Our Transcoding Journey (20 minutes):
Ilya Mikhaelis, Streaming Backend Tech Lead at Mayflower
Charged with supporting very high-volume live transcoding operations, Ilya started with libx264 software transcoding, which consumed massive power but yielded low stream density per server. Then he experimented with GPUs and other hardware and ultimately transitioned to an ASIC-based solution with much lower power consumption and much higher stream density per server. Ilya will detail the costs, power consumption, and density of all options, providing both data and an invaluable evaluation framework.

 

12:00 pm. – 12:10 pm EST

Choosing your live production software (10 minutes): 
Jan Ozer, Senior Director of Video Technology at NETINT Technologies
The core of every live streaming system is transcoding and packaging software. This comes in many shapes and sizes, from open-source software like FFmpeg and GPAC, to streaming servers like Wowza, and production systems like Norsk. Jan discusses these multiple options so you can cohesively and affordably build your own live-streaming ecosystem.

 

12:10 pm. – 1:10 pm EST

Speed Round (60 minutes):
20-minute presentations from GPAC, Wowza, and NORSK.
Speakers from GPAC, Wowza, and NORSK discussing the features, functions, operational paradigms, and cost structure of their live software offering.

Speakers include:

  • Adrian Roe, CEO at id3as, Product: Norsk, Title: Make Live Easy with NORSK SDK
  • Romain Bouqueau, Founder and CEO, Motion Spell (home for GPAC Licensing), Product: GPAC Title of Talk: Deploying GPAC for Transcoding and Packaging
  • Barry Owen, Chief Solutions Architect at Wowza, Title of Talk: Start Streaming in Minutes with Wowza Streaming Engine



1:10 pm. – 1:40 pm EST

Choosing a co-location facility (30 minutes): 
Kyle Faber, Senior Director of Product Management at Edgio.
Once you’ve chosen your hardware and software, you need a place to install them. If you don’t have your own connected data center, you may consider a colocation facility. In his talk, Kyle addresses the key factors to consider when choosing a co-location facility for your live streaming infrastructure.

 

1:40 pm. – 1:55 pm EST

How to Greenify Your Encoding Stack (15 minutes):
Barbara Lange, Secretariat of Greening of Streaming.
Learn how video streaming companies can work to significantly reduce their energy footprint and contribute to a greener streaming industry. Implement hardware and infrastructure optimization using immersion cooling and data center design improvements to maximize energy efficiency in your streaming infrastructure.

 

1:55 pm. – 2:15 pm EST

Closing Keynote (20 minutes):
Stef van der Ziel, Founder Jet-Stream
Jet-stream has delivered streaming solutions since its launch in 1994 and offers its own live streaming platform. One focus has been creating custom transcoding solutions for customers seeking to create their own private cloud for various applications. In his closing talk, Stef will demystify the process of choosing hardware, software, and a hosting facility and wrap a pretty bow around all previous presentations.

Co-location for Optimized, Sustainable Live Streaming Success

Choosing a co-location facility

If you decide to buy and run your transcoding servers versus a public cloud, you must choose where to host the servers. If you have a well-connected data center, that’s an option. But if you don’t, you’ll want to consider a co-location facility or co-lo.

A co-location facility is a data center that rents space to third parties for servers and other computing hardware. This rented space typically includes the physical area for the hardware (often measured in rack units or cabinets) and the necessary power, cooling, and security.

While prices vary greatly, in the US, you can expect to pay between $50 – $200 per month per RU, with prices ranging from $60 – $250 per RU in Europe, $80 – $300 per month per RU in South American, and $70 – $280 per month per RU in Asia.

Co-location facilities will provide a high-bandwidth internet connection, redundant power supplies, and sophisticated cooling systems to ensure optimal performance and uptime for hosted equipment. They also include robust physical security measures, including surveillance cameras, biometric access controls, and round-the-clock security personnel.

At a high level, businesses use co-location facilities to leverage economies of scale they couldn’t achieve on their own. By sharing the infrastructure costs with other tenants, companies can access high-level data center capabilities without a significant upfront investment in building and maintaining their facility.

Choosing a Co-lo for Live Streaming

Choosing a co-lo facility for any use involves many factors. However, live streaming demands require a focus on a few specific capabilities. We discuss these below to help you make an informed decision and maximize the efficiency and cost-effectiveness of your live-streaming operations.

Network Infrastructure and Connectivity

Live streaming requires high-performance and reliable network connections. If you’re using a particular content delivery network, ensure the link to the CDN is high performing. Beyond this, consider a co-lo with multiple (and redundant) high-speed connections to multiple top-tier telecom and cloud providers, which can ensure your live stream remains stable, even if one of the connections has issues.

Multiple content distribution providers can also reduce costs by enabling competitive pricing. If you need to connect to a particular cloud provider, perhaps for content management, analytics, or other services, make sure these connections are also available.

Geographic Location and Service

Choosing the best location or locations is a delicate balance. From a pure quality of experience perspective, facilities closer to your target audience can reduce latency and ensure a smoother streaming experience. However, during your launch, cost considerations may dictate a single centralized location that you can supplement over time with edge servers near heavy concentrations of viewers.

During the start-up phase and any expansion, you may need access to the co-lo facility to update or otherwise service existing servers and install new ones. That’s simpler to perform when the facility is closer to your IT personnel.

If circumstances dictate choosing a facility far from your IT staff, consider choosing a provider with the necessary managed services. While the services offered will vary considerably among the different providers, most locations provide hardware deployment and management services, which should cover you for expansion and maintenance.

Similarly, live streaming operations usually run round-the-clock, so you need a facility that offers 24/7 technical support. A highly responsive, skilled, and knowledgeable support team can be crucial in resolving any unexpected issues quickly and efficiently.

Scalability

Your current needs may be modest, but your infrastructure needs to scale as your audience grows. The chosen co-lo facility (or facilities) should have ample space and resources to accommodate future growth and expansion. Check whether they have flexible plans allowing upgrades and scalability as needed.

Redundancy and Disaster Recovery

In live streaming, downtime is unacceptable. Check for guarantees in volatile coastal or mountain regions that data centers can withstand specific types of disasters, like floods and hurricanes.

When disaster strikes, the co-location facility should have redundant power supplies, backup generators, and efficient cooling systems to prevent potential hardware failures. Check for procedures to protect equipment, backup data, and other steps to minimize the risk and duration of loss of service. For example, some facilities offer disaster recovery services to help customers restore disrupted environments. Walk through the various scenarios that could impact your service and ensure that the providers you consider have plans to minimize disruption and get you up and running as quickly as possible.

Security and Compliance

Physical and digital security should be a primary concern, particularly if you’re streaming third-party premium content that must remain protected. Ensure the facility uses modern security measures like CCTV, biometric access, fire suppression systems, and 24/7 on-site staff. Digital security should include robust firewalls, DDoS mitigation services, and other necessary precautions.

Environment Sustainability

An essential requirement for most companies today is environmental sustainability. ASIC-based transcoding is the most power-efficient of all transcoding alternatives. We believe that all companies should work to reduce their carbon footprints. Accordingly, choosing a co-location facility committed to energy efficiency and renewable energy sources will lower your energy costs and align with your company’s environmental goals.

Remember, the co-location facility is an extension of your live-streaming business. With the proper infrastructure, you can ensure high-quality, reliable live streams that satisfy your audience and grow your business. Take the time to visit potential facilities, ask questions, and thoroughly evaluate before deciding.

Cloud services are an effective way to begin live streaming. Still, once you reach a particular scale, it’s common to realize that you’re paying too much and can save significant OPEX by deploying transcoding infrastructure yourself. The question is, how to get started?

NETINT’s Build Your Own Live Streaming Platform symposium gathers insights from the brightest engineers and game-changers in the live-video processing industry on how to build and deploy a live-streaming platform.

In just three hours, we’ll cover the following:

  • Hardware options for live transcoding and encoding to cut costs by as much as 80%.
  • Software options for producing, delivering, and playing your live video streams.
  • Co-location selection criteria to achieve cloud-like performance with on-premise affordability.

You’ll also hear from two engineers who will demystify the process of assembling a live-streaming facility, how they identified and solved key hurdles, along with real costs and performance data.

Denser / Leaner / Greener - Symposium on Building Your Live Streaming Cloud

Build Your Own Streaming Infrastructure – Software

Build Your Own Streaming Infrastructure - Article by Jan Ozer from NETINT Technologies

My assumption is that you’re currently using a cloud-based service like AWS for your live streaming and are seeking to reduce costs by buying your own transcoding hardware, installing the necessary software, and hosting the server on-premises or in a co-location facility. This article covers the software side.

To begin, let’s acknowledge that AWS and other cloud services have created a well-featured and highly integrated ecosystem for live streaming and distribution. The downside is the cost.

To illustrate the potential savings, I’ll refer to this article, which compared the cost of producing 21 H.264 ladders and 27 HEVC ladders via AWS MediaLive and by encoding with NETINT’s recently launched Logan Video Server. As you can see in the table, MediaLive costs around $400K for H.264 and $1.8 million for HEVC, as compared to $11,140 in both cases for the co-located server.

Streaming Infrastructure - Table from article 'cloud or on-prem'
Table 1. Five-year cost comparison . AWS MediaLive pricing compared to the NETINT Server

While there are less expensive options available inside and outside of AWS, whenever you pay for hardware by the minute or hour of production, you’re vastly overpaying as compared to owning your own hardware. Sure, you say, but it’s so easy compared to running your own hardware.

If that’s a concern, here are some comforting words from David Heinemeier Hansson, co-owner, and CTO of software developer 37signals, the developer of the project management platform Basecamp and email service Hey. Recently, Hansson wrote  Why we’re leaving the cloud, a blog that detailed his companies’ decisions to do just that. Here’s the relevant quote.

Up until very recently, everyone ran their own servers, and much of the progress in tooling that enabled the cloud is available for your own machines as well. Don’t let the entrenched cloud interests dazzle you into believing that running your own setup is too complicated. Everyone and their dog did it to get the internet off the ground, and it’s only gotten easier since.

My wife has chihuahuas, and given their difficulties with potty training, I seriously doubt they could do it, but you get the point. To paraphrase FDR, all you have to fear is fear itself. The bottom line is that running your own live streaming service should cost relatively little CAPEX, will save significant OPEX, and won’t be nearly as challenging as you might be fearing.

Let’s look at your options for the software required to run your homegrown system.

Transcoding and Packaging Software

Figure 1 shows the minimum software and infrastructure needed for a live-streaming service. Presumably, you’ve already got the live production covered, and since AWS doesn’t offer a player, you have that piece addressed as well. You’ll need a content delivery network to deliver your streaming video, but you can continue to use CloudFront or other CDN. The software that you absolutely have to replace is the live transcoding and packaging component.

Here you have three options; multimedia frameworks, media servers, and “other.” Let’s discuss each in turn.

Multimedia Frameworks

Multimedia frameworks are software libraries, tools, and APIs that provide a set of functionalities and capabilities for multimedia processing, manipulation, and streaming. The best-known framework is FFmpeg, followed by GStreamer and GPAC, and they are all available open source.

Build Your Own Streaming Infrastructure - Software- diagram-2
Figure 1. Netflix uses GPAC for its packaging,
a significant technology endorsement for GPAC
and for multimedia frameworks in general.

Multimedia frameworks excel in projects at both ends of the complexity spectrum. For simple projects, like transcoding an input stream to an encoding ladder, you can create a script that inputs the stream, transcodes, and hands the packaged output streams off to a CDN in a matter of minutes. You can use the script to process thousands of simultaneous jobs, all at no charge.

At the other end of the spectrum, these frameworks also excel at complex jobs with idiosyncratic custom requirements that likely aren’t available in a server or commercial software product. The development, maintenance, and modification costs are considerable, but you get maximum feature flexibility if you’re willing to pay that cost.

What you don’t get with these tools is a user interface or simple configuration options – you start with a blank slate and must program in all desired features. What could be as simple as checking a checkbox in a streaming media server could require dozens or even thousands of lines of code in a multimedia framework.

Which takes us to streaming media servers.

Streaming Media Servers

The next category of products are streaming media servers, and it includes Wowza Streaming Engine, Nimble Streamer, and two open-source servers, Red5 and Ant Media Server. These servers tend to excel for most productions in the middle of the complexity spectrum and offer multiple advantages over multimedia frameworks.

There are several reasons why you might choose to use a streaming server over a multimedia framework, including a simplified setup and configuration. Most streaming servers provide out-of-the-box streaming solutions with pre-configured settings and management interfaces that simplify the setup and configuration process. While not all offer GUIs, those that don’t offer simple option selection in configuration files.

Build Your Own Streaming Infrastructure - Software- diagram-3
Figure 2. Wowza Streaming Engine is a highly regarded streaming server

As mentioned above, streaming servers often offer simpler access to advanced features that you’d have to craft by hand with a multimedia framework. They also offer better integration with third-party services like digital rights management (DRM) and content delivery networks. Between the simplified setup, easier access to features, and improved integration with other services, packaged servers can dramatically accelerate getting your live streaming service up and running.

Once you’re operational, you’ll appreciate management interfaces that monitor the health and performance of your streaming infrastructure, track viewer analytics, manage streaming workflows, and make real-time adjustments. If you’re in a dynamic demand environment, some streaming servers offer built-in scalability features and load balancing to manage the load over multiple hard transcoding resources. You’d have to build all that by hand or with plug-ins if using a multimedia framework.

The two potential downsides of streaming servers are cost and customizability. You’ll have to pay a monthly fee for some versions of these servers, and you may find it complicated or nearly impossible to add what you might consider to be essential features.

Other Streaming-Capable Programs

Most companies building their own live-streaming infrastructures will implement either a multimedia framework or a streaming server, but there are other programs that incorporate the core encoding and packaging functions. One such program is Norsk from id3as. Norsk bills itself as “an SDK that enables developers to easily create amazing, dynamic live video workflows and deploy them at any scale.” As such, it combines both video production and streaming server-related functions

You see this in Figure 3. The top portion shows that Norsk supports the typical codecs and packaging formats deployed by live-streaming producers. At the bottom of the figure, you see that Norsk also offers production-oriented features like multiple camera support, graphics and overlays, and transitions.

Build Your Own Streaming Infrastructure - Software- diagram-4
Figure 3. Norsk offers both production and server-related functions.

Interestingly, Norsk doesn’t have a GUI, instead offering a high-level API to simplify configuration and operation, with a Workflow Visualizer component to view the running state of the application. In this fashion, Norsk attempts to provide the configurability of multimedia frameworks with the ease of operation of scripting-driven streaming media servers.

Finding a program like Norsk that combines transcoding and packaging with other essential streaming-related functions makes a lot of sense; there’s one less vendor to onboard and one less product to learn and support. As remote production becomes more common, we expect more programs like Norsk to become available.

Those are your high-level options. If you’re interested in learning more about these and other programs that can drive encoding and packaging for your live transcoder. You should plan to attend our upcoming symposium; details will be available in the next couple of weeks.