Choosing a Co-Location Facility


Edgio is the result of the merger between Limelight Networks and EdgeCast in 2022, which produced a company with over 20 years of experience choosing and installing their own equipment into co-location facilities.

With customers like Disney, ESPN, Amazon, and Verizon, Edgio has had to manage both explosive growth and exceptionally high expectations.

So, there’s no better source to help you learn to choose a co-location provider than Kyle Faber, Head of CDN Product Delivery Management at Edgio. He’s got experience, and as you’ll see below, the pictures to prove it.

Kyle starts with a description of the math involved in deciding whether co-location is the right direction for your organization, and then works through must-have and nice-to-have co-location features. He covers the value of certifications and the importance of redundancy and temperature management, explores connectivity, support, and cost considerations, and finishes with a look at sustainability. It’s a deep and comprehensive look at choosing a co-location provider, and anyone facing this decision will find the information invaluable.


NAVIGATE THE COMPLEXITIES OF PRIVATE COLOCATION DECISIONS

Kyle started by addressing the considerations video engineers should prioritize when contemplating the shift to private co-location. In the context of modern public cloud computing platforms, he asserted that the decision to opt for private colocation requires a higher level of scrutiny due to the advanced capabilities of cloud offerings. While some enterprises rely solely on public cloud solutions for their production stack, there are compelling reasons to explore private colocation options.


He outlined his talk as follows:

  • First, he detailed a methodology for considering your financial break-even.
  • Then, he identified the “must have” features that a co-location provider must offer.
  • Then he related the nice-to-have, but not essential features that are potentially negotiable based on your organization’s goals.
  • He concluded with insight into how to balance the cloud vs. co-location decision, sharing that “it’s not a zero-sum game.”

As you’ll see, throughout the talk, Kyle provided practical insights to help video engineers navigate the complexities of private colocation decisions. He emphasized understanding the factors influencing these choices and making informed decisions based on an organization’s unique circumstances.

UNDERSTANDING THE MATH AND BREAKEVEN PRINCIPLES


Kyle began the economic discussion with the economics of minimum load and its relevance to private co-location decisions for video engineers. Using an everyday analogy, Kyle drew parallels between choosing to buy a car for daily use versus opting for ride-sharing services. He noted that the expenses associated with car ownership accumulate rapidly, but they eventually stabilize.

The convenience of controlling usage and trip frequency often leads to a reduced cost per ride compared to ride-sharing services over time. This analogy illustrated the dynamics of yearly co-location contracts, where minimum load drives efficiencies and potential gains.

Kyle then shifted to a scenario involving short-term heavy needs, like vacation car rentals. He noted that car rentals offer flexibility for unpredictable schedules without the commitment of ownership. This aligns with the flexibility provided by bare metal service providers, who offer diverse options within predefined parameters. This approach maintains efficiency while operating within certain boundaries.

Concluding his analogy, Kyle compared on-demand and public cloud offerings to ride-sharing services. He emphasized their ease of access, requiring just a few clicks to summon a driver or server, without concerns regarding operational aspects like insurance, maintenance, and updates.

By illustrating these relatable scenarios, Kyle underscored the importance of understanding the economics of minimum load in the context of private co-location decisions, specifically catering to the considerations of video engineers.

NAVIGATE THE ECONOMICS OF MINIMUM LOAD


Kyle next elaborated on the strategic approach required to navigate the economics of minimum load in the context of private co-location decisions. He emphasized the significance of aligning different models with specific data center demands.

Drawing from personal experiences, Kyle illustrated the concept using relatable scenarios. He contrasted his friend’s experience of living near a rail line in Seattle, which made car ownership unnecessary, with his own situation in Scottsdale, Arizona, where car ownership was essential due to logistical challenges.

Translating this to the business realm, Kyle pointed out that various companies have unique server requirements. Some prioritize flexible load management over specialized hardware needs and prefer to maintain a lean staff without extensive server administration roles. For Edgio, a content delivery network, private co-location globally was the optimal choice to meet their specific requirements.

Kyle then began a cost analysis, acknowledging that while the upfront cost of private co-location might seem daunting compared to public cloud prices, the cumulative server hour costs can accumulate rapidly. He referenced AWS’s substantial revenue from convenience as an example. He highlighted the necessity of considering hidden costs, including human capital requirements and logistical factors.

Addressing executive leaders, Kyle cautioned against assuming that software developers skilled with code are also adept at running data centers. He emphasized the importance of having dedicated data center and server administration experts to maximize cost savings and avoid potential disasters.

Looking toward the future, Kyle advised mid-sized companies to consider their future needs and focus on maintaining nimbleness. He shared his insights into the challenges of hardware logistics and the value of proper tracking and clarity to identify breakeven points. In this comprehensive overview, Kyle provided practical insights into the economics of minimum load, offering a pragmatic perspective on private co-location decisions for video engineers.
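Kyle’s break-even framing can be reduced to a simple cumulative-cost comparison. The sketch below is a minimal model of that idea; every input (cloud price per server-hour, CAPEX, co-location and staffing OPEX) is an assumed placeholder rather than a figure from the talk.

```python
# Minimal break-even sketch: cumulative cloud spend vs. co-location spend.
# Every figure below is an illustrative assumption, not a number from the talk.

CLOUD_COST_PER_SERVER_HOUR = 2.00    # assumed on-demand price
SERVERS_AT_MINIMUM_LOAD = 20         # assumed steady baseline demand
HOURS_PER_MONTH = 730

COLO_CAPEX = 300_000                 # assumed hardware purchase
COLO_OPEX_PER_MONTH = 8_000          # assumed rack space, power, remote hands
STAFF_OPEX_PER_MONTH = 7_000         # assumed incremental admin staffing

def cloud_cost(months: int) -> float:
    return CLOUD_COST_PER_SERVER_HOUR * SERVERS_AT_MINIMUM_LOAD * HOURS_PER_MONTH * months

def colo_cost(months: int) -> float:
    return COLO_CAPEX + (COLO_OPEX_PER_MONTH + STAFF_OPEX_PER_MONTH) * months

# Walk forward month by month until owning becomes cheaper than renting.
month = 1
while colo_cost(month) > cloud_cost(month) and month < 120:
    month += 1
print(f"Break-even around month {month}: "
      f"cloud ${cloud_cost(month):,.0f} vs. co-location ${colo_cost(month):,.0f}")
```

With these placeholder inputs, owning overtakes renting early in the second year; the point of the exercise is that the crossover moves earlier as the minimum load grows.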

MUST-HAVE CO-LOCATION FEATURES


With the economics covered, Kyle shifted to identifying the must-have features in any co-location service, suggesting that certifications play a crucial role in evaluating co-location providers. ISO 9000 and SOC 2 Type 1 and Type 2 were cited as common minimum standards, with additional regional and industry-specific variations. Kyle recommended requesting certifications from potential vendors and conducting thorough research to understand the significance of these certifications.

Kyle explained that by obtaining certifications, you can move beyond basic questions about construction methods, power backup systems, and operational standards. Instead, you can focus on more nuanced inquiries, like power sources, security standards for visitors, and the training and responsiveness of remote hands teams. This transition allows for a more informed assessment of vendors’ capabilities and suitability for specific needs.

THE SIGNIFICANCE OF ON-SITE VISITS


Kyle underscored the significance of on-site visits in the colocation decision-making process, sharing three images that highlighted the insights gained from physical visits to data center facilities. The first image depicted service cabling that entered a data center. While the front of the building seemed pristine, the back revealed potential issues lurking in the shadows. Kyle stressed that some problems can only be identified through close inspection.

The second image showed a fiber distribution panel, showcasing the low level of professionalism evident in the data center’s installations. This reinforced the idea that visual assessments can reveal the quality of a facility’s infrastructure.

The third image illustrated a unique scenario. During construction, a new fiber channel was being laid, but the basement entry of the fiber trench was left unsealed. An overnight rainstorm resulted in the trench filling with water. Because the basement access hole was uncapped, water flowed downhill into a room with valuable equipment. This real-life example served as a reminder of the importance of thorough inspection and due diligence in the colocation industry.

These visuals underscore the importance of physically visiting data centers to identify potential challenges and make informed decisions.

TEMPERATURE MANAGEMENT


Kyle also shared that temperature management is particularly important in data centers. For example, Edgio emphasizes cooling speed, temperature regulation, and high-density heat rejection technology. It’s not merely about achieving lower temperatures; it’s about effectively managing and dissipating heat.

Kyle explained that even a slight temperature fluctuation can trigger far-reaching consequences, so maintaining a precise temperature of 76 degrees Fahrenheit is paramount. The utilization of advanced heat rejection technology ensures that any deviations from this optimal point can be promptly corrected, guaranteeing peak performance for their installations.


Paradoxically, economic success complicates temperature maintenance. Over the past eight years, Kyle reported that Edgio achieved a 30% improvement in server power efficiency, coupled with a 760% surge in server density metrics. However, since the laws of physics remain steadfast, this density surge brings with it an elevated heat generation within a smaller space.

CONNECTIVITY, SUPPORT, AND COST CONSIDERATIONS


Kyle’s discussion then shifted to connectivity, sustainability, and environmental considerations with a focus on where to place each factor in your decision-making scorecard.

Emphasizing the critical role of connectivity in businesses, Kyle noted that vendors routinely claim, and usually deliver, constant uptime and availability, so they differentiate themselves through their access to the wider internet. When choosing a co-location provider, all organizations should reflect on their unique requirements. For instance, he suggested that businesses intending to connect with a CDN like Edgio might require a local data center partner that facilitates data transformation and transcoding but might not need the extensive infrastructure for global data distribution.

Kyle then addressed the significance of remote support, especially during initial installations where a swift response to issues is crucial. While tools like iDRAC and remote Out-of-Band server access provide control, Kyle highlighted the importance of real-time assistance during other critical moments, such as identifying server issues.

Addressing costs, Kyle acknowledged their pivotal role in decision-making, a sentiment particularly relevant given the current technology landscape. He urged a balance between cost-effectiveness and quality, drawing parallels between daily personal choices and those made in professional spheres. He referenced Terry Pratchett’s boot theory of economics, emphasizing the inevitability of change and the need for proactive lifecycle management. “Even the best boots will not last forever,” Kyle paraphrased, “and you need to plan lifecycle management.”

A FEW WORDS ABOUT SUSTAINABILITY


Kyle urged all participants and readers to treat sustainability as more than a mere buzzword. “Sustainability is more than a buzzword,” he declared. “It is a commitment.”

He illuminated the staggering energy appetite of data centers, exemplified by Amazon’s permits for generators in Virginia capable of producing a remarkable 4.6 gigawatts of backup power – enough to illuminate New York City for a day. Kyle underscored the industry’s responsibility to reevaluate energy sources, citing the rising importance of Environmental, Social, and Governance (ESG) movements. He noted that organizations are now compelled to report their environmental impact to stakeholders and investors, underscoring the need for transparency.

When considering colocation facilities, Kyle recommended evaluating their sustainability reports, which reveal critical information from energy-sourcing practices to governance approaches. By aligning operational needs with global responsibilities, businesses can make conscientious choices that resonate with their core values and forge meaningful partnerships with data center providers.

GET INTIMATELY ACQUAINTED WITH THE UNPREDICTABLE


While you should perform a comprehensive needs analysis and service comparison to choose your provider, Kyle also highlighted that data centers are intimately acquainted with the unpredictable. Construction activities, often beyond the data center provider’s control, persistently surround these facilities.

The photo above, taken a mile away from a facility, exemplifies the unforeseen challenges. A construction crew, possibly misinformed or negligent, drove an auger into the ground at an incorrect location, inadvertently ensnaring cabling and yanking dozens of meters of fiber from the earth.

The incident’s specifics remain unclear, yet the lesson is evident – despite meticulous planning, unpredictability is an integral facet of this landscape. As Kyle summarized, “It’s a stark reminder that despite our best plans, unpredictability has to be part of this landscape, so always be prepared for the unexpected.”

NO ONE-SIZE-FITS-ALL SOLUTION


In closing, Kyle addressed the intricate decisions surrounding ownership, rental, and on-demand data center services, emphasizing that there’s no one-size-fits-all solution. He presented the choice between owning servers, renting them, or opting for on-demand cloud services as a complex tapestry woven with factors such as the organization’s average minimum load and its strategic objectives.

Kyle cautioned that navigating this intricate landscape demands a nuanced perspective. The decision requires a well-thought-out plan that not only accommodates an organization’s goals and growth but also anticipates the evolving trends of the industry. This approach ensures that the chosen path resonates seamlessly with an organization’s aspirations, offering stability for the journey ahead.

GO FROM A PURE OPEX MODEL TO A CAPEX MODEL


Before wrapping up, Kyle answered one question from the audience: “How does someone begin to approach a transition? Is it even possible to go from a pure OPEX model to a CAPEX model? Any suggestions, ideas, insights?”

Kyle noted that when you assess an OPEX model, you’re essentially looking at linear costs. These costs offer a clear breakdown of your system expenses, which can be projected into the future.

While there might be some pricing fluctuations as public cloud providers compete, you can treat entire segments as a transition unit. It might not be feasible to buy just one server and place it in isolation, but you can transition comprehensive sections in one concerted effort.

So, you might build a small encoding farm, allowing for a gradual shift while maintaining flexibility across various cloud instances like AWS, Azure, or GCP. This phased approach grants greater control, cost benefits, and a smoother transition into the new paradigm.
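To make the “transition unit” idea concrete, here is a minimal sketch that compares the cloud run-rate of a single segment, such as a small encoding farm, against the amortized cost of owning it. All of the figures are hypothetical placeholders, not numbers from Kyle’s answer.

```python
# Hypothetical check for moving one segment (e.g., a small encoding farm)
# from cloud OPEX to owned CAPEX. All inputs are assumptions.

SEGMENT_CLOUD_OPEX_PER_MONTH = 40_000   # assumed cloud bill for this segment
SEGMENT_CAPEX = 400_000                 # assumed servers plus transcoders
SEGMENT_OWNED_OPEX_PER_MONTH = 10_000   # assumed colo, power, support
AMORTIZATION_MONTHS = 36                # assumed hardware life

owned_monthly = SEGMENT_CAPEX / AMORTIZATION_MONTHS + SEGMENT_OWNED_OPEX_PER_MONTH
savings = SEGMENT_CLOUD_OPEX_PER_MONTH - owned_monthly

print(f"Owned run-rate: ${owned_monthly:,.0f}/month")
print(f"Monthly savings if this segment is migrated: ${savings:,.0f}")
```

Because the comparison is per segment, it supports exactly the phased approach Kyle describes: migrate the segments where the savings are clearest and leave the rest in the cloud.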

ON-DEMAND: Kyle Faber - Choosing a Co-Location Facility

Choosing Transcoding Hardware: Deciphering the Superiority of ASIC-based Technology

Which technology reigns supreme in transcoding: CPU-only, GPU, or ASIC-based? Kenneth Robinson’s incisive analysis from the recent symposium makes a compelling case for ASIC-based transcoding, particularly NETINT’s Quadra. Robinson’s metrics prioritized viewer experience, power efficiency, and cost. While CPU-only systems appear initially economical, they falter with advanced codecs like HEVC. NVIDIA’s GPU transcoding offers more promise, but the Quadra system still outclasses both in quality, cost per stream, and power consumption. Furthermore, Quadra’s adaptability allows a seamless switch between H.264 and HEVC without incurring additional costs. Independent assessments, such as Ilya Mikhaelis’, echo Robinson’s conclusions, cementing ASIC-based transcoding as the optimal choice.

During the recent symposium, Kenneth Robinson, NETINT’s manager of Field Application Engineering, compared three transcoding technologies: CPU-only, GPU, and ASIC-based. His analysis, which incorporated quality, throughput, and power consumption, is useful as a template for testing methodology and for the results. You can watch his presentation here and download a copy of his presentation materials here.

Figure 1. Overall savings from ASIC-based transcoding (Quadra) over GPU (NVIDIA) and CPU.

As a preview of his findings, Kenneth found that when producing H.264, ASIC-based transcoding delivered CAPEX savings of 86% and 77% compared to CPU and GPU-based transcoding, respectively. OPEX savings were 95% vs. CPU-only transcoding and 88% compared to GPU.

For the more computationally complex HEVC codec, the savings were even greater. As compared to CPU-based transcoding, ASICs saved 94% on CAPEX and 98% on OPEX. As compared to GPU-based transcoding, ASICs saved 82% on CAPEX and 90% on OPEX. These savings are obviously profound and can make the difference between a successful and profitable service and one that’s mired in red ink.

Let’s jump into Kenneth’s analysis.

Determining Factors

Digging into the transcoding alternatives, Kenneth described the three options. First are CPUs from manufacturers like AMD or Intel. Second are GPUs from companies like NVIDIA or AMD. Third are ASICs, or Application Specific Integrated Circuits, from manufacturers like NETINT. Kenneth noted that NETINT calls its Quadra devices Video Processing Units (VPU), rather than transcoders because they perform multiple additional functions besides transcoding, including onboard scaling, overlay, and AI processing.

He then outlined the factors used to determine the optimal choice, detailing the four factors shown in Figure 2. Quality is the average quality as assessed using metrics like VMAF, PSNR, or subjective video quality evaluations involving A/B comparisons with viewers. Kenneth used VMAF for this comparison. VMAF has been shown to have the highest correlation with subjective scores, which makes it a good predictor of viewer quality of experience.

Figure 2. How Kenneth compared the technologies.

Low-frame quality is the lowest VMAF score on any frame in the file. This is a predictor for transient quality issues that might only impact a short segment of the file. While these might not significantly impact overall average quality, short, low-quality regions may nonetheless degrade the viewer’s quality of experience, so are worth tracking in addition to average quality.

Server capacity measures how many streams each configuration can output, which is also referred to as throughput. Dividing server cost by the number of output streams produces the cost per stream, which is the most relevant capital cost comparison. The higher the number of output streams, the lower the cost per stream and the lower the necessary capital expenditures (CAPEX) when launching the service or sourcing additional capacity.

Power consumption measures the power draw of a server during operation. Dividing this by the number of streams produced results in the power per stream, the most useful figure for comparing different technologies.
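Both per-stream metrics are simple ratios of whole-server measurements. A minimal sketch of the computation is shown below, using the CPU-only H.264 figures quoted later in the article; the wattage value is a placeholder assumption, since per-server power isn’t quoted in the text.

```python
from dataclasses import dataclass

@dataclass
class ServerConfig:
    name: str
    server_cost: float      # fully configured server price, USD
    output_streams: int     # simultaneous 1080p30 streams it can produce
    power_watts: float      # measured power draw under load

    @property
    def cost_per_stream(self) -> float:
        return self.server_cost / self.output_streams

    @property
    def watts_per_stream(self) -> float:
        return self.power_watts / self.output_streams

# Example using the H.264 CPU-only figures quoted later in the article
# ($7,100 server, 15 streams). The power value is an assumed placeholder.
cpu_only = ServerConfig("CPU-only", server_cost=7_100, output_streams=15, power_watts=350.0)
print(f"{cpu_only.name}: ${cpu_only.cost_per_stream:,.0f}/stream, "
      f"{cpu_only.watts_per_stream:.1f} W/stream")
```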

Detailing his test procedures, Kenneth noted that he tested CPU-only transcoding on a system equipped with a 32-core AMD Epyc CPU. Then he installed the NVIDIA L4 GPU (a recent release) for GPU testing and NETINT’s Quadra T1U U.2 form factor VPU for ASIC-based testing.

He evaluated two codecs, H.264 and HEVC, using a single file, the Meridian file from Netflix, which contains a mix of low and high-motion scenes and many challenging elements like bright lights, smoke and fog, and very dark regions. If you’re testing for your own deployments, Kenneth recommended testing with your own test footage.

Kenneth used FFmpeg to run all transcodes, testing CPU-only quality with the x264 and x265 codecs using the medium and very fast presets. He used FFmpeg for NVIDIA and NETINT testing as well, transcoding with the native H.264 and H.265 codec for each device.
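The sketch below reconstructs the CPU-only (x264) leg of this test matrix as a simple FFmpeg loop. It is an approximation of the procedure described, not Kenneth’s actual script; the file names are placeholders, and the final command, which scores one encode against the source, assumes an FFmpeg build with libvmaf enabled.

```python
import subprocess

# Reconstruction of the CPU-only (libx264) leg of the test matrix.
# Not the presenter's actual harness; file names are placeholders.
SOURCE = "meridian.mp4"                      # Netflix Meridian test clip
BITRATES_KBPS = [2200, 3000, 3900, 4750]     # rungs cited in the article
PRESETS = ["medium", "veryfast"]             # x264 presets tested

for preset in PRESETS:
    for kbps in BITRATES_KBPS:
        out = f"x264_{preset}_{kbps}k.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", SOURCE,
            "-c:v", "libx264", "-preset", preset,
            "-b:v", f"{kbps}k", "-maxrate", f"{kbps}k", "-bufsize", f"{2 * kbps}k",
            "-an", out,
        ], check=True)

# VMAF for one encode against the source (requires FFmpeg built with libvmaf).
subprocess.run([
    "ffmpeg", "-i", "x264_medium_3000k.mp4", "-i", SOURCE,
    "-lavfi", "libvmaf", "-f", "null", "-",
], check=True)
```

The NVIDIA and NETINT runs would substitute each device’s own FFmpeg encoders in place of libx264.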

H.264 Average, Low-Frame, and Rolling Frame Quality

The first result Kenneth presented was average H.264 quality. As shown in Figure 3, Kenneth encoded the Meridian file to four output files for each technology, with encodes at 2.2 Mbps, 3.0 Mbps, 3.9 Mbps, and 4.75 Mbps. In this “rate-distortion curve” display, the left axis is VMAF quality, and the bottom axis is bitrate. In all such displays, higher results are better, and Quadra’s blue line is the best alternative at all tested bitrates, beating NVIDIA and x264 using the medium and very fast presets.

Figure 3. Quadra was tops in H.264 quality at all tested bitrates.

Kenneth next shared the low-frame scores (Figure 4), noting that while the NVIDIA L4’s score was marginally higher than the Quadra’s, the difference at the higher end was only 1%. Since no viewer would notice this differential, this indicates operational parity in this measure.

Figure 4. NVIDIA’s L4 and the Quadra achieve relative parity in H.264 low-frame testing.

The final H.264 quality finding displayed a 20-second rolling average of the VMAF score. As you can see in Figure 5, the Quadra, which is the blue line, is consistently higher than the NVIDIA L4 and x264 using the medium and very fast presets. So, even though the Quadra had a lower single-frame VMAF score compared to NVIDIA, over the course of the entire file, its quality was predominantly superior.

Figure 5. 20-second rolling frame quality over file duration.

HEVC Average, Low-Frame, and Rolling Frame Quality

Kenneth then related the same results for HEVC. In terms of average quality (Figure 6), NVIDIA was slightly higher than the Quadra, but the delta was insignificant. Specifically, NVIDIA’s advantage starts at 0.2% and drops to 0.04% at the higher bit rates. So, again, a difference that no viewer would notice. Both NVIDIA and Quadra produced better quality than CPU-only transcoding with x265 and the medium and very fast presets.

Figure 6. HEVC average quality at all tested bitrates.

In the low-frame measure (Figure 7), Quadra proved consistently superior, with NVIDIA significantly lower, again a predictor for transient quality issues. In this measure, Quadra also consistently outperformed x265 using medium and very fast presets, which is impressive.

Figure 7. Quadra was consistently superior in HEVC low-frame testing.

Finally, HEVC moving average scoring (Figure 8) again showed Quadra to be consistently better across all frames when compared to the other alternatives. You see NVIDIA’s downward spike around frame 3796, which could indicate a transient quality drop that could impact the viewer’s quality of experience.

Figure 8. 20-second rolling frame quality over file duration.

Cost Per Stream and Power Consumption Per Stream - H.264

To measure cost and power consumption per stream, Kenneth first calculated the cost for a single server for each transcoding technology and then measured throughput and power consumption for that server using each technology. Then, he compared the results, assuming that a video engineer had to source and run systems capable of transcoding 320 1080p30 streams.

You see the first step for H.264 in Figure 9. The baseline computer without add-in cards costs $7,100 but can only output fifteen 1080p30 streams using an average of the medium and very fast presets, resulting in a cost per stream of $473. Kenneth installed two NVIDIA L4 cards in the same system, which boosted the price to $14,214 but more than tripled throughput to fifty streams, dropping the cost per stream to $285. Kenneth then installed ten Quadra T1U VPUs in the system, which increased the price to $21,000 but skyrocketed throughput to 320 1080p30 streams, dropping the cost per stream to $65.

This analysis reveals why computing and focusing on the cost per stream is so important; though the Quadra system costs roughly three times the CPU-only system, the ASIC-fueled output is over 21 times greater, producing a much lower cost per stream. You’ll see how that impacts CAPEX for our 320-stream required output in a few slides.

Figure 9. Computing system cost and cost per stream.

Figure 10 shows the power consumption per stream computation. Kenneth measured power consumption during processing and divided that by the number of output streams produced. This analysis again illustrates why normalizing power consumption on a per-stream basis is so necessary; though the CPU-only system draws the least power, making it appear to be the most efficient, on a per-stream basis, it’s almost 20x the power draw of the Quadra system.

Figure 10. Computing power per stream for H.264 transcoding.

Figure 11 summarizes CAPEX and OPEX for a 320-channel system. Note that Kenneth rounded down rather than up to compute the total number of servers for CPU-only and NVIDIA. That is, at a capacity of 15 streams for CPU-only transcoding, you would need 21.33 servers to produce 320 streams. Since you can’t buy a fractional server, you would need 22, not the 21 shown. Ditto for NVIDIA and the six servers, which, at 50 output streams each, should have been 6.4, or actually 7. So, the savings shown are underrepresented by about 4.5% for CPU-only and 15% for NVIDIA. Even without the corrections, the CAPEX and OPEX differences are quite substantial.

Figure 11. CAPEX and OPEX for 320 H.264 1080p30 streams.
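The correction Kenneth describes is just a ceiling function on the required server count. A quick check using the per-server throughput and cost figures quoted above:

```python
import math

TARGET_STREAMS = 320

# Per-server H.264 throughput and system cost from the article.
options = {
    "CPU-only":   {"streams": 15,  "server_cost": 7_100},
    "NVIDIA L4":  {"streams": 50,  "server_cost": 14_214},
    "Quadra T1U": {"streams": 320, "server_cost": 21_000},
}

for name, o in options.items():
    servers = math.ceil(TARGET_STREAMS / o["streams"])  # can't buy a fractional server
    capex = servers * o["server_cost"]
    print(f"{name}: {servers} servers, CAPEX ${capex:,}")
```

Rounding up yields 22 CPU-only servers and 7 NVIDIA servers, the corrected counts Kenneth mentions, while the Quadra configuration still needs only a single server.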

Cost Per Stream and Power Consumption Per Stream - HEVC

Kenneth performed the same analysis for HEVC. All systems cost the same, but throughput of the CPU-only and NVIDIA-equipped systems both drop significantly, boosting their costs per stream. The ASIC-powered Quadra outputs the same stream count for HEVC as for H.264, producing an identical cost per stream.

Figure 12. Computing system cost and cost per stream.

The throughput drop for CPU-only and NVIDIA transcoding also boosted the power consumption per stream, while Quadra’s remained the same.

Figure 13. Computing power per stream for HEVC transcoding.

Figure 14 shows the total CAPEX and OPEX for the 320-channel system, and this time, all calculations are correct. While CPU-only systems are tenuous at best for H.264, they’re clearly economically untenable with more advanced codecs like HEVC. While the differential isn’t quite so stark with the NVIDIA products, Quadra’s superior quality and much lower CAPEX and OPEX are compelling reasons to adopt the ASIC-based solution.

Figure 14. CAPEX and OPEX for 320 1080p30 HEVC streams.

As Kenneth pointed out in his talk, even if you’re producing only H.264 today, if you’re considering HEVC in the future, it still makes sense to choose a Quadra-equipped system because you can switch over to HEVC with no extra hardware cost at any time. With a CPU-only system, you’ll have to more than double your CAPEX spending, while with NVIDIA, you’ll need to spend another 25% to meet capacity.

The Cost of Redundancy

Kenneth concluded his talk with a discussion of full hardware and geo-redundancy. He envisioned a setup where one location houses two servers (a primary and a backup) for full hardware redundancy. A similar setup would be replicated in a second location for geo-redundancy. Using the Quadra video server, four servers could provide both levels of redundancy, costing a total of $84,000. Obviously, this is much cheaper than any of the other transcoding alternatives.

NETINT’s Quadra VPU proved slightly superior in quality to the alternatives, vastly cheaper than CPU-only transcoding, and very meaningfully more affordable than GPU-based transcoders. While these conclusions may seem unsurprising – an employee at an encoding ASIC manufacturer concludes that his ASIC-based technology is best — you can check Ilya Mikhaelis’ independent analysis here and see that he reached the same result.

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

From CPU to GPU to ASIC: Mayflower’s Transcoding Journey

Ilya’s transcoding journey took him from $10 million to under $1.5 million CAPEX while cutting power consumption by over 90%. This analytical deep-dive reveals the trials, errors, and successes of Mayflower’s quest, highlighting a remarkable reduction in both cost and power consumption.

From CPU to GPU to ASIC: The Transcoding Journey

Ilya Mikhaelis

Ilya Mikhaelis is the streaming backend tech lead for Mayflower, which builds and hosts streaming infrastructures for multiple publishers. Mayflower’s infrastructure handles over 10,000 incoming streams and more than one million outgoing streams at a latency that averages one to two seconds.

Ilya’s challenge was to find the most cost-effective technology to transcode the incoming streams. His journey took him from CPU-based transcoding to GPU and then two generations of ASIC-based transcoding. These transitions slashed total production transcoding costs from $10 million to just under $1.5 million while reducing power consumption by over 90%, from 325,000 watts to 33,820 watts.

Ilya’s rigorous textbook-worthy testing methodology and findings are invaluable to any video engineer seeking the highest quality transcoding technology at the lowest capital cost and most efficient power usage. But let’s start at the beginning.

The Mayflower Internal CDN

As Ilya describes it, “Mayflower is a big company, under which different projects stand. And most of these projects are about high-load, live media streaming. Moreover some of Mayflower resources were included  in the top 50 of the most visited sites worldwide. And all these streaming resources are handled by one internal CDN, which was completely designed and implemented by my team.”

Describing the requirements, Ilya added, “The typical load of this CDN is about 10,000 incoming simultaneous streams and more than one million outgoing simultaneous streams worldwide. In most cases, we target a latency of one to two seconds. We try to achieve a real-time experience for our content consumers, which is why we need a fast and effective transcoding solution.”

To build the CDN, Mayflower used bare metal servers to maximize network and resource utilization and run a high-performance profile to achieve stable stream processing and keep encoder and decoder queues around zero. As shown in Figure 1, the CDN inputs streams via WebRTC and RTMP and delivers with a mix of WebRTC, HLS, and low latency HLS. It uses customized WebRTC inside the CDN to achieve minimum latency between servers.

Figure 1. Mayflower’s Low Latency CDN.

Ilya’s team minimizes resource wastage by implementing all high-level network protocols, like WebRTC, HLS, and low latency HLS, on their own. They use libav, an FFmpeg component, as a framework for transcoding inside their transcoder servers.

The Transcoding Pipeline

In Mayflower’s transcoding pipeline (Figure 2), the system inputs a single WebRTC stream, which it converts to a five-rung encoding ladder. Mayflower uses a mixture of proprietary and libav filters to achieve a stable frame rate and stable load. The stable frame rate is essential for outgoing streams because some protocols, like low latency HLS or HLS, can’t handle variable frame rates, especially on Apple devices.

Figure 2. Mayflower’s transcoding pipeline.
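To make the pipeline concrete, here is a hypothetical single-command FFmpeg ladder: the input is decoded once, forced to a constant frame rate, split, and scaled to five rungs. It is only an illustration of the concept using stock FFmpeg filters and libx264; the rung resolutions and bitrates are placeholder values, and Mayflower’s actual pipeline uses proprietary and libav filters with hardware encoders.

```python
import subprocess

# Illustrative five-rung ladder from a single input: decode once, force a
# constant frame rate, split, and scale each rung. Not Mayflower's pipeline;
# rung resolutions and bitrates are placeholder assumptions.
RUNGS = [(1920, 1080, "5000k"), (1280, 720, "3000k"), (960, 540, "1800k"),
         (640, 360, "900k"), (416, 234, "400k")]

split = f"[0:v]fps=30,split={len(RUNGS)}" + "".join(f"[v{i}]" for i in range(len(RUNGS)))
scales = ";".join(f"[v{i}]scale={w}:{h}[out{i}]" for i, (w, h, _) in enumerate(RUNGS))
cmd = ["ffmpeg", "-y", "-i", "input.mp4", "-filter_complex", f"{split};{scales}"]
for i, (_, _, bitrate) in enumerate(RUNGS):
    cmd += ["-map", f"[out{i}]", "-c:v", "libx264", "-preset", "veryfast",
            "-b:v", bitrate, "-an", f"rung_{i}.mp4"]
subprocess.run(cmd, check=True)
```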

CPU-Only Transcoding - Too Expensive, Too Much Power

After creating the architecture, Ilya had to find a transcoding technology as quickly as possible. Mayflower initially transcoded on a Dell R940, which currently costs around $20,000 as configured for Mayflower. When Ilya’s team first implemented software transcoding, most content creators input at 720p. After a few months, as they became more familiar with the production operation, most switched to 1080p, dramatically increasing the transcoding load.

You see the numbers in Figure 3. Each server could produce only 20 streams, which at a server cost of $20,000 meant a per stream cost of $1,000. At this capacity, scaling up to handle the 10,000 incoming streams would require 500 servers at a total cost of $10,000,000.

Total power consumption would equal 500 x 650, or 325,000 watts. The Dell R940 is a 3RU server; at an estimated monthly cost of $125 for colocation, this would add $750,000 per year. 

Figure 3. CPU-only transcoding was very costly and consumed excessive power.

These numbers caused Ilya to pause and reassess. “After all these calculations, we understood that if we wanted to play big, we would need to find a cheaper transcoding solution than CPU-only with higher density per server, while maintaining low latency. So, we started researching and found some articles on companies like Wowza, Xilinx, Google, Twitch, YouTube, and so on. And the first hint was GPU. And when you think GPU, you think NVIDIA, a company all streaming engineers are aware of.”

“After all these calculations, we understood that if we wanted to play big, we would need to find a cheaper transcoding solution than CPU-only with higher density per server, while maintaining low latency.”

GPUs - Better, But Still Too Expensive

Ilya initially considered three NVIDIA products: the Tesla V100, Tesla P100, and Tesla T4. The first two, he concluded, were best for machine learning, leaving the T4 as the most relevant option. Mayflower could install six T4s into each existing Dell server. At a current cost of around $2,000 for each T4, this produced a total cost of $32,000 per server.

Under capacity testing, the T4-enabled system produced 96 streams, dropping the per-stream cost to $333. This also reduced the required number of servers to 105, and the total CAPEX cost to $3,360,000.

With the T4s installed, power consumption increased to 1,070 watts for a total of 112,350 watts. At $125 per month per server, the 105 servers would cost $157,500 annually to house in a colocation facility.

Figure 4. Capacity and costs for an NVIDIA T4-based solution.

Round 1 ASICs: The NETINT T432

The NVIDIA numbers were better, but as Ilya commented, “It looked like we found a possible candidate, but we had a strong sense that we needed to further our research. We decided to continue our journey and found some articles about a company named NETINT and their ASIC-based solutions.”

Mayflower first ordered and tested the T432 video transcoder, which contains four NETINT G4 ASICs in a single PCIe card. As detailed by Ilya, “We received the T432 cards, and the results were quite exciting because we produced about 25 streams per card. Power consumption was much lower than NVIDIA, only 27 watts per card, and the cards were cheaper. The whole server produced 150 streams in full HD quality, with a power consumption of 812 watts. For the whole production, we would pay about 2 million, which is much cheaper than NVIDIA solution.”

You see all this data in Figure 5. The total number of T432-powered servers drops to 67, which reduces total power to 54,404 watts and annual colocation to $100,500.

Figure 5. Capacity and costs for the NETINT T432 solution.

While costs and power consumption kept improving, Ilya noticed that the CDN’s internal queue started increasing when processing with T432-equipped systems. Initially, Ilya thought the problem was the lack of onboard scaling on the T432, but then he noticed that “even when producing all these ABR ladders, our CPU load was about only 40% during high load hours. The bottleneck was the card’s decoding and encoding capacity, not onboard scaling.”

Finally, he pinpointed the increase in the internal queue to the fact that the T432’s decoder couldn’t maintain 4K60 fps decode for H.264 input. This was unacceptable because it increased stream latency. Ilya went searching one last time; fortunately, the solution was close at hand.

Round 2 ASICs: The NETINT Quadra T2 - The Transcoding Monster

Ilya next started testing with the NETINT Quadra T2 video processing unit, or VPU, which contains two NETINT G5 chips in a PCIe card. As with the other cards, Ilya could install six in each Dell server.

“All those disadvantages were eliminated in the new NETINT card – Quadra…We have already tested this card and have added servers with Quadra to our production. It really seems to be a transcoding monster.”

Ilya’s team liked what they found. “All those disadvantages were eliminated in the new NETINT card – Quadra. It has a hardware scaler inside with an optimized pipeline: decoder – scaler – encoder in the same VPU. And H264 4K60 decoding is not a problem for it. We have already tested this card and have added servers with Quadra to our production. It really seems to be a transcoding monster.”

Figure 6 shows the performance and cost numbers. Equipped with the six T2 VPUs, each server could output 270 streams, reducing the number of required servers from 500 for CPU-only to a mere 38. This dropped the per stream cost to $141, less than half of the NVIDIA T4 equipped system, and cut the total CAPEX down to $1,444,000. Total power consumption dropped to 33,820 watts, and annual colocation costs for the 38 3U servers were $57,000.

Figure 6. Capacity and costs for the NETINT Quadra T2 solution.

Cost and Power Summary

Figure 7 presents a summary of costs and power consumption, and the numbers speak for themselves. In Ilya’s words, “It is obvious that Quadra T2 dominates by all characteristics, and according to our team experience, it is the best transcoding solution on the market today.”

Figure 7. Summary of costs and power consumption.
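Those summary numbers can be reproduced from the per-server figures quoted earlier in the article. In the sketch below, the per-server costs for the T432 and Quadra T2 configurations and the Quadra’s per-server wattage are derived from the article’s stated totals rather than quoted directly.

```python
import math

TARGET_STREAMS = 10_000
COLO_PER_SERVER_PER_MONTH = 125  # 3RU colocation estimate used in the article

# Per-server figures from the article. The T432 and Quadra per-server costs and
# the Quadra per-server wattage are derived from the article's totals.
options = {
    "CPU-only (R940)":     {"streams": 20,  "server_cost": 20_000, "watts": 650},
    "6x NVIDIA T4":        {"streams": 96,  "server_cost": 32_000, "watts": 1_070},
    "6x NETINT T432":      {"streams": 150, "server_cost": 30_000, "watts": 812},
    "6x NETINT Quadra T2": {"streams": 270, "server_cost": 38_000, "watts": 890},
}

for name, o in options.items():
    servers = math.ceil(TARGET_STREAMS / o["streams"])
    print(f"{name}: {servers} servers, "
          f"CAPEX ${servers * o['server_cost']:,}, "
          f"{servers * o['watts']:,} W, "
          f"colo ${servers * COLO_PER_SERVER_PER_MONTH * 12:,}/yr")
```

Running this reproduces the progression described above: 500 servers and 325,000 watts for CPU-only, 105 servers for the T4, 67 for the T432, and 38 servers, roughly $1.44 million CAPEX, and 33,820 watts for the Quadra T2.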

“It is obvious that Quadra T2 dominates by all characteristics, and according to our team experience, it is the best transcoding solution on the market today.”

Ilya also commented on the suitability of the Dell R940 system. “I want to emphasize that the DELL R940 isn’t the best server for VPU and GPU transcoders. It has a small density of PCIe slots and, as a result, a small density of VPU/GPU. Moreover, in the case of  Quadra and even T432, you don’t need such powerful CPUs.”

In terms of other servers to consider, Ilya stated, “Nowadays, you may find platforms on the market with even 16 PCIe slots. In such systems, especially if you use Quadra, you don’t need powerful CPUs inside because everything is done on the VPU. But for us, it was a legacy with which we needed to live.”

Video engineers seeking the optimal transcoding solution can take a lot from Ilya’s transcoding journey: a willingness to test a range of potential solutions, a rigorous focus on cost and power consumption per stream, and extreme attention to detail. At NETINT, we’re confident that this approach will lead you to precisely the same conclusion as Ilya, that the Quadra T2 is “the best transcoding solution on the market today.”

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

From Cloud to Local Transcoding For Minimum Latency and Maximum Quality

Over the last ten years or so, most live productions have migrated towards a workflow that sends a contribution stream from the venue into the cloud for transcoding and delivery. For live events that need absolute minimum latency and maximum quality, it may be time to rethink that workflow, particularly if you’ve got multiple sharable inputs at the venue.

So says Bart Snoeks, Account & Partnership Director of THEO Technologies (“THEO”). By way of background, THEO invented and has commercially implemented the High-Efficiency Streaming Protocol (HESP), an adaptive HTTP-based video streaming protocol that enables sub-second end-to-end latency. You see how HESP compares to other low latency protocols in the table shown in Figure 1, from the HESP Alliance website – the organization focused on promoting and further advancing HESP.

Figure 1. HESP compared to other low latency protocols.

THEO has productized HESP as a real-time streaming service called THEOlive, which targets applications like live sports and betting, casino igaming, live auctions, and other events that require high-quality video at exceptionally low latency with delivery at scale. For example, in the case of in-play betting, cutting latency from 8 to 10 seconds (HLS) to under one second expands the betting window during the critical period just before the event.

When streaming casino games, ultra-low latency promotes fluent interactions between the players and ensures that all players see the turn of the cards in real time. When latency is lower, players can bet more quickly, increasing the number of hands that can be played.

According to Snoeks, a live streaming workflow that sends a contribution stream to the cloud for transcoding will always increase latency and can degrade quality because re-transcoding is needed. It’s especially poorly suited for stadium venues with multiple camera locations that want to enhance the attendee experience with multiple live feeds. In those latency-critical use cases, you are actually adding network latency with a round trip to and from the cloud. Instead, it makes much more sense to create your encoding ladder and packaging on-site and pull that directly from the origin to a private CDN for delivery.

Let’s take a step back and examine these two workflows.

Live Streaming Workflows

As stated at the top, most live-streaming productions encode a single contribution stream on-site and send that into the cloud for transcoding to a full ladder, packaging, and delivery. You see this workflow in Figure 2.

Figure 2. Encoding a contribution stream on-site to deliver to the cloud for transcoding, packaging, and delivery

This schema has multiple advantages. First, you’re sending a single stream to the cloud, lowering bandwidth requirements. Second, you’re centralizing your transcoding assets in a single location in the cloud, which typically enables better utilization.

According to Snoeks, however, this workflow will add 200 to 500 milliseconds of latency at a minimum, depending on the encoding speed, quality, and contribution protocol. In addition, though high-quality contribution encoders can minimize generational loss from the contribution stream, lower-quality transcoders can noticeably degrade the quality of the final output. You also need a contribution encoder for each camera, which can jack up hardware costs in high-volume igaming and similar applications.

Instead, for some specific use cases, you should consider the workflow shown in Figure 3. Here, you transcode on-site and send the full encoding ladder to a public CDN for external delivery and to a private CDN or equivalent for local viewing. This decreases latency to a minimum and produces absolute top quality as you avoid the additional transcoding step.

Figure 3. Encoding and packaging the encoding ladder on site and transmitting the streams to a public CDN for external viewers and a private CDN for local viewers.

This schema is particularly useful for venues that want to enhance the in-stadium experience with multiple camera feeds. Imagine a stock car race where an attendee only sees his driver on the track once every minute or so. Encoding on-site might allow attendees to watch the camera view from inside their favorite driver’s car with near real-time latency. It might let golf fans follow multiple groups while parked at a hole or following their favorite player.

If you’re encoding input from many cameras, say in a casino or even a racetrack environment, the cost of on-site encoding might be less than the cost of the individual contribution encoders. So, you get the best of all worlds: lower cost per stream, lower latency, higher quality, and a better in-person experience where applicable.

If you’re interested in learning about your transcoding options, check out our symposium Building Your Own Live Streaming Cloud, where you can hear from multiple technology experts discussing transcoding options like CPU-only, GPU, and ASIC-based transcoding and their respective costs, throughput, and density.

If you’re interested in learning more about HESP, THEO in general, or THEOlive, watch for an upcoming episode of Voices of Video, where I interview Pieter-Jan Speelman, CTO of THEO Technologies. We’ll discuss HESP’s history and evolution, the power of THEOlive real-time streaming technology, and how to use it in your live production stack. Make sure you don’t miss it!

Now ON-DEMAND: Symposium on Building Your Live Streaming Cloud

From Cloud to Control: Building Your Own Live Streaming Platform

Cloud services are an effective way to begin live streaming. Still, once you reach a particular scale, it's common to realize that you’re paying too much and can save significant OPEX by deploying transcoding infrastructure yourself. The question is, how to get started?

NETINT’s Build Your Own Live Streaming Platform symposium gathers insights from the brightest engineers and game-changers in the live-video processing industry on how to build and deploy a live-streaming platform.

In just three hours, we’ll cover the following:

  • Hardware options for live transcoding and encoding to cut costs by as much as 80%.
  • Software options for producing, delivering, and playing your live video streams.
  • Co-location selection criteria to achieve cloud-like performance with on-premise affordability.


You’ll also hear from two engineers who will demystify the process of assembling a live-streaming facility, explain how they identified and solved key hurdles, and share real costs and performance data.

Cloud? Or your own hardware?

It’s clear to many that producing live streams via a public cloud like AWS can be vastly more expensive than owning your hardware. (You can learn more by reading “Cloud or On-Premises? The Streaming Dilemma” and “How to Slash CAPEX, OPEX, and Carbon Emissions Using the NETINT T408 Video Transcoder”). 

To quote serial entrepreneur David Hansson, who recently migrated two SaaS services from the cloud to on-premise, “Don’t let the entrenched cloud interests dazzle you into believing that running your own setup is too complicated. Everyone and their dog did it to get the internet off the ground, and it’s only gotten easier since.” 

For those who have only operated in the cloud, there’s fear of the unknown: fear of buying hardware transcoders, selecting the right software, and choosing the best colocation service. So, we decided to fight fear with education and host a symposium to educate streaming engineers on all these topics.

“Building Your Own Live Streaming Cloud” will uncover how owning your encoding stack can slash operating costs and boost performance with minimal CAPEX.

Learn to select the optimal transcoding hardware, transcoding and packaging software, and colocation facilities. We’ll also discuss strategies to reduce carbon emissions from your transcoding engine. 

This FREE virtual event takes place on August 17th, from 11:00 AM – 2:15 PM EST.

Five issues tackled by nine experts:

Transcoding Hardware Options:

Learn the pros and cons of CPU, GPU, and ASIC-based transcoding via detailed throughput and cost examples shared by Kenneth Robinson, Manager of Field Application Engineers at NETINT Technologies. Then Ilya Mikhaelis, Streaming Backend Tech Lead at Mayflower, will describe his company’s journey from CPU to GPU to ASICs, covering costs, power consumption, latency, and density metrics.

Software Options:

Jan Ozer from NETINT will identify the three categories of transcoding software: multimedia frameworks, media servers, and other tools. Then you’ll hear from experts in each category, starting with Romain Bouqueau, founder of Motion Spell, who will discuss the capabilities of the GPAC multimedia framework. Barry Owen, Chief Solutions Architect at Wowza, will discuss Wowza Streaming Engine’s suitability for private clouds. Lastly, Adrian Roe, Director at Id3as, developer of Norsk, will demonstrate Norsk’s simple, scripting-based operation, and extensive production and transcoding features.

Housing Options:

Once you select your hardware and software, the next step is finding the right co-location facility to house your live streaming infrastructure. Kyle Faber, with experience in building Edgio’s video streaming infrastructure, will guide you through the essential factors to consider when choosing a co-location facility.

Minimizing the Environmental Impact:

As responsible streaming professionals, it’s essential to address the environmental impact of our operations. Barbara Lange, Secretariat of Greening of Streaming, will outline actionable steps video engineers can take to minimize power consumption when acquiring and deploying transcoding servers.

Pulling it All Together:

Stef van der Ziel, founder of live-streaming pioneer Jet-Stream, will share lessons learned from his experience in creating both Jet-Stream’s private cloud and cloud transcoding solutions for customers. In his closing talk, Stef will demystify the process of choosing hardware, software, and a hosting facility, bringing all the previous discussions together into a cohesive plan.

Full Agenda:

11:00 am. – 11:10 am EST

Introduction (10 minutes):
Mark Donnigan, Head of Strategic Marketing at NETINT Technologies
Welcome, overview, and what you will learn.

 

11:10 am. – 11:40 am EST

Choosing transcoding hardware (30 minutes):
Kenneth Robinson, Manager of Field Application Engineers at NETINT Technologies
You have three basic approaches to transcoding, CPU-only, GPU, and ASICs. Kenneth outlines the pros and cons of each approach with extensive throughput and CAPEX and OPEX examples for each.

 

11:40 am. – 12:00 pm EST

From CPU to GPU to ASIC: Our Transcoding Journey (20 minutes):
Ilya Mikhaelis, Streaming Backend Tech Lead at Mayflower
Charged with supporting very high-volume live transcoding operations, Ilya started with libx264 software transcoding, which consumed massive power but yielded low stream density per server. Then he experimented with GPUs and other hardware and ultimately transitioned to an ASIC-based solution with much lower power consumption and much higher stream density per server. Ilya will detail the costs, power consumption, and density of all options, providing both data and an invaluable evaluation framework.

 

12:00 pm. – 12:10 pm EST

Choosing your live production software (10 minutes): 
Jan Ozer, Senior Director of Video Technology at NETINT Technologies
The core of every live streaming system is transcoding and packaging software. This comes in many shapes and sizes, from open-source software like FFmpeg and GPAC, to streaming servers like Wowza, and production systems like Norsk. Jan discusses these multiple options so you can cohesively and affordably build your own live-streaming ecosystem.

 

12:10 pm. – 1:10 pm EST

Speed Round (60 minutes):
20-minute presentations from GPAC, Wowza, and NORSK.
Speakers from GPAC, Wowza, and NORSK discussing the features, functions, operational paradigms, and cost structure of their live software offering.

Speakers include:

  • Adrian Roe, CEO at id3as, Product: Norsk, Title: Make Live Easy with NORSK SDK
  • Romain Bouqueau, Founder and CEO, Motion Spell (home for GPAC Licensing), Product: GPAC Title of Talk: Deploying GPAC for Transcoding and Packaging
  • Barry Owen, Chief Solutions Architect at Wowza, Title of Talk: Start Streaming in Minutes with Wowza Streaming Engine



1:10 pm. – 1:40 pm EST

Choosing a co-location facility (30 minutes): 
Kyle Faber, Senior Director of Product Management at Edgio.
Once you’ve chosen your hardware and software, you need a place to install them. If you don’t have your own connected data center, you may consider a colocation facility. In his talk, Kyle addresses the key factors to consider when choosing a co-location facility for your live streaming infrastructure.

 

1:40 pm. – 1:55 pm EST

How to Greenify Your Encoding Stack (15 minutes):
Barbara Lange, Secretariat of Greening of Streaming.
Learn how video streaming companies can work to significantly reduce their energy footprint and contribute to a greener streaming industry. Implement hardware and infrastructure optimization using immersion cooling and data center design improvements to maximize energy efficiency in your streaming infrastructure.

 

1:55 pm. – 2:15 pm EST

Closing Keynote (20 minutes):
Stef van der Ziel, Founder Jet-Stream
Jet-stream has delivered streaming solutions since its launch in 1994 and offers its own live streaming platform. One focus has been creating custom transcoding solutions for customers seeking to create their own private cloud for various applications. In his closing talk, Stef will demystify the process of choosing hardware, software, and a hosting facility and wrap a pretty bow around all previous presentations.