Key Cloud Gaming Concepts with Blacknut’s Olivier Avaro
Recently, our own Mark Donnigan interviewed Olivier Avaro, the CEO of Blacknut, the world’s leading pure-player cloud gaming service. Cloud gaming is an emerging market that is new to many, and the interview covered a comprehensive range of topics clearly and concisely. For this reason, we decided to summarize some of the key concepts in this post. If you’d like to listen to the complete interview, and we recommend you do, click here. Otherwise, you can read a lightly edited summary of the key topics below.
For perspective, Avaro founded Blacknut in 2016, and the company offers consumers over seven hundred premium titles for a monthly subscription, with service available across Europe, Asia, and North America on a wide range of devices, including mobile devices, set-top boxes, and smart TVs. Blacknut also distributes through ISPs, device manufacturers, OTT services, and media companies, offering a turnkey service, including infrastructure and games, that allows businesses to instantly offer their own cloud gaming service.
Cloud Gaming Primer - the key points covered in the interview
The basic cloud gaming architecture is simple.
The architecture of cloud gaming is simple. You take games, put them on a server in the cloud, and virtualize and stream them as video so that you don’t have to download the game on the client side. When you interact with the game, you send commands back to the server, and that’s how you play.
Of course, bandwidth needs to be sufficient, let’s say six megabits per second. Latency needs to be good, let’s say less than 80 milliseconds. And, of course, you need to have the right infrastructure on the server that can run games. This means a mixture of CPU, GPU, storage, and all this needs to work well.
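To make those two thresholds concrete, here is a minimal, purely illustrative check in Python. The numbers come straight from Avaro’s comments (roughly 6 Mbps and under 80 milliseconds, with first-person shooters needing closer to 30 milliseconds, as he explains later in the interview); the function name and inputs are ours, not part of any Blacknut API.

```python
# Purely illustrative thresholds taken from the interview.
MIN_BANDWIDTH_MBPS = 6.0   # "let's say six megabits per second"
MAX_LATENCY_MS = 80.0      # "less than 80 milliseconds" for most games
MAX_LATENCY_FPS_MS = 30.0  # first-person shooters need far less (discussed later)

def connection_ok(bandwidth_mbps: float, latency_ms: float, fps_title: bool = False) -> bool:
    """Return True if the connection clears the cloud gaming thresholds."""
    budget = MAX_LATENCY_FPS_MS if fps_title else MAX_LATENCY_MS
    return bandwidth_mbps >= MIN_BANDWIDTH_MBPS and latency_ms <= budget

print(connection_ok(12.0, 65.0))                  # True: fine for most games
print(connection_ok(12.0, 65.0, fps_title=True))  # False: too slow for shooters
```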
But cost control is key.
We passed the technology inflection point where the service becomes feasible. Technically feasible, meaning the experience is good enough for the mass market. Now, the issue is the unit economics and how much it costs to stream and deliver games in an efficient manner so that it is affordable for the mass market.
Public Cloud is great for proof of concept.
We started deploying the service based on the public cloud because this allowed us to test the different metrics, how people were playing the service, and how many hours. And this was actually very fast to launch and to scale… That’s great, but the public cloud is quite expensive.
But you need your own infrastructure to become profitable.
So, to optimize the economics, we built what we call the hybrid cloud for cloud gaming, which is a combination of both the public cloud and private cloud. So, we must install our own servers, based on GPUs, CPUs, and so on, so we can improve the overall performance and the unit economics of the system.
Cost per concurrent user (CCU) is the key metric.
The ultimate measure is the cost per concurrent user that you can get on a specific bill of materials. If you have a CPU-plus-GPU architecture, the goal is to slice the GPU into different pieces in a more dynamic and more appropriate manner so that you can run different games, and as many games as possible.
GPU-only architectures deliver a high cost per CCU, which decreases profitability.
There are some limits on how much you can slice the GPU and still be efficient, so there are limits to this architecture because it all relies on the GPU. We are investigating different architectures using a VPU, like NETINT’s, that offloads the task of encoding and streaming the video from the GPU so that we can augment the density.
VPU-augmented architectures decrease the cost per CCU by a factor of ten.
For some big games, because they rely much more on the GPU, you will probably not augment the density that much. But we think that overall, we can probably gain a factor of ten on the number of games that you can run on this kind of architecture. So, going from a maximum of 20 to 24 games to running two hundred games on an architecture of this kind.
Which radically increases profitability.
So, augmenting the density by a factor of ten also means, of course, diminishing the cost per CCU by a factor of ten. So, if you pay $1 currently, you will pay ten cents, and that makes a whole difference. Because let’s assume basic gamers will play 10 hours or 30 hours per month; if this costs $1 per hour, this is $10 to $30, right? If this is ten cents, then costs are from $1 to $3, which I think makes the math work on a subscription of between 5 and 15 euros per month.
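Here is the same back-of-the-envelope math as a short Python sketch. All figures are taken from the quote above and are illustrative, not measured data.

```python
# Back-of-the-envelope math from the quote above; illustrative figures only.
cost_per_ccu_hour_gpu = 1.00              # ~$1/hour on a GPU-only architecture
density_gain = 10                         # VPU offload: ~10x more games per server
cost_per_ccu_hour_vpu = cost_per_ccu_hour_gpu / density_gain

for hours_per_month in (10, 30):
    gpu_cost = hours_per_month * cost_per_ccu_hour_gpu
    vpu_cost = hours_per_month * cost_per_ccu_hour_vpu
    print(f"{hours_per_month} h/month: GPU-only ${gpu_cost:.0f}, VPU-augmented ${vpu_cost:.2f}")

# 10 h/month: GPU-only $10, VPU-augmented $1.00
# 30 h/month: GPU-only $30, VPU-augmented $3.00
# Only the second column fits comfortably inside a 5-15 euro monthly subscription.
```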
The secret sauce is peer-to-peer DMA.
[Author’s note: These comments, explaining how NETINT VPUs deliver a 10x performance advantage over GPUs, are from Mark Donnigan].
For anybody who understands basic server architecture, it’s not difficult to think: wait a second, isn’t there a bottleneck inside the machine? What NETINT did was create peer-to-peer sharing via DMA (Direct Memory Access). The GPU outputs a rendered frame, and it’s transferred inside memory so that the VPU can pick it up and encode it, and there’s effectively zero latency because it’s all happening in the memory buffer.
5G is key to successful gameplay in emerging markets.
[Back to Olivier] What we’ve been doing with Ericsson is using 5G networks and defining the specific characteristics of a network slice in 5G. So, we can tune the 5G network to make it fit for gaming and to optimize the delivery of gaming over 5G.
So, we think that 5G is going to get much faster in those regions where the internet is not so great. We’ve been deploying the Blacknut service in Thailand, Singapore, Malaysia, and now in the Philippines. This has allowed us to reach people in regions where there is no cable or fiber bandwidth.
Latency needs to be eighty milliseconds or less (much less for first-person shooter games).
You can get a reasonably good experience at 80 milliseconds for most games. But for first-person shooter games, you need to be close to frame accuracy, which is very difficult in cloud gaming. You need to go down to thirty milliseconds and lower, right?
That’s only feasible with the optimal network infrastructure.
And that’s only feasible if you have a network that allows for it. Because it’s not only about the encoding part, the server side, and the client side; it’s also about where the packets are going through the networks. You need to make sure that there is some form of CDN for cloud gaming in place that makes the experience optimal.
Edge servers reduce latency.
We are putting servers at the edge of the network, inside the carrier’s infrastructure, so the latency is super optimized. That’s one thing that is key for the service. We started with a standard architecture, with CPU and GPU. And now, with the current VPU architecture, we are deploying whole servers consisting of AMD GPUs and NETINT VPUs. We build the whole package, put it in the carrier’s infrastructure, and deploy Blacknut cloud gaming on top of it.
The best delivery resolution is device dependent.
The question is, again, the cost and the experience. Okay? Streaming 4K on a mobile device does not really make sense. The screen is smaller, so you can stream a smaller resolution and that’s sufficient. On a TV, you likely need a higher resolution. With the great upscaling available on most TV sets, we stream 720p on Samsung devices, and that’s super great, right? But of course, scaling up to 1080p will provide a much better experience. So, on TVs and for the games that require it, I think we’re indeed streaming the service at about 1080p.
Frame rates must match game speed.
When playing a first-person shooter, if you have the choice and cannot stream 1080p at 60 FPS, you would probably stream 720p at 60 FPS rather than 1080p at 30 FPS. But for games with elaborate textures, where the resolution is more important, maybe you will select 1080p at 30 FPS instead.
What we build is fully adaptable. Ultimately, you should not forget that there is a network in between. And even if technically you can stream 4K or 8K, the networks may not sustain it. Okay? And then you’ll have a worse experience streaming 4K than at 1080p 60 FPS resolution.
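The trade-off Avaro describes can be captured in a small, hedged decision sketch: favor frame rate for fast-paced titles, favor resolution for texture-heavy titles on big screens, and never exceed what the device or network can sustain. The rules and the fast_paced flag are our own simplification for illustration, not Blacknut’s actual adaptation logic.

```python
# Hypothetical decision logic, not Blacknut's adaptation algorithm.
def pick_stream_profile(device: str, network_mbps: float, fast_paced: bool) -> tuple:
    """Pick an illustrative (resolution, fps) pair for a game stream."""
    if device == "mobile":
        return ("720p", 60)        # small screen: a lower resolution is sufficient
    if fast_paced or network_mbps < 6:
        return ("720p", 60)        # protect frame rate over resolution
    return ("1080p", 30)           # texture-heavy titles on a TV

print(pick_stream_profile("tv", network_mbps=15, fast_paced=True))   # ('720p', 60)
print(pick_stream_profile("tv", network_mbps=15, fast_paced=False))  # ('1080p', 30)
```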
The Components That Make Cloud Gaming Production Affordable (or Not)
If you’ve made it past the title, you know that cloud gaming platforms operate in a highly competitive environment with narrow margins. This makes the purchase and operating costs per stream critical elements to system success.
This brief article will lay out the major cost elements of cloud gaming platforms and cite some commercial examples of hardware combinations and stream output. We’ve created a table you can use to collect the critical data points while looking at potential solutions around the NAB show, or if you’re simply browsing around the web. If you are at NAB, come by and see us at booth W1672 to discuss the NETINT solution shown in the table.
At their core, cloud gaming production systems perform three functions: game logic, graphics rendering, and video encoding (Figure 1). Most systems process the game logic on a CPU and the graphics on a GPU. Encoding can be performed via the host CPU, the GPU, or a separate transcoder like NETINT’s ASIC-based Quadra, which outputs H.264, HEVC, and AV1.

Figure 1. The three core functions of a cloud gaming system.
Given the different components and configurations, identifying the cost per stream is critical to comparison analysis. Obviously, a $25,000 system that outputs 200 720p60 streams (cost/stream = $125) is more affordable than a $10,000 system that outputs 25 720p60 streams (cost/stream = $400).
Power consumption per stream is also a major cost contributor. Assuming a five-year expected life, even a small difference between two systems will be multiplied by 60 months of power bills and will significantly impact TCO, not to mention environmental and regulatory considerations.
Finally, normalizing comparisons on a single form factor, like a 1RU or 2RU server, is also essential. Beyond the power cost of a system, rack space costs money, whether in colocation fees or your own in-house costs. The other side of this coin is system maintenance; it costs less to maintain five servers that deliver 1,000 streams than 20 servers that deliver the same output.
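A short Python sketch of that arithmetic is below. The component costs, power draws, and stream counts are placeholders, not the figures from Tables 1 and 2; substitute vendor quotes and measured power draw to build your own comparison as you walk the show floor.

```python
# All inputs below are placeholders, not the figures from Tables 1 and 2.
def per_stream_metrics(component_cost_usd, max_power_watts, streams_720p60,
                       usd_per_kwh=0.12, lifetime_months=60):
    """Cost/stream, watts/stream, and 5-year energy cost/stream (24/7 operation)."""
    cost_per_stream = component_cost_usd / streams_720p60
    watts_per_stream = max_power_watts / streams_720p60
    hours = lifetime_months * 30 * 24
    energy_cost_per_stream = watts_per_stream / 1000 * hours * usd_per_kwh
    return cost_per_stream, watts_per_stream, energy_cost_per_stream

# Hypothetical systems echoing the example in the text
for name, cost, watts, streams in [("System A", 25_000, 1_500, 200),
                                   ("System B", 10_000, 1_200, 25)]:
    c, w, e = per_stream_metrics(cost, watts, streams)
    print(f"{name}: ${c:.0f}/stream, {w:.1f} W/stream, ${e:.0f} power/stream over 5 years")
```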
HARD QUESTIONS ON HOT TOPICS
Get the cost per stream with the proper mix of GPU, CPU, and ASIC-based VPU
Watch the full conversation on YouTube: https://youtu.be/xaSRL847eIs
Comparing Systems
Enough talk; let’s compare some systems. Let’s agree up front that any comparison is unavoidably subjective, with results changing with the games tested and game configurations. You’ll almost certainly complete your own tests before buying, and at that point, you can ensure an apples-to-apples comparison. Use this information and the data you collect on your own to form a high-level impression of the value proposition delivered by each hardware configuration.
Table 1 details three systems, a reference design that is in mass production from NETINT, one from an established mobile cloud gaming platform, and one from Supermicro based on an Ampere Arm processor and four NVIDIA A16 GPUs.

Table 1. System configurations.
To compute the pricing information shown in Table 2, we priced each component on the web and grabbed maximum power consumption data from each manufacturer. Pricing and power consumption shown are for the components listed, not the entire system. The number of 720p outputs comes from each manufacturer, including NETINT.

Table 2. Component cost and power usage, total and on a cost-per-stream basis.
From there, it’s simple math: divide the cost and total watts by the 720p stream count to determine the cost per stream and watts per stream. Again, this is only for the core components identified, but the rest of the server should be relatively consistent irrespective of the CPU, GPU, and VPU that you use.
ASIC-based transcoders plus GPUs are the most cost-effective configuration to deliver a profitable and high-quality game streaming experience.
We are happy to share our data and sources so you can confirm independently.
As you walk the NAB show floor, or check proposed solutions on the web, beware of bespoke architectures built entirely on single-vendor solutions (e.g., all Intel, all NVIDIA, all AMD). Each company has demos that showcase its technology, but not operational competitiveness. None of these systems can meet the OPEX or CAPEX targets needed for a competitive and profitable cloud gaming solution.

We challenge you to get your own numbers and compare them!
Download the printable TABLE HERE
Region of Interest Encoding for Cloud Gaming: A Survey of Approaches
As cloud gaming use cases expand, we are studying even more ways to deliver high-quality video with low latency and efficient bitrates.
Region of Interest Encoding (ROI) is one way to enhance video quality while reducing bandwidth. This post will discuss three ROI-based techniques recently proposed in research papers that may soon be adopted in cloud gaming encoding workflows.
This blog is meant to be informative. If I missed any important papers or methods, feel free to contact me HERE.
Region of Interest (ROI) Encoding
ROI encoding allows encoders to prioritize frame quality in critical regions most closely scrutinized by the viewer and is an established technique for improving viewer Quality of Experience. For example, NETINT’s Quadra video processing unit (VPU) uses artificial intelligence (AI) to detect faces in videos and then applies ROI encoding to improve facial quality. The NETINT T408/T432 also supports ROI encoding, but the specific regions must be manually defined in the command string.
ROI encoding is particularly relevant to cloud gaming, where viewers prefer fast-moving action, high resolutions, and high frame rates, but also want to play at low bitrates on wireless or cellular networks with ultra-low latency. These factors make cloud gaming a challenging compression environment.
Whether for real-world videos or cloud gaming, the challenge with ROI encoding lies in identifying the most relevant regions of interest. As you’ll see, the three papers described below each take a markedly different approach.
In the paper “Content-aware Video Encoding for Cloud Gaming” (2019), researchers from Simon Fraser University and Advanced Micro Devices propose using metadata provided by the game developer to identify the crucial regions. As the article states,
“Identifying relevant blocks is straightforward for game developers, because they know the logic and semantics of the game. Thus, they can expose this information as metadata with the game that can be accessed via APIs... Using this information, one or more regions of interest (ROIs) are defined as bounding boxes containing objects of importance to the task being achieved by the player.”
The authors label their proposed method CAVE, for Content-Aware Video Encoding. Architecturally, CAVE sits between the game process and the encoder, as shown in Figure 1. Then, “CAVE uses information about the game’s ROIs and computes various encoding parameters to optimize the quality. It then passes these parameters to the Video Encoder, which produces the encoded frames sent to the client.”

Figure 1. The CAVE encoding method is implemented between the game process and encoder.
The results were promising. The technique “achieves quality gains in ROIs that can be translated to bitrate savings between 21% and 46% against the baseline HEVC encoder and between 12% and 89% against the closest work in the literature.”
Additionally, the processing overhead introduced by CAVE was less than 1.21%, which the authors felt would be reduced even further with parallelization, though implementing the process in silicon could completely eliminate the additional CPU loading.
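To make the idea more concrete, here is a loose sketch of translating game-supplied ROI bounding boxes into a per-macroblock quality map that a configurable encoder could consume. The data structures and QP offsets are hypothetical and are not the parameter model used in the CAVE paper.

```python
# Hypothetical translation of game-supplied ROI boxes into per-macroblock
# QP offsets (negative = higher quality); not the CAVE parameter model.
MB = 16  # macroblock size in pixels

def roi_qp_offsets(frame_w, frame_h, rois, roi_offset=-6, background_offset=2):
    cols, rows = frame_w // MB, frame_h // MB
    qp_map = [[background_offset] * cols for _ in range(rows)]
    for x, y, w, h in rois:  # ROI bounding boxes in pixels, supplied by the game
        for row in range(y // MB, min(rows, (y + h) // MB + 1)):
            for col in range(x // MB, min(cols, (x + w) // MB + 1)):
                qp_map[row][col] = roi_offset
    return qp_map

# e.g. a 1280x720 frame with one ROI around the player's reticle
qp_map = roi_qp_offsets(1280, 720, rois=[(560, 280, 160, 160)])
```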
ROI from Gaze Tracking
Another ROI-based approach was studied in the paper “Cloud Gaming With Foveated Video Encoding” by researchers from Aalto University in Finland and Politecnico di Torino in Italy. In this study, the region of interest was detected by a Tobii 4C Eye Tracker. This data was sent to the server, which used it to identify the ROI and adjust the Quantization Parameter (QP) values for the affected blocks accordingly.

Figure 2. Using region of interest data from a gaze tracker.
Referring to the title of this paper, the term ‘foveation’ refers to a “non-uniform sampling response to visual stimuli” that’s inherent to the human visual system. By incorporating the concept of foveation, the encoder can most effectively allocate QP values to the regions of interest and the areas around them, and seamlessly blend them with other regions within the frame.
As stated in the paper, to compute the quality of each macroblock, “the gaze location is translated to a macroblock based coordinate system. The macroblock corresponding to the current gaze location is assigned the lowest QP, while the QP of macroblocks away from the gaze location increases progressively with distance from the gaze macroblock.”
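A minimal sketch of that scheme might look like the following, with the lowest QP at the gaze macroblock and QP rising with distance. The linear ramp and the QP bounds are illustrative; the paper’s actual foveation function is more sophisticated.

```python
import math

MB = 16  # macroblock size in pixels

def foveated_qp(frame_w, frame_h, gaze_x, gaze_y, qp_min=22, qp_max=40, ramp=0.8):
    """Lowest QP at the gaze macroblock, rising with distance (illustrative ramp)."""
    cols, rows = frame_w // MB, frame_h // MB
    gaze_col, gaze_row = gaze_x // MB, gaze_y // MB
    qp_map = [[0] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            dist = math.hypot(col - gaze_col, row - gaze_row)  # distance in macroblocks
            qp_map[row][col] = min(qp_max, round(qp_min + ramp * dist))
    return qp_map

qp_map = foveated_qp(1280, 720, gaze_x=640, gaze_y=360)  # gaze at frame center
```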
The researchers performed extensive testing and analysis and ultimately concluded that “[o]ur evaluation results suggest that its potential to reduce bandwidth consumption is significant, as expected.” Regarding latency, the paper reports that “user study establishes the feasibility of FVE for FPS games, which are the most demanding latency wise.”
Obviously, any encoding solution tied to a gaze tracker has limited applicability, but the authors saw a much broader horizon ahead. “[w]e intend to attempt eliminating the need for specialized hardware for eye tracking by employing web cameras for the purpose. Using web cameras, which are ubiquitous in modern consumer computing devices like netbooks and mobile devices, would enable widespread adoption of foveated streaming for cloud gaming.”
Detecting ROI from Machine Learning
Finally, “DeepGame: Efficient Video Encoding for Cloud Gaming” was published in October 2021 by researchers from Simon Fraser University and Advanced Micro Devices, including three authors of the first paper mentioned above.
As detailed in the introduction, the authors propose “a new video encoding pipeline, called DeepGame, for cloud gaming to deliver high-quality game streams without codec or game modifications…DeepGame takes a learning-based approach to understand the player contextual interest within the game, predict the regions of interest (ROIs) across frames, and allocate bits to different regions based on their importance.”
At a high level, DeepGame is implemented in three stages:
- Scene analysis to gather data
- ROI prediction, and
- Encoding parameters calculation
Regarding the last stage, these encoding parameters are passed to the encoder via “a relatively straightforward set of APIs” so it’s not necessary to modify the encoder source code.
The authors describe their learning-based approach as follows: “DeepGame learns the player’s contextual interest in the game and the temporal correlation of that interest using a spatiotemporal deep neural network.” The schema for this operation is shown in Figure 3.
In essence, this learning-based approach means that some game-specific training is required beforehand, along with some processing during gameplay to identify ROIs in real time. The obvious questions are: how much latency does this process add, and how much bandwidth does the approach save?

Figure 3. DeepGame’s neural network-based schema for detecting region of interest.
Regarding latency, model training is performed offline and only once per game (and for major upgrades). Running the inference on the model is performed during each gaming session. During their testing, the researchers ran the inference model on every third frame and concluded that “ROI prediction time will not add any processing delays to the pipeline.”
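That cadence, running the expensive ROI prediction only on every third frame and reusing the last result in between, can be sketched as follows. The predict_rois function is purely a placeholder standing in for DeepGame’s neural network, not the paper’s actual implementation.

```python
def predict_rois(frame):
    """Placeholder for DeepGame's spatiotemporal model; returns ROI boxes."""
    return [(560, 280, 160, 160)]

def frames_with_rois(frames, inference_interval=3):
    """Run ROI prediction on every Nth frame and reuse it for the frames between."""
    cached_rois = []
    for index, frame in enumerate(frames):
        if index % inference_interval == 0:
            cached_rois = predict_rois(frame)   # refresh the prediction
        yield frame, cached_rois                # encoder consumes frame + cached ROIs

for frame, rois in frames_with_rois(range(6)):
    pass  # pass the frame and its ROIs to the encoder's rate control
```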
The researchers trained and tested four games: FIFA 20 (a soccer game), CS:GO (a first-person shooter), NBA Live 19, and NHL 19, and performed multiple analyses. First, they compared their predicted ROIs to actual ROIs detected using a Gazepoint GP3 eye-tracking device. Here, accuracy scores ranged from a high of 85.95% for FIFA 20 to a low of 73.96% for NHL 19.
Then, the researchers compared the quality in the ROI regions with an unidentified “state-of-the-art H.265 video encoder” using SSIM and PSNR. BD-Rate savings ranged from 20.80% to 33.01% for SSIM and from 19.11% to 35.06% for PSNR. They also compared overall frame quality using VMAF, which yielded nearly identical scores, showing that DeepGame didn’t degrade overall quality despite the bandwidth savings and the improved quality within the regions of interest.
The authors also performed a subjective study with the FIFA 20 and CS:GO games using x264 with and without DeepGame inputs. The mean opinion scores incorporated the entire game experience, including lags, distortions, and artifacts. In these tests, DeepGame improved the Mean Opinion Scores by up to 33% over the base encoder.
HARD QUESTIONS ON HOT TOPICS – CLOUD OR ON PREMISES, HOW TO DO THE MATH?
Watch the full conversation on YouTube: https://youtu.be/KIaYFS54QNY
Summary
All approaches have their pros and cons. The CAVE approach should be the most accurate in identifying ROIs but requires metadata from game developers. The gaze tracker approach can work with any game but requires hardware that many gamers don’t have and is unproven with webcams. Meanwhile, DeepGame can work with any game but requires pre-game training and involves running inference models during gameplay.
All appear to be very viable approaches for improving QoE and reducing bandwidth and latency while working with existing codecs and encoders. Unfortunately, none of the three proposals described seem to have progressed towards implementation. This makes ROI encoding for cloud gaming a technology worth watching, if not yet available for implementation.
Mobile cloud gaming and technology suppliers
Video games are a huge market segment, projected to reach US$221.4 billion in 2023, expanding to an estimated US$285 billion by 2027. Of that, cloud gaming grossed an estimated US$3 billion+ in 2022 and is projected to produce over US$12 billion in revenue by 2026.
While the general video game market generates minimal revenue from encoder sales, cloud gaming is the perfect application for ASIC-based transcoding. NETINT products were designed, in part, for cloud gaming and are extensively deployed in cloud gaming overseas. We expect to announce some high-profile domestic design wins in 2023.
If you’re not a gamer, you may not be familiar with what cloud gaming is and how it’s different from PC or console-based gaming. This is the first of several introductory articles to get you up to speed on what cloud gaming is, how it works, who the major players are, and why it’s projected to grow so quickly.
What is cloud gaming
Figure 1, from this article, illustrates the difference between PC/console gaming and cloud gaming. On top is traditional gaming, where the gamer needs an expensive, high-performance console or game computer to process the game logic and render the output. To the extent that there is a cloud component, say for multiple players, the online server tracks and reports the interactions, but all computational and rendering heavy lifting is performed locally.

Figure 1. The difference between traditional and cloud gaming. From this article.
On the bottom is cloud gaming. As you can see, all you need on the consumer side is a screen and game controller. All of the game logic and rendering are performed in the cloud, along with encoding for delivery to the consumer.
Cloud gaming workflow
Figure 2 shows a high-level cloud workflow – we’ll dig deeper into the cloud gaming technology stack in future articles, but this should help you grasp the concept. As shown, the gamer’s inputs are sent to the cloud, where a virtual instance of the game interprets, executes, and renders the input. The resultant frames are captured, encoded, and transmitted back to the consumer, where the frames are decoded and displayed.

Figure 2. A high-level view of the cloud side of cloud gaming from this seminal article.
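For the programmatically inclined, the loop in Figure 2 can be sketched in a few lines. Every function below is a self-contained stand-in for a real subsystem (input transport, game instance, frame capture, encoder, network, client decode); none of the names correspond to real APIs.

```python
# Self-contained stand-ins for each subsystem; none of these are real APIs.
def apply_inputs(game_state: dict, inputs: list) -> dict:
    game_state["last_inputs"] = inputs            # interpret and execute the input
    return game_state

def render(game_state: dict) -> bytes:
    return b"raw-frame"                           # stand-in for a GPU-rendered frame

def encode(frame: bytes) -> bytes:
    return b"encoded-" + frame                    # stand-in for H.264/HEVC/AV1 encoding

def cloud_gaming_loop(input_batches):
    game_state = {}
    for inputs in input_batches:                  # one batch of inputs per frame interval
        game_state = apply_inputs(game_state, inputs)
        frame = render(game_state)
        yield encode(frame)                       # packet streamed back to the client

for packet in cloud_gaming_loop([["jump"], [], ["move_left"]]):
    print(packet)                                 # the client decodes and displays each frame
```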
Cloud gaming’s consumer benefits
Cloud gaming services incorporate widely different business models, pricing levels, available games, performance envelopes, and compatible devices. In most cases, however, consumers benefit because:
- They don’t need a high-performance PC or game console to play games – they can play on most connected devices. This includes some Smart TVs for a true, big-screen experience.
- They don’t need to download, install, or maintain games on their game platform.
- They don’t need to buy expensive games to get started.
- They can play the same game on multiple platforms, from an expensive gaming rig or console to a smartphone or tablet, with all ongoing game information stored in the cloud so they can immediately pick up where they left off.
Publishers benefit because they get instant access to users on all platforms, not just the native platforms the games were designed for. So, console and PC-based games are instantly accessible to all players, even those without the native hardware. Since games aren’t downloaded during cloud gaming, there’s no risk of piracy, and the cloud negates the performance advantages long held by those with the fastest hardware, leveling the playing field for gameplay.
Gaming experience
Speaking of performance, what’s necessary to achieve a traditional local gameplay experience? Most cloud platforms recommend a 10 Mbps download speed at a minimum for mobile, with a wired Ethernet connection recommended for computers and smart TVs. As you would expect, your connection speed dictates performance, with 4K ultra-high frame rate games requiring faster connection speeds than 1080p@30fps gameplay.
As mentioned at the top, cloud gaming is expected to capture an increasing share of overall gameplay revenue going forward, both from existing gamers who want to play new games on new platforms and from new gamers. Given the revenue numbers involved, this makes cloud gaming a critical market for all related technology suppliers.