We’re going to talk about Vindral – but first, tell us a little bit about RealSprint?
RealSprint is a Swedish company based in Northern Sweden, which is a great place to be running a tech company. We’re in a university town, and any time after September it’s dark outside for most of the day, which means people generally try to find things to do inside. So, it’s a good place to have a tech business because you’ll have people spending a lot of time in front of their screens, creating things. RealSprint is a heavily culture-focused team, with the majority located in Northern Sweden and a few based in Stockholm and in the U.S.
The company started around 10 years ago as a really small team that did not have the end game figured out yet. All they knew was that they wanted to do something around video, broadcasting, and streaming. From there it’s grown, and today we’re 30 people.
At a high level, what is Vindral?
Vindral is actually a product family. There is a live CDN, as you mentioned, and there’s also video compositing software. The live CDN has been running 24/7 for around five or six years.
The product was born because we got questions from our clients about latency and quality: ‘Why do I have to choose between low latency and high quality?’ There are solutions on both ends of that spectrum, but when we got introduced to the problem, there weren’t really any good ones in between. We started looking into real-time technologies, like WebRTC, in its current state and quickly found that it’s not really suitable if you want high quality. It’s amazing in terms of latency. But the client’s reality requires more. You can’t go all in on only one aspect of a solution. You need something that’s balanced.
Draw us a block diagram. So, you’ve got your encoder, you’ve got your CDN, you’ve got software…
We can take a typical client in entertainment or gaming. So, they have their content, and they want to broadcast that to a global audience. What they generally do is they ingest one signal to our endpoint, which is the most standard way of using our CDN. And there are several ways of ingesting, using multiple transfer protocols.
The first thing that happens on our end is we create the ABR ladder. We transcode all the qualities that are needed since network conditions vary between markets. Even in places that are well connected, the home Wi-Fi alone can be so bad at times, with a lot of jitter and latency.
After the ABR ladder is created, the next box fans out to the places in the world where there are potential viewers. And from there, we also have edge software as one part of this. Lastly, the signal is received by the player instance on the device.
That’s basically it.
You’ve got an encoder in the middle of things creating the encoding ladder. Then you’ve got the CDN distributing. What about the software that you’ve contributed? How does that work? Do I log into some kind of portal and then administrate through there?
Exactly. Take a typical client in gaming, for example. They’re running 50 or 100 channels. And they want to see what’s going on in their operations, understand how much data is flowing through the system, and things like that. There is a portal where they can log in, see their usage, and see all of the channel information that they would need. It’s a very important part, of course, of any mature system that the client understands what’s going on.
Encoding is particularly important for us to solve because we have loads of channels running 24/7. So, that’s different. If you’re running a CDN, and your typical client is broadcasting for 20 minutes a month, then, of course, the encoding load is much lower. In our case, yes, we do have those types (minimal usage), but many of our clients are heavy users, and they own a lot of content rights. Therefore, the encoding part is several hundred terabytes ingested monthly, and that is with only one quality for each stream on the ingest side.
You’re encoding ABR. Which codecs are you supporting? And which endpoints are you supporting?
So, codec-wise, everybody does H.264, of course. That’s the standard when it comes to live streaming with low latency. We have recently added AV1, as well, which was something we announced as a world first. We weren’t the world’s first with AV1, but we were the world’s first with AV1 at what many would call real-time. We call it low latency.
We chose to add it because there’s a market pointing to AV1.
Which devices are you targeting? Is it TV? Smart TV? Mobile? The whole gamut?
I would say the whole gamut. That list of devices is steadily growing. I’m trying to think of any devices that we don’t support. Essentially, as long as it’s using the internet, we deliver to it. Any desktop or mobile browser, including iOS as well.
iOS is, basically, the hardest one, because browsers on iOS are all running iOS Safari. We’re getting the same performance on iOS Safari. And then there are Apple TV, Google Chromecast, Samsung and LG TVs, and Android TVs. There’s a plethora of different devices that our clients require us to support.
4K? 1080p? HDR? SDR?
Yes, we support all of them. One of the very important things for us is to prove that you can get quality on low latency.
Take a typical client. They’re broadcasting sports and their viewers are used to watching this on their television, maybe a 77-inch or 85-inch TV. You don’t want that user to get a 720p stream. This is where the configurable latency really comes into play, allowing the client to pick a second of latency or 800 milliseconds, with 4K maintained at that latency. That is one of the use cases where we shine.
There’s also a huge market for lower qualities as well, where that’s important.
So, you mentioned ABR ladders, and yes, there are markets where you get 600 kilobits per second on the last mile. You need a solution for that as well.
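Editor’s note: the idea of an ABR ladder serving both a 4K living-room screen and a 600 kbps last mile can be sketched in a few lines. The ladder values, the `Rendition` type, and the `pick_rendition` helper below are hypothetical and chosen only for illustration; they are not Vindral’s actual profiles or selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical ladder; real ladders are tuned per content type and market.
LADDER = [
    Rendition("2160p", 2160, 16000),
    Rendition("1080p", 1080, 6000),
    Rendition("720p", 720, 3000),
    Rendition("360p", 360, 800),
    Rendition("240p", 240, 400),
]


def pick_rendition(throughput_kbps: float, headroom: float = 0.8) -> Rendition:
    """Return the highest-bitrate rendition that fits within a safety margin.

    `headroom` reserves part of the measured throughput for jitter.
    Falls back to the lowest rendition when nothing fits.
    """
    budget = throughput_kbps * headroom
    candidates = [r for r in LADDER if r.bitrate_kbps <= budget]
    if not candidates:
        return min(LADDER, key=lambda r: r.bitrate_kbps)
    return max(candidates, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for kbps in (600, 8000, 25000):
        print(kbps, "kbps ->", pick_rendition(kbps).name)
```

With these assumed numbers, a viewer measured at 600 kbps lands on the lowest rung, while a well-connected viewer gets the top one.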
Your system is the delivery side, the encoding side. Which types of encoders did you consider when you chose the encoder to fit into Vindral?
There are actually two steps to consider depending on whether we’re doing it on-prem or off, like a cloud solution. The client often has their own encoders. Many of our clients use Elemental or something similar just to push the material to us. But on the transcoding, where we generate the ladder, unless we’re passing all qualities through (which is also a possibility), there are, of course, different ways and different directions to go for different scenarios. For example, you can take an Intel CPU-based server and use software to encode. That is a viable option in some scenarios, but not in all.
There’s an Nvidia GPU, for example, which you could use in some scenarios since there are many factors coming into play when making that decision.
The highest priority of all is something that our business generally does badly: maintaining business viability. You want to make sure that any client that is using the system can pay and make their business work. Now, if we have channels that are running 24/7, as we do, and if it’s in a region where it’s possible to allocate bare metal or colocation space, then that is a fantastic option in many ways.
CPU-based, GPU-based, and ASIC-based encoding are all different, and those are the three options that we’ve looked into.
So, how do you differentiate? You talked about software being a good option in some instances. When is it not a good option?
No option is good or bad in a sense, but if you compare them, both the GPU and the ASIC outperform the software encoding when it comes to heavier use.
The software option is useful when you need to spin it up, spin it down, and you need to move things. You need it to be flexible which is, usually, in the lower revenue parts of the markets.
When it comes to the big broadcasters and the large rights holders, where the use case is heavier, with many channels and large usage over time, the GPU and especially the ASIC make a lot of sense.
You’re talking there about density. What is the quality picture?
A lot of people think software quality is going to be better than ASIC and GPUs. How do they compare?
It might be in some instances. We’ve found that the quality when using ASICs is fantastic. It all depends on what you want to do, because we need to understand we’re talking about low latency here. We don’t have the option of multi-pass encoding or anything like that. Everything needs to work in real time. Our requirement on encoding is that each frame is encoded within one frame’s time, and that’s all the time that you get.
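Editor’s note: read as a time budget, the one-frame requirement is simple arithmetic: at a given frame rate, the encoder has one frame interval to finish each frame. The sketch below is only an illustration of that reading; the function names and the example encode times are assumptions, not measurements from Vindral.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to encode one frame in a real-time pipeline."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(encode_ms_per_frame: float, fps: float) -> bool:
    """True if the encoder finishes each frame within one frame interval."""
    return encode_ms_per_frame <= frame_budget_ms(fps)


if __name__ == "__main__":
    for fps in (25, 50, 60):
        print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
    # Hypothetical encode times, for illustration only.
    print(keeps_up(15.0, 60), keeps_up(18.0, 60))
```

At 50 fps the budget is 20 ms per frame; at 60 fps it shrinks to roughly 16.7 ms, which is why multi-pass techniques are off the table.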
You mentioned density, but there are a lot of other things coming into play, quality being one.
If you’re looking at ASICs, you’re comparing that to GPUs. In some scenarios we’ve had for the past two years, the decision could have been based on the availability factor – there’s a chip shortage. What can I get my hands on? In some cases, we’ve had a client banging on the door, and they want to go live right away.
Going back to the density part. That is a huge game changer because the ASIC is unmatched in terms of the number of streams per rack unit. If you just measure that KPI, and you’re willing to do the job of building your CDN in co-location spaces, which not everybody is, then that’s it. You have to ask yourself, though, who’s going to manage this? You don’t want to bloat when you’re managing this type of solution. If you have thousands of channels running, then cost is one thing when it comes to not having to take up a lot of rack space, but also, you don’t want it to bloat too much.
How formal an analysis did you make in choosing between the two hardware alternatives? Did you bring it down to cost per stream and power per stream?
Did you do any of that math? How did you make that decision between those two options?
Well, in a way, yes. But, on that particular metric, we need to look at the two options and say well, this is at a tenth of the cost. So I’m not going to give you the number, because I know it’s so much smaller.
We’re well aware of what costs are involved, but the cost per stream depends on profiles, etc., so it’s hard to just compare them. We have, naturally, started encoding streams, especially in AV1. We look at what the actual performance is, how much load there is, what’s happening on the cards, and how much you can put on them before they start giving in… But then… there’s such a big difference…
Take, for example, a GPU. A great piece of hardware. But it’s also kind of like buying a car for the sound system. Because the GPU… If I’m buying an NVIDIA GPU to encode video, then I might not even be using the actual rendering capabilities. That is the biggest job that the GPU is typically built for. So, that’s one of the comparisons to make, of course.
What about the power side? How important is power consumption to either you yourself or your customers?
If you look at the energy crisis and how things are evolving I’d say it is very, very important. The typical offer you’ll be getting from the data center is: we’re going to charge you 2x the electrical bill. And that’s never been something that’s been charged because they don’t even bother. Only now, we’re seeing the first invoices coming in where the electrical bill is part of it. In Germany, the energy price peaked in August at 0.7 Euros per kilowatt hour.
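Editor’s note: to put the quoted figures in perspective, the sketch below turns a price per kilowatt hour and a data-center markup into a monthly bill for one server. The 0.7 EUR/kWh price and the 2x markup come from the answer above; the 500 W average draw is a purely hypothetical figure, as is the function name.

```python
def monthly_power_cost_eur(avg_watts: float, eur_per_kwh: float,
                           markup: float = 1.0,
                           hours: float = 24 * 30) -> float:
    """Electricity cost for a device running continuously for `hours`.

    `markup` models a data center billing a multiple of the raw energy cost.
    """
    kwh = avg_watts / 1000.0 * hours
    return kwh * eur_per_kwh * markup


if __name__ == "__main__":
    # 500 W is an assumed average draw, not a measured value.
    cost = monthly_power_cost_eur(500, 0.7, markup=2.0)
    print(f"about {cost:.0f} EUR per 30-day month")
```

Under those assumptions a single always-on server costs roughly 504 EUR a month in power alone, so the power drawn per stream becomes a visible line item once channels run 24/7.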
Frankfurt, Germany, is one of the major exchanges that is extremely important. If you want performance streaming, you need to have something in Frankfurt. There’s another part of it as well, which is, of course, the environmental aspect of it. One thing is the bill that you’re getting. The other thing is the bill we’re leaving to our children.
It’s kind of contradictory because many of our clients make travel unnecessary. We have a Norwegian company that we’re working with that is doing remote inspections of ships. They were the first company in the world to do that. Instead of flying in an inspector, the ship owner, and two divers to the location, there’s only one operator of an underwater drone that is on the location. Everybody else is just connected. That’s obviously a good thing for the environment. But what are we doing?
Why did you decide to lead with AV1?
That’s a really good question. There are several reasons why we decided to lead with AV1. It is very compelling as soon as you can do it in real time. We had to wait for somebody to make it viable, which we found with NETINT’s ASIC.
Viable, as in at high quality and with latency and reliability that we could use and also, of course, with throughput. We don’t have to buy too much hardware to get it working.
We’re seeing markers that our clients are going to want AV1. And there are several reasons why that is the case. One of which is, of course, it’s license free. If you’re a content owner, especially if you’re a content owner with a large crowd with many subscribers to your content, that’s a game-changer. Because the cost of licensing a codec can grow to become a significant part of your business expenses.
Look at what’s happening with FAST, free ad-supported television. There you’re trying to get even more viewers. And you have lower margins, so what you’re doing is creating eyeball minutes. And then, if you have codec and license costs, that’s a bit of an issue. It’s better if it’s free.
Is this what you’re hearing from your customers? Or is this what you’re assuming they’re thinking about?
That’s what we’re hearing from our customers, and that’s why we started implementing it.
For us, there’s also the bandwidth-to-quality aspect, which is great. I believe that it will explode in 2023. For example, if you look at what happened one month ago, Google made hardware decoding mandatory for Android 14 devices. That’s both phones and tablets. It opens so many possibilities.
We were not expecting to get business on it yet, but we are, and I’m happy about that. There are already clients reaching out because of the licensing aspect, as some of them are transmitting petabytes a month. If you can bring down the bandwidth while retaining the quality, that’s a good deal.
You mentioned before that your systems allow the user to dial in the latency and the quality. Could you explain how that works?
It’s important to make a difference between the user and the broadcaster. Our client is the broadcaster that owns the content, and they can pick the latency.
Vindral’s live CDN doesn’t work on a ‘fetch your file’ basis. The way it works is we’re going to push the file to you, and you’re going to play it out. And this is how much you’re going to buffer. Once you have that setup, and, of course, a lot of sync algorithms and things like that at work, then the stream is not allowed to drift.
A typical use case is live auctions, for example. The typical setup for live auctions is 1080p, and you want below one second of latency because people are bidding. There are also people bidding in the actual auction house, so there’s the fairness aspect of it as well.
What we typically see is they configure maybe a 700-millisecond buffer, and it makes it possible. Even that small of a buffer makes such a huge difference. What we see in our metrics is that, basically, 99% of the viewers are getting the highest quality stream across all markets. That’s a huge deal.
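Editor’s note: the interview does not describe how the player keeps the stream from drifting away from the configured buffer. One common technique in low-latency players is to nudge the playback rate slightly whenever the buffered amount departs from the target. The sketch below illustrates that general idea only; it is not Vindral’s algorithm, and the function name, tolerance, gain, and limits are all assumptions.

```python
def playback_rate(buffered_ms: float, target_ms: float,
                  tolerance_ms: float = 50.0,
                  gain: float = 0.1,
                  max_adjust: float = 0.05) -> float:
    """Return a playback-rate multiplier that pulls the buffer toward target.

    More buffered than the target means the player is falling behind live,
    so it speeds up. Less buffered than the target means it risks stalling,
    so it slows down. Small errors are ignored to avoid constant adjustment.
    """
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    error_ms = buffered_ms - target_ms
    if abs(error_ms) <= tolerance_ms:
        return 1.0
    # Proportional correction, clamped so the change stays hard to notice.
    adjust = gain * error_ms / target_ms
    adjust = max(-max_adjust, min(max_adjust, adjust))
    return 1.0 + adjust


if __name__ == "__main__":
    for buffered in (300, 700, 1400):
        print(buffered, "ms buffered ->", playback_rate(buffered, 700))
```

With a 700-millisecond target, a player holding 1400 ms would play slightly faster than real time until it is back at the target, and one holding 300 ms would play slightly slower.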
What’s the lowest latency you support, and how much does the quality drop off at that latency as compared to one or two seconds?
I would say that the lowest that we would maybe recommend somebody to use our system for is 500 milliseconds. That would be about 250 milliseconds slower than a WebRTC-based real-time solution. And why do I say that? It’s because below that, I see no reason to use our approach. If you don’t want a buffer, you may as well use something else.
Actually, we don’t have that many clients trying that out, because for most of them 500 milliseconds is the lowest anybody sets. And they’ve been like ‘this is so quick we don’t need anything more’. And it retains 4K at that latency.
How does the pitch work against WebRTC?
If I’m a potential customer of yours and you come in and talk about your system compared to WebRTC, what are the pros and cons of each? It’s an interesting technological decision. I know that WebRTC is potentially going to be lower latency, but it might only be one stream, it may not come with captioning, and it’s not going to be ABR. It would be interesting to hear how you differentiate.
Let’s look from the perspective of when you should be using which. If you need to have a two-way voice conversation, you should use WebRTC. There are actually studies that have been made proving that if you bring the latency up above 200 milliseconds, the conversation starts feeling awkward. If you have half a second, it is possible, but it’s not good. So, if that’s an ultimate requirement, then WebRTC all day long.
Both technologies are actually very similar. The main difference I would point out is that we have added this buffer that the platform owner can set. So, the player instance sits at that buffer level. WebRTC currently does not support that. And if it did, we might even implement that as an option. And it might go that way at some point. Today it’s not.
On the topic of differences, then. If 700 or 600 milliseconds of latency is good for you and quality is still important, then you should be using a buffer and using our solution. When you’re considering different vendors, the feature set, and what you’re actually getting in the package, there are huge differences. For some vendors, on their lower-tier products, ABR is not included. Things like that. Where the obvious thing is – you should be using ABR. Definitely.
You talked about the shortest. What’s the longest latency you see people dialing in?
We’ve actually had one use case in Hong Kong where they chose to set the latency at 3.7 seconds. That was because the television broadcast was at 3.7 seconds.
That’s the other thing. We talk a lot about latency. Latency is a hot topic, but honestly, many of our clients value synchronization even above latency. Not all clients, but some of them.
If you have a game show where you want to react to the chat and have some sort of interactivity… Maybe you have 1.5 seconds. That’s not a big issue if it’s at 1.5 seconds of latency. You will, naturally, get a little bit more stability since you’re increasing the buffer. Some of our clients have chosen to do that.
But around 3.5… That’s actually the only client we’ve had that has done that. But I think there could be more in the future. Especially in sports. If you have the satellite broadcast… It is at seven seconds of latency. We can match it to the hundreds of milliseconds.
And the advantage of higher latency is going to be stream stability and quality.
Do you know what the quality difference is going to be?
Definitely. However, as soon as you’re above even one second, the returns are diminishing. It’s not like it unlocks this whole universe of opportunities. On extreme markets, it might, but I would think that if you’re going above two seconds, you’re kind of done. There is no need to go higher. At least our clients have not found that need. The markets are basically from East Asia to South America and South Africa because we’ve expanded our CDN into those parts.
You’ve spoken a couple of times about where you install your equipment, and you’re talking about co-locating and things like that. What does your typical server look like? How many encoders are you putting in it? And what type of density are you expecting from that?
In general, it would be something like this: one server can do 10 times as many streams if you’re using the ASIC. If you’re using GPUs, like Nvidia, for example, it’s probably just the one times. I’m not stating any numbers, because my IT guys are going to tell me that I was wrong.
What is the cost of low latency? If I decide to go to the smallest setting, what is that going to cost me? I guess there’s going to be a quality answer, and there’s going to be a stability answer… Is there a hard economic answer?
My hope is that there shouldn’t be a cost difference, depending on regions. The way we’ve chosen to operate is about the design paradigm of the product that you’ve created. We have competitors that are going with one partner. They’ve picked cloud vendor X, and they’re running everything in their cloud. And then what they can do is limited to the deal with that cloud vendor.
For example, we had an AV1 request from Greece. It was for an internet TV channel with a huge egress that I was blown away by, and they mentioned their pricing. They wanted to save costs by cutting their traffic by using AV1. What we did with that request is we went out to our partners and vendors and asked them, ‘Can you help us match this?’ And we did. From a business perspective, it might, in some cases, cost more. But there is also a perception of high cost that plagues the low latency business, and that is because many of these companies have not considered their power consumption or their form factors.
It also takes actually being willing to make a CAPEX investment instead of just running in the cloud and paying as you go. Those are the things that we’ve chosen to put the time into so that there will not be that big a difference.
Take, for example, Tata Communications, one of our biggest partners, and their pricing. They’re running our software stack in their environments to run their VDM, and it’s on a cost parity. So that’s something that should always be the aim. Then, I’m not going to say it’s always going to be like that, but that’s just a short version when you’re talking about the business implications.
We’re often getting requests where the potential client has this notion that it’s going to be a very high cost. Then they find that this makes sense, and we can build a business.
Are you seeing companies moving away from the cloud towards creating their own co-located servers with encoders and producing that way, as opposed to paying cents per minute to different cloud providers?
I would say I’m seeing the opposite. We’re doing both, just to be clear. I think the way to go is to do a hybrid.
For some clients, they’re going to be broadcasting 20 minutes a month. Cloud is awesome for that. You spin it up when you need it, and you kill it when it’s done. But that’s not always going to cut it. But if you’re asking me what motion I’m seeing in the market? There are more and more of these companies that are deploying across one cloud. And that’s where it resides. There are also types of offerings that you can instance yourself in third-party clouds, which is also an option. But again, it’s the design choice that it’s a cloud service that uses underlying cloud functions. It’s a shame that it’s not more of both. It creates an opportunity for us, though.
What are the big trends that you’re chasing for 2023 and beyond? What are you seeing? What forces are going to impact your business? The new features you’re going to be picking up? What are the big technology directions you’re seeing?
I mean, for us on our roadmap, we have been working hard on our partner strategy, and we’ve been seeing a higher demand for white-label solutions, which is what we’re working on with some partners.
We’ve done a few of those installs, and that’s where we are putting a lot of effort into it because we’re running our own CDN. But we can also enable others to do it, even as a managed service. You have these telcos that have maybe an edge or less offering since before, and they’re sitting on tons of equipment and fiber. So that’s one thing.
If we’re making predictions, there are two things worth a mention. I would expect the sports betting markets, especially in the US, to explode. That’s something we are definitely keeping our eyes on.
Maybe live shopping becomes a thing outside of China. Many of the big players, the big retailers, and even financial companies, are working on their own offerings and live shopping.
The dinosaurs’ agreement?
Have I told you about the dinosaurs’ agreement? It’s comparable to a gentleman’s agreement. This might be provocative to some. And I get that it’s complicated in many cases.
There is, among some of the bigger players and also among independent consultants that have different stakes, a sort of mutual agreement to keep asking the question – do we really need low latency? Or do we really need synchronization?
And while it’s a valid question, it’s also kind of a self-fulfilling prophecy. Because as long as the bigger brands are not creating the experience that the audience is waiting for them to create, nobody’s going to have to move. So that is what I’m calling the dinosaurs here. They’re holding on to the thing that they’ve always been doing. And they’re optimizing that, but not moving on to the next generation. And the problem they’re going to be facing, hopefully, is that when it reaches critical mass, the viewers are going to start expecting it, and that’s when things might start changing.
There are many workflow considerations, of course. There are tech legacy considerations. There are cost considerations and different aspects when it comes to scaling. However, saying that you don’t need low latency is a bit of an excuse.