
If I'm operating a cloud service like Netflix, then I'm already running thousands of ffmpeg processes on each machine. In other words, it's already a multi-core job.


Latency is still valuable. For example, YouTube (which IIRC uses ffmpeg) often takes hours to do transcodes. That's likely partly due to scheduling, but assuming they can get the same result using 4x the threads in 1/4 the time, they would prefer that, since each job finishes faster. The only real question is at what efficiency cost the latency benefit stops being worth it.


I think that if you're operating at the scale of Google, using a single-threaded ffmpeg will finish your jobs in less time.

If you have a queue of 100k videos to process and a cluster of 100 cores, assigning a video to each core as it becomes available is the most efficient way to process them, because you're skipping the thread-joining overhead.

Any time there is a queue of jobs, assigning the next job in the queue to the next free core is always going to be faster than splitting the next job across multiple cores.
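
A toy sketch of the tradeoff (all numbers invented, including the 80% parallel-efficiency figure; real efficiency depends on the codec and the video):

```python
import math

# Toy model: 8 cores, 16 equal-length transcode jobs of 60s each.
# Strategy A: one single-threaded job per core (perfect efficiency).
# Strategy B: each job uses 4 threads at an assumed 80% efficiency.
CORES = 8
JOBS = 16
JOB_SECONDS = 60.0

# A: each core runs JOBS / CORES jobs back to back.
makespan_a = JOB_SECONDS * math.ceil(JOBS / CORES)

# B: 4 threads per job -> only 2 jobs fit at a time; each job takes
# 60 / (4 * 0.8) = 18.75s, but we need JOBS / 2 waves of them.
THREADS = 4
EFFICIENCY = 0.8
per_job = JOB_SECONDS / (THREADS * EFFICIENCY)
makespan_b = per_job * math.ceil(JOBS / (CORES // THREADS))

print(f"single-threaded queue: {makespan_a:.2f}s total")  # 120.00s
print(f"4-thread jobs:         {makespan_b:.2f}s total")  # 150.00s
```

With a full queue, the multithreaded strategy finishes the whole batch later (150s vs 120s) even though each individual job finishes sooner (18.75s vs 60s), which is exactly the latency-vs-throughput question raised above.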


YouTube does not use ffmpeg; at the scale at which they operate it would be too slow and expensive.

They use custom hardware just for encoding.

FYI, they have to transcode over 500 hours of video per minute. So multiply that by all the formats they support.

They operate at an insane scale; Netflix looks like a garage project by comparison.


There's still decoding. If a service claims to support all kinds of weird formats (like a MOV or AVI from the 90s) that means ffmpeg is running.


Google's use of ffmpeg: https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/

For encoding, recently, they've built their own ASIC to deal with H264 and VP9 encoding (for 7-33x faster encoding compared to CPU-only): https://arstechnica.com/gadgets/2021/04/youtube-is-now-build...


Facebook does, and contributes to ffmpeg.


Curious, what would that many ffmpeg processes be doing at Netflix? I assume new VOD content gets encoded once per format, and the amount of new content added per day is not gigantic.

Agree with the general premise, of course: if I've got 10 different videos encoding at once, then I don't need per-process multithreading because the CPU's already maxed out.


It's been reported in the past that Netflix encodes 120 different variants of each video they have [1] for different bitrates and different devices' needs.

And that was years ago, I wouldn't be surprised to learn it's a bigger number now.

[1] https://news.ycombinator.com/item?id=4946275


I assume they re-compress for each resolution/format, and quite possibly they also have different bitrate levels per resolution. Potentially even variants tweaked for certain classes of device (where this is not already covered by the combination of format/resolution/bitrate). I would also assume they re-compress with new advances in video processing (things like HDR and improved compression).

Also, their devs likely want fast feedback on changes - I imagine they might have CI running changes on some standard movies, checking various stats (like SNR) for regressions. Everybody loves it when their CI finishes fast, so you might want to compress even a single movie with multiple threads.


They'll be doing VBR encodes to DASH, HLS & (I guess still) MSS which covers the resolutions & formats... DRM will be what prevents high res content from working on some "less-trusted" platforms so the same encodes should work.

(Plus a couple more "legacy" encodes with PIFF instead of CENC for ancient devices, probably.)

New tech advances, sure, they probably do re-encode everything sometimes - even knocking a few MB off the size of a movie saves a measurable amount of $$ at that scale. But are there frequent enough tech advances to do that more than a couple of times a year..? The amount of difficult testing (every TV model group from the past 10 years, or something) required for an encode change is horrible. I'm sure they have better automation than anyone else, but I'm guessing it's still somewhat of a nightmare.

YouTube, OTOH, I really can imagine having thousands of concurrent ffmpeg processes.


Why bring up assumptions/suppositions about Netflix's encoding process?

Their tech blog and tech presentations discuss many of the requirements and steps involved for encoding source media to stream to all the devices that Netflix supports.

The Netflix tech blog: https://netflixtechblog.com/ or https://netflixtechblog.medium.com/

Netflix seems to use AWS CPU+GPU for encoding, whereas YouTube has gone to the expense of producing an ASIC to do much of their encoding.

2015 blog entry about their video encoding pipeline: https://netflixtechblog.com/high-quality-video-encoding-at-s...

2021 presentation of their media encoding pipeline: https://www.infoq.com/presentations/video-encoding-netflix/

An example of their FFmpeg usage - a neural-net video frame downscaler: https://netflixtechblog.com/for-your-eyes-only-improving-net...

Their dynamic optimization encoding framework - allocating more bits for complex scenes and fewer bits for simpler, quieter scenes: https://netflixtechblog.com/dynamic-optimizer-a-perceptual-v... and https://netflixtechblog.com/optimized-shot-based-encodes-now...

Netflix developed an algorithm for determining video quality - VMAF, which helps determine their encoding decisions: https://netflixtechblog.com/toward-a-practical-perceptual-vi..., https://netflixtechblog.com/vmaf-the-journey-continues-44b51..., https://netflixtechblog.com/toward-a-better-quality-metric-f...


> Their dynamic optimization encoding framework - allocating more bits for complex scenes and fewer bits for simpler, quieter scenes: https://netflixtechblog.com/dynamic-optimizer-a-perceptual-v... and https://netflixtechblog.com/optimized-shot-based-encodes-now...

This is overrated - of course that's how you do it, what else would you do?

> Mean-squared-error (MSE), typically used for encoder decisions, is a number that doesn’t always correlate very nicely with human perception.

Academics, the reference MPEG encoder, and old proprietary encoder vendors like On2 VP9 did make decisions this way because their customers didn't know what they wanted. But people who care about quality, i.e. anime and movie pirate college students with a lot of free time, didn't.

It looks like they've run x264 in an unnatural mode to get an improvement here, because the default "constant ratefactor" and "psy-rd" always behaved like this.


You're letting the video codec make all the decisions for bitrate allocation.

Netflix tries to optimize the encoding parameters per shot/scene.

from the dynamic optimization article:

- A long video sequence is split into shots ("Shots are portions of video with a relatively short duration, coming from the same camera under fairly constant lighting and environment conditions.")

- Each shot is encoded multiple times with different encoding parameters, such as resolutions and qualities (QPs)

- Each encode is evaluated using VMAF, which together with its bitrate produces an (R,D) point. One can convert VMAF quality to distortion using different mappings; the article tests two (linearly and inversely proportional), which give rise to different temporal aggregation strategies

- The convex hull of (R,D) points for each shot is calculated. In the following example figures, distortion is inverse of (VMAF+1)

- Points from the convex hull, one from each shot, are combined to create an encode for the entire video sequence by following the constant-slope principle and building end-to-end paths in a trellis

- One produces as many aggregate encodes (final operating points) as needed by varying the slope parameter of the R-D curve, in order to cover a desired bitrate/quality range

- Final result is a complete R-D or rate-quality (R-Q) curve for the entire video sequence
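
The constant-slope step above can be sketched in a few lines of Python. All numbers here are invented stand-ins for real per-shot encodes measured with VMAF; distortion uses the article's mapping D = 1/(VMAF+1). Picking the per-shot encode that minimizes the Lagrangian J = D + lam*R is equivalent to selecting the convex-hull point whose R-D slope matches -lam:

```python
# (bitrate_kbps, VMAF) candidates per shot, e.g. different QPs/resolutions.
# Invented example data: a complex action shot and a quiet, simple shot.
shots = [
    [(500, 70), (1000, 85), (2000, 93)],
    [(300, 88), (600, 95), (1200, 97)],
]

def select(shots, lam):
    """Pick one encode per shot minimizing J = D + lam * R,
    with distortion D = 1 / (VMAF + 1)."""
    choice = []
    for candidates in shots:
        best = min(candidates, key=lambda rv: 1.0 / (rv[1] + 1) + lam * rv[0])
        choice.append(best)
    total_rate = sum(r for r, _ in choice)
    return choice, total_rate

# Sweep the slope to trace out operating points for the whole sequence:
for lam in (1e-3, 1e-6, 1e-7):
    picked, rate = select(shots, lam)
    print(lam, picked, rate)
```

A large `lam` penalizes rate and picks the cheap encodes everywhere; a small `lam` chases quality, and the complex shot ends up with more bits than the simple one - which is the whole point of the shot-based optimizer.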


> You're letting the video codec make all the decisions for bitrate allocation.

> Netflix tries to optimize the encoding parameters per shot/scene.

That's the problem - if the encoding parameters need to be varied per scene, it means you've defined the wrong parameters. Using a fixed H264 QP is not on the rate-distortion frontier, so don't encode at constant QP then. That's why x264 has a different fixed quality setting called "ratefactor".
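
For concreteness, both modes are exposed through ffmpeg's libx264 wrapper: `-qp` forces a constant quantizer for every frame, while `-crf` sets x264's "constant ratefactor". The helper below is hypothetical, but the flags are real:

```python
def x264_cmd(src, dst, mode, value):
    """Build an ffmpeg command line for one of the two constant-quality
    modes discussed above (hypothetical helper, real flags)."""
    base = ["ffmpeg", "-i", src, "-c:v", "libx264"]
    if mode == "crf":   # constant ratefactor: perceptually steady, variable QP
        base += ["-crf", str(value)]
    elif mode == "qp":  # constant QP: what the comment argues against
        base += ["-qp", str(value)]
    else:
        raise ValueError(mode)
    return base + [dst]

print(" ".join(x264_cmd("in.mp4", "out.mp4", "crf", 21)))
```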


What about VP9? And any of the other codecs that Netflix uses (I'll assume AV1 is one they currently use)?


It's not a codec-specific concept, so it should be portable to any encoder. x265 and AV1 should have similar things, not sure about VP9 as I think it's too old and On2 were, as I said, not that competent.


Isn't two pass encoding similar? In the first pass you collect statistics you use in the second pass for bandwidth allocation?

Possibly Netflix statistics are way better.
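
Yes, that's the core of two-pass: a toy illustration (numbers invented) of pass 1 collecting per-frame complexity and pass 2 spending a fixed bit budget in proportion to it, rather than uniformly:

```python
# Pretend per-frame complexity stats gathered in pass 1.
complexity = [1.0, 4.0, 2.0, 1.0]
BUDGET = 8000  # total bits available for these frames

# Pass 2: allocate bits proportionally to measured complexity.
total = sum(complexity)
allocation = [BUDGET * c / total for c in complexity]
print(allocation)  # the busy frame (4.0) gets half the budget
```

Netflix's shot-based scheme goes further by actually producing many trial encodes per shot and measuring VMAF, rather than predicting from one analysis pass.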


> This is overrated - of course that's how you do it, what else would you do?

That's not what has been done previously for adaptive streaming. I guess you are referring to what encoding modes like CRF do for an individual, entire file? Or where else has this kind of approach been shown before?

In the early days of streaming you would've done constant bitrate for MPEG-TS, even adding zero bytes to pad "easy" scenes. Later you'd have selected 2-pass ABR with some VBV bitrate constraints to not mess up the decoding buffer. At the time, YouTube did something where they tried to predict the CRF they'd need to achieve a certain (average) bitrate target (can't find the reference anymore). With per-title encoding (which was also popularized by Netflix) you could change the target bitrates for an entire title based on a previous complexity analysis. It took quite some time for other players in the field to also hop on the per-title encoding train.

Going to a per-scene/per-shot level is the novelty here, along with exhaustively finding the best possible combination of QP/resolution pairs for an entire encoding ladder in a way that also optimizes subjective quality – and not just MSE.


> exhaustively finding the best possible combination of QP/resolution pairs for an entire encoding ladder that also optimizes subjective quality – and not just MSE.

This is unnecessary if the encoder is well-written. It's like how some people used to run multipass encoders 3 or 4 times just in case the result got better. You only need one analysis pass to find the optimal quality at a bitrate.


Sure, the whole point of CRF is to set a quality target and forget about it, or, with ABR, to be as good as you can with an average bitrate target (under constraints). But you can't do that across resolutions, e.g. do you pick the higher bitrate 360p version, or the lower bitrate 480p one, considering both coding artifacts and upscaling degradation?


At those two resolutions you'd pick the higher resolution one. I agree that generation of codec doesn't scale all the way up to 4K and at that point you might need to make some smart decisions.

I think it should be possible to decide in one shot in the codec though. My memory is that codecs (image and video) have tried implementing scalable resolutions before, but it didn't catch on simply because dropping resolution is almost never better than dropping bitrate.


Probably a lot more than once, when you consider that different devices have different capabilities, and that they might stream you different bitrates depending on conditions like your network capability, screen resolution, or how much you've paid them.

You could also imagine they might apply some kind of heuristic to decide to re-encode something based on some condition... Like fine tune encoder settings when a title becomes popular. No idea if they do that, just using some imagination.


I guess it's irrelevant for Netflix then*. But it sounds great for the remaining 99.99%.

* I would be very surprised if Netflix even uses vanilla ffmpeg


> But it sounds great for the remaining 99.99%.

I believe the vast majority of ffmpeg uses are web services or one-off encodings.


Well, this feature is awesome for one-off encoding by a home user.

Subjectively, me compressing my holiday video is much more important than Netflix re-compressing a million of them.


I use ffmpeg all the time, so this change is much appreciated. Well, not really that often, but when I do encode video/audio it's generally with ffmpeg.


As multi-core as Python and Ruby then.


Yes. The kernel multiplies your efforts for you. It works great for web services.


Okaaay, and if I'm not operating a cloud service like Netflix, and I'm not running thousands of ffmpeg processes? In other words, it's not already a multi-core job?


This kind of multithreaded code introduces great complexity. So who wants to pay the cost for that tradeoff? Since most performance sensitive ffmpeg uses are cloud services, I don't see the benefit.



