Video codecs often encode each frame as a delta from the previous one, and because that delta is usually small, it compresses very well. If each thread had to process frames independently, you'd need to make significant changes to the codec, and I'd hypothesize the video stream would end up larger.
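To make the idea concrete, here's a toy sketch of delta encoding, assuming frames are just flat lists of pixel values (the helper names are made up, this isn't any real codec):

```python
# Toy delta encoder: store the first frame in full, then only
# per-pixel differences from the previous frame.

def encode(frames):
    stream = [("key", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        stream.append(("delta", [c - p for p, c in zip(prev, cur)]))
    return stream

def decode(stream):
    frames = []
    for kind, data in stream:
        if kind == "key":
            frames.append(list(data))
        else:
            # Each delta frame depends on the frame decoded just before it.
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames

frames = [[10, 20, 30], [11, 20, 29], [11, 21, 29]]
assert decode(encode(frames)) == frames
```

Note the serial dependency in `decode`: frame N can't be reconstructed before frame N-1, which is exactly why naive per-thread decoding doesn't work.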
The parent comment referred to "keyframes", not just "frames". Keyframes—unlike normal frames—encode the full image. That way, if one of the "deltas" you mentioned gets dropped from the stream, the strange artifacts in the output don't persist forever. Keyframes are where the codec gets to press "reset".
Isn't that delta partially based on the last keyframe? I guess it's codec-dependent, but my understanding is that keyframes act as a synchronization mechanism where the decoder catches up to where it should be in time.
Yes, keyframes are fully encoded, and delta frames are based on the previous frame (which could be a keyframe or another delta frame). Some delta frames (B-frames) can be based on the next frame instead of the previous one. That's why a transmission error can sometimes produce a visual glitch that corrupts the image until the next keyframe.
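You can see the "glitch until the next keyframe" behavior in a toy model, where each frame is a single integer ("brightness") rather than a real picture:

```python
# Every delta builds on the previous decoded state, so one corrupted
# delta poisons everything after it -- until a keyframe resets the state.

def decode(stream):
    out, state = [], 0
    for kind, val in stream:
        state = val if kind == "key" else state + val
        out.append(state)
    return out

stream = [("key", 100), ("delta", 2), ("delta", -1), ("key", 105), ("delta", 1)]
good = decode(stream)     # [100, 102, 101, 105, 106]

stream[1] = ("delta", 0)  # simulate a corrupted/dropped delta
bad = decode(stream)      # [100, 100, 99, 105, 106] -- wrong until the keyframe
```

Frames 1 and 2 come out wrong, but the keyframe at position 3 resyncs the decoder and everything after it is correct again.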
I'd assume that if each thread is working from its own keyframe, it would be difficult to make B-frames work? Live content probably makes it hard too.
In most codecs the entropy coder resets at frame boundaries, so there's no state carried across frames and you have enough freedom to do multithreaded decoding. ffmpeg has both frame-based and slice-based threading for this.
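As a toy illustration of why the reset matters: if the stream can be cut at keyframes, each group of frames can be decoded by a separate worker. (This is GOP-level parallelism, which is simpler than ffmpeg's actual frame threading—that pipelines frames and synchronizes on reference data—but it shows the principle.)

```python
# Split a toy stream at keyframes and decode each group in parallel.
# Works only because a keyframe resets all decoder state.
from concurrent.futures import ThreadPoolExecutor

def split_at_keyframes(stream):
    groups = []
    for kind, val in stream:
        if kind == "key":
            groups.append([])
        groups[-1].append((kind, val))
    return groups

def decode_group(group):
    out, state = [], 0
    for kind, val in group:
        state = val if kind == "key" else state + val
        out.append(state)
    return out

stream = [("key", 100), ("delta", 2), ("key", 105), ("delta", 1)]
with ThreadPoolExecutor() as pool:
    frames = [f for g in pool.map(decode_group, split_at_keyframes(stream))
              for f in g]
assert frames == [100, 102, 105, 106]
```

If the entropy coder carried state across the keyframe boundary, `decode_group` couldn't start from a clean state and this decomposition would be impossible—which is the situation the FFV1 comment below describes.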
ffmpeg also has a lossless codec, FFV1, where the entropy coder doesn't reset, so it truly can't be multithreaded.