I accidentally discovered a potential vulnerability in YouTube during a late-night debugging session on an MP4 muxer.
If you don’t know what a “muxer” is, that’s fine. I didn’t have a clue either, up until I actually had to fix one. “Muxing is an abbreviation of multiplexing. Muxing is the process of encapsulating multiple encoded streams – audio, video, and subtitles (if any) – into a container format, such as AVI, Ogg, or Matroska.” (Quoted from VideoLAN.) A muxer is simply a piece of software that performs multiplexing.
My muxer had a bug, a serious one. I had set it to record 10 seconds of video footage, yet it output a sped-up version lasting 8 seconds. Since the muxer was open source and written in Golang, I figured this should be a relatively easy issue to fix. Just modify some timestamps, right?
Well, turns out there are quite a few types of timestamps in the MP4 format:
- Decode Timestamps (DTS): when to decode the frames.
- Presentation Timestamps (PTS): when to present the frames on the screen.
- Composition Timestamps (CTS): when to compose a frame.
Each of them serves a distinct purpose, but my bug resided in the presentation timestamps: my video frames were not being displayed at the right time. Timestamps in the MP4 format are not in a typical format (Unix timestamps, …). Instead, they are more like a duration measured from the start of the video.
Let’s assume you want to display a frame at the five-second mark. You would transform it as follows:
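A minimal sketch of that conversion in Go, assuming the usual relationship of ticks = seconds × timeScale. The function name `secondsToTicks` and the 90000 timescale are illustrative assumptions, not taken from the muxer's code:

```go
package main

import "fmt"

// secondsToTicks converts a time in seconds into MP4 timescale units (ticks).
// Hypothetical helper for illustration only; it is not part of any real muxer API.
func secondsToTicks(seconds float64, timeScale uint32) uint64 {
	// Scale by the number of ticks per second, rounding to the nearest tick.
	return uint64(seconds*float64(timeScale) + 0.5)
}

func main() {
	// Assumed value: 90000 ticks per second, a common choice for video.
	const timeScale = 90000

	// A frame shown at the five-second mark.
	fmt.Println(secondsToTicks(5, timeScale)) // prints 450000
}
```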
Where the timeScale is an arbitrary value that you can pick, essentially saying to the decoders that: