Frame-accurate video client in the browser

HTML5/JavaScript video player designed to provide the same UX (user experience) as the players integrated in broadcast video editors.
Test it yourself: Click here

Note: The audio in this test is processed at fs = 44.1 kHz. If the AudioContext of your browser uses a different fs, the audio will be deactivated (see the list of future improvements :-) ).

Introduction

Years ago I started thinking about migrating to "the cloud" some of the broadcast services that have traditionally lived on the broadcast premises, such as MAM (Media Asset Management), NRCS (Newsroom Computer System), and news editors. The available bandwidth is becoming less of a problem every day. Today we have cloud video CMSs (content management systems) and API-driven cloud editing tools, but I haven't found a UI tool that provides a UX similar to that of a broadcast video editor.

Goals

Try to prove that it is possible to implement a player that provides the same UX as the ones in broadcast video editors
Identify the problems of implementing a frame-accurate video player in the browser
Learn a bit about the player world :-)

Examples

Example1: Player that shows an MP4 source with no embedded TC
Example2: Player that shows a MOV source with embedded TC and 2 scene changes

Description

This project is a combination of backend media processing and front-end code that provides a scrubbing UX similar to that of the players integrated in broadcast video editors.
It is an ideal tool for frame-accurate clipping systems and online video editors. For now it only works with VOD assets, but the idea is to extend the functionality to live streams - figure 1
Features:
- Video: Frame accuracy in all actions
- Audio: SAMPLE accuracy in all actions
- Automatic audio / video alignment in the backend process
- Instantaneous responsiveness in scrubbing and positioning (video and audio)
- Reverse playback (video and audio)
- All frames identified with SMPTE Time Code
- Automatic cue point insertion at every scene change (helping the end user find the trim points)
- Tracking cue points from the origin to the player (frame accuracy)
- Accepts a huge variety of input formats (we use ffmpeg for decoding)
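As an illustration of the reverse playback feature, here is a minimal sketch of how one PCM chunk could be reversed before playing it backwards. It assumes interleaved 16-bit samples (as in a stereo wav payload); the function name `reverseInterleaved` is hypothetical and this is not necessarily how the player does it. The point it shows is that whole sample frames (one sample per channel) are swapped, so left and right channels stay on their own sides.

```javascript
// Hypothetical helper, not taken from the project's code.
// Reverses interleaved PCM in time while keeping the channel order
// inside each sample frame (L,R,L,R,... stays L,R after reversal).
function reverseInterleaved(pcm, channels) {
  const frames = Math.floor(pcm.length / channels);
  const out = new Int16Array(frames * channels);
  for (let f = 0; f < frames; f++) {
    const src = f * channels;                // frame f in the input
    const dst = (frames - 1 - f) * channels; // mirrored position in the output
    for (let c = 0; c < channels; c++) {
      out[dst + c] = pcm[src + c];
    }
  }
  return out;
}
```

For reverse playback the chunks themselves would also be played in descending frame order, each one reversed this way.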

Figure 1: Block diagram.

Backend process

Extracts media information (fps, length, audio fs, timecode, etc.) using ffprobe
Detects scene changes and adds them as cue point information, using ffmpeg
Extracts the initial A/V delay and compensates for it, using my own lib
Ready to extract other timed metadata (cue points)
For the video track
- Decodes the video and encodes each frame as JPEG (quality as a parameter), using ffmpeg - figure 2
For the audio tracks
- Decodes the audio and encodes each video-frame-aligned portion as PCM (wav) with sample accuracy, using my own lib. It also compensates for length differences at the beginning and at the end of the file by removing audio samples or adding silence - figure 3
Generates a JSON manifest with all the information, using my own lib - figure 4
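To make the "video frame aligned" audio portions more concrete, here is a minimal sketch of how the sample range of each portion could be computed. It is only an assumption about the arithmetic, not the code of the backend lib: the names `chunkStartSample` and `chunkRange` are hypothetical, a constant frame rate is assumed, and the frame rate is passed as a fraction so that rates like 30000/1001 do not accumulate rounding drift.

```javascript
// Hypothetical helpers, not taken from the backend lib.
// First audio sample that belongs to a given video frame.
function chunkStartSample(frameIndex, sampleRate, fpsNum, fpsDen = 1) {
  // floor(frameIndex * sampleRate / fps), with fps = fpsNum / fpsDen
  return Math.floor((frameIndex * sampleRate * fpsDen) / fpsNum);
}

// Half-open sample range [start, end) of the audio portion for one frame.
// Consecutive ranges touch exactly, so no sample is lost or duplicated.
function chunkRange(frameIndex, sampleRate, fpsNum, fpsDen = 1) {
  const start = chunkStartSample(frameIndex, sampleRate, fpsNum, fpsDen);
  const end = chunkStartSample(frameIndex + 1, sampleRate, fpsNum, fpsDen);
  return { start, end, length: end - start };
}
```

With 44100 Hz audio and 30 fps video every portion holds 1470 samples. With a fractional rate such as 30000/1001 the portion length alternates between two adjacent values, but the ranges still add up to the exact total.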

Figure 3: Audio compensation at the beginning of the file.
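The compensation described above (removing audio samples or adding silence) could look roughly like the following sketch. It works on one channel of 16-bit samples and assumes the delay has already been converted to samples; `compensateAudio`, `delaySamples` and `targetLength` are hypothetical names, and the real lib may handle this differently.

```javascript
// Hypothetical helper, not taken from the backend lib.
// delaySamples > 0: audio starts after the video, so silence is prepended.
// delaySamples < 0: audio starts before the video, so leading samples are dropped.
// targetLength: number of samples the track must have to match the video length;
// the tail is cut or padded with silence accordingly.
function compensateAudio(samples, delaySamples, targetLength) {
  const out = new Int16Array(targetLength); // zero-filled, i.e. silence
  const srcStart = Math.max(0, -delaySamples);
  const dstStart = Math.max(0, delaySamples);
  const count = Math.min(samples.length - srcStart, targetLength - dstStart);
  if (count > 0) {
    out.set(samples.subarray(srcStart, srcStart + count), dstStart);
  }
  return out;
}
```

For example, `compensateAudio(samples, 2, 6)` on the samples `[1, 2, 3, 4, 5]` gives `[0, 0, 1, 2, 3, 4]`: two samples of silence in front, and the tail cut to the target length.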

{
    "video": {
        "num_frames": 781,
        "base_frame_path": "../../media/transcoded/scrubbingTestV3/854x480/video",
        "base_file_name": "test1v_q14_",
        "num_digits_frame": 5,
        "frame_ext": ".jpg",
        "fps": 30
    },
    "audio": {
        "num_frames": 781,
        "base_frame_path": "../../media/transcoded/scrubbingTestV3/854x480/audio",
        "base_file_name": "test1a_",
        "num_digits_frame": 5,
        "frame_ext": ".wav",
        "sample_rate": 44100,
        "channels": 2,
        "bit_per_sample": 16,
        "sample_type": "signed"
    },
    "metadata": {
        "0": {
            "smpte_tc": 108000
        },
        "204": {
            "smpte_tc": 108204,
            "cue_info": {
                "info": "scene change",
                "mean": "152",
                "stddev": "72.2"
            }
        },
        "414": {
            "smpte_tc": 108414,
            "cue_info": {
                "info": "scene change",
                "mean": "110",
                "stddev": "72.4"
            }
        }
    }
}

Figure 4: Example of generated manifest.
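A player could consume this manifest along the lines of the sketch below. Several things here are assumptions read off the example rather than documented behaviour: that the file name is the base name plus the zero-padded frame index joined to the path with "/", that `smpte_tc` is a frame count (108000 at 30 fps is exactly one hour), and that the timecode is non-drop-frame with an integer fps. The function names are hypothetical.

```javascript
// Hypothetical helpers, not taken from the player code.

// Path of the file that holds one video frame or one audio portion.
// Assumes the frame index is used as-is for numbering.
function framePath(track, frameIndex) {
  const n = String(frameIndex).padStart(track.num_digits_frame, "0");
  return `${track.base_frame_path}/${track.base_file_name}${n}${track.frame_ext}`;
}

// Non-drop-frame timecode string (HH:MM:SS:FF) from a frame count.
function toTimecode(frameCount, fps) {
  const pad = (v) => String(v).padStart(2, "0");
  const totalSeconds = Math.floor(frameCount / fps);
  const ff = frameCount % fps;
  const ss = totalSeconds % 60;
  const mm = Math.floor(totalSeconds / 60) % 60;
  const hh = Math.floor(totalSeconds / 3600);
  return `${pad(hh)}:${pad(mm)}:${pad(ss)}:${pad(ff)}`;
}

// Cue points (entries that carry cue_info), sorted by frame index.
function cuePoints(manifest) {
  return Object.entries(manifest.metadata)
    .filter(([, meta]) => meta.cue_info)
    .map(([frame, meta]) => ({
      frame: Number(frame),
      tc: toTimecode(meta.smpte_tc, manifest.video.fps),
      info: meta.cue_info.info,
    }))
    .sort((a, b) => a.frame - b.frame);
}
```

Under those assumptions, frame 204 of the example maps to a file name ending in `test1v_q14_00204.jpg`, and its `smpte_tc` of 108204 reads as 01:00:06:24.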

Testing

This implementation was tested with desktop Chrome and Safari, but it should work in any browser that implements HTML5 and JavaScript (almost everywhere :-) ). It just needs canvas and AudioContext.
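A quick way to check those two requirements before starting the player is sketched below. The function names are hypothetical and the window object is passed in as a parameter so the check can also be run outside a browser; `webkitAudioContext` is the prefixed name used by older Safari versions. The second helper mirrors the current 44.1 kHz limitation: audio is only usable when the sample rate of the AudioContext matches the one in the manifest.

```javascript
// Hypothetical helpers, not taken from the player code.

// Returns the names of the required browser features that are missing.
function missingApis(win) {
  const missing = [];
  if (typeof win.HTMLCanvasElement === "undefined") {
    missing.push("canvas");
  }
  if (typeof (win.AudioContext || win.webkitAudioContext) === "undefined") {
    missing.push("AudioContext");
  }
  return missing;
}

// True when the browser's audio sample rate matches the manifest's.
function audioUsable(contextSampleRate, manifest) {
  return contextSampleRate === manifest.audio.sample_rate;
}
```

In a page this would be called as `missingApis(window)`, and `audioUsable(audioContext.sampleRate, manifest)` once an AudioContext has been created.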

Future work

Do a better job of avoiding A/V delay: for now we re-sync A/V every 20 s, which creates small audio glitches at those points. Difficulty: 4 (0..10)
Accept any audio sample frequency (now limited to 44.1 kHz). Difficulty: 2 (0..10)
Use an ABR (Adaptive Bitrate Streaming) approach: create different renditions in the backend and enable the player to switch between them. Difficulty: 6 (0..10)
Test using JPEG 2000 instead of JPEG to increase BW savings. Difficulty: 2 (0..10)
Add intelligence to the download algorithm, for instance: download all the audio but, for the video, just the range that surrounds the cursor; or download the lowest quality (using ABR) and improve the quality around the cursor. Difficulty: ? (depends on the algorithm)
Add automated tests (CI tool). Difficulty: 2 (0..10)
Optimize audio download (perhaps using a web worker?). Difficulty: 1 (0..10)
Accurate multi speeds in both directions: x2, x4, /2, /4. Difficulty: 0 (0..10)
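As one possible starting point for the smarter download idea in the list above, the sketch below orders the frames around the cursor so that the nearest ones are requested first. It is only an illustration of the idea, nothing like this exists in the project yet, and `framesAroundCursor` and `radius` are hypothetical names.

```javascript
// Hypothetical helper illustrating a possible download order.
// Returns frame indices within `radius` of the cursor, nearest first,
// clamped to the valid range [0, numFrames - 1].
function framesAroundCursor(cursor, numFrames, radius) {
  const order = [];
  for (let d = 0; d <= radius; d++) {
    const ahead = cursor + d;
    const behind = cursor - d;
    if (ahead < numFrames) order.push(ahead);
    if (d > 0 && behind >= 0) order.push(behind);
  }
  return order;
}
```

For a 781-frame asset with the cursor at frame 2 and a radius of 3, this yields the order 2, 3, 1, 4, 0, 5.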

Notes

Note: M.S.E. (Media Source Extensions) NOT used in this project.