Editing Media in the Masher

Media Masher is a web-based video editing system that allows users to import, combine, and export video, audio, and images together with text and shapes.

The client application works with low-resolution media, while the server uses the source files to encode at full resolution. The architecture is designed to support a wide range of devices and network conditions, and to be easily customized by developers.

Supported Media

Media Masher classifies media by both its type and source. The three media types are ‘video’, ‘audio’, and ‘image’. Imported files of these types are assigned a source of ‘raw’, but other source values are possible.

For instance, imported font files are given a source of ‘text’ but a type of ‘image’, since they are otherwise treated like other vector-based images. SVG files are treated like other raw images, unless they contain just a single path element filled with currentColor. In this special case the media is given a source of ‘shape’, allowing it to be recolored dynamically and more easily used as a mask.
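The classification rules above can be sketched as a small function. This is only an illustration: the names classifyImport and isShapeSvg are hypothetical, MIME types are assumed to be available for each imported file, and the SVG check is a simplification that only recognizes a fill attribute set directly on the path element.

```typescript
// Hypothetical helper names; only the type and source values come from the text.
type MashMediaType = 'video' | 'audio' | 'image';
type MashMediaSource = 'raw' | 'text' | 'shape';

interface ImportClassification {
  type: MashMediaType;
  source: MashMediaSource;
}

// Simplified check for "a single path element filled with currentColor".
// Assumption: the fill is set as an attribute on the path itself.
function isShapeSvg(svgText: string): boolean {
  const drawable = /<(path|rect|circle|ellipse|line|polyline|polygon|text|image|use)\b/gi;
  const found = svgText.match(drawable) ?? [];
  // Exactly one drawable element is allowed, and it must be a path.
  if (found.length !== 1 || found[0]?.toLowerCase() !== '<path') return false;
  const pathTag = svgText.match(/<path\b[^>]*>/i);
  if (pathTag === null) return false;
  return /\bfill\s*=\s*["']currentColor["']/i.test(pathTag[0] ?? '');
}

// Maps a file's MIME type (and, for SVG, its text) to a type/source pair.
// Returns undefined for files that are not video, audio, image, or font.
function classifyImport(mimeType: string, fileText?: string): ImportClassification | undefined {
  if (mimeType.startsWith('video/')) return { type: 'video', source: 'raw' };
  if (mimeType.startsWith('audio/')) return { type: 'audio', source: 'raw' };
  if (mimeType.startsWith('font/')) return { type: 'image', source: 'text' };
  if (mimeType === 'image/svg+xml' && fileText !== undefined && isShapeSvg(fileText)) {
    return { type: 'image', source: 'shape' };
  }
  if (mimeType.startsWith('image/')) return { type: 'image', source: 'raw' };
  return undefined;
}
```

With this sketch, a font file such as font/woff2 yields an ‘image’ type with a ‘text’ source, while an SVG containing two path elements falls through to the ordinary raw image case.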

Under the hood, each of these type/source combinations is implemented by a different JavaScript class that implements a corresponding TypeScript interface. For instance, raw video in the browser is implemented by the ClientRawVideoClass, which implements the ClientRawVideo interface.

It is also possible to create media that is entirely code driven, with no underlying file. For instance, ColorMedia has a type of ‘image’ and a source of ‘color’. It has no intrinsic size, but rather fills its allotted space with a single color.
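As a rough illustration of code-driven media, the sketch below shows a class with no underlying file that fills whatever space it is given. The class name ColorMediaSketch, the svgFor method, and the choice of SVG as the rendering target are all assumptions made for the example, not the actual ColorMedia implementation.

```typescript
// Hypothetical sketch; only the 'image' type and 'color' source come from the text.
interface PixelSize {
  width: number;
  height: number;
}

class ColorMediaSketch {
  readonly type = 'image';
  readonly source = 'color';
  color: string;

  constructor(color: string) {
    this.color = color;
  }

  // No intrinsic size: the caller supplies the allotted space and the
  // media simply fills it with a single color.
  svgFor(size: PixelSize): string {
    return (
      `<svg xmlns="http://www.w3.org/2000/svg" width="${size.width}" height="${size.height}">` +
      `<rect width="100%" height="100%" fill="${this.color}"/></svg>`
    );
  }
}
```

The same instance can be asked to render at any size, which is what distinguishes it from raw images that carry their own width and height.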

Media Resources

In addition to type and source string properties, each Media instance typically has at least one Resource in its resources array property. Resources contain metadata about their media, like the file size and location for raw media.

During the import process, the browser draws as much metadata from the raw media file as possible. This information can also be decoded on the server as needed, so only the file location is required.

Video and audio file types both have an intrinsic duration, while raw images and video have intrinsic width and height properties. Text media, as an image type, also has width and height, which are calculated from the string rendered in that font.

Video files may or may not have a soundtrack, and the browser cannot always detect this. When it can, the media's audible property is set accordingly during import. Otherwise, the property is determined on the server.

The server can also add other resource files to media while importing. For instance, it can add a low-resolution version of an image file, which is used as a preview within the editor. It can also separate the soundtrack from a video file and generate a series of image files, one for each frame. These are requested only as needed within the editor, which keeps loading fast.
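One way to picture the resources array is as a list from which the client and server each pick what suits them. The sketch below is an assumption-laden illustration: the role values, the field names url and bytes, and the resourceFor function are hypothetical and are not taken from the Media Masher API.

```typescript
// Hypothetical shapes: the text only says resources hold metadata such as
// file size and location, and that the server may add preview files.
type ResourceRole = 'raw' | 'preview' | 'soundtrack' | 'frames';

interface MediaResourceSketch {
  role: ResourceRole;
  url: string;     // file location
  bytes?: number;  // file size, when known
}

interface MediaSketch {
  type: 'video' | 'audio' | 'image';
  source: string;
  resources: MediaResourceSketch[];
}

// The editor prefers a low-resolution preview and falls back to the raw file;
// the encoder always works from the raw source file.
function resourceFor(
  media: MediaSketch,
  context: 'editor' | 'encoder'
): MediaResourceSketch | undefined {
  const raw = media.resources.find((resource) => resource.role === 'raw');
  if (context === 'encoder') return raw;
  return media.resources.find((resource) => resource.role === 'preview') ?? raw;
}
```

Before the server has generated a preview, the editor simply receives the raw resource, so the same lookup works at every stage of the import.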

User Interface

Developers are free to customize both the appearance and functionality of the editor. The default user interface is divided into several panels, each containing the views, inputs, and specialized controls for a specific task:

Player - video preview, play button, time display and control
Browser - view of media items, search field, import button
Timeline - view of clips in tracks, zoom controls, scrub control
Inspector - inputs grouped by target, undo/redo buttons

Conceptually, the entire interface is geared towards a single task - creating an ‘edit decision list’, or mash, that is used by the server to encode a video file. Mashes are actually just specialized media with a source of ‘mash’. Other media that could be added to the mash appears in the browser. Selecting an item there and clicking the add button will insert it into the timeline as a Clip starting at the currently displayed frame. Each clip has a content property that references its associated media.

Each clip has additional properties that can be edited in the inspector, like start time and duration. These can also be edited within the timeline itself, by dragging the clip or its edges. Changes to clips are immediately reflected in the player, which shows a preview of the mash as it will appear in the final video.

Visual clips have an additional container property that references another visual media item used to mask (visually crop) the content. It is typically a shape image, but any video or image is supported. Both content and container have properties related to sizing and positioning.
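The relationship between clips and media can be sketched as a small data structure. The field names contentId, containerId, frame, and frames, and the createClip function, are hypothetical; the text only establishes that a clip references its content, that visual clips also reference a container, and that a new clip starts at the currently displayed frame.

```typescript
// Hypothetical field names; references are modeled here as media ids.
interface ClipSketch {
  contentId: string;     // the media that is shown or heard
  containerId?: string;  // visual media used as a mask, for visual clips only
  frame: number;         // start position in the mash
  frames: number;        // duration
}

interface AddableMedia {
  id: string;
  type: 'video' | 'audio' | 'image';
}

// Builds a clip for the selected media, starting at the currently displayed
// frame. Audio clips get no container because they have nothing to mask.
function createClip(
  media: AddableMedia,
  currentFrame: number,
  frames: number,
  defaultContainerId: string
): ClipSketch {
  const clip: ClipSketch = { contentId: media.id, frame: currentFrame, frames };
  if (media.type !== 'audio') clip.containerId = defaultContainerId;
  return clip;
}
```

Sizing and positioning properties for content and container are omitted here, since their names and value ranges are not described in this section.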

Drag and Drop

While not required, drag and drop interactions provide the most intuitive means to import and arrange media. Video, audio, image, and font files can all be dragged from the file system or other applications, and then dropped directly on the browser panel or timeline to import. Once preprocessed locally, the file will appear in the browser panel, and information about it will appear in the inspector when it is selected. Items in the browser panel can be dragged and dropped in the same way as media files.

When media is dropped on an existing track within the timeline, the new clips appear consecutively on that track, but the start time of the sequence depends on whether the track is dense or sparse. On a sparse track, the start time is the time that corresponds to the drop location or, if a clip is already there, the start of the next available space that can contain the media being dragged. On a dense track, the start time is the end time of the track or, if files are dropped directly on a clip, the start time of that clip.
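The placement rules for dense and sparse tracks can be expressed as a single function. This is a minimal sketch under stated assumptions: time is measured in frames, clips on a track do not overlap, and the names PlacedClip, TimelineTrack, and dropStartFrame are hypothetical. The text does not say what happens on a sparse track when the drop location is empty but the media does not fit before the next clip, so that case is not handled here.

```typescript
// Hypothetical types; positions and durations are in frames.
interface PlacedClip {
  frame: number;   // start
  frames: number;  // duration
}

interface TimelineTrack {
  dense: boolean;
  clips: PlacedClip[];
}

// Returns the start frame for media of the given length dropped at dropFrame.
function dropStartFrame(track: TimelineTrack, dropFrame: number, droppedFrames: number): number {
  const sorted = [...track.clips].sort((a, b) => a.frame - b.frame);
  const hit = sorted.find(
    (clip) => dropFrame >= clip.frame && dropFrame < clip.frame + clip.frames
  );

  if (track.dense) {
    // Dropped on a clip: take that clip's start. Otherwise: end of the track.
    if (hit) return hit.frame;
    const last = sorted[sorted.length - 1];
    return last ? last.frame + last.frames : 0;
  }

  // Sparse track with nothing under the drop location: use it directly.
  if (!hit) return dropFrame;

  // Sparse track with a clip under the drop location: walk forward to the
  // first gap that is long enough to contain the dropped media.
  let candidate = hit.frame + hit.frames;
  for (const clip of sorted) {
    if (clip.frame + clip.frames <= candidate) continue; // ends before candidate
    if (clip.frame - candidate >= droppedFrames) break;  // gap is big enough
    candidate = clip.frame + clip.frames;                // skip past this clip
  }
  return candidate;
}
```

For example, on a sparse track with clips at frames 0 to 10, 15 to 25, and 40 to 50, dropping ten frames of media onto the first clip skips the five-frame gap at frame 10 and lands at frame 25.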

Existing clips can be dragged within and between tracks. Video and audio clips can be trimmed by dragging on their left or right handles. The duration of images and other clips can be adjusted by dragging on their right handles.

Saving Media

As media is imported and clips are added and edited, Media Masher continually saves the media and mash data locally in the browser using the File System and IndexedDB APIs. Each file is temporarily copied to local file storage through a FileSystemFileHandle, and a JSON representation of each media object is stored in an IDBDatabase.

When the save button is clicked, all this information is synchronized with the server. Depending on configuration, saving may also trigger other server-side processes like decoding and transcoding. Saving is typically required before the mash can be encoded.

Encoding Mashes

Since a mash can combine multiple media files, fonts, and shapes, it must be encoded into a single output file before it can be shared on other platforms.

The media type of the output file can differ from that of the mash, so a video mash can be encoded as an audio or image file, for example. Duration can also differ, so video and audio mashes can be trimmed. Even the orientation can be changed, so a portrait video mash can be encoded in landscape instead.

The resolution of the output file can be specified, but keep in mind that higher resolutions take more time and server resources to render. Avoid outputting at a resolution so high that raw media files in the mash are upsampled, as this adds distortion. Fonts and shapes are vector graphics, so they can be scaled arbitrarily without loss of quality.
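A quick way to check for upsampling is to compare each raw file's intrinsic size with the chosen output size. The sketch below assumes, purely for illustration, that every raw item is scaled to fit entirely within the output frame while keeping its aspect ratio; real clips have their own sizing properties, so the effective scale may differ. The names FrameSize, containScale, and upsampledSources are hypothetical.

```typescript
interface FrameSize {
  width: number;
  height: number;
}

// Scale factor needed for the source to fit within the output frame
// without changing its aspect ratio. Values above 1 mean upsampling.
function containScale(source: FrameSize, output: FrameSize): number {
  return Math.min(output.width / source.width, output.height / source.height);
}

// Lists the raw sources that would be enlarged at the given output size.
function upsampledSources(sources: FrameSize[], output: FrameSize): FrameSize[] {
  return sources.filter((source) => containScale(source, output) > 1);
}
```

For instance, a 1280x720 source shown at 1920x1080 has a scale of 1.5 and would be flagged, while a 3840x2160 source has a scale of 0.5 and would not.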