Editing Media in the Masher
Media Masher is a web-based video editing system that allows users to import, combine, and export video, audio, and images together with text and shapes.
The client application works with low resolution media, while the server uses source files to encode at full resolution. The architecture is designed to support a wide range of devices and network conditions, and to be easily customized by developers.
Supported Media
Media Masher classifies media by both its type and source. The three media types are ‘video’, ‘audio’, and ‘image’. Imported files of these types are assigned a source of ‘raw’, but other source values are possible.
For instance, imported font files are given a source of ‘text’ but a type of ‘image’, since they are otherwise treated like other vector-based images. SVG files are treated like other raw images, unless they contain just a single PATH element filled with currentColor. In this special case the media will be given a source of ‘shape’, allowing it to be recolored dynamically and more easily used as a mask.
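The classification rules above can be sketched roughly as follows. This is only an illustration: classifyImport, isShapeSvg, and the type names are hypothetical, and the MIME type and file extension checks are assumptions rather than Media Masher's actual detection logic.

```typescript
// Hypothetical names; not Media Masher's actual classes or functions.
type SketchType = 'video' | 'audio' | 'image';
type SketchSource = 'raw' | 'text' | 'shape';

interface Classification {
  type: SketchType;
  source: SketchSource;
}

// Rough test for the special case: exactly one <path>, filled with
// currentColor, and no other drawing elements. A real implementation
// would more likely parse the SVG than rely on regular expressions.
function isShapeSvg(svg: string): boolean {
  const paths = svg.match(/<path\b[^>]*>/gi) ?? [];
  const only = paths[0];
  if (paths.length !== 1 || only === undefined) return false;
  if (/<(rect|circle|ellipse|line|polyline|polygon|text|image|use)\b/i.test(svg)) return false;
  return /\bfill\s*=\s*["']currentColor["']/i.test(only);
}

// Assumption: fonts are recognized by MIME type or a common file extension.
function classifyImport(fileName: string, mimeType: string, svgText = ''): Classification | undefined {
  if (/^font\//.test(mimeType) || /\.(ttf|otf|woff2?)$/i.test(fileName)) {
    return { type: 'image', source: 'text' };
  }
  if (mimeType === 'image/svg+xml') {
    return { type: 'image', source: isShapeSvg(svgText) ? 'shape' : 'raw' };
  }
  const match = /^(video|audio|image)\//.exec(mimeType);
  if (match === null) return undefined; // not an importable file in this sketch
  return { type: match[1] as SketchType, source: 'raw' };
}
```

For example, classifyImport('star.svg', 'image/svg+xml', text) yields a source of ‘shape’ only when text holds a single currentColor-filled path, and ‘raw’ otherwise.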
Under the hood, each of these type/source combinations is implemented by a different JavaScript class that follows a TypeScript interface. For instance, video in the browser is implemented by the ClientRawVideoClass which follows the ClientRawVideo interface.
It is also possible to create media that is entirely code driven, with no underlying file. For instance, ColorMedia has a type of ‘image’ and a source of ‘color’. It has no intrinsic size, but rather fills its allotted space with a single color.
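To illustrate the idea of a class following an interface with no file behind it, here is a minimal sketch of code-driven color media. MediaSketch, ColorMediaSketch, and the svg method are hypothetical names chosen for this example; the real ColorMedia implementation may look quite different.

```typescript
// Hypothetical interface standing in for the shared media contract.
interface MediaSketch {
  readonly type: 'video' | 'audio' | 'image';
  readonly source: string;
}

interface AllottedSize {
  width: number;
  height: number;
}

// Code-driven media: no file, no intrinsic size, just a color.
class ColorMediaSketch implements MediaSketch {
  readonly type = 'image' as const;
  readonly source = 'color';
  color: string;

  constructor(color: string) {
    this.color = color;
  }

  // The caller supplies the allotted space, which is filled completely.
  svg(size: AllottedSize): string {
    return `<svg xmlns="http://www.w3.org/2000/svg" width="${size.width}" height="${size.height}">` +
      `<rect width="100%" height="100%" fill="${this.color}"/></svg>`;
  }
}
```

The same instance can be rendered at any size, since the dimensions come from the caller rather than from the media itself.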
Media Resources
In addition to type and source string properties, each Media instance typically has at least one Resource in its resources array property. Resources contain metadata about their media, like the file size and location for raw media.
During the import process, the browser draws as much metadata from the raw media file as possible. This information can also be decoded on the server as needed, so only the file location is required.
Video and audio file types both have an intrinsic duration, while raw images and video have intrinsic width and height properties. Text media, as an image type, also has these properties, which are calculated from the string rendered in that font.
Video files may or may not have a soundtrack, and the browser is not always able to determine which. When it can, the audible property is set accordingly during import. Otherwise, it is determined on the server.
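As a rough sketch of the metadata described above, a resource might carry a required location plus optional probed values. The ResourceSketch shape and the missingMetadata helper are assumptions made for illustration; only the property names duration, width, height, and audible come from the text.

```typescript
type RawType = 'video' | 'audio' | 'image';

// Hypothetical resource shape: only the location is required.
interface ResourceSketch {
  location: string;
  size?: number;     // file size in bytes
  duration?: number; // video and audio
  width?: number;    // video and image
  height?: number;   // video and image
  audible?: boolean; // video: whether a soundtrack was detected
}

type ProbeKey = 'duration' | 'width' | 'height' | 'audible';

// Which intrinsic properties each raw type is expected to have.
const EXPECTED: { [K in RawType]: ProbeKey[] } = {
  video: ['duration', 'width', 'height', 'audible'],
  audio: ['duration'],
  image: ['width', 'height'],
};

// Lists what the browser could not determine, and so what the
// server would still need to decode from the source file.
function missingMetadata(type: RawType, resource: ResourceSketch): ProbeKey[] {
  return EXPECTED[type].filter(key => resource[key] === undefined);
}
```

A video resource probed in the browser without soundtrack detection would report only audible as missing, leaving that one property for the server.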
The server can also add other resource files to media while importing. For instance, it can add a low resolution version of an image file which will be used as a preview within the editor. It can also separate the soundtrack from a video file and generate a series of image files, one for each frame. These are requested as needed within the editor, so only the resources actually being displayed or played have to be loaded.
User Interface
Developers are free to customize both the appearance and functionality of the editor, but the default user interface is divided into several panels that contain different views, inputs, and specialized controls which help structure and organize specific tasks:
- Player - video preview, play button, time display and control
- Browser - view of media items, search field, import button
- Timeline - view of clips in tracks, zoom controls, scrub control
- Inspector - inputs grouped by target, undo/redo buttons
Conceptually, the entire interface is geared towards a single task - creating an ‘edit decision list’, or mash, that is used by the server to encode a video file. Mashes are actually just specialized media with a source of ‘mash’.
Other media that could be added to the mash appears in the browser. Selecting an item there and clicking the add button will insert it into the timeline as a Clip starting at the currently displayed frame. Each clip has a content property that references its associated media.
Each clip has additional properties that can be edited in the inspector, like start time and duration. These can also be edited within the timeline itself, by dragging the clip or its edges. Changes to clips are immediately reflected in the player, which shows a preview of the mash as it will appear in the final video.
Visual clips have an additional container property that references another visual media item used to mask (visually crop) the content. It is typically a shape image, but any video or image is supported. Both content and container have properties related to sizing and positioning.
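The relationship between clips, content, and container might be modeled along these lines. Only the content and container property names come from the text; MediaRef, ClipSketch, addClip, and the frame and frames fields are hypothetical stand-ins for this sketch.

```typescript
// Hypothetical reference to a media item in the browser panel.
interface MediaRef {
  id: string;
  type: 'video' | 'audio' | 'image';
  source: string;
}

// Hypothetical clip shape: timing plus references to media.
interface ClipSketch {
  frame: number;        // start, in frames
  frames: number;       // duration, in frames
  content: MediaRef;    // what is shown or heard
  container?: MediaRef; // visual clips only: masks the content
}

// Adds a clip at the currently displayed frame, as the add button does.
function addClip(
  track: ClipSketch[],
  content: MediaRef,
  currentFrame: number,
  frames: number,
  container?: MediaRef
): ClipSketch {
  if (container !== undefined && (content.type === 'audio' || container.type === 'audio')) {
    throw new Error('Only visual content can be masked, and only by visual media');
  }
  const clip: ClipSketch = { frame: currentFrame, frames, content };
  if (container !== undefined) clip.container = container;
  track.push(clip);
  return clip;
}
```

In this sketch, a video masked by a shape is a single clip whose content is the video and whose container is the shape image.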
Drag and Drop
While not required, Drag and Drop interactions provide the most intuitive means to import and arrange media. Video, audio, image and font files can all be dragged from the file system or other applications, and then dropped directly on the browser or timeline to import. Once preprocessed locally, the file will appear in the browser and information about it will appear in the inspector when it is selected. Browser items can be dragged and dropped like media files.
When media is dropped on an existing track within the timeline, the clips appear consecutively on that track but the start time of the sequence depends on whether it is a dense or sparse track. On a sparse track, the start time is the time that corresponds to the drop location - or if a clip is already there, the start of the next available space that can contain the media being dragged. On a dense track, the start time is the end time of the track, or if files are dropped directly on a clip, the start time of that clip.
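The placement rules in the previous paragraph can be sketched as a single function. The names dropStartTime and SpanSketch are hypothetical, times are in arbitrary units, and one detail is an assumption: on a sparse track, a drop into free space that would run into a later clip is treated the same way as a drop directly onto a clip.

```typescript
// Hypothetical clip extent on a track.
interface SpanSketch {
  start: number;
  duration: number;
}

function dropStartTime(
  dense: boolean,
  clips: SpanSketch[],
  dropTime: number,
  dragDuration: number
): number {
  const sorted = clips.slice().sort((a, b) => a.start - b.start);

  if (dense) {
    // Dropped directly on a clip: take that clip's start time.
    for (const clip of sorted) {
      if (dropTime >= clip.start && dropTime < clip.start + clip.duration) {
        return clip.start;
      }
    }
    // Otherwise append at the end of the track.
    let end = 0;
    for (const clip of sorted) end = Math.max(end, clip.start + clip.duration);
    return end;
  }

  // Sparse: begin at the drop location and slide forward past any
  // clip that the dragged media would overlap.
  let candidate = dropTime;
  for (const clip of sorted) {
    const clipEnd = clip.start + clip.duration;
    const overlaps = candidate < clipEnd && candidate + dragDuration > clip.start;
    if (overlaps) candidate = clipEnd;
  }
  return candidate;
}
```

With clips occupying 0 to 10 and 12 to 20 on a sparse track, a drop at time 5 starts at 10 when the dragged media is 2 units long, but at 20 when it is 5 units long, since the gap between the clips is too small to contain it.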
Existing clips can be dragged within and between tracks. Video and audio clips can be trimmed by dragging on their left or right handles. The duration of images and other clips can be adjusted by dragging on their right handles.
Saving Media
As media is imported and clips are added and edited, Media Masher is continually saving the media and mash data to the browser’s local storage using the File System and IndexedDB APIs. Each file is temporarily copied to a FileSystemFileHandle and a JSON representation of each media object is copied to an IDBDatabase.
When the save button is clicked, all this information is synchronized with the server. Depending on configuration, saving may also trigger other server-side processes like decoding and transcoding. Saving is typically required before the mash can be encoded.
Encoding Mashes
Since a mash can combine multiple media files, fonts, and shapes together, encoding it into a single output file is required for sharing on other platforms.
The media type of the output file can differ from that of the mash, so a video mash can be encoded as an audio or image file, for example. Duration can also differ, so video and audio mashes can be trimmed. Even the orientation can be changed, so a portrait video mash can be encoded in landscape instead.
The resolution of the output file can be specified, but keep in mind that higher quality settings will take more time and server resources to render. You will also want to avoid outputting at a resolution so high that raw media files in the mash are upsampled, as this adds distortion. Fonts and shapes are vector graphics, so they can be scaled arbitrarily without loss of quality.
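One way to pick an output resolution that avoids upsampling is sketched below. It rests on simplifying assumptions that are not from the original: every raw item is taken to fill the whole frame, and dimensions are rounded down to even numbers because many video encoders require that. The names maxOutputSize and PixelSize are hypothetical.

```typescript
interface PixelSize {
  width: number;
  height: number;
}

// Rounds down to the nearest even integer.
function floorToEven(n: number): number {
  return Math.floor(n / 2) * 2;
}

// Largest output size with the given aspect ratio that does not
// exceed the intrinsic size of any raw media item.
function maxOutputSize(
  aspectWidth: number,
  aspectHeight: number,
  rawSizes: PixelSize[]
): PixelSize | undefined {
  // Fonts and shapes are vectors, so with no raw media there is no limit.
  if (rawSizes.length === 0) return undefined;
  let scale = Infinity;
  for (const raw of rawSizes) {
    scale = Math.min(scale, raw.width / aspectWidth, raw.height / aspectHeight);
  }
  return {
    width: floorToEven(aspectWidth * scale),
    height: floorToEven(aspectHeight * scale),
  };
}
```

For a 16:9 mash containing one 1920x1080 and one 1280x720 source, this returns 1280x720, since anything larger would upsample the smaller file. Content that is scaled down or cropped within the frame would loosen the limit, so treat the result as a conservative starting point.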