
I want to have "animated JPEGs"! (a manifesto)

Category: Linux,multimedia   — Published by goeszen on November 19, 2013 at 4:23 pm

As a sort-of follow-up to my earlier post, "I want to set bookmarks in videos!", here's another feature/app request for the developer community:

I want to have looping MJPEG files, and broad support for it in browsers and image-viewers!

Why? Well, everyone knows the appeal: tiny "videos" that start more or less instantly, without any pesky (Flash) plug-ins. That's great, and everyone loves animated GIFs. Where would Internet memes be without them? Still, the format is ancient, and its compression was never intended for video. The result: many of those animated GIFs are huge, several megabytes in size!

Once, PNG was well on its way to replacing animated GIFs with an extended "animated PNG". It seemed natural: PNG is the successor of GIF, and animated PNGs would inherit the improved compression of the PNG format. But the initiative died, and nothing ever came of it.

Lately, Vine videos tried to "replace the animated gif", but the claim was more marketing than a true innovation: short mp4 videos that would only loop when they are played back embedded on the native Vine player/webpage/app. Not particularly useful as a self-contained media file or particularly easy to share when disconnected from the Internets.

Generally, video is making inroads into browsers. OGG videos (also WebM) are natively supported in many browsers, no Plugins, but no "automatic-loop" flag either. You can only set the HTML <video> element to loop, that's all. It's not self-contained.

Let's take a step back: When you look at it, all the elements are there:

  • We've got JPEG compression, which is particularly effective at compressing real-world images, like those found in video of people, animals, etc.
  • Then, we've got the Motion-JPEG pseudo-standard, a bit like a stream of raw JPEG files concatenated together. It's working, and natively supported in browsers, more or less. Only thing is that most people think of it as a streaming format, but that's only because many webcam manufacturers use it as their format of choice. But it's perfectly valid as stand-alone file format, for example Canon cameras use it as their video format.
  • And, JPEG knows about tags (and sections), small data lumps that contain either metadata or image-data.

So, why can't we agree on a tag that says: "Loop this MJPEG"!?

The result could be much better looking short animation/videos. Forget about the rasterised graininess of animated gifs! And the savings on bandwidth would be dramatic. Any developers out there adventurous enough to implement this in their image-viewer apps? Or a Chromium/ Firefox fork?? Anyone?

Well, let's get serious: A closer look

What is JPEG really? Well, JPEG is actually many formats, JFIF and Exif being the most commonly used. What they share is a leading application segment, APP0 for JFIF and APP1 for Exif, which acts as a sort of header and is usually followed by the real image data.

Then, when you start to dig into the Motion-JPEG pseudo-standard, you'll pretty soon find out why it's a pseudo-standard: there is no official specification! Most manufacturers of hardware that can output MJPEG invented their own standards. What most specs have in common is that MJPEG is simply multiple JPEGs, header + image data, concatenated one after another. A consortium around MOV developers from Apple agreed on some things in 1996 (how to treat interlaced video, MJPEG-A and MJPEG-B), but that's all. The framerate is never actually stated; it's normally derived from how quickly images arrive (in browsers/over the network) or simply assumed to be 25 fps (when played back with avplay/vlc/mplayer). And playback in browsers seems to be a bit hackish, with the MIME header multipart/x-mixed-replace;boundary=<something> telling the browser to continuously replace the displayed image with subsequently arriving ones, looking for the boundary "<something>" within the stream. Displaying a simple self-contained .mjpeg file (image sequence) doesn't work. No header, no image tag, no flag tells the browser to display more than the first image-data block.
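To make the "multiple JPEGs concatenated one after another" idea a bit more tangible, here is a minimal Python sketch of how a viewer might cut such a stream back into single frames. The function name split_mjpeg is made up for this post, and the code assumes well-formed baseline JPEGs with a single scan each. It walks the length-prefixed header segments up to the SOS marker and then looks for the EOI marker, so that stray FF D8/FF D9 bytes inside a header segment (an embedded thumbnail, say) don't confuse it.

```python
import struct

SOS = 0xDA  # "start of scan": the entropy-coded image data follows this segment


def split_mjpeg(data: bytes) -> list:
    """Split back-to-back JPEGs into a list of single-frame byte strings.

    Sketch only: assumes baseline JPEGs with exactly one scan per frame.
    """
    frames = []
    pos = 0
    while pos + 4 <= len(data):
        if data[pos:pos + 2] != b"\xff\xd8":  # every frame starts with SOI
            raise ValueError(f"expected SOI marker at offset {pos}")
        start = pos
        pos += 2
        # Walk the length-prefixed header segments (APPn, DQT, SOF, DHT, SOS).
        while True:
            if pos + 4 > len(data) or data[pos] != 0xFF:
                raise ValueError(f"broken segment header at offset {pos}")
            marker = data[pos + 1]
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + length  # the length field counts itself, not the marker
            if marker == SOS:
                break
        # Inside the scan data a literal 0xFF is always followed by 0x00 or a
        # restart marker, so the next FF D9 is the end-of-image marker.
        end = data.find(b"\xff\xd9", pos)
        if end == -1:
            raise ValueError("missing EOI marker")
        pos = end + 2
        frames.append(data[start:pos])
    return frames
```

Calling split_mjpeg(open("clip.mjpeg", "rb").read()) on a hypothetical file would then give one bytes object per frame, each of which any ordinary JPEG decoder should accept.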

Comparing JPEG with real video containers and video codecs, it falls somewhere in between: it describes "frames", but the way it does so makes it possible to attach metadata about the whole package, the container, to each frame. Contrast that with, for example, AAC. Try adding metadata to a raw AAC stream with avconv: it doesn't work. The trick in this article was to wrap the AAC in an MP4 container, adding the metadata to the "shell" while the actual "payload" remains untouched. The Apple-sponsored paper mentioned above had the same perspective on Motion-JPEG: as a codec within a MOV container.

For a self-contained MJPEG format, I would propose that each concatenated JPEG carry metadata about its display length (the implicit framerate) and offset (see below for what "offset" means). The first JPEG in a stream should additionally tell a viewer application that this JPEG is actually a stream of JPEGs, followed by a yes/no flag for looping or a number of repeats, and maybe a total number of frames and total runtime. To rephrase, here's...

The proposed format:

In the first frame:
"is animation" flag
"repeat" value (0 = continuous/looped/indefinite replay); compare the Netscape Application Block (NAB) in GIF
"offset" value (optional, byte position of start of next frame/image)
"framerate" value (optional)

In every frame:
"exposure" value: duration/length of this frame (in ms), the time this frame should be displayed, called "delay" in GIFs
"offset" value (optional, byte position of start of next frame/image)
"picoffset" value: position of this frame's NW corner on the canvas (default 0, 0)

There is no index-structure like in MP4 or AVI containers. Animated JPEG is just a stream of JPEG images concatenated together. The application has to build the index on-the-fly, while each frame may hold the seek-offset of the next, resulting in a chained-index so to speak. If that value is missing,

The extension/file suffix can be .mjpeg/.mjpg, .ajpeg, or simply .jpg/.jpeg, as all viewers not aware of these extensions will simply show the first frame.
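To see how little data this really is, the proposed fields could be modeled roughly like this in Python. This is only a sketch: the class and attribute names are my own invention, and I'm assuming here that "repeat" = N means N complete passes through the animation, which the proposal above does not pin down.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StreamInfo:
    """Fields carried by the first frame only (names are hypothetical)."""
    is_animation: bool = True
    repeat: int = 0                    # 0 = loop forever
    framerate: Optional[float] = None  # optional hint for players


@dataclass
class FrameInfo:
    """Fields carried by every frame (names are hypothetical)."""
    exposure_ms: int                   # how long this frame stays on screen
    offset: Optional[int] = None       # byte position of the next frame, if known
    picoffset: Tuple[int, int] = (0, 0)  # NW corner of this frame on the canvas


def total_runtime_ms(stream: StreamInfo, frames: List[FrameInfo]) -> Optional[int]:
    """Total playing time in ms, or None if the animation loops forever.

    Assumption: repeat = N means N complete passes.
    """
    if stream.repeat == 0:
        return None
    return stream.repeat * sum(frame.exposure_ms for frame in frames)
```

A player that finds no "offset" values can still work with this; it just has to discover the frame boundaries itself while reading the stream.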

Update: this proposed format is what led to the Animated JPEG experimental standard.

Where to store the extended information?

JPEG has the concept of segments of defined types: APP0-APP15, COM, and various image-data-related segments. One important thing is keeping animated JPEGs backward compatible with the established standards. That's why we can't define a new segment type: parsers would choke on it, afaik. One of the older (core) standards for JPEG files is JFIF. And the JFIF spec explicitly allows multiple APP0 segments after the obligatory APP0 JFIF segment (JFIF v1.02, page 2):

APP0 marker used for application-specific information
Additional APP0 marker segments can be used to hold application-specific information
which does not affect the decodability or displayability of the JFIF file. Application-
specific APP0 marker segments must appear after the JFIF APP0 and any JFXX APP0
segments. Decoders should skip any unrecognized application-specific APP0 segments.
Application-specific APP0 marker segments are identified by a zero terminated string which
identifies the application (not "JFIF" or "JFXX"). This string should be an organization
name or company trademark. Generic strings such as dog, cat, tree, etc. should not be used.

So let's agree on that: an Animated JPEG should be a number of concatenated JFIF files, where each frame is a JFIF with an additional APP0 segment labeled "ANIM". And each record within this segment holds our collection of key=value pairs.
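As a rough illustration of the idea, this Python sketch builds such an application-specific APP0 segment, slots it in right after the JFIF (or JFXX) APP0 segment, and reads it back. Only the "ANIM" label comes from the proposal above. How the key=value pairs are encoded inside the segment (ASCII, separated by semicolons) is purely my assumption for this example, and all function names are made up.

```python
import struct

APP0, SOS = 0xE0, 0xDA


def build_anim_segment(fields: dict) -> bytes:
    """Build an APP0 segment: zero-terminated "ANIM" id plus key=value pairs."""
    text = ";".join(f"{key}={value}" for key, value in fields.items())
    payload = b"ANIM\x00" + text.encode("ascii")
    # The 16-bit length counts itself (2 bytes) plus the payload.
    return b"\xff\xe0" + struct.pack(">H", len(payload) + 2) + payload


def iter_segments(jpeg: bytes):
    """Yield (marker, payload_start, payload_end) for header segments up to SOS."""
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield marker, pos + 4, pos + 2 + length
        if marker == SOS:
            return
        pos += 2 + length


def insert_anim(jpeg: bytes, fields: dict) -> bytes:
    """Return a copy of a JFIF file with an ANIM segment after JFIF/JFXX APP0."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    insert_at = None
    for marker, start, end in iter_segments(jpeg):
        if marker == APP0 and jpeg[start:start + 5] in (b"JFIF\x00", b"JFXX\x00"):
            insert_at = end
    if insert_at is None:
        raise ValueError("no JFIF APP0 segment found")
    return jpeg[:insert_at] + build_anim_segment(fields) + jpeg[insert_at:]


def read_anim(jpeg: bytes):
    """Return the ANIM key=value pairs as a dict of strings, or None."""
    for marker, start, end in iter_segments(jpeg):
        if marker == APP0 and jpeg[start:start + 5] == b"ANIM\x00":
            text = jpeg[start + 5:end].decode("ascii")
            return dict(pair.split("=", 1) for pair in text.split(";") if pair)
    return None
```

A decoder that follows the JFIF rule quoted above should simply skip the unknown "ANIM" segment, which is the whole point: old viewers keep showing the first frame, while read_anim() doubles as a cheap "is this probably an animation?" probe.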

An alternative would be to use the APP12 segment, a segment type usually used by older software to store additional camera details. It usually holds named key=value pairs. Sometimes Adobe uses it to store number-identified "Ducky" metadata.
Another candidate would be the COM (comment) segment, where we could store CSV- or JSON-like data to describe the animation. But as I think only one COM is allowed per JFIF, that would get in the way of "real" comments, and we shouldn't throw technical key=value stuff at people expecting human-readable text there.

One important thing to keep in mind would be to offer an easy way to identify a motion-jpeg, similar to a file's magic number. So applications can reliably tell if that's an animated jpeg or not. Without parsing the file structure, or guessing, after additional (concatenated) data after the


section has been found. Suffixes were never a reliable way, anyway. Probably, it would be possible to agree on a more or less fixed structure for the first few bytes of a file, so an application can probe the first two bytes to find out "ah, it's a JPEG" and then seek to, I don't know, offset 112, and check whether that byte or bit has some significance for us, telling us that it's "probably an animation" or not.

"But hey, what about transparency!"

We all know JPEG does not support transparent pixels. GIF does, and that's handy in animation, as only the pixels that actually change need replacing. Well, in theory, that sounds good. But in reality, it doesn't save a lot. And that's where the "picoffset" value above comes into play. This way, a first frame can lay out a large canvas, and subsequent frames can update only a small part of this canvas. For animations where the outer canvas is static and only small things change, which is quite common in cinemagraphs, subsequent frames can be very small images compared to the initial canvas. And that might actually outperform the savings ever achieved by GIF's full support for transparency.
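Here is a tiny Python sketch of what a viewer would do with "picoffset": paste a small decoded frame onto the canvas laid out by the first frame. To keep it dependency-free, images are plain lists of pixel rows. The function name is made up, and I'm assuming picoffset is given as (x, y) of the patch's NW corner relative to the canvas.

```python
def composite(canvas, patch, picoffset):
    """Return a copy of canvas with patch pasted at picoffset = (x, y).

    canvas and patch are lists of rows; each row is a list of pixel values.
    Assumption: (x, y) is where the patch's NW corner lands on the canvas.
    """
    x0, y0 = picoffset
    height, width = len(canvas), len(canvas[0])
    if (x0 < 0 or y0 < 0
            or y0 + len(patch) > height
            or x0 + len(patch[0]) > width):
        raise ValueError("patch does not fit inside the canvas")
    result = [row[:] for row in canvas]  # leave the previous frame untouched
    for dy, patch_row in enumerate(patch):
        # Overwrite only the pixels covered by the patch.
        result[y0 + dy][x0:x0 + len(patch_row)] = patch_row
    return result
```

For a 1920x1080 cinemagraph where only a 200x100 region moves, each follow-up frame would then carry roughly 1% of the pixels of the first one.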


Many other approaches tried to overcome JPEG's static form. All of them, sadly, are out-of-band solutions:

I'd like to give an "Honorable Mention" to one approach that user bas suggested on the FFmpeg mailing list. His idea: the MOV/QuickTime container has an option flag to set the contained video stream to "loop". Would it be possible to put MJPEG in a MOV container with this flag set to loop/repeat, so that each QuickTime-compatible player would play back the video as expected? Well, short answer: no. Although ffmpeg can pass the -loop flag to the output video, it doesn't seem to stick, and most players seem to ignore it.

Nifty web developers use individual JPG files and loop/animate them via JavaScript, with jQuery for example. Quite portable, but, again, a solution similar to the one introduced by Vine: it only works in combination with a web page, it's not self-contained, etc. Not perfect.

And, dear suffix-renamers: you can't create an animated JPEG by just changing the extension! Any credible software identifies files by their magic number, sooner or later, not by the extension. That's why a file renamed to kitten.jpg can actually be played back: it's still kitten.gif!
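For the curious, this is roughly what such a magic-number check looks like, as a small Python sketch (the function name is made up). It only looks at the first bytes of the file and never at its name.

```python
def sniff_image_type(data: bytes) -> str:
    """Guess the image type from the leading magic bytes, ignoring any suffix."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"\xff\xd8\xff"):      # SOI marker plus start of next marker
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    return "unknown"
```

So the renamed kitten.jpg still starts with "GIF89a", and the viewer happily treats it as the GIF it is.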

And: I don't know what Gargamel's Animation ReCreator does, or if it's any more than a scam...


In case you are one of the few who read up until here, be told that there now actually is a format, a proposed application extension standard for Animated JPEG.
