Fun with GIFs
I decided it would be fun to contribute to Go. My recent projects have mostly been me writing things from scratch. I like to do that, but I often think I could make more open source contributions. Beyond giving back, it exposes one to code written by others, and delving into different projects presents challenges and opportunities to learn.
I began by skimming every issue looking for candidates to work on. I didn't mind what the issue was, only that I might be able to solve it and that it might be interesting. At the time there were 2,600 open issues (as of now there are 2,640). I found about 10 I thought looked good.
One I found related to the image/gif package: #16146, image/gif: gif decoding fails with too much image data. I figured out what was going on and provided a patch. Afterwards I investigated where the GIFs causing the behaviour came from, and submitted a patch to ffmpeg. This post is about these two fixes.
Problems with the GIF decoder in Go
The image/gif decoder failed reading certain GIFs with the error "too much image data". These GIFs were viewable through other means, such as browsers and other decoders. Was there a bug in the decoder? Or were the GIFs invalid in some way that other decoders were forgiving about?
I knew that to figure this out I would need to learn about the GIF format. I decided to read the specification and see if I could come up with any theories about what the problem was. I also decided that I would write a toy decoder so that I could see what was in the files, and use that to help me find the problem.
The GIF89a Specification, dated July 1990, is the latest version of the GIF specification. The format includes a way to specify delays between images to produce animation. There is an extension that tells viewers to loop the images; without it, GIFs don't loop! The specification is readable and is similar to an RFC.
A GIF is made up of different types of blocks, each identified by magic numbers preceding them. Images appear as "Image Descriptor" blocks followed by pixels compressed using the LZW Algorithm (with some GIF specific updates). Raw data, such as image data, appears as a series of "data sub-blocks", each a maximum of 255 bytes.
After reading the specification I generally understood the format. Now I needed to figure out why the decoder was failing. I had an idea that it could be failing to recognize blocks that might be less common in the wild, such as comment blocks.
Finding the extra data
The code for the decoder is in image/gif/reader.go. I read the part with the extra data error. After reading and decompressing the image's pixels, the decoder checks whether there is more data in the data sub-blocks for the image. If there is, it raises the extra data error.
I modified the decoder to dump out what the extra data was. I found there was one extra byte containing 0x00.

I wondered if this could be the "Block Terminator". Data sub-blocks in GIFs end with a single data sub-block of size 0 (0x00). However, the decoder thinking this was extra seemed very unlikely. If it was not recognizing Block Terminators correctly then it would be a wonder if it ever worked.
I proceeded with writing a decoder. I based it on the grammar found in Appendix B in the specification. This grammar is similar to the Augmented Backus-Naur form used for the grammar in many RFCs. My decoder reads each block in the image and enforces it being in a valid order. For the most part it simply skips over blocks, printing out each it encounters along with its size. I only parse out things such as size when necessary. I did not make any attempt to decompress the image data.
According to my decoder, the GIFs were well formed. There was no extra data.
I went back to the Go decoder and dumped out which image the problem occurred in, then compared what the two decoders said about this image. The Go decoder was reading the LZW "End of Information Code" marking the end of the compressed pixel data. Then it read one more data sub-block of size 1 containing 0x00 (so there were actually two extra bytes, 0x01 0x00, the first indicating a data sub-block of size 1). By this point the image was decompressed and ended, so this byte was extra.
This explained why my decoder was not seeing a problem. Looking at the data sub-blocks from a size perspective, they are well formed. But if you decompress what is in them, you see there is more data after the compressed part ends.
Is the extra data valid?
At this point I had pinpointed the cause, but I needed to know whether this was technically valid or not.
I read through the relevant parts of the specification several times looking for a determination. I could find nothing about such extra data one way or the other. The closest point I could find was "[t]he code that terminates the LZW compressed data must appear before the Block Terminator". This does not say that there should be nothing else (in the data) afterwards, just that the end code should come before the Block Terminator.
One reason we may wish to permit extra data is because LZW compression is variable length. It is possible to end partway through a byte and need to pad the byte with zeroes. But what about whole extra bytes? The specification is silent.
I decided to look at how other GIF decoders handle this and if they say anything about it.
I downloaded GIFLIB, a venerable GIF library, and discovered that once it reads and decompresses an image inside a GIF, it reads and discards all data until it finds a Block Terminator. This is why it can read such GIFs without error. It silently throws away the extra data. This does not mean doing so is valid. It could be being liberal in what it accepts.
I decided that extra data was arguably valid given the ambiguity in the specification. I also thought it was an acceptable trade off to ignore it since the case is narrowly defined: Once you fully read an image, read and discard any data up to the next Block Terminator, and continue.
I wrote up what I discovered and posted it on the issue, and included a patch.
I found while making this change that there was a unit test for this exact situation. It tested adding extra data in this exact way, and expected decoding to fail. This indicated an author considered this case and decided it should fail. I changed this unit test to reflect that decoding should now succeed.
Where are these GIFs coming from?
The GIFs causing the decoder to complain are arguably valid, but they are not well formed. They contain extra data that does not need to be there. The person who reported the issue showed that certain GIFs started exhibiting the problem after decoding and re-encoding with ffmpeg (ffmpeg -i sample.gif sample-reencoded.gif). I thought there could be a bug in its encoder. Since I was on a roll and familiar with the GIF format, I thought I'd try to fix it.
I downloaded the ffmpeg source and reproduced the problem by transcoding the GIF provided by the reporter.
I discovered that in libavcodec/gif.c there is a call to ff_lzw_encode_flush() after writing all the LZW compressed data. In that function, I found:

if (s->mode == FF_LZW_GIF)
    s->put_bits(&s->pb, 1, 0);
This says if we are writing out LZW data for a GIF, then always write a single bit (0) after writing the LZW end code. This is almost exactly the problem with the resulting GIFs. There is a difference though: This writes a single bit, but the GIFs had a whole extra byte. I believed this to still be the cause, because if during LZW compression we fully utilize the last byte, adding a bit means adding a new byte to hold it.
Removing these two lines removed the extra data sub-block when transcoding the sample.
I wanted to understand why this code to add an extra bit was written. Maybe knowing the reason would help me figure out whether such bytes should be considered valid. I looked through the git log and found a commit from August with this message:

avcodec/lzwenc: Add 1 additional bit of padding for gif
This fixes issues with gimp reading animated gifs
It sounded like this was to work around a bug with GIMP. I looked through the ffmpeg mailing list archives and issue tracker, as well as briefly at the GIMP mailing list archives and issue tracker. I could not find any thread or issue talking about this. Another person found evidence of a fix made to GIMP in March 2013. Here's the GIMP issue.
With the code to add the extra bit removed, I opened the resulting GIF in GIMP and exported it again to GIF. This seemed to work fine.
I decided that ffmpeg should not be writing this bit, so I wanted to submit a patch.
Sending my ffmpeg patch
I read through the guide to contributing to ffmpeg. ffmpeg has unit tests, and some were now failing. This was not surprising, as the commit adding the lines I wanted to remove changed unit tests. To update the tests there is a single command, mentioned on their page about tests. Afterwards I made a commit to my local git checkout.
ffmpeg accepts patches through a mailing list. This is apparently similar to the approach used for the Linux kernel, but it's not one I am very familiar with (I think I did it for ratpoison at some point). The ffmpeg guide walks you through it well. This is what I did to make and send my patch (after having already made a commit):
git format-patch origin # Make a patch
mutt -H <path to the .patch file> # Mail the patch
The guide actually suggests using git send-email rather than the two step process I used, but I was skeptical about relying on git to talk to a mail server (whatever magic it uses). I knew I'd at least have to configure SMTP settings, and figured there must be a way to use mutt instead. You can see my email with my patch here.
Emailing patches like this feels a bit retro, but it seems to work well for them (and for Linux!). There is a lot of activity on the ffmpeg list! I'm a big fan of email myself so I can see the attraction.
Second guessing my ffmpeg patch
I started thinking about whether my patch was correct. I mentioned that ffmpeg wrote only a single extra bit (0). At most this meant writing an extra byte. Yet what led to the issue was a new data sub-block with an extra byte. What was causing the extra data sub-block?
We know there was always an extra bit being added. This meant that for each image in the GIF output there would often but not always be an extra byte in the data stream. (Not always because sometimes a prior byte had space for the bit.) When adding the bit happened to occur after filling all 255 bytes in a data sub-block, then adding a new byte meant a new data sub-block was needed for that byte. My fix was correct.
Does the Go decoder treat all extra data equal?
I thought of another question: Was Go's decoder complaining when the extra byte was inside the data sub-block where the LZW compressed section ended? Or was it only complaining when the extra byte was in a data sub-block by itself?
I tested this by finding an image in a GIF with an extra byte that was not in a data sub-block by itself. This byte was immediately after the LZW end code in the same data sub-block. The Go decoder passed over this byte without complaint. How?
The reason is this: The LZW decoder buffers whole data sub-blocks. It fully reads the data sub-block containing the LZW end code, including any extra bytes following it. The LZW reader sees the LZW end code and stops. Any subsequent reads return io.EOF. This means it silently ignores any extra bytes in a data sub-block after an LZW end code.
Specifically how this happens:
- blockReader reads data sub-blocks, stopping only when it reaches a Block Terminator.
- Each time it reads, it reads in a full data sub-block (which means at most 255 bytes).
- The LZW reader reads from blockReader.
- The LZW reader wraps the blockReader in a bufio.Reader. This means any reads to it buffer. Buffering performs at most a single read, which means each read buffers an entire data sub-block from blockReader.
- The LZW decoder repeatedly reads one byte at a time (from the buffered blockReader). Even if it needs only a single byte, a read buffers additional bytes. It buffers the LZW end code along with anything following it, as long as what follows is in the same data sub-block.
- Once the LZW decoder encounters the end code, it refuses to look any further at what it has buffered. It considers the buffer fully read.
Note there is no guarantee the decoder always buffers. lzw.NewReader only wraps its io.Reader in a bufio.Reader if it does not implement the io.ByteReader interface. A type could implement io.ByteReader and avoid the buffering, in which case extra data would not be ignored.
The fact that the GIF decoder already ignores extra data in some cases lends support to the approach in my patch. My patch adds logic to ignore extra data in another way.
I learned about the GIF format and read a bunch of Go and ffmpeg code. It feels good to contribute. The Go project people responded really positively and were very welcoming.
I hope I can find more opportunities to work on these projects. I have the classic open source contributor's problem: the will to contribute, but a search for something to work on!
In addition to the GIF specification, Inside the GIF file format is a good walkthrough of the GIF format.
Doing all this reminded me of an article about the history and controversy of GIFs, The GIF Is Dead. Long Live the GIF.