There’s a neat trick the original arcade machine uses on the background parallax layers. Several layers just have a different colour palette applied to them and get reused on many stages. Now that I’m doing parallax layers too, I can also use the same trick to save memory.
Looking into how best to store the background parallax layers and stage data, took me on an interesting detour into different implementations of Run Length Encoding. Experts look away, as what follows is an “Idiot’s Guide” to Run Length Encoding, and I will be your idiot for today.
Run Length Encoding (or just RLE) is a kind of poor man’s compression which is easy to implement and quick to run, even on old 8-bit computers. Even the concept is simple and obvious, unlike most other (admittedly much better) compression algorithms. The concept is just this – when you get several repeating items of the same value (a run), just store a count of how many there are, along with the value of the repeating item.
a,a,a,a,a,a,b,b,b becomes 6,a, 3,b
This saves space when we have nothing but repeats, but what happens when values do not repeat? In the easiest approach we might store a count of just 1 for those items that do not repeat.
a,b,c,d,d,d,d,d,d,e,e,f,g becomes 1,a, 1,b, 1,c, 6,d, 2,e, 1,f, 1,g
As you can see, in this case it has the unfortunate effect of expanding the data, rather than compressing it. This is where things get slightly more interesting. There are a few different approaches to that problem, and which one gets the best result is dependent on the kind of data you will be encoding.
Approach number 1. Use two repeats of the same value to show that a run is required, and the value after is the count remaining. This is pretty cunning, but not always that efficient. Note how in the example the two e’s below actually expand in size. Runs of 3 take up the same size, but any bigger runs will start to benefit. The non-repeating data takes up the same amount of space.
a,b,c,d,d,d,d,d,d,e,e,f,g becomes a,b,c, d,d,5, e,e,0, f,g
Approach number 2. Use an unambiguous symbol to represent that a run is required. In our examples we are just using letters of the alphabet, so we could use anything other than a letter as a symbol. Below the character “#” is chosen as a symbol. This approach has the advantage of not having to expand the data when it is not worth it; we can leave the 2 e’s as just 2 e’s.
a,b,c,d,d,d,d,d,d,e,e,f,g becomes a,b,c, #,d,6, e,e, f,g
What if there is no suitable value for a symbol though? If we want to have “#” as part of the data in the above example we hit a problem. One answer is again to use a repeating value to indicate that the symbol is to be used as real data, rather than just a symbol – but any runs of the symbol cannot then be represented as a count. Alternately we could represent the occurrence of the symbol as a run with a count of just 1 item. It is therefore important that a rarely used value is chosen as the symbol, or the data will start to expand every time the symbol is required to be represented in the data. If the symbol is not always going to be the same then we must also record the symbol value in the data. Right at the beginning is an obvious place to put it.
a,b,c,d,d,d,d,d,d,e,e,#,f,g becomes #, a,b,c, #,d,6, e,e, #,#, f,g
or
a,b,c,d,d,d,d,d,d,e,e,#,f,g becomes #, a,b,c, #,d,6, e,e, #,#,1, f,g
Approach number 3. This one actually has a name – it’s sometimes referred to as “PackBits”. The basic idea is to have a count for the runs and also a count of how many non-repeating values there are before the next run. We do need to distinguish between the two kinds of count though, and this reduces the range of the count. The usual method is to represent one of the counts as a positive number, and the other as a negative number. In the example here we use a negative value to show when the count is for a run.
a,b,c,d,d,d,d,d,d,e,e,f,g becomes 3,a,b,c, -6,d, -2,e, 2,f,g
That’s pretty much the basics for Run Length Encoding. One thing that hasn’t been mentioned yet, is how to know when we’ve got to the end of the data. You could have the file length stored at the beginning of the data, and keep count of when to stop. A normally more efficient way (at least on old processors with few registers, like the 6502) is to have a zero in the count value to represent we’ve finished.
Recent Comments