One of the spin-offs of our new Solitaire product has been the achievement of extraordinary levels of compression using Plausibility analysis. The drive for compression has gone somewhat out of fashion as computers have gained ever more memory and bigger hard drives, but the issue has made a comeback: the need to transmit and receive data across networks has made small data important again. So the driving consideration is now skewed more towards limited time than limited space.
For this product we needed to keep the APK file size as small as possible. This is how we did it.
Klondike Solitaire is probably the world's most popular solitaire game. In creating such a product we needed to be able to present puzzles that could be solved. It is estimated that some 15% of Klondike deals have no solution (depending on the rule set). The principles needed to solve Klondike are well known, but unfortunately no solver is reliably fast enough to create a solvable puzzle randomly on-the-fly. Returning to a common theme of earlier articles, you cannot just deal with the "likely" when your app is being used by millions of users. Even if the solver could be optimised quite well, you still have to face the fact that random deals with no solution might well string together and cause annoying delays. With the best will in the world, a delay of 60 seconds to start a new puzzle would eventually be unavoidable.
First, if you are going to build a library of solutions it makes sense to grade these so that the user can choose between easy and expert puzzles (an option not easily available if puzzles are created on-the-fly). This is not a trivial exercise, and it was not solved well in our original Move It! puzzle: using the machine solve time proved a poor indicator of puzzle difficulty. That method rests on the solver attempting to solve the puzzle in a sensible, directed fashion. This was hard with Move It!, as choosing between trial moves did not fit any reasonable ordering beyond the obvious avoidance of moves that made clear negative progress. This issue is discussed in Solving Deep Puzzles.
For solitaire though we were able to do a better job of getting good move ordering, so that solutions that followed more obvious good moves could be solved more quickly, and therefore gave a better indicator of the difficulty. This ordering is called Plausibility analysis and is discussed in Driving search with Plausibility analysis: Looking at the right moves.
So we were committed to creating a database of puzzles.
On the assumption that we want 3 puzzle sets, for 2 rule sets, each with 1000 puzzles, we need 3x2x1000 = 6,000 solutions. Testing showed that the worst-case puzzle needs 198 moves, and the move format used 8 bytes per move.
Taking the simplest (and worst) approach, we could declare a fixed array of 198x8 = 1,584 bytes per solution, which in total would need a staggering 9,504,000 bytes. Most of our apps, complete with game code and artwork, are not this big, so clearly this is not a reasonable solution.
Of course this sizes the array by the worst-case record, which could obviously be compacted into variable-length records. The average solution is 125 moves rather than 198, which would reduce the overall size to 3*2*1000*125*8 = 6,000,000 bytes. This is still too much.
Next we can look at the move format itself, where 8 bytes per move is clearly wasteful. There are only 10 move types, which fit in 4 bits; the card chosen is one of 52, which needs 6 bits; the tableau destination needs only 3 bits, and so on. To cut a long story short, this packing cuts the move record from 8 bytes down to 3, needing 2,250,000 bytes in total.
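The sizing arithmetic above can be laid out explicitly. This is just a sketch of the three schemes discussed so far, using the figures from the text:

```python
# Sizing arithmetic for the three storage schemes (figures from the text).
SETS, RULE_SETS, PUZZLES = 3, 2, 1000       # 6,000 solutions in total
WORST_MOVES, AVG_MOVES = 198, 125           # moves per solution

solutions = SETS * RULE_SETS * PUZZLES

# 1. Fixed-size records: every solution padded to the worst case, 8 bytes/move.
fixed_size = solutions * WORST_MOVES * 8    # 9,504,000 bytes

# 2. Variable-length records: average length, still 8 bytes/move.
variable_size = solutions * AVG_MOVES * 8   # 6,000,000 bytes

# 3. Bit-packed moves: 4 bits move type + 6 bits card + 3 bits destination
#    etc. rounds to 3 bytes per move.
packed_size = solutions * AVG_MOVES * 3     # 2,250,000 bytes

print(fixed_size, variable_size, packed_size)
```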
This is looking better, but 2.25 MB is approaching the size of our smallest released product.
We can do better…
The next idea is to replace the coded moves with an index giving the position of the move in the legal-moves list. The record then cannot be decoded without the actual game engine to determine the move ordering, which makes it context-sensitive, but the gain is radical: each move can now be stored in a single byte. The size is now 3x smaller at 3*2*1000*125*1 = 750,000 bytes.
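To make the context-sensitivity concrete: the decoder must replay the game engine to regenerate the same legal-moves list at each position. The sketch below illustrates the idea only; the `legal_moves`/`apply_move` pair is a made-up stand-in for the real engine (here a trivial counter game), not the actual Solitaire code.

```python
# Illustration of index-based move encoding. The real decoder needs the
# actual game engine; here a toy "counter" game stands in: from state n
# the legal moves are n+1, n+2, n+3.

def legal_moves(state):
    return [state + d for d in (1, 2, 3)]   # hypothetical stand-in engine

def apply_move(state, move):
    return move

def encode(start, moves):
    """Replace each move by its index in the legal-moves list."""
    state, indexes = start, []
    for move in moves:
        indexes.append(legal_moves(state).index(move))  # one byte per move
        state = apply_move(state, move)
    return indexes

def decode(start, indexes):
    """Regenerate the moves by replaying the engine."""
    state, moves = start, []
    for i in indexes:
        move = legal_moves(state)[i]
        moves.append(move)
        state = apply_move(state, move)
    return moves

moves = [1, 3, 4, 7]             # +1, +2, +1, +3 from state 0
print(encode(0, moves))          # [0, 1, 0, 2]
```

Note that the encoded record is meaningless without the engine: change the move ordering inside `legal_moves` and the same bytes decode to a different game.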
At this point we might have considered stopping as this is a third the size of our smallest app.
However we can do better…
If the moves were well ordered we would gain the advantage that the distribution of move-list indexes would not be uniform but highly skewed: the first move in the list would be very common, the second less so, and so on. This would allow a Morse-like coding strategy in which the most common index (the first) would be the equivalent of the letter "E" in Morse, represented by a single bit, with later positions in the list taking up to 5 bits. This promised a plausible prospect of a very high level of compression, estimated at perhaps as much as 4 moves per byte, which would give a database of about 190k.
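As an illustration of the Morse-like idea (this exact scheme was not what we finally shipped, as explained below), a simple unary prefix code gives index 0 a single bit and index i a code of i+1 bits:

```python
# Sketch of a Morse-like prefix code over move-list indexes: index i is
# written as i ones followed by a terminating zero, so the most common
# index (0) costs a single bit, like "E" in Morse.

def encode_bits(indexes):
    return "".join("1" * i + "0" for i in indexes)

def decode_bits(bits):
    indexes, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
        else:                      # '0' terminates the current code
            indexes.append(run)
            run = 0
    return indexes

bits = encode_bits([0, 0, 1, 0, 3, 0, 0, 0])
print(bits)                        # "001001110000": 12 bits for 8 moves
```

With a heavily skewed distribution most moves cost one bit, which is where the estimate of around 4 moves per byte comes from.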
This is very small, but a better and simpler solution is possible…
Examination of the actual solution records, reduced to plausible-move-list indexes, showed that the Plausibility analysis was getting the first move right almost all the time. This offered a new, simpler and better option: run-length encoding of the move lists. A series of 9 zeros (first-choice moves) could be reduced to a single byte of the form 0x89, i.e. 128+9. Such was the dominance of these first moves that all other moves could be left as-is, occupying one byte per index.
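The run-length scheme can be sketched as follows. The exact byte layout is an assumption based on the 0x89 example in the text: runs of zeros become one byte of 0x80 plus the run length, and any other index is stored directly (always below 0x80):

```python
# Sketch of the run-length encoding described above. Assumed layout:
# a byte >= 0x80 means "(byte - 0x80) consecutive zero indexes";
# a byte < 0x80 is a literal move-list index.

def rle_encode(indexes):
    out, run = [], 0
    for i in indexes:
        if i == 0:
            run += 1
            if run == 127:              # longest run one byte can hold
                out.append(0x80 + run)
                run = 0
        else:
            if run:                     # flush any pending zero run
                out.append(0x80 + run)
                run = 0
            out.append(i)               # literal index, one byte
    if run:
        out.append(0x80 + run)
    return out

def rle_decode(data):
    indexes = []
    for b in data:
        if b >= 0x80:
            indexes.extend([0] * (b - 0x80))
        else:
            indexes.append(b)
    return indexes

# Nine first-choice moves collapse to the single byte 0x89 (128 + 9):
print(rle_encode([0] * 9))              # [137]
```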
Analysis showed that the entire set could be reduced to just 96,080 bytes, and this included the 6-byte header needed for each solution to store the random seed, length of record and number of moves. This stores moves at an astonishing 12.8 moves per byte ignoring the headers, and 7.8 moves per byte after header overheads. It is almost 100x smaller than the original worst-case method above and makes the point very strongly that Plausibility analysis can have a dramatic impact on game database compression.
The following is the very first game solution in the database, without the initial deal:
|Move||Card||Description|
|30.||T||Tableau to Tableau 5 -> 1 (from 5)|
|36.||T||Waste to Tableau -> 7|
|39.||2||Waste to Tableau -> 2|
|44.||9||Waste to Tableau -> 7|
|45.||8||Tableau to Tableau 3 -> 7 (from 3)|
|46.||7||Tableau to Tableau 5 -> 3 (from 4)|
|47.||7||Tableau to Tableau 5 -> 7 (from 3)|
|48.||Q||Tableau to Tableau 5 -> 4 (from 2)|
|49.||J||Tableau to Tableau 1 -> 4 (from 1)|
|53.||K||Waste to Tableau -> 1|
|56.||Q||Waste to Tableau -> 1|
|60.||6||Waste to Tableau -> 7|
|63.||A||Waste to Foundation -> 1|
|71.||9||Waste to Tableau -> 4|
|72.||8||Waste to Tableau -> 4|
|73.||5||Waste to Tableau -> 7|
|74.||4||Tableau to Tableau 5 -> 7 (from 1)|
|75.||K||Waste to Tableau -> 5|
|77.||7||Waste to Tableau -> 4|
|78.||A||Waste to Foundation -> 2|
|79.||Q||Waste to Tableau -> 5|
|81.||2||Waste to Foundation -> 2|
|82.||J||Waste to Tableau -> 1|
|85.||T||Waste to Tableau -> 1|
|87.||J||Tableau to Tableau 7 -> 5 (from 7)|
|88.||3||Tableau to Foundation 7 -> 2|
|89.||A||Tableau to Foundation 7 -> 3|
|90.||5||Tableau to Tableau 6 -> 7 (from 6)|
|91.||2||Tableau to Foundation 6 -> 1|
|92.||2||Tableau to Foundation 6 -> 3|
|93.||3||Tableau to Tableau 6 -> 5 (from 3)|
|94.||6||Tableau to Tableau 7 -> 4 (from 4)|
|95.||T||Tableau to Tableau 6 -> 7 (from 2)|
|96.||6||Tableau to Tableau 6 -> 3 (from 1)|
|97.||K||Tableau to Tableau 4 -> 6 (from 4)|
|98.||3||Tableau to Foundation 4 -> 1|
|99.||3||Tableau to Tableau 2 -> 4 (from 2)|
|100.||5||Tableau to Tableau 2 -> 3 (from 1)|
|101.||4||Tableau to Tableau 4 -> 3 (from 2)|
|103.||4||Waste to Foundation -> 2|
|105.||K||Waste to Tableau -> 2|
|106.||Q||Waste to Tableau -> 2|
|107.||J||Tableau to Tableau 7 -> 2 (from 3)|
|108.||A||Tableau to Foundation 7 -> 4|
|109.||2||Tableau to Foundation 3 -> 4|
|110.||3||Tableau to Foundation 3 -> 3|
|111.||4||Tableau to Foundation 3 -> 1|
|112.||5||Tableau to Foundation 3 -> 2|
|113.||3||Tableau to Foundation 5 -> 4|
|114.||4||Tableau to Foundation 5 -> 3|
|115.||5||Tableau to Foundation 5 -> 1|
|116.||6||Tableau to Foundation 5 -> 2|
|117.||5||Tableau to Foundation 6 -> 3|
|118.||6||Tableau to Foundation 6 -> 1|
|119.||7||Tableau to Foundation 5 -> 1|
|120.||8||Tableau to Tableau 3 -> 7 (from 2)|
|121.||9||Tableau to Tableau 7 -> 1 (from 1)|
|122.||7||Tableau to Tableau 3 -> 5 (from 1)|
|124.||9||Waste to Tableau -> 2|
|125.||8||Waste to Tableau -> 2|
|126.||6||Waste to Foundation -> 3|
|127.||4||Waste to Foundation -> 4|
|128.||5||Tableau to Foundation 4 -> 4|
|129.||6||Tableau to Foundation 1 -> 4|
|130.||7||Tableau to Foundation 1 -> 2|
|131.||7||Tableau to Foundation 5 -> 4|
|132.||8||Tableau to Foundation 1 -> 4|
|133.||8||Tableau to Foundation 5 -> 2|
|134.||9||Tableau to Foundation 1 -> 2|
|135.||9||Tableau to Foundation 5 -> 4|
|136.||T||Tableau to Foundation 1 -> 4|
|137.||T||Tableau to Foundation 5 -> 2|
|138.||7||Tableau to Foundation 6 -> 3|
|139.||8||Tableau to Foundation 2 -> 3|
|140.||8||Tableau to Foundation 6 -> 1|
|141.||9||Tableau to Foundation 2 -> 1|
|142.||9||Tableau to Foundation 6 -> 3|
|143.||T||Tableau to Foundation 2 -> 3|
|144.||J||Tableau to Foundation 1 -> 3|
|145.||J||Tableau to Foundation 2 -> 4|
|146.||Q||Tableau to Foundation 2 -> 3|
|147.||T||Tableau to Foundation 6 -> 1|
|148.||J||Tableau to Foundation 5 -> 1|
|149.||Q||Tableau to Foundation 1 -> 1|
|150.||K||Tableau to Foundation 1 -> 3|
|151.||J||Tableau to Foundation 6 -> 2|
|152.||Q||Tableau to Foundation 5 -> 2|
|153.||K||Tableau to Foundation 5 -> 1|
|154.||Q||Tableau to Foundation 6 -> 4|
|155.||K||Tableau to Foundation 2 -> 4|
|156.||K||Tableau to Foundation 6 -> 2|
Below is the complete compressed game record for the solution above:
Need we say more?
It is this compressed record type that will be used in our shortly-to-be-released Solitaire product.
Jeff Rollason - February 2013