One of the spin-offs of our new Solitaire product has been the achievement of extraordinary levels of compression using Plausibility analysis. The drive for compression has gone somewhat out of fashion as computers have gained ever more memory and bigger hard drives, but the issue has made a comeback: the need to transmit and receive data across networks has made small data important again. So the driving consideration is now skewed more towards limited time than limited space.
For this product we needed to keep the APK file size as small as possible. This is how we did it.
Klondike Solitaire is probably the world's most popular solitaire game. In creating such a product we needed to be able to present puzzles that could be solved. It is estimated that some 15% of Klondike deals have no solution (depending on the rule set). The principles needed to solve Klondike are well known, but unfortunately no solver is reliably fast enough to create a solvable puzzle randomly on-the-fly. Returning to a common theme of earlier articles, you cannot just deal with the "likely" when your app is being used by millions of users. Even if the solver could be optimised quite well, you still have to face the fact that random deals with no solution might well string together and cause annoying delays. With the best will in the world, a delay of 60 seconds to start a new puzzle would eventually be unavoidable.
First, if you are going to build a library of solutions it makes sense to grade these so that the user can choose between easy and expert puzzles (an option not easily available if puzzles are created on-the-fly). This is not a trivial exercise, and it was not solved well in our original Move It! puzzle: using the machine solve time proved a poor indicator of puzzle difficulty. That method rests on the solver attempting to solve the puzzle in a sensible, directed fashion. This was hard with Move It!, as choosing between trial moves did not fit any reasonable ordering beyond the obvious avoidance of moves that made clear negative progress. This issue is discussed in Solving Deep Puzzles.
For solitaire though we were able to do a better job of getting good move ordering, so that solutions that followed more obvious good moves could be solved more quickly, and therefore gave a better indicator of the difficulty. This ordering is called Plausibility analysis and is discussed in Driving search with Plausibility analysis: Looking at the right moves.
So we were committed to creating a database of puzzles.
On the assumption that we want 3 puzzle sets, for 2 rule sets, each with 1000 puzzles, we need 3x2x1000 = 6,000 solutions. Testing showed that the worst-case puzzle needs 198 moves, and the move format used 8 bytes per move.
Taking the simplest (and worst) approach, we could declare a fixed array of 198x8 = 1,584 bytes per solution, which in total would need a staggering 9,504,000 bytes. Most of our apps, complete with game code and artwork, are not this big, so clearly this is not a reasonable solution.
Of course this sizes the array by the worst-case record, which could obviously be compacted into variable-length records. The average solution is 125 moves rather than 198, which would reduce the overall size to 3*2*1000*125*8 = 6,000,000 bytes. This is still too much.
Next we can look at the move format itself, where 8 bytes per move is clearly wasteful. There are only 10 move types, which fit in 4 bits; the card chosen is one of 52, which needs 6 bits; the tableau destination needs only 3 bits, and so on. To cut a long story short, this packing cuts the move record from 8 bytes down to 3, needing 2,250,000 bytes in total.
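The sizing arithmetic above can be laid out explicitly. This is just a sketch of the three schemes discussed so far, using the figures from the text:

```python
# Sizing arithmetic for the three storage schemes (figures from the text).
SETS, RULE_SETS, PUZZLES = 3, 2, 1000       # 6,000 solutions in total
WORST_MOVES, AVG_MOVES = 198, 125           # moves per solution

solutions = SETS * RULE_SETS * PUZZLES

# 1. Fixed-size records: every solution padded to the worst case, 8 bytes/move.
fixed_size = solutions * WORST_MOVES * 8    # 9,504,000 bytes

# 2. Variable-length records: average length, still 8 bytes/move.
variable_size = solutions * AVG_MOVES * 8   # 6,000,000 bytes

# 3. Bit-packed moves: 4 bits move type + 6 bits card + 3 bits destination
#    etc. rounds to 3 bytes per move.
packed_size = solutions * AVG_MOVES * 3     # 2,250,000 bytes

print(fixed_size, variable_size, packed_size)
```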
This is looking better, but 2.25 MB is approaching the size of our smallest released product.
We can do better…
The next idea is to replace the coded moves with an index giving the position of the move in the legal-moves list. The record then cannot be decoded without the actual game engine to determine the move ordering, which makes it context-sensitive, but the gain is radical: each move can now be stored in a single byte. The size is now 3x smaller at 3*2*1000*125*1 = 750,000 bytes.
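To make the context-sensitivity concrete: the decoder must replay the game engine to regenerate the same legal-moves list at each position. The sketch below illustrates the idea only; the `legal_moves`/`apply_move` pair is a made-up stand-in for the real engine (here a trivial counter game), not the actual Solitaire code.

```python
# Illustration of index-based move encoding. The real decoder needs the
# actual game engine; here a toy "counter" game stands in: from state n
# the legal moves are n+1, n+2, n+3.

def legal_moves(state):
    return [state + d for d in (1, 2, 3)]   # hypothetical stand-in engine

def apply_move(state, move):
    return move

def encode(start, moves):
    """Replace each move by its index in the legal-moves list."""
    state, indexes = start, []
    for move in moves:
        indexes.append(legal_moves(state).index(move))  # one byte per move
        state = apply_move(state, move)
    return indexes

def decode(start, indexes):
    """Regenerate the moves by replaying the engine."""
    state, moves = start, []
    for i in indexes:
        move = legal_moves(state)[i]
        moves.append(move)
        state = apply_move(state, move)
    return moves

moves = [1, 3, 4, 7]             # +1, +2, +1, +3 from state 0
print(encode(0, moves))          # [0, 1, 0, 2]
```

Note that the encoded record is meaningless without the engine: change the move ordering inside `legal_moves` and the same bytes decode to a different game.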
At this point we might have considered stopping as this is a third the size of our smallest app.
However we can do better…
If the moves were well ordered we would gain the advantage that the distribution of move-list indexes would not be uniform but highly skewed: the first move in the list would be very common, the second less so, and so on. This would allow a Morse-like coding strategy in which the most common index (the first) would be the equivalent of the letter "E" in Morse, represented by a single bit, with later positions in the list taking up to 5 bits. This promised a plausible prospect of a very high level of compression, estimated at perhaps as much as 4 moves per byte, which would give a database of about 190k.
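As an illustration of the Morse-like idea (this exact scheme was not what we finally shipped, as explained below), a simple unary prefix code gives index 0 a single bit and index i a code of i+1 bits:

```python
# Sketch of a Morse-like prefix code over move-list indexes: index i is
# written as i ones followed by a terminating zero, so the most common
# index (0) costs a single bit, like "E" in Morse.

def encode_bits(indexes):
    return "".join("1" * i + "0" for i in indexes)

def decode_bits(bits):
    indexes, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
        else:                      # '0' terminates the current code
            indexes.append(run)
            run = 0
    return indexes

bits = encode_bits([0, 0, 1, 0, 3, 0, 0, 0])
print(bits)                        # "001001110000": 12 bits for 8 moves
```

With a heavily skewed distribution most moves cost one bit, which is where the estimate of around 4 moves per byte comes from.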
This is very small, but a better and simpler solution is possible…
Examination of the actual solution records, reduced to plausible-move-list indexes, showed that the Plausibility analysis was getting the first move right almost all the time. This offered a new, simpler and better option: run-length encoding of the move lists. A series of 9 zeros (first-choice moves) could be reduced to a single byte of the form 0x89, i.e. 128+9. Such was the dominance of these first moves that all other moves could be left as-is, occupying one byte per index.
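The run-length scheme can be sketched as follows. The exact byte layout is an assumption based on the 0x89 example in the text: runs of zeros become one byte of 0x80 plus the run length, and any other index is stored directly (always below 0x80):

```python
# Sketch of the run-length encoding described above. Assumed layout:
# a byte >= 0x80 means "(byte - 0x80) consecutive zero indexes";
# a byte < 0x80 is a literal move-list index.

def rle_encode(indexes):
    out, run = [], 0
    for i in indexes:
        if i == 0:
            run += 1
            if run == 127:              # longest run one byte can hold
                out.append(0x80 + run)
                run = 0
        else:
            if run:                     # flush any pending zero run
                out.append(0x80 + run)
                run = 0
            out.append(i)               # literal index, one byte
    if run:
        out.append(0x80 + run)
    return out

def rle_decode(data):
    indexes = []
    for b in data:
        if b >= 0x80:
            indexes.extend([0] * (b - 0x80))
        else:
            indexes.append(b)
    return indexes

# Nine first-choice moves collapse to the single byte 0x89 (128 + 9):
print(rle_encode([0] * 9))              # [137]
```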
Analysis showed that the entire set could be reduced to just 96,080 bytes, and this included the 6-byte header needed for each solution to store the random seed, length of record and number of moves. This stores moves at an astonishing 12.8 moves per byte ignoring the headers, and 7.8 moves per byte after header overheads. It is almost 100x smaller than the original worst-case method above and makes the point very strongly that Plausibility analysis can have a dramatic impact on game database compression.
The following is the very first game solution in the database, without the initial deal:
|Move||Card||Description|
|30.||T||Tableau to Tableau 5 -> 1 (from 5)|
|36.||T||Waste to Tableau -> 7|
|39.||2||Waste to Tableau -> 2|
|44.||9||Waste to Tableau -> 7|
|45.||8||Tableau to Tableau 3 -> 7 (from 3)|
|46.||7||Tableau to Tableau 5 -> 3 (from 4)|
|47.||7||Tableau to Tableau 5 -> 7 (from 3)|
|48.||Q||Tableau to Tableau 5 -> 4 (from 2)|
|49.||J||Tableau to Tableau 1 -> 4 (from 1)|
|53.||K||Waste to Tableau -> 1|
|56.||Q||Waste to Tableau -> 1|
|60.||6||Waste to Tableau -> 7|
|63.||A||Waste to Foundation -> 1|
|71.||9||Waste to Tableau -> 4|
|72.||8||Waste to Tableau -> 4|
|73.||5||Waste to Tableau -> 7|
|74.||4||Tableau to Tableau 5 -> 7 (from 1)|
|75.||K||Waste to Tableau -> 5|
|77.||7||Waste to Tableau -> 4|
|78.||A||Waste to Foundation -> 2|
|79.||Q||Waste to Tableau -> 5|
|81.||2||Waste to Foundation -> 2|
|82.||J||Waste to Tableau -> 1|
|85.||T||Waste to Tableau -> 1|
|87.||J||Tableau to Tableau 7 -> 5 (from 7)|
|88.||3||Tableau to Foundation 7 -> 2|
|89.||A||Tableau to Foundation 7 -> 3|
|90.||5||Tableau to Tableau 6 -> 7 (from 6)|
|91.||2||Tableau to Foundation 6 -> 1|
|92.||2||Tableau to Foundation 6 -> 3|
|93.||3||Tableau to Tableau 6 -> 5 (from 3)|
|94.||6||Tableau to Tableau 7 -> 4 (from 4)|
|95.||T||Tableau to Tableau 6 -> 7 (from 2)|
|96.||6||Tableau to Tableau 6 -> 3 (from 1)|
|97.||K||Tableau to Tableau 4 -> 6 (from 4)|
|98.||3||Tableau to Foundation 4 -> 1|
|99.||3||Tableau to Tableau 2 -> 4 (from 2)|
|100.||5||Tableau to Tableau 2 -> 3 (from 1)|
|101.||4||Tableau to Tableau 4 -> 3 (from 2)|
|103.||4||Waste to Foundation -> 2|
|105.||K||Waste to Tableau -> 2|
|106.||Q||Waste to Tableau -> 2|
|107.||J||Tableau to Tableau 7 -> 2 (from 3)|
|108.||A||Tableau to Foundation 7 -> 4|
|109.||2||Tableau to Foundation 3 -> 4|
|110.||3||Tableau to Foundation 3 -> 3|
|111.||4||Tableau to Foundation 3 -> 1|
|112.||5||Tableau to Foundation 3 -> 2|
|113.||3||Tableau to Foundation 5 -> 4|
|114.||4||Tableau to Foundation 5 -> 3|
|115.||5||Tableau to Foundation 5 -> 1|
|116.||6||Tableau to Foundation 5 -> 2|
|117.||5||Tableau to Foundation 6 -> 3|
|118.||6||Tableau to Foundation 6 -> 1|
|119.||7||Tableau to Foundation 5 -> 1|
|120.||8||Tableau to Tableau 3 -> 7 (from 2)|
|121.||9||Tableau to Tableau 7 -> 1 (from 1)|
|122.||7||Tableau to Tableau 3 -> 5 (from 1)|
|124.||9||Waste to Tableau -> 2|
|125.||8||Waste to Tableau -> 2|
|126.||6||Waste to Foundation -> 3|
|127.||4||Waste to Foundation -> 4|
|128.||5||Tableau to Foundation 4 -> 4|
|129.||6||Tableau to Foundation 1 -> 4|
|130.||7||Tableau to Foundation 1 -> 2|
|131.||7||Tableau to Foundation 5 -> 4|
|132.||8||Tableau to Foundation 1 -> 4|
|133.||8||Tableau to Foundation 5 -> 2|
|134.||9||Tableau to Foundation 1 -> 2|
|135.||9||Tableau to Foundation 5 -> 4|
|136.||T||Tableau to Foundation 1 -> 4|
|137.||T||Tableau to Foundation 5 -> 2|
|138.||7||Tableau to Foundation 6 -> 3|
|139.||8||Tableau to Foundation 2 -> 3|
|140.||8||Tableau to Foundation 6 -> 1|
|141.||9||Tableau to Foundation 2 -> 1|
|142.||9||Tableau to Foundation 6 -> 3|
|143.||T||Tableau to Foundation 2 -> 3|
|144.||J||Tableau to Foundation 1 -> 3|
|145.||J||Tableau to Foundation 2 -> 4|
|146.||Q||Tableau to Foundation 2 -> 3|
|147.||T||Tableau to Foundation 6 -> 1|
|148.||J||Tableau to Foundation 5 -> 1|
|149.||Q||Tableau to Foundation 1 -> 1|
|150.||K||Tableau to Foundation 1 -> 3|
|151.||J||Tableau to Foundation 6 -> 2|
|152.||Q||Tableau to Foundation 5 -> 2|
|153.||K||Tableau to Foundation 5 -> 1|
|154.||Q||Tableau to Foundation 6 -> 4|
|155.||K||Tableau to Foundation 2 -> 4|
|156.||K||Tableau to Foundation 6 -> 2|
Below is the complete compressed game record for the solution above:
Need we say more?
It is this compressed record type that will be used in our shortly-to-be-released Solitaire product.
Jeff Rollason - February 2013