The way this thing works in general: First off, it creates the puzzle in it's start position in memory, compresses the puzzle and stores that position on disk as the first generation. From then on the solver iterates through each position of the previous generation. With each position the solver tests all actions that can be considered a single move (move one piece one or more times orthogonally but not necessarily in the same direction each time (ie right then up) to a different position). For each new position generated it tests to see if it has not been previously generated (if you store _all_ positions of the move tree you only need to search the current and previous 2 generations) and if it is found to be unique then it is stored on disk. Repeat this until the solution is found. The path through the move tree can be found by starting at the solved position and searching the previous generation for any position that can get to the solution in 1 move. Repeat this step with this new position and the generation before that until you reach the puzzle's start position. The BIG optimization: This is the way most solvers work but I've optimized this process by doing things in batches. Most importantly the duplicate testing is performed in batches. This is where bigmem and smallmem parameters come into play. The solver reads in a batch of positions off of disk from the previous generation into a smallmem memory block. These positions are compressed individually and then additionally compressed as a batch. Optionally I could have Deflate'd them also (and yes all three work well in concert better than any or or two individually, more on the compression algorythms later). The solver undoes the batch compression (in place!) so it can access each position in a random access fashion. The (now only individually) compressed positions are all the same length, already sorted, and take up to smallmem of memory in total. Now the solver walks through the list of positions in the first smallmem block of memory and finds new positions, dropping them (compressed) onto a binary tree in the bigmem sized memory block. Obviously we wish to discard anything that would be duplicated on the binary tree, but beyond that we don't do any other dup checking until either we've gotten to the end of the positions in the previous generation or we've filled up the bigmem memory block. Because the new positions are put onto a binary tree and the new positions are typically very similar to their previous position, it reads through the previous generations positions in memory in a non-linear (bifurcating?) pattern to give the binary tree better shape. When it's time to dup check the new positions (either the end of the generation or bigmem is full), the binary tree is converted in place into a sorted doubly linked list. The solver then goes through each batch of positions from the current generation and the previous two generations (loading them in turn into the second smallmem sized memory block) and removes from the linked list any "new" positions that duplicate a position from the previously saved batches of positions. This is made extremely efficient by the fact that each batch of positions is sorted (remember?) as is the linked list of "new" positions. Once all duplicates have been removed we create a one or more smallmem sized batches (in the second smallmem sized memory block) which are then batch compressed and saved on disk. Once the generation is completely gone through we do the same on the newly created generation until we find the solution. Finding new positions: To find all of the possible child positions for a given position is done by iterating through each movable piece, and if it can move... record it's current position, and perform a depth first search for every possible move that one piece can make, recording each new position and stop recursing at any given point if the piece has gotten here before. I do a few precalculations to optimize this some but that's not going to be covered here. Compression: Since the full move tree of a puzzle is being stored on disk and things run faster the more positions that can be loaded into memory at any given time, and because comparing two small structures is faster than comparing to larger structures, most of the work on any given position is done while it is compressed (everything but finding new positions). The compression algorythm I use goes something like this: Each piece _type_ is assigned a distinct value. An empty space also has it's own value for this purpose. Walk sequentially across all of the locations on the map of the puzzle ignoring walls. As you run across a puzzle piece (or empty space) that you have not previously recorded, record the value for that type of piece. When you have walked the entire puzzle you will have one entry recorded for each piece and each space on the puzzle. Store this in a string with as few bits as necessary per piece. Ideally you can use Huffman coding for this, but I didn't which typically costs me zero to two bytes per puzzle position over the ideal. Since each position is sorted in it's batch, I can take advantage of this to save space in a batch compression. Very simply all I do is see how many of the initial bytes of the current compressed position match those of the previous compressed position and store that number followed by the non duplicated bytes. I've found that this can be gzip'd/deflate'd for additional gain but I've never implemented it.