Generating the full list of valid states of a 2×2 Rubik’s Cube

(What’s this got to do with Arduino? Why bother doing any of this at all? Stay tuned for a future project!)

The Pocket Cube is a 2x2x2 version of the Rubik’s Cube. It’s a much simpler puzzle than the full 3×3, and a really high-quality speedcube like this MoYu WeiPo can be had for not a lot of dollars. I’ve recently taught myself to solve it using Ryan Goessl’s excellent 2×2 beginner’s guide, and am down to about 25 seconds each solve.

I’m not much of a speedcuber (the current world record for solving a 2×2 is 0.49 seconds), but I love the idea of a Rubik’s cube as a sort of state machine, and wanted to explore the mathematics of it a little bit.

A pocket cube has 24 stickers, four of each of the now-standard Rubik’s colours – white, yellow, green, blue, orange, and red. Each of the eight corners has three different colours and the cube can be mixed up or scrambled by rotating the faces 90 or 180 degrees each turn. There are 3,674,160 possible states (including solved, and the rather fetching striped pattern in the photo above). Despite this, it is only ever eleven moves away from being completely solved (this is God’s number for the 2×2; a 3×3 Rubik’s can always be solved in no more than 20 moves). I wanted to try to generate a full list of every ‘legal’ state of a 2×2 – only states you can get to by turning the puzzle in the usual way, no disassembly or twisting corners) – and turned to programming for a solution.

The state of a 2×2 cube can be represented by a string of 24 letters representing the colours on each face of the cube, such as “WWWWGGGGRRRRBBBBOOOOYYYY”. With 3,674,160 valid combinations, 25 letters (including a handy end-of-line character) would take 91,854,000 bytes to store – 92 megabytes. This surprised me a bit – the entire catalogue of possible states for a pocket cube to be in would fit on even the cheapest USB flash drives available today.

(The same can not be said about a regular Rubik’s Cube. A 3x3x3 has a total of 54 stickers, nine to a face, and a little over 43 quintillion unique, legal states – that’s 43 thousand thousand thousand thousand thousand thousand permutations. If you multiply that number by 55 bytes – 54 to store each sequence of letters representing each individual state, and another byte thrown in for handy line breaks – you’re looking at over two zettabytes of data. To try and put that into a human perspective, Seagate, Toshiba and Hitachi together manufactured about 0.9Zb of storage in 2015 – to store this much data, we’re talking about acquiring two and a half years of the entire world’s production of hard drives in order to do so. You’d be able to bring it down a fair way with compression, but data redundancy, availability requirements, which continent you’re going to build this data centre on and which nations you’ll be negotiating with for power will also affect the final details. The infrastructure nerds among you can feel free to discuss how they’d go about implementing such a project. Call it the Rubik’s Storage Problem, if you will.)

A stupid idea

My very first idea was just to take that cube state string – “WWWWGGGGRRRRBBBBOOOOYYYY” – and generate every random combination and permutation of those letters. The majority of these would be illegal states (including stuff like three white stickers on the same corner, which is rubbish), so to trim them down, I planned to run every generated state through a 2×2 solver, which would spit out an error any time it saw an invalid state.

C++ provides an in-built library for doing combinations and permutations, and it didn’t take long to find a helpful example on Stack Overflow that would do what I wanted. Here’s the code for my excessive 2×2 generator:

#include <iostream>
#include <algorithm>

int main()
    {
    char stickers[] = "WWWWGGGGRRRRBBBBOOOOYYYY";

    do cout << stickers << endl;
    while(std::next_permutation(stickers,stickers+24));

    return 0;
    }

This really was a terrible idea. I kind of expected to have to leave this running for a couple of hours, or maybe overnight at best. With zero logic at all behind finding actual legal states, this program went merrily on its way for two days straight, generating nearly a terabyte of garbage before I got sick of moving files out of the way and cut it short. I’ll leave it as an exercise for you guys to figure out how much space this would have taken up, had I left it running.

(It occurred to me a little later on that if I’d incorporated the 2×2 solver into my program flow, and verified each state as legal and solvable before committing it to disk, the issue of storage would simply vanish. But that approach was rather more complicated to put together, and would still take an inordinate amount of time to run, so I never really got around to it.)

A lot of wasted CPU cycles.

Within those two days, I managed to design and implement rather more intelligent solution anyway.

Hamiltonian paths and Singmaster notation

A Hamiltonian path is basically a way of joining a set of points in a graph while only visiting each point once. A Hamiltonian cycle is a path that joins the last point to the first point again in an endless loop. Someone with a much, much bigger brain than me generated the Hamiltonian cycles for both the 3×3 and 2×2 cubes, leaving me with a very strong starting point for my project: a set of move instructions for a 2×2 that, if followed precisely, would take a cube to every single unique and legal state possible, and return it to solved at the end.

The first 31 of 51,031 lines. It’s a pretty dull read.

Bruce generated his path using a variation on Singmaster notation, which is how individual moves on a Rubik’s cube are described – U turns the upper face clockwise by 90 degrees, R turns the right face clockwise by 90 degrees, U’ (called U prime) turns the upper face counterclockwise 90 degrees, R2 would be a 180 degree turn of the right face, etc etc. To avoid using multiple characters to represent a single move, or confusing things with upper and lower case letters, he used V, S and G to represent U’, R’ and F’, respectively.

Of course, to make any use at all of this, I would need a cube I could make all these moves on. So I wrote one.

The PocketCube Class

Because the Hamiltonian cycle data only makes U, U’, R, R’, F and F’ moves, I didn’t have to implement D, D’, L, L’, B, B’ (the other faces), U2/D2/R2/F2/L2/B2 (180 degree moves), or x, y and z (cube rotations). My pocketcube class simulates a virtual 2×2, performs U/R/F moves and their primes, and lets you print out the cube’s current state either in a pretty grid or just as a plain array. Here’s the declaration:

char state[] = "WWWWGGRRBBOOGGRRBBOOYYYY";

class pocketcube
    {
    public:
        bool make_move(char);
        void print_cube();
        void print_state();
    private:
        void turn_U();
        void turn_Uprime();
        void turn_R();
        void turn_Rprime();
        void turn_F();
        void turn_Fprime();
    };

(This is just the class prototype or header, not the actual code that makes it work. You can download the full implementation of this class, helpfully commented, along with the results, below.)

All that remains is to write a simple program to read the Hamiltonian cycle data out of the file, move by move, byte by byte, and run it through my virtual pocket cube:

#include <iostream>
#include <fstream>
#include "pocketcube.cpp"

int main()
    {
    pocketcube twobytwo;
 
    ifstream hamcycle ("Hamilton222.txt", ios::in);
    hamcycle.seekg (0, hamcycle.end);
    int length = hamcycle.tellg();
    hamcycle.seekg (0, hamcycle.beg);
 
    char* input = new char[1];
 
    for (int i = 0; i < length; i++)
        {
        hamcycle.read(input, 1);
        if(twobytwo.make_move(input[0])) 
            twobytwo.print_state();
        }
 
    hamcycle.close();
    return 0;
    }

And the results speak for themselves.

Booyah! We have data.

Run the program with a command redirector to dump output to a file:

D:\cycler\> cycler >2x2states.txt

…and suddenly you’ve got an 89Mb file with all the data right there. It only takes about ten seconds to finish running through the entire set.

(89 megs, compared to whatever Stupid Idea #1 was going to produce. Goes to show that your chances of re-stickering a Rubik’s Cube at random and coming up with a variation that is actually solvable are vanishingly slim.)

The results

Because it’s just the same six letters repeated over and over in various combinations, with some line endings thrown in for flavour, it compresses very well – click here to download the whole thing in a 9.3Mb 7-Zip file (the best I could do with a plain compressed .zip was about 19Mb). The expanded file is too big for Notepad to open without crashing; you’ll need Notepad++ or another smarter text editor to actually read it.

You can also download my code (the Pocket Cube class and another short program to use it), complete with verbose comments, here.

An alternative method

This would still be possible without the 2×2 Hamiltonian path, although it would have to be done in a very different way. Of the eight corners on a pocket cube, each can be in only one of eight positions, and one of three orientations. (If you’re really clever about it, you can fix one corner in space and only ‘move’ the other seven – a 2×2 has no intrinsic orientation in space because it lacks a 3×3’s ‘centre’ cubies. This property of the pocket cube is also why you can work through its entire Hamiltonian cycle making only U, R and F face moves – the D face completely changes orientation relative to the U face, even if it doesn’t physically move.)

If you’re careful with the rules about which corners can be rotated at any given time, you could generate the individual states just by exhaustively permuting the corners rather than the individual stickers. This approach was a little bit beyond my idle programming skills on the weekend I had spare for this project, and by the time I realised exactly how it could be done, I’d already succeeded with building the states with the Hamiltonian path.

A programming oversight

The 2×2 Hamiltonian cycle data file is 3,776,220 bytes long. It contains 51,031 lines of 74 characters – 72 alphanumeric, plus two more bytes for carriage return and line feed characters. V1.0 of my program generated 3,776,220 lines of output, not the magic 3,674,160, with many unexpected duplicates; my make_move() function was cleverly ignoring the CR/LF, but my outer function was still outputting the state every time the program read in a character, so for every 72 real states, I was accidentally printing out two unchanged, duplicate lines as well. This is why make_move() in the current version returns a bool, and the for loop in cycler.cpp only prints out a state if that function returns true. The corrected code is in the zip file above.