Hacker Cup - Problem #2 - Peg Game Solution
The brute force solution solved the sample inputs in a timely manner, which should not be a surprise given that the largest grid in the test set is only 5x4. After I constructed a test set with a 100x100 matrix, things took a predictable turn for the worse. I allowed it run for 30 minutes without result, and didn’t stick around to see the final outcome. Remember, we only have 6 minutes after downloading the input to submit an answer.
Brute force will not suffice.
Being a complete mathematical noob, I began searching for algorithms I could use to solve the problem in a timely and efficient manner. Two ideas in particular helped me out.
1. Pascal’s Triangle
The first thing I discovered was that, in terms of the data structure for probabilities, I’m dealing with something very similar to Pascal’s Triangle, only instead of each node being the sum of its two parents, each node’s probability to a specific coordinate is the sum of its children’s probabilities (divided by two) to that same coordinate.

This was my first clue toward optimization.
I can model a Slot object that contains a reference to each of its two child slots, unless there is an edge or missing peg to drop through. Then, I can calculate the probability from any given slot, to any other slot by recursively traversing the parent child relationships.
Previously, I had just been modelling slots as a two dimensional array of probabilities and crawling the peg grid every single time for, literally, every single path. Now I’ll have a self-referential data structure that I can create once and reuse for every path calculation.
Half way there.
2. Dynamic Programming

The problem still remains that I am going to be traversing the same paths, over-and-again, wasting time recalculating jumps I’ve already made in previous iterations.
Here’s where I’ll have to thank Facebook for the clue “Too hard for brute force switching to dp”.
We’ll switch to dp, then, or Dynamic Programming.
Dynamic programming refers not to the use of a dynamic programming language - Ruby, Python, Groovy, etc. - but rather to the mathematical concept that deals with breaking large problems up into groups of subproblems, particularly where the large problems consist of often overlapping subproblems.
This technique is frequently employed in situations where recursion would normally be used, but is too expensive for brute force algorithms. Genetic sequencing, or a Peg Game where you will be traversing the same paths repeatedly, for ridiculous example.
With the recent improvement to my data structure, and my existing recursion code, I’m well on my way to implementing a dp technique.
Really, all I need now is a way to store the pathing probabilities once they’ve been calculated, a technique called Memoization.
To do this, I’ll create a two dimensional array in each Slot that represents the probability for arriving at any other given slot from the current one. This way, once a path has been walked and its probability calculated, it is stored in memory and can be accessed directly the next time the path is overlapped by another one.
Each unique path will be walked a maximum of one time. That’s quite an improvement.
Of course, this drastically increases the memory footprint of our application as we are storing a double[][] array representation of the entire grid within every Slot of the grid.
Remember our worst case? 100x100 slots, and each one storing a 100x100 matrix inside itself? That’s 100,000,000 x size of a double in Java. There’s probably a better way to do this, but for now, I’ll just crank the JVM up with -Xmx1024m and forget about it.
After implementing dp, my program now executes in just under 2 seconds against the worst case data.
I downloaded the official contest data from Facebook and it ran all 20 tests in 5.4 seconds. Not terrible at all.
I’m not going to post all of the code here for constructing the Slot objects according to the rules of the game, but I will offer up the meaty part of the Slot code, and the engine that powers my dp implementation.
Again, I’m sure there are better ways to do this, but this should be good-enough:
public class Slot {
public Slot rightSlot;
// FYI, for consistency, I'm always storing the children of slots directly above a missing peg in the leftSlot
public Slot leftSlot;
public double[][] probabilities;
public boolean[][] hasCalculated;
int row;
int column;
//...skipping down to interesting code...//
// Row and Column parameters represent the ultimate TARGET coordinates to find
public double getScore(int row,int column) {
if(hasCalculated[row][column])
return probabilities[row][column];
double probability = 0.0;
if(rightSlot!=null && leftSlot!=null) {
double rightProb = rightSlot.getScore(row, column);
double leftProb = leftSlot.getScore(row, column);
probability = (rightProb+leftProb)*0.5;
} else if(rightSlot!=null) {
probability = rightSlot.getScore(row, column);
} else if(leftSlot!=null) {
probability = leftSlot.getScore(row, column);
} else if(this.row==row && this.column == column) {
probability = 1.0;
}
hasCalculated[row][column] = true;
probabilities[row][column] = probability;
return probability;
}
}
—————
UPDATE 01/10/2011
I optimized further by making the size of each Slot’s internal double[][] array’s first dimension scalable to the scope of its purview. For example, on a 100x100 board, a Slot that exists on the top level still stores a 100x100 array of probability results, but a slot that lives on the 99th row only stores a 2x100 array of results. (Yes, I am scaling only rows right now. One change at a time!)
Of course, I now need to account for the local row offset in my getScore() code, but that’s not a big deal.
With this small change, I’m now able to successfully pass the contest data in 1.8s.
Obviously, there are further optimizations to be gained.
- Scale the columns the same as the rows.
- Do not store a rectangle. The data structure of possibilities for any given path is a actually a triangle.
- Compute from the bottom up. Start with the target coordinates and work your way up. You’ll need to change the way getScore() works and your Slot relationships, but it’d be worth it.
Both of these should be easy enough to implement, but the round is now over and I’m ready to think about new problems.
1 year ago • 0 notes
