
2013 Google Code Jam Round 1B

Better than Round 1, but I still missed the cutoff by several hundred places.  If I had managed to complete a second problem, I think I could have advanced.  However, my attempt at the third problem turned out to be way too slow.


Problem Osmos:

Armin is playing Osmos, a physics-based puzzle game developed by Hemisphere Games. In this game, he plays a "mote", moving around and absorbing smaller motes.

A "mote" in English is a small particle. In this game, it's a thing that absorbs (or is absorbed by) other things! The game in this problem has a similar idea to Osmos, but does not assume you have played the game.

When Armin's mote absorbs a smaller mote, his mote becomes bigger by the smaller mote's size. Now that it's bigger, it might be able to absorb even more motes. For example: suppose Armin's mote has size 10, and there are other motes of sizes 9, 13 and 19. At the start, Armin's mote can only absorb the mote of size 9. When it absorbs that, it will have size 19. Then it can only absorb the mote of size 13. When it absorbs that, it'll have size 32. Now Armin's mote can absorb the last mote.

Note that Armin's mote can absorb another mote if and only if the other mote is smaller. If the other mote is the same size as his, his mote can't absorb it.

You are responsible for the program that creates motes for Armin to absorb. The program has already created some motes, of various sizes, and has created Armin's mote. Unfortunately, given his mote's size and the list of other motes, it's possible that there's no way for Armin's mote to absorb them all.

You want to fix that. There are two kinds of operations you can perform, in any order, any number of times: you can add a mote of any positive integer size to the game, or you can remove any one of the existing motes. What is the minimum number of times you can perform those operations in order to make it possible for Armin's mote to absorb every other mote?

For example, suppose Armin's mote is of size 10 and the other motes are of sizes [9, 20, 25, 100]. This game isn't currently solvable, but by adding a mote of size 3 and removing the mote of size 100, you can make it solvable in only 2 operations. The answer here is 2.

My Solution:

This is actually a pretty easy problem, both for the small and large sets.  The strategy in Osmos should always be to absorb the smallest remaining mote, if you can.  Once you can't absorb the smallest, you can't absorb any of the others, so it is up to us either to add a new mote you can absorb, or to remove all the remaining motes.  Since we have a vested interest in seeing you absorb the motes with as little help as possible, we want to give you the biggest mote you can currently absorb (one of size one less than your own).  This lends itself well to a recursive solution, where you have your current size and a sorted list of the remaining motes.




There is one edge case to concern ourselves with... if Armin's mote has size 1, it can never absorb anything, since every other mote has size at least 1.  In that case, we have to return that we remove all remaining motes; otherwise our add-a-mote strategy would go into an infinite loop.

Another concern is the runtime.  Fortunately, our recursion only needs to run down a single branch for each decision, since if we have to remove motes, we know we have to remove all of them.  With that in mind, we run in O(N) time with respect to the length of the mote list.  The other concern is when the next remaining mote is much larger than our current mote.  But by always adding the largest mote we can absorb, we nearly double in size with each addition, which means we can cover any gap in O(log M) additions, where M is the maximum mote size.  So our algorithm is something like O(N log M).
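My contest code was C#, but the greedy is short enough to sketch here.  This Python version is a reimplementation of the strategy above (not my original code):

```python
def osmos_ops(armin, motes):
    """Fewest add/remove operations so Armin's mote can absorb every mote."""
    motes = sorted(motes)                  # always eat the smallest first
    best = len(motes)                      # baseline: just remove everything
    ops = 0                                # motes we've added so far
    for i, m in enumerate(motes):
        if armin > 1:
            while armin <= m:              # stuck: feed Armin the biggest
                armin += armin - 1         # absorbable mote (size armin - 1)
                ops += 1
        if armin <= m:                     # a size-1 mote can never grow
            break
        armin += m                         # absorb the next smallest mote
        # alternative: stop helping here and remove the rest instead
        best = min(best, ops + len(motes) - (i + 1))
    return best
```

On the statement's example, `osmos_ops(10, [9, 20, 25, 100])` comes out to 2, matching the expected answer.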





Problem Falling Diamonds:

Diamonds are falling from the sky. People are now buying up locations where the diamonds can land, just to own a diamond if one does land there. You have been offered one such place, and want to know whether it is a good deal.
Diamonds are shaped like, you guessed it, diamonds: they are squares with vertices (X-1, Y), (X, Y+1), (X+1, Y) and (X, Y-1) for some X, Y which we call the center of the diamond. All the diamonds are always in the X-Y plane. X is the horizontal direction, Y is the vertical direction. The ground is at Y=0, and positive Y coordinates are above the ground.
The diamonds fall one at a time along the Y axis. This means that they start at (0, Y) with Y very large, and fall vertically down, until they hit either the ground or another diamond.
When a diamond hits the ground, it falls until it is buried into the ground up to its center, and then stops moving. This effectively means that all diamonds stop falling or sliding if their center reaches Y=0.
When a diamond hits another diamond, vertex to vertex, it can start sliding down, without turning, in one of the two possible directions: down and left, or down and right. If there is no diamond immediately blocking either of the sides, it slides left or right with equal probability. If there is a diamond blocking one of the sides, the falling diamond will slide to the other side until it is blocked by another diamond, or becomes buried in the ground. If there are diamonds blocking the paths to the left and to the right, the diamond just stops.

Consider the example in the picture. The first diamond hits the ground and stops when halfway buried, with its center at (0, 0). The second diamond may slide either to the left or to the right with equal probability. Here, it happened to go left. It stops buried in the ground next to the first diamond, at (-2, 0). The third diamond will also hit the first one. Then it will either randomly slide to the right and stop in the ground, or slide to the left, and stop between and above the two already-placed diamonds. It again happened to go left, so it stopped at (-1, 1). The fourth diamond has no choice: it will slide right, and stop in the ground at (2, 0).


My Solution:

So I didn't reach the problem until after the contest had ended, but I wish I had.  There are a couple of insights that make this problem much easier to solve than the third problem.  First, you may notice that what we are making is a pyramid, one layer at a time.  Each layer of the pyramid (centered at 0,0) is completed before the next layer can begin, because we need the peak of the pyramid in place before a diamond can slide out to the next tier.

So how many diamonds are in a tier?  Well, we can inspect the sequence to find a pattern: the layers hold 1, 5, 9, 13, ... diamonds, with each layer holding four more than the one below it, so layer L holds 4L - 3.

So if we sum all those layers together, we get D = 1 + 5 + ... + (4L - 3) = L(2L - 1).

Where D is the number of diamonds, and L is the layer of interest.  So given that the previous layer must be complete before we start our own layer, we can subtract away everything from the layer below our own, and are left with N-D diamonds.

Now I'd reframe the question as follows... how many of the remaining diamonds would have to fall on the opposite side from ours for the layer not to complete up to our location?  Well, our location at height Y means that there would have to be at least Y diamonds falling on our side for us to get a diamond.  That means that if only Y-1 diamonds fell on our side, N-D-(Y-1) diamonds must have fallen on the other side.  If this number is greater than the other side can hold, then there is a 100% chance that our location will get a diamond.  And if N-D < Y, there aren't enough diamonds left to reach us at all, so there is a 0% chance.

A quick google search tells me...

 "
The general formula for the probability of w wins out of n trials, where the probability of each win is p, is combin(n,w) × pw × (1-p)(n-w) = [n!/(w! × (n-w)!] × pw × (1-p)(n-w)"  


Since the probability p = 0.5 (it's equally likely to fall left or right), and we want to consider all cases from Y to N-D, we can reformulate as follows: P = sum over w from Y to N-D of combin(N-D, w) / 2^(N-D).

So we've arrived at a formula for calculating the probability.  I haven't checked my numbers, since I didn't actually reach this part in competition, but I'm fairly certain it will work.  One optimization when implementing this summation is to recognize that as w gets larger, each added term gets smaller.  We can probably stop the search early once the probability of the last iteration times the number of remaining iterations is less than the threshold Google checks against... In cases where N-D is quite large relative to Y, this can be very useful.
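Putting the pieces together, a sketch of the calculation might look like this.  It's Python rather than my usual C#, and `side_capacity` (the number of diamonds one side of the layer can hold) is a parameter I'm introducing for the sketch; like I said above, I haven't verified these numbers against the judge:

```python
from math import comb

def prob_spot_filled(remaining, y, side_capacity):
    """Chance the spot at height y in the current layer gets a diamond.

    remaining     -- N - D diamonds left once the lower layers are full
    y             -- height of our spot within its layer (the post's Y)
    side_capacity -- how many diamonds one side of the layer can hold
    """
    if remaining < y:
        return 0.0              # too few diamonds left to ever reach us
    if remaining - (y - 1) > side_capacity:
        return 1.0              # the other side overflows: we're guaranteed
    # Each diamond slides left or right with p = 1/2, so sum the binomial
    # cases where at least y of them land on our side.
    return sum(comb(remaining, w) for w in range(y, remaining + 1)) / 2 ** remaining
```

As a sanity check, with one diamond left and our spot at height 1, the answer is a coin flip: `prob_spot_filled(1, 1, 5)` gives 0.5.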


Problem Garbled Email:

Gagan just got an email from her friend Jorge. The email contains important information, but unfortunately it was corrupted when it was sent: all of the spaces are missing, and after the removal of the spaces, some of the letters have been changed to other letters! All Gagan has now is a string S of lower-case characters.

You know that the email was originally made out of words from the dictionary described below. You also know the letters were changed after the spaces were removed, and that the difference between the indices of any two letter changes is not less than 5. So for example, the string "code jam" could have become "codejam", "dodejbm", "zodejan" or "cidejab", but not "kodezam" (because the distance between the indices of the "k" change and the "z" change is only 4).

What is the minimum number of letters that could have been changed?

My Solution:

This was the problem I actually attempted... and while I produced valid results for the small example test cases, it was far too slow to finish even the small dataset.  Funny enough, I work at an email company and have a coworker named Jorge, so I couldn't resist trying this problem instead of the second.  I'll present my ideas here, but I'd be interested in seeing the correct approach from someone else.

It reminded me of a problem I'd done on my blog a few months ago... a Morse Decoder... so I felt like I could reuse some of those ideas.  The wildcard matching needed seemed to make this problem much harder... so I went about setting up the dictionary in a tree, where each node of the tree contains a dictionary mapping the next character to the list of possible nodes, as well as a list of all sub nodes.  When inserting the dictionary into this tree, I broke it up into N trees, where each tree held only words of a single length n.  The tree class then had a function for returning the distance between itself and a given substring from the test case... that code looked like this:



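The snippet didn't survive the trip into this post, so here is a reconstruction from the description.  It's a Python sketch of the idea, not my actual C# code: `Node`, `children`, and `is_word` are stand-ins for my tree class, `score_so_far`/`best_so_far` are the variables discussed below, and it ignores the >=5-gap rule for brevity:

```python
class Node:
    """Trie node: children maps a letter to the next node."""
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_word = True

def distance(node, text, i, score_so_far, best_so_far):
    """Fewest letter changes to make text[i:] match a word under this node."""
    if score_so_far >= best_so_far:          # prune: can't beat the best match
        return best_so_far
    if i == len(text):
        return score_so_far if node.is_word else best_so_far
    best = best_so_far
    for ch, child in node.children.items():
        cost = 0 if ch == text[i] else 1     # a mismatch is one changed letter
        best = min(best, distance(child, text, i + 1, score_so_far + cost, best))
    return best
```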
Here, scoreSoFar is initialized to 0, and bestSoFar would be the length of the word (where we would have to change all letters to match something in the dictionary).  Then this code was used in a recursive search, where we'd map out all possible first word fits in the test case, and recursively search the remaining part, keeping the minimum score of that.  I tried caching as I went for reuse, but that didn't really help. This was just way too slow.  Interested in feedback on successful solutions.


So those are my thoughts on Round 1B of the code jam.  Last chance for me is 1C... if I can stay awake that late.

Golden Section Search

Golden Ratio seen in nature
  Last time, I demonstrated how to implement a gradient descent optimization algorithm, which is widely used in artificial intelligence to optimize neural networks.  It is versatile, able to accept any multidimensional input function, and simple to understand and implement.  However, you wouldn't be likely to find this in a situation where performance matters... it takes many iterations to find a local minimum.  If it is expensive to execute your function being minimized, you rapidly discover the importance of performance in optimization, and hence the large number of approaches.  Another drawback is that Gradient Descent doesn't provide a simple mechanism for constraining the search.  If you know the region where a minimum must exist, you need to look for another approach.

  Enter the Golden Section Search, a one dimensional search tool with a number of benefits over gradient descent.  First, it is a class of minimization techniques which work by reducing a bracket containing a minimum.  This means we can preset the bracket based on prior information, which can be a major upside.  Second, it is typically more efficient than a gradient descent search.  The downside is that it is limited to a single input dimension, although it can be combined with a conjugate gradient search method, minimizing a single value in the current search direction.

Golden Section Search Diagram

  So how does this approach work?  Let's start with the above graph, where we have some unknown function f(x), and three points (x1, f1), (x2, f2), (x3, f3) which we already know something about.  x4 is our next candidate x value, and f4a and f4b are possible y values for this candidate.  There will be a local minimum within a bracket whenever a point inside the bracket falls below the line connecting the bracketing points.  In this case, f2 clearly falls below the line connecting f1 and f3.  Now we pick our bracketing points in a way that there will always be a slightly bigger side, and our next x value will be in the side with the bigger boundary.  In this case, b is bigger than a (along the x-axis in the diagram), so our next point will be to the right of x2.  Now, with your mind's eye, you can think of a parabola passing through f1, f2, and f4a, and you can see that the minimum value is in between f4a and f1... whereas there would be a maximum if you made a parabola between f2, f4a, and f3.  Because of this, you would use x1, x2, and x4 as your next three bracket points.  If instead we were to do the same thing with f4b, the bracket would instead work out to x2, x4, and x3.  This process is recursively repeated until your bracket is small enough to achieve a desired precision.

Now some code.

    public delegate double Function1D(double x);

    public class GoldenSectionSearch
    {
        private readonly double _goldenRatio = 2.0 - (1.0 + Math.Sqrt(5.0))/2.0;
        public double Threshold { get; set; }

        public GoldenSectionSearch()
        {
            Threshold = 0.001;
        }

        public double Minimize(Function1D function, double lowerBound, double upperBound)
        {
            var f = CachedFunction(function);
            var midpoint = StartingSearchPoint(lowerBound, upperBound);
            return Minimize(f, lowerBound, midpoint, upperBound);
        }

        private double Minimize(Function1D f, double lower, double midpoint, double upper)
        {
            var x = NextSearchPoint(lower, midpoint, upper);
            if (IsSearchDone(lower, midpoint, upper, x))
                return FinalPosition(lower, upper);

            if (IsMinimumInNewSearchBracket(f, midpoint, x))
            {
                if (IsUpperBracketLarger(lower, midpoint, upper))
                    return Minimize(f, midpoint, x, upper);
                return Minimize(f, lower, x, midpoint);
            }
            else
            {
                if (IsUpperBracketLarger(lower, midpoint, upper))
                    return Minimize(f, lower, midpoint, x);
                return Minimize(f, x, midpoint, upper);
            }
        }

        private Function1D CachedFunction(Function1D function)
        {
            var cache = new Dictionary<double, double>();
            Function1D f = (x) =>
            {
                if (cache.ContainsKey(x))
                {
                    return cache[x];
                }
                var result = function(x);
                cache[x] = result;
                return result;
            };
            return f;
        }

        private double StartingSearchPoint(double lower, double upper)
        {
            return (upper - lower)*_goldenRatio + lower;
        }

        private double NextSearchPoint(double lower, double midpoint, double upper)
        {
            if(IsUpperBracketLarger(lower, midpoint, upper))
                return midpoint + _goldenRatio * (upper - midpoint);
            return midpoint - _goldenRatio * (midpoint - lower);
        }

        private bool IsSearchDone(double lower, double midpoint, double upper, double x)
        {
            return Math.Abs(upper - lower) < Threshold*(Math.Abs(midpoint) + Math.Abs(x));
        }

        private double FinalPosition(double lower, double upper)
        {
            return (upper + lower)/2.0;
        }

        private bool IsUpperBracketLarger(double lower, double midpoint, double upper)
        {
            return (upper - midpoint > midpoint - lower);
        }

        private bool IsMinimumInNewSearchBracket(Function1D f, double midpoint, double x)
        {
            return f(x) < f(midpoint);
        }
    }

Some explanation here...

First we use a function delegate to force the user to provide a 1d function to minimize.  It is less flexible than the gradient descent delegates, but a necessary feature of this algorithm.

Inside the public Minimize function, we wrap the input with a caching decorator, so that any value we have already evaluated is reused.

The search termination criteria in IsSearchDone is somewhat confusing, but it is the formula recommended by the Numerical Recipes authors.  The gist is that we declare a minimum found when the width of the bracket is smaller than a precision we care about, scaled to the magnitude of the x values involved.

We use the "golden ratio" to pick the size of our bracket steps.  This keeps the proportions of the bracket the same from one iteration to the next as it shrinks, and minimizes the number of steps required to reach the minimum.  It is a surprising result that no other fixed placement of the probe point shrinks the bracket faster for a search that only compares function values.
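Here's a quick numeric sanity check of that claim.  This is a Python sketch of the same bracket step (with the narrowing applied on the matching side), not part of the C# class: with r = 2 - phi ≈ 0.382, every iteration shrinks the bracket by the same factor of about 0.618, no matter which half is kept:

```python
import math

r = 2.0 - (1.0 + math.sqrt(5.0)) / 2.0      # ≈ 0.381966, the C# _goldenRatio

def golden_step(f, lower, mid, upper):
    """One bracket-reduction step, mirroring the C# Minimize helper."""
    upper_larger = (upper - mid) > (mid - lower)
    x = mid + r * (upper - mid) if upper_larger else mid - r * (mid - lower)
    if f(x) < f(mid):
        # minimum is in the half containing x
        return (mid, x, upper) if upper_larger else (lower, x, mid)
    # minimum is in the half containing the old midpoint
    return (lower, mid, x) if upper_larger else (x, mid, upper)

f = lambda x: (x - 0.3) ** 2                # any unimodal function works
lower, upper = 0.0, 1.0
mid = lower + r * (upper - lower)           # same starting point as the C# code
for _ in range(8):
    width = upper - lower
    lower, mid, upper = golden_step(f, lower, mid, upper)
    print(round((upper - lower) / width, 6))   # 0.618034 every time
```

After eight steps the bracket has shrunk to 0.618^8 of its original width, and the true minimum at x = 0.3 is still inside it.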

The private Minimize helper function is the workhorse, where the recursive search happens.  Each iteration, we check to see if we are done, and if not, reduce the bracket, and pick a lower, midpoint, and upper bounds for the next iteration based on the intuition described above the code.

So now a test to show it in action....

        [Test]
        public void GoldenSectionSearchWorks()
        {
            var gss = new GoldenSectionSearch { Threshold = 0.0001 };
            const double lowerBounds = -10.0;
            const double upperBounds = 10.0;
            Function1D f = (x) => Math.Pow(x - 4.0, 2.0);
            var solution = gss.Minimize(f, lowerBounds, upperBounds);
            Assert.AreEqual(4.0, solution, 0.001);
        }

And we are done!  A C# Golden Section Search algorithm.

Gradient Descent



What's the right plan to achieve our goals?
  I've come across this story many times in my career... tell me if it sounds familiar to you:  I'm working on a program that has to make a decision about what to do next.  I can evaluate the outcome for whatever decision I make, but I can't predetermine which actions will lead to the best solution.  This is a situation that calls for numerical optimization techniques.  The use of numerical optimization is pervasive throughout computer science.  Artificial intelligence uses it to optimize neural networks.  Engineering tools use it to improve the design of components.  Computer games use it to have the bad guys make decisions that look intelligent.  So how does it work?

Black Hole Gravity Field
  We can start by making a conceptual leap, illustrated by an example.  Suppose you are in a space ship, which you are trying to pilot to the center of a black hole.  You can't see where the black hole is, but you can feel its gravity.  You know that its gravity will get stronger as you get close to the center.  In this example, we can think of the effect of gravity on space as our optimization function.  If we graph that against our position, we get a plot with a nice hole where our black hole sits.  If we had a ball, all we would have to do is set it down on that graph, and let it slide downhill until it reached the center.

Gradient Descent


  This simple strategy is actually the formulation of the gradient descent algorithm.  If, at each point, we know the slope of the function being optimized, we can just take small steps in the downhill direction.  Eventually, we will reach a point at the bottom of the hill, and that is where we stop.

Gradient Descent Formula
  With this understanding of what we are doing, the formulation, x_{n+1} = x_n - gamma * ∇F(x_n), is actually very easy to understand.  Here, F(x) is our function being evaluated, x is our position in the optimization space, gamma is the size of the step we are willing to take, and n is the step number.  Our position at step n+1 can be evaluated as our position at step n plus a small movement in the downhill direction.  When our update fails to sufficiently move our position, we assume that a local minimum has been reached, and the search is terminated.

Code Design

  From a usability perspective, we want a design flexible for reuse with any arbitrary function to be minimized.  We don't want the user to have to provide a second function describing the gradient of their optimization function.  And we want to allow the user to provide a good guess as a starting point for the optimization.  

        public double Precision { get; set; }
        public double Eps { get; set; }
        public int MaxIterations { get; set; }
        public GradientDescent()
        {
            Precision = 0.001;
            Eps = 0.1;
            MaxIterations = 1000;
        }

  We want to expose a couple parameters that affect the time it takes to reach a solution.  First, we expose the precision we are looking to achieve, typically a very small floating point number.  Next, we expose Eps, which is the step size we take each iteration.  This can be tuned to the scale of your inputs, but typically we still want fairly small steps.  Finally, we expose the maximum number of iterations.  This is a practical measure, as small errors in the gradient can lead to infinite loops near the minimum, while the optimizer bounces back and forth between two positions.  If we get stuck, we rely on the iterations to time out, and a satisfactory answer is returned.

public delegate double Function(List<double> x);
public delegate List<double> Derivative(List<double> x); 

  To represent our optimization functions, the user must implement a function conforming to the first "Function" delegate pattern.  It is designed to map an n-dimensional input vector to a single output metric.  This makes it flexible enough to handle a wide variety of input functions.  We are going to use a second delegate to represent the local derivative, so that we can generate a functor wrapping the input function, generically providing the derivative for easy use.

        public double Distance(List<double> a, List<double> b)
        {
            return Math.Sqrt(a.Zip(b, (i, j) => Math.Pow(i - j, 2.0)).Sum());
        }

  We can recognize an ending condition when the next step is approximately the same location as the current step.  Note that at a local minimum, the gradient of F(x) goes to zero, and thus so does our step length.  We can use the Euclidean distance to measure how much our step has changed, and stop when this metric is sufficiently small.

        public Derivative Gradient(Function f)
        {
            return x => Enumerable.Range(0, x.Count)
                            .Select(i => (f(x.Select((y, j) => j == i ? y + Precision : y).ToList()) -
                                          f(x.Select((y, j) => j == i ? y - Precision : y).ToList()))
                                         / (2*Precision))
                            .ToList();
        }

If you reach back to your calculus class, the definition of the derivative gives us these simple formulations.  I chose to use central differencing because it is more accurate, yet still only requires two function evaluations per dimension.
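The accuracy difference is easy to see numerically.  Here is a small Python comparison of the two estimates against an exact derivative (sin/cos are just convenient stand-ins for an arbitrary smooth function): the central estimate's error shrinks like O(h^2) versus O(h) for the forward difference.

```python
import math

def forward_diff(f, x, h):
    """One-sided estimate: error shrinks like O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Two-sided estimate: error shrinks like O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, x, h = math.sin, 1.0, 0.001
exact = math.cos(x)                         # the true derivative of sin
print(abs(forward_diff(f, x, h) - exact))   # roughly 4e-4
print(abs(central_diff(f, x, h) - exact))   # roughly 9e-8
```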

        public List<double> Optimize(Function f, List<double> seed)
        {
            var fPrime = Gradient(f);
            var cnt = 0;
            var x_old = seed.Select(x => x + Eps).ToList();
            var x_new = seed;
            while (Distance(x_new, x_old) > Precision && cnt < MaxIterations)
            {
                x_old = x_new;
                x_new = fPrime(x_old).Zip(x_old, (i, j) => j - Eps*i).ToList();
                cnt++;
            }
            return x_new;
        }

Finally, we have our actual optimization function.  The "seed" variable represents the starting point of the search, and a good guess can dramatically improve the performance of the algorithm.  Also, since we generically defined the Function delegate to take an arbitrarily sized vector as input, the seed also tells us the dimensionality of our function's input.

        [Test]
        public void OptimizerWorks()
        {
            var gd = new GradientDescent();
            var seed = new List<double> {0.0, 0.0};
            var solution = gd.Optimize((data) => { return Math.Pow(data[0] - 2, 2.0) + Math.Pow(data[1] - 4, 2.0); },
                                       seed);

            Assert.AreEqual(2.0, solution[0], 0.001, "best x found");
            Assert.AreEqual(4.0, solution[1], 0.001, "best y found");
        }

Now, we have a test showing how to use the module, and voila, a simple C# Gradient Descent module!




A Morse Decoder



I couldn't resist tackling this puzzle at home, given that it is named for my distant relative, Samuel F.B. Morse.  My colleague brought it in as a potential onsite interview question for a candidate we were considering.  I can't help but be addicted to puzzles anyway; I love the challenge.



The premise is this:  You have found a stream of dots and dashes which someone has told you is a Morse Code representation of some phrase.  The problem is that Morse Code uses a timing gap to represent spaces between letters, and the spaces were ignored here.  If you were given a list of possible words, and a hash table mapping of Morse style dots and dashes to the alpha character they represent, could you recover the original phrase?



This problem really has two steps... recover possible encodings of Morse code to letters, and then break the letters up into possible words that could match them.  Both of these steps are actually the same procedure with a different mapping, so we may be able to achieve our goals with a single function reused multiple times.  Clearly the input stream could be thought of as a tree of possible mappings, and this lends itself well to a recursive approach.  Each character that has a Morse encoding that could match to the front of the stream is a potential branch.  If we assume the phrase is only complete words, we throw out any leaf where we couldn't completely find valid mappings all the way to the end of the string.  Likewise, we throw out any letter mapping that doesn't finish with a valid word.

My approach is to create a class that has properties for the Morse Code characters, and the list of words making up the dictionary, so that one instance could be reused to recover phrases without having to constantly pass the same mappings.  I'm going to have a public "DecodePhrases" method that takes a string of Morse Code characters, and returns an enumeration of possible valid solutions.  I have one private helper function which takes in some additional parameters about the node of the tree that we are currently at, and reuse this function for both encoding steps.  My code came out as follows:

    public class Decoder
    {
        public Dictionary<String, String> Words { get; set; }
        public Dictionary<String, String> MorseLetters { get; set; }

        public Decoder(List<String> words, Dictionary<String, String> morseLetters)
        {
            MorseLetters = morseLetters;
            Words = new Dictionary<String, String>();
            words.ForEach(w => Words[w] = " " + w);
        }

        public IEnumerable<String> DecodePhrases(String input)
        {
            return GenerateCandidates(input, "", MorseLetters)
                .SelectMany(letters => GenerateCandidates(letters, "", Words))
                .Select(phrase => new String(phrase.Skip(1).ToArray()));
        }

        private IEnumerable<String> GenerateCandidates(String input, String candidate, Dictionary<String, String> validCandidates)
        {
            if (input == "")
                return new List<String>(){candidate};
            return Enumerable.Range(1, input.Count())
                .Select(i => new
                                 {
                                     inp = new String(input.Skip(i).ToArray()),
                                     can = new String(input.Take(i).ToArray())
                                 })
                .Where(c => validCandidates.ContainsKey(c.can))
                .SelectMany(c => GenerateCandidates(c.inp, candidate + validCandidates[c.can], validCandidates));
        }
    }

I followed it up with some NUnit tests to just make sure I wasn't full of crap...

        [Test]
        public void TestSolutionThisIsATest()
        {
            var words =
                "this is a test of the emergency broadcast system.  it is only a test."
                .Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).ToList();


            var testPhrase = String.Join("", "thisisatest".Select(c => MorseLetters.Where(p => p.Value == c.ToString()).First().Key));

            var decoder = new Decoder(words, MorseLetters);

            Assert.AreEqual("this is a test", decoder.DecodePhrases(testPhrase).First(), "properly decodes the message");
        }

And there you go! A simple function for decoding Morse Code when spaces and stops are omitted.