For all the literature out there about both neural nets and genetic algorithms, coming up with a specific model for a system can still involve a lot of guesswork and trial and error. There are three big topics up for discussion here: how to structure and interact with the neural nets, how to evaluate their fitness, and how to pass their genomes on to subsequent generations.
I somewhat arbitrarily decided to use a binary activation function in my neural nets. This means each neuron either fires a weighted signal to the next layer or fires nothing at all, depending on whether its input is above a variable threshold. My understanding is that the simplicity of a binary function works fine when there’s no error correction (like backpropagation) happening.
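A binary neuron like that might look something like this sketch (the names `inputs`, `weights`, and `threshold` are my own, not necessarily the original code’s):

```python
def fire(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs clears the threshold, else 0."""
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation > threshold else 0

# e.g. fire([1, 0, 1], [0.5, 0.2, 0.4], 0.8) fires: 0.5 + 0.4 = 0.9 > 0.8
```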
Since we want to plug board states into the net, a binary activation function means we can put a lower bound on the size of the input layer at 15 neurons. Each of the nine squares has three possible states: empty or occupied by one of the two players; if I’m not mistaken, that takes log2(3) × 9 = 14 and change bits to encode. With any fewer inputs, we wouldn’t be able to uniquely represent a board state, which is obviously crucial for this exercise. I went with 18 neurons, giving each square a clean two neurons of its own.
There are a couple of possibilities for output layer size. I thought about using nine neurons, where each neuron calculates the favorability of making a move in its square, and we pick the square whose neuron fired the highest value. Since that opens up the possibility of picking an occupied square, it didn’t seem ideal. Alternatively, with only one output neuron, its value could be rounded to give us the square in which to move, but we run into the same problem.
I couldn’t think of a sensible, unambiguous way to get nets to output a move if we’re just plugging in the current board state. But if we treat the net as a simple favorability analyzer for board states, the ambiguity disappears: instead of plugging in the current board, we generate the board state that results from each valid move, run each one through the net, and make the move whose resulting board got the highest output value from the net’s one output node.
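The move-selection loop can be sketched like this, assuming `net` is a callable that takes a board and returns a single float, and a board is a list of nine squares with `' '` meaning empty (all my own placeholder names):

```python
def choose_move(net, board, player):
    """Try each valid move, score the resulting board, pick the best one."""
    best_move, best_score = None, float('-inf')
    for square, mark in enumerate(board):
        if mark != ' ':
            continue  # occupied squares are never candidates, by construction
        candidate = list(board)
        candidate[square] = player
        score = net(candidate)  # one net evaluation per valid move
        if score > best_score:
            best_move, best_score = square, score
    return best_move
```

Because only empty squares are ever tried, the invalid-move problem from the nine-output design can’t come up.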
Technically this means we’re training neural nets to evaluate game states, not actually make moves. In practice, it doesn’t matter: we’re giving the net some education about what makes a good move, and letting it pick the one it thinks is best. It also means we’re running the net multiple times to get one move, which slows down the simulation considerably. I try to make up for that in other areas.
As for the internal layers: I just picked some numbers!
Scoring a game where a tie is usually the best outcome is tricky. I tried having the nets play against the perfect AI a few times, tallying the losses as negative points. Trouble is, it takes a lot of intelligence to even tie one game against a perfect AI, so the nets were effectively stuck at zero. I tried having the nets play against an opponent that just makes random moves. The trouble there is that it takes a number of games to get a statistically significant picture of how fit a net is, and playing that many games takes too long.
I eventually decided on the current method of seeing whether each net matches the perfect AI’s choice at each possible board state. The trick here is we only run each net through a subset of the possible states until the net is ready for the full battery. With more than 4,000 states to check, running each net through every one is too slow again, so we only run the nets through the first few rounds of possible states unless the net plays perfectly through them. This way we get enough of an idea how fit the net is without playing through every possibility, and we always have enough differentiation to keep evolution progressing.
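A rough sketch of that staged scoring, under my own assumptions: `net_move` and `perfect_move` both map a state to a chosen move, and `states_by_round` groups the possible states by game round. A net only advances to the next round’s states if it matched the perfect AI on everything so far:

```python
def fitness(net_move, perfect_move, states_by_round):
    """Count agreements with the perfect AI, stopping at the first imperfect round."""
    score = 0
    for round_states in states_by_round:
        matched_all = True
        for state in round_states:
            if net_move(state) == perfect_move(state):
                score += 1
            else:
                matched_all = False
        if not matched_all:
            break  # not ready for the full battery; skip the deeper rounds
    return score
```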
The evolutionary search space we need to traverse is huge with a genome as big as our nets’, which are composed of nearly 800 delicately interconnected random numbers. When I tried taking the best nets and simply mutating each of them slightly a few times to compose the next generation, it was clear after only a handful of generations that we’d exhausted any differentiation between individuals, and no evolutionary progress was being made.
With a more traditional two-parent mating process, where we pick two nets at random, favoring the top scoring nets, and randomly choose which of the parents passes down each gene, the differentiation lasts much longer. But again, after enough generations, we’ve hit a local maximum and aren’t looking far enough away to break out of it. And here’s where I got stuck. I’m not satisfied with the genetic algorithm as it stands, but I haven’t yet stumbled upon how to improve it. I’m certainly open to ideas.
I think that wraps up all the major design choices along the way. I’m pretty satisfied with the results, although as I just said, I think an improved genetic component could go a long way. I haven’t let it run long enough to get past age two or so. I’d be curious to know if anyone else manages to get more intelligence out of it.
Anyway, hopefully someone reading this can point out all the mistakes I’m probably making. If not, maybe it’ll help someone else make the same ones.