ICS 180A, Spring 1997:
Strategy and board game programming

Lecture notes for April 15, 1997
Tuning the evaluation function

Last time I talked about the different types of functions that evaluate features in a position, and how to combine them into an evaluation function by adding the values of many such functions. But, where do the numbers come from?

E.g., in Othello, you might have say four functions:

f(pos) = material (# of my pieces - # of opponent pieces)
g(pos) = corners (# I control - # opponent controls)
h(pos) = mobility (# moves I have available)

You want to form an evaluation function by combining these (probably with some other terms): eval = a f + b g + c h. For instance, you might try eval = -1*f + 10*g + 1*h. But where do these numbers come from? What combination of numbers gives the best performance?

There are various methods for finding numbers by hand:

Normalize. Since you only care about the ordering of evaluations, and (usually) not the actual evaluation values, you can multiply everything by the same constant without changing the results. What this means is that you can choose some particular value (say the material value of a pawn) and force it to be one, so that all the other values are expressed in terms of how many pawns that value is worth. The net effect is that you have one fewer parameter that needs setting.
Deduce constraints. Sometimes it is possible to choose some of the parameters by considering what you want the machine to do in certain types of positions. For instance, in chess, it is usually bad to trade a rook for a bishop or knight, even if you also end up winning a single pawn, but good if you win two pawns, so the material values should satisfy R>B+P (to prevent the single-pawn trade) and R<B+2P (to encourage the double-pawn trade). The more of these inequalities you have, the smaller the set of weights that satisfy them. This can sometimes help get to a reasonable starting approximation for the evaluation weights, but you probably still need to do some adjustment afterwards.
Hand tweaking. Most commonly used. Simply play your program enough times to get an idea of its strengths and weaknesses, guess which parameters would improve those the best, and pick new values for the parameters. Produces a reasonable answer quickly. Requires that you understand the game well enough to play reasonable games against the computer and analyze what it does wrong (i.e. best when the computer is stupid and you are intelligent).

... and without human intervention (much of this should be review from 171, for those students who've taken 171 already; you probably won't have time to do much more than hand-tweaking):

Hill-climbing is like hand-tweaking: make a small change to the weights, test out performance of that change, keep the change only if performance improves, repeat many times. Tends to be slow, get stuck in "local optima" in which eval weights are bad but any change makes them even worse.
Simulated annealing. Like hill-climbing, makes small changes to eval and keeps changes that improve performance. But if change doesn't improve performance, sometimes (randomly, with a certain probability) accepts the change anyway, in an attempt to escape local optima. Need to specify these probabilities; they should start high and gradually become lower. Even slower than hill-climbing but eventually can get good values.
Genetic algorithms. Hill-climbing and simulated annealing maintain one good set of weights, which they change gradually. Instead, genetic algorithms maintain a collection of several different good sets of weights, add new sets to the collection by combining pairs of existing ones (take some weights from one and some from another, with a little mutation as well), and keep the size of the collection down by killing off sets of weights with bad performance.
Neural networks. Really, this is more a type of evaluation function than a method for choosing weights: a neuron is a function of the form threshhold(weighted sum of inputs), and one can form networks in which the neurons in the first layer take as inputs some basic features of the position (e.g. the individual bits of a bitboard representation) and successive layers take as inputs the neurons from the previous layer. So a one-layer network with only one-input neurons is the same as the first-order evaluation functions we talked about last time, but it's straightforward to build much more complicated neural networks, and it's not hard to use such a thing as the evaluation function (just recompute the outputs of neurons with changed inputs). The question is as before, how to set the weights? Along with the other methods above, there are some that have been developed specifically for neural networks, such as "temporal difference learning". The basic idea is to decide when the network makes a bad decision, and determine for each weight separately whether changing it up or down would lead to a better decision, so it's a lot like hill-climbing. One advantage of neural nets is they need even less human intelligence than the other automatic learning methods: you don't even really need to understand the game well enough to program a decent evaluation function. However, in the time available to us (a few weeks), you'll get good results faster by being more intelligent yourselves and leaving less of the work to the machines.

All of these methods require some method of automatically evaluating the performance of a program.

We can run the program on a large suite of test positions (say, taken from high-quality human games) and see if it gets the right answers.
We can play the program against some known opponent (say another program) and see how often it wins. Or, we can play the program against itself, or against other versions of itself; e.g. in hill-climbing the modified program can play against the unmodified one. Both of these have the disadvantage that, unless there is some randomness in the system, both programs will play exactly the same each time, so you only get to see the results of one game which may not be representative of overall play. One possible way around this would be to start playing several different games at positions taken from some test suite.
We can compare the results of the evaluation function with the results of combining evaluation and search. If the eval is good, they should be similar, but is vice versa true?

What has actually been done in automatically learning evaluation weights? A good source for this is Jay Scott's "machine learning in games" web page. He lists two experiments that I think are particularly interesting:

John Stanback (well-known as a commercial chess programmer) tried using a genetic algorithm to set the weights in the evaluation function of his program Zarkov. He only ran 2000-3000 games, which I think is way too few, and got material values that were ok but still worse than hand-tuned values. I think the lesson is that genetic algorithms do work, but need either a lot of generations or a good initial set of weights.
Risto Miikkulainen, genetic algorithms researcher at U. Texas, gave a talk last year on some experiments he'd done with Othello. He used a genetic algorithm to tune the weights in a neural net evaluation function. Performance evaluation of a net was done by play against a fixed opponent. If the fixed opponent played randomly, the neural net learned an evaluation in the form of piece-square tables (pieces in corners are good, pieces next to corners are bad, etc) after which it won all the time and stopped learning. Against an opponent combining piece-square tables with a short search, it eventually (after weeks of computer time) learned a better mobility-based strategy. But if its opponent was already a sophisticated mobility-based program, it lost all the time and never started learning. I think the lesson is to play against programs of similar strength, for instance to compare different evaluations in the genetic algorithm's current collection by playing them against each other, or alternatively to play against several fixed opponents of different strengths.

David Eppstein, Dept. Information & Computer Science, UC Irvine, .

ICS 180A, Spring 1997: Strategy and board game programming

Lecture notes for April 15, 1997 Tuning the evaluation function

ICS 180A, Spring 1997:
Strategy and board game programming

Lecture notes for April 15, 1997
Tuning the evaluation function