Strategy and board game programming

Variants of Alpha-Beta Search

First, let's quickly review the original alpha-beta search, omitting details like hashing or adjust winning scores for the current ply.

// basic alpha-beta search int alphabeta(int depth, int alpha, int beta) { move bestmove; if (game over or depth <= 0) return winning score or eval(); for (each possible move m) { make move m; score = -alphabeta(depth - 1, -beta, -alpha) if (score >= alpha) { alpha = score; bestmove = m; } unmake move m; if (alpha >= beta) break; } return alpha; }

// fail-soft alpha-beta search int alphabeta(int depth, int alpha, int beta) { move bestmove; int current = -WIN; if (game over or depth <= 0) return winning score or eval(); for (each possible move m) { make move m; score = -alphabeta(depth - 1, -beta, -alpha) unmake move m; if (score >= current) { current = score; bestmove = m; if (score >= alpha) alpha = score; if (score >= beta) break; } } return current; }With this change, one can determine a little more information than before about a position. If the returned value x is less than or equal to alpha, then we still don't know the true value of the position (because we may have pruned away some important lines of the search), but we do know that the true value is at most x. Similarly, if x is greater than or equal to beta, we know that the true search value is at least x. These slightly tighter upper and lower bounds don't improve the search itself, but they could lead to a greater number of successful hash probes. The use of fail-soft alpha-beta is also essential in the MTD(f) algorithm described below.

alphabeta(depth, -WIN, WIN)where the huge range between -WIN and WIN indicates that we don't know what the true search value will be, so all possible scores should be considered interesting. Then, the move one makes should be the one set in the variable bestmove at the outer level of the search.

Instead, it is often helpful to call alpha-beta with an artificially narrow window centered around the previous search value. If the result is a score within that window, you've saved time and found the correct search value. But if the search fails, you must widen the window and search again:

// aspiration search int alpha = previous - WINDOW; int beta = previous + WINDOW; for (;;) { score = alphabeta(depth, alpha, beta) if (score <= alpha) alpha = -WIN; else if (score >= beta) beta = WIN; else break; }The constant WINDOW should be set in a way that balances the time savings from a narrower search with the time lost from repeating an unsuccessful search. A typical value in chess might be around half a pawn. Variants of aspiration search include widening the window more gradually in the event of an unsuccessful search, or using an initial search window that is not necessarily centered around the previous search result.

One could use this idea to perform a binary search for the true score:

int alpha = -WIN; int beta = +WIN; while (beta > alpha+1) { int test = (alpha+beta)/2; if (alphabeta(depth, test, test+1) <= test) beta = test; else alpha = test+1; }

However, this will lead to a large number of searches (the logarithm of the difference between WIN and -WIN). The MTD(f) idea is to instead use fail-soft alpha-beta to control the search: each call to fail-soft alpha-beta returns a search value which is closer to the final score, so if we use that search value as the start of the next test, we should eventually converge.

// MTD(f) int test = 0; for (;;) { score = alphabeta(depth, test,test+1); if (test == score) break; test = score; }Unfortunately, complicated interactions with the hash table can cause this routine to get into an infinite loop, so one needs additional code to halt the search if too many iterations have been made without any convergence.

One big advantage of MTD(f) is that we can simplify the code to the alpha-beta search, since it only really has two parameters (depth and alpha) rather than three.

So PVS performs that first search with a normal window, but on subsequent searches uses a zero-width window to test each successive move against the first move. Only if the zero-width search fails does it do a normal search.

// principal variation search (fail-soft version) int alphabeta(int depth, int alpha, int beta) { move bestmove, current; if (game over or depth <= 0) return winning score or eval(); move m = first move; make move m; current = -alphabeta(depth - 1, -beta, -alpha) unmake move m; for (each remaining move m) { make move m; score = -alphabeta(depth - 1, -alpha-1, -alpha) if (score > alpha && score < beta) score = -alphabeta(depth - 1, -beta, -alpha) unmake move m; if (score >= current) { current = score; bestmove = m; if (score >= alpha) alpha = score; if (score >= beta) break; } } return current; }This shares the advantage with MTD(f) that most nodes in the search tree have zero-width windows, and can use a simpler two-parameter form of alpha-beta. Since there are very few calls with beta > alpha+1, one can do extra work in those calls (such as saving the best move for later use) without worrying much about the extra time it takes.

David Eppstein, Dept. Information & Computer Science, UC Irvine.