Adding Endgame Knowledge

If you test the first version of Maverick (v 0.05) you’ll soon find out it cannot mate at fast time controls with rook and king vs. a lone king. Quite shocking!  Despite gnawing its way through 9 million positions per second, the mate is too deep for an engine without hash tables.  So it just pushes its king and rook around the board aimlessly.  Obviously it needs help.  

I could add a few “if-then” statements to handle these cases.  This would work well if there were only a few standard endgames.  But I’d like to create a scalable solution where I can add all sorts of knowledge.  I posted a question on the Computer Chess Club and Steven Edwards pointed me in the right direction – material hash tables.

Material hashing works like this.  Each position has a material hash value which is calculated from the number of each type of piece on the board.  So whenever a piece is added or taken away, the material hash value is updated.  Unlike a normal Zobrist hash value, the locations of the pieces are irrelevant.
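One simple way to maintain such a key incrementally is to assign a random number to each (piece type, piece count) pair and XOR keys in and out as the counts change.  Here’s a minimal sketch of the idea – the names and table sizes are illustrative, not necessarily how Maverick does it:

#include <stdint.h>

//-- One random key for each (piece type, piece count) pair;
//-- filled with random numbers at startup
static uint64_t material_rand[16][10];

static uint64_t material_hash_key;  // the board's current material hash

//-- Called whenever a piece of the given type is added or removed:
//-- XOR out the key for the old count, XOR in the key for the new count
static void update_material_hash(int piece, int old_count, int new_count)
{
    material_hash_key ^= material_rand[piece][old_count];
    material_hash_key ^= material_rand[piece][new_count];
}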

When the evaluation routine is called, one of the first tasks is to check whether the material hash of the position matches the signature of a known endgame.  If there is a match, the endgame-specific routine is used to evaluate the position.  Here’s the code to evaluate white rook and king vs. a lone black king:

void known_endgame_RKvk(struct t_board *board, struct t_chess_eval *eval)
{
    //-- Score from White's perspective: material bonus, drive the lone
    //-- king to the edge (lone_king table) and bring the kings together
    eval->static_score = lone_king[board->king_square[BLACK]] + 800
        - 10 * square_distance(board->king_square[WHITE], board->king_square[BLACK]);

    //-- Convert to the side-to-move's perspective (WHITE = 0, BLACK = 1)
    eval->static_score *= (1 - board->to_move * 2);
}
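For reference, lone_king would be a 64-entry table rewarding positions where the defending king is near the edge, and square_distance the king-move distance between two squares.  Here’s a sketch of the latter – an assumption about how such a helper works, not necessarily Maverick’s actual code:

//-- Chebyshev (king-move) distance between two squares,
//-- assuming squares are indexed A1 = 0 ... H8 = 63
static int square_distance(int s1, int s2)
{
    int file_diff = (s1 & 7) - (s2 & 7);
    int rank_diff = (s1 >> 3) - (s2 >> 3);
    if (file_diff < 0) file_diff = -file_diff;
    if (rank_diff < 0) rank_diff = -rank_diff;
    return (file_diff > rank_diff) ? file_diff : rank_diff;
}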

Adding knowledge is also really easy.  Here’s the routine to register the white rook and king vs. lone king endgame:

//-- R + K vs. k
clear_array(material);
material[WHITEROOK] = 1;
key = get_material_hash(material);
index = key & material_hash_mask;
if (material_hash[index].key != 0)
    return FALSE;  // slot already in use – hash collision
material_hash[index].key = key;
material_hash[index].eval_endgame = &known_endgame_RKvk;
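At evaluation time the lookup is equally cheap.  Here’s a sketch of how the dispatch might look at the top of the evaluation function, assuming the board keeps its current material hash in board->material_hash (the field name is illustrative):

void evaluate(struct t_board *board, struct t_chess_eval *eval)
{
    int index = board->material_hash & material_hash_mask;

    //-- Known endgame? Use the specialized routine and return early
    if (material_hash[index].key == board->material_hash
        && material_hash[index].eval_endgame != NULL) {
        material_hash[index].eval_endgame(board, eval);
        return;
    }

    //-- ...otherwise fall through to the normal evaluation
}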

Cool stuff!

Chess Engine Testing Framework

Since Maverick is now playing chess I really need a testing framework.  Testing has always been important, but I think Fabien Letouzey really “turned some heads” when he started making rapid progress with Fruit back in 2004.  His secret turned out to be making decisions about engine parameters and potential improvements based on cold hard facts, i.e. testing.  Since then we’ve seen much more discussion about testing.  Then came Rybka and Houdini, which championed what I’d call turbo-testing.  This is where the games are super short – something like 60 moves in 20 seconds.  Ten years ago I think this would have been frowned upon as too fast, with results unlikely to scale to longer time controls.  As it turns out this fear seems to be unfounded.  Turbo-testing seems to be the best way to quickly test potential improvements.

When developing Monarch my testing really wasn’t systematic.  I’d run games of varying lengths.  Maybe 50 games or at most 100 games.  In hindsight Monarch really wasn’t tested well.  I’m determined not to do that with Maverick.  Thankfully there are also some great tools to help with the testing.  The tool which seems to be regarded as the best is CuteChess-Cli.  Another one is LittleBlitzer.  Both enable you to run multiple games in parallel and with minimal screen updates.  I decided to plump for CuteChess-Cli as my tool of choice.  Here’s how I went about setting up the testing framework:

  1. First you need to download CuteChess-Cli
  2. I decided to create a folder for the testing framework as a sub-folder of my root directory.  This way any paths will be quite short.
  3. I then selected a number of engines to be Maverick’s sparring partners.  The criteria I set for selecting the engines are as follows:
    • Stable – each engine must be rock solid stable
    • A playing strength of between 1800 ELO and 2200 ELO.  This will provide a range of opponents, all of whom will initially be better players.
  4. The engines I selected are as follows:
    • Monarch 
    • Clueless
    • MadChess
    • Predateur
  5. I created a few “good-old-fashioned” batch files to execute the actual tests.  Here’s an example of one of the test batch files:

@ECHO OFF

REM Build a YYYYMMDD stamp from the US short date format (e.g. "Thu 07/04/2013")
SET varYYYY=%DATE:~10,4%
SET varMM=%DATE:~4,2%
SET varDD=%DATE:~7,2%
SET varTodaysDate=%varYYYY%%varMM%%varDD%

SET PGN_DATABASE=Games%varTodaysDate%.pgn

c:\CuteChess\cutechess-cli -engine name="Maverick 0.10 Beta" cmd=Maverick dir=c:\CuteChess\Maverick\New1 -engine name="Predateur 2.2.1 x32" cmd=Predateur dir=C:\CuteChess\Predateur -each proto=uci tc=60/180 book=c:\CuteChess\obook.bin -pgnout c:\CuteChess\Games\%PGN_DATABASE% -games 500 -concurrency 4 -wait 5

pause

As you can see, this runs one match of 500 games between Maverick and Predateur.  Four games are played concurrently on the same machine.  All the games are stored in a PGN file whose name contains today’s date (you may need to adjust the code at the top of the .bat file for different date formats – mine is US).   For editing these types of files I’d suggest Notepad++.  When installed you can simply right-click on a .bat file and you have the option to edit it using Notepad++.  It also color-codes the syntax – very useful.

Since it may be useful to some of you, here’s my whole testing framework zipped up and ready for download.

Maverick is Available for Download!

You can download the first release of Maverick on the Download page.

I decided to make this release as Maverick seems to be stable.  But don’t expect anything special.  I’d estimate the strength to be about 1700 ELO on the CCRL scale (which is about 300 ELO weaker than Monarch, my previous engine).  I wouldn’t be surprised if it’s even weaker than 1700 ELO.  If you’re interested in a fast but dumb engine, then Maverick will be perfect.

Here’s what it does:

  • Magic bitboard structure which seems to be quite fast (the 64-bit version runs at about 8 million nps in the middlegame on my Core i7-4900MQ at 2.8 GHz)
  • Fully UCI compatible (tested in Shredder and Fritz)
  • Basic PVS search
  • Move ordering – captures (mvv/lva ordered), killers and history move count
  • Basic quiescent search – all SEE-positive moves played
  • Basic evaluation function which is little more than piece-square tables
  • Draw by 50 move rule
  • Draw by 3-times repetition (Maverick is quite “smart” and requires 3 repetitions in the appropriate positions – see here for discussion)
  • Ponders in opponent’s time
  • Polyglot compatible opening book (selectable from Maverick’s options menu!)

And that’s it!  Maverick doesn’t even have null move pruning or hash tables (so room for improvement!).

At this point Maverick seems to be stable, having played a few hundred games against Monarch.  Please let me know if you encounter any problems.

P.S. I started writing Maverick five months ago to the day!

UCI Chess Engine Protocol

I really like the UCI protocol.  I know there are diehard fans of Winboard out there, but for me I like the cleanness of UCI.  One of the aims I have for Maverick is to support as much of the protocol as possible.  In Monarch the UCI code was stable but a bit messy.  So I put some thought into how I could simplify the implementation.  Here’s what I came up with for Maverick:

There are two threads.  The first thread just listens for user / GUI input.  We’ll call this the listening thread. The second thread runs the engine’s search and reports back with its findings.  We’ll call this the thinking thread.  Most, if not all, problems with this type of implementation crop up when one thread gets out of sync with the other thread’s state. This usually results in the thinking thread being sent unexpected commands which are either ignored or processed incorrectly.

The insight I had was that, with the UCI protocol, there are only two commands which change the state of the thinking thread.  These are “go” and “stop” (there is also “quit”, which I handle as a special form of “stop”).  So the “go” command needs to change the state of the thinking thread from “waiting” to “thinking”, and the “stop” command needs to change the state back from “thinking” to “waiting”.  In Maverick’s implementation I’ve also added “Start_Thinking” as another state for the thinking thread.  This is needed when there are rapid-fire changes and the GUI is blasting commands at the engine.  Under this rapid-fire scenario the engine can receive a stop command before it even starts to think.  By having the “Start_Thinking” state the engine can wait for the search to start before taking any more input from the GUI.

Here’s the code:

void uci_stop()
{
    uci.stop = TRUE;

    //-- Block until the thinking thread confirms it has stopped
    while (uci.engine_state != UCI_ENGINE_WAITING)
        Sleep(1);
}

void uci_go(char *s)
{
    search_start_time = time_now();
    last_display_update = search_start_time;
    set_uci_level(s, position->to_move);

    //-- Hand over to the thinking thread...
    uci.engine_state = UCI_ENGINE_START_THINKING;

    //-- ...and block until the search has actually started
    while (uci.engine_state == UCI_ENGINE_START_THINKING)
        Sleep(1);
}
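For completeness, here’s a rough sketch of the thinking thread’s side of the handshake.  The state names follow the code above, but think() and the quit flag are illustrative assumptions, not Maverick’s actual code:

void thinking_thread_loop()
{
    while (!uci.quit) {
        if (uci.engine_state == UCI_ENGINE_START_THINKING) {
            uci.stop = FALSE;

            //-- Releases uci_go()'s wait loop
            uci.engine_state = UCI_ENGINE_THINKING;

            //-- Search until finished or until uci.stop is set
            think();

            //-- Releases uci_stop()'s wait loop
            uci.engine_state = UCI_ENGINE_WAITING;
        }
        else
            Sleep(1);
    }
}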

It seems to work well and be rock-solid stable!

Maverick is almost ready to start playing chess.  A few more clean-ups needed!

US Citizenship & x5 Speed-up!

It’s been a while since I last posted.  I’ve been busy getting all of the UCI plumbing in place.  Maverick is almost at the stage it can play a game of chess.  I’ve created the basic search routines – complete with alpha-beta, quiescent search and root search. 

Some early tests show quite a speed-up compared to my old Monarch letterbox framework.  On my 2.8 GHz i7 4900MQ the old Monarch 1.7 engine is doing about 1.7 million nodes per second.  On the same positions Maverick is doing about 8.6 million.  I expect this will come down once I add hash tables and beef up the evaluation but this is better than I expected.  It’s about a five times speed-up.  I think quite a bit is due to the 64 bit GCC 4.8 compiler but I’m happy with the result.

I’m hoping Maverick will play its first game this weekend.

On another note – I became a US citizen today!

Static Exchange Evaluation in Chess

When human chess players evaluate a sequence of moves they often mentally “swap off” the pieces at the end of a variation. For example, if a proficient chess player took a quick look at the following position they would see that Nxd5 wins a pawn:

[Diagram: Nxd5 wins a pawn]

Black can recapture with his knight but white would recapture again to win the piece. Effectively the player has resolved the sequence of captures on one square to see if the initial capture is winning or not. This type of logic is invaluable to humans and to chess engines.

In computer chess this type of routine is the “Static Exchange Evaluation” or “SEE” for short. It’s used in many different places in a chess engine. One of the most common uses is to prune losing captures in the quiescent search.

You would think such a simple “swap-off” procedure would be relatively easy to code. In my experience it’s not as simple as it seems at first! Tricky little situations add to the complexity. Is Rxd5 winning, losing or drawing?

[Diagram: is Rxd5 winning, losing or drawing?]

All those discovered attacks on d5 need to be handled correctly. You also need to be able to handle the “stand-pat” condition, where one side can take but chooses not to as they would subsequently lose material.

In my previous engine (Monarch) I was never happy with the performance of the Static Exchange Evaluation. It built a list of attacking pieces and tried to resolve a score. It seemed to work but was really quite slow and I’m not convinced it was bug free. I’m determined the SEE routine in Maverick should be more robust.

In most cases the SEE function only needs to figure out whether the move loses material. This simplification can save a lot of time. I Googled previous discussions about Static Exchange Evaluation on the various forums. Some people try to resolve the exchange to a specific value using mini-max, while others have an alpha-beta routine. I decided to be somewhat different and introduce a “threshold” parameter. The routine returns true if the move is equal to, or better than, the threshold parameter, and false if it falls below the threshold.

Here’s the logic in pseudo code first (a C sketch of the loop follows the list). “Side-to-move” is the same side as the side making the initial SEE move being evaluated; “opponent” is the opposite side:

  1. Generate all of the attacks and x-ray attacks on the target square for both sides
  2. The “see_value” is set to the value of the initially captured piece
  3. The “trophy_value” is set to the value of the piece making the initial capture
  4. Find the least valuable opponent’s piece which is attacking the target square (and is not blocked)
  5. Reduce the see_value by the trophy_value, i.e. see_value = see_value - trophy_value
  6. Set the trophy_value equal to the value of the piece found in step 4
  7. If see_value >= threshold then return TRUE (i.e. the side-to-move can stand pat and still have a score which reaches the threshold)
  8. If see_value + trophy_value < threshold then return FALSE (i.e. even if the side to move captures the new piece on offer, they will not reach the threshold)
  9. Find the least valuable piece for the side-to-move which is attacking the target square (and is not blocked)
  10. Increase the see_value by the trophy_value i.e. see_value = see_value + trophy_value
  11. Set the trophy_value equal to the value of the piece found in step 9
  12. If see_value - trophy_value >= threshold then return TRUE (i.e. even if the opponent captures the piece on offer, the score still reaches the threshold)
  13. If see_value < threshold then return FALSE (i.e. the opponent can stand-pat and the score is less than the threshold).
  14. While each side has potential captures go to Step 4.
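Here’s a minimal C sketch of that swap-off loop. It makes two simplifying assumptions: the attackers for each side have already been generated and sorted (least valuable first), and x-rays have already been resolved – the real routine has to discover new attackers as pieces come off the board:

#include <stdbool.h>

//-- Threshold swap-off loop; attacker lists hold piece values,
//-- pre-sorted least valuable first
static bool see_reaches_threshold(const int *opp_attackers, int opp_count,
                                  const int *own_attackers, int own_count,
                                  int captured_value,   // piece initially captured
                                  int capturing_value,  // piece making the capture
                                  int threshold)
{
    int see_value = captured_value;      // step 2
    int trophy_value = capturing_value;  // step 3
    int oi = 0, wi = 0;

    for (;;) {
        //-- Opponent's recapture (steps 4-8)
        if (oi >= opp_count)
            return see_value >= threshold;  // opponent has run out of captures
        see_value -= trophy_value;          // step 5
        trophy_value = opp_attackers[oi++]; // steps 4 & 6
        if (see_value >= threshold)
            return true;                    // step 7: stand pat
        if (see_value + trophy_value < threshold)
            return false;                   // step 8: can't reach the threshold

        //-- Side-to-move's recapture (steps 9-13)
        if (wi >= own_count)
            return see_value >= threshold;  // side-to-move has run out of captures
        see_value += trophy_value;          // step 10
        trophy_value = own_attackers[wi++]; // steps 9 & 11
        if (see_value - trophy_value >= threshold)
            return true;                    // step 12
        if (see_value < threshold)
            return false;                   // step 13: opponent stands pat
    }                                       // step 14: loop
}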

There are a few other wrinkles, such as what to do when one side runs out of moves, but I’m sure you can figure these out. I’ve also added some simple initial tests to see if an early decision can be made (e.g. can the piece be taken by a pawn?), without the need to generate all of the attacks.

Having tested the routine it seems to be quite efficient (certainly better than the SEE routine in Monarch).

How to Debug Capture & Check Move Generators?

As I’ve explained in previous posts, a perft routine is a great way to debug a move generator. If you can correctly calculate the leaf node count for a variety of positions you are (almost) certain to have eradicated all the bugs in your move generator.

But what about the quiescent search? Unlike the regular search, the quiescent search doesn’t look at all moves. And I’m not aware of any specialized quiescent perft test.

At a minimum the quiescent search generates all of the possible capture moves, and in most cases it also generates checks (at least for the first ply). How can we write these move generation routines with a high level of certainty they are bug free?

When I wrote Monarch I created a perft routine but didn’t create any test routines for the ‘generate_captures’ or ‘generate_checks’ routines. With Maverick I’m determined to write a test routine for each procedure. So I wrote three new move generators:

  1. generate_captures
  2. generate_quiet_checks
  3. generate_quiet_moves

As the name suggests, the first routine generates all captures except en-passant. Captures which promote the pawn are limited only to queen promotions.

The second routine generates checking moves which are not captures. It ignores pawn promotions, en-passant moves and castling moves which give check.

The third routine generates all the other moves. When Maverick finally plays chess I don’t expect to use this routine at all. However, all three routines together generate the same set of moves as the regular move generator. This means I can test them using a version of the perft routine which calls all three specialized move generators for each position. It worked well and I was able to trap a couple of bugs.
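Something like this is what the adjusted perft looks like – a sketch only, where the move-list type and the make/unmake names are placeholders rather than Maverick’s actual API (make_move is assumed to reject illegal moves):

typedef unsigned long long u64;

//-- Perft which exercises the three specialized generators
u64 perft_split_generators(struct t_board *board, int depth)
{
    struct t_move_list moves;
    u64 nodes = 0;

    if (depth == 0)
        return 1;

    //-- Together these three must yield exactly the same move set
    //-- as the regular all-moves generator
    moves.count = 0;
    generate_captures(board, &moves);
    generate_quiet_checks(board, &moves);
    generate_quiet_moves(board, &moves);

    for (int i = 0; i < moves.count; i++) {
        if (make_move(board, moves.move[i])) {  // skip illegal moves
            nodes += perft_split_generators(board, depth - 1);
            unmake_move(board, moves.move[i]);
        }
    }
    return nodes;
}

Any position where this count disagrees with the regular perft pinpoints a bug in one of the three specialized generators.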

What really amazed me was the speed of this adjusted perft routine – it was only about 10% slower than the regular routine. This is all down to the bitboard structure. With a letterbox structure I’m sure there would have been a much wider gap between the normal perft and this specialized move generator version.

Bitboards rule!!!

Time to Create the Search Routines!

I had planned to have a reasonably solid evaluation function in Maverick from “day 1”.  It looks as if I’ll need to backtrack a little.  

I had imagined I’d put all of the gubbins of the evaluation function (e.g. attack tables) in the “board” structure.  However, the drawback to this approach is the need to store changes as moves are made and unmade on the board.  

For example, if I generate all of the attack tables when I evaluate the position I’d subsequently like to use this information in my Late-Move-Reduction code.  However, if I include the attack tables in the board structure I’ll need to store and retrieve it as part of the make / unmake process.  This sounds expensive! 

So what’s the solution?

What I intend to do is have a “ply_data” array which stores information about the current position at any given ply.  This is not unusual, and Monarch had this structure.  As an example, it enables us to see what move was played two moves ago.  So I’ll move some of the information I was planning to store in the board structure, including the attack tables, over to this ply_data array.
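Something along these lines – a sketch with illustrative field names, not Maverick’s final layout:

typedef unsigned long long t_bitboard;

#define MAX_PLY 64

//-- Per-ply scratch data, indexed by the current search ply
struct t_ply_data {
    int move;               // move played at this ply
    t_bitboard attacks[2];  // attack tables built by the evaluation
    int static_eval;        // static score of the position at this ply
};

struct t_ply_data ply_data[MAX_PLY];

//-- e.g. ply_data[ply - 2].move is the move played two moves ago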

This is a fundamental structural issue.  It concerned me that, as I push ahead with the evaluation, I may need to make structural changes later when I put the search routines in place. So I’ve also decided to freeze the evaluation function in its current basic state and create the search routines (i.e. make Maverick play chess).  I have the structure of the evaluation function in place, so (hopefully) I’ll be able to develop it quickly once I have a basic search.

Using Excel to Help Create Piece-Square Tables

I’ve been thinking about Piece-Square tables.  They’re not the most exciting part of computer chess but they seem quite important (based on Ed Schroder’s experiments), so I thought I’d put some time into thinking about reasonable values.  I’ve created a simple spreadsheet in Excel to help create them.  You can download it here (Piece-Square-Tables.xlsx).  It assumes your board representation starts with A1 = 0 and H8 = 63.

Here’s how to use the Excel spreadsheet:

  • The worksheet is protected to prevent accidental overwrites.  You can unlock it easily as it’s not password protected
  • Enter the values in the yellow squares on the left.  The gray squares are the mirror image of the yellow squares.
  • The table in the middle is adjusted so the average table value is zero.   This means if I set the value of a bishop to 350 centi-pawns and put it on a random square, the combined material and piece-square value will average out to 350 centi-pawns.  The idea is to minimize interaction with other evaluation terms.
  • The column to the far right is the table formatted as a “C” array.  You can copy and paste this code.

Here’s a screenshot:

[Screenshot of the piece-square table spreadsheet]
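If you’d rather do the centering in code, the adjustment is trivial.  Here’s a sketch of the equivalent of what the spreadsheet does (illustrative only):

//-- Shift a 64-entry piece-square table so its mean is zero,
//-- mirroring the adjustment the spreadsheet performs
void center_piece_square_table(int table[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += table[i];

    int mean = sum / 64;  // integer mean; a small rounding remainder may survive
    for (int i = 0; i < 64; i++)
        table[i] -= mean;
}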

Domain Space of a Chess Evaluation Function

I’m up to my chops in evaluation code.  As I’ve said in previous posts, the engines I’ve written in the past have had basic evaluation functions.  I want Maverick to be different.  I’m using a number of good resources to inspire me.  The first is Hans Berliner’s book The System: A World Champion’s Approach to Chess, which takes quite a quantitative approach to chess.  I’ve found it invaluable and would recommend it to any chess programmer.

As I’m writing the code I can’t help but feel the enormity of the domain space of a chess evaluation function.  Firstly there are the values I give to each element.  For example, in the middlegame is a pawn on g4 worth more or less than a pawn on f4?  The pawn on f4 is probably worth more, but how much more – is it 2 centi-pawns or 10 centi-pawns?  I’ve got to make these decisions for all six piece types in both the middlegame and the endgame.  And then there are the values I assign for a doubled pawn on h2 or e4.  The task is truly enormous.  Secondly there is the complexity due to what I choose to evaluate and how I go about doing so.

This all reinforces the idea of a solid testing framework.  I also recently read an interview with Robert Houdart.  Of course I would expect Robert to test Houdini, but I was struck by him playing 100,000 games per day!  That’s a lot of games.  This impressed upon me the need to create a testing setup (when Maverick starts to play chess).  Testing was also one of the things Fabien used to rocket Fruit up the rating lists back in 2005.  Once I get Maverick playing chess I must make it my objective to be 100% test driven.