Yet Another Chess Programming Approach to Bug Hunting

You all know I’m a maniac when it comes to hunting down bugs. I’ve posted about it before (Bugs-Bugs-Bugs). I suspect many engines don’t get past the 2200 ELO mark simply because they are buggy.

Thomas Petzke made a post on CCC which I found interesting:

I hit the same issue several times, different node counts between Debug and Release, between x32 and x64 and Intel vs MSVC.

Was painful to hunt it down but at the end there was always a bug

Comparing node counts using different compilers is an interesting idea, and one I haven’t previously tried. Since I’ve spent many hours perfecting my perft node counts I was quite sure Maverick would sail through this test.

I was wrong!!

I set up the test and used one of the complex perft positions as an example. I then created debug and release versions of Maverick. To my horror, when performing a fixed-ply search the two versions reported different node counts.

How could this be? I know the perft node counts are all ok; so my move generation must be ok – right? Doubt starts to creep into my thinking.

Here’s how I tackled the problem.

First I found the minimum depth which triggered a difference in node counts. In my case this was the 10th ply of a late middlegame position.

Then I created a small function which writes a position’s vital statistics (FEN, hash key and the move about to be played) to a text file. I added a call to this function in the “make_move” routine, just before every move is made. Effectively I’m writing the whole search tree to a text file.
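As a sketch of the kind of logging function I mean (board_to_fen, board->hash and move_text are illustrative names, not Maverick’s actual API):

#include <stdio.h>

//-- Dump one node of the search tree: FEN, hash key and the move about to be played
void write_tree_node(struct t_board *board, const char *move_text, FILE *log)
{
    char fen[128];

    board_to_fen(board, fen);    //-- assumed helper which fills in the FEN string
    fprintf(log, "%s %llu %s\n", fen, (unsigned long long) board->hash, move_text);
    fflush(log);                 //-- flush so the file is intact even if the engine crashes
}

Call it at the top of make_move with the move that is about to be played and you get one line per node visited.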

I was then able to run the two versions, which created the two text files containing the search tree. I renamed them debug-tree.txt and release-tree.txt. Both files were about 35 MB in size.

To search by hand through these files would be a boring and tedious task. I’m not a Linux user but I believe Linux has a “diff” command which finds the differences between two files. Windows doesn’t come with anything similar. However, there is a useful utility called WinMerge which does the same thing. It’s also free.

When I compared the two files this is what I saw:

[Screenshot: WinMerge comparison of the debug and release search trees]

The yellow sections show where the two files differ. As you can see, the first difference in the tree was at line 75,416! The release version seems to go back to the position after evaluating the position with hash key 15560305028174901452 and plays the same move again (e7e4), whereas the debug version plays another move (f3e3). It looks like the release version fired a cutoff, whereas the debug version didn’t. Here’s the critical position after f6e4:

[Diagram: the critical position after f6e4]

It doesn’t look like anything special. I then inserted a break to stop the search when it encounters this position (i.e. when there is a hash key match).
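Something along these lines does the trick (board->hash is whatever your engine calls its Zobrist key, and __debugbreak() is the MSVC intrinsic – an assert or a conditional breakpoint works just as well):

//-- Halt in the debugger when the problem position turns up
if (board->hash == 15560305028174901452ULL)
    __debugbreak();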

Sure enough, the two versions evaluated the same position differently. The bug must be in the evaluation!

Since I knew the problematic position, finding the bug proved to be straightforward. I stepped through the evaluation, comparing the results at each point until I tracked down the bug.

It turns out the bug was in the connected passed pawn code, which was evaluating the white pawns on f2 and g3. The buggy code was the following:
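As a sketch of the kind of line being described (square, white_passed_pawns and connected are illustrative names, not Maverick’s actual identifiers), the bug looked something like this:

//-- Buggy: neighboring_files is indexed with the square (0-63)
//-- when it actually expects a file number (0-7)
connected = (neighboring_files[square] & white_passed_pawns) != 0;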

neighboring_files is a pre-calculated array which returns the mask of the neighboring files. However, it takes the file number as an input, not the square. This was the bug. The corrected code is as follows:
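Again as a sketch, with the same illustrative names (FILE_OF standing in for whatever macro extracts the file from a square):

//-- Corrected: convert the square to its file before indexing the array
connected = (neighboring_files[FILE_OF(square)] & white_passed_pawns) != 0;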

Once corrected I created and compared the MSVC Release, MSVC Debug and MinGW x64 versions and they all produced the same node count!

This isn’t a technique I’ve used before. I always assumed perft would track down search errors, and flipping the board and comparing white and black evaluations of the flipped position would find evaluation errors. I was wrong. This will be another technique I’ll put in my debugging toolbox.

Is your node count the same for different compilers?

Create a Polyglot Chess Opening Book (Part 2)

If you’re having problems viewing the video, watch it on YouTube – Create a Polyglot Chess Opening Book.

Download Links:

Creating an Opening Book for Maverick

Maverick is the first chess engine I’ve written which can use its own opening book.  I decided at an early stage that Maverick would read Polyglot opening books.  It makes so much sense.  Polyglot seems to be a great tool to create opening books and its format seems to be emerging as the de facto opening book standard.

So now I need to actually create an opening book.  I did some research on the internet and it really is quite straightforward. Here are the steps I took:

  1. There are some really good free tools which you’ll need to download: SCID (to filter and export the games), PGN-Extract (to clean the PGN files) and Polyglot (to build the book itself).
  2. You will also need a database of high quality games.  I used Ed Schroeder’s Million Base. Make sure your database is in PGN format.
  3. Open the MillionBase using SCID.
  4. We’ll be filtering the games using SCID’s excellent filtering system – select “Search / Header…” (Ctrl-Shift-H)
  5. I created a database consisting of only White wins by players over 2400 ELO with opponents over 2200 ELO (see the Filter Options diagram).
  6. I then saved the white win database as “white2400.pgn” using SCID’s “Tools / Export All Filter Games” command
  7. Repeat for Black
  8. Now put both PGN files in the same folder as Polyglot.exe, PGN-Extract.exe and the batch file.
  9. The batch file contains the commands to create the opening book; a sketch of what it might look like follows the list below.

  1. The first line runs PGN-Extract and simply cleans the database, removing any games which don’t start from the normal starting position.
  2. The second line creates an opening book of white moves, using only positions which have occurred at least 50 times.
  3. The third line creates a broader book which gives more responses to a variety of openings.
  4. This is repeated for Black
  5. Finally all of the books are merged
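As a rough sketch of what such a batch file might look like – the file names and thresholds below are illustrative, and the pgn-extract and Polyglot options (--nosetuptags, make-book, merge-book, -min-game, -max-ply) are quoted from memory, so check each tool’s documentation before relying on them:

@ECHO OFF

REM -- 1. Clean the databases: drop games which don't start from the normal position
pgn-extract --nosetuptags --output white-clean.pgn white2400.pgn
pgn-extract --nosetuptags --output black-clean.pgn black2400.pgn

REM -- 2. Narrow white book: only positions seen at least 50 times
polyglot make-book -pgn white-clean.pgn -bin white-main.bin -min-game 50

REM -- 3. Broader white book to give more responses to a variety of openings
polyglot make-book -pgn white-clean.pgn -bin white-wide.bin -min-game 10 -max-ply 20

REM -- 4. Repeat for black
polyglot make-book -pgn black-clean.pgn -bin black-main.bin -min-game 50
polyglot make-book -pgn black-clean.pgn -bin black-wide.bin -min-game 10 -max-ply 20

REM -- 5. Merge everything into a single book
polyglot merge-book -in1 white-main.bin -in2 white-wide.bin -out white.bin
polyglot merge-book -in1 black-main.bin -in2 black-wide.bin -out black.bin
polyglot merge-book -in1 white.bin -in2 black.bin -out maverick.bin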

The final opening book is a little over 1 MB (it’s available on the download page) – quite small by most standards.  I’ll need to run some tests to see how good it is compared to other Polyglot books.

If there are any opening book specialists out there I’d appreciate any feedback. Can the process of creating the book be improved?

Bugs Bugs Bugs!

The chess engine Fruit has many strengths; one of the main ones, I believe, is simply the lack of bugs. Fruit’s author Fabien Letouzey has a defensive coding style.  He litters his code with “assert” statements.  When he was actively developing Fruit he set out a challenge – he said if you find a bug in Fruit he’d fix a bug in your code (I can’t find the link).  I think this highlights his emphasis on finding and fixing bugs.  This philosophy has certainly influenced my coding style.

So now that Maverick 0.2 is out I’ve spent most of my chess development time finding and fixing bugs.  There is a great temptation to add knowledge and features and to improve the search. But a couple of times I’ve sat down and just focused on finding bugs. On some occasions I have a specific bug or odd behavior which I’ve identified and would like to fix.  But often I’m just hunting for bugs.  I might set myself the goal of finding ten (small) bugs in two hours.  I then go through Maverick’s code with a fine-tooth comb, forcing myself to challenge the logic of every line and asking myself what I’m assuming at each stage.  It’s amazing what you discover.  If you have a chess engine, give it a try.  Set aside some time to focus on simply finding bugs.

Let me know how you get on!

How strong is Maverick 0.2?

Maverick 0.2 has been out a couple of days now.  I’ve run some test games and I’m really quite surprised at how strong it seems to be.

Fruit 1.0:

The first test was against Fruit 1.0.  I played this match using Shredder’s GUI. As I expected, Fruit won.  I really was dazzled by Fruit’s silky smooth search as it carved its way through each ply (especially in the endgame).

  • CCRL Rating = ???? (between 2300 and 2400)
  • Time = 1 minute plus 1 second per move
  • Result for Maverick: Wins = 24, Losses = 63, Draws = 13
  • Percentage = 30.5%
  • ELO Difference = -146

I was pleased with this result.  However, it would seem Fruit 1.0’s strength is not well established.  As far as I can tell it is somewhere between 2300 and 2400 ELO on the CCRL scale.  This would put Maverick 0.2 between 2150 and 2250 ELO.

Phalanx XXIII:

The second opponent was Phalanx XXIII, another classic engine.  As I watched them play it was clear Phalanx has much more knowledge than Maverick.  Yet Maverick did well to win some games based on its speed.

  • CCRL Rating = 2387
  • Results for Maverick: Wins = 36, Losses = 49, Draws = 15
  • Percentage = 43.5%
  • ELO Difference = -45

Once again I was pleased.  Based on this result Maverick is over 2300 ELO, which is probably too high but let’s see.

And remember, at this stage Maverick is lacking some standard features:

  • No mobility in the evaluation
  • No passed pawn code
  • No king safety
  • No pawn structure code
  • No late move reduction
  • No Internal Iterative Deepening
  • Lots of endgame evaluation missing

So I’m encouraged by Maverick’s strength.  If you run any sort of ratings test I’d be interested in the results.

I will not release another version until it has gained 100 ELO.  This may take some time.

New Maverick Version 0.2

It’s only been a few days since I launched Maverick 0.11.  However, yesterday was unusual.  I spent most of the day at the Short Hills Mall in New Jersey (don’t ask!).  I set myself the task of finding 10 bugs.  In the end I found three.  As it turns out, these three severely impacted Maverick’s playing ability.  So the latest version seems to play at around 2150 ELO on the CCRL scale (i.e. about 100 ELO better than version 0.11).

The download links are here and the source code is here.

New Version of Maverick

It’s been just under two weeks since I released Maverick into the wild.   Over that period there have been over 200 downloads – quite amazing.

I’ve also been busy.  After setting up the testing framework I had some time to add some standard chess engine components:

  • Checks in the first ply of the quiescent search
  • Standard endgame knowledge (KQ vs. K, KR vs. K, KBB vs. K, KBN vs. K)
  • Null move heuristic
  • Hash tables
  • Bug fixes (including a nasty time management bug)

To my surprise the null move heuristic didn’t add much value until I added the hash tables.  Once the hash tables were added the increase in strength was noticeable.  By my tests this version of Maverick is now better than Monarch 1.7 and approximately 200 ELO better than the previous version.  In a 300 game match it convincingly beat Monarch at lightning chess (60 moves in 20 secs) with 140 wins, 104 losses and 56 draws, which is 56% – yippee!!  The win was replicated at the slightly slower time control of 60 moves in 5 mins, when Maverick scored 38 wins, 30 losses and 32 draws (54%).  If these results scale to other opponents it would give Maverick 0.1 a rating of about 2050 ELO on the CCRL rating list.  You can download Maverick 0.11 here.

EDIT: Version 0.11 replaces Version 0.10 which had a time management bug

Adding Endgame Knowledge

If you test the first version of Maverick (v 0.05) you’ll soon find out it cannot mate at fast time controls with rook and king vs. a lone king. Quite shocking!  Despite gnawing its way through 9 million positions per second, the mate is too deep for an engine without hash tables.  So it just pushes its king and rook around the board aimlessly.  Obviously it needs help.  

I could add a few “if-then” statements to handle these cases.  This would work well if there were only a few standard endgames.  But I’d like to create a scalable solution where I can add all sorts of knowledge.  I posted a question on the Computer Chess Club and Steven Edwards pointed me in the right direction – material hash tables.

Material hashing works like this.  Each position has a material hash value which is calculated from the number of each type of piece on the board.  So when a piece is added or taken away the material hash value is updated.  Unlike a normal Zobrist hash value, the location of the piece is not relevant.
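As a rough sketch of the idea (the typedef and the material_hash_values array are illustrative, although get_material_hash and the piece-count array do appear in the registration code further down):

typedef unsigned long long t_hash;    //-- assumed 64-bit key type

//-- One random number per (piece type, piece count) pair, filled once at start-up
t_hash material_hash_values[16][10];

//-- Recalculate the material key from a piece-count array
t_hash get_material_hash(const int material[16])
{
    t_hash key = 0;
    for (int piece = 0; piece < 16; piece++)
        for (int n = 0; n < material[piece]; n++)
            key ^= material_hash_values[piece][n];
    return key;
}

//-- Incremental update when a capture removes one piece of type "piece":
//--   board->material_hash ^= material_hash_values[piece][board->piece_count[piece] - 1];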

When the evaluation routine is called, one of the first tasks is to check whether the material hash of the position matches the signature of a known endgame.  If there is a match, the endgame-specific routine is used to evaluate the position.  Here’s the code to evaluate white rook and king vs. lone black king:

void known_endgame_RKvk(struct t_board *board, struct t_chess_eval *eval)
{
    //-- From White's point of view: lone-king table bonus, the rook's value,
    //-- and a bonus for bringing the two kings closer together
    eval->static_score = lone_king[board->king_square[BLACK]] + 800
        - 10 * square_distance(board->king_square[WHITE], board->king_square[BLACK]);

    //-- Negate the score when it is Black to move
    eval->static_score *= (1 - board->to_move * 2);
}
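The lookup itself might look something like this sketch (board->material_hash is an assumed field name; the material_hash table, mask and eval_endgame function pointer match the registration code below):

//-- First task in evaluate(): is this a known material configuration?
index = board->material_hash & material_hash_mask;
if (material_hash[index].key == board->material_hash && material_hash[index].eval_endgame != NULL) {
    //-- Use the specialized endgame routine instead of the general evaluation
    material_hash[index].eval_endgame(board, eval);
    return;
}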

Adding knowledge is also really easy.  Here’s the routine to register KR vs. K:

//-- R + K vs. k
clear_array(material);
material[WHITEROOK] = 1;
key = get_material_hash(material);
index = key & material_hash_mask;
if (material_hash[index].key != 0)
    return FALSE;
material_hash[index].key = key;
material_hash[index].eval_endgame = &known_endgame_RKvk;

Cool stuff!

Chess Engine Testing Framework

Since Maverick is now playing chess I really need a testing framework.  Testing has always been important, but I think Fabien Letouzey really “turned some heads” when he started making rapid progress with Fruit back in 2004.  His secret turned out to be making decisions about engine parameters and potential improvements based on cold hard facts, i.e. testing.  Since then we’ve seen much more discussion about testing.  Then came Rybka and Houdini, who championed what I’d call turbo-testing.  This is where the games are super short – something like 60 moves in 20 seconds.  Ten years ago I think this would have been frowned upon as too fast, with results that would likely not scale to longer time controls.  As it turns out this fear seems to be incorrect.  Turbo-testing seems to be the best way to quickly test potential improvements.

When developing Monarch my testing really wasn’t systematic.  I’d run games of varying lengths – maybe 50 games or at most 100.  In hindsight Monarch really wasn’t tested well.  I’m determined not to do that with Maverick.  Thankfully there are also some great tools to help with the testing.  The tool which seems to be regarded as the best is CuteChess-Cli.  Another one is LittleBlitzer.  Both enable you to run multiple games in parallel with minimal screen updates.  I decided to plump for CuteChess-Cli as my tool of choice.  Here’s how I went about setting up the testing framework:

  1. First you need to download CuteChess-Cli
  2. I decided to create a folder for the testing framework as a sub-folder of my root directory.  This way any paths will be quite short.
  3. I then selected a number of engines which will be Maverick’s sparring partners.  The criteria I set for selecting the engines are as follows:
    • Stable – each engine must be rock solid stable
    • A playing strength of between 1800 ELO and 2200 ELO.  This will provide a range of opponents, all of whom will initially be better players.
  4. The engines I selected are as follows:
    • Monarch 
    • Clueless
    • MadChess
    • Predateur
  5. I created a few “good-old-fashioned” batch files to execute the actual tests.  Here’s an example of one of the test batch files:

@ECHO OFF

SET varYYYY=%DATE:~10,4%
SET varMM=%DATE:~4,2%
SET varDD=%DATE:~7,2%
SET varTodaysDate=%varYYYY%%varMM%%varDD%

SET PGN_DATABASE=Games%varTodaysDate%.pgn

c:\CuteChess\cutechess-cli -engine name="Maverick 0.10 Beta" cmd=Maverick dir=c:\CuteChess\MaverickNew1 -engine name="Predateur 2.2.1 x32" cmd=Predateur dir=C:\CuteChess\Predateur -each proto=uci tc=60/180 book=c:\CuteChess\obook.bin -pgnout c:\CuteChess\Games\%PGN_DATABASE% -games 500 -concurrency 4 -wait 5

pause 10

As you can see, this runs one match of 500 games between Maverick and Predateur.  Four games are played concurrently on the same machine.  All the games are stored in a PGN file whose name contains today’s date (you may need to adjust the code at the top of the bat file for different date formats – mine is US).   For editing these types of files I’d suggest Notepad++.  Once it’s installed you can simply right-click on a .bat file and choose to edit it with Notepad++.  It also color codes the formatting – very useful.

Since it may be useful to some of you, here’s my whole testing framework zipped up and ready for download.

Maverick is Available for Download!

You can download the first release of Maverick on the Download page.

I decided to make this release as Maverick seems to be stable.  But don’t expect anything special.  I’d estimate the strength to be about 1700 ELO on the CCRL scale (which is about 300 ELO weaker than Monarch, my previous engine).  I wouldn’t be surprised if it’s even weaker than 1700 ELO.  If you’re interested in a fast but dumb engine, then Maverick will be perfect.

Here’s what it does:

  • Magic bitboard structure which would seem to be quite fast (the 64 bit version runs at about 8 million nps in the middle game on my Core-i7 4900QM at 2.8 GHz)
  • Fully UCI compatible (tested in Shredder and Fritz)
  • Basic PVS search
  • Move ordering – captures (mvv/lva ordered), killers and history move count
  • Basic quiescence search – all SEE-positive moves played
  • Basic evaluation function which is little more than piece-square tables
  • Draw by 50 move rule
  • Draw by 3 times repetition (Maverick is quite “smart” and requires 3 repetitions in appropriate positions – see here for discussion)
  • Ponders during the opponent’s time
  • Polyglot compatible opening book (selectable from Maverick’s options menu!)

And that’s it!  Maverick doesn’t even have null move pruning or hash tables (so room for improvement!).

At this point Maverick seems to be stable, having played a few hundred games against Monarch.  Please let me know if you encounter any problems.

P.S. I started writing Maverick five months ago to the day!