Bug Fix Release of Maverick (Version 0.51)

Maverick 0.5 has been in the wild for about a month. I’m pleased to say it seems to be playing at 2300 ELO or above.

However, occasionally it would crash. This really niggled me.

I do quite a lot of turbo-testing using CuteChess-Cli at fast time controls.  So I had convinced myself the occasional crash was probably an overload of the standard input and output pipes (I’m not even sure if this is possible). However, since I’ve tested Maverick quite thoroughly using perft positions, I was confident the basic chess engine routines were bug free. I was wrong.

This week I have been down in São Paulo, Brazil, on business. Graham Banks has been testing Maverick and broadcasting the games. Since I had nothing better to do I logged in and watched a few games. It's amazing how nerve-wracking it is to watch your engine play a game at 40 moves in 40 minutes. As I was watching, and to my surprise, Maverick suddenly crashed. I quickly copied the position and thankfully was able to reproduce the crash!

The bug was quite bizarre. Take a look at the position below:

[Diagram: the position which triggered the discovered-check bug]

After black has played pawn to g4, white is in check from the queen on h6. However, the code which finds the source of the check is shown below:

[Code snippet: the routine which finds the source of the check]

This was a "copy and paste" error. The code thinks the knight on c2 is actually a king attacking the white king. This was the bug. It was easy to fix. Hopefully this will result in no more crashes.
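I can't show the original routine here, but the slip is easy to picture. Here is a minimal C sketch of the kind of copy-and-paste error described above – not Maverick's actual code, and the table and bitboard names (knight_attacks, king_attacks and so on) are just placeholder assumptions:

#include <stdint.h>

/* Precomputed attack tables, assumed to be filled in elsewhere by the engine */
extern uint64_t knight_attacks[64];
extern uint64_t king_attacks[64];

/* Return a bitboard of the enemy knights and king giving check to the king
   on king_square (sliding pieces are omitted to keep the sketch short)      */
uint64_t checking_pieces(int king_square, uint64_t their_knights, uint64_t their_king)
{
    uint64_t checkers = 0;

    /* The pasted, buggy form combined one piece type's bitboard with another
       piece type's attack table, so a knight could be reported as if it were
       a king giving check:
       checkers |= king_attacks[king_square] & their_knights;                */

    checkers |= knight_attacks[king_square] & their_knights;   /* corrected  */
    checkers |= king_attacks[king_square] & their_king;

    return checkers;
}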

You can download the fixed version (Maverick 0.51) on the Download Page. There are really no strength improvements in this bug-fix release. If you are testing Maverick you can keep the games already played.

Is Maverick a Clone?

Let me state unequivocally, Maverick is not a clone!

There have always been problems with clones and computer chess. The first one I recall was Quickstep. This was a program competing in the 1989 World Computer Chess Championship in Portorož. After a couple of rounds it was discovered to be a clone of Richard Lang's Mephisto Almeria program. It was scandalous at the time. Thankfully there was little cloning activity after this for about ten years. Then Crafty had a few clones (e.g. Bionic Impakt). But when Fruit burst onto the scene it became a big problem. This then led to the Rybka controversy (which I'm not going to get into).

The computer chess community has developed a test to quickly identify clones.  This work was pioneered by Adam Hair.  You can read about it here – A Pairwise Comparison of Chess Engine Move Selections. I must say I was a little skeptical about the test. I wondered if a basic engine with piece-square-tables, mobility and passed-pawn code would choose similar moves even if they weren’t at all related.  Adam was kind enough to put Maverick through his test – here are the results:

------ Maverick 0.5 x64 (No Popcount) (time: 100 ms scale: 1.0) ------
45.50 Fruit 2.1 (time: 100 ms scale: 1.0)
45.24 Movei00_8_438 (time: 100 ms scale: 1.0)
44.74 SmarThink 1.20 (time: 100 ms scale: 1.0)
44.72 Strelka 2.0 B x64 (time: 100 ms scale: 1.0)
44.50 Fruit 2.2.1 (time: 100 ms scale: 1.0)
44.32 Doch64 09.980 JA (time: 100 ms scale: 1.0)
44.19 Loop 2007 32-Bit (time: 100 ms scale: 1.0)
44.17 Strelka 1.8 UCI (time: 100 ms scale: 1.0)
43.99 Toga II 1.0 (time: 100 ms scale: 1.0)
43.72 Nemo SP64n 1.0.1 Beta (time: 100 ms scale: 1.0)
43.38 Rybka 1.0 Beta 32-bit (time: 100 ms scale: 1.0)
43.26 Daydreamer 1.75 JA (time: 100 ms scale: 1.0)
43.17 Twisted Logic 20090922 (time: 100 ms scale: 1.0)
42.94 Nebula 2.0 (time: 100 ms scale: 1.0)
42.91 RedQueen 1.1.4 (time: 100 ms scale: 1.0)
42.66 Cyrano 0.6b17 JA (time: 100 ms scale: 1.0)
42.61 Hamsters 0.7.1 (time: 100 ms scale: 1.0)
42.60 Naum 4.2 (time: 100 ms scale: 1.0)
42.33 Murka 3 x64 UCI (time: 100 ms scale: 1.0)
42.32 spark-1.0 (time: 100 ms scale: 1.0)
42.26 Octochess revision 5190 (time: 100 ms scale: 1.0)
41.81 DiscoCheck 4.3 JA (time: 100 ms scale: 1.0)
41.77 Bobcat 3.25 (time: 100 ms scale: 1.0)
41.67 cheng3 1.07 (time: 100 ms scale: 1.0)
41.58 Gandalf 6.0 (time: 100 ms scale: 1.0)
41.37 Glass 2.0 PERSONALITY (time: 100 ms scale: 1.0)
41.28 RobboLite 0.085d3 x64 (time: 100 ms scale: 1.0)
41.22 Spike 1.4 (time: 100 ms scale: 1.0)
41.04 Houdini x64 1_CPU (time: 100 ms scale: 1.0)
40.93 Houdini 3 x64 (time: 100 ms scale: 1.0)
40.79 Gaviota v0.86 (time: 100 ms scale: 1.0)
40.75 Ruffian 2.1.0 (time: 100 ms scale: 1.0)
40.71 Stockfish 4 64 (time: 100 ms scale: 1.0)
40.70 Godel 2.3.7 (time: 100 ms scale: 1.0)
40.62 Pawny 1.0.x64 (time: 100 ms scale: 1.0)
40.59 Shredder 11 UCI (time: 100 ms scale: 1.0)
40.37 MinkoChess 1.3 x64 (time: 100 ms scale: 1.0)
40.36 Gaviota v0.87-a8 (time: 100 ms scale: 1.0)
40.24 Komodo CCT 64-bit (time: 100 ms scale: 1.0)
40.19 iCE 1.0 v1619 x64 (time: 100 ms scale: 1.0)
40.13 Arasan 16.1 (time: 100 ms scale: 1.0)
40.03 Alfil 13.1 x64 MT (time: 100 ms scale: 1.0)
40.01 GNU Chess 5.50-64 (time: 100 ms scale: 1.0)
39.92 Hannibal 1.3x64 (time: 100 ms scale: 1.0)
39.84 Tornado 4.88 x64 (time: 100 ms scale: 1.0)
39.78 Komodo 5.1 64-bit (time: 100 ms scale: 1.0)
39.66 Atlas 3.50 x64 (time: 100 ms scale: 1.0)
39.16 SlowChess 2.96 (time: 100 ms scale: 1.0)
38.84 Gull 2.2 x64 (time: 100 ms scale: 1.0)
38.54 Quazar 0.4 x64 (time: 100 ms scale: 1.0)
35.63 Texel 1.02 64-bit (time: 100 ms scale: 1.0)
34.33 Deep Sjeng WC2008 x64 (time: 100 ms scale: 1.0)

A score of over 55 starts to get suspicious, while greater than 60 is highly likely to indicate a derivative work. As you can see, all of Maverick's scores are in the mid-40s or lower! So it passed the test.

Some people seemed to think it was odd that I asked about Maverick's similarity to other engines. It was just simple curiosity. I know Maverick isn't a clone. Having gone through the process, I now have more confidence in the similarity test.

Thanks again Adam for taking the time to test Maverick!

Maverick 0.5 Released!

You can download a new version of Maverick on the download page.

I've made quite a few changes since Maverick 0.2:

  • Added more basic endgame knowledge
  • Added passed pawn evaluation (I think this added a lot of strength – see the sketch after this list)
  • Beefed up the evaluation routine to include basic terms such as mobility
  • Fixed a nasty hash bug to do with mating positions
  • Added one-reply-to-check extensions in the first ply of the quiescent search (helps with some tactics – little impact on playing strength)
  • Enabled hash cutoffs at PV nodes
  • Nodes per second have dropped considerably due to the slower evaluation (now between 3 million and 5 million on my 2.8 GHz i7 notebook)
  • A bazillion tweaks
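Since passed pawns get a mention above, here's a minimal sketch of the sort of bitboard test involved. It isn't Maverick's code – the mask array, bonus values and RANK macro are purely illustrative assumptions:

#include <stdint.h>

/* For each colour and square, a precomputed mask of the squares an enemy pawn
   would have to occupy to stop the pawn (own file and both adjacent files,
   on all ranks in front of it).  Assumed to be filled in at start-up.        */
extern uint64_t passed_pawn_mask[2][64];

/* Bonus that grows as the pawn advances (values purely illustrative) */
static const int passed_bonus[8] = { 0, 10, 15, 25, 40, 60, 90, 0 };

#define RANK(s) ((s) >> 3)   /* 0..7 from White's point of view */

/* Score White's passed pawns: a pawn is passed if no enemy pawn can ever
   block it or capture it on its way to promotion                            */
int white_passed_pawn_score(uint64_t white_pawns, uint64_t black_pawns)
{
    int score = 0;
    uint64_t pawns = white_pawns;

    while (pawns) {
        int s = __builtin_ctzll(pawns);   /* lowest set bit = square (GCC/Clang) */
        pawns &= pawns - 1;               /* clear that bit                      */

        if ((passed_pawn_mask[0][s] & black_pawns) == 0)
            score += passed_bonus[RANK(s)];
    }
    return score;
}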

How Strong is Maverick 0.5?

This is the first version of Maverick which I regard as reasonably strong. In my tests it's at least 200 ELO stronger than version 0.2. If I had to guess I would say it's about 2350 ELO on the CCRL rating scale.

Here are some test results:

Match                               Maverick   Opponent   Percentage (%)
Maverick 0.5 – Fruit 1.0                50.5       49.5         50.50
Maverick 0.5 – Dragon 4.6               62.5       37.5         62.50
Maverick 0.5 – Phalanx XXIII            57.5       42.5         57.50
Maverick 0.5 – Tao 5.6                  38.0       62.0         38.00
Maverick 0.5 – TJchess 1.1U-x64         57.0       43.0         57.00
Maverick 0.5 – Maverick 0.2 x64         75.5       24.5         75.50

All the games were played at 1 min per game on a 2.8 GHz Haswell i7 notebook.

Maverick only lost to Tao 5.6, a relatively old (2004) engine, but one I really like. Tao's rating is about 2500 ELO, so this result was expected. I was surprised Maverick did so well against Dragon and Phalanx. I suspect their CCRL ratings of over 2350 ELO may be inflated.

Mate in 21:

I've used this simple position for many years to test hash tables. I first came across it in an advert for the Fidelity Mach III back in about 1988. If a hash table is working as it should, an engine will have no problem finding c7, followed by an opposition king maneuver and the queening of the white pawn on b8.

[Diagram: Mate in 21]

FEN: 2k5/8/1pP1K3/1P6/8/8/8/8 w - - 0 1

The previous version of Maverick could find c7 but when I added the hash cutoffs at PV nodes Maverick was able to find the mate in 21 in less than 10 seconds – I was pleased.
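I won't reproduce Maverick's transposition table code, but the change is roughly the following: if the stored entry is deep enough and holds an exact score, return it even at a PV node. Here is a sketch – the entry layout and names are assumptions rather than Maverick's actual structures:

#include <stdbool.h>
#include <stdint.h>

enum { BOUND_EXACT, BOUND_LOWER, BOUND_UPPER };

typedef struct {
    uint64_t key;     /* full hash key for verification    */
    int16_t  score;   /* stored search score               */
    int8_t   depth;   /* depth the score was searched to   */
    uint8_t  bound;   /* exact score, lower or upper bound */
} tt_entry;

/* Return true (and fill *score) when the stored entry allows a cutoff.
   Allowing the BOUND_EXACT case even when is_pv is true is the
   "hash cutoffs at PV nodes" change mentioned above.                   */
bool tt_cutoff(const tt_entry *e, uint64_t key, int depth,
               int alpha, int beta, bool is_pv, int *score)
{
    if (e->key != key || e->depth < depth)
        return false;

    if (e->bound == BOUND_EXACT) {        /* usable at PV and non-PV nodes */
        *score = e->score;
        return true;
    }
    if (!is_pv) {                         /* bounds only cut at non-PV nodes */
        if (e->bound == BOUND_LOWER && e->score >= beta)  { *score = e->score; return true; }
        if (e->bound == BOUND_UPPER && e->score <= alpha) { *score = e->score; return true; }
    }
    return false;
}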

Next Developments:

I'm planning to release the next version when it can comfortably beat Tao 5.6, YACE and maybe Hermann 2.8. These engines are above 2500 ELO, so it should take me a while. My aim was to have Maverick playing at 2500 ELO by Christmas – I think that's still possible. The current version has no king safety and no selectivity apart from null move, so I think another 150 ELO is achievable.

Please let me know if you have any test results for Maverick or if you find a bug!

Yet Another Chess Programming Approach to Bug Hunting

You all know I'm a maniac when it comes to hunting down bugs. I've posted about it before (Bugs-Bugs-Bugs). I suspect many engines don't get past the 2200 ELO point simply because they are buggy.

Thomas Petzke made a post on CCC which I found interesting:

I hit the same issue several times, different node counts between Debug and Release, between x32 and x64 and Intel vs MSVC.

Was painful to hunt it down but at the end there was always a bug

Comparing node counts using different compilers is an interesting idea, and one I haven’t previously tried. Since I’ve spent many hours perfecting my perft node counts I was quite sure Maverick would sail through this test.

I was wrong!!

I set up the test and used one of the complex perft positions as an example. I then created debug and release versions of Maverick. To my horror when performing a fixed ply search there were different node counts for the two versions.

How could this be? I know the perft node counts are all ok; so my move-generation must be ok – right? Doubt starts to creep into my thinking.
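For anyone who hasn't used it, perft simply counts the leaf nodes of the full legal move tree to a fixed depth, so it exercises move generation and make/unmake but says nothing about the evaluation or the search itself. A sketch – the board type, move list and engine calls are placeholders for whatever your engine provides:

#include <stdint.h>

/* Hypothetical engine interface – substitute your own types and routines */
typedef struct board board_t;
typedef struct { int moves[256]; int count; } movelist_t;
extern void generate_legal_moves(board_t *b, movelist_t *ml);
extern void make_move(board_t *b, int move);
extern void unmake_move(board_t *b, int move);

/* Count the leaf nodes of the legal move tree to the given depth */
uint64_t perft(board_t *b, int depth)
{
    if (depth == 0)
        return 1;

    movelist_t ml;
    uint64_t nodes = 0;

    generate_legal_moves(b, &ml);
    for (int i = 0; i < ml.count; i++) {
        make_move(b, ml.moves[i]);
        nodes += perft(b, depth - 1);
        unmake_move(b, ml.moves[i]);
    }
    return nodes;
}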

Here’s how I tackled the problem.

First I found the minimum depth which triggered a difference in node counts. In my case this was the 10th ply of a late middlegame position.

Then I created a small function which writes a position’s vital statistics (FEN, hash key and move to be played) to a text file. I added a call to this function during the “make_move” routine, just before I make every single move. Effectively I’m writing the whole search tree to a text file.
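Here's roughly what such a function can look like – a sketch rather than Maverick's actual code, with board_to_fen, move_to_uci and board_hash standing in for whatever helpers your engine already has:

#include <stdio.h>
#include <stdint.h>

struct board;   /* the engine's board type */

/* Hypothetical helpers assumed to exist elsewhere in the engine */
extern void board_to_fen(const struct board *b, char *fen, int len);
extern void move_to_uci(int move, char *str, int len);
extern uint64_t board_hash(const struct board *b);

static FILE *tree_log;   /* opened once at the root, e.g. fopen("tree.txt", "w") */

/* Called from make_move(), just before the move is played, so the whole
   search tree ends up in the text file in visit order                     */
void log_tree_node(const struct board *b, int move)
{
    char fen[128], uci[8];

    board_to_fen(b, fen, sizeof fen);
    move_to_uci(move, uci, sizeof uci);
    fprintf(tree_log, "%s %llu %s\n", fen, (unsigned long long)board_hash(b), uci);
}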

I was then able to run the two versions, each of which wrote its search tree to a text file. I renamed them debug-tree.txt and release-tree.txt. Both files were about 35 MB in size.

To search by hand through these files would be a boring and tedious task. I'm not a Linux user but I believe Linux has a "diff" command which finds the differences between two files. Windows doesn't come with anything similar. However, there is a useful utility called WinMerge which does the same thing. It's also free.

When I compared the two files this is what I saw:

[Screenshot: WinMerge comparison of the two search-tree files]

The yellow sections show where the two trees diverge. As you can see, the first difference was at line 75,416! The release version seems to go back to the position with hash key 15560305028174901452 after evaluating it and play the same move again (e7e4), whereas the debug version plays another move (f3e3). It looks like the release version fired a cutoff, whereas the debug version didn't. Here's the critical position after f6e4:

[Diagram: the critical position]

It doesn't look like anything special. I then inserted a breakpoint to stop the search when it encountered this position (i.e. when there was a hash key match).

Sure enough, the two versions evaluated the same position differently. The bug must be in the evaluation!

Since I knew the problematic position, finding the bug proved to be straightforward. I stepped through the evaluation, comparing the results at each point until I tracked down the bug.

It turns out the bug was in the connected passed pawn code, which was evaluating the white pawns on f2 and g3. The buggy code was the following:

//-- Connected passed pawn mask
connected_pawn_mask[s] = king_mask[s] & neighboring_file[s];

neighboring_file is a pre-calculated array which holds the mask of the neighboring files. However, it takes the file number as an index, not the square. This was the bug. The corrected code is as follows:

//-- Connected passed pawn mask
connected_pawn_mask[s] = king_mask[s] & neighboring_file[COLUMN(s)];
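To spell out why the original line misbehaved: if neighboring_file has one entry per file (eight in total), indexing it with a square number (0 to 63) reads past the end of the array and picks up whatever happens to sit in memory after it – exactly the kind of thing that can differ between debug and release builds. Here is a sketch of how such a table and the COLUMN macro might be set up (the initialisation is illustrative, not Maverick's code):

#include <stdint.h>

#define COLUMN(s) ((s) & 7)   /* file 0..7 of square 0..63 (assumed layout) */

uint64_t neighboring_file[8];   /* mask of the files adjacent to each file */

static const uint64_t FILE_A = 0x0101010101010101ULL;

void init_neighboring_files(void)
{
    for (int f = 0; f < 8; f++) {
        uint64_t mask = 0;
        if (f > 0) mask |= FILE_A << (f - 1);
        if (f < 7) mask |= FILE_A << (f + 1);
        neighboring_file[f] = mask;
    }
}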

Once corrected I created and compared the MSVC Release, MSVC Debug and MinGW x64 versions and they all produced the same node count!

This isn’t a technique I’ve used before. I always assumed perft would track down search errors, and flipping the board and comparing white and black evaluations of the flipped position would find evaluation errors. I was wrong. This will be another technique I’ll put in my debugging toolbox.
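For completeness, that flip test is easy to automate with an assert. A sketch, assuming a hypothetical mirror_board helper that swaps the colours and flips the ranks:

#include <assert.h>

struct board;   /* the engine's board type */

/* Hypothetical helpers – substitute the engine's own */
extern int  evaluate(const struct board *b);
extern void mirror_board(const struct board *src, struct board *dst);

/* A symmetrical evaluation must give the same score after the colours are
   swapped and the board is flipped vertically                              */
void check_eval_symmetry(const struct board *b, struct board *scratch)
{
    mirror_board(b, scratch);
    assert(evaluate(b) == evaluate(scratch));
}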

Is your node count the same for different compilers?

Create a Polyglot Chess Opening Book (Part 2)

If you're having problems viewing the video, watch it on YouTube – Create a Polyglot Chess Opening Book

Download Links:

Creating an Opening Book for Maverick

Maverick is the first chess engine I've written which can use its own opening book. I decided at an early stage that Maverick would read Polyglot opening books. It makes so much sense. Polyglot seems to be a great tool for creating opening books and its format seems to be emerging as the de facto opening book standard.
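As far as the file format goes, a Polyglot book is just a sequence of fixed-size 16-byte entries sorted by position key: an 8-byte hash of the position, a 2-byte move, a 2-byte weight and a 4-byte learn field, all stored big-endian. A minimal sketch of reading one entry (error handling trimmed; not Maverick's actual code):

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t key;      /* Polyglot hash of the position          */
    uint16_t move;     /* from/to/promotion packed into 16 bits  */
    uint16_t weight;   /* relative playing frequency             */
    uint32_t learn;    /* learning data, usually zero            */
} book_entry;

/* Polyglot entries are stored big-endian, so read them byte by byte */
static uint64_t read_big_endian(FILE *f, int bytes)
{
    uint64_t value = 0;
    for (int i = 0; i < bytes; i++)
        value = (value << 8) | (uint8_t)fgetc(f);
    return value;
}

/* Read the next 16-byte entry from an open .bin file; returns 0 at end of file */
int read_book_entry(FILE *f, book_entry *e)
{
    e->key    = read_big_endian(f, 8);
    e->move   = (uint16_t)read_big_endian(f, 2);
    e->weight = (uint16_t)read_big_endian(f, 2);
    e->learn  = (uint32_t)read_big_endian(f, 4);
    return !feof(f);
}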

So now I need to actually create an opening book.  I did some research on the internet and it really is quite straightforward. Here are the steps I took:

  1. There are some really good free tools which you'll need to download: SCID, PGN-Extract and Polyglot.
  2. You will also need a database of high quality games. I used Ed Schroeder's Million Base. Make sure your database is in PGN format.
  3. Open the MillionBase using SCID.
  4. We’ll be filtering the games using SCID’s excellent filtering system – select “Search / Header…” (Ctrl-Shift-H)
  5. I created a database consisting of only White wins by players over 2400 ELO with opponents over 2200 ELO (see the diagram of SCID's filter options)
  6. I then saved the white win database as “white2400.pgn” using SCID’s “Tools / Export All Filter Games” command
  7. Repeat for Black
  8. Now put both PGN files in the same folder as Polyglot.exe, PGN-Extract.exe and the batch file.
  9. The batch file contains the commands to create the opening book:
pgn-extract -s -C -N -V -tstartpos.txt -oclean-white2400.pgn white2400.pgn
polyglot make-book -only-white -pgn clean-white2400.pgn -bin w1.bin -max-ply 16 -min-game 50
polyglot make-book -only-white -pgn clean-white2400.pgn -bin w2.bin -max-ply 60 -min-game 5
polyglot merge-book -in1 w1.bin -in2 w2.bin -out w12.bin

pgn-extract -s -C -N -V -tstartpos.txt -oclean-black2400.pgn black2400.pgn
polyglot make-book -only-black -pgn clean-black2400.pgn -bin b1.bin -max-ply 16 -min-game 50
polyglot make-book -only-black -pgn clean-black2400.pgn -bin b2.bin -max-ply 60 -min-game 5
polyglot merge-book -in1 b1.bin -in2 b2.bin -out b12.bin

polyglot merge-book -in1 w12.bin -in2 b12.bin -out Maverick.bin

Pause 0
  1. The first line runs PGN-Extract and simply cleans the database, removing any games which don't start from the normal starting position.
  2. The second line creates an opening book of white moves, where the positions have occurred at least 50 times.
  3. The third line creates a broader book which gives more responses to a variety of openings.
  4. This is repeated for Black
  5. Finally all of the books are merged

The final opening book is a little over 1 MB (it's available on the download page) – quite small by most standards. I'll need to run some tests to see how good it is compared to other Polyglot books.

If there are any opening book specialists out there I’d appreciate any feedback. Can the process of creating the book be improved?

Bugs Bugs Bugs!

The chess engine Fruit has many strengths; one of the main ones, I believe, is simply the lack of bugs. Fruit's author Fabien Letouzey has a defensive coding style. He litters his code with "assert" statements. When he was actively developing Fruit he set out a challenge – he said that if you found a bug in Fruit he would fix a bug in your code (I can't find the link). I think this highlights his emphasis on finding and fixing bugs. This philosophy has certainly influenced my coding style.
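The cheapest way to copy that style is to assert the invariants you are relying on. Here is a trivial example of the kind of check that could sit at the top of a make_move routine – the helper names are hypothetical, and of course the asserts compile away in a release build:

#include <assert.h>
#include <stdint.h>

struct board;   /* the engine's board type */

/* Hypothetical accessors – substitute the engine's own */
extern int      piece_on(const struct board *b, int square);
extern uint64_t board_hash(const struct board *b);
extern uint64_t recompute_hash_from_scratch(const struct board *b);

void check_make_move_preconditions(const struct board *b, int from, int to)
{
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);   /* squares are on the board */
    assert(piece_on(b, from) != 0);                          /* there is a piece to move */

    /* the incrementally updated hash must always match a full recomputation */
    assert(board_hash(b) == recompute_hash_from_scratch(b));
}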

So now Maverick 0.2 is out I've spent most of my chess development time finding and fixing bugs. There is a great temptation to add knowledge and features and to improve the search. But a couple of times I've sat down and just focused on finding bugs. On some occasions I have a specific bug or odd behavior which I've identified and would like to fix. But often I'm just hunting for bugs. I might set myself the goal of finding ten (small) bugs in two hours. I then go through Maverick's code with a fine-tooth comb, forcing myself to challenge the logic of every line and asking myself what I'm assuming at each stage. It's amazing what you discover. If you have a chess engine, give it a try. Set aside some time to focus on simply finding bugs.

Let me know how you get on!

How strong is Maverick 0.2?

Maverick 0.2 has been out a couple of days now. I've run some test games and I'm really quite surprised at how strong it seems to be.

Fruit 1.0:

The first test was against Fruit 1.0. I played this match using Shredder's GUI. As I expected, Fruit won. I really was dazzled by Fruit's silky smooth search as it carved its way through each ply (especially in the endgame).

  • CCRL Rating = ???? (between 2300 and 2400)
  • Time = 1 minute plus 1 second per move
  • Result for Maverick: Wins = 24, Losses = 63, Draws = 13
  • Percentage = 30.5%
  • ELO Difference = -146

I was pleased with this result. However, it would seem Fruit 1.0's strength is not well established. As far as I can tell it is somewhere between 2300 and 2400 ELO on the CCRL scale. This would put Maverick 0.2 between 2150 and 2250 ELO.

Phalanx XXIII:

The second opponent was Phalanx XXIII, another classic engine. As I watched them play it was clear Phalanx has much more knowledge than Maverick. Yet Maverick did well to win some games based on its speed.

  • CCRL Rating = 2387
  • Results for Maverick: Wins = 36, Losses = 49, Draws = 15
  • Percentage = 43.5%
  • ELO Difference = -45

Once again I was pleased.  Based on this result Maverick is over 2300 ELO, which is probably too high but let’s see.

And remember, at this stage Maverick is lacking some standard features:

  • No mobility in the evaluation
  • No passed pawn code
  • No king safety
  • No pawn structure code
  • No late move reduction
  • No Internal Iterative Deepening
  • Lots of endgame evaluation missing

So I’m encouraged by Maverick’s strength.  If you run any sort of ratings test I’d be interested in the results.

I will not release another version until it has gained 100 ELO.  This may be some time.

New Maverick Version 0.2

It's only been a few days since I launched Maverick 0.11. However, yesterday was unusual. I spent most of the day at Short Hills Mall in New Jersey (don't ask!). I set myself the task of finding 10 bugs. In the end I found three. As it turns out these three severely impacted Maverick's playing ability. So the latest version seems to play at around 2150 ELO on the CCRL scale (i.e. about 100 ELO better than version 0.11).

The download links are here and the source code is here.

New Version of Maverick

It’s been just under two weeks since I released Maverick into the wild.   Over that period there have been over 200 downloads – quite amazing.

I’ve also been busy.  After setting up the testing framework I had some time to add some standard chess engine components:

  • Checks in the first ply of the quiescent search
  • Standard endgame knowledge (KQ vs. K, KR vs. K, KBB vs. K, KBN vs. K)
  • Null move heuristic
  • Hash tables
  • Bug fixes (including a nasty time management bug)

To my surprise the null move heuristic didn't add much value until I added the hash tables. Once the hash tables were added the increase in strength was noticeable. By my tests this version of Maverick is now better than Monarch 1.7 and approximately 200 ELO better than the previous version. In a 300 game match it convincingly beat Monarch at lightning chess (60 moves in 20 secs) with 140 wins, 104 losses and 56 draws, which is 56% – yippee!! The win was replicated at the slightly slower time control of 60 moves in 5 mins, when Maverick scored 38 wins, 30 losses and 32 draws (54%). If these results scale to other opponents it would give Maverick 0.11 a rating of about 2050 ELO on the CCRL rating list. You can download Maverick 0.11 here.
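For readers new to the technique, the null move heuristic simply hands the opponent a free move and checks whether a reduced-depth search still fails high; if it does, the real position is almost certainly good enough to cut off. Here is a sketch – the board type, make/unmake calls and search signature are assumptions, not Maverick's code:

#include <stdbool.h>

struct board;   /* the engine's board type */

/* Hypothetical engine interface */
extern void make_null_move(struct board *b);
extern void unmake_null_move(struct board *b);
extern int  search(struct board *b, int depth, int alpha, int beta);

#define NULL_MOVE_R 2   /* typical depth reduction */

/* Returns non-zero when the null-move search says we can cut off at beta */
int null_move_cutoff(struct board *b, int depth, int beta, bool in_check)
{
    if (in_check || depth <= NULL_MOVE_R)   /* skip when in check or at shallow depths */
        return 0;

    make_null_move(b);
    int score = -search(b, depth - 1 - NULL_MOVE_R, -beta, -beta + 1);
    unmake_null_move(b);

    return score >= beta;   /* even a free move for the opponent can't get below beta */
}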

EDIT: Version 0.11 replaces Version 0.10, which had a time management bug.