Automatic Tuning of Evaluation Function

When it comes to Maverick’s evaluation function I’m frustrated and excited in equal measure!

My focus over the last couple of months has been improving Maverick’s evaluation function. However, I’ve found it quite difficult to improve on Maverick 0.51’s actual playing strength. Adding extra knowledge (e.g. backward pawns) seems to add little in terms of playing strength. I’m struck by the massive number of ways to encode chess knowledge. I feel like a blind archer as I make wild guesses trying to calibrate the parameters.

There must be a better way!

Early on in Maverick’s development I came across Thomas Petzke’s approach to tuning an evaluation function. He uses a form of genetic algorithm (PBIL) to tune the parameters. PBIL optimization is really neat – it represents each number in binary format, and each bit of each number is represented as a floating point value between zero and one. As the system “learns”, the floating point values are “trained” and gravitate to either zero or one based on the training data. In Thomas’ case he played a few games of chess to assess the fitness of each set of parameters. This is expensive – but ultimately game-playing ability is the attribute we’d like to optimize, so maybe the training time is justified.

Back in 2000 I worked a lot with standard genetic algorithms. I used them to evaluate marketing campaigns. I think PBIL may be even better for evaluating marketing campaigns (but that’s a story for another day). I’m certainly interested in using it to tune Maverick’s evaluation function. The only problem is that Thomas’ method takes ages to complete (weeks!). I’d prefer a method which is quicker.

Then I came across a post on CCC by Peter Österlund:

How Do You Automatically Tune Your Evaluation Tables

Peter outlines a new way to tune an evaluation function. His approach takes 2.5 million chess positions and minimizes the following fitness function:

E = 1/N * sum(i=1,N, (result[i] - Sigmoid(QuiescentScore(pos[i])))^2)

This is really quite interesting for the following reasons:

  • Since we’re not playing complete chess games this runs *much* faster – maybe less than one day of computing time
  • The sigmoid function is *really* sensitive in the -100 to +100 centipawn range. This is a critical region where virtually all games are decided. If we can create an evaluation function which accurately evaluates this range then we’re likely to have a strong chess engine
  • I suspect Houdini uses a similar technique to calibrate its evaluation function, since its evaluation is linked to the probability of winning. Robert Houdart mentions this on his website: “Houdini 4 uses calibrated evaluations in which engine scores correlate directly with the win expectancy in the position. A +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time”
  • Some people who have tested this training approach have reported good results
  • When the pawn evaluation parameters were tuned this way (via the PBIL optimization), they varied considerably between the middlegame and endgame. The middlegame value of a pawn came out at 50 centipawns, while the endgame value was 145 centipawns. If these values are robust and used in normal play they are likely to produce exciting chess, where the engine is happy to sacrifice a pawn for a modest positional advantage. This sounds like the recipe for an interesting engine – which is one of the goals for Maverick!

So I’m in the process of rewriting Maverick’s evaluation code to accommodate the PBIL algorithm. I’m also writing a PGN parser (in Delphi) so I can train the evaluation function using different sets of training positions.

With all of this re-writing it may be a little while before I can release a new version of Maverick. My aim is still to take Maverick to 2500 ELO using only an improved evaluation function!

I’ll keep you posted!

Maverick Wins CCRL Division 7 (and Gets a CCRL Rating)

Over the past month Graham Banks has been running the Division 7 competition. I was delighted when Maverick managed to win with a score of 27.5 out of 44 games. After nine rounds Maverick languished in the bottom half of the table. It managed to fight back and win! During the tournament I logged onto Graham’s site quite a few times and it was nice to chat with Graham and Erik. There were many nail-biting games – not good for the blood pressure!

Graham then ran a gauntlet competition for Maverick to get enough games for a rating. It managed a respectable rating of 2317 ELO on the CCRL scale. You can see the details here:

Maverick’s CCRL Rating

As I mentioned in a previous post, Maverick doesn’t do so well at slow time controls, so I was happy it came out above 2300 ELO on CCRL.

Many thanks to Graham for taking the time to test Maverick!

Maverick’s Rating at Fast & Slow Time Controls…

I do most of my testing at fast time controls – sometimes game in 10 seconds, or 5 seconds plus a small increment. This enables me to evaluate and tune changes using oodles of games.

Based on these super-fast tests, Maverick 0.51’s rating seems to be about 2375 ELO on the CCRL scale. For example, I pitted Maverick against TJ-Chess, rated 2367 ELO, at 10 seconds per game. After 1000 games Maverick won with 518 wins, 347 losses and 135 draws. This is +60 ELO. Maverick seems to get similar results against other engines close to TJ-Chess’ strength, e.g. OliThink and Dragon 4.6.

So when Graham Banks tested Maverick in Division 6 I thought it would do quite well. I was wrong! Maverick ended up in 11th place out of twelve participants (TJ Chess came in 4th):

Division 6 Results

I thought I’d investigate and run a match at slower time controls. I used the same time controls as the Division 6 tournament (40 moves in 25 minutes) on my 2.8 GHz i7. The results were not what I expected (or hoped for). Maverick lost with 20 wins, 40 draws and 40 losses! This result suggests TJ-Chess is about 100 ELO better at slower time controls. That’s a swing of 160 ELO between slow and fast time controls – far more than I thought.

As a result I’ve revised my estimate of Maverick’s strength based on the time controls. At super-fast time controls it’s about 2400 ELO, while at longer time controls it’s 2250 ELO!

Why the Difference?

I suspected the branching factor in Maverick was quite high. I ran some tests and indeed it seems to be about 4.5. This is high by today’s standards – I think Stockfish’s is about 2.0. It means every ply takes about 4.5 times longer to complete than the previous ply. At super-fast time controls Maverick does quite well because it’s a relatively fast searcher (4.2 million nps on my i7). As the time controls get longer the branching factor takes its toll and the more selective engines out-search Maverick.

The high branching factor is almost certainly due to the lack of selectivity in Maverick’s search. It does use Null Move but it doesn’t use any form of Late-Move-Reduction, which seems to be the most common, and most effective, form of selectivity. This is by design. I have a hunch that if I can create a reasonably good evaluation function, I’ll be able to add a more effective form of LMR selectivity, guided by the evaluation function. My aim is to take Maverick up to 2500 ELO before I add this selectivity. It looks like I have 200 ELO to go!

New Bitboard Viewer (Now with Hexadecimal)

After stuffing myself silly with a scrummy Christmas dinner, I felt I needed to work it off a little with a bit of programming. I’ve had a number of requests to add hexadecimal support to the Bitboard Viewer. The original version also had a nasty overflow bug which showed up if bit 63 was set. I managed to fight off the tryptophan from the turkey to correct the bug and add hexadecimal support:


I also added the ability to click on a file or rank title to XOR the bits in the corresponding file or rank.

You can download it here.


Maverick 1.0 To Do List…

It’s been two months since I did any significant chess coding – life got in the way! I did tinker here and there but nothing special. However, Christmas is around the corner and I’m hoping to have some time to relax with an IPA and do some coding.

I thought it would be worthwhile spending some time to think about what I need to do to create version 1.0 of Maverick. I’ve said from the start, version 1.0 will have a playing strength of 2500 ELO (i.e. Grandmaster level) on the CCRL scale. So I’ll release Maverick 1.0 when I can beat all of the following opponents in a 100 game match at 5 mins per move:

  • Tao 5.6 – I’ve admired Bas Hamstra’s engine for a long time – I hope one day he continues the development
  • YACE  – another classic
  • Anmon 5.6
  • Hermann 2.8 64 bit 

I tested Maverick 0.51 at fast time controls. One minute per game was the slowest time control I used. In my tests this gave Maverick a rating of about 2370 ELO. However, in the CCRL Division 6 tournament Maverick struggled. I thought it would be in the top half of the table, but it finished one off the bottom! I suspect this is due to the time controls. Maverick is quite a fast searcher, but I intentionally haven’t put any real selectivity into the search apart from null move. This probably results in a higher branching factor than other engines with Late-Move-Reduction. At super-fast time controls Maverick may do well due to its raw speed, but its higher branching factor holds it back at slower time controls. When version 0.51 is tested by CCRL I suspect it will be 100 ELO lower than my tests, i.e. about 2270 ELO. This means I have a long way to go to reach 2500 ELO.

Before I set out my to-do list for version 1.0 it might be helpful to go over what Maverick already contains i.e. the current state-of-the-engine. Here’s a reasonably comprehensive list:

  • Magic bitboard engine
  • Move dictionary architecture
  • Hash tables (4-slot structure)
  • Killer Moves
  • Evaluation function with the following terms:
    • Middle-game to Endgame interpolation
    • Piece square tables
    • All basic endgames (e.g. K + B + N vs. K)
    • Mobility
    • Passed pawns (a major impact on strength)
    • Basic pawn weaknesses (extremely rudimentary)
  • Null Move (R = 3)
  • Static-Exchange-Evaluation for move ordering and quiescent move pruning
  • Checks in the first ply of the quiescent search
  • Extend on one-reply-to-check on the first ply of quiescent search
  • Internal Iterative Deepening
  • Enhanced Transposition cutoffs
  • Opening book
  • Pondering (really easy using the UCI protocol)

So I’d say Maverick isn’t too complex and doesn’t contain anything out of the ordinary.

Here’s the list of to-do’s for version 1.0:

  • CLOP tuning
  • Revisit some parameters for tuning:
    • Piece-Square-Tables
    • Mobility
    • Passed pawns
  • Add king safety
  • Add potential passed pawn terms to the evaluation
  • Add pawn storm code when castling on opposite sides
  • Add bad exchange terms to the evaluation (and tune)
  • Change the maximum depth measure to exclude the quiescent search (I think?)
  • Use magic multipliers to calculate mobility and center control
  • Improve knight placement and isolation code
  • Add pawn and king endgame code (this could be a big strength gain)
  • Add special endgame code for K + R vs. K + N (inspired by Thomas Petzke)
  • Tweak move ordering:
    • Evade check move history
    • Null Move capture-evade
  • Multiple PV
  • Support for Chess960
  • Possibly Mac and Linux support (need to get a development environment up and running for both)

I’m hoping this list will be enough to push 2500 ELO – let’s see!

Have I missed anything?

Bug Fix Release of Maverick (Version 0.51)

Maverick 0.5 has been in the wild for about a month. I’m pleased to say it seems to be playing at 2300 ELO or above.

However, occasionally it would crash. This really niggled me.

I do quite a lot of turbo-testing using CuteChess-Cli at fast time controls.  So I had convinced myself the occasional crash was probably an overload of the standard input and output pipes (I’m not even sure if this is possible). However, since I’ve tested Maverick quite thoroughly using perft positions, I was confident the basic chess engine routines were bug free. I was wrong.

This week I have been down in São Paulo, Brazil on business. Graham Banks has been testing Maverick and broadcasting the games. Since I had nothing better to do I logged in and watched a few games. It’s amazing how nerve-wracking it is to watch your engine play a game at 40 moves in 40 minutes. As I was watching, and to my surprise, Maverick suddenly crashed. I quickly copied the position and thankfully was able to reproduce the crash!

The bug was quite bizarre. Take a look at the position below:


After black has played pawn to g4, white is in check from the queen on h6. However, the code which finds the source of the check is shown below:

This was a “copy and paste” error. The code thinks the knight on c2 is actually a king attacking the white king. This was the bug. It was easy to fix. Hopefully this will result in no more crashes.

You can download the fixed version (Maverick 0.51) on the Download Page. There are no strength improvements in this bug-fix release. If you are testing Maverick you can keep the games already played.

Is Maverick a Clone?

Let me state unequivocally, Maverick is not a clone!

There have always been problems with clones in computer chess. The first one I recall was Quickstep. This was a program competing in the 1989 World Computer Chess Championship in Portorož. After a couple of rounds it was discovered to be a clone of Richard Lang’s Mephisto Almeria program. It was scandalous at the time. Thankfully there was little cloning activity for about ten years after this. Then Crafty had a few clones (e.g. Bionic Impakt). But when Fruit burst onto the scene cloning became a big problem. This then led to the Rybka controversy (which I’m not going to get into).

The computer chess community has developed a test to quickly identify clones.  This work was pioneered by Adam Hair.  You can read about it here – A Pairwise Comparison of Chess Engine Move Selections. I must say I was a little skeptical about the test. I wondered if a basic engine with piece-square-tables, mobility and passed-pawn code would choose similar moves even if they weren’t at all related.  Adam was kind enough to put Maverick through his test – here are the results:

------ Maverick 0.5 x64 (No Popcount) (time: 100 ms scale: 1.0) ------
45.50 Fruit 2.1 (time: 100 ms scale: 1.0)
45.24 Movei00_8_438 (time: 100 ms scale: 1.0)
44.74 SmarThink 1.20 (time: 100 ms scale: 1.0)
44.72 Strelka 2.0 B x64 (time: 100 ms scale: 1.0)
44.50 Fruit 2.2.1 (time: 100 ms scale: 1.0)
44.32 Doch64 09.980 JA (time: 100 ms scale: 1.0)
44.19 Loop 2007 32-Bit (time: 100 ms scale: 1.0)
44.17 Strelka 1.8 UCI (time: 100 ms scale: 1.0)
43.99 Toga II 1.0 (time: 100 ms scale: 1.0)
43.72 Nemo SP64n 1.0.1 Beta (time: 100 ms scale: 1.0)
43.38 Rybka 1.0 Beta 32-bit (time: 100 ms scale: 1.0)
43.26 Daydreamer 1.75 JA (time: 100 ms scale: 1.0)
43.17 Twisted Logic 20090922 (time: 100 ms scale: 1.0)
42.94 Nebula 2.0 (time: 100 ms scale: 1.0)
42.91 RedQueen 1.1.4 (time: 100 ms scale: 1.0)
42.66 Cyrano 0.6b17 JA (time: 100 ms scale: 1.0)
42.61 Hamsters 0.7.1 (time: 100 ms scale: 1.0)
42.60 Naum 4.2 (time: 100 ms scale: 1.0)
42.33 Murka 3 x64 UCI (time: 100 ms scale: 1.0)
42.32 spark-1.0 (time: 100 ms scale: 1.0)
42.26 Octochess revision 5190 (time: 100 ms scale: 1.0)
41.81 DiscoCheck 4.3 JA (time: 100 ms scale: 1.0)
41.77 Bobcat 3.25 (time: 100 ms scale: 1.0)
41.67 cheng3 1.07 (time: 100 ms scale: 1.0)
41.58 Gandalf 6.0 (time: 100 ms scale: 1.0)
41.37 Glass 2.0 PERSONALITY (time: 100 ms scale: 1.0)
41.28 RobboLite 0.085d3 x64 (time: 100 ms scale: 1.0)
41.22 Spike 1.4 (time: 100 ms scale: 1.0)
41.04 Houdini x64 1_CPU (time: 100 ms scale: 1.0)
40.93 Houdini 3 x64 (time: 100 ms scale: 1.0)
40.79 Gaviota v0.86 (time: 100 ms scale: 1.0)
40.75 Ruffian 2.1.0 (time: 100 ms scale: 1.0)
40.71 Stockfish 4 64 (time: 100 ms scale: 1.0)
40.70 Godel 2.3.7 (time: 100 ms scale: 1.0)
40.62 Pawny 1.0.x64 (time: 100 ms scale: 1.0)
40.59 Shredder 11 UCI (time: 100 ms scale: 1.0)
40.37 MinkoChess 1.3 x64 (time: 100 ms scale: 1.0)
40.36 Gaviota v0.87-a8 (time: 100 ms scale: 1.0)
40.24 Komodo CCT 64-bit (time: 100 ms scale: 1.0)
40.19 iCE 1.0 v1619 x64 (time: 100 ms scale: 1.0)
40.13 Arasan 16.1 (time: 100 ms scale: 1.0)
40.03 Alfil 13.1 x64 MT (time: 100 ms scale: 1.0)
40.01 GNU Chess 5.50-64 (time: 100 ms scale: 1.0)
39.92 Hannibal 1.3x64 (time: 100 ms scale: 1.0)
39.84 Tornado 4.88 x64 (time: 100 ms scale: 1.0)
39.78 Komodo 5.1 64-bit (time: 100 ms scale: 1.0)
39.66 Atlas 3.50 x64 (time: 100 ms scale: 1.0)
39.16 SlowChess 2.96 (time: 100 ms scale: 1.0)
38.84 Gull 2.2 x64 (time: 100 ms scale: 1.0)
38.54 Quazar 0.4 x64 (time: 100 ms scale: 1.0)
35.63 Texel 1.02 64-bit (time: 100 ms scale: 1.0)
34.33 Deep Sjeng WC2008 x64 (time: 100 ms scale: 1.0)

A score of over 55 starts to get suspicious, while greater than 60 is highly likely to indicate a derivative work. As you can see, all of Maverick’s scores are in the mid-40s or below. So it passed the test.

Some people seemed to think it was odd that I asked about Maverick’s similarity to other engines. It was just simple curiosity – I know Maverick isn’t a clone. Having put Maverick through it, I now have more confidence in the similarity test itself.

Thanks again Adam for taking the time to test Maverick!

Maverick 0.5 Released!

You can download a new version of Maverick on the download page.

I’ve made quite a few changes since Maverick 0.2:

  • Added more basic endgame knowledge
  • Added passed pawn evaluation (I think this added a lot of strength)
  • Beefed up the evaluation routine to include basic terms such as mobility
  • Fixed a nasty hash bug to do with mating positions
  • Added one-reply-to-check extensions in the first ply of the quiescent search (helps with some tactics – little impact on playing strength)
  • Enabled hash cutoffs at PV nodes
  • Nodes per second have dropped considerably due to the slower evaluation (now between 3 million and 5 million on my 2.8 GHz i7 notebook)
  • A bazillion tweaks

How Strong is Maverick 0.5?

This is the first version of Maverick which I regard as reasonably strong. In my tests it’s at least 200 ELO stronger than version 0.2. If I had to guess I would say it’s about 2350 ELO on the CCRL rating scale.

Here are some test results:

[table align="center" width="800" colwidth="350|50|50|50" colalign="left|right|right|right"]

Match, Maverick, Opponent, Percentage(%)

Maverick 0.5 – Fruit 1.0,50.5,49.5,50.50%

Maverick 0.5 – Dragon 4.6,62.5,37.5,62.50%

Maverick 0.5 – Phalanx XXIII,57.5,42.5,57.50%

Maverick 0.5 – Tao 5.6,38.0,62.0,38.00%

Maverick 0.5 – TJchess 1.1U-x64,57.0,43.0,57.00%

Maverick 0.5 – Maverick 0.2 x64,75.5,24.5,75.50%


All the games were played at 1 min per game on a 2.8 GHz Haswell i7 notebook.

Maverick only lost to Tao 5.6, a relatively old (2004) engine, but one I really like. Tao’s rating is about 2500 ELO so this result is expected. I was surprised Maverick did so well against Dragon and Phalanx. I suspect their CCRL ratings of over 2350 ELO may be inflated.

Mate in 21:

I’ve used this simple position for many years to test hash tables. I first came across it in an advert for the Fidelity Mach III back in circa 1988. If a hash table is working as it should, an engine will have no problem finding c7, followed by an opposition king maneuver and the queening of the white pawn on b8.

Mate in 21

FEN: 2k5/8/1pP1K3/1P6/8/8/8/8 w - - 0 1

The previous version of Maverick could find c7 but when I added the hash cutoffs at PV nodes Maverick was able to find the mate in 21 in less than 10 seconds – I was pleased.

Next Developments:

I’m planning to release the next version when it can comfortably beat Tao 5.6, YACE and maybe Hermann 2.8. These engines are above 2500 ELO so it should take me a while.  My aim was to have a Maverick playing at 2500 ELO by Christmas – I think that’s still possible.  In the current version there is no king safety or selectivity apart from null move; so I think another 150 ELO by Christmas is possible.

Please let me know if you have any test results for Maverick or if you find a bug!

Yet Another Chess Programming Approach to Bug Hunting

You all know I’m a maniac when it comes to hunting down bugs. I’ve posted about it before (Bugs-Bugs-Bugs). I suspect many engines don’t get past the 2200 ELO point simply because they are buggy.

Thomas Petzke made a post on CCC which I found interesting:

I hit the same issue several times, different node counts between Debug and Release, between x32 and x64 and Intel vs MSVC.

Was painful to hunt it down but at the end there was always a bug

Comparing node counts produced by different compilers is an interesting idea, and one I hadn’t previously tried. Since I’ve spent many hours perfecting my perft node counts I was quite sure Maverick would sail through this test.

I was wrong!!

I set up the test and used one of the complex perft positions as an example. I then created debug and release versions of Maverick. To my horror when performing a fixed ply search there were different node counts for the two versions.

How could this be? I know the perft nodes counts are all ok; so my move-generation must be ok – right? Doubt starts to creep into my thinking.

Here’s how I tackled the problem.

First I found the minimum depth which triggered a difference in node counts. In my case this was the 10th ply of a late middlegame position.

Then I created a small function which writes a position’s vital statistics (FEN, hash key and move to be played) to a text file. I added a call to this function during the “make_move” routine, just before I make every single move. Effectively I’m writing the whole search tree to a text file.

I was then able to run the two versions, creating two text files containing the search tree. I renamed them debug-tree.txt and release-tree.txt. Both files were about 35 MB in size.

Searching by hand through these files would be a boring and tedious task. I’m not a Linux user, but Linux has a “diff” command which finds the differences between two files. Windows doesn’t come with anything similar; however, there is a useful (and free) utility called WinMerge which does the same thing.

When I compared the two files this is what I saw:


The yellow sections show the variation. As you can see, the first divergence in the tree was at line 75,416! The release version seems to go back to the position after evaluating the position with hash key 15560305028174901452 and plays the same move again (e7e4), whereas the debug version plays another move (f3e3). It looks like the release version fired a cutoff, whereas the debug version didn’t. Here’s the critical position after f6e4:


It doesn’t look like anything special. I then inserted a break to stop the search when it encounters the position (i.e. when there is a hash key match).

Sure enough, the two versions evaluated the same position differently. The bug must be in the evaluation!

Since I knew the problematic position, finding the bug proved to be straightforward. I stepped through the evaluation, comparing the results at each point until I tracked down the bug.

It turns out the bug was in the connected passed pawn code, which was evaluating the white pawns on f2 and g3. The buggy code was the following:

neighboring_files is a pre-calculated array which returns a mask of the neighboring files. However, it takes the file number as an input, not the square. This was the bug. The corrected code is as follows:

Once corrected I created and compared the MSVC Release, MSVC Debug and MinGW x64 versions and they all produced the same node count!

This isn’t a technique I’ve used before. I had always assumed perft would track down search errors, and that flipping the board and comparing white and black evaluations of the flipped position would find evaluation errors. I was wrong. This will be another technique in my debugging toolbox.

Is your node count the same for different compilers?

Create a Polyglot Chess Opening Book (Part 2)

If you’re having problems viewing the video, watch it on YouTube – Create a Polyglot Chess Opening Book

Download Links: