Python Chess

I’m delighted to present this guest post by Niklas Fiekas, the creator of Python Chess. You may think Python Chess is just another chess engine. It isn’t. It’s a library of routines which can manipulate and analyze chess data using Python. After I learnt about Python Chess I immediately went to Codecademy and took their course on Python. I really see this set of tools becoming a key part of any testing or development framework.

Over to Niklas…

Python-chess by example: Automating the Bratko-Kopec Test

Decades ago (1982) Kopec and Bratko published a systematic test for assessing the strength of chess-playing programs. Despite its age it can still be fairly challenging, even for modern chess engines. Of course 24 test positions are not going to be enough for a definitive result, especially given that most positions are of a specific type: fairly closed, with solutions that often involve some kind of sacrifice.

For each of the 24 positions a unique best move is known. The engine is given 120 seconds to solve each position, and 1 point is awarded for finding the correct best move.

The positions are given as EPDs. In this article we are going to automate the process of testing a UCI engine, introducing python-chess along the way.


Why Python (and python-chess)?

When it comes to chess programming, C++ is often the language of choice because performance is critical. Not necessarily so in this case: all the intensive calculation is done by the engine. We just want to parse the given EPD lines, send them to the engine and handle its responses. Python is a very good tool for high-level work like this (or making a mini chess computer, or making a website to probe tablebases, or creating a cross-platform chess GUI). We will use python-chess to deal with the chess rules and the formats involved: EPD, FEN and the UCI protocol. python-chess can also read and write PGNs, read Polyglot opening books and probe Syzygy tablebases.

Chess positions

FEN: 1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1

BK.01 is one of the easier positions: Black to move and mate in 3

In python-chess a position is represented by a chess.Bitboard(), which can be created directly from a FEN.
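The original code snippet isn't reproduced in this post; here is a minimal sketch, assuming a recent python-chess release where the class has since been renamed chess.Board:

```python
import chess

# The BK.01 position, created directly from its FEN string
board = chess.Board("1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1")
```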

You can then check whether a specific move is legal.
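A sketch against the current python-chess API (Qd1+ is the known solution to BK.01):

```python
import chess

board = chess.Board("1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1")
move = chess.Move.from_uci("d6d1")   # Qd1+, the solution to BK.01
is_legal = move in board.legal_moves
```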

Or get the shorter SAN notation of the move.
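For example (again a sketch against the current API):

```python
import chess

board = chess.Board("1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1")
san = board.san(chess.Move.from_uci("d6d1"))   # UCI "d6d1" -> SAN "Qd1+"
```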

Or make a move and see whether it gives check (and much more).
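A sketch of making the move and testing for check:

```python
import chess

board = chess.Board("1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1")
board.push(chess.Move.from_uci("d6d1"))   # make the move on the board
gives_check = board.is_check()            # the side to move (White) is now in check
```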

Here we are just going to parse an EPD line to set up a position and get the related info.
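An EPD line carries the position plus operations such as the best move ("bm") and an identifier ("id"). A sketch using the current python-chess API:

```python
import chess

# BK.01 as an EPD line: position fields plus "bm" and "id" operations
epd = '1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - bm Qd1+; id "BK.01";'
board = chess.Board()
ops = board.set_epd(epd)   # sets up the position and returns the operations
```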

Communicating with a UCI engine

Next we will communicate via UCI with a chess engine like Maverick. The first step is to start the engine process.

The engine expects an initial uci command. Then it will send information about its identity and options.

Now set up the position.

And calculate for two minutes. The result is the best move (according to the engine) and an optional ponder move (the expected reply the engine wants to ponder on).

We can then compare the engine's choice with the expected best move from the EPD.
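The original snippets used python-chess's UCI wrapper and aren't reproduced here. As a stand-in, the sketch below speaks the UCI protocol directly over pipes, with a tiny fake engine in place of Maverick so it runs anywhere (the fake engine and all helper names are mine, not from the article):

```python
import subprocess
import sys
import textwrap

def uci_bestmove(engine_cmd, fen, movetime_ms):
    """Hand-rolled UCI conversation: handshake, set the position, search,
    and read the "bestmove" line."""
    p = subprocess.Popen(engine_cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    def send(cmd):
        p.stdin.write(cmd + "\n")
        p.stdin.flush()
    send("uci")                           # engine replies with id/options, then "uciok"
    while p.stdout.readline().strip() != "uciok":
        pass
    send("position fen " + fen)
    send("go movetime %d" % movetime_ms)  # think for a fixed time
    while True:
        words = p.stdout.readline().split()
        if words and words[0] == "bestmove":
            p.stdin.close()
            p.wait()
            return words[1]

# A tiny fake engine stands in for Maverick so the sketch runs anywhere:
fake_engine = textwrap.dedent("""
    import sys
    for line in sys.stdin:
        cmd = line.strip()
        if cmd == "uci":
            print("id name Fake"); print("uciok"); sys.stdout.flush()
        elif cmd.startswith("go"):
            print("bestmove d6d1"); sys.stdout.flush()
""")
best = uci_bestmove([sys.executable, "-c", fake_engine],
                    "1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1", 100)
```

With a real engine, `engine_cmd` would simply be the path to its executable.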


Putting it all together
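The full script isn't reproduced in this post, but its overall shape, reduced to a sketch with hypothetical helper names, is roughly:

```python
def bratko_kopec_score(epd_records, engine_best_move):
    """Award 1 point per position where the engine finds one of the 'bm' moves.

    epd_records: list of (fen, best_moves_in_uci) pairs
    engine_best_move: callable taking a FEN, returning the engine's move (UCI)
    """
    score = 0
    for fen, best_moves in epd_records:
        if engine_best_move(fen) in best_moves:
            score += 1
    return score

# Toy run with a stand-in "engine" that always answers d6d1:
records = [("1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - 0 1", ["d6d1"])]
result = bratko_kopec_score(records, lambda fen: "d6d1")
```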

Congratulations Maverick 0.51 x64! On my machine you score 18/24, which is almost on a par with Stockfish 6 64’s 19/24.


Bonus

The original Bratko-Kopec test has one more rule, because sometimes the engine changes its mind: if it had the solution at the 30, 60 or 90 second mark but then discarded it, 1/4, 1/3 or 1/2 points are awarded respectively. In order to do this we need to look at the engine's principal variations while it is thinking. This is included in the full code of this example.
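The partial-credit rule itself can be sketched as follows (the full code does this by watching the PV; here the checkpoint results are simply passed in as booleans, and the names are mine):

```python
def bk_points(held_at_30, held_at_60, held_at_90, correct_final):
    """Bratko-Kopec partial credit: a full point for the final answer,
    otherwise credit for the latest checkpoint where the solution was held."""
    if correct_final:
        return 1.0
    if held_at_90:
        return 1 / 2
    if held_at_60:
        return 1 / 3
    if held_at_30:
        return 1 / 4
    return 0.0
```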

You can find out more about Python Chess at https://github.com/niklasf/python-chess.

Maverick 1.0 Released

I’m pleased to be able to release Maverick 1.0. This version adds considerable selectivity to the search, with a basic implementation of Late Move Reduction and beta pruning. In my tests it is about 2500 ELO on the CCRL scale, so it’s time to give it the 1.0 version number.

The full list of changes:

  • Added Late Move Reduction
  • Added beta pruning (inspired by Senpai)
  • Added unstoppable pawn code
  • Added king and pawn endgame knowledge
  • Added knight outpost bonus
  • Added refutation moves to the move ordering

There are also ARM and Linux versions included for the first time (thanks, Jim Ablett).

I’m pleased with the style of play. Maverick’s aggression coupled with its positional naivety makes for interesting play! If you play against chess engines I’d be interested in any feedback.

You can download Maverick 1.0 from the Download Page

Free Version of Visual Studio Professional 2013

Last week Microsoft released a Community Edition of Visual Studio 2013. This is a free version of the Professional edition of Visual Studio 2013. Previously Microsoft’s free edition was Visual Studio Express, which only compiled to 32-bit and didn’t include a profiler or Profile-Guided Optimization. The new Community Edition includes all of these goodies and can generate 64-bit executables. This is a big deal for chess programmers. It means you can easily develop and test your engine from within the same high-quality development environment (I know we’ve always had GCC, but this is more integrated than the mishmash of open source tools).

There are some constraints on who can use the Community Edition: you can only use it if your engine isn’t commercial, or if it brings in less than $1 million per year – I’m sure that covers every chess engine developer!

You can find out more and download here.

The Holiday season is coming. I’m hoping to have more time to dedicate to chess programming. For the last six months I’ve been swamped with work – but I think there is a light at the end of the tunnel. More updates to come.

Two New Engines Hosted on ChessProgramming.net

I’m excited to let everyone know about two new engines which are to be hosted on this blog.

The first engine is Fruit Reloaded, a fork of Fabien Letouzey’s Fruit 2.1. Most of the new development (including SMP search) has been done by Daniel Mehrmann and Ryan Benitez. You can find out more here:

http://www.chessprogramming.net/fruit-reloaded/

The second engine is a big surprise. Fabien himself has been dabbling once again in chess programming and has come up with a brand new engine – Senpai! It’s a bitboard engine (Fruit was array based) with SMP search. I ran some quick tests on a beta version and Senpai 1.0 drew a 150-game match against Chiron 2.0. Although this is a small sample against only one engine, it implies a rating of approximately 3100 ELO on the CCRL scale. You can find out more about Senpai here:

http://www.chessprogramming.net/senpai/

Let the fun begin!

Maverick Wins CCRL Division 7 (and Gets a CCRL Rating)

Over the past month Graham Banks has been running the Division 7 competition. I was delighted when Maverick managed to win with a score of 27.5 out of 44 games. After nine rounds Maverick languished in the bottom half of the table, but it managed to fight back and win! During the tournament I logged onto Graham’s site quite a few times and it was nice to chat with Graham and Erik. There were many nail-biting games – not good for the blood pressure!

Graham then ran a gauntlet competition for Maverick to get enough games for a rating. It managed a respectable rating of 2317 ELO on the CCRL scale. You can see the details here:

Maverick’s CCRL Rating

As I mentioned in a previous post, Maverick doesn’t do so well at slow time controls, so I was happy it came out above 2300 ELO on CCRL.

Many thanks to Graham for taking the time to test Maverick!

Maverick’s Rating at Fast & Slow Time Controls…

I do most of my testing at fast time controls – sometimes game in 10 seconds, or 5 seconds plus a small increment. This enables me to evaluate and tune changes using oodles of games.

Based on these super-fast tests, Maverick 0.51’s rating seems to be about 2375 ELO on the CCRL scale. For example, I pitted Maverick against TJ-Chess, rated 2367 ELO, at 10 seconds per game. After 1000 games Maverick won with 518 wins, 347 losses and 135 draws. This is +60 ELO. Maverick seems to get similar results against other engines close to TJ-Chess’ strength, e.g. OliThink and Dragon 4.6.
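As a sanity check on the +60 figure: under the usual logistic Elo model, the expected score for a given rating advantage is:

```python
import math

def expected_score(elo_advantage):
    """Expected score fraction for a given Elo advantage (logistic model)."""
    return 1 / (1 + 10 ** (-elo_advantage / 400))

s = expected_score(60)   # +60 ELO corresponds to a score of roughly 58.6%
```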

So when Graham Banks tested Maverick in Division 6 I thought it would do quite well. I was wrong! Maverick ended up in 11th place out of twelve participants (TJ Chess came in 4th):

Division 6 Results

I thought I’d investigate and ran a match at slower time controls. I used the same time controls as the Division 6 tournament (40 moves in 25 minutes) on my 2.8 GHz i7. The results were not what I expected (or hoped for). Maverick lost with 20 wins, 40 draws and 40 losses! This result shows TJ Chess is about 100 ELO better at slower time controls – a swing of 160 ELO between slow and fast time controls, far more than I thought.

As a result I’ve revised my estimate of Maverick’s strength based on the time controls: at super-fast time controls it’s about 2400 ELO, while at longer time controls it’s 2250 ELO!

Why the Difference?

I suspected the branching factor in Maverick was quite high. I ran some tests and indeed it seems to be about 4.5. This is high by today’s standards; I think Stockfish’s is about 2.0. This means every ply takes 4.5 times longer to complete than the previous ply. At super-fast time controls Maverick does quite well since it’s a relatively fast searcher (4.2 million nps on my i7). As the time controls get longer, the branching factor takes its toll and the more selective engines out-search Maverick.
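To see why this matters at longer time controls, here is a toy model (figures are illustrative, not measurements): nodes to reach depth d grow roughly as b**d for effective branching factor b, so the gap between engines widens with every extra ply.

```python
def nodes_to_depth(branching_factor, depth):
    """Toy model: a depth-d search visits roughly b**d nodes."""
    return branching_factor ** depth

# At depth 10, an EBF of 4.5 costs over 3000x more nodes than an EBF of 2.0,
# so a raw-speed advantage is quickly swamped as search time grows.
ratio = nodes_to_depth(4.5, 10) / nodes_to_depth(2.0, 10)
```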

The high branching factor is almost certainly due to the lack of selectivity in Maverick’s search. It does use null move but it doesn’t use any form of Late Move Reduction, which seems to be the most common, and most effective, form of selectivity. This is by design. I have a hunch that if I can create a reasonably good evaluation function, I’ll be able to add a more effective form of LMR selectivity, guided by the evaluation function. My aim is to take Maverick up to 2500 ELO before I add this selectivity. It looks like I have 200 ELO to go!

Maverick 1.0 To Do List…

It’s been two months since I did any significant chess coding – life got in the way! I did tinker here and there but nothing special. However, Christmas is around the corner, and I’m hoping to have some time to relax with an IPA and do some coding.

I thought it would be worthwhile spending some time to think about what I need to do to create version 1.0 of Maverick. I’ve said from the start, version 1.0 will have a playing strength of 2500 ELO (i.e. Grandmaster level) on the CCRL scale. So I’ll release Maverick 1.0 when I can beat all of the following opponents in a 100 game match at 5 mins per move:

  • Tao 5.6 – I’ve admired Bas Hamstra’s engine for a long time – I hope one day he continues the development
  • YACE  – another classic
  • Anmon 5.6
  • Hermann 2.8 64 bit 

I tested Maverick 0.51 at fast time controls. One minute per game was the slowest time control I used. In my tests this gave Maverick a rating of about 2370 ELO. However, in the CCRL Division 6 tournament Maverick struggled. I thought it would be in the top half of the table, but it finished one off the bottom! I suspect this is due to the time controls. Maverick is quite a fast searcher, but I intentionally haven’t put any real selectivity into the search apart from null move. This probably results in a higher branching factor than other engines with Late Move Reduction. At super-fast time controls Maverick may do well due to its raw speed, but its higher branching factor holds it back at the slower time controls. When version 0.51 is tested by CCRL I suspect it will be 100 ELO lower than in my tests, i.e. about 2270 ELO. This means I have a long way to go to reach 2500 ELO.

Before I set out my to-do list for version 1.0 it might be helpful to go over what Maverick already contains, i.e. the current state of the engine. Here’s a reasonably comprehensive list:

  • Magic bitboard engine
  • Move dictionary architecture
  • Hash tables (4 slots structure)
  • Killer Moves
  • Evaluation function with the following terms:
    • Middle-game to Endgame interpolation
    • Piece square tables
  • All basic endgames (e.g. B + N + K vs. K)
    • Mobility
    • Passed pawns (a major impact on strength)
    • Basic pawn weaknesses (extremely rudimentary)
  • Null Move (R = 3)
  • Static-Exchange-Evaluation for move ordering and quiescent move pruning
  • Checks in the first ply of the quiescent search
  • Extend on one-reply-to-check on the first ply of quiescent search
  • Internal Iterative Deepening
  • Enhanced Transposition cutoffs
  • Opening book
  • Pondering (really easy using the UCI protocol)

So I’d say Maverick isn’t too complex and doesn’t contain anything out of the ordinary.

Here’s the list of to-do’s for version 1.0:

  • CLOP tuning
  • Revisit some parameters for tuning:
    • Piece-Square-Tables
    • Mobility
    • Passed pawns
  • Add king safety
  • Add potential passed pawn terms to the evaluation
  • Add pawn storm code when castling on opposite sides
  • Add bad exchange terms to the evaluation (and tune)
  • Change the maximum depth measure to exclude the quiescent search (I think?)
  • Use magic multipliers to calculate mobility and center control
  • Improve knight placement and isolation code
  • Add pawn and king endgame code (this could be a big strength gain)
  • Add special endgame code for K + R vs. K + N (inspired by Thomas Petzke)
  • Tweak move ordering:
    • Evade check move history
    • Null Move capture-evade
  • Multiple PV
  • Support for Chess960
  • Possibly Mac and Linux support (need to get a development environment up and running for both)

I’m hoping this list will be enough to push past 2500 ELO – let’s see!

Have I missed anything?

Is Maverick a Clone?

Let me state unequivocally, Maverick is not a clone!

There have always been problems with clones in computer chess. The first one I recall was Quickstep, a program competing in the 1989 World Computer Chess Championship in Portorož. After a couple of rounds it was discovered to be a clone of Richard Lang’s Mephisto Almeria program. It was scandalous at the time. Thankfully there was little cloning activity for about ten years after this. Then Crafty had a few clones (e.g. Bionic Impakt). But when Fruit burst onto the scene cloning became a big problem. This then led to the Rybka controversy (which I’m not going to get into).

The computer chess community has developed a test to quickly identify clones. This work was pioneered by Adam Hair. You can read about it here – A Pairwise Comparison of Chess Engine Move Selections. I must say I was a little skeptical about the test. I wondered whether a basic engine with piece-square tables, mobility and passed-pawn code would choose similar moves even if the engines weren’t at all related. Adam was kind enough to put Maverick through his test – here are the results:

------ Maverick 0.5 x64 (No Popcount) (time: 100 ms scale: 1.0) ------
45.50 Fruit 2.1 (time: 100 ms scale: 1.0)
45.24 Movei00_8_438 (time: 100 ms scale: 1.0)
44.74 SmarThink 1.20 (time: 100 ms scale: 1.0)
44.72 Strelka 2.0 B x64 (time: 100 ms scale: 1.0)
44.50 Fruit 2.2.1 (time: 100 ms scale: 1.0)
44.32 Doch64 09.980 JA (time: 100 ms scale: 1.0)
44.19 Loop 2007 32-Bit (time: 100 ms scale: 1.0)
44.17 Strelka 1.8 UCI (time: 100 ms scale: 1.0)
43.99 Toga II 1.0 (time: 100 ms scale: 1.0)
43.72 Nemo SP64n 1.0.1 Beta (time: 100 ms scale: 1.0)
43.38 Rybka 1.0 Beta 32-bit (time: 100 ms scale: 1.0)
43.26 Daydreamer 1.75 JA (time: 100 ms scale: 1.0)
43.17 Twisted Logic 20090922 (time: 100 ms scale: 1.0)
42.94 Nebula 2.0 (time: 100 ms scale: 1.0)
42.91 RedQueen 1.1.4 (time: 100 ms scale: 1.0)
42.66 Cyrano 0.6b17 JA (time: 100 ms scale: 1.0)
42.61 Hamsters 0.7.1 (time: 100 ms scale: 1.0)
42.60 Naum 4.2 (time: 100 ms scale: 1.0)
42.33 Murka 3 x64 UCI (time: 100 ms scale: 1.0)
42.32 spark-1.0 (time: 100 ms scale: 1.0)
42.26 Octochess revision 5190 (time: 100 ms scale: 1.0)
41.81 DiscoCheck 4.3 JA (time: 100 ms scale: 1.0)
41.77 Bobcat 3.25 (time: 100 ms scale: 1.0)
41.67 cheng3 1.07 (time: 100 ms scale: 1.0)
41.58 Gandalf 6.0 (time: 100 ms scale: 1.0)
41.37 Glass 2.0 PERSONALITY (time: 100 ms scale: 1.0)
41.28 RobboLite 0.085d3 x64 (time: 100 ms scale: 1.0)
41.22 Spike 1.4 (time: 100 ms scale: 1.0)
41.04 Houdini x64 1_CPU (time: 100 ms scale: 1.0)
40.93 Houdini 3 x64 (time: 100 ms scale: 1.0)
40.79 Gaviota v0.86 (time: 100 ms scale: 1.0)
40.75 Ruffian 2.1.0 (time: 100 ms scale: 1.0)
40.71 Stockfish 4 64 (time: 100 ms scale: 1.0)
40.70 Godel 2.3.7 (time: 100 ms scale: 1.0)
40.62 Pawny 1.0.x64 (time: 100 ms scale: 1.0)
40.59 Shredder 11 UCI (time: 100 ms scale: 1.0)
40.37 MinkoChess 1.3 x64 (time: 100 ms scale: 1.0)
40.36 Gaviota v0.87-a8 (time: 100 ms scale: 1.0)
40.24 Komodo CCT 64-bit (time: 100 ms scale: 1.0)
40.19 iCE 1.0 v1619 x64 (time: 100 ms scale: 1.0)
40.13 Arasan 16.1 (time: 100 ms scale: 1.0)
40.03 Alfil 13.1 x64 MT (time: 100 ms scale: 1.0)
40.01 GNU Chess 5.50-64 (time: 100 ms scale: 1.0)
39.92 Hannibal 1.3x64 (time: 100 ms scale: 1.0)
39.84 Tornado 4.88 x64 (time: 100 ms scale: 1.0)
39.78 Komodo 5.1 64-bit (time: 100 ms scale: 1.0)
39.66 Atlas 3.50 x64 (time: 100 ms scale: 1.0)
39.16 SlowChess 2.96 (time: 100 ms scale: 1.0)
38.84 Gull 2.2 x64 (time: 100 ms scale: 1.0)
38.54 Quazar 0.4 x64 (time: 100 ms scale: 1.0)
35.63 Texel 1.02 64-bit (time: 100 ms scale: 1.0)
34.33 Deep Sjeng WC2008 x64 (time: 100 ms scale: 1.0)

A score of over 55 starts to get suspicious, while greater than 60 means the engine is highly likely to be a derivative work. As you can see, all of Maverick’s scores are in the 40s, so it passed the test.

Some people seemed to think it was odd that I asked about Maverick’s similarity to other engines. It was simple curiosity: I know Maverick isn’t a clone. Having gone through the process I now have more confidence in the similarity test.

Thanks again Adam for taking the time to test Maverick!

Maverick 0.5 Released!

You can download a new version of Maverick on the download page.

I’ve made quite a few changes since Maverick 0.2:

  • Added more basic endgame knowledge
  • Added passed pawn evaluation (I think this added a lot of strength)
  • Beefed up the evaluation routine to include basic terms such as mobility
  • Fixed a nasty hash bug to do with mating positions
  • Added one-reply-to-check extensions in the first ply of the quiescent search (helps with some tactics – little impact on playing strength)
  • Enabled hash cutoffs at PV nodes
  • Nodes per second have dropped considerably due to the slower evaluation (now between 3 million and 5 million on my 2.8 GHz i7 notebook)
  • A bazillion tweaks

How Strong is Maverick 0.5?

This is the first version of Maverick which I regard as reasonably strong. In my tests it’s at least 200 ELO stronger than version 0.2. If I had to guess I would say it’s about 2350 ELO on the CCRL rating scale.

Here are some test results:

Match                               Maverick   Opponent   Percentage (%)
Maverick 0.5 – Fruit 1.0                50.5       49.5            50.50
Maverick 0.5 – Dragon 4.6               62.5       37.5            62.50
Maverick 0.5 – Phalanx XXIII            57.5       42.5            57.50
Maverick 0.5 – Tao 5.6                  38.0       62.0            38.00
Maverick 0.5 – TJchess 1.1U-x64         57.0       43.0            57.00
Maverick 0.5 – Maverick 0.2 x64         75.5       24.5            75.50

All the games were played at 1 min per game on a 2.8 GHz Haswell i7 notebook.

Maverick only lost to Tao 5.6, a relatively old (2004) engine, but one I really like. Tao’s rating is about 2500 ELO so this result is expected. I was surprised Maverick did so well against Dragon and Phalanx; I suspect their CCRL ratings of over 2350 ELO may be inflated.

Mate in 21:

I’ve used this simple position for many years to test hash tables. I first came across it in an advert for the Fidelity Mach III back in about 1988. If a hash table is working as it should, an engine will have no problem finding c7, followed by an opposition king maneuver and the queening of the white pawn on b8.

Mate in 21

FEN: 2k5/8/1pP1K3/1P6/8/8/8/8 w - - 0 1

The previous version of Maverick could find c7, but when I added the hash cutoffs at PV nodes Maverick was able to find the mate in 21 in less than 10 seconds – I was pleased.

Next Developments:

I’m planning to release the next version when it can comfortably beat Tao 5.6, YACE and maybe Hermann 2.8. These engines are rated above 2500 ELO so it should take me a while. My aim was to have Maverick playing at 2500 ELO by Christmas – I think that’s still possible. The current version has no king safety and no selectivity apart from null move, so I think another 150 ELO by Christmas is possible.

Please let me know if you have any test results for Maverick or if you find a bug!

Yet Another Chess Programming Approach to Bug Hunting

You all know I’m a maniac when it comes to hunting down bugs. I’ve posted about it before (Bugs-Bugs-Bugs). I suspect many engines don’t get past the 2200 ELO point simply because they are buggy.

Thomas Petzke made a post on CCC which I found interesting:

I hit the same issue several times, different node counts between Debug and Release, between x32 and x64 and Intel vs MSVC.

Was painful to hunt it down but at the end there was always a bug

Comparing node counts across different compilers is an interesting idea, and one I hadn’t previously tried. Since I’ve spent many hours perfecting my perft node counts I was quite sure Maverick would sail through this test.

I was wrong!!

I set up the test using one of the complex perft positions as an example. I then created debug and release versions of Maverick. To my horror, a fixed-ply search gave different node counts for the two versions.

How could this be? I knew the perft node counts were all ok, so my move generation must be ok – right? Doubt started to creep into my thinking.

Here’s how I tackled the problem.

First I found the minimum depth which triggered a difference in node counts. In my case this was the 10th ply of a late middlegame position.

Then I created a small function which writes a position’s vital statistics (FEN, hash key and move to be played) to a text file. I added a call to this function during the “make_move” routine, just before I make every single move. Effectively I’m writing the whole search tree to a text file.

I was then able to run the two versions to create the two text files containing the search trees. I renamed them debug-tree.txt and release-tree.txt. Both files were about 35 MB in size.

Searching by hand through these files would be a boring and tedious task. I’m not a Linux user, but I believe Linux has a “diff” command which finds the differences between two files. Windows doesn’t come with anything similar; however, there is a useful free utility called WinMerge which does the same thing.
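A stdlib alternative, if you'd rather stay in Python: walk both dumps in lockstep and report the first line where they diverge (the function name is mine):

```python
def first_divergence(lines_a, lines_b):
    """Return the 1-based number of the first differing line, or None if equal."""
    for n, (a, b) in enumerate(zip(lines_a, lines_b), start=1):
        if a != b:
            return n
    return None

# Usage with the two tree dumps:
#   with open("debug-tree.txt") as fa, open("release-tree.txt") as fb:
#       print(first_divergence(fa, fb))
line_no = first_divergence(["a2a4", "e7e4", "f3e3"], ["a2a4", "e7e4", "g1g2"])
```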

When I compared the two files this is what I saw:

[Screenshot: WinMerge comparison of the two search-tree files]

The yellow sections show the divergence. As you can see, the first divergence in the tree is at line 75,416! The release version seems to go back to the position after evaluating the position with hash key 15560305028174901452 and play the same move again (e7e4), whereas the debug version plays another move (f3e3). It looks like the release version fired a cutoff where the debug version didn’t. Here’s the critical position after f6e4:

[Diagram: the critical position, with connected passed pawns on f2 and g3]

It doesn’t look like anything special. I then inserted a break to stop the search when it encounters the position (i.e. when there is a hash key match).

Sure enough, the two versions evaluated the same position differently. The bug must be in the evaluation!

Since I knew the problematic position, finding the bug proved to be straightforward. I stepped through the evaluation, comparing the results at each point until I tracked down the bug.

It turns out the bug was in the connected passed pawn code, which was evaluating the white pawns on f2 and g3.

neighboring_files is a pre-calculated array which returns the mask of the neighboring files. However, it takes a file number as its input, not a square – that was the bug.
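The original C snippets aren't reproduced in this post, so here is the shape of the bug sketched in Python (mask values and helper names are stand-ins):

```python
FILE_A = 0x0101010101010101            # bitmask of the a-file

def file_mask(f):                      # files 0..7 (a..h)
    return FILE_A << f

# Pre-calculated masks of the files adjacent to each file, indexed by FILE
neighboring_files = [
    (file_mask(f - 1) if f > 0 else 0) | (file_mask(f + 1) if f < 7 else 0)
    for f in range(8)
]

def file_of(square):                   # squares 0..63, a1 = 0
    return square & 7

f2 = 13                                # square index of the f2 pawn

# The bug: indexing with the square instead of its file. Python raises an
# IndexError here; C silently read past the end of the 8-entry array.
try:
    mask = neighboring_files[f2]
except IndexError:
    mask = None

# The fix: convert the square to its file first.
correct = neighboring_files[file_of(f2)]   # mask of the e- and g-files
```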

Once corrected, I created and compared the MSVC Release, MSVC Debug and MinGW x64 versions, and they all produced the same node count!

This isn’t a technique I’ve used before. I always assumed perft would track down search errors, and that flipping the board and comparing the white and black evaluations of the flipped position would find evaluation errors. I was wrong. This is another technique for my debugging toolbox.

Is your node count the same for different compilers?