New Bitboard Viewer (Now with Hexadecimal)

After stuffing myself silly with a scrummy Christmas dinner, I felt I needed to work it off a little with a bit of programming. I have had a number of requests to add hexadecimal support to the Bitboard Viewer. The original version also had a nasty overflow bug which showed up if bit 63 was set. I managed to fight off the tryptophan from the turkey to correct the bug and add hexadecimal support:


I also added the ability to click on a file or rank title to XOR the bits in the corresponding file or rank.
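As a rough illustration of both features, here is a minimal sketch in C. It is not the viewer's actual source: the function names and the mapping (bit 0 = a1, bit 63 = h8) are assumptions. Keeping the value in an unsigned 64-bit type throughout is one common way to avoid trouble when bit 63 is set.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed mapping: bit 0 = a1, bit 7 = h1, bit 63 = h8. */
#define FILE_A UINT64_C(0x0101010101010101)
#define RANK_1 UINT64_C(0x00000000000000FF)

/* Flip every bit on one file (0 = a-file ... 7 = h-file). */
uint64_t xor_file(uint64_t bb, int file) {
    return bb ^ (FILE_A << file);
}

/* Flip every bit on one rank (0 = rank 1 ... 7 = rank 8). */
uint64_t xor_rank(uint64_t bb, int rank) {
    return bb ^ (RANK_1 << (8 * rank));
}

/* Format the bitboard as "0x" plus 16 hex digits; buf needs 19 bytes. */
void bitboard_to_hex(uint64_t bb, char *buf, size_t n) {
    snprintf(buf, n, "0x%016" PRIX64, bb);
}
```

Clicking the same file or rank title twice restores the original bitboard, since XOR is its own inverse.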

You can download it here.


Yet Another Chess Programming Approach to Bug Hunting

You all know I’m a maniac when it comes to hunting down bugs. I’ve posted about it (Bugs-Bugs-Bugs). I suspect many engines don’t get past the 2200 ELO point simply because they are buggy.

Thomas Peztzke made a post on CCC which I found interesting:

I hit the same issue several times, different node counts between Debug and Release, between x32 and x64 and Intel vs MSVC.

Was painful to hunt it down but at the end there was always a bug

Comparing node counts using different compilers is an interesting idea, and one I haven’t previously tried. Since I’ve spent many hours perfecting my perft node counts I was quite sure Maverick would sail through this test.

I was wrong!!

I set up the test and used one of the complex perft positions as an example. I then created debug and release versions of Maverick. To my horror when performing a fixed ply search there were different node counts for the two versions.

How could this be? I know the perft node counts are all ok; so my move-generation must be ok – right? Doubt starts to creep into my thinking.

Here’s how I tackled the problem.

First I found the minimum depth which triggered a difference in node counts. In my case this was the 10th ply of a late middlegame position.

Then I created a small function which writes a position’s vital statistics (FEN, hash key and move to be played) to a text file. I added a call to this function during the “make_move” routine, just before I make every single move. Effectively I’m writing the whole search tree to a text file.
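A logging function along these lines might look like the sketch below. The names `format_position` and `log_position`, the field separator and the buffer size are all hypothetical, and the engine is assumed to supply the FEN and move as strings already.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Build one line: FEN, hash key and the move about to be played. */
int format_position(char *buf, size_t n, const char *fen,
                    uint64_t hash_key, const char *move) {
    return snprintf(buf, n, "%s | %" PRIu64 " | %s", fen, hash_key, move);
}

/* Append that line to the tree file; call this just before making a move. */
void log_position(FILE *out, const char *fen,
                  uint64_t hash_key, const char *move) {
    char line[256];
    format_position(line, sizeof line, fen, hash_key, move);
    fputs(line, out);
    fputc('\n', out);
}
```

Writing one position per line keeps the two output files easy to compare line by line.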

I was then able to run the two versions, which created two text files containing the search tree. I renamed them debug-tree.txt and release-tree.txt. Both files were about 35 MB in size.

To search by hand through these files would be a boring and tedious task. I’m not a Linux user but I believe Linux has a “diff” command which finds the differences between two files. Windows doesn’t come with anything similar. However, there is a useful utility called WinMerge which does the same thing. It’s also free.
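If neither tool is to hand, the first point of divergence is easy to find with a few lines of C. This is only a sketch: `first_difference` is a hypothetical helper, and it assumes every line in the tree files is shorter than 512 characters.

```c
#include <stdio.h>
#include <string.h>

/* Return the 1-based number of the first line that differs between
   the two files, or 0 if they are identical. */
long first_difference(FILE *a, FILE *b) {
    char la[512], lb[512];
    long line = 0;
    for (;;) {
        char *ra = fgets(la, sizeof la, a);
        char *rb = fgets(lb, sizeof lb, b);
        line++;
        if (!ra && !rb) return 0;      /* both files ended together */
        if (!ra || !rb) return line;   /* one file is shorter */
        if (strcmp(la, lb) != 0) return line;
    }
}
```

It only reports where the trees first part company, which is the interesting spot; a proper diff tool is still nicer for looking at the surrounding context.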

When I compared the two files this is what I saw:


The yellow sections show the variation. As you can see the first variation in the tree was at line 75,416! The release version seems to go back to the position after evaluating the position with hash key of 15560305028174901452 and play the same move again (e7e4), whereas the debug version plays another move (f3e3). It looks like the release version fired a cutoff, whereas the debug version didn’t. Here’s the critical position after f6e4:


It doesn’t look like anything special. I then inserted a break to stop the search when it encounters the position (i.e. when there is a hash key match).

Sure enough, the two versions evaluated the same position differently. The bug must be in the evaluation!

Since I knew the problematic position, finding the bug proved to be straightforward. I stepped through the evaluation, comparing the results at each point until I tracked down the bug.

It turns out the bug was in the connected passed pawn code, which was evaluating the white pawns on f2 and g3. The buggy code was the following:

neighboring_files is a pre-calculated array which returns the mask of neighboring files. However, it takes the file number as an input, not the square. This was the bug. The corrected code is as follows:

Once corrected I created and compared the MSVC Release, MSVC Debug and MinGW x64 versions and they all produced the same node count!
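To illustrate this class of bug, here is a hypothetical sketch, not Maverick's actual code. It assumes squares are numbered 0 to 63 with a1 = 0, so the file is the low three bits of the square. Indexing an 8-entry table with a square number reads past the end of the array for most squares, and what lives there can differ from one build to the next, which may be why the node counts depended on the compiler settings.

```c
#include <stdint.h>

/* Mask of the files adjacent to each file, indexed by file (0 = a ... 7 = h). */
static const uint64_t neighboring_files[8] = {
    UINT64_C(0x0202020202020202), UINT64_C(0x0505050505050505),
    UINT64_C(0x0A0A0A0A0A0A0A0A), UINT64_C(0x1414141414141414),
    UINT64_C(0x2828282828282828), UINT64_C(0x5050505050505050),
    UINT64_C(0xA0A0A0A0A0A0A0A0), UINT64_C(0x4040404040404040)
};

/* Assumed square numbering: a1 = 0, h1 = 7, a2 = 8, ... h8 = 63. */
static int file_of(int square) { return square & 7; }

/* Buggy pattern (do not use): neighboring_files[square] indexes the
   table with 0..63 and runs off the end for any square above h1. */

/* Corrected pattern: convert the square to its file before the lookup. */
uint64_t neighbors_of_square(int square) {
    return neighboring_files[file_of(square)];
}
```

For the pawn on f2 (square 13 under this numbering) the lookup uses index 5 and returns the e- and g-file mask.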

This isn’t a technique I’ve used before. I always assumed perft would track down search errors, and flipping the board and comparing white and black evaluations of the flipped position would find evaluation errors. I was wrong. This will be another technique I’ll put in my debugging toolbox.

Is your node count the same for different compilers?

US Citizenship & x5 Speed-up!

It’s been a while since I last posted.  I’ve been busy getting all of the UCI plumbing in place.  Maverick is almost at the stage it can play a game of chess.  I’ve created the basic search routines – complete with alpha-beta, quiescent search and root search. 

Some early tests show quite a speed-up compared to my old Monarch letterbox framework.  On my 2.8 GHz i7 4900MQ the old Monarch 1.7 engine is doing about 1.7 million nodes per second.  On the same positions Maverick is doing about 8.6 million.  I expect this will come down once I add hash tables and beef up the evaluation but this is better than I expected.  It’s about a five times speed-up.  I think quite a bit is due to the 64 bit GCC 4.8 compiler but I’m happy with the result.

I’m hoping Maverick will play its first game this weekend.

On another note – I became a US citizen today!

How to Debug Capture & Check Move Generators?

As I’ve explained in previous posts, a perft routine is a great way to debug a move generator. If you can correctly calculate the leaf node count for a variety of positions you are (almost) certain to have eradicated all the bugs in your move generator.

But what about the quiescent search? Unlike the regular search, the quiescent search doesn’t look at all moves. And I’m not aware of any specialized quiescent perft test.

At a minimum the quiescent search generates all of the possible capture moves: and in most cases it also generates checks (at least for the first ply). How can we write these move generation routines with a high level of certainty they are bug free?

When I wrote Monarch I created a perft routine but didn’t create any test routines for the ‘generate_captures’ or ‘generate_checks’ routines. With Maverick I’m more determined to try to write a test routine for each procedure. So I wrote three new move generators:

  1. generate_captures
  2. generate_quiet_checks
  3. generate_quiet_moves

As the name suggests, the first routine generates all captures except en-passant. Captures which promote the pawn are limited only to queen promotions.

The second routine generates checking moves which are not captures. It ignores pawn promotions, en-passant moves and castling moves which give check.

The third routine generates all the other moves. When Maverick finally plays chess I don’t expect to use this routine at all. However, all three routines together generate the same set of moves as the regular move generator. This means I can test them using a version of the perft routine which calls all three specialized move generators for each position. It worked well and I was able to trap a couple of bugs.
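A stricter variation on the same idea is to compare the move lists directly at each position instead of relying only on the final node count. The sketch below assumes a hypothetical 16-bit move encoding (`move_t`) and that the caller has already filled one list from the regular generator and one from the three specialized generators combined. It is not how Maverick does it, just one way the check could be written.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t move_t;  /* hypothetical 16-bit move encoding */

static int cmp_move(const void *a, const void *b) {
    move_t x = *(const move_t *)a, y = *(const move_t *)b;
    return (x > y) - (x < y);
}

/* Return 1 if both lists hold exactly the same moves, in any order.
   Both lists are sorted in place as a side effect. */
int same_moves(move_t *full, int n_full, move_t *split, int n_split) {
    if (n_full != n_split) return 0;   /* a missing or duplicated move */
    qsort(full, (size_t)n_full, sizeof *full, cmp_move);
    qsort(split, (size_t)n_split, sizeof *split, cmp_move);
    return memcmp(full, split, (size_t)n_full * sizeof *full) == 0;
}
```

A move produced by two of the specialized generators shows up as a length mismatch, so overlaps are caught as well as omissions.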

What really amazed me was the speed of this adjusted perft routine – it was only about 10% slower than the regular routine. This is all down to the bitboard structure. With a letterbox structure I’m sure there would have been a much wider gap between the normal perft and this specialized move generator version.

Bitboards rule!!!

Is Perft Speed Important?

The perft routine in Maverick is almost fully debugged! I believe this is an important milestone in any chess engine’s development. It’s really the first time the code “plays” chess, in the sense it generates moves, makes and unmakes the moves, and iterates up and down the search tree.

I also think the speed of the perft routine is significant and is a measure which can be used to compare engines.  Now some people will disagree with me but here is my logic:

Validation of Structure: The structure of a chess program could be defined as how the board is represented and the way moves are generated and stored. This impacts the overall speed of an engine. A perft routine is a good measure of manipulating this chess board structure. Clearly there are other factors, most notably the size and complexity of the evaluation function. But I would even argue a fast perft implementation is an indicator of a chess structure which can support a good evaluation function. If you think about it, any half decent evaluation function must iterate over the squares and find some proxy for mobility. At a basic level this is what is happening in a perft routine. So I would argue a fast perft speed is an indication of a solid foundation upon which a strong chess engine can be developed.

Difficult to “Fiddle”: Some people talk about an engine being “fast” based on the self reported measure of “number of nodes processed per second”. The problem with nodes per second as a measure of speed is the definition of a node. There is no standard definition of a node. Some engine authors define it as a call to the “alpha-beta” routine, while others base it around the make / undo move routines, and then again others use the “generate-move” procedure. There isn’t one standard. And the measure can easily be fiddled. This is not the case with perft. There are only really two ways to count the number of nodes; one being the number of leaf nodes, and the other is the total nodes (internal and leaf). It would seem the standard measure is the number of leaf nodes.

Having said it’s difficult to fiddle, there are four distinct approaches to perft routines and each one impacts the speed:

Make & Unmake All Moves: Most chess engines generate pseudo legal moves. These are moves which are generally legal but may expose their king to a discovered attack and so are not actually legal. The reason chess engines generate pseudo legal moves is to save the time checking to see if there is a discovered check (which is costly) and may not even be required if there is a “cut-off”. So the test for the discovered check is often carried out as part of the “Make-Move” routine. The simplest perft implementation simply iterates through the depths generating the moves, making and unmaking each move in turn and counting the number of nodes. This is the slowest type of perft implementation. I regard this type of perft as a measure of the efficiency of the make and unmake routines, since this is the task which is carried out the most.

Counting Legal Moves at The Leaf: In contrast to the first method, another approach is to generate only truly legal moves. In this approach a lot of time can be saved at the leaf nodes by simply returning the number of moves generated (without having to make and unmake each one). The cost is a slightly more complex move generator which must detect potential discovered check. In general this approach will be quite a bit faster than the first approach. I regard this approach as a measure of the efficiency of the move generation routines.

Hashed Moves: In a perft search many transpositions occur.  This means the whole search can be significantly sped up by hashing the results and storing them in a hash table.  If the position reoccurs in the tree, the number of nodes in the sub-tree can be retrieved from the hash table and there is no need to search the sub tree.  I have not implemented this in Maverick.

Parallel Search: This is not something I have implemented in Maverick but the speed of the perft routine could be improved by implementing a parallel multiprocessor search.
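The first three approaches can be sketched on a toy game, which keeps the example self-contained. This is not Maverick's code and it is not chess: the "position" is a pile of tokens and a move removes one, two or three of them. All names are hypothetical. The hashed version indexes a small table directly by pile and depth, whereas a real engine would key the table on the position's hash key.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PILE  64
#define MAX_DEPTH 32

/* 1. Make and unmake all moves: candidates are "pseudo-legal" and the
      legality test happens inside make_move. */
static int make_move(int *pile, int take) {
    if (take > *pile) return 0;      /* illegal, rejected when made */
    *pile -= take;
    return 1;
}
static void unmake_move(int *pile, int take) { *pile += take; }

uint64_t perft_make_unmake(int pile, int depth) {
    if (depth == 0) return 1;
    uint64_t nodes = 0;
    for (int take = 1; take <= 3; take++) {
        if (!make_move(&pile, take)) continue;
        nodes += perft_make_unmake(pile, depth - 1);
        unmake_move(&pile, take);
    }
    return nodes;
}

/* 2. Count legal moves at the leaf: the generator only reports legal
      moves, so at depth 1 the move count is the answer. */
static int count_legal_moves(int pile) { return pile < 3 ? pile : 3; }

uint64_t perft_bulk(int pile, int depth) {
    if (depth == 0) return 1;
    if (depth == 1) return (uint64_t)count_legal_moves(pile);
    uint64_t nodes = 0;
    for (int take = 1; take <= count_legal_moves(pile); take++)
        nodes += perft_bulk(pile - take, depth - 1);
    return nodes;
}

/* 3. Hashed: remember the sub-tree count for each (pile, depth) pair.
      Requires pile <= MAX_PILE and depth <= MAX_DEPTH. */
static uint64_t table[MAX_PILE + 1][MAX_DEPTH + 1];
static uint8_t  known[MAX_PILE + 1][MAX_DEPTH + 1];

void perft_table_clear(void) { memset(known, 0, sizeof known); }

uint64_t perft_hashed(int pile, int depth) {
    if (depth == 0) return 1;
    if (known[pile][depth]) return table[pile][depth];
    uint64_t nodes = 0;
    for (int take = 1; take <= count_legal_moves(pile); take++)
        nodes += perft_hashed(pile - take, depth - 1);
    table[pile][depth] = nodes;
    known[pile][depth] = 1;
    return nodes;
}
```

All three return the same leaf counts; only the amount of work differs. Taking one token then two reaches the same pile as taking two then one, which is the toy equivalent of a transposition and is where the hashed version saves its time.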

Initial Perft Speed Results:

Based on the above logic I was eager to see how fast the new bitboard structure is compared to Monarch, which used a letter-box data structure. Monarch isn’t by any measure a fast searcher, so I was hoping for a speed improvement.  The position I used is from:

perft chess

FEN: r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

The number of leaf nodes to a depth of 5 ply is 193,690,690. Monarch’s perft routine is of the “make / unmake all moves” type. It managed to crank out the five ply perft for the above position in 71.5 seconds on my Core i7 2670QM running at 2.2 GHz. To my surprise Maverick blew this away with a time of only 16.3 seconds. This is roughly a 4.4 times speed-up – woohoo!! Maverick’s legal move generator approach completed perft 5 in exactly 5.0 seconds.
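For anyone wanting to turn these timings into comparable figures, the arithmetic is simple. The two helper names below are hypothetical. With the numbers quoted above, the ratios work out to roughly 4.4 (71.5 / 16.3) and 14.3 (71.5 / 5.0), and the leaf rates to roughly 2.7 million, 11.9 million and 38.7 million leaf nodes per second.

```c
#include <stdint.h>

/* How many times faster the new time is than the old one. */
double speedup(double old_seconds, double new_seconds) {
    return old_seconds / new_seconds;
}

/* Leaf nodes counted per second of wall-clock time. */
double leaf_nodes_per_second(uint64_t leaf_nodes, double seconds) {
    return (double)leaf_nodes / seconds;
}
```

Quoting leaf nodes per second alongside the perft type makes results from different engines easier to compare.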

Both engines were compiled using Microsoft Visual Studio Express 2012 in 32 bit mode.  I imagine there will be a reasonable speedup when I move to 64 bit and add the SSE bit-twiddling routines.  This is a much bigger speed-up than I anticipated and illustrates the superiority of the magic bitboard approach.

What type of perft routine does your engine employ and how fast is it?

Free Chess Bitboard Viewer

If your chess engine is bitboard based you will need to be able to visualize what the bitboard looks like as an actual chess board. I can’t imagine developing a bitboard engine without this type of utility. So I created a simple chess bitboard viewer.

You can download it here: Chess Bitboard Viewer


It runs on Windows 32 / 64 bit systems.  It’s super-easy to use (i.e. if you cannot use it think about finding a different hobby).  I’ve zipped it up since I assume downloading a raw (and rare) exe file will make some virus detector scream like a banshee!  BTW it has been checked for viruses and it’s clean – (but use it at your own risk).

You can click on a square and the decimal number will update – or you can change the decimal number and the board will update. You can also copy and paste from the clipboard.
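The link between the number and the board is easy to reproduce in a few lines of C if you want a quick console version. This is a minimal sketch rather than the viewer's source: the function names are hypothetical and it assumes bit 0 is a1 and bit 63 is h8.

```c
#include <stdint.h>

/* Flip the bit for one square; file and rank both run 0..7
   (file 0 = a, rank 0 = rank 1). */
uint64_t toggle_square(uint64_t bb, int file, int rank) {
    return bb ^ (UINT64_C(1) << (rank * 8 + file));
}

/* Draw the bitboard as 8 text rows, rank 8 at the top, using '1' for a
   set bit and '.' for a clear one. buf must hold at least 73 bytes. */
void render_board(uint64_t bb, char *buf) {
    char *p = buf;
    for (int rank = 7; rank >= 0; rank--) {
        for (int file = 0; file < 8; file++)
            *p++ = ((bb >> (rank * 8 + file)) & 1) ? '1' : '.';
        *p++ = '\n';
    }
    *p = '\0';
}
```

Printing `buf` with `puts` after each change gives a crude but serviceable text-mode view.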

I hope it’s useful – feedback and comments welcome!