How to Debug Capture & Check Move Generators?

As I’ve explained in previous posts, a perft routine is a great way to debug a move generator. If you can correctly calculate the leaf node count for a variety of positions you are (almost) certain to have eradicated all the bugs in your move generator.

But what about the quiescent search? Unlike the regular search, the quiescent search doesn’t look at all moves. And I’m not aware of any specialized quiescent perft test.

At a minimum the quiescent search generates all of the possible capture moves, and in most cases it also generates checks (at least for the first ply). How can we write these move generation routines with a high level of certainty that they are bug free?

When I wrote Monarch I created a perft routine but didn’t create any test routines for the ‘generate_captures’ or ‘generate_checks’ routines. With Maverick I’m more determined to try to write a test routine for each procedure. So I wrote three new move generators:

  1. generate_captures
  2. generate_quiet_checks
  3. generate_quiet_moves

As the name suggests, the first routine generates all captures except en-passant. Captures which promote the pawn are limited only to queen promotions.

The second routine generates checking moves which are not captures. It ignores pawn promotions, en-passant moves and castling moves which give check.

The third routine generates all the other moves. When Maverick finally plays chess I don’t expect to use this routine at all. However, all three routines together generate the same set of moves as the regular move generator. This means I can test them using a version of the perft routine which calls all three specialized move generators for each position. It worked well and I was able to trap a couple of bugs.
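The perft comparison checks node counts. A stricter per-position variant of the same idea is to compare the combined output of the three generators directly against the regular generator's move list. The sketch below is only an illustration under stated assumptions, not Maverick's code: the 16-bit `move16` encoding and the `same_move_set` helper are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MOVES 256

typedef uint16_t move16;   /* hypothetical move encoding, e.g. from | (to << 6) */

/* qsort comparator for encoded moves */
static int cmp_move(const void *a, const void *b)
{
    move16 x = *(const move16 *)a;
    move16 y = *(const move16 *)b;
    return (x > y) - (x < y);
}

/* Returns 1 when both lists hold exactly the same moves, in any order.
   "full" comes from the regular generator, "split" is the concatenated
   output of the captures, quiet checks and quiet moves generators. */
int same_move_set(const move16 *full, int n_full,
                  const move16 *split, int n_split)
{
    move16 a[MAX_MOVES], b[MAX_MOVES];

    assert(n_full >= 0 && n_full <= MAX_MOVES);
    assert(n_split >= 0 && n_split <= MAX_MOVES);

    /* A missing or duplicated move changes the count */
    if (n_full != n_split)
        return 0;

    /* Sort copies so the callers' move ordering is left untouched */
    memcpy(a, full, (size_t)n_full * sizeof(move16));
    memcpy(b, split, (size_t)n_split * sizeof(move16));
    qsort(a, (size_t)n_full, sizeof(move16), cmp_move);
    qsort(b, (size_t)n_split, sizeof(move16), cmp_move);

    return memcmp(a, b, (size_t)n_full * sizeof(move16)) == 0;
}
```

In a debug build a check like this could be called on every node inside perft. A move produced by two of the specialized generators changes the count, so it is caught as well.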

What really amazed me was the speed of this adjusted perft routine – it was only about 10% slower than the regular routine. This is all down to the bitboard structure. With a letterbox structure I’m sure there would have been a much wider gap between the normal perft and this specialized move generator version.

Bitboards rule!!!

50% Improvement in Perft Speed

I decided to see what Maverick’s performance is like through the lens of a profiler.  I’m using CodeBlocks as my GCC IDE.  It was remarkably simple to get the profiler working.  The first thing I noticed was the large amount of time taken to see if a move resulted in a discovered check against the mover’s own king, and was therefore illegal.  The time consumed by this one (small) procedure was almost as much as the move-generation code.  I thought something must be wrong.

After some prodding and poking of the code I realized there was a much better way to accomplish the same task.  In my original code I was looking from the king through the “from_square” and seeing if there was a rook, bishop or queen on the relevant path.  This is what I did in Monarch but it’s really letter-box style thinking.  It’s much faster to calculate all of the pins at the start of the move generation process and store them in a bitboard.  Then when you make a move you only need to test for a discovered check if the moving piece is pinned (a simple “AND”), which is rare.  You still need to perform the full test in three cases: the piece is pinned, it’s a king move, or it’s an en-passant move.  I also “inlined” the “is_in_check_after_move” procedure.
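As a rough illustration of the pin bitboard idea, here is a minimal sketch. It walks the eight rays from the king square by square rather than using Maverick's magic bitboards, so it shows the logic, not the speed. The function name, the parameter layout and the a1 = 0 square numbering are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitboard;
#define SQ(s) ((bitboard)1 << (s))

/* Hypothetical helper: returns the own pieces which are pinned to the king.
   own / enemy are occupancy bitboards, rook_like holds the enemy rooks and
   queens, bishop_like holds the enemy bishops and queens. */
bitboard pinned_pieces(int king_sq, bitboard own, bitboard enemy,
                       bitboard rook_like, bitboard bishop_like)
{
    /* Directions 0..3 are ranks and files, 4..7 are diagonals */
    static const int df[8] = { 1, -1, 0, 0, 1, 1, -1, -1 };
    static const int dr[8] = { 0, 0, 1, -1, 1, -1, 1, -1 };
    bitboard pinned = 0;

    assert(king_sq >= 0 && king_sq < 64);

    for (int d = 0; d < 8; d++) {
        bitboard sliders = (d < 4) ? rook_like : bishop_like;
        bitboard candidate = 0;
        int f = (king_sq & 7) + df[d];
        int r = (king_sq >> 3) + dr[d];

        for (; f >= 0 && f < 8 && r >= 0 && r < 8; f += df[d], r += dr[d]) {
            bitboard b = SQ(r * 8 + f);
            if (b & own) {
                if (candidate)
                    break;          /* two own pieces on the ray: no pin */
                candidate = b;      /* first own piece: possibly pinned */
            } else if (b & enemy) {
                if (candidate && (b & sliders))
                    pinned |= candidate;
                break;              /* any enemy piece ends the ray */
            }
        }
    }
    return pinned;
}
```

With the result stored once per position, the per-move test becomes `pinned & SQ(from)`, and the expensive legality check only runs when that is non-zero or the move is a king or en-passant move.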

The result of these changes is a boost to the perft speed of about 50%.  Maverick now crunches 76 million nps on my humble 2.2 GHz i7 notebook. 

As an aside, my notebook is two years old.  On my wife’s newer 2.4 GHz i7 Maverick’s speed is 99 million nps. Somehow Maverick’s architecture is more suited to newer machines.  I assume it’s a bigger cache.

Debugging a Chess Move Generator

Perft is a fantastic way to debug your chess engine’s move generator (as well as the make and un-make routines).  Here are some tips which you may find helpful:

  • Find a set of test positions with known perft scores and which cover all the weird and obscure chess moves.  See my previous post Perfect Perft for some good examples.
  • Use an engine such as Sharper to split out the perft of each sub move for a position (for Sharper this is the divide command).  This way you can see which move gives you a problem.
  • Write a “flip_board” routine which (as the name suggests) flips the board.  This means you can test each perft position from both white’s and black’s perspective.  This ensures you’re writing Color Blind Code!  Here’s my perft code which automatically flips the board:

//--Position 1
//-- Set the position once; the second pass then runs on the flipped board
set_fen(position, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
for(i = 0; i <= 1; i++){
    ok &= (perft(position, 6) == 119060324);
    flip_board(position);
}

  • Use "asserts" liberally in conjunction with a "board_integrity" routine.  The "board_integrity" routine is key.  It simply ensures that everything makes sense in the chess board object / record, returning "true" if everything is OK and "false" if it finds something wrong.  For example, does the "all_pieces" bitboard agree with each individual color's occupancy bitboards?  Does the pawn hash which has been created incrementally agree with a hash value created from scratch?  I add asserts at the beginning and end of almost every routine.  Here's my board integrity code:

BOOL integrity(struct t_board *board){
    int i;
    t_chess_color c;
    t_bitboard b;

    //-- Link between squares and bitboards
    for(i = 0; i < 64; i++){
        if (board->square[i] != BLANK)
            if ((board->piecelist[board->square[i]] & SQUARE64(i)) == 0)
                return FALSE;
    }

    //-- Color bitboards match up
    for(c = WHITE; c <= BLACK; c++){
        //-- The union of the piece bitboards must equal the color's occupancy
        b = 0;
        for(i = KNIGHT; i <= KING; i++){
            b |= board->pieces[c][i];
        }
        if (b != board->occupied[c])
            return FALSE;
        //-- Exactly one king, sitting on the recorded king square
        if (board->square[board->king_square[c]] != PIECEINDEX(c, KING))
            return FALSE;
        if (SQUARE64(board->king_square[c]) != board->pieces[c][KING])
            return FALSE;
        if (popcount(board->pieces[c][KING]) != 1)
            return FALSE;
    }

    //-- All pieces match up with occupancy
    if ((board->occupied[WHITE] | board->occupied[BLACK]) != board->all_pieces)
        return FALSE;

    //-- Castling rights: king and rook must still be on their home squares
    for (i = 0; i < 4; i++){
        if (board->castling & ((uchar)1 << i)){
            if (board->square[castle[i].king_from] != PIECEINDEX((i >> 1), KING))
                return FALSE;
            if (board->square[castle[i].rook_from] != PIECEINDEX((i >> 1), ROOK))
                return FALSE;
        }
    }

    //-- Hashing: incremental hashes must match hashes built from scratch
    if (board->pawn_hash != calc_pawn_hash(board))
        return FALSE;
    if (board->hash != calc_board_hash(board))
        return FALSE;

    return TRUE;
}
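For the "flip_board" tip, the core operation on a bitboard engine is mirroring each bitboard top to bottom and swapping the two colors. Here is a minimal sketch of that step. The `mini_position` struct is a hypothetical stand-in for illustration and is far smaller than a real board record.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitboard;

/* Mirror a bitboard top to bottom: rank 1 <-> rank 8, rank 2 <-> rank 7, ... */
bitboard flip_vertical(bitboard b)
{
    /* Swap adjacent ranks, then pairs of ranks, then the two board halves */
    b = ((b >> 8) & 0x00FF00FF00FF00FFULL) | ((b & 0x00FF00FF00FF00FFULL) << 8);
    b = ((b >> 16) & 0x0000FFFF0000FFFFULL) | ((b & 0x0000FFFF0000FFFFULL) << 16);
    return (b >> 32) | (b << 32);
}

/* Hypothetical, stripped-down position used only for this example */
struct mini_position {
    bitboard occupied[2];   /* [0] = white, [1] = black */
    int side_to_move;       /* 0 = white, 1 = black */
};

/* Flip the board and swap the colors, so white's pieces become black's */
void flip_position(struct mini_position *p)
{
    assert(p != 0);
    bitboard white = flip_vertical(p->occupied[0]);
    bitboard black = flip_vertical(p->occupied[1]);
    p->occupied[0] = black;
    p->occupied[1] = white;
    p->side_to_move ^= 1;
}
```

A complete routine would also have to mirror the per-piece bitboards, the square array, the castling rights and the en-passant square, and then rebuild the hash keys.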

If you follow these tips and your engine agrees with the perft scores of the suggested positions (in debug mode), then I'm 99% sure you'll have a bug free move generator and make / unmake move routine.  Writing test routines never sounds appealing, but having code which you know is (almost) bug free is a GREAT feeling!!

Perfect Perft!

Over the weekend I got the last couple of bugs out of Maverick’s perft routine.  It now has a perfect node match for all the positions I’ve tried – Yippee!!

In my opinion getting to this stage is a major milestone in the development of a chess engine.  If you can get this far you’re almost certainly capable of writing a fully functional chess engine.

It may be interesting to some if I share my approach to perft.  When finding positions to use in a perft test you need to come up with “odd” positions – ones which don’t occur too often and ones which are special cases.  Normally this means positions which contain the special chess moves, i.e. castling, en-passant, promotions and capture promotions.  In particular I’ve found the best positions for perft are those which involve getting into, or out of, check using these special moves.  Positions with lots of revealed checks and pinned pieces are also good.

Here’s the list of positions I used.  They are gleaned from Chess Programming on Wikispaces and this post on CCC:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; perft 6 = 119060324

r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -; perft 5 = 193690690

8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - -; perft 7 = 178633661

r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1; perft 6 = 706045033

1k6/1b6/8/8/7R/8/8/4K2R b K - 0 1; perft 5 = 1063513

TalkChess PERFT Tests (by Martin Sedlak)

//--Illegal ep move #1

3k4/3p4/8/K1P4r/8/8/8/8 b - - 0 1; perft 6 = 1134888

//--Illegal ep move #2

8/8/4k3/8/2p5/8/B2P2K1/8 w - - 0 1; perft 6 = 1015133

//--EP Capture Checks Opponent

8/8/1k6/2b5/2pP4/8/5K2/8 b - d3 0 1; perft 6 = 1440467

//--Short Castling Gives Check

5k2/8/8/8/8/8/8/4K2R w K - 0 1; perft 6 = 661072

//--Long Castling Gives Check

3k4/8/8/8/8/8/8/R3K3 w Q - 0 1; perft 6 = 803711

//--Castle Rights

r3k2r/1b4bq/8/8/8/8/7B/R3K2R w KQkq - 0 1; perft 4 = 1274206

//--Castling Prevented

r3k2r/8/3Q4/8/8/5q2/8/R3K2R b KQkq - 0 1; perft 4 = 1720476

//--Promote out of Check

2K2r2/4P3/8/8/8/8/8/3k4 w - - 0 1; perft 6 = 3821001

//--Discovered Check

8/8/1P2K3/8/2n5/1q6/8/5k2 b - - 0 1; perft 5 = 1004658

//--Promote to give check

4k3/1P6/8/8/8/8/K7/8 w - - 0 1; perft 6 = 217342

//--Under Promote to give check

8/P1k5/K7/8/8/8/8/8 w - - 0 1; perft 6 = 92683

//--Self Stalemate

K1k5/8/P7/8/8/8/8/8 w - - 0 1; perft 6 = 2217

//--Stalemate & Checkmate

8/k1P5/8/1K6/8/8/8/8 w - - 0 1; perft 7 = 567584    

//--Stalemate & Checkmate

8/8/2k5/5q2/5n2/8/5K2/8 b - - 0 1; perft 4 = 23527
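A list like this lends itself to a small table-driven test. The sketch below is one possible layout, not Maverick's code: the `perft_case` struct, the `run_perft_suite` function and the `perft_fn` callback type are hypothetical names, and `stub_perft` is only a stand-in so the example compiles on its own. The four table entries are copied from the list above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct perft_case {
    const char *fen;
    int depth;
    uint64_t expected;   /* known leaf node count */
};

/* A few of the positions listed above */
const struct perft_case perft_cases[] = {
    { "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324ULL },
    { "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -", 5, 193690690ULL },
    { "8/8/1k6/2b5/2pP4/8/5K2/8 b - d3 0 1", 6, 1440467ULL },
    { "K1k5/8/P7/8/8/8/8/8 w - - 0 1", 6, 2217ULL },
};

/* The engine supplies this: set up the FEN and return the leaf count */
typedef uint64_t (*perft_fn)(const char *fen, int depth);

/* Runs every case and returns the number of mismatches */
int run_perft_suite(const struct perft_case *cases, int n, perft_fn perft)
{
    int failures = 0;

    assert(cases != NULL && perft != NULL);
    for (int i = 0; i < n; i++) {
        uint64_t got = perft(cases[i].fen, cases[i].depth);
        if (got != cases[i].expected) {
            printf("FAIL: expected %" PRIu64 ", got %" PRIu64 " at depth %d [%s]\n",
                   cases[i].expected, got, cases[i].depth, cases[i].fen);
            failures++;
        }
    }
    return failures;
}

/* Stand-in only: a real engine would parse the FEN and search */
uint64_t stub_perft(const char *fen, int depth)
{
    (void)fen;
    (void)depth;
    return 2217;
}
```

Printing the failing FEN makes it easy to paste the position into another engine and compare the per-move breakdown.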

Tomorrow I’ll talk about the perft debugging process – it’s quite simple once you put a couple of things in place!

Is Perft Speed Important?

The perft routine in Maverick is almost fully debugged!  I believe this is an important milestone in any chess engine’s development.  It’s really the first time the code “plays” chess, in the sense that it generates moves, makes and un-makes the moves, and iterates up and down the search tree.

I also think the speed of the perft routine is significant and is a measure which can be used to compare engines.  Now some people will disagree with me but here is my logic:

Validation of Structure:  The structure of a chess program could be defined as how the board is represented and the way moves are generated and stored.  This impacts the overall speed of an engine.  A perft routine is a good measure of how efficiently this chess board structure can be manipulated.  Clearly there are other factors, most notably the size and complexity of the evaluation function.  But I would even argue a fast perft implementation is an indicator of a chess structure which can support a good evaluation function.  If you think about it, any half decent evaluation function must iterate over the squares and find some proxy for mobility.  At a basic level this is what is happening in a perft routine.  So I would argue a fast perft speed is an indication of a solid foundation upon which a strong chess engine can be developed.

Difficult to “Fiddle”:  Some people talk about an engine being “fast” based on the self reported measure of “number of nodes processed per second”.  The problem with nodes per second as a measure of speed is the definition of a node.  There is no standard definition of a node.  Some engine authors define it as a call to the “alpha-beta” routine, while others base it around the make / undo move routines, and then again others use the “generate-move” procedure.  There isn’t one standard.  And the measure can easily be fiddled.  This is not the case with perft.  There are only really two ways to count the number of nodes; one being the number of leaf nodes, and the other is the total nodes (internal and leaf).  It would seem the standard measure is the number of leaf nodes.

Having said it’s difficult to fiddle, there are several distinct approaches to perft routines and each one impacts the speed:

Make & Unmake All Moves: Most chess engines generate pseudo legal moves.  These are moves which are generally legal but may expose their king to a discovered attack and so are not actually legal.  The reason chess engines generate pseudo legal moves is to save the time spent checking to see if there is a discovered check (which is costly) and may not even be required if there is a “cut-off”.  So the test for the discovered check is often carried out as part of the “Make-Move” routine.  The simplest perft implementation simply iterates through the depths generating the moves, making and unmaking each move in turn and counting the number of nodes.  This is the slowest type of perft implementation.  I regard this type of perft as a measure of the efficiency of the make and unmake routines, since this is the task which is carried out the most.

Counting Legal Moves at The Leaf: In contrast to the first method, another approach is to generate only truly legal moves.  In this approach a lot of time can be saved at the leaf nodes by simply returning the number of moves generated (without having to make and unmake each one).  The cost is a slightly more complex move generator which must detect potential discovered check.  In general this approach will be quite a bit faster than the first approach.  I regard this approach as a measure of the efficiency of the move generation routines.
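The difference between these first two approaches comes down to what happens at the last ply. The sketch below shows both side by side against a hypothetical callback interface (`struct game` is an assumption, not Maverick's API). To keep it short both variants use a legal move generator, whereas the first approach would normally pair a pseudo legal generator with a legality test inside make-move. A toy "take 1 or 2 stones from a pile" game stands in for chess so the example can run on its own.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MOVES 256

/* Hypothetical engine interface, passed in as callbacks */
struct game {
    int  (*generate_legal)(const void *pos, int *moves);  /* returns move count */
    void (*make)(void *pos, int move);
    void (*unmake)(void *pos, int move);
};

/* Approach 1: make and unmake every move, including those at the last ply */
uint64_t perft_make_all(const struct game *g, void *pos, int depth)
{
    int moves[MAX_MOVES];
    uint64_t nodes = 0;

    assert(depth >= 0);
    if (depth == 0)
        return 1;
    int n = g->generate_legal(pos, moves);
    for (int i = 0; i < n; i++) {
        g->make(pos, moves[i]);
        nodes += perft_make_all(g, pos, depth - 1);
        g->unmake(pos, moves[i]);
    }
    return nodes;
}

/* Approach 2: at the last ply just return the number of legal moves */
uint64_t perft_bulk(const struct game *g, void *pos, int depth)
{
    int moves[MAX_MOVES];
    uint64_t nodes = 0;

    assert(depth >= 0);
    if (depth == 0)
        return 1;
    int n = g->generate_legal(pos, moves);
    if (depth == 1)
        return (uint64_t)n;     /* no make / unmake needed at the leaves */
    for (int i = 0; i < n; i++) {
        g->make(pos, moves[i]);
        nodes += perft_bulk(g, pos, depth - 1);
        g->unmake(pos, moves[i]);
    }
    return nodes;
}

/* Toy stand-in for a chess position: a pile of stones, a move takes 1 or 2 */
int toy_generate(const void *pos, int *moves)
{
    int pile = *(const int *)pos;
    int n = 0;
    for (int take = 1; take <= 2 && take <= pile; take++)
        moves[n++] = take;
    return n;
}

void toy_make(void *pos, int move)   { *(int *)pos -= move; }
void toy_unmake(void *pos, int move) { *(int *)pos += move; }

const struct game toy_game = { toy_generate, toy_make, toy_unmake };
```

Both functions must return the same count for any position and depth. The second one simply skips the make / unmake work for the final ply, which is where the bulk of the nodes are.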

Hashed Moves: In a perft search many transpositions occur.  This means the whole search can be significantly sped up by hashing the results and storing them in a hash table.  If the position reoccurs in the tree, the number of nodes in the sub-tree can be retrieved from the hash table and there is no need to search the sub tree.  I have not implemented this in Maverick.
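Since this isn't implemented in Maverick, the following is only a sketch of the general idea. It reuses a toy "take 1 or 2 stones from a pile" game in place of chess, because different move orders reach the same pile size and so produce transpositions. The table layout, the `toy_key` hash and all the names are assumptions. The important detail is that an entry is only reused when both the position key and the remaining depth match.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 1024u   /* power of two so the index can be masked */

struct perft_entry {
    uint64_t key;      /* position hash */
    uint64_t nodes;    /* leaf count of the sub-tree */
    int depth;         /* remaining depth the count is valid for */
    int used;
};

static struct perft_entry perft_table[TABLE_SIZE];
uint64_t perft_table_hits;   /* number of sub-trees answered from the table */

/* Toy hash: a chess engine would use its Zobrist key here */
uint64_t toy_key(int pile)
{
    return (uint64_t)pile * 0x9E3779B97F4A7C15ULL;
}

uint64_t perft_hashed(int pile, int depth)
{
    assert(pile >= 0 && depth >= 0);
    if (depth == 0)
        return 1;

    uint64_t key = toy_key(pile);
    struct perft_entry *e =
        &perft_table[(key ^ (uint64_t)depth) & (TABLE_SIZE - 1)];

    /* Same position at the same remaining depth: reuse the stored count */
    if (e->used && e->key == key && e->depth == depth) {
        perft_table_hits++;
        return e->nodes;
    }

    uint64_t nodes = 0;
    for (int take = 1; take <= 2 && take <= pile; take++)
        nodes += perft_hashed(pile - take, depth - 1);

    /* Always-replace scheme: overwrite whatever was in the slot */
    e->key = key;
    e->nodes = nodes;
    e->depth = depth;
    e->used = 1;
    return nodes;
}
```

Storing the depth alongside the key matters because the same position has a different sub-tree count at every remaining depth.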

Parallel Search: This is not something I have implemented in Maverick but the speed of the perft routine could be improved by implementing a parallel multiprocessor search.

Initial Perft Speed Results:

Based on the above logic I was eager to see how fast the new bitboard structure is compared to Monarch, which used a letter-box data structure. Monarch isn’t by any measure a fast searcher, so I was hoping for a speed improvement.  The position I used is from:

perft chess

FEN: r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

The number of leaf nodes to a depth of 5 ply is 193,690,690.  Monarch’s perft routine is of the “make / unmake all moves” type.  It managed to crank out the five ply perft for the above position in 71.5 seconds on my Core i7 2670QM running at 2.2 GHz.  To my surprise Maverick blew this away with a time of only 16.3 seconds.  This is about a 4.4 times speed up – woohoo!!  Maverick’s legal move generator approach completed perft 5 in exactly 5.0 seconds.

Both engines were compiled using Microsoft Visual Studio Express 2012 in 32 bit mode.  I imagine there will be a reasonable speedup when I move to 64 bit and add the SSE bit-twiddling routines.  This is a much bigger speed-up than I anticipated and illustrates the superiority of the magic bitboard approach.

What type of perft routine does your engine employ and how fast is it?

Up to my Knees in Debugging

I’ve been in Tokyo this week on business.  So there hasn’t been too much time for chess programming (sigh).  Hopefully I’ll have time on the 13 hour flight back via Dallas Fort-Worth. 

I’m in the process of using perft to debug the move generating and make / unmake routines.  For those who haven’t come across perft, it’s a routine which counts all of the possible positions reachable from a given starting position to a specified depth.  It’s a great way of debugging your core code as there are known positions for which stable programs have calculated the correct node counts. I’ll come back and talk about perft in a later post – once I’ve managed to get all of these bugs out!

As I’m debugging I’m struck by just how many bugs there are in my code.  I’ve always thought of myself as a reasonably defensive programmer and quite good at creating bug free code.  But there are a myriad of little bugs which are cropping up.  Take for example this macro:

#define SQUARE64(s) ((t_bitboard)1 << s)

It's not too complex.  It takes a square number between zero and 63 and returns the bitboard version.  But, to my surprise, it doesn't work all of the time!  I have some code which returns the potential en-passant square as a bitboard:

board->ep_square = SQUARE64((from + to) >> 1);

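This gave spurious results!  The reason was to do with the precedence of operators: the macro argument is pasted in as text, so the expression expanded to (t_bitboard)1 << (from + to) >> 1, and the two shifts were applied from left to right.  The corrected version simply encloses the "s" in brackets.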

#define SQUARE64(s) ((t_bitboard)1 << (s))
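To see the difference concretely, here is a small sketch that keeps both versions of the macro side by side. The `SQUARE64_UNSAFE` name, the two wrapper functions and the a1 = 0 square numbering (so e2 = 12, e3 = 20, e4 = 28) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t t_bitboard;

#define SQUARE64_UNSAFE(s) ((t_bitboard)1 << s)     /* original macro */
#define SQUARE64(s)        ((t_bitboard)1 << (s))   /* corrected macro */

/* Expands to ((t_bitboard)1 << (from + to) >> 1): the shifts run left to
   right, so the result is bit (from + to - 1), not bit (from + to) / 2 */
t_bitboard ep_unsafe(int from, int to)
{
    assert(from + to < 64);   /* a larger shift count would be undefined */
    return SQUARE64_UNSAFE((from + to) >> 1);
}

/* Expands to ((t_bitboard)1 << ((from + to) >> 1)), as intended */
t_bitboard ep_fixed(int from, int to)
{
    return SQUARE64((from + to) >> 1);
}
```

For a white pawn moving from e2 to e4 the fixed version sets bit 20 (e3), while the unsafe version sets bit 39. For a black double push the sum of the two squares is above 63, so the unsafe version shifts by more than the width of the type, which is undefined behavior in C.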

This is most likely a newbie error and highlights my lack of deep experience with "C", but it also illustrates the subtleties required to simply get a chess engine up and running!