Is Perft Speed Important?

The perft routine in Maverick is almost fully debugged!  I believe this is an important milestone in any chess engine’s development.  It’s really the first time the code “plays” chess, in the sense it generates moves, make and un-makes the moves, and iterates up and down the search tree.

I also think the speed of the perft routine is significant and is a measure which can be used to compare engines.  Now some people will disagree with me but here is my logic:

Validation of Structure:  The structure of a chess program could be defined as how the board is represented and the way moves are generated and stored.  This impacts the overall speed of an engine.   A perft routine is a good measure of manipulating this chess board structure.  Clearly there are other factors, most notable the size and complexity of the evaluation function.  But I would even argue a fast perft implementation is a indicator of a chess structure which can support a good evaluation function.  If you think about it, any half decent evaluation function must iterate over the squares and find some proxy for mobility.   At a basic level this is what is happening in a perft routine.  So I would argue a fast perft speed is an indication of a solid foundation upon which a strong chess engine can be developed.

Difficult to “Fiddle”:  Some people talk about an engine being “fast” based on the self reported measure of “number of nodes processed per second’.  The problem with nodes per second as a measure of speed is the definition of a node.  There is no standard definition of a node.  Some engine authors definite it as a call to the “alpha-beta” routine, while other base it around the make / undo move routines, and then again others use the “generate-move” procedure.  There isn’t one standard.  And the measure can easily be fiddled.  This is not the case with perft.  There are only really two ways to count the number of nodes; one being the number of leaf nodes, and the other is the total nodes (internal and leaf).  It would seem the standard measure is the number of leaf nodes.

Having said it’s difficult to fiddle there are three distinct approaches to perft routines and each one impacts the speed:

Make & Unmake All Moves: Most chess engines generate pseudo legal moves.  These are moves which are generally legal but may expose their king to a discovered attack and so are not actually level.  The reason chess engines generate pseudo legal moves is to save the time checking to see if there is a discovered check (which is costly) and may not even be required is there is a “cut-off”.  So the test for the discovered check is often carried out as part of the “Make-Move” routine.  The simplest perft implementation simply iterates through the depths generating the moves, make and unmaking each move in turn and counting the number of nodes.  This is the slowest type of perft implementation.  I regard this type of perft as a measure of the efficiency of the make and unmake routines, since this is the task which is carried out the most.

Counting Legal Moves at The Leaf: In contrast to the first method, another approach is to generate only truly legal moves.  In this approach a lot of time can be saved at the leaf nodes by simply returning the number of moves generated (without having to make and unmake each one).  The cost is a slightly more complex move generator which must detect potential discovered check.  In general this approach will be quite a bit faster than the first approach.  I regards this approach as a measure of the efficiency of the move generation routines.

Hashed Moves: In a perft search many transpositions occur.  This means the whole search can be significantly sped up by hashing the results and storing them in a hash table.  If the position reoccurs in the tree, the number of nodes in the sub-tree can be retrieved from the hash table and there is no need to search the sub tree.  I have not implemented this in Maverick.

Parallel Search: This is not something I have implemented in Maverick but the speed of the perft routine could be improved by implementing a parallel multiprocessor search.

Initial Perft Speed Results:

Based on the above logic I was eager to see how fast the new bitboard structure is compared to Monarch, which used a letter-box data structure. Monarch isn’t by any measure a fast searcher, so I was hoping for a speed improvement.  The position I used is from:

perft chess

FEN: r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq –

The number of leaf nodes to a depth of 5 ply deep is 193,690,690.  Monarch’s perft routine is of the “make / unmake all moves” type.  It managed to crank out the six five ply perft for the above position in 71.5 seconds on my Core i7 2670QM running a 2.2 GHz.  To my surprise Maverick blew this away with a time of only 16.3 second.  This is 4.3 times speed up – woohoo!!  Maverick’s legal move generator approach completed perft 5 in exactly 5.0 seconds.

Both engines were compiled using Microsoft Visual Studio Express 2012 in 32 bit mode.  I imagine there will be a reasonable speedup when I move to 64 bit and add the SSE bit-twiddling routines.  This is a much bigger speed-up than I anticipated and illustrates the superiority of the magic bitboard approach.

What type of perft routine does your engine employ and how fast is it?