50% Improvement in Perft Speed
I decided to see what Maverick’s performance is like through the lens of a profiler. I’m using CodeBlocks as my GCC IDE. It was remarkably simple to get the profiler working. The first thing I noticed was the large amount of time taken to see if a move resulted in discovered check, and was therefore illegal. The time consumed by this one (small) procedure was almost as much as the move-generation code. I thought something must be wrong.
After some prodding and poking of the code I realized there was a much better way to accomplish the same task. In my original code I was looking from the king through the “from_square” and seeing if there was a rook, bishop or queen on the relevant path. This is what I did in Monarch but it’s really letter-box style thinking. It’s much faster to calculate all of the pins at the start of the move generation process and store them in a bitboard. Then when you make a move, a simple “AND” of the “from_square” with the pinned bitboard tells you whether the piece is pinned, which is rare, and only a pinned piece can expose its own king to a discovered check by moving. So you only need to perform the full check if the piece is pinned, it’s a king move or it’s an en-passant move. I also “inlined” the “is_in_check_after_move” procedure.
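To make the idea concrete, here is a minimal sketch of computing a pinned-piece bitboard once per position and then using it as a cheap filter. It is not Maverick’s actual code: the names line_between, pinned_pieces and needs_legality_check are hypothetical, the square numbering (a1 = 0, h8 = 63) is an assumption, and the plain loop along the line stands in for whatever faster lookup a real engine would use. __builtin_ctzll is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t bitboard;
#define BIT(sq) (1ULL << (sq))

/* Returns 1 if squares a and b share a rank, file or diagonal.
   On success *line holds the squares strictly between them and
   *straight is 1 for a rank/file, 0 for a diagonal. */
static int line_between(int a, int b, bitboard *line, int *straight)
{
    int df = (b & 7) - (a & 7);    /* file difference */
    int dr = (b >> 3) - (a >> 3);  /* rank difference */
    int step, sq;

    assert(a >= 0 && a < 64 && b >= 0 && b < 64);
    if (a == b || (df != 0 && dr != 0 && abs(df) != abs(dr)))
        return 0;                  /* same square or not aligned */
    step = 8 * ((dr > 0) - (dr < 0)) + ((df > 0) - (df < 0));
    *straight = (df == 0 || dr == 0);
    *line = 0;
    for (sq = a + step; sq != b; sq += step)
        *line |= BIT(sq);
    return 1;
}

/* Computed once per position: our pieces that are the only blocker
   between our king and an enemy slider attacking along that line.
   rook_like = enemy rooks + queens, bishop_like = enemy bishops + queens. */
static bitboard pinned_pieces(int king_sq, bitboard own, bitboard occupied,
                              bitboard rook_like, bitboard bishop_like)
{
    bitboard pinned = 0;
    bitboard sliders = rook_like | bishop_like;

    while (sliders) {
        int sq = __builtin_ctzll(sliders);  /* index of lowest set bit */
        bitboard line, blockers;
        int straight;

        sliders &= sliders - 1;             /* clear that bit */
        if (!line_between(king_sq, sq, &line, &straight))
            continue;
        /* the slider must be able to move along this kind of line */
        if (!((straight ? rook_like : bishop_like) & BIT(sq)))
            continue;
        blockers = line & occupied;
        /* exactly one blocker, and it is one of our own pieces */
        if (blockers && !(blockers & (blockers - 1)) && (blockers & own))
            pinned |= blockers;
    }
    return pinned;
}

/* Per move: the cheap test that decides whether the expensive
   in-check-after-move test has to run at all. */
static int needs_legality_check(bitboard pinned, int from_sq,
                                int is_king_move, int is_en_passant)
{
    return is_king_move || is_en_passant || (pinned & BIT(from_sq)) != 0;
}
```

In a move loop, the full test (the “is_in_check_after_move” procedure above) would only be called when needs_legality_check returns non-zero. A pinned piece still gets the full test rather than being rejected outright because it may legally move along the pin line.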
The result of these changes is a boost to the perft speed of about 50%. Maverick now crunches 76 million nps on my humble 2.2 GHz i7 notebook.
As an aside, my notebook is two years old. On my wife’s newer 2.4 GHz i7 Maverick’s speed is 99 million nps. That is about 30% faster on a clock speed only about 9% higher, so somehow Maverick’s architecture is more suited to newer machines. I assume it’s a bigger cache.