Chess Engine Testing Framework

Chess-Engine-TestingSince Maverick is now playing chess I really need a testing framework.  Testing has always been important but I think Fabian Letouzey really “turned some heads” when he started making rapid progress with Fruit back in 2004.  His secret turned out to be making decisions about engine parameters and potential improvements based on cold hard facts i.e. testing.  Since then we’ve seen much more discussion about testing.  Then came Rybka and Houdini who championed, what I’d call, turbo-testing.  This is where the games are super short – something like 60 moves in 20 seconds.  Ten years ago I think this would have been frowned upon as too fast and the results will likely not scale to longer time controls.  As it turns out this fear seems to be incorrect.  Turbo-testing seems to be the best way to quickly test potential improvements.

When developing Monarch my testing really wasn’t systematic.  I’d run games of varying lengths.  Maybe 50 games or at most 100 games.  In hindsight Monarch really wasn’t tested well.  I’m determined not to do that with Maverick.  Thankfully there are also some great tools to help with the testing.  The tool which seems to be regarded as the best is CuteChess-Cli.  Another one is LittleBlitzer.  Both enable you to run multiple games in parallel and with minimal screen updates.  I decided to plump for CuteChess-Cli as my tool of choice.  Here’s how I went about setting up the testing framework:

  1. First you need to download CuteChess-Cli
  2. I decided to create a folder for the testing framework as a sub-folder of my root directory.  This way any paths will be quite short.
  3. I then selected a number of engines which will be Maverick’s sparing partners.  The criteria I set for selecting the engine is as follows:
    • Stable – each engine must be rock solid stable
    • A playing strength of between 1800 ELO and 2200 ELO.  This will provide a range of opponents, all of whom will initially be better players.
  4. The engines I selected are as follows:
    • Monarch 
    • Clueless
    • MadChess
    • Predateur
  5. I created a few “good-old-fashioned” batch files to execute the actual tests.  Here’s an example of one of the test batch files:


SET varYYYY=%DATE:~10,4%
SET varMM=%DATE:~4,2%
SET varDD=%DATE:~7,2%
SET varTodaysDate=%varYYYY%%varMM%%varDD%

SET PGN_DATABASE=Games%varTodaysDate%.pgn

c:CuteChesscutechess-cli -engine name="Maverick 0.10 Beta" cmd=Maverick dir=c:CuteChessMaverickNew1 -engine name="Predateur 2.2.1 x32" cmd=Predateur dir=C:CuteChessPredateur -each proto=uci tc=60/180 book=c:CuteChessobook.bin -pgnout c:CuteChessGames%PGN_DATABASE% -games 500 -concurrency 4 -wait 5

pause 10

As you can see this, runs one match of 500 games between Maverick and Predateur.  Four games are played concurrently on the same machine.  All the games are stored in a PGN which contains today’s date (you may need to adjust the code at the top of the bat file for different date formats – mine is US).   For editing these types of files I’d suggest NotePad++.  When installed you simply right click on a .bat file and you have the option to edit it using NotePad++.  It also color code the formatting – very useful.

Since it may be useful to some of you, here’s my whole testing framework zipped up and ready for download.

  • Emilio

    Most interesting stuff Steve.

    And the batch file you add can be quite useful to whoever is starting to use cutechess-cli.

  • cd

    Hi, good tool.

    Here’s a timestamp for replacement


    IF “%time:~0,1%” LSS “1” (
    SET varTodaysDate=%date:~6,4%-%date:~3,2%-%date:~0,2%-0%time:~1,1%-%time:~3,2%-%time:~6,2%
    ) ELSE (
    SET varTodaysDate=%date:~6,4%-%date:~3,2%-%date:~0,2%-%time:~0,2%-%time:~3,2%-%time:~6,2%

    SET PGN_DATABASE=Games%varTodaysDate%.pgn

    c:CuteChesscutechess-cli -engine name=”Maverick 0.10 Beta” cmd=Maverick dir=c:CuteChessMaverickNew1 -engine name=”Predateur 2.2.1 x32″ cmd=Predateur dir=C:CuteChessPredateur -each proto=uci tc=60/180 book=c:CuteChessobook.bin -pgnout c:CuteChessGames%PGN_DATABASE% -games 500 -concurrency 4 -wait 5

    pause 10[/code]

  • cd

    it’s supposed to fix such errors .

  • Steve Maughan

    Hi CD – yes these error can occur with European date and time formats. Does your script work with any date and time format?



  • cd

    Hi Steve
    I dont Know, give it a try and tell me. I pick up on a forum