Sunday, April 21, 2013

Me vs evolution: 0 - 2

I've been fighting a nasty cold for a week now, so I tried to relax a bit over the weekend. That meant no sports and no programming, but at least my computer could do some calculations.

The evolved weight set from the last PBIL run was much stronger than my hand-selected set, but some of the weights looked suspicious.

The penalty for a doubled pawn was almost 0, the penalty for a tripled pawn was still very low, and a weight related to the pawn shelter was exactly 0. So I thought I would correct those values manually. Just a little bit.

I created a modified set, called ice.tom, and let it play another 6000 games against evol-2. But I was not able to improve the set this way. The one that came out of evolution seemed stronger.

Considering the error bars, there is still a small chance that my set is stronger, but I doubt it.

Rank Name         Elo    +    - games score oppo. draws
   1 ice.evol-2    89    4    4 18000   55%    51   36%
   2 ice.tom       84    7    7  6000   49%    89   43%
   3 ice.evol-1    69    5    5 12000   54%    44   33%
   4 ice.04         0    5    5 12000   39%    79   28%
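
To put a rough number on that remaining chance, the Elo values and error bars from the table can be fed into a normal approximation. This is only a sketch under loud assumptions: it treats the +/- columns as symmetric 95% intervals and the two ratings as independent, neither of which is stated in the table (ratings from one pool are usually correlated). The function name superiority_probability is made up for this example.

```python
import math


def superiority_probability(elo_a, err_a, elo_b, err_b, z=1.96):
    """Rough probability that A is truly stronger than B.

    ASSUMPTIONS (not taken from the rating tool): err_a and err_b are
    symmetric 95% intervals (hence z=1.96), and the two rating
    estimates are independent and normally distributed.
    """
    # Convert the interval half-widths into standard deviations.
    sigma_a = err_a / z
    sigma_b = err_b / z
    # Standard deviation of the difference of two independent estimates.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    # Standard normal CDF evaluated at the scaled rating difference.
    z_score = (elo_a - elo_b) / sigma_diff
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))


# Rows from the table: ice.tom 84 +/- 7, ice.evol-2 89 +/- 4.
p_tom = superiority_probability(84, 7, 89, 4)
print(f"P(ice.tom stronger than ice.evol-2) ~ {p_tom:.2f}")
```

Under these assumptions the estimate lands at roughly one in ten. Treat it as a crude figure: a rating tool that reports a likelihood of superiority directly from the game results would be the better source.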


For the moment I'm convinced that PBIL is better at tuning than I am.

2 comments:

  1. Try letting your hand-tuned engine play against engines not made by you and see whether the PBIL version is still stronger than ice.tom.

  2. Yes, but in such an indirect comparison the error bar is 4 times as high as in a direct comparison. Even here, with 6000 games, the result was not conclusive, so it would probably take about 100k games in total to be really sure. And the result would only be of temporary interest, as I'm still changing the evaluation and also the general search behavior.

    In practice I think one should perform more runs and then take the strongest final solution. Two is not enough; 10 would be better. But here, too, you have to balance how many CPU hours you're willing to spend on it.

    Currently I'm thinking about search parameter tuning, so I will concentrate my efforts in that area. A different and interesting challenge again.
