Calibrating Ratings for Bots 3.0

For the third iteration of our bots, we put a lot of effort into making sure we have bots at every level and in every playstyle. We adjusted the bots to cover the entire rating range evenly, and tried our best to keep the ratings as transparent and logical as possible.

What Is Bots 3.0 and What Changed

Bots 3.0 is the third iteration of our bots. A lot changed, including playstyles and opening books. With so many changes, we also needed to do a full rating calibration, and it moved ratings around quite a bit.

How We Calibrate Ratings

For the actual measurement and calibration, we use the same methodology as before. You can read more about it here. The difference this time is that we have finer control over the ratings through several methods. For example, reducing a bot's likelihood of blundering increases its rating. We've used this to make sure we have bots at all rating levels.
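To make the blunder knob concrete, here is a minimal sketch of how such a setting could work. It is an illustration only, not our actual engine code: the function names, the move scores, and the idea of a single `blunder_prob` parameter are all assumptions made up for this example.

```python
import random


def choose_move(scored_moves, blunder_prob, rng):
    """Pick a move from (move, score) pairs, where a higher score is better.

    With probability `blunder_prob` (a hypothetical tuning knob) the bot
    skips the best move and plays one of the weaker alternatives instead.
    """
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    best_move = ranked[0][0]
    weaker_moves = [move for move, _ in ranked[1:]]
    if weaker_moves and rng.random() < blunder_prob:
        return rng.choice(weaker_moves)
    return best_move


def best_move_rate(blunder_prob, trials=10_000, seed=1):
    """Fraction of trials in which the bot finds the best move."""
    rng = random.Random(seed)
    # Made-up candidate moves with made-up evaluations, for illustration.
    moves = [("Nf3", 0.3), ("h4", -0.8), ("Qh5", -2.5)]
    hits = sum(choose_move(moves, blunder_prob, rng) == "Nf3" for _ in range(trials))
    return hits / trials
```

Lowering `blunder_prob` makes the bot find the best move more often, which is the kind of change that pushes a measured rating up.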

Difficulty of Creating Weaker Bots

One big goal with Chessiverse is to offer fun and challenging bots for all users, from absolute beginners to strong masters. When we introduced Bots 2.0, we added a new type of bot that played a lot stronger, which gave our strongest users a challenge. However, the weakest bots were still quite hard to beat. We've now fixed that by drastically reducing the playing strength of our lowest-rated bots.

It might sound easy to make a bot play worse, but it's really one of the hardest things we've done. At low levels the bots have to blunder a lot, but obvious blunders can feel very computery. The most common complaint is that a bot plays very well and then suddenly blunders its queen. That's not a fun experience, so we've done our best to let the bots blunder often while also playing weakly in other areas. They might neglect development, overextend, miscalculate trades, or make all kinds of other beginner mistakes. We think we've done a good job of this, but you be the judge!

So What About the Ratings

So how do the new ratings compare to the old ratings? In general we think that users will gain about 200 rating points, all else being equal. This is a result of re-evaluating how our anchor bots played, and of reviewing feedback and user behaviour.

Our lowest-rated bots still sit at around 800, but they're now significantly weaker than our old 800-rated bots. At the other end, our highest-rated bots now go up to 2800, while remaining about as strong as before.

Why Can't You Just Add Dynamic Ratings

One common question we get is why we can't just let the bots' ratings change as they win or lose. That would be the most accurate measurement, right? My answer is simple: since we allow people to do whatever they want against the bots (use opening books, cheat, abandon games, whatever you like, we don't judge), dynamic ratings will fluctuate. A lot. For example, if someone decided to punish a bot with Stockfish for ten games, that bot's rating would tank, and there's really nothing we could do to fix that short of introducing cheat detection and similar measures, which we of course don't want.
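To get a feel for the size of that effect, here is a rough simulation. It assumes a plain Elo update with a K-factor of 32 and made-up starting ratings; those are assumptions for illustration, and the real formula and numbers may differ.

```python
def expected_score(rating, opponent_rating):
    """Expected score (0 to 1) under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def update_rating(rating, opponent_rating, score, k=32):
    """Return the new rating after one game (score: 1 win, 0.5 draw, 0 loss).

    k=32 is an assumed K-factor, chosen only for this example.
    """
    return rating + k * (score - expected_score(rating, opponent_rating))


def rating_after_losses(bot_rating, opponent_rating, games, k=32):
    """Rating of a bot that loses `games` times in a row to the same account."""
    for _ in range(games):
        bot_rating = update_rating(bot_rating, opponent_rating, 0.0, k)
    return bot_rating


# Hypothetical case: a 1500 bot loses ten games to an account rated 1200
# whose moves are really coming from an engine.
final_rating = rating_after_losses(1500, 1200, games=10)
print(round(final_rating))
```

Because the update only sees the account's rating, each loss looks like a big upset, and under these assumptions ten of them in a row cost the bot a couple of hundred points.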

Another problem is that so far we've kept users from interacting with each other, directly or indirectly. We don't have leaderboards, and we don't have public profiles. This might change in the future (always opt-in, of course), but we've steered clear of anything like that so far. A dynamic rating for a bot is directly affected by how users play against it, so indirectly everyone will be affected by your performance. This might not be all bad, but it's something to be very aware of.

But, We Just Added Dynamic Ratings!

Sorry for the misdirection. :) The previous section is all true, but after thinking about it, we decided it could be a really fun thing to have. So from now on, bots will have two ratings. The first is a static one, carefully calibrated with hundreds of thousands of bot games and adjusted against Lichess blitz ratings to give a good indication of what you can expect from the bot. This is the rating you'll filter on and search for on the Explore page and elsewhere.

The second is a dynamic rating that changes after every game the bot plays. It's based on the opponent's rating, just like ratings are calculated for anyone else. In the image below you can see the static rating, and below it the current dynamic rating. You will also be able to see a quick history of how the bot's rating changes.
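For readers who haven't seen how an opponent-based rating update works, the classic Elo formula is the textbook example. It is shown here as an illustration only; the exact system and constants behind the dynamic ratings are not specified in this post, and the symbols and the K-factor of 32 below are assumptions.

```latex
% E: the bot's expected score against this opponent
% S: the actual result (1 = win, 0.5 = draw, 0 = loss)
% K: step size, assumed to be 32 in the example below
E = \frac{1}{1 + 10^{(R_{\text{opp}} - R_{\text{bot}})/400}},
\qquad
R_{\text{bot}}' = R_{\text{bot}} + K\,(S - E)

% Worked example: a 1500 bot beats a 1700 opponent
E = \frac{1}{1 + 10^{0.5}} \approx 0.24,
\qquad
R_{\text{bot}}' \approx 1500 + 32\,(1 - 0.24) \approx 1524
```

The bigger the upset, the bigger the jump, which is why a rating computed this way depends so heavily on who the bot happens to play.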

I have no idea how this will work out. Perhaps dynamic ratings will be super accurate and eventually replace the static ones, or perhaps they'll be a wild rollercoaster that says nothing about the bot's strength. We'll see!

In Summary

Bots should be rated about 200 points higher than before, but with all the new playstyles and opening books, it's very hard to say how this will feel while playing. Dynamic ratings are a fun addition, and we'll keep a close watch on them!
