FIDE rating changes: Are they working so far?

March 2024 brought a compression of the minimum rating to 1400 and some calculation improvements. Let's discuss their impact, and some alternative ratings too!

Oct 23, 2024

Let me start by saying that I am not affiliated with FIDE or any other entity that seeks to govern the complex machinations in the world of chess. This is an independent report, made out of interest in numbers - specifically, FIDE ratings in Standard chess, and how validly they express true strength. Now that we've got that out of the way, some caveats are in order. While I have been a sounding board for Jeff Sonas in the past, I like running my own numbers and hypotheses. Additionally, the changes are still nascent, and long-term trends should not be inferred from the 7 months of data since their implementation. That is to say, “take any big statements that I make with an equally large grain of Himalayan salt!”

There are two families of data sets that I will reference throughout this analysis. One is the FIDE Ratings Download page, which currently extends all the way back to the February 2015 monthly rating list. This allows for meaningful longitudinal analysis. The other is the Universal Rating System (URS), an independent rating system commonly used in the Grand Chess Tour. At the time of writing, their Downloads page is not indexed properly, but I still managed to download the monthly lists using a little Python coding. The URS serves as a perfect validation for FIDE ratings…more on this a bit later.

The structure of the article will be as follows, just so you know what you’re going to be reading ahead of time:

FIDE Standard ratings distribution
Effectiveness of calculation improvements on new player ratings
Geographical disparities
Some factors that explain the disparities

You don’t have to be a data nerd in order to enjoy the pictures and graphs. However, familiarity with basic statistics at high school level is implied throughout the discussion. A bit of knowledge about the way FIDE ratings work would also help in parsing the material, and so would familiarity with the main discussion points of this Jeff Sonas article.

1. FIDE Standard ratings distribution

The above graph is a snapshot of all people with a FIDE Standard rating on the October 2024 list (nearly half a million players). The decision to cut off the graph at 2400 Elo and 80 years old was made consciously: the categories beyond those cutoffs are so sparsely populated that they wouldn’t add any contrast. It should be clear that teenagers are dominating the chess world nowadays, by sheer presence. Their accumulation in the 1400-1500 Elo range should be particularly alarming for adults facing them, as they tend to be extremely underrated.

In the Sonas article (on page 8), three age categories are introduced: the under-19 improvers, the stable players aged 20-38, and the 39+ decliners. I wanted to verify that this delineation still makes sense post-compression. Here’s what the distribution looked like in March, immediately after the compression.

You can observe the sudden drop-off in the number of players at 2000 Elo, which is a direct consequence of the formula. Remember, only players rated below 2000 gained points “for free.” This behavior is natural, like crowding more people into a confined space.
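To make the crowding concrete, here is a minimal sketch of the compression, assuming it was a one-time gain of 40% of each player's distance to 2000 (the function and parameter names are my own, not FIDE's):

```python
def compress(rating, ceiling=2000.0, share=0.40):
    """One-time adjustment: players below `ceiling` gain `share` of their
    distance to it; everyone at or above the ceiling is left unchanged."""
    if rating >= ceiling:
        return rating
    return rating + share * (ceiling - rating)


# The old 1000-2000 band (1000 points wide) lands in 1400-2000 (600 points wide).
for old in (1000, 1500, 1900, 2000, 2200):
    print(old, "->", round(compress(old)))
```

Under this assumption, everyone previously spread over a 1000-point band is squeezed into a 600-point band that ends exactly at 2000, which is where the step in the distribution appears.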

And here is what it looks like now, in October 2024.

An astute reader will sum up the percentages and point out that they add up to less than 100%. That is correct: the FIDE players database contains entries without a birth year…
The improvers pool is steadily gaining representation (+2.6%) and rating (+2 Elo), as expected.
The stable pool hasn’t been that stable lately, losing both representation (-1.1%) and rating (-3 Elo) in the 7-month interval.
The decliners pool is steadily losing representation (-1.5%) and rating (-7 Elo), as expected.
Overall, deflationary pressures on the system have remained, albeit quite mild compared to previous years.

If anything, we should be extremely happy that the 3-fold segmentation into age categories seems not only accurate, but also descriptively named. This is a trend that should be followed years into the future. At the current rate of growth, I expect the number of Improvers to exceed the number of Decliners in early 2027 at the latest. That is when the deflationary pressure on the system should subside, which brings me to my next point…

2. Effectiveness of calculation improvements on new player ratings

As explained in the FIDE Handbook on Rating Regulations, a player needs to face at least 5 opponents with FIDE ratings and perform adequately in order to obtain their first official rating. Article 8.2.2. contains one of the calculation improvements, which is to add two virtual draws against 1800 opponents for any new player in the system. The main reason behind it was to avoid players entering close to the rating floor, and being underrated compared to their actual playing strength. It is time to evaluate the impact of this rule change, as it pertains to classical ratings only. In order to do so, the methodology is as follows:

identify players in the Oct 2024 Standard FRL who were not present in the Mar 2024 iteration and perform summary statistics (N=28529).
similarly look at players on the Dec 2023 FRL who were not present in the Mar 2023 iteration. The interval was chosen specifically to represent a roughly equal sample of newly rated players (N=26989).
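The two steps above boil down to a set difference on FIDE IDs followed by summary statistics. A minimal sketch, assuming each rating list has already been parsed into a dictionary mapping FIDE ID to rating (the toy IDs and ratings are made up for illustration):

```python
from statistics import mean, median


def newly_rated(later, earlier):
    """Players (FIDE ID -> rating) present in `later` but absent from `earlier`."""
    return {fid: rating for fid, rating in later.items() if fid not in earlier}


# Made-up toy lists standing in for the Mar 2024 and Oct 2024 downloads.
mar_2024 = {101: 1850, 102: 2100}
oct_2024 = {101: 1862, 102: 2095, 103: 1540, 104: 1612, 105: 1475}

new = newly_rated(oct_2024, mar_2024)
ratings = list(new.values())
print(len(ratings), round(mean(ratings)), median(ratings))
```

Note that the rating recorded for a "new" player is the one on the later list, so it already includes any games played since the first published rating.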

Lots to unpack here! If anything, it appears that players are injected CLOSER to the rating floor once the 40% compression is accounted for. The average Elo of 1331 on the left-hand side would correspond to a compressed Elo of 1599 on the right-hand side, when in reality the average value is 1571. This is quite a shock, as the goal of the two virtual draws against 1800 opponents was to push this injection point further away from the rating floor. There is a caveat, however…bonus points if you write a comment detailing why this methodology is a bit hand-wavy ;)
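For intuition about what the two virtual draws do, here is a rough sketch that only computes the inputs to an initial rating (average opponent rating and score fraction) before and after adding them. The final conversion from these inputs to a published rating is deliberately left out, and the helper name and sample results are hypothetical:

```python
def with_virtual_draws(opponent_ratings, score, n_virtual=2, virtual_rating=1800):
    """Return (average opponent rating, score fraction) after adding
    `n_virtual` hypothetical draws against `virtual_rating` opponents."""
    games = len(opponent_ratings) + n_virtual
    average = (sum(opponent_ratings) + n_virtual * virtual_rating) / games
    fraction = (score + 0.5 * n_virtual) / games
    return average, fraction


# Hypothetical newcomer: 2.5/5 against five 1500-rated opponents.
avg, fraction = with_virtual_draws([1500] * 5, score=2.5)
print(round(avg), fraction)
```

In this toy case the score fraction stays at 50% while the average opponent rating is pulled up toward 1800, which is the mechanism meant to lift new players away from the floor.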

What if we look at segmentation by age category, could that show the culprit?

This is decidedly worse! If anything, the injection of new players into the system followed a more pleasantly shaped bell curve in 2023. The U19 players are still injected too close to the floor, and the main effect of the calculation improvements so far has been to add noise. I am not sure that such a panicky conclusion is truly worth broadcasting after only 7 months, but someone may want to forward this article to the FIDE QC, because things are not going as expected. If I were them, I would monitor this trend, and keep detailed statistics of where people are injected into the system.

Since I have already started the task independently, I will share some results here. In my opinion, the culprit is not the calculation formula itself, but…(article continues below)

3. Geographical disparities

It is no secret that some countries are more underrated than others. Ask any GM how they feel about facing young 2300s coming from countries such as India, China, Kazakhstan, Armenia, etc. Their reaction should tell you everything there is to know. The reason for this disparity is mostly the nature of the rating pool in each of these countries. For the most part, it is a closed system, with few players traveling abroad and mixing with other federations. I shall try to establish this fact by looking at the injection point of new players, then by comparing the FIDE ratings to the independently maintained URS database. Hopefully some trends will begin to emerge - I have already posted some findings on my Twitter, which were received well, including by someone named Anish Giri.

We can infer that some countries are more deflated than others by using two separate methods. First, let’s look at the injection point of new players into the system. This data also captures the rating evolution of some of these new players between the March list (when they were not yet present in the FIDE list) and October. To keep the data meaningful, I retained only federations that added at least 100 players with standard ratings during this 7-month interval. That’s a whopping 56 federations!

The box plot with whiskers shows a clear marker representing the average near the middle of each box. 95% of the entire distribution is contained between the ends of the whiskers, and the data points beyond that are big outliers. Some of the usual suspects are present here - India, Armenia, Uzbekistan, etc. If we assume that FIDE Elo is truly a universal measure of skill and that geographical disparities do not exist, we would expect all of these boxes to be roughly aligned with each other. After all, there’s no reason why a new OTB player in the Netherlands should be 300 points stronger, on average, than someone in Sri Lanka taking up active OTB play. Clearly, the geographical disparity is a pronounced effect.
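The per-federation numbers behind a plot like this can be sketched as follows, assuming the new players are available as (federation, rating) pairs. The federation codes and ratings below are made up, and the whiskers are taken as the 2.5th and 97.5th percentiles to match the 95% description:

```python
from collections import defaultdict
from statistics import mean, quantiles


def federation_summary(players, min_players=100):
    """players: iterable of (federation, rating) pairs for newly rated players.
    Returns {federation: (mean, 2.5th percentile, 97.5th percentile)} for
    federations with at least `min_players` entries."""
    by_fed = defaultdict(list)
    for fed, rating in players:
        by_fed[fed].append(rating)
    summary = {}
    for fed, ratings in by_fed.items():
        if len(ratings) < min_players:
            continue  # too few new players to say anything meaningful
        cuts = quantiles(ratings, n=40)  # 39 cut points, one every 2.5%
        summary[fed] = (mean(ratings), cuts[0], cuts[-1])
    return summary


# Made-up data: "AAA" has 100 new players, "BBB" only 5 and is dropped.
players = [("AAA", 1500 + i) for i in range(100)] + [("BBB", 1700)] * 5
summary = federation_summary(players)
print(summary)
```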

What if I told you there was a way to compare the validity of FIDE ratings by using an independent rating system?

Enter the Universal Rating System, referenced in the introduction. I won’t bore you with technical details. If I were to summarize its inner workings, I would describe it as a recursive performance rating over the past 6 years of activity, taking into account FIDE-rated games in all time controls. There is also an exponential decay curve, which weights recent games more heavily than older ones. Very logical, logical, logical chat.
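To illustrate the idea of exponential decay weighting (and only the idea: this is not the URS formula, and the 2-year half-life is an arbitrary assumption of mine), here is a small sketch of a recency-weighted average over a 6-year window:

```python
def decay_weight(age_years, half_life_years=2.0):
    """Weight of a result that is `age_years` old; halves every half-life."""
    return 0.5 ** (age_years / half_life_years)


def recency_weighted_average(results, half_life_years=2.0, window_years=6.0):
    """results: iterable of (performance, age in years).
    Results older than the window are ignored; the rest are averaged with
    exponentially decaying weights. Returns None if nothing is in the window."""
    weighted = [(p, decay_weight(a, half_life_years))
                for p, a in results if a <= window_years]
    total = sum(w for _, w in weighted)
    if total == 0:
        return None
    return sum(p * w for p, w in weighted) / total


# Hypothetical player: a 2000 performance today, 1800 two years ago,
# and a 1500 performance seven years ago that falls outside the window.
value = recency_weighted_average([(2000, 0.0), (1800, 2.0), (1500, 7.0)])
print(round(value))
```

The point of the sketch is that a fast-improving junior's recent results dominate the average, so the estimate tracks current strength more closely than a rating that moves one game at a time.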

If you are still skeptical, let me superimpose the current rating distribution in both systems, as of now. The dataset after the intersection contains roughly 220k player entries, so it’s a significant sample.

If this was a beauty contest, the winner should be clear. The URS rating distribution is nearly a true Gaussian. It’s so beautiful that sometimes I wonder why it hasn’t been met with widespread adoption among chess players. Old habits die hard, I suppose… Where am I going with this? Buckle up, as the next graph is gonna blow your mind for sure. In the previous section, we established that U19 players are injected too low in the current distribution. Let’s see, maybe URS can capture the true strength of U19 players better?!

“HOLD UP. You are telling me there’s a more accurate rating that I should check whenever I am paired to a junior opponent?” Yes! By orders of magnitude better…and sorry to inform you that the Indian 1600 kid you are facing is actually rated 2000 URS. My condolences to your Elo - life’s tough.

By now, I have convinced you that one rating system seems to capture the true strength of juniors better. Next, let’s show which countries (on the whole) are underrated, and which ones are overrated. The methodology here is easy to follow. Download the URS rating list, then merge it with the FIDE rating list with a simple matching along the FideID column. I grouped everything by federation using Python in a Jupyter Notebook. Then I kept only federations that have at least 500 players in that 220k merged sample - 69 federations total. This excludes countries with a low level of chess activity.
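The merge-and-group step can be sketched with plain dictionaries keyed by FideID. This is an illustration under my own assumptions about how the lists are parsed, not the notebook itself, and the toy IDs, federation codes, and ratings are made up:

```python
from collections import defaultdict
from statistics import mean, stdev


def federation_gaps(fide, urs, min_players=500):
    """fide: {FideID: (federation, FIDE rating)}; urs: {FideID: URS rating}.
    Returns {federation: (mean gap, standard deviation of gap)} where
    gap = URS - FIDE, keeping federations with at least `min_players`
    matched players. A positive mean gap suggests an underrated federation."""
    gaps = defaultdict(list)
    for fide_id, (fed, fide_rating) in fide.items():
        if fide_id in urs:  # keep only players present in both lists
            gaps[fed].append(urs[fide_id] - fide_rating)
    return {fed: (mean(g), stdev(g))
            for fed, g in gaps.items() if len(g) >= min_players}


# Made-up toy data; the threshold is lowered so the example produces output.
fide = {1: ("AAA", 1600), 2: ("AAA", 1700), 3: ("BBB", 2000), 4: ("AAA", 1500)}
urs = {1: 1900, 2: 1900, 3: 1990}
result = federation_gaps(fide, urs, min_players=2)
print(result)
```

Player 4 has no URS entry and drops out of the merge, and "BBB" is excluded for having too few matched players, mirroring the 500-player cutoff described above.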

Players from countries at the top are significantly underrated, while those at the bottom would be great destinations for some chess tourism and “Elo farming.” The central tick represents the average of the distribution, while the edges of the box are set one standard deviation on either side of it (for a roughly normal distribution, this captures about the middle 68%).

Let’s recap:

The countries with the lowest injection point in their FIDE rating are: Sri Lanka, India, Georgia, Bolivia, Armenia, Peru, Uzbekistan, Azerbaijan, Kazakhstan.
The most underrated countries (by the difference between URS and FIDE) are: Sri Lanka, India, Vietnam, Uzbekistan, Iran, Uganda, Bolivia, Kazakhstan.

This is great! Two completely independent methods show similar results, giving confidence that the geographical disparity is not just a perceived effect, but a real one.
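As a quick sanity check on how much the two lists in the recap agree, here is the overlap computed directly from the country names given above:

```python
lowest_injection = {
    "Sri Lanka", "India", "Georgia", "Bolivia", "Armenia",
    "Peru", "Uzbekistan", "Azerbaijan", "Kazakhstan",
}
most_underrated = {
    "Sri Lanka", "India", "Vietnam", "Uzbekistan",
    "Iran", "Uganda", "Bolivia", "Kazakhstan",
}

# Countries flagged by both methods.
both = sorted(lowest_injection & most_underrated)
print(len(both), both)
# 5 ['Bolivia', 'India', 'Kazakhstan', 'Sri Lanka', 'Uzbekistan']
```

Five countries appear on both lists, which is the agreement the conclusion above rests on.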

4. Some factors that explain the disparities

A natural question could then be, “What is the main predictor of this deviation between FIDE ratings and URS ratings in a specific country?” I asked both Twitter and ChatGPT to do a bit of feature engineering, and am happy to report that Maurits van der Meer got the best answer of the bunch. If nothing else, this simplistic scatter plot should convince you of the fact.

That’s the winner, by far. Still, some other features are important and could explain part of the variation. Here’s a summary that I am happy with. It sits at the intersection of “I have thought enough about this” and “Doing more is overkill.”

*NB: R-squared values show only the strength of each relationship, not its direction. For example, a high average rating in a country’s population is negatively correlated with the level of “underratedness.”
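The note above is easy to see in a small example. For a single-feature linear fit, R-squared equals the square of the Pearson correlation r, so the sign is lost. The numbers below are made up purely to show the effect:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up federations: average rating vs. how underrated they are.
avg_rating = [1500, 1600, 1700, 1800, 1900]
underrated = [200, 150, 120, 60, 10]

r = pearson_r(avg_rating, underrated)
print("r =", round(r, 2), "R-squared =", round(r * r, 2))
```

Here r is strongly negative while R-squared is close to 1, so a table of R-squared values alone cannot tell you which way each feature pushes.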

5. Summary - Odds and Ends

Young players (defined as U19, per the Sonas methodology) are joining the rating pool at an accelerated pace. By 2027, they will exceed the number of aging, declining players.
U19 players are sapping rating points away from older, more established players, putting deflationary pressure on the FIDE rating system overall. The uneven K-factors may need to be revisited in the upcoming years.
The “calculation improvements” implemented by FIDE so far have not necessarily been an improvement, but rather introduced extra noise in the distribution.
The URS distribution is remarkably smooth and captures a more accurate playing strength, especially for countries situated at the edge of the underrated/overrated range.
The geographical disparities in rating can mostly be attributed to the percentage of youth players in the overall chess playing population of each specific country.
You now have a list of countries to visit (and a list of countries to avoid), if your sole intention is to gain Elo.

And that’s it from me today, folks! If you want to delve deeper into the matter, I welcome your input, though I cannot make any promises that this research will continue, other than strictly from a hobby standpoint. My main occupation in the chess world is still as a coach for now - you can read more and even book an intro call (geared for adults rated below 1800 FIDE). Alternatively, if you would like to support my independent writing, consider making a one-time or recurring donation using the PayPal button below.

And, as always, thanks for taking the time to read my work. I am fortunate to be part of a community that seems to appreciate quality long-form posts. Until next time, ciao!

dboing dboing

Oct 24, 2024 (edited)

https://vladchess.substack.com/i/150600221/the-structure-of-the-article-will-be-as-follows-just-so-you-know-what-youre-going-to-be-reading-ahead-of-time

I just wanted to support this way of communicating technical reviews (or even chess, which is technical stuff). I find that chess literature or many communications might need of such keys to help the forward attention given the audience individuals various walks, and needs to further read such article.

I like the wide scope starting point, and complete backreferencing of information needed. Even though I am not familiar, not coming from such chess community context, where for example "compression" might be already defined. Not asking, as it does not seem to matter given the rest that I read. When I will have the time and need to scrutinize such rating system, I know where to go back to, for something both tangible and widely scoped for my need of the big picture and specifics to be satisfied or debatable in reasonable time frame.

Dan Schmidt

Oct 24, 2024

Very interesting article, thank you. I was not surprised to see that USA has the second-highest average new player rating but is not overrated according to URS; the reason (I think) is that many of them are not really new players, because almost all US tournaments are conducted with USCF rules and ratings and only a fraction of them (typically higher-level tournaments) are FIDE-rated. I was over 2000 USCF and had been playing OTB for decades before I finally got a FIDE rating. I don't know how common it is for countries to have their own rating system (and for those countries, how common is it for tournaments to not be FIDE rated at all). CAN and ENG also seem to occupy similar positions in a hypothetical "average new player rating vs overrated-ness" scatterplot (which would be interesting to see!).
