The holy grail of statistical analysis in any sport is a single metric that applies to every player and accurately evaluates that player’s quality.

Examples from other sports include “Wins Above Replacement” (WAR) in baseball and “Player Efficiency Rating” (PER) in basketball. To date, however, no such metric exists in rugby. I wanted to change that, so I created the “Expected Wins Contributed” (xWC) metric.

Expected Wins Contributed

To understand how xWC was devised, we must first understand a few concepts. The first is the key performance indicator (KPI). Using wins alone as a measure of success for a sports team can be misleading. For example, Ireland’s recent win against France could easily have been recorded as a loss if Sexton had missed his drop goal, or if Anthony Belleau had successfully kicked his penalty. Because of this, sports teams like to measure their success by looking at the statistics that correlate strongly with winning. These statistics are called KPIs.

The four KPIs I used for this analysis were:

  • Tackle percentage,
  • Metres made,
  • Line breaks and
  • Tries.

Tackle percentage

If we look at tackle percentage specifically, we can make a scatter plot with data from last year’s Six Nations. The plot shows the clear correlation between percentage of tackles made and games won.

The R-squared* for this is 0.7228, indicating a strong correlation.

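* – R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. It lies between 0 and 1, and the closer it is to 1, the more of the variation in the data the line explains. R-squared does not show the direction of the relationship. That comes from the sign of the correlation coefficient r (or of the slope of the line), where a negative value indicates a negative correlation.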
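
For readers who want to see the calculation, here is a minimal sketch of how r and R-squared can be computed for a single KPI. The function names and the numbers in it are made up for illustration and are not the Six Nations data behind the figure quoted above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """R-squared of a one-variable linear regression, which equals r ** 2."""
    return pearson_r(xs, ys) ** 2


# Hypothetical values, NOT the real tournament data.
tackle_pct = [82.0, 85.0, 87.0, 88.0, 90.0, 91.0]
wins = [0, 2, 2, 3, 3, 4]

print("r   =", round(pearson_r(tackle_pct, wins), 3))
print("R^2 =", round(r_squared(tackle_pct, wins), 3))
```

Squaring r discards its sign, which is why R-squared on its own cannot tell you whether more of a KPI goes with more wins or fewer.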

We can now perform a regression analysis, which means finding the equation of the line of best fit. From this analysis, we can calculate that a simple model for predicting a team’s number of wins is [0.3664 * Tackle percentage – 30.05]. The “– 30.05” is the y-intercept of the line.
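
As a rough sketch, the line of best fit and the prediction can be written as follows. The slope and intercept are the ones quoted above. The function names are my own, and I am assuming tackle percentage is on a 0 to 100 scale, since that is the only scale on which these coefficients give a sensible number of wins.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Coefficients quoted in the article for the tackle percentage model.
TACKLE_SLOPE = 0.3664
TACKLE_INTERCEPT = -30.05


def wins_from_tackle_pct(tackle_pct):
    """Predicted wins from tackle percentage (assumed to be on a 0-100 scale)."""
    return TACKLE_SLOPE * tackle_pct + TACKLE_INTERCEPT


# A hypothetical team completing 90% of its tackles.
print(round(wins_from_tackle_pct(90.0), 2))  # 2.93
```

So a team that completes 90% of its tackles would be predicted just under three wins by this one KPI alone.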

Of course, tackling isn’t everything, so we can improve our model by fitting the same kind of line for each of the other KPIs and taking the average of the four predictions. We end up with this table for what I’ll call expected wins.

As you can see, when rounding to the nearest whole number, the model correctly predicted that Ireland and France would get 3 wins each, and it suggests that Wales were slightly unlucky and could possibly have stolen a win off England.
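
The averaging step might look something like the sketch below. Only the tackle percentage coefficients come from this article. The other three pairs, the team numbers and the names used are hypothetical placeholders, there purely so the example runs.

```python
from statistics import mean

# (slope, intercept) for each KPI's line of best fit.
# Only "tackle_pct" is taken from the article; the rest are HYPOTHETICAL.
KPI_MODELS = {
    "tackle_pct": (0.3664, -30.05),
    "metres_made": (0.002, -2.0),  # hypothetical placeholder
    "line_breaks": (0.1, -1.0),    # hypothetical placeholder
    "tries": (0.25, -0.5),         # hypothetical placeholder
}


def expected_wins(team_stats, models=KPI_MODELS):
    """Average the wins predicted by each single-KPI regression line."""
    predictions = [
        slope * team_stats[kpi] + intercept
        for kpi, (slope, intercept) in models.items()
    ]
    return mean(predictions)


# Made-up tournament totals for an imaginary team.
example_team = {
    "tackle_pct": 90.0,
    "metres_made": 2500,
    "line_breaks": 40,
    "tries": 14,
}

print(round(expected_wins(example_team), 2))
```

Taking a plain average gives every KPI the same weight, which keeps the model simple but treats a stat with a weak link to winning as if it mattered as much as a strong one.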

But can it be applied to individual players?

Yes, but there are some important limitations to recognise before we do so. This model doesn’t consider the strength of opposition and, as such, a lot of English players will have inflated numbers this week. Also, by the nature of the metrics involved, backs are more likely to put up bigger numbers than forwards. That being said, it can still be a useful and interesting metric for the casual fan, so without further ado, here are the top players from the first weekend.

I would agree that xWC provides a fair representation of England’s key players in the match on Sunday. Do you?

I’m sure you’ll tell me in the comments.

Author: Peter Matthews

I’m from just north of Edinburgh, Scotland and I am currently studying Mathematics with Statistics. I love blending my two passions of maths and sports – mainly rugby, cricket and football (soccer) – by looking at stats and patterns in the game.

8 COMMENTS

  1. It looks like a pretty good measure but I feel that all the statistics apart from tackle percentage are heavily biased to favour backs (and fast forward Sam Simmonds).

    • Hi Jack,
      That is a perfectly valid criticism of the model. Keep in mind that it’s only supposed to be used for entertainment purposes for fans, so I didn’t want to overcomplicate it by adding more KPIs than necessary.

      • This is a good start, but there is a lot more to be done to get a really valuable metric. WAR was worked out by crunching both individual CORR and clustering CORR for 1000s of games going back decades. Rugby does not have that same wealth of data to draw on, but if you can get player-level stats for even 3-4 years, then I think every potential KPI needs testing before a specific set is picked. Getting forwards more involved through relevant KPIs is by far the biggest issue, so proxies like penalties conceded and scrums lost may be needed to create forward-specific stats. Peter, this is something I have been thinking about for a long old while, though my R skills aren’t what they might be. I will ask around a few friends who are better coders than I am, if you want to put a few minds on the idea of xWC?

  2. If you take out Italy from the graph, there is no correlation at all between games won and tackle completion. In fact, the correlation would be negative. You need more data points to prove a strong correlation (although it obviously makes sense in general).

    You’ll also find that one of the best teams in the Aviva Prem have one of the worst tackle completion %s (Sarries). The aim is not always to complete but to pressure the attacking team.

    You’re starting off with a tiny sample of games, and further breaking this down to theorise on an even smaller sample of games to rank individual players.

    I’m sure tries scored does correlate well to wins but calling this statistical analysis is very dubious.

    • I was just using last year’s Six Nations to explain the method in a way that everyone could understand. The actual model uses data from all internationals played since the World Cup, and the results were similar.

  3. To add, correlation is not the same as causation. I would bet that the players that miss the most conversions win the most games. I wouldn’t call it a desirable KPI though.

  4. Interesting article Peter. What other KPIs did you consider including? I was a little confused about whether you intend to create a metric for teams or a metric for players. For players I think it should be position specific, or at least split between forwards and backs. Unlike baseball or basketball, the roles are so much more varied.

    If you want to keep the number of KPIs down, I would suggest replacing them with subtly different ones.
    Linebreaks could be replaced with linebreak conversion (average tries per linebreak after removing tries from lineout mauls or pushover scrums).
    Tackle success could be replaced with average number of successful tackles per game (thereby accounting for riskier but potentially better defensive systems where teams may slip off more tackles but end up with more possession).
    Metres made could be replaced by average metres per carry.
    Tries would already be covered by linebreak conversion, so a new metric could be introduced to balance out the metres made, which favours backs. For example, it could be total contact involvements (number of carries + tackles + offloads).
