Ex-post probability assessments from expected goals statistics

The expected goals statistics (xg) in football/soccer games have been around since 2012, but I have only really started to notice them properly in the last few years. They are an interesting attempt to quantify and objectify randomness in football games. In this blog post, I want to explain how one could use these statistics to compute probabilities of the various results of a game after the game was played.

Computing probabilities of how a game could have ended after it was played may seem a bit silly. After the game, we already know how it ended. However, it can be quite relevant to a sober judgment of a team’s performance. Sometimes journalists heap praise on a team and its manager after a game that ended in a tight 1:0 win, where the opposition had plenty of chances that they did not take; sometimes journalists condemn a team and its manager for a game lost to a wonder strike in the last minute. I often feel that there is a general lack of proper appreciation of the randomness in football games.

Of course, you could ask whether there is any randomness in football at all. One can take the view that everything is deterministic and just a question of (Newtonian, I think, would suffice) physics. But I would think that most people would agree that there are some factors that nobody can control and nobody can completely foresee (even if they understand physics). An unexpected sudden gust of wind might make the difference between a long-range shot hitting the bar so that the ball goes in, or hitting it so that the ball stays out. A slightly wetter patch of grass, invisible to the eye, might make the difference between a sliding tackle getting to you in time to take the ball off you, or falling just a tad short of that.

Once you allow for some such randomness, the next question would be how one should quantify such randomness. In fact, can we quantify it in terms of probabilities and, even if we can, would we all come to the same conclusion? There are some instances of randomness that researchers (on the whole) would refer to as objective uncertainty, also often called risk. A good working definition of objective uncertainty (or risk) would be that it is such that most people assign the same probabilities to the various events (events = things that can happen). Think of the uncertainty in a casino in games such as Poker, Roulette, Blackjack, Baccarat, or Craps. Most people would agree that the chance of the ball in roulette landing on, say, 13 is 1/37, because there are 37 numbers (0 to 36), symmetrically arranged around the roulette wheel. If someone told me that they feel lucky and think that the probability that their number, say 13, will come up is 50%, I would even believe that they are just wrong.

Outside the casino, it seems that most of the uncertainty we encounter is not objective in this sense. Uncertainty is often in the eye of the beholder, as people like to say. This is even true in the casino at times. If you have two aces in your hand playing poker, your assessment of you winning this round will be different from that of your opponents, who don’t know which cards you are holding. Similarly, someone who has observed the weather over the last 48 hours would probably make different weather predictions for the next day than someone who has not done so.

There is a deep philosophical debate in the literature (google the common prior assumption) as to whether we should model different individuals’ different probability assessments over some events as deriving exclusively from them having seen different information, or whether individuals could also sometimes be modeled as just having different beliefs about something – period. Luckily, this debate is not hugely relevant to what I want to discuss here. But I do believe that it is just empirically correct to say that different people would often quantify randomness differently (for whatever reason). You might suspect that some people don’t quantify uncertainty into probabilities at all. Probably true, but one has to be careful. People might not be able to tell you what probability they attach to certain things, but they might behave as if they do. I don’t want to get into this either, though.

What I wanted to say is that I believe that football games have uncertainty that is not typically considered objective. Ask two different people (perhaps ideally people who bet on such things) about the chances of how a game will end and you will probably get two different answers. But I do like attempts to objectify the randomness in football games. And the xg approach is a pretty good attempt. As far as I understand (see again this post), an xg value for any chance at goal in a game is computed using a form of (probably logit- or probit-like) regression on a large data set of past shots, where shot success is explained by variables such as distance from goal, the angle of the shot, and many other factors. Personal characteristics do not seem to be used. This means that the same chance falling to Kylian Mbappe or a lesser-known player would have the same xg. We might come back to that later. [Actually, now that I am finished with this post, I see that we won’t. A shame.]
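To make the idea concrete, here is a minimal sketch of what such a logit-style model looks like. The two explanatory variables and all coefficients are invented for illustration; they are certainly not the ones any real xg provider uses.

```python
import math

# Hypothetical logit model of shot success. The coefficients below are
# made up for illustration and are NOT any provider's actual model.
def xg(distance_m: float, angle_deg: float) -> float:
    """Probability that a shot is scored, given distance to goal (metres)
    and the angle subtended by the goal mouth (degrees)."""
    b0, b_dist, b_angle = 0.5, -0.25, 0.03  # invented coefficients
    z = b0 + b_dist * distance_m + b_angle * angle_deg
    return 1.0 / (1.0 + math.exp(-z))
```

With made-up coefficients like these, a shot from the penalty spot with a wide view of the goal gets a much higher xg than a speculative effort from 30 metres; a real model would be fitted on hundreds of thousands of historical shots.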

I want to get to one example that I will work through a bit, to eventually come up with what I promised at the beginning, an after-the-game assessment of the probabilities of how the game could have ended. Let me take the, for Austrians, slightly traumatic experience of the recent round of 16 game between Austria and Turkey at the 2024 European championship in Germany, which Austria lost 1:2. I found two good sources that provide an xg-value for this game: the Opta Analyst and a website called xgscore.io. Both provide xg-values for the two teams for the entire game: this is the sum of all xg-values for each goalscoring chance. The Opta Analyst makes it an xg of 3.14 for Austria and an xg of 0.92 for Turkey (when you click on the XG MAP in the graphic there) and in the text they make it: “Austria can consider themselves unfortunate, having 21 shots to Turkey’s six and recording 2.74 expected goals to their opponent’s 1.06.” Xgscore.io finds an xg of 2.84 for Austria and an xg of 0.97 for Turkey. So even they do not all agree.

An objective assessment of expected goals for each team is not quite enough yet to compute the probabilities of how the game could have ended. I need an assessment of not only the expected goals but also their variance. In fact, two teams with the same xg of 1 could have very different distributions of goals scored. One team could have had an xg of 1 because they had one chance and that one chance had an xg of 1; perhaps it was a striker getting the ball one meter in front of goal with the goalkeeper stranded somewhere else on the pitch. Then this team would have scored 1 and only 1 goal, and that with certainty. Another team with the same xg of 1 could have had two chances that both had a 50% chance of going in. This team could have scored 0, 1, or 2 goals, with 0 and 2 goals each 25% likely and 1 goal 50% likely.
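The two hypothetical teams above can be checked with a few lines of Python that compute the exact goal distribution for any list of per-chance xg values, treating the chances as independent Bernoulli trials:

```python
def goal_distribution(xgs):
    """Exact distribution of goals scored: entry k is the probability of
    scoring exactly k goals, treating each chance as an independent
    Bernoulli trial with success probability equal to its xg."""
    dist = [1.0]  # before any chance, 0 goals with probability 1
    for p in xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this chance is missed
            new[goals + 1] += prob * p     # this chance is scored
        dist = new
    return dist

print(goal_distribution([1.0]))       # [0.0, 1.0]: exactly one goal, for sure
print(goal_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```

Both teams have an expected value of exactly one goal, but only the second team can score 0 or 2.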

I don’t think that Opta (or any other source) regularly provides the xg details for each goal-scoring chance that I would need to compute these distributions. But for the game Austria versus Turkey, I can get a sense of these distributions from the XG MAP provided by the Opta Analyst.

Let me simplify and take an xg of 3 for Austria and 1 for Turkey. I will now calculate the probability distribution of the various outcomes of this game under two different scenarios. In both scenarios, I assume that Turkey had two (stochastically independent) chances, both with a 50% likelihood of success, i.e. an xg of 0.5 each; the two sum up to Turkey’s total of one. This makes the number of goals scored by Turkey, call it X, binomially distributed with n=2 tries and success probability p=0.5. In reality, Turkey had 5 or 6 chances, all but two of them rather speculative efforts – see the XG MAP. In the first scenario, I assume Austria’s xg of 3 is decomposed into six (stochastically independent) chances with an xg of 0.5 each. This is not quite correct, but also not a terrible approximation of reality. This means the number of goals Austria scores, call it Y, is also binomial with n=6 and p=0.5. All I need to do now is compute the probabilities that X>Y (Turkey wins), X=Y (a draw), and X<Y (Austria wins). I asked ChatGPT to do so; it does it correctly and provides not only the results but also the various calculation steps. In this first scenario, I got a roughly 3.5% probability of Turkey winning, an 11% probability of a draw, and an 85.5% probability of Austria winning.

In the second scenario, I keep Turkey the same, but I now assume that Austria’s goals scored Y is binomial with n=10 and p=0.3. That means Austria had 10 chances with xg values of 0.3 each. Again, not quite correct, but also not a terrible approximation of reality. Computing in the same way, I now get a roughly 5.1% probability of Turkey winning, a 12.6% probability of a draw, and an 82.3% probability of Austria winning.
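For readers who want to check the two scenarios themselves, a few lines of Python compute the exact probabilities (no ChatGPT needed); the distributions are exactly the binomials described above:

```python
from math import comb

def binom_pmf(n, p):
    """Probability of scoring k goals, for k = 0..n, from n independent
    chances each converted with probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def result_probs(turkey, austria):
    """Return (P(Turkey wins), P(draw), P(Austria wins)) for two
    independent goal distributions given as lists of probabilities."""
    win_t = sum(pt * pa for t, pt in enumerate(turkey)
                for a, pa in enumerate(austria) if t > a)
    draw = sum(pt * pa for t, pt in enumerate(turkey)
               for a, pa in enumerate(austria) if t == a)
    return win_t, draw, 1.0 - win_t - draw

turkey = binom_pmf(2, 0.5)                       # two chances at 0.5 xg each
print(result_probs(turkey, binom_pmf(6, 0.5)))   # scenario 1: ≈ (0.035, 0.109, 0.855)
print(result_probs(turkey, binom_pmf(10, 0.3)))  # scenario 2: ≈ (0.051, 0.126, 0.823)
```

The brute-force double sum over all score combinations is fine here because goal counts are tiny; in scenario 1 the exact values are 2.25/64, 7/64, and 54.75/64.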

It is interesting to see the difference between the two scenarios. If I had the full data for this game, I could compute the probabilities more accurately. That would be a bit harder, because each goal-scoring chance will typically have a different xg value, so the total number of goals scored by each team is not simply binomial. But with some computing effort, the various probabilities could still be calculated. I would like it if Opta (or any other source) were to provide these after-the-game winning probabilities induced by their xg statistics.
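To illustrate that the computation remains easy with unequal xg values, here is a sketch. The per-shot xg lists are invented for illustration (they are not the real Opta data), though they sum to the simplified team totals of 1 and 3:

```python
def goal_distribution(xgs):
    """Exact goal distribution for a list of per-shot xg values, treating
    shots as independent Bernoulli trials (a Poisson-binomial distribution)."""
    dist = [1.0]
    for p in xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # shot missed
            new[goals + 1] += prob * p     # shot scored
        dist = new
    return dist

def result_probs(a, b):
    """Return (P(side a scores more), P(draw), P(side b scores more))."""
    win_a = sum(pa * pb for ga, pa in enumerate(a)
                for gb, pb in enumerate(b) if ga > gb)
    draw = sum(pa * pb for ga, pa in enumerate(a)
               for gb, pb in enumerate(b) if ga == gb)
    return win_a, draw, 1.0 - win_a - draw

# Invented per-shot xg lists; they sum to 1.0 and 3.0 respectively.
turkey_shots = [0.45, 0.35, 0.1, 0.05, 0.05]
austria_shots = [0.7, 0.4, 0.35, 0.3, 0.25, 0.2, 0.2, 0.15, 0.15, 0.1, 0.1, 0.1]

print(result_probs(goal_distribution(turkey_shots),
                   goal_distribution(austria_shots)))
```

Replacing the invented lists with a provider’s actual per-shot xg values would give exactly the after-the-game probabilities I am asking for.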

These after-the-game probabilities could now be compared with the before-the-game probabilities implied by the betting odds. I found betting odds for Austria vs Turkey here. In my notation, these are 2.05 for Austria winning, and 3.4 each for a draw and for Turkey winning. These translate into probabilities of 45.33% that Austria wins, 27.33% for a draw, and 27.33% that Turkey wins (I am making some assumptions here, ignoring the commonly observed favorite-longshot bias).
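The translation from odds to probabilities is a simple normalization. A sketch, using the odds quoted above and, as in the post, ignoring the favorite-longshot bias:

```python
odds = {"Austria": 2.05, "Draw": 3.4, "Turkey": 3.4}

# Break-even (inverse-odds) probabilities sum to more than 1;
# the excess is the bookmaker's margin (the "overround").
raw = {k: 1.0 / o for k, o in odds.items()}
overround = sum(raw.values())                       # ≈ 1.076
probs = {k: v / overround for k, v in raw.items()}
print(probs)  # Austria ≈ 0.453, Draw ≈ 0.273, Turkey ≈ 0.273
```

Dividing by the overround is the standard way to strip out the bookmaker’s margin and recover implied probabilities that sum to one.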

This does not mean that the betting odds were wrong, of course. It only means that, in some sense, Austria positively surprised the “market” in the game by producing a probably higher-than-expected xg-value, while in another sense, they negatively surprised the market by not winning.

So, assume that the xg-scores are indeed a good way to objectify the randomness of what happens in a game. Having detailed xg information for every goal attempt would allow us to compute objective probabilities for all possible ways the game could have ended. While I would very much like to see these after-the-match probabilities reported, and to see them used for a sober judgment of a team’s effort in a game, I also know that there is something we ignore when we do so. All this is under the assumption that the game would have been played in the same way irrespective of whether some of the earlier attempts at goal were successful or not. This is, of course, unrealistic. I, for instance, had the feeling that England tended to play better and with more urgency, creating more chances, when they were behind in a game than when they were in front or drawing. A team’s game plan is generally likely to be conditional on how the game unfolds. The randomness entailed in these game plans, however, is much harder to quantify.

3 comments

    • Thank you! One way to define betting odds, and this is what I do here, is that the odds o for one event are the euro amount you get back, for every euro you bet, if the event you bet on comes true. Then 1/o is the subjective probability that would make a risk-neutral bettor indifferent between betting on the event or not. Let me call 1/o the break-even probability of an event with odds o. One can compute this break-even probability for each of the three possible results mentioned in the post: Austria wins, a draw, Turkey wins. If you sum up these three break-even probabilities, you get 1.076, a number obviously slightly bigger than 1; the excess is the betting company’s margin of about 7.6%. So I adjust the break-even probabilities by dividing them by their sum (1.076) to generate probabilities that sum to one. This procedure comes close to producing probabilities that cannot easily be rejected even with large data sets on sports bets, as I have argued here: https://gametheory.life/2022/11/12/predicting-the-outcome-of-the-world-cup/. In doing so, however, I don’t fully acknowledge the typical existence of the favorite-longshot bias that I mention in the post.

