World Cup Forecast: Brazil Starts as Front Runner

On Sunday, November 20, the men's soccer teams kick off the FIFA World Cup in Qatar. Brazil is the favorite this time, with a 15 percent probability of winning. This is the result of a machine learning forecast by an international team of researchers consisting of Andreas Groll and Neele Hormann (both TU Dortmund University), Gunther Schauberger (TUM), Christophe Ley (University of Luxembourg), Hans Van Eetvelde (Ghent University) and Achim Zeileis (University of Innsbruck). Their forecast combines several statistical models for the teams' playing strengths with information about team structure (such as market value or number of Champions League players) and socio-economic factors of the country of origin (population and gross domestic product). "The World Cup this time is overshadowed by many ethical and sporting problems that we do not want to hide. Nevertheless, out of scientific interest, we decided to use our machine learning approach, which we have used successfully in previous tournaments, to produce probabilistic forecasts," says Achim Zeileis.
100,000 simulations
The researchers used the predicted values from their model to simulate the entire World Cup 100,000 times: match by match, following the tournament draw and all FIFA rules. This yields probabilities for each team advancing to the individual tournament rounds and ultimately winning the World Cup. Brazil leads with a 15 percent probability of winning, followed by Argentina (11.2 percent), the Netherlands (9.7 percent), Germany (9.2 percent) and France (9.1 percent). "The tournament is still not a done deal, of course - that is shown by the already comparatively low win probabilities of even the top four nations alone. It is in the nature of forecasts that they can also be wrong - otherwise soccer tournaments would also be very boring. We provide probabilities, not certainties, and a 15 percent probability of winning also means that there is an 85 percent chance that the team will not win the tournament," explains Andreas Groll.
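The idea of simulating a tournament many times and counting outcomes can be illustrated with a minimal sketch. It is not the researchers' simulation: the team names, the ability scores, the simple win-probability formula and the four-team knockout bracket are all assumptions made for illustration, and the group stage and detailed FIFA rules are left out.

```python
import random
from collections import Counter

# Hypothetical ability scores for four placeholder teams.
# These are NOT the researchers' estimates.
ABILITIES = {"A": 2.0, "B": 1.5, "C": 1.0, "D": 0.5}


def win_probability(ability_1, ability_2):
    """Assumed chance that team 1 beats team 2 (ratio of abilities)."""
    return ability_1 / (ability_1 + ability_2)


def play(team_1, team_2, rng):
    """Draw the winner of a single knockout match at random."""
    p = win_probability(ABILITIES[team_1], ABILITIES[team_2])
    return team_1 if rng.random() < p else team_2


def simulate_knockout(bracket, rng):
    """Play one tournament; neighbours in the list meet in each round."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def title_probabilities(bracket, n_sims, seed=0):
    """Estimate each team's title chance as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


if __name__ == "__main__":
    bracket = ["A", "D", "B", "C"]  # semifinals: A vs D, B vs C
    for team, prob in title_probabilities(bracket, 100_000).items():
        print(f"{team}: {prob:.1%}")
```

Even in this toy setup the strongest team wins the title in fewer than half of the simulated tournaments, which mirrors the point that a favorite's winning probability can be comparatively low.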

So far, however, the forecasts have been quite successful: Achim Zeileis' Innsbruck model, which is based on adjusted odds from the betting providers, correctly predicted the European Championship final in 2008, as well as Spain's World Cup title in 2010 and European Championship title in 2012, among others. This year it is being used for the second time, after the 2021 European Championship, as part of a more comprehensive combined model developed by teams led by Andreas Groll (TU Dortmund University), Gunther Schauberger (TU Munich) and Christophe Ley (University of Luxembourg). This combined model outperformed the betting providers' forecasts at the 2018 World Cup.
Ethical and sporting problems
The 2022 World Cup is interesting for the researchers from a scientific perspective because of the date - as is well known, the tournament had to be postponed to the winter months due to the extremely high temperatures in Qatar in the summer: "In addition to the widely discussed ethical problems of this World Cup, this also raises very critical sporting issues: in the winter months, all the major soccer leagues in Europe and South America now have to interrupt their usual schedules to accommodate the tournament. As a result, national teams have less time to prepare and players have less time to recover before and after the World Cup. Combined with the extreme climatic conditions, this also increases the risk of injury," explains Achim Zeileis. Having a team with many players in the international club competitions - such as the Champions League, Europa League and Europa Conference League - could therefore prove to be a disadvantage this year rather than the usual advantage, as Andreas Groll elaborates: "All of these factors make it more difficult to predict how the tournament will go, as variables that have proven to be very meaningful in previous World Cups may not work, or may work differently."
As soccer fans, aside from scientific interest, the researchers are dismayed by the circumstances under which the World Cup is taking place this year, emphasizes Achim Zeileis: "The usual joy and anticipation of a World Cup has been ruined by the terrible circumstances this year: from the alleged corruption in the FIFA awarding process, human rights and working conditions in Qatar, to the lack of sustainability in the construction of the stadiums."
Machine Learning
The researchers' calculation is based on four sources of information:
- a statistical model for the playing strength of each team based on all international matches played in the past eight years (Ghent and Luxembourg universities)
- a statistical model for the teams' playing strength based on the betting odds of 28 international bookmakers (University of Innsbruck)
- further information about the teams - for example, their market value - and their countries of origin - such as population size (TU Dortmund and TU Munich)
- a machine learning model that combines the other sources and optimizes them step by step.
The researchers trained the machine learning model beforehand with historical data, as Andreas Groll explains: "We fed the model with the data that was current at the time for the past five World Cups, i.e. between 2002 and 2018, and had it compare its predictions with the actual outcomes of all matches in the respective tournaments - so the weighting of the individual sources of information for the current tournament will ideally be very accurate." Incidentally, a model trained in this way can also be used for other forecasts in the future - so a better soccer forecast may also lead to more accurate weather forecasts. How well the model performs in terms of soccer will be known by the evening of December 18 at the latest.
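How a model can learn to weight several sources of information from past matches can be sketched as follows. This is only an illustration under stated assumptions: a plain logistic regression stands in for the researchers' machine learning model, whose exact form is not described here, and the feature values and match outcomes are invented toy data, not real World Cup results.

```python
import math


def sigmoid(z):
    """Map a weighted score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Assumed probability that team 1 beats team 2."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train_weights(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Fit one weight per information source by gradient descent."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, outcome in zip(rows, outcomes):
            error = predict(weights, features) - outcome
            for j, x in enumerate(features):
                gradient[j] += error * x
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Invented toy data. Each row holds the difference between team 1 and
# team 2 in three hypothetical inputs: strength from past matches,
# strength from bookmaker odds, and a team covariate such as market
# value. An outcome of 1 means team 1 won, 0 means team 2 won.
HISTORY_ROWS = [
    [0.8, 0.9, 0.5],
    [-0.6, -0.7, -0.2],
    [0.3, 0.5, -0.1],
    [-0.2, -0.4, 0.3],
    [0.5, 0.2, 0.4],
    [-0.9, -0.8, -0.6],
]
HISTORY_OUTCOMES = [1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    weights = train_weights(HISTORY_ROWS, HISTORY_OUTCOMES)
    print("learned weights:", [round(w, 2) for w in weights])
    print("win probability:", round(predict(weights, [0.7, 0.8, 0.4]), 2))
```

The learned weights play the role of the "weighting of the individual sources of information" mentioned in the quote: sources that explained past results well receive more influence on new predictions.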