Two weeks ago, we had the pleasure of sharing our work at the OptaPro Analytics Forum. It was a great experience, and we received valuable feedback from many people within the football analytics community. Our poster presentation looked at how we could quantify the playing style of the modern-day full-back, which has become an increasingly specialised position in the last couple of seasons, with many top teams switching to playing with wing-backs.
Given the flexibility of the position, there is now a wide variety of full-back profiles, each with differing strengths and weaknesses. This makes it difficult to compare players, because current methods within clubs assess and evaluate players using raw statistics alone (e.g. blocks, tackles, aerial duels won). One problem is that raw statistics do not account for how many of these measures are related and are therefore capturing the same underlying skill. For example, players who have a high number of blocks and clearances are generally good at the overarching skill of last-ditch defending.
For this reason, we thought that comparisons of a player’s style could be improved using Principal Component Analysis (PCA), building on earlier work by Johannes Harkins and Will Gürpınar-Morgan. This method statistically reduces a set of variables (e.g. dribbles, crosses, key passes) into broader dimensions that are common to those variables (e.g. overall attacking play) whilst still retaining the key information (i.e. how good each player is at those skills). Importantly, it allows us to assess which players rate high or low on certain attributes, which provides a more holistic view of a player’s profile.
We used Opta event data from 419 full-backs across the English Premier League, Spanish La Liga, German Bundesliga and Italian Serie A from the 2015/16 and 2016/17 seasons. We entered raw statistics from 14 variables and ran a separate PCA for each season (see below). Importantly, we chose not to include outcome statistics such as goals and assists, to ensure that the analysis would assess a player’s style rather than their proficiency. Defensive statistics were possession adjusted, and all statistics were expressed per 90 minutes.
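The preparation and PCA steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic numbers, not our actual pipeline: the four columns, the minute totals and the function names (`per_90`, `standardise`, `pca`) are all assumptions made for the example, and possession adjustment is left out because its formula is not covered in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for event data (NOT real Opta data):
# 50 hypothetical players, 4 raw season totals each, plus minutes played.
n_players = 50
minutes = rng.uniform(900, 3400, size=n_players)
raw_totals = rng.poisson(lam=[60, 40, 30, 80], size=(n_players, 4)).astype(float)


def per_90(totals, minutes_played):
    """Convert season totals into per-90-minute rates."""
    return totals / minutes_played[:, None] * 90.0


def standardise(x):
    """Z-score each column so every variable is on the same scale."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def pca(z):
    """PCA of standardised data via the singular value decomposition."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / (z.shape[0] - 1)   # variance along each component
    ratio = explained / explained.sum()     # share of total variance
    scores = u * s                          # player positions on each component
    return vt, scores, ratio


rates = per_90(raw_totals, minutes)
z = standardise(rates)
components, scores, ratio = pca(z)

print("share of variance per component:", np.round(ratio, 3))
```

Standardising first matters because per-90 rates for passes and for blocks sit on very different scales; without it, the highest-volume statistic would dominate the components.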
The results were really encouraging: the 2015/16 and 2016/17 analyses produced almost identical dimensions of playing style, with high correlations between the corresponding factors across the two seasons. This is shown neatly by the heatmap, in which red highlights variables that correlate positively with a factor, and blue highlights variables that correlate negatively with it. Put simply, the above heatmap shows that the 14 variables could be summarised more neatly into 5 broader dimensions:
- Front-Foot Defending – Wins duels, makes interceptions, and recovers loose balls.
- Chance Creation – Gets forward regularly, with a high volume of crosses, shots, and chances created.
- Reactive Defending – Plays on the back foot, with the main focus on defending well.
- Ball-playing – Plays a lot of passes and plays the ball into the final third regularly.
- Incisive Attacking – Plays a high volume of passes and through balls.
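For readers who want to see where the numbers in a heatmap like this come from, here is a small sketch of how variable-to-factor correlations (often called loadings) can be computed. It uses synthetic data with six made-up variables and a plain, unrotated PCA; any rotation step, the variable count and the names `n_keep`, `loadings` and `scores` are assumptions for illustration rather than a description of our exact analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: two hidden "styles" drive six observed variables.
# Variables 0-3 follow the first style, variables 4-5 the second.
n = 200
latent = rng.normal(size=(n, 2))
mixing = np.array([
    [0.9, 0.9, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.6],
])
x = latent @ mixing + 0.3 * rng.normal(size=(n, 6))

# Standardise, then eigendecompose the correlation matrix.
z = (x - x.mean(axis=0)) / x.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_keep = 2                                   # number of dimensions retained
# Loading = eigenvector scaled by sqrt(eigenvalue). For standardised data
# this equals the correlation between a variable and a component score.
loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
scores = z @ eigvecs[:, :n_keep]

print(np.round(loadings, 2))                 # one row per variable, one column per dimension
```

Each cell of the printed matrix is what a heatmap would colour: values near +1 would appear red, values near -1 blue. The sign of a whole column is arbitrary in PCA, so a dimension may need flipping before it is labelled.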
Essentially, the PCA ‘cleans’ the data by reducing a large number of related variables into simple dimensions, and isolates them from uncorrelated, dissimilar variables. Importantly, this method has a lot of useful applications within clubs. Recruitment teams can use this model to screen and compare transfer targets more objectively, in order to recruit a profile of player that fits the club’s style of play (see image below). For example, a team may want to buy a left-back who is proactive in his play, recycles the ball well, but doesn’t overcommit himself going forward. Looking at the above analysis, this profile of player would be best represented by a Front-Foot Full-Back. The statistical output ranks all players on that factor, which allows us to see which players’ profiles score highest on it.
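As a rough sketch of that screening step, the snippet below ranks players on one dimension and optionally caps another. Everything in it is hypothetical: the player labels, the scores and the `shortlist` helper are invented for illustration, and reading "doesn’t overcommit going forward" as a cap on Chance Creation is an assumption, not part of our model.

```python
import numpy as np

# Hypothetical component scores (made-up numbers, not real players).
players = ["Player A", "Player B", "Player C", "Player D", "Player E"]
dimensions = [
    "Front-Foot Defending", "Chance Creation", "Reactive Defending",
    "Ball-playing", "Incisive Attacking",
]
scores = np.array([
    [1.4, -0.2, 0.1, 0.6, -0.5],
    [0.3, 1.8, -0.9, 0.2, 1.1],
    [-0.7, 0.4, 1.5, -0.3, -0.2],
    [2.0, 1.2, -0.4, 0.9, 0.3],
    [0.9, -0.8, 0.7, 1.1, -0.6],
])


def shortlist(rank_by, cap_on=None, cap=None):
    """Rank players on one dimension, optionally excluding anyone whose
    score on a second dimension is at or above `cap`."""
    rank_col = dimensions.index(rank_by)
    keep = np.ones(len(players), dtype=bool)
    if cap_on is not None:
        keep = scores[:, dimensions.index(cap_on)] < cap
    order = np.argsort(scores[:, rank_col])[::-1]   # highest score first
    return [players[i] for i in order if keep[i]]


ranking = shortlist("Front-Foot Defending")
targets = shortlist("Front-Foot Defending", cap_on="Chance Creation", cap=0.5)
print(ranking)
print(targets)
```

In practice the cap would be set by the recruitment team, and the shortlist would be a starting point for video and live scouting rather than a final answer.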
It’s important to note that PCA doesn’t try to put players into a box in terms of one style of play. Players can rate highly on more than one skill/dimension; for example, in the radars below we can see that Ryan Bertrand rated above average last season for both Reactive and Front-Foot defending. This method can also provide useful tactical analysis: teams could identify the playing style of specific opposition players in order to target their strengths and weaknesses on a matchday. The flexibility of this analysis makes it a useful and simple tool for a club environment.
The scale of the radar chart represents each player’s percentile rank on each dimension, out of all players entered into the analysis.
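One simple way to turn component scores into a percentile scale like this is sketched below. Percentile rank has several common definitions; this version uses "the share of players scoring at or below this player", which is an assumption for illustration, and the scores are made up.

```python
import numpy as np


def percentile_rank(scores):
    """For each player and dimension, the percentage of players whose
    score is less than or equal to that player's score."""
    scores = np.asarray(scores, dtype=float)
    # at_or_below[j, i, k] is True when player j scores <= player i on dimension k
    at_or_below = scores[:, None, :] <= scores[None, :, :]
    return at_or_below.mean(axis=0) * 100.0


# Four hypothetical players scored on two dimensions (made-up values).
example_scores = [
    [-1.2, 0.5],
    [0.3, -0.4],
    [2.1, 1.5],
    [0.0, 0.1],
]
percentiles = percentile_rank(example_scores)
print(percentiles)
```

With this definition the top-scoring player on a dimension always sits at 100, and tied players share the same percentile.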
The take-home message of this analysis is that PCA compresses many related variables into fewer main categories/factors, which provides a broader comparison between players based on their overall style of play. This compression means club analysts don’t have to decide how many raw statistics to include when assessing a player, as the PCA does the compression for them. Importantly, we can express the degree to which individual players ‘load’ onto each factor as a percentile, allowing us to compare players both within and across factors. For the present analysis we carefully chose 14 event variables, but clubs can choose exactly which statistics to enter into the analysis, depending on what is of most value to them and the position of the player they are assessing.
This method could provide clubs with a competitive edge in the scouting process, and improve decision making when used in conjunction with typical scouting methods. Importantly, such an analysis is sensitive enough to identify changes or developments in playing styles, both within and between seasons. Such methods are cost-effective and computationally quick, providing an objective analysis of a player’s style of play that can be used on a large scale for any position across leagues worldwide.