Decoding Player Roles: A Data-Driven Clustering Approach in Football
An Innovative Approach to Football Player Role Classification and Evolution Tracking
Introduction
In today's football landscape, the concept of player roles has evolved far beyond the traditional labels of "defender," "midfielder," or "forward." As tactical fluidity becomes the norm, understanding the nuanced contributions of each player has become crucial for coaches, analysts, and scouts. What if we could categorize players not just by their positions but by the actual roles they perform on the pitch? What if we could track how these roles evolve over time, offering insights into a player's adaptability and potential future performance?
This is the vision behind my latest project, A Data-Driven Clustering Approach in Football—a deep dive into the performance data of top players across Europe's elite leagues, using advanced data analytics to redefine how we understand player roles.
The Challenge
In football, traditional positional labels often fall short of capturing the full scope of a player's contributions. For instance, a "defender" could be anything from a rugged, no-nonsense center-back to a marauding full-back who spends more time in the opposition's half than in his own. Similarly, midfielders and forwards come in various forms, each with unique responsibilities that aren't always reflected in their nominal positions.
Football is no longer confined by rigid formations or static positions. As Luciano Spalletti famously said during his successful season, “Systems no longer exist in football, it’s all about the spaces left by the opposition. You must be quick to spot them and know the right moment to strike, have the courage to start the move even when pressed.” This philosophy highlights the fluid nature of modern football, where players constantly adapt to the game’s demands rather than being tied to predefined positions.
Recognizing this, we set out to develop a model that could group players based on their actual on-field roles, using performance metrics sourced from FBRef for the top 5 European leagues. Our goal was to create clusters of players that reflect the dynamic nature of modern football, offering a more sophisticated understanding of their roles.
The Methodology
We began by collecting and preprocessing three seasons' worth of data, focusing on the 2022, 2023, and 2024 campaigns. Our data pipeline involved several steps, from cleaning and normalizing the data to feature engineering, where we calculated scores for various aspects of the game—Passing and Creativity, Defense, Possession and Dribbling, and Shooting and Finishing.
Using the K-Means clustering algorithm, we categorized players into distinct roles within their primary positions, resulting in four unique clusters for each position group: defenders, midfielders, and forwards. For defenders, the clusters were identified as "Playmaking Defender," "Traditional Center-Back," "Balanced Defender," and "Attacking Full-Back." These roles reflect a diverse range of skills and responsibilities, such as the ability to contribute to build-up play, focus on defensive solidity, or provide an attacking threat from wide positions.
Similarly, midfielders were grouped into clusters like "Defensive Midfielder," "Attacking Midfielder," "Holding Midfielder," and "Box-to-Box Midfielder." Each cluster highlights different aspects of midfield play, from breaking up opposition attacks to creating scoring opportunities and controlling the tempo of the game.
For forwards, the clustering revealed roles such as "All-Round Forward," "Support Forward," "Poacher," and "Creative Forward." These categories encapsulate the various ways forwards contribute to their teams, whether it's through goal-scoring, assisting, or playing a more versatile role in the attacking phase.
By clustering players in this way, we provide a more nuanced understanding of the diverse roles within each position, moving beyond traditional labels to offer insights into the specific contributions players make on the pitch.
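The per-position clustering step can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names, the position codes, and the K-Means settings (`n_init`, `random_state`) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

SCORE_COLS = ["passing", "defense", "possession", "shooting"]  # hypothetical names

# Synthetic stand-in for the real score table: 40 players per position group.
rng = np.random.default_rng(0)
players = pd.DataFrame(rng.random((120, 4)), columns=SCORE_COLS)
players["position"] = np.repeat(["DF", "MF", "FW"], 40)


def cluster_by_position(frame: pd.DataFrame, k: int = 4) -> pd.Series:
    """Fit a separate K-Means model per position group and return cluster ids."""
    labels = pd.Series(-1, index=frame.index, name="cluster")
    for _, group in frame.groupby("position"):
        model = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels.loc[group.index] = model.fit_predict(group[SCORE_COLS])
    return labels


players["cluster"] = cluster_by_position(players)
```

Fitting one model per position group means cluster 0 for defenders and cluster 0 for forwards are unrelated, which is why each group gets its own set of role names.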
But we didn't stop there. We also tracked the evolution of these roles over time, analyzing how certain high-profile players—like Achraf Hakimi, Jude Bellingham and many more—have shifted between roles as their careers have progressed. This added layer of analysis offers valuable insights into how players adapt to different tactical demands, helping teams make informed decisions about player development and recruitment.
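One way to lay out role evolution is to pivot a long table of per-season role labels into one row per player. The sketch below uses the role labels reported later in this article for two players; the table layout and column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format table: one row per player per season.
roles = pd.DataFrame(
    {
        "player": ["Jude Bellingham"] * 3 + ["Granit Xhaka"] * 3,
        "season": [2022, 2023, 2024] * 2,
        "role": [
            "Defensive Midfielder", "Box-to-Box Midfielder", "Attacking Midfielder",
            "Holding Midfielder", "Holding Midfielder", "Box-to-Box Midfielder",
        ],
    }
)

# Wide table: one row per player, one column per season.
evolution = roles.pivot(index="player", columns="season", values="role")


def count_role_changes(series: pd.Series) -> int:
    """Number of season-to-season role changes (the first season never counts)."""
    return int((series != series.shift()).sum()) - 1


changes = (
    roles.sort_values("season").groupby("player")["role"].apply(count_role_changes)
)
```

Players with a high change count are natural candidates for a closer look at how their role shifted.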
Key Findings
Score Results
Before clustering the players, we focused on four key aspects of the game: Passing and Creativity, Defense, Possession and Dribbling, and Shooting and Finishing. For each of these aspects, we selected the most relevant features and assigned weights to reflect their importance. Positive contributions like assists and goals were given higher weights, while negative actions such as red cards and miscontrols were assigned negative weights. These weights were used to calculate a score for each player in each aspect, which we then scaled using the MinMaxScaler to keep all scores between 0 and 1. Putting every score on the same scale allowed us to compare players fairly across different roles and playing times. Below, we highlight the top-scoring players in each aspect for the 2023-24 season.
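As a rough sketch of how such a score could be computed, assuming it is a plain weighted sum of the selected features: the feature names, the weights, and the values below are all hypothetical and are not the ones used in the project.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stats for three players (illustrative values only).
stats = pd.DataFrame(
    {
        "goals": [0.60, 0.10, 0.30],
        "shots_on_target": [1.80, 0.40, 1.00],
        "miscontrols": [2.00, 0.50, 1.00],
    },
    index=["player_a", "player_b", "player_c"],
)

# Hypothetical weights: positive for helpful actions, negative for harmful ones.
shooting_weights = {"goals": 3.0, "shots_on_target": 1.0, "miscontrols": -0.5}


def aspect_score(frame: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of the selected features, rescaled to the range [0, 1]."""
    cols = list(weights)
    raw = frame[cols].mul(pd.Series(weights), axis=1).sum(axis=1)
    scaled = MinMaxScaler().fit_transform(raw.to_numpy().reshape(-1, 1))
    return pd.Series(scaled.ravel(), index=frame.index, name="score")


shooting_score = aspect_score(stats, shooting_weights)
```

With min-max scaling, the best player in an aspect always scores 1 and the worst scores 0, so the scores are relative to the pool of players being compared.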
Passing and Creativity Score:
Toni Kroos tops the list, reflecting his exceptional ability to influence the game through passing. It’s noteworthy that several defenders also appear in this ranking, which highlights their role as key distributors, often taking risks to initiate plays from the back.
Defense Score:
The list is dominated by well-known defenders, including Virgil van Dijk and Benjamin Pavard. However, it also features players like Mouctar Diakhaby, who may have flown under the radar but were defensively solid throughout the season. We hope he makes a full recovery from his horrific injury.
Possession and Dribbling Score:
This ranking showcases a diverse group of players, including wingers, midfielders, defenders, and full-backs. Marquinhos and Danilo Pereira, primarily known as defenders, rank high due to their ability to carry the ball forward and maintain possession under pressure.
Shooting and Finishing Score:
As expected, big names like Kylian Mbappé and Harry Kane appear in the top 10, showcasing their elite finishing ability. Interestingly, players like César Azpilicueta also make the list, indicating their efficiency in front of goal despite their primary defensive duties. Jude Bellingham’s inclusion reflects his remarkable debut season at Real Madrid, where he has been a significant goal-scoring threat from midfield.
Clustering Results
Each cluster represents a specific role based on the players’ performance across different aspects of the game. By clustering players according to their primary position, we can better understand the different roles within each position group, rather than lumping all players together. For those listed under multiple positions on FBRef, we used their first listed role as their primary position to keep the analysis consistent. This method ensures that we focus on the role where each player is most impactful.
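Taking the first listed position can be done with a simple string split. The sketch below assumes the position column holds comma-separated codes such as "DF,MF"; the column name and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the comma-separated format of "Pos" is an assumption.
players = pd.DataFrame(
    {
        "player": ["player_a", "player_b", "player_c"],
        "Pos": ["DF,MF", "FW", "MF, FW"],
    }
)

# Keep only the first listed position as the primary one.
players["primary_pos"] = players["Pos"].str.split(",").str[0].str.strip()
```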
Defenders
Here are the clusters we got for the defenders:
Cluster 0 - Playmaking Defender: Strong in passing and creativity, these defenders are key to building attacks from the back.
Cluster 1 - Traditional Center-Back: Focused on defending, with less involvement in offensive play.
Cluster 2 - Balanced Defender: Contributes both in defense and in possession.
Cluster 3 - Attacking Full-Back: Often joins the attack and has a strong presence in shooting and finishing.
Midfielders
Here are the clusters we got for the midfielders:
Cluster 0 - Defensive Midfielder: Excels at protecting the backline.
Cluster 1 - Attacking Midfielder: Focuses on scoring and assisting.
Cluster 2 - Holding Midfielder: Maintains possession and controls the game's tempo.
Cluster 3 - Box-to-Box Midfielder: Contributes both offensively and defensively.
Forwards
Here are the clusters we got for the forwards:
Cluster 0 - All-Round Forward: Contributes to both attacking and defensive play.
Cluster 1 - Support Forward: More focused on setting up play than on scoring.
Cluster 2 - Poacher: Excels at scoring, especially in the penalty box.
Cluster 3 - Creative Forward: Involved in both creating and finishing goal-scoring opportunities.
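K-Means only returns numbered clusters, so the role names have to be assigned by someone looking at the results. The article does not describe how this was done; one plausible approach, sketched here with hypothetical column names and made-up values, is to inspect the average score profile of each cluster.

```python
import pandas as pd

SCORE_COLS = ["passing", "defense", "possession", "shooting"]  # hypothetical names

# Hypothetical forwards that have already been assigned to clusters.
forwards = pd.DataFrame(
    {
        "passing": [0.30, 0.35, 0.80, 0.75],
        "defense": [0.10, 0.15, 0.20, 0.25],
        "possession": [0.40, 0.45, 0.60, 0.65],
        "shooting": [0.90, 0.85, 0.50, 0.55],
        "cluster": [2, 2, 1, 1],
    }
)

# Mean score per cluster; for K-Means this is what each centroid represents.
profile = forwards.groupby("cluster")[SCORE_COLS].mean()

# The highest-scoring aspect in each cluster is a starting point for naming it.
dominant = profile.idxmax(axis=1)
```

In this toy data, the cluster dominated by shooting would be a natural "Poacher" and the one dominated by passing a natural "Support Forward".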
Analysis of Top Players in Each Cluster for Selected Teams
To better understand the clustering, we focused on analyzing the top players from prominent European clubs such as Manchester City, Paris Saint-Germain, Bayern Munich, Liverpool, Real Madrid, Inter, Arsenal, Juventus, Milan, Barcelona, and Manchester United.
For these teams, we displayed the best players within each cluster across the different positions.
Looking at elite players in this way shows the specific roles they perform within their teams, beyond their nominal positions.
It also suggests that the clustering method is effective at uncovering the varied roles within each position, which coaches, analysts, and recruiters can use to better deploy and recruit players.
Examples of Players’ Role Evolution
The following tables showcase the role evolution of selected players from 2022 to 2024. These players were chosen for their interesting role transitions over the seasons, as revealed by the clustering analysis.
Achraf Hakimi
Overview: Achraf Hakimi consistently played as an Attacking Full-Back in 2022 and 2023, reflecting his involvement in offensive play and his contributions from wide positions. However, in 2024, under new tactical changes, he transitioned to a Playmaking Defender, indicating a shift towards more involvement in building play from deeper positions, contributing more to his team’s passing and playmaking.
Joško Gvardiol
Overview: Joško Gvardiol was consistently categorized as a Playmaking Defender in 2022 and 2023, highlighting his ability to contribute to the build-up play from the back. By 2024, his role shifted to an Attacking Full-Back, indicating a move towards more offensive contributions from wide areas, involving more overlaps, crosses, and even goals.
İlkay Gündoğan
Overview: İlkay Gündoğan showed flexibility in his role over the years. He started as a Box-to-Box Midfielder in 2022, transitioned to an Attacking Midfielder in 2023, focusing more on offensive play, and returned to a Box-to-Box role in 2024, indicating a balanced contribution to both defense and attack.
Granit Xhaka
Overview: Granit Xhaka was initially classified as a Holding Midfielder in 2022 and 2023, focusing on maintaining possession and dictating play. By 2024, he evolved into a Box-to-Box Midfielder at Leverkusen, reflecting a more dynamic role that involves contributing to both defense and attack.
Jude Bellingham
Overview: Jude Bellingham’s evolution showcases his growing influence on the pitch. Starting as a Defensive Midfielder in 2022, he transitioned to a Box-to-Box Midfielder in 2023, and by 2024, he had become an Attacking Midfielder, focusing more on creating and finishing scoring opportunities.
Future Directions
While our project has provided significant insights, there's always room for improvement. One potential avenue for future work is to expand the dataset to include leagues outside the top 5 in Europe. This would allow us to validate our clustering approach in different footballing contexts and possibly uncover new, emerging roles that are unique to certain leagues or styles of play.
Additionally, integrating event data and positional tracking could enhance our understanding of how players contribute to different phases of the game. This could lead to even more precise role definitions and open up new possibilities for tactical analysis.
Conclusion
By embracing a data-driven approach, we're not just categorizing players—we're uncovering the intricate, evolving roles that make football the beautiful game it is. Whether you're a coach, a scout, or simply a football enthusiast, we hope our work provides you with new tools to appreciate and understand the game at a deeper level.
Explore the Full Project on GitHub for more details: A Data-Driven Clustering Approach in Football
Call to Action:
If you're passionate about football analytics or data science, I invite you to explore the full project on GitHub. Dive into the code and experiment with different approaches, such as applying a dimensionality reduction algorithm to the four aspects I used to create the scores and then clustering the reduced features with a different method.
My approach involved incorporating these scores directly into K-Means clustering, but there's always room for innovation. Try your hand at a new method and share your findings.
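As a starting point for that kind of experiment, here is a minimal sketch on synthetic data. PCA and agglomerative clustering are just one possible pairing chosen for illustration; they are not part of the project, and the number of components is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA

# Synthetic stand-in for the four aspect scores of 60 players.
rng = np.random.default_rng(42)
scores = rng.random((60, 4))

# Reduce the four scores to two components, then cluster the reduced features.
reduced = PCA(n_components=2).fit_transform(scores)
labels = AgglomerativeClustering(n_clusters=4).fit_predict(reduced)
```

Comparing these labels with the K-Means ones on real data would show how sensitive the role assignments are to the choice of method.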
Let's continue advancing the field of football analysis together!
Enjoy the content, and I'll see you in the dugout!