Trying To Find Similar Players 2.0

Last year I tried to find similar players by looking at the Euclidean distance between the average positions of a player’s events (shots, key passes, passes etc.), before going on to measure output/production as a percentage and seeing which players had the most similar averages and output numbers. While the results weren’t terrible, the method had a lot of problems. Taking the average location of events throws away a lot of information, and simply adding up various output numbers isn’t great either.

With this in mind, I thought it’d be fun to try and find similar players again but with an altered method. It’s nothing new, but I always enjoy having a list of players pop up who are meant to be similar to another player.

Method

This method is a lot more streamlined than the previous one, but requires a look at the different outputs as opposed to just summing them. For this method I started by clustering together all the passes in Europe’s Top 5 leagues this season into 100 different clusters. This resulted in the centroids below:

The idea for this clustering came from @SaturdayOnCouch in their piece ‘Finding the Best Pass In the Bundesliga’ on StatsBomb.
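The piece doesn’t say which clustering algorithm was used, so the following is only a minimal sketch, assuming plain k-means on each pass’s start and end coordinates (x_start, y_start, x_end, y_end). The feature layout and the `kmeans` helper are assumptions for illustration. Because the broadcasting here builds an n-by-k-by-4 array, for every pass in the Top 5 leagues scikit-learn’s `KMeans` or `MiniBatchKMeans` with `n_clusters=100` would be the practical choice.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means.

    X has one row per pass: (x_start, y_start, x_end, y_end).
    Returns a cluster label for every pass and the k centroids.
    """
    rng = np.random.default_rng(seed)
    # Initialise with k distinct passes picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def nearest(cents):
        # Squared Euclidean distance from every pass to every centroid: shape (n, k)
        d2 = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the final centroids
    return nearest(centroids), centroids

# Synthetic example: 200 random "passes" grouped into 5 clusters
rng = np.random.default_rng(42)
passes = rng.uniform(0, 100, size=(200, 4))
labels, centroids = kmeans(passes, k=5)
print(centroids.round(1))
```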

Following the clustering, I broke down each player’s passes by the percentage that fall into each cluster. This could probably be done better if I were to tweak the clustering and make it more position-specific, but that may be for another day. After seeing how each player passes, the next step is to compare them. To do this I looked at the Euclidean distance between players’ pass percentages, i.e. the 100 values for how often they play each type of pass.

Doing this means we can see which players have the most similar distribution of pass types, based on the clusters above.
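A minimal sketch of turning a player’s cluster labels into the percentage vector described above and comparing two players, assuming each pass already carries a cluster label from 0 to k-1. The helper names and the tiny four-cluster example are hypothetical.

```python
import numpy as np

def style_vector(labels, k=100):
    """Share of a player's passes that fall in each of the k clusters (sums to 1)."""
    counts = np.bincount(labels, minlength=k)
    return counts / counts.sum()

def style_distance(a, b):
    """Euclidean distance between two players' pass-type shares."""
    return float(np.linalg.norm(a - b))

# Hypothetical players, with k = 4 clusters instead of 100 to keep it readable
player_a = style_vector(np.array([0, 0, 1, 2]), k=4)  # [0.5, 0.25, 0.25, 0.0]
player_b = style_vector(np.array([0, 1, 1, 3]), k=4)  # [0.25, 0.5, 0.0, 0.25]
print(style_distance(player_a, player_b))  # 0.5
```

Using shares rather than raw counts means a player who sees a lot of the ball isn’t automatically far from one who sees less of it.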

I did some messing around to do a similar type of thing for output, but wasn’t happy with any of the results. The problem with using Euclidean distance is that a marginally smaller output counts as closer than a much larger one. Say you want a creative midfielder and the comparison player makes 3 key passes p90 and 4 take-ons p90: a player who makes 2.5 key passes p90 and 3.5 take-ons p90 will be closer than one who makes 7 key passes p90 and 9 take-ons p90. So the distance finds players with the same kind of output, when really you want players with the same or more output.
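The key pass/take-on example above can be checked directly. The `shortfall_distance` function is a hypothetical alternative, not something used in this piece: it only penalises metrics where the candidate falls short of the target, which makes the "same or more output" idea concrete.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def shortfall_distance(target, candidate):
    """Hypothetical one-sided distance: only metrics where the candidate falls short count."""
    return math.sqrt(sum(max(t - c, 0.0) ** 2 for t, c in zip(target, candidate)))

target = (3.0, 4.0)   # (key passes p90, take-ons p90) for the comparison player
smaller = (2.5, 3.5)
larger = (7.0, 9.0)

print(round(euclidean(target, smaller), 3))  # 0.707
print(round(euclidean(target, larger), 3))   # 6.403
print(round(shortfall_distance(target, smaller), 3))  # 0.707
print(shortfall_distance(target, larger))             # 0.0
```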

I also messed around with clustering output, but again wasn’t too happy.

In the end I decided to use the distance of their output as well as just looking at the values and plotting graphs. This, combined with the breakdown of passes (which I’ll refer to as either style or passing style from now on), gives a fun look at similar players.
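How the output metrics are scaled isn’t stated here, so this is only a sketch, assuming each metric is z-scored across players before taking the Euclidean distance (otherwise passes p90 would swamp take-ons p90). The optional `metric_idx` argument restricts the comparison to a few chosen metrics. Each player then has two coordinates, style distance and output distance.

```python
import numpy as np

def output_distances(outputs, target_idx, metric_idx=None):
    """Euclidean distance from one player's output to everyone else's.

    outputs: array of shape (n_players, n_metrics), e.g. per-90 values.
    metric_idx: optional column indices, to compare on a few chosen metrics only.
    Each metric is z-scored across players first so that no single metric
    dominates just because it is measured on a bigger scale.
    """
    X = np.asarray(outputs, dtype=float)
    if metric_idx is not None:
        X = X[:, metric_idx]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant metric contributes zero instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    return np.linalg.norm(Z - Z[target_idx], axis=1)

# Hypothetical per-90 numbers: (key passes, take-ons, passes)
outputs = [[3.0, 4.0, 60.0],
           [2.5, 3.5, 55.0],
           [7.0, 9.0, 80.0]]
print(output_distances(outputs, target_idx=0))
print(output_distances(outputs, target_idx=0, metric_idx=[0, 1]))
```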

As examples I’ll look at one defender, one deeper-lying midfielder and one more advanced midfielder. I’ve had a play looking at some forwards, but none seemed worth including.

Mousa Dembele

For some reason Mousa Dembele is my favourite player to test similar-player methods on. The Belgian midfielder has been an important part of Tottenham’s side in recent years, but with him turning 31 in the summer it’s probably time a replacement was found.

I’ve only used data from 2017/18 and only players with over 900 minutes, which means players with fewer minutes, or who have had a different season (in a different role/position etc.), could be missed. It’d be better to include more seasons of player data, while also checking whether there are younger players with similar numbers but fewer minutes.

With that being said, here’s the top 10 players who are closest to Dembele’s passing ‘style’.

Player | Team
Granit Xhaka | Arsenal
Nuri Sahin | Dortmund
Giannelli Imbula | Toulouse
Tanguy Ndombele | Lyon
Maxime Gonalons | Roma
Lucas Torreira | Sampdoria
Gini Wijnaldum | Liverpool
Jordan Veretout | Fiorentina
Dennis Geiger | Hoffenheim
N’Golo Kante | Chelsea

This is quite an encouraging list in the sense that they’re all deeper midfielders, but they’re not all that similar. For instance, Xhaka may move the ball to and from similar areas as Dembele, but there are lots of aspects of their games that aren’t the same. One thing that sets Dembele apart is his dribbling ability, which isn’t something he shares with Xhaka.
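Producing a top-10 list like the one above can be sketched as follows, assuming each player already has a pass-type share vector and a season minutes total. The "over 900 minutes" cut is from the text; the function name and the tiny example data are hypothetical.

```python
import numpy as np

def most_similar(target, styles, minutes, min_minutes=900, top_n=10):
    """Players closest to `target` in passing style, among those with over `min_minutes`.

    styles:  dict of player name -> pass-type share vector (e.g. 100 values)
    minutes: dict of player name -> minutes played in the season
    """
    ref = styles[target]
    ranked = sorted(
        (float(np.linalg.norm(vec - ref)), name)
        for name, vec in styles.items()
        if name != target and minutes.get(name, 0) > min_minutes
    )
    return [(name, dist) for dist, name in ranked[:top_n]]

# Hypothetical players and two-cluster style vectors
styles = {"Target": np.array([0.5, 0.5]), "A": np.array([0.4, 0.6]),
          "B": np.array([0.9, 0.1]), "C": np.array([0.5, 0.5])}
minutes = {"Target": 2500, "A": 1500, "B": 1000, "C": 300}
print(most_similar("Target", styles, minutes))  # C is excluded: only 300 minutes
```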

We can then start looking at Dembele’s output alongside his style. The graph below shows attempted take-ons against distance from Dembele’s passing style.


This shows that the likes of Xhaka, Sahin and Gonalons may pass similarly but don’t have the same dribbling ability as Dembele. Meanwhile, the likes of Lemina and Kovacic have impressive dribbling numbers and are just outside the top 10 in terms of passing style. The same type of graph for defensive actions can be seen below.

This process can then be repeated for other metrics, but you get the idea.

Rather than plotting everything and trying to compare it all, I decided to be lazy and just filter for players whose output values are around a similar level to Dembele’s. The values for each of these can be seen below. Of course, much of this will depend on team style/performance, but it’s a starting point.

Metric | Dembele value | Filter value
Attempted Take-Ons (ATO) p90 | 3.58 | >= 3
ATO in Deeper Areas p90 | 1.88 | >= 1.2
ATO in Deeper Areas Success Rate | 92% | >= 80%
Attempted Vertical Passes p90 | 5.63 | >= 4
Attempted Tackles + Interceptions p90 | 5.48 | >= 4
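The filtering step with the thresholds from the table above can be sketched like this. The metric keys are hypothetical column names, and the percentages are written as fractions.

```python
# Filter thresholds from the table above; the keys are hypothetical column names
FILTERS = {
    "ato_p90": 3.0,
    "ato_deep_p90": 1.2,
    "ato_deep_success": 0.80,
    "vertical_passes_p90": 4.0,
    "tackles_interceptions_p90": 4.0,
}

def passes_filters(player_metrics, filters=FILTERS):
    """True if the player meets every threshold (every filter in the table is >=).

    A missing metric counts as failing that filter.
    """
    return all(player_metrics.get(metric, float("-inf")) >= threshold
               for metric, threshold in filters.items())

dembele = {"ato_p90": 3.58, "ato_deep_p90": 1.88, "ato_deep_success": 0.92,
           "vertical_passes_p90": 5.63, "tackles_interceptions_p90": 5.48}
print(passes_filters(dembele))  # True
```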

Only 8 players satisfy these filters. These, along with their distance from the passing style, can be seen below.


Then, despite the drawbacks mentioned earlier, plotting the distance in both style and output (which includes more than just the few metrics mentioned above) gives the following graph.


This is a bit of a mixed bag: some names close to Dembele are encouraging, but others really aren’t. One way to adjust for this is to not factor in so many metrics, and instead use just the ones we’re interested in. A second graph, with the output being the metrics in the filter above, can be seen below.


This is more encouraging, with midfielders who tend to dribble more being closer to Dembele, rather than the likes of Lucas Leiva, Lucas Biglia and Nemanja Matic. Going a step further and looking at those who will be 24 and under on 1/6/2018 gives the following.


From here, Spurs would have a good starting point for players to look into. Ndombele and Kovacic seem to be the best options, but there are also lots of interesting names sprinkled throughout. 21-year-old Youssef Aït Bennasser had impressive numbers on loan at Caen from Monaco this season, as did Adrien Tameze for Nice, while Lucas Torreira and Mario Lemina have a slightly different output but a closer passing style. Bournemouth’s Lewis Cook isn’t too far away, while Fabian Ruiz of Real Betis could be worth looking into. Gladbach’s 18-year-old Michael Cuisance also has a really good output, albeit while being slightly further away in passing style.

Of course, lots more work would need to be done from here, but it provides a way to generate potential players to look into who may not have been considered before. From the above, Ndombele seems like he could be a good replacement for Spurs, and being able to replace a player with someone whose name is so close to his own seems like an opportunity that shouldn’t be missed.
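The age cut used in this section ("24 and under on 1/6/2018") can be sketched as below. This assumes the date is day/month/year, i.e. 1 June 2018; the helper names and birth dates are hypothetical.

```python
from datetime import date

# Assumes 1/6/2018 in the text is day/month/year, i.e. 1 June 2018
REF_DATE = date(2018, 6, 1)

def age_on(birth, ref=REF_DATE):
    """Age in whole years on the reference date."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)

def is_24_or_under(birth, ref=REF_DATE):
    return age_on(birth, ref) <= 24

# Hypothetical birth dates either side of the cut-off
print(is_24_or_under(date(1993, 6, 2)))  # True: still 24 on 1 June 2018
print(is_24_or_under(date(1993, 6, 1)))  # False: turns 25 that day
```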

Jerome Boateng

While Kalidou Koulibaly is the defender with the most attempted vertical passes p90 this season, Jerome Boateng attempted the most long vertical passes p90 and consequently the most into the final third p90. With this in mind, I thought he’d be an interesting pick among defenders, looking for those who go long more often and take more chances, as opposed to those more likely to play it safe.

The top 10 defenders closest to Boateng’s passing style are:

Player | Team
Toby Alderweireld | Tottenham
Davinson Sanchez | Tottenham
Leonardo Bonucci | Milan
Sokratis | Dortmund
Kostas Manolas | Roma
German Pezzella | Fiorentina
Vincent Kompany | Manchester City
Jeremy Gelin | Rennes
Sebastien De Maio | Bologna
Joel Matip | Liverpool

Again, this is a pretty encouraging list. Alderweireld seems like a good match given his tendency to also play longer balls from the back, as can be seen in the embedded tweet below from @thefutebolist.


What should be encouraging to Spurs fans is that Davinson Sanchez is 2nd on the list, suggesting he’s in a good position to step up and replace Alderweireld should he leave. However, it’s also likely he’s so close because he’s playing the same role in the same team.

Another interesting name on the list is Jeremy Gelin from Rennes. I can’t claim to know anything about him, but being just 21 years old and ranking high for passing similarity to Boateng makes him an interesting name to look into.

One problem with centre-backs is that there don’t seem to be many metrics to compare them on, particularly as there doesn’t seem to be a great metric for showing how good a defender a player is. Plotting how front-footed a defender is against the distance can provide an interesting comparison, which can be seen below.

From this, Sokratis seems to be about as front-footed as Boateng, as well as being close in passing style. Stuttgart youngster Timo Baumgartl is also in quite a good position, not too far away on the x-axis while being more front-footed than Boateng. Plotting the PPDA of the team could also be an interesting comparison.

Again, Sokratis has similar numbers to Boateng, as do Pezzella and Spurs pair Alderweireld and Sanchez.
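PPDA (passes per defensive action) isn’t defined in the piece. One common definition divides the opponent’s passes by the team’s defensive actions (tackles, interceptions, challenges and fouls), both counted in the same zone, usually the opponent’s defensive 60% of the pitch. Providers differ on the details, so treat this as a sketch under that assumption.

```python
def ppda(opponent_passes, tackles, interceptions, challenges, fouls):
    """Passes allowed per defensive action.

    All counts are assumed to come from the same zone (commonly the opponent's
    defensive 60% of the pitch). Lower values mean a team presses more.
    """
    actions = tackles + interceptions + challenges + fouls
    if actions == 0:
        raise ValueError("no defensive actions in the zone")
    return opponent_passes / actions

# Hypothetical season totals for one team
print(ppda(opponent_passes=3000, tackles=150, interceptions=100,
           challenges=50, fouls=100))  # 7.5
```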

Being lazy and plotting distance from both style and output gives the below graph.


From this, Alderweireld seems to be the player most similar to Boateng, followed by the likes of Matip, Sokratis and Sanchez. Mateo Musacchio, Raul Albiol and Niklas Sule are also in good positions and haven’t been mentioned so far.

Doing the same as with Dembele and filtering for those 24 and under gives the following graph.


From this, Sanchez and Sule seem to be the closest. John Stones, Daniele Rugani, Marquinhos and Milan Skriniar are also in good positions, as is Marlon, who has recently been linked with a move to West Ham.

Centre-backs are harder to look at in terms of output, but this type of method seems as though it can help when trying to find players who bring the ball out of defence in similar ways.

Marek Hamsik

Marek Hamsik has been a key part of Napoli’s side for a good part of the last decade now, going from being part of an exciting attacking trio with Edinson Cavani and Ezequiel Lavezzi to being part of an elite midfield three with Jorginho and Allan. After reading ‘Finding the next Hamsik with nearest neighbour clustering’ from @FC_rSTATS, I thought Hamsik would be an interesting choice for this article.

The top 10 closest passers to Hamsik are:

Player | Team
Kevin Strootman | Roma
Houssem Aouar | Lyon
Karol Linetty | Sampdoria
Piotr Zielinski | Napoli
Andres Iniesta | Barcelona
Giacomo Bonaventura | Milan
Borja Garcia | Girona
Blaise Matuidi | Juventus
Thomas Lemar | Monaco
Toni Kroos | Real Madrid

This is quite a nice list, with other progressive number 8s also present. To inspect the players further, the next step is to see how they compare on some of Hamsik’s outputs.

In 2017/18 Hamsik had a high xG + xA for a central midfielder, at 0.499 p90. This is plotted against passing distance below.

A lot of the close passers are some way behind Hamsik here, although one interesting name is Udinese’s Jakub Jankto. The 22-year-old isn’t far away from Hamsik in either passing style or xG + xA p90 this season. Jankto reportedly also wants to leave Italy, possibly for Spain or England, and based on the above should have quite a few clubs interested in his services.

Another metric that Hamsik performed well in for a deeper lying midfielder was attempted passes into the box. This graph can be seen below.

Here it’s unsurprising that Zielinski performs well, given he plays in the same team as Hamsik and team style can heavily affect this number. Adrian Stoian also has good numbers here; however, Transfermarkt lists his main position this season as left wing.

Playing in Sarri’s Napoli (although it’ll be Ancelotti’s next season, so the requirements may be different), Hamsik also attempts a lot of passes, 92.2 p90, meaning it’s preferable for a potential replacement to also play a high number of passes. As can be seen below, however, not many players come close to this figure.

Again, this is heavily dictated by the team a player plays for. The distance is probably more important, as it shows how players move the ball when they have it, as opposed to how often they have it.
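The p90 figures used throughout (xG + xA, passes and so on) are season totals scaled to 90 minutes. A minimal sketch with hypothetical totals, not Hamsik’s real numbers:

```python
def per90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes

# Hypothetical season totals, not Hamsik's real numbers
xg, xa, passes, minutes = 8.0, 7.0, 2500, 2700
print(per90(xg + xa, minutes))           # 0.5
print(round(per90(passes, minutes), 1))  # 83.3
```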

Next, being lazy again, I plotted the distance from both style and all outputs.


The players closest here are also from top clubs, which is mostly because these players also complete a high number of passes and spend more time on the ball. To filter the outputs I’ve decided just to look at some of the aspects that were deemed important to Hamsik’s play, so xG + xA p90 and attempted passes into the box (APIB) p90. I’ve also included the percentage of their passes which are vertical. This gives the below graph.


You can see it changes quite a bit, with the likes of Kroos and Silva now looking different in output, while Zielinski and Strootman look to be the two most similar options. It’s good news for Napoli: if Zielinski can start contributing to more goals, they have a replacement ready to step up.

The above graph filtered for under-24s can be seen below.


From here, Zielinski looks to be the best choice, while if a club is looking for a player who can move the ball like Hamsik without breaking the bank, both Linetty and Jankto could be worth looking into.

Another interesting name on this graph is Alex Oxlade-Chamberlain. His output looks to be the closest to Hamsik’s for the three selected metrics, but his passing style is quite different. However, this difference may just come from Hamsik playing on the left of a midfield three and Oxlade-Chamberlain on the right. The two plots below show their most common pass types (those accounting for more than 1% of their passes).

While it’s not exactly symmetrical, you can see that Hamsik seems to live in the left half-space while Oxlade-Chamberlain is mostly on the right, with a bit more variation in advanced areas.
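Picking out the "most common pass types" (clusters making up more than 1% of a player’s passes) from a style vector can be sketched as follows; the function name and the four-cluster example are hypothetical.

```python
import numpy as np

def common_pass_types(shares, threshold=0.01):
    """Clusters that make up more than `threshold` of a player's passes, most used first."""
    shares = np.asarray(shares, dtype=float)
    idx = np.flatnonzero(shares > threshold)
    return idx[np.argsort(shares[idx])[::-1]].tolist()

# Hypothetical shares over 4 clusters (the article uses 100)
print(common_pass_types([0.005, 0.30, 0.02, 0.675]))  # [3, 1, 2]
```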

Conclusion

This seems quite a fun way of finding players who move the ball in similar ways, and it generally has fewer problems than the previous method. Of course, there’ll always be things not accounted for and it’s not perfect, but it can help generate players to look into. Plotting the outputs and most common pass types also helps paint a more complete picture of the players, especially in the example with Hamsik and Oxlade-Chamberlain.

Overall, I’ve enjoyed using this approach to try and find similar players. Further improvements would probably be to make it more specific, by separating passes by position or by area of the pitch. For instance, clustering only the passes made by midfielders, or only the passes in the centre, could make the clusters more specific. At the moment it kind of feels as though most midfielders who play on the left-hand side won’t be all that far away from Hamsik. With that being said, it does seem to be useful and is probably something I’ll use in other pieces from now on too.
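One way to check the left-side effect, not something done in this piece, is to reflect a player’s passes across the length of the pitch before assigning them to the existing clusters, then compare the mirrored style vector as well. The sketch assumes passes stored as (x_start, y_start, x_end, y_end) with y running from 0 to 100 across the pitch; `mirror` and `assign` are hypothetical helpers.

```python
import numpy as np

PITCH_WIDTH = 100.0  # assumption: y runs from 0 to 100 across the pitch

def mirror(passes, width=PITCH_WIDTH):
    """Reflect passes (x_start, y_start, x_end, y_end) so left and right swap."""
    flipped = np.array(passes, dtype=float)  # copy, leaves the input untouched
    flipped[:, [1, 3]] = width - flipped[:, [1, 3]]
    return flipped

def assign(passes, centroids):
    """Label each pass with its nearest cluster centroid."""
    d2 = ((passes[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Two hypothetical clusters: one down each flank
centroids = np.array([[50.0, 20.0, 70.0, 20.0],
                      [50.0, 80.0, 70.0, 80.0]])
passes = np.array([[48.0, 78.0, 72.0, 81.0]])
print(assign(passes, centroids), assign(mirror(passes), centroids))  # [1] [0]
```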

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.
