%0 Journal Article
%J ACM Transactions on Intelligent Systems and Technology (TIST)
%D 2019
%T PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach
%A Luca Pappalardo
%A Paolo Cintia
%A Paolo Ferragina
%A Emanuele Massucco
%A Dino Pedreschi
%A Fosca Giannotti
%X The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled, multi-dimensional, and role-aware evaluation of the performance of soccer players. We build our framework on a massive dataset of soccer-logs consisting of millions of match events from four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms its competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank (i.e., searching for players and assessing player versatility), showing its flexibility and efficiency, which make it well suited to the design of a scalable platform for soccer analytics.
%B ACM Transactions on Intelligent Systems and Technology (TIST)
%V 10
%P 1–27
%G eng
%U https://dl.acm.org/doi/abs/10.1145/3343172
%R 10.1145/3343172

%0 Journal Article
%J Scientific Data
%D 2019
%T A public data set of spatio-temporal match events in soccer competitions
%A Luca Pappalardo
%A Paolo Cintia
%A Alessio Rossi
%A Emanuele Massucco
%A Paolo Ferragina
%A Dino Pedreschi
%A Fosca Giannotti
%X Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player, and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, both at the individual and at the collective level, and the determinants of success and failure.
%B Scientific Data
%V 6
%P 1–15
%G eng
%U https://www.nature.com/articles/s41597-019-0247-7
%R 10.1038/s41597-019-0247-7