TY - JOUR
T1 - Explaining the difference between men’s and women’s football
JF - PLOS ONE
Y1 - 2021
A1 - Luca Pappalardo
A1 - Alessio Rossi
A1 - Michela Natilli
A1 - Paolo Cintia
ED - Constantinou, Anthony C.
AB - Women’s football is gaining supporters and practitioners worldwide, raising questions about what the differences are with men’s football. While the two sports are often compared based on the players’ physical attributes, we analyze the spatio-temporal events during matches in the last World Cups to compare male and female teams based on their technical performance. We train an artificial intelligence model to recognize if a team is male or female based on variables that describe a match’s playing intensity, accuracy, and performance quality. Our model accurately distinguishes between men’s and women’s football, revealing crucial technical differences, which we investigate through the extraction of explanations from the classifier’s decisions. The differences between men’s and women’s football are rooted in play accuracy, the recovery time of ball possession, and the players’ performance quality. Our methodology may help journalists and fans understand what makes women’s football a distinct sport and coaches design tactics tailored to female teams.
VL - 16
UR - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0255407
JO - PLOS ONE
ER -
TY - ABST
T1 - Mobile phone data analytics against the COVID-19 epidemics in Italy: flow diversity and local job markets during the national lockdown
Y1 - 2020
A1 - Pietro Bonato
A1 - Paolo Cintia
A1 - Francesco Fabbri
A1 - Daniele Fadda
A1 - Fosca Giannotti
A1 - Pier Luigi Lopalco
A1 - Sara Mazzilli
A1 - Mirco Nanni
A1 - Luca Pappalardo
A1 - Dino Pedreschi
A1 - Francesco Penone
A1 - S Rinzivillo
A1 - Giulio Rossetti
A1 - Marcello Savarese
A1 - Lara Tavoschi
AB - Understanding collective mobility patterns is crucial to plan the restart of production and economic activities, which are currently on stand-by to fight the diffusion of the epidemic. In this report, we use mobile phone data to infer the movements of people between Italian provinces and municipalities, and we analyze the incoming, outgoing and internal mobility flows before and during the national lockdown (March 9th, 2020) and after the closure of non-necessary productive and economic activities (March 23rd, 2020). The population flow across provinces and municipalities enables the modelling of a risk index tailored to the mobility of each municipality or province. Such an index would be a useful indicator to drive counter-measures in reaction to a sudden reactivation of the epidemic. Mobile phone data, even when aggregated to preserve the privacy of individuals, are a useful data source to track the evolution in time of human mobility, hence allowing for monitoring the effectiveness of control measures such as physical distancing. We address the following analytical questions: How does the mobility structure of a territory change? Do incoming and outgoing flows become more predictable during the lockdown, and what are the differences between weekdays and weekends? Can we detect proper local job markets based on human mobility flows, to eventually shape the borders of a local outbreak?
UR - https://arxiv.org/abs/2004.11278
ER -
TY - JOUR
T1 - PRIMULE: Privacy risk mitigation for user profiles
Y1 - 2020
A1 - Francesca Pratesi
A1 - Lorenzo Gabrielli
A1 - Paolo Cintia
A1 - Anna Monreale
A1 - Fosca Giannotti
AB - The availability of mobile phone data has encouraged the development of different data-driven tools, supporting social science studies and providing new data sources to the standard official statistics. However, this kind of data is subject to privacy concerns because it can enable the inference of personal and private information. In this paper, we address the privacy issues related to the sharing of user profiles, derived from mobile phone data, by proposing PRIMULE, a privacy risk mitigation strategy. The method relies on PRUDEnce (Pratesi et al., 2018), a privacy risk assessment framework that provides a methodology for systematically identifying risky users in a set of data. Extensive experimentation on real-world data shows the effectiveness of the PRIMULE strategy in terms of both the quality of mobile user profiles and the utility of these profiles for analytical services such as the Sociometer (Furletti et al., 2013), a data mining tool for city user classification.
VL - 125
SN - 0169-023X
UR - https://www.sciencedirect.com/science/article/pii/S0169023X18305342
JO - Data & Knowledge Engineering
ER -
TY - JOUR
T1 - The relationship between human mobility and viral transmissibility during the COVID-19 epidemics in Italy
JF - arXiv preprint arXiv:2006.03141
Y1 - 2020
A1 - Paolo Cintia
A1 - Daniele Fadda
A1 - Fosca Giannotti
A1 - Luca Pappalardo
A1 - Giulio Rossetti
A1 - Dino Pedreschi
A1 - S Rinzivillo
A1 - Bonato, Pietro
A1 - Fabbri, Francesco
A1 - Penone, Francesco
A1 - Savarese, Marcello
A1 - Checchi, Daniele
A1 - Chiaromonte, Francesca
A1 - Vineis, Paolo
A1 - Guzzetta, Giorgio
A1 - Riccardo, Flavia
A1 - Marziano, Valentina
A1 - Poletti, Piero
A1 - Trentini, Filippo
A1 - Bella, Antonio
A1 - Andrianou, Xanthi
A1 - Del Manso, Martina
A1 - Fabiani, Massimo
A1 - Bellino, Stefania
A1 - Boros, Stefano
A1 - Mateo Urdiales, Alberto
A1 - Vescio, Maria Fenicia
A1 - Brusaferro, Silvio
A1 - Rezza, Giovanni
A1 - Pezzotti, Patrizio
A1 - Ajelli, Marco
A1 - Merler, Stefano
AB - We describe in this report our studies to understand the relationship between human mobility and the spreading of COVID-19, as an aid to manage the restart of the social and economic activities after the lockdown and monitor the epidemic in the coming weeks and months. We compare the evolution (from January to May 2020) of the daily mobility flows in Italy, measured by means of nation-wide mobile phone data, and the evolution of transmissibility, measured by the net reproduction number, i.e., the mean number of secondary infections generated by one primary infector in the presence of control interventions and human behavioural adaptations. We find a striking relationship between the negative variation of mobility flows and the net reproduction number, in all Italian regions, between March 11th and March 18th, when the country entered the lockdown.
This observation allows us to quantify the time needed to "switch off" the country mobility (one week) and the time required to bring the net reproduction number below 1 (one week). A reasonably simple regression model provides evidence that the net reproduction number is correlated with a region's incoming, outgoing and internal mobility. We also find a strong relationship between the number of days above the epidemic threshold before the mobility flows reduce significantly as an effect of lockdowns, and the total number of confirmed SARS-CoV-2 infections per 100k inhabitants, thus indirectly showing the effectiveness of the lockdown and the other non-pharmaceutical interventions in the containment of the contagion. Our study demonstrates the value of "big" mobility data for monitoring key epidemic indicators to inform choices as the epidemic unfolds in the coming months.
UR - https://arxiv.org/abs/2006.03141
ER -
TY - JOUR
T1 - (So) Big Data and the transformation of the city
JF - International Journal of Data Science and Analytics
Y1 - 2020
A1 - Andrienko, Gennady
A1 - Andrienko, Natalia
A1 - Boldrini, Chiara
A1 - Caldarelli, Guido
A1 - Paolo Cintia
A1 - Cresci, Stefano
A1 - Facchini, Angelo
A1 - Fosca Giannotti
A1 - Gionis, Aristides
A1 - Riccardo Guidotti
A1 - others
AB - The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders.
In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the “City of Citizens” thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper by outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
UR - https://link.springer.com/article/10.1007/s41060-020-00207-3
ER -
TY - JOUR
T1 - PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach
JF - ACM Transactions on Intelligent Systems and Technology (TIST)
Y1 - 2019
A1 - Luca Pappalardo
A1 - Paolo Cintia
A1 - Ferragina, Paolo
A1 - Massucco, Emanuele
A1 - Dino Pedreschi
A1 - Fosca Giannotti
AB - The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions.
By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank (i.e., searching players and player versatility), showing its flexibility and efficiency, which make it well suited to the design of a scalable platform for soccer analytics.
VL - 10
UR - https://dl.acm.org/doi/abs/10.1145/3343172
ER -
TY - JOUR
T1 - A public data set of spatio-temporal match events in soccer competitions
JF - Scientific Data
Y1 - 2019
A1 - Luca Pappalardo
A1 - Paolo Cintia
A1 - Alessio Rossi
A1 - Massucco, Emanuele
A1 - Ferragina, Paolo
A1 - Dino Pedreschi
A1 - Fosca Giannotti
AB - Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics.
The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, both at individual and at collective level, and the determinants of success and failure.
VL - 6
UR - https://www.nature.com/articles/s41597-019-0247-7
ER -
TY - JOUR
T1 - Relationship between External and Internal Workloads in Elite Soccer Players: Comparison between Rate of Perceived Exertion and Training Load
JF - Applied Sciences
Y1 - 2019
A1 - Alessio Rossi
A1 - Perri, Enrico
A1 - Luca Pappalardo
A1 - Paolo Cintia
A1 - Iaia, F Marcello
AB - The use of machine learning (ML) in soccer allows for the management of a large amount of data deriving from the monitoring of sessions and matches. Although the rate of perceived exertion (RPE), training load (S-RPE), and global positioning system (GPS) are standard methodologies used in team sports to assess the internal and external workload, how the external workload affects RPE and S-RPE remains unclear. This study explores the relationship between both RPE and S-RPE and the training workload through ML. Data were recorded from 22 elite soccer players, in 160 training sessions and 35 matches during the 2015/2016 season, by using GPS tracking technology. A feature selection process was applied to understand which workload features influence RPE and S-RPE the most. Our results show that the training workloads performed in the previous week have a strong effect on perceived exertion and training load. On the other hand, the analysis of our predictions shows higher accuracy for medium RPE and S-RPE values compared with the extremes.
These results provide further evidence of the usefulness of ML as a support to athletic trainers and coaches in understanding the relationship between training load and individual response in team sports.
VL - 9
UR - https://www.mdpi.com/2076-3417/9/23/5174/htm
ER -
TY - JOUR
T1 - Effective injury forecasting in soccer with GPS training data and machine learning
JF - PLOS ONE
Y1 - 2018
A1 - Alessio Rossi
A1 - Luca Pappalardo
A1 - Paolo Cintia
A1 - Iaia, F Marcello
A1 - Fernàndez, Javier
A1 - Medina, Daniel
AB - Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multi-dimensional approach to injury forecasting in professional soccer that is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
VL - 13
UR - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201264
ER -
TY - JOUR
T1 - Discovering and Understanding City Events with Big Data: The Case of Rome
JF - Information
Y1 - 2017
A1 - Barbara Furletti
A1 - Roberto Trasarti
A1 - Paolo Cintia
A1 - Lorenzo Gabrielli
AB - The increasing availability of large amounts of data and digital footprints has given rise to ambitious research challenges in many fields, ranging from medical research and the financial and commercial world to people and environmental monitoring. Whereas traditional data sources and censuses fail to capture actual, up-to-date behaviors, Big Data supply the missing knowledge, providing useful and hidden information to analysts and decision makers. In this paper, we focus on the identification of city events by analyzing mobile phone data (Call Detail Record), and we study and evaluate the impact of these events on the typical city dynamics. We present an analytical process able to discover, understand and characterize city events from Call Detail Record, designing a distributed computation to implement Sociometer, a profiling tool to categorize phone users. The methodology provides a useful tool for city mobility managers to manage events and make future decisions for specific classes of users, i.e., residents, commuters and tourists.
VL - 8
UR - https://doi.org/10.3390/info8030074
ER -
TY - JOUR
T1 - Quantifying the relation between performance and success in soccer
JF - Advances in Complex Systems
Y1 - 2017
A1 - Luca Pappalardo
A1 - Paolo Cintia
AB - The availability of massive data about sports activities now offers the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6000 games and 10 million events in six European leagues and investigate this relation in soccer competitions.
We discover that a team’s position in a competition’s final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover, we find that, while victories and defeats can be explained by the team’s performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league relying only on technical data and exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking which is similar to the actual ranking, suggesting that a complex systems’ view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.
UR - http://www.worldscientific.com/doi/abs/10.1142/S021952591750014X
ER -
TY - Generic
T1 - The harsh rule of the goals: data-driven performance indicators for football teams
T2 - IEEE International Conference on Data Science and Advanced Analytics
Y1 - 2015
A1 - Paolo Cintia
A1 - Luca Pappalardo
A1 - Dino Pedreschi
A1 - Fosca Giannotti
A1 - Marco Malvaldi
AB - Sports analytics in general, and football (soccer in USA) analytics in particular, have evolved in recent years in an amazing way, thanks to automated or semi-automated sensing technologies that provide high-fidelity data streams extracted from every game. In this paper we propose a data-driven approach and show that there is a large potential to boost the understanding of football team performance. From observational data of football games we extract a set of pass-based performance indicators and summarize them in the H indicator. We observe a strong correlation between the proposed indicator and the success of a team, and therefore perform a simulation on the four major European championships (78 teams, almost 1500 games).
The outcome of each game in the championship was replaced by a synthetic outcome (win, loss or draw) based on the performance indicators computed for each team. We found that the final rankings in the simulated championships are very close to the actual rankings in the real championships, and show that teams with high ranking error show extreme values of a defense/attack efficiency measure, the Pezzali score. Our results are surprising given the simplicity of the proposed indicators, suggesting that a complex systems’ view on football data has the potential of revealing hidden patterns and behavior of superior quality.
JF - IEEE International Conference on Data Science and Advanced Analytics
UR - https://www.researchgate.net/profile/Luca_Pappalardo/publication/281318318_The_harsh_rule_of_the_goals_data-driven_performance_indicators_for_football_teams/links/561668e308ae37cfe4090a5d.pdf
ER -
TY - CHAP
T1 - Towards a Boosted Route Planner Using Individual Mobility Models
T2 - Software Engineering and Formal Methods
Y1 - 2015
A1 - Riccardo Guidotti
A1 - Paolo Cintia
JF - Software Engineering and Formal Methods
PB - Springer Berlin Heidelberg
ER -
TY - CONF
T1 - Mining efficient training patterns of non-professional cyclists
T2 - 22nd Italian Symposium on Advanced Database Systems, SEBD 2014, Sorrento Coast, Italy, June 16-18, 2014.
Y1 - 2014
A1 - Paolo Cintia
A1 - Luca Pappalardo
A1 - Dino Pedreschi
JF - 22nd Italian Symposium on Advanced Database Systems, SEBD 2014, Sorrento Coast, Italy, June 16-18, 2014.
ER -
TY - CHAP
T1 - Mobility Profiling
T2 - Data Science and Simulation in Transportation Research
Y1 - 2014
A1 - Mirco Nanni
A1 - Roberto Trasarti
A1 - Paolo Cintia
A1 - Barbara Furletti
A1 - Chiara Renso
A1 - Lorenzo Gabrielli
A1 - S Rinzivillo
A1 - Fosca Giannotti
AB - The ability to understand the dynamics of human mobility is crucial for tasks like urban planning and transportation management.
The recent, rapidly growing availability of large spatio-temporal datasets gives us the possibility to develop sophisticated and accurate analysis methods and algorithms that can enable us to explore several relevant mobility phenomena: the distinct access paths to a territory, the groups of persons that move together in space and time, the regions of a territory that contain a high density of traffic demand, etc. All these paradigmatic perspectives focus on a collective view of mobility, where the interesting phenomenon is the result of the contribution of several moving objects. In this chapter, the authors explore a different approach to the topic and focus on the analysis and understanding of relevant individual mobility habits in order to assign a profile to an individual on the basis of his/her mobility. This process adds a semantic level to the raw mobility data, enabling further analyses that require a deeper understanding of the data itself. The studies described in this chapter are based on two large datasets of spatio-temporal data, originating, respectively, from GPS-equipped devices and from a mobile phone network.
JF - Data Science and Simulation in Transportation Research
PB - IGI Global
ER -
TY - CONF
T1 - "Engine Matters": A First Large Scale Data Driven Study on Cyclists' Performance
T2 - 13th IEEE International Conference on Data Mining Workshops, ICDM Workshops, TX, USA, December 7-10, 2013
Y1 - 2013
A1 - Paolo Cintia
A1 - Luca Pappalardo
A1 - Dino Pedreschi
JF - 13th IEEE International Conference on Data Mining Workshops, ICDM Workshops, TX, USA, December 7-10, 2013
UR - http://dx.doi.org/10.1109/ICDMW.2013.41
ER -
TY - CONF
T1 - A Gravity Model for Speed Estimation over Road Network
T2 - 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, June 3-6, 2013 - Volume 2
Y1 - 2013
A1 - Paolo Cintia
A1 - Roberto Trasarti
A1 - José Antônio Fernandes de Macêdo
A1 - Livia Almada
A1 - Camila Fereira
JF - 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, June 3-6, 2013 - Volume 2
UR - http://dx.doi.org/10.1109/MDM.2013.83
ER -
TY - CONF
T1 - Inferring human activities from GPS tracks UrbComp
T2 - Workshop at KDD 2013
Y1 - 2013
A1 - Paolo Cintia
A1 - Barbara Furletti
A1 - Chiara Renso
JF - Workshop at KDD 2013
CY - Chicago USA
ER -