Data Science aims at discovering patterns and models of human behavior across the various social dimensions, extracting multi-dimensional patterns and models from a vast variety of social data.
Data Science can have a high impact and generate enormous value for society. It creates new opportunities to understand complex phenomena such as mobility behaviors, economic and financial crises, the spread of epidemics, and the diffusion of opinions.
At KDD Lab we take on real-world analytical problems, both to advance the state of the art of data science methods and to discover new problems to solve. Our approach is closely tied to the data sources we are able to collect and integrate into a knowledge base that can be exploited for the analytical task at hand.
Over the last decades, we have worked in a multi- and inter-disciplinary context, with many partners from both industry and academia.
We have collected and built a vast library of datasets that have been used within the research community and for educational activities.
Sport Data Science
The proliferation of new sensing technologies that provide high-fidelity data streams extracted from every game is rapidly changing the way scientists, fans and practitioners conceive sports performance. By combining these (big) data with the powerful tools of data science and AI, we can now unveil the great complexity underlying sports performance and tackle many challenging tasks: from automatic tactical analysis to data-driven performance ranking, game outcome prediction and injury forecasting. Our data scientists in Pisa are using massive data describing several sports (especially soccer, cycling and rugby) to construct interpretable and easy-to-use tools for sports coaches and managers. Our studies open an interesting perspective on how to understand the factors influencing sports success and how to build simulation tools for boosting both individual and collective performance.
We try to estimate migration flows and stocks from available data in real time, by building models that map observed measures extracted from unconventional data sources to official data. Since migration might generate cultural changes in both the local and the incoming population, we evaluate migrants' integration in new communities through social network and retail data analysis. Furthermore, SoBigData.it supports data journalism projects on migration. We partner with the team of “Demal Te Niew”, the web documentary on migration between Italy and Senegal published in L’Espresso and El Pais.
We investigate changes in people's and companies' behavior using a data-driven approach. We try to correlate people's well-being with their social and mobility data, discovering how socio-economic indicators may influence the lives of people in different regions. This approach can potentially lead to the development of effective policies to reduce internal and external conflicts in the population, leading to a systematic improvement of well-being.
The effective graphical presentation of information is an essential step in the data science process. Visualization is intended to convey and communicate information clearly through graphical means, enabling everyone to comprehend data in a much more explicit way. Visual Analytics methods may enhance analytical tasks by providing efficient representations to explore data and models, or to explain and disseminate the results of the analyses. Through visualization, the results of data processing are made more accessible, straightforward, and user-friendly, even for those who have no specific technical or domain knowledge.
Urban Mobility Atlas
This is a web-based visual platform to access advanced analyses and patterns extracted from mobility data. Complex and time-demanding analyses are pre-computed on the server and made accessible through an interactive dashboard. The analyses are performed on a large dataset of GPS trajectories over the Tuscany region, in Italy. The user can access several statistics, spatial and temporal distributions, and patterns for each municipality of Tuscany. The user can also filter or modify the visualizations through a visual interface where the relevant dimensions (time, origin, destination, municipality, etc.) can be explored.
Urban Mobility Atlas: flows of traffic exiting from the city of Pisa
Urban Mobility Atlas: time distribution of trips entering the city of Pisa during a typical week
Urban Mobility Atlas: Radius of Gyration provides a synthesis of the extent of each individual's mobility
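The radius of gyration mentioned in the caption above is commonly defined as the root-mean-square distance of an individual's visited positions from their centre of mass. The following is a minimal sketch of that standard definition, not the platform's actual implementation. It assumes positions are already projected to planar coordinates (for example, kilometres east and north of a reference point); the function name and input format are hypothetical.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centre of mass.

    `points` is a list of (x, y) pairs in planar coordinates (assumed here;
    raw GPS latitude/longitude would first need a projection).
    """
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    # Centre of mass of all visited positions
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the centre of mass
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

# Illustrative input: an individual observed at three locations (km)
print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))
```

A small value indicates an individual who moves within a compact area; a large value indicates long-range mobility.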
Mobility Atlas Booklet
This is a web-based visual platform to explore the mobility of an area, based on a library of analytical methods that are applied to a set of reconstructed user trajectories derived from GSM signals. The platform exploits methods developed within the KDD Lab.
We introduce a paradigm where complex analytical processes are summarized into a set of quantitative estimators of the main properties of mobility in a territory. We call such estimators mobility indicators and, for each region, we pre-compute a selection of measurements to provide a general overview of the mobility. Mobility Atlas Booklet is designed as an analytical service for policymakers, businesses, public administrations, and individual citizens. The tool makes territorial information accessible through an API system and a set of easily navigable dashboards.
Temporal distribution of subpopulations
Origins of flows by municipality
Daily distributions of accesses to the city
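As a rough illustration of what pre-computing mobility indicators for a municipality could look like, the sketch below counts inbound trips by origin and by hour of day. It is an assumption-laden toy example, not the Booklet's code: the trip record layout (`origin`, `destination`, `start`), the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime

def inbound_indicators(trips, municipality):
    """Summarize trips entering `municipality` from elsewhere.

    Each trip is a dict with hypothetical keys 'origin', 'destination'
    (municipality names) and 'start' (a datetime).
    """
    inbound = [
        t for t in trips
        if t["destination"] == municipality and t["origin"] != municipality
    ]
    return {
        "total": len(inbound),
        # Number of inbound trips per origin municipality
        "by_origin": dict(Counter(t["origin"] for t in inbound)),
        # Number of inbound trips per hour of day (0-23)
        "by_hour": dict(Counter(t["start"].hour for t in inbound)),
    }

# Made-up sample trips, for illustration only
sample_trips = [
    {"origin": "Lucca", "destination": "Pisa", "start": datetime(2020, 1, 6, 8, 15)},
    {"origin": "Livorno", "destination": "Pisa", "start": datetime(2020, 1, 6, 8, 40)},
    {"origin": "Lucca", "destination": "Pisa", "start": datetime(2020, 1, 6, 17, 5)},
    {"origin": "Pisa", "destination": "Pisa", "start": datetime(2020, 1, 6, 9, 0)},
    {"origin": "Pisa", "destination": "Lucca", "start": datetime(2020, 1, 6, 18, 30)},
]
print(inbound_indicators(sample_trips, "Pisa"))
```

Because the summary is a plain dictionary, it could be computed once per region and then served to a dashboard or an API without re-running the analysis.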
Didactic Data Mining Environment
Didactic Data Mining Environment (DDME) is a framework that supports teachers in introducing and explaining data mining algorithms during their lectures. It also offers students an effective way to understand the details, functionalities, and behavior of these algorithms, and to test their understanding and preparation.
DDME provides a collection of data mining algorithms ranging from clustering to classification to association rules. Every algorithm lets the user tune the parameters that most affect its behavior. DDME algorithms can be run on custom datasets as well as on random datasets generated with functionalities offered by the library. The innovative aspect of DDME is that running a data mining algorithm in this environment provides the intermediate results for each step of the algorithm's execution. These results are presented in an intelligible way using text or images, showing the value of a formula or a particular assignment. In this way, the DDME user can learn how algorithms work and why a result is returned for certain parameters or settings.
Exploration of k-Means results
DBSCAN result exploration
Step-wise exploration of frequent itemset generation of Apriori
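To give a flavour of the step-wise exploration described above, here is a minimal sketch of level-wise Apriori frequent itemset generation that records the intermediate results of every level. It is not DDME's code or API: the function name, the structure of the returned steps, and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations

def apriori_steps(transactions, min_support):
    """Run Apriori and return the intermediate results of each level.

    Each step is a dict with the level 'k', the support count of every
    candidate itemset, and the subset of candidates that are frequent
    (support count >= min_support).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    steps, k = [], 1
    while candidates:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        steps.append({"k": k, "candidates": counts, "frequent": frequent})
        k += 1
        # Join step: merge frequent itemsets into candidates of size k
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = sorted(
            (c for c in joined
             if all(frozenset(s) in frequent for s in combinations(c, k - 1))),
            key=sorted,
        )
    return steps

# Made-up transactions, for illustration only
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for step in apriori_steps(baskets, min_support=2):
    print("level", step["k"])
    for itemset, count in step["candidates"].items():
        status = "frequent" if itemset in step["frequent"] else "discarded"
        print("  ", sorted(itemset), count, status)
```

Exposing the candidate counts alongside the frequent itemsets is what lets a learner see why a given itemset was discarded at a particular level.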
CISIA VIZ is a visual analytics platform for the exploration and analysis of performance data on the entry tests taken by Italian students when enrolling at university. The data is provided by CISIA (Consorzio Interuniversitario Sistemi Integrati per l’Accesso), a non-profit consortium formed by public universities. The platform provides an analytic framework where performance on each test can be analyzed through visual exploration, combining filters and data transformations to enable the comparison of different subpopulations.
Number of tests per period
Distribution of grade per section
NDlib-Viz is a web-based visual platform powered by the NDlib library. It abstracts away the coding complexity and allows the end user to set up, run and analyze diffusion experiments without writing code. The aim of the visual platform is to enable non-technicians to design, configure and run epidemic simulations. We developed NDlib-Viz to support students and to help teachers introduce epidemic models.
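For readers unfamiliar with diffusion experiments, the sketch below simulates a simple SIR (Susceptible, Infected, Recovered) epidemic on a small network using only the Python standard library. It deliberately does not use NDlib and does not claim to reproduce its update rules; the function name, parameters, and the toy network are assumptions for illustration.

```python
import random

def simulate_sir(adjacency, beta, gamma, initially_infected, steps, seed=0):
    """Discrete-time SIR simulation with synchronous updates.

    adjacency: dict mapping each node to a list of its neighbours.
    beta: probability that an infected node infects a susceptible
          neighbour in one step.
    gamma: probability that an infected node recovers in one step.
    Returns a list of {'S': n, 'I': n, 'R': n} counts, one per time step
    (including the initial state).
    """
    rng = random.Random(seed)
    status = {node: "S" for node in adjacency}
    for node in initially_infected:
        status[node] = "I"

    def counts(current):
        return {s: sum(1 for v in current.values() if v == s) for s in "SIR"}

    history = [counts(status)]
    for _ in range(steps):
        new_status = dict(status)
        for node, state in status.items():
            if state != "I":
                continue
            # Infected nodes try to infect each susceptible neighbour
            for neighbour in adjacency[node]:
                if status[neighbour] == "S" and rng.random() < beta:
                    new_status[neighbour] = "I"
            # Infected nodes may recover at the end of the step
            if rng.random() < gamma:
                new_status[node] = "R"
        status = new_status
        history.append(counts(status))
    return history

# Toy network: a path of five nodes, with node 0 initially infected
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
for t, snapshot in enumerate(simulate_sir(path, beta=0.5, gamma=0.2,
                                          initially_infected=[0], steps=10)):
    print(t, snapshot)
```

Varying `beta` and `gamma` and watching how the three counts evolve is the kind of experiment a visual platform lets users run without writing this code themselves.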
100Band is a data visualization application that shows the results of a set of analyses performed on the bands that applied for a contest organized by the Tuscany Region.
Visualization of a single band, by origin
Spatial distribution of bands in Tuscany