TY  - CONF
T1  - Explaining Siamese Networks in Few-Shot Learning for Audio Data
T2  - Discovery Science - 25th International Conference, DS 2022, Montpellier, France, October 10-12, 2022, Proceedings
Y1  - 2022
A1  - Andrea Fedele
A1  - Riccardo Guidotti
A1  - Dino Pedreschi
AB  - Machine learning models are not able to generalize correctly when queried on samples belonging to class distributions that were never seen during training. This is a critical issue, since real world applications might need to quickly adapt without the necessity of re-training. To overcome these limitations, few-shot learning frameworks have been proposed and their applicability has been studied widely for computer vision tasks. Siamese Networks learn pairs similarity in form of a metric that can be easily extended on new unseen classes. Unfortunately, the downside of such systems is the lack of explainability. We propose a method to explain the outcomes of Siamese Networks in the context of few-shot learning for audio data. This objective is pursued through a local perturbation-based approach that evaluates segments-weighted-average contributions to the final outcome considering the interplay between different areas of the audio spectrogram. Qualitative and quantitative results demonstrate that our method is able to show common intra-class characteristics and erroneous reliance on silent sections.
JF  - Discovery Science - 25th International Conference, DS 2022, Montpellier, France, October 10-12, 2022, Proceedings
PB  - Springer
UR  - https://doi.org/10.1007/978-3-031-18840-4_36
ER  - 

TY  - JOUR
T1  - Benchmarking and Survey of Explanation Methods for Black Box Models
JF  - CoRR
Y1  - 2021
A1  - Francesco Bodria
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Francesca Naretto
A1  - Dino Pedreschi
A1  - S Rinzivillo
VL  - abs/2102.13076
UR  - https://arxiv.org/abs/2102.13076
ER  - 

TY  - JOUR
T1  - Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
Y1  - 2021
A1  - Mirco Nanni
A1  - Andrienko, Gennady
A1  - Barabasi, Albert-Laszlo
A1  - Boldrini, Chiara
A1  - Bonchi, Francesco
A1  - Cattuto, Ciro
A1  - Chiaromonte, Francesca
A1  - Comandé, Giovanni
A1  - Conti, Marco
A1  - Coté, Mark
A1  - Dignum, Frank
A1  - Dignum, Virginia
A1  - Domingo-Ferrer, Josep
A1  - Ferragina, Paolo
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Helbing, Dirk
A1  - Kaski, Kimmo
A1  - Kertész, János
A1  - Lehmann, Sune
A1  - Lepri, Bruno
A1  - Lukowicz, Paul
A1  - Matwin, Stan
A1  - Jiménez, David Megías
A1  - Anna Monreale
A1  - Morik, Katharina
A1  - Oliver, Nuria
A1  - Passarella, Andrea
A1  - Passerini, Andrea
A1  - Dino Pedreschi
A1  - Pentland, Alex
A1  - Pianesi, Fabio
A1  - Francesca Pratesi
A1  - S Rinzivillo
A1  - Salvatore Ruggieri
A1  - Siebes, Arno
A1  - Torra, Vicenc
A1  - Roberto Trasarti
A1  - Hoven, Jeroen van den
A1  - Vespignani, Alessandro
AB  - The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates—if and when they want and for specific aims—with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
SN  - 1572-8439
UR  - https://link.springer.com/article/10.1007/s10676-020-09572-w
JO  - Ethics and Information Technology
ER  - 

TY  - JOUR
T1  - GLocalX - From Local to Global Explanations of Black Box AI Models
Y1  - 2021
A1  - Mattia Setzu
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Franco Turini
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models often are “black boxes” which we are not able to understand. Relying on these models has a multifaceted impact and raises significant concerns about their transparency. Applications in sensitive and critical domains are a strong motivational factor in trying to understand the behavior of black boxes. We propose to address this issue by providing an interpretable layer on top of black box models by aggregating “local” explanations. We present GLocalX, a “local-first” model agnostic explanation method. Starting from local explanations expressed in form of local decision rules, GLocalX iteratively generalizes them into global explanations by hierarchically aggregating them. Our goal is to learn accurate yet simple interpretable models to emulate the given black box, and, if possible, replace it entirely. We validate GLocalX in a set of experiments in standard and constrained settings with limited or no access to either data or local explanations. Experiments show that GLocalX is able to accurately emulate several models with simple and small models, reaching state-of-the-art performance against natively global solutions. Our findings show how it is often possible to achieve a high level of both accuracy and comprehensibility of classification models, even in complex domains with high-dimensional data, without necessarily trading one property for the other. This is a key requirement for a trustworthy AI, necessary for adoption in high-stakes decision making applications.
VL  - 294
SN  - 0004-3702
UR  - https://www.sciencedirect.com/science/article/pii/S0004370221000084
JO  - Artificial Intelligence
ER  - 

TY  - CONF
T1  - Analysis and Visualization of Performance Indicators in University Admission Tests
T2  - Formal Methods. FM 2019 International Workshops
Y1  - 2020
A1  - Michela Natilli
A1  - Daniele Fadda
A1  - S Rinzivillo
A1  - Dino Pedreschi
A1  - Licari, Federica
ED  - Sekerinski, Emil
ED  - Moreira, Nelma
ED  - Oliveira, José N.
ED  - Ratiu, Daniel
ED  - Riccardo Guidotti
ED  - Farrell, Marie
ED  - Luckcuck, Matt
ED  - Marmsoler, Diego
ED  - Campos, José
ED  - Astarte, Troy
ED  - Gonnord, Laure
ED  - Cerone, Antonio
ED  - Couto, Luis
ED  - Dongol, Brijesh
ED  - Kutrib, Martin
ED  - Monteiro, Pedro
ED  - Delmas, David
AB  - This paper presents an analytical platform for evaluation of the performance and anomaly detection of tests for admission to public universities in Italy. Each test is personalized for each student and is composed of a series of questions, classified on different domains (e.g. maths, science, logic, etc.). Since each test is unique for composition, it is crucial to guarantee a similar level of difficulty for all the tests in a session. For this reason, to each question, it is assigned a level of difficulty from a domain expert. Thus, the general difficultness of a test depends on the correct classification of each item. We propose two approaches to detect outliers. A visualization-based approach using dynamic filter and responsive visual widgets. A data mining approach to evaluate the performance of the different questions for five years. We used clustering to group the questions according to a set of performance indicators to provide labeling of the data-driven level of difficulty. The measured level is compared with the a priori assigned by experts. The misclassifications are then highlighted to the expert, who will be able to refine the question or the classification. Sequential pattern mining is used to check if biases are present in the composition of the tests and their performance. This analysis is meant to exclude overlaps or direct dependencies among questions. Analyzing co-occurrences we are able to state that the composition of each test is fair and uniform for all the students, even on several sessions. The analytical results are presented to the expert through a visual web application that loads the analytical data and indicators and composes an interactive dashboard. The user may explore the patterns and models extracted by filtering and changing thresholds and analytical parameters.
JF  - Formal Methods. FM 2019 International Workshops
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-54994-7
UR  - https://link.springer.com/chapter/10.1007/978-3-030-54994-7_14
ER  - 

TY  - CONF
T1  - Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
T2  - Machine Learning and Knowledge Discovery in Databases
Y1  - 2020
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Matwin, Stan
A1  - Dino Pedreschi
ED  - Brefeld, Ulf
ED  - Fromont, Elisa
ED  - Hotho, Andreas
ED  - Knobbe, Arno
ED  - Maathuis, Marloes
ED  - Robardet, Céline
AB  - We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows to the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
JF  - Machine Learning and Knowledge Discovery in Databases
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-46150-8
UR  - https://link.springer.com/chapter/10.1007/978-3-030-46150-8_12
ER  - 

TY  - CONF
T1  - Explaining Sentiment Classification with Synthetic Exemplars and Counter-Exemplars
T2  - Discovery Science
Y1  - 2020
A1  - Lampridis, Orestis
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
ED  - Appice, Annalisa
ED  - Tsoumakas, Grigorios
ED  - Manolopoulos, Yannis
ED  - Matwin, Stan
AB  - We present xspells, a model-agnostic local approach for explaining the decisions of a black box model for sentiment classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain. The latter are examples classified with a different label (a form of counter-factuals). Both are close in meaning to the text to explain, and both are meaningful sentences – albeit they are synthetically generated. xspells generates neighbors of the text to explain in a latent space using Variational Autoencoders for encoding text and decoding latent instances. A decision tree is learned from randomly generated neighbors, and used to drive the selection of the exemplars and counter-exemplars. We report experiments on two datasets showing that xspells outperforms the well-known lime method in terms of quality of explanations, fidelity, and usefulness, and that it is comparable to it in terms of stability.
JF  - Discovery Science
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-61527-7
UR  - https://link.springer.com/chapter/10.1007/978-3-030-61527-7_24
ER  - 

TY  - CONF
T1  - Global Explanations with Local Scoring
T2  - Machine Learning and Knowledge Discovery in Databases
Y1  - 2020
A1  - Mattia Setzu
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Franco Turini
ED  - Cellier, Peggy
ED  - Driessens, Kurt
AB  - Artificial Intelligence systems often adopt machine learning models encoding complex algorithms with potentially unknown behavior. As the application of these “black box” models grows, it is our responsibility to understand their inner working and formulate them in human-understandable explanations. To this end, we propose a rule-based model-agnostic explanation method that follows a local-to-global schema: it generalizes a global explanation summarizing the decision logic of a black box starting from the local explanations of single predicted instances. We define a scoring system based on a rule relevance score to extract global explanations from a set of local explanations in the form of decision rules. Experiments on several datasets and black boxes show the stability, and low complexity of the global explanations provided by the proposed solution in comparison with baselines and state-of-the-art global explainers.
JF  - Machine Learning and Knowledge Discovery in Databases
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-43823-4
UR  - https://link.springer.com/chapter/10.1007/978-3-030-43823-4_14
ER  - 

TY  - JOUR
T1  - Human migration: the big data perspective
JF  - International Journal of Data Science and Analytics
Y1  - 2020
A1  - Alina Sirbu
A1  - Andrienko, Gennady
A1  - Andrienko, Natalia
A1  - Boldrini, Chiara
A1  - Conti, Marco
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Bertoli, Simone
A1  - Jisu Kim
A1  - Muntean, Cristina Ioana
A1  - Luca Pappalardo
A1  - Passarella, Andrea
A1  - Dino Pedreschi
A1  - Pollacci, Laura
A1  - Francesca Pratesi
A1  - Sharma, Rajesh
AB  - How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.
SN  - 2364-4168
UR  - https://link.springer.com/article/10.1007/s41060-020-00213-5
JO  - International Journal of Data Science and Analytics
ER  - 

TY  - CONF
T1  - “Know Thyself” How Personal Music Tastes Shape the Last.Fm Online Social Network
T2  - Formal Methods. FM 2019 International Workshops
Y1  - 2020
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
ED  - Sekerinski, Emil
ED  - Moreira, Nelma
ED  - Oliveira, José N.
ED  - Ratiu, Daniel
ED  - Riccardo Guidotti
ED  - Farrell, Marie
ED  - Luckcuck, Matt
ED  - Marmsoler, Diego
ED  - Campos, José
ED  - Astarte, Troy
ED  - Gonnord, Laure
ED  - Cerone, Antonio
ED  - Couto, Luis
ED  - Dongol, Brijesh
ED  - Kutrib, Martin
ED  - Monteiro, Pedro
ED  - Delmas, David
AB  - As Nietzsche once wrote “Without music, life would be a mistake” (Twilight of the Idols, 1889). The music we listen to reflects our personality, our way to approach life. In order to enforce self-awareness, we devised a Personal Listening Data Model that allows for capturing individual music preferences and patterns of music consumption. We applied our model to 30k users of Last.Fm for which we collected both friendship ties and multiple listening. Starting from such rich data we performed an analysis whose final aim was twofold: (i) capture, and characterize, the individual dimension of music consumption in order to identify clusters of like-minded Last.Fm users; (ii) analyze if, and how, such clusters relate to the social structure expressed by the users in the service. Do there exist individuals having similar Personal Listening Data Models? If so, are they directly connected in the social graph or belong to the same community?
JF  - Formal Methods. FM 2019 International Workshops
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-54994-7
UR  - https://link.springer.com/chapter/10.1007/978-3-030-54994-7_11
ER  - 

TY  - CONF
T1  - Prediction and Explanation of Privacy Risk on Mobility Data with Neural Networks
T2  - ECML PKDD 2020 Workshops
Y1  - 2020
A1  - Francesca Naretto
A1  - Roberto Pellungrini
A1  - Nardini, Franco Maria
A1  - Fosca Giannotti
ED  - Koprinska, Irena
ED  - Kamp, Michael
ED  - Appice, Annalisa
ED  - Loglisci, Corrado
ED  - Antonie, Luiza
ED  - Zimmermann, Albrecht
ED  - Riccardo Guidotti
ED  - Özgöbek, Özlem
ED  - Ribeiro, Rita P.
ED  - Gavaldà, Ricard
ED  - Gama, João
ED  - Adilova, Linara
ED  - Krishnamurthy, Yamuna
ED  - Ferreira, Pedro M.
ED  - Malerba, Donato
ED  - Medeiros, Ibéria
ED  - Ceci, Michelangelo
ED  - Manco, Giuseppe
ED  - Masciari, Elio
ED  - Ras, Zbigniew W.
ED  - Christen, Peter
ED  - Ntoutsi, Eirini
ED  - Schubert, Erich
ED  - Zimek, Arthur
ED  - Anna Monreale
ED  - Biecek, Przemyslaw
ED  - S Rinzivillo
ED  - Kille, Benjamin
ED  - Lommatzsch, Andreas
ED  - Gulla, Jon Atle
AB  - The analysis of privacy risk for mobility data is a fundamental part of any privacy-aware process based on such data. Mobility data are highly sensitive. Therefore, the correct identification of the privacy risk before releasing the data to the public is of utmost importance. However, existing privacy risk assessment frameworks have high computational complexity. To tackle these issues, some recent work proposed a solution based on classification approaches to predict privacy risk using mobility features extracted from the data. In this paper, we propose an improvement of this approach by applying long short-term memory (LSTM) neural networks to predict the privacy risk directly from original mobility data. We empirically evaluate privacy risk on real data by applying our LSTM-based approach. Results show that our proposed method based on a LSTM network is effective in predicting the privacy risk with results in terms of F1 of up to 0.91. Moreover, to explain the predictions of our model, we employ a state-of-the-art explanation algorithm, Shap. We explore the resulting explanation, showing how it is possible to provide effective predictions while explaining them to the end-user.
JF  - ECML PKDD 2020 Workshops
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-65965-3
UR  - https://link.springer.com/chapter/10.1007/978-3-030-65965-3_34
ER  - 

TY  - JOUR
T1  - (So) Big Data and the transformation of the city
JF  - International Journal of Data Science and Analytics
Y1  - 2020
A1  - Andrienko, Gennady
A1  - Andrienko, Natalia
A1  - Boldrini, Chiara
A1  - Caldarelli, Guido
A1  - Paolo Cintia
A1  - Cresci, Stefano
A1  - Facchini, Angelo
A1  - Fosca Giannotti
A1  - Gionis, Aristides
A1  - Riccardo Guidotti
A1  - others
AB  - The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the “City of Citizens” thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
UR  - https://link.springer.com/article/10.1007/s41060-020-00207-3
ER  - 

TY  - JOUR
T1  - The AI black box Explanation Problem
JF  - ERCIM NEWS
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Dino Pedreschi
ER  - 

TY  - JOUR
T1  - Defining Geographic Markets from Probabilistic Clusters: A Machine Learning Algorithm Applied to Supermarket Scanner Data
JF  - Available at SSRN 3452058
Y1  - 2019
A1  - Bruestle, Stephen
A1  - Luca Pappalardo
A1  - Riccardo Guidotti
ER  - 

TY  - CONF
T1  - Explaining multi-label black-box classifiers for health applications
T2  - International Workshop on Health Intelligence
Y1  - 2019
A1  - Cecilia Panigutti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Dino Pedreschi
AB  - Today the state-of-the-art performance in classification is achieved by the so-called “black boxes”, i.e. decision-making systems whose internal logic is obscure. Such models could revolutionize the health-care system, however their deployment in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of transparency. The typical classification problem in health-care requires a multi-label approach since the possible labels are not mutually exclusive, e.g. diagnoses. We propose MARLENA, a model-agnostic method which explains multi-label black box decisions. MARLENA explains an individual decision in three steps. First, it generates a synthetic neighborhood around the instance to be explained using a strategy suitable for multi-label decisions. It then learns a decision tree on such neighborhood and finally derives from it a decision rule that explains the black box decision. Our experiments show that MARLENA performs well in terms of mimicking the black box behavior while gaining at the same time a notable amount of interpretability through compact decision rules, i.e. rules with limited length.
JF  - International Workshop on Health Intelligence
PB  - Springer
UR  - https://link.springer.com/chapter/10.1007/978-3-030-24409-5_9
ER  - 

TY  - JOUR
T1  - Factual and Counterfactual Explanations for Black Box Decision Making
JF  - IEEE Intelligent Systems
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Fosca Giannotti
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of artificial intelligence (AI) in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method, providing faithful explanations of the decision made by a black box classifier on a specific instance. The proposed method first learns an interpretable, local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. Then, it derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons of the decision, and a set of counterfactuals, suggesting the changes in the instance features that would lead to a different outcome. Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and of the accuracy in mimicking the black box.
UR  - https://ieeexplore.ieee.org/abstract/document/8920138
ER  - 

TY  - CONF
T1  - Investigating Neighborhood Generation Methods for Explanations of Obscure Image Classifiers
T2  - Pacific-Asia Conference on Knowledge Discovery and Data Mining
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Cariaggi, Leonardo
AB  - Given the wide use of machine learning approaches based on opaque prediction models, understanding the reasons behind decisions of black box decision systems is nowadays a crucial topic. We address the problem of providing meaningful explanations in the widely-applied image classification tasks. In particular, we explore the impact of changing the neighborhood generation function for a local interpretable model-agnostic explanator by proposing four different variants. All the proposed methods are based on a grid-based segmentation of the images, but each of them proposes a different strategy for generating the neighborhood of the image for which an explanation is required. A deep experimentation shows both improvements and weakness of each proposed approach.
JF  - Pacific-Asia Conference on Knowledge Discovery and Data Mining
PB  - Springer
UR  - https://link.springer.com/chapter/10.1007/978-3-030-16148-4_5
ER  - 

TY  - CONF
T1  - “Know Thyself” How Personal Music Tastes Shape the Last.Fm Online Social Network
T2  - International Symposium on Formal Methods
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
JF  - International Symposium on Formal Methods
PB  - Springer
ER  - 

TY  - CONF
T1  - Meaningful explanations of Black Box AI decision systems
T2  - Proceedings of the AAAI Conference on Artificial Intelligence
Y1  - 2019
A1  - Dino Pedreschi
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box.
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
UR  - https://aaai.org/ojs/index.php/AAAI/article/view/5050
ER  - 

TY  - CONF
T1  - Privacy Risk for Individual Basket Patterns
T2  - ECML PKDD 2018 Workshops
Y1  - 2019
A1  - Roberto Pellungrini
A1  - Anna Monreale
A1  - Riccardo Guidotti
ED  - Alzate, Carlos
ED  - Anna Monreale
ED  - Bioglio, Livio
ED  - Bitetta, Valerio
ED  - Bordino, Ilaria
ED  - Caldarelli, Guido
ED  - Ferretti, Andrea
ED  - Riccardo Guidotti
ED  - Gullo, Francesco
ED  - Pascolutti, Stefano
ED  - Pensa, Ruggero G.
ED  - Robardet, Céline
ED  - Squartini, Tiziano
AB  - Retail data are of fundamental importance for businesses and enterprises that want to understand the purchasing behaviour of their customers. Such data is also useful to develop analytical services and for marketing purposes, often based on individual purchasing patterns. However, retail data and extracted models may also provide very sensitive information to possible malicious third parties. Therefore, in this paper we propose a methodology for empirically assessing privacy risk in the releasing of individual purchasing data. The experiments on real-world retail data show that although individual patterns describe a summary of the customer activity, they may be successfully used for customer re-identification.
JF  - ECML PKDD 2018 Workshops
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-13463-1
UR  - https://link.springer.com/chapter/10.1007/978-3-030-13463-1_11
ER  - 

TY  - CONF
T1  - On The Stability of Interpretable Models
T2  - 2019 International Joint Conference on Neural Networks (IJCNN)
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
AB  - Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model's construction may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise awareness and attention of the scientific community on the need of a stability impact assessment of interpretable models.
JF  - 2019 International Joint Conference on Neural Networks (IJCNN)
PB  - IEEE
UR  - https://ieeexplore.ieee.org/abstract/document/8852158
ER  - 

TY  - JOUR
T1  - Assessing the Stability of Interpretable Models
JF  - arXiv preprint arXiv:1810.09352
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
AB  - Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.
ER  - 

TY  - JOUR
T1  - Discovering temporal regularities in retail customers’ shopping behavior
JF  - EPJ Data Science
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Lorenzo Gabrielli
A1  - Anna Monreale
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - In this paper we investigate the regularities characterizing the temporal purchasing behavior of the customers of a retail market chain. Most of the literature studying purchasing behavior focuses on what customers buy while giving little importance to the temporal dimension. As a consequence, the state of the art does not allow capturing the temporal purchasing patterns of each customer. These patterns should describe the customer’s temporal habits highlighting when she typically makes a purchase in correlation with information about the amount of expenditure, number of purchased items and other similar aggregates. This knowledge could be exploited for different scopes: set temporal discounts for making the purchases of customers more regular with respect to time, set personalized discounts in the day and time window preferred by the customer, provide recommendations for shopping time schedule, etc. To this aim, we introduce a framework for extracting from personal retail data a temporal purchasing profile able to summarize whether and when a customer makes her distinctive purchases. The individual profile describes a set of regular and characterizing shopping behavioral patterns, and the sequences in which these patterns take place. We show how to compare different customers by providing a collective perspective to their individual profiles, and how to group the customers with respect to these comparable profiles. By analyzing real datasets containing millions of shopping sessions we found that there is a limited number of patterns summarizing the temporal purchasing behavior of all the customers, and that they are sequentially followed in a finite number of ways. Moreover, we recognized regular customers characterized by a small number of temporal purchasing behaviors, and changing customers characterized by various types of temporal purchasing behaviors. Finally, we discuss how the profiles can be exploited both by customers to enable personalized services, and by the retail market chain for providing tailored discounts based on temporal purchasing regularity.
VL  - 7
UR  - https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-018-0133-0
ER  - 

TY  - CONF
T1  - Explaining successful docker images using pattern mining analysis
T2  - Federation of International Conferences on Software Technologies: Applications and Foundations
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Soldani, Jacopo
A1  - Neri, Davide
A1  - Antonio Brogi
AB  - Docker is on the rise in today’s enterprise IT. It permits shipping applications inside portable containers, which run from so-called Docker images. Docker images are distributed in public registries, which also monitor their popularity. The popularity of an image directly impacts on its usage, and hence on the potential revenues of its developers. In this paper, we present a frequent pattern mining-based approach for understanding how to improve an image to increase its popularity. The results in this work can provide valuable insights to Docker image providers, helping them to design more competitive software products.
JF  - Federation of International Conferences on Software Technologies: Applications and Foundations
PB  - Springer, Cham
UR  - https://link.springer.com/chapter/10.1007/978-3-030-04771-9_9
ER  - 

TY  - CONF
T1  - Exploring Students Eating Habits Through Individual Profiling and Clustering Analysis
T2  - ECML PKDD 2018 Workshops
Y1  - 2018
A1  - Michela Natilli
A1  - Anna Monreale
A1  - Riccardo Guidotti
A1  - Luca Pappalardo
JF  - ECML PKDD 2018 Workshops
PB  - Springer
ER  - 

TY  - CONF
T1  - The Fractal Dimension of Music: Geography, Popularity and Sentiment Analysis
T2  - International Conference on Smart Objects and Technologies for Social Good
Y1  - 2018
A1  - Pollacci, Laura
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Nowadays there is a growing standardization of musical contents. Our finding comes out from a cross-service multi-level dataset analysis where we study how geography affects the music production. The investigation presented in this paper highlights the existence of a “fractal” musical structure that relates the technical characteristics of the music produced at regional, national and world level. Moreover, a similar structure emerges also when we analyze the musicians’ popularity and the polarity of their songs defined as the mood that they are able to convey. Furthermore, the clusters identified are markedly distinct one from another with respect to popularity and sentiment.
JF  - International Conference on Smart Objects and Technologies for Social Good
PB  - Springer
UR  - https://link.springer.com/chapter/10.1007/978-3-319-76111-4_19
ER  - 

TY  - CONF
T1  - Helping your docker images to spread based on explainable models
T2  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Soldani, Jacopo
A1  - Neri, Davide
A1  - Brogi, Antonio
A1  - Dino Pedreschi
AB  - Docker is on the rise in today’s enterprise IT. It permits shipping applications inside portable containers, which run from so-called Docker images. Docker images are distributed in public registries, which also monitor their popularity. The popularity of an image impacts on its actual usage, and hence on the potential revenues for its developers. In this paper, we present a solution based on interpretable decision tree and regression trees for estimating the popularity of a given Docker image, and for understanding how to improve an image to increase its popularity. The results presented in this work can provide valuable insights to Docker developers, helping them in spreading their images. Code related to this paper is available at: https://github.com/di-unipi-socc/DockerImageMiner.
JF  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
PB  - Springer
UR  - https://link.springer.com/chapter/10.1007/978-3-030-10997-4_13
ER  - 

TY  - JOUR
T1  - The italian music superdiversity
JF  - Multimedia Tools and Applications
Y1  - 2018
A1  - Pollacci, Laura
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Globalization can lead to a growing standardization of musical contents. Using a cross-service multi-level dataset we investigate the actual Italian music scene. The investigation highlights the musical Italian superdiversity both individually analyzing the geographical and lexical dimensions and combining them. Using different kinds of features over the geographical dimension leads to two similar, comparable and coherent results, confirming the strong and essential correlation between melodies and lyrics. The profiles identified are markedly distinct one from another with respect to sentiment, lexicon, and melodic features. Through a novel application of a sentiment spreading algorithm and songs’ melodic features, we are able to highlight discriminant characteristics that violate the standard regional political boundaries, reconfiguring them following the actual musical communicative practices.
UR  - https://link.springer.com/article/10.1007/s11042-018-6511-6
ER  - 

TY  - CONF
T1  - Learning Data Mining
T2  - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - S Rinzivillo
AB  - In the last decade the usage and study of data mining and machine learning algorithms have received increasing attention from several heterogeneous fields of research. Learning how and why a certain algorithm returns a particular result, and understanding the main problems connected to its execution, is a hot topic in the education of data mining methods. In order to support data mining beginners, students, teachers, and researchers we introduce a novel didactic environment. The Didactic Data Mining Environment (DDME) allows users to execute a data mining algorithm on a dataset and to observe the algorithm behavior step by step to learn how and why a certain result is returned. DDME can be practically exploited by teachers and students for having a more interactive learning of data mining. Indeed, on top of the core didactic library, we designed a visual platform that allows online execution of experiments and the visualization of the algorithm steps. The visual platform abstracts the coding activity and makes available the execution of algorithms to non-technicians.
JF  - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
UR  - https://ieeexplore.ieee.org/document/8631453
ER  - 

TY  - RPRT
T1  - Local Rule-Based Explanations of Black Box Decision Systems
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Dino Pedreschi
A1  - Franco Turini
A1  - Fosca Giannotti
JF  - arXiv preprint arXiv:1805.10820
ER  - 

TY  - RPRT
T1  - Open the Black Box Data-Driven Explanation of Black Box Decision Systems
Y1  - 2018
A1  - Dino Pedreschi
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Luca Pappalardo
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - arXiv preprint arXiv:1806.09936
ER  - 

TY  - JOUR
T1  - Personalized Market Basket Prediction with Temporal Annotated Recurring Sequences
JF  - IEEE Transactions on Knowledge and Data Engineering
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Luca Pappalardo
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern Temporal Annotated Recurring Sequence (TARS) able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting the TBP the supermarket chains could crop tailored suggestions for each individual customer which in turn could effectively speed up their shopping sessions. A deep experimentation shows that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors.
UR  - https://ieeexplore.ieee.org/abstract/document/8477157
ER  - 

TY  - JOUR
T1  - A survey of methods for explaining black box models
JF  - ACM computing surveys (CSUR)
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Franco Turini
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
VL  - 51
UR  - https://dl.acm.org/doi/abs/10.1145/3236009
ER  - 

TY  - CONF
T1  - Clustering Individual Transactional Data for Masses of Users
T2  - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Mirco Nanni
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Mining a large number of datasets recording human activities for making sense of individual data is the key enabler of a new wave of personalized knowledge-based services. In this paper we focus on the problem of clustering individual transactional data for a large mass of users. Transactional data is a very pervasive kind of information that is collected by several services, often involving huge pools of users. We propose txmeans, a parameter-free clustering algorithm able to efficiently partition transactional data in a completely automatic way. Txmeans is designed for the case where clustering must be applied on a massive number of different datasets, for instance when a large set of users need to be analyzed individually and each of them has generated a long history of transactions. A deep experimentation on both real and synthetic datasets shows the practical effectiveness of txmeans for the mass clustering of different personal datasets, and suggests that txmeans outperforms existing methods in terms of quality and efficiency. Finally, we present a personal cart assistant application based on txmeans.
JF  - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB  - ACM
ER  - 

TY  - CONF
T1  - On the Equivalence Between Community Discovery and Clustering
T2  - International Conference on Smart Objects and Technologies for Social Good
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Michele Coscia
JF  - International Conference on Smart Objects and Technologies for Social Good
PB  - Springer, Cham
ER  - 

TY  - CONF
T1  - The Fractal Dimension of Music: Geography, Popularity and Sentiment Analysis
T2  - International Conference on Smart Objects and Technologies for Social Good
Y1  - 2017
A1  - Pollacci, Laura
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Nowadays there is a growing standardization of musical contents. Our finding comes out from a cross-service multi-level dataset analysis where we study how geography affects the music production. The investigation presented in this paper highlights the existence of a “fractal” musical structure that relates the technical characteristics of the music produced at regional, national and world level. Moreover, a similar structure emerges also when we analyze the musicians’ popularity and the polarity of their songs defined as the mood that they are able to convey. Furthermore, the clusters identified are markedly distinct one from another with respect to popularity and sentiment.
JF  - International Conference on Smart Objects and Technologies for Social Good
PB  - Springer, Cham
UR  - https://link.springer.com/chapter/10.1007/978-3-319-76111-4_19
ER  - 

TY  - JOUR
T1  - ICON Loop Carpooling Show Case
JF  - Data Mining and Constraint Programming: Foundations of a Cross-Disciplinary Approach
Y1  - 2017
A1  - Mirco Nanni
A1  - Lars Kotthoff
A1  - Riccardo Guidotti
A1  - Barry O'Sullivan
A1  - Dino Pedreschi
AB  - In this chapter we describe a proactive carpooling service that combines induction and optimization mechanisms to maximize the impact of carpooling within a community. The approach autonomously infers the mobility demand of the users through the analysis of their mobility traces (i.e. Data Mining of GPS trajectories) and builds the network of all possible ride sharing opportunities among the users. Then, the maximal set of carpooling matches that satisfy some standard requirements (maximal capacity of vehicles, etc.) is computed through Constraint Programming models, and the resulting matches are proactively proposed to the users. Finally, in order to maximize the expected impact of the service, the probability that each carpooling match is accepted by the users involved is inferred through Machine Learning mechanisms and put in the CP model. The whole process is reiterated at regular intervals, thus forming an instance of the general ICON loop.
VL  - 10101
UR  - https://link.springer.com/content/pdf/10.1007/978-3-319-50137-6.pdf#page=314
ER  - 

TY  - CONF
T1  - Market Basket Prediction using User-Centric Temporal Annotated Recurring Sequences
T2  - 2017 IEEE International Conference on Data Mining (ICDM)
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Luca Pappalardo
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer’s decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern named Temporal Annotated Recurring Sequence (TARS). We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer’s stocks and recommend the set of most necessary items. A deep experimentation shows that TARS can explain the customers’ purchase behavior, and that TBP outperforms the state-of-the-art competitors.
JF  - 2017 IEEE International Conference on Data Mining (ICDM)
PB  - IEEE
ER  - 

TY  - JOUR
T1  - MyWay: Location prediction via mobility profiling
JF  - Information Systems
Y1  - 2017
A1  - Roberto Trasarti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Fosca Giannotti
AB  - Forecasting the future positions of mobile users is a valuable task allowing us to operate efficiently a myriad of different applications which need this type of information. We propose MyWay, a prediction system which exploits the individual systematic behaviors modeled by mobility profiles to predict human movements. MyWay provides three strategies: the individual strategy uses only the user's individual mobility profile, the collective strategy takes advantage of all users' individual systematic behaviors, and the hybrid strategy is a combination of the previous two. A key point is that MyWay only requires the sharing of individual mobility profiles, a concise representation of the user's movements, instead of raw trajectory data revealing the detailed movement of the users. We evaluate the prediction performances of our proposal by a deep experimentation on large real-world data. The results highlight that the synergy between the individual and collective knowledge is the key for a better prediction and allows the system to outperform the state-of-the-art methods.
VL  - 64
ER  - 

TY  - JOUR
T1  - Never drive alone: Boosting carpooling with network analysis
JF  - Information Systems
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Mirco Nanni
A1  - S Rinzivillo
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - Carpooling, i.e., the act where two or more travelers share the same car for a common trip, is one of the possibilities brought forward to reduce traffic and its externalities, but experience shows that it is difficult to boost the adoption of carpooling to significant levels. In our study, we analyze the potential impact of carpooling as a collective phenomenon emerging from people's mobility, by network analytics. Based on big mobility data from travelers in a given territory, we construct the network of potential carpooling, where nodes correspond to the users and links to possible shared trips, and analyze the structural and topological properties of this network, such as network communities and node ranking, to the purpose of highlighting the subpopulations with higher chances to create a carpooling community, and the propensity of users to be either drivers or passengers in a shared car. Our study is anchored to reality thanks to a large mobility dataset, consisting of the complete one-month-long GPS trajectories of approx. 10% circulating cars in Tuscany. We also analyze the aggregated outcome of carpooling by means of empirical simulations, showing how an assignment policy exploiting the network analytic concepts of communities and node rankings minimizes the number of single occupancy vehicles observed after carpooling.
VL  - 64
ER  - 

TY  - JOUR
T1  - Next Basket Prediction using Recurring Sequential Patterns
JF  - arXiv preprint arXiv:1702.07158
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Luca Pappalardo
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - Nowadays, a hot challenge for supermarket chains is to offer personalized services for their customers. Next basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern Temporal Annotated Recurring Sequence (TARS) able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting the TBP the supermarket chains could crop tailored suggestions for each individual customer which in turn could effectively speed up their shopping sessions. A deep experimentation shows that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors.
UR  - https://arxiv.org/abs/1702.07158
ER  - 

TY  - CONF
T1  - Recognizing Residents and Tourists with Retail Data Using Shopping Profiles
T2  - International Conference on Smart Objects and Technologies for Social Good
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Lorenzo Gabrielli
AB  - The huge quantity of personal data stored by service providers registering customers’ daily life enables the analysis of individual fingerprints characterizing the customers’ behavioral profiles. We propose a framework for recognizing residents, tourists and occasional shoppers among the customers of a retail market chain. We employ our recognition framework on a real massive dataset containing the shopping transactions of more than one million customers, and we identify representative temporal shopping profiles for residents, tourists and occasional customers. Our experiments show that even though residents are about 33% of the customers they are responsible for more than 90% of the expenditure. We statistically validate the number of residents and tourists with national official statistics enabling in this way the adoption of our recognition framework for the development of novel services and analysis.
JF  - International Conference on Smart Objects and Technologies for Social Good
PB  - Springer
UR  - https://link.springer.com/chapter/10.1007/978-3-319-76111-4_35
ER  - 

TY  - CONF
T1  - There's A Path For Everyone: A Data-Driven Personal Model Reproducing Mobility Agendas
T2  - 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2017)
Y1  - 2017
A1  - Riccardo Guidotti
A1  - Roberto Trasarti
A1  - Mirco Nanni
A1  - Fosca Giannotti
A1  - Dino Pedreschi
JF  - 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2017)
PB  - IEEE
CY  - Tokyo
ER  - 

TY  - GEN
T1  - “Are we playing like Music-Stars?” Placing Emerging Artists on the Italian Music Scene
T2  - 9th International Workshop on Machine Learning and Music, ECML-PKDD
Y1  - 2016
A1  - Pollacci, Laura
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
AB  - The Italian emerging bands chase success on the footprint of popular artists by playing rhythmic danceable and happy songs. Our finding comes out from a deep study of the Italian music scene and how the new generation of musicians relate with the tradition of their country. By analyzing Spotify data we investigated the peculiarity of regional music and we placed emerging bands within the musical movements defined by already successful artists. The approach proposed and the results obtained are a first attempt to outline some rules suggesting how to reach the success in the musical Italian scene.
JF  - 9th International Workshop on Machine Learning and Music, ECML-PKDD
CY  - Riva del Garda
ER  - 

TY  - CONF
T1  - Audio Ergo Sum
T2  - Federation of International Conferences on Software Technologies: Applications and Foundations
Y1  - 2016
A1  - Riccardo Guidotti
A1  - Giulio Rossetti
A1  - Dino Pedreschi
AB  - Nobody can state “Rock is my favorite genre” or “David Bowie is my favorite artist”. We defined a Personal Listening Data Model able to capture musical preferences through indicators and patterns, and we discovered that we are all characterized by a limited set of musical preferences, but not by a unique predilection. The empowered capacity of mobile devices and their growing adoption in our everyday life is generating an enormous increment in the production of personal data such as calls, positioning, online purchases and even music listening. Musical listening is a type of data that has started receiving more attention from the scientific community as a consequence of the increasing availability of rich and punctual online data sources. Starting from the listening of 30k Last.Fm users, we show how the employment of the Personal Listening Data Models can provide higher levels of self-awareness. In addition, the proposed model will enable the development of a wide range of analysis and musical services both at personal and at collective level.
JF  - Federation of International Conferences on Software Technologies: Applications and Foundations
PB  - Springer
ER  - 

TY  - CHAP
T1  - Going Beyond GDP to Nowcast Well-Being Using Retail Market Data
T2  - Advances in Network Science
Y1  - 2016
A1  - Riccardo Guidotti
A1  - Michele Coscia
A1  - Dino Pedreschi
A1  - Diego Pennacchioli
AB  - One of the most used measures of the economic health of a nation is the Gross Domestic Product (GDP): the market value of all officially recognized final goods and services produced within a country in a given period of time. GDP, prosperity and well-being of the citizens of a country have been shown to be highly correlated. However, GDP is an imperfect measure in many respects. GDP usually takes a lot of time to be estimated and arguably the well-being of the people is not quantifiable simply by the market value of the products available to them. In this paper we use a quantification of the average sophistication of satisfied needs of a population as an alternative to GDP. We show that this quantification can be calculated more easily than GDP and it is a very promising predictor of the GDP value, anticipating its estimation by six months. The measure is arguably a more multifaceted evaluation of the well-being of the population, as it tells us more about how people are satisfying their needs. Our study is based on a large dataset of retail micro transactions happening across the Italian territory.
JF  - Advances in Network Science
PB  - Springer International Publishing
ER  - 

TY  - JOUR
T1  - A supervised approach for intra-/inter-community interaction prediction in dynamic social networks
JF  - Social Network Analysis and Mining
Y1  - 2016
A1  - Giulio Rossetti
A1  - Riccardo Guidotti
A1  - Ioanna Miliou
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - Due to the growing availability of Internet services in the last decade, interactions between people have become easier and easier to establish. For example, we can have an intercontinental job interview, or we can send real-time multimedia content to any of our friends who owns a smartphone. All these kinds of human activities generate digital footprints that describe complex, rapidly evolving network structures. In such a dynamic scenario, one of the most challenging tasks involves the prediction of future interactions between pairs of actors (i.e., users in online social networks, researchers in collaboration networks). In this paper, we approach this problem by leveraging network dynamics: to this end, we propose a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between node pairs. Moreover, since real social networks are generally composed of weakly connected modules, we instantiate the interaction prediction problem in two disjoint application scenarios: intra-community and inter-community link prediction. Experimental results on real time-stamped networks show that our approach is able to reach high accuracy. Furthermore, we analyze the performance of our methodology when varying the types of features, community discovery algorithms and forecast methods.
VL  - 6
UR  - http://dx.doi.org/10.1007/s13278-016-0397-y
ER  - 

TY  - JOUR
T1  - Unveiling mobility complexity through complex network analysis
JF  - Social Network Analysis and Mining
Y1  - 2016
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - S Rinzivillo
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - The availability of massive digital traces of individuals is offering a series of novel insights on the understanding of patterns characterizing human mobility. Many studies try to semantically enrich mobility data with annotations about human activities. However, these approaches either focus on places with high frequencies (e.g., home and work), or rely on background knowledge (e.g., publicly available points of interest). In this paper, we depart from the concept of frequency and we focus on a high level representation of mobility using network analytics. The visits of each driver to each systematic destination are modeled as links in a bipartite network where a set of nodes represents drivers and the other set represents places. We extract such network from two real datasets of human mobility based, respectively, on GPS and GSM data. We introduce the concept of mobility complexity of drivers and places as a ranking analysis over the nodes of these networks. In addition, by means of community discovery analysis, we differentiate subgroups of drivers and places according both to their homogeneity and to their mobility complexity.
VL  - 6
ER  - 

TY  - CHAP
T1  - Where Is My Next Friend? Recommending Enjoyable Profiles in Location Based Services
T2  - Complex Networks VII
Y1  - 2016
A1  - Riccardo Guidotti
A1  - Michele Berlingerio
AB  - How many of your friends, with whom you enjoy spending some time, live close by? How many people are at your reach, with whom you could have a nice conversation? We introduce a measure of enjoyability that may be the basis for a new class of location-based services aimed at maximizing the likelihood that two persons, or a group of people, would enjoy spending time together. Our enjoyability takes into account both topic similarity between two users and the users’ tendency to connect to people with similar or dissimilar interests. We computed the enjoyability on two datasets of geo-located tweets, and we reasoned about the applicability of the obtained results for producing friend recommendations. We aim at suggesting pairs of users who are not friends yet, but who are frequently co-located and maximize our enjoyability measure. By taking into account the spatial dimension, we show how 50% of users may find at least one enjoyable person within 10 km of their two most visited locations. Our results are encouraging, and open the way for a new class of recommender systems based on enjoyability.
JF  - Complex Networks VII
PB  - Springer International Publishing
ER  - 

TY  - CONF
T1  - Behavioral Entropy and Profitability in Retail
T2  - IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015)
Y1  - 2015
A1  - Riccardo Guidotti
A1  - Michele Coscia
A1  - Dino Pedreschi
A1  - Diego Pennacchioli
AB  - Human behavior is predictable in principle: people are systematic in their everyday choices. This predictability can be used to plan events and infrastructure, both for the public good and for private gains. In this paper we investigate the largely unexplored relationship between the systematic behavior of a customer and their profitability for a retail company. We estimate a customer’s behavioral entropy over two dimensions: the basket entropy is the variety of what customers buy, and the spatio-temporal entropy is the spatial and temporal variety of their shopping sessions. To estimate the basket and the spatio-temporal entropy we use data mining and information-theoretic techniques. We find that predictable systematic customers are more profitable for a supermarket: their average per capita expenditures are higher than those of non-systematic customers and they visit the shops more often. However, this higher individual profitability is masked by its overall level. The highly systematic customers are a minority of the customer set. As a consequence, the total amount of revenue they generate is small. We suggest that favoring a systematic behavior in their customers might be a good strategy for supermarkets to increase revenue. These results are based on data coming from a large Italian supermarket chain, including more than 50 thousand customers visiting 23 shops to purchase more than 80 thousand distinct products.
JF  - IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015)
PB  - IEEE
CY  - Paris
ER  - 

TY  - CONF
T1  - Find Your Way Back: Mobility Profile Mining with Constraints
T2  - Principles and Practice of Constraint Programming
Y1  - 2015
A1  - Lars Kotthoff
A1  - Mirco Nanni
A1  - Riccardo Guidotti
A1  - Barry O'Sullivan
AB  - Mobility profile mining is a data mining task that can be formulated as clustering over movement trajectory data. The main challenge is to separate the signal from the noise, i.e. one-off trips. We show that standard data mining approaches suffer the important drawback that they cannot take the symmetry of non-noise trajectories into account. That is, if a trajectory has a symmetric equivalent that covers the same trip in the reverse direction, it should become more likely that neither of them is labelled as noise. We present a constraint model that takes this knowledge into account to produce better clusters. We show the efficacy of our approach on real-world data that was previously processed using standard data mining techniques.
JF  - Principles and Practice of Constraint Programming
PB  - Springer International Publishing
CY  - Cork
ER  - 

TY  - CONF
T1  - Interaction Prediction in Dynamic Networks exploiting Community Discovery
T2  - International conference on Advances in Social Network Analysis and Mining, ASONAM 2015
Y1  - 2015
A1  - Giulio Rossetti
A1  - Riccardo Guidotti
A1  - Diego Pennacchioli
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - Due to the growing availability of online social services, interactions between people have become increasingly easy to establish and track. Online social human activities generate digital footprints that describe complex, rapidly evolving, dynamic networks. In such a scenario, one of the most challenging tasks to address involves the prediction of future interactions between pairs of actors. In this study, we want to leverage network dynamics and community structure to predict which future interactions are more likely to appear. To this end, we propose a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between pairs of nodes belonging to the same community. Our experiments on real dynamic networks show that the designed analytical process is able to achieve interesting results.
JF  - International conference on Advances in Social Network Analysis and Mining, ASONAM 2015
PB  - IEEE
CY  - Paris, France
SN  - 978-1-4503-3854-7
UR  - http://dl.acm.org/citation.cfm?doid=2808797.2809401
ER  - 

TY  - CONF
T1  - Managing travels with PETRA: The Rome use case
T2  - 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW)
Y1  - 2015
A1  - Botea, Adi
A1  - Braghin, Stefano
A1  - Lopes, Nuno
A1  - Riccardo Guidotti
A1  - Francesca Pratesi
AB  - The aim of the PETRA project is to provide the basis for a city-wide transportation system that supports policies catering for both individual preferences of users and city-wide travel patterns. The PETRA platform will be initially deployed in the partner city of Rome, and later in Venice and Tel-Aviv.
JF  - 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW)
PB  - IEEE
ER  - 

TY  - CONF
T1  - Mobility Mining for Journey Planning in Rome
T2  - Machine Learning and Knowledge Discovery in Databases
Y1  - 2015
A1  - Michele Berlingerio
A1  - Bicer, Veli
A1  - Botea, Adi
A1  - Braghin, Stefano
A1  - Lopes, Nuno
A1  - Riccardo Guidotti
A1  - Francesca Pratesi
AB  - We present recent results on integrating private car GPS routines obtained by a Data Mining module into the PETRA (PErsonal TRansport Advisor) platform. The routines are used as additional “bus lines”, available to provide a ride to travelers. We present the effects of querying the planner with and without the routines, which show how Data Mining may help Smarter Cities applications.
JF  - Machine Learning and Knowledge Discovery in Databases
PB  - Springer International Publishing
ER  - 

TY  - CONF
T1  - Social or green? A data-driven approach for more enjoyable carpooling
T2  - Intelligent Transportation Systems (ITSC), 2015 IEEE 18th International Conference on
Y1  - 2015
A1  - Riccardo Guidotti
A1  - Sassi, Andrea
A1  - Michele Berlingerio
A1  - Pascale, Alessandra
A1  - Ghaddar, Bissan
JF  - Intelligent Transportation Systems (ITSC), 2015 IEEE 18th International Conference on
PB  - IEEE
ER  - 

TY  - CONF
T1  - TOSCA: two-steps clustering algorithm for personal locations detection
T2  - Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015
Y1  - 2015
A1  - Riccardo Guidotti
A1  - Roberto Trasarti
A1  - Mirco Nanni
JF  - Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, November 3-6, 2015
UR  - http://doi.acm.org/10.1145/2820783.2820818
ER  - 

TY  - CHAP
T1  - Towards a Boosted Route Planner Using Individual Mobility Models
T2  - Software Engineering and Formal Methods
Y1  - 2015
A1  - Riccardo Guidotti
A1  - Paolo Cintia
JF  - Software Engineering and Formal Methods
PB  - Springer Berlin Heidelberg
ER  - 

TY  - CONF
T1  - Towards user-centric data management: individual mobility analytics for collective services
T2  - Proceedings of the 4th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, MobiGIS 2015, Bellevue, WA, USA, November 3-6, 2015
Y1  - 2015
A1  - Riccardo Guidotti
A1  - Roberto Trasarti
A1  - Mirco Nanni
A1  - Fosca Giannotti
JF  - Proceedings of the 4th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, MobiGIS 2015, Bellevue, WA, USA, November 3-6, 2015
UR  - http://doi.acm.org/10.1145/2834126.2834132
ER  - 

TY  - CHAP
T1  - Retrieving Points of Interest from Human Systematic Movements
T2  - Software Engineering and Formal Methods
Y1  - 2014
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - S Rinzivillo
A1  - Dino Pedreschi
A1  - Fosca Giannotti
AB  - Human mobility analysis is emerging as an increasingly fundamental task for deeply understanding human behavior. In the last decade these kinds of studies have become feasible thanks to the massive increase in availability of mobility data. A crucial point, for many mobility applications and analyses, is to extract interesting locations for people. In this paper, we propose a novel methodology to efficiently retrieve significant places of interest from movement data. Using car drivers’ systematic movements we mine everyday interesting locations, that is, places around which people's lives gravitate. The outcomes provide empirical evidence that these places capture nearly the whole mobility even though they are generated only from abstractions of systematic movements.
JF  - Software Engineering and Formal Methods
PB  - Springer International Publishing
ER  - 

TY  - THES
T1  - Mobility Ranking - Human Mobility Analysis Using Ranking Measures
Y1  - 2013
A1  - Riccardo Guidotti
ER  -