%0 Journal Article %J CoRR %D 2021 %T Benchmarking and Survey of Explanation Methods for Black Box Models %A Francesco Bodria %A Fosca Giannotti %A Riccardo Guidotti %A Francesca Naretto %A Dino Pedreschi %A S Rinzivillo %B CoRR %V abs/2102.13076 %G eng %U https://arxiv.org/abs/2102.13076 %0 Journal Article %D 2021 %T Give more data, awareness and control to individual citizens, and they will help COVID-19 containment %A Mirco Nanni %A Andrienko, Gennady %A Barabasi, Albert-Laszlo %A Boldrini, Chiara %A Bonchi, Francesco %A Cattuto, Ciro %A Chiaromonte, Francesca %A Comandé, Giovanni %A Conti, Marco %A Coté, Mark %A Dignum, Frank %A Dignum, Virginia %A Domingo-Ferrer, Josep %A Ferragina, Paolo %A Fosca Giannotti %A Riccardo Guidotti %A Helbing, Dirk %A Kaski, Kimmo %A Kertész, János %A Lehmann, Sune %A Lepri, Bruno %A Lukowicz, Paul %A Matwin, Stan %A Jiménez, David Megías %A Anna Monreale %A Morik, Katharina %A Oliver, Nuria %A Passarella, Andrea %A Passerini, Andrea %A Dino Pedreschi %A Pentland, Alex %A Pianesi, Fabio %A Francesca Pratesi %A S Rinzivillo %A Salvatore Ruggieri %A Siebes, Arno %A Torra, Vicenc %A Roberto Trasarti %A Hoven, Jeroen van den %A Vespignani, Alessandro %X The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. 
We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates—if and when they want and for specific aims—with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society. %8 2021/02/02 %@ 1572-8439 %G eng %U https://link.springer.com/article/10.1007/s10676-020-09572-w %! Ethics and Information Technology %R https://doi.org/10.1007/s10676-020-09572-w %0 Journal Article %D 2021 %T GLocalX - From Local to Global Explanations of Black Box AI Models %A Mattia Setzu %A Riccardo Guidotti %A Anna Monreale %A Franco Turini %A Dino Pedreschi %A Fosca Giannotti %X Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. 
In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models often are “black boxes” which we are not able to understand. Relying on these models has a multifaceted impact and raises significant concerns about their transparency. Applications in sensitive and critical domains are a strong motivational factor in trying to understand the behavior of black boxes. We propose to address this issue by providing an interpretable layer on top of black box models by aggregating “local” explanations. We present GLocalX, a “local-first” model agnostic explanation method. Starting from local explanations expressed in form of local decision rules, GLocalX iteratively generalizes them into global explanations by hierarchically aggregating them. Our goal is to learn accurate yet simple interpretable models to emulate the given black box, and, if possible, replace it entirely. We validate GLocalX in a set of experiments in standard and constrained settings with limited or no access to either data or local explanations. Experiments show that GLocalX is able to accurately emulate several models with simple and small models, reaching state-of-the-art performance against natively global solutions. Our findings show how it is often possible to achieve a high level of both accuracy and comprehensibility of classification models, even in complex domains with high-dimensional data, without necessarily trading one property for the other. This is a key requirement for a trustworthy AI, necessary for adoption in high-stakes decision making applications. %V 294 %P 103457 %8 2021/05/01/ %@ 0004-3702 %G eng %U https://www.sciencedirect.com/science/article/pii/S0004370221000084 %! 
Artificial Intelligence %R https://doi.org/10.1016/j.artint.2021.103457 %0 Journal Article %J IEEE Transactions on Dependable and Secure Computing %D 2020 %T Authenticated Outlier Mining for Outsourced Databases %A Dong, Boxiang %A Wang, Hui %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %A Guo, Wenge %X The Data-Mining-as-a-Service (DMaS) paradigm is becoming the focus of research, as it allows the data owner (client) who lacks expertise and/or computational resources to outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises some issues about result integrity: how could the client verify the mining results returned by the server are both sound and complete? In this paper, we focus on outlier mining, an important mining task. Previous verification techniques use an authenticated data structure (ADS) for correctness authentication, which may incur much space and communication cost. In this paper, we propose a novel solution that returns a probabilistic result integrity guarantee with much cheaper verification cost. The key idea is to insert a set of artificial records (ARs) into the dataset, from which it constructs a set of artificial outliers (AOs) and artificial non-outliers (ANOs). The AOs and ANOs are used by the client to detect any incomplete and/or incorrect mining results with a probabilistic guarantee. The main challenge that we address is how to construct ARs so that they do not change the (non-)outlierness of original records, while guaranteeing that the client can identify ANOs and AOs without executing mining. Furthermore, we build a strategic game and show that a Nash equilibrium exists only when the server returns correct outliers. Our implementation and experiments demonstrate that our verification solution is efficient and lightweight. 
%B IEEE Transactions on Dependable and Secure Computing %V 17 %P 222 - 235 %8 Jan-03-2020 %G eng %U https://ieeexplore.ieee.org/document/8048342/ %! IEEE Trans. Dependable and Secure Comput. %R 10.1109/TDSC.2017.2754493 %0 Conference Paper %B International Symposium on Intelligent Data Analysis %D 2020 %T Digital Footprints of International Migration on Twitter %A Jisu Kim %A Alina Sirbu %A Fosca Giannotti %A Lorenzo Gabrielli %X Studying migration using traditional data has some limitations. To date, there have been several studies proposing innovative methodologies to measure migration stocks and flows from social big data. Nevertheless, a uniform definition of a migrant is difficult to find as it varies from one work to another depending on the purpose of the study and nature of the dataset used. In this work, a generic methodology is developed to identify migrants within the Twitter population. This describes a migrant as a person who has the current residence different from the nationality. The residence is defined as the location where a user spends most of his/her time in a certain year. The nationality is inferred from linguistic and social connections to a migrant’s country of origin. This methodology is validated first with an internal gold standard dataset and second with two official statistics, and shows strong performance scores and correlation coefficients. Our method has the advantage that it can identify both immigrants and emigrants, regardless of the origin/destination countries. The new methodology can be used to study various aspects of migration, including opinions, integration, attachment, stocks and flows, motivations for migration, etc. 
Here, we exemplify how trending topics across and throughout different migrant communities can be observed. %B International Symposium on Intelligent Data Analysis %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-030-44584-3_22 %R https://doi.org/10.1007/978-3-030-44584-3_22 %0 Conference Paper %B 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) %D 2020 %T Estimating countries’ peace index through the lens of the world news as monitored by GDELT %A V. Voukelatou %A Luca Pappalardo %A Lorenzo Gabrielli %A Fosca Giannotti %X Peacefulness is a principal dimension of well-being, and its measurement has lately drawn the attention of researchers and policy-makers. During the last years, novel digital data streams have drastically changed research in this field. In the current study, we exploit information extracted from Global Data on Events, Location, and Tone (GDELT) digital news database, to capture peacefulness through the Global Peace Index (GPI). Applying machine learning techniques, we demonstrate that news media attention, sentiment, and social stability from GDELT can be used as proxies for measuring GPI at a monthly level. Additionally, through the variable importance analysis, we show that each country's socio-economic, political, and military profile emerges. This could bring added value to researchers interested in "Data Science for Social Good", to policy-makers, and peacekeeping organizations since they could monitor peacefulness almost real-time, and therefore facilitate timely and more efficient policy-making. 
%B 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) %8 2020 %G eng %U https://ieeexplore.ieee.org/abstract/document/9260052 %R https://doi.org/10.1109/DSAA49011.2020.00034 %0 Journal Article %J International Journal of Data Science and Analytics %D 2020 %T Human migration: the big data perspective %A Alina Sirbu %A Andrienko, Gennady %A Andrienko, Natalia %A Boldrini, Chiara %A Conti, Marco %A Fosca Giannotti %A Riccardo Guidotti %A Bertoli, Simone %A Jisu Kim %A Muntean, Cristina Ioana %A Luca Pappalardo %A Passarella, Andrea %A Dino Pedreschi %A Pollacci, Laura %A Francesca Pratesi %A Sharma, Rajesh %X How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants. %B International Journal of Data Science and Analytics %P 1–20 %8 2020/03/23 %@ 2364-4168 %G eng %U https://link.springer.com/article/10.1007%2Fs41060-020-00213-5 %! 
International Journal of Data Science and Analytics %R https://doi.org/10.1007/s41060-020-00213-5 %0 Generic %D 2020 %T Mobile phone data analytics against the COVID-19 epidemics in Italy: flow diversity and local job markets during the national lockdown %A Pietro Bonato %A Paolo Cintia %A Francesco Fabbri %A Daniele Fadda %A Fosca Giannotti %A Pier Luigi Lopalco %A Sara Mazzilli %A Mirco Nanni %A Luca Pappalardo %A Dino Pedreschi %A Francesco Penone %A S Rinzivillo %A Giulio Rossetti %A Marcello Savarese %A Lara Tavoschi %X Understanding collective mobility patterns is crucial to plan the restart of production and economic activities, which are currently put in stand-by to fight the diffusion of the epidemics. In this report, we use mobile phone data to infer the movements of people between Italian provinces and municipalities, and we analyze the incoming, outgoing and internal mobility flows before and during the national lockdown (March 9th, 2020) and after the closure of non-necessary productive and economic activities (March 23rd, 2020). The population flows across provinces and municipalities enable the modelling of a risk index tailored for the mobility of each municipality or province. Such an index would be a useful indicator to drive counter-measures in reaction to a sudden reactivation of the epidemics. Mobile phone data, even when aggregated to preserve the privacy of individuals, are a useful data source to track the evolution in time of human mobility, hence allowing for monitoring the effectiveness of control measures such as physical distancing. We address the following analytical questions: How does the mobility structure of a territory change? Do incoming and outgoing flows become more predictable during the lockdown, and what are the differences between weekdays and weekends? Can we detect proper local job markets based on human mobility flows, to eventually shape the borders of a local outbreak? 
%G eng %U https://arxiv.org/abs/2004.11278 %R https://dx.doi.org/10.32079/ISTI-TR-2020/005 %0 Conference Paper %B ECML PKDD 2020 Workshops %D 2020 %T Prediction and Explanation of Privacy Risk on Mobility Data with Neural Networks %A Francesca Naretto %A Roberto Pellungrini %A Nardini, Franco Maria %A Fosca Giannotti %E Koprinska, Irena %E Kamp, Michael %E Appice, Annalisa %E Loglisci, Corrado %E Antonie, Luiza %E Zimmermann, Albrecht %E Riccardo Guidotti %E Özgöbek, Özlem %E Ribeiro, Rita P. %E Gavaldà, Ricard %E Gama, João %E Adilova, Linara %E Krishnamurthy, Yamuna %E Ferreira, Pedro M. %E Malerba, Donato %E Medeiros, Ibéria %E Ceci, Michelangelo %E Manco, Giuseppe %E Masciari, Elio %E Ras, Zbigniew W. %E Christen, Peter %E Ntoutsi, Eirini %E Schubert, Erich %E Zimek, Arthur %E Anna Monreale %E Biecek, Przemyslaw %E S Rinzivillo %E Kille, Benjamin %E Lommatzsch, Andreas %E Gulla, Jon Atle %X The analysis of privacy risk for mobility data is a fundamental part of any privacy-aware process based on such data. Mobility data are highly sensitive. Therefore, the correct identification of the privacy risk before releasing the data to the public is of utmost importance. However, existing privacy risk assessment frameworks have high computational complexity. To tackle these issues, some recent work proposed a solution based on classification approaches to predict privacy risk using mobility features extracted from the data. In this paper, we propose an improvement of this approach by applying long short-term memory (LSTM) neural networks to predict the privacy risk directly from original mobility data. We empirically evaluate privacy risk on real data by applying our LSTM-based approach. Results show that our proposed method based on a LSTM network is effective in predicting the privacy risk with results in terms of F1 of up to 0.91. Moreover, to explain the predictions of our model, we employ a state-of-the-art explanation algorithm, Shap. 
We explore the resulting explanation, showing how it is possible to provide effective predictions while explaining them to the end-user. %B ECML PKDD 2020 Workshops %I Springer International Publishing %C Cham %8 2020// %@ 978-3-030-65965-3 %G eng %U https://link.springer.com/chapter/10.1007/978-3-030-65965-3_34 %R https://doi.org/10.1007/978-3-030-65965-3_34 %0 Journal Article %D 2020 %T PRIMULE: Privacy risk mitigation for user profiles %A Francesca Pratesi %A Lorenzo Gabrielli %A Paolo Cintia %A Anna Monreale %A Fosca Giannotti %X The availability of mobile phone data has encouraged the development of different data-driven tools, supporting social science studies and providing new data sources to the standard official statistics. However, this particular kind of data are subject to privacy concerns because they can enable the inference of personal and private information. In this paper, we address the privacy issues related to the sharing of user profiles, derived from mobile phone data, by proposing PRIMULE, a privacy risk mitigation strategy. Such a method relies on PRUDEnce (Pratesi et al., 2018), a privacy risk assessment framework that provides a methodology for systematically identifying risky-users in a set of data. An extensive experimentation on real-world data shows the effectiveness of PRIMULE strategy in terms of both quality of mobile user profiles and utility of these profiles for analytical services such as the Sociometer (Furletti et al., 2013), a data mining tool for city users classification. %V 125 %P 101786 %8 2020/01/01/ %@ 0169-023X %G eng %U https://www.sciencedirect.com/science/article/pii/S0169023X18305342 %! 
Data & Knowledge Engineering %R https://doi.org/10.1016/j.datak.2019.101786 %0 Journal Article %J arXiv preprint arXiv:2006.03141 %D 2020 %T The relationship between human mobility and viral transmissibility during the COVID-19 epidemics in Italy %A Paolo Cintia %A Daniele Fadda %A Fosca Giannotti %A Luca Pappalardo %A Giulio Rossetti %A Dino Pedreschi %A S Rinzivillo %A Bonato, Pietro %A Fabbri, Francesco %A Penone, Francesco %A Savarese, Marcello %A Checchi, Daniele %A Chiaromonte, Francesca %A Vineis, Paolo %A Guzzetta, Giorgio %A Riccardo, Flavia %A Marziano, Valentina %A Poletti, Piero %A Trentini, Filippo %A Bella, Antonio %A Andrianou, Xanthi %A Del Manso, Martina %A Fabiani, Massimo %A Bellino, Stefania %A Boros, Stefano %A Mateo Urdiales, Alberto %A Vescio, Maria Fenicia %A Brusaferro, Silvio %A Rezza, Giovanni %A Pezzotti, Patrizio %A Ajelli, Marco %A Merler, Stefano %X We describe in this report our studies to understand the relationship between human mobility and the spreading of COVID-19, as an aid to manage the restart of the social and economic activities after the lockdown and monitor the epidemics in the coming weeks and months. We compare the evolution (from January to May 2020) of the daily mobility flows in Italy, measured by means of nation-wide mobile phone data, and the evolution of transmissibility, measured by the net reproduction number, i.e., the mean number of secondary infections generated by one primary infector in the presence of control interventions and human behavioural adaptations. We find a striking relationship between the negative variation of mobility flows and the net reproduction number, in all Italian regions, between March 11th and March 18th, when the country entered the lockdown. This observation allows us to quantify the time needed to "switch off" the country mobility (one week) and the time required to bring the net reproduction number below 1 (one week). 
A reasonably simple regression model provides evidence that the net reproduction number is correlated with a region's incoming, outgoing and internal mobility. We also find a strong relationship between the number of days above the epidemic threshold before the mobility flows reduce significantly as an effect of lockdowns, and the total number of confirmed SARS-CoV-2 infections per 100k inhabitants, thus indirectly showing the effectiveness of the lockdown and the other non-pharmaceutical interventions in the containment of the contagion. Our study demonstrates the value of "big" mobility data to the monitoring of key epidemic indicators to inform choices as the epidemics unfolds in the coming months. %B arXiv preprint arXiv:2006.03141 %G eng %U https://arxiv.org/abs/2006.03141 %0 Journal Article %J International Journal of Data Science and Analytics %D 2020 %T (So) Big Data and the transformation of the city %A Andrienko, Gennady %A Andrienko, Natalia %A Boldrini, Chiara %A Caldarelli, Guido %A Paolo Cintia %A Cresci, Stefano %A Facchini, Angelo %A Fosca Giannotti %A Gionis, Aristides %A Riccardo Guidotti %A others %X The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. 
The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the “City of Citizens” thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality. %B International Journal of Data Science and Analytics %G eng %U https://link.springer.com/article/10.1007/s41060-020-00207-3 %R https://doi.org/10.1007/s41060-020-00207-3 %0 Journal Article %J PloS one %D 2019 %T Algorithmic bias amplifies opinion fragmentation and polarization: A bounded confidence model %A Alina Sirbu %A Dino Pedreschi %A Fosca Giannotti %A Kertész, János %X The flow of information reaching us via the online media platforms is optimized not by the information content or relevance but by popularity and proximity to the target. This is typically performed in order to maximise platform usage. As a side effect, this introduces an algorithmic bias that is believed to enhance fragmentation and polarization of the societal debate. To study this phenomenon, we modify the well-known continuous opinion dynamics model of bounded confidence in order to account for the algorithmic bias and investigate its consequences. In the simplest version of the original model the pairs of discussion participants are chosen at random and their opinions get closer to each other if they are within a fixed tolerance level. 
We modify the selection rule of the discussion partners: there is an enhanced probability to choose individuals whose opinions are already close to each other, thus mimicking the behavior of online media which suggest interaction with similar peers. As a result we observe: a) an increased tendency towards opinion fragmentation, which emerges also in conditions where the original model would predict consensus, b) increased polarisation of opinions and c) a dramatic slowing down of the speed at which the convergence at the asymptotic state is reached, which makes the system highly unstable. Fragmentation and polarization are augmented by a fragmented initial population. %B PloS one %V 14 %P e0213246 %G eng %U https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213246 %R 10.1371/journal.pone.0213246 %0 Journal Article %J IEEE Intelligent Systems %D 2019 %T Factual and Counterfactual Explanations for Black Box Decision Making %A Riccardo Guidotti %A Anna Monreale %A Fosca Giannotti %A Dino Pedreschi %A Salvatore Ruggieri %A Franco Turini %X The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of artificial intelligence (AI) in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method, providing faithful explanations of the decision made by a black box classifier on a specific instance. The proposed method first learns an interpretable, local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. Then, it derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons of the decision, and a set of counterfactuals, suggesting the changes in the instance features that would lead to a different outcome. 
Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and of the accuracy in mimicking the black box. %B IEEE Intelligent Systems %G eng %U https://ieeexplore.ieee.org/abstract/document/8920138 %R 10.1109/MIS.2019.2957223 %0 Conference Paper %B Proceedings of the AAAI Conference on Artificial Intelligence %D 2019 %T Meaningful explanations of Black Box AI decision systems %A Dino Pedreschi %A Fosca Giannotti %A Riccardo Guidotti %A Anna Monreale %A Salvatore Ruggieri %A Franco Turini %X Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box. 
%B Proceedings of the AAAI Conference on Artificial Intelligence %G eng %U https://aaai.org/ojs/index.php/AAAI/article/view/5050 %R 10.1609/aaai.v33i01.33019780 %0 Journal Article %J ACM Transactions on Intelligent Systems and Technology (TIST) %D 2019 %T PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach %A Luca Pappalardo %A Paolo Cintia %A Ferragina, Paolo %A Massucco, Emanuele %A Dino Pedreschi %A Fosca Giannotti %X The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank—i.e. searching players and player versatility—showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics. 
%B ACM Transactions on Intelligent Systems and Technology (TIST) %V 10 %P 1–27 %G eng %U https://dl.acm.org/doi/abs/10.1145/3343172 %R 10.1145/3343172 %0 Journal Article %J Scientific data %D 2019 %T A public data set of spatio-temporal match events in soccer competitions %A Luca Pappalardo %A Paolo Cintia %A Alessio Rossi %A Massucco, Emanuele %A Ferragina, Paolo %A Dino Pedreschi %A Fosca Giannotti %X Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provide an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, both at individual and at collective level, and the determinants of success and failure. 
%B Scientific data %V 6 %P 1–15 %G eng %U https://www.nature.com/articles/s41597-019-0247-7 %R 10.1038/s41597-019-0247-7 %0 Journal Article %J ERCIM News %D 2019 %T Public opinion and Algorithmic bias %A Alina Sirbu %A Fosca Giannotti %A Dino Pedreschi %A Kertész, János %B ERCIM News %G eng %U https://ercim-news.ercim.eu/en116/special/public-opinion-and-algorithmic-bias %0 Journal Article %J ERCIM News %D 2019 %T Transparency in Algorithmic Decision Making %A Andreas Rauber %A Roberto Trasarti %A Fosca Giannotti %B ERCIM News %G eng %U https://ercim-news.ercim.eu/en116/special/transparency-in-algorithmic-decision-making-introduction-to-the-special-theme %0 Journal Article %J Applied network science %D 2018 %T Active and passive diffusion processes in complex networks %A Letizia Milli %A Giulio Rossetti %A Dino Pedreschi %A Fosca Giannotti %X Ideas, information, viruses: all of them, with their mechanisms, spread over the complex social tissues described by our interpersonal relations. Usually, general mathematical models are used to simulate and understand the unfolding of such complex phenomena; these models act agnostically from the object of which they simulate the diffusion, thus considering spreading of virus, ideas and innovations alike. Indeed, such degree of abstraction makes it easier to define a standard set of tools that can be applied to heterogeneous contexts; however, it can also lead to biased, incorrect, simulation outcomes. In this work we introduce the concepts of active and passive diffusion to discriminate the degree in which individuals' choices affect the overall spreading of content over a social graph. Moving from the analysis of a well-known passive diffusion schema, the Threshold model (that can be used to model peer-pressure related processes), we introduce two novel approaches whose aim is to provide active and mixed schemas applicable in the context of innovations/ideas diffusion simulation. 
Our analysis, performed on both synthetic and real-world data, underlines that the adoption of exclusively passive/active models leads to conflicting results, thus highlighting the need for mixed approaches to better capture the real complexity of the simulated system. %B Applied network science %V 3 %P 42 %G eng %U https://link.springer.com/article/10.1007/s41109-018-0100-5 %R https://doi.org/10.1007/s41109-018-0100-5 %0 Conference Paper %B International Conference on Complex Networks CompleNet %D 2018 %T Diffusive Phenomena in Dynamic Networks: a data-driven study %A Letizia Milli %A Giulio Rossetti %A Dino Pedreschi %A Fosca Giannotti %X Every day, ideas, information as well as viruses spread over complex social tissues described by our interpersonal relations. So far, the network contexts upon which diffusive phenomena unfold have usually been considered static, composed of a fixed set of nodes and edges. Recent studies describe social networks as rapidly changing topologies. In this work – following a data-driven approach – we compare the behaviors of classical spreading models when used to analyze a given social network whose topological dynamics are observed at different temporal-granularities. Our goal is to shed some light on the impacts that the adoption of a static topology has on spreading simulations as well as to provide an alternative formulation of two classical diffusion models. 
%B International Conference on Complex Networks CompleNet %I Springer %C Boston March 5-8 2018 %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-73198-8_13 %R 10.1007/978-3-319-73198-8_13 %0 Conference Paper %B International Workshop on Complex Networks %D 2018 %T Discovering Mobility Functional Areas: A Mobility Data Analysis Approach %A Lorenzo Gabrielli %A Daniele Fadda %A Giulio Rossetti %A Mirco Nanni %A Piccinini, Leonardo %A Dino Pedreschi %A Fosca Giannotti %A Patrizia Lattarulo %X How do we measure the borders of urban areas and therefore decide which are the functional units of the territory? Nowadays, we typically do that just looking at census data, while in this work we aim to identify functional areas for mobility in a completely data-driven way. Our solution makes use of human mobility data (vehicle trajectories) and consists in an agglomerative process which gradually groups together those municipalities that maximize internal vehicular traffic while minimizing external one. The approach is tested against a dataset of trips involving individuals of an Italian Region, obtaining a new territorial division which allows us to identify mobility attractors. Leveraging such partitioning and external knowledge, we show that our method outperforms the state-of-the-art algorithms. Indeed, the outcome of our approach is of great value to public administrations for creating synergies within the aggregations of the territories obtained. %B International Workshop on Complex Networks %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-73198-8_27 %R 10.1007/978-3-319-73198-8_27 %0 Journal Article %J EPJ Data Science %D 2018 %T Discovering temporal regularities in retail customers’ shopping behavior %A Riccardo Guidotti %A Lorenzo Gabrielli %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %X In this paper we investigate the regularities characterizing the temporal purchasing behavior of the customers of a retail market chain. 
Most of the literature studying purchasing behavior focuses on what customers buy while giving little importance to the temporal dimension. As a consequence, the state of the art does not allow capturing the temporal purchasing patterns of each customer. These patterns should describe the customer’s temporal habits highlighting when she typically makes a purchase in correlation with information about the amount of expenditure, number of purchased items and other similar aggregates. This knowledge could be exploited for different purposes: set temporal discounts for making the purchases of customers more regular with respect to time, set personalized discounts in the day and time window preferred by the customer, provide recommendations for shopping time schedule, etc. To this aim, we introduce a framework for extracting from personal retail data a temporal purchasing profile able to summarize whether and when a customer makes her distinctive purchases. The individual profile describes a set of regular and characterizing shopping behavioral patterns, and the sequences in which these patterns take place. We show how to compare different customers by providing a collective perspective to their individual profiles, and how to group the customers with respect to these comparable profiles. By analyzing real datasets containing millions of shopping sessions we found that there is a limited number of patterns summarizing the temporal purchasing behavior of all the customers, and that they are sequentially followed in a finite number of ways. Moreover, we recognized regular customers characterized by a small number of temporal purchasing behaviors, and changing customers characterized by various types of temporal purchasing behaviors. Finally, we discuss how the profiles can be exploited both by customers to enable personalized services, and by the retail market chain for providing tailored discounts based on temporal purchasing regularity. 
%B EPJ Data Science %V 7 %P 6 %8 01/2018 %G eng %U https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-018-0133-0 %R 10.1140/epjds/s13688-018-0133-0 %0 Conference Paper %B International Conference on Smart Objects and Technologies for Social Good %D 2018 %T The Fractal Dimension of Music: Geography, Popularity and Sentiment Analysis %A Pollacci, Laura %A Riccardo Guidotti %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %X Nowadays there is a growing standardization of musical contents. Our finding comes out from a cross-service multi-level dataset analysis where we study how geography affects the music production. The investigation presented in this paper highlights the existence of a “fractal” musical structure that relates the technical characteristics of the music produced at regional, national and world level. Moreover, a similar structure emerges also when we analyze the musicians’ popularity and the polarity of their songs defined as the mood that they are able to convey. Furthermore, the clusters identified are markedly distinct one from another with respect to popularity and sentiment. %B International Conference on Smart Objects and Technologies for Social Good %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-76111-4_19 %R 10.1007/978-3-319-76111-4_19 %0 Book Section %B A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years %D 2018 %T How Data Mining and Machine Learning Evolved from Relational Data Base to Data Science %A Amato, G. %A Candela, L. %A Castelli, D. %A Esuli, A. %A Falchi, F. %A Gennaro, C. %A Fosca Giannotti %A Anna Monreale %A Mirco Nanni %A Pagano, P. %A Luca Pappalardo %A Dino Pedreschi %A Francesca Pratesi %A Rabitti, F. %A S Rinzivillo %A Giulio Rossetti %A Salvatore Ruggieri %A Sebastiani, F. %A Tesconi, M. 
%E Flesca, Sergio %E Greco, Sergio %E Masciari, Elio %E Saccà, Domenico %X During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. %B A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years %I Springer International Publishing %C Cham %P 287 - 306 %@ 978-3-319-61893-7 %G eng %U https://link.springer.com/chapter/10.1007%2F978-3-319-61893-7_17 %R https://doi.org/10.1007/978-3-319-61893-7_17 %0 Journal Article %J Multimedia Tools and Applications %D 2018 %T The italian music superdiversity %A Pollacci, Laura %A Riccardo Guidotti %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %X Globalization can lead to a growing standardization of musical contents. Using a cross-service multi-level dataset we investigate the actual Italian music scene. The investigation highlights the musical Italian superdiversity both individually analyzing the geographical and lexical dimensions and combining them. Using different kinds of features over the geographical dimension leads to two similar, comparable and coherent results, confirming the strong and essential correlation between melodies and lyrics. The profiles identified are markedly distinct one from another with respect to sentiment, lexicon, and melodic features. Through a novel application of a sentiment spreading algorithm and songs’ melodic features, we are able to highlight discriminant characteristics that violate the standard regional political boundaries, reconfiguring them following the actual musical communicative practices. 
%B Multimedia Tools and Applications %P 1–23 %G eng %U https://link.springer.com/article/10.1007/s11042-018-6511-6 %R 10.1007/s11042-018-6511-6 %0 Report %D 2018 %T Local Rule-Based Explanations of Black Box Decision Systems %A Riccardo Guidotti %A Anna Monreale %A Salvatore Ruggieri %A Dino Pedreschi %A Franco Turini %A Fosca Giannotti %B arXiv preprint arXiv:1805.10820 %G eng %0 Journal Article %J International Journal of Data Science and Analytics %D 2018 %T NDlib: a python library to model and analyze diffusion processes over complex networks %A Giulio Rossetti %A Letizia Milli %A S Rinzivillo %A Alina Sirbu %A Dino Pedreschi %A Fosca Giannotti %X Nowadays the analysis of dynamics of and on networks represents a hot topic in the social network analysis playground. To support students, teachers, developers and researchers, in this work we introduce a novel framework, namely NDlib, an environment designed to describe diffusion simulations. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. For this reason, upon NDlib, we designed a simulation server that allows remote execution of experiments as well as an online visualization tool that abstracts its programmatic interface and makes available the simulation platform to non-technicians. 
%B International Journal of Data Science and Analytics %V 5 %P 61–79 %G eng %U https://link.springer.com/article/10.1007/s41060-017-0086-6 %R 10.1007/s41060-017-0086-6 %0 Report %D 2018 %T Open the Black Box Data-Driven Explanation of Black Box Decision Systems %A Dino Pedreschi %A Fosca Giannotti %A Riccardo Guidotti %A Anna Monreale %A Luca Pappalardo %A Salvatore Ruggieri %A Franco Turini %B arXiv preprint arXiv:1806.09936 %G eng %0 Journal Article %J IEEE Transactions on Knowledge and Data Engineering %D 2018 %T Personalized Market Basket Prediction with Temporal Annotated Recurring Sequences %A Riccardo Guidotti %A Giulio Rossetti %A Luca Pappalardo %A Fosca Giannotti %A Dino Pedreschi %X Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern Temporal Annotated Recurring Sequence (TARS) able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting the TBP the supermarket chains could crop tailored suggestions for each individual customer which in turn could effectively speed up their shopping sessions. A deep experimentation shows that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors. 
%B IEEE Transactions on Knowledge and Data Engineering %G eng %U https://ieeexplore.ieee.org/abstract/document/8477157 %R 10.1109/TKDE.2018.2872587 %0 Journal Article %J Transactions on Data Privacy %D 2018 %T PRUDEnce: a system for assessing privacy risk vs utility in data sharing ecosystems %A Francesca Pratesi %A Anna Monreale %A Roberto Trasarti %A Fosca Giannotti %A Dino Pedreschi %A Yanagihara, Tadashi %X Data describing human activities are an important source of knowledge useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people’s whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply any sort of anonymization to lower the privacy risks, but they must be aware and capable of controlling also the data quality, since these two factors are often a trade-off. In this paper we propose PRUDEnce (Privacy Risk versus Utility in Data sharing Ecosystems), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated to users represented in the data, and the data quality guaranteed only with users not at risk. Our proposal is able to support the Data Provider in the exploration of a repertoire of possible data transformations with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data, i.e., presence data, trajectory data and road segment data. 
%B Transactions on Data Privacy %V 11 %8 08/2018 %G eng %U http://www.tdp.cat/issues16/tdp.a284a17.pdf %0 Conference Paper %B Companion of the The Web Conference 2018 on The Web Conference 2018 %D 2018 %T SoBigData: Social Mining & Big Data Ecosystem %A Fosca Giannotti %A Roberto Trasarti %A Bontcheva, Kalina %A Valerio Grossi %X One of the most pressing and fascinating challenges scientists face today, is understanding the complexity of our globally interconnected society. The big data arising from the digital breadcrumbs of human activities has the potential of providing a powerful social microscope, which can help us understand many complex and hidden socio-economic phenomena. Such challenge requires high-level analytics, modeling and reasoning across all the social dimensions above. There is a need to harness these opportunities for scientific advancement and for the social good, compared to the currently prevalent exploitation of big data for commercial purposes or, worse, social control and surveillance. The main obstacle to this accomplishment, besides the scarcity of data scientists, is the lack of a large-scale open ecosystem where big data and social mining research can be carried out. The SoBigData Research Infrastructure (RI) provides an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life as recorded by "big data". The research community uses the SoBigData facilities as a "secure digital wind-tunnel" for large-scale social data analysis and simulation experiments. 
SoBigData promotes repeatable and open science and supports data science research projects by providing: i) an ever-growing, distributed data ecosystem for procurement, access and curation and management of big social data, to underpin social data mining research within an ethic-sensitive context; ii) an ever-growing, distributed platform of interoperable, social data mining methods and associated skills: tools, methodologies and services for mining, analysing, and visualising complex and massive datasets, harnessing the techno-legal barriers to the ethically safe deployment of big data for social mining; iii) an ecosystem where protection of personal information and the respect for fundamental human rights can coexist with a safe use of the same information for scientific purposes of broad and central societal interest. SoBigData has a dedicated ethical and legal board, which is implementing a legal and ethical framework. %B Companion of the The Web Conference 2018 on The Web Conference 2018 %I International World Wide Web Conferences Steering Committee %G eng %U http://www.sobigdata.eu/sites/default/files/www%202018.pdf %0 Journal Article %J ACM computing surveys (CSUR) %D 2018 %T A survey of methods for explaining black box models %A Riccardo Guidotti %A Anna Monreale %A Salvatore Ruggieri %A Franco Turini %A Fosca Giannotti %A Dino Pedreschi %X In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. 
The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective. %B ACM computing surveys (CSUR) %V 51 %P 93 %G eng %U https://dl.acm.org/doi/abs/10.1145/3236009 %R 10.1145/3236009 %0 Journal Article %J IEEE Transactions on Dependable and Secure Computing %D 2017 %T Authenticated Outlier Mining for Outsourced Databases %A Dong, Boxiang %A Hui Wendy Wang %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %A W Guo %X The Data-Mining-as-a-Service (DMaS) paradigm is becoming the focus of research, as it allows the data owner (client) who lacks expertise and/or computational resources to outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises some issues about result integrity: how could the client verify the mining results returned by the server are both sound and complete? In this paper, we focus on outlier mining, an important mining task. Previous verification techniques use an authenticated data structure (ADS) for correctness authentication, which may incur much space and communication cost. In this paper, we propose a novel solution that returns a probabilistic result integrity guarantee with much cheaper verification cost. 
The key idea is to insert a set of artificial records (ARs) into the dataset, from which it constructs a set of artificial outliers (AOs) and artificial non-outliers (ANOs). The AOs and ANOs are used by the client to detect any incomplete and/or incorrect mining results with a probabilistic guarantee. The main challenge that we address is how to construct ARs so that they do not change the (non-)outlierness of original records, while guaranteeing that the client can identify ANOs and AOs without executing mining. Furthermore, we build a strategic game and show that a Nash equilibrium exists only when the server returns correct outliers. Our implementation and experiments demonstrate that our verification solution is efficient and lightweight. %B IEEE Transactions on Dependable and Secure Computing %G eng %U https://ieeexplore.ieee.org/document/8048342/ %R 10.1109/TDSC.2017.2754493 %0 Conference Paper %B Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %D 2017 %T Clustering Individual Transactional Data for Masses of Users %A Riccardo Guidotti %A Anna Monreale %A Mirco Nanni %A Fosca Giannotti %A Dino Pedreschi %X Mining a large number of datasets recording human activities for making sense of individual data is the key enabler of a new wave of personalized knowledge-based services. In this paper we focus on the problem of clustering individual transactional data for a large mass of users. Transactional data is a very pervasive kind of information that is collected by several services, often involving huge pools of users. We propose txmeans, a parameter-free clustering algorithm able to efficiently partition transactional data in a completely automatic way. Txmeans is designed for the case where clustering must be applied on a massive number of different datasets, for instance when a large set of users need to be analyzed individually and each of them has generated a long history of transactions. 
A deep experimentation on both real and synthetic datasets shows the practical effectiveness of txmeans for the mass clustering of different personal datasets, and suggests that txmeans outperforms existing methods in terms of quality and efficiency. Finally, we present a personal cart assistant application based on txmeans. %B Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %I ACM %G eng %R 10.1145/3097983.3098034 %0 Journal Article %J PloS one %D 2017 %T Forecasting success via early adoptions analysis: A data-driven study %A Giulio Rossetti %A Letizia Milli %A Fosca Giannotti %A Dino Pedreschi %X Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don’t. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement. 
%B PloS one %V 12 %P e0189096 %G eng %0 Conference Paper %B International Conference on Smart Objects and Technologies for Social Good %D 2017 %T The Fractal Dimension of Music: Geography, Popularity and Sentiment Analysis %A Pollacci, Laura %A Riccardo Guidotti %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %X Nowadays there is a growing standardization of musical contents. Our finding comes out from a cross-service multi-level dataset analysis where we study how geography affects the music production. The investigation presented in this paper highlights the existence of a “fractal” musical structure that relates the technical characteristics of the music produced at regional, national and world level. Moreover, a similar structure emerges also when we analyze the musicians’ popularity and the polarity of their songs defined as the mood that they are able to convey. Furthermore, the clusters identified are markedly distinct one from another with respect to popularity and sentiment. %B International Conference on Smart Objects and Technologies for Social Good %I Springer, Cham %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-76111-4_19 %R https://doi.org/10.1007/978-3-319-76111-4_19 %0 Journal Article %J D-Lib Magazine %D 2017 %T HyWare: a HYbrid Workflow lAnguage for Research E-infrastructures %A Leonardo Candela %A Paolo Manghi %A Fosca Giannotti %A Valerio Grossi %A Roberto Trasarti %X Research e-infrastructures are "systems of systems", patchworks of tools, services and data sources, evolving over time to address the needs of the scientific process. Accordingly, in such environments, researchers implement their scientific processes by means of workflows made of a variety of actions, including for example usage of web services, download and execution of shared software libraries or tools, or local and manual manipulation of data. 
Although scientists may benefit from sharing their scientific process, the heterogeneity underpinning e-infrastructures hinders their ability to represent, share and eventually reproduce such workflows. This work presents HyWare, a language for representing scientific processes in highly-heterogeneous e-infrastructures in terms of so-called hybrid workflows. HyWare lies in between "business process modeling languages", which offer a formal and high-level description of a reasoning, protocol, or procedure, and "workflow execution languages", which enable the fully automated execution of a sequence of computational steps via dedicated engines. %B D-Lib Magazine %V 23 %G eng %U http://dx.doi.org/10.1045/january2017-candela %R 10.1045/january2017-candela %0 Conference Paper %B International Workshop on Complex Networks and their Applications %D 2017 %T Information diffusion in complex networks: The active/passive conundrum %A Letizia Milli %A Giulio Rossetti %A Dino Pedreschi %A Fosca Giannotti %X Ideas, information, viruses: all of them, with their mechanisms, can spread over the complex social tissues described by our interpersonal relations. Classical spreading models act agnostically from the object of which they simulate the diffusion, thus considering the spreading of viruses, ideas and innovations alike. Indeed, such simplification makes it easier to define a standard set of tools that can be applied to heterogeneous contexts; however, it can also lead to biased, partial, simulation outcomes. In this work we discuss the concepts of active and passive diffusion: moving from analysis of a well-known passive model, the Threshold one, we introduce two novel approaches whose aim is to provide active and mixed schemas applicable in the context of innovations/ideas diffusion simulation. Our data-driven analysis shows how, in such context, the adoption of exclusively passive/active models leads to conflicting results, thus highlighting the need for mixed approaches. 
%B International Workshop on Complex Networks and their Applications %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-72150-7_25 %R 10.1007/978-3-319-72150-7_25 %0 Conference Paper %B 2017 IEEE International Conference on Data Mining (ICDM) %D 2017 %T Market Basket Prediction using User-Centric Temporal Annotated Recurring Sequences %A Riccardo Guidotti %A Giulio Rossetti %A Luca Pappalardo %A Fosca Giannotti %A Dino Pedreschi %X Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer’s decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern named Temporal Annotated Recurring Sequence (TARS). We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer’s stocks and recommend the set of most necessary items. A deep experimentation shows that TARS can explain the customers’ purchase behavior, and that TBP outperforms the state-of-the-art competitors. %B 2017 IEEE International Conference on Data Mining (ICDM) %I IEEE %G eng %0 Conference Paper %B Personal Analytics and Privacy. An Individual and Collective Perspective - First International Workshop, {PAP} 2017, Held in Conjunction with {ECML} {PKDD} 2017, Skopje, Macedonia, September 18, 2017, Revised Selected Papers %D 2017 %T Movement Behaviour Recognition for Water Activities %A Mirco Nanni %A Roberto Trasarti %A Fosca Giannotti %B Personal Analytics and Privacy. 
An Individual and Collective Perspective - First International Workshop, {PAP} 2017, Held in Conjunction with {ECML} {PKDD} 2017, Skopje, Macedonia, September 18, 2017, Revised Selected Papers %G eng %U https://doi.org/10.1007/978-3-319-71970-2_7 %R 10.1007/978-3-319-71970-2_7 %0 Journal Article %J Information Systems %D 2017 %T MyWay: Location prediction via mobility profiling %A Roberto Trasarti %A Riccardo Guidotti %A Anna Monreale %A Fosca Giannotti %X Forecasting the future positions of mobile users is a valuable task allowing us to operate efficiently a myriad of different applications which need this type of information. We propose MyWay, a prediction system which exploits the individual systematic behaviors modeled by mobility profiles to predict human movements. MyWay provides three strategies: the individual strategy uses only the user's individual mobility profile, the collective strategy takes advantage of all users' individual systematic behaviors, and the hybrid strategy is a combination of the previous two. A key point is that MyWay only requires the sharing of individual mobility profiles, a concise representation of the user's movements, instead of raw trajectory data revealing the detailed movement of the users. We evaluate the prediction performances of our proposal by a deep experimentation on large real-world data. The results highlight that the synergy between the individual and collective knowledge is the key for a better prediction and allows the system to outperform the state-of-the-art methods. 
%B Information Systems %V 64 %P 350–367 %8 03/2017 %G eng %0 Journal Article %J International Journal of Data Science and Analytics %D 2017 %T NDlib: a python library to model and analyze diffusion processes over complex networks %A Giulio Rossetti %A Letizia Milli %A S Rinzivillo %A Alina Sirbu %A Dino Pedreschi %A Fosca Giannotti %X Nowadays the analysis of dynamics of and on networks represents a hot topic in the social network analysis playground. To support students, teachers, developers and researchers, in this work we introduce a novel framework, namely NDlib, an environment designed to describe diffusion simulations. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. For this reason, upon NDlib, we designed a simulation server that allows remote execution of experiments as well as an online visualization tool that abstracts its programmatic interface and makes available the simulation platform to non-technicians. %B International Journal of Data Science and Analytics %P 1–19 %G eng %0 Conference Paper %B IEEE International Conference on Data Science and Advanced Analytics, DSA %D 2017 %T NDlib: Studying Network Diffusion Dynamics %A Giulio Rossetti %A Letizia Milli %A S Rinzivillo %A Alina Sirbu %A Dino Pedreschi %A Fosca Giannotti %X Nowadays the analysis of diffusive phenomena occurring on top of complex networks represents a hot topic in the Social Network Analysis playground. In order to support students, teachers, developers and researchers, in this work we introduce a novel simulation framework, NDlib. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. Upon the diffusion library, we designed a simulation server that allows remote execution of experiments and an online visualization tool that abstracts the programmatic interface and makes available the simulation platform to non-technicians. 
%B IEEE International Conference on Data Science and Advanced Analytics, DSAA %C Tokyo %G eng %U https://ieeexplore.ieee.org/abstract/document/8259774 %R https://doi.org/10.1109/DSAA.2017.6 %0 Journal Article %J Information Systems %D 2017 %T Never drive alone: Boosting carpooling with network analysis %A Riccardo Guidotti %A Mirco Nanni %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %X Carpooling, i.e., the act where two or more travelers share the same car for a common trip, is one of the possibilities brought forward to reduce traffic and its externalities, but experience shows that it is difficult to boost the adoption of carpooling to significant levels. In our study, we analyze the potential impact of carpooling as a collective phenomenon emerging from people's mobility, by network analytics. Based on big mobility data from travelers in a given territory, we construct the network of potential carpooling, where nodes correspond to the users and links to possible shared trips, and analyze the structural and topological properties of this network, such as network communities and node ranking, to the purpose of highlighting the subpopulations with higher chances to create a carpooling community, and the propensity of users to be either drivers or passengers in a shared car. Our study is anchored to reality thanks to a large mobility dataset, consisting of the complete one-month-long GPS trajectories of approx. 10% circulating cars in Tuscany. We also analyze the aggregated outcome of carpooling by means of empirical simulations, showing how an assignment policy exploiting the network analytic concepts of communities and node rankings minimizes the number of single occupancy vehicles observed after carpooling. 
%B Information Systems %V 64 %P 237–257 %G eng %R 10.1016/j.is.2016.03.006 %0 Journal Article %J arXiv preprint arXiv:1702.07158 %D 2017 %T Next Basket Prediction using Recurring Sequential Patterns %A Riccardo Guidotti %A Giulio Rossetti %A Luca Pappalardo %A Fosca Giannotti %A Dino Pedreschi %X Nowadays, a hot challenge for supermarket chains is to offer personalized services for their customers. Next basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable to capture at the same time the different factors influencing the customer's decision process: co-occurrency, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern Temporal Annotated Recurring Sequence (TARS) able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting the TBP the supermarket chains could crop tailored suggestions for each individual customer which in turn could effectively speed up their shopping sessions. A deep experimentation shows that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors. %B arXiv preprint arXiv:1702.07158 %G eng %U https://arxiv.org/abs/1702.07158 %0 Journal Article %J Online Social Networks and Media %D 2017 %T Node-centric Community Discovery: From static to dynamic social network analysis %A Giulio Rossetti %A Dino Pedreschi %A Fosca Giannotti %X Nowadays, online social networks represent privileged playgrounds that enable researchers to study, characterize and understand complex human behaviors. 
Social Network Analysis, commonly known as SNA, is the multidisciplinary field of research under which researchers of different backgrounds perform their studies: one of the hottest topics in such diversified context is indeed Community Discovery. Clustering individuals, whose relations are described by a networked structure, into homogeneous communities is a complex task required by several analytical processes. Moreover, due to the user-centric and dynamic nature of online social services, during the last decades, particular emphasis was dedicated to the definition of node-centric, overlapping and evolutive Community Discovery methodologies. In this paper we provide a comprehensive and concise review of the main results, both algorithmic and analytical, we obtained in this field. Moreover, to better underline the rationale behind our research activity on Community Discovery, in this work we provide a synthetic review of the relevant literature, discussing not only methodological results but also analytical ones. %B Online Social Networks and Media %V 3 %P 32–48 %G eng %U https://www.sciencedirect.com/science/article/abs/pii/S2468696417301052 %R https://doi.org/10.1016/j.osnem.2017.10.003 %0 Conference Paper %B International Conference on Smart Objects and Technologies for Social Good %D 2017 %T Privacy Preserving Multidimensional Profiling %A Francesca Pratesi %A Anna Monreale %A Fosca Giannotti %A Dino Pedreschi %X Recently, big data had become central in the analysis of human behavior and the development of innovative services. In particular, a new class of services is emerging, taking advantage of different sources of data, in order to consider the multiple aspects of human beings. Unfortunately, these data can lead to re-identification problems and other privacy leaks, as diffusely reported in both scientific literature and media. 
The risk is even more pressing if multiple sources of data are linked together since a potential adversary could know information related to each dataset. For this reason, it is necessary to evaluate accurately and mitigate the individual privacy risk before releasing personal data. In this paper, we propose a methodology for the first task, i.e., assessing privacy risk, in a multidimensional scenario, defining some possible privacy attacks and simulating them using real-world datasets. %B International Conference on Smart Objects and Technologies for Social Good %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-76111-4_15 %R 10.1007/978-3-319-76111-4_15 %0 Conference Paper %B Conference of the Italian Association for Artificial Intelligence %D 2017 %T Sentiment Spreading: An Epidemic Model for Lexicon-Based Sentiment Analysis on Twitter %A Pollacci, Laura %A Alina Sirbu %A Fosca Giannotti %A Dino Pedreschi %A Claudio Lucchese %A Muntean, Cristina Ioana %X While sentiment analysis has received significant attention in the last years, problems still exist when tools need to be applied to microblogging content. This because, typically, the text to be analysed consists of very short messages lacking in structure and semantic context. At the same time, the amount of text produced by online platforms is enormous. So, one needs simple, fast and effective methods in order to be able to efficiently study sentiment in these data. Lexicon-based methods, which use a predefined dictionary of terms tagged with sentiment valences to evaluate sentiment in longer sentences, can be a valid approach. Here we present a method based on epidemic spreading to automatically extend the dictionary used in lexicon-based sentiment analysis, starting from a reduced dictionary and large amounts of Twitter data. 
The resulting dictionary is shown to contain valences that correlate well with human-annotated sentiment, and to produce tweet sentiment classifications comparable to the original dictionary, with the advantage of being able to tag more tweets than the original. The method is easily extensible to various languages and applicable to large amounts of data. %B Conference of the Italian Association for Artificial Intelligence %I Springer %G eng %U https://link.springer.com/chapter/10.1007/978-3-319-70169-1_9 %R 10.1007/978-3-319-70169-1_9 %0 Conference Paper %B 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2017) %D 2017 %T There's A Path For Everyone: A Data-Driven Personal Model Reproducing Mobility Agendas %A Riccardo Guidotti %A Roberto Trasarti %A Mirco Nanni %A Fosca Giannotti %A Dino Pedreschi %B 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2017) %I IEEE %C Tokyo %G eng %0 Journal Article %J Machine Learning %D 2017 %T Tiles: an online algorithm for community discovery in dynamic social networks %A Giulio Rossetti %A Luca Pappalardo %A Dino Pedreschi %A Fosca Giannotti %X Community discovery has emerged during the last decade as one of the most challenging problems in social network analysis. Many algorithms have been proposed to find communities on static networks, i.e. networks which do not change in time. However, social networks are dynamic realities (e.g. call graphs, online social networks): in such scenarios static community discovery fails to identify a partition of the graph that is semantically consistent with the temporal information expressed by the data. In this work we propose Tiles, an algorithm that extracts overlapping communities and tracks their evolution in time following an online iterative procedure. Our algorithm operates following a domino effect strategy, dynamically recomputing nodes community memberships whenever a new interaction takes place. 
We compare Tiles with state-of-the-art community detection algorithms on both synthetic and real world networks having annotated community structure: our experiments show that the proposed approach is able to guarantee lower execution times and better correspondence with the ground truth communities than its competitors. Moreover, we illustrate the specifics of the proposed approach by discussing the properties of identified communities it is able to identify. %B Machine Learning %V 106 %P 1213–1241 %G eng %U https://link.springer.com/article/10.1007/s10994-016-5582-8 %R 10.1007/s10994-016-5582-8 %0 Journal Article %J International Journal of Data Science and Analytics %D 2016 %T An analytical framework to nowcast well-being using mobile phone data %A Luca Pappalardo %A Maarten Vanhoof %A Lorenzo Gabrielli %A Zbigniew Smoreda %A Dino Pedreschi %A Fosca Giannotti %X An intriguing open question is whether measurements derived from Big Data recording human activities can yield high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory just by observing the behavior of its inhabitants through the lens of Big Data? In this paper, we design a data-driven analytical framework that uses mobility measures and social measures extracted from mobile phone data to estimate indicators for socio-economic development and well-being. We discover that the diversity of mobility, defined in terms of entropy of the individual users’ trajectories, exhibits (i) significant correlation with two different socio-economic indicators and (ii) the highest importance in predictive models built to predict the socio-economic indicators. Our analytical framework opens an interesting perspective to study human behavior through the lens of Big Data by means of new statistical indicators that quantify and possibly “nowcast” the well-being and the socio-economic development of a territory. 
%B International Journal of Data Science and Analytics %V 2 %P 75–92 %G eng %R 10.1007/s41060-016-0013-2 %0 Journal Article %J Engineering %D 2016 %T Big Data Research in Italy: A Perspective %A Sonia Bergamaschi %A Emanuele Carlini %A Michelangelo Ceci %A Barbara Furletti %A Fosca Giannotti %A Donato Malerba %A Mario Mezzanzanica %A Anna Monreale %A Gabriella Pasi %A Dino Pedreschi %A Raffaele Perego %A Salvatore Ruggieri %X The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains. %B Engineering %V 2 %P 163 %8 06/2016 %G eng %U http://engineering.org.cn/EN/abstract/article_12288.shtml %L 10-1244/N %R 10.1016/J.ENG.2016.02.011 %0 Journal Article %J Social Network Analysis and Mining %D 2016 %T Homophilic network decomposition: a community-centric analysis of online social services %A Giulio Rossetti %A Luca Pappalardo %A Riivo Kikas %A Dino Pedreschi %A Fosca Giannotti %A Marlon Dumas %X In this paper we formulate the homophilic network decomposition problem: Is it possible to identify a network partition whose structure is able to characterize the degree of homophily of its nodes? The aim of our work is to understand the relations between the homophily of individuals and the topological features expressed by specific network substructures. We apply several community detection algorithms on three large-scale online social networks—Skype, LastFM and Google+—and advocate the need of identifying the right algorithm for each specific network in order to extract a homophilic network decomposition. 
Our results show clear relations between the topological features of communities and the degree of homophily of their nodes in three online social scenarios: product engagement in the Skype network, number of listened songs on LastFM and homogeneous level of education among users of Google+. %B Social Network Analysis and Mining %V 6 %P 103 %G eng %R 10.1007/s1327 %0 Journal Article %J Social Network Analysis and Mining %D 2016 %T A supervised approach for intra-/inter-community interaction prediction in dynamic social networks %A Giulio Rossetti %A Riccardo Guidotti %A Ioanna Miliou %A Dino Pedreschi %A Fosca Giannotti %X Due to the growing availability of Internet services in the last decade, the interactions between people became more and more easy to establish. For example, we can have an intercontinental job interview, or we can send real-time multimedia content to any friend of us just owning a smartphone. All this kind of human activities generates digital footprints, that describe a complex, rapidly evolving, network structures. In such dynamic scenario, one of the most challenging tasks involves the prediction of future interactions between couples of actors (i.e., users in online social networks, researchers in collaboration networks). In this paper, we approach such problem by leveraging networks dynamics: to this extent, we propose a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between node pairs. Moreover, since real social networks are generally composed by weakly connected modules, we instantiate the interaction prediction problem in two disjoint applicative scenarios: intra-community and inter-community link prediction. Experimental results on real time-stamped networks show how our approach is able to reach high accuracy. Furthermore, we analyze the performances of our methodology when varying the typologies of features, community discovery algorithms and forecast methods. 
%B Social Network Analysis and Mining %V 6 %P 86 %8 09/2016 %G eng %U http://dx.doi.org/10.1007/s13278-016-0397-y %R 10.1007/s13278-016-0397-y %0 Book Section %B Solving Large Scale Learning Tasks. Challenges and Algorithms %D 2016 %T Understanding human mobility with big data %A Fosca Giannotti %A Lorenzo Gabrielli %A Dino Pedreschi %A S Rinzivillo %X The paper illustrates basic methods of mobility data mining, designed to extract from the big mobility data the patterns of collective movement behavior, i.e., discover the subgroups of travelers characterized by a common purpose, profiles of individual movement activity, i.e., characterize the routine mobility of each traveler. We illustrate a number of concrete case studies where mobility data mining is put at work to create powerful analytical services for policy makers, businesses, public administrations, and individual citizens. %B Solving Large Scale Learning Tasks. Challenges and Algorithms %I Springer International Publishing %P 208–220 %G eng %R 10.1007/978-3-319-41706-6_10 %0 Journal Article %J Social Network Analysis and Mining %D 2016 %T Unveiling mobility complexity through complex network analysis %A Riccardo Guidotti %A Anna Monreale %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %X The availability of massive digital traces of individuals is offering a series of novel insights on the understanding of patterns characterizing human mobility. Many studies try to semantically enrich mobility data with annotations about human activities. However, these approaches either focus on places with high frequencies (e.g., home and work), or relay on background knowledge (e.g., public available points of interest). In this paper, we depart from the concept of frequency and we focus on a high level representation of mobility using network analytics. 
The visits of each driver to each systematic destination are modeled as links in a bipartite network where a set of nodes represents drivers and the other set represents places. We extract such network from two real datasets of human mobility based, respectively, on GPS and GSM data. We introduce the concept of mobility complexity of drivers and places as a ranking analysis over the nodes of these networks. In addition, by means of community discovery analysis, we differentiate subgroups of drivers and places according both to their homogeneity and to their mobility complexity. %B Social Network Analysis and Mining %V 6 %P 59 %G eng %R 10.1007/s13278-016-0369-2 %0 Conference Paper %B IEEE Big Data %D 2015 %T City users’ classification with mobile phone data %A Lorenzo Gabrielli %A Barbara Furletti %A Roberto Trasarti %A Fosca Giannotti %A Dino Pedreschi %X Nowadays mobile phone data are an actual proxy for studying the users’ social life and urban dynamics. In this paper we present the Sociometer, an analytical framework aimed at classifying mobile phone users into behavioral categories by means of their call habits. The analytical process starts from spatio-temporal profiles, learns the different behaviors, and returns annotated profiles. After the description of the methodology and its evaluation, we present an application of the Sociometer for studying city users of one small and one big city, evaluating the impact of big events in these cities. 
%B IEEE Big Data %C Santa Clara (CA) - USA %8 11/2015 %G eng %0 Conference Paper %B International conference on Advances in Social Network Analysis and Mining %D 2015 %T Community-centric analysis of user engagement in Skype social network %A Giulio Rossetti %A Luca Pappalardo %A Riivo Kikas %A Dino Pedreschi %A Fosca Giannotti %A Marlon Dumas %B International conference on Advances in Social Network Analysis and Mining %I IEEE %C Paris, France %@ 978-1-4503-3854-7 %G eng %U http://dl.acm.org/citation.cfm?doid=2808797.2809384 %R 10.1145/2808797.2809384 %0 Journal Article %J Data Min. Knowl. Discov. %D 2015 %T Discrimination- and privacy-aware patterns %A Sara Hajian %A Josep Domingo-Ferrer %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %X Data mining is gaining societal momentum due to the ever increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and how our society works is darkened by a greater chance of privacy intrusion and unfair discrimination based on the extracted patterns and profiles. Consider the case when a set of patterns extracted from the personal data of a population of individual persons is released for a subsequent use into a decision making process, such as, e.g., granting or denying credit. First, the set of patterns may reveal sensitive information about individual persons in the training population and, second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods independently addressing privacy or discrimination in data mining have been proposed in the literature, in this context we argue that privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. 
We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve a fair publishing of frequent patterns in combination with two possible privacy transformations: one based on k-anonymity and one based on differential privacy. Our proposed pattern sanitization methods based on k-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Moreover, they obtain a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments. %B Data Min. Knowl. Discov. %V 29 %P 1733–1782 %G eng %U http://dx.doi.org/10.1007/s10618-014-0393-7 %R 10.1007/s10618-014-0393-7 %0 Conference Proceedings %B IEEE International Conference on Data Science and Advanced Analytics %D 2015 %T The harsh rule of the goals: data-driven performance indicators for football teams %A Paolo Cintia %A Luca Pappalardo %A Dino Pedreschi %A Fosca Giannotti %A Marco Malvaldi %X Sports analytics in general, and football (soccer in USA) analytics in particular, have evolved in recent years in an amazing way, thanks to automated or semi-automated sensing technologies that provide high-fidelity data streams extracted from every game. In this paper we propose a data-driven approach and show that there is a large potential to boost the understanding of football team performance. From observational data of football games we extract a set of pass-based performance indicators and summarize them in the H indicator. We observe a strong correlation among the proposed indicator and the success of a team, and therefore perform a simulation on the four major European championships (78 teams, almost 1500 games). 
The outcome of each game in the championship was replaced by a synthetic outcome (win, loss or draw) based on the performance indicators computed for each team. We found that the final rankings in the simulated championships are very close to the actual rankings in the real championships, and show that teams with high ranking error show extreme values of a defense/attack efficiency measure, the Pezzali score. Our results are surprising given the simplicity of the proposed indicators, suggesting that a complex systems’ view on football data has the potential of revealing hidden patterns and behavior of superior quality. %B IEEE International Conference on Data Science and Advanced Analytics %G eng %U https://www.researchgate.net/profile/Luca_Pappalardo/publication/281318318_The_harsh_rule_of_the_goals_data-driven_performance_indicators_for_football_teams/links/561668e308ae37cfe4090a5d.pdf %0 Conference Paper %B International conference on Advances in Social Network Analysis and Mining, ASONAM 2015 %D 2015 %T Interaction Prediction in Dynamic Networks exploiting Community Discovery %A Giulio Rossetti %A Riccardo Guidotti %A Diego Pennacchioli %A Dino Pedreschi %A Fosca Giannotti %X Due to the growing availability of online social services, interactions between people became more and more easy to establish and track. Online social human activities generate digital footprints, that describe complex, rapidly evolving, dynamic networks. In such scenario one of the most challenging task to address involves the prediction of future interactions between couples of actors. In this study, we want to leverage networks dynamics and community structure to predict which are the future interactions more likely to appear. To this extent, we propose a supervised learning approach which exploit features computed by time-aware forecasts of topological measures calculated between pair of nodes belonging to the same community. 
Our experiments on real dynamic networks show that the designed analytical process is able to achieve interesting results. %B International conference on Advances in Social Network Analysis and Mining, ASONAM 2015 %I IEEE %C Paris, France %@ 978-1-4503-3854-7 %G eng %U http://dl.acm.org/citation.cfm?doid=2808797.2809401 %R 10.1145/2808797.2809401 %0 Journal Article %J EPJ Data Science %D 2015 %T Product assortment and customer mobility %A Michele Coscia %A Diego Pennacchioli %A Fosca Giannotti %X Customers mobility is dependent on the sophistication of their needs: sophisticated customers need to travel more to fulfill their needs. In this paper, we provide more detailed evidence of this phenomenon, providing an empirical validation of the Central Place Theory. For each customer, we detect what is her favorite shop, where she purchases most products. We can study the relationship between the favorite shop and the closest one, by recording the influence of the shop’s size and the customer’s sophistication in the discordance cases, i.e. the cases in which the favorite shop is not the closest one. We show that larger shops are able to retain most of their closest customers and they are able to catch large portions of customers from smaller shops around them. We connect this observation with the shop’s larger sophistication, and not with its other characteristics, as the phenomenon is especially noticeable when customers want to satisfy their sophisticated needs. This is a confirmation of the recent extensions of the Central Place Theory, where the original assumptions of homogeneity in customer purchase power and needs are challenged. Different types of shops have also different survival logics. The largest shops get closed if they are unable to catch customers from the smaller shops, while medium size shops get closed if they cannot retain their closest customers. 
All analyses are performed on a large real-world dataset recording all purchases from millions of customers across the west coast of Italy. %B EPJ Data Science %V 4 %P 1–18 %8 10-2015 %G eng %U http://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-015-0051-3 %R 10.1140/epjds/s13688-015-0051-3 %0 Conference Paper %B International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015) %D 2015 %T Quantification in Social Networks %A Letizia Milli %A Anna Monreale %A Giulio Rossetti %A Dino Pedreschi %A Fosca Giannotti %A Fabrizio Sebastiani %X In many real-world applications there is a need to monitor the distribution of a population across different classes, and to track changes in this distribution over time. As an example, an important task is to monitor the percentage of unemployed adults in a given region. When the membership of an individual in a class cannot be established deterministically, a typical solution is the classification task. However, in the above applications the final goal is not determining which class the individuals belong to, but estimating the prevalence of each class in the unlabeled data. This task is called quantification. Most of the work in the literature addressed the quantification problem considering data presented in conventional attribute format. Since the ever-growing availability of web and social media we have a flourish of network data representing a new important source of information and by using quantification network techniques we could quantify collective behavior, i.e., the number of users that are involved in certain type of activities, preferences, or behaviors. In this paper we exploit the homophily effect observed in many social networks in order to construct a quantifier for networked data. Our experiments show the effectiveness of the proposed approaches and the comparison with the existing state-of-the-art quantification methods shows that they are more accurate. 
%B International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015) %I IEEE %C Paris, France %G eng %U http://www.giuliorossetti.net/about/wp-content/uploads/2015/12/main_DSAA.pdf %R 10.1109/DSAA.2015.7344845 %0 Journal Article %J Nat Commun %D 2015 %T Returners and explorers dichotomy in human mobility %A Luca Pappalardo %A Filippo Simini %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %A Barabasi, Albert-Laszlo %X The availability of massive digital traces of human whereabouts has offered a series of novel insights on the quantitative patterns characterizing human mobility. In particular, numerous recent studies have led to an unexpected consensus: the considerable variability in the characteristic travelled distance of individuals coexists with a high degree of predictability of their future locations. Here we shed light on this surprising coexistence by systematically investigating the impact of recurrent mobility on the characteristic distance travelled by individuals. Using both mobile phone and GPS data, we discover the existence of two distinct classes of individuals: returners and explorers. As existing models of human mobility cannot explain the existence of these two classes, we develop more realistic models able to capture the empirical findings. Finally, we show that returners and explorers play a distinct quantifiable role in spreading phenomena and that a correlation exists between their mobility patterns and social interactions. 
%B Nat Commun %V 6 %8 09 %G eng %U http://dx.doi.org/10.1038/ncomms9166 %0 Journal Article %J Journal of Trust Management %D 2015 %T A risk model for privacy in trajectory data %A Anirban Basu %A Anna Monreale %A Roberto Trasarti %A Juan Camilo Corena %A Fosca Giannotti %A Dino Pedreschi %A Shinsaku Kiyomoto %A Yutaka Miyake %A Tadashi Yanagihara %X Time sequence data relating to users, such as medical histories and mobility data, are good candidates for data mining, but often contain highly sensitive information. Different methods in privacy-preserving data publishing are utilised to release such private data so that individual records in the released data cannot be re-linked to specific users with a high degree of certainty. These methods provide theoretical worst-case privacy risks as measures of the privacy protection that they offer. However, often with many real-world data the worst-case scenario is too pessimistic and does not provide a realistic view of the privacy risks: the real probability of re-identification is often much lower than the theoretical worst-case risk. In this paper, we propose a novel empirical risk model for privacy which, in relation to the cost of privacy attacks, demonstrates better the practical risks associated with a privacy preserving data release. We show detailed evaluation of the proposed risk model by using k-anonymised real-world mobility data and then, we show how the empirical evaluation of the privacy risk has a different trend in synthetic data describing random movements. 
%B Journal of Trust Management %V 2 %P 9 %G eng %R 10.1186/s40493-015-0020-6 %0 Journal Article %J Journal of Official Statistics %D 2015 %T Small Area Model-Based Estimators Using Big Data Sources %A Stefano Marchetti %A Caterina Giusti %A Monica Pratesi %A Nicola Salvati %A Fosca Giannotti %A Dino Pedreschi %A S Rinzivillo %A Luca Pappalardo %A Lorenzo Gabrielli %B Journal of Official Statistics %V 31 %P 263–281 %G eng %0 Conference Paper %B Proceedings of the 4th {ACM} {SIGSPATIAL} International Workshop on Mobile Geographic Information Systems, MobiGIS 2015, Bellevue, WA, USA, November 3-6, 2015 %D 2015 %T Towards user-centric data management: individual mobility analytics for collective services %A Riccardo Guidotti %A Roberto Trasarti %A Mirco Nanni %A Fosca Giannotti %B Proceedings of the 4th {ACM} {SIGSPATIAL} International Workshop on Mobile Geographic Information Systems, MobiGIS 2015, Bellevue, WA, USA, November 3-6, 2015 %G eng %U http://doi.acm.org/10.1145/2834126.2834132 %R 10.1145/2834126.2834132 %0 Book Section %B Software Engineering and Formal Methods %D 2015 %T Use of Mobile Phone Data to Estimate Visitors Mobility Flows %A Lorenzo Gabrielli %A Barbara Furletti %A Fosca Giannotti %A Mirco Nanni %A S Rinzivillo %X Big Data originating from the digital breadcrumbs of human activities, sensed as by-product of the technologies that we use for our daily activities, allows us to observe the individual and collective behavior of people at an unprecedented detail. Many dimensions of our social life have big data “proxies”, such as the mobile calls data for mobility. In this paper we investigate to what extent data coming from mobile operators could be a support in producing reliable and timely estimates of intra-city mobility flows. The idea is to define an estimation method based on calling data to characterize the mobility habits of visitors at the level of a single municipality. 
%B Software Engineering and Formal Methods %I Springer International Publishing %V 8938 %P 214-226 %G eng %U http://link.springer.com/chapter/10.1007%2F978-3-319-15201-1_14 %R 10.1007/978-3-319-15201-1_14 %0 Conference Paper %B Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on %D 2014 %T CF-inspired Privacy-Preserving Prediction of Next Location in the Cloud %A Anirban Basu %A Juan Camilo Corena %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %A Shinsaku Kiyomoto %A Vaidya, Jaideep %A Yutaka Miyake %X Mobility data gathered from location sensors such as Global Positioning System (GPS) enabled phones and vehicles is valuable for spatio-temporal data mining for various location-based services (LBS). Such data is often considered sensitive and there exist many a mechanism for privacy preserving analyses of the data. Through various anonymisation mechanisms, it can be ensured with a high probability that a particular individual cannot be identified when mobility data is outsourced to third parties for analysis. However, challenges remain with the privacy of the queries on outsourced analysis results, especially when the queries are sent directly to third parties by end-users. Drawing inspiration from our earlier work in privacy preserving collaborative filtering (CF) and next location prediction, in this exploratory work, we propose a novel representation of trajectory data in the CF domain and experiment with a privacy preserving Slope One CF predictor. We present evaluations for the accuracy and the computational performance of our proposal using anonymised data gathered from real traffic data in the Italian cities of Pisa and Milan. One use-case is a third-party location-prediction-as-a-service deployed on a public cloud, which can respond to privacy-preserving queries while enabling data owners to build a rich predictor on the cloud. 
%B Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on %I IEEE %G eng %U http://dx.doi.org/10.1109/CloudCom.2014.114 %R 10.1109/CloudCom.2014.114 %0 Journal Article %J Telecommunications Policy %D 2014 %T Discovering urban and country dynamics from mobile phone data with spatial correlation patterns %A Roberto Trasarti %A Ana-Maria Olteanu-Raimond %A Mirco Nanni %A Thomas Couronné %A Barbara Furletti %A Fosca Giannotti %A Zbigniew Smoreda %A Cezary Ziemlicki %K Urban dynamics %X Mobile communication technologies pervade our society and existing wireless networks are able to sense the movement of people, generating large volumes of data related to human activities, such as mobile phone call records. At present, this kind of data is collected and stored by telecom operators' infrastructures mainly for billing reasons, yet it represents a major source of information in the study of human mobility. In this paper, we propose an analytical process aimed at extracting interconnections between different areas of the city that emerge from highly correlated temporal variations of population local densities. To accomplish this objective, we propose a process based on two analytical tools: (i) a method to estimate the presence of people in different geographical areas; and (ii) a method to extract time- and space-constrained sequential patterns capable of capturing correlations among geographical areas in terms of significant co-variations of the estimated presence. The methods are presented and combined in order to deal with two real scenarios of different spatial scale: the Paris Region and the whole of France. 
%B Telecommunications Policy %P - %U http://www.sciencedirect.com/science/article/pii/S0308596113002012 %R http://dx.doi.org/10.1016/j.telpol.2013.12.002 %0 Conference Paper %B Symposium on Applied Computing, {SAC} 2014, Gyeongju, Republic of Korea - March 24 - 28, 2014 %D 2014 %T Fair pattern discovery %A Sara Hajian %A Anna Monreale %A Dino Pedreschi %A Josep Domingo-Ferrer %A Fosca Giannotti %X Data mining is gaining societal momentum due to the ever increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are witnessing unprecedented opportunities for understanding human and societal behavior, which unfortunately are darkened by several risks for human rights: one of these is the unfair discrimination based on the extracted patterns and profiles. Consider the case when a set of patterns extracted from the personal data of a population of individual persons is released for subsequent use in a decision making process, such as, e.g., granting or denying credit. Decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. In this context, we address the discrimination risks resulting from publishing frequent patterns. We present a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, for fair (discrimination-protected) publishing of frequent pattern mining results. Our proposed pattern sanitization methods yield discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Finally, the effectiveness of our proposals is assessed by extensive experiments. 
%B Symposium on Applied Computing, {SAC} 2014, Gyeongju, Republic of Korea - March 24 - 28, 2014 %P 113–120 %U http://doi.acm.org/10.1145/2554850.2555043 %R 10.1145/2554850.2555043 %0 Book Section %B Data Science and Simulation in Transportation Research %D 2014 %T Mobility Profiling %A Mirco Nanni %A Roberto Trasarti %A Paolo Cintia %A Barbara Furletti %A Chiara Renso %A Lorenzo Gabrielli %A S Rinzivillo %A Fosca Giannotti %X The ability to understand the dynamics of human mobility is crucial for tasks like urban planning and transportation management. The recent rapidly growing availability of large spatio-temporal datasets gives us the possibility to develop sophisticated and accurate analysis methods and algorithms that can enable us to explore several relevant mobility phenomena: the distinct access paths to a territory, the groups of persons that move together in space and time, the regions of a territory that contains a high density of traffic demand, etc. All these paradigmatic perspectives focus on a collective view of the mobility where the interesting phenomenon is the result of the contribution of several moving objects. In this chapter, the authors explore a different approach to the topic and focus on the analysis and understanding of relevant individual mobility habits in order to assign a profile to an individual on the basis of his/her mobility. This process adds a semantic level to the raw mobility data, enabling further analyses that require a deeper understanding of the data itself. The studies described in this chapter are based on two large datasets of spatio-temporal data, originated, respectively, from GPS-equipped devices and from a mobile phone network. %B Data Science and Simulation in Transportation Research %I IGI Global %P 1-29 %& 1 %R 10.4018/978-1-4666-4920-0.ch001 %0 Conference Paper %B 22nd Italian Symposium on Advanced Database Systems, {SEBD} 2014, Sorrento Coast, Italy, June 16-18, 2014. 
%D 2014 %T The patterns of musical influence on the Last.Fm social network %A Diego Pennacchioli %A Giulio Rossetti %A Luca Pappalardo %A Dino Pedreschi %A Fosca Giannotti %A Michele Coscia %B 22nd Italian Symposium on Advanced Database Systems, {SEBD} 2014, Sorrento Coast, Italy, June 16-18, 2014. %G eng %0 Conference Paper %B Trust Management {VIII} - 8th {IFIP} {WG} 11.11 International Conference, {IFIPTM} 2014, Singapore, July 7-10, 2014. Proceedings %D 2014 %T A Privacy Risk Model for Trajectory Data %A Anirban Basu %A Anna Monreale %A Juan Camilo Corena %A Fosca Giannotti %A Dino Pedreschi %A Shinsaku Kiyomoto %A Yutaka Miyake %A Tadashi Yanagihara %A Roberto Trasarti %X Time sequence data relating to users, such as medical histories and mobility data, are good candidates for data mining, but often contain highly sensitive information. Different methods in privacy-preserving data publishing are utilised to release such private data so that individual records in the released data cannot be re-linked to specific users with a high degree of certainty. These methods provide theoretical worst-case privacy risks as measures of the privacy protection that they offer. However, often with many real-world data the worst-case scenario is too pessimistic and does not provide a realistic view of the privacy risks: the real probability of re-identification is often much lower than the theoretical worst-case risk. In this paper we propose a novel empirical risk model for privacy which, in relation to the cost of privacy attacks, demonstrates better the practical risks associated with a privacy preserving data release. We show detailed evaluation of the proposed risk model by using k-anonymised real-world mobility data. %B Trust Management {VIII} - 8th {IFIP} {WG} 11.11 International Conference, {IFIPTM} 2014, Singapore, July 7-10, 2014. 
Proceedings %P 125–140 %U http://dx.doi.org/10.1007/978-3-662-43813-8_9 %R 10.1007/978-3-662-43813-8_9 %0 Journal Article %J EPJ Data Science %D 2014 %T Privacy-by-Design in Big Data Analytics and Social Mining %A Anna Monreale %A S Rinzivillo %A Francesca Pratesi %A Fosca Giannotti %A Dino Pedreschi %X Privacy is an ever-growing concern in our society and is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving human personal sensitive information. Unfortunately, it is increasingly hard to transform the data in a way that protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze social data describing human activities in great detail and resolution. As a result, privacy preservation simply cannot be accomplished by de-identification alone. In this paper, we propose the privacy-by-design paradigm to develop technological frameworks for countering the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of social mining and big data analytical technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start. 
%B EPJ Data Science %V 10 %R 10.1140/epjds/s13688-014-0010-4 %0 Conference Paper %B International Conference on Data Science and Advanced Analytics, {DSAA} 2014, Shanghai, China, October 30 - November 1, 2014 %D 2014 %T The purpose of motion: Learning activities from Individual Mobility Networks %A S Rinzivillo %A Lorenzo Gabrielli %A Mirco Nanni %A Luca Pappalardo %A Dino Pedreschi %A Fosca Giannotti %B International Conference on Data Science and Advanced Analytics, {DSAA} 2014, Shanghai, China, October 30 - November 1, 2014 %G eng %U http://dx.doi.org/10.1109/DSAA.2014.7058090 %R 10.1109/DSAA.2014.7058090 %0 Journal Article %J EPJ Data Science %D 2014 %T The retail market as a complex system %A Diego Pennacchioli %A Michele Coscia %A S Rinzivillo %A Fosca Giannotti %A Dino Pedreschi %X Aim of this paper is to introduce the complex system perspective into retail market analysis. Currently, to understand the retail market means to search for local patterns at the micro level, involving the segmentation, separation and profiling of diverse groups of consumers. In other contexts, however, markets are modelled as complex systems. Such strategy is able to uncover emerging regularities and patterns that make markets more predictable, e.g. enabling to predict how much a country’s GDP will grow. Rather than isolate actors in homogeneous groups, this strategy requires to consider the system as a whole, as the emerging pattern can be detected only as a result of the interaction between its self-organizing parts. This assumption holds also in the retail market: each customer can be seen as an independent unit maximizing its own utility function. As a consequence, the global behaviour of the retail market naturally emerges, enabling a novel description of its properties, complementary to the local pattern approach. Such task demands for a data-driven empirical framework. 
In this paper, we analyse a unique transaction database, recording the micro-purchases of a million customers observed for several years in the stores of a national supermarket chain. We show the emergence of the fundamental pattern of this complex system, connecting the products’ volumes of sales with the customers’ volumes of purchases. This pattern has a number of applications. We provide three of them. By enabling us to evaluate the sophistication of needs that a customer has and a product satisfies, this pattern has been applied to the task of uncovering the hierarchy of needs of the customers, providing a hint about what is the next product a customer could be interested in buying and predicting in which shop she is likely to go to buy it. %B EPJ Data Science %V 3 %P 1–27 %G eng %U http://link.springer.com/article/10.1140/epjds/s13688-014-0033-x %R 10.1140/epjds/s13688-014-0033-x %0 Book Section %B Software Engineering and Formal Methods %D 2014 %T Retrieving Points of Interest from Human Systematic Movements %A Riccardo Guidotti %A Anna Monreale %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %X Human mobility analysis is emerging as a more and more fundamental task to deeply understand human behavior. In the last decade these kind of studies have become feasible thanks to the massive increase in availability of mobility data. A crucial point, for many mobility applications and analysis, is to extract interesting locations for people. In this paper, we propose a novel methodology to retrieve efficiently significant places of interest from movement data. Using car drivers’ systematic movements we mine everyday interesting locations, that is, places around which people life gravitates. The outcomes show the empirical evidence that these places capture nearly the whole mobility even though generated only from systematic movements abstractions. 
%B Software Engineering and Formal Methods %I Springer International Publishing %P 294–308 %G eng %R 10.1007/978-3-319-15201-1_19 %0 Journal Article %J {TKDD} %D 2014 %T Uncovering Hierarchical and Overlapping Communities with a Local-First Approach %A Michele Coscia %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %X Community discovery in complex networks is the task of organizing a network’s structure by grouping together nodes related to each other. Traditional approaches are based on the assumption that there is a global-level organization in the network. However, in many scenarios, each node is the bearer of complex information and cannot be classified in disjoint clusters. The top-down global view of the partition approach is not designed for this. Here, we represent this complex information as multiple latent labels, and we postulate that edges in the networks are created among nodes carrying similar labels. The latent labels are the communities a node belongs to and we discover them with a simple local-first approach to community discovery. This is achieved by democratically letting each node vote for the communities it sees surrounding it in its limited view of the global system, its ego neighborhood, using a label propagation algorithm, assuming that each node is aware of the label it shares with each of its connections. The local communities are merged hierarchically, unveiling the modular organization of the network at the global level and identifying overlapping groups and groups of groups. We tested this intuition against the state-of-the-art overlapping community discovery and found that our new method advances in the chosen scenarios in the quality of the obtained communities. We perform a test on benchmark and on real-world networks, evaluating the quality of the community coverage by using the extracted communities to predict the metadata attached to the nodes, which we consider external information about the latent labels. 
We also provide an explanation about why real-world networks contain overlapping communities and how our logic is able to capture them. Finally, we show how our method is deterministic, is incremental, and has a limited time complexity, so that it can be used on real-world scale networks. %B {TKDD} %V 9 %P 6 %G eng %U http://doi.acm.org/10.1145/2629511 %R 10.1145/2629511 %0 Conference Paper %B 47th SIS Scientific Meeting of the Italian Statistical Society %D 2014 %T Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach %A Barbara Furletti %A Lorenzo Gabrielli %A Fosca Giannotti %A Letizia Milli %A Mirco Nanni %A Dino Pedreschi %? Roberta Vivio %? Giuseppe Garofalo %X Big Data, originating from the digital breadcrumbs of human activities, sensed as a by-product of the technologies that we use for our daily activities, let us observe the individual and collective behavior of people at an unprecedented detail. Many dimensions of our social life have big data “proxies”, such as the mobile calls data for mobility. In this paper we investigate to what extent such “big data”, in integration with administrative data, could be a support in producing reliable and timely estimates of inter-city mobility. The study has been jointly developed by Istat, CNR, and the University of Pisa within the scope of interest of the “Commissione di studio avente il compito di orientare le scelte dell'Istat sul tema dei Big Data”. In an ongoing project at ISTAT, called “Persons and Places” and based on an integration of administrative data sources, a first release of the Origin-Destination matrix has been produced at municipality level, assuming that the places of residence and of work (or study) are the terminal points of usual individual mobility for work or study. 
The coincidence between the city of residence and that of work (or study) is considered as a proxy of the absence of intercity mobility for a person (we call such a person a static resident). The opposite case is considered as a proxy of the presence of mobility (the person is a dynamic resident: commuter or embedded). As administrative data do not contain information on the frequency of the mobility, the idea is to specify an estimation method, using calling data as support, to define for each municipality the stock of standing residents, embedded city users and daily city users (commuters). %B 47th SIS Scientific Meeting of the Italian Statistical Society %C Cagliari %8 06/2014 %@ 978-88-8467-874-4 %U http://www.sis2014.it/proceedings/allpapers/3026.pdf %0 Conference Paper %B Proceedings of MoKMaSD %D 2014 %T Use of mobile phone data to estimate visitors mobility flows %A Lorenzo Gabrielli %A Barbara Furletti %A Fosca Giannotti %A Mirco Nanni %A S Rinzivillo %X Big Data originating from the digital breadcrumbs of human activities, sensed as by-product of the technologies that we use for our daily activities, allows us to observe the individual and collective behavior of people at an unprecedented detail. Many dimensions of our social life have big data “proxies”, such as the mobile calls data for mobility. In this paper we investigate to what extent data coming from mobile operators could be a support in producing reliable and timely estimates of intra-city mobility flows. 
The idea is to define an estimation method based on calling data to characterize the mobility habits of visitors at the level of a single municipality %B Proceedings of MoKMaSD %U http://www.di.unipi.it/mokmasd/symposium-2014/preproceedings/GabrielliEtAl-mokmasd2014.pdf %0 Conference Paper %B Computational Intelligence and 11th Brazilian Congress on Computational Intelligence (BRICS-CCI CBIC), 2013 BRICS Congress on %D 2013 %T Comparing General Mobility and Mobility by Car %A Luca Pappalardo %A Filippo Simini %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %B Computational Intelligence and 11th Brazilian Congress on Computational Intelligence (BRICS-CCI CBIC), 2013 BRICS Congress on %8 Sept %G eng %R 10.1109/BRICS-CCI-CBIC.2013.116 %0 Journal Article %J Intell. Data Anal. %D 2013 %T Evolving networks: Eras and turning points %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X Within the large body of research in complex network analysis, an important topic is the temporal evolution of networks. Existing approaches aim at analyzing the evolution on the global and the local scale, extracting properties of either the entire network or local patterns. In this paper, we focus on detecting clusters of temporal snapshots of a network, to be interpreted as eras of evolution. To this aim, we introduce a novel hierarchical clustering methodology, based on a dissimilarity measure (derived from the Jaccard coefficient) between two temporal snapshots of the network, able to detect the turning points at the beginning of the eras. We devise a framework to discover and browse the eras, either in top-down or a bottom-up fashion, supporting the exploration of the evolution at any level of temporal resolution. 
We show how our approach applies to real networks and null models, by detecting eras in an evolving co-authorship graph extracted from a bibliographic dataset, a collaboration graph extracted from a cinema database, and a network extracted from a database of terrorist attacks; we illustrate how the discovered temporal clustering highlights the crucial moments when the networks witnessed profound changes in their structure. Our approach is finally boosted by introducing a meaningful labeling of the obtained clusters, such as the characterizing topics of each discovered era, thus adding a semantic dimension to our analysis. %B Intell. Data Anal. %V 17 %P 27–48 %U http://dx.doi.org/10.3233/IDA-120566 %R 10.3233/IDA-120566 %0 Conference Proceedings %B IEEE Big Data %D 2013 %T Explaining the PRoduct Range Effect in Purchase Data %A Diego Pennacchioli %A Michele Coscia %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %B IEEE Big Data %0 Conference Paper %B SEDB 2013 %D 2013 %T On multidimensional network measures %A Matteo Magnani %A Anna Monreale %A Giulio Rossetti %A Fosca Giannotti %X Networks, i.e., sets of interconnected entities, are ubiquitous, spanning disciplines as diverse as sociology, biology and computer science. The recent availability of large amounts of network data has thus provided a unique opportunity to develop models and analysis tools applicable to a wide range of scenarios. However, real-world phenomena are often more complex than existing graph data models. One relevant example concerns the numerous types of social relationships (or edges) that can be present between individuals in a social network. In this short paper we present a unified model and a set of measures recently developed to represent and analyze network data with multiple types of edges. 
%B SEDB 2013 %8 2013 %U https://www.researchgate.net/publication/256194479_On_multidimensional_network_measures %0 Journal Article %J IEEE Systems Journal %D 2013 %T Privacy-Preserving Mining of Association Rules From Outsourced Transaction Databases %A Fosca Giannotti %A L.V.S. Lakshmanan %A Anna Monreale %A Dino Pedreschi %A Hui Wendy Wang %X Spurred by developments such as cloud computing, there has been considerable recent interest in the paradigm of data mining-as-a-service. A company (data owner) lacking in expertise or computational resources can outsource its mining needs to a third party service provider (server). However, both the items and the association rules of the outsourced database are considered private property of the corporation (data owner). To protect corporate privacy, the data owner transforms its data and ships it to the server, sends mining queries to the server, and recovers the true patterns from the extracted patterns received from the server. In this paper, we study the problem of outsourcing the association rule mining task within a corporate privacy-preserving framework. We propose an attack model based on background knowledge and devise a scheme for privacy preserving outsourced mining. Our scheme ensures that each transformed item is indistinguishable with respect to the attacker's background knowledge, from at least k-1 other transformed items. Our comprehensive experiments on a very large and real transaction database demonstrate that our techniques are effective, scalable, and protect privacy. 
%B IEEE Systems Journal %R 10.1109/JSYST.2012.2221854 %0 Conference Paper %B 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, December 7-10, 2013 %D 2013 %T Quantification Trees %A Letizia Milli %A Anna Monreale %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %A Fabrizio Sebastiani %X In many applications there is a need to monitor how a population is distributed across different classes, and to track the changes in this distribution that derive from varying circumstances, an example such application is monitoring the percentage (or "prevalence") of unemployed people in a given region, or in a given age range, or at different time periods. When the membership of an individual in a class cannot be established deterministically, this monitoring activity requires classification. However, in the above applications the final goal is not determining which class each individual belongs to, but simply estimating the prevalence of each class in the unlabeled data. This task is called quantification. In a supervised learning framework we may estimate the distribution across the classes in a test set from a training set of labeled individuals. However, this may be sub optimal, since the distribution in the test set may be substantially different from that in the training set (a phenomenon called distribution drift). So far, quantification has mostly been addressed by learning a classifier optimized for individual classification and later adjusting the distribution it computes to compensate for its tendency to either under-or over-estimate the prevalence of the class. In this paper we propose instead to use a type of decision trees (quantification trees) optimized not for individual classification, but directly for quantification. 
Our experiments show that quantification trees are more accurate than existing state-of-the-art quantification methods, while retaining at the same time the simplicity and understandability of the decision tree framework. %B 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, December 7-10, 2013 %P 528–536 %U http://dx.doi.org/10.1109/ICDM.2013.122 %R 10.1109/ICDM.2013.122 %0 Journal Article %J Social Network Analysis and Mining %D 2013 %T Spatial and Temporal Evaluation of Network-based Analysis of Human Mobility %A Michele Coscia %A S Rinzivillo %A Fosca Giannotti %A Dino Pedreschi %B Social Network Analysis and Mining %V to appear %0 Conference Paper %B Social Informatics - 5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings %D 2013 %T The Three Dimensions of Social Prominence %A Diego Pennacchioli %A Giulio Rossetti %A Luca Pappalardo %A Dino Pedreschi %A Fosca Giannotti %A Michele Coscia %B Social Informatics - 5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings %G eng %U http://dx.doi.org/10.1007/978-3-319-03260-3_28 %R 10.1007/978-3-319-03260-3_28 %0 Journal Article %J The European Physical Journal Special Topics %D 2013 %T {Understanding the patterns of car travel} %A Luca Pappalardo %A S Rinzivillo %A Qu, Zehui %A Dino Pedreschi %A Fosca Giannotti %X {Are the patterns of car travel different from those of general human mobility? Based on a unique dataset consisting of the GPS trajectories of 10 million travels accomplished by 150,000 cars in Italy, we investigate how known mobility models apply to car travels, and illustrate novel analytical findings. 
We also assess to what extent the sample in our dataset is representative of the overall car mobility, and discover how to build an extremely accurate model that, given our GPS data, estimates the real traffic values as measured by road sensors.} %B The European Physical Journal Special Topics %V 215 %P 61–73 %G eng %U http://dx.doi.org/10.1140/epjst%252fe2013-01715-5 %R 10.1140/epjst%252fe2013-01715-5 %0 Conference Paper %B ASONAM 2013 %D 2013 %T You Know Because I Know”: a Multidimensional Network Approach to Human Resources Problem %A Michele Coscia %A Giulio Rossetti %A Diego Pennacchioli %A Damiano Ceccarelli %A Fosca Giannotti %B ASONAM 2013 %0 Conference Paper %B Proceedings of the 3rd International Conference on Ambient Systems, Networks and Technologies {(ANT} 2012), the 9th International Conference on Mobile Web Information Systems (MobiWIS-2012), Niagara Falls, Ontario, Canada, August 27-29, 2012 %D 2012 %T An Agent-Based Model to Evaluate Carpooling at Large Manufacturing Plants %A Tom Bellemans %A Sebastian Bothe %A Sungjin Cho %A Fosca Giannotti %A Davy Janssens %A Luk Knapen %A Christine Körner %A Michael May %A Mirco Nanni %A Dino Pedreschi %A Hendrik Stange %A Roberto Trasarti %A Ansar-Ul-Haque Yasar %A Geert Wets %B Proceedings of the 3rd International Conference on Ambient Systems, Networks and Technologies {(ANT} 2012), the 9th International Conference on Mobile Web Information Systems (MobiWIS-2012), Niagara Falls, Ontario, Canada, August 27-29, 2012 %G eng %U http://dx.doi.org/10.1016/j.procs.2012.08.001 %R 10.1016/j.procs.2012.08.001 %0 Report %D 2012 %T Analisi di Mobilita' con dati eterogenei %A Barbara Furletti %A Roberto Trasarti %A Lorenzo Gabrielli %A S Rinzivillo %A Luca Pappalardo %A Fosca Giannotti %I ISTI - CNR %C Pisa %0 Conference Paper %B Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2012 %D 2012 %T AUDIO: An Integrity Auditing Framework of Outlier-Mining-as-a-Service Systems. 
%A R.Liu %A Hui Wendy Wang %A Anna Monreale %A Dino Pedreschi %A Fosca Giannotti %A W Guo %X Spurred by developments such as cloud computing, there has been considerable recent interest in the data-mining-as-a-service paradigm. Users lacking in expertise or computational resources can outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises issues about result integrity: how can the data owner verify that the mining results returned by the server are correct? In this paper, we present AUDIO, an integrity auditing framework for the specific task of distance-based outlier mining outsourcing. It provides efficient and practical verification approaches to check both completeness and correctness of the mining results. The key idea of our approach is to insert a small amount of artificial tuples into the outsourced data; the artificial tuples will produce artificial outliers and non-outliers that do not exist in the original dataset. The server’s answer is verified by analyzing the presence of artificial outliers/non-outliers, obtaining a probabilistic guarantee of correctness and completeness of the mining result. Our empirical results show the effectiveness and efficiency of our method. %B Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2012 %8 2012 %R 10.1007/978-3-642-33486-3_1 %0 Conference Paper %B 2012 International Conference on Privacy, Security, Risk and Trust, {PASSAT} 2012, and 2012 International Confernece on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012 %D 2012 %T Classifying Trust/Distrust Relationships in Online Social Networks %A Giacomo Bachi %A Michele Coscia %A Anna Monreale %A Fosca Giannotti %X Online social networks are increasingly being used as places where communities gather to exchange information, form opinions, collaborate in response to events. 
An aspect of this information exchange is how to determine if a source of social information can be trusted or not. Data mining literature addresses this problem. However, it usually employs social balance theories, by looking at small structures in complex networks known as triangles. This has proven effective in some cases, but it underperforms in the absence of context information about the relation and in more complex interactive structures. In this paper we address the problem of creating a framework for trust inference, able to infer the trust/distrust relationships in those relational environments that cannot be described by using the classical social balance theory. We do so by decomposing a trust network into its ego network components and mining the trust relationships on this ego network set, extending a well known graph mining algorithm. We test our framework on three public datasets describing trust relationships in the real world (from the social media Epinions, Slashdot and Wikipedia) and compare our results with the trust inference state of the art, showing better performance where the social balance theory fails. 
%B 2012 International Conference on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 International Conference on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012 %P 552–557 %U http://dx.doi.org/10.1109/SocialCom-PASSAT.2012.115 %R 10.1109/SocialCom-PASSAT.2012.115 %0 Journal Article %J KI - Künstliche Intelligenz %D 2012 %T Data Science for Simulating the Era of Electric Vehicles %A Davy Janssens %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A S Rinzivillo %B KI - Künstliche Intelligenz %R 10.1007/s13218-012-0183-6 %0 Conference Paper %B The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012 %D 2012 %T DEMON: a local-first discovery method for overlapping communities %A Michele Coscia %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %B The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012 %G eng %U http://doi.acm.org/10.1145/2339530.2339630 %R 10.1145/2339530.2339630 %0 Conference Paper %B KDD 2012 %D 2012 %T DEMON: a Local-First Discovery Method for Overlapping Communities %A Michele Coscia %A Giulio Rossetti %A Fosca Giannotti %A Dino Pedreschi %B KDD 2012 %8 2012 %0 Journal Article %J KI - Künstliche Intelligenz %D 2012 %T Discovering the Geographical Borders of Human Mobility %A S Rinzivillo %A Simone Mainardi %A Fabio Pezzoni %A Michele Coscia %A Fosca Giannotti %A Dino Pedreschi %X The availability of massive network and mobility data from diverse domains has fostered the analysis of human behavior and interactions. Broad, extensive, and multidisciplinary research has been devoted to the extraction of non-trivial knowledge from this novel form of data. 
We propose a general method to determine the influence of social and mobility behavior over a specific geographical area in order to evaluate to what extent the current administrative borders represent the real basin of human movement. We build a network representation of human movement starting with vehicle GPS tracks and extract relevant clusters, which are then mapped back onto the territory, finding a good match with the existing administrative borders. The novelty of our approach is the focus on a detailed spatial resolution: we map emerging borders in terms of individual municipalities, rather than macro regional or national areas. We present a series of experiments to illustrate and evaluate the effectiveness of our approach. %B KI - Künstliche Intelligenz %U https://link.springer.com/article/10.1007%2Fs13218-012-0181-8 %& 1 %R 10.1007/s13218-012-0181-8 %0 Conference Paper %B Twentieth Italian Symposium on Advanced Database Systems, SEBD 2012, Venice, Italy, June 24-27, 2012, Proceedings %D 2012 %T Individual Mobility Profiles: Methods and Application on Vehicle Sharing %A Roberto Trasarti %A Fabio Pinelli %A Mirco Nanni %A Fosca Giannotti %B Twentieth Italian Symposium on Advanced Database Systems, SEBD 2012, Venice, Italy, June 24-27, 2012, Proceedings %G eng %U http://sebd2012.dei.unipd.it/documents/188475/32d00b8a-8ead-4d97-923f-bd2f2cf6ddcb %0 Conference Paper %B 12th IEEE International Conference on Data Mining Workshops, ICDM Workshops, Brussels, Belgium, December 10, 2012 %D 2012 %T Injecting Discrimination and Privacy Awareness Into Pattern Discovery %A Sara Hajian %A Anna Monreale %A Dino Pedreschi %A Josep Domingo-Ferrer %A Fosca Giannotti %X Data mining is gaining societal momentum due to the ever increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. 
Data mining comes with unprecedented opportunities and risks: a deeper understanding of human behavior and how our society works is darkened by a greater chance of privacy intrusion and unfair discrimination based on the extracted patterns and profiles. Although methods independently addressing privacy or discrimination in data mining have been proposed in the literature, in this context we argue that privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. We describe a combined pattern sanitization framework that yields both privacy and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. %B 12th IEEE International Conference on Data Mining Workshops, ICDM Workshops, Brussels, Belgium, December 10, 2012 %P 360–369 %U http://dx.doi.org/10.1109/ICDMW.2012.51 %R 10.1109/ICDMW.2012.51 %0 Journal Article %J World Wide Web %D 2012 %T Multidimensional networks: foundations of structural analysis %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X Complex networks have been receiving increasing attention by the scientific community, thanks also to the increasing availability of real-world network data. So far, network analysis has focused on the characterization and measurement of local and global properties of graphs, such as diameter, degree distribution, centrality, and so on. In recent years, the multidimensional nature of many real world networks has been pointed out, i.e. many networks containing multiple connections between any pair of nodes have been analyzed. Although the importance of analyzing this kind of network has been recognized by previous works, a complete framework for multidimensional network analysis is still missing. 
Such a framework would enable the analysts to study different phenomena, that can be either the generalization to the multidimensional setting of what happens in monodimensional networks, or a new class of phenomena induced by the additional degree of complexity that multidimensionality provides in real networks. The aim of this paper is then to give the basis for multidimensional network analysis: we present a solid repertoire of basic concepts and analytical measures, which take into account the general structure of multidimensional networks. We tested our framework on different real world multidimensional networks, showing the validity and the meaningfulness of the measures introduced, that are able to extract important and non-random information about complex phenomena in such networks. %B World Wide Web %V Volume 15 / 2012 %8 10/2012 %U http://www.springerlink.com/content/f774289854430410/abstract/ %R 10.1007/s11280-012-0190-4 %0 Conference Proceedings %B IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining %D 2012 %T Optimal Spatial Resolution for the Analysis of Human Mobility %A Michele Coscia %A S Rinzivillo %A Dino Pedreschi %A Fosca Giannotti %B IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining %C Istanbul, Turkey %0 Journal Article %J European Physical Journal-Special Topics %D 2012 %T Smart cities of the future %A Batty, Michael %A Axhausen, Kay W %A Fosca Giannotti %A Pozdnoukhov, Alexei %A Bazzani, Armando %A Monica Wachowicz %A Ouzounis, Georgios %A Portugali, Yuval %X Here we sketch the rudiments of what constitutes a smart city which we define as a city in which ICT is merged with traditional infrastructures, coordinated and integrated using new digital technologies. 
We first sketch our vision defining seven goals which concern: developing a new understanding of urban problems; effective and feasible ways to coordinate urban technologies; models and methods for using urban data across spatial and temporal scales; developing new technologies for communication and dissemination; developing new forms of urban governance and organisation; defining critical problems relating to cities, transport, and energy; and identifying risk, uncertainty, and hazards in the smart city. To this, we add six research challenges: to relate the infrastructure of smart cities to their operational functioning and planning through management, control and optimisation; to explore the notion of the city as a laboratory for innovation; to provide portfolios of urban simulation which inform future designs; to develop technologies that ensure equity, fairness and realise a better quality of city life; to develop technologies that ensure informed participation and create shared knowledge for democratic city governance; and to ensure greater and more effective mobility and access to opportunities for urban populations. We begin by defining the state of the art, explaining the science of smart cities. We define six scenarios based on new cities badging themselves as smart, older cities regenerating themselves as smart, the development of science parks, tech cities, and technopoles focused on high technologies, the development of urban services using contemporary ICT, the use of ICT to develop new urban intelligence functions, and the development of online and mobile forms of participation. 
Seven project areas are then proposed: Integrated Databases for the Smart City, Sensing, Networking and the Impact of New Social Media, Modelling Network Performance, Mobility and Travel Behaviour, Modelling Urban Land Use, Transport and Economic Interactions, Modelling Urban Transactional Activities in Labour and Housing Markets, Decision Support as Urban Intelligence, Participatory Governance and Planning Structures for the Smart City. Finally we anticipate the paradigm shifts that will occur in this research and define a series of key demonstrators which we believe are important to progressing a science of smart cities. %B European Physical Journal-Special Topics %V 214 %P 481 %G eng %R 10.1140/epjst/e2012-01703-3 %0 Journal Article %J Statistical Analysis and Data Mining %D 2011 %T A classification for community discovery methods in complex networks %A Michele Coscia %A Fosca Giannotti %A Dino Pedreschi %B Statistical Analysis and Data Mining %V 4 %P 512-546 %0 Conference Paper %B ASONAM %D 2011 %T Finding and Characterizing Communities in Multidimensional Networks %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %B ASONAM %P 490-494 %0 Conference Paper %B CIKM %D 2011 %T Finding redundant and complementary communities in multidimensional networks %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %B CIKM %P 2181-2184 %0 Conference Paper %B ASONAM %D 2011 %T Foundations of Multidimensional Network Analysis %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X Complex networks have been receiving increasing attention by the scientific community, thanks also to the increasing availability of real-world network data. In the last years, the multidimensional nature of many real world networks has been pointed out, i.e. many networks containing multiple connections between any pair of nodes have been analyzed. 
Although the importance of analyzing this kind of network has been recognized by previous works, a complete framework for multidimensional network analysis is still missing. Such a framework would enable the analysts to study different phenomena, that can be either the generalization to the multidimensional setting of what happens in monodimensional networks, or a new class of phenomena induced by the additional degree of complexity that multidimensionality provides in real networks. The aim of this paper is then to give the basis for multidimensional network analysis: we develop a solid repertoire of basic concepts and analytical measures, which takes into account the general structure of multidimensional networks. We tested our framework on a real world multidimensional network, showing the validity and the meaningfulness of the measures introduced, that are able to extract important, nonrandom information about complex phenomena. %B ASONAM %P 485-489 %R 10.1109/ASONAM.2011.103 %0 Conference Paper %B Sistemi Evoluti per Basi di Dati - SEBD 2011, Proceedings of the Nineteenth Italian Symposium on Advanced Database Systems, Maratea, Italy, June 26-29, 2011 %D 2011 %T Link Prediction su Reti Multidimensionali %A Giulio Rossetti %A Michele Berlingerio %A Fosca Giannotti %B Sistemi Evoluti per Basi di Dati - SEBD 2011, Proceedings of the Nineteenth Italian Symposium on Advanced Database Systems, Maratea, Italy, June 26-29, 2011 %G eng %0 Conference Paper %B KDD %D 2011 %T Mining mobility user profiles for car pooling %A Roberto Trasarti %A Fabio Pinelli %A Mirco Nanni %A Fosca Giannotti %B KDD %P 1190-1198 %0 Conference Paper %B the 3rd International Conference on Computers, Privacy, and Data Protection: An element of choice %D 2011 %T Privacy-preserving data mining from outsourced databases. %A Fosca Giannotti %A L.V.S. 
Lakshmanan %A Anna Monreale %A Dino Pedreschi %A Hui Wendy Wang %X Spurred by developments such as cloud computing, there has been considerable recent interest in the paradigm of data-mining-as-a-service: a company (data owner) lacking in expertise or computational resources can outsource its mining needs to a third party service provider (server). However, both the outsourced database and the knowledge extracted from it by data mining are considered private property of the data owner. To protect corporate privacy, the data owner transforms its data and ships it to the server, sends mining queries to the server, and recovers the true patterns from the extracted patterns received from the server. In this paper, we study the problem of outsourcing a data mining task within a corporate privacy-preserving framework. We propose a scheme for privacy-preserving outsourced mining which offers a formal protection against information disclosure, and show that the data owner can recover the correct data mining results efficiently. %B the 3rd International Conference on Computers, Privacy, and Data Protection: An element of choice %8 2011 %R 10.1007/978-94-007-0641-5_19 %0 Journal Article %J J. Comput. Science %D 2011 %T The pursuit of hubbiness: Analysis of hubs in large multidimensional networks %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X Hubs are highly connected nodes within a network. In complex network analysis, hubs have been widely studied, and are at the basis of many tasks, such as web search and epidemic outbreak detection. In reality, networks are often multidimensional, i.e., there can exist multiple connections between any pair of nodes. In this setting, the concept of hub depends on the multiple dimensions of the network, whose interplay becomes crucial for the connectedness of a node. In this paper, we characterize multidimensional hubs. 
We consider the multidimensional generalization of the degree and introduce a new class of measures, that we call Dimension Relevance, aimed at analyzing the importance of different dimensions for the hubbiness of a node. We assess the meaningfulness of our measures by comparing them on real networks and null models, then we study the interplay among dimensions and their effect on node connectivity. Our findings show that: (i) multidimensional hubs do exist and their characterization yields interesting insights and (ii) it is possible to detect the most influential dimensions that cause the different hub behaviors. We demonstrate the usefulness of multidimensional analysis in three real world domains: detection of ambiguous query terms in a word–word query log network, outlier detection in a social network, and temporal analysis of behaviors in a co-authorship network. %B J. Comput. Science %V 2 %P 223-237 %R 10.1016/j.jocs.2011.05.009 %0 Journal Article %J IJDWM %D 2011 %T A Query Language for Mobility Data Mining %A Roberto Trasarti %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Chiara Renso %B IJDWM %V 7 %P 24-45 %0 Conference Paper %B ICDM Workshops %D 2011 %T Scalable Link Prediction on Multidimensional Networks %A Giulio Rossetti %A Michele Berlingerio %A Fosca Giannotti %B ICDM Workshops %C Vancouver %P 979-986 %0 Conference Paper %B ECML/PKDD (3) %D 2011 %T Traffic Jams Detection Using Flock Mining %A Rebecca Ong %A Fabio Pinelli %A Roberto Trasarti %A Mirco Nanni %A Chiara Renso %A S Rinzivillo %A Fosca Giannotti %B ECML/PKDD (3) %P 650-653 %0 Journal Article %J VLDB J. %D 2011 %T Unveiling the complexity of human mobility by querying and mining massive trajectory data %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Fabio Pinelli %A Chiara Renso %A S Rinzivillo %A Roberto Trasarti %B VLDB J. 
%V 20 %P 695-719 %0 Conference Paper %B EDBT %D 2010 %T Advanced knowledge discovery on movement data with the GeoPKDD system %A Mirco Nanni %A Roberto Trasarti %A Chiara Renso %A Fosca Giannotti %A Dino Pedreschi %B EDBT %P 693-696 %0 Conference Paper %B PAKDD (1) %D 2010 %T As Time Goes by: Discovering Eras in Evolving Social Networks %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X Within the large body of research in complex network analysis, an important topic is the temporal evolution of networks. Existing approaches aim at analyzing the evolution on the global and the local scale, extracting properties of either the entire network or local patterns. In this paper, we focus instead on detecting clusters of temporal snapshots of a network, to be interpreted as eras of evolution. To this aim, we introduce a novel hierarchical clustering methodology, based on a dissimilarity measure (derived from the Jaccard coefficient) between two temporal snapshots of the network. We devise a framework to discover and browse the eras, in either a top-down or a bottom-up fashion, supporting the exploration of the evolution at any level of temporal resolution. We show how our approach applies to real networks, by detecting eras in an evolving co-authorship graph extracted from a bibliographic dataset; we illustrate how the discovered temporal clustering highlights the crucial moments when the network had profound changes in its structure. Our approach is finally boosted by introducing a meaningful labeling of the obtained clusters, such as the characterizing topics of each discovered era, thus adding a semantic dimension to our analysis. 
%B PAKDD (1) %P 81-90 %R 10.1007/978-3-642-13657-3_11 %0 Conference Paper %B SEBD %D 2010 %T Discovering Eras in Evolving Social Networks (Extended Abstract) %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %B SEBD %P 78-85 %0 Conference Paper %B ECML/PKDD (3) %D 2010 %T Exploring Real Mobility Data with M-Atlas %A Roberto Trasarti %A S Rinzivillo %A Fabio Pinelli %A Mirco Nanni %A Anna Monreale %A Chiara Renso %A Dino Pedreschi %A Fosca Giannotti %X Research on moving-object data analysis has been recently fostered by the widespread diffusion of new techniques and systems for monitoring, collecting and storing location aware data, generated by a wealth of technological infrastructures, such as GPS positioning and wireless networks. These have made available massive repositories of spatio-temporal data recording human mobile activities, that call for suitable analytical methods, capable of enabling the development of innovative, location-aware applications. %B ECML/PKDD (3) %P 624-627 %R 10.1007/978-3-642-15939-8_48 %0 Conference Proceedings %B 13th AGILE conference on Geographic Information Science %D 2010 %T A Generalisation-based Approach to Anonymising Movement Data %A Gennady Andrienko %A Natalia Andrienko %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %A S Rinzivillo %X The possibility to collect, store, disseminate, and analyze data about movements of people raises very serious privacy concerns, given the sensitivity of the information about personal positions. In particular, sensitive information about individuals can be uncovered with the use of data mining and visual analytics methods. In this paper we present a method for the generalization of trajectory data that can be adopted as the first step of a process to obtain k-anonymity in spatio-temporal datasets. 
We ran a preliminary set of experiments on a real-world trajectory dataset, demonstrating that this method of generalization of trajectories preserves the clustering analysis results. %B 13th AGILE conference on Geographic Information Science %U http://agile2010.dsi.uminho.pt/pen/ShortPapers_PDF%5C122_DOC.pdf %0 Conference Paper %B SEBD %D 2010 %T Location Prediction through Trajectory Pattern Mining (Extended Abstract) %A Anna Monreale %A Fabio Pinelli %A Roberto Trasarti %A Fosca Giannotti %B SEBD %P 134-141 %0 Conference Paper %B Computational Transportation Science %D 2010 %T Mobility data mining: discovering movement patterns from trajectory data %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Fabio Pinelli %A Chiara Renso %A S Rinzivillo %A Roberto Trasarti %B Computational Transportation Science %P 7-10 %0 Journal Article %J Transactions on Data Privacy %D 2010 %T Movement Data Anonymity through Generalization %A Anna Monreale %A Gennady Andrienko %A Natalia Andrienko %A Fosca Giannotti %A Dino Pedreschi %A S Rinzivillo %A Stefan Wrobel %X Wireless networks and mobile devices, such as mobile phones and GPS receivers, sense and track the movements of people and vehicles, producing society-wide mobility databases. This is a challenging scenario for data analysis and mining. On the one hand, exciting opportunities arise out of discovering new knowledge about human mobile behavior, and thus fuel intelligent info-mobility applications. On the other hand, new privacy concerns arise when mobility data are published. The risk is particularly high for GPS trajectories, which represent movement of a very high precision and spatio-temporal resolution: the de-identification of such trajectories (i.e., forgetting the ID of their associated owners) is only a weak protection, as generally it is possible to re-identify a person by observing her routine movements. 
In this paper we propose a method for achieving true anonymity in a dataset of published trajectories, by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed method offers a formal data protection safeguard, quantified as a theoretical upper bound to the probability of re-identification. We conduct a thorough study on a real-life GPS trajectory dataset, and provide strong empirical evidence that the proposed anonymity techniques achieve the conflicting goals of data utility and data privacy. In practice, the achieved anonymity protection is much stronger than the theoretical worst case, while the quality of the cluster analysis on the trajectory data is preserved. %B Transactions on Data Privacy %V 3 %P 91–121 %U http://www.tdp.cat/issues/abs.a045a10.php %0 Conference Paper %B M3SN 2010 Workshop, in conjunction with ICDE2010 %D 2010 %T Towards Discovery of Eras in Social Networks %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X In the last decades, much research has been devoted to topics related to Social Network Analysis. One important direction in this area is to analyze the temporal evolution of a network. So far, previous approaches have analyzed this setting at both the global and the local level. In this paper, we focus on finding a way to detect temporal eras in an evolving network. We lay the basis for a general framework that aims at helping the analyst in browsing the temporal clusters both in a top-down and bottom-up way, exploring the network at any level of temporal detail. We show the effectiveness of our approach on real data, by applying our proposed methodology to a co-authorship network extracted from a bibliographic dataset. Our first results are encouraging, and open the way for the definition and implementation of a general framework for discovering eras in evolving social networks. 
%B M3SN 2010 Workshop, in conjunction with ICDE2010 %R 10.1109/ICDEW.2010.5452713 %0 Journal Article %J Inf. Syst. %D 2009 %T A constraint-based querying system for exploratory pattern discovery %A Francesco Bonchi %A Fosca Giannotti %A Claudio Lucchese %A Salvatore Orlando %A Raffaele Perego %A Roberto Trasarti %B Inf. Syst. %V 34 %P 3-27 %0 Conference Paper %B EDBT %D 2009 %T Geographic privacy-aware knowledge discovery and delivery %A Fosca Giannotti %A Dino Pedreschi %A Yannis Theodoridis %B EDBT %P 1157-1158 %0 Conference Paper %B The European Future Technologies Conference (FET 2009) %D 2009 %T GeoPKDD – Geographic Privacy-aware Knowledge Discovery %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Chiara Renso %A S Rinzivillo %A Roberto Trasarti %B The European Future Technologies Conference (FET 2009) %0 Book Section %B Biomedical Data and Applications %D 2009 %T Mining Clinical, Immunological, and Genetic Data of Solid Organ Transplantation %A Michele Berlingerio %A Francesco Bonchi %A Michele Curcio %A Fosca Giannotti %A Franco Turini %B Biomedical Data and Applications %P 211-236 %0 Conference Paper %B CSE (4) %D 2009 %T Mining Mobility Behavior from Trajectory Data %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Chiara Renso %A Roberto Trasarti %B CSE (4) %P 948-951 %0 Conference Paper %B SEBD %D 2009 %T Mining the Information Propagation in a Network %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %B SEBD %P 333-340 %0 Conference Paper %B IDA %D 2009 %T Mining the Temporal Dimension of the Information 
Propagation %A Michele Berlingerio %A Michele Coscia %A Fosca Giannotti %B IDA %P 237-248 %0 Conference Paper %B Proceedings of the 2nd SIGSPATIAL ACM GIS 2009 International Workshop on Security and Privacy in GIS and LBS %D 2009 %T Movement data anonymity through generalization %A Gennady Andrienko %A Natalia Andrienko %A Fosca Giannotti %A Anna Monreale %A Dino Pedreschi %X In recent years, spatio-temporal and moving objects databases have gained considerable interest, due to the diffusion of mobile devices (e.g., mobile phones, RFID devices and GPS devices) and of new applications, where the discovery of consumable, concise, and applicable knowledge is the key step. Clearly, in these applications privacy is a concern, since models extracted from this kind of data can reveal the behavior of groups of individuals, thus compromising their privacy. Movement data present a new challenge for the privacy-preserving data mining community because of their spatial and temporal characteristics. In this position paper we briefly present an approach for the generalization of movement data that can be adopted for obtaining k-anonymity in spatio-temporal datasets; specifically, it can be used to realize a framework for publishing spatio-temporal data while preserving privacy. We ran a preliminary set of experiments on a real-world trajectory dataset, demonstrating that this method of generalization of trajectories preserves the clustering analysis results. %B Proceedings of the 2nd SIGSPATIAL ACM GIS 2009 International Workshop on Security and Privacy in GIS and LBS %I ACM %G eng %R 10.1145/1667502.1667510 %0 Conference Paper %B ASONAM %D 2009 %T Social Network Analysis as Knowledge Discovery Process: A Case Study on Digital Bibliography %A Michele Coscia %A Fosca Giannotti %A Ruggero G. 
Pensa %B ASONAM %P 279-283 %0 Conference Paper %B KDD %D 2009 %T Temporal mining for interactive workflow data analysis %A Michele Berlingerio %A Fabio Pinelli %A Mirco Nanni %A Fosca Giannotti %B KDD %P 109-118 %0 Conference Paper %B Second International Workshop on Computational Transportation Science %D 2009 %T Trajectory pattern analysis for urban traffic %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Fabio Pinelli %B Second International Workshop on Computational Transportation Science %I ACM %C SEATTLE, USA %P 43-47 %8 11/2009 %0 Conference Paper %B IEEE Visual Analytics Science and Technology (VAST 2009) %D 2009 %T Visual Cluster Analysis of Large Collections of Trajectories %A Gennady Andrienko %A Natalia Andrienko %A S Rinzivillo %A Mirco Nanni %A Dino Pedreschi %A Fosca Giannotti %B IEEE Visual Analytics Science and Technology (VAST 2009) %I IEEE Computer Society Press %0 Conference Proceedings %B 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining %D 2009 %T WhereNext: a Location Predictor on Trajectory Pattern Mining %A Anna Monreale %A Fabio Pinelli %A Roberto Trasarti %A Fosca Giannotti %X The pervasiveness of mobile devices and location based services is leading to an increasing volume of mobility data. This side effect provides the opportunity for innovative methods that analyse the behaviors of movements. In this paper we propose WhereNext, which is a method aimed at predicting with a certain level of accuracy the next location of a moving object. The prediction uses previously extracted movement patterns named Trajectory Patterns, which are a concise representation of behaviors of moving objects as sequences of regions frequently visited with a typical travel time. 
A decision tree, named T-pattern Tree, is built and evaluated with a formal training and test process. The tree is learned from the Trajectory Patterns that hold in a certain area and it may be used as a predictor of the next location of a new trajectory by finding the best matching path in the tree. Three different best matching methods to classify a new moving object are proposed and their impact on the quality of prediction is studied extensively. Using Trajectory Patterns as predictive rules has the following implications: (I) the learning depends on the movement of all available objects in a certain area instead of on the individual history of an object; (II) the prediction tree intrinsically contains the spatio-temporal properties that have emerged from the data and this allows us to define matching methods that strictly depend on the properties of such movements. In addition, we propose a set of other measures that evaluate a priori the predictive power of a set of Trajectory Patterns. These measures were tuned on a real life case study. Finally, an exhaustive set of experiments and results on the real dataset are presented. %B 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining %R 10.1145/1557019.1557091 %0 Journal Article %J VLDB J. %D 2008 %T Anonymity preserving pattern discovery %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B VLDB J. %V 17 %P 703-727 %G eng %0 Journal Article %J GeoInformatica %D 2008 %T An Application of Advanced Spatio-Temporal Formalisms to Behavioural Ecology %A Alessandra Raffaetà %A T. Ceccarelli %A D. Centeno %A Fosca Giannotti %A A. Massolo %A Christine Parent %A Chiara Renso %A Stefano Spaccapietra %A Franco Turini %B GeoInformatica %V 12 %P 37-72 %G eng %0 Journal Article %D 2008 %T An Application of Advanced Spatio-Temporal Formalisms to Behavioural Ecology %A T. Ceccarelli %A D. Centeno %A Fosca Giannotti %A A. 
Massolo %A Christine Parent %A Alessandra Raffaetà %A Chiara Renso %A Stefano Spaccapietra %A Franco Turini %G eng %0 Conference Paper %B GIS %D 2008 %T Clustering of German municipalities based on mobility characteristics: an overview of results %A Andrea Zanda %A Christine Körner %A Fosca Giannotti %A Daniel Schulz %A Michael May %B GIS %P 69 %0 Conference Paper %B SEBD %D 2008 %T DAEDALUS: A knowledge discovery analysis framework for movement data %A Riccardo Ortale %A E Ritacco %A N. Pelekis %A Roberto Trasarti %A Gianni Costa %A Fosca Giannotti %A Giuseppe Manco %A Chiara Renso %A Yannis Theodoridis %B SEBD %P 191-198 %G eng %0 Conference Paper %B GIS %D 2008 %T The DAEDALUS framework: progressive querying and mining of movement data %A Riccardo Ortale %A E Ritacco %A Nikos Pelekis %A Roberto Trasarti %A Gianni Costa %A Fosca Giannotti %A Giuseppe Manco %A Chiara Renso %A Yannis Theodoridis %B GIS %P 52 %0 Conference Proceedings %B First International Workshop on Computational Transportation Science %D 2008 %T Location prediction within the mobility data analysis environment Daedalus %A Fabio Pinelli %A Anna Monreale %A Roberto Trasarti %A Fosca Giannotti %X In this paper we propose a method to predict the next location of a moving object based on two recent results of the GeoPKDD project: DAEDALUS, a mobility data analysis environment and Trajectory Pattern, a sequential pattern mining algorithm with temporal annotation integrated in DAEDALUS. The first one is a DMQL environment for moving objects, where both data and patterns can be represented. The second one extracts movement patterns as sequences of movements between locations with typical travel times. This paper proposes a prediction method which uses the local models extracted by Trajectory Pattern to build a global model called Prediction Tree. The future location of a moving object is predicted by visiting the tree and calculating the best matching function. 
The integration within the DAEDALUS system supports an interactive construction of the predictor on top of a set of spatio-temporal patterns. Other proposals in the literature base the definition of prediction methods for future location of a moving object on previously extracted frequent patterns. They use the recent history of movements of the object itself and often use time only to order the events. Our work uses the movements of all moving objects in a certain area to learn a classifier built on the mined trajectory patterns, which are intrinsically equipped with temporal information. %B First International Workshop on Computational Transportation Science %C Dublin, Ireland %R 10.4108/ICST.MOBIQUITOUS2008.3894 %0 Book Section %B Mobility, Data Mining and Privacy %D 2008 %T Mobility, Data Mining and Privacy: A Vision of Convergence %A Fosca Giannotti %A Dino Pedreschi %B Mobility, Data Mining and Privacy %P 1-11 %0 Book %B Mobility, Data Mining and Privacy %D 2008 %T Mobility, Data Mining and Privacy - Geographic Knowledge Discovery %A Fosca Giannotti %A Dino Pedreschi %E Fosca Giannotti %E Dino Pedreschi %B Mobility, Data Mining and Privacy %I Springer %@ 978-3-540-75176-2 %0 Conference Paper %B PinKDD %D 2008 %T Mobility, Data Mining and Privacy the Experience of the GeoPKDD Project %A Fosca Giannotti %A Dino Pedreschi %A Franco Turini %B PinKDD %P 25-32 %0 Book Section %B Mobility, Data Mining and Privacy %D 2008 %T Querying and Reasoning for Spatiotemporal Data Mining %A Giuseppe Manco %A Miriam Baglioni %A Fosca Giannotti %A Bart Kuijpers %A Alessandra Raffaetà %A Chiara Renso %B Mobility, Data Mining and Privacy %P 335-374 %0 Book Section %D 2008 %T Querying and Reasoning for Spatio-Temporal Data Mining %A Giuseppe Manco %A Miriam Baglioni %A Fosca Giannotti %A Bart Kuijpers %A Alessandra Raffaetà %A Chiara Renso %I a Knowledge Discovery vision %C Mobility, Privacy, and Geography %G eng %0 Conference Paper %B SEBD %D 2008 %T Temporal analysis of process 
logs: a case study %A Michele Berlingerio %A Fosca Giannotti %A Mirco Nanni %A Fabio Pinelli %B SEBD %P 430-437 %G eng %0 Journal Article %J Information Visualization %D 2008 %T Visually driven analysis of movement data by progressive clustering %A S Rinzivillo %A Dino Pedreschi %A Mirco Nanni %A Fosca Giannotti %A Natalia Andrienko %A Gennady Andrienko %B Information Visualization %I Palgrave Macmillan Ltd %V 7 %P 225-239 %0 Conference Paper %B ICDM Workshops %D 2007 %T Hiding Sensitive Trajectory Patterns %A Osman Abul %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %B ICDM Workshops %P 693-698 %G eng %0 Conference Paper %B SEBD %D 2007 %T Hiding Sequences %A Osman Abul %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %B SEBD %P 233-241 %G eng %0 Conference Paper %B ICDE Workshops %D 2007 %T Hiding Sequences %A Osman Abul %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %B ICDE Workshops %P 147-156 %G eng %0 Conference Paper %B BIBM %D 2007 %T Mining Clinical Data with a Temporal Dimension: A Case Study %A Michele Berlingerio %A Francesco Bonchi %A Fosca Giannotti %A Franco Turini %B BIBM %P 429-436 %G eng %0 Conference Paper %B MDM %D 2007 %T Privacy-Aware Knowledge Discovery from Location Data %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %A Osman Abul %B MDM %P 283-287 %G eng %0 Conference Paper %B ICDM Workshops %D 2007 %T Time-Annotated Sequences for Medical Data Mining %A Michele Berlingerio %A Francesco Bonchi %A Fosca Giannotti %A Franco Turini %B ICDM Workshops %P 133-138 %G eng %0 Conference Paper %B SEBD %D 2007 %T Towards Constraint-Based Subgraph Mining %A Michele Berlingerio %A Francesco Bonchi %A Fosca Giannotti %B SEBD %P 274-281 %G eng %0 Conference Paper %B KDD %D 2007 %T Trajectory pattern mining %A Fosca Giannotti %A Mirco Nanni %A Fabio Pinelli %A Dino Pedreschi %B KDD %P 330-339 %G eng %0 Conference Paper %B ICDE %D 2006 %T ConQueSt: a Constraint-based Querying System for Exploratory 
Pattern Discovery %A Francesco Bonchi %A Fosca Giannotti %A Claudio Lucchese %A Salvatore Orlando %A Raffaele Perego %A Roberto Trasarti %B ICDE %P 159 %G eng %0 Conference Paper %B SDM %D 2006 %T Efficient Mining of Temporally Annotated Sequences %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %B SDM %G eng %0 Conference Paper %B SEBD %D 2006 %T On Interactive Pattern Mining from Relational Databases %A Claudio Lucchese %A Francesco Bonchi %A Fosca Giannotti %A Salvatore Orlando %A Raffaele Perego %A Roberto Trasarti %B SEBD %P 329-338 %G eng %0 Conference Paper %B KDID %D 2006 %T On Interactive Pattern Mining from Relational Databases %A Francesco Bonchi %A Fosca Giannotti %A Claudio Lucchese %A Salvatore Orlando %A Raffaele Perego %A Roberto Trasarti %B KDID %P 42-62 %G eng %0 Conference Paper %B CBMS %D 2006 %T Mining HLA Patterns Explaining Liver Diseases %A Michele Berlingerio %A Francesco Bonchi %A Silvia Chelazzi %A Michele Curcio %A Fosca Giannotti %A Fabrizio Scatena %B CBMS %P 702-707 %G eng %0 Conference Paper %B SAC %D 2006 %T Mining sequences with temporal annotations %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A Fabio Pinelli %B SAC %P 593-597 %G eng %0 Conference Paper %B SAC %D 2006 %T Towards low-perturbation anonymity preserving pattern discovery %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B SAC %P 588-592 %G eng %0 Journal Article %J Comput. Syst. Sci. Eng. %D 2005 %T Anonymity and data mining %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B Comput. Syst. Sci. Eng. %V 20 %G eng %0 Conference Paper %B ICDM %D 2005 %T Blocking Anonymity Threats Raised by Frequent Itemset Mining %A Maurizio Atzori %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B ICDM %P 561-564 %G eng %0 Journal Article %J Knowl. Inf. Syst. 
%D 2005 %T Efficient breadth-first mining of frequent pattern with monotone constraints %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B Knowl. Inf. Syst. %V 8 %P 131-153 %G eng %0 Journal Article %J IEEE Intelligent Systems %D 2005 %T Exante: A Preprocessing Method for Frequent-Pattern Mining %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B IEEE Intelligent Systems %V 20 %P 25-31 %G eng %0 Conference Paper %B GIS %D 2005 %T Synthetic generation of cellular network positioning data %A Fosca Giannotti %A Andrea Mazzoni %A Simone Puntoni %A Chiara Renso %B GIS %P 12-20 %G eng %0 Conference Paper %B DMKD %D 2004 %T Discovery of ads web hosts through traffic data analysis %A V. Bacarella %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %B DMKD %P 76-81 %G eng %0 Conference Paper %B SEBD %D 2004 %T Frequent Pattern Queries for Flexible Knowledge Discovery %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B SEBD %P 250-261 %G eng %0 Conference Proceedings %B Lecture Notes in Computer Science %D 2004 %T Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings %A Jean-François Boulicaut %A Floriana Esposito %A Fosca Giannotti %A Dino Pedreschi %B Lecture Notes in Computer Science %I Springer %V 3202 %@ 3-540-23108-0 %G eng %0 Conference Proceedings %B Lecture Notes in Computer Science %D 2004 %T Machine Learning: ECML 2004, 15th European Conference on Machine Learning, Pisa, Italy, September 20-24, 2004, Proceedings %A Jean-François Boulicaut %A Floriana Esposito %A Fosca Giannotti %A Dino Pedreschi %B Lecture Notes in Computer Science %I Springer %V 3201 %@ 3-540-23105-6 %G eng %0 Conference 
Paper %B Local Pattern Detection %D 2004 %T Pushing Constraints to Detect Local Patterns %A Francesco Bonchi %A Fosca Giannotti %B Local Pattern Detection %P 1-19 %G eng %0 Conference Paper %B Constraint-Based Mining and Inductive Databases %D 2004 %T A Relational Query Primitive for Constraint-Based Pattern Mining %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B Constraint-Based Mining and Inductive Databases %P 14-37 %G eng %0 Journal Article %J IEEE Trans. Knowl. Data Eng. %D 2004 %T Specifying Mining Algorithms with Iterative User-Defined Aggregates %A Fosca Giannotti %A Giuseppe Manco %A Franco Turini %B IEEE Trans. Knowl. Data Eng. %V 16 %P 1232-1246 %G eng %0 Conference Paper %B Database Support for Data Mining Applications %D 2004 %T Towards a Logic Query Language for Data Mining %A Fosca Giannotti %A Giuseppe Manco %A Franco Turini %B Database Support for Data Mining Applications %P 76-94 %G eng %0 Conference Paper %B PKDD %D 2003 %T Adaptive Constraint Pushing in Frequent Pattern Mining %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B PKDD %P 47-58 %G eng %0 Conference Paper %B ICDM %D 2003 %T ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraint %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B ICDM %P 11-18 %G eng %0 Conference Paper %B PKDD %D 2003 %T ExAnte: Anticipated Data Reduction in Constrained Pattern Mining %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B PKDD %P 59-70 %G eng %0 Conference Paper %B Logics for Emerging Applications of Databases %D 2003 %T Logical Languages for Data Mining %A Fosca Giannotti %A Giuseppe Manco %A Jef Wijsen %B Logics for Emerging Applications of Databases %P 325-361 %G eng %0 Conference Paper %B SEBD %D 2003 %T Pre-processing for Constrained Pattern Mining %A Francesco Bonchi %A Fosca Giannotti %A Alessio Mazzanti %A Dino Pedreschi %B SEBD %P 519-530 %G eng %0 Conference Paper %B SEBD %D 
2003 %T WebCat: Automatic Categorization of Web Search Results %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %A F. Samaritani %B SEBD %P 507-518 %G eng %0 Conference Paper %B ITCC %D 2002 %T Characterizing Web User Accesses: A Transactional Approach to Web Log Clustering %A Fosca Giannotti %A Cristian Gozzi %A Giuseppe Manco %B ITCC %P 312 %G eng %0 Conference Paper %B PKDD %D 2002 %T Clustering Transactional Data %A Fosca Giannotti %A Cristian Gozzi %A Giuseppe Manco %B PKDD %P 175-187 %G eng %0 Conference Paper %B KDID %D 2002 %T Invited talk: Logical Data Mining Query Languages %A Fosca Giannotti %B KDID %P 1 %G eng %0 Conference Paper %B JELIA %D 2002 %T LDL-M$_{\mbox{ine}}$: Integrating Data Mining with Intelligent Query Answering %A Fosca Giannotti %A Giuseppe Manco %B JELIA %P 517-520 %G eng %0 Conference Paper %B SEBD %D 2001 %T Clustering Transactional Data %A Fosca Giannotti %A Cristian Gozzi %A Giuseppe Manco %B SEBD %P 163-176 %G eng %0 Conference Paper %B SEBD %D 2001 %T Complex Reasoning on Geographical Data %A Fosca Giannotti %A Alessandra Raffaetà %A Chiara Renso %A Franco Turini %B SEBD %P 331-338 %G eng %0 Conference Paper %B ITCC %D 2001 %T Data Mining for Intelligent Web Caching %A Francesco Bonchi %A Fosca Giannotti %A Giuseppe Manco %A Chiara Renso %A Mirco Nanni %A Dino Pedreschi %A Salvatore Ruggieri %B ITCC %P 599-603 %G eng %0 Journal Article %J IEEE Trans. Knowl. Data Eng. %D 2001 %T Nondeterministic, Nonmonotonic Logic Databases %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %B IEEE Trans. Knowl. 
Data Eng. %V 13 %P 813-823 %G eng %0 Journal Article %J J. Comput. Syst. Sci. %D 2001 %T Semantics and Expressive Power of Nondeterministic Constructs in Deductive Databases %A Fosca Giannotti %A Dino Pedreschi %A Carlo Zaniolo %B J. Comput. Syst. Sci. %V 62 %P 15-42 %G eng %0 Conference Paper %B PKDD %D 2001 %T Specifying Mining Algorithms with Iterative User-Defined Aggregates: A Case Study %A Fosca Giannotti %A Giuseppe Manco %A Franco Turini %B PKDD %P 128-139 %G eng %0 Journal Article %J Data Knowl. Eng. %D 2001 %T Web log data warehousing and mining for intelligent web caching %A Francesco Bonchi %A Fosca Giannotti %A Cristian Gozzi %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %A Chiara Renso %A Salvatore Ruggieri %B Data Knowl. Eng. %V 39 %P 165-189 %G eng 
%0 Journal Article %J Data and Knowledge Engineering %D 2001 %T Web Log Data Warehousing and Mining for Intelligent Web Caching %A Francesco Bonchi %A Fosca Giannotti %A Cristian Gozzi %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %A Chiara Renso %A Salvatore Ruggieri %B Data and Knowledge Engineering %G eng %0 Conference Paper %B FQAS %D 2000 %T Declarative Knowledge Extraction with Interactive User-Defined Aggregates %A Fosca Giannotti %A Giuseppe Manco %B FQAS %P 435-444 %G eng %0 Conference Paper %B EJC %D 2000 %T Logic-Based Knowledge Discovery in Databases %A Fosca Giannotti %A Mirco Nanni %A Dino Pedreschi %B EJC %P 279-283 %G eng %0 Conference Paper %B PAKDD %D 2000 %T Making Knowledge Extraction and Reasoning Closer %A Fosca Giannotti %A Giuseppe Manco %B PAKDD %P 360-371 %G eng %0 Conference Paper %B Computational Logic %D 2000 %T On Verification in Logic Database Languages %A Francesco Bonchi %A Fosca Giannotti %A Dino Pedreschi %B Computational Logic %P 957-971 %G eng %0 Conference Paper %B DEXA Workshop %D 1999 %T Beyond Current Technology: The Perspective of Three EC GIS Projects %A Fosca Giannotti %A Robert Jeansoulin %A Yannis Theodoridis %B DEXA Workshop %P 510 %G eng %0 Conference Paper %B KDD %D 1999 %T A Classification-Based Methodology for Planning Audit Strategies in Fraud Detection %A Francesco Bonchi %A Fosca Giannotti %A Gianni Mainetto %A Dino Pedreschi %B KDD %P 175-184 %G eng %0 Conference Paper %B 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery %D 1999 %T Experiences with a Logic-based knowledge discovery Support Environment %A Fosca Giannotti %A Giuseppe Manco %A Dino Pedreschi %A Franco Turini %B 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery %G eng %0 Conference Paper %B AI*IA %D 1999 %T Experiences with a Logic-Based Knowledge Discovery Support Environment %A Fosca Giannotti %A Giuseppe Manco %A Dino Pedreschi %A Franco Turini %B 
AI*IA %P 202-213 %G eng %0 Conference Paper %B SEBD %D 1999 %T Integration of Deduction and Induction for Mining Supermarket Sales Data %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %A Franco Turini %B SEBD %P 117-131 %G eng %0 Conference Paper %B APPIA-GULP-PRODE %D 1999 %T Querying inductive Databases via Logic-Based user-defined aggregates %A Fosca Giannotti %A Giuseppe Manco %B APPIA-GULP-PRODE %P 605-620 %G eng %0 Conference Paper %B PKDD %D 1999 %T Querying Inductive Databases via Logic-Based User-Defined Aggregates %A Fosca Giannotti %A Giuseppe Manco %B PKDD %P 125-135 %G eng %0 Conference Paper %B SEBD %D 1999 %T Una Metodologia Basata sulla Classificazione per la Pianificazione degli Accertamenti nel Rilevamento di Frodi %A Francesco Bonchi %A Fosca Giannotti %A Gianni Mainetto %A Dino Pedreschi %B SEBD %P 69-84 %G eng %0 Conference Paper %B DaWaK %D 1999 %T Using Data Mining Techniques in Fiscal Fraud Detection %A Francesco Bonchi %A Fosca Giannotti %A Gianni Mainetto %A Dino Pedreschi %B DaWaK %P 369-376 %G eng %0 Journal Article %J J. Log. Program. %D 1998 %T Datalog with Non-Deterministic Choice Computes NDB-PTIME %A Fosca Giannotti %A Dino Pedreschi %B J. Log. Program. 
%V 35 %P 79-101 %G eng %0 Conference Paper %B CSL %D 1998 %T On the Effective Semantics of Nondeterministic, Nonmonotonic, Temporal Logic Databases %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %B CSL %P 58-72 %G eng %0 Conference Paper %B FQAS %D 1998 %T Query Answering in Nondeterministic, Nonmonotonic Logic Databases %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %B FQAS %P 175-187 %G eng %0 Conference Paper %B DOOD %D 1997 %T Datalog++: A Basis for Active Object-Oriented Databases %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %B DOOD %P 283-301 %G eng %0 Conference Paper %B SEBD %D 1997 %T Datalog++: a Basis for Active Object-Oriented Databases %A Fosca Giannotti %A Giuseppe Manco %A Mirco Nanni %A Dino Pedreschi %B SEBD %P 325-340 %G eng %0 Conference Paper %B APPIA-GULP-PRODE %D 1997 %T A Deductive Data Model for Representing and Querying Semistructured Data %A Fosca Giannotti %A Giuseppe Manco %A Dino Pedreschi %B APPIA-GULP-PRODE %P 129-140 %G eng %0 Journal Article %J Ann. Math. Artif. Intell. %D 1997 %T Programming with Non-Determinism in Deductive Databases %A Fosca Giannotti %A Sergio Greco %A Domenico Saccà %A Carlo Zaniolo %B Ann. Math. Artif. Intell. %V 19 %P 97-125 %G eng %0 Conference Paper %B DBPL %D 1997 %T Static Analysis of Transactions for Conservative Multigranularity Locking %A Giuseppe Amato %A Fosca Giannotti %A Gianni Mainetto %B DBPL %P 413-430 %G eng %0 Conference Paper %B SEBD %D 1996 %T Ragionamento spazio-temporale con LDLT: primi esperimenti verso un sistema deduttivo per applicazioni geografiche %A Marilisa E. Carboni %A Annalisa Di Deo %A Fosca Giannotti %A Maria V Masserotti %B SEBD %P 73-90 %G eng %0 Conference Paper %B DDLP %D 1996 %T Spatio-Temporal Reasoning with LDLT: First Steps Towards a Deductive System for Geographical Applications %A Marilisa E. 
Carboni %A Annalisa Di Deo %A Fosca Giannotti %A Maria V Masserotti %B DDLP %P 135-151 %G eng %0 Conference Paper %B SEBD %D 1995 %T Declarative Reconstruction of Updates in Logic Databases: A Compilative Approach %A Marilisa E. Carboni %A Fosca Giannotti %A V. Foddai %A Dino Pedreschi %B SEBD %P 3-13 %G eng %0 Conference Paper %B GULP-PRODE %D 1995 %T Declarative Reconstruction of Updates in Logic Databases: a Compilative Approach %A Marilisa E. Carboni %A V. Foddai %A Fosca Giannotti %A Dino Pedreschi %B GULP-PRODE %P 169-182 %G eng %0 Conference Paper %B FORTE %D 1994 %T An abstract interpreter for the specification language LOTOS %A Franco Fiore %A Fosca Giannotti %B FORTE %P 309-323 %G eng %0 Conference Paper %B SEBD %D 1994 %T Conservative Multigranularity Locking for an Object-Oriented Persistent Language via Abstract Interpretation %A Giuseppe Amato %A Fosca Giannotti %A Gianni Mainetto %B SEBD %P 329-349 %G eng %0 Conference Paper %B Workshop on Deductive Databases and Logic Programming %D 1994 %T Expressive Power of Non-Deterministic Operators for Logic-based Languages %A Luca Corciulo %A Fosca Giannotti %A Dino Pedreschi %A Carlo Zaniolo %B Workshop on Deductive Databases and Logic Programming %P 27-40 %G eng %0 Journal Article %J Sci. Comput. Program. %D 1994 %T Gate Splitting in LOTOS Specifications Using Abstract Interpretation %A Fosca Giannotti %A Diego Latella %B Sci. Comput. Program. 
%V 23 %P 127-149 %G eng %0 Conference Paper %B VLDB %D 1993 %T Data Sharing Analysis for a Database Programming Language via Abstract Interpretation %A Giuseppe Amato %A Fosca Giannotti %A Gianni Mainetto %B VLDB %P 405-415 %G eng %0 Conference Paper %B DOOD %D 1993 %T Datalog with Non-Deterministic Choice Computes NDB-PTIME %A Luca Corciulo %A Fosca Giannotti %A Dino Pedreschi %B DOOD %P 49-66 %G eng %0 Conference Paper %B TAPSOFT %D 1993 %T Gate Splitting in LOTOS Specifications Using Abstract Interpretation %A Fosca Giannotti %A Diego Latella %B TAPSOFT %P 437-452 %G eng %0 Conference Paper %B FMLDO %D 1993 %T Static Analysis of Transactions: an Experiment of Abstract Interpretation Usage %A Giuseppe Amato %A Fosca Giannotti %A Gianni Mainetto %B FMLDO %P 19-29 %G eng %0 Conference Paper %B WSA %D 1992 %T Analysis of Concurrent Transactions in a Functional Database Programming Language %A Giuseppe Amato %A Fosca Giannotti %A Gianni Mainetto %B WSA %P 174-184 %G eng %0 Conference Paper %B WSA %D 1992 %T Using Abstract Interpretation for Gate splitting in LOTOS Specifications %A Fosca Giannotti %A Diego Latella %B WSA %P 194-204 %G eng %0 Conference Paper %B DOOD %D 1991 %T Non-Determinism in Deductive Databases %A Fosca Giannotti %A Dino Pedreschi %A Domenico Saccà %A Carlo Zaniolo %B DOOD %P 129-146 %G eng %0 Conference Paper %B PLILP %D 1991 %T A Technique for Recursive Invariance Detection and Selective Program Specification %A Fosca Giannotti %A Manuel V. Hermenegildo %B PLILP %P 323-334 %G eng %0 Conference Paper %B LPNMR %D 1990 %T Declarative Semantics for Pruning Operators in Logic Programming %A Fosca Giannotti %A Dino Pedreschi %B LPNMR %P 27-37 %G eng %0 Conference Paper %B IEA/AIE (Vol. 2) %D 1990 %T RASP: A Resource Allocator for Software Projects %A C. Bertazzoni %A Fosca Giannotti %B IEA/AIE (Vol. 2) %P 628-637 %G eng %0 Journal Article %J Sci. Comput. Program. 
%D 1987 %T Symbolic Evaluation with Structural Recursive Symbolic Constants %A Fosca Giannotti %A Attilio Matteucci %A Dino Pedreschi %A Franco Turini %B Sci. Comput. Program. %V 9 %P 161-177 %G eng %0 Journal Article %J IEEE Trans. Software Eng. %D 1985 %T Symbolic Semantics and Program Reduction %A Vincenzo Ambriola %A Fosca Giannotti %A Dino Pedreschi %A Franco Turini %B IEEE Trans. Software Eng. %V 11 %P 784-794 %G eng %0 Conference Paper %B Data Types and Persistence (Appin), Informal Proceedings %D 1985 %T The Type System of Galileo %A Antonio Albano %A Fosca Giannotti %A Renzo Orsini %A Dino Pedreschi %B Data Types and Persistence (Appin), Informal Proceedings %P 175-195 %G eng %0 Conference Paper %B Data Types and Persistence (Appin) %D 1985 %T The Type System of Galileo %A Antonio Albano %A Fosca Giannotti %A Renzo Orsini %A Dino Pedreschi %B Data Types and Persistence (Appin) %P 101-119 %G eng