TY  - CONF
T1  - AUC-based Selective Classification
T2  - International Conference on Artificial Intelligence and Statistics, 25-27 April 2023, Palau de Congressos, Valencia, Spain
Y1  - 2023
A1  - Andrea Pugnana
A1  - Salvatore Ruggieri
AB  - Selective classification (or classification with a reject option) pairs a classifier with a selection function to determine whether or not a prediction should be accepted. This framework trades off coverage (probability of accepting a prediction) with predictive performance, typically measured by distributive loss functions. In many application scenarios, such as credit scoring, performance is instead measured by ranking metrics, such as the Area Under the ROC Curve (AUC). We propose a model-agnostic approach to associate a selection function to a given probabilistic binary classifier. The approach is specifically targeted at optimizing the AUC. We provide both theoretical justifications and a novel algorithm, called AUCROSS, to achieve such a goal. Experiments show that our method succeeds in trading off coverage for AUC, improving over existing selective classification methods targeted at optimizing accuracy.
JF  - International Conference on Artificial Intelligence and Statistics, 25-27 April 2023, Palau de Congressos, Valencia, Spain
PB  - PMLR
UR  - https://proceedings.mlr.press/v206/pugnana23a.html
ER  - 

TY  - CONF
T1  - A Model-Agnostic Heuristics for Selective Classification
T2  - Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, Washington, DC, USA, February 7-14, 2023
Y1  - 2023
A1  - Andrea Pugnana
A1  - Salvatore Ruggieri
AB  - Selective classification (also known as classification with reject option) conservatively extends a classifier with a selection function to determine whether or not a prediction should be accepted (i.e., trusted, used, deployed). This is a highly relevant issue in socially sensitive tasks, such as credit scoring. State-of-the-art approaches rely on Deep Neural Networks (DNNs) that train at the same time both the classifier and the selection function. These approaches are model-specific and computationally expensive. We propose a model-agnostic approach, as it can work with any base probabilistic binary classification algorithm, and it can be scalable to large tabular datasets if the base classifier is so. The proposed algorithm, called SCROSS, exploits a cross-fitting strategy and theoretical results for quantile estimation to build the selection function. Experiments on real-world data show that SCROSS improves over existing methods.
JF  - Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, Washington, DC, USA, February 7-14, 2023
PB  - AAAI Press
UR  - https://doi.org/10.1609/aaai.v37i8.26133
ER  - 

TY  - JOUR
T1  - Methods and tools for causal discovery and causal inference
JF  - WIREs Data Mining Knowl. Discov.
Y1  - 2022
A1  - Ana Rita Nogueira
A1  - Andrea Pugnana
A1  - Salvatore Ruggieri
A1  - Dino Pedreschi
A1  - João Gama
AB  - Causality is a complex concept, which roots its developments across several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of correlation-based Machine Learning systems. Causality research can generally be divided into two main branches, that is, causal discovery and causal inference. The former focuses on obtaining causal knowledge directly from observational data. The latter aims to estimate the impact deriving from a change of a certain variable over an outcome of interest. This article aims at covering several methodologies that have been developed for both tasks. This survey does not only focus on theoretical aspects but also provides a practical toolkit for interested researchers and practitioners, including software, datasets, and running examples.
VL  - 12
ER  - 

TY  - JOUR
T1  - Estimating the Total Volume of Queries to a Search Engine
JF  - IEEE Transactions on Knowledge and Data Engineering
Y1  - 2021
A1  - F. Lillo
A1  - Salvatore Ruggieri
AB  - We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows a Zipf's law, and that the observed sample volumes are biased according to three possible scenarios. These assumptions are consistent with empirical data, with keyword research practices, and with approximate algorithms used to take counts of query frequencies. A few estimators of the parameters of the distribution are devised and experimented, based on the nature of the empirical/simulated data. We apply the methods on the domain of recipes and cooking queries searched in Italian in 2017. The observed volumes of sample queries are collected from Google Trends (continuous data) and SearchVolume (binned data). The estimated total number of queries and total volume are computed for the two cases, and the results are compared and discussed.
UR  - https://ieeexplore.ieee.org/abstract/document/9336245
ER  - 

TY  - JOUR
T1  - Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
Y1  - 2021
A1  - Mirco Nanni
A1  - Andrienko, Gennady
A1  - Barabasi, Albert-Laszlo
A1  - Boldrini, Chiara
A1  - Bonchi, Francesco
A1  - Cattuto, Ciro
A1  - Chiaromonte, Francesca
A1  - Comandé, Giovanni
A1  - Conti, Marco
A1  - Coté, Mark
A1  - Dignum, Frank
A1  - Dignum, Virginia
A1  - Domingo-Ferrer, Josep
A1  - Ferragina, Paolo
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Helbing, Dirk
A1  - Kaski, Kimmo
A1  - Kertész, János
A1  - Lehmann, Sune
A1  - Lepri, Bruno
A1  - Lukowicz, Paul
A1  - Matwin, Stan
A1  - Jiménez, David Megías
A1  - Anna Monreale
A1  - Morik, Katharina
A1  - Oliver, Nuria
A1  - Passarella, Andrea
A1  - Passerini, Andrea
A1  - Dino Pedreschi
A1  - Pentland, Alex
A1  - Pianesi, Fabio
A1  - Francesca Pratesi
A1  - S Rinzivillo
A1  - Salvatore Ruggieri
A1  - Siebes, Arno
A1  - Torra, Vicenc
A1  - Roberto Trasarti
A1  - Hoven, Jeroen van den
A1  - Vespignani, Alessandro
AB  - The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates—if and when they want and for specific aims—with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
SN  - 1572-8439
UR  - https://link.springer.com/article/10.1007/s10676-020-09572-w
JO  - Ethics and Information Technology
ER  - 

TY  - JOUR
T1  - Bias in data-driven artificial intelligence systems—An introductory survey
JF  - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Y1  - 2020
A1  - Ntoutsi, Eirini
A1  - Fafalios, Pavlos
A1  - Gadiraju, Ujwal
A1  - Iosifidis, Vasileios
A1  - Nejdl, Wolfgang
A1  - Vidal, Maria-Esther
A1  - Salvatore Ruggieri
A1  - Franco Turini
A1  - Papadopoulos, Symeon
A1  - Krasanakis, Emmanouil
A1  - others
AB  - Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multidisciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well‐grounded in a legal frame. In this survey, we focus on data‐driven AI, as a large part of AI is powered nowadays by (big) data and powerful machine learning algorithms. If otherwise not specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features such as race, sex, and so forth.
VL  - 10
UR  - https://onlinelibrary.wiley.com/doi/full/10.1002/widm.1356
ER  - 

TY  - JOUR
T1  - Causal inference for social discrimination reasoning
Y1  - 2020
A1  - Qureshi, Bilal
A1  - Kamiran, Faisal
A1  - Karim, Asim
A1  - Salvatore Ruggieri
A1  - Dino Pedreschi
AB  - The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.
VL  - 54
SN  - 1573-7675
UR  - https://link.springer.com/article/10.1007/s10844-019-00580-x
JO  - Journal of Intelligent Information Systems
ER  - 

TY  - CONF
T1  - Explaining Sentiment Classification with Synthetic Exemplars and Counter-Exemplars
T2  - Discovery Science
Y1  - 2020
A1  - Lampridis, Orestis
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
ED  - Appice, Annalisa
ED  - Tsoumakas, Grigorios
ED  - Manolopoulos, Yannis
ED  - Matwin, Stan
AB  - We present xspells, a model-agnostic local approach for explaining the decisions of a black box model for sentiment classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain. The latter are examples classified with a different label (a form of counter-factuals). Both are close in meaning to the text to explain, and both are meaningful sentences – albeit they are synthetically generated. xspells generates neighbors of the text to explain in a latent space using Variational Autoencoders for encoding text and decoding latent instances. A decision tree is learned from randomly generated neighbors, and used to drive the selection of the exemplars and counter-exemplars. We report experiments on two datasets showing that xspells outperforms the well-known lime method in terms of quality of explanations, fidelity, and usefulness, and that it is comparable to it in terms of stability.
JF  - Discovery Science
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-030-61527-7
UR  - https://link.springer.com/chapter/10.1007/978-3-030-61527-7_24
ER  - 

TY  - JOUR
T1  - Causal inference for social discrimination reasoning
JF  - Journal of Intelligent Information Systems
Y1  - 2019
A1  - Qureshi, Bilal
A1  - Kamiran, Faisal
A1  - Karim, Asim
A1  - Salvatore Ruggieri
A1  - Dino Pedreschi
AB  - The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.
UR  - https://link.springer.com/article/10.1007/s10844-019-00580-x
ER  - 

TY  - JOUR
T1  - Factual and Counterfactual Explanations for Black Box Decision Making
JF  - IEEE Intelligent Systems
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Fosca Giannotti
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of artificial intelligence (AI) in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method, providing faithful explanations of the decision made by a black box classifier on a specific instance. The proposed method first learns an interpretable, local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. Then, it derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons of the decision, and a set of counterfactuals, suggesting the changes in the instance features that would lead to a different outcome. Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and of the accuracy in mimicking the black box.
UR  - https://ieeexplore.ieee.org/abstract/document/8920138
ER  - 

TY  - CONF
T1  - Meaningful explanations of Black Box AI decision systems
T2  - Proceedings of the AAAI Conference on Artificial Intelligence
Y1  - 2019
A1  - Dino Pedreschi
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box.
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
UR  - https://aaai.org/ojs/index.php/AAAI/article/view/5050
ER  - 

TY  - CONF
T1  - On The Stability of Interpretable Models
T2  - 2019 International Joint Conference on Neural Networks (IJCNN)
Y1  - 2019
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
AB  - Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model's construction may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise awareness and attention of the scientific community on the need of a stability impact assessment of interpretable models.
JF  - 2019 International Joint Conference on Neural Networks (IJCNN)
PB  - IEEE
UR  - https://ieeexplore.ieee.org/abstract/document/8852158
ER  - 

TY  - JOUR
T1  - Assessing the Stability of Interpretable Models
JF  - arXiv preprint arXiv:1810.09352
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Salvatore Ruggieri
AB  - Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.
ER  - 

TY  - CHAP
T1  - How Data Mining and Machine Learning Evolved from Relational Data Base to Data Science
T2  - A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years
Y1  - 2018
A1  - Amato, G.
A1  - Candela, L.
A1  - Castelli, D.
A1  - Esuli, A.
A1  - Falchi, F.
A1  - Gennaro, C.
A1  - Fosca Giannotti
A1  - Anna Monreale
A1  - Mirco Nanni
A1  - Pagano, P.
A1  - Luca Pappalardo
A1  - Dino Pedreschi
A1  - Francesca Pratesi
A1  - Rabitti, F.
A1  - S Rinzivillo
A1  - Giulio Rossetti
A1  - Salvatore Ruggieri
A1  - Sebastiani, F.
A1  - Tesconi, M.
ED  - Flesca, Sergio
ED  - Greco, Sergio
ED  - Masciari, Elio
ED  - Saccà, Domenico
AB  - During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.
JF  - A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years
PB  - Springer International Publishing
CY  - Cham
SN  - 978-3-319-61893-7
UR  - https://link.springer.com/chapter/10.1007%2F978-3-319-61893-7_17
ER  - 

TY  - RPRT
T1  - Local Rule-Based Explanations of Black Box Decision Systems
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Dino Pedreschi
A1  - Franco Turini
A1  - Fosca Giannotti
JF  - arXiv preprint arXiv:1805.10820
ER  - 

TY  - RPRT
T1  - Open the Black Box Data-Driven Explanation of Black Box Decision Systems
Y1  - 2018
A1  - Dino Pedreschi
A1  - Fosca Giannotti
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Luca Pappalardo
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - arXiv preprint arXiv:1806.09936
ER  - 

TY  - JOUR
T1  - A survey of methods for explaining black box models
JF  - ACM computing surveys (CSUR)
Y1  - 2018
A1  - Riccardo Guidotti
A1  - Anna Monreale
A1  - Salvatore Ruggieri
A1  - Franco Turini
A1  - Fosca Giannotti
A1  - Dino Pedreschi
AB  - In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
VL  - 51
UR  - https://dl.acm.org/doi/abs/10.1145/3236009
ER  - 

TY  - JOUR
T1  - Efficiently Clustering Very Large Attributed Graphs
JF  - arXiv preprint arXiv:1703.08590
Y1  - 2017
A1  - Alessandro Baroni
A1  - Conte, Alessio
A1  - Patrignani, Maurizio
A1  - Salvatore Ruggieri
AB  - Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible both with categorical and with quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess in advance the number of clusters. SToC relies on well known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large scale attributed graphs.
ER  - 

TY  - CONF
T1  - Enumerating Distinct Decision Trees
T2  - International Conference on Machine Learning
Y1  - 2017
A1  - Salvatore Ruggieri
AB  - The search space for the feature selection problem in decision tree learning is the lattice of subsets of the available features. We provide an exact enumeration procedure of the subsets that lead to all and only the distinct decision trees. The procedure can be adopted to prune the search space of complete and heuristics search methods in wrapper models for feature selection. Based on this, we design a computational optimization of the sequential backward elimination heuristics with a performance improvement of up to 100X.
JF  - International Conference on Machine Learning
UR  - http://proceedings.mlr.press/v70/ruggieri17a.html
ER  - 

TY  - JOUR
T1  - Segregation discovery in a social network of companies
JF  - Journal of Intelligent Information Systems
Y1  - 2017
A1  - Alessandro Baroni
A1  - Salvatore Ruggieri
AB  - We introduce a framework for the data-driven analysis of social segregation of minority groups, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem is introduced, which consists of searching sub-groups of population and minorities for which a segregation index is above a minimum threshold. A search algorithm is devised that solves the segregation problem by computing a multi-dimensional data cube that can be explored by the analyst. The machinery underlying the search algorithm relies on frequent itemset mining concepts and tools. The framework is challenged on a case study in the context of company networks. We analyse segregation on the grounds of sex and age for directors in the boards of the Italian companies. The network includes 2.15M companies and 3.63M directors.
UR  - https://doi.org/10.1007/s10844-017-0485-0
ER  - 

TY  - JOUR
T1  - Big Data Research in Italy: A Perspective
JF  - Engineering
Y1  - 2016
A1  - Sonia Bergamaschi
A1  - Emanuele Carlini
A1  - Michelangelo Ceci
A1  - Barbara Furletti
A1  - Fosca Giannotti
A1  - Donato Malerba
A1  - Mario Mezzanzanica
A1  - Anna Monreale
A1  - Gabriella Pasi
A1  - Dino Pedreschi
A1  - Raffaele Perego
A1  - Salvatore Ruggieri
AB  - The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains.
VL  - 2
UR  - http://engineering.org.cn/EN/abstract/article_12288.shtml
ER  - 

TY  - JOUR
T1  - Causal Discrimination Discovery Through Propensity Score Analysis
JF  - arXiv preprint arXiv:1608.03735
Y1  - 2016
A1  - Qureshi, Bilal
A1  - Kamiran, Faisal
A1  - Karim, Asim
A1  - Salvatore Ruggieri
AB  - Social discrimination is considered illegal and unethical in the modern world. Such discrimination is often implicit in observed decisions' datasets, and anti-discrimination organizations seek to discover cases of discrimination and to understand the reasons behind them. Previous work in this direction adopted simple observational data analysis; however, this can produce biased results due to the effect of confounding variables. In this paper, we propose a causal discrimination discovery and understanding approach based on propensity score analysis. The propensity score is an effective statistical tool for filtering out the effect of confounding variables. We employ propensity score weighting to balance the distribution of individuals from protected and unprotected groups w.r.t. the confounding variables. For each individual in the dataset, we quantify its causal discrimination or favoritism with a neighborhood-based measure calculated on the balanced distributions. Subsequently, the causal discrimination/favoritism patterns are understood by learning a regression tree. Our approach avoids common pitfalls in observational data analysis and makes its results legally admissible. We demonstrate the results of our approach on two discrimination datasets.
UR  - https://arxiv.org/abs/1608.03735
ER  - 

TY  - CONF
T1  - Classification Rule Mining Supported by Ontology for Discrimination Discovery
T2  - Data Mining Workshops (ICDMW), 2016 IEEE 16th International Conference on
Y1  - 2016
A1  - Luong, Binh Thanh
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - Discrimination discovery from data consists of designing data mining methods for the actual discovery of discriminatory situations and practices hidden in a large amount of historical decision records. Approaches based on classification rule mining consider items at a flat concept level, with no exploitation of background knowledge on the hierarchical and inter-relational structure of domains. On the other hand, ontologies are a widespread and ever increasing means for expressing such knowledge. In this paper, we propose a framework for discrimination discovery from ontologies, where contexts of prima-facie evidence of discrimination are summarized in the form of generalized classification rules at different levels of abstraction. Throughout the paper, we adopt a motivating and intriguing case study based on discriminatory tariffs applied by the U.S. Harmonized Tariff Schedules on imported goods.
JF  - Data Mining Workshops (ICDMW), 2016 IEEE 16th International Conference on
PB  - IEEE
ER  - 

TY  - CONF
T1  - A KDD process for discrimination discovery
T2  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Y1  - 2016
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - The acceptance of analytical methods for discrimination discovery by practitioners and legal scholars can be only achieved if the data mining and machine learning communities will be able to provide case studies, methodological refinements, and the consolidation of a KDD process. We summarize here an approach along these directions.
JF  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
PB  - Springer International Publishing
ER  - 

TY  - JOUR
T1  - Introduction to the special issue on Artificial Intelligence for Society and Economy
JF  - Intelligenza Artificiale
Y1  - 2015
A1  - Salvatore Ruggieri
VL  - 9
ER  - 

TY  - CONF
T1  - The layered structure of company share networks
T2  - Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
Y1  - 2015
A1  - Andrea Romei
A1  - Salvatore Ruggieri
A1  - Franco Turini
AB  - We present a framework for the analysis of corporate governance problems using network science and graph algorithms on ownership networks. In such networks, nodes model companies/shareholders and edges model shares owned. Inspired by the widespread pyramidal organization of corporate groups of companies, we model ownership networks as layered graphs, and exploit the layered structure to design feasible and efficient solutions to three key problems of corporate governance. The first one is the long-standing problem of computing direct and indirect ownership (integrated ownership problem). The other two problems are introduced here: computing direct and indirect dividends (dividend problem), and computing the group of companies controlled by a parent shareholder (corporate group problem). We conduct an extensive empirical analysis of the Italian ownership network, which, with its 3.9M nodes, is 30× the largest network studied so far.
JF  - Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
PB  - IEEE
ER  - 

TY  - CONF
T1  - Segregation Discovery in a Social Network of Companies
T2  - International Symposium on Intelligent Data Analysis
Y1  - 2015
A1  - Alessandro Baroni
A1  - Salvatore Ruggieri
AB  - We introduce a framework for a data-driven analysis of segregation of minority groups in social networks, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem consists of searching sub-graphs and sub-groups for which a reference segregation index is above a minimum threshold. A search algorithm is devised that solves the segregation problem. The framework is challenged on the analysis of segregation of social groups in the boards of directors of the real and large network of Italian companies connected through shared directors.
JF  - International Symposium on Intelligent Data Analysis
PB  - Springer, Cham
ER  - 

TY  - CONF
T1  - Anti-discrimination analysis using privacy attack strategies
T2  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Y1  - 2014
A1  - Salvatore Ruggieri
A1  - Sara Hajian
A1  - Kamiran, Faisal
A1  - Zhang, Xiangliang
AB  - Social discrimination discovery from data is an important task to identify illegal and unethical discriminatory patterns towards protected-by-law groups, e.g., ethnic minorities. We deploy privacy attack strategies as tools for discrimination discovery under hard assumptions which have rarely been tackled in the literature: indirect discrimination discovery, privacy-aware discrimination discovery, and discrimination data recovery. The intuition comes from the intriguing parallel between the role of the anti-discrimination authority in the three scenarios above and the role of an attacker in private data publishing. We design strategies and algorithms inspired by or based on Fréchet bounds attacks, attribute inference attacks, and minimality attacks for the purpose of unveiling hidden discriminatory practices. Experimental results show that they can be effective tools in the hands of anti-discrimination authorities.
JF  - Joint European Conference on Machine Learning and Knowledge Discovery in Databases
PB  - Springer, Berlin, Heidelberg
ER  - 

TY  - JOUR
T1  - On the complexity of quantified linear systems
JF  - Theoretical Computer Science
Y1  - 2014
A1  - Salvatore Ruggieri
A1  - Eirinakis, Pavlos
A1  - Subramani, K
A1  - Wojciechowski, Piotr
AB  - In this paper, we explore the computational complexity of the conjunctive fragment of the first-order theory of linear arithmetic. Quantified propositional formulas of linear inequalities with (k−1) quantifier alternations are log-space complete in $\Sigma_k^P$ or $\Pi_k^P$ depending on the initial quantifier. We show that when we restrict ourselves to quantified conjunctions of linear inequalities, i.e., quantified linear systems, the complexity classes collapse to polynomial time. In other words, the presence of universal quantifiers does not alter the complexity of the linear programming problem, which is known to be in P. Our result reinforces the importance of sentence formats from the perspective of computational complexity.
VL  - 518
ER  - 

TY  - JOUR
T1  - Decision tree building on multi-core using FastFlow
JF  - Concurrency and Computation: Practice and Experience
Y1  - 2014
A1  - Aldinucci, Marco
A1  - Salvatore Ruggieri
A1  - Torquati, Massimo
AB  - The whole computer hardware industry has embraced multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 algorithm for multi-core architectures. We characterise elapsed time lower bounds for the forms of parallelisations adopted and achieve close to optimal performance. Our implementation is based on the FastFlow parallel programming environment, and it requires minimal changes to the original sequential code.
VL  - 26
ER  - 

TY  - JOUR
T1  - Introduction to special issue on computational methods for enforcing privacy and fairness in the knowledge society
JF  - Artificial Intelligence and Law
Y1  - 2014
A1  - Sergio Mascetti
A1  - Ricci, Annarita
A1  - Salvatore Ruggieri
VL  - 22
ER  - 

TY  - JOUR
T1  - A multidisciplinary survey on discrimination analysis
JF  - The Knowledge Engineering Review
Y1  - 2014
A1  - Andrea Romei
A1  - Salvatore Ruggieri
AB  - The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed in the last 50 years in the economic, legal, statistical, and, recently, in the data mining literature. This is not surprising, since discrimination analysis is a multidisciplinary problem, involving sociological causes, legal argumentations, economic models, statistical techniques, and computational issues. The objective of this survey is to provide a guidance and a glue for researchers and anti-discrimination data analysts on concepts, problems, application areas, datasets, methods, and approaches from a multidisciplinary perspective. We organize the approaches according to their method of data collection as observational, quasi-experimental, and experimental studies. A fourth line of recently blooming research on knowledge discovery based methods is also covered. Observational methods are further categorized on the basis of their application context: labor economics, social profiling, consumer markets, and others.
VL  - 29
ER  - 

TY  - JOUR
T1  - On quantified linear implications
JF  - Annals of Mathematics and Artificial Intelligence
Y1  - 2014
A1  - Eirinakis, Pavlos
A1  - Salvatore Ruggieri
A1  - Subramani, K
A1  - Wojciechowski, Piotr
AB  - A Quantified Linear Implication (QLI) is an inclusion query over two polyhedral sets, with a quantifier string that specifies which variables are existentially quantified and which are universally quantified. Equivalently, it can be viewed as a quantified implication of two systems of linear inequalities. In this paper, we provide a 2-person game semantics for the QLI problem, which allows us to explore the computational complexities of several of its classes. More specifically, we prove that the decision problem for QLIs with an arbitrary number of quantifier alternations is PSPACE-hard. Furthermore, we explore the computational complexities of several classes of 0, 1, and 2-quantifier alternation QLIs. We observed that some classes are decidable in polynomial time, some are NP-complete, some are coNP-hard and some are $\Pi_2^P$-hard. We also establish the hardness of QLIs with 2 or more quantifier alternations with respect to the first quantifier in the quantifier string and the number of quantifier alternations. All the proofs that we provide for polynomially solvable problems are constructive, i.e., polynomial-time decision algorithms are devised that utilize well-known procedures. QLIs can be utilized as powerful modelling tools for real-life applications. Such applications include reactive systems, real-time schedulers, and static program analyzers.
VL  - 71
ER  - 

TY  - JOUR
T1  - Using t-closeness anonymity to control for non-discrimination
JF  - Trans. Data Privacy
Y1  - 2014
A1  - Salvatore Ruggieri
AB  - We investigate the relation between t-closeness, a well-known model of data anonymization against attribute disclosure, and α-protection, a model of the social discrimination hidden in data. We show that t-closeness implies bd_f(t)-protection, for a bound function bd_f() depending on the discrimination measure f() at hand. This allows us to adapt inference control methods, such as the Mondrian multidimensional generalization technique and the Sabre bucketization and redistribution framework, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and non-discrimination research in data protection.
VL  - 7
UR  - http://dl.acm.org/citation.cfm?id=2870623
ER  - 

TY  - CONF
T1  - Data Anonymity Meets Non-discrimination
T2  - Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Y1  - 2013
A1  - Salvatore Ruggieri
JF  - Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
PB  - IEEE
ER  - 

TY  - CHAP
T1  - The discovery of discrimination
T2  - Discrimination and privacy in the information society
Y1  - 2013
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - Discrimination and privacy in the information society
PB  - Springer
ER  - 

TY  - JOUR
T1  - Discrimination discovery in scientific project evaluation: A case study
JF  - Expert Systems with Applications
Y1  - 2013
A1  - Andrea Romei
A1  - Salvatore Ruggieri
A1  - Franco Turini
VL  - 40
ER  - 

TY  - CONF
T1  - Learning from polyhedral sets
T2  - Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Y1  - 2013
A1  - Salvatore Ruggieri
JF  - Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
PB  - AAAI Press
ER  - 

TY  - CONF
T1  - Who/Where Are My New Customers?
T2  - ISMIS Industrial Session
Y1  - 2011
A1  - S Rinzivillo
A1  - Salvatore Ruggieri
JF  - ISMIS Industrial Session
ER  - 

TY  - CONF
T1  - Integrating induction and deduction for finding evidence of discrimination
T2  - ICAIL
Y1  - 2009
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - ICAIL
ER  - 

TY  - CONF
T1  - Measuring Discrimination in Socially-Sensitive Decision Records
T2  - SDM
Y1  - 2009
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - SDM
ER  - 

TY  - CONF
T1  - A Case Study in Sequential Pattern Mining for IT-Operational Risk
T2  - ECML/PKDD (1)
Y1  - 2008
A1  - Valerio Grossi
A1  - Andrea Romei
A1  - Salvatore Ruggieri
JF  - ECML/PKDD (1)
ER  - 

TY  - CONF
T1  - Discrimination-aware data mining
T2  - KDD
Y1  - 2008
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Franco Turini
JF  - KDD
ER  - 

TY  - CONF
T1  - Typing Linear Constraints for Moding CLP(R) Programs
T2  - SAS
Y1  - 2008
A1  - Salvatore Ruggieri
A1  - Frédéric Mesnard
JF  - SAS
ER  - 

TY  - JOUR
T1  - Bounded Nondeterminism of Logic Programs
JF  - Ann. Math. Artif. Intell.
Y1  - 2004
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 42
ER  - 

TY  - CONF
T1  - Characterisations of Termination in Logic Programming
T2  - Program Development in Computational Logic
Y1  - 2004
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Jan-Georg Smaus
JF  - Program Development in Computational Logic
ER  - 

TY  - JOUR
T1  - On logic programs that always succeed
JF  - Sci. Comput. Program.
Y1  - 2003
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 48
ER  - 

TY  - JOUR
T1  - Classes of terminating logic programs
JF  - TPLP
Y1  - 2002
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Jan-Georg Smaus
VL  - 2
ER  - 

TY  - CONF
T1  - Negation as Failure through Abduction: Reasoning about Termination
T2  - Computational Logic: Logic Programming and Beyond
Y1  - 2002
A1  - Paolo Mancarella
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
JF  - Computational Logic: Logic Programming and Beyond
ER  - 

TY  - JOUR
T1  - Classes of Terminating Logic Programs
JF  - CoRR
Y1  - 2001
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
A1  - Jan-Georg Smaus
VL  - cs.LO/0106
ER  - 

TY  - CONF
T1  - Data Mining for Intelligent Web Caching
T2  - ITCC
Y1  - 2001
A1  - Francesco Bonchi
A1  - Fosca Giannotti
A1  - Giuseppe Manco
A1  - Chiara Renso
A1  - Mirco Nanni
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
JF  - ITCC
ER  - 

TY  - JOUR
T1  - Web Log Data Warehousing and Mining for Intelligent Web Caching
JF  - Data and Knowledge Engineering
Y1  - 2001
A1  - Francesco Bonchi
A1  - Fosca Giannotti
A1  - Cristian Gozzi
A1  - Giuseppe Manco
A1  - Mirco Nanni
A1  - Dino Pedreschi
A1  - Chiara Renso
A1  - Salvatore Ruggieri
N1  - 39:165, November.
ER  - 

TY  - JOUR
T1  - Web log data warehousing and mining for intelligent web caching
JF  - Data Knowl. Eng.
Y1  - 2001
A1  - Francesco Bonchi
A1  - Fosca Giannotti
A1  - Cristian Gozzi
A1  - Giuseppe Manco
A1  - Mirco Nanni
A1  - Dino Pedreschi
A1  - Chiara Renso
A1  - Salvatore Ruggieri
VL  - 39
ER  - 

TY  - CONF
T1  - Bounded Nondeterminism of Logic Programs
T2  - ICLP
Y1  - 1999
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
JF  - ICLP
ER  - 

TY  - JOUR
T1  - On Logic Programs That Do Not Fail
JF  - Electr. Notes Theor. Comput. Sci.
Y1  - 1999
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 30
ER  - 

TY  - JOUR
T1  - Verification of Logic Programs
JF  - J. Log. Program.
Y1  - 1999
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 39
ER  - 

TY  - JOUR
T1  - A Mediator Approach for Representing Knowledge
JF  - Intelligent Multimedia Presentation Systems. Human Computer Interaction Letters, 1 (1): 32-38, April 1998.
Y1  - 1998
A1  - Chiara Renso
A1  - Salvatore Ruggieri
ER  - 

TY  - JOUR
T1  - Weakest Preconditions for Pure Prolog Programs
JF  - Inf. Process. Lett.
Y1  - 1998
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 67
ER  - 

TY  - JOUR
T1  - Verification of Meta-Interpreters
JF  - J. Log. Comput.
Y1  - 1997
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
VL  - 7
ER  - 

TY  - CONF
T1  - A Case Study in Logic Program Verification: the Vanilla Metainterpreter
T2  - GULP-PRODE
Y1  - 1995
A1  - Dino Pedreschi
A1  - Salvatore Ruggieri
JF  - GULP-PRODE
ER  -