The tutorial focuses on the issue of digital discrimination, particularly with respect to gender. Its main goal is to help participants improve their digital literacy by understanding the social issues at stake in digital (gender) discrimination and by learning about technical applications and solutions. The tutorial is divided into four parts, iterating twice through the social and technical dimensions. We draw on our own research in language modelling and word embeddings to clarify how human gender biases may be incorporated into AI/ML models. We first offer a short introduction to digital discrimination and (gender) bias. We give examples of gender discrimination in the field of AI/ML, and discuss the clear gender binary (M/F) that is presupposed when dealing with computational bias towards gender. We then move to a technical perspective, introducing the DADD Language Bias Visualiser, which allows us to discover and analyse gender bias using word embeddings. Finally, we show how computational models of bias and discrimination are built on implicit binaries, and discuss with participants the difficulties these assumptions raise in times of post-binary gender attribution. The tutorial will also include pre- and post-tutorial questionnaires, which are intended to track participants’ digital discrimination literacy as well as the explanatory value of the tutorial. The tutorial comes from our UK EPSRC-funded project Discovering and Attesting Digital Discrimination (DADD), a cross-disciplinary project at King’s College London that addresses research questions on digital discrimination and involves academic partners (Computer Science, Digital Humanities, Law and Ethics), non-academic partners (Google, AI Club), and the general public, including technical and non-technical users.
What is supervised prevalence estimation (or: “quantification”), exactly? In a nutshell: classification stands to quantification as individual unlabelled data items stand to entire sets of such items. Learning to quantify thus means learning to predict how many (as opposed to which) unlabelled data items, in a sample of such items, belong to a given class. An obvious way to solve quantification is to classify all the items in the sample and count how many of them have been assigned to the class of interest. However, there are both theoretical arguments and experimental results showing that this “classify and count” method leads to suboptimal quantification accuracy: much better accuracy may be achieved by replacing standard classification technology with supervised learning methods and algorithms that have been explicitly designed for quantification. Social science (SS) is a discipline that is inherently interested not in individual data but in aggregate data. This tutorial thus aims to raise the awareness of computational SS researchers of the fact that, when they are using classification technology, their research would almost always benefit from using quantification technology instead. The tutorial will introduce attendees to the main supervised learning techniques that have been proposed for solving quantification, to the metrics used to evaluate these techniques, and to some off-the-shelf, publicly available software packages that implement them.
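To make the contrast concrete, the following is a minimal sketch in plain Python, not the tutorial's own material. It compares “classify and count” with one well-known correction, often called adjusted classify and count, which rescales the raw count using the classifier's true-positive and false-positive rates measured on held-out labelled data. The function names and the toy predictions are assumptions made for illustration only.

```python
def classify_and_count(predictions):
    """Estimate prevalence as the fraction of items predicted positive (labels are 0/1)."""
    return sum(predictions) / len(predictions)


def estimate_rates(true_labels, predictions):
    """Return (true-positive rate, false-positive rate) measured on labelled data."""
    on_positives = [p for t, p in zip(true_labels, predictions) if t == 1]
    on_negatives = [p for t, p in zip(true_labels, predictions) if t == 0]
    return sum(on_positives) / len(on_positives), sum(on_negatives) / len(on_negatives)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the raw count for the classifier's known error rates."""
    cc = classify_and_count(predictions)
    if tpr == fpr:  # classifier carries no information; no correction is possible
        return cc
    adjusted = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence


# Hypothetical held-out data: 10 positives (8 detected), 10 negatives (1 false alarm).
val_true = [1] * 10 + [0] * 10
val_pred = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9
tpr, fpr = estimate_rates(val_true, val_pred)

# Hypothetical unlabelled sample of 100 items whose true prevalence is 0.20:
# 20 positives (16 detected) and 80 negatives (8 false alarms).
sample_pred = [1] * 16 + [0] * 4 + [1] * 8 + [0] * 72

cc_estimate = classify_and_count(sample_pred)
acc_estimate = adjusted_classify_and_count(sample_pred, tpr, fpr)
print(f"classify and count: {cc_estimate:.2f}")
print(f"adjusted classify and count: {acc_estimate:.2f}")
```

In this toy setting the raw count overestimates the prevalence because the false alarms on the large negative class outnumber the missed positives, while the adjusted estimate recovers the true value. This holds only because the error rates in the sample match those measured on the held-out data.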
During the hands-on exercises of the tutorial on 'Learning to Quantify' we will be using the scikit-learn machine learning library for Python 3. Basic knowledge of the Python programming language is assumed. In order to follow the lesson, you can choose one of the following options:
Every year more than 150 million people worldwide are affected by natural disasters. As declared by the United Nations Office for the Coordination of Humanitarian Affairs, “The first 72 hours after a disaster are crucial; response must begin during that time to save lives”. Social media has proven to be a potential source of actionable data immediately after a disaster happens, allowing emergency responders to better coordinate their activities. However, social media data also presents many challenges regarding data quality and geolocation. Over the years, several enabling technologies have emerged for retrieving high volumes of data, and artificial intelligence is often perceived as a potential replacement for human intelligence in data classification tasks. However, the need in some cases to deliver results within a critical response time is a major challenge. In this tutorial we will see how crowdsourcing assisted by artificial intelligence can make a significant contribution, especially where critical thinking and decision making are needed, to extracting actionable information from unconventional data sources. The tutorial will introduce the basics of extracting and analyzing information from social media, with a specific focus on extracting images in an emergency after a natural disaster. It will cover the basics of crawling and tweet analysis, with particular attention to fine-grained geolocalization of tweets and to crowdsourcing for filtering relevant images and confirming geolocations, both of which are needed to provide high-quality information. We will illustrate the experience with social media analysis, geolocalization, and crowdsourcing gained in the recently concluded H2020 project E2mC (Evolution of Emergency Copernicus services) and in the ongoing H2020 project CROWD4SDG (Citizen Science for Monitoring Climate Impacts and Achieving Climate Resilience).
The objective of the tutorial is to provide an introduction to, and hands-on experience with, some of the tools available in the field of emergency information systems. In particular, it covers tools for searching posts, focusing on Twitter while also considering alternative social media; tools for post analysis, using NLP-based text analysis techniques and image analysis approaches that filter images according to different criteria; and tools for setting up a crowdsourcing environment based on the open-source PyBossa tool and for evaluating the quality of crowdsourcing results. The methods and processes for using such tools in a sudden emergency to gather different types of information to support first responders and decision makers will also be discussed.
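As an illustration of what evaluating the quality of crowdsourcing results can involve, here is a minimal sketch in plain Python. It assumes a simple majority vote with an agreement threshold, which is one common approach and not necessarily the one used in the tutorial. It does not call the PyBossa API; the `(task_id, answer)` input format, the function name `aggregate_votes`, and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_votes(task_runs, min_agreement=0.6):
    """Keep the majority answer per task and flag tasks with low agreement.

    task_runs: iterable of (task_id, answer) pairs (hypothetical format).
    Returns {task_id: (majority_answer, agreement, accepted)}.
    """
    answers_by_task = defaultdict(list)
    for task_id, answer in task_runs:
        answers_by_task[task_id].append(answer)

    results = {}
    for task_id, answers in answers_by_task.items():
        majority_answer, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)  # share of contributors backing the majority
        results[task_id] = (majority_answer, agreement, agreement >= min_agreement)
    return results


# Hypothetical contributions: three volunteers judged img-1, two judged img-2.
runs = [
    ("img-1", "relevant"), ("img-1", "relevant"), ("img-1", "not relevant"),
    ("img-2", "relevant"), ("img-2", "not relevant"),
]
summary = aggregate_votes(runs)
for task_id, (answer, agreement, accepted) in summary.items():
    print(task_id, answer, f"{agreement:.2f}", "accepted" if accepted else "needs review")
```

Tasks that fall below the threshold, such as the evenly split `img-2` here, could be sent to additional contributors or to an expert rather than being passed on to responders.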
Introduction to Big Crisis Data (Amudha)
Hands on (all): EMS crowdsourcing, one of the COVID crowdsourcing projects, Project builder for PyBossa