Tutorial


How to follow the conference

To attend the conference, please use the Zoom Client for Meetings desktop application (version 5.3.1 or higher).

1. Discovering Gender Bias and Discrimination in Language
2. Learning to Quantify: Supervised Prevalence Estimation for Computational Social Science
3. Social Information for emergency response : Actionable information from unconventional data sources

Discovering Gender Bias and Discrimination in Language

The tutorial focuses on digital discrimination, particularly with respect to gender. Its main goal is to help participants improve their digital literacy by understanding the social issues at stake in digital (gender) discrimination and by learning about technical applications and solutions. The tutorial is divided into four parts that iterate twice through the social and technical dimensions. We draw on our own research in language modelling and word embeddings to clarify how human gender biases may be incorporated into AI/ML models.

We first offer a short introduction to digital discrimination and (gender) bias. We give examples of gender discrimination in the field of AI/ML, and discuss the clear gender binary (M/F) that is presupposed when dealing with computational bias towards gender. We then move to a technical perspective, introducing the DADD Language Bias Visualiser, which allows us to discover and analyse gender bias using word embeddings. Finally, we show how computational models of bias and discrimination are built on implicit binaries, and discuss with participants the difficulties these assumptions raise in times of post-binary gender attribution.

The tutorial will also include pre- and post-tutorial questionnaires, which are intended to track participants’ digital discrimination literacy as well as the explanatory value of the tutorial. The tutorial comes from our UK EPSRC-funded project Discovering and Attesting Digital Discrimination (DADD), a cross-disciplinary project at King’s College London that addresses research questions on digital discrimination and involves academic partners (Computer Science, Digital Humanities, Law and Ethics), non-academic partners (Google, AI Club), and the general public, including technical and non-technical users.
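As a minimal illustration of how gender bias can surface in word embeddings, the sketch below scores a word by how much closer its vector lies to "she" than to "he" in cosine similarity. The three-dimensional vectors and the function names (`cosine`, `gender_bias_score`) are made up for illustration. This is not the method used by the DADD Language Bias Visualiser, and real embeddings have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_bias_score(word_vec, she_vec, he_vec):
    """Positive: the word sits closer to 'she'; negative: closer to 'he'."""
    return cosine(word_vec, she_vec) - cosine(word_vec, he_vec)


# Hypothetical toy embeddings, invented for this example only.
TOY_EMBEDDINGS = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.9, 0.1, 0.5],
    "nurse": [0.1, 0.9, 0.5],
    "teacher": [0.5, 0.5, 0.5],
}

if __name__ == "__main__":
    she, he = TOY_EMBEDDINGS["she"], TOY_EMBEDDINGS["he"]
    for word in ("engineer", "nurse", "teacher"):
        score = gender_bias_score(TOY_EMBEDDINGS[word], she, he)
        print(f"{word}: {score:+.3f}")
```

Note that a score of this kind presupposes exactly the M/F binary that the tutorial questions: the measurement only exists once two poles have been chosen.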

Organizers:

Mark Coté, King’s College London, UK - mark.cote@kcl.ac.uk
Xavier Ferrer Aran, King’s College London, UK - xavier.ferrer_aran@kcl.ac.uk
Tom van Nuenen, King’s College London, UK - tom.van_nuenen@kcl.ac.uk

Learning to Quantify: Supervised Prevalence Estimation for Computational Social Science

What is supervised prevalence estimation (or: “quantification”), exactly? In a nutshell: classification stands to quantification as individual unlabelled data items stand to entire sets of such items. Learning to quantify thus means learning to predict how many (as opposed to which) unlabelled data items, in a sample of such items, belong to a given class. Obviously, quantification might be solved by classifying all the items in the sample and counting how many of them have been assigned to the class of interest. However, there are both theoretical arguments and experimental results showing that this “classify and count” method leads to suboptimal quantification accuracy: much better accuracy may be achieved by using, instead of standard classification technology, supervised learning methods and algorithms that have been explicitly designed for quantification. Social science (SS) is a discipline that is inherently interested not in individual data but in aggregate data. This tutorial thus aims to make computational SS researchers aware that, when they are using classification technology, their research would almost always benefit from using quantification technology instead. The tutorial will introduce attendees to the main supervised learning techniques that have been proposed for solving quantification, to the metrics used to evaluate these techniques, and to some off-the-shelf, publicly available software packages that implement them.
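To make the contrast concrete, here is a small sketch in plain Python comparing “classify and count” with one well-known correction from the quantification literature, adjusted classify and count, which rescales the raw count using the classifier's true and false positive rates estimated on labelled data. The toy threshold classifier, the Gaussian synthetic data, and all function names are assumptions made for this example. The text above does not say which methods or packages the tutorial itself uses.

```python
import random


def threshold_classifier(x, threshold=1.0):
    """Toy classifier: predicts the positive class (1) when x exceeds the threshold."""
    return 1 if x > threshold else 0


def classify_and_count(items, clf):
    """Estimate prevalence as the fraction of items the classifier labels positive."""
    return sum(clf(x) for x in items) / len(items)


def estimate_rates(items, labels, clf):
    """Estimate the true and false positive rates of clf on labelled data."""
    pos = [x for x, y in zip(items, labels) if y == 1]
    neg = [x for x, y in zip(items, labels) if y == 0]
    tpr = sum(clf(x) for x in pos) / len(pos)
    fpr = sum(clf(x) for x in neg) / len(neg)
    return tpr, fpr


def adjusted_classify_and_count(items, clf, tpr, fpr):
    """Correct the raw count: p = (cc - fpr) / (tpr - fpr), clipped to [0, 1]."""
    cc = classify_and_count(items, clf)
    if tpr == fpr:  # classifier carries no signal; no correction is possible
        return cc
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


def make_sample(n, prevalence, rng):
    """Synthetic data: positives ~ N(2, 1), negatives ~ N(0, 1)."""
    n_pos = round(n * prevalence)
    labels = [1] * n_pos + [0] * (n - n_pos)
    items = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]
    return items, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Labelled data has 50% positives, the unlabelled sample only 10%.
    train_x, train_y = make_sample(20000, 0.5, rng)
    test_x, _ = make_sample(20000, 0.1, rng)
    tpr, fpr = estimate_rates(train_x, train_y, threshold_classifier)
    cc = classify_and_count(test_x, threshold_classifier)
    acc = adjusted_classify_and_count(test_x, threshold_classifier, tpr, fpr)
    print("true prevalence:             0.100")
    print(f"classify and count:          {cc:.3f}")
    print(f"adjusted classify and count: {acc:.3f}")
```

The raw count is biased because false positives among the many negatives outnumber the true positives that the classifier misses. The bias grows as the prevalence in the unlabelled sample drifts away from the prevalence in the labelled data.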

Organizers:

Alejandro Moreo Fernandez, Institute of Information Science and Technologies (ISTI) of the Italian National Research Council (CNR), Pisa, Italy - alejandro.moreo@isti.cnr.it
Fabrizio Sebastiani, Institute of Information Science and Technologies (ISTI) of the Italian National Research Council (CNR), Pisa, Italy - fabrizio.sebastiani@isti.cnr.it

Requirements for participants

During the hands-on exercises of the tutorial on 'Learning to Quantify' we will be using the scikit-learn machine learning library for Python 3. Basic knowledge of the Python programming language is assumed. To follow the lesson, choose one of the following options:

Log in to Google Colab: the code you type in the browser will run on a Google server, so you don't have to install anything.
If you prefer to work locally on your own computer, please make sure the scikit-learn library is installed and working. This typically involves installing the so-called 'SciPy Stack', which includes SciPy itself, NumPy, Pandas, Matplotlib, SymPy, and IPython. The IPython package lets you write code, text, and annotations, and visualize results and plots interactively in Jupyter notebooks, which we will be using in our lecture.
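If you choose the local option, a quick way to check your setup before the session is to ask Python which of the packages listed above it can find. The sketch below uses only the standard library. The helper name `missing_packages` and the `REQUIRED` list are our own, and the names are the usual import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the packages mentioned above (an assumption about what
# the exercises need; scikit-learn is imported as "sklearn").
REQUIRED = ["sklearn", "scipy", "numpy", "pandas", "matplotlib", "sympy", "IPython"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only confirms that each package can be located. It does not import the packages or verify their versions.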

Exercises and dataset

A few useful links

Social Information for emergency response : Actionable information from unconventional data sources

Every year, more than 150 million people worldwide are affected by natural disasters. As declared by the United Nations Office for the Coordination of Humanitarian Affairs, “The first 72 hours after a disaster are crucial; response must begin during that time to save lives”. Social media has proven to be a potential source of actionable data right after a disaster happens, allowing emergency responders to better coordinate their activities. However, social media data also presents many challenges regarding data quality and geolocation. Over the years, several enabling technologies have emerged for retrieving high volumes of data, and artificial intelligence is often perceived as a potential replacement for human intelligence in data classification tasks. However, the need in some cases to deliver results within a critical response time remains a major challenge. In this tutorial we will see how crowdsourcing assisted by artificial intelligence can make a significant contribution to extracting actionable information from unconventional data sources, especially where critical thinking and decision making are needed.

The tutorial will introduce the basics of extracting and analyzing information from social media, with a specific focus on extracting images in an emergency after a natural disaster. It will cover the basics of crawling and tweet analysis. A specific focus will be given to fine-grained geolocalization of tweets and to crowdsourcing for filtering relevant images and confirming geolocations, both of which are needed to provide high-quality information. We will illustrate the experience with social media analysis, geolocalization, and crowdsourcing gained in the recently concluded H2020 project E2mC (Evolution of Emergency Copernicus services) and in the ongoing H2020 project CROWD4SDG (Citizen Science for Monitoring Climate Impacts and Achieving Climate Resilience).

The objective of the tutorial is to provide an introduction to, and hands-on experience with, some of the tools available in the field of emergency information systems, in particular: tools for searching posts, focusing on Twitter and also considering alternative social media; tools for post analysis, with text analysis techniques based on NLP and image analysis approaches for filtering images according to different criteria; and tools for setting up a crowdsourcing environment, based on the PyBossa open source tool, and for evaluating the quality of crowdsourcing results. The methods and processes for using such tools in a sudden emergency to gather different types of information to support first responders and decision makers will also be discussed.
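One simple way to evaluate crowdsourced answers, for instance when several volunteers judge whether an image is relevant, is to take the majority label per item and keep it only when enough volunteers agree. The sketch below illustrates that idea in plain Python. The function name `aggregate_votes`, the 0.6 agreement threshold, and the example labels are hypothetical. It does not use the PyBossa API, and it is not necessarily how E2mC or Crowd4SDG assess results.

```python
from collections import Counter


def aggregate_votes(votes, min_agreement=0.6):
    """Aggregate volunteer labels per item by majority vote.

    votes: dict mapping an item id to the list of labels given by volunteers.
    Returns a dict mapping each item id to (label, agreement), where label is
    None when the item has no votes or agreement falls below min_agreement.
    """
    results = {}
    for item_id, labels in votes.items():
        if not labels:
            results[item_id] = (None, 0.0)
            continue
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        accepted = label if agreement >= min_agreement else None
        results[item_id] = (accepted, agreement)
    return results


if __name__ == "__main__":
    # Hypothetical answers from volunteers for three images.
    example_votes = {
        "img1": ["relevant", "relevant", "not_relevant"],
        "img2": ["relevant", "not_relevant"],
        "img3": [],
    }
    for item_id, (label, agreement) in aggregate_votes(example_votes).items():
        print(item_id, label, f"{agreement:.2f}")
```

Items that come back without an accepted label can be sent to more volunteers or to an expert, which keeps the final set small but reliable.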

Organizers

Barbara Pernici, Politecnico di Milano, Italy - barbara.pernici@polimi.it
Jose Luis Fernandez Marquez, University of Geneva, Switzerland - joseluis.fernandez@unige.ch
Amudha Ravi Shankar, University of Geneva, Switzerland - amudha.ravishankar@unige.ch
Gabriele Scalia, Politecnico di Milano, Italy - gabriele.scalia@polimi.it

Program of the tutorial

H1 – Social media and emergency management

Amudha Ravi Shankar

Introduction to Big Crisis Data (Amudha)

Digital humanitarian initiatives

Copernicus and Emergency Mapping Service

Brief introduction to HOT

Social media: a focus on Twitter

Restrictions and ethical issues

The concept of relevance: what is actionable data?

Challenges faced by existing DH initiatives: addressing data gaps from the perspective of emergency responders; the need for unconventional data sources.

H2 – Augmenting social sensing

Barbara Pernici, Gabriele Scalia, Amudha Ravi Shankar

Introduction to the European H2020 E2mC Project

Information contained in a social media post, introduction to geoparsing

Crawling with Python and Twitter APIs, limitations, keywords

Geoparsing: CIME

Other resources: a first intro to E2mC crowdsourcing tools

Pipelines

H3 – Crowdsourcing

Jose Luis Fernandez-Marquez

Citizen Cyberlab

Intro to the H2020 Crowd4SDG Project

Crowdsourcing in emergencies: introduction to crowdsourcing tools

Crowdsourcing in E2mC

Assessing crowdsourcing results

Relevance
Geolocation
SDG solution kit (and project builder)
Need for digital humanitarians

Hands-on (all): EMS crowdsourcing, one of the COVID crowdsourcing projects, Project builder for PyBossa

Requirements and message for the participants

No special equipment or software is needed (online services will be used, a browser is sufficient)

A questionnaire will be sent to the participants before the event (please contact crowd4SDG[at]polimi.it)

Participants under 27 years old are invited to consider taking part in the Urban water resilience challenge (see #Open17Water) by presenting a short video by October 4. The tutorial will provide some tools that can be used during the challenge by participants selected for the next steps.

Tutorial Chair

Luca Pappalardo, ISTI-CNR, Italy
Marco De Nadai, FBK, Italy

Event detail

6-9 October, 2020

Virtual Conference

Contacts and social updates

You can contact us at socinfo2020[at]isti.cnr.it

Follow the conference using #SocInfo20.
