In 2000 mobile phone users accounted for 12% of the world’s population. By the end of 2014, this figure had reached 96%, i.e., 6.8 billion people. The number of mobile phones in developed countries amounts to 128% of the inhabitants and 90% in developing countries. There has been an explosive increase in the number of ways we use them through their built-in sensors, capable of recording location, acceleration, acquiring images and videos, interacting with other devices and, obviously, connecting to the internet. Probably the most unexpected and disruptive effect of the emergence of always-connected mankind is data, the digital breadcrumbs that we leave behind us. Because, thanks to these data, human activities on a global scale can be observed and, therefore, measured, quantified and, ultimately, predicted. Only fortune-tellers and consultants can predict the future without data, says the network scientist Laszlo Barabasi. But if we get enough detailed information on some phenomenon, even an unexpected or bizarre one – a black swan – then we can predict it.
Therefore, it is not too surprising how many aspects of our daily behaviour, like our whereabouts and purchases, become predictable, given the regularity of our routines! And when we discover that by observing the “likes” on Facebook it is possible to figure out if someone is gay with an accuracy exceeding 95%, or that our income can be estimated from the photos we publish, or that someone can be guessed to be Muslim on the basis of where they go on a Friday, we then realise that we have crossed a phase transition. Nothing will be as before.
A new, powerful tool is being created that will enable us to see the world and our society through eyes that were not available to us before. It will enable us to predict epidemics, instability and economic crises. It will help us to predict the consequences of our decisions both at collective and individual levels. Therefore, we will be in a position to make better choices, be more aware, understand and, perhaps, manage the complexity of the pluralistic and interconnected society we live in. It will improve our wellbeing.
At the same time, however, big data is turning us into little mice under a lens, microorganisms on the slide of a microscope. And if someone can link data back to our identity and most intimate sphere, then we may find ourselves at the mercy of sorcerers’ apprentices looking for alleged terrorists or for unscrupulous business opportunities. No doubt that in our imagination, today, big data is associated with a sense of vanishing privacy, fear of surveillance and social control, rather than the progressive and magnificent destiny of the use of data as a common good. Yet, we cannot afford to do without this source of knowledge; we absolutely need it to deal with the complexity of our societies, with the challenges posed by poverty, energy, unemployment, inequality, food, environment, health. And democracy. At the same time we cannot give up the right to manage our information and communications freely, sharing what we want with whom we choose and like. Otherwise, the very idea of democracy collapses.
Solutions are therefore urgently needed to ensure that knowledge and freedom coexist side by side. Is this wishful thinking? No. The fact that in this initial phase of a measurable society there are few large harvesters, or “latifundists”, who store data on masses of people in large inaccessible repositories, does not mean that this is the only model nor that it is the most efficient. On the contrary, many experiences worldwide demonstrate that open access to interesting data stimulates creativity, new business ideas and new jobs. The point is how to do this in a safe and ethical context that is facilitated by responsible and transparent technology. These issues are being widely investigated throughout the world. Here, in Europe, infrastructures to carry out research on big data placing ethics and privacy at their core are being developed. How? There are two lines of attack: a technology for today’s data, privacy-by-design, and a “new deal” for tomorrow’s data.
Privacy-by-design means bearing in mind the safeguards for personal data protection from the very beginning of the design of new services based on big data, in such a way that the privacy risks of all users involved are measured, and kept acceptably low. What the sceptics, the “privacy-is-dead” guys, ignore is that quality of services can coexist with privacy. They ignore that the overwhelming majority of services for citizens to optimise their journeys, or for decision-makers to optimise public transport networks, can be fine-tuned by leveraging personal data coming out of protected systems only after such data have been transformed to make sure that there is a negligible risk that a specific user is recognised and their personal information (for instance, where they go on a Friday evening) leaked. This approach can be used for telephone data, for instance, or for the GPS tracks of car sat-nav systems, in order to exploit the vast reservoir of knowledge within the databases of telecom and telematics operators. For several years now, our research laboratories have been applying the privacy-by-design approach to develop projects and prototypes with various industrial partners like Wind, OctoTelematics and Toyota. With the Toyota InfoTechnology centre of Tokyo, for instance, we are developing a system to perform empirical risk assessment when outsourcing personal mobility data to third parties to develop innovative services. As of today, however, privacy-by-design is still not a common practice in big data analytics, and not just in Italy. It will soon be a specific provision in the new European Directive on privacy. Hopefully, this will promote the safe use of data and avoid a “tragedy of data commons”, a term coined by law researcher Jane Yakowitz to describe today’s situation where highly valuable common resources are wasted because of a combination of commercial interest, arrogance, ignorance, fear, wrong perception of risk, and lack of trust.
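To give a flavour of what “measuring the privacy risk of every user” can look like in practice, here is a minimal sketch in Python. It is not the system developed with our industrial partners: the function names, the toy traces and the rounding-based generalisation are all assumptions made for illustration. The idea is simply to coarsen each mobility trace and then score each user with a risk of 1/k, where k is the number of users whose coarsened trace is identical.

```python
from collections import Counter

def generalise(trace, precision):
    """Coarsen a trace of (lat, lon, hour) points by rounding the coordinates.

    Hypothetical transformation chosen for illustration; real systems use
    richer generalisation and perturbation techniques.
    """
    return tuple(sorted({(round(lat, precision), round(lon, precision), hour)
                         for lat, lon, hour in trace}))

def reidentification_risk(traces, precision):
    """Return, for each user, 1/k where k users share the same coarsened trace."""
    keys = {user: generalise(trace, precision) for user, trace in traces.items()}
    group_sizes = Counter(keys.values())
    return {user: 1.0 / group_sizes[key] for user, key in keys.items()}

def safe_to_release(traces, precision, max_risk):
    """Release only if no user's risk exceeds the agreed threshold."""
    return max(reidentification_risk(traces, precision).values()) <= max_risk

# Toy data, invented for this example: two users around Pisa, one in Milan.
traces = {
    "u1": [(43.716, 10.401, 9), (43.722, 10.396, 18)],
    "u2": [(43.718, 10.404, 9), (43.719, 10.399, 18)],
    "u3": [(45.464, 9.190, 9)],
}

fine = reidentification_risk(traces, precision=3)    # everyone is unique
coarse = reidentification_risk(traces, precision=2)  # u1 and u2 become indistinguishable
print(fine)
print(coarse)
print(safe_to_release(traces, precision=2, max_risk=0.5))
```

In this toy case the coarser data still leave the third user alone in their group, so the release check fails: the transformation has to be pushed further, or that user's data withheld, before anything leaves the protected system.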
This is for today or, at most, for tomorrow morning. In the longer term, passive data collection needs to be replaced with participation, based on the awareness of the value of our own personal data for each one of us, individually. We all need to own and use our own data. Every year, each person leaves behind 3 gigabytes of digital breadcrumbs, disseminated in the most diverse systems and services that we use for our daily activities, to travel, communicate, pay for goods, bills and food, banking, sport, searching the web, reading, playing, texting, writing, posting or tweeting, screening our health. Three gigabytes, without taking photos and videos into account, otherwise numbers would grow considerably. An avalanche of personal information that, in most cases, gets lost – like tears in the rain. Yet, only each of us, individually, has the power to connect all this personal information into some personal data repository. No Google or Facebook has a similar power today and we should very carefully avoid giving it away in the future.
Let’s imagine for a moment that we have this mechanism – a “personal data store” – that is not just a place where all our tracks can be stored but that can extract meaning from them and offer us an image of ourselves, our image reflected in the digital mirror. This data store could help us understand our behavioural, eating and shopping patterns or, at least, how these emerge from the tracks we leave behind. It could also give us an opportunity to compare our patterns with those of the community, with possible alternatives, to raise our self-awareness and our ability to change and improve. This is feasible and it is being tried out within real communities throughout the world. We are currently performing a social experiment with some hundreds of volunteers who, in conjunction with Telecom Italia and Coop, are creating an ecosystem of personal data stores focused on mobility, telecom services and supermarket purchases.
There are important ethical challenges facing us. In any case, the idea of ensuring that everyone masters self-produced data is a game changer. If we manage to understand the importance of personal data in our daily lives to simplify, be more efficient and diversify, then we boost the emergence of a totally different ecosystem where information can flow without the need to concentrate data in large centralised repositories. An ecosystem where each one of us, rather than giving up our own data by agreeing to some obscure disclaimer, decides whether or not to answer questions asked by other people or entities, based on our own interest in participating and the trust we have in the interlocutors. An ecosystem that, seen from the outside, looks like a large database we can query, but in reality is a peer-to-peer network of people with their personal data stores, who can choose to cooperate in order to reply. Clearly, some of the IT and ethical technology needed to make this ecosystem work is still missing, but there are many working on this. With the Skill Centre of Trento in Italy, Alex Pentland of the MIT Media Lab is developing architectures for personal data stores and studying business models that can make them sustainable and safe. Protocols like Ubiquitous Commons are being put forward as safe peer-to-peer communication mechanisms that can support growth from the bottom up of peer networks to share information. Decentralised reputation systems for people and organisations need to be deployed, because networks of this type can grow and flourish only based on trust and their ability to self-organise. Remuneration and incentive mechanisms need to be developed accordingly. Visualisation and storytelling, to ensure that individual and collective knowledge can be used by everybody – decision-makers, business people, citizens – need to be developed, together with the “data literacy” skills of everybody in our societies.
Because such an ecosystem requires both data scientists able to make sense of data and citizens able to participate as active prosumers of data and information.
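The ecosystem described above, a network that looks like one large database from the outside but is really many personal data stores choosing whether to reply, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the class and function names are invented here, trust is reduced to a simple list of accepted askers, and nothing in it reflects the actual architectures being developed in Trento or at the MIT Media Lab.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable

@dataclass
class PersonalDataStore:
    """One person's store: the data stay here, only answers travel."""
    owner: str
    records: dict
    trusted: set = field(default_factory=set)  # askers this owner agrees to answer

    def accepts(self, asker: str) -> bool:
        return asker in self.trusted

    def answer(self, question: Callable[[dict], float]) -> float:
        # The question runs inside the store; raw records are never handed over.
        return question(self.records)

def ask_network(stores, asker, question):
    """Collect answers only from the owners who choose to cooperate with this asker."""
    return [store.answer(question) for store in stores if store.accepts(asker)]

def weekly_km(records):
    # Hypothetical question: how many kilometres did you travel this week?
    return records["weekly_km"]

# Toy network, invented for this example.
stores = [
    PersonalDataStore("alice", {"weekly_km": 120.0}, {"city_transport"}),
    PersonalDataStore("bob", {"weekly_km": 45.0}),  # trusts nobody, never replies
    PersonalDataStore("carla", {"weekly_km": 80.0}, {"city_transport", "supermarket"}),
]

answers = ask_network(stores, "city_transport", weekly_km)
print(len(answers), "replies, average", mean(answers))
```

The asker only ever sees the replies of those who opted in; how to add reputation, remuneration and protection against overly revealing questions is exactly the part that is still missing.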
The path leading to harmony between individual and collective knowledge might seem a convoluted and unlikely one. But we will get there. Because the potential of individual knowledge, the energy that is imprisoned in those 3 gigabytes per year of personal breadcrumbs, will eventually be perceived by many. This wave will be unstoppable. In the meantime, some good news: in its Horizon 2020 programme, the European Commission has decided to fund a new research infrastructure on big data analytics and social mining. This is a network of European centres making available data, tools and resources as well as the skills of their data scientists for large and small experiments on big data analytics to be carried out by researchers, innovators, start-uppers, policy-makers, businesses and public institutions. A wind-tunnel for big data research, which makes it accessible and doable in a protected environment. It is called SoBigData and is coordinated by an Italian scientist, Fosca Giannotti from Italy’s CNR in Pisa. SoBigData (http://www.sobigdata.eu) was proposed by our European laboratory of Big Data Analytics & Social Mining together with centres in Sheffield, Hanover, Zurich, Helsinki, Tartu, Delft and London. It is a place where scientists and innovators can experiment and grow the ecosystem of data and knowledge: an embryo of the digital nervous system for our society, open, ethical and participatory, as necessary today as the air we breathe.