The real you: say goodbye to online anonymity @ The New Scientist

2011/11/03 00:00, Dino Pedreschi. Originally posted on New Scientist.

Online anonymity may be a luxury we can no longer afford – and it's disappearing fast anyway. Are we ready to bare all on the internet?

IN JULY last year, Orlando Figes, one of the UK's most eminent historians, admitted posting savage critiques of rivals' books on the Amazon website under the pseudonym "Historian" - alongside praise of his own. His eventual confession came only after he had threatened to take legal action against anyone who accused him of the misdeed. Later he blamed his wife for the reviews.

Figes's online behaviour is an example of what's known as sock puppetry - pretending to be someone other than who you are for the sake of furthering your own interests. It made for a juicy academic scandal that in the end hurt him more than anyone else, but the consequences of the internet's ability to cloak users' identities aren't always so confined. Vicious cyberbullying has, in extreme cases, driven victims to suicide. Scammers and spammers can hijack email addresses to steal banking credentials and even state secrets. Earlier this year, for example, a convincing email fooled several senior US government officials into handing over their email passwords to hackers. For all the benefits that the internet has brought us, it often remains a deeply uncivilised place.

Illegal and just plain bad behaviour online has sparked discussions of new laws to combat cyberbullying and secure the internet from criminal activity. Such legislation may soon be irrelevant. Several companies are building tools that can identify internet users with unprecedented precision. Proponents claim the new tools will lead to a safer and less hostile internet. If the internet is to keep developing, they say, perhaps we can no longer afford to live in an anonymous environment where no one need ever be held accountable for their actions. Are we ready to abandon the option of shielding our online identity?

"The internet would be better if we had an accurate notion that you were a real person as opposed to a dog, or a fake person, or a spammer," Eric Schmidt, Google's executive chairman, told an audience in the UK at the Edinburgh Television Festival in August.

The ability to be a "fake person" is a large part of why comments on otherwise respectable publications often descend into outright abuse. Similarly, the ease with which it's possible for wrongdoers to cover their tracks online enables credit-card fraud, which costs retailers and the card companies hundreds of millions of dollars a year in the US alone. Retailers simply have no easy way of knowing whether the buyer is who they claim to be. So it remains fairly easy to purchase goods online using a stolen credit card. A more robust method for verifying identity would almost certainly reduce fraud.

So why not have people take their identities with them any time they use the internet? The problem, simply put, is that no one wants to be forced to use their real name online. At least one previous attempt to create an identity system for the internet, Microsoft's Passport initiative of the early 2000s, failed in part because privacy advocates objected to one organisation controlling the process.

That changed, however, with the arrival of Facebook. To use the social network, you must register with your real name. Once logged in, everything you do - posting messages, sending messages, tagging photos - is attributed to what, for most users, is their actual, offline identity.

Using your real name on a single site may not seem like a big deal, but Facebook's reach continues to creep beyond Facebook.com. Since 2008, websites have been able to integrate with Facebook in such a way that their visitors must be Facebook users to post comments. Each post is clearly associated with the user who made it.

Why would anyone want this? As it turned out, Facebook integration benefited not only the websites that adopted it, by making it much easier to police comments, but also individual users, who could then conveniently log into multiple sites with their Facebook password. Users' comments could also be set to be copied automatically to their Facebook wall, so their friends could easily see what they had posted.

The effects were immediate. Before popular technology blog TechCrunch introduced Facebook integration earlier this year, half of their comments were either from spammers hawking their wares or were "trollish nonsense", wrote TechCrunch columnist M. G. Siegler in an article explaining the site's decision. After the change, the majority of comments became "coherent thoughts in response to the post itself - you know, what a comment is supposed to be".

Google's new social network, Google+, also requires people to use their real names when signing up. And with Facebook's and Google's social networks likely to become increasingly intertwined with the rest of the web, eventually one or both could very well emerge as de facto online identity systems.

Most people who tie their real identities to their Facebook and Google accounts don't need much policing. Scammers and trolls, on the other hand, can simply decamp to the parts of the online world that are not integrated with either company.

There too, however, identity technologies are waiting to catalogue their misdeeds. Whether or not you share your identity with the websites you visit is no longer entirely within your control.

Indelible ID

One of the most powerful identity tracking systems now available is offered by BlueCava, a company based in Irvine, California, that helps websites monitor fraud, among other things. BlueCava's software "fingerprints" any device that someone uses to visit a website, be it a desktop or laptop computer or a mobile device like a smartphone. That fingerprint is made possible by the hundreds of types of data that browsers send when connecting to a website, from the machine's operating system to the time zone in which the device is set to operate (see diagram).

You might be surprised at just how mundane these details can be. Consider one of the data types passed from browser to website: the fonts installed on your machine. They will include not just the fonts that it came with, but also fonts that may have been included with software you installed, making the complete list distinctive. "A typical machine has 4000 to 20,000 fonts," says BlueCava chief executive David Norris. Fall outside this range and your machine is distinctive. "If you have 1926 that's a lot of uniqueness," Norris says.

See more: view the data your browser is passing to websites using an online test from privacy.net.

BlueCava combines these bits of information to create a unique ID number for every device that accesses a website running the company's software. The firm has assembled a dossier on 1 billion devices, and Norris estimates that the number will double in the coming year. At this rate, it won't be long, he says, before all 10 billion internet-enabled computers in the world have a place in BlueCava's repository. Norris claims that when presented with a query from a machine already in the database, the software can recognise its source 99.5 per cent of the time.
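As a rough illustration of how scattered browser details can be boiled down to a single identifier, the Python sketch below hashes a canonical form of a few attributes. It is not BlueCava's method, which is not described in detail here; the attribute names, the sample values and the `device_fingerprint` function are all hypothetical.

```python
import hashlib
import json


def device_fingerprint(attributes):
    """Return a short hex ID derived from browser-reported attributes.

    Hypothetical helper for illustration only, not any vendor's algorithm.
    """
    # Sort list-like values so that the order in which a browser happens to
    # report its fonts does not change the resulting ID.
    normalised = {
        key: sorted(value) if isinstance(value, (list, set, tuple)) else value
        for key, value in attributes.items()
    }
    # Serialise with sorted keys so the same data always yields the same string.
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Made-up sample data: the same laptop reporting its details in a different order.
laptop = {"os": "Windows 7", "timezone": "UTC-8",
          "fonts": ["Arial", "Calibri", "Wingdings"]}
same_laptop = {"timezone": "UTC-8", "os": "Windows 7",
               "fonts": ["Wingdings", "Arial", "Calibri"]}
# A second machine that differs only by one extra installed font.
other_machine = {"os": "Windows 7", "timezone": "UTC-8",
                 "fonts": ["Arial", "Calibri", "Futura", "Wingdings"]}

print(device_fingerprint(laptop) == device_fingerprint(same_laptop))    # True
print(device_fingerprint(laptop) == device_fingerprint(other_machine))  # False
```

An exact hash like this changes whenever any single attribute changes, such as a newly installed font, so a system that keeps recognising a device over time would presumably need some tolerance for drift rather than a strict match.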

BlueCava also has what it calls a reputation exchange - a database of information on how devices that the company has fingerprinted have been used over the past year. When online fraud occurs, information about the guilty party's computer is sent to the database. Retailers who check the database can then decide to bar that device from being used to make purchases on their website.

Although the system can be extremely effective, it does have one large loophole: not every computer has just one user. Take the cybercriminal who logs in at an internet cafe, for example. Even when a device only has a single user, they can cover their tracks when sending email, which is relayed via mail servers and doesn't necessarily leave accurate traces back to the originating device. Neither voluntary identity-sharing via social networks nor device fingerprinting can address this issue, leaving scammers and spammers free to continue using email for their nefarious purposes.

Most spam is no more than annoying, and most people manage to resist email offers of cheap Viagra. But hackers are infamously handy at faking the originating address so that an email appears to come from a trusted source, like a close relative, a bank employee or indeed anyone the hacker thinks you will deem trustworthy. "Give me ten minutes and I can arrange for you to get an email from Santa Claus," says Patrick Juola, a computer scientist at Duquesne University in Pittsburgh, Pennsylvania. While filters may block some of these messages, they offer no hope of definitively determining the sender's identity.

Here too, however, solutions are beginning to emerge. One approach traces its roots to the collapse of the energy giant Enron in 2001. In the ensuing government investigation, 1.5 million company emails were made public. Researchers analysed them and found patterns in the way the messages were written. This has helped them build software that can identify the author of an email.

Such systems rely on idiosyncrasies in everyday writing. Some people prefer long sentences to short ones; one writer is formal while another embraces slang; Americans and Brits use different spellings. Preposition use also varies: we can "work for" or "work at" a company, but an individual will probably not switch between the two forms in writing. Given a big enough database of a person's writing style, and algorithms that can accurately find patterns in it, researchers can compile a style signature that can be used to check whether a piece of text was authored by that person.
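To make the idea of a style signature concrete, here is a minimal Python sketch that profiles how often a writer uses a handful of common function words and compares profiles by cosine similarity. This is a deliberately crude stand-in, not the software tested on the Enron emails; the word list, the function names and the sample sentences are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical feature set; real systems use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "for", "at", "on", "with", "that", "but"]


def style_signature(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two signatures (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_likely_author(text, known_samples):
    """Pick the known author whose signature is closest to the text's."""
    signature = style_signature(text)
    return max(
        known_samples,
        key=lambda author: cosine_similarity(
            signature, style_signature(known_samples[author])),
    )


# Made-up writing samples: one author says "work for", the other "work at".
known_samples = {
    "alice": "I work for the bank and I work for the city. I wait for the bus.",
    "bob": "I work at the bank and I work at the school. I sit at the desk.",
}
unsigned = "She used to work at the museum, at the front desk."
print(most_likely_author(unsigned, known_samples))
```

With only a sentence or two per author, a match like this proves nothing; the approach depends on having a large body of known writing for each candidate.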

Earlier this year Juola helped organise a competition that tested various programs on the Enron emails, as part of the run-up to a workshop entitled "Uncovering Plagiarism, Authorship, and Social Software Misuse", held in Amsterdam, the Netherlands, in September. The results showed that the software entrants correctly identified the author of up to 70 per cent of the emails they were tested on. This accuracy is likely to increase, Juola says.

It has to be you

Authorship authentication software could be bolted onto a company's email system, where it would identify the sender of a message not by their name - easily spoofed in emails - but by their style of writing, which is much harder to fake. The same software could be used to analyse what people post online. Tempted to write a glowing review of a friend's product under a fake name? Or to anonymously criticise a rival, like Figes did? Do it often enough and such software may one day reveal links to material written under your real name.

Now stand back for a minute. A series of robust identity technologies is spreading across the web. Powerful new authentication methods like writing-style analysis are probably just a couple of years away from being put into widespread use. In a report issued this April, the US government called for an interlocking system of compatible identity systems. It seems like one is already emerging.

Will online anonymity, and the crime and abuse that come with it, become a remnant of a past age? Is the internet about to grow up?

Before we throw a coming-of-age party, we might take a moment to consider the implications. Identity-tracking systems can have a chilling effect on people's willingness to express themselves online. In 2007, South Korea began requiring users of the country's major websites to sign up with their national identity number to post comments. A study by Jisuk Woo at Seoul National University found that the rate at which people posted comments on the popular forum dcinside dropped precipitously after the law went into force. "Most users became afraid to write on online services," adds Chun Eung Hwi, a consumer rights campaigner in Seoul. "They were reduced to passive readers and kept silent on public issues." Yet the evidence for a fall in libellous or obscene comments was mixed. A similar dynamic was evident at TechCrunch. Though the quality of comments increased with Facebook integration, their numbers decreased. Siegler pondered whether people were censoring themselves.

While no western government is proposing anything similar, private ownership of identity databases like Facebook's and Google's introduces comparable problems. Facebook has a history of changing its privacy policy suddenly to suit its commercial aims. While Google's stated mission is to organise the world's information, "at the end of day it makes its money by selling its users' profiles to advertisers for target marketing", says Dino Pedreschi, a computer scientist at the University of Pisa in Italy. Pedreschi, who is working on a major study of trusted identity systems, says it is crucially important for such systems to be controlled by entities who are not driven by profits.

What's more, online social networks collapse our social lives to a single space, completely unlike normal life where we generally interact with different groups at different times and in different ways. A person might share a radical political view with a friend but shy away from expressing the same opinion at work, for example. It is normal for us to take on what sociologists call different "social roles", yet this behaviour is inhibited by the openness of Facebook, and less directly by the less transparent technologies that bind our online activities into a single identity.

Granted, people tend to behave better when they are visibly part of a social network as opposed to operating anonymously, says Zeynep Tufekci, a sociologist at the University of North Carolina, Chapel Hill. However, she adds, "Facebook is the wrong tool to extend to the rest of the web."

Letting users adopt nicknames on social networks might be one answer, combining the civilising effect of social networks with the ability to adopt different social roles. "Most of the time we want pseudonymity, not anonymity," says Danah Boyd at Microsoft Research in Cambridge, Massachusetts. And yet, a pseudonym is exactly what Figes was using when he trashed his rivals' work.

The question that will shape the future of identity on the net is how much we are willing to give up to be assured that every book reviewer on Amazon is who they say they are. Right now, along with anonymously maligning the competition, we are all free to peruse websites on radical politics, investigate medical diagnoses and make a cheeky remark or two in a chat room - all without feeling that anyone is looking over our shoulder. Would we be willing to do so if all of our online actions were logged in an identity database?
