Privacy in the age of AI: who’s following our digital footsteps?

Photo credit: EdIntelligence, the University of Edinburgh Machine Learning Society.

Two weeks ago, EdIntelligence, the University of Edinburgh Machine Learning Society, launched the “We Need to Talk About AI” seminar series in collaboration with the School of Informatics, with a sold-out event centred on big data and privacy in an increasingly digital world. The event was hosted by postgraduate Artificial Intelligence students Mari Liis Pedak and Anton Fuxjaeger, and proved to be a well-run and interesting evening, with an excellent interdisciplinary panel ready to discuss the social, economic, and political implications of the collection of our online data.

The speakers each had the chance to give a short introductory presentation, and Dr Kami Vaniea, lecturer in security and privacy at the School of Informatics here at the University of Edinburgh, was the first to speak. As a researcher of the human factors of security and privacy, she began by discussing the types of data that websites collect from visitors: your operating system, your location, the device you connect from… The website then uses this data to decide what you see on your screen, such as the layout of the site (mobile vs. desktop), different fonts, graphics, embedded links to other sites, and so on. This often results in not only a different viewing experience, but also a personalisation of the information presented on the website itself. Your cookies may be used to load third-party advertisements or tailor content that you are predicted to like. For instance, in 2012, the travel fare comparison website Orbitz showed more expensive hotel rooms to Mac users than it did to those accessing the site from a PC.
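As a rough sketch of the mechanism (a minimal, hypothetical server endpoint written with the Flask microframework; the cookie name and the personalisation logic are illustrative, not taken from any real site), a server can tailor what it sends back using nothing more than the request headers and a previously stored cookie:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def home():
    # The browser sends a User-Agent header with every request,
    # revealing the operating system and the type of device.
    user_agent = request.headers.get("User-Agent", "")
    layout = "mobile" if "Mobile" in user_agent else "desktop"
    # A cookie stored on an earlier visit can be read back to
    # personalise what this particular visitor sees.
    interests = request.cookies.get("interests", "nothing recorded yet")
    return f"Serving the {layout} layout, personalised using: {interests}"

Nothing in this exchange requires the visitor to log in or fill out a form; the headers and the cookie travel automatically with every request.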

Although it is extremely worrying, the fact that websites use our browsing information to boost their sales is hardly surprising in a capitalist economy; what instead stood out from Vaniea’s talk was the idea that our experience of the Internet is dynamic, highly tailored, and therefore subjective. The website does not really exist until you load it, unlike the pages of a book, which show the same words to every reader. In a way, there is no such thing as an objective Internet, because it cannot exist without an observer. This is an interesting way of conceptualising the web because it underlines that the Internet is not a static landscape of facts and information, but rather a multi-layered network where information does not circulate on a level playing field in which all voices have the same platform to reach those who need to hear them. The danger of regarding the Internet as an objective and static space is clear when we consider how personalised search engine results and social media feeds encourage the formation of digital echo chambers.

This ties into how human bias finds its way into machine learning programmes, one of the topics covered by the second speaker, Lilian Edwards. Professor of Law, Innovation and Society at Newcastle University and Associate Director of the Arts and Humanities Research Council Centre for IT and Technology Law, she approached the topic from a moral point of view, starting with the 2016 ProPublica investigation which uncovered racial bias in the risk assessments used by courts to predict reoffending after conviction.

Predictions can only be as objective as their input data: if real-world statistics reflect injustices and discrimination in our society, then training algorithms with this data leads to those same biases being learnt by the program. Ironically, decisions made by machines, despite being “driven by math”, are not impartial if they are founded on an inaccurate, incomplete, or unfair picture of reality, as encoded in the training data. For instance, an experimental recruiting tool under development by Amazon to rate and review applicants’ CVs was recently found to discriminate against female candidates, having “learnt” from its training data that male candidates were more likely to be hired and therefore deeming them preferable.
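To see how easily this happens, here is a toy sketch (entirely synthetic data and an off-the-shelf classifier, not the Amazon system itself): a model trained on historical decisions that favoured one group reproduces that preference for candidates who are otherwise identical.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)        # a protected attribute, e.g. gender (synthetic)
skill = rng.normal(0.0, 1.0, n)      # identically distributed in both groups
# Historical hiring decisions favoured group 0, regardless of skill.
hired = (skill + 1.5 * (group == 0) + rng.normal(0.0, 0.5, n) > 0.75).astype(int)

model = LogisticRegression().fit(np.column_stack([group, skill]), hired)
# Two candidates with identical skill but different group membership
# receive very different predicted probabilities of being hired.
print(model.predict_proba([[0, 0.0], [1, 0.0]])[:, 1])

The model has done nothing “wrong” mathematically; it has simply learnt the discrimination that was already present in its training data.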

Edwards also challenged the audience to think about the motivations behind using machine learning programs for initiatives in the social and political sector. These can improve efficiency, for instance by helping local councils or governments identify specific areas or communities that need support (e.g. more social workers or targeted welfare spending). She argued that this is a “band-aid” approach, reflecting a lack of adequate resources and investment in the sector, and one that does not address the underlying issues that lead to these problems in the first place.

The final speaker was Laura Cram, head of the Neuropolitics Research Lab and Professor of European Politics at the University of Edinburgh. She specialises in using cognitive neuroscience and psychology to shed light on political behaviour, recognising and analysing trends in the “mind-brain-action nexus” with big data. Cram offered a fresh perspective, pointing out that the ways in which we consume news on social media, and the ways companies or political campaigns use our data to show us targeted ads, are not new or innovative phenomena. “Transmission of information even in traditional media is sensational and far from reality,” she said, underlining the role that traditional broadcast and print media play in generating these reactions to the news. The fear-inducing narratives that outlets use to report on artificial intelligence have a huge impact on public discussion, and not always in a way that reflects reality. For instance, her own research shows that there is currently no conclusive evidence that Russian-linked Twitter bots significantly influenced the outcome of the Brexit referendum, whilst news articles describing this supposed “hacking of democracy” abound.

“Now, there’s a deep sense of insecurity and fear on what information can be trusted if sources of information are being pushed towards you,” Cram said. She argued that this ontological insecurity makes it difficult for people to make informed, rational political and voting decisions. According to her, propagating a largely unfounded fear of “Russian bots” and other algorithms only makes the situation worse.

The speakers then came together for the panel discussion. This format worked incredibly well and led to a whole range of topics being covered, prompted by the audience’s opportunity to follow up on the panellists’ introductory presentations. One discussion that stood out concerned the regulation of data and privacy, and the feasibility of such policies. Under the European Union’s General Data Protection Regulation (GDPR), which came into force in May 2018, users have the right to ask for the complete data that websites have collected about them. Unfortunately, this isn’t as straightforward as it might seem: knowing how to start is difficult for the average person and, as was pointed out in an audience question, “how can we ask for data from companies when we don’t even know who they are or what data they have?”

Since the implementation of the GDPR, many websites have been asking users to consent to their use of cookies or accept their privacy policies before they can view the content. However, the panellists did not seem to think this was a particularly effective way of improving data protection or of informing Internet users of what data is being collected and how it is used. Edwards argued that in this context, “consent is dead,” and that most privacy policies are either unintelligible or vague. There is no negotiation between parties, because users cannot challenge or question the terms they are presented with, and there is no competition of services, because users have nowhere else to go to get the information they need. In this sense, we only have one option: accept the terms or stay off the Internet altogether.

As Vaniea pointed out, a cookie is often saved on your computer as soon as you visit a website, before you even have the opportunity to agree or disagree; if you disagree, you are simply sent away from the website, but the cookie is not deleted, meaning that no meaningful consent is even possible. Cram added to this from a behavioural psychology perspective, arguing that repeated exposure to this type of question desensitises us, so we click “I agree” without giving it much thought.
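This is straightforward to check for yourself. The short sketch below (using the Python requests library; the URL is just a stand-in for whichever site you want to inspect) prints any cookies that arrive with the very first response, before any consent banner has been answered:

import requests

# The first response to a bare GET request can already carry Set-Cookie
# headers, i.e. cookies stored before any consent dialog is clicked.
response = requests.get("https://example.com")
print(response.cookies.get_dict())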

Edwards was also critical of privacy policies, viewing them as legal devices to stop companies being sued rather than tools for actual change. “We should ban privacy policies,” she said, proposing instead that legislation and policy should focus on combating toxic uses of user data rather than on user consent. To improve the public’s understanding of policies, she suggested that a “default” privacy policy could be defined, set out within the GDPR or a similar framework, and companies wishing to tweak it would have to highlight and meaningfully explain their changes (e.g. collecting more data than the standard policy allows) and why they are necessary. Although this might not solve the issue, it might make it easier for us to understand what data websites collect and how it might be used.

Of course, no interactive discussion would be complete without a “but what about Brexit?” question from the audience. Edwards assured us that a post-Brexit UK would still have to comply with the GDPR, which applies not only to companies in the EU but to all those that collect and process the data of EU residents. Effectively, any business, online retailer, news outlet, or website will have to comply with EU privacy law in order to keep its EU consumer base, which is a huge economic incentive. This means that the GDPR and other EU legislation, because of their large international reach, could provide a unique opportunity to bring meaningful change to the way everyone’s data is handled on the Internet.

Another interesting point raised was the value of our online data, and how this might differ from traditional marketing. Facebook, for instance, does not sell your personal data but instead sells “influence” to third parties who want to target specific audiences through ads, according to Mark Zuckerberg’s testimony before the US Congress. This makes sense: the exclusivity of Facebook’s service depends on the fact that only Facebook has access to this data, which is why the company has strong incentives to keep it safe. Furthermore, tech companies do not want to make front-page headlines with data breach or misuse scandals, as the backlash (for instance, #deleteFacebook after the investigations into Cambridge Analytica) can lead to a loss of users, which directly reduces the value of their services. On the other hand, this still means that our data, whilst perhaps “safe,” is actively used to sell us goods and services and to influence our behaviour.

Vaniea argued that although the principles behind targeted advertising are the same as they have always been, the psychology involved is different. She proposed that an advertisement can be seen as a bet on future revenue. A thirty-second Super Bowl ad, costing about 5 million dollars, is a large bet for a company to make, particularly when compared to the roughly $0.25-10 that a simple Facebook ad costs.

Although many of us might think that we are critical of advertising and would not fall for third-party ads, it is likely that repeated exposure to targeted advertisements affects our consumer behaviour. Cram pointed out that measuring their effectiveness is again a complex problem, echoing the difficulties of attributing political outcomes to a particular campaign, as this would require researchers to have access not only to our online behaviour, but also to our purchase (or voting) history.

In the last ten minutes, the panellists went on to discuss how the increasing use of “smart” devices, such as smartwatches, wearable health and fitness monitors like Fitbits, or virtual assistants like Amazon Alexa or Google Home (collectively referred to as the Internet of Things), feeds into the pool of data that companies collect about us.

Because of the effectiveness of these approaches, companies often collect much more data than is strictly necessary for the provision of their services. This is not necessarily because the data is useful in and of itself, but because correlations between online and offline behaviour can be translated into market value. In an increasingly digital world, it is important that we question this status quo, in which our data is collected without meaningful consent and sold to or used by third parties or the websites themselves to drive our consumerism, and potentially our political ideas, forward. The question is no longer just about who’s following our digital footsteps, but whether or not they are, in fact, paving the roads on which we travel.

 

Written by Simone Eizagirre and edited by Karolina Zieba.

To stay up-to-date with the upcoming events of the “We Need to Talk About AI” series, visit: https://www.ed.ac.uk/informatics/news-events/public/we-need-to-talk-about-ai

For more information on EdIntelligence, follow them on social media or check out their website: https://edintelligence.github.io/

For further information on the panellists and their research, visit their homepages:

Dr Kami Vaniea: https://vaniea.com/

Prof. Lilian Edwards: https://www.lilianedwards.co.uk/about/

Prof. Laura Cram: http://www.sps.ed.ac.uk/staff/politics/laura_cram
