Our data should be ours

Sarah Newman, Director of Art and Education at the metaLAB at Harvard University, examines digital technologies and what they mean for privacy.
Sarah Newman

Director of Art and Education

02 Jul 2021
Key Points
  • Technology often advances faster than we can understand its implications. Today, when we share private information through various platforms, we don’t know how it might resurface or be used in the future.
  • People’s shared secrets can be a connecting force, and projects like the Future of Secrets aspire to show how technology may be used to facilitate connections rather than reinforce divisions.
  • Data can be misused, incomplete or biased, so labelling data sets properly might be key to improving choices made with and about data.

Recorded and archived

Photo by Virrage Images

Humans are very complex, layered beings, and we all have different parts of ourselves that manifest in different ways. With the rise of digital technologies and the ease of communicating digitally, whether by email, text or any messenger on a platform, nearly all of the things that we’re now communicating about are being recorded and archived somewhere. There are, of course, certain technologies that are encrypted or that try to make things vanish, but most people aren’t using those and aren’t even aware of what’s being archived. As a result, things that are meant to be private, things that we would call secrets, are being captured outside of our homes or our physical spaces, and they’re really no longer our property.

These new technologies have caught on so quickly, and they’re so easy and convenient, that most of us aren’t thinking about where this data will be in the future. On a global scale, most people aren’t thinking about the number of secrets that are being recorded and potentially left to future generations, either. There is a growing awareness of the importance of protecting data and privacy around certain data, but that awareness, like many things, follows technological innovation. So, we are in this unusual place right now where we’re utilising these technologies, and as soon as we do, we lose control of the things that we’re sharing. Because humans are so complicated, and we have many different parts of ourselves, we might not want this digital trace of our communication to be out there, and it might not accurately represent us anyway.

Can’t unlearn what we learn

Most of us know only a limited amount about our great-grandparents, great-great-grandparents or our other ancestors. I would love to know more about my distant relatives and the more distant past, and I think a lot of people would. I’ve done research and surveys indicating that most people would like to know more about their pasts, but we can’t unlearn the things we learn about the past. In a way, there’s a kind of freedom in being born knowing a limited amount about your past and being able to create yourself fresh.

There will be a surfeit of information in the future that we’re not in control of, and we might be doing a disservice to future generations: they won’t be able to unlearn what they learn, aren’t necessarily learning something true or accurate, and might be learning things that we wouldn’t want them to know.

The Future of Secrets

The Future of Secrets is an interactive installation that explores why we so willingly give our private information over to a digital device, immediately losing control of it and not knowing where or how it might recirculate. This is something we’re doing constantly with our phones or with Facebook or whatever. The installation is essentially a computer on a pedestal that asks, ‘Do you have a secret?’ It gives the participant the opportunity to type in a secret. Somewhat surprisingly, people love to do this. It’s been compared to a confessional, a digital confessional. People do like to share secrets, especially if it’s anonymous, and in this case it is mostly anonymous. But when you put in your secret and hit enter, a little printer beside the computer prints somebody else’s secret, which you can take with you. The viewer then also realises that somebody else is going to receive their secret. So, there’s this moment of surprise, anxiety or wanting to retract, to retrieve the secret they just put in, knowing that somebody else is going to get it.

What’s curious is how people will queue up to enter secrets into this installation, and also the joy that people take in receiving somebody else’s secret.
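To make that mechanic concrete, here is a minimal sketch assuming a simple in-memory store; it is my own illustration, not the installation’s actual code, and the names in it are invented.

```python
# A toy sketch of the exchange described above -- not the installation's
# actual code; the class and method names are invented for illustration.
# Each submission is stored anonymously, and the participant receives a
# secret that someone else entered earlier.
import random
from typing import Optional

class SecretExchange:
    def __init__(self) -> None:
        self.secrets: list[str] = []  # anonymised secrets collected so far

    def submit(self, secret: str) -> Optional[str]:
        # Choose another person's secret to print, if any exist yet.
        printed = random.choice(self.secrets) if self.secrets else None
        # Keep the new secret so a future participant can receive it.
        self.secrets.append(secret)
        return printed

exchange = SecretExchange()
exchange.submit("I never learned to swim.")
print(exchange.submit("I still have my childhood diary."))  # prints the earlier secret
```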

We’re in a strange time, because there’s going to be a lot more data about our current generation than there certainly was about past generations – but perhaps also than there will be about future generations, once people have locked down their data more responsibly and effectively. We’re not there yet. This view might not be popular, but it’s part of what inspired the Secrets project.

Connected with strangers

Photo by Rawpixel.com

It’s very healthy to have secrets; it’s normal and very human. One of the benefits of this particular installation is that it allows people to see that, because you’re reading other people’s secrets, and sometimes people leave the printouts behind. You start to see this whole spread of secrets, and it’s actually one way that technology evokes humanness in strangers – when technology is so often doing the opposite, making people feel more distant and less connected. In this case, you’re seeing all these individual secrets: memories, regrets, guilty stories from childhood, unfulfilled wishes and all the things that secrets might be. Seeing them submitted by people from all over the world, and the commonalities in them, actually helps us feel more connected with strangers, and particularly more connected with the humanity in other people. While I do think it’s healthy, natural and okay to have secrets, I also think that if we can find more avenues for sharing that humanity with strangers through our technologies, that’s something we should aspire toward.

The Data Nutrition Project

There’s a big power imbalance between the technology companies and individuals. There are a number of things we can do, and are doing, to try to address this power imbalance. There are a few things that individuals can do in a limited way, but they don’t actually change the balance of power, because most people are opting in to technologies that are convenient, free and so on, even if those large companies are scraping the individual’s data. So, there are lots of different innovations that intervene at various points in the development pipeline. Some have to do with protecting data.

One initiative that I am involved with is something called the Data Nutrition Project, which is not about protecting data per se, but about better labelling data: what’s contained in it, how it can be used and how it ought not be used. This is particularly important because standards are lacking, or at least inconsistent, for the way data sets are used to train models or algorithms. So, to step back a little bit: data sets, large amounts of data, are used to train these models.

Biased data means a biased model

That’s how we can do any kind of prediction in machine learning. The machine learns from past data, and sometimes a data set that has a lot of problems in it – racial or gender bias, lack of representation in a number of different ways, missing areas of the data – is used to train a model that’s then used to predict the future, or how things ought to be in the future. If you’re using biased data, then you’re going to have a biased model.

In the criminal justice system in the US, there was an algorithm predicting recidivism that was trained on racist data, essentially. So, it was basically a racist algorithm that was being used to inform a judge about who should get bail. This is obviously a problem. There’s also the question of who gets approved for a mortgage, which can be based on zip codes, and zip codes can be a proxy for race. So, there are these examples where harms that are in the data are being replicated into the future, because the data is biased and not labelled as such.
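To see how that happens mechanically, here is a minimal sketch of bias propagating from training data into a model; the data, feature names and numbers are all invented for illustration and are not drawn from any real system mentioned above.

```python
# A minimal, invented example of how bias in historical data carries into
# a model, even when the protected attribute itself is excluded.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

group = rng.integers(0, 2, n)                  # hypothetical protected attribute
zip_code_risk = group + rng.normal(0, 0.3, n)  # proxy feature correlated with group
true_merit = rng.normal(0, 1, n)               # the signal a fair decision should use

# Past human decisions penalised group 1 regardless of merit,
# so the recorded labels encode that bias.
past_decision = (true_merit - 0.8 * group + rng.normal(0, 0.5, n)) > 0

# Train only on 'neutral-looking' features; group is excluded,
# but the zip-code proxy smuggles it back in.
X = np.column_stack([true_merit, zip_code_risk])
model = LogisticRegression().fit(X, past_decision)

# At identical merit, the model now scores group 1 lower,
# reproducing the historical bias.
same_merit = np.column_stack([[0.0, 0.0], [0.0, 1.0]])
print(model.predict_proba(same_merit)[:, 1])
```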

The Data Nutrition Project uses the metaphor of the food nutrition label, like the one you would see on a candy bar showing calories and fat; it looks kind of like that. However, it’s applied to a digital data set: it shows the contents of the data set, along with specifics about the ways the data set may be used. It then flags certain alerts that one ought to know about when using that data set. Our hope with this work is to encourage more responsible use of data sets. Most people have good intentions; they just might not know enough about a data set to know that it’s going to be problematic. So, this intervention comes early in the development pipeline, at the data set stage, with the hope that it will increase education, transparency and, more broadly, responsible use of data in the production of AI.
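As a rough illustration of the idea, here is a small sketch of the kind of metadata such a label might carry alongside a data set; the fields and example values are my own invention, not the Data Nutrition Project’s actual schema.

```python
# An invented sketch of a dataset "nutrition label" -- the field names and
# example values are illustrative, not the Data Nutrition Project's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetLabel:
    name: str
    description: str
    intended_uses: List[str] = field(default_factory=list)     # what the data is suited for
    discouraged_uses: List[str] = field(default_factory=list)  # how it ought not be used
    known_gaps: List[str] = field(default_factory=list)        # missing or under-represented areas
    alerts: List[str] = field(default_factory=list)            # flags to surface before training

# Hypothetical label for a historical lending data set
label = DatasetLabel(
    name="historical_loan_decisions",
    description="Loan approvals, one row per application.",
    intended_uses=["studying past lending patterns"],
    discouraged_uses=["automating new approval decisions without a bias review"],
    known_gaps=["rural applicants under-represented"],
    alerts=["zip code correlates with race; treat it as a potential proxy"],
)

for alert in label.alerts:
    print("ALERT:", alert)
```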

Our data should be ours

Photo by HandMadeFont.com

We ought to own our data; however, our data is extremely valuable, especially in aggregate. So, we should have more checks in place so that we can choose how our data gets used, or whether it gets used in a way that’s no longer tied to our individual identity.

I’ll mention two things about this. One is that the incentive structure is very problematic right now, because most of these platforms rely on advertising as their revenue stream. We can read newspapers and magazines online for free; many of them, and certainly the various social media platforms, use advertising. That’s how they make money, and they make a lot of money. Unless you specify otherwise, their advertising is targeted toward the individual. The more they know about you, the more they can serve you with advertisements that will appeal to you, for things that you might potentially purchase, which, of course, makes their advertising more successful.

The other thing is that there are ways in which our data, when it’s anonymised to a certain extent, can be very useful for important things. Machine learning is an amazing technological innovation; computers are much faster than humans and make different kinds of mistakes. A collaboration between a radiologist and a machine-learning model that can read scans of tumours, each checking the other’s blind spots to reach the best diagnosis, is extremely promising in the medical space. There are ways, particularly around medicine, but also around pretty much everything else that involves human flourishing, in which these machines with massive amounts of data can be quite helpful.

So, I think we want to be careful not to just say, ‘Don’t give them the data.’ It’s about finding that fine line between protecting ourselves and our identities, protecting ourselves from being manipulated or targeted if we don’t want that (which I think we don’t), while also recognising the value that these new technologies offer and can continue to offer into the future.

Discover more about secrets and protecting data

Ip, C. (2018, March 11). 'The Future of Secrets' is a digital confession booth. Engadget.

Holland, S., Hosny, A., Newman, S., et al. (2020). The Dataset Nutrition Label: A Framework to Drive Higher Data Quality Standards. In D. Hallinan, R. Leenes, S. Gutwirth, & P. de Hert (Eds.), Data Protection and Privacy: Data Protection and Democracy. Hart Publishing.

Battles, M., Newman, S., & Simeone, L. (2015). Mapping Danger, Making Connections. In S. Cortesi & U. Gasser (Eds.), Digitally Connected: Global Perspectives on Youth and Digital Media (pp.60–65). The Berkman Center for Internet and Society at Harvard University.
