Map of AI Ethics

HAL 9000

We’ve all heard the news about how AI will revolutionize entire industries, transform mankind and bring us into a new era of technology. But it all sounds too good to be true. Indeed, it is. Like any other technological tool, AI can have serious consequences if it is left unchecked.

The ethics of AI has always been a tricky subject to discuss, since the question usually asked is “whose fault is it?” That question only scratches the surface. A deeper analysis of what we actually mean is more revealing, and it makes for a better framework for these conversations.

Let’s dive into detail and discuss AI Ethics.

Wait, what’s AI?

There are lots of articles about AI (Artificial Intelligence) and the many ways in which it can be defined. No wonder the topic is difficult to discuss, since even getting started on the right foot is a challenge. But here’s a quick working definition: AI consists of algorithms that sort through huge amounts of data, find patterns and insights, and make predictions or classifications that would otherwise require human intelligence. As these models are released into the real world to do their job, they get exposed to more data and can keep learning and adapting.

I have always found that analyzing a difficult subject is easier if I can locate myself in the middle of it all. It’s the same relief you feel at a huge shopping mall when you find the sign that says “you are here.” That’s what will help us through this discussion: a map to guide us. Our map will be an AI/ML “pipeline,” the process of creating one of these tools from start to end.

The AI pipeline

A pipeline is roughly the set of steps that need to happen before we can have a functioning piece of AI. First, if we want our AI system to learn, we need something for it to learn from. It follows that our first step is data collection. As the name suggests, this is the action of collecting data so that our system can analyze it.

Data engineers, a rare breed somewhere between programmers and data scientists, then analyze and explore this data. This is arguably the most important step, because any conclusions drawn here will drastically shape what comes next. Once this data exploration is done, we are ready to transform the data into a form that is machine-learnable.

After this, data scientists build machine learning models, the components that will adapt to the data in order to predict or classify aspects of it. Intuitively enough, this step is called model building, and it involves choosing and arranging the right algorithm to process the data. It sounds simple, and for some models it is. It can also be hauntingly difficult, leading to gigantic models like DeepMind’s AlphaGo or OpenAI’s GPT-3.

The models are then evaluated against one or more performance metrics, which is where we decide whether the results are “good enough” (whatever that means) and whether they can be improved. Insights also emerge here, sometimes unexpected correlations with fields we never thought to look at.

Finally, the model is put into production and starts acting in the real world. This is called deployment. At this point, we might collect more data and start the cycle again.
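To make the map a bit more concrete, here is a minimal sketch of these stages in code. It uses scikit-learn and a synthetic dataset purely for illustration; nothing about the library or the data is prescribed by the pipeline itself.

```python
# A minimal sketch of the pipeline stages described above, using scikit-learn
# on a synthetic dataset. Library and data choices are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection (here: generated instead of scraped or gathered)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 2. Data exploration / transformation (scaling features into a learnable form)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Model building
model = LogisticRegression().fit(X_train, y_train)

# 4. Model evaluation
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Deployment would wrap `model.predict` behind an API and feed new,
#    real-world data back into step 1.
```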

With our map at hand, let’s dive deeper into each of these steps and discuss what dangers and ethical challenges lurk ahead.

1. Data collection

1.1. Data privacy

If we’re going to build a model, we need some data for it to learn from. Fantastic. Where should that data come from? It is entirely possible that the sources are not legal, or that they raise data privacy concerns. Web scraping is not illegal, but is it ethical? Did the users consent to this use of their data? Should they have to? Some websites, like StackOverflow, address this explicitly by releasing their data under reusable licenses.

This issue goes hand in hand with changes that make users’ privacy defaults more lax, driven by the growing push for social networks to be more interconnected. Getting data out of Facebook today, for instance, is a lot easier than it was 10 years ago. We know that’s the case, since a huge nationwide scandal happened based on that fact alone.

Even if data engineers are able to get the data, should they? Should users, system admins, legal teams and webmasters be notified and consent to its use?

The answer is not yet clear, as there are many different kinds of data online.

In most cases, models don’t really need to identify users, so we’re talking about anonymized data. But anonymized doesn’t mean private. What guarantees can we give that an anonymized data point won’t end up identifying someone anyway? We need to consider that data re-identification is a full-blown discipline, and it gets real results.
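As a toy illustration of why anonymization alone is a weak guarantee, here is a sketch (with invented columns and values) of the standard first check: counting how many rows are unique on their quasi-identifiers, and therefore matchable against some outside dataset.

```python
# Toy illustration of re-identification risk: the columns and values are
# hypothetical, but the idea is standard. Even without names, combinations of
# quasi-identifiers (ZIP code, birth year, gender) can single people out.
import pandas as pd

anonymized = pd.DataFrame({
    "zip_code":   ["10001", "10001", "10002", "10002", "10003"],
    "birth_year": [1985,    1990,    1985,    1985,    1972],
    "gender":     ["F",     "F",     "M",     "F",     "M"],
    "diagnosis":  ["flu",   "flu",   "asthma", "flu",  "diabetes"],
})

quasi_identifiers = ["zip_code", "birth_year", "gender"]
group_sizes = anonymized.groupby(quasi_identifiers).size()

# Any group of size 1 is a row that a single outside record
# (say, a public voter roll) could match exactly.
unique_rows = (group_sizes == 1).sum()
print(f"{unique_rows} of {len(anonymized)} 'anonymous' rows are uniquely identifiable")
```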

1.2. Expert exploitation

There is a particular family of learning algorithms called supervised learning. These algorithms take examples with known results and learn how to get from the data points to those results. To train these models, we need to produce the results first, an activity usually called labeling or tagging.

The work needed to tag datasets is sometimes easy, but not always. You might have done it yourself unknowingly when you transcribed the words in an image and helped Google’s OCR. Sometimes, the initial work requires legal knowledge, or deep scientific training, or whatever expertise your industry of choice demands.
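For the sake of concreteness, here is a minimal, hypothetical sketch of what tagging enables: humans attach the answers (“tags”) to a handful of examples, and a supervised model learns to reproduce them on new data.

```python
# A minimal, hypothetical example of supervised learning: humans first "tag"
# each example with the answer, and the model learns to reproduce those tags.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["loved the product", "terrible support", "works great", "broke in a day"]
tags  = ["positive",          "negative",         "positive",    "negative"]  # human-provided labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, tags)
print(model.predict(vectorizer.transform(["great support"])))  # prints the predicted tag
```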

Creating these datasets is time consuming, and industry experts often get stuck making them. To lower costs, companies like Amazon have turned these expert-level jobs into streams of minimal, optimized tasks, something we could call a data factory. This has raised concerns over how little people are paid for their time, and over how well they are treated when the people involved are actual subjects in a study.

This problem is not new to AI, but it has blown up immensely because AI is data-hungry and attracts investment, and it is what most companies, major and minor, are turning to. Before services like Mechanical Turk, our options were limited to a few online surveys, but now there are plenty of alternatives around. Some of them try to address the ethical concerns behind the Mechanical Turk approach, while others just replicate it.

1.3. Bias

There are several measures of what makes data “quality” data. But let’s shine a different light on this problem to make it easier to understand. Let’s say we build a system to predict who might be a good candidate for our workforce, based on their resumes. After using it for a while, we identify a disturbing trend: it rejects women a lot more often than men.

If AI models could talk (they can, it’s called explainability and we’ll dive into it later), ours might say: “I learned from a bunch of examples and they were 80% men and 20% women – hence, men are better suited for your workforce.” Your intentions are good, little model, but you’ve got it wrong.

The disturbing realization is not that the model is sexist, but rather that it’s magnifying the underlying bias in our initial dataset. And it’s hard to fix biases in data, because we might not even be aware of them. Have we accounted for differences in gender? Race? Age? What about location? What about socioeconomic class? Are neurotypical low-income latinos represented in the same fashion as Crazy Rich Asians (2018)? What about… You get the idea.
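One common first check for this kind of bias is to compare selection rates across groups. The sketch below uses made-up numbers and the informal “four-fifths rule” heuristic; real fairness audits go much further than this.

```python
# A sketch of one common first check: compare the rate at which the model
# selects candidates from each group. All numbers are made up.
import pandas as pd

results = pd.DataFrame({
    "gender":   ["M"] * 80 + ["F"] * 20,
    "selected": [1] * 40 + [0] * 40 + [1] * 4 + [0] * 16,
})

rates = results.groupby("gender")["selected"].mean()
print(rates)

# The "four-fifths rule" heuristic: flag the model if any group's selection
# rate falls below 80% of the highest group's rate.
disparate_impact = rates.min() / rates.max()
print(f"disparate impact ratio: {disparate_impact:.2f}")  # 0.20 / 0.50 = 0.40 here
```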

This issue is not just theoretical; it causes actual problems in real life. Predictive policing algorithms are the clearest example, and one of the worst cases in practice. Companies interested in predicting crime use police records to flag problematic areas, with the objective of being proactive and knowing where to police more. But the existing bias in policing (as shown in research and even FBI reports) means certain neighborhoods are already patrolled more, which leads to more reports, which leads to the algorithm labeling those areas as more dangerous, which leads to further patrolling and further bias.

This is a sneaky, dangerous problem. It might be a long time before we even know it exists. Back to the example of the sexist model, instead of taking for granted that men and women are equal, we could mistakenly accept the insights learnt by the model. “Ah yes, this proves that men are really better suited for an office job.” It sounds silly because it’s obviously wrong. Sometimes it’s more subtle and hence more dangerous: “Ah yes, this proves this black neighborhood is more dangerous, as we always suspected.” This sort of bias is a serious problem.

2. Data exploration

2.1. Privacy

When we explore the data (maybe looking for biases?), we will inevitably go through data points that we arguably should not be looking at. Earlier we explored the concerns of obtaining and storing that data; now we have other people with their hands in it.

Let’s pretend, for example, that we have gathered location information for some kind of mapping application, and that we have already cleared the ethical and legal hurdles of obtaining and storing that data. To make the example more concrete, feel free to check one of these open datasets which came out of a quick Google search: [1], [2], [3]. Data scientists might find patterns in the data that they were not looking for. “Huh, guess what – this person leaves their house every day at 9 AM and makes a 30-minute stop before going to work.”

These “insights” are baked into the data that we (GPS users) chose to share, but we never knew those insights came with it. Did we agree to share that? Can data scientists even warn us about what they would find before actually finding it?
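To get a feeling for how little effort such an “insight” takes, here is a sketch over invented GPS pings: round the coordinates to a coarse grid and count repeated weekday-morning stops. The data and thresholds are assumptions for illustration only.

```python
# A sketch of how little effort such an "insight" takes: round raw GPS pings
# to a coarse grid and count repeated morning stops. Data is invented.
import pandas as pd

pings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-05-01 09:05", "2023-05-02 09:10", "2023-05-03 09:02",
        "2023-05-01 12:30", "2023-05-02 18:45",
    ]),
    "lat": [40.7421, 40.7423, 40.7418, 40.7580, 40.7128],
    "lon": [-73.9914, -73.9912, -73.9913, -73.9855, -74.0060],
})

morning = pings[pings["timestamp"].dt.hour.between(8, 10)].copy()
morning["cell"] = list(zip(morning["lat"].round(3), morning["lon"].round(3)))

# A location visited almost every morning at the same time is probably not random.
print(morning["cell"].value_counts().head(1))
```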

2.2. Insight rights

Once the insights are found, are data scientists under an obligation to let us know what they found? Should they warn authorities if they find evidence of illegal activities? Should they contact people when health is at stake? Are they obligated to share the benefits derived from those insights?

Google Maps is probably not required to guarantee the fastest trip, but should a DNA sequencing company get in touch about a possible finding correlated with cancer?

Currently, we, the AI builders, are not under an obligation to share these insights. We are not under an obligation to share the benefits of what we find. Again, this is one of those cases where what we’re doing is not illegal [2], but is it ethical?

3. Model Building

3.1. Proprietary Algorithms

The best text-to-speech model currently out there is not open source. It isn’t backed by a big company, and nobody knows exactly how it works. Nobody except its creator, who claims to be the only person behind it. This model, 15.ai, was trained on a dataset from the Pony Preservation Project, a huge effort by people around the world to preserve and replicate the voices of the My Little Pony characters.

It’s an amazing example of many people working toward the same goal, with one of them getting there first with amazing results while the rest are left in the dark. Anons from the Pony Preservation Project were never able to replicate the results of 15.ai: text-to-speech with faster-than-real-time processing, multiple-speaker support, context-aware pronunciation, dynamic and realistic emotions, and voices that can be cloned with less than 30 seconds of data.

This example seems trivial (unless you really miss Twilight Sparkle), but it is actually a common occurrence in the world of AI. Advances will push technology and science forward, but companies and institutions are reluctant to share those advances with the world. Sometimes because of the fear of abuse; sometimes because they would lose the competitive advantage they just acquired.

This has become such a problem that OpenAI was founded with the explicit objective of counteracting it. Founded as a non-profit, they strive to make great strides in AI and to make those advances open to the whole world. This way, if everyone shares the advances, we all win and nobody is left behind.

If you have been paying attention to the links in this article, you might have been surprised that OpenAI was the one that initially withheld the full version of its GPT-2 breakthrough. Even with good intentions, a breakthrough might look too dangerous to let out. We’ll talk about the societal impact later.

Is this really a problem, though? At this point, the challenge is that companies are not incentivized to share their progress but to benefit from it privately. It’s not a technical problem, but a business one. Most researchers work under funding from private enterprises; if they’re not working for a lab like FAIR (Facebook) or DeepMind (Google), they’re under the watchful eye of government institutions like CIFAR (Canada), NSF (US), etc.

3.2. Technological monopoly

But what about the competition? If a company happened to come across a breakthrough, it could release a product so good that it would simply wipe out the competition. Fortunately, today there is more than one company making great advances in research. Still, we are on the verge of a huge gap opening up.

This is real enough that the Future of Humanity Institute has proposed the Windfall Clause, a policy proposal meant to mitigate the risk of AI creating a gap so vast that some governments or companies outrun all the rest.

Today it is just that: a proposal. The same document acknowledges that there are risks and downsides that haven’t been completely addressed. In a very real sense, this is still an open problem.

4. Model evaluation

We’re getting a bit ahead of ourselves. Back to the scenario where we are building a model. Assuming we can measure and publish some of the research we’ve done…

4.1. Reproducibility crisis

…sometimes we don’t have enough resources to ensure that our experiments are perfectly reproducible. We might not have accounted for all the variables, which means we might be accepting, knowingly or unknowingly, the risk of some bias. Or we might just be focused on getting our product out the door, because that’s what gets us our paycheck.

While this is similar in concept to the technological monopoly problem we just explored, there is a subtle difference: this is not about the economic impact, but about the impact on science and knowledge itself. Think of the problems we explored with proprietary algorithms: the knowledge is only in the hands of those who built it. So if that’s the case… how are we ever going to reproduce those studies? Will we ever have independent sources confirming how solid these results are?

The answer is, sadly, no.

You might have heard the term “reproducibility crisis” before, outside the context of AI. It is not a problem specific to AI, but one facing the whole scientific community. To learn and improve, experiments must be run again and again, their workings twisted and turned, until we are sure of the conditions behind our results. Only then can scientific knowledge be settled.

A great business investment in AI does not have scientific progress high up on the list of priorities.

One final aspect of this reproducibility crisis: because most research is driven by… well, funding… and payments are tied directly to publications, researchers are forced to publish. Most of the time, research ends up in boring no-result findings (which is knowledge too!). But a publication titled “No link found between jelly beans and acne” doesn’t bring in the payments (see the explained version of the comic, which covers the ethical problem with forcing results in research).

4.2. False positives and false negatives

One thing that AI experts know and are not really proud to say is that AI is imperfect by definition. AIs are algorithms that learn to approximate a function (the “thing” we want them to learn) as closely as possible… but it’s always going to be an approximation. If we knew the exact result for all cases, there would be no need for an AI – we would just code the right answer and have a perfect system.

This means that an AI algorithm is always going to make mistakes. The whole process of creating these models focuses on reducing those errors, but it’s impossible to get rid of them all.

Because scientists tend to be very creative with naming, they called the first kind of errors (false positives) Type I errors. These are the mistakes that a model makes when it thinks it found what it was looking for, but it was wrong.

Following this creative stroke of genius, false negatives are Type II errors. This is the case where the model predicts that whatever it was looking for is not there, but it actually was.

Spoiler alert: we cannot make both of those measures 0% for a model at the same time.

Which is worse? It depends on the case and on how your model is designed. If you’re looking to detect cancerous cells, you want as few false negatives as possible. It’s better to mistakenly diagnose cancer, which leads to more in-depth testing and confirmation, than to mistakenly tell patients they’re safe and send them on their way.
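Here is a small sketch, with made-up prediction scores, of what that trade-off looks like in practice: lowering the decision threshold of a hypothetical cancer classifier removes false negatives, but only at the price of more false positives.

```python
# A sketch, with made-up scores, of the trade-off between error types:
# lowering the decision threshold removes false negatives but adds false
# positives. You can move errors around, but not eliminate both.
from sklearn.metrics import confusion_matrix

y_true   = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]            # 1 = cancerous
y_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.45, 0.7, 0.8, 0.9]

for threshold in (0.5, 0.4):
    y_pred = [1 if s >= threshold else 0 for s in y_scores]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold {threshold}: false positives={fp}, false negatives={fn}")
```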

5. Model deployment

5.1. Explainability

After the model is live, it starts making predictions and we use its results, probably as part of another system. That system may or may not be a computer system. An OCR feature inside a receipt-capturing application is a model integrated into a computer system. A loan-application process, where the final decisions are always made by a human, is still a system, a non-computer system in which the output of the model is the input the human works from.

When decisions are made, it is important that we know why. In the case of a loan-application system, there might be laws that require you, as a company, to justify why you deny loans to people. In the case of a cancer-detection imaging system, the reasons are what a doctor uses to decide whether the diagnosis makes sense. Situations like these are, literally, life or death decisions.

Some model algorithms are inherently more explainable than others. A few are still seen as “black boxes” – this was the case with neural networks until a few years ago. This means that sometimes, having a model that works comes at the cost of not knowing exactly why it works.

The ethical dilemma seems pretty obvious when put that way: if we cannot explain exactly why the predictions are what they are, we cannot challenge or correct them. And comparing them against the ground truth (that is, what we know for sure) won’t tell us much more, because the errors in the model might simply mirror the bias in the data. Meaning that if the model is failing successfully, we might never know it.
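For models that are not black boxes, an explanation can be as simple as reading the learned weights. The sketch below trains a logistic regression on invented “loan” features and prints its coefficients; the feature names and data are assumptions for illustration only.

```python
# A sketch of the simplest kind of explanation: for an inherently interpretable
# model such as logistic regression, the learned coefficients say which inputs
# pushed a decision up or down. Feature names and data here are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["income", "debt_ratio", "years_employed"]
X = rng.normal(size=(500, 3))
# Synthetic "approved" label that mostly depends on income and debt ratio.
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:>15}: {coef:+.2f}")

# Black-box models need post-hoc techniques (e.g. permutation importance,
# SHAP) to approximate this kind of answer.
```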

5.2. Societal Impact

As models start being used to make decisions and we incorporate them into our daily lives, or into the products we use, they start to have an impact on the general culture we live in. Like any new piece of technology that makes it into society, they can be used or misused.

The benefit of a technology like AI is making tasks that used to require “intelligence” of some sort easy for a computer. Auto-generating music, or recognizing speech (“Alexa, play Despacito”), sounds like the kind of application that benefits us all. But as beneficial tasks get simpler, so do malicious ones.

A common assumption in the digital age of media is that we can trust things we see with our own eyes. While a video could always be faked, acted, or manipulated, it used to be hard to do so, and hence unusual. Deepfakes are an example of how creating a video of someone who was never there is now easier than ever. Not even original video footage is needed; a single picture is enough.

It’s not just video that can be faked, but content in general. As algorithms get more powerful and more available, they can start generating all types of content, from music to fake news. And if you believe this wouldn’t go very far because the creator would eventually be caught, think bigger. Not even giants like Facebook, Google and Twitter can keep up with these issues, which sometimes take a toll on real people’s lives. You might want to see Destin Sandlin’s (SmarterEveryDay) investigations: Manipulating the Youtube Algorithm, The Twitter Bot Battle (Who is attacking Twitter?) and Who is manipulating Facebook? You’ll be amazed that attackers can get so sophisticated with such simple tools and do so much damage (sometimes, unfortunately, to the point of actual genocide).

All of this is AI-powered, even if it gives AI a bad name. That doesn’t mean AI is a tool for evil; it means AI is a powerful tool, and we need to deal with the ethical implications of its use.

5.3. Societal preparation

Sometimes tools are just too advanced for us to use - like knives in the hands of toddlers - because as a species we’re not quite ready to make good use of them. Insights come in all colors and shapes, and scientific progress is not always a good thing if we cannot handle it correctly. No, I’m not talking about atomic power, but that is a great example of how great things can go awry.

You might remember a study that made the news because it claimed to predict whether you were straight or gay from your photograph. That study has since been debunked (thanks to the reproducibility crisis goddess!), but what if it hadn’t been? Would we, mankind, be ready to handle the knowledge that a trait which in some places carries a death sentence could be predicted from a photograph? Not to mention that, because of the technological monopolies we discussed before, big corporations and governments would be the ones able to apply all of these tools, leaving individuals defenseless.

5.4. Data privacy (take 3)

We’ve already discussed why models are data-hungry during training. But like any algorithm, they also require input so that the output is meaningful. For instance, a real-estate price-estimating algorithm is useless unless you give it some information about the place you want it to appraise.

So models also need data to work, which is to be expected. But some algorithms need a constant stream of data to work properly.

An example of this is the tracking built into almost every web page in the world, used so that advertisers can predict the best ad to show each user. This is such a serious deal that GDPR laws have forced almost all of the internet to ask whether you want to be tracked. You might have been surprised to learn (I was too) that almost every site has some flavour of this tracking.

But that might be too abstract to grasp, so here’s an example that’s closer to home: how does Siri know when we utter the words “hey Siri,” amid everything else that we say? Well, “speech recognition” is only part of the answer. The other part is: Siri is always listening.
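A conceptual sketch of why this is so, under the assumption of a simplified wake-word design (the detector and the audio chunks below are simulated stand-ins, not Apple’s actual implementation): audio is buffered and scored locally all the time, and only the audio around a trigger is ever shipped off for full recognition.

```python
# A conceptual sketch of a wake-word loop. The audio chunks and the detector
# are simulated stand-ins; the point is the design: everything is scored
# locally, all the time, and only audio around a trigger leaves the device.
from collections import deque

def wake_word_score(chunk: str) -> float:
    # Stand-in for a small on-device model; here we just pattern-match.
    return 1.0 if "hey siri" in chunk.lower() else 0.0

def send_to_cloud_recognizer(buffered_chunks) -> None:
    print("uploading for full speech recognition:", list(buffered_chunks))

simulated_mic_stream = ["...tv noise...", "hey Siri", "what's the weather", "..."]
recent_audio = deque(maxlen=3)   # rolling buffer: the device is always listening

for chunk in simulated_mic_stream:
    recent_audio.append(chunk)
    if wake_word_score(chunk) > 0.5:
        send_to_cloud_recognizer(recent_audio)
```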

Aside from data privacy concerns (think smart TVs with always-on speech recognition), this has opened up a whole new world of attacks for hackers.

6. Overall

6.1. Regulation

Yet there are aspects to consider that do not fit anywhere in the pipeline we’ve imagined; laws, for instance.

So far we’ve seen how AI systems can do a lot of good, and a lot of evil, in our world. So, how do we ensure they are used for the right things?

Back to the age-old question: if a driverless car hits someone, who is to blame? This question is no longer hypothetical. Regulation should step in and define a responsible party. Laws are not in place just to point fingers, but also to make someone accountable for making things better.

To make things worse, making regulation is a slow process, while model building gets faster and more powerful by the day (GPT-3 has been replicated with 0.1% of the power, and cars can be driven with 19 neurons). It makes sense that regulation is slow: politicians consult experts, who in turn need to assess the state-of-the-art technology and explain how it works, what can turn out well and what can turn out badly. While the investigations are made, papers are written and sessions are held in congress, new breakthroughs keep happening. How are we going to set up fair, up-to-date laws that do not allow for loopholes and abuse? This is an open-ended question.

6.2. Human baselines

Pragmatism is also an ethical choice. We might not have perfect algorithms, but in some cases they have already surpassed human capabilities. Self-driving cars might not make perfect decisions, sometimes with dire consequences, but if they drive better than the average human… shouldn’t we use them? Algorithms for detecting cancer might not be perfect, but if they work better than the average doctor… shouldn’t we use them?

There are voices for and against these choices, some of them focused on the fact that we’re still not ready, some of them claiming that we need to get our regulatory practices straight first. But having a tool that improves our lives and not using it is an ethical decision. So is having a dangerous tool that can make our lives worse and deciding to use it without being prepared.

Moreover, for some of the discussions about ethics in AI, the point isn’t really about AI. “Who will a driverless car kill?” is actually a nonsensical question. Forget about AI: who would you kill? And this is not a hypothetical question. This happens. Every day.

6.3. Jobs

The creation of new technology and tools always brings forth the discussion of what will happen with existing jobs. Sure, new automation tools, especially intelligent tools, will make a lot of jobs disappear.

A superficial look at this suggests it is not a big issue, because while one set of jobs disappears (like the repetitive factory worker being replaced by a robot), new jobs appear (for instance, the supervisor or trainer of the robot).

But the jobs that disappear and the jobs that appear are not the same. The same thing happened with industrialization: a lot of manual, dull work was replaced by machines, giving rise to a new set of jobs controlling those machines. What happens to the people who do not know how to operate a machine?

“Just press this button and it works” is the kind of premise we see in movies, but in reality people need higher education levels, or specific training, to be able to take these new jobs. That works just fine in a society where everyone is educated and prepared for the future. But alas, we don’t live in that perfect society just yet.

And when the models themselves become so much better at what they do (see 6.2 Human baselines), how will we correct them? Even now, some of us fail the reverse Turing tests meant to prove to a machine that we are human.

Conclusion

Today we explored one systematic approach to identifying the ethical issues in AI, but there are other approaches.

This makes sense, since the ethics of AI is no small subject. It covers every aspect that any popular technology touches, from business to scientific discovery to human safety. There is no clear one-size-fits-all answer to the questions we posed today. But knowing what they are is a great first step toward making humane, ethical decisions.