We live in an increasingly digitalized world, where more and more of our day-to-day decisions are being made by algorithms in our cars, phones, computers, and TVs. AI touches almost every aspect of our lives, from smart self-learning home systems and assistive devices to simple shopping apps suggesting what we should buy based on our previously observed behavior.
One could argue that people have been using algorithms - mathematical rules and calculations - in decision-making for a long time, and they wouldn't be wrong; it just never happened at the scale that AI makes possible. With ever-growing datasets, vast computing power, and the ability to learn in an unsupervised manner, the number of decisions that AI can make goes way beyond the abilities of any human being. Its potential impact on people and societies is therefore greater, and the ethical concerns related to AI are more pressing than ever.
Ethics and Bias in AI
How can we define ethical AI systems? Here are some principles to consider: ethical AI systems deliver outcomes that are in line not only with their intended purpose but also with people's intent and with social and moral codes. They should benefit individuals, society, and the environment, and reduce the risk of negative outcomes. They should respect human rights, diversity, and the autonomy and privacy of individuals, and be reliable, inclusive, and accessible. Finally, they should not involve or result in any kind of unfair discrimination against individuals or groups, nor create or reinforce inequalities.
This sounds quite like what humanity has been trying to achieve for millennia, doesn't it? Which brings us to the key question: if we as humans can display intentional or unintentional bias, how can we expect a system programmed by us not to exhibit the same? Well, technically, we could. In AI, bias is considered an anomaly in the output of machine learning algorithms. It often arises when prejudiced assumptions are made in the process of developing algorithms, and most often when the training data contain bias. So it is fair to assume that an AI system is only as "good" as the quality of the data it is fed. If the training dataset is cleared of conscious and unconscious assumptions about concepts such as race, gender, and so on, we should be able to build an AI system that makes unbiased, data-driven decisions.
Avoiding Bias in Data
So, where do we start if we want to make input data bias-free? In one of the previous articles we covered the 9 most common types of data bias in Machine Learning. Specific actions can be taken to mitigate each one of them, but here we'll look at more general ways of preventing data-related biased decision-making.
1. Understand Scope and Limitations of Your Model and Data
Before gathering data, it is crucial to fully understand the scope of an AI experiment and its application. This includes understanding any societal or underlying scenarios which could potentially impact the outcome.
A model trained to find correlations should not be used to make causal inferences. For example, a model can learn that users in India who prefer to browse the web in English over Hindi are also likely to buy in English, but that model should not necessarily infer that they don’t speak Hindi (or another Indic language) or that they wouldn’t buy in Hindi. It might just be that the current availability of content is greater in English. Additionally, a dataset that is not reflective of present-day norms and scenarios should not be used to train current applications. A job-matching algorithm trained with historical data can assume female pronouns when translating words like “nurse” or “babysitter” into Spanish, and only return matches for female applicants.
2. Ensure That Your Dataset is Representative and Diverse
Proper data collection is perhaps one of the most impactful measures we can take to avoid bias in data. Data needs to be accurate and sampled in a way that represents all users and real-life settings. Are you building a model that needs to serve users of all ages, but you only have training data from Millennials? Then you'll want to collect additional data to also represent other age groups.
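As a quick sanity check before training, you can audit how well each group is represented in your dataset. Here is a minimal sketch in plain Python; the records, the `age_group` field name, and the 10% threshold are all illustrative assumptions, not a standard:

```python
from collections import Counter

# Hypothetical training records; "age_group" is an assumed field name.
records = [
    {"age_group": "18-34"}, {"age_group": "18-34"},
    {"age_group": "18-34"}, {"age_group": "35-54"},
    {"age_group": "55+"},
]

counts = Counter(r["age_group"] for r in records)
total = sum(counts.values())
for group, n in sorted(counts.items()):
    share = n / total
    print(f"{group}: {share:.0%}")
    # Flag groups that fall well below their expected real-world share.
    # The 10% cutoff is illustrative, not prescriptive.
    if share < 0.10:
        print(f"  -> {group} may be under-represented; collect more data")
```

In practice you would compare each group's share against census or user-base statistics rather than a fixed cutoff, but even a crude report like this surfaces obvious gaps early.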
Moreover, you should consider collecting data from multiple sources to ensure data diversity. In general, data coming from a single source is assumed to be weaker than data coming from multiple sources, and more likely to cause measurement bias. If you are training a chatbot, you'll probably want to use publicly available data, data generated by professional translators, and also good amounts of user-generated data to cover as many of the ways people may express themselves in a conversational setting as possible. Check out the domain-specific Colloquial dataset that TAUS created based on the sample provided by Oracle.
3. Employ Human Data Augmentation
If you are using your legacy data, public training datasets, or even data acquired for a specific use case, you will often need to augment it to better reflect real-world frequencies of gender, race, events, and attributes that your model will be making predictions about.
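One simple augmentation strategy along these lines is random oversampling: duplicating examples from under-represented groups until all groups are equally sized. This sketch uses toy data with an assumed `gender` attribute; real augmentation (e.g. collecting or generating genuinely new examples) is usually preferable to duplication, which can encourage overfitting:

```python
import random

random.seed(0)

# Toy labeled examples; "gender" is an assumed attribute to balance on.
data = (
    [{"gender": "female", "text": f"f{i}"} for i in range(2)]
    + [{"gender": "male", "text": f"m{i}"} for i in range(8)]
)

def oversample(rows, key):
    """Duplicate rows from minority groups until all groups match the
    size of the largest group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

balanced = oversample(data, "gender")
```

After balancing, both groups contribute the same number of rows, so the model no longer sees one group eight times as often as the other.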
Getting representative and balanced training and test datasets is not an easy task, so you might want to consider services such as domain-specific data collection, data annotation or labeling for the best outcomes. TAUS has a community of over 3,000 data contributors and specializes in data services to help you collect and prepare text, image, or audio datasets that fit your project specifications.
4. Continuously Evaluate Results and Refresh Test Sets
Any model that has been deployed should be thoroughly and continuously evaluated to assess the accuracy of its results. The results should be compared across different subgroups and stress tests should be used in cases where bias can potentially occur.
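Comparing results across subgroups can be as simple as computing accuracy per group instead of one global number. A minimal sketch, assuming each evaluated example carries a (hypothetical) `group` attribute alongside its true label and the model's prediction:

```python
# Hypothetical evaluation records: protected group, true label, prediction.
results = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 1, "pred": 1},
]

def accuracy_by_group(rows):
    """Return {group: accuracy} so per-group gaps become visible."""
    stats = {}
    for r in rows:
        correct, total = stats.get(r["group"], (0, 0))
        stats[r["group"]] = (correct + (r["label"] == r["pred"]), total + 1)
    return {g: correct / total for g, (correct, total) in stats.items()}

acc = accuracy_by_group(results)
print(acc)  # e.g. {'A': 1.0, 'B': 0.5} for the toy data above
```

A global accuracy of 75% here would hide the fact that group B is served far worse than group A; that gap is exactly what subgroup evaluation and stress tests are meant to expose.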
As new features are added and the systems are further developed, you might want to consider using new or refreshed test sets that cover the new real-world scenarios.
Technology Can Help
Being aware of potential bias in data and actively taking preventative measures against it can help you build systems that generate equally representative outputs across scenarios. We can't expect technology to create ethical systems or make moral judgments by itself when even humans often cannot collectively agree on an ethical judgment, as seen in the Moral Machine experiment. But just as AI technologies can amplify poor decision-making, similar technologies can be applied to help identify and mitigate these risks. You can use Google's What-If Tool or IBM's open-source AI Fairness 360 toolkit, or reach out to TAUS for custom data solutions tailored to your specific use case.
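To make this concrete, one fairness metric that toolkits like AI Fairness 360 report is disparate impact: the ratio of favorable-outcome rates between an unprivileged and a privileged group, with ratios below 0.8 often flagged under the "four-fifths rule". The plain-Python sketch below illustrates the metric itself, not the toolkit's API, and the group labels and outcomes are made up:

```python
def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of favorable-outcome rates between two groups.
    A value near 1.0 suggests parity; below ~0.8 is often flagged."""
    def favorable_rate(g):
        selected = [o for o, grp in zip(outcomes, groups) if grp == g]
        return sum(selected) / len(selected)
    return favorable_rate(unprivileged) / favorable_rate(privileged)

outcomes = [1, 0, 0, 1, 1, 1]            # 1 = favorable decision
groups   = ["f", "f", "f", "m", "m", "m"]  # hypothetical group labels
ratio = disparate_impact(outcomes, groups, "f", "m")
print(f"disparate impact: {ratio:.2f}")
```

Here the unprivileged group receives favorable outcomes a third as often as the privileged one, which a fairness audit would flag for investigation before deployment.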