Blog 8: Harms in Machine Learning
Case Study: Understanding Potential Sources of Harm throughout the Machine Learning Life Cycle
This case study discusses how machine learning can cause harm in different ways. It summarizes seven sources of harm, explains how to detect each one, and suggests ways to mitigate the problems. These harms can affect groups of people significantly, so they are important to understand and examine whenever you build a machine learning model. I think many developers don’t think deeply about the impacts of their technology, so this case study is a good baseline for developing a framework for ethical development.
Here is a checklist of the seven sources of bias and how you can detect each one.
- Historical Bias: Change someone’s background to see if the algorithm changes its response.
- Representation Bias: See if everyone is accounted for in the data. Brainstorm every criterion or group that could exist.
- Measurement Bias: See if the measurement means different things for different groups. You could also validate the algorithm against existing data to see how accurate it is.
- Aggregation Bias: Check to see if it really is a one-size-fits-all, or if the model affects different groups differently.
- Learning Bias: See how minimizing or maximizing one variable affects others, and compare the performance across different groups.
- Evaluation Bias: Compare the population your evaluation data comes from with the population you actually want to predict on. Make sure they are similar and you aren’t extrapolating.
- Deployment Bias: Check whether the model is being used the way it was intended. Make sure people understand what it is doing, and aren’t taking its output and drawing a different conclusion from it.
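Several of the checks above, especially the ones for aggregation, learning, and evaluation bias, boil down to comparing a model's performance across groups. Here is a minimal sketch of what that comparison could look like; the groups, predictions, and the 10% threshold are all made-up placeholders, not values from the case study.

```python
# Sketch of a "compare performance across groups" check.
# All groups, predictions, and thresholds here are hypothetical.

def accuracy_by_group(records):
    """records: list of (group, prediction, true_label) tuples."""
    totals, correct = {}, {}
    for group, pred, label in records:
        totals[group] = totals.get(group, 0) + 1
        if pred == label:
            correct[group] = correct.get(group, 0) + 1
    return {g: correct.get(g, 0) / totals[g] for g in totals}

# Hypothetical model outputs: (group, model prediction, true label)
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 0, 1), ("group_b", 1, 0),
]

per_group = accuracy_by_group(results)
print(per_group)

# A large accuracy gap between groups is a red flag for aggregation
# or evaluation bias; 0.1 is an arbitrary illustrative cutoff.
gap = max(per_group.values()) - min(per_group.values())
if gap > 0.1:
    print(f"Warning: accuracy gap of {gap:.0%} across groups")
```

Overall accuracy for this fake data would look mediocre but plausible; splitting it by group shows one group is served far worse, which is exactly what an aggregate metric hides.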
Are these the only possible harms of machine learning? No. This study does not cover environmental harms, which are very apparent today. Resources like water are being consumed, and even small AI interactions cost a lot of it. This not only impacts our environment but has been seen to affect certain groups disproportionately. One example is indigenous communities having to fight to keep their land free from data center construction. I have friends who have lived near huge data centers and had their environment put at risk by things like energy use and pollution. These harms cause people to distrust technology and make them want to move away.
There are many important questions you should ask yourself to fully understand the effects of a model that will have an impact on people. Here are some of them.
- Does the data imitate past assumptions or issues?
- Are all possible groups included in the data?
- Does the measurement mean the same thing for every group?
- Can the model affect different groups in a different way?
- Can the model affect smaller groups disproportionately?
- Is the model being used on the same population the data comes from?
- Is it being used for the intended purpose?
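Two of these questions, whether all groups are included in the data and whether the model is used on the same population the data comes from, can be sanity-checked by comparing group proportions. This is only a rough sketch under assumed numbers; the group names, counts, and the 5% threshold are invented for illustration.

```python
# Sketch of a representation check: compare each group's share of the
# training data with its share of the deployment population.
# All group labels, counts, and thresholds below are hypothetical.

def proportions(counts):
    """Turn raw group counts into fractions of the total."""
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

# Made-up group counts in the training data vs. the population
# the model will actually be used on.
train_counts = {"group_a": 800, "group_b": 150, "group_c": 50}
deploy_counts = {"group_a": 500, "group_b": 300, "group_c": 200}

train_p = proportions(train_counts)
deploy_p = proportions(deploy_counts)

for group in deploy_p:
    diff = deploy_p[group] - train_p.get(group, 0.0)
    if abs(diff) > 0.05:  # arbitrary illustrative threshold
        print(f"{group}: {train_p.get(group, 0.0):.0%} of training data "
              f"but {deploy_p[group]:.0%} of deployment population")
```

A mismatch like this does not prove the model is harmful, but it flags exactly the extrapolation the evaluation-bias and deployment-bias questions warn about, and it is cheap to run before any model is trained.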
Now, after reading this case study, I wonder how we can ever be sure that we have accounted for every possible nuance of a population. Since we can never represent a population 100% accurately unless we have data on every single person, at what point is a model ‘good enough’?
This is a common understanding in statistics, and it shows that we can never use prediction models to predict something with absolute certainty. We know it would be nearly impossible to get data from every single person in a large population, so this raises the question of what level of representation is fair enough to the majority of the population and the subgroups within it.
This case study delved into the effects of machine learning models, and it was shocking how easy it is to build a biased model. I personally would not want part of my identity to be used against me in an algorithm or model and to put me at an unfair disadvantage. There are many things to think about when using a model, especially when gathering the data for it to train on. It is imperative that the people who build these models think about these things, so that we don’t continue to cause the same issues. These models can easily perpetuate bias and discriminate against groups in ways the developer never intended, meaning we must be extra careful when crafting both the data and the models. Attention to detail and being perceptive to possible outcomes is crucial in the age of AI.
