Can algorithms reinforce our biases?
Arun Krishnan

A recent Harvard Business Review (HBR) interview on the misuse of algorithms got me thinking about whether the algorithms we use can reinforce our biases. Before I explain why I feel that way, let me describe how machine learning algorithms classify datapoints into multiple classes. To do so, I will first step back and explain what classification means.

Classification is the task of assigning a collection of objects to separate, predefined groups. Say an organization wants to build a prediction engine that can separate excellent performers from average and unacceptable ones. This is a classification task, since the algorithm is required to go through the list of employees and put each of them into one of three buckets, viz., excellent, average and unacceptable. To do this, data scientists would use a machine learning approach. Simply put, machine learning algorithms look for patterns within datasets and learn the patterns corresponding to each of the buckets into which they need to classify items.
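
To make this concrete, here is a minimal sketch in Python using scikit-learn. The two features (years of experience and an appraisal score) and all the numbers are made up purely for illustration; a real performance model would use far richer data.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is one (hypothetical) employee: [years_of_experience, appraisal_score]
X = [
    [8, 4.7], [6, 4.5], [7, 4.8],   # excellent performers
    [4, 3.2], [5, 3.5], [3, 3.1],   # average performers
    [2, 1.8], [1, 2.0], [2, 1.5],   # unacceptable performers
]
y = ["excellent"] * 3 + ["average"] * 3 + ["unacceptable"] * 3

model = DecisionTreeClassifier(random_state=0).fit(X, y)  # learn a pattern for each bucket
print(model.predict([[7, 4.6]]))                          # -> ['excellent']
```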

Hence, using our example, the algorithm would learn the patterns for excellent, average and unacceptable performers. To do this, these algorithms require a lot of historical data. Essentially, we need to 'show' the algorithm examples of excellent, average and unacceptable performers. This process is known as 'training' the algorithm. It is similar to how we teach kids the alphabet: we show them the letters again and again and correct them when they get it wrong, so that ultimately they are able to recognize the characters. This is exactly what happens when we train our algorithm. Training allows the algorithm to identify 'signatures' for each of these buckets. When it is then presented with a new datapoint, it compares the new data against the learnt patterns, and if there is a lot of similarity between the two, it assigns the new datapoint to the corresponding bucket.
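
One simple way to picture these 'signatures' is as the average profile of each bucket: a new datapoint is assigned to whichever signature it most resembles. The toy sketch below illustrates that intuition with made-up numbers; it is not how every algorithm works, just a way to build the mental model.

```python
import numpy as np

# Learnt "signatures": here, simply the average profile of each bucket
signatures = {
    "excellent":    np.array([7.0, 4.7]),
    "average":      np.array([4.0, 3.3]),
    "unacceptable": np.array([1.7, 1.8]),
}

def classify(point):
    # Assign a new datapoint to the bucket whose signature it is closest to
    point = np.array(point)
    return min(signatures, key=lambda bucket: np.linalg.norm(point - signatures[bucket]))

print(classify([6.5, 4.4]))   # -> excellent
```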

So what can go wrong, you ask? To train the algorithm, we need to provide it with a lot of examples to learn from. Let us assume that we are looking at ABC Inc, a traditional and conservative business based out of Bangalore. Most of the people employed at ABC come from the southern part of the country. Moreover, given the conservative attitudes of their managers, women are under-represented in the organization in general and at higher levels in particular. ABC now wants to build a model of the best people to hire, and for this, they have made available their historical employee database. This is where the problem lies. All their biases are part of their historical dataset, so any model trained on this dataset will also inherit these biases. As an extreme case, if we now have a woman applicant from Delhi, she might well get filtered out by the algorithm, since she doesn't fit the typical profile the model has been trained on. While this is a gross simplification of the issue, it does illustrate the pitfalls of building models on historical data without taking into account the biases inherent in that data.
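
To see the mechanism at work, here is a deliberately exaggerated, hypothetical example. In the training data below, every past hire is a man from the south, so a simple classifier learns gender and region as if they were predictive of being a good hire and filters out an equally qualified woman applicant from Delhi. The features and numbers are invented purely for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Features: [is_woman, is_from_south, qualification_score] -- all hypothetical
X_train = [
    [0, 1, 0.90], [0, 1, 0.70], [0, 1, 0.80], [0, 1, 0.85],   # past hires: all men from the south
    [1, 0, 0.90], [1, 1, 0.80], [0, 0, 0.85], [1, 0, 0.75],   # equally qualified, but not hired
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = hired, 0 = not hired

model = LogisticRegression().fit(X_train, y_train)

# A highly qualified woman applicant from Delhi
applicant = [[1, 0, 0.95]]
print(model.predict(applicant))   # -> [0]: filtered out, despite her qualifications
```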

Given that most organizations will have some biases, be they about gender, age or educational institutions, how can they build predictive models if these biases are implicitly baked into any model built on their data? There are no easy answers here. One way is to utilize publicly available datasets, or to pool with other organizations in the space to obtain datasets that normalize the biases. Whatever the approach, any data scientist worth her salt ought to be aware of these built-in biases and look for them before undertaking any model building.
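
One simple place to start looking is the training data itself, before any model is built: for instance, comparing historical selection rates across groups. The check below is only a sketch, with hypothetical column names and numbers; real bias audits go much further.

```python
import pandas as pd

# Hypothetical slice of ABC's historical hiring data
history = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "M", "F", "M", "F"],
    "region": ["South", "South", "South", "North", "South", "South", "North", "North"],
    "hired":  [1, 1, 1, 0, 1, 0, 0, 0],
})

# Historical selection rate per group; large gaps flag biases a model would inherit
print(history.groupby("gender")["hired"].mean())
print(history.groupby("region")["hired"].mean())
```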
