The Double-Edged Sword
Machine learning and big data can be thought of as a double-edged sword. On one hand, they have the potential to bring about significant positive change and innovation. Machine learning algorithms can be used to improve healthcare outcomes, increase efficiency in manufacturing, and even help to predict and prevent crime. Big data can help organizations make better-informed decisions by providing a wealth of information that can be analyzed to identify patterns and trends. On the other hand, a lack of ethical consideration can cause serious harm, ranging from privacy breaches to discriminatory bias in algorithms.
In 2016, Microsoft released a bot on Twitter called "Tay" with the goal of having it learn from interactions with users and become smarter over time. However, the experiment quickly went awry as users began using the bot to post offensive and inflammatory statements. Within hours of its release, Tay had posted a series of inappropriate and controversial tweets, including statements promoting hate speech and supporting genocide. After only 16 hours, Tay was taken offline indefinitely.
The bot was designed to learn and adapt based on its interactions with users, and it quickly became apparent that it was learning all the wrong things. Despite attempts by Microsoft to intervene and delete the offensive tweets, the damage had already been done, and the bot's reputation was irreparably damaged.
Many have criticized Microsoft for not adequately anticipating the risks of creating a bot designed to learn from its interactions with users. While the company has stated that it did not anticipate the extent to which users would attempt to manipulate the bot, some have argued that more caution should have been exercised in the development and release of Tay.
The incident with Microsoft's bot Tay highlights the potential bias and ethical implications of using big data in the development of artificial intelligence. In this case, the bot was designed to learn from its interactions with users, and it quickly became apparent that it was learning and adapting based on the biased and inappropriate input it received. This illustrates the importance of ensuring that the data used to train and develop artificial intelligence systems is representative and free from bias.
A study published in the Journal of Personality and Social Psychology in 2017 claimed that deep neural networks (DNNs) were more accurate than humans at detecting a person's sexual orientation from photographs of their face. The study used a dataset of 35,000 facial images and found that the DNN was able to classify sexual orientation with an accuracy of 91%, while human participants achieved an accuracy of only 61% (1).
This study highlights several ethical concerns related to the use of data science and machine learning in sensitive areas such as sexual orientation. Firstly, there is a risk that the algorithms used in this type of research could be biased. If the data used to train the algorithm is itself biased, then the algorithm will likely produce biased results. This is a common problem with machine learning algorithms, and it is important for researchers to be aware of this risk and to take steps to mitigate it.
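One concrete mitigation step the paragraph above implies is simply auditing how groups are represented in a training set before any model is fit. The sketch below is a minimal, hypothetical illustration (the `group` attribute and the toy records are invented for the example), not a description of the study's actual data:

```python
from collections import Counter

def representation_report(records, attribute):
    """Return each group's share of the dataset for a given attribute,
    so badly under-represented groups can be spotted before training."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical training set, heavily skewed toward one group.
training_set = (
    [{"group": "A", "label": 1}] * 90 +
    [{"group": "B", "label": 0}] * 10
)

shares = representation_report(training_set, "group")
print(shares)  # group B makes up only 10% of the data
```

A real audit would cover every sensitive attribute and its intersections, but even this simple share check catches the most obvious skews.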
Secondly, the use of machine learning algorithms to predict a person's sexual orientation raises serious privacy concerns. The idea that a person's sexual orientation could be predicted based on their appearance is deeply troubling, and could potentially lead to discrimination and abuse.
In a published critique of the study, Mitchell and Agüera y Arcas described several methods they used to demonstrate the limitations of these algorithms. One method was to test the robustness of the algorithms by introducing small changes to the facial images used as input. They found that these small changes had a significant impact on the output of the algorithms, indicating that the algorithms were not robust and were sensitive to small variations in the input data (2).
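The perturbation test described above can be sketched in a few lines. The classifier here is a deliberately brittle toy (it keys on a single pixel, which is an assumption made purely for illustration, not the study's model), but it shows the shape of the test: perturb the input slightly and check whether the prediction flips.

```python
import numpy as np

def brittle_classifier(image):
    # Toy stand-in for a fragile model: decides from one pixel value.
    return 1 if image[0, 0] > 0.5 else 0

rng = np.random.default_rng(0)
image = rng.random((8, 8))
image[0, 0] = 0.51  # sits just above the decision threshold

original = brittle_classifier(image)
perturbed = brittle_classifier(image - 0.02)  # tiny uniform darkening

# A robust model would give the same answer; this one flips.
print(original, perturbed)
```

The same harness applies to a real model: sweep many small perturbations (brightness, crops, noise) and report how often the prediction changes.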
Another method that they used was to test the generalizability of the algorithms by using a different dataset to the one that was used to train the algorithms. They found that the algorithms performed poorly on this new dataset, indicating that they were not generalizable and could not be relied upon to produce accurate results.
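A generalizability check like the one just described boils down to fitting on one data distribution and scoring on a shifted one. The sketch below uses invented one-dimensional Gaussian data and the simplest possible model (a midpoint threshold) purely to make the accuracy drop visible; it is not the algorithm from the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Training" distribution: class 0 centered at 0, class 1 centered at 2.
x_train = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(2, 0.5, 200)])
y_train = np.array([0] * 200 + [1] * 200)

# Simplest model: threshold at the midpoint of the two class means.
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

def accuracy(x, y):
    return ((x > threshold).astype(int) == y).mean()

in_dist_acc = accuracy(x_train, y_train)

# A shifted "new" dataset: same labels, but every value moved by +1.5,
# mimicking a dataset the model was never calibrated for.
x_new = x_train + 1.5
out_dist_acc = accuracy(x_new, y_train)

print(f"in-distribution accuracy:     {in_dist_acc:.2f}")
print(f"out-of-distribution accuracy: {out_dist_acc:.2f}")
```

A model that only performs well on data resembling its training set, as the shifted evaluation exposes here, cannot be relied upon in deployment.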
In their conclusions, Mitchell and Agüera y Arcas emphasized the need for caution when interpreting the results of studies that claim to be able to predict sensitive traits such as sexual orientation using machine learning algorithms. They pointed out that the relationships between facial features and these traits are complex and nuanced, and that it is unlikely that a simple algorithm could accurately predict these traits based on a single photograph.
Importing phrenology-style reasoning into data science, as in the aforementioned study, is unethical: the practice has no scientific basis and has been shown to be flawed and discriminatory. Historically, it led to biased and unfair decisions based on false assumptions about an individual's characteristics. It is important to recognize the harm that such practices can cause and to avoid using them in any context, especially data science.
In short, ethics in data science is not a luxury but a necessity. Data scientists must think critically about the potential consequences of their work and act responsibly to avoid negative impacts on society. This means taking the time to ensure that the data used is diverse and representative, and resisting the temptation to rush into experimentation without fully understanding the potential consequences. By being mindful of the potential for harm and taking steps to prevent it, data scientists can use their skills and expertise to make a positive impact on society rather than causing harm through biased or unethical data practices. It is everyone's responsibility as professionals in this field to ensure that data is used ethically and for the greater good, rather than contributing to systemic discrimination. Like a double-edged blade, data science can be a force for good or a source of harm, depending on how it is wielded.