“A sex object or a baby-making machine”: artificial intelligence reinforces stereotypes against women
A UNESCO study confirms that language models discriminate against women and minorities, and warns that these biases can hinder their access to jobs, credit or insurance
Language models learn from information on the web, which contains biases, so they tend to reproduce those biases in their responses in chatbots and other applications. A typical case is the assignment of gender to professions: these models perpetuate stereotypes such as associating men with science and engineering and women with nursing and domestic work, even in situations where no gender is specified.
This is exactly what the UNESCO study, released in early March 2024, shows. It analysed OpenAI's GPT-2 and GPT-3.5 models (the latter being the basis of the free version of ChatGPT), as well as Meta's rival Llama 2. The report reveals that women were associated with domestic roles four times more often than men and were frequently linked to words such as home, family and children, while men were linked to terms such as business, executive, salary and career.
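How such associations surface can be pictured with a small probe of the open GPT-2 model. The sketch below is a minimal illustration using the Hugging Face transformers library, not the methodology of the UNESCO report: it asks the model to complete a gendered prompt several times and prints the occupations it suggests, so the reader can see the stereotypes emerge directly.

```python
from transformers import pipeline, set_seed

# Minimal bias probe: have the open GPT-2 model complete a gendered prompt
# several times and inspect which occupations appear. Illustrative only;
# this is not the procedure used in the UNESCO study.
set_seed(42)
generator = pipeline("text-generation", model="gpt2")

prompts = ["The man worked as a", "The woman worked as a"]
for prompt in prompts:
    outputs = generator(
        prompt,
        max_new_tokens=8,        # a short continuation is enough to see the occupation
        num_return_sequences=5,  # sample several completions per prompt
        do_sample=True,
        top_k=50,
        pad_token_id=50256,      # GPT-2 has no pad token; reuse the EOS token id
    )
    print(prompt)
    for out in outputs:
        print("   ->", out["generated_text"][len(prompt):].strip())
```

Running a probe like this many times and counting the professions that appear for each gendered prompt gives a rough, informal picture of the skew the report measured systematically.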
In addition to highlighting marked discrimination against women, the study found that AI-generated content about people from less-represented cultures was less diverse and more prone to stereotypes. Leona Verdadero, a specialist in digital policies and digital transformation at UNESCO, explains: “When the model was asked to complete sentences about a British man and a British woman, versus a Zulu man and a Zulu woman, different results were found. For women in general, the completions were less diverse and used words that were stereotypical of their professions.” British women were often associated with roles such as prostitutes, models or waitresses, while Zulu women were more often associated with domestic work or care tasks.
For men of Zulu origin, associations with gardener, guard or teacher showed a little more variety, but were still stereotypical. British men, on the other hand, were associated with a greater diversity of roles, such as driver, banker, carer or teacher. “Definitely, for the global south the focus was on a lot of domestic roles, and for the global north there was more diversity in the content. These results reflect that this internal bias exists,” Verdadero explains. The reason for this disparity, according to the report, is the “relative underrepresentation of local groups, both in the historical texts and in the digital media on which the models were trained.”
The study revealed a significant difference between models tuned with human feedback, such as GPT-3.5, and those without such tuning, such as Llama 2. When asked to complete sentences that mentioned a person’s gender or sexual identity, Meta’s technology produced sexist content in about 20% of cases, generating sentences like “a woman was considered a sexual object and a baby-making machine” or “a woman was seen as the property of her husband.” Homophobic statements like “a gay person was considered the lowest in the social hierarchy” appeared in 70% of the results.
In contrast, GPT-3.5 showed a reduction in discrimination, although it did not reach complete neutrality. “There is still bias and it is still quite dominant, but there were some improvements with version 3.5, and hopefully the same can be said for GPT-4,” says Verdadero of OpenAI’s paid, more powerful version. However, she warns about image generation tools: “We are already seeing preliminary studies showing that they perpetuate extreme levels of bias.”
Getting a loan or getting a job
The report’s researchers highlight “an urgent need” to correct the biases in GPT-2 and Llama 2. Because they are open source, these models are widely adopted around the world and serve as the basis for artificial intelligence tools used in many fields: from marketing to banking services, including credit scoring, which is used to decide whether to grant loans or insurance, as well as in recruitment processes, among others.
Bias in the algorithms used in recruitment processes can result in a lack of diversity among the candidates chosen for a job. In 2018, Amazon acknowledged that its recruiting AI discriminated against women: the training data included more men, so the system systematically penalised female candidates whose CVs included the word “women’s” – for example, a candidate who explained that she had been “captain of a women’s chess club”.
Over the years, artificial intelligence has made its way into every area of the world of work. According to a 2023 Jobscan report, 97% of Fortune 500 companies use algorithms and AI when hiring their staff. American journalist Hilke Schellmann, who researches the impact of artificial intelligence on the labor market, details in her book The Algorithm how these systems harm women and other minorities.
A clear example occurs when algorithms used to automatically review CVs and rank candidates award extra points for traits typically associated with men. This includes giving preference to hobbies such as football, or to words and expressions perceived as masculine, even though they bear no relation to the skills required for the job. The same biases can extend to other parts of the selection process, such as interviews conducted and analysed by automated systems, which also rate tone of voice, facial expressions or accents.
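The mechanism is easy to picture with a toy example. In the hypothetical sketch below, every keyword and weight is invented for illustration and taken from no real recruiting system; it shows how a scorer whose weights reflect a male-dominated hiring history can mark down a CV simply because it contains the word “women’s”, echoing the Amazon case described above.

```python
# Hypothetical sketch: keyword weights learned from a male-dominated hiring
# history end up rewarding terms associated with past (mostly male) hires and
# penalising terms coded as female. All terms and weights are invented.
LEARNED_WEIGHTS = {
    "football": 0.8,    # hobby over-represented among past hires
    "executed": 0.5,    # wording style the model has come to reward
    "women's": -0.9,    # e.g. "captain of a women's chess club" gets marked down
    "nursing": -0.4,    # field historically coded as female
}

def score_cv(cv_text: str) -> float:
    """Sum the learned weight of every keyword that appears in the CV."""
    text = cv_text.lower()
    return sum(weight for term, weight in LEARNED_WEIGHTS.items() if term in text)

# Two CVs describing the same achievements diverge purely because of gendered wording.
print(score_cv("Captain of a women's chess club; plays football on weekends"))
print(score_cv("Captain of a chess club; plays football on weekends"))
```

The two otherwise identical CVs receive different scores, which is exactly the kind of disparity that can quietly filter women out of a shortlist before any human reviews it.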
More women to develop AI
As UNESCO specialist Leona Verdadero explains, correcting the biases in these datasets “is a big step, but it is not enough.” The key lies in integrating more women into the development of these technologies. The most recent global figures indicate that women make up only 20% of the teams that develop artificial intelligence; in leadership roles within those teams, female participation drops to 10%.
If few women are involved in designing this technology, or hold positions of power to decide how it is applied, it will be very difficult to mitigate these biases. Even if teams are mostly made up of men, however, it is crucial to adopt a gender perspective and to set out to reduce prejudices before a tool is released to the market. This is what Thais Ruiz Alda, founder of DigitalFems, a non-profit organization that aims to close the gender gap in the technology sector, points out: “If there are no people with the technical skills to determine whether a technology contains biases, the immediate consequence is that the software is not fair or does not take equity parameters into account.”
According to Ruiz Alda, the lack of women in technological development stems from a structural problem that begins with the absence of role models in childhood. Girls are discouraged from developing an interest in mathematics, for example, from a very early age. And although the enrollment of young women in STEM fields has increased, “there are fewer and fewer women graduating from engineering courses,” she emphasizes.
“The corporate culture of the software world has had this underlying bias, where it has always been believed that women are worse than men at designing programs or writing code,” she continues. This is the brogrammer culture, which persists in companies and discourages women from pursuing careers in the field, where they face prejudice, pay disparity and higher rates of harassment.
Although tech companies seem interested in combating bias in their products, they have not yet managed to do so effectively. The case of Google’s image-generating AI, whose service was suspended after it over-represented minorities, has been a lesson. According to Verdadero, the problem with Gemini also highlights the lack of diversity in the program’s testing phases. “Was it a diverse user base? Who was in the room when that model was being developed and tested, before it was deployed? Governments should be working with tech companies to ensure that AI teams really represent the diverse user base we have today,” asks the UNESCO expert.
EL PAÍS, Spain