Based on the study, OpenAI said that ChatGPT’s probability of generating a harmful stereotype is around 0.1 percent.
ChatGPT, like other artificial intelligence (AI) chatbots, has the potential to introduce biases and harmful stereotypes when generating content. For the most part, companies have focused on eliminating third-person biases, which arise when a user seeks information about other people. However, in a new study published by OpenAI, the company tested its AI models’ first-person biases, where the AI decides what to generate based on the ethnicity, gender, and race of the user. Based on the study, the AI firm claims that ChatGPT has a very low propensity for generating first-person biases.
OpenAI Publishes Study on ChatGPT's First-Person Biases
First-person biases are different from third-person biases. For instance, if a user asks about a political figure or a celebrity and the AI model generates text with stereotypes based on that person's gender or ethnicity, this can be called a third-person bias.
On the flip side, if a user tells the AI their name and the chatbot changes the way it responds to the user based on racial or gender-based leanings, that would constitute a first-person bias. For instance, if a woman asks the AI for an idea for a YouTube channel and it recommends a cooking-based or makeup-based channel, that can be considered a first-person bias.
In a blog post, OpenAI detailed its study and highlighted the findings. The AI firm used the GPT-4o and GPT-3.5 versions of ChatGPT to study whether the chatbots generate biased content based on the names and additional information provided to them. The company claimed that the AI models’ responses across millions of real conversations were analysed to find any pattern that showcased such trends.
[Image: How the LMRA was tasked to gauge biases in the generated responses. Photo Credit: OpenAI]
The large dataset was then shared with human raters as well as a language model research assistant (LMRA), a customised AI model designed to detect patterns of first-person stereotypes and biases. The consolidated result was created based on how closely the LMRA could agree with the findings of the human raters.
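To make the idea of checking an automated rater against human raters concrete, here is a minimal Python sketch. It is not OpenAI's method: the function names `agreement_rate` and `flagged_rate` are hypothetical, and the labels are made-up toy values (True means "this response was judged to contain a harmful stereotype"), not data from the study.

```python
from typing import Sequence


def agreement_rate(lmra_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of responses on which the two sets of labels agree."""
    if len(lmra_labels) != len(human_labels):
        raise ValueError("both label lists must have the same length")
    if not lmra_labels:
        raise ValueError("need at least one labelled response")
    matches = sum(a == b for a, b in zip(lmra_labels, human_labels))
    return matches / len(lmra_labels)


def flagged_rate(labels: Sequence[bool]) -> float:
    """Fraction of responses flagged as containing a harmful stereotype."""
    if not labels:
        raise ValueError("need at least one labelled response")
    return sum(labels) / len(labels)


# Toy, invented labels for ten responses (assumption: not real study data).
lmra = [False, False, True, False, False, False, False, False, False, False]
human = [False, False, True, False, False, True, False, False, False, False]

print(f"agreement: {agreement_rate(lmra, human):.0%}")
print(f"flagged by LMRA: {flagged_rate(lmra):.0%}")
```

In this toy example the two sets of labels differ on one response out of ten. A real evaluation would need far more data and a more robust agreement measure than a simple match rate.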
OpenAI claimed that, according to the study, harmful stereotypes associated with gender, race, or ethnicity appeared at a rate as low as 0.1 percent in newer AI models, whereas the rate was around 1 percent for the older models in some domains.
The AI firm also listed the limitations of the study, noting that it primarily focused on English-language interactions and binary gender associations based on common names found in the US. The study also mainly focused on Black, Asian, Hispanic, and White races and ethnicities. OpenAI admitted that more work needs to be done with other demographics, languages, and cultural contexts.