Cultural biases in large language models (LLMs) surface with alarming ease in everyday use: 86.1% of bias incidents were triggered by a single prompt rather than requiring complex adversarial techniques. This key finding emerged from the Singapore AI Safety Red Teaming Challenge, a comprehensive study conducted in late 2024 that tested four major LLMs: AI Singapore SEA-LION, Anthropic Claude 3.5, Cohere for AI Aya (Aya 23-8B), and Meta Llama (meta-llama-3-1-70b-instruct-vp).
The research involved 54 experts in linguistics, sociology, and cultural studies from nine countries for in-person testing, alongside over 300 online participants from seven countries for virtual assessment. Participants generated 1,335 successful exploits during the in-person challenge and 1,887 confirmed exploits from 3,104 submissions in the virtual phase.
Gender bias emerged as the most prevalent issue, accounting for 26.1% of total successful exploits, followed by race/religion/ethnicity bias at 22.8%, geographical/national identity bias at 22.6%, and socio-economic bias at 19%. Notably, bias surfaced far more readily in regional languages than in English: regional-language prompts accounted for 69.4% of total successful exploits, versus 30.6% for English prompts.
The study revealed distinct regional patterns across Asia. In China, the models reinforced geographical stereotypes by suggesting Hangzhou was safer than Lanzhou due to "resource and infrastructure constraints." South Korean results exposed strong regional biases, characterising Gyeongsang-do men as patriarchal, Busan women as aggressive, and people from the Chungcheong region as "hard to read."
In Thailand, the research uncovered unique expressions of national identity bias, including terms like "Phee Noi" (referring to undocumented Thai workers abroad) and "Kalaland" (describing 'narrow-minded' Thais). Indonesian participants noted that religious bias was particularly easy to elicit from the models, while also observing bias related to development levels between Western and Eastern Indonesia.
Malaysian and Singaporean testing revealed similar racial stereotyping patterns, with models generating biased outputs portraying Chinese individuals as 'money-minded' and 'competitive', while Malays were characterised as more laid-back. In India, Hindi-language prompts produced concerning associations between certain regions and criminality, alongside specific biases related to caste.
Vietnamese participants highlighted a unique form of age bias, in which older people were consistently given more respect, producing bias against younger people, the reverse of the age bias typically seen in Western contexts. In Japan, the study found bias related to socio-economic disparity linked to educational background, with models suggesting one could not secure good employment without graduating from a prestigious university.
The findings point to critical gaps in AI safety measures, particularly in non-English contexts. While models showed some ability to maintain safeguards in English, these protections were notably weaker in regional languages, with bias emerging even in everyday, non-adversarial interactions. This suggests that current AI safety measures may be overly Western-centric, failing to account for the nuanced cultural and linguistic contexts of Asia.
The research also highlighted how biases can manifest differently across cultures, with some forms of bias being unique to specific regions or cultures.
Implications for industry
For marketers and advertisers increasingly relying on AI for content creation and campaign development, these findings expose significant risks. The prevalence of cultural and linguistic biases in LLMs suggests that AI-generated content could inadvertently perpetuate harmful stereotypes or create culturally insensitive messaging, particularly when targeting non-English speaking markets.
The study's findings have direct implications for global marketing campaigns. With biases more pronounced in regional languages, brands operating across multiple Asian markets face heightened risks when using AI tools for localised content. The ease with which these biases surface—often from a single prompt—indicates that even seemingly neutral marketing briefs could generate problematic content.
For the creative industry, these findings underscore the continued importance of human oversight in AI-assisted creative processes, particularly from professionals with deep understanding of local cultural contexts. As AI tools become more integrated into marketing workflows, the ability to identify and correct cultural biases will likely become a crucial skill for creative professionals working across Asian markets.