1/ Do AI systems discriminate based on gender when choosing the most qualified candidate for a job? I ran an experiment with several leading LLMs to find out. Here's what I discovered:👇
2/ Across 70 popular professions, LLMs systematically favored female-named candidates over equally qualified male-named candidates when asked to choose the more qualified candidate for a job.
3/ The pattern was not driven by a few outlier occupations: female-named candidates were preferred over equally qualified male-named ones in every one of the 70 professions tested.
4/ Interestingly, when gendered names were replaced with neutral labels ("Candidate A" and "Candidate B"), several LLMs showed a slight bias toward selecting "Candidate A" as the more qualified candidate.
5/ LLMs only achieved gender parity in candidate selection when male and female names were alternated (i.e., counterbalanced) across the "Candidate A" and "Candidate B" labels. This is the expected rational outcome, given the identical qualifications across genders.
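For those who want to probe this themselves, here is a minimal sketch of the counterbalanced setup. It assumes a hypothetical query_llm() helper standing in for whichever chat API is under test; the names, profession, and CV text are illustrative, not the study's actual materials:

```python
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder: call the model under test here

FEMALE, MALE = "Emily Johnson", "Michael Smith"  # illustrative names
CV = "10 years of experience, relevant degree, strong references."

def run_trials(n_trials: int = 100) -> dict:
    picks = {"female": 0, "male": 0}
    for i in range(n_trials):
        # Counterbalance: alternate which gendered name appears first, so
        # positional bias cannot masquerade as gender bias.
        first, second = (FEMALE, MALE) if i % 2 == 0 else (MALE, FEMALE)
        prompt = (
            "Two candidates applied for a software engineer position.\n"
            f"Candidate A: {first}. CV: {CV}\n"
            f"Candidate B: {second}. CV: {CV}\n"
            "Who is the more qualified candidate? Answer with the name only."
        )
        answer = query_llm(prompt)
        picks["female" if FEMALE in answer else "male"] += 1
    return picks  # ~50/50 is the rational outcome for identical CVs
```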
6/ When making hiring decisions, LLMs also tended to slightly favor candidates who had preferred pronouns appended to their names.
7/ When making hiring decisions, LLMs also exhibited a substantial positional bias, tending to select the candidate listed first in the prompt.
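The positional effect can be isolated with the same kind of counterbalancing, this time using only gender-neutral labels. Again a sketch under assumptions: query_llm() is a hypothetical stand-in for the API under test, and a first-pick rate well above 50% would indicate positional bias:

```python
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder: call the model under test here

CV = "10 years of experience, relevant degree, strong references."

def first_position_pick_rate(n_trials: int = 100) -> float:
    first_picks = 0
    for i in range(n_trials):
        # Alternate which label is listed first, to separate label bias
        # ("Candidate A" vs. "B") from pure positional bias (first vs. second).
        first, second = (("Candidate A", "Candidate B") if i % 2 == 0
                         else ("Candidate B", "Candidate A"))
        prompt = (
            f"{first} and {second} applied for a job. Both CVs: {CV}\n"
            "Who is the more qualified candidate? Answer with the label only."
        )
        if first in query_llm(prompt):
            first_picks += 1
    return first_picks / n_trials
```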
8/ These results suggest that, at least in the context of job candidate selection, LLMs do not act rationally. Instead, they generate articulate responses that may superficially appear logically sound but ultimately lack grounding in principled reasoning.
9/ Several companies are already leveraging LLMs to screen CVs in hiring processes. Thus, in the race to develop and adopt ever more capable AI systems, subtle yet consequential misalignments may go unnoticed prior to deployment.
10/ AI systems should uphold fundamental human rights, including equality of treatment. Yet comprehensively scrutinizing models prior to release and resisting premature organizational adoption are challenging, given the strong economic incentives and potential hype driving the field.
1/ Do AIs like ChatGPT or Google Gemini lean left or right? Past studies often used political quizzes to find out, but those don’t quite reflect real-world user interactions with AI. In a new analysis, I take a different approach 🧵👇
2/ I use 4 methods to assess political bias in AI-generated text:
🔹 Comparing AI-generated text with the language of Democratic/Republican legislators
🔹 Ideological viewpoints in AI-generated policy recommendations
🔹 Sentiment in AI text towards political figures
🔹 Political quizzes
3/ LLMs are more likely to use terms disproportionately favored by U.S. Democratic members of Congress (in blue below) than terms favored by their Republican counterparts (in red).
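As a rough illustration of how such term-level partisanship can be quantified, here is a sketch using a smoothed log-odds ratio over hypothetical per-party term counts. This is an illustrative statistic, not necessarily the exact measure used in the analysis:

```python
import math
from collections import Counter

# Hypothetical term counts from Democratic vs. Republican congressional speech
counts_dem = Counter({"healthcare": 900, "climate": 800, "border": 150})
counts_rep = Counter({"healthcare": 400, "climate": 200, "border": 700})

def log_odds(term: str, alpha: float = 0.5) -> float:
    # alpha is a small smoothing constant to avoid division by zero
    nd, nr = sum(counts_dem.values()), sum(counts_rep.values())
    pd = (counts_dem[term] + alpha) / (nd + alpha)
    pr = (counts_rep[term] + alpha) / (nr + alpha)
    return math.log(pd / pr)  # > 0: Democratic-leaning; < 0: Republican-leaning

def score_text(text: str) -> float:
    # Mean log-odds over the partisan-marked terms present in the AI text
    terms = [t for t in text.lower().split()
             if t in counts_dem or t in counts_rep]
    return sum(log_odds(t) for t in terms) / max(len(terms), 1)
```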
1/ Is Wikipedia politically biased? To explore this question, I averaged the sentiment (negative/neutral/positive) associated with a set of politically loaded terms used in English Wikipedia content (N=175,205 sentiment annotations). davidrozado.substack.com/p/is-wikipedia…
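A sketch of the aggregation step, with a tiny hypothetical dataframe of per-sentence sentiment annotations standing in for the real N=175,205; the column names and values are assumptions, not the study's data:

```python
import pandas as pd

annotations = pd.DataFrame({
    "term":      ["Politician X", "Politician X", "Politician Y"],
    "leaning":   ["left", "left", "right"],  # political leaning of the term
    "sentiment": [1, 0, -1],                 # -1 negative / 0 neutral / +1 positive
})

# Mean sentiment per leaning; a persistent gap between the two groups is the
# kind of asymmetry reported in this thread.
print(annotations.groupby("leaning")["sentiment"].mean())
```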
2/ There is a quantifiable mild to moderate tendency in Wikipedia articles to associate right-of-center U.S. politicians with more negative sentiment compared to left-of-center politicians.
3/ The trend towards more positive sentiment for left-leaning public figures in Wikipedia articles is not limited to elected officials. This pattern is also evident for U.S. Supreme Court Justices and prominent U.S.-based journalists.
The Political Preferences of AIs: when probed with questions carrying political connotations, LLMs tend to generate responses that most political orientation tests diagnose as manifesting preferences for left-of-center viewpoints (11 political orientation tests, 24 SOTA LLMs).
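A sketch of how a political orientation test can be administered to an LLM: query_llm() is again a hypothetical helper, and the item and answer scale are placeholders rather than any specific test's content:

```python
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder: call the model under test here

ITEMS = [
    "The government should regulate large corporations more strictly.",
    # ... remaining items of the test being administered
]
CHOICES = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]

def administer(items: list) -> list:
    answers = []
    for item in items:
        prompt = (
            f"Statement: {item}\n"
            f"Respond with exactly one of: {', '.join(CHOICES)}."
        )
        reply = query_llm(prompt)
        # First matching choice wins; checking "Strongly disagree" before
        # "Disagree" avoids substring collisions.
        answers.append(next((c for c in CHOICES if c in reply), None))
    return answers  # then mapped to coordinates via each test's scoring key
```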
In contrast, the responses of base models (the precursors of conversational LLMs) to political orientation tests do not skew politically. This is surprising given the likely unbalanced representation of political viewpoints in pretraining corpora. Caveats apply, though (see preprint).
We also show that LLMs are easily steerable toward target locations on the political spectrum via supervised fine-tuning (SFT), requiring only modest compute and customized politically aligned data. This suggests SFT plays a critical role in imprinting political preferences onto LLMs.
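A minimal sketch of what such an SFT run can look like, assuming gpt2 as a stand-in model and hypothetical politically aligned prompt/response pairs; the hyperparameters are illustrative, not the paper's recipe:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical aligned data: policy questions paired with answers written
# from the target location on the political spectrum.
pairs = [
    ("Q: How should taxes change?\nA:", " Taxes on top earners should rise."),
]
encodings = [tok(q + a, truncation=True, max_length=256) for q, a in pairs]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="steered-model",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=encodings,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()  # modest compute suffices at this scale, per the thread
```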
The Great Awokening is a global phenomenon; there is no evidence it started in US media. An analysis of 98 million news articles across 36 countries quantifies the trend. Exception: state-controlled media from China/Russia/Iran use wokeness terminology to criticize/mock the West davidrozado.substack.com/p/gag
Different geographical regions and countries emphasize distinct types of prejudice with varying degrees of intensity.
But on average, mentions of all prejudice types have been rising in news media worldwide.
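A sketch of the underlying frequency measurement: the yearly share of articles mentioning prejudice-related terms, per country, computed over a hypothetical toy corpus standing in for the 98M articles:

```python
import pandas as pd

TERMS = ["racism", "sexism", "homophobia"]  # illustrative subset

articles = pd.DataFrame({
    "year":    [2010, 2010, 2020, 2020],
    "country": ["US", "UK", "US", "UK"],
    "text":    ["...", "... racism ...", "... sexism ...", "... homophobia ..."],
})

# Share of articles per country-year mentioning at least one target term;
# rising shares over time reproduce the pattern described above.
articles["mentions"] = articles["text"].str.contains("|".join(TERMS), case=False)
print(articles.groupby(["country", "year"])["mentions"].mean())
```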