Stanford Digital Economy Lab / May 28, 2026

Q&A | Algorithmic Monoculture in Hiring

The World Economic Forum estimates that more than 90% of employers use automated systems to screen job applicants. But what happens when the systems they use come from a limited set of vendors?

 

A new study found evidence of “algorithmic monoculture,” where the same algorithms led to repeated results and biases across the labor market.

by Matty Smith

“Algorithmic Monocultures in Hiring,” by Rishi Bommasani, Sarah Bana, Kathleen A. Creel, Dan Jurafsky, and Percy Liang, examines how automated systems built by the same few algorithm vendors can cause the same applicants to face rejection again and again, noting “clear racial disparities.”

The study’s unique, position-by-position investigation is eye-opening for both job seekers and those hoping to find the best candidates for the job. We spoke with the authors about what they found.

 

What is “algorithmic monoculture”?

 

Sarah Bana: Algorithmic monoculture, to me, is any circumstance in which similar outcomes occur because of algorithms. There are plenty of simple algorithms, like needing a college degree or three years of experience, before getting a job. But the more complex algorithms that are now appearing in the labor market produce similar outcomes through more complicated processes. These machine learning tools, built to characterize opaque elements like “fit,” generate similar outcomes across firms with less interpretability. 

Kathleen Creel: Algorithmic monoculture occurs when the same algorithm dominates a sector or, in its weaker but more typical form, when algorithms built in similar ways on similar data make similar decisions.

 

Can you explain your study’s methodology?

 

Sarah: We looked at 4 million applications across 3 million applicants, all screened by the vendor pymetrics. We did two primary sets of analyses: one on bias, and the other on homogenization.

For the bias analyses, we looked at the applicants that had provided race data at the position level. US employment law flags a position when one group is recommended at less than 80% of the rate of the most-recommended group — this is the “four-fifths rule.” So we calculated the recommendation rate of each group, and tested whether any group had a recommendation rate that was statistically significantly different from the highest passing group, and lower than 80% of the rate of the highest passing group.
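The position-level check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the group names and counts are hypothetical, and the pooled two-proportion z-test is an assumed choice of significance test, since the interview does not say which test the study used.

```python
import math

def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # identical, degenerate rates: no evidence of a difference
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def adverse_impact(counts, threshold=0.8, alpha=0.05):
    """Flag groups in ONE position under the four-fifths rule.

    counts maps group -> (recommended, applicants). A group is flagged when
    its recommendation rate is below `threshold` times the top group's rate
    AND the difference from the top group is statistically significant.
    """
    rates = {g: k / n for g, (k, n) in counts.items()}
    top = max(rates, key=rates.get)
    if rates[top] == 0:
        return top, {}  # nobody was recommended, so no ratio is defined
    k_top, n_top = counts[top]
    flagged = {}
    for g, (k, n) in counts.items():
        if g == top:
            continue
        ratio = rates[g] / rates[top]
        p = two_proportion_p_value(k, n, k_top, n_top)
        flagged[g] = ratio < threshold and p < alpha
    return top, flagged

# Hypothetical counts for a single position (not data from the study).
position = {
    "group_a": (300, 1000),
    "group_b": (200, 1000),
    "group_c": (280, 1000),
}
top, flagged = adverse_impact(position)
print(top, flagged)
```

In this made-up position, group_b is recommended at about 67% of group_a's rate and the gap is significant, so it is flagged; group_c sits at about 93% and is not.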

For the homogenization analyses, we started by looking at how many models were used across firms. This number was 42.

We also looked at the probability of being systemically rejected, that is, being rejected by every position you applied to. In our other work, we establish the concept of a baseline, which tells us what that rate would be if the models made independent decisions. In the pymetrics data, 10 percent of applicants who apply to 4 positions are systemically rejected, and the observed rate diverges substantially from the baseline rate.

But when we used our methods to analyze the largest prior study of hiring decisions, which sent 83,000 applications to 108 Fortune 500 firms, we found that the systemic rejection rates observed in that data are very accurately predicted by a model in which employers make statistically independent decisions. So there’s something different going on in our algorithmically mediated data, even though the time periods were similar.
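To make the comparison between observed and baseline rates concrete, here is a simplified sketch under stated assumptions. It takes each position's overall rejection rate as given and asks how often an applicant would be rejected everywhere if positions decided independently (the product of those rates). The function names and the toy data are hypothetical, and the paper's actual baseline may be constructed differently.

```python
from collections import defaultdict
from math import prod

def marginal_rejection_rates(outcomes):
    """Per-position rejection rate; outcomes maps applicant -> {position: rejected?}."""
    totals = defaultdict(lambda: [0, 0])  # position -> [rejections, applications]
    for decisions in outcomes.values():
        for position, rejected in decisions.items():
            totals[position][0] += int(rejected)
            totals[position][1] += 1
    return {pos: r / n for pos, (r, n) in totals.items()}

def observed_systemic_rejection(outcomes):
    """Share of applicants rejected by every position they applied to."""
    return sum(all(d.values()) for d in outcomes.values()) / len(outcomes)

def independent_baseline(outcomes):
    """Expected systemic rejection rate if positions decided independently."""
    rates = marginal_rejection_rates(outcomes)
    per_applicant = [prod(rates[pos] for pos in d) for d in outcomes.values()]
    return sum(per_applicant) / len(per_applicant)

# Toy data (hypothetical): two positions whose decisions are perfectly correlated.
outcomes = {
    "a1": {"p1": True, "p2": True},
    "a2": {"p1": True, "p2": True},
    "a3": {"p1": False, "p2": False},
    "a4": {"p1": False, "p2": False},
}
observed = observed_systemic_rejection(outcomes)
baseline = independent_baseline(outcomes)
print(observed, baseline)
```

Each toy position rejects half of its applicants, so independent decisions would reject a quarter of applicants everywhere. The observed share is one half, and a gap of that kind is what signals homogenized outcomes.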

 

What would you say is the main takeaway?

 

Kathleen: We’ve speculated in past work that if many firms relied on the same AI vendor to screen job applicants, that could prevent some applicants from getting any interviews.  But this study was the first time we were able to show this effect in real hiring data. 

Sarah: I think the most significant result of our study is how much bias we find in this algorithmic hiring system. The vendor has published aggregated audits reporting that its tools show no measurable bias. In that way, I was surprised, because I thought their algorithms would be an example of best practice. When you read that something you’re buying has been audited, you tend to take that finding at face value, and that’s likely part of what is going on.

 

Before we get to the results, can you explain the pymetrics platform at the center of the study?

 

Sarah: Of course. Pymetrics is a response to a long tradition of industrial-organizational (I/O) psychology — the discipline that aims to improve work outcomes for individuals and organizations. Earlier generations of job screening tools used personality tests that you’d take on a computer; Autor and Scarborough (2008), for example, analyzed a test built around the Five Factor model (conscientiousness, agreeableness, extroversion, openness, and neuroticism).

Pymetrics’s founders argued there was a better way to measure personality than asking people about themselves. The games they have candidates play — short, gamified tasks rooted in neuroscience — are designed to surface those traits through behavior rather than self-report. In many ways, pymetrics is trying to improve systems that have historically been quite biased: people often struggle to get jobs because of what’s on (or missing from) their resume, and a process that doesn’t rely on resumes would, in principle, remove that barrier.

But behavior also encodes who we are. One of the pymetrics games involves popping balloons to measure risk tolerance — and a friend recently pointed out that risk aversion looks very different for someone at the poverty line than it does for someone who has never missed a meal.

Ideally, the future of this kind of assessment would involve simulating actual work as part of the interview. I’m very sympathetic to firms trying to hire — I recently went through an interview process that took more than five months end to end, with far fewer applicants than the median pymetrics position handles. AI makes hiring faster and cheaper, but at a cost we need to be willing to measure and audit.

 

What groups experience the most adverse impact?

 

Sarah: We see many Black and Asian applicants adversely impacted. We don’t have any causal evidence here but my guess is that behaviors that are being picked up by the games are functioning as proxies for race – the kind of bias that is hard to remove without explicit adjustments to the trained models. 

There’s also a structural piece that we don’t observe with the data we have access to. The models are trained against each firm’s current employees in a given role, and those workforces likely aren’t very diverse to begin with. 

“I don’t think we want to discourage the application of AI in this domain, but recognize the stakes are high and be judicious in the approach.” (Rishi Bommasani, Senior Research Scholar, Stanford HAI)

Can you speak to how these biases emerge in contrast to traditional demographic cues?


Rishi Bommasani: The biases we see arise from a different mechanism than in most prior work on hiring. Over the last 20 to 30 years, researchers studying the labor market have shown that human decisionmakers exhibit bias when screening resumes because of implicit biases rooted in demography: names like Jamal or accolades like being the captain of the college softball team are associated with certain demographic groups. In the last 10 years, researchers have raised the same concern about using AI to screen resumes: Amazon infamously almost deployed a hiring AI tool that was biased against women, having learned this bias from its training data. But our setting is more subtle: every applicant plays a fixed set of games and never provides their name, gender, race, age, or other overt features tied to demography. So the biases we find reflect that these gameplay features are still unevenly distributed across racial groups, and that uneven distribution yields disparities in which groups get selected by the machine learning models.

 

How significant is the impact of algorithms on overall hiring?

 

Rishi: The World Economic Forum claims more than 90% of employers use automated systems to filter or rank job applications, especially given the growing volume of job applications. According to them, Google received over 3 million applications in 2024 and the Indian government received 220.5 million applications from 2014 to 2022.

Dealing with this scale incentivizes AI adoption. So I don’t think we want to discourage the application of AI in this domain, but recognize the stakes are high and be judicious in the approach. The first step is to make the issues we surface more obvious to decisionmakers: AI vendors should measure adverse impact for each of their models and offerings separately, and employers should request this information when procuring hiring AI tools.

 

How do you explain the gap between earlier research that found no significant bias and your findings?

 

Sarah: Earlier research reported aggregate numbers — averaged across all the positions a vendor screens for. We disaggregated and looked at each position separately. That’s the major difference.

U.S. employment law evaluates adverse impact one position at a time, because that’s how employers actually make hiring decisions. Aggregating can mask the disparity. Imagine a model that over-selects one group for warehouse jobs and under-selects them for finance jobs. The averages would look balanced; the position-by-position picture would show real bias. That’s roughly the pattern we found.
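The masking effect in the warehouse and finance example can be shown with a small calculation. The counts below are invented purely for illustration and are not taken from the study. The impact ratio here is the lowest group's recommendation rate divided by the highest group's, so values under 0.8 fall below the four-fifths threshold.

```python
def impact_ratio(counts):
    """Lowest recommendation rate divided by the highest; counts maps group -> (recommended, applicants)."""
    rates = [k / n for k, n in counts.values()]
    return min(rates) / max(rates)

def pool(positions):
    """Sum recommended and applicant counts per group across all positions."""
    pooled = {}
    for counts in positions.values():
        for group, (k, n) in counts.items():
            pk, pn = pooled.get(group, (0, 0))
            pooled[group] = (pk + k, pn + n)
    return pooled

# Hypothetical counts: group_x is favored for warehouse jobs, group_y for finance jobs.
positions = {
    "warehouse": {"group_x": (60, 100), "group_y": (40, 100)},
    "finance": {"group_x": (20, 100), "group_y": (40, 100)},
}

aggregate_ratio = impact_ratio(pool(positions))
per_position = {name: impact_ratio(c) for name, c in positions.items()}
print(aggregate_ratio, per_position)
```

Pooled across both positions, each group is recommended 80 times out of 200, so the aggregate ratio is 1.0 and looks balanced. Position by position, the ratios are about 0.67 and 0.5, both under the 0.8 threshold.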

There’s also a scope difference: previous research drew on far fewer positions. Some of those positions were characterized as overseas, meaning they may not have been subject to U.S. employment law in the first place.

 

Can you talk about the importance your work places on data access?

 

Rishi: How do we study the impacts of AI in different domains? A common approach is to directly query the AI and see how it behaves: we might conjecture that Google Search is favoring a certain brand in its search results and we can directly test this by trying it out via Google Search. But the hiring AI tools we study are different: they are deployed to affect the internal decision making at firms, so it is hard to detect their traces from aggregate labor market data, and the systems themselves are rarely accessible without purchase, so researchers cannot do their own testing. Hiring AI brings together three facts that really should not coexist: the use of AI to affect high-stakes decisions, the pervasive adoption of said AI throughout the economy, and yet extremely little external research of any kind. 

 

And your study had the advantage of rare access to data that prior studies didn’t have?

 

Sarah: We were interested in the question of homogenization in hiring algorithms, and pymetrics generously shared their data with us. To our knowledge, no other independent research team has had this kind of access — most hiring vendors keep their data under lock and key, which makes a market with this much at stake very hard to study from the outside.

Prior research has been limited by exactly that constraint. I’ve seen papers based on a single firm’s deployment of a hiring tool, and a few that look at one to three positions. Our study doesn’t have the quasi-experimental or experimental variation those studies sometimes offer, but it examines a much larger set of positions and applicants, and the scale gives it much stronger external validity.

There is no causal evidence in our paper. But this kind of work matters because I get the impression firms often aren’t fully aware of what their screening filters do to the applicant pool.

 

What advice would you give to those hiring using current algorithms?

 

Sarah: Figure out what your algorithm is doing: who it is screening in and out for each position where you use it. Ideally, this means letting a random subset of applicants through that first stage and seeing how they fare. This is probably worth doing regularly, because your algorithm is probably not changing at the rate that your work is.

“This kind of work matters because I get the impression firms often aren’t fully aware of what their screening filters do to the applicant pool.” (Sarah Bana, Digital Fellow)

What advice would you give someone seeking a job?

 

Sarah: I want to be honest about what young people are walking into right now. The implicit deal a lot of us grew up with—get a college degree, get a good job, build a stable life—is fraying, and AI is part of why. Other members of the Lab are finding that employment of 22- to 25-year-olds has dropped 16% in the most AI-exposed occupations, and this is happening before significant organizational redesign inside firms.

I believe this is because hiring is now a forward-looking activity in a way it wasn’t before. When an employer posts a job today, they’re investing in someone for three to five years, and they’re anticipating what AI will handle in that period. They’re restructuring positions now, and young people are bearing the brunt of that uncertainty. The result is fewer entry-level openings and longer searches.

On the algorithmic side, our work finds that you need to apply to more jobs than ever before — and apply widely. Some employers use the same processes (sometimes literally the same vendor), so submitting more applications through identical pipelines won’t generate new evaluations. That’s part of why range matters. The flip side: as a job seeker, you’ll probably end up learning more about employers’ hiring stacks than researchers do, because you’ll be asked to complete the various steps of these processes yourself across many firms.

 

Sarah offered the following suggestions for a job search:

  1. Build demonstrable experience. Internships, freelance work, project-based work, portfolios, public writing — anything that lets you show what you can do, not just what’s on your resume. Concrete examples are harder for an algorithm to filter out.
  2. Lean on weak ties. The people most likely to help you find a job aren’t your closest friends — they’re acquaintances who have different information than you do. In some cases, a referral can also bypass algorithmic screening altogether.
  3. Stay flexible. Apply across roles, industries, and geographies. Be open to contract, temporary, or adjacent positions; organizations are running lean and a lot of opportunities show up in these forms.
  4. Build a routine. A long job search is hard — emotionally as much as logistically. Track your progress, and connect with your communities. The risk in this market isn’t that you can’t find anything; it’s that the search wears you down before it pays off.
Read the paper with further analysis here