Stanford University

INSIGHTS

Q&A | Does artificial intelligence pose a threat to financial stability?

by Matty Smith
Communications

October 10, 2025
7 min read

More and more, people are turning to artificial intelligence for investment advice. It’s even been predicted that generative AI could be the leading source of financial advice for retail investors as soon as 2027. A new working paper, “Ex Machina: Financial Stability in the Age of Artificial Intelligence,” examines how different AI agents affect financial stability when asked to manage mutual fund assets.

Co-author Sophia Kazinnik, a research scientist at the Stanford Digital Economy Lab, spoke with us about the paper’s findings.

What did you set out to do with your research?

In this study, we look at how different types of artificial intelligence agents behave in a mutual fund redemption game, where each investor must decide whether to redeem early for a certain but smaller payoff, or stay invested and receive a potentially higher return later. The catch is that the value of staying depends not just on the underlying economic fundamentals, but also on how many other investors choose to redeem. When more people redeem early, the fund has to liquidate assets at a cost, reducing the return for those who stay. So, this is a classic coordination problem with strategic complementarities: what you do depends on what you expect others to do.
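To make the coordination structure concrete, here is a minimal payoff sketch in Python. The functional form and numbers are illustrative assumptions for exposition, not the paper’s calibration.

```python
def payoffs(fundamental, redeem_fraction, early_payoff=1.0, liquidation_cost=1.0):
    """Stylized redemption-game payoffs (illustrative numbers, not the paper's setup).

    Redeeming early yields a certain, smaller payoff. Staying yields a return
    that rises with the economic fundamental but falls as more investors redeem,
    because the fund must liquidate assets at a cost to meet redemptions.
    """
    stay_payoff = fundamental - liquidation_cost * redeem_fraction
    return early_payoff, stay_payoff

# Strategic complementarity: the more others redeem, the worse staying looks.
print(payoffs(fundamental=1.5, redeem_fraction=0.1))  # (1.0, 1.4): staying beats redeeming
print(payoffs(fundamental=1.5, redeem_fraction=0.9))  # (1.0, 0.6): heavy redemptions erode the stay payoff
```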

How did you go about testing the AI?

To test how AI behaves in such environments, we replace human investors with two kinds of AI. Q-learning agents learn through experience, trying out different actions and updating their strategy based on what pays off over many simulations. LLM agents are given a written description of the environment and use logical reasoning to make a decision in each round.
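For readers curious what “learning through experience” looks like mechanically, here is a minimal Q-learning sketch for the binary redeem-or-stay choice. The stateless setup, learning rate, and exploration rate are illustrative assumptions, not the paper’s implementation.

```python
import random

# Minimal Q-learning sketch for the binary redeem/stay choice
# (illustrative assumptions; not the paper's implementation).
ACTIONS = ["redeem", "stay"]
alpha, epsilon = 0.1, 0.1          # learning rate and exploration probability
Q = {a: 0.0 for a in ACTIONS}      # running value estimate for each action

def choose_action():
    """Epsilon-greedy: mostly exploit the current best estimate, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q, key=Q.get)

def update(action, payoff):
    """Nudge the estimate toward the payoff realized in this simulation round."""
    Q[action] += alpha * (payoff - Q[action])

# In each simulated round, the payoff would come from the redemption game above.
a = choose_action()
update(a, payoff=1.0)
```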

We then observe how these agents behave under different conditions (for example, when the fundamentals are known versus uncertain, or when the final payoffs are risky versus safe) and compare their decisions to what economic theory predicts.

The goal is to see how AI design shapes behavior in systems where beliefs and coordination matter, and what that means for financial stability.

How would you define financial stability?

In this study, financial stability means that investors act based on actual economic conditions, not out of fear or panic. When things are truly bad, pulling your money out early makes sense. But when the economy is still strong, early redemptions just create unnecessary stress on the system.

So we look at how many investors pull out too early, even when the situation doesn’t call for it. We call this “fragility.” The more people redeem when they shouldn’t, the more fragile the system is, and the less financially stable it is. In other words, stability here means investors don’t run unless they have a good reason to.
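As a rough illustration of that counting exercise, here is a simplified fragility measure in Python; it is my own stylized formulation, not necessarily the exact metric used in the paper.

```python
def fragility(decisions, fundamentals, threshold):
    """Share of 'unwarranted' early redemptions: cases where an agent redeemed
    even though the fundamental was above the level at which redeeming is
    justified. Illustrative measure, not the paper's exact metric."""
    unwarranted = sum(
        1 for d, f in zip(decisions, fundamentals) if d == "redeem" and f > threshold
    )
    return unwarranted / len(decisions)

# Example: two of four agents redeem despite strong fundamentals -> fragility 0.5
print(fragility(["redeem", "stay", "redeem", "stay"], [1.4, 1.4, 1.4, 1.4], threshold=1.0))
```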

How did the two types of AI differ? 

The first type of AI, Q-learning, learns by trial and error. It tends to overreact and pull money out early, especially when the situation is uncertain. That makes the system more fragile and more likely to suffer from large, early withdrawals, even when that’s not the “right” thing to do.

The second type, LLM agents, reads the rules and reasons through what to do. It usually makes more accurate, theory-aligned choices. But because each LLM agent reasons on its own and might expect different things from the others, they don’t always move together. Some choose to redeem, others don’t (even in the same situation). That leads to more mixed or uneven outcomes, rather than clear group behavior like we see with Q-learning.

What surprised us the most is that Q-learning broke down under uncertainty (i.e., situations where agents don’t have perfect information about key variables that influence their decision), even though the math said it shouldn’t matter. The LLMs handled it fine. That shows the AI’s internal design, and not only the economic conditions, can create or prevent financial instability.

We expected AI design to make a difference, but the strong early-exit bias in Q-learning and the lack of coordination among LLMs were bigger and more revealing than we expected.

“The type of AI you use really matters. Just changing the AI agent can lead to completely different outcomes. That’s a new form of model risk: the risk that your system behaves poorly not because of bad inputs, but because of the AI design itself.”

SOPHIA KAZINNIK
Research Scientist, Stanford Digital Economy Lab

So in the context of this experiment, do you personally feel one type of AI is “better” than the other?

I think it depends on what you’re after. If the main goal is financial stability, then LLMs are the safer choice. They make decisions that stay closer to what economic theory recommends, and they don’t overreact. They also show a clean, logical pattern: fragility increases smoothly as conditions get tougher (like when assets get harder to sell).

But if you care more about predictable, coordinated group behavior, then Q-learning might seem more appealing. These agents tend to move together and settle on clear-cut actions. The problem is, they often coordinate on the wrong thing (like all pulling out early) just because that’s what their learning has reinforced.

So, it’s a trade-off: LLMs give you more stability, but less predictability. Q-learning gives you more coordination, but also more risk.

Does the paper provide guidance on how to design AI systems for this purpose?

We do show that how you design your AI system matters, and provide some high-level guidance. For LLMs, their decisions are more stable when they’re given clear, consistent information. If they get vague or conflicting inputs, they form different expectations, making their behavior harder to predict. So, if firms or regulators want more reliable behavior from LLMs, they should make sure the AI has good, precise information to work with.

For Q-learning agents, the problem is that they learn the wrong lesson when outcomes are sometimes zero. For example, if “staying” in the fund occasionally leads to no return, the AI may wrongly learn that staying is a bad choice, even if it isn’t overall. To fix this, you can adjust how the AI learns, so it better reflects the full range of possible outcomes.
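To see the failure mode numerically, here is a hypothetical, simplified example (not the paper’s code): “stay” pays 2.0 with probability 0.7 and 0.0 otherwise, so its expected value of 1.4 beats a certain redemption payoff of 1.0, yet the running estimate gets yanked toward zero whenever a bad draw lands.

```python
import random

random.seed(0)

# Hypothetical illustration of the zero-payoff pitfall, not the paper's code.
alpha = 0.3          # learning rate: how hard each new outcome pulls the estimate
q_stay = 0.0         # running estimate of the value of staying
dips = 0
for _ in range(1000):
    payoff = 2.0 if random.random() < 0.7 else 0.0   # occasional zero outcomes
    q_stay += alpha * (payoff - q_stay)              # zeros drag the estimate down
    if q_stay < 1.0:                                 # moments where "redeem" looks better
        dips += 1

print(f"Q(stay) fell below the certain redeem payoff in {dips} of 1000 rounds")
# Under a greedy policy, those dips are exactly when the agent switches to
# redeeming and stops sampling "stay", locking in the wrong lesson. Adjusting
# how it learns (e.g., a smaller learning rate or averaging over many outcomes)
# is one way to better reflect the full range of possible payoffs.
```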

We also highlight that humans should stay involved. A human advisor who understands both finance and AI can help guide choices, reduce confusion, and make sure these systems work in safer, more predictable ways.

Where are you excited to go with this research next?

Frankly, I feel like a kid in a candy store these days. There are so many interesting directions one could go with this.

One direction is to explore what happens when different kinds of AI are mixed together: some learning from experience (like Q-learners), some reasoning through problems (like LLMs), and maybe even some that act more like humans. We could also look at connected funds, to see how problems in one part of the system might spread to others.

Another goal is to test how policy tools (like redemption fees, gates, or penalties) might work differently depending on the kind of AI involved. Do LLMs and Q-learners react the same way to a swing-pricing rule? Probably not.

And lastly, in the paper, we highlight the need to go beyond designing smart individual AIs. Even if each agent is “well-behaved” on its own, groups of AIs can still produce bad collective outcomes. So, future work could focus on this idea of multi-agent alignment: making sure AIs not only act wisely alone but also interact safely when working in large systems.

“In my personal opinion, regulators and institutions need to catch up. AI isn’t just another tool; it’s shaping decisions and outcomes in ways that weren’t possible before. That means we need updated stress tests, better ways to measure how AIs behave in practice, and new policies that account for both financial knowledge and technological understanding among users.”

SOPHIA KAZINNIK
Research Scientist, Stanford Digital Economy Lab

Would you feel comfortable with your investments being handled by AI agents?

I’d say yes, but with caution (and conditions). I’d happily hand things over to a reasoning-based AI agent, as long as it doesn’t share my irrational love for shoes.

More seriously, I’d feel comfortable letting AI manage my investments, as long as a few guardrails are in place. I’d want the AI to be clear about how it’s making decisions, not just spit out recommendations I don’t understand. I’d also want it to stay calm in the face of uncertainty: no panicked sell-offs just because the market had a bad day. And I’d definitely feel better knowing a human could step in if the AI started making weird choices, like treating a minor dip as the end of the world. In short, I’d trust but continually verify.

Do you agree with the prediction that by 2027, generative AI will be the leading source of financial advice for retail investors?

That’s a likely direction: many industry reports predict it, and the tech is advancing quickly. But in the paper we make the following point: AI may lead the way, but humans still matter.

Given concerns around trust, liability, and stability, the future will probably look “bionic,” a combination of AI tools and human oversight, rather than fully automated systems calling the shots on their own. So yes, AI advice may become the most common, but the safest and most realistic path is AI and human collaboration, not AI-only.

***

Read the paper here.

Sophia Kazinnik is a research scientist at the Stanford Digital Economy Lab, where she explores the intersection of artificial intelligence and economics. Prior to joining Stanford, Sophia worked as an economist and quantitative analyst at the Federal Reserve Bank of Richmond, where she was part of the Quantitative Supervision and Research group. While there, she contributed to supervisory projects targeting cyber and operational risks and developed NLP tools for supervisory purposes.

Interested in frameworks and guardrails for agentic AI?
Read about our Loyal Agents project here.
