Stanford University

INSIGHTS

Q&A | GDP-B: Is it time for a “new” GDP?

by Matty Smith
Communications

March 27, 2025
10 min read

Since the 1930s, gross domestic product, the sum of the value of all final goods and services produced in a country, has been used to measure the economy. But with the rise of digital goods, traditional methods have become increasingly outdated and inaccurate. That’s why the Stanford Digital Economy Lab proposes GDP-B, a better measure of the health of the AI-powered digital economy.

Lab Digital Fellow Avinash Collis, Research Scientist David Nguyen, and Research Scientist Sophia Kazinnik answered questions about the project.

Why do we need a new GDP?

Avinash Collis: GDP does a great job of measuring the production side of the economy. It does a really good job of measuring the monetary value of all final goods produced. The challenge is that it was never meant to measure well-being or economic well-being.

Historically, that was not really a problem because the world was physical. If you consumed more physical goods, GDP would increase. At the same time, the benefits people got from consuming those goods would also increase. So benefits and what showed up in GDP were mostly correlated.

As the world switched from physical to digital, one big challenge with the digital economy was that most of the things we consume don’t have a price. 

Sophia Kazinnik: There are many aspects that current metrics are not capturing, but what stands out to me is the growing importance of “free” digital services. It’s hard for me to imagine my day-to-day activities without apps like Waze, Gmail, and WhatsApp. I get countless benefits from apps without paying for them, but traditional GDP doesn’t capture that extra value. This mismatch shows why we need a more complete way to measure our well-being—one that captures all the benefits people get instead of just tracking how much they spend.

David Nguyen: There is a clear demand from society and policy for a new indicator that can measure what we ultimately care about in life. GDP is a good indicator, but it’s limited to market-based production, which is only one–although important–source of people’s welfare. 

AC: We call it GDP-B, where ‘B’ stands for benefits. The idea is this metric directly captures the benefits as opposed to looking at the production side. If you look at the information technology sector as a percentage of the overall economy, that share has remained at around 4 to 5 percent for the last 40 years. 

The world has gotten more digital, but the IT sector’s share of the economy, as evident in GDP, hasn’t really changed. Most of the benefits we are getting from the digital revolution are not showing up.

What does it mean to measure “well-being”?

AC: One thing I want to clarify is that we are focusing on economic well-being, measuring what in economics is called consumer surplus. Within the standard economic framework, this is basically the extra value you get from consuming something on top of what you’re paying for it. If you think of well-being more broadly, as happiness and more subjective metrics, we are not measuring that.

There are many research groups and agencies that put out happiness rankings every year. That’s why, in our 2019 HBR article, we show a spectrum of metrics. On one end, we have GDP. On the other end, we have happiness metrics trying to truly measure well-being. Our GDP-B metric is somewhere in the middle.

Can you break down your research methodology?

DN: We run massive online choice experiments asking tens of thousands of people how much they value having access to certain goods and services or other quality of life factors. 

AC: How much people are paying for various things is not necessarily a good measure, because people are not paying for most digital goods. What we try to measure instead is how much we would have to compensate people to give up access to a certain good. This lets us estimate valuations even when there is no market data on prices.
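The logic can be sketched with a toy simulation (purely illustrative, not the Lab’s actual survey design or numbers): give simulated respondents hidden valuations for a free app, offer them compensation to give it up, and read off the offer at which half accept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each respondent places some unobserved monthly
# value on a free app, and accepts an offer to give it up only if the
# offer exceeds that value (their willingness-to-accept, or WTA).
true_values = rng.lognormal(mean=3.0, sigma=0.8, size=5000)  # median ~ e^3 ~ $20

offers = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
accept_rates = [(offer >= true_values).mean() for offer in offers]

# The offer at which roughly 50% accept approximates the median WTA --
# a valuation recovered with no market price at all.
median_wta = np.interp(0.5, accept_rates, offers)
print(f"estimated median WTA: ${median_wta:.2f}")
```

Real choice experiments use carefully randomized offers and discrete-choice models rather than simple interpolation, but the principle is the same: acceptance rates at different compensation levels trace out the distribution of valuations.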

SK: We can also use large language models to augment the survey data. LLMs have been trained on massive amounts of human-created text, effectively “absorbing” patterns in people’s thinking, behavior, and decision-making. When you provide them with enough context—like demographics or previous behaviors—they can make realistic guesses about how someone might respond.

Is GDP-B adding new metrics to the existing GDP, or does it rebuild it from the ground up?

AC: I would say it’s a different way of thinking about benefits and economic well-being while staying within the framework used by national statistical agencies. We should be looking at GDP numbers and GDP-B numbers side by side. It’s not like we just look at GDP-B because we still care about production.

DN: GDP-B is meant to complement GDP and other economic indicators. We do not advocate for replacing GDP but want to go “beyond” in the sense of moving closer to measuring what people ultimately care about in life. Due to its level of standardization and comparability over time and across countries, GDP will for now remain a relevant measure of market-based production. 

SK: Personally, I don’t see the old GDP going away entirely, because it’s still a useful baseline for things like government spending, cross-country comparisons, and historical data tracking. But once we have a “finished” expanded measure, I’d expect it to sit alongside the traditional GDP, giving policymakers and economists a more rounded view of economic well-being.

What has changed since you started working on GDP-B?

AC: We have noticed GDP-B growth is quite high. The benefits we get from digital goods have been steadily increasing since we started tracking them in 2016. 

SK: I think that one of the biggest shifts has been the rise of generative AI—not just as a research tool, but as part of the very digital ecosystem we’re trying to measure. Previously, the focus was on platforms like Facebook, Wikipedia, and Google Search—things people use. Now we’re seeing people co-create with tools like ChatGPT, Midjourney, or Copilot, which blurs the lines between producer and consumer, and raises new questions about where value is being generated and who’s actually benefiting from it.

AC: We still don’t see GDP and productivity rise due to AI. But on the GDP-B side, what we found is that in 2024, generative AI tools contributed to around $50 billion of benefits to consumers. Consumers are benefiting already, and it shows up in our GDP-B metrics. On the GDP and productivity side, there is no impact yet. I think this makes the case for why we should be also tracking GDP-B in addition to GDP.


Are there any particular areas of research you’ve found challenging?

AC: We’re exploring areas like healthcare and infrastructure, and more durable goods like the value of having a car. What we’re finding in our survey-based approach is that it’s more challenging to measure goods that are essential to survival.

It’s easier to put values on goods when you can live without them. But if it’s impossible to give up something, the perceived values are, of course, very high. People have difficulty saying if it’s 1 million dollars or 10 million dollars, so when the numbers become too high our method doesn’t work as well.

One approach is to focus on changes over time rather than the absolute numbers. How much one values healthcare is a hard question because it’s essential, but the value of the improvement in healthcare from last year to this year is relatively easier.

DN: Since I first started working on economic measurement under Professor Diane Coyle, I quickly realized how large the challenges ahead of us are. Now I know that it is possible to come up with and establish new indicators if you are willing to build the networks. Walking the line between improving traditional metrics that remain relevant and establishing something complementary can, at times, be challenging. 

What are the areas of concern when looking to apply GDP-B internationally?

DN: We are working with other countries to establish GDP-B as a global indicator from the start. There are no theoretical barriers to using GDP-B as an internationally comparable indicator. 

AC: When we measure GDP-B contributions from digital goods across several countries, we find lower-income countries benefit much more from digital goods than higher-income countries. Similarly, within a country, lower-income individuals seem to benefit much more from digital goods relative to their income compared to higher-income individuals. 

It’s not like Elon Musk gets a better quality Google search, right? Elon gets the same Google search as you and I, and for free provided you have internet access. If you are richer, you have better homes, better cars. But in a digital world, if you are rich or poor, you still have the same Google search, the same YouTube, and so on.

One caveat I want to mention is this is not to say digital goods are amazing. We are only looking at the economic benefits. If you measure happiness and subjective well-being, there could be negative impacts as well.

Does GDP-B examine any potential harms or negative impacts?

AC: It’s hard to measure these externalities. For example, we find that social media apps generate a lot of welfare and billions of dollars in benefits, but–this is a question of debate in academic circles–they may contribute to increasing political polarization at the national level. That wouldn’t show up in our data because we are measuring individual-level benefits. 

DN: We are actively working on accounting for the downsides of (over-)consumption like addiction, and defensive expenditures like cleaning up after environmental disasters or conflicts. GDP has the same issues given that many things we might see as welfare decreasing are counted as positive factors. Examples include cigarettes, surgeries after preventable accidents, or securing private property.

SK: From a purely economic standpoint, standard GDP rarely factors in costs like pollution or other negative externalities; it just measures output or consumption. In the same vein, GDP-B zeroes in on consumer benefits. So, GDP-B only captures the upside.

However, this is an important point that we keep coming back to. We would need to develop a methodology to capture potential harms, perhaps something along the lines of surveying users on how much they’d pay to avoid data privacy issues or unwanted attention “taxes.” But these types of questions are more abstract and are inherently more difficult to answer. Ask yourself, how much would you be willing to pay to not feel the urge to check your phone every five seconds?

How do you see policymakers using these new measures?

SK: I think policymakers could use these new measures (like GDP-B) to inform decisions about everything from regulating platform services to prioritizing infrastructure that boosts overall welfare, not just the economy’s “cash flows.” They’d have numbers that more accurately reflect the actual quality of life and benefits that people experience, especially in the realm of “free” digital goods.

As for the general public, a measure that captures the free services they actually use might resonate better than GDP alone, which can feel disconnected from everyday experiences. People might find it validating to see their reliance on messaging apps or streaming services recognized as part of the economy’s real value.

AC: Looking at changes in GDP-B over time would be a great proxy to see, are people better off or worse off because of various technologies?

Let’s say a policymaker is thinking about regulating generative AI technologies. We can, using GDP-B numbers, put a dollar number on what would happen, an increase or decrease in benefits, from a potential regulation. These numbers would help them come up with better policies because they directly capture the benefit.

If you could ensure one specific change in how we measure economic progress in the next decade, what would it be?

AC: We would have a “dashboard” of metrics–this is something Erik Brynjolfsson said a few years back. If you’re driving a car, you have a dashboard, and you have different numbers. We have speed, we have fuel, all these different numbers. 

Similarly, for policymaking, we need a dashboard. We already have GDP and productivity. In addition to that, we can measure happiness. We create a dashboard, and we try to focus on looking at this spectrum rather than focusing on one specific number. This would be on my wish list for the future.

DN: GDP-B becomes the default global measure of progress, and societies are benchmarked on how well they improve the things that people ultimately care about. 

SK: Personally, I’d push for routinely measuring consumer surplus from digital goods—how much value people actually get from services that don’t have a price tag, like Google Maps or Wikipedia. This is something that’s long overdue.

Have you seen any sort of pushback?

AC: Whenever we present GDP-B, it has been very well received. The big challenge, of course, is that statistical agencies–in the US the Bureau of Economic Analysis–have been facing declining budgets over the past several years. They are already struggling to produce the existing GDP numbers. This is a new method requiring survey data collection, which costs a lot of money.

DN: Policymakers have shown a keen interest in our work, demonstrating that there is a demand out there for welfare indicators beyond GDP. One challenge is the decreasing budget and funding for economic measurement and statistics at the federal level. This is misguided, given that the rapidly changing economic structure requires better measurement techniques to keep track of what is going on.

What steps are being taken to future-proof GDP-B?

SK: The focus needs to be less on specific tools, and more on the framework. What people value changes fast these days, and our focus needs to be on that. 

DN: GDP-B is a flexible measure that can easily account for new products in a rapidly changing digital economy – much more so than traditional GDP, which struggles whenever the product mix changes.

AC: We’re creating a template for running the service. If someone is interested in digging deeper into a specific industry or geography, they can take our template, do their own surveys, plug the numbers into our framework and come up with GDP-B estimates for their country or region.

The real impact will come from adoption. If countries adopt GDP-B, that’s where the scale will come.

You can learn more about GDP-B by visiting gdp-b.org.

***

Avinash Collis is an Assistant Professor at the Heinz College of Information Systems and Public Policy at Carnegie Mellon University. Avi holds a Ph.D. from the MIT Sloan School of Management. He studies the economic implications of information technologies.

David Nguyen’s research explores new and better ways to measure the modern and digital economy. He is particularly interested in advancing economic metrics and statistics on economic output and welfare. As a research associate, he remains affiliated with the London-based Economic Statistics Centre of Excellence (ESCoE). David received his PhD from the London School of Economics.

Sophia Kazinnik is a research scientist at the Stanford Digital Economy Lab, where she explores the intersection of artificial intelligence and economics. Prior to joining Stanford, Sophia worked as an economist and quantitative analyst at the Federal Reserve Bank of Richmond, where she was part of the Quantitative Supervision and Research group. While there, she contributed to supervisory projects targeting cyber and operational risks and developed NLP tools for supervisory purposes.

INSIGHTS

Democratizing statistics: How AI can empower everyone (without causing disasters)

February 17, 2025
8 min read



by José Ramón Enríquez, David Levine and David Nguyen

“Democratizing Statistics: How AI can Empower Everyone (without Causing Disasters)” is an opinion piece on how large language models are making complex data analysis available to everyone, and the risks involved. A video interview discussing the piece with authors José Ramón Enríquez, postdoctoral fellow at Stanford University, David Levine, professor at the Haas School of Business at the University of California, Berkeley, and David Nguyen, research scientist at the Stanford Digital Economy Lab, can be viewed here.

Imagine a world where you don’t need an advanced degree in statistics to uncover meaningful insights from data. Data analysis was once the realm of statisticians, but Large Language Models (LLMs) such as ChatGPT are making it accessible to anyone with curiosity and a computer. But can LLMs help users navigate the complexities of data analysis safely?  Or will LLMs help users create data-driven “hallucinations” (or, at least, misleading analyses), counterparts to the textual hallucinations that LLMs already create?

The Upside: AI Unlocks the Power of Data for Everyone

LLMs are lowering the barriers to data analysis for everyday users in fields ranging from business and healthcare to education and civil society. These models enable users to apply statistical analysis to answer questions and make informed decisions. For example, a nurse might use AI to analyze patient outcomes without waiting for an analyst.  A teacher can assess the effectiveness of a lesson plan by visualizing student performance data. Job-hunters can do A/B testing on which resume gets the most responses. 

More generally, teams of all sizes and backgrounds and across every industry and sector can incorporate statistical analysis into their workflows, fostering a culture of data-informed decision-making.  We can expect faster decisions as AI assistants enable real-time data analysis, cutting down the time it takes to move from raw data to actionable insights while also reducing the need for specialized skills or expertise.  Engaging non-traditional users brings unique perspectives to data analysis, as new users will ask new questions and uncover important patterns that experts might have overlooked.

The Risks: When AI + Stats Go Wrong

While the benefits are substantial, the risks of democratizing statistics are equally important. Without proper training or safeguards, errors can lead to misguided decisions, wasted resources, or even harm.  As anyone who has taught (or been taught) statistics knows, we can expect new users to:

  • Be Unaware of the Problems with Data Quality: New users will not know to look for missing values, outliers, or inconsistencies that can distort results. For example, because the mean is not a robust statistic of central tendency, when estimating the mean age of a group it only takes a few observations mistakenly reporting an age of 700+ years to move the estimate by a lot. Users might also ignore data collection procedures that can lead to sampling or selection bias. For example, conducting a survey exclusively online might exclude individuals without internet access, skewing the findings and limiting the generalizability of conclusions. Another issue, related to “hallucinations,” arises when the dataset (or parts of it) is, intentionally or not, entirely made up by the LLM.
  • Place Excessive Confidence in Statistical Outputs: Users may accept results without understanding statistical assumptions (e.g., normality, independence) or fail to acknowledge the limits of the models. For example, new users may interpret a P value for non-normally distributed data without realizing the results may be invalid.  Similarly, they will use linear regression when the relationship between variables is clearly non-linear. Users may also ignore the need to perform corrections for multiple hypothesis testing, increasing the risks of false positives.
  • Misinterpret Results:  Driven by the often-confident presentation of results, users may draw incorrect conclusions, such as interpreting correlation as causation.  For example, a team might conclude that increased social media spending drives higher sales when both are actually linked to an omitted factor—seasonal demand. Other common mistakes are misinterpreting the units of analysis, the magnitude of estimates, or confusing statistical with practical significance. 
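The first pitfall is easy to demonstrate. In this sketch (the ages are made up, echoing the 700+ years example), a couple of typo entries drag the mean far from any plausible age while the median barely moves:

```python
import numpy as np

# Ages with two typo entries of 700+ "years" -- the kind of
# data-quality issue a new analyst may not think to check for.
ages = np.array([23, 31, 45, 28, 52, 37, 41, 29, 760, 712])

print(np.mean(ages))    # 175.8 -- far above any plausible human age
print(np.median(ages))  # 39.0  -- the median is robust to the typos

clean = ages[ages < 120]  # simple validity filter
print(np.mean(clean))     # 35.75 -- mean of the plausible values
```

Two bad values out of ten were enough to quadruple the estimated mean, which is exactly why an AI assistant should surface such observations before any analysis runs.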

From a systems perspective, algorithmic bias can also contribute to misguidance and perpetuation of existing biases. If AI systems are trained on skewed, non-representative datasets, or if they rely on inappropriate explanatory and outcome variables, they risk perpetuating and propagating errors and generating equivocal analyses. A well-documented example is the misrepresentation of health risks for Black patients in the US when health needs are proxied by health expenditures. This issue could be further exacerbated by reinforcement learning from human feedback (RLHF). When novice users accept AI-generated suggestions that are poorly calibrated or unsuitable for the problem at hand, and this feedback is subsequently incorporated into the model, biases can become entrenched and amplify over time.


Better AI Can Reduce the Risks

If poorly designed AI can create risks, better AI offers a powerful way to reduce the risks of democratized statistics. 

Ideally, any AI helping with statistics will create a guided workflow.  First, it will automate data validation. Tools can flag issues such as missing values, duplicates, and outliers before analysis begins. Imagine an assistant that reads in some data and reports: 

Your second variable is called height and appears to be in centimeters.  It also uses “9999” to code for a missing value.  One observation is 999 cm (33 feet) tall. It is likely that this value is a typo. Should I code it as missing? 
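A minimal sketch of this kind of automated validation, using pandas (the data and the 272 cm plausibility cutoff are illustrative choices, not a prescribed rule):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the example: height in cm, with 9999 used as a
# missing-value code and one implausible 999 cm entry (likely a typo).
df = pd.DataFrame({"height_cm": [172, 165, 9999, 158, 999, 181]})

# Recode the sentinel value as missing rather than a real measurement.
df["height_cm"] = df["height_cm"].replace(9999, np.nan)

# Flag physically implausible values for human height.
implausible = df["height_cm"] > 272  # tallest person on record: 272 cm
print(df.loc[implausible])           # the 999 cm row

# A cautious assistant would ask the user first; here we just recode.
df.loc[implausible, "height_cm"] = np.nan
print(df["height_cm"].notna().sum())  # 4 usable observations remain
```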

As the AI proceeds with the analysis, it can verify if the data satisfy statistical assumptions such as normality or linearity.  For example, before running a correlation or t-test, the AI could automatically check for normality and suggest a non-parametric test if necessary. 
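One possible implementation of that gate, sketched with SciPy (the simulated data and the 0.05 threshold are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.exponential(scale=2.0, size=80)  # clearly non-normal samples
b = rng.exponential(scale=3.0, size=80)

# Shapiro-Wilk as the automated normality gate for both groups.
normal = all(stats.shapiro(x).pvalue > 0.05 for x in (a, b))

if normal:
    result = stats.ttest_ind(a, b)
else:
    # Fall back to a rank-based test with no normality assumption.
    result = stats.mannwhitneyu(a, b)

print(type(result).__name__, result.pvalue)
```

With skewed data like this, the gate fails and the assistant silently routes the comparison to the Mann-Whitney U test instead of the t-test.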

It can also check functional forms. For example, it might ask:  

You requested linear regression, which assumes your variables have a linear relationship. But the residuals in your specification follow a predictable pattern (according to the Ramsey RESET test).  This result is often due to a nonlinear relationship.  I suggest using a more flexible specification such as polynomial regression and also an approach that does not make assumptions about functional form such as random forests. Should I run those analyses instead?

The AI system might also proactively alert users to potential methodological pitfalls. For instance, it could display a cautionary message such as, “Your dataset’s sample size is too small to derive reliable conclusions”—a feature already seen in some advanced software. Additionally, the AI could autonomously adjust P values in contexts involving multiple hypothesis tests, flag the risks of overfitting, or adjust standard errors to more accurately reflect the data characteristics.
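The multiple-hypothesis adjustment mentioned here can be illustrated with statsmodels (the twenty p-values are hypothetical, constructed so that every null is actually true):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Twenty hypothetical exploratory tests in which every null is true;
# three p-values dip below 0.05 purely by chance.
raw_p = np.array([0.01, 0.04, 0.03, 0.20, 0.50, 0.80, 0.60, 0.35, 0.90, 0.45,
                  0.70, 0.55, 0.12, 0.33, 0.62, 0.77, 0.18, 0.26, 0.41, 0.66])

naive_hits = (raw_p < 0.05).sum()  # 3 "findings" without any correction

# Holm's step-down correction controls the family-wise error rate.
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
print(naive_hits, reject.sum())    # after correction, nothing survives
```

An uncorrected analysis would report three discoveries; the corrected one correctly reports none, which is the kind of adjustment an AI assistant could apply by default whenever it sees a battery of tests.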

One might also imagine the AI engaging interactively with the analyst to solicit further context or supplementary documentation. For example, it could request additional materials—such as codebooks, questionnaires, or survey instruments—to deepen its understanding of the dataset. If the analysis suggests that the available data are inadequate, the system might even recommend collecting additional observations to strengthen the study’s empirical foundation.

Perhaps most importantly, the AI assistant can help interpret the results.  To start, it can point out: “Remember, correlation does not imply causation. Consider additional analysis to understand causality.” 

As the AI system evolves, it will increasingly discern both the intent of the analyst and the contextual nuances of the problem. Such a well-informed assistant can go further in spotting and solving potential concerns.  For example, if the AI knows the analyst is examining the relationship of education and wages, it might add:  

It is important to consider a third factor, currently omitted from the analysis, such as the family’s emphasis on child-rearing that might increase both education and wages.

Similarly, it could warn about selection bias, such as: 

Because your sample consists entirely of individuals who have completed college or higher, these estimates may be subject to selection bias, as those who did not pursue post-secondary education are excluded. To properly interpret your results, it is crucial to account for this non-random selection process.

As the software learns from the user’s evolving expertise—and ideally, as the user’s own analytical skills expand—the degree of handholding and alerting provided by the AI can be adjusted over time. AI systems can contribute to this process by logging each decision they make—from handling missing observations to selecting specific tests. Such an iterative approach can facilitate error detection, enhance understanding of the rationale behind the AI’s suggestions, and foster trust in the system. Just as researchers now routinely submit their Stata or R code with publications, one might foresee a future in which the submission of transcripts detailing LLM prompt-response interactions becomes a requisite component of the peer-review process.

How to implement? 

Many of these capabilities can be integrated simply by refining the prompt provided to a large language model.  For example, ChatGPT 4o implements some of these features if we add to a prompt requesting a correlation: 

To the extent possible, test whether the data satisfy the statistical assumptions of all techniques. Flag potential issues and interpret results with caution. 

Alternatively, an additional software module or agent could perform this extra testing and explanation. In the context of increasingly popular agentic systems, such functionality could be implemented as an adversarial mechanism, where one model is tasked with justifying each decision to another. However, relying on an extra module introduces a significant challenge: novice users, who stand to benefit the most from such extra guidance, may not recognize its relevance.

Thus, the optimal solution would be to embed this supplementary analysis as a default feature in any artificial intelligence tool that facilitates statistical analyses. Developers at organizations such as OpenAI, Anthropic, Google, and Meta should consider incorporating these instructive defaults into the system’s core functionality whenever it identifies that a statistical analysis is being performed. An effective approach might include querying users about their statistical proficiency at the outset and dynamically adapting the system’s feedback based on subsequent interactions and inquiries.

The Balanced Future of Statistics

As statistics become more accessible, the opportunities for innovation and insight are vast. However, the risks of misuse are just as significant. The same AI assistants that empower novices to use statistical techniques should also act as a safety net, helping new users avoid common pitfalls and guiding them to make confident, data-driven decisions.

By combining automated tools with ongoing education and awareness of statistical principles, we can create a world where everyone—not just experts—can harness the power of data responsibly and effectively. We will see users with less initial education producing rapid and credible analyses.  Along the way, the users will learn more statistics and all of society will benefit from more decisions informed by data.

As AI-based statistical tools mature, there is also an opportunity to integrate interactive learning modules. This integrated approach would allow users—regardless of their prior expertise—to progressively build deeper statistical literacy, thereby ensuring that the benefits of democratized data analysis are both far-reaching and sustainable.

We invite developers, educators, and policymakers to join the conversation and ensure that democratized statistics truly benefits everyone.

Note: Authors are listed in alphabetical order. The examples of age over 700 years and height of 999 cm are both from Levine’s experience. Written with the assistance of ChatGPT.


José Ramón Enríquez is a postdoctoral fellow at the Stanford Digital Economy Lab (Stanford HAI) and the Golub Capital Social Impact Lab (Stanford GSB). José Ramón obtained his Ph.D. in Political Economy and Government (PEG) from Harvard University in May 2023.

José Ramón studies the political economy of economic and political development with a focus on political accountability. Specifically, he has worked on understanding the role of information in improving political accountability, with a specific emphasis on misinformation, political polarization, and corruption; the causes and effects of criminal-political violence on democratic representation; and the effects of the lack of coordination across levels of government.

David I. Levine is a professor of business administration at Berkeley Haas and serves as chair of the Economic Analysis & Policy Group. Levine’s research focuses on understanding and overcoming barriers to improving health in poor nations. This research has examined both how to increase demand for health-promoting goods such as safer cookstoves and water filters, and how to change health-related behaviors such as handwashing with soap. He has also written extensively on organizational learning (and failures to learn).

David Nguyen is a research scientist at the Stanford Digital Economy Lab. David’s research explores new and better ways to measure the modern and digital economy. He is particularly interested in advancing economic metrics and statistics on economic output and welfare.

Prior to joining the Stanford Digital Economy Lab, David worked as an economist at the OECD in Paris, and as a senior economist at the National Institute of Economic and Social Research (NIESR). As a research associate, he remains affiliated with the London-based Economic Statistics Centre of Excellence (ESCoE). David received his PhD from the London School of Economics.

INSIGHTS

American democracy is soul-searching. An AI-era version of the Federalist Papers may be the answer

Erik Brynjolfsson is the Jerry Yang and Akiko Yamazaki Professor and Senior Fellow at the Stanford Institute for Human-Centered AI (HAI), and Director of the Stanford Digital Economy Lab. He also is the Ralph Landau Senior Fellow at the Stanford Institute for Economic Policy Research (SIEPR), Professor by Courtesy at the Stanford Graduate School of Business and Stanford Department of Economics, and a Research Associate at the National Bureau of Economic Research (NBER).

by Erik Brynjolfsson
Director, Stanford Digital Economy Lab

December 5, 2024
4 min read

This article originally appeared on Fortune.com.

Most Americans associate the year 1776 with the country’s independence, but a lot more happened that year. Scottish inventor James Watt introduced an improved steam engine that ushered in the Industrial Revolution. Adam Smith published The Wealth of Nations, providing a fresh vision for economics. This convergence of transformative technological, economic, and political activity completely reshaped the way people lived, worked, and governed themselves. It was a time when leaders were reimagining the very purpose of our nascent democratic institutions and how to proceed in a way that was best for all citizens.

Almost 250 years later, we’re at a similar convergence. The introduction and rapid progression of artificial intelligence (AI) presents our society with unprecedented challenges but also opportunities that, if approached thoughtfully, could strengthen Americans’ engagement with the democratic process.

Today, we still have the power to steer AI toward beneficial outcomes. But we must act swiftly because the decisions we make today will shape the institutions that govern generations to come. 

First, it’s important to frame the challenges. In 2024, much of the underlying infrastructure (our laws, institutions, policies, and systems) has not kept up with the rapidly changing technological landscape. The adoption of computation, the internet, multi-sided networks, globalization, and data-driven decision-making have catalyzed the transformation of almost every facet of our lives. And now the possibilities presented by AI will further accelerate that transformation due to its broad applications. 

Currently, there is a growing gap between the technical capabilities of technologies like large language models and what most people, including governments, actually do with them. These tools are already beginning to shape our discourse and will have even more influence soon, making it crucial for us to stay actively involved in guiding their direction.

“A technology as fast-changing as AI requires nimbleness, keeping an open mind, and the continual engagement of a diverse group of leaders to debate and guide the technology.”

ERIK BRYNJOLFSSON
Director, Stanford Digital Economy Lab

The modern-day Federalist Papers

In an effort to guide the critical decisions before us as a society, we borrowed the concept of The Federalist Papers, a series of essays written in the late 18th century that analyzed the great challenges of the day and provided a roadmap for institutional innovation for a young America. We concluded that it was time for another gathering of leading thinkers who could offer ideas about how we should approach governing in the age of AI. We asked 19 authors their thoughts and brought together their responses in a volume of essays we call The Digitalist Papers.

Unlike The Federalist Papers, we did not have a goal of persuading readers to adopt a pre-determined action (in their case, arguing for the ratification of the Constitution). Rather, our aim was to elevate the voices of varied experts who have informed and well-researched ideas and bring out a menu of solutions across a range of related topics. Authors represent diverse fields—including economics, law, technology, management, and political science—along with leaders from industry and civil society. 

This matters because the thoughtful examination of AI-enabled developments in relation to our institutions requires us to turn the spotlight on those institutions and ask whether they remain fit for purpose in all respects, or whether they can benefit from AI or other changes. This is a daunting exploration that demands deep domain expertise and collaboration across academic disciplines.

(One final thing worth noting: Authors did not see each other’s work before publishing.)

Here’s what we learned:

  • AI has the power to make governance more inclusive and technologically integrated: A group of essays urged for a significant shift in how AI intersects with the democratic process. Ideas included AI-driven collaborative governance, using AI to scale direct democracy by amplifying citizens’ voices, and using digital tools to advance hyper-local community-focused engagement.

  • AI is forcing us to rethink how we organize our government: Some authors focused on how our government’s failure to deliver the services its citizens have come to expect will lead to civic disengagement, and how the adoption of AI—as well as broader adoption of technology focused on meeting citizens’ needs—can help reinstill faith that government is for the people.

  • The regulation of AI is arguably more important than the tech itself: Three authors focused on the role of regulation, arguing that the processes by which we currently regulate don’t stand up to something as dynamic and transformative as AI, and that we might be exaggerating AI’s impact on information, thereby undermining trust in all media, regardless of its accuracy.

Given the broad spectrum of perspectives, at times the authors offered conflicting ideas and differing approaches. That is by design. But the bedrock of every essay was that democracy and AI can and should work well together; that our democratic institutions can be renewed and reinvented for the AI era.  

A technology as fast-changing as AI requires nimbleness, keeping an open mind, and the continual engagement of a diverse group of leaders to debate and guide the technology. It’s not inevitable that AI will lead to more freedom and participation in democracy. If left unchecked, there’s a chance AI would change how we govern in ways the Founding Fathers might have found abhorrent. Therefore, it is our responsibility to approach these urgent questions with the conviction that we have the agency to shape the outcome.

Keep reading

INSIGHTS

Q&A: AI and the Future of Work with Erik Brynjolfsson and Tom Mitchell


Tom Mitchell is Founders University Professor in the Machine Learning Department in Carnegie Mellon University’s School of Computer Science.

by Matty Smith
Communications

November 22, 2024
7 min read

How can humans shape a future where AI benefits everyone? The National Academies of Sciences, Engineering, and Medicine have released a new report called “Artificial Intelligence and the Future of Work,” which takes an in-depth look at the relationship between AI and the workplace.

Co-chairs Erik Brynjolfsson, director of the Stanford Digital Economy Lab, and Tom Mitchell, Founders University Professor at Carnegie Mellon University, answered questions about the report.

What was the purpose of the report?

Tom Mitchell: Congress requested this report to provide a study by the U.S. National Academies of the “current and future impact of artificial intelligence on the workforce of the United States across sectors.” It builds on an earlier 2017 report on the same topic, which Erik and I co-chaired. A lot has happened in AI and the economy since 2017!

What were your hopes going into the report, and were they met?

Erik Brynjolfsson: We started working on this report in 2022, before ChatGPT was released. At the time, I knew that we were in the early stages of a technology revolution that was vastly increasing the power of AI—I had already been using LLMs for a while.

We assembled some of the absolute top experts from academia and industry to put together a report on the implications for the workforce and the economy. We hoped to provide the definitive source that people could turn to for a better understanding of not only the technology, but also the implications for productivity, the workforce, education, and measurement issues. I’m delighted with the results and believe we delivered on that goal.

What findings surprised you?

TM: [When we began in 2022] I was assuming that physical robots would be one of the AI technologies that would have the greatest impact on jobs—things like self-driving vehicles, and assembly line work. When ChatGPT appeared, and other LLMs, we had to go back to square one in our thinking about where AI was headed, and what it meant for jobs. 

As it turns out, LLMs operate in the mental world of knowledge work, in contrast to the physical world where robots work. Therefore, the impact on jobs is very different from what I expected when we got started.

What are the biggest changes since the 2017 study?

TM: The appearance of LLMs, and the impact the pandemic had on work habits such as remote work and the economy (for example, a big increase in online retail and services).

EB: One of the real pleasures of being able to work with Tom Mitchell again on a National Academies report was that we could compare what we learned this time to what we did in 2017. One of the biggest changes is what happened with measurement. Perhaps the most prominent conclusion of the earlier report was that we were flying blind due to a lack of good data on AI and its effects on the economy. In fact, we used the phrase “flying blind” in an article for the journal Nature that we wrote to summarize our findings.

However, this time around the data is much improved. Not only has the government stepped up and funded detailed research on AI technology adoption by over 800,000 firms, but the private sector also has much more detailed and often real-time data about technology, wages, job postings, and other changes in the economy. There’s a real opportunity to combine these improved sources of data in a public-private partnership to get an even better understanding of AI and its effects.

What hasn’t changed in the way you’d like?

TM: In our 2017 report, we predicted that self-driving cars might be widespread by now—too optimistic a prediction—but we never predicted that by 2024 you’d be able to have intelligent conversations with computers.

“Our most important conclusion was that there is no predetermined future, but rather our choices will determine our future. In fact as the tools become more powerful, our choices become ever more consequential.”

ERIK BRYNJOLFSSON
Director, Stanford Digital Economy Lab

What excites you most about the potential for AI to enhance education?

TM: I believe this is the decade when AI can change education for the better. Why? 

First, we have known for a while that human tutoring significantly improves student learning. 

Second, we now have multiple online education platforms that have taught millions of online students and therefore have more teaching experience than a person could accumulate even after a hundred years in the classroom. We now have machine learning methods that use that data to learn and teach better. 

Third, we have just begun to create new computer tutors that use LLMs like ChatGPT to teach in new ways, like having conversations with students about their confusions, and like co-writing essays with students. The opportunity is great, but we need to organize and fund this research if we want to turn this potential into reality.

EB: As an educator myself, I’m acutely aware that my industry has been badly lagging when it comes to using technology–after all, I often use a chalkboard that Aristotle himself would have been comfortable with. But with generative AI, we have the potential to transform education in a fundamental way. In particular, we know from research that students who get individualized tutoring can learn up to two standard deviations faster than students in a classroom where 20 or 30 kids are all instructed at the same pace. Of course, individual tutors have been far too expensive to provide for all kids. 

Now LLMs like ChatGPT have the potential to change that. I already know a lot of kids, and for that matter, adults, including a Nobel Laureate, who use these tools daily for individualized tutoring to learn new subjects and to dive deeper into existing topics. What’s more, LLMs can be incredibly engaging and entertaining. In the coming years, I expect a revolution as developers create even better versions of these tools and millions of students gain their benefits.

Is there anything in particular you’d like readers to focus on?

TM: One of the most important things we can do for the workforce is give them a clear, real-time picture of how the demand for different worker expertise is shifting over time, and what education opportunities are available to them to chart their own career path. A key part of this is actually collecting this real-time information. That will require public-private data partnerships that combine data from different companies that have real-time information about job openings, resumes, salary scales, and more. 

We can do this, but it will require imaginative thinking about how to build these partnerships while respecting the legitimate privacy and competitive concerns of the organizations that have these data.

What would you hope to see if you did another report down the line?

EB: One thing I expect will happen if we do a report like this in the future is that we’re going to not just talk the talk but walk the walk. AI will be a much bigger contributor: helping us with the literature review, gathering the relevant data, analyzing it, and even assisting with the writing. We are acutely aware of some of the weaknesses and risks of AI, but also the real strengths, especially when it’s combined with human oversight.

Lastly, what do you hope people take away from the report?

TM: We strongly expect continuing technical progress in AI, and expect it to significantly impact the economy and the workforce. I hope readers of the report get that the future we’ll live in is not destined by technology—really, it will be determined by the decisions we make about how to use it. That includes decisions made by politicians, technologists, workers, and all of us.

EB: As we worked on this report, it became clear that there are some real opportunities for an amazingly better world when it comes to improvements in science, healthcare, and widely shared prosperity, but also a real risk in areas like privacy, bias, democracy, national defense–even catastrophic risk. We laid out different scenarios and discussed some of the choices that we can make individually and collectively. 

Our most important conclusion was that there is no predetermined future, but rather our choices will determine our future. In fact as the tools become more powerful, our choices become ever more consequential.

2023: Year in Review

Just a few of the things we did in 2023


The past year has seen the Lab holding true to our mission of pursuing a deeper understanding of the digital economy and its impact on the future of work and society. From groundbreaking research to new collaborations to far-reaching events, here are just a few of our accomplishments in 2023.

Follow us on X and LinkedIn to keep up with the Lab and our work. While you’re at it, sign up for email updates to receive The DigDig newsletter and news about future events.

Research

First-of-its-kind study explores how generative AI impacts productivity in the workplace

A groundbreaking study finds that generative AI tools like ChatGPT can boost productivity

Can generative AI boost productivity in the workplace? Lab researchers Erik Brynjolfsson, Danielle Li, and Lindsey Raymond tested AI software with more than 5,000 agents at an unnamed Fortune 500 company to find out. The surprising result: The company posted a 14% increase in the number of customer service chats an agent successfully responded to per hour. Another surprising finding: The productivity boost was concentrated among the lowest-skilled customer service agents, while their higher-skilled counterparts experienced only a slight increase.

Related
Generative AI at Work
Arxiv

First study to look at AI in the workplace finds it boosts productivity
Axios

How to capitalize on generative AI
Harvard Business Review

Will generative AI make you more productive at work? Yes, but only if you’re not already great at your job
Stanford HAI

How generative AI is placing CFOs at the forefront of company strategy
Fortune

Brainstorm AI 2023: Economic Impacts of AI and ML on the Workforce
Fortune

Collaborations

A new initiative to promote democracy and responsible technology on the internet

The Lab joined forces with Project Liberty to promote a ‘more responsible approach to digital technology worldwide’
Civic entrepreneur Frank McCourt, Jr. (second from left) with members of the faculty steering committee, who will lead and oversee Stanford’s activities under Project Liberty’s Institute. From left to right, Erik Brynjolfsson, Rob Reich, Michael McFaul, Marietje Schaake, and Nathaniel Persily. (Image credit: Melissa Morgan)

Project Liberty brings technologists, academics, policymakers, and citizens together to improve technology, including a more open internet infrastructure. Earlier this year, the Lab, along with Stanford University, joined Project Liberty in its effort to produce a more responsible approach to digital technology. “Stanford will add an important anchor for us in Silicon Valley,” said founder Frank McCourt, Jr. “With their openness to collaboration, focus on solutions, and shared sense of urgency, Stanford faculty will help propel our work.”

Related
Project Liberty expands global alliance to strengthen democracy and responsible technology

Stanford joins international initiative to strengthen democracy and foster responsible technology

Project Liberty Institute

Knowledge sharing

Addressing a diverse range of topics about AI and the digital economy

Researchers, scholars, and experts visited the Lab throughout the year to share their insights with us—and the world
Simon Johnson of MIT visited us in October 2023 for this talk, “Can We Redirect Technological Change? When, How, and to Achieve What Exactly?”

Our 2023 Seminar Series covered a wide range of pressing topics—including productivity during the pandemic, consumer demand to support black-owned businesses, and data deserts and inequality. Catch up on all of our seminars (or watch them again) from the past year.

Related
Seminar Series 2023: Year in Review

Collaboration

Ongoing collaboration: The ADP National Employment Report

The Lab continued its collaboration with the ADP Research Institute to deliver a monthly snapshot of employment among private employers in the United States

The Lab began working with the ADP Research Institute in 2022 to produce the new and improved ADP National Employment Report, which measures the changes in private employment based on payroll data from more than 25 million employees. Today, that collaboration is going strong.

Related
ADP National Employment Report 

ADP Pay Insights Report

ADP Research Institute

Events

Discovering new and better ways to measure the economy

Workshop makes the case for new and better methods to measure the economy

The way we currently measure the economy is outdated—and the gap will only grow in the emerging AI-powered economy characterized by goods and services that have zero price. So how can we understand, let alone manage, what we do not accurately gauge? The New Measures of the Economy Workshop convened researchers and experts to explore new and better methods of measurement.

Related
Crafting a New Measure of Economic Well-Being
Stanford Digital Economy Lab

Research

The who, what, and where of AI adoption in America

A new paper examines the early adoption of five AI-related technologies in the US

In the working paper, “AI Adoption in America: Who, What, and Where,” a team of researchers examined how 850,000 firms in the United States used five AI-powered technologies—autonomous vehicles, machine learning, machine vision, natural language processing, and voice recognition. Their findings? Fewer than 6% of firms used any of the five AI-related technologies.

Related
AI Adoption in America: Who, What, and Where
NBER

‘AI divide’ across the US leaves economists concerned
The Register

Education

Exploring the AI awakening

A new course drew experts to Stanford to discuss how artificial intelligence will transform the economy and society in the years to come

In “The AI Awakening: Implications for the Economy and Society,” a Stanford course led by Lab Director Erik Brynjolfsson, students discussed and debated the ways AI will impact the future. Guest speakers included Mira Murati, Jack Clark, Laura D’Andrea Tyson, Alexandr Wang, Condoleezza Rice, Bindu Reddy, Eric Schmidt, Mustafa Suleyman, and Jeff Dean. A 2024 course is planned.

Related
The AI Awakening: Implications for the Economy and Society
Stanford Digital Economy Lab

Research

Where are all the robots?

New research reveals details about robot adoption and concentration in US manufacturing

Who knew that Iowa, Michigan, Kansas, Wisconsin, and Minnesota led the nation with the highest concentration of robots in manufacturing? In the working paper titled “The Characteristics and Geographic Distribution of Robot Hubs in U.S. Manufacturing Establishments,” Erik Brynjolfsson, J. Frank Li, and other researchers used data from the US Census Bureau’s Annual Survey of Manufacturers to examine which manufacturers use robotics, where the robots are, and how establishments are using them.

Related
The Characteristics and Geographic Distribution of Robot Hubs in U.S. Manufacturing Establishments
NBER

What ‘robot hubs’ mean for the future of US manufacturing
Stanford Digital Economy Lab

The Midwest is America’s robot capital
Axios
