Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models

Leland D. Crane; Akhil Karra; Paul E. Soto

June 2025

Abstract:

We evaluate the ability of large language models (LLMs) to estimate historical macroeconomic variables and data release dates. We find that LLMs have precise knowledge of some recent statistics, but performance degrades as we go farther back in history. We highlight two particularly important kinds of recall errors: mixing first-print data with subsequent revisions (i.e., smoothing across vintages) and mixing data for past and future reference periods (i.e., smoothing within vintages). We also find that LLMs can often recall individual data release dates accurately, but aggregating across series shows that on any given day the LLM is likely to believe it has data in hand that have not yet been released. Our results indicate that although LLMs have impressively accurate recall, their errors point to limitations when they are used for historical analysis or to mimic real-time forecasters.

Keywords: Artificial intelligence, Forecasting, Large language models, Real-time data

DOI: https://doi.org/10.17016/FEDS.2025.044


Disclaimer: The economic research that is linked from this page represents the views of the authors and does not indicate concurrence either by other members of the Board's staff or by the Board of Governors. The economic research and its conclusions are often preliminary and are circulated to stimulate discussion and critical comment. The Board values having a staff that conducts research on a wide range of economic topics and that explores a diverse array of perspectives on those topics. The resulting conversations in academia, the economic policy community, and the broader public are important to sharpening our collective thinking.

Board of Governors of the Federal Reserve System

Finance and Economics Discussion Series (FEDS)
