Researchers at Helmholtz Munich today unveiled Centaur, a new artificial intelligence model that can predict human decisions with impressive accuracy by learning from more than ten million choices made in psychological experiments. The model, which builds on the Llama 3.1 70B language architecture, was fine‑tuned on a dataset called Psych‑101. It not only reproduces familiar decision patterns but also generalizes to entirely new scenarios described in natural language.
Psych‑101 serves as the backbone for Centaur’s learning, containing trial‑by‑trial transcripts from 160 behavioral studies in which over 60,000 participants made more than 10.6 million individual decisions. Each entry in the dataset pairs a natural language description of the task with the choice a human subject made, and it marks those responses with a simple “>” symbol so the model can clearly distinguish between instructions and answers. By processing this vast corpus, the AI developed an internal representation of how humans weigh options, face risk, and react under uncertainty.
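This kind of transcript format can be sketched roughly as follows. The function name, prompt wording, and field layout here are my assumptions; only the description‑plus‑choice pairing and the “>” marker come from the article:

```python
def format_experiment(instructions: str, trials: list[tuple[str, str]]) -> str:
    """Render one behavioral experiment as a plain-text transcript.

    Each trial is a (prompt, human_choice) pair; the choice line is
    prefixed with ">" so a language model can learn to predict only
    the human's answers, not the task instructions.
    """
    lines = [instructions]
    for prompt, choice in trials:
        lines.append(prompt)
        lines.append(f"> {choice}")  # ">" tags the human response
    return "\n".join(lines)


# Hypothetical two-armed-bandit trial, invented for illustration:
transcript = format_experiment(
    "You will choose between two slot machines.",
    [("Machine F pays 5 points, machine J pays 3. Which do you press?", "F")],
)
```

Fine‑tuning on millions of such transcripts is what lets the model treat a brand‑new cover story as just another natural‑language task description.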
What sets Centaur apart is its ability to handle fresh tasks that it never saw during training. When researchers tested the model on experiments with modified cover stories or entirely new problem structures, Centaur matched or outperformed traditional cognitive models in predicting both the choices people made and how long they took to decide. In one evaluation involving nearly four million reaction times, Centaur’s measure of uncertainty in its own output explained 58 percent of the variance in human response times, compared to 40 percent for the base Llama model and 38 percent for classic decision theories. This finding aligns with Hick’s law, which links decision time to the amount of information processed.
The team led by Dr. Marcel Binz and Dr. Eric Schulz sees Centaur not only as a prediction engine but as a virtual lab for scientists who study the mind. By simulating how people choose between options, psychologists can test new hypotheses without running costly human trials, and they can explore how decision patterns shift in groups with different traits or under different conditions. Potential applications include tailoring mental health treatments by simulating how patients with anxiety or depression might respond to therapeutic choices, or designing training programs that account for individual decision styles.
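One way to picture such a “virtual lab” is to simulate synthetic participants with a simple softmax choice rule and vary a parameter between hypothetical groups. The softmax rule is a classic cognitive-modeling device, not Centaur’s actual mechanism, and every number below is invented for illustration:

```python
import math
import random

def softmax_choice(values: list[float], temperature: float, rng: random.Random) -> int:
    """Sample one option index from a softmax over subjective values.
    Lower temperature means more decisive (value-driven) choices."""
    weights = [math.exp(v / temperature) for v in values]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(values) - 1

rng = random.Random(0)
values = [1.0, 0.5]  # option 0 is objectively better

# Two hypothetical participant groups: "decisive" vs. "noisy" deciders.
decisive = sum(softmax_choice(values, 0.2, rng) == 0 for _ in range(1000)) / 1000
noisy = sum(softmax_choice(values, 2.0, rng) == 0 for _ in range(1000)) / 1000
```

Running thousands of simulated participants per condition is cheap, which is the appeal: hypotheses about how decision patterns shift across groups can be screened in silico before committing to costly human trials.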
However, Centaur also faces valid criticism. Some experts point out that the model can produce unrealistically fast decisions—sometimes claiming to decide in under a millisecond—because it optimizes for accuracy rather than human‑like timing. That gap highlights the challenge of truly capturing the biological constraints of the brain, even as Centaur advances our ability to forecast decisions. Moreover, replicating brain‑like processes remains a frontier: although fine‑tuning brought the model’s internal representations closer to patterns seen in neural data, it has not yet matched the full complexity of real neural networks.
Overall, the launch of Centaur marks a milestone in the drive toward unified theories of cognition that both explain and predict human thought. By blending large‑scale language modeling with behavioral science, researchers have created a tool that promises to reshape how we study decision making and develop clinical interventions.
Short Analysis
I find Centaur fascinating because it shows how AI and psychology can work together to deepen our understanding of the mind. As the Psych‑201 dataset expands to 100 million choices, I expect the model to grow even more reliable and versatile. At the same time, I wonder how well Centaur will handle cultural differences or choices driven by emotion rather than logic, so I look forward to studies that test it in real‑world settings and with diverse populations.