Steve Newman

This is a linkpost for https://amistrongeryet.substack.com/p/alphaproof-and-openai-o1

The latest advances in AI reasoning come from OpenAI's o1 and Google's AlphaProof. In this post, I explore how these new models work, and what that tells us about the path to AGI.

Interestingly, unlike the GPT-2 -> GPT-3 -> GPT-4 progression, neither of these models relies on increased scale to drive capabilities. Instead, both systems rely on training data that shows not just the solution to a problem, but the path to that solution. This opens a new frontier for progress in AI capabilities: how do we create that sort of data?

In this post, I review what is known about how AlphaProof and o1 work, discuss the connection between their training data and their capabilities, and identify some problems that must be solved for capabilities to keep progressing along this path.