OpenAI is launching o3 and o4-mini today. They are the latest additions to its series of language models optimized for reasoning.
The launch comes amid reports that the company is interested in acquiring Windsurf for $3 billion. Windsurf, officially named Exafunction, sells popular AI programming tools and uses OpenAI models for some of its features.
According to OpenAI, o3 is its most advanced reasoning model to date. The other new model, o4-mini, trades some output quality for faster performance and lower costs. The company says both models are more cost-efficient than their predecessors in most practical applications.
OpenAI claims that o3 has set new records on several well-known AI benchmarks. One is SWE-bench, which assesses AI’s programming abilities by having models fix bugs in real open-source projects. Another is MMMU, on which o3 also excelled; it contains university-level questions on subjects such as science and economics.
One factor behind the improved output quality is the model’s better ability to use tools. The model can call on external systems, such as a code editor or a search engine, to complete tasks it could not handle on its own. OpenAI reports that o3 can analyze and generate images, execute Python code, search the internet, and interact with custom tools that customers connect via an API.
Twenty percent fewer major errors
According to OpenAI employees, evaluations by external experts found that o3 makes 20% fewer major errors than its predecessor, o1, on complex, real-world tasks.
The second model launched today, o4-mini, shares many of o3’s tool-use features. The difference is that it is a smaller model: it is less capable overall, but it responds faster and costs less to run. OpenAI says this cost efficiency allows it to offer much higher usage limits than for o3.
Internal testing shows that o4-mini is particularly well suited to tasks involving math, programming, and visual input. Even without tools, it can outperform the more advanced o3 on tests such as AIME 2024 and AIME 2025, qualifying competitions for the US Mathematical Olympiad. OpenAI staff said expert evaluations showed that o4-mini outperforms its predecessor, o3-mini, on non-STEM tasks and in areas such as data science.
New open-source project
Alongside the models, OpenAI released a new open-source project called Codex CLI. It is an AI agent optimized for programming tasks that developers can run locally on their own computers. The program is used from the terminal, the part of the operating system where users type text commands instead of working through a graphical interface.
Rumors of Windsurf acquisition
OpenAI’s ambitions in the programming assistant market may extend beyond open-source projects. Citing sources familiar with the matter, Bloomberg and CNBC reported that OpenAI is in talks to acquire Windsurf in a deal that could be worth $3 billion.
Windsurf, which until recently operated under the name Codeium, offers an AI programming assistant that can generate new code, explain existing code, and perform related tasks. This assistant can be integrated into popular code editors via plugins. In addition, Windsurf has developed its own editor specifically designed to help developers integrate AI into their work.