In a surprising shift, OpenAI has unveiled a new family of artificial intelligence models known as the “o1” series, skipping the anticipated GPT-5 for now. Since the release of the GPT-4 model in March 2023, the tech community has been eagerly awaiting the next generation of AI from OpenAI. However, instead of introducing GPT-5, the company has launched two new models under the o1 family: o1-preview and o1-mini, designed to excel at complex tasks and surpass the capabilities of the GPT series.
These new models, which OpenAI says are engineered to tackle more advanced problems, became available today for ChatGPT Plus users, albeit with limitations on the number of messages that can be sent each week—30 messages for o1-preview and 50 for o1-mini. While the new models bring exciting advancements, OpenAI has been clear that they currently lack some of the practical features that have made ChatGPT popular, such as the ability to browse the internet and upload files or images. In fact, initial testing revealed that o1-preview was unable to generate images, a feature that remains unavailable in this early beta version.
What Sets the o1 Series Apart?
The o1 models have been optimized for users in highly specialized fields such as science, healthcare, and technology. OpenAI envisions the o1 series being used for everything from helping physicists with complex quantum optics calculations to assisting healthcare professionals in analyzing and annotating intricate biological data. For developers, the o1-mini model promises to be particularly useful in constructing multi-step workflows, debugging code, and solving programming challenges with greater efficiency.
PhD-Level Performance with o1-preview
One of the standout features of the o1-preview model is its ability to dedicate more time to formulating responses, mimicking how a person would carefully consider complex problems before arriving at a solution. This approach has enabled o1-preview to perform at a level comparable to PhD students in challenging academic fields such as physics, chemistry, and biology.
In coding tasks, o1-preview has also demonstrated impressive prowess, ranking in the 89th percentile of Codeforces competitions, where it has been particularly effective at handling multi-step processes, debugging intricate code, and delivering accurate solutions. Additionally, the model excelled in the International Mathematics Olympiad (IMO) qualifying exams, solving 83% of the problems presented—an enormous leap compared to GPT-4o’s 13% success rate.
ChatGPT Plus and Team users already have access to o1-preview, with Enterprise and educational users expected to gain access in the coming week. Developers interested in integrating the model into their systems can also access it through OpenAI’s API, although usage is initially limited to those in tier 5 of the API system, with certain rate limits in place.
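As a rough sketch of what that integration could look like (not an official OpenAI sample), the request below posts to the Chat Completions endpoint using only the Python standard library. The model identifiers `o1-preview` and `o1-mini` are taken from the announcement; the `OPENAI_API_KEY` environment variable and the helper names are assumptions for illustration.

```python
import json
import os
import urllib.request

# Public Chat Completions endpoint; o1 models are served through the same API.
API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for a chat-completions call."""
    body = {
        "model": model,  # e.g. "o1-preview" or "o1-mini"
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body).encode("utf-8")


def ask_o1(prompt: str, model: str = "o1-preview") -> str:
    """Send the prompt and return the first choice's text.

    Requires network access, a tier-eligible account, and a valid
    key in the OPENAI_API_KEY environment variable.
    """
    req = urllib.request.Request(
        API_URL,
        data=build_request(model, prompt),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Separating request construction from the network call makes the payload easy to inspect or log before sending, which is useful while the o1 API access rules and rate limits are still settling.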
o1-mini: More Affordable, Streamlined Performance
Alongside the o1-preview model, OpenAI has launched a more cost-effective version known as o1-mini. Though not as powerful, o1-mini still delivers strong performance, particularly in STEM (science, technology, engineering, and mathematics) fields, and has been optimized for coding and mathematical tasks. The o1-mini model scored 70% on the IMO math benchmarks, coming close to the 74% achieved by the more advanced o1-preview model, while being far more affordable.
In coding competitions, o1-mini ranked in the 86th percentile of programmers, achieving an Elo rating of 1650 on Codeforces. This makes it a strong contender for developers seeking reasoning capabilities without the need for the more comprehensive knowledge provided by o1-preview. With a price tag 80% lower than that of o1-preview, o1-mini is positioned to appeal to developers and researchers working on a budget.
Enhanced Safety Features
Safety has always been a top priority for OpenAI, and the o1 models reflect this focus. Both o1-preview and o1-mini incorporate enhanced safety mechanisms designed to improve their ability to follow safety guidelines and respond to potentially harmful prompts appropriately. OpenAI reported that o1-preview scored an impressive 84 out of 100 on one of its most challenging tests designed to detect and prevent “jailbreaking” attempts, where users try to bypass safety protocols. This marks a significant improvement over GPT-4o, which scored only 22 on the same test.
The o1 models’ improved ability to reason within safety contexts allows them to avoid generating inappropriate or harmful content more effectively than their predecessors. As part of OpenAI’s ongoing commitment to safety, the company has partnered with both the U.S. and U.K. AI Safety Institutes to further enhance the security of future AI systems. These partnerships will allow AI safety researchers early access to a research version of the o1 models, aiding in the testing and evaluation of future AI developments.