OpenAI’s New Models
OpenAI made a significant move yesterday by releasing two new large language models (LLMs), gpt-oss-120B and gpt-oss-20B, marking the company’s long-awaited return to the “open” part of its name. Yet although the models post benchmark scores comparable to OpenAI’s proprietary offerings, initial reactions from the broader AI developer and user community have been mixed. If the release were scored like a movie on Rotten Tomatoes, it would be hovering near a 50% split.
Background on the Release
These two new text-only language models are released under the permissive open-source Apache 2.0 license, OpenAI’s first openly licensed release since GPT-2 in 2019, before the advent of ChatGPT. The ChatGPT era has been built predominantly on proprietary, closed-source models that users had to pay for or access through limited free tiers, which restricted customization and ruled out running the models offline or on private hardware.
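Because Apache 2.0 permits unrestricted local use, anyone can pull the weights and run them offline. Here is a minimal inference sketch using the Hugging Face Transformers library, assuming the weights are published under the openai/gpt-oss-20b repository ID on the Hugging Face Hub and that the installed transformers version supports both the architecture and chat-style inputs:

```python
# Minimal local-inference sketch (assumes a recent transformers release
# that accepts chat-message inputs and supports the gpt-oss architecture).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed repo ID on the Hugging Face Hub
    device_map="auto",           # place layers on whatever GPU/CPU memory is free
)

messages = [{"role": "user", "content": "Explain the Apache 2.0 license in one sentence."}]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```

Because the license is Apache 2.0, the same weights can be fine-tuned, further quantized, or embedded in commercial products without a usage fee.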
Feedback on the New Models
The release of the gpt-oss models introduces a new dynamic: the larger model is sized to run on a single Nvidia H100 GPU, putting it within reach of small and medium-sized enterprises, while the smaller model can run on a standard consumer laptop or desktop. Because the models are only hours old, the AI power user community has just begun testing them independently against its own benchmarks.
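A back-of-envelope calculation shows why those hardware targets are plausible. Assuming the publicly reported parameter counts (roughly 117B and 21B total parameters) and roughly 4-bit MXFP4 weight quantization, the weights alone fit comfortably in an 80 GB H100 and a 16 GB consumer GPU, respectively:

```python
# Rough VRAM estimate for the model weights alone (excludes KV cache and
# activations). The parameter counts and ~4-bit MXFP4 quantization are
# assumptions based on publicly reported figures, not measured values.
BITS_PER_WEIGHT = 4.25  # MXFP4-style block quantization, incl. scale metadata

def weight_footprint_gb(n_params: float) -> float:
    """Approximate gigabytes needed to hold the quantized weights."""
    return n_params * BITS_PER_WEIGHT / 8 / 1e9

print(f"gpt-oss-120B: ~{weight_footprint_gb(117e9):.0f} GB")  # ~62 GB -> one 80 GB H100
print(f"gpt-oss-20B:  ~{weight_footprint_gb(21e9):.0f} GB")   # ~11 GB -> a 16 GB consumer GPU
```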
The feedback mixes optimistic enthusiasm about what free, efficient models of this caliber make possible with dissatisfaction from users who see significant limitations, especially compared with similarly licensed multimodal LLMs from Chinese startups, which can likewise be customized and run locally on U.S. hardware at no cost.
Benchmark Performance
Intelligence benchmarks indicate that the gpt-oss models outperform many American open-source options. According to the independent AI benchmarking firm Artificial Analysis, gpt-oss-120B is recognized as “the most intelligent American open weights model,” although it still lags behind notable Chinese models like DeepSeek R1 and Qwen3 235B.
Critics have voiced skepticism about the release. One commentator remarked, “On reflection, that’s all they did. Mogged on benchmarks,” suggesting that no significant new use cases have emerged. Another researcher described the release as “a legitimate nothing burger,” anticipating that a Chinese model would soon surpass it.
Limitations of the Models
Criticism has also centered on the gpt-oss models’ narrow applicability. An AI influencer pointed out that while the models excel in math and coding, they lack common sense and taste. In creative writing tests, some users reported that the models included equations in poetic outputs.
A researcher at a decentralized AI model training company noted that gpt-oss-120B appears to know less than a well-performing 32B model, speculating that it was trained primarily on synthetic data to avoid copyright issues, which has fueled concerns about its overall effectiveness. An independent AI developer corroborated this, saying the models seem overly specialized: strong on a handful of targeted tasks but weak in other areas.