OpenAI’s New Model Release
OpenAI recently released gpt-oss, a family of powerful open-weights large language models (LLMs), under a permissive Apache 2.0 license. It is the company’s first open-weights model release since GPT-2 in 2019, and in less than two weeks, developers outside OpenAI have already begun reshaping it.

A notable example comes from Jack Morris, a PhD student at Cornell Tech and former Google Brain Resident, who introduced gpt-oss-20b-base, a reworked version of OpenAI’s smaller gpt-oss-20b model. It strips out the model’s “reasoning” behavior, reverting it to a pre-trained “base” version that delivers faster, less restricted, and uncensored responses. The model is available now on Hugging Face under a permissive MIT License, allowing both research and commercial use.
Understanding Base Models
To grasp Morris’s modifications, it’s essential to differentiate between OpenAI’s release and what AI researchers refer to as a “base model.” Most LLMs from leading AI labs, including OpenAI, Anthropic, Google, and even open-source contributors like Meta and Alibaba’s Qwen team, are “post-trained.” This means they undergo an additional phase where they are exposed to curated examples of desired behavior.
For instruction-tuned models, this involves providing numerous examples of instructions paired with ideal responses, enabling the model to respond more helpfully, politely, or safely to natural language requests. The gpt-oss models released by OpenAI on August 5 were “reasoning-optimized.” They were trained and fine-tuned not just to predict the next word, but also to follow instructions in a safe and consistent manner, often using structured “chain of thought” reasoning before arriving at a final answer.
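To make the distinction concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public openai/gpt-oss-20b checkpoint, of the chat scaffolding an instruction-tuned model expects; a base model is never trained on this structure:

```python
# Sketch: inspect the role/formatting scaffolding that post-training
# teaches an instruction-tuned model to expect. A base model sees only
# raw text, never this template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,             # return the templated string, not token ids
    add_generation_prompt=True, # append the marker that cues the model's reply
)
print(prompt)  # shows the special tokens and role headers added by post-training
```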
This trend began with OpenAI’s o1 model, released in September 2024, and most leading AI labs have since adopted the approach. These models are trained to think through multiple steps and verify their outputs before delivering a well-reasoned response. That makes them better suited to tasks like coding, solving math problems, or providing factual answers with explanations, but it also means their outputs are filtered away from unsafe or undesirable content.
In contrast, a base model represents the raw, pre-trained version of a large language model, prior to the application of reasoning-specific alignment. Base models focus solely on predicting the next segment of text based on previous input, without any built-in guardrails, stylistic preferences, or refusal behaviors. Researchers value these models for their ability to produce diverse and less constrained outputs, and for the insights they provide into how models store knowledge and patterns from their training data.
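As an illustration, here is a minimal sketch of using a base model as a raw next-token predictor. It assumes Morris’s checkpoint is published under the jxm/gpt-oss-20b-base repo id and loads it with the standard transformers API:

```python
# Sketch: a base model just continues raw text. No chat template,
# no system prompt, no refusal behavior; it predicts what comes next.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jxm/gpt-oss-20b-base"  # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Feed plain text and let the model continue it.
inputs = tok("The printing press changed Europe because", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```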
Morris aimed to “reverse” OpenAI’s alignment process and restore the smaller gpt-oss-20B to a state much closer to its original pre-trained condition. He explained in an X thread announcing his project, “We basically reversed the alignment part of LLM training, so we have something that produces natural-looking text again. It doesn’t engage in CoT anymore. It is back to a model that just predicts the next token on generic text.”
The Alignment Reversal Process
OpenAI has not released a base model since GPT-2 in 2019. While the recently released gpt-oss family is reasoning-optimized, Morris found that a robust base model still exists beneath the surface, and he pulled it out: “So we extracted it. Introducing gpt-oss-20b-base.”
Rather than trying to jailbreak the model with clever prompts, an approach Morris found ineffective in his early experiments, he took a different tack after consulting John Schulman, an OpenAI co-founder and now chief scientist at Thinking Machines Lab.
Morris framed the alignment reversal as a small optimization problem: if most of the model’s pre-trained knowledge still sits in its weights, only a small, low-rank update should be needed to nudge it back toward base-model behavior. He implemented this idea by applying a LoRA (low-rank adaptation) update to just three layers of the model (the MLP layers at positions 7, 15, and 23) with a rank of 16. That amounts to training approximately 60 million parameters, roughly 0.3% of the model’s 21 billion total.
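For readers who want to see what such a setup looks like in code, here is a hedged sketch using Hugging Face’s peft library. The rank, layer indices, and parameter count come from Morris’s description; the projection-module names are assumptions, not confirmed gpt-oss internals:

```python
# Sketch: a rank-16 LoRA update restricted to the MLP blocks of layers
# 7, 15, and 23, mirroring the setup described above.
# NOTE: "up_proj"/"down_proj" are assumed module names; check the real
# architecture with model.named_modules() before running this.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b")

config = LoraConfig(
    r=16,                                     # low-rank update, rank 16
    target_modules=["up_proj", "down_proj"],  # assumed MLP projection names
    layers_to_transform=[7, 15, 23],          # adapt only these three layers
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # on the order of tens of millions
```

With the adapter in place, ordinary next-token-prediction fine-tuning on generic web text would push the model back toward base-model behavior, which is the essence of the approach described above.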