Introducing Hermes 4
Nous Research, an enigmatic artificial intelligence startup, has emerged as a prominent voice in the open-source AI movement. On Monday, the company quietly launched Hermes 4, a series of large language models that it claims can rival the performance of leading proprietary systems while providing exceptional user control and minimal content restrictions. This release marks a significant escalation in the ongoing rivalry between open-source AI advocates and major tech companies regarding access to advanced AI capabilities.
A Shift in AI Design
Unlike models from OpenAI, Google, or Anthropic, Hermes 4 is designed to respond to nearly any request without the safety guardrails that have become commonplace in commercial AI systems. Nous Research describes Hermes 4 as the latest iteration of its user-aligned models, featuring enhanced test-time compute capabilities. The company emphasized that considerable attention was devoted to ensuring the models are creative and engaging while remaining free from censorship and neutral in alignment, all while maintaining top-tier performance in mathematics, coding, and reasoning among open-weight models.
Hybrid Reasoning Feature
Hermes 4 introduces a novel feature called “hybrid reasoning,” which allows users to switch between rapid responses and more in-depth, step-by-step thinking processes. When activated, the models generate their internal reasoning within special tags before delivering a final answer. This approach is reminiscent of OpenAI’s o1 reasoning models but offers complete transparency into the AI’s thought process.
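In practice, consuming output from a hybrid-reasoning model means separating the internal trace from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags; the exact tag name and format are an assumption for illustration, not confirmed details of Hermes 4.

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate an internal reasoning trace from the final answer.

    Assumes the model wraps its chain of thought in <think>...</think>
    tags (a hypothetical format chosen for illustration).
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = response[match.end():].strip()
        return reasoning, answer
    # Fast mode: no reasoning tags, the whole response is the answer.
    return "", response.strip()

reply = "<think>17 * 3 = 51, plus 4 is 55.</think>The answer is 55."
trace, answer = split_reasoning(reply)
print(trace)   # → 17 * 3 = 51, plus 4 is 55.
print(answer)  # → The answer is 55.
```

Because the trace is emitted in-band rather than hidden server-side, downstream code can log, audit, or discard it as needed.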
Technical Achievements
The technical accomplishments behind Hermes 4 are noteworthy. In testing, the model’s largest version, with 405 billion parameters, achieved a score of 96.3% on the MATH-500 benchmark in reasoning mode and 81.9% on the challenging AIME’24 mathematics competition, rivaling or surpassing many proprietary systems that cost millions more to develop. AI researcher Rohan Paul highlighted the challenge of making thinking traces useful and verifiable without allowing for runaway reasoning, underscoring one of the technical advancements of this release.
Performance on RefusalBench
Notably, Hermes 4 achieved the highest score among all tested models on "RefusalBench," a benchmark Nous Research created to measure how often AI systems decline to answer questions. In reasoning mode, the model scored 57.1%, far outpacing GPT-4o (17.67%) and Claude Sonnet 4 (17%). In other words, Hermes 4 answered substantially more of the benchmark's prompts than competing systems did.
Innovative Training Infrastructure
The capabilities of Hermes 4 are underpinned by a sophisticated training infrastructure that Nous Research has developed over several years. The models were trained using two innovative systems: DataForge, a graph-based synthetic data generator, and Atropos, an open-source reinforcement learning framework. DataForge creates training data through “random walks” within directed graphs, transforming basic pre-training data into complex instruction-following examples. For instance, it can convert a Wikipedia article into a rap song and subsequently generate related questions and answers.
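A random walk over a transformation graph can be sketched in a few lines. The graph below is a toy stand-in, not DataForge's actual schema: each node is a data type, each edge a transformation whose output feeds the next step, mirroring the article's Wikipedia-to-rap-to-Q&A example.

```python
import random

# Hypothetical transformation graph: node -> list of (transform, next_node).
# The names are illustrative only; DataForge's real graph is not public.
TRANSFORMS = {
    "wiki_article": [("to_rap_song", "rap_song"), ("to_summary", "summary")],
    "rap_song": [("to_qa_pairs", "qa_pairs")],
    "summary": [("to_qa_pairs", "qa_pairs")],
    "qa_pairs": [],  # terminal node: finished training example
}

def random_walk(start: str, rng: random.Random) -> list[str]:
    """Follow random outgoing edges until reaching a terminal node."""
    path, node = [start], start
    while TRANSFORMS[node]:
        _transform, node = rng.choice(TRANSFORMS[node])
        path.append(node)
    return path

print(random_walk("wiki_article", random.Random(0)))
```

Composing transformations this way turns a single source document into many distinct instruction-following examples, since each walk through the graph yields a different derivation chain.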
Atropos functions as a series of specialized training environments where AI models practice specific skills—such as mathematics, coding, tool use, and creative writing—receiving feedback only when they produce correct solutions. This “rejection sampling” method ensures that only verified, high-quality responses are included in the training data.
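Rejection sampling itself is simple to state: generate many candidate responses, verify each one, and keep only those that pass. The toy below assumes stand-in `generate` and `verify` functions; Atropos's real environments and verifiers are more elaborate.

```python
import random

def rejection_sample(generate, verify, n_attempts: int = 8) -> list:
    """Keep only candidate responses that the verifier accepts."""
    accepted = []
    for _ in range(n_attempts):
        candidate = generate()
        if verify(candidate):
            accepted.append(candidate)
    return accepted

# Toy environment: "solve" 6 * 7; the verifier checks the answer exactly.
rng = random.Random(42)
generate = lambda: rng.choice([41, 42, 43])  # noisy candidate answers
verify = lambda ans: ans == 42               # only the correct one passes
kept = rejection_sample(generate, verify)
print(kept)  # only verified-correct answers survive
```

The filtering is what makes the resulting dataset trustworthy: wrong or unverifiable candidates never enter the training mix.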
In summary, Nous Research has used these training systems to curate the Hermes 4 dataset, aiming to set a new standard for open-weight AI models.