The Rise of Small Models
Small models are gaining significant attention. On the heels of a new AI vision model from MIT spinoff Liquid AI that is compact enough to fit on a smartwatch, and a Google model that runs on a smartphone, Nvidia has introduced its own small language model (SLM), Nemotron-Nano-9B-V2. The model posts top performance in its class on selected benchmarks and lets users toggle AI "reasoning", a self-check the model performs before answering, on or off.
Performance and Specifications
With 9 billion parameters, Nemotron-Nano-9B-V2 is larger than some of the multimillion-parameter models VentureBeat has covered previously, but Nvidia notes it was pruned down from an original 12 billion parameters specifically to fit on a single Nvidia A10 GPU. As Oleksii Kuchiaev, Nvidia's Director of AI Model Post-Training, explained on X, the A10 is a popular choice for deployment, and the hybrid model can process larger batch sizes while running up to six times faster than similarly sized transformer models. For context, many leading large language models (LLMs) have over 70 billion parameters.
Multilingual Capabilities
Nemotron-Nano-9B-V2 supports multiple languages, including English, German, Spanish, French, Italian, and Japanese, with extended support for Korean, Portuguese, Russian, and Chinese. It is suitable for both instruction-following and code-generation tasks. The model and its pre-training datasets are available on Hugging Face and through Nvidia's model catalog.
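For teams that want to try it, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repository ID and generation settings here are assumptions on my part; consult the model card on Hugging Face for the exact identifier and recommended usage.

```python
# Minimal sketch: loading the model from Hugging Face with transformers.
# The repo ID below is an assumption -- check the model card for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # pick an appropriate dtype for the GPU
    device_map="auto",       # place layers automatically (model fits on one A10)
    trust_remote_code=True,  # hybrid architectures often ship custom model code
)

prompt = "Write a haiku about GPUs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```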
Innovative Architecture
The model is based on Nemotron-H, a set of hybrid Mamba-Transformer models that underpin Nvidia’s latest offerings. Unlike most popular LLMs that rely solely on attention layers, which can be costly in terms of memory and computation as sequence lengths increase, the Nemotron-H models incorporate selective state space models (SSMs). These SSMs can manage very long sequences of information by maintaining state, scaling linearly with sequence length and processing contexts much longer than standard self-attention without incurring the same memory and compute overhead. By replacing much of the attention with linear-time state space layers, the hybrid Mamba-Transformer achieves 2 to 3 times higher throughput on long contexts while maintaining comparable accuracy.
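To make the scaling argument concrete, here is a toy sketch of a diagonal linear state-space recurrence. It is purely illustrative, not Nvidia's Nemotron-H or Mamba implementation; the names and shapes are invented. The point is that each token costs one constant-size state update, so a full pass is linear in sequence length, versus the quadratic pairwise scoring of self-attention.

```python
# Toy illustration of why a state-space layer scales linearly in sequence length.
# Generic diagonal SSM recurrence for exposition only; not Nvidia's implementation.
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the linear recurrence h_t = A * h_{t-1} + B @ x_t, y_t = C @ h_t.

    x: (seq_len, d_in) inputs; A: (d_state,) diagonal transition;
    B: (d_state, d_in); C: (d_out, d_state).
    Cost is O(seq_len): one fixed-size state update per token, compared with
    O(seq_len^2) pairwise attention scores in standard self-attention.
    """
    seq_len, _ = x.shape
    h = np.zeros(A.shape[0])    # the entire memory of the past lives in h
    ys = []
    for t in range(seq_len):    # single pass, constant work per step
        h = A * h + B @ x[t]    # elementwise A acts as a diagonal transition
        ys.append(C @ h)        # project the state to an output
    return np.stack(ys)

# Example: a 4096-token sequence with a 16-dim state.
rng = np.random.default_rng(0)
x = rng.normal(size=(4096, 8))
A = np.full(16, 0.9)            # decay < 1 keeps the recurrence stable
B = rng.normal(size=(16, 8)) * 0.1
C = rng.normal(size=(4, 16)) * 0.1
print(ssm_scan(x, A, B, C).shape)  # (4096, 4)
```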
Features of Nemotron-Nano-9B-V2
Nemotron-Nano-9B-V2 is designed as a unified, text-only chat and reasoning model trained from scratch. By default, it generates a reasoning trace prior to delivering a final answer, although users can easily toggle this feature using control tokens like /think or /no_think. Additionally, the model introduces runtime “thinking budget” management, allowing developers to limit the number of tokens allocated to internal reasoning before the model finalizes a response. This feature aims to balance accuracy with latency, particularly in applications such as customer support or autonomous agents.
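In practice, the toggle is passed through the chat prompt. The sketch below assumes the control token is supplied as a system message and reuses the tokenizer and model from the loading sketch above; the exact chat template and Nvidia's runtime budget mechanism may differ, so treat the model card as authoritative.

```python
# Sketch of toggling the reasoning trace with the control tokens named in the
# article. Placing the token in a system message is an assumption; verify
# against the model card's chat template.
messages_with_reasoning = [
    {"role": "system", "content": "/think"},     # emit a reasoning trace first
    {"role": "user", "content": "Is 2**61 - 1 prime?"},
]
messages_direct = [
    {"role": "system", "content": "/no_think"},  # skip straight to the answer
    {"role": "user", "content": "Is 2**61 - 1 prime?"},
]

# Swap in messages_direct to suppress the trace.
inputs = tokenizer.apply_chat_template(
    messages_with_reasoning, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A crude stand-in for a "thinking budget": capping total new tokens truncates
# long reasoning traces. Nvidia's runtime budget control targets the internal
# reasoning span specifically rather than the whole generation.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```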
Evaluation and Accuracy
Evaluation results show competitive accuracy against other open small-scale models. In "reasoning on" mode, tested with the NeMo-Skills suite, Nemotron-Nano-9B-V2 scored 72.1% on AIME25, 97.8% on MATH500, 64.0% on GPQA, and 71.1% on LiveCodeBench. Nvidia also reported 90.3% on IFEval for instruction following and 78.9% on the RULER 128K long-context test, along with smaller but measurable gains on BFCL v3 and the HLE benchmark. Across the board, Nano-9B-V2 posted higher accuracy than Qwen3-8B, a common point of comparison.
Nvidia illustrates these results with accuracy-versus-budget curves showing how performance scales as the token allowance for reasoning grows, and suggests that careful budget management lets developers balance quality against latency in production. Both the Nano model and the Nemotron-H family were trained on a combination of curated, web-sourced, and synthetic data.