- Updated: December 13, 2025
Nanbeige4‑3B: 23‑Trillion‑Token Pipeline Propels 3B Model to 30B‑Class Reasoning
Nanbeige4‑3B is a 3‑billion‑parameter large language model that, thanks to a 23‑trillion‑token training pipeline, delivers reasoning performance on par with 30‑billion‑parameter models such as Qwen3‑32B, setting a new benchmark for efficiency‑driven AI research in 2025.

Introduction
The AI community has long chased the notion that larger models automatically mean better reasoning. The recent release of Nanbeige4‑3B challenges that assumption by showing how a meticulously crafted 23‑trillion‑token pipeline can push a modest 3B model into the performance tier of 30B‑class systems. This breakthrough, detailed in a pre‑print paper released in December 2025, has sparked intense discussion among researchers, startup founders, and enterprise AI teams looking for cost‑effective alternatives to massive models.
In this article we dissect the model’s architecture, training recipe, benchmark results, and broader implications for the large language model (LLM) ecosystem. Along the way we’ll illustrate how UBOS and its suite of AI‑centric tools can help you experiment with Nanbeige4‑3B or build complementary applications.
Overview of Nanbeige4‑3B and the 23‑Trillion‑Token Pipeline
Nanbeige4‑3B is the flagship model from the Nanbeige LLM Lab at Boss Zhipin. It comes in two checkpoints:
- Nanbeige4‑3B‑Base – a vanilla 3B transformer trained on a filtered corpus.
- Nanbeige4‑3B‑Thinking – a reasoning‑tuned variant that incorporates curriculum scheduling, distillation, and reinforcement learning.
The core of the model’s advantage lies in its 23‑trillion‑token pipeline. Instead of simply scaling up raw data, the team applied a multi‑stage filtering and resampling strategy that yields a high‑utility training set:
- Hybrid data filtering reduces the raw corpus to 12.5 T tokens of “high‑quality” text.
- A second pass selects a 6.5 T token subset, which is then up‑sampled for multiple epochs, reaching a total of 23 T tokens.
- Each token is scored on a 0‑to‑9 utility scale, allowing the pipeline to prioritize content that most improves reasoning.
This data‑centric approach, combined with a novel curriculum scheduler called Fine‑Grained Warmup‑Stable‑Decay (FG‑WSD), ensures that the model sees increasingly higher‑quality examples as training progresses. The result is a model that learns more efficiently, achieving state‑of‑the‑art performance without the massive compute budget typically required for 30B‑class models.
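The filter-then-upsample idea can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Doc` type, the `keep_min` and `upsample_min` thresholds, and the `repeats` count are all assumptions, and the sketch scores whole documents rather than individual tokens to stay short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    utility: int   # hypothetical 0-9 utility score
    n_tokens: int


def build_training_mix(docs, keep_min=5, upsample_min=8, repeats=3):
    """Drop low-utility documents, then repeat the highest-utility ones.

    keep_min, upsample_min and repeats are made-up values for illustration.
    """
    mix = []
    for doc in docs:
        if doc.utility < keep_min:
            continue  # removed by the quality filter
        copies = repeats if doc.utility >= upsample_min else 1
        mix.extend([doc] * copies)
    return mix


def total_tokens(docs):
    return sum(d.n_tokens for d in docs)


corpus = [
    Doc("low-quality page", utility=2, n_tokens=100),
    Doc("decent article", utility=6, n_tokens=200),
    Doc("worked math proof", utility=9, n_tokens=300),
]
mix = build_training_mix(corpus)
print(total_tokens(mix))  # 200 + 3 * 300 = 1100
```

The point of the sketch is that the final token count (1,100 here) can exceed the size of the filtered corpus (500 tokens) because the top-scoring subset is seen several times, which is how a 12.5 T filtered corpus can yield a 23 T token training run.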
Technical Innovations Behind the Performance Leap
1. Curriculum Scheduling (FG‑WSD)
Traditional LLM training uses a uniform data sampler throughout the stable phase. Nanbeige’s FG‑WSD replaces this with a dynamic scheduler that gradually shifts the data mixture toward higher‑utility tokens. In a 1B‑parameter ablation, this change boosted GSM8K scores from 27.1 to 34.3, a 27% relative gain. The full 3B run splits training into four phases: Warmup, Diversity‑Enriched Stable, High‑Quality Stable, and Decay, each with its own sampling policy.
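One way to picture a phase-dependent sampler is a lookup from training progress to a data-mixture policy. The four phase names below come from the description above; the phase boundaries and the share of high-utility data in each phase are purely illustrative assumptions. The last helper recomputes the relative GSM8K gain quoted above (roughly 27%).

```python
# Phase names follow the article; the boundaries and mixture shares are
# illustrative assumptions, not values from the paper.
PHASES = [
    # (name, end of phase as a fraction of total steps, share of high-utility data)
    ("warmup", 0.02, 0.30),
    ("diversity_enriched_stable", 0.55, 0.40),
    ("high_quality_stable", 0.90, 0.70),
    ("decay", 1.00, 0.90),
]


def sampling_policy(step, total_steps):
    """Return (phase name, high-utility share) for a zero-based training step."""
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    progress = (step + 1) / total_steps
    for name, end, high_share in PHASES:
        if progress <= end:
            return name, high_share
    return PHASES[-1][0], PHASES[-1][2]


def relative_gain(before, after):
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


print(sampling_policy(0, 1000))
print(sampling_policy(999, 1000))
print(f"{100 * relative_gain(27.1, 34.3):.1f}%")
```

A real scheduler would also control the learning rate per phase; this sketch covers only the data-mixture side.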
2. Multi‑Stage Supervised Fine‑Tuning (SFT)
After the base pre‑training, the model undergoes a two‑stage SFT pipeline:
- Cold‑Start SFT: ~30 M QA samples focused on math, science, and code, with a 32K context window.
- Full‑Scale SFT: Expands to 64K context, adds chain‑of‑thought (CoT) reconstruction, solution refinement, and tool‑use tasks.
The iterative “generate‑critique‑revise” loop, guided by a dynamic checklist, produces cleaner reasoning traces, which are crucial for downstream reinforcement learning.
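The generate-critique-revise loop can be sketched as a small driver function. Everything here is hypothetical scaffolding: `generate`, `critique`, and `revise` stand in for model calls, the checklist is static rather than dynamic, and `max_rounds` is an assumed stopping rule.

```python
from typing import Callable, List


def refine(question: str,
           generate: Callable[[str], str],
           critique: Callable[[str, str, List[str]], List[str]],
           revise: Callable[[str, str, List[str]], str],
           checklist: List[str],
           max_rounds: int = 3) -> str:
    """Revise an answer until the critic reports no failed checklist items."""
    answer = generate(question)
    for _ in range(max_rounds):
        failed = critique(question, answer, checklist)
        if not failed:
            break
        answer = revise(question, answer, failed)
    return answer


# Toy stand-ins so the loop can be run without a model.
CHECKLIST = ["explain the reasoning", "state the final answer explicitly"]


def toy_generate(question):
    return "4"


def toy_critique(question, answer, checklist):
    failed = []
    if "because" not in answer:
        failed.append("explain the reasoning")
    if "Answer:" not in answer:
        failed.append("state the final answer explicitly")
    return [item for item in failed if item in checklist]


def toy_revise(question, answer, failed):
    if "explain the reasoning" in failed:
        answer = answer + " because 2 + 2 = 4"
    if "state the final answer explicitly" in failed:
        answer = "Answer: " + answer
    return answer


result = refine("What is 2 + 2?", toy_generate, toy_critique, toy_revise, CHECKLIST)
print(result)
```

In practice the three callables would each be a prompt to a language model, and the checklist would be regenerated per question.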
3. Dual‑Level Preference Distillation (DPD)
Distillation is performed on two levels:
- Token‑level distribution matching from a larger teacher (Nanbeige3.5‑Pro).
- Sequence‑level preference optimization using a DPO‑style objective that rewards positive responses and penalizes negatives.
This dual approach reduces “confident errors” and improves the model’s ability to generate diverse yet accurate answers.
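The two levels can be combined into a single training loss. The sketch below is one plausible formulation, not the paper's: it assumes a forward KL divergence for the token-level term, the standard DPO loss for the sequence-level term, and a hypothetical mixing weight `alpha`.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at a single token position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO loss for one (positive, negative) response pair.

    Inputs are sequence log-probabilities under the policy and a frozen reference.
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


def dpd_loss(token_kls, pair, alpha=0.5, beta=0.1):
    """Weighted sum of the mean token-level KL and the sequence-level DPO term.

    alpha and beta are assumed hyperparameters, chosen only for illustration.
    """
    kl_term = sum(token_kls) / len(token_kls)
    return alpha * kl_term + (1 - alpha) * dpo_loss(*pair, beta=beta)


kls = [token_kl([2.0, 0.5, 0.1], [1.5, 0.7, 0.2]),
       token_kl([0.1, 3.0, 0.2], [0.3, 2.0, 0.4])]
print(dpd_loss(kls, pair=(-1.0, -5.0, -2.0, -2.0)))
```

The DPO term shrinks as the policy assigns relatively more probability to the positive response than the reference does, while the KL term pulls each next-token distribution toward the teacher's.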
4. Reinforcement Learning with Domain‑Specific Verifiers
The RL stage is split by domain:
- STEM RL: Uses a Python interpreter as a verifier to check mathematical equivalence.
- Coding RL: Executes generated code in a sandbox and rewards passing tests.
- Human Preference RL: Employs a pairwise reward model that focuses on short‑term token preferences, reducing reward‑hacking risk.
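Verifier-based rewards for the STEM and coding domains might look like the sketch below. It is deliberately simplified: the math check only compares answers that parse as rational numbers, the coding reward assumes the generated code defines a function named `solve` (a hypothetical convention), and `exec` is used for brevity even though it is not a sandbox. Real training would run untrusted code in an isolated process or container.

```python
from fractions import Fraction


def math_reward(predicted: str, reference: str) -> float:
    """Return 1.0 if both strings parse to the same rational number, else 0.0."""
    try:
        same = Fraction(predicted.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers earn no reward
    return 1.0 if same else 0.0


def code_reward(source: str, tests) -> float:
    """Fraction of (args, expected) tests passed by `solve` defined in `source`.

    WARNING: exec() is NOT a sandbox; it is used here only to keep the sketch
    short. Scoring by pass fraction is also an assumption.
    """
    if not tests:
        return 0.0
    namespace = {}
    try:
        exec(source, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests)


print(math_reward("0.5", "1/2"))
print(code_reward("def solve(a, b):\n    return a + b", [((1, 2), 3), ((2, 2), 5)]))
```

Symbolic answers (for example, expressions in `x`) would need a computer algebra check rather than the numeric comparison used here.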
Benchmark Performance Versus Larger Models
The authors evaluated Nanbeige4‑3B‑Thinking against the Qwen3 family (4B‑32B parameters) on several high‑profile benchmarks. The most striking results appear on AIME 2024 and GPQA‑Diamond, where the 3B model outperformed the 32B Qwen3 variant.
| Benchmark | Nanbeige4‑3B‑Thinking | Qwen3‑14B | Qwen3‑32B |
|---|---|---|---|
| AIME 2024 (avg@8) | 90.4 | 81.4 | 81.4 |
| GPQA‑Diamond (avg@3) | 82.2 | 64.0 | 68.7 |
| BFCL‑V4 (avg@3) | 53.8 | 45.4 | 47.9 |
| Arena‑Hard V2 (avg@3) | 60.0 | 39.9 | 48.4 |
| Fullstack‑Bench (avg@3) | 48.0 | 55.7 | 58.2 |
| SuperGPQA (avg@3) | 53.2 | 46.8 | 54.1 |
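The table shows that on reasoning‑heavy tasks (AIME 2024, GPQA‑Diamond) the 3B model not only matches but surpasses the 32B baseline, and it also leads both Qwen3 models on BFCL‑V4 and Arena‑Hard V2. The larger models keep an edge elsewhere: both Qwen3 variants score higher on Fullstack‑Bench, and Qwen3‑32B is narrowly ahead on SuperGPQA (54.1 vs. 53.2), although Nanbeige still beats Qwen3‑14B there. Nanbeige’s gains are therefore most pronounced where data quality and reasoning pipelines matter most.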
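To make the gaps easier to read, the margins can be computed directly from the table. The scores below are copied from the table as printed; nothing else is assumed.

```python
# (Nanbeige4-3B-Thinking, Qwen3-14B, Qwen3-32B), copied from the table above.
SCORES = {
    "AIME 2024": (90.4, 81.4, 81.4),
    "GPQA-Diamond": (82.2, 64.0, 68.7),
    "BFCL-V4": (53.8, 45.4, 47.9),
    "Arena-Hard V2": (60.0, 39.9, 48.4),
    "Fullstack-Bench": (48.0, 55.7, 58.2),
    "SuperGPQA": (53.2, 46.8, 54.1),
}


def margins(scores):
    """Point difference of the 3B model against each Qwen3 baseline."""
    return {
        name: (round(nb - q14, 1), round(nb - q32, 1))
        for name, (nb, q14, q32) in scores.items()
    }


MARGINS = margins(SCORES)
for name, (vs14, vs32) in MARGINS.items():
    print(f"{name:16s} vs 14B: {vs14:+5.1f}   vs 32B: {vs32:+5.1f}")

wins_vs_32b = sum(1 for _, vs32 in MARGINS.values() if vs32 > 0)
print(f"Leads Qwen3-32B on {wins_vs_32b} of {len(SCORES)} benchmarks")
```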
Implications for the AI Community
The success of Nanbeige4‑3B reshapes three core narratives in AI research:
- Efficiency over Scale: High‑utility data pipelines can replace raw compute, making cutting‑edge reasoning accessible to smaller labs and startups.
- Curriculum‑Driven Training: FG‑WSD demonstrates that a well‑designed curriculum yields measurable gains across math, code, and scientific reasoning.
- Hybrid Distillation + RL: The dual‑level preference distillation combined with domain‑specific RL offers a reproducible recipe for aligning smaller models with human preferences.
For enterprises, this means that deploying a 3B model can dramatically cut inference costs while still delivering high‑quality answers for customer support, internal knowledge bases, or tool‑augmented agents. Startups can now experiment with state‑of‑the‑art reasoning without the $1M+ GPU budget traditionally required for 30B‑class models.
How UBOS Helps You Leverage Nanbeige4‑3B
The UBOS platform provides a low‑code environment where you can import any OpenAI‑compatible model, including Nanbeige4‑3B, and instantly expose it via REST or GraphQL endpoints. The Web app editor on UBOS lets you prototype UI layers for chat, code assistance, or data extraction without writing a single line of backend code.
If you need to orchestrate complex workflows—say, a pipeline that first queries Nanbeige4‑3B for a summary, then passes the result to a tool‑calling module—use the Workflow automation studio. This visual builder supports conditional branching, retries, and parallel execution, making it ideal for the multi‑stage RL verification steps described earlier.
For teams focused on marketing, the AI marketing agents template can be repurposed to generate SEO‑friendly copy using the AI SEO Analyzer or the AI Article Copywriter. Both templates are pre‑wired to call a language model endpoint, so swapping in Nanbeige4‑3B is a matter of a few clicks.
Startups can explore the UBOS for startups program, which offers free credits for early‑stage projects. SMBs can benefit from the UBOS solutions for SMBs, where the cost‑effective 3B model fits perfectly into budget‑constrained environments.
Enterprises looking for a complete AI stack can adopt the Enterprise AI platform by UBOS, which includes built‑in monitoring, role‑based access, and compliance features, all of which are critical when deploying reasoning models in regulated sectors.
Author Insights: Why This Matters to You
As a researcher who has spent the last decade evaluating LLM scaling laws, I’ve seen the community repeatedly assume “more parameters = better performance.” Nanbeige4‑3B forces us to reconsider that equation. The model’s success proves that data curation + curriculum + smart distillation can close the gap that raw compute leaves open.
For product teams, the takeaway is clear: invest in a robust data pipeline before you pour money into larger GPUs. The UBOS pricing plans make it affordable to spin up the necessary compute for a 23‑trillion‑token run on cloud spot instances, especially when combined with the UBOS partner program, which offers co‑marketing and technical support.
Moreover, the model’s open‑source nature encourages community‑driven extensions. For example, you could integrate the Telegram integration on UBOS to create a real‑time reasoning bot, or combine it with the ChatGPT and Telegram integration for a hybrid assistant that leverages both Nanbeige and OpenAI APIs.
If you need voice capabilities, the ElevenLabs AI voice integration can turn Nanbeige’s textual output into natural‑sounding speech, opening doors for accessibility‑focused products.
Sample Use‑Case: AI‑Powered Knowledge Base Assistant
Imagine a corporate knowledge base that answers employee queries with high‑precision reasoning. Using the OpenAI ChatGPT integration as a fallback, you can route simple FAQs to a cheap Nanbeige4‑3B endpoint and only invoke the larger model for ambiguous or high‑stakes questions. The Chroma DB integration stores vector embeddings of your internal documents, enabling fast similarity search before the LLM even sees the prompt.
The workflow would look like:
- User submits a question via a web UI built with the Web app editor on UBOS.
- UBOS’s Workflow automation studio queries Chroma DB for the top‑5 relevant passages.
- The retrieved passages are fed to Nanbeige4‑3B‑Thinking for a concise answer.
- If the confidence score falls below a threshold, the request is escalated to the OpenAI API via the OpenAI ChatGPT integration.
- The final answer is delivered to the user, optionally spoken through the ElevenLabs AI voice integration.
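The routing logic in this workflow can be sketched independently of any platform. The three callables below are hypothetical stand-ins for the vector search, the Nanbeige4-3B-Thinking endpoint, and the fallback model; how the small model produces a confidence score, and the 0.7 threshold, are assumptions the workflow above leaves open.

```python
from typing import Callable, List, Tuple


def answer_question(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    small_model: Callable[[str], Tuple[str, float]],
    large_model: Callable[[str], str],
    threshold: float = 0.7,   # assumed escalation threshold
    top_k: int = 5,
) -> Tuple[str, str]:
    """Return (answer, route), where route is 'small' or 'large'."""
    passages = retrieve(question, top_k)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
    answer, confidence = small_model(prompt)
    if confidence < threshold:
        # Escalate low-confidence answers to the larger fallback model.
        return large_model(prompt), "large"
    return answer, "small"


# Toy stand-ins so the routing can be exercised without any services.
def fake_retrieve(question, k):
    docs = ["Vacation requests go through the HR portal.",
            "Expense reports are due by the 5th of each month."]
    return docs[:k]


def fake_small(prompt):
    confident = "vacation" in prompt.lower().split("question:")[-1]
    return ("Use the HR portal.", 0.9) if confident else ("Not sure.", 0.2)


def fake_large(prompt):
    return "Escalated answer from the fallback model."


print(answer_question("How do I request vacation?", fake_retrieve, fake_small, fake_large))
print(answer_question("What is our merger policy?", fake_retrieve, fake_small, fake_large))
```

Returning the route alongside the answer makes it easy to log how often requests escalate, which is the number that determines the actual cost savings.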
Community Resources & Templates
UBOS’s template marketplace offers ready‑made building blocks that accelerate the creation of Nanbeige‑powered applications:
- Talk with Claude AI app – a conversational UI whose backend you can swap for Nanbeige4‑3B.
- Your Speaking Avatar template – combine voice synthesis with LLM reasoning for interactive agents.
- AI SEO Analyzer – leverage Nanbeige’s reasoning to audit on‑page SEO at scale.
- AI Article Copywriter – generate long‑form content, then fine‑tune with Nanbeige’s SFT pipeline for brand‑specific tone.
Conclusion
The Nanbeige4‑3B model demonstrates that a well‑engineered training pipeline can deliver 30B‑class reasoning from a fraction of the parameters. For AI researchers, this opens a new research frontier focused on data utility, curriculum design, and multi‑stage alignment. For product teams, it offers a cost‑effective path to deploy high‑quality reasoning without the prohibitive compute bill.
To explore the model yourself, you can download the weights from the authors’ repository and plug them into the UBOS platform. Combine it with UBOS’s extensive integration catalog, workflow studio, and template marketplace to build the next generation of AI‑driven products.
For the full technical details, see the original pre‑print: Nanbeige4‑3B paper (PDF).
Explore more AI solutions on the About UBOS page, view our UBOS portfolio examples, or check out the UBOS templates for quick start.