# Zen Models - Accurate Qwen3 Model Specifications
Zoo Labs Foundation - Gym Training Platform
Updated with correct Qwen3 model information
## 📊 Zen Model Family - Based on Real Qwen3 Models
### 🔬 Zen Nano - Qwen3-0.6B

- Base Model: `Qwen/Qwen3-0.6B`
- Architecture: Dense (traditional transformer)
- Parameters: 600M total
- Context: 32K tokens native
- Use Case: Edge devices, mobile, ultra-lightweight deployment
- Performance: Matches Qwen2.5-1.5B despite the smaller size (see the loading sketch below)
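
For a quick local sanity check of the base checkpoint, here is the loading sketch referenced above. It assumes the Hugging Face `transformers` library with Qwen3 support (it is not part of the Gym pipeline), and the `enable_thinking` flag follows the usage shown on the Qwen3 model cards.

```python
# Minimal sketch: load the Zen Nano base model (Qwen/Qwen3-0.6B) with Hugging Face
# transformers. Assumes a recent transformers release with Qwen3 support; this is a
# quick way to sanity-check the base checkpoint, not part of Gym training itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize what a dense transformer is in one sentence."}]
# enable_thinking toggles Qwen3's optional reasoning mode (per the Qwen3 model card).
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```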
### 🌿 Zen Eco - Qwen3-4B

- Base Model: `Qwen/Qwen3-4B`
- Architecture: Dense (traditional transformer)
- Parameters: 4B total
- Context: 32K tokens native
- Use Case: Production APIs, balanced performance
- Performance: Matches Qwen2.5-7B performance (50% density improvement)
### 💻 Zen Coder - Qwen3-Coder-480B-A35B

- Base Model: `Qwen/Qwen3-Coder-480B-A35B-Instruct`
- Architecture: MoE (Mixture of Experts)
- Parameters: 480B total, 35B active per token
- Experts: 128 total experts, 8 activated per token
- Context: 256K tokens native, up to 1M with YaRN extension (see the sketch below)
- Training: 7.5T tokens (70% code ratio)
- Use Case: State-of-the-art code generation, agentic coding
- Special: Agent RL post-training for multi-turn tool use
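
The 1M figure depends on YaRN rope scaling, as noted in the context bullet above. A hedged configuration sketch using `transformers` follows; the `rope_scaling` keys mirror Qwen's published YaRN examples, and the scaling factor of 4.0 is an assumption (4 × 256K ≈ 1M), so check the model card before relying on it.

```python
# Sketch: extend Qwen3-Coder's 256K native context toward 1M tokens with YaRN rope
# scaling. Loading a 480B-A35B MoE realistically needs a multi-node serving stack
# (e.g. vLLM or SGLang); this snippet only illustrates the config override.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # assumed: 4 x 262,144 ~= 1M tokens
    "original_max_position_embeddings": 262144, # 256K native context
}
config.max_position_embeddings = 1048576

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```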
### 🌐 Zen Omni - Qwen3-Omni-30B-A3B

- Base Model: `Qwen/Qwen3-Omni-30B-A3B-Instruct`
- Architecture: MoE (Mixture of Experts)
- Parameters: 30B total, 3B active per token
- Modalities: Text (119 languages), Speech (19 languages in, 10 out), Vision, Video
- Components:
  - Vision: 675M-parameter ViT encoder
  - Audio: Whisper-large-v3-based encoder (16 kHz, 128-channel mel-spectrogram)
- Use Case: Multimodal understanding, real-time omni-modal AI
- Variants: Also available as Thinking and Captioner models
### 🚀 Zen Next - Qwen3-Next-80B-A3B

- Base Model: `Qwen/Qwen3-Next-80B-A3B-Instruct`
- Architecture: Hybrid MoE with novel attention
- Parameters: 80B total, 3B active per token
- Experts: 512 total, 10 routed + 1 shared activated per token
- Attention: Hybrid Gated DeltaNet + Gated Attention (every 4th layer uses GQA)
- Context: 256K native, 1M extendable
- Performance: ~10x throughput at context lengths above 32K versus traditional attention
- Special Features:
  - Multi-Token Prediction (MTP)
  - Linear attention (Gated DeltaNet) for most layers
  - Ultra-sparse activation (3.75% of parameters active per token; see the sketch after this list)
- Variants: Instruct and Thinking modes available
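
The sparsity figure in the list above follows directly from the headline numbers. Below is a back-of-envelope check; the per-expert size is a naive assumption, since attention, embeddings, and the MTP head also contribute parameters.

```python
# Back-of-envelope check of Qwen3-Next-80B-A3B's ultra-sparse activation, using only
# the figures quoted above (80B total, 3B active, 512 experts, 10 routed + 1 shared).
TOTAL_PARAMS = 80e9
ACTIVE_PARAMS = 3e9
NUM_EXPERTS = 512
EXPERTS_PER_TOKEN = 10 + 1  # 10 routed + 1 shared

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.2%}")  # ~3.75%

# Naive per-expert estimate assuming ALL parameters lived in experts (they do not):
approx_expert_size = TOTAL_PARAMS / NUM_EXPERTS
print(f"Naive per-expert size: {approx_expert_size / 1e6:.0f}M params")
print(f"Naive active expert params: {EXPERTS_PER_TOKEN * approx_expert_size / 1e9:.2f}B")
```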
## 🎯 Key Innovations

### Dense Models (Nano, Eco)
- 50% density improvement over Qwen2.5
- Qwen3-4B matches Qwen2.5-7B performance
- Qwen3-0.6B matches Qwen2.5-1.5B performance
- Trained on 36 trillion tokens, roughly 2× the 18T used for Qwen2.5

### MoE Models (Coder, Omni, Next)
- Sparse activation for efficiency
- No shared experts (unlike Qwen2.5-MoE)
- Global batch load balancing for expert specialization
- GSPO (Group Sequence Policy Optimization) training with MoE stabilization (objective sketched below)
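
For reference, the sequence-level objective behind GSPO, summarized from the paper linked in the References section, is sketched below; notation follows that paper, and this is a summary rather than the exact Gym training configuration.

```latex
% GSPO sequence-level objective, summarized from the GSPO paper (arXiv:2507.18071).
% Clipping acts on a length-normalized sequence ratio rather than per token, which
% is what stabilizes MoE training under expert-routing fluctuations.
\[
  s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)} \right)^{1/|y_i|},
  \qquad
  \hat{A}_i = \frac{r(x, y_i) - \operatorname{mean}\{ r(x, y_j) \}_{j=1}^{G}}{\operatorname{std}\{ r(x, y_j) \}_{j=1}^{G}},
\]
\[
  J_{\mathrm{GSPO}}(\theta) = \mathbb{E}\!\left[ \frac{1}{G} \sum_{i=1}^{G}
    \min\!\Bigl( s_i(\theta)\,\hat{A}_i,\;
      \operatorname{clip}\bigl(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\bigr)\,\hat{A}_i \Bigr) \right].
\]
```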
### Qwen3-Next Specific
- Hybrid attention mechanism replacing standard attention
- 10x inference speedup for long contexts
- Only 3B of 80B parameters active per token (extreme sparsity)
- Foundation for upcoming Qwen3.5
## 📁 Configuration Files

```text
# Dense Models
models/nano/configs/gspo_training.yaml     # Qwen3-0.6B
models/eco/configs/gspo_training.yaml      # Qwen3-4B

# MoE Models
models/coder/configs/gspo_training.yaml    # Qwen3-Coder-480B-A35B
models/omni/configs/gspo_training.yaml     # Qwen3-Omni-30B-A3B
models/next/configs/gspo_training.yaml     # Qwen3-Next-80B-A3B
```
## 🚀 Training Commands

```bash
# Train any model
gym train models/nano/configs/gspo_training.yaml     # Zen Nano
gym train models/eco/configs/gspo_training.yaml      # Zen Eco
gym train models/coder/configs/gspo_training.yaml    # Zen Coder
gym train models/omni/configs/gspo_training.yaml     # Zen Omni
gym train models/next/configs/gspo_training.yaml     # Zen Next
```
## 📈 Performance Summary
| Model | Total Params | Active Params | Context | Architecture |
|---|---|---|---|---|
| Nano | 0.6B | 0.6B | 32K | Dense |
| Eco | 4B | 4B | 32K | Dense |
| Coder | 480B | 35B | 256K-1M | MoE (128 experts) |
| Omni | 30B | 3B | 32K | MoE (multimodal) |
| Next | 80B | 3B | 256K-1M | Hybrid MoE (512 experts) |
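
A derived quantity the table leaves implicit is the fraction of parameters active per token; the small sketch below computes it from the two parameter columns (values copied verbatim from the table).

```python
# Derive the active-parameter fraction for each Zen model from the table above.
models = {
    "Nano":  (0.6e9,  0.6e9),
    "Eco":   (4e9,    4e9),
    "Coder": (480e9,  35e9),
    "Omni":  (30e9,   3e9),
    "Next":  (80e9,   3e9),
}

for name, (total, active) in models.items():
    print(f"{name:<6} active/total = {active / total:.2%}")
# Dense models activate 100% of their weights; the MoE models activate roughly
# 7.3% (Coder), 10% (Omni), and 3.75% (Next) of their parameters per token.
```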
## 🔗 References

- Qwen3 Technical Report: https://arxiv.org/abs/2505.09388
- Qwen3-Coder: https://qwenlm.github.io/blog/qwen3-coder/
- Qwen3-Omni: https://arxiv.org/abs/2509.17765
- Qwen3-Next: released September 15, 2025
- GSPO Paper: https://arxiv.org/abs/2507.18071
Zoo Labs Foundation
Building the future of AI training with accurate, state-of-the-art models
Website: https://zoo.ngo