# ✅ Zen Model Architecture - MoE Update Complete
## 🔄 What Changed
### Removed

- ❌ Deleted `models/moe/` directory - no standalone MoE model
- ❌ Removed `configs/zen_moe.yaml` symlink
### Updated to MoE Architectures
The following models are now correctly identified as MoE (Mixture of Experts) architectures (a minimal loading sketch follows the list):

- Zen Coder (7B MoE)
  - Path: `models/coder/`
  - Model: `Qwen/Qwen3-Coder-7B-MoE-Instruct`
  - Active params: 2B per token
  - Specialized for code generation
- Zen Omni (14B MoE)
  - Path: `models/omni/`
  - Model: `Qwen/Qwen3-Omni-14B-MoE`
  - Active params: 4B per token
  - Multimodal capabilities
- Zen Next (32B MoE)
  - Path: `models/next/`
  - Model: `Qwen/Qwen3-Next-32B-MoE-Instruct`
  - Active params: 8B per token
  - Advanced reasoning
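
For reference, the sketch below shows how one of these MoE checkpoints could be loaded with Hugging Face `transformers`. The model ID is taken from the list above; whether the checkpoint is published under exactly that name, and whether it needs extra flags such as `trust_remote_code`, are assumptions to verify.

```python
# Minimal sketch: load Zen Coder's MoE base model with Hugging Face transformers.
# Assumptions: the checkpoint is available under the ID listed above and follows
# the standard causal-LM interface; adjust dtype/device settings for your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-7B-MoE-Instruct"  # from the model list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the checkpoint's dtype
    device_map="auto",    # spread layers across available GPUs/CPU
)

# The checkpoint stores all 7B parameters, but only ~2B are active per token
# at inference time because the router selects a subset of experts per layer.
prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```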
## 📊 Final Model Architecture
| Model | Total Params | Architecture | Active Params | Use Case |
|---|---|---|---|---|
| Nano | 0.6B | Dense | 0.6B | Edge/Mobile |
| Eco | 4B | Dense | 4B | Production |
| Coder | 7B | MoE | ~2B | Code Gen |
| Omni | 14B | MoE | ~4B | Multimodal |
| Next | 32B | MoE | ~8B | Advanced AI |
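
The sparse-activation ratios in the table can be sanity-checked with a few lines of arithmetic (values copied from the table above; active counts are approximate):

```python
# Active-vs-total parameter ratio for each model in the table above (approximate).
models = {
    "Nano":  (0.6, 0.6),   # (total B, active B) - dense: all params active
    "Eco":   (4.0, 4.0),   # dense
    "Coder": (7.0, 2.0),   # MoE
    "Omni":  (14.0, 4.0),  # MoE
    "Next":  (32.0, 8.0),  # MoE
}

for name, (total, active) in models.items():
    print(f"{name:>5}: {active:>4.1f}B / {total:>4.1f}B active = {active / total:.0%}")

# The MoE rows land around 25-30% active parameters per token,
# which is where the efficiency claim below comes from.
```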
## 🚀 Training Commands
```bash
# Dense Models (traditional architecture)
gym train models/nano/configs/gspo_training.yaml   # 0.6B Dense
gym train models/eco/configs/gspo_training.yaml    # 4B Dense

# MoE Models (sparse activation)
gym train models/coder/configs/gspo_training.yaml  # 7B MoE
gym train models/omni/configs/gspo_training.yaml   # 14B MoE
gym train models/next/configs/gspo_training.yaml   # 32B MoE
```
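
To run all five trainings back to back, a thin wrapper around the same CLI calls could look like the sketch below. This is a convenience script sketched under assumptions, not part of the gym tooling: it only requires that the `gym` executable is on `PATH` and that the config paths match the layout above.

```python
# Sketch: run every GSPO training config sequentially via the gym CLI.
# Assumption: `gym` is installed and on PATH; paths match the repo layout above.
import subprocess

CONFIGS = [
    "models/nano/configs/gspo_training.yaml",   # 0.6B Dense
    "models/eco/configs/gspo_training.yaml",    # 4B Dense
    "models/coder/configs/gspo_training.yaml",  # 7B MoE
    "models/omni/configs/gspo_training.yaml",   # 14B MoE
    "models/next/configs/gspo_training.yaml",   # 32B MoE
]

for config in CONFIGS:
    print(f"Training {config} ...")
    subprocess.run(["gym", "train", config], check=True)  # stop on first failure
```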
## 🎯 Why MoE?
Mixture of Experts Benefits:

- Efficiency: Only ~25-30% of parameters are active per token (see the routing sketch below)
- Scalability: Larger models can be built with the same compute budget
- Specialization: Different experts learn different patterns
- Performance: Better accuracy at lower inference cost
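
To illustrate the efficiency point, the sketch below implements generic top-k expert routing in PyTorch. It is not the actual Qwen3/Zen routing code; the layer sizes, expert count, and `k=2` are arbitrary choices used only to show why each token touches just a fraction of the total parameters.

```python
# Generic top-2 expert routing sketch (illustrative only - not the actual
# Qwen3/Zen implementation). Each token is dispatched to k of E experts,
# so only a fraction of the layer's parameters runs per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (tokens, d_model)
        logits = self.router(x)                   # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):                # only k experts run per token
            slot_w = weights[:, slot].unsqueeze(-1)   # (tokens, 1)
            slot_e = idx[:, slot]                     # expert chosen for this slot
            for e in range(len(self.experts)):
                mask = slot_e == e
                if mask.any():
                    out[mask] += slot_w[mask] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512]); 2 of 8 experts used per token
```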
GSPO + MoE:

- The GSPO algorithm includes MoE stabilization
- Prevents expert collapse
- Ensures balanced routing (see the load-balancing sketch below)
- Optimizes group-wise training for sparse models
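
To make "balanced routing" concrete, the sketch below shows a generic Switch Transformer-style load-balancing auxiliary loss. It is illustrative only and is not claimed to be GSPO's actual stabilization mechanism; the function name and the choice of `k` are assumptions.

```python
# Generic load-balancing auxiliary loss (Switch Transformer-style), illustrative
# only - not GSPO's stabilization code. It penalizes routers that send most
# tokens to a few experts, which is how expert collapse is usually discouraged.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, k: int = 2) -> torch.Tensor:
    """router_logits: (num_tokens, num_experts) raw router scores."""
    num_tokens, num_experts = router_logits.shape
    probs = F.softmax(router_logits, dim=-1)              # soft routing probabilities
    topk_idx = router_logits.topk(k, dim=-1).indices      # experts actually chosen

    # Fraction of routing slots assigned to each expert (hard assignment).
    counts = torch.bincount(topk_idx.flatten(), minlength=num_experts).float()
    dispatch = counts / (num_tokens * k)

    # Mean router probability per expert (soft assignment).
    mean_prob = probs.mean(dim=0)

    # Minimized when both distributions are uniform (1/num_experts each).
    return num_experts * torch.sum(dispatch * mean_prob)

logits = torch.randn(1024, 8)              # 1024 tokens, 8 experts
print(load_balancing_loss(logits).item())  # ~1.0 when routing is balanced
```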
## 📁 Updated Structure
```
models/
├── nano/    # 0.6B Dense - Ultra-light
├── eco/     # 4B Dense - Balanced
├── coder/   # 7B MoE - Code specialist
├── omni/    # 14B MoE - Multimodal
└── next/    # 32B MoE - Advanced reasoning
```
## ✨ Key Improvements
- Accurate Architecture Labels: Models correctly identified as Dense or MoE
- Proper Model Names: Updated to reflect actual Qwen3 MoE variants
- Cleaner Structure: Removed redundant MoE directory
- Better Documentation: Clear distinction between architectures
Zoo Labs Foundation
Zen Models - Bringing balance to AI through efficient architectures
Website: https://zoo.ngo