Your First Training Session¶
A comprehensive guide to training your first model with Gym.
Overview¶
In this tutorial, you'll:
- Prepare a dataset
- Choose and configure a model
- Set up training parameters
- Monitor training progress
- Evaluate your fine-tuned model
- Deploy and use your model
Step 1: Prepare Your Dataset¶
Option A: Use Built-in Dataset¶
Gym includes several demo datasets:
# List available datasets
ls data/
# alpaca_en_demo - English instruction dataset
# alpaca_zh_demo - Chinese instruction dataset
# identity - Identity dataset for safe responses
Option B: Create Custom Dataset¶
Create a JSON file with your data:
[
{
"instruction": "What is machine learning?",
"input": "",
"output": "Machine learning is a subset of artificial intelligence..."
},
{
"instruction": "Translate to Spanish",
"input": "Hello, how are you?",
"output": "Hola, ¿cómo estás?"
}
]
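Before registering the file, it can help to confirm that every record has the three keys shown above. The following is a minimal sketch, not part of Gym: the validate_records helper and the temporary output path are assumptions made for illustration, and only the instruction/input/output layout comes from this tutorial.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_records(records):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str):
                problems.append(f"record {i}: '{key}' is missing or not a string")
        # "input" may legitimately be empty; an empty instruction or output is usually a mistake.
        for key in ("instruction", "output"):
            if isinstance(rec.get(key), str) and not rec[key].strip():
                problems.append(f"record {i}: '{key}' is empty")
    return problems


records = [
    {
        "instruction": "What is machine learning?",
        "input": "",
        "output": "Machine learning is a subset of artificial intelligence...",
    },
    {
        "instruction": "Translate to Spanish",
        "input": "Hello, how are you?",
        "output": "Hola, ¿cómo estás?",
    },
]

problems = validate_records(records)

# Written to a temporary directory here; in a real project you would write
# to wherever your dataset files live.
out_path = Path(tempfile.mkdtemp()) / "my_dataset.json"
out_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
reloaded = json.loads(out_path.read_text(encoding="utf-8"))
```

Passing ensure_ascii=False keeps characters such as ¿ and ó readable in the file instead of escaping them.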
Register the dataset in data/dataset_info.json, mapping the prompt, query, and response columns to the keys used in your JSON file:
{
"my_dataset": {
"file_name": "my_dataset.json",
"columns": {
"prompt": "instruction",
"query": "input",
"response": "output"
}
}
}
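Editing dataset_info.json by hand risks overwriting the entries for the built-in datasets. As a rough sketch, the registration can be scripted so existing entries are preserved. The register_dataset helper is hypothetical, and the existing identity entry below is only a stand-in for whatever the real file contains.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add or update one entry in dataset_info.json, leaving other entries untouched."""
    info_path = Path(info_path)
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {
        "file_name": file_name,
        # Same column mapping as the registration example above.
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
    return info


# Demonstrated on a temporary file; the "identity" entry is a placeholder,
# not the real contents of data/dataset_info.json.
info_path = Path(tempfile.mkdtemp()) / "dataset_info.json"
info_path.write_text(json.dumps({"identity": {"file_name": "identity.json"}}), encoding="utf-8")

info = register_dataset(info_path, "my_dataset", "my_dataset.json")
on_disk = json.loads(info_path.read_text(encoding="utf-8"))
```

To use it for real, point info_path at data/dataset_info.json.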
Step 2: Choose Your Model¶
Recommended Models for Beginners¶
| Model | Size | VRAM | Best For |
|---|---|---|---|
| Qwen3-0.6B | 0.6B | 4GB | Testing, CPU |
| Qwen3-4B | 4B | 8GB | General use |
| LLaMA-3.2-3B | 3B | 8GB | English tasks |
| Gemma-2B | 2B | 6GB | Lightweight |
Model Selection Tips¶
- Start small (0.6B-4B) for learning
- Use 7B-14B for production
- 32B+ requires multi-GPU
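The table can also be read as a lookup: given a VRAM budget, which of the listed models fit? The sketch below copies the table into a list and filters it. The models_that_fit helper is illustrative only, and the VRAM figures are the table's, not independent measurements.

```python
# (name, parameters in billions, VRAM in GB) copied from the table above.
MODELS = [
    ("Qwen3-0.6B", 0.6, 4),
    ("Qwen3-4B", 4.0, 8),
    ("LLaMA-3.2-3B", 3.0, 8),
    ("Gemma-2B", 2.0, 6),
]


def models_that_fit(vram_gb):
    """Return table entries whose listed VRAM fits the budget, largest model first."""
    fitting = [m for m in MODELS if m[2] <= vram_gb]
    return sorted(fitting, key=lambda m: m[1], reverse=True)


fits_6gb = [name for name, _, _ in models_that_fit(6)]
fits_8gb = [name for name, _, _ in models_that_fit(8)]

print(fits_6gb)  # ['Gemma-2B', 'Qwen3-0.6B']
print(fits_8gb)  # ['Qwen3-4B', 'LLaMA-3.2-3B', 'Gemma-2B', 'Qwen3-0.6B']
```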
Step 3: Configure Training¶
Basic LoRA Training¶
gym train \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--template qwen3 \
--dataset my_dataset \
--finetuning_type lora \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--output_dir ./output/my-model \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--num_train_epochs 3 \
--learning_rate 5e-5 \
--lr_scheduler_type cosine \
--warmup_steps 100 \
--logging_steps 10 \
--save_steps 500 \
--fp16
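Several of these flags interact: batch size and gradient accumulation set how many examples go into each optimizer step, which in turn determines how many steps the run has and whether --warmup_steps and --save_steps are sensible. The arithmetic can be sketched as follows. The 1,000-example dataset size is a hypothetical, and the step count is an approximation that ignores partial-batch edge cases.

```python
import math


def training_steps(num_examples, per_device_batch_size, grad_accum_steps, num_epochs, num_gpus=1):
    """Approximate optimizer-step counts for a run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch))
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Flag values from the command above; num_examples=1000 is an assumed dataset size.
effective_batch, steps_per_epoch, total_steps = training_steps(
    num_examples=1000,
    per_device_batch_size=2,
    grad_accum_steps=4,
    num_epochs=3,
)

warmup_fraction = 100 / total_steps          # share of the run spent in --warmup_steps 100
checkpoints_before_end = total_steps // 500  # how often --save_steps 500 fires

print(effective_batch)         # 8 examples per optimizer step
print(steps_per_epoch)         # 125
print(total_steps)             # 375
print(checkpoints_before_end)  # 0
```

Under that assumed dataset size, warmup would take roughly a quarter of the run and no intermediate checkpoint would be written, so on small datasets it may be worth lowering both --warmup_steps and --save_steps.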
QLoRA (Memory Efficient)¶
For limited GPU memory:
gym train \
--model_name_or_path Qwen/Qwen3-8B \
--template qwen3 \
--dataset my_dataset \
--finetuning_type lora \
--quantization_bit 4 \
--double_quantization \
--lora_rank 8 \
--output_dir ./output/my-qlora \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 8 \
--num_train_epochs 3 \
--learning_rate 1e-4 \
--fp16
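The saving from --quantization_bit 4 comes mainly from storing the base model weights in 4 bits instead of 16. A back-of-the-envelope sketch for an illustrative model of about 7 billion parameters follows. It counts weights only: activations, LoRA parameters, optimizer state, and quantization metadata are deliberately left out, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # illustrative model size, not a specific checkpoint

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 16-bit weights: 14.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 4-bit weights:  3.5 GB
```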
Configuration File¶
Create configs/my_training.yaml:
model_name_or_path: Qwen/Qwen3-4B-Instruct
template: qwen3
dataset: my_dataset
finetuning_type: lora
# LoRA settings
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.1
lora_target: all
# Training settings
output_dir: ./output/my-model
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 5.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
# Optimization
fp16: true
gradient_checkpointing: true
optim: adamw_torch
# Logging
logging_steps: 10
save_steps: 500
save_total_limit: 3
# Evaluation
evaluation_strategy: steps
eval_steps: 500
per_device_eval_batch_size: 4
Run with:
gym train \
--config configs/my_training.yaml
Step 4: Monitor Training¶
TensorBoard¶
Monitor:
- Training loss (should decrease)
- Learning rate (follows schedule)
- Gradient norm (should be stable)
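To judge whether the learning-rate curve "follows schedule", it helps to know the expected shape. The sketch below assumes the common form of a cosine schedule with warmup: a linear ramp to the base rate, then cosine decay to zero. Whether Gym's cosine scheduler matches this exactly is an assumption, and the 1,000-step run length is hypothetical.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


BASE_LR = 5e-5      # learning_rate from the configuration
TOTAL_STEPS = 1000  # hypothetical run length
WARMUP_STEPS = 100  # warmup_ratio 0.1 applied to 1000 steps

samples = {s: lr_at_step(s, BASE_LR, WARMUP_STEPS, TOTAL_STEPS) for s in (0, 50, 100, 550, 1000)}
for step, lr in samples.items():
    print(f"step {step:4d}: lr = {lr:.2e}")
```

In this sketch the rate starts at zero, peaks at the base rate when warmup ends, passes half the base rate midway through the decay, and returns to zero at the final step. A logged curve that departs sharply from that shape is worth investigating.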
W&B Integration¶
# Install wandb
pip install wandb
# Login
wandb login
# Add to training
gym train \
--config configs/my_training.yaml \
--report_to wandb \
--run_name my-experiment
Training Logs¶
Check logs in real-time:
Step 5: Evaluate Your Model¶
Interactive Chat¶
gym chat \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--adapter_name_or_path ./output/my-model \
--template qwen3
Benchmark Evaluation¶
gym eval \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--adapter_name_or_path ./output/my-model \
--template qwen3 \
--task mmlu \
--batch_size 4
Custom Evaluation¶
Create eval_data.json:
Evaluate:
Step 6: Deploy Your Model¶
Local API Server¶
gym api \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--adapter_name_or_path ./output/my-model \
--port 8000
Test:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3",
"messages": [{"role": "user", "content": "Hello!"}]
}'
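The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl command above. The build_chat_request and send_chat helpers are illustrative names, and the response layout mentioned afterwards is an assumption based on the OpenAI-style endpoint path.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt, model="qwen3", url=API_URL):
    """Build the same POST request as the curl command, without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt, timeout=60):
    """Send the request to the running server and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not require the server to be running.
request = build_chat_request("Hello!")
```

With the server from the previous command running, send_chat("Hello!") returns the parsed response. If the server follows the OpenAI chat-completions format that the path suggests, the reply text would be under ["choices"][0]["message"]["content"].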
Export to GGUF¶
For llama.cpp compatibility:
gym export \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--adapter_name_or_path ./output/my-model \
--export_dir ./output/gguf \
--export_size 4 \
--export_quantization_bit 4
Deploy to HuggingFace¶
# Login
huggingface-cli login
# Merge the LoRA adapter into the base model, then upload
gym export \
--model_name_or_path Qwen/Qwen3-4B-Instruct \
--adapter_name_or_path ./output/my-model \
--export_dir ./hf_model \
--export_hub_model_id username/my-model
huggingface-cli upload username/my-model ./hf_model
Common Issues¶
Out of Memory¶
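Reduce batch size: set --per_device_train_batch_size 1 and raise --gradient_accumulation_steps (for example, from 4 to 8) to keep the effective batch size unchanged.
Enable gradient checkpointing: set gradient_checkpointing: true, as in the configuration file above.
Use QLoRA: add --quantization_bit 4, as in the QLoRA example above.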
Slow Training¶
Enable Flash Attention:
Use bf16 instead of fp16:
Poor Results¶
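Increase training epochs: raise --num_train_epochs (the examples above use 3).
Adjust learning rate: change --learning_rate (the examples above use 5e-5 for LoRA and 1e-4 for QLoRA).
More training data:
- Aim for 1000+ examples
- Diverse examples
- High-quality annotations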
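The data-related checks can be partly automated. The sketch below reports dataset size, exact duplicate prompts, and empty outputs for records in the instruction/input/output layout from Step 1. The dataset_report helper and the three sample records are hypothetical, and the checks cover only what can be measured mechanically, not annotation quality.

```python
from collections import Counter


def dataset_report(records, min_examples=1000):
    """Summarize size, exact duplicates, and empty outputs for instruction-style records."""
    prompt_counts = Counter((r.get("instruction", ""), r.get("input", "")) for r in records)
    return {
        "examples": len(records),
        "meets_minimum": len(records) >= min_examples,
        # Extra copies of an identical (instruction, input) pair.
        "duplicate_prompts": sum(c - 1 for c in prompt_counts.values() if c > 1),
        "empty_outputs": sum(1 for r in records if not r.get("output", "").strip()),
        # A crude diversity signal: how many distinct instructions appear.
        "unique_instructions": len({r.get("instruction", "") for r in records}),
    }


# Tiny hypothetical sample, deliberately containing one duplicate and one empty output.
sample = [
    {"instruction": "Translate to Spanish", "input": "Hello", "output": "Hola"},
    {"instruction": "Translate to Spanish", "input": "Hello", "output": "Hola"},
    {"instruction": "What is machine learning?", "input": "", "output": ""},
]

report = dataset_report(sample)
print(report)
```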
Checkpoint Issues¶
Resume from checkpoint:
gym train \
--config configs/my_training.yaml \
--resume_from_checkpoint ./output/my-model/checkpoint-500
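When several checkpoints exist, the most recent one is usually the one to resume from. Here is a small sketch for finding it, assuming checkpoints are directories named checkpoint-<step> inside the output directory, as in the path above. The latest_checkpoint helper is illustrative and not a Gym function.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(output_dir).glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        # Compare steps as integers: sorting names as text would rank 500 above 1500.
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


# Demonstrated on a temporary directory standing in for ./output/my-model.
demo_dir = Path(tempfile.mkdtemp())
for step in (500, 1000, 1500):
    (demo_dir / f"checkpoint-{step}").mkdir()

latest = latest_checkpoint(demo_dir)
none_found = latest_checkpoint(tempfile.mkdtemp())

print(latest.name)  # checkpoint-1500
```

The returned path can then be passed to --resume_from_checkpoint.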
Next Steps¶
- Continuous Learning GRPO - Learn without parameter updates
- Custom Agents - Build specialized agents
- Chat-to-Experiences - Learn from conversations
Best Practices¶
- Start Small: Test with small model and dataset first
- Monitor Closely: Watch training loss and metrics
- Save Often: Use --save_steps for regular checkpoints
- Evaluate Early: Check quality after 1 epoch
- Version Control: Track configs and results
- Document: Note what works and what doesn't
Ready for advanced features? Try Continuous Learning GRPO!