Your First Training Session

A comprehensive guide to training your first model with Gym.

Overview

In this tutorial, you'll:

  1. Prepare a dataset
  2. Choose and configure a model
  3. Set up training parameters
  4. Monitor training progress
  5. Evaluate your fine-tuned model
  6. Deploy and use your model

Step 1: Prepare Your Dataset

Option A: Use Built-in Dataset

Gym includes several demo datasets:

# List available datasets
ls data/

# alpaca_en_demo - English instruction dataset
# alpaca_zh_demo - Chinese instruction dataset
# identity - Identity dataset for safe responses

Option B: Create Custom Dataset

Create a JSON file with your data:

[
  {
    "instruction": "What is machine learning?",
    "input": "",
    "output": "Machine learning is a subset of artificial intelligence..."
  },
  {
    "instruction": "Translate to Spanish",
    "input": "Hello, how are you?",
    "output": "Hola, ¿cómo estás?"
  }
]

Register in data/dataset_info.json:

{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
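
Before training, it can help to sanity-check that the registered columns actually exist in the data file. A minimal sketch in Python (the paths and keys match the examples above; adjust them to your setup):

# check_dataset.py - verify my_dataset.json has the fields registered in dataset_info.json
import json

with open("data/dataset_info.json") as f:
    entry = json.load(f)["my_dataset"]

with open("data/" + entry["file_name"]) as f:
    rows = json.load(f)

required = set(entry["columns"].values())  # {"instruction", "input", "output"}
for i, row in enumerate(rows):
    missing = required - row.keys()
    if missing:
        raise ValueError(f"example {i} is missing fields: {missing}")

print(f"OK: {len(rows)} examples, fields {sorted(required)}")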

Step 2: Choose Your Model

Model         Size   VRAM   Best For
Qwen3-0.6B    0.6B   4 GB   Testing, CPU
Qwen3-4B      4B     8 GB   General use
LLaMA-3.2-3B  3B     8 GB   English tasks
Gemma-2B      2B     6 GB   Lightweight

Model Selection Tips

  • Start small (0.6B-4B) for learning
  • Use 7B-14B for production
  • 32B+ requires multi-GPU

Step 3: Configure Training

Basic LoRA Training

gym train \
  --model_name_or_path Qwen/Qwen3-4B-Instruct \
  --template qwen3 \
  --dataset my_dataset \
  --finetuning_type lora \
  --lora_rank 8 \
  --lora_alpha 16 \
  --lora_dropout 0.1 \
  --output_dir ./output/my-model \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --num_train_epochs 3 \
  --learning_rate 5e-5 \
  --lr_scheduler_type cosine \
  --warmup_steps 100 \
  --logging_steps 10 \
  --save_steps 500 \
  --fp16
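
Before launching, note that the effective batch size per optimizer step is per_device_train_batch_size × gradient_accumulation_steps (times the number of GPUs, if you use several). A quick sanity check in Python:

# effective_batch.py - samples consumed per optimizer step with the flags above
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1  # adjust if you train on more than one device
print(per_device_train_batch_size * gradient_accumulation_steps * num_gpus)  # -> 8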

QLoRA (Memory Efficient)

For limited GPU memory:

gym train \
  --model_name_or_path Qwen/Qwen3-7B-Instruct \
  --template qwen3 \
  --dataset my_dataset \
  --finetuning_type lora \
  --quantization_bit 4 \
  --double_quantization \
  --lora_rank 8 \
  --output_dir ./output/my-qlora \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --num_train_epochs 3 \
  --learning_rate 1e-4 \
  --fp16
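
The 4-bit quantization is what lets the larger model fit: 4-bit weights take a quarter of the memory of fp16 weights. A rough, weights-only comparison in Python (it ignores activations, LoRA gradients, and optimizer state, so real usage is higher):

# weight_memory.py - weights-only memory footprint of a 7B model at different precisions
params = 7e9
for name, bits in [("fp16", 16), ("int8", 8), ("4-bit (QLoRA)", 4)]:
    gib = params * bits / 8 / 1024**3
    print(f"{name:>14}: {gib:5.1f} GiB")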

Configuration File

Create configs/my_training.yaml:

model_name_or_path: Qwen/Qwen3-4B-Instruct
template: qwen3
dataset: my_dataset
finetuning_type: lora

# LoRA settings
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.1
lora_target: all

# Training settings
output_dir: ./output/my-model
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 5.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1

# Optimization
fp16: true
gradient_checkpointing: true
optim: adamw_torch

# Logging
logging_steps: 10
save_steps: 500
save_total_limit: 3

# Evaluation
evaluation_strategy: steps
eval_steps: 500
per_device_eval_batch_size: 4

Run with:

gym train --config configs/my_training.yaml
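
If you are curious how few parameters a rank-8 LoRA actually trains, peft can report it directly. A sketch under two assumptions: lora_target: all roughly corresponds to targeting all linear layers, and transformers plus peft are installed (loading the 4B base model needs several GB of RAM):

# lora_params.py - count trainable vs. total parameters for a rank-8 LoRA setup
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct", torch_dtype="auto")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules="all-linear")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # prints trainable params, total params, and the percentage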

Step 4: Monitor Training

TensorBoard

# Start TensorBoard
tensorboard --logdir ./output/my-model

# Open browser at http://localhost:6006

Monitor:

  • Training loss (should decrease)
  • Learning rate (follows schedule)
  • Gradient norm (should be stable)

W&B Integration

# Install wandb
pip install wandb

# Login
wandb login

# Add to training
gym train \
  --config configs/my_training.yaml \
  --report_to wandb \
  --run_name my-experiment

Training Logs

Check logs in real-time:

tail -f ./output/my-model/trainer_log.jsonl
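
To turn the same log into a quick loss curve, a small script can parse it (a sketch; it assumes each line is a JSON object and that training records carry a "loss" field and a step counter, which may be named differently in your version):

# plot_loss.py - plot training loss from trainer_log.jsonl
import json
import matplotlib.pyplot as plt

steps, losses = [], []
with open("./output/my-model/trainer_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:
            steps.append(record.get("current_steps", len(steps)))
            losses.append(record["loss"])

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss.png")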

Step 5: Evaluate Your Model

Interactive Chat

gym chat \
  --model_name_or_path Qwen/Qwen3-4B-Instruct \
  --adapter_name_or_path ./output/my-model \
  --template qwen3
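
If you would rather query the adapter from your own code than through gym chat, the output directory can usually be loaded as a standard PEFT adapter. A sketch with transformers and peft (assumes those libraries plus accelerate are installed and that the adapter format is PEFT-compatible):

# load_adapter.py - run the LoRA adapter on top of the base model
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "./output/my-model")

messages = [{"role": "user", "content": "What is machine learning?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))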

Benchmark Evaluation

gym eval \
  --model_name_or_path Qwen/Qwen3-4B-Instruct \
  --adapter_name_or_path ./output/my-model \
  --template qwen3 \
  --task mmlu \
  --batch_size 4

Custom Evaluation

Create eval_data.json:

[
  {"input": "What is AI?", "output": "..."},
  {"input": "Explain ML", "output": "..."}
]

Register eval_data in data/dataset_info.json the same way as in Step 1, then evaluate:

gym eval \
  --model_name_or_path ./output/my-model \
  --eval_dataset eval_data \
  --template qwen3

Step 6: Deploy Your Model

Local API Server

gym api \
  --model_name_or_path Qwen/Qwen3-4B-Instruct \
  --adapter_name_or_path ./output/my-model \
  --port 8000

Test:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
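
The same request from Python, using the requests library against the OpenAI-compatible endpoint shown above (a sketch; adjust the host, port, and model name to your server):

# chat_request.py - call the local server started by `gym api`
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen3",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])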

Export to GGUF

For llama.cpp compatibility:

gym export \
  --model_name_or_path Qwen/Qwen3-4B-Instruct \
  --adapter_name_or_path ./output/my-model \
  --export_dir ./output/gguf \
  --export_size 4 \
  --export_quantization_bit 4

Deploy to HuggingFace

# Login
huggingface-cli login

# Upload
gym export \
  --model_name_or_path ./output/my-model \
  --export_dir ./hf_model \
  --export_hub_model_id username/my-model

huggingface-cli upload username/my-model ./hf_model
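
Once the upload finishes, the model can be pulled straight from the Hub like any other checkpoint (a sketch; it assumes the exported repository contains a full merged model, and username/my-model stands in for your actual repo id):

# from_hub.py - load the uploaded model from the Hugging Face Hub
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("username/my-model")
model = AutoModelForCausalLM.from_pretrained("username/my-model", torch_dtype="auto", device_map="auto")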

Common Issues

Out of Memory

Reduce batch size:

--per_device_train_batch_size 1
--gradient_accumulation_steps 16

Enable gradient checkpointing:

--gradient_checkpointing true

Use QLoRA:

--quantization_bit 4

Slow Training

Enable Flash Attention:

# Install Flash Attention 2
pip install flash-attn --no-build-isolation

# Then add this flag to your gym train command
--flash_attn fa2

Use bf16 instead of fp16:

--bf16 true

Poor Results

Increase training epochs:

--num_train_epochs 5

Adjust learning rate:

--learning_rate 1e-4  # Higher for small datasets
--learning_rate 5e-6  # Lower for large datasets

Use more training data:

  • Aim for 1,000+ examples
  • Cover diverse tasks and phrasings
  • Keep annotations high quality

Checkpoint Issues

Resume from checkpoint:

gym train \
  --config configs/my_training.yaml \
  --resume_from_checkpoint ./output/my-model/checkpoint-500

Next Steps

Best Practices

  1. Start Small: Test with small model and dataset first
  2. Monitor Closely: Watch training loss and metrics
  3. Save Often: Use --save_steps for regular checkpoints
  4. Evaluate Early: Check quality after 1 epoch
  5. Version Control: Track configs and results
  6. Document: Note what works and what doesn't

Ready for advanced features? Try Continuous Learning GRPO!