Configuration Reference

The YAML configuration that drives a Gym fine-tuning run.

A Gym run is fully described by a single YAML config file. You pass it to the CLI (gym train your_config.yml) and it is validated against a strict Pydantic schema before training starts, so typos and incompatible options are caught up front rather than mid-run.

The authoritative, always-current definition of every field lives in the schema source: src/gym/utils/schemas/config.py is the top-level model, and it composes the focused models listed under Schema modules below. This page is a guided tour of the most common options; the schema is the source of truth.

A minimal config

base_model: NousResearch/Llama-3.2-1B

datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
val_set_size: 0.1
output_dir: ./outputs/lora-out

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, down_proj, up_proj]

sequence_len: 2048
sample_packing: true

micro_batch_size: 2
gradient_accumulation_steps: 2
num_epochs: 1
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
gradient_checkpointing: true
flash_attention: true

Ready-to-run configs for many model families live in the examples/ directory. Start from one close to your model and edit it.

Core options

Model

Key                          Description
base_model                   HF Hub id or local path of the model to fine-tune.
tokenizer_config             Override tokenizer if it differs from base_model.
model_type / tokenizer_type  Force a specific loader class when auto-detection is not enough.
trust_remote_code            Allow custom modeling code shipped with the model.
load_in_8bit / load_in_4bit  Quantize the base weights for QLoRA-style training.

Datasets

Key                    Description
datasets               List of sources, each with a path and a type (prompt strategy).
chat_template          Template used to render conversational data (e.g. chatml, llama3).
val_set_size           Fraction (or count) held out for evaluation.
dataset_prepared_path  Cache location for the tokenized/packed dataset.
sample_packing         Pack multiple short samples into one sequence for throughput.

See Dataset Formats for the full list of supported type: strategies.
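As a rough illustration of how the keys above fit together, the fragment below combines two sources with a held-out split and a cache directory. It is a sketch, not a recommended setup: the second dataset path and the cache directory are hypothetical placeholders, and alpaca is the only type value taken from this page.

```yaml
datasets:
  - path: teknium/GPT4-LLM-Cleaned    # same source as the minimal config
    type: alpaca
  - path: your-org/your-dataset       # hypothetical second source
    type: alpaca

val_set_size: 0.05                       # hold out 5% of the data for evaluation
dataset_prepared_path: ./prepared-data   # hypothetical cache directory
sample_packing: true                     # pack short samples into one sequence
```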

Adapters (PEFT / LoRA)

Key                                 Description
adapter                             lora, qlora, or unset for full fine-tuning.
lora_r / lora_alpha / lora_dropout  LoRA rank, scaling, and dropout.
lora_target_modules                 Linear layers to attach LoRA to.
lora_modules_to_save                Extra modules trained in full precision (e.g. resized embeddings).
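For intuition about how these keys interact: in the standard LoRA formulation the adapter update is scaled by lora_alpha / lora_r, so the two values are usually tuned together. The fragment below is a sketch under that assumption. Whether Gym applies exactly this scaling is not stated on this page, and the lora_modules_to_save value (both its list form and the module names, which are typical of Llama-style models) is an illustrative assumption.

```yaml
adapter: lora
lora_r: 16          # rank of the low-rank update matrices
lora_alpha: 32      # standard LoRA scaling: lora_alpha / lora_r = 32 / 16 = 2
lora_dropout: 0.05
lora_target_modules: [q_proj, k_proj, v_proj, o_proj]   # attention projections only

# Assumption: illustrative module names for a model whose embeddings were resized.
lora_modules_to_save: [embed_tokens, lm_head]
```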

Training hyperparameters

Key                                        Description
micro_batch_size                           Per-device batch size.
gradient_accumulation_steps                Steps accumulated before an optimizer update.
num_epochs / max_steps                     How long to train.
optimizer                                  e.g. adamw_torch, adamw_8bit, adamw_bnb_8bit.
learning_rate, lr_scheduler, warmup_ratio  Learning-rate schedule.
weight_decay, max_grad_norm                Regularization and gradient clipping.
sequence_len                               Maximum token length per sample.
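The batch-size keys combine multiplicatively, which is easy to lose track of when scaling a run up or down. The fragment below works through the arithmetic in comments, assuming plain data-parallel training on a hypothetical 4-GPU machine. The warmup, weight-decay, and clipping values are illustrative, not recommendations.

```yaml
micro_batch_size: 2              # samples per device per forward/backward pass
gradient_accumulation_steps: 8   # passes accumulated before each optimizer update
# Effective batch size per optimizer update
#   = micro_batch_size * gradient_accumulation_steps * number of devices
#   = 2 * 8 * 4 = 64 (assuming 4 GPUs)

num_epochs: 3
optimizer: adamw_torch
learning_rate: 0.0002
lr_scheduler: cosine
warmup_ratio: 0.05    # illustrative
weight_decay: 0.01    # illustrative
max_grad_norm: 1.0    # illustrative clipping threshold
sequence_len: 2048
```

Halving micro_batch_size while doubling gradient_accumulation_steps keeps the effective batch size unchanged and lowers peak memory, at the cost of more forward/backward passes per update.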

Precision & performance

Key                              Description
bf16 / fp16 / tf32               Mixed-precision modes (auto picks per hardware).
gradient_checkpointing           Trade compute for memory by recomputing activations.
flash_attention / sdp_attention  Fast attention kernels.
deepspeed / fsdp                 Sharded multi-GPU / multi-node training backends.

Checkpointing & logging

Key                           Description
output_dir                    Where checkpoints and the final model are written.
saves_per_epoch / save_steps  Checkpoint cadence.
evals_per_epoch / eval_steps  Evaluation cadence.
logging_steps                 Metric logging cadence.
wandb_*, mlflow_*, comet_*    Experiment-tracking integrations.
hub_model_id                  Push checkpoints and the final model to the HF Hub.
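A cadence block might look like the sketch below. It assumes you pick one key from each per-epoch/per-step pair rather than setting both, which is how the table groups them but is not confirmed here. The numbers are illustrative and the Hub repository name is a hypothetical placeholder.

```yaml
output_dir: ./outputs/lora-out

saves_per_epoch: 2    # write a checkpoint twice per epoch
evals_per_epoch: 4    # evaluate four times per epoch (needs a held-out split)
logging_steps: 10     # log metrics every 10 steps

hub_model_id: your-org/your-model   # hypothetical HF Hub repository
```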

Schema modules

Each focused area of the config is defined in its own module under src/gym/utils/schemas:

Module        Covers
config        Top-level model that composes all sections below.
model         Base model, tokenizer, quantization loading.
datasets      Dataset sources, packing, preprocessing.
training      Hyperparameters, schedulers, batch sizing.
peft          LoRA / ReLoRA adapter settings.
trl           RL / preference-tuning (DPO, KTO, GRPO, ORPO) settings.
multimodal    Vision-language configuration.
quantization  QAT and post-training quantization.
integrations  Third-party plugins (Liger, Cut Cross Entropy, Spectrum, …).

Validation

Because the config is validated up front, an invalid combination (for example load_in_4bit without adapter: qlora) fails fast with a precise error naming the offending field. When in doubt, copy a known-good file from examples/ and change one thing at a time.
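To make the example above concrete, here is a sketch of a config that would be rejected under the rule as described, with one possible fix shown in comments. The exact error text is not reproduced because it depends on the schema.

```yaml
# Rejected at validation time: load_in_4bit is set, but adapter is not qlora.
base_model: NousResearch/Llama-3.2-1B
load_in_4bit: true
adapter: lora

# One possible fix: switch the adapter so the two options agree.
#   adapter: qlora
# Alternatively, drop load_in_4bit and keep adapter: lora.
```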

See Getting Started for an end-to-end walkthrough.