
Semantic Memory API Reference

Overview

The Semantic Memory system powers Continuous Learning GRPO by managing experience libraries: collections of natural-language insights that guide model behavior. This API reference documents the core classes and methods.

Module Structure

src/gym/train/grpo/
├── experience_manager.py    # CRUD operations on experience library
├── semantic_extractor.py    # 3-stage LLM extraction process
└── api_model_adapter.py     # Cloud model adapters (DeepSeek, OpenAI)
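
The classes documented below are imported from these modules:

from gym.train.grpo.experience_manager import ExperienceManager
from gym.train.grpo.semantic_extractor import SemanticExtractor, Trajectory, LLMClient
from gym.train.grpo.api_model_adapter import APIModelConfig, APIModelAdapter, DeepSeekAdapter, OpenAIAdapter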

ExperienceManager

Location: src/gym/train/grpo/experience_manager.py

Manages the experience library E, providing CRUD operations and persistence.

Class Definition

class ExperienceManager:
    """Manages the experience library E for Continuous Learning GRPO."""

    def __init__(self, checkpoint_path: str = None)

Parameters

  • checkpoint_path (str, optional): Path to load/save experiences (JSON format). If provided and the file exists, experiences are loaded automatically.

Attributes

  • experiences (Dict[str, str]): Dictionary mapping experience IDs to text
  • _next_id (int): Counter for generating unique IDs
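
Example:

manager = ExperienceManager(checkpoint_path="./experiences.json")
print(f"Loaded {len(manager)} experiences")  # 0 on a fresh run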

Methods

add()

Add new experience to library.

def add(self, experience: str) -> str

Parameters:

  • experience (str): Natural language experience statement (≤32 words recommended)

Returns:

  • exp_id (str): Assigned experience ID (format: "G{N}")

Example:

manager = ExperienceManager()
exp_id = manager.add("When solving equations, verify solutions by substitution.")
print(exp_id)  # "G0"


delete()

Delete experience by ID.

def delete(self, exp_id: str) -> bool

Parameters:

  • exp_id (str): Experience ID to delete

Returns:

  • success (bool): True if deleted, False if ID not found

Example:

manager.delete("G5")  # Returns True if G5 exists


modify()

Modify existing experience.

def modify(self, exp_id: str, new_experience: str) -> bool

Parameters:

  • exp_id (str): Experience ID to modify
  • new_experience (str): New experience text

Returns:

  • success (bool): True if modified, False if ID not found

Example:

manager.modify("G17", "Updated experience text...")


merge()

Merge multiple experiences into one.

def merge(self, exp_ids: List[str], merged_experience: str) -> str

Parameters:

  • exp_ids (List[str]): List of experience IDs to merge
  • merged_experience (str): Merged experience text

Returns:

  • new_exp_id (str): ID of newly created merged experience

Side Effects: Deletes the original experiences

Example:

new_id = manager.merge(
    ["G1", "G3", "G7"],
    "Combined guidance for polynomial equations..."
)
# G1, G3, G7 deleted; new experience created


apply_operations()

Apply batch of operations from LLM.

def apply_operations(self, operations: List[Dict]) -> None

Parameters:

  • operations (List[Dict]): List of operation dictionaries with keys:
      • "option": One of ["add", "modify", "delete", "merge", "keep"]
      • "experience": New/modified experience text (for add/modify/merge)
      • "modified_from": Original experience ID (for modify)
      • "delete_id": Experience ID to delete (for delete)
      • "merged_from": List of experience IDs to merge (for merge)

Example:

operations = [
    {"option": "add", "experience": "When solving..."},
    {"option": "modify", "experience": "Updated...", "modified_from": "G17"},
    {"option": "delete", "delete_id": "G5"},
    {"option": "merge", "experience": "Merged...", "merged_from": ["G1", "G3"]}
]
manager.apply_operations(operations)


format_for_prompt()

Format experiences for injection into prompts.

def format_for_prompt(self) -> str

Returns:

  • formatted (str): Multi-line string with numbered experiences

Example Output:

[G0]. When solving equations, verify solutions by substitution.
[G1]. For optimization, check boundary conditions first.
[G2]. In probability, use conditional probability when dependent.

Example:

experiences_str = manager.format_for_prompt()
prompt = f"Helpful experiences:\n{experiences_str}\n\nQuery: {query}"


save() / load()

Persist experiences to/from JSON file.

def save(self, path: str) -> None
def load(self, path: str) -> None

Parameters:

  • path (str): File path for JSON persistence

File Format:

{
  "experiences": {
    "G0": "Experience text...",
    "G1": "Another experience...",
    ...
  },
  "next_id": 42
}

Example:

manager.save("./experiences.json")

# Later...
manager2 = ExperienceManager()
manager2.load("./experiences.json")
assert len(manager2) == len(manager)


len()

Get number of experiences.

def __len__(self) -> int

Example:

print(f"Library contains {len(manager)} experiences")


SemanticExtractor

Location: src/gym/train/grpo/semantic_extractor.py

Extracts semantic advantages from trajectory groups using a 3-stage LLM process.

Class Definition

class SemanticExtractor:
    """Extracts semantic advantages from groups of trajectories."""

    def __init__(self, llm_client: Any, max_operations: int = 3)

Parameters

  • llm_client: LLM client with .chat() method (e.g., LLMClient, OpenAI)
  • max_operations (int, default=3): Max operations per group critique

Methods

summarize_trajectory()

Stage 1: Summarize a single trajectory step-by-step.

def summarize_trajectory(
    self,
    trajectory: Trajectory,
    use_groundtruth: bool = True
) -> str

Parameters:

  • trajectory (Trajectory): Trajectory to summarize, with fields:
      • query: Input query
      • output: Model-generated output
      • reward: Numerical reward (e.g., 0 for wrong, 1 for correct)
      • groundtruth: Optional ground truth answer
  • use_groundtruth (bool, default=True): Include ground truth in prompt

Returns:

  • summary (str): Step-by-step analysis

Example:

from gym.train.grpo.semantic_extractor import Trajectory, SemanticExtractor, LLMClient

llm = LLMClient(api_key="sk-xxx")
extractor = SemanticExtractor(llm)

traj = Trajectory(
    query="Solve: x² + 2x + 5 = 0",
    output="Using quadratic formula... discriminant = -16, no real solutions",
    reward=1.0,
    groundtruth="No real solutions"
)

summary = extractor.summarize_trajectory(traj)
print(summary)
# Output:
# 1. Applied quadratic formula with a=1, b=2, c=5
# 2. Calculated discriminant: b²-4ac = 4-20 = -16
# 3. Correctly concluded no real solutions (discriminant < 0)


extract_group_advantage()

Stage 2: Extract semantic advantage from group of trajectories.

def extract_group_advantage(
    self,
    trajectories: List[Trajectory],
    experiences: str,
    use_groundtruth: bool = True
) -> List[Dict]

Parameters:

  • trajectories (List[Trajectory]): G trajectories for the same query
  • experiences (str): Formatted experience library (from format_for_prompt())
  • use_groundtruth (bool, default=True): Include ground truth in prompt

Returns:

  • operations (List[Dict]): List of operations to apply

Example:

trajectories = [
    Trajectory(query="Solve x²+2x+5=0", output="...", reward=1.0),
    Trajectory(query="Solve x²+2x+5=0", output="...", reward=0.0),
    ...  # 5 total trajectories
]

experiences_str = manager.format_for_prompt()

operations = extractor.extract_group_advantage(
    trajectories,
    experiences_str,
    use_groundtruth=True
)

# Returns:
# [
#   {"option": "add", "experience": "For quadratic equations, check discriminant..."},
#   {"option": "modify", "experience": "...", "modified_from": "G17"}
# ]

Note: Returns an empty list ([]) if the group has no reward variation (all correct or all wrong).
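
Callers batching many groups can use the same check to skip the LLM call for degenerate groups. A minimal caller-side guard (a sketch, not part of the API):

rewards = [t.reward for t in trajectories]
if len(set(rewards)) <= 1:
    operations = []  # degenerate group: no learning signal
else:
    operations = extractor.extract_group_advantage(trajectories, experiences_str)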


consolidate_batch()

Stage 3: Consolidate all group operations into final updates.

def consolidate_batch(
    self,
    all_group_operations: List[List[Dict]],
    experiences: str
) -> List[Dict]

Parameters:

  • all_group_operations (List[List[Dict]]): Operations from each group in the batch
  • experiences (str): Current formatted experience library

Returns:

  • final_operations (List[Dict]): Consolidated operations

Example:

all_ops = [
    [{"option": "add", "experience": "..."}],  # Group 1
    [{"option": "modify", "experience": "...", "modified_from": "G17"}],  # Group 2
    ...
]

final_ops = extractor.consolidate_batch(all_ops, experiences_str)

# Returns merged/consolidated operations:
# [
#   {"option": "merge", "experience": "...", "merged_from": ["G3", "G17"]},
#   {"option": "delete", "delete_id": "G5"}
# ]


Trajectory

Location: src/gym/train/grpo/semantic_extractor.py

Data class representing a single rollout trajectory.

Definition

@dataclass
class Trajectory:
    """Single rollout trajectory."""
    query: str                        # Input query/problem
    output: str                       # Model-generated output
    reward: float                     # Numerical reward (0-1)
    groundtruth: Optional[str] = None # Ground truth answer
    summary: Optional[str] = None     # LLM-generated summary

Example

from gym.train.grpo.semantic_extractor import Trajectory

traj = Trajectory(
    query="What is 5 + 3?",
    output="5 + 3 = 8",
    reward=1.0,
    groundtruth="8",
    summary="Added 5 and 3 to get 8 (correct)"
)

LLMClient

Location: src/gym/train/grpo/semantic_extractor.py

Simple wrapper for LLM API clients with a unified .chat() interface.

Class Definition

class LLMClient:
    """Simple wrapper for LLM API clients."""

    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.deepseek.com/v1",
        model: str = "deepseek-chat"
    )

Parameters

  • api_key (str): API key for LLM service
  • base_url (str, default="https://api.deepseek.com/v1"): Base URL for the API endpoint
  • model (str, default="deepseek-chat"): Model name

Methods

chat()

Send chat request to LLM.

def chat(
    self,
    prompt: str,
    temperature: float = 0.7,
    max_tokens: int = 4096
) -> str

Parameters:

  • prompt (str): User prompt
  • temperature (float, default=0.7): Sampling temperature
  • max_tokens (int, default=4096): Max tokens to generate

Returns:

  • response (str): LLM response text

Example:

llm = LLMClient(api_key="sk-xxx", model="deepseek-chat")
response = llm.chat("What is 2 + 2?")
print(response)  # "4"


APIModelAdapter

Location: src/gym/train/grpo/api_model_adapter.py

Adapter for using cloud-hosted models (DeepSeek, OpenAI) in Continuous Learning.

Class Definition

class APIModelAdapter:
    """Adapter for API-hosted models."""

    def __init__(self, config: APIModelConfig)

Parameters

  • config (APIModelConfig): Configuration with fields:
      • api_key: API key
      • base_url: API endpoint URL
      • model: Model name
      • temperature: Sampling temperature
      • max_tokens: Max tokens
      • top_p: Nucleus sampling parameter
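
A fuller construction that sets the sampling fields explicitly (values are illustrative, assuming APIModelConfig takes these fields as keyword arguments):

config = APIModelConfig(
    api_key="sk-xxx",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat",
    temperature=0.7,
    max_tokens=4096,
    top_p=0.95
)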

Methods

generate()

Generate response from API model.

def generate(
    self,
    prompt: str,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    system_prompt: Optional[str] = None
) -> str

Parameters:

  • prompt (str): User prompt
  • temperature (float, optional): Override config temperature
  • max_tokens (int, optional): Override config max_tokens
  • system_prompt (str, optional): System prompt

Returns:

  • response (str): Generated text

Example:

from gym.train.grpo.api_model_adapter import APIModelConfig, APIModelAdapter

config = APIModelConfig(
    api_key="sk-xxx",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat"
)
adapter = APIModelAdapter(config)

response = adapter.generate(
    "Solve: x² + 2x + 5 = 0",
    system_prompt="You are a math tutor."
)


generate_with_experiences()

Generate response with experiences injected into prompt.

def generate_with_experiences(
    self,
    query: str,
    experiences: str,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None
) -> str

Parameters:

  • query (str): Problem/query to solve
  • experiences (str): Formatted experience library
  • temperature (float, optional): Sampling temperature
  • max_tokens (int, optional): Max tokens

Returns:

  • response (str): Generated solution

Example:

experiences = manager.format_for_prompt()
response = adapter.generate_with_experiences(
    query="Solve: x² + 2x + 5 = 0",
    experiences=experiences
)


DeepSeekAdapter

Convenience wrapper for DeepSeek models.

Class Definition

class DeepSeekAdapter(APIModelAdapter):
    """Convenience wrapper for DeepSeek models."""

    def __init__(
        self,
        api_key: str,
        model: str = "deepseek-chat",
        temperature: float = 0.7
    )

Supported Models:

  • deepseek-chat: DeepSeek-V3 (recommended)
  • deepseek-reasoner: DeepSeek-R1 (returns reasoning traces)

Example:

from gym.train.grpo.api_model_adapter import DeepSeekAdapter

adapter = DeepSeekAdapter(api_key="sk-xxx")
response = adapter.generate("What is 2+2?")


OpenAIAdapter

Convenience wrapper for OpenAI models.

Class Definition

class OpenAIAdapter(APIModelAdapter):
    """Convenience wrapper for OpenAI models."""

    def __init__(
        self,
        api_key: str,
        model: str = "gpt-4o-mini",
        temperature: float = 0.7
    )

Supported Models:

  • gpt-4o: Latest GPT-4 Omni
  • gpt-4o-mini: Smaller, faster, cheaper
  • o1-preview: Reasoning model

Example:

from gym.train.grpo.api_model_adapter import OpenAIAdapter

adapter = OpenAIAdapter(api_key="sk-xxx", model="gpt-4o-mini")
response = adapter.generate("What is 2+2?")


Complete Workflow Example

from gym.train.grpo.experience_manager import ExperienceManager
from gym.train.grpo.semantic_extractor import SemanticExtractor, LLMClient, Trajectory
from gym.train.grpo.api_model_adapter import DeepSeekAdapter

# 1. Initialize components
experience_manager = ExperienceManager(checkpoint_path="./experiences.json")
llm_client = LLMClient(api_key="sk-xxx")
extractor = SemanticExtractor(llm_client, max_operations=3)
model_adapter = DeepSeekAdapter(api_key="sk-xxx")

# 2. Training data
queries = ["Solve: x² + 2x + 5 = 0", ...]
groundtruths = ["No real solutions", ...]

# 3. Training loop (1 epoch)
for query, gt in zip(queries, groundtruths):
    # Generate G rollouts
    trajectories = []
    experiences = experience_manager.format_for_prompt()

    for _ in range(5):  # G=5
        output = model_adapter.generate_with_experiences(query, experiences)
        reward = 1.0 if "no real" in output.lower() else 0.0  # toy string-match reward

        traj = Trajectory(query, output, reward, gt)
        trajectories.append(traj)

    # Stage 1: Summarize
    for traj in trajectories:
        traj.summary = extractor.summarize_trajectory(traj)

    # Stage 2: Extract advantages
    operations = extractor.extract_group_advantage(
        trajectories, experiences, use_groundtruth=True
    )

    # Apply operations
    if operations:
        experience_manager.apply_operations(operations)

# 4. Save
experience_manager.save("./experiences_final.json")
print(f"Final library size: {len(experience_manager)}")

Performance Optimization

Memory Compression

For large experience libraries (100+ experiences), one option is embedding-based compression. The sketch below clusters experience embeddings and merges each cluster; the LLM-based merge step is left as a stub:

from typing import Dict, List

import numpy as np
from sentence_transformers import SentenceTransformer

class CompressedExperienceManager(ExperienceManager):
    """Experience manager with embedding-based compression."""

    def __init__(self, checkpoint_path=None, compression_threshold=100):
        super().__init__(checkpoint_path)
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.compression_threshold = compression_threshold
        self.embeddings: Dict[str, np.ndarray] = {}

    def add(self, experience: str) -> str:
        exp_id = super().add(experience)
        self.embeddings[exp_id] = self.embedder.encode(experience)

        # Compress once the library exceeds the threshold
        if len(self) > self.compression_threshold:
            self._compress()

        return exp_id

    def _compress(self):
        """Merge similar experiences by clustering their embeddings."""
        from sklearn.cluster import AgglomerativeClustering

        # Snapshot the IDs before clustering: merge() mutates the library,
        # so the cluster labels must refer to a fixed ordering. Assumes every
        # experience was added via add(); checkpoint-loaded entries would
        # need re-embedding first.
        exp_ids = list(self.experiences.keys())
        emb_matrix = np.array([self.embeddings[eid] for eid in exp_ids])
        clustering = AgglomerativeClustering(
            n_clusters=self.compression_threshold,
            linkage='average'
        )
        labels = clustering.fit_predict(emb_matrix)

        # Merge all experiences that landed in the same cluster
        for cluster_id in set(labels):
            cluster_ids = [
                exp_id for exp_id, label in zip(exp_ids, labels)
                if label == cluster_id
            ]

            if len(cluster_ids) > 1:
                merged_text = self._merge_experiences(cluster_ids)
                new_id = self.merge(cluster_ids, merged_text)
                # Keep the embedding index in sync with the library
                for eid in cluster_ids:
                    self.embeddings.pop(eid, None)
                self.embeddings[new_id] = self.embedder.encode(merged_text)

    def _merge_experiences(self, exp_ids: List[str]) -> str:
        """Use an LLM to merge similar experiences into one statement."""
        # Implementation depends on the LLM client
        raise NotImplementedError

Caching

Cache LLM responses for repeated queries:

import hashlib

class CachedSemanticExtractor(SemanticExtractor):
    """Semantic extractor with an LRU summary cache."""

    def __init__(self, llm_client, max_operations=3, cache_size=128):
        super().__init__(llm_client, max_operations)
        self._summarize_cache = {}  # dicts preserve insertion order (Python 3.7+)
        self._cache_size = cache_size

    def summarize_trajectory(self, trajectory, use_groundtruth=True):
        # Cache key covers every input that influences the summary
        key = hashlib.md5(
            f"{trajectory.query}|{trajectory.output}|{trajectory.reward}|{use_groundtruth}".encode()
        ).hexdigest()

        # Cache hit: re-insert the entry to mark it most recently used
        if key in self._summarize_cache:
            summary = self._summarize_cache.pop(key)
            self._summarize_cache[key] = summary
            return summary

        # Cache miss: compute via the parent class
        summary = super().summarize_trajectory(trajectory, use_groundtruth)

        # Evict the least recently used entry when the cache is full
        if len(self._summarize_cache) >= self._cache_size:
            self._summarize_cache.pop(next(iter(self._summarize_cache)))

        self._summarize_cache[key] = summary
        return summary
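
Usage is identical to SemanticExtractor; repeated summaries of the same trajectory are served from the cache:

cached = CachedSemanticExtractor(llm_client, cache_size=256)
summary = cached.summarize_trajectory(traj)        # LLM call
summary_again = cached.summarize_trajectory(traj)  # cache hit, no LLM call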

Error Handling

Graceful Degradation

class RobustSemanticExtractor(SemanticExtractor):
    """Semantic extractor with error handling."""

    def extract_group_advantage(self, trajectories, experiences, use_groundtruth=True):
        try:
            return super().extract_group_advantage(
                trajectories, experiences, use_groundtruth
            )
        except Exception as e:
            print(f"Group advantage extraction failed: {e}")
            # Return empty operations (skip this group)
            return []

    def consolidate_batch(self, all_group_operations, experiences):
        try:
            return super().consolidate_batch(all_group_operations, experiences)
        except Exception as e:
            print(f"Batch consolidation failed: {e}")
            # Return flattened operations (no consolidation)
            return [op for group_ops in all_group_operations for op in group_ops]
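
The robust extractor is a drop-in replacement; a failed LLM call degrades to "no update" instead of aborting training:

extractor = RobustSemanticExtractor(llm_client)
ops = extractor.extract_group_advantage(trajectories, experiences)  # [] on failure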

Summary

The Semantic Memory API provides:

  • ExperienceManager: CRUD operations, persistence
  • SemanticExtractor: 3-stage LLM extraction
  • APIModelAdapter: Cloud model integration
  • Trajectory: Data structure for rollouts
  • LLMClient: Unified LLM interface

Key Design Principles:

  • Simple, composable interfaces
  • Minimal dependencies
  • Human-readable data formats (JSON)
  • Extensible via subclassing

Next Steps:

  • Read the Main Documentation
  • Try the Tutorials
  • Explore the Examples


API Reference last updated: October 28, 2025 · Gym v0.9.4 · Zoo Labs Foundation Inc