Semantic Memory API Reference¶
Overview¶
The Semantic Memory system powers Continuous Learning GRPO by managing experience libraries: collections of natural-language insights that guide model behavior. This API reference documents the core classes and methods.
Module Structure¶
src/gym/train/grpo/
├── experience_manager.py # CRUD operations on experience library
├── semantic_extractor.py # 3-stage LLM extraction process
└── api_model_adapter.py # Cloud model adapters (DeepSeek, OpenAI)
ExperienceManager¶
Location: src/gym/train/grpo/experience_manager.py
Manages the experience library E, providing CRUD operations and persistence.
Class Definition¶
class ExperienceManager:
    """Manages the experience library E for Continuous Learning GRPO."""

    def __init__(self, checkpoint_path: str = None)
Parameters¶
- checkpoint_path (str, optional): Path to load/save experiences (JSON format). If provided and the file exists, experiences are loaded automatically.
Attributes¶
- experiences (Dict[str, str]): Dictionary mapping experience IDs to text
- _next_id (int): Counter for generating unique IDs
Methods¶
add()¶
Add new experience to library.
Parameters: - experience (str): Natural language experience statement (≤32 words recommended)
Returns: - exp_id (str): Assigned experience ID (format: "G{N}")
Example:
manager = ExperienceManager()
exp_id = manager.add("When solving equations, verify solutions by substitution.")
print(exp_id) # "G0"
delete()¶
Delete experience by ID.
Parameters: - exp_id (str): Experience ID to delete
Returns: - success (bool): True if deleted, False if ID not found
Example:
success = manager.delete("G5")
print(success) # True if "G5" existed, False otherwise
modify()¶
Modify existing experience.
Parameters:
- exp_id (str): Experience ID to modify
- new_experience (str): New experience text
Returns: - success (bool): True if modified, False if ID not found
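Example:
success = manager.modify("G17", "Updated experience text...")
print(success) # True if "G17" existed, False otherwise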
merge()¶
Merge multiple experiences into one.
Parameters:
- exp_ids (List[str]): List of experience IDs to merge
- merged_experience (str): Merged experience text
Returns: - new_exp_id (str): ID of newly created merged experience
Side Effects: Deletes the original experiences
Example:
new_id = manager.merge(
["G1", "G3", "G7"],
"Combined guidance for polynomial equations..."
)
# G1, G3, G7 deleted; new experience created
apply_operations()¶
Apply batch of operations from LLM.
Parameters:
- operations (List[Dict]): List of operation dictionaries with keys:
  - "option": One of ["add", "modify", "delete", "merge", "keep"]
  - "experience": New/modified experience text (for add/modify/merge)
  - "modified_from": Original experience ID (for modify)
  - "delete_id": Experience ID to delete (for delete)
  - "merged_from": List of experience IDs to merge (for merge)
Example:
operations = [
{"option": "add", "experience": "When solving..."},
{"option": "modify", "experience": "Updated...", "modified_from": "G17"},
{"option": "delete", "delete_id": "G5"},
{"option": "merge", "experience": "Merged...", "merged_from": ["G1", "G3"]}
]
manager.apply_operations(operations)
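The operation dictionaries above map onto the CRUD methods one-to-one. The following is a minimal sketch of such a dispatcher, not the library's implementation: it works on a plain dict, the helper name `apply_operations_sketch` is hypothetical, and skipping unknown IDs silently is an assumption about how malformed LLM output might be tolerated.

```python
from typing import Dict, List, Tuple


def apply_operations_sketch(
    experiences: Dict[str, str],
    next_id: int,
    operations: List[dict],
) -> Tuple[Dict[str, str], int]:
    """Apply add/modify/delete/merge/keep operations to a copy of the library.

    Hypothetical stand-in for ExperienceManager.apply_operations(); unknown
    IDs and malformed operations are skipped rather than raising.
    """
    library = dict(experiences)
    for op in operations:
        option = op.get("option")
        text = op.get("experience")
        if option == "add" and text:
            library[f"G{next_id}"] = text
            next_id += 1
        elif option == "modify" and text and op.get("modified_from") in library:
            library[op["modified_from"]] = text
        elif option == "delete":
            library.pop(op.get("delete_id"), None)
        elif option == "merge" and text:
            sources = [i for i in op.get("merged_from", []) if i in library]
            if sources:
                # Originals are removed and the merged text gets a fresh ID
                for exp_id in sources:
                    del library[exp_id]
                library[f"G{next_id}"] = text
                next_id += 1
        # "keep" (and anything unrecognized) leaves the library unchanged
    return library, next_id


library, next_id = apply_operations_sketch(
    {"G0": "a", "G1": "b", "G2": "c"},
    3,
    [
        {"option": "add", "experience": "d"},
        {"option": "modify", "experience": "a2", "modified_from": "G0"},
        {"option": "merge", "experience": "bc", "merged_from": ["G1", "G2"]},
        {"option": "keep"},
    ],
)
```

Returning a new dict rather than mutating in place makes a batch easy to discard if a later validation step rejects it.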
format_for_prompt()¶
Format experiences for injection into prompts.
Returns: - formatted (str): Multi-line string with numbered experiences
Example Output:
[G0]. When solving equations, verify solutions by substitution.
[G1]. For optimization, check boundary conditions first.
[G2]. In probability, use conditional probability when dependent.
Example:
experiences_str = manager.format_for_prompt()
prompt = f"Helpful experiences:\n{experiences_str}\n\nQuery: {query}"
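The output format shown above can be reproduced with a one-line join. This is a sketch under the assumption that the library is an ordinary dict of ID to text; `format_for_prompt_sketch` is a hypothetical name, and the real method may order or filter entries differently.

```python
from typing import Dict


def format_for_prompt_sketch(experiences: Dict[str, str]) -> str:
    """Render each experience as "[ID]. text", one per line, in insertion order."""
    return "\n".join(f"[{exp_id}]. {text}" for exp_id, text in experiences.items())


formatted = format_for_prompt_sketch(
    {
        "G0": "When solving equations, verify solutions by substitution.",
        "G1": "For optimization, check boundary conditions first.",
    }
)
prompt = f"Helpful experiences:\n{formatted}\n\nQuery: Solve: x + 1 = 3"
```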
save() / load()¶
Persist experiences to/from JSON file.
Parameters: - path (str): File path for JSON persistence
File Format:
{
"experiences": {
"G0": "Experience text...",
"G1": "Another experience...",
...
},
"next_id": 42
}
Example:
manager.save("./experiences.json")
# Later...
manager2 = ExperienceManager()
manager2.load("./experiences.json")
assert len(manager2) == len(manager)
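For readers who want to inspect or produce a checkpoint outside the library, the file format above round-trips with the standard `json` module. The helper names `save_library` and `load_library` are hypothetical, and the sketch assumes both top-level keys are always present.

```python
import json
import os
import tempfile
from typing import Dict, Tuple


def save_library(path: str, experiences: Dict[str, str], next_id: int) -> None:
    """Write the library in the documented {"experiences": ..., "next_id": ...} layout."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"experiences": experiences, "next_id": next_id}, f,
                  indent=2, ensure_ascii=False)


def load_library(path: str) -> Tuple[Dict[str, str], int]:
    """Read a checkpoint back; raises KeyError if a required key is missing."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["experiences"], data["next_id"]


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "experiences.json")
    save_library(checkpoint, {"G0": "Check the discriminant b²-4ac first."}, 1)
    loaded_experiences, loaded_next_id = load_library(checkpoint)
```

Persisting `next_id` alongside the entries keeps IDs from being reused after deletions, which matters because operations refer to experiences by ID.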
__len__()¶
Get the number of experiences in the library; called through the built-in len().
Example:
count = len(manager)
SemanticExtractor¶
Location: src/gym/train/grpo/semantic_extractor.py
Extracts semantic advantages from trajectory groups using a 3-stage LLM process.
Class Definition¶
class SemanticExtractor:
    """Extracts semantic advantages from groups of trajectories."""

    def __init__(self, llm_client: Any, max_operations: int = 3)
Parameters¶
- llm_client: LLM client with a .chat() method (e.g., LLMClient, OpenAI)
- max_operations (int, default=3): Max operations per group critique
Methods¶
summarize_trajectory()¶
Stage 1: Summarize a single trajectory step-by-step.
Parameters:
- trajectory (Trajectory): Trajectory to summarize, with fields:
  - query: Input query
  - output: Model-generated output
  - reward: Numerical reward (e.g., 0 for wrong, 1 for correct)
  - groundtruth: Optional ground truth answer
- use_groundtruth (bool, default=True): Include ground truth in prompt
Returns: - summary (str): Step-by-step analysis
Example:
from gym.train.grpo.semantic_extractor import Trajectory, SemanticExtractor, LLMClient
llm = LLMClient(api_key="sk-xxx")
extractor = SemanticExtractor(llm)
traj = Trajectory(
query="Solve: x² + 2x + 5 = 0",
output="Using quadratic formula... discriminant = -16, no real solutions",
reward=1.0,
groundtruth="No real solutions"
)
summary = extractor.summarize_trajectory(traj)
print(summary)
# Output:
# 1. Applied quadratic formula with a=1, b=2, c=5
# 2. Calculated discriminant: b²-4ac = 4-20 = -16
# 3. Correctly concluded no real solutions (discriminant < 0)
extract_group_advantage()¶
Stage 2: Extract semantic advantage from group of trajectories.
def extract_group_advantage(
self,
trajectories: List[Trajectory],
experiences: str,
use_groundtruth: bool = True
) -> List[Dict]
Parameters:
- trajectories (List[Trajectory]): G trajectories for the same query
- experiences (str): Formatted experience library (from format_for_prompt())
- use_groundtruth (bool, default=True): Include ground truth in prompt
Returns: - operations (List[Dict]): List of operations to apply
Example:
trajectories = [
Trajectory(query="Solve x²+2x+5=0", output="...", reward=1.0),
Trajectory(query="Solve x²+2x+5=0", output="...", reward=0.0),
... # 5 total trajectories
]
experiences_str = manager.format_for_prompt()
operations = extractor.extract_group_advantage(
trajectories,
experiences_str,
use_groundtruth=True
)
# Returns:
# [
# {"option": "add", "experience": "For quadratic equations, check discriminant..."},
# {"option": "modify", "experience": "...", "modified_from": "G17"}
# ]
Note: Returns empty list [] if group has no variation (all correct or all wrong).
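The note above implies a cheap pre-check that avoids an LLM call for groups whose rewards are all equal. A sketch of that check follows; the function name and the tolerance parameter are assumptions, not part of the documented API.

```python
from typing import List


def has_reward_variation(rewards: List[float], tol: float = 1e-9) -> bool:
    """Return True when a group mixes better and worse rollouts.

    A group whose rewards are all (nearly) equal carries no contrast to learn
    from, so the caller can skip extraction and use an empty operation list.
    """
    return bool(rewards) and (max(rewards) - min(rewards)) > tol


mixed = has_reward_variation([1.0, 0.0, 1.0, 0.0, 1.0])
all_correct = has_reward_variation([1.0, 1.0, 1.0])
all_wrong = has_reward_variation([0.0, 0.0])
```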
consolidate_batch()¶
Stage 3: Consolidate all group operations into final updates.
def consolidate_batch(
self,
all_group_operations: List[List[Dict]],
experiences: str
) -> List[Dict]
Parameters:
- all_group_operations (List[List[Dict]]): Operations from each group in the batch
- experiences (str): Current formatted experience library
Returns: - final_operations (List[Dict]): Consolidated operations
Example:
all_ops = [
[{"option": "add", "experience": "..."}], # Group 1
[{"option": "modify", "experience": "...", "modified_from": "G17"}], # Group 2
...
]
final_ops = extractor.consolidate_batch(all_ops, experiences_str)
# Returns merged/consolidated operations:
# [
# {"option": "merge", "experience": "...", "merged_from": ["G3", "G17"]},
# {"option": "delete", "delete_id": "G5"}
# ]
Trajectory¶
Location: src/gym/train/grpo/semantic_extractor.py
Data class representing a single rollout trajectory.
Definition¶
@dataclass
class Trajectory:
    """Single rollout trajectory."""

    query: str  # Input query/problem
    output: str  # Model-generated output
    reward: float  # Numerical reward (0-1)
    groundtruth: Optional[str] = None  # Ground truth answer
    summary: Optional[str] = None  # LLM-generated summary
Example¶
from gym.train.grpo.semantic_extractor import Trajectory
traj = Trajectory(
query="What is 5 + 3?",
output="5 + 3 = 8",
reward=1.0,
groundtruth="8",
summary="Added 5 and 3 to get 8 (correct)"
)
LLMClient¶
Location: src/gym/train/grpo/semantic_extractor.py
Simple wrapper for LLM API clients with unified .chat() interface.
Class Definition¶
class LLMClient:
    """Simple wrapper for LLM API clients."""

    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.deepseek.com/v1",
        model: str = "deepseek-chat"
    )
Parameters¶
- api_key (str): API key for LLM service
- base_url (str, default="https://api.deepseek.com/v1"): Base URL for API endpoint
- model (str, default="deepseek-chat"): Model name
Methods¶
chat()¶
Send chat request to LLM.
Parameters:
- prompt (str): User prompt
- temperature (float, default=0.7): Sampling temperature
- max_tokens (int, default=4096): Max tokens to generate
Returns: - response (str): LLM response text
Example:
llm = LLMClient(api_key="sk-xxx", model="deepseek-chat")
response = llm.chat("What is 2 + 2?")
print(response) # "4"
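Because SemanticExtractor only requires an object with a `.chat()` method, an offline stub with the same signature can stand in for LLMClient in unit tests. The class below is a hypothetical test double, not part of the library; it returns a canned string and records each call.

```python
from typing import List, Tuple


class StubLLMClient:
    """Offline stand-in exposing the .chat() signature documented above."""

    def __init__(self, canned_response: str = "[]"):
        self.canned_response = canned_response
        # Each entry is (prompt, temperature, max_tokens) for later inspection
        self.calls: List[Tuple[str, float, int]] = []

    def chat(self, prompt: str, temperature: float = 0.7, max_tokens: int = 4096) -> str:
        self.calls.append((prompt, temperature, max_tokens))
        return self.canned_response


stub = StubLLMClient(canned_response="1. Applied the quadratic formula.")
reply = stub.chat("Summarize this trajectory.", temperature=0.0)
```

Recording calls makes it possible to assert on the prompts an extractor builds without spending API credits.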
APIModelAdapter¶
Location: src/gym/train/grpo/api_model_adapter.py
Adapter for using cloud-hosted models (DeepSeek, OpenAI) in Continuous Learning.
Class Definition¶
class APIModelAdapter:
    """Adapter for API-hosted models."""

    def __init__(self, config: APIModelConfig)
Parameters¶
- config (APIModelConfig): Configuration with fields:
  - api_key: API key
  - base_url: API endpoint URL
  - model: Model name
  - temperature: Sampling temperature
  - max_tokens: Max tokens
  - top_p: Nucleus sampling parameter
Methods¶
generate()¶
Generate response from API model.
def generate(
self,
prompt: str,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
system_prompt: Optional[str] = None
) -> str
Parameters:
- prompt (str): User prompt
- temperature (float, optional): Override config temperature
- max_tokens (int, optional): Override config max_tokens
- system_prompt (str, optional): System prompt
Returns: - response (str): Generated text
Example:
from gym.train.grpo.api_model_adapter import APIModelConfig, APIModelAdapter
config = APIModelConfig(
api_key="sk-xxx",
base_url="https://api.deepseek.com/v1",
model="deepseek-chat"
)
adapter = APIModelAdapter(config)
response = adapter.generate(
"Solve: x² + 2x + 5 = 0",
system_prompt="You are a math tutor."
)
generate_with_experiences()¶
Generate response with experiences injected into prompt.
def generate_with_experiences(
self,
query: str,
experiences: str,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None
) -> str
Parameters:
- query (str): Problem/query to solve
- experiences (str): Formatted experience library
- temperature (float, optional): Sampling temperature
- max_tokens (int, optional): Max tokens
Returns: - response (str): Generated solution
Example:
experiences = manager.format_for_prompt()
response = adapter.generate_with_experiences(
query="Solve: x² + 2x + 5 = 0",
experiences=experiences
)
DeepSeekAdapter¶
Convenience wrapper for DeepSeek models.
Class Definition¶
class DeepSeekAdapter(APIModelAdapter):
    """Convenience wrapper for DeepSeek models."""

    def __init__(
        self,
        api_key: str,
        model: str = "deepseek-chat",
        temperature: float = 0.7
    )
Supported Models:
- deepseek-chat: DeepSeek-V3 (recommended)
- deepseek-reasoner: DeepSeek-V3 with reasoning traces
Example:
from gym.train.grpo.api_model_adapter import DeepSeekAdapter
adapter = DeepSeekAdapter(api_key="sk-xxx")
response = adapter.generate("What is 2+2?")
OpenAIAdapter¶
Convenience wrapper for OpenAI models.
Class Definition¶
class OpenAIAdapter(APIModelAdapter):
    """Convenience wrapper for OpenAI models."""

    def __init__(
        self,
        api_key: str,
        model: str = "gpt-4o-mini",
        temperature: float = 0.7
    )
Supported Models:
- gpt-4o: Latest GPT-4 Omni
- gpt-4o-mini: Smaller, faster, cheaper
- o1-preview: Reasoning model
Example:
from gym.train.grpo.api_model_adapter import OpenAIAdapter
adapter = OpenAIAdapter(api_key="sk-xxx", model="gpt-4o-mini")
response = adapter.generate("What is 2+2?")
Complete Workflow Example¶
from gym.train.grpo.experience_manager import ExperienceManager
from gym.train.grpo.semantic_extractor import SemanticExtractor, LLMClient, Trajectory
from gym.train.grpo.api_model_adapter import DeepSeekAdapter

# 1. Initialize components
experience_manager = ExperienceManager(checkpoint_path="./experiences.json")
llm_client = LLMClient(api_key="sk-xxx")
extractor = SemanticExtractor(llm_client, max_operations=3)
model_adapter = DeepSeekAdapter(api_key="sk-xxx")

# 2. Training data
queries = ["Solve: x² + 2x + 5 = 0", ...]
groundtruths = ["No real solutions", ...]

# 3. Training loop (1 epoch)
for query, gt in zip(queries, groundtruths):
    # Generate G rollouts
    trajectories = []
    experiences = experience_manager.format_for_prompt()
    for _ in range(5):  # G=5
        output = model_adapter.generate_with_experiences(query, experiences)
        reward = 1.0 if "no real" in output.lower() else 0.0
        traj = Trajectory(query, output, reward, gt)
        trajectories.append(traj)

    # Stage 1: Summarize
    for traj in trajectories:
        traj.summary = extractor.summarize_trajectory(traj)

    # Stage 2: Extract advantages
    operations = extractor.extract_group_advantage(
        trajectories, experiences, use_groundtruth=True
    )

    # Apply operations
    if operations:
        experience_manager.apply_operations(operations)

# 4. Save
experience_manager.save("./experiences_final.json")
print(f"Final library size: {len(experience_manager)}")
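The loop above applies Stage 2 operations group by group and never calls `consolidate_batch()`. One way Stage 3 could be wired in is to collect operations for a whole batch and apply the consolidated result once. The sketch below assumes only the documented method names; `update_library_for_batch`, `FakeExtractor`, and `FakeManager` are hypothetical, and the fakes exist only so the sketch runs without an LLM.

```python
from typing import Any, Dict, List


def update_library_for_batch(extractor: Any, manager: Any, groups: List[list]) -> List[Dict]:
    """Run Stage 2 per group, Stage 3 once per batch, then apply the result."""
    experiences = manager.format_for_prompt()
    all_group_operations = []
    for trajectories in groups:
        ops = extractor.extract_group_advantage(
            trajectories, experiences, use_groundtruth=True
        )
        if ops:  # groups without reward variation yield []
            all_group_operations.append(ops)
    if not all_group_operations:
        return []
    final_ops = extractor.consolidate_batch(all_group_operations, experiences)
    manager.apply_operations(final_ops)
    return final_ops


class FakeExtractor:
    """Hypothetical stand-in: treats each group as a bare list of rewards."""

    def extract_group_advantage(self, trajectories, experiences, use_groundtruth=True):
        if len(set(trajectories)) < 2:
            return []
        return [{"option": "add", "experience": f"insight from {len(trajectories)} rollouts"}]

    def consolidate_batch(self, all_group_operations, experiences):
        return [op for ops in all_group_operations for op in ops]


class FakeManager:
    """Hypothetical stand-in that records the operations it receives."""

    def __init__(self):
        self.applied = []

    def format_for_prompt(self):
        return ""

    def apply_operations(self, operations):
        self.applied.extend(operations)


manager = FakeManager()
final_ops = update_library_for_batch(
    FakeExtractor(), manager, [[1.0, 0.0, 1.0], [1.0, 1.0]]
)
```

Every group in a batch sees the same snapshot of the library, so the order in which groups are processed does not affect the rollouts.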
Performance Optimization¶
Memory Compression¶
For large experience libraries (100+ experiences), use embedding-based compression:
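from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering


class CompressedExperienceManager(ExperienceManager):
    """Experience manager with embedding-based compression."""

    def __init__(self, checkpoint_path=None, compression_threshold=100):
        super().__init__(checkpoint_path)
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.compression_threshold = compression_threshold
        self.embeddings = {}
        self._compressing = False  # guards against re-entrant compression
        self._sync_embeddings()  # embed experiences loaded from the checkpoint

    def _sync_embeddings(self):
        """Keep self.embeddings aligned with self.experiences."""
        for exp_id in list(self.embeddings):
            if exp_id not in self.experiences:
                del self.embeddings[exp_id]
        for exp_id, text in self.experiences.items():
            if exp_id not in self.embeddings:
                self.embeddings[exp_id] = self.embedder.encode(text)

    def add(self, experience: str) -> str:
        exp_id = super().add(experience)
        self.embeddings[exp_id] = self.embedder.encode(experience)
        # Compress if threshold exceeded (skipped while a merge is in progress)
        if not self._compressing and len(self) > self.compression_threshold:
            self._compress()
        return exp_id

    def _compress(self):
        """Merge similar experiences."""
        self._compressing = True
        try:
            # Fix one ID order so cluster labels line up with experience IDs
            exp_ids = list(self.experiences.keys())
            emb_matrix = np.array([self.embeddings[i] for i in exp_ids])
            # Cluster experiences
            clustering = AgglomerativeClustering(
                n_clusters=self.compression_threshold,
                linkage='average'
            )
            labels = clustering.fit_predict(emb_matrix)
            # Merge within clusters
            for cluster_id in set(labels):
                cluster_ids = [
                    exp_id for exp_id, label in zip(exp_ids, labels)
                    if label == cluster_id
                ]
                if len(cluster_ids) > 1:
                    # Merge similar experiences
                    merged_text = self._merge_experiences(cluster_ids)
                    self.merge(cluster_ids, merged_text)
            # Drop embeddings of deleted entries and embed any new ones
            self._sync_embeddings()
        finally:
            self._compressing = False

    def _merge_experiences(self, exp_ids: List[str]) -> str:
        """Use LLM to merge similar experiences."""
        # Implementation depends on LLM client; override in a subclass
        raise NotImplementedError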
Caching¶
Cache LLM responses for repeated queries:
import hashlib
from collections import OrderedDict


class CachedSemanticExtractor(SemanticExtractor):
    """Semantic extractor with LRU cache."""

    def __init__(self, llm_client, max_operations=3, cache_size=128):
        super().__init__(llm_client, max_operations)
        self._summarize_cache = OrderedDict()
        self._cache_size = cache_size

    def summarize_trajectory(self, trajectory, use_groundtruth=True):
        # Create cache key from every input that affects the prompt;
        # repr() plus a separator keeps adjacent fields from running together
        fields = (
            trajectory.query, trajectory.output, trajectory.reward,
            trajectory.groundtruth, use_groundtruth,
        )
        key = hashlib.md5("\x1f".join(map(repr, fields)).encode()).hexdigest()
        # Check cache; a hit becomes the most recently used entry
        if key in self._summarize_cache:
            self._summarize_cache.move_to_end(key)
            return self._summarize_cache[key]
        # Compute
        summary = super().summarize_trajectory(trajectory, use_groundtruth)
        # Store in cache, evicting the least recently used entry when full
        if len(self._summarize_cache) >= self._cache_size:
            self._summarize_cache.popitem(last=False)
        self._summarize_cache[key] = summary
        return summary
Error Handling¶
Graceful Degradation¶
class RobustSemanticExtractor(SemanticExtractor):
    """Semantic extractor with error handling."""

    def extract_group_advantage(self, trajectories, experiences, use_groundtruth=True):
        try:
            return super().extract_group_advantage(
                trajectories, experiences, use_groundtruth
            )
        except Exception as e:
            print(f"Group advantage extraction failed: {e}")
            # Return empty operations (skip this group)
            return []

    def consolidate_batch(self, all_group_operations, experiences):
        try:
            return super().consolidate_batch(all_group_operations, experiences)
        except Exception as e:
            print(f"Batch consolidation failed: {e}")
            # Return flattened operations (no consolidation)
            return [op for group_ops in all_group_operations for op in group_ops]
Summary¶
The Semantic Memory API provides:
- ExperienceManager: CRUD operations, persistence
- SemanticExtractor: 3-stage LLM extraction
- APIModelAdapter: Cloud model integration
- Trajectory: Data structure for rollouts
- LLMClient: Unified LLM interface
Key Design Principles:
- Simple, composable interfaces
- Minimal dependencies
- Human-readable data formats (JSON)
- Extensible via subclassing
Next Steps: - Read Main Documentation - Try Tutorials - Explore Examples
API Reference Last Updated: October 28, 2025 Gym v0.9.4 - Zoo Labs Foundation Inc