Continuous Learning GRPO - Cross-Platform Integration Plan
Date: October 28, 2025
Goal: Integrate Continuous Learning GRPO across all Hanzo AI platforms
Platforms to Integrate
- ✅ zoo/gym - Reference implementation (DONE)
- 🔄 hanzo/python-sdk - Python client library
- 🔄 hanzo/dev - TypeScript/JS CLI (codex-based)
- 🔄 hanzo/mcp - Rust/JS MCP servers
- 🔄 shinkai/hanzo-node - Rust inference engine
- 🔄 Python CLI - Standalone tool
- 🔄 Rust CLI - Standalone tool
Integration Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                Continuous Learning GRPO Core                │
│                 (zoo/gym - reference impl)                  │
└───────────────────────┬─────────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┬───────────────┐
        │               │               │               │
        ▼               ▼               ▼               ▼
    Python SDK    hanzo-dev CLI    MCP Servers      hanzo-node
   (Client Lib)    (TS/JS CLI)      (Rust/JS)      (Inference)
        │               │               │               │
        └───────────────┼───────────────┴───────────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
        ▼               ▼               ▼
    Python CLI       Rust CLI      DeepSeek API
```
Phase 1: Python SDK (~/work/hanzo/python-sdk)
Create: pkg/hanzoai/grpo/__init__.py
```python
from .experience_manager import ExperienceManager
from .semantic_extractor import SemanticExtractor, Trajectory
from .api_adapter import DeepSeekAdapter, APIModelAdapter

__all__ = [
    "ExperienceManager",
    "SemanticExtractor",
    "Trajectory",
    "DeepSeekAdapter",
    "APIModelAdapter",
]
```
Create: pkg/hanzoai/grpo/experience_manager.py
Copy from the zoo/gym implementation (gym.train.grpo.experience_manager).
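The zoo/gym source is not reproduced in this plan, so as a reference point, here is a minimal sketch of the surface the rest of the plan relies on (add, load, format_for_prompt). It is a hypothetical stand-in, not the zoo/gym code: the G-prefixed IDs and the "[G0]. text" layout follow the TypeScript and Rust ports later in this plan, and the one-experience-per-line file format for load is an assumption.

```python
from pathlib import Path


class ExperienceManager:
    """Hypothetical stand-in: insertion-ordered experiences keyed G0, G1, ..."""

    def __init__(self) -> None:
        self._experiences: dict[str, str] = {}  # dicts keep insertion order
        self._next_id = 0

    def add(self, text: str) -> str:
        """Store one experience and return its ID."""
        exp_id = f"G{self._next_id}"
        self._next_id += 1
        self._experiences[exp_id] = text
        return exp_id

    def load(self, path: str) -> None:
        """Assumed file format: plain text, one experience per line."""
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if line.strip():
                self.add(line.strip())

    def format_for_prompt(self) -> str:
        """Render "[G0]. text" lines, the layout the TS and Rust ports use."""
        return "\n".join(
            f"[{exp_id}]. {text}" for exp_id, text in self._experiences.items()
        )


exp_mgr = ExperienceManager()
exp_mgr.add("When coding, use type hints.")
exp_mgr.add("Handle edge cases.")
print(exp_mgr.format_for_prompt())
# [G0]. When coding, use type hints.
# [G1]. Handle edge cases.
```

A dict keyed by ID keeps prompt order stable and still lets a later step look up or replace a single experience by its ID.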
Create: pkg/hanzoai/grpo/semantic_extractor.py
Copy from the zoo/gym implementation.
Create: pkg/hanzoai/grpo/api_adapter.py
Copy from the zoo/gym implementation (gym.train.grpo.api_model_adapter, renamed to api_adapter in the SDK).
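Likewise, a minimal standard-library sketch of what DeepSeekAdapter has to do, assuming the same chat-completions request the TypeScript and Rust ports send (endpoint, deepseek-chat model, temperature 0.7). The prompt template mirrors the TypeScript port; build_prompt and build_payload are hypothetical helpers split out so the request can be checked offline, and the real zoo/gym class may instead extend APIModelAdapter.

```python
import json
import urllib.request

DEEPSEEK_URL = "https://api.deepseek.com/v1/chat/completions"


def build_prompt(query: str, experiences: str) -> str:
    """Hypothetical helper: same template as the TypeScript port."""
    return (
        "Please solve the problem:\n"
        f"{query}\n"
        "When solving, use these experiences:\n"
        f"{experiences}"
    )


class DeepSeekAdapter:
    def __init__(self, api_key: str, model: str = "deepseek-chat",
                 temperature: float = 0.7) -> None:
        self.api_key = api_key
        self.model = model
        self.temperature = temperature

    def build_payload(self, prompt: str) -> dict:
        """Request body for a single-turn chat completion."""
        return {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": self.temperature,
        }

    def generate(self, prompt: str, timeout: float = 60.0) -> str:
        request = urllib.request.Request(
            DEEPSEEK_URL,
            data=json.dumps(self.build_payload(prompt)).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        # urlopen raises HTTPError for non-2xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
        return body["choices"][0]["message"]["content"]

    def generate_with_experiences(self, query: str, experiences: str) -> str:
        return self.generate(build_prompt(query, experiences))
```

Only generate touches the network; the usage example that follows exercises the full path with a real key.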
Usage Example:
```python
from hanzoai.grpo import DeepSeekAdapter, ExperienceManager

model = DeepSeekAdapter(api_key="sk-xxx")
exp_mgr = ExperienceManager()
exp_mgr.add("When coding, use type hints.")

response = model.generate_with_experiences(
    query="Write a function...",
    experiences=exp_mgr.format_for_prompt()
)
```
Phase 2: hanzo-dev CLI (~/work/hanzo/dev)
Create: src/commands/grpo.ts
```typescript
import { Command } from 'commander';
import axios from 'axios';
import { readFileSync } from 'fs';

interface Experience {
  id: string;
  text: string;
}

class ExperienceManager {
  // Map preserves insertion order, so experiences are emitted as G0, G1, ...
  private experiences: Map<string, string> = new Map();
  private nextId = 0;

  add(text: string): string {
    const id = `G${this.nextId++}`;
    this.experiences.set(id, text);
    return id;
  }

  formatForPrompt(): string {
    const lines: string[] = [];
    this.experiences.forEach((text, id) => {
      lines.push(`[${id}]. ${text}`);
    });
    return lines.join('\n');
  }
}

class DeepSeekAdapter {
  constructor(
    private apiKey: string,
    private model = 'deepseek-chat'
  ) {}

  async generate(prompt: string): Promise<string> {
    const response = await axios.post(
      'https://api.deepseek.com/v1/chat/completions',
      {
        model: this.model,
        messages: [{ role: 'user', content: prompt }],
        temperature: 0.7
      },
      {
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json'
        }
      }
    );
    return response.data.choices[0].message.content;
  }

  async generateWithExperiences(
    query: string,
    experiences: string
  ): Promise<string> {
    // Template literal: continuation lines stay unindented on purpose.
    const prompt = `Please solve the problem:
${query}
When solving, use these experiences:
${experiences}`;
    return this.generate(prompt);
  }
}

const grpoCommand = new Command('grpo')
  .description('Continuous Learning GRPO commands');

// .command() returns the new subcommand, so register it separately;
// chaining it would make grpoCommand point at "generate" instead of "grpo".
grpoCommand
  .command('generate')
  .argument('<query>', 'Query to solve')
  .option('-e, --experiences <file>', 'Load experiences from file')
  .action(async (query: string, options: { experiences?: string }) => {
    const apiKey = process.env.DEEPSEEK_API_KEY;
    if (!apiKey) {
      console.error('DEEPSEEK_API_KEY not set');
      process.exit(1);
    }

    const model = new DeepSeekAdapter(apiKey);
    const expMgr = new ExperienceManager();

    if (options.experiences) {
      // Assumes a plain-text file with one experience per line.
      readFileSync(options.experiences, 'utf8')
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .forEach((line) => expMgr.add(line));
    } else {
      // Default experiences
      expMgr.add('When coding, use type hints.');
      expMgr.add('Handle edge cases.');
    }

    const response = await model.generateWithExperiences(
      query,
      expMgr.formatForPrompt()
    );
    console.log(response);
  });

export default grpoCommand;
```
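Add to main CLI: src/index.ts (register it on the root Command, e.g. program.addCommand(grpoCommand); because the generate action is async, parse with program.parseAsync(process.argv) rather than program.parse())
Usage: hanzo-dev grpo generate "Write a function..."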
Phase 3: MCP Servers (~/work/hanzo/mcp)
Create Rust MCP Tool: src/grpo.rs
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Experience {
    id: String,
    text: String,
}

/// Keeps experiences in insertion order; a HashMap would emit them in
/// arbitrary order, making the prompt nondeterministic.
#[derive(Debug, Default)]
pub struct ExperienceManager {
    experiences: Vec<Experience>,
    next_id: usize,
}

impl ExperienceManager {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add(&mut self, text: String) -> String {
        let id = format!("G{}", self.next_id);
        self.next_id += 1;
        self.experiences.push(Experience { id: id.clone(), text });
        id
    }

    pub fn format_for_prompt(&self) -> String {
        self.experiences
            .iter()
            .map(|e| format!("[{}]. {}", e.id, e.text))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

// MCP tool definition (sketch). `McpServer` and `add_tool` stand in for the
// server type and registration API used in hanzo/mcp; adapt to the real SDK.
pub async fn register_grpo_tools(server: &mut McpServer) {
    server.add_tool(
        "grpo_generate",
        "Generate code with Continuous Learning GRPO",
        |params| async {
            // Implementation: read the query from params, inject
            // ExperienceManager::format_for_prompt(), call the model.
        },
    );
}
```
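On the wire, an MCP client discovers grpo_generate through tools/list (each tool carries a name, description and JSON Schema inputSchema) and invokes it with a JSON-RPC 2.0 tools/call request. A sketch of those messages, assuming the tool takes a single query string (the argument name is an assumption chosen to match the hanzo-node endpoint's {"query": ...} body); GRPO_GENERATE_TOOL, tools_call_request and text_result are hypothetical names:

```python
import json

# Hypothetical tool descriptor, as it could appear in a tools/list result.
GRPO_GENERATE_TOOL = {
    "name": "grpo_generate",
    "description": "Generate code with Continuous Learning GRPO",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query to solve"},
        },
        "required": ["query"],
    },
}


def tools_call_request(request_id: int, query: str) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for grpo_generate."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": GRPO_GENERATE_TOOL["name"],
            "arguments": {"query": query},
        },
    })


def text_result(request_id: int, text: str) -> str:
    """Serialize a successful tools/call response with one text content item."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    })


print(tools_call_request(1, "Write a function to reverse a list"))
```

The Rust handler would parse params.arguments.query and reply in the text_result shape.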
Phase 4: hanzo-node (Rust Inference Engine)
Create: hanzo-libs/hanzo-grpo/src/lib.rs
```rust
mod experience;
mod semantic;

pub use experience::ExperienceManager;
pub use semantic::SemanticExtractor;

/// Core Continuous Learning GRPO implementation for hanzo-node.
/// Defined directly in lib.rs: also declaring `mod trainer` with
/// `pub use trainer::ContinuousLearningGRPO` would clash with this name.
pub struct ContinuousLearningGRPO {
    experience_manager: ExperienceManager,
    semantic_extractor: SemanticExtractor,
}

impl ContinuousLearningGRPO {
    pub fn new(api_key: String) -> Self {
        Self {
            experience_manager: ExperienceManager::new(),
            semantic_extractor: SemanticExtractor::new(api_key),
        }
    }

    pub async fn generate_with_experiences(
        &self,
        query: &str,
    ) -> Result<String, Box<dyn std::error::Error>> {
        let experiences = self.experience_manager.format_for_prompt();
        // Call inference with experiences injected
        let prompt = format!(
            "Please solve: {}\n\nExperiences:\n{}",
            query, experiences
        );
        // Assumes SemanticExtractor exposes an async `generate` that calls the model.
        self.semantic_extractor.generate(&prompt).await
    }
}
```
Integrate with hanzo-node inference:
```rust
// In hanzo-node/src/inference/mod.rs
use hanzo_grpo::ContinuousLearningGRPO;

pub struct HanzoInference {
    grpo: Option<ContinuousLearningGRPO>,
    // ... existing fields
}

impl HanzoInference {
    pub fn with_grpo(mut self, api_key: String) -> Self {
        self.grpo = Some(ContinuousLearningGRPO::new(api_key));
        self
    }

    // `Result` is assumed to be hanzo-node's existing alias; if its error type
    // differs from hanzo-grpo's Box<dyn Error>, convert here with `map_err`.
    pub async fn infer_with_grpo(&self, query: &str) -> Result<String> {
        if let Some(grpo) = &self.grpo {
            grpo.generate_with_experiences(query).await
        } else {
            // Fall back to regular inference
            self.infer(query).await
        }
    }
}
```
Phase 5: Python CLI Tool
Create: ~/work/zoo/gym/cli/grpo.py
```python
#!/usr/bin/env python3
"""
Continuous Learning GRPO CLI Tool
Unified command-line interface for all platforms
"""
import argparse
import os
import sys
from pathlib import Path

# Add gym/src to the import path. resolve() follows the /usr/local/bin/grpo
# symlink created at install time; without it, __file__ is the symlink path
# and this would point at /usr/src.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "src"))

from gym.train.grpo.api_model_adapter import DeepSeekAdapter  # noqa: E402
from gym.train.grpo.experience_manager import ExperienceManager  # noqa: E402


def main():
    parser = argparse.ArgumentParser(
        description="Continuous Learning GRPO CLI"
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Generate command
    gen_parser = subparsers.add_parser("generate")
    gen_parser.add_argument("query", help="Query to solve")
    gen_parser.add_argument(
        "--experiences",
        "-e",
        help="Load experiences from file"
    )
    gen_parser.add_argument(
        "--api-key",
        help="DeepSeek API key (or use DEEPSEEK_API_KEY env)"
    )

    # Train command
    train_parser = subparsers.add_parser("train")
    train_parser.add_argument("dataset", help="Dataset to train on")
    train_parser.add_argument("--samples", type=int, default=100)

    args = parser.parse_args()

    # Get API key (only "generate" defines --api-key, hence getattr)
    api_key = getattr(args, "api_key", None) or os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        print("ERROR: No API key. Set DEEPSEEK_API_KEY or use --api-key",
              file=sys.stderr)
        sys.exit(1)

    if args.command == "generate":
        model = DeepSeekAdapter(api_key=api_key)
        exp_mgr = ExperienceManager()

        # Load experiences if provided
        if args.experiences:
            exp_mgr.load(args.experiences)
        else:
            # Default experiences
            exp_mgr.add("Write type-safe code with clear annotations.")
            exp_mgr.add("Handle edge cases: empty, null, single element.")

        # Generate
        response = model.generate_with_experiences(
            args.query,
            exp_mgr.format_for_prompt()
        )
        print(response)
    elif args.command == "train":
        # The plan does not yet define a training loop for this CLI.
        parser.error("train is not implemented yet")


if __name__ == "__main__":
    main()
```
Install:
```bash
cd ~/work/zoo/gym
chmod +x cli/grpo.py
ln -s ~/work/zoo/gym/cli/grpo.py /usr/local/bin/grpo   # may require sudo

# Usage
grpo generate "Write a function to reverse a list"
```
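Phase 6: Rust CLI Tool
Create: ~/work/zoo/gym/cli-rust/src/main.rs
```rust
use clap::{Parser, Subcommand};
use serde_json::json;

#[derive(Parser)]
#[command(name = "grpo")]
#[command(about = "Continuous Learning GRPO CLI (Rust)", long_about = None)]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    /// Solve a query with experiences injected into the prompt
    Generate {
        query: String,
        /// Experiences file (assumed format: one experience per line)
        #[arg(short, long)]
        experiences: Option<String>,
    },
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    match cli.command {
        Commands::Generate { query, experiences } => {
            generate_command(query, experiences).await?;
        }
    }
    Ok(())
}

/// Format experiences as "[G0]. text" lines, matching the other ports.
fn format_experiences(lines: &[String]) -> String {
    lines
        .iter()
        .enumerate()
        .map(|(i, text)| format!("[G{}]. {}", i, text))
        .collect::<Vec<_>>()
        .join("\n")
}

async fn generate_command(
    query: String,
    experiences: Option<String>,
) -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("DEEPSEEK_API_KEY")
        .map_err(|_| "DEEPSEEK_API_KEY not set")?;

    // Load experiences from file, or fall back to the hanzo-dev defaults.
    let experience_lines: Vec<String> = match experiences {
        Some(path) => std::fs::read_to_string(path)?
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect(),
        None => vec![
            "When coding, use type hints.".to_string(),
            "Handle edge cases.".to_string(),
        ],
    };

    // Same prompt template as the hanzo-dev port.
    let prompt = format!(
        "Please solve the problem:\n{}\nWhen solving, use these experiences:\n{}",
        query,
        format_experiences(&experience_lines)
    );

    let body = json!({
        "model": "deepseek-chat",
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7
    });

    let response = reqwest::Client::new()
        .post("https://api.deepseek.com/v1/chat/completions")
        .bearer_auth(&api_key)
        .json(&body)
        .send()
        .await?
        .error_for_status()?;

    let json: serde_json::Value = response.json().await?;
    let content = json["choices"][0]["message"]["content"]
        .as_str()
        .ok_or("unexpected response: missing choices[0].message.content")?;
    println!("{}", content);
    Ok(())
}
```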
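Build and install (Cargo.toml needs clap with the derive feature, tokio with the macros and rt-multi-thread features, reqwest with the json feature, and serde_json; the binary path assumes the package is named grpo):
```bash
cd ~/work/zoo/gym/cli-rust
cargo build --release
cp target/release/grpo /usr/local/bin/grpo-rs   # may require sudo

# Usage
grpo-rs generate "Write a function..."
```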
Integration Testing
Test that all platforms work together:
```bash
# Python SDK
python -c "from hanzoai.grpo import DeepSeekAdapter; print('✓ Python SDK')"
# hanzo-dev CLI
hanzo-dev grpo generate "test" && echo '✓ hanzo-dev CLI'
# Python CLI
grpo generate "test" && echo '✓ Python CLI'
# Rust CLI
grpo-rs generate "test" && echo '✓ Rust CLI'
# hanzo-node
# Test via API endpoint (--fail makes curl exit non-zero on HTTP errors)
curl --fail -X POST http://localhost:8080/v1/grpo/generate \
  -H 'Content-Type: application/json' \
  -d '{"query": "test"}' && echo '✓ hanzo-node'
```
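Each && chain above reports only its own success, and a missing binary just prints a shell error. A small Python driver can run every check and print a single summary instead. This is a hypothetical helper (smoke, http_ok, CHECKS and HANZO_NODE_URL are illustrative names), and the hanzo-node URL is the same assumed endpoint as in the curl call.

```python
import json
import shutil
import subprocess
import sys
import urllib.request

# Assumed endpoint, taken from the curl check above.
HANZO_NODE_URL = "http://localhost:8080/v1/grpo/generate"

CHECKS = [
    ("Python SDK", [sys.executable, "-c", "from hanzoai.grpo import DeepSeekAdapter"]),
    ("hanzo-dev CLI", ["hanzo-dev", "grpo", "generate", "test"]),
    ("Python CLI", ["grpo", "generate", "test"]),
    ("Rust CLI", ["grpo-rs", "generate", "test"]),
]


def smoke(label: str, argv: list[str], timeout: float = 120.0) -> bool:
    """Run one command and print ✓/✗ instead of stopping at the first failure."""
    if shutil.which(argv[0]) is None:
        print(f"✗ {label}: {argv[0]} not found on PATH")
        return False
    try:
        completed = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"✗ {label}: timed out after {timeout:.0f}s")
        return False
    ok = completed.returncode == 0
    print(f"{'✓' if ok else '✗'} {label}")
    return ok


def http_ok(url: str, payload: dict, timeout: float = 30.0) -> bool:
    """POST a JSON payload; True only for a 2xx response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError, refused connections, timeouts
        return False


def main() -> int:
    results = [smoke(label, argv) for label, argv in CHECKS]
    node_ok = http_ok(HANZO_NODE_URL, {"query": "test"})
    print(f"{'✓' if node_ok else '✗'} hanzo-node")
    return 0 if all(results) and node_ok else 1
```

To run it as a script, append `if __name__ == "__main__": raise SystemExit(main())`; the exit code then works as a CI gate.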
Next Steps
- Copy implementations from zoo/gym to each platform
- Test each integration independently
- Create a unified interface: all platforms expose the same API
- Document usage for each platform
- Publish packages to PyPI, npm, and crates.io
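The Python SDK, the Python CLI and the TypeScript port already share one call shape, generate_with_experiences(query, experiences). One way to pin that down for the unified-interface step is a structural type. This is a sketch in which GrpoModel, EchoModel and solve are hypothetical names, not an existing API:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GrpoModel(Protocol):
    """Hypothetical shared contract for every platform's model adapter."""

    def generate_with_experiences(self, query: str, experiences: str) -> str:
        ...


class EchoModel:
    """Offline stand-in that satisfies GrpoModel without calling any API."""

    def generate_with_experiences(self, query: str, experiences: str) -> str:
        return f"{query} | {experiences}"


def solve(model: GrpoModel, query: str, experiences: str) -> str:
    """Works with any adapter that follows the shared contract."""
    return model.generate_with_experiences(query, experiences)


print(isinstance(EchoModel(), GrpoModel))  # True
print(solve(EchoModel(), "reverse a list", "[G0]. Handle edge cases."))
```

A Protocol checks structure rather than inheritance, so adapters copied from zoo/gym satisfy it without a shared base class. The runtime isinstance check only verifies that the method exists, not its signature.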
All platforms will support Continuous Learning GRPO via the DeepSeek API! 🚀