Continuous Learning GRPO - Cross-Platform Integration Plan

Date: October 28, 2025
Goal: Integrate Continuous Learning GRPO across all Hanzo AI platforms


Platforms to Integrate

  1. ✅ zoo/gym - Reference implementation (DONE)
  2. 🔄 hanzo/python-sdk - Python client library
  3. 🔄 hanzo/dev - TypeScript/JS CLI (codex-based)
  4. 🔄 hanzo/mcp - Rust/JS MCP servers
  5. 🔄 shinkai/hanzo-node - Rust inference engine
  6. 🔄 Python CLI - Standalone tool
  7. 🔄 Rust CLI - Standalone tool

Integration Architecture

┌─────────────────────────────────────────────────────────────┐
│                Continuous Learning GRPO Core                 │
│                  (zoo/gym - reference impl)                  │
└───────────────────────┬─────────────────────────────────────┘
        ┌───────────────┼───────────────┬───────────────┐
        │               │               │               │
        ▼               ▼               ▼               ▼
  Python SDK      hanzo-dev CLI    MCP Servers    hanzo-node
  (Client Lib)     (TS/JS CLI)    (Rust/JS)       (Inference)
        │               │               │               │
        └───────────────┴───────────────┴───────────────┘
        ┌───────────────┼───────────────┐
        │               │               │
        ▼               ▼               ▼
    Python CLI      Rust CLI      DeepSeek API

Phase 1: Python SDK (~/work/hanzo/python-sdk)

Create: pkg/hanzoai/grpo/__init__.py

from .experience_manager import ExperienceManager
from .semantic_extractor import SemanticExtractor, Trajectory
from .api_adapter import DeepSeekAdapter, APIModelAdapter

__all__ = [
    "ExperienceManager",
    "SemanticExtractor", 
    "Trajectory",
    "DeepSeekAdapter",
    "APIModelAdapter"
]

Create: pkg/hanzoai/grpo/experience_manager.py

Copy from zoo/gym implementation
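
The later phases call `add`, `format_for_prompt`, and `load` on this class, so a rough picture of that surface may help. The sketch below is not the zoo/gym code: it mirrors the TypeScript and Rust versions later in this plan, and the `save` method and the JSON file format are assumptions made for illustration.

```python
import json
from pathlib import Path


class ExperienceManager:
    """Illustrative stand-in; the real zoo/gym class may differ."""

    def __init__(self) -> None:
        # Plain dicts keep insertion order, so prompts stay stable.
        self._experiences: dict[str, str] = {}
        self._next_id = 0

    def add(self, text: str) -> str:
        """Store one experience and return its ID (G0, G1, ...)."""
        exp_id = f"G{self._next_id}"
        self._next_id += 1
        self._experiences[exp_id] = text
        return exp_id

    def format_for_prompt(self) -> str:
        """Render one "[ID]. text" line per experience, in insertion order."""
        return "\n".join(f"[{i}]. {t}" for i, t in self._experiences.items())

    def save(self, path: str) -> None:
        # Assumed on-disk format: a JSON object mapping ID to text.
        Path(path).write_text(json.dumps(self._experiences, indent=2), encoding="utf-8")

    def load(self, path: str) -> None:
        self._experiences = json.loads(Path(path).read_text(encoding="utf-8"))
        # Continue numbering after the highest loaded ID.
        self._next_id = 1 + max((int(k[1:]) for k in self._experiences), default=-1)
```

Keeping the `[G0]. text` line format identical across the Python, TypeScript, and Rust ports means the same experience file should produce the same prompt on every platform.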

Create: pkg/hanzoai/grpo/semantic_extractor.py

Copy from zoo/gym implementation

Create: pkg/hanzoai/grpo/api_adapter.py

Copy from zoo/gym implementation
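
For orientation, here is one way the adapter layer could be shaped. This is a sketch under assumptions, not the zoo/gym implementation: the plan only shows that `APIModelAdapter` and `DeepSeekAdapter` are exported, so the base-class relationship, the `build_prompt` helper, and the `EchoAdapter` test double are hypothetical. The prompt template is taken from the TypeScript adapter in Phase 2.

```python
from abc import ABC, abstractmethod


def build_prompt(query: str, experiences: str) -> str:
    """Combine a query with formatted experiences (template from the TS adapter)."""
    return (
        "Please solve the problem:\n"
        f"{query}\n\n"
        "When solving, use these experiences:\n"
        f"{experiences}"
    )


class APIModelAdapter(ABC):
    """Hypothetical base class: subclasses only implement generate()."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Send a prompt to the backing model and return its text reply."""

    def generate_with_experiences(self, query: str, experiences: str) -> str:
        # Experience injection is plain prompt construction, so it can live
        # in the base class and be shared by every backend.
        return self.generate(build_prompt(query, experiences))


class EchoAdapter(APIModelAdapter):
    """Offline test double that returns the prompt instead of calling an API."""

    def generate(self, prompt: str) -> str:
        return prompt
```

An adapter like `EchoAdapter` lets the prompt format be checked in unit tests without an API key or network access.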

Usage Example:

from hanzoai.grpo import DeepSeekAdapter, ExperienceManager

model = DeepSeekAdapter(api_key="sk-xxx")
exp_mgr = ExperienceManager()
exp_mgr.add("When coding, use type hints.")

response = model.generate_with_experiences(
    query="Write a function...",
    experiences=exp_mgr.format_for_prompt()
)
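
Under the hood, a call like the one above presumably ends in a chat completion request. The following sketch builds that request with the standard library only, using the endpoint, model name, and temperature that appear in the TypeScript and Rust code in this plan. The helper names `build_chat_request` and `extract_content` are hypothetical, and nothing is sent over the network here.

```python
import json
import urllib.request

DEEPSEEK_URL = "https://api.deepseek.com/v1/chat/completions"


def build_chat_request(
    api_key: str,
    prompt: str,
    model: str = "deepseek-chat",
    temperature: float = 0.7,
) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        DEEPSEEK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_content(response_body: bytes) -> str:
    """Pull the first choice's message text out of a chat completion response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]
```

Sending would then be a matter of passing the request to `urllib.request.urlopen` and feeding the response bytes to `extract_content`; separating construction from transport keeps both halves testable offline.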

Phase 2: hanzo-dev CLI (~/work/hanzo/dev)

Create: src/commands/grpo.ts

import { Command } from 'commander';
import axios from 'axios';

interface Experience {
  id: string;
  text: string;
}

class ExperienceManager {
  private experiences: Map<string, string> = new Map();
  private nextId = 0;

  add(text: string): string {
    const id = `G${this.nextId++}`;
    this.experiences.set(id, text);
    return id;
  }

  formatForPrompt(): string {
    const lines: string[] = [];
    this.experiences.forEach((text, id) => {
      lines.push(`[${id}]. ${text}`);
    });
    return lines.join('\n');
  }
}

class DeepSeekAdapter {
  constructor(
    private apiKey: string,
    private model = 'deepseek-chat'
  ) {}

  async generate(prompt: string): Promise<string> {
    const response = await axios.post(
      'https://api.deepseek.com/v1/chat/completions',
      {
        model: this.model,
        messages: [{ role: 'user', content: prompt }],
        temperature: 0.7
      },
      {
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json'
        }
      }
    );
    return response.data.choices[0].message.content;
  }

  async generateWithExperiences(
    query: string,
    experiences: string
  ): Promise<string> {
    const prompt = `Please solve the problem:
${query}

When solving, use these experiences:
${experiences}`;
    return this.generate(prompt);
  }
}

const grpoCommand = new Command('grpo')
  .description('Continuous Learning GRPO commands');

// Attach `generate` in a separate statement: chaining .command() onto the
// line above would make grpoCommand refer to the subcommand, not to `grpo`.
grpoCommand
  .command('generate')
  .argument('<query>', 'Query to solve')
  .option('-e, --experiences <file>', 'Load experiences from file')
  .action(async (query: string, options: { experiences?: string }) => {
    const apiKey = process.env.DEEPSEEK_API_KEY;
    if (!apiKey) {
      console.error('DEEPSEEK_API_KEY not set');
      process.exit(1);
    }

    const model = new DeepSeekAdapter(apiKey);
    const expMgr = new ExperienceManager();

    // Load default experiences (options.experiences is parsed but not read yet)
    expMgr.add('When coding, use type hints.');
    expMgr.add('Handle edge cases.');

    const response = await model.generateWithExperiences(
      query,
      expMgr.formatForPrompt()
    );

    console.log(response);
  });

export default grpoCommand;

Add to main CLI: src/index.ts

import grpoCommand from './commands/grpo';

program
  .addCommand(grpoCommand);

Usage:

export DEEPSEEK_API_KEY=sk-xxx
hanzo-dev grpo generate "Write a binary search function"

Phase 3: MCP Servers (~/work/hanzo/mcp)

Create Rust MCP Tool: src/grpo.rs

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Experience {
    id: String,
    text: String,
}

pub struct ExperienceManager {
    experiences: HashMap<String, String>,
    next_id: usize,
}

impl ExperienceManager {
    pub fn new() -> Self {
        Self {
            experiences: HashMap::new(),
            next_id: 0,
        }
    }

    pub fn add(&mut self, text: String) -> String {
        let id = format!("G{}", self.next_id);
        self.next_id += 1;
        self.experiences.insert(id.clone(), text);
        id
    }

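    pub fn format_for_prompt(&self) -> String {
        // HashMap iteration order is arbitrary, so sort by the numeric part
        // of the ID to keep the prompt stable between calls.
        let mut entries: Vec<_> = self.experiences.iter().collect();
        entries.sort_by_key(|(id, _)| id[1..].parse::<usize>().unwrap_or(usize::MAX));
        let lines: Vec<String> = entries
            .iter()
            .map(|(id, text)| format!("[{}]. {}", id, text))
            .collect();
        lines.join("\n")
    }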
}

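// MCP tool definition (sketch: `McpServer` and `add_tool` stand in for
// whichever MCP server API the hanzo/mcp crate exposes)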
pub async fn register_grpo_tools(server: &mut McpServer) {
    server.add_tool(
        "grpo_generate",
        "Generate code with Continuous Learning GRPO",
        |params| async {
            // Implementation
        }
    );
}

Phase 4: hanzo-node (Rust Inference Engine)

Create: hanzo-libs/hanzo-grpo/src/lib.rs

mod experience;
mod semantic;
mod trainer;

pub use experience::ExperienceManager;
pub use semantic::SemanticExtractor;
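// ContinuousLearningGRPO is defined below in this file; also re-exporting
// a type of the same name from `trainer` would be a duplicate definition.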

/// Core Continuous Learning GRPO implementation for hanzo-node
pub struct ContinuousLearningGRPO {
    experience_manager: ExperienceManager,
    semantic_extractor: SemanticExtractor,
}

impl ContinuousLearningGRPO {
    pub fn new(api_key: String) -> Self {
        Self {
            experience_manager: ExperienceManager::new(),
            semantic_extractor: SemanticExtractor::new(api_key),
        }
    }

    pub async fn generate_with_experiences(
        &self,
        query: &str,
    ) -> Result<String, Box<dyn std::error::Error>> {
        let experiences = self.experience_manager.format_for_prompt();

        // Call inference with experiences injected
        let prompt = format!(
            "Please solve: {}\n\nExperiences:\n{}",
            query, experiences
        );

        self.semantic_extractor.generate(&prompt).await
    }
}

Integrate with hanzo-node inference:

// In hanzo-node/src/inference/mod.rs

use hanzo_grpo::ContinuousLearningGRPO;

pub struct HanzoInference {
    grpo: Option<ContinuousLearningGRPO>,
    // ... existing fields
}

impl HanzoInference {
    pub fn with_grpo(mut self, api_key: String) -> Self {
        self.grpo = Some(ContinuousLearningGRPO::new(api_key));
        self
    }

    pub async fn infer_with_grpo(&self, query: &str) -> Result<String> {
        if let Some(grpo) = &self.grpo {
            grpo.generate_with_experiences(query).await
        } else {
            // Fall back to regular inference
            self.infer(query).await
        }
    }
}

Phase 5: Python CLI Tool

Create: ~/work/zoo/gym/cli/grpo.py

#!/usr/bin/env python3
"""
Continuous Learning GRPO CLI Tool
Unified command-line interface for all platforms
"""

import sys
import os
import argparse
from pathlib import Path

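# Add gym to path. resolve() follows the /usr/local/bin symlink created in
# the install step, so the path is relative to the real script location.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "src"))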

from gym.train.grpo.api_model_adapter import DeepSeekAdapter
from gym.train.grpo.experience_manager import ExperienceManager

def main():
    parser = argparse.ArgumentParser(
        description="Continuous Learning GRPO CLI"
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Generate command
    gen_parser = subparsers.add_parser("generate")
    gen_parser.add_argument("query", help="Query to solve")
    gen_parser.add_argument(
        "--experiences", 
        "-e",
        help="Load experiences from file"
    )
    gen_parser.add_argument(
        "--api-key",
        help="DeepSeek API key (or use DEEPSEEK_API_KEY env)"
    )

    # Train command
    train_parser = subparsers.add_parser("train")
    train_parser.add_argument("dataset", help="Dataset to train on")
    train_parser.add_argument("--samples", type=int, default=100)

    args = parser.parse_args()

    # Get API key
    api_key = args.api_key or os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        print("ERROR: No API key. Set DEEPSEEK_API_KEY or use --api-key")
        sys.exit(1)

    if args.command == "generate":
        model = DeepSeekAdapter(api_key=api_key)
        exp_mgr = ExperienceManager()

        # Load experiences if provided
        if args.experiences:
            exp_mgr.load(args.experiences)
        else:
            # Default experiences
            exp_mgr.add("Write type-safe code with clear annotations.")
            exp_mgr.add("Handle edge cases: empty, null, single element.")

        # Generate
        response = model.generate_with_experiences(
            args.query,
            exp_mgr.format_for_prompt()
        )
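        print(response)
    elif args.command == "train":
        # The train subcommand is parsed above but not implemented in this plan
        print("ERROR: 'train' is not implemented yet")
        sys.exit(1)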

if __name__ == "__main__":
    main()

Install:

chmod +x cli/grpo.py
ln -s ~/work/zoo/gym/cli/grpo.py /usr/local/bin/grpo

# Usage
grpo generate "Write a function to reverse a list"

Phase 6: Rust CLI Tool

Create: ~/work/zoo/gym/cli-rust/src/main.rs

use clap::{Parser, Subcommand};
use serde_json::json;
use reqwest;

#[derive(Parser)]
#[command(name = "grpo")]
#[command(about = "Continuous Learning GRPO CLI (Rust)", long_about = None)]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    Generate {
        query: String,

        #[arg(short, long)]
        experiences: Option<String>,
    },
}

#[tokio::main]
async fn main() {
    let cli = Cli::parse();

    match cli.command {
        Commands::Generate { query, experiences } => {
            generate_command(query, experiences).await;
        }
    }
}

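async fn generate_command(query: String, _experiences: Option<String>) {
    let api_key = std::env::var("DEEPSEEK_API_KEY")
        .expect("DEEPSEEK_API_KEY not set");

    let client = reqwest::Client::new();

    // Inject the same default experiences as the hanzo-dev CLI.
    // Loading them from the --experiences file is not implemented yet.
    let prompt = format!(
        "Please solve the problem:\n{}\n\nWhen solving, use these experiences:\n{}",
        query,
        "[G0]. When coding, use type hints.\n[G1]. Handle edge cases."
    );

    let body = json!({
        "model": "deepseek-chat",
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7
    });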

    let response = client
        .post("https://api.deepseek.com/v1/chat/completions")
        .header("Authorization", format!("Bearer {}", api_key))
        .json(&body)
        .send()
        .await
        .expect("Failed to call API");

    let json: serde_json::Value = response.json().await.unwrap();
    let content = json["choices"][0]["message"]["content"]
        .as_str()
        .unwrap();

    println!("{}", content);
}

Build and install:

cd ~/work/zoo/gym/cli-rust
cargo build --release
cp target/release/grpo /usr/local/bin/grpo-rs

# Usage
grpo-rs generate "Write a function..."

Integration Testing

Check that each platform works:

# Python SDK
python -c "from hanzoai.grpo import DeepSeekAdapter; print('✓ Python SDK')"

# hanzo-dev CLI  
hanzo-dev grpo generate "test" && echo '✓ hanzo-dev CLI'

# Python CLI
grpo generate "test" && echo '✓ Python CLI'

# Rust CLI
grpo-rs generate "test" && echo '✓ Rust CLI'

# hanzo-node
# Test via API endpoint
curl -f -X POST http://localhost:8080/v1/grpo/generate \
  -H 'Content-Type: application/json' \
  -d '{"query": "test"}' && echo '✓ hanzo-node'
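
The shell checks above could also be collected into one script that reports every result instead of stopping at the first failure. The sketch below is one possible shape, assuming the binaries are on `PATH` under the names used in this plan; `run_checks`, `PLAN_CHECKS`, and `report` are hypothetical names, and the hanzo-node HTTP check is left out.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]], timeout: float = 60.0) -> dict[str, bool]:
    """Run each command and record whether it exited with status 0."""
    results: dict[str, bool] = {}
    for name, argv in checks.items():
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            results[name] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung command counts as a failure.
            results[name] = False
    return results


# Commands taken from the checks listed in this section.
PLAN_CHECKS = {
    "Python SDK": [sys.executable, "-c", "from hanzoai.grpo import DeepSeekAdapter"],
    "hanzo-dev CLI": ["hanzo-dev", "grpo", "generate", "test"],
    "Python CLI": ["grpo", "generate", "test"],
    "Rust CLI": ["grpo-rs", "generate", "test"],
}


def report(results: dict[str, bool]) -> list[str]:
    """Format one status line per platform."""
    return [f"{'✓' if ok else '✗'} {name}" for name, ok in results.items()]
```

Calling `report(run_checks(PLAN_CHECKS))` and printing the lines gives a per-platform summary. As with the shell version, the CLI checks need `DEEPSEEK_API_KEY` set and make real API calls.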

Next Steps

  1. Copy implementations from zoo/gym to each platform
  2. Test each integration independently
  3. Create unified interface - all use same API
  4. Document usage for each platform
  5. Publish packages - PyPI, npm, crates.io

All platforms will support Continuous Learning GRPO with the DeepSeek API! 🚀