Skip to content

Evolve

Self-improving agents through iterations.

Coding agents repeat the same mistakes because they start fresh every session. Evolve gives agents memory — they learn from what worked and what didn't, so each session is better than the last.

On the AppWorld benchmark, Evolve improved agent reliability by +8.9 points overall, with a 74% relative increase on hard multi-step tasks. See the full results and the paper (arXiv:2603.10600). Evolve is a system designed to help agents improve over time by learning from their trajectories. It uses a combination of an MCP server for tool integration, vector storage for memory, and LLM-based conflict resolution to refine its knowledge base.

[!IMPORTANT] ⭐ Star the repo: it helps others discover it.

When setting up API keys and extra services are too much

General Installation

Claude Code IBM Bob Codex

Total Control

Under Development

  • MCP Server


    Exposes tools to get guidelines and save trajectories.

  • Conflict Resolution


    Intelligently merges new insights with existing guidelines using LLMs.

  • Trajectory Analysis


    Automatically analyzes agent trajectories to generate guidelines and best practices.

  • Milvus Integration


    Uses Milvus (or Milvus Lite) for efficient vector storage and retrieval.

Guides

  • Configuration: Configure models, backends, and environment variables.
  • Low-Code Tracing: Instrument agents with Phoenix and verify end-to-end tracing.
  • Phoenix Sync: Pull trajectories from Phoenix and generate stored guidelines.
  • Extract Trajectories: Export Phoenix traces into an OpenAI-style message format.

Reference

  • CLI Reference: Manage namespaces, entities, and sync jobs from the command line.
  • Policies: Structured policy entities and how to retrieve them with MCP tools.

Demos

How It Works

Evolve analyzes agent trajectories to extract guidelines and best practices, then recalls them in future sessions. It supports both a lightweight file-based mode (Evolve Lite) and a full mode backed by an MCP server with vector storage and LLM-based conflict resolution.

Architecture Architecture