Evolve
Self-improving agents through iterations.
Coding agents repeat the same mistakes because they start fresh every session. Evolve gives agents memory — they learn from what worked and what didn't, so each session is better than the last.
On the AppWorld benchmark, Evolve improved agent reliability by 8.9 points overall, with a 74% relative increase on hard multi-step tasks. See the full results and the paper (arXiv:2603.10600).
Evolve helps agents improve over time by learning from their trajectories. It combines an MCP server for tool integration, vector storage for memory, and LLM-based conflict resolution to refine its knowledge base.
When setting up API keys and extra services is too much
Total Control
Under Development
- MCP Server: Exposes tools to get guidelines and save trajectories.
- Conflict Resolution: Intelligently merges new insights with existing guidelines using LLMs.
- Trajectory Analysis: Automatically analyzes agent trajectories to generate guidelines and best practices.
- Milvus Integration: Uses Milvus (or Milvus Lite) for efficient vector storage and retrieval.
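To make the interplay of these components concrete, here is a minimal sketch of saving and retrieving guidelines by similarity. It is not Evolve's API: `GuidelineStore`, `embed`, `cosine`, and `merge_threshold` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, the in-memory list stands in for Milvus, and the "keep the newer wording" rule is only a placeholder for LLM-based conflict resolution.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real setup uses an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class GuidelineStore:
    """Hypothetical in-memory stand-in for a vector store such as Milvus."""

    def __init__(self, merge_threshold: float = 0.8):
        self.guidelines: list[str] = []
        self.merge_threshold = merge_threshold

    def save(self, new: str) -> str:
        """Add a guideline, replacing a near-duplicate instead of storing both."""
        for i, old in enumerate(self.guidelines):
            if cosine(embed(old), embed(new)) >= self.merge_threshold:
                # Placeholder for LLM-based conflict resolution:
                # here we simply keep the newer wording.
                self.guidelines[i] = new
                return "merged"
        self.guidelines.append(new)
        return "added"

    def get(self, task: str, k: int = 2) -> list[str]:
        """Return up to k guidelines most similar to the task description."""
        query = embed(task)
        ranked = sorted(
            self.guidelines, key=lambda g: cosine(query, embed(g)), reverse=True
        )
        return ranked[:k]
```

In this sketch, saving two near-identical guidelines leaves a single entry in the store, and `get` ranks stored guidelines against a task description so that only the most relevant ones are returned to the agent.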
Guides
- Configuration: Configure models, backends, and environment variables.
- Low-Code Tracing: Instrument agents with Phoenix and verify end-to-end tracing.
- Phoenix Sync: Pull trajectories from Phoenix and generate stored guidelines.
- Extract Trajectories: Export Phoenix traces into an OpenAI-style message format.
Reference
- CLI Reference: Manage namespaces, entities, and sync jobs from the command line.
- Policies: Structured policy entities and how to retrieve them with MCP tools.
Demos
How It Works
Evolve analyzes agent trajectories to extract guidelines and best practices, then recalls them in future sessions. It supports both a lightweight file-based mode (Evolve Lite) and a full mode backed by an MCP server with vector storage and LLM-based conflict resolution.
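The learn-then-recall loop described above can be illustrated with a small file-based sketch, loosely in the spirit of a lightweight mode. Everything here is an assumption made for illustration: the function names `recall`, `extract_guidelines`, and `learn`, the JSON file layout, and the trajectory format (a list of steps with `action` and `error` keys) are hypothetical and not taken from Evolve. The rule-based extraction is a stand-in for the LLM analysis a real system would perform.

```python
import json
from pathlib import Path


def recall(path: Path) -> list[str]:
    """Load guidelines saved by earlier sessions; empty on the first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def extract_guidelines(trajectory: list[dict]) -> list[str]:
    """Toy analysis: turn each failed step into a guideline.

    Assumption: each step is a dict with an 'action' and an optional 'error'.
    A real system would have an LLM analyze the trajectory instead.
    """
    return [
        f"Avoid: {step['action']} ({step['error']})"
        for step in trajectory
        if step.get("error")
    ]


def learn(path: Path, trajectory: list[dict]) -> list[str]:
    """Append new guidelines from a trajectory, skipping exact duplicates."""
    guidelines = recall(path)
    for guideline in extract_guidelines(trajectory):
        if guideline not in guidelines:
            guidelines.append(guideline)
    path.write_text(json.dumps(guidelines, indent=2), encoding="utf-8")
    return guidelines
```

A session would call `recall` at startup to load guidelines into the agent's context, then call `learn` with the finished trajectory so the next session starts with what this one discovered.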