ALTK Components
We summarize the components currently in ALTK in the table below.

| Lifecycle Step | Component | Problem | Description | Performance | Resources |
|---|---|---|---|---|---|
| Pre-LLM | SpotLight | Agent does not follow instructions in the prompt. | SpotLight enables users to emphasize important spans within their prompt and steers the LLM's attention towards those spans. It is an inference-time hook and does not involve any training or changes to model weights. | Accuracy improvements between 5 and 40 points | Paper |
| Pre-tool | Refraction | Agent generates inconsistent tool sequences. | Verifies the syntax of tool-call sequences and repairs any errors that would result in execution failures. | 48% error correction | Demo |
| Pre-tool | SPARC | The agent calls incorrect tools (in the wrong order, redundantly, etc.) or uses incorrect or hallucinated arguments. | Evaluates tool calls before execution, identifying potential issues and suggesting corrections, with reasoning, for tool selection or argument values, including the corrected values. | 88% accuracy in detecting tool-calling mistakes and a +15% improvement in end-to-end tool-calling agent pass^k performance across GPT-4o, GPT-4o-mini, and Mistral-Large models. | |
| Post-tool | JSON Processor | Agent gets overwhelmed with large JSON payloads in its context. | If the agent calls tools that return complex JSON objects, this component uses LLM-based Python code generation to process those responses and extract the relevant information from them. | +3 to +50 percentage-point gains observed across 15 models of various families and sizes on a dataset of 1,298 samples | Paper, Demo |
| Post-tool | Silent Review | Tool calls return subtle semantic errors that aren't handled by the agent. | A prompt-based approach to identifying silent errors in tool calls (errors that produce no visible or explicit error message); determines whether the tool response is relevant, accurate, and complete given the user's query. | 4% improvement observed in end-to-end agent accuracy | |
| Post-tool | RAG Repair | Agent isn't able to recover from tool-call failures. | Given a failing tool call, this component uses an LLM to repair the call, drawing on domain documents such as documentation or troubleshooting examples via RAG. Requires a set of related documents to ingest. | 8% improvement observed with models such as GPT-4o | Paper |
| Pre-Response | Policy Guard | Agent returns responses that violate policies or instructions. | Checks whether the agent's output adheres to the policy statement and repairs the output if it does not. | +10-point improvement in accuracy | Paper |
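To make the post-tool pattern concrete, the sketch below illustrates the idea behind the JSON Processor row: rather than placing a large tool response directly into the agent's context, generated Python code extracts only the fields relevant to the query. This is a minimal, hypothetical illustration of the pattern, not ALTK's actual API; the function and variable names (`extract_relevant`, `process`) are assumptions, and in ALTK the extraction code would be produced by an LLM and executed with appropriate sandboxing.

```python
import json

def extract_relevant(response_json: str, code: str) -> dict:
    """Run extraction code against a parsed tool response.

    `code` stands in for LLM-generated Python; here it must define a
    function `process(data)` that returns the extracted information.
    """
    data = json.loads(response_json)
    namespace: dict = {}
    exec(code, namespace)  # in practice, run inside a sandbox
    return namespace["process"](data)

# A large (here, abbreviated) JSON tool response.
payload = json.dumps({
    "results": [
        {"id": 1, "name": "alpha", "metrics": {"latency_ms": 120}},
        {"id": 2, "name": "beta", "metrics": {"latency_ms": 340}},
    ],
    "pagination": {"next": None},
})

# Extraction code an LLM might generate for the query
# "which result is slowest?"
generated = """
def process(data):
    slowest = max(data["results"], key=lambda r: r["metrics"]["latency_ms"])
    return {"name": slowest["name"],
            "latency_ms": slowest["metrics"]["latency_ms"]}
"""

print(extract_relevant(payload, generated))
# → {'name': 'beta', 'latency_ms': 340}
```

Only the small extracted dictionary, not the full payload, would then be appended to the agent's context.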