AutoPrompter: Closed-Loop Autonomous Prompt Optimization

NEO built a closed loop around prompts: synthetic data, scoring, failure review, and a persistent ledger so you can tell whether a change actually helped.


Problem Statement

We asked NEO to separate the Optimizer LLM (which writes and revises prompts) from the Target LLM being evaluated: run batches, score with accuracy or semantic similarity, and log every iteration so comparisons stay fair and repeatable.
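The two scoring modes mentioned here can be sketched in a few lines; the function names are illustrative, not the repository's actual API:

```python
import math

def accuracy(predictions, labels):
    """Classification metric: fraction of predictions that exactly match labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

def cosine_similarity(a, b):
    """Embedding-based metric for open-ended tasks: cosine of the angle
    between the two response embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Exact-match accuracy suits classification batches; cosine similarity over embeddings lets the loop score free-form answers where string equality would be too strict.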


Solution Overview

NEO shipped AutoPrompter with:

  1. Dual-model setup: Optimizer (for example Gemini Flash) and Target (for example Qwen 3.5 9B) at different temperatures.
  2. Metrics: Classification accuracy or embedding similarity for open-ended tasks.
  3. Experiment ledger: JSON history with caps and summarization for long runs.
  4. YAML + CLI: Full config plus --override for quick tweaks.
  5. Backends: OpenRouter, Ollama, or llama.cpp from config.
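A configuration covering these five pieces might look roughly like this; the key names below are illustrative guesses, not taken from the actual repository:

```yaml
# Hypothetical AutoPrompter config sketch; field names are illustrative.
optimizer:
  backend: openrouter        # openrouter | ollama | llama.cpp
  model: gemini-flash
  temperature: 0.9           # higher temperature for creative prompt rewrites
target:
  backend: ollama
  model: qwen-9b
  temperature: 0.2           # lower temperature for stable evaluation
metric: accuracy             # or embedding similarity for open-ended tasks
stopping:
  convergence_threshold: 0.95
  min_delta: 0.01
ledger:
  path: history.json
  max_entries: 50            # cap long runs; older entries get summarized
```

Individual fields would then be tweaked from the CLI via --override rather than editing the file for every experiment.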

[Figure: AutoPrompter pipeline]

Workflow / Pipeline

| Step | Description |
| --- | --- |
| 1. Dataset | Optimizer generates synthetic examples for the task (e.g. classification) |
| 2. Execute | Target runs the current prompt on the batch; scores are recorded |
| 3. Stop or iterate | Stop if score ≥ convergence threshold or improvement < minimum delta |
| 4. Refine | Optimizer reads failure summaries and writes the next prompt; ledger updated |
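The four steps above can be sketched as a single loop. Here `run_batch` stands in for the Target evaluation and `refine` for the Optimizer rewrite; both names, and the ledger shape, are hypothetical:

```python
def optimize(run_batch, refine, init_prompt,
             threshold=0.95, min_delta=0.01, max_iters=10):
    """Closed-loop prompt optimization sketch.

    run_batch(prompt) -> (score, failures): Target runs the batch, scores it.
    refine(prompt, failures) -> new_prompt: Optimizer rewrites from failures.
    """
    ledger = []                      # one entry per iteration, like the JSON history
    prompt, prev_score = init_prompt, float("-inf")
    for i in range(max_iters):
        score, failures = run_batch(prompt)
        ledger.append({"iter": i, "prompt": prompt, "score": score})
        # Stop: converged, or improvement over last iteration too small.
        if score >= threshold or score - prev_score < min_delta:
            break
        prev_score = score
        prompt = refine(prompt, failures)
    return prompt, ledger

# Usage with stub models: scores plateau, so the min-delta check fires.
scores = iter([0.5, 0.7, 0.72])
best, history = optimize(
    run_batch=lambda p: (next(scores), ["missed example"]),
    refine=lambda p, failures: p + "+",
    init_prompt="v0",
    threshold=0.95,
    min_delta=0.05,
)
```

Logging before checking the stopping criteria means every scored iteration lands in the ledger, so later comparisons across runs stay fair.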

Repository & Artifacts

gauravvij/autoprompter (view on GitHub)
