Multi-Agent Observability Platform: Trace Agent Swarms in Production
NEO built observability for multi-agent systems: spans follow planners, tools, and workers, so latency, failures, and cost stay visible as workflows grow more complex.
Problem Statement
We asked NEO to provide tracing, metrics, and structured logs across an entire multi-agent run, and to preserve causality when agents call tools and one another.
Solution Overview
- Distributed traces: Tie agent steps together with shared trace IDs.
- Metrics: Tokens, latency, and error rates per agent role.
- Dashboards: Compare runs and releases side by side.
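
To make the first two points concrete, here is a minimal sketch of how spans might share one trace ID and record their parent, and how tokens, latency, and errors could then be rolled up per agent role. It uses only the Python standard library. The names `Tracer`, `Span`, `metrics_by_role`, and `run_demo` are hypothetical and are not taken from the generated SDK, which may be structured quite differently.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one run
    span_id: str
    parent_id: Optional[str]  # None for the root span
    role: str                 # e.g. "planner", "worker", "tool"
    name: str
    tokens: int = 0
    error: bool = False
    duration_ms: float = 0.0


class Tracer:
    """Hypothetical in-memory tracer, for illustration only."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []
        self._stack = []  # currently open spans, innermost last

    @contextmanager
    def span(self, role, name):
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(self.trace_id, uuid.uuid4().hex[:16], parent, role, name)
        self._stack.append(s)
        start = time.perf_counter()
        try:
            yield s
        except Exception:
            s.error = True  # mark the failing span, then let the error propagate
            raise
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(s)


def metrics_by_role(spans):
    """Sum span count, tokens, errors, and latency for each agent role."""
    totals = defaultdict(
        lambda: {"spans": 0, "tokens": 0, "errors": 0, "latency_ms": 0.0}
    )
    for s in spans:
        m = totals[s.role]
        m["spans"] += 1
        m["tokens"] += s.tokens
        m["errors"] += int(s.error)
        m["latency_ms"] += s.duration_ms
    return dict(totals)


def run_demo():
    """A planner calls a worker, which calls a tool that times out."""
    tracer = Tracer()
    with tracer.span("planner", "plan") as plan:
        plan.tokens = 120
        with tracer.span("worker", "execute") as work:
            work.tokens = 300
            try:
                with tracer.span("tool", "search"):
                    raise TimeoutError("tool timed out")
            except TimeoutError:
                pass  # the worker recovers; only the tool span is marked failed
    return tracer
```

Calling `metrics_by_role(run_demo().spans)` yields one entry per role. Because every span stores its `parent_id`, the planner → worker → tool chain can be rebuilt later even though spans are recorded in the order they finish.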

Workflow / Pipeline
| Step | Description |
|---|---|
| 1. Instrument | SDK hooks for agent lifecycle and tool calls |
| 2. Collect | Spans exported to OTLP-compatible backends |
| 3. Analyze | Drill into slow paths and failure clusters |
| 4. Alert | Thresholds on latency and error budgets |
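
The Analyze and Alert steps can be sketched as a threshold check over collected spans. The example below is an assumption-laden illustration, not the platform's alerting logic: `SpanRecord`, `p95`, and `evaluate_alerts` are hypothetical names, the 95th percentile uses the simple nearest-rank method, and the error budget is reduced to a maximum error rate per role.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    role: str
    duration_ms: float
    error: bool


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def evaluate_alerts(spans, max_p95_ms, max_error_rate):
    """Return one message per role and threshold that is exceeded."""
    by_role = {}
    for s in spans:
        by_role.setdefault(s.role, []).append(s)

    alerts = []
    for role, group in sorted(by_role.items()):
        latency = p95([s.duration_ms for s in group])
        error_rate = sum(s.error for s in group) / len(group)
        if latency > max_p95_ms:
            alerts.append(
                f"{role}: p95 latency {latency:.0f} ms exceeds {max_p95_ms:.0f} ms"
            )
        if error_rate > max_error_rate:
            alerts.append(
                f"{role}: error rate {error_rate:.0%} exceeds {max_error_rate:.0%}"
            )
    return alerts
```

Grouping by role before applying thresholds keeps a noisy tool from hiding behind a healthy planner in the aggregate numbers. A real deployment would more likely evaluate such rules in the metrics backend than in application code.
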

Repository & Artifacts
Generated Artifacts:
- Instrumentation SDK and collector configs
- Example dashboards for agent graphs