Performance Dashboard

Monitor and optimize your AI agent's response speed with real-time latency metrics.

Hunter Hodnett, CPTO at Chipp

The Performance tab gives you real-time visibility into how fast your AI agent responds. Use it to identify bottlenecks, compare models, and optimize the experience for your users.

Accessing Performance

Open any app in the builder and click the Performance tab. Data starts collecting automatically after the first message is sent to your agent.

Time Range

Use the range selector at the top to filter all metrics by time period:

| Range | Shows |
| --- | --- |
| 24h | Last 24 hours |
| 7d | Last 7 days (default) |
| 30d | Last 30 days |
| 90d | Last 90 days |

All cards, charts, and session lists update when you change the range.

Overview Cards

The top section displays four key metrics:

Median TTFC (Time to First Chunk)

The most important metric. This is how long a user waits before the first word appears in the response. The card is color-coded:

| Color | Range | Meaning |
| --- | --- | --- |
| Green | Under 800ms | Excellent: feels instant |
| Yellow | 800ms to 2s | Acceptable for most use cases |
| Red | Over 2s | Users may notice a delay |

A trend arrow shows whether TTFC is improving or degrading compared to the previous period.
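If you log TTFC in your own monitoring, the same banding is easy to reproduce. Here is a minimal TypeScript sketch; `classifyTtfc` is a hypothetical helper, and only the thresholds come from the table above:

```typescript
// Illustrative only: classifyTtfc is a hypothetical helper, not part of
// Chipp's API. The thresholds mirror the color bands in the table above.
type TtfcBand = "green" | "yellow" | "red";

function classifyTtfc(ttfcMs: number): TtfcBand {
  if (ttfcMs < 800) return "green";    // excellent: feels instant
  if (ttfcMs <= 2000) return "yellow"; // acceptable for most use cases
  return "red";                        // users may notice a delay
}

console.log(classifyTtfc(650));  // "green"
console.log(classifyTtfc(2400)); // "red"
```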

P95 TTFC

The 95th percentile response start time. This tells you the worst-case experience — 95% of requests start faster than this value. If your median is fast but P95 is slow, some users are hitting edge cases (complex prompts, large knowledge bases, or slow tool calls).
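To see why the median and P95 can tell different stories, here is a minimal nearest-rank percentile sketch. The dashboard's exact percentile rule is not documented here, so treat the interpolation choice as an assumption:

```typescript
// Sketch of median and P95 from raw TTFC samples using the nearest-rank
// method; the dashboard may use a different interpolation rule.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const ttfcMs = [420, 510, 530, 610, 640, 700, 880, 950, 1200, 3100];
console.log(percentile(ttfcMs, 50)); // 640: the median looks healthy
console.log(percentile(ttfcMs, 95)); // 3100: one slow outlier drives P95
```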

Success Rate

The percentage of streams that completed without errors. A healthy agent should be above 98%. If this drops, check for:

  • Model API outages
  • Tool execution failures
  • Rate limiting
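As a sketch, the rate is just completed streams over total streams. The `StreamStatus` type below mirrors the status values in the Recent Sessions table, and counting client disconnects as non-successes is an assumption for illustration:

```typescript
// Sketch only: the status values mirror the Recent Sessions table below,
// and treating client disconnects as non-successes is an assumption.
type StreamStatus = "success" | "error" | "client_disconnect";

function successRate(statuses: StreamStatus[]): number {
  const ok = statuses.filter((s) => s === "success").length;
  return (ok / statuses.length) * 100;
}

const streams: StreamStatus[] = [
  ...Array<StreamStatus>(490).fill("success"),
  ...Array<StreamStatus>(10).fill("error"),
];
console.log(successRate(streams)); // 98, right at the healthy threshold
```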

Total Streams

The total number of chat responses generated in the selected period. A trend arrow shows volume changes compared to the previous period.

Pipeline Breakdown

This section shows where time is spent before the first token arrives. The bar is split into two phases:

  • Setup: Context gathering, tool resolution, message history loading, and knowledge retrieval. This is work your agent does before calling the LLM.
  • Model: The time from when the LLM request is sent to when the first token arrives. This is the LLM provider’s latency.
💡 If Setup is the majority of your TTFC, consider reducing the number of active tools, trimming conversation history length, or optimizing your knowledge sources. If Model dominates, try switching to a faster model or reducing your system prompt size.
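As a rough model, TTFC is approximately the sum of the two phases, so the first diagnostic question is which one dominates. A sketch with hypothetical field names:

```typescript
// Rough model: TTFC is approximately setup + model time. Field names
// are illustrative; only the two phases come from the breakdown above.
interface PipelineTiming {
  setupMs: number; // context, tools, history, knowledge retrieval
  modelMs: number; // LLM request sent until the first token arrives
}

function dominantPhase(t: PipelineTiming): "setup" | "model" {
  return t.setupMs > t.modelMs ? "setup" : "model";
}

const timing: PipelineTiming = { setupMs: 900, modelMs: 400 };
// TTFC is about 1300ms here and setup is roughly 69% of it, so tools,
// history, and knowledge sources are the place to optimize first.
console.log(dominantPhase(timing)); // "setup"
```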

Token Usage

Shows the average input and output tokens per request. High input token counts can increase TTFC because the model has more context to process. Common causes of high input tokens:

  • Long system prompts
  • Large knowledge source context
  • Deep conversation history
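A quick way to reason about this is to split the input budget across those three sources and see which dominates. The numbers below are invented for illustration; only the three categories come from the list above:

```typescript
// Invented numbers for illustration; only the three categories come
// from the list above. The point is to see which source dominates input.
const inputTokens = {
  systemPrompt: 1200,
  knowledgeContext: 3500,
  conversationHistory: 2800,
};

const total = Object.values(inputTokens).reduce((a, b) => a + b, 0);
for (const [source, tokens] of Object.entries(inputTokens)) {
  console.log(`${source}: ${((tokens / total) * 100).toFixed(0)}% of input`);
}
// Here knowledgeContext is the largest share (~47%), so tightening
// retrieval would cut input tokens, and likely TTFC, the most.
```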

Model Comparison

If your agent has used multiple models (e.g., after switching models or using model overrides), this table compares them side-by-side:

| Column | Description |
| --- | --- |
| Model | The model identifier |
| Streams | Number of requests processed |
| Median TTFC | Median time to first chunk |
| P95 TTFC | 95th percentile TTFC |
| Median Duration | Median total response time |
| Avg Input/Output | Average token counts |
| Error Rate | Percentage of failed streams |

Click a model row to filter the entire dashboard to that model. A filter bar appears at the top with a clear button.

ℹ️ Model comparison is especially useful when evaluating whether a model upgrade improves response speed. Switch models, wait for data to accumulate, then compare the results before and after.

Recent Sessions

A paginated list of individual chat sessions with performance data:

| Column | Description |
| --- | --- |
| Time | When the session occurred |
| Model | Model used |
| TTFC | Time to first chunk |
| Duration | Total response time |
| Tokens | Input and output token counts |
| Status | Success, error, or client disconnect |

Click a session to open the Session Detail Drawer, which shows a per-stream breakdown including:

  • Individual phase timings (app config, billing, session load, context, tools, messages, agent ready, first chunk)
  • Whether knowledge sources were used
  • How many tools were called
  • Conversation history length at the time of the request
  • Whether deep thinking or deep research was active
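If you export or script against this data, a per-stream record might look like the following. This shape is a guess that mirrors the drawer's fields, not Chipp's actual API:

```typescript
// Hypothetical record shape that mirrors the drawer's fields; these
// names are a guess for illustration, not Chipp's actual API.
interface StreamBreakdown {
  phaseTimingsMs: {
    appConfig: number;
    billing: number;
    sessionLoad: number;
    context: number;
    tools: number;
    messages: number;
    agentReady: number;
    firstChunk: number;
  };
  usedKnowledgeSources: boolean;
  toolCallCount: number;
  historyMessageCount: number; // conversation length at request time
  deepThinking: boolean;
  deepResearch: boolean;
}

// Assuming the phases are sequential, their sum approximates TTFC.
function totalTtfcMs(s: StreamBreakdown): number {
  return Object.values(s.phaseTimingsMs).reduce((a, b) => a + b, 0);
}
```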

Optimization Tips

1. Check your Pipeline Breakdown first. If Setup is slow, the bottleneck is on your side (tools, knowledge, history). If Model is slow, the bottleneck is the LLM provider.

2. Compare models. Smaller, faster models like GPT-4.1 Mini often have significantly lower TTFC than larger models. Use the Model Comparison table to find the right balance of speed and quality.

3. Watch your P95. A good median with a bad P95 means some users are having a poor experience. Drill into slow sessions to find the pattern; it is often a specific tool or a large knowledge source query.

4. Monitor trends over time. Use the trend arrows on the overview cards to catch regressions early. A sudden TTFC increase after a configuration change tells you exactly what to roll back.

Troubleshooting

No Performance Data

Performance metrics are collected from streaming chat responses. If you see the empty state:

  1. Send a test message to your agent
  2. Wait a minute for data to appear
  3. Refresh the Performance tab

TTFC Seems High

  1. Check the Pipeline Breakdown — is Setup or Model the bottleneck?
  2. Review your knowledge sources (large document collections increase context time)
  3. Check if deep thinking or deep research is enabled (these add processing time by design)
  4. Try the same prompt with a faster model to isolate the issue