Performance Dashboard

Monitor and optimize your AI agent's response speed with real-time latency metrics.

Hunter Hodnett, CPTO at Chipp

The Performance tab gives you real-time visibility into how fast your AI agent responds. Use it to identify bottlenecks, compare models, and optimize the experience for your users.

Accessing Performance

Open any app in the builder and click the Performance tab. Data starts collecting automatically after the first message is sent to your agent.

Time Range

Use the range selector at the top to filter all metrics by time period:

| Range | Shows |
| --- | --- |
| 24h | Last 24 hours |
| 7d | Last 7 days (default) |
| 30d | Last 30 days |
| 90d | Last 90 days |

All cards, charts, and session lists update when you change the range.

Overview Cards

The top section displays four key metrics:

Median TTFC (Time to First Chunk)

The most important metric. This is how long a user waits before the first word appears in the response. The card is color-coded:

| Color | Range | Meaning |
| --- | --- | --- |
| Green | Under 800ms | Excellent: feels instant |
| Yellow | 800ms to 2s | Acceptable for most use cases |
| Red | Over 2s | Users may notice a delay |

A trend arrow shows whether TTFC is improving or degrading compared to the previous period.
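If you log TTFC in your own monitoring, the same banding is easy to reproduce. Here is a minimal TypeScript sketch; `classifyTtfc` is a hypothetical helper, and only the thresholds come from the table above:

```typescript
// Illustrative only: classifyTtfc is a hypothetical helper, not part of
// Chipp's API. The thresholds mirror the color bands in the table above.
type TtfcBand = "green" | "yellow" | "red";

function classifyTtfc(ttfcMs: number): TtfcBand {
  if (ttfcMs < 800) return "green";    // excellent: feels instant
  if (ttfcMs <= 2000) return "yellow"; // acceptable for most use cases
  return "red";                        // users may notice a delay
}

console.log(classifyTtfc(650));  // "green"
console.log(classifyTtfc(2400)); // "red"
```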

P95 TTFC

The 95th percentile response start time. This tells you the worst-case experience — 95% of requests start faster than this value. If your median is fast but P95 is slow, some users are hitting edge cases (complex prompts, large knowledge bases, or slow tool calls).
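To see why the median and P95 can tell different stories, here is a minimal nearest-rank percentile sketch. The dashboard's exact percentile rule is not documented here, so treat the interpolation choice as an assumption:

```typescript
// Sketch of median and P95 from raw TTFC samples using the nearest-rank
// method; the dashboard may use a different interpolation rule.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const ttfcMs = [420, 510, 530, 610, 640, 700, 880, 950, 1200, 3100];
console.log(percentile(ttfcMs, 50)); // 640: the median looks healthy
console.log(percentile(ttfcMs, 95)); // 3100: one slow outlier drives P95
```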

Success Rate

The percentage of streams that completed without errors. A healthy agent should be above 98%. If this drops, check for:

  • Model API outages
  • Tool execution failures
  • Rate limiting
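As a sketch, the rate is just completed streams over total streams. The `StreamStatus` type below mirrors the status values in the Recent Sessions table, and counting client disconnects as non-successes is an assumption for illustration:

```typescript
// Sketch only: the status values mirror the Recent Sessions table below,
// and treating client disconnects as non-successes is an assumption.
type StreamStatus = "success" | "error" | "client_disconnect";

function successRate(statuses: StreamStatus[]): number {
  const ok = statuses.filter((s) => s === "success").length;
  return (ok / statuses.length) * 100;
}

const streams: StreamStatus[] = [
  ...Array<StreamStatus>(490).fill("success"),
  ...Array<StreamStatus>(10).fill("error"),
];
console.log(successRate(streams)); // 98, right at the healthy threshold
```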

Total Streams

The total number of chat responses generated in the selected period. A trend arrow shows volume changes compared to the previous period.

Pipeline Breakdown

This section shows where time is spent before the first token arrives. The bar is split into two phases:

  • Setup: Context gathering, tool resolution, message history loading, and knowledge retrieval. This is work your agent does before calling the LLM.
  • Model: The time from when the LLM request is sent to when the first token arrives. This is the LLM provider’s latency.
💡 If Setup is the majority of your TTFC, consider reducing the number of active tools, trimming conversation history length, or optimizing your knowledge sources. If Model dominates, try switching to a faster model or reducing your system prompt size.
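As a rough model, TTFC is approximately the sum of the two phases, so the first diagnostic question is which one dominates. A sketch with hypothetical field names:

```typescript
// Rough model: TTFC is approximately setup + model time. Field names
// are illustrative; only the two phases come from the breakdown above.
interface PipelineTiming {
  setupMs: number; // context, tools, history, knowledge retrieval
  modelMs: number; // LLM request sent until the first token arrives
}

function dominantPhase(t: PipelineTiming): "setup" | "model" {
  return t.setupMs > t.modelMs ? "setup" : "model";
}

const timing: PipelineTiming = { setupMs: 900, modelMs: 400 };
// TTFC is about 1300ms here and setup is roughly 69% of it, so tools,
// history, and knowledge sources are the place to optimize first.
console.log(dominantPhase(timing)); // "setup"
```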

Token Usage

Shows the average input and output tokens per request. High input token counts can increase TTFC because the model has more context to process. Common causes of high input tokens:

  • Long system prompts
  • Large knowledge source context
  • Deep conversation history
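A quick way to reason about this is to split the input budget across those three sources and see which dominates. The numbers below are invented for illustration; only the three categories come from the list above:

```typescript
// Invented numbers for illustration; only the three categories come
// from the list above. The point is to see which source dominates input.
const inputTokens = {
  systemPrompt: 1200,
  knowledgeContext: 3500,
  conversationHistory: 2800,
};

const total = Object.values(inputTokens).reduce((a, b) => a + b, 0);
for (const [source, tokens] of Object.entries(inputTokens)) {
  console.log(`${source}: ${((tokens / total) * 100).toFixed(0)}% of input`);
}
// Here knowledgeContext is the largest share (~47%), so tightening
// retrieval would cut input tokens, and likely TTFC, the most.
```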

Model Comparison

If your agent has used multiple models (e.g., after switching models or using model overrides), this table compares them side-by-side:

| Column | Description |
| --- | --- |
| Model | The model identifier |
| Streams | Number of requests processed |
| Median TTFC | Median time to first chunk |
| P95 TTFC | 95th percentile TTFC |
| Median Duration | Median total response time |
| Avg Input/Output | Average token counts |
| Error Rate | Percentage of failed streams |

Click a model row to filter the entire dashboard to that model. A filter bar appears at the top with a clear button.

ℹ️ Model comparison is especially useful when evaluating whether a model upgrade improves response speed. Switch models, wait for data to accumulate, then compare the results before and after.

Recent Sessions

A paginated list of individual chat sessions with performance data:

| Column | Description |
| --- | --- |
| Time | When the session occurred |
| Model | Model used |
| TTFC | Time to first chunk |
| Duration | Total response time |
| Tokens | Input and output token counts |
| Status | Success, error, or client disconnect |

Click a session to open the Session Detail Drawer, which shows a per-stream breakdown including:

  • Individual phase timings (app config, billing, session load, context, tools, messages, agent ready, first chunk)
  • Whether knowledge sources were used
  • How many tools were called
  • Conversation history length at the time of the request
  • Whether deep thinking or deep research was active
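If you export or script against this data, a per-stream record might look like the following. This shape is a guess that mirrors the drawer's fields, not Chipp's actual API:

```typescript
// Hypothetical record shape that mirrors the drawer's fields; these
// names are a guess for illustration, not Chipp's actual API.
interface StreamBreakdown {
  phaseTimingsMs: {
    appConfig: number;
    billing: number;
    sessionLoad: number;
    context: number;
    tools: number;
    messages: number;
    agentReady: number;
    firstChunk: number;
  };
  usedKnowledgeSources: boolean;
  toolCallCount: number;
  historyMessageCount: number; // conversation length at request time
  deepThinking: boolean;
  deepResearch: boolean;
}

// Assuming the phases are sequential, their sum approximates TTFC.
function totalTtfcMs(s: StreamBreakdown): number {
  return Object.values(s.phaseTimingsMs).reduce((a, b) => a + b, 0);
}
```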

Optimization Tips

1. Check your Pipeline Breakdown first. If Setup is slow, the bottleneck is on your side (tools, knowledge, history). If Model is slow, the bottleneck is the LLM provider.

2. Compare models. Smaller, faster models like GPT-4.1 Mini often have significantly lower TTFC than larger models. Use the Model Comparison table to find the right balance of speed and quality.

3. Watch your P95. A good median with a bad P95 means some users are having a poor experience. Drill into slow sessions to find the pattern; it is often a specific tool or a large knowledge source query.

4. Monitor trends over time. Use the trend arrows on the overview cards to catch regressions early. A sudden TTFC increase after a configuration change tells you exactly what to roll back.

Troubleshooting

No Performance Data

Performance metrics are collected from streaming chat responses. If you see the empty state:

  1. Send a test message to your agent
  2. Wait a minute for data to appear
  3. Refresh the Performance tab

TTFC Seems High

  1. Check the Pipeline Breakdown — is Setup or Model the bottleneck?
  2. Review your knowledge sources (large document collections increase context time)
  3. Check if deep thinking or deep research is enabled (these add processing time by design)
  4. Try the same prompt with a faster model to isolate the issue