# Performance Dashboard
Monitor and optimize your AI agent's response speed with real-time latency metrics.
---
The Performance tab gives you real-time visibility into how fast your AI agent responds. Use it to identify bottlenecks, compare models, and optimize the experience for your users.
## Accessing Performance
Open any app in the builder and click the **Performance** tab. Data starts collecting automatically after the first message is sent to your agent.
## Time Range
Use the range selector at the top to filter all metrics by time period:
| Range | Shows |
|-------|-------|
| **24h** | Last 24 hours |
| **7d** | Last 7 days (default) |
| **30d** | Last 30 days |
| **90d** | Last 90 days |
All cards, charts, and session lists update when you change the range.
## Overview Cards
The top section displays four key metrics:
### Median TTFC (Time to First Chunk)
The most important metric. This is how long a user waits before the first word appears in the response. The card is color-coded:
| Color | Range | Meaning |
|-------|-------|---------|
| **Green** | Under 800ms | Excellent -- feels instant |
| **Yellow** | 800ms -- 2s | Acceptable for most use cases |
| **Red** | Over 2s | Users may notice a delay |
A trend arrow shows whether TTFC is improving or degrading compared to the previous period.
### P95 TTFC
The 95th percentile response start time. This tells you the worst-case experience -- 95% of requests start faster than this value. If your median is fast but P95 is slow, some users are hitting edge cases (complex prompts, large knowledge bases, or slow tool calls).
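To make the median/P95 distinction concrete, here is an illustrative sketch of how these two statistics relate on a set of latency samples. The `percentile` helper and the sample values are hypothetical, not how the dashboard computes its metrics internally:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value that pct% of samples fall at or below."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical TTFC samples in milliseconds
ttfc_ms = [450, 520, 610, 700, 750, 800, 900, 1100, 1500, 3200]

median = percentile(ttfc_ms, 50)  # 750 -- most requests start quickly
p95 = percentile(ttfc_ms, 95)     # 3200 -- the slowest tail waits seconds
```

Note how a healthy median can coexist with a poor P95: the one 3.2s outlier barely moves the median but dominates the tail, which is exactly the "edge case" pattern described above.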
### Success Rate
The percentage of streams that completed without errors. A healthy agent should be above 98%. If this drops, check for:
- Model API outages
- Tool execution failures
- Rate limiting
### Total Streams
The total number of chat responses generated in the selected period. A trend arrow shows volume changes compared to the previous period.
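The two card values above reduce to simple arithmetic. This sketch shows one plausible way to compute a success rate and a trend direction from period totals; the function names, the 1% dead band, and the sample numbers are assumptions for illustration only:

```python
def success_rate(completed: int, total: int) -> float:
    """Percentage of streams that finished without errors."""
    return 100.0 * completed / total if total else 0.0

def trend(current: float, previous: float) -> str:
    """Direction arrow comparing this period to the previous one.
    Changes within +/-1% are treated as flat (an assumed threshold)."""
    if previous == 0:
        return "flat"
    change = (current - previous) / previous * 100
    if change > 1:
        return "up"
    if change < -1:
        return "down"
    return "flat"

rate = success_rate(982, 1000)      # 98.2 -- just above the healthy threshold
volume = trend(1100, 1000)          # "up": 10% more streams than last period
```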
## Pipeline Breakdown
This section shows where time is spent before the first token arrives. The bar is split into two phases:
- **Setup**: Context gathering, tool resolution, message history loading, and knowledge retrieval. This is work your agent does before calling the LLM.
- **Model**: The time from when the LLM request is sent to when the first token arrives. This is the LLM provider's latency.
> **Tip:** If Setup is the majority of your TTFC, consider reducing the number of active tools, trimming conversation history length, or optimizing your knowledge sources. If Model dominates, try switching to a faster model or reducing your system prompt size.
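The tip above amounts to a simple comparison of phase shares. A minimal sketch, assuming you have the Setup and Model millisecond values from the breakdown bar (the function and threshold are hypothetical, not part of the product):

```python
def dominant_phase(setup_ms: float, model_ms: float) -> str:
    """Report which phase contributes the larger share of TTFC."""
    total = setup_ms + model_ms
    return "setup" if setup_ms / total > 0.5 else "model"

# 900ms of tool/knowledge setup vs 400ms of provider latency points
# at the agent configuration (tools, history, knowledge), not the LLM.
bottleneck = dominant_phase(900, 400)  # "setup"
```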
## Token Usage
Shows the average input and output tokens per request. High input token counts can increase TTFC because the model has more context to process. Common causes of high input tokens:
- Long system prompts
- Large knowledge source context
- Deep conversation history
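A quick way to reason about the causes listed above is a back-of-the-envelope token budget. The ~4 characters-per-token rule of thumb below is a rough heuristic for English text (real tokenizers vary by model), and all names here are illustrative:

```python
def rough_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# Hypothetical inputs: a long system prompt plus deep conversation history.
system_prompt = "You are a helpful support agent. " * 40
history = ["How do I reset my password?", "Go to Settings > Security."] * 25

input_budget = rough_tokens(system_prompt) + sum(rough_tokens(m) for m in history)
```

Summing the pieces this way makes it obvious which input (prompt, knowledge context, or history) is inflating the count, and therefore which one to trim first.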
## Model Comparison
If your agent has used multiple models (e.g., after switching models or using model overrides), this table compares them side-by-side:
| Column | Description |
|--------|-------------|
| **Model** | The model identifier |
| **Streams** | Number of requests processed |
| **Median TTFC** | Median time to first chunk |
| **P95 TTFC** | 95th percentile TTFC |
| **Median Duration** | Median total response time |
| **Avg Input/Output** | Average token counts |
| **Error Rate** | Percentage of failed streams |
Click a model row to filter the entire dashboard to that model. A filter bar appears at the top with a clear button.
> **Note:** Model comparison is especially useful when evaluating whether a model upgrade improves response speed. Switch models, wait for data to accumulate, then compare the before and after.
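When doing a before/after comparison like this, it helps to decide the tie-break order up front. This sketch ranks two models on median TTFC first, then P95; the `stats` values and model names are hypothetical, not real dashboard data:

```python
# Hypothetical per-model stats, as might be read off the comparison table.
stats = {
    "model-a": {"median_ttfc_ms": 1200, "p95_ttfc_ms": 3400},
    "model-b": {"median_ttfc_ms": 650,  "p95_ttfc_ms": 1800},
}

def faster_model(a: str, b: str) -> str:
    """Prefer the lower median TTFC; break ties on P95."""
    sa, sb = stats[a], stats[b]
    if sa["median_ttfc_ms"] != sb["median_ttfc_ms"]:
        return a if sa["median_ttfc_ms"] < sb["median_ttfc_ms"] else b
    return a if sa["p95_ttfc_ms"] <= sb["p95_ttfc_ms"] else b

winner = faster_model("model-a", "model-b")  # "model-b"
```

Speed is only half the decision, of course; weigh the Error Rate column and response quality alongside latency before committing to a switch.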
## Recent Sessions
A paginated list of individual chat sessions with performance data:
| Column | Description |
|--------|-------------|
| **Time** | When the session occurred |
| **Model** | Model used |
| **TTFC** | Time to first chunk |
| **Duration** | Total response time |
| **Tokens** | Input and output token counts |
| **Status** | Success, error, or client disconnect |
Click a session to open the **Session Detail Drawer**, which shows a per-stream breakdown including:
- Individual phase timings (app config, billing, session load, context, tools, messages, agent ready, first chunk)
- Whether knowledge sources were used
- How many tools were called
- Conversation history length at the time of the request
- Whether deep thinking or deep research was active
## Optimization Tips
**1. Check your Pipeline Breakdown first.** If Setup is slow, the bottleneck is on your side (tools, knowledge, history). If Model is slow, the bottleneck is the LLM provider.

**2. Compare models.** Smaller, faster models like GPT-4.1 Mini often have significantly lower TTFC than larger models. Use the Model Comparison table to find the right balance of speed and quality.

**3. Watch your P95.** A good median with a bad P95 means some users are having a poor experience. Drill into slow sessions to find the pattern -- it is often a specific tool or a large knowledge source query.

**4. Monitor trends over time.** Use the trend arrows on the overview cards to catch regressions early. A sudden TTFC increase after a configuration change tells you exactly what to roll back.
## Troubleshooting
### No Performance Data
Performance metrics are collected from streaming chat responses. If you see the empty state:
1. Send a test message to your agent
2. Wait a minute for data to appear
3. Refresh the Performance tab
### TTFC Seems High
1. Check the Pipeline Breakdown -- is Setup or Model the bottleneck?
2. Review your knowledge sources (large document collections increase context time)
3. Check if deep thinking or deep research is enabled (these add processing time by design)
4. Try the same prompt with a faster model to isolate the issue