# Desktop Agent
A macOS menu bar app that lets you hold a hotkey to speak to your AI agents with screen context.

---
The Chipp Desktop Agent is a native macOS menu bar app. Hold a hotkey to speak to your AI agents, which can see your screen and act on your computer. Typical uses include smart dictation, form filling, and cross-app workflows.
> **Warning:** Desktop Agent is an Enterprise-only feature. Contact sales for access.
## What It Can Do
| Capability | Description |
|-----------|-------------|
| **Hold-to-talk voice** | Hold Fn (or custom hotkey) to speak, release to send |
| **Screen context** | Agent sees your active app, window title, focused element, and selected text |
| **Type text** | Dictate into any app; the agent adapts the wording to the app you are in |
| **Press keys** | Execute keyboard shortcuts (Cmd+Enter, Tab, etc.) |
| **Paste text** | Insert larger blocks while preserving formatting |
| **Open apps/URLs** | Launch applications or navigate to websites |
| **Take screenshots** | Request visual context when needed |
| **Multi-agent switching** | Switch between agents from the menu bar |
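
One way to picture the action capabilities in the table is as a small set of typed actions that the agent emits and the app executes in order. The Python sketch below is purely illustrative: the class names, fields, and the `describe` helper are assumptions made for this example, not the Desktop Agent's actual data model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class PressKeys:
    combo: str  # e.g. "Cmd+Enter"


@dataclass(frozen=True)
class PasteText:
    text: str


@dataclass(frozen=True)
class OpenTarget:
    target: str  # an application name or a URL


@dataclass(frozen=True)
class TakeScreenshot:
    pass


# Hypothetical union of everything the agent may ask the app to do.
Action = Union[TypeText, PressKeys, PasteText, OpenTarget, TakeScreenshot]


def describe(action: Action) -> str:
    """Render one action as a short, human-readable log line."""
    if isinstance(action, TypeText):
        return f"type {len(action.text)} characters"
    if isinstance(action, PasteText):
        return f"paste {len(action.text)} characters"
    if isinstance(action, PressKeys):
        return f"press {action.combo}"
    if isinstance(action, OpenTarget):
        return f"open {action.target}"
    return "take screenshot"


# A made-up sequence: open Mail, paste a draft, then press a shortcut.
steps = [OpenTarget("Mail"), PasteText("Q3 revenue rose."), PressKeys("Cmd+Enter")]
for step in steps:
    print(describe(step))
```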
## System Requirements
- **macOS 14.0 (Sonoma)** or later
- Intel or Apple Silicon Mac (both supported natively)
- Permissions: Accessibility, Microphone, Input Monitoring
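
If you want to confirm the macOS version requirement from a script before installing, a minimal Python sketch might look like the following. It relies only on the standard library's `platform.mac_ver()`; the helper names `parse_version` and `meets_minimum` are made up for this example and are not part of any Chipp tooling.

```python
import platform

MINIMUM_MACOS = (14, 0)  # macOS 14.0 (Sonoma), per the requirement above


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as '14.2.1' into (14, 2, 1)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(version: str, minimum: tuple[int, ...] = MINIMUM_MACOS) -> bool:
    """Return True when `version` is at least `minimum`."""
    parsed = parse_version(version)
    # Pad with zeros so that '14' compares equal to (14, 0).
    padded = parsed + (0,) * max(0, len(minimum) - len(parsed))
    return padded >= minimum


if __name__ == "__main__":
    current = platform.mac_ver()[0]  # empty string on non-macOS systems
    if not current:
        print("Not running on macOS.")
    elif meets_minimum(current):
        print(f"macOS {current} meets the 14.0 requirement.")
    else:
        print(f"macOS {current} is older than 14.0.")
```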
## Getting Started
### 1. Install the App
Download the Desktop Agent from your Chipp dashboard, then drag it into your Applications folder.
### 2. Sign In
Launch the app and sign in with your Chipp builder account. Your session is stored securely in the macOS Keychain.
### 3. Grant Permissions
The app guides you through three macOS permissions:
- **Accessibility** -- Lets the agent read UI elements and simulate keystrokes in other apps
- **Microphone** -- For voice input while holding the hotkey
- **Input Monitoring** -- For global hotkey capture outside the app

Each permission prompt opens System Settings at the relevant pane.
### 4. Configure Your Hotkey
The default hotkey is **Fn**: hold to start, release to stop. You can change it to any key combination from the menu bar dropdown.
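
The hold-to-start, release-to-stop behavior can be thought of as a two-state machine. The sketch below is a simplified Python model of that idea, not the app's implementation: the `HoldToTalk` class, its method names, and the use of plain strings in place of audio are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"


@dataclass
class HoldToTalk:
    """Hypothetical controller: record while the hotkey is down, send on release."""

    hotkey: str = "fn"
    state: State = State.IDLE
    sent: list[str] = field(default_factory=list)
    _buffer: list[str] = field(default_factory=list)

    def key_down(self, key: str) -> None:
        # Ignore other keys and key-repeat events while already recording.
        if key == self.hotkey and self.state is State.IDLE:
            self.state = State.RECORDING
            self._buffer = []

    def hear(self, words: str) -> None:
        # Speech only counts while the hotkey is held.
        if self.state is State.RECORDING:
            self._buffer.append(words)

    def key_up(self, key: str) -> None:
        # Releasing the hotkey ends the recording and sends what was captured.
        if key == self.hotkey and self.state is State.RECORDING:
            self.state = State.IDLE
            if self._buffer:
                self.sent.append(" ".join(self._buffer))
            self._buffer = []


talk = HoldToTalk()
talk.hear("ignored")        # hotkey not held, so nothing is recorded
talk.key_down("fn")
talk.hear("draft a reply")
talk.hear("to Sarah")
talk.key_up("fn")
print(talk.sent)            # ['draft a reply to Sarah']
```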
## Use Cases
### Smart Dictation
Hold Fn in any app and speak naturally. The agent understands context -- in Gmail, it drafts emails; in Slack, it composes messages; in VS Code, it writes code.
### Form Filling
On a web form, say "Fill this with Acme Corp's details." The agent reads the form fields, recalls information from memory, and fills each field.
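
Conceptually, form filling amounts to matching the field labels the agent reads against what it remembers. The following Python sketch shows that matching step under strong simplifying assumptions: memory is modelled as a plain dictionary, labels are matched by normalized text only, and the function names and sample values are invented for the example.

```python
def normalize(label: str) -> str:
    """Lower-case a label and drop punctuation so 'Company Name:' matches 'company name'."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in label.lower())
    return " ".join(cleaned.split())


def fill_form(field_labels: list[str], memory: dict[str, str]) -> dict[str, str]:
    """Map each form field to a remembered value; unknown fields are left out."""
    known = {normalize(key): value for key, value in memory.items()}
    filled = {}
    for label in field_labels:
        value = known.get(normalize(label))
        if value is not None:
            filled[label] = value
    return filled


# Made-up memory entries and form labels, for illustration only.
memory = {"Company name": "Acme Corp", "Email": "ops@acme.example"}
print(fill_form(["Company Name:", "Email", "Phone"], memory))
```

Fields with no remembered value, such as `Phone` here, are simply skipped rather than guessed.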
### Code Refactoring
In VS Code, select code and say "Refactor this to use async/await." The agent sees the selected text, rewrites it, and replaces the selection.
### Cross-App Workflows
"Take what's in this spreadsheet and draft a summary email." The agent captures visible data, drafts the email, opens Mail, and pastes it.
### Quick Research
"What was the conversion rate Sarah mentioned last week?" The agent speaks the answer aloud using memory -- no typing needed.
## The Command Bar
Press **Cmd+K** to open a Spotlight-style floating overlay where you can:
- Type commands to the active agent
- View recent actions and history
- Switch agents without opening the menu bar
## Screen Context Details
When you hold the hotkey, the agent can see:
| Context | What It Reads |
|---------|---------------|
| **Active app** | Which application is focused (Gmail, VS Code, Slack, etc.) |
| **Window title** | The current window name (e.g., "Compose - Gmail") |
| **Focused element** | Text fields, buttons, and UI elements via Accessibility APIs |
| **Selected text** | Highlighted content in the active app |
| **Screenshots** | Full window or screen region on request |
Context is captured **on demand only**, while the hotkey is held. There is no background recording.
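
To make the on-demand behavior concrete, here is a small Python model of a context snapshot that is only read while the hotkey is held. The `ScreenContext` fields mirror the table above, but the class names, the `ContextCapture` gate, and the injected `read_screen` function are hypothetical and stand in for whatever the app really does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ScreenContext:
    active_app: str
    window_title: str
    focused_element: Optional[str] = None
    selected_text: Optional[str] = None


class ContextCapture:
    """Hypothetical gate: the reader function runs only while the hotkey is held."""

    def __init__(self, read_screen: Callable[[], ScreenContext]) -> None:
        self._read_screen = read_screen
        self._hotkey_held = False

    def set_hotkey_held(self, held: bool) -> None:
        self._hotkey_held = held

    def capture(self) -> Optional[ScreenContext]:
        # No hotkey, no capture: nothing is read in the background.
        if not self._hotkey_held:
            return None
        return self._read_screen()
```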
## Troubleshooting
**Hotkey not working?**
- Check Input Monitoring permission in System Settings → Privacy & Security → Input Monitoring
- If using a custom hotkey, verify it doesn't conflict with system or app shortcuts
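
As a rough illustration of what a conflict check involves, the Python sketch below normalizes a key combination and looks it up in a short table of shortcuts. The table is an assumption for the example: it lists a few common macOS defaults plus the command bar's Cmd+K, and is far from exhaustive. The function names are invented.

```python
from typing import Optional

MODIFIER_ORDER = ("ctrl", "option", "shift", "cmd", "fn")

# A few well-known defaults; a real check would need a much longer list.
RESERVED = {
    "cmd+space": "Spotlight",
    "cmd+tab": "app switcher",
    "cmd+q": "quit app",
    "cmd+k": "Desktop Agent command bar",
}


def normalize_combo(combo: str) -> str:
    """Canonicalize 'K + Cmd' or 'cmd+k' to the same string, 'cmd+k'."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    modifiers = sorted((p for p in parts if p in MODIFIER_ORDER), key=MODIFIER_ORDER.index)
    keys = sorted(p for p in parts if p not in MODIFIER_ORDER)
    return "+".join(modifiers + keys)


def find_conflict(combo: str) -> Optional[str]:
    """Return what the combination is already used for, or None if it looks free."""
    return RESERVED.get(normalize_combo(combo))


print(find_conflict("Cmd + Space"))
print(find_conflict("Ctrl+Option+D"))
```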
**Agent can't see the screen?**
- Verify that Accessibility permission is granted in System Settings → Privacy & Security → Accessibility
- Some Electron apps (Slack, VS Code) expose limited accessibility data -- the agent may request a screenshot instead
**Text not typing into apps?**
- The app tries several input methods (character-by-character typing, clipboard paste, accessibility APIs) and falls back automatically when one fails
- If direct typing fails in a specific app, the agent falls back to clipboard paste
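
The automatic fallback described above follows a common pattern: try each input method in turn and stop at the first one that works. The Python sketch below shows that pattern only. The `insert_with_fallback` function, the stub methods, and the order in which they are tried are assumptions for illustration; the app's real methods and ordering are not documented here.

```python
from typing import Callable, Optional, Sequence

# Each method tries to insert text and returns True on success.
InsertMethod = Callable[[str], bool]


def insert_with_fallback(
    text: str, methods: Sequence[tuple[str, InsertMethod]]
) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, method in methods:
        try:
            if method(text):
                return name
        except Exception:
            # Treat an error like a failed attempt and move on to the next method.
            continue
    return None


# Stub methods standing in for real input mechanisms.
def direct_typing(text: str) -> bool:
    return False  # pretend the target app rejects synthetic keystrokes


def clipboard_paste(text: str) -> bool:
    return True


used = insert_with_fallback(
    "Hello", [("direct typing", direct_typing), ("clipboard paste", clipboard_paste)]
)
print(used)  # clipboard paste
```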
> **Note:** Desktop Agent includes all voice capabilities. For voice configuration details, see the [Voice Agents guide](/docs/integrations/voice-agents).