# Desktop Agent

A macOS menu bar app that lets you hold a hotkey to speak to your AI agents with screen context.

---

The Chipp Desktop Agent is a native macOS menu bar app. Hold a hotkey to speak to your AI agents while they can see your screen and act on your computer -- smart dictation, form filling, cross-app workflows, and more.

> **Warning:** Desktop Agent is an Enterprise-only feature. Contact sales for access.

## What It Can Do

| Capability | Description |
|-----------|-------------|
| **Hold-to-talk voice** | Hold Fn (or custom hotkey) to speak, release to send |
| **Screen context** | Agent sees your active app, window title, focused element, and selected text |
| **Type text** | Smart dictation into any app with agent intelligence |
| **Press keys** | Execute keyboard shortcuts (Cmd+Enter, Tab, etc.) |
| **Paste text** | Insert larger blocks while preserving formatting |
| **Open apps/URLs** | Launch applications or navigate to websites |
| **Take screenshots** | Request visual context when needed |
| **Multi-agent switching** | Switch between agents from the menu bar |

## System Requirements

- **macOS 14.0 (Sonoma)** or later
- Intel or Apple Silicon (native support)
- Permissions: Accessibility, Microphone, Input Monitoring

## Getting Started

**1.** Install the App

Download the Desktop Agent from your Chipp dashboard. Drag it into your Applications folder.

**2.** Sign In

Launch the app and sign in with your Chipp builder account. Your session is stored securely in the macOS Keychain.

**3.** Grant Permissions

The app guides you through three macOS permissions:

- **Accessibility** -- Lets the agent read UI elements and simulate keystrokes in other apps
- **Microphone** -- For voice input while holding the hotkey
- **Input Monitoring** -- For global hotkey capture outside the app

Each permission opens System Settings to the right location.

**4.** Configure Your Hotkey

Default: **Fn** (hold to start, release to stop).
Configurable to any key combination in the menu bar dropdown.

## Use Cases

### Smart Dictation

Hold Fn in any app and speak naturally. The agent understands context -- in Gmail, it drafts emails; in Slack, it composes messages; in VS Code, it writes code.

### Form Filling

On a web form, say "Fill this with Acme Corp's details." The agent reads the form fields, recalls information from memory, and fills each field.

### Code Review

In VS Code, select code and say "Refactor this to use async/await." The agent sees the selected text, rewrites it, and replaces the selection.

### Cross-App Workflows

"Take what's in this spreadsheet and draft a summary email." The agent captures visible data, drafts the email, opens Mail, and pastes it.

### Quick Research

"What was the conversion rate Sarah mentioned last week?" The agent speaks the answer aloud using memory -- no typing needed.

## The Command Bar

Press **Cmd+K** to open a Spotlight-style floating overlay where you can:

- Type commands to the active agent
- View recent actions and history
- Switch agents without opening the menu bar

## Screen Context Details

When you hold the hotkey, the agent can see:

| Context | What It Reads |
|---------|---------------|
| **Active app** | Which application is focused (Gmail, VS Code, Slack, etc.) |
| **Window title** | The current window name (e.g., "Compose - Gmail") |
| **Focused element** | Text fields, buttons, and UI elements via Accessibility APIs |
| **Selected text** | Highlighted content in the active app |
| **Screenshots** | Full window or screen region on request |

Context is captured **on-demand only** when the hotkey is held -- there's no background recording.
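
To make the screen context table concrete, the context captured for a single hotkey press could be modeled as a small record. This is a minimal sketch, not Chipp's actual schema: `ScreenContext`, its field names, and `describe` are hypothetical and chosen only to mirror the rows of the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreenContext:
    """Hypothetical snapshot of what the agent sees for one hotkey press."""

    active_app: str
    window_title: str
    focused_element: Optional[str] = None
    selected_text: Optional[str] = None
    screenshot_png: Optional[bytes] = None  # assumed to be filled only on request


def describe(ctx: ScreenContext) -> str:
    """Render the snapshot as a short text summary, skipping empty fields."""
    parts = [f"app={ctx.active_app}", f"window={ctx.window_title}"]
    if ctx.focused_element:
        parts.append(f"focused={ctx.focused_element}")
    if ctx.selected_text:
        parts.append(f"selected={ctx.selected_text!r}")
    if ctx.screenshot_png is not None:
        parts.append(f"screenshot={len(ctx.screenshot_png)} bytes")
    return ", ".join(parts)


ctx = ScreenContext(
    active_app="Gmail",
    window_title="Compose - Gmail",
    focused_element="text field",
    selected_text="Hi team",
)
summary = describe(ctx)
print(summary)
```

Keeping the screenshot field optional reflects the table's "on request" wording: the lighter text fields are always present in this sketch, while image data appears only when explicitly asked for.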
## Troubleshooting

**Hotkey not working?**

- Check Input Monitoring permission in System Settings → Privacy & Security → Input Monitoring
- If using a custom hotkey, verify it doesn't conflict with system or app shortcuts

**Agent can't see the screen?**

- Verify Accessibility permission is granted
- Some Electron apps (Slack, VS Code) expose limited accessibility data -- the agent may request a screenshot instead

**Text not typing into apps?**

- The app uses multiple methods (character-by-character, clipboard paste, accessibility APIs) with automatic fallback
- If direct typing fails in a specific app, the agent falls back to clipboard paste

> **Note:** Desktop Agent includes all voice capabilities. For voice configuration details, see the [Voice Agents guide](/docs/integrations/voice-agents).
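
The automatic fallback described under "Text not typing into apps?" can be pictured as an ordered chain of attempts. The sketch below is illustrative only and does not show the app's real implementation: `type_with_fallback`, the stand-in methods, and the order in which they are tried are all assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# A typing method takes the text and returns True if it was delivered.
TypingMethod = Callable[[str], bool]


def type_with_fallback(
    text: str, methods: Sequence[Tuple[str, TypingMethod]]
) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, method in methods:
        try:
            if method(text):
                return name
        except Exception:
            # Treat a raised error the same as a failed attempt and move on.
            continue
    return None


# Stand-in methods for illustration; a real app would call OS APIs here.
def keystrokes(text: str) -> bool:
    return False  # pretend the target app rejects synthetic keystrokes


def clipboard_paste(text: str) -> bool:
    return True  # pretend pasting works


used = type_with_fallback(
    "Hello", [("keystrokes", keystrokes), ("clipboard_paste", clipboard_paste)]
)
print(used)
```

Returning the name of the method that worked makes it easy to see which path a given app needed. A result of `None` means every method failed.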