Capture & read
| Tool | Parameters | Description |
| --- | --- | --- |
| take_screenshot | output_name? | Capture the full screen and save it as a PNG artifact. Optionally specify a filename; defaults to a timestamped name. Returns the file path and base64 preview. |
| screen_read | region? | Run OCR on the current screen and return the extracted text. Optionally limit to a screen region (x, y, width, height). Useful for reading UI that can't be accessed via DOM. |
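
As a rough illustration of how the optional parameters above might be passed, the sketch below builds request payloads for the two capture tools. The `{"tool": ..., "arguments": ...}` envelope, the helper names, and the mapping used to encode `region` are assumptions made for this example; only the tool names, the `output_name` and `region` parameters, and the (x, y, width, height) region fields come from the table.

```python
from typing import Any, Optional

# (x, y, width, height), matching the region fields listed in the table.
Region = tuple[int, int, int, int]


def take_screenshot_call(output_name: Optional[str] = None) -> dict[str, Any]:
    """Build a take_screenshot request (hypothetical payload shape).

    Leaving output_name out lets the tool fall back to its timestamped default.
    """
    args: dict[str, Any] = {}
    if output_name is not None:
        args["output_name"] = output_name
    return {"tool": "take_screenshot", "arguments": args}


def screen_read_call(region: Optional[Region] = None) -> dict[str, Any]:
    """Build a screen_read request, optionally limited to one screen region."""
    args: dict[str, Any] = {}
    if region is not None:
        x, y, width, height = region
        if width <= 0 or height <= 0:
            raise ValueError(f"region must have positive size, got {region}")
        # Assumed encoding: the region is sent as a mapping with these four keys.
        args["region"] = {"x": x, "y": y, "width": width, "height": height}
    return {"tool": "screen_read", "arguments": args}
```

For example, `screen_read_call((0, 0, 800, 200))` would restrict OCR to an 800 by 200 strip at the top-left of the screen, while `screen_read_call()` reads the whole screen.
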
Mouse & keyboard
| Tool | Parameters | Description |
| --- | --- | --- |
| screen_click | x, y, button?, double? | Move the mouse to screen coordinates and click. Supports left/right/middle button and double-click. |
| screen_type | text | Type text at the current cursor position using simulated keystrokes. Works in any focused input field. |
| screen_key | key | Press a single key or modifier combination (e.g. Return, Escape, ctrl+c, cmd+shift+4). |
| key_sequence | keys | Execute a sequence of key presses in order. Useful for keyboard shortcuts that require multiple steps. |
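
The input tools are typically chained: click a field to focus it, type into it, then press a key to submit. The sketch below assembles that sequence as a list of requests. The payload envelope and the helper names `click` and `submit_text` are hypothetical, and the helper's default of a single left click is this example's choice, not a documented default of screen_click.

```python
from typing import Any

# Button names taken from the screen_click description.
VALID_BUTTONS = {"left", "right", "middle"}


def click(x: int, y: int, button: str = "left", double: bool = False) -> dict[str, Any]:
    """Build a screen_click request (hypothetical payload shape)."""
    if button not in VALID_BUTTONS:
        raise ValueError(f"button must be one of {sorted(VALID_BUTTONS)}, got {button!r}")
    # button and double are sent explicitly so the plan does not rely on tool defaults.
    return {
        "tool": "screen_click",
        "arguments": {"x": x, "y": y, "button": button, "double": double},
    }


def submit_text(x: int, y: int, text: str) -> list[dict[str, Any]]:
    """Plan three calls: focus the field at (x, y), type the text, press Return."""
    return [
        click(x, y),
        {"tool": "screen_type", "arguments": {"text": text}},
        {"tool": "screen_key", "arguments": {"key": "Return"}},
    ]
```

The click comes first because screen_type writes at the current cursor position, so the target field has to hold focus before any keystrokes are sent.
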
Window management
| Tool | Parameters | Description |
| --- | --- | --- |
| window_list | — | List all open windows with their title, app name, and window ID. |
| window_focus | window_id | Bring a window to the foreground by its window ID. Combine with window_list to focus any open app. |
| screen_find_window | title | Find a window by partial title match and return its ID, position, and size. |
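
One way to combine window_list with window_focus is to search the listed windows for a title fragment and pass the matching ID on. The sketch below works on a sample result. The field names `title`, `app_name`, and `window_id`, the integer IDs, the payload envelope, and the case-insensitive comparison are all assumptions for illustration; the table only states that each window has a title, an app name, and a window ID.

```python
from typing import Any, Optional


def find_window_id(windows: list[dict[str, Any]], title_fragment: str) -> Optional[Any]:
    """Return the ID of the first window whose title contains the fragment."""
    needle = title_fragment.lower()  # assumed: matching ignores case
    for window in windows:
        if needle in window["title"].lower():
            return window["window_id"]
    return None


def window_focus_call(window_id: Any) -> dict[str, Any]:
    """Build a window_focus request (hypothetical payload shape)."""
    return {"tool": "window_focus", "arguments": {"window_id": window_id}}


# Hypothetical window_list result, made up for this example.
sample_windows = [
    {"title": "Inbox - Mail", "app_name": "Mail", "window_id": 101},
    {"title": "report.md - Editor", "app_name": "Editor", "window_id": 102},
]

target = find_window_id(sample_windows, "report")
if target is not None:
    request = window_focus_call(target)
```

When only a title fragment is known, screen_find_window covers the lookup in a single call and also returns position and size; the list-then-focus pattern is more useful when the choice depends on the app name or on inspecting several windows.
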
Screen tools require macOS 10.15+ (with Screen Recording permission), Linux with X11 or Wayland, or Windows 10+. They do not work in headless or SSH-only environments.
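
A caller may want to check these requirements before attempting any screen tool. The sketch below is one possible pre-flight check, not part of the tools themselves. It treats a Linux session as headless when neither `DISPLAY` nor `WAYLAND_DISPLAY` is set, which is a heuristic assumed here, and it cannot tell whether the macOS Screen Recording permission has been granted. The function names are hypothetical.

```python
import os
import platform
from typing import Mapping


def _major_minor(version: str) -> tuple[int, int]:
    """Parse 'major.minor...' into a tuple; unparseable input becomes (0, 0)."""
    parts = (version.split(".") + ["0", "0"])[:2]
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return (0, 0)


def screen_tools_supported(system: str, version: str, env: Mapping[str, str]) -> bool:
    """Apply the stated platform requirements to the given system description."""
    if system == "Darwin":
        # macOS 10.15+; the Screen Recording permission is not checked here.
        return _major_minor(version) >= (10, 15)
    if system == "Linux":
        # Heuristic: an X11 or Wayland session exports one of these variables.
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    if system == "Windows":
        return _major_minor(version)[0] >= 10
    return False


def detect() -> bool:
    """Run the check against the current machine."""
    system = platform.system()
    if system == "Darwin":
        version = platform.mac_ver()[0]
    elif system == "Windows":
        version = platform.version()
    else:
        version = ""
    return screen_tools_supported(system, version, os.environ)
```

Keeping `screen_tools_supported` free of direct system calls makes it easy to exercise with made-up inputs, for instance an SSH-only Linux session with an empty environment.
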