Wiki · Concept · Last reviewed June 25, 2026

Screen Capture API

The Screen Capture API lets web pages, within secure-context and permission limits, ask users to share a screen, window, or tab as a media stream. That makes it a high-risk boundary wherever AI agents or surveillance tools can receive the stream.

Category: Concept · Updated: June 25, 2026 · Tags: browser security, screen sharing, privacy, agents, surveillance

Definition

The Screen Capture API is a web-platform capability for turning a user-selected display surface into a MediaStream. The W3C Screen Capture Working Draft defines getDisplayMedia() as an extension to the Media Capture API for using a user's display, or part of it, as a media-stream source. The captured source may be a browser tab, a window, a monitor, or another surface offered by the user agent.

MDN marks MediaDevices.getDisplayMedia() as limited availability and available only in secure contexts. The method prompts the user to select a display surface and grant permission to capture it. The returned stream can be recorded with the MediaStream Recording API or transmitted over WebRTC, which is why screen capture is central to meetings, support, demos, production tools, and AI systems that observe a user's computer use.
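Because the method is secure-context only and not universally available, a page should check for it before offering a share button. The sketch below is a minimal illustration, not a standard helper: CaptureEnv and canOfferScreenShare are hypothetical names, and the interface stands in for window.isSecureContext and navigator.mediaDevices so that the logic can run outside a browser.

```typescript
// Hypothetical structural view of the browser globals this check reads.
// In a page, pass { isSecureContext: window.isSecureContext,
//                   mediaDevices: navigator.mediaDevices }.
interface CaptureEnv {
  isSecureContext: boolean;
  mediaDevices?: { getDisplayMedia?: unknown };
}

// Returns a reason code so the UI can explain why sharing is unavailable.
function canOfferScreenShare(env: CaptureEnv): { ok: boolean; reason: string } {
  if (!env.isSecureContext) {
    // getDisplayMedia() is only exposed in secure contexts (HTTPS, localhost).
    return { ok: false, reason: "insecure-context" };
  }
  if (!env.mediaDevices || typeof env.mediaDevices.getDisplayMedia !== "function") {
    // Limited availability: not every browser implements the method.
    return { ok: false, reason: "unsupported" };
  }
  return { ok: true, reason: "available" };
}
```

Returning a reason instead of a bare boolean lets the product show an honest message ("sharing needs HTTPS") rather than silently hiding the feature.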

Mechanism

A page calls navigator.mediaDevices.getDisplayMedia(options). The user agent then presents a picker, and the user chooses what to share. MDN's security section notes that the options cannot pre-limit the user's choices, that permission cannot be persisted, and that the call requires transient user activation, such as a click. The W3C draft identifies Screen Capture as a powerful feature named "display-capture", requires express permission, and defines a Permissions Policy feature of the same name whose default allowlist is self, so a cross-origin iframe can capture only if its embedder delegates the feature (for example with allow="display-capture").
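The permission rules above translate into two practical habits: make the call directly from a user gesture, and treat refusal as a normal outcome rather than a crash. This is a minimal sketch under stated assumptions. DisplayMediaSource, CaptureOutcome, classifyCaptureError, and requestScreenShare are hypothetical names, and the interface is a structural stand-in for navigator.mediaDevices. Only NotAllowedError is singled out, because MDN documents it for refusal by the user, the user agent, or Permissions Policy. Every other error name is passed through for logging.

```typescript
// Hypothetical stand-in for navigator.mediaDevices; T is the stream type.
interface DisplayMediaSource<T> {
  getDisplayMedia(options?: object): Promise<T>;
}

type CaptureOutcome<T> =
  | { kind: "granted"; stream: T }
  | { kind: "denied" } // user, user agent, or Permissions Policy refused
  | { kind: "failed"; name: string };

// Maps a rejection to an outcome, reading only the error's name property.
function classifyCaptureError(err: unknown): { kind: "denied" } | { kind: "failed"; name: string } {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "UnknownError";
  return name === "NotAllowedError" ? { kind: "denied" } : { kind: "failed", name };
}

// Call this from a click handler before awaiting anything else: transient
// activation is short-lived, so unrelated awaits first can make the request fail.
async function requestScreenShare<T>(
  source: DisplayMediaSource<T>,
  options: object = { video: true },
): Promise<CaptureOutcome<T>> {
  try {
    const stream = await source.getDisplayMedia(options);
    return { kind: "granted", stream };
  } catch (err) {
    return classifyCaptureError(err);
  }
}
```

In a page this would run as requestScreenShare(navigator.mediaDevices) inside a button's click listener. A "denied" outcome is worth recording as the prompt result rather than retrying automatically.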

The options object can request or hint at audio, system-audio behavior, current-tab prominence, monitor inclusion, current-tab exclusion, surface switching, and capture-session control. MDN documents these as audio, systemAudio, preferCurrentTab, monitorTypeSurfaces, selfBrowserSurface, surfaceSwitching, and controller. These are requests and hints: the result is still bounded by the surface the user selects and by browser support. MDN also documents that audio capture is optional and may be absent even when the page requests audio.
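A narrow request might lead with tabs, keep the capturing tab and whole monitors out of the picker, and leave system audio off. Because these fields are only hints, the page should then record what the user actually picked. This sketch uses option names documented on MDN: the displaySurface video constraint, systemAudio, selfBrowserSurface, surfaceSwitching, and monitorTypeSurfaces. Support for each option varies by browser. The helper names and the SelectedTrack and SelectedStream interfaces are hypothetical, minimal views of MediaStreamTrack and MediaStream.

```typescript
// Builds a narrow request. Every field is a hint; the user still chooses.
function narrowShareOptions(withAudio: boolean) {
  return {
    video: { displaySurface: "browser" }, // hint: offer tabs first in the picker
    audio: withAudio,
    systemAudio: "exclude",         // don't offer whole-system audio
    selfBrowserSurface: "exclude",  // keep the capturing tab out of the choices
    surfaceSwitching: "exclude",    // no mid-session "share this tab instead" control
    monitorTypeSurfaces: "exclude", // don't offer entire screens
  } as const;
}

// Hypothetical minimal views of MediaStreamTrack and MediaStream.
interface SelectedTrack {
  getSettings(): { displaySurface?: string };
}
interface SelectedStream {
  getVideoTracks(): SelectedTrack[];
  getAudioTracks(): unknown[];
}

// Records what was actually shared, which may differ from what was requested.
function describeSelection(stream: SelectedStream) {
  const video = stream.getVideoTracks()[0];
  return {
    surface: video?.getSettings().displaySurface ?? "unknown",
    hasAudio: stream.getAudioTracks().length > 0, // may be false even if audio was requested
  };
}
```

Reading displaySurface from the track settings matters because a request that hints "browser" can still come back as a window or a monitor if the browser ignores the hint.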

Adjacent browser features make screen sharing more structured. Chrome's Capture Handle documentation describes an opt-in way for a captured web app to expose a handle, origin, or both to a permitted capturing app. Chrome's Region Capture documentation describes cropping a video track from current-tab capture to a chosen element's bounding box. These features can reduce awkward meeting flows, but they also add more state that governance records should preserve.
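To keep Capture Handle state in a governance record, the capturing side can snapshot whatever handle the captured page chose to expose. This is a Chrome-specific sketch based on the Capture Handle article. The captured page calls navigator.mediaDevices.setCaptureHandleConfig with a handle, exposeOrigin, and permittedOrigins. The capturing page reads MediaStreamTrack.getCaptureHandle(), which returns a handle and optional origin, or null. The interfaces mirror those shapes. The captureHandleState helper is hypothetical.

```typescript
// Captured side (Chrome): the shape passed to
// navigator.mediaDevices.setCaptureHandleConfig(...).
interface CaptureHandleConfig {
  handle: string;
  exposeOrigin: boolean;
  permittedOrigins: string[]; // ["*"] exposes the handle to any capturer
}

// Capturing side (Chrome): the track method may be missing in other browsers.
interface HandleReadableTrack {
  getCaptureHandle?: () => { handle: string; origin?: string } | null;
}

// Hypothetical helper: snapshots handle state for the review record.
function captureHandleState(track: HandleReadableTrack) {
  if (typeof track.getCaptureHandle !== "function") {
    return { supported: false, handle: null, origin: null };
  }
  const h = track.getCaptureHandle(); // null when the captured page exposed nothing
  return {
    supported: true,
    handle: h?.handle ?? null,
    origin: h?.origin ?? null, // present only if the captured page set exposeOrigin
  };
}
```

Region Capture follows a similar pattern in Chrome: the page obtains a target with CropTarget.fromElement(element) and applies it with track.cropTo(target). Whether cropping was applied, and to which element, belongs in the same record.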

Agent Context

For AI Browsers and Computer Use, screen capture is a perception boundary. A model may need a live screen stream to coach a user through software, summarize a meeting, inspect a remote support session, convert a visual workflow into steps, or supervise a browser automation task. The same stream may expose private chats, password managers, medical records, finance tabs, employee dashboards, filenames, notification banners, or other people who did not consent to be part of the session.

This differs from a normal DOM-reading tool. Screen capture sees pixels, not only structured page data, and it can span origins, tabs, windows, desktop applications, and background content the user never meant to show. Once routed into an AI system, the stream may become model input, a recording, a transcript source, or an audit artifact. The governance questions are which surface was shared, who or what could see it, what was retained, and when capture stopped.
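Answering "when capture stopped" requires listening to the track, not only to the app's own stop button. A MediaStreamTrack fires "ended" when capture ends from the browser side, for example through the browser's stop-sharing control. Calling stop() from the page does not fire that event, so both paths must be logged. This sketch assumes hypothetical names (EndableTrack, CaptureSessionLog, openSessionLog). The clock is injectable so the logic can be tested.

```typescript
// Hypothetical minimal view of a MediaStreamTrack.
interface EndableTrack {
  addEventListener(type: "ended", listener: () => void): void;
  stop(): void;
}

interface CaptureSessionLog {
  surface: string;         // e.g. getSettings().displaySurface
  destinations: string[];  // who or what receives frames (peers, model, recorder)
  retained: boolean;       // whether frames or recordings are kept
  startedAt: number;
  stoppedAt: number | null;
  stoppedBy: "browser" | "app" | null;
}

function openSessionLog(
  track: EndableTrack,
  surface: string,
  destinations: string[],
  retained: boolean,
  now: () => number = Date.now,
): { log: CaptureSessionLog; stopByApp: () => void } {
  const log: CaptureSessionLog = {
    surface, destinations, retained,
    startedAt: now(), stoppedAt: null, stoppedBy: null,
  };
  // Only the first stop is recorded; later calls leave the log unchanged.
  const mark = (by: "browser" | "app") => {
    if (log.stoppedAt === null) {
      log.stoppedAt = now();
      log.stoppedBy = by;
    }
  };
  // "ended" covers the browser's stop control or the shared surface going away.
  track.addEventListener("ended", () => mark("browser"));
  return {
    log,
    // The app's own stop button: stop the track, then log it ourselves,
    // because stop() does not fire "ended".
    stopByApp: () => {
      track.stop();
      mark("app");
    },
  };
}
```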

Governance Use

A governance review should treat display capture as live observation authority. Record the origin, top-level or embedded context, secure-context state, Permissions Policy state, transient activation path, prompt result, selected surface type, audio options, whether system or window audio was available, whether surface switching was enabled, stream destination, recording status, model access, retention period, and user-facing stop control.

For workplace and support tools, separate user-mediated screen capture from unattended screenshot collection or headless automation. getDisplayMedia() is designed around active user choice. A product that needs background monitoring, productivity scoring, fraud review, or evidence capture is a different governance problem and should not hide behind a meeting-style permission prompt.

Limits

Screen-share permission does not mean informed consent to every visible fact. A user may intend to share one tab and still reveal notifications, account names, hidden windows, a side chat, or sensitive content in a logical display surface, which can include parts of a window covered by other windows. MDN's Screen Capture guidance specifically flags accidental sharing as the main privacy risk, including background windows and password managers.

Safer screen capture uses narrow surfaces, visible indicators, explicit stop controls, no surprise audio, redaction where possible, short retention, local processing when feasible, and clear notice when an AI system is receiving frames. Capture Handle and Region Capture can help collaboration and cropping, but they do not decide whether the session should be captured, retained, summarized, or used to score a worker.
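The safeguards listed above can be checked before a session starts, as a gate in product review or CI. This sketch describes a planned session with a hypothetical CapturePlan shape and returns the safeguards it misses. The 30-day maxRetentionDays default is an assumed local policy value, not a standard. Redaction and local processing are left out because they depend on the pipeline rather than on a single flag.

```typescript
// Hypothetical description of a planned capture session.
interface CapturePlan {
  surface: "browser" | "window" | "monitor";
  audioRequested: boolean;
  audioDisclosed: boolean;     // user was told audio would be captured
  stopControlVisible: boolean; // app offers its own clear stop control
  aiReceivesFrames: boolean;
  aiNoticeShown: boolean;
  retentionDays: number;
}

// Returns the names of safeguards this plan misses, in a fixed order.
// maxRetentionDays is an assumed policy value, not part of any API.
function missingSafeguards(plan: CapturePlan, maxRetentionDays = 30): string[] {
  const gaps: string[] = [];
  if (plan.surface === "monitor") gaps.push("narrow-surface");
  if (plan.audioRequested && !plan.audioDisclosed) gaps.push("no-surprise-audio");
  if (!plan.stopControlVisible) gaps.push("explicit-stop-control");
  if (plan.aiReceivesFrames && !plan.aiNoticeShown) gaps.push("ai-notice");
  if (plan.retentionDays > maxRetentionDays) gaps.push("short-retention");
  return gaps;
}
```

A check like this cannot decide whether a session should be captured at all. It only keeps a plan that has already been approved from quietly dropping safeguards.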

Review Record

Origin: record origin, frame context, secure-context state, Permissions Policy state, transient activation path, and user prompt result.
Surface: record tab, window, monitor, region, audio availability, system-audio choice, surface switching, Capture Handle state, and visible indicators.
Stream: record destination, recording status, frame routing, retention, redaction, stop time, error state, and whether a human could review the shared surface.
Agents: record model prompts, frame access, generated summaries, tool calls based on the stream, human approvals, and incident-review artifacts.
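The four review groups above map naturally onto a typed record, which makes gaps visible at write time. This is an illustrative schema, not a standard format. ScreenCaptureReview and reviewFlags are hypothetical names, and the flags are examples of combinations a reviewer might want surfaced first.

```typescript
// Hypothetical schema mirroring the Origin, Surface, Stream, and Agents groups.
interface ScreenCaptureReview {
  origin: {
    origin: string;
    frame: "top-level" | "embedded";
    secureContext: boolean;
    permissionsPolicyAllows: boolean;
    activation: string; // e.g. "click on Share button"
    promptResult: "granted" | "denied" | "dismissed";
  };
  surface: {
    type: "browser" | "window" | "monitor" | "region";
    audio: "none" | "tab" | "window" | "system";
    surfaceSwitching: boolean;
    captureHandle: string | null;
    indicatorVisible: boolean;
  };
  stream: {
    destinations: string[];
    recording: boolean;
    retentionDays: number;
    redaction: boolean;
    stoppedAt: string | null; // ISO timestamp; null if the record shows no stop
    error: string | null;
    humanReviewable: boolean;
  };
  agents: {
    frameAccess: boolean;
    summaries: string[];
    toolCalls: string[];
    humanApprovals: string[];
  };
}

// Flags combinations a reviewer should look at first.
function reviewFlags(r: ScreenCaptureReview): string[] {
  const flags: string[] = [];
  // A browser should refuse this, so the record itself may be wrong.
  if (r.origin.frame === "embedded" && r.origin.promptResult === "granted" && !r.origin.permissionsPolicyAllows) {
    flags.push("embedded-grant-without-policy");
  }
  if (r.origin.promptResult === "granted" && r.stream.stoppedAt === null) flags.push("no-stop-time");
  if (r.stream.recording && !r.stream.redaction) flags.push("recorded-without-redaction");
  if (r.agents.toolCalls.length > 0 && r.agents.humanApprovals.length === 0) flags.push("unapproved-tool-calls");
  return flags;
}
```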

Source Discipline

Claims about API scope, permission integration, privacy indicators, and Permissions Policy should cite the W3C Screen Capture draft. Claims about secure contexts, return values, exceptions, non-persistent permission, transient activation, and user-choice constraints should cite MDN. Claims about Capture Handle and Region Capture should cite Chrome's developer documentation and be treated as browser-specific implementation context, not universal support.

Spiralist Reading

Spiralism reads screen capture as the moment the workstation becomes a camera pointed inward. The screen is where labor, attention, messages, documents, mistakes, and institutional power pass through the same rectangle. The humane boundary is not only a prompt. It is a visible session, a narrow surface, a real stop button, and a record that lets the person know when the machine was watching.

Sources

W3C, Screen Capture, Working Draft.
MDN Web Docs, MediaDevices: getDisplayMedia() method.
MDN Web Docs, Using the Screen Capture API.
Chrome for Developers, Better tab sharing with Capture Handle.
Chrome for Developers, Better tab sharing with Region Capture.
