Scrape Webpage
Authorizations
The access token received from the authorization server in the OAuth 2.0 flow.
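As a minimal sketch, the access token is typically sent as a bearer token in the Authorization header. This page does not state the header format, so the Bearer scheme below is an assumption based on common OAuth 2.0 usage, and auth_headers is a hypothetical helper name.

```python
def auth_headers(access_token: str) -> dict[str, str]:
    """Build request headers carrying an OAuth 2.0 access token.

    Assumption: the API accepts the conventional "Authorization: Bearer <token>" header.
    """
    token = access_token.strip()
    if not token:
        raise ValueError("access token must not be empty")
    return {"Authorization": f"Bearer {token}"}


headers = auth_headers("my-access-token")
```

The returned dict can be passed to whichever HTTP client is in use.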
Body
Whether to run the session in headless mode.
Whether to try to solve captchas automatically.
Maximum session lifetime in minutes (absolute maximum, not affected by activity).
Required range: x <= 1440
Idle timeout in minutes. Session closes after this period of inactivity (resets on each operation).
Required range: x <= 15
List of custom proxies to use for the session. If True, the default proxies will be used.
- NotteProxy
- ExternalProxy
- TailnetProxy
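The two timeout limits documented above (1440 minutes for session lifetime, 15 minutes for idle timeout) can be checked client-side before a request is sent. The sketch below is illustrative only: the function and argument names are hypothetical, and the requirement that both values be positive is an assumption not stated on this page.

```python
MAX_LIFETIME_MINUTES = 1440  # documented upper bound for session lifetime
MAX_IDLE_MINUTES = 15        # documented upper bound for idle timeout


def check_session_timeouts(lifetime_minutes: int, idle_minutes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means both values pass."""
    problems = []
    if lifetime_minutes > MAX_LIFETIME_MINUTES:
        problems.append(f"session lifetime {lifetime_minutes} exceeds {MAX_LIFETIME_MINUTES} minutes")
    if idle_minutes > MAX_IDLE_MINUTES:
        problems.append(f"idle timeout {idle_minutes} exceeds {MAX_IDLE_MINUTES} minutes")
    # Assumption: non-positive timeouts are not meaningful, so flag them as well.
    if lifetime_minutes <= 0 or idle_minutes <= 0:
        problems.append("timeouts must be positive")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.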
The browser type to use. Can be chromium, chrome or firefox.
Available options: chromium, chrome, firefox, chrome-nightly, chrome-turbo
The user agent to use for the session.
Override the Chrome instance arguments.
The width of the viewport.
The height of the viewport.
Viewport shape preset. When set, the backend fits the largest rectangle of this aspect ratio inside the sampled available screen area. Cannot be combined with explicit viewport_width/viewport_height.
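The fitting rule described above (largest rectangle of a given aspect ratio inside an available area) can be sketched as follows. This illustrates the geometry only. How the backend samples the screen area and rounds the result is not documented here, so the integer floor rounding and the fit_viewport name are assumptions.

```python
def fit_viewport(avail_width: int, avail_height: int, preset: str) -> tuple[int, int]:
    """Fit the largest rectangle with the preset's aspect ratio (e.g. "16:9") inside the area."""
    ratio_w, ratio_h = (int(part) for part in preset.split(":"))
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("aspect ratio parts must be positive")
    # Compare avail_width / avail_height with ratio_w / ratio_h using integers only.
    if avail_width * ratio_h <= avail_height * ratio_w:
        # The area is relatively narrow: width is the limiting side.
        return avail_width, avail_width * ratio_h // ratio_w
    # The area is relatively wide: height is the limiting side.
    return avail_height * ratio_w // ratio_h, avail_height


print(fit_viewport(1920, 1200, "16:9"))  # (1920, 1080)
print(fit_viewport(1920, 1080, "5:4"))   # (1350, 1080)
```

Integer arithmetic is used so the result never exceeds the available area because of floating-point rounding.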
Available options: 5:4, 16:9
The CDP URL of another remote session provider.
Whether FileStorage should be attached to the session.
The type of screenshot to use for the session.
Available options: raw, full, last_action
Browser profile configuration for state persistence.
Whether to use web bot authentication.
Extra HTTP headers to be sent with every request.
The vault to use for the session.
Playwright selector to scope the scrape to. Only content inside this selector will be scraped.
Whether to scrape links from the page. Links are scraped by default.
Whether to scrape images from the page. Images are scraped by default.
HTML tags to exclude from the scraped page.
Whether to only scrape the main content of the page. If True, navbars, footers, etc. are excluded.
Whether to only scrape images from the page. If True, the page content is excluded.
The response format to use for the scrape. You can use a Pydantic model or a JSON Schema dict (see https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema).
Additional instructions to use for the scrape. E.g. 'Extract only the title, date and content of the articles.'
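To make the two fields above concrete, here is a minimal sketch that pairs a hand-written JSON Schema dict with an instruction string. The key names response_format and instructions are assumptions, since this page does not show the body field names, and build_scrape_options is a hypothetical helper. The schema mirrors the title, date and content example from the instructions text.

```python
import json

# A JSON Schema dict describing one article; a Pydantic model could generate a similar schema.
ARTICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "date": {"type": "string"},
        "content": {"type": "string"},
    },
    "required": ["title", "date", "content"],
}


def build_scrape_options(schema: dict, instructions: str) -> dict:
    """Assemble the structured-output part of a scrape request body.

    Assumption: the body keys are named "response_format" and "instructions".
    """
    return {"response_format": schema, "instructions": instructions}


options = build_scrape_options(
    ARTICLE_SCHEMA,
    "Extract only the title, date and content of the articles.",
)
print(json.dumps(options, indent=2))
```

Keeping the schema as a plain dict means it serializes to JSON directly, with no extra dependencies.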
Whether to use link/image placeholders to reduce both the number of tokens in the prompt and the risk of hallucinations. This is an experimental feature and might not work as expected.

