The access token received from the authorization server in the OAuth 2.0 flow.
Whether to run the session in headless mode.
Whether to attempt to solve captchas automatically.
Session timeout in minutes. Cannot exceed the global timeout.
List of custom proxies to use for the session (at most 15). If True, the default proxies will be used.
The browser type to use. One of chromium, chrome, firefox, or chrome-nightly.
The user agent to use for the session.
Override the Chrome instance arguments.
The width of the viewport.
The height of the viewport.
The CDP URL of another remote session provider.
Whether FileStorage should be attached to the session.
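Taken together, the session options above could be assembled into a configuration like the sketch below. The key names (`headless`, `solve_captchas`, `browser_type`, and so on) are illustrative guesses derived from the field descriptions, not the provider's confirmed schema:

```python
# Hypothetical session configuration assembled from the options above.
# All key names are illustrative; consult the actual API schema.
session_options = {
    "access_token": "<oauth2-access-token>",  # token from the OAuth 2.0 flow
    "headless": True,                  # run the browser without a visible window
    "solve_captchas": False,           # attempt automatic captcha solving
    "timeout": 10,                     # session timeout in minutes (cannot exceed global timeout)
    "proxies": True,                   # True = default proxies; or a list of up to 15 custom proxies
    "browser_type": "chromium",        # chromium, chrome, firefox, or chrome-nightly
    "user_agent": None,                # None = browser default
    "chrome_args": ["--disable-gpu"],  # override Chrome instance arguments
    "viewport_width": 1280,
    "viewport_height": 720,
    "cdp_url": None,                   # CDP URL of another remote session provider
    "attach_file_storage": False,      # attach FileStorage to the session
}

assert session_options["browser_type"] in {"chromium", "chrome", "firefox", "chrome-nightly"}
```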
The type of screenshot to use for the session. One of raw, full, or last_action.
Playwright selector to scope the scrape to. Only content inside this selector will be scraped.
Whether to scrape links from the page. Links are scraped by default.
Whether to scrape images from the page. Images are scraped by default.
HTML tags to exclude from the scraped page.
Whether to only scrape the main content of the page. If True, navbars, footers, etc. are excluded.
Whether to only scrape images from the page. If True, the page content is excluded.
The response format to use for the scrape. You can use a Pydantic model or a JSON Schema dict (see https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema).
Additional instructions to use for the scrape. E.g. 'Extract only the title, date and content of the articles.'
Whether to use link/image placeholders to reduce the number of tokens in the prompt and limit hallucinations. Note that this is an experimental feature and might not work as expected.
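The scrape options above, including a response_format expressed as a plain JSON Schema dict (a Pydantic model would generate an equivalent schema), might be sketched as follows. Again, the key names are assumptions for illustration, not the documented schema:

```python
# Target structure for extraction, written as a plain JSON Schema dict.
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "date": {"type": "string"},
        "content": {"type": "string"},
    },
    "required": ["title", "date", "content"],
}

# Hypothetical scrape configuration; key names are illustrative.
scrape_options = {
    "selector": "article.main",          # Playwright selector scoping the scrape
    "scrape_links": True,                # links are scraped by default
    "scrape_images": True,               # images are scraped by default
    "ignore_tags": ["script", "style"],  # HTML tags to exclude from the page
    "only_main_content": True,           # exclude navbars, footers, etc.
    "only_images": False,                # True would exclude page content entirely
    "response_format": article_schema,   # JSON Schema dict (or a Pydantic model)
    "instructions": "Extract only the title, date and content of the articles.",
    "use_placeholders": False,           # experimental link/image placeholder feature
}
```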
Successful Response
Markdown representation of the extracted data
Browser session information
List of images extracted from the page (ID and download link)
Structured data extracted from the page in JSON format
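A successful response carries the fields listed above. A minimal sketch of consuming it, under an assumed payload shape (the real key names and values may differ):

```python
# Assumed response shape based on the fields documented above;
# the actual payload's key names may differ.
response = {
    "markdown": "# Example Title\n\nExtracted article body...",
    "session": {"id": "sess_123", "browser_type": "chromium"},
    "images": [{"id": "img_1", "download_link": "https://example.com/img_1"}],
    "data": {"title": "Example Title", "date": "2024-01-01", "content": "..."},
}

# Markdown representation of the extracted data
print(response["markdown"].splitlines()[0])

# Images: each entry pairs an ID with a download link
for image in response["images"]:
    print(image["id"], image["download_link"])

# Structured data extracted according to the requested response_format
assert set(response["data"]) == {"title", "date", "content"}
```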