POST /sessions/{session_id}/page/scrape

{
  "markdown": "<string>",
  "images": [
    {
      "url": "<string>",
      "category": "favicon",
      "description": "<string>"
    }
  ],
  "structured": {
    "success": true,
    "error": "<string>",
    "data": {}
  },
  "session": {
    "session_id": "<string>",
    "timeout_minutes": 123,
    "created_at": "2023-11-07T05:31:56Z",
    "closed_at": "2023-11-07T05:31:56Z",
    "last_accessed_at": "2023-11-07T05:31:56Z",
    "duration": "<string>",
    "status": "active",
    "steps": [
      {}
    ],
    "error": "<string>",
    "credit_usage": 123,
    "proxies": false,
    "browser_type": "chromium",
    "use_file_storage": false
  }
}

Authorizations

Authorization
string
header
required

The access token received from the authorization server in the OAuth 2.0 flow.

Path Parameters

session_id
string
required

ID of the browser session whose current page should be scraped.
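Here is a minimal sketch of how the path parameter and the `Authorization` header might be combined into a request using only Python's standard library. The base URL `https://api.example.com` is a placeholder. Sending the OAuth 2.0 access token with the `Bearer` scheme is an assumption based on the usual OAuth 2.0 convention, not something this page states.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption): replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_scrape_request(session_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /sessions/{session_id}/page/scrape request."""
    # Percent-encode the session ID so it is safe to embed in the URL path.
    path = f"/sessions/{urllib.parse.quote(session_id, safe='')}/page/scrape"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the access token is sent with the Bearer scheme.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("sess_123", "my-token", {})
print(req.get_method(), req.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(req)`. The sketch leaves that call out so that it runs without network access.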

Body

application/json

Whether to scrape links from the page. Links are scraped by default.

scrape_images
boolean
default:false

Whether to scrape images from the page. Images are not scraped by default.

ignored_tags
string[] | null

HTML tags to ignore when scraping the page.

only_main_content
boolean
default:true

Whether to scrape only the main content of the page. If true, navigation bars, footers, and similar page furniture are excluded.

only_images
boolean
default:false

Whether to scrape only images from the page. If true, the page content is excluded.
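The flags above can be collected into a request body. In this sketch, the helper `build_scrape_body` is hypothetical and is not part of any SDK. It includes only the fields the caller sets explicitly, so the server defaults listed above apply to anything that is omitted. The example also assumes that `ignored_tags` takes bare tag names without angle brackets.

```python
import json
from typing import Optional


def build_scrape_body(
    scrape_images: Optional[bool] = None,
    ignored_tags: Optional[list[str]] = None,
    only_main_content: Optional[bool] = None,
    only_images: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: keep only the options that were explicitly set."""
    options = {
        "scrape_images": scrape_images,
        "ignored_tags": ignored_tags,
        "only_main_content": only_main_content,
        "only_images": only_images,
    }
    # Fields left as None are omitted, so the server-side defaults apply.
    return {key: value for key, value in options.items() if value is not None}


# Keep navbars/footers and skip script and style tags (assumes bare tag names).
body = build_scrape_body(only_main_content=False, ignored_tags=["script", "style"])
print(json.dumps(body))  # {"ignored_tags": ["script", "style"], "only_main_content": false}
```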

response_format
any

The response format to use for the scrape, given as a Pydantic model or a JSON Schema dict (see https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema).

instructions
string | null

Additional instructions for the scrape, for example: 'Extract only the title, date and content of the articles.'
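This minimal sketch shows a structured-extraction body. `response_format` is passed as a plain JSON Schema dict, which is the form that can be sent directly as JSON over HTTP, and `instructions` narrows what gets extracted. The helper `build_extraction_body` and the articles schema are hypothetical illustrations. If you have Pydantic v2 installed, calling `model_json_schema()` on an equivalent `BaseModel` returns a dict of this kind.

```python
import json
from typing import Optional


def build_extraction_body(schema: dict, instructions: Optional[str] = None) -> dict:
    """Hypothetical helper: combine a JSON Schema and optional instructions into a body."""
    body: dict = {"response_format": schema}
    if instructions is not None:
        body["instructions"] = instructions
    return body


# Illustrative schema (hypothetical): a list of articles with title, date and content.
ARTICLES_SCHEMA = {
    "type": "object",
    "properties": {
        "articles": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["title", "date", "content"],
            },
        }
    },
    "required": ["articles"],
}

body = build_extraction_body(
    ARTICLES_SCHEMA,
    instructions="Extract only the title, date and content of the articles.",
)
print(json.dumps(body, indent=2))
```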

Whether to use link/image placeholders to reduce both the number of tokens in the prompt and the risk of hallucinations. This is an experimental feature and might not work as expected.

Response

Successful Response

markdown
string
required

Markdown representation of the extracted data

session
object
required

Browser session information

images
ImageData · object[] | null

List of images extracted from the page (ID and download link)

structured
object | null

Structured data extracted from the page in JSON format
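The sketch below shows one way to consume these fields. It parses a body shaped like the example response at the top of this page. The function name `summarize_scrape` is hypothetical. The function treats `images` and `structured` as optional because both may be `null`, and it returns the structured `data` only when `structured.success` is true.

```python
import json
from typing import Any, Optional


def summarize_scrape(raw: str) -> dict[str, Any]:
    """Hypothetical helper: pull the commonly used fields out of a scrape response."""
    resp = json.loads(raw)
    # `images` and `structured` are nullable, so fall back to safe defaults.
    images = resp.get("images") or []
    structured: Optional[dict] = resp.get("structured")
    data = error = None
    if structured is not None:
        if structured.get("success"):
            data = structured.get("data")
        else:
            error = structured.get("error")
    return {
        "markdown": resp["markdown"],
        "session_id": resp["session"]["session_id"],
        "image_urls": [img["url"] for img in images],
        "data": data,
        "error": error,
    }


example = """{
  "markdown": "# Hello",
  "images": [{"url": "https://example.com/logo.png", "category": "favicon", "description": "logo"}],
  "structured": {"success": true, "error": null, "data": {"title": "Hello"}},
  "session": {"session_id": "sess_123", "status": "active"}
}"""
result = summarize_scrape(example)
print(result["image_urls"])  # ['https://example.com/logo.png']
```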