curl --request POST \
  --url https://api.example.com/scrape_from_html \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "selector": "<string>",
  "scrape_links": true,
  "scrape_images": false,
  "ignored_tags": [
    "<string>"
  ],
  "only_main_content": true,
  "only_images": false,
  "response_format": "<unknown>",
  "instructions": "",
  "use_link_placeholders": false,
  "frames": [
    {
      "frameUrl": "<string>",
      "frameData": "<string>"
    }
  ]
}
'

{
  "model_schema": {
    "success": true,
    "model_schema": {},
    "error": "<string>"
  },
  "scrape": {
    "success": true,
    "error": "<string>",
    "data": {}
  }
}

Authorization (header): The access token received from the authorization server in the OAuth 2.0 flow.
selector: Playwright selector to scope the scrape to. Only content inside this selector is scraped.
scrape_links: Whether to scrape links from the page. Links are scraped by default.
scrape_images: Whether to scrape images from the page. Images are scraped by default.
ignored_tags: HTML tags to ignore when scraping the page.
only_main_content: Whether to scrape only the main content of the page. If true, navbars, footers, etc. are excluded.
only_images: Whether to scrape only images from the page. If true, the page content is excluded.
response_format: The response format to use for the scrape. You can use a Pydantic model or a JSON Schema dict (cf. https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema).
instructions: A user-provided description of what needs to be scraped.
use_link_placeholders: Whether to use link/image placeholders to reduce the number of tokens in the prompt and the risk of hallucinations. This is an experimental feature and might not work as expected.
frames: Frame data for all frames found in the page. The main frame has to be the first one.
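
The request body described above can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library. It builds the request without sending it. The endpoint URL and field names come from the curl example; the schema contents, the frame URLs, the idea that frameData holds raw HTML, and the helper name build_scrape_request are illustrative assumptions, not part of the documented API.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.example.com/scrape_from_html"

# A hand-written JSON Schema dict for `response_format`.
# The fields "title" and "price" are hypothetical.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}


def build_scrape_request(token, main_frame, child_frames=()):
    """Build (but do not send) a POST request for /scrape_from_html."""
    payload = {
        "selector": "article",
        "scrape_links": True,
        "scrape_images": False,
        "ignored_tags": ["script", "style"],
        "only_main_content": True,
        "only_images": False,
        "response_format": product_schema,
        "instructions": "Extract the product title and price.",
        "use_link_placeholders": False,
        # The main frame has to be the first entry in `frames`.
        "frames": [main_frame, *child_frames],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical frames; frameData is assumed to carry the frame's HTML.
main = {"frameUrl": "https://shop.example.com/item/1", "frameData": "<html>...</html>"}
ad = {"frameUrl": "https://ads.example.com/frame", "frameData": "<html>...</html>"}
request = build_scrape_request("<token>", main, [ad])
# Sending is left out of this sketch: urllib.request.urlopen(request)
```

Placing the main frame in a separate argument makes it harder to violate the ordering requirement by accident, since the helper always puts it first.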
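
The example response shown earlier contains two parts, model_schema and scrape, each with its own success and error fields. A minimal sketch of how a client might check both before using the data follows. The exact semantics of these fields are not documented here, so treating success: false as a failure is an assumption, as is the helper name extract_scrape_data.

```python
import json


def extract_scrape_data(response_text):
    """Return scrape["data"] from a response body, or raise on failure.

    Assumption: a part that is present with "success": false has failed,
    and its "error" field describes why.
    """
    body = json.loads(response_text)
    for part_name in ("model_schema", "scrape"):
        part = body.get(part_name)
        if part is not None and not part.get("success"):
            raise RuntimeError(f"{part_name} failed: {part.get('error')}")
    scrape = body.get("scrape") or {}
    return scrape.get("data", {})


# Hypothetical response body with the same shape as the example response.
sample = json.dumps({
    "model_schema": {"success": True, "model_schema": {}, "error": None},
    "scrape": {"success": True, "error": None, "data": {"title": "Widget"}},
})
data = extract_scrape_data(sample)
```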