Scrape From Html

curl --request POST \
  --url https://api.example.com/scrape_from_html \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "selector": "<string>",
  "scrape_links": true,
  "scrape_images": false,
  "ignored_tags": [
    "<string>"
  ],
  "only_main_content": false,
  "only_images": false,
  "response_format": "<unknown>",
  "instructions": "",
  "use_link_placeholders": false,
  "frames": [
    {
      "frameUrl": "<string>",
      "frameData": "<string>"
    }
  ]
}
'

Authorizations

Authorization

string

header

required

The access token received from the authorization server in the OAuth 2.0 flow.
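
As a rough Python counterpart to the curl sample, the sketch below builds the same POST request with the bearer token in the Authorization header, but does not send it. The host is the placeholder from the curl sample, and `build_request` is a hypothetical helper name, not part of any SDK.

```python
import json
import urllib.request

# Placeholder endpoint copied from the curl sample; replace with the real host.
API_URL = "https://api.example.com/scrape_from_html"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the bearer token."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "<token>" stands in for a real access token.
req = build_request("<token>", {"scrape_links": True})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without network access.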

Headers

x-notte-request-origin

string | null

x-notte-sdk-version

string | null

Body

application/json

selector

string | null

Playwright selector to scope the scrape to. Only content inside this selector will be scraped.

scrape_links

boolean

default:true

Whether to scrape links from the page. Links are scraped by default.

scrape_images

boolean

default:false

Whether to scrape images from the page. Images are not scraped by default.

ignored_tags

string[] | null

HTML tags to ignore when scraping the page.

only_main_content

boolean

default:false

Whether to only scrape the main content of the page. If True, navbars, footers, etc. are excluded.

only_images

boolean

default:false

Whether to only scrape images from the page. If True, the page content is excluded.

response_format

any | null

The response format to use for the scrape. You can use a Pydantic model or a JSON Schema dict (cf. https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema).
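
To illustrate the JSON Schema option, here is a minimal sketch of a request body whose `response_format` is a hand-written schema dict. The schema fields (`title`, `price`) and the instruction text are made up for the example. With Pydantic v2, a comparable dict can be generated from a model via `model_json_schema()`.

```python
import json

# Hypothetical schema: the field names are illustrative only.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

# Request body combining the schema with a matching instruction.
body = {
    "response_format": product_schema,
    "instructions": "Extract the product title and price.",
}

# Serialized JSON, ready to be sent as the request payload.
payload = json.dumps(body)
print(payload)
```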

instructions

string

default:""

Natural-language description of what should be scraped.

use_link_placeholders

boolean

default:false

Whether to use link/image placeholders to reduce the number of tokens in the prompt and the risk of hallucinations. This feature is experimental and might not work as expected.

frames

FrameData · object[]

Frame data for all frames found in the page. The main frame must be the first entry.
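
The ordering rule for `frames` can be sketched as a small helper that always places the main frame first. The `frameUrl` and `frameData` keys come from the request sample; treating `frameData` as the frame's HTML source is an assumption, and `build_frames` is a hypothetical helper name.

```python
def build_frames(main: tuple[str, str], children: list[tuple[str, str]]) -> list[dict]:
    """Return the frames list with the main frame as the first entry.

    Each tuple is (frame URL, frame data). Using HTML as the frame data
    is an assumption made for this sketch.
    """
    ordered = [main, *children]
    return [{"frameUrl": url, "frameData": data} for url, data in ordered]


frames = build_frames(
    ("https://example.com/", "<html><body><h1>Main</h1></body></html>"),
    [("https://example.com/embed", "<html><body>Embedded</body></html>")],
)

# Request body using the documented defaults for the boolean options.
body = {
    "scrape_links": True,
    "scrape_images": False,
    "only_main_content": False,
    "frames": frames,
}
print(body["frames"][0]["frameUrl"])
```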

Response

Successful Response

model_schema

SchemaGenerationResponse · object

required

Output schema generated by the LLM according to the instructions.

scrape

StructuredData[BaseModel] · object

required

Output from the scrape, validated against the model schema.
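
A minimal sketch of reading a successful response: it checks that both required top-level fields are present and returns them. The child attributes of `model_schema` and `scrape` are not listed here, so the sample uses empty objects, and `split_response` is a hypothetical helper name.

```python
import json


def split_response(raw: str) -> tuple[dict, dict]:
    """Parse a response body and return (model_schema, scrape).

    Raises KeyError if either required top-level field is missing.
    """
    payload = json.loads(raw)
    missing = [key for key in ("model_schema", "scrape") if key not in payload]
    if missing:
        raise KeyError(f"response is missing required fields: {missing}")
    return payload["model_schema"], payload["scrape"]


# Stand-in response body; real responses carry child attributes not shown here.
sample = '{"model_schema": {}, "scrape": {}}'
model_schema, scrape = split_response(sample)
print(type(model_schema).__name__, type(scrape).__name__)
```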
