Observation Response:
- a list of actions that can be taken on the page (e.g. click on a button, scroll, etc.)
- a screenshot of the page (base64 encoded)
- some metadata about the page (title, url, etc.)
# Observe the page
obs = session.observe()
# Select an action from the list of interactive elements on the page
actions = obs.space.interaction_actions
# display the action space as a string to be able to visualize it
print(obs.space.description)
# get the screenshot
screenshot = obs.screenshot.bytes()
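To inspect a screenshot outside of Python, it can be written to disk. The following is a minimal sketch and not part of the SDK: `save_screenshot` is a hypothetical helper, and it assumes the data is either raw image bytes (as returned by `obs.screenshot.bytes()`) or a base64-encoded string.

```python
import base64
from pathlib import Path
from typing import Union


def save_screenshot(data: Union[bytes, str], path: str) -> Path:
    """Write screenshot data to `path` and return the path written.

    Hypothetical helper: accepts raw bytes or a base64-encoded string.
    """
    if isinstance(data, str):
        # A base64 string is decoded into raw image bytes first.
        data = base64.b64decode(data)
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With the observation from above, this could be called as `save_screenshot(screenshot, "screenshots/page.png")`; the `.png` extension is an assumption about the image format.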
Once you have selected an action (either manually or using an LLM), you can execute it with session.execute.
Note that by default, a very simple page perception is used to generate the action space (i.e. perception_type='fast') to keep the query fast.
If you want a more powerful and LLM-ready action space, you can use:
obs = session.observe(perception_type='deep')
print(obs.space.description)
This comes at the cost of a slower query, since it uses an LLM call to format the interactive elements.
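One way to balance the two perception types is to try the fast one first and only pay for the deep one when the result is not good enough. The sketch below illustrates that pattern under stated assumptions: `observe_with_fallback` and `is_sufficient` are hypothetical names, and the `observe` argument is any callable that accepts a `perception_type` keyword, such as `session.observe`.

```python
from typing import Any, Callable


def observe_with_fallback(
    observe: Callable[..., Any],
    is_sufficient: Callable[[Any], bool],
) -> Any:
    """Try the fast perception first; retry with 'deep' only if needed."""
    obs = observe(perception_type="fast")
    if is_sufficient(obs):
        return obs
    # Slower path: the deep perception involves an LLM call.
    return observe(perception_type="deep")
```

A possible call is `observe_with_fallback(session.observe, lambda o: len(o.space.interaction_actions) > 0)`, which assumes `interaction_actions` is a sized collection. What counts as "sufficient" is left to the caller on purpose.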
Additionally, you can use the instructions parameter to narrow the action space down to a specific intent on a website. This is useful if you want to quickly create a workflow using natural language:
_ = session.execute({"type": "goto", "url": "https://console.notte.cc"})
obs = session.observe(instructions="Fill the email input")
action = obs.space.first()
print(action.model_dump())
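The same observe-then-act pattern can be repeated for a list of natural-language steps. The following is a rough sketch rather than an official recipe: `run_workflow` is a hypothetical helper, and it assumes that `session.execute` accepts the dictionary produced by `action.model_dump()`, which the example above does not confirm.

```python
from typing import Any, Dict, Iterable, List


def run_workflow(session: Any, instructions: Iterable[str]) -> List[Dict[str, Any]]:
    """Observe with each instruction, then execute the first suggested action."""
    executed = []
    for instruction in instructions:
        # Narrow the action space to the intent of this step.
        obs = session.observe(instructions=instruction)
        action = obs.space.first()
        payload = action.model_dump()
        # Assumption: execute() accepts the dumped action dictionary.
        session.execute(payload)
        executed.append(payload)
    return executed
```

For instance, `run_workflow(session, ["Fill the email input"])` would perform a single step. A real workflow would likely also need to check each result before moving on to the next instruction.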
Parameters
The minimum number of actions to list before stopping. If not provided, the listing will continue until the maximum number of actions is reached.
The maximum number of actions to list after which the listing will stop. Used when min_nb_actions is not provided.
The URL to observe. If not provided, uses the current page URL.
Additional instructions to use for the observation.
perception_type: typing.Optional[typing.Literal['fast', 'deep']]
Returns
ObserveResponse: The formatted observation result from the API response.
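Because every parameter is optional, it can help to validate values and drop unset ones before calling observe. The sketch below is one way to do that: `build_observe_kwargs` is a hypothetical helper, it covers only the parameter names that appear in this section (instructions, perception_type, min_nb_actions), and the non-negativity check on min_nb_actions is an assumption rather than a documented constraint.

```python
from typing import Any, Dict, Optional

# The two values allowed by the perception_type literal type.
ALLOWED_PERCEPTION_TYPES = ("fast", "deep")


def build_observe_kwargs(
    instructions: Optional[str] = None,
    perception_type: Optional[str] = None,
    min_nb_actions: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate observe() arguments and keep only the ones that are set."""
    if perception_type is not None and perception_type not in ALLOWED_PERCEPTION_TYPES:
        raise ValueError(f"perception_type must be one of {ALLOWED_PERCEPTION_TYPES}")
    # Assumption: a negative action count is never meaningful.
    if min_nb_actions is not None and min_nb_actions < 0:
        raise ValueError("min_nb_actions must be non-negative")
    candidates = {
        "instructions": instructions,
        "perception_type": perception_type,
        "min_nb_actions": min_nb_actions,
    }
    # Unset parameters are omitted so the defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```

The result can then be unpacked into the call, for example `session.observe(**build_observe_kwargs(instructions="Fill the email input", perception_type="deep"))`.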