The observation response contains:
  • a list of actions that can be taken on the page (e.g. click on a button, scroll, etc.)
  • a screenshot of the page (base64 encoded)
  • some metadata about the page (title, url, etc.)
# Observe the page
obs = session.observe()
# Get the list of interactive actions available on the page
actions = obs.space.interaction_actions
# display the action space as a string to be able to visualize it
print(obs.space.description)
# get the screenshot
screenshot = obs.screenshot.bytes()
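The screenshot can also be written to disk for inspection. The following is a minimal sketch, not part of the SDK: `save_screenshot` is a hypothetical helper, and it assumes the screenshot arrives either as raw bytes (as returned by `obs.screenshot.bytes()` above) or as a base64-encoded string.

```python
import base64
from pathlib import Path
from typing import Union


def save_screenshot(data: Union[bytes, str], path: str) -> Path:
    """Write screenshot data to `path` and return the resulting Path.

    Hypothetical helper: raw bytes are written as-is, while a str is
    assumed to be base64-encoded and is decoded first.
    """
    raw = data if isinstance(data, bytes) else base64.b64decode(data)
    target = Path(path)
    target.write_bytes(raw)
    return target
```

Under these assumptions, `save_screenshot(obs.screenshot.bytes(), "page.png")` would store the current page screenshot next to your script.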
Once you have selected an action (either manually or using an LLM), you can execute it with:
session.execute(action)
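When selecting an action manually, a simple keyword match over the action list is often enough. The sketch below is illustrative only: `pick_action` is a hypothetical helper, and it assumes each action exposes a `description` string attribute, which is not confirmed by this page.

```python
from typing import Any, Iterable, Optional


def pick_action(actions: Iterable[Any], keyword: str) -> Optional[Any]:
    """Return the first action whose description mentions `keyword`, else None."""
    needle = keyword.lower()
    for action in actions:
        # `description` is an assumed attribute name; a missing or None
        # value is treated as an empty description.
        text = str(getattr(action, "description", "") or "")
        if needle in text.lower():
            return action
    return None
```

A possible usage, under the same assumption, is `action = pick_action(obs.space.interaction_actions, "email")`, followed by `session.execute(action)` once the result is not `None`.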
Note that by default, a very simple page perception (perception_type='fast') is used to generate the action space, which keeps the query fast. If you want a more powerful, LLM-ready action space, you can use:
obs = session.observe(perception_type='deep')
print(obs.space.description)
Deep perception comes at the cost of a slower query, since it uses an LLM call to format the interactive elements. Additionally, you can use the instructions parameter to narrow down the action space to a specific intent on a website. This is useful if you want to quickly create a workflow using natural language:
_ = session.execute({"type": "goto", "url": "https://console.notte.cc"})
obs = session.observe(instructions="Fill the email input")
action = obs.space.first()
print(action.model_dump())
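One way to balance speed and coverage is to try fast perception first and fall back to deep perception only when nothing is found. This is a sketch under assumptions, not a documented pattern: `observe_with_fallback` is a hypothetical helper, and it assumes `obs.space.interaction_actions` is an empty (falsy) collection when no interactive action is detected.

```python
from typing import Any, Optional


def observe_with_fallback(session: Any, instructions: Optional[str] = None) -> Any:
    """Observe with fast perception; retry with deep perception if no actions are found."""
    obs = session.observe(perception_type="fast", instructions=instructions)
    # Assumption: an empty action list means the fast perception found nothing.
    if obs.space.interaction_actions:
        return obs
    # Slower path: deep perception uses an LLM call to format the elements.
    return session.observe(perception_type="deep", instructions=instructions)
```

The slower deep query is only paid for on pages where the fast perception returns no actions.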

Parameters

min_nb_actions
int | None
The minimum number of actions to list before stopping. If not provided, the listing will continue until the maximum number of actions is reached.
max_nb_actions
int
The maximum number of actions to list; the listing stops once this number is reached. Used when min_nb_actions is not provided.
url
str | None
The URL to observe. If not provided, uses the current page URL.
instructions
str | None
Natural-language instructions that narrow the action space to a specific intent on the page (for example, "Fill the email input").
perception_type
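Literal['fast', 'deep'] | None
The perception type used to build the action space. 'fast' (the default) uses a very simple page perception; 'deep' uses an LLM call to produce a more powerful, LLM-ready action space at the cost of a slower query.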
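The parameters above can be assembled and checked before calling `session.observe`. The sketch below is illustrative: `build_observe_kwargs` is a hypothetical helper, and the rule that `min_nb_actions` must not exceed `max_nb_actions` is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional

_PERCEPTION_TYPES = ("fast", "deep")


def build_observe_kwargs(
    url: Optional[str] = None,
    instructions: Optional[str] = None,
    perception_type: Optional[str] = None,
    min_nb_actions: Optional[int] = None,
    max_nb_actions: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate observe parameters and drop the ones left unset."""
    if perception_type is not None and perception_type not in _PERCEPTION_TYPES:
        raise ValueError(f"perception_type must be one of {_PERCEPTION_TYPES}")
    # Assumed sanity check: a minimum above the maximum can never be satisfied.
    if (
        min_nb_actions is not None
        and max_nb_actions is not None
        and min_nb_actions > max_nb_actions
    ):
        raise ValueError("min_nb_actions cannot exceed max_nb_actions")
    candidates = {
        "url": url,
        "instructions": instructions,
        "perception_type": perception_type,
        "min_nb_actions": min_nb_actions,
        "max_nb_actions": max_nb_actions,
    }
    # Unset values are omitted so the SDK's own defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```

The result can then be unpacked into the call, for example `session.observe(**build_observe_kwargs(perception_type="deep"))`.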

Returns

ObserveResponse: The formatted observation result from the API response.