Agent Configuration

Configure agents with parameters that control their reasoning model, step limits, vision capabilities, and more.

Creating an Agent

Create an agent with configuration parameters:

creating_agent.py

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.0-flash",
        use_vision=True,
        max_steps=15,
        # vault=vault,  # Optional
        # persona=persona,  # Optional
    )
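When optional arguments such as vault or persona are only sometimes available, the keyword arguments can be assembled first and then unpacked into client.Agent. The following is a minimal sketch under that assumption; build_agent_kwargs is a hypothetical helper, not part of the Notte SDK, and it only builds a dictionary.

```python
from typing import Any, Dict


def build_agent_kwargs(session: Any, **options: Any) -> Dict[str, Any]:
    """Collect keyword arguments for client.Agent, skipping options left as None.

    Hypothetical helper for illustration; it does not call the SDK itself.
    """
    kwargs: Dict[str, Any] = {"session": session}
    for name, value in options.items():
        # Only None is dropped, so explicit False (e.g. use_vision=False) is kept.
        if value is not None:
            kwargs[name] = value
    return kwargs
```

It might be used as client.Agent(**build_agent_kwargs(session, max_steps=15, vault=None)), in which case only session and max_steps are passed on.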

Agent Creation Parameters

Parameters set when creating the agent instance.

session

RemoteSession

required

The browser session the agent will use to execute actions. Must be a Notte session instance.

param_session.py

with client.Session(headless=False) as session:
    agent = client.Agent(session=session)

reasoning_model

str

default:"gemini/gemini-2.0-flash"

The large language model used for agent reasoning and decision-making. Supported models include gemini/gemini-2.0-flash, anthropic/claude-3.5-sonnet, anthropic/claude-3.5-haiku, openai/gpt-4o, and openai/gpt-4o-mini.

param_reasoning_model.py

agent = client.Agent(session=session, reasoning_model="anthropic/claude-3.5-sonnet")
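A typo in the model identifier is easy to miss, so it can help to check the string before creating the agent. The sketch below assumes the models named above are the ones you intend to allow; the description says supported models "include" these, so the real list may be longer and a strict check like this could reject a valid model. SUPPORTED_MODELS and resolve_model are hypothetical names, not SDK symbols.

```python
from typing import Optional

# Models named in the reasoning_model description (possibly not exhaustive).
SUPPORTED_MODELS = (
    "gemini/gemini-2.0-flash",
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o",
    "openai/gpt-4o-mini",
)
# Default documented for reasoning_model.
DEFAULT_MODEL = "gemini/gemini-2.0-flash"


def resolve_model(name: Optional[str] = None) -> str:
    """Return the documented default for None, or name if it is in the allow-list."""
    if name is None:
        return DEFAULT_MODEL
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"unknown reasoning model: {name!r}")
    return name
```

The result can then be passed as reasoning_model=resolve_model("openai/gpt-4o").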

use_vision

bool

default:True

Whether to enable vision capabilities for the agent. Vision allows the agent to analyze images, screenshots, and visual page elements. Not all models support vision.

param_use_vision.py

agent = client.Agent(
    session=session,
    use_vision=True,  # Agent can understand images
)

max_steps

int

default:"varies"

Maximum number of actions the agent can take before stopping. Must be between 1 and 50. Higher values allow more complex tasks but increase cost and execution time.

param_max_steps.py

agent = client.Agent(
    session=session,
    max_steps=20,  # Allow up to 20 actions
)
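Because the limit must fall between 1 and 50, a value read from configuration or user input can be checked before it reaches the agent. This is a minimal sketch that takes the bounds from the description above; check_max_steps is a hypothetical helper, not part of the Notte SDK.

```python
MIN_STEPS = 1   # lower bound stated in the max_steps description
MAX_STEPS = 50  # upper bound stated in the max_steps description


def check_max_steps(value: int) -> int:
    """Return value unchanged if it is a valid step limit, otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"max_steps must be an int, got {type(value).__name__}")
    if not MIN_STEPS <= value <= MAX_STEPS:
        raise ValueError(
            f"max_steps must be between {MIN_STEPS} and {MAX_STEPS}, got {value}"
        )
    return value
```

For example, max_steps=check_max_steps(20) passes the value through, while check_max_steps(0) raises ValueError before any session work starts.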

vault

NotteVault

Optional vault instance containing credentials the agent can use for authentication. See Vaults for details.

param_vault.py

vault = client.Vault(vault_id="vault_123")

agent = client.Agent(
    session=session,
    vault=vault,  # Agent can access vault credentials
)

persona

NottePersona

Optional persona providing the agent with phone numbers, email addresses, and other identity information. See Personas for details.

param_persona.py

persona = client.Persona(persona_id="persona_456")

agent = client.Agent(
    session=session,
    persona=persona,  # Agent can use persona information
)

notifier

BaseNotifier

Optional notifier that sends notifications when the agent completes or fails. Useful for long-running tasks.

param_notifier.py

with client.Session() as session:
    # No notifier argument is passed in this example:
    # notifications can be configured in the Notte console
    agent = client.Agent(
        session=session,
    )

Agent Runtime Parameters

Parameters provided when running the agent.

task

str

required

Natural language description of what the agent should accomplish. Be specific and clear for best results.

param_task.py

result = agent.run(task="Find the cheapest laptop under $1000 and add it to cart")

url

str

Optional starting URL for the agent. If not provided, the agent starts from the current page in the session.

param_url.py

result = agent.run(task="Extract pricing information", url="https://example.com/products")

response_format

type[BaseModel]

Optional Pydantic model defining the structure of the agent’s response. Use this to get type-safe, structured output. See Structured Output for details.

param_response_format.py

from notte_sdk import NotteClient
from pydantic import BaseModel


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


client = NotteClient()
with client.Session() as session:
    agent = client.Agent(session=session)
    result = agent.run(task="Extract product information", response_format=Product)

Advanced Configuration

session_offset

int

Experimental. The step number from which the agent reads the session history. If omitted, the agent starts with fresh memory and knows nothing about earlier session actions. Set it to make the agent aware of actions executed before it ran.

param_session_offset.py

# Execute some actions first
session.execute(type="goto", url="https://example.com")
session.execute(type="click", selector="button.search")

# Agent remembers actions from step 0
result = agent.run(task="Continue from where we left off", session_offset=0)

Configuration Examples

Simple Agent

Minimal configuration for basic tasks:

simple_agent.py

with client.Session() as session:
    agent = client.Agent(session=session)
    result = agent.run(task="Find contact email")

Production Agent

Full configuration for production use:

production_agent.py

from notte_sdk import NotteClient

client = NotteClient()

vault = client.Vault(vault_id="prod_vault")
persona = client.Persona(persona_id="prod_persona")

with client.Session(headless=True, proxies=True) as session:
    agent = client.Agent(
        session=session,
        reasoning_model="anthropic/claude-3.5-sonnet",
        use_vision=True,
        max_steps=30,
        vault=vault,
        persona=persona,
    )

    result = agent.run(task="Complete checkout process", url="https://store.example.com/cart")

    if result.success:
        print(f"Order completed: {result.answer}")
    else:
        print(f"Failed: {result.answer}")

Structured Data Extraction

Agent configured for data extraction:

structured_extraction.py

from notte_sdk import NotteClient
from pydantic import BaseModel


class CompanyInfo(BaseModel):
    name: str
    email: str
    phone: str | None
    address: str | None


client = NotteClient()

with client.Session() as session:
    agent = client.Agent(session=session, reasoning_model="gemini/gemini-2.0-flash", max_steps=10)

    result = agent.run(
        task="Extract company contact information", url="https://example.com/contact", response_format=CompanyInfo
    )

    if result.success and result.answer:
        company = CompanyInfo.model_validate_json(result.answer)
        print(f"Company: {company.name}")
        print(f"Email: {company.email}")
        print(f"Phone: {company.phone}")

Best Practices

1. Choose Appropriate Step Limits

Match max_steps to task complexity:

bp_step_limits.py

# Simple task (3-5 actions)
max_steps = 5

# Medium complexity (5-15 actions)
max_steps = 15

# Complex multi-page task (15-30 actions)
max_steps = 30
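The three tiers above can be turned into a lookup that picks the smallest limit covering an estimated action count. This is a sketch only: steps_for and STEP_TIERS are hypothetical names, and treating an estimate above 30 as an error is an assumption made here, not SDK behavior.

```python
# Tier limits taken from the three examples above.
STEP_TIERS = (5, 15, 30)


def steps_for(estimated_actions: int) -> int:
    """Return the smallest tier limit that covers the estimated action count."""
    if estimated_actions < 1:
        raise ValueError("estimated_actions must be at least 1")
    for limit in STEP_TIERS:
        if estimated_actions <= limit:
            return limit
    # Assumption: estimates above the largest tier are flagged, not silently capped.
    raise ValueError(f"estimate {estimated_actions} exceeds the largest tier ({STEP_TIERS[-1]})")
```

An estimate that sits on a tier boundary, such as 5 or 15, resolves to the lower tier, which keeps cost down at the price of less headroom.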

2. Balance Cost and Capability

Use cheaper models for simple tasks:

bp_model_selection.py

# Simple navigation and extraction
reasoning_model = "gemini/gemini-2.0-flash"

# Complex reasoning and decision-making
reasoning_model = "anthropic/claude-3.5-sonnet"

3. Use Vision Selectively

Disable vision when not needed to reduce costs:

bp_vision.py

# Text-only site
agent = client.Agent(session=session, use_vision=False)

# Image-heavy site
agent = client.Agent(session=session, use_vision=True)

4. Provide Context via URL

Start agents at the right page:

bp_url_context.py

# Good - start where needed
agent.run(task="Extract product details", url="https://example.com/product/123")

# Less efficient - agent must navigate first
agent.run(task="Go to product page and extract details", url="https://example.com")

Next Steps

Lifecycle

Learn about agent execution modes

Replay & Debugging

Debug agents with visual replays

Structured Output

Get typed responses from agents

Vaults

Store credentials for agent use
