The Scrape API allows you to get the data you want from web pages using a single call. You can scrape page content and capture its data in various formats.
For detailed usage, check out the Scrape API Reference.
The simplest way to scrape a webpage is to extract its content as markdown. This is useful when you want to preserve the page’s structure and formatting.
simple_scrape.py
```python
from notte_sdk import NotteClient

client = NotteClient()
with client.Session() as session:
    data = session.scrape(
        url="https://www.notte.cc",
        only_main_content=True
    )
    print(data.markdown)
```
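Because the result is plain markdown, standard string tools are enough for light post-processing. The sketch below is not part of the Notte SDK: `extract_headings` is a hypothetical helper, and the `sample` string is an invented stand-in for the scraped markdown. It lists the headings in a markdown string while skipping anything inside fenced code blocks.

```python
import re

# Matches ATX-style headings such as "# Title" or "### Section".
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*\S)[ \t]*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for each heading outside fenced code."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence marker
            continue
        if in_fence:
            continue  # ignore "# comment" lines inside code blocks
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


# Invented sample; in practice you would pass the scraped markdown instead.
sample = "# Notte\n\nIntro text.\n\n## Pricing\n\n- Free\n- Pro\n"
print(extract_headings(sample))  # [(1, 'Notte'), (2, 'Pricing')]
```

A heading outline like this can be a quick check that the scrape kept the page structure you expected.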
For more sophisticated use cases, you can extract structured data from web pages by defining a schema using Pydantic models. This is particularly useful when you need to extract specific information like product details, pricing plans, or article metadata.
Let’s say you want to extract pricing information from a website. First, define your data models, then pass them to the scrape call to extract structured data:
structured_scrape.py
```python
from pydantic import BaseModel
from notte_sdk import NotteClient


class PricingPlan(BaseModel):
    name: str
    price_per_month: int | None = None
    features: list[str]


class PricingPlans(BaseModel):
    plans: list[PricingPlan]


client = NotteClient()
data = client.scrape(
    url="https://www.notte.cc",
    instructions="Extract the pricing plans from the page",
    response_format=PricingPlans
)

# plans is a PricingPlans instance
# > note that the following line can raise an exception
# in case of a scraping error
plans = data.structured.get()
```
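To see what the schema enforces, independent of any network call, you can validate a hand-written payload against the same models. This sketch uses only Pydantic (v2 is assumed, for `model_validate`). The sample dictionaries are invented for illustration and do not come from notte.cc.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class PricingPlan(BaseModel):
    name: str
    price_per_month: Optional[int] = None  # same meaning as `int | None`
    features: list[str]


class PricingPlans(BaseModel):
    plans: list[PricingPlan]


# Invented payload shaped like the schema above.
valid_payload = {
    "plans": [
        {"name": "Free", "price_per_month": 0, "features": ["basic"]},
        {"name": "Enterprise", "features": ["sso", "support"]},
    ]
}

plans = PricingPlans.model_validate(valid_payload)
# The optional field falls back to None when the price is absent.
enterprise_price = plans.plans[1].price_per_month

# A plan without the required `features` field is rejected.
invalid_payload = {"plans": [{"name": "Broken"}]}
try:
    PricingPlans.model_validate(invalid_payload)
    validation_failed = False
except ValidationError:
    validation_failed = True
```

Making `price_per_month` optional lets plans with no listed price still validate, while required fields such as `features` cause a `ValidationError` when missing. Whether `data.structured.get()` raises that same exception type is not stated here, so treat the `try`/`except` as a pattern, not as the SDK's documented behavior.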
Agent scraping goes further than a single scrape call. An agent can navigate through the page, fill in forms, and extract data from dynamic content.
agent_scrape.py
```python
from pydantic import BaseModel
from notte_sdk import NotteClient


class LinkedInConversation(BaseModel):
    recipient: str
    messages: list[str]


client = NotteClient()
vault = client.Vault(vault_id="<your-vault-id>")

with client.Session() as session:
    agent = client.Agent(session=session, vault=vault, max_steps=15)
    response = agent.run(
        task="Go to linkedin.com, log in with the credentials and extract the last 10 messages from my most recent conversation",
        response_format=LinkedInConversation
    )

print(response.answer)
```