Suppose you want to collect the latest shoes listed on Nike.com. With Notte, you can scrape the listing page and get back a list of the shoes it shows.

The scrape method accepts a response_format schema. Passing a Pydantic model as that schema constrains the LLM to return the shoes as structured output in the expected format.

nike_scraper.py
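from notte_sdk import NotteClient
from pydantic import BaseModel

notte = NotteClient()


# Schema for a single product on the listing page.
class ShoppingItem(BaseModel):
    name: str
    price: float
    url: str
    image_url: str | None = None


# Top-level schema: the full list of products.
class ShoppingList(BaseModel):
    items: list[ShoppingItem]


url = "https://www.nike.com/w/mens-shoes-nik1zy7ok"

# Scrape the page and ask the LLM to fill in the ShoppingList schema.
data = notte.scrape(
    url=url,
    response_format=ShoppingList,
    instructions="Get all the latest shoes from Nike",
)

# Retrieve the extracted data from the structured result.
items = data.structured.get()
print(items)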

Here’s the output:

{
    "items": [
        {
            "name": "Nike Air Max Dn8 Men's Shoes",
            "price": 190.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        },
        {
            "name": "Nike Air Max Dn Shoes",
            "price": 160.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        },
        {
            "name": "Nike Air Force 1 Flyknit 2.0 Shoes",
            "price": 120.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        }
        // ... more items ...
    ]
}

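The LLM has returned a list of shoes that follows the ShoppingList schema: each item has a name, a price, and a url, and image_url is null for the items shown. Note that every url in this output is the listing page URL rather than a link to an individual product.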
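
Once the data is in this shape, ordinary Python is enough to post-process it. The sketch below is a minimal illustration, not part of the Notte SDK: it assumes the result has been converted to a plain dict shaped like the output above, and the helper names `cheapest_first` and `missing_images` are hypothetical. The sample payload is copied from the three items shown.

```python
import json

# Sample payload copied from the output above (only the three items shown).
RAW_OUTPUT = """
{
    "items": [
        {
            "name": "Nike Air Max Dn8 Men's Shoes",
            "price": 190.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        },
        {
            "name": "Nike Air Max Dn Shoes",
            "price": 160.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        },
        {
            "name": "Nike Air Force 1 Flyknit 2.0 Shoes",
            "price": 120.0,
            "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
            "image_url": null
        }
    ]
}
"""


def cheapest_first(payload: dict) -> list[dict]:
    """Return the items sorted by ascending price (hypothetical helper)."""
    return sorted(payload["items"], key=lambda item: item["price"])


def missing_images(payload: dict) -> list[str]:
    """Return the names of items whose image_url is null (hypothetical helper)."""
    return [
        item["name"]
        for item in payload["items"]
        if item.get("image_url") is None
    ]


payload = json.loads(RAW_OUTPUT)

# Print the items from cheapest to most expensive.
for item in cheapest_first(payload):
    print(f'{item["name"]}: ${item["price"]:.2f}')

# List the items that came back without an image.
print("Items without an image:", missing_images(payload))
```

A check like `missing_images` is a simple way to spot optional fields the LLM left empty, which may be a sign that the instructions need to ask for them explicitly.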