Let’s say you want to scrape the latest shoes from Nike.com. You can use Notte to scrape the page and get a list of all the shoes.
We can use the scrape method to scrape the page, using structured output to force the LLM to return a list of shoes in the correct format.
from notte_sdk import NotteClient
from pydantic import BaseModel

notte = NotteClient()


# Schema for a single product
class ShoppingItem(BaseModel):
    name: str
    price: float
    url: str
    image_url: str | None = None


# Top-level schema: the scrape should return a list of items
class ShoppingList(BaseModel):
    items: list[ShoppingItem]


url = "https://www.nike.com/w/mens-shoes-nik1zy7ok"

# Scrape the page and request output that matches ShoppingList
data = notte.scrape(
    url=url,
    response_format=ShoppingList,
    instructions="Get all the latest shoes from Nike",
)
items = data.structured.get()
print(items)
Here’s the output:
{
  "items": [
    {
      "name": "Nike Air Max Dn8 Men's Shoes",
      "price": 190.0,
      "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
      "image_url": null
    },
    {
      "name": "Nike Air Max Dn Shoes",
      "price": 160.0,
      "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
      "image_url": null
    },
    {
      "name": "Nike Air Force 1 Flyknit 2.0 Shoes",
      "price": 120.0,
      "url": "https://www.nike.com/w/mens-shoes-nik1zy7ok",
      "image_url": null
    }
    // ... more items ...
  ]
}
You can see that the LLM has returned a list of shoes in the correct format: each item has a name, a numeric price, a url, and an image_url, which is optional and null here.