Scrape any page and get formatted data
The Scrape API lets you get the data you want from a web page in a single call. You can scrape page content and capture its data in various formats. For detailed usage, check out the Scrape API Reference.
Basic Markdown Scraping
The simplest way to scrape a webpage is to extract its content as markdown. This is useful when you want to preserve the page’s structure and formatting.
simple_scrape.py
Structured Data Extraction
For more sophisticated use cases, you can extract structured data from web pages by defining a schema using Pydantic models. This is particularly useful when you need to extract specific information like product details, pricing plans, or article metadata.
Example: Extracting Pricing Plans from notte.cc
Let’s say you want to extract pricing information from a website. First, define your data models, then use these models to extract structured data:
structured_scrape.py
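As a rough illustration of the "define your data models" step, the sketch below declares two Pydantic models for a pricing page and validates a sample payload against them. The field names (`name`, `price_per_month`, `features`) and the sample data are assumptions made for illustration; they are not taken from notte.cc or from the original listing.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class PricingPlan(BaseModel):
    """One plan on a pricing page (field names are illustrative assumptions)."""

    name: str
    # Optional so that "contact us" style plans without a listed price still validate.
    price_per_month: Optional[float] = None
    features: List[str] = Field(default_factory=list)


class PricingPlans(BaseModel):
    """Top-level container: the schema the extraction is expected to fill."""

    plans: List[PricingPlan]


# Hypothetical payload standing in for what a structured scrape might return.
sample = {
    "plans": [
        {"name": "Free", "price_per_month": 0, "features": ["Community support"]},
        {"name": "Enterprise", "features": ["Custom limits"]},
    ]
}

# Validation fails loudly if the payload does not match the schema.
pricing = PricingPlans(**sample)
for plan in pricing.plans:
    print(plan.name, plan.price_per_month, plan.features)
```

A model such as `PricingPlans` is the kind of object that would be supplied as the response format. Wrapping the list in a container model keeps the top-level result a single object instead of a bare list.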
Agent Scraping
Agent Scraping is a more powerful way to scrape web pages. It allows you to navigate through the page, fill forms, and extract data from dynamic content.
agent_scrape.py
Topics & Tips
Scrape API vs Agent Scrape
Scrape API
Perfect for:
1. One-off scraping tasks
2. Simple data extraction
3. Static content
Agent Scrape
Perfect for:
1. Authentication or login flows
2. Form filling and submission
3. Dynamic content
Response Format Best Practices
Use response_format whenever possible to yield the best & most reliable results:
- Try a few different schemas to find what works best
- If you ask for a company_name field but there is no company_name on the page, LLM scraping will fail
- Design your schema carefully based on the actual content structure
- Response format is available for both scrape and agent.run
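To make the company_name point concrete, here is a minimal sketch of how a required field and an optional field differ when the value is missing from the extracted data. It demonstrates Pydantic's own validation only; the model names, the `tagline` field, and the sample payload are hypothetical, and whether the scraping service treats optional fields the same way is an assumption.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class StrictCompany(BaseModel):
    company_name: str  # required: validation fails if the value is absent
    tagline: str


class LenientCompany(BaseModel):
    company_name: Optional[str] = None  # tolerated when the page has no such value
    tagline: str


# Hypothetical extraction result for a page that never mentions a company name.
extracted = {"tagline": "Scrape any page"}

try:
    StrictCompany(**extracted)
    strict_ok = True
except ValidationError:
    strict_ok = False

lenient = LenientCompany(**extracted)
print("strict schema accepted:", strict_ok)
print("lenient company_name:", lenient.company_name)
```

Marking a field optional is one way to design a schema around content that may or may not appear on the page. Fields the page is known to contain can stay required.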