Resolving Bot Detection Issues

Bot detection is a common challenge for web agents. This guide provides strategies to bypass these detection mechanisms using Notte’s stealth configuration options.

Common Bot Detection Challenges:

  • Accessing e-commerce sites with anti-bot measures
  • Scraping content from news or social media platforms
  • Interacting with banking or financial websites
  • Accessing sites with geographic restrictions

Stealth Configuration Strategies

1. Proxy Configuration

Proxies are one of the most effective ways to bypass bot detection. Different proxy configurations can help you appear as legitimate traffic from various locations.

Using Default Proxies

Enable Notte’s built-in residential proxies for better anonymity:

from notte_sdk import NotteClient

notte = NotteClient()

# Start a session with built-in proxies
with notte.Session(proxies=True) as session:
    _ = session.observe(url="https://www.notte.cc/")

Country-Specific Proxies

For sites with geographic restrictions, use proxies from specific countries:

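from notte_sdk.types import NotteProxy
# Reuses the `notte` client from the previous example; proxies takes a list
with notte.Session(proxies=[NotteProxy.from_country("fr")]) as session:
    _ = session.observe(url="https://www.notte.cc/")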

2. Browser Type Selection

Different browsers have varying levels of detection resistance. Experiment with different browser types for your specific use case:

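from notte_sdk import NotteClient

notte = NotteClient()

# Try each browser type in turn and stop at the first one that works
browsers = ["chromium", "chrome", "firefox"]
for browser in browsers:
    try:
        with notte.Session(
            browser_type=browser,
            proxies=True,
            solve_captchas=True,
        ) as session:
            _ = session.observe(url="https://example.com")
        print(f"Success with {browser}")
        break
    except Exception as exc:
        # Any error from this attempt: report it and move on to the next browser
        print(f"Failed with {browser}: {exc}")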

The default browser type, chromium, is also the most easily detected.
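The try-each-browser loop above can also be written as a small reusable helper. This is a sketch under assumptions. The attempt callable is hypothetical: it stands in for whatever opens notte.Session(**config) and decides whether the page loaded without being blocked. It is injected here so that the fallback logic runs without network access.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical signature: attempt(config) returns True when the page loaded
# without being blocked, and returns False (or raises) otherwise.
Attempt = Callable[[dict[str, Any]], bool]


def first_working_config(
    configs: Iterable[dict[str, Any]], attempt: Attempt
) -> Optional[dict[str, Any]]:
    """Return the first config for which attempt() succeeds, or None."""
    for config in configs:
        try:
            if attempt(config):
                return config
        except Exception as exc:  # treat any error as a failed attempt
            print(f"Attempt failed for {config}: {exc}")
    return None


# Candidate Session kwargs, mirroring the loop above
candidates = [
    {"browser_type": b, "proxies": True, "solve_captchas": True}
    for b in ["chromium", "chrome", "firefox"]
]


# Stand-in for a real session: pretend chromium errors and only firefox succeeds
def fake_attempt(config: dict[str, Any]) -> bool:
    if config["browser_type"] == "chromium":
        raise RuntimeError("blocked")
    return config["browser_type"] == "firefox"


print(first_working_config(candidates, fake_attempt))
```

Separating the "which configs to try" list from the "how to try one" callable makes it easy to reuse the same fallback order across sites.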

3. CAPTCHA Solving

Enable automatic CAPTCHA solving for sites that use these challenges:

solve_captchas.py
from notte_sdk import NotteClient

notte = NotteClient()
with notte.Session(
    solve_captchas=True,
    browser_type="firefox",
    headless=False,
) as session:
    # The agent opens the reCAPTCHA demo page and attempts the challenge
    agent = notte.Agent(session=session, max_steps=5)
    resp = agent.run(
        task=(
            "Try to solve the CAPTCHA using internal tools. "
            "If you fail, try to solve it manually."
        ),
        url="https://www.google.com/recaptcha/api2/demo"
    )

Not all CAPTCHA types are supported. Some complex CAPTCHAs may still require manual intervention.

Complete Stealth Configuration Example

Here’s a comprehensive example that combines the techniques above with a custom user agent and viewport size:

stealth_configuration.py
from notte_sdk import NotteClient
from notte_sdk.types import NotteProxy

notte = NotteClient()

# Example stealth configuration.
# This is only one possible configuration, and a fixed set of values is easy to fingerprint:
# rotating these values across sessions improves your chances.
stealth_config = {
    "solve_captchas": True,
    "proxies": [NotteProxy.from_country("us")],
    "browser_type": "chrome",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "viewport_width": 1920,
    "viewport_height": 1080,
}

# Start a session with the stealth configuration
with notte.Session(**stealth_config) as session:
    _ = session.observe(url="https://example.com")
    print("Success with stealth configuration")
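As the comments above suggest, rotating values can be sketched as drawing one value per setting from small pools. The pools below are illustrative assumptions, not values recommended by Notte, and random_stealth_params is a hypothetical helper. It returns a country code rather than a proxy object so that the sketch runs without the SDK. You would convert that code with NotteProxy.from_country before passing the remaining keys to notte.Session.

```python
import random
from typing import Any

# Illustrative pools (assumptions). Keys mirror the Session keyword
# arguments used in the example above, except proxy_country.
BROWSER_TYPES = ["chromium", "chrome", "firefox"]
VIEWPORTS = [(1920, 1080), (1536, 864), (1366, 768)]
COUNTRIES = ["us", "fr", "de"]


def random_stealth_params(rng: random.Random) -> dict[str, Any]:
    """Pick one value per setting; proxy_country is a plain country code."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "solve_captchas": True,
        "browser_type": rng.choice(BROWSER_TYPES),
        "proxy_country": rng.choice(COUNTRIES),
        "viewport_width": width,
        "viewport_height": height,
    }


params = random_stealth_params(random.Random())
country = params.pop("proxy_country")
print(country, params)
# With the SDK, roughly:
# with notte.Session(proxies=[NotteProxy.from_country(country)], **params) as session: ...
```

user_agent is deliberately left out of the rotation. A fixed Chrome user agent paired with a randomly chosen firefox browser_type could itself look inconsistent.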

Troubleshooting Tips

  1. Start Simple. Begin with basic configurations and gradually add complexity:
    • Try proxies=True first
    • Add solve_captchas=True if needed
    • Experiment with different browser_type values
    • Add custom user_agent if still detected
  2. Test Incrementally. Change one setting at a time so you can tell which change made the difference.
  3. Monitor for Patterns. Keep track of which configurations work for different types of sites:
    • E-commerce sites often respond well to residential proxies
    • Social media sites may require specific user agents
    • Banking sites may need country-specific proxies
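The "Start Simple" and "Test Incrementally" advice can be turned into an explicit ladder of configurations, where each configuration adds exactly one setting to the previous one. cumulative_configs is a hypothetical helper, and the step values are examples rather than recommendations. Each resulting dict is meant to be passed as notte.Session(**config) and tested in turn.

```python
from typing import Any


def cumulative_configs(steps: list[tuple[str, Any]]) -> list[dict[str, Any]]:
    """Turn ordered (setting, value) steps into Session kwargs, each config
    adding exactly one setting to the previous one."""
    configs: list[dict[str, Any]] = []
    current: dict[str, Any] = {}
    for key, value in steps:
        current = {**current, key: value}  # new dict, so earlier configs stay unchanged
        configs.append(current)
    return configs


# Order follows the "Start Simple" checklist; values are examples
ladder = cumulative_configs([
    ("proxies", True),
    ("solve_captchas", True),
    ("browser_type", "firefox"),
])
for step, config in enumerate(ladder, start=1):
    print(step, config)
# prints:
# 1 {'proxies': True}
# 2 {'proxies': True, 'solve_captchas': True}
# 3 {'proxies': True, 'solve_captchas': True, 'browser_type': 'firefox'}
```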

Best Practices

  1. Rotate Configurations: Don’t rely on a single configuration; reusing the same settings makes your sessions easier to track.
  2. Monitor Success Rates: Track which configurations work best for each type of site.
  3. Respect Rate Limits: Implement delays between requests to avoid triggering rate limiting
  4. Keep Configurations Updated: Bot detection methods evolve, so regularly test and update your configurations
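Monitoring success rates and pacing requests need only a little bookkeeping. SuccessTracker and polite_sleep below are hypothetical helpers. The site-type labels, configuration labels, and delay values are arbitrary assumptions, so choose delays that suit the site you are working with.

```python
import random
import time
from collections import defaultdict
from typing import Optional


class SuccessTracker:
    """Count attempts and successes per (site type, config label) pair."""

    def __init__(self) -> None:
        self._attempts: dict[tuple[str, str], int] = defaultdict(int)
        self._successes: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, site_type: str, config_label: str, success: bool) -> None:
        key = (site_type, config_label)
        self._attempts[key] += 1
        if success:
            self._successes[key] += 1

    def rate(self, site_type: str, config_label: str) -> float:
        """Fraction of successful attempts; 0.0 if never tried."""
        key = (site_type, config_label)
        attempts = self._attempts.get(key, 0)
        return self._successes.get(key, 0) / attempts if attempts else 0.0

    def best(self, site_type: str) -> Optional[str]:
        """Config label with the highest success rate for this site type."""
        labels = [label for (site, label) in self._attempts if site == site_type]
        if not labels:
            return None
        return max(labels, key=lambda label: self.rate(site_type, label))


def polite_sleep(base_seconds: float = 2.0, jitter_seconds: float = 1.0) -> float:
    """Sleep for base plus random jitter between requests; return the delay used."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    time.sleep(delay)
    return delay


tracker = SuccessTracker()
tracker.record("e-commerce", "residential-us", True)
tracker.record("e-commerce", "residential-us", False)
tracker.record("e-commerce", "firefox-fr", True)
print(tracker.rate("e-commerce", "residential-us"))  # 0.5
print(tracker.best("e-commerce"))  # firefox-fr
```

Calling polite_sleep() between sessions keeps request timing from being perfectly regular, and the tracker's rates show when a configuration has stopped working and needs updating.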