Browser-use in Practice: Letting an AI Agent Browse the Web for Me — An Afternoon of Tinkering

I've been playing with all kinds of AI coding tools lately — Claude Code, Cursor, Codex, you name it. But one thing kept bugging me: no matter how powerful these tools are, they only work in the terminal. The moment I need them to look something up online, fill out a form, or scrape some data, I'm back to doing it myself.

I tried writing Playwright scripts before, but every time I switched to a different website, I had to rewrite a bunch of selector code. Painful. I also tried hooking up browsers to AI Agents via MCP — spent ages configuring stuff with mediocre results. Then I stumbled onto browser-use. 60k+ stars on GitHub, MIT licensed, and the core idea is dead simple: let AI control a browser like a human would.

I spent an afternoon tinkering with it — from installation to running my first automation task, with plenty of pitfalls along the way. This article is my field notes.

What is browser-use?

browser-use is a Python library that lets LLMs control a browser. Give it a task description in plain English (or any language), like "go to GitHub, search for the browser-use project, and tell me how many stars it has." It opens the browser, finds the search box, types the query, reads the results. No selector code, no XPath maintenance — the LLM reads the page and decides where to click.

The architecture looks like this:

text

Python API -> Rust core -> Browser harness -> Browser actions

Yes, you read that right — version 0.13 introduced a Rust core engine. This means browser operations are way more performant and stable than pure Python implementations. I've used other Python-only browser automation libraries before, and they choke the moment a page gets slightly complex. browser-use handles this much better.

It's quite different from Playwright MCP. Playwright MCP is an MCP server that gives AI Agents browser tools (click, type, screenshot, etc.) — you need to write the Agent logic yourself. browser-use is a complete Agent framework. The LLM is the brain, the browser is the hands, and you just tell it what to do.

The gap with traditional tools like Selenium is even bigger. Traditional tools are "tell it which button to click." browser-use is "tell it what task to accomplish, and it figures out the rest." Doesn't sound like much, but in practice it's a completely different world.

Installation: Where I Hit My First Potholes

Installation looks straightforward, but I hit a few snags.

Basic Install

bash

pip install "browser-use[core]"

Or with uv (recommended, much faster):

bash

uv add "browser-use[core]"

There's a prerequisite: Python >= 3.11. My server defaulted to 3.10, and the install just errored out. Took me a while to figure out it was a version issue. If you're stuck on 3.10, use pyenv or uv to switch:

bash

uv python install 3.11
uv venv --python 3.11
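A small guard at the top of a script would have saved me that confusion. This is only a sketch: `python_ok` and `require_python` are names I made up, and the (3, 11) floor is simply the requirement mentioned above.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated above


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    current = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    return current >= minimum


def require_python(version=None, minimum=MIN_PYTHON):
    """Raise a readable error instead of failing later with a cryptic one."""
    if not python_ok(version, minimum):
        found = ".".join(str(part) for part in (version or sys.version_info)[:3])
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {wanted}+ required, found {found}")
```

Calling `require_python()` before importing browser_use turns a failed install or import into a one-line explanation.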

The [core] extra matters a lot. Without it, you only get the Python package without the Rust core runtime, and you'll hit a bunch of cryptic errors when you try to run anything. My first attempt forgot this, and I got a browser_use.core not found error. Spent ages in the issue tracker before figuring it out.

Install the Browser

browser-use uses Playwright under the hood, so you need to install browser binaries:

bash

playwright install chromium

This downloads a Chromium build, roughly 100-200MB. If you're running on a server with no GUI, remember to set headless=True on the BrowserProfile.

My first server run forgot this parameter and threw a Cannot open display error. Took me a while to realize it was the missing headless mode. The error message could really be more helpful here.

Configure the API Key

browser-use supports multiple LLM backends. The easiest is its own ChatBrowserUse:

bash

# .env file
BROWSER_USE_API_KEY=***

You can also use OpenAI, Anthropic, etc.:

bash

OPENAI_API_KEY=***
# or
ANTHROPIC_API_KEY=***

Try ChatBrowserUse first: they say it's optimized for browser tasks. In practice it is noticeably faster than raw GPT-5.5, though pricing is similar.
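Before launching an agent, it helps to confirm which key the process can actually see. The sketch below uses only the standard library. `find_llm_key` is a hypothetical helper, it only knows the three variable names mentioned above, and it assumes the .env file has already been loaded into the environment by some other means.

```python
import os

# Variable names taken from the .env examples above, in order of preference.
KEY_NAMES = ("BROWSER_USE_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def find_llm_key(env=None):
    """Return the name of the first non-empty API key variable, or None."""
    env = os.environ if env is None else env
    for name in KEY_NAMES:
        if env.get(name, "").strip():
            return name
    return None
```

If `find_llm_key()` returns None, no key is set and the agent will fail on its first LLM call, so it is cheaper to stop there.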
 
First Automation Task

Everything's installed. Let's run the simplest possible task:

python

import asyncio

from browser_use.beta import Agent, BrowserProfile, ChatBrowserUse

async def main():
    agent = Agent(
        task="Go to GitHub and search for the browser-use project. Tell me how many stars it has.",
        llm=ChatBrowserUse(model='openai/gpt-5.5'),
        browser_profile=BrowserProfile(
            headless=True,                     # required on servers without a display
            allowed_domains=["*.github.com"],  # keep the Agent on GitHub
        ),
    )
    history = await agent.run()
    print(history.final_result())

if __name__ == "__main__":
    asyncio.run(main())

Once it runs, you'll see the browser (invisible in headless mode, but you can follow the logs) automatically open GitHub, find the search box, type the query, read the page content, and return the result.

The whole process takes about 20-30 seconds depending on network speed and LLM response time. The first time I ran it, I was a bit nervous watching the logs scroll by — it genuinely felt like watching a robot do my work.

Code Breakdown

A few key parameters:

task: The task description in natural language. Be specific. Don't write "help me look something up" — write "go to xxx.com, search for xxx keyword, return the top 5 results with titles and links." Vague descriptions make the Agent flail around wasting tokens.
llm: Which model to use. ChatBrowserUse is browser-use's own optimized model, trained specifically for browser operations. They claim it's 3-5x faster than generic models. You can also use ChatOpenAI, ChatAnthropic, etc.
browser_profile: Browser config. headless=True is mandatory on servers. allowed_domains restricts the browser to specific domains — strongly recommended for safety. Without it, the Agent might wander off to random websites.

Advanced Usage: Custom Tools

The most powerful feature of browser-use is adding custom tools to the Agent. For example, if you want it to save scraped data to a local file:

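python

import asyncio

from browser_use import Tools
from browser_use.beta import Agent, ChatBrowserUse  # same import path as the first example

tools = Tools()

@tools.action(description='Save scraped data to a local file')
def save_to_file(filename: str, content: str) -> str:
    # Overwrites the file if it already exists
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(content)
    return f"Saved to {filename}"

async def main():
    agent = Agent(
        task="Go to the Hacker News homepage, grab the top 10 story titles, save them to hn_titles.txt",
        llm=ChatBrowserUse(model='openai/gpt-5.5'),
        tools=tools,
    )
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())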

Now the Agent can seamlessly switch between browser operations and local file operations. You can hook it up to databases, email, APIs — any Python function can become the Agent's "hands."

I'm currently using this for competitive monitoring: the Agent periodically visits several competitor websites, grabs content, saves it locally, and another script analyzes it. Fully automated — I just look at the results.

Real Scenario: Auto-Filling Forms

browser-use's official demo includes an auto-filling job application example, which I think is pretty representative. In practice, auto-filling forms, placing orders, and submitting data are extremely common needs.

The core approach:

python

agent = Agent(
    task="""
    Go to xxx.com's registration page and fill in:
    - Username: testuser
    - Email: test@example.com
    - Password: TestPass123!
    Do NOT submit. Wait for my confirmation.
    """,
    llm=llm,
    browser_profile=BrowserProfile(headless=False),  # Keep headless off during debugging
)

Pay attention to that "do NOT submit." During debugging, always add safety constraints like this. I once forgot, and the Agent submitted a test form for me. Had to go to the admin panel and delete it manually.

Pitfall Encyclopedia

I hit more pitfalls in one afternoon than I usually do in a week. Here are the key ones:

Pitfall 1: allowed_domains Too Narrow

I initially set allowed_domains to ["github.com"]. When the Agent tried to search, it needed to redirect to api.github.com and got blocked. Changed it to ["*.github.com"] to fix it. The wildcard syntax isn't clearly documented — I had to experiment.
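To reason about which hosts a pattern covers, I find it useful to model the check in a few lines. This is not browser-use's real matcher, just an approximation built on `fnmatch`. It assumes, as the fix above suggests, that "*.github.com" also covers the bare "github.com". `is_allowed` is a name I made up.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url, patterns):
    """Approximate an allowed_domains check; NOT browser-use's real matcher."""
    host = (urlparse(url).hostname or "").lower()
    for pattern in patterns:
        pattern = pattern.lower()
        if fnmatchcase(host, pattern):
            return True
        # Assumption: "*.github.com" also covers the bare "github.com".
        if pattern.startswith("*.") and host == pattern[2:]:
            return True
    return False
```

With `["github.com"]`, the api.github.com host is rejected, which matches the block I hit. With `["*.github.com"]`, both hosts pass while a lookalike such as evilgithub.com does not.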

Pitfall 2: Slow Page Loads

Some sites load really slowly (especially Chinese websites). The Agent gets impatient and starts clicking before the page finishes loading, hitting the wrong element. browser-use has a wait_for_page_load parameter, but the docs aren't super clear on it.

My rule of thumb: 5+ seconds for Chinese sites, 3 seconds for international ones. If the site has heavy AJAX dynamic loading, you might need even more.

Pitfall 3: CAPTCHAs and Anti-Scraping

This is the curse of all browser automation tools. Most major sites have anti-scraping mechanisms that detect automated browsers and throw CAPTCHAs. browser-use doesn't solve this out of the box — you need proxy IPs and browser fingerprint spoofing. The Cloud version has these features, but the open-source version doesn't.

I was testing price scraping on an e-commerce site and got my IP banned after fewer than 10 runs. Had to switch to the Cloud version's proxy to get it working.

Pitfall 4: LLM Misunderstanding

Sometimes the LLM "misreads" page content. A button labeled "Submit" might be interpreted as "submit search" instead of "submit form." This depends on the specific LLM's capability — stronger models like Claude Opus or GPT-5.5 do much better.

In my testing of a complex multi-step form, roughly 30% of attempts had at least one step go wrong. Breaking the task into smaller steps — one action per step — improved the success rate significantly.
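The arithmetic behind that advice: if each step succeeds independently with probability p, a task with n steps succeeds with probability p to the power n. The sketch below works backwards from my 30% figure. The step count of 10 is an assumed number for illustration, and treating steps as independent is a simplification.

```python
def task_success(per_step, steps):
    """Probability that all `steps` succeed, assuming independent steps."""
    return per_step ** steps


def per_step_from_task(task_rate, steps):
    """Per-step success rate implied by an overall task success rate."""
    return task_rate ** (1 / steps)


if __name__ == "__main__":
    # 70% of runs were clean; 10 steps per run is an assumption.
    p = per_step_from_task(0.70, 10)
    print(f"implied per-step success: {p:.3f}")
    print(f"same reliability over 3 steps: {task_success(p, 3):.3f}")
```

Under those assumptions each step is about 96.5% reliable, and a 3-step chunk succeeds roughly 90% of the time. Smaller chunks don't make any single step smarter, but a failure only costs a retry of that chunk instead of the whole run.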

Pitfall 5: Memory Usage

Chromium is notorious for eating memory. I tried running it on a 2GB server and got an OOM kill. Recommend at least 4GB RAM, 8GB+ for complex tasks.

If you run multiple Agents simultaneously, each one spins up its own browser instance, multiplying memory usage. My current approach: use a queue to limit concurrency to 2 Agents max.
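I describe it as a queue, but the simplest way I know to get the same cap in asyncio is a semaphore. A minimal sketch: `fake_agent_run` is a stand-in for `await agent.run()` so the example runs without a browser, and all helper names are my own.

```python
import asyncio

MAX_AGENTS = 2  # cap from the text above


async def run_limited(jobs, limit=MAX_AGENTS):
    """Run coroutine factories with at most `limit` in flight at once."""
    gate = asyncio.Semaphore(limit)

    async def guarded(job):
        async with gate:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_agent_run(name, tracker):
    """Stand-in for `await agent.run()`; records peak concurrency."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)
    tracker["active"] -= 1
    return f"{name} done"


def demo():
    tracker = {"active": 0, "peak": 0}
    jobs = [lambda n=n: fake_agent_run(f"task-{n}", tracker) for n in range(5)]
    results = asyncio.run(run_limited(jobs))
    return results, tracker["peak"]
```

Passing factories (the lambdas) instead of ready-made coroutines means no job starts, and no browser launches, until it holds a slot.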

Pitfall 6: Async Programming Gotchas

browser-use's API is entirely async, built on asyncio. If your main program is synchronous, you need to drive it from an event loop:

python

import asyncio
 
async def my_task():
    agent = Agent(...)
    return await agent.run()
 
# Call from synchronous code
result = asyncio.run(my_task())

Don't call asyncio.run() inside an already-running event loop: it raises a RuntimeError. If you're in a Jupyter Notebook, just use await directly, since the notebook already runs an event loop.
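If the same helper might be called from both a plain script and a notebook, you can check for a running loop first. This sketch uses only the standard library. `run_sync` and `in_running_loop` are hypothetical names, and `my_task` here is a stand-in that doesn't touch browser-use.

```python
import asyncio


def in_running_loop():
    """True when called from code that is already inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_sync(coro_factory):
    """Run a coroutine from synchronous code, refusing to nest event loops."""
    if in_running_loop():
        raise RuntimeError("already inside an event loop; use `await` instead")
    return asyncio.run(coro_factory())


async def my_task():
    await asyncio.sleep(0)  # stand-in for `await agent.run()`
    return "ok"
```

`run_sync(my_task)` takes the function rather than `my_task()` so that no coroutine object is created, and left un-awaited, when the check fails.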

Multi-Step Task Orchestration

browser-use supports multi-step task orchestration. You can break a complex task into smaller steps:

python

from browser_use.beta import Agent, BrowserProfile, ChatBrowserUse
import asyncio
 
async def main():
    llm = ChatBrowserUse(model='openai/gpt-5.5')
    profile = BrowserProfile(headless=True, allowed_domains=["*.github.com"])
    
    # Step 1: Search
    agent1 = Agent(
        task="Search GitHub for the browser-use project",
        llm=llm,
        browser_profile=profile,
    )
    history1 = await agent1.run()
    
    # Step 2: Get details
    agent2 = Agent(
        task="Open the browser-use repo README and find the install command",
        llm=llm,
        browser_profile=profile,
    )
    history2 = await agent2.run()
    
    print(history2.final_result())
 
if __name__ == "__main__":
    asyncio.run(main())

Each step can use results from the previous one, for example by folding history1.final_result() into the next task description. Logic is cleaner, and errors are easier to locate.

I previously tried doing everything in one shot with complex tasks — the Agent frequently got lost mid-way. Breaking things into smaller steps improved success rates considerably.
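The two-agent example doesn't actually hand the first result to the second task, so here is the chaining pattern on its own. It's a sketch with a stub: `run_step` stands in for creating an Agent, running it, and reading `final_result()`, and the `{previous}` placeholder is my own convention, not a browser-use feature.

```python
import asyncio


async def run_step(task):
    """Stand-in for running an Agent on `task` and reading its final result."""
    await asyncio.sleep(0)
    return f"result of: {task}"


async def run_pipeline(templates):
    """Run steps in order; each template may reference {previous}."""
    previous = ""
    results = []
    for template in templates:
        task = template.format(previous=previous)
        previous = await run_step(task)
        results.append(previous)
    return results
```

One caveat: because the templates go through `str.format`, a task description containing literal braces would need them doubled.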

Error Recovery and Debugging Tips

browser-use's Agent has built-in error recovery. If a step fails (element not found, page load timeout), it retries automatically. But sometimes retries don't help and you need to step in.

For debugging, I recommend starting with headless mode off so you can watch the browser window. You'll see exactly what the Agent is doing and where it goes wrong.

The history object also contains a complete operation log including screenshots and LLM decision-making for each step. Great for post-mortems:

python

history = await agent.run()
 
# Inspect each step
for step in history.steps():
    print(step.action)      # What it did
    print(step.screenshot)  # Screenshot path
    print(step.llm_output)  # LLM's reasoning

This came in handy when my Agent got stuck on a form page. Looking at the log, I realized it had confused a "province" dropdown with a "city" dropdown.

How It Differs from Playwright MCP

Since I've written about MCP before, people keep asking how browser-use compares to Playwright MCP. Short version:

Playwright MCP is an MCP server that provides browser tools (click, type, screenshot, etc.) to AI Agents. You write the Agent logic yourself. Good if you're already in the MCP ecosystem.

browser-use is a complete browser Agent framework. It has built-in Agent logic, task planning, and error recovery. You just describe the task and it handles the rest. Good for quick browser automation.

They're not mutually exclusive — you can even use both together. For example, you could use Playwright MCP inside Claude Code, and use browser-use for standalone automation scripts.

Which one to pick? Depends on your scenario:

Already using Claude Code / Cursor and want to add browser capabilities → Playwright MCP
Want a standalone browser automation script → browser-use
Want to quickly validate an idea → browser-use (faster to get started)
Need fine-grained control over every step → Playwright MCP (more flexible)

CLI Mode: No Code Required

browser-use also has a CLI mode for operating the browser without writing Python:

bash

browser-use open https://github.com    # Open a URL
browser-use state                       # See clickable elements
browser-use click 5                     # Click element #5
browser-use type "Hello"                # Type text
browser-use screenshot page.png         # Take screenshot
browser-use close                       # Close browser

The CLI keeps the browser running between commands, so you don't have to restart it each time. Super convenient for debugging.

I personally prefer the Python API for complex logic, but CLI mode is great for quick testing — like checking whether the Agent correctly understands a page layout.

Claude Code Integration

browser-use provides a Claude Code Skill for browser access right inside Claude Code:

bash

mkdir -p ~/.claude/skills/browser-use
curl -o ~/.claude/skills/browser-use/SKILL.md \
  https://raw.githubusercontent.com/browser-use/browser-use/main/skills/browser-use/SKILL.md

Once installed, Claude Code can directly control the browser. Writing frontend code and want to verify the result? Let it open the browser and check for you.

I tried having it debug a CSS layout issue — it opened the browser, took a screenshot, analyzed the layout, and gave me fix suggestions. Fully automated. Way faster than me manually poking around in DevTools.

Cost Considerations

browser-use itself is free and open source, but LLM calls cost money. A single browser task typically requires 5-20 LLM calls depending on complexity. At GPT-5.5 pricing, that's roughly $0.01-$0.1 per task.

For high-volume use (hundreds of tasks per day), consider browser-use's own bu-* models — they claim better cost-efficiency. Or use a local model via Ollama, though quality will drop.

I currently run about 20-30 automation tasks per day using ChatBrowserUse's bu-2-0 model, costing roughly $10-15/month. About half what I was paying with GPT-5.5.
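To sanity-check numbers like these for your own volume, the estimate is just tasks per day times cost per task times days. A small sketch using the figures quoted above; the 30-day month and the function names are my assumptions.

```python
def monthly_cost(tasks_per_day, cost_per_task, days=30):
    """Estimated monthly LLM spend in dollars."""
    return tasks_per_day * cost_per_task * days


def cost_range(tasks_low, tasks_high, price_low, price_high, days=30):
    """(min, max) monthly spend for a range of volumes and per-task prices."""
    return (
        monthly_cost(tasks_low, price_low, days),
        monthly_cost(tasks_high, price_high, days),
    )


def implied_cost_per_task(monthly_bill, tasks_per_day, days=30):
    """Per-task cost implied by a monthly bill at a given daily volume."""
    return monthly_bill / (tasks_per_day * days)
```

At 20-30 tasks a day and $0.01-$0.10 per task, `cost_range(20, 30, 0.01, 0.10)` spans roughly $6 to $90 a month, so the per-task price dominates. My $10-15 bill works out to somewhere between about one and two and a half cents per task.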

Common Questions

Q: How is browser-use different from Selenium?

Selenium is a traditional browser automation tool — you write CSS selectors and XPath to locate elements. browser-use uses an LLM to read the page and accepts natural language task descriptions. Selenium is better for structured, stable test automation; browser-use is better for varied scenarios that need "understanding."

Q: Can I use a free LLM?

You can run local models via Ollama, but quality drops significantly. browser-use's operations depend on the LLM's ability to understand web pages, and smaller models get things wrong frequently. At minimum, use something at the Claude Sonnet or GPT-4o level. Honestly, the money you save on API costs gets eaten up by failed tasks and retries.

Q: Which browsers are supported?

Chromium by default (via Playwright). Theoretically all Playwright-supported browsers work, but Chromium is the most thoroughly tested.

Q: Can it handle iframes?

Yes. browser-use automatically detects iframes and operates inside them. Cross-origin restrictions on iframes might cause issues though.

Q: What if a task fails?

browser-use has built-in retry logic. If it keeps failing, try breaking the task into smaller steps or adding more specific constraints. Debugging with headless=False so you can watch the browser is the most intuitive approach.

My Actual Use Cases

After that afternoon of tinkering, here's what I'm currently using browser-use for:

Automated ranking checks: Periodically search Google for my article keywords to see where they rank. Used to do this manually — now a script runs every morning and sends results to Telegram.
Competitive monitoring: Regularly scrape competitor websites for updates, save locally for analysis. I tried doing this with requests + BeautifulSoup before, but many sites use dynamic loading and the content wouldn't render. browser-use uses a real browser, so it sees everything.
Batch test account registration: When developing, I need lots of test accounts. Used to register them manually — exhausting. Now the Agent does it. Just watch out for CAPTCHAs — some sites throw them up and the Agent can't handle them.

Honestly, the best part isn't the time saved — it's the feeling of "I finally don't have to do this boring stuff myself." These repetitive web tasks used to mean either manual labor or maintaining a pile of Selenium scripts. Now I just describe what I want in plain English.

What's Next

I'm planning to explore a few directions:

Integrate browser-use with n8n for a complete automation workflow. n8n handles scheduling and notifications, browser-use handles browser operations. This combo should cover most automation scenarios.
Try local models (Qwen 3 or DeepSeek) as replacements for GPT to see if the quality is acceptable. The big win would be zero API costs, though speed and accuracy might suffer.
Research anti-scraping bypass techniques. Sensitive topic, but some legitimate use cases genuinely need it — like monitoring your own website's ranking changes.

Drop questions in the comments. browser-use updates fast — keep an eye on the GitHub release notes. New features land frequently. If this article helped you, or if you're also tinkering with browser automation, share your experience in the comments. Debugging alone is lonely — it's more fun when we all stumble together.

Written June 16, 2026. browser-use version 0.13.x. If API changes happen after this writing, defer to the official docs. The browser automation field moves fast, and tutorials from six months ago might already be outdated, so always check the latest documentation.
