In my last post, I mused about buying Claude a Mac Mini. Well: I actually did it. I have a beautiful, shiny, silver box on my desk, I can connect to it from my phone or my laptop from anywhere, and let Claude Code go wild. Unfortunately, though, I don't find myself taking advantage of this very often. Instead, when people ask me what I'm doing with it, I sheepishly reply, "Not much yet... but I'm sure I'll think of something." Why haven't I thought of anything yet? Why, in the Year of our Agents, 2025, do browser agents not feel that useful?
The capabilities are basically there. Models that are intended to be visually grounded, are. Claude, Gemini, and many wonderful open-source Chinese models can look at a UI and tell you the coordinates of an element. (Weirdly, OpenAI refuses to make its API models visually grounded, so you get funny results like GPT-4o having a ScreenSpot score of 2%.) Long-context inference allows you to keep a long trajectory of screenshots and actions in context, and caching allows that context to grow long without inference slowing to a crawl. Reasoning allows models to plan how to solve a task, and pivot when something isn't working.
And yet, the Mac Mini is just sitting there. In this post, I'll ramble a bit about why I think that is. But first I'll share a bit about my setup.
The Unboxening
The first things I did when I opened up the new Mac Mini (in no particular order) were: install uv, install Claude Code, install Tailscale. And of course, set up a bunch of system preferences to make remote login work. Then, I got to work making tools to let Claude Code use the computer.
This turned out to be slightly more annoying than anticipated. Macs are somewhat hostile to automation and remote control, probably because of Apple's emphasis on privacy. I started with cliclick as my automation driver, but soon had to add some things from pyautogui for scrolling, and even a custom screenshot command that simulates a Cmd+Shift+4 keypress, since terminal-based screenshots on Mac don't capture windows.
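For a rough idea of what thin wrappers around these drivers can look like, here is a minimal sketch. It assumes cliclick is installed and on the PATH, and that pyautogui is available for scrolling. The function names are hypothetical and this is not the exact tooling described above; the custom screenshot command is left out entirely.

```python
import subprocess


def click_args(x: int, y: int) -> list[str]:
    """Build the cliclick invocation for a left click at screen position (x, y)."""
    return ["cliclick", f"c:{x},{y}"]


def type_args(text: str) -> list[str]:
    """Build the cliclick invocation for typing a string."""
    return ["cliclick", f"t:{text}"]


def run(args: list[str]) -> None:
    """Execute a prepared command; raises CalledProcessError if it fails."""
    subprocess.run(args, check=True)


def scroll(clicks: int) -> None:
    """Scroll with pyautogui (third-party); positive values scroll up."""
    import pyautogui  # imported lazily so the builders above work without it

    pyautogui.scroll(clicks)
```

On the Mac itself, something like run(click_args(120, 340)) would perform the click. Keeping the command builders separate from run() makes them easy to check without touching the mouse.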
Eventually, though, the tools were serviceable. I could start Claude Code in a terminal, it could click around, open Chrome, and do some stuff. But like. What stuff? Find me a travel destination? Look for a puppy to adopt in San Francisco? (Man, I actually understand why the AI labs struggle to demo anything other than travel and ordering food.)
Why Computer-Use Agents Aren't Usually the Vibe
Obviously, if you have a truly general agent, there's all manner of tasks you could hand off to it. Answer all my emails. Go scam some grandmas out of their retirement savings. Find me a boyfriend. If you don't feel inclined to hand these tasks off, it means you either don't trust the agent to do it well, or it's literally easier to just do it yourself.
And, kidding aside, there actually are a lot of annoying, repetitive tasks that I do in the browser that I'd have thought would be great candidates for automation with agents. For example: open a legacy data portal, search for every date from January 1, 2001 to today, and download all the records for each date into one CSV. Unfortunately, I think many repetitive browser tasks, including this one, fall into one of two buckets: they're either simple enough that doing them with an agent feels slow and expensive for no reason, or they require enough judgment that you still want to supervise the task, and if you're supervising it, you may as well just do it.
In these cases, I find myself reaching for Playwright automations and vibe-coded browser extensions instead. Playwright doesn't really require an explanation or introduction, so I won't waste space here on that. Suffice it to say that if you want to scrape 10,000 webpages that all work the same way, it is easier to bang your head against an AI one time to get a working Playwright script than it is to bang your head against an AI 10,000 times, once for each page you want to scrape.
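To make the comparison concrete, here is a sketch of the legacy-portal task from earlier as a Playwright script. Everything specific to the portal is an assumption: the URL, the #date, #search, and #results selectors, and the idea that results render as an HTML table are all hypothetical placeholders, so a real script would need the real page structure.

```python
import csv
from datetime import date, timedelta
from typing import Iterator

PORTAL_URL = "https://example.com/legacy-portal"  # hypothetical URL


def each_day(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def scrape(start: date, end: date, out_path: str) -> None:
    """Search the portal once per day and append every result row to one CSV."""
    # Third-party dependency, imported lazily so each_day() works without it.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p, open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        browser = p.chromium.launch()
        page = browser.new_page()
        for day in each_day(start, end):
            page.goto(PORTAL_URL)
            page.fill("#date", day.isoformat())  # hypothetical selector
            page.click("#search")                # hypothetical selector
            page.wait_for_selector("#results")   # hypothetical selector
            # Pull the text of every cell in every result row.
            rows = page.eval_on_selector_all(
                "#results tr",
                "rows => rows.map(r => Array.from(r.cells).map(c => c.innerText))",
            )
            for row in rows:
                writer.writerow([day.isoformat(), *row])
        browser.close()
```

Calling scrape(date(2001, 1, 1), date.today(), "records.csv") would walk the whole range. The sketch skips retries and the case where a date has no results, which is exactly the kind of thing you sort out once in a script rather than once per run with an agent.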
A Non-Exhaustive List of my Awesome Extensions
Playwright is good for full-auto mode, but there's a lot of stuff I do in the browser where I still want to be "in the loop," making judgment calls about what pages to open, what files to download, and so on. In these cases, I have found browser extensions to be a powerful way to "scale myself" to get more done with less effort. Each of these extensions was created in 5-20 minutes with AI, a paradigmatic example of what the LinkedIn Chattering Class would call "disposable software."
- Open Links in Tabs: Paste a list of links into a text box, and they all open in the browser. (Or, open all links on the current page in new tabs.) No residential proxy needed!
- Export Tabs as JSONL: The HTML of every open tab is saved to disk as a JSONL file. I use this to save contents after using Open Links in Tabs.
- Download All PDFs: Downloads all PDFs currently open in the browser.
- Close Duplicate Tabs: Self-explanatory.
- Download Tables: Download each table in each open tab as a CSV.
- And my personal favorite: Switch to o3, which automatically adds ?model=o3 to the URL whenever I visit ChatGPT, since for a while OpenAI was defaulting to 4o regardless of my last-used model.
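The logic behind most of these is tiny. As an illustration, here is the URL rewrite at the heart of something like Switch to o3, sketched in Python for readability (a real extension would do this in JavaScript). The function name is hypothetical, and replacing an existing model parameter rather than just appending one is an assumption about the desired behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_model(url: str, model: str = "o3") -> str:
    """Return url with its 'model' query parameter set to the given model."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous 'model' value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "model"
    ]
    query.append(("model", model))
    return urlunsplit(parts._replace(query=urlencode(query)))
```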
These sound simple, and they are! I vibe-coded them in 15 minutes! But they're also very powerful, and slot into whatever I'm already doing, relieving cognitive load instead of adding to it. Telling an agent in painstaking detail EXACTLY what you want feels like adding. Not to mention figuring out how to spin up a bunch of agents, deal with errors and retries, collect all the results, parse malformed JSON... I can feel my cortisol spiking just writing this. No thanks!
Now, whenever I feel frustration at repetitive browser tasks, I step back and consider whether I can make a tool that allows me to scale my own effort, instead of trying to hand the task off.
So... What are Browser Agents Good For?
This post is not intended to be a rant against computer-use agents. After all, I bought the Mac Mini because I'm bullish on the idea of having a little servant in a box do stuff for me. I'm finding that many repetitive tasks are better automated a different way. Playwright is great for closed-ended tasks. Agents SHOULD be preferred for open-ended tasks, but it feels like the current barrier is trusting them enough to relinquish control. I think the calculus will change when:
- (1) Agents are more reliable/trustworthy. The feeling that it's going to be done wrong and you'll just have to re-do it yourself prevents people from even trying.
- (2) It's easier to manage them. Claude Code is a great harness for coding; less so for computer use, even if it can be hacked to work. I think the CUA harness of the future remains to be built, although there are promising projects like Magnitude and browser-use working on exactly this. OpenAI Agent also lowers the activation energy of spawning agents, which is a move in the right direction.
If you've found a knock-down case for using a browser agent to automate something in your life, I'd love to hear about it. Not because I'm working on this professionally—I'm just curious. You can find me on Twitter: @andersonbcdefg.


