87 PRs in One Weekend: How I Went from Copy-Pasting AI Output to Running an Agent Fleet
I woke up Saturday morning to 14 pull requests I didn’t write.
They were real PRs. Typed commits, passing builds, coherent diffs. One added Supabase Auth to HydrantMap. Another restructured PermitRadar’s routing from flat municipality paths to nested county/municipality paths. A third bolted on a CSV import pipeline for municipal hydrant data. All opened between 1am and 7am while I was asleep.
By Sunday night, the total was 87 PRs across two projects. 86 merged. 204 commits. That’s 29 PRs per day, or roughly 1 PR merged every 50 minutes, around the clock.
Two months ago, I was copy-pasting code from ChatGPT into my editor at 1.6 PRs per day.
Here’s how I got from there to here, and what broke along the way.
Phase 1: The chatbot era (Jan 3-10)
I was building EaglesMegaTour, an Eagles-themed browser game, right before the NFL playoffs. The tool was OpenAI Codex through ChatGPT. The workflow: describe what I wanted, copy the output, paste it into my editor, test it, fix what broke, commit.
51 commits in 7 days. 11 PRs, 8 from Codex. 1.6 PRs per day.
The commit messages from this era tell you everything: “bugfix”, “fixes”, “speed changes”, “mobilespeed”, “dsf”. That last one isn’t a typo. I just didn’t care enough to type a real message because I was already back in the chat window asking for the next thing.
Codex was a faster Stack Overflow. Better answers, less tab-switching. But I was still the one doing all the work. Every line of code passed through my hands, my clipboard, my editor. The AI had no memory, no context, no idea what the project looked like 5 minutes ago.
Phase 2: The co-pilot era (Jan 25 to Feb 19)
I switched to Claude Code in interactive mode and stopped shipping product entirely. For almost a month, I built infrastructure.
dotfiles: 28 commits, 17 AI co-authored. I set up a CLAUDE.md context system, project docs, symlinks, SSH configs. This was me teaching Claude how to work with me across machines and repos.
n8n-multi-tenant-setup: 13 commits, 8 AI co-authored. VPS automation, n8n workflows, Slack alerts, PDF monitoring for local government.
Life-Dashboard: 12 commits, 9 AI co-authored. Built a full React dashboard connected to n8n. Went from a LCARS Star Trek theme to a clean modern UI in one session.
PRs per day across all 3 projects: about 0.1. Almost everything was direct commits because Claude and I were pair programming. I was driving. Claude was in the passenger seat suggesting turns.
This phase looks like a waste on a velocity chart. It wasn’t. I was wiring up the context chain (CLAUDE.md files pointing to project docs pointing to conventions) that would let Claude understand my projects without me explaining them every session. Think of it as onboarding a new hire. The first month is slow. That’s the point.
Phase 3: The delegator era (late Feb to early Mar)
Two changes happened at once: I started filing GitHub issues instead of typing instructions into a terminal, and I built claude-agent-bootstrap, a setup.sh that drops into any repo and generates a CI pipeline, CLAUDE.md with agent rules, autonomous loop prompts, and permissions.
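To make that concrete, here is roughly the shape of what a bootstrap script like setup.sh drops into a repo. The file paths mirror Claude Code's conventions as I understand them; the exact layout and the placeholder contents are my illustration, not the actual claude-agent-bootstrap output.

```shell
# Hypothetical sketch of a repo bootstrap. Paths follow Claude Code
# conventions (CLAUDE.md, .claude/commands, .claude/settings.json);
# the real setup.sh generates full contents, not empty files.
bootstrap() {
  local repo=$1
  mkdir -p "$repo/.github/workflows" "$repo/.claude/commands"
  touch "$repo/CLAUDE.md"                 # agent rules + project context
  touch "$repo/.github/workflows/ci.yml"  # CI pipeline
  touch "$repo/.claude/commands/loop.md"  # autonomous loop prompt
  touch "$repo/.claude/settings.json"     # tool permissions
}
```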
The label system: claude-ready (backlog), claude-wip (claimed), claude-blocked (stuck). Plus a TODO(@claude) convention for inline micro-tasks in source code.
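The TODO(@claude) convention works because it is just a greppable marker. A loop's scan step can be as simple as this sketch; the function name is mine, not the actual worker's.

```shell
# Minimal sketch of the TODO(@claude) scan step: print "file:line:text"
# for every inline micro-task marker under a directory. The real worker
# then files a GitHub issue per hit; that part is omitted here.
scan_todos() {
  grep -rn "TODO(@claude)" "$1" 2>/dev/null
}

# A marker in source code looks like:
#   // TODO(@claude): extract this into a shared helper
```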
About 0.5 PRs per day. Still running sessions manually. But the shape of my work changed. I was filing issues, writing acceptance criteria, prioritizing a backlog. I was managing, not building.
Phase 4: The fleet era (Mar 7-9, one weekend)
Friday night I filed a batch of issues across HydrantMap and PermitRadar, set up 5 autonomous /loop commands in Claude Code, and went to bed.
The 5 loops:
- TODO Worker (every 15 min): scans for TODO(@claude) comments, files issues, opens PRs
- Lint Guardian (every 30 min): runs linting, fixes violations, opens PRs
- Build Watchdog (every 30 min): runs the build, fixes failures, opens PRs
- PR Comment Responder (every 15 min): reads review comments, pushes fixes
- Code Quality Sweep (every 1 hour): finds code smells, refactors, opens PRs
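Mechanically, each loop is just "run a worker, sleep, repeat." Here is a minimal shell sketch of that scheduling shape. The real loops are Claude Code /loop commands; the headless `claude -p` invocation in the usage comment is an assumption about one way to wire it up, not my actual setup.

```shell
# Minimal loop scheduler sketch: run a worker command on a fixed
# interval, bounded by a max iteration count so it can be tested.
run_loop() {
  # usage: run_loop <interval_seconds> <max_iterations> <command...>
  local interval=$1 max=$2; shift 2
  local i=0
  while [ "$i" -lt "$max" ]; do
    "$@" || true          # a failing worker must not kill the loop
    i=$((i + 1))
    [ "$i" -lt "$max" ] && sleep "$interval"
  done
}

# e.g. (hypothetical headless invocation) run the TODO Worker every
# 15 minutes, more or less forever:
#   run_loop 900 999999 claude -p "$(cat .claude/commands/todo-loop.md)" &
```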
87 PRs in 3 days. 86 merged. 29 PRs per day.
HydrantMap went through 37 PRs (all merged). On Friday it was a simple static Leaflet map with hydrant markers. By Sunday: full multi-tenant SaaS with Supabase Auth, user profiles, avatars, login/signup, hash router, admin dashboard, verification queue, CSV import, leaderboard, photo capture, dark mode, mobile responsive, invite codes for municipalities, and a vitest suite covering 45 tests. Auth, profiles, admin, moderation, import, social features, test infrastructure, and responsive design. One weekend.
PermitRadar went through 50 PRs (49 merged). Started as a Camden County-only app with hardcoded references. By Sunday: multi-county support with Gloucester County fully integrated, route restructure from /[municipality] to /[county]/[municipality], a new GDB-to-GeoJSON data pipeline, SODA API verification, parameterized refresh scripts, county selector, dynamic text, error boundaries, a GitHub Actions CI pipeline, vitest test suite with 43 tests, aria-label accessibility sweep, and an editorial visual redesign with percentile range charts and stacked bar visualizations.
My role for the entire weekend: file issues Friday night, review PRs Saturday and Sunday. That’s it.
What broke
A lot.
Branch sprawl. Lint Guardian created 10 duplicate branches for the same fix. Build Watchdog created 5. Code Quality Sweep created 22. Each loop would detect the same issue, not realize another loop already had a PR open for it, and spin up its own branch. I had to add dedup rules so loops check for existing branches before creating new ones.
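The fix boiled down to a check-before-create rule, something like this sketch. The branch naming scheme is illustrative, not the loops' actual convention.

```shell
# Dedup rule sketch: before a loop creates a fix branch, check whether
# any local or remote-tracking branch already matches the issue.
branch_exists() {
  git branch --all --list "*$1*" | grep -q .
}

ensure_branch() {
  local name=$1
  if branch_exists "$name"; then
    echo "skip: branch matching $name already exists"
  else
    git switch -c "$name"
  fi
}
```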
The MCP-first lesson. The agent kept generating SQL migration scripts and posting them in PR descriptions with a note saying “run this in Supabase.” It had a Supabase MCP tool that could execute SQL directly. It just… didn’t use it. I had to add explicit rules: you execute, I review. Never insert a human step that the agent can do itself. Every time you write “please run this,” you’ve failed.
Epic branches. The Gloucester County expansion touched routing, data pipelines, UI components, and test fixtures. 15 PRs all targeting main created a merge conflict nightmare. Big features need a parent branch with sub-tasks PRing against it. I learned this the expensive way.
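The pattern that fixed it, sketched locally with plain git (branch names illustrative): sub-task branches merge into the epic branch, and only the epic branch merges into main.

```shell
# Parent-branch ("epic") pattern sketch. In practice each merge below
# happens via a PR; sub-task PRs target the epic branch, and only the
# epic PRs against main.
start_epic() {
  git switch -c "$1" main
}

start_subtask() {
  # $1 = epic branch, $2 = sub-task branch
  git switch -c "$2" "$1"
}

land_subtask() {
  # merge the sub-task back into the epic, never into main directly
  git switch "$1" && git merge --no-ff -m "merge $2" "$2"
}
```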
The math
| Phase | Period | PRs/day | My role |
|---|---|---|---|
| Chatbot | Jan 3-10 | 1.6 | Builder (copy-paste from chat) |
| Co-pilot | Jan 25 - Feb 19 | ~0.1 | Pair programmer (direct commits) |
| Delegator | Late Feb | ~0.5 | Manager (filing issues manually) |
| Fleet | Mar 7-9 | 29 | Architect (reviewing PRs) |
1.6 to 29. An 18x increase.
But the number that matters more: Phase 2’s 0.1 PRs/day is what made Phase 4’s 29 possible. Without spending a month teaching Claude my conventions, my project architecture, my preferences for commit messages and branch naming and code style, the autonomous loops would’ve produced garbage. Fast garbage, but garbage.
What actually changed
I’m a TPM. That’s my day job, and these are personal projects I build on nights and weekends. I’m not doing this 8 hours a day. That’s kind of the whole point.
The reason this worked is that both halves of my skill set finally had something to grab onto. The technical side (writing the bootstrap scripts, configuring CI pipelines, setting up MCP tools, debugging branch dedup logic) got the system running. The program management side (scoping work into clear issues, defining acceptance criteria, prioritizing a backlog, structuring epics with parent branches) kept the system producing useful output instead of chaos.
Most conversations about AI productivity focus on one or the other: engineers shipping faster, or managers prompting better. What I found is that the combination is the thing. The agent doesn’t need a perfect engineer or a perfect PM. It needs someone who can set up the tooling AND write a clear enough issue that an autonomous loop can pick it up at 3am and produce a mergeable PR.
That’s what TPMs do. We sit at the intersection of technical systems and project structure. We’ve always been translators between “what needs to happen” and “how it actually gets built.” The agent just made that translation literal.
The agent works while I sleep. I wake up on a Saturday morning to a queue of PRs. Some are perfect. Some need a comment and a revision. A few get closed. But the throughput is real, the code compiles, and the features ship.
I keep thinking about what this looks like at scale. 5 loops across 2 projects got me 87 PRs in a weekend. What happens with 20 loops across 10 projects? What happens when the loops can spin up other loops?
I honestly don’t know. And I’m curious: has anyone else hit a wall with this approach that I haven’t found yet?