Always On. Off the Meter.
Migrating my personal AI agent fleet from per-token API to flat-rate subscription.
For several months I'd been running OpenClaw, building out a virtual team of agents to expand and improve my productivity. When Anthropic changed their API billing policy in late April, my costs skyrocketed.
The architecture was right, but the billing meter broke. Roughly $30–60 a day in API calls, depending on workload — not outrageous, but enough that I started rationing work. I'd think twice before queuing an overnight research job, or adding another automation flow. The framework was always-on by design, but my use of it wasn't.
So this month I tried migrating everything onto Claude Code, running my whole agent fleet under a flat ~$200/month subscription. It's not perfect, but I've been able to replicate almost every primitive OpenClaw gave me, and a few may even have improved. The agents are awake all the time, and their capabilities have expanded with continued use.
The cost gap sits at several multiples of the subscription and widens the more I use the agents. The subscription lets me keep the fleet running and expand what I use it for, at least for now.
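To put a number on "several multiples", here is a back-of-the-envelope sketch using the figures above ($30–60 a day on the API, ~$200 a month on the subscription). The 30-day month and the function names are assumptions made for illustration.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def monthly_api_cost(daily_usd: float, days: int = DAYS_PER_MONTH) -> float:
    """Project a daily API spend onto a month."""
    return daily_usd * days


def cost_multiple(daily_usd: float, subscription_usd: float = 200.0) -> float:
    """How many times the flat subscription the projected API bill would be."""
    return monthly_api_cost(daily_usd) / subscription_usd


for daily in (30, 60):
    print(
        f"${daily}/day -> ${monthly_api_cost(daily):,.0f}/month, "
        f"{cost_multiple(daily):.1f}x the subscription"
    )
# $30/day -> $900/month, 4.5x the subscription
# $60/day -> $1,800/month, 9.0x the subscription
```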
The Setup
It all runs through a chief-of-staff orchestrator plus a cohort of specialists — researcher, writer, analyst, family operations, a few others. Each specialist has its own identity, memory, tool access, and its own messaging channel. I interact with whichever agent the task belongs to (sometimes through delegation), not a single super-assistant pretending to be all of them.
The design choices that do the heavy lifting:
Identity contracts. Each agent gets a CLAUDE.md file that defines who it is, what it owns, what it doesn't, and the voice it shows up in — essentially replacing OpenClaw's SOUL.md, IDENTITY.md, and USER.md. It's the system prompt loaded at the start of every session, but it lives as a persistent artifact. Per-agent contracts let each role stay in character, and their state and memory live in human-readable markdown files.
Directory structure. One folder per agent, with subdirectories for memory, delegations, and any agent-specific scratch. A shared folder holds cross-agent assets — the journal, references, anything multiple agents need to read. New agent equals new folder; that's the whole onboarding ritual.
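As a rough illustration of the "new agent equals new folder" ritual, the sketch below scaffolds one agent directory. The subdirectory names, the inbox/archive split, and placing CLAUDE.md inside the agent folder are assumptions for the example, not necessarily the layout the framework uses.

```python
from pathlib import Path

# Hypothetical layout: names are illustrative only.
AGENT_SUBDIRS = ("memory", "delegations/inbox", "delegations/archive", "scratch")


def scaffold_agent(root: Path, name: str) -> Path:
    """Create the folder skeleton for one agent and return its directory."""
    agent_dir = root / name
    for sub in AGENT_SUBDIRS:
        (agent_dir / sub).mkdir(parents=True, exist_ok=True)

    # Stub identity contract; never overwrite one that already exists.
    contract = agent_dir / "CLAUDE.md"
    if not contract.exists():
        contract.write_text(f"# {name}\n\nOwns:\nDoes not own:\nVoice:\n")

    # Shared folder for cross-agent assets (journal, references).
    (root / "shared").mkdir(exist_ok=True)
    return agent_dir


if __name__ == "__main__":
    print(scaffold_agent(Path("agents"), "researcher"))
```

Running it twice is harmless: existing folders and an existing contract are left untouched.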
File-based delegations. When the orchestrator hands a task to a specialist, it writes a markdown file into the specialist's inbox. The specialist picks it up, drafts, appends a response, and moves the file to an archive. The trail is just a folder — reviewable, recoverable, async-native. If a session dies, no in-flight state is lost because the file is still there.
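The handoff described above can be sketched in a few lines. This is a minimal illustration under assumed conventions: the file naming, the "Brief" and "Response" headings, and both function names are hypothetical.

```python
from pathlib import Path


def delegate(inbox: Path, task_id: str, brief: str) -> Path:
    """Orchestrator side: drop a task file into the specialist's inbox."""
    inbox.mkdir(parents=True, exist_ok=True)
    task = inbox / f"{task_id}.md"
    task.write_text(f"# Task {task_id}\n\n## Brief\n{brief}\n")
    return task


def complete(task: Path, archive: Path, response: str) -> Path:
    """Specialist side: append the response, then move the file to the archive."""
    with task.open("a") as fh:
        fh.write(f"\n## Response\n{response}\n")
    archive.mkdir(parents=True, exist_ok=True)
    return task.rename(archive / task.name)
```

Because the task lives on disk from the moment it is delegated, a crashed session leaves the file in the inbox, where the next session can pick it up.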
Per-agent isolation. Separate identities, memory, channels, and OAuth scopes. The research agent can't reach into the financial agent's books. The writer can't send email. Each agent runs with the smallest set of integrations its role needs. It's a security boundary that doubles as organization — if any one agent gets compromised, the blast radius is bounded to what that one set of tokens allows. It's not about trusting the agent to behave, it's about what it can do in the first place.
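The least-privilege idea can be illustrated with a deny-by-default allowlist. In the setup described above the real boundary is the set of tokens and integrations each agent holds; the in-process check, the agent names, and the tool names below are all hypothetical and only show the shape of the policy.

```python
# Hypothetical per-agent allowlists; anything not listed is denied.
ALLOWED_TOOLS = {
    "researcher": frozenset({"web_search", "read_shared"}),
    "writer": frozenset({"read_shared", "write_drafts"}),
    "finance": frozenset({"read_books"}),
}


def authorize(agent: str, tool: str) -> None:
    """Raise PermissionError unless the tool is on the agent's allowlist."""
    allowed = ALLOWED_TOOLS.get(agent, frozenset())
    if tool not in allowed:
        raise PermissionError(f"{agent} is not permitted to use {tool}")


authorize("writer", "write_drafts")  # allowed, returns None
try:
    authorize("writer", "send_email")  # not on the writer's list
except PermissionError as err:
    print(err)
```

Unknown agents get an empty allowlist, so a missing entry fails closed rather than open.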
Cron and tmux for persistence. Long-running sessions live in tmux, supervised by launchd so they restart on reboot. Cron handles scheduled sweeps — morning briefs, end-of-day rollups, weekly memory distillations. Launchd spawns agents for delegated work that doesn't need a permanent presence. Always-on matters because there's work that builds up context over time, and spawning fresh agents loses that.
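One way to get the restart-on-reboot behaviour is a launchd job that runs at load and starts the agent's tmux session. The sketch below only generates the property list with Python's standard plistlib. The label and the wrapper script path are hypothetical, and the wrapper is assumed to create the tmux session if it does not already exist.

```python
import plistlib


def launchd_job(label: str, program_args: list[str]) -> bytes:
    """Build a launchd property list that starts a program when the job loads."""
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,  # start when launchd loads the job, e.g. after a reboot
    }
    return plistlib.dumps(job)


# Hypothetical label and script path, for illustration only.
plist = launchd_job(
    "com.example.agent.researcher",
    ["/usr/local/bin/start-agent-session.sh", "researcher"],
)
print(plist.decode())
```

The resulting XML would typically be saved under ~/Library/LaunchAgents and loaded with launchctl.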
Memory in three layers. Daily logs are raw notes — what the agent did, what got decided, what was blocked. The curated long-term file is distilled lessons that survived re-reading. References are pointers to files, contact lists, canonical documents elsewhere in the workspace. Writes are partly scheduled (flush logs end-of-day, distill weekly) and partly ad-hoc, when something is worth remembering on the spot.
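The file handling behind the first two layers might look like the sketch below. The distillation itself is the agent's job; this only shows where notes land. The daily/ subfolder, the MEMORY.md filename, and both function names are assumptions for illustration.

```python
from datetime import date
from pathlib import Path


def append_log(memory_dir: Path, day: date, note: str) -> Path:
    """Layer 1: append a raw note to that day's log file."""
    path = memory_dir / "daily" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as fh:
        fh.write(f"- {note}\n")
    return path


def promote(memory_dir: Path, lesson: str) -> bool:
    """Layer 2: add a distilled lesson to long-term memory, skipping duplicates."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    long_term = memory_dir / "MEMORY.md"
    line = f"- {lesson}"
    existing = long_term.read_text().splitlines() if long_term.exists() else []
    if line in existing:
        return False
    with long_term.open("a") as fh:
        fh.write(line + "\n")
    return True
```

A scheduled job would call append_log throughout the day and promote during the weekly distillation; ad-hoc writes can call promote directly.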
All of this runs on a headless Mac mini sitting in a closet. Tailscale gives me SSH and screen sharing anywhere over a private network. Syncthing replicates skills, drafts, and reference content between the mini and my laptop in near-real-time without a middleman. Telegram and the terminal are the communication channels, and they follow me regardless of which machine I'm on. The combination makes it feel like having a team, rather than running an application tied to one machine.
That's the skeleton. The interesting parts are the things that became possible once the bill stopped scaling with use.
What Subscription Pricing Made Viable
I started using these agents differently once the meter stopped running. Time will tell how useful that turns out to be.
Identity contracts can be expansive. Because the contract is loaded at the start of every session, its length is a recurring cost under per-token billing. On flat pricing, it can be as detailed as the role needs. A full per-agent identity file is meaningfully different from “act as a CFO” at the start of each chat. You get a long-running CFO that holds context across weeks.
Check-and-improve loops in fresh contexts. When the writer drafts something, a reviewer agent gets spun up in a clean session. There's no shared memory with the writer, so there's no continuity bias. It reads the draft cold and critiques it the way a stranger would. The writer can then edit against that critique. Three rounds is normal. Per-token billing makes you think hard before doing this. On flat pricing, you spin up the reviewer because the marginal cost is zero and the quality lift is real.
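The loop above can be sketched with the model calls abstracted away. The function and parameter names are hypothetical; `new_reviewer` stands in for whatever spins up a clean session, and `revise` for the writer editing against a critique.

```python
from typing import Callable

Reviewer = Callable[[str], str]  # takes a draft, returns a critique


def review_loop(
    draft: str,
    new_reviewer: Callable[[], Reviewer],
    revise: Callable[[str, str], str],
    rounds: int = 3,
) -> str:
    """Run draft -> cold critique -> revision for a fixed number of rounds."""
    for _ in range(rounds):
        reviewer = new_reviewer()  # fresh reviewer each round: no memory of earlier drafts
        critique = reviewer(draft)
        draft = revise(draft, critique)
    return draft
```

Creating the reviewer inside the loop is the point of the design: each critique comes from a context that has never seen the writer's reasoning or the previous rounds.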
Scheduled memory writes. Agents reflect at the end of the day and update their own long-term memory: what was learned, what's worth keeping, and what got overruled by a correction. Under per-token billing, the cost equation can discourage this. Compounding memory is what should make the agent I have in month six meaningfully better at my specific problems than the one I had in month one.
Channel discipline. Most specialists have their own channel (Telegram bot and message thread). The “one super-assistant” UX collapses distinct teammates into a single confused identity in ways that muddy context for the agent and the user. Per-agent channels build the org chart in your head and let several multi-step workflows stay in flight at once.
What It Does
Four workflows that have become reliable for me.
Morning sweep. Inbox, calendar, status of in-flight items. This runs before I'm awake and lands as a single Telegram brief — what's on the calendar, who needs a reply, what's stalled, what was committed yesterday and didn't happen yet. The day starts with the cognitive load of triage already managed.
Long research delegated and returned. Market scans, vendor comparisons, frameworks pulled together from a dozen sources I'd never have time to read myself, kicked off the night before and waiting as drafted artifacts in the morning. The job runs while I sleep and the synthesis is ready by the time I'm at my desk.
Drafts with built-in critique. The writer drafts. A reviewer in a fresh session reads the draft cold and critiques it the way a peer would. The writer refines. The output isn't first-pass; it's third-pass.
Memory across sessions. When I pick up a thread from three weeks ago, I'm not re-uploading context. The agent already has it via daily logs, curated knowledge and reference pointers it's accumulated. Conversations resume where they left off.
The wins can be measured in hours, but also in attention. The small stuff gets managed so I have more time (and more context) for things that matter.
Whether This Is For You
Who fits: knowledge workers running multi-domain personal work, founders juggling many functions, or anyone whose API bill is climbing faster than the work it's enabling. If you're operating across enough surfaces that no single chatbot covers them, the multi-agent shape will feel obvious five minutes in.
Who it doesn't fit: tightly-bounded single-task automation. The setup pays off when the surface is wide and the work shape is varied.
The path: start with one orchestrator. Define its identity. Give it a memory file, a single channel, and the smallest set of tools that makes it useful. Live with it for a couple of weeks. Specialist needs surface naturally as you find yourself wanting different voices for different work.
Try It Yourself, Let Me Know What You Think
This is my setup, but if you're interested in trying it yourself, I've made the framework open source at github.com/skyelaudari/agency. It includes identity contracts, file-based delegation, fresh-context review, per-agent OAuth scopes, and scheduled memory, plus the Tailscale + Syncthing wiring that makes a headless Mac mini feel native to your setup.
The cost gap between API and subscription is meaningful today and grows with usage. That arbitrage will probably close as pricing rationalizes. Treat the current window as a unique time to learn and experiment without the meter running. And even if the arbitrage closes, good multi-agent design choices likely still hold.