Personal Agent Building Complete Collection: From Vibe Coding to a Personal Work System That I Use Every Day

Foreword: A collection of guides on building personal agents and Vibe Coding.

Over the past month or so, I’ve been posting a whole series on X, running from “Why build your own” all the way to “Dual-model peer review.” Each post was a standalone topic. A lot of folks in the DMs asked for a version they could read all at once, so I’ve reorganized and woven those pieces into a single comprehensive guide. Consider this the wrap-up of the series.

If you’ve seen a few of these posts scattered around, here’s the full version. If this is your first time seeing it, this one post is enough.

Over the last 30+ days, I’ve taken my personal agent EvoPaw (GitHub: https://github.com/hxdflying/EvoPaw) from a “barely-running fixer-upper” and iterated it into a daily work system. Now it handles around 70–80% of my repetitive work.

This collection breaks down that entire 30-day process, dissects it, and shares everything without holding back. No frameworks, no architecture diagrams—just how an ordinary person can grow their own agent, step by step.

Tools come and go, but a system that truly understands your workflow, preferences, and methods can only be cultivated slowly by you. That’s the real message of this collection.

---

Chapter 1: Why I Recommend Building Your Own Agent — It’s a Sovereignty Issue

Every time I talk about EvoPaw, someone asks the same thing: “OpenClaw, Hermes Agent, Nanobot are already this good—why build your own?”

My answer is simple: the moment you use an off-the-shelf framework, you hand over the initiative to evolve.

With a ready-made framework, you’re forever chasing updates, tweaking prompts, and fixing compatibility. When the framework upgrades, you upgrade. When the framework stops being maintained, you start from scratch. It looks like you’re using a tool, but the tool is actually pulling you along.

Building your own system, on the other hand, brings a few very concrete benefits beyond vague words like “sense of control.”

First: clean module boundaries. You can swap out any layer—provider, orchestration, memory, skills—without major changes to upper-level code. That kind of freedom is nearly impossible with off-the-shelf frameworks.

Second: you can “steal” designs. Read Hermes’ Curator to learn about automatic skill evolution, read Nanobot to learn about dependency pre-checks, read Pi-mono to learn about multi-provider abstraction. Once you’ve run these patterns in your own system, you stop looking at other projects as “a whole framework” and start seeing them as a pile of components you can disassemble and reassemble.

Third: sovereignty over your data and understanding. After you’ve used an agent for a while, it gradually builds an “understanding” of you: your preferred formats, how you break down tasks, what you’ve been anxious about lately. These aren’t just files; they’re “second-order assets” that emerge from long-term interaction. Platform agents are hard to migrate fully. The rapport you’ve spent a year feeding could reset to zero overnight when you switch platforms.

Fourth: the barrier to entry is ridiculously low now. In 2026, with Vibe Coding tools like Claude Code and Codex, you don’t need to be a programmer. Modify one line of prompt, have AI write a Skill for you, add a memory file—step by step you can shape this system into something uniquely yours.

So my advice is very concrete: use ready-made tools for two weeks first, get a feel for what it’s like to have an AI assistant at hand. Then use them with a critical eye, write down everything that feels awkward, and start building from those pain points. The starting point is not the destination.

---

Chapter 2: From Zero to Daily Use in 15 Days — The Process That Worked for Me

If you’re determined to build your own, here’s the smoothest process I’ve found after running it myself.

I recommend Nanobot as the base. Clean code, lightweight, built-in Feishu integration, multi-provider support. Among all the projects I’ve tried, it’s the one that gives you the least hassle.

The entire process boils down to five steps.

Step 1: Pick a lightweight base. The criteria are strict: under 8,000 lines of code ideally, the agent loop should be clear at a glance, and it should be fairly easy to hook into Feishu or Telegram. The main point isn’t which one you pick, but whether you can modify it—if you can modify it, it can grow with you.

Step 2: Set it up with Claude Code and get Feishu running. Strongly recommend using long-connection mode for Feishu. No public IP, no webhook configuration needed—it’s the smoothest path.

Step 3: Create your scaffolding files. CLAUDE.md is your project’s instruction manual. Under docs/, put spec.md, prompt_plan.md, and todo.md. These files serve as persistent memory across sessions—more powerful than any prompt trick.

Step 4: Force fuzzy requirements into a clear spec. This is the most critical step in the entire process. I expand on it in Chapter 3.

Step 5: Every feature must come with tests. Also critical—expanded in Chapter 4.

After these five steps, you’ll move from “using someone else’s agent” to “owning your own system.” The difference is bigger than you might think.

Two advanced tips worth mentioning: one is to “steal” good designs from projects like Hermes and OpenClaw—whether it’s a conceptual rewrite, module-level copy, or even file-level reuse. The other is to install Codex MCP and let the two models review each other. The effect is significant, and I’ll discuss it in Chapter 7.

---

Chapter 3: The Achilles’ Heel of Vibe Coding — Turning Fuzzy Requirements into Clear Specs

The speed advantage of Vibe Coding depends on one thing: the spec must be clear. The fuzzier the spec, the faster the AI runs, and the harder you crash. I only realized this after paying many tuition fees.

Now when I run a new requirement, I follow these five steps, using “Feishu group todo summary” as an example.

Step 1: Fill out a table first—force yourself to speak plainly. Write down the pain point, the desired outcome, and honestly note what you don’t know. This step seems wasteful, but it blocks 80% of “writing as your mind wanders.”

Step 2: Open Plan mode (Shift+Tab) and let Claude Code drill you with questions. Edge cases, error handling, performance requirements, conflicts with existing features—ask everything. This step turns fuzzy thoughts into tangible concepts. The more you’re questioned, the clearer you think.

Step 3: Output docs/spec.md. Use lists and tables. Write it like a contract, not an essay. A spec that someone else could implement faithfully is a qualified spec.

Step 4 (optional but strongly recommended): Have Codex review the spec. It can catch blind spots that both you and Claude missed.

Step 5: Break the spec into prompt_plan.md and todo.md. Each step should take 2–5 minutes and be independently verifiable. This keeps your pace steady when you start coding.

And the lazy person’s tool: I highly recommend obra/superpowers (https://github.com/obra/superpowers). Once installed, if you say “add a feature,” it automatically triggers a brainstorming skill, forcing you to nail down the spec before you start. For beginners, it prevents about 70% of pitfalls.

---

Chapter 4: Tests — The Courage to Swap Models and Do Major Refactoring

In the age of Vibe Coding, tests are no longer just “bug prevention”; they are the channel to truth. Without tests, you have no idea whether the AI’s latest change broke old functionality.

But there’s a counterintuitive issue: letting AI write tests for itself is inherently prone to cheating. The tests and the implementation mirror each other, always passing, giving you a false sense of security with zero defensive value.

The correct approach is a simplified version of TDD, in five steps.

Have the AI write tests based on the spec. At this point the tests should be red.
Review those tests yourself—only test behavior, not implementation details. This gatekeeping is crucial.
Have the AI write the implementation to make the tests green.
Add boundary cases and adversarial cases—deliberately feed bad data to functions.
Have Codex review those tests again.

In superpowers, there’s a test-driven-development skill that forces the red → green → refactor cycle, plus an “Iron Law”: if you write implementation without writing the test first, delete it and start over. Sounds harsh, but after two weeks, you’ll find the real leverage isn’t the tests themselves—it’s the courage to refactor.

---

Chapter 5: Key Configurations for Claude Code and Codex — Instant Improvements

These two configuration sets have been running for months with very noticeable effects. Spending 5–10 minutes making these changes will make your tools smarter, cheaper, more reliable, and quieter.

Claude Code: 8 Key Configurations

Paste into ~/.zshrc or ~/.claude/settings.json:

Force high thinking budget (ANTHROPIC_THINKING_BUDGET)
Turn off adaptive thinking to prevent hallucinations
Use the cheap Haiku model for sub-agents — cuts your bill to 1/5
Default main model to Sonnet, switch to Opus only for hard problems
Increase max output token limit
Enable virtual viewport and diff rendering to avoid screen flicker
Disable all telemetry
Relax bash timeout to 30+ minutes

Codex: 10 TOML Configurations

Place in ~/.codex/config.toml:

Default strong model, reasoning effort set to high; approval policy set to on-request – ask when needed; sandbox set to workspace-write, default offline; search set to cached; disable alternate screen, disable real-time reasoning event display, disable long-term history persistence, disable analytics.

Each setting individually looks minor, but combined, the user experience is a qualitative leap.

---

Chapter 6: Superpowers Discipline System — A Must for Beginners

The biggest enemy for beginners doing Vibe Coding isn’t AI being not smart enough, but the lack of enforced discipline. AI is fast, but humans can’t resist the urge to “skip a step.”

Install obra/superpowers and focus on these five skills, all set to auto-trigger:

brainstorming — forces a spec before you start coding
writing-plans — splits the spec into bite-sized tasks
test-driven-development — enforces red → green cycle + Iron Law
systematic-debugging — no fix without reproduction
verification-before-completion — don’t claim done without actually running the verification command

Stick strictly to these for the first week to build muscle memory before you consider customizing. Leave the other 9 skills (worktree, parallel agents, etc.) for a month later, or you’ll scatter your energy.

---

Chapter 7: Dual-Model Peer Review — Breaking Your Own Blind Spots

Claude and Codex have different training distributions, so their blind spots don’t overlap. Having them review each other is the highest ROI QA method I’ve found so far.

Setup is simple: register Codex as a tool for Claude Code via MCP (or vice versa). The whole flow runs within a single session.

The four most valuable peer-review scenarios are:

Spec review — specifically catches edge cases, ambiguities, and hidden assumptions.
Test review — looks only at the tests themselves, catching “fake tests” that don’t actually test anything.
Debug second opinion — let the other model diagnose the root cause independently without seeing the fix.
Important diff review — focuses on three types of issues: breaking interfaces, swallowing errors, and inconsistency with the spec.

Two ironclad rules must be followed:

Tests must be written by the model that did not write the implementation.
Debug must be done by the other model independently, without seeing the fix.

As long as these two rules are enforced, peer review won’t degrade into two models nodding at each other.

---

Conclusion: Take Action — Build a System That’s Yours

After all this series, it comes down to one sentence:

With ready-made frameworks, you’re always a user. Build your own, and you have a chance to become the owner.

Starting today, you can do these four concrete things:

Fork Nanobot or a similar lightweight project.
Install Claude Code or Codex and apply the configuration tweaks.
Install Superpowers.
Pick a real pain point of yours and run through the spec process once.

You don’t need to get it perfect in one go. Change one line of prompt a day, add one Skill a week, refactor the base once a month—and little by little, it will become the system that understands only you.

That’s how EvoPaw grew. I hope this collection gives you a spark of inspiration and the courage to start.