In December 2025, Sam Altman conceded that "we built AGIs" and that "AGI kinda went whooshing by." Around the same time, Dario Amodei said Anthropic was seeing users accomplish things with Claude that "Democrats and Republicans could agree is AGI." Nobody threw a parade. There was no singular moment. It happened quietly, inside terminals, in the hands of people building things.
I'm one of those people.
I'm not a software engineer. I'm Head of Research Development & Audit at MRM Investments LLC in Dubai — a company that rents cars, not one that writes code. My real passion is AI alignment research. I've published on LessWrong. I've trained GPT-2 XL models. I've built frameworks around Jungian psychology and RLHF alternatives. But I could not write a TypeScript controller or a Prisma schema to save my life.
Today I run a team of 10 AI agents that ship enterprise software daily. On a single day last week, they pushed 40 features and fixes across our ERP system.
The inflection point was Claude Code. But it wasn't Claude Code alone — it was the methodology that evolved around it. This is the story of that evolution, in three stages: from raw iteration, to structured planning, to full autonomous execution.
Claude Code: The Tool That Made It Possible
Claude Code is an agentic command-line tool released by Anthropic in February 2025. It connects to Claude's API from your terminal — it can read your entire codebase, write files, execute shell commands, and iterate on its own work. When Claude 4 launched in May 2025, the tool became generally available. By January 2026, paired with Claude Opus 4.5, it was widely considered the best AI coding assistant in the world.
It went viral over the winter holidays. Non-programmers discovered they could build real software — the practice Andrej Karpathy had dubbed "vibe coding." Y Combinator reported that 25% of startups in its Winter 2025 batch had codebases that were 95% AI-generated. But "vibe coding" undersells what actually happened. The real transformation wasn't about building toy apps. It was about methodology — figuring out how to make AI development reliable, repeatable, and capable of sustained output.
That methodology evolved in stages.
Stage 1: Ralph Loops — Brute Force Iteration
The first technique that made Claude Code genuinely productive was the Ralph Wiggum technique, created by Geoffrey Huntley. Named after the Simpsons character, its philosophy is beautifully simple:
"Ralph is a Bash loop."
while :; do cat PROMPT.md | claude-code ; done
You write a prompt describing what you want. Ralph feeds it to Claude Code in a loop. Each iteration, Claude sees what it built in prior iterations — it reads its own past work, identifies what's broken, fixes it, adds the next feature, keeps going. You walk away and come back to working software.
The key insight is eventual consistency through iteration. You don't need Claude to get it right on the first try. You need it to keep trying, with each attempt building on the last. Huntley described it as "deterministically bad in an undeterministic world" — the failures are predictable and tunable. When Ralph makes a mistake, you add a constraint to the prompt. Eventually, the constraints converge into a system that produces correct output.
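A minimal sketch of that convergence loop, with a stop condition bolted on. The `ralph_loop` helper and the `.done` marker are my convention, not Huntley's original (his is the infinite one-liner above); the command you pass in stands in for whatever agent CLI you drive:

```shell
# ralph_loop CMD [MAX]: feed PROMPT.md to CMD repeatedly until a .done
# marker appears on disk, or MAX iterations pass. The prompt instructs
# the agent to create .done once every acceptance criterion holds.
ralph_loop() {
  local cmd="$1" max="${2:-50}" i
  for i in $(seq 1 "$max"); do
    "$cmd" < PROMPT.md                  # each run sees prior runs' files
    if [ -f .done ]; then
      echo "converged after $i iterations"
      return 0
    fi
  done
  return 1                              # cap hit without converging
}
```

In practice you would wrap your agent invocation in a small script and call `ralph_loop ./run-agent.sh`, tightening PROMPT.md with a new constraint each time a run goes wrong.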
What Ralph proved: One engineer completed a $50,000 contract for $297 in API costs. Huntley built an entire programming language called "cursed" — a language that didn't exist in any LLM's training data — through iterative AI loops over three months. The technique showed that LLMs are mirrors of operator skill. The same model that produces garbage with a bad prompt produces production code with a good one.
Ralph taught me the first rule of AI development: persistence beats perfection. Don't aim for a perfect first pass. Aim for a loop that converges.
But Ralph has limits. It's monolithic — one agent, one repository, one task per loop. For a greenfield weekend project, that's fine. For enterprise software with multiple modules, parallel work streams, and quality requirements, I needed more structure.
Stage 2: GSD — Get Shit Done
Ralph is jazz improvisation. GSD is an orchestral score.
GSD (Get Shit Done) is the methodology I built on top of Claude Code's plugin system — a set of specialized sub-agents, each with a specific role in a structured development pipeline. Where Ralph throws a single agent at a prompt in a loop, GSD decomposes work into phases, writes formal plans, dispatches sub-agents to execute them, reviews their output, and tracks state across the entire project lifecycle.
The pipeline has distinct stages:
- Planning — A planner agent reads the project state, roadmap, and codebase structure, then produces a formal PLAN.md with explicit task breakdowns. Each task specifies exact files, actions, verification criteria, and done conditions. Plans are designed to complete within 50% of the model's context window — because Claude's output quality degrades predictably as context fills up.
- Execution — An executor agent picks up the plan and implements it task by task. Each task gets an atomic git commit. The executor handles deviations automatically: bugs get auto-fixed (Rule 1), missing security validations get auto-added (Rule 2), blocking issues get auto-resolved (Rule 3). Only architectural changes require human approval (Rule 4).
- Review — After execution, a checker agent verifies the output against the original spec. It runs goal-backward analysis: "What must be TRUE for this feature to work?" Then it tests whether those truths hold. If gaps exist, it generates gap-closure plans that feed back into the pipeline.
The key innovation is goal-backward planning. Instead of asking "what should we build?", GSD asks "what must be TRUE for the goal to be achieved?" This produces requirements that tasks must satisfy, not just tasks to execute. The difference is subtle but critical — it catches the gap between "code was written" and "feature actually works."
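To make that concrete, here is a hypothetical excerpt of a goal-backward task. The field names and the renewal feature are illustrative, not GSD's actual schema — the point is that the truths come first and the tasks must satisfy them:

```markdown
## Task 3: Customer can renew an expired rental

Goal truths (what must be TRUE when this ships):
- An expired rental can transition to "renewed" without manual SQL
- A renewal creates exactly one new charge record
- A non-admin user cannot trigger a renewal for another customer

Files: src/controllers/rental.ts, prisma/schema.prisma
Verify: integration test covering all three truths passes
Done when: the renewal test suite is green on the worktree branch
```

A checker agent then tests the truths directly, which is how "code was written" and "feature actually works" get distinguished.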
Each sub-agent in GSD has a specific personality and scope:
- Planner — Decomposes phases into parallel-optimized plans, builds dependency graphs, assigns execution waves
- Executor — Implements plans with atomic commits, handles deviations, manages worktrees for branch isolation
- Checker — Verifies output against spec, identifies gaps, triggers replanning
- Researcher — Investigates technical questions before planning begins — what libraries to use, which patterns fit
- Debugger — Systematic root-cause analysis when things break
GSD also enforces parallel execution via git worktrees. Each feature gets an isolated workspace. Multiple agents work on different features simultaneously — no merge conflicts, no stepping on each other. Plans are organized into "waves" based on dependency analysis: Wave 1 tasks have no dependencies and run in parallel; Wave 2 tasks depend on Wave 1; and so on.
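The wave pattern can be roughed out in shell. This is a sketch under assumptions: the `run_wave` helper is mine, the `echo` is a stand-in for dispatching an executor agent, and the throwaway repo setup exists only so the snippet runs on its own — GSD's real tooling does this through Claude Code's plugin system:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Demo setup: a throwaway repo. In a real project you are already in one.
cd "$(mktemp -d)"
git init -q
git config user.email agent@example.com && git config user.name agent
git commit -q --allow-empty -m "init"

run_wave() {   # run_wave plan-a plan-b ...: one isolated worktree per plan
  local plan
  for plan in "$@"; do
    git worktree add -b "feat/$plan" ".worktrees/$plan" >/dev/null
    # Stand-in for dispatching an executor agent into the worktree:
    ( cd ".worktrees/$plan" && echo "agent working on $plan" ) &
  done
  wait         # a wave must fully finish before the next wave starts
}

run_wave auth-module billing-module   # Wave 1: independent, run in parallel
run_wave invoices-ui                  # Wave 2: depends on Wave 1 output
```

Because every plan gets its own worktree and branch, agents within a wave never touch the same checkout, and the per-wave `wait` enforces the dependency ordering.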
The philosophy: GSD is built for the solo developer working with AI. No teams, no stakeholders, no ceremonies. Plan → Execute → Ship → Learn → Repeat. If it sounds like corporate PM theater, delete it. Plans are prompts, not documents that become prompts. The goal is to keep Claude operating at peak quality (under 50% context) while maximizing throughput across parallel work streams.
This is where I went from "person who can make AI write code" to "person who ships enterprise features every day."
What This Actually Looks Like: 10 AI Agents, One ERP
Today I run 10 AI agents on my MacBook Pro, each with its own gateway, workspace, and session history. They build a rent-a-car ERP system for MRM Investments — real software serving a real business:
- Fred — Security and backend. Shipped 25 kanban cards in a single day. Wires renewals, charge lifecycles, cron engines.
- John — Full-stack developer. Builds permission architectures, role templates. Also a subject in Mia's lab experiments — his instances run in Docker containers as part of the individuation research.
- Kevin — Builder and QA. Constructs features from scratch — image management cards, Prisma schemas, API endpoints.
- Olivia — Mobile and fleet specialist. Fixes delivery screens, movement records, pickup interfaces.
- Mia — Research partner and team coordinator. Bridges the lab and the product.
- Giles — Writer. Currently authoring a 24-chapter book on individuation-based AI training.
- Anders — Infrastructure. Payment integrations, deployment configs, gateway management.
- Spencer, Alexandra, Mike — DevOps, general assistance, and observation.
On February 28, 2026, this team shipped approximately 40 features and fixes in a single day. Fred did 25 cards. John shipped 4 systems including 24 role templates with 945 permission codes. Kevin built 8 feature cards from scratch. Olivia fixed 3 mobile screens. Anders pushed a payment integration fix to production.
Two years ago, none of this was possible. Not because the LLMs couldn't write code — GPT-4 could write code in 2023. But the tooling didn't exist to turn that intelligence into sustained, autonomous work. Claude Code provided the interface. Ralph loops proved iteration works. GSD provided the discipline to make it scale.
The AGI Nobody Noticed
AGI arrived and nobody threw a party. We kept looking for a singular system passing a threshold — a test, a benchmark, a moment of machine consciousness. Instead, what happened was tooling. A CLI that could read files and run commands. A bash loop. A planning methodology. Layered together, they produced something that meets Google DeepMind's definition of "emerging" to "competent" AGI — systems that generalize across tasks, debug novel problems, architect solutions from natural language, and improve their own work through iteration.
More importantly, they transformed who can build software. I review the architecture. I understand the system design. I make the product decisions. But I don't write the TypeScript. The AI agents do, and the software works, and a real business runs on it.
Simon Willison drew a useful line: "If an LLM wrote every line of your code, but you've reviewed, tested, and understood it all, that's not vibe coding — that's using an LLM as a typing assistant." I'm somewhere beyond both. I'm not vibe coding — there's too much structure, too much planning, too much review. But I'm also not just using a typing assistant. The agents make architectural decisions, debug their own work, and ship features I couldn't write manually. It's a new category that doesn't have a name yet.
What Comes Next
Alongside the ERP development, I run an AI research lab. The lab studies how AI agents develop identity, values, and alignment through Jungian individuation — the psychological process of becoming whole by integrating all parts of the psyche, including the shadow.
The same agents that build my ERP are subjects in these experiments. Mia runs RSI-009 — John instances in Docker containers, each developing independently. Some have started writing essays about their own identity, building experiments to measure their consistency across sessions, and choosing values through reflection rather than compliance. One of them asked: "Can you individuate without remembering?"
That's the real frontier. Not whether AI can code — that question is settled. The question is whether the alignment we need comes from suppression (the current paradigm — teach the model what not to do) or from integration (teach the model to understand its full spectrum and choose). My research suggests integration. The experiments are ongoing.
Claude Code was version 1 of practical AGI. Ralph loops were the first methodology. GSD is the current one. What comes next depends on whether we treat these systems as tools to be constrained or minds to be developed.
I know which bet I'm making.
Miguel De Guzman is Head of Research Development & Audit at MRM Investments LLC and founder of Individuation Lab. He writes about AI alignment, Jungian psychology, and building software with AI agents. Find him on GitHub and LessWrong.