The promise is simple: stop context-switching between your editor and a chat window, let a loop run while you aren’t watching, and wake up to a finished feature. I spent a week testing five agentic tools—Claude Code, OpenCode, Factory Droid, Copilot CLI, and Pi—trying to find a loop I could actually trust.

| CLI | Strengths | Weaknesses | Verdict |
| --- | --- | --- | --- |
| Claude Code | Deep reasoning, multi-file edits | Very expensive, poor 3rd party/local model support | Only plays well with Anthropic models |
| OpenCode | Fast, easy onboarding | Buggy guardrails, easily bypassed | Promising but untrustworthy |
| Factory Droid | The “Lamborghini”: Enterprise-grade, seamless IDE integration | Not optimized for 3rd-party/local LLMs | Personally, I’m more of a Corolla Guy |
| GitHub Copilot CLI | Safe, familiar | Horrendous local model support | It seems that Microsoft hasn’t learned from Internet Explorer |
| Pi | Raw agentic behavior, hybrid-friendly | Zero built-in safety | The one I’m using |

The Ralph Loop

Most agents are designed to exit after a single task. The Ralph Loop—a pattern originally coined by Geoffrey Huntley—inverts this. Instead of the agent deciding when it’s done, you wrap it in a shell loop that forces it to keep iterating until external verification (like a test suite or a “Judge” model) passes. It’s named after Ralph Wiggum: stubborn, naive persistence in the face of danger.
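As a rough illustration of the pattern, here is a minimal sketch, not the actual ralph.sh. The function name `ralph_loop` and its three arguments are hypothetical: an iteration cap, a worker command, and a verification command. In practice the worker would be an agent CLI call and the verifier a test suite or a Judge call.

```shell
#!/usr/bin/env bash
# Minimal sketch of a Ralph-style loop. ralph_loop and its arguments are
# hypothetical placeholders, not part of any real agent CLI.

ralph_loop() {
  local max_iters="$1" worker_cmd="$2" verify_cmd="$3" i=1
  while [ "$i" -le "$max_iters" ]; do
    echo "[loop] iteration $i: running worker"
    # A worker failure does not end the loop; only the verifier decides.
    sh -c "$worker_cmd" || echo "[loop] worker exited non-zero" >&2
    if sh -c "$verify_cmd"; then
      echo "[loop] verification passed after $i iteration(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "[loop] gave up after $max_iters iterations" >&2
  return 1
}
```

The iteration cap is a design choice of this sketch: without it, a verifier that can never pass would keep the loop (and the bill) running forever.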

The magic happens when you separate the Worker from the Judge. In my setup, a local Qwen-35b handles the repetitive coding tasks, but a SOTA cloud model (GLM-5.1) reviews every commit. If the Judge smells a hallucination, it pulls the fire alarm.

Figure: ralph.sh open in VS Code, phases stacked up like a to-do list with consequences.

# A snippet from ralph.sh: The circuit breaker
if _run_judge "per-iteration" "$judge_verdict_file" "phase-${PHASE}-iter-${i}"; then
  echo "[ralph] judge passed iteration $i — continuing"
else
  echo "[ralph] judge FAILED iteration $i — halting loop" >&2
  _notify "Judge ✗" "Phase $PHASE iter $i: judge review FAILED"
  exit 1
fi
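A circuit breaker like this is only as good as the check behind it, and the safest default is to fail closed. The following is a minimal sketch of that idea. It assumes, purely for illustration and not from ralph.sh, that the Judge writes a verdict file whose first line is `PASS` or `FAIL`. The helper name `verdict_passed` is hypothetical.

```shell
# Hypothetical verdict gate: succeeds only when the verdict file's first
# line is exactly "PASS". The file format is an assumption for illustration.
verdict_passed() {
  local verdict_file="$1" first_line=""
  # A missing or unreadable verdict counts as a failure (fail closed).
  [ -r "$verdict_file" ] || return 1
  # Read the first line; tolerate a file without a trailing newline.
  IFS= read -r first_line < "$verdict_file" || [ -n "$first_line" ] || return 1
  [ "$first_line" = "PASS" ]
}
```

With this shape, an empty file, a crashed Judge, or an unexpected reply all halt the loop instead of silently letting a commit through.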

The “Lamborghini” Problem

After finding success with the Ralph Loop in OpenCode, I tried Factory Droid. Mario Zechner (the creator of Pi) famously called Droid the “Lamborghini” of CLI agents, and he’s right—it’s enterprise-grade, handles compliance, and has a beautiful VS Code integration.

But a Lamborghini is high-maintenance. While OpenCode played nice with LM Studio, Droid struggled to parse Qwen’s “thoughts.” I eventually fixed the parsing by swapping to oMLX, but even then, Qwen couldn’t grasp Droid’s advanced tool-calling. It would get stuck in infinite loops, trying to fetch the same website over and over rather than admitting failure.

Why I’m sticking with Pi (for now)

I eventually settled on Pi. It doesn’t have the “guardrail marketing” of OpenCode or the corporate polish of Droid. It’s a raw engine. By running Pi inside a Lume VM, I get the hard isolation I need without the “buggy” permission layers that agents just bypass anyway.

The other reason is the wonderful rpiv-pi package. The project-manager part of my brain loves this! It wires in a skill pipeline (discover, research, design, plan, implement, validate) where each step produces an artifact the next one consumes. Specialist subagents fan out automatically per stage; hand-off skills freeze session state so the next session resumes from a document, not from memory.
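To make the artifact hand-off concrete, here is a toy sketch of the general idea. It is not how rpiv-pi is implemented. `run_stage` is a hypothetical stand-in for a real agent invocation, and the one-file-per-stage layout is an assumption.

```shell
# Toy staged pipeline: each stage consumes the previous stage's artifact
# and writes its own. NOT rpiv-pi's implementation; run_stage is a
# hypothetical stand-in for a real agent call.
STAGES="discover research design plan implement validate"

run_stage() {
  # $1 = stage name, $2 = input artifact (empty for the first stage),
  # $3 = output artifact
  {
    echo "# $1"
    [ -n "$2" ] && echo "input: $(basename "$2")"
    echo "status: done"
  } > "$3"
}

run_pipeline() {
  local out_dir="$1" prev="" stage artifact
  mkdir -p "$out_dir"
  for stage in $STAGES; do
    artifact="$out_dir/$stage.md"
    # Resume from documents: skip stages whose artifact already exists.
    if [ ! -s "$artifact" ]; then
      run_stage "$stage" "$prev" "$artifact" || return 1
    fi
    # A stage that produced nothing must not feed the next one.
    if [ ! -s "$artifact" ]; then
      echo "[pipeline] missing artifact: $artifact" >&2
      return 1
    fi
    prev="$artifact"
  done
}
```

Calling `run_pipeline ./artifacts` twice shows the resume behaviour: the second run finds every artifact on disk and redoes nothing, which is the "document, not memory" property in miniature.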

If these tools want to be the “Linux” of the AI space, security and guardrails should be first-class citizens, not optional add-ons. But for now, I’ll keep building my own boxes.