Spec-driven development (SDD) is a methodology where specs (natural language text files) are used as a basis for generating code. SDD typically involves AI agent assistance with writing both the specs and the resulting code.

My short summary of the concept: code is written based on a series of increasingly granular spec documents, each ultimately derived from an initial, high-level spec (a set of requirements and acceptance criteria). Those spec documents drive the implementation of the code.

For example:

  1. Write a high-level specification (list of requirements, goals etc. — somewhat like a MoSCoW document).
  2. Research and write a plan for implementing the specification — somewhat like a design document.
  3. Based on the plan, construct a sequence of concrete tasks that need to be completed — somewhat like a Jira epic.
  4. Finally, write code to implement each task.
An illustration of the steps of SDD
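
To make step 1 concrete, a high-level spec might look something like the excerpt below. This is a purely hypothetical example I'm inventing for illustration, not an artifact generated by any particular tool:

```markdown
# Feature: Export request history

## Requirements
- Users must be able to export historical requests as JSON or CSV.
- Exports must respect the currently active filter (date range, status code).
- Exports larger than 10,000 rows should be generated asynchronously.

## Acceptance criteria
- Clicking "Export" with an active filter downloads only the filtered rows.
- An async export sends a notification with a download link when ready.
```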

Sound familiar? The SDD methodology is a microcosm of a typical process for rolling out a new feature in a larger organization, but reworked for individual engineers using AI agents instead.

In other words, SDD is a way of rethinking software engineering where the engineer describes the feature they want at a high level, effectively playing a Product Owner-esque role, and then guides the AI through the process of planning, research, writing tasks, and eventually writing code.

Spec-driven development doesn’t strictly require special tooling — just an AI coding agent — but there are tools out there that streamline the process with optimized commands, prompts, and workflows.

Different SDD tools might have different interpretations of exactly what “specs” are and how they’re used. I’m mostly focusing on the SDD approach used by spec-kit, since that’s what I’m actually using at the moment.

Depending on the particular approach, SDD can encompass a variety of opinionated ideas — like an emphasis on documenting test cases and then converting them to automated tests. I won’t be getting into those in this post, but you can read more about the theory in Specification-Driven Development.

There are also some fascinating, but immature, approaches that involve spec documents being persistent sources of truth over time. See Understanding Spec-Driven-Development for additional details on that front.

spec-kit

Let’s get concrete. spec-kit is a CLI tool that implements its own version of Spec-Driven Development and works with a variety of agents.

I’m using spec-kit alongside Claude Code for my AI-assisted development workflow.
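
If you want to try it, setup is a one-time step. At the time I set this up, initializing a project looked roughly like the following; the CLI is still evolving, so check the spec-kit README for the current invocation and flags:

```sh
# Scaffold spec-kit into a new project, targeting Claude Code as the agent.
# (Invocation based on my recollection of the spec-kit README; verify before use.)
uvx --from git+https://github.com/github/spec-kit.git specify init my-project --ai claude

# Or initialize inside an existing project directory:
uvx --from git+https://github.com/github/spec-kit.git specify init --here --ai claude
```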

In practice, implementing a feature with spec-kit involves running multiple custom commands in sequence:

The spec-kit CLI automatically scaffolds these custom commands in your project directory, along with a variety of helper scripts, so you don’t need to worry about how they’re implemented under the hood (unless you want to!).

  1. Run /speckit.specify [feature description] — creates a spec for the new feature.
    1. (Optional) Review and adjust the spec by running /speckit.specify again.
    2. (Optional) Prompt the AI to ask questions by running /speckit.clarify.
  2. Run /speckit.plan to create a plan for implementing the spec. The AI will do research on how to implement a solution and produce multiple documents.
  3. Run /speckit.tasks to create a phased implementation plan composed of granular tasks.
    1. (Optional) Run /speckit.analyze to review the documents for consistency.
  4. Run /speckit.implement to implement the tasks.
    1. Between implementation phases, or when encountering manual testing tasks, the AI will pause implementation and prompt the user.
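
For orientation, here's roughly what the generated artifacts look like on disk in my projects. The exact names and layout come from spec-kit's templates and may differ between versions:

```text
.claude/commands/           # the /speckit.* slash commands scaffolded by spec-kit
.specify/                   # helper scripts and document templates
specs/
  001-user-management/
    spec.md                 # output of /speckit.specify
    plan.md                 # output of /speckit.plan (plus supporting docs like research.md)
    tasks.md                # output of /speckit.tasks, consumed by /speckit.implement
```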

Because spec-kit is an open source tool that targets apps at all levels of complexity, it’s flexible in how it can be used. As a result, you have to manually advance through each stage of the process (presumably after reviewing and adjusting the documents generated by the previous stage).

I’ve previously experimented with Backlog.md as well — it takes a lighter-weight approach that feels like a sweet spot for small tasks, but its prompts are less refined and don’t always produce equivalent results.

For example, features implemented with spec-kit tend to have more detailed specs, more automated tests, and more thought put into edge cases; more complex features can also be implemented in a single go.

But, I still think Backlog.md is probably suitable for most personal projects that seek a compromise between “full SDD” and just manually prompting the agent to write code.

Subagents go brrr

Using spec-kit as described above, I started to feel frustrated. Features took a long time to implement, and worse, I constantly needed to check in and run this or that command to get the process going again. And really, am I going to read all those documents for each feature? Maybe if it was a work project, but for a vibe coded personal project? No way.

What if I just want to write a spec, manually review/iterate on that, and once the spec is good, just have the agent take over and do everything else?

Well, one of the challenges with that is context management: even if you create a custom command or skill to run all the stages of the process, the agent eventually loses track of what it needs to do next. By the time the agent is done with the /speckit.plan command, it’s forgotten that it needs to move on to running /speckit.tasks next.

But as we know, Claude Code has a solution for managing context for long-running, multi-step tasks, and that’s subagents!

That led me to try the following:

  1. Create three subagents: one for planning, one for writing tasks, and one for implementing the tasks. These subagents are just thin wrappers over /speckit commands (a sketch of one follows after this list).
  2. Include instructions for the subagents to skip manual testing steps and proceed autonomously until the implementation is complete.
  3. Create an /implement-spec custom command that coordinates running the subagents in order.
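
For reference, a Claude Code subagent is just a Markdown file with YAML frontmatter under .claude/agents/ (e.g. .claude/agents/speckit-planner.md). Mine look roughly like the sketch below; the name and the prompt wording are my own paraphrase rather than anything canonical:

```markdown
---
name: speckit-planner
description: Runs the /speckit.plan stage for the current feature. Use when a spec exists and an implementation plan is needed.
---

You handle the planning stage of spec-driven development. Follow the
/speckit.plan workflow for the feature spec you are given: read the spec,
do the necessary research, and write the plan documents to the feature's
directory under specs/. Do not stop for manual review; report back a short
summary and the paths of the files you created.
```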

You could also create a subagent for the /speckit.specify command, but I find that I typically want to iterate on the spec, so fully automating that would be going a bit too far. But once the detailed spec is written, the rest of it can be automated.
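
The coordinating command is similarly simple: a custom slash command (a Markdown file under .claude/commands/) that tells the main agent to delegate each stage to the corresponding subagent in order. A rough sketch of mine, with illustrative names and wording rather than my verbatim prompt:

```markdown
---
description: Plan, task out, and implement the current feature spec end to end.
---

A feature spec already exists under specs/ for: $ARGUMENTS

Work through the remaining spec-kit stages by delegating to subagents, in order:

1. Use the speckit-planner subagent to produce the implementation plan.
2. Use the speckit-tasker subagent to break the plan into tasks.
3. Use the speckit-implementer subagent to implement every task, skipping
   manual testing steps.

Wait for each subagent to finish before starting the next. When all tasks are
complete, summarize what was implemented and list any follow-up work.
```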

Now my Claude session for implementing a feature looks like this:

An illustration of how multiple subagents can be coordinated to autonomously implement a spec

Because the heavy lifting is happening in subagents, the main agent doesn’t forget about the content of the /implement-spec command, and successfully drives the whole process to completion.

This is an especially effective use of subagents because it dodges the “context-passing” question — namely, what information/context subagents return to the main agent, and then how the main agent passes that on to the next subagent. Because the subagents are communicating with each other by reading and writing Markdown files according to the spec-kit information structure, there’s no need for any context-passing to/from the main agent at all!

Even with this subagent approach, Claude Code doesn’t always autonomously complete all the tasks. I suspect it might be a prompt issue — I need to iterate more on my prompts to achieve the results I want. But even with “perfect” prompts, a 100% success rate might not be practical. Temper your expectations!

The nice thing about using spec-kit is that the state of the implementation is tracked in the spec files. If Claude does abort the process early, you can just re-run the command (e.g. /speckit.implement) and it’ll automatically pick up where it left off.
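
This works because tasks.md doubles as a progress tracker: completed tasks get checked off as the agent works through them. The exact format comes from spec-kit's templates and changes between versions, but in my experience it looks roughly like:

```markdown
## Phase 1: Setup
- [x] T001 Create the users table migration
- [x] T002 Add the User model and repository

## Phase 2: Core implementation
- [ ] T003 Implement the password change endpoint
- [ ] T004 Add the user management page
```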

If you want to see my exact subagents/prompts/etc., you can find them here.

Features great and small

I go through the above workflow for each “feature.” But features can be big or small. What granularity is right for this workflow?

One advantage of using spec-kit is that the workflow can handle fairly large features — more so than when using Backlog.md, which itself handles larger features than prompting Claude Code by hand. But larger features add time and complexity, and therefore risk.

The example I use in the images above — User management and RBAC — is a pretty huge feature to swallow in a single spec (unless your app is so small that there are only a couple possible roles and permissions to begin with).

The exact sweet spot for feature size is, I suspect, project dependent.

When working on Webhook Testing Tool, I’ve had success with features at the level of:

  1. Ability to archive or delete historical requests, both individually and in bulk.
  2. Adding a CLI with 3 admin management commands.
  3. Adding a user management page with email/password changing functionality.

If the feature was smaller, e.g. fixing a bug in a particular page, I would just prompt Claude to fix it directly (or just do it myself, probably).

If the feature was larger (e.g. supporting multi-user login and authorization), I would split it into chunks to be implemented separately.

We’re all Product Owners now? — or not

If all we’re doing is iterating on a spec document and then handing it off to the AI for planning and implementation, have we basically changed our role from Software Engineer to Product Owner? Could an actual (nontechnical) PO just do the same thing?

Well, not really. For better or worse, this process is far from foolproof, and the resulting code typically needs some adjustment. No matter how autonomous the agent is, we can’t skip the final stage of code review and manual testing. Worse, sometimes there are issues that the agent just doesn’t know how to fix, and we need to actually get in there and write some code ourselves.

In short — there’s promise in this approach, but the actual coding/implementation isn’t good enough to cut out an engineer entirely. We still have a long march of nines before that’s viable.