I have been using a Spec Driven Development approach, implemented as a Claude Code plugin, since Feb for all mid+ size tasks. The idea is to write detailed specs first, with the agent helping through research and interviewing, then decompose the task into smaller subtasks, write a detailed spec for each subtask, and implement each subtask separately. You can restart the session after every step in the workflow and after each subtask implementation, since all requirements are materialized in specs. This helps keep the session context focused on a single task at a time, improves adherence, reduces cost, and makes it possible to implement bigger tasks that are hard to implement with pure plan + code.
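To make the "restart after every step" idea concrete, here is a minimal sketch of how a fresh session could be primed from specs alone. It is not how the plugin is implemented; `Subtask`, `Feature` and `next_session_prompt` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subtask:
    name: str
    spec: str          # full text of the subtask spec
    done: bool = False


@dataclass
class Feature:
    name: str
    spec: str          # top-level spec for the whole feature
    subtasks: List[Subtask] = field(default_factory=list)


def next_session_prompt(feature: Feature) -> Optional[str]:
    """Build the prompt for a fresh session from the specs only.

    No chat history is needed: everything the agent must know is
    in the feature spec plus the spec of the first unfinished subtask.
    Returns None when every subtask is done.
    """
    for task in feature.subtasks:
        if not task.done:
            return (
                f"Feature spec:\n{feature.spec}\n\n"
                f"Subtask spec ({task.name}):\n{task.spec}\n\n"
                "Implement only this subtask."
            )
    return None
```

Each call only looks at the specs and the `done` flags, so the session that implements subtask N never has to carry the context of subtasks 1..N-1.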
I looked at most of those, including kiro and tessl. I was an early user of GSD when it was suitable for mid+ size projects. Over time GSD has grown into a beast that is suitable only for huge+ size projects, producing gigantic specs and burning too many tokens for most tasks. So I decided to create my own, with the set of steps I need and the specs I want.
After a few presentations of sddw to different companies, the most important conclusion was that the SDD plugin should be customizable. It should fit the typical size of the tasks/features you are working on, the specs should fit your requirements, and the set of steps can be different.
So I created claude code workflow (ccw), which lets you compile a custom version of the workflow on top of the SDD approach: https://github.com/sermakarevich/ccw
I do the same, and faced the issue that claude/codex lose context when doing subtasks (and subagents don't have plan mode).
So I built Agentbox to launch multiple VMs running claude/codex from claude/codex (you can also mix them).
The parent agent watches for prompts and questions and enforces the workflow: /review, /simplify, that the sub-agents file a PR, wait for bugbot comments, etc.
This way the parent agent, running in a /goal, doesn't lose context, enforces a good workflow, manages the backlog, and parallelizes the work and merges it back into the main repo.
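The enforcement part can be pictured as a simple ordered checklist that the parent walks for each sub-agent. This is only a sketch of the idea, not Agentbox's actual API; the step names and `next_required_step` are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical step names mirroring the workflow described above.
REQUIRED_STEPS = ["review", "simplify", "pr_filed", "bugbot_comments_handled"]


def next_required_step(completed: Iterable[str]) -> Optional[str]:
    """Return the first workflow step a sub-agent has not completed yet.

    The parent only needs this small piece of state per sub-agent,
    so its own context stays free of implementation details.
    Returns None when the sub-agent's work is ready to merge.
    """
    done = set(completed)
    for step in REQUIRED_STEPS:
        if step not in done:
            return step
    return None
```

For example, a sub-agent that filed a PR without running /simplify would still be sent back to `"simplify"`, because the steps are checked in order.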
Here are some things I created with sddw:
- chunker - an app to get smart slices from text and organize them into a hierarchical LLM/Obsidian wiki. Two features were implemented using sddw, with 15 subtasks:
Genuine question: I'm trying to adopt specs and AI DLC in my team to improve our development, and the biggest pain for us right now is managing all those md artifacts.
I'm curious how you manage them. Do you preserve them for the future or delete them as soon as the task is done? If you delete the artifacts once the job is done, do you summarize the specs into the Jira ticket or whatever system you use?
Similar, with a todo.md in the project that outlines the work to be done. This gets combined with developer and/or user documentation that outlines features and how they're expected to work. I'll iterate with the agent on the planning and documentation several times until the documentation and plan look good. The only gotcha I've hit a couple of times: I'll have the tests and spec in place before implementation, and sometimes the agent will try to edit the tests rather than make the implementation match the spec/tests.
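One way to catch the "agent edits the tests" gotcha is to check the list of changed files after each cycle. A minimal sketch, assuming the paths come from something like `git diff --name-only`; the patterns and the `protected_changes` helper are hypothetical and would need adjusting to the project layout.

```python
from fnmatch import fnmatch

# Hypothetical patterns for files the agent should not touch
# during an implementation cycle.
PROTECTED_PATTERNS = ["tests/*", "*_test.rs", "*.test.ts", "*.spec.ts"]


def protected_changes(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that match a protected pattern.

    fnmatch's '*' also matches '/', so 'tests/*' covers nested
    directories and '*.test.ts' matches files at any depth.
    """
    return [
        path for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]
```

If the returned list is non-empty, the cycle can be rejected and the agent told to change the implementation instead.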
I'm definitely babysitting the process more than vibe coding, and I review each cycle's results. As for languages, mostly TS/JS and Rust, with a bit of C# here and there depending on what I need. Claude Code's Opus does a pretty good job with Rust, so for anything personal I've just gone with it.
Use for work has been limited to working out specific problems, or a small utility/library that I can pull in, but on my own system, separate from work resources.
One additional benefit we get from sddw is that the agent drives spec creation using the scenario we put into the command/skill. It does the research (local and web), asks the operator questions, and later asks for confirmation of each block in the spec.
I have been building AI agents full time since Nov 2024. I stopped coding completely around mid-summer 2025, using Cursor at that time. When you build a platform-like application and already have a few plugins, an AI coder can create the next one in such a way that you can't tell which one was written by you.
At the end of 2025 I switched to Claude Code. Compared to Cursor, this opened up a different level of automation, including, for example, the possibility of running swarms of agents: https://news.ycombinator.com/item?id=48407998 using subscription limits.
So I spend all my time figuring out how to squeeze everything possible out of AI rather than out of myself. AI scales, I do not.
Discussion on HN: https://news.ycombinator.com/item?id=48231575
Repo: https://github.com/sermakarevich/sddw
Slides: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...