diff --git a/.specs/tasks/roadmap.md b/.specs/tasks/roadmap.md index 9f9be74..8743ddd 100644 --- a/.specs/tasks/roadmap.md +++ b/.specs/tasks/roadmap.md @@ -6,6 +6,7 @@ [x] Move all commands to skills format, in order to properly support all installers [] Add support for vercel skill installer - but left support for installers that exists now [] Publish skills in vercel marketplace +[] Add skills to https://github.com/VoltAgent/awesome-agent-skills [x] Migrate SDD plugin to v2 version [x] Fix issues with scratchpad id generation - potentially write script for generation of them [x] switch to `git mv` instead of `mv` in order to keep git history clean and avoid conflicts. diff --git a/README.md b/README.md index b916709..aca87f8 100644 --- a/README.md +++ b/README.md @@ -21,20 +21,20 @@ Hand-crafted collection of advanced context engineering techniques and patterns The marketplace is based on prompts used daily by our company developers for a long time, while adding plugins from benchmarked papers and high-quality projects. > [!IMPORTANT] -> **v2 marketplace release:** [Spec-Driven Development plugin](https://cek.neolab.finance/plugins/sdd) was rewritten from sctratch. It is now able to produce working code in 100% of cases on real-life production projects! +> **v2 marketplace release:** [Spec-Driven Development plugin](https://cek.neolab.finance/plugins/sdd) was rewritten from scratch. It is now able to produce working code in 100% of cases on real-life production projects! ## Key Features - **Simple to Use** - Easy to install and use without any dependencies. Contains automatically used skills and self-explanatory commands. -- **Token-Efficient** - Carefully crafted prompts and architecture, preferring command oriented skills with sub-agents over general information skills when possible, to minimize populating context with unnecessary information. 
+- **Token-Efficient** - Carefully crafted prompts and architecture, preferring command-oriented skills with sub-agents over general information skills when possible, to minimize populating context with unnecessary information. - **Quality-Focused** - Each plugin is focused on meaningfully improving agent results in a specific area. - **Granular** - Install only the plugins you need. Each plugin loads only its specific agents, commands, and skills. Each without overlap and redundant skills. - **Scientifically proven** - Plugins are based on proven techniques and patterns that were tested by well-trusted benchmarks and studies. -- **Open-Standards** - Skills are based on [agentskills.io](https://agentskills.io) and [openskills](https://github.com/numman-ali/openskills). [SDD](https://cek.neolab.finance/plugins/sdd) plugin is based on Arc42 specification standard for software development documentation. +- **Open-Standards** - Skills are based on [agentskills.io](https://agentskills.io) and [openskills](https://github.com/numman-ali/openskills). The [SDD](https://cek.neolab.finance/plugins/sdd) plugin is based on the Arc42 specification standard for software development documentation. ## Quick Start -### Step 1: Install Marketplaces and Plugin +### Step 1: Install Marketplace and Plugins #### Claude Code @@ -57,7 +57,7 @@ Each installed plugin loads only its specific agents, commands, and skills into
Installation for Cursor, Windsurf, Cline, OpenCode and others -Use [OpenSkills](https://github.com/numman-ali/openskills) to install skills for broad range of agents: +Use [OpenSkills](https://github.com/numman-ali/openskills) to install skills for a broad range of agents: ```bash npx openskills install NeoLabHQ/context-engineering-kit @@ -97,13 +97,13 @@ In order to use this hook, you need to have `bun` installed. However, it is not You can find the complete Context Engineering Kit documentation [here](https://cek.neolab.finance). -But main plugin we recommend to start with is [Spec-Driven Development](https://cek.neolab.finance/plugins/sdd). +However, the main plugin we recommend starting with is [Spec-Driven Development](https://cek.neolab.finance/plugins/sdd). ## [Spec-Driven Development](https://cek.neolab.finance/plugins/sdd) Comprehensive specification-driven development workflow plugin that transforms prompts into production-ready implementations through structured planning, architecture design, and quality-gated execution. -This plugin is designed to consistently produce working code. It was tested on real-life production projects by our team, and in 100% of cases it generated working code aligned with the initial prompt. If you find a use case it cannot handle, please report it as an issue. +This plugin is designed to consistently produce working code. It was tested on real-life production projects by our team, and in 100% of cases, it generated working code aligned with the initial prompt. If you find a use case it cannot handle, please report it as an issue. ### Key Features @@ -147,7 +147,7 @@ Restart the Claude Code session to clear context and start fresh. 
Then run the f - [/sdd:add-task](https://cek.neolab.finance/plugins/sdd/add-task) - Create task template file with initial prompt - [/sdd:plan](https://cek.neolab.finance/plugins/sdd/plan) - Analyze prompt, generate required skills and refine task specification -- [/sdd:implement](https://cek.neolab.finance/plugins/sdd/implement) - Produce working implementation of the task and verify it +- [/sdd:implement](https://cek.neolab.finance/plugins/sdd/implement) - Produce a working implementation of the task and verify it Additional commands useful before creating a task: @@ -182,7 +182,7 @@ Key patterns implemented in this plugin: ### Vibe Coding vs. Specification-Driven Development -This plugin is not a "vibe coding" solution, but out of the box it works like one. By default it is designed to work from a single prompt through to the end of the task, making reasonable assumptions and evidence-based decisions instead of constantly asking for clarification. This is caused by fact that developer time is more valuable than model time, so it allow developer to decide how much time task is worth to spend. Plugin will always produce working results, but quality will be sub-optimal if no human feedback is provided. +This plugin is not a "vibe coding" solution, but out of the box it works like one. By default it is designed to work from a single prompt through to the end of the task, making reasonable assumptions and evidence-based decisions instead of constantly asking for clarification. This is because developer time is more valuable than model time, allowing the developer to decide how much time the task is worth. The plugin will always produce working results, but quality will be sub-optimal if no human feedback is provided. To improve quality, after generating a specification you can correct it or leave comments using `//`, then run the `/plan` command again with the `--refine` flag. 
You can also verify each planning and implementation phase by adding the `--human-in-the-loop` flag. According to the majority of known research, human feedback is the most effective way to improve results. @@ -201,11 +201,11 @@ To view all available plugins: ``` - [Reflexion](https://cek.neolab.finance/plugins/reflexion) - Introduces feedback and refinement loops to improve output quality. -- [Spec-Driven Development](https://cek.neolab.finance/plugins/sdd) - Introduces commands for specification-driven development, based on Continuous Learning + LLM-as-Judge + Agent Swarm. Achives **development as compilation** through reliable code generation. +- [Spec-Driven Development](https://cek.neolab.finance/plugins/sdd) - Introduces commands for specification-driven development, based on Continuous Learning + LLM-as-Judge + Agent Swarm. Achieves **development as compilation** through reliable code generation. - [Code Review](https://cek.neolab.finance/plugins/code-review) - Introduces codebase and PR review commands and skills using multiple specialized agents. - [Git](https://cek.neolab.finance/plugins/git) - Introduces commands for commit and PRs creation. - [Test-Driven Development](https://cek.neolab.finance/plugins/tdd) - Introduces commands for test-driven development, common anti-patterns and skills for testing using subagents. -- [Subagent-Driven Development](https://cek.neolab.finance/plugins/sadd) - Introduces skills for subagent-driven development, dispatches fresh subagent for each task with code review between tasks, enabling fast iteration with quality gates. +- [Subagent-Driven Development](https://cek.neolab.finance/plugins/sadd) - Introduces skills for subagent-driven development, which dispatches a fresh subagent for each task with code review between tasks, enabling fast iteration with quality gates. 
- [Domain-Driven Development](https://cek.neolab.finance/plugins/ddd) - Introduces commands to update CLAUDE.md with best practices for domain-driven development, focused on code quality, and includes Clean Architecture, SOLID principles, and other design patterns. - [FPF - First Principles Framework](https://cek.neolab.finance/plugins/fpf) - Introduces structured reasoning using ADI cycle (Abduction-Deduction-Induction) with knowledge layer progression. Uses workflow command pattern with fpf-agent for hypothesis generation, verification, and auditable decision-making. - [Kaizen](https://cek.neolab.finance/plugins/kaizen) - Inspired by Japanese continuous improvement philosophy, Agile and Lean development practices. Introduces commands for analysis of root causes of issues and problems, including 5 Whys, Cause and Effect Analysis, and other techniques. @@ -260,7 +260,7 @@ This plugin uses multiple specialized agents for comprehensive code quality anal - **security-auditor** - Identifies security vulnerabilities and potential attack vectors - **test-coverage-reviewer** - Evaluates test coverage and suggests missing test cases -You can use this plugin to review code in github actions, in order to do it follow [this guide](https://cek.neolab.finance/guides/ci-integration). +You can use this plugin to review code in GitHub Actions; to do so, follow [this guide](https://cek.neolab.finance/guides/ci-integration). ### [Git](https://cek.neolab.finance/plugins/git) @@ -501,10 +501,10 @@ Commands for integrating Model Context Protocol servers with your project. 
Each **Commands** -- [/mcp:setup-context7-mcp](https://cek.neolab.finance/plugins/mcp/setup-context7-mcp) - Guide for setup Context7 MCP server to load documentation for specific technologies -- [/mcp:setup-serena-mcp](https://cek.neolab.finance/plugins/mcp/setup-serena-mcp) - Guide for setup Serena MCP server for semantic code retrieval and editing capabilities -- [/mcp:setup-codemap-cli](https://cek.neolab.finance/plugins/mcp/setup-codemap-cli) - Guide for setup Codemap CLI for intelligent codebase visualization and navigation -- [/mcp:setup-arxiv-mcp](https://cek.neolab.finance/plugins/mcp/setup-arxiv-mcp) - Guide for setup arXiv/Paper Search MCP server via Docker MCP for academic paper search and retrieval from multiple sources +- [/mcp:setup-context7-mcp](https://cek.neolab.finance/plugins/mcp/setup-context7-mcp) - Guide for setting up Context7 MCP server to load documentation for specific technologies +- [/mcp:setup-serena-mcp](https://cek.neolab.finance/plugins/mcp/setup-serena-mcp) - Guide for setting up Serena MCP server for semantic code retrieval and editing capabilities +- [/mcp:setup-codemap-cli](https://cek.neolab.finance/plugins/mcp/setup-codemap-cli) - Guide for setting up Codemap CLI for intelligent codebase visualization and navigation +- [/mcp:setup-arxiv-mcp](https://cek.neolab.finance/plugins/mcp/setup-arxiv-mcp) - Guide for setting up arXiv/Paper Search MCP server via Docker MCP for academic paper search and retrieval from multiple sources - [/mcp:build-mcp](https://cek.neolab.finance/plugins/mcp/build-mcp) - Guide for creating high-quality MCP servers that enable LLMs to interact with external services ## Theoretical Foundation @@ -525,4 +525,4 @@ This project is based on research and papers from the following sources: - [Chain of Thought Prompting](https://arxiv.org/abs/2201.11903) - Step-by-step reasoning - [Inference-Time Scaling of Verification](https://arxiv.org/abs/2601.15808) - Rubric-guided verification -More details about theoretical 
foundation can be found in [resources](https://cek.neolab.finance/resources) page. +More details about the theoretical foundation can be found on the [resources](https://cek.neolab.finance/resources) page. diff --git a/docs/plugins/sdd/README.md b/docs/plugins/sdd/README.md index d506a79..9737e38 100644 --- a/docs/plugins/sdd/README.md +++ b/docs/plugins/sdd/README.md @@ -6,13 +6,13 @@ This plugin is designed to consistently and reproducibly produce working code. I ## Key Features -- **Development as compilation** — The plugin works like a "compilation" or "nightly build" for your development process: `task specs → run /sdd:implement → working code`. After writing your prompt, you can launch the plugin and expect a working result when you come back. The time it takes depends on task complexity — simple tasks may finish in 30 minutes, while complex ones can take a few days. -- **Benchmark-level quality in real life** — Model benchmarks improve with each release, yet real-world results usually stay the same. That's because benchmarks reflect the best possible output a model can achieve, whereas in practice LLMs tend to drift toward sub-optimal solutions that can be wrong or non-functional. This plugin uses a variety of patterns to keep the model working at its peak performance. -- **Customizable** — Balance between result quality and process speed by adjusting command parameters. Learn more in the [Customization](customization.md) section. -- **Developer time-efficient** — The overall process is designed to minimize developer time and reduce the number of interactions, while still producing results better than what a model can generate from scratch. However, overall quality is highly proportional to the time you invest in iterating and refining the specification. -- **Industry-standard** — The plugin's specification template is based on the arc42 standard, adjusted for LLM capabilities. 
Arc42 is a widely adopted, high-quality standard for software development documentation used by many companies and organizations. -- **Works best in complex or large codebases** — While most other frameworks work best for new projects and greenfield development, this plugin is designed to perform better the more existing code and well-structured architecture you have. At each planning phase it includes a **codebase impact analysis** step that evaluates which files may be affected and which patterns to follow to achieve the desired result. -- **Simple** — This plugin avoids unnecessary complexity and mainly uses just 3 commands, offloading process complexity to the model via multi-agent orchestration. `/sdd:implement` is a single command that produces working code from a task specification. To create that specification, you run `/sdd:add-task` and `/sdd:plan`, which analyze your prompt and iteratively refine the specification until it meets the required quality. +- **Development as compilation** — The plugin functions like a "compilation" or "nightly build" for your development process: `task specs → run /sdd:implement → working code`. After writing your prompt, you can launch the plugin and expect a functional result when you return. The completion time depends on task complexity — simple tasks may finish within 30 minutes, while complex ones can take several days. +- **Benchmark-level quality in real life** — Model benchmarks improve with each release, yet real-world results often stagnate. This is because benchmarks reflect the best possible output a model can achieve, whereas in practice LLMs tend to drift toward sub-optimal, non-functional solutions. This plugin uses a variety of patterns to keep the model operating at peak performance. +- **Customizable** — Balance result quality and process speed by adjusting command parameters. Learn more in the [Customization](customization.md) section. 
+- **Developer time-efficiency** — The overall process is designed to minimize developer time and reduce the number of interactions, while still producing results superior to what a model can generate from scratch. However, overall quality is proportional to the time invested in iterating on and refining the specification. +- **Industry-standard** — The plugin's specification template is based on the arc42 standard, adjusted for LLM capabilities. Arc42 is a widely adopted, high-quality standard for software development documentation used by many organizations. +- **Works best in complex or large codebases** — While most other frameworks work best for new projects and greenfield development, this plugin is designed to perform better as your codebase grows and your architecture becomes more structured. Each planning phase includes a **codebase impact analysis** step that evaluates which files may be affected and which patterns to follow to achieve the desired result. +- **Simple** — This plugin avoids unnecessary complexity by primarily using only three commands, offloading process complexity to the model via multi-agent orchestration. `/sdd:implement` is a single command that produces functional code from a task specification. To create that specification, you run `/sdd:add-task` and `/sdd:plan`, which analyze your prompt and iteratively refine the specification until it meets the required quality standards. ## Quick Start @@ -20,7 +20,7 @@ This plugin is designed to consistently and reproducibly produce working code. 
I /plugin marketplace add NeoLabHQ/context-engineering-kit ``` -Enable `sdd` plugin in installed plugins list +Enable the `sdd` plugin in the installed plugins list: ```bash /plugin @@ -30,20 +30,20 @@ Enable `sdd` plugin in installed plugins list Then run the following commands: ```bash -# create .specs/tasks/draft/design-auth-middleware.feature.md file with initial prompt +# Create the .specs/tasks/draft/design-auth-middleware.feature.md file with the initial prompt /sdd:add-task "Design and implement authentication middleware with JWT support" -# write detailed specification for the task +# Write a detailed specification for the task /sdd:plan -# will move task to .specs/tasks/todo/ folder +# Moves the task to the .specs/tasks/todo/ folder ``` Restart the Claude Code session to clear context and start fresh. Then run the following command: ```bash -# implement the task +# Implement the task /sdd:implement @.specs/tasks/todo/design-auth-middleware.feature.md -# produces working implementation and moves the task to .specs/tasks/done/ folder +# Produces a working implementation and moves the task to the .specs/tasks/done/ folder ``` - [Detailed guide](../../guides/spec-driven-development.md) @@ -53,11 +53,11 @@ Restart the Claude Code session to clear context and start fresh. Then run the f End-to-end task implementation process from initial prompt to pull request, including commands from the [git](../git/README.md) plugin: -- `/sdd:add-task` → creates a `.specs/tasks/draft/..md` file with the initial task description. -- `/sdd:plan` → generates a `.claude/skills//SKILL.md` file with skills needed to implement the task (by analyzing library and framework documentation used in the codebase), then updates the task file with a refined specification and moves it to `.specs/tasks/todo/`. -- `/sdd:implement` → produces a working implementation, verifies it, then moves the task to `.specs/tasks/done/`. -- `/git:commit` → commits changes. 
-- `/git:create-pr` → creates a pull request. +- `/sdd:add-task` → Creates a `.specs/tasks/draft/..md` file with the initial task description. +- `/sdd:plan` → Generates a `.claude/skills//SKILL.md` file with the skills needed to implement the task (by analyzing the library and framework documentation used in the codebase), then updates the task file with a refined specification and moves it to `.specs/tasks/todo/`. +- `/sdd:implement` → Produces a working implementation, verifies it, then moves the task to `.specs/tasks/done/`. +- `/git:commit` → Commits changes. +- `/git:create-pr` → Creates a pull request. ``` 1. Create 2. Plan 3. Implement 4. Ship @@ -102,28 +102,28 @@ The SDD plugin uses specialized agents for different phases of development: | `team-lead` | Step parallelization, agent assignment, execution planning | `/sdd:plan` (Phase 5) | | `qa-engineer` | Verification rubrics, quality gates, LLM-as-Judge definitions | `/sdd:plan` (Phase 6) | | `developer` | Code implementation, TDD execution, quality review, verification | `/sdd:implement` | -| `tech-writer` | Technical documentation writing, API guides, architecture updates, lessons learned | `/sdd:implement` | +| `tech-writer` | Technical documentation, API guides, architecture updates, and lessons learned | `/sdd:implement` | ## Patterns Key patterns implemented in this plugin: -- **Structured reasoning templates** — includes Zero-shot and Few-shot Chain of Thought, Tree of Thoughts, Problem Decomposition, and Self-Critique. Each is tailored to a specific agent and task, enabling sufficiently detailed decomposition so that isolated sub-agents can implement each step independently. -- **Multi-agent orchestration for context management** — Context isolation of independent agents prevents the context rot problem, essentially keeping LLMs at optimal performance at each step of the process. The main agent acts as an orchestrator that launches sub-agents and controls their work. 
-- **Quality gates based on LLM-as-Judge** — Evaluate the quality of each planning and implementation step using evidence-based scoring and predefined verification rubrics. This fully eliminates cases where an agent produces non-working or incorrect solutions. -- **Continuous learning** — Builds skills that the agent needs to implement a specific task, which it would otherwise not be able to perform from scratch. -- **Spec-driven development pattern** — Based on the arc42 specification standard, adjusted for LLM capabilities, to eliminate parts of the specification that add no value to implementation quality or that could degrade it. -- **MAKER** — An agent reliability pattern introduced in [Solving a Million-Step LLM Task with Zero Errors](https://arxiv.org/abs/2511.09030). It removes agent mistakes caused by accumulated context and hallucinations by utilizing clean-state agent launches, filesystem-based memory storage, and multi-agent voting during critical decision-making. +- **Structured reasoning templates** — Includes Zero-shot and Few-shot Chain of Thought, Tree of Thoughts, Problem Decomposition, and Self-Critique. Each is tailored to a specific agent and task, enabling sufficiently detailed decomposition so that isolated sub-agents can implement each step independently. +- **Multi-agent orchestration for context management** — Context isolation of independent agents prevents "context rot," maintaining optimal LLM performance at each step. The main agent acts as an orchestrator that launches sub-agents and manages their workflow. +- **Quality gates based on LLM-as-Judge** — Evaluates the quality of each planning and implementation step using evidence-based scoring and predefined verification rubrics. This eliminates cases where an agent produces non-functional or incorrect solutions. +- **Continuous learning** — Automatically builds specific skills the agent needs to implement a task, which it might otherwise be unable to perform from scratch. 
+- **Spec-driven development pattern** — Based on the arc42 specification standard adjusted for LLM capabilities, this pattern eliminates elements of the specification that do not add value to implementation quality. +- **MAKER** — An agent reliability pattern introduced in [Solving a Million-Step LLM Task with Zero Errors](https://arxiv.org/abs/2511.09030). It minimizes agent mistakes caused by context accumulation and hallucinations by utilizing clean-state agent launches, filesystem-based memory storage, and multi-agent voting during critical decisions. ## Vibe Coding vs. Specification-Driven Development -This plugin is not a "vibe coding" solution, but out of the box it works like one. By default it is designed to work from a single prompt through to the end of the task, making reasonable assumptions and evidence-based decisions instead of constantly asking for clarification. This is caused by fact that developer time is more valuable than model time, so it allow developer to decide how much time task is worth to spend. Plugin will always produce working results, but quality will be sub-optimal if no human feedback is provided. +This plugin is not a "vibe coding" solution, though it can function like one out of the box. By default, it is designed to work from a single prompt through to task completion, making reasonable assumptions and evidence-based decisions instead of constantly asking for clarification. This is because developer time is more valuable than model time, allowing the developer to decide how much time is worth spending on a task. The plugin will always produce functional results, but quality may be sub-optimal without human feedback. -To improve quality, after generating a specification you can correct it or leave comments using `//`, then run the `/plan` command again with the `--refine` flag. You can also verify each planning and implementation phase by adding the `--human-in-the-loop` flag. 
According to the majority of known research, human feedback is the most effective way to improve results.
+To improve quality, you can correct the generated specification or leave comments using `//`, then run the `/sdd:plan` command again with the `--refine` flag. You can also verify each planning and implementation phase by adding the `--human-in-the-loop` flag. The majority of research shows that human feedback is the most effective way to improve results.
 
-Our tests showed that even when the initially generated specification was incorrect due to lack of information or task complexity, the agent was still able to self-correct until it reached a working solution. However, it usually took much longer, spending time on wrong paths and stopping more frequently. To avoid this, we strongly advise decomposing tasks into smaller separate tasks with dependencies and reviewing the specification for each one. You can add dependencies between tasks as arguments to the `/add-task` command, and the model will link them together by adding a `depends_on` section to the task file frontmatter.
+Our tests showed that even when the initially generated specification was incorrect due to missing information or task complexity, the agent was still able to self-correct until it reached a working solution. However, this process often took longer, as the agent explored incorrect paths and stopped more frequently. To avoid this, we strongly recommend decomposing complex tasks into smaller, separate tasks with dependencies and reviewing the specification for each one. You can add dependencies between tasks as arguments to the `/sdd:add-task` command, and the model will link them by adding a `depends_on` section to the task file's frontmatter.
 
-Even if you don't want to spend much time on this process, you can still use the plugin for complex tasks without decomposition or human verification — but you will likely need tools like ralph-loop to keep the agent running for a longer time.
+Even if you prefer a less hands-on approach, you can still use the plugin for complex tasks without decomposition or human verification — though you may need tools such as ralph-loop to keep the session active for longer periods.
 
 Learn more about available customization options in [Customization](customization.md).
 
@@ -144,11 +144,11 @@ The SDD plugin is based on established software engineering methodologies and re
 - [Test-Driven Development](https://www.agilealliance.org/glossary/tdd/) - Writing tests before implementation
 - [Clean Architecture](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html) - Separation of concerns and dependency inversion
 - [Vertical Slice Architecture](https://jimmybogard.com/vertical-slice-architecture/) - Feature-based organization for incremental delivery
-- [Verbalized Sampling](https://arxiv.org/abs/2510.01171) - Training-free prompting strategy for diverse idea generation. Achieves **2-3x diversity improvement** while maintaining quality. Used for `create-ideas`, `brainstorm` and `plan` commands
+- [Verbalized Sampling](https://arxiv.org/abs/2510.01171) - A training-free prompting strategy for diverse idea generation. It achieves a **2-3x diversity improvement** while maintaining quality. Used for the `create-ideas`, `brainstorm`, and `plan` commands.
 - [Solving a Million-Step LLM Task with Zero Errors](https://arxiv.org/abs/2511.09030) - Reliability pattern for LLM-based agents that enables solving complex tasks with zero errors.
-- [LLM-as-a-Judge](https://arxiv.org/abs/2306.05685) - Evaluation patterns -- [Multi-Agent Debate](https://arxiv.org/abs/2305.14325) - Multiple perspectives -- [Chain-of-Verification](https://arxiv.org/abs/2309.11495) - Hallucination reduction -- [Tree of Thoughts](https://arxiv.org/abs/2305.10601) - Structured exploration -- [Constitutional AI](https://arxiv.org/abs/2212.08073) - Project constitution -- [Chain of Thought Prompting](https://arxiv.org/abs/2201.11903) - Step-by-step reasoning +- [LLM-as-a-Judge](https://arxiv.org/abs/2306.05685) - Evaluation patterns for grading LLM output. +- [Multi-Agent Debate](https://arxiv.org/abs/2305.14325) - Leveraging multiple perspectives for higher accuracy. +- [Chain-of-Verification](https://arxiv.org/abs/2309.11495) - Reducing hallucinations through verification steps. +- [Tree of Thoughts](https://arxiv.org/abs/2305.10601) - Structured exploration of complex solution spaces. +- [Constitutional AI](https://arxiv.org/abs/2212.08073) - Defining core principles for agent behavior. +- [Chain of Thought Prompting](https://arxiv.org/abs/2201.11903) - Enabling step-by-step reasoning. diff --git a/docs/plugins/sdd/add-task.md b/docs/plugins/sdd/add-task.md index adc686c..2dc829a 100644 --- a/docs/plugins/sdd/add-task.md +++ b/docs/plugins/sdd/add-task.md @@ -2,8 +2,8 @@ Create a draft task file that captures the user's intent with structured metadata, proper classification, and dependency tracking — ready for refinement by `/sdd:plan`. -- Purpose — Transform a user prompt into a well-structured draft task file with action-oriented title, type classification, and optional dependencies -- Output — Task file in `.specs/tasks/draft/..md` +- Purpose — Transform a user prompt into a well-structured draft task file with an action-oriented title, type classification, and optional dependencies +- Output — Task file at `.specs/tasks/draft/..md` ```bash /sdd:add-task "Task description" [dependency-file-paths...] 
@@ -65,7 +65,7 @@ Create a draft task file that captures the user's intent with structured metadat ### Phase 1: Setup Directory Structure -Creates the full task lifecycle directory structure if it doesn't exist: +Creates the full task lifecycle directory structure if it does not exist: | Directory | Purpose | |-----------|---------| @@ -115,7 +115,7 @@ depends_on: ## Description -// Will be filled in future stages by business analyst +// Will be filled in future stages by a business analyst ``` > The `depends_on` field is only included when dependencies are explicitly provided. @@ -180,10 +180,10 @@ After creating a draft task, proceed with the SDD workflow: /sdd:implement ``` -## Best practices +## Best Practices -- Keep descriptions focused — one task per prompt, decompose large features into multiple dependent tasks -- Provide dependencies explicitly — use task file paths as additional arguments when tasks have ordering requirements -- Use natural language — the agent infers type and title from your description; no special formatting needed -- Review the draft — verify the generated title and type before running `/sdd:plan` -- Decompose before planning — creating smaller tasks with dependencies produces better specifications than one large task +- Keep descriptions focused — one task per prompt; decompose large features into multiple dependent tasks. +- Provide dependencies explicitly — use task file paths as additional arguments when tasks have ordering requirements. +- Use natural language — the agent infers type and title from your description; no special formatting is needed. +- Review the draft — verify the generated title and type before running `/sdd:plan`. +- Decompose before planning — creating smaller tasks with dependencies produces better specifications than one large task. 
diff --git a/docs/plugins/sdd/customization.md b/docs/plugins/sdd/customization.md
index 4473890..5b86157 100644
--- a/docs/plugins/sdd/customization.md
+++ b/docs/plugins/sdd/customization.md
@@ -4,9 +4,9 @@ Customization options available for the SDD plugin.
 
 ## Token Usage and Efficiency
 
-The main limitation of SDD plugin is the number of tokens you're willing to spend on each task.
+The main limitation of the SDD plugin is the number of tokens you're willing to spend on each task.
 
-In contrast to other plugins in the context-engineering-kit marketplace, this plugin tries to use as many tokens as possible to get the best results. This approach can consume an entire Claude Code session's token budget on a single task, which is why it has default limits like `target-quality` and `max-iterations` set per command. These are predefined in a way that if a task is well-defined and not too big, in majority of the cases results will be good enough, that you not will be need to re-iterate on it.
+In contrast to other plugins in the context-engineering-kit marketplace, this plugin tries to use as many tokens as possible to get the best results. This approach can consume an entire Claude Code session's token budget on a single task, which is why it has default limits like `target-quality` and `max-iterations` set per command. These are predefined so that, if a task is well-defined and not too big, in the majority of cases the results will be good enough that you will not need to iterate on it.
 
 If you want better results or want to finish tasks faster, you can adjust command parameters. For example, adding `--target-quality 4.5 --max-iterations 5` to `/plan` or `/implement` allows the orchestrator agent to iterate more toward "ideal" results. Conversely, setting `--target-quality 3.0 --max-iterations 1` makes agents finish when results minimally meet the criteria, iterating only once to resolve issues.
 This lets you configure each command to balance quality and speed per task run.
@@ -18,9 +18,9 @@ Last but not least, you can ask the orchestrator to use only the `haiku` model f
 
 ## Human-in-the-Loop Verification
 
-The initial version of this plugin was designed to produce the highest possible quality solution an LLM can generate — in other words, to move real-world LLM performance closer to benchmark results. However, in practice, LLMs tend to drift toward sub-optimal solutions, which is not the desired outcome. The current version filters out all non-working and obviously incorrect solutions. That said, the overall quality still depends on the quality of the specification file and, consequently, on the quality of your review of that specification.
+The initial version of this plugin was designed to produce the highest possible quality solution that an LLM can generate — in other words, to move real-world LLM performance closer to benchmark results. However, in practice, LLMs tend to drift toward sub-optimal solutions, which is not the desired outcome. The current version filters out all non-working and obviously incorrect solutions. That said, the overall quality still depends on the quality of the specification file and, consequently, on the quality of your review of that specification.
 
-In order to incorporate human feedback into the process, you can use the `--human-in-the-loop` parameter in the `/plan` and `/implement` commands. It will pause the process after each phase and ask you to review the results of last phase, before continuing to the next one.
+In order to incorporate human feedback into the process, you can use the `--human-in-the-loop` parameter in the `/plan` and `/implement` commands. It will pause the process after each phase and ask you to review the results of the last phase before continuing to the next one.
 
 ## Epics, User Stories, and Roadmaps
diff --git a/docs/plugins/sdd/implement.md b/docs/plugins/sdd/implement.md
index d9c0562..4cb8810 100644
--- a/docs/plugins/sdd/implement.md
+++ b/docs/plugins/sdd/implement.md
@@ -1,9 +1,9 @@
 # /sdd:implement - Task Implementation with Verification
 
-Execute task implementation steps with automated LLM-as-Judge quality verification, sequential and parallel step execution, and Definition of Done validation.
+Execute task implementation steps using automated LLM-as-Judge quality verification, sequential and parallel step execution, and Definition of Done (DoD) validation.
 
-- Purpose - Implement all steps from a planned task specification and verify working results
-- Output - Working code with tests passing, task moved to `.specs/tasks/done/`
+- **Purpose**: Implement all steps from a planned task specification and verify the results.
+- **Output**: Working code with passing tests; task moved to `.specs/tasks/done/`.
 
 ```bash
 /sdd:implement [task-file] [options]
@@ -16,10 +16,10 @@ Execute task implementation steps with automated LLM-as-Judge quality verificati
 | `task-file` | Path or filename | Auto-detect | Task file name or path (e.g., `add-validation.feature.md`). Auto-selects from `in-progress/` or `todo/` if only one task exists. |
 | `--target-quality` | `--target-quality X.X` or `X.X,Y.Y` | `4.0` (standard) / `4.5` (critical) | Quality threshold. Single value sets both. Two comma-separated values set standard,critical. |
 | `--max-iterations` | `--max-iterations N` | `3` | Maximum fix→verify cycles per step. Set to `unlimited` for no limit. |
-| `--human-in-the-loop` | `--human-in-the-loop [s1,s2,...]` | None | Steps after which to pause for review. Without step numbers, pauses after every step. |
-| `--skip-judges` | flag | `false` | Skip all judge validation — fast but no quality gates |
-| `--continue` | flag | None | Resume from last completed step |
-| `--refine` | flag | `false` | Detect changed project files and re-verify from earliest affected step |
+| `--human-in-the-loop` | `--human-in-the-loop [s1,s2,...]` | None | Steps after which to pause for review. If no steps are specified, the process pauses after every step. |
+| `--skip-judges` | flag | `false` | Skip all judge validation — fast, but provides no quality gates |
+| `--continue` | flag | None | Resume from the last completed step |
+| `--refine` | flag | `false` | Detect changed project files and re-verify from the earliest affected step |
 
 ## Workflow Diagram
 
@@ -91,7 +91,7 @@ Execute task implementation steps with automated LLM-as-Judge quality verificati
 
 ### Phase 0: Select Task & Move to In-Progress
 
-1. Resolves the task file — checks `in-progress/` first, then `todo/`
+1. Resolves the task file by checking `in-progress/` first, then `todo/`
 2. Moves the task from `todo/` to `in-progress/`
 3. Parses flags and displays resolved configuration
 
@@ -112,16 +112,16 @@ For each step in dependency order, the orchestrator launches sub-agents and judg
 For simple operations (directory creation, file deletion):
 
 1. Launch `sdd:developer` agent to implement the step
-2. Mark step complete — no judge verification needed
+2. Mark the step as complete — no judge verification is needed
 
 #### Pattern B: Critical Step (Panel of 2 Evaluations)
 
 For critical artifacts requiring high confidence:
 
-1. Launch `sdd:developer` agent to implement
+1. Launch the `sdd:developer` agent to implement the step
 2. Launch 2 `sdd:developer` evaluation agents **in parallel** with the step's rubric
-3. Calculate median score; pass if median ≥ threshold
-4. On FAIL: iterate fix→verify until PASS or max iterations reached
+3. Calculate the median score; pass if median ≥ threshold
+4. On failure: iterate through fix→verify cycles until the step passes or the maximum number of iterations is reached
 
 #### Pattern C: Multi-Item Step (Per-Item Evaluations)
 
@@ -129,24 +129,24 @@ For steps creating multiple similar items:
 
 1. Launch `sdd:developer` agents **in parallel** (one per item)
 2. Launch evaluation agents **in parallel** (one per item)
-3. All items must pass; failing items get re-implemented
-4. Iterate until all pass or max iterations reached
+3. All items must pass; failing items are re-implemented
+4. Iterate until all items pass or the maximum number of iterations is reached
 
 ### Phase 3: Final Verification
 
 After all steps complete:
 
 1. Launch `sdd:developer` agent to verify all **Definition of Done** items
-2. Each item checked with evidence (tests pass, build succeeds, files exist, patterns match)
-3. Failing items get fixed by dedicated developer agents
+2. Each item is checked with evidence (e.g., passing tests, successful builds, existing files, matching patterns)
+3. Failing items are fixed by dedicated developer agents
 4. Re-verify until all items pass
 
 ### Phase 4: Complete
 
 1. Move task from `in-progress/` to `done/`
-2. All step titles marked `[DONE]`, subtasks marked `[X]`
-3. All DoD items marked `[X]`
-4. Generate final implementation report
+2. All step titles are marked `[DONE]`, and subtasks are marked `[X]`
+3. All DoD items are marked `[X]`
+4. Generate a final implementation report
 
 ## Verification Levels
 
@@ -164,7 +164,7 @@ Resumes implementation from the last completed step:
 
 1. Parses task file for `[DONE]` markers
 2. Launches judge to verify the last incomplete step's artifacts
 3. If PASS: marks done, resumes from next step
-4. If FAIL: re-implements the step and iterates
+4. If it fails: re-implements the step and iterates
 
 ## Refine Mode (`--refine`)
 
@@ -173,7 +173,7 @@ Detects changes to **project files** (not the task file) and re-verifies from th
 
 1. Detects changed files via `git diff`
 2. Maps changed files to implementation steps using "Expected Output" and artifact paths
 3. Determines the earliest affected step
-4. Launches judge for each affected step — if PASS, user's fix is accepted; if FAIL, implementation agent aligns the rest of code with user's changes
+4. Launches a judge for each affected step — if it passes, the user's fix is accepted; if it fails, the implementation agent aligns the rest of the code with the user's changes
 5. All subsequent steps are also re-verified
 
 ## Human-in-the-Loop (`--human-in-the-loop`)
 
@@ -182,7 +182,7 @@ After each specified step passes:
 
 1. Displays step results, artifacts, and judge feedback
 2. Asks: `Continue? [Y/n/feedback]`
-3. User feedback gets incorporated into subsequent iterations
+3. User feedback is incorporated into subsequent iterations
 4. User can pause the workflow at any point
 
 ## Usage Examples
 
@@ -230,7 +230,7 @@ After each specified step passes:
 | Final verification PASS | Move task from `in-progress/` → `done/` |
 | Implementation aborted | Keep in `in-progress/` |
 
-## Best practices
+## Best Practices
 
 - Let the orchestrator work autonomously — it launches sub-agents for both implementation and verification
 - Use `--continue` if the process is interrupted — it picks up where it left off
diff --git a/docs/plugins/sdd/plan.md b/docs/plugins/sdd/plan.md
index bb7dac6..d48b895 100644
--- a/docs/plugins/sdd/plan.md
+++ b/docs/plugins/sdd/plan.md
@@ -2,8 +2,8 @@
 
 Refine a draft task specification into a fully planned, implementation-ready task through multi-agent analysis, architecture synthesis, and quality-gated verification.
 
-- Purpose - Transform draft task into complete specification with architecture, implementation steps, parallelization, and verification rubrics
-- Output - Refined task file moved to `.specs/tasks/todo/`, plus skill files in `.claude/skills/` and analysis files in `.specs/analysis/`
+- Purpose - Transforms a draft task into a complete specification with architecture, implementation steps, parallelization, and verification rubrics
+- Output - A refined task file moved to `.specs/tasks/todo/`, plus skill files in `.claude/skills/` and analysis files in `.specs/analysis/`
 
 ```bash
 /sdd:plan .specs/tasks/draft/add-validation.feature.md [options]
@@ -29,13 +29,13 @@ Refine a draft task specification into a fully planned, implementation-ready tas
 
 | Stage Name | Phase | Description |
 |------------|-------|-------------|
-| `research` | 2a | Gather relevant resources, documentation, libraries |
-| `codebase analysis` | 2b | Identify affected files, interfaces, integration points |
-| `business analysis` | 2c | Refine description and create acceptance criteria |
-| `architecture synthesis` | 3 | Synthesize research and analysis into architecture |
-| `decomposition` | 4 | Break into implementation steps with risks |
-| `parallelize` | 5 | Reorganize steps for parallel execution |
-| `verifications` | 6 | Add LLM-as-Judge verification rubrics |
+| `research` | 2a | Gathers relevant resources, documentation, and libraries |
+| `codebase analysis` | 2b | Identifies affected files, interfaces, and integration points |
+| `business analysis` | 2c | Refines the description and creates acceptance criteria |
+| `architecture synthesis` | 3 | Synthesizes research and analysis into an architecture |
+| `decomposition` | 4 | Breaks the architecture into implementation steps with risks |
+| `parallelize` | 5 | Reorganizes steps for parallel execution |
+| `verifications` | 6 | Adds LLM-as-Judge verification rubrics |
 
 ## Workflow Diagram
 
@@ -114,18 +114,18 @@ Refine a draft task specification into a fully planned, implementation-ready tas
 
 Three analysis agents run **in parallel**, each with its own judge validation:
 
 - **Phase 2a: Research** (`researcher` agent, sonnet) — Gathers relevant resources, documentation, and libraries. Creates or updates a reusable skill file in `.claude/skills/`.
-- **Phase 2b: Codebase Impact Analysis** (`code-explorer` agent, sonnet) — Identifies affected files, interfaces, and integration points. Produces analysis file in `.specs/analysis/`.
-- **Phase 2c: Business Analysis** (`business-analyst` agent, opus) — Refines task description, creates acceptance criteria, and documents user scenarios.
+- **Phase 2b: Codebase Impact Analysis** (`code-explorer` agent, sonnet) — Identifies affected files, interfaces, and integration points. Produces an analysis file in `.specs/analysis/`.
+- **Phase 2c: Business Analysis** (`business-analyst` agent, opus) — Refines the task description, creates acceptance criteria, and documents user scenarios.
 
 Each sub-phase is validated by a judge agent. All three must pass before proceeding.
 
 ### Phase 3: Architecture Synthesis
 
-`software-architect` agent (opus) synthesizes findings from research, codebase analysis, and business analysis into an architectural overview with key decisions, solution strategy, and expected file changes.
+`software-architect` agent (opus) synthesizes findings from research, codebase analysis, and business analysis into an architectural overview with key decisions, a solution strategy, and expected file changes.
 
 ### Phase 4: Decomposition
 
-`tech-lead` agent (opus) breaks the architecture into ordered implementation steps with success criteria, subtasks, blockers, risks, and complexity ratings.
+`tech-lead` agent (opus) breaks the architecture into ordered implementation steps, each with success criteria, subtasks, blockers, risks, and complexity ratings.
 
 ### Phase 5: Parallelize Steps
 
@@ -133,23 +133,23 @@ Each sub-phase is validated by a judge agent. All three must pass before proceed
 
 ### Phase 6: Define Verifications
 
-`qa-engineer` agent (opus) adds LLM-as-Judge verification sections with custom rubrics, thresholds, and verification levels (None / Single Judge / Panel of 2 / Per-Item) for each implementation step.
+`qa-engineer` agent (opus) adds LLM-as-Judge verification sections with custom rubrics, thresholds, and verification levels (None, Single Judge, Panel of 2, or Per-Item) for each implementation step.
 
 ### Phase 7: Promote Task
 
-Moves the refined task file from `draft/` to `todo/` and stages all generated artifacts with git.
+Moves the refined task file from `draft/` to `todo/` and stages all generated artifacts with Git.
 
 ## Quality Gates
 
 Every phase includes a judge validation step using LLM-as-Judge:
 
-- **PASS** (score >= threshold) — Phase complete, proceed to next
-- **FAIL** (score < threshold) — Re-run phase with judge feedback
-- **MAX_ITERATIONS reached** — Proceed to next stage automatically (with warning logged)
+- **PASS** (score >= threshold) — Phase complete; proceed to the next stage.
+- **FAIL** (score < threshold) — Re-run the phase with judge feedback.
+- **MAX_ITERATIONS reached** — Proceed to the next stage automatically (with a warning logged).
 
 ## Refine Mode (`--refine`)
 
-After reviewing the generated specification, you can edit it directly and re-run planning with `--refine`:
+After reviewing the generated specification, you can edit it directly and re-run the planning process with `--refine`:
 
 1. Detects changes via `git diff HEAD -- `
 2. Identifies the earliest modified section
@@ -209,11 +209,11 @@ After reviewing the generated specification, you can edit it directly and re-run
 └── .md # Working scratchpads (gitignored)
 ```
 
-## Best practices
+## Best Practices
 
-- Review the generated specification before implementing — human feedback is the most effective quality lever
-- Use `--refine` after making edits instead of re-running the full workflow
-- Add `//` comment markers to lines that need clarification — agents will incorporate your feedback
-- For complex tasks, use `--human-in-the-loop` to verify architecture decisions before decomposition
-- Use `--fast` for simple well-defined tasks where full analysis is unnecessary
-- Use `--skip research` when working with familiar technologies
+- Review the generated specification before implementing — human feedback is the most effective quality lever.
+- Use `--refine` after making edits instead of re-running the full workflow.
+- Add `//` comment markers to lines that need clarification — agents will incorporate your feedback.
+- For complex tasks, use `--human-in-the-loop` to verify architecture decisions before decomposition.
+- Use `--fast` for simple, well-defined tasks where full analysis is unnecessary.
+- Use `--skip research` when working with familiar technologies.
diff --git a/docs/plugins/sdd/usage-examples.md b/docs/plugins/sdd/usage-examples.md
index da983d5..f8bd074 100644
--- a/docs/plugins/sdd/usage-examples.md
+++ b/docs/plugins/sdd/usage-examples.md
@@ -1,6 +1,6 @@
 # SDD Plugin - Usage Examples
 
-Real-world scenarios demonstrating effective use of the Spec-Driven Development plugin for various project types and complexity levels.
+Real-world scenarios demonstrating the effective use of the Spec-Driven Development plugin across various project types and complexity levels.
 
 ## Examples
 
@@ -64,7 +64,7 @@ Real-world scenarios demonstrating effective use of the Spec-Driven Development
 /sdd:implement @.specs/tasks/todo/fix-null-pointer-user-service.bug.md --skip-judges
 ```
 
-The `--fast` flag sets `--target-quality 3.0 --max-iterations 1 --included-stages business analysis,decomposition,verifications`, skipping research, codebase analysis, architecture synthesis, and parallelization.
+The `--fast` flag sets `--target-quality 3.0 --max-iterations 1 --included-stages "business analysis,decomposition,verifications"`, skipping research, codebase analysis, architecture synthesis, and parallelization.
 
 ---
 
@@ -174,9 +174,9 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 # Detecting changed project files...
 # Changed: src/validation/validation.service.ts (modified)
 # Maps to: Step 2 (Create ValidationService)
-# Step 2: Judge PASS ✅ — user's fix is good
+# Step 2: Judge PASS ✅ — the user's fix is good
 # Step 3: Judge PASS ✅ — no cascading issues
-# Step 4: Judge FAIL — launching implementation agent to align...
+# Step 4: Judge FAIL — launching the implementation agent to align...
 # Step 4: Judge PASS ✅ (after fix)
 ```
 
@@ -215,13 +215,13 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 
 ```bash
 # Quick diverse idea generation
-/sdd:create-ideas caching strategies for real-time product catalog
+/sdd:create-ideas "caching strategies for a real-time product catalog"
 
 # Output: 5 diverse ideas with probability scores
 # Pick the most promising approach
 
 # Deeper exploration with collaborative dialogue
-/sdd:brainstorm We need real-time features but not sure about WebSockets vs Server-Sent Events
+/sdd:brainstorm "We need real-time features but are not sure about WebSockets vs. Server-Sent Events"
 
 # After brainstorm produces a design document:
 /sdd:add-task "Implement real-time stock updates using WebSocket connections"
@@ -239,11 +239,11 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 # Skip research phase — you're familiar with the stack
 /sdd:plan @.specs/tasks/draft/add-pagination.feature.md --skip research
 
-# Skip research and codebase analysis — small isolated change
-/sdd:plan @.specs/tasks/draft/fix-date-format.bug.md --skip research,codebase analysis
+# Skip research and codebase analysis — a small, isolated change
+/sdd:plan @.specs/tasks/draft/fix-date-format.bug.md --skip "research,codebase analysis"
 
 # Only run business analysis and decomposition
-/sdd:plan @.specs/tasks/draft/update-config.chore.md --included-stages business analysis,decomposition
+/sdd:plan @.specs/tasks/draft/update-config.chore.md --included-stages "business analysis,decomposition"
 ```
 
 ---
 
@@ -297,7 +297,7 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 
 ```bash
 # For unfamiliar technology — brainstorm first
-/sdd:brainstorm We need real-time features but I'm not sure about WebSockets vs Server-Sent Events
+/sdd:brainstorm "We need real-time features, but I'm not sure about WebSockets vs. Server-Sent Events"
 
 # The research phase in /sdd:plan will:
 # - Launch researcher agent to compare libraries
@@ -317,9 +317,9 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 
 - New features with unclear requirements
 - Complex integrations with multiple systems
-- Features affecting multiple parts of codebase
+- Features affecting multiple parts of the codebase
 - Public APIs or features with external consumers
-- Refactoring with high regression risk
+- Refactoring work with high regression risk
 
 ### When to Use Abbreviated Workflow
 
@@ -337,8 +337,8 @@ The `--refine` flag uses git diff to detect which sections were modified and onl
 
 ### Anti-Patterns to Avoid
 
-1. Skipping specification review for complex features
+1. Skipping specification reviews for complex features
 2. Ignoring high-risk task warnings in decomposition
 3. Using `--skip-judges` for production-critical code
 4. Creating tasks that are too large — decompose into smaller dependent tasks
-5. Not using `--refine` after editing specifications (re-running full plan is wasteful)
+5. Not using `--refine` after editing specifications (re-running a full plan is wasteful)