Summary
Parse recent git commit history to infer feature groupings from commit messages and changed file paths. Language-agnostic — uses subprocess + git log, not tree-sitter.
Depends on: #124, #126
New file
src/specleft/discovery/miners/shared/git_history.py
import uuid
from specleft.discovery.models import SupportedLanguage, MinerResult, DiscoveredItem, ItemKind, GitCommitMeta, MinerErrorKind
from specleft.discovery.context import MinerContext
class GitHistoryMiner:
    miner_id = uuid.UUID("f1c93075-4e3c-44b8-bef6-9c0bc25b6c42")
    name = "git_history"
    languages = frozenset()  # language-agnostic; always runs

    def mine(self, ctx: MinerContext) -> MinerResult: ...
Git log command
git -C {ctx.root} log --no-merges \
    --format="%H%n%s%n%b%n---END---" \
    --name-only -n {ctx.config.max_git_commits}
Note: MAX_COMMITS is now read from ctx.config.max_git_commits (default: 200, configurable via [tool.specleft.discovery].max_git_commits in pyproject.toml).
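A minimal sketch of how the command above could be assembled and run through subprocess. The helper names build_git_log_argv and run_git_log are hypothetical and not part of the issue; passing an argv list avoids shell quoting of the format string.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Format string from the issue: hash, subject, body, then an end marker.
LOG_FORMAT = "%H%n%s%n%b%n---END---"


def build_git_log_argv(root: Path, max_commits: int) -> list[str]:
    """Return the git log command as an argv list (hypothetical helper)."""
    return [
        "git", "-C", str(root), "log", "--no-merges",
        f"--format={LOG_FORMAT}",
        "--name-only",
        "-n", str(max_commits),
    ]


def run_git_log(root: Path, max_commits: int = 200) -> str:
    """Run the command and return stdout.

    Raises FileNotFoundError if git is not on PATH and
    subprocess.CalledProcessError if git exits non-zero; the miner itself
    is expected to turn both into an error result instead of raising.
    """
    completed = subprocess.run(
        build_git_log_argv(root, max_commits),
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",  # tolerate commit messages that are not valid UTF-8
        check=True,
    )
    return completed.stdout
```

In the miner, max_commits would come from ctx.config.max_git_commits rather than the default used here.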
Parsing
- Split output on the ---END--- separator
- Per commit: extract short hash (7 chars), subject, body, list of changed files
- Skip commits whose subject matches conventional commit noise prefixes: chore:, ci:, build:, docs:, style:, test:
- Produce one DiscoveredItem per remaining commit that has >=1 changed source file
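The parsing steps above could be sketched roughly as follows. This assumes that, with --name-only, git prints each commit's file list after the ---END--- marker, so the files for one commit arrive at the start of the next chunk. The names ParsedCommit, parse_git_log and keep are hypothetical, and "changed source file" is approximated here as "any changed file".

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

SEPARATOR = "---END---"
NOISE_PREFIXES = ("chore:", "ci:", "build:", "docs:", "style:", "test:")
FULL_HASH = re.compile(r"^[0-9a-f]{40}$")


@dataclass
class ParsedCommit:
    commit_hash: str  # short hash (first 7 chars)
    subject: str
    body: str
    changed_files: list[str] = field(default_factory=list)


def parse_git_log(output: str) -> list[ParsedCommit]:
    commits: list[ParsedCommit] = []
    for chunk in output.split(SEPARATOR):
        lines = chunk.splitlines()
        # The next commit's header starts at the first full-length hash line.
        start = next(
            (i for i, line in enumerate(lines) if FULL_HASH.match(line)),
            len(lines),
        )
        # Lines before that header are the previous commit's changed files.
        if commits:
            commits[-1].changed_files = [ln for ln in lines[:start] if ln.strip()]
        header = lines[start:]
        if header:
            subject = header[1] if len(header) > 1 else ""
            body = "\n".join(header[2:]).strip()
            commits.append(ParsedCommit(header[0][:7], subject, body))
    return commits


def keep(commit: ParsedCommit) -> bool:
    """Drop noise-prefixed commits and commits with no changed files."""
    return not commit.subject.startswith(NOISE_PREFIXES) and bool(commit.changed_files)
```

A commit body that itself contains the ---END--- text, or a file literally named like a 40-character hash, would confuse this sketch; a real implementation may want a less collision-prone separator.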
Typed metadata
Each item's metadata dict must conform to GitCommitMeta:
GitCommitMeta(
    commit_hash = "a7b21db",
    subject = "feat: add login endpoint",
    body = "Implements JWT-based authentication...",
    changed_files = ["src/auth/login.py", "tests/test_login.py"],
    conventional_type = "feat",
    file_prefixes = ["src/auth", "tests"],
)
- name: commit subject line
- file_path: None (git items span multiple files)
- language: None (language-agnostic)
- confidence: 0.5 (git history is a weak intent signal)
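The two derived metadata fields, conventional_type and file_prefixes, might be computed along these lines. This is a sketch under assumptions: the regex accepts an optional scope and "!" marker as in the Conventional Commits convention, and file_prefixes is taken to mean the sorted, de-duplicated parent directories of the changed files, which matches the example above but is not confirmed by the issue.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# type, optional (scope), optional "!", then ": " -- e.g. "feat(auth)!: ..."
CONVENTIONAL = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s")


def conventional_type(subject: str) -> str | None:
    """Return the conventional commit type, or None if the subject has none."""
    match = CONVENTIONAL.match(subject)
    return match.group("type") if match else None


def file_prefixes(changed_files: list[str]) -> list[str]:
    """Return sorted unique parent directories of the changed files."""
    parents = {str(PurePosixPath(path).parent) for path in changed_files}
    parents.discard(".")  # files at the repository root have no prefix
    return sorted(parents)
```

Git reports paths with forward slashes on every platform, which is why PurePosixPath is used instead of Path.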
Note: languages = frozenset() means this miner always runs regardless of detected languages. The pipeline treats empty frozenset as "language-agnostic".
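As an illustration of the rule in the note, a pipeline-side check could look like the sketch below. The function name miner_applies is hypothetical, and the overlap test for non-empty sets is an assumption about how language-specific miners are selected; only the empty-set behaviour comes from this issue.

```python
from __future__ import annotations


def miner_applies(miner_languages: frozenset[str], detected: set[str]) -> bool:
    """Decide whether a miner should run for the detected languages."""
    if not miner_languages:
        # Empty frozenset means language-agnostic: always run.
        return True
    # Assumed rule for language-specific miners: run on any overlap.
    return bool(miner_languages & detected)
```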
Error handling
If git is not on PATH or ctx.root is not a git repository:
MinerResult(
    miner_id=self.miner_id,
    miner_name=self.name,
    items=[],
    error="not a git repository",
    error_kind=MinerErrorKind.NOT_INSTALLED,
    duration_ms=0,
)
Do not raise.
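One way to honour "do not raise" is to catch both failure modes around the subprocess call and map them to the error result. The sketch below uses a stand-in dataclass, SketchResult, because the real MinerResult fields beyond those shown above are not known here; run_or_error and its injectable run parameter are likewise hypothetical and exist to make the behaviour testable without git.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchResult:
    """Stand-in for MinerResult, limited to the fields relevant here."""
    items: list = field(default_factory=list)
    error: str | None = None
    error_kind: str | None = None
    duration_ms: int = 0


def run_or_error(argv: list[str], run: Callable[..., Any] = subprocess.run) -> SketchResult:
    try:
        completed = run(argv, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # FileNotFoundError: git is not on PATH.
        # CalledProcessError: git exited non-zero, e.g. not a repository.
        return SketchResult(error="not a git repository", error_kind="NOT_INSTALLED")
    # In the real miner the stdout would be parsed into DiscoveredItems.
    return SketchResult(items=[completed.stdout])
```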
Acceptance criteria
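- Mining a repository with qualifying commits yields total_items > 0
- Merge commits are excluded (--no-merges)
- Commit limit is read from ctx.config.max_git_commits, not a hardcoded constant
- Each item's metadata validates against GitCommitMeta
- conventional_type="feat" is parsed from "feat: add login endpoint"
- Commits such as "chore: update lockfile" are skipped
- All items have language=None
- A non-git directory returns a MinerResult with error + error_kind=NOT_INSTALLED, no exception
- Tests in tests/discovery/miners/test_git_history.py using a tmp_path git repo fixture
- Update features/feature-spec-discovery.md to cover the functionality introduced by this issue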
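For the tmp_path git repo fixture mentioned in the criteria, a helper along these lines could create a throwaway repository with one commit. The name make_git_repo, the file it writes, and the identity settings are illustrative assumptions; it requires git on PATH.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def make_git_repo(path: Path, subject: str = "feat: add login endpoint") -> Path:
    """Initialise a git repo at path containing a single commit."""

    def git(*args: str) -> None:
        subprocess.run(
            [
                "git", "-C", str(path),
                # Inline identity so the commit works without global config.
                "-c", "user.name=Test",
                "-c", "user.email=test@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ],
            check=True,
            capture_output=True,
        )

    git("init", "-q")
    target = path / "src" / "auth" / "login.py"
    target.parent.mkdir(parents=True)
    target.write_text("def login(): ...\n")
    git("add", ".")
    git("commit", "-q", "-m", subject)
    return path
```

In a pytest module this would typically be wrapped in a fixture that receives tmp_path and returns make_git_repo(tmp_path), so each test gets an isolated repository.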