Merged
31 changes: 7 additions & 24 deletions .claude/rules/code-quality.md
@@ -3,30 +3,13 @@
## Finish the Work First

- The goal is working, tested code — not a PR. PRs are the delivery mechanism, not the deliverable.
- Do NOT rush to commit, push, or open a PR until the actual work is done and verified.
- A task is not done when the code compiles. It's done when it runs correctly and is tested.
- Stay in the problem. If something is "working," verify it properly before moving to git/PR mechanics.
- Only create a PR when explicitly asked, or when the work is genuinely complete and tested.
- Never treat "open a PR" as a natural next step — wait for the user to decide when the work is ready.
- Do NOT rush to commit/push/PR until work is done and verified. A task is done when it runs correctly and is tested, not when it compiles.
- Only create a PR when explicitly asked, or when work is genuinely complete and tested.

## Scope

- Only change what was asked for — don't touch surrounding code
- If you spot something worth fixing but it wasn't requested, call it out instead of silently doing it
- No drive-by refactors, no "while I'm here" improvements
- One logical change per task — don't bundle unrelated fixes

## Change Safety

- Read before edit — always understand a file before modifying it
- Build and test after changes, don't assume it works
- No new dependencies without discussing it first
- Don't delete code you don't fully understand

## Review Mindset

- Don't add comments, docstrings, or type annotations to untouched code
- Don't rename things that aren't part of the task
- Don't "improve" error messages or formatting in adjacent code
- Keep PRs reviewable — small, focused diffs
- If a change is getting large, pause and check in with the user
- Only change what was asked for. No drive-by refactors, no "while I'm here" improvements.
- If you spot something worth fixing, call it out — don't silently do it.
- One logical change per task. Keep PRs reviewable — small, focused diffs.
- No new dependencies without discussing it first.
- If a change is getting large, pause and check in with the user.
69 changes: 0 additions & 69 deletions .claude/rules/experiment-logging.md

This file was deleted.

46 changes: 9 additions & 37 deletions .claude/rules/git-safety.md
@@ -2,48 +2,20 @@

## Branch Workflow

- Remote: `origin` (https://github.com/appsprout-dev/mnemonic.git)
- Primary branch: `main`
- **All new work starts on a feature branch** — never commit directly to `main`
- Branch naming: `feat/<description>`, `fix/<description>`
- Remote: `origin` (https://github.com/appsprout-dev/mnemonic.git), primary branch: `main`
- All new work on feature branches (`feat/<desc>`, `fix/<desc>`) — never commit directly to `main`
- Before branching: `git stash` (if dirty), `git pull origin main`, then `git checkout -b <branch>`
- **Before committing:** Run `git branch --show-current` to verify you're on the intended branch. Bash tool does not persist shell state — a prior `git checkout` may not have taken effect.
- **All changes go through a PR** — push the branch, open a PR with `gh pr create`, get it reviewed
- **Closing issues:** When a PR resolves a GitHub issue, comment on the issue with a reference to the PR before or after closing it. Never close issues silently.
- No blind commits to main, no YOLO pushes
- **Before committing:** `git branch --show-current` to verify — Bash tool doesn't persist shell state
- All changes go through PRs (`gh pr create`). When a PR resolves an issue, comment on the issue with a PR reference.
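
The workflow above, sketched end-to-end. This is a runnable sketch against a throwaway repo, not the real clone: the branch name is illustrative, and the `git pull` step is commented out because the temp repo has no `origin` remote.

```shell
# Sketch of the branch workflow, exercised in a throwaway repo so it runs
# as-is. Branch name "feat/memory-source-tracking" is illustrative.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "chore: initial commit"

# In the real repo, sync main first:
#   git stash              # only if the working tree is dirty
#   git pull origin main
git checkout -q -b feat/memory-source-tracking

# Verify before committing -- the Bash tool does not persist shell state,
# so an earlier checkout may not have taken effect.
git branch --show-current   # prints: feat/memory-source-tracking
```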

## Forbidden Operations
## Forbidden (enforced by hooks)

Enforced by `.claude/hooks/protect-git.sh` and `.claude/hooks/no-secrets.sh`:

- `git push --force` / `git push -f` -- destroys remote history
- `git reset --hard` -- destroys local changes
- `git clean -f` -- permanently deletes untracked files
- `git checkout .` / `git restore .` -- discards all unstaged changes
- Staging `.env`, `credentials`, `*.db`, `settings.local.json`
`.claude/hooks/protect-git.sh` and `.claude/hooks/no-secrets.sh` block: force push, `reset --hard`, `clean -f`, `checkout .`/`restore .`, staging `.env`/`credentials`/`*.db`/`settings.local.json`.
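
For illustration only — a minimal sketch of the kind of pattern check such a hook might perform. The real `protect-git.sh` and `no-secrets.sh` may work differently; the patterns here are deliberately coarse.

```shell
# Hypothetical guard in the spirit of protect-git.sh -- not the actual hook.
# Returns 0 (success) when the candidate command matches a forbidden pattern.
is_forbidden() {
  case "$1" in
    *"push --force"*|*"push -f"*|*"reset --hard"*|\
    *"clean -f"*|*"checkout ."|*"restore .")
      return 0 ;;
    *) return 1 ;;
  esac
}

is_forbidden "git push --force origin main" && echo "blocked"  # prints: blocked
is_forbidden "git commit -m 'fix: typo'"    || echo "allowed"  # prints: allowed
```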

## Commit Messages (Conventional Commits)

Use [Conventional Commits](https://www.conventionalcommits.org/) format — release-please uses these to auto-generate changelogs and version bumps:

- `feat: add memory source tracking` — new feature (bumps minor)
- `fix: prevent nil pointer in retrieval` — bug fix (bumps patch)
- `docs: update README with Gemini setup` — documentation only
- `refactor: simplify consolidation loop` — code change, no behavior change
- `test: add encoding agent coverage` — tests only
- `chore: update dependencies` — maintenance
- `ci: fix release workflow runner` — CI/CD changes

Rules:

- Short, direct subject line describing the change
- Body for context when non-obvious
- No issue-closing keywords in commit messages unless explicitly asked
- Use Co-Authored-By for Claude contributions
- Append `!` after the type for breaking changes: `feat!: redesign store interface`
Format: `type: description` — release-please uses these for changelogs/version bumps.

## Secrets
Types: `feat` (minor), `fix` (patch), `docs`, `refactor`, `test`, `chore`, `ci`. Append `!` for breaking changes.

- `settings.local.json` contains machine-specific permissions -- NEVER commit
- `*.db` files contain user data -- gitignored
- Never include API tokens in commit messages or code
Rules: short subject, body when non-obvious, no issue-closing keywords unless asked, Co-Authored-By for Claude, `settings.local.json` and `*.db` never committed.
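
A commit shaped by these rules might look like the following. The message content is illustrative, and the sketch runs in a throwaway repo so it is self-contained.

```shell
# Illustrative breaking-change commit with a body and Co-Authored-By trailer.
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty \
  -m "feat!: redesign store interface" \
  -m "Callers must now pass a context handle as the first argument." \
  -m "Co-Authored-By: Claude <noreply@anthropic.com>"
git log -1 --format=%s   # prints: feat!: redesign store interface
```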
29 changes: 0 additions & 29 deletions .claude/rules/peer-review-standard.md

This file was deleted.

69 changes: 69 additions & 0 deletions .claude/rules/research-standards.md
@@ -0,0 +1,69 @@
# Research Standards

This project is under review by Aaron Gokaslan and Andrej Karpathy. All work must meet the standard of a published research project. Follow the scientific method — every experiment is a test of a hypothesis, not a fishing expedition.

## Core Principles

- **Let the data decide.** Judge results by numbers, not desire. No reinterpreting negative results as "needs more training" without evidence.
- **No motivated reasoning.** Report the number you got, not the number you wanted. Negative results get the same documentation quality.
- **Actively disprove.** After a positive result: could this be an LR artifact? Param count mismatch? Training duration effect? Random seed noise? Not confirmed until alternatives are ruled out.
- **Reproducibility.** Every result must be reproducible from the registry entry alone — exact commands, configs, hardware, data paths.

## Pre-Registration (BEFORE any training or sweep)

Create an entry in `training/docs/experiment_registry.md`:

```markdown
### EXP-{number}: {name}
- **Date:** {YYYY-MM-DD}
- **Status:** REGISTERED | RUNNING | COMPLETED | FAILED
- **Hypothesis:** {What you expect and why}
- **Variable:** {The ONE thing changed vs control}
- **Control:** {Comparison target with its result}
- **Prediction:** {Quantitative — e.g., "expect LR 1e-3 to beat 6e-4 by 5-10%"}
- **Config:** {model, HP, hardware, data}
- **Result:** {filled after run}
- **Verdict:** CONFIRMED | REFUTED | INCONCLUSIVE
- **Analysis:** {What happened, why, what it means}
```

## After Every Run

1. Record result in registry (Status -> COMPLETED)
2. Compare to prediction — was your mental model right?
3. Positive result: list alternative explanations, which are ruled out
4. Negative result: what does this tell us about config/architecture?
5. Update `training/sweep_results.tsv` with raw numbers
6. Update `training/docs/experiments.md` with analysis paragraph (not bullet points)
7. If result changes prior conclusions, update those entries too
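
Step 5 in practice might look like the following — a hedged sketch run in a temp directory. The TSV column names and the row values are assumptions for illustration; match the file's actual header.

```shell
# Hypothetical append of one raw-numbers row to the sweep results TSV.
# Column names and values are illustrative, not real results.
set -e
cd "$(mktemp -d)"
mkdir -p training
tsv="training/sweep_results.tsv"
[ -f "$tsv" ] || printf 'exp\tlr\tsteps\tloss\tppl\n' > "$tsv"
printf 'EXP-042\t1e-3\t2000\t3.412\t30.33\n' >> "$tsv"
tail -1 "$tsv"   # prints the row just appended
```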

## Experiment Document Structure

`training/docs/experiments.md` follows: Overview, Experimental Protocol, Baselines, HP Sweep Results (by variable), Pretraining Runs, Planned Experiments, Summary.

Every entry needs: header line (name/date/config/hardware), control and variable, results table (loss and PPL at a minimum), analysis paragraph (quantitative, mechanistic, implications). Sweep phases get a combined table + cross-group analysis.

## Benchmark Logging

Benchmarks require: exact command, software state (commit hash, version, config), environment (hardware, provider, model), ALL metrics, comparison context (baseline and target).

## Evaluation Protocol

Standard budgets (RX 7800 XT): short test 1K-2K optimizer steps, full sweep 4K+ micro-steps, full pretrain ~400K micro-steps.

Metrics — Training: loss, PPL, tokens/sec, VRAM peak (report ALL). Quality: nDCG@5 (primary), Precision@5, Recall@5, MRR, JSON compliance, latency.

## Claims Bar

- "Doesn't hallucinate" requires evidence. "X > Y" requires controlled comparison on matched conditions.
- Fabrication rate of 10% is not "low." 25 test inputs is a pilot, not proof.
- Look at actual outputs, not just aggregates.
- Code: no dead code, scripts self-documenting, pipelines run from clean checkout, evals deterministic.

## Red Flags

- Running without a hypothesis -> pre-register first
- 3+ experiments all confirmed -> testing hard enough?
- Comparing across different LRs/steps/batch sizes -> unfair
- Explaining away negatives -> data is probably right
- Registry config drifts from actual run -> update immediately
116 changes: 0 additions & 116 deletions .claude/rules/scientific-method.md

This file was deleted.
