common: Add REVIEW.md with code review guidelines#63

Merged
cgwalters merged 1 commit into bootc-dev:main from cgwalters:agents-to-review on Dec 19, 2025
Conversation

@cgwalters (Contributor)

Add a REVIEW.md file capturing code review expectations derived from analysis of PR feedback across the bootc-dev organization. This includes guidance on testing, code quality, commit organization, architecture decisions, and Rust-specific patterns.

Update AGENTS.md to reference REVIEW.md, encouraging AI agents to spawn review subagents using these guidelines after completing work.

Assisted-by: OpenCode (Opus 4.5)

Assisted-by: OpenCode (Sonnet 4)
Signed-off-by: Colin Walters <walters@verbum.org>
@gemini-code-assist (Bot)
Summary of Changes

Hello @cgwalters, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a foundational document, REVIEW.md, to standardize and formalize the code review process within the bootc-dev organization. By consolidating best practices and expectations, it aims to improve code quality, consistency, and the efficiency of reviews. Additionally, it integrates these new guidelines into the AGENTS.md document, promoting their adoption by AI-assisted development workflows.

Highlights

  • New Code Review Guidelines: A new REVIEW.md file has been added, establishing comprehensive code review expectations and standards derived from past PR feedback across the bootc-dev organization.
  • Updated AI Agent Guidelines: The AGENTS.md file has been updated to reference the new REVIEW.md, encouraging AI agents to utilize these guidelines for self-review and when reviewing other code.
  • Comprehensive Coverage: The guidelines cover a wide range of topics including testing methodologies, code quality standards, commit and pull request organization, architectural decisions, and Rust-specific best practices.


@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a comprehensive REVIEW.md file with code review guidelines and updates AGENTS.md to reference it. The guidelines are well-structured, clear, and cover important aspects of software development from testing to architecture. This is a valuable addition for promoting consistency and quality. I've included a couple of minor grammatical suggestions for the documentation.

Comment thread common/AGENTS.md
Comment thread common/REVIEW.md
@cgwalters (Contributor, Author)

One thing I am strongly considering is dropping Gemini Code Assist from our repositories and moving to a model where a human initiates review using our common guidelines, with their own tools (and AI budget).

If we do go that way, though, I think it's important to ensure humans can easily ingest an AI-generated review targeting the GH Checks API and tweak it where necessary. This needs some careful consideration: we should make it clear when the human is effectively just passing through AI content versus having performed some edits. Maybe the review "header" has `Assisted-by: ToolName (ModelName)`, and each comment starts out with an `AI:` prefix by default?

Then if they do any editing or research, they can drop that prefix.

(Again, using the GH Checks API is, I think, quite important for large PRs, to ensure the comments land inline next to the relevant code.)

@cgwalters cgwalters enabled auto-merge (squash) December 19, 2025 15:04
@cgwalters (Contributor, Author)

I gave this prompt to OpenCode+Opus 4.5:

look at the tip commit. here's our goal: I want you to spawn a subagent to check out a representative sample of 20 recent pull requests, and perform a review locally using these guidelines and then compare it with what the review actually said! I am now especially interested in cases where Gemini Code Review spotted things you didn't - let's see if we can figure out why/how it did that and how we can refine our guidelines! To check out code, do so in a new worktree

After some more iteration, it came up with the summary below, and my conclusion is that we could likely get by with AI-assisted human review by default and don't need the Gemini Code Review app.

AI-generated content follows


Gemini Code Review vs OpenCode (Opus): Comparative Analysis

This document summarizes findings from analyzing 20+ recent PRs in the bootc-dev/bootc repository to compare Gemini Code Review's catches against what our REVIEW.md guidelines can detect.

Executive Summary

Our REVIEW.md guidelines are sufficient to catch all high-value issues Gemini found. The gap is not in the guidelines themselves but in consistent application: when subagents were explicitly asked to apply the guidelines to specific code, they independently found all 5 of the Gemini catches that had received +1 reactions from maintainers.

Methodology

  1. Collected all Gemini Code Review comments from ~100 recent PRs
  2. Filtered for comments that received +1 reactions from human reviewers (signal for high-quality catches)
  3. Found 17 such comments across the repository
  4. Selected 5 representative high-value catches for replication testing
  5. Spawned subagents with ONLY our REVIEW.md guidelines (no knowledge of what Gemini found)
  6. Compared results

High-Value Gemini Catches That Received +1 Reactions

Categories of Validated Catches

| Category | Count | Examples |
|---|---|---|
| Logic/Correctness Bugs | 4 | Off-by-one error, missing `=` in format string, shell conditional bug |
| Ignored/Swallowed Errors | 2 | `gpt_workaround()` result ignored, error details discarded |
| Dead/Redundant Code | 3 | Redundant Containerfile, redundant function call, unused import |
| Code Duplication/Constants | 2 | Duplicated conditional in spec file, hardcoded string vs constant |
| Idiomatic Improvements | 3 | Simplified filter, cleaner function signatures |
| Documentation Mismatch | 1 | Implementation didn't match documentation |
| Test Bugs | 1 | Test checking wrong variable |
| Undocumented Behavior Changes | 1 | Mount type default change not in PR description |

The 5 Cases Selected for Replication

| PR | Issue | Severity | Who +1'd |
|---|---|---|---|
| #1502 | Missing `=` in format string: `{COMPOSEFS_CMDLINE}{id_hex}` | Critical | cgwalters |
| #1715 | Off-by-one: `idx > len` should be `idx >= len` | High | Johan-Liebert1 |
| #1550 | `gpt_workaround()` result silently ignored | High | Johan-Liebert1 |
| #1500 | Shell `test -n "${initramfs:-}"` true for `"0"` | High | cgwalters |
| #1556 | Mount type default changed without PR description mention | Medium | Johan-Liebert1 |

Replication Results

Can Our Guidelines Catch These Issues?

| PR | Gemini's Catch | Subagent Found It? | Relevant Guideline |
|---|---|---|---|
| #1502 | Missing `=` in format string | YES | Careful review of refactoring changes |
| #1715 | Off-by-one array bounds | YES | General code quality / edge cases |
| #1550 | Ignored `Result` | YES | "Don't swallow errors silently" |
| #1500 | Shell conditional logic bug | YES | Shell script scrutiny |
| #1556 | Undocumented behavioral change | YES | "Document limitations/caveats in PR" |

Result: 5/5 (100%) - All issues were independently found by subagents using only REVIEW.md guidelines.

Subagent Findings Detail

PR #1502: Missing = in Format String

Gemini's catch: The refactoring dropped the `=` sign, changing `composefs={id_hex}` to `composefs{id_hex}`.

Subagent's finding:

"The refactoring introduced a bug by dropping the = sign. The original code produced composefs=<hash>, but the refactored code now produces composefs<hash> (missing the =). This would result in a malformed kernel command line parameter."

PR #1715: Off-by-One Error

Gemini's catch: `idx > manifest.layers().len()` should be `>=` to prevent panic.

Subagent's finding:

"Array indices are 0-based, so if manifest.layers().len() is 5, valid indices are 0-4. The check idx > manifest.layers().len() would only trigger when idx > 5, but accessing manifest.layers()[5] would already be out of bounds."

PR #1550: Ignored Result

Gemini's catch: `gpt_workaround();` ignores the `Result<()>` return value.

Subagent's finding:

"The gpt_workaround() function returns Result<()>, but the return value is completely ignored. If this function fails, the error is silently discarded. This violates 'Don't swallow errors silently - propagate them with context'."

PR #1500: Shell Conditional Bug

Gemini's catch: `test -n "${initramfs:-}"` is true for any non-empty value, including `"0"`.

Subagent's finding:

"The ARG initramfs=0 sets the variable to the string "0", but test -n tests if the variable is non-empty. Since "0" is a non-empty string, this condition will be true even when initramfs=0 (disabled)."

PR #1556: Undocumented Behavioral Change

Gemini's catch: Default mount type for `/etc` changed from `Overlay` to `Bind` without mention in the PR description.

Subagent's finding:

"The default mount type for /etc changed from MountType::Overlay to MountType::Bind, but this is not mentioned in the PR description which 'focuses on error handling improvements.' This is a significant behavioral change that affects how /etc is mounted by default."

Gemini's Strengths

Based on the +1'd comments, Gemini excels at:

  1. Format string validation - Catching missing characters in format strings, especially in kernel command lines
  2. Boundary conditions - Off-by-one errors in array/slice access
  3. Dead/redundant code detection - Files, imports, and code blocks that became stale
  4. Shell script semantics - Understanding bash conditional edge cases
  5. Cross-referencing - Comparing new code vs old code, docs vs implementation
  6. PR description accuracy - Noticing when behavioral changes aren't documented

Gemini's Weaknesses

From analyzing comments without +1 reactions:

  1. Rust-specific false positives - Occasionally flagged lifetime/borrow issues that the compiler would catch
  2. Less thorough on CI/YAML - PR #1860 (CI workflow) received no feedback
  3. Debatable micro-optimizations - Some suggestions, like removing `BufWriter`, were questionable

Key Insights

Why Our Guidelines Work

The REVIEW.md guidelines successfully cover all the high-value catches because:

| Guideline | What It Catches |
|---|---|
| "Don't swallow errors silently" | Ignored `Result`s, missing `?` operators |
| "Extract magic numbers/strings into constants" | Forces attention on format strings during refactoring |
| "Handle edge cases explicitly" | Boundary conditions, off-by-one errors |
| "Document limitations and caveats" | Undocumented behavioral changes |
| "Shell scripts should be under 50 lines" + scrutiny | Shell conditional bugs |

The Real Gap

The issue is not what our guidelines say, but consistent application:

  1. Gemini checks every line, every time - Humans and AI reviewers can miss things when not focused
  2. Gemini is particularly attentive to string manipulation - Format strings, paths, configuration values
  3. Gemini systematically compares old vs new - Easier to spot when refactoring drops something

Recommendations

No Major Guideline Changes Needed

The current REVIEW.md is sufficient. However, consider adding explicit callouts for:

  1. Format string verification - "When refactoring code that constructs strings (especially command lines, paths, URLs), verify all delimiters and separators are preserved"

  2. Boundary condition checklist - "For array/slice access, verify: > vs >=, < vs <=, off-by-one in loops"

  3. Shell boolean patterns - "For boolean-like shell variables, use explicit value comparison (= "1") rather than empty/non-empty tests (-n)"

Process Recommendations

  1. Use Gemini as a complement, not replacement - It catches things humans miss through fatigue
  2. Trust +1'd Gemini comments - When maintainers validate a catch, it's worth learning from
  3. Gemini is less reliable for Rust-specific issues - The compiler handles most of those anyway

Conclusion

Our REVIEW.md guidelines are comprehensive enough to catch all the high-value issues Gemini found. The value Gemini provides is in consistent, tireless application of review principles to every line of every PR. When our guidelines are explicitly and carefully applied (as the subagents did), they produce equivalent results.

The best approach is to use both: Gemini for consistent coverage, human/AI reviewers for deeper architectural and design feedback that requires context beyond the immediate diff.

@gursewak1997 (Contributor) left a comment

lgtm

@cgwalters cgwalters merged commit d5a5a62 into bootc-dev:main Dec 19, 2025
3 checks passed