Merged
3 changes: 2 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "fastcommit"
-version = "0.2.3"
+version = "0.3.0"
 description = "AI-based command line tool to quickly generate standardized commit messages."
 edition = "2021"
 authors = ["longjin <fslongjin@vip.qq.com>"]
@@ -17,6 +17,7 @@ env_logger = "0.11.6"
 lazy_static = "1.5.0"
 log = "0.4.26"
 openai_api_rust = { git = "https://github.com/fslongjin/openai-api", rev = "e2a3f6f" }
+regex = "1.11.0"
 reqwest = { version = "0.12.9", features = ["json"] }
 serde = { version = "1.0.218", features = ["derive"] }
 serde_json = "1.0.134"
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ You can install `fastcommit` using the following method:

 ```bash
 # Install using cargo
-cargo install --git https://github.com/fslongjin/fastcommit --tag v0.2.3
+cargo install --git https://github.com/fslongjin/fastcommit --tag v0.3.0
 ```

 ## Usage
2 changes: 1 addition & 1 deletion README_CN.md
@@ -8,7 +8,7 @@

 ```bash
 # Install using cargo
-cargo install --git https://github.com/fslongjin/fastcommit --tag v0.2.3
+cargo install --git https://github.com/fslongjin/fastcommit --tag v0.3.0
 ```

 ## Usage
103 changes: 103 additions & 0 deletions docs/sanitizer.md
@@ -0,0 +1,103 @@
# Sanitizer Configuration Guide

To prevent leaking sensitive information when sending diffs/user descriptions to model providers, fastcommit includes a built-in secret sanitization mechanism. This mechanism replaces matched sensitive content with placeholders before generating commit messages or branch names, for example:

```
AKIAIOSFODNN7EXAMPLE -> [REDACTED:AWS_ACCESS_KEY_ID#1]
-----BEGIN PRIVATE KEY----- ... -> [REDACTED:PRIVATE_KEY_BLOCK#2]
Bearer abcdef123456 .... -> [REDACTED:BEARER_TOKEN#3]
```
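
As a rough illustration of the replacement step, the sketch below redacts AWS-style access key IDs using only the standard library. It is not fastcommit's implementation, which is regex-based. `redact_aws_keys` is a hypothetical name, the hand-rolled scan ignores word boundaries, and the counter here is local to this single rule.

```rust
/// Std-only sketch: redact strings shaped like AWS access key IDs
/// ("AKIA" followed by 16 uppercase letters or digits).
/// Returns the sanitized text and the number of redactions made.
/// Hypothetical helper; the real sanitizer uses regex rules instead.
fn redact_aws_keys(input: &str) -> (String, usize) {
    const PREFIX: &str = "AKIA";
    const SUFFIX_LEN: usize = 16;

    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    let mut count = 0;

    while let Some(pos) = rest.find(PREFIX) {
        let after_prefix = &rest[pos + PREFIX.len()..];
        // The candidate is a key only if 16 uppercase/digit bytes follow.
        let suffix_ok = after_prefix.len() >= SUFFIX_LEN
            && after_prefix.as_bytes()[..SUFFIX_LEN]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit());

        if suffix_ok {
            count += 1;
            out.push_str(&rest[..pos]);
            out.push_str(&format!("[REDACTED:AWS_ACCESS_KEY_ID#{}]", count));
            rest = &after_prefix[SUFFIX_LEN..];
        } else {
            // Not a key: copy the text through the prefix and keep scanning.
            out.push_str(&rest[..pos + PREFIX.len()]);
            rest = after_prefix;
        }
    }
    out.push_str(rest);
    (out, count)
}
```

Calling `redact_aws_keys` on text containing the example key above would return the text with the key replaced by a placeholder, together with a count of 1.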

## 1. Basic Toggle

Configuration file: `~/.fastcommit/config.toml`

Field:
```
sanitize_secrets = true
```
Set to `false` to completely disable sanitization.
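
A minimal sketch of how the effective setting might be resolved, assuming (as the `src/config.rs` and `src/main.rs` changes in this PR suggest) that a missing field defaults to `true` and that the `--no-sanitize` CLI flag wins over the config file. The function name `sanitizer_enabled` is hypothetical.

```rust
/// Effective sanitizer setting for one run (illustrative only).
/// `config_value` mirrors `sanitize_secrets` in config.toml (`None` = field absent);
/// `no_sanitize_flag` mirrors the `--no-sanitize` CLI switch.
fn sanitizer_enabled(config_value: Option<bool>, no_sanitize_flag: bool) -> bool {
    if no_sanitize_flag {
        // The CLI override disables sanitization for this run only.
        return false;
    }
    // An absent field falls back to the default, which is enabled.
    config_value.unwrap_or(true)
}
```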

## 2. Built-in Matching Rules
Current built-in rules (name -> regex description):

| Name | Description |
|------|-------------|
| PRIVATE_KEY_BLOCK | Matches private key blocks from `-----BEGIN ... PRIVATE KEY-----` to `-----END ... PRIVATE KEY-----` |
| GITHUB_TOKEN | Matches tokens with prefixes like `ghp_` / `ghs_` / `gho_` / `ghr_` / `ghu_` + 36 alphanumeric characters |
| AWS_ACCESS_KEY_ID | Starts with `AKIA` + 16 uppercase alphanumeric characters |
| JWT | Typical 3-segment Base64URL JWT structure |
| BEARER_TOKEN | Bearer token headers (`Bearer xxx`) |
| GENERIC_API_KEY | Common field names: `api_key` / `apikey` / `apiKey` / `secret` / `token` / `authorization` followed by separator and value |

Matched content will be replaced with `[REDACTED:<name>#sequence_number]`.
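
The placeholder format can be sketched as follows. The examples at the top of this guide number placeholders `#1`, `#2`, `#3` across different rules, so the sketch assumes a single running counter shared by all rules. `PlaceholderCounter` is a hypothetical type, not necessarily what the `sanitizer` module uses.

```rust
/// Hands out placeholders with one running sequence number (assumed to be
/// shared across all rules, based on the examples in this guide).
struct PlaceholderCounter {
    next: usize,
}

impl PlaceholderCounter {
    fn new() -> Self {
        Self { next: 1 }
    }

    /// Builds `[REDACTED:<name>#<n>]` and advances the counter.
    fn placeholder(&mut self, rule_name: &str) -> String {
        let text = format!("[REDACTED:{}#{}]", rule_name, self.next);
        self.next += 1;
        text
    }
}
```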

## 3. Custom Rules
You can add custom rules in the configuration file to capture team-specific sensitive string formats.

Example:
```
[[custom_sanitize_patterns]]
name = "INTERNAL_URL"
regex = "https://internal\\.corp\\.example\\.com/[A-Za-z0-9/_-]+"

[[custom_sanitize_patterns]]
name = "UUID_TOKEN"
regex = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
```

Notes:
- `name`: Identifier shown in the placeholder; uppercase with underscores (e.g. `INTERNAL_URL`) is recommended.
- `regex`: A pattern in Rust `regex` crate syntax, which does not support look-around or backreferences. In a double-quoted TOML string, write each backslash as `\\`.
- All custom rules are executed after built-in rules.
- If a regex is invalid, it will be skipped and a warning will be output in the logs.
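
The ordering and skip-on-error behaviour described in the notes above might look roughly like this. It is a sketch under assumptions: `ActiveRule` and `assemble_rules` are hypothetical names, and `compile` stands in for the real regex constructor (with the `regex` crate this would presumably be `Regex::new`, which returns a `Result`). The warning is printed with `eprintln!` here rather than through the logger.

```rust
/// A rule ready to run: a placeholder name plus a compiled matcher of type `M`.
struct ActiveRule<M> {
    name: String,
    matcher: M,
}

/// Appends custom rules after the built-in ones, skipping any whose pattern
/// fails to compile. `compile` is a stand-in for the real regex constructor.
fn assemble_rules<M>(
    builtin: Vec<ActiveRule<M>>,
    custom: &[(&str, &str)], // (name, regex source) pairs from the config
    compile: impl Fn(&str) -> Result<M, String>,
) -> Vec<ActiveRule<M>> {
    let mut rules = builtin;
    for &(name, source) in custom {
        match compile(source) {
            Ok(matcher) => rules.push(ActiveRule {
                name: name.to_string(),
                matcher,
            }),
            // Invalid pattern: warn and continue with the remaining rules.
            Err(err) => eprintln!("warning: skipping custom rule {}: {}", name, err),
        }
    }
    rules
}
```

Keeping the compile step as a parameter means one bad custom pattern cannot stop the built-in rules from running.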

## 4. Viewing Sanitization Statistics
The current version outputs the following when running with `RUST_LOG=debug`:
```
Sanitized N potential secrets from diff/prompt
```
A `--show-redactions` option that prints a more detailed table is planned but not yet available.
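
For reference, the debug line follows the shape of the `log::debug!` call added to `src/generate.rs` in this PR: it is emitted only when at least one redaction happened. The sketch below reproduces that rule as a plain function; `redaction_summary` is a hypothetical name, and the real code logs directly instead of returning a string.

```rust
/// Returns the debug message for a run, or `None` when nothing was redacted.
fn redaction_summary(redaction_count: usize) -> Option<String> {
    if redaction_count == 0 {
        None
    } else {
        Some(format!(
            "Sanitized {} potential secrets from diff/prompt",
            redaction_count
        ))
    }
}
```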

## 5. Performance and Notes
- Very large diffs may incur a small performance overhead, since each rule is a separate find-and-replace pass. If performance matters, reduce the number of custom rules.
- Keep custom regexes narrow. An overly broad pattern can match ordinary code and remove context the model needs.
- The model cannot see the original replaced content. If context hints are needed, design semantically expressive tags with `name`, for example: `DB_PASSWORD`/`INTERNAL_ENDPOINT`.

## 6. Common Custom Pattern Examples
```
[[custom_sanitize_patterns]]
name = "SLACK_WEBHOOK"
regex = "https://hooks\\.slack\\.com/services/[A-Za-z0-9/_-]+"

[[custom_sanitize_patterns]]
name = "DISCORD_WEBHOOK"
regex = "https://discord(?:app)?\\.com/api/webhooks/[0-9]+/[A-Za-z0-9_-]+"

[[custom_sanitize_patterns]]
name = "GCP_SERVICE_ACCOUNT"
regex = "[0-9]{12}-compute@developer\\.gserviceaccount\\.com"

[[custom_sanitize_patterns]]
name = "STRIPE_KEY"
regex = "sk_(live|test)_[A-Za-z0-9]{10,}"
```

## 7. Complete Example Configuration Snippet
```
sanitize_secrets = true

[[custom_sanitize_patterns]]
name = "INTERNAL_URL"
regex = "https://internal\\.corp\\.example\\.com/[A-Za-z0-9/_-]+"

[[custom_sanitize_patterns]]
name = "STRIPE_KEY"
regex = "sk_(live|test)_[A-Za-z0-9]{10,}"
```

## 8. Future Plans
- Report mode: Output table statistics of match categories and counts
- Allow listing redacted placeholder hints at the end of commit messages (configurable)

Issues and PRs proposing new built-in rules or other improvements are welcome.
6 changes: 6 additions & 0 deletions src/cli.rs
@@ -53,4 +53,10 @@ pub struct Args {
         help = "Generate commit message (use with -b to output both)"
     )]
     pub generate_message: bool,
+
+    #[clap(
+        long = "no-sanitize",
+        help = "Temporarily disable sensitive info sanitizer for this run"
+    )]
+    pub no_sanitize: bool,
 }
20 changes: 20 additions & 0 deletions src/config.rs
@@ -3,6 +3,18 @@ use std::{fmt::Display, fs};

 use crate::constants::{DEFAULT_MAX_TOKENS, DEFAULT_OPENAI_API_BASE, DEFAULT_OPENAI_MODEL};

+fn default_true() -> bool {
+    true
+}
+
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct CustomSanitizePattern {
+    /// A short name/identifier for the pattern. e.g. "INTERNAL_URL"
+    pub name: String,
+    /// The regex pattern string. It should be a valid Rust regex.
+    pub regex: String,
+}
+
 #[derive(Debug, Serialize, Deserialize)]
 pub struct Config {
     api_base: Option<String>,
@@ -16,6 +28,12 @@ pub struct Config {
     pub verbosity: Verbosity,
     /// Prefix for generated branch names (e.g. username in monorepo)
     pub branch_prefix: Option<String>,
+    /// Enable sanitizing sensitive information (API keys, tokens, secrets) before sending diff to AI provider.
+    #[serde(default = "default_true")]
+    pub sanitize_secrets: bool,
+    /// User defined extra regex patterns for sanitizer.
+    #[serde(default)]
+    pub custom_sanitize_patterns: Vec<CustomSanitizePattern>,
 }

 impl Config {
@@ -104,6 +122,8 @@ impl Default for Config {
             language: CommitLanguage::default(),
             verbosity: Verbosity::default(),
             branch_prefix: None,
+            sanitize_secrets: true,
+            custom_sanitize_patterns: Vec::new(),
         }
     }
 }
36 changes: 26 additions & 10 deletions src/generate.rs
@@ -7,21 +7,31 @@ use crate::config::{self, Config};

 use crate::constants::BRANCH_NAME_PROMPT;
 use crate::constants::{DEFAULT_MAX_TOKENS, DEFAULT_OPENAI_MODEL, DEFAULT_PROMPT_TEMPLATE};
+use crate::sanitizer::sanitize_with_config;
 use crate::template_engine::{render_template, TemplateContext};

 async fn generate_commit_message(
     diff: &str,
     config: &config::Config,
     user_description: Option<&str>,
 ) -> anyhow::Result<String> {
-    let auth = Auth::new(config.api_key.as_str());
+    // sanitize diff & user description first
+    let (sanitized_diff, sanitized_user_desc_opt, redactions) =
+        sanitize_with_config(diff, user_description, config);
+    if !redactions.is_empty() {
+        log::debug!(
+            "Sanitized {} potential secrets from diff/prompt",
+            redactions.len()
+        );
+    }

+    let auth = Auth::new(config.api_key.as_str());
     let openai = OpenAI::new(auth, &config.api_base());

-    // Add "commit message: " prefix to user description if provided
-    let prefixed_user_description = user_description.map(|desc| {
+    // Add "commit message: " prefix to user description if provided (after sanitization)
+    let prefixed_user_description = sanitized_user_desc_opt.map(|desc| {
         if desc.trim().is_empty() {
-            desc.to_string()
+            desc
         } else {
             format!("commit message: {}", desc)
         }
@@ -31,7 +41,7 @@ async fn generate_commit_message(
         config.conventional,
         config.language,
         config.verbosity,
-        diff,
+        &sanitized_diff,
         prefixed_user_description.as_deref(),
     );

@@ -72,7 +82,6 @@ async fn generate_commit_message(
         .as_ref()
         .ok_or(anyhow::anyhow!("No message in response"))?
         .content;
-    // Extract content between <aicommit> tags
     let commit_message = extract_aicommit_message(msg)?;
     Ok(commit_message)
 }
@@ -164,11 +173,19 @@ async fn generate_branch_name_with_ai(
     prefix: Option<&str>,
     config: &Config,
 ) -> anyhow::Result<String> {
-    let auth = Auth::new(config.api_key.as_str());
+    // sanitize diff only (branch name uses only diff)
+    let (sanitized_diff, _, redactions) = sanitize_with_config(diff, None, config);
+    if !redactions.is_empty() {
+        log::debug!(
+            "Sanitized {} potential secrets from diff before branch generation",
+            redactions.len()
+        );
+    }

+    let auth = Auth::new(config.api_key.as_str());
     let openai = OpenAI::new(auth, &config.api_base());

-    let prompt = BRANCH_NAME_PROMPT.replace("{{diff}}", diff);
+    let prompt = BRANCH_NAME_PROMPT.replace("{{diff}}", &sanitized_diff);
     let messages = vec![
         Message {
             role: Role::System,
@@ -191,7 +208,7 @@ async fn generate_branch_name_with_ai(
         top_p: None,
         n: None,
         stream: Some(false),
-        stop: None, // remove stop words to avoid interference during the thinking process
+        stop: None,
         max_tokens: Some(DEFAULT_MAX_TOKENS as i32),
         presence_penalty: None,
         frequency_penalty: None,
@@ -215,7 +232,6 @@ async fn generate_branch_name_with_ai(

     let branch_name = extract_aicommit_message(&msg)?;

-    // Clean up the branch name
     let branch_name = if let Some(prefix) = prefix {
         format!("{}{}", prefix.trim(), branch_name.trim())
     } else {
5 changes: 5 additions & 0 deletions src/main.rs
@@ -5,6 +5,7 @@ mod cli;
 mod config;
 mod constants;
 mod generate;
+mod sanitizer;
 mod template_engine;
 mod update_checker;

@@ -24,6 +25,10 @@ async fn main() -> anyhow::Result<()> {
     if let Some(v) = args.verbosity {
         config.verbosity = v;
     }
+    if args.no_sanitize {
+        // CLI override to disable sanitizer
+        config.sanitize_secrets = false;
+    }

     run_update_checker().await;
