refactor: move pattern mining to library layer for better integration by tarunKoyalwar · Pull Request #284 · projectdiscovery/alterx

tarunKoyalwar · 2025-11-10T15:53:47Z

Summary

This PR refactors the pattern mining implementation to move it from the CLI layer into the core library layer, enabling programmatic usage and better separation of concerns.

Key Changes

Add Mode field to Options struct: Supports default, discover, and both modes
Concurrent execution: Discover and default modes run in parallel for better performance in "both" mode
Domain validation utility: New getNValidateRootDomain function in util.go with proper error handling for edge cases
Centralized validation: Options.Validate() method consolidates all validation logic
Backward compatibility: Empty Mode field defaults to "default" mode, preserving existing behavior

Benefits

Programmatic usage: Library users can now integrate pattern mining directly into their applications
Better architecture: Clear separation between CLI concerns and core library functionality
Improved performance: Concurrent execution in "both" mode
Robust error handling: Fixed edge case where all domains are empty/whitespace-only
Maintainability: Cleaner code organization with consolidated validation logic

Testing

✅ All existing tests pass
✅ Linter passes with 0 issues
✅ Functional testing verified:
- Default mode with patterns
- Discover mode with pattern mining
- Both mode (combined)
- ClusterBomb algorithm with custom payloads
- Enrich mode with built-in patterns

Breaking Changes

None - this is a pure refactoring that maintains full backward compatibility.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Add pattern discovery modes with configurable mining parameters and ability to save discovered rules after a run.
Improvements
- Unified execution path with a single output writer, concurrent pattern mining, and streamlined estimation/reporting of generated payload counts.
Validation
- Enforced homogeneous root/domain validation and added safeguards to skip unsuitable or overly long inputs.
Documentation
- Updated architecture and usage docs to reflect modes, concurrency, and rule-saving.

This refactor moves pattern mining capabilities from the CLI layer into the core Mutator library, enabling programmatic use and better separation of concerns. Key changes: - Add Mode field to Options struct (default, discover, both) - Implement concurrent execution of discover and default modes - Move domain validation to util.go with proper error handling - Add Options.Validate() for centralized validation logic - Maintain backward compatibility (empty Mode defaults to "default") - Fix edge case in getNValidateRootDomain for all-empty domains Benefits: - Library users can now use pattern mining programmatically - Better code organization with clear separation of concerns - Concurrent execution improves performance in "both" mode - More robust error handling for edge cases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2025-11-10T15:53:56Z

Walkthrough

CLI simplified to a single execution path: alterx.Options is built and validated, Mutator gained mode-aware mining and concurrent execution, pattern-mining results can be saved via SaveRules, root-domain validation moved to utilities, and pattern-mining input checks were tightened.

Changes

Cohort / File(s)	Summary
CLI / main `cmd/alterx/main.go`	Builds a single `alterx.Options`, validates/merges permutation config, initializes Alterx once, runs unified `ExecuteWithWriter(output)`, uses `EstimateCount()` for counts, and invokes `SaveRules` post-run when requested. Removed mode branching, dedup-writer flow, and output-close logic.
Mutator / pattern mining `mutator.go`	Added `Options.Validate()` and new exported fields on `Options` (`Mode`, `MinDistance`, `MaxDistance`, `PatternThreshold`, `QualityRatio`, `NgramsLimit`, `MaxLength`). `Mutator` now stores `rootDomain`, `miner`, `miningResult`, and uses atomic `timeTaken`. `New()` initializes per `Mode`. `Execute()` runs mining concurrently with input generation, merges results, `EstimateCount()` incorporates miner counts, and `SaveRules(filename)` persists mining output.
Utilities / domain validation `util.go`	Added `getNValidateRootDomain(domains []string) (string, error)` using `publicsuffix.EffectiveTLDPlusOne` to derive and enforce a common root domain across inputs; returns clear errors for empty/non-homogeneous inputs.
Pattern-mining internals `internal/patternmining/patternmining.go`	Introduced early-sanitization checks in domain validation to skip hosts with >5 subdomain levels or excessive token-length sums, adding early-continue branches to prevent further processing of those inputs.
Docs / architecture `CLAUDE.md`	Documentation updated to reflect CLI flow, Mode semantics (default/discover/both), concurrent mining in `Mutator.Execute()`, root-domain validation, DFA/NGram notes, and `SaveRules` behavior.
Repo config `/.gitignore`	Added `/.testing` to ignore rules.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as cmd/alterx
    participant Mut as alterx.Mutator
    participant Miner as patternmining.Miner
    participant Gen as Generator
    CLI->>Mut: Build & Validate alterx.Options
    Note right of Mut: Init Miner if Mode == "discover" or "both"
    CLI->>Mut: ExecuteWithWriter(output)
    par Input generation
        Mut->>Gen: Generate from inputs
    and Pattern mining (async)
        Mut->>Miner: Start mining
        Miner-->>Mut: miningResult
        Mut->>Gen: Generate from mined patterns
    end
    Gen-->>Mut: candidates
    Mut->>Mut: Merge & dedupe
    Mut-->>CLI: Write output
    alt SaveRules requested
        CLI->>Mut: SaveRules(path)
        Mut->>Miner: Export rules
        Miner-->>CLI: Rules persisted
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Inspect concurrency and synchronization in mutator.go::Execute() (goroutines, WaitGroup, channels, atomic timeTaken).
Verify Options.Validate() defaults, allowed Mode values, and interactions between options.
Review SaveRules() persistence/IO and error paths.
Validate getNValidateRootDomain publicsuffix usage and homogeneity checks.
Check early-rejection thresholds in internal/patternmining/patternmining.go for correctness and test coverage.

Poem

🐇 I hopped through code where patterns hide,
Mining in parallel, inputs at my side.
Rules saved at dawn, candidates take flight,
Small hops of logic, everything feels right.
A rabbit cheers — the pipeline hums bright!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'refactor: move pattern mining to library layer for better integration' accurately and specifically summarizes the main objective of the PR, which is moving pattern mining from CLI to library layer.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch update-library-impl

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e46ff7e and ddb275d.

📒 Files selected for processing (1)

internal/patternmining/patternmining.go (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Test Builds (windows-latest, 1.21.x)

🔇 Additional comments (2)

internal/patternmining/patternmining.go (2)

280-288: LGTM! Clear safety check for subdomain depth.

The validation correctly skips deeply nested subdomains (>5 levels) to avoid expensive computations. The example and log message clearly communicate the rationale.

289-299: 

No issues found—implementation correctly counts tokens and is more accurate than the past review suggestion.

The data structure verification confirms that tokenize() returns [][][]string where:

tokens[0] is all subdomain labels for the first host ([][]string)

Each tokenGroup in the loop is a slice of tokens for one label ([]string)

len(tokenGroup) correctly counts tokens per label

Your implementation sums token counts across all labels, which properly rejects inputs with >10 total tokens. The past review suggestion of tokenCount := len(tokens[0]) was incorrect—it would only count the number of labels, not total tokens. Your current code is more accurate for the stated goal.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03451bf and a04895a.

📒 Files selected for processing (3)

cmd/alterx/main.go (2 hunks)
mutator.go (5 hunks)
util.go (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

mutator.go (4)

internal/runner/runner.go (1)

Options (17-40)

internal/patternmining/patternmining.go (3)

Miner (82-85)

Result (47-50)

NewMiner (88-93)

config.go (1)

DefaultConfig (18-18)

replacer.go (1)

Replace (19-27)

cmd/alterx/main.go (2)

mutator.go (1)

Options (29-60)

internal/runner/runner.go (1)

Options (17-40)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test Builds (macOS-latest, 1.21.x)
GitHub Check: Analyze (go)
GitHub Check: Test Builds (windows-latest, 1.21.x)

cmd/alterx/main.go

util.go

Fixes critical bug where SaveRules was called before pattern mining completed, causing miningResult to be nil. Changes: - Move SaveRules block after ExecuteWithWriter call - Ensures mining goroutine completes before saving rules - Skips SaveRules in estimate mode (no execution happens) - Add explanatory comment about execution order requirement Resolves CodeRabbit review comment. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

cmd/alterx/main.go (1)

67-72: SaveRules correctly placed after execution.

The past review comment flagged that SaveRules was called before mining, causing it to always fail. The refactor now correctly calls SaveRules after ExecuteWithWriter (line 61), ensuring that m.miningResult is populated before attempting to save. Well done!

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a04895a and 63fd622.

📒 Files selected for processing (1)

cmd/alterx/main.go (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cmd/alterx/main.go (2)

mutator.go (1)

Options (29-60)

internal/runner/runner.go (1)

Options (17-40)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Test Builds (macOS-latest, 1.21.x)
GitHub Check: Test Builds (windows-latest, 1.21.x)

🔇 Additional comments (2)

cmd/alterx/main.go (2)
33-33: Verify the hardcoded MaxLength value.

MaxLength is hardcoded to 1000 without explanation. Confirm whether this should be exposed as a configurable CLI option or if this default is intentional for all use cases.

61-72: Verify SaveRules behavior when ExecuteWithWriter fails.

If ExecuteWithWriter returns an error (line 61), mining may not have completed successfully, yet SaveRules (line 69) still executes. Confirm whether incomplete mining results should be saved or if SaveRules should be skipped when err != nil.

Consider applying this diff to skip saving rules on execution failure:
 	// Execute mutator (handles all modes internally)
 	if err = m.ExecuteWithWriter(output); err != nil {
 		gologger.Error().Msgf("failed to execute alterx: %v", err)
+		return
 	}
 
 	gologger.Info().Msgf("Generated %d total unique subdomains", m.PayloadCount())
 
 	// Save rules if requested (must be after Execute to ensure mining is complete)
 	if cliOpts.SaveRules != "" {
 		if err := m.SaveRules(cliOpts.SaveRules); err != nil {
 			gologger.Error().Msgf("failed to save rules: %v", err)
 		}
 	}

cmd/alterx/main.go

… mining This commit addresses timing accuracy and prevents resource exhaustion on complex inputs. The timing measurement now captures the full execution duration across parallel goroutines, and input validation rejects problematic domains (>5 levels or >10 tokens) to avoid blocking issues. Also improves user experience by removing duplicate logging and clarifying output messages. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixes GitHub Actions Build Test failures caused by concurrent read/write access to timeTaken field. The cleanup goroutine writes to timeTaken while the main goroutine reads it via Time() method, creating a race condition detected by Go's race detector. Solution: Use sync/atomic operations (StoreInt64/LoadInt64) for thread-safe access to timeTaken field, ensuring proper synchronization across goroutines without blocking overhead. Tested locally with race detector - all tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63fd622 and 58f6ea0.

📒 Files selected for processing (5)

.gitignore (1 hunks)
CLAUDE.md (6 hunks)
cmd/alterx/main.go (2 hunks)
internal/patternmining/patternmining.go (1 hunks)
mutator.go (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

cmd/alterx/main.go (2)

mutator.go (1)

Options (29-60)

internal/runner/runner.go (1)

Options (17-40)

mutator.go (3)

internal/patternmining/patternmining.go (4)

Options (27-44)

Miner (82-85)

Result (47-50)

NewMiner (88-93)

internal/runner/runner.go (1)

Options (17-40)

config.go (1)

DefaultConfig (18-18)

🪛 GitHub Actions: 🔨 Build Test

cmd/alterx/main.go

[error] 61-61: Go data race detected; application exited with race detector enabled. See stack traces for details.

mutator.go

[error] 467-467: DATA RACE detected in Mutator.Time() (read) during race-enabled run.

[error] 263-263: DATA RACE detected in Mutator.Execute.func3() (write) during race-enabled run.

[error] 286-286: DATA RACE detected in Mutator.ExecuteWithWriter() (concurrent access) during race-enabled run.

[error] 280-280: DATA RACE detected in Mutator.ExecuteWithWriter() (concurrent access) during race-enabled run.

coderabbitai · 2025-11-13T14:27:59Z

internal/patternmining/patternmining.go

+		sum := 0
+		for _, token := range tokens[0] {
+			sum += len(token)
+		}
+		// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens
+		if sum > 10 {
+			// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many
+			// seperators the vector length would become too long
+			gologger.Verbose().Msgf("Rejecting input: %s since it can be tokenized into more than 10 tokens", host)
+			continue
+		}


⚠️ Potential issue | 🔴 Critical

Rejecting valid inputs due to wrong “token” metric

This branch aims to skip hosts that explode into >10 tokens, but it sums the character lengths of each token. Any realistic subdomain like payments-api.internal.example.com (8+3+8 characters) now exceeds the threshold and gets rejected even though it only has three tokens. That breaks pattern mining for the majority of real-world inputs.

Replace the character-sum with an actual token-count check so we only skip when len(tokens[0]) itself exceeds the limit, and update the log accordingly.

- sum := 0 - for _, token := range tokens[0] { - sum += len(token) - } - // to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens - if sum > 10 { - // ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many - // seperators the vector length would become too long - gologger.Verbose().Msgf("Rejecting input: %s since it can be tokenized into more than 10 tokens", host) + tokenCount := len(tokens[0]) + // to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens + if tokenCount > 10 { + // ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many + // separators the vector length would become too long + gologger.Verbose().Msgf("Rejecting input: %s since it has more than 10 tokens", host) continue }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sum := 0

for _, token := range tokens[0] {

sum += len(token)

}

// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens

if sum > 10 {

// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many

// seperators the vector length would become too long

gologger.Verbose().Msgf("Rejecting input: %s since it can be tokenized into more than 10 tokens", host)

continue

}

tokenCount := len(tokens[0])

// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens

if tokenCount > 10 {

// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many

// separators the vector length would become too long

gologger.Verbose().Msgf("Rejecting input: %s since it has more than 10 tokens", host)

continue

}

🤖 Prompt for AI Agents

In internal/patternmining/patternmining.go around lines 289 to 299, the code incorrectly sums character lengths of tokens and rejects hosts when that character sum is >10; change this to check the actual token count instead. Replace the sum calculation and if-check with a simple count check: if len(tokens[0]) > 10 { ... } (i.e., compare number of tokens, not sum of token lengths), and update the log message to indicate rejection due to too many tokens rather than too many characters. Ensure the continue behavior and verbose log remain the same.

mutator.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

mutator.go (1)
63-91: Mode validation and defaults look good, but remove the no-op assignment.

The validation logic correctly defaults to "default" mode for backward compatibility and validates the mode string. Default values for mining parameters are reasonable.

However, lines 85-86 are a no-op since NgramsLimit is already 0 by default:
-	if v.NgramsLimit == 0 {
-		v.NgramsLimit = 0
-	}
Consider either removing this entirely or adding a non-zero default if one is intended. Based on the comment "0 = no limit" in the patternmining.Options struct, the current zero value is intentional, making this assignment unnecessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58f6ea0 and e46ff7e.

📒 Files selected for processing (1)

mutator.go (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

mutator.go (4)

internal/patternmining/patternmining.go (4)

Options (27-44)

Miner (82-85)

Result (47-50)

NewMiner (88-93)

internal/runner/runner.go (1)

Options (17-40)

inputs.go (1)

Input (14-22)

replacer.go (1)

Replace (19-27)

🔇 Additional comments (12)

mutator.go (12)

10-11: LGTM: Necessary imports for concurrent execution and pattern mining.

The addition of sync, sync/atomic, and the internal patternmining package appropriately supports the new concurrent mode execution and atomic time tracking.

Also applies to: 14-14

46-61: LGTM: Well-structured Options additions.

The new fields support the three operational modes (default, discover, both) with appropriate mining parameters. The fields are well-documented and maintain backward compatibility.

98-98: Excellent fix for the race condition!

Changing timeTaken from time.Duration to int64 with atomic operations (used at lines 264 and 468) properly resolves the data race flagged in the previous review. The nanoseconds are stored atomically and converted back to duration when read.

101-103: LGTM: New fields appropriately support pattern mining.

The addition of rootDomain, miner, and miningResult fields enables the library to maintain mining state and results for programmatic access.

107-175: Well-structured refactoring with clear mode-based initialization.

The refactored New() function cleanly separates concerns:

Validates options upfront (lines 108-110)

Initializes mining components for discover/both modes (lines 116-137)

Initializes default generation components for default/both modes (lines 139-172)

The mode-specific initialization ensures that "both" mode properly sets up both pipelines while maintaining backward compatibility when mode is unspecified. Error handling is appropriate throughout.

177-190: LGTM: Clear error handling and appropriate guards.

The SaveRules() method properly validates that mining is enabled and results are available before attempting to save. The error messages are clear and the success logging is helpful.

194-204: LGTM: Concurrent execution properly synchronized with atomic time tracking.

The refactored Execute() uses a WaitGroup to synchronize parallel mining and default generation goroutines. The atomic storage of timeTaken at line 264 correctly addresses the race condition from the previous review.

Also applies to: 261-265

213-216: Verify the intended behavior when mining fails.

When mining fails, the error is logged and the goroutine returns without sending any results. This means:

In "discover" mode: users get zero results with only a log message indicating failure

In "both" mode: users still get default generation results, potentially not realizing mining failed

Is this graceful degradation intentional, or should mining failures in "discover" mode cause the entire operation to fail more explicitly? Consider whether the current logging-only approach provides sufficient visibility to users who might not check logs.

220-234: LGTM: Proper filtering of input domains from mined results.

The seen map correctly filters out input domains from the generated subdomains (lines 227-230), preventing the regulator algorithm from returning input subdomains as new discoveries. The map is local to the mining goroutine, avoiding any concurrency issues.

238-259: LGTM: Default generation properly checks context for cancellation.

The default generation goroutine correctly checks ctx.Done() at line 248 before processing each pattern, allowing timely cancellation of the operation.

349-351: LGTM: EstimateCount correctly includes mining results.

The addition of mining result counts (when available) to the overall estimate is correct and maintains accuracy across all modes.

467-470: LGTM: Atomic time access completes the race condition fix.

The Time() method now uses atomic.LoadInt64 to safely read the concurrently-written timeTaken field, completing the fix for the data race flagged in the previous review.

coderabbitai · 2025-11-13T14:32:42Z

mutator.go

+		go func() {
+			defer wg.Done()
+
+			// Run pattern mining
+			gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains))
+			result, err := m.miner.Mine()
+			if err != nil {
+				gologger.Error().Msgf("pattern mining failed: %v", err)
+				return
+			}
+			m.miningResult = result
+			gologger.Info().Msgf("Discovered %d patterns from input domains", len(result.Patterns))
+
+			var seen = make(map[string]bool)
+			for _, sub := range m.Options.Domains {
+				seen[sub] = true
+			}
+			// In discover mode, only use mined patterns
+			generated := m.miner.GenerateFromPatterns(m.miningResult.Patterns)
+			for _, subdomain := range generated {
+				if seen[subdomain] {
+					// skip the input subdomains
+					// regulator algo has tendency to generate input subdomains as patterns
+					continue
+				}
+				seen[subdomain] = true
+				results <- subdomain
+			}
+		}()


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Add context cancellation support to the mining goroutine.

The mining goroutine runs a potentially long operation (m.miner.Mine() at line 212) but never checks if the context has been cancelled. In contrast, the default generation goroutine properly checks ctx.Done() at line 248.

This inconsistency means that if a user cancels the operation (via context cancellation), the default generation will stop, but mining will continue to consume resources until completion.

If m.miner.Mine() accepts a context parameter, pass it through. Otherwise, consider periodically checking the context or document this limitation. For example:

go func() { defer wg.Done() select { case <-ctx.Done(): return default: } // Run pattern mining gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains)) result, err := m.miner.Mine() // or m.miner.MineWithContext(ctx) if available // ... rest of the logic }()

Can you verify whether Mine() supports context cancellation, or add periodic context checks within the mining loop?

--- 🏁 Script executed: ```shell #!/bin/bash # Find mutator.go and examine the context around lines 207-235 and 248 fd -t f 'mutator\.go' | head -5

Length of output: 10

🏁 Script executed:

#!/bin/bash # Examine the mining goroutine and context handling fd -t f 'mutator\.go' -x cat {} | head -300 | tail -100

Length of output: 2624

🏁 Script executed:

#!/bin/bash # Search for the miner interface definition and Mine() method rg -n "func.*Mine\(" --type=go -B2 -A5

Length of output: 617

🏁 Script executed:

#!/bin/bash # Search for ctx.Done() usage in mutator.go to understand the pattern rg -n "ctx\.Done\(\)" --type=go -B3 -A3

Length of output: 321

🏁 Script executed:

#!/bin/bash # Look for the miner interface to understand available methods rg -n "type.*[Mm]iner.*interface" --type=go -A10

Length of output: 0

--- 🏁 Script executed: ```shell #!/bin/bash # Find Miner type definition and all its methods rg -n "type Miner " --type=go -A2

Length of output: 218

🏁 Script executed:

#!/bin/bash # List all methods on Miner type in patternmining.go rg -n "func \(m \*Miner\)" internal/patternmining/patternmining.go

Length of output: 926

🏁 Script executed:

#!/bin/bash # Find the Execute function signature in mutator.go rg -n "func.*Execute" mutator.go | head -5

Length of output: 132

🏁 Script executed:

#!/bin/bash # Check function signature and context parameter rg -n "func \(m \*Mutator\) Execute" mutator.go -A5

Length of output: 442

🏁 Script executed:

#!/bin/bash # Search for MineWithContext or any context-aware mining methods rg -n "MineWithContext|miner\.Mine" --type=go

Length of output: 47

🏁 Script executed:

#!/bin/bash # Examine the Mine() implementation to understand its operations sed -n '96,260p' internal/patternmining/patternmining.go

Length of output: 4797

Add context cancellation support to the mining goroutine.

The mining goroutine runs a potentially long-running operation (m.miner.Mine()) without checking if the context has been cancelled, while the default generation goroutine properly checks ctx.Done() at line 248. The Mine() method doesn't currently accept a context parameter.

At minimum, add an early context check before the mining operation:

go func() { defer wg.Done() select { case <-ctx.Done(): return default: } // Run pattern mining gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains)) result, err := m.miner.Mine() // ... rest of the logic }()

For a more complete fix, modify Mine() to accept a context parameter and periodically check it during the clustering loops to allow early termination on cancellation.

🤖 Prompt for AI Agents

In mutator.go around lines 207 to 235, the mining goroutine calls m.miner.Mine() without honoring context cancellation; add an early ctx cancellation check before starting mining (return immediately if <-ctx.Done()) and propagate a context through the miner by changing Mine() to Mine(ctx context.Context) and update all callsites so Mine periodically checks ctx.Done() inside its long-running loops and returns promptly on cancellation; ensure the goroutine uses the new signature and handles returned errors the same way.

Fixes incorrect validation logic that was summing character lengths instead of counting tokens. The previous code rejected valid subdomains like `payments-api.internal.example.com` because it summed character lengths (8+3+8=19) rather than counting actual tokens (3). Now properly counts the number of tokens across all subdomain levels by summing `len(tokenGroup)` for each group, preventing legitimate subdomains from being incorrectly rejected. Addresses CodeRabbit review feedback. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot reviewed Nov 10, 2025

View reviewed changes

cmd/alterx/main.go Outdated Show resolved Hide resolved

util.go Show resolved Hide resolved

coderabbitai bot reviewed Nov 10, 2025

View reviewed changes

cmd/alterx/main.go Show resolved Hide resolved

tarunKoyalwar requested a review from ehsandeep November 10, 2025 16:32

tarunKoyalwar and others added 2 commits November 13, 2025 19:48

coderabbitai bot reviewed Nov 13, 2025

View reviewed changes

ehsandeep approved these changes Nov 18, 2025

View reviewed changes

ehsandeep merged commit 9839604 into main Nov 18, 2025
9 checks passed

ehsandeep deleted the update-library-impl branch November 18, 2025 11:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: move pattern mining to library layer for better integration#284

refactor: move pattern mining to library layer for better integration#284
ehsandeep merged 5 commits intomainfrom
update-library-impl

tarunKoyalwar commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 10, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 13, 2025

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tarunKoyalwar commented Nov 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Benefits

Testing

Breaking Changes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tarunKoyalwar commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 10, 2025 •

edited

Loading