
refactor: move pattern mining to library layer for better integration#284

Merged
ehsandeep merged 5 commits into main from update-library-impl on Nov 18, 2025

Conversation


@tarunKoyalwar tarunKoyalwar commented Nov 10, 2025

Summary

This PR refactors the pattern mining implementation to move it from the CLI layer into the core library layer, enabling programmatic usage and better separation of concerns.

Key Changes

  • Add Mode field to Options struct: Supports default, discover, and both modes
  • Concurrent execution: Discover and default modes run in parallel for better performance in "both" mode
  • Domain validation utility: New getNValidateRootDomain function in util.go with proper error handling for edge cases
  • Centralized validation: Options.Validate() method consolidates all validation logic
  • Backward compatibility: Empty Mode field defaults to "default" mode, preserving existing behavior

Benefits

  • Programmatic usage: Library users can now integrate pattern mining directly into their applications
  • Better architecture: Clear separation between CLI concerns and core library functionality
  • Improved performance: Concurrent execution in "both" mode
  • Robust error handling: Fixed edge case where all domains are empty/whitespace-only
  • Maintainability: Cleaner code organization with consolidated validation logic

Testing

  • ✅ All existing tests pass
  • ✅ Linter passes with 0 issues
  • ✅ Functional testing verified:
    • Default mode with patterns
    • Discover mode with pattern mining
    • Both mode (combined)
    • ClusterBomb algorithm with custom payloads
    • Enrich mode with built-in patterns

Breaking Changes

None - this is a pure refactoring that maintains full backward compatibility.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Add pattern discovery modes with configurable mining parameters and ability to save discovered rules after a run.
  • Improvements

    • Unified execution path with a single output writer, concurrent pattern mining, and streamlined estimation/reporting of generated payload counts.
  • Validation

    • Enforced homogeneous root/domain validation and added safeguards to skip unsuitable or overly long inputs.
  • Documentation

    • Updated architecture and usage docs to reflect modes, concurrency, and rule-saving.

This refactor moves pattern mining capabilities from the CLI layer into the core Mutator library, enabling programmatic use and better separation of concerns.

Key changes:
- Add Mode field to Options struct (default, discover, both)
- Implement concurrent execution of discover and default modes
- Move domain validation to util.go with proper error handling
- Add Options.Validate() for centralized validation logic
- Maintain backward compatibility (empty Mode defaults to "default")
- Fix edge case in getNValidateRootDomain for all-empty domains

Benefits:
- Library users can now use pattern mining programmatically
- Better code organization with clear separation of concerns
- Concurrent execution improves performance in "both" mode
- More robust error handling for edge cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Nov 10, 2025

Walkthrough

CLI simplified to a single execution path: alterx.Options is built and validated, Mutator gained mode-aware mining and concurrent execution, pattern-mining results can be saved via SaveRules, root-domain validation moved to utilities, and pattern-mining input checks were tightened.

Changes

Cohort / File(s) Summary
CLI / main
cmd/alterx/main.go
Builds a single alterx.Options, validates/merges permutation config, initializes Alterx once, runs unified ExecuteWithWriter(output), uses EstimateCount() for counts, and invokes SaveRules post-run when requested. Removed mode branching, dedup-writer flow, and output-close logic.
Mutator / pattern mining
mutator.go
Added Options.Validate() and new exported fields on Options (Mode, MinDistance, MaxDistance, PatternThreshold, QualityRatio, NgramsLimit, MaxLength). Mutator now stores rootDomain, miner, miningResult, and uses atomic timeTaken. New() initializes per Mode. Execute() runs mining concurrently with input generation, merges results, EstimateCount() incorporates miner counts, and SaveRules(filename) persists mining output.
Utilities / domain validation
util.go
Added getNValidateRootDomain(domains []string) (string, error) using publicsuffix.EffectiveTLDPlusOne to derive and enforce a common root domain across inputs; returns clear errors for empty/non-homogeneous inputs.
Pattern-mining internals
internal/patternmining/patternmining.go
Introduced early-sanitization checks in domain validation to skip hosts with >5 subdomain levels or excessive token-length sums, adding early-continue branches to prevent further processing of those inputs.
Docs / architecture
CLAUDE.md
Documentation updated to reflect CLI flow, Mode semantics (default/discover/both), concurrent mining in Mutator.Execute(), root-domain validation, DFA/NGram notes, and SaveRules behavior.
Repo config
/.gitignore
Added /.testing to ignore rules.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as cmd/alterx
    participant Mut as alterx.Mutator
    participant Miner as patternmining.Miner
    participant Gen as Generator
    CLI->>Mut: Build & Validate alterx.Options
    Note right of Mut: Init Miner if Mode == "discover" or "both"
    CLI->>Mut: ExecuteWithWriter(output)
    par Input generation
        Mut->>Gen: Generate from inputs
    and Pattern mining (async)
        Mut->>Miner: Start mining
        Miner-->>Mut: miningResult
        Mut->>Gen: Generate from mined patterns
    end
    Gen-->>Mut: candidates
    Mut->>Mut: Merge & dedupe
    Mut-->>CLI: Write output
    alt SaveRules requested
        CLI->>Mut: SaveRules(path)
        Mut->>Miner: Export rules
        Miner-->>CLI: Rules persisted
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect concurrency and synchronization in mutator.go::Execute() (goroutines, WaitGroup, channels, atomic timeTaken).
  • Verify Options.Validate() defaults, allowed Mode values, and interactions between options.
  • Review SaveRules() persistence/IO and error paths.
  • Validate getNValidateRootDomain publicsuffix usage and homogeneity checks.
  • Check early-rejection thresholds in internal/patternmining/patternmining.go for correctness and test coverage.

Poem

🐇 I hopped through code where patterns hide,
Mining in parallel, inputs at my side.
Rules saved at dawn, candidates take flight,
Small hops of logic, everything feels right.
A rabbit cheers — the pipeline hums bright!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — docstring coverage is 66.67%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — the title 'refactor: move pattern mining to library layer for better integration' accurately and specifically summarizes the main objective of the PR: moving pattern mining from the CLI to the library layer.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e46ff7e and ddb275d.

📒 Files selected for processing (1)
  • internal/patternmining/patternmining.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test Builds (windows-latest, 1.21.x)
🔇 Additional comments (2)
internal/patternmining/patternmining.go (2)

280-288: LGTM! Clear safety check for subdomain depth.

The validation correctly skips deeply nested subdomains (>5 levels) to avoid expensive computations. The example and log message clearly communicate the rationale.


289-299:

No issues found—implementation correctly counts tokens and is more accurate than the past review suggestion.

The data structure verification confirms that tokenize() returns [][][]string where:

  • tokens[0] is all subdomain labels for the first host ([][]string)
  • Each tokenGroup in the loop is a slice of tokens for one label ([]string)
  • len(tokenGroup) correctly counts tokens per label

Your implementation sums token counts across all labels, which properly rejects inputs with >10 total tokens. The past review suggestion of tokenCount := len(tokens[0]) was incorrect—it would only count the number of labels, not total tokens. Your current code is more accurate for the stated goal.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03451bf and a04895a.

📒 Files selected for processing (3)
  • cmd/alterx/main.go (2 hunks)
  • mutator.go (5 hunks)
  • util.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
mutator.go (4)
internal/runner/runner.go (1)
  • Options (17-40)
internal/patternmining/patternmining.go (3)
  • Miner (82-85)
  • Result (47-50)
  • NewMiner (88-93)
config.go (1)
  • DefaultConfig (18-18)
replacer.go (1)
  • Replace (19-27)
cmd/alterx/main.go (2)
mutator.go (1)
  • Options (29-60)
internal/runner/runner.go (1)
  • Options (17-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Builds (macOS-latest, 1.21.x)
  • GitHub Check: Analyze (go)
  • GitHub Check: Test Builds (windows-latest, 1.21.x)

Fixes critical bug where SaveRules was called before pattern mining
completed, causing miningResult to be nil.

Changes:
- Move SaveRules block after ExecuteWithWriter call
- Ensures mining goroutine completes before saving rules
- Skips SaveRules in estimate mode (no execution happens)
- Add explanatory comment about execution order requirement

Resolves CodeRabbit review comment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
cmd/alterx/main.go (1)

67-72: SaveRules correctly placed after execution.

The past review comment flagged that SaveRules was called before mining, causing it to always fail. The refactor now correctly calls SaveRules after ExecuteWithWriter (line 61), ensuring that m.miningResult is populated before attempting to save. Well done!

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a04895a and 63fd622.

📒 Files selected for processing (1)
  • cmd/alterx/main.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/alterx/main.go (2)
mutator.go (1)
  • Options (29-60)
internal/runner/runner.go (1)
  • Options (17-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test Builds (macOS-latest, 1.21.x)
  • GitHub Check: Test Builds (windows-latest, 1.21.x)
🔇 Additional comments (2)
cmd/alterx/main.go (2)

33-33: Verify the hardcoded MaxLength value.

MaxLength is hardcoded to 1000 without explanation. Confirm whether this should be exposed as a configurable CLI option or if this default is intentional for all use cases.


61-72: Verify SaveRules behavior when ExecuteWithWriter fails.

If ExecuteWithWriter returns an error (line 61), mining may not have completed successfully, yet SaveRules (line 69) still executes. Confirm whether incomplete mining results should be saved or if SaveRules should be skipped when err != nil.

Consider applying this diff to skip saving rules on execution failure:

 	// Execute mutator (handles all modes internally)
 	if err = m.ExecuteWithWriter(output); err != nil {
 		gologger.Error().Msgf("failed to execute alterx: %v", err)
+		return
 	}
 
 	gologger.Info().Msgf("Generated %d total unique subdomains", m.PayloadCount())
 
 	// Save rules if requested (must be after Execute to ensure mining is complete)
 	if cliOpts.SaveRules != "" {
 		if err := m.SaveRules(cliOpts.SaveRules); err != nil {
 			gologger.Error().Msgf("failed to save rules: %v", err)
 		}
 	}

tarunKoyalwar and others added 2 commits November 13, 2025 19:48
… mining

This commit addresses timing accuracy and prevents resource exhaustion on complex inputs. The timing measurement now captures the full execution duration across parallel goroutines, and input validation rejects problematic domains (>5 levels or >10 tokens) to avoid blocking issues. Also improves user experience by removing duplicate logging and clarifying output messages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixes GitHub Actions Build Test failures caused by concurrent read/write access to timeTaken field. The cleanup goroutine writes to timeTaken while the main goroutine reads it via Time() method, creating a race condition detected by Go's race detector.

Solution: Use sync/atomic operations (StoreInt64/LoadInt64) for thread-safe access to timeTaken field, ensuring proper synchronization across goroutines without blocking overhead.

Tested locally with race detector - all tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63fd622 and 58f6ea0.

📒 Files selected for processing (5)
  • .gitignore (1 hunks)
  • CLAUDE.md (6 hunks)
  • cmd/alterx/main.go (2 hunks)
  • internal/patternmining/patternmining.go (1 hunks)
  • mutator.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
cmd/alterx/main.go (2)
mutator.go (1)
  • Options (29-60)
internal/runner/runner.go (1)
  • Options (17-40)
mutator.go (3)
internal/patternmining/patternmining.go (4)
  • Options (27-44)
  • Miner (82-85)
  • Result (47-50)
  • NewMiner (88-93)
internal/runner/runner.go (1)
  • Options (17-40)
config.go (1)
  • DefaultConfig (18-18)
🪛 GitHub Actions: 🔨 Build Test
cmd/alterx/main.go

[error] 61-61: Go data race detected; application exited with race detector enabled. See stack traces for details.

mutator.go

[error] 467-467: DATA RACE detected in Mutator.Time() (read) during race-enabled run.


[error] 263-263: DATA RACE detected in Mutator.Execute.func3() (write) during race-enabled run.


[error] 286-286: DATA RACE detected in Mutator.ExecuteWithWriter() (concurrent access) during race-enabled run.


[error] 280-280: DATA RACE detected in Mutator.ExecuteWithWriter() (concurrent access) during race-enabled run.

Comment on lines 289 to 299
sum := 0
for _, token := range tokens[0] {
sum += len(token)
}
// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens
if sum > 10 {
// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many
// seperators the vector length would become too long
gologger.Verbose().Msgf("Rejecting input: %s since it can be tokenized into more than 10 tokens", host)
continue
}

⚠️ Potential issue | 🔴 Critical

Rejecting valid inputs due to wrong “token” metric

This branch aims to skip hosts that explode into >10 tokens, but it sums the character lengths of each token. Any realistic subdomain like payments-api.internal.example.com (8+3+8 characters) now exceeds the threshold and gets rejected even though it only has three tokens. That breaks pattern mining for the majority of real-world inputs.

Replace the character-sum with an actual token-count check so we only skip when len(tokens[0]) itself exceeds the limit, and update the log accordingly.

-	sum := 0
-	for _, token := range tokens[0] {
-		sum += len(token)
-	}
-	// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens
-	if sum > 10 {
-		// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many
-		// seperators the vector length would become too long
-		gologger.Verbose().Msgf("Rejecting input: %s since it can be tokenized into more than 10 tokens", host)
+	tokenCount := len(tokens[0])
+	// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens
+	if tokenCount > 10 {
+		// ex: api1dev-home-us1..... basically even if subdomain levels are less than 5 but have too many
+		// separators the vector length would become too long
+		gologger.Verbose().Msgf("Rejecting input: %s since it has more than 10 tokens", host)
 		continue
 	}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
mutator.go (1)

63-91: Mode validation and defaults look good, but remove the no-op assignment.

The validation logic correctly defaults to "default" mode for backward compatibility and validates the mode string. Default values for mining parameters are reasonable.

However, lines 85-86 are a no-op since NgramsLimit is already 0 by default:

-	if v.NgramsLimit == 0 {
-		v.NgramsLimit = 0
-	}

Consider either removing this entirely or adding a non-zero default if one is intended. Based on the comment "0 = no limit" in the patternmining.Options struct, the current zero value is intentional, making this assignment unnecessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58f6ea0 and e46ff7e.

📒 Files selected for processing (1)
  • mutator.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mutator.go (4)
internal/patternmining/patternmining.go (4)
  • Options (27-44)
  • Miner (82-85)
  • Result (47-50)
  • NewMiner (88-93)
internal/runner/runner.go (1)
  • Options (17-40)
inputs.go (1)
  • Input (14-22)
replacer.go (1)
  • Replace (19-27)
🔇 Additional comments (12)
mutator.go (12)

10-11: LGTM: Necessary imports for concurrent execution and pattern mining.

The addition of sync, sync/atomic, and the internal patternmining package appropriately supports the new concurrent mode execution and atomic time tracking.

Also applies to: 14-14


46-61: LGTM: Well-structured Options additions.

The new fields support the three operational modes (default, discover, both) with appropriate mining parameters. The fields are well-documented and maintain backward compatibility.


98-98: Excellent fix for the race condition!

Changing timeTaken from time.Duration to int64 with atomic operations (used at lines 264 and 468) properly resolves the data race flagged in the previous review. The nanoseconds are stored atomically and converted back to duration when read.


101-103: LGTM: New fields appropriately support pattern mining.

The addition of rootDomain, miner, and miningResult fields enables the library to maintain mining state and results for programmatic access.


107-175: Well-structured refactoring with clear mode-based initialization.

The refactored New() function cleanly separates concerns:

  • Validates options upfront (lines 108-110)
  • Initializes mining components for discover/both modes (lines 116-137)
  • Initializes default generation components for default/both modes (lines 139-172)

The mode-specific initialization ensures that "both" mode properly sets up both pipelines while maintaining backward compatibility when mode is unspecified. Error handling is appropriate throughout.


177-190: LGTM: Clear error handling and appropriate guards.

The SaveRules() method properly validates that mining is enabled and results are available before attempting to save. The error messages are clear and the success logging is helpful.


194-204: LGTM: Concurrent execution properly synchronized with atomic time tracking.

The refactored Execute() uses a WaitGroup to synchronize parallel mining and default generation goroutines. The atomic storage of timeTaken at line 264 correctly addresses the race condition from the previous review.

Also applies to: 261-265


213-216: Verify the intended behavior when mining fails.

When mining fails, the error is logged and the goroutine returns without sending any results. This means:

  • In "discover" mode: users get zero results with only a log message indicating failure
  • In "both" mode: users still get default generation results, potentially not realizing mining failed

Is this graceful degradation intentional, or should mining failures in "discover" mode cause the entire operation to fail more explicitly? Consider whether the current logging-only approach provides sufficient visibility to users who might not check logs.


220-234: LGTM: Proper filtering of input domains from mined results.

The seen map correctly filters out input domains from the generated subdomains (lines 227-230), preventing the regulator algorithm from returning input subdomains as new discoveries. The map is local to the mining goroutine, avoiding any concurrency issues.


238-259: LGTM: Default generation properly checks context for cancellation.

The default generation goroutine correctly checks ctx.Done() at line 248 before processing each pattern, allowing timely cancellation of the operation.


349-351: LGTM: EstimateCount correctly includes mining results.

The addition of mining result counts (when available) to the overall estimate is correct and maintains accuracy across all modes.


467-470: LGTM: Atomic time access completes the race condition fix.

The Time() method now uses atomic.LoadInt64 to safely read the concurrently-written timeTaken field, completing the fix for the data race flagged in the previous review.

Comment on lines +207 to +235
go func() {
defer wg.Done()

// Run pattern mining
gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains))
result, err := m.miner.Mine()
if err != nil {
gologger.Error().Msgf("pattern mining failed: %v", err)
return
}
m.miningResult = result
gologger.Info().Msgf("Discovered %d patterns from input domains", len(result.Patterns))

var seen = make(map[string]bool)
for _, sub := range m.Options.Domains {
seen[sub] = true
}
// In discover mode, only use mined patterns
generated := m.miner.GenerateFromPatterns(m.miningResult.Patterns)
for _, subdomain := range generated {
if seen[subdomain] {
// skip the input subdomains
// regulator algo has tendency to generate input subdomains as patterns
continue
}
seen[subdomain] = true
results <- subdomain
}
}()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Add context cancellation support to the mining goroutine.

The mining goroutine runs a potentially long operation (m.miner.Mine() at line 212) but never checks if the context has been cancelled. In contrast, the default generation goroutine properly checks ctx.Done() at line 248.

This inconsistency means that if a user cancels the operation (via context cancellation), the default generation will stop, but mining will continue to consume resources until completion.

If m.miner.Mine() accepts a context parameter, pass it through. Otherwise, consider periodically checking the context or document this limitation. For example:

go func() {
	defer wg.Done()
	
	select {
	case <-ctx.Done():
		return
	default:
	}
	
	// Run pattern mining
	gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains))
	result, err := m.miner.Mine()  // or m.miner.MineWithContext(ctx) if available
	// ... rest of the logic
}()

Can you verify whether Mine() supports context cancellation, or add periodic context checks within the mining loop?



---

🏁 Script executed:

```shell
#!/bin/bash
# Find mutator.go and examine the context around lines 207-235 and 248
fd -t f 'mutator\.go' | head -5

Length of output: 10


🏁 Script executed:

#!/bin/bash
# Examine the mining goroutine and context handling
fd -t f 'mutator\.go' -x cat {} | head -300 | tail -100

Length of output: 2624


🏁 Script executed:

#!/bin/bash
# Search for the miner interface definition and Mine() method
rg -n "func.*Mine\(" --type=go -B2 -A5

Length of output: 617


🏁 Script executed:

#!/bin/bash
# Search for ctx.Done() usage in mutator.go to understand the pattern
rg -n "ctx\.Done\(\)" --type=go -B3 -A3

Length of output: 321


🏁 Script executed:

#!/bin/bash
# Look for the miner interface to understand available methods
rg -n "type.*[Mm]iner.*interface" --type=go -A10

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Find Miner type definition and all its methods
rg -n "type Miner " --type=go -A2

Length of output: 218


🏁 Script executed:

#!/bin/bash
# List all methods on Miner type in patternmining.go
rg -n "func \(m \*Miner\)" internal/patternmining/patternmining.go

Length of output: 926


🏁 Script executed:

#!/bin/bash
# Find the Execute function signature in mutator.go
rg -n "func.*Execute" mutator.go | head -5

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Check function signature and context parameter
rg -n "func \(m \*Mutator\) Execute" mutator.go -A5

Length of output: 442


🏁 Script executed:

#!/bin/bash
# Search for MineWithContext or any context-aware mining methods
rg -n "MineWithContext|miner\.Mine" --type=go

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Examine the Mine() implementation to understand its operations
sed -n '96,260p' internal/patternmining/patternmining.go

Length of output: 4797


Add context cancellation support to the mining goroutine.

The mining goroutine runs a potentially long-running operation (m.miner.Mine()) without checking if the context has been cancelled, while the default generation goroutine properly checks ctx.Done() at line 248. The Mine() method doesn't currently accept a context parameter.

At minimum, add an early context check before the mining operation:

go func() {
	defer wg.Done()
	
	select {
	case <-ctx.Done():
		return
	default:
	}
	
	// Run pattern mining
	gologger.Info().Msgf("Running pattern mining on %d domains...", len(m.Options.Domains))
	result, err := m.miner.Mine()
	// ... rest of the logic
}()

For a more complete fix, modify Mine() to accept a context parameter and periodically check it during the clustering loops to allow early termination on cancellation.


Fixes incorrect validation logic that was summing character lengths instead of counting tokens. The previous code rejected valid subdomains like `payments-api.internal.example.com` because it summed character lengths (8+3+8=19) rather than counting actual tokens (3).

Now properly counts the number of tokens across all subdomain levels by summing `len(tokenGroup)` for each group, preventing legitimate subdomains from being incorrectly rejected.

Addresses CodeRabbit review feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ehsandeep ehsandeep merged commit 9839604 into main Nov 18, 2025
9 checks passed
@ehsandeep ehsandeep deleted the update-library-impl branch November 18, 2025 11:59