
feat: Add pdsh-style hostlist expression support #107

Merged
inureyes merged 2 commits into main from feature/issue-98-hostlist-expression on Dec 16, 2025



@inureyes (Member) commented Dec 16, 2025

Summary

  • Implement pdsh-style hostlist expression syntax for range-based host specification
  • Add new src/hostlist/ module with parser, expander, and error types
  • Integrate range expansion with -H, --filter, and --exclude options
  • Support simple ranges, zero-padded ranges, comma-separated values, mixed ranges, cartesian product, domain suffixes, and file input

Features

Expression              Expansion
node[1-5]               node1, node2, node3, node4, node5
node[01-05]             node01, node02, node03, node04, node05
node[1,3,5]             node1, node3, node5
node[1-3,7]             node1, node2, node3, node7
rack[1-2]-node[1-3]     rack1-node1, rack1-node2, ... (6 hosts)
web[1-3].example.com    web1.example.com, web2.example.com, web3.example.com
^/path/to/file          reads hosts from file
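
The expansion rules in the table can be illustrated with a minimal sketch. This is not the code in src/hostlist/; the function names `expand_items` and `expand` and the use of `String` errors are assumptions made for illustration. The sketch also omits the file-input form and the MAX_EXPANSION_SIZE cap mentioned in the review below.

```rust
/// Expand one bracket body such as "1-3,7" or "01-05" into its values.
/// Zero padding is taken from the width of the range start when it has a leading zero.
/// (Hypothetical helper, not the PR's parser.)
fn expand_items(body: &str) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for item in body.split(',') {
        // "1-3" is a range; "7" is treated as the range 7..=7.
        let (start, end) = item.split_once('-').unwrap_or((item, item));
        let lo: u64 = start.parse().map_err(|_| format!("invalid number: {start:?}"))?;
        let hi: u64 = end.parse().map_err(|_| format!("invalid number: {end:?}"))?;
        if hi < lo {
            return Err(format!("reversed range: {item}"));
        }
        let width = if start.len() > 1 && start.starts_with('0') { start.len() } else { 0 };
        for n in lo..=hi {
            out.push(format!("{n:0width$}"));
        }
    }
    Ok(out)
}

/// Expand a whole expression; several bracket groups yield a cartesian product.
fn expand(expr: &str) -> Result<Vec<String>, String> {
    let mut results = vec![String::new()];
    let mut rest = expr;
    while let Some(open) = rest.find('[') {
        let close = rest[open..]
            .find(']')
            .map(|i| open + i)
            .ok_or_else(|| format!("unclosed bracket in {expr}"))?;
        let literal = &rest[..open];
        let values = expand_items(&rest[open + 1..close])?;
        let mut next = Vec::new();
        for prefix in &results {
            for v in &values {
                next.push(format!("{prefix}{literal}{v}"));
            }
        }
        results = next;
        rest = &rest[close + 1..];
    }
    // Append whatever follows the last bracket group (for example a domain suffix).
    for r in &mut results {
        r.push_str(rest);
    }
    Ok(results)
}
```

Under these assumptions, `expand("rack[1-2]-node[1-3]")` yields six hosts ordered with the leftmost group varying slowest, and an expression without brackets is returned unchanged as a single host.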

Test plan

  • All 52 hostlist unit tests pass
  • All 29 app::nodes integration tests pass
  • cargo clippy passes with no warnings
  • cargo build succeeds
  • Manual testing with actual SSH hosts

Closes #98

Implement range expansion syntax for specifying multiple hosts:
- Simple ranges: node[1-5] -> node1, node2, node3, node4, node5
- Zero-padded: node[01-05] -> node01, node02, node03, node04, node05
- Comma-separated: node[1,3,5] -> node1, node3, node5
- Mixed: node[1-3,7] -> node1, node2, node3, node7
- Cartesian product: rack[1-2]-node[1-3] -> 6 hosts
- File input: ^/path/to/file reads hosts from file

Integrate with -H option, --filter, and --exclude options.
Add 52 unit tests for hostlist module.
@inureyes added the type:enhancement (New feature or request), status:review (Under review), and pdsh-compat (pdsh compatibility mode features) labels on Dec 16, 2025
@inureyes (Member, Author) commented

Security & Performance Review

Analysis Summary

  • Scope: changed-files (10 files)
  • Languages: Rust
  • Total issues: 5
  • Critical: 0 | High: 1 | Medium: 3 | Low: 1

Prioritized Issue Roadmap

HIGH - Potential Resource Exhaustion via Hostfile

File: /Users/inureyes/Development/backend.ai/bssh/src/hostlist/parser.rs (lines 388-410)

Issue: The parse_hostfile function reads an entire file into memory without any size limit. A malicious or accidentally large hostfile (via ^/path/to/file syntax) could cause memory exhaustion.

Current code:

pub fn parse_hostfile(path: &Path) -> Result<Vec<String>, HostlistError> {
    let content = std::fs::read_to_string(path).map_err(|e| {
        // ... error handling
    })?;
    // No size check before processing
    // ... (rest of function elided)
}

Recommendation: Add a maximum file size check before reading:

const MAX_HOSTFILE_SIZE: u64 = 1_048_576; // 1MB limit
let metadata = std::fs::metadata(path).map_err(/* ... */)?;
if metadata.len() > MAX_HOSTFILE_SIZE {
    return Err(HostlistError::FileTooLarge { path: path.display().to_string(), limit: MAX_HOSTFILE_SIZE });
}
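
A metadata check and the subsequent read are two separate steps, and the size reported for special files such as devices may not reflect how much data a read returns. One complementary option, sketched here under assumptions, is to bound the read itself with `Read::take`. The function name `read_hostfile_bounded` and the use of `io::Error` instead of `HostlistError` are hypothetical and chosen to keep the sketch self-contained.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Illustrative limit matching the 1 MB suggested in the review.
const MAX_HOSTFILE_SIZE: u64 = 1_048_576;

/// Read a hostfile, refusing anything larger than `limit` bytes.
/// (Hypothetical helper; the PR's parse_hostfile may be structured differently.)
fn read_hostfile_bounded(path: &Path, limit: u64) -> io::Result<String> {
    let mut bytes = Vec::new();
    // Read at most limit + 1 bytes: enough to detect an oversized file
    // without ever buffering more than that.
    File::open(path)?
        .take(limit.saturating_add(1))
        .read_to_end(&mut bytes)?;
    if bytes.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "hostfile exceeds size limit",
        ));
    }
    // Validate UTF-8 only after the size check has passed.
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

With this shape the memory bound holds regardless of what the file metadata reports, at the cost of reading up to `limit + 1` bytes before rejecting an oversized file.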

MEDIUM - Code Duplication

Files:

  • /Users/inureyes/Development/backend.ai/bssh/src/main.rs (lines 243-310)
  • /Users/inureyes/Development/backend.ai/bssh/src/app/nodes.rs (lines 353-423)

Issue: The functions is_hostlist_expression and looks_like_hostlist_range are duplicated verbatim in both files. This violates the DRY principle and increases the maintenance burden.

Recommendation: Keep only the implementation in src/app/nodes.rs and make it public, or move these utility functions into the hostlist module itself for reuse:

// In src/hostlist/mod.rs
pub fn is_hostlist_expression(pattern: &str) -> bool { ... }
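
The bodies of these two functions are not shown in this review, so the following is only a guess at what a shared detection helper could look like once it lives in one place. The heuristic (a bracket group containing only digits, commas, and hyphens, with at least one digit) and the signature of `looks_like_hostlist_range` are assumptions, not the PR's actual logic.

```rust
/// Hypothetical check: does the text between a pair of brackets look like
/// a numeric range list such as "1-5", "01-05", or "1,3,5"?
pub fn looks_like_hostlist_range(body: &str) -> bool {
    body.chars().any(|c| c.is_ascii_digit())
        && body.chars().all(|c| c.is_ascii_digit() || c == ',' || c == '-')
}

/// Hypothetical check: does the pattern contain at least one bracket group
/// that looks like a hostlist range (as opposed to, say, a glob class like "[abc]")?
pub fn is_hostlist_expression(pattern: &str) -> bool {
    let mut rest = pattern;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        match after.find(']') {
            Some(close) => {
                if looks_like_hostlist_range(&after[..close]) {
                    return true;
                }
                // Keep scanning after this bracket group.
                rest = &after[close + 1..];
            }
            // An unclosed bracket is not treated as a hostlist expression here.
            None => return false,
        }
    }
    false
}
```

Keeping a single copy like this means main.rs and app/nodes.rs cannot drift apart in how they classify a pattern.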

MEDIUM - Unused Sign Variable

File: /Users/inureyes/Development/backend.ai/bssh/src/hostlist/parser.rs (lines 354-376)

Issue: The sign variable is computed but then explicitly ignored with let _ = sign;. This is dead code that should be cleaned up.

Current code:

let (sign, digits) = if let Some(rest) = s.strip_prefix('-') {
    (-1, rest)
} else {
    (1, s)
};
// ...
let _ = sign; // value already includes sign from parse

Recommendation: Remove the unnecessary sign calculation, since s.parse::<i64>() already handles negative numbers:

let digits = s.strip_prefix('-').unwrap_or(s);
let padding = if digits.len() > 1 && digits.starts_with('0') {
    digits.len()
} else {
    0
};
let value: i64 = s.parse().map_err(/* ... */)?;
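
For reference, the recommended fragment can be wrapped into a complete function. The return type (a value and padding-width pair) and the use of `ParseIntError` in place of the elided `map_err` are assumptions for this sketch; the real parse_number presumably maps into HostlistError.

```rust
use std::num::ParseIntError;

/// Parse a range endpoint, returning its value and zero-padding width.
/// Padding is the digit count when the number has a leading zero, otherwise 0.
/// (Signature is an assumption; the PR's parse_number may differ.)
fn parse_number(s: &str) -> Result<(i64, usize), ParseIntError> {
    // Only the digits matter for padding; the sign is handled by parse().
    let digits = s.strip_prefix('-').unwrap_or(s);
    let padding = if digits.len() > 1 && digits.starts_with('0') {
        digits.len()
    } else {
        0
    };
    let value: i64 = s.parse()?;
    Ok((value, padding))
}
```

Here `"007"` gives a value of 7 with padding 3, while a lone `"0"` gives padding 0 because a single digit is not considered padded.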

MEDIUM - Missing Overflow Protection in Cartesian Product

File: /Users/inureyes/Development/backend.ai/bssh/src/hostlist/expander.rs (lines 92-128)

Issue: The expand_segments function calls Vec::with_capacity(results.len() * values.len()), and that multiplication could overflow on extremely large inputs. MAX_EXPANSION_SIZE (100,000) provides some protection, but the check runs only once, before expansion starts. When multiple ranges are processed sequentially, intermediate allocations could still be large.

Current protection:

// Only checks final count before starting expansion
let expansion_count = pattern.expansion_count();
if expansion_count > MAX_EXPANSION_SIZE { ... }

Recommendation: Add a check within the expansion loop or use checked_mul:

let new_capacity = results.len()
    .checked_mul(values.len())
    .ok_or_else(|| HostlistError::RangeTooLarge { /* ... */ })?;
if new_capacity > MAX_EXPANSION_SIZE {
    return Err(HostlistError::RangeTooLarge { /* ... */ });
}
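
The same idea can be expressed as a standalone size check that folds over the per-segment value counts and stops at the first overflow or limit violation. The function name `checked_expansion_len` and the `Option` return type are assumptions for illustration; the constant mirrors the 100,000 limit cited above.

```rust
/// Limit cited in the review; redefined here so the sketch is self-contained.
const MAX_EXPANSION_SIZE: usize = 100_000;

/// Multiply the number of values in each bracket segment, returning None as soon
/// as the running product overflows usize or exceeds MAX_EXPANSION_SIZE.
/// (Hypothetical helper, not the PR's expand_segments.)
fn checked_expansion_len(segment_sizes: &[usize]) -> Option<usize> {
    segment_sizes.iter().try_fold(1usize, |acc, &n| {
        acc.checked_mul(n)
            .filter(|&total| total <= MAX_EXPANSION_SIZE)
    })
}
```

Because the limit is enforced after every multiplication, no intermediate product larger than the limit is ever used as an allocation size.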

LOW - Potential Off-by-One Edge Case in Range Count

File: /Users/inureyes/Development/backend.ai/bssh/src/hostlist/parser.rs (lines 114-122)

Issue: The RangeItem::count() method correctly handles the case where end < start by returning 0, but this is inconsistent with the parser which explicitly rejects reversed ranges. The dead code path suggests a potential logic gap.

Current code:

fn count(&self) -> usize {
    match self {
        RangeItem::Single(_) => 1,
        RangeItem::Range { start, end } => {
            if end >= start {
                (end - start + 1) as usize
            } else {
                0  // This code path is dead since parser rejects reversed ranges
            }
        }
    }
}

Recommendation: Consider using .saturating_sub() for defensive programming, or add a debug assertion:

RangeItem::Range { start, end } => {
    debug_assert!(end >= start, "Reversed ranges should be rejected by parser");
    (*end - *start + 1) as usize
}
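
A third variant, shown here only as a sketch, makes the invalid cases explicit through checked arithmetic instead of an assertion. It assumes the fields are `i64` (consistent with the parsing code above) and changes the return type to `Option<usize>`, which is a departure from the `count()` signature in the PR.

```rust
/// Minimal stand-in for the PR's RangeItem; field types are assumed to be i64.
enum RangeItem {
    Single(i64),
    Range { start: i64, end: i64 },
}

impl RangeItem {
    /// Number of values in this item, or None if the range is reversed
    /// or the count does not fit in usize.
    fn count(&self) -> Option<usize> {
        match *self {
            RangeItem::Single(_) => Some(1),
            RangeItem::Range { start, end } => {
                // checked_sub guards against i64 overflow for extreme endpoints.
                let span = end.checked_sub(start)?;
                // A negative span (reversed range) fails the conversion to usize.
                usize::try_from(span).ok()?.checked_add(1)
            }
        }
    }
}
```

This keeps the behavior well defined in release builds, where a debug assertion would be compiled out.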

Positive Findings

The implementation demonstrates several security best practices:

  1. Input Validation: Strong validation in src/app/nodes.rs for patterns (length limits, wildcard limits, character validation, path traversal detection)

  2. Range Size Limits: The MAX_EXPANSION_SIZE constant (100,000) prevents denial-of-service through excessive expansion

  3. Error Handling: Comprehensive error types with context using thiserror for clear error messages

  4. Test Coverage: 52 unit tests for hostlist module + 29 integration tests for node filtering, covering edge cases and error conditions

  5. No Injection Vulnerabilities: The parser correctly handles bracket matching without allowing injection of shell metacharacters


Test Coverage Assessment

Well covered:

  • Simple and complex range expansion
  • Zero-padded ranges
  • Cartesian product expansion
  • Error cases (empty brackets, reversed ranges, invalid numbers)
  • Pattern matching and filtering

Could benefit from additional tests:

  • Very long patterns at boundary limits
  • Unicode characters in hostnames
  • Empty file input (^/dev/null equivalent)
  • Symlink following for hostfile paths

Summary

This is a well-implemented feature with good test coverage and security considerations. The main issues are:

  1. One HIGH severity issue around unbounded file reading
  2. Code duplication that should be refactored
  3. Minor cleanup of unused code

Overall, the PR is in good shape for merge after addressing the HIGH severity hostfile size limit issue.

@inureyes added the status:done (Completed) label and removed the status:review (Under review) label on Dec 16, 2025
@inureyes self-assigned this on Dec 16, 2025

* fix: Address PR review findings - resource limits and code cleanup

HIGH Priority Fixes:
- Add resource exhaustion protection in parse_hostfile():
  * Maximum file size limit of 1 MB
  * Maximum line count limit of 100,000 lines
  * Check file size before reading to prevent DoS attacks

MEDIUM Priority Fixes:
- Remove code duplication:
  * Move is_hostlist_expression() and looks_like_hostlist_range()
    from src/main.rs and src/app/nodes.rs to src/hostlist/mod.rs
  * Export functions as public API from hostlist module
  * Update all call sites to use the exported functions

- Remove unused variable:
  * Remove unused 'sign' variable from parse_number() in parser.rs

- Add overflow protection:
  * Use checked_mul() for cartesian product allocations in expander.rs
  * Return RangeTooLarge error if overflow would occur
  * Prevent integer overflow in intermediate calculations

All changes compile successfully and pass tests (cargo test, cargo clippy).

* docs: Add HOSTLIST EXPRESSIONS section and update option docs in manpage
@inureyes merged commit 5c5f1f0 into main on Dec 16, 2025
1 of 2 checks passed
@inureyes deleted the feature/issue-98-hostlist-expression branch on December 16, 2025 23:52

Labels

  • pdsh-compat: pdsh compatibility mode features
  • status:done: Completed
  • type:enhancement: New feature or request

Development

Successfully merging this pull request may close these issues.

feat: Add pdsh-style hostlist expression support (range expansion)

1 participant