Skip to content

chore: bloom filter opt#1811

Merged
sergeytimoshin merged 1 commit intomainfrom
jorrit/chore-bloom-filter-opt
Jun 15, 2025
Merged

chore: bloom filter opt#1811
sergeytimoshin merged 1 commit intomainfrom
jorrit/chore-bloom-filter-opt

Conversation

@ananas-block
Copy link
Contributor

@ananas-block ananas-block commented Jun 15, 2025

Issue:

  • we use 2 hashes in bloom filter hashing, this is not strictly necessary

Solution:

  • Remove one of the hashes
  • switch to no-std-keccak hashes which are a couple CU cheaper than fastmurmur3
  • use the complete value as keccak hash input and use the complete keccak output
    -> this makes it harder to construct an account to produce a bloom filter collision

Summary by CodeRabbit

  • Refactor

    • Updated hashing implementation to use a Keccak-based approach, replacing previous hashing methods.
    • Simplified and unified hashing logic for improved maintainability and consistency across platforms.
  • Chores

    • Updated dependencies to remove unused libraries and add the new Keccak hashing library.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 15, 2025

Walkthrough

The changes replace the use of the fastmurmur3 hashing crate with the solana-nostd-keccak crate in both the bloom filter and hasher libraries. Hashing logic in the bloom filter and hasher implementations is updated to use Keccak-based hashing, removing platform-specific code and simplifying hash computation.

Changes

File(s) Change Summary
program-libs/bloom-filter/Cargo.toml Replaced fastmurmur3 dependency with solana-nostd-keccak.
program-libs/bloom-filter/src/lib.rs Replaced probe_index_fast_murmur with probe_index_keccak, changed hashing input and index calculation to use Keccak. Updated insert methods accordingly.
program-libs/hasher/Cargo.toml Added solana-nostd-keccak as a dependency.
program-libs/hasher/src/keccak.rs Refactored hashv to delegate hashing to solana_nostd_keccak::hashv, removing platform-specific and unsafe code.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant BloomFilter
    participant KeccakHasher

    Caller->>BloomFilter: probe_index_keccak(value_bytes, iteration, capacity)
    BloomFilter->>KeccakHasher: hash(concatenated value_bytes + iteration)
    KeccakHasher-->>BloomFilter: keccak_hash_result
    BloomFilter->>BloomFilter: Compute index from hash chunks
    BloomFilter-->>Caller: index
Loading
sequenceDiagram
    participant Caller
    participant KeccakHasher
    participant solana_nostd_keccak

    Caller->>KeccakHasher: hashv(input_slices)
    KeccakHasher->>solana_nostd_keccak: hashv(input_slices)
    solana_nostd_keccak-->>KeccakHasher: hash_result
    KeccakHasher-->>Caller: hash_result
Loading

Poem

A hop and a skip, the hashes now leap,
From murmur to Keccak, a change that runs deep.
No more unsafe, no platform divide,
Just Keccak’s strong whiskers, in code we confide.
With hashes now swirling as rabbits might play,
The bloom and the hasher both brighten their day!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31721fa and 287e260.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • program-libs/bloom-filter/Cargo.toml (1 hunks)
  • program-libs/bloom-filter/src/lib.rs (2 hunks)
  • program-libs/hasher/Cargo.toml (1 hunks)
  • program-libs/hasher/src/keccak.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • program-libs/hasher/Cargo.toml
  • program-libs/bloom-filter/Cargo.toml
  • program-libs/hasher/src/keccak.rs
  • program-libs/bloom-filter/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: lint
  • GitHub Check: stateless-js
  • GitHub Check: Test concurrent-merkle-tree
  • GitHub Check: Test sdk-libs
  • GitHub Check: Test program-libs
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ananas-block ananas-block force-pushed the jorrit/chore-bloom-filter-opt branch from 8fffb35 to c478e32 Compare June 15, 2025 15:39
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
program-libs/bloom-filter/Cargo.toml (1)

15-15: Same feature-flag concern as in light-hasher

Mirror the minimal-feature declaration here as well to keep both crates in sync and avoid duplicate code paths being built.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55ee3cd and c478e32.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • program-libs/bloom-filter/Cargo.toml (1 hunks)
  • program-libs/bloom-filter/src/lib.rs (1 hunks)
  • program-libs/hasher/Cargo.toml (1 hunks)
  • program-libs/hasher/src/keccak.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Test program-libs
  • GitHub Check: stateless-js
  • GitHub Check: Test concurrent-merkle-tree
  • GitHub Check: Test sdk-libs
  • GitHub Check: lint
🔇 Additional comments (1)
program-libs/hasher/Cargo.toml (1)

26-26: Pinning solana-nostd-keccak without feature flags may bloat the WASM/BPF build

solana-nostd-keccak = "0.1.3" pulls in all default features of the crate.
If the crate exposes std / alloc gated features, they are compiled for the BPF target unless you explicitly disable them, increasing binary size.

-solana-nostd-keccak = "0.1.3"
+solana-nostd-keccak = { version = "0.1.3", default-features = false, features = ["keccak"] }

Validate the exact feature set the crate provides and enable only what’s needed.

@ananas-block ananas-block force-pushed the jorrit/chore-bloom-filter-opt branch 2 times, most recently from 594da88 to 31721fa Compare June 15, 2025 15:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
program-libs/bloom-filter/src/lib.rs (1)

71-76: ⚠️ Potential issue

Still prone to slice-length & 32-vs-64 bit panics

The two root problems flagged in the previous review are still present:

  1. value_bytes is a generic slice. copy_from_slice will panic if the caller passes fewer than 32 bytes.
  2. iteration is usize; on a 32-bit target to_le_bytes() yields only 4 bytes, but you blindly copy 8 bytes into combined_bytes[32..], causing another panic.
-pub fn probe_index_keccak(value_bytes: &[u8], iteration: usize, capacity: &u64) -> usize {
-    let iter_bytes = iteration.to_le_bytes();
+pub fn probe_index_keccak(value_bytes: &[u8; 32], iteration: u64, capacity: u64) -> usize {
+    let iter_bytes = iteration.to_le_bytes();          // always 8 bytes

(you can still take &capacity at the call-site if you prefer).

Failing to fix these will crash in production the first time an undersized slice is supplied or the code is compiled for a 32-bit environment.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c478e32 and 31721fa.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • program-libs/bloom-filter/Cargo.toml (1 hunks)
  • program-libs/bloom-filter/src/lib.rs (2 hunks)
  • program-libs/hasher/Cargo.toml (1 hunks)
  • program-libs/hasher/src/keccak.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • program-libs/hasher/Cargo.toml
  • program-libs/bloom-filter/Cargo.toml
  • program-libs/hasher/src/keccak.rs
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: stateless-js
  • GitHub Check: Test sdk-libs
  • GitHub Check: Test concurrent-merkle-tree
  • GitHub Check: Test program-libs
  • GitHub Check: lint
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
program-libs/bloom-filter/src/lib.rs (1)

104-107: Surface API and internal helper are now inconsistent

_insert passes a &[u8;32] to probe_index_keccak, which currently accepts &[u8]. Once the helper’s signature is tightened (see first comment), remember to update this call site; otherwise you’ll get a mismatched-type compile error.

Comment on lines +79 to +84
let mut index = 0u64;
for chunk in hash.chunks(8) {
let value = u64::from_le_bytes(chunk.try_into().unwrap());
index = value.wrapping_add(index) % *capacity;
}
index as usize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Potential usize overflow & extra modulo work

index is accumulated as u64 but returned as usize. On a 32-bit build capacity may legitimately exceed usize::MAX, making the cast truncating and producing out-of-range bit indices (bits[index] will panic). Either:

  1. Store capacity as usize everywhere, or
  2. Keep the function u64 throughout and check/guard before indexing.

Also, taking mod for every 8-byte chunk is unnecessary; taking a single u64 from the hash and modding once is sufficient and faster.

🤖 Prompt for AI Agents
In program-libs/bloom-filter/src/lib.rs around lines 79 to 84, the code
accumulates an index as u64 and then casts it to usize, which can cause overflow
on 32-bit systems if capacity exceeds usize::MAX, leading to panics when
indexing. To fix this, either change capacity to usize throughout the code to
ensure safe indexing or keep the index as u64 and add checks before indexing to
prevent out-of-range access. Additionally, optimize by removing the modulo
operation inside the loop and instead take a single u64 from the hash and apply
modulo once after accumulation to improve performance.

@ananas-block ananas-block force-pushed the jorrit/chore-bloom-filter-opt branch from 31721fa to 39b6144 Compare June 15, 2025 16:06
@ananas-block ananas-block marked this pull request as draft June 15, 2025 16:46
@ananas-block ananas-block force-pushed the jorrit/chore-bloom-filter-opt branch from 39b6144 to 287e260 Compare June 15, 2025 16:55
@ananas-block ananas-block marked this pull request as ready for review June 15, 2025 16:56
@sergeytimoshin sergeytimoshin merged commit 983f687 into main Jun 15, 2025
16 of 17 checks passed
@sergeytimoshin sergeytimoshin deleted the jorrit/chore-bloom-filter-opt branch June 15, 2025 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants