feat(merk): support large value opcodes in encoding and decoding #406
📝 Walkthrough

Adds large-value proof encoding/decoding support: new large opcodes for Push (0x20–0x25) and PushInverted (0x28–0x2d), conditional headers using 16-bit or 32-bit value lengths based on size (MAX_VALUE_LEN = 64 MB), and corresponding updates to encoding, decoding, and length calculation.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
merk/src/proofs/encoding.rs (1)
895-1593: Missing test coverage for large value variants.

The test suite covers small value encoding/decoding thoroughly, but there are no tests for the new large value opcodes (0x20-0x25, 0x28-0x2d). Consider adding tests that:
- Encode/decode values at the boundary (exactly 65536 bytes)
- Encode/decode values larger than 65536 bytes
- Verify round-trip encoding/decoding for each large value variant
Do you want me to generate test cases for the large value variants?
🤖 Fix all issues with AI agents
In `@merk/src/proofs/encoding.rs`:
- Around line 629-804: The decoding paths for opcodes 0x20-0x25 and 0x28-0x2d
allocate value buffers using an unchecked u32 `value_len`, allowing a
maliciously large length to trigger memory exhaustion; add a constant
MAX_VALUE_LEN (e.g. usize) and validate `value_len as usize <= MAX_VALUE_LEN`
immediately after decoding `value_len` and before vec allocation in all branches
handling Node::KV, Node::KVValueHash, Node::KVRefValueHash,
Node::KVValueHashFeatureType, Node::KVCount, and Node::KVRefValueHashCount (both
Push and PushInverted variants); factor the logic into a small helper (e.g.
read_validated_value or ensure_max_value_len) to perform
decode->validate->allocate->read_exact and return a Decode error if the length
exceeds the max to avoid duplication.
🧹 Nitpick comments (1)
merk/src/proofs/encoding.rs (1)
33-275: Consider extracting helper functions to reduce code duplication.

The encoding logic for small vs. large values is repeated for each variant with minor differences. A helper function could handle the common pattern:
```rust
fn write_kv_header<W: Write>(
    dest: &mut W,
    small_opcode: u8,
    large_opcode: u8,
    key: &[u8],
    value_len: usize,
) -> ed::Result<()> {
    if value_len < 65536 {
        dest.write_all(&[small_opcode, key.len() as u8])?;
        dest.write_all(key)?;
        (value_len as u16).encode_into(dest)
    } else {
        dest.write_all(&[large_opcode, key.len() as u8])?;
        dest.write_all(key)?;
        (value_len as u32).encode_into(dest)
    }
}
```

This would reduce the repetitive branching logic across 12 match arms.
Actionable comments posted: 0
⚠️ Outside diff range comments (1)
merk/src/proofs/encoding.rs (1)
41-148: Enforce MAX_VALUE_LEN during encoding to avoid invalid proofs and u32 truncation.

`value.len()` is written as `u32` without a cap; values > 64 MB (or > `u32::MAX`) will truncate and produce proofs the decoder rejects. Add a guard and fail early in all value-bearing branches.

🛠️ Suggested fix (apply to all value-bearing variants)
```diff
 Op::Push(Node::KV(key, value)) => {
     debug_assert!(key.len() < 256);
-    if value.len() < 65536 {
+    if value.len() > MAX_VALUE_LEN as usize {
+        return Err(ed::Error::IOError(std::io::Error::new(
+            std::io::ErrorKind::InvalidInput,
+            "value length exceeds MAX_VALUE_LEN",
+        )));
+    }
+    if value.len() < 65536 {
         dest.write_all(&[0x03, key.len() as u8])?;
         dest.write_all(key)?;
         (value.len() as u16).encode_into(dest)?;
         dest.write_all(value)?;
     } else {
         dest.write_all(&[0x20, key.len() as u8])?;
         dest.write_all(key)?;
         (value.len() as u32).encode_into(dest)?;
         dest.write_all(value)?;
     }
 }
```

Also applies to: 169-280
This update introduces support for large value opcodes in the encoding and decoding processes. Values greater than or equal to 64 KB are now handled using a `u32` for value length instead of a `u16`.

The changes include:
These modifications ensure that the system can now efficiently manage larger data sizes while maintaining backward compatibility with existing smaller values.