Use std::unordered_set instead of set in blockfilter interface#14074
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
I don't think util.h is the best place for a generic utility data structure such as this, it's more for assorted operating system functions
There was a problem hiding this comment.
@laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?
There was a problem hiding this comment.
or maybe support/bytevectorhash.h ?
unless there's a strong need to group this with other things
There was a problem hiding this comment.
ACK on moving siphash to a separate unit
aa37070 to
f460835
Compare
|
utACK f4608359e4e643c6e197df822e03226146e37a49 |
There was a problem hiding this comment.
Initialize count and tmp using default member initializers or in the member initializer list of the constructor?
There was a problem hiding this comment.
This is a move-only commit. I'd rather not modify the contents of the code here.
|
@ryanofsky can you take a look here please? according to @jimpo this was your suggestion |
promag
left a comment
There was a problem hiding this comment.
@jimpo can you improve PR description of the advantages of this change (beside linking @ryanofsky suggestion)?
Do you have numbers to support this change?
Left some nits.
There was a problem hiding this comment.
This may be used as the hash.. ?
There was a problem hiding this comment.
I think it's more clear to leave it explicit even if the default for class is private.
There was a problem hiding this comment.
It's used in SaltedTxidHasher
There was a problem hiding this comment.
Missing includes <stdint.h> and <vector>.
ryanofsky
left a comment
There was a problem hiding this comment.
utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps more places in the future. I left a suggestion below, but also think this change looks good as it is.
There was a problem hiding this comment.
In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)
The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.
I'd suggest:
- Mentioning in this comment that
operator()overloads for new types could be added here in the future. - Changing the name of the class from
ByteVectorHashto something likeRandomizedSipHashto avoid tying it tostd::vector. - Maybe replacing
std::vectorwithSpan.
There was a problem hiding this comment.
I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.
7406129 to
cfabca8
Compare
|
utACK cfabca8749a132220e649d4563fb876afc21ceec |
|
Just one not comment: I always imagined the things in 'support' as low level features that are independently usable, and find it a bit strange to put something there that depends on |
|
@sipa I'd prefer to start creating more directory structure. How would you feel about a |
|
@jimpo Sounds great, but perhaps something to do independently? I'd prefer to not start a new convention, and then have it linger in a half finished state if there's unclarity about how to move forward. |
ryanofsky
left a comment
There was a problem hiding this comment.
utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.
2068f08 scripted-diff: Move util files to separate directory. (Jim Posen) Pull request description: As discussed [here](#14074 (comment)), this establishes a `util/` directory to introduce more organizational structure and have a clear place for new util files. It's really not scary to review, it's just one big scripted diff. Tree-SHA512: 39cf15480d7d35e987b6088d52a857a2d5b1802e36c6b815eb42718d80cd95e669757af9bcc7c04426cd8523662cb1050b8da1e2377d3730672820ed298b894b
This is a move-only commit with the exception of changes to includes.
cfabca8 to
fef5adc
Compare
|
@sipa I moved bytevectorhash to util/ now. |
|
utACK fef5adc |
…terface fef5adc blockfilter: Use unordered_set instead of set in blockfilter. (Jim Posen) 4fb789e Extract CSipHasher to it's own file in crypto/ directory. (Jim Posen) Pull request description: Use `std::unordered_set` (hash set) instead of `std::set` (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured. This moves `CSipHasher` to it's own file `crypto/siphash.h`, so that it can be used in the libbitcoin_util library without including `hash.{h,cpp}`. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is. Tree-SHA512: 593d1abda771e45f2860d5334272980d20df0b81925a402bb9ee875e17595c2517c0d8ac9c579218b84bbf66e15b49418241c1fe9f9265719bcd2377b0cd0d88
partial bitcoin#15638, bitcoin#21966, bitcoin#16889, merge bitcoin#14555, bitcoin#20499, bitcoin#14074, bitcoin#17073: util refactoring
Use
std::unordered_set(hash set) instead ofstd::set(tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.This moves
CSipHasherto it's own filecrypto/siphash.h, so that it can be used in the libbitcoin_util library without includinghash.{h,cpp}. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.