Conversation
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
|
@ccaraman do you mind just marking down the notes we just went over so they don't get duplicated by other reviewers? Awesome stuff! |
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
| // avoid resizing while building the trie. | ||
| // TODO(ccaraman): Define a better buffer value for the trie_ size. | ||
| // TODO(ccaraman): Define a better buffer value for the trie_ size. | ||
| // The value(2000000) was used in the original implementation in |
There was a problem hiding this comment.
@brian-pane Do you have any insight as to why 2000000 was chosen? Is there a better way to compute a buffer given the fill_factor?
There was a problem hiding this comment.
In the original implementation, I think the author just picked 2000000 as a really big number that they thought would be big enough to handle any situation. And then, after populating the buffer, they resized it down to the amount of space they actually ended up using.
I don't have a good way to compute the buffer size in advance, since it depends on the exact shape of the tree. However, I think it's okay to not pre-size the buffer. I think std::vector uses a power-of-two allocator internally, so the number of resize operations will be O(log(N)) for N prefixes. So my proposal is to just use std::vector, see how fast it is, and then optimize if needed.
There was a problem hiding this comment.
That sounds good! I prefer your suggestion. I'll change the comment to say how this diverged from the reference implementation.
There was a problem hiding this comment.
I decided to not remove the resizing due to the accessing pattern used across the code base. Once there are performance benchmark tests, the resizing the vector can be something we look to improve if needed.
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
|
@junr03 Please take a first pass. |
| private: | ||
| /** | ||
| * Extract n bits from input starting at position p. | ||
| * @param position supplies the position. |
There was a problem hiding this comment.
It's called position in the comment but p in the declaration.
| * Extract n bits from input starting at position p. | ||
| * @param position supplies the position. | ||
| * @param n supplies the number of bits to extract. | ||
| * @param input supplies the |
There was a problem hiding this comment.
... "IP address from which to extract the bits," maybe?
There was a problem hiding this comment.
Thank you for catching that. I'll update.
| // The IP's are stored in network byte order. By shifting the value to the left by p bits, the | ||
| // bits between 0 and p-1 are zero'd out. Then to get the n bits, shift the IP back by the | ||
| // address_size minus the number of desired bits. | ||
| return input << p >> (address_size - n); |
There was a problem hiding this comment.
I think this may be unsafe. C doesn't specify whether a right shift is logical (high order bits get filled with zeros) or arithmetic (high order bits get filled with a copy of whatever the input high order bit was).
It's probably safest to bitwise-and the result of these shifts with a mask consisting of n ones in the low order bits.
There was a problem hiding this comment.
I double checked the c++ standard and for unsigned left/right shifting is safe and produces the values we expect. In this version of the pr, they were int's. I've change the methods to be uint32_t.
| return input << n >> n; | ||
| } | ||
|
|
||
| template <class IpType, uint32_t address_size = 8 * sizeof(IpType)> struct IpPrefixes { |
There was a problem hiding this comment.
Should this struct be named "IpPrefix," singular, since it holds just one prefix?
There was a problem hiding this comment.
Yes, that is clearer. I've updated it to be IpPrefix.
| * http://www.csc.kth.se/~snilsson/software/router/C/ | ||
| * | ||
| * Note: This trie handles Ipv4 and Ipv6 separately. | ||
| * Note: The trie can only support up 524288(2^19) nodes. Please see LcNode for further |
There was a problem hiding this comment.
Would you be open to making this unlimited? The global routing table currently contains 700,000 prefixes, so it won't fit within the 2^19 limit. I know Envoy itself is unlikely to ever have that many prefixes configured; but if this implementation can support more nodes, it could be useful to people working on other projects someday.
There was a problem hiding this comment.
I should note, though, that making this unlimited would require using a uint64_t instead of uint32_t for LcNode, so the LC trie would take up more cache space.
There was a problem hiding this comment.
I'm am open to it. I don't think we need it now. I've added a comment to change the data structure for LcNode to be uint64_t if any consumer requires it.
There was a problem hiding this comment.
To summarize our offline conversation, the code will stay with uint32_t as the underlying LcNode structure.
To avoid getting an out of range exception, I'll add a check at the beginning of every buildRecursive call to ensure the size is > the position being accessed in the Lc-Trie that way if the number of trie_ nodes required is greater than 2^20 the code will throw an exception. If this case is it, then the uint64_t conversation should be revisited.
Once there are larger data sets and perf tests, there can be an effort to see if using uint64_t is worth supporting larger set of prefixes. Another benefit is then there can be some experimentation in determining what the buffer can be.
| * | ||
| * The LcNode has three parts to it | ||
| * - Branch: the first 5 bits represent the branching factor. This supports a branching factor | ||
| * of 2^31. |
There was a problem hiding this comment.
The max branching factor is just 31 rather than 2^31, right?
There was a problem hiding this comment.
Yes, that is another typo (and compression of the logic 2^5-1 = 31).
| * of 2^31. | ||
| * - Skip: the next 7 bits represent the number of bits to skip when looking at an IP address. | ||
| * This value can be between 0 and 127, so IPv6 is supported as well. | ||
| * - Address: the remaining 30 bits represent the pointer. If branch value is not zero, this |
There was a problem hiding this comment.
If I'm counting right, it seems like there are only 20 remaining bits. 32 bits minus 5 for the branching factor, minus 7 for the skip count.
There was a problem hiding this comment.
Yes, that is a typo. I'll fix that.
| /** | ||
| * @return Returns the value at bits 6-12. | ||
| */ | ||
| static LcNode getSkip(LcNode node) { return node >> 20 & 127; } |
There was a problem hiding this comment.
In order to make code like this less error-prone, the mechanism I usually use is:
struct LcNode {
uint32_t branch:5;
uint32_t skip:7;
uint32_t address:20;
};
Then you can just refer to fields in the struct by name, and let the compiler deal with all the bit shifting.
I don't know if bitfield structs like that are encouraged or discouraged in Envoy, though.
There was a problem hiding this comment.
Thank you for the suggestion. This is cleaner. I've updated the code to use your proposed structure.
| } | ||
|
|
||
| // The node used for the trie uses the last 20 bits of uint32_t. The trie cannot support | ||
| // an input greater than 524288(2^19) nodes. |
There was a problem hiding this comment.
I think it might be possible for a trie built from 2^19 prefixes to end up with more than 2^20 nodes, if the fill_factor is set to something less than 0.5. Hmmm, maybe the safe limit is 2^20 * fill_factor, but I haven't thought through it fully, so please don't take my word for it. :)
| std::sort(tag_data.begin(), tag_data.end()); | ||
| ip_prefixes_.push_back(tag_data[0]); | ||
|
|
||
| // Remove duplicate entries and nested prefixes. |
There was a problem hiding this comment.
Does the original LC trie code work with nested prefixes? I know the corresponding paper didn't use any nested prefixes in its examples, but it isn't clear to me if the design can handle nested prefixes without modifications.
There was a problem hiding this comment.
I'll update the comment here and for the class and method definition. Currently, the code will throw an exception if there are nested prefixes. I have a todo to make the exception show which prefixes are nested.
The future implementation for nested prefixes won't modify the algorithm or the underlying LcNode. There are two options to handle it:
- Have a sorted list of the nested prefixes
- Use another lc-trie for the nested prefixes
The most general prefix will have a way of accessing one of those options once it is implemented.
There was a problem hiding this comment.
FWIW, I would do the implementation that was done in the original C implementation. IMO it's very easy to understand and will perform very well (linked list of prefixes, accessed from trie node). It will also perform well for our case where 0 prefixes will be by far the most common case and 1 prefix will be the next most common case by a huge margin.
| EXPECT_EQ("", trie.search(Utility::parseInternetAddress("1.2.3.4"))); | ||
| } | ||
|
|
||
| TEST(LcTrie, BothIpvVersions) { |
There was a problem hiding this comment.
In practice, will IPv4 and IPv6 need separate trees? I'm thinking about the case where a user has configured an IPv4 prefix with length 24, and then they get a request from an IPv6 address whose first 24 bits happen to match the prefix. In that scenario, the user will probably want the IPv6 address to not match the IPv4 prefix.
There was a problem hiding this comment.
Oops, never mind; I just realized the LCTrie maintains two separate LCTrieInternal objects, so it won't accidentally match v6 addresses against v4 prefixes.
There was a problem hiding this comment.
I'll improve the comments for the LcTrie to make this clearer. Thank you for the feedback.
|
Thanks for implementing this! In case it's useful for unit tests, I computed the IPv4 CIDR prefixes corresponding to the examples in the original paper. These prefixes probably don't exercise 100% of the trie-building code paths, but they do result in a trie with a mix of different branching factors (figure 3 from the paper). |
| ipv6_trie_.reset(new LcTrieInternal<Ipv6>(ipv6_prefixes, fill_factor, root_branching_factor)); | ||
| } | ||
|
|
||
| std::string LcTrie::search(Network::Address::InstanceConstSharedPtr ip_address) const { |
There was a problem hiding this comment.
drive by quick comment: const Network::Address::Instance&.
Will let you get through @brian-pane and @junr03 reviews before looking more in-depth. Looks great though!
| } | ||
|
|
||
| if (k == 0) { | ||
| if (p == first + n) { |
There was a problem hiding this comment.
Do the unit tests currently cover both this and its "else" case? I ask because the handling of the "k==0" situation (which requires interpolating the values for new child nodes that have to be created when the number of children for a node is less than 2^branch_) seems like one of the trickiest parts of the LC Trie construction.
There was a problem hiding this comment.
I double checked. The sample input provided in the paper covers all of the if/else cases in buildRecursive.
|
|
||
| // TODO(ccaraman): Add a performance benchmark test. | ||
|
|
||
| TEST(LcTrie, Ipv4) { |
There was a problem hiding this comment.
I'm going to restructure these tests. There is lots of duplicate in logic, take in a list of strings for cidr range, build trie, search for an input.
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
|
|
||
| uint32_t address = next_free_index; | ||
| trie_[position].branch_ = output.branch_; | ||
| trie_[position].skip_ = output.prefix_ - prefix; |
There was a problem hiding this comment.
Synced with Jose offline - i'll add a comment that the skip value is the number of bits between the last prefix at prefix and the output.prefix . (Mention that it is the number of bits)
| ComputePair(int branch, int prefix) : branch_(branch), prefix_(prefix) {} | ||
|
|
||
| uint32_t branch_; | ||
| uint32_t prefix_; |
There was a problem hiding this comment.
I'll add note that this is total number of bits that have the same prefix for a range of data.
junr03
left a comment
There was a problem hiding this comment.
I have more comments but I have to leave for now. Will put out more tonight
| * supported. | ||
| * @param ip_address supplies the IP address. | ||
| * @return tag from the CIDR range that contains 'ip_address'. An empty string is returned | ||
| * if no prefix contains 'ip_address'or there is no data for the IP version of the |
There was a problem hiding this comment.
nit: space between 'ip_address'or
| * byte order. | ||
| * @return extracted bits in the format of IpType. | ||
| */ | ||
| template <class IpType, uint32_t address_size = 8 * sizeof(IpType)> |
There was a problem hiding this comment.
can we make the number of bits in a byte a constant?
There was a problem hiding this comment.
I was curious, so I just looked it up, and there's standardized CHAR_BIT in C and C++. (Documentation)
| * @param tag_data supplies a vector of tag and CIDR ranges. | ||
| * @param fill_factor supplies the fraction of completeness to use when calculating the branch | ||
| * value for a sub-trie. | ||
| * @param root_branching_factor supplies the branching factor at the root. The paper suggests for |
There was a problem hiding this comment.
I know the paper is below. But it might be useful to cite the paper up top and let the reader know from the beginning that this is based on the paper?
| * 'http://www.csc.kth.se/~snilsson/software/router/C/' were used as reference during | ||
| * implementation. | ||
| * | ||
| * Note: The trie can only support up 524288(2^19) nodes. |
There was a problem hiding this comment.
Might be worth it to spell out why. Reference the comment in the build method?
| // In theory, the trie_ should only need twice the amount of entries of CIDR ranges. | ||
| // To prevent index out of bounds issues, only support a maximum of 2^19 CIDR ranges. | ||
| // TODO(ccaraman): Add a test case for this. | ||
| if (tag_data.size() > (1 << 19)) { |
There was a problem hiding this comment.
Make a constant of the supported size
| tag_data[i].asString(), tag_data[i - 1].asString())); | ||
| } | ||
|
|
||
| if (tag_data[i - 1].compare(tag_data[i]) != 0) { |
| /** | ||
| * Recursively build a trie for IP prefixes from position 'first' to 'first+n-1'. | ||
| * @param prefix supplies the prefix to ignore when building the sub-trie. | ||
| * @param first supplies the ip_prefixes_ index into for this sub-trie. |
| } | ||
|
|
||
| // Compute the number of bits required for branching by checking for all | ||
| // patterns (b=2 {00, 01, 10, 11}; b=3 {000,001,010,011,100,101,110,111}, etc) |
There was a problem hiding this comment.
compute the number of bits required for branching by checking for all patterns are covered in the set?
| uint32_t address_ : 20; | ||
| }; | ||
|
|
||
| // Vector that contains the CIDR ranges and tags. |
There was a problem hiding this comment.
explain that this is kept because you need to make sure that you've got a match. I know this is based on the paper, but it might be worth it to spell that here.
| * @param n supplies the number of nodes to use while computing the branch. | ||
| * @return pair of integers for the branching factor and the skip. | ||
| */ | ||
| ComputePair computeBranch(uint32_t prefix, uint32_t first, uint32_t n) const { |
There was a problem hiding this comment.
I would rename this method. computeBranchAndSkip?
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
| // Stop iterating once the size of the branch (with the fill factor ratio) is able to | ||
| // contain all of the prefixes within the current range of ip_prefixes_[first] to | ||
| // ip_prefixes_[first+n-1]. | ||
| } while (count >= fill_factor_ * (1 << branch)); |
There was a problem hiding this comment.
I am a little concerned about the default fill factor of 0. That would mean that any count would keep increasing the branch going up and collapsing the trie
There was a problem hiding this comment.
fill factor will be validated in the config to have this range (0, 1]
| } while (count >= fill_factor_ * (1 << branch)); | ||
|
|
||
| // Set the branching factor. | ||
| compute.branch_ = branch - 1; |
| // exception be thrown. | ||
| // TODO(ccaraman): Add a test case. | ||
| if (position > MAXIMUM_TRIE_NODES) { | ||
| throw EnvoyException(fmt::format("The number of internal nodes required for the LC-Trie " |
There was a problem hiding this comment.
I know this is not the responsibility of the throwing code. But given we own both the throwing code and the code that receives the exception, I want to make sure this is caught at config time, which I am 99% sure it does.
There was a problem hiding this comment.
Yes, this will get thrown during config parsing The call chain is LcTrie()->LcTrieInternal<>-> build()->buildRecursive
| for (uint32_t bit_pattern = 0; bit_pattern < static_cast<uint32_t>(1 << output.branch_); | ||
| ++bit_pattern) { | ||
|
|
||
| // count is the number of entries in the base vector that have the same bit pattern as the |
| } | ||
|
|
||
| // This logic was taken from | ||
| // https://github.com/beevek/libkrb/blob/24a224d3ea840e2e7d2926e17d8849aefecc1101/krb/lc_trie.hpp#L396. |
There was a problem hiding this comment.
I think this section lacks in explanation compared to the rest of the code, and it is a pretty crucial set of operations. I'll leave it up to you if you want to spell it out more. At the very least to explain what each case is.
There was a problem hiding this comment.
Added more comments
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
mattklein123
left a comment
There was a problem hiding this comment.
Looks great! Just a few comments.
| * within this class with no calling pattern changes. | ||
| * | ||
| * The algorithm to build the LC-Trie is desribed in the paper 'IP-address lookup using LC-tries' | ||
| * by'S. Nilsson' and 'G. Karlsson'. Refer to LcTrieInternal for implementation and algorithm |
There was a problem hiding this comment.
nit: missing space after "by". Please also provide direct link to paper/article if possible.
There was a problem hiding this comment.
Fixed. Found a free version of the paper hosted by the author.
| * factor. It reduces the depth of the trie. | ||
| */ | ||
| LcTrie(const std::vector<std::pair<std::string, std::vector<Address::CidrRange>>>& tag_data, | ||
| double fill_factor = 0.5, uint32_t root_branching_factor = 0); |
There was a problem hiding this comment.
Should we just make the default 16 for root_branching_factor?
There was a problem hiding this comment.
I think it may be better to use a much smaller default, like 6 or 8. Or maybe auto-compute the root branching factor based on the number of prefixes submitted to the build function.
Here's my rationale...
With a root_branching factor of 16, each trie will use at least 256KB of memory: 2^16 nodes at the first level below the root, times 4 bytes per node.
When Nilsson et al recommended 16 bits of branching at the root in their original paper, their target use case was a router with at least tens of thousands of prefixes. In that scenario, dedicating 256KB of memory to shard the trie into 2^16 buckets seems like a reasonable tradeoff of memory for speed.
But proxy applications, if my own experience is representative, will usually have a much smaller number of prefixes configured: often dozens of prefixes, sometimes hundreds, rarely tens of thousands. For such configurations, 256KB strikes me as too much overhead, especially relative to the L1D and even L2 cache sizes of contemporary server CPUs.
There was a problem hiding this comment.
SGTM. Whatever you all think is best. Was mostly just wondering if the default should not be 0.
There was a problem hiding this comment.
I'll leave it at zero for now.
| // Helper methods to retrieve the string representation of the address in IpPrefix using | ||
| // inet_ntop. These strings are used in the nested prefixes exception messages. | ||
| // TODO(ccaraman): Remove once nested prefixes are supported. | ||
| static std::string toString(const Ipv4& input) { |
There was a problem hiding this comment.
For these can you just construct an Ipv4Instance or Ipv6Instance and return the string from that? Seems simpler.
| } | ||
| } | ||
|
|
||
| // In theory, the trie_ vector can have at most twice the number of ip_prefixes entries - 1. |
There was a problem hiding this comment.
nit: extra space before ip_prefixes
|
|
||
| /** | ||
| * Converts Ipv6 address in std::array<uint8_t, 16> to absl::uint128 in host byte order. | ||
| * TODO (ccaraman): Remove this workaround once Ipv6Helper returns absl::uint128t. |
There was a problem hiding this comment.
Can we just do that now? It doesn't seem that hard and seems like it would clean up a bunch of stuff. You could do another PR and just merge that into this?
There was a problem hiding this comment.
I'll do this in a separate PR and merge it into here.
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
mattklein123
left a comment
There was a problem hiding this comment.
two small nits. Nice work!
| * if no prefix contains 'ip_address' or there is no data for the IP version of the | ||
| * ip_address. | ||
| */ | ||
| std::string getTag(const Network::Address::InstanceConstSharedPtr ip_address) const; |
There was a problem hiding this comment.
nit: const Network::Address::InstanceConstSharedPtr&
|
|
||
| class LcTrieTest : public testing::Test { | ||
| public: | ||
| LcTrieTest() {} |
Signed-off-by: Constance Caramanolis <ccaramanolis@lyft.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com> Signed-off-by: JP Simard <jp@jpsim.com>
Signed-off-by: Mike Schore <mike.schore@gmail.com> Signed-off-by: JP Simard <jp@jpsim.com>
Implementation of the Level Compressed Trie as described by the paper IP address lookup using LC-tries by Stefan Nelson and Gunnar Karlsson.
For this implementation, I referenced https://github.com/beevek/libkrb/blob/master/krb/lc_trie.hpp which is based off of the C implementation found at http://www.csc.kth.se/~snilsson/software/router/C/
Note: This doesn't support nested prefixes. I've added a TODO to add support in the next PR.
Risk Level: Medium - This will be the core logic behind the Ip Tagging Filter.
Testing: Unit Tests. A todo has been added for performance benchmark.
Docs Changes: Docs will be updated when the filter consumes this implementation.
Release Notes: I'll update the release notes when the filter consumes this implementation.
Related to #1001
Next Pr's will do the following: