IndexOfAny currently uses a trick where it initializes a bitmap to represent if a character is probably in the array. This makes checking whether a character is represented by the bitmap O(1) since it's just a couple of bit operations, however initializing the map is quite expensive since it's O(n) and these bit operations have to be done for every character of the array. This means that in some cases, the "naive" approach where we make a pass through the array for every character can be faster for small strings (i.e. if anyOf.Length == 2, the threshold seems to be at about 9).
We should consider applying the following optimizations:
- Checking if the string's length is below a certain threshold depending on the length of
anyOf, and if so, use the "naive" approach.
- This could be implemented as a lookup table, i.e.
static int32_t thresholds[] = { 9, 7, ... } for minimal overhead.
- Checking if
anyOf.Length == 1 and if so return IndexOf(anyOf[0])
- Checking if
this.Length == 1 and if so return Array.IndexOf(anyOf, this[0])
- Additionally if
anyOf.Length == 2 and the difference between the two characters is a power of 2, we may be able to apply this optimization where instead of searching the array or initializing the bitmap, we simply OR each char with anyOf[0] ^ anyOf[1] (which will have exactly one bit set) and check if it's equal to anyOf[0] | anyOf[1] (which will be one of the 2 elements).
- The drawback of this, of course, is that it would only apply to arrays of length 2.
Related issues:
IndexOfAnycurrently uses a trick where it initializes a bitmap to represent if a character is probably in the array. This makes checking whether a character is represented by the bitmap O(1) since it's just a couple of bit operations, however initializing the map is quite expensive since it's O(n) and these bit operations have to be done for every character of the array. This means that in some cases, the "naive" approach where we make a pass through the array for every character can be faster for small strings (i.e. ifanyOf.Length == 2, the threshold seems to be at about 9).We should consider applying the following optimizations:
anyOf, and if so, use the "naive" approach.static int32_t thresholds[] = { 9, 7, ... }for minimal overhead.anyOf.Length == 1and if soreturn IndexOf(anyOf[0])this.Length == 1and if soreturn Array.IndexOf(anyOf, this[0])anyOf.Length == 2and the difference between the two characters is a power of 2, we may be able to apply this optimization where instead of searching the array or initializing the bitmap, we simply OR each char withanyOf[0] ^ anyOf[1](which will have exactly one bit set) and check if it's equal toanyOf[0] | anyOf[1](which will be one of the 2 elements).Related issues: