Skip to content

Fix nested repeat operator warnings in Ruby 3.3#662

Open
svm1048 wants to merge 1 commit intorapid7:mainfrom
svm1048:fix-ruby33-regex-warnings
Open

Fix nested repeat operator warnings in Ruby 3.3#662
svm1048 wants to merge 1 commit intorapid7:mainfrom
svm1048:fix-ruby33-regex-warnings

Conversation

@svm1048
Copy link
Copy Markdown

@svm1048 svm1048 commented Mar 4, 2026

This PR resolves the nested repeat operator warnings thrown by Ruby 3.3 during fingerprinting.

Ruby 3.3's regex engine is much stricter and warns when redundant operators are used (e.g., (?:[a-z]+)?). I've audited the XML databases and updated these redundant +)? patterns to their mathematical equivalent *) (zero or more) to silence the warnings while preserving the exact same fingerprinting logic.

Tested locally with bundle exec rspec (195,555 examples, 0 failures).

Resolves rapid7/metasploit-framework issue #20121 (RegEx Replacement Warning in smb_version)

@cdelafuente-r7
Copy link
Copy Markdown

Thank you @svm1048 for looking into this.

There is a corner case that we need to consider with these changes. Using (.*) can result in a different outcome than (.+)? if the captured data is not present. Look at this example:

3.3.8 :010 > "foo123bar".match(/foo(.*)bar/)
 => #<MatchData "foo123bar" 1:"123">
3.3.8 :011 > "foobar".match(/foo(.*)bar/)
 => #<MatchData "foobar" 1:"">
3.3.8 :012 > "foo123bar".match(/foo(.+)?bar/)
 => #<MatchData "foo123bar" 1:"123">
3.3.8 :013 > "foobar".match(/foo(.+)?bar/)
 => #<MatchData "foobar" 1:nil>

As you can see, (.*) will capture an empty string "" when the capture group is not present. On the other hand, (.+)? captures nil.

This can be an issue if the logic rely on this capture group being "nil" or not, like this part of #match method in the ruby-recog gem:

      replacements.each_pair do |replacement_k, replacement_vs|
        replacement_vs.each do |replacement|
          if result[replacement]
            result[replacement_k] = result[replacement_k].gsub(/\{#{replacement}\}/, result[replacement])
          else
            # if the value uses an interpolated value that does not exist, in general this could be
            # very bad, but over time we have allowed the use of regexes with
            # optional captures that are then used for parts of the asserted
            # fingerprints.  This is frequently done for optional version
            # strings.  If the key in question is cpe23 and the interpolated
            # value we are trying to replace is version related, use the CPE
            # standard of '-' for the version, otherwise raise and exception as
            # this code currently does not handle interpolation of undefined
            # values in other cases.
            raise "Invalid use of nil interpolated non-version value #{replacement} in non-cpe23 fingerprint param #{replacement_k}" unless replacement_k =~ (/\.cpe23$/) && replacement =~ (/\.version$/)

            result[replacement_k] = result[replacement_k].gsub(/\{#{replacement}\}/, '-')

          end

https://github.com/rapid7/recog-ruby/blob/733225e0d46501e32f1c9fe7606f8f998f091a31/lib/recog/fingerprint.rb#L112-L130

Here the code takes different branches according to if result[replacement]. When no version is present, result[replacement] will be nil with old regex (.+)? and "" with the fix in place (.*). As a result, the code will take different branch since nil is falsy and "" is truthy.

I could confirm this adding this an example without version in the xml/http_servers.xml file (using LiteSpeed Enterprise instead of LiteSpeed/5.2.8 Enterprise).

  <fingerprint pattern="^LiteSpeed\/?([\d.]+)?(?: \S+)?">
    <description>LiteSpeed</description>
    <example>LiteSpeed Enterprise</example>
    <param pos="0" name="service.vendor" value="LiteSpeed Technologies"/>
    <param pos="0" name="service.product" value="LiteSpeed Web Server"/>
    <param pos="1" name="service.version"/>
    <param pos="0" name="service.cpe23" value="cpe:/a:litespeedtech:litespeed_web_server:{service.version}"/>
  </fingerprint>

I put a break point after this block code and run ruby bin/recog_verify xml/http_servers.xml. Here are the results:

  • with (.+)?
$ ruby bin/recog_verify xml/http_servers.xml
[+] pry-byebug loaded

From: /Users/cdelafuente/.rvm/gems/ruby-3.3.8@recog/gems/recog-3.1.26/lib/recog/fingerprint.rb:136 Recog::Fingerprint#match:

    131:         end
    132:       end
    133:       require 'pry';binding.pry if match_string =~ /LiteSpeed/
    134:
    135:       # After performing interpolation, remove temporary keys from results
 => 136:       result.each_pair do |k, _|
    137:         result.delete(k) if k.start_with?('_tmp.')
    138:       end
    139:
    140:       result
    141:     end

[1] pry(#<Recog::Fingerprint>)> result
=> {"matched"=>"LiteSpeed", "service.vendor"=>"LiteSpeed Technologies", "service.product"=>"LiteSpeed Web Server", "service.version"=>nil, "service.cpe23"=>"cpe:/a:litespeedtech:litespeed_web_server:-", "service.protocol"=>"http", "fingerprint_db"=>"http_header.server"}
[2] pry(#<Recog::Fingerprint>)> result["service.version"]
=> nil
[3] pry(#<Recog::Fingerprint>)> result["service.cpe23"]
=> "cpe:/a:litespeedtech:litespeed_web_server:-"
  • with (.*)
$ ruby bin/recog_verify xml/http_servers.xml
[+] pry-byebug loaded

From: /Users/cdelafuente/.rvm/gems/ruby-3.3.8@recog/gems/recog-3.1.26/lib/recog/fingerprint.rb:136 Recog::Fingerprint#match:

    131:         end
    132:       end
    133:       require 'pry';binding.pry if match_string =~ /LiteSpeed/
    134:
    135:       # After performing interpolation, remove temporary keys from results
 => 136:       result.each_pair do |k, _|
    137:         result.delete(k) if k.start_with?('_tmp.')
    138:       end
    139:
    140:       result
    141:     end

[1] pry(#<Recog::Fingerprint>)> result
=> {"matched"=>"LiteSpeed", "service.vendor"=>"LiteSpeed Technologies", "service.product"=>"LiteSpeed Web Server", "service.version"=>"", "service.cpe23"=>"cpe:/a:litespeedtech:litespeed_web_server:", "service.protocol"=>"http", "fingerprint_db"=>"http_header.server"}
[2] pry(#<Recog::Fingerprint>)> result["service.version"]
=> ""
[3] pry(#<Recog::Fingerprint>)> result["service.cpe23"]
=> "cpe:/a:litespeedtech:litespeed_web_server:"

You can see the results are different (notice the hyphen added to the service.cpe23 value).

So, it looks like the bug is also in the ruby-recog code. The same branch should be taken if the the value is nil or empty. Something along these lines:

if result[replacement] && !result[replacement].empty?

Note that I found this example in ruby-recog, but there might be others. The code should be audited to make sure there is nothing that changes whether a captured value is nil or "".

Once the code is fix, we can land this PR with the regex updates.

Copy link
Copy Markdown

@cdelafuente-r7 cdelafuente-r7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these changes are not necessary and won't produce a warning with ruby v3.3. Please, can you confirm this really fixes the original issue?

Also, it looks like this regex in xml/snmp_sysdescr.xml does produce a warning:
"^Linux, Cisco(?: Small Business)? (RV[0-9]+(?:[A-Z]+)?(?:-[A-Z]+[0-9]+)?).+Version (\d+(?:\.\d+)+)"

<fingerprint pattern="^Linux, Cisco(?: Small Business)? (RV[0-9]+(?:[A-Z]+)?(?:-[A-Z]+[0-9]+)?).+Version (\d+(?:\.\d+)+)">

ruby -e 'Regexp.new("^Linux, Cisco(?: Small Business)? (RV[0-9]+(?:[A-Z]+)?(?:-[A-Z]+[0-9]+)?).+Version (\d+(?:\.\d+)+)")'
-e:1: warning: nested repeat operator '+' and '?' was replaced with '*' in regular expression

Comment thread xml/http_servers.xml
</fingerprint>

<fingerprint pattern="^Oracle-HTTP-Server(?:(?:-(?:\d{2}[cg]\/?)?([\d.]+)?(?:\/[\d.]+)?)?(?:.{0,512}\([TF]?[HSUGMN]?(?:;max-age=\d+(?:\+\d+)?)?(?:;age=\d+)?;ecid=[^)]+\))?)?$">
<fingerprint pattern="^Oracle-HTTP-Server(?:(?:-(?:\d{2}[cg]\/?)?([\d.]+)?(?:\/[\d.]+)?)?(?:.{0,512}\([TF]?[HSUGMN]?(?:;max-age=\d+(?:\+\d*))?(?:;age=\d+)?;ecid=[^)]+\))?)?$">
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not semantically equivalent. (?:\+\d+)? (optional: + followed by one-or-more digits) is different than (?:\+\d*) (required: + followed by zero-or-more digits - no outer ?).

If a server sends max-age=1800 without +<value>, the old regex will still capture it, while the new one will not match, causing the full (?:;max-age=...) outer group to fail.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this change necessary? I'm wondering if the original regex causes a warning using ruby 3.3.8:

ruby -e 'Regexp.new(/^Oracle-HTTP-Server(?:(?:-(?:\d{2}[cg]\/?)?([\d.]+)?(?:\/[\d.]+)?)?(?:.{0,512}\([TF]?[HSUGMN]?(?:;max-age=\d+(?:\+\d+)?)?(?:;age=\d+)?;ecid=[^)]+\))?)?$/)'

Comment thread xml/http_servers.xml
</fingerprint>

<fingerprint pattern="^LiteSpeed\/?([\d.]+)?(?: \S+)?">
<fingerprint pattern="^LiteSpeed\/?([\d.]*)(?: \S+)?">
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, it looks like no warning is returned with this regex:

ruby -e 'Regexp.new("^LiteSpeed\/?([\d.]+)?(?: \S+)?")'

I checked the other changes and none of them seems to produce a warning. Please, can you check if these changes are really necessary?

@cdelafuente-r7 cdelafuente-r7 moved this from In Progress to Waiting on Contributor in Metasploit Kanban Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Waiting on Contributor

Development

Successfully merging this pull request may close these issues.

RegEx Replacement Warning in smb_version

3 participants