Skip to content

[std.regex] Bit-NFA kickstart engine#4286

Merged
DmitryOlshansky merged 23 commits intodlang:masterfrom
DmitryOlshansky:bitnfa
Oct 9, 2016
Merged

[std.regex] Bit-NFA kickstart engine#4286
DmitryOlshansky merged 23 commits intodlang:masterfrom
DmitryOlshansky:bitnfa

Conversation

@DmitryOlshansky
Copy link
Member

@DmitryOlshansky DmitryOlshansky commented May 7, 2016

This adds an alternative "kickstart" engine to locate a prefix of a given pattern.
It is far more general the the original ShiftOr while a being few times slower,
still it's many times faster then any full-blow engine.

UPDATE: it's even ~x2 times faster then grep! Now that's some news :)

@DmitryOlshansky DmitryOlshansky force-pushed the bitnfa branch 4 times, most recently from 45da00b to 3577a07 Compare May 7, 2016 15:33
@DmitryOlshansky
Copy link
Member Author

screenshot at 2016-05-07 21 37 12

@DmitryOlshansky
Copy link
Member Author

Win32 is blocked by https://issues.dlang.org/show_bug.cgi?id=15989

@DmitryOlshansky DmitryOlshansky force-pushed the bitnfa branch 2 times, most recently from cfe7a48 to e42788c Compare October 6, 2016 22:26
@dlang-bot
Copy link
Contributor

dlang-bot commented Oct 6, 2016

Fix Bugzilla Description
9391 Constant std.regex.regex

@DmitryOlshansky
Copy link
Member Author

Now with immutable regexes!

@aG0aep6G
Copy link
Contributor

aG0aep6G commented Oct 6, 2016

Looks like you have a typo in the issue number. 9381 isn't about regex.

@DmitryOlshansky
Copy link
Member Author

Looks like you have a typo in the issue number. 9381 isn't about regex.

Right my bad issue is 9391.

@DmitryOlshansky
Copy link
Member Author

All green!

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is terrific work, and a large project on its own. I can't pretend to be able to review the logic of the additions in a reasonable time. Hopefully a paper or article will follow :o).

Two things:

  • Some lines that seem important are not covered in unittests.
  • Although this is in a way a project with its own life, it would be good to bring at least newly added code in alignment with the prevalent Phobos style. There, we use spaces around all operators; here, sometimes spaces are present but some other times they're omitted.

break;
case IR.Trie:
if (charsets.length && charsets[ir[0].data].byInterval.length <= 8)
if (charsets.length && charsets[ir[0].data].length <= 8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems to be not covered.

Also has two spaces before <= :)

break;
slot += 1;
if (slot == table.length)
slot = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncovered

break;
case LookaheadStart, NeglookaheadStart, LookbehindStart,
NeglookbehindStart:
paths.push(j + IRL!LookaheadStart + ir[j].data + IRL!LookaheadEnd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncovered (this would be important, right?)

i += (s-1)*IRL!OrChar;
bitCount++;
if (bitCount == 32)
break outer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncovered

finalMask |= 1u<<bitMapping[i];
break;
case Any:
uint mask = 1u<<bitMapping[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncovered

@andralex
Copy link
Member

andralex commented Oct 7, 2016

@DmitryOlshansky by the new flow you can merge your work now, optionally with follow-up work after the review.

@@ -320,13 +320,11 @@ public alias StaticRegex(Char) = std.regex.internal.ir.StaticRegex!(Char);

Throws: $(D RegexException) if there were any errors during compilation.
+/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc comment needs to be moved down to regex(); now it refers to the newly inserted regexPure().

@DmitryOlshansky
Copy link
Member Author

@andralex Will try to resolve the issues first then merge.

@DmitryOlshansky
Copy link
Member Author

Auto-merge toggled on

@DmitryOlshansky DmitryOlshansky merged commit 47adcab into dlang:master Oct 9, 2016
@MartinNowak
Copy link
Member

Seems like this caused a regression w/ building Higgs.
Issue 16626 – [Reg 2.073] recent dmd nightly runs out of memory building Higgs.

@DmitryOlshansky
Copy link
Member Author

Seems like this caused a regression w/ building Higgs.
Issue 16626 – [Reg 2.073] recent dmd nightly runs out of memory building Higgs.

The new version is more demanding on memory during CTFE so yeah, most likely it now fails for some patterns. Dunno what to do here would really love to see the new engine sometime soon.

@MartinNowak
Copy link
Member

Seems to me we should disable that engine in CTFE for now.
Let's continue the discussion in Bugzilla https://issues.dlang.org/show_bug.cgi?id=16626#c2.

MartinNowak added a commit to MartinNowak/phobos that referenced this pull request Dec 24, 2016
- consumes too much memory, introduced by
  e98fa4a (dlang#4286)
@MartinNowak
Copy link
Member

Seems to have caused another regression.
Issue 17066 – [REG2.073a] std.regex captures got immutable

@DmitryOlshansky
Copy link
Member Author

@MartinNowak will look into it, though sounds highly suspicious.

@DmitryOlshansky
Copy link
Member Author

@MartinNowak #5042

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants