Eliminate catastrophic backtracking that hangs Atom #119
```
'match': '''(?x)
(-?(?!\\d)(?:[\\w-]|[^\\x00-\\x7F]|\\\\(?:[0-9a-fA-F]{1,6}|.))+)
\\s*
(?=[~|^\\]$*=]|/\\*)
```
What the HELL, that quantifier shouldn't be there. :| Oops.
@nathansobo Try removing $*; that's a zero-length match that'll match anything. Really bad news.
As in, it should just be $, not $*.
In fact, I bet dollars to donuts that's what's causing the regression.
Thanks for your prompt reply. Watching my daughter, so I'll give it a try later. Thanks again.
Removing the asterisk does not fix the issue. EDIT: That's because it's in the character class, so it's interpreted literally.
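A quick way to double-check the EDIT above outside of Atom is the sketch below. It uses Python's `re` module rather than Oniguruma (the engine behind Atom's grammars), on the assumption that both treat this character class the same way; the names `CLASS`, `LOOKAHEAD` and `operator_follows` are made up for illustration. With the CSON string escaping removed, the class is `[~|^\]$*=]`: `^` is not the first character, and `$` and `*` have no special meaning inside brackets, so there is no zero-length `$*` here. The lookahead only succeeds when one of those characters (or `/*`) comes next.

```python
import re

# Illustrative names. CSON's '\\]' and '/\\*' are the regex \] and /\* below.
# Inside [...], ^ (when not first), $ and * are plain characters, so this
# class is just the set ~ | ^ ] $ * =.
CLASS = re.compile(r"[~|^\]$*=]")

# The trailing lookahead from the grammar rule: a character that can start a
# CSS attribute-selector operator (~= |= ^= $= *= =), a closing ], or /*.
LOOKAHEAD = re.compile(r"(?=[~|^\]$*=]|/\*)")


def operator_follows(text: str) -> bool:
    """True if the lookahead succeeds at the start of text."""
    return LOOKAHEAD.match(text) is not None


print(bool(CLASS.fullmatch("*")), bool(CLASS.fullmatch("")))  # True False
print(operator_follows("*=foo]"), operator_follows("foo"))    # True False
```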
About to eat, but changing (?:[\\w-]...) to an atomic group (?>[\\w-]...) works. Didn't run specs though.
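To see why the atomic group helps, here is a toy model in Python. It is an illustration, not a measurement of Oniguruma, and the helper names are hypothetical. It counts how many distinct ways the repeated group `(?:[\w-]|[^\x00-\x7F]|\\(?:[0-9a-fA-F]{1,6}|.))+` (CSON escaping removed) can match at a single start position. When no operator follows, a naive backtracking engine has to retry the trailing `\s*(?=...)` after each of those ways before giving up. The model assumes `\w` is Unicode-aware, as it is in Python.

```python
import re
from functools import lru_cache

# Toy model (not Oniguruma) of the repeated group from the grammar:
#   (?:[\w-]|[^\x00-\x7F]|\\(?:[0-9a-fA-F]{1,6}|.))+
WORD_OR_HYPHEN = re.compile(r"[\w-]")  # Python's \w is Unicode-aware for str
HEX_DIGITS = set("0123456789abcdefABCDEF")


def iteration_lengths(s: str, i: int) -> list[int]:
    """Every length one iteration of the group can match at s[i],
    in the order a backtracking engine tries them."""
    if i >= len(s):
        return []
    ch = s[i]
    lengths = []
    if WORD_OR_HYPHEN.fullmatch(ch):  # [\w-]
        lengths.append(1)
    if ord(ch) > 0x7F:  # [^\x00-\x7F]; overlaps \w for non-ASCII letters
        lengths.append(1)
    if ch == "\\" and i + 1 < len(s):
        run = 0  # length of the hex run after the backslash, capped at 6
        while run < 6 and i + 1 + run < len(s) and s[i + 1 + run] in HEX_DIGITS:
            run += 1
        # [0-9a-fA-F]{1,6} is greedy: longest run first, then shorter ones
        lengths.extend(1 + k for k in range(run, 0, -1))
        if s[i + 1] != "\n":  # . (any character except newline)
            lengths.append(2)
    return lengths


def stopping_points(s: str, atomic: bool) -> int:
    """Number of distinct ways the + loop can match a non-empty prefix of s,
    i.e. how often the engine would retry the trailing lookahead."""

    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        options = iteration_lengths(s, i)
        if atomic:
            # (?>...) commits to the first alternative that matches; the loop
            # can still give back whole iterations, but never re-split one.
            options = options[:1]
        # 1 for stopping here, plus every way of continuing one more iteration
        return 1 + sum(ways(i + n) for n in options)

    return ways(0) - 1  # + requires at least one iteration


escapes = "\\123456" * 3  # three CSS-style hex escapes in a row
print(stopping_points(escapes, atomic=False))  # 1539
print(stopping_points(escapes, atomic=True))   # 3
print(stopping_points("é" * 3, atomic=False))  # 14
```

In this model each extra escape multiplies the non-atomic count by roughly 7 (six ways to cut the hex run, plus the `.` branch), while the atomic count grows by one. Under a Unicode-aware `\w`, a non-ASCII letter matches both `[\w-]` and `[^\x00-\x7F]`, which roughly doubles the count per character. A failing search also retries at later start positions, so the real cost is higher still.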
@nathansobo I'll wait for you (or somebody else) to confirm this before attempting a different solution. Trying to tie up loose ends elsewhere ATM.
Removing the * after the $ didn't seem to help. 😞
Thanks for taking this on, @nathansobo, and sorry I didn't get to it sooner 😬.
@50Wliu Oh right, whoops. Still waking up, haha, sorry. I'd recommend grouping the embedded grammar inside something that tries the "clamping" technique, but @50Wliu seems to think that's too "weird", so I don't know what to do. =)
So what's this "clamping" technique you speak of? Can you link me to a description?
"Clamping" is a term I coined to describe a technique to prevent runaway issues with ... eh, hang on, I'll slap together a crude example so you can see what I mean.
Basically, it's an irregular and slightly messy (albeit highly effective) hack that we should only consider if an atomic group doesn't solve our problem.
Ok, all specs pass with the atomic group, including the new regression test.
@50Wliu Do you want to push to this PR?
Pushed! I can also confirm that this fixes atom/atom#14856. Regarding the spec, is there any reason to include the HTML grammar (L3257)?
No. I started with the assumption that it was an interaction between HTML and CSS and then realized it wasn't. So that line can go. |
Thanks to you both for engaging so quickly. This went much more smoothly than I expected. ✨s to both of you.
@nathansobo Done with my changes; feel free to merge when you're ready!
Closes atom/atom#15030
Refs atom/atom#14856 (not sure whether this also fixes that one, but it might)
#99 introduced a regression that causes the parser to hang on certain kinds of diabolical input.
The first two commits of this PR introduce a minimal test case that isolates the problem. The third commit comments out the component of the regular expression responsible for it.
@Alhadis @50Wliu Can we support the same functionality with a different approach, one that doesn't rely on a regular expression prone to catastrophic backtracking?
I'm not very well versed in these grammars, but I'm inclined to revert #99 and go back to the older, less capable grammar if we can't guarantee good performance. Let me know if I should wait for you to attempt a fix or go ahead and revert.
Thank you.
/cc @iolsen @ungb