Skip to content
This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Eliminate catastrophic backtracking that hangs Atom#119

Merged
nathansobo merged 6 commits intomasterfrom
fix-hangs
Aug 13, 2017
Merged

Eliminate catastrophic backtracking that hangs Atom#119
nathansobo merged 6 commits intomasterfrom
fix-hangs

Conversation

@nathansobo
Copy link
Copy Markdown
Contributor

@nathansobo nathansobo commented Aug 12, 2017

Closes atom/atom#15030
Refs atom/atom#14856 (Not sure if this also solves this one, but it might)

#99 introduced a regression that causes the parser to hang on certain kinds of diabolical input.

The first two commits of this PR introduce a minimal test case that isolates the problem. The third commit comments out the offending component of the regular expression that causes the problem.

@Alhadis @50Wliu Can we use a different approach to support the same functionality without this regular expression that has a potential for catastrophic performance?

I'm not very well versed in these grammars, but I'm inclined to revert #99 and go back to the older, less capable grammar if we can't guarantee good performance. Let me know if I should wait for you to attempt a fix or go ahead and revert.

Thank you.

/cc @iolsen @ungb

'match': '''(?x)
(-?(?!\\d)(?:[\\w-]|[^\\x00-\\x7F]|\\\\(?:[0-9a-fA-F]{1,6}|.))+)
\\s*
(?=[~|^\\]$*=]|/\\*)
Copy link
Copy Markdown
Contributor

@Alhadis Alhadis Aug 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the HELL, that quantifier shouldn't be there. :| Oops.

@nathansobo Try removing $*, that's a zero-length match that'll match anything. Really bad news.

As in, it should just be $, not $*.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I bet dollars to donuts that's what's causing the regression.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your prompt reply. Watching my daughter so I'll give it a try later. Thanks again.

Copy link
Copy Markdown
Contributor

@winstliu winstliu Aug 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the asterisk does not fix the issue. EDIT: That's because it's in the character class, so it's interpreted literally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About to eat, but changing (?:[\\w-]...) to an atomic group (?>[\\w-]...) works. Didn't run specs though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathansobo I'll wait for you (or somebody else) to confirm this before attempting a different solution. Trying to tie up loose ends elsewhere ATM.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the * after the $ didn't seem to help. 😞

@winstliu
Copy link
Copy Markdown
Contributor

Thanks for taking this on @nathansobo and sorry I didn't get to it sooner 😬.

@winstliu
Copy link
Copy Markdown
Contributor

I'm inclined to revert #99 and go back to the older, less capable grammar if we can't guarantee good performance.

Unfortunately, that's more involved as it sounds as language-sass and language-less now rely on the new patterns introduced in #99.

@Alhadis
Copy link
Copy Markdown
Contributor

Alhadis commented Aug 12, 2017

@50Wliu Oh right, whoops. Still waking up, haha, sorry.

I'd recommend grouping the embedded grammar inside something that tries the "clamping" technique, but @50Wliu seems to think that's too "weird", so I don't know what to do. =)

@nathansobo
Copy link
Copy Markdown
Contributor Author

So what's this "clamping" technique you speak of? Can you link me to a description?

@Alhadis
Copy link
Copy Markdown
Contributor

Alhadis commented Aug 12, 2017

"Clamping" is a term I penned to describe a technique to prevent runaway issues with begin/end pattern-pairs. Embedding a pattern-set within a fixed-range (matched by pattern, as opposed to begin/end) carries no risk of an unclosed tag affecting the rest of the document.

... eh, hang on, I'll slap together a crude example so you can see what I mean.

@Alhadis
Copy link
Copy Markdown
Contributor

Alhadis commented Aug 12, 2017

Basically, it's an irregular and slightly messy (albeit highly effective) hack that we should only consider if an atomic group doesn't solve our problem.

@winstliu
Copy link
Copy Markdown
Contributor

Ok, all specs pass with the atomic group, including the new regression test.

@nathansobo
Copy link
Copy Markdown
Contributor Author

@50Wliu Do you want to push to this PR?

@winstliu
Copy link
Copy Markdown
Contributor

Pushed! I can also confirm that this fixes atom/atom#14856.

Regarding the spec, is there any reason to include the HTML grammar (L3257)?

@nathansobo
Copy link
Copy Markdown
Contributor Author

Regarding the spec, is there any reason to include the HTML grammar

No. I started with the assumption that it was an interaction between HTML and CSS and then realized it wasn't. So that line can go.

@Alhadis
Copy link
Copy Markdown
Contributor

Alhadis commented Aug 12, 2017

Good news regarding the simpler solution. ;-)

Just for posterity and reference, I'll post an example of what I was talking about. Compare this with this. The second example is using "clamping" to embed the pattern with runaway potential inside a precalculated range.

@nathansobo
Copy link
Copy Markdown
Contributor Author

Thanks to both for engaging so rapidly. This went much smoother than I expected. ✨s to both of you.

@winstliu
Copy link
Copy Markdown
Contributor

@nathansobo Done with my changes, feel free to merge when you're ready!

@nathansobo nathansobo merged commit 3caad52 into master Aug 13, 2017
@nathansobo nathansobo deleted the fix-hangs branch August 13, 2017 00:08
@nathansobo nathansobo changed the title WIP: Eliminate catastrophic backtracking that hangs Atom Eliminate catastrophic backtracking that hangs Atom Aug 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

XHTML gets stuck

3 participants