Regex negative lookahead not supported #62

@leothelocust

TL;DR

I need to improve the results of a specific CSS selector that returns items I don't want, identified by a word in the captured text (in this case "release"). I don't want anything in the source documentation whose text contains that word to be added as an entry.

A bit wordier

In the source documentation, my selector returns a nice list of "Sections", but it also returns about 200 "release notes" sections that match the same CSS query selector.

Essentially I have a bunch of these I want to get rid of:

2020 Release Notes
2019 Winter Release Notes
Upgrade release-notes for xyz

I don't want those to be included in the resulting docset, so I tried using the regexp field to keep only text that does not contain the word "release". Essentially, I need the opposite of this:

^.*release.*$

So, don't return anything that has the word "release" in it.

I tried the (?!) negative lookahead in regex, but I get the message:

error parsing regexp: invalid or unsupported Perl syntax: `(?!`
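For context, that error text matches what Go's standard regexp package reports. It uses RE2 syntax, which has no lookahead. The sketch below is only an illustration written outside the tool: it reproduces the error and shows the usual alternative of matching the unwanted word with a plain pattern and inverting the result in code. The helper names compileErr and keepTitle and the "Getting Started" title are hypothetical, not part of the tool or the source documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseRe matches any text containing "release", case-insensitively.
// It uses only RE2-compatible syntax, so Go's regexp package accepts it.
var releaseRe = regexp.MustCompile(`(?i)release`)

// compileErr (hypothetical helper) returns the error, if any, from
// compiling the given pattern with Go's regexp package.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

// keepTitle (hypothetical helper) reports whether a title should be kept,
// that is, whether it does NOT contain the word "release".
func keepTitle(title string) bool {
	return !releaseRe.MatchString(title)
}

func main() {
	// The negative-lookahead form is rejected when the pattern is compiled.
	fmt.Println(compileErr(`^(?!.*release).*$`))

	titles := []string{
		"2020 Release Notes",
		"2019 Winter Release Notes",
		"Upgrade release-notes for xyz",
		"Getting Started", // hypothetical non-release section
	}
	for _, t := range titles {
		if keepTitle(t) {
			fmt.Println("keep:", t)
		}
	}
}
```

Inverting a positive match this way needs support in the calling code, which is why a dedicated "reject" option in the selector object would be required to get the same effect from configuration alone.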

Is there a field in the selector object for rejecting if the title contains a word? I didn't see anything for this purpose in the README:

"css selector": {
      "requiretext": "require that the text matches a regexp. If not, this node is not considered as selected",
      "type": "Dash data type",
      "attr": "Use the value of the specified attribute instead of html node text as the basis for transformation",
      "regexp": "PCRE regular expression (no need to enclose in //)",
      "replacement": "Replacement text for each match of 'regexp'",
      "matchpath": "Only files matching this regular expression will be parsed. Will match all files if not set."
}
