Description
TL;DR
I need to filter the results of a specific CSS selector that returns items I don't want, based on a word in the captured text: in this case, "release". Nothing in the source documentation whose text contains that word should be added as an entry.
A bit wordier
In the source documentation, my selector returns a nice list of "Sections", but the same CSS selector also matches about 200 "release notes" sections.
Essentially I have a bunch of these I want to get rid of:
2020 Release Notes
2019 Winter Release Notes
Upgrade release-notes for xyz
I don't want those included in the resulting docset, so I tried using the regexp field to return everything that does not contain the word "release". I essentially need the opposite of this:
^.*release.*$
In other words, don't return anything that has the word "release" in it.
I tried a (?!...) negative lookahead in the regex, but I get this message:
error parsing regexp: invalid or unsupported Perl syntax: `(?!`
Is there a field in the selector object for rejecting a node if its title contains a given word? I didn't see anything for this purpose in the README:
"css selector": {
  "requiretext": "require that the text matches a regexp. If not, this node is not considered as selected",
  "type": "Dash data type",
  "attr": "Use the value of the specified attribute instead of html node text as the basis for transformation",
  "regexp": "PCRE regular expression (no need to enclose in //)",
  "replacement": "Replacement text for each match of 'regexp'",
  "matchpath": "Only files matching this regular expression will be parsed. Will match all files if not set."
}