
Convert compatible full-regex rules to content-blocking syntax #621

Open
shivaram19 wants to merge 1 commit into
brave:master from
shivaram19:convert-compatible-full-regex-rules


Description

Fixes #156

Currently, full-regex rules are blanket-rejected when converting to content-blocking syntax because Safari's support for regex features is much more limited than what ABP syntax allows.

This PR makes it possible to convert some full-regex rules by first verifying that the regex only uses features supported by the content-blocking format.

Supported regex features (now allowed)

  • Literal characters and escaped metacharacters (e.g. \., \*, \/)
  • . (any character)
  • [...] character classes, including ranges [a-z] and negation [^...]
  • ?, +, * quantifiers
  • (...) capturing groups
  • ^ and $ anchors

Still unsupported (correctly rejected)

  • \d, \w, \s, etc. character-class escapes, and \b word-boundary assertions
  • {...} quantified repetition
  • | alternation
  • (?:...), (?=...), (?!...), etc. special groups
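
To make the two lists above concrete, here is a minimal sketch of how such a check could be written. It is not the code from this PR: the function names `uses_only_supported_features` and `skip_char_class` are hypothetical, and the treatment of edge cases (any escaped ASCII punctuation counts as a literal, every escaped letter or digit is rejected, quantifier placement is not validated) is an assumption made for illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: consumes the body of a `[...]` class whose opening
/// `[` has already been read. Returns false if the class is unterminated or
/// contains an escape other than escaped punctuation.
fn skip_char_class(chars: &mut Peekable<Chars<'_>>) -> bool {
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some(e) if e.is_ascii_punctuation() => {}
                _ => return false,
            },
            ']' => return true,
            // Everything else, including `^`, `-`, `{` and `|`, is literal here.
            _ => {}
        }
    }
    false
}

/// Hypothetical sketch of a compatibility check (not the PR's
/// `is_cb_compatible_regex`): returns true when `pattern` uses only the
/// features listed as supported above.
fn uses_only_supported_features(pattern: &str) -> bool {
    let mut chars = pattern.chars().peekable();
    let mut depth: usize = 0; // number of currently open `(` groups

    while let Some(c) = chars.next() {
        match c {
            // Escaped punctuation (`\.`, `\*`, `\/`) is a literal. Escaped
            // letters or digits (`\d`, `\w`, `\s`, `\b`, `\1`) are rejected.
            '\\' => match chars.next() {
                Some(e) if e.is_ascii_punctuation() => {}
                _ => return false,
            },
            '[' => {
                if !skip_char_class(&mut chars) {
                    return false;
                }
            }
            '(' => {
                // `(?` introduces non-capturing groups and lookarounds.
                if chars.peek() == Some(&'?') {
                    return false;
                }
                depth += 1;
            }
            ')' => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            // Counted repetition and alternation are unsupported.
            '{' | '}' | '|' => return false,
            // Literals, `.`, `?`, `+`, `*`, `^` and `$` pass through.
            _ => {}
        }
    }
    depth == 0
}
```

Under these assumptions, a pattern such as `^https?://ads\.example\.com/[a-z]+$` is accepted, while `banner\d+`, `a{2,3}`, `foo|bar` and `(?:ads)` are rejected. A single left-to-right scan is enough here because every unsupported feature in the list can be recognised from one or two characters of context.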

Changes

  • Added is_cb_compatible_regex() validator in src/content_blocking.rs
  • Updated TryFrom<NetworkFilter> for CbRuleEquivalent to attempt conversion for compatible full-regex rules
  • Added unit tests for both supported conversions and unsupported rejections
  • Added FilterSet integration test verifying that compatible rules are included and incompatible ones are dropped

Test results

cargo test --features content-blocking
# 220 passed, 0 failed

Fixes brave#156

Currently, full-regex rules are blanket-rejected when converting to
content-blocking syntax. However, Safari's content-blocking format does
support a subset of regex features:

- Literal characters and escaped metacharacters
- . (any character)
- [...] character classes (including ranges and negation)
- ?, +, * quantifiers
- (...) capturing groups
- ^ and $ anchors

This change adds a validator (is_cb_compatible_regex) that checks
whether a full-regex rule uses only supported features. Compatible
rules are now converted directly; incompatible ones still return
FullRegexUnsupported.

Test coverage added for supported conversions (simple regex, char
classes, groups, anchors) and unsupported rejections (\d, {n},
alternation, non-capturing groups, lookaheads, word boundaries).
Development

Successfully merging this pull request may close these issues.

"Best-effort" conversion of full-regex rules to content-blocking format
