
@kojiishi (Collaborator)

#251 assumed that all tags are closed properly.

This assumption doesn't hold in cases like:

  1. Void (self-closing) elements such as `<img>` have no corresponding close tags.
  2. Unpaired close tags are not conforming, but HTML parsers handle them and they appear in real-world markup.

This patch handles these cases by treating any open tag that doesn't nest correctly, or that is never closed, as implicitly closed.
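A minimal sketch of this tag handling, written against Python's standard `html.parser.HTMLParser`. It is not the code merged in this PR: the class `TagStackTracker` and the `SKIP_TAGS` set are hypothetical, and the sketch only records whether each text run falls inside a no-break element such as `<nobr>`, which is the situation the NOBR test in this PR exercises.

```python
from html.parser import HTMLParser

# Hypothetical: elements whose text should not receive break opportunities.
SKIP_TAGS = {'nobr'}


class TagStackTracker(HTMLParser):
    """Tracks open elements while tolerating unclosed and unpaired tags.

    Illustrative sketch only: an end tag closes the nearest matching open
    tag plus every tag opened after it; an end tag with no matching open
    tag is ignored.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, skip) for each currently open element
        self.events = []  # (text, skip) for each text run seen

    @property
    def skipping(self):
        # The innermost open element carries the effective skip state.
        return bool(self.stack) and self.stack[-1][1]

    def handle_starttag(self, tag, attrs):
        # Inherit the parent's skip state so nested elements stay skipped.
        self.stack.append((tag, self.skipping or tag in SKIP_TAGS))

    def handle_endtag(self, tag):
        # Unpaired close tag: nothing matches, so ignore it.
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return
        # Pop up to and including the matching tag; anything opened inside
        # it and never closed (e.g. <img>) is treated as implicitly closed.
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.events.append((data, self.skipping))


if __name__ == '__main__':
    tracker = TagStackTracker()
    # <img> is never closed, and </span> has no matching open tag.
    tracker.feed('a<nobr>b<img src="x.png">c</nobr>d</span>e')
    tracker.close()
    print(tracker.events)
    # [('a', False), ('b', True), ('c', True), ('d', False), ('e', False)]
```

A naive stack that pops exactly one entry per close tag would pop the `<img>` entry at `</nobr>`, leaving `d` marked as inside `<nobr>`, and an unpaired close tag arriving at an empty stack would have nothing to pop.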

This isn't the full HTML "adoption agency algorithm", but it should be good enough for the needs of BudouX.

Fixes #355

@kojiishi requested a review from @tushuhei on November 10, 2023 11:59
@tushuhei previously approved these changes Nov 10, 2023
@tushuhei (Member) left a comment:


LGTM. Thank you!

```python
resolver = html_processor.HTMLChunkResolver(['abxyabc', 'def'], '<wbr>')
resolver.feed(input)
self.assertEqual(resolver.output, expected,
                 'WBR tags should not be inserted if NOBR.')
```
@tushuhei (Member), Nov 10, 2023:


Could you elaborate this test message by mentioning the IMG tag?

@kojiishi (Collaborator, author):

Good catch, thanks, done.

@tushuhei (Member):

@kojiishi I left a small comment actually. PTAL.

@tushuhei (Member) left a comment:

A nit request about the test message

@kojiishi merged commit 2457c51 into google:main on Nov 11, 2023
@kojiishi deleted the `unpaired` branch on November 11, 2023 05:54

Linked issue: Unopened HTML tag causes exception in budoux 0.6
