Skip to content

Conversation

@IAL32
Copy link
Contributor

@IAL32 IAL32 commented Jan 12, 2023

Closes #28888

if webserver.instance_name_has_markup = True, we strip markup from flask_app.config["APP_NAME"], preventing it to show up as a sanitized HTML in the <title> tag.

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Jan 12, 2023
@eladkal eladkal added this to the Airflow 2.5.2 milestone Jan 13, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to also un-escape the value? (for &amp; etc) Maybe it’s better to use an HTML parser for this. Or maybe that’s an overkill. No idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to also un-escape the value? (for & etc)

Something like https://docs.python.org/3/library/html.html#html.unescape ?

From the docs:

Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing html with regex can be tricky. Please see this answer https://stackoverflow.com/a/4869782/2610955 and also https://blog.codinghorror.com/parsing-html-the-cthulhu-way/. The question has several options and also discusses handling &amp; . Since beautifulsoup is already a dependency maybe it can be used.

python      
Python 3.10.6 (main, Aug  2 2022, 15:11:28) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup("<b>Bold Site Title Test</b>")          
>>> b.get_text()
'Bold Site Title Test'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I had a devel installation and beautifulsoup is a devel dependency.

Copy link
Member

@uranusjr uranusjr Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's why I wonder if this would be an overkill. It's not terribly difficult to implement text extraction with html.parser (stdlib) but whether it's worthwhile is still susceptive.

Copy link
Member

@uranusjr uranusjr Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is easier than I thought!

import html.parser

def strip_tags(inp: str) -> str:
    parts: list[str] = []

    class TagStripParser(html.parser.HTMLParser):
        def handle_data(self, d: str) -> None:
            parts.append(d)

    TagStripParser().feed(inp)
    return "".join(parts)
>>> strip_tags("<b>Bold Site <span>Title</span> Test &amp;</b>")
'Bold Site Title Test &'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used @uranusjr as suggestion for bc21a02. I really didn't know where to put the method. Is there a better place?

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Markup() class from markupsafe instead?

@IAL32
Copy link
Contributor Author

IAL32 commented Jan 14, 2023

Can we use Markup() class from markupsafe instead?

@ashb We can't. Markup only escapes html text, which is already done by Flask (see tests).

The goal of my PR would be to remove HTML tags prior to being escaped by Flask, so it doesn't look ugly.

An alternative would be to have something like instance_title which does not contain any markup, but I imagine that's a whole other config entry that we might not want to have.

@ashb
Copy link
Member

ashb commented Jan 14, 2023

The goal of my PR would be to remove HTML tags prior to being escaped by Flask, so it doesn't look ugly.

If you put markup in the attribute and don't want it escaped: don't put markup in the attribute.

Stripping didn't seem worth it when the user can just do it selves.

@ashb
Copy link
Member

ashb commented Jan 14, 2023

Oh sorry I see. We have it on title and on the page as the h1

@IAL32
Copy link
Contributor Author

IAL32 commented Jan 14, 2023

Exactly, we use the same attribute for both, which makes this a bit tricky

@potiuk potiuk force-pushed the ac/28888-instance-name-title-markup branch from 798d17b to 51bdd9b Compare January 20, 2023 22:28
@BasPH
Copy link
Contributor

BasPH commented Feb 13, 2023

Isn't striptags() better to use here? Instead of implementing our own...

@Taragolis Taragolis force-pushed the ac/28888-instance-name-title-markup branch from 664924b to 3f38004 Compare February 18, 2023 17:46
@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Mar 1, 2023
@potiuk potiuk force-pushed the ac/28888-instance-name-title-markup branch 2 times, most recently from a1df630 to dc9dd28 Compare March 16, 2023 00:57
@potiuk potiuk force-pushed the ac/28888-instance-name-title-markup branch from dc9dd28 to 46177e2 Compare March 16, 2023 08:54
@uranusjr
Copy link
Member

I switched the stripping implementation to use Markupsafe. One less thing to maintain.

@uranusjr uranusjr changed the title Strip markup from appbuilder.app_name if instance_name_has_markup = True Strip markup from app_name if instance_name_has_markup = True Mar 16, 2023
@uranusjr uranusjr merged commit 971e322 into apache:main Mar 16, 2023
@IAL32 IAL32 deleted the ac/28888-instance-name-title-markup branch March 16, 2023 12:03
pierrejeambrun pushed a commit that referenced this pull request Mar 23, 2023
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 971e322)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

webserver.instance_name shows markup text in <title> tag

9 participants