Skip to content

Conversation

@leonardr
Copy link
Contributor

@leonardr leonardr commented Feb 12, 2024

I'm adding type hints to my project Beautiful Soup, and part of this work required minor changes to the typeshed stubs for html5lib. This PR contributes those changes back to typeshed.

I've also noticed that the typeshed stub for HTMLParser.parse() is wrong: it lists the second argument as scripting:bool, but the second argument is actually treebuilder:str; scripting is one of the keyword arguments. But I'm too new to Python type hints to know what should be done with scripting, and I also don't know what downstream effects a change might have on other users, so I'm not trying to fix that in this branch.

def reset(self) -> None: ...
def openStream(self, source): ...
def position(self): ...
def position(self) -> tuple[int, int]: ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastPhase: Any
beforeRCDataPhase: Any
framesetOK: bool
tokenizer: Any
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is set by HTMLParser._parse, not by the constructor, so it's not always guaranteed to be there, but the same is true of lastPhase, beforeRCDataPhase, framesetOK, and so on.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll look into the other issue you mentioned in your description.

@JelleZijlstra JelleZijlstra merged commit cb27610 into python:main Feb 16, 2024
@JelleZijlstra
Copy link
Member

I've also noticed that the typeshed stub for HTMLParser.parse() is wrong: it lists the second argument as scripting:bool, but the second argument is actually treebuilder:str; scripting is one of the keyword arguments.

I think the stub is actually right. The parse method (not to be confused with the separate module-level function named parse) is here: it calls _parse with arguments stream, False, None, *args, **kwargs. Now _parse takes arguments stream, innerHTML=False, container="div", scripting=False, **kwargs. innerHTML and container are always False and None, so the *args and **kwargs accepted by parse have to map to this scripting arg and the **kwargs.

JelleZijlstra added a commit to JelleZijlstra/typeshed that referenced this pull request Feb 16, 2024
I started out investigating comments in python#11411 and ended up adding a few other
types that were reasonably obvious from the source code. For reference:

https://github.com/html5lib/html5lib-python/tree/master/html5lib

cc @leonardr
@leonardr leonardr deleted the html5lib-for-beautifulsoup4 branch February 16, 2024 13:29
@leonardr
Copy link
Contributor Author

I've also noticed that the typeshed stub for HTMLParser.parse() is wrong: it lists the second argument as scripting:bool, but the second argument is actually treebuilder:str; scripting is one of the keyword arguments.

I think the stub is actually right. The parse method (not to be confused with the separate module-level function named parse) is here: it calls _parse with arguments stream, False, None, *args, **kwargs. Now _parse takes arguments stream, innerHTML=False, container="div", scripting=False, **kwargs. innerHTML and container are always False and None, so the *args and **kwargs accepted by parse have to map to this scripting arg and the **kwargs.

This makes sense, thanks for the explanation.

hauntsaninja pushed a commit that referenced this pull request Feb 20, 2024
I started out investigating comments in #11411 and ended up adding a few other
types that were reasonably obvious from the source code. For reference:
https://github.com/html5lib/html5lib-python/tree/master/html5lib
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants