Skip to content

Enhance the handling of Union types in HfArgumentParser#41441

Open
cyyever wants to merge 2 commits intohuggingface:mainfrom
cyyever:str_union
Open

Enhance the handling of Union types in HfArgumentParser#41441
cyyever wants to merge 2 commits intohuggingface:mainfrom
cyyever:str_union

Conversation

@cyyever
Copy link
Copy Markdown
Contributor

@cyyever cyyever commented Oct 8, 2025

What does this PR do?

This PR enhance the handling of Union types containing str by two rules:

  1. If str in Union and Union has more than one other types pass str to parser.add_argument. The caller is responsible to convert the string argument to a more proper type.
  2. Otherwise, Union must be X | str and pass X to parser.add_argument

@cyyever
Copy link
Copy Markdown
Contributor Author

cyyever commented Oct 8, 2025

@SunMarc This can resolve fsdp error.

@cyyever cyyever marked this pull request as draft October 8, 2025 10:50
@cyyever cyyever changed the title Simplify handling of Union types in HfArgumentParser Enhance handling of Union types in HfArgumentParser Oct 8, 2025
@cyyever cyyever marked this pull request as ready for review October 8, 2025 11:03
@cyyever cyyever changed the title Enhance handling of Union types in HfArgumentParser Enhance the handling of Union types in HfArgumentParser Oct 9, 2025
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
@Rocketknight1
Copy link
Copy Markdown
Member

CI seems green so gentle ping @SunMarc

@cyyever
Copy link
Copy Markdown
Contributor Author

cyyever commented Nov 25, 2025

@Rocketknight1 @SunMarc We also need it for moving to Python 3.10 | typing that unifying Optional and Union for config classes.

Copy link
Copy Markdown
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this was always been a bit of a mess, I left a comment regarding how to fix this, would you like to fix fixing this ?

Comment on lines +179 to +184
if len(field.type.__args__) > 2:
origin_type = str
else:
# filter `str` in Union
field.type = field.type.__args__[0] if field.type.__args__[1] is str else field.type.__args__[1]
origin_type = getattr(field.type, "__origin__", field.type)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really special case when None is not in the field. In most cases, this don't happen at all. For example for fsdp, we have dict[str, Any] | str | None. So it is not going here but instead in the
elif bool not in field.type.__args__: condition.

Could you try to clean a bit the mess here ? From what I see:

  1. If we have bool + None, we don't touch
  2. if we don't have none, we only allow one class
  3. if we have none, we can allow more:
    3.1 if we have str and other types, we should enforce the user to pass str.
    3.2 if we don't have str, we use the first type passed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can add some tests testing those, that would be huge also

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants