Fix 'T5GemmaConfig' object has no attribute 'num_hidden_layers' #40454
Conversation
Ah I see my change led to some failed tests - let me investigate
@lewtun I think this is an issue with the […]. In a nutshell, the […]. Do you want to have a go at it, or would you rather have me fix it? 🤗
Thanks for the context! I can have a go at it :)
gante left a comment
A few nits to trim unnecessary logic :D
```python
# Access the encoder/decoder sub-configs directly for models like T5Gemma
if hasattr(self.config, "decoder") and hasattr(self.config, "encoder"):
    decoder_cache_kwargs["config"] = self.config.decoder
    encoder_cache_kwargs["config"] = self.config.encoder
else:
```
This shouldn't be needed -- if self.config.get_text_config(...) is not working properly, then it means it is missing logic :)
(I think get_text_config() is missing "encoder" in the encoder_possible_text_config_names list)
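To illustrate the suggestion, here is a hypothetical sketch (not the actual transformers implementation; all class and variable names below are made up) of how a `get_text_config()`-style helper can scan candidate sub-config attribute names, and why adding `"encoder"` to the encoder-side candidate list makes an encoder-decoder config discoverable:

```python
class SubConfig:
    def __init__(self, num_hidden_layers):
        self.num_hidden_layers = num_hidden_layers

class EncoderDecoderConfig:
    # Top-level config holds sub-configs instead of num_hidden_layers itself
    def __init__(self):
        self.encoder = SubConfig(num_hidden_layers=12)
        self.decoder = SubConfig(num_hidden_layers=6)

def get_text_config(config, decoder=False):
    # Candidate attribute names checked in order; "encoder" must be in the
    # encoder-side list for a nested config like EncoderDecoderConfig to
    # resolve to its sub-config instead of the attribute-less top level.
    decoder_names = ["decoder", "text_config"]
    encoder_names = ["encoder", "text_config"]
    for name in (decoder_names if decoder else encoder_names):
        sub = getattr(config, name, None)
        if sub is not None:
            return sub
    return config  # plain configs are their own text config

cfg = EncoderDecoderConfig()
print(get_text_config(cfg, decoder=True).num_hidden_layers)   # 6
print(get_text_config(cfg, decoder=False).num_hidden_layers)  # 12
```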
Thanks for the tip, I tried adding "encoder" to encoder_possible_text_config_names but then hit this constraint when get_text_config() is called with default values:
transformers/src/transformers/configuration_utils.py, lines 1226 to 1230 at d10603f
I have now tried to override the get_text_config() method for T5Gemma specifically, but let me know if you think the base method should be made to work out of the box instead
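As a rough sketch of what such a model-specific override can look like (illustrative names only, not the real `T5GemmaConfig` code), the subclass routes callers to the matching sub-config instead of letting them hit the attribute-less top level:

```python
class BaseConfig:
    def get_text_config(self, decoder=False):
        # Base behavior: a plain config is its own text config.
        return self

class SimpleSubConfig(BaseConfig):
    def __init__(self, num_hidden_layers):
        self.num_hidden_layers = num_hidden_layers

class T5GemmaLikeConfig(BaseConfig):
    def __init__(self):
        self.encoder = SimpleSubConfig(12)
        self.decoder = SimpleSubConfig(6)

    def get_text_config(self, decoder=False):
        # Override: return the encoder or decoder sub-config so callers
        # asking for num_hidden_layers etc. get a config that has them.
        return self.decoder if decoder else self.encoder

cfg = T5GemmaLikeConfig()
print(cfg.get_text_config(decoder=True).num_hidden_layers)   # 6
print(cfg.get_text_config(decoder=False).num_hidden_layers)  # 12
```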
```python
# For encoder-decoder models, we need to use separate configs for encoder and decoder
decoder_cache_kwargs = dynamic_cache_kwargs.copy()
encoder_cache_kwargs = dynamic_cache_kwargs.copy()
```
dynamic_cache_kwargs only contains a config which we know for sure we won't use in this branch -- we don't need to copy the object, we can create a new dictionary
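The suggested simplification can be sketched as follows (stand-in objects, not the real configs):

```python
config = object()          # stand-ins for the real config objects
encoder_config = object()
decoder_config = object()

dynamic_cache_kwargs = {"config": config}

# Before: copy the dict, then immediately overwrite its only key.
decoder_cache_kwargs = dynamic_cache_kwargs.copy()
decoder_cache_kwargs["config"] = decoder_config

# After: just build the dict that is actually needed.
decoder_cache_kwargs = {"config": decoder_config}
encoder_cache_kwargs = {"config": encoder_config}
```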
```python
decoder_cache_kwargs = dynamic_cache_kwargs.copy()
encoder_cache_kwargs = dynamic_cache_kwargs.copy()

if "config" in dynamic_cache_kwargs:
```
this is always true in this branch
[For maintainers] Suggested jobs to run (before merge): run-slow: t5gemma
(see #40553)
closing in favor of #40553 |
What does this PR do?
Fixes `AttributeError: 'T5GemmaConfig' object has no attribute 'num_hidden_layers'` when training. I wasn't sure if this kind of issue is typically unit tested, so I'm happy to add a regression test if you can point me to where it should go :) Minimal repro script below:
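The original repro script was lost when this page was extracted. Purely as an illustration of the failure shape (mock classes, not the real `T5GemmaConfig` or any transformers code): the top-level config of an encoder-decoder model holds sub-configs, so reading `num_hidden_layers` on it raises.

```python
class SubConfig:
    num_hidden_layers = 12

class T5GemmaLikeConfig:
    # Per-stack settings live on the sub-configs, not the top level.
    def __init__(self):
        self.encoder = SubConfig()
        self.decoder = SubConfig()

cfg = T5GemmaLikeConfig()
try:
    cfg.num_hidden_layers  # what the cache-setup code effectively did
except AttributeError as e:
    print(e)  # 'T5GemmaLikeConfig' object has no attribute 'num_hidden_layers'
```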
Stack trace on `main` (commit 58cebc848baa0af2e4ff159fb11504d94179f376):
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.