Claude code skills for transformers-api #43340

Open
gautamvarmadatla wants to merge 2 commits into huggingface:main from gautamvarmadatla:claude-transformers-skill

Conversation

@gautamvarmadatla

What does this PR do?

Fixes #42971

This PR adds a Claude Skill for the huggingface/transformers repository to help contributors navigate the codebase and common development workflows more efficiently.

What’s included

  • A repo-specific Claude Skill (SKILL.md and corresponding reference files) describing:
    • Key library entry points and directory map (models, configs, tokenizers, generation, pipelines, trainer, etc.)
    • Common contributor workflows
    • Conventions and gotchas that help Claude give higher-quality, repo-aligned guidance

What’s not included

  • Claude Code plugin support is not implemented in this PR.
    The original issue mentions a plugin request as well, but this PR focuses on delivering the Skill first as a minimal, useful step. Plugin support can be handled in a follow-up PR.

How to test

  • Load the repository in Claude and verify the Skill is discovered.
  • Ask a few repo-navigation questions (e.g., “Where do model configs live?” / “What tests should I run after changing X?”) and confirm Claude follows the Skill’s structure and pointers.

A few of the many examples I tested include questions like:

  • API existence / anti-hallucination check:
    “Does Transformers have a public argument called temperature_decay on generate()? If yes, show the exact signature location. If no, point to the closest real knobs and where they’re defined.”

  • Repo navigation / backend dispatch:
    “Where is the logic that decides which backend (PyTorch vs TensorFlow vs Flax) gets used when calling AutoModel.from_pretrained()? Point to the exact files and decision flow.”

  • Generation internals / repetition debugging:
    “I’m getting repetitive text in long generations even with repetition_penalty set, what knobs interact most strongly with repetition, and which files apply these penalties during decoding?”

  • Quantization & loading performance troubleshooting:
    “Loading a 7B causal LM with 4-bit quantization and device_map="auto" is causing slow CPU offload and high RAM. What are the likely causes in the loading path, what knobs should I change, and where are they handled in code?”

  • Serving/export reality check:
    “Is there a supported CLI command transformers serve for text-generation with batching? If not, what are the supported alternatives in the Transformers ecosystem, and where are the relevant docs/code in this repo?”
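
The API-existence check in the first example can also be approximated mechanically. The following is a minimal sketch, not part of this PR: `check_knob` is a hypothetical helper, and `KNOWN_GENERATION_KNOBS` is a small, hand-picked, non-exhaustive subset of real generation options. In a live environment the list would presumably be built from the installed library rather than hard-coded.

```python
import difflib

# Assumption: a hand-picked, non-exhaustive subset of real generation options.
# In practice this list would be derived from the installed library.
KNOWN_GENERATION_KNOBS = [
    "temperature",
    "top_k",
    "top_p",
    "repetition_penalty",
    "no_repeat_ngram_size",
    "max_new_tokens",
    "num_beams",
    "do_sample",
]


def check_knob(name, known=KNOWN_GENERATION_KNOBS):
    """Return (exists, closest_matches) for a candidate option name.

    `check_knob` is a hypothetical helper used only for illustration.
    """
    exists = name in known
    # Only look for near-misses when the exact name is absent.
    closest = [] if exists else difflib.get_close_matches(name, known, n=3, cutoff=0.5)
    return exists, closest


exists, closest = check_knob("temperature_decay")
print(exists, closest)
```

An answer that passes this kind of check says "no such argument" and then points to the nearest real option instead of inventing a signature.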

PS: This is just an initial draft I put together so maintainers and other community folks can try it out first. Once people test it and share feedback, we can iterate on it and polish/improve it.

For review: @Rocketknight1, @stevhliu, @ArthurZucker
CC: @Emasoft, @coolgalsandiego

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43340&sha=d33b06

@gautamvarmadatla
Author

gautamvarmadatla commented Jan 19, 2026

Looks like CI is failing for reasons unrelated to this PR. This PR only adds Claude Skill markdown files.

The failing tests involve dynamic or custom tokenizers where AutoTokenizer.from_pretrained(..., trust_remote_code=True) returns TokenizersBackend instead of CustomTokenizerFast. This matches a known upstream regression where auto_map is ignored in some cases. See issue #43202.

CI job link: https://app.circleci.com/pipelines/github/huggingface/transformers/160360/workflows/306aee43-9030-477f-919e-3d09752353dd/jobs/2110009/tests

I am happy to rebase and rerun CI once the upstream tokenizer fix is in main. Alternatively, please let me know if I should open a separate issue and PR to fix this.

@Rocketknight1
Member

Hi @gautamvarmadatla, this is definitely just written by an LLM and untested, I'm sorry! I think using this will probably reduce the performance of Claude on the codebase, not improve it.

We're really looking for smaller PRs, ideally human-written, that actually address observed weaknesses of code agents like Claude on our codebase, not just a big pile of LLM slop with zero verification

@gautamvarmadatla
Author

gautamvarmadatla commented Jan 19, 2026

> Hi @gautamvarmadatla, this is definitely just written by an LLM and untested, I'm sorry! I think using this will probably reduce the performance of Claude on the codebase, not improve it.
>
> We're really looking for smaller PRs, ideally human-written, that actually address observed weaknesses of code agents like Claude on our codebase, not just a big pile of LLM slop with zero verification

Hi, I'm not sure what made you call it out as AI (maybe it's the length?). I did use AI for certain parts, but I manually verified everything (e.g. the module tree file). Most of it was created by me (including the overall skill structure, the structure of each reference file, and the examples and content in them), and I ran all the examples locally to be 100% sure. In addition, I generated some test files and asked a lot of questions. I can share the full logs and Jupyter notebooks if required :)

As for length and speed, not really: lookups are at most one hop, so speed shouldn't be affected much. It's also worth noting that it's lengthy because I tried to capture the entire Transformers API (including exceptions, which is where most agents fail, based on my past work). Each individual module, though, is mostly 200-300 lines of instructions.

@Rocketknight1
Member

Hi @gautamvarmadatla, sorry - I did assume it was LLM-written because of the length. However, I still think it will probably significantly reduce the performance of Claude. Claude is already very familiar with Transformers, because it's a large and famous repo that was definitely included during its training. Most of this either repeats stuff that it already is able to do, or adds confusing instructions, so we definitely don't want it! I would prefer to start with observing a known weakness of Claude on the codebase, and then adding a skill to address that specifically, rather than trying to cover the whole codebase at a high level and repeating things that Claude is clearly already capable of.

@gautamvarmadatla
Author

> Hi @gautamvarmadatla, sorry - I did assume it was LLM-written because of the length. However, I still think it will probably significantly reduce the performance of Claude. Claude is already very familiar with Transformers, because it's a large and famous repo that was definitely included during its training. Most of this either repeats stuff that it already is able to do, or adds confusing instructions, so we definitely don't want it! I would prefer to start with observing a known weakness of Claude on the codebase, and then adding a skill to address that specifically, rather than trying to cover the whole codebase at a high level and repeating things that Claude is clearly already capable of.

Totally agree. That was my starting point too.

But after digging through a bunch of existing skills (including ones used with the Claude plugin), I noticed they work way better when “where to look” is spelled out explicitly, even if the model already kind of knows the repo. It’s also worth keeping in mind that this “knowing” mostly comes from pretraining, and the library moves fast, so that mental model can get stale.

In my testing, having that explicit layer reduced confident-but-wrong answers and helped me jump to the right files faster, though I’m sure there are question types I may have missed or overlooked, and it may add some latency. That’s why I structured this skill more hierarchically. SKILL.md is the table of contents and routing, and the real value is in the reference md files. I designed those around common failure modes like generation internals, Auto dispatch, and tokenizer edge cases, plus concrete grep patterns and “if you see X, check Y” pointers, etc (the search patterns, in particular, improved speed for a lot of the queries I tried).
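
To make the routing idea concrete, here is a rough sketch of an "if you see X, check Y" table. Everything in it is a placeholder chosen for illustration: the `ROUTES` entries, the reference file names, and the `route` helper are hypothetical and are not the actual contents of the skill.

```python
import re

# Hypothetical routing table: symptom pattern -> (reference file, grep pattern).
# File names and patterns are illustrative placeholders only.
ROUTES = [
    (re.compile(r"repetit|sampling|beam", re.I),
     "reference/generation.md", "repetition_penalty"),
    (re.compile(r"auto(model|tokenizer|config)|dispatch", re.I),
     "reference/auto_classes.md", "from_pretrained"),
    (re.compile(r"tokeniz", re.I),
     "reference/tokenizers.md", "trust_remote_code"),
]


def route(question):
    """Return (reference_file, grep_pattern) for the first matching rule, or None."""
    for pattern, ref_file, grep_pattern in ROUTES:
        if pattern.search(question):
            return ref_file, grep_pattern
    return None


print(route("Why is my output repetitive in long generations?"))
```

Rules are checked in order and the first match wins, so more specific patterns should come before broader ones.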

That said, I’m happy to narrow this down and remove anything that feels redundant. If you have a minute for some quick testing, any notes on what felt helpful vs noisy would be super useful.

@gautamvarmadatla
Author

Hi @Emasoft and @Rocketknight1,

Just following up on the PR I opened. I have also seen a lot of interest from others who want to contribute, but I have not seen any additional PRs come in over the past two weeks.

Have you had a chance to test it yet? If so, I would really appreciate any feedback and guidance on next steps. If you feel this is not the right direction, please let me know and I can either update the PR or close it.

Thanks!

@Rocketknight1
Member

I don't even know how I'd begin testing this, I'm sorry! Like I said, it's enormous and Claude really doesn't seem to struggle with most of what's in here, so I don't know how I'd even observe an improvement.

As a result, I think I'd just prefer not to merge it - it's not that I have specific complaints about the text, it's more that I have zero data to suggest it will help, and maintainer time and attention is very limited, so I don't have a spare week to A/B test this in a statistically significant range of scenarios.
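
For a sense of what such an A/B comparison might involve, one possible approach is a paired test: run the same questions with and without the skill and compare pass/fail outcomes. This is a sketch only. `paired_sign_test` is a hypothetical helper, the two-sided exact sign test is one reasonable choice among several, and the outcome lists are made up for illustration, not measured results.

```python
from math import comb


def paired_sign_test(baseline, with_skill):
    """Two-sided exact sign test on paired pass/fail outcomes.

    Only discordant pairs (one condition passed, the other failed) carry
    information; under the null hypothesis each direction is equally likely.
    Returns (improved, regressed, p_value).
    """
    if len(baseline) != len(with_skill):
        raise ValueError("outcome lists must be the same length")
    improved = sum(1 for b, s in zip(baseline, with_skill) if s and not b)
    regressed = sum(1 for b, s in zip(baseline, with_skill) if b and not s)
    n = improved + regressed
    if n == 0:
        return improved, regressed, 1.0
    k = min(improved, regressed)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return improved, regressed, min(1.0, 2 * tail)


# Made-up outcomes for illustration only (True = correct answer).
baseline = [False] * 8 + [True, True]
with_skill = [True] * 8 + [True, False]
print(paired_sign_test(baseline, with_skill))
```

Questions where both runs agree contribute nothing to the test, so the number of questions needed depends on how often the two conditions actually disagree.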

Development

Successfully merging this pull request may close these issues.

Please create a Huggingface Transformers SKILL for Claude