Claude code skills for transformers-api #43340
gautamvarmadatla wants to merge 2 commits into huggingface:main from
Conversation
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43340&sha=d33b06
Looks like CI is failing for reasons unrelated to this PR; it only adds Claude Skill markdown files. The failing tests involve dynamic or custom tokenizers. I am happy to rebase and rerun CI once the upstream tokenizer fix is in.
Hi @gautamvarmadatla, this is definitely just written by an LLM and untested, I'm sorry! I think using this will probably reduce the performance of Claude on the codebase, not improve it. We're really looking for smaller PRs, ideally human-written, that actually address observed weaknesses of code agents like Claude on our codebase, not just a big pile of LLM slop with zero verification.
Hi, not sure what made you call it out as AI (maybe it's long?). For certain parts of it I did use AI, but I manually verified everything (e.g. the module tree file). Most of it was created by me (including the overall skill structure, the structure of each reference file, and the examples and content in it), and I ran all the examples locally to be 100% sure. In addition, I generated some test files and also asked a lot of questions. I can share the entire logs and Jupyter notebooks if required :) As far as the length/speed goes, not really. It's a max of one hop, so speed wouldn't be affected much. Another thing to note here is that it's kind of lengthy because I tried to capture the entire Transformers API (including exceptions, which is where most agents fail based on my past work). But if we look at each individual module, it's mostly 200-300 lines of instruction.
Hi @gautamvarmadatla, sorry - I did assume it was LLM-written because of the length. However, I still think it will probably significantly reduce the performance of Claude. Claude is already very familiar with Transformers, because it's a large and famous repo that was definitely included during its training. Most of this either repeats stuff that it is already able to do, or adds confusing instructions, so we definitely don't want it! I would prefer to start with observing a known weakness of Claude on the codebase, and then adding a skill to address that specifically, rather than trying to cover the whole codebase at a high level and repeating things that Claude is clearly already capable of.
Totally agree. That was my starting point too. But after digging through a bunch of existing skills (including ones used with the Claude plugin), I noticed they work way better when “where to look” is spelled out explicitly, even if the model already kind of knows the repo. It's also worth keeping in mind that this “knowing” mostly comes from pretraining, and the library moves fast, so that mental model can get stale. In my testing, having that explicit layer reduced confident-but-wrong answers and helped me jump to the right files faster, though I'm sure there are question types I may have missed or overlooked, and it may add some latency.

That's why I structured this skill more hierarchically. SKILL.md is the table of contents and routing, and the real value is in the reference md files. I designed those around common failure modes like generation internals, Auto dispatch, and tokenizer edge cases, plus concrete grep patterns and “if you see X, check Y” pointers (the search patterns, in particular, improved speed for a lot of the queries I tried).

That said, I'm happy to narrow this down and remove anything that feels redundant. If you have a minute for some quick testing, any notes on what felt helpful vs noisy would be super useful.
Hi @Emasoft and @Rocketknight1, just following up on the PR I opened. I have also seen a lot of interest from others who want to contribute, but I have not seen any additional PRs come in over the past two weeks. Have you had a chance to test it yet? If so, I would really appreciate any feedback and guidance on next steps. If you feel this is not the right direction, please let me know and I can either update the PR or close it. Thanks!
I don't even know how I'd begin testing this, I'm sorry! Like I said, it's enormous and Claude really doesn't seem to struggle with most of what's in here, so I don't know how I'd even observe an improvement. As a result, I think I'd just prefer not to merge it - it's not that I have specific complaints about the text, it's more that I have zero data to suggest it will help, and maintainer time and attention is very limited, so I don't have a spare week to A/B test this in a statistically significant range of scenarios.
What does this PR do?
Fixes #42971
This PR adds a Claude Skill for huggingface/transformers to help contributors navigate the codebase and common development workflows more efficiently.

What’s included
What’s not included
The original issue mentions a plugin request as well, but this PR focuses on delivering the Skill first as a minimal, useful step. Plugin support can be handled in a follow-up PR.
How to test
A few of the many examples I tested include questions like:
API existence / anti-hallucination check:
“Does Transformers have a public argument called
temperature_decay on generate()? If yes, show the exact signature location. If no, point to the closest real knobs and where they’re defined.”

Repo navigation / backend dispatch:
“Where is the logic that decides which backend (PyTorch vs TensorFlow vs Flax) gets used when calling
AutoModel.from_pretrained()? Point to the exact files and decision flow.”

Generation internals / repetition debugging:
“I’m getting repetitive text in long generations even with
repetition_penalty set, what knobs interact most strongly with repetition, and which files apply these penalties during decoding?”

Quantization & loading performance troubleshooting:
“Loading a 7B causal LM with 4-bit quantization and
device_map="auto" is causing slow CPU offload and high RAM. What are the likely causes in the loading path, what knobs should I change, and where are they handled in code?”

Serving/export reality check:
“Is there a supported CLI command
transformers serve for text-generation with batching? If not, what are the supported alternatives in the Transformers ecosystem, and where are the relevant docs/code in this repo?”

PS: This is just an initial draft I put together so maintainers and other community folks can try it out first. Once people test it and share feedback, we can iterate on it and polish/improve it.
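The anti-hallucination question above can also be answered mechanically rather than by trusting the model. A minimal sketch of the pattern, using a hypothetical stand-in for `generate()` (with transformers installed, you would inspect the real method from `GenerationMixin` instead):

```python
import inspect

# Hypothetical stand-in for generate(); the real check would import the
# actual method from transformers rather than defining one here.
def generate(inputs=None, max_new_tokens=20, temperature=1.0,
             repetition_penalty=1.0):
    """Stand-in exposing a few common generation knobs."""

# Collect the parameter names from the signature and check membership.
params = set(inspect.signature(generate).parameters)
print("temperature" in params)        # True: this knob exists on the stand-in
print("temperature_decay" in params)  # False: the invented knob does not
```

Pointing the skill's answer at a check like this is a cheap way to verify "does this argument exist?" claims against the installed version rather than pretraining memory.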
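For the repetition-debugging question, the penalty itself is a small transformation of the logits. Below is a simplified, list-based sketch of the idea behind the repetition penalty used during decoding (the real logits processor in transformers operates on batched tensors; this just illustrates the rule that previously generated tokens get their positive logits divided by the penalty and their negative logits multiplied by it):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalise tokens that have already been generated.

    Simplified sketch: positive logits shrink (divide by penalty),
    negative logits are pushed further down (multiply by penalty).
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] = out[tok] / penalty
        else:
            out[tok] = out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0)
print(penalized)  # [1.0, -2.0, 0.5, 3.0]
```

Note how a penalty of 2.0 halves the already-seen token 0's logit and doubles the magnitude of the negative logit for token 1, while unseen tokens 2 and 3 are untouched.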
For review : @Rocketknight1, @stevhliu, @ArthurZucker
CC : @Emasoft, @coolgalsandiego