Improve GPTNeoX model card following standardization guidelines#38550
Open

Gurudeep-hn wants to merge 1 commit into huggingface:main
Conversation
- Add clear description highlighting unique features (RoPE, parallel layers)
- Include Pipeline, AutoModel, and CLI usage examples (a Pipeline sketch follows below)
- Add quantization example for better accessibility on consumer hardware
- Follow template format from issue huggingface#36979
- Provide comprehensive code examples for different use cases
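For context, a minimal sketch of the kind of Pipeline usage the description refers to (an illustration only, not necessarily the exact snippet added by the PR):

```py
import torch
from transformers import pipeline

# The prompt and settings are illustrative; the 20B checkpoint needs roughly 40GB of
# memory in float16, so a smaller GPT-NeoX-style checkpoint (e.g. EleutherAI/pythia-70m)
# can stand in for a quick local test.
generator = pipeline(
    task="text-generation",
    model="EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
)
print(generator("Plants create energy through a process known as", max_new_tokens=50)[0]["generated_text"])
```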
Member: cc @stevhliu
stevhliu (Member) reviewed on Jun 3, 2025:
Thanks, make sure you're following the template please! 🤗
Suggested change:

```diff
- # GPTNeoX
+ # GPT-NeoX
```
Diff context (the badge block removed by the PR):

```diff
- <div class="flex flex-wrap space-x-1">
- <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
- <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
- </div>
```

The badges shouldn't be removed, and they go above # GPT-NeoX and should be aligned to the right. It should also include the FlashAttention and SDPA badges. Check the example template to see what it should look like!

Suggested change:

```diff
- GPTNeoX is a 20 billion parameter autoregressive language model that represents a breakthrough in open-source large language models. What makes GPTNeoX unique is its use of rotary positional embeddings (RoPE) instead of learned positional embeddings, allowing for better extrapolation to longer sequences than traditional transformer models. It also employs parallel attention and feedforward layers, making it more efficient during both training and inference.
+ [GPT-NeoX](https://huggingface.co/papers/2204.06745) is a fully open-source 20B language model built for transparency and improving research on LLM training and AI safety and interpretability. It uses rotary positional embeddings (RoPE) to better handle longer sequences and computes attention and feedforward layers in parallel for efficiency. It is trained on the [Pile](https://huggingface.co/datasets/EleutherAI/pile), a 825GB dataset consisting of 22 smaller high-quality datasets.
```
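As a side note on the "computes attention and feedforward layers in parallel" wording in the suggested text, here is a rough sketch of what that layer structure looks like (hypothetical module names, not the actual `GPTNeoXLayer` implementation):

```py
import torch
from torch import nn

class ParallelBlock(nn.Module):
    """Attention and feedforward both read the same layer input; their outputs are summed."""

    def __init__(self, hidden_size: int, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden_size)
        self.ln_mlp = nn.LayerNorm(hidden_size)
        self.attn = attn
        self.mlp = mlp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sequential block: h = x + attn(ln(x)); out = h + mlp(ln(h))
        # Parallel block:   out = x + attn(ln(x)) + mlp(ln(x)); the two sublayers can run concurrently.
        return x + self.attn(self.ln_attn(x)) + self.mlp(self.ln_mlp(x))
```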
The diff context includes the `## Overview` heading followed by the paragraph below; the suggested change removes that paragraph:

```diff
- Developed by EleutherAI and trained on the comprehensive Pile dataset, GPTNeoX delivers particularly strong few-shot reasoning capabilities that often exceed similarly sized models like GPT-3. At the time of its release, it was the largest dense autoregressive model with publicly available weights.
```
Diff context:

```md
mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and
gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source
the training and evaluation code, as well as the model weights, at [https://github.com/EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
```

Suggested change:

```diff
- The original paper can be found [here](https://hf.co/papers/2204.06745), and you can find the official checkpoints on the [Hugging Face Hub](https://huggingface.co/EleutherAI/gpt-neox-20b).
+ You can find the original GPT-NeoX checkpoint under the [EleutherAI](https://huggingface.co/EleutherAI/gpt-neox-20b) organization.
```
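For reference, loading that checkpoint with the Auto classes might look like the following (a sketch; `device_map="auto"` requires accelerate, and the 20B model needs a correspondingly large amount of GPU or CPU memory):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```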
Diff context:

```bash
pip install -U flash-attn --no-build-isolation
# Using transformers-cli
transformers-cli env
```

Suggested change:

```diff
- transformers-cli env
+ echo -e "Plants create energy through a process known as" | transformers-cli run --task text-generation --model EleutherAI/gpt-neox-20b --device 0
```
Diff context:

```md
### Usage
### Quantization Example
```

Suggested change (remove the heading):

```diff
- ### Quantization Example
```
Diff context:

```md
To load a model using Flash Attention 2, we can pass the argument `attn_implementation="flash_attention_2"` to [`.from_pretrained`](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained). We'll also load the model in half-precision (e.g. `torch.float16`), since it results in almost no degradation to audio quality but significantly lower memory usage and faster inference:
For easier deployment on consumer hardware, you can use quantization:
```

Suggested change:

```diff
- For easier deployment on consumer hardware, you can use quantization:
+ Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
+
+ The example below uses [torchao](../quantization/torchao) to only quantize the weights to int4.
```
Diff context:

```py
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to(device)
...
from transformers import AutoTokenizer, AutoModelForCausalLM
```

The reviewer's suggested replacement example:

```py
import torch
from transformers import TorchAoConfig, AutoModelForCausalLM, AutoTokenizer

quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", torch_dtype=torch.bfloat16, device_map="auto", quantization_config=quantization_config)

inputs = tokenizer("The future of AI is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=50, cache_implementation="static")
tokenizer.decode(outputs[0], skip_special_tokens=True)
```
Diff context:

```md
## Resources

- [Causal language modeling task guide](../tasks/language_modeling)
```

Suggested change:

```diff
- GPTNeoX uses rotary positional embeddings (RoPE) instead of learned positional embeddings, which allows for better extrapolation to longer sequences. The model also employs parallel attention and feedforward layers, making it more efficient during training.
+ - GPT-NeoX uses a different tokenizer than [GPT-J](./gptj) and [GPT-Neo](./gpt_neo). This tokenizer allocates additional tokens to whitespace characters, making the model more suitable for certain tasks like code generation.
```
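To see the whitespace handling the suggested note describes, a quick check along these lines could work (illustrative only; the exact tokens depend on the tokenizer version):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# An indented code snippet: the GPT-NeoX tokenizer has dedicated tokens for runs of
# whitespace, so the eight-space indentation should not expand into one token per space.
code = "def add(a, b):\n        return a + b"
print(tokenizer.tokenize(code))
print(len(tokenizer(code)["input_ids"]))
```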
Addresses #36979

This PR standardizes the GPTNeoX model card; the specific changes are listed at the top of this description. The improved model card now provides users with everything they need to understand and use GPT-NeoX effectively, from basic Pipeline usage to advanced quantization techniques.