System Info
Two versions of transformers:
========= NEW VERSION ==============
- transformers version: 4.46.1
- Platform: Linux-5.15.0-1044-nvidia-x86_64-with-glibc2.35
- Python version: 3.11.10
- Huggingface_hub version: 0.23.3
- Safetensors version: 0.4.3
- Accelerate version: 0.32.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA H100 80GB HBM3
=========== OLD VERSION =====================
- transformers version: 4.34.1
- Platform: Linux-5.15.0-1044-nvidia-x86_64-with-glibc2.35
- Python version: 3.11.10
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.3
- Accelerate version: 0.20.3
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@ArthurZucker
Information
Tasks
Reproduction
Pull request #29285 aimed to make the RoPE sin and cos calculations run in float32. But it turns out that changing the device from CPU to CUDA also produces different results, though the difference is not large.
To check this, you can run the following code.
import torch

vals = torch.linspace(0, 1, 30000, dtype=torch.float32)
computes = {
    "cpu_32": vals.float().cpu().cos(),
    "cuda_32": vals.float().cuda().cos(),
    "cpu_16": vals.half().cpu().cos(),
    "cuda_16": vals.half().cuda().cos(),
}

def compare(x, y):
    # Maximum absolute elementwise difference, measured on both devices so the
    # .to() transfer itself cannot mask a discrepancy.
    return max(
        torch.max(torch.abs(x.to(y.device) - y)),
        torch.max(torch.abs(x - y.to(x.device))),
    ).item()

keys = computes.keys()
print(end="\t")
for k in keys:
    print(k, end="\t\t")
print()
for k1 in keys:
    print(k1, end="\t")
    for k2 in keys:
        print(f"{compare(computes[k1], computes[k2]):1.3e}", end="\t")
    print()
The output:
         cpu_32     cuda_32    cpu_16     cuda_16
cpu_32   0.000e+00  5.960e-08  4.389e-04  4.389e-04
cuda_32  5.960e-08  0.000e+00  4.389e-04  4.389e-04
cpu_16   4.389e-04  4.389e-04  0.000e+00  0.000e+00
cuda_16  4.389e-04  4.389e-04  0.000e+00  0.000e+00
This table shows the maximum absolute difference between computations on different devices and with different data types. All float16 results are identical, but the float32 results differ between CPU and CUDA.
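As a quick sanity check (a hypothetical extension of the snippet above, reusing its computes dict and compare helper), you can round both float32 results to float16: since the ~6e-08 CPU/CUDA gap is far below float16 resolution near 1.0, it should collapse to zero at almost every position.

# Hypothetical follow-up: round the float32 results to float16 and compare.
# Expected: 0.000e+00 (or at most one float16 ULP at a rounding boundary).
print(f"{compare(computes['cpu_32'].half(), computes['cuda_32'].half()):1.3e}")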
Previously, all sin and cos computations were performed on the CPU. To maintain backward compatibility, I propose running the float32 computations on the CPU as well.
Here, at lines 142–144 of modeling_llama.py:
https://github.com/unslothai/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L142

emb = torch.cat((freqs, freqs), dim=-1)
cos = emb.cos()
sin = emb.sin()

change to:

emb = torch.cat((freqs, freqs), dim=-1).cpu()
cos = emb.cos().to(device_type)
sin = emb.sin().to(device_type)
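For context, these lines sit inside an autocast-disabled block of LlamaRotaryEmbedding.forward in recent releases, where device_type is already in scope as the device type string of the activations (e.g. "cuda"). A minimal standalone sketch of the surrounding computation with the proposed CPU fallback (simplified: dynamic-RoPE updates and attention scaling are omitted, and the names follow the 4.46 implementation):

import torch

def rope_cos_sin(inv_freq: torch.Tensor, position_ids: torch.Tensor, x: torch.Tensor):
    # Sketch of the core of LlamaRotaryEmbedding.forward with the proposed
    # CPU fallback; not the exact library code.
    inv_freq_expanded = inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()
    device_type = x.device.type
    with torch.autocast(device_type=device_type, enabled=False):
        freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
        # Proposed change: take cos/sin on the CPU for bit-compatibility with
        # older releases, then move the results back to the original device.
        emb = torch.cat((freqs, freqs), dim=-1).cpu()
        cos = emb.cos().to(device_type)
        sin = emb.sin().to(device_type)
    return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)

The cost is one extra device round-trip; since the sin/cos table is computed once and shared across layers in recent versions, the overhead should be small relative to the attention computation itself.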
Impact
According to my experiments, this difference in the sin and cos embedding calculation propagates to the output logits and the generated tokens. The difference between output logit values can exceed 10, and more than 0.1% of the generated tokens can change compared to the original computation.
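A sketch of how such a measurement could be made (a hypothetical helper, not the script I used; logits_old and logits_new would come from two otherwise-identical runs, one per sin/cos device):

import torch

def logit_and_token_drift(logits_old: torch.Tensor, logits_new: torch.Tensor):
    # Both tensors have shape (seq_len, vocab_size) and come from runs that
    # differ only in the device used for the RoPE sin/cos computation.
    max_logit_diff = (logits_old - logits_new).abs().max().item()
    changed_frac = (logits_old.argmax(-1) != logits_new.argmax(-1)).float().mean().item()
    return max_logit_diff, changed_frac  # e.g. (>10, >0.001) in my runs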
Expected behavior
RoPE sin and cos values are expected to match those produced by previous versions of transformers.