Different LlamaRotaryEmbedding in old and new versions of transformers #34657

@ivankrylatskoe

System Info

Two versions of transformers:
========= NEW VERSION ==============

  • transformers version: 4.46.1
  • Platform: Linux-5.15.0-1044-nvidia-x86_64-with-glibc2.35
  • Python version: 3.11.10
  • Huggingface_hub version: 0.23.3
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

=========== OLD VERSION =====================

  • transformers version: 4.34.1
  • Platform: Linux-5.15.0-1044-nvidia-x86_64-with-glibc2.35
  • Python version: 3.11.10
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.4.3
  • Accelerate version: 0.20.3
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Pull request #29285 aimed to make the sin and cos calculations of RoPE run in float32.

However, it seems that moving the computation from CPU to CUDA also produces different results, although the difference is small.

To check this, you can run the following code:

import torch

vals = torch.linspace(0, 1, 30000, dtype=torch.float32)

# cos() of the same values on each device, in float32 and float16
computes = {
    "cpu_32": vals.float().cpu().cos(),
    "cuda_32": vals.float().cuda().cos(),
    "cpu_16": vals.half().cpu().cos(),
    "cuda_16": vals.half().cuda().cos(),
}

def compare(x, y):
    # Maximum absolute difference, checked with the subtraction on either device
    return max(torch.max(torch.abs(x.to(y.device) - y)),
               torch.max(torch.abs(x - y.to(x.device)))).item()

keys = computes.keys()
print(end='\t')
for k in keys:
    print(k, end='\t\t')
print()
for k1 in keys:
    print(k1, end='\t')
    for k2 in keys:
        print(f"{compare(computes[k1], computes[k2]):1.3e}", end='\t')
    print()

The output:

    	cpu_32		cuda_32		cpu_16		cuda_16		
cpu_32	0.000e+00	5.960e-08	4.389e-04	4.389e-04	
cuda_32	5.960e-08	0.000e+00	4.389e-04	4.389e-04	
cpu_16	4.389e-04	4.389e-04	0.000e+00	0.000e+00	
cuda_16	4.389e-04	4.389e-04	0.000e+00	0.000e+00

This table shows the maximum difference between calculations on different devices and using different data types.

You can see that all float16 results are identical, but the float32 results differ between CUDA and CPU. The sketch below checks the same effect on an actual RoPE angle table.
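
A minimal sketch of that check, assuming typical Llama parameters (base 10000, head_dim 128, 4096 positions; these values are illustrative, not read from a config):

import torch

# Build a Llama-style RoPE angle table and compare cos computed on CPU vs CUDA.
base, head_dim, seq_len = 10000.0, 128, 4096
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
freqs = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, head_dim), as in modeling_llama.py

cos_cpu = emb.cos()
cos_cuda = emb.cuda().cos().cpu()
print(f"max |cos_cpu - cos_cuda| = {(cos_cpu - cos_cuda).abs().max().item():.3e}")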

Previously, all sin and cos computations were performed on CPU. To maintain backward compatibility, I propose running the float32 computations on CPU. Here:

https://github.com/unslothai/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L142

142            emb = torch.cat((freqs, freqs), dim=-1)
143            cos = emb.cos()
144            sin = emb.sin()

change to

142            emb = torch.cat((freqs, freqs), dim=-1).cpu()
143            cos = emb.cos().to(device_type)
144            sin = emb.sin().to(device_type)
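
For context, a minimal self-contained sketch of where this change would sit, mirroring the structure of the linked forward method (the class name and parameter values are illustrative, not the upstream code):

import torch
from torch import nn

class RotarySketch(nn.Module):
    """Minimal rotary-embedding module following the v4.46-era layout (a sketch, not the library class)."""
    def __init__(self, head_dim=128, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        position_ids_expanded = position_ids[:, None, :].float()
        device_type = x.device.type if x.device.type != "mps" else "cpu"
        with torch.autocast(device_type=device_type, enabled=False):
            freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
            emb = torch.cat((freqs, freqs), dim=-1).cpu()  # proposed: compute angles on CPU
            cos = emb.cos().to(device_type)                # proposed: move results back
            sin = emb.sin().to(device_type)
        return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)

x = torch.randn(1, 8, 128, device="cuda", dtype=torch.float16)
position_ids = torch.arange(8, device="cuda")[None, :]
cos, sin = RotarySketch().cuda()(x, position_ids)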

Impact

According to my study, this difference in the sin & cos embedding calculation affects both the output logits and the generated tokens.
The difference between output logit values can exceed 10, and more than 0.1% of output tokens may change compared to the original calculations.
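
To illustrate the mechanism, a sketch that propagates the cos/sin discrepancy through a single attention-score computation (rotate_half follows the usual Llama convention; the random q/k and the sizes are assumptions for illustration, and this does not reproduce the >10 logit differences above, which come from a full model):

import torch

def rotate_half(x):
    # Standard Llama rotation helper: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

torch.manual_seed(0)
head_dim, seq_len = 128, 4096
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
freqs = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)

q = torch.randn(seq_len, head_dim)
k = torch.randn(seq_len, head_dim)

def attn_scores(cos, sin):
    # Rotate q and k, then form scaled dot-product attention scores.
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return (q_rot @ k_rot.T) / head_dim ** 0.5

delta = attn_scores(emb.cos(), emb.sin()) - attn_scores(emb.cuda().cos().cpu(), emb.cuda().sin().cpu())
print(f"max attention-score difference: {delta.abs().max().item():.3e}")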

Expected behavior

RoPE sin and cos values are expected to be the same as in previous versions of transformers.
