
[BUG] Error in llama3.1 resizing embedding with ZeRO 3 #26

@Gaiejj

Description


Required prerequisites

What version of align-anything are you using?

0.1.0-dev

System information

  • transformers version: 4.43.1
  • Platform: Linux-5.15.0-1040-nvidia-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • Huggingface_hub version: 0.24.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:

Problem description

Llama 3.1 is naturally supported by the training, evaluation, and deployment modules of Align-Anything. However, according to our tests, an issue in the current transformers release means it temporarily cannot be trained with DeepSpeed's ZeRO-3. Our developers have reported this issue to the transformers community; we have received a clear response and will continue to follow up.

This bug may also affect the training of other model types. If you need a stable version for training right now, you can temporarily pin transformers to version 4.41.2.
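As a quick guard, a training script can check the installed transformers version before attempting ZeRO-3 training. The sketch below is illustrative: the function name is hypothetical, and the 4.43 cutoff is an assumption based only on this report (4.43.1 observed broken, 4.41.2 known good).

```python
def zero3_resize_known_good(tf_version: str) -> bool:
    """Return True if `tf_version` predates the 4.43.x series, where this
    report observed the ZeRO-3 embedding-resize failure (assumed cutoff)."""
    major, minor = (int(x) for x in tf_version.split('.')[:2])
    return (major, minor) < (4, 43)
```

A script could call this with `transformers.__version__` and fall back to ZeRO-2 (or refuse to start) when it returns `False`.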

If you want to fine-tune Llama 3.1, we have verified that ZeRO-2 runs without errors on the latest transformers 4.43.0.
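For reference, a minimal ZeRO-2 DeepSpeed config along these lines can be passed as `ds_cfgs_path` in the reproducer below (illustrative values, not the project's shipped config):

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2
  },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": 1
}
```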

Reproducible example code

import contextlib
import json

import torch
import deepspeed

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from transformers.integrations.deepspeed import (
    HfDeepSpeedConfig,
    is_deepspeed_zero3_enabled,
)


DEFAULT_BOS_TOKEN: str = '<s>'
DEFAULT_EOS_TOKEN: str = '</s>'
DEFAULT_PAD_TOKEN: str = '<pad>'
DEFAULT_UNK_TOKEN: str = '<unk>'

model_name_or_path = 'PATHTO/Llama-3.1'
ds_cfgs_path = 'PATH'

deepspeed.init_distributed()

with open(ds_cfgs_path) as f:
    ds_cfgs = json.load(f)
    ds_cfgs['bf16']['enabled'] = True

dstchf = HfDeepSpeedConfig(ds_cfgs)

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=2048,
    padding_side='right',
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
)

# Reference: https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
def resize_tokenizer_embedding(tokenizer, model) -> None:
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    def init_new_embeddings(
        embeddings,
        new_num_embeddings: int,
        num_new_embeddings: int,
    ) -> None:
        if embeddings is None:
            return

        params = [embeddings.weight]
        # True for transformers 4.43.1, False for transformers 4.41.2,
        # i.e. 4.43.1 leaves the resized weight as a ZeRO-3 partitioned parameter.
        print(hasattr(embeddings.weight, 'ds_id'))
        context = (
            deepspeed.zero.GatheredParameters(params, modifier_rank=0)
            if is_deepspeed_zero3_enabled()
            else contextlib.nullcontext()
        )
        with context:
            for param in params:
                if param is None:
                    continue
                assert param.size(0) == new_num_embeddings, f'{param.size(0)}, {new_num_embeddings}'
                # Bug here under ZeRO-3 with transformers 4.43.1:
                # param.size(0) is 32000 while new_num_embeddings is 32001
                param_data = param.data
                param_mean = param_data[:-num_new_embeddings].mean(dim=0, keepdim=True)
                param_data[-num_new_embeddings:] = param_mean

    special_tokens_dict = {}
    if tokenizer.pad_token is None:
        special_tokens_dict['pad_token'] = DEFAULT_PAD_TOKEN
    if tokenizer.eos_token is None:
        special_tokens_dict['eos_token'] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens_dict['bos_token'] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens_dict['unk_token'] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    new_num_embeddings = len(tokenizer)

    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    if num_new_tokens > 0:
        hf_device_map = getattr(model, 'hf_device_map', {})
        devices = {
            torch.device(device)
            for device in hf_device_map.values()
            if device not in {'cpu', 'disk'}
        }
        is_model_parallel = len(devices) > 1

        if not is_model_parallel:
            model.resize_token_embeddings(new_num_embeddings)

            init_new_embeddings(
                model.get_input_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )
            init_new_embeddings(
                model.get_output_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )

resize_tokenizer_embedding(tokenizer=tokenizer, model=model)
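The mean-initialization step at the heart of `init_new_embeddings` can be exercised in isolation, which makes it clear what the ZeRO-3 gather is supposed to enable. The sketch below is a pure-Python analogue (lists of rows instead of torch tensors; the function name is mine, not from the reproducer):

```python
def mean_init_new_rows(weight, num_new):
    """Return a copy of `weight` (a list of equal-length float rows) whose
    last `num_new` rows are replaced by the column-wise mean of the earlier
    rows — the same recipe `init_new_embeddings` applies to the embedding
    matrix after `resize_token_embeddings`."""
    old = weight[:-num_new]
    dim = len(weight[0])
    mean = [sum(row[i] for row in old) / len(old) for i in range(dim)]
    return weight[:-num_new] + [mean[:] for _ in range(num_new)]
```

Under ZeRO-3, each rank only holds a shard of the embedding matrix, which is why the original code wraps this step in `deepspeed.zero.GatheredParameters` to see the full `(vocab_size, hidden_dim)` weight.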

Traceback

No response

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)