Required prerequisites
What version of align-anything are you using?
0.1.0-dev
System information
- transformers version: 4.43.1
- Platform: Linux-5.15.0-1040-nvidia-x86_64-with-glibc2.35
- Python version: 3.11.9
- Huggingface_hub version: 0.24.1
- Safetensors version: 0.4.3
- Accelerate version: 0.33.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
Problem description
Llama 3.1 is naturally supported by the training, evaluation, and deployment modules of Align-Anything. However, according to our tests, a bug in the current transformers release temporarily prevents DeepSpeed ZeRO-3 training. Our developers have reported this issue to the transformers community; we have received a clear response and will continue to follow up.
This bug may affect the training of other model types as well. If you need a stable setup right now, you can temporarily pin transformers to version 4.41.2.
If you want to fine-tune Llama 3.1, we have verified that ZeRO-2 training runs without errors on the latest transformers 4.43.0.
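Until the upstream fix lands, a version guard along the following lines can fail fast before launching a broken run. This is only a sketch based on the version pairings reported above; the config path is a placeholder, and we assume the ZeRO stage sits under zero_optimization.stage in your DeepSpeed config, as in a standard DeepSpeed JSON:

import json

import transformers
from packaging import version

DS_CFGS_PATH = 'PATH'  # placeholder, same DeepSpeed config as in the repro below

with open(DS_CFGS_PATH) as f:
    zero_stage = json.load(f).get('zero_optimization', {}).get('stage', 0)

# Per our tests: ZeRO-3 breaks on transformers > 4.41.2, while ZeRO-2 works on 4.43.x.
if zero_stage == 3 and version.parse(transformers.__version__) > version.parse('4.41.2'):
    raise RuntimeError(
        f'DeepSpeed ZeRO-3 is currently broken with transformers {transformers.__version__}; '
        'pin transformers==4.41.2 or switch to ZeRO-2.'
    )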
Reproducible example code
import contextlib
import json

import deepspeed
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from transformers.integrations.deepspeed import (
    HfDeepSpeedConfig,
    is_deepspeed_zero3_enabled,
)

DEFAULT_BOS_TOKEN: str = '<s>'
DEFAULT_EOS_TOKEN: str = '</s>'
DEFAULT_PAD_TOKEN: str = '<pad>'
DEFAULT_UNK_TOKEN: str = '<unk>'

model_name_or_path = 'PATHTO/Llama-3.1'
ds_cfgs_path = 'PATH'

deepspeed.init_distributed()
with open(ds_cfgs_path) as f:
    ds_cfgs = json.load(f)
ds_cfgs['bf16']['enabled'] = True
# Keep a reference alive so that from_pretrained sees ZeRO-3 and uses zero.Init.
dstchf = HfDeepSpeedConfig(ds_cfgs)

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=2048,
    padding_side='right',
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)


# Reference: https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
def resize_tokenizer_embedding(tokenizer, model) -> None:
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size
    not be divisible by 64.
    """

    def init_new_embeddings(
        embeddings,
        new_num_embeddings: int,
        num_new_embeddings: int,
    ) -> None:
        if embeddings is None:
            return
        params = [embeddings.weight]
        # Debug probe: True for transformers 4.43.1, False for transformers 4.41.2.
        # (exit() commented out so the repro reaches the failing assertion below.)
        print(hasattr(embeddings.weight, 'ds_id'))
        # exit()
        context = (
            deepspeed.zero.GatheredParameters(params, modifier_rank=0)
            if is_deepspeed_zero3_enabled()
            else contextlib.nullcontext()
        )
        with context:
            for param in params:
                if param is None:
                    continue
                # Bug here: param size is 32000 while new_num_embeddings is 32001.
                assert param.size(0) == new_num_embeddings, f'{param.size(0)}, {new_num_embeddings}'
                param_data = param.data
                param_mean = param_data[:-num_new_embeddings].mean(dim=0, keepdim=True)
                param_data[-num_new_embeddings:] = param_mean

    special_tokens_dict = {}
    if tokenizer.pad_token is None:
        special_tokens_dict['pad_token'] = DEFAULT_PAD_TOKEN
    if tokenizer.eos_token is None:
        special_tokens_dict['eos_token'] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens_dict['bos_token'] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens_dict['unk_token'] = DEFAULT_UNK_TOKEN
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    new_num_embeddings = len(tokenizer)
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    if num_new_tokens > 0:
        hf_device_map = getattr(model, 'hf_device_map', {})
        devices = {
            torch.device(device)
            for device in hf_device_map.values()
            if device not in {'cpu', 'disk'}
        }
        is_model_parallel = len(devices) > 1
        if not is_model_parallel:
            model.resize_token_embeddings(new_num_embeddings)
            init_new_embeddings(
                model.get_input_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )
            init_new_embeddings(
                model.get_output_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )


resize_tokenizer_embedding(tokenizer=tokenizer, model=model)
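For reference, since ds_cfgs_path above is a placeholder: a minimal DeepSpeed config in the following spirit is enough to reproduce the issue for us. Only stage 3 and bf16 matter here; the batch-size numbers are illustrative and should match your launcher setup. Written as Python for consistency with the repro script:

import json

# Hypothetical minimal config; zero_optimization.stage == 3 is what triggers
# the failure. Adjust batch sizes so that
# train_batch_size == micro_batch * grad_accum * world_size.
ds_cfgs = {
    'train_batch_size': 8,
    'train_micro_batch_size_per_gpu': 1,
    'gradient_accumulation_steps': 1,
    'bf16': {'enabled': True},
    'zero_optimization': {'stage': 3},
}
with open('ds_cfgs.json', 'w') as f:
    json.dump(ds_cfgs, f, indent=2)

Run the repro under the DeepSpeed launcher with ds_cfgs_path pointing at this file; on transformers 4.43.1 the ds_id probe prints True and the script then hits the size assertion noted in the comment above, while on 4.41.2 it does not.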
Traceback
No response
Expected behavior
No response
Additional context
No response