
ValueError: block_mask was created for block_mask.shape=(1, 1, 20096, 40192) but got q_len=20000 and kv_len=40000. #196

@Jim2016713

I encountered the following error during training:
[rank3]: ValueError: block_mask was created for block_mask.shape=(1, 1, 20096, 40192) but got q_len=20000 and kv_len=40000. As the block mask was created for a larger length than you're using it for, you can either 1. create a new block mask with the correct length, or 2. 'adjust' the existing block mask to the correct length by calling block_mask._adjust(q_len, kv_len). This essentially 'crops' the block mask to the upper left corner, which does not work for all mask_mods!
[rank2]: Traceback (most recent call last):
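One possible reading of the lengths in the error is sketched below. It assumes a block size of 128 (FlexAttention's default, not confirmed for this codebase) and only checks the arithmetic. The hypothesis is that the mask was built from a block-padded query length, with the key/value length set to twice that. This is not a confirmed root cause, and `round_up` is a hypothetical helper used only for illustration.

```python
BLOCK_SIZE = 128  # assumed block size (FlexAttention's default); may differ in this project


def round_up(n: int, multiple: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: round n up to the next multiple of `multiple` (integer math)."""
    return -(-n // multiple) * multiple


q_len, kv_len = 20000, 40000    # lengths reported in the error
mask_q, mask_kv = 20096, 40192  # lengths the block mask was created for

padded_q = round_up(q_len)
print(padded_q, padded_q == mask_q)              # mask q length equals padded q_len
print(2 * padded_q, 2 * padded_q == mask_kv)     # mask kv length equals 2 * padded q_len
print(round_up(kv_len), round_up(kv_len) == mask_kv)  # but not padded kv_len
```

If this hypothesis holds, the mask length (2 × 20096) exceeds the actual key/value length (2 × 20000). The error message offers two remedies: build the mask from the unpadded lengths, or crop it with `block_mask._adjust(q_len, kv_len)`. The message itself warns that cropping is not valid for every `mask_mod`.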

Script used (offline data generation):
python ./scripts/data_generation_offline.py \
--target-model-path $TARGET_MODEL_NAME_OR_PATH \
--train-data-path $TRAIN_DATA_PATH \
--seq-length 20000 \
--hf-cache-dir ./output/eagle3_data_gen/cache \
--output-dir ./eagle3_data_gen/training_data \
--max-model-len 20000 \
--batch-size 32 \
--num-preprocessing-workers 32
