It seems that the maximum sequence length supported for GPT is 4096:
(https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/models/multi_gpu_gpt/ParallelGptDecoderLayerWeight.h#L31)
BERT seems to have the same maximum sequence length.
May I ask the following questions:
- Where does this constraint come from, i.e., which kernel imposes it?
- Do you have a plan to support longer sequence lengths?
Thanks!