It seems that the maximum sequence length supported for GPT is 4096:
(https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/models/multi_gpu_gpt/ParallelGptDecoderLayerWeight.h#L31)
BERT seems to have the same maximum sequence length.
May I ask the following questions:
- Where does this constraint come from, i.e., which kernel imposes it?
- Do you have a plan to support longer sequence lengths?
Thanks!