You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
From this article, I learned that continuous batching and PagedAttention greatly improve the inference performance of large models. I would like to know if fastertransformer has plans to support these two features.
77h2l, rndlr96, SiriusNEO, chiendb97, Yangruipis and 9 more