The motivation is that the `get_num_tokens` method of the LangChain chat models is buggy and emits this warning:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (8599 > 1024). Running this sequence through the model will result in indexing errors
```

It's better to count the tokens ourselves, or perhaps to check the source code first.