Hi Simon - thanks for the nifty tooling. So useful when debugging with these models.
I'd like to request a feature: an option to view the actual tokenized text (the bytes each token decodes to).
Having this in ttok would make it quick to do, alongside the other functionality it provides for exploration at the command line.
It would also make it easy to compare how different model variants tokenize the same text.
Example:
echo -n "this tool is much fineness!" | ttok --bytes
# [b'this', b' tool', b' is', b' much', b' fin', b'eness', b'!']
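
For reference, here is a rough sketch of how the per-token bytes could be produced with tiktoken, which as far as I know is what ttok uses under the hood. The `--bytes` flag above is only my proposal, and the helper names `token_bytes` and `show_token_bytes` are made up for illustration; the `gpt-3.5-turbo` default is an assumption too.

```python
from typing import List


def token_bytes(text: str, encoding) -> List[bytes]:
    """Return the raw bytes behind each token that `encoding` produces for `text`.

    `encoding` is expected to behave like a tiktoken Encoding: it needs
    `encode(text)` and `decode_single_token_bytes(token)`.
    """
    return [encoding.decode_single_token_bytes(t) for t in encoding.encode(text)]


def show_token_bytes(text: str, model: str = "gpt-3.5-turbo") -> None:
    """Hypothetical helper: print the token bytes for `text` under `model`."""
    import tiktoken  # third-party: pip install tiktoken

    encoding = tiktoken.encoding_for_model(model)
    print(token_bytes(text, encoding))
```

Calling `show_token_bytes("this tool is much fineness!")` with two different model names would give the side-by-side comparison mentioned above.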