feat: improved text splitter #5539

Closed
samdenty wants to merge 7 commits into main from feat/smooth-stream-cjk

samdenty (Contributor) commented Apr 3, 2025

This integrates the custom text-splitter API I created in react-ai-flow, for better handling of split elements.

Features:

Spaces always trail the preceding element. The regex is run over the whole string, so you can use lookahead.
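To make the two properties above concrete, here is a minimal sketch, not the PR's actual implementation. It assumes chunks are cut at the end of each regex match; `splitAtMatchEnds` is a hypothetical helper name. Because the pattern is applied to the whole string, a lookahead can inspect the characters that follow a candidate cut point.

```typescript
// Hypothetical helper (an assumption, not code from this PR): cut `text`
// at the end of every match of `pattern`, so matched whitespace stays
// attached to the end of the chunk that precedes it.
function splitAtMatchEnds(text: string, pattern: RegExp): string[] {
  // Copy the pattern with the g flag so exec() walks the whole string.
  const flags = pattern.global ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const chunks: string[] = [];
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    const end = match.index + match[0].length;
    if (end > start) {
      chunks.push(text.slice(start, end));
      start = end;
    }
    // A zero-width match does not advance lastIndex, so step past it manually.
    if (match[0].length === 0) re.lastIndex++;
  }
  // Whatever follows the last match becomes the final chunk.
  if (start < text.length) chunks.push(text.slice(start));
  return chunks;
}

// Whitespace runs end a chunk, so the spaces trail the preceding word.
const trailing = splitAtMatchEnds('Hello world,  again', /\s+/g);

// A zero-width lookahead cuts before every uppercase letter.
const camel = splitAtMatchEnds('fooBarBaz', /(?=[A-Z])/g);
```

Under these assumptions, `trailing` is `['Hello ', 'world,  ', 'again']` and `camel` is `['foo', 'Bar', 'Baz']`.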

Closes #5231

samdenty changed the title from "feat: proper text splitter" to "feat: improved text splitter" Apr 3, 2025
* origin/main:
  docs: Use new `create-nuxt` tool when starting Nuxt projects (#5175)
  chore (ui/react,vue): update more tests to unified test server (#5509)
  Version Packages (#5532)
  fix (core): send buffered text in smooth stream when stream parts change (#5531)
'delay 10',
{
textDelta: 'line \n\n',
textDelta: 'line \n\n ',
Collaborator

this will create an additional delay

Comment on lines +8 to +10
character: /(?!\s)(?=.)/g,
word: /[\u4E00-\u9FFF\u3040-\u309F\u30A0-\u30FF]|\s+/gm,
line: /\r\n|\r|\n/g,
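As a reading aid for the three patterns above, the sketch below lists what each one matches on a short sample. It makes no claim about how the PR consumes those matches; `matchSpans` is a hypothetical helper introduced only for illustration. The `character` pattern is zero-width and matches before every non-whitespace character. The `word` pattern matches either a single character from the CJK Unified Ideographs, Hiragana, or Katakana ranges, or a run of whitespace. The `line` pattern matches each line terminator.

```typescript
// The three patterns as they appear in the diff under review.
const patterns = {
  character: /(?!\s)(?=.)/g,
  word: /[\u4E00-\u9FFF\u3040-\u309F\u30A0-\u30FF]|\s+/gm,
  line: /\r\n|\r|\n/g,
};

// Hypothetical helper (an assumption, not part of the PR): return
// [index, matchedText] for every match of `pattern` in `text`.
function matchSpans(text: string, pattern: RegExp): Array<[number, string]> {
  // Fresh copy so lastIndex starts at 0; force the g flag so the loop ends.
  const flags = pattern.global ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const spans: Array<[number, string]> = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    spans.push([match.index, match[0]]);
    // Zero-width matches do not advance lastIndex; step forward manually.
    if (match[0].length === 0) re.lastIndex++;
  }
  return spans;
}

// Zero-width matches before "a" and "b", none before the space.
const characterSpans = matchSpans('a b', patterns.character);

// Each CJK character matches on its own, then the space; "hi" is unmatched.
const wordSpans = matchSpans('你好 hi', patterns.word);

// "\r\n" is matched as one terminator, not two.
const lineSpans = matchSpans('a\r\nb\n', patterns.line);
```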
Collaborator

what will happen to custom user regexp?

Contributor Author

i'll add more tests for custom chunking

lgrammel (Collaborator) commented Apr 3, 2025

This seems very complex, unclear to me if this complexity is needed.

lgrammel (Collaborator) commented Apr 3, 2025

This breaks/changes the current behavior - might be okay but I wonder if there's something simpler we can do.

lgrammel (Collaborator) commented Apr 3, 2025

@samdenty I think overall I'd like to understand what the simplest thing is that we could do to support kanji etc., vs. revamping the whole algorithm.

samdenty (Contributor Author) commented Apr 4, 2025

closing in favour of #5548

samdenty closed this Apr 4, 2025
lgrammel deleted the feat/smooth-stream-cjk branch July 2, 2025 10:52
Development

Successfully merging this pull request may close these issues.

smoothStream does not support CJK characters

2 participants