
Adding Semantic Text Splitting & Token Text Splitting #720

Merged
kunal0137 merged 18 commits into dev from
696-python-add-chonky-semantic-text-splitting
May 5, 2025

Conversation

@pasupathimuniyappan
Contributor

@pasupathimuniyappan pasupathimuniyappan commented Apr 22, 2025

Description

Add the Chonky semantic text splitting feature to the existing flow and route to it by passing a specific text-splitting parameter.

Changes Made

Added a split_text_semantically() function in text_splitting.py and integrated it with split_text() so requests are routed properly. This required changes to the following files:

  • pyproject.toml -> pinned the version of the chonky library that is used.
  • text_splitting.py -> added the new function and its integration.
  • vector_db_test.py -> modified the test_split_text() testcase slightly to exercise the split_text_semantically function.
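The integration described above can be sketched roughly as follows. This is a hypothetical reconstruction pieced together from the review excerpts later in this thread, not the merged code: the `splitter` callable stands in for chonky's TextSplitter, and `count_tokens` stands in for the tokenizer's counting method.

```python
import pandas as pd

def split_text_semantically(text, document_name, count_tokens, splitter):
    # Hypothetical sketch: `splitter` is a chonky-TextSplitter-style callable
    # that yields semantically coherent chunks from raw text.
    chunks = list(splitter(text))
    return pd.DataFrame(
        [
            # Divider is None here: semantic chunks have no page boundary.
            [document_name, "text", None, i, count_tokens(chunk), chunk]
            for i, chunk in enumerate(chunks)
        ],
        columns=["Source", "Modality", "Divider", "Part", "Tokens", "Content"],
    )

def split_text(text, document_name, count_tokens, split_method="recursive", splitter=None):
    # Route on split_method; the existing recursive path is elided in this sketch.
    if split_method == "semantic":
        return split_text_semantically(text, document_name, count_tokens, splitter)
    raise NotImplementedError("recursive splitting elided in this sketch")
```

The key design point is that split_text stays the single entry point and only dispatches on the new parameter, so existing callers that never pass split_method are unaffected.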

How to Test

Add the chonky library to your Python environment.

The smss can be configured with a default chunking method, e.g.: DEFAULT_CHUNKING_METHOD semantic
Alternatively, the chunking method can be passed in the params of the call.

REACTOR USAGE

CreateEmbeddingsFromDocuments (engine = "1222b449-1bc6-4358-9398-1ed828e4f26a", filePaths = ["fileName1.pdf"], paramValues = [{"chunkingMethod": "semantic"}]);

PYTHON USAGE

from gaas_gpt_vector import VectorEngine
vectorEngine = VectorEngine(engine_id = "1222b449-1bc6-4358-9398-1ed828e4f26a", insight_id = '${i}')
vectorEngine.addDocument(file_paths = ['fileName1.pdf'], param_dict={"chunkingMethod": "semantic"})

SDK USAGE

from ai_server import VectorEngine, ServerClient

server_connection = ServerClient(
    base="http://localhost:9090/Monolith/api",
    access_key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    secret_key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

is_connected = server_connection.connected
print(f"Am I connected to the server? {is_connected}")

vector_engine = VectorEngine(
    engine_id="1222b449-1bc6-4358-9398-1ed828e4f26a",
    insight_id=server_connection.cur_insight,
)

file_path = "C:/Users/rweiler/Desktop/REPOS/python-sdk/python/ai-server/src/ai_server/tests/test_files/constitution.pdf"

vector_engine.addDocument(
    file_paths=[file_path], param_dict={"chunkingMethod": "semantic"}
)

1. Using sample code snippet:

  • Here is a sample code snippet to test the changes described above.
from genai_client import get_tokenizer
import vector_database

embed_tokenizer = get_tokenizer(
    tokenizer_name="BAAI/bge-large-en-v1.5",
    max_tokens=None,
    tokenizer_type="EMBEDDED",
)

vector_database.split_text(
    csv_file_location=sample_csv,
    cfg_tokenizer=embed_tokenizer,
    chunk_unit="tokens",
    chunk_size=10,
    chunk_overlap=0,
    chunking_strategy="PAGE_BY_PAGE",
    split_method="semantic",  # pass this parameter to exercise the split_text_semantically function
)

2. Using Pytest testcase: (Recommended)

  • Alternatively, use the pytest testcase by uncommenting the line split_method="semantic" in the call to split_text().
  • File Path - Semoss\py\testing\vector_database\vector_db_test.py
  • Function Name - test_split_text()

Notes

@pasupathimuniyappan pasupathimuniyappan linked an issue Apr 22, 2025 that may be closed by this pull request
@github-actions

@CodiumAI-Agent /describe

@QodoAI-Agent

Title

[DRAFT PR - DO NOT MERGE] - 696 Python Add Chonky Semantic Text Splitting



PR Type

Enhancement, Tests


Description

  • Add semantic text splitting via Chonky

  • Route split_text by split_method parameter

  • Implement split_text_semantically function

  • Include Chonky dependency in pyproject.toml

  • Update test fixture for semantic splitting


Changes walkthrough 📝

Relevant files

Tests
• vector_db_test.py (py/testing/vector_database/vector_db_test.py): Expand test fixture and semantic test option (+5/-3)
    • Expand sample_csv fixture with longer content
    • Add commented split_method="semantic" option

Enhancement
• text_splitting.py (py/vector_database/utils/text_splitting.py): Integrate semantic splitting via Chonky (+62/-9)
    • Add split_method check in split_text
    • Introduce split_text_semantically function
    • Use chonky.TextSplitter for semantic chunking

Dependencies
• pyproject.toml (py/install_config/pyproject.toml): Add "chonky==0.1.4" dependency (+1/-0)

Need help?
• Type /help how to ... in the comments thread for any questions about PR-Agent usage.
• Check out the documentation for more information.
    @github-actions

    @CodiumAI-Agent /review

    @github-actions

    @CodiumAI-Agent /improve

    @QodoAI-Agent

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Divider Type Inconsistency

    The semantic splitter sets the Divider column to the string "semantic", which may conflict with numerical expectations downstream and cause type or sorting errors.

        "text",
        "semantic",  # No specific page number since it's semantic text splitting
        i,
        cfg_tokenizer.count_tokens(chunk),
        chunk,
    ]
    Potential NameError

    clean_up_string is used inside split_text_semantically without being imported or defined in this scope, leading to a potential NameError at runtime.

    # Initialize Chonky's semantic text splitter (uses transformer models under the hood)
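A minimal fix for the flagged NameError would be to import the helper where it is used, or define it locally. The body below is only a stand-in, since the review excerpt does not show where clean_up_string actually lives or what it does in this repo:

```python
import re

def clean_up_string(text: str) -> str:
    # Stand-in for the project's helper: collapse runs of whitespace
    # and trim the ends before chunking.
    return re.sub(r"\s+", " ", text).strip()
```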
    Missing Semantic Test

    The split_method="semantic" parameter remains commented out, so the semantic splitting branch isn’t actually exercised by the tests; add or enable a test for that path.

    csv_file_location=sample_csv,
    cfg_tokenizer=embed_tokenizer,
    chunk_unit="tokens",
    chunk_size=10,
    chunk_overlap=0,
    chunking_strategy="PAGE_BY_PAGE",
    # split_method="semantic",  # enable it if you want to test the split_text_semantically function
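One way to address the missing semantic test is to parametrize over both methods instead of commenting the kwarg in and out. The sketch below uses stand-in helpers; a real test would call vector_database.split_text with the sample_csv fixture shown earlier:

```python
import pytest

def split_text(text, split_method="recursive"):
    # Stand-in dispatcher mirroring the PR's routing on split_method.
    if split_method == "semantic":
        return [p for p in text.split("\n\n") if p]  # paragraph-level chunks
    return text.split()  # naive stand-in for the recursive path

@pytest.mark.parametrize("split_method", ["recursive", "semantic"])
def test_split_text_both_methods(split_method):
    # Both branches should run and produce at least one chunk.
    chunks = split_text("one two\n\nthree four", split_method=split_method)
    assert chunks
```

This way the semantic branch is always exercised by CI rather than depending on a manually edited test file.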

    @QodoAI-Agent

    PR Code Suggestions ✨

    Possible issue
    Declare split_method parameter

    Ensure split_method is declared in the split_text signature so it is defined when
    used.

    py/vector_database/utils/text_splitting.py [141-146]

    -if split_method == "semantic":
    -    text_results_df = split_text_semantically(
    -        text_results_df=text_results_df,
    -        document_name=document_name,
    -        cfg_tokenizer=cfg_tokenizer,
    -    )
    +def split_text(
    +    main_df: pd.DataFrame,
    +    chunking_strategy: str,
    +    cfg_tokenizer,
    +    chunk_unit: str,
    +    chunk_size: int,
    +    chunk_overlap: int,
    +    split_method: str = "recursive",
    +) -> pd.DataFrame:
    +    document_name = main_df["Source"][0]
    +    if split_method == "semantic":
    +        text_results_df = split_text_semantically(
    +            text_results_df=text_results_df,
    +            document_name=document_name,
    +            cfg_tokenizer=cfg_tokenizer,
    +        )
    +    else:
    +        text_results_df = split_text_recursively(
    +            text_results_df=text_results_df,
    +            chunking_strategy=chunking_strategy,
    +            document_name=document_name,
    +            cfg_tokenizer=cfg_tokenizer,
    +            chunk_unit=chunk_unit,
    +            chunk_size=chunk_size,
    +            chunk_overlap=chunk_overlap,
    +        )
    Suggestion importance[1-10]: 9


    Why: Without split_method in the signature, the semantic branch will raise a NameError; adding the parameter is crucial for functionality.

    High
    Initialize missing DataFrame

    Initialize other_modalities_df before concatenation to avoid a NameError in the
    semantic branch.

    py/vector_database/utils/text_splitting.py [158-159]

     # Combine text chunks with other modalities and save to CSV
    +other_modalities_df = pd.DataFrame()
     result = pd.concat([text_results_df, other_modalities_df], ignore_index=True)
    Suggestion importance[1-10]: 8


    Why: other_modalities_df is undefined when split_method is semantic, so initializing it prevents a NameError.

    Medium
    General
    Consistent Divider column type

    Use a consistent type (e.g. None or a numeric code) for the Divider column instead
    of a string to keep column types uniform.

    py/vector_database/utils/text_splitting.py [307-318]

     text_results_df = pd.DataFrame(
         [
             [
                 document_name,
                 "text",
    -            "semantic",  # No specific page number since it's semantic text splitting
    +            None,  # No specific divider for semantic chunks
                 i,
                 cfg_tokenizer.count_tokens(chunk),
                 chunk,
             ]
             for i, chunk in enumerate(chunks)
         ],
         columns=["Source", "Modality", "Divider", "Part", "Tokens", "Content"],
     )
    Suggestion importance[1-10]: 5


    Why: Changing the Divider value to None ensures a uniform column type, improving data consistency with minor impact.

    Low

    @ryanweiler92 ryanweiler92 changed the title [DRAFT PR - DO NOT MERGE] - 696 Python Add Chonky Semantic Text Splitting Add Semantic Text Splitting with Chonky Python Package Apr 22, 2025
    @ryanweiler92 ryanweiler92 changed the title Add Semantic Text Splitting with Chonky Python Package Adding Semantic Text Splitting & Token Text Splitting May 5, 2025
    @kunal0137 kunal0137 merged commit df7db2b into dev May 5, 2025
    3 checks passed
    @kunal0137 kunal0137 deleted the 696-python-add-chonky-semantic-text-splitting branch May 5, 2025 18:08
    @github-actions

    github-actions bot commented May 5, 2025

    @CodiumAI-Agent /update_changelog

    @QodoAI-Agent

    Changelog updates: 🔄

    2025-05-05 *

    Added

    • Configurable chunkingMethod with support for semantic (Chonky) and token‐based text splitting
    • Integrated semantic and token splitters into Java and Python flows

    To commit the new content to the CHANGELOG.md file, please type:
    '/update_changelog --pr_update_changelog.push_changelog_changes=true'

    manamittal added a commit that referenced this pull request May 20, 2025
    * fix(python): handle eval when it is a single line execution but there is string input with space (#756)
    
    * Update Dockerfile.tomcat (#757)
    
    * fix: tomcat builder setting env var
    
    * fix: updating tomcat to 9.0.104
    
    * Update Dockerfile.ubuntu22.04
    
    * Update Dockerfile.ubuntu22.04
    
    * Update Dockerfile.ubuntu22.04
    
    * feat: creating KubernetesModelScaler class (#763)
    
    * Update Dockerfile.ubuntu22.04
    
    * feat: adding ability to attach a file to a vector db source (#736)
    
    * Added AttachSourceToVectorDbReactor for uploading pdf file to an existing csv file and modified VectorFileDownloadReactor
    
    * fix: proper return for the download and matching the reactor name
    
    * fix: error for downloading single file vs multiple; error for copyToDirectory instead of copyFile
    
    * chore: renaming so reactor matches VectorFileDownload
    
    ---------
    
    Co-authored-by: Maher Khalil <themaherkhalil@gmail.com>
    
    * Update Dockerfile.ubuntu22.04
    
    * Update ubuntu2204.yml
    
    * Update ubuntu2204.yml
    
    * Update ubuntu2204_cuda.yml
    
    * Update Dockerfile.nvidia.cuda.12.5.1.ubuntu22.04
    
    * Update ubuntu2204_cuda.yml
    
    * Update ubuntu2204.yml
    
    * feat: exposing tools calling through models (#764)
    
    * 587 unit test for prernadsutil (#654)
    
    * test(unit): unit tests for the prerna.util.ds package
    
    * test(unit): unit tests for the prerna.util.ds.flatfile package
    
    * test(unit): removed reflections, added paraquet tests
    
    * test(unit): unit tests for the prerna.util.ds package
    
    * test(unit): unit tests for the prerna.util.ds.flatfile package
    
    * test(unit): removed reflections, added paraquet tests
    
    * Update ubuntu2204.yml
    
    * Update ubuntu2204.yml
    
    * Update ubuntu2204.yml
    
    * fix: update pipeline docker buildx version
    
    * fix: ignore buildx
    
    * fix: adjusting pipeline for cuda
    
    * feat: switching dynamic sas to default false (#766)
    
    * fix: changes to account for version 2.0.0 of pyjarowinkler (#769)
    
    * chore: using 'Py' instead of 'py' to be consistent (#770)
    
    * feat: full ast parsing of code to return evaluation of the last expression (#771)
    
    * Python Deterministic Token Trimming for Message Truncation (#765)
    
    * feat: deterministic-token-trimming
    
    * feat: modifying logic such that system prompt is second to last message for truncation
    
    ---------
    
    Co-authored-by: Maher Khalil <themaherkhalil@gmail.com>
    
    * fix: added date added column to enginepermission table (#768)
    
    * fix: add docker-in-docker container to run on sef-hosted runner (#773)
    
    Co-authored-by: Raul Esquivel <resmas.work@gmail.com>
    
    * fix: properly passing in the parameters from kwargs/smss into model limits calculation (#774)
    
    * fix: removing legacy param from arguments (#777)
    
    * fix: Fix docker cache build issue (#778)
    
    * adding no cache
    
    * adding no cache
    
    * feat: Adding Semantic Text Splitting & Token Text Splitting (#720)
    
    * [696] - build - Add chonky semantic text splitting - Added the function for chonky semantic text splitting and integrated with existing flow.
    
    * [696] - build - Add chonky semantic text splitting - Updated the code
    
    * [696] - build - Add chonky semantic text splitting - Updated the code comments
    
    * feat: adding reactor support through java
    
    * feat: updating pyproject.toml with chonky package
    
    * feat: check for default chunking method in smss
    
    * [696] - feat - Add chonku semantic text splitting - Resolved the conflicts
    
    * [696] - feat - Add chonky semantic text splitting - Organized the code.
    
    * feat: adding chunking by tokens and setting as default
    
    * updating comments on chunking strategies
    
    ---------
    
    Co-authored-by: Weiler, Ryan <ryanweiler92@gmail.com>
    Co-authored-by: kunal0137 <kunal0137@gmail.com>
    
    * feat: allowing for tools message in full prompt (#780)
    
    * UPDATE ::: Add docker in docker Dockerfiler (#784)
    
    * add docker in docker Dockerfile
    
    * Update Dockerfile.dind
    
    Remove python and tomcat arguments from Dockerfile
    
    * fix: remove-paddle-ocr (#786)
    
    * [#595] test(unit): adds unit test for prerna.engine.impl.model.kserve
    
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * feat: Tag semoss image (#789)
    
    * adding changes for non-release docker build
    
    * adding non-release build logic to cuda-semoss builder
    
    * updating push branches
    
    * fix: branch names on docker builds
    
    * fix: branch names on docker builds cuda
    
    * fix: adding push condition - change to pyproject toml file; adding event input vars to env vars (#790)
    
    * fix: python builder toml file change (#792)
    
    * fix: Catch errors when calling pixels from Python (#787)
    
    Co-authored-by: Weiler, Ryan <ryanweiler92@gmail.com>
    
    * Creating db links between engines and default apps (#693)
    
    * create db links between engine and default app
    
    * Rename column APPID to TOOL_APP
    
    * feat: add database_tool_app to getUserEngineList
    
    ---------
    
    Co-authored-by: Weiler, Ryan <ryanweiler92@gmail.com>
    
    * Adding sort options to the myengines reactor (#479)
    
    * added sort feature to MyEnginesReactor and genericized reactor imports
    
    * formatting
    
    * overloading method
    
    * validate sortList
    
    ---------
    
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * feat: cleaning up unused imports in MyEngine reactor (#793)
    
    * feat: Create Enum projectTemplate and update CreateAppFromTemplateReactor to accept existing appID for cloning applications (#621)
    
    Co-authored-by: kunal0137 <kunal0137@gmail.com>
    
    * Update GetEngineUsageReactor.java (#417)
    
    Co-authored-by: Maher Khalil <themaherkhalil@gmail.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * Issue 596: Adds Unit Tests for prerna/engine/impl/model/responses and workers (#727)
    
    * [#596] test(unit): adds unit tests
    
    * fix: implements ai-agents suggestions
    
    ---------
    
    Co-authored-by: Jeff Vitunac <jvitunac@gmail.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * 609 implement native blob storage for azure gcp and aws (#674)
    
    * Initial commit : implementation for azure blob storage
    
    * added dependency for azure in pom.xml
    
    * update logic to fetch the metadata from list details
    
    * changed functionality from listing containers to listing files within a selected container
    
    * initial commit for google cloud storage implementation
    
    * added field contant in enum class and removed unused method
    
    * add methods to parse comma-separated local and cloud paths
    
    * add methods to parse comma-separated local and cloud paths
    
    * implementation for aws s3 bucket
    
    * normalize container prefix path
    
    * merged all: implementation for azure, aws and gcp
    
    * refactor(storage): replace manual path normalization with normalizePath from Utility class
    
    ---------
    
    Co-authored-by: pvijayaraghavareddy <pvijayaraghavareddy@WORKSPA-6QV71G7.us.deloitte.com>
    Co-authored-by: Parth <parthpatel3@deloitte.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * Get Node Pool Information for Remote Models (#806)
    
    * 590 unit test for prernaengineimpl (#808)
    
    * test(unit): update to filesystems hijacking for testing files
    
    * test: start of unit tests for abstract database engine
    
    * test(unit): added unit test for prerna.engine.impl
    
    * test(unit): finsihed tests for prerna.engine.impl
    
    * test(unit): adding back unused assignment
    
    ---------
    
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * Creating WordCountTokenizer Class (#802)
    
    * feat: creating word count tokenizer class && falling back to word count tokenizer if tiktok fails
    
    * feat: updating comment
    
    * feat: setting default chunking method as recursive (#810)
    
    * Unit tests fixes and Unit test Class file location updates (#812)
    
    * test(unit): moved tests to correct packages
    
    * test(unit): fixed a couple of unit tests
    
    * VectorDatabaseQueryReactor: output divider value for word doc chunks always 1 (#804)
    
    * Code implementation for #733
    
    * feat: Added code to resolve Divider page issue
    
    * Console output replaced by LOGGERs as per review comments
    
    * feat: replaced Console with Loggers
    
    ---------
    
    Co-authored-by: Varaham <katchabi50@gmail.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * GetCurrentUserReactor (#818)
    
    Adding GetCurrentUserReactor to return user info including if user is an admin.
    
    * Python User Class (#819)
    
    * fix: trimming properties read from smss; fix: logging commands before executing (#821)
    
    * Updating getNodePoolsInfo() to parse and return zk info and models active actual (#822)
    
    * feat: update get node pool information for zk info and models active actual
    
    * feat: get remote model configs
    
    * Add unit tests for package prerna\engine\impl\vector (#728)
    
    * Create ChromaVectorDatabaseEngineUnitTests.java
    
    * completed tests for ChromaVectorDatabaseEngine class
    
    * [#604] test(unit): Created ChromaVectorDatabaseEngine unit tests
    
    * [604] tests(unit) : Completed test cases for ChromaVectorDatabaseEngine; update File operations to nio operations in ChromaVectorDatabaseEngine.java
    
    * [#604] tests(unit): added unit tests for all vector database engines and util classes in the prerna\engine\impl\vector package
    
    * [604] test(unit): replaced creating file paths with string literals with java.nio Paths.resolve/Paths.get methods
    
    ---------
    
    Co-authored-by: Maher Khalil <themaherkhalil@gmail.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * feat: adding to the return of getenginemetadata (#813)
    
    * feat: adding to the return of getenginemetadata
    
    * fix: removing throws
    
    ---------
    
    Co-authored-by: Arash Afghahi <48933336+AAfghahi@users.noreply.github.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    
    * 718 create a single reactor to search both engines and apps (#794)
    
    * feat(engineProject): Initial commit
    
    * chore: 718 create a single reactor to search both engines and apps
    
    * chore: 718 create a single reactor to search both engines and apps
    
    ---------
    
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    Co-authored-by: Vijayaraghavareddy <pvijayaraghavareddy@deloitte.com>
    
    * feat: update openai wrapper to handle multiple images (#832)
    
    * feat: adding user room map (#840)
    
    * feat: hiding side menu bar for non admins (#833)
    
    * Side menu changes
    
    * Review Comments fixed
    
    * Flag is renamed in  Constants.java
    
    * Review Comment fixed in Utility.java
    
    * fix: cleaning up defaults and comments
    
    ---------
    
    Co-authored-by: kunal0137 <kunal0137@gmail.com>
    
    ---------
    
    Co-authored-by: Maher Khalil <themaherkhalil@gmail.com>
    Co-authored-by: kunal0137 <kunal0137@gmail.com>
    Co-authored-by: Ryan Weiler <ryanweiler92@gmail.com>
    Co-authored-by: ManjariYadav2310 <manjayadav@deloitte.com>
    Co-authored-by: dpartika <dpartika@deloitte.com>
    Co-authored-by: Raul Esquivel <resmas.work@gmail.com>
    Co-authored-by: Pasupathi Muniyappan <pasupathi.muniyappan@kanini.com>
    Co-authored-by: resmas-tx <131498457+resmas-tx@users.noreply.github.com>
    Co-authored-by: AndrewRodddd <62724891+AndrewRodddd@users.noreply.github.com>
    Co-authored-by: radkalyan <107957324+radkalyan@users.noreply.github.com>
    Co-authored-by: samarthKharote <samarth.kharote@kanini.com>
    Co-authored-by: Shubham Mahure <shubham.mahure@kanini.com>
    Co-authored-by: rithvik-doshi <81876806+rithvik-doshi@users.noreply.github.com>
    Co-authored-by: Mogillapalli Manoj kumar <86736340+Khumar23@users.noreply.github.com>
    Co-authored-by: Jeff Vitunac <jvitunac@gmail.com>
    Co-authored-by: pvijayaraghavareddy <pvijayaraghavareddy@WORKSPA-6QV71G7.us.deloitte.com>
    Co-authored-by: Parth <parthpatel3@deloitte.com>
    Co-authored-by: KT Space <119169984+Varaham@users.noreply.github.com>
    Co-authored-by: Varaham <katchabi50@gmail.com>
    Co-authored-by: ericgonzal8 <ericgonzalez8@deloitte.com>
    Co-authored-by: Arash Afghahi <48933336+AAfghahi@users.noreply.github.com>
    Co-authored-by: Vijayaraghavareddy <pvijayaraghavareddy@deloitte.com>
    Co-authored-by: ammb-123 <ammb@deloitte.com>


    Development

    Successfully merging this pull request may close these issues.

    [PYTHON] Add Chonky Semantic Text Splitting

    4 participants