FEAT: Threshold and ratio configuration and testing file for optimal threshold and ratio configuration #21
…SentinelLocalIndex. FEAT: created a testing tool for best threshold and ratio analysis
…e with optional flags. DOCS: Updated relevant documentation with these fixes
Note: All 20 tests passed, with two warnings about the pytest configuration, as follows. I am unsure whether this is due to an outdated version of the pytest library or whether these config keys have been deprecated. `.venv\Lib\site-packages\_pytest\config\__init__.py:1441` -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
…dling and fallback for ratio adjustments
Feat: Adjusts message metrics dynamically
…prevent inclusion
… load time of model exponentially after the first caching. TESTS: Updated embedding tests to include caching and its management functions. FEAT: Updated the example script for testing purposes to include caching mechanics
- Apply PEP 8 formatting to Example_Threshold_Script.py
- Update embeddings.safetensors
- Update sentinel_against_hate.ipynb
- Fixed line length violations (max 79 characters)
- Corrected indentation and spacing
- Enhanced readability while maintaining functionality
…r speech examples
…mpty score arrays, fixes edge case, NaN returns.
…_affinity`, update example file to use path/to/index rather than local path
…omponents in score_formulae and SentinelLocalIndex
…onality, removed redundant exports, added no-cache flag to the testing script
@vcai4071 All requested changes have been made.
vcai4071
left a comment
LGTM, thanks for the great additions!
@rafainn Can you take a look at the failing test? I will merge once the tests are passing.
… have support for PEP 517 builds, hence swapped to ^2.0.0, which is compatible. May require further testing; however, it didn't impact functionality of the code
@leoRblx Would you be able to run the tests again? This should fix the build issue it was displaying earlier; however, I am unsure whether there will be any further conflicts.
Built upon pull request #7
This pull request introduces significant improvements to the Sentinel library, focusing on aggregation flexibility, explainability, and performance optimizations. The README is updated to document new aggregation strategies and explainability features, and the codebase now exposes multiple aggregation functions for scoring, adds per-text explanations, and improves model caching and negative sample ratio handling.
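The model caching mentioned above could look roughly like the following. This is a minimal sketch and not the code in `src/sentinel/embeddings/sbert.py`: the names `get_model`, `clear_model_cache`, `cached_model_names`, and the `use_cache` flag are hypothetical, and a stand-in loader replaces the real model constructor so the example runs without any model download.

```python
from typing import Any, Callable, Dict, List

# Module-level cache: model name -> loaded model object (hypothetical).
_MODEL_CACHE: Dict[str, Any] = {}


def get_model(name: str, loader: Callable[[str], Any],
              use_cache: bool = True) -> Any:
    """Return a model for `name`, loading it at most once when cached."""
    if not use_cache:
        # Caching disabled: always build a fresh instance.
        return loader(name)
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = loader(name)
    return _MODEL_CACHE[name]


def clear_model_cache() -> None:
    """Drop every cached model so memory can be reclaimed."""
    _MODEL_CACHE.clear()


def cached_model_names() -> List[str]:
    """List the names of the models currently held in the cache."""
    return sorted(_MODEL_CACHE)


if __name__ == "__main__":
    load_calls = []

    def fake_loader(name: str) -> object:
        # Stand-in for an expensive load; with sentence-transformers
        # installed, the loader could be the SentenceTransformer class.
        load_calls.append(name)
        return object()

    first = get_model("demo-model", fake_loader)
    second = get_model("demo-model", fake_loader)
    print(first is second, len(load_calls))
```

Passing `use_cache=False` skips the cache entirely, which mirrors the idea of turning caching off when memory is tight.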
Aggregation and Explainability Enhancements:
- … (`skewness`, `top_k_mean`, `percentile_score`, `softmax_weighted_mean`, `max_score`) for combining observation scores, with documentation and usage examples in `README.md`.
- … `RareClassAffinityResult` dataclass and README usage examples.
Performance and Robustness Improvements:
- … `SentenceTransformer` models in `src/sentinel/embeddings/sbert.py` to avoid redundant loading, with cache management utilities. Global caching appears to have reduced the load time for ~300 conversations from 12.3s to 3.5s.
- … `Cache_Model` in `calculate_rare_class_affinity` in `src\sentinel\sentinel_local_index.py` to enable and disable caching easily, depending on space constraints and model requirements.
- … `test_thresholds_and_ratios` in `examples/Example_Threshold_Script.py` and how different ratios and temperatures affect detection. This shows that temperatures of 0.00 and 0.01 and ratios of 2-4:1 are strongly associated with optimal accuracy and minimal false positives.
API and Documentation Updates:
- … `__init__.py` to expose new aggregation functions in the public API.
These changes collectively make Sentinel more configurable, interpretable, and efficient for diverse deployment scenarios.
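To illustrate what some of the aggregation strategies named in this description compute, here is a minimal NumPy sketch. The function names come from the PR, but the signatures, the parameter names (`k`, `q`, `temperature`), the NaN filtering, and the choice to return 0.0 for empty input are assumptions rather than the library's actual behaviour. `skewness` is left out.

```python
import numpy as np


def _as_scores(scores):
    """Flatten to a 1-D float array and drop NaN entries."""
    arr = np.asarray(scores, dtype=float).ravel()
    return arr[~np.isnan(arr)]


def max_score(scores):
    """Largest observation score (0.0 for empty input, an assumption)."""
    arr = _as_scores(scores)
    return float(arr.max()) if arr.size else 0.0


def top_k_mean(scores, k=3):
    """Mean of the k largest scores; k is clamped to the array size."""
    arr = _as_scores(scores)
    if arr.size == 0:
        return 0.0
    k = max(1, min(k, arr.size))
    # np.sort is ascending, so the last k entries are the largest.
    return float(np.sort(arr)[-k:].mean())


def percentile_score(scores, q=90.0):
    """q-th percentile of the scores, with q in [0, 100]."""
    arr = _as_scores(scores)
    return float(np.percentile(arr, q)) if arr.size else 0.0


def softmax_weighted_mean(scores, temperature=1.0):
    """Mean of the scores weighted by softmax(scores / temperature)."""
    arr = _as_scores(scores)
    if arr.size == 0:
        return 0.0
    if temperature <= 0:
        # Limit as temperature -> 0: all weight goes to the maximum.
        return float(arr.max())
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp((arr - arr.max()) / temperature)
    weights /= weights.sum()
    return float(np.dot(weights, arr))


if __name__ == "__main__":
    demo = [0.1, 0.4, 0.9, 0.2]
    print(max_score(demo), top_k_mean(demo, k=2))
    print(percentile_score(demo, q=50), softmax_weighted_mean(demo, 0.1))
```

In this sketch a low temperature pushes `softmax_weighted_mean` toward `max_score`, while a high temperature pushes it toward the plain mean, so the temperature controls how much a single strong match dominates.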