feat: add queue, nosql, markup, tesseract plugin#9

Merged
martsokha merged 17 commits into main from feature/prerelease on Feb 15, 2026
@martsokha (Member)

No description provided.

Add empty shell plugins for message queues (Kafka, RabbitMQ, SQS,
Redis Streams), NoSQL databases (MongoDB, DynamoDB, Firestore),
markup/tabular parsing (HTML, XML, JSON, CSV, TSV), and Tesseract OCR.
Register all four with the server engine. Move the packages table from
the root README to packages/README.md with white-paper-style
descriptions for each plugin category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@martsokha self-assigned this Feb 6, 2026
@martsokha added the docs and feat labels Feb 6, 2026
martsokha and others added 16 commits February 7, 2026 09:00
…move shared types to types.ts

Extract type-specific fields from the flat Element base class into dedicated
subclasses (TableElement, FormElement, EmailElement, CompositeElement), bundle
OCR/extraction provenance into ElementProvenance, add sourceTag/textAsHtml/
pageName for source fidelity, add FormKeyValuePair for structured form data,
introduce EmailType ontology category, and move JsonValue/Metadata to types.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h strategies, add tests

Move rule-based chunk and partition actions from nvisy-plugin-ai to nvisy-core
since they have no AI dependency. Split enrich.ts into individual strategy files
following the same pattern. Add maxCharacters/combineUnder/inferTableStructure
options and comprehensive tests for all strategies.

Key changes:
- Move Chunk datatype, chunk (character/section/page), and partition (auto/rule)
  actions to nvisy-core with full test coverage
- Split enrich into enrich-by-metadata, enrich-by-ner, enrich-by-description,
  enrich-by-table-html strategy files
- Use plain interfaces in strategy files, zod .extend() schemas in combining files
- Move parseJsonResponse to providers/client.ts
- Rename embed.ts to generate-embedding.ts
- Move action.ts into actions/ folder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h strategies, add tests

- Create @nvisy/plugin-core with chunk/partition actions, plaintext/CSV/JSON
  loaders, and core datatype registration
- Move Action factory from src/actions/action.ts back to src/action.ts in
  nvisy-core as a framework primitive
- Remove corePlugin and action implementations from nvisy-core
- Add limit parameter to SQL read stream for capped pagination
- Delete @nvisy/plugin-markup (empty after moving plaintext loader to core)
- Improve JSDoc across the entire engine module
- Update nvisy-runtime and nvisy-server to use @nvisy/plugin-core

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tatypes, bump pinecone

- Replace SourceConfig/TargetConfig `types` tuple with separate `type`,
  `context`, `params` fields for consistency with Action configs
- Simplify Blob and Document datatypes, update loader implementations
- Streamline registry and bridge internals
- Bump @pinecone-database/pinecone from 4.x to 7.x, update upsert call
- Update README examples and add plugin-core README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the entire TypeScript implementation with a Rust workspace
(6 crates) and a Python AI package, porting all domain types, DAG
engine, detection patterns, object store, and HTTP server.

Rust crates:
- nvisy-core: domain types, traits (Action, Provider, Stream, Loader), plugin registry
- nvisy-detect: 10 regex patterns, 6 actions (detect/evaluate/redact/classify/audit), 3 loaders
- nvisy-engine: DAG compiler (petgraph), executor (tokio tasks + mpsc), retry/timeout, run tracking
- nvisy-object: ObjectStoreClient trait, S3 provider (aws-sdk-s3), read/write streams
- nvisy-python: PyO3 bridge for AI NER (GIL + spawn_blocking), action wrappers
- nvisy-server: Axum HTTP server with health, graphs, redact, policies, audit routes

Python package (packages/nvisy-ai):
- NER detection for text and images via OpenAI, Anthropic, and Gemini providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…a, JSON patterns, nvisy-exif, update CI

- Move state.rs and config.rs into service/, update all imports
- Add RedactionContext type to nvisy-core for per-request redaction config
- Add schemars schema feature with JsonSchema derives on all server-facing types
- Add utoipa OpenAPI annotations to all 15 handler endpoints with SwaggerUI at /swagger-ui
- Refactor NvisyError enum into Error struct with ErrorKind, source field, Result alias
- Clean crate roots: move standalone .rs files into module directories
- Load detection patterns from JSON (assets/patterns.json) instead of hardcoded statics
- Create nvisy-exif Python package for EXIF metadata reading/stripping
- Bump deps: petgraph 0.8, infer 0.19, utoipa-swagger-ui 9, add schemars 1
- Replace TypeScript CI with Rust workflows, update Dependabot for cargo/pip
- Update Dockerfile for multi-stage Rust build
- Update Makefile with cargo targets
- Rewrite README and docs for Rust codebase
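
The error refactor listed above (enum replaced by an `Error` struct with an `ErrorKind`, a `source` field, and a `Result` alias) might look roughly like the following. This is a minimal sketch, not the code from the commit: the `ErrorKind` variants, the `message` field, the constructor names, and the `parse_port` helper are all assumptions made for illustration.

```rust
// Sketch only: field names, ErrorKind variants, and constructors are
// assumptions for illustration, not the actual nvisy-core error module.
pub mod error {
    use std::error::Error as StdError;
    use std::fmt;

    /// Broad failure category (hypothetical variants).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ErrorKind {
        Validation,
        Io,
        Internal,
    }

    /// Single error struct carrying a kind, a message, and an optional cause.
    #[derive(Debug)]
    pub struct Error {
        kind: ErrorKind,
        message: String,
        source: Option<Box<dyn StdError + Send + Sync + 'static>>,
    }

    impl Error {
        pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
            Self { kind, message: message.into(), source: None }
        }

        /// Attach the underlying cause so callers can walk the error chain.
        pub fn with_source(mut self, source: impl StdError + Send + Sync + 'static) -> Self {
            self.source = Some(Box::new(source));
            self
        }

        pub fn kind(&self) -> ErrorKind {
            self.kind
        }
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{:?}: {}", self.kind, self.message)
        }
    }

    impl StdError for Error {
        fn source(&self) -> Option<&(dyn StdError + 'static)> {
            self.source.as_ref().map(|e| &**e as &(dyn StdError + 'static))
        }
    }

    /// Crate-wide alias so signatures read `Result<T>`.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Hypothetical caller: wraps a std parse failure as a Validation error.
    pub fn parse_port(raw: &str) -> Result<u16> {
        raw.parse::<u16>()
            .map_err(|e| Error::new(ErrorKind::Validation, "invalid port").with_source(e))
    }
}
```

Compared with a flat enum, a struct of this shape lets callers match on `kind()` while the original cause stays reachable through `std::error::Error::source`.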

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… pipeline, add docs

Fully-qualify async_trait attributes across all crates. Dissolve schema/
into compiler/graph.rs and policies/retry.rs. Dissolve types/ into
datatypes/ submodules. Merge data/ into datatypes/. Remove DataValue
enum — the pipeline is now blob-centric with an artifact registry
(add_artifact/get_artifacts) for derived data. Add comprehensive
documentation across all 6 crates.
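
For readers unfamiliar with the blob-centric model, the artifact registry could be pictured along these lines. Only the method names `add_artifact` and `get_artifacts` come from the commit message; the `Blob` fields, the string-keyed map, and the use of `String` for artifact values are assumptions in this sketch.

```rust
use std::collections::HashMap;

// Sketch only: the real Blob type and artifact representation are not shown
// in this PR; everything except the two method names is hypothetical.
#[derive(Debug, Default)]
pub struct Blob {
    pub path: String,
    pub data: Vec<u8>,
    /// Derived data keyed by artifact kind (e.g. "entities", "ocr_text").
    artifacts: HashMap<String, Vec<String>>,
}

impl Blob {
    pub fn new(path: impl Into<String>, data: Vec<u8>) -> Self {
        Self { path: path.into(), data, artifacts: HashMap::new() }
    }

    /// Record a derived value alongside the blob instead of passing it
    /// through the pipeline as a separate data variant.
    pub fn add_artifact(&mut self, kind: &str, value: impl Into<String>) {
        self.artifacts
            .entry(kind.to_string())
            .or_default()
            .push(value.into());
    }

    /// Return all artifacts of one kind, or an empty slice if none exist.
    pub fn get_artifacts(&self, kind: &str) -> &[String] {
        match self.artifacts.get(kind) {
            Some(values) => values.as_slice(),
            None => &[],
        }
    }
}
```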

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… types

Reorganize nvisy-core: rename DataItem→Data, move entity/redaction/audit
types to ontology/, create redaction/ for context+policy, rename traits/
to registry/. Replace serde_json::Value params and Box<dyn Any> clients
with associated types (Params, Credentials, Client) on all traits. Update
all consumer crates (detect, object, python, server) with typed param
structs and concrete client types.
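
The switch from `serde_json::Value` params and `Box<dyn Any>` clients to associated types can be sketched as follows. The associated type names (`Params`, `Credentials`, `Client`) are taken from the commit message; the `connect` method and the `Bucket*` structs are hypothetical stand-ins, and the actual trait signatures in nvisy-core may differ.

```rust
// Sketch only: the method shape and all struct names below are illustrative
// assumptions, not the actual nvisy-core definitions.

/// A provider whose parameter, credential, and client types are fixed at
/// compile time through associated types.
pub trait Provider {
    type Params;
    type Credentials;
    type Client;

    /// Hypothetical constructor: builds a concrete client from typed inputs.
    fn connect(
        params: &Self::Params,
        creds: &Self::Credentials,
    ) -> Result<Self::Client, String>;
}

#[derive(Debug, Clone, PartialEq)]
pub struct BucketParams {
    pub bucket: String,
    pub region: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StaticCredentials {
    pub access_key: String,
    pub secret_key: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BucketClient {
    pub endpoint: String,
}

pub struct BucketProvider;

impl Provider for BucketProvider {
    type Params = BucketParams;
    type Credentials = StaticCredentials;
    type Client = BucketClient;

    fn connect(
        params: &BucketParams,
        creds: &StaticCredentials,
    ) -> Result<BucketClient, String> {
        if creds.access_key.is_empty() {
            return Err("missing access key".to_string());
        }
        Ok(BucketClient {
            endpoint: format!("{}.{}", params.bucket, params.region),
        })
    }
}
```

With this shape, passing the wrong params struct to a provider is a compile error rather than a runtime deserialization or downcast failure.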

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sy-media crates

Extract ontology/redaction modules into nvisy-ontology, move Loader trait
into nvisy-ingest, move StreamSource/StreamTarget into nvisy-object, add
new nvisy-ingest and nvisy-media crates, expose fs/io/path modules in
nvisy-core with new dependencies (jiff, hipstr, sha2, hex, strum), and
add From<io::Error> impl plus InternalError/InvalidInput/Serialization
error variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace previous ARCHITECTURE.md and DEVELOPMENT.md with a structured
document set covering ingestion, detection, redaction, compliance,
infrastructure, and developer experience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-pattern, nvisy-pipeline

Remove nvisy-detect (actions, dictionaries, patterns), nvisy-media
(render, media redaction actions), and nvisy-server (handlers, services,
middleware). Restructure nvisy-core by removing datatypes and registry
modules. Reorganize nvisy-ingest from flat loaders into modality-specific
modules (audio, binary, image, tabular, text). Add nvisy-pattern and
nvisy-pipeline crates. Update all workspace dependencies accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…logy types

Introduce the top-level Engine trait in nvisy-engine defining the
redaction pipeline contract (ContentHandler + policies + DAG graph →
redacted output + audit trail + full breakdown).

Add supporting ontology types: ClassificationResult, PolicyEvaluation,
RedactionSummary. Extract ModelInfo/ModelKind into entity/model.rs.
Migrate pipeline classify action to use ontology SensitivityLevel enum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…les; remove Parquet/WebP/BMP support

Consolidate sensitivity assessment into a reusable Sensitivity struct
combining SensitivityLevel and risk_score. Add DocumentType enum in
entity with concrete image (Png, Jpeg, Tiff) and audio (Wav, Mp3)
variants. Make all ontology sub-modules private with public re-exports.
Reformat all Cargo.toml deps to consistent table syntax. Remove Parquet
loader/handler/deps and WebP/BMP from ImageHandler as unsupported.
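
As an illustration of the consolidated type, a `Sensitivity` struct pairing `SensitivityLevel` with `risk_score` might be shaped like this. The level variants, the `f64` score type, and the clamping to the range 0.0 to 1.0 are assumptions of this sketch and are not stated in the commit.

```rust
// Sketch only: variants, score type, and clamping behavior are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SensitivityLevel {
    Low,
    Medium,
    High,
}

/// Reusable pairing of a categorical level with a numeric risk score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sensitivity {
    pub level: SensitivityLevel,
    pub risk_score: f64,
}

impl Sensitivity {
    /// Build a value, keeping the score inside 0.0..=1.0 (assumed range).
    pub fn new(level: SensitivityLevel, risk_score: f64) -> Self {
        Self { level, risk_score: risk_score.clamp(0.0, 1.0) }
    }

    /// Convenience check for callers that only care about a threshold.
    pub fn is_at_least(&self, level: SensitivityLevel) -> bool {
        self.level >= level
    }
}
```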

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tream, merge into Handler

Delete the Content trait (with its GAT Iter<'a>) and merge SpanId,
SpanData, view_spans, and edit_spans directly into the Handler trait
as async methods via #[async_trait]. Span iteration is now exposed
through SpanStream (view_stream.rs) and SpanEditStream (edit_stream.rs),
thin wrappers around Pin<Box<dyn Stream + Send>>.

- Add futures dependency to nvisy-ingest
- Create document/view_stream.rs (SpanStream) and document/edit_stream.rs
  (SpanEditStream)
- Rename parse_spans → view_spans; make view_spans/edit_spans async
- Privatize TxtSpanIter, CsvSpanIter, JsonSpanIter
- Add stub SpanId=()/SpanData=() impls to all 8 format-stub handlers
- Convert all 44 handler/loader tests to async

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bs; reorganize into detection/redaction/generation modules

Add NER detection, OCR, transcription, and synthetic data generation
action stubs with typed input/output structs. Add audio redaction
pass-through to apply action. Enrich ApplyRedactionParams to cover
text, image, and audio modalities. Add bytes field to WavHandler and
Mp3Handler. Split flat actions/ directory into detection/, redaction/,
and generation/ top-level modules; move render/ under redaction/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…one and mutability

- ontology: remove EntityLocation enum; replace with per-modality
  Optional fields on Entity (text_location, image_location, etc.)
  and Annotation. Add with_*_location builder methods.
- ontology: remove Audit builder methods and details field; add
  RetentionPolicy::duration(). Add Policies collection type.
- ontology: slim PolicyRule (drop name, description, enabled, context,
  metadata); remove Policy builder/find_matching_rule methods.
- ingest: add bytes field + Clone to PngHandler; add Clone to
  TxtHandler/CsvHandler; add TxtHandler::new, CsvHandler::rows_mut;
  remove Document::page_number.
- engine: use Policies instead of Vec<Policy>.
- python: update NER/OCR bridges for new Entity location API.
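
The ontology change in the first bullet (per-modality optional fields on `Entity` plus `with_*_location` builders) could look something like the sketch below. The field names `text_location` and `image_location` and the builder naming come from the commit message; the location struct layouts, the `label` field, and `Entity::new` are hypothetical.

```rust
// Sketch only: location struct contents, the label field, and the
// constructor are assumptions; only the field/builder names follow the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct TextLocation {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ImageLocation {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Each modality gets its own optional field instead of one enum value,
/// so a single entity can carry several locations at once.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Entity {
    pub label: String,
    pub text_location: Option<TextLocation>,
    pub image_location: Option<ImageLocation>,
}

impl Entity {
    pub fn new(label: impl Into<String>) -> Self {
        Self { label: label.into(), ..Self::default() }
    }

    pub fn with_text_location(mut self, location: TextLocation) -> Self {
        self.text_location = Some(location);
        self
    }

    pub fn with_image_location(mut self, location: ImageLocation) -> Self {
        self.image_location = Some(location);
        self
    }
}
```

One consequence of this design is that an entity found by OCR can record both its pixel region and its offset in the extracted text, which a single-variant enum could not express.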

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@martsokha merged commit 66c1695 into main Feb 15, 2026
0 of 3 checks passed
@martsokha deleted the feature/prerelease branch February 15, 2026 16:09