feat: add queue, nosql, markup, tesseract plugin by martsokha · Pull Request #9 · nvisycom/runtime

martsokha · 2026-02-06T12:34:54Z

No description provided.

Add empty shell plugins for message queues (Kafka, RabbitMQ, SQS, Redis Streams), NoSQL databases (MongoDB, DynamoDB, Firestore), markup/tabular parsing (HTML, XML, JSON, CSV, TSV), and Tesseract OCR. Register all four with the server engine. Move the packages table from the root README to packages/README.md with white-paper-style descriptions for each plugin category.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…move shared types to types.ts
Extract type-specific fields from the flat Element base class into dedicated subclasses (TableElement, FormElement, EmailElement, CompositeElement), bundle OCR/extraction provenance into ElementProvenance, add sourceTag/textAsHtml/pageName for source fidelity, add FormKeyValuePair for structured form data, introduce the EmailType ontology category, and move JsonValue/Metadata to types.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…h strategies, add tests
Move rule-based chunk and partition actions from nvisy-plugin-ai to nvisy-core since they have no AI dependency. Split enrich.ts into individual strategy files following the same pattern. Add maxCharacters/combineUnder/inferTableStructure options and comprehensive tests for all strategies.
Key changes:
- Move Chunk datatype, chunk (character/section/page), and partition (auto/rule) actions to nvisy-core with full test coverage
- Split enrich into enrich-by-metadata, enrich-by-ner, enrich-by-description, enrich-by-table-html strategy files
- Use plain interfaces in strategy files, zod .extend() schemas in combining files
- Move parseJsonResponse to providers/client.ts
- Rename embed.ts to generate-embedding.ts
- Move action.ts into actions/ folder
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
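The character strategy and the `combineUnder` option can be sketched as follows. The original code was TypeScript; this is a Rust illustration under assumed semantics (lengths counted in Unicode scalar values; a chunk shorter than `combine_under` absorbs the next section while the result fits `max_characters`). `ChunkOptions`, `chunk_by_character` and `combine_sections` are hypothetical names.

```rust
/// Hypothetical options mirroring `maxCharacters` / `combineUnder`.
#[derive(Debug, Clone, Copy)]
pub struct ChunkOptions {
    /// Maximum number of characters (Unicode scalar values) per chunk.
    pub max_characters: usize,
    /// A chunk shorter than this absorbs the next section if the result fits.
    pub combine_under: usize,
}

/// Character strategy: fixed-size windows of at most `max_characters` chars.
pub fn chunk_by_character(text: &str, opts: ChunkOptions) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(opts.max_characters.max(1)) // `chunks(0)` would panic
        .map(|window| window.iter().collect::<String>())
        .collect()
}

/// Section strategy with `combine_under`: merge a short chunk with the
/// following section (newline-separated) while it stays within the limit.
/// Sections longer than `max_characters` are passed through unsplit here.
pub fn combine_sections(sections: &[&str], opts: ChunkOptions) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for &section in sections {
        let len = section.chars().count();
        if let Some(last) = out.last_mut() {
            let last_len = last.chars().count();
            if last_len < opts.combine_under && last_len + 1 + len <= opts.max_characters {
                last.push('\n');
                last.push_str(section);
                continue;
            }
        }
        out.push(section.to_string());
    }
    out
}
```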

…h strategies, add tests
- Create @nvisy/plugin-core with chunk/partition actions, plaintext/CSV/JSON loaders, and core datatype registration
- Move Action factory from src/actions/action.ts back to src/action.ts in nvisy-core as a framework primitive
- Remove corePlugin and action implementations from nvisy-core
- Add limit parameter to SQL read stream for capped pagination
- Delete @nvisy/plugin-markup (empty after moving plaintext loader to core)
- Improve JSDoc across the entire engine module
- Update nvisy-runtime and nvisy-server to use @nvisy/plugin-core
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
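Capping a paginated read at `limit` rows can be sketched like this. `read_capped` and its page-fetching callback are hypothetical stand-ins for the SQL read stream (which was TypeScript at this point); the sketch assumes offset-based pages and treats a short page as the end of the table.

```rust
/// Hypothetical capped reader: `fetch_page(offset, n)` returns up to `n` rows.
pub fn read_capped<F>(mut fetch_page: F, page_size: usize, limit: Option<usize>) -> Vec<String>
where
    F: FnMut(usize, usize) -> Vec<String>,
{
    let page_size = page_size.max(1); // a zero page size would never advance
    let mut rows: Vec<String> = Vec::new();
    let mut offset = 0;
    loop {
        // Rows still allowed under the cap; unlimited when `limit` is None.
        let remaining = match limit {
            Some(l) => l.saturating_sub(rows.len()),
            None => usize::MAX,
        };
        if remaining == 0 {
            break;
        }
        // Never request more than the cap allows, so the last page is trimmed.
        let want = page_size.min(remaining);
        let page = fetch_page(offset, want);
        let got = page.len();
        rows.extend(page);
        offset += got;
        if got < want {
            break; // short page: the source is exhausted
        }
    }
    rows
}
```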

…tatypes, bump pinecone
- Replace SourceConfig/TargetConfig `types` tuple with separate `type`, `context`, `params` fields for consistency with Action configs
- Simplify Blob and Document datatypes, update loader implementations
- Streamline registry and bridge internals
- Bump @pinecone-database/pinecone from 4.x to 7.x, update upsert call
- Update README examples and add plugin-core README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the entire TypeScript implementation with a Rust workspace (6 crates) and a Python AI package, porting all domain types, DAG engine, detection patterns, object store, and HTTP server.
Rust crates:
- nvisy-core: domain types, traits (Action, Provider, Stream, Loader), plugin registry
- nvisy-detect: 10 regex patterns, 6 actions (detect/evaluate/redact/classify/audit), 3 loaders
- nvisy-engine: DAG compiler (petgraph), executor (tokio tasks + mpsc), retry/timeout, run tracking
- nvisy-object: ObjectStoreClient trait, S3 provider (aws-sdk-s3), read/write streams
- nvisy-python: PyO3 bridge for AI NER (GIL + spawn_blocking), action wrappers
- nvisy-server: Axum HTTP server with health, graphs, redact, policies, audit routes
Python package (packages/nvisy-ai):
- NER detection for text and images via OpenAI, Anthropic, and Gemini providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
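The nvisy-engine entry mentions a DAG compiler built on petgraph. As a rough, dependency-free illustration of what such a compile step has to guarantee (every node runs after its dependencies, cycles are rejected), the sketch below orders nodes with Kahn's algorithm. `topo_order` and the example node names are hypothetical; petgraph ships its own `algo::toposort`, and this is not the crate's code.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns node ids so that for every edge (from, to), `from` comes first.
/// Errors on unknown nodes or when the graph contains a cycle.
pub fn topo_order<'a>(nodes: &[&'a str], edges: &[(&'a str, &'a str)]) -> Result<Vec<&'a str>, String> {
    let mut indegree: HashMap<&'a str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut children: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        if !indegree.contains_key(from) || !indegree.contains_key(to) {
            return Err(format!("edge {from} -> {to} references an unknown node"));
        }
        *indegree.get_mut(to).unwrap() += 1;
        children.entry(from).or_default().push(to);
    }
    // Seed with dependency-free nodes in declaration order (deterministic output).
    let mut ready: VecDeque<&'a str> = nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(nodes.len());
    while let Some(n) = ready.pop_front() {
        order.push(n);
        for &child in children.get(n).into_iter().flatten() {
            let d = indegree.get_mut(child).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(child); // all of this child's inputs are scheduled
            }
        }
    }
    // Any node left unscheduled still has an unmet dependency: a cycle.
    if order.len() == nodes.len() {
        Ok(order)
    } else {
        Err("graph contains a cycle".to_string())
    }
}
```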

…a, JSON patterns, nvisy-exif, update CI
- Move state.rs and config.rs into service/, update all imports
- Add RedactionContext type to nvisy-core for per-request redaction config
- Add schemars schema feature with JsonSchema derives on all server-facing types
- Add utoipa OpenAPI annotations to all 15 handler endpoints with SwaggerUI at /swagger-ui
- Refactor NvisyError enum into Error struct with ErrorKind, source field, Result alias
- Clean crate roots: move standalone .rs files into module directories
- Load detection patterns from JSON (assets/patterns.json) instead of hardcoded statics
- Create nvisy-exif Python package for EXIF metadata reading/stripping
- Bump deps: petgraph 0.8, infer 0.19, utoipa-swagger-ui 9, add schemars 1
- Replace TypeScript CI with Rust workflows, update Dependabot for cargo/pip
- Update Dockerfile for multi-stage Rust build
- Update Makefile with cargo targets
- Rewrite README and docs for Rust codebase
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
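A minimal, std-only sketch of the "Error struct with ErrorKind, source field, Result alias" shape described above. The `ErrorKind` variants, the `with_source` builder and the Display format are assumptions for illustration, not the crate's actual API.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical error categories; the real `ErrorKind` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    InvalidInput,
    NotFound,
    Internal,
}

/// One error struct instead of a large enum: a kind, a message and an
/// optional underlying cause exposed through `std::error::Error::source`.
#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    message: String,
    source: Option<Box<dyn StdError + Send + Sync + 'static>>,
}

/// Crate-wide alias so signatures read `Result<T>`.
pub type Result<T, E = Error> = std::result::Result<T, E>;

impl Error {
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        Self { kind, message: message.into(), source: None }
    }

    /// Attach the lower-level error that caused this one.
    pub fn with_source(mut self, source: impl StdError + Send + Sync + 'static) -> Self {
        self.source = Some(Box::new(source));
        self
    }

    pub fn kind(&self) -> ErrorKind {
        self.kind
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        // Upcast away the Send + Sync bounds to match the trait signature.
        self.source.as_ref().map(|e| e.as_ref() as &(dyn StdError + 'static))
    }
}
```

Callers can branch on `kind()` without string matching, while `source()` keeps the full cause chain for logging.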

… pipeline, add docs
Fully-qualify async_trait attributes across all crates. Dissolve schema/ into compiler/graph.rs and policies/retry.rs. Dissolve types/ into datatypes/ submodules. Merge data/ into datatypes/. Remove the DataValue enum; the pipeline is now blob-centric, with an artifact registry (add_artifact/get_artifacts) for derived data. Add comprehensive documentation across all 6 crates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
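A sketch of what a blob-centric pipeline with an artifact registry might look like. Only the method names `add_artifact` and `get_artifacts` come from the commit message; the `Blob` and `Artifact` fields, keying artifacts by kind, and the signatures are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical derived-data record attached to a blob (e.g. detected entities).
#[derive(Debug, Clone, PartialEq)]
pub struct Artifact {
    pub kind: String,
    pub payload: Vec<u8>,
}

/// The raw bytes travel through the pipeline unchanged; actions attach
/// derived data as artifacts instead of replacing the value.
#[derive(Debug, Default)]
pub struct Blob {
    pub bytes: Vec<u8>,
    artifacts: HashMap<String, Vec<Artifact>>,
}

impl Blob {
    pub fn new(bytes: impl Into<Vec<u8>>) -> Self {
        Self { bytes: bytes.into(), artifacts: HashMap::new() }
    }

    /// Register derived data under its kind, keeping insertion order.
    pub fn add_artifact(&mut self, artifact: Artifact) {
        self.artifacts.entry(artifact.kind.clone()).or_default().push(artifact);
    }

    /// All artifacts of one kind; empty slice if none were produced.
    pub fn get_artifacts(&self, kind: &str) -> &[Artifact] {
        self.artifacts.get(kind).map(Vec::as_slice).unwrap_or(&[])
    }
}
```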

… types
Reorganize nvisy-core: rename DataItem→Data, move entity/redaction/audit types to ontology/, create redaction/ for context+policy, rename traits/ to registry/. Replace serde_json::Value params and Box<dyn Any> clients with associated types (Params, Credentials, Client) on all traits. Update all consumer crates (detect, object, python, server) with typed param structs and concrete client types.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
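Replacing `serde_json::Value` params and `Box<dyn Any>` clients with associated types moves type checking from runtime downcasts to compile time. A minimal sketch; the trait shapes, the method names `connect` and `run`, and the in-memory types are hypothetical.

```rust
/// Hypothetical provider trait: each implementation names its own types.
pub trait Provider {
    type Params;
    type Credentials;
    type Client;

    fn connect(&self, credentials: &Self::Credentials) -> Result<Self::Client, String>;
}

/// Hypothetical action trait bound to a concrete client type.
pub trait Action {
    type Params;
    type Client;

    fn run(&self, client: &Self::Client, params: &Self::Params, input: &str) -> String;
}

pub struct MemoryCredentials {
    pub token: String,
}
pub struct MemoryClient {
    pub namespace: String,
}
pub struct MemoryProvider;

impl Provider for MemoryProvider {
    type Params = ();
    type Credentials = MemoryCredentials;
    type Client = MemoryClient;

    fn connect(&self, credentials: &MemoryCredentials) -> Result<MemoryClient, String> {
        if credentials.token.is_empty() {
            return Err("missing token".to_string());
        }
        Ok(MemoryClient { namespace: "mem".to_string() })
    }
}

pub struct MaskParams {
    pub mask: char,
}
pub struct MaskAction;

impl Action for MaskAction {
    type Params = MaskParams;
    // Passing any other client type is a compile error, not a failed downcast.
    type Client = MemoryClient;

    fn run(&self, client: &MemoryClient, params: &MaskParams, input: &str) -> String {
        let masked: String = input.chars().map(|_| params.mask).collect();
        format!("{}:{}", client.namespace, masked)
    }
}
```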

…sy-media crates
Extract ontology/redaction modules into nvisy-ontology, move Loader trait into nvisy-ingest, move StreamSource/StreamTarget into nvisy-object, add new nvisy-ingest and nvisy-media crates, expose fs/io/path modules in nvisy-core with new dependencies (jiff, hipstr, sha2, hex, strum), and add From<io::Error> impl plus InternalError/InvalidInput/Serialization error variants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
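The `From<io::Error>` impl is what lets `?` convert std I/O failures into the crate's error type. A sketch assuming a simplified `Error` with a public `kind` field; the three variant names come from the commit message, while the mapping from `io::ErrorKind` and the `read_text` helper are illustrative guesses.

```rust
use std::fmt;
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    InternalError,
    InvalidInput,
    Serialization,
}

#[derive(Debug)]
pub struct Error {
    pub kind: ErrorKind,
    pub message: String,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl std::error::Error for Error {}

/// Lets `?` turn any `io::Error` into the crate error.
impl From<io::Error> for Error {
    fn from(err: io::Error) -> Self {
        // Illustrative mapping; unlisted I/O kinds become InternalError.
        let kind = match err.kind() {
            io::ErrorKind::InvalidInput => ErrorKind::InvalidInput,
            io::ErrorKind::InvalidData => ErrorKind::Serialization,
            _ => ErrorKind::InternalError,
        };
        Error { kind, message: err.to_string() }
    }
}

/// Hypothetical helper: `?` applies the `From` impl above automatically.
pub fn read_text(path: &std::path::Path) -> Result<String, Error> {
    let text = std::fs::read_to_string(path)?;
    Ok(text)
}
```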

Replace the previous ARCHITECTURE.md and DEVELOPMENT.md with a structured document set covering ingestion, detection, redaction, compliance, infrastructure, and developer experience.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-pattern, nvisy-pipeline
Remove nvisy-detect (actions, dictionaries, patterns), nvisy-media (render, media redaction actions), and nvisy-server (handlers, services, middleware). Restructure nvisy-core by removing datatypes and registry modules. Reorganize nvisy-ingest from flat loaders into modality-specific modules (audio, binary, image, tabular, text). Add nvisy-pattern and nvisy-pipeline crates. Update all workspace dependencies accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…logy types
Introduce the top-level Engine trait in nvisy-engine defining the redaction pipeline contract (ContentHandler + policies + DAG graph → redacted output + audit trail + full breakdown). Add supporting ontology types: ClassificationResult, PolicyEvaluation, RedactionSummary. Extract ModelInfo/ModelKind into entity/model.rs. Migrate pipeline classify action to use ontology SensitivityLevel enum.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…les; remove Parquet/WebP/BMP support
Consolidate sensitivity assessment into a reusable Sensitivity struct combining SensitivityLevel and risk_score. Add DocumentType enum in entity with concrete image (Png, Jpeg, Tiff) and audio (Wav, Mp3) variants. Make all ontology sub-modules private with public re-exports. Reformat all Cargo.toml deps to consistent table syntax. Remove Parquet loader/handler/deps and WebP/BMP from ImageHandler as unsupported.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
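A sketch of the two types described in this commit. The `Sensitivity` field names follow the commit message; the `SensitivityLevel` variants, the clamping of `risk_score` to [0, 1], the flat `DocumentType` layout and the `from_extension` helper are assumptions.

```rust
/// Hypothetical levels, ordered from least to most sensitive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SensitivityLevel {
    Public,
    Internal,
    Confidential,
    Restricted,
}

/// One reusable assessment: a discrete level plus a continuous score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sensitivity {
    pub level: SensitivityLevel,
    pub risk_score: f32,
}

impl Sensitivity {
    /// Hypothetical constructor that keeps the score within [0, 1].
    pub fn new(level: SensitivityLevel, risk_score: f32) -> Self {
        Self { level, risk_score: risk_score.clamp(0.0, 1.0) }
    }
}

/// Supported document formats after dropping Parquet, WebP and BMP.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentType {
    Png,
    Jpeg,
    Tiff,
    Wav,
    Mp3,
}

impl DocumentType {
    pub fn is_image(self) -> bool {
        matches!(self, Self::Png | Self::Jpeg | Self::Tiff)
    }

    pub fn is_audio(self) -> bool {
        matches!(self, Self::Wav | Self::Mp3)
    }

    /// Hypothetical lookup; unsupported formats yield `None`.
    pub fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "png" => Some(Self::Png),
            "jpg" | "jpeg" => Some(Self::Jpeg),
            "tif" | "tiff" => Some(Self::Tiff),
            "wav" => Some(Self::Wav),
            "mp3" => Some(Self::Mp3),
            _ => None,
        }
    }
}
```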

…tream, merge into Handler
Delete the Content trait (with its GAT Iter<'a>) and merge SpanId, SpanData, view_spans, and edit_spans directly into the Handler trait as async methods via #[async_trait]. Span iteration is now exposed through SpanStream (view_stream.rs) and SpanEditStream (edit_stream.rs), thin wrappers around Pin<Box<dyn Stream + Send>>.
- Add futures dependency to nvisy-ingest
- Create document/view_stream.rs (SpanStream) and document/edit_stream.rs (SpanEditStream)
- Rename parse_spans → view_spans; make view_spans/edit_spans async
- Privatize TxtSpanIter, CsvSpanIter, JsonSpanIter
- Add stub SpanId=()/SpanData=() impls to all 8 format-stub handlers
- Convert all 44 handler/loader tests to async
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…bs; reorganize into detection/redaction/generation modules
Add NER detection, OCR, transcription, and synthetic data generation action stubs with typed input/output structs. Add audio redaction pass-through to apply action. Enrich ApplyRedactionParams to cover text, image, and audio modalities. Add bytes field to WavHandler and Mp3Handler. Split flat actions/ directory into detection/, redaction/, and generation/ top-level modules; move render/ under redaction/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…one and mutability
- ontology: remove EntityLocation enum; replace with per-modality Optional fields on Entity (text_location, image_location, etc.) and Annotation. Add with_*_location builder methods.
- ontology: remove Audit builder methods and details field; add RetentionPolicy::duration(). Add Policies collection type.
- ontology: slim PolicyRule (drop name, description, enabled, context, metadata); remove Policy builder/find_matching_rule methods.
- ingest: add bytes field + Clone to PngHandler; add Clone to TxtHandler/CsvHandler; add TxtHandler::new, CsvHandler::rows_mut; remove Document::page_number.
- engine: use Policies instead of Vec<Policy>.
- python: update NER/OCR bridges for new Entity location API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
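A sketch of the per-modality location layout with `with_*_location` builders that replaces a single location enum. `text_location` and `image_location` are named in the commit; the location structs, their fields, `label`, and `audio_location` are hypothetical.

```rust
/// Hypothetical location types, one per modality.
#[derive(Debug, Clone, PartialEq)]
pub struct TextLocation {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ImageLocation {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AudioLocation {
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Each modality gets its own optional field instead of one enum, so an
/// entity can carry, e.g., both an OCR text span and its image box.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub label: String,
    pub text_location: Option<TextLocation>,
    pub image_location: Option<ImageLocation>,
    pub audio_location: Option<AudioLocation>,
}

impl Entity {
    pub fn new(label: impl Into<String>) -> Self {
        Self { label: label.into(), text_location: None, image_location: None, audio_location: None }
    }

    pub fn with_text_location(mut self, loc: TextLocation) -> Self {
        self.text_location = Some(loc);
        self
    }

    pub fn with_image_location(mut self, loc: ImageLocation) -> Self {
        self.image_location = Some(loc);
        self
    }

    pub fn with_audio_location(mut self, loc: AudioLocation) -> Self {
        self.audio_location = Some(loc);
        self
    }
}
```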

martsokha self-assigned this Feb 6, 2026

martsokha added the docs (improvements, updates or additions to docs) and feat (request for or implementation of a new feature) labels Feb 6, 2026

martsokha and others added 16 commits February 7, 2026 09:00

martsokha merged commit 66c1695 into main Feb 15, 2026
0 of 3 checks passed

martsokha deleted the feature/prerelease branch February 15, 2026 16:09

martsokha merged 17 commits into main from feature/prerelease

1 participant
