feat: add queue, nosql, markup, tesseract plugin #9
Merged
Conversation
Add empty shell plugins for message queues (Kafka, RabbitMQ, SQS, Redis Streams), NoSQL databases (MongoDB, DynamoDB, Firestore), markup/tabular parsing (HTML, XML, JSON, CSV, TSV), and Tesseract OCR. Register all four with the server engine. Move the packages table from the root README to packages/README.md with white-paper-style descriptions for each plugin category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…move shared types to types.ts

Extract type-specific fields from the flat Element base class into dedicated subclasses (TableElement, FormElement, EmailElement, CompositeElement), bundle OCR/extraction provenance into ElementProvenance, add sourceTag/textAsHtml/pageName for source fidelity, add FormKeyValuePair for structured form data, introduce EmailType ontology category, and move JsonValue/Metadata to types.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h strategies, add tests

Move rule-based chunk and partition actions from nvisy-plugin-ai to nvisy-core since they have no AI dependency. Split enrich.ts into individual strategy files following the same pattern. Add maxCharacters/combineUnder/inferTableStructure options and comprehensive tests for all strategies.

Key changes:
- Move Chunk datatype, chunk (character/section/page), and partition (auto/rule) actions to nvisy-core with full test coverage
- Split enrich into enrich-by-metadata, enrich-by-ner, enrich-by-description, enrich-by-table-html strategy files
- Use plain interfaces in strategy files, zod .extend() schemas in combining files
- Move parseJsonResponse to providers/client.ts
- Rename embed.ts to generate-embedding.ts
- Move action.ts into actions/ folder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h strategies, add tests

- Create @nvisy/plugin-core with chunk/partition actions, plaintext/CSV/JSON loaders, and core datatype registration
- Move Action factory from src/actions/action.ts back to src/action.ts in nvisy-core as a framework primitive
- Remove corePlugin and action implementations from nvisy-core
- Add limit parameter to SQL read stream for capped pagination
- Delete @nvisy/plugin-markup (empty after moving plaintext loader to core)
- Improve JSDoc across the entire engine module
- Update nvisy-runtime and nvisy-server to use @nvisy/plugin-core

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tatypes, bump pinecone

- Replace SourceConfig/TargetConfig `types` tuple with separate `type`, `context`, `params` fields for consistency with Action configs
- Simplify Blob and Document datatypes, update loader implementations
- Streamline registry and bridge internals
- Bump @pinecone-database/pinecone from 4.x to 7.x, update upsert call
- Update README examples and add plugin-core README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the entire TypeScript implementation with a Rust workspace (6 crates) and a Python AI package, porting all domain types, DAG engine, detection patterns, object store, and HTTP server.

Rust crates:
- nvisy-core: domain types, traits (Action, Provider, Stream, Loader), plugin registry
- nvisy-detect: 10 regex patterns, 6 actions (detect/evaluate/redact/classify/audit), 3 loaders
- nvisy-engine: DAG compiler (petgraph), executor (tokio tasks + mpsc), retry/timeout, run tracking
- nvisy-object: ObjectStoreClient trait, S3 provider (aws-sdk-s3), read/write streams
- nvisy-python: PyO3 bridge for AI NER (GIL + spawn_blocking), action wrappers
- nvisy-server: Axum HTTP server with health, graphs, redact, policies, audit routes

Python package (packages/nvisy-ai):
- NER detection for text and images via OpenAI, Anthropic, and Gemini providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
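To make the trait-plus-registry shape concrete, here is a minimal sketch of how a named action and a plugin registry could fit together. The `Action` methods, `Registry` type, and `Passthrough` example are illustrative stand-ins, not the actual nvisy-core API.

```rust
use std::collections::HashMap;

/// A named unit of pipeline work (hypothetical shape).
trait Action {
    fn name(&self) -> &'static str;
    fn run(&self, input: &[u8]) -> Result<Vec<u8>, String>;
}

/// A plugin registry keyed by action name (hypothetical shape).
#[derive(Default)]
struct Registry {
    actions: HashMap<&'static str, Box<dyn Action>>,
}

impl Registry {
    fn register(&mut self, action: Box<dyn Action>) {
        self.actions.insert(action.name(), action);
    }

    fn get(&self, name: &str) -> Option<&dyn Action> {
        self.actions.get(name).map(|a| a.as_ref())
    }
}

/// Example action: pass input through unchanged.
struct Passthrough;

impl Action for Passthrough {
    fn name(&self) -> &'static str { "passthrough" }
    fn run(&self, input: &[u8]) -> Result<Vec<u8>, String> { Ok(input.to_vec()) }
}

fn main() {
    let mut registry = Registry::default();
    registry.register(Box::new(Passthrough));
    let out = registry.get("passthrough").unwrap().run(b"hello").unwrap();
    assert_eq!(out, b"hello".to_vec());
}
```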
…a, JSON patterns, nvisy-exif, update CI

- Move state.rs and config.rs into service/, update all imports
- Add RedactionContext type to nvisy-core for per-request redaction config
- Add schemars schema feature with JsonSchema derives on all server-facing types
- Add utoipa OpenAPI annotations to all 15 handler endpoints with SwaggerUI at /swagger-ui
- Refactor NvisyError enum into Error struct with ErrorKind, source field, Result alias
- Clean crate roots: move standalone .rs files into module directories
- Load detection patterns from JSON (assets/patterns.json) instead of hardcoded statics
- Create nvisy-exif Python package for EXIF metadata reading/stripping
- Bump deps: petgraph 0.8, infer 0.19, utoipa-swagger-ui 9, add schemars 1
- Replace TypeScript CI with Rust workflows, update Dependabot for cargo/pip
- Update Dockerfile for multi-stage Rust build
- Update Makefile with cargo targets
- Rewrite README and docs for Rust codebase

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
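As a rough illustration of the enum-into-struct error refactor mentioned above (the actual ErrorKind variants, field names, and constructors in nvisy-core may differ), the struct-with-kind pattern typically looks like this:

```rust
use std::fmt;

#[derive(Debug)]
pub enum ErrorKind {
    InvalidInput,
    Serialization,
    Internal,
}

/// Struct-with-kind error shape: one public type, a kind discriminant,
/// and an optional boxed source for error chaining.
#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    message: String,
    source: Option<Box<dyn std::error::Error>>,
}

/// Crate-wide result alias.
pub type Result<T> = std::result::Result<T, Error>;

impl Error {
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        Self { kind, message: message.into(), source: None }
    }

    pub fn kind(&self) -> &ErrorKind {
        &self.kind
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        self.source.as_deref()
    }
}
```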
… pipeline, add docs

Fully-qualify async_trait attributes across all crates. Dissolve schema/ into compiler/graph.rs and policies/retry.rs. Dissolve types/ into datatypes/ submodules. Merge data/ into datatypes/. Remove DataValue enum: the pipeline is now blob-centric with an artifact registry (add_artifact/get_artifacts) for derived data. Add comprehensive documentation across all 6 crates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
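A small sketch of the blob-plus-artifact-registry idea: `add_artifact`/`get_artifacts` come from the commit message, but the surrounding `RunContext` and `Artifact` types below are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical derived-data record attached to a pipeline run.
#[derive(Debug, Clone)]
pub struct Artifact {
    pub kind: String, // e.g. "entities", "audit", "preview"
    pub bytes: Vec<u8>,
}

/// Blob-centric run context: the primary payload stays as raw bytes, while
/// derived data is registered per node instead of flowing through a DataValue enum.
#[derive(Default)]
pub struct RunContext {
    pub blob: Vec<u8>,
    artifacts: HashMap<String, Vec<Artifact>>,
}

impl RunContext {
    pub fn add_artifact(&mut self, node: &str, artifact: Artifact) {
        self.artifacts.entry(node.to_string()).or_default().push(artifact);
    }

    pub fn get_artifacts(&self, node: &str) -> &[Artifact] {
        self.artifacts.get(node).map(Vec::as_slice).unwrap_or(&[])
    }
}
```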
… types

Reorganize nvisy-core: rename DataItem→Data, move entity/redaction/audit types to ontology/, create redaction/ for context+policy, rename traits/ to registry/. Replace serde_json::Value params and Box<dyn Any> clients with associated types (Params, Credentials, Client) on all traits. Update all consumer crates (detect, object, python, server) with typed param structs and concrete client types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
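Roughly, the associated-types change trades untyped values for per-implementation types declared on the trait. The `Provider` trait and S3 names below are illustrative stand-ins, not the real nvisy-core or nvisy-object definitions.

```rust
use serde::Deserialize;

/// Hypothetical provider contract with typed params, credentials, and client
/// in place of serde_json::Value and Box<dyn Any>.
pub trait Provider {
    type Params: for<'de> Deserialize<'de>;
    type Credentials;
    type Client;

    fn connect(creds: Self::Credentials, params: Self::Params) -> Self::Client;
}

#[derive(Deserialize)]
pub struct S3Params {
    pub bucket: String,
}

pub struct S3Credentials {
    pub access_key: String,
    pub secret_key: String,
}

pub struct S3Provider;

impl Provider for S3Provider {
    type Params = S3Params;
    type Credentials = S3Credentials;
    type Client = String; // stand-in for a real SDK client in this sketch

    fn connect(_creds: Self::Credentials, params: Self::Params) -> Self::Client {
        // A real implementation would build an SDK client from the credentials.
        format!("s3://{}", params.bucket)
    }
}
```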
…sy-media crates

Extract ontology/redaction modules into nvisy-ontology, move Loader trait into nvisy-ingest, move StreamSource/StreamTarget into nvisy-object, add new nvisy-ingest and nvisy-media crates, expose fs/io/path modules in nvisy-core with new dependencies (jiff, hipstr, sha2, hex, strum), and add From<io::Error> impl plus InternalError/InvalidInput/Serialization error variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
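For instance, the new From<io::Error> conversion could look roughly like this, using a simplified stand-in for the Error/ErrorKind shape; which variant the real code maps I/O failures to is an assumption here.

```rust
use std::io;

#[derive(Debug)]
pub enum ErrorKind {
    InternalError,
    InvalidInput,
    Serialization,
}

#[derive(Debug)]
pub struct Error {
    pub kind: ErrorKind,
    pub message: String,
}

impl From<io::Error> for Error {
    fn from(err: io::Error) -> Self {
        Error { kind: ErrorKind::InternalError, message: err.to_string() }
    }
}

/// With the From impl in place, `?` converts io::Error automatically.
pub fn read_manifest(path: &str) -> Result<String, Error> {
    Ok(std::fs::read_to_string(path)?)
}
```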
Replace previous ARCHITECTURE.md and DEVELOPMENT.md with a structured document set covering ingestion, detection, redaction, compliance, infrastructure, and developer experience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-pattern, nvisy-pipeline

Remove nvisy-detect (actions, dictionaries, patterns), nvisy-media (render, media redaction actions), and nvisy-server (handlers, services, middleware). Restructure nvisy-core by removing datatypes and registry modules. Reorganize nvisy-ingest from flat loaders into modality-specific modules (audio, binary, image, tabular, text). Add nvisy-pattern and nvisy-pipeline crates. Update all workspace dependencies accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…logy types

Introduce the top-level Engine trait in nvisy-engine defining the redaction pipeline contract (ContentHandler + policies + DAG graph → redacted output + audit trail + full breakdown). Add supporting ontology types: ClassificationResult, PolicyEvaluation, RedactionSummary. Extract ModelInfo/ModelKind into entity/model.rs. Migrate pipeline classify action to use ontology SensitivityLevel enum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
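A hedged sketch of what such a top-level contract might look like: content plus policies plus a compiled graph in, redacted output plus audit trail plus summary out. Every name below is illustrative; the actual nvisy-engine trait and its input/output types will differ.

```rust
pub struct Policy {
    pub name: String,
}

pub struct Graph {
    pub nodes: Vec<String>,
}

pub struct RedactionOutcome {
    pub redacted: Vec<u8>,
    pub audit_trail: Vec<String>,
    pub summary: String,
}

/// Hypothetical pipeline contract: content + policies + compiled DAG in,
/// redacted output + audit trail + breakdown out.
pub trait Engine {
    type Error;

    fn run(
        &self,
        content: &[u8],
        policies: &[Policy],
        graph: &Graph,
    ) -> Result<RedactionOutcome, Self::Error>;
}
```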
…les; remove Parquet/WebP/BMP support

Consolidate sensitivity assessment into a reusable Sensitivity struct combining SensitivityLevel and risk_score. Add DocumentType enum in entity with concrete image (Png, Jpeg, Tiff) and audio (Wav, Mp3) variants. Make all ontology sub-modules private with public re-exports. Reformat all Cargo.toml deps to consistent table syntax. Remove Parquet loader/handler/deps and WebP/BMP from ImageHandler as unsupported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
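An illustrative sketch of the Sensitivity and DocumentType shapes: the Png/Jpeg/Tiff/Wav/Mp3 variants come from the commit, while the SensitivityLevel variants and field names are assumptions.

```rust
/// Assumed ordering of sensitivity levels; the real enum may use other names.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SensitivityLevel {
    Public,
    Internal,
    Confidential,
    Restricted,
}

/// Reusable sensitivity assessment combining a level with a numeric score.
#[derive(Debug, Clone, Copy)]
pub struct Sensitivity {
    pub level: SensitivityLevel,
    /// Normalized risk score, assumed to lie in [0.0, 1.0].
    pub risk_score: f64,
}

/// Concrete document types named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentType {
    // image
    Png,
    Jpeg,
    Tiff,
    // audio
    Wav,
    Mp3,
}
```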
…tream, merge into Handler

Delete the Content trait (with its GAT Iter<'a>) and merge SpanId, SpanData, view_spans, and edit_spans directly into the Handler trait as async methods via #[async_trait]. Span iteration is now exposed through SpanStream (view_stream.rs) and SpanEditStream (edit_stream.rs), thin wrappers around Pin<Box<dyn Stream + Send>>.

- Add futures dependency to nvisy-ingest
- Create document/view_stream.rs (SpanStream) and document/edit_stream.rs (SpanEditStream)
- Rename parse_spans → view_spans; make view_spans/edit_spans async
- Privatize TxtSpanIter, CsvSpanIter, JsonSpanIter
- Add stub SpanId=()/SpanData=() impls to all 8 format-stub handlers
- Convert all 44 handler/loader tests to async

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
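A minimal sketch of the wrapper-around-a-boxed-stream pattern and the async trait shape described above; SpanEditStream would follow the same wrapper pattern for mutable spans. Method names, bounds, and the tuple item type are illustrative, not the actual nvisy-ingest signatures.

```rust
use std::pin::Pin;

use async_trait::async_trait;
use futures::{Stream, StreamExt};

/// Thin wrapper around a boxed stream of (id, data) span pairs.
pub struct SpanStream<Id, Data> {
    inner: Pin<Box<dyn Stream<Item = (Id, Data)> + Send>>,
}

impl<Id, Data> SpanStream<Id, Data> {
    pub fn new(stream: impl Stream<Item = (Id, Data)> + Send + 'static) -> Self {
        Self { inner: Box::pin(stream) }
    }

    /// Pull the next span, if any.
    pub async fn next(&mut self) -> Option<(Id, Data)> {
        self.inner.next().await
    }
}

/// Hypothetical handler shape with the span machinery merged in as
/// associated types and async methods.
#[async_trait]
pub trait Handler {
    type SpanId: Send + 'static;
    type SpanData: Send + 'static;

    /// Stream the document's spans for read-only inspection.
    async fn view_spans(&self) -> SpanStream<Self::SpanId, Self::SpanData>;
}
```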
…bs; reorganize into detection/redaction/generation modules

Add NER detection, OCR, transcription, and synthetic data generation action stubs with typed input/output structs. Add audio redaction pass-through to apply action. Enrich ApplyRedactionParams to cover text, image, and audio modalities. Add bytes field to WavHandler and Mp3Handler. Split flat actions/ directory into detection/, redaction/, and generation/ top-level modules; move render/ under redaction/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
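For a sense of what "action stubs with typed input/output structs" could mean in practice, here is a hypothetical NER detection stub; the struct names, fields, and function are assumptions, not the real nvisy-pipeline types.

```rust
/// Hypothetical typed input for a NER detection action.
pub struct NerDetectionInput {
    pub text: String,
    pub language: Option<String>,
}

/// Hypothetical typed output: detected entities as
/// (label, start, end) byte offsets into the input text.
pub struct NerDetectionOutput {
    pub entities: Vec<(String, usize, usize)>,
}

/// Stub: a real implementation would delegate to the Python NER bridge.
pub fn detect_ner(_input: NerDetectionInput) -> NerDetectionOutput {
    NerDetectionOutput { entities: Vec::new() }
}
```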
…one and mutability

- ontology: remove EntityLocation enum; replace with per-modality Optional fields on Entity (text_location, image_location, etc.) and Annotation. Add with_*_location builder methods.
- ontology: remove Audit builder methods and details field; add RetentionPolicy::duration(). Add Policies collection type.
- ontology: slim PolicyRule (drop name, description, enabled, context, metadata); remove Policy builder/find_matching_rule methods.
- ingest: add bytes field + Clone to PngHandler; add Clone to TxtHandler/CsvHandler; add TxtHandler::new, CsvHandler::rows_mut; remove Document::page_number.
- engine: use Policies instead of Vec<Policy>.
- python: update NER/OCR bridges for new Entity location API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
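A sketch of the per-modality optional location fields and with_*_location builders described in the first bullet above; the location structs, their fields, and the Entity layout are illustrative only.

```rust
/// Hypothetical text span location.
#[derive(Debug, Clone, Default)]
pub struct TextLocation {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical image bounding box location.
#[derive(Debug, Clone, Default)]
pub struct ImageLocation {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Entity with one optional location field per modality instead of a
/// single EntityLocation enum.
#[derive(Debug, Clone, Default)]
pub struct Entity {
    pub label: String,
    pub text_location: Option<TextLocation>,
    pub image_location: Option<ImageLocation>,
}

impl Entity {
    pub fn with_text_location(mut self, loc: TextLocation) -> Self {
        self.text_location = Some(loc);
        self
    }

    pub fn with_image_location(mut self, loc: ImageLocation) -> Self {
        self.image_location = Some(loc);
        self
    }
}
```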