feat: _inputs/custom_feeds/ auto-detected drop-in ingestion (#139) #149
Merged
Drop any CSV into _inputs/custom_feeds/ and it is auto-routed to the correct DuckDB table on the next pipeline run; no code changes are required.

- src/ingest/custom_feeds.py: feed-type detection by column signature (sar/cargo/sanctions/ais) with filename-prefix fallback; per-type ingestors; optional .columnmap.json sidecar for AIS feeds
- scripts/run_pipeline.py: new step_custom_feeds (step 5 of 10) between sanctions loading and ownership graph; also included in the cadence rescore loop
- scripts/run_operations_shell.sh: job 15, an interactive custom feed drop-in
- docs/pipeline-operations.md: custom feeds section with interface contract (required/optional columns per feed type), sidecar format, and standalone usage
- tests/test_custom_feeds.py: 26 unit tests covering all 4 feed types, column-map sidecar, dry-run, unknown schema, deduplication, and edge cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #139
Summary
- `src/ingest/custom_feeds.py`: auto-detects feed type from column signature (SAR / cargo / sanctions / AIS), falls back to filename prefix (`sar_*`, `cargo_*`, `manifest_*`, `sanctions_*`, `ais_*`), and routes each CSV to the correct DuckDB table. An optional `.columnmap.json` sidecar lets AIS feeds with non-standard columns be mapped without code changes.
- `scripts/run_pipeline.py`: new `step_custom_feeds` inserted as step 5 of 10 (after sanctions, before ownership graph); also runs on every cadence rescore loop so live drop-ins are picked up automatically.
- `scripts/run_operations_shell.sh`: job 15, an interactive prompt for feeds directory + dry-run option + optional feature matrix + scoring.
- `docs/pipeline-operations.md`: new "Custom feed drop-ins" section with interface contract table (required/optional columns per feed type), `.columnmap.json` format, and standalone CLI usage.
- `tests/test_custom_feeds.py`: 26 unit tests covering all four feed types, column-map sidecar, dry-run, unknown schema skip, deduplication, and edge cases.

Required columns per feed type
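| Required columns | DuckDB table |
| --- | --- |
| `mmsi`/`MMSI`, `lat`/`LAT`, `lon`/`LON`, `timestamp`/`BaseDateTime` | `ais_positions` |
| `lat`, `lon`, `detected_at` | `sar_detections` |
| `reporter`, `partner`, `hs_code`, `period` | `trade_flow` |
| `name`, `list_source` | `sanctions_entities` |

Test plan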
- `uv run pytest tests/test_custom_feeds.py`: 26 passed
- `uv run ruff check src/ingest/custom_feeds.py tests/test_custom_feeds.py scripts/run_pipeline.py`: clean

🤖 Generated with Claude Code
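
For reviewers who want a mental model of the routing rule from the Summary (column signature first, filename prefix second), here is a minimal sketch. It is not the code in `src/ingest/custom_feeds.py`: the names `SIGNATURES`, `PREFIXES`, `read_header`, and `detect_feed_table` are hypothetical, the mapping of `cargo_*` and `manifest_*` to `trade_flow` is inferred from the table above, and the choice to match case-insensitively and let the first matching signature win is an assumption.

```python
import csv
import io
from pathlib import PurePath
from typing import Iterable, List, Optional

# Hypothetical signature table: for each target table, a list of required
# columns, each given as the set of accepted (lowercased) spellings.
SIGNATURES = {
    "ais_positions": [{"mmsi"}, {"lat"}, {"lon"}, {"timestamp", "basedatetime"}],
    "sar_detections": [{"lat"}, {"lon"}, {"detected_at"}],
    "trade_flow": [{"reporter"}, {"partner"}, {"hs_code"}, {"period"}],
    "sanctions_entities": [{"name"}, {"list_source"}],
}

# Hypothetical filename-prefix fallback (prefix-to-table mapping is assumed).
PREFIXES = {
    "sar_": "sar_detections",
    "cargo_": "trade_flow",
    "manifest_": "trade_flow",
    "sanctions_": "sanctions_entities",
    "ais_": "ais_positions",
}


def read_header(csv_text: str) -> List[str]:
    """Return the first CSV row as the header, or an empty list if there is none."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def detect_feed_table(filename: str, header: Iterable[str]) -> Optional[str]:
    """Pick a target table by column signature, then by filename prefix."""
    cols = {c.strip().lower() for c in header}
    # 1) Column signature: every required column must appear under one of its spellings.
    for table, required in SIGNATURES.items():
        if all(cols & spellings for spellings in required):
            return table
    # 2) Fallback: filename prefix.
    name = PurePath(filename).name.lower()
    for prefix, table in PREFIXES.items():
        if name.startswith(prefix):
            return table
    # 3) Unknown schema: the caller is expected to skip the file.
    return None
```

In this sketch a file whose header satisfies more than one signature goes to whichever table is listed first in `SIGNATURES`, and a file that matches nothing returns `None`, mirroring the "unknown schema skip" case covered by the tests.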