This backlog tracks the evolution of Nidus from MVP to scalable SaaS.
Context:
We currently parse PDFs in the API layer (using threadpools) and offload AI analysis to Celery. This avoids S3 complexity but risks CPU saturation.
Decision:
Stick to this "Hybrid" approach until >10 concurrent uploads/sec or memory pressure becomes unsustainable (OOM).
Tasks:
- Document this decision in README (Done).
- Ensure `upload.py` uses `def` (threadpool) not `async def` (main loop) for CPU-bound tasks.
- Validate the default thread count in the production container.
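For context, FastAPI dispatches a plain `def` endpoint to a worker threadpool, while an `async def` body runs directly on the event loop and would block it during parsing. The mechanism is roughly this stdlib pattern (function names and the sleep stand-in are illustrative, not the project's actual code):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def parse_pdf(data: bytes) -> str:
    # stand-in for CPU-bound pdfplumber work
    time.sleep(0.01)
    return "text"

async def main() -> str:
    loop = asyncio.get_running_loop()
    # Roughly what FastAPI does for a plain `def` endpoint:
    # the blocking call runs in a thread, the loop stays free.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await loop.run_in_executor(pool, parse_pdf, b"%PDF")

print(asyncio.run(main()))
```

The GIL means threads do not add CPU parallelism for pure-Python parsing, but they do keep the event loop responsive, which is the property the task above protects.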
Impact:
Prevents premature over-engineering (S3/GCS) while acknowledging the bottleneck.
Context:
Since parsing happens on the API node, a "Zip Bomb" or complex PDF can freeze the server.
Risks:
- CPU exhaustion (DoS).
- Memory leaks from `pdfplumber`.
Tasks:
- Enforce hard limit: 5 MB per file.
- Enforce timeout: 5 seconds max for text extraction.
- Add `pydantic` validation for MIME types beyond extension checking.
Impact:
Protects the API availability from malicious or erroneous uploads.
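The two limits above can be enforced with stdlib tools only; a minimal sketch (names are illustrative, and the decode call stands in for the real `pdfplumber` extraction):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as PoolTimeout

MAX_BYTES = 5 * 1024 * 1024  # 5 MB hard limit
TIMEOUT_S = 5                # max seconds allowed for extraction

def extract_text(data: bytes) -> str:
    # stand-in for the real pdfplumber extraction
    return data.decode("latin-1", errors="ignore")

_pool = ThreadPoolExecutor(max_workers=2)

def safe_extract(data: bytes) -> str:
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 5 MB limit")
    future = _pool.submit(extract_text, data)
    try:
        return future.result(timeout=TIMEOUT_S)
    except PoolTimeout:
        future.cancel()  # best effort only: a running thread
        raise            # cannot be killed from the outside
```

Note the caveat in the comments: a runaway parser thread survives the timeout. If zip-bomb-style inputs are a real concern, running extraction in a subprocess that can be terminated gives a hard guarantee.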
Context:
When we migrate to a true stateless cluster, local file handling will be insufficient.
Plan:
- Client requests upload URL from API.
- Client uploads directly to Object Storage (S3).
- Cloud Function / Worker triggers on file event.
Trigger:
Implement this when we reach >50 daily users or observe >500ms latency on the upload endpoint.
Context:
Users perceive "slowness" but we don't know if it's the PDF parser or the LLM.
Tasks:
- Structured logging: log `duration_parsing_ms` and `duration_ai_ms`.
- Add a correlation ID to track the flow from Request -> Worker -> DB.
Impact:
Enables data-driven optimization.
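Both tasks above can be sketched with the stdlib alone: a JSON formatter that attaches the per-request correlation ID plus any timing metrics passed via `extra` (logger name and field layout are assumptions):

```python
import json
import logging
import time
import uuid
from contextvars import ContextVar

# One ID per request; contextvars propagates it across awaits/threads.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "msg": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        payload.update(getattr(record, "metrics", {}))  # timing fields
        return json.dumps(payload)

log = logging.getLogger("nidus")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

def handle_upload(data: bytes) -> None:
    correlation_id.set(str(uuid.uuid4()))
    t0 = time.perf_counter()
    # ... parse PDF here ...
    parsing_ms = (time.perf_counter() - t0) * 1000
    log.info("parsed", extra={"metrics": {"duration_parsing_ms": round(parsing_ms, 1)}})
```

The same `extra={"metrics": ...}` pattern covers `duration_ai_ms` in the Celery worker, and the correlation ID would be carried in the task payload so both sides log the same value.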
Context:
`ai_service.py` is currently tightly coupled to Groq.
Tasks:
- Define an `ILLMProvider` interface.
- Inject the provider into the Service layer.
- Create a `MockProvider` for unit testing without API costs.
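A minimal sketch of the three tasks together, using `typing.Protocol` for the interface (method names and return shape are assumptions, not the project's actual contract):

```python
from typing import Protocol

class ILLMProvider(Protocol):
    def analyze(self, text: str) -> dict: ...

class MockProvider:
    """Deterministic stand-in for Groq: no network, no API cost."""
    def analyze(self, text: str) -> dict:
        return {"skills": [], "source_chars": len(text)}

class AIService:
    def __init__(self, provider: ILLMProvider) -> None:
        self.provider = provider  # injected, so Groq can be swapped out

    def analyze_cv(self, text: str) -> dict:
        return self.provider.analyze(text)
```

A `GroqProvider` implementing the same `analyze` signature then slots in for production, and unit tests construct `AIService(MockProvider())`.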
Context:
Storing skills as JSON prevents efficient SQL querying (e.g. "Find applicants with Python").
Tasks:
- Create `Skill` and `CandidateSkill` tables.
- Write a migration script for existing data.
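A schema sketch for the two tables, shown against SQLite for portability (column names are assumptions; the real migration targets the production database):

```python
import sqlite3

# Normalized skills: replaces the JSON blob so SQL can filter by skill.
DDL = """
CREATE TABLE skill (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL,
    skill_id     INTEGER NOT NULL REFERENCES skill(id),
    PRIMARY KEY (candidate_id, skill_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# "Find applicants with Python" becomes a plain join:
QUERY = """
SELECT cs.candidate_id
FROM candidate_skill cs
JOIN skill s ON s.id = cs.skill_id
WHERE s.name = ?
"""
```

The composite primary key on `candidate_skill` also deduplicates skills per candidate for free, which the JSON column could not guarantee.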