This backlog tracks the evolution of Nidus from MVP to scalable SaaS.
Context:
We currently parse PDFs in the API layer (using threadpools) and offload AI analysis to Celery. This avoids S3 complexity but risks CPU saturation.
Decision:
Stick to this "Hybrid" approach until >10 concurrent uploads/sec or memory pressure becomes unsustainable (OOM).
Tasks:
- Document this decision in README (Done).
- Ensure `upload.py` uses `def` (threadpool) not `async def` (main loop) for CPU-bound tasks.
- Validate the default thread count in the production container.
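For context, FastAPI dispatches a plain `def` endpoint to a worker threadpool, while an `async def` body runs directly on the event loop and would block it during parsing. The mechanism is roughly this stdlib pattern (function names and the sleep stand-in are illustrative, not the project's actual code):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def parse_pdf(data: bytes) -> str:
    # stand-in for CPU-bound pdfplumber work
    time.sleep(0.01)
    return "text"

async def main() -> str:
    loop = asyncio.get_running_loop()
    # Roughly what FastAPI does for a plain `def` endpoint:
    # the blocking call runs in a thread, the loop stays free.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await loop.run_in_executor(pool, parse_pdf, b"%PDF")

print(asyncio.run(main()))
```

The GIL means threads do not add CPU parallelism for pure-Python parsing, but they do keep the event loop responsive, which is the property the task above protects.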
Impact:
Prevents premature over-engineering (S3/GCS) while acknowledging the bottleneck.
Context:
Since parsing happens on the API node, a "Zip Bomb" or complex PDF can freeze the server.
Risks:
- CPU exhaustion (DoS).
- Memory leaks from `pdfplumber`.
Tasks:
- Enforce hard limit: 5 MB per file.
- Enforce timeout: 5 seconds max for text extraction.
- Add `pydantic` validation for MIME types beyond extension checking.
Impact:
Protects the API availability from malicious or erroneous uploads.
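The two limits above can be enforced with stdlib tools only; a minimal sketch (names are illustrative, and the decode call stands in for the real `pdfplumber` extraction):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as PoolTimeout

MAX_BYTES = 5 * 1024 * 1024  # 5 MB hard limit
TIMEOUT_S = 5                # max seconds allowed for extraction

def extract_text(data: bytes) -> str:
    # stand-in for the real pdfplumber extraction
    return data.decode("latin-1", errors="ignore")

_pool = ThreadPoolExecutor(max_workers=2)

def safe_extract(data: bytes) -> str:
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 5 MB limit")
    future = _pool.submit(extract_text, data)
    try:
        return future.result(timeout=TIMEOUT_S)
    except PoolTimeout:
        future.cancel()  # best effort only: a running thread
        raise            # cannot be killed from the outside
```

Note the caveat in the comments: a runaway parser thread survives the timeout. If zip-bomb-style inputs are a real concern, running extraction in a subprocess that can be terminated gives a hard guarantee.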
Context:
When we migrate to a true stateless cluster, local file handling will be insufficient.
Plan:
- Client requests upload URL from API.
- Client uploads directly to Object Storage (S3).
- Cloud Function / Worker triggers on file event.
Trigger:
Implement this when we reach >50 daily users or observe >500ms latency on the upload endpoint.
Context:
Users perceive "slowness" but we don't know if it's the PDF parser or the LLM.
Tasks:
- Structured logging: log `duration_parsing_ms` and `duration_ai_ms`.
- Add a correlation ID to track the flow from Request -> Worker -> DB.
Impact:
Enables data-driven optimization.
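Both tasks above can be sketched with the stdlib alone: a JSON formatter that attaches the per-request correlation ID plus any timing metrics passed via `extra` (logger name and field layout are assumptions):

```python
import json
import logging
import time
import uuid
from contextvars import ContextVar

# One ID per request; contextvars propagates it across awaits/threads.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "msg": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        payload.update(getattr(record, "metrics", {}))  # timing fields
        return json.dumps(payload)

log = logging.getLogger("nidus")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

def handle_upload(data: bytes) -> None:
    correlation_id.set(str(uuid.uuid4()))
    t0 = time.perf_counter()
    # ... parse PDF here ...
    parsing_ms = (time.perf_counter() - t0) * 1000
    log.info("parsed", extra={"metrics": {"duration_parsing_ms": round(parsing_ms, 1)}})
```

The same `extra={"metrics": ...}` pattern covers `duration_ai_ms` in the Celery worker, and the correlation ID would be carried in the task payload so both sides log the same value.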
Context:
`ai_service.py` is currently tightly coupled to Groq.
Tasks:
- Define an `ILLMProvider` interface.
- Inject the provider into the Service layer.
- Create a `MockProvider` for unit testing without API costs.
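A minimal sketch of the three tasks together, using `typing.Protocol` for the interface (method names and return shape are assumptions, not the project's actual contract):

```python
from typing import Protocol

class ILLMProvider(Protocol):
    def analyze(self, text: str) -> dict: ...

class MockProvider:
    """Deterministic stand-in for Groq: no network, no API cost."""
    def analyze(self, text: str) -> dict:
        return {"skills": [], "source_chars": len(text)}

class AIService:
    def __init__(self, provider: ILLMProvider) -> None:
        self.provider = provider  # injected, so Groq can be swapped out

    def analyze_cv(self, text: str) -> dict:
        return self.provider.analyze(text)
```

A `GroqProvider` implementing the same `analyze` signature then slots in for production, and unit tests construct `AIService(MockProvider())`.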
Context:
Storing skills as JSON prevents efficient SQL querying (e.g. "Find applicants with Python").
Tasks:
- Create `Skill` and `CandidateSkill` tables.
- Write a migration script for existing data.
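A schema sketch for the two tables, shown against SQLite for portability (column names are assumptions; the real migration targets the production database):

```python
import sqlite3

# Normalized skills: replaces the JSON blob so SQL can filter by skill.
DDL = """
CREATE TABLE skill (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL,
    skill_id     INTEGER NOT NULL REFERENCES skill(id),
    PRIMARY KEY (candidate_id, skill_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# "Find applicants with Python" becomes a plain join:
QUERY = """
SELECT cs.candidate_id
FROM candidate_skill cs
JOIN skill s ON s.id = cs.skill_id
WHERE s.name = ?
"""
```

The composite primary key on `candidate_skill` also deduplicates skills per candidate for free, which the JSON column could not guarantee.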