Description:
Currently, user-uploaded task files (.py) are saved directly to the worker/tasks directory, posing a significant security risk by allowing arbitrary code execution in the main worker environment. To mitigate this, we need to implement a secure sandbox environment for validating and testing user-uploaded task files before they are made available to the production worker cluster.

This issue specifically addresses the "Execution Isolation (Proposed)" point outlined in CHANGES.md. The goal is to establish a robust and scalable mechanism to test user code in isolation, preventing malicious or faulty scripts from impacting the core system or other tasks.
Problem Statement:
Directly executing user-provided code without prior validation and isolation can lead to:
- Security vulnerabilities: Malicious code could access sensitive data, compromise the worker container, or interfere with other services.
- System instability: Faulty code could crash worker processes, consume excessive resources, or create deadlocks, impacting overall system reliability.
- Resource exhaustion: Untested code might have unbounded resource usage (CPU, memory), leading to denial-of-service for other tasks.
Proposed Solution (High-Level):
Implement a sandbox pool, leveraging technologies like gVisor or Firecracker (or a more basic containerization approach for initial implementation), to execute user-uploaded task files in an isolated and controlled environment. This will involve:
1. Staging Uploads: All user-uploaded task files will first land in a designated UPLOAD_STAGING_DIR.
2. Validation Queue: A message will be pushed to the TASK_VALIDATION_QUEUE_NAME queue, triggering a dedicated validation worker.
3. Sandbox Execution: The validation worker will pull files from the staging area, mount them into a sandbox container, and execute them with test payloads under strict
resource limits and monitoring.
4. Security Checks: Monitor for forbidden system calls and excessive resource usage, and ensure the file contains the expected async def handler(payload: dict) entry point.
5. File Promotion/Rejection: If the file passes validation, it will be moved to the worker/tasks directory for use by the main worker cluster. If it fails, it will be
rejected (and potentially deleted or quarantined).
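
As a rough illustration of steps 3 and 4, the sketch below shows one way a validation worker might check for the entry point statically and then run a staged file under CPU and memory limits. It is a minimal, Unix-only sketch built on the standard library, not the gVisor or Firecracker isolation proposed above: it does not filter system calls, and it only imports the module rather than calling handler with test payloads. The names has_async_handler and run_with_limits, and all default limit values, are hypothetical.

```python
import ast
import resource  # Unix-only; not available on Windows
import subprocess
import sys
from typing import Optional


def has_async_handler(source: str) -> bool:
    """Return True if the source defines a top-level `async def handler(payload)`."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    for node in tree.body:
        if isinstance(node, ast.AsyncFunctionDef) and node.name == "handler":
            # Require exactly one positional parameter named `payload`.
            return [arg.arg for arg in node.args.args] == ["payload"]
    return False


def _cap(kind: int, value: int) -> None:
    """Lower both soft and hard limits to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_with_limits(
    script_path: str,
    cpu_seconds: int = 2,            # assumed default, tune per deployment
    memory_bytes: int = 1 << 30,     # assumed default: 1 GiB of address space
    wall_timeout: float = 10.0,      # assumed default wall-clock budget
) -> Optional[subprocess.CompletedProcess]:
    """Run a script in a child interpreter with resource limits applied.

    Returns None if the wall-clock timeout expires.
    """

    def apply_limits() -> None:
        # Runs in the child process just before the interpreter starts.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    try:
        return subprocess.run(
            [sys.executable, "-I", script_path],  # -I: isolated mode
            preexec_fn=apply_limits,
            capture_output=True,
            text=True,
            timeout=wall_timeout,
        )
    except subprocess.TimeoutExpired:
        return None
```

A file would pass this sketch only if has_async_handler returns True for its source and run_with_limits returns a result with a zero return code. Process-level limits of this kind bound resource usage but do not isolate the filesystem or network, which is why the issue proposes a real sandbox for that part.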
Key Tasks:
- core/config.py: Add UPLOAD_STAGING_DIR and TASK_VALIDATION_QUEUE_NAME settings.
- api/routers/tasks.py (/upload_file endpoint):
  - Save uploaded files to UPLOAD_STAGING_DIR.
  - Push a message to TASK_VALIDATION_QUEUE_NAME.
  - Return a 202 Accepted response, indicating pending validation.
- Validation worker:
  - Consume messages from TASK_VALIDATION_QUEUE_NAME.
  - ([...] handler function presence).
  - Move validated files to worker/tasks or delete/quarantine failed files.
- api/routers/tasks.py (POST /tasks/ endpoint):
  - [...] .py file has been successfully validated and moved to worker/tasks.
- GET /tasks/validation_status/{validation_id} endpoint: Allow users to query the validation status of their uploaded files.
- gVisor or Firecracker: For true hardware-level isolation and enhanced security.
Relevant Documents:
CHANGES.md (for the overall roadmap and context)
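
The flow described under Key Tasks (stage the upload, enqueue a validation message, answer 202, then promote or reject) could be sketched as follows. This is only an illustration under stated assumptions: queue.Queue stands in for the real TASK_VALIDATION_QUEUE_NAME queue, the in-memory statuses dict stands in for whatever backs the validation_status endpoint, and stage_upload, process_next, and the validate callback are hypothetical names, not existing project code.

```python
import queue
import shutil
import uuid
from pathlib import Path
from typing import Callable

# Hypothetical in-memory status store; a real deployment would persist this.
statuses: dict[str, str] = {}


def stage_upload(
    filename: str,
    content: bytes,
    staging_dir: Path,
    validation_queue: queue.Queue,
) -> dict:
    """Save an upload to the staging directory and enqueue it for validation."""
    validation_id = uuid.uuid4().hex
    safe_name = Path(filename).name  # drop any directory components
    staged_path = staging_dir / f"{validation_id}_{safe_name}"
    staged_path.write_bytes(content)
    statuses[validation_id] = "pending"
    validation_queue.put(
        {"validation_id": validation_id, "path": str(staged_path), "filename": safe_name}
    )
    # Mirrors the 202 Accepted response described for /upload_file.
    return {"status_code": 202, "validation_id": validation_id}


def process_next(
    validation_queue: queue.Queue,
    tasks_dir: Path,
    validate: Callable[[Path], bool],
) -> str:
    """Validate one staged file, then promote it to tasks_dir or delete it."""
    message = validation_queue.get_nowait()
    staged_path = Path(message["path"])
    if validate(staged_path):
        # Note: this overwrites an existing task file with the same name.
        shutil.move(str(staged_path), str(tasks_dir / message["filename"]))
        status = "validated"
    else:
        staged_path.unlink()  # a quarantine directory is the alternative
        status = "rejected"
    statuses[message["validation_id"]] = status
    return status
```

Passing the validation step in as a callback keeps the staging and promotion logic independent of how the sandbox is implemented, so a container-based check could later be swapped for gVisor or Firecracker without touching this part.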