Merged
21 commits
c69276a
docs(spec): Phase 2 W3b — HF reverse-proxy design (SEC-02 / INVARIANT 2)
l17728 May 14, 2026
8b5001c
docs(plan): Phase 2 W3b — HF reverse-proxy implementation plan (9 tasks)
l17728 May 14, 2026
0d9b571
feat(config): add hf_proxy_timeout_seconds Setting (W3b)
l17728 May 14, 2026
020d13c
test(config): rename test_config_w3b->test_config, add env-override t…
l17728 May 14, 2026
f8b7bcc
feat(api): HF reverse-proxy endpoint — streaming + token injection (W3b)
l17728 May 14, 2026
5e84028
fix(api): close HF proxy client on send() failure + test cleanup fixt…
l17728 May 14, 2026
d956086
feat(api): HF proxy ownership + assignment_token + epoch fence (W3b)
l17728 May 14, 2026
e31b807
refactor(api): explicit None guard on assignment_token + test comment…
l17728 May 14, 2026
dc378a5
feat(executor): Assignment.assignment_token + ControllerClient.stream…
l17728 May 14, 2026
6c5bc0c
refactor(executor): annotate stream_hf return type + test streaming p…
l17728 May 14, 2026
651175a
feat(executor): HfS3StreamDownloader fetches via controller proxy (W3b)
l17728 May 14, 2026
021ce86
test: make_fake_controller_client forwards X-Assignment-Token + build…
l17728 May 14, 2026
58d9640
feat(executor): DirectOffsetDownloader fetches via controller proxy (…
l17728 May 14, 2026
cdab6fb
test(executor): cover _resolve_size RuntimeError branch + clarify fal…
l17728 May 14, 2026
cad2db1
feat(executor): wire proxy client end-to-end, delete ExecutorSettings…
l17728 May 14, 2026
a2fbf1f
test(executor): assert runner threads assignment_token into Assignmen…
l17728 May 14, 2026
2ec622f
feat(lint): check_no_hf_token_in_executor — lock INVARIANT 2 (W3b)
l17728 May 14, 2026
327e37f
test(lint): document full-text scan behavior + cache _load_lint (W3b)
l17728 May 14, 2026
7586471
test(e2e)+docs: e2e through controller proxy, OpenAPI + operator runb…
l17728 May 14, 2026
8a7bed0
docs(test): note W3b seam migration in e2e module docstring (W3b)
l17728 May 14, 2026
c5ca86c
docs(plan): sync W3b plan with review-driven fixes during execution
l17728 May 14, 2026
56 changes: 56 additions & 0 deletions api/openapi.yaml
@@ -67,7 +67,7 @@
paths:
# ========== Tasks ==========
/tasks:
get:

Check warning on line 70 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: List tasks
operationId: listTasks
@@ -90,7 +90,7 @@
'401': {$ref: '#/components/responses/Unauthenticated'}
'429': {$ref: '#/components/responses/RateLimited'}

post:

Check warning on line 93 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Create download task
operationId: createTask
@@ -191,7 +191,7 @@
parameters:
- $ref: '#/components/parameters/TaskId'

get:

Check warning on line 194 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Get task by ID
operationId: getTask
@@ -202,7 +202,7 @@
application/json:
schema: {$ref: '#/components/schemas/DownloadTask'}

patch:

Check warning on line 205 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Update task (priority only)
operationId: updateTask
@@ -224,7 +224,7 @@
/tasks/{taskId}/cancel:
parameters:
- $ref: '#/components/parameters/TaskId'
post:

Check warning on line 227 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Cancel task (async)
operationId: cancelTask
@@ -250,7 +250,7 @@
/tasks/{taskId}/retry:
parameters:
- $ref: '#/components/parameters/TaskId'
post:

Check warning on line 253 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Retry failed subtasks
operationId: retrySubtasks
@@ -296,7 +296,7 @@
/tasks/{taskId}/upgrade:
parameters:
- $ref: '#/components/parameters/TaskId'
post:

Check warning on line 299 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Upgrade to new revision (incremental)
operationId: upgradeTask
@@ -320,7 +320,7 @@
/tasks/{taskId}/subtasks:
parameters:
- $ref: '#/components/parameters/TaskId'
get:

Check warning on line 323 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [subtasks]
summary: List subtasks of a task
operationId: listSubtasks
@@ -343,7 +343,7 @@
/tasks/{taskId}/source-allocation:
parameters:
- $ref: '#/components/parameters/TaskId'
get:

Check warning on line 346 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: View source allocation (multi-source visualization)
operationId: getSourceAllocation
@@ -357,7 +357,7 @@
/tasks/{taskId}/events:
parameters:
- $ref: '#/components/parameters/TaskId'
get:

Check warning on line 360 in api/openapi.yaml (GitHub Actions / OpenAPI lint): operation-description: Operation "description" must be present and non-empty string.
tags: [tasks]
summary: Task event log
operationId: getTaskEvents
@@ -730,6 +730,62 @@
got:
type: integer

  /hf-proxy/subtask/{subtaskId}:
    get:
      tags: [executors]
      summary: HF reverse proxy — stream a subtask's file from HF
      description: >-
        Controller-side reverse proxy. Authenticates the executor (mTLS + JWT),
        verifies subtask ownership (assignment_token + epoch fence), reconstructs
        the HF URL from the subtask row, injects the tenant HF token, and streams
        the bytes back. The executor never holds the HF token (INVARIANT 2).
      operationId: hfProxySubtask
      parameters:
        - name: subtaskId
          in: path
          required: true
          schema:
            type: string
            format: uuid
        - name: X-Assignment-Token
          in: header
          required: true
          schema:
            type: string
            format: uuid
        - name: Range
          in: header
          required: false
          schema:
            type: string
      responses:
        '200':
          description: Full file stream.
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
        '206':
          description: Partial content (Range request).
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
        '401':
          description: Missing or invalid mTLS / executor JWT.
        '403':
          description: NOT_YOUR_SUBTASK — authenticated executor does not own this subtask.
        '404':
          description: Subtask not found.
        '409':
          description: STALE_ASSIGNMENT or EPOCH_MISMATCH (fence-token mismatch).
        '429':
          description: Forwarded from HF — rate limited.
        '503':
          description: Forwarded from HF — upstream unavailable.
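
Reviewer sketch (not part of the diff): the ownership checks named in the `hfProxySubtask` description could map onto the documented 403/404/409 responses roughly as follows. The `SubtaskRow` fields, the `check_proxy_access` helper, the `NOT_FOUND` code, and the order of the checks are all assumptions made for illustration; the PR's actual handler may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass(frozen=True)
class SubtaskRow:
    """Hypothetical view of a subtask row; field names are assumptions."""
    executor_id: str
    assignment_token: Optional[UUID]
    executor_epoch: int


def check_proxy_access(
    row: Optional[SubtaskRow],
    caller_executor_id: str,
    caller_epoch: int,
    presented_token: UUID,
) -> tuple[int, str]:
    """Return (HTTP status, code) decided before any upstream HF fetch."""
    if row is None:
        return 404, "NOT_FOUND"  # hypothetical code; the spec only says "Subtask not found."
    if row.executor_id != caller_executor_id:
        return 403, "NOT_YOUR_SUBTASK"
    # Explicit None guard: an unassigned subtask must never match a token.
    if row.assignment_token is None or row.assignment_token != presented_token:
        return 409, "STALE_ASSIGNMENT"
    # Epoch fence: a request carrying a different epoch is rejected.
    if row.executor_epoch != caller_epoch:
        return 409, "EPOCH_MISMATCH"
    return 200, "OK"
```

Keeping the decision in a pure function like this makes each documented status code testable without a running controller or a live HF endpoint.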

/executors/{executorId}/subtasks/{subtaskId}/complete:
parameters:
- in: path
26 changes: 26 additions & 0 deletions docs/operator/executor-runbook.md
@@ -86,3 +86,29 @@ anyone can forge the header. Default is `0` (direct uvicorn TLS only).

Heartbeats carry an HMAC timestamp validated within ±5 min. Run
`chrony` / `systemd-timesyncd` on all executor + controller hosts.
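
To illustrate why clock sync matters here, the following is a minimal sketch of an HMAC-signed timestamp check with a ±5 min window. The message layout (`executor_id:timestamp`), the SHA-256 digest, and the function names are assumptions for illustration only; the real heartbeat format is not shown in this runbook.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 5 * 60  # the ±5 min window described above


def sign_heartbeat(secret: bytes, executor_id: str, timestamp: int) -> str:
    """Sign a hypothetical 'executor_id:timestamp' message with HMAC-SHA256."""
    message = f"{executor_id}:{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_heartbeat(
    secret: bytes,
    executor_id: str,
    timestamp: int,
    signature: str,
    now: Optional[float] = None,
) -> bool:
    """Reject heartbeats outside the skew window or with a bad signature."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # clock drift (or a replayed heartbeat) beyond ±5 min
    expected = sign_heartbeat(secret, executor_id, timestamp)
    # Constant-time comparison avoids leaking signature prefixes via timing.
    return hmac.compare_digest(expected, signature)
```

Under this scheme a host whose clock drifts by more than five minutes produces heartbeats that fail validation even though the signature itself is correct.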

## W3b — HF access via the controller reverse proxy

As of Phase 2 W3b, executors no longer talk to huggingface.co directly. All
HF file downloads flow through the controller's reverse proxy
(`GET /api/v1/hf-proxy/subtask/{id}`), which injects the tenant HF token
server-side (INVARIANT 2 — the token never leaves the controller).

**Removed executor environment variables** (delete them from `.env.executor`
and any deployment manifests — they are now ignored):

- `DLW_EXECUTOR_HF_TOKEN`
- `DLW_EXECUTOR_HF_ENDPOINT`
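
A quick way to find leftovers is to scan each env file for the two removed names. The helper below is a sketch, not a shipped tool: `find_removed_vars` is a hypothetical name, and it assumes simple `KEY=value` lines with optional `export` prefixes and `#` comments.

```python
REMOVED_EXECUTOR_VARS = ("DLW_EXECUTOR_HF_TOKEN", "DLW_EXECUTOR_HF_ENDPOINT")


def find_removed_vars(env_text: str) -> list[str]:
    """Return the removed variable names still set in a dotenv-style text."""
    found: list[str] = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key = line.split("=", 1)[0].strip()
        if key in REMOVED_EXECUTOR_VARS and key not in found:
            found.append(key)
    return found
```

For example, `find_removed_vars(open(".env.executor").read())` returns an empty list once the file is clean.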

**Controller environment variables:**

- `DLW_HF_TOKEN` — the tenant HF token (already used by the controller for
repo-metadata enumeration; W3b also uses it for the download proxy).
- `DLW_HF_ENDPOINT` — defaults to `https://huggingface.co`.
- `DLW_HF_PROXY_TIMEOUT_SECONDS` — per-request timeout for the proxy's HF
fetch (default 300, range 10–3600).
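
The defaults and range above can be sketched as a small loader. This is an illustration using only the standard library; `load_hf_proxy_settings` is a hypothetical name, and rejecting an out-of-range timeout (rather than clamping it) is an assumption, as is treating the token as optional at load time.

```python
from typing import Mapping, Optional

DEFAULT_HF_ENDPOINT = "https://huggingface.co"
DEFAULT_TIMEOUT_SECONDS = 300
MIN_TIMEOUT_SECONDS, MAX_TIMEOUT_SECONDS = 10, 3600


def load_hf_proxy_settings(env: Mapping[str, str]) -> dict:
    """Read the controller's HF proxy settings from an environment mapping."""
    token: Optional[str] = env.get("DLW_HF_TOKEN")
    endpoint = env.get("DLW_HF_ENDPOINT", DEFAULT_HF_ENDPOINT)
    raw_timeout = env.get("DLW_HF_PROXY_TIMEOUT_SECONDS")
    timeout = DEFAULT_TIMEOUT_SECONDS if raw_timeout is None else int(raw_timeout)
    # Assumption: values outside 10-3600 are rejected at startup, not clamped.
    if not MIN_TIMEOUT_SECONDS <= timeout <= MAX_TIMEOUT_SECONDS:
        raise ValueError(
            f"DLW_HF_PROXY_TIMEOUT_SECONDS must be within "
            f"{MIN_TIMEOUT_SECONDS}-{MAX_TIMEOUT_SECONDS}, got {timeout}"
        )
    return {"token": token, "endpoint": endpoint, "timeout_seconds": timeout}
```

Calling `load_hf_proxy_settings(os.environ)` at startup would surface a bad timeout before the first proxied download rather than during it.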

**Operational tradeoff:** download bandwidth now flows through the controller
rather than executor→HF directly. For the internal beta this is acceptable;
global rate-limit coordination and an executor-local credential pool are
Phase 3 items.