feat: add list_docs/update_docs/copy_docs methods on NexsetsResource#17
Conversation
Wraps GET/POST `/data_sets/{id}/docs` so callers can read, replace, and
copy the rich documentation (DocContainer markdown bodies) attached to
nexsets. POST has replace-all semantics for the nexset's docs.
Notable choices caught during testing:
- DocContainerInput sets `extra="ignore"` so a full DocContainer response
can be round-tripped through it; the SDK BaseModel default is
`extra="allow"` which would otherwise leak server-owned fields back to
the API in copy_docs.
- DocContainerInput excludes `public` and `tags`. The server rejects
`public` ("Input cannot include public attribute"); `tags` is on the
response but unconfirmed as writable, so kept out conservatively. Can
be added later if/when confirmed.
Also expands the previously stub DocContainer response model to its
real shape (id, owner, org, doc_type, public, repo_type, repo_config,
text, access_roles, tags, copied_from_id, created_at, updated_at).
Tests: 6 new unit tests in test_nexsets.py covering list/update/copy
including dict-input pass-through (for MCP callers) and copy_docs's
empty-source no-op + field-stripping behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
saksham-nexla
left a comment
There was a problem hiding this comment.
Review Summary
This is a clean, well-scoped addition. The core design choices are sound and the implementation is consistent with existing SDK patterns.
What's done well:
DocContainerInputwithextra="ignore"is exactly the right mechanism for thecopy_docsround-trip strip — the PR description even calls out the unit test that caught the issue before it could reach a live POST, which is the right way to develop this kind of model.- The
update_docsdual-input type (Union[DocContainerInput, Dict]) is a practical concession for MCP/pass-through callers and is tested explicitly. - Test coverage is thorough: happy paths, dict pass-through, field-strip assertions in
copy_docs, and the empty-source no-op are all covered. DocContainerresponse model expansion is complete and well-typed (datetime,Optional,Field(default_factory=list)for list fields).- The
_serialize_datareuse inupdate_docskeeps serialisation consistent with the rest ofBaseResource.
Concerns (details in inline comments):
-
[minor]
copy_docscan raise an unhandledValidationErrorwhen a source doc hastext=None—model_dump(exclude_none=True)silently dropstext, which then failsDocContainerInput's requiredtext: strfield. Droppingexclude_none=True(field stripping still works viaextra="ignore") or pre-filteringNone-text docs would fix this. -
[nit]
Returns:in theupdate_docsdocstring is mis-indented inside theArgs:block — Sphinx will render it as part of thedocsparameter description. -
[nit]
DocContainerInputdocstring doesn't mention thatrepo_type/repo_configare also excluded (onlypublic/tagshave documented rationale). Minor maintenance debt. -
[nit] The silent
return []on non-list API responses inlist_docs/update_docsis consistent with the rest of the SDK, but worth tracking as a broader improvement area.
None of these are blockers — the text=None case is an uncommon API edge case, and the rest are documentation nits. Approving; the minor can be addressed in a follow-up or as part of this PR if the author prefers.
Generated by Claude Code
| Returns: | ||
| The new list of ``DocContainer`` entries as parsed from the | ||
| server response. |
There was a problem hiding this comment.
[nit] Returns: section is mis-indented inside Args:
Issue: The Returns: block (lines 217-219) is indented at 12 spaces — the same level as individual parameter descriptions — rather than 8 spaces (same level as Args:). Sphinx/autodoc and most docstring parsers treat it as a continuation of the docs parameter description rather than a top-level section, so rendered API docs will look broken.
Suggestion: Dedent Returns: and its body by one level:
Args:
set_id: Nexset ID.
docs: ...
Returns:
The new list of ``DocContainer`` entries as parsed from the
server response.Generated by Claude Code
| return [] | ||
| payload = [ | ||
| DocContainerInput.model_validate( | ||
| doc.model_dump(exclude_none=True) |
There was a problem hiding this comment.
[minor] copy_docs raises ValidationError when a source doc has text=None
Issue: DocContainer.text is Optional[str], so the API can return null for text. When that happens, model_dump(exclude_none=True) omits text from the dict entirely. DocContainerInput.model_validate(...) then raises a ValidationError ("field required") because DocContainerInput.text: str has no default. The caller gets an unhandled exception mid-copy with no indication of which doc caused it.
Suggestion: Drop exclude_none=True from the model_dump call here. With DocContainerInput's extra="ignore" policy, server-owned fields are still stripped regardless. text=None will then cause a type-validation failure (str doesn't accept None), which is arguably correct — but at least the error message will be clearer. Alternatively, pre-filter docs with None text and document the behaviour:
valid_docs = [doc for doc in source_docs if doc.text is not None]
if not valid_docs:
return []
payload = [
DocContainerInput.model_validate(doc.model_dump())
for doc in valid_docs
]Generated by Claude Code
| ``public``, ``tags``, ``created_at`` etc.) are silently ignored on | ||
| construction so a full ``DocContainer`` response can be round-tripped | ||
| through this model to drop fields the API does not accept on input. | ||
| """ |
There was a problem hiding this comment.
[nit] Docstring doesn't explain why repo_type / repo_config are excluded
Issue: The class docstring lists id, owner, org, public, tags, and created_at as intentionally excluded, but doesn't mention repo_type or repo_config. The PR description documents the reasoning for public and tags specifically; a future maintainer seeing a 400 from a POST with repo_type included will have no breadcrumb here.
Suggestion: Add a brief note (matching the existing PR description rationale) about repo_type / repo_config:
Server-owned and read-only fields (``id``, ``owner``, ``org``,
``public``, ``tags``, ``created_at`` etc.) are silently ignored on
construction so a full ``DocContainer`` response can be round-tripped
through this model to drop fields the API does not accept on input.
``repo_type`` and ``repo_config`` are also excluded pending confirmation
that the API accepts them on write.Generated by Claude Code
| response = self._make_request("GET", path, params=params) | ||
| if isinstance(response, list): | ||
| return [DocContainer.model_validate(item) for item in response] | ||
| return [] |
There was a problem hiding this comment.
[nit] Silent return [] on non-list response (same applies to update_docs at line 242)
Issue: If the API ever returns a non-list shape (e.g. a dict or null) for these endpoints — say, during a server-side change or transient error that slips past HTTP-level checking — both list_docs and update_docs silently return []. The caller receives an empty result and has no way to distinguish "no docs" from "unexpected response".
This pattern is consistent with the rest of the SDK (get_accessors, etc.), so it's not blocking. Worth tracking as a broader SDK concern: consider adding a warning log or raising NexlaError when isinstance(response, list) is unexpectedly False.
Generated by Claude Code
Summary
Wraps
GET/POST /data_sets/{id}/docsso callers can read, replace, and copy the rich documentation (DocContainermarkdown bodies) attached to nexsets. POST has replace-all semantics.This unblocks the Express MCP server in
veda-ai, which previously couldn't read or write the structuredtextdoc bodies — only the single-linedescriptionon the nexset itself. A companion PR inveda-aiexposes these as MCP tools (list_nexla_nexset_docs,update_nexla_nexset_docs,copy_nexla_nexset_docs).Changes
NexsetsResource(nexla_sdk/resources/nexsets.py):list_docs(set_id, expand=True)—GET /data_sets/{id}/docs?expand=1update_docs(set_id, docs)—POSTwith body{"docs": [...]}, replace-all semantics. AcceptsList[DocContainerInput]orList[Dict]to support typed callers and pass-through callers (e.g. MCP).copy_docs(src_id, dst_id)— convenience: list from src, strip server-owned fields via round-trip throughDocContainerInput, post to dst. No-op if source is empty.DocContainerresponse model from a{id, name}stub to the full shape (id, owner, org, doc_type, public, repo_type, repo_config, text, access_roles, tags, copied_from_id, created_at, updated_at).DocContainerInputrequest model (writable subset).copy_docswith field-strip assertions, and the empty-source no-op.docs-site/docs/guides/coverage.mdx).Notable choices caught during testing
DocContainerInputsetsextra="ignore". The SDK BaseModel default isextra="allow", which would have causedcopy_docs's response→request round-trip to silently retain server-owned fields (id,owner,org, etc.) and send them back in the POST. The unit test caught it before it leaked to a real POST.DocContainerInputexcludespublicandtags. A live POST withpublicreturns400 "Input cannot include public attribute".tagsis on the response but unconfirmed as writable, kept out conservatively. Easy to add later.Test plan
pytest tests/unit/test_nexsets.py -v— 23/23 pass (6 new + 17 pre-existing)pytest tests/unit -q— 288/288 pass, no regressionsdataops.nexla.io:c.nexsets.copy_docs(src_id=..., dst_id=...)— succeeded; destination doc verified via subsequentlist_docslist_nexla_nexset_docs+update_nexla_nexset_docsand wrote the transformed docs🤖 Generated with Claude Code