Skip to content

feat: add list_docs/update_docs/copy_docs methods on NexsetsResource#17

Merged
tariqalagha-nexla merged 1 commit intonexla-opensource:mainfrom
tariqalagha-nexla:feat/nexset-doc-methods
May 9, 2026
Merged

feat: add list_docs/update_docs/copy_docs methods on NexsetsResource#17
tariqalagha-nexla merged 1 commit intonexla-opensource:mainfrom
tariqalagha-nexla:feat/nexset-doc-methods

Conversation

@tariqalagha-nexla
Copy link
Copy Markdown
Collaborator

Summary

Wraps GET/POST /data_sets/{id}/docs so callers can read, replace, and copy the rich documentation (DocContainer markdown bodies) attached to nexsets. POST has replace-all semantics.

This unblocks the Express MCP server in veda-ai, which previously couldn't read or write the structured text doc bodies — only the single-line description on the nexset itself. A companion PR in veda-ai exposes these as MCP tools (list_nexla_nexset_docs, update_nexla_nexset_docs, copy_nexla_nexset_docs).

Changes

  • New methods on NexsetsResource (nexla_sdk/resources/nexsets.py):
    • list_docs(set_id, expand=True)GET /data_sets/{id}/docs?expand=1
    • update_docs(set_id, docs)POST with body {"docs": [...]}, replace-all semantics. Accepts List[DocContainerInput] or List[Dict] to support typed callers and pass-through callers (e.g. MCP).
    • copy_docs(src_id, dst_id) — convenience: list from src, strip server-owned fields via round-trip through DocContainerInput, post to dst. No-op if source is empty.
  • Expanded DocContainer response model from a {id, name} stub to the full shape (id, owner, org, doc_type, public, repo_type, repo_config, text, access_roles, tags, copied_from_id, created_at, updated_at).
  • New DocContainerInput request model (writable subset).
  • 6 new unit tests covering happy paths, dict-input pass-through, multi-call copy_docs with field-strip assertions, and the empty-source no-op.
  • Coverage doc updated (docs-site/docs/guides/coverage.mdx).

Notable choices caught during testing

  • DocContainerInput sets extra="ignore". The SDK BaseModel default is extra="allow", which would have caused copy_docs's response→request round-trip to silently retain server-owned fields (id, owner, org, etc.) and send them back in the POST. The unit test caught it before it leaked to a real POST.
  • DocContainerInput excludes public and tags. A live POST with public returns 400 "Input cannot include public attribute". tags is on the response but unconfirmed as writable, kept out conservatively. Easy to add later.

Test plan

  • pytest tests/unit/test_nexsets.py -v — 23/23 pass (6 new + 17 pre-existing)
  • pytest tests/unit -q — 288/288 pass, no regressions
  • Live REPL against dataops.nexla.io: c.nexsets.copy_docs(src_id=..., dst_id=...) — succeeded; destination doc verified via subsequent list_docs
  • End-to-end via Claude Desktop + Express MCP server: prompted "Copy the docs from nexset A to nexset B with this find/replace", which used list_nexla_nexset_docs + update_nexla_nexset_docs and wrote the transformed docs

🤖 Generated with Claude Code

Wraps GET/POST `/data_sets/{id}/docs` so callers can read, replace, and
copy the rich documentation (DocContainer markdown bodies) attached to
nexsets. POST has replace-all semantics for the nexset's docs.

Notable choices caught during testing:
- DocContainerInput sets `extra="ignore"` so a full DocContainer response
  can be round-tripped through it; the SDK BaseModel default is
  `extra="allow"` which would otherwise leak server-owned fields back to
  the API in copy_docs.
- DocContainerInput excludes `public` and `tags`. The server rejects
  `public` ("Input cannot include public attribute"); `tags` is on the
  response but unconfirmed as writable, so kept out conservatively. Can
  be added later if/when confirmed.

Also expands the previously stub DocContainer response model to its
real shape (id, owner, org, doc_type, public, repo_type, repo_config,
text, access_roles, tags, copied_from_id, created_at, updated_at).

Tests: 6 new unit tests in test_nexsets.py covering list/update/copy
including dict-input pass-through (for MCP callers) and copy_docs's
empty-source no-op + field-stripping behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tariqalagha-nexla tariqalagha-nexla merged commit 89fdc84 into nexla-opensource:main May 9, 2026
7 checks passed
Copy link
Copy Markdown
Member

@saksham-nexla saksham-nexla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This is a clean, well-scoped addition. The core design choices are sound and the implementation is consistent with existing SDK patterns.

What's done well:

  • DocContainerInput with extra="ignore" is exactly the right mechanism for the copy_docs round-trip strip — the PR description even calls out the unit test that caught the issue before it could reach a live POST, which is the right way to develop this kind of model.
  • The update_docs dual-input type (Union[DocContainerInput, Dict]) is a practical concession for MCP/pass-through callers and is tested explicitly.
  • Test coverage is thorough: happy paths, dict pass-through, field-strip assertions in copy_docs, and the empty-source no-op are all covered.
  • DocContainer response model expansion is complete and well-typed (datetime, Optional, Field(default_factory=list) for list fields).
  • The _serialize_data reuse in update_docs keeps serialisation consistent with the rest of BaseResource.

Concerns (details in inline comments):

  1. [minor] copy_docs can raise an unhandled ValidationError when a source doc has text=Nonemodel_dump(exclude_none=True) silently drops text, which then fails DocContainerInput's required text: str field. Dropping exclude_none=True (field stripping still works via extra="ignore") or pre-filtering None-text docs would fix this.

  2. [nit] Returns: in the update_docs docstring is mis-indented inside the Args: block — Sphinx will render it as part of the docs parameter description.

  3. [nit] DocContainerInput docstring doesn't mention that repo_type/repo_config are also excluded (only public/tags have documented rationale). Minor maintenance debt.

  4. [nit] The silent return [] on non-list API responses in list_docs/update_docs is consistent with the rest of the SDK, but worth tracking as a broader improvement area.

None of these are blockers — the text=None case is an uncommon API edge case, and the rest are documentation nits. Approving; the minor can be addressed in a follow-up or as part of this PR if the author prefers.


Generated by Claude Code

Comment on lines +217 to +219
Returns:
The new list of ``DocContainer`` entries as parsed from the
server response.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Returns: section is mis-indented inside Args:

Issue: The Returns: block (lines 217-219) is indented at 12 spaces — the same level as individual parameter descriptions — rather than 8 spaces (same level as Args:). Sphinx/autodoc and most docstring parsers treat it as a continuation of the docs parameter description rather than a top-level section, so rendered API docs will look broken.

Suggestion: Dedent Returns: and its body by one level:

        Args:
            set_id: Nexset ID.
            docs: ...

        Returns:
            The new list of ``DocContainer`` entries as parsed from the
            server response.

Generated by Claude Code

return []
payload = [
DocContainerInput.model_validate(
doc.model_dump(exclude_none=True)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] copy_docs raises ValidationError when a source doc has text=None

Issue: DocContainer.text is Optional[str], so the API can return null for text. When that happens, model_dump(exclude_none=True) omits text from the dict entirely. DocContainerInput.model_validate(...) then raises a ValidationError ("field required") because DocContainerInput.text: str has no default. The caller gets an unhandled exception mid-copy with no indication of which doc caused it.

Suggestion: Drop exclude_none=True from the model_dump call here. With DocContainerInput's extra="ignore" policy, server-owned fields are still stripped regardless. text=None will then cause a type-validation failure (str doesn't accept None), which is arguably correct — but at least the error message will be clearer. Alternatively, pre-filter docs with None text and document the behaviour:

valid_docs = [doc for doc in source_docs if doc.text is not None]
if not valid_docs:
    return []
payload = [
    DocContainerInput.model_validate(doc.model_dump())
    for doc in valid_docs
]

Generated by Claude Code

``public``, ``tags``, ``created_at`` etc.) are silently ignored on
construction so a full ``DocContainer`` response can be round-tripped
through this model to drop fields the API does not accept on input.
"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Docstring doesn't explain why repo_type / repo_config are excluded

Issue: The class docstring lists id, owner, org, public, tags, and created_at as intentionally excluded, but doesn't mention repo_type or repo_config. The PR description documents the reasoning for public and tags specifically; a future maintainer seeing a 400 from a POST with repo_type included will have no breadcrumb here.

Suggestion: Add a brief note (matching the existing PR description rationale) about repo_type / repo_config:

    Server-owned and read-only fields (``id``, ``owner``, ``org``,
    ``public``, ``tags``, ``created_at`` etc.) are silently ignored on
    construction so a full ``DocContainer`` response can be round-tripped
    through this model to drop fields the API does not accept on input.
    ``repo_type`` and ``repo_config`` are also excluded pending confirmation
    that the API accepts them on write.

Generated by Claude Code

response = self._make_request("GET", path, params=params)
if isinstance(response, list):
return [DocContainer.model_validate(item) for item in response]
return []
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Silent return [] on non-list response (same applies to update_docs at line 242)

Issue: If the API ever returns a non-list shape (e.g. a dict or null) for these endpoints — say, during a server-side change or transient error that slips past HTTP-level checking — both list_docs and update_docs silently return []. The caller receives an empty result and has no way to distinguish "no docs" from "unexpected response".

This pattern is consistent with the rest of the SDK (get_accessors, etc.), so it's not blocking. Worth tracking as a broader SDK concern: consider adding a warning log or raising NexlaError when isinstance(response, list) is unexpectedly False.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants