docs: update dedup docs for WorkflowRunResult (PR #1275) #1654
lbliii merged 5 commits into NVIDIA-NeMo:26.04-staging from
Greptile Summary

This PR updates the v26.04 Fern documentation to reflect the new `WorkflowRunResult` return type. One minor inconsistency remains, noted below.

Confidence Score: 5/5

Safe to merge; all findings are P2 documentation improvements with no impact on runtime behaviour. All prior P1 concerns (`WorkflowBase` inheritance claim, missing `id_generator_path`, incomplete semdedup metadata comments) have been resolved. The one remaining inline comment (incomplete `TextSemanticDeduplicationWorkflow` metadata list in `index.mdx`) is a minor P2 inconsistency between the overview page and the more detailed `semdedup.mdx` page.

`fern/versions/v26.04/pages/curate-text/process-data/deduplication/index.mdx`: the `TextSemanticDeduplicationWorkflow` metadata comment at line 118 is incomplete relative to `semdedup.mdx` and the source code.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["workflow.run()"] --> B["WorkflowRunResult"]
B --> C["result.metadata dict"]
B --> D["result.pipeline_tasks dict"]
C --> E1["ExactDeduplicationWorkflow\ntotal_time, num_duplicates,\nidentification_time, id_generator_path"]
C --> E2["FuzzyDeduplicationWorkflow\ntotal_time, num_duplicates, minhash_time,\nlsh_time, connected_components_pipeline_time,\nid_generator_path"]
C --> E3["TextSemanticDeduplicationWorkflow\ntotal_time, num_duplicates, num_duplicates_removed,\nembedding_time, identification_time,\nremoval_time, final_output_path"]
C --> E4["SemanticDeduplicationWorkflow\ntotal_time, num_duplicates,\nkmeans_time, pairwise_time"]
C --> E5["TextDuplicatesRemovalWorkflow\ntotal_time, num_duplicates_removed"]
```
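The flowchart doubles as a checklist for the kind of drift Greptile flagged (a docs page listing fewer metadata keys than another). A minimal sketch, assuming the per-workflow key sets exactly as drawn in the flowchart; they may not be exhaustive (for example, one commit mentions `input_filegroups_time`, which the chart omits). `DOCUMENTED_METADATA_KEYS` and `missing_metadata_keys` are hypothetical helpers, not part of NeMo Curator, and the sample key list below is illustrative rather than the actual contents of `index.mdx`.

```python
# Metadata keys per workflow, transcribed from the flowchart above.
# Hypothetical documentation data only; nothing here is imported from NeMo Curator.
DOCUMENTED_METADATA_KEYS: dict[str, set[str]] = {
    "ExactDeduplicationWorkflow": {
        "total_time", "num_duplicates", "identification_time", "id_generator_path",
    },
    "FuzzyDeduplicationWorkflow": {
        "total_time", "num_duplicates", "minhash_time", "lsh_time",
        "connected_components_pipeline_time", "id_generator_path",
    },
    "TextSemanticDeduplicationWorkflow": {
        "total_time", "num_duplicates", "num_duplicates_removed", "embedding_time",
        "identification_time", "removal_time", "final_output_path",
    },
    "SemanticDeduplicationWorkflow": {
        "total_time", "num_duplicates", "kmeans_time", "pairwise_time",
    },
    "TextDuplicatesRemovalWorkflow": {"total_time", "num_duplicates_removed"},
}


def missing_metadata_keys(workflow_name: str, documented: set[str]) -> set[str]:
    """Return the flowchart keys that a docs page's metadata list leaves out."""
    expected = DOCUMENTED_METADATA_KEYS[workflow_name]
    return expected - documented


# Illustrative (made-up) key list, as an overview page might abbreviate it.
overview_keys = {"total_time", "num_duplicates", "embedding_time"}
print(sorted(missing_metadata_keys("TextSemanticDeduplicationWorkflow", overview_keys)))
# ['final_output_path', 'identification_time', 'num_duplicates_removed', 'removal_time']
```

Using set difference keeps the check order-independent, which suits comment blocks whose keys may be listed in any order.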
Reviews (3): Last reviewed commit: "docs: remove internal WorkflowRunResult/..."
Review thread on `## WorkflowRunResult`:
Do we need this change? @ayushdg / @sarahyurick / @VibhuJawa, this seems too internal.
Yeah, maybe it can be added after the `WorkflowBase` section and just be:
> ## WorkflowRunResult
>
> A dataclass returned by all deduplication workflow `run()` methods. Contains workflow outputs, pipeline task mappings, and metadata such as timing and duplicate counts.
>
> See the API documentation (link) for more details.
without the full list of methods, etc.
praateekmahajan left a comment:
The rest of the PR LGTM, except for that one comment.
Update v26.04 Fern docs to reflect the new WorkflowRunResult return type from all deduplication workflow run() methods. Add API reference docs for WorkflowRunResult and WorkflowBase, update code examples across dedup pages, and replace the 26.02 release notes with a 26.04 skeleton.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Add input_filegroups_time, connected_components_pipeline_time, and the complete TextSemanticDedup keys to the API reference table and inline comments. Note PR NVIDIA-NeMo#1275 provenance on the workflow.py source link.
- Qualify WorkflowBase claim: "most" workflows inherit, not "all" (TextSemanticDeduplicationWorkflow duck-types the interface)
- Add missing id_generator_path to exact/fuzzy comments in index.mdx
- Complete semdedup.mdx metadata comments with identification_time, removal_time, and final_output_path
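Why "most" rather than "all": a class can expose the same `run()` interface without inheriting from the base class, and a nominal `isinstance` check against the base then misses it. A minimal sketch of that distinction; `WorkflowBaseSketch`, `RunsLikeAWorkflow`, and both workflow classes are hypothetical stand-ins, not NeMo Curator's `WorkflowBase` or `TextSemanticDeduplicationWorkflow`. For brevity, `run()` returns a plain dict here instead of a result object, and its values are placeholders.

```python
from typing import Protocol, runtime_checkable


class WorkflowBaseSketch:
    """Hypothetical stand-in for a workflow base class; subclasses override run()."""

    def run(self) -> dict[str, float]:
        raise NotImplementedError


@runtime_checkable
class RunsLikeAWorkflow(Protocol):
    """Structural interface: any object with a run() method matches.

    Note: runtime isinstance() checks only that run exists, not its signature.
    """

    def run(self) -> dict[str, float]: ...


class InheritingWorkflow(WorkflowBaseSketch):
    """Hypothetical workflow that subclasses the base class."""

    def run(self) -> dict[str, float]:
        return {"total_time": 1.0}  # placeholder value


class DuckTypedWorkflow:
    """Hypothetical workflow with the same run() method but no base class."""

    def run(self) -> dict[str, float]:
        return {"total_time": 2.0}  # placeholder value


workflows = [InheritingWorkflow(), DuckTypedWorkflow()]

# A nominal (inheritance) check misses the duck-typed workflow...
print([isinstance(w, WorkflowBaseSketch) for w in workflows])  # [True, False]
# ...while a structural (Protocol) check accepts both.
print([isinstance(w, RunsLikeAWorkflow) for w in workflows])  # [True, True]
```

Callers that only need `run()` can type against the structural interface and treat both kinds of workflow uniformly.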
Per praateekmahajan's review feedback, these sections are too internal for public docs. Also removes a dead link in the dedup index page.
Description
Updates v26.04 Fern documentation to reflect the new `WorkflowRunResult` return type introduced in #1275. All deduplication workflow `run()` code examples now capture the return value and show available metadata keys. Adds API reference documentation for `WorkflowRunResult` and `WorkflowBase` to the pipeline reference page. Replaces the 26.02 release notes with a 26.04 skeleton including the Workflow Results API entry and breaking changes.

Usage
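A minimal, self-contained sketch of the "capture the return value" pattern the updated examples follow. Everything here is a hypothetical stub: `StubRunResult` only mirrors the two dict fields drawn in the flowchart (`pipeline_tasks`, `metadata`), `StubExactDedupWorkflow` is not NeMo Curator's `ExactDeduplicationWorkflow`, and its metadata values (including the `id_generator_path` string) are placeholders, not real output. `timing_keys` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubRunResult:
    """Hypothetical stand-in shaped like the flowchart: two dict fields."""

    pipeline_tasks: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


class StubExactDedupWorkflow:
    """Hypothetical workflow stub; metadata keys follow the flowchart, values are placeholders."""

    def run(self) -> StubRunResult:
        return StubRunResult(metadata={
            "total_time": 0.0,
            "num_duplicates": 0,
            "identification_time": 0.0,
            "id_generator_path": "placeholder/ids.json",
        })


def timing_keys(metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep only the *_time entries, e.g. for logging a timing breakdown."""
    return {k: v for k, v in metadata.items() if k.endswith("_time")}


# Capture the return value, then read metadata keys from it.
result = StubExactDedupWorkflow().run()
print(result.metadata["num_duplicates"])  # 0
print(sorted(timing_keys(result.metadata)))  # ['identification_time', 'total_time']
```

Filtering on the `_time` suffix avoids hard-coding per-workflow key lists, since the flowchart shows each workflow reporting a different set of timings.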
Checklist
Run the following to install Fern:
Run the following to generate a local preview: