[Feature] Implement Missing Workspace API Methods and Rebuild CLI for Argilla V2 #57
…aining, users, and workspaces

- Migrate CLI command structure from Argilla v1 to a new modular format.
- Implement dataset management commands including create, list, delete, and push-to-huggingface.
- Implement user management commands including create, list, and delete.
- Implement workspace management commands including create, list, add-user, and delete-user.
- Update main entry points for each command module to utilize Typer for command-line interface.
- Enhance error handling and user feedback with rich console output.
- Update tasks documentation to reflect completed command migrations and new functionalities.

- Replace mock workspace validation with actual workspace retrieval.
- Integrate client initialization for schema operations.
- Add methods to list, get, delete, and upload schemas via API.
- Enhance error handling and user feedback for schema operations.
- Update upload command to handle JSON schema files and provide detailed output on uploaded schemas.

- Added a new task for replacing mock implementations with real API calls.
- Reorganized priorities for live server testing and error handling.
- Expanded documentation for command completion, aliases, and developer guidelines.
- Documented recent progress on server setup, authentication issues, and CLI command testing.
…Workspace classes
…oad, download, list, and delete functionalities
…ment file and schema management tests
…e CLI and Workspace API documentation
Pull Request Overview
This PR implements the missing workspace API methods and rebuilds the CLI for Argilla V2, restoring critical functionality from the previous version. Key changes include:
- Adding new CLI commands for document and dataset management.
- Implementing file, document, and schema operations along with comprehensive error handling.
- Updating documentation, Docker configurations, and contribution guidelines.
Reviewed Changes
Copilot reviewed 74 out of 76 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| argilla/src/argilla/cli/documents/add.py | Implements the add document CLI command with progress UI |
| argilla/src/argilla/cli/documents/main.py | Registers document commands |
| argilla/src/argilla/cli/datasets/main.py | Adds and updates dataset CLI commands and error messages |
| argilla/src/argilla/cli/callback.py | Provides client initialization and callback functions |
| argilla/src/argilla/cli/app.py | Consolidates command modules into the CLI application |
| argilla/src/argilla/_models/_files.py | Implements file operation models |
| argilla/src/argilla/_models/_documents.py | Implements document operation models |
| argilla/pyproject.toml | Updates dependencies and CLI entry script |
| Documentation and Docker/Server config files | Update and add CLI, API, and Docker documentation |
Files not reviewed (2)
- .env.test: Language not supported
- argilla/.gitignore: Language not supported
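The file summary above describes `cli/app.py` as consolidating per-resource command modules into one application. The PR does this with Typer; the sketch below swaps in the standard library's `argparse` purely to stay dependency-free, and only illustrates the nested command shape (`extralit <group> <action> ... --workspace <name>`). The function name `build_parser` and the sample arguments are assumptions, not code from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level CLI: one sub-parser per command group."""
    parser = argparse.ArgumentParser(prog="extralit")
    groups = parser.add_subparsers(dest="group", required=True)

    # In the PR each group lives in its own module; here a single
    # group is registered inline to keep the sketch short.
    schemas = groups.add_parser("schemas", help="Manage workspace schemas")
    actions = schemas.add_subparsers(dest="action", required=True)

    list_cmd = actions.add_parser("list", help="List schemas in a workspace")
    list_cmd.add_argument("--workspace", required=True)

    upload_cmd = actions.add_parser("upload", help="Upload schemas from a directory")
    upload_cmd.add_argument("directory")
    upload_cmd.add_argument("--workspace", required=True)

    return parser


# Hypothetical invocation: extralit schemas upload ./schemas --workspace demo
args = build_parser().parse_args(
    ["schemas", "upload", "./schemas", "--workspace", "demo"]
)
print(args.group, args.action, args.directory, args.workspace)
```

Marking `--workspace` as required mirrors the later commit that enforces the workspace parameter on dataset commands; whether every command should require it is a design decision for the maintainers.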
Amazing work @Ashutoshx7 and @priyankeshh! It's great that you addressed the API, client SDK, and CLI functionality. I will be testing it and making minor adjustments as needed.

When I tried to connect it to the demo HF Spaces instance, it returned a message saying it had logged in successfully, but it didn't actually connect to the server. It's supposed to return an error if the api-url or the api-key is incorrect, but it only gives a warning. So most other functions didn't use the built-in authentication to connect to the server API, for example,

It seems like the issue is coming from the implementation of https://github.com/extralit/extralit/blob/cd017f10936e43ff79fbca2d6c7ba9175afeeafd/argilla/src/argilla/client/__init__.py#L49
Ideally, the CLI should be using the same Argilla client class defined in
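The expected login behaviour described in this comment (fail with an error, not a warning, when the api-url or api-key is wrong) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `AuthenticationError`, `login`, `fake_fetch_current_user`, the URL, and the keys are all hypothetical names standing in for the real client call.

```python
class AuthenticationError(Exception):
    """Raised when the server cannot verify the given URL or API key."""


def login(api_url: str, api_key: str, fetch_current_user) -> dict:
    """Report success only after the server has confirmed the credentials.

    `fetch_current_user` is a hypothetical callable standing in for the
    real client request; it is expected to raise on a bad URL or key.
    """
    try:
        return fetch_current_user(api_url, api_key)
    except Exception as exc:
        # Escalate to a hard error instead of printing a warning.
        raise AuthenticationError(
            f"Could not authenticate against {api_url}: {exc}"
        ) from exc


def fake_fetch_current_user(api_url: str, api_key: str) -> dict:
    """Stand-in for a server that accepts exactly one key."""
    if api_key != "valid-key":
        raise PermissionError("401 Unauthorized")
    return {"username": "demo-user"}


user = login("http://localhost:6900", "valid-key", fake_fetch_current_user)
print(f"Logged in as {user['username']}")

try:
    login("http://localhost:6900", "wrong-key", fake_fetch_current_user)
except AuthenticationError as exc:
    print(f"Login failed: {exc}")
```

Routing every CLI command through one verified client, rather than each command building its own connection, is what keeps the "logged in successfully" message trustworthy.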
- Moved the Argilla client implementation to a new core module for better separation of concerns.
- Created a resources module to manage collections of Argilla resources (Users, Workspaces, Datasets, Webhooks).
- Updated the main client initialization to use the new structure and maintain backward compatibility.
- Improved documentation and comments for clarity.
- Removed circular import issues by restructuring imports.
@priyankeshh @Ashutoshx7 Thanks so much for all your hard work! It seems to work when I tested it. I found a few issues we still need to address before merging:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers, so no circular import at runtime.
    from extralit.extraction.models import SchemaStructure


def upload_schemas(schemas: "SchemaStructure"): ...
```

Also, there are additional minor issues and chores that can be addressed once the problems above are fixed; listing them here for now:

I can get to these work items by the end of this week, unless someone in @extralit/gsoc-contributors wants to get their feet wet in the codebase (it's a good way to learn the API).
Hello @JonnyTran,
fix: `extralit whoami` CLI
- Changed license header from Argilla, Inc. to Extralit Labs, Inc.
- Removed outdated comments in various CLI modules to improve code clarity and maintainability.

- Removed deprecated DatasetType enum.
- Updated import statements for circular imports and type checking.

- Updated the list_datasets command to enforce workspace parameter as required and improved dataset display formatting.
- Ensured workspace parameter is required in the create_dataset command.
- Cleaned up comments and improved error messages in the extraction module.

refactor: simplify dataset and user listing with print_rich_table function
chore: disable temporary tests for datasets and extraction commands
feat: add precommit hooks for removing unused imports and variables in argilla and argilla-server
❌ 3 Tests Failed:
- Removed unnecessary docstrings from several CLI files for clarity.
- Updated the `list_files` and `list_documents` commands to utilize the `print_rich_table` function for improved output formatting.
- Refactored the command registration process in the CLI app for better organization.
- Ensured consistent handling of optional fields in table displays.

- Improved output formatting in CLI commands using `print_rich_table`.
- Updated workspace methods to use type hints for better clarity and consistency.
- Refactored schema-related commands to improve error handling and user feedback.
- Replaced deprecated methods with updated ones for listing and managing schemas.
- Removed unused schema command files to streamline the codebase.

- Removed unnecessary docstrings and comments from various CLI test files for improved clarity.
- Disabled several tests temporarily to streamline the testing process while addressing ongoing issues.
Pull Request Overview
This PR implements the missing workspace API methods and rebuilds the CLI for Argilla V2, addressing major migration gaps from V1. The changes include new models for file and document operations, updated schema management (including migration of API methods), and extensive additions to documentation and CLI command usage.
Reviewed Changes
Copilot reviewed 92 out of 92 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| argilla/src/argilla/_models/_files.py | New models for workspace file metadata and responses. |
| argilla/src/argilla/_models/_documents.py | New models for workspace document management. |
| argilla/pyproject.toml | Dependency and script updates (e.g., adding Typer dependency). |
| argilla/docs/workspace_api.md | Comprehensive documentation for workspace API methods. |
| argilla/docs/reference/argilla/client.md | Updated client reference paths for API consistency. |
| argilla/docs/cli_commands.md | New CLI command documentation for file, document, and schema operations. |
| argilla-v2-server/docker-compose.yaml | New configuration for running the V2 server locally. |
| argilla-v2-server/README.md | Instructions for starting and using the local V2 server. |
| argilla-v1/src/extralit/extraction/models/schema.py | Schema management enhancements and API migration changes. |
| argilla-v1/src/argilla_v1/client/workspaces.py | Updated import and usage of constants for schema paths. |
| argilla-v1/src/argilla_v1/cli/schemas/list.py | Adjusted imports to align with new schema structure. |
| argilla-v1/src/argilla_v1/cli/schemas/delete.py | Updated imports for schema deletion functionality. |
| argilla-server/LICENSE_HEADER | Updated license header to reflect Extralit Labs, Inc. |
| CLI_README.md | New CLI usage documentation. |
| CLI_CONTRIBUTING.md | Guidelines for contributing to the CLI. |
| API_DIFFERENCES.md | Document outlining API differences between Argilla v1 and v2. |
| .pre-commit-config.yaml | Added autoflake hooks for cleanup in argilla and server directories. |
| .env.test | New test environment variables for extraction commands. |
- Deleted the `API_DIFFERENCES.md` file as its content has been integrated into the migration guide and other documentation.
- Updated migration guide to include key API differences between Argilla v1 and v2, enhancing clarity for users transitioning to the new version.
- Improved documentation for workspace and schema management in the CLI and Python API.
…devcontainers or in sub-folders
JonnyTran left a comment
Great work everyone! This has been a significant feat: it allows users to use the CLI, fixes the API after the Argilla v2 migration, and keeps us moving forward to build other important features.
I reviewed all the code submitted by @priyankeshh and @Ashutoshx7, refactored it, applied styling standards, and fixed the datasets, users, schemas, and workspaces CLI functions, as well as the client/API code in argilla/.
To get this PR merged quickly and unblock other issues, several tests have been marked as skipped with `@pytest.mark.skip(reason="Test temporarily disabled")`. @ArthrowAbstract @Ashutoshx7 @prakharsingh-74, if you want to revisit these tests, please feel free to open a new PR.
… Argilla V2 (#57)

This PR is a comprehensive overhaul and modernization of the Extralit CLI and Python client, completing the migration of all workspace and schema management features to Argilla V2. The work was a collaborative effort by @priyankeshh, @Ashutoshx7, and @JonnyTran.

### Major Features and Improvements

- **Refactored CLI Structure:** Migrated the CLI command structure from Argilla v1 to a new modular format using Typer, with separate command modules for datasets, training, users, workspaces, schemas, files, and documents.
- **Dataset Management:** Implemented commands for creating, listing, deleting, and pushing datasets to HuggingFace Hub, with improved error handling and output formatting.
- **User and Workspace Management:** Added commands for creating, listing, and deleting users, as well as creating, listing, adding users to, and deleting workspaces. Enhanced help messages and user feedback.
- **Schema Management:** Implemented full CRUD support for workspace schemas, including Pandera-based schema serialization/deserialization, versioning, and sharing across datasets. Added commands for listing, uploading, downloading, and deleting schemas via CLI and Python API.
- **File and Document Management:** Added commands for uploading, downloading, listing, and deleting files and documents in workspaces, with robust error handling and logging.
- **Testing and Documentation:** Added and updated unit and integration tests for all CLI commands and API methods. Expanded and improved documentation for workspace API methods, CLI usage, and migration from Argilla v1 to v2.
- **Developer Experience:** Enhanced error handling and user feedback with rich console output. Updated the CLI script name from `argilla` to `extralit`. Improved code organization, removed circular imports, and updated license headers.
- **Migration and Compatibility:** Restored and modernized all critical workspace and schema operations from Argilla V1, unblocking CLI-based workflows for researchers and users. Integrated key API differences into the migration guide and removed outdated documentation.

### Notable Commits

- Refactor CLI structure and implement command modules for datasets, training, users, and workspaces
- Add CLI schema management commands and corresponding tests
- Implement schema management features in CLI
- Implement file and document management features in WorkspacesAPI and Workspace classes
- Implement file and document management commands in CLI, including upload, download, list, and delete functionalities
- Add integration tests for CLI commands and document operations; implement file and schema management tests
- Refactor Argilla client structure and improve module organization
- Refactor: imports to use centralized DEFAULT_SCHEMA_S3_PATH constant
- Refactor: reorganize client structure and update imports
- Refactor: updated circular imports and `datasets` CLI commands
- Refactor: updated `files` and `documents` CLI modules
- Refactor: enhance `workspace` and `schema` handling in CLI
- Tests: clean up CLI test files and disable temporary tests
- Docs: moved md files from repo root to `argilla/docs/`
- Chore: remove outdated API differences documentation

### Documentation

- Updated and expanded documentation for all workspace API methods and CLI commands.
- Integrated API differences between Argilla v1 and v2 into the migration guide.

### Contributors

- Special thanks to @priyankeshh and @Ashutoshx7 for their collaborative work on this release.

---

**This PR closes #56 and #53.**
@Ashutoshx7 and I worked together on this PR.
This PR completes the migration and implementation of all workspace schema management features in the Argilla V2 CLI and Python client.
It restores and modernizes all critical workspace and schema operations that were present in Argilla V1, including:
This resolves the major regression from the V1 to V2 migration and unblocks researchers and users who rely on CLI-based workflows.
Related Tickets & Documents
Closes #56
Closes #53 (Rebuild Extralit CLI for Argilla V2)
What type of PR is this? (check all applicable)
Steps to QA
- `extralit schemas list --workspace <name>`
- `extralit schemas upload <dir> --workspace <name>`
- `extralit schemas download <dir> --workspace <name>`
- `extralit schemas delete <schema_id> --workspace <name>`
- `docs/workspace_api.md`

Added/updated tests?
Added/updated documentation?
Checklist