[FIX] Import dataset mapping #140
Merged
* added GET "/datasets/compatible"
* Add GetImportCompatibleDatasets use case and integrate into dataset configuration
* Enhance dataset creation workflow with update functionality and new dialog.
  - Added DatasetUpdateDialog component for updating datasets, integrated data source selection, and improved dataset configuration forms.
  - Updated translations for button labels and added validation for compatible datasets.
* latest
* Implement dataset update functionality and improve error handling.
  - Introduced UpdateDatasetUseCase for handling dataset updates.
  - Enhanced DatasetConfigurationForm and DatasetUpdateDialog to support source and target dataset selection.
  - Added error handling and validation for dataset updates in the relevant components.
  - Updated useDatasetConfigurationForm to include the new update method.
* refactor
* Refactor dataset creation components and introduce DatasetCreateDialog.
  - Renamed DatasetConfigurationDialog to DatasetCreateDialog for clarity.
  - Added new DatasetCreateDialog component to handle dataset creation with improved UI and validation.
  - Updated useDatasetConfigurationNameAndWorkspace to remove unused imports.
* refactoring
* Enhance error handling in AxiosErrorHandler and DocumentRepository.
  - Prioritize specific error messages in AxiosErrorHandler based on business logic, detailed messages, and generic HTTP status messages.
  - Update DocumentRepository to include a new error constant for listing documents and adjust error handling accordingly.
  - Modify error detail in documents.py to provide more specific feedback when no documents are found.
* Refactor dataset configuration components to support TypeScript.
  - Updated DatasetConfigurationForm, DatasetConfigurationMetadataSelector, and DatasetCreateDialog to use TypeScript for improved type safety.
  - Enhanced validator functions in DatasetConfigurationForm and DatasetCreateDialog to specify parameter types.
* fix extralit/unit tests
* fix tests
* fix tests
* latest
* fix tests
* fix DatasetMapping
* Refactor DatasetCreation and ImportHistoryDatasetBuilder to replace external_id with source_id and target_id.
  - Updated DatasetCreation to use source_id and target_id for improved clarity in dataset mappings.
  - Modified ImportHistoryDatasetBuilder to align with the new DatasetCreation structure, ensuring proper mapping of source_id and target_id.
* test fixes
* fix tests
* fix tests
* Revert "fix tests"
  This reverts commit 3ecb423.
* fix tests
* fix tests
* Refactor document handling in WorkspacesAPI and update related tests
  - Removed deprecated document methods from WorkspacesAPI.
  - Updated document creation logic to directly use the Document class.
  - Simplified the add_document method in the Workspace class.
  - Cleaned up test cases related to document operations in WorkspacesAPI.
- Added structured dataset mapping support with `DatasetMappingModel` and `DatasetMapping` abstractions.
- Introduced `mapping` field in `DatasetModel` for enhanced serialization/deserialization.
- Implemented incremental dataset import functionality in the frontend with `DatasetUpdateDialog`.
- Refactored backend endpoints for improved document fetching and dataset import processes.
- Fixed various issues related to document handling and import analysis display.
- Updated version numbers across all components to reflect the new release.
This pull request introduces a new, structured dataset mapping model to the Extralit codebase, replacing the previous ad-hoc dictionary-based mapping approach. The changes span both the backend and client libraries, updating API endpoints, models, and serialization logic to support the new `DatasetMappingModel` and `DatasetMapping` abstractions. Additionally, several database index names are corrected for consistency, and the API endpoints for dataset import operations are clarified and renamed.

Key changes:

Dataset Mapping Model Introduction and Integration
- `DatasetMappingModel` and related classes provide a structured, validated way to represent dataset mappings (extralit/src/extralit/_models/_settings/_mapping.py).
- A `DatasetMapping` abstraction in the settings layer offers methods for conversion between models and dictionaries; all relevant code now uses this abstraction instead of raw dictionaries (extralit/src/extralit/settings/_mapping.py, extralit/src/extralit/settings/_resource.py).
- `DatasetModel` now includes a `mapping` field, with proper serialization and deserialization throughout the stack (extralit/src/extralit/_models/_dataset.py).

API and Schema Adjustments
- The dataset schemas carry the `mapping` field and validate it using the new model (extralit-server/src/extralit_server/api/schemas/v1/datasets.py).
- The dataset import endpoints are renamed: `/import-hub` for HuggingFace imports and `/import` for import-history-based imports (extralit-server/src/extralit_server/api/handlers/v1/datasets/datasets.py, extralit-frontend/v1/infrastructure/repositories/DatasetRepository.ts).

Database Index Naming Consistency
- Index names on the `imports` table now use a consistent naming convention, both in Alembic migrations and in the SQLAlchemy model (extralit-server/src/extralit_server/alembic/versions/7d6b33203390_create_import_history_table.py, .kiro/specs/papers-library-importer/design.md).

Miscellaneous Model Improvements
- `workspace_id` in `DocumentModel` is handled to ensure proper string conversion (extralit/src/extralit/_models/_document.py).
- Related changes (extralit/src/extralit/_models/_document.py, extralit/src/extralit/settings/_resource.pyR29).

These changes collectively modernize and standardize how dataset mappings are handled across Extralit, improving validation, maintainability, and clarity throughout the codebase.
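
For readers unfamiliar with the pattern, the move from raw dictionaries to a structured, validated mapping with dictionary round-tripping could look roughly like the sketch below. It uses only the Python standard library. Every name in it (`ExampleFieldMapping`, `ExampleDatasetMapping`, the `source`, `target`, and `fields` keys) is an assumption made for illustration and is not taken from the Extralit source; the actual `DatasetMappingModel` and `DatasetMapping` classes may be shaped quite differently.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ExampleFieldMapping:
    """Hypothetical entry: maps one source column to one target field."""

    source: str
    target: str

    def __post_init__(self) -> None:
        # Reject empty names early instead of failing later during import.
        if not self.source or not self.target:
            raise ValueError("source and target must be non-empty strings")


@dataclass
class ExampleDatasetMapping:
    """Hypothetical structured mapping; not the real Extralit class."""

    fields: List[ExampleFieldMapping] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A target field may be fed by at most one source column.
        targets = [entry.target for entry in self.fields]
        if len(targets) != len(set(targets)):
            raise ValueError("each target may be mapped only once")

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ExampleDatasetMapping":
        # Unknown keys inside an entry raise TypeError, so typos surface here.
        entries = raw.get("fields", [])
        return cls(fields=[ExampleFieldMapping(**entry) for entry in entries])

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into the nested dataclasses and the list.
        return asdict(self)


raw = {"fields": [{"source": "title", "target": "text"}]}
mapping = ExampleDatasetMapping.from_dict(raw)
round_tripped = mapping.to_dict()
```

The benefit over an ad-hoc dictionary is that malformed input fails at construction time, while `to_dict` and `from_dict` keep the wire format a plain dictionary for serialization.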