diff --git a/.github/workflows/argilla-frontend.yml b/.github/workflows/argilla-frontend.yml index 6321c889c..87158318d 100644 --- a/.github/workflows/argilla-frontend.yml +++ b/.github/workflows/argilla-frontend.yml @@ -72,7 +72,6 @@ jobs: - name: Run tests with coverage 🧪 id: run-tests - continue-on-error: true run: | npm run test:coverage diff --git a/.kiro/specs/import-history-sidebar/design.md b/.kiro/specs/import-history-sidebar/design.md new file mode 100644 index 000000000..c6fdd9233 --- /dev/null +++ b/.kiro/specs/import-history-sidebar/design.md @@ -0,0 +1,508 @@ +# Design Document + +## Overview + +The Import History Sidebar feature enhances the home page user experience by replacing the example datasets section with a "Recent Imports" section that displays the 5 most recent ImportHistory records for the selected workspace. Users can click on these records to navigate to a new route (`new/import/{import_id}`) that displays the DatasetConfiguration component populated with ImportHistory data instead of HuggingFace Hub data. + +This feature builds on the existing papers-library-importer functionality and reuses the existing DatasetConfiguration component architecture, adapting it to work with ImportHistory data structures. The design follows the established patterns in the codebase: Vue.js components with TypeScript, domain-driven architecture with use cases and repositories, and consistent styling using the existing design system. + +## Architecture + +### High-Level Flow + +1. **Home Page Load**: User visits home page, Recent Imports section loads automatically for selected workspace +2. **Import History Display**: System fetches and displays 5 most recent ImportHistory records with summary information +3. **Navigation**: User clicks on ImportHistory record, navigates to `new/import/{import_id}` route +4. **Import Configuration Load**: System fetches detailed ImportHistory data and populates DatasetConfiguration component +5. 
**Data Preview**: ImportHistory tabular data displays in preview section instead of HuggingFace iframe +6. **Dataset Configuration**: User configures fields and questions using existing DatasetConfiguration workflow +7. **Dataset Creation**: System creates dataset from ImportHistory data with user's configuration + +### Component Interaction + +```mermaid +graph TD + A[Home Page] --> B[Recent Imports Sidebar] + B --> C[ImportHistory API] + C --> D[ImportHistory List Display] + D --> E[User Clicks Import Record] + E --> F[Navigate to /new/import/{import_id}] + F --> G[Import Configuration Page] + G --> H[Fetch ImportHistory Details] + H --> I[ImportHistory Details API] + I --> J[DatasetConfiguration Component] + J --> K[ImportHistory Data Preview] + J --> L[Field Mapping Configuration] + L --> M[Dataset Creation] + M --> N[Dataset Created with ImportHistory Data] +``` + +## Components and Interfaces + +### Backend Components + +#### 1. ImportHistory API (Already Implemented) + +**Existing Endpoints:** +- `GET /api/v1/imports/history` - List import histories for workspace (supports workspace_id parameter) +- `GET /api/v1/imports/history/{history_id}` - Get detailed ImportHistory record with full data + +**Enhancement Needed:** +- Add support for `limit` parameter to existing list endpoint for Recent Imports sidebar + +**Existing Functionality:** +- Returns complete ImportHistory record including full `data` field in detailed view +- Validates user access to the ImportHistory record's workspace +- Provides detailed tabular data for DatasetConfiguration component +- List view includes metadata but excludes data for performance + +#### 2. ImportHistory Context Enhancement (`argilla-server/src/argilla_server/contexts/imports.py`) + +**New Services:** +- `get_import_history_details()` - Retrieve complete ImportHistory record with data +- `get_recent_import_history()` - Get limited list of recent imports for sidebar + +### Frontend Components + +#### 1. 
Home Page Integration (`argilla-frontend/pages/index.vue`) + +**Recent Imports Sidebar Section:** +- Replace example datasets section with Recent Imports component +- Display 5 most recent ImportHistory records for selected workspace +- Show loading states and empty states appropriately +- Integrate with existing workspace selection functionality +- Add "View All Imports" button below recent imports list +- Add "Import Documents" button to open ImportModal + +**Modified Structure:** +```vue + +``` + +#### 2. Recent Imports Component (`argilla-frontend/components/features/import/RecentImports.vue`) + +**Features:** +- Displays 5 most recent ImportHistory records for the workspace +- Shows filename, import date, and summary statistics +- Handles loading and empty states +- Provides click navigation to import configuration +- Responsive design for sidebar display + +**Component Structure:** +```vue + +``` + +#### 3. Recent Import Card Component (`argilla-frontend/components/features/import/RecentImportCard.vue`) + +**Features:** +- Compact card display for individual ImportHistory records +- Shows filename, date, and summary statistics +- Hover states and click interactions +- Consistent styling with existing card components + +**Card Structure:** +```vue + +``` + +#### 4. Import Configuration Page (`argilla-frontend/pages/new/import/_id.vue`) + +**Features:** +- New page route for ImportHistory-based dataset configuration +- Fetches ImportHistory details using route parameter +- Renders DatasetConfiguration component with ImportHistory data +- Handles loading, error, and navigation states +- Provides breadcrumb navigation + +**Page Structure:** +```vue + +``` + +#### 5. 
ImportHistory Dataset Creation Builder (`argilla-frontend/v1/domain/entities/import/ImportHistoryDatasetBuilder.ts`) + +**Purpose:** +- Adapts ImportHistory data structure to DatasetCreation format +- Converts ImportHistory tabular data to dataset fields and records +- Provides field mapping capabilities similar to HuggingFace datasets +- Handles data type inference and validation + +**Class Structure:** +```typescript +export class ImportHistoryDatasetBuilder { + private readonly importHistoryData: ImportHistoryDetailsResponse; + private readonly datasetName: string; + + constructor(importHistoryData: ImportHistoryDetailsResponse) { + this.importHistoryData = importHistoryData; + this.datasetName = this.generateDatasetName(); + } + + build(): DatasetCreation { + const subset = this.createSubsetFromImportHistory(); + return new DatasetCreation( + this.importHistoryData.id, + this.datasetName, + [subset] + ); + } + + private createSubsetFromImportHistory(): Subset { + const features = this.extractFeaturesFromSchema(); + return new Subset("default", { features }); + } + + private extractFeaturesFromSchema(): Record { + // Convert ImportHistory schema fields to DatasetCreation features + } +} +``` + +#### 6. 
Enhanced ImportHistory Use Cases + +**New Get ImportHistory Details Use Case (`argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts`)** +```typescript +export class GetImportHistoryDetailsUseCase { + constructor(private readonly axios: NuxtAxiosInstance) {} + + async execute(importId: string): Promise { + const response = await this.axios.get(`/v1/imports/history/${importId}`); + return response.data; + } +} +``` + +**Enhanced Existing GetImportHistoryUseCase (add recent imports method)** +```typescript +export class GetImportHistoryUseCase { + constructor(private readonly axios: NuxtAxiosInstance) {} + + // Existing method for full history with pagination + async execute(params: ImportHistoryListParams): Promise { + // Existing implementation + } + + // New method for recent imports + async getRecent(workspaceId: string, limit: number = 5): Promise { + const params: ImportHistoryListParams = { + size: limit, + sort_by: 'created_at', + sort_order: 'desc', + filters: { workspace_id: workspaceId } + }; + return await this.execute(params); + } +} +``` + +### API Schemas + +#### Existing API Schemas (Already Implemented) +The `ImportHistoryResponse` schema already exists and supports both list and detailed views: +- List view: includes `id`, `workspace_id`, `user_id`, `filename`, `created_at`, `metadata` (excludes `data`) +- Detailed view: includes all fields including `data` with complete tabular dataframe + +#### Enhanced ImportHistory List Parameters +The existing `GET /api/v1/imports/history` endpoint will be enhanced to support: +- `limit` parameter for Recent Imports sidebar (default: 5) +- Existing `workspace_id`, `page`, `size`, `sort_by`, `sort_order` parameters +- No new request models needed - reuse existing query parameters + +### Data Flow Integration + +#### ImportHistory to DatasetConfiguration Integration + +The ImportHistoryDatasetBuilder handles conversion from ImportHistory data structure to DatasetCreation format, enabling 
seamless integration with the existing DatasetConfiguration component. Key mappings include ImportHistory schema fields to DatasetCreation features, and automatic population of `record.metadata.reference` from ImportHistory data. + +#### DatasetConfiguration Component Enhancement + +The existing DatasetConfiguration component will be enhanced to support ImportHistory data by adding conditional rendering in the preview section. When ImportHistory data is provided, the component will display ImportHistoryDataPreview instead of the HuggingFace Hub iframe, while maintaining all existing functionality for field mapping and dataset creation. + +#### ImportHistory Data Preview Component + +A new component that displays ImportHistory tabular data using BaseSimpleTable with search, filtering, and pagination capabilities. This component replaces the HuggingFace Hub iframe when working with ImportHistory data in the DatasetConfiguration component. + +## Error Handling + +### ImportHistory Loading Errors +- Invalid import ID: Redirect to home with error message +- Access denied: Show authorization error with login option +- Network failures: Provide retry mechanism with exponential backoff +- Data corruption: Display error details and suggest re-import + +### DatasetConfiguration Errors +- Missing ImportHistory data: Show loading state until data arrives +- Invalid data structure: Display validation errors with guidance +- Field mapping conflicts: Highlight problematic mappings with suggestions +- Dataset creation failures: Show detailed error messages with retry options + +### User Experience +- Loading states for all async operations +- Progressive enhancement with skeleton screens +- Graceful degradation for network issues +- Clear error messages with actionable guidance + +## Testing Strategy + +### Unit Tests +- ImportHistory data transformation logic +- DatasetConfiguration component with ImportHistory data +- Recent Imports component display and interactions +- API 
response handling and error scenarios + +### Integration Tests +- End-to-end navigation from home page to import configuration +- ImportHistory API integration with frontend components +- DatasetConfiguration workflow with ImportHistory data +- Error handling across component boundaries + +### Performance Tests +- Large ImportHistory dataset handling +- Recent Imports sidebar loading performance +- DatasetConfiguration rendering with large datasets +- Memory usage with multiple ImportHistory records + +## Implementation Structure + +### File Organization +``` +argilla-frontend/ +├── pages/new/import/ +│ └── _id.vue # Import configuration page +├── components/features/import/ +│ ├── RecentImports.vue # Recent imports sidebar component +│ ├── RecentImportCard.vue # Individual import card +│ ├── ImportHistoryDataPreview.vue # Data preview for configuration +│ └── useRecentImportsViewModel.ts # Recent imports view model +├── v1/domain/ +│ ├── entities/import/ +│ │ ├── ImportHistoryDatasetBuilder.ts # ImportHistory to DatasetCreation adapter +│ │ └── ImportHistoryDetails.ts # ImportHistory details entity +│ └── usecases/ +│ ├── get-import-history-details-use-case.ts +│ └── get-recent-imports-use-case.ts +└── v1/domain/usecases/ + └── get-import-history-details-use-case.ts # New use case for detailed import data +``` + +### Backend File Organization +``` +argilla-server/ +├── src/argilla_server/api/ +│ ├── handlers/v1/ +│ │ └── imports.py # Enhanced with details endpoint +│ └── schemas/v1/ +│ └── imports.py # Enhanced with details response +└── src/argilla_server/contexts/ + └── imports.py # Enhanced with details service +``` + +### Route Configuration +```typescript +// Enhanced ROUTES object +export const ROUTES = { + index: "/", + signIn: "/sign-in", + annotationPage: (datasetId: string) => 
`/dataset/${datasetId}/annotation-mode`, + settings: (id: string) => `/dataset/${id}/settings`, + importDatasetFromHub: (id: string) => `/new/hf/${encodeURIComponent(id)}`, + importConfiguration: (importId: string) => `/new/import/${importId}`, // New route +}; +``` + +### Styling Integration + +**Design System Consistency:** +- Reuse existing SCSS variables and mixins +- Follow established component patterns +- Maintain consistent spacing and typography +- Use existing color schemes and interaction states + +**Recent Imports Sidebar Styling (based on ExampleDatasetCard):** +```scss +.recent-imports { + display: flex; + flex-direction: column; + gap: $base-space; + + &__header { + h3 { + margin: 0 0 $base-space 0; + font-weight: 500; + color: var(--fg-primary); + } + + .subtitle { + margin: 0; + color: var(--fg-secondary); + font-size: 0.9rem; + } + } + + .imports-list { + display: flex; + flex-direction: column; + gap: $base-space * 2; + } +} +``` + +**Import Card Styling (matching ExampleDatasetCard pattern):** +```scss +.import-card { + &.button { + width: 100%; + max-width: 75%; + padding: $base-space * 2; + border: 1px solid var(--bg-opacity-6); + border-radius: $border-radius-m; + background: var(--bg-accent-grey-2); + color: var(--fg-primary); + text-align: left; + + @include media(" Import Configuration > {filename}" + +### Requirement 3 + +**User Story:** As a researcher, I want to see the DatasetConfiguration component populated with my ImportHistory data instead of HuggingFace Hub data, so that I can configure datasets from my imported documents. + +#### Acceptance Criteria + +1. WHEN the DatasetConfiguration component loads with ImportHistory data THEN the system SHALL display the imported tabular data in the preview section instead of the HuggingFace Hub iframe +2. WHEN displaying ImportHistory data THEN the system SHALL show the data in a table format with columns from the original import +3. 
WHEN ImportHistory data contains multiple records THEN the system SHALL display them in a paginated table with sorting and filtering capabilities +4. WHEN the first record exists in ImportHistory data THEN the system SHALL use it to populate the fields section of the DatasetConfiguration +5. WHEN ImportHistory data is being processed THEN the system SHALL display loading indicators in the appropriate sections +6. WHEN ImportHistory data fails to load THEN the system SHALL display error messages and provide options to retry or return to home +7. WHEN displaying ImportHistory data THEN the system SHALL show a preview of the annotation dataset using the first row from ImportHistoryDetailsResponse.data.data +8. WHEN ImportHistory data is loaded THEN the system SHALL allow mapping columns from the ImportHistory data to create record Fields, similar to HuggingFace Dataset import functionality + +### Requirement 4 + +**User Story:** As a researcher, I want the ImportHistory-based DatasetConfiguration to integrate seamlessly with existing dataset creation workflows, so that I can create datasets from my imported data using familiar interfaces. + +#### Acceptance Criteria + +1. WHEN using DatasetConfiguration with ImportHistory data THEN the system SHALL support all existing configuration options (field mapping, question creation, etc.) +2. WHEN I configure fields and questions THEN the system SHALL apply them to the ImportHistory data structure +3. WHEN I save the dataset configuration THEN the system SHALL create a new dataset using the ImportHistory data and my configuration settings +4. WHEN the dataset is created successfully THEN the system SHALL navigate to the dataset view or provide options to continue working with the dataset +5. WHEN configuration changes are made THEN the system SHALL validate them against the ImportHistory data structure +6. WHEN I navigate away from the configuration page THEN the system SHALL prompt to save unsaved changes if any exist +7. 
WHEN Records are created from ImportHistory data THEN the system SHALL populate each record.metadata.reference field with the value from ImportHistoryDetailsResponse.data.data[]["reference"] + +### Requirement 5 + +**User Story:** As a researcher, I want the Recent Imports sidebar to be responsive and performant, so that it doesn't slow down my workflow or interfere with other home page functionality. + +#### Acceptance Criteria + +1. WHEN the Recent Imports section loads THEN the system SHALL fetch data asynchronously without blocking other page functionality +2. WHEN ImportHistory records are updated THEN the system SHALL refresh the Recent Imports list automatically +3. WHEN I interact with Recent Imports THEN the system SHALL provide immediate visual feedback (hover states, loading indicators) +4. WHEN the sidebar is displayed on mobile devices THEN the system SHALL adapt the layout appropriately for smaller screens +5. WHEN there are many ImportHistory records THEN the system SHALL only load the most recent 5 to maintain performance +6. WHEN API requests fail THEN the system SHALL handle errors gracefully and provide retry options + +### Requirement 6 + +**User Story:** As a researcher, I want to access the full import history and import new documents from the Recent Imports sidebar, so that I can manage all my imports from one convenient location. + +#### Acceptance Criteria + +1. WHEN I view the Recent Imports sidebar THEN the system SHALL display a "View All Imports" button below the recent imports list +2. WHEN I click "View All Imports" THEN the system SHALL open the ImportHistoryList modal showing the complete import history for the workspace +3. WHEN I view the Recent Imports sidebar THEN the system SHALL display an "Import Documents" button +4. WHEN I click "Import Documents" THEN the system SHALL open the ImportModal for uploading new documents +5. 
WHEN the ImportHistoryList modal is open THEN the system SHALL support all existing functionality (filtering, pagination, viewing details) +6. WHEN I close the ImportHistoryList modal THEN the system SHALL return to the home page with the Recent Imports sidebar still visible + +### Requirement 7 + +**User Story:** As a researcher, I want proper navigation and routing for the import configuration feature, so that I can bookmark, share, and navigate to specific import configurations easily. + +#### Acceptance Criteria + +1. WHEN I access the route `new/import/{import_id}` THEN the system SHALL validate the import_id parameter and load the corresponding ImportHistory record +2. WHEN the import_id is invalid or doesn't exist THEN the system SHALL redirect to the home page with an appropriate error message +3. WHEN I bookmark the import configuration URL THEN the system SHALL allow direct access to that specific import configuration +4. WHEN I use browser back/forward buttons THEN the system SHALL navigate correctly between the home page and import configuration +5. WHEN I refresh the import configuration page THEN the system SHALL reload the ImportHistory data and maintain the current state +6. WHEN I navigate to the import configuration route without proper authentication THEN the system SHALL redirect to the login page \ No newline at end of file diff --git a/.kiro/specs/import-history-sidebar/tasks.md b/.kiro/specs/import-history-sidebar/tasks.md new file mode 100644 index 000000000..e97ce8976 --- /dev/null +++ b/.kiro/specs/import-history-sidebar/tasks.md @@ -0,0 +1,185 @@ +# Implementation Plan + +- [x] 1. Enhance backend ImportHistory API for Recent Imports + - Add `limit` parameter support to existing `GET /api/v1/imports/history` endpoint + - Modify query logic to support limit parameter for Recent Imports sidebar + - Test endpoint with limit parameter to ensure proper functionality + - _Requirements: 1.2, 1.3_ + +- [ ] 2. 
Create ImportHistory Details Use Case + - [x] 2.1 Create GetImportHistoryDetailsUseCase class + - Write use case class in `argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts` + - Implement execute method to fetch detailed ImportHistory data + - Add proper TypeScript interfaces for ImportHistoryResponse with data field + - _Requirements: 2.3, 3.1_ + + - [x] 2.2 Enhance existing GetImportHistoryUseCase for Recent Imports + - Add getRecent method to existing GetImportHistoryUseCase class + - Implement method to fetch limited recent imports for sidebar + - Reuse existing execute method with appropriate parameters + - _Requirements: 1.2, 1.3_ + +- [ ] 3. Create ImportHistory Dataset Builder + - [x] 3.1 Create ImportHistoryDatasetBuilder class + - Write builder class in `argilla-frontend/v1/domain/entities/import/ImportHistoryDatasetBuilder.ts` + - Implement conversion from ImportHistory data structure to DatasetCreation format + - Add field mapping capabilities similar to HuggingFace datasets + - Handle data type inference and validation for ImportHistory fields + - _Requirements: 3.4, 4.1, 4.7_ + + - [x] 3.2 Create ImportHistory entity types + - Define ImportHistoryDetailsResponse interface in `argilla-frontend/v1/domain/entities/import/ImportHistoryDetails.ts` + - Add proper TypeScript types for ImportHistory data structure + - Ensure compatibility with existing ImportHistoryResponse from backend + - _Requirements: 3.1, 3.3_ + +- [ ] 4. 
Create Recent Imports Sidebar Components + - [x] 4.1 Create RecentImports component + - Write component in `argilla-frontend/components/features/import/RecentImports.vue` + - Implement loading, empty, and error states for recent imports + - Add integration with GetImportHistoryUseCase for fetching recent imports + - Include "View All Imports" and "Import Documents" buttons + - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.6, 6.1, 6.3_ + + - [x] 4.2 Create RecentImportCard component + - Write component in `argilla-frontend/components/features/import/RecentImportCard.vue` + - Implement compact card display for individual ImportHistory records + - Show filename, date, and summary statistics with proper styling + - Add hover states and click interactions + - _Requirements: 1.3, 5.3_ + + - [x] 4.3 Create RecentImports view model + - Write view model in `argilla-frontend/components/features/import/useRecentImportsViewModel.ts` + - Implement reactive state management for recent imports data + - Add error handling and loading state management + - Integrate with GetImportHistoryUseCase for data fetching + - _Requirements: 1.5, 5.1, 5.2_ + +- [ ] 5. Create ImportHistory Data Preview Component + - [x] 5.1 Create ImportHistoryDataPreview component + - Write component in `argilla-frontend/components/features/import/ImportHistoryDataPreview.vue` + - Implement tabular display of ImportHistory data using BaseSimpleTable + - Add pagination, search, and filtering capabilities for large datasets + - Create responsive design for preview pane integration + - _Requirements: 3.2, 3.3, 3.7_ +- [x] 6. 
Create Import Configuration Page + - [x] 6.1 Create import configuration page route + - Create page file `argilla-frontend/pages/new/import/_id.vue` + - Implement route parameter validation and ImportHistory data fetching + - Add loading, error, and navigation states + - Create breadcrumb navigation with proper routing + - _Requirements: 2.1, 2.2, 2.3, 7.1, 7.2, 7.4, 7.5_ + + - [x] 6.2 Create import configuration view model + - Write view model for import configuration page + - Integrate GetImportHistoryDetailsUseCase for data fetching + - Implement error handling and retry logic + - Add navigation and routing helper methods + - _Requirements: 2.3, 2.4, 7.3, 7.6_ + +- [ ] 7. Enhance DatasetConfiguration Component for ImportHistory + - [x] 7.1 Modify DatasetConfiguration component props + - Update component to accept dataSource prop ('hub' | 'import') + - Add importData prop for ImportHistory data + - Modify preview section to conditionally render ImportHistory or HuggingFace data + - Ensure backward compatibility with existing HuggingFace functionality + - _Requirements: 3.1, 3.4, 3.8, 4.1_ + + - [x] 7.2 Integrate ImportHistoryDataPreview in DatasetConfiguration + - Replace HuggingFace iframe with ImportHistoryDataPreview when dataSource is 'import' + - Pass ImportHistory data and field mapping to preview component + - Maintain existing layout and styling consistency + - _Requirements: 3.1, 3.7, 3.8_ + + - [x] 7.3 Update DatasetConfiguration view model + - Modify useDatasetConfiguration to handle ImportHistory data + - Integrate ImportHistoryDatasetBuilder for data conversion + - Add support for ImportHistory field mapping and configuration + - Ensure record.metadata.reference is populated from ImportHistory data + - _Requirements: 3.4, 4.1, 4.2, 4.7_ + +- [x] 8. 
Integrate Recent Imports into Home Page + - [x] 8.1 Modify home page sidebar + - Update `argilla-frontend/pages/index.vue` to replace example datasets with RecentImports component + - Add event handlers for import selection and modal opening + - Integrate with existing workspace selection functionality + - Maintain existing ImportModal and ImportHistoryList modal functionality + - _Requirements: 1.1, 1.5, 2.1, 6.2, 6.4_ + + - [x] 8.2 Update home page view model + - Modify useHomeViewModel to handle Recent Imports integration + - Add navigation methods for import configuration routing + - Integrate modal opening logic for ImportHistoryList and ImportModal + - _Requirements: 2.1, 6.2, 6.4, 6.5_ + +- [x] 9. Add Import Configuration Route + - [x] 9.1 Update routes configuration + - Add importConfiguration route to ROUTES object in `argilla-frontend/v1/infrastructure/services/useRoutes.ts` + - Implement goToImportConfiguration navigation method + - Ensure proper route parameter handling for import_id + - _Requirements: 2.1, 2.2, 7.1_ + + - [x] 9.2 Update routing integration + - Integrate new route with existing navigation patterns + - Add proper breadcrumb support for import configuration + - Ensure browser back/forward navigation works correctly + - _Requirements: 7.4, 7.5_ + +- [ ] 10. 
Add Styling and Responsive Design + - [x] 10.1 Style Recent Imports components + - Create SCSS styles for RecentImports and RecentImportCard components + - Follow existing design system patterns and variables from ExampleDatasetCard styling + - Implement responsive design for sidebar display + - Add hover states and interaction feedback matching ExampleDatasetCard + - _Requirements: 5.3, 5.4_ + + - [ ] 10.2 Style ImportHistory Data Preview + - Create SCSS styles for ImportHistoryDataPreview component + - Ensure consistent styling with existing table components + - Implement responsive design for preview pane + - Add loading and empty state styling + - _Requirements: 3.2, 5.4_ + + - [ ] 10.3 Update Import Configuration page styling + - Style import configuration page layout and components + - Ensure consistent breadcrumb and navigation styling + - Add responsive design for mobile devices + - Maintain consistency with existing dataset configuration styling + - _Requirements: 5.4, 7.4_ + +- [ ] 11. Testing and Error Handling + - [ ] 11.1 Add error handling for ImportHistory operations + - Implement proper error handling in all ImportHistory use cases + - Add retry mechanisms for failed API requests + - Create user-friendly error messages and recovery options + - Handle edge cases like missing or corrupted ImportHistory data + - _Requirements: 2.5, 5.1, 5.6_ + + - [ ] 11.2 Add loading states and performance optimization + - Implement loading indicators for all async operations + - Add skeleton screens for better user experience + - Optimize performance for large ImportHistory datasets + - Implement proper cleanup for component unmounting + - _Requirements: 1.6, 3.5, 5.1, 5.2, 5.5_ + +- [-] 12. 
Integration Testing and Validation + - [x] 12.1 Test Recent Imports sidebar functionality + - Verify Recent Imports display and interaction + - Test workspace selection integration + - Validate modal opening and navigation + - Test responsive design on different screen sizes + - _Requirements: 1.1, 1.2, 1.3, 1.5, 5.4, 6.1, 6.3_ + + - [ ] 12.2 Test Import Configuration workflow + - Verify end-to-end navigation from Recent Imports to configuration + - Test ImportHistory data loading and display + - Validate DatasetConfiguration integration with ImportHistory data + - Test dataset creation from ImportHistory data + - _Requirements: 2.1, 2.3, 3.1, 3.4, 4.1, 4.2, 4.7_ + + - [ ] 12.3 Test error scenarios and edge cases + - Test invalid import_id handling and error messages + - Verify proper error handling for missing or corrupted data + - Test network failure scenarios and retry mechanisms + - Validate proper cleanup and memory management + - _Requirements: 2.5, 7.2, 7.6_ \ No newline at end of file diff --git a/.kiro/specs/papers-library-importer/tasks.md b/.kiro/specs/papers-library-importer/tasks.md index 20b29d0f3..6aceae326 100644 --- a/.kiro/specs/papers-library-importer/tasks.md +++ b/.kiro/specs/papers-library-importer/tasks.md @@ -206,14 +206,11 @@ - [ ] 11. 
Add comprehensive error handling and validation - [ ] 11.1 Implement robust error handling - - Add specific error messages for BibTeX parsing failures - - Handle corrupted PDF files with detailed error reporting - Implement retry mechanisms for network and storage failures - Add workspace storage quota validation - _Requirements: 5.1, 5.2, 5.3, 5.5_ - [ ] 11.2 Add security and performance optimizations - - Implement file type and size validation - Add rate limiting for bulk upload requests - Add cleanup of temporary files and partial uploads - _Requirements: 6.1, 6.2, 6.5, 6.6_ diff --git a/.kiro/steering/structure.md b/.kiro/steering/structure.md index 214907bba..125e0719a 100644 --- a/.kiro/steering/structure.md +++ b/.kiro/steering/structure.md @@ -74,6 +74,86 @@ argilla-frontend/ - **Styling**: SCSS in `assets/scss/` with component-scoped styles - **Base Components**: BaseSimpleTable.vue already exists for tabular data display +### Jest Testing Patterns +- **Test Files**: Place `.spec.js` files next to the component they test +- **Mock Setup**: Define mocks inline within `jest.mock()` calls to avoid hoisting issues: + ```javascript + // Mock dependencies inline to avoid hoisting issues + jest.mock("ts-injecty", () => ({ + useResolve: jest.fn(() => mockUseCase), + })); + + jest.mock("@nuxtjs/composition-api", () => ({ + ref: jest.fn(), + computed: jest.fn(), + watch: jest.fn(), + onMounted: jest.fn(), + })); + ``` +- **Mock Configuration**: Set up mocks in `beforeEach` by getting them from required modules: + ```javascript + beforeEach(() => { + jest.clearAllMocks(); + + const compositionApi = require("@nuxtjs/composition-api"); + mockRef = compositionApi.ref; + mockComputed = compositionApi.computed; + // Configure mock behavior... 
+ }); + ``` +- **Component Stubs**: Use stubs in the mount options for base components: + ```javascript + wrapper = mount(ComponentName, { + propsData: { /* props */ }, + stubs: { + "BaseButton": { + template: '', + props: ["variant", "disabled", "loading"], + }, + "BaseIcon": true, + "BaseFlowModal": true, + }, + }); + ``` +- **Global Mocks**: Mock browser APIs and global functions: + ```javascript + beforeEach(() => { + // Mock window.confirm for modal dialogs + global.confirm = jest.fn(() => true); + // Mock other browser APIs as needed + global.alert = jest.fn(); + }); + + afterEach(() => { + jest.restoreAllMocks(); + }); + ``` +- **View Model Testing**: Test the public interface rather than internal implementation: + ```javascript + // Test computed properties return expected values + expect(computedFn()).toBe("expected-value"); + + // Test methods exist and are callable + expect(typeof viewModel.methodName).toBe("function"); + expect(viewModel.methodName).toBeDefined(); + + // Test reactive state objects are returned + expect(viewModel.property).toBe(mockRefObject); + ``` +- **Test Structure**: + - Use `beforeEach` to reset mock state between tests + - Use `afterEach` to clean up mocks and destroy wrappers + - Group related tests in `describe` blocks + - Test public interfaces, not internal implementation details +- **Props Testing**: Test component behavior with different prop combinations +- **Event Testing**: Verify component emits correct events with proper data +- **State Testing**: Test computed properties and reactive state changes +- **User Interaction**: Mock user actions and verify component responses +- **Error Handling**: Test error states and error recovery +- **Lifecycle Testing**: Test component mounting, updating, and destruction +- **Async Testing**: Use `async/await` for asynchronous operations +- **Mock Validation**: Ensure mocks match actual component interfaces + ## Client SDK Structure (extralit/) ``` extralit/ diff --git 
a/.kiro/steering/tech.md b/.kiro/steering/tech.md index 438c33f9b..586b59c31 100644 --- a/.kiro/steering/tech.md +++ b/.kiro/steering/tech.md @@ -32,7 +32,7 @@ Extralit is a multi-component system with 5 core components: - **Testing**: Jest (unit), Playwright (e2e) - **Node**: >=18.16.1 - **TypeScript**: Use ` - diff --git a/argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue b/argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue index 01748a1d4..7fa71d7d7 100644 --- a/argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue +++ b/argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue @@ -2,10 +2,10 @@
-
+
{{ $t("datasetCreation.fields") }} -
+
{{ $t("datasetCreation.subset") }}:
+
@@ -94,6 +106,7 @@ + + diff --git a/argilla-frontend/components/features/dataset-creation/configuration/shared/DatasetConfigurationCard.vue b/argilla-frontend/components/features/dataset-creation/configuration/shared/DatasetConfigurationCard.vue index d18452102..40b7082ec 100644 --- a/argilla-frontend/components/features/dataset-creation/configuration/shared/DatasetConfigurationCard.vue +++ b/argilla-frontend/components/features/dataset-creation/configuration/shared/DatasetConfigurationCard.vue @@ -1,7 +1,7 @@ @@ -114,7 +141,7 @@ import Home from "@/layouts/Home.vue"; import { useHomeViewModel } from "./useHomeViewModel"; import { Workspace } from "~/v1/domain/entities/workspace/Workspace"; - +import ImportHistoryDetailsModal from "~/components/features/import/ImportHistoryDetailsModal.vue"; export default { data() { @@ -126,6 +153,9 @@ export default { { id: 'datasets', name: this.$t('home.datasets') }, { id: 'documents', name: this.$t('home.documents') }, ], + // Import details modal state + isImportDetailsModalVisible: false, + selectedImportDetails: null, }; }, methods: { @@ -151,10 +181,42 @@ export default { this.activeTab = selectedTab; } }, + handleImportSelected(importRecord) { + this.goToImportConfiguration(importRecord.id); + }, + handleViewImportDetails(importRecord) { + this.selectedImportDetails = importRecord; + this.isImportDetailsModalVisible = true; + }, + closeImportDetailsModal() { + this.isImportDetailsModalVisible = false; + this.selectedImportDetails = null; + }, + handleRetryItem(item) { + // Handle retry item functionality if needed + console.log('Retry item:', item); + }, }, components: { Home, + ImportHistoryDetailsModal, + }, + computed: { + // Modal state is managed by useHomeViewModel + }, + + watch: { + workspaces: { + immediate: true, + handler(newWorkspaces) { + // Auto-assign the first workspace if none is selected and workspaces exist + if (!this.selectedWorkspace && newWorkspaces && newWorkspaces.length > 0) { + 
this.selectedWorkspace = newWorkspaces[0]; + } + } + } }, + setup() { return useHomeViewModel(); }, diff --git a/argilla-frontend/pages/new/import/_id.vue b/argilla-frontend/pages/new/import/_id.vue new file mode 100644 index 000000000..aa1684875 --- /dev/null +++ b/argilla-frontend/pages/new/import/_id.vue @@ -0,0 +1,157 @@ + + + + + diff --git a/argilla-frontend/pages/new/import/useImportConfigurationViewModel.spec.js b/argilla-frontend/pages/new/import/useImportConfigurationViewModel.spec.js new file mode 100644 index 000000000..b4d326568 --- /dev/null +++ b/argilla-frontend/pages/new/import/useImportConfigurationViewModel.spec.js @@ -0,0 +1,419 @@ +import { useImportConfigurationViewModel } from "./useImportConfigurationViewModel"; + +// Mock dependencies +jest.mock("ts-injecty", () => ({ + useResolve: jest.fn(), +})); + +jest.mock("@nuxtjs/composition-api", () => ({ + ref: jest.fn(), + useContext: jest.fn(), + useRoute: jest.fn(), +})); + +jest.mock("~/v1/infrastructure/services/useRoutes", () => ({ + useRoutes: jest.fn(), +})); + +jest.mock("~/v1/domain/entities/import/ImportHistoryDatasetBuilder", () => ({ + ImportHistoryDatasetBuilder: jest.fn(), +})); + +jest.mock("~/v1/domain/entities/import/ImportHistoryDetails", () => ({ + ImportHistoryDetails: jest.fn(), +})); + +describe("useImportConfigurationViewModel", () => { + let mockGetImportHistoryDetailsUseCase; + let mockGoToHome; + let mockRoute; + let mockRef; + let mockImportHistoryDatasetBuilder; + let mockImportHistoryDetails; + + beforeEach(() => { + jest.clearAllMocks(); + + // Mock composition API + const compositionApi = require("@nuxtjs/composition-api"); + mockRef = compositionApi.ref; + mockRef.mockImplementation((initialValue) => ({ + value: initialValue, + })); + + // Mock route + mockRoute = { + value: { + params: { + id: "test-import-123", + }, + }, + }; + compositionApi.useRoute.mockReturnValue(mockRoute); + + // Mock routes service + mockGoToHome = jest.fn(); + const useRoutes = 
require("~/v1/infrastructure/services/useRoutes"); + useRoutes.useRoutes.mockReturnValue({ + goToHome: mockGoToHome, + }); + + // Mock use case + mockGetImportHistoryDetailsUseCase = { + execute: jest.fn(), + }; + const tsInjecty = require("ts-injecty"); + tsInjecty.useResolve.mockReturnValue(mockGetImportHistoryDetailsUseCase); + + // Mock builder + mockImportHistoryDatasetBuilder = { + build: jest.fn(), + }; + const ImportHistoryDatasetBuilder = require("~/v1/domain/entities/import/ImportHistoryDatasetBuilder"); + ImportHistoryDatasetBuilder.ImportHistoryDatasetBuilder.mockImplementation(() => mockImportHistoryDatasetBuilder); + + // Mock ImportHistoryDetails + mockImportHistoryDetails = {}; + const ImportHistoryDetails = require("~/v1/domain/entities/import/ImportHistoryDetails"); + ImportHistoryDetails.ImportHistoryDetails.mockImplementation(() => mockImportHistoryDetails); + + // Mock console.error to avoid noise in tests + jest.spyOn(console, "error").mockImplementation(() => {}); + }); + + afterEach(() => { + jest.restoreAllMocks(); + }); + + describe("loadImportConfiguration", () => { + it("should load import configuration successfully", async () => { + const mockImportData = { + id: "test-import-123", + filename: "test-papers.csv", + data: { + data: [ + { reference: "paper_001", title: "Test Paper 1" }, + { reference: "paper_002", title: "Test Paper 2" }, + ], + schema: { + fields: [ + { name: "reference", type: "string" }, + { name: "title", type: "string" }, + ], + }, + }, + metadata: { + paper_001: { status: "add" }, + paper_002: { status: "add" }, + }, + }; + + const mockDatasetConfig = { + name: "Test Dataset", + fields: [], + questions: [], + }; + + mockGetImportHistoryDetailsUseCase.execute.mockResolvedValue(mockImportData); + mockImportHistoryDatasetBuilder.build.mockReturnValue(mockDatasetConfig); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + 
expect(mockGetImportHistoryDetailsUseCase.execute).toHaveBeenCalledWith("test-import-123"); + expect(viewModel.importHistoryData.value).toBe(mockImportHistoryDetails); + expect(viewModel.datasetConfig.value).toBe(mockDatasetConfig); + expect(viewModel.error.value).toBeNull(); + expect(viewModel.isLoading.value).toBe(false); + }); + + it("should handle invalid import ID format", async () => { + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration(""); + + expect(mockGetImportHistoryDetailsUseCase.execute).not.toHaveBeenCalled(); + expect(viewModel.error.value).toBe("The import ID format is invalid. Please check the URL and try again."); + expect(viewModel.isLoading.value).toBe(false); + }); + + it("should handle empty import data", async () => { + const mockImportData = { + id: "test-import-123", + filename: "empty-import.csv", + data: { + data: [], // Empty data + schema: { fields: [] }, + }, + metadata: {}, + }; + + mockGetImportHistoryDetailsUseCase.execute.mockResolvedValue(mockImportData); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe( + "This import contains no data to configure. Please try importing documents first." + ); + expect(viewModel.datasetConfig.value).toBeNull(); + }); + + it("should handle 404 error", async () => { + const error = new Error("Not found"); + error.response = { status: 404 }; + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe( + "Import record not found. It may have been deleted or you don't have access to it." 
+ ); + expect(viewModel.isLoading.value).toBe(false); + }); + + it("should handle 403 error", async () => { + const error = new Error("Forbidden"); + error.response = { status: 403 }; + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe( + "You don't have permission to access this import record. Please check with your workspace administrator." + ); + }); + + it("should handle 401 error", async () => { + const error = new Error("Unauthorized"); + error.response = { status: 401 }; + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe("Your session has expired. Please sign in again."); + }); + + it("should handle server error", async () => { + const error = new Error("Server error"); + error.response = { status: 500 }; + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe("Server error occurred while loading the import. Please try again later."); + }); + + it("should handle network error", async () => { + const error = new Error("Network Error"); + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe( + "Network connection error. Please check your internet connection and try again." 
+ ); + }); + + it("should handle generic error", async () => { + const error = new Error("Generic error"); + mockGetImportHistoryDetailsUseCase.execute.mockRejectedValue(error); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.loadImportConfiguration("test-import-123"); + + expect(viewModel.error.value).toBe( + "Failed to load import configuration. Please check your connection and try again." + ); + }); + }); + + describe("retry", () => { + it("should retry loading configuration with exponential backoff", async () => { + const mockImportData = { + id: "test-import-123", + filename: "test-papers.csv", + data: { + data: [{ reference: "paper_001", title: "Test Paper 1" }], + schema: { fields: [{ name: "reference", type: "string" }] }, + }, + metadata: { paper_001: { status: "add" } }, + }; + + mockGetImportHistoryDetailsUseCase.execute.mockResolvedValue(mockImportData); + mockImportHistoryDatasetBuilder.build.mockReturnValue({}); + + // Mock setTimeout to avoid actual delays in tests + jest.spyOn(global, "setTimeout").mockImplementation((callback) => { + callback(); + return 123; + }); + + const viewModel = useImportConfigurationViewModel(); + await viewModel.retry(); + + expect(mockGetImportHistoryDetailsUseCase.execute).toHaveBeenCalledWith("test-import-123"); + expect(viewModel.retryCount.value).toBe(0); + + global.setTimeout.mockRestore(); + }); + + it("should not retry if max retries exceeded", async () => { + const viewModel = useImportConfigurationViewModel(); + viewModel.retryCount.value = 3; // Set to max retries + + await viewModel.retry(); + + expect(mockGetImportHistoryDetailsUseCase.execute).not.toHaveBeenCalled(); + expect(viewModel.error.value).toBe( + "Maximum retry attempts (3) exceeded. Please refresh the page or contact support." 
+ ); + }); + + it("should handle missing import ID during retry", async () => { + mockRoute.value.params.id = null; + + const viewModel = useImportConfigurationViewModel(); + await viewModel.retry(); + + expect(mockGetImportHistoryDetailsUseCase.execute).not.toHaveBeenCalled(); + expect(viewModel.error.value).toBe("Unable to determine import ID for retry."); + }); + }); + + describe("handleSubsetChange", () => { + it("should handle subset change successfully", () => { + const mockDatasetConfig = { + changeSubset: jest.fn(), + }; + + const viewModel = useImportConfigurationViewModel(); + viewModel.datasetConfig.value = mockDatasetConfig; + + viewModel.handleSubsetChange("test-subset"); + + expect(mockDatasetConfig.changeSubset).toHaveBeenCalledWith("test-subset"); + expect(viewModel.error.value).toBeNull(); + }); + + it("should handle subset change error", () => { + const mockDatasetConfig = { + changeSubset: jest.fn(() => { + throw new Error("Subset change failed"); + }), + }; + + const viewModel = useImportConfigurationViewModel(); + viewModel.datasetConfig.value = mockDatasetConfig; + + viewModel.handleSubsetChange("test-subset"); + + expect(viewModel.error.value).toBe("Failed to change dataset subset. 
Please try again."); + }); + + it("should handle missing dataset config", () => { + const viewModel = useImportConfigurationViewModel(); + viewModel.datasetConfig.value = null; + + // Should not throw error + expect(() => viewModel.handleSubsetChange("test-subset")).not.toThrow(); + }); + }); + + describe("handleBreadcrumbAction", () => { + it("should handle home action", () => { + const viewModel = useImportConfigurationViewModel(); + viewModel.handleBreadcrumbAction("home"); + + expect(mockGoToHome).toHaveBeenCalled(); + }); + + it("should handle back action", () => { + // Mock window.history.back + const mockBack = jest.fn(); + Object.defineProperty(window, "history", { + value: { back: mockBack }, + writable: true, + }); + + const viewModel = useImportConfigurationViewModel(); + viewModel.handleBreadcrumbAction("back"); + + expect(mockBack).toHaveBeenCalled(); + }); + + it("should handle unknown action", () => { + const consoleSpy = jest.spyOn(console, "warn").mockImplementation(() => {}); + + const viewModel = useImportConfigurationViewModel(); + viewModel.handleBreadcrumbAction("unknown"); + + expect(consoleSpy).toHaveBeenCalledWith("Unknown breadcrumb action:", "unknown"); + + consoleSpy.mockRestore(); + }); + }); + + describe("navigateToHome", () => { + it("should navigate to home", () => { + const viewModel = useImportConfigurationViewModel(); + viewModel.navigateToHome(); + + expect(mockGoToHome).toHaveBeenCalled(); + }); + }); + + describe("getImportId", () => { + it("should return import ID from route params", () => { + const viewModel = useImportConfigurationViewModel(); + const importId = viewModel.getImportId(); + + expect(importId).toBe("test-import-123"); + }); + + it("should return null if no import ID in route", () => { + mockRoute.value.params.id = null; + + const viewModel = useImportConfigurationViewModel(); + const importId = viewModel.getImportId(); + + expect(importId).toBeNull(); + }); + }); + + describe("resetError", () => { + it("should 
reset error state", () => { + const viewModel = useImportConfigurationViewModel(); + viewModel.error.value = "Test error"; + + viewModel.resetError(); + + expect(viewModel.error.value).toBeNull(); + }); + }); + + describe("isValidImportId", () => { + it("should validate UUID format", async () => { + const viewModel = useImportConfigurationViewModel(); + + // Access the private method through the returned object (if exposed) or test indirectly + // Since isValidImportId is private, we test it indirectly through loadImportConfiguration + + // Test valid UUID (awaited so its pending catch handler cannot overwrite the error checked below) + await expect(viewModel.loadImportConfiguration("550e8400-e29b-41d4-a716-446655440000")).resolves.toBeUndefined(); + + // Test valid numeric ID (awaited for the same reason) + await expect(viewModel.loadImportConfiguration("123")).resolves.toBeUndefined(); + + // Test invalid empty string + await viewModel.loadImportConfiguration(""); + expect(viewModel.error.value).toBe( + "The import ID format is invalid. Please check the URL and try again." + ); + }); + }); +}); diff --git a/argilla-frontend/pages/new/import/useImportConfigurationViewModel.ts b/argilla-frontend/pages/new/import/useImportConfigurationViewModel.ts new file mode 100644 index 000000000..66f06b67e --- /dev/null +++ b/argilla-frontend/pages/new/import/useImportConfigurationViewModel.ts @@ -0,0 +1,165 @@ +import { useResolve } from "ts-injecty"; +import { ref, useContext, useRoute } from "@nuxtjs/composition-api"; +import { + GetImportHistoryDetailsUseCase, + ImportHistoryDetailsResponse, +} from "~/v1/domain/usecases/get-import-history-details-use-case"; +import { ImportHistoryDatasetBuilder } from "~/v1/domain/entities/import/ImportHistoryDatasetBuilder"; +import { ImportHistoryDetails } from "~/v1/domain/entities/import/ImportHistoryDetails"; +import { useRoutes } from "~/v1/infrastructure/services/useRoutes"; + +export const useImportConfigurationViewModel = () => { + const { goToHome } = useRoutes(); + const route = useRoute(); + + const isLoading = ref(false); + const error = 
ref(null); + const importHistoryData = ref(null); + const datasetConfig = ref(null); + const retryCount = ref(0); + const maxRetries = 3; + + const getImportHistoryDetailsUseCase = useResolve(GetImportHistoryDetailsUseCase); + + const loadImportConfiguration = async (importId: string) => { + isLoading.value = true; + error.value = null; + + try { + // Validate import ID format (should be UUID or similar) + if (!isValidImportId(importId)) { + throw new Error("Invalid import ID format"); + } + + // Fetch the import history details + const result = await getImportHistoryDetailsUseCase.execute(importId); + + if (!result) { + throw new Error("No import details received"); + } + + // Convert raw data to ImportHistoryDetails instance + const importHistoryDetails = new ImportHistoryDetails(result); + importHistoryData.value = importHistoryDetails; + + // Validate that we have data to work with + if (!result.data || !result.data.data || result.data.data.length === 0) { + error.value = "This import contains no data to configure. Please try importing documents first."; + return; + } + + // Build dataset configuration from import history data + const builder = new ImportHistoryDatasetBuilder(result); + datasetConfig.value = builder.build(); + + // Reset retry count on success + retryCount.value = 0; + } catch (e) { + console.error("Failed to load import configuration:", e); + + // Handle different error types with more specific messages + if (e.response?.status === 404) { + error.value = "Import record not found. It may have been deleted or you don't have access to it."; + } else if (e.response?.status === 403) { + error.value = + "You don't have permission to access this import record. Please check with your workspace administrator."; + } else if (e.response?.status === 401) { + error.value = "Your session has expired. 
Please sign in again."; + // Could redirect to login here + } else if (e.response?.status >= 500) { + error.value = "Server error occurred while loading the import. Please try again later."; + } else if (e.message === "Invalid import ID format") { + error.value = "The import ID format is invalid. Please check the URL and try again."; + } else if (e.message === "Network Error" || !navigator.onLine) { + error.value = "Network connection error. Please check your internet connection and try again."; + } else { + error.value = "Failed to load import configuration. Please check your connection and try again."; + } + } finally { + isLoading.value = false; + } + }; + + const isValidImportId = (importId: string): boolean => { + // Accepts UUIDs and numeric IDs; the trailing length check also admits any other non-empty string, so only "" is rejected + const uuidRegex = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i; + const numericRegex = /^\d+$/; + + return uuidRegex.test(importId) || numericRegex.test(importId) || importId.length > 0; + }; + + const retry = async () => { + if (retryCount.value >= maxRetries) { + error.value = `Maximum retry attempts (${maxRetries}) exceeded. Please refresh the page or contact support.`; + return; + } + + retryCount.value++; + + // Get import ID from route params + const importId = route.value.params.id; + if (importId) { + // Add exponential backoff delay + const delay = Math.pow(2, retryCount.value - 1) * 1000; // 1s, 2s, 4s + await new Promise((resolve) => setTimeout(resolve, delay)); + + await loadImportConfiguration(importId); + } else { + error.value = "Unable to determine import ID for retry."; + } + }; + + const handleSubsetChange = (subsetName: string) => { + if (datasetConfig.value && typeof datasetConfig.value.changeSubset === "function") { + try { + datasetConfig.value.changeSubset(subsetName); + } catch (e) { + console.error("Failed to change subset:", e); + error.value = "Failed to change dataset subset. 
Please try again."; + } + } + }; + + const handleBreadcrumbAction = (action: string) => { + switch (action) { + case "home": + goToHome(); + break; + case "back": + // Navigate back to previous page + window.history.back(); + break; + default: + console.warn("Unknown breadcrumb action:", action); + } + }; + + const navigateToHome = () => { + goToHome(); + }; + + const getImportId = (): string | null => { + return route.value.params.id || null; + }; + + const resetError = () => { + error.value = null; + }; + + return { + isLoading, + error, + importHistoryData, + datasetConfig, + retryCount, + maxRetries, + loadImportConfiguration, + retry, + goToHome, + navigateToHome, + handleSubsetChange, + handleBreadcrumbAction, + getImportId, + resetError, + }; +}; diff --git a/argilla-frontend/pages/useHomeViewModel.ts b/argilla-frontend/pages/useHomeViewModel.ts index 53bf2bf65..55d5dc224 100644 --- a/argilla-frontend/pages/useHomeViewModel.ts +++ b/argilla-frontend/pages/useHomeViewModel.ts @@ -6,13 +6,14 @@ import { GetDatasetsUseCase } from "@/v1/domain/usecases/get-datasets-use-case"; import { GetWorkspacesUseCase } from "~/v1/domain/usecases/get-workspaces-use-case"; import { useDatasets } from "~/v1/infrastructure/storage/DatasetsStorage"; import { useRole } from "~/v1/infrastructure/services/useRole"; +import { ImportHistoryListItem } from "~/v1/domain/usecases/get-import-history-use-case"; export const useHomeViewModel = () => { const workspaces = ref([]); const getWorkspacesUseCase = useResolve(GetWorkspacesUseCase); const { isAdminOrOwnerRole } = useRole(); const isLoadingDatasets = ref(false); - const { goToImportDatasetFromHub } = useRoutes(); + const { goToImportDatasetFromHub, goToImportConfiguration } = useRoutes(); const { state: datasets } = useDatasets(); const getDatasetsUseCase = useResolve(GetDatasetsUseCase); const getDatasetCreationUseCase = useResolve(GetHfDatasetCreationUseCase); @@ -91,11 +92,37 @@ export const useHomeViewModel = () => { 
selectedWorkspaceId.value = workspaceId; }; + // Import history modal state + const showImportHistoryModal = ref(false); + + const isImportHistoryModalVisible = computed(() => { + return showImportHistoryModal.value; + }); + + const openImportHistoryModal = () => { + showImportHistoryModal.value = true; + }; + + const closeImportHistoryModal = () => { + showImportHistoryModal.value = false; + }; + + // Navigation methods for import configuration routing + const handleImportSelected = (importRecord: ImportHistoryListItem) => { + goToImportConfiguration(importRecord.id); + }; + + const handleViewImportDetails = (importRecord: ImportHistoryListItem) => { + closeImportHistoryModal(); + goToImportConfiguration(importRecord.id); + }; + return { datasets, workspaces, isLoadingDatasets, getNewHfDatasetByRepoId, + goToImportConfiguration, isAdminOrOwnerRole, exampleDatasets, error, @@ -104,5 +131,11 @@ export const useHomeViewModel = () => { openImportModal, selectedWorkspace: selectedWorkspaceId, setSelectedWorkspaceId, + showImportHistoryModal, + isImportHistoryModalVisible, + openImportHistoryModal, + closeImportHistoryModal, + handleImportSelected, + handleViewImportDetails, }; }; diff --git a/argilla-frontend/translation/en.js b/argilla-frontend/translation/en.js index 382d4e6a9..c2679f00c 100644 --- a/argilla-frontend/translation/en.js +++ b/argilla-frontend/translation/en.js @@ -298,6 +298,16 @@ export default { }, import: { title: "Import Documents to {workspaceName} Workspace", + historyTitle: "Import History", + }, + importConfiguration: { + title: "Import", + loading: "Loading import data...", + errorTitle: "Failed to Load Import", + retry: "Retry", + retrying: "Retrying...", + returnHome: "Return Home", + retryAttempt: "Retry attempt {current} of {max}", }, datasetCreation: { questions: { @@ -325,6 +335,8 @@ export default { "The created dataset will include the first 10K rows and further records can be logged via the python SDK.", button: "Create dataset", 
fields: "Fields", + metadata: "Metadata Fields", + metadataDescription: "Select fields to include as metadata for filtering and sorting", questionsTitle: "Questions", yourQuestions: "Your questions", requiredField: "Required field", diff --git a/argilla-frontend/v1/domain/entities/hub/DatasetCreation.ts b/argilla-frontend/v1/domain/entities/hub/DatasetCreation.ts index 69efef7ce..6f876da1a 100644 --- a/argilla-frontend/v1/domain/entities/hub/DatasetCreation.ts +++ b/argilla-frontend/v1/domain/entities/hub/DatasetCreation.ts @@ -7,6 +7,7 @@ export class DatasetCreation { public readonly firstRecord: {}; public workspace: Workspace; + public importHistoryId?: string; constructor(public readonly repoId: string, public name: string, private readonly subset: Subset[]) { this.selectedSubset = subset[0]; diff --git a/argilla-frontend/v1/domain/entities/hub/DatasetCreationBuilder.ts b/argilla-frontend/v1/domain/entities/hub/DatasetCreationBuilder.ts index 8c5986a32..509355b22 100644 --- a/argilla-frontend/v1/domain/entities/hub/DatasetCreationBuilder.ts +++ b/argilla-frontend/v1/domain/entities/hub/DatasetCreationBuilder.ts @@ -2,7 +2,7 @@ import { DatasetCreation } from "./DatasetCreation"; import { Subset } from "./Subset"; export interface Feature { - dtype: "string" | "int32" | "int64"; + dtype: "string" | "int32" | "int64" | "float32" | "boolean"; _type: "Value" | "Image" | "ClassLabel"; names?: string[]; feature?: Feature; diff --git a/argilla-frontend/v1/domain/entities/hub/MetadataCreation.ts b/argilla-frontend/v1/domain/entities/hub/MetadataCreation.ts index 8033a6321..54655af2b 100644 --- a/argilla-frontend/v1/domain/entities/hub/MetadataCreation.ts +++ b/argilla-frontend/v1/domain/entities/hub/MetadataCreation.ts @@ -25,13 +25,14 @@ const ADAPTED_TYPES = { float16: "float", float32: "float", float64: "float", + terms: "terms", }; export type MetadataTypes = | "uint8" | "uint16" | "uint32" - | "unit64" + | "uint64" | "int8" | "int32" | "int64" @@ -53,7 +54,7 @@ 
export class MetadataCreation { return null; } - get adapteType() { + get adaptedType() { return ADAPTED_TYPES[this.type.value]; } } diff --git a/argilla-frontend/v1/domain/entities/hub/Subset.ts b/argilla-frontend/v1/domain/entities/hub/Subset.ts index 493fd9de3..48fc8b50f 100644 --- a/argilla-frontend/v1/domain/entities/hub/Subset.ts +++ b/argilla-frontend/v1/domain/entities/hub/Subset.ts @@ -11,7 +11,7 @@ type Structure = { content?: string; structure?: Structure[]; kindObject?: "Value" | "Image" | "ClassLabel" | "Sequence"; - type?: "string" | MetadataTypes; + type?: "string" | "boolean" | "float32" | MetadataTypes; feature?: Feature; }; diff --git a/argilla-frontend/v1/domain/entities/import/ImportAnalysis.ts b/argilla-frontend/v1/domain/entities/import/ImportAnalysis.ts index a872b2131..45759e9ef 100644 --- a/argilla-frontend/v1/domain/entities/import/ImportAnalysis.ts +++ b/argilla-frontend/v1/domain/entities/import/ImportAnalysis.ts @@ -28,21 +28,15 @@ export interface DataframeData { export type ImportStatus = "add" | "update" | "skip" | "ignore" | "failed"; // Document creation data for import (maps to DocumentCreate in backend) +// Only includes fields that are part of the backend DocumentCreate schema +// BibTeX fields (title, authors, year, journal, etc.) 
are stored in import_history.data export interface DocumentCreate { - title?: string; - authors?: string[] | string; - year?: string | number; - journal?: string; - volume?: string; - pages?: string; - doi?: string; + workspace_id?: string; url?: string; - abstract?: string; - keywords?: string[] | string; + file_name?: string; reference?: string; pmid?: string; - file_name?: string; - workspace_id?: string; + doi?: string; metadata?: Record; } @@ -94,3 +88,17 @@ export interface ImportHistoryCreate { data: Record; // Tabular dataframe data converted from BibTeX file metadata?: Record; // Import metadata including ImportStatus and associated files for each reference } + +// Import history response (maps to ImportHistoryResponse in backend) +export interface ImportHistoryResponse { + id: string; + workspace_id: string; + user_id: string; + filename: string; + created_at: string; + data?: DataframeData; // Tabular dataframe data (only in detailed view) + metadata?: { + documents: Record; // Reference key to document info mapping + summary: ImportSummary; // Import analysis summary + }; +} diff --git a/argilla-frontend/v1/domain/entities/import/ImportHistoryDatasetBuilder.ts b/argilla-frontend/v1/domain/entities/import/ImportHistoryDatasetBuilder.ts new file mode 100644 index 000000000..96f9b498d --- /dev/null +++ b/argilla-frontend/v1/domain/entities/import/ImportHistoryDatasetBuilder.ts @@ -0,0 +1,526 @@ +/** + * Builder class for converting ImportHistory data to DatasetCreation format + * Handles field mapping and data type inference similar to HuggingFace datasets + */ + +import { DatasetCreation } from "../hub/DatasetCreation"; +import { Subset } from "../hub/Subset"; +import { FieldCreationTypes } from "../hub/FieldCreation"; +import { MetadataTypes, MetadataCreation } from "../hub/MetadataCreation"; +import { ImportHistoryDetailsResponse } from "../../usecases/get-import-history-details-use-case"; + +export interface ImportHistoryFeature { + dtype: "string" | 
"int32" | "int64" | "float32" | "boolean"; + _type: "Value"; + name: string; +} + +export class ImportHistoryDatasetBuilder { + private readonly importHistoryData: ImportHistoryDetailsResponse; + private readonly datasetName: string; + + // Fields that should be treated as metadata rather than dataset fields + private static readonly METADATA_FIELDS = ["reference", "doi", "imdb"] as const; + + constructor(importHistoryData: ImportHistoryDetailsResponse) { + this.importHistoryData = importHistoryData; + this.datasetName = this.generateDatasetName(); + } + + build(): DatasetCreation { + const subset = this.createSubsetFromImportHistory(); + const dataset = new DatasetCreation(this.importHistoryData.id, this.datasetName, [subset]); + + // Set the importHistoryId for backend import routing + dataset.importHistoryId = this.importHistoryData.id; + + // Add available fields for metadata selection + (dataset as any).availableFields = this.availableFields; + + // Enhance the dataset to ensure proper reference field handling + this.enhanceDatasetForImportHistory(dataset); + + return dataset; + } + + /** + * Enhance DatasetCreation instance for ImportHistory-specific requirements + */ + private enhanceDatasetForImportHistory(dataset: DatasetCreation): void { + // Override the mappings getter to ensure reference field is included in metadata + const originalMappings = dataset.mappings; + + Object.defineProperty(dataset, "mappings", { + get: () => { + const mappings = { + fields: originalMappings.fields, + metadata: [...originalMappings.metadata], + suggestions: originalMappings.suggestions, + external_id: originalMappings.external_id, + }; + + // Ensure metadata fields are properly mapped + ImportHistoryDatasetBuilder.METADATA_FIELDS.forEach((metadataField) => { + if (this.availableFields.includes(metadataField)) { + const hasMapping = mappings.metadata.some((m) => m.target === metadataField); + if (!hasMapping) { + mappings.metadata.push({ + source: metadataField, + target: 
metadataField, + }); + } + } + }); + + return mappings; + }, + configurable: true, + enumerable: true, + }); + + // Override createFields to ensure proper field creation with ImportHistory data + const originalCreateFields = dataset.createFields.bind(dataset); + dataset.createFields = (firstRawRecord: unknown) => { + // Use ImportHistory first record if no record provided + const recordToUse = firstRawRecord || this.firstRecord; + return originalCreateFields(recordToUse); + }; + } + + private generateDatasetName(): string { + // Generate dataset name from filename, removing extension and making it dataset-friendly + const baseName = this.importHistoryData.filename + .replace(/\.[^/.]+$/, "") // Remove file extension + .replace(/[^a-zA-Z0-9_-]/g, "_") // Replace special chars with underscore + .toLowerCase(); + + return `${baseName}_dataset`; + } + + private createSubsetFromImportHistory(): Subset { + // Create a mock datasetInfo structure that mimics HuggingFace format + const features = this.extractFeaturesFromSchema(); + + // Ensure metadata fields are included in features if they exist in the data + ImportHistoryDatasetBuilder.METADATA_FIELDS.forEach((metadataField) => { + if (this.availableFields.includes(metadataField) && !features[metadataField]) { + features[metadataField] = { + dtype: "string", + _type: "Value", + name: metadataField, + }; + } + }); + + const mockDatasetInfo = { + default: { + dataset_name: this.datasetName, + features, + splits: { + train: { + name: "train", + num_bytes: 0, + num_examples: this.importHistoryData.data.data.length, + }, + }, + }, + }; + + const subset = new Subset("default", mockDatasetInfo.default); + + // Override the metadata creation to use proper metadata types + this.enhanceSubsetForImportHistory(subset); + + return subset; + } + + /** + * Enhance the Subset to properly handle ImportHistory metadata creation + * Only creates metadata for specific fields (reference, doi, imdb) + */ + private 
enhanceSubsetForImportHistory(subset: Subset): void { + // Clear existing metadata that might have been created with invalid types + (subset as any).metadata.length = 0; + + // Only create metadata for specific fields that should be treated as metadata + this.importHistoryData.data.schema.fields.forEach((field) => { + if (ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(field.name as any)) { + const metadataType = this.inferMetadataType(field.name); + if (metadataType) { + const metadata = MetadataCreation.from(field.name, metadataType); + if (metadata) { + (subset as any).metadata.push(metadata); + } + } + } + }); + + // Ensure reference field metadata is included if it exists and is not already added + if (this.hasReferenceField()) { + const hasReferenceMetadata = (subset as any).metadata.some((m: any) => m.name === "reference"); + if (!hasReferenceMetadata) { + const referenceSource = this.availableFields.includes("reference") ? "reference" : "id"; + // Only add if the reference source is one of our metadata fields + if (ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(referenceSource as any)) { + const referenceMetadata = MetadataCreation.from(referenceSource, "terms"); + if (referenceMetadata) { + (subset as any).metadata.push(referenceMetadata); + } + } + } + } + } + + /** + * Check if the ImportHistory data contains a reference field + */ + private hasReferenceField(): boolean { + return ( + this.importHistoryData.data.schema.fields.some((field) => + ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(field.name as any) + ) || + this.importHistoryData.data.data.some((record) => + ImportHistoryDatasetBuilder.METADATA_FIELDS.some((field) => field in record) + ) + ); + } + + private extractFeaturesFromSchema(): Record { + const features: Record = {}; + + // Process each field from the ImportHistory schema + this.importHistoryData.data.schema.fields.forEach((field) => { + features[field.name] = { + dtype: this.mapDataTypeToFeatureType(field.type), + 
_type: "Value", + name: field.name, + }; + }); + + return features; + } + + private mapDataTypeToFeatureType(dataType: string): "string" | "int32" | "int64" | "float32" | "boolean" { + // Map Table Schema data types to standard feature types + switch (dataType.toLowerCase()) { + case "string": + case "text": + case "str": + return "string"; + case "integer": + case "int": + case "int32": + return "int32"; + case "int64": + case "bigint": + return "int64"; + case "number": + case "float": + case "float32": + case "double": + return "float32"; + case "boolean": + case "bool": + return "boolean"; + case "datetime": + case "duration": + return "string"; + case "any": + return "string"; + default: + return "string"; + } + } + + /** + * Get the first record from ImportHistory data for field mapping + * This is used by DatasetConfiguration to populate field examples + */ + get firstRecord(): Record { + if (this.importHistoryData.data.data.length === 0) { + return {}; + } + return this.importHistoryData.data.data[0]; + } + + /** + * Get records with enhanced metadata including only specific metadata fields + * Only includes reference, doi, and imdb fields as metadata + */ + getRecordsWithMetadata(): Array & { metadata: Record }> { + return this.importHistoryData.data.data.map((record) => { + const metadata: Record = { ...record.metadata }; + + // Only include specific fields as metadata + ImportHistoryDatasetBuilder.METADATA_FIELDS.forEach((metadataField) => { + if (record[metadataField] !== undefined) { + metadata[metadataField] = record[metadataField]; + } + }); + + // Ensure reference field has a value if it exists + if ("reference" in record || "id" in record) { + metadata.reference = record.reference || record.id || `record_${Math.random().toString(36).substring(2, 11)}`; + } + + return { + ...record, + metadata, + }; + }); + } + + /** + * Get all data records from ImportHistory + * This is used for preview and dataset creation + */ + get allRecords(): Record[] { + 
return this.importHistoryData.data.data; + } + + /** + * Get field names available for mapping + */ + get availableFields(): string[] { + return this.importHistoryData.data.schema.fields.map((field) => field.name); + } + + /** + * Infer field type for DatasetConfiguration field mapping + */ + inferFieldType(fieldName: string): FieldCreationTypes { + const field = this.importHistoryData.data.schema.fields.find((f) => f.name === fieldName); + if (!field) return "no mapping"; + + // Skip fields that should be treated as metadata + if (ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(fieldName as any)) { + return "no mapping"; + } + + // Map data types to field creation types + switch (field.type.toLowerCase()) { + case "string": + case "text": + // Check if this looks like a text field that should be used for annotation + if (this.isTextAnnotationField(fieldName)) { + return "text"; + } + return "no mapping"; // Most string fields will be metadata + default: + return "no mapping"; // Non-text fields typically become metadata + } + } + + /** + * Infer metadata type for DatasetConfiguration metadata mapping based on Table Schema + * Only returns metadata types for fields that should be treated as metadata + */ + inferMetadataType(fieldName: string): MetadataTypes | "terms" | null { + // Only return metadata types for fields that should be treated as metadata + if (!ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(fieldName as any)) { + return null; + } + + const field = this.importHistoryData.data.schema.fields.find((f) => f.name === fieldName); + if (!field) return null; + + // Map Table Schema data types to metadata types + switch (field.type.toLowerCase()) { + case "integer": + case "int": + case "int32": + return "int32"; + case "int64": + case "bigint": + return "int64"; + case "number": + case "float": + case "float32": + case "double": + return "float32"; + case "boolean": + case "bool": + return "terms"; // Booleans as terms (true/false) + case 
"datetime": + case "duration": + return "terms"; // Dates as terms for filtering + case "any": + // For 'any' type, try to infer from actual data + return this.inferMetadataTypeFromData(fieldName); + default: + return "terms"; // String and other fields become terms metadata + } + } + + /** + * Infer metadata type from actual data when schema type is 'any' + */ + private inferMetadataTypeFromData(fieldName: string): MetadataTypes | "terms" { + const sampleValues = this.importHistoryData.data.data + .slice(0, 10) // Sample first 10 records + .map((record) => record[fieldName]) + .filter((value) => value !== null && value !== undefined); + + if (sampleValues.length === 0) return "terms"; + + // Check if all values are numbers + const allNumbers = sampleValues.every((val) => typeof val === "number" || !isNaN(Number(val))); + if (allNumbers) { + const allIntegers = sampleValues.every((val) => Number.isInteger(Number(val))); + return allIntegers ? "int32" : "float32"; + } + + // Check if all values are booleans + const allBooleans = sampleValues.every((val) => typeof val === "boolean" || val === "true" || val === "false"); + if (allBooleans) return "terms"; + + // Default to terms for mixed or string data + return "terms"; + } + + /** + * Check if a field should be treated as a text annotation field based on Table Schema + * Uses field type and content analysis rather than hardcoded field names + */ + private isTextAnnotationField(fieldName: string): boolean { + const field = this.importHistoryData.data.schema.fields.find((f) => f.name === fieldName); + if (!field) return false; + + // Only string/text fields can be text annotation fields + if (!["string", "text"].includes(field.type.toLowerCase())) { + return false; + } + + // Check if field contains substantial text content by sampling the data + const sampleValues = this.importHistoryData.data.data + .slice(0, 5) // Sample first 5 records + .map((record) => record[fieldName]) + .filter((value) => value && typeof 
value === "string"); + + if (sampleValues.length === 0) return false; + + // Consider it a text field if average length is > 50 characters + // This indicates substantial text content suitable for annotation + const avgLength = sampleValues.reduce((sum, val) => sum + val.length, 0) / sampleValues.length; + return avgLength > 50; + } + + /** + * Get suggested question mappings based on Table Schema field types and data analysis + */ + getSuggestedQuestions(): Array<{ + fieldName: string; + questionName: string; + questionType: "label_selection" | "multi_label_selection" | "text" | "rating"; + options?: Array<{ text: string; value: string; id: string }>; + }> { + const suggestions: Array<{ + fieldName: string; + questionName: string; + questionType: "label_selection" | "multi_label_selection" | "text" | "rating"; + options?: Array<{ text: string; value: string; id: string }>; + }> = []; + + // Analyze each field to suggest appropriate question types + this.importHistoryData.data.schema.fields.forEach((field) => { + const fieldName = field.name; + const fieldType = field.type.toLowerCase(); + + // Skip fields that are likely to be used as text fields for annotation + if (this.isTextAnnotationField(fieldName)) { + return; + } + + // Skip fields that should be treated as metadata + if (ImportHistoryDatasetBuilder.METADATA_FIELDS.includes(fieldName as any)) { + return; + } + + // Suggest questions based on field type and data characteristics + if (fieldType === "boolean" || fieldType === "bool") { + // Boolean fields are good for binary classification + suggestions.push({ + fieldName, + questionName: `${fieldName}_verification`, + questionType: "label_selection", + options: [ + { text: "Yes", value: "yes", id: "yes" }, + { text: "No", value: "no", id: "no" }, + ], + }); + } else if (fieldType === "string" || fieldType === "str") { + // For string fields, analyze the data to suggest question types + const uniqueValues = this.getUniqueValues(fieldName); + + if 
(uniqueValues.length <= 10 && uniqueValues.length > 1) { + // Low cardinality string fields are good for label selection + const options = uniqueValues.map((value) => ({ + text: String(value), + value: String(value).toLowerCase().replace(/\s+/g, "_"), + id: String(value).toLowerCase().replace(/\s+/g, "_"), + })); + + suggestions.push({ + fieldName, + questionName: `${fieldName}_category`, + questionType: "label_selection", + options, + }); + } else if (uniqueValues.length > 10) { + // High cardinality fields might be good for multi-label or text questions + suggestions.push({ + fieldName, + questionName: `${fieldName}_relevance`, + questionType: "label_selection", + options: [ + { text: "Relevant", value: "relevant", id: "relevant" }, + { text: "Not Relevant", value: "not_relevant", id: "not_relevant" }, + { text: "Partially Relevant", value: "partial", id: "partial" }, + ], + }); + } + } else if (["integer", "int", "number", "float"].includes(fieldType)) { + // Numeric fields might be good for rating questions + const values = this.getNumericValues(fieldName); + if (values.length > 0) { + const min = Math.min(...values); + const max = Math.max(...values); + + // If the range is reasonable for rating (e.g., 1-10), suggest rating + if (max - min <= 10 && min >= 0) { + suggestions.push({ + fieldName, + questionName: `${fieldName}_quality`, + questionType: "rating", + }); + } + } + } + }); + + return suggestions; + } + + /** + * Get unique values for a field (limited to first 100 for performance) + */ + private getUniqueValues(fieldName: string): any[] { + const values = this.importHistoryData.data.data + .slice(0, 100) // Limit for performance + .map((record) => record[fieldName]) + .filter((value) => value !== null && value !== undefined && value !== ""); + + return [...new Set(values)]; + } + + /** + * Get numeric values for a field + */ + private getNumericValues(fieldName: string): number[] { + return this.importHistoryData.data.data + .slice(0, 100) // Limit for 
performance + .map((record) => record[fieldName]) + .filter((value) => value !== null && value !== undefined && !isNaN(Number(value))) + .map((value) => Number(value)); + } +} diff --git a/argilla-frontend/v1/domain/entities/import/ImportHistoryDetails.ts b/argilla-frontend/v1/domain/entities/import/ImportHistoryDetails.ts new file mode 100644 index 000000000..53e9585c6 --- /dev/null +++ b/argilla-frontend/v1/domain/entities/import/ImportHistoryDetails.ts @@ -0,0 +1,216 @@ +/** + * ImportHistory Details entity types + * Provides TypeScript interfaces for ImportHistory data structure + */ + +import type { + ImportHistoryResponse, + DataframeData, + ImportStatus, + DocumentImportAnalysis, + ImportSummary, +} from "~/v1/domain/entities/import/ImportAnalysis"; + +// Additional entity types specific to ImportHistory details +export interface ImportHistoryDataField { + name: string; + type: "string" | "integer" | "float" | "boolean"; + nullable?: boolean; + description?: string; +} + +export interface ImportHistoryDataSchema { + fields: ImportHistoryDataField[]; + primaryKey: string[]; + totalRows: number; +} + +export interface ImportHistorySummaryStats { + total_documents: number; + add_count: number; + update_count: number; + skip_count: number; + failed_count: number; + success_rate: number; +} + +/** + * Enhanced ImportHistory details with computed properties + */ +export class ImportHistoryDetails { + constructor(private readonly data: ImportHistoryResponse & { data: DataframeData }) {} + + get id(): string { + return this.data.id; + } + + get workspaceId(): string { + return this.data.workspace_id; + } + + get filename(): string { + return this.data.filename; + } + + get createdAt(): Date { + return new Date(this.data.created_at); + } + + get schema(): ImportHistoryDataSchema { + return { + fields: this.data.data.schema.fields.map((field) => ({ + name: field.name, + type: field.type as "string" | "integer" | "float" | "boolean", + })), + primaryKey: 
this.data.data.schema.primaryKey, + totalRows: this.data.data.data.length, + }; + } + + get records(): Record[] { + return this.data.data.data; + } + + get metadata(): + | { + documents: Record; + summary: ImportSummary; + } + | undefined { + return this.data.metadata; + } + + get summary(): ImportHistorySummaryStats { + const baseSummary = this.calculateSummary(); + return { + ...baseSummary, + success_rate: + baseSummary.total_documents > 0 + ? (baseSummary.add_count + baseSummary.update_count) / baseSummary.total_documents + : 0, + }; + } + + get fieldNames(): string[] { + return this.schema.fields.map((field) => field.name); + } + + get hasErrors(): boolean { + return this.summary.failed_count > 0; + } + + get isSuccessful(): boolean { + return this.summary.success_rate > 0.8; // Consider successful if >80% success rate + } + + /** + * Get a sample record for field mapping preview + */ + getSampleRecord(): Record { + return this.records.length > 0 ? this.records[0] : {}; + } + + /** + * Get records by status + */ + getRecordsByStatus(status: ImportStatus): Record[] { + return this.records.filter((record) => { + const reference = record.reference || record.id; + const documentAnalysis = this.metadata?.documents?.[reference]; + return documentAnalysis?.status === status; + }); + } + + /** + * Get field statistics for data preview + */ + getFieldStats(fieldName: string): { + totalValues: number; + uniqueValues: number; + nullValues: number; + sampleValues: any[]; + } { + const values = this.records.map((record) => record[fieldName]); + const nonNullValues = values.filter((value) => value != null); + const uniqueValues = new Set(nonNullValues); + + return { + totalValues: values.length, + uniqueValues: uniqueValues.size, + nullValues: values.length - nonNullValues.length, + sampleValues: Array.from(uniqueValues).slice(0, 5), // First 5 unique values + }; + } + + /** + * Check if a field contains text suitable for annotation + */ + isTextAnnotationField(fieldName: 
string): boolean { + const field = this.schema.fields.find((f) => f.name === fieldName); + if (!field || field.type !== "string") return false; + + const stats = this.getFieldStats(fieldName); + const avgLength = + stats.sampleValues.filter((value) => typeof value === "string").reduce((sum, value) => sum + value.length, 0) / + stats.sampleValues.length; + + // Consider it a text field if average length > 50 characters + return avgLength > 50; + } + + /** + * Get raw data for export or further processing + */ + getRawData(): ImportHistoryResponse & { data: DataframeData } { + return this.data; + } + + /** + * Calculate summary from data and metadata + */ + private calculateSummary(): { + total_documents: number; + add_count: number; + update_count: number; + skip_count: number; + failed_count: number; + } { + // If metadata already contains a summary, use it + if (this.data.metadata?.summary) { + return this.data.metadata.summary; + } + + // Otherwise calculate from documents + const summary = { + total_documents: this.data.data.data.length, + add_count: 0, + update_count: 0, + skip_count: 0, + failed_count: 0, + }; + + if (this.data.metadata?.documents) { + Object.values(this.data.metadata.documents).forEach((documentAnalysis: DocumentImportAnalysis) => { + switch (documentAnalysis.status) { + case "add": + summary.add_count++; + break; + case "update": + summary.update_count++; + break; + case "skip": + summary.skip_count++; + break; + case "failed": + summary.failed_count++; + break; + case "ignore": + // Ignore status doesn't count towards any category + break; + } + }); + } + + return summary; + } +} diff --git a/argilla-frontend/v1/domain/usecases/create-import-history-use-case.ts b/argilla-frontend/v1/domain/usecases/create-import-history-use-case.ts index 1b323b5c5..fe278db89 100644 --- a/argilla-frontend/v1/domain/usecases/create-import-history-use-case.ts +++ b/argilla-frontend/v1/domain/usecases/create-import-history-use-case.ts @@ -5,10 +5,9 @@ import { 
type NuxtAxiosInstance } from "@nuxtjs/axios"; import type { ImportHistoryCreate } from "~/v1/domain/entities/import/ImportAnalysis"; -export interface ImportHistoryResponse { +export interface CreateImportHistoryResponse { id: string; workspace_id: string; - user_id: string; filename: string; created_at: string; } @@ -16,8 +15,8 @@ export interface ImportHistoryResponse { export class CreateImportHistoryUseCase { constructor(private readonly axios: NuxtAxiosInstance) {} - async execute(importHistoryData: ImportHistoryCreate): Promise { - const response = await this.axios.post("/v1/imports/history", importHistoryData); + async execute(importHistoryData: ImportHistoryCreate): Promise { + const response = await this.axios.post("/v1/imports/history", importHistoryData); return response.data; } diff --git a/argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts b/argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts index cdfea976e..4ac2a3748 100644 --- a/argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts +++ b/argilla-frontend/v1/domain/usecases/get-import-history-details-use-case.ts @@ -3,152 +3,65 @@ */ import { type NuxtAxiosInstance } from "@nuxtjs/axios"; +import type { + ImportHistoryResponse, + DataframeData, + DocumentImportAnalysis, + ImportStatus, + ImportSummary, +} from "~/v1/domain/entities/import/ImportAnalysis"; export interface ImportHistoryDetailItem { reference: string; - title: string; - authors: string; - year: string; - journal?: string; - doi?: string; - pmid?: string; - status: "add" | "update" | "skip" | "failed"; + status: ImportStatus; associated_files: string[]; error_message?: string; validation_errors?: string[]; - // Dynamic fields from original dataframe + // Dynamic fields from original dataframe (title, authors, year, journal, etc.) 
[key: string]: any; } -export interface ImportHistoryDetailsResponse { - id: string; - workspace_id: string; - user_id: string; - filename: string; - created_at: string; - uploaded_by?: string; - data: { - schema: { - fields: Array<{ - name: string; - type: string; - }>; - primaryKey: string[]; - }; - data: Record[]; - }; - metadata?: Record; // Contains status and file info for each reference - summary: { - total_documents: number; - add_count: number; - update_count: number; - skip_count: number; - failed_count: number; +export interface ImportHistoryDetailsResponse extends ImportHistoryResponse { + data: DataframeData; // Always present in detailed view + metadata: { + documents: Record; // Reference key to document info mapping + summary: ImportSummary; // Import analysis summary }; } -export interface ImportHistoryDetailsFilters { - reference?: string; - title?: string; - authors?: string; - status?: string; - error_message?: string; -} - -export interface ImportHistoryDetailsParams { - page?: number; - size?: number; - sort_by?: string; - sort_order?: "asc" | "desc"; - filters?: ImportHistoryDetailsFilters; -} - export class GetImportHistoryDetailsUseCase { constructor(private readonly axios: NuxtAxiosInstance) {} - async execute( - importId: string, - params: ImportHistoryDetailsParams = {} - ): Promise<{ - details: ImportHistoryDetailsResponse; - items: ImportHistoryDetailItem[]; - total: number; - page: number; - size: number; - pages: number; - }> { - const queryParams = new URLSearchParams(); - - // Pagination - if (params.page !== undefined) { - queryParams.append("page", params.page.toString()); - } - if (params.size !== undefined) { - queryParams.append("size", params.size.toString()); - } - - // Sorting - if (params.sort_by) { - queryParams.append("sort_by", params.sort_by); - } - if (params.sort_order) { - queryParams.append("sort_order", params.sort_order); - } - - // Filters - if (params.filters) { - Object.entries(params.filters).forEach(([key, 
value]) => { - if (value !== undefined && value !== null && value !== "") { - queryParams.append(key, value.toString()); - } - }); - } - - const response = await this.axios.get( - `/v1/imports/history/${importId}?${queryParams.toString()}` - ); - - const details = response.data; - - // Process the data to create detailed items - const items = this.processDetailItems(details); - - // Apply client-side pagination and filtering if needed - const filteredItems = this.applyFilters(items, params.filters); - const paginatedItems = this.applyPagination(filteredItems, params.page || 1, params.size || 20); + async execute(importId: string): Promise { + const response = await this.axios.get(`/v1/imports/history/${importId}`); - return { - details, - items: paginatedItems.items, - total: filteredItems.length, - page: params.page || 1, - size: params.size || 20, - pages: Math.ceil(filteredItems.length / (params.size || 20)), - }; + return response.data; } - private processDetailItems(details: ImportHistoryDetailsResponse): ImportHistoryDetailItem[] { + /** + * Process the import history response into detail items + */ + processDetailItems(details: ImportHistoryDetailsResponse): ImportHistoryDetailItem[] { const items: ImportHistoryDetailItem[] = []; // Process each data row from the dataframe details.data.data.forEach((row: Record) => { const reference = row.reference || row.id || "Unknown"; - const metadata = details.metadata?.[reference] || {}; + const documentAnalysis: DocumentImportAnalysis = details.metadata?.documents?.[reference] || { + document_create: {}, + associated_files: [], + status: "unknown" as ImportStatus, + validation_errors: [], + }; const item: ImportHistoryDetailItem = { reference, - title: row.title || "Unknown Title", - authors: this.formatAuthors(row.author || row.authors), - year: row.year?.toString() || "Unknown", - journal: row.journal || row.venue, - doi: row.doi, - pmid: row.pmid, - status: metadata.status || "unknown", - associated_files: 
metadata.associated_files || [], - error_message: metadata.error_message, - validation_errors: metadata.validation_errors, - // Include all other fields from the original data - ...row, + status: documentAnalysis.status, + associated_files: documentAnalysis.associated_files, + error_message: documentAnalysis.validation_errors?.join("; ") || undefined, + validation_errors: documentAnalysis.validation_errors, + // Include all fields from the original data with proper formatting + ...this.formatDataFields(row), }; items.push(item); @@ -157,6 +70,49 @@ export class GetImportHistoryDetailsUseCase { return items; } + /** + * Calculate summary from data and metadata + */ + calculateSummary(data: DataframeData, metadata?: ImportHistoryResponse["metadata"]): ImportSummary { + // If metadata already contains a summary, use it + if (metadata?.summary) { + return metadata.summary; + } + + // Otherwise calculate from documents + const summary = { + total_documents: data.data.length, + add_count: 0, + update_count: 0, + skip_count: 0, + failed_count: 0, + }; + + if (metadata?.documents) { + Object.values(metadata.documents).forEach((documentAnalysis: DocumentImportAnalysis) => { + switch (documentAnalysis.status) { + case "add": + summary.add_count++; + break; + case "update": + summary.update_count++; + break; + case "skip": + summary.skip_count++; + break; + case "failed": + summary.failed_count++; + break; + case "ignore": + // Ignore status doesn't count towards any category + break; + } + }); + } + + return summary; + } + private formatAuthors(authors: string | string[] | undefined): string { if (!authors) return "Unknown Authors"; if (Array.isArray(authors)) { @@ -165,46 +121,34 @@ export class GetImportHistoryDetailsUseCase { return String(authors); } - private applyFilters( - items: ImportHistoryDetailItem[], - filters?: ImportHistoryDetailsFilters - ): ImportHistoryDetailItem[] { - if (!filters) return items; - - return items.filter((item) => { - if (filters.reference 
&& !item.reference.toLowerCase().includes(filters.reference.toLowerCase())) { - return false; - } - if (filters.title && !item.title.toLowerCase().includes(filters.title.toLowerCase())) { - return false; - } - if (filters.authors && !item.authors.toLowerCase().includes(filters.authors.toLowerCase())) { - return false; + /** + * Format data fields from the original dataframe + */ + private formatDataFields(row: Record): Record { + const formatted: Record = {}; + + // Process each field from the original data + Object.entries(row).forEach(([key, value]) => { + if (key === "reference" || key === "id") { + // Skip reference/id as it's handled separately + return; } - if (filters.status && item.status !== filters.status) { - return false; - } - if ( - filters.error_message && - (!item.error_message || !item.error_message.toLowerCase().includes(filters.error_message.toLowerCase())) - ) { - return false; + + // Format specific field types + if (key === "authors" || key === "author") { + formatted[key] = this.formatAuthors(value); + } else if (key === "year") { + formatted[key] = value?.toString() || "Unknown"; + } else if (key === "journal" || key === "venue") { + formatted[key] = value || "Unknown"; + } else if (key === "title") { + formatted[key] = value || "Unknown Title"; + } else { + // For all other fields, use the original value + formatted[key] = value; } - return true; }); - } - private applyPagination( - items: ImportHistoryDetailItem[], - page: number, - size: number - ): { items: ImportHistoryDetailItem[]; total: number } { - const startIndex = (page - 1) * size; - const endIndex = startIndex + size; - - return { - items: items.slice(startIndex, endIndex), - total: items.length, - }; + return formatted; } } diff --git a/argilla-frontend/v1/domain/usecases/get-import-history-use-case.ts b/argilla-frontend/v1/domain/usecases/get-import-history-use-case.ts index b2ba00148..0f2bb484c 100644 --- a/argilla-frontend/v1/domain/usecases/get-import-history-use-case.ts 
+++ b/argilla-frontend/v1/domain/usecases/get-import-history-use-case.ts @@ -1,16 +1,12 @@ -/** - * Use case for fetching import history records - */ - import { type NuxtAxiosInstance } from "@nuxtjs/axios"; +import { ImportStatus } from "../entities/import/ImportAnalysis"; export interface ImportHistoryListItem { id: string; workspace_id: string; - user_id: string; + username: string; filename: string; created_at: string; - uploaded_by?: string; // User name, populated from user relationship total_papers: number; success_count: number; updated_count: number; @@ -28,32 +24,49 @@ export interface ImportHistoryListResponse { export interface ImportHistoryFilters { workspace_id?: string; - user_id?: string; - filename?: string; - date_from?: string; - date_to?: string; + username?: string; } -export interface ImportHistoryListParams { +export interface ImportHistoryListRequest { page?: number; - size?: number; + limit?: number; sort_by?: string; sort_order?: "asc" | "desc"; filters?: ImportHistoryFilters; } +// Backend response structure +interface ImportHistoryResponse { + id: string; + workspace_id: string; + username: string; + filename: string; + created_at: string; + data?: any; + metadata?: Record< + string, + { + status: ImportStatus; + associated_files: string[]; + error_message?: string; + validation_errors?: string[]; + import_timestamp?: string; + } + >; +} + export class GetImportHistoryUseCase { constructor(private readonly axios: NuxtAxiosInstance) {} - async execute(params: ImportHistoryListParams = {}): Promise { + async execute(params: ImportHistoryListRequest = {}): Promise { const queryParams = new URLSearchParams(); // Pagination if (params.page !== undefined) { queryParams.append("page", params.page.toString()); } - if (params.size !== undefined) { - queryParams.append("size", params.size.toString()); + if (params.limit !== undefined) { + queryParams.append("limit", params.limit.toString()); } // Sorting @@ -64,17 +77,106 @@ export class 
GetImportHistoryUseCase { queryParams.append("sort_order", params.sort_order); } - // Filters + // Filters - only workspace_id and username (not yet implemented in backend) if (params.filters) { - Object.entries(params.filters).forEach(([key, value]) => { - if (value !== undefined && value !== null && value !== "") { - queryParams.append(key, value.toString()); - } - }); + if (params.filters.workspace_id) { + queryParams.append("workspace_id", params.filters.workspace_id); + } + if (params.filters.username) { + queryParams.append("username", params.filters.username); + } } - const response = await this.axios.get(`/v1/imports/history?${queryParams.toString()}`); + const response = await this.axios.get(`/v1/imports/history?${queryParams.toString()}`); + + // The API returns an array directly, not an object with items property + const rawItems = Array.isArray(response.data) ? response.data : []; + + // Transform backend response to frontend format with calculated fields + const items: ImportHistoryListItem[] = rawItems.map((item) => { + const counts = this.calculateCountsFromMetadata(item.metadata); + + return { + id: item.id, + workspace_id: item.workspace_id, + username: item.username, + filename: item.filename, + created_at: item.created_at, + total_papers: counts.total, + success_count: counts.success, + updated_count: counts.updated, + skipped_count: counts.skipped, + failed_count: counts.failed, + }; + }); + + return { + items, + total: items.length, + page: params.page || 1, + size: params.limit || items.length, + pages: 1, // Since we're getting all items in one response + }; + } + + /** + * Calculate counts from metadata + */ + private calculateCountsFromMetadata(metadata?: Record): { + total: number; + success: number; + updated: number; + skipped: number; + failed: number; + } { + if (!metadata) { + return { total: 0, success: 0, updated: 0, skipped: 0, failed: 0 }; + } + + let success = 0; + let updated = 0; + let skipped = 0; + let failed = 0; + + // 
Count statuses from metadata + Object.values(metadata).forEach((item: any) => { + if (item && typeof item === "object" && item.status) { + switch (item.status) { + case "add": + success++; + break; + case "update": + updated++; + break; + case "skip": + skipped++; + break; + case "failed": + failed++; + break; + } + } + }); + + const total = success + updated + skipped + failed; + + return { total, success, updated, skipped, failed }; + } + + /** + * Fetch recent imports for sidebar display + * @param workspaceId - The workspace ID to filter imports + * @param limit - Maximum number of recent imports to fetch (default: 5) + * @returns Promise - Recent imports sorted by creation date + */ + async getRecent(workspaceId: string, limit = 5): Promise { + const params: ImportHistoryListRequest = { + limit, + sort_by: "created_at", + sort_order: "desc", + filters: { workspace_id: workspaceId }, + }; - return response.data; + return await this.execute(params); } } diff --git a/argilla-frontend/v1/infrastructure/repositories/DatasetRepository.ts b/argilla-frontend/v1/infrastructure/repositories/DatasetRepository.ts index 47ee63d49..e6000f281 100644 --- a/argilla-frontend/v1/infrastructure/repositories/DatasetRepository.ts +++ b/argilla-frontend/v1/infrastructure/repositories/DatasetRepository.ts @@ -71,14 +71,25 @@ export class DatasetRepository implements IDatasetRepository { async import(datasetId: DatasetId, creation: DatasetCreation): Promise { try { - const { data } = await this.axios.post(`/v1/datasets/${datasetId}/import`, { - name: creation.repoId, - subset: creation.selectedSubset.name, - split: creation.selectedSubset.selectedSplit.name, - mapping: creation.mappings, - }); - - return data.id; + // Check if this is an ImportHistory-based dataset + if (creation.importHistoryId) { + const { data } = await this.axios.post(`/v1/datasets/${datasetId}/import-history`, { + history_id: creation.importHistoryId, + mapping: creation.mappings, + }); + + return data.id; + } 
else { + // Original HuggingFace Hub import + const { data } = await this.axios.post(`/v1/datasets/${datasetId}/import`, { + name: creation.repoId, + subset: creation.selectedSubset.name, + split: creation.selectedSubset.selectedSplit.name, + mapping: creation.mappings, + }); + + return data.id; + } } catch (err) { throw { response: DATASET_API_ERRORS.ERROR_IMPORTING_DATASET, diff --git a/argilla-frontend/v1/infrastructure/repositories/FieldRepository.ts b/argilla-frontend/v1/infrastructure/repositories/FieldRepository.ts index d155c9406..4524bedcb 100644 --- a/argilla-frontend/v1/infrastructure/repositories/FieldRepository.ts +++ b/argilla-frontend/v1/infrastructure/repositories/FieldRepository.ts @@ -18,7 +18,10 @@ export class FieldRepository { name: field.name, title: field.title, required: field.required, - settings: field.settings, + settings: { + ...field.settings, + type: field.settings.type.value, // Extract the string value from FieldType + }, }); return data; @@ -61,7 +64,11 @@ export class FieldRepository { private createRequest({ name, title, settings }: Field): Partial { return { title: !title || title === "" ? name : title, - settings, + settings: { + ...settings, + // Ensure type is serialized as string value if it's a FieldType object + type: typeof settings.type === "object" && settings.type?.value ? 
settings.type.value : settings.type, + }, }; } } diff --git a/argilla-frontend/v1/infrastructure/repositories/MetadataRepository.ts b/argilla-frontend/v1/infrastructure/repositories/MetadataRepository.ts index 4616df2a6..21e4c0279 100644 --- a/argilla-frontend/v1/infrastructure/repositories/MetadataRepository.ts +++ b/argilla-frontend/v1/infrastructure/repositories/MetadataRepository.ts @@ -69,7 +69,7 @@ export class MetadataRepository { name: metadata.name, title: metadata.title, settings: { - type: metadata.adapteType, + type: metadata.adaptedType, }, }); diff --git a/argilla-frontend/v1/infrastructure/services/useLanguageDirection.test.ts b/argilla-frontend/v1/infrastructure/services/useLanguageDirection.test.ts index 61ee7f167..ccfe01f0a 100644 --- a/argilla-frontend/v1/infrastructure/services/useLanguageDirection.test.ts +++ b/argilla-frontend/v1/infrastructure/services/useLanguageDirection.test.ts @@ -51,5 +51,29 @@ describe("useLanguageDirection", () => { expect(result).toBe(false); }); + + test("be false if the text is undefined", () => { + const result = isRTL(undefined); + + expect(result).toBe(false); + }); + + test("be false if the text is null", () => { + const result = isRTL(null); + + expect(result).toBe(false); + }); + + test("be false if the text is empty string", () => { + const result = isRTL(""); + + expect(result).toBe(false); + }); + + test("be false if the text is not a string", () => { + const result = isRTL(123 as any); + + expect(result).toBe(false); + }); }); }); diff --git a/argilla-frontend/v1/infrastructure/services/useLanguageDirection.ts b/argilla-frontend/v1/infrastructure/services/useLanguageDirection.ts index d95ac1c37..607001578 100644 --- a/argilla-frontend/v1/infrastructure/services/useLanguageDirection.ts +++ b/argilla-frontend/v1/infrastructure/services/useLanguageDirection.ts @@ -1,9 +1,14 @@ export const useLanguageDirection = () => { - const isRTL = (text: string) => { - const rtlCount = 
(text?.match(/[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/g) || []).length; + const isRTL = (text: string | undefined | null) => { + // Handle undefined, null, or empty string cases + if (!text || typeof text !== "string") { + return false; + } + + const rtlCount = (text.match(/[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/g) || []).length; const ltrCount = ( - text?.match( + text.match( // eslint-disable-next-line no-misleading-character-class /[A-Za-z\u00C0-\u00C0\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF]/g ) || [] diff --git a/argilla-frontend/v1/infrastructure/services/useRoutes.ts b/argilla-frontend/v1/infrastructure/services/useRoutes.ts index 9db9fd23d..d967733d4 100644 --- a/argilla-frontend/v1/infrastructure/services/useRoutes.ts +++ b/argilla-frontend/v1/infrastructure/services/useRoutes.ts @@ -24,6 +24,7 @@ export const ROUTES = { annotationPage: (datasetId: string) => `/dataset/${datasetId}/annotation-mode`, settings: (id: string) => `/dataset/${id}/settings`, importDatasetFromHub: (id: string) => `/new/hf/${encodeURIComponent(id)}`, + importConfiguration: (importId: string) => `/new/import/${importId}`, }; export const useRoutes = () => { @@ -60,6 +61,10 @@ export const useRoutes = () => { router.push(ROUTES.importDatasetFromHub(id)); }; + const goToImportConfiguration = (importId: string) => { + router.push(ROUTES.importConfiguration(importId)); + }; + const goToHome = () => { router.push(ROUTES.index); }; @@ -151,6 +156,7 @@ export const useRoutes = () => { goToSignIn, getQuery, goToImportDatasetFromHub, + goToImportConfiguration, goToFeedbackTaskAnnotationPage, goToHome, goToSetting, diff --git a/argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py b/argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py index 6baef1edf..e7163779c 100644 --- a/argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py +++ 
b/argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py @@ -31,6 +31,7 @@ DatasetUpdate, HubDataset, HubDatasetExport, + ImportHistoryDataset, UsersProgress, ) from argilla_server.api.schemas.v1.fields import Field, FieldCreate, Fields @@ -45,7 +46,7 @@ from argilla_server.contexts import datasets from argilla_server.database import get_async_db from argilla_server.enums import DatasetStatus -from argilla_server.jobs import hub_jobs +from argilla_server.jobs import hub_jobs, import_jobs from argilla_server.models import Dataset, User from argilla_server.search_engine import ( SearchEngine, @@ -336,6 +337,27 @@ async def import_dataset_from_hub( return JobSchema(id=job.id, status=job.get_status()) +@router.post("/datasets/{dataset_id}/import-history", status_code=status.HTTP_202_ACCEPTED, response_model=JobSchema) +async def import_dataset_from_import_history( + *, + db: AsyncSession = Depends(get_async_db), + dataset_id: UUID, + import_history_dataset: ImportHistoryDataset, + current_user: User = Security(auth.get_current_user), +): + dataset = await Dataset.get_or_raise(db, dataset_id) + + await authorize(current_user, DatasetPolicy.import_from_hub(dataset)) + + job = import_jobs.import_dataset_from_import_history_job.delay( + history_id=import_history_dataset.history_id, + dataset_id=dataset.id, + mapping=import_history_dataset.mapping.model_dump(), + ) + + return JobSchema(id=job.id, status=job.get_status()) + + @router.post("/datasets/{dataset_id}/export", status_code=status.HTTP_202_ACCEPTED, response_model=JobSchema) async def export_dataset_to_hub( *, diff --git a/argilla-server/src/argilla_server/api/handlers/v1/imports.py b/argilla-server/src/argilla_server/api/handlers/v1/imports.py index 848d11323..0a2f52370 100644 --- a/argilla-server/src/argilla_server/api/handlers/v1/imports.py +++ b/argilla-server/src/argilla_server/api/handlers/v1/imports.py @@ -13,12 +13,13 @@ # limitations under the License. 
import logging -from typing import List +from typing import List, Optional from uuid import UUID from fastapi import APIRouter, Depends, HTTPException, Security, status from sqlalchemy.ext.asyncio import AsyncSession from sqlalchemy import select +from sqlalchemy.orm import selectinload from pydantic import ValidationError from argilla_server.database import get_async_db @@ -31,6 +32,7 @@ ImportAnalysisResponse, ImportHistoryCreate, ImportHistoryResponse, + ImportHistoryCreateResponse, ) _LOGGER = logging.getLogger(__name__) @@ -138,13 +140,13 @@ def _validate_analysis_request(analysis_request: ImportAnalysisRequest) -> List[ return errors -@router.post("/imports/history", status_code=status.HTTP_201_CREATED, response_model=ImportHistoryResponse) +@router.post("/imports/history", status_code=status.HTTP_201_CREATED, response_model=ImportHistoryCreateResponse) async def create_import_history_endpoint( *, import_history_create: ImportHistoryCreate, db: AsyncSession = Depends(get_async_db), current_user: User = Security(auth.get_current_user), -) -> ImportHistoryResponse: +) -> ImportHistoryCreateResponse: """ Create import history record to store generic tabular dataframe data. 
@@ -282,6 +284,7 @@ def _validate_import_history_request(import_history_create: ImportHistoryCreate) async def list_import_histories( *, workspace_id: UUID, + limit: Optional[int] = None, db: AsyncSession = Depends(get_async_db), current_user: User = Security(auth.get_current_user), ) -> List[ImportHistoryResponse]: @@ -290,6 +293,7 @@ async def list_import_histories( Args: workspace_id: Workspace ID to filter import histories + limit: Optional limit on number of records to return (for Recent Imports sidebar) db: Database session current_user: Authenticated user @@ -309,11 +313,17 @@ async def list_import_histories( ) try: - result = await db.execute( + query = ( select(ImportHistory) + .options(selectinload(ImportHistory.user)) .where(ImportHistory.workspace_id == workspace_id) .order_by(ImportHistory.inserted_at.desc()) ) + + if limit is not None: + query = query.limit(limit) + + result = await db.execute(query) import_histories = result.scalars().all() # Convert to response format (include metadata but not data for list view) @@ -323,13 +333,13 @@ async def list_import_histories( ImportHistoryResponse( id=history.id, workspace_id=history.workspace_id, - user_id=history.user_id, + username=history.user.username, filename=history.filename, created_at=history.inserted_at, metadata=history.metadata_, # Include metadata in list view + data=None, ) ) - _LOGGER.info(f"Retrieved {len(response_list)} import histories for workspace {workspace_id}") return response_list @@ -365,7 +375,10 @@ async def get_import_history( await authorize(current_user, DocumentPolicy.create()) try: - history = await ImportHistory.get(db, history_id) + query = select(ImportHistory).options(selectinload(ImportHistory.user)).where(ImportHistory.id == history_id) + result = await db.execute(query) + history = result.scalar_one_or_none() + if not history: raise HTTPException( status_code=status.HTTP_404_NOT_FOUND, @@ -383,7 +396,7 @@ async def get_import_history( response = ImportHistoryResponse( 
id=history.id, workspace_id=history.workspace_id, - user_id=history.user_id, + username=history.user.username, filename=history.filename, created_at=history.inserted_at, data=history.data, # Include data in detailed view diff --git a/argilla-server/src/argilla_server/api/schemas/v1/datasets.py b/argilla-server/src/argilla_server/api/schemas/v1/datasets.py index 3f700778f..077d3a2f8 100644 --- a/argilla-server/src/argilla_server/api/schemas/v1/datasets.py +++ b/argilla-server/src/argilla_server/api/schemas/v1/datasets.py @@ -1,16 +1,16 @@ -# Copyright 2021-present, the Recognai S.L. team. +# Copyright 2024-present, Extralit Labs, Inc. # -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at # -# http://www.apache.org/licenses/LICENSE-2.0 +# http://www.apache.org/licenses/LICENSE-2.0 # -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
from datetime import datetime from typing import List, Literal, Optional, Dict, Any @@ -204,3 +204,8 @@ class HubDatasetExport(BaseModel): split: Optional[str] = Field("train", min_length=1) private: Optional[bool] = False token: str = Field(..., min_length=1) + + +class ImportHistoryDataset(BaseModel): + history_id: UUID = Field(..., description="The ID of the import history to import from") + mapping: HubDatasetMapping = Field(..., description="The mapping configuration for the import") diff --git a/argilla-server/src/argilla_server/api/schemas/v1/imports.py b/argilla-server/src/argilla_server/api/schemas/v1/imports.py index 7810cddcb..817621a7f 100644 --- a/argilla-server/src/argilla_server/api/schemas/v1/imports.py +++ b/argilla-server/src/argilla_server/api/schemas/v1/imports.py @@ -136,10 +136,19 @@ class ImportHistoryResponse(BaseModel): id: UUID = Field(..., description="Import history record ID") workspace_id: UUID = Field(..., description="Workspace ID") - user_id: UUID = Field(..., description="User ID who created the import") + username: str = Field(..., description="Username who created the import") filename: str = Field(..., description="Import filename") created_at: datetime = Field(..., description="Creation timestamp") data: Optional[Dict[str, Any]] = Field(None, description="Tabular dataframe data (only in detailed view)") metadata: Optional[Dict[str, Any]] = Field( None, description="Import metadata with status and files (in list and detailed view)" ) + + +class ImportHistoryCreateResponse(BaseModel): + """Response schema for import history creation (without user object).""" + + id: UUID = Field(..., description="Import history record ID") + workspace_id: UUID = Field(..., description="Workspace ID") + filename: str = Field(..., description="Import filename") + created_at: datetime = Field(..., description="Creation timestamp") diff --git a/argilla-server/src/argilla_server/contexts/hub.py b/argilla-server/src/argilla_server/contexts/hub.py index 
8f2a21f2b..219234738 100644 --- a/argilla-server/src/argilla_server/contexts/hub.py +++ b/argilla-server/src/argilla_server/contexts/hub.py @@ -62,7 +62,7 @@ class HubDataset: def __init__(self, name: str, subset: str, split: str, mapping: HubDatasetMapping): - self.dataset = load_dataset(path=name, name=subset, split=split, streaming=True) + self.dataset: HFDataset = load_dataset(path=name, name=subset, split=split, streaming=True) # type: ignore self.split = split self.mapping = mapping self.mapping_feature_names = mapping.sources @@ -231,7 +231,7 @@ def __init__(self, dataset: Dataset): self.cache_version = uuid4() def export_to(self, name: str, subset: str, split: str, private: bool, token: str) -> None: - hf_dataset = HFDataset.from_generator(self._rows_generator, split=NamedSplit(split)) + hf_dataset: HFDataset = HFDataset.from_generator(self._rows_generator, split=NamedSplit(split)) # type: ignore hf_dataset.push_to_hub( repo_id=name, config_name=subset, diff --git a/argilla-server/src/argilla_server/contexts/imports.py b/argilla-server/src/argilla_server/contexts/imports.py index e1592e3b2..1146a72f4 100644 --- a/argilla-server/src/argilla_server/contexts/imports.py +++ b/argilla-server/src/argilla_server/contexts/imports.py @@ -35,7 +35,7 @@ DocumentsBulkCreate, DocumentsBulkResponse, ImportHistoryCreate, - ImportHistoryResponse, + ImportHistoryCreateResponse, ) from argilla_server.jobs.document_jobs import upload_reference_documents_job @@ -460,8 +460,8 @@ async def process_bulk_upload( async def create_import_history( - db: AsyncSession, import_history_create: ImportHistoryCreate, user_id: str -) -> ImportHistoryResponse: + db: AsyncSession, import_history_create: ImportHistoryCreate, user_id: UUID | str +) -> ImportHistoryCreateResponse: """ Create an import history record to store tabular dataframe data and import metadata. 
@@ -499,10 +499,9 @@ async def create_import_history( f"with filename {import_history.filename}" ) - return ImportHistoryResponse( + return ImportHistoryCreateResponse( id=import_history.id, workspace_id=import_history.workspace_id, - user_id=import_history.user_id, filename=import_history.filename, created_at=import_history.inserted_at, ) diff --git a/argilla-server/src/argilla_server/jobs/import_jobs.py b/argilla-server/src/argilla_server/jobs/import_jobs.py new file mode 100644 index 000000000..35f8ef961 --- /dev/null +++ b/argilla-server/src/argilla_server/jobs/import_jobs.py @@ -0,0 +1,196 @@ +# Copyright 2024-present, Extralit Labs, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Import jobs for processing data sources into Argilla datasets. + +This module provides background jobs for importing data from different sources: +- ImportHistory: Import data from previously uploaded files stored in ImportHistory +- Future: Additional import sources can be added here + +The jobs use the same HubDatasetMapping schema for consistency with existing Hub imports. +""" + +""" +Import jobs for processing data from various sources into Argilla datasets. + +This module provides background jobs for importing data from ImportHistory records, +reusing the same mapping and processing infrastructure as HuggingFace Hub imports.
+""" + +from uuid import UUID +from typing import Any, Dict, List + +from rq import Retry +from rq.decorators import job +from sqlalchemy.orm import selectinload +from sqlalchemy.ext.asyncio import AsyncSession + +from argilla_server.models import Dataset, ImportHistory +from argilla_server.settings import settings +from argilla_server.database import AsyncSessionLocal +from argilla_server.search_engine.base import SearchEngine +from argilla_server.api.schemas.v1.datasets import HubDatasetMapping +from argilla_server.api.schemas.v1.records import RecordUpsert as RecordUpsertSchema +from argilla_server.api.schemas.v1.records_bulk import RecordsBulkUpsert as RecordsBulkUpsertSchema +from argilla_server.api.schemas.v1.suggestions import SuggestionCreate +from argilla_server.bulk.records_bulk import UpsertRecordsBulk +from argilla_server.jobs.queues import DEFAULT_QUEUE, JOB_TIMEOUT_DISABLED + +BATCH_SIZE = 100 + + +class ImportHistoryDataset: + """Adapter class to process ImportHistory data similar to HubDataset""" + + def __init__(self, import_history: ImportHistory, mapping: HubDatasetMapping): + self.import_history = import_history + self.mapping = mapping + self.data = import_history.data.get("data", []) + self.row_idx = -1 + + def _next_row_idx(self) -> int: + self.row_idx += 1 + return self.row_idx + + async def import_to(self, db: AsyncSession, search_engine: SearchEngine, dataset: Dataset) -> None: + if not dataset.is_ready: + raise Exception("it's not possible to import records to a non published dataset") + + self.row_idx = -1 + + # Process data in batches + for i in range(0, len(self.data), BATCH_SIZE): + batch = self.data[i : i + BATCH_SIZE] + await self._import_batch_to(db, search_engine, batch, dataset) + + async def _import_batch_to( + self, db: AsyncSession, search_engine: SearchEngine, batch: List[Dict[str, Any]], dataset: Dataset + ) -> None: + items = [] + for row in batch: + items.append(self._row_to_record_schema(row, dataset)) + + await 
UpsertRecordsBulk(db, search_engine).upsert_records_bulk( + dataset, + RecordsBulkUpsertSchema(items=items), + raise_on_error=True, + ) + + def _row_to_record_schema(self, row: Dict[str, Any], dataset: Dataset) -> RecordUpsertSchema: + return RecordUpsertSchema( + id=None, + external_id=self._row_external_id(row), + fields=self._row_fields(row, dataset), + metadata=self._row_metadata(row, dataset), + suggestions=self._row_suggestions(row, dataset), + responses=None, + vectors=None, + ) + + def _row_external_id(self, row: Dict[str, Any]) -> str: + if not self.mapping.external_id: + return f"import_history_{self.import_history.id}_{self._next_row_idx()}" + + return str(row.get(self.mapping.external_id, f"import_history_{self.import_history.id}_{self._next_row_idx()}")) + + def _row_fields(self, row: Dict[str, Any], dataset: Dataset) -> Dict[str, Any]: + fields = {} + for mapping_field in self.mapping.fields: + value = row.get(mapping_field.source) + field = dataset.field_by_name(mapping_field.target) + if value is None or not field: + continue + + if field.is_text and value is not None: + value = str(value) + + fields[field.name] = value + + return fields + + def _row_metadata(self, row: Dict[str, Any], dataset: Dataset) -> Dict[str, Any]: + metadata = {} + for mapping_metadata in self.mapping.metadata or []: + value = row.get(mapping_metadata.source) + metadata_property = dataset.metadata_property_by_name(mapping_metadata.target) + if value is None or not metadata_property: + continue + + metadata[metadata_property.name] = value + + return metadata + + def _row_suggestions(self, row: Dict[str, Any], dataset: Dataset) -> List[SuggestionCreate]: + suggestions = [] + for mapping_suggestion in self.mapping.suggestions or []: + value = row.get(mapping_suggestion.source) + question = dataset.question_by_name(mapping_suggestion.target) + if value is None or not question: + continue + + if question.is_text or question.is_label_selection: + value = str(value) + + if 
question.is_multi_label_selection: + if isinstance(value, list): + value = [str(v) for v in value] + else: + value = [str(value)] + + if question.is_rating: + value = int(value) # type: ignore + + suggestions.append( + SuggestionCreate( + question_id=question.id, + value=value, + type=None, + agent=None, + score=None, + ), + ) + + return suggestions + + +@job(DEFAULT_QUEUE, timeout=JOB_TIMEOUT_DISABLED, retry=Retry(max=3)) +async def import_dataset_from_import_history_job(history_id: UUID, dataset_id: UUID, mapping: dict) -> None: + """ + Import dataset records from ImportHistory data. + + This job loads data from an ImportHistory record and creates dataset records + using the mapping of fields, metadata, and suggestions configured in DatasetConfiguration. + + Args: + history_id: UUID of the ImportHistory record containing the data + dataset_id: UUID of the Dataset to import records into + """ + async with AsyncSessionLocal() as db: + import_history = await ImportHistory.get_or_raise(db, history_id) + + dataset = await Dataset.get_or_raise( + db, + dataset_id, + options=[ + selectinload(Dataset.fields), + selectinload(Dataset.questions), + selectinload(Dataset.metadata_properties), + ], + ) + + async with SearchEngine.get_by_name(settings.search_engine) as search_engine: + parsed_mapping = HubDatasetMapping.model_validate(mapping) + + await ImportHistoryDataset(import_history, parsed_mapping).import_to(db, search_engine, dataset) diff --git a/argilla-server/tests/unit/api/handlers/v1/test_imports.py b/argilla-server/tests/unit/api/handlers/v1/test_imports.py index a5e287078..eaa55332a 100644 --- a/argilla-server/tests/unit/api/handlers/v1/test_imports.py +++ b/argilla-server/tests/unit/api/handlers/v1/test_imports.py @@ -671,3 +671,125 @@ async def test_create_import_history_bibtex_data(self, async_client: AsyncClient assert data["workspace_id"] == str(workspace.id) assert data["filename"] == "zotero_export.bib" assert "created_at" in data + + async def
test_list_import_histories_unauthorized(self, async_client: AsyncClient): + """Test that unauthorized users cannot access the list import histories endpoint.""" + # Make request without authentication + response = await async_client.get(f"/api/v1/imports/history?workspace_id={uuid4()}") + + # Verify response + assert response.status_code == status.HTTP_401_UNAUTHORIZED + + async def test_list_import_histories_invalid_workspace(self, async_client: AsyncClient, owner_auth_header: dict): + """Test list import histories endpoint with invalid workspace ID.""" + # Make request with non-existent workspace ID + response = await async_client.get(f"/api/v1/imports/history?workspace_id={uuid4()}", headers=owner_auth_header) + + # Verify response + assert response.status_code == status.HTTP_422_UNPROCESSABLE_ENTITY + assert "not found" in response.json()["detail"] + + async def test_list_import_histories_empty(self, async_client: AsyncClient, owner_auth_header: dict): + """Test list import histories endpoint with no import histories.""" + # Create owner user and workspace + owner = await UserFactory.create(role=UserRole.owner) + workspaces = await WorkspaceFactory.create_batch(1) + workspace = workspaces[0] + + # Make request + response = await async_client.get( + f"/api/v1/imports/history?workspace_id={workspace.id}", headers=owner_auth_header + ) + + # Verify response + assert response.status_code == status.HTTP_200_OK + data = response.json() + assert isinstance(data, list) + assert len(data) == 0 + + async def test_list_import_histories_with_limit(self, async_client: AsyncClient, owner_auth_header: dict): + """Test list import histories endpoint with limit parameter for Recent Imports sidebar.""" + # Create owner user and workspace + owner = await UserFactory.create(role=UserRole.owner) + workspaces = await WorkspaceFactory.create_batch(1) + workspace = workspaces[0] + + # Create multiple import history records + import_requests = [] + for i in range(10): + dataframe_data 
= { + "schema": { + "fields": [ + {"name": "reference", "type": "string"}, + {"name": "title", "type": "string"}, + ], + "primaryKey": ["reference"], + }, + "data": [ + { + "reference": f"ref{i}", + "title": f"Test Paper {i}", + } + ], + } + + request = ImportHistoryCreate( + workspace_id=workspace.id, + filename=f"test_import_{i}.bib", + data=dataframe_data, + metadata={f"ref{i}": {"status": "add", "associated_files": [f"test{i}.pdf"]}}, + ) + import_requests.append(request) + + # Create all import history records + for request in import_requests: + response = await async_client.post( + "/api/v1/imports/history", headers=owner_auth_header, json=request.model_dump(mode="json") + ) + assert response.status_code == status.HTTP_201_CREATED + + # Test without limit - should return all records + response = await async_client.get( + f"/api/v1/imports/history?workspace_id={workspace.id}", headers=owner_auth_header + ) + assert response.status_code == status.HTTP_200_OK + all_data = response.json() + assert len(all_data) == 10 + + # Test with limit=5 - should return only 5 most recent records + response = await async_client.get( + f"/api/v1/imports/history?workspace_id={workspace.id}&limit=5", headers=owner_auth_header + ) + assert response.status_code == status.HTTP_200_OK + limited_data = response.json() + assert len(limited_data) == 5 + + # Verify that the returned records are the most recent ones (ordered by created_at desc) + # The most recent should be test_import_9.bib, test_import_8.bib, etc. 
+ filenames = [record["filename"] for record in limited_data] + expected_filenames = [f"test_import_{i}.bib" for i in range(9, 4, -1)] # 9, 8, 7, 6, 5 + assert filenames == expected_filenames + + # Test with limit=3 - should return only 3 most recent records + response = await async_client.get( + f"/api/v1/imports/history?workspace_id={workspace.id}&limit=3", headers=owner_auth_header + ) + assert response.status_code == status.HTTP_200_OK + limited_data = response.json() + assert len(limited_data) == 3 + + # Test with limit=0 - should return empty list + response = await async_client.get( + f"/api/v1/imports/history?workspace_id={workspace.id}&limit=0", headers=owner_auth_header + ) + assert response.status_code == status.HTTP_200_OK + limited_data = response.json() + assert len(limited_data) == 0 + + # Verify that list view doesn't include data field (only metadata) + for record in all_data: + assert "id" in record + assert "workspace_id" in record + assert "filename" in record + assert "created_at" in record + assert "metadata" in record diff --git a/codecov.yml b/codecov.yml index b6d20a2c9..db8448649 100644 --- a/codecov.yml +++ b/codecov.yml @@ -1,17 +1,11 @@ -comment: - require_changes: true coverage: status: project: default: - target: auto - threshold: 2% - informational: true + enabled: false patch: default: - target: auto - threshold: 2% - informational: true + enabled: false flags: frontend: