[Feature] BibTeX Importer UI & Workspace Document Viewer #102

@JonnyTran

Description

Summary

The Papers Library Importer allows researchers to bulk import documents into an Extralit workspace by uploading a .bib file and a folder of PDFs. It provides:

  • Analysis Phase: Parse bibliographic entries, match PDFs to references, and classify each entry as add, update, skip, or failed.
  • Preview Phase: Display a reviewable table of import candidates with metadata and status, and allow action overrides.
  • Execution Phase: Perform an asynchronous bulk upsert (adding or updating documents), upload PDFs to S3, and track progress through real-time job updates.
  • Integration: Persist structured bibliographic metadata, tag imports by collection and source, and surface imported documents alongside existing ones.
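As a rough illustration of the Analysis Phase, the sketch below shows one way each parsed entry might be assigned one of the four statuses. The decision rules are hypothetical: an entry fails when it has no title or no matched PDF, is skipped when the stored title and DOI are unchanged, is updated when they differ, and is added otherwise. The names `BibEntry`, `ImportAction`, and `classify` are likewise illustrative. The issue does not say how existing documents are detected, so this sketch simply keys them by BibTeX reference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImportAction(str, Enum):
    """The four per-entry statuses produced by the analysis phase."""
    ADD = "add"
    UPDATE = "update"
    SKIP = "skip"
    FAILED = "failed"


@dataclass
class BibEntry:
    reference: str                   # BibTeX citation key, e.g. "smith2020"
    title: Optional[str] = None
    doi: Optional[str] = None
    pdf_path: Optional[str] = None   # filled in by the PDF-matching step


def classify(entry: BibEntry, existing: dict) -> ImportAction:
    """Classify one entry against documents already in the workspace.

    `existing` maps reference key -> stored metadata dict. These rules are
    hypothetical, not taken from the Extralit codebase.
    """
    if not entry.title or entry.pdf_path is None:
        return ImportAction.FAILED   # unparseable entry or no matching PDF
    stored = existing.get(entry.reference)
    if stored is None:
        return ImportAction.ADD      # not in the workspace yet
    if stored.get("title") == entry.title and stored.get("doi") == entry.doi:
        return ImportAction.SKIP     # already imported and unchanged
    return ImportAction.UPDATE       # metadata differs, so upsert
```

Because the preview phase lets users override these actions, the classifier only needs to produce a sensible default for each row.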

Key Requirements

  1. Metadata Extraction & Matching

    • Parse .bib entries (title, authors, year, DOI/PMID, reference key).
    • Match PDF filenames to entries (exact, partial, or fuzzy match).
    • Tag unmatched or failed PDFs.
  2. Preview & Confirmation

    • Show import preview table with reference key, title, authors, year, files, and status.
    • Allow users to override the add/update/skip action on a per-document basis.
  3. Bulk Import Execution

    • Asynchronously upsert documents and upload files in batches (20–50 per request).
    • Track progress and report summary counts (added, updated, skipped, failed).
    • Continue processing on individual errors; report specific failures.
  4. Workspace Integration

    • Store bibliographic metadata in Document records (reference key, DOI, PMID).
    • Tag imported documents with their collection(s) and with source: "bib_import".
    • Ensure imported documents appear alongside existing documents in extraction workflows.
  5. Error Handling & Security

    • Provide clear errors for malformed BibTeX, unreadable PDFs, upload failures, duplicates, and quota issues.
    • Validate file types/sizes, sanitize BibTeX input, and integrate virus scanning.
    • Support retry and resume for interrupted imports.
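The exact, partial, and fuzzy matching in requirement 1 could be approached roughly as follows. This is a minimal sketch that assumes exact and partial matches compare the BibTeX reference key against PDF file stems, while fuzzy matches compare the entry title using the standard library's `difflib.get_close_matches`. The function name `match_pdf`, the normalization rule, and the 0.8 cutoff are illustrative choices, not part of the spec.

```python
import difflib
import re
from pathlib import Path
from typing import Optional


def _normalize(text: str) -> str:
    # Lowercase and drop everything except letters and digits, so that
    # "Smith_2020.pdf" and the key "smith2020" compare equal.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_pdf(reference: str, title: str, filenames: list[str],
              cutoff: float = 0.8) -> tuple[Optional[str], str]:
    """Return (filename, kind), where kind is exact/partial/fuzzy/none."""
    stems = {name: _normalize(Path(name).stem) for name in filenames}
    key = _normalize(reference)
    # 1. Exact: the normalized file stem equals the citation key.
    for name, stem in stems.items():
        if key and stem == key:
            return name, "exact"
    # 2. Partial: the citation key appears inside the file stem.
    for name, stem in stems.items():
        if key and key in stem:
            return name, "partial"
    # 3. Fuzzy: the stem closest to the normalized title, above `cutoff`.
    title_key = _normalize(title)
    if title_key:
        by_stem = {stem: name for name, stem in stems.items()}
        close = difflib.get_close_matches(title_key, list(by_stem), n=1, cutoff=cutoff)
        if close:
            return by_stem[close[0]], "fuzzy"
    return None, "none"   # unmatched: tag the entry for review
```

Trying the cheaper, more precise strategies first leaves fuzzy matching as a fallback, so a "fuzzy" result is a natural candidate for highlighting in the preview table.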

Acceptance Criteria

  • POST /imports/analyze returns an add/update/skip/failed status for each entry.
  • Preview UI displays editable import actions.
  • POST /documents/bulk enqueues upload jobs and returns job IDs.
  • Progress UI tracks jobs in real time and handles errors.
  • Imported documents carry correct metadata and tags.
  • Comprehensive unit/integration tests for all phases.
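The batched, continue-on-error loop behind the bulk import might look something like the sketch below. It assumes `upsert_batch` is a hypothetical client wrapper around POST /documents/bulk that returns one outcome string ("added", "updated", "skipped", or "failed") per entry. When a request raises, the whole batch is counted as failed and processing moves on to the next batch, so one bad request does not abort the import. The default batch size of 25 is an arbitrary value within the 20 to 50 range given in requirement 3.

```python
from collections import Counter
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: list[T], size: int = 25) -> Iterator[list[T]]:
    """Yield consecutive slices of at most `size` items (20-50 per the spec)."""
    if not 20 <= size <= 50:
        raise ValueError("batch size should be between 20 and 50")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_import(
    entries: list[dict],
    upsert_batch: Callable[[list[dict]], list[str]],
    size: int = 25,
) -> tuple[Counter, list[tuple[str, str]]]:
    """Upsert entries batch by batch, tallying added/updated/skipped/failed.

    `upsert_batch` is a hypothetical wrapper around POST /documents/bulk that
    returns one outcome string per entry, in order.
    """
    summary: Counter = Counter()
    failures: list[tuple[str, str]] = []   # (reference key, reason)
    for batch in batched(entries, size):
        try:
            outcomes = upsert_batch(batch)
        except Exception as exc:            # whole request failed: record, keep going
            summary["failed"] += len(batch)
            failures.extend((e["reference"], str(exc)) for e in batch)
            continue
        for entry, outcome in zip(batch, outcomes):
            summary[outcome] += 1
            if outcome == "failed":          # server rejected this single entry
                failures.append((entry["reference"], "rejected by server"))
    return summary, failures
```

Keeping per-entry failure reasons alongside the counts gives the results view what it needs both for the summary and for a later retry of only the failed references.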

Files & Components

  • Backend: handlers (imports.py, documents.py), contexts (imports.py), jobs (document_jobs.py), ImportHistory model.
  • Frontend: Vue components ImportUpload.vue, ImportPreview.vue, ImportProgress.vue, ImportResults.vue, and the page at pages/workspace/_id/import.vue.
  • Schemas: Pydantic models for ImportAnalysisRequest/Response, BulkUploadMetadata, and ImportHistory DB schema.
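The preview columns and Document metadata listed in the requirements suggest roughly the following shapes for these schemas. To keep the sketch dependency-free, they are written as standard-library dataclasses standing in for the Pydantic models. Every field name is an assumption, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical field names; dataclasses stand in for the Pydantic models.


@dataclass
class ImportAnalysisRequest:
    workspace_id: str
    bib_content: str                          # raw text of the uploaded .bib file
    pdf_filenames: list[str] = field(default_factory=list)


@dataclass
class ImportEntryStatus:
    # One row of the preview table.
    reference: str
    title: Optional[str]
    authors: list[str]
    year: Optional[int]
    files: list[str]
    status: str                               # "add" | "update" | "skip" | "failed"
    error: Optional[str] = None


@dataclass
class ImportAnalysisResponse:
    entries: list[ImportEntryStatus]

    def counts(self) -> dict[str, int]:
        """Number of entries per status, e.g. for a summary banner."""
        totals: dict[str, int] = {}
        for entry in self.entries:
            totals[entry.status] = totals.get(entry.status, 0) + 1
        return totals


@dataclass
class BulkUploadMetadata:
    # Per-document metadata sent with POST /documents/bulk.
    reference: str
    doi: Optional[str] = None
    pmid: Optional[str] = None
    collections: list[str] = field(default_factory=list)
    source: str = "bib_import"
```

With Pydantic, the same classes would subclass `BaseModel` and gain request validation, which matters here because the .bib content comes straight from user uploads.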

Metadata

Assignees

Labels

No labels

Projects

Status

2025 Q3 Jul - Sep

Milestone

Relationships

None yet

Development

No branches or pull requests