@emirotin (Collaborator) commented Nov 6, 2025

Relaxed tests that produce inconsistent results between Linux and Mac (different programs are used for extraction).
Dropped DXF support (CAD files; it requires two libs and a binary built from source, and one lib's source git URL is not working, so lots of fun for nothing).

Replaced all legacy deps.
PPTX extraction was fully vibe-coded.

… deprecated util functions and implement new extraction methods for DOC files. Transition to ESM modules for better compatibility.
…mline binary checks. Update string handling for better clarity and consistency.
… handling. Update failed message handling for clarity and consistency.
…g type safety and clarity. Update extraction methods to use async/await for improved error handling and streamline text processing.
…ting async/await for improved error handling and clarity. Update function signatures and enhance binary check logic.
…ting async/await for improved error handling and clarity. Update function signatures and streamline text processing.
… codebase. This change enhances clarity and prepares for future refactoring.
…cript. Implement async/await for improved error handling and clarity. Update function signatures and streamline text processing methods.
…nt async/await for improved error handling and clarity. Update function signatures and streamline command generation for tesseract.
…Script

- Added @types/marked as a development dependency for type definitions.
- Replaced the JavaScript Markdown extraction logic with a TypeScript implementation, utilizing async/await for improved error handling.
- Updated function signatures and streamlined the extraction process for better clarity and performance.
- Removed the old JavaScript Markdown extractor file.
…ing async/await for improved error handling and clarity. Update function signatures and streamline text processing methods.
…tion logic to utilize the new utility for improved error handling and clarity. Update function signatures and streamline entry processing methods.
- Implemented a new extractor for XLS files, allowing text extraction from both .xls and .xlsx formats.
- Included error handling for file reading and conversion to CSV format.
- Defined supported MIME types for XLS file extraction.
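The CSV conversion step mentioned for the XLS extractor can be illustrated with a minimal, dependency-free sketch. It assumes the spreadsheet has already been parsed into rows of cells; the names `Cell`, `XLS_MIME_TYPES`, `escapeCsvField`, and `rowsToCsv` are hypothetical and are not taken from this PR, which most likely delegates parsing and conversion to a spreadsheet library. The two MIME types listed are the standard ones for .xls and .xlsx; whether the PR registers exactly these is an assumption.

```typescript
// Hypothetical cell type: what a parsed spreadsheet cell might hold.
type Cell = string | number | boolean | null;

// Standard MIME types for .xls and .xlsx (assumed to match what the extractor registers).
const XLS_MIME_TYPES: readonly string[] = [
  "application/vnd.ms-excel",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
];

// Quote a field only when it contains a comma, a quote, or a line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvField(value: Cell): string {
  if (value === null) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join cells with commas and rows with newlines.
function rowsToCsv(rows: Cell[][]): string {
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\n");
}
```

Keeping the quoting logic in one small function makes the text output predictable across platforms, which matters when tests compare extracted text.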
…pm-lock.yaml for improved compatibility and performance.
- Changed the command option format in the README and types.ts to use '--psm' instead of '-psm' for better clarity and accuracy.
- Adjusted the corresponding test case to reflect the updated command option format.
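To make the `--psm` change concrete, here is a minimal sketch of how a Tesseract argument list could be assembled with the double-dash form. `TesseractOptions` and `buildTesseractArgs` are hypothetical names for illustration, not the package's actual types or functions; the sketch only assumes the usual `tesseract <input> stdout [options]` invocation with `-l` for language and `--psm` for page segmentation mode.

```typescript
// Hypothetical option shape for illustration; not the package's actual types.
interface TesseractOptions {
  lang?: string; // language code, e.g. "eng"
  psm?: number; // page segmentation mode
}

// Build the argument list for `tesseract <input> stdout [options]`.
function buildTesseractArgs(
  inputPath: string,
  options: TesseractOptions = {},
): string[] {
  // "stdout" as the output base makes tesseract print the recognized text.
  const args = [inputPath, "stdout"];
  if (options.lang !== undefined) {
    args.push("-l", options.lang);
  }
  if (options.psm !== undefined) {
    // Double-dash form, matching the option format this commit adopts.
    args.push("--psm", String(options.psm));
  }
  return args;
}
```

Returning an array rather than a single command string lets the caller pass the arguments to a process-spawning API without shell quoting concerns.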
…pmignore

- Added 'dist' to .gitignore to exclude build artifacts.
- Updated .npmignore to retain .vscode directory.
- Introduced 'build' script in package.json to clean and compile TypeScript files.
- Added 'rimraf' as a dependency for build script functionality.
- Adjusted tsconfig.json to prevent emitting output files during compilation.
- Added 'tsconfig.tsbuildinfo' to .gitignore to exclude TypeScript build info files.
- Updated ESLint configuration to include test files and modified rules for unpublished imports.
- Introduced a typecheck script in package.json and updated prepublishOnly to include type checking before building.
- Renamed extraction functions to `extractFromBuffer` and `extractFromFile` for clarity.
- Updated `package.json` to point to the new distribution files in the `dist` directory.
- Added a new contributor to the package metadata.
- Adjusted tests to utilize the new extraction function names for consistency.
- Introduced steps for linting, type checking, and building the project in the CI workflow.
- Ensured that these processes run before executing tests to maintain code quality and integrity.
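The renamed entry points `extractFromBuffer` and `extractFromFile` suggest a layering where the file variant reads the file and delegates to the buffer variant. The sketch below shows that pattern only; the signatures, the `mimeType` parameter, and the plain-text-only behavior are assumptions for illustration and may not match the package's real API.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical signature: the real function may take different arguments.
// For illustration this handles only plain text and rejects everything else.
async function extractFromBuffer(
  buffer: Buffer,
  mimeType: string,
): Promise<string> {
  if (mimeType !== "text/plain") {
    throw new Error(`No extractor registered for ${mimeType}`);
  }
  return buffer.toString("utf8").trim();
}

// Hypothetical signature: read the file, then reuse the buffer-based path,
// so both entry points share one extraction implementation.
async function extractFromFile(
  filePath: string,
  mimeType: string,
): Promise<string> {
  const buffer = await readFile(filePath);
  return extractFromBuffer(buffer, mimeType);
}
```

With async/await, a failed read or an unsupported type surfaces as a rejected promise, so callers can handle both cases with a single `try`/`catch`.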
@emirotin merged commit 3c09567 into master Nov 7, 2025
1 check passed
2 participants