forked from dbashford/textract
V3 #2
… deprecated util functions and implement new extraction methods for DOC files. Transition to ESM modules for better compatibility.
…mline binary checks. Update string handling for better clarity and consistency.
… handling. Update failed message handling for clarity and consistency.
…g type safety and clarity. Update extraction methods to use async/await for improved error handling and streamline text processing.
…ting async/await for improved error handling and clarity. Update function signatures and enhance binary check logic.
…ting async/await for improved error handling and clarity. Update function signatures and streamline text processing.
… codebase. This change enhances clarity and prepares for future refactoring.
…cript. Implement async/await for improved error handling and clarity. Update function signatures and streamline text processing methods.
…nt async/await for improved error handling and clarity. Update function signatures and streamline command generation for tesseract.
…Script
- Added @types/marked as a development dependency for type definitions.
- Replaced the JavaScript Markdown extraction logic with a TypeScript implementation, utilizing async/await for improved error handling.
- Updated function signatures and streamlined the extraction process for better clarity and performance.
- Removed the old JavaScript Markdown extractor file.
…ing async/await for improved error handling and clarity. Update function signatures and streamline text processing methods.
…tion logic to utilize the new utility for improved error handling and clarity. Update function signatures and streamline entry processing methods.
- Implemented a new extractor for XLS files, allowing text extraction from both .xls and .xlsx formats.
- Included error handling for file reading and conversion to CSV format.
- Defined supported MIME types for XLS file extraction.
…tement for improved clarity
…pm-lock.yaml for improved compatibility and performance.
- Changed the command option format in the README and types.ts to use '--psm' instead of '-psm' for better clarity and accuracy.
- Adjusted the corresponding test case to reflect the updated command option format.
…branch and streamline pnpm setup
…ub package registry configuration
…roved error handling
…larity on exec options
… support for Tesseract OCR
…roved accuracy in assertions
…pmignore
- Added 'dist' to .gitignore to exclude build artifacts.
- Updated .npmignore to retain .vscode directory.
- Introduced 'build' script in package.json to clean and compile TypeScript files.
- Added 'rimraf' as a dependency for build script functionality.
- Adjusted tsconfig.json to prevent emitting output files during compilation.
emirotin commented Nov 6, 2025
- Added 'tsconfig.tsbuildinfo' to .gitignore to exclude TypeScript build info files.
- Updated ESLint configuration to include test files and modified rules for unpublished imports.
- Introduced a typecheck script in package.json and updated prepublishOnly to include type checking before building.
- Renamed extraction functions to `extractFromBuffer` and `extractFromFile` for clarity.
- Updated `package.json` to point to the new distribution files in the `dist` directory.
- Added a new contributor to the package metadata.
- Adjusted tests to utilize the new extraction function names for consistency.
- Introduced steps for linting, type checking, and building the project in the CI workflow.
- Ensured that these processes run before executing tests to maintain code quality and integrity.
…sertions for clarity and accuracy
Relaxed tests that produce inconsistent results between Linux and Mac (different programs are used for extraction)
Dropped DXF support (CAD files; it requires two libs and a binary built from source, and one lib's source git URL no longer works, so lots of fun for nothing)
Replaced all legacy deps.
PPTX extraction was fully vibe-coded
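
For anyone following the '--psm' change mentioned in the commits, here is a minimal sketch of how a Tesseract argument list with the double-dash option could be built. `buildTesseractArgs` is a hypothetical helper written for illustration, not the function from this PR, and the 0 to 13 range check assumes the page segmentation modes of recent Tesseract releases.

```typescript
// Hypothetical helper (not taken from this PR): builds the argument list for
// the `tesseract` CLI so that the page segmentation mode is passed as
// `--psm N` rather than the legacy single-dash `-psm N`.
function buildTesseractArgs(imagePath: string, psm?: number): string[] {
  // `stdout` as the output base tells tesseract to print the recognized
  // text to standard output instead of writing a file.
  const args: string[] = [imagePath, "stdout"];

  if (psm !== undefined) {
    // Assumption: modes 0..13, as listed by recent Tesseract versions.
    if (!Number.isInteger(psm) || psm < 0 || psm > 13) {
      throw new RangeError(`psm must be an integer from 0 to 13, got ${psm}`);
    }
    args.push("--psm", String(psm));
  }

  return args;
}

// Example: arguments for a scanned page treated as a single block of text.
const exampleArgs = buildTesseractArgs("scan.png", 6);
console.log(["tesseract", ...exampleArgs].join(" "));
```

The array form is meant to be handed to something like `child_process.execFile`, which avoids shell quoting issues with file paths.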