Subscribe to the author's Telegram channel for updates and more projects: @vtvz_dev
A powerful Rust-based tool for processing and organizing medical documents from Telegram chat exports into structured PDF documents with automatic OCR, metadata extraction, and comprehensive table of contents generation.
MedPack transforms structured Telegram chat exports containing medical records into beautifully organized PDF documents. It intelligently processes images, PDFs, and text messages, groups them by person, and creates professional medical document collections with proper pagination, OCR processing, and detailed table of contents.
- Multi-format Processing: Handles images (PNG, JPG), PDFs, and text messages from Telegram exports
- OCR Integration: Automatic OCR processing for images using `ocrmypdf`, with Russian and English language support
- Metadata Extraction: Parses YAML metadata blocks from messages to extract structured medical record information
- Smart Organization: Groups messages by person and creates a separate PDF document for each individual
- Table of Contents: Generates a detailed TOC with page numbers, dates, tags, and clickable Telegram message links
- Parallel Processing: Multi-threaded processing with real-time progress bars for efficient handling of large datasets
- Document Labeling: Adds professional headers, footers, and page numbers to all documents
- Flexible Configuration: Optional OCR processing and temporary file preservation for debugging
- Telegram Integration: Preserves links to original messages for easy reference
Want to see MedPack in action? Check out our live example Telegram group:
This group contains:
- Real medical record messages with proper YAML metadata formatting
- Sample images and PDFs showing the input format MedPack expects
- Processing results - the final generated PDF documents
- Best practices for structuring your medical records in Telegram
The group demonstrates exactly how to format your Telegram messages for optimal MedPack processing, including proper YAML metadata blocks, image attachments, and text formatting. You can use this as a reference when preparing your own medical record exports.
The easiest way to run MedPack is with Docker. All prerequisites are preinstalled in the image.

    docker run --rm -v "$(pwd):$(pwd)" -w "$(pwd)" -u "$(id -u):$(id -g)" -it --pull always ghcr.io/vtvz/medpack:latest

To run MedPack without Docker, first make sure all required external tools are installed. The complete list of required tools can be found in the `src/command.rs` file.

- Clone the repository:

      git clone <repository-url>
      cd medpack

- Build the project:

      cargo build --release

The binary will be available at `target/release/medpack`.

Alternatively, you can install MedPack directly to your system using Cargo:

    cargo install --path .

This installs the `medpack` binary to your Cargo bin directory (usually `~/.cargo/bin/`), making it available system-wide.

    medpack [OPTIONS] [SOURCES...]

For a complete list of available options and their descriptions, run:

    medpack --help

Process the current directory:

    medpack

Process specific directories without OCR:

    medpack --no-ocr /path/to/export1 /path/to/export2

Debug mode with temporary file preservation:

    medpack --preserve-tmp --no-ocr ./telegram_export

Process multiple exports simultaneously:

    medpack ~/Downloads/ChatExport_2023 ~/Downloads/ChatExport_2024

Tip: When processing multiple exports, MedPack merges them. You therefore do not need to re-export the entire chat history each time: export only the new messages and process them alongside your existing exports.

Note: When merged exports contain the same message (including edited versions), MedPack automatically uses the latest edited version. Corrections or updates made to medical records in Telegram are therefore reflected in the final PDF output.

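The merge rule described above (one copy per message, latest edit wins) can be sketched roughly as follows. This is an illustration only, not MedPack's actual code: the `Message` struct, its field names (`id`, `edited_unixtime`), and `merge_exports` are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Minimal stand-in for one exported message (hypothetical type; the field
/// names are assumptions, not MedPack's actual data model).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub id: i64,
    /// Unix time of the last edit, or `None` if the message was never edited.
    pub edited_unixtime: Option<i64>,
    pub text: String,
}

/// Merges several exports into one list with a single copy of each message
/// id. The copy with the most recent edit timestamp wins; on a tie the copy
/// seen first is kept.
pub fn merge_exports(exports: Vec<Vec<Message>>) -> Vec<Message> {
    let mut latest: HashMap<i64, Message> = HashMap::new();
    for msg in exports.into_iter().flatten() {
        // `None < Some(_)`, so an edited copy always beats an unedited one.
        let is_newer = latest
            .get(&msg.id)
            .map_or(true, |seen| msg.edited_unixtime > seen.edited_unixtime);
        if is_newer {
            latest.insert(msg.id, msg);
        }
    }
    // Return messages in a stable order (by id).
    let mut merged: Vec<Message> = latest.into_values().collect();
    merged.sort_by_key(|m| m.id);
    merged
}
```

Calling `merge_exports` with an older and a newer export of the same chat yields each message once, with edited text taking precedence over the unedited copy.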
MedPack expects Telegram chat exports in JSON format with the following structure:

    telegram_export/
    ├── result.json          # Main export file with message data
    ├── photos/              # Directory containing image files
    │   ├── photo_1.jpg
    │   └── photo_2.png
    └── files/               # Directory containing PDF attachments
        └── document.pdf

- Messages with YAML metadata blocks - Define medical records with structured information
- Image messages - Photos in PNG or JPEG format (both compressed regular photos and uncompressed file attachments) that can be processed with OCR
- PDF attachments - Direct PDF files from messages
- Text messages - Converted to PDF format
Messages MUST contain YAML blocks with medical record metadata:

    date: 2023.12.22
    person: John Doe
    tags:
    - cardiology
    - checkup
    - ECG
    place: City Hospital
    doctor: Dr. Smith

For text-only records (messages without images or PDF files), you can use special code blocks to enhance the content:
- HTML Code Blocks - Insert raw HTML directly into the generated PDF
- CSV Code Blocks - Create tables from CSV data, where the first row is treated as the header
- Hidden Code Blocks - Add personal notes that won't appear in the final PDF
- Telegram Formatting - All Telegram message formatting is preserved
Example Text Record:
```yaml
date: 2023.12.22
person: John Doe
tags:
- consultation
- notes
```
Patient reported feeling better after treatment.
```html
<div class="alert alert-info">
<strong>Important:</strong> Patient has allergies to penicillin and sulfa drugs.
</div>
```
```csv
Medication,Dosage,Frequency,Duration
Aspirin,100mg,Daily,30 days
Lisinopril,10mg,Daily,Ongoing
Metformin,500mg,Twice daily,90 days
```
Follow-up appointment scheduled for next month.
```hidden
Remember to follow up on blood test results next week.
Patient seemed anxious - consider referral to counselor.
```
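To make the CSV code block behaviour concrete (first row treated as the header), here is a minimal sketch of such a conversion. It is not MedPack's implementation: the function names are hypothetical, the exact HTML MedPack emits is not documented here, and quoted fields containing commas are deliberately not handled.

```rust
/// Escapes the characters that are special in HTML text.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Renders one CSV line as a table row using the given cell tag ("th" or "td").
fn render_row(line: &str, cell_tag: &str) -> String {
    let cells: String = line
        .split(',')
        .map(|cell| format!("<{0}>{1}</{0}>", cell_tag, escape_html(cell.trim())))
        .collect();
    format!("<tr>{}</tr>", cells)
}

/// Converts simple CSV text to an HTML table. The first non-empty line
/// becomes the header row; every following line becomes a body row.
pub fn csv_to_html_table(csv: &str) -> String {
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());
    let header = match lines.next() {
        Some(line) => render_row(line, "th"),
        None => return String::new(), // nothing to render
    };
    let body: String = lines.map(|l| render_row(l, "td")).collect();
    format!("<table><thead>{}</thead><tbody>{}</tbody></table>", header, body)
}
```

Passing the medication table from the example record to `csv_to_html_table` would produce one header row with four `<th>` cells and three body rows.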
- Position: The YAML block must be at the very beginning of the message text
- Multiple Images: If a medical record consists of multiple images, the YAML block should be placed under the first image in the sequence
- Image Format: Images must be in PNG or JPEG format (both compressed regular photos and uncompressed file attachments) for proper OCR processing
- Formatting: The YAML block must be formatted as code within the Telegram message, not as plain text
| Field | Type | Description |
|---|---|---|
| `date` | String | Date of the medical record (`YYYY.MM.DD`) |
| `person` | String | Name of the person the record belongs to |
| `tags` | Array | List of tags/categories for the record |
| `place` | String | Medical facility or location |
| `doctor` | String | Doctor's name |

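The `date` field uses the `YYYY.MM.DD` format. A small validation sketch is shown below; `RecordDate` and `parse_record_date` are hypothetical names for illustration, and how strictly MedPack itself validates dates is not specified here.

```rust
/// Calendar date parsed from a `date` field (hypothetical helper type).
/// Deriving `Ord` with this field order gives chronological ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RecordDate {
    pub year: u16,
    pub month: u8,
    pub day: u8,
}

/// Parses a `YYYY.MM.DD` string, returning `None` on malformed input.
/// Only basic range checks are made (month 1-12, day 1-31).
pub fn parse_record_date(s: &str) -> Option<RecordDate> {
    let parts: Vec<&str> = s.trim().split('.').collect();
    if parts.len() != 3 {
        return None;
    }
    // Expect all-digit parts of widths 4, 2 and 2.
    let widths = [4usize, 2, 2];
    for (part, width) in parts.iter().zip(widths.iter()) {
        if part.len() != *width || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
    }
    let date = RecordDate {
        year: parts[0].parse().ok()?,
        month: parts[1].parse().ok()?,
        day: parts[2].parse().ok()?,
    };
    if (1..=12).contains(&date.month) && (1..=31).contains(&date.day) {
        Some(date)
    } else {
        None
    }
}
```

Because the format is zero-padded and ordered from year to day, parsed dates (and even the raw strings) sort chronologically.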
Tags now support HTML formatting for enhanced visual presentation in the generated PDFs. This is particularly useful for highlighting important issues or categorizing records with visual emphasis.
Examples:

    tags:
    - cardiology
    - <b>urgent</b>
    - <i>follow-up required</i>
    - ECG
    - <b style="color: red;">critical</b>

For each person found in the chat export, MedPack generates:

- `PersonName.pdf` - Complete medical document collection
- Table of Contents - At the beginning of each PDF, containing:
  - Record dates and tags
  - Page numbers with proper pagination
  - Clickable links to original Telegram messages
  - Doctor and location information
  - Professional formatting with Bootstrap CSS
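
The per-person output can be pictured with the following sketch, which groups records by the `person` field and derives one output file name per group. The `Record` type and `group_by_person` are hypothetical, and two details are assumptions rather than documented behaviour: that records are ordered by date within each PDF, and that the file name is the person's name used verbatim.

```rust
use std::collections::BTreeMap;

/// Hypothetical record summary; not MedPack's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub person: String,
    /// "YYYY.MM.DD", which sorts chronologically as plain text.
    pub date: String,
    pub title: String,
}

/// Groups records by person and returns (output file name, records) pairs,
/// sorted by person name, with each person's records ordered by date.
pub fn group_by_person(records: Vec<Record>) -> Vec<(String, Vec<Record>)> {
    let mut groups: BTreeMap<String, Vec<Record>> = BTreeMap::new();
    for record in records {
        groups.entry(record.person.clone()).or_default().push(record);
    }
    groups
        .into_iter()
        .map(|(person, mut recs)| {
            recs.sort_by(|a, b| a.date.cmp(&b.date));
            (format!("{}.pdf", person), recs)
        })
        .collect()
}
```

With records for "John Doe" and "Jane Doe", this yields two entries, `Jane Doe.pdf` and `John Doe.pdf`, each holding only that person's records.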
- Professional Layout: Clean, medical-grade document formatting
- Page Numbers: Consistent pagination throughout the document
- Headers & Footers: Record metadata displayed in document headers
- Telegram Links: Direct links to original messages for verification
- Progress Tracking: Real-time progress bars during processing
- Responsive Design: Bootstrap-based HTML rendering for PDFs
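
As an illustration of how table-of-contents page numbers relate to consistent pagination, the sketch below computes the first page of each record from per-record page counts. It assumes, hypothetically, that the table of contents occupies the first pages of the PDF and that records follow back to back; `first_pages` is not a MedPack function.

```rust
/// Returns the 1-based first page of each record, given how many pages the
/// table of contents occupies and how many pages each record occupies.
pub fn first_pages(toc_pages: usize, record_pages: &[usize]) -> Vec<usize> {
    // The first record starts right after the table of contents.
    let mut next_page = toc_pages + 1;
    record_pages
        .iter()
        .map(|&pages| {
            let start = next_page;
            next_page += pages;
            start
        })
        .collect()
}
```

For a one-page table of contents followed by records of 3, 1 and 2 pages, the records start on pages 2, 5 and 6.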
If a required external tool is missing, MedPack reports a command-not-found error:

    medpack: error: `img2pdf` not found in PATH

Solution: Install the missing prerequisites using your package manager.
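
To check ahead of time which tools are missing, a lookup along the following lines can help. This is a sketch, not MedPack's own check: `find_in_path` and `missing_tools` are hypothetical helpers, the authoritative tool list lives in `src/command.rs`, and the lookup only tests that a file with the tool's name exists (it ignores execute permissions and Windows file extensions).

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Looks for a file named `tool` in each directory of a PATH-style value.
pub fn find_in_path(tool: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}

/// Returns the tools from `required` that are not found in `path_var`.
pub fn missing_tools<'a>(required: &[&'a str], path_var: &OsStr) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|tool| find_in_path(tool, path_var).is_none())
        .collect()
}
```

In practice the second argument would be the value of the `PATH` environment variable, for example obtained with `std::env::var_os("PATH")`, and the first a list such as `["img2pdf", "ocrmypdf"]`.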
Use the `--no-ocr` flag to completely disable OCR processing:

    medpack --no-ocr

Note: The `--no-ocr` flag completely disables OCR processing for images, which significantly speeds up processing but means that text within images will not be extracted or searchable in the final PDF.
Enable debug mode to inspect temporary files:

    medpack --preserve-tmp

This will output paths to temporary directories:

    tmp folders: /tmp/medpack_html_xyz /tmp/medpack_img_xyz /tmp/medpack_label_xyz

This project is licensed under the MIT License. See the LICENSE file for details.
Support: For issues and questions, please use the GitHub issue tracker.

Updates: Check releases for the latest features and bug fixes.

Personal Support: If you have any questions or need help, feel free to reach out to me personally on Telegram: @vtvz_me