A high-performance, multithreaded file compression and decompression tool based on the Huffman coding algorithm, written entirely in C++17.
The tool splits large files into fixed-size chunks, compresses the chunks in parallel worker threads, and writes them to a single output file; decompression reverses the process chunk by chunk. Each run reports the compression ratio and the processing time.

- ✅ Multithreaded processing – Utilizes multiple CPU cores for faster compression/decompression
- ✅ Huffman Coding – Lossless data compression
- ✅ Chunk-based architecture – Handles very large files efficiently
- ✅ Cross-platform path handling – Works seamlessly on Windows, Linux, and macOS
- ✅ Thread-safe queues – Robust concurrency management
- ✅ Compression statistics – View compression ratio and processing time

During compression:

- The input file is read in fixed-size chunks (default 1 MB each).
- Each chunk is sent to a worker thread through a thread-safe queue.
- The worker compresses the chunk using Huffman coding and sends it back.
- The main thread writes the compressed data and its Huffman code metadata to the output file.
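
The first step, reading the input in fixed-size chunks, could look roughly like the sketch below. `Chunk` and `readChunks` are hypothetical names chosen for illustration and are not taken from `main.cpp`; the index field is an assumption about how chunk order might be tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical chunk record: the index lets a writer restore the original order.
struct Chunk {
    std::size_t index;
    std::string data;
};

// Read `in` to the end in pieces of at most `chunkSize` bytes.
std::vector<Chunk> readChunks(std::istream& in, std::size_t chunkSize) {
    assert(chunkSize > 0);  // a zero-sized chunk would never make progress
    std::vector<Chunk> chunks;
    std::string buffer(chunkSize, '\0');
    // read() fails on the final short chunk, so gcount() is checked as well.
    while (in.read(buffer.data(), static_cast<std::streamsize>(chunkSize)) || in.gcount() > 0) {
        const auto got = static_cast<std::size_t>(in.gcount());
        chunks.push_back({chunks.size(), buffer.substr(0, got)});
    }
    return chunks;
}
```

With a 10-byte input and a chunk size of 4, this yields three chunks of 4, 4, and 2 bytes. For a real file, the stream would be a `std::ifstream` opened with `std::ios::binary`, and chunks would more likely be handed to the queue one at a time rather than collected in a vector.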

During decompression:

- The main thread reads the Huffman code metadata for each chunk.
- It sends the compressed data to multiple worker threads.
- Each worker reconstructs the original data for its chunk.
- The main thread writes the decompressed chunks sequentially.
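
Because workers can finish out of order, writing chunks sequentially needs some form of reordering. The sketch below shows one possible approach, buffering early arrivals until the next expected index shows up. `OrderedWriter` is a hypothetical helper, not a class from `main.cpp`, and the README does not say how the tool itself keeps chunks in order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: accepts chunks in any order, writes them in index order.
class OrderedWriter {
public:
    explicit OrderedWriter(std::ostream& out) : out_(out) {}

    void submit(std::size_t index, std::string data) {
        assert(index >= next_);  // an index below next_ was already written
        pending_.emplace(index, std::move(data));
        // Flush every buffered chunk that is now contiguous with the output.
        for (auto it = pending_.find(next_); it != pending_.end(); it = pending_.find(next_)) {
            out_.write(it->second.data(), static_cast<std::streamsize>(it->second.size()));
            pending_.erase(it);
            ++next_;
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::ostream& out_;
    std::map<std::size_t, std::string> pending_;  // chunks waiting for earlier ones
    std::size_t next_ = 0;                        // index of the next chunk to write
};
```

Submitting chunk 2, then 0, then 1 writes chunk 0 immediately, holds chunk 2, and flushes chunks 1 and 2 together once chunk 1 arrives. The trade-off is memory: a slow early chunk keeps later ones buffered.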

Data flow between the main thread, the queues, and the workers:

    ┌─────────────────────────────┐
    │         Main Thread         │
    ├──────────────┬──────────────┤
    │ Reads file   │ Writes file  │
    │ in chunks    │ sequentially │
    └──────┬───────┴──────────────┘
           │
           ▼
    ┌──────────────┐     ┌──────────────┐
    │  SafeQueue   │ --> │ WorkerThread │
    │ (inputQueue) │     │ (compression)│
    └──────────────┘     └──────────────┘
                                │
                                ▼
                         ┌──────────────┐
                         │  SafeQueue   │
                         │ (outputQueue)│
                         └──────────────┘
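
The diagram names `SafeQueue` but does not show its interface. A minimal sketch of such a queue, assuming a mutex plus condition variable design, might look like this. The `push`, `pop`, and `close` methods and the `drainWithWorkers` demo function are assumptions made for illustration and may differ from the real class.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class SafeQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a caller error in this sketch
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signal that no more items will arrive and wake all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Block until an item is available; empty result once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Demo: several consumers drain one queue; returns the sum of everything popped.
long long drainWithWorkers(int itemCount, int workerCount) {
    SafeQueue<int> queue;
    std::atomic<long long> total{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&queue, &total] {
            while (auto item = queue.pop()) total += *item;
        });
    }
    for (int i = 1; i <= itemCount; ++i) queue.push(i);
    queue.close();
    for (auto& worker : workers) worker.join();
    return total.load();
}
```

The explicit `close()` gives workers a clean way to exit their loop; a sentinel value pushed once per worker would be another common choice.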

Make sure you’re using a C++17-compatible compiler.

Build:

    g++ -std=c++17 -pthread main.cpp -o huffman

Compress a file:

    ./huffman
    Enter mode (c for compress, d for decompress): c
    Enter input file path: input.txt
    Enter output file path: output.huf
    Enter number of threads (default 4): 4
    Enter chunk size in MB (default 1): 1

Decompress a file:

    ./huffman
    Enter mode (c for compress, d for decompress): d
    Enter input file path: input.huf
    Enter output file path: output.txt
    Enter number of threads (default 4): 4
    Enter chunk size in MB (default 1): 1

---
## 📊 Example Output

    Starting compression with 4 threads and chunk size 1 MB...
    Processing chunk 0 (20% complete)
    Processing chunk 1 (40% complete)
    ...
    -------- Operation Statistics --------
    Operation: Compression
    Input file size: 10,485,760 bytes
    Output file size: 3,242,112 bytes
    Compression ratio: 69.09%
    Processing time: 4.32 seconds
    Threads used: 4
    Chunk size: 1 MB
    ----------------------------------
    Operation completed successfully! Output file: output.huf

Huffman Coding is a lossless compression algorithm that assigns shorter binary codes to more frequent characters and longer codes to less frequent ones.
Example:

| Character | Frequency | Code |
|---|---|---|
| a | 10 | 0 |
| b | 5 | 10 |
| c | 2 | 110 |
| d | 1 | 111 |

Text "abac" → binary "0100110"
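
The example above can be reproduced with a short sketch that builds the code table from the frequencies and then encodes and decodes the text. This is an illustrative version, not the code from `main.cpp`: the names `Node`, `buildCodes`, `encode`, and `decode` are hypothetical, bits are kept as a string of `'0'`/`'1'` characters for readability, and the rule "heavier subtree gets bit 0" is an assumption chosen so the result matches the table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Tree node stored by index in a vector; -1 marks "no child" (a leaf).
struct Node {
    char symbol;
    std::size_t freq;
    int left = -1;   // child reached by bit 0
    int right = -1;  // child reached by bit 1
};

// Build a symbol -> bit-string table from symbol frequencies.
std::map<char, std::string> buildCodes(const std::map<char, std::size_t>& freqs) {
    std::vector<Node> nodes;
    auto higherFreq = [&nodes](int a, int b) { return nodes[a].freq > nodes[b].freq; };
    // Min-heap of node indices ordered by frequency.
    std::priority_queue<int, std::vector<int>, decltype(higherFreq)> heap(higherFreq);
    for (const auto& [symbol, freq] : freqs) {
        nodes.push_back({symbol, freq});
        heap.push(static_cast<int>(nodes.size()) - 1);
    }
    // Repeatedly merge the two least frequent nodes.
    while (heap.size() > 1) {
        int smaller = heap.top(); heap.pop();
        int larger = heap.top(); heap.pop();
        nodes.push_back({'\0', nodes[smaller].freq + nodes[larger].freq, larger, smaller});
        heap.push(static_cast<int>(nodes.size()) - 1);
    }
    std::map<char, std::string> codes;
    if (heap.empty()) return codes;
    // Walk the tree from the root, extending the prefix by one bit per level.
    std::vector<std::pair<int, std::string>> stack;
    stack.emplace_back(heap.top(), "");
    while (!stack.empty()) {
        auto [index, prefix] = stack.back();
        stack.pop_back();
        const Node& node = nodes[index];
        if (node.left < 0) {
            codes[node.symbol] = prefix.empty() ? "0" : prefix;  // lone symbol still needs a bit
            continue;
        }
        stack.emplace_back(node.left, prefix + "0");
        stack.emplace_back(node.right, prefix + "1");
    }
    return codes;
}

std::string encode(const std::string& text, const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);
    return bits;
}

// Prefix-free codes let the decoder emit a symbol as soon as the bits match.
std::string decode(const std::string& bits, const std::map<char, std::string>& codes) {
    std::map<std::string, char> symbolFor;
    for (const auto& [symbol, code] : codes) symbolFor[code] = symbol;
    std::string text, current;
    for (char bit : bits) {
        current += bit;
        auto match = symbolFor.find(current);
        if (match != symbolFor.end()) {
            text += match->second;
            current.clear();
        }
    }
    assert(current.empty());  // leftover bits would mean truncated input
    return text;
}
```

Calling `buildCodes({{'a', 10}, {'b', 5}, {'c', 2}, {'d', 1}})` gives the codes in the table, and `encode("abac", codes)` gives `"0100110"`: 7 bits instead of the 32 bits needed for four 8-bit characters. A real implementation would pack the bits into bytes rather than store one character per bit.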

Project layout:

    ├── main.cpp          # Main source file
    ├── README.md         # Documentation
    └── (output files)
        ├── input.txt
        ├── output.huf
        └── decompressed.txt

| Component | Purpose |
|---|---|
| `normalizePath()` | Cleans and converts file paths |
| `SafeQueue` | Thread-safe queue for inter-thread communication |
| `HuffmanNode` | Represents tree node with char and frequency |
| `buildHuffmanTree()` | Builds Huffman tree from frequency map |
| `generateCodes()` | Generates binary codes recursively |
| `compressChunk()` | Compresses a single file chunk |
| `decompressChunk()` | Restores original chunk |
| `worker()` | Handles per-thread compression |
| `decompressWorker()` | Handles per-thread decompression |

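
The table only says that `normalizePath()` cleans and converts file paths. As a rough idea of what that could involve, the sketch below strips surrounding quotes, converts backslashes to forward slashes, and collapses `.` and `..` segments with `std::filesystem`. The function name `normalizePathSketch` and all three behaviours are assumptions; the real function may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical stand-in for normalizePath(); behaviour is assumed, not confirmed.
std::string normalizePathSketch(std::string raw) {
    assert(!raw.empty());  // an empty path is treated as a caller error here
    // Strip one pair of surrounding double quotes, as left by some shells.
    if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"') {
        raw = raw.substr(1, raw.size() - 2);
    }
    // Treat backslashes as separators on every platform.
    std::replace(raw.begin(), raw.end(), '\\', '/');
    // Collapse "." and ".." segments and return a forward-slash form.
    return std::filesystem::path(raw).lexically_normal().generic_string();
}
```

For example, `data\in\..\input.txt` becomes `data/input.txt`. Note that converting backslashes would change a POSIX file name that legitimately contains one, so this is a convenience trade-off rather than a lossless transformation.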

- Use higher thread counts (`NUM_THREADS`) for large files.
- Adjust chunk size (`CHUNK_SIZE`) for optimal CPU utilization.
- SSD storage improves I/O speed.
- Avoid running with too many threads on low-core CPUs.
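
One way to act on the last tip is to cap the requested thread count at the number of hardware threads the machine reports. The helper below is a hypothetical sketch, not part of `main.cpp`; the fallback of 4 mirrors the default shown in the tool's prompt.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: clamp a requested worker count to the hardware thread count.
unsigned chooseThreadCount(unsigned requested,
                           unsigned hardware = std::thread::hardware_concurrency()) {
    if (requested == 0) requested = 4;    // assumed default, taken from the prompt text
    if (hardware == 0) return requested;  // hardware_concurrency() may return 0 if unknown
    const unsigned chosen = std::min(requested, hardware);
    assert(chosen >= 1);
    return chosen;
}
```

On a 4-core machine without SMT, a request for 8 threads would be reduced to 4, while a request for 2 would be left alone.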
- C++17 or later
- g++ / clang++ / MSVC
- Supported OS: Windows, Linux, macOS
This project is open-source and free to use under the MIT License.

Palguna Shetty

- 🎓 B.E. in Computer Science & Engineering (2026)
- 📧 palgunashetty263@example.com
- 🌐 GitHub Profile

“Parallelism is not just speed — it’s efficiency done right.”