This project splits FLAC audio files into 20-second segments and transcribes them with OpenAI's Whisper speech-recognition model through the Hugging Face Inference API.
- Audio Chunking: splits FLAC files into fixed-length segments (20 seconds by default, configurable) using PyDub
- Whisper Integration: sends each chunk to a Whisper model for transcription
- Results Management: writes all transcriptions to a formatted text file
- File Protection: sets the results file to read-only after creation
- Python 3.8+
- FFmpeg (required by PyDub for audio processing):
  - macOS: `brew install ffmpeg`
  - Linux (Debian/Ubuntu): `sudo apt-get install ffmpeg`
  - Windows: download from https://ffmpeg.org/download.html
- Hugging Face credentials:
  - Create an account at https://huggingface.co
  - Create an access token (used as `HF_TOKEN`) at https://huggingface.co/settings/tokens
  - Set up a Hugging Face Inference Endpoint that serves a Whisper model
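With a token and an endpoint in place, each chunk is typically sent as raw audio bytes in an authenticated POST request. The sketch below assumes the endpoint accepts a FLAC request body with a `Bearer` token and answers with JSON containing a `text` field, which is how Hugging Face automatic-speech-recognition endpoints are commonly served. `chunk_and_transcribe.py` may use `requests` or `huggingface_hub` instead, and both function names here are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def build_transcription_request(endpoint: str, token: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build a POST request carrying one FLAC chunk (hypothetical helper)."""
    return urllib.request.Request(
        endpoint,
        data=audio_bytes,
        headers={
            "Authorization": f"Bearer {token}",  # the HF_TOKEN value
            "Content-Type": "audio/flac",         # assumption: endpoint accepts raw FLAC bytes
        },
        method="POST",
    )


def transcribe_bytes(endpoint: str, token: str, audio_bytes: bytes, timeout: float = 60.0) -> str:
    """Send one chunk and return its transcription text."""
    request = build_transcription_request(endpoint, token, audio_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: the endpoint responds with JSON shaped like {"text": "..."}.
    return payload["text"]
```

Separating request construction from sending keeps the headers easy to inspect without making a network call.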
- Install dependencies: `pip install -r requirements.txt`
- Configure environment variables in a `.env` file:

  ```
  HF_TOKEN=your_huggingface_token_here
  HF_INFERENCE_ENDPOINT=your_inference_endpoint_url_here
  ```

- Prepare audio files:
  - Place FLAC files in the `audio/` folder
  - Make sure the files are in FLAC format with a `.flac` extension
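The two settings have to be read from `.env` before any API call. The project likely relies on a package such as `python-dotenv` for this; the standard-library parser below is a hypothetical stand-in that handles plain `KEY=value` lines, `#` comments, and optional quotes, and lets variables already set in the real environment take precedence.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(env_path: str = ".env") -> tuple[str, str]:
    """Return (HF_TOKEN, HF_INFERENCE_ENDPOINT); real environment variables win over the file."""
    path = Path(env_path)
    file_values = parse_env_text(path.read_text(encoding="utf-8")) if path.is_file() else {}
    settings = []
    for name in ("HF_TOKEN", "HF_INFERENCE_ENDPOINT"):
        value = os.environ.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set in the environment or in {env_path}")
        settings.append(value)
    return settings[0], settings[1]
```

Failing early with a named variable makes a missing credential obvious before any audio is processed.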
Run the main script: `python chunk_and_transcribe.py`

The script:
- Scans the `audio/` folder for FLAC files
- For each FLAC file:
  - Creates 20-second chunks
  - Saves the chunks to the `chunks/{filename_without_extension}/` folder
- Sends all chunks to the Whisper endpoint for transcription
- Writes the results to `transcription_results.txt` (read-only)
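The scanning and chunking steps can be sketched with PyDub. This is an illustration under assumptions, not the project's code: `find_flac_files`, `chunk_bounds`, `split_flac`, and the `_chunk000.flac` naming scheme are hypothetical. PyDub reports `len(audio)` and indexes slices in milliseconds, so the chunk length in seconds is multiplied by 1000, and the final chunk keeps whatever audio remains.

```python
from __future__ import annotations

from pathlib import Path


def find_flac_files(audio_folder: str = "audio") -> list[Path]:
    """List .flac files in the audio folder, sorted for a stable processing order."""
    return sorted(Path(audio_folder).glob("*.flac"))


def chunk_bounds(duration_ms: int, chunk_length_s: int = 20) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs covering the whole file; the last chunk may be shorter."""
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    step = chunk_length_s * 1000  # PyDub positions are in milliseconds
    return [(start, min(start + step, duration_ms)) for start in range(0, duration_ms, step)]


def split_flac(flac_path: str, chunks_folder: str = "chunks", chunk_length_s: int = 20) -> list[Path]:
    """Split one FLAC file into chunks under chunks/{stem}/ and return the chunk paths."""
    # Imported here so chunk_bounds can be used without PyDub/FFmpeg installed.
    from pydub import AudioSegment

    source = Path(flac_path)
    out_dir = Path(chunks_folder) / source.stem  # filename without extension
    out_dir.mkdir(parents=True, exist_ok=True)

    audio = AudioSegment.from_file(str(source), format="flac")
    paths = []
    for index, (start, end) in enumerate(chunk_bounds(len(audio), chunk_length_s)):
        chunk_path = out_dir / f"{source.stem}_chunk{index:03d}.flac"  # hypothetical naming
        audio[start:end].export(str(chunk_path), format="flac")
        paths.append(chunk_path)
    return paths
```

For example, `for flac in find_flac_files(): split_flac(str(flac))` would fill `chunks/` one subfolder per input file.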
- Chunks: stored in the `chunks/{original_filename}/` directory
- Transcriptions: saved to `transcription_results.txt`
  - Formatted with each chunk name and its transcription result
  - Includes the generation timestamp
  - Automatically set to read-only
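Writing the results and locking the file can be done with `os.chmod` and the `stat` permission constants. The layout produced by the hypothetical `format_results` helper is only one way to show chunk names, transcriptions, and a generation timestamp; the real file's format may differ. Mode `0o444` removes write permission for everyone, which `chmod u+w transcription_results.txt` reverses for the owner.

```python
from __future__ import annotations

import os
import stat
from datetime import datetime
from pathlib import Path


def format_results(results: list[tuple[str, str]], generated_at: datetime) -> str:
    """Render (chunk_name, transcription) pairs under a generation timestamp (hypothetical layout)."""
    lines = [f"Transcription results (generated {generated_at:%Y-%m-%d %H:%M:%S})", ""]
    for chunk_name, text in results:
        lines.append(f"[{chunk_name}]")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


def write_read_only(path: str, content: str) -> None:
    """Write the results file, then clear all write permission bits."""
    target = Path(path)
    if target.exists():
        # A read-only file from a previous run cannot be overwritten, so restore owner write first.
        os.chmod(target, target.stat().st_mode | stat.S_IWUSR)
    target.write_text(content, encoding="utf-8")
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # 0o444: read-only for all
```

Restoring owner write before a rerun is a design choice of this sketch; the actual script may instead refuse to overwrite an existing results file.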
```
chunk-script/
├── audio/                      # Place FLAC files here
├── chunks/                     # Generated chunk files
├── sciprt2.py                  # Original script (reference)
├── chunk_and_transcribe.py     # Main processing script
├── transcription_results.txt   # Output file (read-only)
├── requirements.txt            # Python dependencies
├── .env                        # Environment variables (not in repo)
└── README.md                   # This file
```
Edit `chunk_and_transcribe.py` to customize:
- Chunk length: `chunk_length = 20` (in seconds)
- Audio folder: `audio_folder = "audio"`
- Chunks folder: `chunks_folder = "chunks"`
- Output file: `output_text_file = "transcription_results.txt"`
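Changing `chunk_length` changes how many API calls each file needs. Assuming the final partial chunk is kept (as a slice loop over `range(0, len(audio), chunk_ms)` would do), a file of duration $D$ seconds split with `chunk_length` $= L$ seconds yields $n$ chunks, the last of length $\ell_{\text{last}}$:

```math
n = \left\lceil \frac{D}{L} \right\rceil, \qquad \ell_{\text{last}} = D - (n - 1)\,L
```

For example, a 65-second file with $L = 20$ gives $n = \lceil 3.25 \rceil = 4$ chunks, the last one $65 - 3 \cdot 20 = 5$ seconds long.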
- FFmpeg errors: install FFmpeg (see the FFmpeg prerequisite above)
- No FLAC files found: verify that the files are in the `audio/` folder and that each has a `.flac` extension
- API errors:
  - Verify that `HF_TOKEN` in `.env` is correct
  - Check that the `HF_INFERENCE_ENDPOINT` URL is valid
  - Make sure your account has API credits or access
- Cannot edit `transcription_results.txt`: the file is intentionally read-only. Restore write permission if needed: `chmod u+w transcription_results.txt`
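Several of these checks can be run before starting a long transcription job. The hypothetical `preflight_problems` helper below looks for FFmpeg on `PATH` with `shutil.which`, for `.flac` files in `audio/`, and for the two environment variables. It reads the process environment only, so it assumes `.env` has already been loaded, and it cannot confirm that the token is valid or that the account has credits.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def preflight_problems(audio_folder: str = "audio") -> list[str]:
    """Return a list of setup problems; an empty list means the basic checks passed."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg is not on PATH; PyDub needs it to read and write FLAC.")

    folder = Path(audio_folder)
    if not folder.is_dir():
        problems.append(f"Audio folder {folder}/ does not exist.")
    elif not any(folder.glob("*.flac")):
        problems.append(f"No .flac files found in {folder}/.")

    # Only checks that the variables are present, not that the credentials work.
    for name in ("HF_TOKEN", "HF_INFERENCE_ENDPOINT"):
        if not os.environ.get(name):
            problems.append(f"{name} is not set (check .env).")
    endpoint = os.environ.get("HF_INFERENCE_ENDPOINT", "")
    if endpoint and not endpoint.startswith("https://"):
        problems.append("HF_INFERENCE_ENDPOINT does not look like an https:// URL.")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("-", problem)
```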
- Original script: `sciprt2.py` (reference chunking code)
- Built with PyDub and a Hugging Face-hosted Whisper model
- Each chunk is processed sequentially for reliable API handling
- Results include both successful transcriptions and any errors
- The generation timestamp in the results file helps track when transcriptions were created
- Read-only status prevents accidental modification of results
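Sequential processing with per-chunk error capture, as described in these notes, can be sketched with a plain loop. `transcribe_sequentially` and its optional `delay_s` pause are hypothetical, and `transcribe` stands for whatever function sends one chunk to the endpoint. An exception for one chunk is recorded as that chunk's result instead of stopping the run, so the results file can list both successes and errors.

```python
from __future__ import annotations

import time
from typing import Callable


def transcribe_sequentially(
    chunk_names: list[str],
    transcribe: Callable[[str], str],
    delay_s: float = 0.0,
) -> list[tuple[str, str]]:
    """Transcribe chunks one at a time, recording errors instead of aborting."""
    results = []
    for name in chunk_names:
        try:
            text = transcribe(name)
        except Exception as exc:  # keep going; the error text becomes this chunk's result
            text = f"ERROR: {exc}"
        results.append((name, text))
        if delay_s:
            time.sleep(delay_s)  # optional pause between API calls
    return results
```

Processing one request at a time is slower than running requests in parallel, but it keeps the request rate predictable and makes each failure easy to attribute to a single chunk.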