Skip to content

ADD transcription local functionality#30

Merged
SergioNR merged 22 commits intonextfrom
add-transcription-functionality
Jan 2, 2026
Merged

ADD transcription local functionality#30
SergioNR merged 22 commits intonextfrom
add-transcription-functionality

Conversation

@SergioNR
Copy link
Copy Markdown
Member

@SergioNR SergioNR commented Jan 2, 2026

This is done to not rely on AWS

Copy link
Copy Markdown

@kilo-code-bot kilo-code-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All inline comments added with specific fixes for critical issues.

languageCode: 'es-ES',
};

if (process.env.TRANSCRIPTION_ENABLED === 'true') {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: Environment variable never set

Code checks process.env.TRANSCRIPTION_ENABLED === 'true' but this environment variable is never defined in compose.yaml. Transcription will never be enabled.

Fix: Add TRANSCRIPTION_ENABLED: 'true' to the server environment variables in compose.yaml.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, will not longer be used

Comment thread server/services/analysisService.js Outdated
const transcriptionJobResultString = await fetchSingleTranscriptionJob(transcriptionJobDetails.AnalysisEntry.analysis_id, transcriptionJob.TranscriptionJobName);

// 3. Parse the transcription job result (JSON string to object)
const transcriptionJobResult = JSON.parse(transcriptionJobResultString);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: References undefined variables

Lines 18 and 21 reference transcriptionJobResultString and transcriptionJobResult that were removed when AWS Transcribe logic was deleted. This function will crash.

Fix: Remove or fix these undefined variable references.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file has been removed

});

// Make request with query parameters
const response = await axios.post(`http://localhost:9007/asr?${params.toString()}`, form, {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ WARNING: Hardcoded service URL

Uses http://localhost:9007/asr which won't work in production. Should use environment variable.

Suggested change
const response = await axios.post(`http://localhost:9007/asr?${params.toString()}`, form, {
const transcriptionServiceUrl = process.env.TRANSCRIPTION_SERVICE_URL || 'http://localhost:9007/asr';
const response = await axios.post(`${transcriptionServiceUrl}?${params.toString()}`, form, {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed - endpoint is now set as an env variable

import { processPendingTranscriptionJobs } from '../controllers/transcriptionController.js';
import { logError } from '../config/loggerFunctions.js';

export const getPendingTranscriptionJobScheduler = new CronJob('* * * * *', async () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ WARNING: High-frequency cron job

Runs every minute (* * * * *) which might be excessive for transcription jobs. Consider running every 5-15 minutes.

Suggested change
export const getPendingTranscriptionJobScheduler = new CronJob('* * * * *', async () => {
export const getPendingTranscriptionJobScheduler = new CronJob('*/5 * * * *', async () => {


logInfo(`Transcription job ${transcriptionJob.analysis_entry_id} completed successfully`);
} catch (error) {
logError(`Error processing transcription job, ${transcriptionJob.analysis_entry_id}`, error);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ WARNING: Missing job status updates for failures

Only logs errors but doesn't update the database job status to FAILED. Failed jobs remain in PENDING state indefinitely.

Fix: Update job status to FAILED in the catch block before logging the error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional, so it gets picked up again in the next iteration

task: 'transcribe',
output: 'json',
word_timestamps: 'false', // Works with video audio
language: 'es',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ WARNING: Inconsistent language handling

Hardcodes language: 'es' but the database stores languageCode: 'es-ES'. Should use the actual language code from the database.

suggestion\n const params = new URLSearchParams({\n task: 'transcribe',\n output: 'json',\n word_timestamps: 'false',\n language: transcriptionJob.language_code.split('-')[0], // Extract 'es' from 'es-ES'\n vad_filter: 'true',\n });\n

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented Jan 2, 2026

✅ Previous Issues Addressed - PR Merged

The following issues from my previous review have been resolved in the merged PR:

  • Environment variable issue - Removed TRANSCRIPTION_ENABLED check entirely
  • Undefined variables - Entire analysisService.js file removed as no longer needed
  • Hardcoded service URL - Now uses TRANSCRIPTION_ENDPOINT environment variable
  • Cron frequency - Changed from every minute to every 15 minutes (*/15 * * * *)
  • Language handling - Fixed to properly extract language code from database format
  • Job status behavior - Confirmed intentional design for retry mechanism

Summary

PR Status:MERGED (2026-01-02T12:56:43Z)
22 commits | 206 additions, 776 deletions | 16 files changed

This was a major architectural refactor replacing AWS Transcribe with local Whisper ASR service (faster-whisper-transcribe), including:

  • New transcription controller and models
  • Docker service integration
  • Database schema updates
  • Updated cron job scheduling
  • Removed AWS Transcribe dependencies

Recommendation: All critical and warning issues from the original review have been addressed in the merged implementation.

@SergioNR SergioNR merged commit 1746109 into next Jan 2, 2026
@SergioNR SergioNR deleted the add-transcription-functionality branch January 2, 2026 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant