A unified CLI toolkit for media intelligence, providing audio processing, video analysis, and web research capabilities — all powered by state-of-the-art AI models running inside Docker.
Extract meaning, not just metadata. Composable AI operations designed for developers.
- Docker installed and running
Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/famda/semantics/main/docs/install.ps1 | iex
```

Linux / macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/famda/semantics/main/docs/install.sh | bash
```

All processing runs inside a Docker container — no Python, CUDA, or model dependencies needed on your machine.
After installation, restart your terminal and verify:
```bash
semantics version
```

Typical invocations:

```bash
semantics audio -i interview.mp4 -o ./results -t -d
semantics video -i video.mp4 -o ./results --from-segments -s -eo
semantics research -o ./results -s 'AI agents 2026' --download
semantics update
```

How it works: Each command transparently starts a Docker container, maps your local input and output paths into it, runs the AI pipeline, and writes results back to your machine. You never need to interact with Docker directly.
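Conceptually, the wrapper translates each `semantics` invocation into a `docker run` with bind mounts for the input and output paths. The sketch below is an illustrative approximation of that translation, not the installer's actual code — the image tag and the `/workspaces/...` mount points are assumptions for demonstration:

```python
from pathlib import Path

def to_docker_argv(tool: str, args: list[str],
                   image: str = "famda/semantics:cli-latest") -> list[str]:
    """Hypothetical sketch: rewrite -i/-o host paths into container mounts."""
    mounts: list[str] = []
    forwarded: list[str] = []
    it = iter(args)
    for a in it:
        if a in ("-i", "--input"):
            host = Path(next(it)).absolute()
            # Mount the file's parent directory read-only into the container.
            mounts += ["-v", f"{host.parent}:/workspaces/input:ro"]
            forwarded += [a, f"/workspaces/input/{host.name}"]
        elif a in ("-o", "--output"):
            host = Path(next(it)).absolute()
            mounts += ["-v", f"{host}:/workspaces/output"]
            forwarded += [a, "/workspaces/output"]
        else:
            forwarded.append(a)  # feature flags pass through unchanged
    return ["docker", "run", "--rm", *mounts, image, tool, *forwarded]

argv = to_docker_argv("audio", ["-i", "interview.mp4", "-o", "./results", "-t", "-d"])
print(" ".join(argv))
```

The point is only that path mapping is mechanical; the real wrapper handles GPU flags, image selection, and cleanup for you.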
Audio and speech processing toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input media file (required) |
| `-o, --output PATH` | Output folder path (required) |
| `-e, --enhance-audio` | Enhance audio quality |
| `-n, --denoise` | Denoise the audio file |
| `-s, --stem` | Enable source separation (extract vocals) |
| `-v, --vad` | Enable voice activity detection |
| `-t, --transcribe` | Transcribe audio to text |
| `-te, --transcribe-experimental` | Ultra-fast transcription with CTC alignment |
| `-d, --diarize` | Enable speaker diarization |
| `-ctc, --ctc-align` | Enable CTC forced alignment (requires `-t` and `-d`) |
| `-c, --classify` | Enable audio classification |
| `-ct, --classify-timeline` | Enable timeline audio classification |
| `-em, --emotion` | Enable emotion recognition (requires `-t` or `-ctc`) |
| `-se, --scene` | Enable scene/chapter detection (requires `-t` or `-ctc`) |
| `-ner, --named-entities` | Extract named entities from the transcript (requires `-t` or `-ctc`) |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
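The audio flags compose freely, so batch jobs reduce to looping over files and shelling out once per input. The sketch below only builds the command lines (the folder names and default flags are illustrative); swap the `print` for `subprocess.run(cmd, check=True)` to actually execute them:

```python
import shlex
from pathlib import Path

def audio_commands(input_dir: str, output_root: str,
                   flags: tuple[str, ...] = ("-n", "-s", "-t", "-d")) -> list[list[str]]:
    """Build one `semantics audio` invocation per media file, each with its own output folder."""
    cmds = []
    for media in sorted(Path(input_dir).glob("*.mp4")):
        out = Path(output_root) / media.stem  # e.g. ./results/interview
        cmds.append(["semantics", "audio", "-i", str(media), "-o", str(out), *flags])
    return cmds

for cmd in audio_commands("./recordings", "./results"):
    print(shlex.join(cmd))  # dry run; replace with subprocess.run(cmd, check=True)
```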
Example:
```bash
semantics audio -i recording.mp4 -o ./audio_results -n -s -t -d -em
```

Video analysis and object detection toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input video file or YouTube URL (required) |
| `-o, --output PATH` | Output folder path (required) |
| `--from-frames` | Analyze from extracted video frames |
| `--from-clustering` | Analyze from keyframe clustering on frames |
| `--from-segments` | Analyze from keyframes/segments (one of these three modes is required) |
| `-t, --tiles` | Enable video tiling |
| `-eo, --extract-objects` | Extract objects from the video |
| `-co, --cluster-objects` | Cluster the extracted objects |
| `-classes, --object-classes TEXT` | Object classes to extract (default: `person`) |
| `--save-annotations` | Persist detection crops and masks to disk |
| `-c, --captions` | Extract captions from the video |
| `-s, --scenes` | Enable scene extraction |
| `-ocr, --extract-text` | Enable text extraction (OCR) |
| `-cl, --classify` | Enable frame classification |
| `-ner, --named-entities` | Extract named entities from captions (requires `-c`) |
| `-a, --actions` | Recognize human actions in the video |
| `--download-resolution INT` | Maximum video height when downloading from a URL |
| `--save-frames` | Save extracted frames to disk |
| `-fps, --frames-per-second INT` | Frames per second to analyze (default: 1) |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
Note: You must specify one of `--from-frames`, `--from-clustering`, or `--from-segments`.
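If you script around the CLI, this rule is easy to enforce before launching a container. A minimal sketch of the check, assuming the three modes are mutually exclusive (the error message is illustrative):

```python
MODES = ("--from-frames", "--from-clustering", "--from-segments")

def check_mode(args: list[str]) -> str:
    """Return the selected analysis mode, or raise if zero or several are given."""
    chosen = [m for m in MODES if m in args]
    if len(chosen) != 1:
        raise ValueError(f"specify exactly one of: {', '.join(MODES)}")
    return chosen[0]
```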
Example:
```bash
semantics video -i video.mp4 -o ./video_results --from-segments -s -eo -c
```

Web research and content extraction toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input file for processing |
| `-o, --output PATH` | Output folder path (required) |
| `-s, --search TEXT` | Text query to research |
| `--search-limit INT` | Maximum number of web/video results |
| `--download` | Download/crawl search results (use with `-s`) |
| `--download-url URL` | Specific URL to crawl (alternative to `--download`) |
| `--download-deep` | Enable BFS deep crawling |
| `--download-max-depth INT` | Maximum traversal depth when deep crawling |
| `--download-max-pages INT` | Page budget when deep crawling |
| `--download-include-external` | Allow the deep crawl to follow external domains |
| `--download-word-threshold INT` | Minimum word count for page materialization |
| `--structured` | Extract structured content from crawled pages |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
Note: You must specify one of `-s` (search query), `--download-url`, or `-i` (input file).
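The deep-crawl flags map onto a standard breadth-first traversal: `--download-max-depth` bounds how far from the seed page the crawler walks, and `--download-max-pages` caps the total number of pages fetched. A toy sketch of that budgeting logic over an in-memory link graph (the graph and function are illustrative, not the CLI's internals):

```python
from collections import deque

def bfs_crawl(links: dict[str, list[str]], seed: str,
              max_depth: int, max_pages: int) -> list[str]:
    """Visit pages breadth-first, stopping at the depth bound or the page budget."""
    visited = {seed}
    order: list[str] = []
    queue = deque([(seed, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)  # "fetch" the page
        if depth < max_depth:  # only expand links while under the depth bound
            for nxt in links.get(url, []):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, depth + 1))
    return order

site = {"/docs": ["/docs/a", "/docs/b"], "/docs/a": ["/docs/a/1"], "/docs/b": []}
print(bfs_crawl(site, "/docs", max_depth=1, max_pages=50))
# → ['/docs', '/docs/a', '/docs/b']
```

With `max_depth=1`, the children of `/docs` are fetched but their own links are never expanded; `max_pages` would cut the traversal off mid-level if the budget ran out first.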
Example:
```bash
semantics research -o ./research_results -s 'machine learning trends' --download --structured
```

Pull the latest version of the CLI container image:
```bash
semantics update
```

Show version and image information:
```bash
semantics version
```

Show available commands and usage:
```bash
semantics help
```

Denoise, separate vocals, transcribe, identify speakers, and detect emotions:
```bash
semantics audio -i interview.mp4 -o ./results/interview -n -s -t -d -em
```

Run all audio modules, including classification and scene detection:
```bash
semantics audio -i video.mp4 -o ./results/full_audio -e -n -s -v -t -d -ctc -c -em -se
```

Extract scenes, detect objects (people by default), and save annotations:
```bash
semantics video -i video.mp4 -o ./results/scenes --from-segments -s -eo --save-annotations
```

Search for a topic, download the results, and extract structured content:
```bash
semantics research -o ./results/research -s 'machine learning trends' --search-limit 10 --download --structured
```

Crawl a specific URL with depth and page limits:
```bash
semantics research -o ./results/crawl --download-url 'https://example.com/docs' --download-deep --download-max-depth 3 --download-max-pages 50 --structured
```

Override default model parameters and settings via YAML:
```bash
semantics audio -i recording.mp4 -o ./results/custom --config my-config.yml -t -d
```

Each CLI supports YAML configuration files for advanced settings:

```bash
semantics audio -i input.mp4 -o ./output --config my_config.yml -t -d
```

Default configuration examples are located in the repository at:

- `configs/audio-config.yml`
- `configs/video-config.yml`
- `configs/research-config.yml`
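As a shape reference only, a config file is plain YAML with per-module sections. The keys below are hypothetical — copy a real starting point from `configs/audio-config.yml` in the repository rather than from this fragment:

```yaml
# Hypothetical fragment; the actual keys are defined by configs/audio-config.yml
transcription:
  model: large-v3
  language: auto
diarization:
  min_speakers: 1
  max_speakers: 4
```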
All CLIs write results to the specified output folder, organized into subdirectories of structured data:
```
output_folder/
├── transcripts/   # Audio transcriptions (JSON, SRT, VTT)
├── diarization/   # Speaker diarization results
├── emotions/      # Emotion recognition data
├── entities/      # Named entity recognition results
├── scenes/        # Scene/chapter detection
├── objects/       # Detected objects and crops
├── frames/        # Extracted video frames
└── ...
```
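Because results land in predictable subfolders, post-processing reduces to directory walking. The sketch below gathers every JSON artifact under an output folder, keyed by its first-level subdirectory; it assumes only that artifacts are JSON files, as the layout above suggests — the per-tool schemas are not shown here:

```python
import json
from collections import defaultdict
from pathlib import Path

def collect_results(output_folder: str) -> dict[str, list[dict]]:
    """Group parsed JSON artifacts by subdirectory ("transcripts", "entities", ...)."""
    results: dict[str, list[dict]] = defaultdict(list)
    root = Path(output_folder)
    for path in sorted(root.rglob("*.json")):
        parts = path.relative_to(root).parts
        if len(parts) > 1:  # skip files sitting directly in the root
            results[parts[0]].append(json.loads(path.read_text(encoding="utf-8")))
    return dict(results)
```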
Linux / macOS:

```bash
rm -rf ~/.semantics
```

Then remove the PATH entry from your shell config (`~/.bashrc`, `~/.zshrc`, etc.):

```bash
# Delete the line containing "/.semantics/bin" from your shell config
```

Windows (PowerShell):

```powershell
Remove-Item -Recurse -Force "$env:LOCALAPPDATA\semantics"

# Remove from PATH
$p = [Environment]::GetEnvironmentVariable("Path", "User") -split ";" |
    Where-Object { $_ -notlike "*\semantics" }
[Environment]::SetEnvironmentVariable("Path", ($p -join ";"), "User")
```

For development or persistent containers, you can use Docker Compose directly.
Create a docker-compose.yml file in your project directory:
```yaml
x-cuda-support: &cuda-support
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu, utility, compute, video]

x-volumes: &volumes
  volumes:
    - ./.data/semantics:/workspaces

x-environment: &environment
  environment:
    - TF_ENABLE_ONEDNN_OPTS=0
    - TF_DISABLE_XLA=1
    - NVIDIA_DRIVER_CAPABILITIES=compute,utility,video

services:
  semantics-audio:
    image: famda/semantics:audio-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]

  semantics-video:
    image: famda/semantics:video-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]

  semantics-research:
    image: famda/semantics:research-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]
```

Prepare input and output folders:

```bash
# Create directories for input files and results
mkdir -p .data/semantics/assets
mkdir -p .data/semantics/results

# Copy your media files to the assets folder
cp your_video.mp4 .data/semantics/assets/
```

Start the containers:

```bash
docker compose up -d
```

Run the CLIs inside the running containers:

```bash
docker compose exec semantics-audio bash -lc "semantics-audio -i /workspaces/assets/video.mp4 -o /workspaces/results/audio_test -t -d"
docker compose exec semantics-video bash -lc "semantics-video -i /workspaces/assets/video.mp4 -o /workspaces/results/video_test --from-segments -s -eo"
docker compose exec semantics-research bash -lc "semantics-research -o /workspaces/results/research_test -s 'AI trends' --download"
```

Stop the containers when finished:

```bash
docker compose down
```

Run containers directly without Docker Compose:
Audio:

```bash
docker run --rm --gpus all \
  -v "$(pwd)/assets:/workspaces/input:ro" \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:audio-latest \
  -lc "semantics-audio -i /workspaces/input/sample.mp4 -o /workspaces/output -t -d"
```

Video:

```bash
docker run --rm --gpus all \
  -v "$(pwd)/assets:/workspaces/input:ro" \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:video-latest \
  -lc "semantics-video -i /workspaces/input/sample.mp4 -o /workspaces/output --from-segments -s -eo"
```

Research:

```bash
docker run --rm \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:research-latest \
  -lc "semantics-research -o /workspaces/output -s 'AI trends' --download"
```

Pre-built images are available on Docker Hub:
| Tag Pattern | Description |
|---|---|
| `cli-latest` | All three CLIs in one image (used by the installer) |
| `audio-latest`, `video-latest`, `research-latest` | Single-CLI images |
| `audio-<sha>`, `video-<sha>`, `research-<sha>`, `cli-<sha>` | Specific commit builds |
```bash
docker pull famda/semantics:cli-latest
docker pull famda/semantics:audio-latest
```

MIT
