A unified CLI toolkit for media intelligence, providing audio processing, video analysis, and web research capabilities — all powered by state-of-the-art AI models running inside Docker.
Extract meaning, not just metadata. Composable AI operations designed for developers.
- Docker installed and running
Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/famda/semantics/main/docs/install.ps1 | iex
```

Linux / macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/famda/semantics/main/docs/install.sh | bash
```

All processing runs inside a Docker container — no Python, CUDA, or model dependencies needed on your machine.
After installation, restart your terminal and verify:
```bash
semantics version
```

Typical invocations:

```bash
semantics audio -i interview.mp4 -o ./results -t -d
semantics video -i video.mp4 -o ./results --from-segments -s -eo
semantics research -o ./results -s 'AI agents 2026' --download
semantics update
```

How it works: Each command transparently starts a Docker container, maps your local input and output paths into it, runs the AI pipeline, and writes results back to your machine. You never need to interact with Docker directly.
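Conceptually, the wrapper translates each `semantics` invocation into a `docker run` with bind mounts for the input and output paths. The sketch below is an illustrative approximation of that translation, not the installer's actual code — the image tag and the `/workspaces/...` mount points are assumptions for demonstration:

```python
from pathlib import Path

def to_docker_argv(tool: str, args: list[str],
                   image: str = "famda/semantics:cli-latest") -> list[str]:
    """Hypothetical sketch: rewrite -i/-o host paths into container mounts."""
    mounts: list[str] = []
    forwarded: list[str] = []
    it = iter(args)
    for a in it:
        if a in ("-i", "--input"):
            host = Path(next(it)).absolute()
            # Mount the file's parent directory read-only into the container.
            mounts += ["-v", f"{host.parent}:/workspaces/input:ro"]
            forwarded += [a, f"/workspaces/input/{host.name}"]
        elif a in ("-o", "--output"):
            host = Path(next(it)).absolute()
            mounts += ["-v", f"{host}:/workspaces/output"]
            forwarded += [a, "/workspaces/output"]
        else:
            forwarded.append(a)  # feature flags pass through unchanged
    return ["docker", "run", "--rm", *mounts, image, tool, *forwarded]

argv = to_docker_argv("audio", ["-i", "interview.mp4", "-o", "./results", "-t", "-d"])
print(" ".join(argv))
```

The point is only that path mapping is mechanical; the real wrapper handles GPU flags, image selection, and cleanup for you.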
Audio and speech processing toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input media file (required) |
| `-o, --output PATH` | Output folder path (required) |
| `-e, --enhance-audio` | Enhance audio quality |
| `-n, --denoise` | Denoise the audio file |
| `-s, --stem` | Enable source separation (extract vocals) |
| `-v, --vad` | Enable voice activity detection |
| `-t, --transcribe` | Transcribe audio to text |
| `-te, --transcribe-experimental` | Ultra-fast transcription with CTC alignment |
| `-d, --diarize` | Enable speaker diarization |
| `-ctc, --ctc-align` | Enable CTC forced alignment (requires `-t` and `-d`) |
| `-c, --classify` | Enable audio classification |
| `-ct, --classify-timeline` | Enable timeline audio classification |
| `-em, --emotion` | Enable emotion recognition (requires `-t` or `-ctc`) |
| `-se, --scene` | Enable scene/chapter detection (requires `-t` or `-ctc`) |
| `-ner, --named-entities` | Extract named entities from the transcript (requires `-t` or `-ctc`) |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
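The audio flags compose freely, so batch jobs reduce to looping over files and shelling out once per input. The sketch below only builds the command lines (the folder names and default flags are illustrative); swap the `print` for `subprocess.run(cmd, check=True)` to actually execute them:

```python
import shlex
from pathlib import Path

def audio_commands(input_dir: str, output_root: str,
                   flags: tuple[str, ...] = ("-n", "-s", "-t", "-d")) -> list[list[str]]:
    """Build one `semantics audio` invocation per media file, each with its own output folder."""
    cmds = []
    for media in sorted(Path(input_dir).glob("*.mp4")):
        out = Path(output_root) / media.stem  # e.g. ./results/interview
        cmds.append(["semantics", "audio", "-i", str(media), "-o", str(out), *flags])
    return cmds

for cmd in audio_commands("./recordings", "./results"):
    print(shlex.join(cmd))  # dry run; replace with subprocess.run(cmd, check=True)
```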
Example:
```bash
semantics audio -i recording.mp4 -o ./audio_results -n -s -t -d -em
```

Video analysis and object detection toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input video file or YouTube URL (required) |
| `-o, --output PATH` | Output folder path (required) |
| `--from-frames` | Analyze from extracted video frames |
| `--from-clustering` | Analyze from keyframe clustering on frames |
| `--from-segments` | Analyze from keyframes/segments (one of these three modes is required) |
| `-t, --tiles` | Enable video tiling |
| `-eo, --extract-objects` | Extract objects from the video |
| `-co, --cluster-objects` | Cluster the extracted objects |
| `-classes, --object-classes TEXT` | Object classes to extract (default: `person`) |
| `--save-annotations` | Persist detection crops and masks to disk |
| `-c, --captions` | Extract captions from the video |
| `-s, --scenes` | Enable scene extraction |
| `-ocr, --extract-text` | Enable text extraction (OCR) |
| `-cl, --classify` | Enable frame classification |
| `-ner, --named-entities` | Extract named entities from captions (requires `-c`) |
| `-a, --actions` | Recognize human actions in the video |
| `--download-resolution INT` | Maximum video height when downloading from a URL |
| `--save-frames` | Save extracted frames to disk |
| `-fps, --frames-per-second INT` | Frames per second to analyze (default: 1) |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
Note: You must specify one of `--from-frames`, `--from-clustering`, or `--from-segments`.
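If you script around the CLI, this rule is easy to enforce before launching a container. A minimal sketch of the check, assuming the three modes are mutually exclusive (the error message is illustrative):

```python
MODES = ("--from-frames", "--from-clustering", "--from-segments")

def check_mode(args: list[str]) -> str:
    """Return the selected analysis mode, or raise if zero or several are given."""
    chosen = [m for m in MODES if m in args]
    if len(chosen) != 1:
        raise ValueError(f"specify exactly one of: {', '.join(MODES)}")
    return chosen[0]
```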
Example:
```bash
semantics video -i video.mp4 -o ./video_results --from-segments -s -eo -c
```

Web research and content extraction toolkit.
| Flag | Description |
|---|---|
| `-i, --input PATH` | Input file for processing |
| `-o, --output PATH` | Output folder path (required) |
| `-s, --search TEXT` | Text query to research |
| `--search-limit INT` | Maximum number of web/video results |
| `--download` | Download/crawl search results (use with `-s`) |
| `--download-url URL` | Specific URL to crawl (alternative to `--download`) |
| `--download-deep` | Enable BFS deep crawling |
| `--download-max-depth INT` | Maximum traversal depth when deep crawling |
| `--download-max-pages INT` | Page budget when deep crawling |
| `--download-include-external` | Allow the deep crawl to follow external domains |
| `--download-word-threshold INT` | Minimum word count for page materialization |
| `--structured` | Extract structured content from crawled pages |
| `--debug` | Enable verbose debug logging |
| `--plain` | Disable rich formatting; use plain text output |
| `--config PATH` | Path to a YAML config file |
| `-h, --help` | Show help message |
Note: You must specify one of `-s` (search query), `--download-url`, or `-i` (input file).
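The deep-crawl flags map onto a standard breadth-first traversal: `--download-max-depth` bounds how far from the seed page the crawler walks, and `--download-max-pages` caps the total number of pages fetched. A toy sketch of that budgeting logic over an in-memory link graph (the graph and function are illustrative, not the CLI's internals):

```python
from collections import deque

def bfs_crawl(links: dict[str, list[str]], seed: str,
              max_depth: int, max_pages: int) -> list[str]:
    """Visit pages breadth-first, stopping at the depth bound or the page budget."""
    visited = {seed}
    order: list[str] = []
    queue = deque([(seed, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)  # "fetch" the page
        if depth < max_depth:  # only expand links while under the depth bound
            for nxt in links.get(url, []):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, depth + 1))
    return order

site = {"/docs": ["/docs/a", "/docs/b"], "/docs/a": ["/docs/a/1"], "/docs/b": []}
print(bfs_crawl(site, "/docs", max_depth=1, max_pages=50))
# → ['/docs', '/docs/a', '/docs/b']
```

With `max_depth=1`, the children of `/docs` are fetched but their own links are never expanded; `max_pages` would cut the traversal off mid-level if the budget ran out first.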
Example:
```bash
semantics research -o ./research_results -s 'machine learning trends' --download --structured
```

Pull the latest version of the CLI container image:
```bash
semantics update
```

Show version and image information:
```bash
semantics version
```

Show available commands and usage:
```bash
semantics help
```

Denoise, separate vocals, transcribe, identify speakers, and detect emotions:
```bash
semantics audio -i interview.mp4 -o ./results/interview -n -s -t -d -em
```

Run all audio modules, including classification and scene detection:
```bash
semantics audio -i video.mp4 -o ./results/full_audio -e -n -s -v -t -d -ctc -c -em -se
```

Extract scenes, detect objects (people by default), and save annotations:
```bash
semantics video -i video.mp4 -o ./results/scenes --from-segments -s -eo --save-annotations
```

Search for a topic, download the results, and extract structured content:
```bash
semantics research -o ./results/research -s 'machine learning trends' --search-limit 10 --download --structured
```

Crawl a specific URL with depth and page limits:
```bash
semantics research -o ./results/crawl --download-url 'https://example.com/docs' --download-deep --download-max-depth 3 --download-max-pages 50 --structured
```

Override default model parameters and settings via YAML:
```bash
semantics audio -i recording.mp4 -o ./results/custom --config my-config.yml -t -d
```

Each CLI supports YAML configuration files for advanced settings:

```bash
semantics audio -i input.mp4 -o ./output --config my_config.yml -t -d
```

Default configuration examples are located in the repository at:

- `configs/audio-config.yml`
- `configs/video-config.yml`
- `configs/research-config.yml`
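As a shape reference only, a config file is plain YAML with per-module sections. The keys below are hypothetical — copy a real starting point from `configs/audio-config.yml` in the repository rather than from this fragment:

```yaml
# Hypothetical fragment; the actual keys are defined by configs/audio-config.yml
transcription:
  model: large-v3
  language: auto
diarization:
  min_speakers: 1
  max_speakers: 4
```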
All CLIs write results to the specified output folder, organized into subdirectories of structured data:
```
output_folder/
├── transcripts/   # Audio transcriptions (JSON, SRT, VTT)
├── diarization/   # Speaker diarization results
├── emotions/      # Emotion recognition data
├── entities/      # Named entity recognition results
├── scenes/        # Scene/chapter detection
├── objects/       # Detected objects and crops
├── frames/        # Extracted video frames
└── ...
```
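Because results land in predictable subfolders, post-processing reduces to directory walking. The sketch below gathers every JSON artifact under an output folder, keyed by its first-level subdirectory; it assumes only that artifacts are JSON files, as the layout above suggests — the per-tool schemas are not shown here:

```python
import json
from collections import defaultdict
from pathlib import Path

def collect_results(output_folder: str) -> dict[str, list[dict]]:
    """Group parsed JSON artifacts by subdirectory ("transcripts", "entities", ...)."""
    results: dict[str, list[dict]] = defaultdict(list)
    root = Path(output_folder)
    for path in sorted(root.rglob("*.json")):
        parts = path.relative_to(root).parts
        if len(parts) > 1:  # skip files sitting directly in the root
            results[parts[0]].append(json.loads(path.read_text(encoding="utf-8")))
    return dict(results)
```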
Linux / macOS:

```bash
rm -rf ~/.semantics
```

Then remove the PATH entry from your shell config (`~/.bashrc`, `~/.zshrc`, etc.):

```bash
# Delete the line containing "/.semantics/bin" from your shell config
```

Windows (PowerShell):

```powershell
Remove-Item -Recurse -Force "$env:LOCALAPPDATA\semantics"

# Remove from PATH
$p = [Environment]::GetEnvironmentVariable("Path", "User") -split ";" |
    Where-Object { $_ -notlike "*\semantics" }
[Environment]::SetEnvironmentVariable("Path", ($p -join ";"), "User")
```

For development or persistent containers, you can use Docker Compose directly.
Create a docker-compose.yml file in your project directory:
```yaml
x-cuda-support: &cuda-support
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu, utility, compute, video]

x-volumes: &volumes
  volumes:
    - ./.data/semantics:/workspaces

x-environment: &environment
  environment:
    - TF_ENABLE_ONEDNN_OPTS=0
    - TF_DISABLE_XLA=1
    - NVIDIA_DRIVER_CAPABILITIES=compute,utility,video

services:
  semantics-audio:
    image: famda/semantics:audio-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]

  semantics-video:
    image: famda/semantics:video-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]

  semantics-research:
    image: famda/semantics:research-latest
    tty: true
    stdin_open: true
    <<: [*environment, *volumes, *cuda-support]
```

Prepare input and output folders:

```bash
# Create directories for input files and results
mkdir -p .data/semantics/assets
mkdir -p .data/semantics/results

# Copy your media files to the assets folder
cp your_video.mp4 .data/semantics/assets/
```

Start the containers:

```bash
docker compose up -d
```

Run the CLIs inside the running containers:

```bash
docker compose exec semantics-audio bash -lc "semantics-audio -i /workspaces/assets/video.mp4 -o /workspaces/results/audio_test -t -d"
docker compose exec semantics-video bash -lc "semantics-video -i /workspaces/assets/video.mp4 -o /workspaces/results/video_test --from-segments -s -eo"
docker compose exec semantics-research bash -lc "semantics-research -o /workspaces/results/research_test -s 'AI trends' --download"
```

Stop the containers when finished:

```bash
docker compose down
```

Run containers directly without Docker Compose:
Audio:

```bash
docker run --rm --gpus all \
  -v "$(pwd)/assets:/workspaces/input:ro" \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:audio-latest \
  -lc "semantics-audio -i /workspaces/input/sample.mp4 -o /workspaces/output -t -d"
```

Video:

```bash
docker run --rm --gpus all \
  -v "$(pwd)/assets:/workspaces/input:ro" \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:video-latest \
  -lc "semantics-video -i /workspaces/input/sample.mp4 -o /workspaces/output --from-segments -s -eo"
```

Research:

```bash
docker run --rm \
  -v "$(pwd)/results:/workspaces/output" \
  famda/semantics:research-latest \
  -lc "semantics-research -o /workspaces/output -s 'AI trends' --download"
```

Pre-built images are available on Docker Hub:
| Tag Pattern | Description |
|---|---|
| `cli-latest` | All three CLIs in one image (used by the installer) |
| `audio-latest`, `video-latest`, `research-latest` | Single-CLI images |
| `audio-<sha>`, `video-<sha>`, `research-<sha>`, `cli-<sha>` | Specific commit builds |
```bash
docker pull famda/semantics:cli-latest
docker pull famda/semantics:audio-latest
```

MIT
