CLIP-CC

A Comprehensive Dataset for Video Comprehension and Multimodal AI Research

Python 3.8+ · License: MIT · Dataset

📰 Research Paper · 📓 Interactive Notebook


🤔 What is CLIP-CC?

The CLIP-CC Dataset is a carefully curated collection of 200 YouTube video links with human-written summaries, specifically designed for research and experimentation in multimodal AI tasks. This dataset addresses the growing need for high-quality video comprehension benchmarks that can effectively evaluate the narrative understanding capabilities of Large Video Language Models (LVLMs).

🔬 How CLIP-CC is Made

🎬 Step 1: Curate & Clip

Source: YouTube's Rotten Tomatoes Movieclips channel

Selection Criteria:

  • Action-rich scenes with complex events
  • ≥2 recurring actors for multi-character tracking
  • Multiple scenes/shots for temporal understanding
  • Characters leave and return (tests identity memory)
  • Cannot be understood from a single frame alone

Output: ~90-second clips

πŸŽ™οΈ Step 2: Human Narration

Narration Process:

  • Human watches the clip and narrates in real time
  • Present tense, factual description style
  • Consistent entity labels throughout
  • Audio recording of live narration
  • Professional transcription to text format

Output: Raw text transcript

🤖 Step 3: LLM Clean-up

Refinement Process:

  • Feed raw transcript to GPT-4o/ChatGPT
  • Fix grammar and sentence structure only
  • Preserve all original content (no drops)
  • Maintain narrative flow and accuracy
  • Quality assurance with a diff-check review (see the sketch below)

Output: Final dataset entry
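
The diff-check review in Step 3 can be reproduced with Python's standard difflib. Below is a minimal sketch, assuming the raw transcript and the cleaned text are available as plain strings (the variable names and sample strings are illustrative, not part of the package):

import difflib

# Illustrative strings standing in for a real transcript pair
raw_transcript = "the man walks in to the room he picks up the phone and dials"
cleaned_text = "The man walks into the room. He picks up the phone and dials."

# Word-level diff: any content dropped or added by the clean-up
# step shows up as -/+ lines in the output
diff = difflib.unified_diff(
    raw_transcript.split(),
    cleaned_text.split(),
    fromfile="raw",
    tofile="cleaned",
    lineterm="",
)
print("\n".join(diff))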

📊 Dataset Statistics

🎬 Content

  • 200 curated video clips
  • ~90 seconds each
  • 100% YouTube sourced

πŸ“ Summaries

  • 402 words average
  • 349 words median
  • 99-1,011 word range

🎭 Complexity

  • Multi-scene narratives
  • Character identity tracking
  • Action-dense sequences

📊 Distribution

  • 57.5% detailed (300+ words)
  • 24.0% long (200-300 words)
  • 17.5% medium (100-200 words)

Sourced from Rotten Tomatoes Movieclips • 3-step curation process

Perfect for evaluating video captioning models, multimodal language models, and narrative understanding systems using advanced metrics like VCS (Video Comprehension Score).
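
These figures can be reproduced from the metadata itself. Here is a minimal sketch, assuming load_metadata() returns the list of entries shown under Basic Usage below:

from statistics import mean, median

from clip_cc.loader import load_metadata

# Word counts over all 200 human-written summaries
data = load_metadata()
counts = [len(entry["summary"].split()) for entry in data]

print(f"Average: {mean(counts):.0f} words, median: {median(counts):.0f} words")
print(f"Range: {min(counts)}-{max(counts)} words")
print(f"Detailed (300+ words): {sum(c >= 300 for c in counts) / len(counts):.1%}")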


📂 Dataset Contents & Key Features

📊 Dataset Structure

📋 Metadata Format

JSON Lines format with video links and summaries

🎥 Video Access

Optional download and 90-second clipping utilities

📋 Example Entry (metadata.jsonl):

{
  "id": "001",
  "file_link": "https://www.youtube.com/watch?v=abc123",
  "summary": "A man explains the basics of machine learning with real-world examples."
}
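
Because the metadata is plain JSON Lines, it can also be read without installing the package. A minimal sketch, assuming the file sits at metadata.jsonl in the repository root (the exact path is an assumption):

import json

# Each line of metadata.jsonl is one JSON object: id, file_link, summary
with open("metadata.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(entries)} entries")
print(entries[0]["summary"])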

⚡ Getting Started

Choose the installation method that best fits your workflow. All methods provide access to the complete CLIP-CC dataset with 200 video entries and human-written summaries.

⚡ Quick Start Tip: Most research needs only the metadata; video downloading is completely optional!

📦 Installation Methods

🎯 Manual Installation

Full control and development

πŸ–±οΈ Click to expand steps

Terminal:

git clone https://github.com/hdubey-debug/CLIP-CC.git
cd CLIP-CC
pip install .

Colab/Jupyter:

!git clone https://github.com/hdubey-debug/CLIP-CC.git
%cd CLIP-CC
!pip install .

🔧 Full source access
🛠️ Modification ready
📝 Local development

🚀 Pip Installation

Quick and simple

πŸ–±οΈ Click to expand steps

Terminal:

pip install git+https://github.com/hdubey-debug/CLIP-CC.git

Colab/Jupyter:

!pip install git+https://github.com/hdubey-debug/CLIP-CC.git

⚡ 30-second setup
🔥 Zero configuration
✅ Instant access

🤗 Hugging Face

Seamless dataset integration

πŸ–±οΈ Click to expand steps

Python:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("IVSL-SDSU/Clip-CC")

# Access a sample entry
print(dataset["train"][0])

Colab/Jupyter:

# If path issues occur, upgrade packages:
!pip install --upgrade datasets fsspec

from datasets import load_dataset
dataset = load_dataset("IVSL-SDSU/Clip-CC")
print(dataset["train"][0])

🤗 Native HF integration
📊 Dataset ecosystem
🔄 Auto-versioning

💻 Basic Usage

Once installed, start using CLIP-CC in seconds:

🐍 Python Package (Manual & Pip)

# Import CLIP-CC (works for both manual and pip installation)
from clip_cc.loader import load_metadata

# Load dataset
data = load_metadata()

# Explore
print(f"Dataset size: {len(data)} videos")
print(f"Sample: {data[0]}")

# Access summaries
for entry in data[:3]:
    print(f"Video {entry['id']}: {entry['summary']}")

🤗 Hugging Face

# Load with Hugging Face
from datasets import load_dataset

dataset = load_dataset("IVSL-SDSU/Clip-CC")

# Explore
print(f"Dataset: {dataset}")
print(f"Sample: {dataset['train'][0]}")

# Access summaries
for i in range(3):
    entry = dataset['train'][i]
    print(f"Video {entry['id']}: {entry['summary']}")

🎥 Video Downloader & Clipper

⚠️ Video Downloading is Optional: most research needs only the metadata (summaries + links)!

🔧 Setup & Usage

⚠️ Video Downloading is Terminal Only: Colab/Jupyter environments don't support the browser cookies needed for age-restricted videos.

💻 Terminal Only

1. Install Dependencies:

pip install yt-dlp ffmpeg-python

2. Check FFmpeg:

ffmpeg -version

3. Download & Clip Videos:

from clip_cc.downloader import download_and_clip_dataset

# Download one specific video (ID: 001)
download_and_clip_dataset(
    output_dir="downloads/clips",
    target_ids={"001"},
    use_browser_cookies=True,  # Uses Chrome/Firefox cookies directly
    clip_duration=90
)

print("βœ… Videos downloaded and clipped!")

Parameters:

  • output_dir: Where to save the clipped videos
  • target_ids: Specific video ID(s) to download (use None to download all 200 videos)
  • use_browser_cookies: Use browser cookies directly (recommended for age-restricted videos)
  • cookiefile_path: Alternative cookie file path (if browser cookies don't work)
  • clip_duration: Video length in seconds (default: 90)

What this does:

  • Downloads videos using yt-dlp to a temporary folder
  • Clips the first 90 seconds using ffmpeg
  • Saves final clipped videos to output_dir
  • Automatically cleans up intermediate files
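
For example, to fetch and clip the entire dataset rather than a single ID (per the parameters above, target_ids=None downloads all 200 videos, so expect this to take a while):

from clip_cc.downloader import download_and_clip_dataset

# target_ids=None downloads all 200 videos (see Parameters above)
download_and_clip_dataset(
    output_dir="downloads/clips",
    target_ids=None,
    use_browser_cookies=True,
    clip_duration=90,
)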

🔧 Troubleshooting Downloads

Age-Restricted Videos (Most Common Issue)

Problem: "Sign in to confirm your age" error

Solution: Use Browser Cookies (Terminal Only)

Method 1: Browser cookies (Recommended)

# Uses your browser's cookies directly (Chrome/Firefox)
# Make sure you're signed into YouTube in your browser first
download_and_clip_dataset(
    output_dir="clips",
    target_ids={"002"},  # Or None for all videos
    use_browser_cookies=True,  # Uses Chrome or Firefox cookies directly
    clip_duration=90
)

Method 2: Export cookies file (Alternative)

  1. Export cookies from your browser:

    • Install browser extension "Get cookies.txt LOCALLY" (Chrome/Firefox)
    • Visit YouTube and sign in to your account
    • Click the extension and download cookies.txt file
    • Important: Make sure it's in Netscape format
  2. Use cookies in download:

    download_and_clip_dataset(
        output_dir="clips",
        target_ids={"002"},  # Or None for all videos
        cookiefile_path="cookies.txt",
        clip_duration=90
    )

Note: Method 1 (browser cookies) works much more reliably than cookie files.

Other Common Issues

FFmpeg Issues:

  • Ensure ffmpeg is installed and in PATH
  • Test with: ffmpeg -version

Network Issues:

  • Check internet connection for YouTube access
  • Some videos may be region-restricted or removed

Video Not Found:

  • Video may have been removed from YouTube
  • The download will skip missing videos and continue with others

🤖 VCS Ecosystem Integration

CLIP-CC is designed to work seamlessly with VCS (Video Comprehension Score) for comprehensive video understanding evaluation.

📊 Perfect Integration: CLIP-CC + VCS

  • 🎥 CLIP-CC provides the data → Rich video dataset with human summaries
  • 🔍 VCS provides the evaluation → Advanced narrative comprehension metrics
  • 🏆 Together: Complete research pipeline → From data loading to evaluation

📚 Citation

If you use CLIP-CC Dataset in your research, please cite:

@software{vcs_metrics_2024,
  title = {VCS Metrics: Video Comprehension Score for Text Similarity Evaluation},
  author = {Dubey, Harsh and Ali, Mukhtiar and Mishra, Sugam and Pack, Chulwoo},
  year = {2024},
  institution = {South Dakota State University},
  url = {https://github.com/Multimodal-Intelligence-Lab/Video-Comprehension-Score},
  note = {Python package for narrative similarity evaluation}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 Made with ❤️ by the VCS Team

Authors: Harsh Dubey, Mukhtiar Ali, Sugam Mishra, and Chulwoo Pack
Institution: South Dakota State University
Year: 2024

⭐ Star this repo • 🐛 Report Bug • 💡 Request Feature • 💬 Community Q&A
