CLIP-CC

A Comprehensive Dataset for Video Comprehension and Multimodal AI Research

Python 3.8+ · License: MIT · Dataset

📰 Research Paper · 📓 Interactive Notebook


🤔 What is CLIP-CC?

The CLIP-CC Dataset is a carefully curated collection of 200 YouTube video links with human-written summaries, specifically designed for research and experimentation in multimodal AI tasks. This dataset addresses the growing need for high-quality video comprehension benchmarks that can effectively evaluate the narrative understanding capabilities of Large Video Language Models (LVLMs).

🔬 How CLIP-CC is Made

🎬 Step 1: Curate & Clip

Source: YouTube's Rotten Tomatoes Movieclips channel

Selection Criteria:

  • Action-rich scenes with complex events
  • ≥2 recurring actors for multi-character tracking
  • Multiple scenes/shots for temporal understanding
  • Characters leave and return (tests identity memory)
  • Cannot be understood from a single frame alone

Output: ~90-second clips

πŸŽ™οΈ Step 2: Human Narration

Narration Process:

  • Human watches the clip and narrates in real time
  • Present tense, factual description style
  • Consistent entity labels throughout
  • Audio recording of live narration
  • Professional transcription to text format

Output: Raw text transcript

🤖 Step 3: LLM Clean-up

Refinement Process:

  • Feed raw transcript to GPT-4o/ChatGPT
  • Fix grammar and sentence structure only
  • Preserve all original content (no drops)
  • Maintain narrative flow and accuracy
  • Quality assurance with a diff-check review (see the sketch below)

Output: Final dataset entry
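
The diff-check review in Step 3 can be reproduced with Python's standard difflib. Below is a minimal sketch, assuming the raw transcript and the cleaned text are available as plain strings (the variable names and sample strings are illustrative, not part of the package):

import difflib

# Illustrative strings standing in for a real transcript pair
raw_transcript = "the man walks in to the room he picks up the phone and dials"
cleaned_text = "The man walks into the room. He picks up the phone and dials."

# Word-level diff: any content dropped or added by the clean-up
# step shows up as -/+ lines in the output
diff = difflib.unified_diff(
    raw_transcript.split(),
    cleaned_text.split(),
    fromfile="raw",
    tofile="cleaned",
    lineterm="",
)
print("\n".join(diff))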

📊 Dataset Statistics

🎬 Content

  • 200 curated video clips
  • ~90 seconds each
  • 100% YouTube sourced

πŸ“ Summaries

  • 402 words average
  • 349 words median
  • 99-1,011 word range

🎭 Complexity

  • Multi-scene narratives
  • Character identity tracking
  • Action-dense sequences

📊 Distribution

  • 57.5% detailed (300+ words)
  • 24.0% long (200-300 words)
  • 17.5% medium (100-200 words)

Sourced from Rotten Tomatoes Movieclips • 3-step curation process

Perfect for evaluating video captioning models, multimodal language models, and narrative understanding systems using advanced metrics like VCS (Video Comprehension Score).
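
These figures can be reproduced from the metadata itself. Here is a minimal sketch, assuming load_metadata() returns the list of entries shown under Basic Usage below:

from statistics import mean, median

from clip_cc.loader import load_metadata

# Word counts over all 200 human-written summaries
data = load_metadata()
counts = [len(entry["summary"].split()) for entry in data]

print(f"Average: {mean(counts):.0f} words, median: {median(counts):.0f} words")
print(f"Range: {min(counts)}-{max(counts)} words")
print(f"Detailed (300+ words): {sum(c >= 300 for c in counts) / len(counts):.1%}")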


📂 Dataset Contents & Key Features

📊 Dataset Structure

📋 Metadata Format

JSON Lines format with video links and summaries

🎥 Video Access

Optional download and 90-second clipping utilities

📋 Example Entry (metadata.jsonl):

{
  "id": "001",
  "file_link": "https://www.youtube.com/watch?v=abc123",
  "summary": "A man explains the basics of machine learning with real-world examples."
}
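
Because the metadata is plain JSON Lines, it can also be read without installing the package. A minimal sketch, assuming the file sits at metadata.jsonl in the repository root (the exact path is an assumption):

import json

# Each line of metadata.jsonl is one JSON object: id, file_link, summary
with open("metadata.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(entries)} entries")
print(entries[0]["summary"])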

⚡ Getting Started

Choose the installation method that best fits your workflow. All methods provide access to the complete CLIP-CC dataset with 200 video entries and human-written summaries.

⚡ Quick Start Tip: Most research needs only the metadata; video downloading is completely optional!

📦 Installation Methods

🎯 Manual Installation

Full control and development

πŸ–±οΈ Click to expand steps

Terminal:

git clone https://github.com/hdubey-debug/CLIP-CC.git
cd CLIP-CC
pip install .

Colab/Jupyter:

!git clone https://github.com/hdubey-debug/CLIP-CC.git
%cd CLIP-CC
!pip install .

🔧 Full source access
🛠️ Modification ready
📝 Local development

🚀 Pip Installation

Quick and simple

πŸ–±οΈ Click to expand steps

Terminal:

pip install git+https://github.com/hdubey-debug/CLIP-CC.git

Colab/Jupyter:

!pip install git+https://github.com/hdubey-debug/CLIP-CC.git

⚡ 30-second setup
🔥 Zero configuration
✅ Instant access

🤗 Hugging Face

Seamless dataset integration

πŸ–±οΈ Click to expand steps

Python:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("IVSL-SDSU/Clip-CC")

# Access a sample entry
print(dataset["train"][0])

Colab/Jupyter:

# If path issues occur, upgrade packages:
!pip install --upgrade datasets fsspec

from datasets import load_dataset
dataset = load_dataset("IVSL-SDSU/Clip-CC")
print(dataset["train"][0])

🤗 Native HF integration
📊 Dataset ecosystem
🔄 Auto-versioning

💻 Basic Usage

Once installed, start using CLIP-CC in seconds:

🐍 Python Package (Manual & Pip)

# Import CLIP-CC (works for both manual and pip installation)
from clip_cc.loader import load_metadata

# Load dataset
data = load_metadata()

# Explore
print(f"Dataset size: {len(data)} videos")
print(f"Sample: {data[0]}")

# Access summaries
for entry in data[:3]:
    print(f"Video {entry['id']}: {entry['summary']}")

🤗 Hugging Face

# Load with Hugging Face
from datasets import load_dataset

dataset = load_dataset("IVSL-SDSU/Clip-CC")

# Explore
print(f"Dataset: {dataset}")
print(f"Sample: {dataset['train'][0]}")

# Access summaries
for i in range(3):
    entry = dataset['train'][i]
    print(f"Video {entry['id']}: {entry['summary']}")

🎥 Video Downloader & Clipper

⚠️ Video Downloading is Optional: most research needs only the metadata (summaries + links)!

🔧 Setup & Usage

⚠️ Video Downloading is Terminal Only: Colab/Jupyter environments don't support the browser cookies needed for age-restricted videos.

💻 Terminal Only

1. Install Dependencies:

pip install yt-dlp ffmpeg-python

2. Check FFmpeg:

ffmpeg -version

3. Download & Clip Videos:

from clip_cc.downloader import download_and_clip_dataset

# Download one specific video (ID: 001)
download_and_clip_dataset(
    output_dir="downloads/clips",
    target_ids={"001"},
    use_browser_cookies=True,  # Uses Chrome/Firefox cookies directly
    clip_duration=90
)

print("βœ… Videos downloaded and clipped!")

Parameters:

  • output_dir: Where to save the clipped videos
  • target_ids: Specific video ID(s) to download (use None to download all 200 videos)
  • use_browser_cookies: Use browser cookies directly (recommended for age-restricted videos)
  • cookiefile_path: Alternative cookie file path (if browser cookies don't work)
  • clip_duration: Video length in seconds (default: 90)

What this does:

  • Downloads videos using yt-dlp to a temporary folder
  • Clips the first 90 seconds using ffmpeg
  • Saves final clipped videos to output_dir
  • Automatically cleans up intermediate files
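
For example, to fetch and clip the entire dataset rather than a single ID (per the parameters above, target_ids=None downloads all 200 videos, so expect this to take a while):

from clip_cc.downloader import download_and_clip_dataset

# target_ids=None downloads all 200 videos (see Parameters above)
download_and_clip_dataset(
    output_dir="downloads/clips",
    target_ids=None,
    use_browser_cookies=True,
    clip_duration=90,
)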

🔧 Troubleshooting Downloads

Age-Restricted Videos (Most Common Issue)

Problem: "Sign in to confirm your age" error

Solution: Use Browser Cookies (Terminal Only)

Method 1: Browser cookies (Recommended)

# Uses your browser's cookies directly (Chrome/Firefox)
# Make sure you're signed into YouTube in your browser first
download_and_clip_dataset(
    output_dir="clips",
    target_ids={"002"},  # Or None for all videos
    use_browser_cookies=True,  # Uses Chrome or Firefox cookies directly
    clip_duration=90
)

Method 2: Export cookies file (Alternative)

  1. Export cookies from your browser:

    • Install browser extension "Get cookies.txt LOCALLY" (Chrome/Firefox)
    • Visit YouTube and sign in to your account
    • Click the extension and download cookies.txt file
    • Important: Make sure it's in Netscape format
  2. Use cookies in download:

    download_and_clip_dataset(
        output_dir="clips",
        target_ids={"002"},  # Or None for all videos
        cookiefile_path="cookies.txt",
        clip_duration=90
    )

Note: Method 1 (browser cookies) works much more reliably than cookie files.

Other Common Issues

FFmpeg Issues:

  • Ensure ffmpeg is installed and in PATH
  • Test with: ffmpeg -version

Network Issues:

  • Check internet connection for YouTube access
  • Some videos may be region-restricted or removed

Video Not Found:

  • Video may have been removed from YouTube
  • The download will skip missing videos and continue with others

🤖 VCS Ecosystem Integration

CLIP-CC is designed to work seamlessly with VCS (Video Comprehension Score) for comprehensive video understanding evaluation.

📊 Perfect Integration: CLIP-CC + VCS

  • 🎥 CLIP-CC provides the data → Rich video dataset with human summaries
  • 🔍 VCS provides the evaluation → Advanced narrative comprehension metrics
  • 🏆 Together: Complete research pipeline → From data loading to evaluation

📚 Citation

If you use CLIP-CC Dataset in your research, please cite:

@software{vcs_metrics_2024,
  title = {VCS Metrics: Video Comprehension Score for Text Similarity Evaluation},
  author = {Dubey, Harsh and Ali, Mukhtiar and Mishra, Sugam and Pack, Chulwoo},
  year = {2024},
  institution = {South Dakota State University},
  url = {https://github.com/Multimodal-Intelligence-Lab/Video-Comprehension-Score},
  note = {Python package for narrative similarity evaluation}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 Made with ❤️ by the VCS Team

Authors: Harsh Dubey, Mukhtiar Ali, Sugam Mishra, and Chulwoo Pack
Institution: South Dakota State University
Year: 2024

⭐ Star this repo • 🐛 Report Bug • 💡 Request Feature • 💬 Community Q&A
