Files
LLM_Engineering_OLD/week1/community-contributions/Youtube_video_summarizer/README.md

5.1 KiB

YouTube Video Summarizer

A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.

Features

  • 🎬 YouTube Integration: Automatically fetches video transcripts
  • 🤖 AI-Powered Summaries: Uses GPT-4o-mini for high-quality summaries
  • 📊 Smart Chunking: Handles large videos by splitting into manageable chunks
  • 🔄 Automatic Stitching: Combines chunk summaries into cohesive final summaries
  • 💰 Cost-Effective: Optimized for GPT-4o-mini's token limits
  • 🛡️ Error Handling: Robust error handling with helpful messages

Installation

Prerequisites

  • Python 3.8 or higher
# Run the automated installation script
python install.py

# The script will let you choose between UV and pip
# Then run the script with your chosen method

Option 2: Using UV

# Install UV if not already installed
pip install uv

# Install dependencies and create virtual environment
uv sync

# Run the script
uv run python youtube_video_summarizer.py

Option 3: Using pip

# Install dependencies
pip install -r requirements.txt

# Run the script
python youtube_video_summarizer.py

Optional Dependencies

With UV:

# For Jupyter notebook support
uv sync --extra jupyter

# For development dependencies (testing, linting, etc.)
uv sync --extra dev

With pip:

# For Jupyter notebook support
pip install ipython jupyter

# For development dependencies
pip install pytest black flake8 mypy

Setup

  1. Get an OpenAI API Key:

  2. Create a .env file:

    echo "OPENAI_API_KEY=your_api_key_here" > .env
    
  3. Update the video URL in youtube_video_summarizer.py:

    video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
    

Usage

Basic Usage

from youtube_video_summarizer import YouTubeVideo, summarize_video

# Create video object
video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")

# Generate summary
summary = summarize_video(video)
print(summary)

Advanced Usage with Custom Settings

# Custom chunking settings
summary = summarize_video(
    video, 
    use_chunking=True, 
    max_chunk_tokens=4000
)

How It Works

  1. Video Processing: Fetches YouTube video metadata and transcript
  2. Token Analysis: Counts tokens to determine if chunking is needed
  3. Smart Chunking: Splits large transcripts into manageable pieces
  4. Individual Summaries: Generates summaries for each chunk
  5. Intelligent Stitching: Combines chunk summaries into final result

Configuration

Model Settings

  • Model: GPT-4o-mini (cost-effective and high-quality)
  • Temperature: 0.3 (focused, consistent output)
  • Max Tokens: 2,000 (optimal for summaries)

Chunking Settings

  • Max Chunk Size: 4,000 tokens (auto-calculated per model)
  • Overlap: 5% of chunk size (maintains context)
  • Auto-detection: Automatically determines if chunking is needed

Error Handling

The script includes comprehensive error handling:

  • Missing Dependencies: Clear installation instructions
  • Invalid URLs: YouTube URL validation
  • API Errors: OpenAI API error handling
  • Network Issues: Request timeout and retry logic

Requirements

  • Python: 3.8 or higher
  • OpenAI API Key: Required for summarization
  • Internet Connection: For YouTube and OpenAI API access

Dependencies

Core Dependencies

  • requests: HTTP requests
  • tiktoken: Token counting
  • python-dotenv: Environment variable management
  • openai: OpenAI API client
  • youtube-transcript-api: YouTube transcript fetching
  • beautifulsoup4: HTML parsing

Optional Dependencies

  • ipython: Jupyter notebook support
  • jupyter: Jupyter notebook support

Troubleshooting

Common Issues

  1. ModuleNotFoundError:
    • With UV: Run uv sync to install dependencies
    • With pip: Run pip install -r requirements.txt
  2. UV not found: Install UV with pip install uv or run python install.py
  3. OpenAI API Error: Check your API key in .env file
  4. YouTube Transcript Error: Video may not have transcripts available
  5. Token Limit Error: Video transcript is too long (rare with chunking)

Getting Help

If you encounter issues:

  1. Check the error messages (they include helpful installation instructions)
  2. Ensure all dependencies are installed:
    • With UV: uv sync
    • With pip: pip install -r requirements.txt
  3. Verify your OpenAI API key is correct
  4. Check that the YouTube video has transcripts available
  5. Try running with the appropriate command:
    • With UV: uv run python youtube_video_summarizer.py
    • With pip: python youtube_video_summarizer.py

License

This project is part of the LLM Engineering course materials.

Contributing

Feel free to submit issues and enhancement requests!