# YouTube Video Summarizer

A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.

## Features

- 🎬 **YouTube Integration**: Automatically fetches video transcripts
- 🤖 **AI-Powered Summaries**: Uses GPT-4o-mini for high-quality summaries
- 📊 **Smart Chunking**: Handles large videos by splitting into manageable chunks
- 🔄 **Automatic Stitching**: Combines chunk summaries into cohesive final summaries
- 💰 **Cost-Effective**: Optimized for GPT-4o-mini's token limits
- 🛡️ **Error Handling**: Robust error handling with helpful messages

## Installation

### Prerequisites
- Python 3.8 or higher

### Option 1: Using the installation script (Recommended)
```bash
# Run the automated installation script
python install.py

# The script will let you choose between UV and pip
# Then run the script with your chosen method
```

### Option 2: Using UV
```bash
# Install UV if not already installed
pip install uv

# Install dependencies and create virtual environment
uv sync

# Run the script
uv run python youtube_video_summarizer.py
```

### Option 3: Using pip
```bash
# Install dependencies
pip install -r requirements.txt

# Run the script
python youtube_video_summarizer.py
```

### Optional Dependencies

#### With UV:
```bash
# For Jupyter notebook support
uv sync --extra jupyter

# For development dependencies (testing, linting, etc.)
uv sync --extra dev
```

#### With pip:
```bash
# For Jupyter notebook support
pip install ipython jupyter

# For development dependencies
pip install pytest black flake8 mypy
```

## Setup

1. **Get an OpenAI API Key**:
   - Visit [OpenAI API](https://platform.openai.com/api-keys)
   - Create a new API key

2. **Create a .env file**:
   ```bash
   echo "OPENAI_API_KEY=your_api_key_here" > .env
   ```

3. **Update the video URL** in `youtube_video_summarizer.py`:
   ```python
   video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
   ```

## Usage

### Basic Usage
```python
from youtube_video_summarizer import YouTubeVideo, summarize_video

# Create video object
video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")

# Generate summary
summary = summarize_video(video)
print(summary)
```

### Advanced Usage with Custom Settings
```python
# Custom chunking settings
summary = summarize_video(
    video, 
    use_chunking=True, 
    max_chunk_tokens=4000
)
```

## How It Works

1. **Video Processing**: Fetches YouTube video metadata and transcript
2. **Token Analysis**: Counts tokens to determine if chunking is needed
3. **Smart Chunking**: Splits large transcripts into manageable pieces
4. **Individual Summaries**: Generates summaries for each chunk
5. **Intelligent Stitching**: Combines chunk summaries into final result

## Configuration

### Model Settings
- **Model**: GPT-4o-mini (cost-effective and high-quality)
- **Temperature**: 0.3 (focused, consistent output)
- **Max Tokens**: 2,000 (optimal for summaries)

### Chunking Settings
- **Max Chunk Size**: 4,000 tokens (auto-calculated per model)
- **Overlap**: 5% of chunk size (maintains context)
- **Auto-detection**: Automatically determines if chunking is needed

## Error Handling

The script includes comprehensive error handling:
- ✅ **Missing Dependencies**: Clear installation instructions
- ✅ **Invalid URLs**: YouTube URL validation
- ✅ **API Errors**: OpenAI API error handling
- ✅ **Network Issues**: Request timeout and retry logic

## Requirements

- **Python**: 3.8 or higher
- **OpenAI API Key**: Required for summarization
- **Internet Connection**: For YouTube and OpenAI API access

## Dependencies

### Core Dependencies
- `requests`: HTTP requests
- `tiktoken`: Token counting
- `python-dotenv`: Environment variable management
- `openai`: OpenAI API client
- `youtube-transcript-api`: YouTube transcript fetching
- `beautifulsoup4`: HTML parsing

### Optional Dependencies
- `ipython`: Jupyter notebook support
- `jupyter`: Jupyter notebook support

## Troubleshooting

### Common Issues

1. **ModuleNotFoundError**: 
   - With UV: Run `uv sync` to install dependencies
   - With pip: Run `pip install -r requirements.txt`
2. **UV not found**: Install UV with `pip install uv` or run `python install.py`
3. **OpenAI API Error**: Check your API key in `.env` file
4. **YouTube Transcript Error**: Video may not have transcripts available
5. **Token Limit Error**: Video transcript is too long (rare with chunking)

### Getting Help

If you encounter issues:
1. Check the error messages (they include helpful installation instructions)
2. Ensure all dependencies are installed:
   - With UV: `uv sync`
   - With pip: `pip install -r requirements.txt`
3. Verify your OpenAI API key is correct
4. Check that the YouTube video has transcripts available
5. Try running with the appropriate command:
   - With UV: `uv run python youtube_video_summarizer.py`
   - With pip: `python youtube_video_summarizer.py`

## License

This project is part of the LLM Engineering course materials.

## Contributing

Feel free to submit issues and enhancement requests!