Adding shabsi4u youtube video summarizer for day 1

This commit is contained in:
shabsi4u
2025-09-16 15:45:47 +05:30
parent 236749eb29
commit 07ccfaa3ed
7 changed files with 6223 additions and 0 deletions

View File

@@ -0,0 +1,188 @@
# YouTube Video Summarizer
A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.
## Features
- 🎬 **YouTube Integration**: Automatically fetches video transcripts
- 🤖 **AI-Powered Summaries**: Uses GPT-4o-mini for high-quality summaries
- 📊 **Smart Chunking**: Handles large videos by splitting into manageable chunks
- 🔄 **Automatic Stitching**: Combines chunk summaries into cohesive final summaries
- 💰 **Cost-Effective**: Optimized for GPT-4o-mini's token limits
- 🛡️ **Error Handling**: Robust error handling with helpful messages
## Installation
### Prerequisites
- Python 3.8 or higher
### Option 1: Using the installation script (Recommended)
```bash
# Run the automated installation script
python install.py
# The script will let you choose between UV and pip
# Then run the script with your chosen method
```
### Option 2: Using UV
```bash
# Install UV if not already installed
pip install uv
# Install dependencies and create virtual environment
uv sync
# Run the script
uv run python youtube_video_summarizer.py
```
### Option 3: Using pip
```bash
# Install dependencies
pip install -r requirements.txt
# Run the script
python youtube_video_summarizer.py
```
### Optional Dependencies
#### With UV:
```bash
# For Jupyter notebook support
uv sync --extra jupyter
# For development dependencies (testing, linting, etc.)
uv sync --extra dev
```
#### With pip:
```bash
# For Jupyter notebook support
pip install ipython jupyter
# For development dependencies
pip install pytest black flake8 mypy
```
## Setup
1. **Get an OpenAI API Key**:
- Visit [OpenAI API](https://platform.openai.com/api-keys)
- Create a new API key
2. **Create a .env file**:
```bash
echo "OPENAI_API_KEY=your_api_key_here" > .env
```
3. **Update the video URL** in `youtube_video_summarizer.py`:
```python
video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
```
## Usage
### Basic Usage
```python
from youtube_video_summarizer import YouTubeVideo, summarize_video
# Create video object
video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")
# Generate summary
summary = summarize_video(video)
print(summary)
```
### Advanced Usage with Custom Settings
```python
# Custom chunking settings
summary = summarize_video(
video,
use_chunking=True,
max_chunk_tokens=4000
)
```
## How It Works
1. **Video Processing**: Fetches YouTube video metadata and transcript
2. **Token Analysis**: Counts tokens to determine if chunking is needed
3. **Smart Chunking**: Splits large transcripts into manageable pieces
4. **Individual Summaries**: Generates summaries for each chunk
5. **Intelligent Stitching**: Combines chunk summaries into final result
## Configuration
### Model Settings
- **Model**: GPT-4o-mini (cost-effective and high-quality)
- **Temperature**: 0.3 (focused, consistent output)
- **Max Tokens**: 2,000 (optimal for summaries)
### Chunking Settings
- **Max Chunk Size**: 4,000 tokens (auto-calculated per model)
- **Overlap**: 5% of chunk size (maintains context)
- **Auto-detection**: Automatically determines if chunking is needed
## Error Handling
The script includes comprehensive error handling:
- ✅ **Missing Dependencies**: Clear installation instructions
- ✅ **Invalid URLs**: YouTube URL validation
- ✅ **API Errors**: OpenAI API error handling
- ✅ **Network Issues**: Request timeout and retry logic
## Requirements
- **Python**: 3.8 or higher
- **OpenAI API Key**: Required for summarization
- **Internet Connection**: For YouTube and OpenAI API access
## Dependencies
### Core Dependencies
- `requests`: HTTP requests
- `tiktoken`: Token counting
- `python-dotenv`: Environment variable management
- `openai`: OpenAI API client
- `youtube-transcript-api`: YouTube transcript fetching
- `beautifulsoup4`: HTML parsing
### Optional Dependencies
- `ipython`: Jupyter notebook support
- `jupyter`: Jupyter notebook support
## Troubleshooting
### Common Issues
1. **ModuleNotFoundError**:
- With UV: Run `uv sync` to install dependencies
- With pip: Run `pip install -r requirements.txt`
2. **UV not found**: Install UV with `pip install uv` or run `python install.py`
3. **OpenAI API Error**: Check your API key in `.env` file
4. **YouTube Transcript Error**: Video may not have transcripts available
5. **Token Limit Error**: Video transcript is too long (rare with chunking)
### Getting Help
If you encounter issues:
1. Check the error messages (they include helpful installation instructions)
2. Ensure all dependencies are installed:
- With UV: `uv sync`
- With pip: `pip install -r requirements.txt`
3. Verify your OpenAI API key is correct
4. Check that the YouTube video has transcripts available
5. Try running with the appropriate command:
- With UV: `uv run python youtube_video_summarizer.py`
- With pip: `python youtube_video_summarizer.py`
## License
This project is part of the LLM Engineering course materials.
## Contributing
Feel free to submit issues and enhancement requests!