Adding shabsi4u youtube video summarizer for day 1

2025-09-16 15:45:47 +05:30
parent 236749eb29
commit 07ccfaa3ed
7 changed files with 6223 additions and 0 deletions
--- a/week1/community-contributions/Youtube_video_summarizer/README.md
+++ b/week1/community-contributions/Youtube_video_summarizer/README.md
@@ -0,0 +1,188 @@
+# YouTube Video Summarizer
+
+A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.
+
+## Features
+
+- 🎬 **YouTube Integration**: Automatically fetches video transcripts
+- 🤖 **AI-Powered Summaries**: Uses GPT-4o-mini for high-quality summaries
+- 📊 **Smart Chunking**: Handles large videos by splitting into manageable chunks
+- 🔄 **Automatic Stitching**: Combines chunk summaries into cohesive final summaries
+- 💰 **Cost-Effective**: Optimized for GPT-4o-mini's token limits
+- 🛡️ **Error Handling**: Robust error handling with helpful messages
+
+## Installation
+
+### Prerequisites
+- Python 3.8 or higher
+
+### Option 1: Using the installation script (Recommended)
+```bash
+# Run the automated installation script
+python install.py
+
+# The script will let you choose between UV and pip
+# Then run the script with your chosen method
+```
+
+### Option 2: Using UV
+```bash
+# Install UV if not already installed
+pip install uv
+
+# Install dependencies and create virtual environment
+uv sync
+
+# Run the script
+uv run python youtube_video_summarizer.py
+```
+
+### Option 3: Using pip
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run the script
+python youtube_video_summarizer.py
+```
+
+### Optional Dependencies
+
+#### With UV:
+```bash
+# For Jupyter notebook support
+uv sync --extra jupyter
+
+# For development dependencies (testing, linting, etc.)
+uv sync --extra dev
+```
+
+#### With pip:
+```bash
+# For Jupyter notebook support
+pip install ipython jupyter
+
+# For development dependencies
+pip install pytest black flake8 mypy
+```
+
+## Setup
+
+1. **Get an OpenAI API Key**:
+   - Visit [OpenAI API](https://platform.openai.com/api-keys)
+   - Create a new API key
+
+2. **Create a .env file**:
+   ```bash
+   echo "OPENAI_API_KEY=your_api_key_here" > .env
+   ```
+
+3. **Update the video URL** in `youtube_video_summarizer.py`:
+   ```python
+   video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
+   ```
+
+## Usage
+
+### Basic Usage
+```python
+from youtube_video_summarizer import YouTubeVideo, summarize_video
+
+# Create video object
+video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")
+
+# Generate summary
+summary = summarize_video(video)
+print(summary)
+```
+
+### Advanced Usage with Custom Settings
+```python
+# Custom chunking settings
+summary = summarize_video(
+    video, 
+    use_chunking=True, 
+    max_chunk_tokens=4000
+)
+```
+
+## How It Works
+
+1. **Video Processing**: Fetches YouTube video metadata and transcript
+2. **Token Analysis**: Counts tokens to determine if chunking is needed
+3. **Smart Chunking**: Splits large transcripts into manageable pieces
+4. **Individual Summaries**: Generates summaries for each chunk
+5. **Intelligent Stitching**: Combines chunk summaries into final result
+
+## Configuration
+
+### Model Settings
+- **Model**: GPT-4o-mini (cost-effective and high-quality)
+- **Temperature**: 0.3 (focused, consistent output)
+- **Max Tokens**: 2,000 (optimal for summaries)
+
+### Chunking Settings
+- **Max Chunk Size**: 4,000 tokens (auto-calculated per model)
+- **Overlap**: 5% of chunk size (maintains context)
+- **Auto-detection**: Automatically determines if chunking is needed
+
+## Error Handling
+
+The script includes comprehensive error handling:
+- ✅ **Missing Dependencies**: Clear installation instructions
+- ✅ **Invalid URLs**: YouTube URL validation
+- ✅ **API Errors**: OpenAI API error handling
+- ✅ **Network Issues**: Request timeout and retry logic
+
+## Requirements
+
+- **Python**: 3.8 or higher
+- **OpenAI API Key**: Required for summarization
+- **Internet Connection**: For YouTube and OpenAI API access
+
+## Dependencies
+
+### Core Dependencies
+- `requests`: HTTP requests
+- `tiktoken`: Token counting
+- `python-dotenv`: Environment variable management
+- `openai`: OpenAI API client
+- `youtube-transcript-api`: YouTube transcript fetching
+- `beautifulsoup4`: HTML parsing
+
+### Optional Dependencies
+- `ipython`: Jupyter notebook support
+- `jupyter`: Jupyter notebook support
+
+## Troubleshooting
+
+### Common Issues
+
+1. **ModuleNotFoundError**: 
+   - With UV: Run `uv sync` to install dependencies
+   - With pip: Run `pip install -r requirements.txt`
+2. **UV not found**: Install UV with `pip install uv` or run `python install.py`
+3. **OpenAI API Error**: Check your API key in `.env` file
+4. **YouTube Transcript Error**: Video may not have transcripts available
+5. **Token Limit Error**: Video transcript is too long (rare with chunking)
+
+### Getting Help
+
+If you encounter issues:
+1. Check the error messages (they include helpful installation instructions)
+2. Ensure all dependencies are installed:
+   - With UV: `uv sync`
+   - With pip: `pip install -r requirements.txt`
+3. Verify your OpenAI API key is correct
+4. Check that the YouTube video has transcripts available
+5. Try running with the appropriate command:
+   - With UV: `uv run python youtube_video_summarizer.py`
+   - With pip: `python youtube_video_summarizer.py`
+
+## License
+
+This project is part of the LLM Engineering course materials.
+
+## Contributing
+
+Feel free to submit issues and enhancement requests!