Adding shabsi4u youtube video summarizer for day 1
This commit is contained in:
188
week1/community-contributions/Youtube_video_summarizer/README.md
Normal file
188
week1/community-contributions/Youtube_video_summarizer/README.md
Normal file
@@ -0,0 +1,188 @@
|
||||
# YouTube Video Summarizer
|
||||
|
||||
A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.
|
||||
|
||||
## Features
|
||||
|
||||
- 🎬 **YouTube Integration**: Automatically fetches video transcripts
|
||||
- 🤖 **AI-Powered Summaries**: Uses GPT-4o-mini for high-quality summaries
|
||||
- 📊 **Smart Chunking**: Handles large videos by splitting into manageable chunks
|
||||
- 🔄 **Automatic Stitching**: Combines chunk summaries into cohesive final summaries
|
||||
- 💰 **Cost-Effective**: Optimized for GPT-4o-mini's token limits
|
||||
- 🛡️ **Error Handling**: Robust error handling with helpful messages
|
||||
|
||||
## Installation
|
||||
|
||||
### Prerequisites
|
||||
- Python 3.8 or higher
|
||||
|
||||
### Option 1: Using the installation script (Recommended)
|
||||
```bash
|
||||
# Run the automated installation script
|
||||
python install.py
|
||||
|
||||
# The script will let you choose between UV and pip
|
||||
# Then run the script with your chosen method
|
||||
```
|
||||
|
||||
### Option 2: Using UV
|
||||
```bash
|
||||
# Install UV if not already installed
|
||||
pip install uv
|
||||
|
||||
# Install dependencies and create virtual environment
|
||||
uv sync
|
||||
|
||||
# Run the script
|
||||
uv run python youtube_video_summarizer.py
|
||||
```
|
||||
|
||||
### Option 3: Using pip
|
||||
```bash
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Run the script
|
||||
python youtube_video_summarizer.py
|
||||
```
|
||||
|
||||
### Optional Dependencies
|
||||
|
||||
#### With UV:
|
||||
```bash
|
||||
# For Jupyter notebook support
|
||||
uv sync --extra jupyter
|
||||
|
||||
# For development dependencies (testing, linting, etc.)
|
||||
uv sync --extra dev
|
||||
```
|
||||
|
||||
#### With pip:
|
||||
```bash
|
||||
# For Jupyter notebook support
|
||||
pip install ipython jupyter
|
||||
|
||||
# For development dependencies
|
||||
pip install pytest black flake8 mypy
|
||||
```
|
||||
|
||||
## Setup
|
||||
|
||||
1. **Get an OpenAI API Key**:
|
||||
- Visit [OpenAI API](https://platform.openai.com/api-keys)
|
||||
- Create a new API key
|
||||
|
||||
2. **Create a .env file**:
|
||||
```bash
|
||||
echo "OPENAI_API_KEY=your_api_key_here" > .env
|
||||
```
|
||||
|
||||
3. **Update the video URL** in `youtube_video_summarizer.py`:
|
||||
```python
|
||||
video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage
|
||||
```python
|
||||
from youtube_video_summarizer import YouTubeVideo, summarize_video
|
||||
|
||||
# Create video object
|
||||
video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")
|
||||
|
||||
# Generate summary
|
||||
summary = summarize_video(video)
|
||||
print(summary)
|
||||
```
|
||||
|
||||
### Advanced Usage with Custom Settings
|
||||
```python
|
||||
# Custom chunking settings
|
||||
summary = summarize_video(
|
||||
video,
|
||||
use_chunking=True,
|
||||
max_chunk_tokens=4000
|
||||
)
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
1. **Video Processing**: Fetches YouTube video metadata and transcript
|
||||
2. **Token Analysis**: Counts tokens to determine if chunking is needed
|
||||
3. **Smart Chunking**: Splits large transcripts into manageable pieces
|
||||
4. **Individual Summaries**: Generates summaries for each chunk
|
||||
5. **Intelligent Stitching**: Combines chunk summaries into final result
|
||||
|
||||
## Configuration
|
||||
|
||||
### Model Settings
|
||||
- **Model**: GPT-4o-mini (cost-effective and high-quality)
|
||||
- **Temperature**: 0.3 (focused, consistent output)
|
||||
- **Max Tokens**: 2,000 (optimal for summaries)
|
||||
|
||||
### Chunking Settings
|
||||
- **Max Chunk Size**: 4,000 tokens (auto-calculated per model)
|
||||
- **Overlap**: 5% of chunk size (maintains context)
|
||||
- **Auto-detection**: Automatically determines if chunking is needed
|
||||
|
||||
## Error Handling
|
||||
|
||||
The script includes comprehensive error handling:
|
||||
- ✅ **Missing Dependencies**: Clear installation instructions
|
||||
- ✅ **Invalid URLs**: YouTube URL validation
|
||||
- ✅ **API Errors**: OpenAI API error handling
|
||||
- ✅ **Network Issues**: Request timeout and retry logic
|
||||
|
||||
## Requirements
|
||||
|
||||
- **Python**: 3.8 or higher
|
||||
- **OpenAI API Key**: Required for summarization
|
||||
- **Internet Connection**: For YouTube and OpenAI API access
|
||||
|
||||
## Dependencies
|
||||
|
||||
### Core Dependencies
|
||||
- `requests`: HTTP requests
|
||||
- `tiktoken`: Token counting
|
||||
- `python-dotenv`: Environment variable management
|
||||
- `openai`: OpenAI API client
|
||||
- `youtube-transcript-api`: YouTube transcript fetching
|
||||
- `beautifulsoup4`: HTML parsing
|
||||
|
||||
### Optional Dependencies
|
||||
- `ipython`: Jupyter notebook support
|
||||
- `jupyter`: Jupyter notebook support
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **ModuleNotFoundError**:
|
||||
- With UV: Run `uv sync` to install dependencies
|
||||
- With pip: Run `pip install -r requirements.txt`
|
||||
2. **UV not found**: Install UV with `pip install uv` or run `python install.py`
|
||||
3. **OpenAI API Error**: Check your API key in `.env` file
|
||||
4. **YouTube Transcript Error**: Video may not have transcripts available
|
||||
5. **Token Limit Error**: Video transcript is too long (rare with chunking)
|
||||
|
||||
### Getting Help
|
||||
|
||||
If you encounter issues:
|
||||
1. Check the error messages (they include helpful installation instructions)
|
||||
2. Ensure all dependencies are installed:
|
||||
- With UV: `uv sync`
|
||||
- With pip: `pip install -r requirements.txt`
|
||||
3. Verify your OpenAI API key is correct
|
||||
4. Check that the YouTube video has transcripts available
|
||||
5. Try running with the appropriate command:
|
||||
- With UV: `uv run python youtube_video_summarizer.py`
|
||||
- With pip: `python youtube_video_summarizer.py`
|
||||
|
||||
## License
|
||||
|
||||
This project is part of the LLM Engineering course materials.
|
||||
|
||||
## Contributing
|
||||
|
||||
Feel free to submit issues and enhancement requests!
|
||||
Reference in New Issue
Block a user