Adding shabsi4u youtube video summarizer for day 1

This commit is contained in:
shabsi4u
2025-09-16 15:45:47 +05:30
parent 236749eb29
commit 07ccfaa3ed
7 changed files with 6223 additions and 0 deletions

View File

@@ -0,0 +1,188 @@
# YouTube Video Summarizer
A Python tool that automatically fetches YouTube video transcripts and generates comprehensive summaries using OpenAI's GPT-4o-mini model. Features intelligent chunking for large videos and high-quality summarization.
## Features
- 🎬 **YouTube Integration**: Automatically fetches video transcripts
- 🤖 **AI-Powered Summaries**: Uses GPT-4o-mini for high-quality summaries
- 📊 **Smart Chunking**: Handles large videos by splitting into manageable chunks
- 🔄 **Automatic Stitching**: Combines chunk summaries into cohesive final summaries
- 💰 **Cost-Effective**: Optimized for GPT-4o-mini's token limits
- 🛡️ **Error Handling**: Robust error handling with helpful messages
## Installation
### Prerequisites
- Python 3.8 or higher
### Option 1: Using the installation script (Recommended)
```bash
# Run the automated installation script
python install.py
# The script lets you choose between UV and pip,
# then prints the command to run the summarizer with your chosen method
```
### Option 2: Using UV
```bash
# Install UV if not already installed
pip install uv
# Install dependencies and create virtual environment
uv sync
# Run the script
uv run python youtube_video_summarizer.py
```
### Option 3: Using pip
```bash
# Install dependencies
pip install -r requirements.txt
# Run the script
python youtube_video_summarizer.py
```
### Optional Dependencies
#### With UV:
```bash
# For Jupyter notebook support
uv sync --extra jupyter
# For development dependencies (testing, linting, etc.)
uv sync --extra dev
```
#### With pip:
```bash
# For Jupyter notebook support
pip install ipython jupyter
# For development dependencies
pip install pytest black flake8 mypy
```
## Setup
1. **Get an OpenAI API Key**:
- Visit [OpenAI API](https://platform.openai.com/api-keys)
- Create a new API key
2. **Create a .env file**:
```bash
echo "OPENAI_API_KEY=your_api_key_here" > .env
```
3. **Update the video URL** in `youtube_video_summarizer.py`:
```python
video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
```
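Once the `.env` file is in place, the script picks up the key via `python-dotenv`. A minimal sketch of what its `get_api_key()` helper does:
```python
# Sketch of the key-loading step (mirrors get_api_key in youtube_video_summarizer.py)
import os
from dotenv import load_dotenv

load_dotenv(override=True)             # reads .env from the working directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY is not set")
```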
## Usage
### Basic Usage
```python
from youtube_video_summarizer import YouTubeVideo, summarize_video
# Create video object
video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")
# Generate summary
summary = summarize_video(video)
print(summary)
```
### Advanced Usage with Custom Settings
```python
# Custom chunking settings
summary = summarize_video(
video,
use_chunking=True,
max_chunk_tokens=4000
)
```
## How It Works
1. **Video Processing**: Fetches YouTube video metadata and transcript
2. **Token Analysis**: Counts tokens to determine if chunking is needed
3. **Smart Chunking**: Splits large transcripts into manageable pieces
4. **Individual Summaries**: Generates summaries for each chunk
5. **Intelligent Stitching**: Combines chunk summaries into final result
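A minimal sketch of this pipeline, wired together from the functions `youtube_video_summarizer.py` exports (the URL is a placeholder):
```python
from youtube_video_summarizer import YouTubeVideo, count_tokens, chunk_transcript, summarize_video

video = YouTubeVideo("https://www.youtube.com/watch?v=VIDEO_ID")  # step 1: metadata + transcript
text = " ".join(segment.text for segment in video.transcript)
print(f"Transcript tokens: {count_tokens(text)}")                 # step 2: token analysis
print(f"Chunks: {len(chunk_transcript(video.transcript, max_tokens=4000))}")  # step 3: smart chunking
summary = summarize_video(video)  # steps 4-5: chunk summaries + stitching happen inside
print(summary)
```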
## Configuration
### Model Settings
- **Model**: GPT-4o-mini (cost-effective and high-quality)
- **Temperature**: 0.3 (focused, consistent output)
- **Max Tokens**: 2,000 (optimal for summaries)
### Chunking Settings
- **Max Chunk Size**: 4,000 tokens by default (an optimal size can also be derived from the model's context window)
- **Overlap**: 5% of chunk size (maintains context)
- **Auto-detection**: Automatically determines if chunking is needed
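These defaults mirror the arithmetic inside the script's `get_optimal_chunk_size()`:
```python
# Token budget behind the chunk-size calculation (mirrors get_optimal_chunk_size)
context_window = 8192                    # limit the script assumes for gpt-4o-mini
reserved = 800 + 300 + 2000 + 500        # system prompt + user overhead + output + safety buffer
chunk_size = max(context_window - reserved, 2000)  # 4592 here; never below 2,000
overlap = int(chunk_size * 0.05)         # 5% of the chunk carries over between chunks
```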
## Error Handling
The script includes comprehensive error handling:
- ✅ **Missing Dependencies**: Clear installation instructions
- ✅ **Invalid URLs**: YouTube URL validation
- ✅ **API Errors**: OpenAI API error handling
- ✅ **Network Issues**: Clear error messages for failed requests
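For example, missing dependencies are detected with guarded imports (abridged from the script):
```python
import sys

try:
    import requests
except ImportError:
    print("❌ Error: 'requests' module not found.")
    print("💡 Install with: pip install requests")
    sys.exit(1)
```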
## Requirements
- **Python**: 3.8 or higher
- **OpenAI API Key**: Required for summarization
- **Internet Connection**: For YouTube and OpenAI API access
## Dependencies
### Core Dependencies
- `requests`: HTTP requests
- `tiktoken`: Token counting
- `python-dotenv`: Environment variable management
- `openai`: OpenAI API client
- `youtube-transcript-api`: YouTube transcript fetching
- `beautifulsoup4`: HTML parsing
### Optional Dependencies
- `ipython`: Jupyter notebook support
- `jupyter`: Jupyter notebook support
## Troubleshooting
### Common Issues
1. **ModuleNotFoundError**:
- With UV: Run `uv sync` to install dependencies
- With pip: Run `pip install -r requirements.txt`
2. **UV not found**: Install UV with `pip install uv` or run `python install.py`
3. **OpenAI API Error**: Check your API key in `.env` file
4. **YouTube Transcript Error**: Video may not have transcripts available
5. **Token Limit Error**: Video transcript is too long (rare with chunking)
### Getting Help
If you encounter issues:
1. Check the error messages (they include helpful installation instructions)
2. Ensure all dependencies are installed:
- With UV: `uv sync`
- With pip: `pip install -r requirements.txt`
3. Verify your OpenAI API key is correct
4. Check that the YouTube video has transcripts available
5. Try running with the appropriate command:
- With UV: `uv run python youtube_video_summarizer.py`
- With pip: `python youtube_video_summarizer.py`
## License
This project is part of the LLM Engineering course materials.
## Contributing
Feel free to submit issues and enhancement requests!

View File

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Installation script for YouTube Video Summarizer
This script installs all required dependencies for the project using either UV or pip.
"""
import subprocess
import sys
import os
import shutil
def run_command(command, description):
"""Run a command and handle errors"""
print(f"🔄 {description}...")
try:
result = subprocess.run(command, shell=True, check=True, capture_output=True, text=True)
print(f"{description} completed successfully")
return True
except subprocess.CalledProcessError as e:
print(f"{description} failed:")
print(f" Error: {e.stderr}")
return False
def check_python_version():
"""Check if Python version is compatible"""
version = sys.version_info
if version.major < 3 or (version.major == 3 and version.minor < 8):
print("❌ Python 3.8 or higher is required")
print(f" Current version: {version.major}.{version.minor}.{version.micro}")
return False
print(f"✅ Python {version.major}.{version.minor}.{version.micro} is compatible")
return True
def check_uv_installed():
"""Check if UV is installed"""
if shutil.which("uv"):
print("✅ UV is already installed")
return True
else:
print("❌ UV is not installed")
return False
def install_uv():
"""Install UV package manager"""
print("🔄 Installing UV...")
try:
# Try to install UV using pip first
if not run_command(f"{sys.executable} -m pip install uv", "Installing UV via pip"):
# Fallback to curl installation
install_script = "curl -LsSf https://astral.sh/uv/install.sh | sh"
if not run_command(install_script, "Installing UV via curl"):
print("❌ Failed to install UV. Please install it manually:")
print(" pip install uv")
print(" or visit: https://github.com/astral-sh/uv")
return False
return True
except Exception as e:
print(f"❌ Error installing UV: {e}")
return False
def choose_package_manager():
"""Let user choose between UV and pip"""
print("\n📦 Choose your package manager:")
print("1. UV (recommended - faster, better dependency resolution)")
print("2. pip (traditional Python package manager)")
while True:
choice = input("\nEnter your choice (1 or 2): ").strip()
if choice == "1":
return "uv"
elif choice == "2":
return "pip"
else:
print("❌ Invalid choice. Please enter 1 or 2.")
def install_dependencies_uv():
"""Install dependencies using UV"""
print("🚀 Installing YouTube Video Summarizer dependencies with UV...")
print("=" * 60)
# Check if UV is installed, install if not
if not check_uv_installed():
if not install_uv():
return False
# Check if pyproject.toml exists
pyproject_file = os.path.join(os.path.dirname(__file__), "pyproject.toml")
if not os.path.exists(pyproject_file):
print("❌ pyproject.toml not found. Please ensure you're in the project directory.")
return False
# Install dependencies using UV
if not run_command("uv sync", "Installing dependencies with UV"):
return False
print("=" * 60)
print("🎉 Installation completed successfully!")
print("\n📋 Next steps:")
print("1. Create a .env file with your OpenAI API key:")
print(" OPENAI_API_KEY=your_api_key_here")
print("2. Run the script:")
print(" uv run python youtube_video_summarizer.py")
print("\n💡 For Jupyter notebook support, install with:")
print(" uv sync --extra jupyter")
print("\n💡 For development dependencies, install with:")
print(" uv sync --extra dev")
return True
def install_dependencies_pip():
"""Install dependencies using pip"""
print("🚀 Installing YouTube Video Summarizer dependencies with pip...")
print("=" * 60)
# Upgrade pip first
if not run_command(f"{sys.executable} -m pip install --upgrade pip", "Upgrading pip"):
return False
# Install dependencies from requirements.txt
requirements_file = os.path.join(os.path.dirname(__file__), "requirements.txt")
if os.path.exists(requirements_file):
if not run_command(f"{sys.executable} -m pip install -r {requirements_file}", "Installing dependencies from requirements.txt"):
return False
else:
# Install core dependencies individually
core_deps = [
"requests",
"tiktoken",
"python-dotenv",
"openai",
"youtube-transcript-api",
"beautifulsoup4"
]
for dep in core_deps:
if not run_command(f"{sys.executable} -m pip install {dep}", f"Installing {dep}"):
return False
print("=" * 60)
print("🎉 Installation completed successfully!")
print("\n📋 Next steps:")
print("1. Create a .env file with your OpenAI API key:")
print(" OPENAI_API_KEY=your_api_key_here")
print("2. Run the script:")
print(" python youtube_video_summarizer.py")
print("\n💡 For Jupyter notebook support, also install:")
print(" pip install jupyter ipython")
return True
def install_dependencies():
"""Install required dependencies using chosen package manager"""
# Check Python version
if not check_python_version():
return False
# Let user choose package manager
package_manager = choose_package_manager()
if package_manager == "uv":
return install_dependencies_uv()
else:
return install_dependencies_pip()
def main():
"""Main installation function"""
print("🎬 YouTube Video Summarizer - Installation Script")
print("=" * 60)
if install_dependencies():
print("\n✅ All dependencies installed successfully!")
print("🚀 You can now run the YouTube Video Summarizer!")
else:
print("\n❌ Installation failed. Please check the error messages above.")
sys.exit(1)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,78 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "youtube-video-summarizer"
version = "1.0.0"
description = "A tool to summarize YouTube videos using OpenAI's GPT models"
readme = "README.md"
requires-python = ">=3.8"
license = {text = "MIT"}
authors = [
{name = "YouTube Video Summarizer Team"},
]
keywords = ["youtube", "summarizer", "openai", "transcript", "ai"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Multimedia :: Video",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]
dependencies = [
"requests>=2.25.0",
"tiktoken>=0.5.0",
"python-dotenv>=0.19.0",
"openai>=1.0.0",
"youtube-transcript-api>=0.6.0",
"beautifulsoup4>=4.9.0",
]
[project.optional-dependencies]
jupyter = [
"ipython>=7.0.0",
"jupyter>=1.0.0",
]
dev = [
"pytest>=6.0.0",
"black>=22.0.0",
"flake8>=4.0.0",
"mypy>=0.950",
]
[project.urls]
Homepage = "https://github.com/your-username/youtube-video-summarizer"
Repository = "https://github.com/your-username/youtube-video-summarizer"
Issues = "https://github.com/your-username/youtube-video-summarizer/issues"
[project.scripts]
youtube-summarizer = "youtube_video_summarizer:main"
[tool.uv]
dev-dependencies = [
"pytest>=6.0.0",
"black>=22.0.0",
"flake8>=4.0.0",
"mypy>=0.950",
]
[tool.black]
line-length = 88
target-version = ['py38']
[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

View File

@@ -0,0 +1,17 @@
# Core dependencies for YouTube Video Summarizer
requests>=2.25.0
tiktoken>=0.5.0
python-dotenv>=0.19.0
openai>=1.0.0
youtube-transcript-api>=0.6.0
beautifulsoup4>=4.9.0
# Optional dependencies for Jupyter notebook support
ipython>=7.0.0
jupyter>=1.0.0
# Development dependencies (optional)
pytest>=6.0.0
black>=22.0.0
flake8>=4.0.0
mypy>=0.950

File diff suppressed because it is too large

View File

@@ -0,0 +1,906 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e371ea2b",
"metadata": {},
"source": [
"# YouTube Video Summarizer\n",
"\n",
"This notebook provides a comprehensive solution for summarizing YouTube videos using OpenAI's GPT models. It includes:\n",
"\n",
"- **Automatic transcript extraction** from YouTube videos\n",
"- **Intelligent chunking** for large videos that exceed token limits\n",
"- **Smart summarization** with academic-quality output\n",
"- **Error handling** and dependency management\n",
"\n",
"## Features\n",
"\n",
"- ✅ Extracts transcripts from YouTube videos\n",
"- ✅ Handles videos of any length with automatic chunking\n",
"- ✅ Generates structured, academic-quality summaries\n",
"- ✅ Includes proper error handling and dependency checks\n",
"- ✅ Optimized for different OpenAI models\n",
"- ✅ Interactive notebook format for easy testing\n",
"\n",
"## Prerequisites\n",
"\n",
"Make sure you have the required dependencies installed:\n",
"```bash\n",
"pip install -r requirements.txt\n",
"```\n",
"\n",
"You'll also need an OpenAI API key set in your environment variables or `.env` file.\n"
]
},
{
"cell_type": "markdown",
"id": "95b713e0",
"metadata": {},
"source": [
"## 1. Import Dependencies and Setup\n",
"\n",
"First, let's import all required libraries and set up the environment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c940970b",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import re\n",
"import sys\n",
"\n",
"# Check for required dependencies and provide helpful error messages\n",
"try:\n",
" import requests\n",
" print(\"✅ requests imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'requests' module not found.\")\n",
" print(\"💡 Install with: pip install requests\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" import tiktoken\n",
" print(\"✅ tiktoken imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'tiktoken' module not found.\")\n",
" print(\"💡 Install with: pip install tiktoken\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" from dotenv import load_dotenv\n",
" print(\"✅ python-dotenv imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'python-dotenv' module not found.\")\n",
" print(\"💡 Install with: pip install python-dotenv\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" from openai import OpenAI\n",
" print(\"✅ openai imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'openai' module not found.\")\n",
" print(\"💡 Install with: pip install openai\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" from youtube_transcript_api import YouTubeTranscriptApi\n",
" print(\"✅ youtube-transcript-api imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'youtube-transcript-api' module not found.\")\n",
" print(\"💡 Install with: pip install youtube-transcript-api\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" from bs4 import BeautifulSoup\n",
" print(\"✅ beautifulsoup4 imported successfully\")\n",
"except ImportError:\n",
" print(\"❌ Error: 'beautifulsoup4' module not found.\")\n",
" print(\"💡 Install with: pip install beautifulsoup4\")\n",
" print(\" Or: pip install -r requirements.txt\")\n",
" sys.exit(1)\n",
"\n",
"try:\n",
" from IPython.display import Markdown, display\n",
" print(\"✅ IPython.display imported successfully\")\n",
"except ImportError:\n",
" # IPython is optional for Jupyter notebooks\n",
" print(\"⚠️ Warning: IPython not available (optional for Jupyter notebooks)\")\n",
" Markdown = None\n",
" display = None\n",
"\n",
"print(\"\\n🎉 All dependencies imported successfully!\")\n"
]
},
{
"cell_type": "markdown",
"id": "603e9c3b",
"metadata": {},
"source": [
"## 2. Configuration and Constants\n",
"\n",
"Set up headers for web scraping and define the YouTubeVideo class.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8584ca1a",
"metadata": {},
"outputs": [],
"source": [
"# Headers for website scraping\n",
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class YouTubeVideo:\n",
" \"\"\"Class to handle YouTube video data extraction and processing\"\"\"\n",
" \n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Initialize YouTube video object\n",
" \n",
" Args:\n",
" url (str): YouTube video URL\n",
" \"\"\"\n",
" self.url = url\n",
" youtube_pattern = r'https://www\\.youtube\\.com/watch\\?v=[a-zA-Z0-9_-]+'\n",
" \n",
" if re.match(youtube_pattern, url):\n",
" response = requests.get(url, headers=headers)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.video_id = url.split(\"v=\")[1]\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" self.transcript = YouTubeTranscriptApi().fetch(self.video_id)\n",
" else:\n",
" raise ValueError(\"Invalid YouTube URL\")\n",
" \n",
" def get_transcript_text(self):\n",
" \"\"\"Get transcript as a single text string\"\"\"\n",
" return \" \".join([segment.text for segment in self.transcript])\n",
" \n",
" def get_video_info(self):\n",
" \"\"\"Get basic video information\"\"\"\n",
" return {\n",
" \"title\": self.title,\n",
" \"video_id\": self.video_id,\n",
" \"url\": self.url,\n",
" \"transcript_length\": len(self.transcript)\n",
" }\n",
"\n",
"print(\"✅ YouTubeVideo class defined successfully\")\n"
]
},
{
"cell_type": "markdown",
"id": "235e9998",
"metadata": {},
"source": [
"## 3. OpenAI API Setup\n",
"\n",
"Functions to handle OpenAI API key and client initialization.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4fa7aba3",
"metadata": {},
"outputs": [],
"source": [
"def get_api_key():\n",
" \"\"\"Get OpenAI API key from environment variables\"\"\"\n",
" load_dotenv(override=True)\n",
" api_key = os.getenv(\"OPENAI_API_KEY\")\n",
" if not api_key:\n",
" raise ValueError(\"OPENAI_API_KEY is not set. Please set it in your environment variables or .env file.\")\n",
" return api_key\n",
"\n",
"def get_openai_client():\n",
" \"\"\"Initialize and return OpenAI client\"\"\"\n",
" api_key = get_api_key()\n",
" return OpenAI(api_key=api_key)\n",
"\n",
"# Test API connection\n",
"try:\n",
" client = get_openai_client()\n",
" print(\"✅ OpenAI client initialized successfully\")\n",
" print(\"✅ API key is valid\")\n",
"except Exception as e:\n",
" print(f\"❌ Error initializing OpenAI client: {e}\")\n",
" print(\"💡 Make sure you have set your OPENAI_API_KEY environment variable\")\n"
]
},
{
"cell_type": "markdown",
"id": "4d3223f4",
"metadata": {},
"source": [
"## 4. Token Counting and Chunking Functions\n",
"\n",
"Functions to handle token counting and intelligent chunking of large transcripts.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71f68ad0",
"metadata": {},
"outputs": [],
"source": [
"def count_tokens(text, model=\"gpt-4o-mini\"):\n",
" \"\"\"Count tokens in text using tiktoken with fallback\"\"\"\n",
" try:\n",
" # Try model-specific encoding first\n",
" encoding = tiktoken.encoding_for_model(model)\n",
" return len(encoding.encode(text))\n",
" except KeyError:\n",
" # Fallback to cl100k_base encoding (used by most OpenAI models)\n",
" # This ensures compatibility even if model-specific encoding isn't available\n",
" encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
" return len(encoding.encode(text))\n",
" except Exception as e:\n",
" # Ultimate fallback - rough estimation\n",
" print(f\"Warning: Token counting failed ({e}), using rough estimation\")\n",
" return len(text.split()) * 1.3 # Rough word-to-token ratio\n",
"\n",
"def get_optimal_chunk_size(model=\"gpt-4o-mini\"):\n",
" \"\"\"Calculate optimal chunk size based on model's context window\"\"\"\n",
" model_limits = {\n",
" \"gpt-4o-mini\": 8192,\n",
" \"gpt-4o\": 128000,\n",
" \"gpt-4-turbo\": 128000,\n",
" \"gpt-3.5-turbo\": 4096,\n",
" \"gpt-4\": 8192,\n",
" }\n",
" \n",
" context_window = model_limits.get(model, 8192) # Default to 8K\n",
" \n",
" # Reserve tokens for:\n",
" # - System prompt: ~800 tokens\n",
" # - User prompt overhead: ~300 tokens \n",
" # - Output: ~2000 tokens\n",
" # - Safety buffer: ~500 tokens\n",
" reserved_tokens = 800 + 300 + 2000 + 500\n",
" \n",
" optimal_chunk_size = context_window - reserved_tokens\n",
" \n",
" # Ensure minimum chunk size\n",
" return max(optimal_chunk_size, 2000)\n",
"\n",
"print(\"✅ Token counting and chunk size functions defined\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6647838",
"metadata": {},
"outputs": [],
"source": [
"def chunk_transcript(transcript, max_tokens=4000, overlap_tokens=200, model=\"gpt-4o-mini\"):\n",
" \"\"\"\n",
" Split transcript into chunks that fit within token limits\n",
" \n",
" Args:\n",
" transcript: List of transcript segments from YouTube\n",
" max_tokens: Maximum tokens per chunk (auto-calculated if None)\n",
" overlap_tokens: Number of tokens to overlap between chunks\n",
" model: Model name for token limit calculation\n",
" \n",
" Returns:\n",
" List of transcript chunks\n",
" \"\"\"\n",
" # Auto-calculate max_tokens based on model if not provided\n",
" if max_tokens is None:\n",
" max_tokens = get_optimal_chunk_size(model)\n",
" \n",
" # Auto-calculate overlap as percentage of max_tokens\n",
" if overlap_tokens is None:\n",
" overlap_tokens = int(max_tokens * 0.05) # 5% overlap\n",
" \n",
" # Convert transcript to text\n",
" transcript_text = \" \".join([segment.text for segment in transcript])\n",
" \n",
" # If transcript is small enough, return as single chunk\n",
" if count_tokens(transcript_text) <= max_tokens:\n",
" return [transcript_text]\n",
" \n",
" # Split into sentences for better chunking\n",
" sentences = re.split(r'[.!?]+', transcript_text)\n",
" chunks = []\n",
" current_chunk = \"\"\n",
" \n",
" for sentence in sentences:\n",
" sentence = sentence.strip()\n",
" if not sentence:\n",
" continue\n",
" \n",
" # Check if adding this sentence would exceed token limit\n",
" test_chunk = current_chunk + \" \" + sentence if current_chunk else sentence\n",
" \n",
" if count_tokens(test_chunk) <= max_tokens:\n",
" current_chunk = test_chunk\n",
" else:\n",
" # Save current chunk and start new one\n",
" if current_chunk:\n",
" chunks.append(current_chunk)\n",
" \n",
" # Start new chunk with overlap from previous chunk\n",
" if chunks and overlap_tokens > 0:\n",
" # Get last few words from previous chunk for overlap\n",
" prev_words = current_chunk.split()[-overlap_tokens//4:] # Rough word-to-token ratio\n",
" current_chunk = \" \".join(prev_words) + \" \" + sentence\n",
" else:\n",
" current_chunk = sentence\n",
" \n",
" # Add the last chunk\n",
" if current_chunk:\n",
" chunks.append(current_chunk)\n",
" \n",
" return chunks\n",
"\n",
"print(\"✅ Chunking function defined\")\n"
]
},
{
"cell_type": "markdown",
"id": "7ee3f8a4",
"metadata": {},
"source": [
"## 5. Prompt Generation Functions\n",
"\n",
"Functions to generate system prompts, user prompts, and stitching prompts for the summarization process.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7f20bf5",
"metadata": {},
"outputs": [],
"source": [
"def generate_system_prompt():\n",
" \"\"\"Generate the system prompt for video summarization\"\"\"\n",
" return f\"\"\"\n",
" You are an expert YouTube video summarizer. Your job is to take the full transcript of a video and generate a structured, precise, and academically grounded summary.\n",
"\n",
" Your output must include:\n",
"\n",
" 1. Title\n",
" - Either reuse the video's title (if it is clear, accurate, and concise)\n",
" - Or generate a new, sharper, more descriptive title that best reflects the actual content covered.\n",
"\n",
" 2. Topic & Area of Coverage\n",
" - Provide a 12 line highlight of the main topic of the video and the specific area it best covers.\n",
" - Format:\n",
" - Domain (e.g., Finance, Health, Technology, Psychology, Fitness, Productivity, etc.)\n",
" - Sub-area (e.g., investment strategies, portfolio design; training routine, best exercises; productivity systems, cognitive science insights, etc.)\n",
"\n",
" 3. Summary of the Video\n",
" - A structured, clear, and concise summary of the video.\n",
" - Focus only on relevant, high-value content.\n",
" - Skip fluff, tangents, product promotions, personal banter, or irrelevant side discussions.\n",
" - Include key insights, frameworks, step-by-step methods, and actionable advice.\n",
" - Where applicable, reference scientific studies, historical sources, or authoritative references (with author + year or journal if mentioned in the video, or inferred if the reference is well known).\n",
"\n",
" Style & Quality Rules:\n",
" - Be extremely specific: avoid vague generalizations.\n",
" - Use precise language and structured formatting (bullet points, numbered lists, sub-sections if needed).\n",
" - Prioritize clarity and factual accuracy.\n",
" - Write as though preparing an executive briefing or academic digest.\n",
" - If the transcript includes non-relevant sections (jokes, ads, unrelated chit-chat), skip summarizing them entirely.\n",
" \"\"\"\n",
"\n",
"def generate_user_prompt(website, transcript_chunk=None):\n",
" \"\"\"Generate user prompt for video summarization\"\"\"\n",
" if transcript_chunk:\n",
" return f\"\"\"Here is a portion of a YouTube video transcript. Use the system instructions to generate a summary of this section.\n",
"\n",
" Video Title: {website.title}\n",
"\n",
" Transcript Section: {transcript_chunk}\n",
" \"\"\"\n",
" else:\n",
" return f\"\"\"Here is the transcript of a YouTube video. Use the system instructions to generate the output.\n",
"\n",
" Video Title: {website.title}\n",
"\n",
" Transcript: {website.transcript}\n",
" \"\"\"\n",
"\n",
"def generate_stitching_prompt(chunk_summaries, video_title):\n",
" \"\"\"Generate prompt for stitching together chunk summaries\"\"\"\n",
" return f\"\"\"You are an expert at combining multiple summaries into a cohesive, comprehensive summary.\n",
"\n",
" Video Title: {video_title}\n",
"\n",
" Below are summaries of different sections of this video. Combine them into a single, well-structured summary that:\n",
" 1. Maintains the original structure and quality standards\n",
" 2. Eliminates redundancy between sections\n",
" 3. Creates smooth transitions between topics\n",
" 4. Preserves all important information \n",
" 5. Maintains the academic, professional tone\n",
" 6. Include examples and nuances where relevant\n",
" 7. Include the citations and references where applicable\n",
"\n",
" Section Summaries:\n",
" {chr(10).join([f\"Section {i+1}: {summary}\" for i, summary in enumerate(chunk_summaries)])}\n",
"\n",
" Please provide a unified, comprehensive summary following the same format as the individual sections.\n",
" Make sure the final summary is cohesive and logical.\n",
" \"\"\"\n",
"\n",
"print(\"✅ Prompt generation functions defined\")\n"
]
},
{
"cell_type": "markdown",
"id": "5c9a620d",
"metadata": {},
"source": [
"## 6. Summarization Functions\n",
"\n",
"Core functions for summarizing videos with support for both single-chunk and chunked processing.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc8a183b",
"metadata": {},
"outputs": [],
"source": [
"def summarize_single_chunk(website, client):\n",
" \"\"\"Summarize a single chunk (small video)\"\"\"\n",
" system_prompt = generate_system_prompt()\n",
" user_prompt = generate_user_prompt(website)\n",
" \n",
" try:\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ],\n",
" max_tokens=2000,\n",
" temperature=0.3\n",
" )\n",
" \n",
" return response.choices[0].message.content\n",
" \n",
" except Exception as e:\n",
" return f\"Error generating summary: {str(e)}\"\n",
"\n",
"def summarize_with_chunking(website, client, max_chunk_tokens=4000):\n",
" \"\"\"Summarize a large video by chunking and stitching\"\"\"\n",
" print(\"Video is large, using chunking strategy...\")\n",
" \n",
" # Chunk the transcript\n",
" chunks = chunk_transcript(website.transcript, max_chunk_tokens)\n",
" print(f\"Split into {len(chunks)} chunks\")\n",
" \n",
" # Summarize each chunk\n",
" chunk_summaries = []\n",
" system_prompt = generate_system_prompt()\n",
" \n",
" for i, chunk in enumerate(chunks):\n",
" print(f\"Processing chunk {i+1}/{len(chunks)}...\")\n",
" user_prompt = generate_user_prompt(website, chunk)\n",
" \n",
" try:\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ],\n",
" max_tokens=1500, # Smaller for chunks\n",
" temperature=0.3\n",
" )\n",
" \n",
" chunk_summaries.append(response.choices[0].message.content)\n",
" \n",
" except Exception as e:\n",
" chunk_summaries.append(f\"Error in chunk {i+1}: {str(e)}\")\n",
" \n",
" # Stitch the summaries together\n",
" print(\"Stitching summaries together...\")\n",
" stitching_prompt = generate_stitching_prompt(chunk_summaries, website.title)\n",
" \n",
" try:\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are an expert at combining multiple summaries into a cohesive, comprehensive summary.\"},\n",
" {\"role\": \"user\", \"content\": stitching_prompt}\n",
" ],\n",
" max_tokens=2000,\n",
" temperature=0.3\n",
" )\n",
" \n",
" return response.choices[0].message.content\n",
" \n",
" except Exception as e:\n",
" return f\"Error stitching summaries: {str(e)}\"\n",
"\n",
"print(\"✅ Summarization functions defined\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99168160",
"metadata": {},
"outputs": [],
"source": [
"def summarize_video(website, use_chunking=True, max_chunk_tokens=4000):\n",
" \"\"\"Summarize a YouTube video using OpenAI API with optional chunking for large videos\"\"\"\n",
" client = get_openai_client()\n",
" \n",
" # Check if we need chunking\n",
" transcript_text = \" \".join([segment.text for segment in website.transcript])\n",
" total_tokens = count_tokens(transcript_text)\n",
" \n",
" print(f\"Total transcript tokens: {total_tokens}\")\n",
" \n",
" if total_tokens <= max_chunk_tokens and not use_chunking:\n",
" # Single summary for small videos\n",
" return summarize_single_chunk(website, client)\n",
" else:\n",
" # Chunked summary for large videos\n",
" return summarize_with_chunking(website, client, max_chunk_tokens)\n",
"\n",
"print(\"✅ Main summarization function defined\")\n"
]
},
{
"cell_type": "markdown",
"id": "54a76dab",
"metadata": {},
"source": [
"## 7. Interactive Demo\n",
"\n",
"Now let's test the YouTube video summarizer with a sample video. You can replace the URL with any YouTube video you want to summarize.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87badeff",
"metadata": {},
"outputs": [],
"source": [
"# Example usage - replace with your YouTube URL\n",
"video_url = \"https://www.youtube.com/watch?v=Xan5JnecLNA\"\n",
"\n",
"try:\n",
" # Create YouTube video object\n",
" print(\"🎬 Fetching video data...\")\n",
" video = YouTubeVideo(video_url)\n",
" \n",
" # Display video info\n",
" print(f\"📺 Video Title: {video.title}\")\n",
" print(f\"🆔 Video ID: {video.video_id}\")\n",
" \n",
" # Count tokens in transcript\n",
" transcript_text = video.get_transcript_text()\n",
" total_tokens = count_tokens(transcript_text)\n",
" print(f\"📊 Total transcript tokens: {total_tokens}\")\n",
" \n",
" # Show video info\n",
" info = video.get_video_info()\n",
" print(f\"📝 Transcript segments: {info['transcript_length']}\")\n",
" \n",
"except Exception as e:\n",
" print(f\"❌ Error: {str(e)}\")\n",
" print(\"💡 Make sure the YouTube URL is valid and the video has captions available\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9e4cf2f",
"metadata": {},
"outputs": [],
"source": [
"# Generate summary (automatically uses chunking if needed)\n",
"if 'video' in locals():\n",
" print(\"\\n🤖 Generating summary...\")\n",
" print(\"⏳ This may take a few minutes for long videos...\")\n",
" \n",
" try:\n",
" summary = summarize_video(video, use_chunking=True, max_chunk_tokens=4000)\n",
" \n",
" # Display results with nice formatting\n",
" print(\"\\n\" + \"=\"*60)\n",
" print(\"📋 FINAL SUMMARY\")\n",
" print(\"=\"*60)\n",
" \n",
" # Use IPython display if available for better formatting\n",
" if display and Markdown:\n",
" display(Markdown(summary))\n",
" else:\n",
" print(summary)\n",
" \n",
" except Exception as e:\n",
" print(f\"❌ Error generating summary: {str(e)}\")\n",
"else:\n",
" print(\"⚠️ Please run the previous cell first to load a video\")\n"
]
},
{
"cell_type": "markdown",
"id": "42ff8a15",
"metadata": {},
"source": [
"## 8. Testing and Utility Functions\n",
"\n",
"Additional functions for testing the chunking functionality and other utilities.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d798b08f",
"metadata": {},
"outputs": [],
"source": [
"def test_chunking():\n",
" \"\"\"Test function to demonstrate chunking with a sample transcript\"\"\"\n",
" # Sample transcript for testing\n",
" sample_transcript = [\n",
" {\"text\": \"This is a sample transcript segment 1. \" * 100}, # ~1000 tokens\n",
" {\"text\": \"This is a sample transcript segment 2. \" * 100}, # ~1000 tokens\n",
" {\"text\": \"This is a sample transcript segment 3. \" * 100}, # ~1000 tokens\n",
" {\"text\": \"This is a sample transcript segment 4. \" * 100}, # ~1000 tokens\n",
" {\"text\": \"This is a sample transcript segment 5. \" * 100}, # ~1000 tokens\n",
" ]\n",
" \n",
" print(\"🧪 Testing chunking functionality...\")\n",
" chunks = chunk_transcript(sample_transcript, max_tokens=2000, overlap_tokens=100)\n",
" \n",
" print(f\"📊 Original transcript: {count_tokens(' '.join([s['text'] for s in sample_transcript]))} tokens\")\n",
" print(f\"📦 Number of chunks: {len(chunks)}\")\n",
" \n",
" for i, chunk in enumerate(chunks):\n",
" print(f\"📄 Chunk {i+1}: {count_tokens(chunk)} tokens\")\n",
"\n",
"def analyze_video_tokens(video_url):\n",
" \"\"\"Analyze token count and chunking strategy for a video\"\"\"\n",
" try:\n",
" video = YouTubeVideo(video_url)\n",
" transcript_text = video.get_transcript_text()\n",
" total_tokens = count_tokens(transcript_text)\n",
" \n",
" print(f\"📺 Video: {video.title}\")\n",
" print(f\"📊 Total tokens: {total_tokens}\")\n",
" print(f\"📦 Optimal chunk size: {get_optimal_chunk_size()}\")\n",
" \n",
" if total_tokens > 4000:\n",
" chunks = chunk_transcript(video.transcript, max_tokens=4000)\n",
" print(f\"🔀 Would be split into {len(chunks)} chunks\")\n",
" print(\"✅ Chunking strategy recommended\")\n",
" else:\n",
" print(\"✅ Single summary strategy sufficient\")\n",
" \n",
" except Exception as e:\n",
" print(f\"❌ Error analyzing video: {str(e)}\")\n",
"\n",
"print(\"✅ Testing and utility functions defined\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bfd789e5",
"metadata": {},
"outputs": [],
"source": [
"# Test chunking functionality (optional)\n",
"# Uncomment the line below to test chunking with sample data\n",
"# test_chunking()\n"
]
},
{
"cell_type": "markdown",
"id": "3528125f",
"metadata": {},
"source": [
"## 9. Usage Instructions\n",
"\n",
"### How to Use This Notebook\n",
"\n",
"1. **Set up your OpenAI API key**:\n",
" - Create a `.env` file in the same directory as this notebook\n",
" - Add your API key: `OPENAI_API_KEY=your_api_key_here`\n",
" - Or set it as an environment variable\n",
"\n",
"2. **Install dependencies**:\n",
" ```bash\n",
" pip install -r requirements.txt\n",
" ```\n",
"\n",
"3. **Run the cells in order**:\n",
" - Start with the import and setup cells\n",
" - Modify the `video_url` variable in the demo section\n",
" - Run the demo cells to test the summarizer\n",
"\n",
"### Customization Options\n",
"\n",
"- **Change the model**: Modify the model parameter in the summarization functions\n",
"- **Adjust chunk size**: Change `max_chunk_tokens` parameter\n",
"- **Modify prompts**: Edit the prompt generation functions for different output styles\n",
"- **Add error handling**: Extend the exception handling as needed\n",
"\n",
"### Features\n",
"\n",
"- ✅ **Automatic transcript extraction** from YouTube videos\n",
"- ✅ **Intelligent chunking** for videos exceeding token limits\n",
"- ✅ **Academic-quality summaries** with structured output\n",
"- ✅ **Error handling** and dependency validation\n",
"- ✅ **Interactive testing** with sample data\n",
"- ✅ **Token analysis** and optimization recommendations\n",
"\n",
"### Troubleshooting\n",
"\n",
"- **\"No transcript available\"**: The video may not have captions enabled\n",
"- **\"Invalid YouTube URL\"**: Make sure the URL follows the correct format\n",
"- **\"API key not set\"**: Check your `.env` file or environment variables\n",
"- **Import errors**: Run `pip install -r requirements.txt` to install dependencies\n"
]
},
{
"cell_type": "markdown",
"id": "a5a44fb8",
"metadata": {},
"source": [
"## 10. Advanced Usage Examples\n",
"\n",
"Here are some advanced usage patterns you can try with this notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2bef390a",
"metadata": {},
"outputs": [],
"source": [
"# Example 1: Analyze multiple videos\n",
"video_urls = [\n",
" \"https://www.youtube.com/watch?v=Xan5JnecLNA\",\n",
" # Add more URLs here\n",
"]\n",
"\n",
"for url in video_urls:\n",
" print(f\"\\n{'='*50}\")\n",
" print(f\"Analyzing: {url}\")\n",
" print('='*50)\n",
" analyze_video_tokens(url)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fbdb5cd8",
"metadata": {},
"outputs": [],
"source": [
"# Example 2: Custom summarization with different parameters\n",
"def custom_summarize(video_url, model=\"gpt-4o-mini\", max_tokens=3000, temperature=0.1):\n",
" \"\"\"Custom summarization with specific parameters\"\"\"\n",
" try:\n",
" video = YouTubeVideo(video_url)\n",
" client = get_openai_client()\n",
" \n",
" # Use custom chunking parameters\n",
" chunks = chunk_transcript(video.transcript, max_tokens=max_tokens)\n",
" \n",
" if len(chunks) == 1:\n",
" # Single chunk\n",
" system_prompt = generate_system_prompt()\n",
" user_prompt = generate_user_prompt(video, chunks[0])\n",
" \n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ],\n",
" max_tokens=2000,\n",
" temperature=temperature\n",
" )\n",
" \n",
" return response.choices[0].message.content\n",
" else:\n",
" # Multiple chunks - use standard chunking approach\n",
" return summarize_with_chunking(video, client, max_tokens)\n",
" \n",
" except Exception as e:\n",
" return f\"Error: {str(e)}\"\n",
"\n",
"# Example usage:\n",
"# custom_summary = custom_summarize(\"https://www.youtube.com/watch?v=Xan5JnecLNA\")\n",
"# print(custom_summary)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,421 @@
import os
import re
import sys
# Check for required dependencies and provide helpful error messages
try:
import requests
except ImportError:
print("❌ Error: 'requests' module not found.")
print("💡 Install with: pip install requests")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
import tiktoken
except ImportError:
print("❌ Error: 'tiktoken' module not found.")
print("💡 Install with: pip install tiktoken")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
from dotenv import load_dotenv
except ImportError:
print("❌ Error: 'python-dotenv' module not found.")
print("💡 Install with: pip install python-dotenv")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
from openai import OpenAI
except ImportError:
print("❌ Error: 'openai' module not found.")
print("💡 Install with: pip install openai")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
from youtube_transcript_api import YouTubeTranscriptApi
except ImportError:
print("❌ Error: 'youtube-transcript-api' module not found.")
print("💡 Install with: pip install youtube-transcript-api")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
from bs4 import BeautifulSoup
except ImportError:
print("❌ Error: 'beautifulsoup4' module not found.")
print("💡 Install with: pip install beautifulsoup4")
print(" Or: pip install -r requirements.txt")
sys.exit(1)
try:
from IPython.display import Markdown, display
except ImportError:
# IPython is optional for Jupyter notebooks
print("⚠️ Warning: IPython not available (optional for Jupyter notebooks)")
Markdown = None
display = None
# Request headers and the class for the YouTube video to summarize
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
class YouTubeVideo:
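    """Class to handle YouTube video data extraction and processing"""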
def __init__(self, url):
self.url = url
youtube_pattern = r'https://www\.youtube\.com/watch\?v=[a-zA-Z0-9_-]+'
if re.match(youtube_pattern, url):
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
            self.video_id = url.split("v=")[1].split("&")[0]
self.title = soup.title.string if soup.title else "No title found"
self.transcript = YouTubeTranscriptApi().fetch(self.video_id)
else:
raise ValueError("Invalid YouTube URL")
#get api key and openai client
def get_api_key():
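    """Get the OpenAI API key from environment variables or a .env file"""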
load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
raise ValueError("OPENAI_API_KEY is not set")
return api_key
def get_openai_client():
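    """Initialize and return an OpenAI client"""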
api_key = get_api_key()
return OpenAI(api_key=api_key)
#count tokens
def count_tokens(text, model="gpt-4o-mini"):
"""Count tokens in text using tiktoken with fallback"""
try:
# Try model-specific encoding first
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
except KeyError:
# Fallback to cl100k_base encoding (used by most OpenAI models)
# This ensures compatibility even if model-specific encoding isn't available
encoding = tiktoken.get_encoding("cl100k_base")
return len(encoding.encode(text))
except Exception as e:
# Ultimate fallback - rough estimation
print(f"Warning: Token counting failed ({e}), using rough estimation")
        return int(len(text.split()) * 1.3)  # Rough word-to-token ratio
def get_optimal_chunk_size(model="gpt-4o-mini"):
"""Calculate optimal chunk size based on model's context window"""
model_limits = {
"gpt-4o-mini": 8192,
"gpt-4o": 128000,
"gpt-4-turbo": 128000,
"gpt-3.5-turbo": 4096,
"gpt-4": 8192,
}
context_window = model_limits.get(model, 8192) # Default to 8K
# Reserve tokens for:
# - System prompt: ~800 tokens
# - User prompt overhead: ~300 tokens
# - Output: ~2000 tokens
# - Safety buffer: ~500 tokens
reserved_tokens = 800 + 300 + 2000 + 500
optimal_chunk_size = context_window - reserved_tokens
# Ensure minimum chunk size
return max(optimal_chunk_size, 2000)
#chunk transcript
def chunk_transcript(transcript, max_tokens=4000, overlap_tokens=200, model="gpt-4o-mini"):
"""
Split transcript into chunks that fit within token limits
Args:
transcript: List of transcript segments from YouTube
max_tokens: Maximum tokens per chunk (auto-calculated if None)
overlap_tokens: Number of tokens to overlap between chunks
model: Model name for token limit calculation
Returns:
List of transcript chunks
"""
# Auto-calculate max_tokens based on model if not provided
if max_tokens is None:
max_tokens = get_optimal_chunk_size(model)
# Auto-calculate overlap as percentage of max_tokens
if overlap_tokens is None:
overlap_tokens = int(max_tokens * 0.05) # 5% overlap
# Convert transcript to text
transcript_text = " ".join([segment.text for segment in transcript])
# If transcript is small enough, return as single chunk
if count_tokens(transcript_text) <= max_tokens:
return [transcript_text]
# Split into sentences for better chunking
sentences = re.split(r'[.!?]+', transcript_text)
chunks = []
current_chunk = ""
for sentence in sentences:
sentence = sentence.strip()
if not sentence:
continue
# Check if adding this sentence would exceed token limit
test_chunk = current_chunk + " " + sentence if current_chunk else sentence
if count_tokens(test_chunk) <= max_tokens:
current_chunk = test_chunk
else:
# Save current chunk and start new one
if current_chunk:
chunks.append(current_chunk)
# Start new chunk with overlap from previous chunk
if chunks and overlap_tokens > 0:
# Get last few words from previous chunk for overlap
prev_words = current_chunk.split()[-overlap_tokens//4:] # Rough word-to-token ratio
current_chunk = " ".join(prev_words) + " " + sentence
else:
current_chunk = sentence
# Add the last chunk
if current_chunk:
chunks.append(current_chunk)
return chunks
#generate system prompt
def generate_system_prompt():
return f"""
You are an expert YouTube video summarizer. Your job is to take the full transcript of a video and generate a structured, precise, and academically grounded summary.
Your output must include:
1. Title
    - Either reuse the video's title (if it is clear, accurate, and concise)
- Or generate a new, sharper, more descriptive title that best reflects the actual content covered.
2. Topic & Area of Coverage
    - Provide a 1-2 line highlight of the main topic of the video and the specific area it best covers.
- Format:
- Domain (e.g., Finance, Health, Technology, Psychology, Fitness, Productivity, etc.)
- Sub-area (e.g., investment strategies, portfolio design; training routine, best exercises; productivity systems, cognitive science insights, etc.)
3. Summary of the Video
- A structured, clear, and concise summary of the video.
- Focus only on relevant, high-value content.
- Skip fluff, tangents, product promotions, personal banter, or irrelevant side discussions.
- Include key insights, frameworks, step-by-step methods, and actionable advice.
- Where applicable, reference scientific studies, historical sources, or authoritative references (with author + year or journal if mentioned in the video, or inferred if the reference is well known).
Style & Quality Rules:
- Be extremely specific: avoid vague generalizations.
- Use precise language and structured formatting (bullet points, numbered lists, sub-sections if needed).
- Prioritize clarity and factual accuracy.
- Write as though preparing an executive briefing or academic digest.
- If the transcript includes non-relevant sections (jokes, ads, unrelated chit-chat), skip summarizing them entirely.
"""
#generate user prompt
def generate_user_prompt(website, transcript_chunk=None):
if transcript_chunk:
return f"""Here is a portion of a YouTube video transcript. Use the system instructions to generate a summary of this section.
Video Title: {website.title}
Transcript Section: {transcript_chunk}
"""
else:
return f"""Here is the transcript of a YouTube video. Use the system instructions to generate the output.
Video Title: {website.title}
Transcript: {website.transcript}
"""
#generate stitching prompt
def generate_stitching_prompt(chunk_summaries, video_title):
"""Generate prompt for stitching together chunk summaries"""
return f"""You are an expert at combining multiple summaries into a cohesive, comprehensive summary.
Video Title: {video_title}
Below are summaries of different sections of this video. Combine them into a single, well-structured summary that:
1. Maintains the original structure and quality standards
2. Eliminates redundancy between sections
3. Creates smooth transitions between topics
4. Preserves all important information
5. Maintains the academic, professional tone
6. Include examples and nuances where relevant
7. Include the citations and references where applicable
Section Summaries:
{chr(10).join([f"Section {i+1}: {summary}" for i, summary in enumerate(chunk_summaries)])}
Please provide a unified, comprehensive summary following the same format as the individual sections.
Make sure the final summary is cohesive and logical.
"""
#summarize video
def summarize_video(website, use_chunking=True, max_chunk_tokens=4000):
"""Summarize a YouTube video using OpenAI API with optional chunking for large videos"""
client = get_openai_client()
# Check if we need chunking
transcript_text = " ".join([segment.text for segment in website.transcript])
total_tokens = count_tokens(transcript_text)
print(f"Total transcript tokens: {total_tokens}")
    if total_tokens <= max_chunk_tokens or not use_chunking:
        # Single summary when the video is small or chunking is disabled
return summarize_single_chunk(website, client)
else:
# Chunked summary for large videos
return summarize_with_chunking(website, client, max_chunk_tokens)
#summarize single chunk
def summarize_single_chunk(website, client):
"""Summarize a single chunk (small video)"""
system_prompt = generate_system_prompt()
user_prompt = generate_user_prompt(website)
try:
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
max_tokens=2000,
temperature=0.3
)
return response.choices[0].message.content
except Exception as e:
return f"Error generating summary: {str(e)}"
#summarize with chunking
def summarize_with_chunking(website, client, max_chunk_tokens=4000):
"""Summarize a large video by chunking and stitching"""
print("Video is large, using chunking strategy...")
# Chunk the transcript
chunks = chunk_transcript(website.transcript, max_chunk_tokens)
print(f"Split into {len(chunks)} chunks")
# Summarize each chunk
chunk_summaries = []
system_prompt = generate_system_prompt()
for i, chunk in enumerate(chunks):
print(f"Processing chunk {i+1}/{len(chunks)}...")
user_prompt = generate_user_prompt(website, chunk)
try:
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
max_tokens=1500, # Smaller for chunks
temperature=0.3
)
chunk_summaries.append(response.choices[0].message.content)
except Exception as e:
chunk_summaries.append(f"Error in chunk {i+1}: {str(e)}")
# Stitch the summaries together
print("Stitching summaries together...")
stitching_prompt = generate_stitching_prompt(chunk_summaries, website.title)
try:
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are an expert at combining multiple summaries into a cohesive, comprehensive summary."},
{"role": "user", "content": stitching_prompt}
],
max_tokens=2000,
temperature=0.3
)
return response.choices[0].message.content
except Exception as e:
return f"Error stitching summaries: {str(e)}"
#main function
def main():
"""Main function to demonstrate usage"""
# Example usage - replace with actual YouTube URL
video_url = "https://www.youtube.com/watch?v=Xan5JnecLNA"
try:
# Create YouTube video object
print("Fetching video data...")
video = YouTubeVideo(video_url)
# Display video info
print(f"Video Title: {video.title}")
print(f"Video ID: {video.video_id}")
# Count tokens in transcript
transcript_text = " ".join([segment.text for segment in video.transcript])
total_tokens = count_tokens(transcript_text)
print(f"Total transcript tokens: {total_tokens}")
# Generate summary (automatically uses chunking if needed)
print("\nGenerating summary...")
summary = summarize_video(video, use_chunking=True, max_chunk_tokens=4000)
# Display results
print("\n" + "="*50)
print("FINAL SUMMARY")
print("="*50)
print(summary)
except Exception as e:
print(f"Error: {str(e)}")
def test_chunking():
"""Test function to demonstrate chunking with a sample transcript"""
    from types import SimpleNamespace
    # Sample transcript for testing; SimpleNamespace gives each segment a
    # .text attribute, matching what chunk_transcript expects
    sample_transcript = [
        SimpleNamespace(text="This is a sample transcript segment 1. " * 100),  # ~1000 tokens
        SimpleNamespace(text="This is a sample transcript segment 2. " * 100),  # ~1000 tokens
        SimpleNamespace(text="This is a sample transcript segment 3. " * 100),  # ~1000 tokens
        SimpleNamespace(text="This is a sample transcript segment 4. " * 100),  # ~1000 tokens
        SimpleNamespace(text="This is a sample transcript segment 5. " * 100),  # ~1000 tokens
    ]
print("Testing chunking functionality...")
chunks = chunk_transcript(sample_transcript, max_tokens=2000, overlap_tokens=100)
print(f"Original transcript: {count_tokens(' '.join([s['text'] for s in sample_transcript]))} tokens")
print(f"Number of chunks: {len(chunks)}")
for i, chunk in enumerate(chunks):
print(f"Chunk {i+1}: {count_tokens(chunk)} tokens")
if __name__ == "__main__":
# Uncomment the line below to test chunking
# test_chunking()
# Run main function
main()