diff --git a/week4/community-contributions/hopeogbons/week4 EXERCISE_hopeogbons.ipynb b/week4/community-contributions/hopeogbons/week4 EXERCISE_hopeogbons.ipynb new file mode 100644 index 0000000..bd0d030 --- /dev/null +++ b/week4/community-contributions/hopeogbons/week4 EXERCISE_hopeogbons.ipynb @@ -0,0 +1,1892 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "905062e4", + "metadata": {}, + "source": [ + "# ๐Ÿ”ถ Multi-Language Code Complexity Annotator\n", + "\n", + "## Why I Built This\n", + "\n", + "Understanding time complexity (Big-O notation) is crucial for writing efficient algorithms, identifying bottlenecks, making informed optimization decisions, and passing technical interviews.\n", + "\n", + "Analyzing complexity manually is tedious and error-prone. This tool **automates** the entire processโ€”detecting loops, recursion, and functions, then annotating code with Big-O estimates and explanations.\n", + "\n", + "---\n", + "\n", + "## What This Does\n", + "\n", + "This app analyzes source code and automatically:\n", + "- ๐Ÿ“Š Detects loops, recursion, and functions\n", + "- ๐Ÿงฎ Estimates Big-O complexity (O(1), O(n), O(nยฒ), etc.)\n", + "- ๐Ÿ’ฌ Inserts inline comments explaining the complexity\n", + "- ๐ŸŽจ Generates syntax-highlighted previews\n", + "- ๐Ÿค– **Optional:** Gets AI-powered code review from LLaMA\n", + "\n", + "**Supports 13 languages:** Python โ€ข JavaScript โ€ข TypeScript โ€ข Java โ€ข C/C++ โ€ข C# โ€ข Go โ€ข PHP โ€ข Swift โ€ข Ruby โ€ข Kotlin โ€ข Rust\n", + "\n", + "**Tech:** HuggingFace Transformers โ€ข LLaMA 3.2 โ€ข Gradio UI โ€ข Pygments โ€ข Regex Analysis\n", + "\n", + "---\n", + "\n", + "**Use Case:** Upload your code โ†’ Get instant complexity analysis โ†’ Optimize with confidence\n" + ] + }, + { + "cell_type": "markdown", + "id": "69e9876d", + "metadata": {}, + "source": [ + "## Step 1: Install Dependencies\n", + "\n", + "Installing the complete stack:\n", + "- **Transformers** - HuggingFace library for loading LLaMA models\n", + "- **Accelerate** - Fast distributed training/inference\n", + "- **Gradio** - Beautiful web interface\n", + "- **PyTorch** (CPU version) - Deep learning framework\n", + "- **BitsAndBytes** - 4/8-bit quantization for large models\n", + "- **Pygments** - Syntax highlighting engine\n", + "- **Python-dotenv** - Environment variable management\n", + "\n", + "**Note:** This installs the CPU-only version of PyTorch. 
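One illustrative GPU variant (cu121 is an assumption - pick the CUDA tag that matches your driver):\n",
+    "\n",
+    "```\n",
+    "!uv pip -q install torch --extra-index-url https://download.pytorch.org/whl/cu121\n",
+    "```\n",
+    "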
For GPU support, modify the install command.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "f035a1c5", + "metadata": {}, + "outputs": [], + "source": [ + "!uv pip -q install -U pip\n", + "!uv pip -q install transformers accelerate gradio torch --extra-index-url https://download.pytorch.org/whl/cpu\n", + "!uv pip -q install bitsandbytes pygments python-dotenv" + ] + }, + { + "cell_type": "markdown", + "id": "6ab14cd1", + "metadata": {}, + "source": [ + "## Step 2: Core Configuration & Imports\n", + "\n", + "Setting up:\n", + "- **Environment variables** to suppress progress bars (prevents Jupyter ContextVar issues)\n", + "- **Dummy tqdm** class to avoid notebook conflicts\n", + "- **Language mappings** for 13+ programming languages\n", + "- **Complexity constants** for Big-O estimation\n", + "- **Comment syntax** for each language (# vs //)\n", + "\n", + "**Key Configurations:**\n", + "- Max file size: 2 MB\n", + "- Default model: `meta-llama/Llama-3.2-1B`\n", + "- Supported file extensions and their language identifiers\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "5666a121", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import re\n", + "import io\n", + "import json\n", + "import time\n", + "import math\n", + "from dataclasses import dataclass\n", + "from typing import Tuple, List, Dict, Optional, Generator\n", + "\n", + "# Disable tqdm progress bars to avoid Jupyter ContextVar issues\n", + "os.environ[\"TRANSFORMERS_NO_ADVISORY_WARNINGS\"] = \"1\"\n", + "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", + "os.environ[\"TQDM_DISABLE\"] = \"1\" # Completely disable tqdm\n", + "\n", + "# Provide a module-level lock expected by some integrations\n", + "class _DummyLock:\n", + " def __enter__(self):\n", + " return self\n", + " def __exit__(self, *args):\n", + " pass\n", + "\n", + "class _DummyTqdm:\n", + " \"\"\"Dummy tqdm that does nothing - prevents Jupyter notebook ContextVar errors\"\"\"\n", + " def __init__(self, *args, **kwargs):\n", + " self.iterable = args[0] if args else None\n", + " self.total = kwargs.get('total', 0)\n", + " self.n = 0\n", + " def __iter__(self):\n", + " return iter(self.iterable) if self.iterable else iter([])\n", + " def __enter__(self):\n", + " return self\n", + " def __exit__(self, *args):\n", + " pass\n", + " def update(self, n=1, *args, **kwargs):\n", + " self.n += n\n", + " def close(self):\n", + " pass\n", + " def set_description(self, *args, **kwargs):\n", + " pass\n", + " def set_postfix(self, *args, **kwargs):\n", + " pass\n", + " def refresh(self, *args, **kwargs):\n", + " pass\n", + " def clear(self, *args, **kwargs):\n", + " pass\n", + " def write(self, *args, **kwargs):\n", + " pass\n", + " def reset(self, total=None):\n", + " self.n = 0\n", + " if total is not None:\n", + " self.total = total\n", + " @staticmethod\n", + " def get_lock():\n", + " \"\"\"Return a dummy lock to avoid ContextVar issues\"\"\"\n", + " return _DummyLock()\n", + " \n", + " @staticmethod\n", + " def set_lock(lock=None):\n", + " \"\"\"Dummy set_lock method - does nothing\"\"\"\n", + " pass\n", + "\n", + "def _dummy_get_lock():\n", + " \"\"\"Module-level get_lock function\"\"\"\n", + " return _DummyLock()\n", + "\n", + "def _dummy_set_lock(lock=None):\n", + " \"\"\"Module-level set_lock function - does nothing\"\"\"\n", + " pass\n", + "\n", + "# Import and immediately patch tqdm before transformers can use it\n", + "def _patch_tqdm():\n", + " \"\"\"Patch tqdm to avoid ContextVar errors in 
Jupyter\"\"\"\n", + " import sys # Import sys here since it's not available in outer scope\n", + " try:\n", + " import tqdm\n", + " import tqdm.auto\n", + " import tqdm.notebook\n", + "\n", + " # Patch classes\n", + " tqdm.tqdm = _DummyTqdm\n", + " tqdm.auto.tqdm = _DummyTqdm\n", + " tqdm.notebook.tqdm = _DummyTqdm\n", + "\n", + " # Patch module-level functions that other code might call directly\n", + " tqdm.get_lock = _dummy_get_lock\n", + " tqdm.auto.get_lock = _dummy_get_lock\n", + " tqdm.notebook.get_lock = _dummy_get_lock\n", + " tqdm.set_lock = _dummy_set_lock\n", + " tqdm.auto.set_lock = _dummy_set_lock\n", + " tqdm.notebook.set_lock = _dummy_set_lock\n", + "\n", + " # Also patch in sys.modules to catch any dynamic imports\n", + " sys.modules['tqdm'].tqdm = _DummyTqdm\n", + " sys.modules['tqdm.auto'].tqdm = _DummyTqdm\n", + " sys.modules['tqdm.notebook'].tqdm = _DummyTqdm\n", + " sys.modules['tqdm'].get_lock = _dummy_get_lock\n", + " sys.modules['tqdm.auto'].get_lock = _dummy_get_lock\n", + " sys.modules['tqdm.notebook'].get_lock = _dummy_get_lock\n", + " sys.modules['tqdm'].set_lock = _dummy_set_lock\n", + " sys.modules['tqdm.auto'].set_lock = _dummy_set_lock\n", + " sys.modules['tqdm.notebook'].set_lock = _dummy_set_lock\n", + "\n", + " except ImportError:\n", + " pass\n", + "\n", + "_patch_tqdm()\n", + "\n", + "from dotenv import load_dotenv\n", + "\n", + "SUPPORTED_EXTENSIONS = {\n", + " \".py\": \"python\",\n", + " \".js\": \"javascript\",\n", + " \".ts\": \"typescript\",\n", + " \".java\": \"java\",\n", + " \".c\": \"c\",\n", + " \".h\": \"c\",\n", + " \".cpp\": \"cpp\",\n", + " \".cc\": \"cpp\",\n", + " \".hpp\": \"cpp\",\n", + " \".cs\": \"csharp\",\n", + " \".go\": \"go\",\n", + " \".php\": \"php\",\n", + " \".swift\": \"swift\",\n", + " \".rb\": \"ruby\",\n", + " \".kt\": \"kotlin\",\n", + " \".rs\": \"rust\",\n", + "}\n", + "\n", + "COMMENT_SYNTAX = {\n", + " \"python\": \"#\",\n", + " \"javascript\": \"//\",\n", + " \"typescript\": \"//\",\n", + " \"java\": \"//\",\n", + " \"c\": \"//\",\n", + " \"cpp\": \"//\",\n", + " \"csharp\": \"//\",\n", + " \"go\": \"//\",\n", + " \"php\": \"//\",\n", + " \"swift\": \"//\",\n", + " \"ruby\": \"#\",\n", + " \"kotlin\": \"//\",\n", + " \"rust\": \"//\",\n", + "}\n", + "\n", + "MAX_FILE_SIZE_MB = 2.0\n", + "# Llama 3.2 1B - The actual model name (not -Instruct suffix)\n", + "# Requires Meta approval: https://huggingface.co/meta-llama/Llama-3.2-1B\n", + "DEFAULT_MODEL_ID = \"meta-llama/Llama-3.2-1B\"\n", + "DEVICE_HINT = \"auto\"\n", + "\n", + "# Global token storage (set in Cell 2 to avoid Jupyter ContextVar issues)\n", + "HUGGINGFACE_TOKEN = None\n", + "\n", + "# Complexity estimation constants\n", + "LOOP_KEYWORDS = [r\"\\bfor\\b\", r\"\\bwhile\\b\"]\n", + "\n", + "FUNCTION_PATTERNS = [\n", + " r\"^\\s*def\\s+([A-Za-z_]\\w*)\\s*\\(\", # Python\n", + " r\"^\\s*(?:public|private|protected)?\\s*(?:static\\s+)?[A-Za-z_<>\\[\\]]+\\s+([A-Za-z_]\\w*)\\s*\\(\", # Java/C#/C++\n", + " r\"^\\s*function\\s+([A-Za-z_]\\w*)\\s*\\(\", # JavaScript\n", + " r\"^\\s*(?:const|let|var)\\s+([A-Za-z_]\\w*)\\s*=\\s*\\(\", # JavaScript arrow/function\n", + "]\n", + "\n", + "COMPLEXITY_ORDER = {\n", + " \"O(1)\": 0,\n", + " \"O(log n)\": 1,\n", + " \"O(n)\": 2,\n", + " \"O(n log n)\": 3,\n", + " \"O(n^2)\": 4,\n", + " \"O(n^3)\": 5,\n", + "}\n", + "\n", + "RECURSION_PATTERNS = {\n", + " \"divide_conquer\": r\"\\b(n/2|n >> 1|n>>1|n\\s*//\\s*2|mid\\b)\",\n", + "}\n", + "\n", + "# HTML syntax highlighting styles (orange comments)\n", + "SYNTAX_HIGHLIGHT_CSS 
= \"\"\"\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "d17e8406", + "metadata": {}, + "source": [ + "## Step 3: Load HuggingFace Token\n", + "\n", + "Loading authentication token from `.env` file to access gated models like LLaMA.\n", + "\n", + "**Why?** Meta's LLaMA models require:\n", + "1. Accepting their license agreement on HuggingFace\n", + "2. Using an access token for authentication\n", + "\n", + "**Create a `.env` file with:**\n", + "```\n", + "HF_TOKEN=hf_your_token_here\n", + "```\n", + "\n", + "Get your token at: https://huggingface.co/settings/tokens\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "70beee01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "โœ… Hugging Face token loaded successfully from .env file\n", + " Token length: 37 characters\n" + ] + } + ], + "source": [ + "load_dotenv()\n", + "\n", + "# Load token from .env file\n", + "HF_TOKEN = os.getenv(\"HF_TOKEN\", \"\").strip()\n", + "\n", + "# Store in global variable to avoid Jupyter ContextVar issues with os.environ\n", + "global HUGGINGFACE_TOKEN\n", + "\n", + "if HF_TOKEN:\n", + " os.environ[\"HUGGING_FACE_HUB_TOKEN\"] = HF_TOKEN\n", + " HUGGINGFACE_TOKEN = HF_TOKEN # Store in global variable\n", + " print(\"โœ… Hugging Face token loaded successfully from .env file\")\n", + " print(f\" Token length: {len(HF_TOKEN)} characters\")\n", + "else:\n", + " print(\"โš ๏ธ No HF_TOKEN found in .env file. Gated models may not work.\")\n", + " HUGGINGFACE_TOKEN = None" + ] + }, + { + "cell_type": "markdown", + "id": "bd0a557e", + "metadata": {}, + "source": [ + "## Step 4: Language Detection Functions\n", + "\n", + "Two simple but essential utilities:\n", + "\n", + "1. **`detect_language(filename)`** - Detects programming language from file extension\n", + "2. **`comment_prefix_for(lang)`** - Returns the comment symbol for that language (# or //)\n", + "\n", + "These enable the tool to automatically adapt to any supported language.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "a0dbad5f", + "metadata": {}, + "outputs": [], + "source": [ + "def detect_language(filename: str) -> str:\n", + " \"\"\"\n", + " Detect programming language based on file extension.\n", + " \n", + " Args:\n", + " filename: Name of the file (must have a supported extension)\n", + " \n", + " Returns:\n", + " Language identifier string (e.g., 'python', 'javascript', etc.)\n", + " \n", + " Raises:\n", + " ValueError: If file extension is not supported\n", + " \"\"\"\n", + " ext = os.path.splitext(filename)[1].lower()\n", + " \n", + " if not ext:\n", + " supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n", + " raise ValueError(f\"File has no extension. Supported extensions: {supported}\")\n", + " \n", + " if ext not in SUPPORTED_EXTENSIONS:\n", + " supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n", + " raise ValueError(f\"Unsupported file extension '{ext}'. Supported extensions: {supported}\")\n", + " \n", + " return SUPPORTED_EXTENSIONS[ext]\n", + "\n", + "def comment_prefix_for(lang: str) -> str:\n", + " \"\"\"\n", + " Get the comment prefix for a given language.\n", + " \n", + " Args:\n", + " lang: Language identifier (e.g., 'python', 'javascript')\n", + " \n", + " Returns:\n", + " Comment prefix string (e.g., '#' or '//')\n", + " \n", + " Raises:\n", + " ValueError: If language is not supported\n", + " \"\"\"\n", + " if lang not in COMMENT_SYNTAX:\n", + " raise ValueError(f\"Unsupported language '{lang}'. 
+  {
+   "cell_type": "markdown",
+   "id": "13e0f6d8",
+   "metadata": {},
+   "source": [
+    "## Step 5: Complexity Estimation Engine\n",
+    "\n",
+    "The core analysis logic using **heuristic pattern matching**:\n",
+    "\n",
+    "**How it works:**\n",
+    "1. **Detect blocks** - Find all functions, loops, and recursion using regex patterns\n",
+    "2. **Analyze loops** - Count nesting depth (1 loop = O(n), 2 nested = O(n²), etc.)\n",
+    "3. **Analyze recursion** - Detect divide-and-conquer (O(log n)) vs exponential (O(2^n))\n",
+    "4. **Aggregate** - Functions inherit the worst complexity of their inner operations\n",
+    "\n",
+    "**Key Functions:**\n",
+    "- `detect_blocks()` - Pattern matching for code structures\n",
+    "- `analyze_recursion()` - Identifies recursive patterns\n",
+    "- `analyze_loop_complexity()` - Counts nested loops\n",
+    "- `estimate_complexity()` - Orchestrates the full analysis\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "7595dfe3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@dataclass\n",
+    "class BlockInfo:\n",
+    "    \"\"\"Represents a code block (function, loop, or recursion) with complexity information.\"\"\"\n",
+    "    line_idx: int\n",
+    "    kind: str  # \"function\" | \"loop\" | \"recursion\"\n",
+    "    name: Optional[str] = None\n",
+    "    depth: int = 0\n",
+    "    complexity: str = \"O(1)\"\n",
+    "    reason: str = \"\"\n",
+    "\n",
+    "\n",
+    "def get_indent_level(line: str) -> int:\n",
+    "    \"\"\"Calculate indentation level of a line (tabs converted to 4 spaces).\"\"\"\n",
+    "    normalized = line.replace(\"\\t\", \"    \")\n",
+    "    return len(normalized) - len(normalized.lstrip(\" \"))\n",
+    "\n",
+    "\n",
+    "def find_function_name(line: str) -> Optional[str]:\n",
+    "    \"\"\"Extract function name from a line if it contains a function declaration.\"\"\"\n",
+    "    for pattern in FUNCTION_PATTERNS:\n",
+    "        match = re.search(pattern, line)\n",
+    "        if match and match.lastindex:\n",
+    "            return match.group(1)\n",
+    "    return None\n",
+    "\n",
+    "\n",
+    "def get_block_end(block: BlockInfo, all_blocks: List[BlockInfo], total_lines: int) -> int:\n",
+    "    \"\"\"Calculate the end line index for a given block.\"\"\"\n",
+    "    end = total_lines\n",
+    "    for other in all_blocks:\n",
+    "        if other.line_idx > block.line_idx and other.depth <= block.depth:\n",
+    "            end = min(end, other.line_idx)\n",
+    "    return end\n",
+    "\n",
+    "\n",
+    "def rank_complexity(complexity: str) -> int:\n",
+    "    \"\"\"Assign a numeric rank to a complexity string for comparison.\"\"\"\n",
+    "    # Exponential growth outranks any polynomial\n",
+    "    if complexity == \"O(2^n)\":\n",
+    "        return 100\n",
+    "    \n",
+    "    # Check for polynomial complexities O(n^k)\n",
+    "    match = re.match(r\"O\\(n\\^(\\d+)\\)\", complexity)\n",
+    "    if match:\n",
+    "        return 10 + int(match.group(1))\n",
+    "    \n",
+    "    return COMPLEXITY_ORDER.get(complexity, 0)\n",
+    "\n",
+    "\n",
+    "def detect_blocks(lines: List[str], lang: str) -> List[BlockInfo]:\n",
+    "    \"\"\"Detect all code blocks (functions and loops) in the source code.\"\"\"\n",
+    "    blocks = []\n",
+    "    stack = []\n",
+    "    brace_depth = 0\n",
+    "    \n",
+    "    # Pre-compute indentation for Python\n",
+    "    indents = [get_indent_level(line) for line in lines] if lang == \"python\" else []\n",
+    "    \n",
+    "    for i, line in enumerate(lines):\n",
+    "        stripped = line.strip()\n",
+    "        \n",
+    "        # Track brace depth for non-Python languages\n",
+    "        if lang != \"python\":\n",
+    "            brace_depth += line.count(\"{\") - line.count(\"}\")\n",
+    "            brace_depth = max(0, brace_depth)\n",
+    "        \n",
+    "        # Update stack based on 
indentation/brace depth\n", + " if lang == \"python\":\n", + " while stack and indents[i] < stack[-1]:\n", + " stack.pop()\n", + " else:\n", + " while stack and brace_depth < stack[-1]:\n", + " stack.pop()\n", + " \n", + " current_depth = len(stack)\n", + " \n", + " # Detect loops\n", + " if any(re.search(pattern, stripped) for pattern in LOOP_KEYWORDS):\n", + " blocks.append(BlockInfo(\n", + " line_idx=i,\n", + " kind=\"loop\",\n", + " depth=current_depth + 1\n", + " ))\n", + " stack.append(indents[i] if lang == \"python\" else brace_depth)\n", + " \n", + " # Detect functions\n", + " func_name = find_function_name(line)\n", + " if func_name:\n", + " blocks.append(BlockInfo(\n", + " line_idx=i,\n", + " kind=\"function\",\n", + " name=func_name,\n", + " depth=current_depth + 1\n", + " ))\n", + " stack.append(indents[i] if lang == \"python\" else brace_depth)\n", + " \n", + " return blocks\n", + "\n", + "\n", + "def analyze_recursion(block: BlockInfo, blocks: List[BlockInfo], lines: List[str]) -> None:\n", + " \"\"\"Analyze a function block for recursion and update its complexity.\"\"\"\n", + " if block.kind != \"function\" or not block.name:\n", + " return\n", + " \n", + " end = get_block_end(block, blocks, len(lines))\n", + " body = \"\\n\".join(lines[block.line_idx:end])\n", + " \n", + " # Count recursive calls (subtract 1 for the function definition itself)\n", + " recursive_calls = len(re.findall(rf\"\\b{re.escape(block.name)}\\s*\\(\", body)) - 1\n", + " \n", + " if recursive_calls == 0:\n", + " return\n", + " \n", + " # Detect divide-and-conquer pattern\n", + " if re.search(RECURSION_PATTERNS[\"divide_conquer\"], body):\n", + " block.kind = \"recursion\"\n", + " block.complexity = \"O(log n)\"\n", + " block.reason = \"Divide-and-conquer recursion (problem size halves each call).\"\n", + " # Multiple recursive calls suggest exponential\n", + " elif recursive_calls >= 2:\n", + " block.kind = \"recursion\"\n", + " block.complexity = \"O(2^n)\"\n", + " block.reason = \"Multiple recursive calls per frame suggest exponential growth.\"\n", + " # Single recursive call is linear\n", + " else:\n", + " block.kind = \"recursion\"\n", + " block.complexity = \"O(n)\"\n", + " block.reason = \"Single recursive call per frame suggests linear recursion.\"\n", + "\n", + "\n", + "def analyze_loop_complexity(block: BlockInfo, all_loops: List[BlockInfo], blocks: List[BlockInfo], total_lines: int) -> None:\n", + " \"\"\"Analyze loop nesting depth and assign complexity.\"\"\"\n", + " if block.kind != \"loop\":\n", + " return\n", + " \n", + " end = get_block_end(block, blocks, total_lines)\n", + " \n", + " # Count nested loops within this loop\n", + " inner_loops = [loop for loop in all_loops \n", + " if block.line_idx < loop.line_idx < end]\n", + " \n", + " nesting_depth = 1 + len(inner_loops)\n", + " \n", + " if nesting_depth == 1:\n", + " block.complexity = \"O(n)\"\n", + " block.reason = \"Single loop scales linearly with input size.\"\n", + " elif nesting_depth == 2:\n", + " block.complexity = \"O(n^2)\"\n", + " block.reason = \"Nested loops indicate quadratic time.\"\n", + " elif nesting_depth == 3:\n", + " block.complexity = \"O(n^3)\"\n", + " block.reason = \"Three nested loops indicate cubic time.\"\n", + " else:\n", + " block.complexity = f\"O(n^{nesting_depth})\"\n", + " block.reason = f\"{nesting_depth} nested loops suggest polynomial time.\"\n", + "\n", + "\n", + "def analyze_function_complexity(block: BlockInfo, blocks: List[BlockInfo], total_lines: int) -> None:\n", + " \"\"\"Analyze overall 
function complexity based on contained blocks.\"\"\"\n", + " if block.kind != \"function\":\n", + " return\n", + " \n", + " end = get_block_end(block, blocks, total_lines)\n", + " \n", + " # Get all blocks within this function\n", + " inner_blocks = [b for b in blocks if block.line_idx < b.line_idx < end]\n", + " \n", + " # Find the worst complexity among inner blocks\n", + " worst_complexity = \"O(1)\"\n", + " for inner in inner_blocks:\n", + " if rank_complexity(inner.complexity) > rank_complexity(worst_complexity):\n", + " worst_complexity = inner.complexity\n", + " \n", + " # Special case: recursion + loop = O(n log n)\n", + " has_recursion = any(b.kind == \"recursion\" for b in inner_blocks)\n", + " has_loop = any(b.kind == \"loop\" for b in inner_blocks)\n", + " \n", + " if has_recursion and has_loop:\n", + " block.complexity = \"O(n log n)\"\n", + " block.reason = \"Combines recursion with iteration (e.g., merge sort pattern).\"\n", + " else:\n", + " block.complexity = worst_complexity\n", + " block.reason = \"Based on worst-case complexity of inner operations.\"\n", + "\n", + "\n", + "def estimate_complexity(lines: List[str], lang: str) -> List[BlockInfo]:\n", + " \"\"\"\n", + " Estimate Big-O complexity for code blocks using heuristic analysis.\n", + " \n", + " Heuristics:\n", + " - Single/nested loops: O(n), O(n^2), O(n^3), etc.\n", + " - Recursion patterns: O(n), O(log n), O(2^n)\n", + " - Function complexity: worst case of internal operations\n", + " \n", + " Args:\n", + " lines: Source code lines\n", + " lang: Programming language identifier\n", + " \n", + " Returns:\n", + " List of BlockInfo objects with complexity estimates\n", + " \"\"\"\n", + " # Step 1: Detect all blocks\n", + " blocks = detect_blocks(lines, lang)\n", + " \n", + " # Step 2: Analyze recursion in functions\n", + " for block in blocks:\n", + " analyze_recursion(block, blocks, lines)\n", + " \n", + " # Step 3: Analyze loop complexities\n", + " loops = [b for b in blocks if b.kind == \"loop\"]\n", + " for loop in loops:\n", + " analyze_loop_complexity(loop, loops, blocks, len(lines))\n", + " \n", + " # Step 4: Analyze overall function complexities\n", + " for block in blocks:\n", + " analyze_function_complexity(block, blocks, len(lines))\n", + " \n", + " return blocks" + ] + }, + { + "cell_type": "markdown", + "id": "f2a22988", + "metadata": {}, + "source": [ + "## Step 6: Code Annotation Functions\n", + "\n", + "Takes the complexity estimates and **inserts them as comments** into the source code:\n", + "\n", + "**Process:**\n", + "1. `create_annotation_comment()` - Formats Big-O annotations as language-specific comments\n", + "2. `insert_annotations()` - Inserts comments below each function/loop\n", + "3. 
`to_markdown()` - Wraps annotated code in Markdown code blocks\n", + "\n", + "**Example output:**\n", + "```python\n", + "def bubble_sort(arr):\n", + "# Big-O: O(n^2)\n", + "# Explanation: Nested loops indicate quadratic time.\n", + " for i in range(len(arr)):\n", + " for j in range(len(arr) - i - 1):\n", + " if arr[j] > arr[j + 1]:\n", + " arr[j], arr[j + 1] = arr[j + 1], arr[j]\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "2e642483", + "metadata": {}, + "outputs": [], + "source": [ + "def create_annotation_comment(block: BlockInfo, comment_prefix: str) -> List[str]:\n", + " \"\"\"\n", + " Create annotation comments for a code block.\n", + " \n", + " Args:\n", + " block: BlockInfo object containing complexity information\n", + " comment_prefix: Comment syntax for the language (e.g., '#' or '//')\n", + " \n", + " Returns:\n", + " List of comment lines to insert\n", + " \"\"\"\n", + " complexity = block.complexity or \"O(1)\"\n", + " reason = block.reason or \"Heuristic estimate based on detected structure.\"\n", + " \n", + " return [\n", + " f\"{comment_prefix} Big-O: {complexity}\",\n", + " f\"{comment_prefix} Explanation: {reason}\"\n", + " ]\n", + "\n", + "\n", + "def insert_annotations(code: str, lang: str) -> str:\n", + " \"\"\"\n", + " Insert Big-O complexity annotations into source code.\n", + " \n", + " Analyzes the code for loops, functions, and recursion, then inserts\n", + " orange-colored comment annotations (when syntax highlighted) beneath\n", + " each detected block explaining its time complexity.\n", + " \n", + " Args:\n", + " code: Source code string to annotate\n", + " lang: Programming language identifier\n", + " \n", + " Returns:\n", + " Annotated source code with Big-O comments inserted\n", + " \"\"\"\n", + " if not code.strip():\n", + " return code\n", + " \n", + " lines = code.splitlines()\n", + " blocks = estimate_complexity(lines, lang)\n", + " \n", + " if not blocks:\n", + " return code\n", + " \n", + " comment_prefix = comment_prefix_for(lang)\n", + " \n", + " # Build a map of line numbers to annotation comments\n", + " annotations: Dict[int, List[str]] = {}\n", + " for block in blocks:\n", + " line_num = block.line_idx + 1 # Convert 0-indexed to 1-indexed\n", + " comments = create_annotation_comment(block, comment_prefix)\n", + " annotations.setdefault(line_num, []).extend(comments)\n", + " \n", + " # Insert annotations after their corresponding lines\n", + " annotated_lines = []\n", + " for line_num, original_line in enumerate(lines, start=1):\n", + " annotated_lines.append(original_line)\n", + " if line_num in annotations:\n", + " annotated_lines.extend(annotations[line_num])\n", + " \n", + " return \"\\n\".join(annotated_lines)\n", + "\n", + "\n", + "def to_markdown(code: str, lang: str) -> str:\n", + " \"\"\"\n", + " Format annotated code as Markdown with syntax highlighting.\n", + " \n", + " Args:\n", + " code: Annotated source code\n", + " lang: Programming language identifier for syntax highlighting\n", + " \n", + " Returns:\n", + " Markdown-formatted code block\n", + " \"\"\"\n", + " lang_display = lang.capitalize()\n", + " \n", + " return f\"\"\"### Annotated Code ({lang_display})\n", + "\n", + "```{lang}\n", + "{code}\n", + "```\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "184ad5c1", + "metadata": {}, + "source": [ + "## Step 7: Syntax Highlighting with Pygments\n", + "\n", + "Generates beautiful, syntax-highlighted HTML previews with **orange-colored complexity comments**.\n", + "\n", + 
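"*(Aside: before the highlighting details, an end-to-end sketch of the Steps 5-6 pipeline on a recursive function - hand-traced, illustrative output, assuming the cells above have run.)*\n",
+    "\n",
+    "```python\n",
+    "snippet = \"def fact(n):\\n    if n == 0:\\n        return 1\\n    return n * fact(n - 1)\\n\"\n",
+    "print(insert_annotations(snippet, \"python\"))\n",
+    "# def fact(n):\n",
+    "# # Big-O: O(n)\n",
+    "# # Explanation: Single recursive call per frame suggests linear recursion.\n",
+    "#     if n == 0:\n",
+    "#         return 1\n",
+    "#     return n * fact(n - 1)\n",
+    "```\n",
+    "\n",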
"**Features:**\n", + "- Uses Pygments lexer for accurate language-specific highlighting\n", + "- Custom CSS to make Big-O comments stand out in orange\n", + "- Fallback to plain HTML if Pygments is unavailable\n", + "- HTML escaping for security\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "0f01d30b", + "metadata": {}, + "outputs": [], + "source": [ + "def escape_html(text: str) -> str:\n", + " \"\"\"\n", + " Escape HTML special characters for safe display.\n", + " \n", + " Args:\n", + " text: Raw text to escape\n", + " \n", + " Returns:\n", + " HTML-safe text\n", + " \"\"\"\n", + " html_escape_table = {\n", + " \"&\": \"&\",\n", + " \"<\": \"<\",\n", + " \">\": \">\",\n", + " '\"': \""\",\n", + " \"'\": \"'\",\n", + " }\n", + " return \"\".join(html_escape_table.get(c, c) for c in text)\n", + "\n", + "\n", + "def highlighted_html(code: str, lang: str) -> str:\n", + " \"\"\"\n", + " Generate syntax-highlighted HTML with orange-colored comments.\n", + " \n", + " Uses Pygments for syntax highlighting with custom CSS to make\n", + " comments appear in orange for visual emphasis of Big-O annotations.\n", + " \n", + " Args:\n", + " code: Source code to highlight\n", + " lang: Programming language identifier\n", + " \n", + " Returns:\n", + " HTML string with embedded CSS and syntax highlighting\n", + " \"\"\"\n", + " if not code.strip():\n", + " return f\"
<pre>{escape_html(code)}</pre>
\"\n", + " \n", + " try:\n", + " from pygments import highlight\n", + " from pygments.lexers import get_lexer_by_name\n", + " from pygments.formatters import HtmlFormatter\n", + " \n", + " # Get appropriate lexer for the language\n", + " lexer = get_lexer_by_name(lang)\n", + " \n", + " # Configure HTML formatter\n", + " formatter = HtmlFormatter(\n", + " nowrap=False,\n", + " full=False,\n", + " cssclass=\"codehilite\",\n", + " linenos=False\n", + " )\n", + " \n", + " # Generate highlighted HTML\n", + " html_code = highlight(code, lexer, formatter)\n", + " \n", + " return SYNTAX_HIGHLIGHT_CSS + html_code\n", + " \n", + " except ImportError:\n", + " # Pygments not available - return plain HTML\n", + " return f\"
<pre>{escape_html(code)}</pre>
\"\n", + " \n", + " except Exception as e:\n", + " # Lexer not found or other error - fallback to plain HTML\n", + " print(f\"โš ๏ธ Syntax highlighting failed for '{lang}': {e}\")\n", + " return f\"
<pre>{escape_html(code)}</pre>
\"" + ] + }, + { + "cell_type": "markdown", + "id": "36fd0454", + "metadata": {}, + "source": [ + "## Step 8: LLaMA Model Loading & Streaming\n", + "\n", + "Loading HuggingFace LLaMA models for AI-powered code review:\n", + "\n", + "**Key Features:**\n", + "- **Quantization support** - 4-bit or 8-bit to reduce memory (requires GPU)\n", + "- **Streaming generation** - See tokens appear in real-time\n", + "- **Automatic device mapping** - Uses GPU if available, CPU otherwise\n", + "- **Thread-safe streaming** - Uses `TextIteratorStreamer` for parallel generation\n", + "\n", + "**Functions:**\n", + "- `load_model()` - Downloads and initializes the LLaMA model\n", + "- `stream_generate()` - Generates text token-by-token with streaming\n", + "\n", + "**Memory Requirements:**\n", + "- **Without quantization:** ~14GB RAM (7B models) or ~26GB (13B models)\n", + "- **With 8-bit:** ~50% reduction (GPU required)\n", + "- **With 4-bit:** ~75% reduction (GPU required)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e7de6947", + "metadata": {}, + "outputs": [], + "source": [ + "# Hugging Face model imports\n", + "try:\n", + " from transformers import (\n", + " AutoModelForCausalLM,\n", + " AutoTokenizer,\n", + " BitsAndBytesConfig,\n", + " TextIteratorStreamer,\n", + " pipeline\n", + " )\n", + " import threading\n", + " TRANSFORMERS_AVAILABLE = True\n", + "except ImportError:\n", + " TRANSFORMERS_AVAILABLE = False\n", + "\n", + "# Global model state\n", + "MODEL_PIPELINE = None\n", + "TOKENIZER = None\n", + "\n", + "\n", + "def get_quantization_config(load_in_4bit: bool, load_in_8bit: bool) -> Optional[BitsAndBytesConfig]:\n", + " \"\"\"\n", + " Create quantization configuration for model loading.\n", + " \n", + " Args:\n", + " load_in_4bit: Whether to use 4-bit quantization\n", + " load_in_8bit: Whether to use 8-bit quantization\n", + " \n", + " Returns:\n", + " BitsAndBytesConfig object or None if quantization not requested/available\n", + " \n", + " Raises:\n", + " RuntimeError: If quantization requested but CUDA not available\n", + " \"\"\"\n", + " if not (load_in_4bit or load_in_8bit):\n", + " return None\n", + " \n", + " # Check if CUDA is available\n", + " try:\n", + " import torch\n", + " if not torch.cuda.is_available():\n", + " raise RuntimeError(\n", + " \"Quantization requires CUDA (NVIDIA GPU).\\n\\n\"\n", + " \"You are running on CPU/Mac and have requested quantization.\\n\"\n", + " \"Options:\\n\"\n", + " \" 1. Disable both 4-bit and 8-bit quantization to run on CPU\\n\"\n", + " \" (requires ~26GB RAM for 13B models, ~14GB for 7B models)\\n\"\n", + " \" 2. Use a GPU with CUDA support\\n\"\n", + " \" 3. 
Try smaller models like gpt2 or microsoft/DialoGPT-medium\\n\\n\"\n", + " \"Note: Quantization significantly reduces memory usage but requires GPU.\"\n", + " )\n", + " except ImportError:\n", + " pass\n", + " \n", + " try:\n", + " return BitsAndBytesConfig(load_in_4bit=load_in_4bit, load_in_8bit=load_in_8bit)\n", + " except Exception as e:\n", + " raise RuntimeError(f\"Failed to create quantization config: {e}\")\n", + "\n", + "\n", + "def load_model(\n", + " model_id: str = DEFAULT_MODEL_ID,\n", + " device_map: str = DEVICE_HINT,\n", + " load_in_8bit: bool = False,\n", + " load_in_4bit: bool = False\n", + ") -> None:\n", + " \"\"\"\n", + " Load a Hugging Face LLaMA-family model for text generation.\n", + " \n", + " Supports optional 4-bit or 8-bit quantization to reduce memory usage.\n", + " Model is loaded into global MODEL_PIPELINE and TOKENIZER variables.\n", + " \n", + " Args:\n", + " model_id: Hugging Face model identifier\n", + " device_map: Device mapping strategy ('auto', 'cpu', 'cuda', etc.)\n", + " load_in_8bit: Enable 8-bit quantization\n", + " load_in_4bit: Enable 4-bit quantization\n", + " \n", + " Raises:\n", + " ImportError: If transformers is not installed\n", + " Exception: If model loading fails\n", + " \"\"\"\n", + " global MODEL_PIPELINE, TOKENIZER\n", + " \n", + " if not TRANSFORMERS_AVAILABLE:\n", + " raise ImportError(\n", + " \"Transformers library is not installed. \"\n", + " \"Please run the installation cell and restart the kernel.\"\n", + " )\n", + " \n", + " # Use global variable instead of os.environ to avoid Jupyter ContextVar issues\n", + " global HUGGINGFACE_TOKEN\n", + " hf_token = HUGGINGFACE_TOKEN if HUGGINGFACE_TOKEN else None\n", + " \n", + " if hf_token:\n", + " print(f\" Using HuggingFace token: {hf_token[:10]}...{hf_token[-4:]}\")\n", + " else:\n", + " print(\" No HuggingFace token available (may fail for gated models)\")\n", + " \n", + " # Configure quantization if requested\n", + " quant_config = get_quantization_config(load_in_4bit, load_in_8bit)\n", + " \n", + " print(f\"๐Ÿ”„ Loading model: {model_id}\")\n", + " print(f\" Device map: {device_map}\")\n", + " print(f\" Quantization: 8-bit={load_in_8bit}, 4-bit={load_in_4bit}\")\n", + " print(f\" This may take 2-5 minutes... 
please wait...\")\n", + "\n", + " # Final tqdm patch before model loading (catches any missed imports)\n", + " _patch_tqdm()\n", + "\n", + " try:\n", + " # Suppress transformers warnings\n", + " from transformers.utils import logging\n", + " logging.set_verbosity_error()\n", + " \n", + " TOKENIZER = AutoTokenizer.from_pretrained(\n", + " model_id,\n", + " token=hf_token,\n", + " trust_remote_code=False\n", + " )\n", + " \n", + " print(\" โœ“ Tokenizer loaded\")\n", + " \n", + " # Load model\n", + " model = AutoModelForCausalLM.from_pretrained(\n", + " model_id,\n", + " device_map=device_map,\n", + " quantization_config=quant_config,\n", + " token=hf_token,\n", + " trust_remote_code=False,\n", + " low_cpu_mem_usage=True\n", + " )\n", + " \n", + " print(\" โœ“ Model loaded into memory\")\n", + " \n", + " # Create pipeline\n", + " MODEL_PIPELINE = pipeline(\n", + " \"text-generation\",\n", + " model=model,\n", + " tokenizer=TOKENIZER\n", + " )\n", + " \n", + " print(\"โœ… Model loaded successfully\")\n", + " \n", + " except Exception as e:\n", + " print(f\"โŒ Model loading failed: {e}\")\n", + " print(\"\\n๐Ÿ’ก Troubleshooting:\")\n", + " print(\" โ€ข Gated models require HuggingFace approval and token\")\n", + " print(\" โ€ข Large models (13B+) need quantization OR ~26GB+ RAM\")\n", + " print(\" โ€ข Quantization requires NVIDIA GPU with CUDA\")\n", + " print(\"\\n๐Ÿ’ก Models that work on CPU/Mac (no GPU needed):\")\n", + " print(\" โ€ข gpt2 (~500MB RAM)\")\n", + " print(\" โ€ข microsoft/DialoGPT-medium (~1GB RAM)\")\n", + " print(\" โ€ข meta-llama/Llama-2-7b-chat-hf (~14GB RAM, needs approval)\")\n", + " print(\"\\nBrowse more models: https://huggingface.co/models?pipeline_tag=text-generation\")\n", + " MODEL_PIPELINE = None\n", + " TOKENIZER = None\n", + " raise\n", + "\n", + "\n", + "def stream_generate(\n", + " prompt: str,\n", + " max_new_tokens: int = 256,\n", + " temperature: float = 0.7\n", + ") -> Generator[str, None, None]:\n", + " \"\"\"\n", + " Stream generated tokens from the loaded model.\n", + " \n", + " Uses TextIteratorStreamer for real-time token streaming.\n", + " Falls back to non-streaming generation if streaming is unavailable.\n", + " \n", + " Args:\n", + " prompt: Input text prompt for generation\n", + " max_new_tokens: Maximum number of tokens to generate\n", + " temperature: Sampling temperature (0.0 = deterministic, higher = more random)\n", + " \n", + " Yields:\n", + " Generated text tokens as they are produced\n", + " \"\"\"\n", + " # Validate model is loaded\n", + " if MODEL_PIPELINE is None:\n", + " yield \"โš ๏ธ Model not loaded. 
Please run load_model() first.\"\n", + " return\n", + " \n", + " # Validate inputs\n", + " if not prompt.strip():\n", + " yield \"โš ๏ธ Empty prompt provided.\"\n", + " return\n", + " \n", + " try:\n", + " # Create streamer\n", + " streamer = TextIteratorStreamer(\n", + " MODEL_PIPELINE.tokenizer,\n", + " skip_prompt=True,\n", + " skip_special_tokens=True\n", + " )\n", + " \n", + " # Prepare generation arguments\n", + " generation_kwargs = {\n", + " \"text_inputs\": prompt,\n", + " \"streamer\": streamer,\n", + " \"max_new_tokens\": max_new_tokens,\n", + " \"do_sample\": True,\n", + " \"temperature\": temperature,\n", + " }\n", + " \n", + " # Run generation in separate thread\n", + " def generate_in_thread():\n", + " try:\n", + " MODEL_PIPELINE(**generation_kwargs)\n", + " except Exception as e:\n", + " print(f\"โš ๏ธ Generation error: {e}\")\n", + " \n", + " thread = threading.Thread(target=generate_in_thread, daemon=True)\n", + " thread.start()\n", + " \n", + " # Stream tokens as they're generated\n", + " for token in streamer:\n", + " yield token\n", + " \n", + " except Exception as e:\n", + " # Fallback to non-streaming generation\n", + " print(f\"โš ๏ธ Streaming failed ({e}), falling back to non-streaming generation\")\n", + " try:\n", + " result = MODEL_PIPELINE(\n", + " prompt,\n", + " max_new_tokens=max_new_tokens,\n", + " do_sample=True,\n", + " temperature=temperature\n", + " )\n", + " yield result[0][\"generated_text\"]\n", + " except Exception as fallback_error:\n", + " yield f\"โŒ Generation failed: {fallback_error}\"" + ] + }, + { + "cell_type": "markdown", + "id": "a9b51cd6", + "metadata": {}, + "source": [ + "## Step 9: File Processing Pipeline\n", + "\n", + "The main orchestration function that ties everything together:\n", + "\n", + "**Workflow:**\n", + "1. **Read file** - Validate size (<2MB) and decode to text\n", + "2. **Detect language** - From file extension\n", + "3. **Analyze code** - Estimate complexity using heuristics\n", + "4. **Annotate** - Insert Big-O comments\n", + "5. **Generate previews** - Create Markdown and HTML views\n", + "6. 
**Optional AI review** - Send to LLaMA for deeper analysis\n", + "\n", + "**Functions:**\n", + "- `read_file_content()` - Loads and validates uploaded files\n", + "- `create_review_prompt()` - Formats code for LLM analysis\n", + "- `generate_model_analysis()` - Gets AI-powered insights\n", + "- `process_code_file()` - Main orchestrator\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "766f6636", + "metadata": {}, + "outputs": [], + "source": [ + "def read_file_content(fileobj) -> Tuple[str, str, float]:\n", + " \"\"\"\n", + " Read and decode file content from a file-like object.\n", + " \n", + " Args:\n", + " fileobj: File-like object (from Gradio upload or file handle)\n", + " \n", + " Returns:\n", + " Tuple of (filename, content_text, size_in_mb)\n", + " \n", + " Raises:\n", + " ValueError: If file is too large\n", + " \"\"\"\n", + " # Get filename, ensuring we have a valid name\n", + " filename = getattr(fileobj, \"name\", None)\n", + " if not filename:\n", + " raise ValueError(\"Uploaded file must have a valid filename with extension\")\n", + " \n", + " # Read raw content\n", + " raw = fileobj.read()\n", + " \n", + " # Decode to text and calculate size\n", + " if isinstance(raw, bytes):\n", + " text = raw.decode(\"utf-8\", errors=\"replace\")\n", + " size_mb = len(raw) / (1024 * 1024)\n", + " else:\n", + " text = str(raw)\n", + " size_mb = len(text.encode(\"utf-8\")) / (1024 * 1024)\n", + " \n", + " # Validate file size\n", + " if size_mb > MAX_FILE_SIZE_MB:\n", + " raise ValueError(\n", + " f\"File too large: {size_mb:.2f} MB. \"\n", + " f\"Maximum allowed size is {MAX_FILE_SIZE_MB} MB.\"\n", + " )\n", + " \n", + " return filename, text, size_mb\n", + "\n", + "\n", + "def create_review_prompt(code: str, lang: str, max_code_chars: int = 4000) -> str:\n", + " \"\"\"\n", + " Create a prompt for LLM code review.\n", + " \n", + " Args:\n", + " code: Annotated source code\n", + " lang: Programming language\n", + " max_code_chars: Maximum characters to include in prompt\n", + " \n", + " Returns:\n", + " Formatted prompt string\n", + " \"\"\"\n", + " # Truncate code if necessary to fit token limits\n", + " code_snippet = code[:max_code_chars]\n", + " if len(code) > max_code_chars:\n", + " code_snippet += \"\\n... (code truncated for analysis)\"\n", + " \n", + " return f\"\"\"You are a senior code reviewer specializing in performance analysis.\n", + "\n", + "Language: {lang}\n", + "\n", + "Task: Analyze the following annotated code and provide:\n", + "1. Validation of the Big-O annotations\n", + "2. Identification of performance bottlenecks\n", + "3. Specific optimization suggestions (max 8 bullet points)\n", + "4. 
Any algorithmic improvements\n", + "\n", + "--- CODE START ---\n", + "{code_snippet}\n", + "--- CODE END ---\n", + "\n", + "Provide a concise, actionable analysis:\"\"\"\n", + "\n", + "\n", + "def generate_model_analysis(code: str, lang: str, model_params: Dict) -> str:\n", + " \"\"\"\n", + " Generate LLM-powered code complexity analysis.\n", + " \n", + " Args:\n", + " code: Annotated source code\n", + " lang: Programming language\n", + " model_params: Parameters for model generation (max_new_tokens, temperature)\n", + " \n", + " Returns:\n", + " Generated analysis text or error message\n", + " \"\"\"\n", + " # Check if model is loaded\n", + " if MODEL_PIPELINE is None:\n", + " return \"โš ๏ธ **Model not loaded.** Please click '๐Ÿ”„ Load Model' button first before requesting AI analysis.\"\n", + " \n", + " try:\n", + " prompt = create_review_prompt(code, lang)\n", + " \n", + " # Stream and collect tokens\n", + " tokens = []\n", + " for token in stream_generate(prompt, **model_params):\n", + " tokens.append(token)\n", + " \n", + " result = \"\".join(tokens)\n", + " return result if result.strip() else \"_(No analysis generated)_\"\n", + " \n", + " except Exception as e:\n", + " return f\"โš ๏ธ Model analysis failed: {e}\"\n", + "\n", + "\n", + "def process_code_file(\n", + " fileobj,\n", + " ask_model: bool,\n", + " model_params: Dict\n", + ") -> Tuple[str, str, str, str, str]:\n", + " \"\"\"\n", + " Process uploaded code file: detect language, annotate complexity, generate HTML preview.\n", + " \n", + " This is the main orchestration function that:\n", + " 1. Reads and validates the uploaded file\n", + " 2. Detects programming language from extension\n", + " 3. Analyzes and annotates code with Big-O complexity\n", + " 4. Generates Markdown and HTML previews\n", + " 5. 
Optionally generates LLM-powered code review\n", + " \n", + " Args:\n", + " fileobj: File-like object from Gradio file upload\n", + " ask_model: Whether to generate LLM analysis\n", + " model_params: Dict with 'max_new_tokens' and 'temperature' for generation\n", + " \n", + " Returns:\n", + " Tuple of (language, annotated_code, markdown_preview, html_preview, model_commentary)\n", + " \n", + " Raises:\n", + " ValueError: If file is invalid, too large, or has unsupported extension\n", + " \"\"\"\n", + " # Step 1: Read and validate file\n", + " filename, code_text, file_size_mb = read_file_content(fileobj)\n", + " \n", + " print(f\"๐Ÿ“„ Processing: {filename} ({file_size_mb:.2f} MB)\")\n", + " \n", + " # Step 2: Detect language from file extension\n", + " lang = detect_language(filename)\n", + " \n", + " print(f\"๐Ÿ” Detected language: {lang}\")\n", + " \n", + " # Step 3: Analyze and annotate code\n", + " annotated_code = insert_annotations(code_text, lang)\n", + " \n", + " # Step 4: Generate preview formats\n", + " markdown_preview = to_markdown(annotated_code, lang)\n", + " html_preview = highlighted_html(annotated_code, lang)\n", + " \n", + " # Step 5: Optionally generate model analysis\n", + " model_commentary = \"\"\n", + " if ask_model:\n", + " print(\"๐Ÿค– Generating model analysis...\")\n", + " model_commentary = generate_model_analysis(annotated_code, lang, model_params)\n", + " \n", + " return lang, annotated_code, markdown_preview, html_preview, model_commentary" + ] + }, + { + "cell_type": "markdown", + "id": "6060f778", + "metadata": {}, + "source": [ + "## Step 10: Build the Gradio Interface\n", + "\n", + "Creating a professional two-column UI with:\n", + "\n", + "**Left Column (Input):**\n", + "- File uploader (filters to code files only)\n", + "- AI review toggle\n", + "- Model configuration (ID, quantization options)\n", + "- Temperature and max tokens sliders\n", + "- Load model & process buttons\n", + "\n", + "**Right Column (Output):**\n", + "- Detected language display\n", + "- Syntax-highlighted code preview (orange comments!)\n", + "- AI code review (if enabled)\n", + "- Download buttons for annotated code + Markdown\n", + "\n", + "**Event Handlers:**\n", + "- `handle_model_loading()` - Shows live progress during model download\n", + "- `handle_file_processing()` - Processes uploaded files and updates all outputs\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "85691712", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Gradio Blocks instance: 2 backend functions\n", + "-------------------------------------------\n", + "fn_index=0\n", + " inputs:\n", + " |-\n", + " |-\n", + " |-\n", + " outputs:\n", + " |-\n", + "fn_index=1\n", + " inputs:\n", + " |-\n", + " |-\n", + " |-\n", + " |-\n", + " outputs:\n", + " |-\n", + " |-\n", + " |-\n", + " |-\n", + " |-" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Gradio UI imports\n", + "try:\n", + " import gradio as gr\n", + " GRADIO_AVAILABLE = True\n", + "except ImportError:\n", + " GRADIO_AVAILABLE = False\n", + "\n", + "import tempfile\n", + "from pathlib import Path\n", + "\n", + "\n", + "def get_file_extension_for_language(lang: str) -> str:\n", + " \"\"\"\n", + " Get the primary file extension for a given language.\n", + " \n", + " Args:\n", + " lang: Language identifier\n", + " \n", + " Returns:\n", + " File extension with dot (e.g., '.py', '.js')\n", + " \"\"\"\n", + " # Create reverse mapping from language to 
primary extension\n", + " lang_to_ext = {\n", + " \"python\": \".py\",\n", + " \"javascript\": \".js\",\n", + " \"typescript\": \".ts\",\n", + " \"java\": \".java\",\n", + " \"c\": \".c\",\n", + " \"cpp\": \".cpp\",\n", + " \"csharp\": \".cs\",\n", + " \"go\": \".go\",\n", + " \"php\": \".php\",\n", + " \"swift\": \".swift\",\n", + " \"ruby\": \".rb\",\n", + " \"kotlin\": \".kt\",\n", + " \"rust\": \".rs\",\n", + " }\n", + " return lang_to_ext.get(lang, \".txt\")\n", + "\n", + "\n", + "def save_outputs_to_temp(annotated_code: str, markdown: str, lang: str) -> Tuple[str, str]:\n", + " \"\"\"\n", + " Save annotated code and markdown to temporary files for download.\n", + " \n", + " Args:\n", + " annotated_code: Annotated source code\n", + " markdown: Markdown preview\n", + " lang: Programming language\n", + " \n", + " Returns:\n", + " Tuple of (source_file_path, markdown_file_path)\n", + " \"\"\"\n", + " # Get appropriate extension\n", + " ext = get_file_extension_for_language(lang)\n", + " \n", + " # Create temporary files\n", + " source_file = tempfile.NamedTemporaryFile(\n", + " mode='w',\n", + " suffix=ext,\n", + " prefix='annotated_',\n", + " delete=False,\n", + " encoding='utf-8'\n", + " )\n", + " \n", + " markdown_file = tempfile.NamedTemporaryFile(\n", + " mode='w',\n", + " suffix='.md',\n", + " prefix='annotated_',\n", + " delete=False,\n", + " encoding='utf-8'\n", + " )\n", + " \n", + " # Write content\n", + " source_file.write(annotated_code)\n", + " source_file.close()\n", + " \n", + " markdown_file.write(markdown)\n", + " markdown_file.close()\n", + " \n", + " return source_file.name, markdown_file.name\n", + "\n", + "\n", + "def handle_model_loading(model_id: str, load_in_8bit: bool, load_in_4bit: bool):\n", + " \"\"\"\n", + " Handle model loading with error handling and live progress updates for Gradio UI.\n", + " Yields status updates with elapsed time.\n", + " \n", + " Args:\n", + " model_id: Hugging Face model identifier\n", + " load_in_8bit: Whether to use 8-bit quantization\n", + " load_in_4bit: Whether to use 4-bit quantization\n", + " \n", + " Yields:\n", + " Status message updates with progress\n", + " \"\"\"\n", + " import threading\n", + " import time\n", + " \n", + " # Immediate status update - clears old text\n", + " yield \"๐Ÿ”„ **Step 1/4:** Initializing... 
(0s elapsed)\"\n", + " \n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"๐Ÿ”„ Starting model load: {model_id.strip()}\")\n", + " print(f\"{'='*60}\\n\")\n", + " \n", + " start_time = time.time()\n", + " loading_complete = False\n", + " error_message = None\n", + " \n", + " # Function to load model in background thread\n", + " def load_in_background():\n", + " nonlocal loading_complete, error_message\n", + " try:\n", + " load_model(\n", + " model_id.strip(),\n", + " device_map=DEVICE_HINT,\n", + " load_in_8bit=load_in_8bit,\n", + " load_in_4bit=load_in_4bit\n", + " )\n", + " loading_complete = True\n", + " except Exception as e:\n", + " error_message = str(e)\n", + " loading_complete = True\n", + " \n", + " # Start loading in background thread\n", + " thread = threading.Thread(target=load_in_background, daemon=True)\n", + " thread.start()\n", + " \n", + " # Progress stages with approximate timing\n", + " stages = [\n", + " (0, \"๐Ÿ”„ **Step 1/4:** Connecting to HuggingFace...\"),\n", + " (5, \"๐Ÿ”„ **Step 2/4:** Downloading tokenizer...\"),\n", + " (15, \"๐Ÿ”„ **Step 3/4:** Loading model weights...\"),\n", + " (30, \"๐Ÿ”„ **Step 4/4:** Finalizing model setup...\"),\n", + " ]\n", + " \n", + " stage_idx = 0\n", + " last_update = time.time()\n", + " \n", + " # Show progress updates while loading\n", + " while not loading_complete:\n", + " elapsed = int(time.time() - start_time)\n", + " \n", + " # Move to next stage if enough time passed\n", + " if stage_idx < len(stages) - 1 and elapsed >= stages[stage_idx + 1][0]:\n", + " stage_idx += 1\n", + " \n", + " # Update every 2 seconds\n", + " if time.time() - last_update >= 2:\n", + " current_stage = stages[stage_idx][1]\n", + " yield f\"{current_stage} ({elapsed}s elapsed)\"\n", + " last_update = time.time()\n", + " \n", + " time.sleep(0.5) # Check every 0.5 seconds\n", + " \n", + " # Final result\n", + " elapsed = int(time.time() - start_time)\n", + " if error_message:\n", + " yield f\"โŒ **Model loading failed** ({elapsed}s elapsed)\\n\\n{error_message}\"\n", + " else:\n", + " yield f\"โœ… **Model loaded successfully!** ({elapsed}s total)\"\n", + "\n", + "\n", + "def handle_file_processing(\n", + " file,\n", + " ask_model_flag: bool,\n", + " temperature: float,\n", + " max_new_tokens: int\n", + ") -> Tuple[str, str, str, Optional[str], Optional[str]]:\n", + " \"\"\"\n", + " Handle file processing workflow for Gradio UI.\n", + " \n", + " Args:\n", + " file: Gradio file upload object\n", + " ask_model_flag: Whether to generate model commentary\n", + " temperature: Generation temperature\n", + " max_new_tokens: Max tokens to generate\n", + " \n", + " Returns:\n", + " Tuple of (language, html_preview, model_commentary, source_path, markdown_path)\n", + " \"\"\"\n", + " # Validate file upload\n", + " if file is None:\n", + " return \"\", \"โš ๏ธ Please upload a code file.\", \"\", None, None\n", + " \n", + " # Check if model is required but not loaded\n", + " if ask_model_flag and MODEL_PIPELINE is None:\n", + " return \"\", \"\", \"โš ๏ธ **Model not loaded.** Please click '๐Ÿ”„ Load Model' button first before requesting AI analysis.\", None, None\n", + " \n", + " try:\n", + " # Gradio provides file as a path string or file object\n", + " if isinstance(file, str):\n", + " file_path = file\n", + " elif hasattr(file, 'name'):\n", + " file_path = file.name\n", + " else:\n", + " return \"\", \"
โŒ Invalid file upload format
\", \"\", None, None\n", + " \n", + " # Open and process the file\n", + " with open(file_path, 'rb') as f:\n", + " # Prepare model parameters\n", + " model_params = {\n", + " \"max_new_tokens\": int(max_new_tokens),\n", + " \"temperature\": float(temperature)\n", + " }\n", + " \n", + " # Process the code file\n", + " lang, annotated_code, markdown_preview, html_preview, model_commentary = process_code_file(\n", + " f,\n", + " ask_model_flag,\n", + " model_params\n", + " )\n", + " \n", + " # Save outputs to temporary files for download\n", + " source_path, markdown_path = save_outputs_to_temp(annotated_code, markdown_preview, lang)\n", + " \n", + " # Format model commentary\n", + " commentary_display = model_commentary if model_commentary else \"_(No model analysis generated)_\"\n", + " \n", + " return lang, html_preview, commentary_display, source_path, markdown_path\n", + " \n", + " except ValueError as e:\n", + " # User-facing errors (file too large, unsupported extension, etc.)\n", + " return \"\", f\"
โš ๏ธ {str(e)}
\", \"\", None, None\n", + " except Exception as e:\n", + " # Unexpected errors\n", + " import traceback\n", + " error_detail = traceback.format_exc()\n", + " print(f\"Error processing file: {error_detail}\")\n", + " return \"\", f\"
โŒ Processing failed: {str(e)}
\", \"\", None, None\n", + "\n", + "\n", + "def build_ui():\n", + " \"\"\"\n", + " Build the Gradio user interface for the Code Complexity Annotator.\n", + " \n", + " Returns:\n", + " Gradio Blocks interface\n", + " \"\"\"\n", + " if not GRADIO_AVAILABLE:\n", + " raise ImportError(\n", + " \"Gradio is not installed. Please run the installation cell \"\n", + " \"and restart the kernel.\"\n", + " )\n", + " \n", + " # Custom CSS for better UI\n", + " custom_css = \"\"\"\n", + " footer {visibility: hidden}\n", + " .gradio-container {font-family: 'Inter', sans-serif}\n", + " \"\"\"\n", + " \n", + " with gr.Blocks(css=custom_css, title=\"Code Complexity Annotator\") as demo:\n", + " # Header\n", + " gr.Markdown(\"# ๐Ÿ”ถ Multi-Language Code Complexity Annotator\")\n", + " gr.Markdown(\n", + " \"Upload code โ†’ Detect language โ†’ Auto-annotate with Big-O complexity โ†’ \"\n", + " \"Preview with syntax highlighting โ†’ Download results. \"\n", + " \"Optional: Get AI-powered code review from LLaMA.\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " # Left column: Input controls\n", + " with gr.Column(scale=2):\n", + " gr.Markdown(\"### ๐Ÿ“ค Upload & Settings\")\n", + " \n", + " file_upload = gr.File(\n", + " label=\"Upload Code File\",\n", + " file_count=\"single\",\n", + " file_types=[ext for ext in SUPPORTED_EXTENSIONS.keys()]\n", + " )\n", + " \n", + " ask_model = gr.Checkbox(\n", + " label=\"๐Ÿค– Generate AI Code Review\",\n", + " value=True,\n", + " info=\"โš ๏ธ Requires model to be loaded first using the button below\"\n", + " )\n", + " \n", + " gr.Markdown(\"### ๐Ÿง  Model Configuration\")\n", + " \n", + " model_id = gr.Textbox(\n", + " label=\"Hugging Face Model ID\",\n", + " value=DEFAULT_MODEL_ID,\n", + " placeholder=\"meta-llama/Llama-3.2-1B\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " load_8bit = gr.Checkbox(\n", + " label=\"8-bit Quantization\",\n", + " value=False,\n", + " info=\"โš ๏ธ Requires CUDA/GPU (reduces memory by ~50%)\"\n", + " )\n", + " load_4bit = gr.Checkbox(\n", + " label=\"4-bit Quantization\",\n", + " value=False,\n", + " info=\"โš ๏ธ Requires CUDA/GPU (reduces memory by ~75%, lower quality)\"\n", + " )\n", + " \n", + " temperature = gr.Slider(\n", + " label=\"Temperature\",\n", + " minimum=0.0,\n", + " maximum=1.5,\n", + " value=0.7,\n", + " step=0.05,\n", + " info=\"Lower = more deterministic, Higher = more creative\"\n", + " )\n", + " \n", + " max_tokens = gr.Slider(\n", + " label=\"Max New Tokens\",\n", + " minimum=16,\n", + " maximum=1024,\n", + " value=256,\n", + " step=16,\n", + " info=\"Maximum length of generated review\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " load_model_btn = gr.Button(\"๐Ÿ”„ Load Model\", variant=\"secondary\")\n", + " process_btn = gr.Button(\"๐Ÿš€ Process & Annotate\", variant=\"primary\")\n", + " \n", + " model_status = gr.Markdown(\"โšช **Status:** Model not loaded\")\n", + " \n", + " # Right column: Output displays\n", + " with gr.Column(scale=3):\n", + " gr.Markdown(\"### ๐Ÿ“Š Results\")\n", + " \n", + " detected_lang = gr.Textbox(\n", + " label=\"Detected Language\",\n", + " interactive=False,\n", + " placeholder=\"Upload a file to detect language\"\n", + " )\n", + " \n", + " html_preview = gr.HTML(\n", + " label=\"Code Preview (Orange = Complexity Annotations)\",\n", + " value=\"Upload and process a file to see preview...\"\n", + " )\n", + " \n", + " model_output = gr.Markdown(\n", + " label=\"๐Ÿค– AI Code Review\",\n", + " value=\"*Enable 'Generate AI Code Review' and process a file to see analysis...*\"\n", + 
" )\n", + " \n", + " gr.Markdown(\"### ๐Ÿ’พ Downloads\")\n", + " \n", + " with gr.Row():\n", + " download_source = gr.File(\n", + " label=\"Annotated Source Code\",\n", + " interactive=False\n", + " )\n", + " download_markdown = gr.File(\n", + " label=\"Markdown Preview\",\n", + " interactive=False\n", + " )\n", + " \n", + " # Event handlers\n", + " load_model_btn.click(\n", + " fn=handle_model_loading,\n", + " inputs=[model_id, load_8bit, load_4bit],\n", + " outputs=[model_status],\n", + " show_progress=\"full\" # Show clear loading indicator\n", + " )\n", + " \n", + " process_btn.click(\n", + " fn=handle_file_processing,\n", + " inputs=[file_upload, ask_model, temperature, max_tokens],\n", + " outputs=[detected_lang, html_preview, model_output, download_source, download_markdown]\n", + " )\n", + " \n", + " return demo\n", + "\n", + "\n", + "# Build and display the interface\n", + "demo = build_ui()\n", + "demo" + ] + }, + { + "cell_type": "markdown", + "id": "3608ab3c", + "metadata": {}, + "source": [ + "## Step 11: Launch the App\n", + "\n", + "Starting the Gradio server with auto-browser launch.\n", + "\n", + "**Options:**\n", + "- `share=False` - Local only (set to True for public Gradio link)\n", + "- `inbrowser=True` - Automatically opens in your default browser\n", + "- `show_error=True` - Displays detailed error messages in the UI\n", + "\n", + "The app will be available at: `http://127.0.0.1:7861`\n", + "\n", + "---\n", + "\n", + "## ๐Ÿ’ก How to Use\n", + "\n", + "### Without AI Review (No Model Needed):\n", + "1. **Upload** a code file (.py, .js, .java, etc.)\n", + "2. **Uncheck** \"Generate AI Code Review\"\n", + "3. **Click** \"๐Ÿš€ Process & Annotate\"\n", + "4. **View** syntax-highlighted code with Big-O annotations\n", + "5. **Download** the annotated source + Markdown\n", + "\n", + "### With AI Review (Requires Model):\n", + "1. **Click** \"๐Ÿ”„ Load Model\" (wait 2-5 minutes for first download)\n", + "2. **Upload** your code file\n", + "3. **Check** \"Generate AI Code Review\"\n", + "4. **Adjust** temperature/tokens if needed\n", + "5. **Click** \"๐Ÿš€ Process & Annotate\"\n", + "6. **Read** AI-generated optimization suggestions\n", + "\n", + "---\n", + "\n", + "## ๐ŸŽฏ Supported Languages\n", + "\n", + "Python โ€ข JavaScript โ€ข TypeScript โ€ข Java โ€ข C โ€ข C++ โ€ข C# โ€ข Go โ€ข PHP โ€ข Swift โ€ข Ruby โ€ข Kotlin โ€ข Rust\n", + "\n", + "---\n", + "\n", + "## ๐Ÿง  Model Options\n", + "\n", + "**Recommended for CPU/Mac:**\n", + "- `meta-llama/Llama-3.2-1B` (Default, ~1GB, requires HF approval)\n", + "- `gpt2` (No approval needed, ~500MB)\n", + "- `microsoft/DialoGPT-medium` (~1GB)\n", + "\n", + "**For GPU users:**\n", + "- Any model with 8-bit or 4-bit quantization enabled\n", + "- `meta-llama/Llama-2-7b-chat-hf` (requires approval)\n", + "\n", + "---\n", + "\n", + "**Note:** First model load downloads weights (~1-14GB depending on model). Subsequent runs load from cache.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "eec78f72", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7861\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "๐Ÿ“„ Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/d8fe7d241f82ae93c8cf07e99823e6db91d20185c411ded7454eb7a0d89174a4/3 Simple Python Functions with Different Time Complexities.py (0.00 MB)\n", + "๐Ÿ” Detected language: python\n", + "๐Ÿ“„ Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/a2b7a4fdfb5e5f657878a74459fd8d68e30fc0afdfb6e5627aab99cf8552011d/Simple Python Functions with Different Time Complexities.py (0.00 MB)\n", + "๐Ÿ” Detected language: python\n", + "๐Ÿ“„ Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/4dad1dc092f0232b348a683e42414de456c388b3e21d93ee820b8e7bc4a2aa47/Python Function.py (0.00 MB)\n", + "๐Ÿ” Detected language: python\n" + ] + } + ], + "source": [ + "# Launch the Gradio interface\n", + "demo.launch(\n", + " share=False, # Set to True to create a public shareable link\n", + " inbrowser=True, # Automatically open in browser\n", + " show_error=True # Show detailed errors in UI\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}