{ "cells": [ { "cell_type": "markdown", "id": "905062e4", "metadata": {}, "source": [ "# ๐ถ Multi-Language Code Complexity Annotator\n", "\n", "## Why I Built This\n", "\n", "Understanding time complexity (Big-O notation) is crucial for writing efficient algorithms, identifying bottlenecks, making informed optimization decisions, and passing technical interviews.\n", "\n", "Analyzing complexity manually is tedious and error-prone. This tool **automates** the entire processโdetecting loops, recursion, and functions, then annotating code with Big-O estimates and explanations.\n", "\n", "---\n", "\n", "## What This Does\n", "\n", "This app analyzes source code and automatically:\n", "- ๐ Detects loops, recursion, and functions\n", "- ๐งฎ Estimates Big-O complexity (O(1), O(n), O(nยฒ), etc.)\n", "- ๐ฌ Inserts inline comments explaining the complexity\n", "- ๐จ Generates syntax-highlighted previews\n", "- ๐ค **Optional:** Gets AI-powered code review from LLaMA\n", "\n", "**Supports 13 languages:** Python โข JavaScript โข TypeScript โข Java โข C/C++ โข C# โข Go โข PHP โข Swift โข Ruby โข Kotlin โข Rust\n", "\n", "**Tech:** HuggingFace Transformers โข LLaMA 3.2 โข Gradio UI โข Pygments โข Regex Analysis\n", "\n", "---\n", "\n", "**Use Case:** Upload your code โ Get instant complexity analysis โ Optimize with confidence\n" ] }, { "cell_type": "markdown", "id": "69e9876d", "metadata": {}, "source": [ "## Step 1: Install Dependencies\n", "\n", "Installing the complete stack:\n", "- **Transformers** - HuggingFace library for loading LLaMA models\n", "- **Accelerate** - Fast distributed training/inference\n", "- **Gradio** - Beautiful web interface\n", "- **PyTorch** (CPU version) - Deep learning framework\n", "- **BitsAndBytes** - 4/8-bit quantization for large models\n", "- **Pygments** - Syntax highlighting engine\n", "- **Python-dotenv** - Environment variable management\n", "\n", "**Note:** This installs the CPU-only version of PyTorch. 
For GPU support, modify the install command.\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "f035a1c5", "metadata": {}, "outputs": [], "source": [ "!uv pip -q install -U pip\n", "!uv pip -q install transformers accelerate gradio torch --extra-index-url https://download.pytorch.org/whl/cpu\n", "!uv pip -q install bitsandbytes pygments python-dotenv" ] }, { "cell_type": "markdown", "id": "6ab14cd1", "metadata": {}, "source": [ "## Step 2: Core Configuration & Imports\n", "\n", "Setting up:\n", "- **Environment variables** to suppress progress bars (prevents Jupyter ContextVar issues)\n", "- **Dummy tqdm** class to avoid notebook conflicts\n", "- **Language mappings** for 13+ programming languages\n", "- **Complexity constants** for Big-O estimation\n", "- **Comment syntax** for each language (# vs //)\n", "\n", "**Key Configurations:**\n", "- Max file size: 2 MB\n", "- Default model: `meta-llama/Llama-3.2-1B`\n", "- Supported file extensions and their language identifiers\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "5666a121", "metadata": {}, "outputs": [], "source": [ "import os\n", "import re\n", "import io\n", "import json\n", "import time\n", "import math\n", "from dataclasses import dataclass\n", "from typing import Tuple, List, Dict, Optional, Generator\n", "\n", "# Disable tqdm progress bars to avoid Jupyter ContextVar issues\n", "os.environ[\"TRANSFORMERS_NO_ADVISORY_WARNINGS\"] = \"1\"\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "os.environ[\"TQDM_DISABLE\"] = \"1\" # Completely disable tqdm\n", "\n", "# Provide a module-level lock expected by some integrations\n", "class _DummyLock:\n", " def __enter__(self):\n", " return self\n", " def __exit__(self, *args):\n", " pass\n", "\n", "class _DummyTqdm:\n", " \"\"\"Dummy tqdm that does nothing - prevents Jupyter notebook ContextVar errors\"\"\"\n", " def __init__(self, *args, **kwargs):\n", " self.iterable = args[0] if args else None\n", " self.total = kwargs.get('total', 0)\n", " self.n = 0\n", " def __iter__(self):\n", " return iter(self.iterable) if self.iterable else iter([])\n", " def __enter__(self):\n", " return self\n", " def __exit__(self, *args):\n", " pass\n", " def update(self, n=1, *args, **kwargs):\n", " self.n += n\n", " def close(self):\n", " pass\n", " def set_description(self, *args, **kwargs):\n", " pass\n", " def set_postfix(self, *args, **kwargs):\n", " pass\n", " def refresh(self, *args, **kwargs):\n", " pass\n", " def clear(self, *args, **kwargs):\n", " pass\n", " def write(self, *args, **kwargs):\n", " pass\n", " def reset(self, total=None):\n", " self.n = 0\n", " if total is not None:\n", " self.total = total\n", " @staticmethod\n", " def get_lock():\n", " \"\"\"Return a dummy lock to avoid ContextVar issues\"\"\"\n", " return _DummyLock()\n", " \n", " @staticmethod\n", " def set_lock(lock=None):\n", " \"\"\"Dummy set_lock method - does nothing\"\"\"\n", " pass\n", "\n", "def _dummy_get_lock():\n", " \"\"\"Module-level get_lock function\"\"\"\n", " return _DummyLock()\n", "\n", "def _dummy_set_lock(lock=None):\n", " \"\"\"Module-level set_lock function - does nothing\"\"\"\n", " pass\n", "\n", "# Import and immediately patch tqdm before transformers can use it\n", "def _patch_tqdm():\n", " \"\"\"Patch tqdm to avoid ContextVar errors in Jupyter\"\"\"\n", " import sys # Import sys here since it's not available in outer scope\n", " try:\n", " import tqdm\n", " import tqdm.auto\n", " import tqdm.notebook\n", "\n", " # Patch classes\n", " tqdm.tqdm = _DummyTqdm\n", " 
tqdm.auto.tqdm = _DummyTqdm\n",
"        tqdm.notebook.tqdm = _DummyTqdm\n",
"\n",
"        # Patch module-level functions that other code might call directly\n",
"        tqdm.get_lock = _dummy_get_lock\n",
"        tqdm.auto.get_lock = _dummy_get_lock\n",
"        tqdm.notebook.get_lock = _dummy_get_lock\n",
"        tqdm.set_lock = _dummy_set_lock\n",
"        tqdm.auto.set_lock = _dummy_set_lock\n",
"        tqdm.notebook.set_lock = _dummy_set_lock\n",
"\n",
"        # Also patch in sys.modules to catch any dynamic imports\n",
"        sys.modules['tqdm'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm.auto'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm.notebook'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm.auto'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm.notebook'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm'].set_lock = _dummy_set_lock\n",
"        sys.modules['tqdm.auto'].set_lock = _dummy_set_lock\n",
"        sys.modules['tqdm.notebook'].set_lock = _dummy_set_lock\n",
"\n",
"    except ImportError:\n",
"        pass\n",
"\n",
"_patch_tqdm()\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"SUPPORTED_EXTENSIONS = {\n",
"    \".py\": \"python\",\n",
"    \".js\": \"javascript\",\n",
"    \".ts\": \"typescript\",\n",
"    \".java\": \"java\",\n",
"    \".c\": \"c\",\n",
"    \".h\": \"c\",\n",
"    \".cpp\": \"cpp\",\n",
"    \".cc\": \"cpp\",\n",
"    \".hpp\": \"cpp\",\n",
"    \".cs\": \"csharp\",\n",
"    \".go\": \"go\",\n",
"    \".php\": \"php\",\n",
"    \".swift\": \"swift\",\n",
"    \".rb\": \"ruby\",\n",
"    \".kt\": \"kotlin\",\n",
"    \".rs\": \"rust\",\n",
"}\n",
"\n",
"COMMENT_SYNTAX = {\n",
"    \"python\": \"#\",\n",
"    \"javascript\": \"//\",\n",
"    \"typescript\": \"//\",\n",
"    \"java\": \"//\",\n",
"    \"c\": \"//\",\n",
"    \"cpp\": \"//\",\n",
"    \"csharp\": \"//\",\n",
"    \"go\": \"//\",\n",
"    \"php\": \"//\",\n",
"    \"swift\": \"//\",\n",
"    \"ruby\": \"#\",\n",
"    \"kotlin\": \"//\",\n",
"    \"rust\": \"//\",\n",
"}\n",
"\n",
"MAX_FILE_SIZE_MB = 2.0\n",
"# Llama 3.2 1B - the base model name (no -Instruct suffix)\n",
"# Requires Meta approval: https://huggingface.co/meta-llama/Llama-3.2-1B\n",
"DEFAULT_MODEL_ID = \"meta-llama/Llama-3.2-1B\"\n",
"DEVICE_HINT = \"auto\"\n",
"\n",
"# Global token storage (set in Cell 2 to avoid Jupyter ContextVar issues)\n",
"HUGGINGFACE_TOKEN = None\n",
"\n",
"# Complexity estimation constants\n",
"LOOP_KEYWORDS = [r\"\\bfor\\b\", r\"\\bwhile\\b\"]\n",
"\n",
"FUNCTION_PATTERNS = [\n",
"    r\"^\\s*def\\s+([A-Za-z_]\\w*)\\s*\\(\",  # Python\n",
"    r\"^\\s*(?:public|private|protected)?\\s*(?:static\\s+)?[A-Za-z_<>\\[\\]]+\\s+([A-Za-z_]\\w*)\\s*\\(\",  # Java/C#/C++\n",
"    r\"^\\s*function\\s+([A-Za-z_]\\w*)\\s*\\(\",  # JavaScript\n",
"    r\"^\\s*(?:const|let|var)\\s+([A-Za-z_]\\w*)\\s*=\\s*\\(\",  # JavaScript arrow/function\n",
"]\n",
"\n",
"COMPLEXITY_ORDER = {\n",
"    \"O(1)\": 0,\n",
"    \"O(log n)\": 1,\n",
"    \"O(n)\": 2,\n",
"    \"O(n log n)\": 3,\n",
"    \"O(n^2)\": 4,\n",
"    \"O(n^3)\": 5,\n",
"}\n",
"\n",
"RECURSION_PATTERNS = {\n",
"    \"divide_conquer\": r\"\\b(n/2|n >> 1|n>>1|n\\s*//\\s*2|mid\\b)\",\n",
"}\n",
"\n",
"# HTML syntax highlighting styles (orange comments for Pygments 'codehilite' tokens)\n",
"SYNTAX_HIGHLIGHT_CSS = \"\"\"\n",
"<style>\n",
".codehilite { background: #f8f8f8; padding: 12px; border-radius: 6px; overflow-x: auto; }\n",
".codehilite pre { margin: 0; font-family: monospace; }\n",
"/* Comments in orange so Big-O annotations stand out */\n",
".codehilite .c, .codehilite .c1, .codehilite .cm, .codehilite .cp, .codehilite .cs { color: #e67e22; font-weight: bold; }\n",
"</style>\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "d17e8406",
"metadata": {},
"source": [
"## Step 3: Load HuggingFace Token\n",
"\n",
"Loading the authentication token from a `.env` file to access gated models like LLaMA.\n",
"\n",
"**Why?** Meta's LLaMA models require:\n",
"1. Accepting their license agreement on HuggingFace\n",
"2. Using an access token for authentication\n",
"\n",
"**Create a `.env` file with:**\n",
"```\n",
"HF_TOKEN=hf_your_token_here\n",
"```\n",
"\n",
"Get your token at: https://huggingface.co/settings/tokens\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "70beee01",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"✅ Hugging Face token loaded successfully from .env file\n",
"   Token length: 37 characters\n"
]
}
],
"source": [
"load_dotenv()\n",
"\n",
"# Load token from .env file\n",
"HF_TOKEN = os.getenv(\"HF_TOKEN\", \"\").strip()\n",
"\n",
"# Store in global variable to avoid Jupyter ContextVar issues with os.environ\n",
"global HUGGINGFACE_TOKEN\n",
"\n",
"if HF_TOKEN:\n",
"    os.environ[\"HUGGING_FACE_HUB_TOKEN\"] = HF_TOKEN\n",
"    HUGGINGFACE_TOKEN = HF_TOKEN  # Store in global variable\n",
"    print(\"✅ Hugging Face token loaded successfully from .env file\")\n",
"    print(f\"   Token length: {len(HF_TOKEN)} characters\")\n",
"else:\n",
"    print(\"⚠️ No HF_TOKEN found in .env file. Gated models may not work.\")\n",
"    HUGGINGFACE_TOKEN = None"
]
},
{
"cell_type": "markdown",
"id": "bd0a557e",
"metadata": {},
"source": [
"## Step 4: Language Detection Functions\n",
"\n",
"Two simple but essential utilities:\n",
"\n",
"1. **`detect_language(filename)`** - Detects programming language from file extension\n",
"2. **`comment_prefix_for(lang)`** - Returns the comment symbol for that language (# or //)\n",
"\n",
"These enable the tool to automatically adapt to any supported language.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a0dbad5f",
"metadata": {},
"outputs": [],
"source": [
"def detect_language(filename: str) -> str:\n",
"    \"\"\"\n",
"    Detect programming language based on file extension.\n",
"    \n",
"    Args:\n",
"        filename: Name of the file (must have a supported extension)\n",
"    \n",
"    Returns:\n",
"        Language identifier string (e.g., 'python', 'javascript', etc.)\n",
"    \n",
"    Raises:\n",
"        ValueError: If file extension is not supported\n",
"    \"\"\"\n",
"    ext = os.path.splitext(filename)[1].lower()\n",
"    \n",
"    if not ext:\n",
"        supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n",
"        raise ValueError(f\"File has no extension. Supported extensions: {supported}\")\n",
"    \n",
"    if ext not in SUPPORTED_EXTENSIONS:\n",
"        supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n",
"        raise ValueError(f\"Unsupported file extension '{ext}'. Supported extensions: {supported}\")\n",
"    \n",
"    return SUPPORTED_EXTENSIONS[ext]\n",
"\n",
"def comment_prefix_for(lang: str) -> str:\n",
"    \"\"\"\n",
"    Get the comment prefix for a given language.\n",
"    \n",
"    Args:\n",
"        lang: Language identifier (e.g., 'python', 'javascript')\n",
"    \n",
"    Returns:\n",
"        Comment prefix string (e.g., '#' or '//')\n",
"    \n",
"    Raises:\n",
"        ValueError: If language is not supported\n",
"    \"\"\"\n",
"    if lang not in COMMENT_SYNTAX:\n",
"        raise ValueError(f\"Unsupported language '{lang}'. Supported: {', '.join(sorted(COMMENT_SYNTAX.keys()))}\")\n",
"    \n",
"    return COMMENT_SYNTAX[lang]"
]
},
{
"cell_type": "markdown",
"id": "13e0f6d8",
"metadata": {},
"source": [
"## Step 5: Complexity Estimation Engine\n",
"\n",
"The core analysis logic using **heuristic pattern matching**:\n",
"\n",
"**How it works:**\n",
"1. **Detect blocks** - Find all functions, loops, and recursion using regex patterns\n",
"2. **Analyze loops** - Count nesting depth (1 loop = O(n), 2 nested = O(n²), etc.)\n",
"3. 
**Analyze recursion** - Detect divide-and-conquer (O(log n)) vs exponential (O(2^n))\n", "4. **Aggregate** - Functions inherit the worst complexity of their inner operations\n", "\n", "**Key Functions:**\n", "- `detect_blocks()` - Pattern matching for code structures\n", "- `analyze_recursion()` - Identifies recursive patterns\n", "- `analyze_loop_complexity()` - Counts nested loops\n", "- `estimate_complexity()` - Orchestrates the full analysis\n" ] }, { "cell_type": "code", "execution_count": 16, "id": "7595dfe3", "metadata": {}, "outputs": [], "source": [ "@dataclass\n", "class BlockInfo:\n", " \"\"\"Represents a code block (function, loop, or recursion) with complexity information.\"\"\"\n", " line_idx: int\n", " kind: str # \"function\" | \"loop\" | \"recursion\"\n", " name: Optional[str] = None\n", " depth: int = 0\n", " complexity: str = \"O(1)\"\n", " reason: str = \"\"\n", "\n", "\n", "def get_indent_level(line: str) -> int:\n", " \"\"\"Calculate indentation level of a line (tabs converted to 4 spaces).\"\"\"\n", " normalized = line.replace(\"\\t\", \" \")\n", " return len(normalized) - len(normalized.lstrip(\" \"))\n", "\n", "\n", "def find_function_name(line: str) -> Optional[str]:\n", " \"\"\"Extract function name from a line if it contains a function declaration.\"\"\"\n", " for pattern in FUNCTION_PATTERNS:\n", " match = re.search(pattern, line)\n", " if match and match.lastindex:\n", " return match.group(1)\n", " return None\n", "\n", "\n", "def get_block_end(block: BlockInfo, all_blocks: List[BlockInfo], total_lines: int) -> int:\n", " \"\"\"Calculate the end line index for a given block.\"\"\"\n", " end = total_lines\n", " for other in all_blocks:\n", " if other.line_idx > block.line_idx and other.depth <= block.depth:\n", " end = min(end, other.line_idx)\n", " return end\n", "\n", "\n", "def rank_complexity(complexity: str) -> int:\n", " \"\"\"Assign a numeric rank to a complexity string for comparison.\"\"\"\n", " # Check for polynomial complexities O(n^k)\n", " match = re.match(r\"O\\(n\\^(\\d+)\\)\", complexity)\n", " if match:\n", " return 10 + int(match.group(1))\n", " \n", " return COMPLEXITY_ORDER.get(complexity, 0)\n", "\n", "\n", "def detect_blocks(lines: List[str], lang: str) -> List[BlockInfo]:\n", " \"\"\"Detect all code blocks (functions and loops) in the source code.\"\"\"\n", " blocks = []\n", " stack = []\n", " brace_depth = 0\n", " \n", " # Pre-compute indentation for Python\n", " indents = [get_indent_level(line) for line in lines] if lang == \"python\" else []\n", " \n", " for i, line in enumerate(lines):\n", " stripped = line.strip()\n", " \n", " # Track brace depth for non-Python languages\n", " if lang != \"python\":\n", " brace_depth += line.count(\"{\") - line.count(\"}\")\n", " brace_depth = max(0, brace_depth)\n", " \n", " # Update stack based on indentation/brace depth\n", " if lang == \"python\":\n", " while stack and indents[i] < stack[-1]:\n", " stack.pop()\n", " else:\n", " while stack and brace_depth < stack[-1]:\n", " stack.pop()\n", " \n", " current_depth = len(stack)\n", " \n", " # Detect loops\n", " if any(re.search(pattern, stripped) for pattern in LOOP_KEYWORDS):\n", " blocks.append(BlockInfo(\n", " line_idx=i,\n", " kind=\"loop\",\n", " depth=current_depth + 1\n", " ))\n", " stack.append(indents[i] if lang == \"python\" else brace_depth)\n", " \n", " # Detect functions\n", " func_name = find_function_name(line)\n", " if func_name:\n", " blocks.append(BlockInfo(\n", " line_idx=i,\n", " kind=\"function\",\n", " name=func_name,\n", " 
depth=current_depth + 1\n", " ))\n", " stack.append(indents[i] if lang == \"python\" else brace_depth)\n", " \n", " return blocks\n", "\n", "\n", "def analyze_recursion(block: BlockInfo, blocks: List[BlockInfo], lines: List[str]) -> None:\n", " \"\"\"Analyze a function block for recursion and update its complexity.\"\"\"\n", " if block.kind != \"function\" or not block.name:\n", " return\n", " \n", " end = get_block_end(block, blocks, len(lines))\n", " body = \"\\n\".join(lines[block.line_idx:end])\n", " \n", " # Count recursive calls (subtract 1 for the function definition itself)\n", " recursive_calls = len(re.findall(rf\"\\b{re.escape(block.name)}\\s*\\(\", body)) - 1\n", " \n", " if recursive_calls == 0:\n", " return\n", " \n", " # Detect divide-and-conquer pattern\n", " if re.search(RECURSION_PATTERNS[\"divide_conquer\"], body):\n", " block.kind = \"recursion\"\n", " block.complexity = \"O(log n)\"\n", " block.reason = \"Divide-and-conquer recursion (problem size halves each call).\"\n", " # Multiple recursive calls suggest exponential\n", " elif recursive_calls >= 2:\n", " block.kind = \"recursion\"\n", " block.complexity = \"O(2^n)\"\n", " block.reason = \"Multiple recursive calls per frame suggest exponential growth.\"\n", " # Single recursive call is linear\n", " else:\n", " block.kind = \"recursion\"\n", " block.complexity = \"O(n)\"\n", " block.reason = \"Single recursive call per frame suggests linear recursion.\"\n", "\n", "\n", "def analyze_loop_complexity(block: BlockInfo, all_loops: List[BlockInfo], blocks: List[BlockInfo], total_lines: int) -> None:\n", " \"\"\"Analyze loop nesting depth and assign complexity.\"\"\"\n", " if block.kind != \"loop\":\n", " return\n", " \n", " end = get_block_end(block, blocks, total_lines)\n", " \n", " # Count nested loops within this loop\n", " inner_loops = [loop for loop in all_loops \n", " if block.line_idx < loop.line_idx < end]\n", " \n", " nesting_depth = 1 + len(inner_loops)\n", " \n", " if nesting_depth == 1:\n", " block.complexity = \"O(n)\"\n", " block.reason = \"Single loop scales linearly with input size.\"\n", " elif nesting_depth == 2:\n", " block.complexity = \"O(n^2)\"\n", " block.reason = \"Nested loops indicate quadratic time.\"\n", " elif nesting_depth == 3:\n", " block.complexity = \"O(n^3)\"\n", " block.reason = \"Three nested loops indicate cubic time.\"\n", " else:\n", " block.complexity = f\"O(n^{nesting_depth})\"\n", " block.reason = f\"{nesting_depth} nested loops suggest polynomial time.\"\n", "\n", "\n", "def analyze_function_complexity(block: BlockInfo, blocks: List[BlockInfo], total_lines: int) -> None:\n", " \"\"\"Analyze overall function complexity based on contained blocks.\"\"\"\n", " if block.kind != \"function\":\n", " return\n", " \n", " end = get_block_end(block, blocks, total_lines)\n", " \n", " # Get all blocks within this function\n", " inner_blocks = [b for b in blocks if block.line_idx < b.line_idx < end]\n", " \n", " # Find the worst complexity among inner blocks\n", " worst_complexity = \"O(1)\"\n", " for inner in inner_blocks:\n", " if rank_complexity(inner.complexity) > rank_complexity(worst_complexity):\n", " worst_complexity = inner.complexity\n", " \n", " # Special case: recursion + loop = O(n log n)\n", " has_recursion = any(b.kind == \"recursion\" for b in inner_blocks)\n", " has_loop = any(b.kind == \"loop\" for b in inner_blocks)\n", " \n", " if has_recursion and has_loop:\n", " block.complexity = \"O(n log n)\"\n", " block.reason = \"Combines recursion with iteration (e.g., merge sort 
pattern).\"\n", " else:\n", " block.complexity = worst_complexity\n", " block.reason = \"Based on worst-case complexity of inner operations.\"\n", "\n", "\n", "def estimate_complexity(lines: List[str], lang: str) -> List[BlockInfo]:\n", " \"\"\"\n", " Estimate Big-O complexity for code blocks using heuristic analysis.\n", " \n", " Heuristics:\n", " - Single/nested loops: O(n), O(n^2), O(n^3), etc.\n", " - Recursion patterns: O(n), O(log n), O(2^n)\n", " - Function complexity: worst case of internal operations\n", " \n", " Args:\n", " lines: Source code lines\n", " lang: Programming language identifier\n", " \n", " Returns:\n", " List of BlockInfo objects with complexity estimates\n", " \"\"\"\n", " # Step 1: Detect all blocks\n", " blocks = detect_blocks(lines, lang)\n", " \n", " # Step 2: Analyze recursion in functions\n", " for block in blocks:\n", " analyze_recursion(block, blocks, lines)\n", " \n", " # Step 3: Analyze loop complexities\n", " loops = [b for b in blocks if b.kind == \"loop\"]\n", " for loop in loops:\n", " analyze_loop_complexity(loop, loops, blocks, len(lines))\n", " \n", " # Step 4: Analyze overall function complexities\n", " for block in blocks:\n", " analyze_function_complexity(block, blocks, len(lines))\n", " \n", " return blocks" ] }, { "cell_type": "markdown", "id": "f2a22988", "metadata": {}, "source": [ "## Step 6: Code Annotation Functions\n", "\n", "Takes the complexity estimates and **inserts them as comments** into the source code:\n", "\n", "**Process:**\n", "1. `create_annotation_comment()` - Formats Big-O annotations as language-specific comments\n", "2. `insert_annotations()` - Inserts comments below each function/loop\n", "3. `to_markdown()` - Wraps annotated code in Markdown code blocks\n", "\n", "**Example output:**\n", "```python\n", "def bubble_sort(arr):\n", "# Big-O: O(n^2)\n", "# Explanation: Nested loops indicate quadratic time.\n", " for i in range(len(arr)):\n", " for j in range(len(arr) - i - 1):\n", " if arr[j] > arr[j + 1]:\n", " arr[j], arr[j + 1] = arr[j + 1], arr[j]\n", "```\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "2e642483", "metadata": {}, "outputs": [], "source": [ "def create_annotation_comment(block: BlockInfo, comment_prefix: str) -> List[str]:\n", " \"\"\"\n", " Create annotation comments for a code block.\n", " \n", " Args:\n", " block: BlockInfo object containing complexity information\n", " comment_prefix: Comment syntax for the language (e.g., '#' or '//')\n", " \n", " Returns:\n", " List of comment lines to insert\n", " \"\"\"\n", " complexity = block.complexity or \"O(1)\"\n", " reason = block.reason or \"Heuristic estimate based on detected structure.\"\n", " \n", " return [\n", " f\"{comment_prefix} Big-O: {complexity}\",\n", " f\"{comment_prefix} Explanation: {reason}\"\n", " ]\n", "\n", "\n", "def insert_annotations(code: str, lang: str) -> str:\n", " \"\"\"\n", " Insert Big-O complexity annotations into source code.\n", " \n", " Analyzes the code for loops, functions, and recursion, then inserts\n", " orange-colored comment annotations (when syntax highlighted) beneath\n", " each detected block explaining its time complexity.\n", " \n", " Args:\n", " code: Source code string to annotate\n", " lang: Programming language identifier\n", " \n", " Returns:\n", " Annotated source code with Big-O comments inserted\n", " \"\"\"\n", " if not code.strip():\n", " return code\n", " \n", " lines = code.splitlines()\n", " blocks = estimate_complexity(lines, lang)\n", " \n", " if not blocks:\n", " return code\n", " 
\n", " comment_prefix = comment_prefix_for(lang)\n", " \n", " # Build a map of line numbers to annotation comments\n", " annotations: Dict[int, List[str]] = {}\n", " for block in blocks:\n", " line_num = block.line_idx + 1 # Convert 0-indexed to 1-indexed\n", " comments = create_annotation_comment(block, comment_prefix)\n", " annotations.setdefault(line_num, []).extend(comments)\n", " \n", " # Insert annotations after their corresponding lines\n", " annotated_lines = []\n", " for line_num, original_line in enumerate(lines, start=1):\n", " annotated_lines.append(original_line)\n", " if line_num in annotations:\n", " annotated_lines.extend(annotations[line_num])\n", " \n", " return \"\\n\".join(annotated_lines)\n", "\n", "\n", "def to_markdown(code: str, lang: str) -> str:\n", " \"\"\"\n", " Format annotated code as Markdown with syntax highlighting.\n", " \n", " Args:\n", " code: Annotated source code\n", " lang: Programming language identifier for syntax highlighting\n", " \n", " Returns:\n", " Markdown-formatted code block\n", " \"\"\"\n", " lang_display = lang.capitalize()\n", " \n", " return f\"\"\"### Annotated Code ({lang_display})\n", "\n", "```{lang}\n", "{code}\n", "```\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "184ad5c1", "metadata": {}, "source": [ "## Step 7: Syntax Highlighting with Pygments\n", "\n", "Generates beautiful, syntax-highlighted HTML previews with **orange-colored complexity comments**.\n", "\n", "**Features:**\n", "- Uses Pygments lexer for accurate language-specific highlighting\n", "- Custom CSS to make Big-O comments stand out in orange\n", "- Fallback to plain HTML if Pygments is unavailable\n", "- HTML escaping for security\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "0f01d30b", "metadata": {}, "outputs": [], "source": [ "def escape_html(text: str) -> str:\n", " \"\"\"\n", " Escape HTML special characters for safe display.\n", " \n", " Args:\n", " text: Raw text to escape\n", " \n", " Returns:\n", " HTML-safe text\n", " \"\"\"\n", " html_escape_table = {\n", " \"&\": \"&\",\n", " \"<\": \"<\",\n", " \">\": \">\",\n", " '\"': \""\",\n", " \"'\": \"'\",\n", " }\n", " return \"\".join(html_escape_table.get(c, c) for c in text)\n", "\n", "\n", "def highlighted_html(code: str, lang: str) -> str:\n", " \"\"\"\n", " Generate syntax-highlighted HTML with orange-colored comments.\n", " \n", " Uses Pygments for syntax highlighting with custom CSS to make\n", " comments appear in orange for visual emphasis of Big-O annotations.\n", " \n", " Args:\n", " code: Source code to highlight\n", " lang: Programming language identifier\n", " \n", " Returns:\n", " HTML string with embedded CSS and syntax highlighting\n", " \"\"\"\n", " if not code.strip():\n", " return f\"
{escape_html(code)}\"\n",
" \n",
" try:\n",
" from pygments import highlight\n",
" from pygments.lexers import get_lexer_by_name\n",
" from pygments.formatters import HtmlFormatter\n",
" \n",
" # Get appropriate lexer for the language\n",
" lexer = get_lexer_by_name(lang)\n",
" \n",
" # Configure HTML formatter\n",
" formatter = HtmlFormatter(\n",
" nowrap=False,\n",
" full=False,\n",
" cssclass=\"codehilite\",\n",
" linenos=False\n",
" )\n",
" \n",
" # Generate highlighted HTML\n",
" html_code = highlight(code, lexer, formatter)\n",
" \n",
" return SYNTAX_HIGHLIGHT_CSS + html_code\n",
" \n",
" except ImportError:\n",
" # Pygments not available - return plain HTML\n",
" return f\"{escape_html(code)}\"\n",
" \n",
" except Exception as e:\n",
" # Lexer not found or other error - fallback to plain HTML\n",
" print(f\"โ ๏ธ Syntax highlighting failed for '{lang}': {e}\")\n",
" return f\"{escape_html(code)}\""
]
},
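{
"cell_type": "markdown",
"id": "4fa2b9c1",
"metadata": {},
"source": [
"### Quick Sanity Check (Optional)\n",
"\n",
"A minimal sketch that exercises the annotation pipeline end-to-end before wiring up the UI, reusing the bubble-sort example from Step 6. The `sample_code` snippet is demo input only, not part of the tool.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c3d5e7a",
"metadata": {},
"outputs": [],
"source": [
"# Demo input: a classic O(n^2) sort the heuristics should flag\n",
"sample_code = '''def bubble_sort(arr):\n",
"    for i in range(len(arr)):\n",
"        for j in range(len(arr) - i - 1):\n",
"            if arr[j] > arr[j + 1]:\n",
"                arr[j], arr[j + 1] = arr[j + 1], arr[j]\n",
"'''\n",
"\n",
"# Run the same functions the app uses: annotate, then render\n",
"annotated = insert_annotations(sample_code, \"python\")\n",
"print(annotated)\n",
"\n",
"# The HTML preview (with orange comments) can also be generated directly\n",
"html = highlighted_html(annotated, \"python\")\n",
"print(f\"\\nHTML preview length: {len(html)} characters\")"
]
},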
{
"cell_type": "markdown",
"id": "36fd0454",
"metadata": {},
"source": [
"## Step 8: LLaMA Model Loading & Streaming\n",
"\n",
"Loading HuggingFace LLaMA models for AI-powered code review:\n",
"\n",
"**Key Features:**\n",
"- **Quantization support** - 4-bit or 8-bit to reduce memory (requires GPU)\n",
"- **Streaming generation** - See tokens appear in real-time\n",
"- **Automatic device mapping** - Uses GPU if available, CPU otherwise\n",
"- **Thread-safe streaming** - Uses `TextIteratorStreamer` for parallel generation\n",
"\n",
"**Functions:**\n",
"- `load_model()` - Downloads and initializes the LLaMA model\n",
"- `stream_generate()` - Generates text token-by-token with streaming\n",
"\n",
"**Memory Requirements:**\n",
"- **Without quantization:** ~14GB RAM (7B models) or ~26GB (13B models)\n",
"- **With 8-bit:** ~50% reduction (GPU required)\n",
"- **With 4-bit:** ~75% reduction (GPU required)\n"
]
},
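{
"cell_type": "markdown",
"id": "2b6e8d90",
"metadata": {},
"source": [
"The figures above follow a rule-of-thumb estimate (ignoring activations and framework overhead): weight memory ≈ parameter count × bytes per parameter. A quick sketch of the arithmetic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1f4a62",
"metadata": {},
"outputs": [],
"source": [
"# Rough weight-memory estimates: params (billions) x bytes per parameter.\n",
"# fp16 = 2 bytes, int8 = 1 byte (~50% saving), int4 = 0.5 bytes (~75% saving).\n",
"for params_b in (1, 7, 13):\n",
"    line = f\"{params_b:>3}B params:\"\n",
"    for dtype, bytes_pp in ((\"fp16\", 2), (\"int8\", 1), (\"int4\", 0.5)):\n",
"        line += f\"  {dtype} ~{params_b * bytes_pp:g} GB\"\n",
"    print(line)"
]
},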
{
"cell_type": "code",
"execution_count": 19,
"id": "e7de6947",
"metadata": {},
"outputs": [],
"source": [
"# Hugging Face model imports\n",
"try:\n",
" from transformers import (\n",
" AutoModelForCausalLM,\n",
" AutoTokenizer,\n",
" BitsAndBytesConfig,\n",
" TextIteratorStreamer,\n",
" pipeline\n",
" )\n",
" import threading\n",
" TRANSFORMERS_AVAILABLE = True\n",
"except ImportError:\n",
" TRANSFORMERS_AVAILABLE = False\n",
"\n",
"# Global model state\n",
"MODEL_PIPELINE = None\n",
"TOKENIZER = None\n",
"\n",
"\n",
"def get_quantization_config(load_in_4bit: bool, load_in_8bit: bool) -> Optional[BitsAndBytesConfig]:\n",
" \"\"\"\n",
" Create quantization configuration for model loading.\n",
" \n",
" Args:\n",
" load_in_4bit: Whether to use 4-bit quantization\n",
" load_in_8bit: Whether to use 8-bit quantization\n",
" \n",
" Returns:\n",
" BitsAndBytesConfig object or None if quantization not requested/available\n",
" \n",
" Raises:\n",
" RuntimeError: If quantization requested but CUDA not available\n",
" \"\"\"\n",
" if not (load_in_4bit or load_in_8bit):\n",
" return None\n",
" \n",
" # Check if CUDA is available\n",
" try:\n",
" import torch\n",
" if not torch.cuda.is_available():\n",
" raise RuntimeError(\n",
" \"Quantization requires CUDA (NVIDIA GPU).\\n\\n\"\n",
" \"You are running on CPU/Mac and have requested quantization.\\n\"\n",
" \"Options:\\n\"\n",
" \" 1. Disable both 4-bit and 8-bit quantization to run on CPU\\n\"\n",
" \" (requires ~26GB RAM for 13B models, ~14GB for 7B models)\\n\"\n",
" \" 2. Use a GPU with CUDA support\\n\"\n",
" \" 3. Try smaller models like gpt2 or microsoft/DialoGPT-medium\\n\\n\"\n",
" \"Note: Quantization significantly reduces memory usage but requires GPU.\"\n",
" )\n",
" except ImportError:\n",
" pass\n",
" \n",
" try:\n",
" return BitsAndBytesConfig(load_in_4bit=load_in_4bit, load_in_8bit=load_in_8bit)\n",
" except Exception as e:\n",
" raise RuntimeError(f\"Failed to create quantization config: {e}\")\n",
"\n",
"\n",
"def load_model(\n",
" model_id: str = DEFAULT_MODEL_ID,\n",
" device_map: str = DEVICE_HINT,\n",
" load_in_8bit: bool = False,\n",
" load_in_4bit: bool = False\n",
") -> None:\n",
" \"\"\"\n",
" Load a Hugging Face LLaMA-family model for text generation.\n",
" \n",
" Supports optional 4-bit or 8-bit quantization to reduce memory usage.\n",
" Model is loaded into global MODEL_PIPELINE and TOKENIZER variables.\n",
" \n",
" Args:\n",
" model_id: Hugging Face model identifier\n",
" device_map: Device mapping strategy ('auto', 'cpu', 'cuda', etc.)\n",
" load_in_8bit: Enable 8-bit quantization\n",
" load_in_4bit: Enable 4-bit quantization\n",
" \n",
" Raises:\n",
" ImportError: If transformers is not installed\n",
" Exception: If model loading fails\n",
" \"\"\"\n",
" global MODEL_PIPELINE, TOKENIZER\n",
" \n",
" if not TRANSFORMERS_AVAILABLE:\n",
" raise ImportError(\n",
" \"Transformers library is not installed. \"\n",
" \"Please run the installation cell and restart the kernel.\"\n",
" )\n",
" \n",
" # Use global variable instead of os.environ to avoid Jupyter ContextVar issues\n",
" global HUGGINGFACE_TOKEN\n",
" hf_token = HUGGINGFACE_TOKEN if HUGGINGFACE_TOKEN else None\n",
" \n",
" if hf_token:\n",
" print(f\" Using HuggingFace token: {hf_token[:10]}...{hf_token[-4:]}\")\n",
" else:\n",
" print(\" No HuggingFace token available (may fail for gated models)\")\n",
" \n",
" # Configure quantization if requested\n",
" quant_config = get_quantization_config(load_in_4bit, load_in_8bit)\n",
" \n",
" print(f\"๐ Loading model: {model_id}\")\n",
" print(f\" Device map: {device_map}\")\n",
" print(f\" Quantization: 8-bit={load_in_8bit}, 4-bit={load_in_4bit}\")\n",
" print(f\" This may take 2-5 minutes... please wait...\")\n",
"\n",
" # Final tqdm patch before model loading (catches any missed imports)\n",
" _patch_tqdm()\n",
"\n",
" try:\n",
" # Suppress transformers warnings\n",
" from transformers.utils import logging\n",
" logging.set_verbosity_error()\n",
" \n",
" TOKENIZER = AutoTokenizer.from_pretrained(\n",
" model_id,\n",
" token=hf_token,\n",
" trust_remote_code=False\n",
" )\n",
" \n",
" print(\" โ Tokenizer loaded\")\n",
" \n",
" # Load model\n",
" model = AutoModelForCausalLM.from_pretrained(\n",
" model_id,\n",
" device_map=device_map,\n",
" quantization_config=quant_config,\n",
" token=hf_token,\n",
" trust_remote_code=False,\n",
" low_cpu_mem_usage=True\n",
" )\n",
" \n",
" print(\" โ Model loaded into memory\")\n",
" \n",
" # Create pipeline\n",
" MODEL_PIPELINE = pipeline(\n",
" \"text-generation\",\n",
" model=model,\n",
" tokenizer=TOKENIZER\n",
" )\n",
" \n",
" print(\"โ
Model loaded successfully\")\n",
" \n",
" except Exception as e:\n",
" print(f\"โ Model loading failed: {e}\")\n",
" print(\"\\n๐ก Troubleshooting:\")\n",
" print(\" โข Gated models require HuggingFace approval and token\")\n",
" print(\" โข Large models (13B+) need quantization OR ~26GB+ RAM\")\n",
" print(\" โข Quantization requires NVIDIA GPU with CUDA\")\n",
" print(\"\\n๐ก Models that work on CPU/Mac (no GPU needed):\")\n",
" print(\" โข gpt2 (~500MB RAM)\")\n",
" print(\" โข microsoft/DialoGPT-medium (~1GB RAM)\")\n",
" print(\" โข meta-llama/Llama-2-7b-chat-hf (~14GB RAM, needs approval)\")\n",
" print(\"\\nBrowse more models: https://huggingface.co/models?pipeline_tag=text-generation\")\n",
" MODEL_PIPELINE = None\n",
" TOKENIZER = None\n",
" raise\n",
"\n",
"\n",
"def stream_generate(\n",
" prompt: str,\n",
" max_new_tokens: int = 256,\n",
" temperature: float = 0.7\n",
") -> Generator[str, None, None]:\n",
" \"\"\"\n",
" Stream generated tokens from the loaded model.\n",
" \n",
" Uses TextIteratorStreamer for real-time token streaming.\n",
" Falls back to non-streaming generation if streaming is unavailable.\n",
" \n",
" Args:\n",
" prompt: Input text prompt for generation\n",
" max_new_tokens: Maximum number of tokens to generate\n",
" temperature: Sampling temperature (0.0 = deterministic, higher = more random)\n",
" \n",
" Yields:\n",
" Generated text tokens as they are produced\n",
" \"\"\"\n",
" # Validate model is loaded\n",
" if MODEL_PIPELINE is None:\n",
" yield \"โ ๏ธ Model not loaded. Please run load_model() first.\"\n",
" return\n",
" \n",
" # Validate inputs\n",
" if not prompt.strip():\n",
" yield \"โ ๏ธ Empty prompt provided.\"\n",
" return\n",
" \n",
" try:\n",
" # Create streamer\n",
" streamer = TextIteratorStreamer(\n",
" MODEL_PIPELINE.tokenizer,\n",
" skip_prompt=True,\n",
" skip_special_tokens=True\n",
" )\n",
" \n",
" # Prepare generation arguments\n",
" generation_kwargs = {\n",
" \"text_inputs\": prompt,\n",
" \"streamer\": streamer,\n",
" \"max_new_tokens\": max_new_tokens,\n",
" \"do_sample\": True,\n",
" \"temperature\": temperature,\n",
" }\n",
" \n",
" # Run generation in separate thread\n",
" def generate_in_thread():\n",
" try:\n",
" MODEL_PIPELINE(**generation_kwargs)\n",
" except Exception as e:\n",
" print(f\"โ ๏ธ Generation error: {e}\")\n",
" \n",
" thread = threading.Thread(target=generate_in_thread, daemon=True)\n",
" thread.start()\n",
" \n",
" # Stream tokens as they're generated\n",
" for token in streamer:\n",
" yield token\n",
" \n",
" except Exception as e:\n",
" # Fallback to non-streaming generation\n",
" print(f\"โ ๏ธ Streaming failed ({e}), falling back to non-streaming generation\")\n",
" try:\n",
" result = MODEL_PIPELINE(\n",
" prompt,\n",
" max_new_tokens=max_new_tokens,\n",
" do_sample=True,\n",
" temperature=temperature\n",
" )\n",
" yield result[0][\"generated_text\"]\n",
" except Exception as fallback_error:\n",
" yield f\"โ Generation failed: {fallback_error}\""
]
},
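{
"cell_type": "markdown",
"id": "5e9a0c33",
"metadata": {},
"source": [
"An optional smoke test for `stream_generate`. It is safe to run before the model is loaded; in that case it just prints a notice instead of generating:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f2b6d18",
"metadata": {},
"outputs": [],
"source": [
"# Optional: stream a few tokens to verify the generation path works\n",
"if MODEL_PIPELINE is not None:\n",
"    for token in stream_generate(\"Explain Big-O notation in one sentence.\", max_new_tokens=32):\n",
"        print(token, end=\"\", flush=True)\n",
"    print()\n",
"else:\n",
"    print(\"Model not loaded yet - run load_model() first to try streaming.\")"
]
},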
{
"cell_type": "markdown",
"id": "a9b51cd6",
"metadata": {},
"source": [
"## Step 9: File Processing Pipeline\n",
"\n",
"The main orchestration function that ties everything together:\n",
"\n",
"**Workflow:**\n",
"1. **Read file** - Validate size (<2MB) and decode to text\n",
"2. **Detect language** - From file extension\n",
"3. **Analyze code** - Estimate complexity using heuristics\n",
"4. **Annotate** - Insert Big-O comments\n",
"5. **Generate previews** - Create Markdown and HTML views\n",
"6. **Optional AI review** - Send to LLaMA for deeper analysis\n",
"\n",
"**Functions:**\n",
"- `read_file_content()` - Loads and validates uploaded files\n",
"- `create_review_prompt()` - Formats code for LLM analysis\n",
"- `generate_model_analysis()` - Gets AI-powered insights\n",
"- `process_code_file()` - Main orchestrator\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "766f6636",
"metadata": {},
"outputs": [],
"source": [
"def read_file_content(fileobj) -> Tuple[str, str, float]:\n",
" \"\"\"\n",
" Read and decode file content from a file-like object.\n",
" \n",
" Args:\n",
" fileobj: File-like object (from Gradio upload or file handle)\n",
" \n",
" Returns:\n",
" Tuple of (filename, content_text, size_in_mb)\n",
" \n",
" Raises:\n",
" ValueError: If file is too large\n",
" \"\"\"\n",
" # Get filename, ensuring we have a valid name\n",
" filename = getattr(fileobj, \"name\", None)\n",
" if not filename:\n",
" raise ValueError(\"Uploaded file must have a valid filename with extension\")\n",
" \n",
" # Read raw content\n",
" raw = fileobj.read()\n",
" \n",
" # Decode to text and calculate size\n",
" if isinstance(raw, bytes):\n",
" text = raw.decode(\"utf-8\", errors=\"replace\")\n",
" size_mb = len(raw) / (1024 * 1024)\n",
" else:\n",
" text = str(raw)\n",
" size_mb = len(text.encode(\"utf-8\")) / (1024 * 1024)\n",
" \n",
" # Validate file size\n",
" if size_mb > MAX_FILE_SIZE_MB:\n",
" raise ValueError(\n",
" f\"File too large: {size_mb:.2f} MB. \"\n",
" f\"Maximum allowed size is {MAX_FILE_SIZE_MB} MB.\"\n",
" )\n",
" \n",
" return filename, text, size_mb\n",
"\n",
"\n",
"def create_review_prompt(code: str, lang: str, max_code_chars: int = 4000) -> str:\n",
" \"\"\"\n",
" Create a prompt for LLM code review.\n",
" \n",
" Args:\n",
" code: Annotated source code\n",
" lang: Programming language\n",
" max_code_chars: Maximum characters to include in prompt\n",
" \n",
" Returns:\n",
" Formatted prompt string\n",
" \"\"\"\n",
" # Truncate code if necessary to fit token limits\n",
" code_snippet = code[:max_code_chars]\n",
" if len(code) > max_code_chars:\n",
" code_snippet += \"\\n... (code truncated for analysis)\"\n",
" \n",
" return f\"\"\"You are a senior code reviewer specializing in performance analysis.\n",
"\n",
"Language: {lang}\n",
"\n",
"Task: Analyze the following annotated code and provide:\n",
"1. Validation of the Big-O annotations\n",
"2. Identification of performance bottlenecks\n",
"3. Specific optimization suggestions (max 8 bullet points)\n",
"4. Any algorithmic improvements\n",
"\n",
"--- CODE START ---\n",
"{code_snippet}\n",
"--- CODE END ---\n",
"\n",
"Provide a concise, actionable analysis:\"\"\"\n",
"\n",
"\n",
"def generate_model_analysis(code: str, lang: str, model_params: Dict) -> str:\n",
" \"\"\"\n",
" Generate LLM-powered code complexity analysis.\n",
" \n",
" Args:\n",
" code: Annotated source code\n",
" lang: Programming language\n",
" model_params: Parameters for model generation (max_new_tokens, temperature)\n",
" \n",
" Returns:\n",
" Generated analysis text or error message\n",
" \"\"\"\n",
" # Check if model is loaded\n",
" if MODEL_PIPELINE is None:\n",
" return \"โ ๏ธ **Model not loaded.** Please click '๐ Load Model' button first before requesting AI analysis.\"\n",
" \n",
" try:\n",
" prompt = create_review_prompt(code, lang)\n",
" \n",
" # Stream and collect tokens\n",
" tokens = []\n",
" for token in stream_generate(prompt, **model_params):\n",
" tokens.append(token)\n",
" \n",
" result = \"\".join(tokens)\n",
" return result if result.strip() else \"_(No analysis generated)_\"\n",
" \n",
" except Exception as e:\n",
" return f\"โ ๏ธ Model analysis failed: {e}\"\n",
"\n",
"\n",
"def process_code_file(\n",
" fileobj,\n",
" ask_model: bool,\n",
" model_params: Dict\n",
") -> Tuple[str, str, str, str, str]:\n",
" \"\"\"\n",
" Process uploaded code file: detect language, annotate complexity, generate HTML preview.\n",
" \n",
" This is the main orchestration function that:\n",
" 1. Reads and validates the uploaded file\n",
" 2. Detects programming language from extension\n",
" 3. Analyzes and annotates code with Big-O complexity\n",
" 4. Generates Markdown and HTML previews\n",
" 5. Optionally generates LLM-powered code review\n",
" \n",
" Args:\n",
" fileobj: File-like object from Gradio file upload\n",
" ask_model: Whether to generate LLM analysis\n",
" model_params: Dict with 'max_new_tokens' and 'temperature' for generation\n",
" \n",
" Returns:\n",
" Tuple of (language, annotated_code, markdown_preview, html_preview, model_commentary)\n",
" \n",
" Raises:\n",
" ValueError: If file is invalid, too large, or has unsupported extension\n",
" \"\"\"\n",
" # Step 1: Read and validate file\n",
" filename, code_text, file_size_mb = read_file_content(fileobj)\n",
" \n",
" print(f\"๐ Processing: {filename} ({file_size_mb:.2f} MB)\")\n",
" \n",
" # Step 2: Detect language from file extension\n",
" lang = detect_language(filename)\n",
" \n",
" print(f\"๐ Detected language: {lang}\")\n",
" \n",
" # Step 3: Analyze and annotate code\n",
" annotated_code = insert_annotations(code_text, lang)\n",
" \n",
" # Step 4: Generate preview formats\n",
" markdown_preview = to_markdown(annotated_code, lang)\n",
" html_preview = highlighted_html(annotated_code, lang)\n",
" \n",
" # Step 5: Optionally generate model analysis\n",
" model_commentary = \"\"\n",
" if ask_model:\n",
" print(\"๐ค Generating model analysis...\")\n",
" model_commentary = generate_model_analysis(annotated_code, lang, model_params)\n",
" \n",
" return lang, annotated_code, markdown_preview, html_preview, model_commentary"
]
},
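{
"cell_type": "markdown",
"id": "3a7c1f54",
"metadata": {},
"source": [
"The pipeline can be exercised without the UI by passing any file-like object that has a `name` attribute. A minimal sketch using an in-memory buffer; the `demo.py` name and its contents are made-up demo inputs:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b8d2e41",
"metadata": {},
"outputs": [],
"source": [
"# Simulate an upload with io.BytesIO; read_file_content only needs .name and .read()\n",
"buf = io.BytesIO(b\"def find_max(items):\\n    best = items[0]\\n    for x in items:\\n        if x > best:\\n            best = x\\n    return best\\n\")\n",
"buf.name = \"demo.py\"  # hypothetical filename; drives language detection\n",
"\n",
"# ask_model=False skips the LLM, so no model needs to be loaded\n",
"lang, annotated, md, html, commentary = process_code_file(buf, False, {})\n",
"print(f\"Detected: {lang}\\n\")\n",
"print(annotated)"
]
},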
{
"cell_type": "markdown",
"id": "6060f778",
"metadata": {},
"source": [
"## Step 10: Build the Gradio Interface\n",
"\n",
"Creating a professional two-column UI with:\n",
"\n",
"**Left Column (Input):**\n",
"- File uploader (filters to code files only)\n",
"- AI review toggle\n",
"- Model configuration (ID, quantization options)\n",
"- Temperature and max tokens sliders\n",
"- Load model & process buttons\n",
"\n",
"**Right Column (Output):**\n",
"- Detected language display\n",
"- Syntax-highlighted code preview (orange comments!)\n",
"- AI code review (if enabled)\n",
"- Download buttons for annotated code + Markdown\n",
"\n",
"**Event Handlers:**\n",
"- `handle_model_loading()` - Shows live progress during model download\n",
"- `handle_file_processing()` - Processes uploaded files and updates all outputs\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "85691712",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Gradio Blocks instance: 2 backend functions\n",
"-------------------------------------------\n",
"fn_index=0\n",
" inputs:\n",
" |-โ Invalid file upload format\", \"\", None, None\n", " \n", " # Open and process the file\n", " with open(file_path, 'rb') as f:\n", " # Prepare model parameters\n", " model_params = {\n", " \"max_new_tokens\": int(max_new_tokens),\n", " \"temperature\": float(temperature)\n", " }\n", " \n", " # Process the code file\n", " lang, annotated_code, markdown_preview, html_preview, model_commentary = process_code_file(\n", " f,\n", " ask_model_flag,\n", " model_params\n", " )\n", " \n", " # Save outputs to temporary files for download\n", " source_path, markdown_path = save_outputs_to_temp(annotated_code, markdown_preview, lang)\n", " \n", " # Format model commentary\n", " commentary_display = model_commentary if model_commentary else \"_(No model analysis generated)_\"\n", " \n", " return lang, html_preview, commentary_display, source_path, markdown_path\n", " \n", " except ValueError as e:\n", " # User-facing errors (file too large, unsupported extension, etc.)\n", " return \"\", f\"
โ ๏ธ {str(e)}\", \"\", None, None\n",
" except Exception as e:\n",
" # Unexpected errors\n",
" import traceback\n",
" error_detail = traceback.format_exc()\n",
" print(f\"Error processing file: {error_detail}\")\n",
" return \"\", f\"โ Processing failed: {str(e)}\", \"\", None, None\n",
"\n",
"\n",
"def build_ui():\n",
" \"\"\"\n",
" Build the Gradio user interface for the Code Complexity Annotator.\n",
" \n",
" Returns:\n",
" Gradio Blocks interface\n",
" \"\"\"\n",
" if not GRADIO_AVAILABLE:\n",
" raise ImportError(\n",
" \"Gradio is not installed. Please run the installation cell \"\n",
" \"and restart the kernel.\"\n",
" )\n",
" \n",
" # Custom CSS for better UI\n",
" custom_css = \"\"\"\n",
" footer {visibility: hidden}\n",
" .gradio-container {font-family: 'Inter', sans-serif}\n",
" \"\"\"\n",
" \n",
" with gr.Blocks(css=custom_css, title=\"Code Complexity Annotator\") as demo:\n",
" # Header\n",
" gr.Markdown(\"# ๐ถ Multi-Language Code Complexity Annotator\")\n",
" gr.Markdown(\n",
" \"Upload code โ Detect language โ Auto-annotate with Big-O complexity โ \"\n",
" \"Preview with syntax highlighting โ Download results. \"\n",
" \"Optional: Get AI-powered code review from LLaMA.\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" # Left column: Input controls\n",
" with gr.Column(scale=2):\n",
" gr.Markdown(\"### ๐ค Upload & Settings\")\n",
" \n",
" file_upload = gr.File(\n",
" label=\"Upload Code File\",\n",
" file_count=\"single\",\n",
" file_types=[ext for ext in SUPPORTED_EXTENSIONS.keys()]\n",
" )\n",
" \n",
" ask_model = gr.Checkbox(\n",
" label=\"๐ค Generate AI Code Review\",\n",
" value=True,\n",
" info=\"โ ๏ธ Requires model to be loaded first using the button below\"\n",
" )\n",
" \n",
" gr.Markdown(\"### ๐ง Model Configuration\")\n",
" \n",
" model_id = gr.Textbox(\n",
" label=\"Hugging Face Model ID\",\n",
" value=DEFAULT_MODEL_ID,\n",
" placeholder=\"meta-llama/Llama-3.2-1B\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" load_8bit = gr.Checkbox(\n",
" label=\"8-bit Quantization\",\n",
" value=False,\n",
" info=\"โ ๏ธ Requires CUDA/GPU (reduces memory by ~50%)\"\n",
" )\n",
" load_4bit = gr.Checkbox(\n",
" label=\"4-bit Quantization\",\n",
" value=False,\n",
" info=\"โ ๏ธ Requires CUDA/GPU (reduces memory by ~75%, lower quality)\"\n",
" )\n",
" \n",
" temperature = gr.Slider(\n",
" label=\"Temperature\",\n",
" minimum=0.0,\n",
" maximum=1.5,\n",
" value=0.7,\n",
" step=0.05,\n",
" info=\"Lower = more deterministic, Higher = more creative\"\n",
" )\n",
" \n",
" max_tokens = gr.Slider(\n",
" label=\"Max New Tokens\",\n",
" minimum=16,\n",
" maximum=1024,\n",
" value=256,\n",
" step=16,\n",
" info=\"Maximum length of generated review\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" load_model_btn = gr.Button(\"๐ Load Model\", variant=\"secondary\")\n",
" process_btn = gr.Button(\"๐ Process & Annotate\", variant=\"primary\")\n",
" \n",
" model_status = gr.Markdown(\"โช **Status:** Model not loaded\")\n",
" \n",
" # Right column: Output displays\n",
" with gr.Column(scale=3):\n",
" gr.Markdown(\"### ๐ Results\")\n",
" \n",
" detected_lang = gr.Textbox(\n",
" label=\"Detected Language\",\n",
" interactive=False,\n",
" placeholder=\"Upload a file to detect language\"\n",
" )\n",
" \n",
" html_preview = gr.HTML(\n",
" label=\"Code Preview (Orange = Complexity Annotations)\",\n",
" value=\"Upload and process a file to see preview...\"\n",
" )\n",
" \n",
" model_output = gr.Markdown(\n",
" label=\"๐ค AI Code Review\",\n",
" value=\"*Enable 'Generate AI Code Review' and process a file to see analysis...*\"\n",
" )\n",
" \n",
" gr.Markdown(\"### ๐พ Downloads\")\n",
" \n",
" with gr.Row():\n",
" download_source = gr.File(\n",
" label=\"Annotated Source Code\",\n",
" interactive=False\n",
" )\n",
" download_markdown = gr.File(\n",
" label=\"Markdown Preview\",\n",
" interactive=False\n",
" )\n",
" \n",
" # Event handlers\n",
" load_model_btn.click(\n",
" fn=handle_model_loading,\n",
" inputs=[model_id, load_8bit, load_4bit],\n",
" outputs=[model_status],\n",
" show_progress=\"full\" # Show clear loading indicator\n",
" )\n",
" \n",
" process_btn.click(\n",
" fn=handle_file_processing,\n",
" inputs=[file_upload, ask_model, temperature, max_tokens],\n",
" outputs=[detected_lang, html_preview, model_output, download_source, download_markdown]\n",
" )\n",
" \n",
" return demo\n",
"\n",
"\n",
"# Build and display the interface\n",
"demo = build_ui()\n",
"demo"
]
},
{
"cell_type": "markdown",
"id": "3608ab3c",
"metadata": {},
"source": [
"## Step 11: Launch the App\n",
"\n",
"Starting the Gradio server with auto-browser launch.\n",
"\n",
"**Options:**\n",
"- `share=False` - Local only (set to True for public Gradio link)\n",
"- `inbrowser=True` - Automatically opens in your default browser\n",
"- `show_error=True` - Displays detailed error messages in the UI\n",
"\n",
"The app will be available at: `http://127.0.0.1:7861`\n",
"\n",
"---\n",
"\n",
"## ๐ก How to Use\n",
"\n",
"### Without AI Review (No Model Needed):\n",
"1. **Upload** a code file (.py, .js, .java, etc.)\n",
"2. **Uncheck** \"Generate AI Code Review\"\n",
"3. **Click** \"๐ Process & Annotate\"\n",
"4. **View** syntax-highlighted code with Big-O annotations\n",
"5. **Download** the annotated source + Markdown\n",
"\n",
"### With AI Review (Requires Model):\n",
"1. **Click** \"๐ Load Model\" (wait 2-5 minutes for first download)\n",
"2. **Upload** your code file\n",
"3. **Check** \"Generate AI Code Review\"\n",
"4. **Adjust** temperature/tokens if needed\n",
"5. **Click** \"๐ Process & Annotate\"\n",
"6. **Read** AI-generated optimization suggestions\n",
"\n",
"---\n",
"\n",
"## ๐ฏ Supported Languages\n",
"\n",
"Python โข JavaScript โข TypeScript โข Java โข C โข C++ โข C# โข Go โข PHP โข Swift โข Ruby โข Kotlin โข Rust\n",
"\n",
"---\n",
"\n",
"## ๐ง Model Options\n",
"\n",
"**Recommended for CPU/Mac:**\n",
"- `meta-llama/Llama-3.2-1B` (Default, ~1GB, requires HF approval)\n",
"- `gpt2` (No approval needed, ~500MB)\n",
"- `microsoft/DialoGPT-medium` (~1GB)\n",
"\n",
"**For GPU users:**\n",
"- Any model with 8-bit or 4-bit quantization enabled\n",
"- `meta-llama/Llama-2-7b-chat-hf` (requires approval)\n",
"\n",
"---\n",
"\n",
"**Note:** First model load downloads weights (~1-14GB depending on model). Subsequent runs load from cache.\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "eec78f72",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7861\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
"