{
"cells": [
{
"cell_type": "markdown",
"id": "905062e4",
"metadata": {},
"source": [
"# 🔶 Multi-Language Code Complexity Annotator\n",
"\n",
"## Why I Built This\n",
"\n",
"Understanding time complexity (Big-O notation) is crucial for writing efficient algorithms, identifying bottlenecks, making informed optimization decisions, and passing technical interviews.\n",
"\n",
"Analyzing complexity manually is tedious and error-prone. This tool **automates** the entire process: detecting loops, recursion, and functions, then annotating code with Big-O estimates and explanations.\n",
"\n",
"---\n",
"\n",
"## What This Does\n",
"\n",
"This app analyzes source code and automatically:\n",
"- 📊 Detects loops, recursion, and functions\n",
"- 🧮 Estimates Big-O complexity (O(1), O(n), O(n²), etc.)\n",
"- 💬 Inserts inline comments explaining the complexity\n",
"- 🎨 Generates syntax-highlighted previews\n",
"- 🤖 **Optional:** Gets AI-powered code review from LLaMA\n",
"\n",
"**Supports 13 languages:** Python • JavaScript • TypeScript • Java • C/C++ • C# • Go • PHP • Swift • Ruby • Kotlin • Rust\n",
"\n",
"**Tech:** HuggingFace Transformers • LLaMA 3.2 • Gradio UI • Pygments • Regex Analysis\n",
"\n",
"---\n",
"\n",
"**Use Case:** Upload your code → Get instant complexity analysis → Optimize with confidence\n"
]
},
{
"cell_type": "markdown",
"id": "69e9876d",
"metadata": {},
"source": [
"## Step 1: Install Dependencies\n",
"\n",
"Installing the complete stack:\n",
"- **Transformers** - HuggingFace library for loading LLaMA models\n",
"- **Accelerate** - Efficient device placement for loading and running models\n",
"- **Gradio** - Beautiful web interface\n",
"- **PyTorch** (CPU version) - Deep learning framework\n",
"- **BitsAndBytes** - 4/8-bit quantization for large models\n",
"- **Pygments** - Syntax highlighting engine\n",
"- **Python-dotenv** - Environment variable management\n",
"\n",
"**Note:** This installs the CPU-only build of PyTorch. For GPU support, modify the install command to pull CUDA wheels instead.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f035a1c5",
"metadata": {},
"outputs": [],
"source": [
"!uv pip -q install -U pip\n",
"!uv pip -q install transformers accelerate gradio torch --extra-index-url https://download.pytorch.org/whl/cpu\n",
"!uv pip -q install bitsandbytes pygments python-dotenv"
]
},
{
"cell_type": "markdown",
"id": "6ab14cd1",
"metadata": {},
"source": [
"## Step 2: Core Configuration & Imports\n",
"\n",
"Setting up:\n",
"- **Environment variables** to suppress progress bars (prevents Jupyter ContextVar issues)\n",
"- **Dummy tqdm** class to avoid notebook conflicts\n",
"- **Language mappings** for 13+ programming languages\n",
"- **Complexity constants** for Big-O estimation\n",
"- **Comment syntax** for each language (# vs //)\n",
"\n",
"**Key Configurations:**\n",
"- Max file size: 2 MB\n",
"- Default model: `meta-llama/Llama-3.2-1B`\n",
"- Supported file extensions and their language identifiers\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5666a121",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import re\n",
"import io\n",
"import json\n",
"import time\n",
"import math\n",
"from dataclasses import dataclass\n",
"from typing import Tuple, List, Dict, Optional, Generator\n",
"\n",
"# Disable tqdm progress bars to avoid Jupyter ContextVar issues\n",
"os.environ[\"TRANSFORMERS_NO_ADVISORY_WARNINGS\"] = \"1\"\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
"os.environ[\"TQDM_DISABLE\"] = \"1\"  # Completely disable tqdm\n",
"\n",
"# Provide a module-level lock expected by some integrations\n",
"class _DummyLock:\n",
"    def __enter__(self):\n",
"        return self\n",
"    def __exit__(self, *args):\n",
"        pass\n",
"\n",
"class _DummyTqdm:\n",
"    \"\"\"Dummy tqdm that does nothing - prevents Jupyter notebook ContextVar errors\"\"\"\n",
"    def __init__(self, *args, **kwargs):\n",
"        self.iterable = args[0] if args else None\n",
"        self.total = kwargs.get('total', 0)\n",
"        self.n = 0\n",
"    def __iter__(self):\n",
"        return iter(self.iterable) if self.iterable else iter([])\n",
"    def __enter__(self):\n",
"        return self\n",
"    def __exit__(self, *args):\n",
"        pass\n",
"    def update(self, n=1, *args, **kwargs):\n",
"        self.n += n\n",
"    def close(self):\n",
"        pass\n",
"    def set_description(self, *args, **kwargs):\n",
"        pass\n",
"    def set_postfix(self, *args, **kwargs):\n",
"        pass\n",
"    def refresh(self, *args, **kwargs):\n",
"        pass\n",
"    def clear(self, *args, **kwargs):\n",
"        pass\n",
"    def write(self, *args, **kwargs):\n",
"        pass\n",
"    def reset(self, total=None):\n",
"        self.n = 0\n",
"        if total is not None:\n",
"            self.total = total\n",
"\n",
"    @staticmethod\n",
"    def get_lock():\n",
"        \"\"\"Return a dummy lock to avoid ContextVar issues\"\"\"\n",
"        return _DummyLock()\n",
"\n",
"    @staticmethod\n",
"    def set_lock(lock=None):\n",
"        \"\"\"Dummy set_lock method - does nothing\"\"\"\n",
"        pass\n",
"\n",
"def _dummy_get_lock():\n",
"    \"\"\"Module-level get_lock function\"\"\"\n",
"    return _DummyLock()\n",
"\n",
"def _dummy_set_lock(lock=None):\n",
"    \"\"\"Module-level set_lock function - does nothing\"\"\"\n",
"    pass\n",
"\n",
"# Import and immediately patch tqdm before transformers can use it\n",
"def _patch_tqdm():\n",
"    \"\"\"Patch tqdm to avoid ContextVar errors in Jupyter\"\"\"\n",
"    import sys  # Imported here because sys is not imported at module level\n",
"    try:\n",
"        import tqdm\n",
"        import tqdm.auto\n",
"        import tqdm.notebook\n",
"\n",
"        # Patch classes\n",
"        tqdm.tqdm = _DummyTqdm\n",
"        tqdm.auto.tqdm = _DummyTqdm\n",
"        tqdm.notebook.tqdm = _DummyTqdm\n",
"\n",
"        # Patch module-level functions that other code might call directly\n",
"        tqdm.get_lock = _dummy_get_lock\n",
"        tqdm.auto.get_lock = _dummy_get_lock\n",
"        tqdm.notebook.get_lock = _dummy_get_lock\n",
"        tqdm.set_lock = _dummy_set_lock\n",
"        tqdm.auto.set_lock = _dummy_set_lock\n",
"        tqdm.notebook.set_lock = _dummy_set_lock\n",
"\n",
"        # Also patch in sys.modules to catch any dynamic imports\n",
"        sys.modules['tqdm'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm.auto'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm.notebook'].tqdm = _DummyTqdm\n",
"        sys.modules['tqdm'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm.auto'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm.notebook'].get_lock = _dummy_get_lock\n",
"        sys.modules['tqdm'].set_lock = _dummy_set_lock\n",
"        sys.modules['tqdm.auto'].set_lock = _dummy_set_lock\n",
"        sys.modules['tqdm.notebook'].set_lock = _dummy_set_lock\n",
"\n",
"    except ImportError:\n",
"        pass\n",
"\n",
"_patch_tqdm()\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"SUPPORTED_EXTENSIONS = {\n",
"    \".py\": \"python\",\n",
"    \".js\": \"javascript\",\n",
"    \".ts\": \"typescript\",\n",
"    \".java\": \"java\",\n",
"    \".c\": \"c\",\n",
"    \".h\": \"c\",\n",
"    \".cpp\": \"cpp\",\n",
"    \".cc\": \"cpp\",\n",
"    \".hpp\": \"cpp\",\n",
"    \".cs\": \"csharp\",\n",
"    \".go\": \"go\",\n",
"    \".php\": \"php\",\n",
"    \".swift\": \"swift\",\n",
"    \".rb\": \"ruby\",\n",
"    \".kt\": \"kotlin\",\n",
"    \".rs\": \"rust\",\n",
"}\n",
"\n",
"COMMENT_SYNTAX = {\n",
"    \"python\": \"#\",\n",
"    \"javascript\": \"//\",\n",
"    \"typescript\": \"//\",\n",
"    \"java\": \"//\",\n",
"    \"c\": \"//\",\n",
"    \"cpp\": \"//\",\n",
"    \"csharp\": \"//\",\n",
"    \"go\": \"//\",\n",
"    \"php\": \"//\",\n",
"    \"swift\": \"//\",\n",
"    \"ruby\": \"#\",\n",
"    \"kotlin\": \"//\",\n",
"    \"rust\": \"//\",\n",
"}\n",
"\n",
"MAX_FILE_SIZE_MB = 2.0\n",
"# Llama 3.2 1B - the base model name (no -Instruct suffix)\n",
"# Requires Meta approval: https://huggingface.co/meta-llama/Llama-3.2-1B\n",
"DEFAULT_MODEL_ID = \"meta-llama/Llama-3.2-1B\"\n",
"DEVICE_HINT = \"auto\"\n",
"\n",
"# Global token storage (set in the Step 3 cell to avoid Jupyter ContextVar issues)\n",
"HUGGINGFACE_TOKEN = None\n",
"\n",
"# Complexity estimation constants\n",
"LOOP_KEYWORDS = [r\"\\bfor\\b\", r\"\\bwhile\\b\"]\n",
"\n",
"FUNCTION_PATTERNS = [\n",
"    r\"^\\s*def\\s+([A-Za-z_]\\w*)\\s*\\(\",  # Python\n",
"    r\"^\\s*(?:public|private|protected)?\\s*(?:static\\s+)?[A-Za-z_<>\\[\\]]+\\s+([A-Za-z_]\\w*)\\s*\\(\",  # Java/C#/C++\n",
"    r\"^\\s*function\\s+([A-Za-z_]\\w*)\\s*\\(\",  # JavaScript\n",
"    r\"^\\s*(?:const|let|var)\\s+([A-Za-z_]\\w*)\\s*=\\s*\\(\",  # JavaScript arrow/function\n",
"]\n",
"\n",
"COMPLEXITY_ORDER = {\n",
"    \"O(1)\": 0,\n",
"    \"O(log n)\": 1,\n",
"    \"O(n)\": 2,\n",
"    \"O(n log n)\": 3,\n",
"    \"O(n^2)\": 4,\n",
"    \"O(n^3)\": 5,\n",
"}\n",
"\n",
"RECURSION_PATTERNS = {\n",
"    \"divide_conquer\": r\"\\b(n/2|n >> 1|n>>1|n\\s*//\\s*2|mid\\b)\",\n",
"}\n",
"\n",
"# HTML syntax highlighting styles (orange comments)\n",
"SYNTAX_HIGHLIGHT_CSS = \"\"\"<style>\n",
".codehilite .c, .codehilite .c1, .codehilite .cm, .codehilite .cp {\n",
"    color: orange !important;\n",
"    font-style: italic;\n",
"}\n",
".codehilite {\n",
"    background: #0b0b0b11;\n",
"    padding: 12px;\n",
"    border-radius: 10px;\n",
"    overflow: auto;\n",
"    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', 'Consolas', monospace;\n",
"    font-size: 14px;\n",
"    line-height: 1.5;\n",
"}\n",
"</style>\"\"\""
]
},
{
"cell_type": "markdown",
"id": "d17e8406",
"metadata": {},
"source": [
"## Step 3: Load HuggingFace Token\n",
"\n",
"Loading the authentication token from a `.env` file to access gated models like LLaMA.\n",
"\n",
"**Why?** Meta's LLaMA models require:\n",
"1. Accepting their license agreement on HuggingFace\n",
"2. Using an access token for authentication\n",
"\n",
"**Create a `.env` file with:**\n",
"```\n",
"HF_TOKEN=hf_your_token_here\n",
"```\n",
"\n",
"Get your token at: https://huggingface.co/settings/tokens\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "70beee01",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"✅ Hugging Face token loaded successfully from .env file\n",
"   Token length: 37 characters\n"
]
}
],
"source": [
"load_dotenv()\n",
"\n",
"# Load token from .env file\n",
"HF_TOKEN = os.getenv(\"HF_TOKEN\", \"\").strip()\n",
"\n",
"# Store in the module-level HUGGINGFACE_TOKEN to avoid Jupyter ContextVar issues\n",
"# with os.environ (a top-level assignment needs no `global` statement)\n",
"\n",
"if HF_TOKEN:\n",
"    os.environ[\"HUGGING_FACE_HUB_TOKEN\"] = HF_TOKEN\n",
"    HUGGINGFACE_TOKEN = HF_TOKEN  # Store in global variable\n",
"    print(\"✅ Hugging Face token loaded successfully from .env file\")\n",
"    print(f\"   Token length: {len(HF_TOKEN)} characters\")\n",
"else:\n",
"    print(\"⚠️ No HF_TOKEN found in .env file. Gated models may not work.\")\n",
"    HUGGINGFACE_TOKEN = None"
]
},
{
"cell_type": "markdown",
"id": "bd0a557e",
"metadata": {},
"source": [
"## Step 4: Language Detection Functions\n",
"\n",
"Two simple but essential utilities:\n",
"\n",
"1. **`detect_language(filename)`** - Detects the programming language from the file extension\n",
"2. **`comment_prefix_for(lang)`** - Returns the comment symbol for that language (# or //)\n",
"\n",
"These enable the tool to automatically adapt to any supported language.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a0dbad5f",
"metadata": {},
"outputs": [],
"source": [
"def detect_language(filename: str) -> str:\n",
"    \"\"\"\n",
"    Detect programming language based on file extension.\n",
"\n",
"    Args:\n",
"        filename: Name of the file (must have a supported extension)\n",
"\n",
"    Returns:\n",
"        Language identifier string (e.g., 'python', 'javascript', etc.)\n",
"\n",
"    Raises:\n",
"        ValueError: If file extension is not supported\n",
"    \"\"\"\n",
"    ext = os.path.splitext(filename)[1].lower()\n",
"\n",
"    if not ext:\n",
"        supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n",
"        raise ValueError(f\"File has no extension. Supported extensions: {supported}\")\n",
"\n",
"    if ext not in SUPPORTED_EXTENSIONS:\n",
"        supported = \", \".join(sorted(SUPPORTED_EXTENSIONS.keys()))\n",
"        raise ValueError(f\"Unsupported file extension '{ext}'. Supported extensions: {supported}\")\n",
"\n",
"    return SUPPORTED_EXTENSIONS[ext]\n",
"\n",
"\n",
"def comment_prefix_for(lang: str) -> str:\n",
"    \"\"\"\n",
"    Get the comment prefix for a given language.\n",
"\n",
"    Args:\n",
"        lang: Language identifier (e.g., 'python', 'javascript')\n",
"\n",
"    Returns:\n",
"        Comment prefix string (e.g., '#' or '//')\n",
"\n",
"    Raises:\n",
"        ValueError: If language is not supported\n",
"    \"\"\"\n",
"    if lang not in COMMENT_SYNTAX:\n",
"        raise ValueError(f\"Unsupported language '{lang}'. Supported: {', '.join(sorted(COMMENT_SYNTAX.keys()))}\")\n",
"\n",
"    return COMMENT_SYNTAX[lang]"
]
},
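{
"cell_type": "markdown",
"id": "b7d10001",
"metadata": {},
"source": [
"**Quick check (illustrative):** a minimal sanity test of the two helpers above. The filenames below are made-up examples, not files from this project.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d10002",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sanity check of detect_language()/comment_prefix_for()\n",
"# on a few hypothetical filenames.\n",
"for name in [\"solver.py\", \"app.ts\", \"main.rs\"]:\n",
"    lang = detect_language(name)\n",
"    print(f\"{name} -> {lang} (comments start with '{comment_prefix_for(lang)}')\")"
]
},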
{
"cell_type": "markdown",
"id": "13e0f6d8",
"metadata": {},
"source": [
"## Step 5: Complexity Estimation Engine\n",
"\n",
"The core analysis logic using **heuristic pattern matching**:\n",
"\n",
"**How it works:**\n",
"1. **Detect blocks** - Find all functions, loops, and recursion using regex patterns\n",
"2. **Analyze loops** - Count nesting depth (1 loop = O(n), 2 nested = O(n²), etc.)\n",
"3. **Analyze recursion** - Detect divide-and-conquer (O(log n)) vs exponential (O(2^n))\n",
"4. **Aggregate** - Functions inherit the worst complexity of their inner operations\n",
"\n",
"**Key Functions:**\n",
"- `detect_blocks()` - Pattern matching for code structures\n",
"- `analyze_recursion()` - Identifies recursive patterns\n",
"- `analyze_loop_complexity()` - Counts nested loops\n",
"- `estimate_complexity()` - Orchestrates the full analysis\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7595dfe3",
"metadata": {},
"outputs": [],
"source": [
"@dataclass\n",
"class BlockInfo:\n",
"    \"\"\"Represents a code block (function, loop, or recursion) with complexity information.\"\"\"\n",
"    line_idx: int\n",
"    kind: str  # \"function\" | \"loop\" | \"recursion\"\n",
"    name: Optional[str] = None\n",
"    depth: int = 0\n",
"    complexity: str = \"O(1)\"\n",
"    reason: str = \"\"\n",
"\n",
"\n",
"def get_indent_level(line: str) -> int:\n",
"    \"\"\"Calculate indentation level of a line (tabs converted to 4 spaces).\"\"\"\n",
"    normalized = line.replace(\"\\t\", \"    \")\n",
"    return len(normalized) - len(normalized.lstrip(\" \"))\n",
"\n",
"\n",
"def find_function_name(line: str) -> Optional[str]:\n",
"    \"\"\"Extract function name from a line if it contains a function declaration.\"\"\"\n",
"    for pattern in FUNCTION_PATTERNS:\n",
"        match = re.search(pattern, line)\n",
"        if match and match.lastindex:\n",
"            return match.group(1)\n",
"    return None\n",
"\n",
"\n",
"def get_block_end(block: BlockInfo, all_blocks: List[BlockInfo], total_lines: int) -> int:\n",
"    \"\"\"Calculate the end line index for a given block.\"\"\"\n",
"    end = total_lines\n",
"    for other in all_blocks:\n",
"        if other.line_idx > block.line_idx and other.depth <= block.depth:\n",
"            end = min(end, other.line_idx)\n",
"    return end\n",
"\n",
"\n",
"def rank_complexity(complexity: str) -> int:\n",
"    \"\"\"Assign a numeric rank to a complexity string for comparison.\"\"\"\n",
"    # Exponential growth outranks any polynomial estimate\n",
"    if complexity == \"O(2^n)\":\n",
"        return 100\n",
"\n",
"    # Check for polynomial complexities O(n^k)\n",
"    match = re.match(r\"O\\(n\\^(\\d+)\\)\", complexity)\n",
"    if match:\n",
"        return 10 + int(match.group(1))\n",
"\n",
"    return COMPLEXITY_ORDER.get(complexity, 0)\n",
"\n",
"\n",
"def detect_blocks(lines: List[str], lang: str) -> List[BlockInfo]:\n",
"    \"\"\"Detect all code blocks (functions and loops) in the source code.\"\"\"\n",
"    blocks = []\n",
"    stack = []\n",
"    brace_depth = 0\n",
"\n",
"    # Pre-compute indentation for Python\n",
"    indents = [get_indent_level(line) for line in lines] if lang == \"python\" else []\n",
"\n",
"    for i, line in enumerate(lines):\n",
"        stripped = line.strip()\n",
"\n",
"        # Track brace depth for non-Python languages\n",
"        if lang != \"python\":\n",
"            brace_depth += line.count(\"{\") - line.count(\"}\")\n",
"            brace_depth = max(0, brace_depth)\n",
"\n",
"        # Update stack based on indentation/brace depth\n",
"        if lang == \"python\":\n",
"            while stack and indents[i] < stack[-1]:\n",
"                stack.pop()\n",
"        else:\n",
"            while stack and brace_depth < stack[-1]:\n",
"                stack.pop()\n",
"\n",
"        current_depth = len(stack)\n",
"\n",
"        # Detect loops\n",
"        if any(re.search(pattern, stripped) for pattern in LOOP_KEYWORDS):\n",
"            blocks.append(BlockInfo(\n",
"                line_idx=i,\n",
"                kind=\"loop\",\n",
"                depth=current_depth + 1\n",
"            ))\n",
"            stack.append(indents[i] if lang == \"python\" else brace_depth)\n",
"\n",
"        # Detect functions\n",
"        func_name = find_function_name(line)\n",
"        if func_name:\n",
"            blocks.append(BlockInfo(\n",
"                line_idx=i,\n",
"                kind=\"function\",\n",
"                name=func_name,\n",
"                depth=current_depth + 1\n",
"            ))\n",
"            stack.append(indents[i] if lang == \"python\" else brace_depth)\n",
"\n",
"    return blocks\n",
"\n",
"\n",
"def analyze_recursion(block: BlockInfo, blocks: List[BlockInfo], lines: List[str]) -> None:\n",
"    \"\"\"Analyze a function block for recursion and update its complexity.\"\"\"\n",
"    if block.kind != \"function\" or not block.name:\n",
"        return\n",
"\n",
"    end = get_block_end(block, blocks, len(lines))\n",
"    body = \"\\n\".join(lines[block.line_idx:end])\n",
"\n",
"    # Count recursive calls (subtract 1 for the function definition itself)\n",
"    recursive_calls = len(re.findall(rf\"\\b{re.escape(block.name)}\\s*\\(\", body)) - 1\n",
"\n",
"    if recursive_calls == 0:\n",
"        return\n",
"\n",
"    # Detect divide-and-conquer pattern\n",
"    if re.search(RECURSION_PATTERNS[\"divide_conquer\"], body):\n",
"        block.kind = \"recursion\"\n",
"        block.complexity = \"O(log n)\"\n",
"        block.reason = \"Divide-and-conquer recursion (problem size halves each call).\"\n",
"    # Multiple recursive calls suggest exponential\n",
"    elif recursive_calls >= 2:\n",
"        block.kind = \"recursion\"\n",
"        block.complexity = \"O(2^n)\"\n",
"        block.reason = \"Multiple recursive calls per frame suggest exponential growth.\"\n",
"    # Single recursive call is linear\n",
"    else:\n",
"        block.kind = \"recursion\"\n",
"        block.complexity = \"O(n)\"\n",
"        block.reason = \"Single recursive call per frame suggests linear recursion.\"\n",
"\n",
"\n",
"def analyze_loop_complexity(block: BlockInfo, all_loops: List[BlockInfo], blocks: List[BlockInfo], total_lines: int) -> None:\n",
"    \"\"\"Analyze loop nesting depth and assign complexity.\"\"\"\n",
"    if block.kind != \"loop\":\n",
"        return\n",
"\n",
"    end = get_block_end(block, blocks, total_lines)\n",
"\n",
"    # Count nested loops within this loop\n",
"    inner_loops = [loop for loop in all_loops\n",
"                   if block.line_idx < loop.line_idx < end]\n",
"\n",
"    nesting_depth = 1 + len(inner_loops)\n",
"\n",
"    if nesting_depth == 1:\n",
"        block.complexity = \"O(n)\"\n",
"        block.reason = \"Single loop scales linearly with input size.\"\n",
"    elif nesting_depth == 2:\n",
"        block.complexity = \"O(n^2)\"\n",
"        block.reason = \"Nested loops indicate quadratic time.\"\n",
"    elif nesting_depth == 3:\n",
"        block.complexity = \"O(n^3)\"\n",
"        block.reason = \"Three nested loops indicate cubic time.\"\n",
"    else:\n",
"        block.complexity = f\"O(n^{nesting_depth})\"\n",
"        block.reason = f\"{nesting_depth} nested loops suggest polynomial time.\"\n",
"\n",
"\n",
"def analyze_function_complexity(block: BlockInfo, blocks: List[BlockInfo], total_lines: int) -> None:\n",
"    \"\"\"Analyze overall function complexity based on contained blocks.\"\"\"\n",
"    if block.kind != \"function\":\n",
"        return\n",
"\n",
"    end = get_block_end(block, blocks, total_lines)\n",
"\n",
"    # Get all blocks within this function\n",
"    inner_blocks = [b for b in blocks if block.line_idx < b.line_idx < end]\n",
"\n",
"    # Find the worst complexity among inner blocks\n",
"    worst_complexity = \"O(1)\"\n",
"    for inner in inner_blocks:\n",
"        if rank_complexity(inner.complexity) > rank_complexity(worst_complexity):\n",
"            worst_complexity = inner.complexity\n",
"\n",
"    # Special case: recursion + loop = O(n log n)\n",
"    has_recursion = any(b.kind == \"recursion\" for b in inner_blocks)\n",
"    has_loop = any(b.kind == \"loop\" for b in inner_blocks)\n",
"\n",
"    if has_recursion and has_loop:\n",
"        block.complexity = \"O(n log n)\"\n",
"        block.reason = \"Combines recursion with iteration (e.g., merge sort pattern).\"\n",
"    else:\n",
"        block.complexity = worst_complexity\n",
"        block.reason = \"Based on worst-case complexity of inner operations.\"\n",
"\n",
"\n",
"def estimate_complexity(lines: List[str], lang: str) -> List[BlockInfo]:\n",
"    \"\"\"\n",
"    Estimate Big-O complexity for code blocks using heuristic analysis.\n",
"\n",
"    Heuristics:\n",
"    - Single/nested loops: O(n), O(n^2), O(n^3), etc.\n",
"    - Recursion patterns: O(n), O(log n), O(2^n)\n",
"    - Function complexity: worst case of internal operations\n",
"\n",
"    Args:\n",
"        lines: Source code lines\n",
"        lang: Programming language identifier\n",
"\n",
"    Returns:\n",
"        List of BlockInfo objects with complexity estimates\n",
"    \"\"\"\n",
"    # Step 1: Detect all blocks\n",
"    blocks = detect_blocks(lines, lang)\n",
"\n",
"    # Step 2: Analyze recursion in functions\n",
"    for block in blocks:\n",
"        analyze_recursion(block, blocks, lines)\n",
"\n",
"    # Step 3: Analyze loop complexities\n",
"    loops = [b for b in blocks if b.kind == \"loop\"]\n",
"    for loop in loops:\n",
"        analyze_loop_complexity(loop, loops, blocks, len(lines))\n",
"\n",
"    # Step 4: Analyze overall function complexities\n",
"    for block in blocks:\n",
"        analyze_function_complexity(block, blocks, len(lines))\n",
"\n",
"    return blocks"
]
},
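{
"cell_type": "markdown",
"id": "b7d10003",
"metadata": {},
"source": [
"**Quick check (illustrative):** running the heuristic engine on a small, made-up snippet. Expected: the nested loops rank as O(n^2) and the enclosing function inherits that estimate.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d10004",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative run of estimate_complexity() on a hypothetical snippet.\n",
"sample = '''def pair_sums(items):\n",
"    for a in items:\n",
"        for b in items:\n",
"            print(a + b)\n",
"'''\n",
"for block in estimate_complexity(sample.splitlines(), \"python\"):\n",
"    print(block.kind, block.name, block.complexity, \"-\", block.reason)"
]
},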
{
"cell_type": "markdown",
"id": "f2a22988",
"metadata": {},
"source": [
"## Step 6: Code Annotation Functions\n",
"\n",
"Takes the complexity estimates and **inserts them as comments** into the source code:\n",
"\n",
"**Process:**\n",
"1. `create_annotation_comment()` - Formats Big-O annotations as language-specific comments\n",
"2. `insert_annotations()` - Inserts comments below each function/loop\n",
"3. `to_markdown()` - Wraps annotated code in Markdown code blocks\n",
"\n",
"**Example output:**\n",
"```python\n",
"def bubble_sort(arr):\n",
"# Big-O: O(n^2)\n",
"# Explanation: Nested loops indicate quadratic time.\n",
"    for i in range(len(arr)):\n",
"        for j in range(len(arr) - i - 1):\n",
"            if arr[j] > arr[j + 1]:\n",
"                arr[j], arr[j + 1] = arr[j + 1], arr[j]\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "2e642483",
"metadata": {},
"outputs": [],
"source": [
"def create_annotation_comment(block: BlockInfo, comment_prefix: str) -> List[str]:\n",
"    \"\"\"\n",
"    Create annotation comments for a code block.\n",
"\n",
"    Args:\n",
"        block: BlockInfo object containing complexity information\n",
"        comment_prefix: Comment syntax for the language (e.g., '#' or '//')\n",
"\n",
"    Returns:\n",
"        List of comment lines to insert\n",
"    \"\"\"\n",
"    complexity = block.complexity or \"O(1)\"\n",
"    reason = block.reason or \"Heuristic estimate based on detected structure.\"\n",
"\n",
"    return [\n",
"        f\"{comment_prefix} Big-O: {complexity}\",\n",
"        f\"{comment_prefix} Explanation: {reason}\"\n",
"    ]\n",
"\n",
"\n",
"def insert_annotations(code: str, lang: str) -> str:\n",
"    \"\"\"\n",
"    Insert Big-O complexity annotations into source code.\n",
"\n",
"    Analyzes the code for loops, functions, and recursion, then inserts\n",
"    orange-colored comment annotations (when syntax highlighted) beneath\n",
"    each detected block explaining its time complexity.\n",
"\n",
"    Args:\n",
"        code: Source code string to annotate\n",
"        lang: Programming language identifier\n",
"\n",
"    Returns:\n",
"        Annotated source code with Big-O comments inserted\n",
"    \"\"\"\n",
"    if not code.strip():\n",
"        return code\n",
"\n",
"    lines = code.splitlines()\n",
"    blocks = estimate_complexity(lines, lang)\n",
"\n",
"    if not blocks:\n",
"        return code\n",
"\n",
"    comment_prefix = comment_prefix_for(lang)\n",
"\n",
"    # Build a map of line numbers to annotation comments\n",
"    annotations: Dict[int, List[str]] = {}\n",
"    for block in blocks:\n",
"        line_num = block.line_idx + 1  # Convert 0-indexed to 1-indexed\n",
"        comments = create_annotation_comment(block, comment_prefix)\n",
"        annotations.setdefault(line_num, []).extend(comments)\n",
"\n",
"    # Insert annotations after their corresponding lines\n",
"    annotated_lines = []\n",
"    for line_num, original_line in enumerate(lines, start=1):\n",
"        annotated_lines.append(original_line)\n",
"        if line_num in annotations:\n",
"            annotated_lines.extend(annotations[line_num])\n",
"\n",
"    return \"\\n\".join(annotated_lines)\n",
"\n",
"\n",
"def to_markdown(code: str, lang: str) -> str:\n",
"    \"\"\"\n",
"    Format annotated code as Markdown with syntax highlighting.\n",
"\n",
"    Args:\n",
"        code: Annotated source code\n",
"        lang: Programming language identifier for syntax highlighting\n",
"\n",
"    Returns:\n",
"        Markdown-formatted code block\n",
"    \"\"\"\n",
"    lang_display = lang.capitalize()\n",
"\n",
"    return f\"\"\"### Annotated Code ({lang_display})\n",
"\n",
"```{lang}\n",
"{code}\n",
"```\n",
"\"\"\""
]
},
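{
"cell_type": "markdown",
"id": "b7d10005",
"metadata": {},
"source": [
"**Quick check (illustrative):** annotating the bubble-sort example shown in the markdown above.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d10006",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: annotate the bubble-sort example from the cell above.\n",
"bubble = '''def bubble_sort(arr):\n",
"    for i in range(len(arr)):\n",
"        for j in range(len(arr) - i - 1):\n",
"            if arr[j] > arr[j + 1]:\n",
"                arr[j], arr[j + 1] = arr[j + 1], arr[j]\n",
"'''\n",
"print(insert_annotations(bubble, \"python\"))"
]
},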
{
"cell_type": "markdown",
"id": "184ad5c1",
"metadata": {},
"source": [
"## Step 7: Syntax Highlighting with Pygments\n",
"\n",
"Generates beautiful, syntax-highlighted HTML previews with **orange-colored complexity comments**.\n",
"\n",
"**Features:**\n",
"- Uses a Pygments lexer for accurate language-specific highlighting\n",
"- Custom CSS to make Big-O comments stand out in orange\n",
"- Fallback to plain HTML if Pygments is unavailable\n",
"- HTML escaping for security\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0f01d30b",
"metadata": {},
"outputs": [],
"source": [
"def escape_html(text: str) -> str:\n",
"    \"\"\"\n",
"    Escape HTML special characters for safe display.\n",
"\n",
"    Args:\n",
"        text: Raw text to escape\n",
"\n",
"    Returns:\n",
"        HTML-safe text\n",
"    \"\"\"\n",
"    html_escape_table = {\n",
"        \"&\": \"&amp;\",\n",
"        \"<\": \"&lt;\",\n",
"        \">\": \"&gt;\",\n",
"        '\"': \"&quot;\",\n",
"        \"'\": \"&#39;\",\n",
"    }\n",
"    return \"\".join(html_escape_table.get(c, c) for c in text)\n",
"\n",
"\n",
"def highlighted_html(code: str, lang: str) -> str:\n",
"    \"\"\"\n",
"    Generate syntax-highlighted HTML with orange-colored comments.\n",
"\n",
"    Uses Pygments for syntax highlighting with custom CSS to make\n",
"    comments appear in orange for visual emphasis of Big-O annotations.\n",
"\n",
"    Args:\n",
"        code: Source code to highlight\n",
"        lang: Programming language identifier\n",
"\n",
"    Returns:\n",
"        HTML string with embedded CSS and syntax highlighting\n",
"    \"\"\"\n",
"    if not code.strip():\n",
"        return f\"<pre><code>{escape_html(code)}</code></pre>\"\n",
"\n",
"    try:\n",
"        from pygments import highlight\n",
"        from pygments.lexers import get_lexer_by_name\n",
"        from pygments.formatters import HtmlFormatter\n",
"\n",
"        # Get appropriate lexer for the language\n",
"        lexer = get_lexer_by_name(lang)\n",
"\n",
"        # Configure HTML formatter\n",
"        formatter = HtmlFormatter(\n",
"            nowrap=False,\n",
"            full=False,\n",
"            cssclass=\"codehilite\",\n",
"            linenos=False\n",
"        )\n",
"\n",
"        # Generate highlighted HTML\n",
"        html_code = highlight(code, lexer, formatter)\n",
"\n",
"        return SYNTAX_HIGHLIGHT_CSS + html_code\n",
"\n",
"    except ImportError:\n",
"        # Pygments not available - return plain HTML\n",
"        return f\"<pre><code>{escape_html(code)}</code></pre>\"\n",
"\n",
"    except Exception as e:\n",
"        # Lexer not found or other error - fall back to plain HTML\n",
"        print(f\"⚠️ Syntax highlighting failed for '{lang}': {e}\")\n",
"        return f\"<pre><code>{escape_html(code)}</code></pre>\""
]
},
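{
"cell_type": "markdown",
"id": "b7d10007",
"metadata": {},
"source": [
"**Quick check (illustrative):** rendering a highlighted preview inline with IPython's HTML display. The snippet is a made-up example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d10008",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: annotate a tiny hypothetical snippet, then render the\n",
"# orange-comment preview inline in the notebook.\n",
"from IPython.display import HTML, display\n",
"\n",
"snippet = insert_annotations('''for i in range(10):\n",
"    print(i)\n",
"''', \"python\")\n",
"display(HTML(highlighted_html(snippet, \"python\")))"
]
},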
{
"cell_type": "markdown",
"id": "36fd0454",
"metadata": {},
"source": [
"## Step 8: LLaMA Model Loading & Streaming\n",
"\n",
"Loading HuggingFace LLaMA models for AI-powered code review:\n",
"\n",
"**Key Features:**\n",
"- **Quantization support** - 4-bit or 8-bit to reduce memory (requires GPU)\n",
"- **Streaming generation** - See tokens appear in real time\n",
"- **Automatic device mapping** - Uses GPU if available, CPU otherwise\n",
"- **Thread-safe streaming** - Uses `TextIteratorStreamer` with generation running in a background thread\n",
"\n",
"**Functions:**\n",
"- `load_model()` - Downloads and initializes the LLaMA model\n",
"- `stream_generate()` - Generates text token-by-token with streaming\n",
"\n",
"**Memory Requirements:**\n",
"- **Without quantization:** ~14GB RAM (7B models) or ~26GB (13B models); the default 1B model is far lighter\n",
"- **With 8-bit:** ~50% reduction (GPU required)\n",
"- **With 4-bit:** ~75% reduction (GPU required)\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "e7de6947",
"metadata": {},
"outputs": [],
"source": [
"# Hugging Face model imports\n",
"try:\n",
"    from transformers import (\n",
"        AutoModelForCausalLM,\n",
"        AutoTokenizer,\n",
"        BitsAndBytesConfig,\n",
"        TextIteratorStreamer,\n",
"        pipeline\n",
"    )\n",
"    import threading\n",
"    TRANSFORMERS_AVAILABLE = True\n",
"except ImportError:\n",
"    TRANSFORMERS_AVAILABLE = False\n",
"\n",
"# Global model state\n",
"MODEL_PIPELINE = None\n",
"TOKENIZER = None\n",
"\n",
"\n",
"# The return annotation is quoted so this cell still runs when transformers is missing\n",
"def get_quantization_config(load_in_4bit: bool, load_in_8bit: bool) -> Optional[\"BitsAndBytesConfig\"]:\n",
"    \"\"\"\n",
"    Create quantization configuration for model loading.\n",
"\n",
"    Args:\n",
"        load_in_4bit: Whether to use 4-bit quantization\n",
"        load_in_8bit: Whether to use 8-bit quantization\n",
"\n",
"    Returns:\n",
"        BitsAndBytesConfig object or None if quantization not requested/available\n",
"\n",
"    Raises:\n",
"        RuntimeError: If quantization is requested but CUDA is not available\n",
"    \"\"\"\n",
"    if not (load_in_4bit or load_in_8bit):\n",
"        return None\n",
"\n",
"    # Check if CUDA is available\n",
"    try:\n",
"        import torch\n",
"        if not torch.cuda.is_available():\n",
"            raise RuntimeError(\n",
"                \"Quantization requires CUDA (NVIDIA GPU).\\n\\n\"\n",
"                \"You are running on CPU/Mac and have requested quantization.\\n\"\n",
"                \"Options:\\n\"\n",
"                \"  1. Disable both 4-bit and 8-bit quantization to run on CPU\\n\"\n",
"                \"     (requires ~26GB RAM for 13B models, ~14GB for 7B models)\\n\"\n",
"                \"  2. Use a GPU with CUDA support\\n\"\n",
"                \"  3. Try smaller models like gpt2 or microsoft/DialoGPT-medium\\n\\n\"\n",
"                \"Note: Quantization significantly reduces memory usage but requires GPU.\"\n",
"            )\n",
"    except ImportError:\n",
"        pass\n",
"\n",
"    try:\n",
"        return BitsAndBytesConfig(load_in_4bit=load_in_4bit, load_in_8bit=load_in_8bit)\n",
"    except Exception as e:\n",
"        raise RuntimeError(f\"Failed to create quantization config: {e}\")\n",
"\n",
"\n",
"def load_model(\n",
"    model_id: str = DEFAULT_MODEL_ID,\n",
"    device_map: str = DEVICE_HINT,\n",
"    load_in_8bit: bool = False,\n",
"    load_in_4bit: bool = False\n",
") -> None:\n",
"    \"\"\"\n",
"    Load a Hugging Face LLaMA-family model for text generation.\n",
"\n",
"    Supports optional 4-bit or 8-bit quantization to reduce memory usage.\n",
"    The model is loaded into the global MODEL_PIPELINE and TOKENIZER variables.\n",
"\n",
"    Args:\n",
"        model_id: Hugging Face model identifier\n",
"        device_map: Device mapping strategy ('auto', 'cpu', 'cuda', etc.)\n",
"        load_in_8bit: Enable 8-bit quantization\n",
"        load_in_4bit: Enable 4-bit quantization\n",
"\n",
"    Raises:\n",
"        ImportError: If transformers is not installed\n",
"        Exception: If model loading fails\n",
"    \"\"\"\n",
"    global MODEL_PIPELINE, TOKENIZER\n",
"\n",
"    if not TRANSFORMERS_AVAILABLE:\n",
"        raise ImportError(\n",
"            \"Transformers library is not installed. \"\n",
"            \"Please run the installation cell and restart the kernel.\"\n",
"        )\n",
"\n",
"    # Use a global variable instead of os.environ to avoid Jupyter ContextVar issues\n",
"    global HUGGINGFACE_TOKEN\n",
"    hf_token = HUGGINGFACE_TOKEN if HUGGINGFACE_TOKEN else None\n",
"\n",
"    if hf_token:\n",
"        print(f\"   Using HuggingFace token: {hf_token[:10]}...{hf_token[-4:]}\")\n",
"    else:\n",
"        print(\"   No HuggingFace token available (may fail for gated models)\")\n",
"\n",
"    # Configure quantization if requested\n",
"    quant_config = get_quantization_config(load_in_4bit, load_in_8bit)\n",
"\n",
"    print(f\"🔄 Loading model: {model_id}\")\n",
"    print(f\"   Device map: {device_map}\")\n",
"    print(f\"   Quantization: 8-bit={load_in_8bit}, 4-bit={load_in_4bit}\")\n",
"    print(\"   This may take 2-5 minutes... please wait...\")\n",
"\n",
"    # Final tqdm patch before model loading (catches any missed imports)\n",
"    _patch_tqdm()\n",
"\n",
"    try:\n",
"        # Suppress transformers warnings\n",
"        from transformers.utils import logging\n",
"        logging.set_verbosity_error()\n",
"\n",
"        TOKENIZER = AutoTokenizer.from_pretrained(\n",
"            model_id,\n",
"            token=hf_token,\n",
"            trust_remote_code=False\n",
"        )\n",
"\n",
"        print(\"   ✓ Tokenizer loaded\")\n",
"\n",
"        # Load model\n",
"        model = AutoModelForCausalLM.from_pretrained(\n",
"            model_id,\n",
"            device_map=device_map,\n",
"            quantization_config=quant_config,\n",
"            token=hf_token,\n",
"            trust_remote_code=False,\n",
"            low_cpu_mem_usage=True\n",
"        )\n",
"\n",
"        print(\"   ✓ Model loaded into memory\")\n",
"\n",
"        # Create pipeline\n",
"        MODEL_PIPELINE = pipeline(\n",
"            \"text-generation\",\n",
"            model=model,\n",
"            tokenizer=TOKENIZER\n",
"        )\n",
"\n",
"        print(\"✅ Model loaded successfully\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"❌ Model loading failed: {e}\")\n",
"        print(\"\\n💡 Troubleshooting:\")\n",
"        print(\"   • Gated models require HuggingFace approval and token\")\n",
"        print(\"   • Large models (13B+) need quantization OR ~26GB+ RAM\")\n",
"        print(\"   • Quantization requires NVIDIA GPU with CUDA\")\n",
"        print(\"\\n💡 Models that work on CPU/Mac (no GPU needed):\")\n",
"        print(\"   • gpt2 (~500MB RAM)\")\n",
"        print(\"   • microsoft/DialoGPT-medium (~1GB RAM)\")\n",
"        print(\"   • meta-llama/Llama-2-7b-chat-hf (~14GB RAM, needs approval)\")\n",
"        print(\"\\nBrowse more models: https://huggingface.co/models?pipeline_tag=text-generation\")\n",
"        MODEL_PIPELINE = None\n",
"        TOKENIZER = None\n",
"        raise\n",
"\n",
"\n",
"def stream_generate(\n",
"    prompt: str,\n",
"    max_new_tokens: int = 256,\n",
"    temperature: float = 0.7\n",
") -> Generator[str, None, None]:\n",
"    \"\"\"\n",
"    Stream generated tokens from the loaded model.\n",
"\n",
"    Uses TextIteratorStreamer for real-time token streaming.\n",
"    Falls back to non-streaming generation if streaming is unavailable.\n",
"\n",
"    Args:\n",
"        prompt: Input text prompt for generation\n",
"        max_new_tokens: Maximum number of tokens to generate\n",
"        temperature: Sampling temperature (0.0 = deterministic, higher = more random)\n",
"\n",
"    Yields:\n",
"        Generated text tokens as they are produced\n",
"    \"\"\"\n",
"    # Validate model is loaded\n",
"    if MODEL_PIPELINE is None:\n",
"        yield \"⚠️ Model not loaded. Please run load_model() first.\"\n",
"        return\n",
"\n",
"    # Validate inputs\n",
"    if not prompt.strip():\n",
"        yield \"⚠️ Empty prompt provided.\"\n",
"        return\n",
"\n",
"    try:\n",
"        # Create streamer\n",
"        streamer = TextIteratorStreamer(\n",
"            MODEL_PIPELINE.tokenizer,\n",
"            skip_prompt=True,\n",
"            skip_special_tokens=True\n",
"        )\n",
"\n",
"        # Prepare generation arguments\n",
"        generation_kwargs = {\n",
"            \"text_inputs\": prompt,\n",
"            \"streamer\": streamer,\n",
"            \"max_new_tokens\": max_new_tokens,\n",
"            \"do_sample\": True,\n",
"            \"temperature\": temperature,\n",
"        }\n",
"\n",
"        # Run generation in a separate thread\n",
"        def generate_in_thread():\n",
"            try:\n",
"                MODEL_PIPELINE(**generation_kwargs)\n",
"            except Exception as e:\n",
"                print(f\"⚠️ Generation error: {e}\")\n",
"\n",
"        thread = threading.Thread(target=generate_in_thread, daemon=True)\n",
"        thread.start()\n",
"\n",
"        # Stream tokens as they're generated\n",
"        for token in streamer:\n",
"            yield token\n",
"\n",
"    except Exception as e:\n",
"        # Fall back to non-streaming generation\n",
"        print(f\"⚠️ Streaming failed ({e}), falling back to non-streaming generation\")\n",
"        try:\n",
"            result = MODEL_PIPELINE(\n",
"                prompt,\n",
"                max_new_tokens=max_new_tokens,\n",
"                do_sample=True,\n",
"                temperature=temperature\n",
"            )\n",
"            yield result[0][\"generated_text\"]\n",
"        except Exception as fallback_error:\n",
"            yield f\"❌ Generation failed: {fallback_error}\""
]
},
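{
"cell_type": "markdown",
"id": "b7d10009",
"metadata": {},
"source": [
"**Optional smoke test (illustrative):** verify `load_model()`/`stream_generate()` with the small, ungated `gpt2` model suggested in the troubleshooting notes (~500 MB download on first run). Skip this cell if you only want the heuristic annotator.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d1000a",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative smoke test (assumption: any small text-generation model works\n",
"# here; gpt2 is the one suggested in the troubleshooting notes above).\n",
"if TRANSFORMERS_AVAILABLE:\n",
"    load_model(\"gpt2\")\n",
"    for token in stream_generate(\"def add(a, b):\", max_new_tokens=16):\n",
"        print(token, end=\"\")"
]
},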
{
"cell_type": "markdown",
"id": "a9b51cd6",
"metadata": {},
"source": [
"## Step 9: File Processing Pipeline\n",
"\n",
"The main orchestration function that ties everything together:\n",
"\n",
"**Workflow:**\n",
"1. **Read file** - Validate size (< 2 MB) and decode to text\n",
"2. **Detect language** - From the file extension\n",
"3. **Analyze code** - Estimate complexity using heuristics\n",
"4. **Annotate** - Insert Big-O comments\n",
"5. **Generate previews** - Create Markdown and HTML views\n",
"6. **Optional AI review** - Send to LLaMA for deeper analysis\n",
"\n",
"**Functions:**\n",
"- `read_file_content()` - Loads and validates uploaded files\n",
"- `create_review_prompt()` - Formats code for LLM analysis\n",
"- `generate_model_analysis()` - Gets AI-powered insights\n",
"- `process_code_file()` - Main orchestrator\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "766f6636",
"metadata": {},
"outputs": [],
"source": [
"def read_file_content(fileobj) -> Tuple[str, str, float]:\n",
"    \"\"\"\n",
"    Read and decode file content from a file-like object.\n",
"\n",
"    Args:\n",
"        fileobj: File-like object (from Gradio upload or file handle)\n",
"\n",
"    Returns:\n",
"        Tuple of (filename, content_text, size_in_mb)\n",
"\n",
"    Raises:\n",
"        ValueError: If file is too large\n",
"    \"\"\"\n",
"    # Get filename, ensuring we have a valid name\n",
"    filename = getattr(fileobj, \"name\", None)\n",
"    if not filename:\n",
"        raise ValueError(\"Uploaded file must have a valid filename with extension\")\n",
"\n",
"    # Read raw content\n",
"    raw = fileobj.read()\n",
"\n",
"    # Decode to text and calculate size\n",
"    if isinstance(raw, bytes):\n",
"        text = raw.decode(\"utf-8\", errors=\"replace\")\n",
"        size_mb = len(raw) / (1024 * 1024)\n",
"    else:\n",
"        text = str(raw)\n",
"        size_mb = len(text.encode(\"utf-8\")) / (1024 * 1024)\n",
"\n",
"    # Validate file size\n",
"    if size_mb > MAX_FILE_SIZE_MB:\n",
"        raise ValueError(\n",
"            f\"File too large: {size_mb:.2f} MB. \"\n",
"            f\"Maximum allowed size is {MAX_FILE_SIZE_MB} MB.\"\n",
"        )\n",
"\n",
"    return filename, text, size_mb\n",
"\n",
"\n",
"def create_review_prompt(code: str, lang: str, max_code_chars: int = 4000) -> str:\n",
"    \"\"\"\n",
"    Create a prompt for LLM code review.\n",
"\n",
"    Args:\n",
"        code: Annotated source code\n",
"        lang: Programming language\n",
"        max_code_chars: Maximum characters to include in prompt\n",
"\n",
"    Returns:\n",
"        Formatted prompt string\n",
"    \"\"\"\n",
"    # Truncate code if necessary to fit token limits\n",
"    code_snippet = code[:max_code_chars]\n",
"    if len(code) > max_code_chars:\n",
"        code_snippet += \"\\n... (code truncated for analysis)\"\n",
"\n",
"    return f\"\"\"You are a senior code reviewer specializing in performance analysis.\n",
"\n",
"Language: {lang}\n",
"\n",
"Task: Analyze the following annotated code and provide:\n",
"1. Validation of the Big-O annotations\n",
"2. Identification of performance bottlenecks\n",
"3. Specific optimization suggestions (max 8 bullet points)\n",
"4. Any algorithmic improvements\n",
"\n",
"--- CODE START ---\n",
"{code_snippet}\n",
"--- CODE END ---\n",
"\n",
"Provide a concise, actionable analysis:\"\"\"\n",
"\n",
"\n",
"def generate_model_analysis(code: str, lang: str, model_params: Dict) -> str:\n",
"    \"\"\"\n",
"    Generate LLM-powered code complexity analysis.\n",
"\n",
"    Args:\n",
"        code: Annotated source code\n",
"        lang: Programming language\n",
"        model_params: Parameters for model generation (max_new_tokens, temperature)\n",
"\n",
"    Returns:\n",
"        Generated analysis text or error message\n",
"    \"\"\"\n",
"    # Check if model is loaded\n",
"    if MODEL_PIPELINE is None:\n",
"        return \"⚠️ **Model not loaded.** Please click '🔄 Load Model' button first before requesting AI analysis.\"\n",
"\n",
"    try:\n",
"        prompt = create_review_prompt(code, lang)\n",
"\n",
"        # Stream and collect tokens\n",
"        tokens = []\n",
"        for token in stream_generate(prompt, **model_params):\n",
"            tokens.append(token)\n",
"\n",
"        result = \"\".join(tokens)\n",
"        return result if result.strip() else \"_(No analysis generated)_\"\n",
"\n",
"    except Exception as e:\n",
"        return f\"⚠️ Model analysis failed: {e}\"\n",
"\n",
"\n",
"def process_code_file(\n",
"    fileobj,\n",
"    ask_model: bool,\n",
"    model_params: Dict\n",
") -> Tuple[str, str, str, str, str]:\n",
"    \"\"\"\n",
"    Process an uploaded code file: detect language, annotate complexity, generate HTML preview.\n",
"\n",
"    This is the main orchestration function that:\n",
"    1. Reads and validates the uploaded file\n",
"    2. Detects the programming language from the extension\n",
"    3. Analyzes and annotates code with Big-O complexity\n",
"    4. Generates Markdown and HTML previews\n",
"    5. Optionally generates an LLM-powered code review\n",
"\n",
"    Args:\n",
"        fileobj: File-like object from Gradio file upload\n",
"        ask_model: Whether to generate LLM analysis\n",
"        model_params: Dict with 'max_new_tokens' and 'temperature' for generation\n",
"\n",
"    Returns:\n",
"        Tuple of (language, annotated_code, markdown_preview, html_preview, model_commentary)\n",
"\n",
"    Raises:\n",
"        ValueError: If file is invalid, too large, or has an unsupported extension\n",
"    \"\"\"\n",
"    # Step 1: Read and validate file\n",
"    filename, code_text, file_size_mb = read_file_content(fileobj)\n",
"\n",
"    print(f\"📄 Processing: {filename} ({file_size_mb:.2f} MB)\")\n",
"\n",
"    # Step 2: Detect language from file extension\n",
"    lang = detect_language(filename)\n",
"\n",
"    print(f\"🔍 Detected language: {lang}\")\n",
"\n",
"    # Step 3: Analyze and annotate code\n",
"    annotated_code = insert_annotations(code_text, lang)\n",
"\n",
"    # Step 4: Generate preview formats\n",
"    markdown_preview = to_markdown(annotated_code, lang)\n",
"    html_preview = highlighted_html(annotated_code, lang)\n",
"\n",
"    # Step 5: Optionally generate model analysis\n",
"    model_commentary = \"\"\n",
"    if ask_model:\n",
"        print(\"🤖 Generating model analysis...\")\n",
"        model_commentary = generate_model_analysis(annotated_code, lang, model_params)\n",
"\n",
"    return lang, annotated_code, markdown_preview, html_preview, model_commentary"
]
},
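{
"cell_type": "markdown",
"id": "b7d1000b",
"metadata": {},
"source": [
"**Quick check (illustrative):** an end-to-end run of the pipeline without the UI or the model. The sample file is written to a temporary path purely for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d1000c",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative end-to-end run of process_code_file() without the UI:\n",
"# write a made-up sample to a temp .py file, then process it with the\n",
"# AI review disabled (so no model needs to be loaded).\n",
"import tempfile\n",
"\n",
"sample_src = '''def find_max(items):\n",
"    best = items[0]\n",
"    for value in items:\n",
"        if value > best:\n",
"            best = value\n",
"    return best\n",
"'''\n",
"with tempfile.NamedTemporaryFile(mode=\"w\", suffix=\".py\", delete=False) as tmp:\n",
"    tmp.write(sample_src)\n",
"    path = tmp.name\n",
"\n",
"with open(path, \"rb\") as f:\n",
"    lang, annotated, md, html, _ = process_code_file(f, ask_model=False, model_params={})\n",
"print(annotated)"
]
},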
{
"cell_type": "markdown",
"id": "6060f778",
"metadata": {},
"source": [
"## Step 10: Build the Gradio Interface\n",
"\n",
"Creating a professional two-column UI with:\n",
"\n",
"**Left Column (Input):**\n",
"- File uploader (filters to code files only)\n",
"- AI review toggle\n",
"- Model configuration (ID, quantization options)\n",
"- Temperature and max tokens sliders\n",
"- Load model & process buttons\n",
"\n",
"**Right Column (Output):**\n",
"- Detected language display\n",
"- Syntax-highlighted code preview (orange comments!)\n",
"- AI code review (if enabled)\n",
"- Download buttons for annotated code + Markdown\n",
"\n",
"**Event Handlers:**\n",
"- `handle_model_loading()` - Shows live progress during model download\n",
"- `handle_file_processing()` - Processes uploaded files and updates all outputs\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "85691712",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Gradio Blocks instance: 2 backend functions\n",
"-------------------------------------------\n",
"fn_index=0\n",
" inputs:\n",
" |-<gradio.components.textbox.Textbox object at 0x14292bbf0>\n",
" |-<gradio.components.checkbox.Checkbox object at 0x142d06990>\n",
" |-<gradio.components.checkbox.Checkbox object at 0x142913380>\n",
" outputs:\n",
" |-<gradio.components.markdown.Markdown object at 0x142d5b4d0>\n",
"fn_index=1\n",
" inputs:\n",
" |-<gradio.components.file.File object at 0x142913d70>\n",
" |-<gradio.components.checkbox.Checkbox object at 0x1400ab080>\n",
" |-<gradio.components.slider.Slider object at 0x14264f3e0>\n",
" |-<gradio.components.slider.Slider object at 0x112bd9c40>\n",
" outputs:\n",
" |-<gradio.components.textbox.Textbox object at 0x1425ea420>\n",
" |-<gradio.components.html.HTML object at 0x1426dcec0>\n",
" |-<gradio.components.markdown.Markdown object at 0x142d4bc80>\n",
" |-<gradio.components.file.File object at 0x1423232c0>\n",
" |-<gradio.components.file.File object at 0x1422f2750>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Gradio UI imports\n",
"try:\n",
"    import gradio as gr\n",
"    GRADIO_AVAILABLE = True\n",
"except ImportError:\n",
"    GRADIO_AVAILABLE = False\n",
"\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def get_file_extension_for_language(lang: str) -> str:\n",
"    \"\"\"\n",
"    Get the primary file extension for a given language.\n",
"\n",
"    Args:\n",
"        lang: Language identifier\n",
"\n",
"    Returns:\n",
"        File extension with dot (e.g., '.py', '.js')\n",
"    \"\"\"\n",
"    # Create reverse mapping from language to primary extension\n",
"    lang_to_ext = {\n",
"        \"python\": \".py\",\n",
"        \"javascript\": \".js\",\n",
"        \"typescript\": \".ts\",\n",
"        \"java\": \".java\",\n",
"        \"c\": \".c\",\n",
"        \"cpp\": \".cpp\",\n",
"        \"csharp\": \".cs\",\n",
"        \"go\": \".go\",\n",
"        \"php\": \".php\",\n",
"        \"swift\": \".swift\",\n",
"        \"ruby\": \".rb\",\n",
"        \"kotlin\": \".kt\",\n",
"        \"rust\": \".rs\",\n",
"    }\n",
"    return lang_to_ext.get(lang, \".txt\")\n",
"\n",
"\n",
"def save_outputs_to_temp(annotated_code: str, markdown: str, lang: str) -> Tuple[str, str]:\n",
"    \"\"\"\n",
"    Save annotated code and markdown to temporary files for download.\n",
"\n",
"    Args:\n",
"        annotated_code: Annotated source code\n",
"        markdown: Markdown preview\n",
"        lang: Programming language\n",
"\n",
"    Returns:\n",
"        Tuple of (source_file_path, markdown_file_path)\n",
"    \"\"\"\n",
"    # Get appropriate extension\n",
"    ext = get_file_extension_for_language(lang)\n",
"\n",
"    # Create temporary files\n",
"    source_file = tempfile.NamedTemporaryFile(\n",
"        mode='w',\n",
"        suffix=ext,\n",
"        prefix='annotated_',\n",
"        delete=False,\n",
"        encoding='utf-8'\n",
"    )\n",
"\n",
"    markdown_file = tempfile.NamedTemporaryFile(\n",
"        mode='w',\n",
"        suffix='.md',\n",
"        prefix='annotated_',\n",
"        delete=False,\n",
"        encoding='utf-8'\n",
"    )\n",
"\n",
"    # Write content\n",
"    source_file.write(annotated_code)\n",
"    source_file.close()\n",
"\n",
"    markdown_file.write(markdown)\n",
"    markdown_file.close()\n",
"\n",
"    return source_file.name, markdown_file.name\n",
"\n",
"\n",
"def handle_model_loading(model_id: str, load_in_8bit: bool, load_in_4bit: bool):\n",
"    \"\"\"\n",
"    Handle model loading with error handling and live progress updates for the Gradio UI.\n",
"    Yields status updates with elapsed time.\n",
"\n",
"    Args:\n",
"        model_id: Hugging Face model identifier\n",
"        load_in_8bit: Whether to use 8-bit quantization\n",
"        load_in_4bit: Whether to use 4-bit quantization\n",
"\n",
"    Yields:\n",
"        Status message updates with progress\n",
"    \"\"\"\n",
"    import threading\n",
"    import time\n",
"\n",
"    # Immediate status update - clears old text\n",
"    yield \"🔄 **Step 1/4:** Initializing... (0s elapsed)\"\n",
"\n",
"    print(f\"\\n{'='*60}\")\n",
"    print(f\"🔄 Starting model load: {model_id.strip()}\")\n",
"    print(f\"{'='*60}\\n\")\n",
"\n",
"    start_time = time.time()\n",
"    loading_complete = False\n",
"    error_message = None\n",
"\n",
"    # Function to load the model in a background thread\n",
"    def load_in_background():\n",
"        nonlocal loading_complete, error_message\n",
"        try:\n",
"            load_model(\n",
"                model_id.strip(),\n",
"                device_map=DEVICE_HINT,\n",
"                load_in_8bit=load_in_8bit,\n",
"                load_in_4bit=load_in_4bit\n",
"            )\n",
"            loading_complete = True\n",
"        except Exception as e:\n",
"            error_message = str(e)\n",
"            loading_complete = True\n",
"\n",
"    # Start loading in background thread\n",
"    thread = threading.Thread(target=load_in_background, daemon=True)\n",
"    thread.start()\n",
"\n",
"    # Progress stages with approximate timing\n",
"    stages = [\n",
"        (0, \"🔄 **Step 1/4:** Connecting to HuggingFace...\"),\n",
"        (5, \"🔄 **Step 2/4:** Downloading tokenizer...\"),\n",
"        (15, \"🔄 **Step 3/4:** Loading model weights...\"),\n",
"        (30, \"🔄 **Step 4/4:** Finalizing model setup...\"),\n",
"    ]\n",
"\n",
"    stage_idx = 0\n",
"    last_update = time.time()\n",
"\n",
"    # Show progress updates while loading\n",
"    while not loading_complete:\n",
"        elapsed = int(time.time() - start_time)\n",
"\n",
"        # Move to next stage if enough time passed\n",
"        if stage_idx < len(stages) - 1 and elapsed >= stages[stage_idx + 1][0]:\n",
"            stage_idx += 1\n",
"\n",
"        # Update every 2 seconds\n",
"        if time.time() - last_update >= 2:\n",
"            current_stage = stages[stage_idx][1]\n",
"            yield f\"{current_stage} ({elapsed}s elapsed)\"\n",
"            last_update = time.time()\n",
"\n",
"        time.sleep(0.5)  # Check every 0.5 seconds\n",
"\n",
"    # Final result\n",
"    elapsed = int(time.time() - start_time)\n",
"    if error_message:\n",
"        yield f\"❌ **Model loading failed** ({elapsed}s elapsed)\\n\\n{error_message}\"\n",
"    else:\n",
"        yield f\"✅ **Model loaded successfully!** ({elapsed}s total)\"\n",
"\n",
"\n",
"def handle_file_processing(\n",
"    file,\n",
"    ask_model_flag: bool,\n",
"    temperature: float,\n",
"    max_new_tokens: int\n",
") -> Tuple[str, str, str, Optional[str], Optional[str]]:\n",
"    \"\"\"\n",
"    Handle the file processing workflow for the Gradio UI.\n",
"\n",
"    Args:\n",
"        file: Gradio file upload object\n",
"        ask_model_flag: Whether to generate model commentary\n",
"        temperature: Generation temperature\n",
"        max_new_tokens: Max tokens to generate\n",
"\n",
"    Returns:\n",
"        Tuple of (language, html_preview, model_commentary, source_path, markdown_path)\n",
"    \"\"\"\n",
"    # Validate file upload\n",
"    if file is None:\n",
"        return \"\", \"<i>⚠️ Please upload a code file.</i>\", \"\", None, None\n",
"\n",
"    # Check if model is required but not loaded\n",
"    if ask_model_flag and MODEL_PIPELINE is None:\n",
"        return \"\", \"\", \"⚠️ **Model not loaded.** Please click '🔄 Load Model' button first before requesting AI analysis.\", None, None\n",
"\n",
"    try:\n",
"        # Gradio provides the file as a path string or file object\n",
"        if isinstance(file, str):\n",
"            file_path = file\n",
"        elif hasattr(file, 'name'):\n",
"            file_path = file.name\n",
"        else:\n",
"            return \"\", \"<pre>❌ Invalid file upload format</pre>\", \"\", None, None\n",
"\n",
"        # Open and process the file\n",
"        with open(file_path, 'rb') as f:\n",
"            # Prepare model parameters\n",
"            model_params = {\n",
"                \"max_new_tokens\": int(max_new_tokens),\n",
"                \"temperature\": float(temperature)\n",
"            }\n",
"\n",
"            # Process the code file\n",
"            lang, annotated_code, markdown_preview, html_preview, model_commentary = process_code_file(\n",
"                f,\n",
"                ask_model_flag,\n",
"                model_params\n",
"            )\n",
"\n",
"            # Save outputs to temporary files for download\n",
"            source_path, markdown_path = save_outputs_to_temp(annotated_code, markdown_preview, lang)\n",
"\n",
"            # Format model commentary\n",
"            commentary_display = model_commentary if model_commentary else \"_(No model analysis generated)_\"\n",
"\n",
"            return lang, html_preview, commentary_display, source_path, markdown_path\n",
"\n",
"    except ValueError as e:\n",
"        # User-facing errors (file too large, unsupported extension, etc.)\n",
"        return \"\", f\"<pre>⚠️ {str(e)}</pre>\", \"\", None, None\n",
"    except Exception as e:\n",
"        # Unexpected errors\n",
"        import traceback\n",
"        error_detail = traceback.format_exc()\n",
"        print(f\"Error processing file: {error_detail}\")\n",
"        return \"\", f\"<pre>❌ Processing failed: {str(e)}</pre>\", \"\", None, None\n",
"\n",
"\n",
"def build_ui():\n",
"    \"\"\"\n",
"    Build the Gradio user interface for the Code Complexity Annotator.\n",
"\n",
"    Returns:\n",
"        Gradio Blocks interface\n",
"    \"\"\"\n",
"    if not GRADIO_AVAILABLE:\n",
"        raise ImportError(\n",
"            \"Gradio is not installed. Please run the installation cell \"\n",
"            \"and restart the kernel.\"\n",
"        )\n",
"\n",
"    # Custom CSS for better UI\n",
"    custom_css = \"\"\"\n",
"    footer {visibility: hidden}\n",
"    .gradio-container {font-family: 'Inter', sans-serif}\n",
"    \"\"\"\n",
"\n",
"    with gr.Blocks(css=custom_css, title=\"Code Complexity Annotator\") as demo:\n",
"        # Header\n",
"        gr.Markdown(\"# 🔶 Multi-Language Code Complexity Annotator\")\n",
"        gr.Markdown(\n",
"            \"Upload code → Detect language → Auto-annotate with Big-O complexity → \"\n",
"            \"Preview with syntax highlighting → Download results. \"\n",
"            \"Optional: Get AI-powered code review from LLaMA.\"\n",
"        )\n",
"\n",
"        with gr.Row():\n",
"            # Left column: Input controls\n",
"            with gr.Column(scale=2):\n",
"                gr.Markdown(\"### 📤 Upload & Settings\")\n",
"\n",
"                file_upload = gr.File(\n",
"                    label=\"Upload Code File\",\n",
"                    file_count=\"single\",\n",
"                    file_types=[ext for ext in SUPPORTED_EXTENSIONS.keys()]\n",
"                )\n",
"\n",
"                ask_model = gr.Checkbox(\n",
"                    label=\"🤖 Generate AI Code Review\",\n",
"                    value=True,\n",
"                    info=\"⚠️ Requires model to be loaded first using the button below\"\n",
"                )\n",
"\n",
"                gr.Markdown(\"### 🧠 Model Configuration\")\n",
"\n",
"                model_id = gr.Textbox(\n",
"                    label=\"Hugging Face Model ID\",\n",
"                    value=DEFAULT_MODEL_ID,\n",
"                    placeholder=\"meta-llama/Llama-3.2-1B\"\n",
"                )\n",
"\n",
"                with gr.Row():\n",
"                    load_8bit = gr.Checkbox(\n",
"                        label=\"8-bit Quantization\",\n",
"                        value=False,\n",
"                        info=\"⚠️ Requires CUDA/GPU (reduces memory by ~50%)\"\n",
"                    )\n",
"                    load_4bit = gr.Checkbox(\n",
"                        label=\"4-bit Quantization\",\n",
"                        value=False,\n",
"                        info=\"⚠️ Requires CUDA/GPU (reduces memory by ~75%, lower quality)\"\n",
"                    )\n",
"\n",
"                temperature = gr.Slider(\n",
"                    label=\"Temperature\",\n",
"                    minimum=0.0,\n",
"                    maximum=1.5,\n",
"                    value=0.7,\n",
"                    step=0.05,\n",
"                    info=\"Lower = more deterministic, Higher = more creative\"\n",
"                )\n",
"\n",
"                max_tokens = gr.Slider(\n",
"                    label=\"Max New Tokens\",\n",
"                    minimum=16,\n",
"                    maximum=1024,\n",
"                    value=256,\n",
"                    step=16,\n",
"                    info=\"Maximum length of generated review\"\n",
"                )\n",
"\n",
"                with gr.Row():\n",
"                    load_model_btn = gr.Button(\"🔄 Load Model\", variant=\"secondary\")\n",
|
|
" process_btn = gr.Button(\"🚀 Process & Annotate\", variant=\"primary\")\n",
|
|
" \n",
|
|
" model_status = gr.Markdown(\"⚪ **Status:** Model not loaded\")\n",
|
|
" \n",
|
|
" # Right column: Output displays\n",
|
|
" with gr.Column(scale=3):\n",
|
|
" gr.Markdown(\"### 📊 Results\")\n",
|
|
" \n",
|
|
" detected_lang = gr.Textbox(\n",
|
|
" label=\"Detected Language\",\n",
|
|
" interactive=False,\n",
|
|
" placeholder=\"Upload a file to detect language\"\n",
|
|
" )\n",
|
|
" \n",
|
|
" html_preview = gr.HTML(\n",
|
|
" label=\"Code Preview (Orange = Complexity Annotations)\",\n",
|
|
" value=\"<i>Upload and process a file to see preview...</i>\"\n",
|
|
" )\n",
|
|
" \n",
|
|
" model_output = gr.Markdown(\n",
|
|
" label=\"🤖 AI Code Review\",\n",
|
|
" value=\"*Enable 'Generate AI Code Review' and process a file to see analysis...*\"\n",
|
|
" )\n",
|
|
" \n",
|
|
" gr.Markdown(\"### 💾 Downloads\")\n",
|
|
" \n",
|
|
" with gr.Row():\n",
|
|
" download_source = gr.File(\n",
|
|
" label=\"Annotated Source Code\",\n",
|
|
" interactive=False\n",
|
|
" )\n",
|
|
" download_markdown = gr.File(\n",
|
|
" label=\"Markdown Preview\",\n",
|
|
" interactive=False\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Event handlers\n",
|
|
" load_model_btn.click(\n",
|
|
" fn=handle_model_loading,\n",
|
|
" inputs=[model_id, load_8bit, load_4bit],\n",
|
|
" outputs=[model_status],\n",
|
|
" show_progress=\"full\" # Show clear loading indicator\n",
|
|
" )\n",
|
|
" \n",
|
|
" process_btn.click(\n",
|
|
" fn=handle_file_processing,\n",
|
|
" inputs=[file_upload, ask_model, temperature, max_tokens],\n",
|
|
" outputs=[detected_lang, html_preview, model_output, download_source, download_markdown]\n",
|
|
" )\n",
|
|
" \n",
|
|
" return demo\n",
|
|
"\n",
|
|
"\n",
|
|
"# Build and display the interface\n",
|
|
"demo = build_ui()\n",
|
|
"demo"
|
|
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3608ab3c",
   "metadata": {},
   "source": [
    "## Step 11: Launch the App\n",
    "\n",
    "Starting the Gradio server with auto-browser launch.\n",
    "\n",
    "**Options:**\n",
    "- `share=False` - Local only (set to `True` for a public Gradio link)\n",
    "- `inbrowser=True` - Automatically opens in your default browser\n",
    "- `show_error=True` - Displays detailed error messages in the UI\n",
    "\n",
"The app will be available at: `http://127.0.0.1:7861`\n",
|
|
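    "\n",
    "If you need a fixed port, `launch()` also accepts `server_port` (a minimal sketch; the launch cell below sticks with the port defaults):\n",
    "\n",
    "```python\n",
    "# Pin the UI to a specific port instead of letting Gradio auto-select one\n",
    "demo.launch(server_port=7861, share=False, inbrowser=True, show_error=True)\n",
    "```\n",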
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"## 💡 How to Use\n",
|
|
"\n",
|
|
"### Without AI Review (No Model Needed):\n",
|
|
"1. **Upload** a code file (.py, .js, .java, etc.)\n",
|
|
"2. **Uncheck** \"Generate AI Code Review\"\n",
|
|
"3. **Click** \"🚀 Process & Annotate\"\n",
|
|
"4. **View** syntax-highlighted code with Big-O annotations\n",
|
|
"5. **Download** the annotated source + Markdown\n",
|
|
"\n",
|
|
"### With AI Review (Requires Model):\n",
|
|
"1. **Click** \"🔄 Load Model\" (wait 2-5 minutes for first download)\n",
|
|
"2. **Upload** your code file\n",
|
|
"3. **Check** \"Generate AI Code Review\"\n",
|
|
"4. **Adjust** temperature/tokens if needed\n",
|
|
"5. **Click** \"🚀 Process & Annotate\"\n",
|
|
"6. **Read** AI-generated optimization suggestions\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"## 🎯 Supported Languages\n",
|
|
"\n",
|
|
"Python • JavaScript • TypeScript • Java • C • C++ • C# • Go • PHP • Swift • Ruby • Kotlin • Rust\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"## 🧠 Model Options\n",
|
|
"\n",
|
|
"**Recommended for CPU/Mac:**\n",
|
|
"- `meta-llama/Llama-3.2-1B` (Default, ~1GB, requires HF approval)\n",
|
|
"- `gpt2` (No approval needed, ~500MB)\n",
|
|
"- `microsoft/DialoGPT-medium` (~1GB)\n",
|
|
"\n",
|
|
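    "A minimal sketch for sanity-checking one of these CPU-friendly models outside the app (assumes only the `transformers` + `torch` stack from Step 1; `gpt2` is used because it needs no approval):\n",
    "\n",
    "```python\n",
    "from transformers import pipeline\n",
    "\n",
    "# Small, ungated model: verifies that downloads, caching, and generation all work\n",
    "pipe = pipeline(\"text-generation\", model=\"gpt2\")\n",
    "out = pipe(\"def binary_search(arr, target):\", max_new_tokens=32)\n",
    "print(out[0][\"generated_text\"])\n",
    "```\n",
    "\n",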
"**For GPU users:**\n",
|
|
"- Any model with 8-bit or 4-bit quantization enabled\n",
|
|
"- `meta-llama/Llama-2-7b-chat-hf` (requires approval)\n",
|
|
"\n",
|
|
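    "Quantized loading follows the standard `BitsAndBytesConfig` pattern, which the in-app checkboxes are intended to mirror (a sketch, assuming a CUDA GPU and the `bitsandbytes` package installed in Step 1):\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from transformers import AutoModelForCausalLM, BitsAndBytesConfig\n",
    "\n",
    "# 4-bit weights with fp16 compute: roughly 75% memory savings\n",
    "bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"meta-llama/Llama-2-7b-chat-hf\",  # any causal LM works here\n",
    "    quantization_config=bnb_config,\n",
    "    device_map=\"auto\",\n",
    ")\n",
    "```\n",
    "\n",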
"---\n",
|
|
"\n",
|
|
"**Note:** First model load downloads weights (~1-14GB depending on model). Subsequent runs load from cache.\n"
|
|
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "eec78f72",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* Running on local URL: http://127.0.0.1:7861\n",
      "* To create a public link, set `share=True` in `launch()`.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://127.0.0.1:7861/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "📄 Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/d8fe7d241f82ae93c8cf07e99823e6db91d20185c411ded7454eb7a0d89174a4/3 Simple Python Functions with Different Time Complexities.py (0.00 MB)\n",
      "🔍 Detected language: python\n",
      "📄 Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/a2b7a4fdfb5e5f657878a74459fd8d68e30fc0afdfb6e5627aab99cf8552011d/Simple Python Functions with Different Time Complexities.py (0.00 MB)\n",
      "🔍 Detected language: python\n",
      "📄 Processing: /private/var/folders/jq/vdvn5cg53sj2xsq1w_0wjjc80000gn/T/gradio/4dad1dc092f0232b348a683e42414de456c388b3e21d93ee820b8e7bc4a2aa47/Python Function.py (0.00 MB)\n",
      "🔍 Detected language: python\n"
     ]
    }
   ],
   "source": [
    "# Launch the Gradio interface\n",
    "demo.launch(\n",
    "    share=False,      # Set to True to create a public shareable link\n",
    "    inbrowser=True,   # Automatically open in browser\n",
    "    show_error=True   # Show detailed errors in UI\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}