🔶 Multi-Language Code Complexity Annotator
An automated tool that analyzes source code and annotates it with Big-O complexity estimates, complete with syntax highlighting and optional AI-powered code reviews.
🎯 What It Does
Understanding time complexity (Big-O notation) is crucial for writing efficient algorithms, identifying bottlenecks, making informed optimization decisions, and passing technical interviews.
Analyzing complexity manually is tedious and error-prone. This tool automates the entire process—detecting loops, recursion, and functions, then annotating code with Big-O estimates and explanations.
Core Features
- 📊 Automatic Detection - Identifies loops, recursion, and functions across 13+ programming languages
- 🧮 Complexity Estimation - Calculates Big-O complexity (O(1), O(n), O(n²), O(log n), etc.)
- 💬 Inline Annotations - Inserts explanatory comments directly into your code
- 🎨 Syntax Highlighting - Generates beautiful HTML previews with orange-colored complexity comments
- 🤖 AI Code Review - Optional LLaMA-powered analysis for optimization suggestions
- 💾 Export Options - Download annotated source code and Markdown previews
🌐 Supported Languages
Python • JavaScript • TypeScript • Java • C • C++ • C# • Go • PHP • Swift • Ruby • Kotlin • Rust
🛠️ Tech Stack
- HuggingFace Transformers - LLM model loading and inference
- LLaMA 3.2 - AI-powered code review
- Gradio - Interactive web interface
- Pygments - Syntax highlighting
- PyTorch - Deep learning framework
- Regex Analysis - Heuristic complexity detection
📋 Prerequisites
- Python 3.12+
- `uv` package manager (or `pip`)
- 4GB+ RAM (for basic use without AI)
- 14GB+ RAM (for AI code review with LLaMA models)
- Optional: NVIDIA GPU with CUDA (for model quantization)
🚀 Installation
1. Clone the Repository
```bash
cd week4
```
2. Install Dependencies
```bash
uv pip install -U pip
uv pip install transformers accelerate gradio torch --extra-index-url https://download.pytorch.org/whl/cpu
uv pip install bitsandbytes pygments python-dotenv
```
Note: This installs the CPU-only version of PyTorch. For GPU support, remove the `--extra-index-url` flag.
3. Set Up HuggingFace Token (Optional - for AI Features)
Create a `.env` file in the `week4` directory:
```
HF_TOKEN=hf_your_token_here
```
Get your token at: https://huggingface.co/settings/tokens
Required for: LLaMA models (you must first accept Meta's license agreement)
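For reference, a minimal sketch of how the token can be read with `python-dotenv` (the variable name `hf_token` is illustrative):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN from the .env file in the working directory
hf_token = os.getenv("HF_TOKEN")
if not hf_token:
    print("HF_TOKEN is not set; AI review features will be unavailable.")
```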
💡 Usage
Option 1: Jupyter Notebook
Open and run `week4 EXERCISE_hopeogbons.ipynb`:
```bash
jupyter notebook "week4 EXERCISE_hopeogbons.ipynb"
```
Run all cells in order. The Gradio interface will launch at http://127.0.0.1:7861
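For context, a stripped-down sketch of how a Gradio app binds to that address (the real notebook wires up file upload, checkboxes, and sliders; `annotate` here is a placeholder):

```python
import gradio as gr

def annotate(file_path: str) -> str:
    # Placeholder: the real app annotates and syntax-highlights the code
    with open(file_path, "r", encoding="utf-8") as f:
        return f"<pre>{f.read()}</pre>"

demo = gr.Interface(fn=annotate, inputs=gr.File(type="filepath"), outputs=gr.HTML())
demo.launch(server_port=7861)  # serves at http://127.0.0.1:7861
```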
Option 2: Web Interface
Once the Gradio app is running:
Without AI Review (No Model Needed)
1. Upload a code file (`.py`, `.js`, `.java`, etc.)
2. Uncheck "Generate AI Code Review"
3. Click "🚀 Process & Annotate"
4. View syntax-highlighted code with Big-O annotations
5. Download the annotated source + Markdown
With AI Review (Requires Model)
- Click "🔄 Load Model" (wait 2-5 minutes for first download)
- Upload your code file
- Check "Generate AI Code Review"
- Adjust temperature/tokens if needed
- Click "🚀 Process & Annotate"
- Read AI-generated optimization suggestions
📊 How It Works
Complexity Detection Algorithm
The tool uses heuristic pattern matching to estimate Big-O complexity (a simplified sketch follows the steps below):

1. Detect Blocks - Regex patterns find functions, loops, and recursion
2. Analyze Loops - Count nesting depth:
   - 1 loop = O(n)
   - 2 nested loops = O(n²)
   - 3 nested loops = O(n³)
3. Analyze Recursion - Pattern detection:
   - Divide-and-conquer (binary search) = O(log n)
   - Single recursive call = O(n)
   - Multiple recursive calls = O(2^n)
4. Aggregate - Functions inherit the worst-case complexity of their inner operations
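A simplified sketch of the loop-nesting heuristic (illustrative only; the notebook's actual regexes cover all 13 languages and also detect recursion):

```python
import re

# Match a Python-style loop header and capture its indentation.
LOOP_RE = re.compile(r"^(\s*)(for|while)\b")

def estimate_loop_complexity(source: str) -> str:
    open_loops = []   # indentation levels of loops still in scope
    max_nesting = 0
    for line in source.splitlines():
        m = LOOP_RE.match(line)
        if not m:
            continue
        indent = len(m.group(1))
        # A loop at the same or shallower indentation closes deeper loops
        while open_loops and open_loops[-1] >= indent:
            open_loops.pop()
        open_loops.append(indent)
        max_nesting = max(max_nesting, len(open_loops))
    if max_nesting == 0:
        return "O(1)"
    if max_nesting == 1:
        return "O(n)"
    return f"O(n^{max_nesting})"
```

Applied to the bubble sort below, this returns `O(n^2)`.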
Example Output
Input (Python):
```python
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(len(arr) - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
```
Output (Annotated):
```python
def bubble_sort(arr):
    # Big-O: O(n^2)
    # Explanation: Nested loops indicate quadratic time.
    for i in range(len(arr)):
        for j in range(len(arr) - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
```
🧠 AI Model Options
CPU/Mac (No GPU)
- `meta-llama/Llama-3.2-1B` (Default, ~1GB, requires HF approval)
- `gpt2` (no approval needed, ~500MB)
- `microsoft/DialoGPT-medium` (~1GB)
GPU Users
- Any model with 8-bit or 4-bit quantization enabled
- `meta-llama/Llama-2-7b-chat-hf` (requires approval)
Memory Requirements
- Without quantization: ~14GB RAM (7B models) or ~26GB (13B models)
- With 8-bit quantization: ~50% reduction (GPU required)
- With 4-bit quantization: ~75% reduction (GPU required)
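As an illustration, 4-bit loading through Transformers and `bitsandbytes` generally looks like the sketch below (the model ID and options are examples, not necessarily the notebook's exact settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B"  # example model; requires HF approval

# 4-bit quantization config (needs a CUDA GPU with bitsandbytes installed)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```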
⚙️ Configuration
File Limits
- Max file size: 2 MB
- Supported extensions:
`.py`, `.js`, `.ts`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.php`, `.swift`, `.rb`, `.kt`, `.rs`
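A hypothetical validation helper enforcing both limits (the function and constant names are illustrative, not the notebook's actual code):

```python
from pathlib import Path

SUPPORTED_EXTS = {".py", ".js", ".ts", ".java", ".c", ".cpp", ".cs",
                  ".go", ".php", ".swift", ".rb", ".kt", ".rs"}
MAX_BYTES = 2 * 1024 * 1024  # 2 MB limit

def validate_upload(path: str) -> None:
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTS:
        raise ValueError(f"Unsupported file extension: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("File exceeds the 2 MB limit")
```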
Model Parameters
- Temperature (0.0 - 1.5): Controls randomness
  - Lower = more deterministic
  - Higher = more creative
- Max Tokens (16 - 1024): Maximum length of the AI review
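These sliders map onto standard Transformers generation arguments; a minimal sketch assuming `model` and `tokenizer` are already loaded (the prompt text is illustrative):

```python
prompt = ("Review this Python function for optimization opportunities:\n"
          "def f(xs):\n    return sorted(xs)")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # the Max Tokens slider
    temperature=0.7,      # the Temperature slider
    do_sample=True,       # sampling must be enabled for temperature to apply
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```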
📁 Project Structure
```
week4/
├── week4 EXERCISE_hopeogbons.ipynb   # Main application notebook
├── README.md                         # This file
└── .env                              # HuggingFace token (create this)
```
🐛 Troubleshooting
Model Loading Issues
Error: "Model not found" or "Access denied"
- Solution: Accept Meta's license at https://huggingface.co/meta-llama/Llama-3.2-1B
- Ensure your `.env` file contains a valid `HF_TOKEN`
Memory Issues
Error: "Out of memory" during model loading
- Solution: Use a smaller model like `gpt2` or `microsoft/DialoGPT-medium`
- Try 8-bit or 4-bit quantization (GPU required)
Quantization Requires GPU
Error: "Quantization requires CUDA"
- Solution: Disable both 4-bit and 8-bit quantization checkboxes
- Run on CPU with smaller models
File Upload Issues
Error: "Unsupported file extension"
- Solution: Ensure your file has one of the supported extensions
- Check that the file size is under 2MB
🎓 Use Cases
- Code Review - Automated complexity analysis for pull requests
- Interview Prep - Understand algorithm efficiency before coding interviews
- Performance Optimization - Identify bottlenecks in existing code
- Education - Learn Big-O notation through practical examples
- Documentation - Auto-generate complexity documentation
📝 Notes
- First model load downloads weights (~1-14GB depending on model)
- Subsequent runs load from cache (much faster)
- Complexity estimates are heuristic-based, not formally verified
- For production use, consider manual verification of critical algorithms
🤝 Contributing
This is a learning project from the Andela LLM Engineering course (Week 4). Feel free to extend it with:
- Additional language support
- More sophisticated complexity detection
- Integration with CI/CD pipelines
- Support for space complexity analysis
📄 License
Educational project - use as reference for learning purposes.
🙏 Acknowledgments
- OpenAI Whisper for inspiration on model integration
- HuggingFace for providing the Transformers library
- Meta for LLaMA models
- Gradio for the excellent UI framework
- Andela for the LLM Engineering curriculum
Built with ❤️ as part of Week 4 LLM Engineering coursework