🥊 Summarization Battle: Ollama vs. OpenAI Judge

This mini-project pits multiple local LLMs (via Ollama) against each other in a web summarization contest, with an OpenAI model serving as the impartial judge.
It automatically fetches web articles, summarizes them with several models, and evaluates the results on coverage, faithfulness, clarity, and conciseness.


🚀 Features

  • Fetch Articles: Download and clean text content from given URLs.
  • Summarize with Ollama: Run multiple local models (e.g., llama3.2, phi3, deepseek-r1) via the Ollama API.
  • Judge with OpenAI: Use gpt-4o-mini (or any other OpenAI model) to score summaries.
  • Battle Results: Collect JSON results with per-model scores, rationales, and winners.
  • Timeout Handling & Warmup: Keep models loaded with keep_alive to avoid cold-start delays.
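
The warmup/summarization call can be sketched against Ollama's HTTP API (a sketch, assuming Ollama is serving on its default port 11434; `build_payload` and `summarize` are hypothetical helper names, not necessarily what main.py uses):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str, keep_alive: str = "10m") -> dict:
    """Request body for /api/generate; keep_alive keeps the model loaded in memory."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,        # return one JSON object instead of a token stream
        "keep_alive": keep_alive,
    }

def summarize(model: str, article: str, timeout: float = 120.0) -> str:
    """Ask one local model for a summary; raises on HTTP errors or timeout."""
    prompt = f"Summarize the following article in a short paragraph:\n\n{article}"
    resp = requests.post(OLLAMA_URL, json=build_payload(model, prompt), timeout=timeout)
    resp.raise_for_status()
    return resp.json()["response"]
```

Sending an empty prompt with the same payload shape is a cheap way to warm a model up before the timed battle runs.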

📂 Project Structure

.
├── urls.txt              # JSON dictionary of categories → URLs
├── battle_results.json   # Summarization + judging results
├── main.py               # Main script
├── requirements.txt      # Dependencies
└── README.md             # You are here

⚙️ Installation

  1. Clone the repo:

    git clone https://github.com/khashayarbayati1/wikipedia-summarization-battle.git
    cd wikipedia-summarization-battle
    
  2. Install dependencies:

    pip install -r requirements.txt
    

    Minimal requirements:

    requests
    beautifulsoup4
    python-dotenv
    openai>=1.0.0
    httpx
    
  3. Install Ollama & models:

    • Install Ollama if not already installed.
    • Pull the models you want:
      ollama pull llama3.2:latest
      ollama pull deepseek-r1:1.5b
      ollama pull phi3:latest
      
  4. Set up OpenAI API key: Create a .env file with:

    OPENAI_API_KEY=sk-proj-xxxx...
    
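
A minimal sketch of how the key might be loaded at startup (the variable name matches the .env entry above; `require_api_key` is a hypothetical helper, and the dotenv import is made optional here so the snippet also works with a plain environment variable):

```python
import os

try:
    from dotenv import load_dotenv  # from python-dotenv in requirements.txt
    load_dotenv()                   # pulls OPENAI_API_KEY out of .env
except ImportError:
    pass  # fall back to variables already set in the environment

def require_api_key() -> str:
    """Fail fast with a clear message if the key is missing."""
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key

# With openai>=1.0.0 the client would then be created as:
#   from openai import OpenAI
#   client = OpenAI(api_key=require_api_key())
```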

▶️ Usage

  1. Put your URL dictionary in urls.txt, e.g.:

    {
      "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
      "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
      "History": "https://en.wikipedia.org/wiki/Industrial_Revolution"
    }
    
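
Despite the .txt extension, the file holds a JSON object, so it can be loaded directly (a sketch; `load_urls` is a hypothetical helper and the actual parsing in main.py may differ):

```python
import json
from pathlib import Path

def load_urls(path: str = "urls.txt") -> dict:
    """Parse the category -> URL mapping from the JSON-formatted urls.txt."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

# Same idea with the dictionary inlined as a string:
sample = '{"sports": "https://en.wikipedia.org/wiki/Sabermetrics"}'
urls = json.loads(sample)
```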
  2. Run the script:

    python main.py
    
  3. Results are saved to battle_results.json and also printed in the terminal.

🏆 Example Results

Sample output (excerpt):

{
  "category": "sports",
  "url": "https://en.wikipedia.org/wiki/Sabermetrics",
  "scores": {
    "llama3.2:latest": { "score": 4, "rationale": "Covers the main points..." },
    "deepseek-r1:1.5b": { "score": 3, "rationale": "Some inaccuracies..." },
    "phi3:latest": { "score": 5, "rationale": "Concise, accurate, well-organized." }
  },
  "winner": "phi3:latest"
}
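
Given a scores dictionary of that shape, the winner can be picked with a one-liner (a sketch of the idea; main.py's tie-breaking, if any, may differ):

```python
scores = {
    "llama3.2:latest": {"score": 4, "rationale": "Covers the main points..."},
    "deepseek-r1:1.5b": {"score": 3, "rationale": "Some inaccuracies..."},
    "phi3:latest": {"score": 5, "rationale": "Concise, accurate, well-organized."},
}

# Highest score wins; ties resolve to whichever model appears first.
winner = max(scores, key=lambda model: scores[model]["score"])
print(winner)  # prints "phi3:latest"
```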

From the full run:

  • 🥇 phi3:latest won in Sports, History, Productivity
  • 🥇 deepseek-r1:1.5b won in Politics, Technology

💡 Ideas for Extension

  • Add more Ollama models (e.g., mistral, gemma)
  • Try different evaluation criteria (e.g., readability, length control)
  • Visualize results with charts
  • Benchmark runtime and token usage

📜 License

MIT License. Free to use, modify, and share.