# 🥊 Summarization Battle: Ollama vs. OpenAI Judge
This mini-project pits multiple local LLMs (via Ollama) against each other in a web summarization contest, with an OpenAI model serving as the impartial judge.
It automatically fetches web articles, summarizes them with several models, and evaluates the results on coverage, faithfulness, clarity, and conciseness.
## 🚀 Features
- **Fetch Articles** – Download and clean text content from given URLs.
- **Summarize with Ollama** – Run multiple local models (e.g., `llama3.2`, `phi3`, `deepseek-r1`) via the Ollama API.
- **Judge with OpenAI** – Use `gpt-4o-mini` (or any other OpenAI model) to score summaries.
- **Battle Results** – Collect JSON results with per-model scores, rationales, and winners.
- **Timeout Handling & Warmup** – Keeps models loaded with `keep_alive` to avoid cold-start delays.
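The summarization step boils down to one call per model against Ollama's local REST API, with `keep_alive` set so the model stays resident between calls. A minimal sketch, assuming Ollama is serving on its default port — the helper names and prompt wording here are illustrative, not the script's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_summarize_request(model: str, article_text: str) -> dict:
    """Build the JSON payload for one summarization call (illustrative)."""
    return {
        "model": model,  # e.g. "llama3.2:latest"
        "prompt": f"Summarize the following article:\n\n{article_text}",
        "stream": False,       # return one complete response, not a token stream
        "keep_alive": "5m",    # keep the model warm to avoid cold-start delays
    }


def summarize(model: str, article_text: str) -> str:
    """POST the payload to a running Ollama server and return the summary text."""
    payload = json.dumps(build_summarize_request(model, article_text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting `keep_alive` on every request is what implements the warmup feature above: the first call per model pays the load cost, later calls do not.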
## 📂 Project Structure
```
.
├── urls.txt             # Dictionary of categories → URLs
├── battle_results.json  # Summarization + judging results
├── main.py              # Main script
├── requirements.txt     # Dependencies
└── README.md            # You are here
```
## ⚙️ Installation
- Clone the repo:

  ```bash
  git clone https://github.com/khashayarbayati1/wikipedia-summarization-battle.git
  cd wikipedia-summarization-battle
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Minimal requirements:

  ```
  requests
  beautifulsoup4
  python-dotenv
  openai>=1.0.0
  httpx
  ```
- Install Ollama & models:
  - Install Ollama if not already installed.
  - Pull the models you want:

    ```bash
    ollama pull llama3.2:latest
    ollama pull deepseek-r1:1.5b
    ollama pull phi3:latest
    ```
- Set up your OpenAI API key by creating a `.env` file with:

  ```
  OPENAI_API_KEY=sk-proj-xxxx...
  ```
## ▶️ Usage
- Put your URL dictionary in `urls.txt`, e.g.:

  ```json
  {
    "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
    "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
    "History": "https://en.wikipedia.org/wiki/Industrial_Revolution"
  }
  ```
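Despite the `.txt` extension, the file holds a plain JSON object mapping category names to article URLs, so loading it is a single `json` call. A minimal sketch (the helper name is illustrative):

```python
import json
from pathlib import Path


def load_urls(path: str = "urls.txt") -> dict:
    """Read the category → URL mapping from the JSON-formatted urls.txt."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```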
- Run the script:

  ```bash
  python main.py
  ```
- Results are written to `battle_results.json` and printed in the terminal.
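Under the hood, the judging step asks the OpenAI model to score every summary on the four criteria named in the intro. A hedged sketch of assembling that prompt — the wording and function name are illustrative, not the script's exact prompt:

```python
# The four criteria the judge scores on (from the project description).
CRITERIA = ["coverage", "faithfulness", "clarity", "conciseness"]


def build_judge_prompt(article: str, summaries: dict) -> str:
    """Assemble one judging prompt covering every model's summary (illustrative)."""
    blocks = "\n\n".join(f"### {model}\n{text}" for model, text in summaries.items())
    return (
        "You are an impartial judge. Score each summary from 1-5 on "
        + ", ".join(CRITERIA)
        + ", give a one-sentence rationale for each, and name a single winner. "
        "Reply as JSON.\n\n"
        f"ARTICLE:\n{article}\n\nSUMMARIES:\n{blocks}"
    )
```

The resulting string would then be sent to `gpt-4o-mini` (or whichever OpenAI model is configured) via the OpenAI client.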
## 🏆 Example Results
Sample output (excerpt):
```json
{
  "category": "sports",
  "url": "https://en.wikipedia.org/wiki/Sabermetrics",
  "scores": {
    "llama3.2:latest": { "score": 4, "rationale": "Covers the main points..." },
    "deepseek-r1:1.5b": { "score": 3, "rationale": "Some inaccuracies..." },
    "phi3:latest": { "score": 5, "rationale": "Concise, accurate, well-organized." }
  },
  "winner": "phi3:latest"
}
```
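Given a `scores` object shaped like the excerpt above, deriving the winner is a single `max` over the per-model scores. A minimal sketch (function name is illustrative; ties go to the first model listed):

```python
def pick_winner(scores: dict) -> str:
    """Return the model name whose judge score is highest."""
    return max(scores, key=lambda model: scores[model]["score"])
```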
From the full run:
- 🥇 `phi3:latest` won in Sports, History, Productivity
- 🥇 `deepseek-r1:1.5b` won in Politics, Technology
## 💡 Ideas for Extension
- Add more Ollama models (e.g., `mistral`, `gemma`)
- Try different evaluation criteria (e.g., readability, length control)
- Visualize results with charts
- Benchmark runtime and token usage
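The runtime benchmark could start as a simple timing wrapper around each summarization call, using only the standard library. A minimal sketch (the helper name is illustrative):

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping each model's summarize call like this yields per-model latencies that could be collected alongside the scores in `battle_results.json`.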
## 📜 License
MIT License – free to use, modify, and share.