Merge pull request #688 from Khashayarbayati1/add-khashayar-summarizer

Add community contribution: Wikipedia Summarizer Battle (Ollama + OpenAI)
This commit is contained in:
Ed Donner
2025-10-07 15:48:40 -04:00
committed by GitHub
5 changed files with 508 additions and 0 deletions

@@ -0,0 +1,122 @@
# 🥊 Summarization Battle: Ollama vs. OpenAI Judge
This mini-project pits multiple **local LLMs** (via [Ollama](https://ollama.ai)) against each other in a **web summarization contest**, with an **OpenAI model** serving as the impartial judge.
It automatically fetches web articles, summarizes them with several models, and evaluates the results on **coverage, faithfulness, clarity, and conciseness**.
---
## 🚀 Features
- **Fetch Articles**: download and clean text content from given URLs.
- **Summarize with Ollama**: run multiple local models (e.g., `llama3.2`, `phi3`, `deepseek-r1`) via the Ollama API.
- **Judge with OpenAI**: use `gpt-4o-mini` (or any other OpenAI model) to score summaries.
- **Battle Results**: collect JSON results with per-model scores, rationales, and winners.
- **Timeout Handling & Warmup**: keep models alive with `keep_alive` to avoid cold-start delays.
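The judge is instructed to return strict JSON. As a minimal sketch (using an illustrative, hand-written reply rather than real model output), a reply can be validated against that contract like this:

```python
import json

# Hedged sketch: validate a judge reply against the strict JSON schema
# requested in the prompt. The reply below is illustrative, not real output.
reply = (
    '{"category": "sports", "url": "https://en.wikipedia.org/wiki/Sabermetrics",'
    ' "scores": {"phi3:latest": {"score": 5, "rationale": "Concise."}},'
    ' "winner": "phi3:latest"}'
)
verdict = json.loads(reply)
assert set(verdict) == {"category", "url", "scores", "winner"}
assert all(0 <= v["score"] <= 5 for v in verdict["scores"].values())
print(verdict["winner"])  # phi3:latest
```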
---
## 📂 Project Structure
```
.
├── urls.txt # Dictionary of categories → URLs
├── battle_results.json # Summarization + judging results
├── main.py # Main script
├── requirements.txt # Dependencies
└── README.md # You are here
```
---
## ⚙️ Installation
1. **Clone the repo**:
```bash
git clone https://github.com/khashayarbayati1/wikipedia-summarization-battle.git
cd wikipedia-summarization-battle
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
Minimal requirements:
```txt
requests
beautifulsoup4
python-dotenv
openai>=1.0.0
httpx
```
3. **Install Ollama & models**:
- [Install Ollama](https://ollama.ai/download) if not already installed.
- Pull the models you want:
```bash
ollama pull llama3.2:latest
ollama pull deepseek-r1:1.5b
ollama pull phi3:latest
```
4. **Set up OpenAI API key**:
Create a `.env` file with:
```env
OPENAI_API_KEY=sk-proj-xxxx...
```
---
## ▶️ Usage
1. Put your URL dictionary in `urls.txt`, e.g.:
```python
{
  "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
  "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
  "History": "https://en.wikipedia.org/wiki/Industrial_Revolution"
}
```
2. Run the script:
```bash
python main.py
```
3. Results are written to `battle_results.json` and printed to the terminal.
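`main.py` reads `urls.txt` as a Python dict literal via `ast.literal_eval`; a minimal sketch of that parsing step, run here on an inline string instead of the file:

```python
import ast

# Sketch of the parsing step in main.py's get_urls: urls.txt holds a Python
# dict literal, read safely with ast.literal_eval (no arbitrary code execution;
# trailing commas are allowed).
content = '{"sports": "https://en.wikipedia.org/wiki/Sabermetrics",}'
url_dict = ast.literal_eval(content)
print(url_dict["sports"])
```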
---
## 🏆 Example Results
Sample output (excerpt):
```json
{
  "category": "sports",
  "url": "https://en.wikipedia.org/wiki/Sabermetrics",
  "scores": {
    "llama3.2:latest": { "score": 4, "rationale": "Covers the main points..." },
    "deepseek-r1:1.5b": { "score": 3, "rationale": "Some inaccuracies..." },
    "phi3:latest": { "score": 5, "rationale": "Concise, accurate, well-organized." }
  },
  "winner": "phi3:latest"
}
```
From the full run:
- 🥇 **`phi3:latest`** won in *Sports, History, Productivity*
- 🥇 **`deepseek-r1:1.5b`** won in *Politics, Technology*
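A small sketch for tallying per-model wins from `battle_results.json` (inline sample data stands in for the real file):

```python
import json
from collections import Counter

# Sketch: count wins per model across categories.
# An inline sample stands in for battle_results.json here.
sample = '''[
  {"category": "sports", "winner": "phi3:latest"},
  {"category": "Politics", "winner": "deepseek-r1:1.5b"},
  {"category": "History", "winner": "phi3:latest"}
]'''
results = json.loads(sample)
wins = Counter(r["winner"] for r in results)
print(wins.most_common())  # [('phi3:latest', 2), ('deepseek-r1:1.5b', 1)]
```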
---
## 💡 Ideas for Extension
- Add more Ollama models (e.g., `mistral`, `gemma`)
- Try different evaluation criteria (e.g., readability, length control)
- Visualize results with charts
- Benchmark runtime and token usage
---
## 📜 License
MIT License: free to use, modify, and share.

@@ -0,0 +1,97 @@
[
  {
    "category": "sports",
    "url": "https://en.wikipedia.org/wiki/Sabermetrics",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary covers the main points of the article well, including the origins of sabermetrics, its evolution, and its impact on baseball analytics. However, it could be slightly more concise."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures several key aspects of sabermetrics, it lacks clarity in organization and includes some inaccuracies, such as misattributing the coinage of the term to Earnshaw Cook."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is concise and accurately reflects the key elements of the article, including the contributions of Bill James and the evolution of metrics in baseball, making it clear and well-organized."
      }
    },
    "winner": "phi3:latest"
  },
  {
    "category": "Politics",
    "url": "https://en.wikipedia.org/wiki/Separation_of_powers",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary effectively covers the main points of the article, including the definition of separation of powers, its implementation, and the philosophical background. However, it could benefit from a bit more detail on historical context."
      },
      "deepseek-r1:1.5b": {
        "score": 5,
        "rationale": "This summary is comprehensive and well-organized, clearly outlining the structure of the separation of powers, examples from different countries, and implications for political ideologies. It maintains clarity and accuracy throughout."
      },
      "phi3:latest": {
        "score": 3,
        "rationale": "While this summary provides a broad overview of the historical and theoretical aspects of separation of powers, it lacks focus on the core principles and practical implications, making it less concise and clear compared to the others."
      }
    },
    "winner": "deepseek-r1:1.5b"
  },
  {
    "category": "History",
    "url": "https://en.wikipedia.org/wiki/Industrial_Revolution",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary effectively covers the main points of the Industrial Revolution, including its timeline, technological advancements, and societal impacts. However, it could benefit from more detail on the causes and criticisms."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures some key aspects of the Industrial Revolution, it lacks clarity and organization, making it harder to follow. It also misses some significant details about the social effects and criticisms."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is comprehensive and well-organized, covering a wide range of topics including technological advancements, social impacts, and historical context. It provides a clear and detailed overview of the Industrial Revolution."
      }
    },
    "winner": "phi3:latest"
  },
  {
    "category": "Technology",
    "url": "https://en.wikipedia.org/wiki/Artificial_general_intelligence",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "The summary covers key aspects of AGI, including its definition, development goals, and associated risks, but could benefit from more technical details."
      },
      "deepseek-r1:1.5b": {
        "score": 5,
        "rationale": "This summary is well-structured and comprehensive, accurately capturing the essence of AGI, its distinctions from narrow AI, and the associated risks while maintaining clarity."
      },
      "phi3:latest": {
        "score": 4,
        "rationale": "The summary effectively outlines the definition and characteristics of AGI, but it lacks some depth in discussing the implications and technical definitions compared to the best summary."
      }
    },
    "winner": "deepseek-r1:1.5b"
  },
  {
    "category": "Productivity",
    "url": "https://en.wikipedia.org/wiki/Scientific_management",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary covers the main points of the article, including the origins, principles, and historical context of scientific management. However, it could be more concise and organized."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures key aspects of scientific management, it lacks clarity and organization, making it harder to follow. The bullet points are somewhat disjointed."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is well-structured, covering the essential elements of scientific management, including its principles, historical context, and criticisms. It is clear, concise, and accurately reflects the source material."
      }
    },
    "winner": "phi3:latest"
  }
]

@@ -0,0 +1,214 @@
# imports
import os, json, ast, pathlib
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from openai import OpenAI
import traceback
from typing import List, Dict
from httpx import Timeout

# ---------- utils ----------
def openai_api_key_loader():
    load_dotenv(dotenv_path=".env", override=True)
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        print("❌ No API key found. Please check your .env file.")
        return False
    if not api_key.startswith("sk-proj-"):
        print("⚠️ API key found, but does not start with 'sk-proj-'. Check you're using the right one.")
        return False
    if api_key.strip() != api_key:
        print("⚠️ API key has leading/trailing whitespace. Please clean it.")
        return False
    print("✅ API key found and looks good!")
    return True

def ollama_installed_tags(base_url="http://localhost:11434"):
    r = requests.get(f"{base_url}/api/tags", timeout=10)
    r.raise_for_status()
    return {m["name"] for m in r.json().get("models", [])}

def get_urls(file_name: str):
    with open(f"{file_name}.txt", "r") as f:
        content = f.read()
    url_dict = ast.literal_eval(content)  # expects a dict literal in the file
    return url_dict

def text_from_url(url: str):
    session = requests.Session()
    session.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/117.0.0.0 Safari/537.36"
        )
    })
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, 'html.parser')
    title = soup.title.string.strip() if soup.title and soup.title.string else "No title found"
    body = soup.body
    if not body:
        return title, ""
    for irrelevant in body(["script", "style", "img", "input", "noscript"]):
        irrelevant.decompose()
    text = body.get_text(separator="\n", strip=True)
    return title, text

# ---------- contestants (Ollama) ----------
def summarize_with_model(text: str, model: str, ollama_client: OpenAI) -> str:
    clipped = text[:9000]  # keep it modest for small models
    messages = [
        {"role": "system", "content": "You are a concise, faithful web summarizer."},
        {"role": "user", "content": (
            "Summarize the article below in 4-6 bullet points. "
            "Be factual, avoid speculation, and do not add information not present in the text.\n\n"
            f"=== ARTICLE START ===\n{clipped}\n=== ARTICLE END ==="
        )}
    ]
    stream = ollama_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
        stream=True,
        extra_body={"keep_alive": "30m", "num_ctx": 2048}
    )
    chunks = []
    for event in stream:
        delta = getattr(event.choices[0].delta, "content", None)
        if delta:
            chunks.append(delta)
    return "".join(chunks).strip()

# ---------- judge (ChatGPT) ----------
JUDGE_MODEL = "gpt-4o-mini"

def judge_summaries(category: str, url: str, source_text: str, summaries: dict, judge_client: OpenAI) -> dict:
    src = source_text[:12000]
    judge_prompt = f"""
You are the referee in a web summarization contest.
Task:
1) Read the SOURCE ARTICLE (below).
2) Evaluate EACH SUMMARY on: Coverage, Accuracy/Faithfulness, Clarity/Organization, Conciseness.
3) Give a 0-5 integer SCORE for each model (5 best).
4) Brief rationale (1-2 sentences per model).
5) Choose a single WINNER (tie-break on accuracy then clarity).
Return STRICT JSON only with this schema:
{{
  "category": "{category}",
  "url": "{url}",
  "scores": {{
    "<model_name>": {{ "score": <0-5>, "rationale": "<1-2 sentences>" }}
  }},
  "winner": "<model_name>"
}}
SOURCE ARTICLE:
{src}
SUMMARIES:
"""
    for m, s in summaries.items():
        judge_prompt += f"\n--- {m} ---\n{s}\n"
    messages = [
        {"role": "system", "content": "You are a strict, reliable evaluation judge for summaries."},
        {"role": "user", "content": judge_prompt}
    ]
    resp = judge_client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=messages,
        response_format={"type": "json_object"},
        temperature=0
    )
    content = resp.choices[0].message.content
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # fallback: extract the outermost JSON object if the model added extra text
        start = content.find("{")
        end = content.rfind("}")
        return json.loads(content[start:end+1])

def run_battle(url_dict: Dict[str, str], ollama_client: OpenAI, judge_client: OpenAI, models: List[str]) -> List[dict]:
    all_results = []
    for category, url in url_dict.items():
        title, text = text_from_url(url)
        summaries = {}
        for m in models:
            try:
                summaries[m] = summarize_with_model(text, m, ollama_client)
            except Exception as e:
                print(f"\n--- Error from {m} ---")
                print(repr(e))
                traceback.print_exc()
                summaries[m] = f"[ERROR from {m}: {e}]"
        clean_summaries = {m: s for m, s in summaries.items() if not s.startswith("[ERROR")}
        verdict = judge_summaries(category, url, text, clean_summaries or summaries, judge_client)
        all_results.append(verdict)
    return all_results

def warmup(ollama_client: OpenAI, model: str):
    try:
        ollama_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "OK"}],
            temperature=0,
            extra_body={"keep_alive": "30m"}
        )
    except Exception as e:
        print(f"[warmup] {model}: {e}")

# ---------- main ----------
def main():
    if not openai_api_key_loader():
        return
    # contestants (local Ollama)
    ollama_client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        timeout=Timeout(300.0, connect=30.0)  # generous read/connect timeouts
    )
    # judge (cloud OpenAI)
    judge_client = OpenAI()
    available = ollama_installed_tags()
    desired = ["llama3.2:latest", "deepseek-r1:1.5b", "phi3:latest"]  # models to battle
    models = [m for m in desired if m in available]
    print("Available:", sorted(available))
    print("Desired :", desired)
    print("Running :", models)
    if not models:
        raise RuntimeError(f"No desired models installed. Have: {sorted(available)}")
    url_dict = get_urls(file_name="urls")
    for m in models:
        warmup(ollama_client, m)
    results = run_battle(url_dict, ollama_client, judge_client, models)
    pathlib.Path("battle_results.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

@@ -0,0 +1,68 @@
annotated-types==0.7.0
anyio==4.10.0
appnope @ file:///home/conda/feedstock_root/build_artifacts/appnope_1733332318622/work
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1733250440834/work
attrs==25.3.0
beautifulsoup4==4.13.5
bs4==0.0.2
certifi==2025.8.3
charset-normalizer==3.4.3
comm @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_comm_1753453984/work
debugpy @ file:///Users/runner/miniforge3/conda-bld/bld/rattler-build_debugpy_1758162070/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1740384970518/work
distro==1.9.0
dotenv==0.9.9
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1746947292760/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1756729339227/work
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_importlib-metadata_1747934053/work
ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1754352890318/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_ipython_1748711175/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1733300866624/work
jiter==0.11.0
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1733440914442/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1748333051527/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1733416936468/work
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1733325553580/work
ollama==0.5.4
openai==1.108.1
outcome==1.3.0.post0
packaging @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_packaging_1745345660/work
parso @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_parso_1755974222/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1733301927746/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1733327343728/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_platformdirs_1756227402/work
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1756321756983/work
psutil @ file:///Users/runner/miniforge3/conda-bld/psutil_1758169248045/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1733302279685/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl#sha256=92c32ff62b5fd8cf325bec5ab90d7be3d2a8ca8c8a3813ff487a8d2002630d1f
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1733569405015/work
pydantic==2.11.9
pydantic_core==2.33.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1750615794071/work
PySocks==1.7.1
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_python-dateutil_1751104122/work
python-dotenv==1.1.1
pyzmq @ file:///Users/runner/miniforge3/conda-bld/bld/rattler-build_pyzmq_1757387129/work
requests==2.32.5
selenium==4.35.0
six @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_six_1753199211/work
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.8
stack_data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1733569443808/work
tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1756854937117/work
tqdm==4.67.1
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1733367359838/work
trio==0.30.0
trio-websocket==0.12.2
typing-inspection==0.4.1
typing_extensions==4.14.1
urllib3==2.5.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1733231326287/work
webdriver-manager==4.0.2
websocket-client==1.8.0
wsproto==1.2.0
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1749421620841/work

@@ -0,0 +1,7 @@
{
  "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
  "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
  "History": "https://en.wikipedia.org/wiki/Industrial_Revolution",
  "Technology": "https://en.wikipedia.org/wiki/Artificial_general_intelligence",
  "Productivity": "https://en.wikipedia.org/wiki/Scientific_management",
}