Merge pull request #688 from Khashayarbayati1/add-khashayar-summarizer

Add community contribution: Wikipedia Summarizer Battle (Ollama + OpenAI)
This commit is contained in:
Ed Donner
2025-10-07 15:48:40 -04:00
committed by GitHub
5 changed files with 508 additions and 0 deletions

@@ -0,0 +1,122 @@
# 🥊 Summarization Battle: Ollama vs. OpenAI Judge
This mini-project pits multiple **local LLMs** (via [Ollama](https://ollama.ai)) against each other in a **web summarization contest**, with an **OpenAI model** serving as the impartial judge.
It automatically fetches web articles, summarizes them with several models, and evaluates the results on **coverage, faithfulness, clarity, and conciseness**.
---
## 🚀 Features
- **Fetch Articles**: download and clean text content from given URLs.
- **Summarize with Ollama**: run multiple local models (e.g., `llama3.2`, `phi3`, `deepseek-r1`) via the Ollama API.
- **Judge with OpenAI**: use `gpt-4o-mini` (or any other OpenAI model) to score summaries.
- **Battle Results**: collect JSON results with per-model scores, rationales, and winners.
- **Timeout Handling & Warmup**: keep models alive with `keep_alive` to avoid cold-start delays.
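The judge is instructed to return strict JSON. As a minimal sketch (using an illustrative, hand-written reply rather than real model output), a reply can be validated against that contract like this:

```python
import json

# Hedged sketch: validate a judge reply against the strict JSON schema
# requested in the prompt. The reply below is illustrative, not real output.
reply = (
    '{"category": "sports", "url": "https://en.wikipedia.org/wiki/Sabermetrics",'
    ' "scores": {"phi3:latest": {"score": 5, "rationale": "Concise."}},'
    ' "winner": "phi3:latest"}'
)
verdict = json.loads(reply)
assert set(verdict) == {"category", "url", "scores", "winner"}
assert all(0 <= v["score"] <= 5 for v in verdict["scores"].values())
print(verdict["winner"])  # phi3:latest
```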
---
## 📂 Project Structure
```
.
├── urls.txt # Dictionary of categories → URLs
├── battle_results.json # Summarization + judging results
├── main.py # Main script
├── requirements.txt # Dependencies
└── README.md # You are here
```
---
## ⚙️ Installation
1. **Clone the repo**:
```bash
git clone https://github.com/khashayarbayati1/wikipedia-summarization-battle.git
cd wikipedia-summarization-battle
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
Minimal requirements:
```txt
requests
beautifulsoup4
python-dotenv
openai>=1.0.0
httpx
```
3. **Install Ollama & models**:
- [Install Ollama](https://ollama.ai/download) if not already installed.
- Pull the models you want:
```bash
ollama pull llama3.2:latest
ollama pull deepseek-r1:1.5b
ollama pull phi3:latest
```
4. **Set up OpenAI API key**:
Create a `.env` file with:
```env
OPENAI_API_KEY=sk-proj-xxxx...
```
---
## ▶️ Usage
1. Put your URL dictionary in `urls.txt`, e.g.:
```python
{
  "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
  "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
  "History": "https://en.wikipedia.org/wiki/Industrial_Revolution"
}
```
2. Run the script:
```bash
python main.py
```
3. Results are written to `battle_results.json` and printed to the terminal.
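`main.py` reads `urls.txt` as a Python dict literal via `ast.literal_eval`; a minimal sketch of that parsing step, run here on an inline string instead of the file:

```python
import ast

# Sketch of the parsing step in main.py's get_urls: urls.txt holds a Python
# dict literal, read safely with ast.literal_eval (no arbitrary code execution;
# trailing commas are allowed).
content = '{"sports": "https://en.wikipedia.org/wiki/Sabermetrics",}'
url_dict = ast.literal_eval(content)
print(url_dict["sports"])
```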
---
## 🏆 Example Results
Sample output (excerpt):
```json
{
  "category": "sports",
  "url": "https://en.wikipedia.org/wiki/Sabermetrics",
  "scores": {
    "llama3.2:latest": { "score": 4, "rationale": "Covers the main points..." },
    "deepseek-r1:1.5b": { "score": 3, "rationale": "Some inaccuracies..." },
    "phi3:latest": { "score": 5, "rationale": "Concise, accurate, well-organized." }
  },
  "winner": "phi3:latest"
}
```
From the full run:
- 🥇 **`phi3:latest`** won in *Sports, History, Productivity*
- 🥇 **`deepseek-r1:1.5b`** won in *Politics, Technology*
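A small sketch for tallying per-model wins from `battle_results.json` (inline sample data stands in for the real file):

```python
import json
from collections import Counter

# Sketch: count wins per model across categories.
# An inline sample stands in for battle_results.json here.
sample = '''[
  {"category": "sports", "winner": "phi3:latest"},
  {"category": "Politics", "winner": "deepseek-r1:1.5b"},
  {"category": "History", "winner": "phi3:latest"}
]'''
results = json.loads(sample)
wins = Counter(r["winner"] for r in results)
print(wins.most_common())  # [('phi3:latest', 2), ('deepseek-r1:1.5b', 1)]
```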
---
## 💡 Ideas for Extension
- Add more Ollama models (e.g., `mistral`, `gemma`)
- Try different evaluation criteria (e.g., readability, length control)
- Visualize results with charts
- Benchmark runtime and token usage
---
## 📜 License
MIT License: free to use, modify, and share.

@@ -0,0 +1,97 @@
[
  {
    "category": "sports",
    "url": "https://en.wikipedia.org/wiki/Sabermetrics",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary covers the main points of the article well, including the origins of sabermetrics, its evolution, and its impact on baseball analytics. However, it could be slightly more concise."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures several key aspects of sabermetrics, it lacks clarity in organization and includes some inaccuracies, such as misattributing the coinage of the term to Earnshaw Cook."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is concise and accurately reflects the key elements of the article, including the contributions of Bill James and the evolution of metrics in baseball, making it clear and well-organized."
      }
    },
    "winner": "phi3:latest"
  },
  {
    "category": "Politics",
    "url": "https://en.wikipedia.org/wiki/Separation_of_powers",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary effectively covers the main points of the article, including the definition of separation of powers, its implementation, and the philosophical background. However, it could benefit from a bit more detail on historical context."
      },
      "deepseek-r1:1.5b": {
        "score": 5,
        "rationale": "This summary is comprehensive and well-organized, clearly outlining the structure of the separation of powers, examples from different countries, and implications for political ideologies. It maintains clarity and accuracy throughout."
      },
      "phi3:latest": {
        "score": 3,
        "rationale": "While this summary provides a broad overview of the historical and theoretical aspects of separation of powers, it lacks focus on the core principles and practical implications, making it less concise and clear compared to the others."
      }
    },
    "winner": "deepseek-r1:1.5b"
  },
  {
    "category": "History",
    "url": "https://en.wikipedia.org/wiki/Industrial_Revolution",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary effectively covers the main points of the Industrial Revolution, including its timeline, technological advancements, and societal impacts. However, it could benefit from more detail on the causes and criticisms."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures some key aspects of the Industrial Revolution, it lacks clarity and organization, making it harder to follow. It also misses some significant details about the social effects and criticisms."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is comprehensive and well-organized, covering a wide range of topics including technological advancements, social impacts, and historical context. It provides a clear and detailed overview of the Industrial Revolution."
      }
    },
    "winner": "phi3:latest"
  },
  {
    "category": "Technology",
    "url": "https://en.wikipedia.org/wiki/Artificial_general_intelligence",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "The summary covers key aspects of AGI, including its definition, development goals, and associated risks, but could benefit from more technical details."
      },
      "deepseek-r1:1.5b": {
        "score": 5,
        "rationale": "This summary is well-structured and comprehensive, accurately capturing the essence of AGI, its distinctions from narrow AI, and the associated risks while maintaining clarity."
      },
      "phi3:latest": {
        "score": 4,
        "rationale": "The summary effectively outlines the definition and characteristics of AGI, but it lacks some depth in discussing the implications and technical definitions compared to the best summary."
      }
    },
    "winner": "deepseek-r1:1.5b"
  },
  {
    "category": "Productivity",
    "url": "https://en.wikipedia.org/wiki/Scientific_management",
    "scores": {
      "llama3.2:latest": {
        "score": 4,
        "rationale": "This summary covers the main points of the article, including the origins, principles, and historical context of scientific management. However, it could be more concise and organized."
      },
      "deepseek-r1:1.5b": {
        "score": 3,
        "rationale": "While this summary captures key aspects of scientific management, it lacks clarity and organization, making it harder to follow. The bullet points are somewhat disjointed."
      },
      "phi3:latest": {
        "score": 5,
        "rationale": "This summary is well-structured, covering the essential elements of scientific management, including its principles, historical context, and criticisms. It is clear, concise, and accurately reflects the source material."
      }
    },
    "winner": "phi3:latest"
  }
]

@@ -0,0 +1,214 @@
# imports
import os, json, ast, pathlib
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from openai import OpenAI
import traceback
from typing import List, Dict
from httpx import Timeout

# ---------- utils ----------
def openai_api_key_loader():
    load_dotenv(dotenv_path=".env", override=True)
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        print("❌ No API key found. Please check your .env file.")
        return False
    if not api_key.startswith("sk-proj-"):
        print("⚠️ API key found, but does not start with 'sk-proj-'. Check you're using the right one.")
        return False
    if api_key.strip() != api_key:
        print("⚠️ API key has leading/trailing whitespace. Please clean it.")
        return False
    print("✅ API key found and looks good!")
    return True

def ollama_installed_tags(base_url="http://localhost:11434"):
    r = requests.get(f"{base_url}/api/tags", timeout=10)
    r.raise_for_status()
    return {m["name"] for m in r.json().get("models", [])}

def get_urls(file_name: str):
    with open(f"{file_name}.txt", "r") as f:
        content = f.read()
    url_dict = ast.literal_eval(content)  # expects a dict literal in the file
    return url_dict

def text_from_url(url: str):
    session = requests.Session()
    session.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/117.0.0.0 Safari/537.36"
        )
    })
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, 'html.parser')
    title = soup.title.string.strip() if soup.title and soup.title.string else "No title found"
    body = soup.body
    if not body:
        return title, ""
    for irrelevant in body(["script", "style", "img", "input", "noscript"]):
        irrelevant.decompose()
    text = body.get_text(separator="\n", strip=True)
    return title, text

# ---------- contestants (Ollama) ----------
def summarize_with_model(text: str, model: str, ollama_client: OpenAI) -> str:
    clipped = text[:9000]  # keep it modest for small models
    messages = [
        {"role": "system", "content": "You are a concise, faithful web summarizer."},
        {"role": "user", "content": (
            "Summarize the article below in 4-6 bullet points. "
            "Be factual, avoid speculation, and do not add information not present in the text.\n\n"
            f"=== ARTICLE START ===\n{clipped}\n=== ARTICLE END ==="
        )}
    ]
    stream = ollama_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
        stream=True,
        extra_body={"keep_alive": "30m", "num_ctx": 2048}
    )
    chunks = []
    for event in stream:
        delta = getattr(event.choices[0].delta, "content", None)
        if delta:
            chunks.append(delta)
    return "".join(chunks).strip()

# ---------- judge (ChatGPT) ----------
JUDGE_MODEL = "gpt-4o-mini"

def judge_summaries(category: str, url: str, source_text: str, summaries: dict, judge_client: OpenAI) -> dict:
    src = source_text[:12000]
    judge_prompt = f"""
You are the referee in a web summarization contest.
Task:
1) Read the SOURCE ARTICLE (below).
2) Evaluate EACH SUMMARY on: Coverage, Accuracy/Faithfulness, Clarity/Organization, Conciseness.
3) Give a 0-5 integer SCORE for each model (5 best).
4) Brief rationale (1-2 sentences per model).
5) Choose a single WINNER (tie-break on accuracy then clarity).
Return STRICT JSON only with this schema:
{{
  "category": "{category}",
  "url": "{url}",
  "scores": {{
    "<model_name>": {{ "score": <0-5>, "rationale": "<1-2 sentences>" }}
  }},
  "winner": "<model_name>"
}}
SOURCE ARTICLE:
{src}
SUMMARIES:
"""
    for m, s in summaries.items():
        judge_prompt += f"\n--- {m} ---\n{s}\n"
    messages = [
        {"role": "system", "content": "You are a strict, reliable evaluation judge for summaries."},
        {"role": "user", "content": judge_prompt}
    ]
    resp = judge_client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=messages,
        response_format={"type": "json_object"},
        temperature=0
    )
    content = resp.choices[0].message.content
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # fallback: extract the outermost JSON object if the model added extra text
        start = content.find("{")
        end = content.rfind("}")
        return json.loads(content[start:end+1])

def run_battle(url_dict: Dict[str, str], ollama_client: OpenAI, judge_client: OpenAI, models: List[str]) -> List[dict]:
    all_results = []
    for category, url in url_dict.items():
        title, text = text_from_url(url)
        summaries = {}
        for m in models:
            try:
                summaries[m] = summarize_with_model(text, m, ollama_client)
            except Exception as e:
                print(f"\n--- Error from {m} ---")
                print(repr(e))
                traceback.print_exc()
                summaries[m] = f"[ERROR from {m}: {e}]"
        clean_summaries = {m: s for m, s in summaries.items() if not s.startswith("[ERROR")}
        verdict = judge_summaries(category, url, text, clean_summaries or summaries, judge_client)
        all_results.append(verdict)
    return all_results

def warmup(ollama_client: OpenAI, model: str):
    try:
        ollama_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "OK"}],
            temperature=0,
            extra_body={"keep_alive": "30m"}
        )
    except Exception as e:
        print(f"[warmup] {model}: {e}")

# ---------- main ----------
def main():
    if not openai_api_key_loader():
        return
    # contestants (local Ollama)
    ollama_client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        timeout=Timeout(300.0, connect=30.0)  # generous read/connect timeouts
    )
    # judge (cloud OpenAI)
    judge_client = OpenAI()
    available = ollama_installed_tags()
    desired = ["llama3.2:latest", "deepseek-r1:1.5b", "phi3:latest"]  # models to battle
    models = [m for m in desired if m in available]
    print("Available:", sorted(available))
    print("Desired :", desired)
    print("Running :", models)
    if not models:
        raise RuntimeError(f"No desired models installed. Have: {sorted(available)}")
    url_dict = get_urls(file_name="urls")
    for m in models:
        warmup(ollama_client, m)
    results = run_battle(url_dict, ollama_client, judge_client, models)
    pathlib.Path("battle_results.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

@@ -0,0 +1,68 @@
annotated-types==0.7.0
anyio==4.10.0
appnope @ file:///home/conda/feedstock_root/build_artifacts/appnope_1733332318622/work
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1733250440834/work
attrs==25.3.0
beautifulsoup4==4.13.5
bs4==0.0.2
certifi==2025.8.3
charset-normalizer==3.4.3
comm @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_comm_1753453984/work
debugpy @ file:///Users/runner/miniforge3/conda-bld/bld/rattler-build_debugpy_1758162070/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1740384970518/work
distro==1.9.0
dotenv==0.9.9
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1746947292760/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1756729339227/work
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_importlib-metadata_1747934053/work
ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1754352890318/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_ipython_1748711175/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1733300866624/work
jiter==0.11.0
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1733440914442/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1748333051527/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1733416936468/work
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1733325553580/work
ollama==0.5.4
openai==1.108.1
outcome==1.3.0.post0
packaging @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_packaging_1745345660/work
parso @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_parso_1755974222/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1733301927746/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1733327343728/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_platformdirs_1756227402/work
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1756321756983/work
psutil @ file:///Users/runner/miniforge3/conda-bld/psutil_1758169248045/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1733302279685/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl#sha256=92c32ff62b5fd8cf325bec5ab90d7be3d2a8ca8c8a3813ff487a8d2002630d1f
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1733569405015/work
pydantic==2.11.9
pydantic_core==2.33.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1750615794071/work
PySocks==1.7.1
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_python-dateutil_1751104122/work
python-dotenv==1.1.1
pyzmq @ file:///Users/runner/miniforge3/conda-bld/bld/rattler-build_pyzmq_1757387129/work
requests==2.32.5
selenium==4.35.0
six @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_six_1753199211/work
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.8
stack_data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1733569443808/work
tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1756854937117/work
tqdm==4.67.1
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1733367359838/work
trio==0.30.0
trio-websocket==0.12.2
typing-inspection==0.4.1
typing_extensions==4.14.1
urllib3==2.5.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1733231326287/work
webdriver-manager==4.0.2
websocket-client==1.8.0
wsproto==1.2.0
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1749421620841/work

@@ -0,0 +1,7 @@
{
  "sports": "https://en.wikipedia.org/wiki/Sabermetrics",
  "Politics": "https://en.wikipedia.org/wiki/Separation_of_powers",
  "History": "https://en.wikipedia.org/wiki/Industrial_Revolution",
  "Technology": "https://en.wikipedia.org/wiki/Artificial_general_intelligence",
  "Productivity": "https://en.wikipedia.org/wiki/Scientific_management",
}