LLM_Engineering_OLD/community-contributions/Prashanth/Week 1/day2_local_ollama.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "8d0046e5-13fc-410d-be51-6b5c0e423280",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !ollama pull deepseek-r1:1.5b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "32ae5855-570e-4f0b-8e72-12d372185195",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "a7c36be5-a658-4228-8b2e-cfb160615941",
   "metadata": {},
   "outputs": [],
   "source": [
    "headers = {\n",
    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "    def __init__(self,url):\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content,'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\",\"style\",\"img\",\"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\",strip=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "49856ef2-dc1a-406d-8ff3-e37858279e83",
   "metadata": {},
   "outputs": [],
   "source": [
    "OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
    "HEADERS = {\"Content-Type\":\"application/json\"}\n",
    "MODEL = \"llama3.2\"\n",
    "# MODEL = \"deepseek-r1:1.5b\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "c2b8a150-f050-45d7-be24-a5d9b2c1e738",
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_prompt(website):\n",
    "    prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    prompt += \"\\nThe contents of this website is as follows; \\\n",
    "    please provide a short summary of this website in markdown. \\\n",
    "    If it includes newsor announcements, then summarize these too.\\n\\n\"\n",
    "    prompt += website.text\n",
    "    return prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "38ae9ff8-5983-4cad-a73d-9c028c5c08fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "we = Website(\"https://edition.cnn.com/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "e335fbbd-5d47-44cc-b064-abaa0d80ffd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "message = [\n",
    "    # {\"role\":\"system\",\"content\":\"you are a smart assistant\"},\n",
    "    {\"role\":\"user\",\"content\":user_prompt(we)}\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "c089dfa0-aa98-40c8-af1f-d7699c59a4ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "payload = {\n",
    "    \"model\": MODEL,\n",
    "    \"messages\": message,\n",
    "    \"stream\":False\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "ccffb18a-0cb1-497d-bdf2-a7585ca0789d",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = requests.post(OLLAMA_API,json=payload,headers=HEADERS)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "14d98b3d-4cae-44d3-8a2a-80557f396c72",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<think>\n",
       "Alright, let me break this down step by step. The user provided a detailed query about the Breaking News website titled \"Breaking News, Latest News and Videos | CNN.\" They want a short summary in markdown, including relevant news or announcements.\n",
       "\n",
       "First, I need to identify the key sections of the content on the website. The main areas are breaking news, events, culture, health, education, technology, climate, weather, etc. Each of these categories has several news points that contribute to the overall narrative.\n",
       "\n",
       "Looking at the user's query, they specifically mentioned asking for the ad feedback section and including any technical issues. From the content provided, there are several ads listed on the homepage. The user might be interested in understanding how CNN handles its advertisements and what users might expect from them.\n",
       "\n",
       "I'll need to extract the main points from each relevant section. For breaking news, I can mention notable events like the Ukraine-Russia war or Israel-Hamas war. In culture, I'll note that the website promotes diversity, especially with content about athletes in Mexico City.\n",
       "\n",
       "For health, there are mentions of Tekken, a video game related to autism, and issues with vaccines, which are crucial for public health awareness. Education is covered by topics on the arts, tech, and science, which likely target various audiences.\n",
       "\n",
       "Technology news includes updates on AI and blockchain, which are hot areas in today's industry. Climate and weather sections highlight natural disasters like hurricanes, a significant global issue, and geopolitical events.\n",
       "\n",
       "Weather specifically mentions hurricane impacts and climate change impacts, which are essential for meteorological awareness. Environmental and safety content focuses on sustainability and disaster management.\n",
       "\n",
       "The user also mentioned video topics, including rocket launches and tech reviews. These can be useful for viewers looking to stay updated in their areas of interest.\n",
       "\n",
       "Putting it all together, the summary should reflect each section's key points concisely, ensuring that all major news items are included without getting bogged down by minor details. It should give a clear overview of the website's content and what users might want to read.\n",
       "</think>\n",
       "\n",
       "Breaking News, Latest News and Videos | CNN  \n",
       "\n",
       "The Breaking News website \"Breaking News, Latest News and Videos | CNN\" offers a wide range of news updates, videos, and culture-focused content. Below is a concise summary of its key sections:  \n",
       "\n",
       "1. **Breakings**  \n",
       "   - The site highlights significant events, including Ukraine-Russia War updates, Israel-Hamas War developments, and notable athletes in Mexico City.  \n",
       "\n",
       "2. **Cultures**  \n",
       "   - Promotes diversity and inclusion, with content on diverse topics such as the beauty of diversity, tech innovations, and cultural heritage.  \n",
       "\n",
       "3. **Health**  \n",
       "   - Features news related to diseases like Tekken (a video game based on autism), vaccine hesitancies, and health issues affecting the population.  \n",
       "\n",
       "4. **Education**  \n",
       "   - Provides information on trending topics in education, including art, technology, science, and sustainability.  \n",
       "\n",
       "5. **Technology**  \n",
       "   - Discusses advancements in AI, blockchain, and other emerging technologies relevant to modern society.  \n",
       "\n",
       "6. **Climate and Weather**  \n",
       "   - Highlights natural disasters like hurricanes, climate change impacts, and weather-related news such as tornadoes.  \n",
       "\n",
       "7. **Video**  \n",
       "   - Offers a variety of video content, including rocket launches, tech reviews, and sports highlights from popular events.  \n",
       "\n",
       "8. **Vibe**  \n",
       "   - Focuses on environmental topics like renewable energy, sustainability, and disaster management efforts.  \n",
       "\n",
       "The site is a comprehensive resource for staying informed about current events, technology news, culture, health, and more."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# print(response.json()['message']['content'])\n",
    "display(Markdown(response.json()['message']['content']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "f3aec6e9-4e38-447d-a5f5-9b89ea2cd51a",
   "metadata": {},
   "outputs": [],
   "source": [
    "ed = Website(\"https://edition.cnn.com/\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}