LLM_Engineering_OLD/week1/day1.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
   "metadata": {},
   "source": [
    "# Instant Gratification!\n",
    "\n",
    "Let's build a useful LLM solution - in a matter of minutes.\n",
    "\n",
    "Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!\n",
    "\n",
    "Before starting, be sure to have followed the instructions in the \"README\" file, including creating your API key with OpenAI and adding it to the `.env` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv()\n",
    "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5e793b2-6775-426a-a139-4848291d0463",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A class to represent a Webpage\n",
    "\n",
    "class Website:\n",
    "    url: str\n",
    "    title: str\n",
    "    text: str\n",
    "\n",
    "    def __init__(self, url):\n",
    "        self.url = url\n",
    "        response = requests.get(url)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ef960cf-6dc2-4cda-afb3-b38be12f4c97",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's try one out\n",
    "\n",
    "ed = Website(\"https://edwarddonner.com\")\n",
    "print(ed.title)\n",
    "print(ed.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a478a0c-2c53-48ff-869c-4d08199931e1",
   "metadata": {},
   "source": [
    "## Types of prompts\n",
    "\n",
    "You may know this already - but if not, you will get very familiar with it!\n",
    "\n",
    "Models like GPT4o have been trained to receive instructions in a particular way.\n",
    "\n",
    "They expect to receive:\n",
    "\n",
    "**A system prompt** that tells them what task they are performing and what tone they should use\n",
    "\n",
    "**A user prompt** -- the conversation starter that they should reply to"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abdb8417-c5dc-44bc-9bee-2e059d162699",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
    "and provides a short summary, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0275b1b-7cfe-4f9d-abfa-7650d378da0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"The contents of this website is as follows; \\\n",
    "please provide a short summary of this website in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea211b5f-28e1-4a86-8e52-c0b7677cadcc",
   "metadata": {},
   "source": [
    "## Messages\n",
    "\n",
    "The API from OpenAI expects to receive messages in a particular structure.\n",
    "Many of the other APIs share this structure:\n",
    "\n",
    "```\n",
    "[\n",
    "    {\"role\": \"system\", \"content\": \"system message goes here\"},\n",
    "    {\"role\": \"user\", \"content\": \"user message goes here\"}\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0134dfa4-8299-48b5-b444-f2a8c3403c88",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16f49d46-bf55-4c3e-928f-68fc0bf715b0",
   "metadata": {},
   "source": [
    "## Time to bring it together - the API for OpenAI is very simple!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "905b9919-aba7-45b5-ae65-81b3d1d78e34",
   "metadata": {},
   "outputs": [],
   "source": [
    "def summarize(url):\n",
    "    website = Website(url)\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05e38d41-dfa4-4b20-9c96-c46ea75d9fb5",
   "metadata": {},
   "outputs": [],
   "source": [
    "summarize(\"https://edwarddonner.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d926d59-450e-4609-92ba-2d6f244f1342",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3018853a-445f-41ff-9560-d925d1774b2f",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://edwarddonner.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45d83403-a24c-44b5-84ac-961449b4008f",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://cnn.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75e9fd40-b354-4341-991e-863ef2e59db7",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://anthropic.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49c4315f-340b-4371-b6cd-2a772f4b7bdd",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}