Launching refreshed version of LLM Engineering weeks 1-4 - see README

2025-10-11 15:58:39 -04:00
parent 3286cfb395
commit c7257b9ae6
68 changed files with 16583 additions and 3756 deletions
--- a/week1/day1.ipynb
+++ b/week1/day1.ipynb
@@ -8,64 +8,31 @@
    "# YOUR FIRST LAB\n",
    "### Please read this section. This is valuable to get you prepared, even if it's a long read -- it's important stuff.\n",
    "\n",
-    "## Your first Frontier LLM Project\n",
+    "Be sure to read the README.md first!\n",
    "\n",
-    "Let's build a useful LLM solution - in a matter of minutes.\n",
+    "## Your first Frontier LLM Project\n",
    "\n",
    "By the end of this course, you will have built an autonomous Agentic AI solution with 7 agents that collaborate to solve a business problem. All in good time! We will start with something smaller...\n",
    "\n",
    "Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!\n",
    "\n",
-    "Before starting, you should have completed the setup for [PC](../SETUP-PC.md) or [Mac](../SETUP-mac.md) and you hopefully launched this jupyter lab from within the project root directory, with your environment activated.\n",
+    "Before starting, you should have completed the setup linked in the README.\n",
    "\n",
-    "## If you're new to Jupyter Lab\n",
+    "### If you're new to working in \"Notebooks\" (also known as Labs or Jupyter Lab)\n",
    "\n",
-    "Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each \"cell\" with code in it, such as the cell immediately below this text, and hit Shift+Return to execute that cell. As you wish, you can add a cell with the + button in the toolbar, and print values of variables, or try out variations.  \n",
+    "Welcome to the wonderful world of Data Science experimentation! Simply click in each \"cell\" with code in it, such as the cell immediately below this text, and hit Shift+Return to execute that cell. Be sure to run every cell, starting at the top, in order.\n",
    "\n",
-    "I've written a notebook called [Guide to Jupyter](Guide%20to%20Jupyter.ipynb) to help you get more familiar with Jupyter Labs, including adding Markdown comments, using `!` to run shell commands, and `tqdm` to show progress.\n",
-    "\n",
-    "## If you're new to the Command Line\n",
-    "\n",
-    "Please see these excellent guides: [Command line on PC](https://chatgpt.com/share/67b0acea-ba38-8012-9c34-7a2541052665) and [Command line on Mac](https://chatgpt.com/canvas/shared/67b0b10c93a081918210723867525d2b).  \n",
-    "\n",
-    "## If you'd prefer to work in IDEs\n",
-    "\n",
-    "If you're more comfortable in IDEs like VSCode, Cursor or PyCharm, they both work great with these lab notebooks too.  \n",
-    "If you'd prefer to work in VSCode, [here](https://chatgpt.com/share/676f2e19-c228-8012-9911-6ca42f8ed766) are instructions from an AI friend on how to configure it for the course.\n",
-    "\n",
-    "## If you'd like to brush up your Python\n",
-    "\n",
-    "I've added a notebook called [Intermediate Python](Intermediate%20Python.ipynb) to get you up to speed. But you should give it a miss if you already have a good idea what this code does:    \n",
-    "`yield from {book.get(\"author\") for book in books if book.get(\"author\")}`\n",
+    "Please look in the [Guides folder](../guides/01_intro.ipynb) for all the guides.\n",
    "\n",
    "## I am here to help\n",
    "\n",
    "If you have any problems at all, please do reach out.  \n",
    "I'm available through the platform, or at ed@edwarddonner.com, or at https://www.linkedin.com/in/eddonner/ if you'd like to connect (and I love connecting!)  \n",
-    "And this is new to me, but I'm also trying out X/Twitter at [@edwarddonner](https://x.com/edwarddonner) - if you're on X, please show me how it's done 😂  \n",
+    "And this is new to me, but I'm also trying out X at [@edwarddonner](https://x.com/edwarddonner) - if you're on X, please show me how it's done 😂  \n",
    "\n",
    "## More troubleshooting\n",
    "\n",
-    "Please see the [troubleshooting](troubleshooting.ipynb) notebook in this folder to diagnose and fix common problems. At the very end of it is a diagnostics script with some useful debug info.\n",
-    "\n",
-    "## For foundational technical knowledge (eg Git, APIs, debugging) \n",
-    "\n",
-    "If you're relatively new to programming -- I've got your back! While it's ideal to have some programming experience for this course, there's only one mandatory prerequisite: plenty of patience. 😁 I've put together a set of self-study guides that cover Git and GitHub, APIs and endpoints, beginner python and more.\n",
-    "\n",
-    "This covers Git and GitHub; what they are, the difference, and how to use them:  \n",
-    "https://github.com/ed-donner/agents/blob/main/guides/03_git_and_github.ipynb\n",
-    "\n",
-    "This covers technical foundations:  \n",
-    "ChatGPT vs API; taking screenshots; Environment Variables; Networking basics; APIs and endpoints:  \n",
-    "https://github.com/ed-donner/agents/blob/main/guides/04_technical_foundations.ipynb\n",
-    "\n",
-    "This covers Python for beginners, and making sure that a `NameError` never trips you up:  \n",
-    "https://github.com/ed-donner/agents/blob/main/guides/06_python_foundations.ipynb\n",
-    "\n",
-    "This covers the essential techniques for figuring out errors:  \n",
-    "https://github.com/ed-donner/agents/blob/main/guides/08_debugging.ipynb\n",
-    "\n",
-    "And you'll find other useful guides in the same folder in GitHub. Some information applies to my other Udemy course (eg Async Python) but most of it is very relevant for LLM engineering.\n",
+    "Please see the [troubleshooting](../setup/troubleshooting.ipynb) notebook in the setup folder to diagnose and fix common problems. At the very end of it is a diagnostics script with some useful debug info.\n",
    "\n",
    "## If this is old hat!\n",
    "\n",
@@ -74,7 +41,7 @@
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
-    "            <img src=\"../important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
+    "            <img src=\"../assets/important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#900;\">Please read - important note</h2>\n",
@@ -85,7 +52,7 @@
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
-    "            <img src=\"../resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
+    "            <img src=\"../assets/resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#f71;\">This code is a live resource - keep an eye out for my emails</h2>\n",
@@ -98,7 +65,7 @@
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
-    "            <img src=\"../business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
+    "            <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#181;\">Business value of these exercises</h2>\n",
@@ -108,6 +75,33 @@
    "</table>"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "83f28feb",
+   "metadata": {},
+   "source": [
+    "### If necessary, install Cursor Extensions\n",
+    "\n",
+    "1. From the View menu, select Extensions\n",
+    "2. Search for Python\n",
+    "3. Click on \"Python\" made by \"ms-python\" and select Install if not already installed\n",
+    "4. Search for Jupyter\n",
+    "5. Click on \"Jupyter\" made by \"ms-toolsai\" and select Install of not already installed\n",
+    "\n",
+    "\n",
+    "### Next Select the Kernel\n",
+    "\n",
+    "Click on \"Select Kernel\" on the Top Right\n",
+    "\n",
+    "Choose \"Python Environments...\"\n",
+    "\n",
+    "Then choose the one that looks like `.venv (Python 3.12.x) .venv/bin/python` - it should be marked as \"Recommended\" and have a big star next to it.\n",
+    "\n",
+    "Any problems with this? Head over to the troubleshooting.\n",
+    "\n",
+    "### Note: you'll need to set the Kernel with every notebook.."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -118,9 +112,8 @@
    "# imports\n",
    "\n",
    "import os\n",
-    "import requests\n",
    "from dotenv import load_dotenv\n",
-    "from bs4 import BeautifulSoup\n",
+    "from scraper import fetch_website_contents\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
@@ -140,9 +133,9 @@
    "\n",
    "## Troubleshooting if you have problems:\n",
    "\n",
-    "Head over to the [troubleshooting](troubleshooting.ipynb) notebook in this folder for step by step code to identify the root cause and fix it!\n",
+    "If you get a \"Name Error\" - have you run all cells from the top down? Head over to the Python Foundations guide for a bulletproof way to find and fix all Name Errors.\n",
    "\n",
-    "If you make a change, try restarting the \"Kernel\" (the python process sitting behind this notebook) by Kernel menu >> Restart Kernel and Clear Outputs of All Cells. Then try this notebook again, starting at the top.\n",
+    "If that doesn't fix it, head over to the [troubleshooting](../setup/troubleshooting.ipynb) notebook for step by step code to identify the root cause and fix it!\n",
    "\n",
    "Or, contact me! Message me or email ed@edwarddonner.com and we will get this to work.\n",
    "\n",
@@ -173,19 +166,6 @@
    "    print(\"API key found and looks good so far!\")\n"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "openai = OpenAI()\n",
-    "\n",
-    "# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
-    "# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
-   ]
-  },
  {
   "cell_type": "markdown",
   "id": "442fc84b-0815-4f40-99ab-d9a5da6bda91",
@@ -204,8 +184,23 @@
    "# To give you a preview -- calling OpenAI with these messages is this easy. Any problems, head over to the Troubleshooting notebook.\n",
    "\n",
    "message = \"Hello, GPT! This is my first ever message to you! Hi!\"\n",
-    "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=[{\"role\":\"user\", \"content\":message}])\n",
-    "print(response.choices[0].message.content)"
+    "\n",
+    "messages = [{\"role\": \"user\", \"content\": message}]\n",
+    "\n",
+    "messages\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "08330159",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "openai = OpenAI()\n",
+    "\n",
+    "response = openai.chat.completions.create(model=\"gpt-5-nano\", messages=messages)\n",
+    "response.choices[0].message.content"
   ]
  },
  {
@@ -216,36 +211,6 @@
    "## OK onwards with our first project"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c5e793b2-6775-426a-a139-4848291d0463",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# A class to represent a Webpage\n",
-    "# If you're not familiar with Classes, check out the \"Intermediate Python\" notebook\n",
-    "\n",
-    "# Some websites need you to use proper headers when fetching them:\n",
-    "headers = {\n",
-    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
-    "}\n",
-    "\n",
-    "class Website:\n",
-    "\n",
-    "    def __init__(self, url):\n",
-    "        \"\"\"\n",
-    "        Create this Website object from the given url using the BeautifulSoup library\n",
-    "        \"\"\"\n",
-    "        self.url = url\n",
-    "        response = requests.get(url, headers=headers)\n",
-    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
-    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
-    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
-    "            irrelevant.decompose()\n",
-    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
-   ]
-  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -253,11 +218,10 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# Let's try one out. Change the website and add print statements to follow along.\n",
+    "# Let's try out this utility\n",
    "\n",
-    "ed = Website(\"https://edwarddonner.com\")\n",
-    "print(ed.title)\n",
-    "print(ed.text)"
+    "ed = fetch_website_contents(\"https://edwarddonner.com\")\n",
+    "print(ed)"
   ]
  },
  {
@@ -269,7 +233,7 @@
    "\n",
    "You may know this already - but if not, you will get very familiar with it!\n",
    "\n",
-    "Models like GPT4o have been trained to receive instructions in a particular way.\n",
+    "Models like GPT have been trained to receive instructions in a particular way.\n",
    "\n",
    "They expect to receive:\n",
    "\n",
@@ -287,9 +251,11 @@
   "source": [
    "# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.\"\n",
    "\n",
-    "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
-    "and provides a short summary, ignoring text that might be navigation related. \\\n",
-    "Respond in markdown.\""
+    "system_prompt = \"\"\"\n",
+    "You are a snarkyassistant that analyzes the contents of a website,\n",
+    "and provides a short, snarky, humorous summary, ignoring text that might be navigation related.\n",
+    "Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.\n",
+    "\"\"\""
   ]
  },
  {
@@ -299,25 +265,14 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# A function that writes a User Prompt that asks for summaries of websites:\n",
+    "# Define our user prompt\n",
    "\n",
-    "def user_prompt_for(website):\n",
-    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
-    "    user_prompt += \"\\nThe contents of this website is as follows; \\\n",
-    "please provide a short summary of this website in markdown. \\\n",
-    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
-    "    user_prompt += website.text\n",
-    "    return user_prompt"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "26448ec4-5c00-4204-baec-7df91d11ff2e",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(user_prompt_for(ed))"
+    "user_prompt_prefix = \"\"\"\n",
+    "Here are the contents of a website.\n",
+    "Provide a short summary of this website.\n",
+    "If it includes news or announcements, then summarize these too.\n",
+    "\n",
+    "\"\"\""
   ]
  },
  {
@@ -347,22 +302,12 @@
   "outputs": [],
   "source": [
    "messages = [\n",
-    "    {\"role\": \"system\", \"content\": \"You are a snarky assistant\"},\n",
+    "    {\"role\": \"system\", \"content\": \"You are a helpful assistant\"},\n",
    "    {\"role\": \"user\", \"content\": \"What is 2 + 2?\"}\n",
-    "]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "21ed95c5-7001-47de-a36d-1d6673b403ce",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# To give you a preview -- calling OpenAI with system and user messages:\n",
+    "]\n",
    "\n",
-    "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)\n",
-    "print(response.choices[0].message.content)"
+    "response = openai.chat.completions.create(model=\"gpt-4.1-nano\", messages=messages)\n",
+    "response.choices[0].message.content"
   ]
  },
  {
@@ -370,7 +315,7 @@
   "id": "d06e8d78-ce4c-4b05-aa8e-17050c82bb47",
   "metadata": {},
   "source": [
-    "## And now let's build useful messages for GPT-4o-mini, using a function"
+    "## And now let's build useful messages for GPT-4.1-mini, using a function"
   ]
  },
  {
@@ -385,7 +330,7 @@
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
-    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
+    "        {\"role\": \"user\", \"content\": user_prompt_prefix + website}\n",
    "    ]"
   ]
  },
@@ -419,9 +364,9 @@
    "# And now: call the OpenAI API. You will get very familiar with this!\n",
    "\n",
    "def summarize(url):\n",
-    "    website = Website(url)\n",
+    "    website = fetch_website_contents(url)\n",
    "    response = openai.chat.completions.create(\n",
-    "        model = \"gpt-4o-mini\",\n",
+    "        model = \"gpt-4.1-mini\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
@@ -444,7 +389,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# A function to display this nicely in the Jupyter output, using markdown\n",
+    "# A function to display this nicely in the output, using markdown\n",
    "\n",
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
@@ -505,7 +450,7 @@
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
-    "            <img src=\"../business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
+    "            <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#181;\">Business applications</h2>\n",
@@ -519,7 +464,7 @@
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
-    "            <img src=\"../important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
+    "            <img src=\"../assets/important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#900;\">Before you continue - now try yourself</h2>\n",
@@ -549,12 +494,10 @@
    "messages = [] # fill this in\n",
    "\n",
    "# Step 3: Call OpenAI\n",
-    "\n",
-    "response =\n",
+    "# response =\n",
    "\n",
    "# Step 4: print the result\n",
-    "\n",
-    "print("
+    "# print("
   ]
  },
  {
@@ -593,7 +536,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
@@ -607,7 +550,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.12"
+   "version": "3.12.9"
  }
 },
 "nbformat": 4,