{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
   "metadata": {},
   "source": [
    "# Welcome to your first assignment!\n",
    "\n",
    "Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#f71;\">Just before we get to the assignment --</h2>\n",
    "            <span style=\"color:#f71;\">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>\n",
    "            <a href=\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>\n",
    "            Please keep this bookmarked, and I'll continue to add more useful links there over time.\n",
    "            </span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cc85216-f6e4-436e-b6c1-976c8f2d1152",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install webdriver-manager\n",
    "!pip install selenium"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "import ollama\n",
    "from openai import OpenAI\n",
    "from selenium import webdriver\n",
    "from selenium.webdriver.chrome.options import Options\n",
    "from selenium.webdriver.chrome.service import Service\n",
    "from webdriver_manager.chrome import ChromeDriverManager\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Constants\n",
    "MODEL = \"llama3.2\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "479ff514-e8bd-4985-a572-2ea28bb4fa40",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's just make sure the model is loaded\n",
    "\n",
    "!ollama pull llama3.2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe",
   "metadata": {},
   "source": [
    "# Introducing the ollama package\n",
    "\n",
    "Here we use the elegant ollama Python package instead of making a direct HTTP call.\n",
    "\n",
    "Under the hood, it makes an HTTP call to the Ollama server running at localhost:11434."
   ]
  },
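  {
   "cell_type": "markdown",
   "id": "ollama-chat-sketch-note",
   "metadata": {},
   "source": [
    "As a rough illustration of the package call described above, the next cell is a minimal sketch rather than a definitive implementation. It assumes the Ollama server is running locally, that `llama3.2` has already been pulled, and that the imports and `MODEL` constant from the earlier cells have been run. The prompt text and the `example_messages` name are made up for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ollama-chat-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical example prompt (an assumption, not part of the original assignment)\n",
    "example_messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n",
    "]\n",
    "\n",
    "# ollama.chat sends the messages to the local Ollama server and returns the reply\n",
    "response = ollama.chat(model=MODEL, messages=example_messages)\n",
    "print(response['message']['content'])"
   ]
  },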
  {
   "cell_type": "markdown",
   "id": "a4704e10-f5fb-4c15-a935-f046c06fb13d",
   "metadata": {},
   "source": [
    "## Alternative approach - using the OpenAI Python library to connect to Ollama"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23057e00-b6fc-4678-93a9-6b31cb704bff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# There's actually an alternative approach that some people might prefer\n",
    "# You can use the OpenAI client Python library to call Ollama:\n",
    "\n",
    "# Ollama ignores the API key, but the OpenAI client requires a non-empty value\n",
    "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",
   "metadata": {},
   "source": [
    "# NOW the exercise for you\n",
    "\n",
    "Take the code from Day 1 and incorporate it here to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI. Use either of the above approaches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8251b6a5-7b43-42b9-84a9-4a94b6bdb933",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A class to represent a scraped webpage\n",
    "class ScrapeWebsite:\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this ScrapeWebsite object from the given URL using Selenium + BeautifulSoup.\n",
    "        Handles JavaScript-heavy and static websites the same way.\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "\n",
    "        # Configure headless Chrome\n",
    "        options = Options()\n",
    "        options.add_argument('--headless')\n",
    "        options.add_argument('--no-sandbox')\n",
    "        options.add_argument('--disable-dev-shm-usage')\n",
    "\n",
    "        # Use webdriver-manager to download and locate a matching ChromeDriver\n",
    "        service = Service(ChromeDriverManager().install())\n",
    "\n",
    "        # Start the Chrome WebDriver with the service and options\n",
    "        driver = webdriver.Chrome(service=service, options=options)\n",
    "\n",
    "        try:\n",
    "            # Load the page\n",
    "            driver.get(url)\n",
    "\n",
    "            # Wait for JS to load (adjust as needed)\n",
    "            time.sleep(3)\n",
    "\n",
    "            # Fetch the page source after JS execution\n",
    "            page_source = driver.page_source\n",
    "        finally:\n",
    "            # Always close the browser, even if loading fails\n",
    "            driver.quit()\n",
    "\n",
    "        # Parse the HTML content with BeautifulSoup\n",
    "        soup = BeautifulSoup(page_source, 'html.parser')\n",
    "\n",
    "        # Extract title (soup.title.string can be None, e.g. for an empty <title>)\n",
    "        self.title = soup.title.string if soup.title and soup.title.string else \"No title found\"\n",
    "\n",
    "        # Remove unnecessary elements, guarding against a missing <body>\n",
    "        self.text = \"\"\n",
    "        if soup.body:\n",
    "            for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "                irrelevant.decompose()\n",
    "\n",
    "            # Extract the main text\n",
    "            self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'\n",
    "\n",
    "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
    "and provides a short summary, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\"\n",
    "\n",
    "# A function that writes a User Prompt that asks for summaries of websites:\n",
    "\n",
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"\\nThe contents of this website are as follows; \\\n",
    "please provide a short summary of this website in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt\n",
    "\n",
    "# Build the list of messages (system + user) in the format the chat API expects\n",
    "\n",
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]\n",
    "\n",
    "# And now: call Llama 3.2 on the local Ollama server through the OpenAI client library\n",
    "\n",
    "def summarize(url):\n",
    "    website = ScrapeWebsite(url)\n",
    "    response = ollama_via_openai.chat.completions.create(\n",
    "        model=MODEL,\n",
    "        messages=messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5dbf8d5c-a42a-4a72-b3a4-c75865b841bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "summary = summarize(\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\")\n",
    "display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ddfacdc-b16a-4999-9ff2-93ed19600d24",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}