Launching refreshed version of LLM Engineering weeks 1-4 - see README

This commit is contained in:
Edward Donner
2025-10-11 15:58:39 -04:00
parent 3286cfb395
commit c7257b9ae6
68 changed files with 16583 additions and 3756 deletions

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -35,23 +35,11 @@
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
" print(\"OpenAI API Key not set\")"
]
},
{
@@ -64,7 +52,7 @@
"# Initialize\n",
"\n",
"openai = OpenAI()\n",
"MODEL = 'gpt-4o-mini'"
"MODEL = 'gpt-4.1-mini'"
]
},
{
@@ -74,6 +62,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Again, I'll be in scientist-mode and change this global during the lab\n",
"\n",
"system_message = \"You are a helpful assistant\""
]
},
@@ -82,32 +72,75 @@
"id": "98e97227-f162-4d1a-a0b2-345ff248cbe7",
"metadata": {},
"source": [
"# Please read this! A change from the video:\n",
"## And now, writing a new callback\n",
"\n",
"In the video, I explain how we now need to write a function called:\n",
"We now need to write a function called:\n",
"\n",
"`chat(message, history)`\n",
"\n",
"Which expects to receive `history` in a particular format, which we need to map to the OpenAI format before we call OpenAI:\n",
"Which will be a callback function we will give gradio.\n",
"\n",
"```\n",
"[\n",
" {\"role\": \"system\", \"content\": \"system message here\"},\n",
" {\"role\": \"user\", \"content\": \"first user prompt here\"},\n",
" {\"role\": \"assistant\", \"content\": \"the assistant's response\"},\n",
" {\"role\": \"user\", \"content\": \"the new user prompt\"},\n",
"]\n",
"```\n",
"### The job of this function\n",
"\n",
"But Gradio has been upgraded! Now it will pass in `history` in the exact OpenAI format, perfect for us to send straight to OpenAI.\n",
"\n",
"So our work just got easier!\n",
"\n",
"We will write a function `chat(message, history)` where: \n",
"**message** is the prompt to use \n",
"**history** is the past conversation, in OpenAI format \n",
"\n",
"We will combine the system message, history and latest message, then call OpenAI."
"Take a message, take the prior conversation, and return the response.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "354ce793",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" return \"bananas\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e87f3417",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d4996e8",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" return f\"You said {message} and the history is {history} but I still say bananas\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "434a0417",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7890cac3",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f7330d7f",
"metadata": {},
"source": [
"## OK! Let's write a slightly better chat callback!"
]
},
{
@@ -117,46 +150,61 @@
"metadata": {},
"outputs": [],
"source": [
"# Simpler than in my video - we can easily create this function that calls OpenAI\n",
"# It's now just 1 line of code to prepare the input to OpenAI!\n",
"\n",
"# Student Octavio O. has pointed out that this isn't quite as straightforward for Claude -\n",
"# see the excellent contribution in community-contributions \"Gradio_issue_with_Claude\" that handles Claude.\n",
"\n",
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" return response.choices[0].message.content\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ab706f9",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3bce145a",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
"\n",
" print(\"History is:\")\n",
" print(history)\n",
" print(\"And messages is:\")\n",
" print(messages)\n",
"\n",
" stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)\n",
"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" yield response"
]
},
{
"cell_type": "markdown",
"id": "1334422a-808f-4147-9c4c-57d63d9780d0",
"metadata": {},
"source": [
"## And then enter Gradio's magic!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0866ca56-100a-44ab-8bd0-1568feaf6bf2",
"id": "b8beeca6",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "markdown",
"id": "1334422a-808f-4147-9c4c-57d63d9780d0",
"metadata": {},
"source": [
"## OK let's keep going!\n",
"\n",
"Using a system message to add context, and to give an example answer.. this is \"one shot prompting\" again"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -171,24 +219,6 @@
"Encourage the customer to buy hats if they are unsure what to get.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e5be3ec-c26c-42bc-ac16-c39d369883f6",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
"\n",
" stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)\n",
"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" yield response"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -227,13 +257,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Fixed a bug in this function brilliantly identified by student Gabor M.!\n",
"# I've also improved the structure of this function\n",
"\n",
"def chat(message, history):\n",
"\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" relevant_system_message = system_message\n",
" if 'belt' in message:\n",
" if 'belt' in message.lower():\n",
" relevant_system_message += \" The store does not sell belts; if you are asked for belts, be sure to point out other items on sale.\"\n",
" \n",
" messages = [{\"role\": \"system\", \"content\": relevant_system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
@@ -264,7 +292,7 @@
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#181;\">Business Applications</h2>\n",
@@ -277,17 +305,15 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dfb9e21-df67-4c2b-b952-5e7e7961b03d",
"cell_type": "markdown",
"id": "acc0e5a9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -301,7 +327,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
"version": "3.12.9"
}
},
"nbformat": 4,

View File

@@ -17,8 +17,6 @@
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
@@ -43,7 +41,7 @@
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"MODEL = \"gpt-4.1-mini\"\n",
"openai = OpenAI()\n",
"\n",
"# As an alternative, if you'd like to use Ollama instead of OpenAI\n",
@@ -59,9 +57,11 @@
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are a helpful assistant for an Airline called FlightAI. \"\n",
"system_message += \"Give short, courteous answers, no more than 1 sentence. \"\n",
"system_message += \"Always be accurate. If you don't know the answer, say so.\""
"system_message = \"\"\"\n",
"You are a helpful assistant for an Airline called FlightAI.\n",
"Give short, courteous answers, no more than 1 sentence.\n",
"Always be accurate. If you don't know the answer, say so.\n",
"\"\"\""
]
},
{
@@ -71,9 +71,8 @@
"metadata": {},
"outputs": [],
"source": [
"# This function looks rather simpler than the one from my video, because we're taking advantage of the latest Gradio updates\n",
"\n",
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" return response.choices[0].message.content\n",
@@ -109,9 +108,9 @@
"ticket_prices = {\"london\": \"$799\", \"paris\": \"$899\", \"tokyo\": \"$1400\", \"berlin\": \"$499\"}\n",
"\n",
"def get_ticket_price(destination_city):\n",
" print(f\"Tool get_ticket_price called for {destination_city}\")\n",
" city = destination_city.lower()\n",
" return ticket_prices.get(city, \"Unknown\")"
" print(f\"Tool called for city {destination_city}\")\n",
" price = ticket_prices.get(destination_city.lower(), \"Unknown ticket price\")\n",
" return f\"The price of a ticket to {destination_city} is {price}\"\n"
]
},
{
@@ -121,7 +120,7 @@
"metadata": {},
"outputs": [],
"source": [
"get_ticket_price(\"Berlin\")"
"get_ticket_price(\"London\")"
]
},
{
@@ -135,7 +134,7 @@
"\n",
"price_function = {\n",
" \"name\": \"get_ticket_price\",\n",
" \"description\": \"Get the price of a return ticket to the destination city. Call this whenever you need to know the ticket price, for example when a customer asks 'How much is a ticket to this city'\",\n",
" \"description\": \"Get the price of a return ticket to the destination city.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
@@ -162,6 +161,16 @@
"tools = [{\"type\": \"function\", \"function\": price_function}]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "818b4b2b",
"metadata": {},
"outputs": [],
"source": [
"tools"
]
},
{
"cell_type": "markdown",
"id": "c3d3554f-b4e3-4ce7-af6f-68faa6dd2340",
@@ -184,12 +193,13 @@
"outputs": [],
"source": [
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
"\n",
" if response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" response, city = handle_tool_call(message)\n",
" response = handle_tool_call(message)\n",
" messages.append(message)\n",
" messages.append(response)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
@@ -208,15 +218,16 @@
"\n",
"def handle_tool_call(message):\n",
" tool_call = message.tool_calls[0]\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" price = get_ticket_price(city)\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": json.dumps({\"destination_city\": city,\"price\": price}),\n",
" \"tool_call_id\": tool_call.id\n",
" }\n",
" return response, city"
" if tool_call.function.name == \"get_ticket_price\":\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" price_details = get_ticket_price(city)\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": price_details,\n",
" \"tool_call_id\": tool_call.id\n",
" }\n",
" return response"
]
},
{
@@ -228,11 +239,224 @@
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "markdown",
"id": "47f30fbe",
"metadata": {},
"source": [
"## Let's make a couple of improvements\n",
"\n",
"Handling multiple tool calls in 1 response\n",
"\n",
"Handling multiple tool calls 1 after another"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6f5c860",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
"\n",
" if response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" responses = handle_tool_calls(message)\n",
" messages.append(message)\n",
" messages.extend(responses)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" \n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c46a861",
"metadata": {},
"outputs": [],
"source": [
"def handle_tool_calls(message):\n",
" responses = []\n",
" for tool_call in message.tool_calls:\n",
" if tool_call.function.name == \"get_ticket_price\":\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" price_details = get_ticket_price(city)\n",
" responses.append({\n",
" \"role\": \"tool\",\n",
" \"content\": price_details,\n",
" \"tool_call_id\": tool_call.id\n",
" })\n",
" return responses"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95f02a4d",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf262abc",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
"\n",
" while response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" responses = handle_tool_calls(message)\n",
" messages.append(message)\n",
" messages.extend(responses)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
" \n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47d50e70",
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb61a45d",
"metadata": {},
"outputs": [],
"source": [
"DB = \"prices.db\"\n",
"\n",
"with sqlite3.connect(DB) as conn:\n",
" cursor = conn.cursor()\n",
" cursor.execute('CREATE TABLE IF NOT EXISTS prices (city TEXT PRIMARY KEY, price REAL)')\n",
" conn.commit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12c73b6a",
"metadata": {},
"outputs": [],
"source": [
"def get_ticket_price(city):\n",
" print(f\"DATABASE TOOL CALLED: Getting price for {city}\", flush=True)\n",
" with sqlite3.connect(DB) as conn:\n",
" cursor = conn.cursor()\n",
" cursor.execute('SELECT price FROM prices WHERE city = ?', (city.lower(),))\n",
" result = cursor.fetchone()\n",
" return f\"Ticket price to {city} is ${result[0]}\" if result else \"No price data available for this city\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7cb2e079",
"metadata": {},
"outputs": [],
"source": [
"get_ticket_price(\"London\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46e43463",
"metadata": {},
"outputs": [],
"source": [
"def set_ticket_price(city, price):\n",
" with sqlite3.connect(DB) as conn:\n",
" cursor = conn.cursor()\n",
" cursor.execute('INSERT INTO prices (city, price) VALUES (?, ?) ON CONFLICT(city) DO UPDATE SET price = ?', (city.lower(), price, price))\n",
" conn.commit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9185228e",
"metadata": {},
"outputs": [],
"source": [
"ticket_prices = {\"london\":799, \"paris\": 899, \"tokyo\": 1420, \"sydney\": 2999}\n",
"for city, price in ticket_prices.items():\n",
" set_ticket_price(city, price)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cda459b9",
"metadata": {},
"outputs": [],
"source": [
"get_ticket_price(\"Tokyo\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bfbfa251",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "markdown",
"id": "d1a9e9c7",
"metadata": {},
"source": [
"## Exercise\n",
"\n",
"Add a tool to set the price of a ticket!"
]
},
{
"cell_type": "markdown",
"id": "6aeba34c",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#181;\">Business Applications</h2>\n",
" <span style=\"color:#181;\">Hopefully this hardly needs to be stated! You now have the ability to give actions to your LLMs. This Airline Assistant can now do more than answer questions - it could interact with booking APIs to make bookings!</span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -246,7 +470,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
"version": "3.12.9"
}
},
"nbformat": 4,

View File

@@ -23,7 +23,8 @@
"import json\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr"
"import gradio as gr\n",
"import sqlite3"
]
},
{
@@ -43,8 +44,10 @@
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
"MODEL = \"gpt-4.1-mini\"\n",
"openai = OpenAI()\n",
"\n",
"DB = \"prices.db\""
]
},
{
@@ -54,83 +57,49 @@
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are a helpful assistant for an Airline called FlightAI. \"\n",
"system_message += \"Give short, courteous answers, no more than 1 sentence. \"\n",
"system_message += \"Always be accurate. If you don't know the answer, say so.\""
"system_message = \"\"\"\n",
"You are a helpful assistant for an Airline called FlightAI.\n",
"Give short, courteous answers, no more than 1 sentence.\n",
"Always be accurate. If you don't know the answer, say so.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61a2a15d-b559-4844-b377-6bd5cb4949f6",
"id": "c3e8173c",
"metadata": {},
"outputs": [],
"source": [
"# This function looks rather simpler than the one from my video, because we're taking advantage of the latest Gradio updates\n",
"\n",
"def chat(message, history):\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" return response.choices[0].message.content\n",
"\n",
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "markdown",
"id": "36bedabf-a0a7-4985-ad8e-07ed6a55a3a4",
"metadata": {},
"source": [
"## Tools\n",
"\n",
"Tools are an incredibly powerful feature provided by the frontier LLMs.\n",
"\n",
"With tools, you can write a function, and have the LLM call that function as part of its response.\n",
"\n",
"Sounds almost spooky.. we're giving it the power to run code on our machine?\n",
"\n",
"Well, kinda."
"def get_ticket_price(city):\n",
" print(f\"DATABASE TOOL CALLED: Getting price for {city}\", flush=True)\n",
" with sqlite3.connect(DB) as conn:\n",
" cursor = conn.cursor()\n",
" cursor.execute('SELECT price FROM prices WHERE city = ?', (city.lower(),))\n",
" result = cursor.fetchone()\n",
" return f\"Ticket price to {city} is ${result[0]}\" if result else \"No price data available for this city\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0696acb1-0b05-4dc2-80d5-771be04f1fb2",
"id": "03f19289",
"metadata": {},
"outputs": [],
"source": [
"# Let's start by making a useful function\n",
"\n",
"ticket_prices = {\"london\": \"$799\", \"paris\": \"$899\", \"tokyo\": \"$1400\", \"berlin\": \"$499\"}\n",
"\n",
"def get_ticket_price(destination_city):\n",
" print(f\"Tool get_ticket_price called for {destination_city}\")\n",
" city = destination_city.lower()\n",
" return ticket_prices.get(city, \"Unknown\")"
"get_ticket_price(\"Paris\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80ca4e09-6287-4d3f-997d-fa6afbcf6c85",
"id": "bcfb6523",
"metadata": {},
"outputs": [],
"source": [
"get_ticket_price(\"London\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4afceded-7178-4c05-8fa6-9f2085e6a344",
"metadata": {},
"outputs": [],
"source": [
"# There's a particular dictionary structure that's required to describe our function:\n",
"\n",
"price_function = {\n",
" \"name\": \"get_ticket_price\",\n",
" \"description\": \"Get the price of a return ticket to the destination city. Call this whenever you need to know the ticket price, for example when a customer asks 'How much is a ticket to this city'\",\n",
" \"description\": \"Get the price of a return ticket to the destination city.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
@@ -142,52 +111,46 @@
" \"required\": [\"destination_city\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}"
"}\n",
"tools = [{\"type\": \"function\", \"function\": price_function}]\n",
"tools"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdca8679-935f-4e7f-97e6-e71a4d4f228c",
"id": "61a2a15d-b559-4844-b377-6bd5cb4949f6",
"metadata": {},
"outputs": [],
"source": [
"# And this is included in a list of tools:\n",
"\n",
"tools = [{\"type\": \"function\", \"function\": price_function}]"
]
},
{
"cell_type": "markdown",
"id": "c3d3554f-b4e3-4ce7-af6f-68faa6dd2340",
"metadata": {},
"source": [
"## Getting OpenAI to use our Tool\n",
"def chat(message, history):\n",
" history = [{\"role\": h[\"role\"], \"content\": h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" return response.choices[0].message.content\n",
"\n",
"There's some fiddly stuff to allow OpenAI \"to call our tool\"\n",
"\n",
"What we actually do is give the LLM the opportunity to inform us that it wants us to run the tool.\n",
"\n",
"Here's how the new chat function looks:"
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce9b0744-9c78-408d-b9df-9f6fd9ed78cf",
"id": "c91d012e",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
"\n",
" if response.choices[0].finish_reason==\"tool_calls\":\n",
" while response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" response, city = handle_tool_call(message)\n",
" responses = handle_tool_calls(message)\n",
" messages.append(message)\n",
" messages.append(response)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" messages.extend(responses)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
" \n",
" return response.choices[0].message.content"
]
@@ -195,35 +158,59 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b0992986-ea09-4912-a076-8e5603ee631f",
"id": "956c3b61",
"metadata": {},
"outputs": [],
"source": [
"# We have to write that function handle_tool_call:\n",
"\n",
"def handle_tool_call(message):\n",
" tool_call = message.tool_calls[0]\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" price = get_ticket_price(city)\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": json.dumps({\"destination_city\": city,\"price\": price}),\n",
" \"tool_call_id\": tool_call.id\n",
" }\n",
" return response, city"
"def handle_tool_calls(message):\n",
" responses = []\n",
" for tool_call in message.tool_calls:\n",
" if tool_call.function.name == \"get_ticket_price\":\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" price_details = get_ticket_price(city)\n",
" responses.append({\n",
" \"role\": \"tool\",\n",
" \"content\": price_details,\n",
" \"tool_call_id\": tool_call.id\n",
" })\n",
" return responses"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4be8a71-b19e-4c2f-80df-f59ff2661f14",
"id": "8eca803e",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch()"
]
},
{
"cell_type": "markdown",
"id": "b369bf10",
"metadata": {},
"source": [
"## A bit more about what Gradio actually does:\n",
"\n",
"1. Gradio constructs a frontend Svelte app based on our Python description of the UI\n",
"2. Gradio starts a server built upon the Starlette web framework listening on a free port that serves this React app\n",
"3. Gradio creates backend routes for our callbacks, like chat(), which calls our functions\n",
"\n",
"And of course when Gradio generates the frontend app, it ensures that the the Submit button calls the right backend route.\n",
"\n",
"That's it!\n",
"\n",
"It's simple, and it has a resut that feels magical."
]
},
{
"cell_type": "markdown",
"id": "863aac34",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "473e5b39-da8f-4db1-83ae-dbaca2e9531e",
@@ -289,414 +276,128 @@
"id": "728a12c5-adc3-415d-bb05-82beb73b079b",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f4975b87-19e9-4ade-a232-9b809ec75c9a",
"metadata": {},
"source": [
"## Audio (NOTE - Audio is optional for this course - feel free to skip Audio if it causes trouble!)\n",
"\n",
"And let's make a function talker that uses OpenAI's speech model to generate Audio\n",
"\n",
"### Troubleshooting Audio issues\n",
"\n",
"If you have any problems running this code below (like a FileNotFound error, or a warning of a missing package), you may need to install FFmpeg, a very popular audio utility.\n",
"\n",
"**For PC Users**\n",
"\n",
"Detailed instructions are [here](https://chatgpt.com/share/6724efee-6b0c-8012-ac5e-72e2e3885905) and summary instructions:\n",
"\n",
"1. Download FFmpeg from the official website: https://ffmpeg.org/download.html\n",
"\n",
"2. Extract the downloaded files to a location on your computer (e.g., `C:\\ffmpeg`)\n",
"\n",
"3. Add the FFmpeg bin folder to your system PATH:\n",
"- Right-click on 'This PC' or 'My Computer' and select 'Properties'\n",
"- Click on 'Advanced system settings'\n",
"- Click on 'Environment Variables'\n",
"- Under 'System variables', find and edit 'Path'\n",
"- Add a new entry with the path to your FFmpeg bin folder (e.g., `C:\\ffmpeg\\bin`)\n",
"- Restart your command prompt, and within Jupyter Lab do Kernel -> Restart kernel, to pick up the changes\n",
"\n",
"4. Open a new command prompt and run this to make sure it's installed OK\n",
"`ffmpeg -version`\n",
"\n",
"**For Mac Users**\n",
"\n",
"1. Install homebrew if you don't have it already by running this in a Terminal window and following any instructions: \n",
"`/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"`\n",
"\n",
"2. Then install FFmpeg with `brew install ffmpeg`\n",
"\n",
"3. Verify your installation with `ffmpeg -version` and if everything is good, within Jupyter Lab do Kernel -> Restart kernel to pick up the changes\n",
"\n",
"Message me or email me at ed@edwarddonner.com with any problems!"
]
},
{
"cell_type": "markdown",
"id": "4cc90e80-c96e-4dd4-b9d6-386fe2b7e797",
"metadata": {},
"source": [
"## To check you now have ffmpeg and can access it here\n",
"\n",
"Excecute the next cell to see if you get a version number. (Putting an exclamation mark before something in Jupyter Lab tells it to run it as a terminal command rather than python code).\n",
"\n",
"If this doesn't work, you may need to actually save and close down your Jupyter lab, and start it again from a new Terminal window (Mac) or Anaconda prompt (PC), remembering to activate the llms environment. This ensures you pick up ffmpeg.\n",
"\n",
"And if that doesn't work, please contact me!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b3be0fb-1d34-4693-ab6f-dbff190afcd7",
"metadata": {},
"outputs": [],
"source": [
"!ffmpeg -version\n",
"!ffprobe -version\n",
"!ffplay -version"
]
},
{
"cell_type": "markdown",
"id": "d91d3f8f-e505-4e3c-a87c-9e42ed823db6",
"metadata": {},
"source": [
"# For Mac users - and possibly many PC users too\n",
"\n",
"This version should work fine for you. It might work for Windows users too, but you might get a Permissions error writing to a temp file. If so, see the next section!\n",
"\n",
"As always, if you have problems, please contact me! (You could also comment out the audio talker() in the later code if you're less interested in audio generation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffbfe93b-5e86-4e68-ba71-b301cd5230db",
"metadata": {},
"outputs": [],
"source": [
"from pydub import AudioSegment\n",
"from pydub.playback import play\n",
"\n",
"def talker(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy\n",
" model=\"gpt-4o-mini-tts\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy or coral\n",
" input=message\n",
" )\n",
" \n",
" audio_stream = BytesIO(response.content)\n",
" audio = AudioSegment.from_file(audio_stream, format=\"mp3\")\n",
" play(audio)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b88d775d-d357-4292-a1ad-5dc5ed567281",
"metadata": {},
"outputs": [],
"source": [
"talker(\"Well, hi there\")"
" return response.content"
]
},
{
"cell_type": "markdown",
"id": "ad89a9bd-bb1e-4bbb-a49a-83af5f500c24",
"id": "3bc7580b",
"metadata": {},
"source": [
"# For Windows users (or any Mac users with problems above)\n",
"## Let's bring this home:\n",
"\n",
"## First try the Mac version above, but if you get a permissions error writing to a temp file, then this code should work instead.\n",
"\n",
"A collaboration between students Mark M. and Patrick H. and Claude got this resolved!\n",
"\n",
"Below are 4 variations - hopefully one of them will work on your PC. If not, message me please!\n",
"\n",
"And for Mac people - all 3 of the below work on my Mac too - please try these if the Mac version gave you problems.\n",
"\n",
"## PC Variation 1"
"1. A multi-modal AI assistant with image and audio generation\n",
"2. Tool callling with database lookup\n",
"3. A step towards an Agentic workflow\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d104b96a-02ca-4159-82fe-88e0452aa479",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"from io import BytesIO\n",
"from PIL import Image\n",
"from IPython.display import Audio, display\n",
"\n",
"def talker(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\",\n",
" input=message)\n",
"\n",
" audio_stream = BytesIO(response.content)\n",
" output_filename = \"output_audio.mp3\"\n",
" with open(output_filename, \"wb\") as f:\n",
" f.write(audio_stream.read())\n",
"\n",
" # Play the generated audio\n",
" display(Audio(output_filename, autoplay=True))\n",
"\n",
"talker(\"Well, hi there\")"
]
},
{
"cell_type": "markdown",
"id": "3a5d11f4-bbd3-43a1-904d-f684eb5f3e3a",
"metadata": {},
"source": [
"## PC Variation 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d59c8ebd-79c5-498a-bdf2-3a1c50d91aa0",
"metadata": {},
"outputs": [],
"source": [
"import tempfile\n",
"import subprocess\n",
"from io import BytesIO\n",
"from pydub import AudioSegment\n",
"import time\n",
"\n",
"def play_audio(audio_segment):\n",
" temp_dir = tempfile.gettempdir()\n",
" temp_path = os.path.join(temp_dir, \"temp_audio.wav\")\n",
" try:\n",
" audio_segment.export(temp_path, format=\"wav\")\n",
" time.sleep(3) # Student Dominic found that this was needed. You could also try commenting out to see if not needed on your PC\n",
" subprocess.call([\n",
" \"ffplay\",\n",
" \"-nodisp\",\n",
" \"-autoexit\",\n",
" \"-hide_banner\",\n",
" temp_path\n",
" ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)\n",
" finally:\n",
" try:\n",
" os.remove(temp_path)\n",
" except Exception:\n",
" pass\n",
" \n",
"def talker(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy\n",
" input=message\n",
" )\n",
" audio_stream = BytesIO(response.content)\n",
" audio = AudioSegment.from_file(audio_stream, format=\"mp3\")\n",
" play_audio(audio)\n",
"\n",
"talker(\"Well hi there\")"
]
},
{
"cell_type": "markdown",
"id": "96f90e35-f71e-468e-afea-07b98f74dbcf",
"metadata": {},
"source": [
"## PC Variation 3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8597c7f8-7b50-44ad-9b31-db12375cd57b",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pydub import AudioSegment\n",
"from pydub.playback import play\n",
"from io import BytesIO\n",
"\n",
"def talker(message):\n",
" # Set a custom directory for temporary files on Windows\n",
" custom_temp_dir = os.path.expanduser(\"~/Documents/temp_audio\")\n",
" os.environ['TEMP'] = custom_temp_dir # You can also use 'TMP' if necessary\n",
" \n",
" # Create the folder if it doesn't exist\n",
" if not os.path.exists(custom_temp_dir):\n",
" os.makedirs(custom_temp_dir)\n",
" \n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy\n",
" input=message\n",
" )\n",
" \n",
" audio_stream = BytesIO(response.content)\n",
" audio = AudioSegment.from_file(audio_stream, format=\"mp3\")\n",
"\n",
" play(audio)\n",
"\n",
"talker(\"Well hi there\")"
]
},
{
"cell_type": "markdown",
"id": "e821224c-b069-4f9b-9535-c15fdb0e411c",
"metadata": {},
"source": [
"## PC Variation 4\n",
"\n",
"### Let's try a completely different sound library\n",
"\n",
"First run the next cell to install a new library, then try the cell below it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69d3c0d9-afcc-49e3-b829-9c9869d8b472",
"metadata": {},
"outputs": [],
"source": [
"!pip install simpleaudio"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28f9cc99-36b7-4554-b3f4-f2012f614a13",
"metadata": {},
"outputs": [],
"source": [
"from pydub import AudioSegment\n",
"from io import BytesIO\n",
"import tempfile\n",
"import os\n",
"import simpleaudio as sa\n",
"\n",
"def talker(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy\n",
" input=message\n",
" )\n",
" \n",
" audio_stream = BytesIO(response.content)\n",
" audio = AudioSegment.from_file(audio_stream, format=\"mp3\")\n",
"\n",
" # Create a temporary file in a folder where you have write permissions\n",
" with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False, dir=os.path.expanduser(\"~/Documents\")) as temp_audio_file:\n",
" temp_file_name = temp_audio_file.name\n",
" audio.export(temp_file_name, format=\"wav\")\n",
" \n",
" # Load and play audio using simpleaudio\n",
" wave_obj = sa.WaveObject.from_wave_file(temp_file_name)\n",
" play_obj = wave_obj.play()\n",
" play_obj.wait_done() # Wait for playback to finish\n",
"\n",
" # Clean up the temporary file afterward\n",
" os.remove(temp_file_name)\n",
" \n",
"talker(\"Well hi there\")"
]
},
{
"cell_type": "markdown",
"id": "7986176b-cd04-495f-a47f-e057b0e462ed",
"metadata": {},
"source": [
"## PC Users - if none of those 4 variations worked!\n",
"\n",
"Please get in touch with me. I'm sorry this is causing problems! We'll figure it out.\n",
"\n",
"Alternatively: playing audio from your PC isn't super-critical for this course, and you can feel free to focus on image generation and skip audio for now, or come back to it later."
]
},
{
"cell_type": "markdown",
"id": "1d48876d-c4fa-46a8-a04f-f9fadf61fb0d",
"metadata": {},
"source": [
"# Our Agent Framework\n",
"\n",
"The term 'Agentic AI' and Agentization is an umbrella term that refers to a number of techniques, such as:\n",
"\n",
"1. Breaking a complex problem into smaller steps, with multiple LLMs carrying out specialized tasks\n",
"2. The ability for LLMs to use Tools to give them additional capabilities\n",
"3. The 'Agent Environment' which allows Agents to collaborate\n",
"4. An LLM can act as the Planner, dividing bigger tasks into smaller ones for the specialists\n",
"5. The concept of an Agent having autonomy / agency, beyond just responding to a prompt - such as Memory\n",
"\n",
"We're showing 1 and 2 here, and to a lesser extent 3 and 5. In week 8 we will do the lot!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba820c95-02f5-499e-8f3c-8727ee0a6c0c",
"id": "b119ed1b",
"metadata": {},
"outputs": [],
"source": [
"def chat(history):\n",
" history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
" cities = []\n",
" image = None\n",
" \n",
" if response.choices[0].finish_reason==\"tool_calls\":\n",
"\n",
" while response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" response, city = handle_tool_call(message)\n",
" responses, cities = handle_tool_calls_and_return_cities(message)\n",
" messages.append(message)\n",
" messages.append(response)\n",
" image = artist(city)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" \n",
" messages.extend(responses)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
"\n",
" reply = response.choices[0].message.content\n",
" history += [{\"role\":\"assistant\", \"content\":reply}]\n",
"\n",
" # Comment out or delete the next line if you'd rather skip Audio for now..\n",
" talker(reply)\n",
" voice = talker(reply)\n",
"\n",
" if cities:\n",
" image = artist(cities[0])\n",
" \n",
" return history, image"
" return history, voice, image\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f38d0d27-33bf-4992-a2e5-5dbed973cde7",
"id": "5846bc77",
"metadata": {},
"outputs": [],
"source": [
"# More involved Gradio code as we're not using the preset Chat interface!\n",
"# Passing in inbrowser=True in the last line will cause a Gradio window to pop up immediately.\n",
"def handle_tool_calls_and_return_cities(message):\n",
" responses = []\n",
" cities = []\n",
" for tool_call in message.tool_calls:\n",
" if tool_call.function.name == \"get_ticket_price\":\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" city = arguments.get('destination_city')\n",
" cities.append(city)\n",
" price_details = get_ticket_price(city)\n",
" responses.append({\n",
" \"role\": \"tool\",\n",
" \"content\": price_details,\n",
" \"tool_call_id\": tool_call.id\n",
" })\n",
" return responses, cities"
]
},
{
"cell_type": "markdown",
"id": "6e520161",
"metadata": {},
"source": [
"## The 3 types of Gradio UI\n",
"\n",
"`gr.Interface` is for standard, simple UIs\n",
"\n",
"`gr.ChatInterface` is for standard ChatBot UIs\n",
"\n",
"`gr.Blocks` is for custom UIs where you control the components and the callbacks"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f250915",
"metadata": {},
"outputs": [],
"source": [
"# Callbacks (along with the chat() function above)\n",
"\n",
"def put_message_in_chatbot(message, history):\n",
" return \"\", history + [{\"role\":\"user\", \"content\":message}]\n",
"\n",
"# UI definition\n",
"\n",
"with gr.Blocks() as ui:\n",
" with gr.Row():\n",
" chatbot = gr.Chatbot(height=500, type=\"messages\")\n",
" image_output = gr.Image(height=500)\n",
" image_output = gr.Image(height=500, interactive=False)\n",
" with gr.Row():\n",
" entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
" audio_output = gr.Audio(autoplay=True)\n",
" with gr.Row():\n",
" clear = gr.Button(\"Clear\")\n",
" message = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
"\n",
" def do_entry(message, history):\n",
" history += [{\"role\":\"user\", \"content\":message}]\n",
" return \"\", history\n",
"# Hooking up events to callbacks\n",
"\n",
" entry.submit(do_entry, inputs=[entry, chatbot], outputs=[entry, chatbot]).then(\n",
" chat, inputs=chatbot, outputs=[chatbot, image_output]\n",
" message.submit(put_message_in_chatbot, inputs=[message, chatbot], outputs=[message, chatbot]).then(\n",
" chat, inputs=chatbot, outputs=[chatbot, audio_output, image_output]\n",
" )\n",
" clear.click(lambda: None, inputs=None, outputs=chatbot, queue=False)\n",
"\n",
"ui.launch(inbrowser=True)"
"ui.launch(inbrowser=True, auth=(\"ed\", \"bananas\"))"
]
},
{
@@ -719,7 +420,7 @@
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../thankyou.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" <img src=\"../assets/thankyou.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#090;\">I have a special request for you</h2>\n",
@@ -734,7 +435,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -748,7 +449,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
"version": "3.12.9"
}
},
"nbformat": 4,

4459
week2/hamlet.txt Normal file

File diff suppressed because it is too large Load Diff

37
week2/scraper.py Normal file
View File

@@ -0,0 +1,37 @@
from bs4 import BeautifulSoup
import requests
# Standard headers to fetch a website
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
def fetch_website_contents(url):
"""
Return the title and contents of the website at the given url;
truncate to 2,000 characters as a sensible limit
"""
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
title = soup.title.string if soup.title else "No title found"
if soup.body:
for irrelevant in soup.body(["script", "style", "img", "input"]):
irrelevant.decompose()
text = soup.body.get_text(separator="\n", strip=True)
else:
text = ""
return (title + "\n\n" + text)[:2_000]
def fetch_website_links(url):
"""
Return the links on the webiste at the given url
I realize this is inefficient as we're parsing twice! This is to keep the code in the lab simple.
Feel free to use a class and optimize it!
"""
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
links = [link.get("href") for link in soup.find_all("a")]
return [link for link in links if link]