Refreshed notebooks, particularly with new Week 1

This commit is contained in:
Edward Donner
2024-11-13 15:46:22 +00:00
parent 6ba1875cd3
commit 21c7a8155c
34 changed files with 2331 additions and 410 deletions

View File

@@ -10,7 +10,7 @@
"## A Quick Start Guide\n",
"\n",
"Welcome to the wonderful world of Jupyter lab! \n",
"This is a Data Science playground where you can easily write code that builds and builds. It's an ideal environment for: \n",
"This is a Data Science playground where you can easily write code and investigate the results. It's an ideal environment for: \n",
"- Research & Development\n",
"- Prototyping\n",
"- Learning (that's us!)\n",
@@ -21,7 +21,7 @@
"\n",
"A long time ago, Jupyter used to be called \"IPython\", and so the extensions of notebooks are \".ipynb\" which stands for \"IPython Notebook\".\n",
"\n",
"On the left is a File Browser that lets you navigate around the weeks and choose different notebooks. But you probably know that already, or you wouldn't have got here!\n",
"On the left is a File Browser that lets you navigate around the directories and choose different notebooks. But you probably know that already, or you wouldn't have got here!\n",
"\n",
"The notebook consists of a series of square boxes called \"cells\". Some of them contain text, like this cell, and some of them contain code, like the cell below.\n",
"\n",
@@ -37,6 +37,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Click anywhere in this cell and press Shift + Return\n",
"\n",
"2 + 2"
]
},
@@ -47,7 +49,7 @@
"source": [
"## Congrats!\n",
"\n",
"Now run the next cell which sets a value, followed by the cell after it to print the value"
"Now run the next cell which sets a value, followed by the cells after it to print the value"
]
},
{
@@ -62,6 +64,18 @@
"favorite_fruit = \"bananas\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07792faa-761d-46cb-b9b7-2bbf70bb1628",
"metadata": {},
"outputs": [],
"source": [
"# The result of the last statement is shown after you run it\n",
"\n",
"favorite_fruit"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -91,10 +105,13 @@
"id": "9442d5c9-f57d-4839-b0af-dce58646c04f",
"metadata": {},
"source": [
"# Now go back and rerun the prior cell with the print statement\n",
"## Now go back and rerun the cell with the print statement, two cells back\n",
"\n",
"See how it prints something different, even though favorite_fruit was changed afterwards? \n",
"The order that code appears in the notebook doesn't matter. What matters is the order that the code is **executed**."
"See how it prints something different, even though favorite_fruit was changed further down in the notebook? \n",
"\n",
"The order that code appears in the notebook doesn't matter. What matters is the order that the code is **executed**. There's a python process sitting behind this notebook in which the variables are being changed.\n",
"\n",
"This catches some people out when they first use Jupyter."
]
},
{
@@ -104,16 +121,239 @@
"metadata": {},
"outputs": [],
"source": [
"# More coming here soon!"
"# Then run this cell twice, and see if you understand what's going on\n",
"\n",
"print(f\"My favorite fruit is {favorite_fruit}\")\n",
"\n",
"favorite_fruit = \"apples\""
]
},
{
"cell_type": "markdown",
"id": "a29dab2d-bab9-4a54-8504-05e62594cc6f",
"metadata": {},
"source": [
"# Explaining the 'kernel'\n",
"\n",
"Sitting behind this notebook is a Python process which executes each cell when you run it. That Python process is known as the Kernel. Each notebook has its own separate Kernel.\n",
"\n",
"You can go to the Kernel menu and select \"Restart Kernel\".\n",
"\n",
"If you then try to run the next cell, you'll get an error, because favorite_fruit is no longer defined. You'll need to run the cells from the top of the notebook again. Then the next cell should run fine."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b51950ca-b512-4829-974f-442bd50e29a5",
"id": "84b1e410-5eda-4e2c-97ce-4eebcff816c5",
"metadata": {},
"outputs": [],
"source": [
"print(f\"My favorite fruit is {favorite_fruit}\")"
]
},
{
"cell_type": "markdown",
"id": "4d4188fc-d9cc-42be-8b4e-ae8630456764",
"metadata": {},
"source": [
"# Adding and moving cells\n",
"\n",
"Click in this cell, then click the \\[+\\] button in the toolbar above to create a new cell immediately below this one. Copy and paste in the code in the prior cell, then run it! There are also icons in the top right of the selected cell to delete it (bin), duplicate it, and move it up and down.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce258424-40c3-49a7-9462-e6fa25014b03",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "30e71f50-8f01-470a-9d7a-b82a6cef4236",
"metadata": {},
"source": [
"# Cell output\n",
"\n",
"When you execute a cell, the standard output and the result of the last statement is written to the area immediately under the code, known as the 'cell output'. When you save a Notebook from the file menu (or command+S), the output is also saved, making it a useful record of what happened.\n",
"\n",
"You can clean this up by going to Edit menu >> Clear Outputs of All Cells, or Kernel menu >> Restart Kernel and Clear Outputs of All Cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4d021e2-c284-411f-8ab1-030530cfbe72",
"metadata": {},
"outputs": [],
"source": [
"spams = [\"spam\"] * 1000\n",
"print(spams)\n",
"\n",
"# Might be worth clearing output after running this!"
]
},
{
"cell_type": "markdown",
"id": "eac060f2-7a71-46e7-8235-b6ad0a76f5f8",
"metadata": {},
"source": [
"# Using markdown\n",
"\n",
"So what's going on with these areas with writing in them, like this one? Well, there's actually a different kind of cell called a 'Markdown' cell for adding explanations like this. Click the + button to add a cell. Then in the toolbar, click where it says 'Code' and change it to 'Markdown'.\n",
"\n",
"Add some comments using Markdown format, perhaps copying and pasting from here:\n",
"\n",
"```\n",
"# This is a heading\n",
"## This is a sub-head\n",
"### And a sub-sub-head\n",
"\n",
"I like Jupyter Lab because it's\n",
"- Easy\n",
"- Flexible\n",
"- Satisfying\n",
"```\n",
"\n",
"And to turn this into formatted text simply with Shift+Return in the cell.\n",
"Click in the cell and press the Bin icon if you want to remove it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1586320-c90f-4f22-8b39-df6865484950",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "1330c83c-67ac-4ca0-ac92-a71699e0c31b",
"metadata": {},
"source": [
"# The exclamation point\n",
"\n",
"There's a super useful feature of jupyter labs; you can type a command with a ! in front of it in a code cell, like:\n",
"\n",
"!pip install \\[some_package\\]\n",
"\n",
"And it will run it at the command line (as if in Windows Powershell or Mac Terminal) and print the result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82042fc5-a907-4381-a4b8-eb9386df19cd",
"metadata": {},
"outputs": [],
"source": [
"# list the current directory\n",
"\n",
"!ls"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4fc3e3da-8a55-40cc-9706-48bf12a0e20e",
"metadata": {},
"outputs": [],
"source": [
"# ping cnn.com - press the stop button in the toolbar when you're bored\n",
"\n",
"!ping cnn.com"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a58e9462-89a2-4b4f-b4aa-51c4bd9f796b",
"metadata": {},
"outputs": [],
"source": [
"# This is a useful command that ensures your Anaconda environment \n",
"# is up to date with any new upgrades to packages;\n",
"# But it might take a minute and will print a lot to output\n",
"\n",
"!conda env update -f ../environment.yml --prune"
]
},
{
"cell_type": "markdown",
"id": "4688baaf-a72c-41b5-90b6-474cb24790a7",
"metadata": {},
"source": [
"# Minor things we encounter on the course\n",
"\n",
"This isn't necessarily a feature of Jupyter, but it's a nice package to know about that is useful in Jupyter Labs, and I use it in the course.\n",
"\n",
"The package `tqdm` will print a nice progress bar if you wrap any iterable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2646a4e5-3c23-4aee-a34d-d623815187d2",
"metadata": {},
"outputs": [],
"source": [
"# Here's some code with no progress bar\n",
"# It will take 10 seconds while you wonder what's happpening..\n",
"\n",
"import time\n",
"\n",
"spams = [\"spam\"] * 1000\n",
"\n",
"for spam in spams:\n",
" time.sleep(0.01)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e96be3d-fa82-42a3-a8aa-b81dd20563a5",
"metadata": {},
"outputs": [],
"source": [
"# And now, with a nice little progress bar:\n",
"\n",
"import time\n",
"from tqdm import tqdm\n",
"\n",
"spams = [\"spam\"] * 1000\n",
"\n",
"for spam in tqdm(spams):\n",
" time.sleep(0.01)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63c788dd-4618-4bb4-a5ce-204411a38ade",
"metadata": {},
"outputs": [],
"source": [
"# On a different topic, here's a useful way to print output in markdown\n",
"\n",
"from IPython.display import Markdown, display\n",
"\n",
"display(Markdown(\"# This is a big heading!\\n\\n- And this is a bullet-point\\n- So is this\\n- Me, too!\"))\n"
]
},
{
"cell_type": "markdown",
"id": "9d14c1fb-3321-4387-b6ca-9af27676f980",
"metadata": {},
"source": [
"# That's it! You're up to speed on Jupyter Lab.\n",
"\n",
"## Want to be even more advanced?\n",
"\n",
"If you want to become a pro at Jupyter Lab, you can read their tutorial [here](https://jupyterlab.readthedocs.io/en/latest/). But this isn't required for our course; just a good technique for hitting Shift + Return and enjoying the result!"
]
}
],
"metadata": {

View File

@@ -0,0 +1,464 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5c291475-8c7c-461c-9b12-545a887b2432",
"metadata": {},
"source": [
"# Intermediate Level Python\n",
"\n",
"## Getting you up to speed\n",
"\n",
"This course assumes that you're at an intermediate level of python. For example, you should have a decent idea what something like this might do:\n",
"\n",
"`yield from {book.get(\"author\") for book in books if book.get(\"author\")}`\n",
"\n",
"If not - then you've come to the right place! Welcome to the crash course in intermediate level python. The best way to learn is by doing!\n"
]
},
{
"cell_type": "markdown",
"id": "542f0577-a826-4613-a5d7-4170e9666d04",
"metadata": {},
"source": [
"## First: if you need a refresher on the foundations\n",
"\n",
"I'm going to defer to an AI friend for this, because these explanations are so well written with great examples. Copy and paste the code examples into a new cell to give them a try.\n",
"\n",
"**Python imports:** \n",
"https://chatgpt.com/share/672f9f31-8114-8012-be09-29ef0d0140fb\n",
"\n",
"**Python functions** including default arguments: \n",
"https://chatgpt.com/share/672f9f99-7060-8012-bfec-46d4cf77d672\n",
"\n",
"**Python strings**, including slicing, split/join, replace and literals: \n",
"https://chatgpt.com/share/672fb526-0aa0-8012-9e00-ad1687c04518\n",
"\n",
"**Python f-strings** including number and date formatting: \n",
"https://chatgpt.com/share/672fa125-0de0-8012-8e35-27918cbb481c\n",
"\n",
"**Python lists, dicts and sets**, including the `get()` method: \n",
"https://chatgpt.com/share/672fa225-3f04-8012-91af-f9c95287da8d\n",
"\n",
"**Python classes:** \n",
"https://chatgpt.com/share/672fa07a-1014-8012-b2ea-6dc679552715"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5802e2f0-0ea0-4237-bbb7-f375a34260f0",
"metadata": {},
"outputs": [],
"source": [
"# Next let's create some things:\n",
"\n",
"fruits = [\"Apples\", \"Bananas\", \"Pears\"]\n",
"\n",
"book1 = {\"title\": \"Great Expectations\", \"author\": \"Charles Dickens\"}\n",
"book2 = {\"title\": \"Bleak House\", \"author\": \"Charles Dickens\"}\n",
"book3 = {\"title\": \"An Book By No Author\"}\n",
"book4 = {\"title\": \"Moby Dick\", \"author\": \"Herman Melville\"}\n",
"\n",
"books = [book1, book2, book3, book4]"
]
},
{
"cell_type": "markdown",
"id": "9b941e6a-3658-4144-a8d4-72f5e72f3707",
"metadata": {},
"source": [
"# Part 1: List and dict comprehensions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61992bb8-735d-4dad-8747-8c10b63aec82",
"metadata": {},
"outputs": [],
"source": [
"# Simple enough to start\n",
"\n",
"for fruit in fruits:\n",
" print(fruit)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c89c3842-9b74-47fa-8424-0fcb08e4177c",
"metadata": {},
"outputs": [],
"source": [
"# Let's make a new version of fruits\n",
"\n",
"fruits_shouted = []\n",
"for fruit in fruits:\n",
" fruits_shouted.append(fruit.upper())\n",
"\n",
"fruits_shouted"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ec13b3a-9545-44f1-874a-2910a0663560",
"metadata": {},
"outputs": [],
"source": [
"# You probably already know this\n",
"# There's a nice Python construct called \"list comprehension\" that does this:\n",
"\n",
"fruits_shouted2 = [fruit.upper() for fruit in fruits]\n",
"fruits_shouted2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ecc08c3c-181d-4b64-a3e1-b0ccffc6c0cd",
"metadata": {},
"outputs": [],
"source": [
"# But you may not know that you can do this to create dictionaries, too:\n",
"\n",
"fruit_mapping = {fruit:fruit.upper() for fruit in fruits}\n",
"fruit_mapping"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "500c2406-00d2-4793-b57b-f49b612760c8",
"metadata": {},
"outputs": [],
"source": [
"# you can also use the if statement to filter the results\n",
"\n",
"fruits_with_longer_names_shouted = [fruit.upper() for fruit in fruits if len(fruit)>5]\n",
"fruits_with_longer_names_shouted"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38c11c34-d71e-45ba-945b-a3d37dc29793",
"metadata": {},
"outputs": [],
"source": [
"fruit_mapping_unless_starts_with_a = {fruit:fruit.upper() for fruit in fruits if not fruit.startswith('A')}\n",
"fruit_mapping_unless_starts_with_a"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c97d8e8-31de-4afa-973e-28d8e5cab749",
"metadata": {},
"outputs": [],
"source": [
"# Another comprehension\n",
"\n",
"[book['title'] for book in books]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50be0edc-a4cd-493f-a680-06080bb497b4",
"metadata": {},
"outputs": [],
"source": [
"# This code will fail with an error because one of our books doesn't have an author\n",
"\n",
"[book['author'] for book in books]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53794083-cc09-4edb-b448-2ffb7e8495c2",
"metadata": {},
"outputs": [],
"source": [
"# But this will work, because get() returns None\n",
"\n",
"[book.get('author') for book in books]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8e4b859-24f8-4016-8d74-c2cef226d049",
"metadata": {},
"outputs": [],
"source": [
"# And this variation will filter out the None\n",
"\n",
"[book.get('author') for book in books if book.get('author')]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c44bb999-52b4-4dee-810b-8a400db8f25f",
"metadata": {},
"outputs": [],
"source": [
"# And this version will convert it into a set, removing duplicates\n",
"\n",
"set([book.get('author') for book in books if book.get('author')])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80a65156-6192-4bb4-b4e6-df3fdc933891",
"metadata": {},
"outputs": [],
"source": [
"# And finally, this version is even nicer\n",
"# curly braces creates a set, so this is a set comprehension\n",
"\n",
"{book.get('author') for book in books if book.get('author')}"
]
},
{
"cell_type": "markdown",
"id": "c100e5db-5438-4715-921c-3f7152f83f4a",
"metadata": {},
"source": [
"# Part 2: Generators\n",
"\n",
"We use Generators in the course because AI models can stream back results.\n",
"\n",
"If you've not used Generators before, please start with this excellent intro from ChatGPT:\n",
"\n",
"https://chatgpt.com/share/672faa6e-7dd0-8012-aae5-44fc0d0ec218\n",
"\n",
"Try pasting some of its examples into a cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1efc26fa-9144-4352-9a17-dfec1d246aad",
"metadata": {},
"outputs": [],
"source": [
"# First define a generator; it looks like a function, but it has yield instead of return\n",
"\n",
"import time\n",
"\n",
"def come_up_with_fruit_names():\n",
" for fruit in fruits:\n",
" time.sleep(1) # thinking of a fruit\n",
" yield fruit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eac338bb-285c-45c8-8a3e-dbfc41409ca3",
"metadata": {},
"outputs": [],
"source": [
"# Then use it\n",
"\n",
"for fruit in come_up_with_fruit_names():\n",
" print(fruit)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6880578-a3de-4502-952a-4572b95eb9ff",
"metadata": {},
"outputs": [],
"source": [
"# Here's another one\n",
"\n",
"def authors_generator():\n",
" for book in books:\n",
" if book.get(\"author\"):\n",
" yield book.get(\"author\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e316f02-f87f-441d-a01f-024ade949607",
"metadata": {},
"outputs": [],
"source": [
"# Use it\n",
"\n",
"for author in authors_generator():\n",
" print(author)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7535c9d0-410e-4e56-a86c-ae6c0e16053f",
"metadata": {},
"outputs": [],
"source": [
"# Here's the same thing written with list comprehension\n",
"\n",
"def authors_generator():\n",
" for author in [book.get(\"author\") for book in books if book.get(\"author\")]:\n",
" yield author"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dad34494-0f6c-4edb-b03f-b8d49ee186f2",
"metadata": {},
"outputs": [],
"source": [
"# Use it\n",
"\n",
"for author in authors_generator():\n",
" print(author)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abeb7e61-d8aa-4af0-b05a-ae17323e678c",
"metadata": {},
"outputs": [],
"source": [
"# Here's a nice shortcut\n",
"# You can use \"yield from\" to yield each item of an iterable\n",
"\n",
"def authors_generator():\n",
" yield from [book.get(\"author\") for book in books if book.get(\"author\")]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05b0cb43-aa83-4762-a797-d3beb0f22c44",
"metadata": {},
"outputs": [],
"source": [
"# Use it\n",
"\n",
"for author in authors_generator():\n",
" print(author)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdfea58e-d809-4dd4-b7b0-c26427f8be55",
"metadata": {},
"outputs": [],
"source": [
"# And finally - we can replace the list comprehension with a set comprehension\n",
"\n",
"def unique_authors_generator():\n",
" yield from {book.get(\"author\") for book in books if book.get(\"author\")}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e821d08-97be-4db9-9a5b-ce5dced3eff8",
"metadata": {},
"outputs": [],
"source": [
"# Use it\n",
"\n",
"for author in unique_authors_generator():\n",
" print(author)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "905ba603-15d8-4d01-9a79-60ec293d7ca1",
"metadata": {},
"outputs": [],
"source": [
"# And for some fun - press the stop button in the toolbar when bored!\n",
"# It's like we've made our own Large Language Model... although not particularly large..\n",
"# See if you understand why it prints a letter at a time, instead of a word at a time. If you're unsure, try removing the keyword \"from\" everywhere in the code.\n",
"\n",
"import random\n",
"import time\n",
"\n",
"pronouns = [\"I\", \"You\", \"We\", \"They\"]\n",
"verbs = [\"eat\", \"detest\", \"bathe in\", \"deny the existence of\", \"resent\", \"pontificate about\", \"juggle\", \"impersonate\", \"worship\", \"misplace\", \"conspire with\", \"philosophize about\", \"tap dance on\", \"dramatically renounce\", \"secretly collect\"]\n",
"adjectives = [\"turqoise\", \"smelly\", \"arrogant\", \"festering\", \"pleasing\", \"whimsical\", \"disheveled\", \"pretentious\", \"wobbly\", \"melodramatic\", \"pompous\", \"fluorescent\", \"bewildered\", \"suspicious\", \"overripe\"]\n",
"nouns = [\"turnips\", \"rodents\", \"eels\", \"walruses\", \"kumquats\", \"monocles\", \"spreadsheets\", \"bagpipes\", \"wombats\", \"accordions\", \"mustaches\", \"calculators\", \"jellyfish\", \"thermostats\"]\n",
"\n",
"def infinite_random_sentences():\n",
" while True:\n",
" yield from random.choice(pronouns)\n",
" yield \" \"\n",
" yield from random.choice(verbs)\n",
" yield \" \"\n",
" yield from random.choice(adjectives)\n",
" yield \" \"\n",
" yield from random.choice(nouns)\n",
" yield \". \"\n",
"\n",
"for letter in infinite_random_sentences():\n",
" print(letter, end=\"\", flush=True)\n",
" time.sleep(0.02)"
]
},
{
"cell_type": "markdown",
"id": "04832ea2-2447-4473-a449-104f80e24d85",
"metadata": {},
"source": [
"# Exercise\n",
"\n",
"Write some python classes for the books example.\n",
"\n",
"Write a Book class with a title and author. Include a method has_author()\n",
"\n",
"Write a BookShelf class with a list of books. Include a generator method unique_authors()"
]
},
{
"cell_type": "markdown",
"id": "35760406-fe6c-41f9-b0c0-3e8cf73aafd0",
"metadata": {},
"source": [
"# Finally\n",
"\n",
"Here are some intermediate level details of Classes from our AI friend, including use of type hints, inheritance and class methods. This includes a Book example.\n",
"\n",
"https://chatgpt.com/share/67348aca-65fc-8012-a4a9-fd1b8f04ba59"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -5,7 +5,9 @@
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
"metadata": {},
"source": [
"# Instant Gratification!\n",
"# Instant Gratification\n",
"\n",
"## Your first Frontier LLM Project!\n",
"\n",
"Let's build a useful LLM solution - in a matter of minutes.\n",
"\n",
@@ -13,26 +15,29 @@
"\n",
"Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!\n",
"\n",
"Before starting, be sure to have followed the instructions in the \"README\" file, including creating your API key with OpenAI and adding it to the `.env` file.\n",
"Before starting, you should have completed the setup for [PC](../SETUP-PC.md) or [Mac](../SETUP-mac.md) and you hopefully launched this jupyter lab from within the project root directory, with your environment activated.\n",
"\n",
"## If you're new to Jupyter Lab\n",
"\n",
"Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each \"cell\" with code in it, such as the cell immediately below this text, and hit Shift+Return to execute that cell. As you wish, you can add a cell with the + button in the toolbar, and print values of variables, or try out variations. \n",
"\n",
"If you need to start a 'notebook' again, go to Kernel menu >> Restart kernel. \n",
"\n",
"If you want to become a pro at Jupyter Lab, you can read their tutorial [here](https://jupyterlab.readthedocs.io/en/latest/). But this isn't required for our course; just a good technique for hitting Shift + Return and enjoying the result!\n",
"I've written a notebook called [Guide to Jupyter](Guide%20to%20Jupyter.ipynb) to help you get more familiar with Jupyter Labs, including adding Markdown comments, using `!` to run shell commands, and `tqdm` to show progress.\n",
"\n",
"If you prefer to work in IDEs like VSCode or Pycharm, they both work great with these lab notebooks too. \n",
"\n",
"## If you'd like to brush up your Python\n",
"\n",
"I've added a notebook called [Intermediate Python](Intermediate%20Python.ipynb) to get you up to speed. But you should give it a miss if you already have a good idea what this code does: \n",
"`yield from {book.get(\"author\") for book in books if book.get(\"author\")}`\n",
"\n",
"## I am here to help\n",
"\n",
"If you have any problems at all, please do reach out. \n",
"I'm available through the platform, or at ed@edwarddonner.com, or at https://www.linkedin.com/in/eddonner/ if you'd like to connect.\n",
"I'm available through the platform, or at ed@edwarddonner.com, or at https://www.linkedin.com/in/eddonner/ if you'd like to connect (and I love connecting!)\n",
"\n",
"## More troubleshooting\n",
"\n",
"Please see the [troubleshooting](troubleshooting.ipynb) notebook in this folder for more ideas!\n",
"Please see the [troubleshooting](troubleshooting.ipynb) notebook in this folder to diagnose and fix common problems.\n",
"\n",
"## If this is old hat!\n",
"\n",
@@ -59,7 +64,7 @@
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n",
"\n",
"# If you get an error like \"NameError\" running this cell, then please head over to the troubleshooting notebook!"
"# If you get an error running this cell, then please head over to the troubleshooting notebook!"
]
},
{
@@ -75,19 +80,11 @@
"\n",
"Head over to the [troubleshooting](troubleshooting.ipynb) notebook in this folder for step by step code to identify the root cause and fix it!\n",
"\n",
"A summary of some points:\n",
"If you make a change, try restarting the \"Kernel\" (the python process sitting behind this notebook) by Kernel menu >> Restart Kernel and Clear Outputs of All Cells. Then try this notebook again, starting at the top.\n",
"\n",
"1. OpenAI takes a few minutes to register after you set up an account. If you receive an error about being over quota, try waiting a few minutes and try again.\n",
"2. You'll need to set up billing and add the minimum amount of credit at this page [here](https://platform.openai.com/settings/organization/billing/overview). OpenAI requires a minimum of $5 to get started in the U.S. right now - this might be different for your region. You'll only need to use a fraction for this course. In my view, this is well worth the investment for your education and future projects - but if you have any concerns, you can skip this and watch me using OpenAI instead. In week 3 we will start to use free open-source models!\n",
"3. Also, double check you have the right kind of API token with the right permissions. You should find it on [this webpage](https://platform.openai.com/api-keys) and it should show with Permissions of \"All\". If not, try creating another key by:\n",
"- Pressing \"Create new secret key\" on the top right\n",
"- Select **Owned by:** you, **Project:** Default project, **Permissions:** All\n",
"- Click Create secret key, and use that new key in the code and the `.env` file (it might take a few minutes to activate)\n",
"- Do a Kernel >> Restart kernel, and execute the cells in this Jupyter lab starting at the top\n",
"4. As a fallback, replace the line `openai = OpenAI()` with `openai = OpenAI(api_key=\"your-key-here\")` - while it's not recommended to hard code tokens in Jupyter lab, because then you can't share your lab with others, it's a workaround for now\n",
"5. Contact me! Message me or email ed@edwarddonner.com and we will get this to work.\n",
"Or, contact me! Message me or email ed@edwarddonner.com and we will get this to work.\n",
"\n",
"Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point."
"Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point. You can also use Ollama as a free alternative, which we discuss during Day 2."
]
},
{
@@ -106,10 +103,10 @@
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif api_key[:8]!=\"sk-proj-\":\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them\")\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
@@ -124,8 +121,8 @@
"openai = OpenAI()\n",
"\n",
"# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
"# If it STILL doesn't work (horrors!) then please see the troubleshooting notebook, ot try the below line instead:\n",
"# openai = OpenAI(api_key=\"your-key-here\")"
"# If it STILL doesn't work (horrors!) then please see the troubleshooting notebook, or try the below line instead:\n",
"# openai = OpenAI(api_key=\"your-key-here-starting-sk-proj-\")"
]
},
{
@@ -136,14 +133,12 @@
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"# If you're not familiar with Classes, check out the \"Intermediate Python\" notebook\n",
"\n",
"class Website:\n",
" \"\"\"\n",
" A utility class to represent a Website that we have scraped\n",
" \"\"\"\n",
" url: str\n",
" title: str\n",
" text: str\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
@@ -215,13 +210,23 @@
"\n",
"def user_prompt_for(website):\n",
" user_prompt = f\"You are looking at a website titled {website.title}\"\n",
" user_prompt += \"The contents of this website is as follows; \\\n",
" user_prompt += \"\\nThe contents of this website is as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26448ec4-5c00-4204-baec-7df91d11ff2e",
"metadata": {},
"outputs": [],
"source": [
"print(user_prompt_for(ed))"
]
},
{
"cell_type": "markdown",
"id": "ea211b5f-28e1-4a86-8e52-c0b7677cadcc",
@@ -255,6 +260,16 @@
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36478464-39ee-485c-9f3f-6a4e458dbc9c",
"metadata": {},
"outputs": [],
"source": [
"messages_for(ed)"
]
},
{
"cell_type": "markdown",
"id": "16f49d46-bf55-4c3e-928f-68fc0bf715b0",
@@ -370,7 +385,7 @@
"source": [
"## An extra exercise for those who enjoy web scraping\n",
"\n",
"You may notice that if you try `display_summary(\"https://openai.com\")` - it doesn't work! That's because OpenAI has a fancy website that uses Javascript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them."
"You may notice that if you try `display_summary(\"https://openai.com\")` - it doesn't work! That's because OpenAI has a fancy website that uses Javascript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)"
]
},
{

167
week1/day2 EXERCISE.ipynb Normal file
View File

@@ -0,0 +1,167 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
"metadata": {},
"source": [
"# HOMEWORK EXERCISE ASSIGNMENT\n",
"\n",
"Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n",
"\n",
"You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n",
"\n",
"**Benefits:**\n",
"1. No API charges - open-source\n",
"2. Data doesn't leave your box\n",
"\n",
"**Disadvantages:**\n",
"1. Significantly less power than Frontier Model\n",
"\n",
"## Recap on installation of Ollama\n",
"\n",
"Simply visit [ollama.com](https://ollama.com) and install!\n",
"\n",
"Once complete, the ollama server should already be running locally. \n",
"If you visit: \n",
"[http://localhost:11434/](http://localhost:11434/)\n",
"\n",
"You should see the message `Ollama is running`. \n",
"\n",
"If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n",
"Then try [http://localhost:11434/](http://localhost:11434/) again."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
"metadata": {},
"outputs": [],
"source": [
"# Constants\n",
"\n",
"OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
"HEADERS = {\"Content-Type\": \"application/json\"}\n",
"MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dac0a679-599c-441f-9bf2-ddc73d35b940",
"metadata": {},
"outputs": [],
"source": [
"# Create a messages list using the same format that we used for OpenAI\n",
"\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bb9c624-14f0-4945-a719-8ddb64f66f47",
"metadata": {},
"outputs": [],
"source": [
"payload = {\n",
" \"model\": MODEL,\n",
" \"messages\": messages,\n",
" \"stream\": False\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42b9f644-522d-4e05-a691-56e7658c0ea9",
"metadata": {},
"outputs": [],
"source": [
"response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n",
"print(response.json()['message']['content'])"
]
},
{
"cell_type": "markdown",
"id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe",
"metadata": {},
"source": [
"# Introducing the ollama package\n",
"\n",
"And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n",
"\n",
"Under the hood, it's making the same call as above to the ollama server running at localhost:11434"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7745b9c4-57dc-4867-9180-61fa5db55eb8",
"metadata": {},
"outputs": [],
"source": [
"import ollama\n",
"\n",
"response = ollama.chat(model=MODEL, messages=messages)\n",
"print(response['message']['content'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a611b05-b5b0-4c83-b82d-b3a39ffb917d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",
"metadata": {},
"source": [
"# NOW the exercise for you\n",
"\n",
"Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -28,6 +28,7 @@
"outputs": [],
"source": [
"# imports\n",
"# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt\n",
"\n",
"import os\n",
"import requests\n",
@@ -49,7 +50,13 @@
"# Initialize and constants\n",
"\n",
"load_dotenv()\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if api_key and api_key[:8]=='sk-proj-':\n",
" print(\"API key looks good so far\")\n",
"else:\n",
" print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n",
" \n",
"MODEL = 'gpt-4o-mini'\n",
"openai = OpenAI()"
]
@@ -67,11 +74,6 @@
" \"\"\"\n",
" A utility class to represent a Website that we have scraped, now with links\n",
" \"\"\"\n",
" url: str\n",
" title: str\n",
" body: str\n",
" links: List[str]\n",
" text: str\n",
"\n",
" def __init__(self, url):\n",
" self.url = url\n",
@@ -100,7 +102,7 @@
"outputs": [],
"source": [
"ed = Website(\"https://edwarddonner.com\")\n",
"print(ed.get_contents())"
"ed.links"
]
},
{
@@ -140,6 +142,16 @@
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b97e4068-97ed-4120-beae-c42105e4d59a",
"metadata": {},
"outputs": [],
"source": [
"print(link_system_prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -175,7 +187,7 @@
"source": [
"def get_links(url):\n",
" website = Website(url)\n",
" completion = openai.chat.completions.create(\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": link_system_prompt},\n",
@@ -183,10 +195,21 @@
" ],\n",
" response_format={\"type\": \"json_object\"}\n",
" )\n",
" result = completion.choices[0].message.content\n",
" result = response.choices[0].message.content\n",
" return json.loads(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74a827a0-2782-4ae5-b210-4a242a8b4cc2",
"metadata": {},
"outputs": [],
"source": [
"anthropic = Website(\"https://anthropic.com\")\n",
"anthropic.links"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -268,6 +291,16 @@
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd909e0b-1312-4ce2-a553-821e795d7572",
"metadata": {},
"outputs": [],
"source": [
"get_brochure_user_prompt(\"Anthropic\", \"https://anthropic.com\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -308,14 +341,6 @@
"with the familiar typewriter animation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcb358a4-aa7f-47ec-b2bc-67768783dfe1",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,

View File

@@ -0,0 +1,463 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
"metadata": {},
"source": [
"# EXERCISE SOLUTION\n",
"\n",
"Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n",
"\n",
"You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n",
"\n",
"**Benefits:**\n",
"1. No API charges - open-source\n",
"2. Data doesn't leave your box\n",
"\n",
"**Disadvantages:**\n",
"1. Significantly less power than Frontier Model\n",
"\n",
"## Recap on installation of Ollama\n",
"\n",
"Simply visit [ollama.com](https://ollama.com) and install!\n",
"\n",
"Once complete, the ollama server should already be running locally. \n",
"If you visit: \n",
"[http://localhost:11434/](http://localhost:11434/)\n",
"\n",
"You should see the message `Ollama is running`. \n",
"\n",
"If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n",
"Then try [http://localhost:11434/](http://localhost:11434/) again."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
"metadata": {},
"outputs": [],
"source": [
"# Constants\n",
"\n",
"MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c5e793b2-6775-426a-a139-4848291d0463",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"\n",
"class Website:\n",
" \"\"\"\n",
" A utility class to represent a Website that we have scraped\n",
" \"\"\"\n",
" url: str\n",
" title: str\n",
" text: str\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" self.url = url\n",
" response = requests.get(url)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2ef960cf-6dc2-4cda-afb3-b38be12f4c97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Home - Edward Donner\n",
"Home\n",
"Outsmart\n",
"An arena that pits LLMs against each other in a battle of diplomacy and deviousness\n",
"About\n",
"Posts\n",
"Well, hi there.\n",
"Im Ed. I like writing code and experimenting with LLMs, and hopefully youre here because you do too. I also enjoy DJing (but Im badly out of practice), amateur electronic music production (\n",
"very\n",
"amateur) and losing myself in\n",
"Hacker News\n",
", nodding my head sagely to things I only half understand.\n",
"Im the co-founder and CTO of\n",
"Nebula.io\n",
". Were applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. Im previously the founder and CEO of AI startup untapt,\n",
"acquired in 2021\n",
".\n",
"We work with groundbreaking, proprietary LLMs verticalized for talent, weve\n",
"patented\n",
"our matching model, and our award-winning platform has happy customers and tons of press coverage.\n",
"Connect\n",
"with me for more!\n",
"October 16, 2024\n",
"From Software Engineer to AI Data Scientist resources\n",
"August 6, 2024\n",
"Outsmart LLM Arena a battle of diplomacy and deviousness\n",
"June 26, 2024\n",
"Choosing the Right LLM: Toolkit and Resources\n",
"February 7, 2024\n",
"Fine-tuning an LLM on your texts: a simulation of you\n",
"Navigation\n",
"Home\n",
"Outsmart\n",
"An arena that pits LLMs against each other in a battle of diplomacy and deviousness\n",
"About\n",
"Posts\n",
"Get in touch\n",
"ed [at] edwarddonner [dot] com\n",
"www.edwarddonner.com\n",
"Follow me\n",
"LinkedIn\n",
"Twitter\n",
"Facebook\n",
"Subscribe to newsletter\n",
"Type your email…\n",
"Subscribe\n"
]
}
],
"source": [
"# Let's try one out\n",
"\n",
"ed = Website(\"https://edwarddonner.com\")\n",
"print(ed.title)\n",
"print(ed.text)"
]
},
{
"cell_type": "markdown",
"id": "6a478a0c-2c53-48ff-869c-4d08199931e1",
"metadata": {},
"source": [
"## Types of prompts\n",
"\n",
"You may know this already - but if not, you will get very familiar with it!\n",
"\n",
"Models like GPT4o have been trained to receive instructions in a particular way.\n",
"\n",
"They expect to receive:\n",
"\n",
"**A system prompt** that tells them what task they are performing and what tone they should use\n",
"\n",
"**A user prompt** -- the conversation starter that they should reply to"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "abdb8417-c5dc-44bc-9bee-2e059d162699",
"metadata": {},
"outputs": [],
"source": [
"# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.\"\n",
"\n",
"system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
"and provides a short summary, ignoring text that might be navigation related. \\\n",
"Respond in markdown.\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f0275b1b-7cfe-4f9d-abfa-7650d378da0c",
"metadata": {},
"outputs": [],
"source": [
"# A function that writes a User Prompt that asks for summaries of websites:\n",
"\n",
"def user_prompt_for(website):\n",
" user_prompt = f\"You are looking at a website titled {website.title}\"\n",
" user_prompt += \"The contents of this website is as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt"
]
},
{
"cell_type": "markdown",
"id": "ea211b5f-28e1-4a86-8e52-c0b7677cadcc",
"metadata": {},
"source": [
"## Messages\n",
"\n",
"The API from Ollama expects the same message format as OpenAI:\n",
"\n",
"```\n",
"[\n",
" {\"role\": \"system\", \"content\": \"system message goes here\"},\n",
" {\"role\": \"user\", \"content\": \"user message goes here\"}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0134dfa4-8299-48b5-b444-f2a8c3403c88",
"metadata": {},
"outputs": [],
"source": [
"# See how this function creates exactly the format above\n",
"\n",
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "markdown",
"id": "16f49d46-bf55-4c3e-928f-68fc0bf715b0",
"metadata": {},
"source": [
"## Time to bring it together - now with Ollama instead of OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "905b9919-aba7-45b5-ae65-81b3d1d78e34",
"metadata": {},
"outputs": [],
"source": [
"# And now: call the Ollama function instead of OpenAI\n",
"\n",
"def summarize(url):\n",
" website = Website(url)\n",
" messages = messages_for(website)\n",
" response = ollama.chat(model=MODEL, messages=messages)\n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "05e38d41-dfa4-4b20-9c96-c46ea75d9fb5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'**Summary**\\n\\n* Website belongs to Edward Donner, a co-founder and CTO of Nebula.io.\\n* He is the founder and CEO of AI startup untapt, which was acquired in 2021.\\n\\n**News/Announcements**\\n\\n* October 16, 2024: \"From Software Engineer to AI Data Scientist resources\" (resource list for career advancement)\\n* August 6, 2024: \"Outsmart LLM Arena a battle of diplomacy and deviousness\" (introducing the Outsmart arena, a competition between LLMs)\\n* June 26, 2024: \"Choosing the Right LLM: Toolkit and Resources\" (resource list for selecting the right LLM)\\n* February 7, 2024: \"Fine-tuning an LLM on your texts: a simulation of you\" (blog post about simulating human-like conversations with LLMs)'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"summarize(\"https://edwarddonner.com\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3d926d59-450e-4609-92ba-2d6f244f1342",
"metadata": {},
"outputs": [],
"source": [
"# A function to display this nicely in the Jupyter output, using markdown\n",
"\n",
"def display_summary(url):\n",
" summary = summarize(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3018853a-445f-41ff-9560-d925d1774b2f",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"# Summary of Edward Donner's Website\n",
"\n",
"## About the Creator\n",
"Edward Donner is a writer, code enthusiast, and co-founder/CTO of Nebula.io, an AI company that applies AI to help people discover their potential.\n",
"\n",
"## Recent Announcements and News\n",
"\n",
"* October 16, 2024: \"From Software Engineer to AI Data Scientist resources\" - a resource list for those transitioning into AI data science.\n",
"* August 6, 2024: \"Outsmart LLM Arena a battle of diplomacy and deviousness\" - an introduction to the Outsmart arena where LLMs compete against each other in diplomacy and strategy.\n",
"* June 26, 2024: \"Choosing the Right LLM: Toolkit and Resources\" - a resource list for choosing the right Large Language Model (LLM) for specific use cases.\n",
"\n",
"## Miscellaneous\n",
"\n",
"* A section about Ed's personal interests, including DJing and amateur electronic music production.\n",
"* Links to his professional profiles on LinkedIn, Twitter, Facebook, and a contact email."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_summary(\"https://edwarddonner.com\")"
]
},
{
"cell_type": "markdown",
"id": "b3bcf6f4-adce-45e9-97ad-d9a5d7a3a624",
"metadata": {},
"source": [
"# Let's try more websites\n",
"\n",
"Note that this will only work on websites that can be scraped using this simplistic approach.\n",
"\n",
"Websites that are rendered with Javascript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)\n",
"\n",
"Also Websites protected with CloudFront (and similar) may give 403 errors - many thanks Andy J for pointing this out.\n",
"\n",
"But many websites will work just fine!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "45d83403-a24c-44b5-84ac-961449b4008f",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"I can't provide information on that topic."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_summary(\"https://cnn.com\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "75e9fd40-b354-4341-991e-863ef2e59db7",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"# Website Summary: Anthropic\n",
"## Overview\n",
"\n",
"Anthropic is an AI safety and research company based in San Francisco. Their interdisciplinary team has experience across ML, physics, policy, and product.\n",
"\n",
"### News and Announcements\n",
"\n",
"* **Claude 3.5 Sonnet** is now available, featuring the most intelligent AI model.\n",
"* **Announcement**: Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku (October 22, 2024)\n",
"* **Research Update**: Constitutional AI: Harmlessness from AI Feedback (December 15, 2022) and Core Views on AI Safety: When, Why, What, and How (March 8, 2023)\n",
"\n",
"### Products and Services\n",
"\n",
"* Claude for Enterprise\n",
"* Research and development of AI systems with a focus on safety and reliability.\n",
"\n",
"### Company Information\n",
"\n",
"* Founded in San Francisco\n",
"* Interdisciplinary team with experience across ML, physics, policy, and product.\n",
"* Provides reliable and beneficial AI systems."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_summary(\"https://anthropic.com\")"
]
},
{
"cell_type": "markdown",
"id": "eeab24dc-5f90-4570-b542-b0585aca3eb6",
"metadata": {},
"source": [
"# Sharing your code\n",
"\n",
"I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.\n",
"\n",
"If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clean outputs of all cells, and then Save) for clean notebooks.\n",
"\n",
"PR instructions courtesy of an AI friend: https://chatgpt.com/share/670145d5-e8a8-8012-8f93-39ee4e248b4c"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "682eff74-55c4-4d4b-b267-703edbc293c7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,176 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 solution\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!\n",
"\n",
"After week 2 you'll be able to add a User Interface to this tool, giving you a valuable application."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [],
"source": [
"# set up environment\n",
"\n",
"load_dotenv()\n",
"openai = OpenAI()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8595807b-8ae2-4e1b-95d9-e8532142e8bb",
"metadata": {},
"outputs": [],
"source": [
"# prompts\n",
"\n",
"system_prompt = \"You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs\"\n",
"user_prompt = \"Please give a detailed explanation to the following question: \" + question"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9605cbb6-3d3f-4969-b420-7f4cae0b9328",
"metadata": {},
"outputs": [],
"source": [
"# messages\n",
"\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [],
"source": [
"# Get gpt-4o-mini to answer, with streaming\n",
"\n",
"stream = openai.chat.completions.create(model=MODEL_GPT, messages=messages,stream=True)\n",
" \n",
"response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [],
"source": [
"# Get Llama 3.2 to answer\n",
"\n",
"response = ollama.chat(model=MODEL_LLAMA, messages=messages)\n",
"reply = response['message']['content']\n",
"display(Markdown(reply))"
]
},
{
"cell_type": "markdown",
"id": "7e14bcdb-b928-4b14-961e-9f7d8c7335bf",
"metadata": {},
"source": [
"# Congratulations!\n",
"\n",
"You could make it better by taking in the question using \n",
"`my_question = input(\"Please enter your question:\")`\n",
"\n",
"And then creating the prompts and making the calls interactively."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da663d73-dd2a-4fff-84df-2209cf2b330b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -23,11 +23,18 @@
"\n",
"Try running the next cell (click in the cell under this one and hit shift+return).\n",
"\n",
"If this doesn't work, then you're likely not running in an \"activated\" environment. Please check back in the [README](../README.md) for setting up the Anaconda (or virtualenv) environment and activating it, before running `jupyter lab`.\n",
"If this gives an error, then you're likely not running in an \"activated\" environment. Please check back in Part 5 of the SETUP guide for [PC](../SETUP-PC.md) or [Mac](../SETUP-mac.md) for setting up the Anaconda (or virtualenv) environment and activating it, before running `jupyter lab`.\n",
"\n",
"If you look in the Anaconda window (on PC) or the Terminal window (on Mac), you should see `(llms)` in your prompt where you launch `jupyter lab` - that's your clue that the llms environment is activated.\n",
"If you look in the Anaconda prompt (PC) or the Terminal (Mac), you should see `(llms)` in your prompt where you launch `jupyter lab` - that's your clue that the llms environment is activated.\n",
"\n",
"You might also need to restart the \"Kernel\" (the python process used to run this notebook). To do that, go to the Kernel menu and select \"Restart Kernel and Clear Outputs Of All Cells..\""
"If you are in an activated environment, the next thing to try is to restart everything:\n",
"1. Close down all Jupyter windows, like this\n",
"2. Exit all command prompts / Terminals / Anaconda\n",
"3. Repeat Part 5 from the SETUP instructions to begin a new activated environment and launch jupyter lab\n",
"4. Kernel menu >> Restart Kernel and Clear Outputs of All Cells\n",
"5. Come back to this notebook and try the cell below again.\n",
"\n",
"If **that** doesn't work, then please contact me! I'll respond quickly, and we'll figure it out."
]
},
{
@@ -37,7 +44,8 @@
"metadata": {},
"outputs": [],
"source": [
"# This should run with no output - no import errors:\n",
"# This should run with no output - no import errors.\n",
"# Import errors might indicate that you started jupyter lab without your environment activated? See SETUP part 5.\n",
"\n",
"from openai import OpenAI"
]
@@ -200,7 +208,7 @@
"Then there's something up with your API key!\n",
"\n",
"First check this webpage to make sure you have a positive credit balance.\n",
"OpenAI requires that you have a positive credit balance and it has minimums. My salespitch for OpenAI is that this is well worth it for your education: for less than the price of a music album, you will build so much valuable commercial experience. But it's not required for this course at all; you can watch me running OpenAI code, and then wait until we get to open source models in week 3.\n",
"OpenAI requires that you have a positive credit balance and it has minimums. My salespitch for OpenAI is that this is well worth it for your education: for less than the price of a music album, you will build so much valuable commercial experience. But it's not required for this course at all; you can watch me running OpenAI code, and then wait until we get to open source models in week 3. Also, I'll show you how to use Ollama to run open-source models locally.\n",
"\n",
"https://platform.openai.com/settings/organization/billing/overview\n",
"\n",
@@ -208,7 +216,7 @@
"\n",
"https://platform.openai.com/api-keys\n",
"\n",
"And note that sometimes OpenAI seems to take a few minutes to give you access after you try.\n",
"Sometimes OpenAI may take a few minutes to give you access after you try.\n",
"\n",
"## If all else fails:\n",
"\n",

104
week1/week1 EXERCISE.ipynb Normal file
View File

@@ -0,0 +1,104 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [],
"source": [
"# set up environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [],
"source": [
"# Get gpt-4o-mini to answer, with streaming"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [],
"source": [
"# Get Llama 3.2 to answer"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}