{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with the OpenAI API, and also with Ollama, build a tool that takes a technical question \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"from IPython.display import display, update_display, Markdown\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI \n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API key found and looks good so far!\n"
]
}
],
"source": [
"# set up environment\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\"\n",
"\n",
"system_prompt = \"\"\"\n",
"You are a Python expert skilled in explaining code and helping people understand it.\n",
"Make sure to explain the code in a way that is easy to understand.\n",
"Keep your explanations concise and to the point.\n",
"Make sure to make it beginner friendly.\n",
"\"\"\"\n",
"\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\" : question}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"Certainly! Let's break down the code snippet you provided:\n",
"\n",
"```python\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"```\n",
"\n",
"### Explanation:\n",
"\n",
"1. **`{book.get(\"author\") for book in books if book.get(\"author\")}`**:\n",
" - This part is a **set comprehension**. A set comprehension is used to create a set (a collection of unique items) in Python.\n",
" - It iterates over a list called `books` (which is expected to be a collection of book data).\n",
" - For each `book`, it tries to get the value associated with the key `\"author\"` using `book.get(\"author\")`. \n",
" - If the author exists (meaning it's not `None` or an empty string), it adds that author to the set.\n",
" - The result is a set of unique author names from the list of books.\n",
"\n",
"2. **`yield from`**:\n",
" - This keyword is used in generator functions. It allows you to yield all values from an iterable (like a set or a list) one by one.\n",
" - In this case, it means that for each unique author in the set that was created, a value will be returned each time the generator is iterated over. \n",
"\n",
"### Why It Matters:\n",
"- **Efficiency**: By using a set, it ensures that each author is unique and there are no duplicates.\n",
"- **Generator Function**: The use of `yield from` indicates that this code is likely part of a generator function, making it memory efficient since it generates values on-the-fly rather than storing them all in memory.\n",
"\n",
"### Summary:\n",
"This code snippet efficiently gathers unique authors from a list of books and yields each author one at a time when the generator is iterated upon. It's a concise way to collect and return unique data from potentially large datasets without memory overhead."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Get gpt-4o-mini to answer, with streaming\n",
"openai = OpenAI()\n",
"\n",
"stream = openai.chat.completions.create(\n",
"    model=MODEL_GPT,\n",
"    messages=messages,\n",
"    stream=True\n",
")\n",
"\n",
"gpt_response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"\n",
"for chunk in stream:\n",
" gpt_response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(gpt_response), display_id=display_handle.display_id)\n"
]
},
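{
"cell_type": "code",
"execution_count": null,
"id": "b2e1d9aa-demo-yield-from",
"metadata": {},
"outputs": [],
"source": [
"# A quick, runnable sanity check of the snippet explained above.\n",
"# Note: this sample `books` data is made up for illustration, not from the course.\n",
"books = [\n",
"    {\"title\": \"Book A\", \"author\": \"Alice\"},\n",
"    {\"title\": \"Book B\", \"author\": \"Bob\"},\n",
"    {\"title\": \"Book C\"},                      # no author key -> filtered out\n",
"    {\"title\": \"Book D\", \"author\": \"Alice\"},  # duplicate author -> deduplicated\n",
"]\n",
"\n",
"def unique_authors(books):\n",
"    # The set comprehension deduplicates; yield from emits each author one at a time\n",
"    yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\n",
"print(sorted(unique_authors(books)))  # sorted because set iteration order is arbitrary\n"
]
},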
{
"cell_type": "code",
"execution_count": 20,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"**Explanation:**\n",
"\n",
"This code snippet is using a technique called \"generator expression\" to extract author names from a list of dictionaries. Let's break it down:\n",
"\n",
"* `yield from`: This keyword is used to yield values from an iterable (in this case, a generator).\n",
"* `{book.get(\"author\") for book in books if book.get(\"author\")}`: This is a tuple-comprehension that performs the following operations:\n",
"\t+ `for book in books`: Iterates over each dictionary (book) in the `books` list.\n",
"\t+ `if book.get(\"author\")`: Filters out dictionaries that don't have an author name. It uses the `.get()` method to safely retrieve a value from the dictionary, avoiding a KeyError if \"author\" doesn't exist.\n",
"\t+ `book.get(\"author\")`: Extracts the author's name (or None if it doesn't exist) from each filtered dictionary.\n",
"* `{...}`: This is an empty dictionary. It's used as a container to hold the extracted values.\n",
"\n",
"**Why?**\n",
"\n",
"The purpose of this code snippet is likely to provide a list of authors' names, where:\n",
"\n",
"1. Not all books have an author name (in which case `None` will be included in the result).\n",
"2. The input data (`books`) might contain dictionaries with only certain keys (e.g., `title`, `description`, etc.), while this code assumes that each book dictionary must have a key named \"author\".\n",
"\n",
"**Example use cases:**\n",
"\n",
"When working with large datasets or data structures, generator expressions like this one can help you:\n",
"\n",
"* Process data in chunks, avoiding excessive memory consumption\n",
"* Work with complex, nested data structures (in this case, a list of dictionaries)\n",
"* Improve code readability by encapsulating logic within a single expression\n",
"\n",
"Keep in mind that the resulting values will be yielded on-the-fly as you iterate over them, rather than having all authors' names stored in memory at once."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Get Llama 3.2 to answer\n",
"OLLAMA_BASE_URL = \"http://localhost:11434/v1\"\n",
"ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')\n",
"\n",
"stream = ollama.chat.completions.create(\n",
"    model=MODEL_LLAMA,\n",
"    messages=messages,\n",
"    stream=True\n",
")\n",
"\n",
"ollama_response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"\n",
"for chunk in stream:\n",
" ollama_response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(ollama_response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "08c5f646",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"## 📊 Comparison Results\n",
"\n",
"Here's a side-by-side rating and justification for each criterion.\n",
"\n",
"Response A (GPT-4o-mini)\n",
"\n",
"- Beginner Friendliness: 4/5\n",
" - Why: It explains the two main parts (set comprehension and yield from) in straightforward terms and uses bullet points. It's easy to follow for someone with basic Python knowledge. A minor improvement could be more emphasis on what a generator function is and when you'd use this pattern.\n",
"\n",
"- Tackles Main Point: 4/5\n",
" - Why: It correctly identifies the core ideas: using a set comprehension to collect unique authors and using yield from to yield items from that iterable. It also notes potential memory-related considerations and the generator nature. The nuance about memory efficiency could be clarified (the set is built in memory first), but the core point is solid.\n",
"\n",
"- Clear Examples: 4/5\n",
" - Why: It provides a succinct, concrete breakdown of each piece of the expression and a concise summary of the result. It would be even better with a short concrete example of input and the yielded values, but the explanation is clear and actionable.\n",
"\n",
"Response B (Llama 3.2)\n",
"\n",
"- Beginner Friendliness: 2/5\n",
" - Why: Contains several incorrect or misleading statements (calls the comprehension a “tuple-comprehension,” says the container is an empty dictionary, misdescribes yield from usage, and incorrectly claims that None would be yielded). These errors can confuse beginners and propagate misconceptions.\n",
"\n",
"- Tackles Main Point: 2/5\n",
" - Why: Several core misunderstandings (generator expression vs set comprehension, the effect of the if filter, and the nature of the resulting container) obscure the main concept. The explanation diverges from what the code does, reducing its usefulness as a core clarifier.\n",
"\n",
"- Clear Examples: 2/5\n",
" - Why: While it attempts to describe use cases and benefits of generator-like processing, the inaccuracies undermine the usefulness of the examples. It also incorrectly describes the resulting container and behavior of the code, making the examples less reliable for learning.\n",
"\n",
"Overall recommendation\n",
"\n",
"- Better for a beginner: Response A\n",
" - Rationale: Response A is accurate, coherent, and focused on the essential ideas (set comprehension, deduplication, and how yield from works). It avoids the factual errors present in Response B and offers a clearer path to understanding the snippet. Response B contains multiple conceptual mistakes that could mislead beginners. If you want to improve Response B, you'd need to correct terminology (set vs. dict, generator vs. set comprehension) and fix the behavior description (no None is yielded due to the filter)."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Compare the two responses using GPT-5-nano\n",
"comparison_prompt = f\"\"\"\n",
"You are an expert evaluator of educational content. Compare the following two explanations of the same Python code and rank them on a scale of 1-5 for each criterion:\n",
"\n",
"**Criteria:**\n",
"1. **Beginner Friendliness**: How easy is it for a beginner to understand?\n",
"2. **Tackles Main Point**: How well does it address the core concept?\n",
"3. **Clear Examples**: How effective are the examples and explanations?\n",
"\n",
"**Response A (GPT-4o-mini):**\n",
"{gpt_response}\n",
"\n",
"**Response B (Llama 3.2):**\n",
"{ollama_response}\n",
"\n",
"**Instructions:**\n",
"- Rate each response on each criterion from 1 (poor) to 5 (excellent)\n",
"- Provide a brief justification for each rating\n",
"- Give an overall recommendation on which response is better for a beginner\n",
"\n",
"Please format your response clearly with the ratings and justifications.\n",
"\"\"\"\n",
"\n",
"comparison_messages = [\n",
" {\"role\": \"user\", \"content\": comparison_prompt}\n",
"]\n",
"\n",
"stream = openai.chat.completions.create(\n",
" model=\"gpt-5-nano\",\n",
" messages=comparison_messages,\n",
" stream=True\n",
")\n",
"\n",
"comparison_response = \"\"\n",
"display_handle = display(Markdown(\"## 📊 Comparison Results\\n\\n\"), display_id=True)\n",
"\n",
"for chunk in stream:\n",
" comparison_response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(\"## 📊 Comparison Results\\n\\n\" + comparison_response), display_id=display_handle.display_id)\n"
]
},
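{
"cell_type": "code",
"execution_count": null,
"id": "c3f2e8bb-filter-check",
"metadata": {},
"outputs": [],
"source": [
"# Verify the judge's correction of Response B: the if-filter means no None is yielded.\n",
"# Note: this sample `books` data is made up for illustration, not from the course.\n",
"books = [{\"title\": \"No Author\"}, {\"title\": \"Has Author\", \"author\": \"Carol\"}]\n",
"\n",
"authors = list({book.get(\"author\") for book in books if book.get(\"author\")})\n",
"print(authors)           # only 'Carol'; the book without an author is skipped\n",
"print(None in authors)   # False - missing authors never reach the set\n"
]
},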
{
"cell_type": "code",
"execution_count": null,
"id": "13deb6e6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}