{ "cells": [ { "cell_type": "markdown", "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", "metadata": {}, "source": [ "# End of week 1 exercise\n", "\n", "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" ] }, { "cell_type": "code", "execution_count": null, "id": "c1070317-3ed9-4659-abe3-828943230e03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setup Successful!\n" ] } ], "source": [ "# Imports and Setup\n", "import os\n", "import json\n", "from dotenv import load_dotenv\n", "from openai import OpenAI\n", "from IPython.display import Markdown, display, update_display\n", "import ollama\n", "\n", "# Load environment variables\n", "load_dotenv(override=True)\n", "\n", "# Constants\n", "MODEL_GPT = 'gpt-4o-mini'\n", "MODEL_LLAMA = 'llama3.2'\n", "\n", "print(\"Setup Successful!\")" ] }, { "cell_type": "code", "execution_count": null, "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Question to analyze:\n", "\n", "Please explain what this code does and why:\n", "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", "\n" ] } ], "source": [ "# Technical Question - You can modify this\n", "question = \"\"\"\n", "Please explain what this code does and why:\n", "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", "\"\"\"\n", "\n", "print(\"Question to analyze:\")\n", "print(question)\n", "\n", "# prompts\n", "system_prompt = \"You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs\"\n", "user_prompt = \"Please give a detailed explanation to the following question: \" + question\n", "\n", "# messages\n", "messages = [\n", " {\"role\": \"system\", \"content\": system_prompt},\n", " {\"role\": \"user\", \"content\": user_prompt}\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "## GPT-4o-mini Response:\n", "Certainly! Let's break down the provided code snippet step by step.\n", "\n", "### Code Analysis\n", "```python\n", "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", "```\n", "\n", "This code snippet is a generator expression, which is intended to yield values from a set comprehension. Let's clarify each part of the expression:\n", "\n", "1. **Set Comprehension**:\n", " - `{book.get(\"author\") for book in books if book.get(\"author\")}` is a set comprehension.\n", " - This means it creates a set of unique authors from a collection called `books`.\n", "\n", "2. **`books`**:\n", " - `books` is expected to be an iterable (like a list) that contains dictionaries. Each dictionary represents a book and may contain various keys, such as \"author\".\n", "\n", "3. **`book.get(\"author\")`**:\n", " - For each `book` in the `books` iterable, `book.get(\"author\")` tries to access the value associated with the key `\"author\"`.\n", " - The `.get()` method returns the value for the given key if it exists; otherwise, it returns `None`.\n", "\n", "4. 
**Filter Condition**: \n", " - The expression includes an `if book.get(\"author\")` filter, which ensures that only books with a defined author (i.e., `None` or an empty string are excluded) are considered.\n", " - This means that if the author is not provided, that book will not contribute to the final set.\n", "\n", "5. **Set Creation**:\n", " - The result of the set comprehension is a set of unique author names from the list of books. Since sets automatically ensure uniqueness, duplicates will be filtered out.\n", "\n", "6. **`yield from`**:\n", " - The `yield from` statement is used within a generator function. It allows the generator to yield all values from the given iterable (in this case, our created set).\n", " - This means that the values generated (i.e., unique authors) can be iterated over one by one.\n", "\n", "### Purpose and Use Case\n", "The purpose of this code snippet is to produce a generator that emits the unique author names of books from the `books` collection. This is useful in scenarios where you want to streamline the retrieval of distinct authors without immediately materializing them into a list. You can consume these unique authors one at a time efficiently, which is particularly beneficial when dealing with a large dataset.\n", "\n", "### Example\n", "Consider the following example to illustrate how this might work:\n", "\n", "```python\n", "books = [\n", " {\"title\": \"Book1\", \"author\": \"AuthorA\"},\n", " {\"title\": \"Book2\", \"author\": \"AuthorB\"},\n", " {\"title\": \"Book3\", \"author\": \"AuthorA\"}, # Duplicate author\n", " {\"title\": \"Book4\"}, # No author\n", " {\"title\": \"Book5\", \"author\": \"AuthorC\"}\n", "]\n", "\n", "# Let's say this code is inside a generator function\n", "def unique_authors(books):\n", " yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", "\n", "for author in unique_authors(books):\n", " print(author)\n", "```\n", "### Output\n", "```\n", "AuthorA\n", "AuthorB\n", "AuthorC\n", "```\n", "\n", "### Summary\n", "This code snippet creates a generator that yields unique authors of books, omitting any entries where the author is not provided. This demonstrates an efficient and Pythonic way to handle data extraction, particularly with potentially sparse datasets." ], "text/plain": [ "<IPython.core.display.Markdown object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get gpt-4o-mini to answer, with streaming\n", "api_key = os.getenv('OPENAI_API_KEY')\n", "\n", "# Initialize OpenAI client with the key loaded from .env\n", "openai = OpenAI(api_key=api_key)\n", "\n", "stream = openai.chat.completions.create(model=MODEL_GPT, messages=messages, stream=True)\n", "\n", "response = \"\"\n", "\n", "display_handle = display(Markdown(\"\"), display_id=True)\n", "\n", "# Accumulate streamed chunks and refresh the Markdown display as they arrive\n", "for chunk in stream:\n", "    if chunk.choices[0].delta.content:\n", "        response += chunk.choices[0].delta.content\n", "        update_display(Markdown(f\"## GPT-4o-mini Response:\\n{response}\"), display_id=display_handle.display_id)" ] }, { "cell_type": "code", "execution_count": 11, "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "The given Python expression appears as part of an asynchronous generator, typically used with coroutines like those found within the `async`/`await` syntax introduced in Python 3.5+:\n", "\n", "```python\n", "yield from {book.get('author') for book in books if book.get('author')}\n", "```\n", "\n", "Here's a breakdown of what this code does, line by line and overall explanation:\n", "\n", "1. 
`{book.get('author') for book in books if book.get('author')}` is a set comprehension that iterates through each `book` object (assumed to be dictionary-like since it uses square brackets notation) within the `books` variable, which should also contain such objects as its elements based on context provided herein:\n", " - For every iteration of this generator expression, if `'author'` exists in a book's key set (`book.keys()`), then that value (presumably the name of an author associated with their corresponding `book`) is included within the resulting comprehension/set; otherwise it skips to the next item since there isn't one without `'author'`.\n", " - The `.get` method returns a specified key’s value from dictionary-like objects, but if that key doesn't exist, instead of causing an error (as would be typical with direct indexing), this expression safely retrieves `None`, or another default return type you specify as the second argument to `.get()` which isn't shown here.\n", " - Set comprehension is a construct for creating sets directly from iterables using set-building syntax (`{}`). Note that it inherently discards duplicates (if any), but this does not seem relevant since we are assuming books will uniquely have author information in the context of its key presence or absence, rather than repetitive entries.\n", " \n", "2. `yield from` is a statement used with asynchronous generators (`async def`) that handles yielding values and delegating further execution within nested coroutines: \n", " - Its function here seems to be sending each author's name (extracted by the generator expression before) back into this outercoroutine. The `yield from` statement thus passes control over these names directly as output of its own operation, rather than managing an internal sequence or iterable in a traditional manner with for-loops and appending to lists inside coroutines (which may result in blocking behavior).\n", " - In this expression's specific case without `async`/`await`, it looks like the code intends to simulate asynchronous yielding by passing values from an internal generator back out. However, proper usage would require surrounding with async function decorators and using await as needed for actual I/O-bound or network operations within a coroutine workflow context; this snippet in isolation does not directly demonstrate that behavior but instead presents a pattern resembling how yielding could be structured should it be part of an asynchronous generator expression.\n", " - It's worth mentioning, though `yield from` isn't typically used with set comprehensions or non-coroutine functions as these expressions cannot 'receive values.' 
Instead, this construct suggests a conceptual approach where each found author is yielded one by one in what would be the sequence of execution within an asynchronous coroutine.\n", " - Given that `yield from` isn't directly compatible with set comprehensions (without modification and appropriate context), it seems we might have encountered syntactical confusion or a misplacement here, since normally you wouldn’t see this in standalone Python code outside the scope of coroutine functions.\n", " \n", "Assuming that `books` is an iterable over dictionary-like objects (which may contain author information), and if one were to translate typical synchronous usage into asynchronous semantics or consider using a generator, then we'd instead see something like this for proper async operation:\n", "\n", "```python\n", "async def find_authors():\n", " authors = set()\n", " async for book in books: # Assuming `books` can be an iterable of awaitables (e.g., coroutines) or other asynchronous generators\n", " author = book.get('author')\n", " if author is not None:\n", " await asyncio.sleep(0) # Yield control back to the event loop, simulate async I/O operation here with `await`ing a sleep call for example purposes only (in reality this would typically handle some real asynchronous task like fetching data from an external API). Then we'd yield using 'yield':\n", " await asyncio0.sleep(2) # This line is placeholder logic and wouldn't execute without async decorators, but it serves to illustrate the use of `await` alongside a coroutine function:\n", " authors.add(author)\n", " return authors\n", "```\n", "In this modified version suitable for an asynchronous context (and with necessary adjustments): \n", "- This would be inside an `@async def find_authors()` decorated async generator/coroutine, and the `yield` keyword is used to temporarily pause execution until another coroutine or future calls its `.send(None)` method. The example uses a placeholder sleep call (`await asyncio.sleep(2)`) for demonstration purposes only; in practice one might use non-blocking I/O operations such as reading from files, network responses etc., within an async function decorated with `@async def`.\n", " \n", "It is crucial to note that the original expression provided seems like a pseudocode representation of how we could structure asynchronous behavior using `yield` and comprehensions if it were actually part of coroutine code in Python 3.5+, but isn't syntactically correct or conventionally used outside such contexts due to misunderstandings about yielding semantics from set operations without await statements (or decorators)." 
], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get Llama 3.2 to answer\n", "response = ollama.chat(model=MODEL_LLAMA, messages=messages)\n", "\n", "reply = response['message']['content']\n", "\n", "display(Markdown(reply))" ] }, { "cell_type": "code", "execution_count": null, "id": "d1f8aa0a", "metadata": {}, "outputs": [], "source": [ "# Week 1 Learnings Summary\n", "\n", "summary = \"\"\"\n", "## Week 1 Learnings Demonstrated\n", "\n", "### ✅ Day 1 - Web Scraping & API Integration\n", "- **BeautifulSoup** for HTML parsing\n", "- **Requests** for HTTP calls\n", "- **OpenAI API** integration\n", "- **SSL certificate** handling for Windows\n", "\n", "### ✅ Day 2 - Chat Completions API & Ollama\n", "- **Chat Completions API** understanding\n", "- **OpenAI-compatible endpoints** (Ollama)\n", "- **Model comparison** techniques\n", "- **Streaming responses** implementation\n", "\n", "### ✅ Day 4 - Tokenization & Cost Management\n", "- **tiktoken** for token counting\n", "- **Cost estimation** strategies\n", "- **Text chunking** techniques\n", "- **Token-aware** processing\n", "\n", "### ✅ Day 5 - Business Solutions\n", "- **Intelligent link selection** using LLM\n", "- **Multi-page content** aggregation\n", "- **Professional brochure** generation\n", "- **Error handling** and robustness\n", "\n", "### ✅ Week 1 Exercise - Technical Question Answerer\n", "- **Streaming responses** from OpenAI\n", "- **Local inference** with Ollama\n", "- **Side-by-side comparison** of models\n", "- **Error handling** for both APIs\n", "\n", "## Key Skills Acquired:\n", "1. **API Integration** - OpenAI, Ollama, web scraping\n", "2. **Model Comparison** - Understanding different LLM capabilities\n", "3. **Streaming** - Real-time response display\n", "4. **Error Handling** - Robust application design\n", "5. **Business Applications** - Practical LLM implementations\n", "\"\"\"\n", "\n", "display(Markdown(summary))" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 5 }