{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question\n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports (requests and BeautifulSoup are not used in this notebook)\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import ollama\n",
"from IPython.display import Markdown, clear_output, display"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
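{
"cell_type": "code",
"execution_count": 8,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [],
"source": [
"# set up environment: load OPENAI_API_KEY from .env, then create the client\n",
"load_dotenv(override=True)\n",
"apikey = os.getenv(\"OPENAI_API_KEY\")\n",
"if not apikey:\n",
"    print(\"OPENAI_API_KEY is not set; check your .env file\")\n",
"openai = OpenAI()  # reads OPENAI_API_KEY from the environment"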
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "d9630ca0-fa23-4f80-8c52-4c51b0f25534",
"metadata": {},
"outputs": [],
"source": [
"messages = [\n",
"    {\n",
"        \"role\": \"system\",\n",
"        \"content\": '''You are a technical adviser. The student is learning LLM engineering,\n",
"        and you will be asked to explain a few lines of code with an example,\n",
"        mostly in Python.'''\n",
"    },\n",
"    {\n",
"        \"role\": \"user\",\n",
"        \"content\": question\n",
"    }\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"This line of code uses a generator in Python to yield values from a set comprehension. Let's break it down:\n",
"\n",
"1. **`{book.get(\"author\") for book in books if book.get(\"author\")}`**:\n",
"   - This is a set comprehension that creates a set of unique authors from a collection called `books`.\n",
"   - `books` is expected to be a list (or any iterable) where each item (called `book`) is likely a dictionary.\n",
"   - The expression `book.get(\"author\")` attempts to retrieve the value associated with the key `\"author\"` from each `book` dictionary.\n",
"   - The `if book.get(\"author\")` condition filters out any books where the `author` key does not exist or is `None`, ensuring only valid author names are included in the set.\n",
"   - Since it's a set comprehension, any duplicate authors will be automatically removed, resulting in a set of unique authors.\n",
"\n",
"2. **`yield from`**:\n",
"   - The `yield from` syntax is used within a generator function to yield all values from another iterable. In this case, it is yielding each item from the set created by the comprehension.\n",
"   - This means that when this generator function is called, it will produce each unique author found in the `books` iterable one at a time.\n",
"\n",
"### Summary\n",
"The line of code effectively constructs a generator that will yield unique authors from a list of book dictionaries, where each dictionary is expected to contain an `\"author\"` key. The use of `yield from` allows the generator to yield each author in the set without further iteration code. This approach is efficient and neatly combines filtering, uniqueness, and yielding into a single line of code."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# stream the GPT response and re-render the accumulated Markdown on each chunk\n",
"stream = openai.chat.completions.create(\n",
"    model=MODEL_GPT,\n",
"    messages=messages,\n",
"    stream=True)\n",
"response = \"\"\n",
"for chunk in stream:\n",
"    # some chunks carry no choices or no text, so skip them\n",
"    if chunk.choices and chunk.choices[0].delta.content:\n",
"        response += chunk.choices[0].delta.content\n",
"        clear_output(wait=True)\n",
"        display(Markdown(response))"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "4d482c69-b61a-4a94-84df-73f1d97a4419",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"Let's break down this line of code:\n",
"\n",
"**Code Analysis**\n",
"\n",
"```python\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"```\n",
"\n",
"**Explanation**\n",
"\n",
"This is a Python generator expression that uses the `yield from` syntax.\n",
"\n",
"Here's what it does:\n",
"\n",
"1. **List Comprehension**: `{...}` is a list comprehension, which generates a new list containing the results of an expression applied to each item in the input iterable (`books`).\n",
"2. **Filtering**: The condition `if book.get(\"author\")` filters out any items from the `books` list where `\"author\"` is not present as a key-value pair.\n",
"3. **Dictionary Lookup**: `.get(\"author\")` looks up the value associated with the key `\"author\"` in each dictionary (`book`) and returns it if found, or `None` otherwise.\n",
"\n",
"**What does `yield from` do?**\n",
"\n",
"The `yield from` keyword is used to \"forward\" the iteration of another generator (or iterable) into this one. In other words, instead of creating a new list containing all the values generated by the inner iterator (`{book.get(\"author\") for book in books if book.get(\"author\")}`), it yields each value **one at a time**, as if you were iterating over the original `books` list.\n",
"\n",
"**Why is this useful?**\n",
"\n",
"By using `yield from`, we can create a generator that:\n",
"\n",
"* Only generates values when they are actually needed (i.e., only when an iteration is requested).\n",
"* Does not consume extra memory for creating an intermediate list.\n",
"\n",
"This makes it more memory-efficient, especially when dealing with large datasets or infinite iterations.\n",
"\n",
"**Example**\n",
"\n",
"Suppose we have a list of books with authors:\n",
"```python\n",
"books = [\n",
" {\"title\": \"Book 1\", \"author\": \"Author A\"},\n",
" {\"title\": \"Book 2\", \"author\": None},\n",
" {\"title\": \"Book 3\", \"author\": \"Author C\"}\n",
"]\n",
"```\n",
"If we apply the generator expression to this list, it would yield:\n",
"```python\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"```\n",
"The output would be: `['Author A', 'Author C']`\n",
"\n",
"Note that the second book (\"Book 2\") is skipped because its author is `None`."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# stream the Llama response from the local Ollama server in the same way\n",
"text = \"\"\n",
"for obj in ollama.chat(\n",
"        model=MODEL_LLAMA,\n",
"        messages=messages,\n",
"        stream=True):\n",
"    text += obj.message.content\n",
"    clear_output(wait=True)\n",
"    display(Markdown(text))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef1194fc-3c9c-432c-86cc-f77f33916188",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}