LLM_Engineering_OLD/community-contributions/abdoul/week_one_ excercise.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
   "metadata": {},
   "source": [
    "# End of week 1 exercise\n",
    "\n",
    "To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question  \n",
    "and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1070317-3ed9-4659-abe3-828943230e03",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "from openai import OpenAI\n",
    "from dotenv import load_dotenv\n",
    "from typing import Optional, Literal\n",
    "from IPython.display import display, Markdown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# constants\n",
    "\n",
    "MODEL_GPT = 'gpt-4o-mini'\n",
    "MODEL_LLAMA = 'llama3.2'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# set up environment\n",
    "load_dotenv()\n",
    "\n",
    "class Model:\n",
    "    def __init__(self):\n",
    "        self.client_oai = OpenAI(api_key=os.getenv(\"OPENAI_KEY\"))\n",
    "        # Fall back to Ollama's default local address when OLLAMA_URL is unset\n",
    "        base_url = os.getenv(\"OLLAMA_URL\", \"http://localhost:11434\").rstrip(\"/\")\n",
    "        self.ollama_base_url = f\"{base_url}/api/chat\"\n",
    "\n",
    "    def _prompt_llama(self, text: str):\n",
    "        # Ollama streams its reply as one JSON object per line\n",
    "        response = requests.post(\n",
    "            self.ollama_base_url,\n",
    "            json={\n",
    "                \"model\": MODEL_LLAMA,\n",
    "                \"messages\": [{\"role\": \"user\", \"content\": text}],\n",
    "                \"stream\": True\n",
    "            },\n",
    "            stream=True\n",
    "        )\n",
    "        # Fail early on HTTP errors instead of trying to parse an error body\n",
    "        response.raise_for_status()\n",
    "        for line in response.iter_lines():\n",
    "            if not line:\n",
    "                continue\n",
    "\n",
    "            data = line.decode(\"utf-8\").strip()\n",
    "            if not data:\n",
    "                continue\n",
    "\n",
    "            # Tolerate SSE-style \"data:\" prefixes in case a proxy adds them\n",
    "            if data.startswith(\"data:\"):\n",
    "                data = data[5:].strip()\n",
    "\n",
    "            yield data\n",
    "\n",
    "    def _prompt_oai(self, question: str):\n",
    "        stream = self.client_oai.chat.completions.create(\n",
    "            model=MODEL_GPT,\n",
    "            messages=[\n",
    "                {\n",
    "                    \"role\": \"system\",\n",
    "                    \"content\": (\n",
    "                        \"You are an advanced reasoning and explanation engine. \"\n",
    "                        \"You write with precision, clarity, and conciseness. \"\n",
    "                        \"You can explain Python, algorithms, code design, and systems-level behavior \"\n",
    "                        \"with technical rigor, while being straight to the point.\"\n",
    "                    ),\n",
    "                },\n",
    "                {\"role\": \"user\", \"content\": question},\n",
    "            ],\n",
    "            stream=True,\n",
    "        )\n",
    "        return stream\n",
    "\n",
    "    def prompt(self, question: str, model: Literal[\"gpt-4o-mini\", \"llama3.2\"] = MODEL_GPT):\n",
    "        # Yield chunks as they arrive, then render the full answer as Markdown\n",
    "        buffer = []\n",
    "        if \"gpt\" in model:\n",
    "            for event in self._prompt_oai(question):\n",
    "                if event.choices and event.choices[0].delta.content:\n",
    "                    chunk = event.choices[0].delta.content\n",
    "                    buffer.append(chunk)\n",
    "                    yield chunk\n",
    "        else:\n",
    "            for chunk in self._prompt_llama(question):\n",
    "                try:\n",
    "                    data = json.loads(chunk)\n",
    "                except json.JSONDecodeError as e:\n",
    "                    # Skip malformed lines but keep streaming the rest\n",
    "                    print(\"An error occurred\", e)\n",
    "                    continue\n",
    "\n",
    "                content = data.get(\"message\", {}).get(\"content\", \"\")\n",
    "                if content:\n",
    "                    buffer.append(content)\n",
    "                    yield content\n",
    "\n",
    "        display(Markdown(\"\".join(buffer)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
   "metadata": {},
   "outputs": [],
   "source": [
    "# here is the question; type over this to ask something new\n",
    "question = \"\"\"\n",
    "Please explain what this code does and why:\n",
    "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The provided code snippet is a Python expression that uses `yield from` with a set comprehension. Here’s a breakdown of what it does:\n",
      "\n",
      "1. **Set Comprehension**: The expression `{book.get(\"author\") for book in books if book.get(\"author\")}` creates a set of unique authors extracted from a list (or any iterable) called `books`. \n",
      "    - `book.get(\"author\")` retrieves the value associated with the key `\"author\"` from each `book` dictionary.\n",
      "    - The `if book.get(\"author\")` condition filters out any books that do not have an author (i.e., where the author value is `None` or an empty string). \n",
      "\n",
      "2. **Yield from**: The `yield from` statement is used in a generator function to yield all values from the iterable that follows it. In this case, it yields the unique authors produced by the set comprehension.\n",
      "\n",
      "### Why Use This Code?\n",
      "\n",
      "- **Uniqueness**: Using a set comprehension ensures that only unique authors are collected, automatically eliminating duplicates.\n",
      "- **Efficiency**: The code succinctly processes the list of books and yields only the relevant data (authors) directly from the generator, making it memory-efficient and lazy.\n",
      "- **Readability**: The use of `yield from` keeps the code clean and avoids the need to create an intermediate list before yielding authors.\n",
      "\n",
      "### Example:\n",
      "\n",
      "Given a list of books represented as dictionaries:\n",
      "\n",
      "```python\n",
      "books = [\n",
      "    {\"title\": \"Book 1\", \"author\": \"Author A\"},\n",
      "    {\"title\": \"Book 2\", \"author\": \"Author B\"},\n",
      "    {\"title\": \"Book 3\", \"author\": \"Author A\"},\n",
      "    {\"title\": \"Book 4\", \"author\": None},\n",
      "]\n",
      "```\n",
      "\n",
      "The code would yield `Author A` and `Author B`, iterating over each author only once.\n",
      "\n",
      "### Conclusion:\n",
      "\n",
      "The code succinctly generates a lazily yielded sequence of unique authors from a collection of book dictionaries, efficiently handling potential duplicates and missing values."
     ]
    },
    {
     "data": {
      "text/markdown": [
       "The provided code snippet is a Python expression that uses `yield from` with a set comprehension. Here’s a breakdown of what it does:\n",
       "\n",
       "1. **Set Comprehension**: The expression `{book.get(\"author\") for book in books if book.get(\"author\")}` creates a set of unique authors extracted from a list (or any iterable) called `books`. \n",
       "    - `book.get(\"author\")` retrieves the value associated with the key `\"author\"` from each `book` dictionary.\n",
       "    - The `if book.get(\"author\")` condition filters out any books that do not have an author (i.e., where the author value is `None` or an empty string). \n",
       "\n",
       "2. **Yield from**: The `yield from` statement is used in a generator function to yield all values from the iterable that follows it. In this case, it yields the unique authors produced by the set comprehension.\n",
       "\n",
       "### Why Use This Code?\n",
       "\n",
       "- **Uniqueness**: Using a set comprehension ensures that only unique authors are collected, automatically eliminating duplicates.\n",
       "- **Efficiency**: The code succinctly processes the list of books and yields only the relevant data (authors) directly from the generator, making it memory-efficient and lazy.\n",
       "- **Readability**: The use of `yield from` keeps the code clean and avoids the need to create an intermediate list before yielding authors.\n",
       "\n",
       "### Example:\n",
       "\n",
       "Given a list of books represented as dictionaries:\n",
       "\n",
       "```python\n",
       "books = [\n",
       "    {\"title\": \"Book 1\", \"author\": \"Author A\"},\n",
       "    {\"title\": \"Book 2\", \"author\": \"Author B\"},\n",
       "    {\"title\": \"Book 3\", \"author\": \"Author A\"},\n",
       "    {\"title\": \"Book 4\", \"author\": None},\n",
       "]\n",
       "```\n",
       "\n",
       "The code would yield `Author A` and `Author B`, iterating over each author only once.\n",
       "\n",
       "### Conclusion:\n",
       "\n",
       "The code succinctly generates a lazily yielded sequence of unique authors from a collection of book dictionaries, efficiently handling potential duplicates and missing values."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get gpt-4o-mini to answer, with streaming\n",
    "model = Model()\n",
    "\n",
    "for token in model.prompt(question, model=MODEL_GPT):\n",
    "    print(token, end=\"\", flush=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This line of code is a part of Python's iteration protocol, specifically using the `yield from` keyword.\n",
      "\n",
      "Here's what it does:\n",
      "\n",
      "- It generates an iterator that yields values from another iterable.\n",
      "- The expression `{book.get(\"author\") for book in books if book.get(\"author\")}` generates an iterator over the authors of all books where an author is available (`book.get(\"author\") != None or book.get(\"author\") == \"\"`).\n",
      "\n",
      "The `yield from` keyword takes this generator expression and yields from it. It's essentially saying \"use this generator to generate values, I'll take them in order\".\n",
      "\n",
      "Here's a step-by-step explanation:\n",
      "\n",
      "1. `{book.get(\"author\") for book in books if book.get(\"author\")}` generates an iterator over the authors of all books where an author is available.\n",
      "\n",
      "   - This works by iterating over each `book` in the collection (`books`), checking if it has a valid author, and then yielding the author's name.\n",
      "   - If `book.get(\"author\")` returns `None`, or if its value is an empty string, it skips that book.\n",
      "\n",
      "2. `yield from { ... }` takes this generator expression and yields from it.\n",
      "\n",
      "   - It's like saying \"take all values yielded by the inner generator and use them one by one\". \n",
      "\n",
      "   This makes the code cleaner and easier to read because you don't have to manually call `next(book_generator)` for each book in the collection. You can simply iterate over this new iterator to get all authors.\n",
      "\n",
      "However, without more context about how these books and their data are structured, it's hard to tell exactly why or when someone would use this line of code specifically. But generally, it's useful for generating values from other iterables while still following a clear sequence.\n",
      "\n",
      "Here's an example:\n",
      "\n",
      "```python\n",
      "books = [\n",
      "    {\"title\": \"Book 1\", \"author\": None},\n",
      "    {\"title\": \"Book 2\", \"author\": \"Author 1\"},\n",
      "    {\"title\": \"Book 3\", \"author\": \"\"}\n",
      "]\n",
      "\n",
      "for author in yield from {book.get(\"author\") for book in books if book.get(\"author\")}:\n",
      "    print(author)\n",
      "```\n",
      "\n",
      "This would print:\n",
      "\n",
      "```\n",
      "Author 1\n",
      "```"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "This line of code is a part of Python's iteration protocol, specifically using the `yield from` keyword.\n",
       "\n",
       "Here's what it does:\n",
       "\n",
       "- It generates an iterator that yields values from another iterable.\n",
       "- The expression `{book.get(\"author\") for book in books if book.get(\"author\")}` generates an iterator over the authors of all books where an author is available (`book.get(\"author\") != None or book.get(\"author\") == \"\"`).\n",
       "\n",
       "The `yield from` keyword takes this generator expression and yields from it. It's essentially saying \"use this generator to generate values, I'll take them in order\".\n",
       "\n",
       "Here's a step-by-step explanation:\n",
       "\n",
       "1. `{book.get(\"author\") for book in books if book.get(\"author\")}` generates an iterator over the authors of all books where an author is available.\n",
       "\n",
       "   - This works by iterating over each `book` in the collection (`books`), checking if it has a valid author, and then yielding the author's name.\n",
       "   - If `book.get(\"author\")` returns `None`, or if its value is an empty string, it skips that book.\n",
       "\n",
       "2. `yield from { ... }` takes this generator expression and yields from it.\n",
       "\n",
       "   - It's like saying \"take all values yielded by the inner generator and use them one by one\". \n",
       "\n",
       "   This makes the code cleaner and easier to read because you don't have to manually call `next(book_generator)` for each book in the collection. You can simply iterate over this new iterator to get all authors.\n",
       "\n",
       "However, without more context about how these books and their data are structured, it's hard to tell exactly why or when someone would use this line of code specifically. But generally, it's useful for generating values from other iterables while still following a clear sequence.\n",
       "\n",
       "Here's an example:\n",
       "\n",
       "```python\n",
       "books = [\n",
       "    {\"title\": \"Book 1\", \"author\": None},\n",
       "    {\"title\": \"Book 2\", \"author\": \"Author 1\"},\n",
       "    {\"title\": \"Book 3\", \"author\": \"\"}\n",
       "]\n",
       "\n",
       "for author in yield from {book.get(\"author\") for book in books if book.get(\"author\")}:\n",
       "    print(author)\n",
       "```\n",
       "\n",
       "This would print:\n",
       "\n",
       "```\n",
       "Author 1\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get Llama 3.2 to answer\n",
    "\n",
    "for token in model.prompt(question, model=MODEL_LLAMA):\n",
    "    print(token, end=\"\", flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}