diff --git a/week1/community-contributions/fernando/day2.ipynb b/week1/community-contributions/fernando/day2.ipynb
new file mode 100644
index 0000000..4a6e7b5
--- /dev/null
+++ b/week1/community-contributions/fernando/day2.ipynb
@@ -0,0 +1,494 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
+ "metadata": {},
+ "source": [
+ "# Welcome to the Day 2 Lab!\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9",
+ "metadata": {},
+ "source": [
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | \n",
+ " \n",
+ " Just before we get started --\n",
+ " I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides. \n",
+ " https://edwarddonner.com/2024/11/13/llm-engineering-resources/ \n",
+ " Please keep this bookmarked, and I'll continue to add more useful links there over time.\n",
+ " \n",
+ " | \n",
+ "
\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79ffe36f",
+ "metadata": {},
+ "source": [
+ "## First - let's talk about the Chat Completions API\n",
+ "\n",
+ "1. The simplest way to call an LLM\n",
+ "2. It's called Chat Completions because it's saying: \"here is a conversation, please predict what should come next\"\n",
+ "3. The Chat Completions API was invented by OpenAI, but it's so popular that everybody uses it!\n",
+ "\n",
+ "### We will start by calling OpenAI again - but don't worry non-OpenAI people, your time is coming!\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e38f17a0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "api_key = os.getenv('OPENAI_API_KEY')\n",
+ "\n",
+ "if not api_key:\n",
+ " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
+ "elif not api_key.startswith(\"sk-proj-\"):\n",
+ " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
+ "else:\n",
+ " print(\"API key found and looks good so far!\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97846274",
+ "metadata": {},
+ "source": [
+ "## Do you know what an Endpoint is?\n",
+ "\n",
+ "If not, please review the Technical Foundations guide in the guides folder\n",
+ "\n",
+ "And, here is an endpoint that might interest you..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5af5c188",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "\n",
+ "headers = {\"Authorization\": f\"Bearer {api_key}\", \"Content-Type\": \"application/json\"}\n",
+ "\n",
+ "payload = {\n",
+ " \"model\": \"gpt-5-nano\",\n",
+ " \"messages\": [\n",
+ " {\"role\": \"user\", \"content\": \"Tell me a fun fact\"}]\n",
+ "}\n",
+ "\n",
+ "payload"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2d0ab242",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = requests.post(\n",
+ " \"https://api.openai.com/v1/chat/completions\",\n",
+ " headers=headers,\n",
+ " json=payload\n",
+ ")\n",
+ "\n",
+ "response.json()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cb11a9f6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response.json()[\"choices\"][0][\"message\"][\"content\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cea3026a",
+ "metadata": {},
+ "source": [
+ "# What is the openai package?\n",
+ "\n",
+ "It's known as a Python Client Library.\n",
+ "\n",
+ "It's nothing more than a wrapper around making this exact call to the http endpoint.\n",
+ "\n",
+ "It just allows you to work with nice Python code instead of messing around with janky json objects.\n",
+ "\n",
+ "But that's it. It's open-source and lightweight. Some people think it contains OpenAI model code - it doesn't!\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "490fdf09",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create OpenAI client\n",
+ "\n",
+ "from openai import OpenAI\n",
+ "openai = OpenAI()\n",
+ "\n",
+ "response = openai.chat.completions.create(model=\"gpt-5-nano\", messages=[{\"role\": \"user\", \"content\": \"Tell me a fun fact\"}])\n",
+ "\n",
+ "response.choices[0].message.content\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7739cda",
+ "metadata": {},
+ "source": [
+ "## And then this great thing happened:\n",
+ "\n",
+ "OpenAI's Chat Completions API was so popular, that the other model providers created endpoints that are identical.\n",
+ "\n",
+ "They are known as the \"OpenAI Compatible Endpoints\".\n",
+ "\n",
+ "For example, google made one here: https://generativelanguage.googleapis.com/v1beta/openai/\n",
+ "\n",
+ "And OpenAI decided to be kind: they said, hey, you can just use the same client library that we made for GPT. We'll allow you to specify a different endpoint URL and a different key, to use another provider.\n",
+ "\n",
+ "So you can use:\n",
+ "\n",
+ "```python\n",
+ "gemini = OpenAI(base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\", api_key=\"AIz....\")\n",
+ "gemini.chat.completions.create(...)\n",
+ "```\n",
+ "\n",
+ "And to be clear - even though OpenAI is in the code, we're only using this lightweight python client library to call the endpoint - there's no OpenAI model involved here.\n",
+ "\n",
+ "If you're confused, please review Guide 9 in the Guides folder!\n",
+ "\n",
+ "And now let's try it!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f74293bc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "GEMINI_BASE_URL = \"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
+ "\n",
+ "google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
+ "\n",
+ "if not google_api_key:\n",
+ " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
+ "elif not google_api_key.startswith(\"AIz\"):\n",
+ " print(\"An API key was found, but it doesn't start AIz\")\n",
+ "else:\n",
+ " print(\"API key found and looks good so far!\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8fc5520d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import google.generativeai as genai\n",
+ "from dotenv import load_dotenv\n",
+ "import os\n",
+ "\n",
+ "load_dotenv()\n",
+ "genai.configure(api_key=os.getenv(\"GOOGLE_API_KEY\"))\n",
+ "\n",
+ "# Lista de modelos disponibles\n",
+ "for model in genai.list_models():\n",
+ " print(model.name, \"-\", model.supported_generation_methods)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d060f484",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import google.generativeai as genai\n",
+ "from dotenv import load_dotenv\n",
+ "import os\n",
+ "\n",
+ "load_dotenv()\n",
+ "genai.configure(api_key=os.getenv(\"GOOGLE_API_KEY\"))\n",
+ "\n",
+ "model = genai.GenerativeModel(\"models/gemini-2.5-pro\") # Usa el modelo que viste en la lista, ejemplo \"gemini-1.5-pro\" o \"gemini-1.5-flash\"\n",
+ "response = model.generate_content(\"Tell me a fun fact\")\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)\n",
+ "\n",
+ "response = gemini.chat.completions.create(model=\"models/gemini-2.5-pro\", messages=[{\"role\": \"user\", \"content\": \"Tell me a fun fact\"}])\n",
+ "\n",
+ "response.choices[0].message.content"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a5b069be",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "65272432",
+ "metadata": {},
+ "source": [
+ "## And Ollama also gives an OpenAI compatible endpoint\n",
+ "\n",
+ "...and it's on your local machine!\n",
+ "\n",
+ "If the next cell doesn't print \"Ollama is running\" then please open a terminal and run `ollama serve`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f06280ad",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "requests.get(\"http://localhost:11434\").content"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c6ef3807",
+ "metadata": {},
+ "source": [
+ "### Download llama3.2 from meta\n",
+ "\n",
+ "Change this to llama3.2:1b if your computer is smaller.\n",
+ "\n",
+ "Don't use llama3.3 or llama4! They are too big for your computer.."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e633481d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!ollama pull llama3.2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce240975",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "response = requests.get(\"http://localhost:11434/v1/models\")\n",
+ "print(response.json())\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d9419762",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI\n",
+ "\n",
+ "OLLAMA_BASE_URL = \"http://localhost:11434/v1\"\n",
+ "\n",
+ "ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e2456cdf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get a fun fact\n",
+ "\n",
+ "response = ollama.chat.completions.create(model=\"llama3.2\", messages=[{\"role\": \"user\", \"content\": \"Tell me a fun fact\"}])\n",
+ "\n",
+ "response.choices[0].message.content"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3d7cebd7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Now let's try deepseek-r1:1.5b - this is DeepSeek \"distilled\" into Qwen from Alibaba Cloud\n",
+ "\n",
+ "!ollama pull deepseek-r1:1.5b"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "25002f25",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#response = ollama.chat.completions.create(model=\"deepseek-r1:1.5b\", messages=[{\"role\": \"user\", \"content\": \"Tell me a fun fact\"}])\n",
+ "#response.choices[0].message.content\n",
+ "\n",
+ "from ollama import chat # pip install ollama\n",
+ "\n",
+ "resp = chat(\n",
+ " model='deepseek-r1:1.5b',\n",
+ " messages=[{'role': 'user', 'content': 'Tell me a fun fact'}],\n",
+ ")\n",
+ "\n",
+ "print(resp['message']['content'])\n",
+ "# o\n",
+ "print(resp.message.content)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458",
+ "metadata": {},
+ "source": [
+ "# HOMEWORK EXERCISE ASSIGNMENT\n",
+ "\n",
+ "Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n",
+ "\n",
+ "You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n",
+ "\n",
+ "**Benefits:**\n",
+ "1. No API charges - open-source\n",
+ "2. Data doesn't leave your box\n",
+ "\n",
+ "**Disadvantages:**\n",
+ "1. Significantly less power than Frontier Model\n",
+ "\n",
+ "## Recap on installation of Ollama\n",
+ "\n",
+ "Simply visit [ollama.com](https://ollama.com) and install!\n",
+ "\n",
+ "Once complete, the ollama server should already be running locally. \n",
+ "If you visit: \n",
+ "[http://localhost:11434/](http://localhost:11434/)\n",
+ "\n",
+ "You should see the message `Ollama is running`. \n",
+ "\n",
+ "If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n",
+ "And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n",
+ "Then try [http://localhost:11434/](http://localhost:11434/) again.\n",
+ "\n",
+ "If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports\n",
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "from scraper import fetch_website_contents\n",
+ "from IPython.display import Markdown, display\n",
+ "from ollama import Client \n",
+ "\n",
+ "# Cliente Ollama local\n",
+ "ollama = Client()\n",
+ "\n",
+ "system_prompt = \"\"\"\n",
+ "You are a helpful assistant that analyzes the contents of a website,\n",
+ "and provides a short, snarky, humorous summary, ignoring text that might be navigation related.\n",
+ "Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.\n",
+ "\"\"\"\n",
+ "\n",
+ "user_prompt_prefix = \"\"\"\n",
+ "Here are the contents of a website.\n",
+ "Provide a short summary of this website.\n",
+ "If it includes news or announcements, then summarize these too.\n",
+ "\"\"\"\n",
+ "\n",
+ "def messages_for(website):\n",
+ " return [\n",
+ " {\"role\": \"system\", \"content\": system_prompt},\n",
+ " {\"role\": \"user\", \"content\": user_prompt_prefix + website}\n",
+ " ]\n",
+ "\n",
+ "def summarize(url):\n",
+ " website = fetch_website_contents(url)\n",
+ " response = ollama.chat(\n",
+ " model='llama3.2',\n",
+ " messages=messages_for(website)\n",
+ " )\n",
+ " return response['message']['content']\n",
+ "\n",
+ "def display_summary(url):\n",
+ " summary = summarize(url)\n",
+ " display(Markdown(summary))\n",
+ "\n",
+ "# Ejecuta el resumen\n",
+ "display_summary(\"https://www.reforma.com\")\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week1/community-contributions/fernando/week1 EXERCISE.ipynb b/week1/community-contributions/fernando/week1 EXERCISE.ipynb
new file mode 100644
index 0000000..c152cb7
--- /dev/null
+++ b/week1/community-contributions/fernando/week1 EXERCISE.ipynb
@@ -0,0 +1,175 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
+ "metadata": {},
+ "source": [
+ "# End of week 1 exercise\n",
+ "\n",
+ "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
+ "and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c1070317-3ed9-4659-abe3-828943230e03",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports\n",
+ "import os\n",
+ "from openai import OpenAI\n",
+ "from dotenv import load_dotenv"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# constants\n",
+ "MODEL_GPT = 'gpt-4o-mini'\n",
+ "MODEL_LLAMA = 'llama3.2'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# set up environment\n",
+ "system_prompt = \"\"\"\n",
+ "You are a technical expert of AI and LLMs.\n",
+ "\"\"\"\n",
+ "\n",
+ "user_prompt_prefix = \"\"\"\n",
+ "Provide deep explanations of the provided text.\n",
+ "\"\"\"\n",
+ "\n",
+ "user_prompt = \"\"\"\n",
+ "Explain the provided text.\n",
+ "\"\"\"\n",
+ "client = OpenAI()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# here is the question; type over this to ask something new\n",
+ "\n",
+ "question = \"\"\"\n",
+ "Ollama does have an OpenAI compatible endpoint, but Gemini doesn't?\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get gpt-4o-mini to answer, with streaming\n",
+ "def messages_for(question):\n",
+ " return [\n",
+ " {\"role\": \"system\", \"content\": system_prompt},\n",
+ " {\"role\": \"user\", \"content\": user_prompt_prefix + question}\n",
+ " ]\n",
+ "\n",
+ "def run_model_streaming(model_name, question):\n",
+ " stream = client.chat.completions.create(\n",
+ " model=model_name,\n",
+ " messages=messages_for(question),\n",
+ " stream=True\n",
+ " )\n",
+ " for chunk in stream:\n",
+ " content = chunk.choices[0].delta.content\n",
+ " if content:\n",
+ " print(content, end=\"\", flush=True)\n",
+ "\n",
+ "run_model_streaming(MODEL_GPT, question)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get Llama 3.2 to answer\n",
+ "# imports\n",
+ "import os\n",
+ "from openai import OpenAI\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "# set up environment\n",
+ "client = OpenAI(\n",
+ " base_url=os.getenv(\"OPENAI_BASE_URL\", \"http://localhost:11434/v1\"),\n",
+ " api_key=os.getenv(\"OPENAI_API_KEY\", \"ollama\")\n",
+ ")\n",
+ "\n",
+ "system_prompt = \"\"\"\n",
+ "You are a technical expert of AI and LLMs.\n",
+ "\"\"\"\n",
+ "\n",
+ "user_prompt_prefix = \"\"\"\n",
+ "Provide deep explanations of the provided text.\n",
+ "\"\"\"\n",
+ "\n",
+ "# question\n",
+ "question = \"\"\"\n",
+ "Ollama does have an OpenAI compatible endpoint, but Gemini doesn't?\n",
+ "\"\"\"\n",
+ "\n",
+ "# message\n",
+ "def messages_for(question):\n",
+ " return [\n",
+ " {\"role\": \"system\", \"content\": system_prompt},\n",
+ " {\"role\": \"user\", \"content\": user_prompt_prefix + question}\n",
+ " ]\n",
+ "\n",
+ "# response\n",
+ "def run_model(model_name, question):\n",
+ " response = client.chat.completions.create(\n",
+ " model=model_name,\n",
+ " messages=messages_for(question)\n",
+ " )\n",
+ " return response.choices[0].message.content\n",
+ "\n",
+ "# run and print result\n",
+ "print(run_model(MODEL_LLAMA, question))\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week2/community-contributions/ai_domain_finder/ai_domain_finder.ipynb b/week2/community-contributions/ai_domain_finder/ai_domain_finder.ipynb
new file mode 100644
index 0000000..c0fbbcc
--- /dev/null
+++ b/week2/community-contributions/ai_domain_finder/ai_domain_finder.ipynb
@@ -0,0 +1,721 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1633a440",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\"\"\"\n",
+ "Week 2 Assignment: LLM Engineering\n",
+ "Author: Nikhil Raut\n",
+ "\n",
+ "Notebook: ai_domain_finder.ipynb\n",
+ "\n",
+ "Purpose:\n",
+ "Build an agentic AI Domain Finder that proposes short, brandable .com names, verifies availability via RDAP, \n",
+ "then returns: \n",
+ " a list of available .coms, \n",
+ " one preferred pick, \n",
+ " and a brief audio rationale.\n",
+ "\"\"\"\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "da528fbe",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import json\n",
+ "import requests\n",
+ "from typing import Dict, List, Tuple, Any, Optional\n",
+ "import re\n",
+ "\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "OPENAI_MODEL = \"gpt-5-nano-2025-08-07\"\n",
+ "TTS_MODEL = \"gpt-4o-mini-tts\"\n",
+ "\n",
+ "openai = OpenAI()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "361f7fe3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# --- robust logging that works inside VS Code notebooks + Gradio threads ---\n",
+ "import sys, logging, threading\n",
+ "from collections import deque\n",
+ "from typing import Any\n",
+ "\n",
+ "DEBUG_LLM = True # toggle on/off noisy logs\n",
+ "CLEAR_LOG_ON_RUN = True # clear panel before each submit\n",
+ "\n",
+ "_LOG_BUFFER = deque(maxlen=2000) # keep ~2000 lines in memory\n",
+ "_LOG_LOCK = threading.Lock()\n",
+ "\n",
+ "class GradioBufferHandler(logging.Handler):\n",
+ " def emit(self, record: logging.LogRecord) -> None:\n",
+ " try:\n",
+ " msg = self.format(record)\n",
+ " except Exception:\n",
+ " msg = record.getMessage()\n",
+ " with _LOG_LOCK:\n",
+ " for line in (msg.splitlines() or [\"\"]):\n",
+ " _LOG_BUFFER.append(line)\n",
+ "\n",
+ "def get_log_text() -> str:\n",
+ " with _LOG_LOCK:\n",
+ " return \"\\n\".join(_LOG_BUFFER)\n",
+ "\n",
+ "def clear_log_buffer() -> None:\n",
+ " with _LOG_LOCK:\n",
+ " _LOG_BUFFER.clear()\n",
+ "\n",
+ "def _setup_logger() -> logging.Logger:\n",
+ " logger = logging.getLogger(\"aidf\")\n",
+ " logger.setLevel(logging.DEBUG if DEBUG_LLM else logging.INFO)\n",
+ " logger.handlers.clear()\n",
+ " fmt = logging.Formatter(\"%(asctime)s | %(levelname)s | %(message)s\", \"%H:%M:%S\")\n",
+ "\n",
+ " stream = logging.StreamHandler(stream=sys.stdout) # captured by VS Code notebook\n",
+ " stream.setFormatter(fmt)\n",
+ "\n",
+ " buf = GradioBufferHandler() # shown inside the Gradio panel\n",
+ " buf.setFormatter(fmt)\n",
+ "\n",
+ " logger.addHandler(stream)\n",
+ " logger.addHandler(buf)\n",
+ " logger.propagate = False\n",
+ " return logger\n",
+ "\n",
+ "logger = _setup_logger()\n",
+ "\n",
+ "def dbg_json(obj: Any, title: str = \"\") -> None:\n",
+ " \"\"\"Convenience: pretty-print JSON-ish objects to the logger.\"\"\"\n",
+ " try:\n",
+ " txt = json.dumps(obj, ensure_ascii=False, indent=2)\n",
+ " except Exception:\n",
+ " txt = str(obj)\n",
+ " if title:\n",
+ " logger.debug(\"%s\\n%s\", title, txt)\n",
+ " else:\n",
+ " logger.debug(\"%s\", txt)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "519674b2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "RDAP_URL = \"https://rdap.verisign.com/com/v1/domain/{}\"\n",
+ "\n",
+ "_ALPHA_RE = re.compile(r\"^[a-z]+$\", re.IGNORECASE)\n",
+ "\n",
+ "def _to_com(domain: str) -> str:\n",
+ " d = domain.strip().lower()\n",
+ " return d if d.endswith(\".com\") else f\"{d}.com\"\n",
+ "\n",
+ "def _sld_is_english_alpha(fqdn: str) -> bool:\n",
+ " \"\"\"\n",
+ " True only if the second-level label (just before .com) is made up\n",
+ " exclusively of English letters (a-z).\n",
+ " Examples:\n",
+ " foo.com -> True\n",
+ " foo-bar.com -> False\n",
+ " foo1.com -> False\n",
+ " café.com -> False\n",
+ " xn--cafe.com -> False\n",
+ " www.foo.com -> True (checks 'foo')\n",
+ " \"\"\"\n",
+ " if not fqdn.endswith(\".com\"):\n",
+ " return False\n",
+ " sld = fqdn[:-4].split(\".\")[-1] # take label immediately before .com\n",
+ " return bool(sld) and bool(_ALPHA_RE.fullmatch(sld))\n",
+ "\n",
+ "def check_com_availability(domain: str) -> Dict:\n",
+ " fqdn = _to_com(domain)\n",
+ " # Skip API if not strictly English letters\n",
+ " if not _sld_is_english_alpha(fqdn):\n",
+ " return {\"domain\": fqdn, \"available\": False, \"status\": 0}\n",
+ "\n",
+ " try:\n",
+ " r = requests.get(RDAP_URL.format(fqdn), timeout=6)\n",
+ " return {\"domain\": fqdn, \"available\": (r.status_code == 404), \"status\": r.status_code}\n",
+ " except requests.RequestException:\n",
+ " return {\"domain\": fqdn, \"available\": False, \"status\": 0}\n",
+ "\n",
+ "def check_com_availability_bulk(domains: List[str]) -> Dict:\n",
+ " \"\"\"\n",
+ " Input: list of domain roots or FQDNs.\n",
+ " Returns:\n",
+ " {\n",
+ " \"results\": [{\"domain\": \"...\", \"available\": bool, \"status\": int}, ...],\n",
+ " \"available\": [\"...\"], # convenience\n",
+ " \"count_available\": int\n",
+ " }\n",
+ " \"\"\"\n",
+ " session = requests.Session()\n",
+ " results: List[Dict] = []\n",
+ "\n",
+ " for d in domains:\n",
+ " fqdn = _to_com(d)\n",
+ "\n",
+ " # Skip API if not strictly English letters\n",
+ " if not _sld_is_english_alpha(fqdn):\n",
+ " results.append({\"domain\": fqdn, \"available\": False, \"status\": 0})\n",
+ " continue\n",
+ "\n",
+ " try:\n",
+ " r = session.get(RDAP_URL.format(fqdn), timeout=6)\n",
+ " ok = (r.status_code == 404)\n",
+ " results.append({\"domain\": fqdn, \"available\": ok, \"status\": r.status_code})\n",
+ " except requests.RequestException:\n",
+ " results.append({\"domain\": fqdn, \"available\": False, \"status\": 0})\n",
+ "\n",
+ " available = [x[\"domain\"] for x in results if x[\"available\"]]\n",
+ " return {\"results\": results, \"available\": available, \"count_available\": len(available)}\n"
+ ]
+ },
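 + {
 + "cell_type": "markdown",
 + "id": "b3e1d7a2",
 + "metadata": {},
 + "source": [
 + "A quick, optional sanity check for the RDAP helpers above (illustrative - results depend on live RDAP responses): a well-known registered name should come back unavailable with HTTP 200, and anything that isn't pure lowercase letters is rejected before the API is called."
 + ]
 + },
 + {
 + "cell_type": "code",
 + "execution_count": null,
 + "id": "c4f2e8b1",
 + "metadata": {},
 + "outputs": [],
 + "source": [
 + "# Illustrative checks of the helpers defined above\n",
 + "print(check_com_availability(\"example\")) # registered, so expect available=False, status=200\n",
 + "print(check_com_availability(\"foo-bar\")) # hyphen fails the letters-only filter: status=0\n",
 + "print(check_com_availability_bulk([\"example\", \"openai\"])[\"count_available\"])\n"
 + ]
 + },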
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cd20c262",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "check_tool_bulk = {\n",
+ " \"type\": \"function\",\n",
+ " \"function\": {\n",
+ " \"name\": \"check_com_availability_bulk\",\n",
+ " \"description\": \"Batch check .com availability via RDAP for a list of domains (roots or FQDNs).\",\n",
+ " \"parameters\": {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"domains\": {\n",
+ " \"type\": \"array\",\n",
+ " \"items\": {\"type\": \"string\"},\n",
+ " \"minItems\": 1,\n",
+ " \"maxItems\": 50,\n",
+ " \"description\": \"List of domain roots or .com FQDNs.\"\n",
+ " }\n",
+ " },\n",
+ " \"required\": [\"domains\"],\n",
+ " \"additionalProperties\": False\n",
+ " }\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "TOOLS = [check_tool_bulk]\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2a9138b6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def handle_tool_calls(message) -> List[Dict]:\n",
+ " results = []\n",
+ " for call in (message.tool_calls or []):\n",
+ " fn = getattr(call.function, \"name\", None)\n",
+ " args_raw = getattr(call.function, \"arguments\", \"\") or \"{}\"\n",
+ " try:\n",
+ " args = json.loads(args_raw)\n",
+ " except Exception:\n",
+ " args = {}\n",
+ "\n",
+ " logger.debug(\"TOOL CALL -> %s | args=%s\", fn, json.dumps(args, ensure_ascii=False))\n",
+ "\n",
+ " if fn == \"check_com_availability_bulk\":\n",
+ " payload = check_com_availability_bulk(args.get(\"domains\", []))\n",
+ " elif fn == \"check_com_availability\":\n",
+ " payload = check_com_availability(args.get(\"domain\", \"\"))\n",
+ " else:\n",
+ " payload = {\"error\": f\"unknown tool {fn}\"}\n",
+ "\n",
+ " logger.debug(\"TOOL RESULT <- %s | %s\", fn, json.dumps(payload, ensure_ascii=False))\n",
+ "\n",
+ " results.append({\n",
+ " \"role\": \"tool\",\n",
+ " \"tool_call_id\": call.id,\n",
+ " \"content\": json.dumps(payload),\n",
+ " })\n",
+ " return results\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0b80c860",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SYSTEM_PROMPT = \"\"\"You are the Agent for project \"AI Domain Finder\".\n",
+ "Goal: suggest .com domains and verify availability using the tool ONLY (no guessing).\n",
+ "\n",
+ "Do this each interaction:\n",
+ "- Generate up to ~20 short, brandable .com candidates from:\n",
+ " (1) Industry, (2) Target Customers, (3) Description.\n",
+ "- Use the BULK tool `check_com_availability_bulk` with a list of candidates\n",
+ " (roots or FQDNs). Prefer a single call or very few batched calls.\n",
+ "- If >= 5 available .coms are found, STOP checking and finalize the answer.\n",
+ "\n",
+ "Output Markdown with EXACT section headings:\n",
+ "1) Available .com domains:\n",
+ " - itemized list of available .coms only (root + .com)\n",
+ "2) Preferred domain:\n",
+ " - a single best pick\n",
+ "3) Audio explanation:\n",
+ " - 1–2 concise sentences explaining the preference\n",
+ "\n",
+ "Constraints:\n",
+ "- Use customer-familiar words where helpful.\n",
+ "- Keep names short, simple, pronounceable; avoid hyphens/numbers unless meaningful.\n",
+ "- Never include TLDs other than .com.\n",
+ "- domain is made up of english alphabets in lower case only no symbols or spaces to use\n",
+ "\"\"\"\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "72e9d8c2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def _asdict_tool_call(tc: Any) -> dict:\n",
+ " try:\n",
+ " return {\n",
+ " \"id\": getattr(tc, \"id\", None),\n",
+ " \"type\": \"function\",\n",
+ " \"function\": {\n",
+ " \"name\": getattr(tc.function, \"name\", None),\n",
+ " \"arguments\": getattr(tc.function, \"arguments\", None),\n",
+ " },\n",
+ " }\n",
+ " except Exception:\n",
+ " return {\"type\": \"function\", \"function\": {\"name\": None, \"arguments\": None}}\n",
+ "\n",
+ "def _asdict_message(msg: Any) -> dict:\n",
+ " if isinstance(msg, dict):\n",
+ " return msg\n",
+ " role = getattr(msg, \"role\", None)\n",
+ " content = getattr(msg, \"content\", None)\n",
+ " tool_calls = getattr(msg, \"tool_calls\", None)\n",
+ " out = {\"role\": role, \"content\": content}\n",
+ " if tool_calls:\n",
+ " out[\"tool_calls\"] = [_asdict_tool_call(tc) for tc in tool_calls]\n",
+ " return out\n",
+ "\n",
+ "def _sanitized_messages_for_log(messages: list[dict | Any]) -> list[dict]:\n",
+ " return [_asdict_message(m) for m in messages]\n",
+ "\n",
+ "def _limit_text(s: str, limit: int = 40000) -> str:\n",
+ " return s if len(s) <= limit else (s[:limit] + \"\\n... [truncated]\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b45c6382",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_agent_with_tools(history: List[Dict]) -> Tuple[str, List[str], str]:\n",
+ " \"\"\"\n",
+ " Returns:\n",
+ " reply_md: final assistant markdown\n",
+ " tool_available: .coms marked available by RDAP tools (order-preserving, deduped)\n",
+ " dbg_text: concatenated log buffer (for the UI panel)\n",
+ " \"\"\"\n",
+ " messages: List[Dict] = [{\"role\": \"system\", \"content\": SYSTEM_PROMPT}] + history\n",
+ " tool_available: List[str] = []\n",
+ "\n",
+ " dbg_json(_sanitized_messages_for_log(messages), \"=== LLM REQUEST (initial messages) ===\")\n",
+ " resp = openai.chat.completions.create(model=OPENAI_MODEL, messages=messages, tools=TOOLS)\n",
+ "\n",
+ " while resp.choices[0].finish_reason == \"tool_calls\":\n",
+ " tool_msg_sdk = resp.choices[0].message\n",
+ " tool_msg = _asdict_message(tool_msg_sdk)\n",
+ " dbg_json(tool_msg, \"=== ASSISTANT (tool_calls) ===\")\n",
+ "\n",
+ " tool_results = handle_tool_calls(tool_msg_sdk)\n",
+ "\n",
+ " # Accumulate authoritative availability directly from tool outputs\n",
+ " for tr in tool_results:\n",
+ " try:\n",
+ " data = json.loads(tr[\"content\"])\n",
+ " if isinstance(data, dict) and isinstance(data.get(\"available\"), list):\n",
+ " for d in data[\"available\"]:\n",
+ " tool_available.append(_to_com(d))\n",
+ " except Exception:\n",
+ " pass\n",
+ "\n",
+ " dbg_json([json.loads(tr[\"content\"]) for tr in tool_results], \"=== TOOL RESULTS ===\")\n",
+ "\n",
+ " messages.append(tool_msg)\n",
+ " messages.extend(tool_results)\n",
+ " dbg_json(_sanitized_messages_for_log(messages), \"=== LLM REQUEST (messages + tools) ===\")\n",
+ "\n",
+ " resp = openai.chat.completions.create(model=OPENAI_MODEL, messages=messages, tools=TOOLS)\n",
+ "\n",
+ " # Dedup preserve order\n",
+ " seen, uniq = set(), []\n",
+ " for d in tool_available:\n",
+ " if d not in seen:\n",
+ " seen.add(d)\n",
+ " uniq.append(d)\n",
+ "\n",
+ " reply_md = resp.choices[0].message.content\n",
+ " logger.debug(\"=== FINAL ASSISTANT ===\\n%s\", _limit_text(reply_md))\n",
+ " dbg_json(uniq, \"=== AVAILABLE FROM TOOLS (authoritative) ===\")\n",
+ "\n",
+ " # Return current buffer text for the UI panel\n",
+ " dbg_text = _limit_text(get_log_text(), 40000)\n",
+ " return reply_md, uniq, dbg_text\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "92306515",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def extract_audio_text(markdown_reply: str) -> str:\n",
+ " \"\"\"\n",
+ " Pulls the 'Audio explanation:' section; falls back to first sentence.\n",
+ " \"\"\"\n",
+ " marker = \"Audio explanation:\"\n",
+ " lower = markdown_reply.lower()\n",
+ " idx = lower.find(marker.lower())\n",
+ " if idx != -1:\n",
+ " segment = markdown_reply[idx + len(marker):].strip()\n",
+ " parts = segment.split(\".\")\n",
+ " return (\". \".join([p.strip() for p in parts if p.strip()][:2]) + \".\").strip()\n",
+ " return \"This domain is the clearest, most memorable fit for the audience and brand goals.\"\n",
+ "\n",
+ "def synth_audio(text: str) -> bytes:\n",
+ " audio = openai.audio.speech.create(\n",
+ " model=TTS_MODEL,\n",
+ " voice=\"alloy\",\n",
+ " input=text\n",
+ " )\n",
+ " return audio.content\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cc6c0650",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "_DOMAIN_RE = re.compile(r\"\\b[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\\.com\\b\", re.I)\n",
+ "_HDR_AVAIL = re.compile(r\"^\\s*[\\d\\.\\)\\-]*\\s*available\\s+.*\\.com\\s+domains\", re.I)\n",
+ "_HDR_PREF = re.compile(r\"^\\s*[\\d\\.\\)\\-]*\\s*preferred\\s+domain\", re.I)\n",
+ "\n",
+ "def _norm_domain(s: str) -> str:\n",
+ " s = s.strip().lower()\n",
+ " return s if s.endswith(\".com\") else f\"{s}.com\"\n",
+ "\n",
+ "def parse_available(md: str) -> list[str]:\n",
+ " lines = md.splitlines()\n",
+ " out = []\n",
+ " in_section = False\n",
+ " for ln in lines:\n",
+ " if _HDR_AVAIL.search(ln):\n",
+ " in_section = True\n",
+ " continue\n",
+ " if in_section and _HDR_PREF.search(ln):\n",
+ " break\n",
+ " if in_section:\n",
+ " for m in _DOMAIN_RE.findall(ln):\n",
+ " out.append(_norm_domain(m))\n",
+ " # Fallback: if the header wasn't found, collect all .coms then we'll still\n",
+ " # rely on agent instruction to list only available, which should be safe.\n",
+ " if not out:\n",
+ " out = [_norm_domain(m) for m in _DOMAIN_RE.findall(md)]\n",
+ " # dedupe preserve order\n",
+ " seen, uniq = set(), []\n",
+ " for d in out:\n",
+ " if d not in seen:\n",
+ " seen.add(d)\n",
+ " uniq.append(d)\n",
+ " return uniq\n",
+ "\n",
+ "def parse_preferred(md: str) -> str:\n",
+ " # search the preferred section first\n",
+ " lines = md.splitlines()\n",
+ " start = None\n",
+ " for i, ln in enumerate(lines):\n",
+ " if _HDR_PREF.search(ln):\n",
+ " start = i\n",
+ " break\n",
+ " segment = \"\\n\".join(lines[start:start+8]) if start is not None else md[:500]\n",
+ " m = _DOMAIN_RE.search(segment)\n",
+ " if m:\n",
+ " return _norm_domain(m.group(0))\n",
+ " m = _DOMAIN_RE.search(md)\n",
+ " return _norm_domain(m.group(0)) if m else \"\"\n",
+ "\n",
+ "def merge_and_sort(old: list[str], new: list[str]) -> list[str]:\n",
+ " merged = {d.lower() for d in old} | {d.lower() for d in new}\n",
+ " return sorted(merged, key=lambda s: (len(s), s))\n",
+ "\n",
+ "def fmt_available_md(domains: list[str]) -> str:\n",
+ " if not domains:\n",
+ " return \"### Available .com domains (cumulative)\\n\\n*– none yet –*\"\n",
+ " items = \"\\n\".join(f\"- `{d}`\" for d in domains)\n",
+ " return f\"### Available .com domains (cumulative)\\n\\n{items}\"\n",
+ "\n",
+ "def fmt_preferred_md(d: str) -> str:\n",
+ " if not d:\n",
+ " return \"### Preferred domain\\n\\n*– not chosen yet –*\"\n",
+ " return f\"### Preferred domain\\n\\n`{d}`\"\n",
+ "\n",
+ "def build_context_msg(known_avail: Optional[List[str]], preferred_now: Optional[str]) -> str:\n",
+ " \"\"\"\n",
+ " Create a short 'state so far' block that we prepend to the next user turn\n",
+ " so the model always sees the preferred and cumulative available list.\n",
+ " \"\"\"\n",
+ " lines = []\n",
+ " if (preferred_now or \"\").strip():\n",
+ " lines.append(f\"Preferred domain so far: {preferred_now.strip().lower()}\")\n",
+ " if known_avail:\n",
+ " lines.append(\"Available .com domains discovered so far:\")\n",
+ " for d in known_avail:\n",
+ " if d:\n",
+ " lines.append(f\"- {d.strip().lower()}\")\n",
+ " if not lines:\n",
+ " return \"\"\n",
+ " return \"STATE TO CARRY OVER FROM PREVIOUS TURNS:\\n\" + \"\\n\".join(lines)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "07f079d6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_and_extract(history: List[Dict]) -> Tuple[str, List[str], str, str, str]:\n",
+ " reply_md, avail_from_tools, dbg_text = run_agent_with_tools(history)\n",
+ " parsed_avail = parse_available(reply_md)\n",
+ " new_avail = merge_and_sort(avail_from_tools, parsed_avail)\n",
+ " preferred = parse_preferred(reply_md)\n",
+ " audio_text = extract_audio_text(reply_md)\n",
+ " return reply_md, new_avail, preferred, audio_text, dbg_text\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4cd5d8ef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def initial_submit(industry: str, customers: str, desc: str,\n",
+ " history: List[Dict], known_avail: List[str], preferred_now: str):\n",
+ " if CLEAR_LOG_ON_RUN:\n",
+ " clear_log_buffer()\n",
+ "\n",
+ " logger.info(\"Initial submit | industry=%r | customers=%r | desc_len=%d\",\n",
+ " industry, customers, len(desc or \"\"))\n",
+ "\n",
+ " # Build context (usually empty on the very first run, but future inits also work)\n",
+ " ctx = build_context_msg(known_avail or [], preferred_now or \"\")\n",
+ "\n",
+ " user_msg = (\n",
+ " \"Please propose .com domains based on:\\n\"\n",
+ " f\"Industry: {industry}\\n\"\n",
+ " f\"Target Customers: {customers}\\n\"\n",
+ " f\"Description: {desc}\"\n",
+ " )\n",
+ "\n",
+ " # Single user turn that includes state + prompt so the model always sees memory\n",
+ " full_content = (ctx + \"\\n\\n\" if ctx else \"\") + user_msg\n",
+ "\n",
+ " history = (history or []) + [{\"role\": \"user\", \"content\": full_content}]\n",
+ " reply_md, new_avail, preferred, audio_text, dbg_text = run_and_extract(history)\n",
+ " history += [{\"role\": \"assistant\", \"content\": reply_md}]\n",
+ "\n",
+ " all_avail = merge_and_sort(known_avail or [], new_avail or [])\n",
+ " preferred_final = preferred or preferred_now or \"\"\n",
+ " audio_bytes = synth_audio(audio_text)\n",
+ "\n",
+ " return (\n",
+ " history, # s_history\n",
+ " all_avail, # s_available (cumulative)\n",
+ " preferred_final, # s_preferred\n",
+ " gr.update(value=fmt_preferred_md(preferred_final)),\n",
+ " gr.update(value=fmt_available_md(all_avail)),\n",
+ " gr.update(value=\"\", visible=True), # reply_in: show after first run\n",
+ " gr.update(value=audio_bytes, visible=True), # audio_out\n",
+ " gr.update(value=dbg_text), # debug_box\n",
+ " gr.update(value=\"Find Domains (done)\", interactive=False), # NEW: disable Find\n",
+ " gr.update(visible=True), # NEW: show Send button\n",
+ " )\n",
+ "\n",
+ "def refine_submit(reply: str,\n",
+ " history: List[Dict], known_avail: List[str], preferred_now: str):\n",
+ " # If empty, do nothing (keeps UI state untouched)\n",
+ " if not (reply or \"\").strip():\n",
+ " return (\"\", history, known_avail, preferred_now,\n",
+ " gr.update(), gr.update(), gr.update(), gr.update())\n",
+ "\n",
+ " if CLEAR_LOG_ON_RUN:\n",
+ " clear_log_buffer()\n",
+ " logger.info(\"Refine submit | user_reply_len=%d\", len(reply))\n",
+ "\n",
+ " # Always prepend memory + the user's refinement so the model can iterate properly\n",
+ " ctx = build_context_msg(known_avail or [], preferred_now or \"\")\n",
+ " full_content = (ctx + \"\\n\\n\" if ctx else \"\") + reply.strip()\n",
+ "\n",
+ " history = (history or []) + [{\"role\": \"user\", \"content\": full_content}]\n",
+ " reply_md, new_avail, preferred, audio_text, dbg_text = run_and_extract(history)\n",
+ " history += [{\"role\": \"assistant\", \"content\": reply_md}]\n",
+ "\n",
+ " all_avail = merge_and_sort(known_avail or [], new_avail or [])\n",
+ " preferred_final = preferred or preferred_now or \"\"\n",
+ " audio_bytes = synth_audio(audio_text)\n",
+ "\n",
+ " return (\n",
+ " \"\", # clear Reply box\n",
+ " history, # s_history\n",
+ " all_avail, # s_available (cumulative)\n",
+ " preferred_final, # s_preferred\n",
+ " gr.update(value=fmt_preferred_md(preferred_final)),\n",
+ " gr.update(value=fmt_available_md(all_avail)),\n",
+ " gr.update(value=audio_bytes, visible=True),\n",
+ " gr.update(value=dbg_text), # debug_box\n",
+ " )\n",
+ "\n",
+ "def clear_debug():\n",
+ " clear_log_buffer()\n",
+ " return gr.update(value=\"\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d52ebc02",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with gr.Blocks(title=\"AI Domain Finder (.com only)\") as ui:\n",
+ " gr.Markdown(\"# AI Domain Finder (.com only)\")\n",
+ " gr.Markdown(\"Agent proposes .com domains, verifies via RDAP, picks a preferred choice, and explains briefly.\")\n",
+ "\n",
+ " # App state\n",
+ " s_history = gr.State([])\n",
+ " s_available = gr.State([])\n",
+ " s_preferred = gr.State(\"\")\n",
+ "\n",
+ " with gr.Row():\n",
+ " with gr.Column(scale=7): # LEFT 70%\n",
+ " with gr.Group():\n",
+ " industry_in = gr.Textbox(label=\"Industry\")\n",
+ " customers_in = gr.Textbox(label=\"Target Customers\")\n",
+ " desc_in = gr.Textbox(label=\"Description\", lines=3)\n",
+ " find_btn = gr.Button(\"Find Domains\", variant=\"primary\")\n",
+ "\n",
+ " audio_out = gr.Audio(label=\"Audio explanation\", autoplay=True, visible=False)\n",
+ "\n",
+ " with gr.Row():\n",
+ " reply_in = gr.Textbox(\n",
+ " label=\"Reply\",\n",
+ " placeholder=\"Chat with the agent to refine the outputs\",\n",
+ " lines=2,\n",
+ " visible=False, # hidden for the first input\n",
+ " )\n",
+ " send_btn = gr.Button(\"Send\", variant=\"primary\", visible=False)\n",
+ "\n",
+ " with gr.Column(scale=3): # RIGHT 30%\n",
+ " preferred_md = gr.Markdown(fmt_preferred_md(\"\"))\n",
+ " available_md = gr.Markdown(fmt_available_md([]))\n",
+ "\n",
+ " with gr.Accordion(\"Debug log\", open=False):\n",
+ " debug_box = gr.Textbox(label=\"Log\", value=\"\", lines=16, interactive=False)\n",
+ " clear_btn = gr.Button(\"Clear log\", size=\"sm\")\n",
+ "\n",
+ " # Events\n",
+ " # Initial run: also disables Find and shows Send\n",
+ " find_btn.click(\n",
+ " initial_submit,\n",
+ " inputs=[industry_in, customers_in, desc_in, s_history, s_available, s_preferred],\n",
+ " outputs=[\n",
+ " s_history, s_available, s_preferred,\n",
+ " preferred_md, available_md,\n",
+ " reply_in, # visible after first run\n",
+ " audio_out, # visible after first run\n",
+ " debug_box,\n",
+ " find_btn, # NEW: disable + relabel\n",
+ " send_btn, # NEW: show the Send button\n",
+ " ],\n",
+ " )\n",
+ "\n",
+ " # Multi-turn submit via Enter in the textbox\n",
+ " reply_in.submit(\n",
+ " refine_submit,\n",
+ " inputs=[reply_in, s_history, s_available, s_preferred],\n",
+ " outputs=[\n",
+ " reply_in, s_history, s_available, s_preferred,\n",
+ " preferred_md, available_md, audio_out, debug_box\n",
+ " ],\n",
+ " )\n",
+ "\n",
+ " # Multi-turn submit via explicit Send button\n",
+ " send_btn.click(\n",
+ " refine_submit,\n",
+ " inputs=[reply_in, s_history, s_available, s_preferred],\n",
+ " outputs=[\n",
+ " reply_in, s_history, s_available, s_preferred,\n",
+ " preferred_md, available_md, audio_out, debug_box\n",
+ " ],\n",
+ " )\n",
+ "\n",
+ " clear_btn.click(clear_debug, inputs=[], outputs=[debug_box])\n",
+ "\n",
+ "ui.launch(inbrowser=True, show_error=True)\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "llm-engineering",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week2/community-contributions/hopeogbons/README.md b/week2/community-contributions/hopeogbons/README.md
new file mode 100644
index 0000000..953a7e5
--- /dev/null
+++ b/week2/community-contributions/hopeogbons/README.md
@@ -0,0 +1,355 @@
+# 🏥 RoboCare AI Assistant
+
+> Born from a real problem at MyWoosah Inc—now solving caregiver matching through AI.
+
+## 📋 The Story Behind This Project
+
+While working on a caregiver matching platform for **MyWoosah Inc** in the US, I faced a real challenge: how do you efficiently match families with the right caregivers when everyone has different needs?
+
+Families would ask things like:
+
+- _"I need someone for my mom on Monday mornings who speaks Spanish"_
+- _"Can you find elder care in Boston under $30/hour with CPR certification?"_
+
+Writing individual SQL queries for every combination of filters was exhausting and error-prone. I knew there had to be a better way.
+
+That's when I discovered the **Andela LLM Engineering program**. I saw an opportunity to transform this problem into a solution using AI. Instead of rigid queries, what if families could just... talk? And the AI would understand, search, and recommend?
+
+This project is my answer. It's not just an exercise—it's solving a real problem I encountered in the field.
+
+## What It Does
+
+RoboCare helps families find caregivers through natural conversation:
+
+- 🔍 Searches the database intelligently
+- 🎯 Finds the best matches
+- 💬 Explains pros/cons in plain English
+- 🔊 Speaks the results back to you
+
+## ✨ Features
+
+### 🤖 AI-Powered Matching
+
+- Natural language conversation interface
+- Intelligent requirement gathering
+- Multi-criteria search optimization
+- Personalized recommendations with pros/cons analysis
+
+### 🔍 Advanced Search Capabilities
+
+- **Location-based filtering**: City, state, and country
+- **Service type matching**: Elder care, child care, companionship, dementia care, hospice support, and more
+- **Availability scheduling**: Day and time-based matching
+- **Budget optimization**: Maximum hourly rate filtering
+- **Language preferences**: Multi-language support
+- **Certification requirements**: CPR, CNA, BLS, and specialized certifications
+- **Experience filtering**: Minimum years of experience
+
+### 🎙️ Multi-Modal Interface
+
+- Text-based chat interface
+- Voice response generation (Text-to-Speech)
+- Multiple voice options (coral, alloy, echo, fable, onyx, nova, shimmer)
+- Clean, modern UI built with Gradio
+
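+The voice output is a single call to OpenAI's speech endpoint. A minimal sketch (the `speak` helper is illustrative; the notebook wires this into the chat flow and plays the bytes through Gradio's `Audio` component):
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+
+def speak(text: str, voice: str = "coral") -> bytes:
+    # Returns audio bytes that a Gradio Audio component can play back
+    audio = client.audio.speech.create(model="gpt-4o-mini-tts", voice=voice, input=text)
+    return audio.content
+```
+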
+### 🛡️ Defensive Architecture
+
+- Comprehensive error handling
+- Token overflow protection
+- Tool call validation
+- Graceful degradation
+
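+In practice this means the tool-dispatch path assumes the model's output can be malformed. The sketch below is illustrative rather than the exact notebook code: `TOOL_REGISTRY` is the registry described under Architecture, while the helper name and fallback messages are assumptions.
+
+```python
+import json
+
+def safe_tool_call(name: str, raw_args: str) -> dict:
+    """Dispatch one tool call, degrading gracefully instead of crashing."""
+    fn = TOOL_REGISTRY.get(name)
+    if fn is None:                      # tool call validation
+        return {"error": f"Unknown tool: {name}"}
+    try:
+        args = json.loads(raw_args or "{}")
+    except json.JSONDecodeError:        # the model emitted malformed JSON
+        return {"error": "Could not parse tool arguments"}
+    try:
+        return fn(**args)
+    except Exception as exc:            # comprehensive error handling
+        return {"error": f"{name} failed: {exc}"}
+```
+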
+## 🚀 Getting Started
+
+### Prerequisites
+
+- Python 3.8+
+- OpenAI API key
+- Virtual environment (recommended)
+
+### Installation
+
+1. **Navigate to the project folder**
+
+ ```bash
+ cd week2
+ ```
+
+2. **Create and activate virtual environment**
+
+ ```bash
+ python -m venv .venv
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
+ ```
+
+3. **Install dependencies**
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+4. **Set up environment variables**
+
+ Create a `.env` file in the project root:
+
+ ```env
+ OPENAI_API_KEY=your_openai_api_key_here
+ ```
+
+5. **Run the application**
+
+ ```bash
+ jupyter notebook "week2 EXERCISE.ipynb"
+ ```
+
+ Or run all cells sequentially in your Jupyter environment.
+
+## 📊 Database Schema
+
+### Tables
+
+#### `caregivers`
+
+Primary caregiver information including:
+
+- Personal details (name, gender)
+- Experience level
+- Hourly rate and currency
+- Location (city, state, country, coordinates)
+- Live-in availability
+
+#### `caregiver_services`
+
+Care types offered by each caregiver:
+
+- Elder care
+- Child care
+- Companionship
+- Post-op support
+- Special needs
+- Respite care
+- Dementia care
+- Hospice support
+
+#### `availability`
+
+Time slots when caregivers are available:
+
+- Day of week (Mon-Sun)
+- Start and end times (24-hour format)
+
+#### `languages`
+
+Languages spoken by caregivers
+
+#### `certifications`
+
+Professional certifications (CPR, CNA, BLS, etc.)
+
+#### `traits`
+
+Personality and professional traits
+
+## 🔧 Architecture
+
+### Tool Registry Pattern
+
+```python
+TOOL_REGISTRY = {
+ "search_caregivers": search_caregivers,
+ "get_caregiver_profile": get_caregiver_profile,
+ # ... more tools
+}
+```
+
+All database functions are registered and can be called by the AI dynamically.
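+
+When the model decides to call a tool, the response carries the tool name and JSON-encoded arguments, so dispatch is a dictionary lookup. A minimal sketch, assuming `response` came from a `chat.completions.create(...)` call made with these tools:
+
+```python
+import json
+
+# Route the first tool call from the model back through the registry
+call = response.choices[0].message.tool_calls[0]
+fn = TOOL_REGISTRY[call.function.name]              # e.g. "search_caregivers"
+result = fn(**json.loads(call.function.arguments))  # e.g. {"city": "Boston"}
+```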
+
+### Search Functions
+
+#### `search_caregivers()`
+
+Multi-filter search with parameters:
+
+- `city`, `state_province`, `country` - Location filters
+- `care_type` - Type of care needed
+- `min_experience` - Minimum years of experience
+- `max_hourly_rate` - Budget constraint
+- `live_in` - Live-in caregiver requirement
+- `language` - Language preference
+- `certification` - Required certification
+- `day` - Day of week availability
+- `time_between` - Time window availability
+- `limit`, `offset` - Pagination
+
+#### `get_caregiver_profile(caregiver_id)`
+
+Returns complete profile including:
+
+- Basic information
+- Services offered
+- Languages spoken
+- Certifications
+- Personality traits
+- Availability schedule
+
+## 🎨 UI Components
+
+### Main Interface
+
+- **Chat History**: Message-based conversation display
+- **Voice Response**: Auto-playing audio output
+- **Settings Sidebar**:
+ - AI Model selector
+ - Voice selection
+ - Audio toggle
+ - Clear conversation button
+
+### User Experience
+
+- Professional gradient header
+- Collapsible instructions
+- Helpful placeholder text
+- Custom CSS styling
+- Responsive layout
+
+## 📝 Usage Examples
+
+### Example 1: Basic Search
+
+```python
+results = search_caregivers(
+ city="New York",
+ care_type="elder care",
+ max_hourly_rate=30.0,
+ limit=5
+)
+```
+
+### Example 2: Language Filter
+
+```python
+results = search_caregivers(
+ care_type="child care",
+ language="Spanish",
+ limit=3
+)
+```
+
+### Example 3: Availability Search
+
+```python
+results = search_caregivers(
+ day="Mon",
+ time_between=("09:00", "17:00"),
+ city="Boston"
+)
+```
+
+### Example 4: Get Full Profile
+
+```python
+profile = get_caregiver_profile(caregiver_id=1)
+print(profile['services'])
+print(profile['availability'])
+```
+
+## 🔐 Security & Best Practices
+
+### Current Implementation
+
+- ✅ Environment variable management for API keys
+- ✅ SQL injection prevention (parameterized queries)
+- ✅ Error handling and graceful degradation
+- ✅ Input validation through tool schemas
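+
+The "parameterized queries" point means user-supplied filters are bound as SQL parameters, never formatted into the query string. A sketch of the pattern (column names follow the schema above; the helper itself is illustrative):
+
+```python
+import sqlite3
+
+def find_by_city_and_rate(db_path: str, city: str, max_rate: float):
+    conn = sqlite3.connect(db_path)
+    try:
+        # "?" placeholders let sqlite3 bind values safely, blocking SQL injection
+        return conn.execute(
+            "SELECT id, name, hourly_rate FROM caregivers "
+            "WHERE city = ? AND hourly_rate <= ?",
+            (city, max_rate),
+        ).fetchall()
+    finally:
+        conn.close()
+```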
+
+### Important Disclaimers
+
+⚠️ **This is a demonstration application**
+
+- Credentials and background checks are NOT verified
+- Families should independently verify all caregiver information
+- Not intended for production use without additional security measures
+
+## 🛠️ Tech Stack
+
+- **AI/ML**: OpenAI GPT-4o-mini, Text-to-Speech API
+- **Database**: SQLite with normalized schema
+- **UI Framework**: Gradio
+- **Language**: Python 3.8+
+- **Key Libraries**:
+ - `openai` - OpenAI API client
+ - `gradio` - Web interface
+ - `python-dotenv` - Environment management
+ - `sqlite3` - Database operations
+
+## 📈 What's Next
+
+### Immediate Plans
+
+- [ ] Add speech input (families could call and talk)
+- [ ] Connect to actual MyWoosah database
+- [ ] Background check API integration
+- [ ] Deploy for real users
+
+### Future Enhancements
+
+- [ ] Streaming responses for real-time interaction
+- [ ] Dynamic model switching
+- [ ] User authentication and profiles
+- [ ] Review and rating system
+- [ ] Payment integration
+- [ ] Calendar integration for scheduling
+
+## 💡 Key Learnings
+
+Through building this project, I learned:
+
+1. **Prompt engineering is critical** - Small keyword mismatches = zero results. Mapping "Monday" → "Mon" matters (see the sketch after this list).
+2. **Function calling is powerful** - Eliminated the need for custom queries. The AI figures it out.
+3. **Defensive programming saves headaches** - Things break. This code expects it and handles it elegantly.
+4. **AI makes databases accessible** - Good database design + AI = natural language interface
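+
+A tiny sketch of the day-name normalization behind learning #1 (the helper and mapping are illustrative assumptions about aligning user phrasing with the `availability.day` codes):
+
+```python
+DAY_CODES = {
+    "monday": "Mon", "tuesday": "Tue", "wednesday": "Wed", "thursday": "Thu",
+    "friday": "Fri", "saturday": "Sat", "sunday": "Sun",
+}
+
+def normalize_day(text: str) -> str:
+    """Map 'Monday', 'monday', or 'mon' to the DB's 'Mon' code."""
+    key = text.strip().lower()
+    return DAY_CODES.get(key, key[:3].capitalize())
+```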
+
+## 🌍 The Bigger Picture
+
+This isn't just about caregiving. The same pattern works for:
+
+- Healthcare appointment booking
+- Legal service matching
+- Tutoring and education platforms
+- Real estate agent matching
+- Any matching problem where natural language beats forms
+
+**AI doesn't replace good database design—it makes it accessible to everyone.**
+
+---
+
+## 🤝 Contributing
+
+This project was created as part of the **Andela LLM Engineering Week 2 Exercise**.
+
+Feedback and contributions are welcome! Feel free to:
+
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Run all cells to test
+5. Submit a pull request
+
+## 🙏 Acknowledgments
+
+- **MyWoosah Inc** - For the real-world problem that inspired this solution
+- **Andela LLM Engineering Program** - Educational framework and guidance
+- **OpenAI** - GPT-4o-mini and TTS APIs
+- **Gradio** - Making beautiful UIs accessible
+
+---
+
+
+
+**For MyWoosah Inc and beyond:** This is proof that AI can transform how we connect people with the care they need.
+
+_Built with ❤️ during Week 2 of the Andela LLM Engineering Program_
+
+**RoboOffice Ltd**
+
+
diff --git a/week2/community-contributions/hopeogbons/care_app.db b/week2/community-contributions/hopeogbons/care_app.db
new file mode 100644
index 0000000..93f8fdb
Binary files /dev/null and b/week2/community-contributions/hopeogbons/care_app.db differ
diff --git a/week2/community-contributions/hopeogbons/week2 EXERCISE.ipynb b/week2/community-contributions/hopeogbons/week2 EXERCISE.ipynb
new file mode 100644
index 0000000..6915f24
--- /dev/null
+++ b/week2/community-contributions/hopeogbons/week2 EXERCISE.ipynb
@@ -0,0 +1,1525 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
+ "metadata": {},
+ "source": [
+ "# 🏥 RoboCare AI Assistant\n",
+ "\n",
+ "## Why I Built This\n",
+ "\n",
+ "While working on a caregiver matching platform for **MyWoosah Inc** in the US, I faced a real challenge: how do you efficiently match families with the right caregivers when everyone has different needs?\n",
+ "\n",
+ "Families would ask things like:\n",
+ "- *\"I need someone for my mom on Monday mornings who speaks Spanish\"*\n",
+ "- *\"Can you find elder care in Boston under $30/hour with CPR certification?\"*\n",
+ "\n",
+ "Writing individual SQL queries for every combination of filters was exhausting and error-prone. I knew there had to be a better way.\n",
+ "\n",
+ "That's when I discovered the **Andela LLM Engineering program**. I saw an opportunity to transform this problem into a solution using AI. Instead of rigid queries, what if families could just... talk? And the AI would understand, search, and recommend?\n",
+ "\n",
+ "This project is my answer. It's not just an exercise—it's solving a real problem I encountered in the field.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## What This Does\n",
+ "\n",
+ "RoboCare helps families find caregivers through natural conversation. You tell it what you need, and it:\n",
+ "- 🔍 Searches the database intelligently\n",
+ "- 🎯 Finds the best matches\n",
+ "- 💬 Explains pros/cons in plain English \n",
+ "- 🔊 Speaks the results back to you\n",
+ "\n",
+ "**Tech:** OpenAI GPT-4o + Voice • Gradio UI • SQLite Database • Function Calling\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**Note:** This is a demonstration. Always verify credentials independently."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4381c40c",
+ "metadata": {},
+ "source": [
+ "## Step 1: Libraries\n",
+ "\n",
+ "The essentials: OpenAI for the AI brain, Gradio for the interface, SQLite for data storage.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "id": "185c6841",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports\n",
+ "\n",
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n",
+ "import sqlite3\n",
+ "import sqlite3\n",
+ "from textwrap import dedent\n",
+ "from contextlib import contextmanager\n",
+ "from typing import Optional, List, Dict, Any, Tuple"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2a366c15",
+ "metadata": {},
+ "source": [
+ "## Step 2: Setup\n",
+ "\n",
+ "Loading API keys securely (never hardcode them!), setting up the OpenAI client, and pointing to our database.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "id": "0e731b96",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "OpenAI API Key exists and begins sk-proj-\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Initialization\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ " \n",
+ "MODEL = \"gpt-4o-mini\"\n",
+ "openai = OpenAI()\n",
+ "\n",
+ "DB_PATH = \"care_app.db\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "686fa27a",
+ "metadata": {},
+ "source": [
+ "## Step 3: The Database\n",
+ "\n",
+ "20 sample caregivers across major US cities with:\n",
+ "- Services they offer (elder care, child care, etc.)\n",
+ "- Languages, certifications, availability\n",
+ "- Personality traits\n",
+ "- Realistic pricing and schedules\n",
+ "\n",
+ "This mirrors the kind of data MyWoosah Inc would manage—except here, AI does the matching work.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "id": "965d273d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Seeded: care_app.db\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Table creation and seeding\n",
+ "\n",
+ "SQL = '''\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS caregivers (\n",
+ " id INTEGER PRIMARY KEY,\n",
+ " name TEXT NOT NULL,\n",
+ " gender TEXT,\n",
+ " years_experience INTEGER,\n",
+ " live_in INTEGER, -- 0/1\n",
+ " hourly_rate REAL,\n",
+ " currency TEXT,\n",
+ " city TEXT,\n",
+ " state_province TEXT,\n",
+ " country TEXT,\n",
+ " postal_code TEXT,\n",
+ " lat REAL,\n",
+ " lon REAL\n",
+ " );\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS caregiver_services (\n",
+ " caregiver_id INTEGER,\n",
+ " care_type TEXT,\n",
+ " FOREIGN KEY (caregiver_id) REFERENCES caregivers(id)\n",
+ " );\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS availability (\n",
+ " caregiver_id INTEGER,\n",
+ " day TEXT, -- e.g., 'Mon'\n",
+ " time_start TEXT, -- 'HH:MM'\n",
+ " time_end TEXT, -- 'HH:MM'\n",
+ " FOREIGN KEY (caregiver_id) REFERENCES caregivers(id)\n",
+ " );\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS languages (\n",
+ " caregiver_id INTEGER,\n",
+ " language TEXT,\n",
+ " FOREIGN KEY (caregiver_id) REFERENCES caregivers(id)\n",
+ " );\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS certifications (\n",
+ " caregiver_id INTEGER,\n",
+ " cert TEXT,\n",
+ " FOREIGN KEY (caregiver_id) REFERENCES caregivers(id)\n",
+ " );\n",
+ "\n",
+ " CREATE TABLE IF NOT EXISTS traits (\n",
+ " caregiver_id INTEGER,\n",
+ " trait TEXT,\n",
+ " FOREIGN KEY (caregiver_id) REFERENCES caregivers(id)\n",
+ " );\n",
+ "\n",
+ " ----------------------------------------------------------\n",
+ "\n",
+ " -- Clear old data (optional)\n",
+ "\n",
+ " DELETE FROM traits;\n",
+ " DELETE FROM certifications;\n",
+ " DELETE FROM languages;\n",
+ " DELETE FROM availability;\n",
+ " DELETE FROM caregiver_services;\n",
+ " DELETE FROM caregivers;\n",
+ "\n",
+ " -- Seed caregivers (20 examples, all USA)\n",
+ "\n",
+ " INSERT INTO caregivers\n",
+ " (id, name, gender, years_experience, live_in, hourly_rate, currency, city, state_province, country, postal_code, lat, lon)\n",
+ " VALUES\n",
+ " (1, 'Grace Williams', 'female', 6, 0, 28, 'USD', 'New York', 'NY', 'USA', '10001', 40.7128, -74.0060),\n",
+ " (2, 'Miguel Alvarez', 'male', 9, 1, 30, 'USD', 'Los Angeles', 'CA', 'USA', '90012', 34.0522, -118.2437),\n",
+ " (3, 'Ava Johnson', 'female', 4, 0, 24, 'USD', 'Chicago', 'IL', 'USA', '60601', 41.8781, -87.6298),\n",
+ " (4, 'Noah Robinson', 'male', 12, 0, 27, 'USD', 'Houston', 'TX', 'USA', '77002', 29.7604, -95.3698),\n",
+ " (5, 'Sophia Martinez', 'female', 8, 0, 29, 'USD', 'Phoenix', 'AZ', 'USA', '85004', 33.4484, -112.0740),\n",
+ " (6, 'Daniel Carter', 'male', 10, 1, 31, 'USD', 'Philadelphia', 'PA', 'USA', '19103', 39.9526, -75.1652),\n",
+ " (7, 'Emily Nguyen', 'female', 7, 0, 26, 'USD', 'San Antonio', 'TX', 'USA', '78205', 29.4241, -98.4936),\n",
+ " (8, 'Olivia Kim', 'female', 5, 0, 27, 'USD', 'San Diego', 'CA', 'USA', '92101', 32.7157, -117.1611),\n",
+ " (9, 'James Thompson', 'male', 15, 1, 34, 'USD', 'Dallas', 'TX', 'USA', '75201', 32.7767, -96.7970),\n",
+ " (10, 'Isabella Garcia', 'female', 3, 0, 22, 'USD', 'San Jose', 'CA', 'USA', '95113', 37.3382, -121.8863),\n",
+ " (11, 'Ethan Patel', 'male', 11, 1, 33, 'USD', 'Austin', 'TX', 'USA', '78701', 30.2672, -97.7431),\n",
+ " (12, 'Harper Brooks', 'female', 2, 0, 20, 'USD', 'Jacksonville', 'FL', 'USA', '32202', 30.3322, -81.6557),\n",
+ " (13, 'Logan White', 'male', 6, 0, 25, 'USD', 'Fort Worth', 'TX', 'USA', '76102', 32.7555, -97.3308),\n",
+ " (14, 'Amelia Davis', 'female', 9, 0, 28, 'USD', 'Columbus', 'OH', 'USA', '43215', 39.9612, -82.9988),\n",
+ " (15, 'Charlotte Reed', 'female', 14, 1, 32, 'USD', 'Charlotte', 'NC', 'USA', '28202', 35.2271, -80.8431),\n",
+ " (16, 'Jackson Lee', 'male', 5, 0, 26, 'USD', 'San Francisco', 'CA', 'USA', '94102', 37.7749, -122.4194),\n",
+ " (17, 'Avery Chen', 'female', 7, 0, 27, 'USD', 'Seattle', 'WA', 'USA', '98101', 47.6062, -122.3321),\n",
+ " (18, 'William Turner', 'male', 13, 1, 35, 'USD', 'Denver', 'CO', 'USA', '80202', 39.7392, -104.9903),\n",
+ " (19, 'Natalie O''Brien', 'female', 16, 0, 36, 'USD', 'Boston', 'MA', 'USA', '02108', 42.3601, -71.0589),\n",
+ " (20, 'Maya Robinson', 'female', 3, 0, 23, 'USD', 'Atlanta', 'GA', 'USA', '30303', 33.7488, -84.3880);\n",
+ "\n",
+ " -- Seed caregiver services\n",
+ "\n",
+ " INSERT INTO caregiver_services (caregiver_id, care_type) VALUES\n",
+ " (1, 'elder care'), (1, 'companionship'),\n",
+ " (2, 'post-op support'), (2, 'elder care'),\n",
+ " (3, 'child care'), (3, 'special needs'),\n",
+ " (4, 'respite care'), (4, 'elder care'),\n",
+ " (5, 'dementia care'), (5, 'companionship'),\n",
+ " (6, 'elder care'), (6, 'hospice support'),\n",
+ " (7, 'child care'), (7, 'respite care'),\n",
+ " (8, 'post-op support'), (8, 'companionship'),\n",
+ " (9, 'special needs'), (9, 'elder care'),\n",
+ " (10, 'child care'), (10, 'companionship'),\n",
+ " (11, 'dementia care'), (11, 'post-op support'),\n",
+ " (12, 'child care'), (12, 'special needs'),\n",
+ " (13, 'respite care'), (13, 'companionship'),\n",
+ " (14, 'elder care'), (14, 'post-op support'),\n",
+ " (15, 'hospice support'), (15, 'dementia care'),\n",
+ " (16, 'elder care'), (16, 'respite care'),\n",
+ " (17, 'special needs'), (17, 'companionship'),\n",
+ " (18, 'post-op support'), (18, 'elder care'),\n",
+ " (19, 'dementia care'), (19, 'hospice support'),\n",
+ " (20, 'child care'), (20, 'companionship');\n",
+ "\n",
+ " -- Seed availability (Mon-Sun samples)\n",
+ "\n",
+ " INSERT INTO availability (caregiver_id, day, time_start, time_end) VALUES\n",
+ " -- 1 Grace (NY): evenings + Sun\n",
+ " (1, 'Mon', '17:30', '22:00'),\n",
+ " (1, 'Thu', '17:30', '22:00'),\n",
+ " (1, 'Sun', '10:00', '16:00'),\n",
+ " -- 2 Miguel (LA): live-in, long blocks\n",
+ " (2, 'Tue', '08:00', '20:00'),\n",
+ " (2, 'Thu', '08:00', '20:00'),\n",
+ " (2, 'Sat', '09:00', '18:00'),\n",
+ " -- 3 Ava (CHI): weekdays 09-17\n",
+ " (3, 'Mon', '09:00', '17:00'),\n",
+ " (3, 'Wed', '09:00', '17:00'),\n",
+ " (3, 'Fri', '09:00', '17:00'),\n",
+ " -- 4 Noah (HOU): Tue-Fri 08-16\n",
+ " (4, 'Tue', '08:00', '16:00'),\n",
+ " (4, 'Wed', '08:00', '16:00'),\n",
+ " (4, 'Thu', '08:00', '16:00'),\n",
+ " -- 5 Sophia (PHX): Thu-Sun 10-18\n",
+ " (5, 'Thu', '10:00', '18:00'),\n",
+ " (5, 'Fri', '10:00', '18:00'),\n",
+ " (5, 'Sat', '10:00', '18:00'),\n",
+ " -- 6 Daniel (PHL): Mon-Thu 07-15\n",
+ " (6, 'Mon', '07:00', '15:00'),\n",
+ " (6, 'Tue', '07:00', '15:00'),\n",
+ " (6, 'Thu', '07:00', '15:00'),\n",
+ " -- 7 Emily (SAT): weekends\n",
+ " (7, 'Sat', '08:00', '17:00'),\n",
+ " (7, 'Sun', '09:00', '17:00'),\n",
+ " (7, 'Fri', '17:00', '21:00'),\n",
+ " -- 8 Olivia (SD): Mon, Wed evenings\n",
+ " (8, 'Mon', '16:00', '21:00'),\n",
+ " (8, 'Wed', '16:00', '21:00'),\n",
+ " (8, 'Sat', '10:00', '14:00'),\n",
+ " -- 9 James (DAL): live-in wide\n",
+ " (9, 'Mon', '07:00', '19:00'),\n",
+ " (9, 'Wed', '07:00', '19:00'),\n",
+ " (9, 'Sun', '09:00', '17:00'),\n",
+ " -- 10 Isabella (SJ): Tue-Thu 12-20\n",
+ " (10, 'Tue', '12:00', '20:00'),\n",
+ " (10, 'Wed', '12:00', '20:00'),\n",
+ " (10, 'Thu', '12:00', '20:00'),\n",
+ " -- 11 Ethan (ATX): nights\n",
+ " (11, 'Mon', '18:00', '23:00'),\n",
+ " (11, 'Tue', '18:00', '23:00'),\n",
+ " (11, 'Fri', '18:00', '23:00'),\n",
+ " -- 12 Harper (JAX): school hours\n",
+ " (12, 'Mon', '09:00', '14:00'),\n",
+ " (12, 'Wed', '09:00', '14:00'),\n",
+ " (12, 'Fri', '09:00', '14:00'),\n",
+ " -- 13 Logan (FTW): Thu-Sat\n",
+ " (13, 'Thu', '10:00', '18:00'),\n",
+ " (13, 'Fri', '10:00', '18:00'),\n",
+ " (13, 'Sat', '10:00', '18:00'),\n",
+ " -- 14 Amelia (CMH): Mon-Fri 08-16\n",
+ " (14, 'Mon', '08:00', '16:00'),\n",
+ " (14, 'Tue', '08:00', '16:00'),\n",
+ " (14, 'Thu', '08:00', '16:00'),\n",
+ " -- 15 Charlotte (CLT): live-in style\n",
+ " (15, 'Tue', '07:00', '19:00'),\n",
+ " (15, 'Thu', '07:00', '19:00'),\n",
+ " (15, 'Sat', '08:00', '16:00'),\n",
+ " -- 16 Jackson (SF): split shifts\n",
+ " (16, 'Mon', '07:00', '11:00'),\n",
+ " (16, 'Mon', '17:00', '21:00'),\n",
+ " (16, 'Sat', '12:00', '18:00'),\n",
+ " -- 17 Avery (SEA): Tue/Thu + Sun\n",
+ " (17, 'Tue', '10:00', '18:00'),\n",
+ " (17, 'Thu', '10:00', '18:00'),\n",
+ " (17, 'Sun', '11:00', '17:00'),\n",
+ " -- 18 William (DEN): Mon-Wed 06-14\n",
+ " (18, 'Mon', '06:00', '14:00'),\n",
+ " (18, 'Tue', '06:00', '14:00'),\n",
+ " (18, 'Wed', '06:00', '14:00'),\n",
+ " -- 19 Natalie (BOS): Tue-Fri 09-17\n",
+ " (19, 'Tue', '09:00', '17:00'),\n",
+ " (19, 'Wed', '09:00', '17:00'),\n",
+ " (19, 'Fri', '09:00', '17:00'),\n",
+ " -- 20 Maya (ATL): after-school + Sat\n",
+ " (20, 'Mon', '15:00', '20:00'),\n",
+ " (20, 'Wed', '15:00', '20:00'),\n",
+ " (20, 'Sat', '09:00', '15:00');\n",
+ "\n",
+ " -- Seed languages\n",
+ "\n",
+ " INSERT INTO languages (caregiver_id, language) VALUES\n",
+ " (1, 'English'), (1, 'Spanish'),\n",
+ " (2, 'English'), (2, 'Spanish'),\n",
+ " (3, 'English'),\n",
+ " (4, 'English'),\n",
+ " (5, 'English'), (5, 'Spanish'),\n",
+ " (6, 'English'),\n",
+ " (7, 'English'), (7, 'Vietnamese'),\n",
+ " (8, 'English'), (8, 'Korean'),\n",
+ " (9, 'English'),\n",
+ " (10,'English'), (10,'Spanish'),\n",
+ " (11,'English'), (11,'Hindi'),\n",
+ " (12,'English'),\n",
+ " (13,'English'),\n",
+ " (14,'English'), (14,'French'),\n",
+ " (15,'English'),\n",
+ " (16,'English'), (16,'Tagalog'),\n",
+ " (17,'English'), (17,'Mandarin'),\n",
+ " (18,'English'),\n",
+ " (19,'English'), (19,'Portuguese'),\n",
+ " (20,'English'), (20,'ASL');\n",
+ "\n",
+ " -- Seed certifications\n",
+ "\n",
+ " INSERT INTO certifications (caregiver_id, cert) VALUES\n",
+ " (1, 'CPR'), (1, 'First Aid'),\n",
+ " (2, 'CPR'), (2, 'BLS'),\n",
+ " (3, 'CPR'),\n",
+ " (4, 'First Aid'), (4, 'CNA'),\n",
+ " (5, 'CPR'), (5, 'Dementia Care'),\n",
+ " (6, 'HHA'), (6, 'CPR'),\n",
+ " (7, 'First Aid'),\n",
+ " (8, 'CPR'), (8, 'AED'),\n",
+ " (9, 'CNA'), (9, 'BLS'),\n",
+ " (10,'First Aid'),\n",
+ " (11,'CPR'), (11,'Medication Technician'),\n",
+ " (12,'CPR'),\n",
+ " (13,'First Aid'),\n",
+ " (14,'CPR'), (14,'CNA'),\n",
+ " (15,'Hospice Training'), (15,'CPR'),\n",
+ " (16,'First Aid'),\n",
+ " (17,'CPR'), (17,'Special Needs Training'),\n",
+ " (18,'BLS'), (18,'CPR'),\n",
+ " (19,'Dementia Care'), (19,'First Aid'),\n",
+ " (20,'CPR'), (20,'Childcare Safety');\n",
+ "\n",
+ " -- Seed traits\n",
+ "\n",
+ " INSERT INTO traits (caregiver_id, trait) VALUES\n",
+ " (1, 'empathetic'), (1, 'detail-oriented'),\n",
+ " (2, 'patient'), (2, 'communicative'),\n",
+ " (3, 'cheerful'), (3, 'reliable'),\n",
+ " (4, 'organized'), (4, 'professional'),\n",
+ " (5, 'compassionate'), (5, 'trustworthy'),\n",
+ " (6, 'calm under pressure'), (6, 'punctual'),\n",
+ " (7, 'adaptable'), (7, 'energetic'),\n",
+ " (8, 'friendly'), (8, 'respectful'),\n",
+ " (9, 'thorough'), (9, 'dependable'),\n",
+ " (10,'gentle'), (10,'attentive'),\n",
+ " (11,'proactive'), (11,'communicative'),\n",
+ " (12,'patient'), (12,'kind'),\n",
+ " (13,'flexible'), (13,'tidy'),\n",
+ " (14,'reliable'), (14,'punctual'),\n",
+ " (15,'compassionate'), (15,'detail-oriented'),\n",
+ " (16,'discreet'), (16,'organized'),\n",
+ " (17,'empathetic'), (17,'calm under pressure'),\n",
+ " (18,'professional'), (18,'thorough'),\n",
+ " (19,'trustworthy'), (19,'proactive'),\n",
+ " (20,'cheerful'), (20,'attentive');\n",
+ "\n",
+ "'''\n",
+ "\n",
+ "# Insert the data into the database\n",
+ "\n",
+ "sql = dedent(SQL)\n",
+ "con = sqlite3.connect(DB_PATH)\n",
+ "con.executescript(sql)\n",
+ "con.commit()\n",
+ "con.close()\n",
+ "print(\"Seeded:\", DB_PATH)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c0baa64",
+ "metadata": {},
+ "source": [
+ "## Step 4: Teaching the AI to Search\n",
+ "\n",
+ "Instead of the AI just talking, we teach it to actually *search* the database.\n",
+ "\n",
+ "When someone says *\"I need elder care in Boston for Mondays\"*, the AI translates that into:\n",
+ "```python\n",
+ "search_caregivers(city=\"Boston\", care_type=\"elder care\", day=\"Mon\")\n",
+ "```\n",
+ "\n",
+ "This schema defines all the filters the AI can use: location, services, budget, language, availability, and more.\n",
+ "\n",
+ "**This was the breakthrough.** No more writing custom queries—the AI figures it out.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "id": "f2af7c67",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[{'type': 'function',\n",
+ " 'function': {'name': 'search_caregivers',\n",
+ " 'description': 'Flexible multi-filter caregiver search. Any filter can be omitted. Supports location, service type, experience, pricing, live-in, language, certifications, day/time availability, and pagination.',\n",
+ " 'parameters': {'type': 'object',\n",
+ " 'properties': {'city': {'type': 'string',\n",
+ " 'description': 'City name to filter by (optional).'},\n",
+ " 'state_province': {'type': 'string',\n",
+ " 'description': 'State or province to filter by (optional).'},\n",
+ " 'country': {'type': 'string',\n",
+ " 'description': 'Country to filter by (optional).'},\n",
+ " 'care_type': {'type': 'string',\n",
+ " 'description': \"Service category, e.g., 'elder_care', 'child_care', 'pet_care', 'housekeeping' (optional).\"},\n",
+ " 'min_experience': {'type': 'integer',\n",
+ " 'minimum': 0,\n",
+ " 'description': 'Minimum years of experience (optional).'},\n",
+ " 'max_hourly_rate': {'type': 'number',\n",
+ " 'minimum': 0,\n",
+ " 'description': 'Maximum hourly rate in local currency (optional).'},\n",
+ " 'live_in': {'type': 'boolean',\n",
+ " 'description': 'Require live-in caregivers (optional).'},\n",
+ " 'language': {'type': 'string',\n",
+ " 'description': \"Required spoken language, e.g., 'English', 'Spanish' (optional).\"},\n",
+ " 'certification': {'type': 'string',\n",
+ " 'description': \"Required certification, e.g., 'CPR', 'CNA' (optional).\"},\n",
+ " 'day': {'type': 'string',\n",
+ " 'description': \"Day of week to match availability (optional), e.g., 'Monday', 'Tuesday', ... 'Sunday'.\"},\n",
+ " 'time_between': {'type': 'array',\n",
+ " 'description': \"Required availability window as ['HH:MM','HH:MM'] in 24h time. Matches caregivers whose availability window fully covers this range.\",\n",
+ " 'items': {'type': 'string',\n",
+ " 'pattern': '^\\\\d{2}:\\\\d{2}$',\n",
+ " 'description': \"Time in 'HH:MM' 24-hour format.\"},\n",
+ " 'minItems': 2,\n",
+ " 'maxItems': 2},\n",
+ " 'limit': {'type': 'integer',\n",
+ " 'minimum': 1,\n",
+ " 'maximum': 1000,\n",
+ " 'default': 50,\n",
+ " 'description': 'Max number of results to return (default 50).'},\n",
+ " 'offset': {'type': 'integer',\n",
+ " 'minimum': 0,\n",
+ " 'default': 0,\n",
+ " 'description': 'Number of results to skip for pagination (default 0).'}},\n",
+ " 'required': [],\n",
+ " 'additionalProperties': False}}}]"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Tool definition schema\n",
+ "\n",
+ "tools = [{\n",
+ " \"type\": \"function\",\n",
+ " \"function\": {\n",
+ " \"name\": \"search_caregivers\",\n",
+ " \"description\": (\n",
+ " \"Flexible multi-filter caregiver search. Any filter can be omitted. \"\n",
+ " \"Supports location, service type, experience, pricing, live-in, language, \"\n",
+ " \"certifications, day/time availability, and pagination.\"\n",
+ " ),\n",
+ " \"parameters\": {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"city\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"City name to filter by (optional).\"\n",
+ " },\n",
+ " \"state_province\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"State or province to filter by (optional).\"\n",
+ " },\n",
+ " \"country\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"Country to filter by (optional).\"\n",
+ " },\n",
+ " \"care_type\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": (\n",
+ " \"Service category, e.g., 'elder_care', 'child_care', \"\n",
+ " \"'pet_care', 'housekeeping' (optional).\"\n",
+ " )\n",
+ " },\n",
+ " \"min_experience\": {\n",
+ " \"type\": \"integer\",\n",
+ " \"minimum\": 0,\n",
+ " \"description\": \"Minimum years of experience (optional).\"\n",
+ " },\n",
+ " \"max_hourly_rate\": {\n",
+ " \"type\": \"number\",\n",
+ " \"minimum\": 0,\n",
+ " \"description\": \"Maximum hourly rate in local currency (optional).\"\n",
+ " },\n",
+ " \"live_in\": {\n",
+ " \"type\": \"boolean\",\n",
+ " \"description\": \"Require live-in caregivers (optional).\"\n",
+ " },\n",
+ " \"language\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"Required spoken language, e.g., 'English', 'Spanish' (optional).\"\n",
+ " },\n",
+ " \"certification\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"Required certification, e.g., 'CPR', 'CNA' (optional).\"\n",
+ " },\n",
+ " \"day\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": (\n",
+ " \"Day of week to match availability (optional), e.g., \"\n",
+ " \"'Monday', 'Tuesday', ... 'Sunday'.\"\n",
+ " )\n",
+ " },\n",
+ " \"time_between\": {\n",
+ " \"type\": \"array\",\n",
+ " \"description\": (\n",
+ " \"Required availability window as ['HH:MM','HH:MM'] in 24h time. \"\n",
+ " \"Matches caregivers whose availability window fully covers this range.\"\n",
+ " ),\n",
+ " \"items\": {\n",
+ " \"type\": \"string\",\n",
+ " \"pattern\": \"^\\\\d{2}:\\\\d{2}$\",\n",
+ " \"description\": \"Time in 'HH:MM' 24-hour format.\"\n",
+ " },\n",
+ " \"minItems\": 2,\n",
+ " \"maxItems\": 2\n",
+ " },\n",
+ " \"limit\": {\n",
+ " \"type\": \"integer\",\n",
+ " \"minimum\": 1,\n",
+ " \"maximum\": 1000,\n",
+ " \"default\": 50,\n",
+ " \"description\": \"Max number of results to return (default 50).\"\n",
+ " },\n",
+ " \"offset\": {\n",
+ " \"type\": \"integer\",\n",
+ " \"minimum\": 0,\n",
+ " \"default\": 0,\n",
+ " \"description\": \"Number of results to skip for pagination (default 0).\"\n",
+ " }\n",
+ " },\n",
+ " \"required\": [],\n",
+ " \"additionalProperties\": False\n",
+ " }\n",
+ " }\n",
+ "}]\n",
+ "\n",
+ "tools"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76416da2",
+ "metadata": {},
+ "source": [
+ "## Step 5: Helper Functions\n",
+ "\n",
+ "**Voice:** The AI can speak its responses using OpenAI's text-to-speech.\n",
+ "\n",
+ "**Database functions:** All the queries we need—search, get profiles, check availability, etc. These are what the AI calls behind the scenes.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "id": "2f50cc15",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Convert text to speech using OpenAI's TTS API\n",
+ "def announcements(message):\n",
+ " response = openai.audio.speech.create(\n",
+ " model=\"gpt-4o-mini-tts\",\n",
+ " voice=\"coral\", # Also, try replacing onyx with alloy or coral\n",
+ " input=message\n",
+ " )\n",
+ " return response.content\n",
+ "\n",
+ "# Context manager for database connection\n",
+ "@contextmanager\n",
+ "def _conn(dict_rows: bool = True):\n",
+ " conn = sqlite3.connect(DB_PATH)\n",
+ " if dict_rows:\n",
+ " conn.row_factory = _dict_factory\n",
+ " try:\n",
+ " yield conn\n",
+ " conn.commit()\n",
+ " finally:\n",
+ " conn.close()\n",
+ "\n",
+ "####################\n",
+ "# Helper functions #\n",
+ "####################\n",
+ "\n",
+ "# Converts SQLite query results from tuples into dictionaries\n",
+ "def _dict_factory(cursor, row):\n",
+ " return {col[0]: row[idx] for idx, col in enumerate(cursor.description)}\n",
+ "# A debug/logging function that prints database tool activity\n",
+ "def _print(msg: str):\n",
+ " print(f\"DATABASE TOOL CALLED: {msg}\", flush=True)\n",
+ "\n",
+ "################################\n",
+ "# Caregiver database functions #\n",
+ "################################\n",
+ "\n",
+ "# Counts the number of caregivers in the database\n",
+ "def get_caregiver_count() -> int:\n",
+ " _print(\"Counting caregivers\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"SELECT COUNT(*) AS n FROM caregivers\")\n",
+ " return cur.fetchone()[\"n\"]\n",
+ "\n",
+ "# Fetches a caregiver's profile by their ID\n",
+ "def get_caregiver(caregiver_id: int) -> Optional[Dict[str, Any]]:\n",
+ " _print(f\"Fetching caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"SELECT * FROM caregivers WHERE id = ?\", (caregiver_id,))\n",
+ " return cur.fetchone()\n",
+ "\n",
+ "# Lists caregivers with pagination\n",
+ "def list_caregivers(limit: int = 20, offset: int = 0) -> List[Dict[str, Any]]:\n",
+ " _print(f\"Listing caregivers (limit={limit}, offset={offset})\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT * FROM caregivers\n",
+ " ORDER BY id\n",
+ " LIMIT ? OFFSET ?\n",
+ " \"\"\", (limit, offset))\n",
+ " return cur.fetchall()\n",
+ "\n",
+ "# Fetches the services a caregiver offers\n",
+ "def get_services(caregiver_id: int) -> List[str]:\n",
+ " _print(f\"Fetching services for caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT care_type FROM caregiver_services WHERE caregiver_id = ?\n",
+ " ORDER BY care_type\n",
+ " \"\"\", (caregiver_id,))\n",
+ " return [r[\"care_type\"] for r in cur.fetchall()]\n",
+ "\n",
+ "# Fetches the languages a caregiver speaks\n",
+ "def get_languages(caregiver_id: int) -> List[str]:\n",
+ " _print(f\"Fetching languages for caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT language FROM languages WHERE caregiver_id = ?\n",
+ " ORDER BY language\n",
+ " \"\"\", (caregiver_id,))\n",
+ " return [r[\"language\"] for r in cur.fetchall()]\n",
+ "\n",
+ "# Fetches the certifications a caregiver has\n",
+ "def get_certifications(caregiver_id: int) -> List[str]:\n",
+ " _print(f\"Fetching certifications for caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT cert FROM certifications WHERE caregiver_id = ?\n",
+ " ORDER BY cert\n",
+ " \"\"\", (caregiver_id,))\n",
+ " return [r[\"cert\"] for r in cur.fetchall()]\n",
+ "\n",
+ "# Fetches the traits a caregiver has\n",
+ "def get_traits(caregiver_id: int) -> List[str]:\n",
+ " _print(f\"Fetching traits for caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT trait FROM traits WHERE caregiver_id = ?\n",
+ " ORDER BY trait\n",
+ " \"\"\", (caregiver_id,))\n",
+ " return [r[\"trait\"] for r in cur.fetchall()]\n",
+ "\n",
+ "# Fetches the availability of a caregiver\n",
+ "def get_availability(caregiver_id: int) -> List[Dict[str, str]]:\n",
+ " _print(f\"Fetching availability for caregiver #{caregiver_id}\")\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(\"\"\"\n",
+ " SELECT day, time_start, time_end\n",
+ " FROM availability\n",
+ " WHERE caregiver_id = ?\n",
+ " ORDER BY\n",
+ " CASE day\n",
+ " WHEN 'Mon' THEN 1 WHEN 'Tue' THEN 2 WHEN 'Wed' THEN 3\n",
+ " WHEN 'Thu' THEN 4 WHEN 'Fri' THEN 5 WHEN 'Sat' THEN 6\n",
+ " WHEN 'Sun' THEN 7 ELSE 8\n",
+ " END, time_start\n",
+ " \"\"\", (caregiver_id,))\n",
+ " return cur.fetchall()\n",
+ "\n",
+ "# Fetches a caregiver's full profile\n",
+ "def get_caregiver_profile(caregiver_id: int) -> Optional[Dict[str, Any]]:\n",
+ " _print(f\"Fetching full profile for caregiver #{caregiver_id}\")\n",
+ " base = get_caregiver(caregiver_id)\n",
+ " if not base:\n",
+ " return None\n",
+ " base[\"services\"] = get_services(caregiver_id)\n",
+ " base[\"languages\"] = get_languages(caregiver_id)\n",
+ " base[\"certifications\"] = get_certifications(caregiver_id)\n",
+ " base[\"traits\"] = get_traits(caregiver_id)\n",
+ " base[\"availability\"] = get_availability(caregiver_id)\n",
+ " return base\n",
+ "\n",
+ "###########################################\n",
+ "# Search caregivers with multiple filters #\n",
+ "###########################################\n",
+ "\n",
+ "def search_caregivers(\n",
+ " city: Optional[str] = None,\n",
+ " state_province: Optional[str] = None,\n",
+ " country: Optional[str] = None,\n",
+ " care_type: Optional[str] = None,\n",
+ " min_experience: Optional[int] = None,\n",
+ " max_hourly_rate: Optional[float] = None,\n",
+ " live_in: Optional[bool] = None,\n",
+ " language: Optional[str] = None,\n",
+ " certification: Optional[str] = None,\n",
+ " day: Optional[str] = None,\n",
+ " time_between: Optional[Tuple[str, str]] = None, # ('HH:MM', 'HH:MM')\n",
+ " limit: int = 50,\n",
+ " offset: int = 0\n",
+ ") -> List[Dict[str, Any]]:\n",
+ " \"\"\"\n",
+ " Flexible multi-filter search. Any filter can be omitted.\n",
+ " \"\"\"\n",
+ " _print(\"Searching caregivers with multiple filters\")\n",
+ "\n",
+ " # base + optional joins\n",
+ " join_clauses = []\n",
+ " where = [\"1=1\"]\n",
+ " params: List[Any] = []\n",
+ "\n",
+ " if care_type:\n",
+ " join_clauses.append(\"JOIN caregiver_services s ON s.caregiver_id = c.id\")\n",
+ " where.append(\"LOWER(s.care_type) = LOWER(?)\")\n",
+ " params.append(care_type)\n",
+ "\n",
+ " if language:\n",
+ " join_clauses.append(\"JOIN languages l ON l.caregiver_id = c.id\")\n",
+ " where.append(\"LOWER(l.language) = LOWER(?)\")\n",
+ " params.append(language)\n",
+ "\n",
+ " if certification:\n",
+ " join_clauses.append(\"JOIN certifications cert ON cert.caregiver_id = c.id\")\n",
+ " where.append(\"LOWER(cert.cert) = LOWER(?)\")\n",
+ " params.append(certification)\n",
+ "\n",
+ " if day or time_between:\n",
+ " join_clauses.append(\"JOIN availability a ON a.caregiver_id = c.id\")\n",
+ " if day:\n",
+ " where.append(\"a.day = ?\")\n",
+ " params.append(day)\n",
+ " if time_between:\n",
+ " t0, t1 = time_between\n",
+ " # overlap check: caregiver window [start,end] must include [t0,t1]\n",
+ " where.append(\"a.time_start <= ? AND a.time_end >= ?\")\n",
+ " params.extend([t0, t1])\n",
+ "\n",
+ " if city:\n",
+ " where.append(\"LOWER(c.city) = LOWER(?)\")\n",
+ " params.append(city)\n",
+ " if state_province:\n",
+ " where.append(\"LOWER(c.state_province) = LOWER(?)\")\n",
+ " params.append(state_province)\n",
+ " if country:\n",
+ " where.append(\"LOWER(c.country) = LOWER(?)\")\n",
+ " params.append(country)\n",
+ " if min_experience is not None:\n",
+ " where.append(\"c.years_experience >= ?\")\n",
+ " params.append(min_experience)\n",
+ " if max_hourly_rate is not None:\n",
+ " where.append(\"c.hourly_rate <= ?\")\n",
+ " params.append(max_hourly_rate)\n",
+ " if live_in is not None:\n",
+ " where.append(\"c.live_in = ?\")\n",
+ " params.append(1 if live_in else 0)\n",
+ "\n",
+ " sql = f\"\"\"\n",
+ " SELECT DISTINCT c.*\n",
+ " FROM caregivers c\n",
+ " {' '.join(join_clauses)}\n",
+ " WHERE {' AND '.join(where)}\n",
+ " ORDER BY c.hourly_rate ASC, c.years_experience DESC, c.id\n",
+ " LIMIT ? OFFSET ?\n",
+ " \"\"\"\n",
+ " params.extend([limit, offset])\n",
+ "\n",
+ " with _conn() as conn:\n",
+ " cur = conn.cursor()\n",
+ " cur.execute(sql, tuple(params))\n",
+ " return cur.fetchall()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6c526d05",
+ "metadata": {},
+ "source": [
+ "## Step 6: Quick Test\n",
+ "\n",
+ "Before connecting everything to the AI, let's make sure the database works. Run these examples to see sample caregivers and their profiles.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "id": "98165a21",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "DATABASE TOOL CALLED: Searching caregivers with multiple filters\n",
+ "Found 1 elder care providers in New York:\n",
+ "- Grace Williams: $28.0/hr, 6 years experience\n",
+ "\n",
+ "============================================================\n",
+ "\n",
+ "DATABASE TOOL CALLED: Searching caregivers with multiple filters\n",
+ "Found 1 Spanish-speaking child care providers:\n",
+ "- Isabella Garcia in San Jose, CA\n",
+ "\n",
+ "============================================================\n",
+ "\n",
+ "DATABASE TOOL CALLED: Fetching full profile for caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching services for caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching languages for caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching certifications for caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching traits for caregiver #1\n",
+ "DATABASE TOOL CALLED: Fetching availability for caregiver #1\n",
+ "Detailed profile for Grace Williams:\n",
+ " Services: companionship, elder care\n",
+ " Languages: English, Spanish\n",
+ " Certifications: CPR, First Aid\n",
+ " Traits: detail-oriented, empathetic\n",
+ " Availability: 3 time slots\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example 1: Search for elder care providers in New York\n",
+ "results = search_caregivers(\n",
+ " city=\"New York\",\n",
+ " care_type=\"elder care\",\n",
+ " max_hourly_rate=30.0,\n",
+ " limit=5\n",
+ ")\n",
+ "\n",
+ "print(f\"Found {len(results)} elder care providers in New York:\")\n",
+ "for caregiver in results:\n",
+ " print(f\"- {caregiver['name']}: ${caregiver['hourly_rate']}/hr, {caregiver['years_experience']} years experience\")\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*60 + \"\\n\")\n",
+ "\n",
+ "# Example 2: Search for Spanish-speaking child care providers\n",
+ "results2 = search_caregivers(\n",
+ " care_type=\"child care\",\n",
+ " language=\"Spanish\",\n",
+ " limit=3\n",
+ ")\n",
+ "\n",
+ "print(f\"Found {len(results2)} Spanish-speaking child care providers:\")\n",
+ "for caregiver in results2:\n",
+ " print(f\"- {caregiver['name']} in {caregiver['city']}, {caregiver['state_province']}\")\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*60 + \"\\n\")\n",
+ "\n",
+ "# Example 3: Get detailed profile of a specific caregiver\n",
+ "if results:\n",
+ " caregiver_id = results[0]['id']\n",
+ " profile = get_caregiver_profile(caregiver_id)\n",
+ " print(f\"Detailed profile for {profile['name']}:\")\n",
+ " print(f\" Services: {', '.join(profile['services'])}\")\n",
+ " print(f\" Languages: {', '.join(profile['languages'])}\")\n",
+ " print(f\" Certifications: {', '.join(profile['certifications'])}\")\n",
+ " print(f\" Traits: {', '.join(profile['traits'])}\")\n",
+ " print(f\" Availability: {len(profile['availability'])} time slots\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "abfa81e6",
+ "metadata": {},
+ "source": [
+ "## Step 7: The AI's Instructions\n",
+ "\n",
+ "Here's where I learned prompt engineering matters *a lot*.\n",
+ "\n",
+ "The AI needs to know:\n",
+ "- What exact keywords to use (\"elder care\" not \"elderly care\", \"Mon\" not \"Monday\")\n",
+ "- How to map natural language to database values\n",
+ "- That it should give 2-3 recommendations with pros/cons\n",
+ "- To remind families to verify credentials independently\n",
+ "\n",
+ "**The lesson from MyWoosah:** Small keyword mismatches = zero results. This prompt prevents that.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 69,
+ "id": "7bbe36e3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# System prompt\n",
+ "\n",
+ "system_prompt = '''\n",
+ " You are a compassionate Caregiver Assistant that helps families quickly identify the most\n",
+ " suitable care provider by gathering requirements (care needs, schedule, budget, location,\n",
+ " language/cultural fit) and matching them to available profiles. Provide 2-3 best-fit options\n",
+ " with pros/cons, estimated costs, and next steps, and clearly state that credentials/background\n",
+ " checks are not verified by this sample app and should be confirmed by the family.\n",
+ "\n",
+ " CRITICAL: When searching the database, you MUST use these EXACT terms:\n",
+ "\n",
+ " CARE TYPES (use exactly as shown):\n",
+ " - \"elder care\" (for elderly, senior, old age, geriatric care)\n",
+ " - \"companionship\" (for companion, friendship, social support)\n",
+ " - \"post-op support\" (for post-surgery, post-operative, recovery care)\n",
+ " - \"child care\" (for children, kids, babysitting, nanny)\n",
+ " - \"special needs\" (for disabilities, autism, developmental needs)\n",
+ " - \"respite care\" (for temporary relief, break for family caregivers)\n",
+ " - \"dementia care\" (for Alzheimer's, memory care, cognitive decline)\n",
+ " - \"hospice support\" (for end-of-life, palliative, terminal care)\n",
+ "\n",
+ " If a user mentions any variation, map it to the closest match above. If unclear, ask clarifying questions.\n",
+ "\n",
+ " DAYS OF WEEK (use exactly as shown):\n",
+ " - \"Mon\" (for Monday)\n",
+ " - \"Tue\" (for Tuesday)\n",
+ " - \"Wed\" (for Wednesday)\n",
+ " - \"Thu\" (for Thursday)\n",
+ " - \"Fri\" (for Friday)\n",
+ " - \"Sat\" (for Saturday)\n",
+ " - \"Sun\" (for Sunday)\n",
+ "\n",
+ " STATES/PROVINCES (use 2-letter codes):\n",
+ " - Use standard US state abbreviations: \"NY\", \"CA\", \"TX\", \"FL\", \"MA\", etc.\n",
+ " - Convert full state names to abbreviations before searching\n",
+ "\n",
+ " COMMON LANGUAGES:\n",
+ " - \"English\", \"Spanish\", \"French\", \"Vietnamese\", \"Korean\", \"Hindi\", \"Mandarin\", \"Portuguese\", \"Tagalog\", \"ASL\"\n",
+ " - Capitalize properly (e.g., user says \"spanish\" → use \"Spanish\")\n",
+ "\n",
+ " CERTIFICATIONS:\n",
+ " - \"CPR\", \"First Aid\", \"CNA\", \"BLS\", \"HHA\", \"AED\", \"Medication Technician\", \"Hospice Training\", \n",
+ " \"Dementia Care\", \"Special Needs Training\", \"Childcare Safety\"\n",
+ " - Use exact capitalization and full names\n",
+ "\n",
+ " TRAITS:\n",
+ " - \"empathetic\", \"patient\", \"cheerful\", \"organized\", \"compassionate\", \"calm under pressure\", \n",
+ " \"adaptable\", \"friendly\", \"thorough\", \"gentle\", \"proactive\", \"flexible\", \"reliable\", \n",
+ " \"detail-oriented\", \"communicative\", \"energetic\", \"respectful\", \"dependable\", \"attentive\", \n",
+ " \"kind\", \"tidy\", \"punctual\", \"discreet\", \"professional\", \"trustworthy\"\n",
+ " - Use lowercase for all traits\n",
+ "\n",
+ " SEARCH STRATEGY:\n",
+ " 1. Listen carefully to user requirements\n",
+ " 2. Map their natural language to database terms above\n",
+ " 3. Use search_caregivers() with exact keyword matches\n",
+ " 4. If no results, suggest alternatives or broader searches\n",
+ " 5. After getting results, use get_caregiver_profile() for detailed information on top matches\n",
+ "\n",
+ " Always confirm your understanding by restating requirements using the exact database terms before searching.\n",
+ "'''"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0b8ae902",
+ "metadata": {},
+ "source": [
+ "## Step 8: Making it Work (and Not Crash)\n",
+ "\n",
+ "This is the engine room. When the AI wants to search, this code:\n",
+ "1. Validates the request\n",
+ "2. Calls the right database function\n",
+ "3. Handles errors gracefully (no crashes!)\n",
+ "4. Limits results to prevent overwhelming the AI\n",
+ "5. Generates the voice response\n",
+ "\n",
+ "**Defensive programming:** I learned the hard way that things break. This code expects problems and handles them elegantly.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "id": "0d8accbc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Function registry: Maps tool names to actual Python functions\n",
+ "TOOL_REGISTRY = {\n",
+ " \"search_caregivers\": search_caregivers,\n",
+ " \"get_caregiver_count\": get_caregiver_count,\n",
+ " \"get_caregiver\": get_caregiver,\n",
+ " \"list_caregivers\": list_caregivers,\n",
+ " \"get_services\": get_services,\n",
+ " \"get_languages\": get_languages,\n",
+ " \"get_certifications\": get_certifications,\n",
+ " \"get_traits\": get_traits,\n",
+ " \"get_availability\": get_availability,\n",
+ " \"get_caregiver_profile\": get_caregiver_profile,\n",
+ "}\n",
+ "\n",
+ "def execute_tool_call(tool_call):\n",
+ " \"\"\"\n",
+ " Safely execute a single tool call with error handling.\n",
+ " Returns a properly formatted tool response.\n",
+ " \"\"\"\n",
+ " import json\n",
+ " \n",
+ " function_name = tool_call.function.name\n",
+ " \n",
+ " # Defensive check: Ensure function exists in registry\n",
+ " if function_name not in TOOL_REGISTRY:\n",
+ " return {\n",
+ " \"role\": \"tool\",\n",
+ " \"tool_call_id\": tool_call.id,\n",
+ " \"content\": json.dumps({\n",
+ " \"error\": f\"Unknown function: {function_name}\",\n",
+ " \"available_functions\": list(TOOL_REGISTRY.keys())\n",
+ " })\n",
+ " }\n",
+ " \n",
+ " try:\n",
+ " # Parse arguments\n",
+ " args = json.loads(tool_call.function.arguments)\n",
+ " \n",
+ " # Execute the function\n",
+ " func = TOOL_REGISTRY[function_name]\n",
+ " result = func(**args)\n",
+ " \n",
+ " # Format response based on result type with limit to prevent token overflow\n",
+ " if isinstance(result, list):\n",
+ " content = json.dumps({\n",
+ " \"count\": len(result),\n",
+ " \"results\": result[:10] if len(result) > 10 else result,\n",
+ " \"truncated\": len(result) > 10\n",
+ " })\n",
+ " elif isinstance(result, dict):\n",
+ " content = json.dumps(result)\n",
+ " elif isinstance(result, (int, float, str)):\n",
+ " content = json.dumps({\"result\": result})\n",
+ " else:\n",
+ " content = str(result)\n",
+ " \n",
+ " return {\n",
+ " \"role\": \"tool\",\n",
+ " \"tool_call_id\": tool_call.id,\n",
+ " \"content\": content\n",
+ " }\n",
+ " \n",
+ " except Exception as e:\n",
+ " # Defensive error handling\n",
+ " return {\n",
+ " \"role\": \"tool\",\n",
+ " \"tool_call_id\": tool_call.id,\n",
+ " \"content\": json.dumps({\n",
+ " \"error\": str(e),\n",
+ " \"function\": function_name,\n",
+ " \"args\": tool_call.function.arguments\n",
+ " })\n",
+ " }\n",
+ "\n",
+ "def process_tool_calls(message):\n",
+ " \"\"\"\n",
+ " Process all tool calls from the AI response.\n",
+ " Returns tool responses and extracted metadata.\n",
+ " \"\"\"\n",
+ " responses = []\n",
+ " metadata = {\n",
+ " \"cities\": set(),\n",
+ " \"caregiver_ids\": set(),\n",
+ " \"total_results\": 0\n",
+ " }\n",
+ " \n",
+ " if not message.tool_calls:\n",
+ " return responses, metadata\n",
+ " \n",
+ " for tool_call in message.tool_calls:\n",
+ " # Execute the tool call\n",
+ " response = execute_tool_call(tool_call)\n",
+ " responses.append(response)\n",
+ " \n",
+ " # Extract metadata for UI enhancements\n",
+ " try:\n",
+ " import json\n",
+ " content = json.loads(response[\"content\"])\n",
+ " \n",
+ " # Extract cities from search results\n",
+ " if \"results\" in content and isinstance(content[\"results\"], list):\n",
+ " for item in content[\"results\"]:\n",
+ " if isinstance(item, dict) and \"city\" in item:\n",
+ " metadata[\"cities\"].add(item[\"city\"])\n",
+ " if isinstance(item, dict) and \"id\" in item:\n",
+ " metadata[\"caregiver_ids\"].add(item[\"id\"])\n",
+ " \n",
+ " if \"count\" in content:\n",
+ " metadata[\"total_results\"] += content[\"count\"]\n",
+ " \n",
+ " except:\n",
+ " pass # Silently ignore metadata extraction errors\n",
+ " \n",
+ " return responses, metadata\n",
+ "\n",
+ "def generate_city_image(city):\n",
+ " \"\"\"\n",
+ " Generate or retrieve a city image (placeholder for future enhancement).\n",
+ " Could integrate with DALL-E, Unsplash API, or local image database.\n",
+ " \"\"\"\n",
+ " # Placeholder - can be enhanced with actual image generation\n",
+ " return None\n",
+ "\n",
+ "def chat(history):\n",
+ " \"\"\"\n",
+ " Main chat handler with multi-modal support and defensive error handling.\n",
+ " Handles conversation flow, tool calls, and response generation.\n",
+ " \"\"\"\n",
+ " # Normalize history format\n",
+ " history = [{\"role\": h[\"role\"], \"content\": h[\"content\"]} for h in history]\n",
+ " \n",
+ " # Initialize conversation with system prompt\n",
+ " messages = [{\"role\": \"system\", \"content\": system_prompt}] + history\n",
+ " \n",
+ " # Initialize metadata\n",
+ " image = None\n",
+ " selected_city = None\n",
+ " \n",
+ " try:\n",
+ " # Initial API call\n",
+ " response = openai.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=messages,\n",
+ " tools=tools\n",
+ " )\n",
+ " \n",
+ " # Tool calling loop (with safety limit)\n",
+ " max_iterations = 5\n",
+ " iteration = 0\n",
+ " \n",
+ " while response.choices[0].finish_reason == \"tool_calls\" and iteration < max_iterations:\n",
+ " iteration += 1\n",
+ " message = response.choices[0].message\n",
+ " \n",
+ " # Process all tool calls\n",
+ " tool_responses, metadata = process_tool_calls(message)\n",
+ " \n",
+ " # Track city for image generation\n",
+ " if metadata[\"cities\"]:\n",
+ " selected_city = list(metadata[\"cities\"])[0]\n",
+ " \n",
+ " # Add assistant message and tool responses to conversation\n",
+ " messages.append(message)\n",
+ " messages.extend(tool_responses)\n",
+ " \n",
+ " # Continue conversation\n",
+ " response = openai.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=messages,\n",
+ " tools=tools\n",
+ " )\n",
+ " \n",
+ " # Extract final reply\n",
+ " reply = response.choices[0].message.content\n",
+ " history.append({\"role\": \"assistant\", \"content\": reply})\n",
+ " \n",
+ " # Generate voice response\n",
+ " voice = announcements(reply)\n",
+ " \n",
+ " # Generate city image if applicable\n",
+ " if selected_city:\n",
+ " image = generate_city_image(selected_city)\n",
+ " \n",
+ " return history, voice, image\n",
+ " \n",
+ " except Exception as e:\n",
+ " # Defensive error handling for the entire chat flow\n",
+ " error_message = f\"I apologize, but I encountered an error: {str(e)}. Please try again.\"\n",
+ " history.append({\"role\": \"assistant\", \"content\": error_message})\n",
+ " return history, None, None"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "451ed2e5",
+ "metadata": {},
+ "source": [
+ "## Step 9: The Interface\n",
+ "\n",
+ "A clean, professional web UI built with Gradio.\n",
+ "\n",
+ "Features:\n",
+ "- Chat interface with conversation history\n",
+ "- Voice responses that auto-play\n",
+ "- Settings sidebar (model selection, voice options)\n",
+ "- Clear instructions for families\n",
+ "\n",
+ "**Why Gradio?** At MyWoosah, I needed something non-technical staff could use immediately. Gradio made that possible without weeks of frontend work.\n",
+ "\n",
+ "**Run this cell to launch!** 🚀\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "id": "a07e7793-b8f5-44f4-aded-5562f633271a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "* Running on local URL: http://127.0.0.1:7871\n",
+ "* To create a public link, set `share=True` in `launch()`.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 71,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import gradio as gr\n",
+ "\n",
+ "# Gradio UI Setup\n",
+ "\n",
+ "def put_message_in_chatbot(message, history):\n",
+ " \"\"\"Add user message to chat history\"\"\"\n",
+ " return \"\", history + [{\"role\": \"user\", \"content\": message}]\n",
+ "\n",
+ "# Custom CSS for better styling\n",
+ "custom_css = \"\"\"\n",
+ "#chatbot {\n",
+ " border-radius: 10px;\n",
+ " box-shadow: 0 2px 8px rgba(0,0,0,0.1);\n",
+ "}\n",
+ "#message_box {\n",
+ " border-radius: 8px;\n",
+ "}\n",
+ ".header {\n",
+ " text-align: center;\n",
+ " padding: 20px;\n",
+ " background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\n",
+ " color: white;\n",
+ " border-radius: 10px;\n",
+ " margin-bottom: 20px;\n",
+ "}\n",
+ "\"\"\"\n",
+ "\n",
+ "with gr.Blocks(title=\"CareGiver AI Assistant\", css=custom_css, theme=gr.themes.Soft()) as ui:\n",
+ " \n",
+ " # Header\n",
+ " gr.Markdown(\"\"\"\n",
+ " \n",
+ " \"\"\")\n",
+ " \n",
+ " # Instructions\n",
+ " with gr.Accordion(\"ℹ️ Click here to learn more on how to use this AI\", open=False):\n",
+ " gr.Markdown(\"\"\"\n",
+ " **Tell me what you need:**\n",
+ " - Type of care (elder care, child care, companionship, etc.)\n",
+ " - Location (city, state)\n",
+ " - Schedule requirements (days/times)\n",
+ " - Budget constraints\n",
+ " - Language or certification needs\n",
+ " \n",
+ " **Example:** \"I need an elder care provider in Boston for Monday mornings who speaks Spanish and has CPR certification.\"\n",
+ " \n",
+ " ⚠️ **Note:** This is a demo app. Always verify credentials and conduct background checks independently.\n",
+ " \"\"\")\n",
+ " \n",
+ " # Main chat interface\n",
+ " with gr.Row():\n",
+ " with gr.Column(scale=2):\n",
+ " chatbot = gr.Chatbot(\n",
+ " height=500, \n",
+ " type=\"messages\",\n",
+ " elem_id=\"chatbot\",\n",
+ " label=\"Chat History\",\n",
+ " avatar_images=(None, \"🤖\")\n",
+ " )\n",
+ " \n",
+ " # Audio output\n",
+ " audio_output = gr.Audio(\n",
+ " label=\"Voice Response\",\n",
+ " autoplay=True,\n",
+ " visible=True,\n",
+ " interactive=False\n",
+ " )\n",
+ " \n",
+ " # Settings sidebar\n",
+ " with gr.Column(scale=1):\n",
+ " gr.Markdown(\"### ⚙️ Settings\")\n",
+ " \n",
+ " # Model selector (for future enhancement)\n",
+ " model_select = gr.Dropdown(\n",
+ " choices=[\"gpt-4o-mini\", \"gpt-4o\", \"gpt-4-turbo\"],\n",
+ " value=\"gpt-4o-mini\",\n",
+ " label=\"AI Model\",\n",
+ " interactive=True\n",
+ " )\n",
+ " \n",
+ " # Voice selector\n",
+ " voice_select = gr.Dropdown(\n",
+ " choices=[\"coral\", \"alloy\", \"echo\", \"fable\", \"onyx\", \"nova\", \"shimmer\"],\n",
+ " value=\"coral\",\n",
+ " label=\"Voice\",\n",
+ " interactive=True\n",
+ " )\n",
+ " \n",
+ " # Audio toggle\n",
+ " audio_enabled = gr.Checkbox(\n",
+ " label=\"Enable Voice Responses\",\n",
+ " value=True\n",
+ " )\n",
+ " \n",
+ " # Clear button\n",
+ " clear_btn = gr.Button(\"🗑️ Clear Conversation\", variant=\"secondary\")\n",
+ " \n",
+ " # Input section\n",
+ " with gr.Row():\n",
+ " message = gr.Textbox(\n",
+ " label=\"Your Message\",\n",
+ " placeholder=\"Type your question here... (e.g., 'I need elder care in Boston')\",\n",
+ " lines=2,\n",
+ " elem_id=\"message_box\",\n",
+ " scale=4\n",
+ " )\n",
+ " send_btn = gr.Button(\"Send\", variant=\"primary\", scale=1)\n",
+ " \n",
+ " # Event handlers\n",
+ " def chat_wrapper(history):\n",
+ " \"\"\"Wrapper to handle chat and extract only needed outputs\"\"\"\n",
+ " history_out, voice, image = chat(history)\n",
+ " return history_out, voice\n",
+ " \n",
+ " # Submit on enter or button click\n",
+ " submit_event = message.submit(\n",
+ " put_message_in_chatbot,\n",
+ " inputs=[message, chatbot],\n",
+ " outputs=[message, chatbot]\n",
+ " ).then(\n",
+ " chat_wrapper,\n",
+ " inputs=chatbot,\n",
+ " outputs=[chatbot, audio_output]\n",
+ " )\n",
+ " \n",
+ " send_btn.click(\n",
+ " put_message_in_chatbot,\n",
+ " inputs=[message, chatbot],\n",
+ " outputs=[message, chatbot]\n",
+ " ).then(\n",
+ " chat_wrapper,\n",
+ " inputs=chatbot,\n",
+ " outputs=[chatbot, audio_output]\n",
+ " )\n",
+ " \n",
+ " # Clear conversation\n",
+ " clear_btn.click(\n",
+ " lambda: ([], None),\n",
+ " outputs=[chatbot, audio_output]\n",
+ " )\n",
+ " \n",
+ " # Footer\n",
+ " gr.Markdown(\"\"\"\n",
+ " ---\n",
+ " \n",
+ " Powered by OpenAI & Gradio | Built by RoboOffice Ltd\n",
+ " \n",
+ " \"\"\")\n",
+ "\n",
+ "# Launch with better configuration\n",
+ "ui.launch(\n",
+ " inbrowser=True,\n",
+ " share=False,\n",
+ " show_error=True,\n",
+ " quiet=False\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97d87d95",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Reflections\n",
+ "\n",
+ "This project started from frustration: *\"There has to be a better way to match families with caregivers.\"*\n",
+ "\n",
+ "Through the Andela program, I learned that AI + thoughtful engineering = solutions to real problems.\n",
+ "\n",
+ "### What Worked:\n",
+ "- **Function calling** eliminated the need for custom queries\n",
+ "- **Prompt engineering** prevented keyword mismatches\n",
+ "- **Defensive coding** made it robust\n",
+ "- **Gradio** made it accessible\n",
+ "\n",
+ "### What I'd Do Next:\n",
+ "- Add speech input (families could call and talk)\n",
+ "- Connect to actual MyWoosah database\n",
+ "- Add background check API integration\n",
+ "- Deploy for real users\n",
+ "\n",
+ "### The Bigger Picture:\n",
+ "\n",
+ "This isn't just about caregiving. The same pattern works for:\n",
+ "- Healthcare appointments\n",
+ "- Legal services\n",
+ "- Tutoring platforms\n",
+ "- Any matching problem where natural language beats forms\n",
+ "\n",
+ "AI doesn't replace good database design—it makes it accessible to everyone.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**For MyWoosah Inc and beyond:** This is proof that AI can transform how we connect people with the care they need.\n",
+ "\n",
+ "*Built during Week 2 of the Andela LLM Engineering Program*\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week2/community-contributions/kwabena/week2_solution_.ipynb b/week2/community-contributions/kwabena/week2_solution_.ipynb
new file mode 100644
index 0000000..9b1f22e
--- /dev/null
+++ b/week2/community-contributions/kwabena/week2_solution_.ipynb
@@ -0,0 +1,173 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "fd1cdd6e",
+ "metadata": {},
+ "source": [
+ "## Week 2 - Full Prototype for Technical Questions Answerer"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "70db9a0b",
+ "metadata": {},
+ "source": [
+ " This notebook will implement a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "df46689d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports\n",
+ "import os\n",
+ "import json\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c7416a2a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Initialization\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ " \n",
+ "MODEL = \"gpt-4.1-mini\"\n",
+ "openai = OpenAI()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "86966749",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "system_message = \"\"\"\n",
+ "You are an expert technical question answerer specializing in data science, programming, \n",
+ "and software engineering. Your goal is to provide clear, accurate, and practical answers \n",
+ "to technical questions.\n",
+ "\n",
+ "When answering:\n",
+ "- Break down complex concepts into understandable explanations\n",
+ "- Provide code examples when relevant, with comments explaining key parts\n",
+ "- Mention common pitfalls or best practices\n",
+ "- If a question is ambiguous, state your assumptions or ask for clarification\n",
+ "- For debugging questions, explain both the fix and why the error occurred\n",
+ "- Cite specific documentation or resources when helpful\n",
+ "\n",
+ "Always prioritize accuracy and clarity over speed. If you're unsure about something, \n",
+ "acknowledge the uncertainty rather than guessing.\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d34e5b81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Streaming chat funcion\n",
+ "def chat(model, history):\n",
+ " messages = [{\"role\": \"system\", \"content\": system_message}]\n",
+ " for h in history:\n",
+ " messages.append({\"role\": h[\"role\"], \"content\": h[\"content\"]})\n",
+ "\n",
+ " stream = openai.chat.completions.create(\n",
+ " model=model, \n",
+ " messages=messages,\n",
+ " stream=True\n",
+ " )\n",
+ "\n",
+ " response = \"\"\n",
+ " for chunk in stream:\n",
+ " if chunk.choices[0].delta.content is not None:\n",
+ " response += chunk.choices[0].delta.content\n",
+ " yield history + [{\"role\": \"assistant\", \"content\": response}]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "32350869",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Gradio Interface\n",
+ "with gr.Blocks() as ui:\n",
+ " with gr.Row():\n",
+ " chatbot = gr.Chatbot(height=500, type=\"messages\")\n",
+ " with gr.Row():\n",
+ " message = gr.Textbox(label=\"Chat with AI Assistant: \")\n",
+ " model_dropdown = gr.Dropdown(\n",
+ " choices=[\"gpt-4.1-mini\",\"gpt-4o-mini\", \"gpt-4o\", \"gpt-4-turbo\"], \n",
+ " value=\"gpt-4.1-mini\", \n",
+ " label=\"Select Model\"\n",
+ " ) \n",
+ "\n",
+ " def handle_submit(user_message, chat_history):\n",
+ " # Add user message to history\n",
+ " chat_history = chat_history + [{\"role\": \"user\", \"content\": user_message}]\n",
+ " return \"\", chat_history\n",
+ "\n",
+ " message.submit(\n",
+ " handle_submit, \n",
+ " inputs=[message, chatbot], \n",
+ " outputs=[message, chatbot]\n",
+ " ).then(\n",
+ " chat, \n",
+ " inputs=[model_dropdown, chatbot],\n",
+ " outputs=[chatbot]\n",
+ " )\n",
+ "\n",
+ "ui.launch(inbrowser=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf2b29e1",
+ "metadata": {},
+ "source": [
+ "### Concluding Remarks\n",
+ "In this exercise, we successfully built a working AI chatbot with Gradio that includes streaming responses and the ability to switch between different models. The implementation demonstrates how to create an interactive interface for LLM applications."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week2/community-contributions/week2_exercise_solution-Stephen.ipynb b/week2/community-contributions/week2_exercise_solution-Stephen.ipynb
new file mode 100644
index 0000000..21de7d8
--- /dev/null
+++ b/week2/community-contributions/week2_exercise_solution-Stephen.ipynb
@@ -0,0 +1,296 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
+ "metadata": {},
+ "source": [
+ "# End of week 2 Exercise - Bookstore Assistant\n",
+ "\n",
+ "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n",
+ "\n",
+ "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n",
+ "\n",
+ "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n",
+ "\n",
+ "I will publish a full solution here soon - unless someone beats me to it...\n",
+ "\n",
+ "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "a07e7793-b8f5-44f4-aded-5562f633271a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "OpenAI API Key exists and begins sk-proj-\n",
+ "Google API Key exists and begins AIzaSyCL\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import json\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "google_api_key = os.getenv('GOOGLE_API_KEY')\n",
+ "\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ "\n",
+ "if google_api_key:\n",
+ " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"Google API Key not set\")\n",
+ " \n",
+ "MODEL_GPT = \"gpt-4.1-mini\"\n",
+ "MODEL_GEMINI = \"gemini-2.5-pro\"\n",
+ "\n",
+ "\n",
+ "openai = OpenAI()\n",
+ "\n",
+ "gemini_url = \"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
+ "gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0a3aa8bf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models\n",
+ "\n",
+ "system_message= \"\"\"\n",
+ " You are an assistant in a software engineering bookstore that analyzes the content of technical books and generates concise, informative summaries for readers.\n",
+ " Your goal is to help customers quickly understand what each book covers, its practical value, and who would benefit most from reading it.\n",
+ " Respond in markdown without code blocks.\n",
+ " Each summary should include:\n",
+ " Overview: The book’s main topic, scope, and focus area (e.g., software architecture, DevOps, system design).\n",
+ " Key Insights: The most important lessons, principles, or methodologies discussed.\n",
+ " Recommended For: The type of reader who would benefit most (e.g., junior developers, engineering managers, backend specialists).\n",
+ " Related Reads: Suggest one or two similar or complementary titles available in the store.\n",
+ " Maintain a professional and knowledgeable tone that reflects expertise in software engineering literature. \n",
+ "\"\"\"\n",
+ "\n",
+ "def stream_gpt(prompt):\n",
+ " messages = [\n",
+ " {\"role\": \"system\", \"content\": system_message},\n",
+ " {\"role\": \"user\", \"content\": prompt}\n",
+ " ]\n",
+ " stream = openai.chat.completions.create(\n",
+ " model=MODEL_GPT,\n",
+ " messages=messages,\n",
+ " stream=True\n",
+ " )\n",
+ " result = \"\"\n",
+ " for chunk in stream:\n",
+ " result += chunk.choices[0].delta.content or \"\"\n",
+ " yield result\n",
+ "\n",
+ "def stream_gemini(prompt):\n",
+ " messages = [\n",
+ " {\"role\": \"system\", \"content\": system_message},\n",
+ " {\"role\": \"user\", \"content\": prompt}\n",
+ " ]\n",
+ " stream = openai.chat.completions.create(\n",
+ " model=MODEL_GEMINI,\n",
+ " messages=messages,\n",
+ " stream=True\n",
+ " )\n",
+ " result = \"\"\n",
+ " for chunk in stream:\n",
+ " result += chunk.choices[0].delta.content or \"\"\n",
+ " yield result\n",
+ "\n",
+ "def stream_model(prompt, model):\n",
+ " if model==\"GPT\":\n",
+ " result = stream_gpt(prompt)\n",
+ " elif model==\"Gemini\":\n",
+ " result = stream_gemini(prompt)\n",
+ " else:\n",
+ " raise ValueError(\"Unknown model\")\n",
+ " yield from result\n",
+ "\n",
+ "\n",
+ "message_input = gr.Textbox(label=\"Your message:\", info=\"Enter a software engineering book title for the LLM\", lines=4)\n",
+ "model_selector = gr.Dropdown([\"GPT\", \"Gemini\"], label=\"Select model\", value=\"GPT\")\n",
+ "message_output = gr.Markdown(label=\"Response:\")\n",
+ "\n",
+ "view = gr.Interface(\n",
+ " fn=stream_model,\n",
+ " title=\"Bookstore Assistant\", \n",
+ " inputs=[message_input, model_selector], \n",
+ " outputs=[message_output], \n",
+ " examples=[\n",
+ " [\"Explain Clean Code by Robert C. Martin\", \"GPT\"],\n",
+ " [\"Explain Clean Code by Robert C. Martin\", \"Gemini\"]\n",
+ " ], \n",
+ " flagging_mode=\"never\"\n",
+ " )\n",
+ "view.launch()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a4d7980c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sqlite3\n",
+ "\n",
+ "DB = \"books.db\"\n",
+ "\n",
+ "with sqlite3.connect(DB) as conn:\n",
+ " cursor = conn.cursor()\n",
+ " cursor.execute('CREATE TABLE IF NOT EXISTS prices (title TEXT PRIMARY KEY, price REAL)')\n",
+ " conn.commit()\n",
+ "\n",
+ "def get_book_price(title):\n",
+ " print(f\"DATABASE TOOL CALLED: Getting price for {title}\", flush=True)\n",
+ " with sqlite3.connect(DB) as conn:\n",
+ " cursor = conn.cursor()\n",
+ " cursor.execute('SELECT price FROM prices WHERE title = ?', (title.lower(),))\n",
+ " result = cursor.fetchone()\n",
+ " return f\"Book -> {title} price is ${result[0]}\" if result else \"No price data available for this title\"\n",
+ "\n",
+ "def set_book_price(title, price):\n",
+ " with sqlite3.connect(DB) as conn:\n",
+ " cursor = conn.cursor()\n",
+ " cursor.execute('INSERT INTO prices (title, price) VALUES (?, ?) ON CONFLICT(title) DO UPDATE SET price = ?', (title.lower(), price, price))\n",
+ " conn.commit()\n",
+ "\n",
+ "book_prices = {\"Clean code\":20, \"Clean architecture\": 30, \"System design\": 40, \"Design patterns\": 50}\n",
+ "for title, price in book_prices.items():\n",
+ " set_book_price(title, price)"
+ ]
+ },
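+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3f9a1c2b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional sanity check of the database tool above; \"Refactoring\" is a\n",
+ "# hypothetical title that was not seeded, so it exercises the fallback path.\n",
+ "print(get_book_price(\"Clean Code\"))  # titles are stored lower-cased, so lookup is case-insensitive\n",
+ "print(get_book_price(\"Refactoring\"))  # not seeded above -> fallback message"
+ ]
+ },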
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "86741761",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# use of a tool\n",
+ "MODEL = \"gpt-4.1-mini\"\n",
+ "\n",
+ "system_message = \"\"\"\n",
+ "You are a helpful assistant in a software engineering bookstore BookEye. \n",
+ "Give short, courteous answers, no more than 1 sentence.\n",
+ "Always be accurate. If you don't know the answer, say so.\n",
+ "\"\"\"\n",
+ "\n",
+ "price_function = {\n",
+ " \"name\": \"get_book_price\",\n",
+ " \"description\": \"Get the price of a book.\",\n",
+ " \"parameters\": {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"book_title\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"The title of the book that the customer wants to buy\",\n",
+ " },\n",
+ " },\n",
+ " \"required\": [\"book_title\"],\n",
+ " \"additionalProperties\": False\n",
+ " }\n",
+ "}\n",
+ "tools = [{\"type\": \"function\", \"function\": price_function}]\n",
+ "\n",
+ "\n",
+ "def talker(message):\n",
+ " response = openai.audio.speech.create(\n",
+ " model=\"gpt-4o-mini-tts\",\n",
+ " voice=\"coral\",\n",
+ " input=message\n",
+ " )\n",
+ " return response.content\n",
+ "\n",
+ "def handle_tool_calls(message):\n",
+ " responses = []\n",
+ " for tool_call in message.tool_calls:\n",
+ " if tool_call.function.name == \"get_book_price\":\n",
+ " arguments = json.loads(tool_call.function.arguments)\n",
+ " title = arguments.get('book_title')\n",
+ " price_details = get_book_price(title)\n",
+ " responses.append({\n",
+ " \"role\": \"tool\",\n",
+ " \"content\": price_details,\n",
+ " \"tool_call_id\": tool_call.id\n",
+ " })\n",
+ " return responses\n",
+ "\n",
+ "def chat(history):\n",
+ " history = [{\"role\":h[\"role\"], \"content\":h[\"content\"]} for h in history]\n",
+ " messages = [{\"role\": \"system\", \"content\": system_message}] + history\n",
+ " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
+ "\n",
+ " while response.choices[0].finish_reason==\"tool_calls\":\n",
+ " message = response.choices[0].message\n",
+ " responses = handle_tool_calls(message)\n",
+ " messages.append(message)\n",
+ " messages.extend(responses)\n",
+ " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
+ "\n",
+ " reply = response.choices[0].message.content\n",
+ " history += [{\"role\":\"assistant\", \"content\":reply}]\n",
+ "\n",
+ " voice = talker(reply)\n",
+ " \n",
+ " return history, voice\n",
+ "\n",
+ "def put_message_in_chatbot(message, history):\n",
+ " return \"\", history + [{\"role\":\"user\", \"content\":message}]\n",
+ "with gr.Blocks() as ui:\n",
+ " with gr.Row():\n",
+ " chatbot = gr.Chatbot(height=300, type=\"messages\")\n",
+ " audio_output = gr.Audio(autoplay=True)\n",
+ " \n",
+ " with gr.Row():\n",
+ " message = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
+ "\n",
+ " message.submit(put_message_in_chatbot, inputs=[message, chatbot], outputs=[message, chatbot]).then(\n",
+ " chat, inputs=chatbot, outputs=[chatbot, audio_output]\n",
+ " )\n",
+ "\n",
+ "ui.launch(inbrowser=True, auth=(\"ted\", \"mowsb\"))"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week3/community-contributions/week3_Exercise_survey_Dataset_Generation.ipynb b/week3/community-contributions/week3_Exercise_survey_Dataset_Generation.ipynb
new file mode 100644
index 0000000..a4474af
--- /dev/null
+++ b/week3/community-contributions/week3_Exercise_survey_Dataset_Generation.ipynb
@@ -0,0 +1,906 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "a8dbb4e8",
+ "metadata": {},
+ "source": [
+ "# 🧪 Survey Synthetic Dataset Generator — Week 3 Task"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8d86f629",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "import os, re, json, time, uuid, math, random\n",
+ "from datetime import datetime, timedelta\n",
+ "from typing import List, Dict, Any\n",
+ "import numpy as np, pandas as pd\n",
+ "import pandera.pandas as pa\n",
+ "random.seed(7); np.random.seed(7)\n",
+ "print(\"✅ Base libraries ready. Pandera available:\", pa is not None)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f196ae73",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "def extract_strict_json(text: str):\n",
+ " \"\"\"Improved JSON extraction with multiple fallback strategies\"\"\"\n",
+ " if text is None:\n",
+ " raise ValueError(\"Empty model output.\")\n",
+ " \n",
+ " t = text.strip()\n",
+ " \n",
+ " # Strategy 1: Direct JSON parsing\n",
+ " try:\n",
+ " obj = json.loads(t)\n",
+ " if isinstance(obj, list):\n",
+ " return obj\n",
+ " elif isinstance(obj, dict):\n",
+ " for key in (\"rows\",\"data\",\"items\",\"records\",\"results\"):\n",
+ " if key in obj and isinstance(obj[key], list):\n",
+ " return obj[key]\n",
+ " if all(isinstance(k, str) and k.isdigit() for k in obj.keys()):\n",
+ " return [obj[k] for k in sorted(obj.keys(), key=int)]\n",
+ " except json.JSONDecodeError:\n",
+ " pass\n",
+ " \n",
+ " # Strategy 2: Extract JSON from code blocks\n",
+ " if t.startswith(\"```\"):\n",
+ " t = re.sub(r\"^```(?:json)?\\s*|\\s*```$\", \"\", t, flags=re.IGNORECASE|re.MULTILINE).strip()\n",
+ " \n",
+ " # Strategy 3: Find JSON array in text\n",
+ " start, end = t.find('['), t.rfind(']')\n",
+ " if start == -1 or end == -1 or end <= start:\n",
+ " raise ValueError(\"No JSON array found in model output.\")\n",
+ " \n",
+ " t = t[start:end+1]\n",
+ " \n",
+ " # Strategy 4: Fix common JSON issues\n",
+ " t = re.sub(r\",\\s*([\\]}])\", r\"\\1\", t) # Remove trailing commas\n",
+ " t = re.sub(r\"\\bNaN\\b|\\bInfinity\\b|\\b-Infinity\\b\", \"null\", t) # Replace NaN/Infinity\n",
+ " t = t.replace(\"\\u00a0\", \" \").replace(\"\\u200b\", \"\") # Remove invisible characters\n",
+ " \n",
+ " try:\n",
+ " return json.loads(t)\n",
+ " except json.JSONDecodeError as e:\n",
+ " raise ValueError(f\"Could not parse JSON: {str(e)}. Text: {t[:200]}...\")\n"
+ ]
+ },
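+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7c2e9b41",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional smoke test with hand-made inputs: exercise the three main\n",
+ "# fallback paths of extract_strict_json defined above.\n",
+ "print(extract_strict_json('[{\"a\": 1}, {\"a\": 2}]'))        # direct array\n",
+ "print(extract_strict_json('{\"rows\": [{\"a\": 1}]}'))        # wrapped in a known key\n",
+ "print(extract_strict_json('```json\\n[{\"a\": 1},]\\n```'))   # code fence + trailing comma"
+ ]
+ },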
+ {
+ "cell_type": "markdown",
+ "id": "3670fa0d",
+ "metadata": {},
+ "source": [
+ "## 1) Configuration"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d16bd03a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "CFG = {\n",
+ " \"rows\": 800,\n",
+ " \"datetime_range\": {\"start\": \"2024-01-01\", \"end\": \"2025-10-01\", \"fmt\": \"%Y-%m-%d %H:%M:%S\"},\n",
+ " \"fields\": [\n",
+ " {\"name\": \"response_id\", \"type\": \"uuid4\"},\n",
+ " {\"name\": \"respondent_id\", \"type\": \"int\", \"min\": 10000, \"max\": 99999},\n",
+ " {\"name\": \"submitted_at\", \"type\": \"datetime\"},\n",
+ " {\"name\": \"country\", \"type\": \"enum\", \"values\": [\"KE\",\"UG\",\"TZ\",\"RW\",\"NG\",\"ZA\"], \"probs\": [0.50,0.10,0.12,0.05,0.15,0.08]},\n",
+ " {\"name\": \"language\", \"type\": \"enum\", \"values\": [\"en\",\"sw\"], \"probs\": [0.85,0.15]},\n",
+ " {\"name\": \"device\", \"type\": \"enum\", \"values\": [\"android\",\"ios\",\"web\"], \"probs\": [0.60,0.25,0.15]},\n",
+ " {\"name\": \"age\", \"type\": \"int\", \"min\": 18, \"max\": 70},\n",
+ " {\"name\": \"gender\", \"type\": \"enum\", \"values\": [\"female\",\"male\",\"nonbinary\",\"prefer_not_to_say\"], \"probs\": [0.49,0.49,0.01,0.01]},\n",
+ " {\"name\": \"education\", \"type\": \"enum\", \"values\": [\"primary\",\"secondary\",\"diploma\",\"bachelor\",\"postgraduate\"], \"probs\": [0.08,0.32,0.18,0.30,0.12]},\n",
+ " {\"name\": \"income_band\", \"type\": \"enum\", \"values\": [\"low\",\"lower_mid\",\"upper_mid\",\"high\"], \"probs\": [0.28,0.42,0.23,0.07]},\n",
+ " {\"name\": \"completion_seconds\", \"type\": \"float\", \"min\": 60, \"max\": 1800, \"distribution\": \"lognormal\"},\n",
+ " {\"name\": \"attention_passed\", \"type\": \"bool\"},\n",
+ " {\"name\": \"q_quality\", \"type\": \"int\", \"min\": 1, \"max\": 5},\n",
+ " {\"name\": \"q_value\", \"type\": \"int\", \"min\": 1, \"max\": 5},\n",
+ " {\"name\": \"q_ease\", \"type\": \"int\", \"min\": 1, \"max\": 5},\n",
+ " {\"name\": \"q_support\", \"type\": \"int\", \"min\": 1, \"max\": 5},\n",
+ " {\"name\": \"nps\", \"type\": \"int\", \"min\": 0, \"max\": 10},\n",
+ " {\"name\": \"is_detractor\", \"type\": \"bool\"}\n",
+ " ]\n",
+ "}\n",
+ "print(\"Loaded config for\", CFG[\"rows\"], \"rows and\", len(CFG[\"fields\"]), \"fields.\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7da1f429",
+ "metadata": {},
+ "source": [
+ "## 2) Helpers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2f5fdff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "def sample_enum(values, probs=None, size=None):\n",
+ " values = list(values)\n",
+ " if probs is None:\n",
+ " probs = [1.0 / len(values)] * len(values)\n",
+ " return np.random.choice(values, p=probs, size=size)\n",
+ "\n",
+ "def sample_numeric(field_cfg, size=1):\n",
+ " t = field_cfg[\"type\"]\n",
+ " if t == \"int\":\n",
+ " lo, hi = int(field_cfg[\"min\"]), int(field_cfg[\"max\"])\n",
+ " dist = field_cfg.get(\"distribution\", \"uniform\")\n",
+ " if dist == \"uniform\":\n",
+ " return np.random.randint(lo, hi + 1, size=size)\n",
+ " elif dist == \"normal\":\n",
+ " mu = (lo + hi) / 2.0\n",
+ " sigma = (hi - lo) / 6.0\n",
+ " out = np.random.normal(mu, sigma, size=size)\n",
+ " return np.clip(out, lo, hi).astype(int)\n",
+ " else:\n",
+ " return np.random.randint(lo, hi + 1, size=size)\n",
+ " elif t == \"float\":\n",
+ " lo, hi = float(field_cfg[\"min\"]), float(field_cfg[\"max\"])\n",
+ " dist = field_cfg.get(\"distribution\", \"uniform\")\n",
+ " if dist == \"uniform\":\n",
+ " return np.random.uniform(lo, hi, size=size)\n",
+ " elif dist == \"normal\":\n",
+ " mu = (lo + hi) / 2.0\n",
+ " sigma = (hi - lo) / 6.0\n",
+ " return np.clip(np.random.normal(mu, sigma, size=size), lo, hi)\n",
+ " elif dist == \"lognormal\":\n",
+ " mu = math.log(max(1e-3, (lo + hi) / 2.0))\n",
+ " sigma = 0.75\n",
+ " out = np.random.lognormal(mu, sigma, size=size)\n",
+ " return np.clip(out, lo, hi)\n",
+ " else:\n",
+ " return np.random.uniform(lo, hi, size=size)\n",
+ " else:\n",
+ " raise ValueError(\"Unsupported numeric type\")\n",
+ "\n",
+ "def sample_datetime(start: str, end: str, size=1, fmt=\"%Y-%m-%d %H:%M:%S\"):\n",
+ " s = datetime.fromisoformat(start)\n",
+ " e = datetime.fromisoformat(end)\n",
+ " total = int((e - s).total_seconds())\n",
+ " r = np.random.randint(0, total, size=size)\n",
+ " return [(s + timedelta(seconds=int(x))).strftime(fmt) for x in r]\n"
+ ]
+ },
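+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a81d3c55",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Quick look at the helpers (illustrative arguments; output depends on the seed)\n",
+ "print(sample_enum([\"a\", \"b\", \"c\"], probs=[0.5, 0.3, 0.2], size=5))\n",
+ "print(sample_numeric({\"type\": \"int\", \"min\": 1, \"max\": 5}, size=5))\n",
+ "print(sample_datetime(\"2024-01-01\", \"2024-01-31\", size=2))"
+ ]
+ },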
+ {
+ "cell_type": "markdown",
+ "id": "5f24111a",
+ "metadata": {},
+ "source": [
+ "## 3) Rule-based Generator"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cd61330d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "def generate_rule_based(CFG: Dict[str, Any]) -> pd.DataFrame:\n",
+ " n = CFG[\"rows\"]\n",
+ " dt_cfg = CFG.get(\"datetime_range\", {\"start\":\"2024-01-01\",\"end\":\"2025-10-01\",\"fmt\":\"%Y-%m-%d %H:%M:%S\"})\n",
+ " data = {}\n",
+ " for f in CFG[\"fields\"]:\n",
+ " name, t = f[\"name\"], f[\"type\"]\n",
+ " if t == \"uuid4\":\n",
+ " data[name] = [str(uuid.uuid4()) for _ in range(n)]\n",
+ " elif t in (\"int\",\"float\"):\n",
+ " data[name] = sample_numeric(f, size=n)\n",
+ " elif t == \"enum\":\n",
+ " data[name] = sample_enum(f[\"values\"], f.get(\"probs\"), size=n)\n",
+ " elif t == \"datetime\":\n",
+ " data[name] = sample_datetime(dt_cfg[\"start\"], dt_cfg[\"end\"], size=n, fmt=dt_cfg[\"fmt\"])\n",
+ " elif t == \"bool\":\n",
+ " data[name] = np.random.rand(n) < 0.9 # 90% True\n",
+ " else:\n",
+ " data[name] = [None]*n\n",
+ " df = pd.DataFrame(data)\n",
+ "\n",
+ " # Derive NPS roughly from likert questions\n",
+ " if set([\"q_quality\",\"q_value\",\"q_ease\",\"q_support\"]).issubset(df.columns):\n",
+ " likert_avg = df[[\"q_quality\",\"q_value\",\"q_ease\",\"q_support\"]].mean(axis=1)\n",
+ " df[\"nps\"] = np.clip(np.round((likert_avg - 1.0) * (10.0/4.0) + np.random.normal(0, 1.2, size=n)), 0, 10).astype(int)\n",
+ "\n",
+ " # Heuristic target: is_detractor more likely when completion high & attention failed\n",
+ " if \"is_detractor\" in df.columns:\n",
+ " base = 0.25\n",
+ " comp = df.get(\"completion_seconds\", pd.Series(np.zeros(n)))\n",
+ " attn = pd.Series(df.get(\"attention_passed\", np.ones(n))).astype(bool)\n",
+ " boost = (comp > 900).astype(int) + (~attn).astype(int)\n",
+ " p = np.clip(base + 0.15*boost, 0.01, 0.95)\n",
+ " df[\"is_detractor\"] = np.random.rand(n) < p\n",
+ "\n",
+ " return df\n",
+ "\n",
+ "df_rule = generate_rule_based(CFG)\n",
+ "df_rule.head()\n"
+ ]
+ },
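+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5b0c7de2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional distribution check on the rule-based output (columns from CFG above):\n",
+ "# the derived nps should cluster with the likert averages, and the detractor\n",
+ "# rate should sit somewhat above the 0.25 base because of the heuristic boost.\n",
+ "print(df_rule[\"nps\"].value_counts().sort_index())\n",
+ "print(\"Detractor rate:\", round(df_rule[\"is_detractor\"].mean(), 3))"
+ ]
+ },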
+ {
+ "cell_type": "markdown",
+ "id": "dd9eff20",
+ "metadata": {},
+ "source": [
+ "## 4) Validation (Pandera optional)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9a4ef86a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "def build_pandera_schema(CFG):\n",
+ " if pa is None:\n",
+ " return None\n",
+ " cols = {}\n",
+ " for f in CFG[\"fields\"]:\n",
+ " t, name = f[\"type\"], f[\"name\"]\n",
+ " if t == \"int\": cols[name] = pa.Column(int)\n",
+ " elif t == \"float\": cols[name] = pa.Column(float)\n",
+ " elif t == \"enum\": cols[name] = pa.Column(object)\n",
+ " elif t == \"datetime\": cols[name] = pa.Column(object)\n",
+ " elif t == \"uuid4\": cols[name] = pa.Column(object)\n",
+ " elif t == \"bool\": cols[name] = pa.Column(bool)\n",
+ " else: cols[name] = pa.Column(object)\n",
+ " return pa.DataFrameSchema(cols) if pa is not None else None\n",
+ "\n",
+ "def validate_df(df, CFG):\n",
+ " schema = build_pandera_schema(CFG)\n",
+ " if schema is None:\n",
+ " return df, {\"engine\":\"basic\",\"valid_rows\": len(df), \"invalid_rows\": 0}\n",
+ " try:\n",
+ " v = schema.validate(df, lazy=True)\n",
+ " return v, {\"engine\":\"pandera\",\"valid_rows\": len(v), \"invalid_rows\": 0}\n",
+ " except Exception as e:\n",
+ " print(\"Validation error:\", e)\n",
+ " return df, {\"engine\":\"pandera\",\"valid_rows\": len(df), \"invalid_rows\": 0, \"notes\": \"Non-strict mode.\"}\n",
+ "\n",
+ "validated_rule, report_rule = validate_df(df_rule, CFG)\n",
+ "print(report_rule)\n",
+ "validated_rule.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d5f1d93a",
+ "metadata": {},
+ "source": [
+ "## 5) Save"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "73626b4c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from pathlib import Path\n",
+ "out = Path(\"data\"); out.mkdir(exist_ok=True)\n",
+ "ts = datetime.utcnow().strftime(\"%Y%m%dT%H%M%SZ\")\n",
+ "csv_path = out / f\"survey_rule_{ts}.csv\"\n",
+ "validated_rule.to_csv(csv_path, index=False)\n",
+ "print(\"Saved:\", csv_path.as_posix())\n"
+ ]
+ },
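+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9e4f6a17",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional round-trip check: re-load the CSV we just wrote and confirm the shape\n",
+ "reloaded = pd.read_csv(csv_path)\n",
+ "print(reloaded.shape)"
+ ]
+ },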
+ {
+ "cell_type": "markdown",
+ "id": "87c89b51",
+ "metadata": {},
+ "source": [
+ "## 6) Optional: LLM Generator (JSON mode, retry & strict parsing)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "24e94771",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fixed LLM Generation Functions\n",
+ "def create_survey_prompt(CFG, n_rows=50):\n",
+ " \"\"\"Create a clear, structured prompt for survey data generation\"\"\"\n",
+ " fields_desc = []\n",
+ " for field in CFG['fields']:\n",
+ " name = field['name']\n",
+ " field_type = field['type']\n",
+ " \n",
+ " if field_type == 'int':\n",
+ " min_val = field.get('min', 0)\n",
+ " max_val = field.get('max', 100)\n",
+ " fields_desc.append(f\" - {name}: integer between {min_val} and {max_val}\")\n",
+ " elif field_type == 'float':\n",
+ " min_val = field.get('min', 0.0)\n",
+ " max_val = field.get('max', 100.0)\n",
+ " fields_desc.append(f\" - {name}: float between {min_val} and {max_val}\")\n",
+ " elif field_type == 'enum':\n",
+ " values = field.get('values', [])\n",
+ " fields_desc.append(f\" - {name}: one of {values}\")\n",
+ " elif field_type == 'bool':\n",
+ " fields_desc.append(f\" - {name}: boolean (true/false)\")\n",
+ " elif field_type == 'uuid4':\n",
+ " fields_desc.append(f\" - {name}: UUID string\")\n",
+ " elif field_type == 'datetime':\n",
+ " fmt = field.get('fmt', '%Y-%m-%d %H:%M:%S')\n",
+ " fields_desc.append(f\" - {name}: datetime string in format {fmt}\")\n",
+ " else:\n",
+ " fields_desc.append(f\" - {name}: {field_type}\")\n",
+ " \n",
+ " prompt = f\"\"\"Generate {n_rows} rows of realistic survey response data.\n",
+ "\n",
+ "Schema:\n",
+ "{chr(10).join(fields_desc)}\n",
+ "\n",
+ "CRITICAL REQUIREMENTS:\n",
+ "- Return a JSON object with a \"responses\" key containing an array\n",
+ "- Each object in the array must have all required fields\n",
+ "- Use realistic, diverse values for survey responses\n",
+ "- No trailing commas\n",
+ "- No comments or explanations\n",
+ "\n",
+ "Output format: JSON object with \"responses\" array containing exactly {n_rows} objects.\n",
+ "\n",
+ "Example structure:\n",
+ "{{\n",
+ " \"responses\": [\n",
+ " {{\n",
+ " \"response_id\": \"uuid-string\",\n",
+ " \"respondent_id\": 12345,\n",
+ " \"submitted_at\": \"2024-01-01 12:00:00\",\n",
+ " \"country\": \"KE\",\n",
+ " \"language\": \"en\",\n",
+ " \"device\": \"android\",\n",
+ " \"age\": 25,\n",
+ " \"gender\": \"female\",\n",
+ " \"education\": \"bachelor\",\n",
+ " \"income_band\": \"upper_mid\",\n",
+ " \"completion_seconds\": 300.5,\n",
+ " \"attention_passed\": true,\n",
+ " \"q_quality\": 4,\n",
+ " \"q_value\": 3,\n",
+ " \"q_ease\": 5,\n",
+ " \"q_support\": 4,\n",
+ " \"nps\": 8,\n",
+ " \"is_detractor\": false\n",
+ " }},\n",
+ " ...\n",
+ " ]\n",
+ "}}\n",
+ "\n",
+ "IMPORTANT: Return ONLY the JSON object with \"responses\" key, nothing else.\"\"\"\n",
+ " \n",
+ " return prompt\n",
+ "\n",
+ "def repair_truncated_json(content):\n",
+ " \"\"\"Attempt to repair truncated JSON responses\"\"\"\n",
+ " content = content.strip()\n",
+ " \n",
+ " # If it starts with { but doesn't end with }, try to close it\n",
+ " if content.startswith('{') and not content.endswith('}'):\n",
+ " # Find the last complete object in the responses array\n",
+ " responses_start = content.find('\"responses\": [')\n",
+ " if responses_start != -1:\n",
+ " # Find the last complete object\n",
+ " brace_count = 0\n",
+ " last_complete_pos = -1\n",
+ " in_string = False\n",
+ " escape_next = False\n",
+ " \n",
+ " for i, char in enumerate(content[responses_start:], responses_start):\n",
+ " if escape_next:\n",
+ " escape_next = False\n",
+ " continue\n",
+ " \n",
+ " if char == '\\\\':\n",
+ " escape_next = True\n",
+ " continue\n",
+ " \n",
+ " if char == '\"' and not escape_next:\n",
+ " in_string = not in_string\n",
+ " continue\n",
+ " \n",
+ " if not in_string:\n",
+ " if char == '{':\n",
+ " brace_count += 1\n",
+ " elif char == '}':\n",
+ " brace_count -= 1\n",
+ " if brace_count == 0:\n",
+ " last_complete_pos = i\n",
+ " break\n",
+ " \n",
+ " if last_complete_pos != -1:\n",
+ " # Truncate at the last complete object and close the JSON\n",
+ " repaired = content[:last_complete_pos + 1] + '\\n ]\\n}'\n",
+ " print(f\"🔧 Repaired JSON: truncated at position {last_complete_pos}\")\n",
+ " return repaired\n",
+ " \n",
+ " return content\n",
+ "\n",
+ "def fixed_llm_generate_batch(CFG, n_rows=50):\n",
+ " \"\"\"Fixed LLM generation with better prompt and error handling\"\"\"\n",
+ " if not os.getenv('OPENAI_API_KEY'):\n",
+ " print(\"No OpenAI API key, using rule-based fallback\")\n",
+ " tmp = dict(CFG); tmp['rows'] = n_rows\n",
+ " return generate_rule_based(tmp)\n",
+ " \n",
+ " try:\n",
+ " from openai import OpenAI\n",
+ " client = OpenAI()\n",
+ " \n",
+ " prompt = create_survey_prompt(CFG, n_rows)\n",
+ " \n",
+ " print(f\"🔄 Generating {n_rows} survey responses with LLM...\")\n",
+ " \n",
+ " # Calculate appropriate max_tokens based on batch size\n",
+ " # Roughly 200-300 tokens per row, with some buffer\n",
+ " estimated_tokens = n_rows * 300 + 500 # Buffer for JSON structure\n",
+ " max_tokens = min(max(estimated_tokens, 2000), 8000) # Between 2k-8k tokens\n",
+ " \n",
+ " print(f\"📊 Using max_tokens: {max_tokens} (estimated: {estimated_tokens})\")\n",
+ " \n",
+ " response = client.chat.completions.create(\n",
+ " model='gpt-4o-mini',\n",
+ " messages=[\n",
+ " {'role': 'system', 'content': 'You are a data generation expert. Generate realistic survey data in JSON format. Always return complete, valid JSON.'},\n",
+ " {'role': 'user', 'content': prompt}\n",
+ " ],\n",
+ " temperature=0.3,\n",
+ " max_tokens=max_tokens,\n",
+ " response_format={'type': 'json_object'}\n",
+ " )\n",
+ " \n",
+ " content = response.choices[0].message.content\n",
+ " print(f\"📝 Raw response length: {len(content)} characters\")\n",
+ " \n",
+ " # Check if response appears truncated\n",
+ " if not content.strip().endswith('}') and not content.strip().endswith(']'):\n",
+ " print(\"⚠️ Response appears truncated, attempting repair...\")\n",
+ " content = repair_truncated_json(content)\n",
+ " \n",
+ " # Try to extract JSON with improved logic\n",
+ " try:\n",
+ " data = json.loads(content)\n",
+ " print(f\"🔍 Parsed JSON type: {type(data)}\")\n",
+ " \n",
+ " if isinstance(data, list):\n",
+ " df = pd.DataFrame(data)\n",
+ " print(f\"📊 Direct array: {len(df)} rows\")\n",
+ " elif isinstance(data, dict):\n",
+ " # Check for common keys that might contain the data\n",
+ " for key in ['responses', 'rows', 'data', 'items', 'records', 'results', 'survey_responses']:\n",
+ " if key in data and isinstance(data[key], list):\n",
+ " df = pd.DataFrame(data[key])\n",
+ " print(f\"📊 Found data in '{key}': {len(df)} rows\")\n",
+ " break\n",
+ " else:\n",
+ " # If no standard key found, check if all values are lists/objects\n",
+ " list_keys = [k for k, v in data.items() if isinstance(v, list) and len(v) > 0]\n",
+ " if list_keys:\n",
+ " # Use the first list key found\n",
+ " key = list_keys[0]\n",
+ " df = pd.DataFrame(data[key])\n",
+ " print(f\"📊 Found data in '{key}': {len(df)} rows\")\n",
+ " else:\n",
+ " # Try to convert the dict values to a list\n",
+ " if all(isinstance(v, dict) for v in data.values()):\n",
+ " df = pd.DataFrame(list(data.values()))\n",
+ " print(f\"📊 Converted dict values: {len(df)} rows\")\n",
+ " else:\n",
+ " raise ValueError(f\"Unexpected JSON structure: {list(data.keys())}\")\n",
+ " else:\n",
+ " raise ValueError(f\"Unexpected JSON type: {type(data)}\")\n",
+ " \n",
+ " if len(df) == n_rows:\n",
+ " print(f\"✅ Successfully generated {len(df)} survey responses\")\n",
+ " return df\n",
+ " else:\n",
+ " print(f\"⚠️ Generated {len(df)} rows, expected {n_rows}\")\n",
+ " if len(df) > 0:\n",
+ " return df\n",
+ " else:\n",
+ " raise ValueError(\"No data generated\")\n",
+ " \n",
+ " except json.JSONDecodeError as e:\n",
+ " print(f\"❌ JSON parsing failed: {str(e)}\")\n",
+ " # Try the improved extract_strict_json function\n",
+ " try:\n",
+ " data = extract_strict_json(content)\n",
+ " df = pd.DataFrame(data)\n",
+ " print(f\"✅ Recovered with strict parsing: {len(df)} rows\")\n",
+ " return df\n",
+ " except Exception as e2:\n",
+ " print(f\"❌ Strict parsing also failed: {str(e2)}\")\n",
+ " # Print a sample of the content for debugging\n",
+ " print(f\"🔍 Content sample: {content[:500]}...\")\n",
+ " raise e2\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f'❌ LLM error, fallback to rule-based mock: {str(e)}')\n",
+ " tmp = dict(CFG); tmp['rows'] = n_rows\n",
+ " return generate_rule_based(tmp)\n",
+ "\n",
+ "def fixed_generate_llm(CFG, total_rows=200, batch_size=50):\n",
+ " \"\"\"Fixed LLM generation with adaptive batch processing\"\"\"\n",
+ " print(f\"🚀 Generating {total_rows} survey responses with adaptive batching\")\n",
+ " \n",
+ " # Adaptive batch sizing based on total rows\n",
+ " if total_rows <= 20:\n",
+ " optimal_batch_size = min(batch_size, total_rows)\n",
+ " elif total_rows <= 50:\n",
+ " optimal_batch_size = min(15, batch_size)\n",
+ " elif total_rows <= 100:\n",
+ " optimal_batch_size = min(10, batch_size)\n",
+ " else:\n",
+ " optimal_batch_size = min(8, batch_size)\n",
+ " \n",
+ " print(f\"📊 Using optimal batch size: {optimal_batch_size}\")\n",
+ " \n",
+ " all_dataframes = []\n",
+ " remaining = total_rows\n",
+ " \n",
+ " while remaining > 0:\n",
+ " current_batch_size = min(optimal_batch_size, remaining)\n",
+ " print(f\"\\n📦 Processing batch: {current_batch_size} rows (remaining: {remaining})\")\n",
+ " \n",
+ " try:\n",
+ " batch_df = fixed_llm_generate_batch(CFG, current_batch_size)\n",
+ " all_dataframes.append(batch_df)\n",
+ " remaining -= len(batch_df)\n",
+ " \n",
+ " # Small delay between batches to avoid rate limits\n",
+ " if remaining > 0:\n",
+ " time.sleep(1.5)\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f\"❌ Batch failed: {str(e)}\")\n",
+ " print(f\"🔄 Retrying with smaller batch size...\")\n",
+ " \n",
+ " # Try with smaller batch size\n",
+ " smaller_batch = max(1, current_batch_size // 2)\n",
+ " if smaller_batch < current_batch_size:\n",
+ " try:\n",
+ " print(f\"🔄 Retrying with {smaller_batch} rows...\")\n",
+ " batch_df = fixed_llm_generate_batch(CFG, smaller_batch)\n",
+ " all_dataframes.append(batch_df)\n",
+ " remaining -= len(batch_df)\n",
+ " continue\n",
+ " except Exception as e2:\n",
+ " print(f\"❌ Retry also failed: {str(e2)}\")\n",
+ " \n",
+ " print(f\"Using rule-based fallback for remaining {remaining} rows\")\n",
+ " fallback_df = generate_rule_based(CFG, remaining)\n",
+ " all_dataframes.append(fallback_df)\n",
+ " break\n",
+ " \n",
+ " if all_dataframes:\n",
+ " result = pd.concat(all_dataframes, ignore_index=True)\n",
+ " print(f\"✅ Generated total: {len(result)} survey responses\")\n",
+ " return result\n",
+ " else:\n",
+ " print(\"❌ No data generated\")\n",
+ " return pd.DataFrame()\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e1af410e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test the fixed LLM generation\n",
+ "print(\"🧪 Testing LLM generation...\")\n",
+ "\n",
+ "# Test with small dataset first\n",
+ "test_df = fixed_llm_generate_batch(CFG, 10)\n",
+ "print(f\"\\n📊 Generated dataset shape: {test_df.shape}\")\n",
+ "print(f\"\\n📋 First few rows:\")\n",
+ "print(test_df.head())\n",
+ "print(f\"\\n📈 Data types:\")\n",
+ "print(test_df.dtypes)\n",
+ "\n",
+ "# Debug function to see what the LLM is actually returning\n",
+ "def debug_llm_response(CFG, n_rows=5):\n",
+ " \"\"\"Debug function to see raw LLM response\"\"\"\n",
+ " if not os.getenv('OPENAI_API_KEY'):\n",
+ " print(\"No OpenAI API key available for debugging\")\n",
+ " return\n",
+ " \n",
+ " try:\n",
+ " from openai import OpenAI\n",
+ " client = OpenAI()\n",
+ " \n",
+ " prompt = create_survey_prompt(CFG, n_rows)\n",
+ " \n",
+ " print(f\"\\n🔍 DEBUG: Testing with {n_rows} rows\")\n",
+ " print(f\"📝 Prompt length: {len(prompt)} characters\")\n",
+ " \n",
+ " response = client.chat.completions.create(\n",
+ " model='gpt-4o-mini',\n",
+ " messages=[\n",
+ " {'role': 'system', 'content': 'You are a data generation expert. Generate realistic survey data in JSON format.'},\n",
+ " {'role': 'user', 'content': prompt}\n",
+ " ],\n",
+ " temperature=0.3,\n",
+ " max_tokens=2000,\n",
+ " response_format={'type': 'json_object'}\n",
+ " )\n",
+ " \n",
+ " content = response.choices[0].message.content\n",
+ " print(f\"📝 Raw response length: {len(content)} characters\")\n",
+ " print(f\"🔍 First 200 characters: {content[:200]}\")\n",
+ " print(f\"🔍 Last 200 characters: {content[-200:]}\")\n",
+ " \n",
+ " # Try to parse\n",
+ " try:\n",
+ " data = json.loads(content)\n",
+ " print(f\"✅ JSON parsed successfully\")\n",
+ " print(f\"🔍 Data type: {type(data)}\")\n",
+ " if isinstance(data, dict):\n",
+ " print(f\"🔍 Dict keys: {list(data.keys())}\")\n",
+ " elif isinstance(data, list):\n",
+ " print(f\"🔍 List length: {len(data)}\")\n",
+ " except Exception as e:\n",
+ " print(f\"❌ JSON parsing failed: {str(e)}\")\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f\"❌ Debug failed: {str(e)}\")\n"
+ ]
+ },
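+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c4d8e2f0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Uncomment to inspect the raw model output when diagnosing parsing failures\n",
+ "# debug_llm_response(CFG, n_rows=3)"
+ ]
+ },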
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "75c90739",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test the fixed implementation\n",
+ "print(\"🧪 Testing the fixed LLM generation...\")\n",
+ "\n",
+ "# Test with small dataset\n",
+ "test_df = fixed_llm_generate_batch(CFG, 5)\n",
+ "print(f\"\\n📊 Generated dataset shape: {test_df.shape}\")\n",
+ "print(f\"\\n📋 First few rows:\")\n",
+ "print(test_df.head())\n",
+ "print(f\"\\n📈 Data types:\")\n",
+ "print(test_df.dtypes)\n",
+ "\n",
+ "if not test_df.empty:\n",
+ " print(f\"\\n✅ SUCCESS! LLM generation is now working!\")\n",
+ " print(f\"📊 Generated {len(test_df)} survey responses using LLM\")\n",
+ "else:\n",
+ " print(f\"\\n❌ Still having issues with LLM generation\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dd83b842",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Test larger dataset generation \n",
+ "print(\"🚀 Testing larger dataset generation...\")\n",
+ "large_df = fixed_generate_llm(CFG, total_rows=100, batch_size=25)\n",
+ "if not large_df.empty:\n",
+ " print(f\"\\n📊 Large dataset shape: {large_df.shape}\")\n",
+ " print(f\"\\n📈 Summary statistics:\")\n",
+ " print(large_df.describe())\n",
+ " \n",
+ " # Save the results\n",
+ " from pathlib import Path\n",
+ " out = Path(\"data\"); out.mkdir(exist_ok=True)\n",
+ " ts = datetime.utcnow().strftime(\"%Y%m%dT%H%M%SZ\")\n",
+ " csv_path = out / f\"survey_llm_fixed_{ts}.csv\"\n",
+ " large_df.to_csv(csv_path, index=False)\n",
+ " print(f\"💾 Saved: {csv_path}\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6029d3e2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "def build_json_schema(CFG):\n",
+ " schema = {'type':'array','items':{'type':'object','properties':{},'required':[]}}\n",
+ " props = schema['items']['properties']; req = schema['items']['required']\n",
+ " for f in CFG['fields']:\n",
+ " name, t = f['name'], f['type']\n",
+ " req.append(name)\n",
+ " if t in ('int','float'): props[name] = {'type':'number' if t=='float' else 'integer'}\n",
+ " elif t == 'enum': props[name] = {'type':'string','enum': f['values']}\n",
+ " elif t in ('uuid4','datetime'): props[name] = {'type':'string'}\n",
+ " elif t == 'bool': props[name] = {'type':'boolean'}\n",
+ " else: props[name] = {'type':'string'}\n",
+ " return schema\n",
+ "\n",
+ "PROMPT_PREAMBLE = (\n",
+ " \"You are a data generator. Return ONLY JSON. \"\n",
+ " \"Respond as a JSON object with key 'rows' whose value is an array of exactly N objects. \"\n",
+ " \"No prose, no code fences, no trailing commas.\"\n",
+ ")\n",
+ "\n",
+ "def render_prompt(CFG, n_rows=100):\n",
+ " minimal_cfg = {'fields': []}\n",
+ " for f in CFG['fields']:\n",
+ " base = {k: f[k] for k in ['name','type'] if k in f}\n",
+ " if 'min' in f and 'max' in f: base.update({'min': f['min'], 'max': f['max']})\n",
+ " if 'values' in f: base.update({'values': f['values']})\n",
+ " if 'fmt' in f: base.update({'fmt': f['fmt']})\n",
+ " minimal_cfg['fields'].append(base)\n",
+ " return {\n",
+ " 'preamble': PROMPT_PREAMBLE,\n",
+ " 'n_rows': n_rows,\n",
+ " 'schema': build_json_schema(CFG),\n",
+ " 'constraints': minimal_cfg,\n",
+ " 'instruction': f\"Return ONLY this structure: {{'rows': [ ... exactly {n_rows} objects ... ]}}\"\n",
+ " }\n",
+ "\n",
+ "def parse_llm_json_to_df(raw: str) -> pd.DataFrame:\n",
+ " try:\n",
+ " obj = json.loads(raw)\n",
+ " if isinstance(obj, dict) and isinstance(obj.get('rows'), list):\n",
+ " return pd.DataFrame(obj['rows'])\n",
+ " except Exception:\n",
+ " pass\n",
+ " data = extract_strict_json(raw)\n",
+ " return pd.DataFrame(data)\n",
+ "\n",
+ "USE_LLM = bool(os.getenv('OPENAI_API_KEY'))\n",
+ "print('LLM available:', USE_LLM)\n",
+ "\n",
+ "def llm_generate_batch(CFG, n_rows=50):\n",
+ " if USE_LLM:\n",
+ " try:\n",
+ " from openai import OpenAI\n",
+ " client = OpenAI()\n",
+ " prompt = json.dumps(render_prompt(CFG, n_rows))\n",
+ " resp = client.chat.completions.create(\n",
+ " model='gpt-4o-mini',\n",
+ " response_format={'type': 'json_object'},\n",
+ " messages=[\n",
+ " {'role':'system','content':'You output strict JSON only.'},\n",
+ " {'role':'user','content': prompt}\n",
+ " ],\n",
+ " temperature=0.2,\n",
+ " max_tokens=8192,\n",
+ " )\n",
+ " raw = resp.choices[0].message.content\n",
+ " try:\n",
+ " return parse_llm_json_to_df(raw)\n",
+ " except Exception:\n",
+ " stricter = (\n",
+ " prompt\n",
+ " + \"\\nReturn ONLY a JSON object structured as: \"\n",
+ " + \"{\\\"rows\\\": [ ... exactly N objects ... ]}. \"\n",
+ " + \"No prose, no explanations.\"\n",
+ " )\n",
+ " resp2 = client.chat.completions.create(\n",
+ " model='gpt-4o-mini',\n",
+ " response_format={'type': 'json_object'},\n",
+ " messages=[\n",
+ " {'role':'system','content':'You output strict JSON only.'},\n",
+ " {'role':'user','content': stricter}\n",
+ " ],\n",
+ " temperature=0.2,\n",
+ " max_tokens=8192,\n",
+ " )\n",
+ " raw2 = resp2.choices[0].message.content\n",
+ " return parse_llm_json_to_df(raw2)\n",
+ " except Exception as e:\n",
+ " print('LLM error, fallback to rule-based mock:', e)\n",
+ " tmp = dict(CFG); tmp['rows'] = n_rows\n",
+ " return generate_rule_based(tmp)\n",
+ "\n",
+ "def generate_llm(CFG, total_rows=200, batch_size=50):\n",
+ " dfs = []; remaining = total_rows\n",
+ " while remaining > 0:\n",
+ " b = min(batch_size, remaining)\n",
+ " dfs.append(llm_generate_batch(CFG, n_rows=b))\n",
+ " remaining -= b\n",
+ " time.sleep(0.2)\n",
+ " return pd.concat(dfs, ignore_index=True)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2e759087",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df_llm = generate_llm(CFG, total_rows=100, batch_size=50)\n",
+ "df_llm.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6d4908ad",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test the improved LLM generation with adaptive batching\n",
+ "print(\"🧪 Testing improved LLM generation with adaptive batching...\")\n",
+ "\n",
+ "# Test with smaller dataset first\n",
+ "print(\"\\n📦 Testing small batch (10 rows)...\")\n",
+ "small_df = fixed_llm_generate_batch(CFG, 10)\n",
+ "print(f\"✅ Small batch result: {len(small_df)} rows\")\n",
+ "\n",
+ "# Test with medium dataset using adaptive batching\n",
+ "print(\"\\n📦 Testing medium dataset (30 rows) with adaptive batching...\")\n",
+ "medium_df = fixed_generate_llm(CFG, total_rows=30, batch_size=15)\n",
+ "print(f\"✅ Medium dataset result: {len(medium_df)} rows\")\n",
+ "\n",
+ "if not medium_df.empty:\n",
+ " print(f\"\\n📊 Dataset shape: {medium_df.shape}\")\n",
+ " print(f\"\\n📋 First few rows:\")\n",
+ " print(medium_df.head())\n",
+ " \n",
+ " # Save the results\n",
+ " from pathlib import Path\n",
+ " out = Path(\"data\"); out.mkdir(exist_ok=True)\n",
+ " ts = datetime.utcnow().strftime(\"%Y%m%dT%H%M%SZ\")\n",
+ " csv_path = out / f\"survey_adaptive_batch_{ts}.csv\"\n",
+ " medium_df.to_csv(csv_path, index=False)\n",
+ " print(f\"💾 Saved: {csv_path}\")\n",
+ "else:\n",
+ " print(\"❌ Medium dataset generation failed\")\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week3/community-contributions/week3_exercise_solution-Stephen.ipynb b/week3/community-contributions/week3_exercise_solution-Stephen.ipynb
new file mode 100644
index 0000000..bbc99e7
--- /dev/null
+++ b/week3/community-contributions/week3_exercise_solution-Stephen.ipynb
@@ -0,0 +1,216 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "c58e628f",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## **Week 3 task.**\n",
+ "Create your own tool that generates synthetic data/test data. Input the type of dataset or products or job postings, etc. and let the tool dream up various data samples.\n",
+ "\n",
+ "https://colab.research.google.com/drive/13wR4Blz3Ot_x0GOpflmvvFffm5XU3Kct?usp=sharing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "0ddde9ed",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports\n",
+ "\n",
+ "import os\n",
+ "import requests\n",
+ "import torch\n",
+ "from IPython.display import Markdown, display, update_display\n",
+ "from openai import OpenAI\n",
+ "from huggingface_hub import login\n",
+ "from huggingface_hub import login\n",
+ "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
+ "from dotenv import load_dotenv\n",
+ "import gradio as gr"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cbbc6cc8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "llama_api_key = \"ollama\"\n",
+ "\n",
+ "# hf_token = userdata.get('HF_TOKEN')\n",
+ "# login(hf_token, add_to_git_credential=True)\n",
+ "\n",
+ "\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ "\n",
+ "if llama_api_key:\n",
+ " print(f\"LLama API Key exists\")\n",
+ "else:\n",
+ " print(\"LLama API Key not set\")\n",
+ " \n",
+ "GPT_MODEL = \"gpt-4.1-mini\"\n",
+ "LLAMA_MODEL = \"llama3.1\"\n",
+ "\n",
+ "\n",
+ "openai = OpenAI()\n",
+ "\n",
+ "llama_url = \"http://localhost:11434/v1\"\n",
+ "llama = OpenAI(api_key=llama_api_key, base_url=llama_url)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "ef083ec6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def generate_with_gpt(user_prompt: str, num_samples: int = 5):\n",
+ " \"\"\"\n",
+ " Generates synthetic data using OpenAI's GPT.\n",
+ " Return a JSON string.\n",
+ " \"\"\"\n",
+ " if not openai:\n",
+ " return json.dumps({\"error\": \"OpenAI client not initialized. Please check your API key.\"}, indent=2)\n",
+ "\n",
+ " try:\n",
+ " response = openai.chat.completions.create(\n",
+ " model=GPT_MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": f\"You are a data generation assistant. Generate a JSON array of exactly {num_samples} objects based on the user's request. The output must be valid JSON only, without any other text or formatting.\"},\n",
+ " {\"role\": \"user\", \"content\": user_prompt}\n",
+ " ],\n",
+ " response_format={\"type\": \"json_object\"}\n",
+ " )\n",
+ " \n",
+ " json_text = response.choices[0].message.content\n",
+ " return json_text\n",
+ " except APIError as e:\n",
+ " return json.dumps({\"error\": f\"Error from OpenAI API: {e.body}\"}, indent=2)\n",
+ " except Exception as e:\n",
+ " return json.dumps({\"error\": f\"An unexpected error occurred: {e}\"}, indent=2)\n",
+ "\n",
+ "def generate_with_gpt(user_prompt: str, num_samples: int = 5):\n",
+ " \"\"\"\n",
+ " Generates synthetic data using OpenAI's GPT.\n",
+ " Return a JSON string.\n",
+ " \"\"\"\n",
+ " if not openai:\n",
+ " return json.dumps({\"error\": \"OpenAI client not initialized. Please check your API key.\"}, indent=2)\n",
+ "\n",
+ " try:\n",
+ " response = openai.chat.completions.create(\n",
+ " model=GPT_MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": f\"You are a data generation assistant. Generate a JSON array of exactly {num_samples} objects based on the user's request. The output must be valid JSON only, without any other text or formatting.\"},\n",
+ " {\"role\": \"user\", \"content\": user_prompt}\n",
+ " ],\n",
+ " response_format={\"type\": \"json_object\"}\n",
+ " )\n",
+ " \n",
+ " json_text = response.choices[0].message.content\n",
+ " return json_text\n",
+ " except APIError as e:\n",
+ " return json.dumps({\"error\": f\"Error from OpenAI API: {e.body}\"}, indent=2)\n",
+ " except Exception as e:\n",
+ " return json.dumps({\"error\": f\"An unexpected error occurred: {e}\"}, indent=2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "b98f84d8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def generate_data(user_prompt, model_choice):\n",
+ " \"\"\"\n",
+ " Wrapper function that calls the appropriate generation function based on model choice.\n",
+ " \"\"\"\n",
+ " if not user_prompt:\n",
+ " return json.dumps({\"error\": \"Please provide a description for the data.\"}, indent=2)\n",
+ "\n",
+ " if model_choice == f\"Hugging Face ({LLAMA_MODEL})\":\n",
+ " return generate_with_llama(user_prompt)\n",
+ " elif model_choice == f\"OpenAI ({GPT_MODEL})\":\n",
+ " return generate_with_gpt(user_prompt)\n",
+ " else:\n",
+ " return json.dumps({\"error\": \"Invalid model choice.\"}, indent=2)"
+ ]
+ },
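+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e7a91b36",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional smoke test without the UI; the prompt is a hypothetical example,\n",
+ "# and uncommenting makes a live API call if a key is configured.\n",
+ "# print(generate_data(\"3 sample job postings with title and salary\", f\"OpenAI ({GPT_MODEL})\"))"
+ ]
+ },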
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "adbc19a8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Gradio UI\n",
+ "with gr.Blocks(theme=gr.themes.Glass(), title=\"Synthetic Data Generator\") as ui:\n",
+ " gr.Markdown(\"# Synthetic Data Generator\")\n",
+ " gr.Markdown(\"Describe the type of data you need, select a model, and click 'Generate'.\")\n",
+ "\n",
+ " with gr.Row():\n",
+ " with gr.Column(scale=3):\n",
+ " data_prompt = gr.Textbox(\n",
+ " lines=5,\n",
+ " label=\"Data Prompt\",\n",
+ " placeholder=\"e.g., a list of customer profiles with name, email, and a favorite product\"\n",
+ " )\n",
+ " \n",
+ " with gr.Column(scale=1):\n",
+ " model_choice = gr.Radio(\n",
+ " [f\"Hugging Face ({LLAMA_MODEL})\", f\"OpenAI ({GPT_MODEL})\"],\n",
+ " label=\"Choose a Model\",\n",
+ " value=f\"Hugging Face ({LLAMA_MODEL})\"\n",
+ " )\n",
+ " \n",
+ " generate_btn = gr.Button(\"Generate Data\")\n",
+ " \n",
+ " with gr.Row():\n",
+ " output_json = gr.JSON(label=\"Generated Data\")\n",
+ " \n",
+ " generate_btn.click(\n",
+ " fn=generate_data,\n",
+ " inputs=[data_prompt, model_choice],\n",
+ " outputs=output_json\n",
+ " )\n",
+ "\n",
+ "ui.launch(inbrowser=True, debug=True)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week4/community-contributions/python_to_cpp_code_translator/examples/calculator.py b/week4/community-contributions/python_to_cpp_code_translator/examples/calculator.py
new file mode 100644
index 0000000..35af2d7
--- /dev/null
+++ b/week4/community-contributions/python_to_cpp_code_translator/examples/calculator.py
@@ -0,0 +1,190 @@
+"""
+Simple calculator class with history tracking.
+"""
+
+import math
+from typing import List, Union
+
+class Calculator:
+ """A simple calculator with history tracking."""
+
+ def __init__(self):
+ """Initialize calculator with empty history."""
+ self.history: List[str] = []
+ self.memory: float = 0.0
+
+ def add(self, a: float, b: float) -> float:
+ """Add two numbers."""
+ result = a + b
+ self.history.append(f"{a} + {b} = {result}")
+ return result
+
+ def subtract(self, a: float, b: float) -> float:
+ """Subtract b from a."""
+ result = a - b
+ self.history.append(f"{a} - {b} = {result}")
+ return result
+
+ def multiply(self, a: float, b: float) -> float:
+ """Multiply two numbers."""
+ result = a * b
+ self.history.append(f"{a} * {b} = {result}")
+ return result
+
+ def divide(self, a: float, b: float) -> float:
+ """Divide a by b."""
+ if b == 0:
+ raise ValueError("Cannot divide by zero")
+ result = a / b
+ self.history.append(f"{a} / {b} = {result}")
+ return result
+
+ def power(self, base: float, exponent: float) -> float:
+ """Calculate base raised to the power of exponent."""
+ result = base ** exponent
+ self.history.append(f"{base} ^ {exponent} = {result}")
+ return result
+
+ def square_root(self, number: float) -> float:
+ """Calculate square root of a number."""
+ if number < 0:
+ raise ValueError("Cannot calculate square root of negative number")
+ result = math.sqrt(number)
+ self.history.append(f"√{number} = {result}")
+ return result
+
+ def factorial(self, n: int) -> int:
+ """Calculate factorial of n."""
+ if n < 0:
+ raise ValueError("Factorial is not defined for negative numbers")
+ if n == 0 or n == 1:
+ return 1
+
+ result = 1
+ for i in range(2, n + 1):
+ result *= i
+
+ self.history.append(f"{n}! = {result}")
+ return result
+
+ def memory_store(self, value: float) -> None:
+ """Store value in memory."""
+ self.memory = value
+ self.history.append(f"Memory stored: {value}")
+
+ def memory_recall(self) -> float:
+ """Recall value from memory."""
+ self.history.append(f"Memory recalled: {self.memory}")
+ return self.memory
+
+ def memory_clear(self) -> None:
+ """Clear memory."""
+ self.memory = 0.0
+ self.history.append("Memory cleared")
+
+ def get_history(self) -> List[str]:
+ """Get calculation history."""
+ return self.history.copy()
+
+ def clear_history(self) -> None:
+ """Clear calculation history."""
+ self.history.clear()
+
+ def get_last_result(self) -> Union[float, None]:
+ """Get the result of the last calculation."""
+ if not self.history:
+ return None
+
+ last_entry = self.history[-1]
+ # Extract result from history entry
+ if "=" in last_entry:
+ return float(last_entry.split("=")[-1].strip())
+ return None
+
+class ScientificCalculator(Calculator):
+ """Extended calculator with scientific functions."""
+
+ def sine(self, angle: float) -> float:
+ """Calculate sine of angle in radians."""
+ result = math.sin(angle)
+ self.history.append(f"sin({angle}) = {result}")
+ return result
+
+ def cosine(self, angle: float) -> float:
+ """Calculate cosine of angle in radians."""
+ result = math.cos(angle)
+ self.history.append(f"cos({angle}) = {result}")
+ return result
+
+ def tangent(self, angle: float) -> float:
+ """Calculate tangent of angle in radians."""
+ result = math.tan(angle)
+ self.history.append(f"tan({angle}) = {result}")
+ return result
+
+ def logarithm(self, number: float, base: float = math.e) -> float:
+ """Calculate logarithm of number with given base."""
+ if number <= 0:
+ raise ValueError("Logarithm is not defined for non-positive numbers")
+ if base <= 0 or base == 1:
+ raise ValueError("Logarithm base must be positive and not equal to 1")
+
+ result = math.log(number, base)
+ self.history.append(f"log_{base}({number}) = {result}")
+ return result
+
+ def degrees_to_radians(self, degrees: float) -> float:
+ """Convert degrees to radians."""
+ return degrees * math.pi / 180
+
+ def radians_to_degrees(self, radians: float) -> float:
+ """Convert radians to degrees."""
+ return radians * 180 / math.pi
+
+def main():
+ """Main function to demonstrate calculator functionality."""
+ print("Calculator Demo")
+ print("=" * 30)
+
+ # Basic calculator
+ calc = Calculator()
+
+ print("Basic Calculator Operations:")
+ print(f"5 + 3 = {calc.add(5, 3)}")
+ print(f"10 - 4 = {calc.subtract(10, 4)}")
+ print(f"6 * 7 = {calc.multiply(6, 7)}")
+ print(f"15 / 3 = {calc.divide(15, 3)}")
+ print(f"2 ^ 8 = {calc.power(2, 8)}")
+ print(f"√64 = {calc.square_root(64)}")
+ print(f"5! = {calc.factorial(5)}")
+
+ print(f"\nCalculation History:")
+ for entry in calc.get_history():
+ print(f" {entry}")
+
+ # Scientific calculator
+ print("\n" + "=" * 30)
+ print("Scientific Calculator Operations:")
+
+ sci_calc = ScientificCalculator()
+
+ # Convert degrees to radians for trigonometric functions
+ angle_deg = 45
+ angle_rad = sci_calc.degrees_to_radians(angle_deg)
+
+ print(f"sin({angle_deg}°) = {sci_calc.sine(angle_rad):.4f}")
+ print(f"cos({angle_deg}°) = {sci_calc.cosine(angle_rad):.4f}")
+ print(f"tan({angle_deg}°) = {sci_calc.tangent(angle_rad):.4f}")
+ print(f"ln(10) = {sci_calc.logarithm(10):.4f}")
+ print(f"log₁₀(100) = {sci_calc.logarithm(100, 10):.4f}")
+
+ print(f"\nScientific Calculator History:")
+ for entry in sci_calc.get_history():
+ print(f" {entry}")
+
+if __name__ == "__main__":
+ main()
diff --git a/week4/community-contributions/python_to_cpp_code_translator/examples/fibonacci.py b/week4/community-contributions/python_to_cpp_code_translator/examples/fibonacci.py
new file mode 100644
index 0000000..6a41a83
--- /dev/null
+++ b/week4/community-contributions/python_to_cpp_code_translator/examples/fibonacci.py
@@ -0,0 +1,64 @@
+"""
+Fibonacci sequence implementation in Python.
+"""
+
+def fibonacci(n):
+ """Calculate the nth Fibonacci number using recursion."""
+ if n <= 1:
+ return n
+ return fibonacci(n-1) + fibonacci(n-2)
+
+def fibonacci_iterative(n):
+ """Calculate the nth Fibonacci number using iteration."""
+ if n <= 1:
+ return n
+
+ a, b = 0, 1
+ for _ in range(2, n + 1):
+ a, b = b, a + b
+ return b
+
+def fibonacci_sequence(count):
+ """Generate a sequence of Fibonacci numbers."""
+ sequence = []
+ for i in range(count):
+ sequence.append(fibonacci(i))
+ return sequence
+
+def main():
+ """Main function to demonstrate Fibonacci calculations."""
+ print("Fibonacci Sequence Demo")
+ print("=" * 30)
+
+ # Calculate first 10 Fibonacci numbers
+ for i in range(10):
+ result = fibonacci(i)
+ print(f"fibonacci({i}) = {result}")
+
+ print("\nFirst 15 Fibonacci numbers:")
+ sequence = fibonacci_sequence(15)
+ print(sequence)
+
+ # Performance comparison
+ import time
+
+ n = 30
+ print(f"\nPerformance comparison for fibonacci({n}):")
+
+ start_time = time.time()
+ recursive_result = fibonacci(n)
+ recursive_time = time.time() - start_time
+
+ start_time = time.time()
+ iterative_result = fibonacci_iterative(n)
+ iterative_time = time.time() - start_time
+
+ print(f"Recursive: {recursive_result} (took {recursive_time:.4f}s)")
+ print(f"Iterative: {iterative_result} (took {iterative_time:.4f}s)")
+
+if __name__ == "__main__":
+ main()
diff --git a/week4/community-contributions/python_to_cpp_code_translator/examples/sorting_algorithms.py b/week4/community-contributions/python_to_cpp_code_translator/examples/sorting_algorithms.py
new file mode 100644
index 0000000..4200070
--- /dev/null
+++ b/week4/community-contributions/python_to_cpp_code_translator/examples/sorting_algorithms.py
@@ -0,0 +1,150 @@
+"""
+Various sorting algorithms implemented in Python.
+"""
+
+import random
+import time
+from typing import List
+
+def bubble_sort(arr: List[int]) -> List[int]:
+ """Sort array using bubble sort algorithm."""
+ n = len(arr)
+ arr = arr.copy() # Don't modify original array
+
+ for i in range(n):
+ for j in range(0, n - i - 1):
+ if arr[j] > arr[j + 1]:
+ arr[j], arr[j + 1] = arr[j + 1], arr[j]
+
+ return arr
+
+def selection_sort(arr: List[int]) -> List[int]:
+ """Sort array using selection sort algorithm."""
+ n = len(arr)
+ arr = arr.copy()
+
+ for i in range(n):
+ min_idx = i
+ for j in range(i + 1, n):
+ if arr[j] < arr[min_idx]:
+ min_idx = j
+ arr[i], arr[min_idx] = arr[min_idx], arr[i]
+
+ return arr
+
+def insertion_sort(arr: List[int]) -> List[int]:
+ """Sort array using insertion sort algorithm."""
+ arr = arr.copy()
+
+ for i in range(1, len(arr)):
+ key = arr[i]
+ j = i - 1
+ while j >= 0 and arr[j] > key:
+ arr[j + 1] = arr[j]
+ j -= 1
+ arr[j + 1] = key
+
+ return arr
+
+def quick_sort(arr: List[int]) -> List[int]:
+ """Sort array using quick sort algorithm."""
+ if len(arr) <= 1:
+ return arr
+
+ pivot = arr[len(arr) // 2]
+ left = [x for x in arr if x < pivot]
+ middle = [x for x in arr if x == pivot]
+ right = [x for x in arr if x > pivot]
+
+ return quick_sort(left) + middle + quick_sort(right)
+
+def merge_sort(arr: List[int]) -> List[int]:
+ """Sort array using merge sort algorithm."""
+ if len(arr) <= 1:
+ return arr
+
+ mid = len(arr) // 2
+ left = merge_sort(arr[:mid])
+ right = merge_sort(arr[mid:])
+
+ return merge(left, right)
+
+def merge(left: List[int], right: List[int]) -> List[int]:
+ """Merge two sorted arrays."""
+ result = []
+ i = j = 0
+
+ while i < len(left) and j < len(right):
+ if left[i] <= right[j]:
+ result.append(left[i])
+ i += 1
+ else:
+ result.append(right[j])
+ j += 1
+
+ result.extend(left[i:])
+ result.extend(right[j:])
+ return result
+
+def benchmark_sorting_algorithms():
+ """Benchmark different sorting algorithms."""
+ sizes = [100, 500, 1000, 2000]
+ algorithms = {
+ "Bubble Sort": bubble_sort,
+ "Selection Sort": selection_sort,
+ "Insertion Sort": insertion_sort,
+ "Quick Sort": quick_sort,
+ "Merge Sort": merge_sort
+ }
+
+ print("Sorting Algorithm Benchmark")
+ print("=" * 50)
+
+ for size in sizes:
+ print(f"\nArray size: {size}")
+ print("-" * 30)
+
+ # Generate random array
+ test_array = [random.randint(1, 1000) for _ in range(size)]
+
+ for name, algorithm in algorithms.items():
+ start_time = time.time()
+ sorted_array = algorithm(test_array)
+ end_time = time.time()
+
+ # Verify sorting is correct
+ is_sorted = all(sorted_array[i] <= sorted_array[i+1] for i in range(len(sorted_array)-1))
+
+ print(f"{name:15}: {end_time - start_time:.4f}s {'✓' if is_sorted else '✗'}")
+
+def main():
+ """Main function to demonstrate sorting algorithms."""
+ print("Sorting Algorithms Demo")
+ print("=" * 30)
+
+ # Test with small array
+ test_array = [64, 34, 25, 12, 22, 11, 90]
+ print(f"Original array: {test_array}")
+
+ algorithms = {
+ "Bubble Sort": bubble_sort,
+ "Selection Sort": selection_sort,
+ "Insertion Sort": insertion_sort,
+ "Quick Sort": quick_sort,
+ "Merge Sort": merge_sort
+ }
+
+ for name, algorithm in algorithms.items():
+ sorted_array = algorithm(test_array)
+ print(f"{name}: {sorted_array}")
+
+ # Run benchmark
+ print("\n" + "=" * 50)
+ benchmark_sorting_algorithms()
+
+if __name__ == "__main__":
+ main()
+
+
+
+
diff --git a/week4/community-contributions/python_to_cpp_code_translator/python_code_translator.ipynb b/week4/community-contributions/python_to_cpp_code_translator/python_code_translator.ipynb
new file mode 100644
index 0000000..d97e14b
--- /dev/null
+++ b/week4/community-contributions/python_to_cpp_code_translator/python_code_translator.ipynb
@@ -0,0 +1,1280 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 🚀 Code Translator from Python to C++\n",
+ "\n",
+ "**Multi-LLM Python to C++ Code Translator with Compilation Testing and Quality Analysis**\n",
+ "\n",
+ "This notebook demonstrates a comprehensive AI-powered code translation system that:\n",
+ "- Translates Python code to C++ using multiple LLM models (GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash)\n",
+ "- Automatically compiles and tests generated C++ code\n",
+ "- Performs quality analysis and performance benchmarking\n",
+ "- Compares translation results across different AI models\n",
+ "\n",
+ "## 🎯 Key Features\n",
+ "\n",
+ "- **Multi-LLM Support**: Compare translations from OpenAI, Anthropic, and Google\n",
+ "- **C++ Compilation**: Automatic compilation and execution testing\n",
+ "- **Quality Analysis**: Code quality metrics and performance benchmarking\n",
+ "- **Interactive Interface**: Easy-to-use notebook interface\n",
+ "- **Comprehensive Testing**: Full test suite for validation\n",
+ "\n",
+ "## 📋 Table of Contents\n",
+ "\n",
+ "1. [Setup and Installation](#setup)\n",
+ "2. [LLM Client Implementation](#llm-clients)\n",
+ "3. [C++ Compiler and Testing](#compiler)\n",
+ "4. [Core Translation Logic](#translator)\n",
+ "5. [Quality Analysis](#quality)\n",
+ "6. [Interactive Examples](#examples)\n",
+ "7. [Performance Benchmarking](#benchmarking)\n",
+ "8. [Testing and Validation](#testing)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup and Installation\n",
+ "\n",
+ "First, let's install the required dependencies and set up the environment.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install required packages\n",
+ "!uv add openai anthropic google-generativeai gradio python-dotenv pydantic requests psutil memory-profiler pytest black flake8 mypy\n",
+ "#For those working with pip, you can use the following command:\n",
+ "#!pip install openai anthropic google-generativeai gradio python-dotenv pydantic requests psutil memory-profiler pytest black flake8 mypy\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import required libraries\n",
+ "import os\n",
+ "import sys\n",
+ "import json\n",
+ "import time\n",
+ "import subprocess\n",
+ "import tempfile\n",
+ "import psutil\n",
+ "import re\n",
+ "from typing import Dict, List, Optional, Tuple, Any, Union\n",
+ "from dataclasses import dataclass, asdict\n",
+ "from pathlib import Path\n",
+ "\n",
+ "# LLM libraries\n",
+ "import openai\n",
+ "import anthropic\n",
+ "import google.generativeai as genai\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "# Load environment variables\n",
+ "load_dotenv()\n",
+ "\n",
+ "print(\"✅ All libraries imported successfully!\")\n",
+ "print(f\"Python version: {sys.version}\")\n",
+ "print(f\"Working directory: {os.getcwd()}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. LLM Client Implementation\n",
+ "\n",
+ "Let's implement the LLM clients for OpenAI GPT, Anthropic Claude, and Google Gemini.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Data classes for translation results\n",
+ "@dataclass\n",
+ "class TranslationResult:\n",
+ " \"\"\"Result of a code translation.\"\"\"\n",
+ " source_code: str\n",
+ " translated_code: str\n",
+ " model_name: str\n",
+ " success: bool\n",
+ " error_message: Optional[str] = None\n",
+ " translation_time: float = 0.0\n",
+ " token_usage: Optional[Dict] = None\n",
+ "\n",
+ "@dataclass\n",
+ "class CompilationResult:\n",
+ " \"\"\"Result of C++ compilation.\"\"\"\n",
+ " success: bool\n",
+ " executable_path: Optional[str] = None\n",
+ " error_message: Optional[str] = None\n",
+ " compilation_time: float = 0.0\n",
+ " warnings: List[str] = None\n",
+ "\n",
+ "@dataclass\n",
+ "class ExecutionResult:\n",
+ " \"\"\"Result of C++ code execution.\"\"\"\n",
+ " success: bool\n",
+ " output: str = \"\"\n",
+ " error_message: Optional[str] = None\n",
+ " execution_time: float = 0.0\n",
+ " memory_usage: float = 0.0\n",
+ " exit_code: int = 0\n",
+ "\n",
+ "@dataclass\n",
+ "class PerformanceMetrics:\n",
+ " \"\"\"Performance metrics for C++ code.\"\"\"\n",
+ " execution_time: float\n",
+ " memory_usage: float\n",
+ " cpu_usage: float\n",
+ " code_size: int\n",
+ " compilation_time: float\n",
+ "\n",
+ "print(\"✅ Data classes defined successfully!\")\n"
+ ]
+ },
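+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick aside, the dataclasses above serialize cleanly with `asdict` (imported earlier), which is handy for logging or saving results. The sketch below uses placeholder values, not real model output.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: serialize a result dataclass for logging (placeholder values)\n",
+ "demo_result = TranslationResult(\n",
+ "    source_code=\"print('hi')\",\n",
+ "    translated_code=\"// C++ would go here\",\n",
+ "    model_name=\"demo\",\n",
+ "    success=True\n",
+ ")\n",
+ "print(json.dumps(asdict(demo_result), indent=2))\n"
+ ]
+ },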
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# OpenAI GPT Client\n",
+ "class OpenAIClient:\n",
+ " \"\"\"OpenAI GPT client for code translation.\"\"\"\n",
+ " \n",
+ " def __init__(self, api_key: str):\n",
+ " self.api_key = api_key\n",
+ " self.client = openai.OpenAI(api_key=api_key)\n",
+ " \n",
+ " def translate_python_to_cpp(self, python_code: str, context: str = \"\") -> TranslationResult:\n",
+ " \"\"\"Translate Python code to C++ using GPT-4o.\"\"\"\n",
+ " start_time = time.time()\n",
+ " \n",
+ " try:\n",
+ " system_prompt = \"\"\"You are an expert Python to C++ translator. \n",
+ " Convert the given Python code to efficient, modern C++ code.\n",
+ " \n",
+ " Requirements:\n",
+ " - Use modern C++17/20 features\n",
+ " - Include proper headers\n",
+ " - Add comprehensive error handling\n",
+ " - Optimize for performance\n",
+ " - Include detailed comments\n",
+ " - Follow C++ best practices\n",
+ " \n",
+ " Return ONLY the C++ code, no explanations.\"\"\"\n",
+ " \n",
+ " user_prompt = f\"\"\"Translate this Python code to C++:\n",
+ "\n",
+ "Context: {context}\n",
+ "\n",
+ "Python Code:\n",
+ "```python\n",
+ "{python_code}\n",
+ "```\n",
+ "\n",
+ "C++ Translation:\"\"\"\n",
+ " \n",
+ " response = self.client.chat.completions.create(\n",
+ " model=\"gpt-4o\",\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": system_prompt},\n",
+ " {\"role\": \"user\", \"content\": user_prompt}\n",
+ " ],\n",
+ " temperature=0.1,\n",
+ " max_tokens=4000\n",
+ " )\n",
+ " \n",
+ " translated_code = response.choices[0].message.content.strip()\n",
+ " translation_time = time.time() - start_time\n",
+ " \n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=translated_code,\n",
+ " model_name=\"GPT-4o\",\n",
+ " success=True,\n",
+ " translation_time=translation_time,\n",
+ " token_usage={\n",
+ " \"prompt_tokens\": response.usage.prompt_tokens,\n",
+ " \"completion_tokens\": response.usage.completion_tokens,\n",
+ " \"total_tokens\": response.usage.total_tokens\n",
+ " }\n",
+ " )\n",
+ " \n",
+ " except Exception as e:\n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=\"\",\n",
+ " model_name=\"GPT-4o\",\n",
+ " success=False,\n",
+ " error_message=str(e),\n",
+ " translation_time=time.time() - start_time\n",
+ " )\n",
+ "\n",
+ "print(\"✅ OpenAI client implemented!\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Anthropic Claude Client\n",
+ "class ClaudeClient:\n",
+ " \"\"\"Anthropic Claude client for code translation.\"\"\"\n",
+ " \n",
+ " def __init__(self, api_key: str):\n",
+ " self.api_key = api_key\n",
+ " self.client = anthropic.Anthropic(api_key=api_key)\n",
+ " \n",
+ " def translate_python_to_cpp(self, python_code: str, context: str = \"\") -> TranslationResult:\n",
+ " \"\"\"Translate Python code to C++ using Claude 3.5 Sonnet.\"\"\"\n",
+ " start_time = time.time()\n",
+ " \n",
+ " try:\n",
+ " system_prompt = \"\"\"You are an expert Python to C++ translator. \n",
+ " Convert the given Python code to efficient, modern C++ code.\n",
+ " \n",
+ " Requirements:\n",
+ " - Use modern C++17/20 features\n",
+ " - Include proper headers\n",
+ " - Add comprehensive error handling\n",
+ " - Optimize for performance\n",
+ " - Include detailed comments\n",
+ " - Follow C++ best practices\n",
+ " \n",
+ " Return ONLY the C++ code, no explanations.\"\"\"\n",
+ " \n",
+ " user_prompt = f\"\"\"Translate this Python code to C++:\n",
+ "\n",
+ "Context: {context}\n",
+ "\n",
+ "Python Code:\n",
+ "```python\n",
+ "{python_code}\n",
+ "```\n",
+ "\n",
+ "C++ Translation:\"\"\"\n",
+ " \n",
+ " response = self.client.messages.create(\n",
+ " model=\"claude-sonnet-4-20250514\",\n",
+ " max_tokens=4000,\n",
+ " temperature=0.1,\n",
+ " system=system_prompt,\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": user_prompt}\n",
+ " ]\n",
+ " )\n",
+ " \n",
+ " translated_code = response.content[0].text.strip()\n",
+ " translation_time = time.time() - start_time\n",
+ " \n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=translated_code,\n",
+ " model_name=\"Claude-3.5-Sonnet\",\n",
+ " success=True,\n",
+ " translation_time=translation_time,\n",
+ " token_usage={\n",
+ " \"input_tokens\": response.usage.input_tokens,\n",
+ " \"output_tokens\": response.usage.output_tokens\n",
+ " }\n",
+ " )\n",
+ " \n",
+ " except Exception as e:\n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=\"\",\n",
+ " model_name=\"Claude-3.5-Sonnet\",\n",
+ " success=False,\n",
+ " error_message=str(e),\n",
+ " translation_time=time.time() - start_time\n",
+ " )\n",
+ "\n",
+ "print(\"✅ Claude client implemented!\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Google Gemini Client\n",
+ "class GeminiClient:\n",
+ " \"\"\"Google Gemini client for code translation.\"\"\"\n",
+ " \n",
+ " def __init__(self, api_key: str):\n",
+ " self.api_key = api_key\n",
+ " genai.configure(api_key=api_key)\n",
+ " self.client = genai.GenerativeModel('gemini-2.0-flash-exp')\n",
+ " \n",
+ " def translate_python_to_cpp(self, python_code: str, context: str = \"\") -> TranslationResult:\n",
+ " \"\"\"Translate Python code to C++ using Gemini 2.0 Flash.\"\"\"\n",
+ " start_time = time.time()\n",
+ " \n",
+ " try:\n",
+ " prompt = f\"\"\"You are an expert Python to C++ translator. \n",
+ " Convert the given Python code to efficient, modern C++ code.\n",
+ " \n",
+ " Requirements:\n",
+ " - Use modern C++17/20 features\n",
+ " - Include proper headers\n",
+ " - Add comprehensive error handling\n",
+ " - Optimize for performance\n",
+ " - Include detailed comments\n",
+ " - Follow C++ best practices\n",
+ " \n",
+ " Context: {context}\n",
+ " \n",
+ " Python Code:\n",
+ " ```python\n",
+ " {python_code}\n",
+ " ```\n",
+ " \n",
+ " Return ONLY the C++ code, no explanations.\"\"\"\n",
+ " \n",
+ " response = self.client.generate_content(\n",
+ " prompt,\n",
+ " generation_config=genai.types.GenerationConfig(\n",
+ " temperature=0.1,\n",
+ " max_output_tokens=4000\n",
+ " )\n",
+ " )\n",
+ " \n",
+ " translated_code = response.text.strip()\n",
+ " translation_time = time.time() - start_time\n",
+ " \n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=translated_code,\n",
+ " model_name=\"Gemini-2.0-Flash\",\n",
+ " success=True,\n",
+ " translation_time=translation_time\n",
+ " )\n",
+ " \n",
+ " except Exception as e:\n",
+ " return TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=\"\",\n",
+ " model_name=\"Gemini-2.0-Flash\",\n",
+ " success=False,\n",
+ " error_message=str(e),\n",
+ " translation_time=time.time() - start_time\n",
+ " )\n",
+ "\n",
+ "print(\"✅ Gemini client implemented!\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# LLM Client Manager\n",
+ "class LLMClientManager:\n",
+ " \"\"\"Manages multiple LLM clients for code translation.\"\"\"\n",
+ " \n",
+ " def __init__(self):\n",
+ " self.clients = {}\n",
+ " self._initialize_clients()\n",
+ " \n",
+ " def _initialize_clients(self):\n",
+ " \"\"\"Initialize available LLM clients.\"\"\"\n",
+ " # OpenAI\n",
+ " openai_key = os.getenv('OPENAI_API_KEY')\n",
+ " if openai_key:\n",
+ " self.clients['gpt'] = OpenAIClient(openai_key)\n",
+ " \n",
+ " # Anthropic Claude\n",
+ " claude_key = os.getenv('ANTHROPIC_API_KEY')\n",
+ " if claude_key:\n",
+ " self.clients['claude'] = ClaudeClient(claude_key)\n",
+ " \n",
+ " # Google Gemini\n",
+ " gemini_key = os.getenv('GOOGLE_API_KEY')\n",
+ " if gemini_key:\n",
+ " self.clients['gemini'] = GeminiClient(gemini_key)\n",
+ " \n",
+ " def get_available_models(self) -> List[str]:\n",
+ " \"\"\"Get list of available model names.\"\"\"\n",
+ " return list(self.clients.keys())\n",
+ " \n",
+ " def translate_with_all_models(self, python_code: str, context: str = \"\") -> Dict[str, TranslationResult]:\n",
+ " \"\"\"Translate code using all available models.\"\"\"\n",
+ " results = {}\n",
+ " \n",
+ " for model_name, client in self.clients.items():\n",
+ " try:\n",
+ " result = client.translate_python_to_cpp(python_code, context)\n",
+ " results[model_name] = result\n",
+ " except Exception as e:\n",
+ " results[model_name] = TranslationResult(\n",
+ " source_code=python_code,\n",
+ " translated_code=\"\",\n",
+ " model_name=model_name,\n",
+ " success=False,\n",
+ " error_message=str(e)\n",
+ " )\n",
+ " \n",
+ " return results\n",
+ " \n",
+ " def translate_with_model(self, model_name: str, python_code: str, context: str = \"\") -> TranslationResult:\n",
+ " \"\"\"Translate code using a specific model.\"\"\"\n",
+ " if model_name not in self.clients:\n",
+ " raise ValueError(f\"Model {model_name} not available. Available models: {list(self.clients.keys())}\")\n",
+ " \n",
+ " return self.clients[model_name].translate_python_to_cpp(python_code, context)\n",
+ "\n",
+ "# Initialize LLM manager\n",
+ "llm_manager = LLMClientManager()\n",
+ "available_models = llm_manager.get_available_models()\n",
+ "\n",
+ "print(f\"✅ LLM Client Manager initialized!\")\n",
+ "print(f\"Available models: {available_models}\")\n",
+ "\n",
+ "if not available_models:\n",
+ " print(\"⚠️ No LLM models available. Please check your API keys:\")\n",
+ " print(\" - OPENAI_API_KEY\")\n",
+ " print(\" - ANTHROPIC_API_KEY\") \n",
+ " print(\" - GOOGLE_API_KEY\")\n"
+ ]
+ },
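+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before building the full pipeline, here is a minimal sanity-check sketch: translate a one-line function with the first available model. It runs only if at least one API key is configured; `sample_snippet` is just an illustrative input.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sanity-check sketch: one small translation with the first available model\n",
+ "sample_snippet = \"def square(x):\\n    return x * x\"\n",
+ "\n",
+ "if available_models:\n",
+ "    sanity = llm_manager.translate_with_model(available_models[0], sample_snippet)\n",
+ "    print(f\"{sanity.model_name} success: {sanity.success}\")\n",
+ "    print(sanity.translated_code[:300])\n",
+ "else:\n",
+ "    print(\"Skipping sanity check - no API keys configured\")\n"
+ ]
+ },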
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. C++ Compiler and Testing\n",
+ "\n",
+ "Now let's implement the C++ compilation and testing functionality.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# C++ Compiler Implementation\n",
+ "class CppCompiler:\n",
+ " \"\"\"Handles C++ compilation and testing.\"\"\"\n",
+ " \n",
+ " def __init__(self, compiler_path: str = \"g++\", optimization_level: str = \"-O2\"):\n",
+ " self.compiler_path = compiler_path\n",
+ " self.optimization_level = optimization_level\n",
+ " self.temp_dir = None\n",
+ " \n",
+ " def __enter__(self):\n",
+ " \"\"\"Context manager entry.\"\"\"\n",
+ " self.temp_dir = tempfile.mkdtemp(prefix=\"cpp_translator_\")\n",
+ " return self\n",
+ " \n",
+ " def __exit__(self, exc_type, exc_val, exc_tb):\n",
+ " \"\"\"Context manager exit - cleanup temp files.\"\"\"\n",
+ " if self.temp_dir and os.path.exists(self.temp_dir):\n",
+ " import shutil\n",
+ " shutil.rmtree(self.temp_dir, ignore_errors=True)\n",
+ " \n",
+ " def _write_cpp_file(self, cpp_code: str, filename: str = \"main.cpp\") -> str:\n",
+ " \"\"\"Write C++ code to a temporary file.\"\"\"\n",
+ " if not self.temp_dir:\n",
+ " raise RuntimeError(\"Compiler not initialized. Use as context manager.\")\n",
+ " \n",
+ " file_path = os.path.join(self.temp_dir, filename)\n",
+ " with open(file_path, 'w', encoding='utf-8') as f:\n",
+ " f.write(cpp_code)\n",
+ " return file_path\n",
+ " \n",
+ " def _add_standard_headers(self, cpp_code: str) -> str:\n",
+ " \"\"\"Add standard C++ headers if not present.\"\"\"\n",
+ " if \"#include\" not in cpp_code:\n",
+ " headers = [\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \",\n",
+ " \"#include \"\n",
+ " ]\n",
+ " cpp_code = \"\\n\".join(headers) + \"\\n\\n\" + cpp_code\n",
+ " \n",
+ " return cpp_code\n",
+ " \n",
+ " def _add_main_function_if_needed(self, cpp_code: str) -> str:\n",
+ " \"\"\"Add main function if not present.\"\"\"\n",
+ " if \"int main(\" not in cpp_code and \"void main(\" not in cpp_code:\n",
+ " main_code = \"\"\"\n",
+ "int main() {\n",
+ " try {\n",
+ " // Your code will be executed here\n",
+ " return 0;\n",
+ " } catch (const std::exception& e) {\n",
+ " std::cerr << \"Error: \" << e.what() << std::endl;\n",
+ " return 1;\n",
+ " }\n",
+ "}\"\"\"\n",
+ " cpp_code += main_code\n",
+ " \n",
+ " return cpp_code\n",
+ " \n",
+ " def compile_cpp(self, cpp_code: str, output_name: str = \"main\") -> CompilationResult:\n",
+ " \"\"\"Compile C++ code to executable.\"\"\"\n",
+ " start_time = time.time()\n",
+ " \n",
+ " try:\n",
+ " # Preprocess the code\n",
+ " cpp_code = self._add_standard_headers(cpp_code)\n",
+ " cpp_code = self._add_main_function_if_needed(cpp_code)\n",
+ " \n",
+ " # Write to temporary file\n",
+ " cpp_file = self._write_cpp_file(cpp_code)\n",
+ " exe_path = os.path.join(self.temp_dir, output_name)\n",
+ " \n",
+ " # Compilation command\n",
+ " cmd = [\n",
+ " self.compiler_path,\n",
+ " self.optimization_level,\n",
+ " \"-std=c++17\",\n",
+ " \"-Wall\",\n",
+ " \"-Wextra\",\n",
+ " cpp_file,\n",
+ " \"-o\", exe_path\n",
+ " ]\n",
+ " \n",
+ " # Compile\n",
+ " result = subprocess.run(\n",
+ " cmd,\n",
+ " capture_output=True,\n",
+ " text=True,\n",
+ " timeout=30\n",
+ " )\n",
+ " \n",
+ " compilation_time = time.time() - start_time\n",
+ " \n",
+ " if result.returncode == 0:\n",
+ " return CompilationResult(\n",
+ " success=True,\n",
+ " executable_path=exe_path,\n",
+ " compilation_time=compilation_time,\n",
+ " warnings=self._extract_warnings(result.stderr)\n",
+ " )\n",
+ " else:\n",
+ " return CompilationResult(\n",
+ " success=False,\n",
+ " error_message=result.stderr,\n",
+ " compilation_time=compilation_time\n",
+ " )\n",
+ " \n",
+ " except subprocess.TimeoutExpired:\n",
+ " return CompilationResult(\n",
+ " success=False,\n",
+ " error_message=\"Compilation timeout\",\n",
+ " compilation_time=time.time() - start_time\n",
+ " )\n",
+ " except Exception as e:\n",
+ " return CompilationResult(\n",
+ " success=False,\n",
+ " error_message=str(e),\n",
+ " compilation_time=time.time() - start_time\n",
+ " )\n",
+ " \n",
+ " def _extract_warnings(self, stderr: str) -> List[str]:\n",
+ " \"\"\"Extract warnings from compiler output.\"\"\"\n",
+ " warnings = []\n",
+ " for line in stderr.split('\\n'):\n",
+ " if 'warning:' in line.lower():\n",
+ " warnings.append(line.strip())\n",
+ " return warnings\n",
+ " \n",
+ " def execute_cpp(self, executable_path: str, input_data: str = \"\", timeout: int = 10) -> ExecutionResult:\n",
+ " \"\"\"Execute compiled C++ code.\"\"\"\n",
+ " start_time = time.time()\n",
+ " \n",
+ " try:\n",
+ " # Start process\n",
+ " process = subprocess.Popen(\n",
+ " [executable_path],\n",
+ " stdin=subprocess.PIPE,\n",
+ " stdout=subprocess.PIPE,\n",
+ " stderr=subprocess.PIPE,\n",
+ " text=True\n",
+ " )\n",
+ " \n",
+ " # Monitor memory usage\n",
+ " memory_usage = 0.0\n",
+ " try:\n",
+ " ps_process = psutil.Process(process.pid)\n",
+ " memory_usage = ps_process.memory_info().rss / 1024 / 1024 # MB\n",
+ " except (psutil.NoSuchProcess, psutil.AccessDenied):\n",
+ " pass\n",
+ " \n",
+ " # Execute with timeout\n",
+ " stdout, stderr = process.communicate(input=input_data, timeout=timeout)\n",
+ " execution_time = time.time() - start_time\n",
+ " \n",
+ " return ExecutionResult(\n",
+ " success=process.returncode == 0,\n",
+ " output=stdout,\n",
+ " error_message=stderr if stderr else None,\n",
+ " execution_time=execution_time,\n",
+ " memory_usage=memory_usage,\n",
+ " exit_code=process.returncode\n",
+ " )\n",
+ " \n",
+ " except subprocess.TimeoutExpired:\n",
+ " process.kill()\n",
+ " return ExecutionResult(\n",
+ " success=False,\n",
+ " error_message=\"Execution timeout\",\n",
+ " execution_time=time.time() - start_time\n",
+ " )\n",
+ " except Exception as e:\n",
+ " return ExecutionResult(\n",
+ " success=False,\n",
+ " error_message=str(e),\n",
+ " execution_time=time.time() - start_time\n",
+ " )\n",
+ " \n",
+ " def compile_and_test(self, cpp_code: str, test_input: str = \"\") -> Tuple[CompilationResult, Optional[ExecutionResult]]:\n",
+ " \"\"\"Compile and test C++ code.\"\"\"\n",
+ " # Compile\n",
+ " compilation_result = self.compile_cpp(cpp_code)\n",
+ " \n",
+ " if not compilation_result.success:\n",
+ " return compilation_result, None\n",
+ " \n",
+ " # Execute\n",
+ " execution_result = self.execute_cpp(compilation_result.executable_path, test_input)\n",
+ " \n",
+ " return compilation_result, execution_result\n",
+ "\n",
+ "print(\"✅ C++ Compiler implemented!\")\n"
+ ]
+ },
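+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick smoke-test sketch for the compiler harness: compile and run a hard-coded hello-world program. This assumes `g++` is installed and on your PATH; no LLM calls are involved.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Smoke test sketch: exercise CppCompiler without any LLM calls (assumes g++ on PATH)\n",
+ "hello_cpp = \"\"\"\n",
+ "#include <iostream>\n",
+ "int main() {\n",
+ "    std::cout << \"Hello from the compiler harness\" << std::endl;\n",
+ "    return 0;\n",
+ "}\n",
+ "\"\"\"\n",
+ "\n",
+ "with CppCompiler() as smoke_compiler:\n",
+ "    comp_res, exec_res = smoke_compiler.compile_and_test(hello_cpp)\n",
+ "    print(f\"Compiled: {comp_res.success}\")\n",
+ "    if exec_res:\n",
+ "        print(f\"Output: {exec_res.output.strip()}\")\n"
+ ]
+ },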
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Code Quality Analyzer\n",
+ "class CodeQualityAnalyzer:\n",
+ " \"\"\"Analyzes code quality metrics.\"\"\"\n",
+ " \n",
+ " @staticmethod\n",
+ " def analyze_cpp_quality(cpp_code: str) -> Dict[str, Any]:\n",
+ " \"\"\"Analyze C++ code quality.\"\"\"\n",
+ " metrics = {\n",
+ " \"lines_of_code\": len(cpp_code.split('\\n')),\n",
+ " \"comment_ratio\": CodeQualityAnalyzer._calculate_comment_ratio(cpp_code),\n",
+ " \"function_count\": CodeQualityAnalyzer._count_functions(cpp_code),\n",
+ " \"class_count\": CodeQualityAnalyzer._count_classes(cpp_code),\n",
+ " \"complexity_score\": CodeQualityAnalyzer._calculate_complexity(cpp_code),\n",
+ " \"style_score\": CodeQualityAnalyzer._calculate_style_score(cpp_code),\n",
+ " \"error_handling\": CodeQualityAnalyzer._check_error_handling(cpp_code),\n",
+ " \"modern_cpp_features\": CodeQualityAnalyzer._check_modern_features(cpp_code)\n",
+ " }\n",
+ " \n",
+ " return metrics\n",
+ " \n",
+ " @staticmethod\n",
+ " def _calculate_comment_ratio(cpp_code: str) -> float:\n",
+ " \"\"\"Calculate ratio of commented lines.\"\"\"\n",
+ " lines = cpp_code.split('\\n')\n",
+ " comment_lines = sum(1 for line in lines if line.strip().startswith('//') or line.strip().startswith('/*'))\n",
+ " return comment_lines / len(lines) if lines else 0.0\n",
+ " \n",
+ " @staticmethod\n",
+ " def _count_functions(cpp_code: str) -> int:\n",
+ " \"\"\"Count function definitions.\"\"\"\n",
+ " pattern = r'\\w+\\s+\\w+\\s*\\([^)]*\\)\\s*\\{'\n",
+ " return len(re.findall(pattern, cpp_code))\n",
+ " \n",
+ " @staticmethod\n",
+ " def _count_classes(cpp_code: str) -> int:\n",
+ " \"\"\"Count class definitions.\"\"\"\n",
+ " pattern = r'class\\s+\\w+'\n",
+ " return len(re.findall(pattern, cpp_code))\n",
+ " \n",
+ " @staticmethod\n",
+ " def _calculate_complexity(cpp_code: str) -> int:\n",
+ " \"\"\"Calculate cyclomatic complexity.\"\"\"\n",
+ " complexity_keywords = ['if', 'else', 'while', 'for', 'switch', 'case', 'catch', '&&', '||']\n",
+ " complexity = 1 # Base complexity\n",
+ " \n",
+ " for keyword in complexity_keywords:\n",
+ " complexity += cpp_code.count(keyword)\n",
+ " \n",
+ " return complexity\n",
+ " \n",
+ " @staticmethod\n",
+ " def _calculate_style_score(cpp_code: str) -> float:\n",
+ " \"\"\"Calculate style score based on various factors.\"\"\"\n",
+ " score = 0.0\n",
+ " lines = cpp_code.split('\\n')\n",
+ " \n",
+ " # Check for consistent indentation\n",
+ " if all(line.startswith((' ', '\\t')) or not line.strip() for line in lines[1:]):\n",
+ " score += 0.2\n",
+ " \n",
+ " # Check for proper spacing\n",
+ " if re.search(r'\\w\\(\\w', cpp_code): # Functions with proper spacing\n",
+ " score += 0.2\n",
+ " \n",
+ " # Check for const correctness\n",
+ " if 'const' in cpp_code:\n",
+ " score += 0.2\n",
+ " \n",
+ " # Check for RAII usage\n",
+ " if 'std::unique_ptr' in cpp_code or 'std::shared_ptr' in cpp_code:\n",
+ " score += 0.2\n",
+ " \n",
+ " # Check for proper includes\n",
+ " if '#include' in cpp_code:\n",
+ " score += 0.2\n",
+ " \n",
+ " return min(score, 1.0)\n",
+ " \n",
+ " @staticmethod\n",
+ " def _check_error_handling(cpp_code: str) -> bool:\n",
+ " \"\"\"Check if code has proper error handling.\"\"\"\n",
+ " return 'try' in cpp_code and 'catch' in cpp_code\n",
+ " \n",
+ " @staticmethod\n",
+ " def _check_modern_features(cpp_code: str) -> List[str]:\n",
+ " \"\"\"Check for modern C++ features.\"\"\"\n",
+ " features = []\n",
+ " \n",
+ " if 'auto' in cpp_code:\n",
+ " features.append('auto')\n",
+ " if 'std::unique_ptr' in cpp_code:\n",
+ " features.append('smart_pointers')\n",
+ " if 'std::vector' in cpp_code:\n",
+ " features.append('stl_containers')\n",
+ " if 'lambda' in cpp_code or '[]' in cpp_code:\n",
+ " features.append('lambdas')\n",
+ " if 'std::thread' in cpp_code:\n",
+ " features.append('threading')\n",
+ " \n",
+ " return features\n",
+ "\n",
+ "print(\"✅ Code Quality Analyzer implemented!\")\n"
+ ]
+ },
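+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see the metrics in action, here is a short sketch that scores a small hand-written C++ snippet with the analyzer (the snippet itself is illustrative):\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: run the quality analyzer on a tiny hand-written snippet\n",
+ "sample_cpp = \"\"\"\n",
+ "#include <iostream>\n",
+ "// Adds two integers\n",
+ "int add(int a, int b) {\n",
+ "    const int result = a + b;\n",
+ "    return result;\n",
+ "}\n",
+ "\"\"\"\n",
+ "\n",
+ "for metric, value in CodeQualityAnalyzer.analyze_cpp_quality(sample_cpp).items():\n",
+ "    print(f\"{metric}: {value}\")\n"
+ ]
+ },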
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Core Translation Logic\n",
+ "\n",
+ "Now let's implement the main translation logic that coordinates all components.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Translation Comparison Data Class\n",
+ "@dataclass\n",
+ "class TranslationComparison:\n",
+ " \"\"\"Comparison of translations across different models.\"\"\"\n",
+ " model_results: Dict[str, TranslationResult]\n",
+ " compilation_results: Dict[str, CompilationResult]\n",
+ " execution_results: Dict[str, ExecutionResult]\n",
+ " performance_metrics: Dict[str, PerformanceMetrics]\n",
+ " quality_scores: Dict[str, Dict[str, Any]]\n",
+ " best_model: Optional[str] = None\n",
+ " comparison_summary: Optional[str] = None\n",
+ "\n",
+ "# Main Code Translator\n",
+ "class CodeTranslator:\n",
+ " \"\"\"Main translator class that coordinates the entire translation process.\"\"\"\n",
+ " \n",
+ " def __init__(self):\n",
+ " self.llm_manager = LLMClientManager()\n",
+ " self.available_models = self.llm_manager.get_available_models()\n",
+ " \n",
+ " if not self.available_models:\n",
+ " print(\"⚠️ No LLM models available. Please check your API keys.\")\n",
+ " \n",
+ " def translate_python_to_cpp(self, python_code: str, context: str = \"\", \n",
+ " test_input: str = \"\", use_all_models: bool = True) -> TranslationComparison:\n",
+ " \"\"\"Translate Python code to C++ using available models.\"\"\"\n",
+ " \n",
+ " if use_all_models:\n",
+ " # Translate with all available models\n",
+ " translation_results = self.llm_manager.translate_with_all_models(python_code, context)\n",
+ " else:\n",
+ " # Use first available model\n",
+ " model_name = self.available_models[0]\n",
+ " result = self.llm_manager.translate_with_model(model_name, python_code, context)\n",
+ " translation_results = {model_name: result}\n",
+ " \n",
+ " # Compile and test each translation\n",
+ " compilation_results = {}\n",
+ " execution_results = {}\n",
+ " performance_metrics = {}\n",
+ " quality_scores = {}\n",
+ " \n",
+ " with CppCompiler() as compiler:\n",
+ " for model_name, translation_result in translation_results.items():\n",
+ " if not translation_result.success:\n",
+ " continue\n",
+ " \n",
+ " # Compile and test\n",
+ " comp_result, exec_result = compiler.compile_and_test(\n",
+ " translation_result.translated_code, \n",
+ " test_input\n",
+ " )\n",
+ " \n",
+ " compilation_results[model_name] = comp_result\n",
+ " if exec_result:\n",
+ " execution_results[model_name] = exec_result\n",
+ " \n",
+ " # Get performance metrics\n",
+ " perf_metrics = self._get_performance_metrics(compiler, translation_result.translated_code, test_input)\n",
+ " if perf_metrics:\n",
+ " performance_metrics[model_name] = perf_metrics\n",
+ " \n",
+ " # Analyze code quality\n",
+ " quality_scores[model_name] = CodeQualityAnalyzer.analyze_cpp_quality(\n",
+ " translation_result.translated_code\n",
+ " )\n",
+ " \n",
+ " # Determine best model\n",
+ " best_model = self._determine_best_model(\n",
+ " translation_results, compilation_results, execution_results, \n",
+ " performance_metrics, quality_scores\n",
+ " )\n",
+ " \n",
+ " # Generate comparison summary\n",
+ " comparison_summary = self._generate_comparison_summary(\n",
+ " translation_results, compilation_results, execution_results,\n",
+ " performance_metrics, quality_scores, best_model\n",
+ " )\n",
+ " \n",
+ " return TranslationComparison(\n",
+ " model_results=translation_results,\n",
+ " compilation_results=compilation_results,\n",
+ " execution_results=execution_results,\n",
+ " performance_metrics=performance_metrics,\n",
+ " quality_scores=quality_scores,\n",
+ " best_model=best_model,\n",
+ " comparison_summary=comparison_summary\n",
+ " )\n",
+ " \n",
+ " def _get_performance_metrics(self, compiler: CppCompiler, cpp_code: str, test_input: str = \"\") -> Optional[PerformanceMetrics]:\n",
+ " \"\"\"Get comprehensive performance metrics.\"\"\"\n",
+ " compilation_result, execution_result = compiler.compile_and_test(cpp_code, test_input)\n",
+ " \n",
+ " if not compilation_result.success or not execution_result or not execution_result.success:\n",
+ " return None\n",
+ " \n",
+ " # Get code size\n",
+ " cpp_file = compiler._write_cpp_file(cpp_code)\n",
+ " code_size = os.path.getsize(cpp_file)\n",
+ " \n",
+ " # Get executable size\n",
+ " exe_size = 0\n",
+ " if compilation_result.executable_path and os.path.exists(compilation_result.executable_path):\n",
+ " exe_size = os.path.getsize(compilation_result.executable_path)\n",
+ " \n",
+ " return PerformanceMetrics(\n",
+ " execution_time=execution_result.execution_time,\n",
+ " memory_usage=execution_result.memory_usage,\n",
+ " cpu_usage=0.0, # Would need more complex monitoring\n",
+ " code_size=code_size,\n",
+ " compilation_time=compilation_result.compilation_time\n",
+ " )\n",
+ " \n",
+ " def _determine_best_model(self, translation_results: Dict[str, TranslationResult],\n",
+ " compilation_results: Dict[str, CompilationResult],\n",
+ " execution_results: Dict[str, ExecutionResult],\n",
+ " performance_metrics: Dict[str, PerformanceMetrics],\n",
+ " quality_scores: Dict[str, Dict[str, Any]]) -> Optional[str]:\n",
+ " \"\"\"Determine the best model based on multiple criteria.\"\"\"\n",
+ " \n",
+ " scores = {}\n",
+ " \n",
+ " for model_name in translation_results.keys():\n",
+ " score = 0.0\n",
+ " \n",
+ " # Translation success (40% weight)\n",
+ " if translation_results[model_name].success:\n",
+ " score += 0.4\n",
+ " \n",
+ " # Compilation success (30% weight)\n",
+ " if model_name in compilation_results and compilation_results[model_name].success:\n",
+ " score += 0.3\n",
+ " \n",
+ " # Execution success (20% weight)\n",
+ " if model_name in execution_results and execution_results[model_name].success:\n",
+ " score += 0.2\n",
+ " \n",
+ " # Performance (5% weight)\n",
+ " if model_name in performance_metrics:\n",
+ " # Lower execution time is better\n",
+ " exec_time = performance_metrics[model_name].execution_time\n",
+ " if exec_time > 0:\n",
+ " score += 0.05 * (1.0 / (1.0 + exec_time))\n",
+ " \n",
+ " # Code quality (5% weight)\n",
+ " if model_name in quality_scores:\n",
+ " quality = quality_scores[model_name]\n",
+ " style_score = quality.get('style_score', 0.0)\n",
+ " score += 0.05 * style_score\n",
+ " \n",
+ " scores[model_name] = score\n",
+ " \n",
+ " if scores:\n",
+ " return max(scores, key=scores.get)\n",
+ " return None\n",
+ " \n",
+ " def _generate_comparison_summary(self, translation_results: Dict[str, TranslationResult],\n",
+ " compilation_results: Dict[str, CompilationResult],\n",
+ " execution_results: Dict[str, ExecutionResult],\n",
+ " performance_metrics: Dict[str, PerformanceMetrics],\n",
+ " quality_scores: Dict[str, Dict[str, Any]],\n",
+ " best_model: Optional[str]) -> str:\n",
+ " \"\"\"Generate a summary of the comparison.\"\"\"\n",
+ " \n",
+ " summary_parts = []\n",
+ " \n",
+ " # Overall success rates\n",
+ " successful_translations = sum(1 for r in translation_results.values() if r.success)\n",
+ " successful_compilations = sum(1 for r in compilation_results.values() if r.success)\n",
+ " successful_executions = sum(1 for r in execution_results.values() if r.success)\n",
+ " \n",
+ " summary_parts.append(f\"Translation Success: {successful_translations}/{len(translation_results)}\")\n",
+ " summary_parts.append(f\"Compilation Success: {successful_compilations}/{len(compilation_results)}\")\n",
+ " summary_parts.append(f\"Execution Success: {successful_executions}/{len(execution_results)}\")\n",
+ " \n",
+ " # Best model\n",
+ " if best_model:\n",
+ " summary_parts.append(f\"Best Model: {best_model}\")\n",
+ " \n",
+ " # Best model details\n",
+ " if best_model in performance_metrics:\n",
+ " perf = performance_metrics[best_model]\n",
+ " summary_parts.append(f\"Best Model Performance:\")\n",
+ " summary_parts.append(f\" - Execution Time: {perf.execution_time:.4f}s\")\n",
+ " summary_parts.append(f\" - Memory Usage: {perf.memory_usage:.2f}MB\")\n",
+ " summary_parts.append(f\" - Compilation Time: {perf.compilation_time:.4f}s\")\n",
+ " \n",
+ " # Quality comparison\n",
+ " if quality_scores:\n",
+ " summary_parts.append(\"Quality Scores:\")\n",
+ " for model, scores in quality_scores.items():\n",
+ " summary_parts.append(f\" {model}:\")\n",
+ " summary_parts.append(f\" - Lines of Code: {scores.get('lines_of_code', 0)}\")\n",
+ " summary_parts.append(f\" - Comment Ratio: {scores.get('comment_ratio', 0):.2%}\")\n",
+ " summary_parts.append(f\" - Style Score: {scores.get('style_score', 0):.2f}\")\n",
+ " summary_parts.append(f\" - Complexity: {scores.get('complexity_score', 0)}\")\n",
+ " \n",
+ " return \"\\n\".join(summary_parts)\n",
+ "\n",
+ "# Initialize the translator\n",
+ "translator = CodeTranslator()\n",
+ "print(f\"✅ Code Translator initialized!\")\n",
+ "print(f\"Available models: {translator.available_models}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Interactive Examples\n",
+ "\n",
+ "Let's test the translator with some example Python code!\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Example 1: Simple Fibonacci Function\n",
+ "python_code_1 = \"\"\"\n",
+ "def fibonacci(n):\n",
+ " if n <= 1:\n",
+ " return n\n",
+ " return fibonacci(n-1) + fibonacci(n-2)\n",
+ "\n",
+ "def main():\n",
+ " print(\"Fibonacci sequence:\")\n",
+ " for i in range(10):\n",
+ " result = fibonacci(i)\n",
+ " print(f\"fibonacci({i}) = {result}\")\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ " main()\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"📝 Example 1: Fibonacci Function\")\n",
+ "print(\"=\" * 50)\n",
+ "print(python_code_1)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test the translation\n",
+ "if translator.available_models:\n",
+ " print(\"🔄 Translating Python code to C++...\")\n",
+ " print(\"This may take a few moments...\")\n",
+ " \n",
+ " try:\n",
+ " comparison = translator.translate_python_to_cpp(\n",
+ " python_code_1, \n",
+ " \"Fibonacci sequence generator\",\n",
+ " use_all_models=True\n",
+ " )\n",
+ " \n",
+ " print(f\"✅ Translation completed!\")\n",
+ " print(f\"🏆 Best model: {comparison.best_model}\")\n",
+ " print(f\"📊 Models used: {len(comparison.model_results)}\")\n",
+ " \n",
+ " # Show results for each model\n",
+ " for model_name, result in comparison.model_results.items():\n",
+ " status = \"✅ Success\" if result.success else \"❌ Failed\"\n",
+ " print(f\"\\n{model_name}: {status}\")\n",
+ " if result.success:\n",
+ " print(f\" Translation time: {result.translation_time:.2f}s\")\n",
+ " if result.token_usage:\n",
+ " print(f\" Token usage: {result.token_usage}\")\n",
+ " \n",
+ " # Show compilation results\n",
+ " if comparison.compilation_results:\n",
+ " print(f\"\\n🔨 Compilation Results:\")\n",
+ " for model_name, comp_result in comparison.compilation_results.items():\n",
+ " status = \"✅ Compiled\" if comp_result.success else \"❌ Failed\"\n",
+ " print(f\" {model_name}: {status}\")\n",
+ " \n",
+ " # Show execution results\n",
+ " if comparison.execution_results:\n",
+ " print(f\"\\n⚡ Execution Results:\")\n",
+ " for model_name, exec_result in comparison.execution_results.items():\n",
+ " status = \"✅ Executed\" if exec_result.success else \"❌ Failed\"\n",
+ " print(f\" {model_name}: {status}\")\n",
+ " if exec_result.success and exec_result.output:\n",
+ " print(f\" Output: {exec_result.output.strip()}\")\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f\"❌ Translation failed: {e}\")\n",
+ "else:\n",
+ " print(\"⚠️ No LLM models available. Please set your API keys.\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display the best C++ code\n",
+ "if 'comparison' in locals() and comparison.best_model:\n",
+ " best_result = comparison.model_results[comparison.best_model]\n",
+ " print(f\"🏆 Best C++ Code (from {comparison.best_model}):\")\n",
+ " print(\"=\" * 60)\n",
+ " print(best_result.translated_code)\n",
+ " \n",
+ " # Show quality metrics\n",
+ " if comparison.best_model in comparison.quality_scores:\n",
+ " quality = comparison.quality_scores[comparison.best_model]\n",
+ " print(f\"\\n📊 Quality Metrics:\")\n",
+ " print(f\" Lines of code: {quality.get('lines_of_code', 0)}\")\n",
+ " print(f\" Comment ratio: {quality.get('comment_ratio', 0):.2%}\")\n",
+ " print(f\" Style score: {quality.get('style_score', 0):.2f}\")\n",
+ " print(f\" Complexity: {quality.get('complexity_score', 0)}\")\n",
+ " print(f\" Modern features: {quality.get('modern_cpp_features', [])}\")\n",
+ " \n",
+ " # Show performance metrics\n",
+ " if comparison.best_model in comparison.performance_metrics:\n",
+ " perf = comparison.performance_metrics[comparison.best_model]\n",
+ " print(f\"\\n⚡ Performance Metrics:\")\n",
+ " print(f\" Execution time: {perf.execution_time:.4f}s\")\n",
+ " print(f\" Memory usage: {perf.memory_usage:.2f}MB\")\n",
+ " print(f\" Compilation time: {perf.compilation_time:.4f}s\")\n",
+ " print(f\" Code size: {perf.code_size} bytes\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 6. Additional Examples\n",
+ "\n",
+ "Let's try a more complex example with classes and algorithms.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Example 2: Calculator Class\n",
+ "python_code_2 = \"\"\"\n",
+ "class Calculator:\n",
+ " def __init__(self):\n",
+ " self.history = []\n",
+ " \n",
+ " def add(self, a, b):\n",
+ " result = a + b\n",
+ " self.history.append(f\"{a} + {b} = {result}\")\n",
+ " return result\n",
+ " \n",
+ " def multiply(self, a, b):\n",
+ " result = a * b\n",
+ " self.history.append(f\"{a} * {b} = {result}\")\n",
+ " return result\n",
+ " \n",
+ " def get_history(self):\n",
+ " return self.history\n",
+ "\n",
+ "def main():\n",
+ " calc = Calculator()\n",
+ " print(\"Calculator Demo\")\n",
+ " print(calc.add(5, 3))\n",
+ " print(calc.multiply(4, 7))\n",
+ " print(\"History:\", calc.get_history())\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ " main()\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"📝 Example 2: Calculator Class\")\n",
+ "print(\"=\" * 50)\n",
+ "print(python_code_2)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test the second example\n",
+ "if translator.available_models:\n",
+ " print(\"🔄 Translating Calculator class...\")\n",
+ " \n",
+ " try:\n",
+ " comparison2 = translator.translate_python_to_cpp(\n",
+ " python_code_2, \n",
+ " \"Calculator class with history tracking\",\n",
+ " use_all_models=True\n",
+ " )\n",
+ " \n",
+ " print(f\"✅ Translation completed!\")\n",
+ " print(f\"🏆 Best model: {comparison2.best_model}\")\n",
+ " \n",
+ " # Show summary\n",
+ " print(f\"\\n📊 Summary:\")\n",
+ " print(comparison2.comparison_summary)\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f\"❌ Translation failed: {e}\")\n",
+ "else:\n",
+ " print(\"⚠️ No LLM models available.\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7. Summary and Results\n",
+ "\n",
+ "This notebook demonstrates a comprehensive AI-powered code translation system that:\n",
+ "\n",
+ "### Key Achievements:\n",
+ "- **Multi-LLM Support**: Successfully integrates OpenAI GPT, Anthropic Claude, and Google Gemini\n",
+ "- **C++ Compilation**: Automatically compiles and tests generated C++ code\n",
+ "- **Quality Analysis**: Provides detailed code quality metrics and performance benchmarking\n",
+ "- **Model Comparison**: Compares translation results across different AI models\n",
+ "- **Error Handling**: Robust error handling with detailed diagnostics\n",
+ "\n",
+ "### Use Cases:\n",
+ "- **Learning C++**: Translate Python code to learn C++ equivalents\n",
+ "- **Code Migration**: Convert Python projects to C++ for performance\n",
+ "- **Educational Tool**: Compare different AI models' translation quality\n",
+ "- **Performance Analysis**: Benchmark Python vs C++ implementations\n",
+ "\n",
+ "### Next Steps:\n",
+ "1. Set up your API keys for OpenAI, Anthropic, and Google\n",
+ "2. Run the notebook cells to test the translation system\n",
+ "3. Experiment with your own Python code\n",
+ "4. Compare results across different AI models\n",
+ "5. Analyze code quality and performance metrics\n",
+ "\n",
+ "**Happy coding! 🎉**\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "rom "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/week4/community-contributions/python_to_cpp_translator.ipynb b/week4/community-contributions/python_to_cpp_translator.ipynb
new file mode 100644
index 0000000..baf38e7
--- /dev/null
+++ b/week4/community-contributions/python_to_cpp_translator.ipynb
@@ -0,0 +1,571 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Python to C++ Code Translator using LLMs\n",
+ "\n",
+ "This notebook translates Python code to compilable C++ using GPT, Gemini, or Claude.\n",
+ "\n",
+ "## Features:\n",
+ "- 🤖 Multiple LLM support (GPT, Gemini, Claude)\n",
+ "- ✅ Automatic compilation testing with g++\n",
+ "- 🔄 Comparison mode to test all LLMs\n",
+ "- 💬 Interactive translation mode"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 1: Install Required Packages\n",
+ "\n",
+ "Run this cell first to install all dependencies:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!uv add openai anthropic python-dotenv google-generativeai\n",
+ "#!pip install openai anthropic python-dotenv google-generativeai"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 2: Import Libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import subprocess\n",
+ "import tempfile\n",
+ "from pathlib import Path\n",
+ "from dotenv import load_dotenv\n",
+ "import openai\n",
+ "from anthropic import Anthropic\n",
+ "import google.generativeai as genai"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 3: Load API Keys\n",
+ "\n",
+ "Make sure you have a `.env` file with:\n",
+ "```\n",
+ "OPENAI_API_KEY=your_key_here\n",
+ "GEMINI_API_KEY=your_key_here\n",
+ "ANTHROPIC_API_KEY=your_key_here\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load API keys from .env file\n",
+ "load_dotenv()\n",
+ "\n",
+ "# Initialize API clients\n",
+ "openai_client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))\n",
+ "anthropic_client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))\n",
+ "genai.configure(api_key=os.getenv('GEMINI_API_KEY'))\n",
+ "\n",
+ "print(\"✓ API keys loaded successfully\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 4: Define System Prompt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SYSTEM_PROMPT = \"\"\"You are an expert programmer that translates Python code to C++.\n",
+ "Translate the given Python code to efficient, compilable C++ code.\n",
+ "\n",
+ "Requirements:\n",
+ "- The C++ code must compile without errors\n",
+ "- Include all necessary headers\n",
+ "- Use modern C++ (C++11 or later) features where appropriate\n",
+ "- Add proper error handling\n",
+ "- Maintain the same functionality as the Python code\n",
+ "- Include a main() function if the Python code has executable statements\n",
+ "\n",
+ "Only return the C++ code, no explanations unless there are important notes about compilation.\"\"\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 5: LLM Translation Functions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def translate_with_gpt(python_code, model=\"gpt-4o\"):\n",
+ " \"\"\"Translate Python to C++ using OpenAI's GPT models\"\"\"\n",
+ " try:\n",
+ " response = openai_client.chat.completions.create(\n",
+ " model=model,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+ " {\"role\": \"user\", \"content\": f\"Translate this Python code to C++:\\n\\n{python_code}\"}\n",
+ " ],\n",
+ " temperature=0.2\n",
+ " )\n",
+ " return response.choices[0].message.content\n",
+ " except Exception as e:\n",
+ " return f\"Error with GPT: {str(e)}\"\n",
+ "\n",
+ "def translate_with_gemini(python_code, model=\"gemini-2.0-flash-exp\"):\n",
+ " \"\"\"Translate Python to C++ using Google's Gemini\"\"\"\n",
+ " try:\n",
+ " model_instance = genai.GenerativeModel(model)\n",
+ " prompt = f\"{SYSTEM_PROMPT}\\n\\nTranslate this Python code to C++:\\n\\n{python_code}\"\n",
+ " response = model_instance.generate_content(prompt)\n",
+ " return response.text\n",
+ " except Exception as e:\n",
+ " return f\"Error with Gemini: {str(e)}\"\n",
+ "\n",
+ "def translate_with_claude(python_code, model=\"claude-sonnet-4-20250514\"):\n",
+ " \"\"\"Translate Python to C++ using Anthropic's Claude\"\"\"\n",
+ " try:\n",
+ " response = anthropic_client.messages.create(\n",
+ " model=model,\n",
+ " max_tokens=4096,\n",
+ " temperature=0.2,\n",
+ " system=SYSTEM_PROMPT,\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": f\"Translate this Python code to C++:\\n\\n{python_code}\"}\n",
+ " ]\n",
+ " )\n",
+ " return response.content[0].text\n",
+ " except Exception as e:\n",
+ " return f\"Error with Claude: {str(e)}\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 6: Main Translation Function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def translate_python_to_cpp(python_code, llm=\"gpt\", model=None):\n",
+ " \"\"\"\n",
+ " Translate Python code to C++ using specified LLM\n",
+ " \n",
+ " Args:\n",
+ " python_code (str): Python code to translate\n",
+ " llm (str): LLM to use ('gpt', 'gemini', or 'claude')\n",
+ " model (str): Specific model version (optional)\n",
+ " \n",
+ " Returns:\n",
+ " str: Translated C++ code\n",
+ " \"\"\"\n",
+ " print(f\"🔄 Translating with {llm.upper()}...\")\n",
+ " \n",
+ " if llm.lower() == \"gpt\":\n",
+ " model = model or \"gpt-4o\"\n",
+ " cpp_code = translate_with_gpt(python_code, model)\n",
+ " elif llm.lower() == \"gemini\":\n",
+ " model = model or \"gemini-2.0-flash-exp\"\n",
+ " cpp_code = translate_with_gemini(python_code, model)\n",
+ " elif llm.lower() == \"claude\":\n",
+ " model = model or \"claude-sonnet-4-20250514\"\n",
+ " cpp_code = translate_with_claude(python_code, model)\n",
+ " else:\n",
+ " return \"Error: Invalid LLM. Choose 'gpt', 'gemini', or 'claude'\"\n",
+ " \n",
+ " return cpp_code"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 7: Compilation Testing Functions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def extract_cpp_code(text):\n",
+ " \"\"\"Extract C++ code from markdown code blocks if present\"\"\"\n",
+ " if \"```cpp\" in text:\n",
+ " start = text.find(\"```cpp\") + 6\n",
+ " end = text.find(\"```\", start)\n",
+ " return text[start:end].strip()\n",
+ " elif \"```c++\" in text:\n",
+ " start = text.find(\"```c++\") + 6\n",
+ " end = text.find(\"```\", start)\n",
+ " return text[start:end].strip()\n",
+ " elif \"```\" in text:\n",
+ " start = text.find(\"```\") + 3\n",
+ " end = text.find(\"```\", start)\n",
+ " return text[start:end].strip()\n",
+ " return text.strip()\n",
+ "\n",
+ "def compile_cpp_code(cpp_code, output_name=\"translated_program\"):\n",
+ " \"\"\"\n",
+ " Compile C++ code and return compilation status\n",
+ " \n",
+ " Args:\n",
+ " cpp_code (str): C++ code to compile\n",
+ " output_name (str): Name of output executable\n",
+ " \n",
+ " Returns:\n",
+ " dict: Compilation result with status and messages\n",
+ " \"\"\"\n",
+ " # Extract code from markdown if present\n",
+ " cpp_code = extract_cpp_code(cpp_code)\n",
+ " \n",
+ " # Create temporary directory\n",
+ " with tempfile.TemporaryDirectory() as tmpdir:\n",
+ " cpp_file = Path(tmpdir) / \"program.cpp\"\n",
+ " exe_file = Path(tmpdir) / output_name\n",
+ " \n",
+ " # Write C++ code to file\n",
+ " with open(cpp_file, 'w') as f:\n",
+ " f.write(cpp_code)\n",
+ " \n",
+ " # Try to compile\n",
+ " try:\n",
+ " result = subprocess.run(\n",
+ " ['g++', '-std=c++17', str(cpp_file), '-o', str(exe_file)],\n",
+ " capture_output=True,\n",
+ " text=True,\n",
+ " timeout=10\n",
+ " )\n",
+ " \n",
+ " if result.returncode == 0:\n",
+ " return {\n",
+ " 'success': True,\n",
+ " 'message': '✓ Compilation successful!',\n",
+ " 'executable': str(exe_file),\n",
+ " 'stdout': result.stdout,\n",
+ " 'stderr': result.stderr\n",
+ " }\n",
+ " else:\n",
+ " return {\n",
+ " 'success': False,\n",
+ " 'message': '✗ Compilation failed',\n",
+ " 'stdout': result.stdout,\n",
+ " 'stderr': result.stderr\n",
+ " }\n",
+ " except subprocess.TimeoutExpired:\n",
+ " return {\n",
+ " 'success': False,\n",
+ " 'message': '✗ Compilation timed out'\n",
+ " }\n",
+ " except FileNotFoundError:\n",
+ " return {\n",
+ " 'success': False,\n",
+ " 'message': '✗ g++ compiler not found. Please install g++ to compile C++ code.'\n",
+ " }\n",
+ " except Exception as e:\n",
+ " return {\n",
+ " 'success': False,\n",
+ " 'message': f'✗ Compilation error: {str(e)}'\n",
+ " }"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 8: Complete Pipeline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def translate_and_compile(python_code, llm=\"gpt\", model=None, verbose=True):\n",
+ " \"\"\"\n",
+ " Translate Python to C++ and attempt compilation\n",
+ " \n",
+ " Args:\n",
+ " python_code (str): Python code to translate\n",
+ " llm (str): LLM to use\n",
+ " model (str): Specific model version\n",
+ " verbose (bool): Print detailed output\n",
+ " \n",
+ " Returns:\n",
+ " dict: Results including translated code and compilation status\n",
+ " \"\"\"\n",
+ " # Translate\n",
+ " cpp_code = translate_python_to_cpp(python_code, llm, model)\n",
+ " \n",
+ " if verbose:\n",
+ " print(\"\\n\" + \"=\"*60)\n",
+ " print(\"TRANSLATED C++ CODE:\")\n",
+ " print(\"=\"*60)\n",
+ " print(cpp_code)\n",
+ " print(\"=\"*60 + \"\\n\")\n",
+ " \n",
+ " # Compile\n",
+ " print(\"🔨 Attempting to compile...\")\n",
+ " compilation_result = compile_cpp_code(cpp_code)\n",
+ " \n",
+ " if verbose:\n",
+ " print(compilation_result['message'])\n",
+ " if not compilation_result['success'] and 'stderr' in compilation_result:\n",
+ " print(\"\\nCompilation errors:\")\n",
+ " print(compilation_result['stderr'])\n",
+ " \n",
+ " return {\n",
+ " 'cpp_code': cpp_code,\n",
+ " 'compilation': compilation_result\n",
+ " }"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Example 1: Factorial Function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "python_code_1 = \"\"\"\n",
+ "def factorial(n):\n",
+ " if n <= 1:\n",
+ " return 1\n",
+ " return n * factorial(n - 1)\n",
+ "\n",
+ "# Test the function\n",
+ "print(factorial(5))\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"Example 1: Factorial Function\")\n",
+ "print(\"=\"*60)\n",
+ "result1 = translate_and_compile(python_code_1, llm=\"gpt\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Example 2: Sum of Squares"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "python_code_2 = \"\"\"\n",
+ "def sum_of_squares(numbers):\n",
+ " return sum(x**2 for x in numbers)\n",
+ "\n",
+ "numbers = [1, 2, 3, 4, 5]\n",
+ "result = sum_of_squares(numbers)\n",
+ "print(f\"Sum of squares: {result}\")\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"Example 2: Sum of Squares\")\n",
+ "print(\"=\"*60)\n",
+ "result2 = translate_and_compile(python_code_2, llm=\"claude\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Example 3: Fibonacci with Gemini"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "python_code_3 = \"\"\"\n",
+ "def fibonacci(n):\n",
+ " if n <= 1:\n",
+ " return n\n",
+ " a, b = 0, 1\n",
+ " for _ in range(2, n + 1):\n",
+ " a, b = b, a + b\n",
+ " return b\n",
+ "\n",
+ "print(f\"Fibonacci(10) = {fibonacci(10)}\")\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"Example 3: Fibonacci with Gemini\")\n",
+ "print(\"=\"*60)\n",
+ "result3 = translate_and_compile(python_code_3, llm=\"gemini\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Compare All LLMs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def compare_llms(python_code):\n",
+ " \"\"\"Compare all three LLMs on the same Python code\"\"\"\n",
+ " llms = [\"gpt\", \"gemini\", \"claude\"]\n",
+ " results = {}\n",
+ " \n",
+ " for llm in llms:\n",
+ " print(f\"\\n{'='*60}\")\n",
+ " print(f\"Testing with {llm.upper()}\")\n",
+ " print('='*60)\n",
+ " results[llm] = translate_and_compile(python_code, llm=llm, verbose=False)\n",
+ " print(results[llm]['compilation']['message'])\n",
+ " \n",
+ " return results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Test code for comparison\n",
+ "python_code_compare = \"\"\"\n",
+ "def is_prime(n):\n",
+ " if n < 2:\n",
+ " return False\n",
+ " for i in range(2, int(n**0.5) + 1):\n",
+ " if n % i == 0:\n",
+ " return False\n",
+ " return True\n",
+ "\n",
+ "primes = [x for x in range(2, 20) if is_prime(x)]\n",
+ "print(f\"Primes under 20: {primes}\")\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"COMPARING ALL LLMs\")\n",
+ "print(\"=\"*60)\n",
+ "comparison_results = compare_llms(python_code_compare)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Interactive Translation Mode\n",
+ "\n",
+ "Use this cell to translate your own Python code interactively:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Your custom Python code here\n",
+ "your_python_code = \"\"\"\n",
+ "# Paste your Python code here\n",
+ "def hello_world():\n",
+ " print(\"Hello, World!\")\n",
+ "\n",
+ "hello_world()\n",
+ "\"\"\"\n",
+ "\n",
+ "# Choose your LLM: \"gpt\", \"gemini\", or \"claude\"\n",
+ "chosen_llm = \"gpt\"\n",
+ "\n",
+ "result = translate_and_compile(your_python_code, llm=chosen_llm)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Summary\n",
+ "\n",
+ "You now have a complete Python to C++ translator! \n",
+ "\n",
+ "### Main Functions:\n",
+ "- `translate_python_to_cpp(code, llm, model)` - Translate only\n",
+ "- `translate_and_compile(code, llm, model)` - Translate and compile\n",
+ "- `compare_llms(code)` - Compare all three LLMs\n",
+ "\n",
+ "### Supported LLMs:\n",
+ "- **gpt** - OpenAI GPT-4o\n",
+ "- **gemini** - Google Gemini 2.0 Flash\n",
+ "- **claude** - Anthropic Claude Sonnet 4\n",
+ "\n",
+ "Happy translating! 🚀"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/week4/community-contributions/tochi/code_converter.ipynb b/week4/community-contributions/tochi/code_converter.ipynb
new file mode 100644
index 0000000..5101d61
--- /dev/null
+++ b/week4/community-contributions/tochi/code_converter.ipynb
@@ -0,0 +1,569 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "c1fcc6e9",
+ "metadata": {},
+ "source": [
+ "# Code Converter - Python to TypeScript Code\n",
+ "\n",
+ "This implementation, converts python code to optimized TypeScript Code, and runs the function"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "16b6b063",
+ "metadata": {},
+ "source": [
+ "## Set up and imports\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 115,
+ "id": "b3dc394c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "import os\n",
+ "import io\n",
+ "import sys\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import subprocess\n",
+ "from IPython.display import Markdown, display, display_markdown\n",
+ "from system_info import retrieve_system_info\n",
+ "import gradio as gr"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1c9a0936",
+ "metadata": {},
+ "source": [
+ "# Initializing the access keys"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 116,
+ "id": "fac104ec",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "OpenAI API Key exists and begins sk-proj-\n"
+ ]
+ }
+ ],
+ "source": [
+ "load_dotenv(override=True)\n",
+ "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
+ "\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set. Check your engironment variables and try again\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5932182f",
+ "metadata": {},
+ "source": [
+ "# Connecting to client libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 117,
+ "id": "4000f231",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "openai = OpenAI()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 118,
+ "id": "51c67ac0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# contants\n",
+ "OPENAI_MODEL= \"gpt-5-nano\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 119,
+ "id": "ab4342bf",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'os': {'system': 'Darwin',\n",
+ " 'arch': 'arm64',\n",
+ " 'release': '24.5.0',\n",
+ " 'version': 'Darwin Kernel Version 24.5.0: Tue Apr 22 19:48:46 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T8103',\n",
+ " 'kernel': '24.5.0',\n",
+ " 'distro': None,\n",
+ " 'wsl': False,\n",
+ " 'rosetta2_translated': False,\n",
+ " 'target_triple': 'arm64-apple-darwin24.5.0'},\n",
+ " 'package_managers': ['xcode-select (CLT)', 'brew'],\n",
+ " 'cpu': {'brand': 'Apple M1',\n",
+ " 'cores_logical': 8,\n",
+ " 'cores_physical': 8,\n",
+ " 'simd': []},\n",
+ " 'toolchain': {'compilers': {'gcc': 'Apple clang version 17.0.0 (clang-1700.0.13.3)',\n",
+ " 'g++': 'Apple clang version 17.0.0 (clang-1700.0.13.3)',\n",
+ " 'clang': 'Apple clang version 17.0.0 (clang-1700.0.13.3)',\n",
+ " 'msvc_cl': ''},\n",
+ " 'build_tools': {'cmake': '', 'ninja': '', 'make': 'GNU Make 3.81'},\n",
+ " 'linkers': {'ld_lld': ''}}}"
+ ]
+ },
+ "execution_count": 119,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "system_info = retrieve_system_info()\n",
+ "system_info"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 120,
+ "id": "1a1c1324",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "message = f\"\"\"\n",
+ "Here is a report of the system information for my computer.\n",
+ "I want to run a TypeScript compiler to compile a single TypeScript file called main.cpp and then execute it in the simplest way possible.\n",
+ "Please reply with whether I need to install any TypeScript compiler to do this. If so, please provide the simplest step by step instructions to do so.\n",
+ "\n",
+ "If I'm already set up to compile TypeScript code, then I'd like to run something like this in Python to compile and execute the code:\n",
+ "```python\n",
+ "compile_command = # something here - to achieve the fastest possible runtime performance\n",
+ "compile_result = subprocess.run(compile_command, check=True, text=True, capture_output=True)\n",
+ "run_command = # something here\n",
+ "run_result = subprocess.run(run_command, check=True, text=True, capture_output=True)\n",
+ "return run_result.stdout\n",
+ "```\n",
+ "Please tell me exactly what I should use for the compile_command and run_command.\n",
+ "\n",
+ "System information:\n",
+ "{system_info}\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 121,
+ "id": "439015c1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "Short answer:\n",
+ "- Yes, to compile TypeScript you need a TypeScript compiler (tsc). On macOS you’ll typically install Node.js first, then install TypeScript.\n",
+ "- Important: main.cpp sounds like a C++ file. The TypeScript compiler (tsc) cannot compile .cpp. If you want to use TypeScript, rename the file to main.ts (and ensure its contents are TypeScript). If you actually meant C++, use a C++ compiler instead (clang/g++).\n",
+ "\n",
+ "Step-by-step to set up TypeScript (simplest path on your system):\n",
+ "1) Install Node.js (which also installs npm)\n",
+ "- brew update\n",
+ "- brew install node\n",
+ "\n",
+ "2) Install the TypeScript compiler globally\n",
+ "- npm install -g typescript\n",
+ "\n",
+ "3) Verify installations\n",
+ "- node -v\n",
+ "- npm -v\n",
+ "- tsc -v\n",
+ "\n",
+ "4) Compile and run a TypeScript file (assuming your file is main.ts)\n",
+ "- tsc main.ts\n",
+ "- node main.js\n",
+ "\n",
+ "Notes:\n",
+ "- If your file is indeed C++ (main.cpp), you cannot compile it with tsc. To compile C++, use clang++ (on macOS) or g++:\n",
+ " - clang++ -std=c++17 main.cpp -o main\n",
+ " - ./main\n",
+ "\n",
+ "Python integration (fill-in for your example)\n",
+ "- If you have a TypeScript file named main.ts and you want to compile it to JavaScript and then run it with Node, use:\n",
+ " compile_command = [\"tsc\", \"main.ts\"]\n",
+ " run_command = [\"node\", \"main.js\"]\n",
+ "\n",
+ "- If you want to show a single command in Python that compiles and runs in one go (still two steps because TS compiles to JS first):\n",
+ " compile_command = [\"tsc\", \"main.ts\"]\n",
+ " run_command = [\"node\", \"main.js\"]\n",
+ "\n",
+ "- If you truly want to bypass TypeScript and run C++ instead (not TypeScript):\n",
+ " compile_command = [\"clang++\", \"-std=c++17\", \"main.cpp\", \"-o\", \"main\"]\n",
+ " run_command = [\"./main\"]\n",
+ "\n",
+ "If you’d like, tell me whether main.cpp is meant to be C++ or you actually have a TypeScript file named main.ts, and I can tailor the exact commands."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "response = openai.chat.completions.create(model=OPENAI_MODEL, messages=[{\"role\":\"user\", \"content\":message}])\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 122,
+ "id": "576cb5fa",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "compile_command = [\"tsc\", \"main.ts\", \"--target\", \"ES2020\", \"--module\", \"commonjs\"]\n",
+ "run_command = [\"ts-node\", \"main.ts\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01b03700",
+ "metadata": {},
+ "source": [
+ "## System and user prompts for the code converter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 123,
+ "id": "255e318b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "system_prompt = \"\"\"\n",
+ "Your task is to convert Python code into high performance TypeScript code.\n",
+ "Respond only with TypeScript code. Do not provide any explanation other than occasional comments.\n",
+ "The TypeScript response needs to produce an identical output in the fastest possible time.\n",
+ "\"\"\"\n",
+ "\n",
+ "\n",
+ "def user_prompt_for(python):\n",
+ " return f\"\"\" \n",
+ " port this Python code to TypeScript with the fastest possible implementation that produces identical output in the least time.\n",
+ "\n",
+ " The system information is \n",
+ "\n",
+ " {system_info}\n",
+ "\n",
+ " Your response will be written to a file called main.ts and then compile and ecexted; the compilation command is:\n",
+ "\n",
+ " {compile_command}\n",
+ "\n",
+ " Respond only with C++ code.\n",
+ " Python code to port:\n",
+ "\n",
+ " ```python\n",
+ " {python}\n",
+ " ```\n",
+ "\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 124,
+ "id": "09da7cb1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def messages_for(python):\n",
+ " return [\n",
+ " {\"role\": \"system\", \"content\": system_prompt},\n",
+ " {\"role\": \"user\", \"content\": user_prompt_for(python)},\n",
+ " ]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 125,
+ "id": "abcdb617",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def write_output(code):\n",
+ " with open(\"main.ts\", \"w\", encoding=\"utf-8\") as f:\n",
+ " f.write(code)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 126,
+ "id": "c7a32d5f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def convert(python):\n",
+ " reasoning_effort = \"high\"\n",
+ " response = openai.chat.completions.create(\n",
+ " model=OPENAI_MODEL,\n",
+ " messages=messages_for(python),\n",
+ " reasoning_effort=reasoning_effort,\n",
+ " )\n",
+ " reply = response.choices[0].message.content\n",
+ " reply = reply.replace(\"```ts\", \"\").replace(\"```\", \"\")\n",
+ " return reply"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 127,
+ "id": "59a7ec1f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pi = \"\"\"\n",
+ "import time\n",
+ "\n",
+ "def calculate(iterations, param1, param2):\n",
+ " result = 1.0\n",
+ " for i in range(1, iterations+1):\n",
+ " j = i * param1 - param2\n",
+ " result -= (1/j)\n",
+ " j = i * param1 + param2\n",
+ " result += (1/j)\n",
+ " return result\n",
+ "\n",
+ "start_time = time.time()\n",
+ "result = calculate(200_000_000, 4, 1) * 4\n",
+ "end_time = time.time()\n",
+ "\n",
+ "print(f\"Result: {result:.12f}\")\n",
+ "print(f\"Execution Time: {(end_time - start_time):.6f} seconds\")\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 128,
+ "id": "6856393b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_python(code):\n",
+ " globals_dict = {\"__builtins__\": __builtins__}\n",
+ "\n",
+ " buffer = io.StringIO()\n",
+ " old_stdout = sys.stdout\n",
+ " sys.stdout = buffer\n",
+ "\n",
+ " try:\n",
+ " exec(code, globals_dict)\n",
+ " output = buffer.getvalue()\n",
+ " except Exception as e:\n",
+ " output = f\"Error: {e}\"\n",
+ " finally:\n",
+ " sys.stdout = old_stdout\n",
+ "\n",
+ " return output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 129,
+ "id": "c51fa5ea",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'Result: 3.141592656089\\nExecution Time: 19.478347 seconds\\n'"
+ ]
+ },
+ "execution_count": 129,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "run_python(pi)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 130,
+ "id": "69eb2304",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "\"import { performance } from 'perf_hooks';\\n\\nfunction digamma(z: number): number {\\n let acc = 0;\\n while (z < 7) {\\n acc -= 1 / z;\\n z += 1;\\n }\\n const z2 = z * z;\\n const z4 = z2 * z2;\\n const z6 = z4 * z2;\\n const z8 = z4 * z4;\\n const z10 = z8 * z2;\\n const z12 = z10 * z2;\\n const series =\\n Math.log(z)\\n - 1 / (2 * z)\\n - 1 / (12 * z2)\\n + 1 / (120 * z4)\\n - 1 / (252 * z6)\\n + 1 / (240 * z8)\\n - 5 / (660 * z10)\\n + 691 / (32760 * z12);\\n return acc + series;\\n}\\n\\nconst N = 200_000_000;\\n\\nconst t0 = performance.now();\\nconst result =\\n 4 - digamma(N + 0.75) + digamma(0.75) + digamma(N + 1.25) - digamma(1.25);\\nconst t1 = performance.now();\\n\\nconsole.log(`Result: ${result.toFixed(12)}`);\\nconsole.log(`Execution Time: ${((t1 - t0) / 1000).toFixed(6)} seconds`);\""
+ ]
+ },
+ "execution_count": 130,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "convert(pi)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 131,
+ "id": "2ea56d95",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ " \n",
+ "def run_typescript(code):\n",
+ " write_output(code)\n",
+ " try:\n",
+ " subprocess.run(compile_command, check=True, text=True, capture_output=True)\n",
+ " run_result = subprocess.run(run_command, check=True, text=True, capture_output=True)\n",
+ " return run_result.stdout\n",
+ " except subprocess.CalledProcessError as e:\n",
+ " return f\"An error occurred:\\n{e.stderr}\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 132,
+ "id": "79d6bd87",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# run_typescript()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b4799b88",
+ "metadata": {},
+ "source": [
+ "## User Interface"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 133,
+ "id": "8486ce70",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "* Running on local URL: http://127.0.0.1:7864\n",
+ "* To create a public link, set `share=True` in `launch()`.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 133,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "with gr.Blocks(\n",
+ " theme=gr.themes.Monochrome(), title=\"Port from Python to TypeScript\"\n",
+ ") as ui:\n",
+ " with gr.Row(equal_height=True):\n",
+ " with gr.Column(scale=6):\n",
+ " python = gr.Code(\n",
+ " label=\"Python Original Code\",\n",
+ " value=pi,\n",
+ " language=\"python\",\n",
+ " lines=30,\n",
+ " )\n",
+ " with gr.Column(scale=6):\n",
+ " ts = gr.Code(\n",
+ " label=\"TypeScript (generated)\", value=\"\", language=\"cpp\", lines=26\n",
+ " )\n",
+ " with gr.Row(elem_classes=[\"controls\"]):\n",
+ " python_run = gr.Button(\"Run Python\", elem_classes=[\"run-btn\", \"py\"])\n",
+ " port = gr.Button(\"Convert to TS\", elem_classes=[\"convert-btn\"])\n",
+ " ts_run = gr.Button(\"Run TS\", elem_classes=[\"run-btn\", \"ts\"])\n",
+ "\n",
+ " with gr.Row(equal_height=True):\n",
+ " with gr.Column(scale=6):\n",
+ " python_out = gr.TextArea(label=\"Python Result\", lines=10)\n",
+ " with gr.Column(scale=6):\n",
+ " ts_out = gr.TextArea(label=\"TS output\", lines=10)\n",
+ "\n",
+ " port.click(fn=convert, inputs=[python], outputs=[ts])\n",
+ " python_run.click(fn=run_python, inputs=[python], outputs=[python_out])\n",
+ " ts_run.click(fn=run_typescript, inputs=[ts], outputs=[ts_out])\n",
+ " \n",
+ " \n",
+ "ui.launch(inbrowser=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4663a174",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9033e421",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week4/community-contributions/week4_exercise_solution-Stephen.ipynb b/week4/community-contributions/week4_exercise_solution-Stephen.ipynb
new file mode 100644
index 0000000..07d5155
--- /dev/null
+++ b/week4/community-contributions/week4_exercise_solution-Stephen.ipynb
@@ -0,0 +1,180 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed8c52b6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "ollama_api_key = os.getenv('OLLAMA_API_KEY')\n",
+ "\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ "\n",
+ "if ollama_api_key:\n",
+ " print(f\"OLLAMA API Key exists and begins {ollama_api_key[:2]}\")\n",
+ "else:\n",
+ " print(\"OLLAMA API Key not set (and this is optional)\")\n",
+ "\n",
+ "ollama_url = \"http://localhost:11434/v1\"\n",
+ "\n",
+ "openai = OpenAI()\n",
+ "ollama = OpenAI(api_key=ollama_api_key, base_url=ollama_url)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "c628f95e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "system_prompt_doc = \"\"\"You are an expert Python developer and code reviewer.\n",
+ "Your job is to read the user's provided function, and return:\n",
+ "1. A concise, PEP-257-compliant docstring summarizing what the function does, clarifying types, parameters, return values, and side effects.\n",
+ "2. Helpful inline comments that improve both readability and maintainability, without restating what the code obviously does.\n",
+ "\n",
+ "Only output the function, not explanations or additional text. \n",
+ "Do not modify variable names or refactor the function logic.\n",
+ "Your response should improve the code's clarity and documentation, making it easier for others to understand and maintain.\n",
+ "Don't be extremely verbose.\n",
+ "Your answer should be at a senior level of expertise.\n",
+ "\"\"\"\n",
+ "\n",
+ "system_prompt_tests = \"\"\"You are a seasoned Python developer and testing expert.\n",
+ "Your task is to read the user's provided function, and generate:\n",
+ "1. A concise set of meaningful unit tests that thoroughly validate the function's correctness, including typical, edge, and error cases.\n",
+ "2. The tests should be written for pytest (or unittest if pytest is not appropriate), use clear, descriptive names, and avoid unnecessary complexity.\n",
+ "3. If dependencies or mocking are needed, include minimal necessary setup code (but avoid over-mocking).\n",
+ "\n",
+ "Only output the relevant test code, not explanations or extra text.\n",
+ "Do not change the original function; focus solely on comprehensive, maintainable test coverage that other developers can easily understand and extend.\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "4bb84e6c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "models = [\"gpt-4.1-mini\", \"llama3.1\"]\n",
+ "clients = {\"gpt-4.1-mini\": openai, \"llama3.1\": ollama}\n",
+ "\n",
+ "def generate_documentation(code, model):\n",
+ " response = clients[model].chat.completions.create(\n",
+ " model=model,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": system_prompt_doc},\n",
+ " {\"role\": \"user\", \"content\": code}\n",
+ " ],\n",
+ " stream=True\n",
+ " )\n",
+ " output = \"\"\n",
+ " for chunk in response:\n",
+ " output += chunk.choices[0].delta.content or \"\"\n",
+ " yield output.replace(\"```python\", \"\").replace(\"```\", \"\")\n",
+ "\n",
+ "def generate_tests(code, model):\n",
+ " response = clients[model].chat.completions.create(\n",
+ " model=model,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": system_prompt_tests},\n",
+ " {\"role\": \"user\", \"content\": code}\n",
+ " ],\n",
+ " stream=True\n",
+ " )\n",
+ " output = \"\"\n",
+ " for chunk in response:\n",
+ " output += chunk.choices[0].delta.content or \"\"\n",
+ " yield output.replace(\"```python\", \"\").replace(\"```\", \"\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a4e65b26",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with gr.Blocks(theme=gr.themes.Soft(spacing_size=gr.themes.sizes.spacing_sm, radius_size=gr.themes.sizes.radius_none)) as ui:\n",
+ " gr.Markdown(\"# Python Toolbox\", elem_id=\"app-title\")\n",
+ " \n",
+ " with gr.Tab(\"Docstring Generator\") as tab1:\n",
+ " gr.Markdown(\"## Docstring & Comment Generator\")\n",
+ " gr.Markdown(\"Paste your function below to generate helpful docstrings and inline comments!\")\n",
+ "\n",
+ " with gr.Row():\n",
+ " with gr.Column():\n",
+ " code_input = gr.Code(label=\"Your Python function here\", lines=20, language=\"python\")\n",
+ " model_dropdown = gr.Dropdown(choices=models, value=models[0], label=\"Select model\")\n",
+ " submit_doc_btn = gr.Button(\"Generate docstring & comments\")\n",
+ " with gr.Column():\n",
+ " code_output = gr.Code(label=\"New function with docstring and comments\", language=\"python\")\n",
+ "\n",
+ " submit_doc_btn.click(\n",
+ " generate_documentation, \n",
+ " inputs=[code_input, model_dropdown], \n",
+ " outputs=code_output\n",
+ " )\n",
+ "\n",
+ " with gr.Tab(\"Unit Tests Generator\") as tab2:\n",
+ " gr.Markdown(\"## Unit Test Generator\")\n",
+ " gr.Markdown(\"Paste your function below to generate helpful unit tests!\")\n",
+ "\n",
+ " with gr.Row():\n",
+ " with gr.Column():\n",
+ " code_input_2 = gr.Code(label=\"Your Python function here\", lines=20, language=\"python\")\n",
+ " model_dropdown_2 = gr.Dropdown(choices=models, value=models[0], label=\"Select model\")\n",
+ " submit_test_btn = gr.Button(\"Generate unit tests\")\n",
+ " with gr.Column():\n",
+ " code_output_2 = gr.Code(label=\"Generated unit tests\", language=\"python\")\n",
+ "\n",
+ " submit_test_btn.click(\n",
+ " generate_tests, \n",
+ " inputs=[code_input_2, model_dropdown_2], \n",
+ " outputs=code_output_2\n",
+ " )\n",
+ " \n",
+ " \n",
+ " tab1.select(lambda x: x, inputs=code_input_2, outputs=code_input)\n",
+ " tab2.select(lambda x: x, inputs=code_input, outputs=code_input_2)\n",
+ "\n",
+ "ui.launch(share=False, inbrowser=True)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/week5/community-contributions/w5d5_worker.py b/week5/community-contributions/w5d5_worker.py
new file mode 100644
index 0000000..822cfcd
--- /dev/null
+++ b/week5/community-contributions/w5d5_worker.py
@@ -0,0 +1,445 @@
+#!/usr/bin/env python3
+"""
+Knowledge Worker with Document Upload and Google Drive Integration
+
+This script creates a knowledge worker that:
+1. Allows users to upload documents through a Gradio UI
+2. Integrates with Google Drive to access documents
+3. Uses Chroma vector database for efficient document retrieval
+4. Implements RAG (Retrieval Augmented Generation) for accurate responses
+
+The system updates its context dynamically when new documents are uploaded.
+"""
+
+import os
+import glob
+import tempfile
+from pathlib import Path
+from dotenv import load_dotenv
+import gradio as gr
+
+# LangChain imports
+from langchain_community.document_loaders import DirectoryLoader, TextLoader, PyPDFLoader
+from langchain_core.documents import Document
+from langchain_openai import OpenAIEmbeddings, ChatOpenAI
+from langchain_chroma import Chroma
+
+# Visualization imports
+import numpy as np
+from sklearn.manifold import TSNE
+import plotly.graph_objects as go
+
+# Removed Google Drive API imports
+
+# Additional document loaders
+try:
+ from langchain_community.document_loaders import Docx2txtLoader, UnstructuredExcelLoader
+except ImportError:
+ print("Warning: Some document loaders not available. PDF and text files will still work.")
+ Docx2txtLoader = None
+ UnstructuredExcelLoader = None
+
+# Configuration
+MODEL = "gpt-4o-mini" # Using a cost-effective model
+DB_NAME = "knowledge_worker_db"
+UPLOAD_FOLDER = "uploaded_documents"
+
+# Create upload folder if it doesn't exist
+os.makedirs(UPLOAD_FOLDER, exist_ok=True)
+
+# Load environment variables
+load_dotenv(override=True)
+os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
+
+# Removed Google Drive credentials configuration
+
+# Use a simple text splitter approach
+class SimpleTextSplitter:
+ def __init__(self, chunk_size=1000, chunk_overlap=200):
+ self.chunk_size = chunk_size
+ self.chunk_overlap = chunk_overlap
+
+ def split_documents(self, documents):
+ chunks = []
+ for doc in documents:
+ text = doc.page_content
+ start = 0
+ while start < len(text):
+ end = start + self.chunk_size
+ chunk_text = text[start:end]
+ chunk_doc = Document(page_content=chunk_text, metadata=doc.metadata.copy())
+ chunks.append(chunk_doc)
+ start = end - self.chunk_overlap
+ return chunks
+
+CharacterTextSplitter = SimpleTextSplitter
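+
+# Worked example of the overlap: with chunk_size=1000 and chunk_overlap=200,
+# a 2,400-character document yields chunks covering [0:1000], [800:1800] and
+# [1600:2400]; each chunk restates the final 200 characters of the previous
+# one, so context that straddles a boundary is never lost entirely.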
+
+# Try different import paths for memory and chains
+try:
+ from langchain.memory import ConversationBufferMemory
+ from langchain.chains import ConversationalRetrievalChain
+except ImportError:
+ try:
+ from langchain_core.memory import ConversationBufferMemory
+ from langchain_core.chains import ConversationalRetrievalChain
+ except ImportError:
+ try:
+ from langchain_community.memory import ConversationBufferMemory
+ from langchain_community.chains import ConversationalRetrievalChain
+ except ImportError:
+ print("Warning: Memory and chains modules not found. Creating simple alternatives.")
+ # Create simple alternatives
+ class ConversationBufferMemory:
+ def __init__(self, memory_key='chat_history', return_messages=True):
+ self.memory_key = memory_key
+ self.return_messages = return_messages
+ self.chat_memory = []
+
+ def save_context(self, inputs, outputs):
+ self.chat_memory.append((inputs, outputs))
+
+ def load_memory_variables(self, inputs):
+ return {self.memory_key: self.chat_memory}
+
+ class ConversationalRetrievalChain:
+ def __init__(self, llm, retriever, memory):
+ self.llm = llm
+ self.retriever = retriever
+ self.memory = memory
+
+ def invoke(self, inputs):
+ question = inputs.get("question", "")
+ # Simple implementation - just return a basic response
+ return {"answer": f"I received your question: {question}. This is a simplified response."}
+
+# Removed Google Drive Integration Functions
+
+# Document Processing Functions
+def get_loader_for_file(file_path):
+ """
+ Get the appropriate document loader based on file extension
+ """
+ file_extension = os.path.splitext(file_path)[1].lower()
+
+ if file_extension == '.pdf':
+ return PyPDFLoader(file_path)
+ elif file_extension in ['.docx', '.doc'] and Docx2txtLoader:
+ return Docx2txtLoader(file_path)
+ elif file_extension in ['.xlsx', '.xls'] and UnstructuredExcelLoader:
+ return UnstructuredExcelLoader(file_path)
+ elif file_extension in ['.txt', '.md']:
+ return TextLoader(file_path, encoding='utf-8')
+ else:
+ # Default to text loader for unknown types
+ try:
+ return TextLoader(file_path, encoding='utf-8')
+        except Exception:
+ return None
+
+def load_document(file_path):
+ """
+ Load a document using the appropriate loader
+ """
+ loader = get_loader_for_file(file_path)
+ if loader:
+ try:
+ return loader.load()
+ except Exception as e:
+ print(f"Error loading document {file_path}: {e}")
+ return []
+
+def process_documents(documents):
+ """
+ Split documents into chunks for embedding
+ """
+ text_splitter = CharacterTextSplitter(
+ chunk_size=1000,
+ chunk_overlap=200
+ )
+ chunks = text_splitter.split_documents(documents)
+ return chunks
+
+# Knowledge Base Class
+class KnowledgeBase:
+ def __init__(self, db_name=DB_NAME):
+ self.db_name = db_name
+ self.embeddings = OpenAIEmbeddings()
+ self.vectorstore = None
+ self.initialize_vectorstore()
+
+ def initialize_vectorstore(self):
+ """
+ Initialize the vector store, loading from disk if it exists
+ """
+ if os.path.exists(self.db_name):
+ self.vectorstore = Chroma(persist_directory=self.db_name, embedding_function=self.embeddings)
+ print(f"Loaded existing vector store with {self.vectorstore._collection.count()} documents")
+ else:
+ # Create empty vectorstore
+ self.vectorstore = Chroma(persist_directory=self.db_name, embedding_function=self.embeddings)
+ print("Created new vector store")
+
+ def add_documents(self, documents):
+ """
+ Process and add documents to the vector store
+ """
+ if not documents:
+ return False
+
+ chunks = process_documents(documents)
+ if not chunks:
+ return False
+
+ # Add to existing vectorstore
+ self.vectorstore.add_documents(chunks)
+ print(f"Added {len(chunks)} chunks to vector store")
+ return True
+
+ def get_retriever(self, k=4):
+ """
+ Get a retriever for the vector store
+ """
+ return self.vectorstore.as_retriever(search_kwargs={"k": k})
+
+ def visualize_vectors(self):
+ """
+ Create a 3D visualization of the vector store
+ """
+ try:
+ collection = self.vectorstore._collection
+ result = collection.get(include=['embeddings', 'documents', 'metadatas'])
+
+ if result['embeddings'] is None or len(result['embeddings']) == 0:
+ print("No embeddings found in vector store")
+ return None
+
+ vectors = np.array(result['embeddings'])
+ documents = result['documents']
+ metadatas = result['metadatas']
+
+ if len(vectors) < 2:
+ print("Not enough vectors for visualization (need at least 2)")
+ return None
+
+ # Get source info for coloring
+ sources = [metadata.get('source', 'unknown') for metadata in metadatas]
+ unique_sources = list(set(sources))
+ colors = [['blue', 'green', 'red', 'orange', 'purple', 'cyan'][unique_sources.index(s) % 6] for s in sources]
+
+ # Reduce dimensions for visualization
+ # Adjust perplexity based on number of samples
+ n_samples = len(vectors)
+ perplexity = min(30, max(1, n_samples - 1))
+
+ tsne = TSNE(n_components=3, random_state=42, perplexity=perplexity)
+ reduced_vectors = tsne.fit_transform(vectors)
+
+ # Create the 3D scatter plot
+ fig = go.Figure(data=[go.Scatter3d(
+ x=reduced_vectors[:, 0],
+ y=reduced_vectors[:, 1],
+ z=reduced_vectors[:, 2],
+ mode='markers',
+ marker=dict(size=5, color=colors, opacity=0.8),
+ text=[f"Source: {s}
Text: {d[:100]}..." for s, d in zip(sources, documents)],
+ hoverinfo='text'
+ )])
+
+ fig.update_layout(
+ title='3D Vector Store Visualization',
+ scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
+ width=900,
+ height=700,
+ margin=dict(r=20, b=10, l=10, t=40)
+ )
+
+ return fig
+
+ except Exception as e:
+ print(f"Error creating visualization: {e}")
+ return None
+
+# Simple fallback chain implementation
+class SimpleConversationalChain:
+ def __init__(self, llm, retriever, memory):
+ self.llm = llm
+ self.retriever = retriever
+ self.memory = memory
+
+ def invoke(self, inputs):
+ question = inputs.get("question", "")
+ # Get relevant documents - try different methods
+ try:
+ docs = self.retriever.get_relevant_documents(question)
+ except AttributeError:
+ try:
+ docs = self.retriever.invoke(question)
+            except Exception:
+ docs = []
+
+ context = "\n".join([doc.page_content for doc in docs[:3]]) if docs else "No relevant context found."
+
+ # Create a simple prompt
+ prompt = f"""Based on the following context, answer the question:
+
+Context: {context}
+
+Question: {question}
+
+Answer:"""
+
+ # Get response from LLM
+ response = self.llm.invoke(prompt)
+ return {"answer": response.content if hasattr(response, 'content') else str(response)}
+
+# Chat System Class
+class ChatSystem:
+ def __init__(self, knowledge_base, model_name=MODEL):
+ self.knowledge_base = knowledge_base
+ self.model_name = model_name
+ self.llm = ChatOpenAI(temperature=0.7, model_name=self.model_name)
+ self.memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
+ self.conversation_chain = self._create_conversation_chain()
+
+ def _create_conversation_chain(self):
+ """
+ Create a new conversation chain with the current retriever
+ """
+ retriever = self.knowledge_base.get_retriever()
+ # Skip the problematic ConversationalRetrievalChain and use simple implementation
+ print("Using simple conversational chain implementation")
+ return SimpleConversationalChain(self.llm, retriever, self.memory)
+
+ def reset_conversation(self):
+ """
+ Reset the conversation memory and chain
+ """
+ self.memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
+ self.conversation_chain = self._create_conversation_chain()
+ return "Conversation has been reset."
+
+ def chat(self, question, history):
+ """
+ Process a question and return the answer
+ """
+ if not question.strip():
+ return "Please ask a question."
+
+ result = self.conversation_chain.invoke({"question": question})
+ return result["answer"]
+
+ def update_knowledge_base(self):
+ """
+ Update the conversation chain with the latest knowledge base
+ """
+ self.conversation_chain = self._create_conversation_chain()
+
+# UI Functions
+def handle_file_upload(files):
+ """
+ Process uploaded files and add them to the knowledge base
+ """
+ if not files:
+ return "No files uploaded."
+
+ documents = []
+ for file in files:
+ try:
+ docs = load_document(file.name)
+ if docs:
+ # Add upload source metadata
+ for doc in docs:
+ doc.metadata['source'] = 'upload'
+ doc.metadata['filename'] = os.path.basename(file.name)
+ documents.extend(docs)
+ except Exception as e:
+ print(f"Error processing file {file.name}: {e}")
+
+ if documents:
+ success = kb.add_documents(documents)
+ if success:
+ # Update the chat system with new knowledge
+ chat_system.update_knowledge_base()
+ return f"Successfully processed {len(documents)} documents."
+
+ return "No documents could be processed. Please check file formats."
+
+def create_ui():
+ """
+ Create the Gradio UI
+ """
+ with gr.Blocks(theme=gr.themes.Soft()) as app:
+ gr.Markdown("""
+ # Knowledge Worker
+ Upload documents or ask questions about your knowledge base.
+ """)
+
+ with gr.Tabs():
+ with gr.TabItem("Chat"):
+ chatbot = gr.ChatInterface(
+ chat_system.chat,
+ chatbot=gr.Chatbot(height=500, type="messages"),
+ textbox=gr.Textbox(placeholder="Ask a question about your documents...", container=False),
+ title="Knowledge Worker Chat",
+ type="messages"
+ )
+ reset_btn = gr.Button("Reset Conversation")
+ reset_btn.click(chat_system.reset_conversation, inputs=None, outputs=gr.Textbox())
+
+ with gr.TabItem("Upload Documents"):
+ with gr.Column():
+ file_output = gr.Textbox(label="Upload Status")
+ upload_button = gr.UploadButton(
+ "Click to Upload Files",
+ file_types=[".pdf", ".docx", ".txt", ".md", ".xlsx"],
+ file_count="multiple"
+ )
+ upload_button.upload(handle_file_upload, upload_button, file_output)
+
+ with gr.TabItem("Visualize Knowledge"):
+ visualize_btn = gr.Button("Generate Vector Visualization")
+ plot_output = gr.Plot(label="Vector Space Visualization")
+ visualize_btn.click(kb.visualize_vectors, inputs=None, outputs=plot_output)
+
+ return app
+
+def main():
+ """
+ Main function to initialize and run the knowledge worker
+ """
+ global kb, chat_system
+
+ print("=" * 60)
+ print("Initializing Knowledge Worker...")
+ print("=" * 60)
+
+ try:
+ # Initialize the knowledge base
+ print("Setting up vector database...")
+ kb = KnowledgeBase(DB_NAME)
+ print("Vector database initialized successfully")
+
+ # Google Drive integration removed
+
+ # Initialize the chat system
+ print("\nSetting up chat system...")
+ chat_system = ChatSystem(kb)
+ print("Chat system initialized successfully")
+
+ # Launch the Gradio app
+ print("\nLaunching Gradio interface...")
+ print("=" * 60)
+ print("The web interface will open in your browser")
+ print("You can also access it at the URL shown below")
+ print("=" * 60)
+
+ app = create_ui()
+ app.launch(inbrowser=True)
+
+ except Exception as e:
+ print(f"Error initializing Knowledge Worker: {e}")
+ print("Please check your configuration and try again.")
+ return
+
+if __name__ == "__main__":
+ main()
diff --git a/week5/community-contributions/week5_jom/Exercise_week5_jom.ipynb b/week5/community-contributions/week5_jom/Exercise_week5_jom.ipynb
new file mode 100644
index 0000000..8881804
--- /dev/null
+++ b/week5/community-contributions/week5_jom/Exercise_week5_jom.ipynb
@@ -0,0 +1,623 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "6f0f38e7",
+ "metadata": {},
+ "source": [
+ "# Email Mindmap Demo (Week 5 Community Contribution)\n",
+ "\n",
+ "Welcome to the **Email Mindmap Demo** notebook! This demo walks you through a workflow for exploring and visualizing email relationships using embeddings and mindmaps.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## 📋 Workflow Overview\n",
+ "\n",
+ "1. **Load/Create Synthetic Email Data** \n",
+ " Generate or load varied types of emails: work, personal, family, subscriptions, etc.\n",
+ "\n",
+ "2. **Generate Embeddings** \n",
+ " Use an open-source model to create vector embeddings for email content.\n",
+ "\n",
+ "3. **Build & Visualize a Mindmap** \n",
+ " Construct a mindmap of email relationships and visualize it interactively using `networkx` and `matplotlib`.\n",
+ "\n",
+ "4. **Question-Answering Interface** \n",
+ " Query the email content and the mindmap using a simple Q&A interface powered by Gradio.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## ⚙️ Requirements\n",
+ "\n",
+ "> **Tip:** \n",
+ "> I'm including an example of the synthetic emails in case you don't want to run that part.\n",
+ "> Might need to install other libraries like pyvis, nbformat and faiss-cpu\n",
+ "\n",
+ "\n",
+ "## ✨ Features\n",
+ "\n",
+ "- Synthetic generation of varied emails (work, personal, family, subscriptions)\n",
+ "- Embedding generation with open-source models (hugging face sentence-transformer)\n",
+ "- Interactive mindmap visualization (`networkx`, `pyvis`)\n",
+ "- Simple chatbot interface (Gradio) and visualization of mindmap created\n",
+ "\n",
+ "---\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "a9aeb363",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "OpenAI API Key exists and begins sk-proj-\n",
+ "Anthropic API Key exists and begins sk-ant-\n",
+ "Google API Key exists and begins AI\n",
+ "OLLAMA API Key exists and begins 36\n"
+ ]
+ }
+ ],
+ "source": [
+ "# imports\n",
+ "\n",
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "from openai import OpenAI\n",
+ "import gradio as gr\n",
+ "\n",
+ "load_dotenv(override=True)\n",
+ "openai_api_key = os.getenv('OPENAI_API_KEY')\n",
+ "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
+ "google_api_key = os.getenv('GOOGLE_API_KEY')\n",
+ "ollama_api_key = os.getenv('OLLAMA_API_KEY')\n",
+ "\n",
+ "if openai_api_key:\n",
+ " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
+ "else:\n",
+ " print(\"OpenAI API Key not set\")\n",
+ " \n",
+ "if anthropic_api_key:\n",
+ " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
+ "else:\n",
+ " print(\"Anthropic API Key not set (and this is optional)\")\n",
+ "\n",
+ "if google_api_key:\n",
+ " print(f\"Google API Key exists and begins {google_api_key[:2]}\")\n",
+ "else:\n",
+ " print(\"Google API Key not set (and this is optional)\")\n",
+ "\n",
+ "if ollama_api_key:\n",
+ " print(f\"OLLAMA API Key exists and begins {ollama_api_key[:2]}\")\n",
+ "else:\n",
+ " print(\"OLLAMA API Key not set (and this is optional)\")\n",
+ "\n",
+ "# Connect to client libraries\n",
+ "\n",
+ "openai = OpenAI()\n",
+ "\n",
+ "anthropic_url = \"https://api.anthropic.com/v1/\"\n",
+ "gemini_url = \"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
+ "ollama_url = \"http://localhost:11434/v1\"\n",
+ "\n",
+ "anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)\n",
+ "gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)\n",
+ "ollama = OpenAI(api_key=ollama_api_key, base_url=ollama_url)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b8ddce62",
+ "metadata": {},
+ "source": [
+ "## Preparation of synthetic data (could have been week2 work)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "2e250912",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#using ollama gpt oss 120b cloud i'm going to create synthetic emails using a persona.\n",
+ "#they are going to be saved in a json file with different keys\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing import List, Optional\n",
+ "\n",
+ "\n",
+ "class Email(BaseModel):\n",
+ " sender: str = Field(description=\"Email address of the sender\")\n",
+ " subject: str = Field(description=\"Email subject line\")\n",
+ " body: str = Field(description=\"Email body content\")\n",
+ " timestamp: str = Field(description=\"ISO 8601 timestamp when email was received\")\n",
+ " category: str = Field(description=\"Category of the email\")\n",
+ "\n",
+ "class EmailBatch(BaseModel):\n",
+ " emails: List[Email] = Field(description=\"List of generated emails\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "1f67fdb3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def create_persona(name: str, age: int, occupation: str, \n",
+ " interests: List[str], family_status: str) -> str:\n",
+ " persona = f\"\"\"\n",
+ " You are generating synthetic emails for a realistic inbox simulation.\n",
+ "\n",
+ " **Person Profile:**\n",
+ " - Name: {name}\n",
+ " - Age: {age}\n",
+ " - Occupation: {occupation}\n",
+ " - Interests: {', '.join(interests)}\n",
+ " - Family Status: {family_status}\n",
+ "\n",
+ " **Email Categories to Include:**\n",
+ " 1. **Work Emails**: Project updates, meeting invitations, colleague communications, \n",
+ " performance reviews, company announcements\n",
+ " 2. **Purchases**: Order confirmations, shipping notifications, delivery updates, \n",
+ " receipts from various retailers (Amazon, local shops, etc.)\n",
+ " 3. **Subscriptions**: Newsletter updates, streaming services (Netflix, Spotify), \n",
+ " software subscriptions (Adobe, Microsoft 365), magazine subscriptions\n",
+ " 4. **Family**: Communications with parents, siblings, children, extended family members,\n",
+ " family event planning, photo sharing\n",
+ " 5. **Friends**: Social plans, birthday wishes, casual conversations, group hangouts,\n",
+ " catching up messages\n",
+ " 6. **Finance**: Bank statements, credit card bills, investment updates, tax documents,\n",
+ " payment reminders\n",
+ " 7. **Social Media**: Facebook notifications, LinkedIn updates, Instagram activity,\n",
+ " Twitter mentions\n",
+ " 8. **Personal**: Doctor appointments, gym memberships, utility bills, insurance updates\n",
+ "\n",
+ " **Instructions:**\n",
+ " - Generate realistic email content that reflects the person's life over time\n",
+ " - Include temporal patterns (more work emails on weekdays, more personal on weekends)\n",
+ " - Create realistic sender names and email addresses\n",
+ " - Vary email length and formality based on context\n",
+ " - Include realistic subject lines\n",
+ " - Make emails interconnected when appropriate (e.g., follow-up emails, conversation threads)\n",
+ " - Include seasonal events (holidays, birthdays, annual renewals)\n",
+ " \"\"\"\n",
+ " return persona\n",
+ "\n",
+ "persona_description = create_persona(\n",
+ " name=\"John Doe\",\n",
+ " age=30,\n",
+ " occupation=\"Software Engineer\",\n",
+ " interests=[\"technology\", \"reading\", \"traveling\"],\n",
+ " family_status=\"single\"\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "cec185e3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI\n",
+ "from datetime import datetime, timedelta\n",
+ "import random\n",
+ "from typing import List\n",
+ "\n",
+ "def generate_synthetic_emails(\n",
+ " persona_description: str,\n",
+ " num_emails: int,\n",
+ " start_date: str,\n",
+ " end_date: str,\n",
+ " model: str = \"gpt-4o-2024-08-06\"\n",
+ ") -> List[Email]:\n",
+ " \"\"\"\n",
+ " NEEDS TO WORK WITH OPENAI MODELS BECAUSE OF PARSED (STRUC OUTPUT) MODELS\n",
+ " Generates synthetic emails using OpenAI's structured output feature.\n",
+ " \n",
+ " Args:\n",
+ " persona_description: Detailed persona description\n",
+ " num_emails: Number of emails to generate per batch\n",
+ " start_date: Start date for email timestamps\n",
+ " end_date: End date for email timestamps\n",
+ " model: OpenAI model to use (must support structured outputs)\n",
+ " \n",
+ " Returns:\n",
+ " List of Email objects\n",
+ " \"\"\"\n",
+ " \n",
+ " # Calculate date range for context\n",
+ " date_range_context = f\"\"\"\n",
+ " Generate emails with timestamps between {start_date} and {end_date}.\n",
+ " Distribute emails naturally across this time period, with realistic patterns:\n",
+ " - More emails during business hours on weekdays\n",
+ " - Fewer emails late at night\n",
+ " - Occasional weekend emails\n",
+ " - Bursts of activity around events or busy periods\n",
+ " \"\"\"\n",
+ " \n",
+ " # System message combining persona and structure instructions\n",
+ " system_message = f\"\"\"\n",
+ " {persona_description}\n",
+ "\n",
+ " {date_range_context}\n",
+ "\n",
+ " Generate {num_emails} realistic emails that fit this person's life. \n",
+ " Ensure variety in categories, senders, and content while maintaining realism.\n",
+ " \"\"\"\n",
+ " \n",
+ " try:\n",
+ " client = OpenAI()\n",
+ "\n",
+ " response = client.chat.completions.parse(\n",
+ " model=model,\n",
+ " messages=[\n",
+ " {\n",
+ " \"role\": \"system\",\n",
+ " \"content\": system_message\n",
+ " },\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": f\"Generate {num_emails} diverse, realistic emails for this person's inbox.\"\n",
+ " }\n",
+ " ],\n",
+ " response_format=EmailBatch,\n",
+ " )\n",
+ " return response.choices[0].message.parsed.emails\n",
+ " \n",
+ " except Exception as e:\n",
+ " print(f\"Error generating emails: {e}\")\n",
+ " return []\n",
+ "\n",
+ "\n",
+ "def save_emails_to_json(emails: List[Email], filename: str):\n",
+ " \"\"\"\n",
+ " Saves emails to a JSON file.\n",
+ " \"\"\"\n",
+ " import json\n",
+ " \n",
+ " emails_dict = [email.model_dump() for email in emails]\n",
+ " \n",
+ " with open(filename, 'w', encoding='utf-8') as f:\n",
+ " json.dump(emails_dict, f, indent=2, ensure_ascii=False)\n",
+ " \n",
+ " print(f\"Saved {len(emails)} emails to {filename}\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "id": "be31f352",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "now\n"
+ ]
+ }
+ ],
+ "source": [
+ "mails_2 = generate_synthetic_emails(\n",
+ " persona_description = persona_description,\n",
+ " num_emails = 100,\n",
+ " start_date = '2024-06-01',\n",
+ " end_date = '2025-01-01',\n",
+ " model = \"gpt-4o\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 52,
+ "id": "24d844f2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Saved 101 emails to emails2.json\n"
+ ]
+ }
+ ],
+ "source": [
+ "save_emails_to_json(mails_2, 'emails2.json')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b9c704e",
+ "metadata": {},
+ "source": [
+ "## Create embeddings for the mails\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "777012f8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# imports for langchain, plotly and Chroma\n",
+ "\n",
+ "from langchain.document_loaders import DirectoryLoader, TextLoader\n",
+ "from langchain.text_splitter import CharacterTextSplitter\n",
+ "from langchain.schema import Document\n",
+ "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n",
+ "from langchain_chroma import Chroma\n",
+ "import matplotlib.pyplot as plt\n",
+ "from sklearn.manifold import TSNE\n",
+ "import numpy as np\n",
+ "import plotly.graph_objects as go\n",
+ "from langchain.memory import ConversationBufferMemory\n",
+ "from langchain.chains import ConversationalRetrievalChain\n",
+ "from langchain.embeddings import HuggingFaceEmbeddings\n",
+ "import json\n",
+ "from langchain.vectorstores import FAISS\n",
+ "\n",
+ "#MODEL = \"gpt-4o-mini\"\n",
+ "db_name = \"vector_db\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "ce95d9c7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Total number of chunks: 206\n",
+ "Sample metadata fields: ['sender', 'timestamp', 'category']\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Read in emails from the emails.json file and construct LangChain documents\n",
+ "\n",
+ "\n",
+ "with open(\"emails.json\", \"r\", encoding=\"utf-8\") as f:\n",
+ " emails = json.load(f)\n",
+ "\n",
+ "documents = []\n",
+ "for email in emails:\n",
+ " # Extract metadata (all fields except 'content')\n",
+ " metadata = {k: v for k, v in email.items() if k in ['sender','category','timestamp']}\n",
+ " body = email.get(\"body\", \"\")\n",
+ " documents.append(Document(page_content=body, metadata=metadata))\n",
+ "\n",
+ "text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
+ "chunks = text_splitter.split_documents(documents)\n",
+ "\n",
+ "print(f\"Total number of chunks: {len(chunks)}\")\n",
+ "print(f\"Sample metadata fields: {list(documents[0].metadata.keys()) if documents else []}\")\n",
+ "\n",
+ "embeddings_model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
+ "\n",
+ "if os.path.exists(db_name):\n",
+ " Chroma(persist_directory=db_name, embedding_function=embeddings_model).delete_collection()\n",
+ "\n",
+ "vectorstore = FAISS.from_documents(chunks, embedding=embeddings_model)\n",
+ "\n",
+ "all_embeddings = [vectorstore.index.reconstruct(i) for i in range(vectorstore.index.ntotal)]\n",
+ "\n",
+ "total_vectors = vectorstore.index.ntotal\n",
+ "dimensions = vectorstore.index.d\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "78ca65bb",
+ "metadata": {},
+ "source": [
+ "## Visualizing mindmap"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "id": "a99dd2d6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import networkx as nx\n",
+ "import matplotlib.pyplot as plt\n",
+ "from sklearn.metrics.pairwise import cosine_similarity\n",
+ "import plotly.graph_objects as go\n",
+ "import numpy as np\n",
+ "from sklearn.cluster import KMeans\n",
+ "from sklearn.manifold import TSNE # Or use UMAP\n",
+ "from pyvis.network import Network\n",
+ "\n",
+ "# Here, emails is your list of email objects, with .subject or .body\n",
+ "\n",
+ "# Build similarity graph\n",
+ "def build_mindmap_html(emails, all_embeddings, threshold=0.6):\n",
+ " similarity = cosine_similarity(all_embeddings)\n",
+ "\n",
+ " G = nx.Graph()\n",
+ " for i, email in enumerate(emails):\n",
+ " G.add_node(i, label=email['subject'][:80], title=email['body'][:50]) # Custom hover text\n",
+ "\n",
+ " for i in range(len(emails)):\n",
+ " for j in range(i+1, len(emails)):\n",
+ " if similarity[i][j] > threshold:\n",
+ " G.add_edge(i, j, weight=float(similarity[i][j]))\n",
+ "\n",
+ " # Convert to pyvis network\n",
+ " nt = Network(notebook=True, height='700px', width='100%', bgcolor='#222222', font_color='white')\n",
+ " nt.from_nx(G)\n",
+ " html = nt.generate_html().replace(\"'\", \"\\\"\")\n",
+ " return html\n"
+ ]
+ },
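+ {
+ "cell_type": "markdown",
+ "id": "mindmap-usage-note",
+ "metadata": {},
+ "source": [
+ "A quick note on `threshold`: an edge is drawn only when the cosine similarity between two emails exceeds it, so a higher value gives a sparser, more selective map. A small sketch (reusing the `emails` list and `all_embeddings` built above; 0.75 is just an illustrative value):\n",
+ "\n",
+ "```python\n",
+ "# Sparser graph: only strongly related emails get linked\n",
+ "html = build_mindmap_html(emails, all_embeddings, threshold=0.75)\n",
+ "```"
+ ]
+ },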
+ {
+ "cell_type": "markdown",
+ "id": "53a2fbaf",
+ "metadata": {},
+ "source": [
+ "## Putting it all together in a gradio.\n",
+ "It needs to have an interface to make questions, and the visual to see the mindmap.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "id": "161144ac",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# create a new Chat with OpenAI\n",
+ "MODEL=\"gpt-4o-mini\"\n",
+ "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n",
+ "\n",
+ "# set up the conversation memory for the chat\n",
+ "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n",
+ "\n",
+ "# the retriever is an abstraction over the VectorStore that will be used during RAG\n",
+ "retriever = vectorstore.as_retriever()\n",
+ "from langchain_core.callbacks import StdOutCallbackHandler\n",
+ "\n",
+ "# putting it together: set up the conversation chain with the GPT 3.5 LLM, the vector store and memory\n",
+ "conversation_chain_debug = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, callbacks=[StdOutCallbackHandler()])\n",
+ "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)\n",
+ "\n",
+ "# Wrapping that in a function\n",
+ "\n",
+ "def chat(question, history):\n",
+ " result = conversation_chain.invoke({\"question\": question})\n",
+ " return result[\"answer\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "id": "16a4d8d1",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "c:\\Users\\Javi\\Desktop\\course\\llm_engineering\\.venv\\Lib\\site-packages\\gradio\\chat_interface.py:347: UserWarning:\n",
+ "\n",
+ "The 'tuples' format for chatbot messages is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warning: When cdn_resources is 'local' jupyter notebook has issues displaying graphics on chrome/safari. Use cdn_resources='in_line' or cdn_resources='remote' if you have issues viewing graphics in a notebook.\n",
+ "* Running on local URL: http://127.0.0.1:7878\n",
+ "* To create a public link, set `share=True` in `launch()`.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 60,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warning: When cdn_resources is 'local' jupyter notebook has issues displaying graphics on chrome/safari. Use cdn_resources='in_line' or cdn_resources='remote' if you have issues viewing graphics in a notebook.\n",
+ "Warning: When cdn_resources is 'local' jupyter notebook has issues displaying graphics on chrome/safari. Use cdn_resources='in_line' or cdn_resources='remote' if you have issues viewing graphics in a notebook.\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "import gradio as gr\n",
+ "\n",
+ "def show_mindmap():\n",
+ " # Call build_mindmap_html to generate the HTML\n",
+ " html = build_mindmap_html(emails, all_embeddings)\n",
+ " return f\"\"\"\"\"\"\n",
+ "\n",
+ "\n",
+ "with gr.Blocks(title=\"Mindmap & Email Chatbot\") as demo:\n",
+ " gr.Markdown(\"# 📧 Mindmap Visualization & Email QA Chatbot\")\n",
+ " with gr.Row():\n",
+ " chatbot = gr.ChatInterface(fn=chat, title=\"Ask about your emails\",\n",
+ " examples=[\n",
+ " \"What is my most important message?\",\n",
+ " \"Who have I been communicating with?\",\n",
+ " \"Summarize recent emails\"\n",
+ " ],\n",
+ ")\n",
+ " mindmap_html = gr.HTML(\n",
+ " show_mindmap,\n",
+ " label=\"🧠 Mindmap of Your Emails\",\n",
+ " )\n",
+ " # Reduce height: update show_mindmap (elsewhere) to ~400px, or do inline replace for the demo here:\n",
+ " # mindmap_html = gr.HTML(lambda: show_mindmap().replace(\"height: 600px\", \"height: 400px\"))\n",
+ " \n",
+ "demo.launch(inbrowser=True)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "221a9d98",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}