{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Day 2 Solution - Chat Completions API & Ollama Integration\n",
    "\n",
    "This is my solution to the Day 2 assignment. I've implemented the Chat Completions API with both OpenAI and Ollama.\n",
    "\n",
    "## Features Implemented:\n",
    "- Chat Completions API understanding and implementation\n",
    "- OpenAI API integration with different models\n",
    "- Ollama local model integration (Llama 3.2)\n",
    "- Model comparison and testing\n",
    "- Advanced web scraping with Selenium fallback\n",
    "- Temperature and token control\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Day 2 setup complete! Ready for Chat Completions API.\n"
     ]
    }
   ],
   "source": [
    "# Day 2 Solution - Imports and Setup\n",
    "import os\n",
    "import re\n",
    "import ssl\n",
    "import time\n",
    "\n",
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "from dotenv import load_dotenv\n",
    "import ollama\n",
    "\n",
    "# Load environment variables (expects OPENAI_API_KEY in .env)\n",
    "load_dotenv(override=True)\n",
    "\n",
    "# SSL workaround for Windows corporate proxies - this disables certificate\n",
    "# verification, so use it only in a trusted development environment\n",
    "ssl._create_default_https_context = ssl._create_unverified_context\n",
    "os.environ['PYTHONHTTPSVERIFY'] = '0'\n",
    "os.environ['CURL_CA_BUNDLE'] = ''\n",
    "\n",
    "# Initialize OpenAI client\n",
    "openai = OpenAI()\n",
    "\n",
    "print(\"Day 2 setup complete! Ready for Chat Completions API.\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## Chat Completions API - Key Concepts\n",
      "==================================================\n",
      "\n",
      "1. **What is the Chat Completions API?**\n",
      "   - The simplest way to call an LLM\n",
      "   - Takes a conversation and predicts what should come next\n",
      "   - Introduced by OpenAI and now adopted across the industry\n",
      "\n",
      "2. **Key Components:**\n",
      "   - Messages: list of conversation turns\n",
      "   - Roles: system, user, assistant\n",
      "   - Models: different LLMs with different capabilities\n",
      "   - Parameters: temperature, max_tokens, etc.\n",
      "\n",
      "3. **Message Format:**\n",
      "   [\n",
      "     {\"role\": \"system\", \"content\": \"You are a helpful assistant\"},\n",
      "     {\"role\": \"user\", \"content\": \"Hello!\"},\n",
      "     {\"role\": \"assistant\", \"content\": \"Hi there!\"},\n",
      "     {\"role\": \"user\", \"content\": \"What's the weather?\"}\n",
      "   ]\n",
      "\n",
      "Testing basic Chat Completions API...\n",
      "Response: A Chat Completions API is a tool that allows developers to create applications that can interact with users through text-based conversations. Here’s a simple breakdown:\n",
      "\n",
      "1. **Chat**: This means it can hold a conversation, similar to how you chat with friends or a customer service representative.\n",
      "\n",
      "2. **Completions**: This refers to the API's ability to generate responses. When a user sends a message or question, the API processes that input and provides a relevant response.\n",
      "\n",
      "3. **API (Application Programming Interface)**: This is a set of rules that allows different software programs to communicate with each other. In this case, it lets your application talk to the chat service to get responses.\n",
      "\n",
      "So, in simple terms, a Chat Com\n"
     ]
    }
   ],
   "source": [
    "# Understanding the Chat Completions API\n",
    "print(\"## Chat Completions API - Key Concepts\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "print(\"\"\"\n",
    "1. **What is the Chat Completions API?**\n",
    "   - The simplest way to call an LLM\n",
    "   - Takes a conversation and predicts what should come next\n",
    "   - Introduced by OpenAI and now adopted across the industry\n",
    "\n",
    "2. **Key Components:**\n",
    "   - Messages: list of conversation turns\n",
    "   - Roles: system, user, assistant\n",
    "   - Models: different LLMs with different capabilities\n",
    "   - Parameters: temperature, max_tokens, etc.\n",
    "\n",
    "3. **Message Format:**\n",
    "   [\n",
    "     {\"role\": \"system\", \"content\": \"You are a helpful assistant\"},\n",
    "     {\"role\": \"user\", \"content\": \"Hello!\"},\n",
    "     {\"role\": \"assistant\", \"content\": \"Hi there!\"},\n",
    "     {\"role\": \"user\", \"content\": \"What's the weather?\"}\n",
    "   ]\n",
    "\"\"\")\n",
    "\n",
    "# Test a basic Chat Completions call\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": \"You are a helpful programming tutor.\"},\n",
    "    {\"role\": \"user\", \"content\": \"Explain what a Chat Completions API is in simple terms.\"}\n",
    "]\n",
    "\n",
    "print(\"\\nTesting basic Chat Completions API...\")\n",
    "response = openai.chat.completions.create(\n",
    "    model=\"gpt-4o-mini\",\n",
    "    messages=messages,\n",
    "    temperature=0.7,\n",
    "    max_tokens=150  # the response above is truncated mid-word by this limit\n",
    ")\n",
    "\n",
    "print(f\"Response: {response.choices[0].message.content}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## Model Comparison Test\n",
      "==================================================\n",
      "\n",
      "🤖 Testing gpt-4o-mini...\n",
      "✅ gpt-4o-mini: Machine learning is a subset of artificial intelligence that enables systems to learn from data and improve their performance over time without being explicitly programmed.\n",
      "\n",
      "🤖 Testing gpt-4o...\n",
      "✅ gpt-4o: Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.\n",
      "\n",
      "🤖 Testing gpt-3.5-turbo...\n",
      "✅ gpt-3.5-turbo: Machine learning is a branch of artificial intelligence that enables computers to learn from data and improve their performance on specific tasks without being explicitly programmed.\n"
     ]
    }
   ],
   "source": [
    "# Model Comparison - Different OpenAI Models\n",
    "def test_model(model_name, prompt, temperature=0.7, max_tokens=100):\n",
    "    \"\"\"Test an OpenAI model with the given prompt\"\"\"\n",
    "    print(f\"\\n🤖 Testing {model_name}...\")\n",
    "\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant. Be concise.\"},\n",
    "        {\"role\": \"user\", \"content\": prompt}\n",
    "    ]\n",
    "\n",
    "    try:\n",
    "        response = openai.chat.completions.create(\n",
    "            model=model_name,\n",
    "            messages=messages,\n",
    "            temperature=temperature,\n",
    "            max_tokens=max_tokens\n",
    "        )\n",
    "\n",
    "        result = response.choices[0].message.content\n",
    "        print(f\"✅ {model_name}: {result}\")\n",
    "        return result\n",
    "\n",
    "    except Exception as e:\n",
    "        print(f\"❌ {model_name}: Error - {e}\")\n",
    "        return None\n",
    "\n",
    "# Run the same prompt against several models\n",
    "prompt = \"What is machine learning in one sentence?\"\n",
    "\n",
    "models_to_test = [\n",
    "    \"gpt-4o-mini\",\n",
    "    \"gpt-4o\",\n",
    "    \"gpt-3.5-turbo\"\n",
    "]\n",
    "\n",
    "print(\"## Model Comparison Test\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "results = {}\n",
    "for model in models_to_test:\n",
    "    results[model] = test_model(model, prompt)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "## Ollama Local Model Testing\n",
      "==================================================\n",
      "\n",
      "🦙 Testing Ollama llama3.2...\n",
      "✅ Ollama llama3.2: Machine learning is a type of artificial intelligence that enables computers to learn from data, identify patterns, and make predictions or decisions without being explicitly programmed.\n",
      "\n",
      "🦙 Testing Ollama llama3.2:3b...\n",
      "❌ Ollama llama3.2:3b: Error - model 'llama3.2:3b' not found (status code: 404)\n",
      "\n",
      "🦙 Testing Ollama llama3.2:1b...\n",
      "❌ Ollama llama3.2:1b: Error - model 'llama3.2:1b' not found (status code: 404)\n"
     ]
    }
   ],
   "source": [
    "# Ollama Integration - Local Model Testing\n",
    "# Note: a model must be pulled first (e.g. `ollama pull llama3.2:1b`),\n",
    "# otherwise the Ollama server returns a 404 as seen in the output below\n",
    "def test_ollama_model(model_name, prompt):\n",
    "    \"\"\"Test an Ollama model running locally\"\"\"\n",
    "    print(f\"\\n🦙 Testing Ollama {model_name}...\")\n",
    "\n",
    "    try:\n",
    "        response = ollama.chat(\n",
    "            model=model_name,\n",
    "            messages=[\n",
    "                {\"role\": \"system\", \"content\": \"You are a helpful assistant. Be concise.\"},\n",
    "                {\"role\": \"user\", \"content\": prompt}\n",
    "            ]\n",
    "        )\n",
    "\n",
    "        result = response['message']['content']\n",
    "        print(f\"✅ Ollama {model_name}: {result}\")\n",
    "        return result\n",
    "\n",
    "    except Exception as e:\n",
    "        print(f\"❌ Ollama {model_name}: Error - {e}\")\n",
    "        return None\n",
    "\n",
    "# Test Ollama models\n",
    "print(\"\\n## Ollama Local Model Testing\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "ollama_models = [\"llama3.2\", \"llama3.2:3b\", \"llama3.2:1b\"]\n",
    "\n",
    "ollama_results = {}\n",
    "for model in ollama_models:\n",
    "    ollama_results[model] = test_ollama_model(model, prompt)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Advanced web scraping functions defined!\n"
     ]
    }
   ],
   "source": [
    "# Advanced Web Scraping with Selenium Fallback\n",
    "import re  # module-level so is_js_heavy() can use it too\n",
    "\n",
    "from selenium import webdriver\n",
    "from selenium.webdriver.chrome.options import Options\n",
    "from selenium.webdriver.chrome.service import Service\n",
    "from webdriver_manager.chrome import ChromeDriverManager\n",
    "\n",
    "def clean_text_from_soup(soup):\n",
    "    \"\"\"Extract clean text from a BeautifulSoup object\"\"\"\n",
    "    if not soup or not soup.body:\n",
    "        return \"\"\n",
    "    for tag in soup.body([\"script\", \"style\", \"noscript\", \"template\", \"svg\", \"img\", \"video\", \"source\", \"iframe\", \"form\", \"input\"]):\n",
    "        tag.decompose()\n",
    "    text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
    "    # Collapse excessive blank lines\n",
    "    text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text)\n",
    "    return text\n",
    "\n",
    "def is_js_heavy(html_text):\n",
    "    \"\"\"Heuristic check for pages that need JavaScript to render content\"\"\"\n",
    "    if not html_text:\n",
    "        return True\n",
    "    soup = BeautifulSoup(html_text, \"html.parser\")\n",
    "    txt_len = len(re.sub(r\"\\s+\", \" \", soup.get_text()))\n",
    "    script_tags = html_text.count(\"<script\")\n",
    "    if txt_len < 1200:  # very little text => likely JS-rendered\n",
    "        return True\n",
    "    if script_tags > 50 and (txt_len / (script_tags + 1)) < 40:\n",
    "        return True\n",
    "    if re.search(r\"(Loading|Please wait|Enable JavaScript)\", html_text, re.I):\n",
    "        return True\n",
    "    return False\n",
    "\n",
    "def fetch_static_html(url):\n",
    "    \"\"\"Try to fetch HTML using requests (no JS execution)\"\"\"\n",
    "    try:\n",
    "        r = requests.get(url, headers={\"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\"}, timeout=15)\n",
    "        r.raise_for_status()\n",
    "        return r.text\n",
    "    except Exception:\n",
    "        return None\n",
    "\n",
    "def fetch_js_html(url):\n",
    "    \"\"\"Fetch HTML using Selenium (with JS execution)\"\"\"\n",
    "    driver = None\n",
    "    try:\n",
    "        options = Options()\n",
    "        options.add_argument(\"--headless\")\n",
    "        options.add_argument(\"--no-sandbox\")\n",
    "        options.add_argument(\"--disable-dev-shm-usage\")\n",
    "\n",
    "        service = Service(ChromeDriverManager().install())\n",
    "        driver = webdriver.Chrome(service=service, options=options)\n",
    "\n",
    "        driver.get(url)\n",
    "        time.sleep(2)  # Wait for JS to execute\n",
    "        return driver.page_source\n",
    "    except Exception as e:\n",
    "        print(f\"JS fetch failed: {e}\")\n",
    "        return None\n",
    "    finally:\n",
    "        # Always release the browser, even if the fetch failed\n",
    "        if driver is not None:\n",
    "            driver.quit()\n",
    "\n",
    "def fetch_website_contents(url, char_limit=2000, allow_js_fallback=True):\n",
    "    \"\"\"Fetch website content, falling back to Selenium for JS-heavy pages\"\"\"\n",
    "    html = fetch_static_html(url)\n",
    "    need_js = (html is None) or is_js_heavy(html)\n",
    "\n",
    "    if need_js and allow_js_fallback:\n",
    "        html = fetch_js_html(url) or html or \"\"\n",
    "\n",
    "    soup = BeautifulSoup(html or \"\", \"html.parser\")\n",
    "    title = soup.title.get_text(strip=True) if soup.title else \"No title found\"\n",
    "    text = clean_text_from_soup(soup)\n",
    "    return (f\"{title}\\n\\n{text}\").strip()[:char_limit]\n",
    "\n",
    "print(\"Advanced web scraping functions defined!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model-agnostic summarization functions defined!\n"
     ]
    }
   ],
   "source": [
    "# Model-Agnostic Summarization Function\n",
    "def summarize_with_model(url, model=\"gpt-4o-mini\", temperature=0.4, max_tokens=None):\n",
    "    \"\"\"Summarize website content using any available model (OpenAI or Ollama)\"\"\"\n",
    "    website = fetch_website_contents(url, allow_js_fallback=True)\n",
    "\n",
    "    system_prompt = \"\"\"\n",
    "    You are a helpful assistant that analyzes website content\n",
    "    and provides a clear, concise summary.\n",
    "    Respond in markdown format. Do not wrap the markdown in a code block.\n",
    "    \"\"\"\n",
    "\n",
    "    user_prompt = f\"\"\"\n",
    "    Here are the contents of a website.\n",
    "    Provide a short summary of this website.\n",
    "    If it includes news or announcements, then summarize these too.\n",
    "\n",
    "    {website}\n",
    "    \"\"\"\n",
    "\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt}\n",
    "    ]\n",
    "\n",
    "    try:\n",
    "        if model.startswith(\"gpt\") or model.startswith(\"o1\"):\n",
    "            # OpenAI model\n",
    "            response = openai.chat.completions.create(\n",
    "                model=model,\n",
    "                messages=messages,\n",
    "                temperature=temperature,\n",
    "                max_tokens=max_tokens\n",
    "            )\n",
    "            return response.choices[0].message.content\n",
    "        else:\n",
    "            # Ollama model\n",
    "            response = ollama.chat(\n",
    "                model=model,\n",
    "                messages=messages\n",
    "            )\n",
    "            return response['message']['content']\n",
    "    except Exception as e:\n",
    "        return f\"Error with {model}: {e}\"\n",
    "\n",
    "def display_summary_with_model(url, model=\"gpt-4o-mini\", **kwargs):\n",
    "    \"\"\"Display a website summary produced by the specified model\"\"\"\n",
    "    print(f\"🔍 Summarizing {url} with {model}...\")\n",
    "    summary = summarize_with_model(url, model, **kwargs)\n",
    "    display(Markdown(f\"## Summary using {model}\\n\\n{summary}\"))\n",
    "\n",
    "print(\"Model-agnostic summarization functions defined!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## Day 2 Solution Test - Model Comparison\n",
      "============================================================\n",
      "Testing website: https://openai.com\n",
      "\n",
      "============================================================\n",
      "\n",
      "📊 Testing with OpenAI GPT-4o-mini...\n",
      "🔍 Summarizing https://openai.com with gpt-4o-mini...\n",
      "❌ Error with OpenAI GPT-4o-mini: name 're' is not defined\n",
      "\n",
      "----------------------------------------\n",
      "\n",
      "📊 Testing with Ollama Llama 3.2 3B...\n",
      "🔍 Summarizing https://openai.com with llama3.2:3b...\n",
      "❌ Error with Ollama Llama 3.2 3B: name 're' is not defined\n",
      "\n",
      "----------------------------------------\n",
      "\n",
      "📊 Testing with Ollama Llama 3.2 1B...\n",
      "🔍 Summarizing https://openai.com with llama3.2:1b...\n",
      "❌ Error with Ollama Llama 3.2 1B: name 're' is not defined\n",
      "\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Test Day 2 Solution - Model Comparison\n",
    "# (The recorded output below shows a NameError on `re` from an earlier run,\n",
    "# fixed by importing `re` at module level in the scraping cell; re-run to refresh.)\n",
    "print(\"## Day 2 Solution Test - Model Comparison\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Test with a JavaScript-heavy website\n",
    "test_url = \"https://openai.com\"\n",
    "\n",
    "print(f\"Testing website: {test_url}\")\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "\n",
    "# Test with different models\n",
    "models_to_test = [\n",
    "    (\"gpt-4o-mini\", \"OpenAI GPT-4o-mini\"),\n",
    "    (\"llama3.2:3b\", \"Ollama Llama 3.2 3B\"),\n",
    "    (\"llama3.2:1b\", \"Ollama Llama 3.2 1B\")\n",
    "]\n",
    "\n",
    "for model, description in models_to_test:\n",
    "    print(f\"\\n📊 Testing with {description}...\")\n",
    "    try:\n",
    "        display_summary_with_model(test_url, model=model, temperature=0.4, max_tokens=200)\n",
    "    except Exception as e:\n",
    "        print(f\"❌ Error with {description}: {e}\")\n",
    "\n",
    "    print(\"\\n\" + \"-\" * 40)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}