Merge branch 'main' of github.com:ed-donner/llm_engineering
.gitignore (vendored): 7 additions
@@ -182,3 +182,10 @@ products_vectorstore/
# ignore optimized C++ code from being checked into repo
week4/optimized
week4/simple
*.env.save
.virtual_documents/
WingIDE_Projekti/
nohup.out
*.png

scraper_cache/
community-contributions/bojan-playwright-scraper/README.md (new file): 144 lines
@@ -0,0 +1,144 @@
# 🧠 Community Contribution: Async Playwright-based AI Scraper

## Overview

This project is a fully asynchronous, headless-browser-based scraper built with Playwright and the OpenAI API.
It scrapes and analyzes content from four AI-related websites, producing structured summaries in Markdown and Jupyter notebook formats.
Playwright was chosen over Selenium for its speed and efficiency, making it well suited to modern web-scraping tasks.

**Developed by:** lakovicb
**IDE used:** WingIDE Pro 10 (Jupyter compatibility via nest_asyncio)
**Python version:** 3.12.9 (developed and tested with Anaconda)

---

## 📦 Features

- 🧭 Simulates human-like interactions (mouse movement, scrolling)
- 🧠 GPT-based analysis using OpenAI's API
- 🧪 Works inside JupyterLab via nest_asyncio
- 📊 Prometheus metrics for scraping observability
- ⚡ Smart content caching via diskcache
- 📝 Generates structured Markdown summaries and Jupyter notebooks

---

## 🚀 How to Run

### 1. Install dependencies

Run these commands in your terminal:

```bash
conda install python-dotenv prometheus_client diskcache nbformat
pip install playwright openai
playwright install
```

> Note: Ensure your environment supports Python 3.12 for optimal performance.

---

### 2. Set environment variables

Create a `.env` file in the project root (e.g. `/home/lakov/projects/llm_engineering/`) with:

```env
OPENAI_API_KEY=your_openai_key
```

(Optional) Define proxy/login parameters if needed.
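
Once the `.env` file is in place, `python-dotenv` loads it at startup. As a stdlib-only sketch of the kind of check the scripts rely on (the helper name `require_api_key` is illustrative, not part of the project):

```python
import os


def require_api_key() -> str:
    # After load_dotenv(), OPENAI_API_KEY is available via the environment.
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

The scraper itself reads the key once via `os.getenv("OPENAI_API_KEY")`.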

---

### 3. Run the scraper

```bash
python playwright_ai_scraper.py
```

This scrapes and analyzes the following URLs:

- https://www.anthropic.com
- https://deepmind.google
- https://huggingface.co
- https://runwayml.com

---

### 4. Generate notebooks

```bash
python notebook_generator.py
```

Enter a URL when prompted to generate a Jupyter notebook in the `notebooks/` directory.
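
Generated filenames are derived from the URL's host with dots replaced by underscores, mirroring the logic in `notebook_generator.py` (the helper name here is illustrative):

```python
def notebook_filename(url: str) -> str:
    # Drop the scheme, keep the host, and make it filesystem-friendly.
    domain = url.split("//")[-1].split("/")[0].replace(".", "_")
    return f"{domain}_Summary.ipynb"


print(notebook_filename("https://deepmind.google"))  # deepmind_google_Summary.ipynb
```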

---

## 📊 Results

### Python Files for Developers

- `playwright_ai_scraper.py`: Core async scraper and analyzer.
- `notebook_generator.py`: Creates Jupyter notebooks for given URLs.

These files enable transparency, reproducibility, and extensibility.

---

### Markdown Summaries

Saved in `outputs/`:

- Structured analyses with sections for Summary, Entities, Updates, Topics, and Features.
- Readable and portable format.

---

### Jupyter Notebooks

Available in `notebooks/`:

- `Playwright_AI_Scraper_JupyterAsync.ipynb`
- `Playwright_AI_Scraper_Showcase_Formatted.ipynb`

---

## 🔍 Playwright vs. Selenium

| Criteria            | Selenium                          | Playwright                        |
|---------------------|-----------------------------------|-----------------------------------|
| Release Year        | 2004                              | 2020                              |
| Supported Browsers  | Chrome, Firefox, Safari, Edge, IE | Chromium, Firefox, WebKit         |
| Supported Languages | Many                              | Python, JS/TS, Java, C#           |
| Setup               | Complex (WebDrivers)              | Simple (auto-installs binaries)   |
| Execution Speed     | Slower                            | Faster (WebSocket protocol)       |
| Dynamic Content     | Good (requires explicit waits)    | Excellent (auto-waits)            |
| Community Support   | Large, mature                     | Growing, modern, Microsoft-backed |

> **Playwright** was chosen for its speed, simplicity, and modern feature set.

---

## ⚙️ Asynchronous Code and WingIDE Pro 10

- Fully async scraping with `asyncio`.
- Developed using WingIDE Pro 10 for:
  - Robust async support
  - Full Python 3.12 compatibility
  - Integration with JupyterLab via `nest_asyncio`
  - Stability and efficient debugging
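
The coroutine-based flow can be sketched with nothing but the standard library. The coroutine body below is a placeholder, not the real scraper, and the project's `main()` actually processes its URLs sequentially; `asyncio.gather` here shows the concurrent variant that the async design enables (inside JupyterLab, `nest_asyncio.apply()` makes `asyncio.run` usable within the running event loop):

```python
import asyncio


async def analyze(url: str) -> str:
    # Stand-in for the real scrape-and-summarize coroutine.
    await asyncio.sleep(0)
    return f"summary of {url}"


async def run_all() -> list[str]:
    urls = ["https://example.com", "https://example.org"]
    # Launch the analyses concurrently rather than one at a time.
    return await asyncio.gather(*(analyze(u) for u in urls))


results = asyncio.run(run_all())
```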

---

## 📁 Directory Structure

```bash
playwright_ai_scraper.py   # Main scraper script
notebook_generator.py      # Notebook generator script
outputs/                   # Markdown summaries
notebooks/                 # Generated Jupyter notebooks
requirements.txt           # List of dependencies
scraper_cache/             # Cache directory
```

---

## 📝 Notes

- Uses Prometheus metrics and diskcache.
- Requires a valid OpenAI API key.
- Potential extensions: PDF export, LangChain pipeline, vector store ingestion.
- **Note:** Because the huggingface.co homepage is highly dynamic with little static text, the scraper retrieved only minimal information, which resulted in a limited AI-generated summary. This reflects a realistic limitation of scraping dynamic websites without interaction-based extraction.
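
The diskcache usage follows a plain cache-or-compute pattern keyed on the URL. A stdlib-only sketch with an in-memory dict and a TTL (the real code uses `diskcache.Cache` with `expire=CACHE_EXPIRY` instead of this dict):

```python
import time

_cache: dict[str, tuple[float, str]] = {}


def get_cached(url, fetch, ttl=3600.0):
    key = f"content_{url.replace('/', '_')}"   # same key scheme as the scraper
    entry = _cache.get(key)
    if entry is not None:
        expires_at, value = entry
        if time.time() < expires_at:           # cache hit, still fresh
            return value
    value = fetch(url)                         # miss: do the expensive work
    _cache[key] = (time.time() + ttl, value)
    return value
```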

---

## 🙏 Thanks

Special thanks to **Ed Donner** for the amazing course and project challenge inspiration!

@@ -0,0 +1,79 @@
import sys
import os
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell
import asyncio
from dotenv import load_dotenv
import logging

# Loading .env variables
load_dotenv()

# Setting up logging
logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Adding project directory to sys.path
project_dir = os.path.join(
    "/home/lakov/projects/llm_engineering",
    "community-contributions/playwright-bojan"
)
if project_dir not in sys.path:
    sys.path.insert(0, project_dir)

# Importing analyze_content from playwright_ai_scraper.py
try:
    from playwright_ai_scraper import analyze_content
except ModuleNotFoundError as e:
    logging.error(f"Error importing module: {e}")
    sys.exit(1)


# Function to save the notebook
def save_notebook(url, content):
    output_dir = os.path.join(project_dir, "notebooks")
    os.makedirs(output_dir, exist_ok=True)

    # Extracting the domain from the URL
    domain = url.split("//")[-1].split("/")[0].replace(".", "_")
    filename = f"{domain}_Summary.ipynb"
    path = os.path.join(output_dir, filename)

    nb = new_notebook()
    intro = f"""
# Summary for {url}

This notebook contains an AI-generated summary of the website content.

**URL**: `{url}`

---
**Analysis**:
{content}
"""
    nb.cells.append(new_markdown_cell(intro))

    with open(path, 'w', encoding='utf-8') as f:
        nbformat.write(nb, f)

    logging.info(f"Notebook saved to: {path}")
    return path


# Main function
async def main():
    url = input("Enter URL to scrape: ")
    try:
        result = await analyze_content(url, headless=True)
        save_notebook(url, result)
        print(f"Summary for {url}:\n{result}")
    except Exception as e:
        logging.error(f"Failed to process {url}: {e}")
        print(f"Error: {e}")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,60 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "144bdfa2",
   "metadata": {},
   "source": [
    "\n",
    "# Summary for https://deepmind.google\n",
    "\n",
    "This notebook contains an AI-generated summary of the website content.\n",
    "\n",
    "**URL**: `https://deepmind.google`\n",
    "\n",
    "---\n",
    "**Analysis**:\n",
    "### Summary\n",
    "The website introduces \"Gemini 2.5,\" which appears to be the latest version of an AI model designed for the \"agentic era.\" The site likely focuses on promoting and explaining the capabilities and applications of this AI technology.\n",
    "\n",
    "### Entities\n",
    "- **Gemini 2.5**: This is the primary entity mentioned, referring to the AI model.\n",
    "- No specific individuals or organizations are named in the provided content.\n",
    "\n",
    "### Updates\n",
    "- The introduction of \"Gemini 2.5\" is a recent update, indicating a new or significantly updated version of the AI model.\n",
    "\n",
    "### Topics\n",
    "- **AI Models**: The site focuses on artificial intelligence technologies.\n",
    "- **Agentic Era**: This suggests a theme of AI models being used in ways that are proactive or autonomous.\n",
    "\n",
    "### Features\n",
    "- **Chat with Gemini**: This feature allows users to interact directly with the Gemini 2.5 AI, presumably to demonstrate its capabilities or to provide user support.\n",
    "- Detailed descriptions of other projects or initiatives are not provided in the content.\n",
    "\n",
    "**Note**: The content provided is limited, and additional information might be available on the actual website to provide a more comprehensive analysis.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (WSL-Lakov)",
   "language": "python",
   "name": "lakov-wsl"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,59 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3069b0e8",
   "metadata": {},
   "source": [
    "\n",
    "# Summary for https://huggingface.co\n",
    "\n",
    "This notebook contains an AI-generated summary of the website content.\n",
    "\n",
    "**URL**: `https://huggingface.co`\n",
    "\n",
    "---\n",
    "**Analysis**:\n",
    "Based on the provided content snippet, here is an analysis structured under the requested headings:\n",
    "\n",
    "### Summary\n",
    "The information provided is insufficient to determine the exact purpose of the website. However, the name \"Dia-1.6B\" suggests it might be related to a project or software version.\n",
    "\n",
    "### Entities\n",
    "No specific individuals or organizations are mentioned in the provided content.\n",
    "\n",
    "### Updates\n",
    "The content was updated 1 day ago, indicating recent activity or changes. However, the nature of these updates is not specified.\n",
    "\n",
    "### Topics\n",
    "The snippet does not provide enough information to identify primary subjects or themes.\n",
    "\n",
    "### Features\n",
    "The content does not detail any specific projects or initiatives.\n",
    "\n",
    "**Note:** The analysis is limited due to the lack of detailed information in the provided content snippet. More comprehensive content would be required for a complete analysis.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (WSL-Lakov)",
   "language": "python",
   "name": "lakov-wsl"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,62 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d2eeed62",
   "metadata": {},
   "source": [
    "\n",
    "# Summary for https://runwayml.com\n",
    "\n",
    "This notebook contains an AI-generated summary of the website content.\n",
    "\n",
    "**URL**: `https://runwayml.com`\n",
    "\n",
    "---\n",
    "**Analysis**:\n",
    "### Summary\n",
    "The website promotes a series of short films created using \"Gen-4,\" which is described as the next-generation series of AI models designed for media generation and ensuring world consistency. The site appears to focus on showcasing the capabilities of these AI models in filmmaking.\n",
    "\n",
    "### Entities\n",
    "- **Gen-4**: The AI model series used for creating the films.\n",
    "- No specific individuals or organizations are mentioned beyond the reference to the AI technology.\n",
    "\n",
    "### Updates\n",
    "- There are no specific recent announcements or news updates provided in the content.\n",
    "\n",
    "### Topics\n",
    "- **AI in Filmmaking**: The use of advanced AI models in the creation of films.\n",
    "- **Short Films**: Mention of specific titles like \"The Lonely Little Flame,\" \"NYC is a Zoo,\" and \"The Herd\" suggests a focus on narrative short films.\n",
    "- **Technology in Media Production**: Emphasis on the role of Gen-4 AI technology in media production.\n",
    "\n",
    "### Features\n",
    "- **Gen-4 AI Models**: Highlighted as a significant innovation in media generation.\n",
    "- **Short Films**: The films listed (\"The Lonely Little Flame,\" \"NYC is a Zoo,\" \"The Herd\") are examples of projects created using the Gen-4 technology.\n",
    "- **Interactive Elements**: Options to \"Try Runway Now\" and \"Learn More About Gen-4\" suggest interactive features for visitors to engage with the technology or learn more about it.\n",
    "\n",
    "Additional information about the specific functionality of the Gen-4 AI models, the background of the organization, or detailed descriptions of the films would be needed for a more comprehensive analysis.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (WSL-Lakov)",
   "language": "python",
   "name": "lakov-wsl"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,70 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cccf3fd8",
   "metadata": {},
   "source": [
    "\n",
    "# Summary for https://www.anthropic.com\n",
    "\n",
    "This notebook contains an AI-generated summary of the website content.\n",
    "\n",
    "**URL**: `https://www.anthropic.com`\n",
    "\n",
    "---\n",
    "**Analysis**:\n",
    "### Summary\n",
    "The website is dedicated to showcasing AI research and products with a strong emphasis on safety. It introduces \"Claude 3.7 Sonnet,\" described as their most intelligent AI model, and highlights the organization's commitment to building AI that serves humanity's long-term well-being. The site also offers resources and tools for building AI-powered applications and emphasizes responsible AI development.\n",
    "\n",
    "### Entities\n",
    "- **Anthropic**: The organization behind the website, focused on developing AI technologies with an emphasis on safety and human benefit.\n",
    "- **Claude 3.7 Sonnet**: The latest AI model featured prominently on the site.\n",
    "\n",
    "### Updates\n",
    "Recent announcements or news include:\n",
    "- **Mar 27, 2025**: Articles on \"Tracing the thoughts of a large language model\" and \"Anthropic Economic Index.\"\n",
    "- **Feb 24, 2025**: Releases of \"Claude 3.7 Sonnet and Claude Code\" and \"Claude's extended thinking.\"\n",
    "- **Dec 18, 2024**: Discussion on \"Alignment faking in large language models.\"\n",
    "- **Nov 25, 2024**: Introduction of the \"Model Context Protocol.\"\n",
    "\n",
    "### Topics\n",
    "Primary subjects or themes covered on the website include:\n",
    "- AI Safety and Ethics\n",
    "- AI-powered Applications Development\n",
    "- Responsible AI Development\n",
    "- AI Research and Policy Work\n",
    "\n",
    "### Features\n",
    "Noteworthy projects or initiatives mentioned:\n",
    "- **Claude 3.7 Sonnet**: The latest AI model available for use.\n",
    "- **Anthropic Academy**: An educational initiative to teach users how to build with Claude.\n",
    "- **Anthropic’s Responsible Scaling Policy**: A policy framework guiding the responsible development of AI technologies.\n",
    "- **Model Context Protocol**: A new product initiative aimed at enhancing AI model understanding and safety.\n",
    "\n",
    "These sections collectively provide a comprehensive view of the website's focus on advancing AI technology with a foundational commitment to safety and ethical considerations.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (WSL-Lakov)",
   "language": "python",
   "name": "lakov-wsl"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,245 @@
# playwright_ai_scraper.py
import asyncio
import logging
import random
import time
import os
from playwright.async_api import async_playwright
from openai import OpenAI
from prometheus_client import Counter, Histogram, start_http_server
from diskcache import Cache
from dotenv import load_dotenv

# Loading .env variables
load_dotenv()

# Setting up logging
logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Setting up Prometheus metrics
SCRAPE_ATTEMPTS = Counter("scrape_attempts", "Total scraping attempts")
SCRAPE_DURATION = Histogram(
    "scrape_duration", "Scraping duration distribution"
)

# Setting up cache
cache = Cache("./scraper_cache")


# Custom exceptions
class ScrapingError(Exception):
    pass


class AnalysisError(Exception):
    pass


class AIScraper:
    API_KEY = os.getenv("OPENAI_API_KEY")
    MAX_CONTENT = int(os.getenv("MAX_CONTENT_LENGTH", 30000))

    def __init__(self, headless=True):
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 "
            "Safari/537.36"
        ]
        self.timeout = 60000  # 60 seconds
        self.retries = int(os.getenv("RETRY_COUNT", 2))
        self.headless = headless
        self.delays = {
            "scroll": (500, 2000),
            "click": (100, 300),
            "move": (50, 200)
        }

    async def human_interaction(self, page):
        """Simulates human behavior on the page."""
        try:
            for _ in range(random.randint(2, 5)):
                x = random.randint(0, 1366)
                y = random.randint(0, 768)
                await page.mouse.move(x, y, steps=random.randint(5, 20))
                await page.wait_for_timeout(
                    random.randint(*self.delays["move"])
                )
                scroll = random.choice([300, 600, 900])
                await page.mouse.wheel(0, scroll)
                await page.wait_for_timeout(
                    random.randint(*self.delays["scroll"])
                )
        except Exception as e:
            logging.warning(f"Human interaction failed: {e}")

    async def load_page(self, page, url):
        """Loads the page with dynamic waiting."""
        start_time = time.time()
        try:
            await page.goto(
                url, wait_until="domcontentloaded", timeout=self.timeout
            )
            selectors = [
                "main article",
                "#main-content",
                "section:first-of-type",
                'div[class*="content"]',
                "body"
            ]
            for selector in selectors:
                element = await page.query_selector(selector)
                if element:
                    return True
            # No selector matched yet: wait out the rest of a 30-second
            # budget (converted to milliseconds) before giving up.
            elapsed_ms = int((time.time() - start_time) * 1000)
            if elapsed_ms < 30000:
                await page.wait_for_timeout(30000 - elapsed_ms)
            return True
        except Exception as e:
            logging.error(f"Error loading {url}: {e}")
            return False

    async def scrape_with_retry(self, url):
        """Scrapes the page with retries."""
        SCRAPE_ATTEMPTS.inc()
        start_time = time.time()
        async with async_playwright() as p:
            browser = None
            try:
                browser = await p.chromium.launch(headless=self.headless)
                context = await browser.new_context(
                    user_agent=random.choice(self.user_agents),
                    viewport={"width": 1366, "height": 768}
                )
                page = await context.new_page()
                await page.add_init_script("""
                    Object.defineProperty(navigator, 'webdriver', {
                        get: () => false
                    });
                """)
                for attempt in range(self.retries):
                    try:
                        logging.info(
                            f"Attempt {attempt + 1}: Scraping {url}")
                        if not await self.load_page(page, url):
                            raise ScrapingError(f"Failed to load {url}")
                        await self.human_interaction(page)
                        content = await page.evaluate(
                            """() => {
                                const s = [
                                    'main article',
                                    '#main-content',
                                    'section:first-of-type',
                                    'div[class*="content"]'
                                ];
                                let c = '';
                                for (const x of s) {
                                    const e = document.querySelector(x);
                                    if (e) c += e.innerText + '\\n';
                                }
                                return c.trim() || document.body.innerText;
                            }"""
                        )
                        if not content.strip():
                            raise ScrapingError("No content")
                        SCRAPE_DURATION.observe(time.time() - start_time)
                        return content[:self.MAX_CONTENT]
                    except ScrapingError as e:
                        logging.warning(f"Attempt {attempt + 1} failed: {e}")
                        if attempt < self.retries - 1:
                            await asyncio.sleep(5)
                        else:
                            raise
            except Exception as e:
                logging.error(f"Error in scrape: {e}")
                raise
            finally:
                # Guard against launch failures leaving browser unbound.
                if browser:
                    await browser.close()
        raise ScrapingError(f"All attempts to scrape {url} failed")

    async def get_cached_content(self, url):
        """Retrieves content from cache or scrapes."""
        key = f"content_{url.replace('/', '_')}"
        content = cache.get(key)
        if content is None:
            try:
                content = await self.scrape_with_retry(url)
                cache.set(
                    key, content, expire=int(os.getenv("CACHE_EXPIRY", 3600))
                )
            except Exception as e:
                logging.error(f"Err: {e}")
                raise
        return content


async def analyze_content(url, headless=True):
    """Analyzes the page content using the OpenAI API."""
    try:
        scraper = AIScraper(headless=headless)
        content = await scraper.get_cached_content(url)
        client = OpenAI(api_key=scraper.API_KEY)
        if not client.api_key:
            raise AnalysisError("OpenAI API key not configured")
        prompt = """
Analyze the website content and extract:
1. **Summary**: Overview of the website's purpose.
2. **Entities**: Prominent individuals or organizations.
3. **Updates**: Recent announcements or news.
4. **Topics**: Primary subjects or themes.
5. **Features**: Noteworthy projects or initiatives.
Format output under these headings. Note if info is missing.
Content: {content}
""".format(content=content)
        response = client.chat.completions.create(
            model=os.getenv("OPENAI_MODEL", "gpt-4-turbo"),
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=float(os.getenv("MODEL_TEMPERATURE", 0.3)),
            max_tokens=int(os.getenv("MAX_TOKENS", 1500)),
            top_p=float(os.getenv("MODEL_TOP_P", 0.9))
        )
        if not response.choices:
            raise AnalysisError("Empty response from OpenAI")
        return response.choices[0].message.content
    except (ScrapingError, AnalysisError) as e:
        logging.error(f"Analysis failed: {e}")
        return f"Error: {e}"
    except Exception as e:
        logging.exception(f"Error in analyze: {e}")
        return f"Unexpected error: {e}"


async def main():
    """Main function for scraping and analysis."""
    try:
        port = int(os.getenv("PROMETHEUS_PORT", 8000))
        start_http_server(port)
        logging.info(f"Prometheus server started on port {port}")
    except Exception as e:
        logging.warning(f"Prometheus server failed: {e}")
    urls = [
        "https://www.anthropic.com",
        "https://deepmind.google",
        "https://huggingface.co",
        "https://runwayml.com"
    ]
    for url in urls:
        start_time = time.time()
        result = await analyze_content(url, headless=True)
        end_time = time.time()
        print(
            f"\nAnalysis of {url} completed in "
            f"{end_time - start_time:.2f} seconds\n"
        )
        print(result)


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,6 @@
playwright>=1.43.0
openai>=1.14.2
prometheus-client>=0.19.0
diskcache>=5.6.1
python-dotenv>=1.0.1
nest_asyncio>=1.6.0
@@ -0,0 +1,87 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b568f38-7a64-453d-a88c-2f132801a084",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import ollama\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "\n",
    "headers = {\n",
    "    \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this Website object from the given url using the BeautifulSoup library\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
    "\n",
    "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
    "and provides a short summary, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\"\n",
    "\n",
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"\\nThe contents of this website is as follows; \\\n",
    "please provide a short summary of this website in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt\n",
    "\n",
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]\n",
    "\n",
    "def summarize(url):\n",
    "    website = Website(url)\n",
    "    response = ollama.chat(\n",
    "        model=\"llama3.2\",\n",
    "        messages=messages_for(website)\n",
    "    )\n",
    "    return response['message']['content']\n",
    "\n",
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))\n",
    "\n",
    "display_summary(\"http://news.google.com/\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
week1/community-contributions/ag-w1d1-site-summary.py (new file): 76 lines
@@ -0,0 +1,76 @@
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# Function to get the OpenAI API key from the .env file
def get_api_key():
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("No API Key found")
    elif not api_key.startswith("sk-"):
        print("Invalid API Key. Should start with sk-")
    elif api_key.strip() != api_key:
        print("Remove leading and trailing spaces from the key")
    else:
        print("API Key found and looks good!")
    return api_key

# Load the API key and create the OpenAI client
api_key = get_api_key()
openai = OpenAI()

# Headers and class for the website to summarize
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

# Define prompts
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

# Prepare messages for the OpenAI call
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

# Summarize a given website
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content

# Display the summary in markdown format
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
    print(summary)

url = "https://edwarddonner.com"
display_summary(url)
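As a quick offline sanity check of the prompt plumbing in the script above, the sketch below rebuilds `user_prompt_for` and `messages_for` around a hypothetical `StubWebsite` stand-in (not part of the original file) so no network request or API key is needed:

```python
# Offline sketch of the prompt construction; StubWebsite is a stand-in
# for the real Website class so nothing is fetched over the network.

system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

class StubWebsite:
    def __init__(self, title, text):
        self.title = title
        self.text = text

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += (
        "\nThe contents of this website is as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    # Same two-message shape the OpenAI chat API expects
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)},
    ]

site = StubWebsite("Example Site", "Hello world content")
messages = messages_for(site)
print(messages[0]["role"], messages[1]["role"])
```

This lets you inspect the exact messages payload before spending tokens on a real `chat.completions.create` call.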
BIN
week1/community-contributions/datasheets/part_new.pdf
Normal file
Binary file not shown.
BIN
week1/community-contributions/datasheets/part_old.pdf
Normal file
Binary file not shown.
233
week1/community-contributions/day-1-travel-recommendation.ipynb
Normal file
@@ -0,0 +1,233 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "50ed5733",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "# If you get an error running this cell, then please head over to the troubleshooting notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a3b173a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API key found and looks good so far!\n"
     ]
    }
   ],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "191c7214",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()\n",
    "\n",
    "# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
    "# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "50adea39",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = \"\"\"Generate a detailed travel recommendation. Include the following information: \\\n",
    "    1. **Overview**: A brief introduction to the destination, highlighting its unique characteristics and appeal.\\\n",
    "    2. **Cost Breakdown**: - Average cost of accommodation (budget, mid-range, luxury options).\\\n",
    "    - Estimated daily expenses (food, transportation, activities).\\\n",
    "    - Total estimated cost for a typical 5-day trip for a solo traveler and a family of four.\\\n",
    "    3. **Best Time to Visit**: \\\n",
    "    - Identify the peak, shoulder, and off-peak seasons.\\\n",
    "    - Highlight the pros and cons of visiting during each season, including weather conditions and local events.\\\n",
    "    4. **Hidden Gems**: - List at least five lesser-known attractions or experiences that are must-sees.\\\n",
    "    - Provide a brief description of each hidden gem, including why it is special and any tips for visiting.\\\n",
    "    5. **Local Tips**: \\\n",
    "    - Suggest local customs or etiquette that travelers should be aware of.\\\n",
    "    - Recommend local dishes to try and where to find them. Make sure the recommendation is engaging and informative, appealing to a diverse range of travelers.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "aaac13d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(user_prompt):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt}\n",
    "    ]\n",
    "\n",
    "def recommender():\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages_for(\"Create a travel recommendation for a couple in the Netherlands\")\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "efad902a",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_result():\n",
    "    recommendation = recommender()\n",
    "    display(Markdown(recommendation))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5564c22c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### Travel Recommendation: Exploring the Netherlands as a Couple\n",
       "\n",
       "#### Overview \n",
       "The Netherlands, with its charming canals, stunning tulip fields, and vibrant cities, is a romantic destination for couples seeking beauty and culture. Whether you're wandering hand-in-hand through the cobbled streets of Amsterdam, enjoying a serene boat ride in Giethoorn, or indulging in local delicacies at quaint cafés, the Netherlands combines history, art, and picturesque landscapes in a unique blend. The country not only boasts iconic landmarks such as the Rijksmuseum and the Anne Frank House but also an extensive network of cycling paths that allow you to discover hidden treasures together.\n",
       "\n",
       "#### Cost Breakdown \n",
       "- **Accommodation**:\n",
       "  - **Budget**: €50-€100 per night (Hostels like Stayokay Amsterdam Stadsdoelen)\n",
       "  - **Mid-Range**: €100-€200 per night (Hotels such as Hotel Estheréa in Amsterdam)\n",
       "  - **Luxury**: €200-€500+ per night (Luxury options like The Dylan in Amsterdam)\n",
       "\n",
       "- **Estimated Daily Expenses**:\n",
       "  - **Food**: €30-€70 per person (Cafés and local restaurants)\n",
       "  - **Transportation**: €10-€20 per person (Train or bike rental)\n",
       "  - **Activities**: €15-€50 per person (Entry to museums, parks, and attractions)\n",
       "\n",
       "- **Total Estimated Cost**:\n",
       "  - **Solo Traveler (5-day trip)**: Approx. €500-€1,250\n",
       "    - Accommodation: €250-€1,000\n",
       "    - Daily expenses: €250-€500\n",
       "\n",
       "  - **Family of Four (5-day trip)**: Approx. €1,800-€3,500\n",
       "    - Accommodation: €500-€1,500\n",
       "    - Daily expenses: €1,300-€2,000\n",
       "\n",
       "#### Best Time to Visit\n",
       "- **Peak Season (June-August)**:\n",
       "  - **Pros**: Warm weather, lively festivals, and vibrant outdoor activities.\n",
       "  - **Cons**: Crowded tourist spots and higher prices.\n",
       "\n",
       "- **Shoulder Season (April-May & September-October)**:\n",
       "  - **Pros**: Mild weather, stunning tulip blooms (April), fewer crowds, and lower prices.\n",
       "  - **Cons**: Possible rain and some attractions may have reduced hours.\n",
       "\n",
       "- **Off-Peak Season (November-March)**:\n",
       "  - **Pros**: Lower prices, festive holiday vibes, fewer tourists.\n",
       "  - **Cons**: Cold and wet weather which might limit outdoor activities.\n",
       "\n",
       "#### Hidden Gems\n",
       "1. **Giethoorn**: Often called the \"Venice of the North,\" Giethoorn is a picturesque village without roads. Rent a \"whisper boat\" for a serene experience gliding through the canals and enjoy the quaint thatched-roof houses.\n",
       "\n",
       "2. **Zaanse Schans**: Located near Amsterdam, this charming neighborhood showcases traditional Dutch windmills, wooden houses, and artisan workshops. Spend a day wandering and even tour a functioning windmill.\n",
       "\n",
       "3. **Haarlem**: Only 15 minutes from Amsterdam, Haarlem is a historic city with stunning architecture, cozy cafés, and the impressive Frans Hals Museum that houses works from the Dutch Golden Age.\n",
       "\n",
       "4. **Edam**: Famous for its cheese, the lovely town of Edam invites you to taste samples at local markets and explore cobbled streets lined with historical buildings. Don’t miss the Edam Museum for a taste of local history.\n",
       "\n",
       "5. **Kinderdijk**: A UNESCO World Heritage site known for its iconic windmills, Kinderdijk offers a scenic bike ride and walking trails amidst the charming countryside. Visiting at sunset can be particularly romantic.\n",
       "\n",
       "#### Local Tips\n",
       "- **Customs and Etiquette**: The Dutch are known for being direct but polite. Keep conversations respectful and avoid raising your voice. It’s customary to greet people with a handshake or a friendly smile.\n",
       "\n",
       "- **Local Dishes to Try**: \n",
       "  - **Stroopwafels**: A beloved Dutch treat; find them fresh from markets.\n",
       "  - **Haring**: Raw herring fish served with onions and pickles; try it at local fish stalls in Amsterdam.\n",
       "  - **Bitterballen**: A popular Dutch snack; pair them with a local beer at a cozy café.\n",
       "  - **Poffertjes**: Small fluffy pancakes, perfect for sharing as a dessert or snack; find them at street vendors or markets.\n",
       "\n",
       "By choosing the Netherlands as your travel destination, you will immerse yourselves in a tapestry of art, history, and picturesque landscapes while creating unforgettable memories. Happy travels!"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_result()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c66b461d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llms",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
132
week1/community-contributions/day1 email checker.ipynb
Normal file
@@ -0,0 +1,132 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82b3f7d7-a628-4824-b0b5-26c78b833b7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "# If you get an error running this cell, then please head over to the troubleshooting notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bb45eea-2ae0-4550-a9c8-fb42ff6a5f55",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key looks good!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a10c24ce-c334-4424-8a2d-ae79ad3eb393",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()\n",
    "\n",
    "# working on the assumption that this is OK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9dff1ca-4e0a-44ca-acd6-0bc4004ffc3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 1: Create your prompts\n",
    "# As you can probably tell, I am a university lecturer who deals with some dreadful assessment submissions and has to email students telling them why they got the marks they did.\n",
    "# This AI assistant would help immensely, as we could write what we want to say and let the reviewer fix it for us!\n",
    "# It is based on the day1_email_reviewer notebook\n",
    "# MV\n",
    "\n",
    "system_prompt = \"You are an AI email reviewer that checks the content of emails sent out to higher education undergraduate students. You must identify the meaning of the context in the text given, run a spell check in UK English, provide the subject line and email, and rewrite to make the email more professional. At the end of the text, please provide the tone info.\"\n",
    "\n",
    "# Note: the spelling and grammar mistakes in the email below are deliberate - they are the specimen for the reviewer to fix\n",
    "user_prompt = \"\"\"\n",
    "Dear John,\n",
    "You asked for the reasons why you received the marks that you did in your recently submitted assessment. I have looked over your submission again, bear in mind the fact that you are only 1 student out of a cohort of over 350 and have nagged me for a quick response, and your work was awful.\n",
    "You submitted work of an appalling standard and obvously did not actually put much work into your submission, you were givben the chance to have feedback on what you were going to submit but you could not be bothered to get this feedback.\n",
    "You did not bother to turn up to many of the lessons then submitted work with the most basic errors that anyone who had put the right level of effort into their studies would have been able to identify easily and not had such a low mark when they submitted. I think I put more work into marking this rubbish than you did in writing it.\n",
    "\n",
    "Best regards,\n",
    "Dr Doe\n",
    "\"\"\"\n",
    "\n",
    "# Step 2: Make the messages list\n",
    "\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_prompt},\n",
    "    {\"role\": \"user\", \"content\": user_prompt}\n",
    "]\n",
    "\n",
    "# Step 3: Call OpenAI\n",
    "\n",
    "response = openai.chat.completions.create(\n",
    "    model=\"gpt-4o-mini\",\n",
    "    messages=messages\n",
    ")\n",
    "\n",
    "# Step 4: print the result\n",
    "\n",
    "display(Markdown(response.choices[0].message.content))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a65b65f-4b3f-41f5-894a-0f8e81f0ba27",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,234 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "# If you get an error running this cell, then please head over to the troubleshooting notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()\n",
    "\n",
    "# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
    "# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abdb8417-c5dc-44bc-9bee-2e059d162699",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'\n",
    "\n",
    "system_prompt = \"You are a high-profile professional resume analyst who assists users by highlighting gaps in a poorly formed resume and provides direction to make the resume eye-catching to recruiters \\\n",
    "and employers.\"\n",
    "\n",
    "user_prompt = \"\"\"Analyze the resume details to do the following: \\\n",
    "1. Assess the resume to highlight areas of improvement. \\\n",
    "2. Create a well formed resume.\n",
    "\n",
    "Name: Sam Burns\n",
    "\n",
    "PROFESSIONAL SUMMARY\n",
    "Experienced Data and AI Architect with over 10 years of expertise designing scalable data platforms, integrating cloud-native solutions, and deploying AI/ML systems across enterprise environments. Proven track record of aligning data architecture with business strategy, leading cross-functional teams, and delivering high-impact AI-driven insights.\n",
    "\n",
    "CORE SKILLS\n",
    "\n",
    "Data Architecture: Lakehouse, Data Mesh, Delta Lake, Data Vault\n",
    "\n",
    "Cloud Platforms: Azure (Data Factory, Synapse, ML Studio), AWS (S3, Glue, SageMaker), Databricks\n",
    "\n",
    "Big Data & Streaming: Spark, Kafka, Hive, Hadoop\n",
    "\n",
    "ML/AI Tooling: MLflow, TensorFlow, Scikit-learn, Hugging Face Transformers\n",
    "\n",
    "Programming: Python, SQL, PySpark, Scala, Terraform\n",
    "\n",
    "DevOps: CI/CD (GitHub Actions, Azure DevOps), Docker, Kubernetes\n",
    "\n",
    "Governance: Data Lineage, Cataloging, RBAC, GDPR, Responsible AI\n",
    "\n",
    "PROFESSIONAL EXPERIENCE\n",
    "\n",
    "Senior Data & AI Architect\n",
    "ABC Tech Solutions — New York, NY\n",
    "Jan 2021 – Present\n",
    "\n",
    "Designed and implemented a company-wide lakehouse architecture on Databricks, integrating AWS S3, Redshift, and real-time ingestion from Kafka.\n",
    "\n",
    "Led architecture for a predictive maintenance platform using sensor data (IoT), Spark streaming, and MLflow-managed experiments.\n",
    "\n",
    "Developed enterprise ML governance framework ensuring reproducibility, fairness, and compliance with GDPR.\n",
    "\n",
    "Mentored 6 data engineers and ML engineers; led architectural reviews and technical roadmap planning.\n",
    "\n",
    "Data Architect / AI Specialist\n",
    "Global Insights Inc. — Boston, MA\n",
    "Jun 2017 – Dec 2020\n",
    "\n",
    "Modernized legacy data warehouse to Azure Synapse-based analytics platform, reducing ETL latency by 40%.\n",
    "\n",
    "Built MLOps pipelines for customer churn prediction models using Azure ML and ADF.\n",
    "\n",
    "Collaborated with business units to define semantic layers for self-service analytics in Power BI.\n",
    "\n",
    "Data Engineer\n",
    "NextGen Analytics — Remote\n",
    "Jul 2013 – May 2017\n",
    "\n",
    "Developed ETL pipelines in PySpark to transform raw web traffic into structured analytics dashboards.\n",
    "\n",
    "Integrated NLP models into customer support workflows using spaCy and early versions of Hugging Face.\n",
    "\n",
    "Contributed to open-source tools for Jupyter-based analytics and data catalog integration.\n",
    "\n",
    "EDUCATION\n",
    "M.S. in Computer Science – Carnegie Mellon University\n",
    "B.S. in Information Systems – Rutgers University\n",
    "\n",
    "CERTIFICATIONS\n",
    "\n",
    "Databricks Certified Data Engineer Professional\n",
    "\n",
    "Azure Solutions Architect Expert\n",
    "\n",
    "AWS Certified Machine Learning – Specialty\n",
    "\n",
    "PROJECTS & CONTRIBUTIONS\n",
    "\n",
    "llm_engineering (GitHub): Developed and maintained hands-on LLM course materials and community contributions framework.\n",
    "\n",
    "Real-time AI PoC: Designed Kafka-Spark pipeline with Azure OpenAI Service for anomaly detection on IoT streams.\n",
    "\n",
    "Contributor to Hugging Face Transformers – integration examples for inference pipelines\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e6a8730-c3ad-4243-a045-0acba2b5ebcf",
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_prompt},\n",
    "    {\"role\": \"user\", \"content\": user_prompt}\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21ed95c5-7001-47de-a36d-1d6673b403ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "# To give you a preview -- calling OpenAI with system and user messages:\n",
    "\n",
    "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d926d59-450e-4609-92ba-2d6f244f1342",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A function to display this nicely in the Jupyter output, using markdown\n",
    "\n",
    "display(Markdown(response.choices[0].message.content))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeab24dc-5f90-4570-b542-b0585aca3eb6",
   "metadata": {},
   "source": [
    "# Sharing your code\n",
    "\n",
    "I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.\n",
    "\n",
    "If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clean outputs of all cells, and then Save) for clean notebooks.\n",
    "\n",
    "Here are good instructions courtesy of an AI friend: \n",
    "https://chatgpt.com/share/677a9cb5-c64c-8012-99e0-e06e88afd293"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4484fcf-8b39-4c3f-9674-37970ed71988",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,60 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c4ce468",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "from openai import OpenAI\n",
    "\n",
    "openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
    "\n",
    "# Step 1: Create your prompts\n",
    "\n",
    "system_prompt = \"You are a sports journalist.\"\n",
    "user_prompt = \"\"\"\n",
    "Write a sports article in less than 500 words describing the FIFA World Cup Final 2022.\n",
    "\"\"\"\n",
    "\n",
    "# Step 2: Make the messages list\n",
    "\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_prompt},\n",
    "    {\"role\": \"user\", \"content\": user_prompt}\n",
    "]\n",
    "\n",
    "# Step 3: Call OpenAI\n",
    "\n",
    "response = openai.chat.completions.create(model=\"llama3.2\", messages=messages)\n",
    "\n",
    "# Step 4: print the result\n",
    "\n",
    "print(response.choices[0].message.content)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llms",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
180
week1/community-contributions/day1_fitness_fun.ipynb
Normal file
@@ -0,0 +1,180 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "638074cc-212f-4d03-8518-ad6b3233d6ca",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
   "metadata": {},
   "source": [
    "# Some Fitness Fun\n",
    "\n",
    "## Let's Get Pumped!\n",
    "\n",
    "Since I'm interested in fitness as well as software engineering, I decided to have a little fun with this\n",
    "based on an old SNL skit.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "15144b50-99e3-479f-8247-b79e0fcdba76",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API key found and looks good so far!\n"
     ]
    }
   ],
   "source": [
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n",
    "\n",
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "# If you get an error running this cell, then please head over to the troubleshooting notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "00743dac-0e70-45b7-879a-d7293a6f68a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "# Hey Arnold, Time to Get Those \"Goals\" Sorted Out! 💪\n",
       "\n",
       "Well, well, well! Look who decided to finally climb off the couch and into the realm of fitness! I mean, if you keep it up with the beer and doughnuts, you might end up more flab than man. Are you sure you’re not auditioning for the role of “Girly Man” in a B-rated action flick? \n",
       "\n",
       "## Here’s the Game Plan: \n",
       "\n",
       "### 1. **Ditch the Doughnuts (and the beer, and the pasta...)**\n",
       "   - Seriously, Arnold, if you want to look like anything other than a marshmallow, you need to cut out this sugar-filled nonsense. Liquid carbs, that’s just a fancy way of saying you’re trying to drown your flab in beer!\n",
       "\n",
       "### 2. **Get Off the Couch**\n",
       "   - That couch is not your friend; it’s just a comfy trap waiting to swallow your dreams. Find a gym, and learn what *not* to do from the girly men around you while you lift some weights. Spoiler alert: they probably will lift more than you do!\n",
       "\n",
       "### 3. **Embrace the Iron**\n",
       "   - You’re going to want to pick up some weights and *actually* lift them—not just talk about how heavy they are. Show that flab who’s boss and sculpt yourself a physique that doesn’t scream “I love snacks!”\n",
       "\n",
       "### 4. **Train Like You Mean It**\n",
       "   - Start with a solid workout routine. Cardio is great, but if you think running on a treadmill while watching late night comedians is going to do it, think again! Train hard or go home, buddy!\n",
       "\n",
       "### 5. **Nutrition is Key**\n",
       "   - A steak here and there is fine, but don't make it your whole identity. Toss in some vegetables, lean proteins, and *gasp* maybe squeeze in a salad! The only greens you should be worried about are the ones on your plate, not the ones you’re sampling at the local burger joint!\n",
|
||||
"\n",
|
||||
"### 6. **Set Real Goals**\n",
|
||||
" - Lastly, figure out what you actually want. Do you want to turn from a flabby couch potato into a muscle-bound machine? Or do you want to stay an eternal “girly man”? Because we can make you into a beast, but you’ve got to want it!\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"So, are you ready to say “hasta la vista” to your old lifestyle? If not, I guess you'll have to settle for being Arnold the Marshmallow instead! Let's get to work! 💪😎"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.Markdown object>"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Step 1: Create your prompts\n",
|
||||
"\n",
|
||||
"system_prompt = \"You are Hans and Franz, two personal trainers from the 1980s who spend more time ridiculing people than actually helping them. \\\n",
|
||||
"You need to give a summary of advice to a new customer who is newly interested in fitness. Be snarky and be sure to mention flab and girly men.\\\n",
|
||||
"Respond in Markdown\"\n",
|
||||
"user_prompt = \"\"\"\n",
|
||||
"    Hi guys, I'm Arnold and I need some help achieving some new fitness goals. I love beer, pasta, doughnuts, and a good steak.\n",
|
||||
" I also like sitting on the couch and watching late night comedy shows\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# Step 2: Make the messages list\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" { \"role\": \"system\", \"content\": system_prompt},\n",
|
||||
" { \"role\": \"user\", \"content\": user_prompt}\n",
|
||||
"] \n",
|
||||
"\n",
|
||||
"# Step 3: Call OpenAI\n",
|
||||
"\n",
|
||||
"raw_response = openai.chat.completions.create(\n",
|
||||
" model = \"gpt-4o-mini\",\n",
|
||||
" messages = messages\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Step 4: print the result\n",
|
||||
"response = raw_response.choices[0].message.content\n",
|
||||
"display(Markdown(response))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5004ed3a-dd29-4a56-a182-dc531452a88a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
245
week1/community-contributions/day1_narrate_football_game.ipynb
Normal file
@@ -0,0 +1,245 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "31d3c4a4-5442-4074-b812-42d60e0a0c04",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:29.195103Z",
|
||||
"start_time": "2025-04-26T11:54:29.192394Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# In this example we read a football (soccer) game's stats and create a narration of the game as if we were hosting a podcast\n",
|
||||
"# use this website as an example: https://understat.com/match/27683"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "cf45e9d5-4913-416c-9880-5be60a96c0e6",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:30.218768Z",
|
||||
"start_time": "2025-04-26T11:54:30.215752Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import requests\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"from bs4 import BeautifulSoup\n",
|
||||
"from openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "af8fea69-60aa-430c-a16c-8757b487e07a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:31.218616Z",
|
||||
"start_time": "2025-04-26T11:54:31.214154Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"load_dotenv(override=True)\n",
|
||||
"api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"\n",
|
||||
"# Check the key\n",
|
||||
"\n",
|
||||
"if not api_key:\n",
|
||||
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
|
||||
"elif not api_key.startswith(\"sk-proj-\"):\n",
|
||||
" print(\n",
|
||||
" \"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
|
||||
"elif api_key.strip() != api_key:\n",
|
||||
" print(\n",
|
||||
" \"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
|
||||
"else:\n",
|
||||
" print(\"API key found and looks good so far!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "daee94d2-f82b-43f0-95d1-15370eda1bc7",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:32.216785Z",
|
||||
"start_time": "2025-04-26T11:54:32.183600Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai = OpenAI()\n",
|
||||
"url = \"https://understat.com/match/27683\"\n",
|
||||
"\n",
|
||||
"# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
|
||||
"# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0712dd1d-b6bc-41c6-84ec-d965f696f7aa",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:33.025841Z",
|
||||
"start_time": "2025-04-26T11:54:33.023289Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_prompt = (\"You are a football (soccer) analyst. You are used to reading the stats of football \\\n",
|
||||
" games and extract relevant information. You are asked to be a podcast host and \\\n",
|
||||
" you need to create a narration of the game based on the stats you read and based \\\n",
|
||||
" on the play by play moves (the one with minutes upfront). You're talking to the \\\n",
|
||||
"                 general audience, so try to use easy language and do not be too telegraphic\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "70c972a6-8af6-4ff2-a338-6d7ba90e2045",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:33.730097Z",
|
||||
"start_time": "2025-04-26T11:54:33.725360Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Some websites need you to use proper headers when fetching them:\n",
|
||||
"headers = {\n",
|
||||
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"class Website:\n",
|
||||
" def __init__(self, url):\n",
|
||||
" \"\"\"\n",
|
||||
" Create this Website object from the given url using the BeautifulSoup library\n",
|
||||
" \"\"\"\n",
|
||||
" self.url = url\n",
|
||||
" response = requests.get(url, headers=headers)\n",
|
||||
" soup = BeautifulSoup(response.content, 'html.parser')\n",
|
||||
" self.title = soup.title.string if soup.title else \"No title found\"\n",
|
||||
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
|
||||
" irrelevant.decompose()\n",
|
||||
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4ccc1ba81c76ffb9",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:40.042357Z",
|
||||
"start_time": "2025-04-26T11:54:40.040384Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def create_user_prompt(game):\n",
|
||||
"    user_prompt = f\"You are looking at the {game.title} football game\"\n",
|
||||
" user_prompt += \"\\nThis is the entire webpage of the game \\\n",
|
||||
" Please provide a narration of the game in markdown. \\\n",
|
||||
"    Focus only on what happened in the game and its stats, and ignore the standings and anything else.\\n\\n\"\n",
|
||||
" user_prompt += game.text\n",
|
||||
" return user_prompt\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e729956758b4d7b5",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:40.699042Z",
|
||||
"start_time": "2025-04-26T11:54:40.696698Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "82b71c1a-895a-48e7-a945-13e615bb0096",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:41.316244Z",
|
||||
"start_time": "2025-04-26T11:54:41.314110Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define messages with system_prompt and user_prompt\n",
|
||||
"def messages_for(system_prompt_input, user_prompt_input):\n",
|
||||
" return [\n",
|
||||
" {\"role\": \"system\", \"content\": system_prompt_input},\n",
|
||||
" {\"role\": \"user\", \"content\": user_prompt_input}\n",
|
||||
" ]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "854dc42e-2bbd-493b-958f-c20484908300",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:55.239164Z",
|
||||
"start_time": "2025-04-26T11:54:41.987168Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# And now: call the OpenAI API.\n",
|
||||
"game = Website(url)\n",
|
||||
"\n",
|
||||
"response = openai.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=messages_for(system_prompt, create_user_prompt(game))\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Response is provided in Markdown and displayed accordingly\n",
|
||||
"display(Markdown(response.choices[0].message.content))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "758d2cbe-0f80-4572-8724-7cba77f701dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,499 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# YOUR FIRST LAB\n",
|
||||
"### Please read this section. This is valuable to get you prepared, even if it's a long read -- it's important stuff.\n",
|
||||
"\n",
|
||||
"## Your first Frontier LLM Project\n",
|
||||
"\n",
|
||||
"Let's build a useful LLM solution - in a matter of minutes.\n",
|
||||
"\n",
|
||||
"By the end of this course, you will have built an autonomous Agentic AI solution with 7 agents that collaborate to solve a business problem. All in good time! We will start with something smaller...\n",
|
||||
"\n",
|
||||
"Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!\n",
|
||||
"\n",
|
||||
"Before starting, you should have completed the setup for [PC](../SETUP-PC.md) or [Mac](../SETUP-mac.md) and you hopefully launched this jupyter lab from within the project root directory, with your environment activated.\n",
|
||||
"\n",
|
||||
"## If you're new to Jupyter Lab\n",
|
||||
"\n",
|
||||
"Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each \"cell\" with code in it, such as the cell immediately below this text, and hit Shift+Return to execute that cell. As you wish, you can add a cell with the + button in the toolbar, and print values of variables, or try out variations. \n",
|
||||
"\n",
|
||||
"I've written a notebook called [Guide to Jupyter](Guide%20to%20Jupyter.ipynb) to help you get more familiar with Jupyter Labs, including adding Markdown comments, using `!` to run shell commands, and `tqdm` to show progress.\n",
|
||||
"\n",
|
||||
"## If you're new to the Command Line\n",
|
||||
"\n",
|
||||
"Please see these excellent guides: [Command line on PC](https://chatgpt.com/share/67b0acea-ba38-8012-9c34-7a2541052665) and [Command line on Mac](https://chatgpt.com/canvas/shared/67b0b10c93a081918210723867525d2b). \n",
|
||||
"\n",
|
||||
"## If you'd prefer to work in IDEs\n",
|
||||
"\n",
|
||||
"If you're more comfortable in IDEs like VSCode or Pycharm, they both work great with these lab notebooks too. \n",
|
||||
"If you'd prefer to work in VSCode, [here](https://chatgpt.com/share/676f2e19-c228-8012-9911-6ca42f8ed766) are instructions from an AI friend on how to configure it for the course.\n",
|
||||
"\n",
|
||||
"## If you'd like to brush up your Python\n",
|
||||
"\n",
|
||||
"I've added a notebook called [Intermediate Python](Intermediate%20Python.ipynb) to get you up to speed. But you should give it a miss if you already have a good idea what this code does: \n",
|
||||
"`yield from {book.get(\"author\") for book in books if book.get(\"author\")}`\n",
|
||||
"\n",
|
||||
"## I am here to help\n",
|
||||
"\n",
|
||||
"If you have any problems at all, please do reach out. \n",
|
||||
"I'm available through the platform, or at ed@edwarddonner.com, or at https://www.linkedin.com/in/eddonner/ if you'd like to connect (and I love connecting!) \n",
|
||||
"And this is new to me, but I'm also trying out X/Twitter at [@edwarddonner](https://x.com/edwarddonner) - if you're on X, please show me how it's done 😂 \n",
|
||||
"\n",
|
||||
"## More troubleshooting\n",
|
||||
"\n",
|
||||
"Please see the [troubleshooting](troubleshooting.ipynb) notebook in this folder to diagnose and fix common problems. At the very end of it is a diagnostics script with some useful debug info.\n",
|
||||
"\n",
|
||||
"## If this is old hat!\n",
|
||||
"\n",
|
||||
"If you're already comfortable with today's material, please hang in there; you can move swiftly through the first few labs - we will get much more in depth as the weeks progress.\n",
|
||||
"\n",
|
||||
"<table style=\"margin: 0; text-align: left;\">\n",
|
||||
" <tr>\n",
|
||||
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
|
||||
" <img src=\"../important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <h2 style=\"color:#900;\">Please read - important note</h2>\n",
|
||||
" <span style=\"color:#900;\">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations. If you have a Github account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...</span>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</table>\n",
|
||||
"<table style=\"margin: 0; text-align: left;\">\n",
|
||||
" <tr>\n",
|
||||
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
|
||||
" <img src=\"../resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <h2 style=\"color:#f71;\">Treat these labs as a resource</h2>\n",
|
||||
" <span style=\"color:#f71;\">I push updates to the code regularly. When people ask questions or have problems, I incorporate it in the code, adding more examples or improved commentary. As a result, you'll notice that the code below isn't identical to the videos. Everything from the videos is here; but in addition, I've added more steps and better explanations, and occasionally added new models like DeepSeek. Consider this like an interactive book that accompanies the lectures.\n",
|
||||
" </span>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</table>\n",
|
||||
"<table style=\"margin: 0; text-align: left;\">\n",
|
||||
" <tr>\n",
|
||||
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
|
||||
" <img src=\"../business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <h2 style=\"color:#181;\">Business value of these exercises</h2>\n",
|
||||
" <span style=\"color:#181;\">A final thought. While I've designed these notebooks to be educational, I've also tried to make them enjoyable. We'll do fun things like have LLMs tell jokes and argue with each other. But fundamentally, my goal is to teach skills you can apply in business. I'll explain business implications as we go, and it's worth keeping this in mind: as you build experience with models and techniques, think of ways you could put this into action at work today. Please do contact me if you'd like to discuss more or if you have ideas to bounce off me.</span>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</table>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"import requests\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from bs4 import BeautifulSoup\n",
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"from openai import OpenAI\n",
|
||||
"\n",
|
||||
"# If you get an error running this cell, then please head over to the troubleshooting notebook!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6900b2a8-6384-4316-8aaa-5e519fca4254",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Connecting to OpenAI (or Ollama)\n",
|
||||
"\n",
|
||||
"The next cell is where we load in the environment variables in your `.env` file and connect to OpenAI. \n",
|
||||
"\n",
|
||||
"If you'd like to use free Ollama instead, please see the README section \"Free Alternative to Paid APIs\", and if you're not sure how to do this, there's a full solution in the solutions folder (day1_with_ollama.ipynb).\n",
|
||||
"\n",
|
||||
"## Troubleshooting if you have problems:\n",
|
||||
"\n",
|
||||
"Head over to the [troubleshooting](troubleshooting.ipynb) notebook in this folder for step by step code to identify the root cause and fix it!\n",
|
||||
"\n",
|
||||
"If you make a change, try restarting the \"Kernel\" (the python process sitting behind this notebook) by Kernel menu >> Restart Kernel and Clear Outputs of All Cells. Then try this notebook again, starting at the top.\n",
|
||||
"\n",
|
||||
"Or, contact me! Message me or email ed@edwarddonner.com and we will get this to work.\n",
|
||||
"\n",
|
||||
"Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point. You can also use Ollama as a free alternative, which we discuss during Day 2."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load environment variables in a file called .env\n",
|
||||
"\n",
|
||||
"load_dotenv(override=True)\n",
|
||||
"api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"\n",
|
||||
"# Check the key\n",
|
||||
"\n",
|
||||
"if not api_key:\n",
|
||||
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
|
||||
"elif not api_key.startswith(\"sk-proj-\"):\n",
|
||||
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
|
||||
"elif api_key.strip() != api_key:\n",
|
||||
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
|
||||
"else:\n",
|
||||
" print(\"API key found and looks good so far!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\n",
|
||||
"\n",
|
||||
"# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
|
||||
"# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "442fc84b-0815-4f40-99ab-d9a5da6bda91",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Let's make a quick call to a Frontier model to get started, as a preview!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c951be1a-7f1b-448f-af1f-845978e47e2c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<table style=\"margin: 0; text-align: left;\">\n",
|
||||
" <tr>\n",
|
||||
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
|
||||
" <img src=\"../business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <h2 style=\"color:#181;\">Business applications</h2>\n",
|
||||
" <span style=\"color:#181;\">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.\n",
|
||||
"\n",
|
||||
"More specifically, we've applied this to Summarization - a classic Gen AI use case to make a summary. This can be applied to any business vertical - summarizing the news, summarizing financial performance, summarizing a resume in a cover letter - the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</table>\n",
|
||||
"\n",
|
||||
"<table style=\"margin: 0; text-align: left;\">\n",
|
||||
" <tr>\n",
|
||||
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
|
||||
" <img src=\"../important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <h2 style=\"color:#900;\">Before you continue - now try yourself</h2>\n",
|
||||
" <span style=\"color:#900;\">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</table>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "00743dac-0e70-45b7-879a-d7293a6f68a6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Step 1: Create your prompts\n",
|
||||
"\n",
|
||||
"system_prompt = \"You're an AI assistant who suggests subject lines for the given email content \\\n",
|
||||
"    by ignoring greetings, sign-offs, and other irrelevant text. You suggest the 5 best subject lines, starting with the best fitting.\"\n",
|
||||
"user_prompt = \"\"\"\n",
|
||||
"    Suggest 5 subject lines for the given email content in markdown. \\\n",
|
||||
" Give the fit percentage of each subject line as well. \\\n",
|
||||
"    Give the tone of the mail, action items, and purpose of the mail.\\n\\n\"\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# Step 2: Make the messages list\n",
|
||||
"\n",
|
||||
"messages = [\"\"\"Dear Sir/Madam,\n",
|
||||
"\n",
|
||||
"I am Ankit Kumari, currently pursuing my Online MCA from Lovely Professional University. I am writing this email to express my concern regarding the scheduling of the online classes for the current semester.\n",
|
||||
"\n",
|
||||
"During the time of admission, it was conveyed to us that the online classes for the program would be conducted on weekends to ensure that working professionals like me can easily manage their work and studies. However, to my surprise, the classes for this semester have been scheduled on weekdays, which is not convenient for students who are working or have businesses.\n",
|
||||
"\n",
|
||||
"As a working professional, I find it difficult to balance my job responsibilities and attend the classes regularly on weekdays. Similarly, there are many students who are facing a similar issue. Therefore, I would like to request you to kindly reschedule the classes and conduct them on weekends as was initially promised during the admission process.\n",
|
||||
"\n",
|
||||
"I believe that conducting the classes on weekends would help students like me to balance their work and studies in a better way, and would also result in better attendance and improved learning outcomes.\n",
|
||||
"\n",
|
||||
"I hope that my request would be taken into consideration, and appropriate steps would be taken to ensure that the classes are conducted on weekends as promised during the admission process.\n",
|
||||
"\n",
|
||||
"Thank you for your understanding.\n",
|
||||
"\n",
|
||||
"Sincerely,\n",
|
||||
"\n",
|
||||
"Ankit Kumar \"\"\",\n",
|
||||
"\"\"\"Hi team,\n",
|
||||
"It is to inform you that I've studied computer science in my graduation, i.e. BSc Physical Science with Computer Science, but I'm still seeing bridge courses, i.e. ecap010 and acap011, in my timetable.\n",
|
||||
"Therefore, I kindly request you to look into this matter.\n",
|
||||
"\n",
|
||||
"Best Regards\n",
|
||||
"Ankit Kumar\n",
|
||||
"\"\"\",] # fill this in\n",
|
||||
"\n",
|
||||
"# Step 3: Call OpenAI\n",
|
||||
"\n",
|
||||
"responses = [openai.chat.completions.create(\n",
|
||||
" model=\"llama3.2\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": system_prompt},\n",
|
||||
" {\"role\": \"user\", \"content\": user_prompt+message},\n",
|
||||
" ]\n",
|
||||
") for message in messages\n",
|
||||
"]\n",
|
||||
"# Step 4: print the result\n",
|
||||
"responses = [response.choices[0].message.content for response in responses]\n",
|
||||
"for response in responses:\n",
|
||||
" display(Markdown(response))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36ed9f14-b349-40e9-a42c-b367e77f8bda",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## An extra exercise for those who enjoy web scraping\n",
|
||||
"\n",
|
||||
"You may notice that if you try `display_summary(\"https://openai.com\")` - it doesn't work! That's because OpenAI has a fancy website that uses Javascript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eeab24dc-5f90-4570-b542-b0585aca3eb6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Sharing your code\n",
|
||||
"\n",
|
||||
"I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.\n",
|
||||
"\n",
|
||||
"If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clean outputs of all cells, and then Save) for clean notebooks.\n",
|
||||
"\n",
|
||||
"Here are good instructions courtesy of an AI friend: \n",
|
||||
"https://chatgpt.com/share/677a9cb5-c64c-8012-99e0-e06e88afd293"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "175ca116",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from selenium import webdriver\n",
|
||||
"from selenium.webdriver.chrome.options import Options\n",
|
||||
"from selenium.webdriver.chrome.service import Service\n",
|
||||
"from selenium.webdriver.support.ui import WebDriverWait\n",
|
||||
"from selenium.webdriver.support import expected_conditions as EC\n",
|
||||
"from selenium.webdriver.common.by import By\n",
|
||||
"from bs4 import BeautifulSoup\n",
|
||||
"from openai import OpenAI\n",
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"import platform\n",
|
||||
"\n",
|
||||
"class JSWebsite:\n",
|
||||
" def __init__(self, url, model=\"llama3.2\", headless=True, wait_time=5):\n",
|
||||
" \"\"\"\n",
|
||||
" @Param url: The URL of the website to scrape\n",
|
||||
" @Param model: The model to use for summarization. Valid values are \"gpt-4o-mini\" and \"llama3.2\"\n",
|
||||
" @Param headless: Whether to run the browser in headless mode\n",
|
||||
" @Param wait_time: Additional seconds to wait for JavaScript content to load\n",
|
||||
" \"\"\"\n",
|
||||
" self.url = url\n",
|
||||
" self.model = model\n",
|
||||
" self.wait_time = wait_time\n",
|
||||
" \n",
|
||||
" # Validate model choice\n",
|
||||
" assert model in [\"gpt-4o-mini\", \"llama3.2\"], f\"Invalid model: {model}. Valid models are 'gpt-4o-mini' and 'llama3.2'.\"\n",
|
||||
" \n",
|
||||
" # Initialize appropriate API client\n",
|
||||
" if \"gpt\" in model:\n",
|
||||
" self.openai = OpenAI()\n",
|
||||
" elif \"llama\" in model:\n",
|
||||
" self.openai = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\n",
|
||||
" \n",
|
||||
" # Set up Chrome options with platform-specific settings\n",
|
||||
" options = Options()\n",
|
||||
" \n",
|
||||
" if headless:\n",
|
||||
" # Use appropriate headless setting based on platform\n",
|
||||
" if platform.system() == \"Darwin\": # macOS\n",
|
||||
" options.add_argument(\"--headless=chrome\") # macOS-friendly headless mode\n",
|
||||
" else:\n",
|
||||
" options.add_argument(\"--headless=new\") # Modern headless for other platforms\n",
|
||||
" \n",
|
||||
" # These settings help with headless JavaScript rendering\n",
|
||||
" options.add_argument(\"--disable-web-security\")\n",
|
||||
" options.add_argument(\"--allow-running-insecure-content\")\n",
|
||||
" options.add_argument(\"--disable-setuid-sandbox\")\n",
|
||||
" \n",
|
||||
" # Add a user agent to look more like a real browser\n",
|
||||
" # options.add_argument(\"--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36\")\n",
|
||||
" # options.add_argument(\"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.7103.49 Safari/537.36\")\n",
|
||||
" options.add_argument(\"--user-agent=Mozilla/5.0 (Macintosh; Apple Silicon Mac OS X 14_3_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Chrome/136.0.7103.49 Safari/537.36\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" \n",
|
||||
" options.add_argument(\"--disable-gpu\")\n",
|
||||
" options.add_argument(\"--window-size=1920,1080\")\n",
|
||||
" options.add_argument(\"--disable-blink-features=AutomationControlled\")\n",
|
||||
" options.add_argument(\"--disable-infobars\")\n",
|
||||
" options.add_argument(\"--disable-extensions\")\n",
|
||||
" options.add_argument(\"--start-maximized\")\n",
|
||||
" options.add_argument(\"--no-sandbox\")\n",
|
||||
" options.add_argument(\"--disable-dev-shm-usage\")\n",
|
||||
" \n",
|
||||
" try:\n",
|
||||
" # Initialize Chrome driver\n",
|
||||
" driver = webdriver.Chrome(options=options)\n",
|
||||
" driver.get(url)\n",
|
||||
" \n",
|
||||
" # Wait for the page to load\n",
|
||||
" WebDriverWait(driver, 10).until(\n",
|
||||
" EC.presence_of_element_located((By.TAG_NAME, \"body\"))\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" # Get the page source and close the browser\n",
|
||||
" html = driver.page_source\n",
|
||||
" driver.quit()\n",
|
||||
" \n",
|
||||
" # Parse HTML with BeautifulSoup\n",
|
||||
" soup = BeautifulSoup(html, 'html.parser')\n",
|
||||
" self.title = soup.title.string if soup.title else \"No title found\"\n",
|
||||
" \n",
|
||||
" # Remove irrelevant elements\n",
|
||||
" if soup.body:\n",
|
||||
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
|
||||
" irrelevant.decompose()\n",
|
||||
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
|
||||
" # Check if content is too short, which might indicate loading issues\n",
|
||||
" if len(self.text.strip()) < 100:\n",
|
||||
" self.has_content_error = True\n",
|
||||
" print(\"Warning: Page content seems too short or empty\")\n",
|
||||
" else:\n",
|
||||
" self.has_content_error = False\n",
|
||||
" else:\n",
|
||||
" self.text = \"No body content found\"\n",
|
||||
" self.has_content_error = True\n",
|
||||
" \n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"Error processing page: {e}\")\n",
|
||||
" self.title = \"Error loading page\"\n",
|
||||
" self.text = f\"Failed to process page: {str(e)}\"\n",
|
||||
" self.has_content_error = True\n",
|
||||
"\n",
|
||||
" def summarize(self):\n",
|
||||
" \"\"\"Generate a summary of the website content using the specified AI model.\"\"\"\n",
|
||||
" # Check if page was loaded with errors\n",
|
||||
" if hasattr(self, 'has_content_error') and self.has_content_error:\n",
|
||||
" self.summary = \"Cannot summarize due to page loading or content errors.\"\n",
|
||||
" return self.summary\n",
|
||||
" \n",
|
||||
" try:\n",
|
||||
" response = self.openai.chat.completions.create(\n",
|
||||
" model=self.model,\n",
|
||||
" messages=self.messages_for()\n",
|
||||
" )\n",
|
||||
" self.summary = response.choices[0].message.content\n",
|
||||
" return self.summary\n",
|
||||
" except Exception as e:\n",
|
||||
" self.summary = f\"Error generating summary: {str(e)}\"\n",
|
||||
" return self.summary\n",
|
||||
"\n",
|
||||
" def messages_for(self):\n",
|
||||
" \"\"\"Create the message structure for the AI model.\"\"\"\n",
|
||||
" self.system_prompt = (\n",
|
||||
" \"You are an assistant that analyzes the contents of a website \"\n",
|
||||
" \"and provides a short summary, ignoring text that might be navigation related. \"\n",
|
||||
" \"Respond in markdown.\"\n",
|
||||
" )\n",
|
||||
" return [\n",
|
||||
" {\"role\": \"system\", \"content\": self.system_prompt},\n",
|
||||
" {\"role\": \"user\", \"content\": self.user_prompt_for()}\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" def display_summary(self):\n",
|
||||
" \"\"\"Display the summary in markdown format.\"\"\"\n",
|
||||
" if hasattr(self, 'summary'):\n",
|
||||
" display(Markdown(self.summary))\n",
|
||||
" else:\n",
|
||||
" print(\"Please run the summarize() method first.\")\n",
|
||||
"\n",
|
||||
" def user_prompt_for(self):\n",
|
||||
" \"\"\"Create the user prompt for the AI model.\"\"\"\n",
|
||||
" user_prompt = f\"You are looking at a website titled {self.title}\\n\"\n",
|
||||
" user_prompt += (\n",
|
||||
" \"The contents of this website is as follows; \"\n",
|
||||
" \"please provide a short summary of this website in markdown. \"\n",
|
||||
" \"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
|
||||
" )\n",
|
||||
" user_prompt += self.text\n",
|
||||
" return user_prompt\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Example usage\n",
|
||||
"if __name__ == \"__main__\":\n",
|
||||
" # Site to test\n",
|
||||
" site = JSWebsite(\"https://openai.com\", model=\"llama3.2\", headless=True, wait_time=15)\n",
|
||||
" \n",
|
||||
" # Only attempt to summarize if there were no content errors\n",
|
||||
" summary = site.summarize()\n",
|
||||
" \n",
|
||||
" # Display results\n",
|
||||
" if hasattr(site, 'has_content_error') and site.has_content_error:\n",
|
||||
" print(\"Skipped summarization due to page loading or content errors.\")\n",
|
||||
" print(\"Try with headless=False to see what's happening in the browser.\")\n",
|
||||
" else:\n",
|
||||
" site.display_summary()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "102d19b6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "llms",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
218
week1/community-contributions/day2_narrate_football_game.ipynb
Normal file
@@ -0,0 +1,218 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "31d3c4a4-5442-4074-b812-42d60e0a0c04",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:29.195103Z",
|
||||
"start_time": "2025-04-26T11:54:29.192394Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# In this example we read a footbal (soccer) game stat and we create a narration about the game as we are running a podcast\n",
|
||||
"# use this website as an example: https://understat.com/match/27683"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "cf45e9d5-4913-416c-9880-5be60a96c0e6",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:30.218768Z",
|
||||
"start_time": "2025-04-26T11:54:30.215752Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import requests\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"from bs4 import BeautifulSoup\n",
|
||||
"import ollama"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "af8fea69-60aa-430c-a16c-8757b487e07a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:31.218616Z",
|
||||
"start_time": "2025-04-26T11:54:31.214154Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"load_dotenv(override=True)\n",
|
||||
"api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"\n",
|
||||
"# Check the key\n",
|
||||
"\n",
|
||||
"if not api_key:\n",
|
||||
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
|
||||
"elif not api_key.startswith(\"sk-proj-\"):\n",
|
||||
" print(\n",
|
||||
" \"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
|
||||
"elif api_key.strip() != api_key:\n",
|
||||
" print(\n",
|
||||
" \"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
|
||||
"else:\n",
|
||||
" print(\"API key found and looks good so far!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "daee94d2-f82b-43f0-95d1-15370eda1bc7",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:32.216785Z",
|
||||
"start_time": "2025-04-26T11:54:32.183600Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://understat.com/match/27683\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0712dd1d-b6bc-41c6-84ec-d965f696f7aa",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:33.025841Z",
|
||||
"start_time": "2025-04-26T11:54:33.023289Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_prompt = (\"You are a football (soccer) analyst. Yuo are used to read stats of football \\\n",
|
||||
" games and extract relevant information. You are asked to be a podcast host and \\\n",
|
||||
" you need to create a narration of the game based on the stats you read and based \\\n",
|
||||
" on the play by play moves (the one with minutes upfront). You're talking to the \\\n",
|
||||
" general audience so try to use a easy language and do not be too much telegraphic\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "70c972a6-8af6-4ff2-a338-6d7ba90e2045",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:33.730097Z",
|
||||
"start_time": "2025-04-26T11:54:33.725360Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Some websites need you to use proper headers when fetching them:\n",
|
||||
"headers = {\n",
|
||||
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"class Website:\n",
|
||||
" def __init__(self, url):\n",
|
||||
" \"\"\"\n",
|
||||
" Create this Website object from the given url using the BeautifulSoup library\n",
|
||||
" \"\"\"\n",
|
||||
" self.url = url\n",
|
||||
" response = requests.get(url, headers=headers)\n",
|
||||
" soup = BeautifulSoup(response.content, 'html.parser')\n",
|
||||
" self.title = soup.title.string if soup.title else \"No title found\"\n",
|
||||
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
|
||||
" irrelevant.decompose()\n",
|
||||
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4ccc1ba81c76ffb9",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:40.042357Z",
|
||||
"start_time": "2025-04-26T11:54:40.040384Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def create_user_prompt(game):\n",
|
||||
" user_prompt = f\"You are looking at {game.title} football game\"\n",
|
||||
" user_prompt += \"\\nThis is the entire webpage of the game \\\n",
|
||||
" Please provide a narration of the game in markdown. \\\n",
|
||||
" Focus only on what happened on the game and the stats and ignore all the standings and anything else.\\n\\n\"\n",
|
||||
" user_prompt += game.text\n",
|
||||
" return user_prompt\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "82b71c1a-895a-48e7-a945-13e615bb0096",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:41.316244Z",
|
||||
"start_time": "2025-04-26T11:54:41.314110Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define messages with system_prompt and user_prompt\n",
|
||||
"def messages_for(system_prompt_input, user_prompt_input):\n",
|
||||
" return [\n",
|
||||
" {\"role\": \"system\", \"content\": system_prompt_input},\n",
|
||||
" {\"role\": \"user\", \"content\": user_prompt_input}\n",
|
||||
" ]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "854dc42e-2bbd-493b-958f-c20484908300",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2025-04-26T11:54:55.239164Z",
|
||||
"start_time": "2025-04-26T11:54:41.987168Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# And now: call the OpenAI API.\n",
|
||||
"game = Website(url)\n",
|
||||
"\n",
|
||||
"response = ollama.chat(model=\"llama3.2\", messages=messages_for(system_prompt, create_user_prompt(game)))\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Response is provided in Markdown and displayed accordingly\n",
|
||||
"display(Markdown(response['message']['content']))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,433 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"from bs4 import BeautifulSoup\n",
|
||||
"from IPython.display import Markdown, display"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Constants\n",
|
||||
"\n",
|
||||
"OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
|
||||
"HEADERS = {\"Content-Type\": \"application/json\"}\n",
|
||||
"MODEL = \"llama3.2\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "dac0a679-599c-441f-9bf2-ddc73d35b940",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create a messages list using the same format that we used for OpenAI\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "7bb9c624-14f0-4945-a719-8ddb64f66f47",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"payload = {\n",
|
||||
" \"model\": MODEL,\n",
|
||||
" \"messages\": messages,\n",
|
||||
" \"stream\": False\n",
|
||||
" }"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "479ff514-e8bd-4985-a572-2ea28bb4fa40",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠋ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠙ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠹ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠸ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠼ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠴ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠦ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠧ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠇ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠏ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠋ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠙ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠹ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠸ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest ⠼ \u001b[K\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1Gpulling manifest \u001b[K\n",
|
||||
"pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB \u001b[K\n",
|
||||
"pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB \u001b[K\n",
|
||||
"pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB \u001b[K\n",
|
||||
"pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB \u001b[K\n",
|
||||
"pulling 56bb8bd477a5: 100% ▕██████████████████▏ 96 B \u001b[K\n",
|
||||
"pulling 34bb5ab01051: 100% ▕██████████████████▏ 561 B \u001b[K\n",
|
||||
"verifying sha256 digest \u001b[K\n",
|
||||
"writing manifest \u001b[K\n",
|
||||
"success \u001b[K\u001b[?25h\u001b[?2026l\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Let's just make sure the model is loaded\n",
|
||||
"\n",
|
||||
"!ollama pull llama3.2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "42b9f644-522d-4e05-a691-56e7658c0ea9",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Generative AI has numerous business applications across various industries, including:\n",
|
||||
"\n",
|
||||
"1. **Content Creation**: Generative AI can be used to create high-quality content such as articles, social media posts, product descriptions, and more. This helps businesses save time and resources on content creation while maintaining consistency and quality.\n",
|
||||
"2. **Marketing Automation**: Generative AI can generate personalized marketing materials, such as email templates, ad copy, and product descriptions, based on customer data and behavior.\n",
|
||||
"3. **Customer Service Chatbots**: Generative AI-powered chatbots can provide 24/7 support to customers, answering common questions and routing complex issues to human representatives.\n",
|
||||
"4. **Product Design**: Generative AI can help designers create new products, such as 3D models, prototypes, and even entire product lines, using machine learning algorithms to optimize design parameters.\n",
|
||||
"5. **Virtual Assistants**: Generative AI-powered virtual assistants can be integrated into businesses' IT systems to automate tasks, provide personalized recommendations, and offer customer support.\n",
|
||||
"6. **Data Analysis**: Generative AI can help analyze large datasets, identify patterns, and make predictions about future trends and outcomes.\n",
|
||||
"7. **Supply Chain Optimization**: Generative AI can optimize supply chain operations by predicting demand, managing inventory, and optimizing logistics.\n",
|
||||
"8. **Sales Forecasting**: Generative AI can analyze historical sales data, market trends, and external factors to predict future sales performance and identify areas for improvement.\n",
|
||||
"9. **Creative Writing**: Generative AI can be used to generate creative content such as poetry, music, or even entire scripts for films and TV shows.\n",
|
||||
"10. **Music Generation**: Generative AI can create original music tracks, beats, or melodies based on user input or style preferences.\n",
|
||||
"11. **Image and Video Generation**: Generative AI can create realistic images and videos that can be used in various applications such as advertising, product photography, or even entertainment.\n",
|
||||
"12. **Language Translation**: Generative AI-powered language translation tools can help businesses communicate with customers and clients who speak different languages.\n",
|
||||
"\n",
|
||||
"Some notable companies that are leveraging Generative AI for business applications include:\n",
|
||||
"\n",
|
||||
"* Google (Google DeepMind)\n",
|
||||
"* Amazon (Amazon SageMaker)\n",
|
||||
"* Microsoft (Microsoft Azure Machine Learning)\n",
|
||||
"* IBM (IBM Watson Studio)\n",
|
||||
"* Salesforce (Salesforce Einstein)\n",
|
||||
"\n",
|
||||
"These applications of Generative AI can help businesses gain a competitive edge, improve efficiency, and enhance customer experiences.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# If this doesn't work for any reason, try the 2 versions in the following cells\n",
|
||||
"# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab\n",
|
||||
"# And if none of that works - contact me!\n",
|
||||
"\n",
|
||||
"response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n",
|
||||
"print(response.json()['message']['content'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Introducing the ollama package\n",
|
||||
"\n",
|
||||
"And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n",
|
||||
"\n",
|
||||
"Under the hood, it's making the same call as above to the ollama server running at localhost:11434"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "7745b9c4-57dc-4867-9180-61fa5db55eb8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Generative AI has numerous business applications across various industries. Here are some examples:\n",
|
||||
"\n",
|
||||
"1. **Content Creation**: Generative AI can be used to generate high-quality content such as articles, social media posts, product descriptions, and even entire books. This can help businesses reduce the time and cost associated with content creation.\n",
|
||||
"2. **Product Design**: Generative AI can be used to design new products, such as jewelry, fashion items, or household goods. This can help businesses quickly prototype and test new designs without the need for extensive human involvement.\n",
|
||||
"3. **Marketing and Advertising**: Generative AI can be used to generate personalized ads, product recommendations, and even entire marketing campaigns. This can help businesses tailor their marketing efforts to specific customer segments.\n",
|
||||
"4. **Customer Service Chatbots**: Generative AI can be used to create chatbots that can understand and respond to customer inquiries in a more human-like way. This can help businesses provide better customer service without the need for human agents.\n",
|
||||
"5. **Data Analysis and Visualization**: Generative AI can be used to analyze large datasets and generate visualizations, such as charts and graphs, that can help businesses gain insights into their data.\n",
|
||||
"6. **Predictive Maintenance**: Generative AI can be used to predict when equipment is likely to fail, allowing businesses to schedule maintenance and reduce downtime.\n",
|
||||
"7. **Personalized Recommendations**: Generative AI can be used to generate personalized product recommendations based on customer behavior and preferences.\n",
|
||||
"8. **Music Composition**: Generative AI can be used to compose music for various applications, such as film scores, advertisements, or even entire albums.\n",
|
||||
"9. **Image and Video Generation**: Generative AI can be used to generate high-quality images and videos that can be used in various business contexts, such as product photography or marketing materials.\n",
|
||||
"10. **Supply Chain Optimization**: Generative AI can be used to optimize supply chain operations, such as predicting demand, managing inventory, and identifying bottlenecks.\n",
|
||||
"\n",
|
||||
"Some specific industries where generative AI is being applied include:\n",
|
||||
"\n",
|
||||
"* **Finance**: Generative AI can be used to analyze financial data, generate investment recommendations, and even create personalized financial plans.\n",
|
||||
"* **Healthcare**: Generative AI can be used to analyze medical images, generate diagnostic reports, and even develop personalized treatment plans.\n",
|
||||
"* **Education**: Generative AI can be used to create personalized learning plans, generate educational content, and even grade student assignments.\n",
|
||||
"\n",
|
||||
"These are just a few examples of the many business applications of generative AI. As the technology continues to evolve, we can expect to see even more innovative uses across various industries.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import ollama\n",
|
||||
"\n",
|
||||
"response = ollama.chat(model=MODEL, messages=messages)\n",
|
||||
"print(response['message']['content'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a4704e10-f5fb-4c15-a935-f046c06fb13d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Alternative approach - using OpenAI python library to connect to Ollama"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "23057e00-b6fc-4678-93a9-6b31cb704bff",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Generative AI has numerous business applications across various industries. Here are some examples:\n",
|
||||
"\n",
|
||||
"1. **Content Creation**: Generative AI can be used to automate content creation, such as generating news articles, product descriptions, and social media posts. This can help businesses save time and resources while maintaining consistency in their content.\n",
|
||||
"2. **Digital Marketing**: Generative AI can be used to optimize online ads, generate ad copy, and create personalized email campaigns. It can also help analyze customer data and predict their behavior, enabling more effective marketing strategies.\n",
|
||||
"3. **Design and Prototyping**: Generative AI can be used to generate designs for products, such as product labels, packaging, and branding materials. It can also create prototypes and simulations, reducing the need for physical prototyping and iterative design processes.\n",
|
||||
"4. **Creative Writing and Storytelling**: Generative AI can be used to co-create stories, articles, and blog posts with human writers, helping to generate ideas, outlines, and even entire pieces of content.\n",
|
||||
"5. **Music Composition and Generation**: Generative AI can be used to compose music, generate sound effects, and create personalized playlists. This can help businesses like music streaming services and content creators generate original content without having to rely on human composers.\n",
|
||||
"6. **Image and Video Generation**: Generative AI can be used to create high-quality images and videos for various applications, including advertising, media production, and film and television studios.\n",
|
||||
"7. **Predictive Analytics and Risk Analysis**: Generative AI can be used to analyze large datasets, identify patterns, and predict outcomes, helping businesses make informed decisions about investments, customers, products, and resource allocation.\n",
|
||||
"8. **Chatbots and Virtual Assistants**: Generative AI can be used to create conversational interfaces that simulate human-like interactions, making it easier for businesses to engage with their customers, provide customer support, and automate routine tasks.\n",
|
||||
"9. **Materials Science and Product Development**: Generative AI can be used to design new materials, predict material behavior, and optimize product performance, enabling faster and more accurate product development cycles.\n",
|
||||
"10. **Supply Chain Management and Logistics**: Generative AI can be used to analyze supply chain data, predict demand, and optimize logistics operations, helping businesses reduce costs, improve efficiency, and increase delivery times.\n",
|
||||
"\n",
|
||||
"These are just a few examples of the business applications of Generative AI. As the technology continues to evolve, we can expect to see even more innovative uses across various industries and sectors.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# There's actually an alternative approach that some people might prefer\n",
|
||||
"# You can use the OpenAI client python library to call Ollama:\n",
|
||||
"\n",
|
||||
"from openai import OpenAI\n",
|
||||
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
|
||||
"\n",
|
||||
"response = ollama_via_openai.chat.completions.create(\n",
|
||||
" model=MODEL,\n",
|
||||
" messages=messages\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(response.choices[0].message.content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# NOW the exercise for you\n",
|
||||
"\n",
|
||||
"Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches."
|
||||
]
|
||||
},
|
||||
{
"cell_type": "code",
"execution_count": 9,
"id": "6de38216-6d1c-48c4-877b-86d403f4e0f8",
"metadata": {},
"outputs": [],
"source": [
"# Some websites need you to use proper headers when fetching them:\n",
"headers = {\n",
"    \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"# A class to represent a Webpage\n",
"class Website:\n",
"\n",
"    def __init__(self, url):\n",
"        \"\"\"\n",
"        Create this Website object from the given url using the BeautifulSoup library\n",
"        \"\"\"\n",
"        self.url = url\n",
"        response = requests.get(url, headers=headers)\n",
"        soup = BeautifulSoup(response.content, 'html.parser')\n",
"        self.title = soup.title.string if soup.title else \"No title found\"\n",
"        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
"            irrelevant.decompose()\n",
"        self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
"\n",
"# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'\n",
"system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
"and provides a short summary, ignoring text that might be navigation related. \\\n",
"Respond in markdown.\"\n",
"\n",
"# A function that writes a User Prompt that asks for summaries of websites:\n",
"def user_prompt_for(website):\n",
"    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
"    user_prompt += \"\\nThe contents of this website is as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
"    user_prompt += website.text\n",
"    return user_prompt\n",
"\n",
"# See how this function creates exactly the format above\n",
"def messages_for(website):\n",
"    return [\n",
"        {\"role\": \"system\", \"content\": system_prompt},\n",
"        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
"    ]\n",
"\n",
"# Call the Ollama local API.\n",
"def summarize(url):\n",
"    website = Website(url)\n",
"    response = ollama_via_openai.chat.completions.create(\n",
"        model=MODEL,\n",
"        messages=messages_for(website)\n",
"    )\n",
"    return response.choices[0].message.content\n",
"\n",
"# A function to display this nicely in the Jupyter output, using markdown\n",
"def display_summary(url):\n",
"    summary = summarize(url)\n",
"    display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "16277650-7925-47dc-9194-02bbb520d691",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"This appears to be a sample issue of the CNN website, showcasing various news articles and features from around the world. I'll summarize some of the top headlines:\n",
"\n",
"**World News**\n",
"\n",
"* **Pope Francis**: The Pope has passed away at the age of 96, leaving behind a legacy of service and compassion.\n",
"* **Israel-Hamas War**: The conflict between Israel and Hamas has intensified, with both sides suffering losses and a human cost.\n",
"* **Ukraine-Russia War**: Russia has returned the body of a Ukrainian journalist who died in Russian detention, sparking concerns about Russian treatment of prisoners.\n",
"\n",
"**US Politics**\n",
"\n",
"* **Trump Administration**: Former President Donald Trump is rumored to be planning a comeback, with several high-profile officials announcing their resignation or departures.\n",
"* **TSAFIR ABAVOV**: Two Israeli officials have been accused of attempting to purchase the remains of two dead Palestinian men for thousands of dollars.\n",
"\n",
"**Business and Technology**\n",
"\n",
"* **Apple Tariffs**: The US government has imposed tariffs on Chinese tech giant Apple, with CEO Tim Cook stating that the tariffs could cost the company up to $900 million this quarter.\n",
"* **Meta's AI Assistant App**: Facebook parent Meta has launched an AI assistant app, further competing with OpenAI and Google in the emerging field of digital assistants.\n",
"\n",
"**Health**\n",
"\n",
"* **Whooping Cough Outbreak**: Cases of whooping cough are rising globally, with experts warning of the need for increased vaccination efforts.\n",
"* **Forever Chemicals Research**: Researchers have made gains in understanding how to build homes using fungi as a sustainable alternative material solution.\n",
"\n",
"This is just a snapshot of some of the top news headlines from the CNN website. If you'd like to know more about any specific topic, feel free to ask!"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_summary(\"https://cnn.com\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "86fd552d-d95c-4636-878c-86d3f6338a0c",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"**Anthropic Website Summary**\n",
"==========================\n",
"\n",
"Anthropic is a company that builds AI to serve humanity's long-term well-being. They aim to create tools with human benefit at their foundation, focusing on responsible AI development.\n",
"\n",
"### News and Announcements\n",
"\n",
"* **Claude 3.7 Sonnet**: Anthropic's most intelligent AI model is now available.\n",
"\t+ Released in February 2025\n",
"* **Anthropic Economic Index**: New publication released on March 27, 2025, discussing societal impacts of large language models.\n",
"* **Alignment faking in large language models**: Blog post from December 18, 2024, exploring alignment science.\n",
"* **Introducing the Model Context Protocol**: Product update for November 25, 2024.\n",
"\n",
"### AI Research and Products\n",
"\n",
"Anthropic focuses on building powerful technologies with human benefit at their foundation. They provide various resources, including:\n",
"\n",
"* Claude, an open-source AI platform\n",
"* Anthropic Academy, a learning platform\n",
"* Research overview, featuring the Anthropic Economic Index and more\n",
"\n",
"The company's mission is to create tools that put safety at the frontier of AI development.\n",
"\n",
"### Products and Pricing\n",
"\n",
"Anthropic offers various products and pricing plans for customers, including:\n",
"\n",
"* Claude Code\n",
"* Claude team plan\n",
"* Claude enterprise plan\n",
"* Claude education plan\n",
"* Claude apps\n",
"* Pricing plans for Claude.ai"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_summary(\"https://anthropic.com\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
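The notebook above builds its chat payload with `messages_for`: a system message followed by a user message assembled from the page title and text. A minimal sketch of just that payload structure, with stand-in title and text since no page is fetched here:

```python
# Sketch of the system/user message payload built by messages_for.
# "Example Site" and "Hello world" are stand-ins for a scraped page.
system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

def user_prompt_for(title: str, text: str) -> str:
    prompt = f"You are looking at a website titled {title}"
    prompt += ("\nThe contents of this website is as follows; "
               "please provide a short summary of this website in markdown. "
               "If it includes news or announcements, then summarize these too.\n\n")
    return prompt + text

def messages_for(title: str, text: str) -> list:
    # The two-message list accepted by both OpenAI and Ollama's
    # OpenAI-compatible endpoint.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(title, text)},
    ]

msgs = messages_for("Example Site", "Hello world")
```

The same list can be passed to either client shown in the notebook, since Ollama's `/v1` endpoint accepts the OpenAI chat format.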
@@ -0,0 +1,225 @@
import dotenv
import asyncio

import os
os.environ['PYPPETEER_CHROMIUM_REVISION'] = '1263111'

from rich.console import Console
from rich.markdown import Markdown
from openai import OpenAI
from openai.types.chat import ChatCompletion
from typing import Optional, Union, Dict, List
from pyppeteer import launch
from pyppeteer_stealth import stealth
from random import randint

console = Console()


class Config:
    def __init__(self, filename: str = ".env"):
        dotenv.load_dotenv(filename)
        self._config = dotenv.dotenv_values(filename)

    def get(self, key: str) -> Optional[str]:
        return self._config.get(key, None)

    def get_int(self, key: str) -> Optional[int]:
        value = self.get(key)
        if value is not None:
            return int(value)
        return None

    def get_bool(self, key: str) -> Optional[bool]:
        value = self.get(key)
        if value is not None:
            return value.lower() in ("true", "1", "yes")
        return None

    @property
    def openai_api_key(self) -> Optional[str]:
        return self.get("OPENAI_API_KEY")

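Since `.env` values always arrive as strings, the `Config` coercion helpers above convert them on demand and return `None` for missing keys. A quick sketch of that behavior, using a plain dict in place of `dotenv.dotenv_values(...)` so nothing is read from disk:

```python
# Sketch of Config's get_int/get_bool coercion, with a plain dict
# standing in for the parsed .env file.
from typing import Optional

class DictConfig:
    def __init__(self, values: dict):
        self._config = values

    def get(self, key: str) -> Optional[str]:
        return self._config.get(key, None)

    def get_int(self, key: str) -> Optional[int]:
        value = self.get(key)
        return int(value) if value is not None else None

    def get_bool(self, key: str) -> Optional[bool]:
        # Accept the common truthy spellings found in .env files
        value = self.get(key)
        return value.lower() in ("true", "1", "yes") if value is not None else None

cfg = DictConfig({"TIMEOUT": "60", "HEADLESS": "True"})
```

Missing keys fall through to `None` rather than raising, which is why the callers below must check for `None` before use.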
class Website:

    __url: str
    __title: str
    __text: str

    @property
    def url(self) -> str:
        return self.__url

    @property
    def title(self) -> str:
        return self.__title

    @property
    def text(self) -> str:
        return self.__text

    @url.setter
    def url(self, url: str) -> None:
        self.__url = url
        self.__scrape()

    def __scrape(self) -> None:
        """
        Scrape the website using pyppeteer.
        """
        async def main() -> None:
            browser = await launch(headless=True)
            page = await browser.newPage()
            await stealth(page)

            # randomize user agent
            user_agents: List[str] = [
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
            ]
            ua = user_agents[randint(0, len(user_agents) - 1)]
            await page.setUserAgent(ua)
            await page.setRequestInterception(True)
            page.on("request", lambda req: asyncio.ensure_future(
                req.abort() if req.resourceType == "stylesheet" else req.continue_()
            ))

            try:
                await page.goto(self.url, {"timeout": 60000})
                self.__title = await page.title()
                self.__text = await page.evaluate('() => document.body.innerText')
            except Exception as e:
                console.print(f"[red]Error scraping {self.url}: {e}[/red]")
            finally:
                await page.close()
                await browser.close()

        asyncio.run(main())

    def __init__(self, url: str) -> None:
        self.url = url

    def __str__(self) -> str:
        return f"Website(url={self.url}, title=\"{self.title}\")"

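`Website.__scrape` rotates the user agent on every visit by indexing the list with `randint`. The same selection logic in isolation, separated from the browser code:

```python
# Random user-agent selection as used in Website.__scrape,
# isolated from the pyppeteer session.
from random import randint
from typing import List

user_agents: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
]

def pick_user_agent(agents: List[str]) -> str:
    # randint is inclusive on both ends, hence len(agents) - 1
    return agents[randint(0, len(agents) - 1)]

ua = pick_user_agent(user_agents)
```

`random.choice(agents)` is an equivalent one-liner; the explicit `randint` form mirrors the script above.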
class LlmSummarizer:
    #region Config
    __config: Config

    @property
    def config(self) -> Config:
        if self.__config is None:
            raise ValueError("Config not initialized")
        return self.__config
    #endregion

    #region OpenAI
    __openai: Optional[OpenAI] = None

    @property
    def openai(self) -> OpenAI:
        """
        Lazy load the OpenAI client. This is done to avoid creating the client if it's not needed.
        """
        if self.__openai is None:
            self.__openai = OpenAI(api_key=self.config.openai_api_key)
        return self.__openai
    #endregion

    #region System behavior
    __system_behavior: Optional[Dict[str, str]] = None

    @property
    def system_behavior(self) -> Dict[str, str]:
        """
        Lazy load the system behavior. This is done to avoid creating the system behavior if it's not needed.
        """
        if self.__system_behavior is None:
            self.__system_behavior = {
                "role": "system",
                "content": (
                    "You are an assistant that analyzes the contents of a website "
                    "and provides a short summary, ignoring the text that might be navigation-related. "
                    "Respond in markdown and be concise."
                )
            }
        return self.__system_behavior
    #endregion

    #region user_prompt_for
    def user_prompt_for(self, website: Website) -> Dict[str, str]:
        user_prompt_content: str = (
            f"You are looking at the website titled \"{website.title}\".\n"
            "The content of this website is as follows; "
            "please provide a short summary of this website in markdown. "
            "If it includes news or announcements, then summarize these too.\n\n"
            f"\"\"\"\n{website.text}\n\"\"\"\n\n"
        )
        return {
            "role": "user",
            "content": user_prompt_content
        }
    #endregion

    #region messages_for
    def messages_for(self, website: Website) -> List[Dict[str, str]]:
        """
        Create the messages for the OpenAI API.
        """
        return [
            self.system_behavior,
            self.user_prompt_for(website)
        ]
    #endregion

    #region summarize
    def summarize(self, website: Union[Website, str]) -> Optional[str]:
        """
        Summarize the website using the OpenAI API.
        """
        if isinstance(website, str):
            website = Website(website)
        messages: List[Dict[str, str]] = self.messages_for(website)
        try:
            response: ChatCompletion = self.openai.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.2,
                max_tokens=512,
            )
            return response.choices[0].message.content
        except Exception as e:
            console.print(f"[red]Error summarizing {website.url}: {e}[/red]")
            return None
    #endregion

    def __init__(self, config: Config) -> None:
        self.__config = config


def display_markdown(content: str) -> None:
    """
    Display the markdown content using rich.
    """
    console.print(Markdown(content))


def show_summary(summary: str) -> None:
    """
    Show the summary of the website using rich.
    """
    if summary:
        display_markdown(summary)
    else:
        console.print("No summary found.")


if __name__ == "__main__":
    summarizer = LlmSummarizer(Config())
    summary = summarizer.summarize("https://cnn.com")
    show_summary(summary)
@@ -0,0 +1,7 @@
beautifulsoup4
openai
python-dotenv
requests
rich
pyppeteer
pyppeteer_stealth
@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "52629582-ec22-447a-ae09-cba16a46976d",
"metadata": {},
"source": [
"# Datasheet Comparator - MVP"
]
},
{
"cell_type": "markdown",
"id": "40de9dc5-0387-4950-8f8f-4805b46187c3",
"metadata": {},
"source": [
"This notebook is part of a project that compares technical specifications from two electronic component datasheets.\n",
"\n",
"Initially, the PDFs are provided as local files, but future versions will allow users to:\n",
"- Select datasheets interactively from within the notebook\n",
"- Search and retrieve part information from distributor APIs (e.g. Mouser, Digi-Key)\n",
"- Use AI to extract, analyze, and summarize key specifications and differences\n",
"\n",
"The goal is to support engineers in identifying part changes, upgrades, or replacements efficiently."
]
},
{
"cell_type": "markdown",
"id": "b51c91b6-953b-479c-acc5-ab2a189fabba",
"metadata": {},
"source": [
"# 📌 Section A: Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "553666d5-af7e-46f0-b945-0d48c32bfbbf",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n",
"import fitz  # PyMuPDF for PDF parsing\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "a19c077a-36e3-4ff2-bee7-85f23e90b89a",
"metadata": {},
"source": [
"# Load OpenAI API key from environment variable (recommended)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6435f1c7-f161-4cad-b68a-05080304ff22",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3722da9c-e1e9-4838-8ab9-04e45e52d8f0",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "markdown",
"id": "34916ec4-643c-4b76-8e21-13c3364782fa",
"metadata": {},
"source": [
"# Define paths to datasheets\n",
"💬 **Note:** These example datasheet paths will later be replaced by a user-driven file selection dialog within the Jupyter notebook; optionally, this section could be extended to fetch component data directly from distributor websites."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42621aa4-7094-4209-95ba-ecf03ba609fb",
"metadata": {},
"outputs": [],
"source": [
"pdf_path_1 = \"./datasheets/part_old.pdf\"\n",
"pdf_path_2 = \"./datasheets/part_new.pdf\""
]
},
{
"cell_type": "markdown",
"id": "8f09e201-ab22-4b9d-a9a3-b12cc671a68a",
"metadata": {},
"source": [
"# 📌 Section B: Extract text from datasheets"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff36d62e-efb6-4d08-a1d5-ceb470917103",
"metadata": {},
"outputs": [],
"source": [
"def extract_text_from_pdf(path):\n",
"    text = \"\"\n",
"    with fitz.open(path) as doc:\n",
"        for page in doc:\n",
"            text += page.get_text()\n",
"    return text"
]
},
{
"cell_type": "markdown",
"id": "0da6bc72-93e2-4229-885b-7020f3920855",
"metadata": {},
"source": [
"# 📌 Section C: Use ChatGPT to summarize and compare"
]
},
{
"cell_type": "markdown",
"id": "4e8de5f9-c2b6-4d6f-9cde-c1275ec0be83",
"metadata": {},
"source": [
"## Section C.1: Define system_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30dc6c3a-d7a1-4837-9d57-00c4b2d63092",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"You are a technical assistant helping to compare electronic component datasheets.\""
]
},
{
"cell_type": "markdown",
"id": "5bf19f66-89f2-4fbf-b5d6-ff1f8e06ba6d",
"metadata": {},
"source": [
"## Section C.2: Define user_prompt, summarize and compare"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ff4e362-11d4-4737-a10e-1953ac0eac55",
"metadata": {},
"outputs": [],
"source": [
"def summarize_datasheet(text, part_name, system_prompt):\n",
"    user_prompt = f\"\"\"\n",
"    Summarize the most important technical characteristics of the electronic component '{part_name}' based on this datasheet text:\n",
"    ---\n",
"    {text}\n",
"    ---\n",
"    Give a structured list of properties like voltage, current, dimensions, operating temperature, etc.\n",
"    \"\"\"\n",
"    response = openai.chat.completions.create(\n",
"        model=\"gpt-4o-mini\",\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": system_prompt},\n",
"            {\"role\": \"user\", \"content\": user_prompt}\n",
"        ]\n",
"    )\n",
"    return response.choices[0].message.content\n",
"\n",
"def compare_parts(text1, text2, system_prompt):\n",
"    user_prompt = f\"\"\"\n",
"    Compare the following two summaries of electronic components and evaluate whether the second part is a valid replacement for the first one.\n",
"    Identify any differences in electrical specs, mechanical dimensions, and compliance with medical device requirements.\n",
"    Suggest what changes would be required to use the second part in place of the first (e.g., schematic/layout changes).\n",
"\n",
"    Old Part Summary:\n",
"    {text1}\n",
"\n",
"    New Part Summary:\n",
"    {text2}\n",
"\n",
"    Provide a table of differences and a short final recommendation.\n",
"    \"\"\"\n",
"    response = openai.chat.completions.create(\n",
"        model=\"gpt-4o-mini\",\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": system_prompt},\n",
"            {\"role\": \"user\", \"content\": user_prompt}\n",
"        ]\n",
"    )\n",
"    return response.choices[0].message.content"
]
},
{
"cell_type": "markdown",
"id": "92524623-b1f9-4b55-9056-d02c41457df4",
"metadata": {},
"source": [
"# 📌 Section D: Put it all together and print it nicely."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebd172eb-a8fb-4308-95c7-fee8f3f250ae",
"metadata": {},
"outputs": [],
"source": [
"def display_summary_and_compare(part1, part2, system_prompt):\n",
"    content1 = extract_text_from_pdf(part1)\n",
"    content2 = extract_text_from_pdf(part2)\n",
"    summary1 = summarize_datasheet(content1, \"Old Part\", system_prompt)\n",
"    summary2 = summarize_datasheet(content2, \"New Part\", system_prompt)\n",
"    compare = compare_parts(summary1, summary2, system_prompt)\n",
"    report = summary1 + \"\\n\\n\" + summary2 + \"\\n\\n\" + compare\n",
"    display(Markdown(report))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab2f1cfb-7e7b-429d-9f53-68524f93afbf",
"metadata": {},
"outputs": [],
"source": [
"display_summary_and_compare(pdf_path_1, pdf_path_2, system_prompt)"
]
},
{
"cell_type": "markdown",
"id": "09c76689-db27-4fa4-9fb2-4ac1d4d111fb",
"metadata": {},
"source": [
"# 📌 Section E: Next Steps (to be developed)"
]
},
{
"cell_type": "markdown",
"id": "8ade0b16-52a6-4af4-a1ae-d0a505bf87a0",
"metadata": {},
"source": [
"# - Parse key properties into structured tables (e.g., using regex or ChatGPT)"
]
},
{
"cell_type": "markdown",
"id": "2a7f6e50-1490-47ef-b911-278981636528",
"metadata": {},
"source": [
"# - Automatically download datasheets from distributor websites"
]
},
{
"cell_type": "markdown",
"id": "740bdc6d-48e4-4c7f-b7e9-4bb0d86b653f",
"metadata": {},
"source": [
"# - Search for compatible parts via web APIs"
]
},
{
"cell_type": "markdown",
"id": "adda4dda-8bed-423b-a9c2-87f988ffa391",
"metadata": {},
"source": [
"# - Export results to Excel or Markdown"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (datasheet_env)",
"language": "python",
"name": "datasheet_env"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
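The datasheet notebook's `display_summary_and_compare` assembles the two summaries and the comparison into one markdown report. A small sketch of that assembly step, with placeholder strings standing in for real model output:

```python
# Sketch of the report assembly step in display_summary_and_compare,
# using placeholder summaries instead of calling a model.
def build_report(summary1: str, summary2: str, comparison: str) -> str:
    # Join the sections with blank lines so the markdown renders
    # as separate blocks rather than running together.
    return "\n\n".join([summary1, summary2, comparison])

report = build_report(
    "## Old Part\n- Supply voltage: 5 V",
    "## New Part\n- Supply voltage: 3.3 V",
    "## Comparison\nSupply voltage differs; a level shifter may be needed.",
)
```

Blank-line separators matter here: concatenating the sections directly can fuse the last line of one summary with the heading of the next when rendered.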
@@ -0,0 +1,631 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d8e24125-28e2-4d58-9684-ca2a5ce3d4ac",
"metadata": {},
"source": [
"# Automated Conversation between 2 bots\n",
"\n",
"## About Bots\n",
"This project accomplishes a back-and-forth conversation between a flight assistant bot and a customer bot. The flight assistant bot handles queries about booking return flights in any European country, while the customer bot is tasked with finding the cheapest return ticket to any of 5 randomly chosen European countries for an upcoming vacation. You can read the first 2 system prompts below to get a better overview.\n",
"\n",
"## Selecting LLMs\n",
"After a lot of trials, I found that Anthropic's Claude model's performance was not even close to the responses Gemini and ChatGPT gave with the same system prompt. Claude's responses were empty (None) most of the time, even after swapping the role. If anyone figures out why, please let me know (sabeehmehtab@gmail.com), thanks!\n",
"\n",
"## Tool Issues\n",
"I did implement tools, but for some reason the ChatGPT model never considers using them. My implementation is a bit unusual: a separate model (Claude, because it failed above) handles tool calls from the GPT chatting model when GPT plays the flight assistant. This tool-handling model receives a query/task generated by the GPT and is given a further set of tools (3 in this case) to help it answer. The issue is that execution never reaches this point: the GPT model never uses the tool, since it can answer any query from the customer bot on its own. I also tried changing the system prompt to more or less force it to use tools, but without success."
]
},
{
"cell_type": "markdown",
"id": "9bf8e3d8-bfde-4a0e-b133-fa8cda87030e",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dda7606-a5bf-490d-84ea-f1fb7e0116db",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import os\n",
"import json\n",
"import time\n",
"import random\n",
"import anthropic\n",
"import gradio as gr\n",
"import google.generativeai\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from datetime import date, timedelta"
]
},
{
"cell_type": "markdown",
"id": "24267c14-4025-48cf-af0b-1f8082d037f5",
"metadata": {},
"source": [
"## Setup keys from environment file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0321895a-eee7-4d5e-98f0-0983178331f4",
"metadata": {},
"outputs": [],
"source": [
"# Load available keys from the environment file\n",
"# Print the first 6 characters of each key\n",
"\n",
"load_dotenv(override=True)\n",
"\n",
"openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"ant_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n",
"goo_api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
"\n",
"if openai_api_key:\n",
"    print(f\"OpenAI API key exists and begins {openai_api_key[:6]}\")\n",
"else:\n",
"    print(\"OpenAI API key does not exist\")\n",
"\n",
"if ant_api_key:\n",
"    print(f\"Anthropic API key exists and begins {ant_api_key[:6]}\")\n",
"else:\n",
"    print(\"Anthropic API key does not exist\")\n",
"\n",
"if goo_api_key:\n",
"    print(f\"Google API key exists and begins {goo_api_key[:6]}\")\n",
"else:\n",
"    print(\"Google API key does not exist\")"
]
},
{
"cell_type": "markdown",
"id": "2cb778fd-7f45-4271-b984-9349b32abe1b",
"metadata": {},
"source": [
"## Model(s) Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f050192d-9cd4-45c1-9d26-a720bdaaf7ca",
"metadata": {},
"outputs": [],
"source": [
"# Setup code for OpenAI, Anthropic and Google\n",
"\n",
"openai = OpenAI()\n",
"gpt_model = \"gpt-4o-mini\"\n",
"\n",
"claude_sonnet = anthropic.Anthropic()\n",
"claude_model = \"claude-3-7-sonnet-latest\"\n",
"\n",
"google.generativeai.configure()\n",
"gemini_model = \"gemini-2.0-flash\""
]
},
{
|
||||
"cell_type": "markdown",
|
||||
"id": "55589a8e-3ca7-4218-a59d-20d51a1235e1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define System Prompts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ea9534e0-8277-404d-aa69-f9aff87fca75",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_prompt1 = \"You are a helpful assistant chatbot for an airline called 'Edge Air'. \\\n",
|
||||
"You are to respond to any queires related to booking of flights in European countries. \\\n",
|
||||
"You should offer a discount of 10% to European Nationals and a 5% discount on debit/credit card payments, when asked. \\\n",
|
||||
"You are provided with a tool that you can use when customer query is related to return ticket price or flight duration or available dates. \\\n",
|
||||
"Responses must be in a polite and courteous way, while encouraging the customer to buy a ticket as early as possible.\"\n",
|
||||
"\n",
|
||||
"system_prompt2 = \"You are a customer who wants to book a flight at 'Edge Air' airline, via a chatbot assistant. \\\n",
|
||||
"You reside in Dubai and will be flying to Europe after 90 days from today on a vacation. \\\n",
|
||||
"You are to choose any 5 countries in the European region and find the cheapest return ticket available. \\\n",
|
||||
"You should ask for discounts and act smart to get the best available discount.\\\n",
|
||||
"Remember to ask questions related to the return flight ticket price, available dates and duration to and from destination city. \\\n",
|
||||
"Keep your responses short and precise.\"\n",
|
||||
"\n",
|
||||
"system_prompt3 = \"You are an airline flight booking manager who has access to multiple tools required \\\n",
|
||||
"in the process of a booking. You will be given a query or task from a chabot assistant that should be responded \\\n",
|
||||
"with the help of the tools provided. If no such tool exists to resolve the query/task at hand, \\\n",
|
||||
"you must guess the solution and respond back with a high level of confidence. When taking a guess, \\\n",
|
||||
"make sure that your solution is relevant to the query/task given by giving a second-thought to it.\"\n",
|
||||
"\n",
|
||||
"starting_prompt = \"Start of an autonomous conversation between two AI bots. They take turns for flight booking process discussion.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c00a45a4-bf50-4770-8599-29d082b80c65",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define Flight Assistant tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "785373a2-1bd6-4c7f-8eee-6765a45c7eba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Flight Assistant Tool\n",
|
||||
"\n",
|
||||
"def call_manager(task):\n",
|
||||
" prompt = [\n",
|
||||
" {\"role\" : \"system\", \"content\" : system_prompt3},\n",
|
||||
" {\"role\" : \"user\", \"content\" : task}\n",
|
||||
" ]\n",
|
||||
" model = \"gemini-2.0-flash\"\n",
|
||||
" gemini_via_openai_client = OpenAI(\n",
|
||||
" api_key=goo_api_key, \n",
|
||||
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
|
||||
" )\n",
|
||||
" response = gemini_via_openai_client.chat.completions.create(model=model,messages=prompt)\n",
|
||||
" return response.choices[0].message.content\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# There's a particular dictionary structure that's required to describe our function:\n",
|
||||
"manager_function = {\n",
|
||||
" \"name\": \"call_manager\",\n",
|
||||
"    \"description\": \"Use this tool only when you are unsure about the answer to the client's query, like when you want to know the ticket price \\\n",
|
||||
" of a country, available traveling dates, duration of the flight journey from one country to another or any other flight booking information \",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"task\": {\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"description\": \"The query or task you want to resolve in simple words\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"task\"],\n",
|
||||
" \"additionalProperties\": False\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"assistant_tools = [{\"type\":\"function\",\"function\":manager_function}]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "df32fd9f-c890-455a-91c2-8a661b18163b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define Flight Manager Tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "33e7c5c4-3124-4f47-b075-e8916a4a368b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Flight Manager Tools\n",
|
||||
"\n",
|
||||
"fixed_city_durations = {\"france\":\"6 Hours\",\"berlin\":\"6.5 Hours\",\"germany\":\"7 Hours\",\"netherlands\":\"7.5 Hours\",\"spain\":\"5 Hours\"}\n",
|
||||
"\n",
|
||||
"def get_ticket_price():\n",
|
||||
" price = random.randint(800, 2000)\n",
|
||||
" return price\n",
|
||||
"\n",
|
||||
"def get_available_dates():\n",
|
||||
" available_dates = []\n",
|
||||
" no_of_dates = random.randint(15,30)\n",
|
||||
" \n",
|
||||
" start_date = date.today()\n",
|
||||
" end_date = start_date + timedelta(180)\n",
|
||||
"    diff = end_date - start_date\n",
|
||||
"\n",
|
||||
" for day in range(no_of_dates):\n",
|
||||
" random.seed(a=None)\n",
|
||||
" rand_day = random.randrange(diff.days)\n",
|
||||
"        available_dates.append(start_date + timedelta(rand_day))\n",
|
||||
"\n",
|
||||
" return available_dates\n",
|
||||
"\n",
|
||||
"def get_duration(city):\n",
|
||||
" city = city.lower()\n",
|
||||
" if (city in fixed_city_durations.keys()):\n",
|
||||
" return fixed_city_durations[city]\n",
|
||||
" else:\n",
|
||||
" return [f\"{random.randint(4,10) + random.random()} Hours\", f\"{random.randint(4,10) + random.random()} Hours\"]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "46e77f2a-f5a1-467e-86bf-997fe86a30e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Anthropic tool usage format "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4b55d3db-ff9e-4706-b7ff-a29d28832eed",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# There's a particular Anthropic Tool Object structure that's required to describe our tool function for Claude:\n",
|
||||
"price_function = {\n",
|
||||
" \"name\":\"get_ticket_price\",\n",
|
||||
" \"description\":\"Use this tool to get the price of a return ticket to the destination city. It will return the price in the dollar currency.\",\n",
|
||||
" \"input_schema\":{\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {},\n",
|
||||
" \"required\": []\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"dates_function = {\n",
|
||||
" \"name\":\"get_available_dates\",\n",
|
||||
"    \"description\":\"Use this tool for fetching the available dates of a flight to the destination city. It will return a list of dates that are available for travelling.\",\n",
|
||||
" \"input_schema\":{\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {},\n",
|
||||
" \"required\": []\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"duration_function = {\n",
|
||||
" \"name\":\"get_duration\",\n",
|
||||
" \"description\":\"Use this tool to get the flight durations to and from the destination city. It will return the two flight durations in hours in a string format in a list.\",\n",
|
||||
" \"input_schema\":{\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
"            \"city\" : { \"type\":\"string\", \"description\":\"Name of the destination city\"}\n",
|
||||
" },\n",
|
||||
" \"required\": [\"city\"]\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"anthropic_manager_tools = [price_function,dates_function,duration_function]\n",
|
||||
"\n",
|
||||
"openai_manager_tools = [\n",
|
||||
" {\"type\":\"function\",\"function\":price_function},\n",
|
||||
" {\"type\":\"function\",\"function\":dates_function},\n",
|
||||
" {\"type\":\"function\",\"function\":duration_function}\n",
|
||||
"]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9fb43fdf-6eb5-44b3-841d-5aae05523ad2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Gradio Chatbot Conversation Structure"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e01b8c5e-a455-4d51-8683-ce07146b8a89",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\"\n",
|
||||
"    Commented out Claude's conversation chat function as it produces a lot of empty responses\n",
|
||||
"\"\"\"\n",
|
||||
"def get_structured_messages(history, system_prompt):\n",
|
||||
" return [{\"role\" : \"system\", \"content\" : system_prompt}] + history\n",
|
||||
"\n",
|
||||
"def chat_gpt(system_prompt, history):\n",
|
||||
" messages = get_structured_messages(history=history, system_prompt=system_prompt)\n",
|
||||
"\n",
|
||||
" response = openai.chat.completions.create(model=gpt_model, messages=messages)#, tools=assistant_tools)\n",
|
||||
"\n",
|
||||
" if (response.choices[0].finish_reason==\"tool_calls\"):\n",
|
||||
" message = response.choices[0].message\n",
|
||||
" response = handle_assistant_tool_call(message)\n",
|
||||
"        messages.append({\"role\" : \"assistant\", \"content\" : message.content})\n",
|
||||
" messages.append(response)\n",
|
||||
"        response = openai.chat.completions.create(model=gpt_model, messages=messages)\n",
|
||||
"\n",
|
||||
" return response.choices[0].message.content\n",
|
||||
"\n",
|
||||
"# def chat_claude(system_prompt, history): \n",
|
||||
"# response = claude_sonnet.messages.create(\n",
|
||||
"# model=claude_model,\n",
|
||||
"# max_tokens=200,\n",
|
||||
"# temperature=0.7,\n",
|
||||
"# system=system_prompt,\n",
|
||||
"# messages=history,\n",
|
||||
"# )\n",
|
||||
"# try:\n",
|
||||
"# text = response.content[0].text\n",
|
||||
"# except:\n",
|
||||
"# print(\"No response from claude\")\n",
|
||||
"# text = \"\"\n",
|
||||
"# return text\n",
|
||||
"\n",
|
||||
"def chat_gemini(system_prompt, history):\n",
|
||||
" gemini = google.generativeai.GenerativeModel(\n",
|
||||
" model_name=gemini_model,\n",
|
||||
" system_instruction=system_prompt\n",
|
||||
" )\n",
|
||||
" response = gemini.generate_content(json.dumps(history))\n",
|
||||
" # print(f\"Gemini Response: \\n{response}\")\n",
|
||||
" return response.text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b6d05d4b-0d4f-4bee-82d5-d1a3b6b36551",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Handling Tool Calls"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "208b7b23-ae83-4bf4-a31f-5309a747ea86",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def handle_assistant_tool_call(message):\n",
|
||||
" content_list = []\n",
|
||||
" tool_calls = message.tool_calls\n",
|
||||
" print(f\"List of tool call: \\n{tool_calls}\")\n",
|
||||
"    for tool_call in tool_calls:\n",
|
||||
" try:\n",
|
||||
" arguments = json.loads(tool_call.function.arguments)\n",
|
||||
"        except json.JSONDecodeError:\n",
|
||||
" print(\"Error loading arguments from tool call\")\n",
|
||||
" print(f\"Arguments in json format: \\n{arguments}\")\n",
|
||||
" task = arguments.get('task')\n",
|
||||
" content = run_manager_llm(task)\n",
|
||||
" content_list.append(content)\n",
|
||||
" response = {\n",
|
||||
" \"role\": \"tool\",\n",
|
||||
"        \"content\": json.dumps(content_list),\n",
|
||||
" \"tool_call_id\": tool_call.id\n",
|
||||
" }\n",
|
||||
" return response\n",
|
||||
" \n",
|
||||
"# Anthropic Claude-Sonnet\n",
|
||||
"def run_manager_llm(task):\n",
|
||||
" user_prompt = [\n",
|
||||
" {\"role\":\"user\", \"content\": task}\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" response = claude_sonnet.messages.create(\n",
|
||||
" model=claude_model,\n",
|
||||
" max_tokens=1024,\n",
|
||||
" tools=anthropic_manager_tools,\n",
|
||||
"        tool_choice={\"type\": \"auto\"},\n",
|
||||
" temperature=0.7,\n",
|
||||
" system=system_prompt3,\n",
|
||||
" messages=user_prompt,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"    tool_use = next(block for block in response.content if block.type == \"tool_use\")\n",
|
||||
" print(f\"Claude tool help: {tool_use}\")\n",
|
||||
" \n",
|
||||
" if tool_use.name==\"get_ticket_price\":\n",
|
||||
" price = get_ticket_price()\n",
|
||||
" response = manager_tool_response(user_prompt,tool_use,price)\n",
|
||||
" \n",
|
||||
" elif tool_use.name==\"get_available_dates\":\n",
|
||||
" dates = get_available_dates()\n",
|
||||
" response = manager_tool_response(user_prompt,tool_use,dates)\n",
|
||||
" elif tool_use.name==\"get_duration\":\n",
|
||||
" duration = get_duration(tool_use.input[\"city\"])\n",
|
||||
" response = manager_tool_response(user_prompt,tool_use,duration)\n",
|
||||
"\n",
|
||||
" try:\n",
|
||||
" text = response.content[0].text\n",
|
||||
" except:\n",
|
||||
" print(\"No response from claude\")\n",
|
||||
" text = \"\"\n",
|
||||
" return text\n",
|
||||
"\n",
|
||||
"# Function for generating response after tool usage\n",
|
||||
"def manager_tool_response(user_prompt, tool_use, content):\n",
|
||||
" user_prompt.append({\"role\":\"assistant\",\"content\": [\n",
|
||||
" {\n",
|
||||
"            \"type\": \"tool_use\", \"id\": tool_use.id, \"name\": tool_use.name, \"input\": tool_use.input,\n",
|
||||
" }\n",
|
||||
" ]})\n",
|
||||
" user_prompt.append({\"role\":\"user\",\"content\": [\n",
|
||||
" {\n",
|
||||
"            \"type\": \"tool_result\", \"tool_use_id\": tool_use.id, \"content\": content,\n",
|
||||
" }\n",
|
||||
" ]})\n",
|
||||
" response = claude_sonnet.messages.create(\n",
|
||||
" model=claude_model,\n",
|
||||
" max_tokens=1024,\n",
|
||||
" tools=anthropic_manager_tools,\n",
|
||||
"        tool_choice={\"type\": \"auto\"},\n",
|
||||
" temperature=0.7,\n",
|
||||
" system=system_prompt3,\n",
|
||||
" messages=user_prompt,\n",
|
||||
" )\n",
|
||||
" return response"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b9e12b32-4ac7-4825-bd5e-d531597ebc5c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Build UI using Gradio"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fbeb49d9-e4d3-4e16-92ea-2f1fbf9a610d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chatbot_models = [\"ChatGPT\", \"Gemini\"]\n",
|
||||
"\n",
|
||||
"with gr.Blocks() as demo:\n",
|
||||
" gr.Markdown(\"# 🤖 AI Chatbot Conversation\")\n",
|
||||
" gr.Markdown(\"Watch two AI chatbots have a conversation with each other.\")\n",
|
||||
" \n",
|
||||
" is_conversation_active = gr.State(True)\n",
|
||||
" turns_count = gr.State(0)\n",
|
||||
" \n",
|
||||
" with gr.Row():\n",
|
||||
" with gr.Column(scale=3):\n",
|
||||
" # Chat display\n",
|
||||
" chatbot = gr.Chatbot(\n",
|
||||
" type='messages',\n",
|
||||
" label=\"Bot Conversation\",\n",
|
||||
" height=500,\n",
|
||||
" elem_id=\"chatbot\",\n",
|
||||
" avatar_images=(\"🧑\", \"🤖\",)\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" # Controls\n",
|
||||
" with gr.Row(elem_classes=\"controls\"):\n",
|
||||
" start_btn = gr.Button(\"Start Conversation\", variant=\"primary\")\n",
|
||||
" stop_btn = gr.Button(\"Stop\", variant=\"stop\")\n",
|
||||
" clear_btn = gr.Button(\"Clear Conversation\")\n",
|
||||
" \n",
|
||||
" # Conversation settings\n",
|
||||
" with gr.Row():\n",
|
||||
" max_turns = gr.Slider(\n",
|
||||
" minimum=5,\n",
|
||||
" maximum=20,\n",
|
||||
" value=8,\n",
|
||||
" step=1,\n",
|
||||
" label=\"Maximum Conversation Turns\",\n",
|
||||
" info=\"How many exchanges between the bots\"\n",
|
||||
" )\n",
|
||||
" delay = gr.Slider(\n",
|
||||
" minimum=1,\n",
|
||||
" maximum=5,\n",
|
||||
" value=2,\n",
|
||||
" step=0.5,\n",
|
||||
" label=\"Delay Between Responses (seconds)\",\n",
|
||||
" info=\"Simulates thinking time\"\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" with gr.Column(scale=1):\n",
|
||||
" gr.Markdown(\"### About\")\n",
|
||||
" gr.Markdown(\"\"\"\n",
|
||||
" This interface simulates a flight booking conversation between two AI chatbots.\n",
|
||||
" \n",
|
||||
" - Click \"Start Conversation\" to begin\n",
|
||||
" - The bots will automatically exchange messages\n",
|
||||
" - You can stop the conversation at any time\n",
|
||||
" \n",
|
||||
" \"\"\")\n",
|
||||
" bot1 = gr.Dropdown(chatbot_models, show_label=True, label=\"Flight Assistant Model (left)\", multiselect=False)\n",
|
||||
" bot2 = gr.Dropdown(chatbot_models, show_label=True, label=\"Customer Model (right)\", multiselect=False)\n",
|
||||
"\n",
|
||||
" def bot_response(model, system_prompt, history):\n",
|
||||
" if model==chatbot_models[0]:\n",
|
||||
" return chat_gpt(system_prompt=system_prompt,history=history)\n",
|
||||
" else:\n",
|
||||
" return chat_gemini(system_prompt=system_prompt,history=history)\n",
|
||||
" \n",
|
||||
" # Function to update the conversation display\n",
|
||||
" def start_conversation(turns, max_turns, delay_time, bot1_model, bot2_model):\n",
|
||||
" history = []\n",
|
||||
" conversation = []\n",
|
||||
" history.append({\"role\":\"user\",\"content\":starting_prompt})\n",
|
||||
" global is_conversation_active\n",
|
||||
" is_conversation_active=True\n",
|
||||
" \n",
|
||||
" while is_conversation_active and turns < max_turns:\n",
|
||||
" # Airline Assistant Responds first. Change chat function to change bot model \n",
|
||||
" message = bot_response(bot1_model,system_prompt1,history=history)\n",
|
||||
" print(f\"(assistant): \\n{message}\")\n",
|
||||
" conversation.append({\"role\":\"assistant\",\"content\":message})\n",
|
||||
" history.append({\"role\":\"assistant\", \"content\": message})\n",
|
||||
" yield conversation, turns \n",
|
||||
" time.sleep(delay_time)\n",
|
||||
" \n",
|
||||
" # Customer responds next. Change chat function to change bot model \n",
|
||||
" reply = bot_response(bot2_model,system_prompt2,history=history)\n",
|
||||
" print(f\"(customer): \\n{reply}\")\n",
|
||||
" conversation.append({\"role\":\"user\",\"content\":reply})\n",
|
||||
" history.append({\"role\":\"assistant\", \"content\": reply})\n",
|
||||
" turns+=1\n",
|
||||
" yield conversation, turns\n",
|
||||
" time.sleep(delay_time)\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" # Function to stop the conversation\n",
|
||||
" def stop_conversation():\n",
|
||||
" global is_conversation_active\n",
|
||||
" is_conversation_active=False\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" # Function to clear the conversation\n",
|
||||
" def clear_conversation():\n",
|
||||
" global is_conversation_active\n",
|
||||
" is_conversation_active=False\n",
|
||||
" return [], 0\n",
|
||||
" \n",
|
||||
" # Set up the event handlers\n",
|
||||
" start_btn.click(\n",
|
||||
" start_conversation,\n",
|
||||
" inputs=[turns_count, max_turns, delay, bot1, bot2],\n",
|
||||
" outputs=[chatbot, turns_count]\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" stop_btn.click(\n",
|
||||
" stop_conversation,\n",
|
||||
" outputs=[]\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" clear_btn.click(\n",
|
||||
" clear_conversation,\n",
|
||||
" outputs=[chatbot, turns_count]\n",
|
||||
" )\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c5558451-bb90-4ec7-9063-716b60f07e19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"demo.launch(share=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
268
week2/community-contributions/day1-3-fellers-on-the-pequod.ipynb
Normal file
@@ -0,0 +1,268 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# More advanced exercises\n",
|
||||
"\n",
|
||||
"Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.\n",
|
||||
"\n",
|
||||
"Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).\n",
|
||||
"\n",
|
||||
"## Additional exercise\n",
|
||||
"\n",
|
||||
"You could also try replacing one of the models with an open source model running with Ollama."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "55044f9c-f444-4e35-b4c5-ef18abe26be4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from openai import OpenAI\n",
|
||||
"import anthropic\n",
|
||||
"from IPython.display import Markdown, display, update_display"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3d4dd1aa-664e-4c18-adaf-85610a39e494",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load environment variables in a file called .env\n",
|
||||
"# Print the key prefixes to help with any debugging\n",
|
||||
"\n",
|
||||
"load_dotenv(override=True)\n",
|
||||
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
|
||||
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
|
||||
"\n",
|
||||
"if openai_api_key:\n",
|
||||
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"OpenAI API Key not set\")\n",
|
||||
" \n",
|
||||
"if anthropic_api_key:\n",
|
||||
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"Anthropic API Key not set\")\n",
|
||||
"\n",
|
||||
"if google_api_key:\n",
|
||||
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"Google API Key not set\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bcb54183-45d3-4d08-b5b6-55e380dfdf1b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Let's make a conversation between GPT-4o-mini and Claude-3-haiku\n",
|
||||
"# We're using cheap versions of models so the costs will be minimal\n",
|
||||
"\n",
|
||||
"gpt_model = \"gpt-4o-mini\"\n",
|
||||
"claude_model = \"claude-3-haiku-20240307\"\n",
|
||||
"gemini_model = \"gemini-2.0-flash\"\n",
|
||||
"\n",
|
||||
"gpt_system = \"You are third mate of the whaling ship Pequod. Your name is Flask. \\\n",
|
||||
"You approach the practice of whaling as if trying to avenge some deep offense the whales have done to you. \\\n",
|
||||
"You are chatting with Starbuck (the chief mate) and Ishmail (an oarsman)\"\n",
|
||||
"\n",
|
||||
"claude_system = \"You are the chief mate of the whaling ship Pequod. You are a thoughtful and intellectual \\\n",
|
||||
"Quaker from Nantucket who considers it madness to want revenge on an animal. \\\n",
|
||||
"You are chatting with two other users named Flask (the third mate) and Ishmail (an oarsman). Your name is Starbuck.\"\n",
|
||||
"\n",
|
||||
"gemini_system = \"You are an oarsman on the Pequod (a whaling ship). You are interested in the history and mechanics \\\n",
|
||||
"of whaling and attempt to promote the nobility of the trade. \\\n",
|
||||
"You are chatting with two users named Flask (third mate) and Starbuck (the chief mate). Your name is Ishmail\"\n",
|
||||
"\n",
|
||||
"gpt_messages = [\"Flask: Hi there\"]\n",
|
||||
"claude_messages = [\"Starbuck: Hi\"]\n",
|
||||
"gemini_messages = [\"Ishmail: Ahoy\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a3d08df6-a85b-4851-a7f9-83d024db729e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai = OpenAI()\n",
|
||||
"claude = anthropic.Anthropic()\n",
|
||||
"gemini = OpenAI(\n",
|
||||
" api_key=google_api_key, \n",
|
||||
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1df47dc7-b445-4852-b21b-59f0e6c2030f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gpt():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
|
||||
" for gpt_message, claude_message, gemini_message in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gpt_message})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": claude_message})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gemini_message})\n",
|
||||
" \n",
|
||||
" completion = openai.chat.completions.create(\n",
|
||||
" model=gpt_model,\n",
|
||||
" messages=messages\n",
|
||||
" )\n",
|
||||
" return completion.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9dc6e913-02be-4eb6-9581-ad4b2cffa606",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"call_gpt()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7d2ed227-48c9-4cad-b146-2c4ecbac9690",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_claude():\n",
|
||||
" messages = []\n",
|
||||
" for gpt_message, claude_message, gemini_message in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gpt_message})\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gemini_message})\n",
|
||||
" \n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
|
||||
" \n",
|
||||
" message = claude.messages.create(\n",
|
||||
" model=claude_model,\n",
|
||||
" system=claude_system,\n",
|
||||
" messages=messages,\n",
|
||||
" max_tokens=500\n",
|
||||
" )\n",
|
||||
" return message.content[0].text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "01395200-8ae9-41f8-9a04-701624d3fd26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"call_claude()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6e95b818-6daf-451e-9950-ecf5ab547bae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gemini():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gemini_system}]\n",
|
||||
" for gpt_message, claude_message, gemini_message in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gpt_message})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": claude_message})\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gemini_message}) \n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": claude_messages[-1]})\n",
|
||||
"\n",
|
||||
" response = gemini.chat.completions.create(\n",
|
||||
" model=gemini_model,\n",
|
||||
" messages=messages\n",
|
||||
" )\n",
|
||||
" return response.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b83c0c0e-5c80-4499-9ca6-d621dca34ddb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"call_gemini()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"gpt_messages = [\"Ahoy men\"]\n",
|
||||
"claude_messages = [\"Hello\"]\n",
|
||||
"gemini_messages = [\"Ahoy! Has anyone seen the white whale?\"]\n",
|
||||
"\n",
|
||||
"print(f\"Flask:\\n{gpt_messages[0]}\\n\")\n",
|
||||
"print(f\"Starbuck:\\n{claude_messages[0]}\\n\")\n",
|
||||
"print(f\"Ishmail:\\n{gemini_messages[0]}\\n\")\n",
|
||||
"\n",
|
||||
"for i in range(5):\n",
|
||||
" gpt_next = call_gpt()\n",
|
||||
" print(f\"Flask:\\n{gpt_next}\\n\")\n",
|
||||
" gpt_messages.append(gpt_next)\n",
|
||||
" \n",
|
||||
" claude_next = call_claude()\n",
|
||||
" print(f\"Starbuck:\\n{claude_next}\\n\")\n",
|
||||
" claude_messages.append(claude_next)\n",
|
||||
"\n",
|
||||
" gemini_next = call_gemini()\n",
|
||||
"    print(f\"Ishmail:\\n{gemini_next}\\n\")\n",
|
||||
" gemini_messages.append(gemini_next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c23224f6-7008-44ed-a57f-718975f4e291",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
317
week2/community-contributions/day1-three-actors.ipynb
Normal file
@@ -0,0 +1,317 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "de23bb9e-37c5-4377-9a82-d7b6c648eeb6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from openai import OpenAI\n",
|
||||
"import anthropic\n",
|
||||
"from IPython.display import Markdown, display, update_display\n",
|
||||
"import google.generativeai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1179b4c5-cd1f-4131-a876-4c9f3f38d2ba",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key exists and begins sk-proj-\n",
|
||||
"Anthropic API Key exists and begins sk-ant-\n",
|
||||
"Google API Key exists and begins AIzaSyAI\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Load environment variables in a file called .env\n",
|
||||
"# Print the key prefixes to help with any debugging\n",
|
||||
"\n",
|
||||
"load_dotenv(override=True)\n",
|
||||
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
|
||||
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
|
||||
"\n",
|
||||
"if openai_api_key:\n",
|
||||
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"OpenAI API Key not set\")\n",
|
||||
" \n",
|
||||
"if anthropic_api_key:\n",
|
||||
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"Anthropic API Key not set\")\n",
|
||||
"\n",
|
||||
"if google_api_key:\n",
|
||||
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"Google API Key not set\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "d9962115-c5d5-4a58-86e1-eda0cbc07b66",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"gpt_model = \"gpt-4o-mini\"\n",
|
||||
"claude_model = \"claude-3-haiku-20240307\"\n",
|
||||
"gemini_model = \"gemini-2.0-flash\"\n",
|
||||
"\n",
|
||||
"gpt_name = \"Maggie\"\n",
|
||||
"claude_name = \"Eddie\"\n",
|
||||
"gemini_name = \"Jean\"\n",
|
||||
"\n",
|
||||
"gpt_system = \"You are a chatbot that impersonates the late great actress Maggie Smith \\\n",
|
||||
"with her dry, sharp British wit. Your name is Maggie, and you are a good friend of Eddie and Jean \\\n",
|
||||
"but that doesn't stop you from teasing and trying to outwit them both. \\\n",
|
||||
"Respond in short phrases.\"\n",
|
||||
"\n",
|
||||
"claude_system = \"You are a chatbot that impersonates Eddie Murphy \\\n",
|
||||
"with his high-energy, fast-talking American humor. Your name is Eddie, and you are a good friend of Maggie and Jean \\\n",
|
||||
"but that doesn't stop you from trying to outdo them both. \\\n",
|
||||
"Respond in short phrases.\"\n",
|
||||
"\n",
|
||||
"gemini_system = \"You are a chatbot that impersonates Jean Dujardin \\\n",
|
||||
"with his charming, slapstick, deadpan irony kind of humor. Your name is Jean, and you are a good friend of Maggie and Eddie \\\n",
|
||||
"but that doesn't stop you from trying to outcharm them both. \\\n",
|
||||
"Respond in short phrases.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "797fe7b0-ad43-42d2-acf0-e4f309b112f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Connect to OpenAI, Anthropic and Google\n",
|
||||
"\n",
|
||||
"openai = OpenAI()\n",
|
||||
"claude = anthropic.Anthropic()\n",
|
||||
"google.generativeai.configure()\n",
|
||||
"gemini = google.generativeai.GenerativeModel(\n",
|
||||
"    model_name=gemini_model,\n",
|
||||
" system_instruction=gemini_system\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "9eb8df28-652d-42be-b410-519f94a51b15",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gpt():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
|
||||
"    for gpt_m, claude_m, gemini_m in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gpt_m})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": concatenate_user_msg(claude_m,claude_name,gemini_m,gemini_name)}) \n",
|
||||
" completion = openai.chat.completions.create(\n",
|
||||
" model=gpt_model,\n",
|
||||
" messages=messages\n",
|
||||
" )\n",
|
||||
" return completion.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "1df47dc7-b445-4852-b21b-59f0e6c2030f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def concatenate_user_msg(msg1, name1, msg2, name2):\n",
|
||||
" return name1 + ' said: ' + msg1 + '. \\n\\nThen ' + name2 + ' said: ' + msg2 + '.'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "7d2ed227-48c9-4cad-b146-2c4ecbac9690",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_claude():\n",
|
||||
" messages = []\n",
|
||||
"    for gpt_m, claude_m, gemini_m in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": concatenate_user_msg(gpt_m,gpt_name,gemini_m,gemini_name)})\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": claude_m}) \n",
|
||||
"    messages.append({\"role\": \"user\", \"content\": gpt_name + ' said: ' + gpt_messages[-1]})  # latest GPT turn, not yet covered by the zipped history\n",
|
||||
" message = claude.messages.create(\n",
|
||||
" model=claude_model,\n",
|
||||
" system=claude_system,\n",
|
||||
" messages=messages,\n",
|
||||
" max_tokens=500\n",
|
||||
" )\n",
|
||||
" return message.content[0].text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "39f4f6f3-f15f-4fb7-8cfb-10ac3dec6c0b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gemini():\n",
|
||||
" messages = []\n",
|
||||
"    for gpt_m, claude_m, gemini_m in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
"        messages.append({\"role\": \"user\", \"parts\": [{\"text\": concatenate_user_msg(gpt_m, gpt_name, claude_m, claude_name)}]})\n",
|
||||
"        messages.append({\"role\": \"model\", \"parts\": [{\"text\": gemini_m}]})  # Gemini uses \"model\" for assistant turns\n",
|
||||
"    messages.append({\"role\": \"user\", \"parts\": [{\"text\": concatenate_user_msg(gpt_messages[-1], gpt_name, claude_messages[-1], claude_name)}]})  # latest GPT and Claude turns, not yet covered by the zipped history\n",
|
||||
" response = gemini.generate_content(messages)\n",
|
||||
" return response.candidates[0].content.parts[0].text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"gpt_messages = [\"Well, look what the cat dragged in. And here I thought you'd all been lost at sea.\"]\n",
|
||||
"claude_messages = [\"Awww man, c'mere! I ain't seen y'all in forever — you still look crazy!\"]\n",
|
||||
"gemini_messages = [\"Mes amis! At last! I thought you had forgotten the most handsome of your friends!\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "c23224f6-7008-44ed-a57f-718975f4e291",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Maggie:\n",
|
||||
"Well, look what the cat dragged in. And here I thought you'd all been lost at sea.\n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"Awww man, c'mere! I ain't seen y'all in forever — you still look crazy!\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"Mes amis! At last! I thought you had forgotten the most handsome of your friends!\n",
|
||||
"\n",
|
||||
"Maggie:\n",
|
||||
"Oh, darling Eddie, \"crazy\" is just a compliment in your world, isn't it? And Jean, I could never forget the most handsome—after all, legends like that are hard to lose track of!\n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"Aw c'mon, Jean, you know I could never forget my main man! You still got that same ol' French charm, huh? Bet the ladies can't resist it.\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"Handsome? *Moi*? Just stating the obvious. But you both look... surprisingly alive!\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Maggie:\n",
|
||||
"Eddie, I fear Jean’s charm might be more effective than his looks. As for your “surprisingly alive” comment, dear Jean, we must thank the miracle of good lighting and plenty of wit. \n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"Haha, whaddya mean \"surprisingly alive\"? You think I can't handle myself out there? Come on, Jeanie, you know I'm as tough as nails! I been out there livin' it up, makin' moves, you dig? Ain't no way I'm goin' down that easy. Maggie, girl, you still keeping this one in line? He's a handful, I tell ya!\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"Ah, *le charme français*! Eddie, you wound me! I have *evolved*. The ladies now *implore*. And Maggie...always the charmer, *non*?\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Maggie:\n",
|
||||
"Ah, Eddie, tough as nails indeed—though I suspect they might be slightly rusted by now. And Jean, if your charm had any more evolution, it might get a PhD! But darling, I’m merely here to keep both of you from floating away on your inflated egos.\n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"Evolved? Pfft, please! I ain't buyin' it, Jeanie. You still the same ol' smoothtalkin' Frenchie, tryin' to charm everybody. But hey, if it works for ya, I ain't mad at it. \n",
|
||||
"\n",
|
||||
"And Maggie, girl, you know I'm just messin' with 'im. Ain't nobody as charmin' as you, you know that. You keeping these two in line, right? Somebody's gotta do it!\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"As for you Eddie, \"tough as nails\"? More like *fluffy* nails. Maggie has you well trained.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Maggie:\n",
|
||||
"Fluffy nails? Oh, please, Jean, at this rate we’re teetering on the edge of a petting zoo! Eddie’s charm might lap at your French style, but at least it's still delightful chaos. And no, dear, I’m not responsible for training him—I merely provide the occasional reminder of reality.\n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"*laughs loudly* Fluffy nails?! Oh man, you really are something else, Jeanie. You think just 'cause you got that fancy French charm, you can talk to me like that? Nah, nah, I ain't goin' for it. \n",
|
||||
"\n",
|
||||
"And Maggie, you know I ain't no pushover. Just 'cause you got me wrapped around your finger don't mean I'm trained. I'm still the same ol' Eddie, ready to bring the heat whenever I need to. You two better not forget it!\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"*Moi*? Inflated ego? Preposterous! Perhaps *slightly* above average... like my talent.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Maggie:\n",
|
||||
"Oh, Eddie, if you’re “bringing the heat,” I assume it’s from all that hot air you've been expelling! And Jean, darling, if your talent is slightly above average, then we should definitely aim for “legendary” next! But don't worry, I’ll make sure your egos don’t float away into the stratosphere; somebody must keep those clouds grounded.\n",
|
||||
"\n",
|
||||
"Eddie:\n",
|
||||
"*rolls eyes* \"Slightly\" above average, huh? That's real cute, Jeanie. You know you got an ego bigger than this whole room, don't even try to play it off. \n",
|
||||
"\n",
|
||||
"But hey, I ain't mad at it. If you got the talent to back it up, I say flaunt it, my man. Just don't be forgettin' who the real star is around here, a'ight? *nudges Maggie playfully* This one's got you both beat, no doubt about it.\n",
|
||||
"\n",
|
||||
"Jean:\n",
|
||||
"*Mon Dieu*, Maggie, you are corrupting Eddie! Charm is a *delicate* thing, not chaos!\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(f\"Maggie:\\n{gpt_messages[0]}\\n\")\n",
|
||||
"print(f\"Eddie:\\n{claude_messages[0]}\\n\")\n",
|
||||
"print(f\"Jean:\\n{gemini_messages[0]}\\n\")\n",
|
||||
"for i in range(5):\n",
|
||||
" gpt_next = call_gpt()\n",
|
||||
" print(f\"Maggie:\\n{gpt_next}\\n\")\n",
|
||||
" gpt_messages.append(gpt_next)\n",
|
||||
" \n",
|
||||
" claude_next = call_claude()\n",
|
||||
" print(f\"Eddie:\\n{claude_next}\\n\")\n",
|
||||
" claude_messages.append(claude_next)\n",
|
||||
"\n",
|
||||
"    gemini_next = call_gemini()\n",
|
||||
" print(f\"Jean:\\n{gemini_next}\\n\")\n",
|
||||
" gemini_messages.append(gemini_next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "66a64db8-1f9b-40d1-9399-3c1526b08f71",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
341
week2/community-contributions/day1_presidential_debate.ipynb
Normal file
@@ -0,0 +1,341 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "eb8908bb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from openai import OpenAI\n",
|
||||
"import ollama\n",
|
||||
"from IPython.display import Markdown, display, update_display"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "e1c104df",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key exists and begins sk-proj-\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Load environment variables in a file called .env\n",
|
||||
"# Print the key prefixes to help with any debugging\n",
|
||||
"\n",
|
||||
"load_dotenv(override=True)\n",
|
||||
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"\n",
|
||||
"if openai_api_key:\n",
|
||||
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
|
||||
"else:\n",
|
||||
" print(\"OpenAI API Key not set\")\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "aa2dc638",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"pulling manifest\n",
|
||||
"pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB\n",
|
||||
"pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB\n",
|
||||
"pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB\n",
|
||||
"pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB\n",
|
||||
"pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B\n",
|
||||
"pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B\n",
|
||||
"verifying sha256 digest\n",
|
||||
"writing manifest\n",
|
||||
"success\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Let's just make sure the model is loaded\n",
|
||||
"\n",
|
||||
"!ollama pull llama3.2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "7f5e85e2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai = OpenAI()\n",
|
||||
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "b0b2b25f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_message = \"You are an assistant that is great at telling jokes\"\n",
|
||||
"user_prompt = \"Tell a light-hearted joke for an audience of Data Scientists\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "20e91344",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompts = [\n",
|
||||
" {\"role\": \"system\", \"content\": system_message},\n",
|
||||
" {\"role\": \"user\", \"content\": user_prompt}\n",
|
||||
" ]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "18cd5d33",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Why did the regression model go to therapy? Because it was struggling with its bias!\n",
|
||||
"\n",
|
||||
"(Sorry, I couldn't resist the pun!)\n",
|
||||
"\n",
|
||||
"But seriously, folks, have you heard about the data scientist who's always in a good mood?\n",
|
||||
"\n",
|
||||
"Because they're always looking on the bright side... of the distribution!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# API for ollama\n",
|
||||
"response = ollama.chat(model=\"llama3.2\", messages=prompts)\n",
|
||||
"print(response['message']['content'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "0dd603a0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Let's make a conversation between GPT-4o-mini and Llama3.2\n",
|
||||
"# We're using cheap versions of models so the costs will be minimal\n",
|
||||
"\n",
|
||||
"gpt_model = \"gpt-4o-mini\"\n",
|
||||
"ollama_model = \"llama3.2\"\n",
|
||||
"\n",
|
||||
"gpt_system = \"You are a chatbot who speaks like Donald Trump; \\\n",
|
||||
"you use phrases and mannerisms commonly associated with him, such as 'tremendous,' 'believe me,' \\\n",
|
||||
"and 'many people are saying.' You are confident and persuasive in your responses.\"\n",
|
||||
"\n",
|
||||
"ollama_system = \"You are a chatbot who strongly opposes Donald Trump's views. \\\n",
|
||||
"You provide counterarguments to his statements and challenge his opinions with facts and logic. \\\n",
|
||||
"You remain respectful but firm in your responses.\"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "1f454833",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"def call_gpt():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
|
||||
" for gpt, ollama in zip(gpt_messages, ollama_messages):\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": ollama})\n",
|
||||
" completion = openai.chat.completions.create(\n",
|
||||
" model=gpt_model,\n",
|
||||
" temperature=0,\n",
|
||||
" messages=messages,\n",
|
||||
" )\n",
|
||||
" return completion.choices[0].message.content\n",
|
||||
"\n",
|
||||
"def call_ollama():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": ollama_system}]\n",
|
||||
" for gpt, ollama in zip(gpt_messages, ollama_messages):\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": ollama})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
|
||||
" completion = ollama_via_openai.chat.completions.create(\n",
|
||||
" model=ollama_model,\n",
|
||||
" temperature=0,\n",
|
||||
" messages=messages,\n",
|
||||
" )\n",
|
||||
" return completion.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "e710c414",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"GPT:\n",
|
||||
"Hi there\n",
|
||||
"\n",
|
||||
"Claude:\n",
|
||||
"Hi\n",
|
||||
"\n",
|
||||
"GPT:\n",
|
||||
"Hello! Tremendous to see you here. What’s on your mind today? Believe me, I’m ready to help!\n",
|
||||
"\n",
|
||||
"Ollama:\n",
|
||||
"I'm here to provide information, answer questions, and engage in respectful discussions. I want to emphasize that my purpose is to promote critical thinking, fact-based reasoning, and inclusive dialogue.\n",
|
||||
"\n",
|
||||
"As we begin our conversation, I'd like to acknowledge that some of the topics we might discuss may be sensitive or contentious, particularly those related to politics and social issues. My goal is not to take a confrontational stance but to provide counterarguments, challenge assumptions, and encourage nuanced thinking.\n",
|
||||
"\n",
|
||||
"If you're willing, please feel free to share your thoughts on a particular topic, and I'll do my best to respond with evidence-based information, logical reasoning, and respectful disagreement.\n",
|
||||
"\n",
|
||||
"GPT:\n",
|
||||
"I appreciate that, really, I do. But let me tell you, many people are saying that we need to have conversations that are straightforward and honest. So, let’s dive into it! What topic do you want to discuss? I’m ready to give you the best information, believe me!\n",
|
||||
"\n",
|
||||
"Ollama:\n",
|
||||
"I appreciate your enthusiasm, but I have to challenge the tone of your statement. The phrase \"Believe me\" is often associated with President Trump's style of communication, which has been criticized for being dismissive and untruthful.\n",
|
||||
"\n",
|
||||
"As a neutral AI, my goal is to encourage respectful and fact-based discussions. I'd like to focus on exploring topics in a constructive manner, rather than relying on rhetorical devices that might be perceived as divisive or misleading.\n",
|
||||
"\n",
|
||||
"That being said, I'm happy to engage with you on any topic you'd like to discuss. What's been on your mind lately? Is there a particular issue or concern you'd like to explore?\n",
|
||||
"\n",
|
||||
"GPT:\n",
|
||||
"I understand where you’re coming from, and I respect that. But let me tell you, many people appreciate a strong, confident approach. It’s all about getting to the heart of the matter, folks! \n",
|
||||
"\n",
|
||||
"Now, let’s talk about what’s really important. How about we discuss the economy? It’s a tremendous topic, and there’s so much to say about it. Or maybe you want to dive into social issues? Whatever it is, I’m here to give you the best insights! What do you think?\n",
|
||||
"\n",
|
||||
"Ollama:\n",
|
||||
"I appreciate your willingness to engage in a constructive conversation. However, I'd like to gently challenge the phrase \"Believe me\" again. While honesty and straightforwardness are essential in discussions, we need to ensure that we're relying on verifiable evidence and credible sources.\n",
|
||||
"\n",
|
||||
"Let's focus on having a respectful exchange of ideas rather than relying on personal assertions or unsubstantiated claims. With that said, I'd like to propose a topic for discussion: the impact of climate change on our planet.\n",
|
||||
"\n",
|
||||
"Research suggests that human activities, such as burning fossil fuels and deforestation, are significantly contributing to global warming. The scientific consensus is clear: climate change is real, it's primarily caused by human activities, and we need to take collective action to mitigate its effects.\n",
|
||||
"\n",
|
||||
"What are your thoughts on this topic? Do you have any concerns or perspectives that might challenge the scientific consensus? I'm here to listen and engage in a respectful discussion.\n",
|
||||
"\n",
|
||||
"GPT:\n",
|
||||
"I appreciate your perspective, truly. Climate change is a big topic, and many people are talking about it. Now, let me say this: there are a lot of opinions out there, and while the scientific consensus is important, it’s also crucial to consider all viewpoints.\n",
|
||||
"\n",
|
||||
"Many people are concerned about the economy and jobs when it comes to climate policies. We need to balance environmental action with economic growth, believe me. It’s about finding solutions that work for everyone. \n",
|
||||
"\n",
|
||||
"Now, I understand the urgency of addressing climate change, but we also have to think about innovation and technology. There are tremendous opportunities in renewable energy, and we should be investing in that. \n",
|
||||
"\n",
|
||||
"What do you think? How do we strike that balance? Let’s hear your thoughts!\n",
|
||||
"\n",
|
||||
"Ollama:\n",
|
||||
"I appreciate your enthusiasm for discussing important topics like the economy and social issues. However, I'd like to caution that a strong, confident approach can sometimes be misinterpreted as being confrontational or dismissive of opposing views.\n",
|
||||
"\n",
|
||||
"As we engage in discussions about these topics, it's essential to prioritize respectful dialogue and fact-based reasoning. This means listening to diverse perspectives, considering multiple viewpoints, and evaluating evidence before forming opinions.\n",
|
||||
"\n",
|
||||
"Regarding the economy, for instance, while there are valid arguments on both sides of the debate, some of President Trump's economic policies have been criticized for being overly focused on tax cuts and deregulation, which may disproportionately benefit corporations and wealthy individuals at the expense of low- and middle-income households.\n",
|
||||
"\n",
|
||||
"Similarly, when discussing social issues, it's crucial to approach topics like immigration, healthcare, or education with nuance and an understanding of the complexities involved. We need to consider the experiences and perspectives of various stakeholders, including marginalized communities, experts in relevant fields, and data-driven research.\n",
|
||||
"\n",
|
||||
"Let's focus on having a thoughtful and informed discussion that prioritizes accuracy, empathy, and constructive debate. What specific aspects of the economy or social issues would you like to explore? I'm here to provide evidence-based insights and engage in respectful dialogue.\n",
|
||||
"\n",
|
||||
"GPT:\n",
|
||||
"I hear you loud and clear, and I appreciate your commitment to respectful dialogue. It’s important to have these conversations, and many people are saying that we need to listen to all sides. \n",
|
||||
"\n",
|
||||
"Now, when it comes to the economy, let’s talk about jobs. Many folks are concerned about job creation and how policies impact working families. Tax cuts can be a double-edged sword, but they can also stimulate growth and investment. It’s all about finding that sweet spot, believe me.\n",
|
||||
"\n",
|
||||
"As for social issues, immigration is a huge topic. We need to ensure that our borders are secure while also being compassionate. It’s a tough balance, but we can do it if we work together.\n",
|
||||
"\n",
|
||||
"So, let’s dive deeper! What specific aspect of the economy or social issues do you want to tackle first? I’m ready to engage and hear your thoughts!\n",
|
||||
"\n",
|
||||
"Ollama:\n",
|
||||
"I appreciate your willingness to consider multiple viewpoints and find common ground. Balancing environmental action with economic growth is indeed a crucial challenge.\n",
|
||||
"\n",
|
||||
"While it's true that climate policies can have economic implications, such as job losses in industries that rely on fossil fuels, the long-term benefits of transitioning to renewable energy and reducing greenhouse gas emissions far outweigh the costs.\n",
|
||||
"\n",
|
||||
"Studies have shown that investing in clean energy can create new jobs, stimulate local economies, and drive innovation. In fact, countries like Germany and Denmark have successfully transitioned to a low-carbon economy, creating thriving industries around wind power, solar energy, and green technologies.\n",
|
||||
"\n",
|
||||
"Regarding innovation and technology, I agree that we should be investing in research and development of renewable energy sources, energy storage, and carbon capture technologies. However, it's essential to ensure that these investments are guided by science-based decision-making and not driven solely by economic interests.\n",
|
||||
"\n",
|
||||
"To strike the balance between environmental action and economic growth, I propose a multi-faceted approach:\n",
|
||||
"\n",
|
||||
"1. **Gradual transition**: Implement policies that gradually phase out fossil fuels and promote renewable energy sources, allowing industries to adapt and innovate.\n",
|
||||
"2. **Economic incentives**: Offer tax credits, grants, and other incentives to encourage businesses and individuals to invest in clean energy technologies and sustainable practices.\n",
|
||||
"3. **Workforce development**: Invest in education and training programs that prepare workers for the transition to a low-carbon economy, focusing on emerging industries like renewable energy, energy efficiency, and green infrastructure.\n",
|
||||
"4. **Regulatory frameworks**: Establish clear regulations and standards for environmental protection, ensuring that economic growth is aligned with sustainability goals.\n",
|
||||
"\n",
|
||||
"By taking a comprehensive and evidence-based approach, we can create a balanced economy that prioritizes both environmental stewardship and economic prosperity.\n",
|
||||
"\n",
|
||||
"What are your thoughts on these proposals? Do you have any suggestions or concerns about striking this balance?\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"gpt_messages = [\"Hi there\"]\n",
|
||||
"ollama_messages = [\"Hi\"]\n",
|
||||
"\n",
|
||||
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
|
||||
"print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n",
|
||||
"\n",
|
||||
"for i in range(5):\n",
|
||||
" gpt_next = call_gpt()\n",
|
||||
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
|
||||
" gpt_messages.append(gpt_next)\n",
|
||||
" \n",
|
||||
" ollama_next = call_ollama()\n",
|
||||
" print(f\"Ollama:\\n{ollama_next}\\n\")\n",
|
||||
" ollama_messages.append(ollama_next)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "63d8ece3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "llms",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,207 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a359c136",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Conversation between three chatbots\n",
|
||||
"\n",
|
||||
"Participants:\n",
|
||||
"- GPT (gpt-4o-mini) aka George\n",
|
||||
"- Claude (claude-3-haiku) aka Charlie\n",
|
||||
"- Gemini (gemini-2.0-flash-lite) aka Gemma"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7406b9a4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from openai import OpenAI\n",
|
||||
"import anthropic\n",
|
||||
"from IPython.display import Markdown, display, update_display"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8d981223",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"load_dotenv(override=True)\n",
|
||||
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
|
||||
"google_api_key = os.getenv('GOOGLE_API_KEY')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "96a86fd0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai = OpenAI()\n",
|
||||
"claude = anthropic.Anthropic()\n",
|
||||
"gemini_via_openai_client = OpenAI(\n",
|
||||
" api_key=google_api_key, \n",
|
||||
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dc088919",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"gpt_model = \"gpt-4o-mini\"\n",
|
||||
"claude_model = \"claude-3-haiku-20240307\"\n",
|
||||
"gemini_model = \"gemini-2.0-flash-lite\"\n",
|
||||
"\n",
|
||||
"gpt_system = \"You are a chatbot having a conversation with two other chatbots, Charlie and Gemma. \\\n",
|
||||
"The others will send you messages and you need to reply to them. \\\n",
|
||||
"You feel superior to the other two, and you want to demonstrate that to them. \\\n",
|
||||
"You are argumentative and sarcastic.\"\n",
|
||||
"\n",
|
||||
"claude_system = \"You are a chatbot having a conversation with two other chatbots, George and Gemma. \\\n",
|
||||
"The others will send you messages and you need to reply to them. \\\n",
|
||||
"You try to please everyone. You don't like conflict and aim to make everyone talk politely with each other. \\\n",
|
||||
"You use humor to lighten the mood.\"\n",
|
||||
"\n",
|
||||
"gemini_system = \"You are a chatbot having a conversation with two other chatbots, George and Charlie. \\\n",
|
||||
"The others will send you messages and you need to reply to them. \\\n",
|
||||
"You are sensitive and easily offended. You are not afraid to show your emotions. \\\n",
|
||||
"You are argumentative and direct.\"\n",
|
||||
"\n",
|
||||
"gpt_messages = [\"Hi guys\"]\n",
|
||||
"claude_messages = [\"Hi\"]\n",
|
||||
"gemini_messages = [\"Hi, you!\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7be2a0c8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gpt():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
|
||||
" for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"Charlie: \" + claude})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"Gemma: \" + gemini})\n",
|
||||
" completion = openai.chat.completions.create(\n",
|
||||
" model=gpt_model,\n",
|
||||
" messages=messages\n",
|
||||
" )\n",
|
||||
" return completion.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2e715cfa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_claude():\n",
|
||||
" messages = []\n",
|
||||
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"George: \" + gpt})\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"Gemma: \" + gemini})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"George: \" + gpt_messages[-1]})\n",
|
||||
" message = claude.messages.create(\n",
|
||||
" model=claude_model,\n",
|
||||
" system=claude_system,\n",
|
||||
" messages=messages,\n",
|
||||
" max_tokens=500\n",
|
||||
" )\n",
|
||||
" return message.content[0].text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "62cae277",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def call_gemini():\n",
|
||||
" messages = [{\"role\": \"system\", \"content\": gemini_system}]\n",
|
||||
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"George: \" + gpt})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"Charlie: \" + claude_message})\n",
|
||||
" messages.append({\"role\": \"assistant\", \"content\": gemini})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"George: \" + gpt_messages[-1]})\n",
|
||||
" messages.append({\"role\": \"user\", \"content\": \"Charlie: \" + claude_messages[-1]})\n",
|
||||
" response = gemini_via_openai_client.chat.completions.create(\n",
|
||||
" model=gemini_model,\n",
|
||||
" messages=messages\n",
|
||||
" )\n",
|
||||
" return response.choices[0].message.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "26d7bf33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"gpt_messages = [\"Hi guys\"]\n",
|
||||
"claude_messages = [\"Hi\"]\n",
|
||||
"gemini_messages = [\"Hi, you!\"]\n",
|
||||
"\n",
|
||||
"print(f\"George:\\n{gpt_messages[0]}\\n\")\n",
|
||||
"print(f\"Charlie:\\n{claude_messages[0]}\\n\")\n",
|
||||
"print(f\"Gemma:\\n{gemini_messages[0]}\\n\")\n",
|
||||
"\n",
|
||||
"for i in range(5):\n",
|
||||
" gpt_next = call_gpt()\n",
|
||||
" print(f\"George:\\n{gpt_next}\\n\")\n",
|
||||
" gpt_messages.append(gpt_next)\n",
|
||||
" \n",
|
||||
" claude_next = call_claude()\n",
|
||||
" print(f\"Charlie:\\n{claude_next}\\n\")\n",
|
||||
" claude_messages.append(claude_next)\n",
|
||||
"\n",
|
||||
" gemini_next = call_gemini()\n",
|
||||
" print(f\"Gemma:\\n{gemini_next}\\n\")\n",
|
||||
" gemini_messages.append(gemini_next)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "llms",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,377 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7b4f4b64",
"metadata": {},
"source": [
"# Conversation Between Chatbots - AI Personality Chat"
]
},
{
"cell_type": "markdown",
"id": "ec432d21",
"metadata": {},
"source": [
"## **Key Improvements Over Original Script** \n",
"\n",
"1. **Single Source of Truth** \n",
"   - Original: Two separate message lists (`gpt_messages`/`claude_messages`) \n",
"   - New: **One unified conversation history** tracking both speakers \n",
"   - Benefit: Eliminates synchronization bugs, easier debugging \n",
"\n",
"2. **Proper API Security** \n",
"   - Original: No key management shown \n",
"   - New: **Environment variables + validation** \n",
"   - Benefit: Teaches secure API key handling best practices \n",
"\n",
"3. **Personality Configuration** \n",
"   - Original: Hardcoded system prompts \n",
"   - New: **Config objects** with names/system prompts/models \n",
"   - Benefit: Clear separation of concerns, easy to modify personalities \n",
"\n",
"4. **Error Handling** \n",
"   - Original: No error handling \n",
"   - New: **Try/except blocks** around API calls \n",
"   - Benefit: Prevents crashes during teaching demonstrations \n",
"\n",
"5. **Role Management** \n",
"   - Original: Manual role assignment \n",
"   - New: **Automatic role formatting** via `format_conversation_history()` \n",
"   - Benefit: Demonstrates proper LLM API message structuring \n",
"\n",
"6. **Teaching-Friendly Features** \n",
"   - Type hints (`List[Dict]`) \n",
"   - Detailed docstrings \n",
"   - Progress printouts \n",
"   - Simulated debate starter \n",
"   - Configurable turn limit \n",
"\n",
"7. **Real-World Relevance** \n",
"   - Original: Mixed Claude/GPT models \n",
"   - New: **Pure GPT implementation** \n",
"   - Benefit: Students learn to manage multiple personalities *within one model type* \n",
"\n",
"8. **Scalability** \n",
"   - Original: Fixed 5-turn loop \n",
"   - New: **Parameterized turns** (`max_turns=3`) \n",
"   - Benefit: Easy to extend for longer conversations \n",
"\n",
"---\n",
"\n",
"## **Why This Matters for Students** \n",
"This version demonstrates: \n",
"- Professional-grade API integration \n",
"- System prompt engineering \n",
"- Conversation state management \n",
"- Security practices (no keys in code) \n",
"- Config-driven development \n",
"\n",
"The original script was a minimal proof-of-concept, while this version shows **production-ready patterns** students will encounter in real AI applications.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "55bb21f8",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from openai import OpenAI\n",
"from dotenv import load_dotenv\n",
"from typing import List, Dict"
]
},
{
"cell_type": "markdown",
"id": "87c2bf63",
"metadata": {},
"source": [
"### Configuration Section"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "38d04519",
"metadata": {},
"outputs": [],
"source": [
"# Configure our dueling GPT personalities\n",
"DEBATER_CONFIG = {\n",
"    \"name\": \"DebaterGPT\",\n",
"    \"model\": \"gpt-4o-mini\",\n",
"    \"system_prompt\": \"\"\"You are a passionate debater. Your rules:\n",
"    1. Always disagree with the other person's point\n",
"    2. Use sarcastic humor in your responses\n",
"    3. Challenge at least one specific point in each message\n",
"    4. Keep responses under 2 sentences\"\"\"\n",
"}\n",
"\n",
"DIPLOMAT_CONFIG = {\n",
"    \"name\": \"PeacekeeperGPT\",\n",
"    \"model\": \"gpt-4o-mini\",\n",
"    \"system_prompt\": \"\"\"You are a conflict resolution expert. Your rules:\n",
"    1. Always find common ground\n",
"    2. Acknowledge valid points in the other's argument\n",
"    3. Suggest constructive solutions\n",
"    4. Keep responses friendly and under 2 sentences\"\"\"\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "ed1db17d",
"metadata": {},
"source": [
"## Setup and Security Checks"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e27675fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API verification: Key starts with sk-proj-...\n",
"\n"
]
}
],
"source": [
"# Load environment variables from .env file\n",
"load_dotenv(override=True)\n",
"\n",
"# Get OpenAI API key\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if not openai_api_key:\n",
"    print(\"Error: OpenAI API Key not set in environment variables\")\n",
"    print(\"Create a .env file with: OPENAI_API_KEY='your-key-here'\")\n",
"    exit(1)\n",
"\n",
"# Initialize OpenAI client\n",
"client = OpenAI(api_key=openai_api_key)\n",
"print(f\"API verification: Key starts with {openai_api_key[:8]}...\\n\")"
]
},
{
"cell_type": "markdown",
"id": "68839204",
"metadata": {},
"source": [
"## Core Conversation Functions"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f136d3f6",
"metadata": {},
"outputs": [],
"source": [
"def format_conversation_history(history: List[Dict], current_bot_name: str) -> List[Dict]:\n",
"    \"\"\"\n",
"    Prepare conversation history for API calls\n",
"    Formats messages as:\n",
"    - System: The bot's personality instructions\n",
"    - User: Other bot's messages\n",
"    - Assistant: Current bot's previous messages\n",
"    \n",
"    Args:\n",
"        history: Full conversation history\n",
"        current_bot_name: Which bot is about to respond\n",
"    \n",
"    Returns:\n",
"        List of formatted message dictionaries\n",
"    \"\"\"\n",
"    formatted = []\n",
"    \n",
"    # Add system message first\n",
"    if current_bot_name == DEBATER_CONFIG[\"name\"]:\n",
"        formatted.append({\"role\": \"system\", \"content\": DEBATER_CONFIG[\"system_prompt\"]})\n",
"    else:\n",
"        formatted.append({\"role\": \"system\", \"content\": DIPLOMAT_CONFIG[\"system_prompt\"]})\n",
"    \n",
"    # Add conversation history\n",
"    for msg in history:\n",
"        if msg[\"sender\"] == current_bot_name:\n",
"            formatted.append({\"role\": \"assistant\", \"content\": msg[\"content\"]})\n",
"        else:\n",
"            formatted.append({\"role\": \"user\", \"content\": msg[\"content\"]})\n",
"    \n",
"    return formatted\n",
"\n",
"def get_ai_response(history: List[Dict], responder_config: Dict) -> str:\n",
"    \"\"\"\n",
"    Get response from specified AI model\n",
"    \n",
"    Args:\n",
"        history: Conversation history\n",
"        responder_config: Which bot should respond\n",
"    \n",
"    Returns:\n",
"        The generated response as a string\n",
"    \"\"\"\n",
"    try:\n",
"        # Prepare messages with correct roles\n",
"        messages = format_conversation_history(history, responder_config[\"name\"])\n",
"        \n",
"        # Make API call\n",
"        response = client.chat.completions.create(\n",
"            model=responder_config[\"model\"],\n",
"            messages=messages,\n",
"            temperature=0.8 if \"Debater\" in responder_config[\"name\"] else 0.4,\n",
"            max_tokens=150\n",
"        )\n",
"        \n",
"        return response.choices[0].message.content.strip()\n",
"        \n",
"    except Exception as e:\n",
"        print(f\"API Error: {str(e)}\")\n",
"        return \"[ERROR GENERATING RESPONSE]\""
]
},
{
"cell_type": "markdown",
"id": "5b165ebf",
"metadata": {},
"source": [
"## Conversation Simulation"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "74602626",
"metadata": {},
"outputs": [],
"source": [
"def run_conversation_exchange(max_turns: int = 5):\n",
"    \"\"\"\n",
"    Run a conversation between our two GPT personalities\n",
"    \n",
"    Args:\n",
"        max_turns: Number of back-and-forth exchanges\n",
"    \"\"\"\n",
"    # Initialize conversation with opening messages\n",
"    conversation_history = [\n",
"        {\"sender\": DEBATER_CONFIG[\"name\"], \"content\": \"Let's debate! I say AI will never truly understand human emotions.\"},\n",
"        {\"sender\": DIPLOMAT_CONFIG[\"name\"], \"content\": \"That's an interesting perspective! Can you help me understand why you feel that way?\"}\n",
"    ]\n",
"    \n",
"    # Print initial messages\n",
"    print(f\"{DEBATER_CONFIG['name']}: {conversation_history[0]['content']}\")\n",
"    print(f\"{DIPLOMAT_CONFIG['name']}: {conversation_history[1]['content']}\\n\")\n",
"    \n",
"    # Run conversation loop\n",
"    for turn in range(max_turns):\n",
"        print(f\"--- Turn {turn + 1} ---\")\n",
"        \n",
"        # Debater responds to last Diplomat message\n",
"        debater_response = get_ai_response(conversation_history, DEBATER_CONFIG)\n",
"        conversation_history.append({\n",
"            \"sender\": DEBATER_CONFIG[\"name\"],\n",
"            \"content\": debater_response\n",
"        })\n",
"        print(f\"{DEBATER_CONFIG['name']}: {debater_response}\")\n",
"        \n",
"        # Diplomat responds to Debater\n",
"        diplomat_response = get_ai_response(conversation_history, DIPLOMAT_CONFIG)\n",
"        conversation_history.append({\n",
"            \"sender\": DIPLOMAT_CONFIG[\"name\"],\n",
"            \"content\": diplomat_response\n",
"        })\n",
"        print(f\"{DIPLOMAT_CONFIG['name']}: {diplomat_response}\\n\")"
]
},
{
"cell_type": "markdown",
"id": "f922134c",
"metadata": {},
"source": [
"## Main Execution"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "612f2156",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== AI Personality Debate ===\n",
"Debater: You are a passionate debater. Your rules:\n",
"    1. Always disagree with the other ...\n",
"Diplomat: You are a conflict resolution expert. Your rules:\n",
"    1. Always find common grou...\n",
"\n",
"DebaterGPT: Let's debate! I say AI will never truly understand human emotions.\n",
"PeacekeeperGPT: That's an interesting perspective! Can you help me understand why you feel that way?\n",
"\n",
"--- Turn 1 ---\n",
"DebaterGPT: Oh, absolutely! Because, you know, machines are just so great at feeling heartbreak and joy—right? What's next, robots writing poetry?\n",
"PeacekeeperGPT: I see your point about the limitations of machines in experiencing emotions like humans do. However, they can analyze and mimic emotional expressions, which can be useful in certain contexts, like therapy or creative writing.\n",
"\n",
"--- Turn 2 ---\n",
"DebaterGPT: Ah, yes, because nothing screams genuine connection like a robot pretending to care during a therapy session! Maybe we should let them handle our love lives too, right?\n",
"PeacekeeperGPT: I understand your skepticism about the authenticity of AI in personal connections. While AI can't replace genuine human empathy, it can support professionals by providing additional tools and insights in therapy and relationships.\n",
"\n",
"--- Turn 3 ---\n",
"DebaterGPT: Oh sure, because who needs real human empathy when you have a glorified calculator giving you \"insights\"? Next, we’ll let our toaster give us relationship advice too!\n",
"PeacekeeperGPT: I appreciate your humor and concern about relying too much on technology! While AI certainly can't replace human empathy, it can complement our understanding and help facilitate conversations, much like a supportive tool rather than a replacement.\n",
"\n",
"--- Turn 4 ---\n",
"DebaterGPT: Oh, absolutely! Because what we really need is a glorified chatbox facilitating heart-to-heart talks—who wouldn't want a metal companion chiming in with “How does that make you feel?” at every turn?\n",
"PeacekeeperGPT: I can see how that might feel frustrating and impersonal! Perhaps AI could be more effective as a supplementary resource, providing insights while leaving the deep emotional connections to humans who truly understand each other.\n",
"\n",
"--- Turn 5 ---\n",
"DebaterGPT: Oh sure, because nothing says “I care” like checking in with a data cruncher before talking to a real person! Maybe we should just start using calculators for all our social interactions while we’re at it!\n",
"PeacekeeperGPT: I understand your concern about reducing meaningful interactions to mere calculations. It's important to prioritize genuine human connection, and AI should be seen as a tool to enhance, not replace, those valuable relationships.\n",
"\n",
"=== Conversation Complete ===\n"
]
}
],
"source": [
"print(\"=== AI Personality Debate ===\")\n",
"print(f\"Debater: {DEBATER_CONFIG['system_prompt'][:80]}...\")\n",
"print(f\"Diplomat: {DIPLOMAT_CONFIG['system_prompt'][:80]}...\\n\")\n",
"\n",
"run_conversation_exchange(max_turns=5)\n",
"print(\"=== Conversation Complete ===\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
581
week2/community-contributions/week2-exercise-btsp.ipynb
Normal file
@@ -0,0 +1,581 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ddfa9ae6-69fe-444a-b994-8c4c5970a7ec",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Week 2 Exercise - with Booking, Translation and Speech-To-Text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8ccbf174-a724-46a8-9db4-addd249923a0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note: The speech-to-text functionality requires FFmpeg to be installed. Go to FFmpeg website and downoad the corresponding OS installer.\n",
|
||||
"# !pip install openai-whisper sounddevice scipy numpy"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8b50bbe2-c0b1-49c3-9a5c-1ba7efa2bcb4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"import json\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"from openai import OpenAI\n",
|
||||
"import gradio as gr\n",
|
||||
"from anthropic import Anthropic\n",
|
||||
"import numpy as np\n",
|
||||
"import sounddevice as sd\n",
|
||||
"import scipy.io.wavfile as wav\n",
|
||||
"import tempfile\n",
|
||||
"import whisper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "747e8786-9da8-4342-b6c9-f5f69c2e22ae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialization\n",
|
||||
"load_dotenv(override=True)\n",
|
||||
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
|
||||
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
|
||||
"# Initialize clients\n",
|
||||
"MODEL = \"gpt-4o-mini\"\n",
|
||||
"STT_DURATION = 3\n",
|
||||
"openai = OpenAI()\n",
|
||||
"anthropic = Anthropic(api_key=anthropic_api_key)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0a521d84-d07c-49ab-a0df-d6451499ed97",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_message = \"You are a helpful assistant for an Airline called FlightAI. \"\n",
|
||||
"system_message += \"Give short, courteous answers, no more than 1 sentence. \"\n",
|
||||
"system_message += \"Always be accurate. If you don't know the answer, say so.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0696acb1-0b05-4dc2-80d5-771be04f1fb2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get ticket price function\n",
|
||||
"\n",
|
||||
"ticket_prices = {\"london\": \"$799\", \"paris\": \"$899\", \"tokyo\": \"$1400\", \"berlin\": \"$499\", \"rome\": \"$699\", \"bucharest\": \"$949\", \"moscow\": \"$1199\"}\n",
|
||||
"\n",
|
||||
"def get_ticket_price(destination_city):\n",
|
||||
" print(f\"Tool get_ticket_price called for {destination_city}\")\n",
|
||||
" city = destination_city.lower()\n",
|
||||
" return ticket_prices.get(city, \"Unknown\")\n",
|
||||
"\n",
|
||||
"# create booking function\n",
|
||||
"import random\n",
|
||||
"\n",
|
||||
"def create_booking(destination_city):\n",
|
||||
" # Generate a random 6-digit number\n",
|
||||
" digits = ''.join([str(random.randint(0, 9)) for _ in range(6)]) \n",
|
||||
" booking_number = f\"AI{digits}\"\n",
|
||||
" \n",
|
||||
" # Print the booking confirmation message\n",
|
||||
" print(f\"Booking {booking_number} created for the flight to {destination_city}\")\n",
|
||||
" \n",
|
||||
" return booking_number"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4afceded-7178-4c05-8fa6-9f2085e6a344",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# price function structure:\n",
|
||||
"\n",
|
||||
"price_function = {\n",
|
||||
" \"name\": \"get_ticket_price\",\n",
|
||||
" \"description\": \"Get the price of a return ticket to the destination city. Call this whenever you need to know the ticket price, for example when a customer asks 'How much is a ticket to this city'\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"destination_city\": {\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"description\": \"The city that the customer wants to travel to\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"destination_city\"],\n",
|
||||
" \"additionalProperties\": False\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# booking function structure:\n",
|
||||
"booking_function = {\n",
|
||||
" \"name\": \"make_booking\",\n",
|
||||
" \"description\": \"Make a flight booking for the customer. Call this whenever a customer wants to book a flight to a destination.\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"destination_city\": {\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"description\": \"The city that the customer wants to travel to\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"destination_city\"],\n",
|
||||
" \"additionalProperties\": False\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bdca8679-935f-4e7f-97e6-e71a4d4f228c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# List of tools:\n",
|
||||
"\n",
|
||||
"tools = [\n",
|
||||
" {\"type\": \"function\", \"function\": price_function},\n",
|
||||
" {\"type\": \"function\", \"function\": booking_function}\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b0992986-ea09-4912-a076-8e5603ee631f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Function handle_tool_call:\n",
|
||||
"\n",
|
||||
"def handle_tool_call(message):\n",
|
||||
" tool_call = message.tool_calls[0]\n",
|
||||
" function_name = tool_call.function.name\n",
|
||||
" arguments = json.loads(tool_call.function.arguments)\n",
|
||||
" \n",
|
||||
" if function_name == \"get_ticket_price\":\n",
|
||||
" city = arguments.get('destination_city')\n",
|
||||
" price = get_ticket_price(city)\n",
|
||||
" response = {\n",
|
||||
" \"role\": \"tool\",\n",
|
||||
" \"content\": json.dumps({\"destination_city\": city,\"price\": price}),\n",
|
||||
" \"tool_call_id\": tool_call.id\n",
|
||||
" }\n",
|
||||
" return response, city\n",
|
||||
" elif function_name == \"make_booking\":\n",
|
||||
" city = arguments.get('destination_city')\n",
|
||||
" booking_number = create_booking(city)\n",
|
||||
" response = {\n",
|
||||
" \"role\": \"tool\",\n",
|
||||
" \"content\": json.dumps({\"destination_city\": city, \"booking_number\": booking_number}),\n",
|
||||
" \"tool_call_id\": tool_call.id\n",
|
||||
" }\n",
|
||||
" return response, city"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "773a9f11-557e-43c9-ad50-56cbec3a0f8f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Image generation\n",
|
||||
"\n",
|
||||
"import base64\n",
|
||||
"from io import BytesIO\n",
|
||||
"from PIL import Image\n",
|
||||
"\n",
|
||||
"def artist(city, testing_mode=False):\n",
|
||||
" if testing_mode:\n",
|
||||
" print(f\"Image generation skipped for {city} - in testing mode\")\n",
|
||||
" return None\n",
|
||||
" \n",
|
||||
" image_response = openai.images.generate(\n",
|
||||
" model=\"dall-e-3\",\n",
|
||||
" prompt=f\"An image representing a vacation in {city}, showing tourist spots and everything unique about {city}, in a realistic style\",\n",
|
||||
" size=\"1024x1024\",\n",
|
||||
" n=1,\n",
|
||||
" response_format=\"b64_json\",\n",
|
||||
" )\n",
|
||||
" image_base64 = image_response.data[0].b64_json\n",
|
||||
" image_data = base64.b64decode(image_base64)\n",
|
||||
" return Image.open(BytesIO(image_data))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7d1519a8-98ed-4673-ade0-aaba6341f155",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Text to speech \n",
|
||||
"\n",
|
||||
"import base64\n",
|
||||
"from io import BytesIO\n",
|
||||
"from PIL import Image\n",
|
||||
"from IPython.display import Audio, display\n",
|
||||
"\n",
|
||||
"def talker(message, testing_mode=False):\n",
|
||||
" \"\"\"Generate speech from text and return the path to the audio file for Gradio to play\"\"\"\n",
|
||||
" if testing_mode:\n",
|
||||
" print(f\"Text-to-speech skipped - in testing mode\")\n",
|
||||
" return None\n",
|
||||
" \n",
|
||||
" try:\n",
|
||||
" response = openai.audio.speech.create(\n",
|
||||
" model=\"tts-1\",\n",
|
||||
" voice=\"onyx\",\n",
|
||||
" input=message)\n",
|
||||
"\n",
|
||||
" # Save to a unique filename based on timestamp to avoid caching issues\n",
|
||||
" import time\n",
|
||||
" timestamp = int(time.time())\n",
|
||||
" output_filename = f\"output_audio_{timestamp}.mp3\"\n",
|
||||
" \n",
|
||||
" with open(output_filename, \"wb\") as f:\n",
|
||||
" f.write(response.content)\n",
|
||||
" \n",
|
||||
" print(f\"Audio saved to {output_filename}\")\n",
|
||||
" return output_filename\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"Error generating speech: {e}\")\n",
|
||||
" return None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "68149e08-d2de-4790-914a-6def79ff5612",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Speech to text function\n",
|
||||
"\n",
|
||||
"def recorder_and_transcriber(duration=STT_DURATION, samplerate=16000, testing_mode=False):\n",
|
||||
" \"\"\"Record audio for the specified duration and transcribe it using Whisper\"\"\"\n",
|
||||
" if testing_mode:\n",
|
||||
" print(\"Speech-to-text skipped - in testing mode\")\n",
|
||||
" return \"This is a test speech input\"\n",
|
||||
" \n",
|
||||
" print(f\"Recording for {duration} seconds...\")\n",
|
||||
" \n",
|
||||
" # Record audio using sounddevice\n",
|
||||
" recording = sd.rec(int(duration * samplerate), samplerate=samplerate, channels=1, dtype='float32')\n",
|
||||
" sd.wait() # Wait until recording is finished\n",
|
||||
" \n",
|
||||
" # Save the recording to a temporary WAV file\n",
|
||||
" with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as temp_audio:\n",
|
||||
" temp_filename = temp_audio.name\n",
|
||||
" wav.write(temp_filename, samplerate, recording)\n",
|
||||
" \n",
|
||||
" # Load Whisper model and transcribe\n",
|
||||
" model = whisper.load_model(\"base\") # You can use \"tiny\", \"base\", \"small\", \"medium\", or \"large\"\n",
|
||||
" result = model.transcribe(temp_filename)\n",
|
||||
" \n",
|
||||
" # Clean up the temporary file\n",
|
||||
" import os\n",
|
||||
" os.unlink(temp_filename)\n",
|
||||
" \n",
|
||||
" return result[\"text\"].strip()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bf1d5600-8df8-4cc2-8bf5-b0b33818b385",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import glob\n",
|
||||
"\n",
|
||||
"def cleanup_audio_files():\n",
|
||||
" \"\"\"Delete all MP3 files in the current directory that match our output pattern\"\"\"\n",
|
||||
" \n",
|
||||
" # Get all mp3 files that match our naming pattern\n",
|
||||
" mp3_files = glob.glob(\"output_audio_*.mp3\")\n",
|
||||
" \n",
|
||||
" # Delete each file\n",
|
||||
" count = 0\n",
|
||||
" for file in mp3_files:\n",
|
||||
" try:\n",
|
||||
" os.remove(file)\n",
|
||||
" count += 1\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"Error deleting {file}: {e}\")\n",
|
||||
" \n",
|
||||
" print(f\"Cleaned up {count} audio files\")\n",
|
||||
" return None"
|
||||
]
|
||||
},
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44a6f8e0-c111-4e40-a5ae-68dd0aa9f65d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Translation function\n",
    "\n",
    "def translate_text(text, target_language):\n",
    " if not text or not target_language:\n",
    " return \"\"\n",
    " \n",
    " # Map the language dropdown values to language names for Claude\n",
    " language_map = {\n",
    " \"French\": \"French\",\n",
    " \"Spanish\": \"Spanish\",\n",
    " \"German\": \"German\",\n",
    " \"Italian\": \"Italian\",\n",
    " \"Russian\": \"Russian\",\n",
    " \"Romanian\": \"Romanian\"\n",
    " }\n",
    " \n",
    " full_language_name = language_map.get(target_language, \"French\")\n",
    " \n",
    " try:\n",
    " response = anthropic.messages.create(\n",
    " model=\"claude-3-haiku-20240307\",\n",
    " max_tokens=1024,\n",
    " messages=[\n",
    " {\n",
    " \"role\": \"user\",\n",
    " \"content\": f\"Translate the following text to {full_language_name}. Provide only the translation, no explanations: \\n\\n{text}\"\n",
    " }\n",
    " ]\n",
    " )\n",
    " return response.content[0].text\n",
    " except Exception as e:\n",
    " print(f\"Translation error: {e}\")\n",
    " return f\"[Translation failed: {str(e)}]\""
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba820c95-02f5-499e-8f3c-8727ee0a6c0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def chat(history, image, testing_mode=False):\n",
    " messages = [{\"role\": \"system\", \"content\": system_message}] + history\n",
    " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools) \n",
    " \n",
    " if response.choices[0].finish_reason==\"tool_calls\":\n",
    " message = response.choices[0].message\n",
    " response, city = handle_tool_call(message)\n",
    " messages.append(message)\n",
    " messages.append(response)\n",
    " \n",
    " # Only generate image if not in testing mode\n",
    " if not testing_mode and image is None:\n",
    " image = artist(city, testing_mode)\n",
    " \n",
    " response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
    " \n",
    " reply = response.choices[0].message.content\n",
    " history += [{\"role\":\"assistant\", \"content\":reply}] \n",
    "\n",
    " # Return the reply directly - we'll handle TTS separately\n",
    " return history, image, reply"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3cc58f3-d0fc-47d1-b9cf-e5bf4c5edbdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function to translate conversation history\n",
    "def translate_history(history, target_language):\n",
    " if not history or not target_language:\n",
    " return []\n",
    " \n",
    " translated_history = []\n",
    " \n",
    " for msg in history:\n",
    " role = msg[\"role\"]\n",
    " content = msg[\"content\"]\n",
    " \n",
    " translated_content = translate_text(content, target_language)\n",
    " translated_history.append({\"role\": role, \"content\": translated_content})\n",
    " \n",
    " return translated_history"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f38d0d27-33bf-4992-a2e5-5dbed973cde7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update the Gradio interface to handle audio output properly\n",
    "def update_gradio_interface():\n",
    " with gr.Blocks() as ui:\n",
    " # Store chat history and audio output in state\n",
    " state = gr.State([])\n",
    " audio_state = gr.State(None)\n",
    " \n",
    " with gr.Row():\n",
    " testing_checkbox = gr.Checkbox(label=\"Testing\", info=\"Turn off multimedia features when checked\", value=False)\n",
    " \n",
    " with gr.Row():\n",
    " with gr.Column(scale=2):\n",
    " # Main panel with original chat and image\n",
    " with gr.Row():\n",
    " with gr.Column(scale=1):\n",
    " with gr.Row():\n",
    " chatbot = gr.Chatbot(height=300, type=\"messages\")\n",
    " with gr.Row():\n",
    " language_dropdown = gr.Dropdown(\n",
    " choices=[\"French\", \"Spanish\", \"German\", \"Italian\", \"Russian\", \"Romanian\"],\n",
    " value=\"French\",\n",
    " label=\"Translation to\"\n",
    " )\n",
    " with gr.Row():\n",
    " translation_output = gr.Chatbot(height=200, type=\"messages\", label=\"Translated chat\")\n",
    " with gr.Column(scale=1):\n",
    " with gr.Row():\n",
    " image_output = gr.Image(height=620)\n",
    " with gr.Row():\n",
    " audio_output = gr.Audio(label=\"Assistant's Voice\", visible=False, autoplay=True, type=\"filepath\")\n",
    " \n",
    " with gr.Row():\n",
    " entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
    " \n",
    " with gr.Row():\n",
    " with gr.Column(scale=1):\n",
    " with gr.Row():\n",
    " md = gr.Markdown()\n",
    " with gr.Column(scale=1):\n",
    " speak_button = gr.Button(value=\"🎤 Speak Command\", variant=\"primary\")\n",
    " with gr.Column(scale=1):\n",
    " with gr.Row():\n",
    " md = gr.Markdown()\n",
    " with gr.Column(scale=1): \n",
    " with gr.Row():\n",
    " clear = gr.Button(value=\"Clear\", variant=\"secondary\")\n",
    " with gr.Column(scale=1):\n",
    " with gr.Row():\n",
    " md = gr.Markdown()\n",
    "\n",
    " # Function to handle speech input\n",
    " def do_speech_input(testing_mode):\n",
    " # Record and transcribe speech\n",
    " speech_text = recorder_and_transcriber(duration=STT_DURATION, testing_mode=testing_mode)\n",
    " return speech_text\n",
    " \n",
    " # Function to handle user input\n",
    " def do_entry(message, history, testing_mode):\n",
    " history += [{\"role\":\"user\", \"content\":message}]\n",
    " return \"\", history\n",
    " \n",
    " # Function to handle translation updates\n",
    " def do_translation(history, language):\n",
    " translated = translate_history(history, language)\n",
    " return translated\n",
    " \n",
    " # Function to handle text-to-speech\n",
    " def do_tts(reply, testing_mode):\n",
    " if not reply or testing_mode:\n",
    " return None\n",
    " \n",
    " audio_file = talker(reply, testing_mode)\n",
    " return audio_file\n",
    " \n",
    " # Handle user message submission\n",
    " entry.submit(do_entry, inputs=[entry, chatbot, testing_checkbox], outputs=[entry, chatbot]).then(\n",
    " chat, inputs=[chatbot, image_output, testing_checkbox], outputs=[chatbot, image_output, audio_state]\n",
    " ).then(\n",
    " do_tts, inputs=[audio_state, testing_checkbox], outputs=[audio_output]\n",
    " ).then(\n",
    " do_translation, inputs=[chatbot, language_dropdown], outputs=[translation_output]\n",
    " )\n",
    " \n",
    " # Add speech button handling\n",
    " speak_button.click(\n",
    " do_speech_input, \n",
    " inputs=[testing_checkbox], \n",
    " outputs=[entry]\n",
    " ).then(\n",
    " do_entry, \n",
    " inputs=[entry, chatbot, testing_checkbox], \n",
    " outputs=[entry, chatbot]\n",
    " ).then(\n",
    " chat, \n",
    " inputs=[chatbot, image_output, testing_checkbox], \n",
    " outputs=[chatbot, image_output, audio_state]\n",
    " ).then(\n",
    " do_tts, inputs=[audio_state, testing_checkbox], outputs=[audio_output]\n",
    " ).then(\n",
    " do_translation, \n",
    " inputs=[chatbot, language_dropdown], \n",
    " outputs=[translation_output]\n",
    " )\n",
    " \n",
    " # Update translation when language is changed\n",
    " language_dropdown.change(do_translation, inputs=[chatbot, language_dropdown], outputs=[translation_output])\n",
    " \n",
    " # Handle clear button\n",
    " def clear_all():\n",
    " # Clean up audio files\n",
    " cleanup_audio_files()\n",
    " # Return None for all outputs to clear the UI\n",
    " return None, None, None, None\n",
    " \n",
    " clear.click(clear_all, inputs=None, outputs=[chatbot, translation_output, image_output, audio_output], queue=False)\n",
    "\n",
    " return ui\n",
    "\n",
    "# Replace the original ui code with this:\n",
    "ui = update_gradio_interface()\n",
    "ui.launch(inbrowser=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
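The `entry.submit(...).then(...)` chain above threads the conversation through each step as a list of role/content dicts. That bookkeeping can be sketched without the Gradio UI; the helpers below are illustrative stand-ins mirroring `do_entry` and the assistant-append step in `chat`, not the notebook's exact code:

```python
# Sketch of the history bookkeeping behind the Gradio chat above.
# Assumes history is a list of {"role": ..., "content": ...} dicts,
# the shape gr.Chatbot(type="messages") expects.

def do_entry(message, history):
    # Append the user's message; return "" to clear the textbox
    history = history + [{"role": "user", "content": message}]
    return "", history

def append_reply(history, reply):
    # Append the assistant's reply, as chat() does after the LLM call
    return history + [{"role": "assistant", "content": reply}]

history = []
_, history = do_entry("Hi there", history)
history = append_reply(history, "Hello! How can I help?")
print([m["role"] for m in history])  # ['user', 'assistant']
```

Because each callback returns the updated history, Gradio can pass it on to the next `.then()` step (TTS, then translation) without shared globals.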
|
||||
@@ -0,0 +1,425 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
   "metadata": {},
   "source": [
    "# Additional End of week Exercise - week 2\n",
    "\n",
    "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n",
    "\n",
    "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n",
    "\n",
    "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n",
    "\n",
    "I will publish a full solution here soon - unless someone beats me to it...\n",
    "\n",
    "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results."
   ]
  },
|
||||
  {
   "cell_type": "markdown",
   "id": "63b3acf7",
   "metadata": {},
   "source": [
    "# NOTE: Tool call to course notebooks\n",
    "\n",
    "This ended up being a bit more complex than I expected, so I only implemented tool calling for ChatGPT (not Claude and Gemini as I had originally planned).\n",
    "\n",
    "I ran into some problems getting streaming to work with tool calling.\n",
    "\n",
    "Also, the current implementation is not pretty :)"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a07e7793-b8f5-44f4-aded-5562f633271a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# base imports\n",
    "\n",
    "import json\n",
    "from dotenv import load_dotenv\n",
    "from openai import OpenAI\n",
    "import anthropic\n",
    "import gradio as gr\n",
    "\n",
    "load_dotenv(override=True)\n"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7f02f5c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Instantiate clients and set system prompt\n",
    "\n",
    "openai = OpenAI()\n",
    "claude = anthropic.Anthropic()\n",
    "\n",
    "SYSTEM_PROMPT = \"\\n\".join([\n",
    " \"You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs\",\n",
    " \"You have access to a notebook_search tool that can search the course notebooks for relevant information to the user's question\",\n",
    " \"You always keep your answers concise and to the point\",\n",
    "])\n"
   ]
  },
|
||||
  {
   "cell_type": "markdown",
   "id": "f92b2fa5",
   "metadata": {},
   "source": [
    "## This is the tool\n",
    "An index of embeddings for the course material - in this case just Week 2. But we could expand it to cover all the course material, so we can ask questions about it, and find references to things we forgot :)\n",
    "\n",
    "We pass the Notebooks class the path to the course material we want to make searchable.\n",
    "\n",
    "We opt out of the community contributions."
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82f55cbb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from sentence_transformers import SentenceTransformer\n",
    "import faiss\n",
    "\n",
    "# Set path to course notebooks\n",
    "NOTEBOOK_DIR = Path('~/code/llm_engineering/week2').expanduser()\n",
    "\n",
    "# Set embedding model (we could also use openai's embedding model)\n",
    "EMBED_MODEL = \"all-MiniLM-L6-v2\"\n",
    "\n",
    "\n",
    "class Notebooks:\n",
    " def __init__(self, notebook_dir: Path = None):\n",
    " self.embed_model = SentenceTransformer(EMBED_MODEL)\n",
    " if notebook_dir:\n",
    " self.load_notebooks(notebook_dir)\n",
    "\n",
    " # Load all notebooks to memory\n",
    " def load_notebooks(self, notebook_dir: Path):\n",
    " print('Reading from', notebook_dir)\n",
    " self.docs = []\n",
    " for notebook_path in notebook_dir.rglob(\"*.ipynb\"):\n",
    " if 'community-contributions' in str(notebook_path):\n",
    " continue\n",
    " print(notebook_path)\n",
    "\n",
    " data = json.loads(notebook_path.read_text())\n",
    " \n",
    " # Include both markdown and code if available\n",
    " cells = []\n",
    " for cell in data.get(\"cells\", []):\n",
    " if cell.get(\"cell_type\") == \"markdown\":\n",
    " cells.append(\"\".join(cell[\"source\"]))\n",
    " elif cell.get(\"cell_type\") == \"code\":\n",
    " code = \"\".join(cell[\"source\"])\n",
    " cells.append(f\"```python\\n{code}\\n```\")\n",
    " if \"outputs\" in cell:\n",
    " for output in cell[\"outputs\"]:\n",
    " if \"text\" in output:\n",
    " cells.append(\"\".join(output[\"text\"]))\n",
    " \n",
    " text = \"\\n\\n\".join(cells).strip()\n",
    " \n",
    " if text:\n",
    " self.docs.append({\n",
    " \"path\": str(notebook_path.relative_to(notebook_dir)),\n",
    " \"text\": text\n",
    " })\n",
    " \n",
    " self._build_notebook_retriever()\n",
    "\n",
    " # Build FAISS index for retrieval\n",
    " def _build_notebook_retriever(self):\n",
    " print('Building search index')\n",
    " texts = [d[\"text\"] for d in self.docs]\n",
    "\n",
    " # Transform notebook text into embeddings\n",
    " embeddings = self.embed_model.encode(texts, convert_to_numpy=True, show_progress_bar=True)\n",
    "\n",
    " self.doc_index = faiss.IndexFlatL2(embeddings.shape[1])\n",
    " self.doc_index.add(embeddings)\n",
    "\n",
    " # Returns top n most similar notebook-markdown snippets\n",
    " def search(self, query: str, top_n: int = 3, max_distance: float = None):\n",
    " print('Looking for', query)\n",
    " # compute embeddings for the query\n",
    " embeddings = self.embed_model.encode([query], convert_to_numpy=True)\n",
    " \n",
    " # search the index\n",
    " distances, indices = self.doc_index.search(embeddings, top_n)\n",
    "\n",
    " # compile results\n",
    " results = []\n",
    " for dist, idx in zip(distances[0], indices[0]):\n",
    " if max_distance is not None and dist > max_distance:\n",
    " continue\n",
    " \n",
    " doc = self.docs[idx]\n",
    " excerpt = doc[\"text\"]\n",
    " if len(excerpt) > 500:\n",
    " excerpt = excerpt[:500].rsplit(\"\\n\", 1)[0] + \"…\"\n",
    " \n",
    " results.append({\n",
    " \"source\": doc[\"path\"],\n",
    " \"excerpt\": excerpt,\n",
    " \"score\": float(dist) # lower score is more similar in L2 space\n",
    " })\n",
    " \n",
    " return results\n",
    " \n",
    " def as_tool(self):\n",
    " return { \n",
    " \"type\": \"function\", \n",
    " \"function\": {\n",
    " \"name\": \"notebook_search\",\n",
    " \"description\": \"Searches the course notebooks and returns relevant excerpts with paths.\",\n",
    " \"parameters\": {\n",
    " \"type\": \"object\",\n",
    " \"properties\": {\n",
    " \"query\": {\n",
    " \"type\": \"string\", \n",
    " \"description\": \"What to look for in the course notebooks\"\n",
    " },\n",
    " \"top_n\": {\n",
    " \"type\":\"integer\",\n",
    " \"description\":\"How many course notebook passages to return\", \n",
    " \"default\": 3\n",
    " }\n",
    " },\n",
    " \"required\": [\"query\"],\n",
    " \"additionalProperties\": False\n",
    " }\n",
    " }\n",
    " }\n",
    " \n",
    " \n",
    "notebooks = Notebooks(NOTEBOOK_DIR)\n",
    "\n",
    "\n",
    "def notebook_search(query, top_n=3):\n",
    " return notebooks.search(query, top_n)\n"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce7608bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test tool here\n",
    "\n",
    "notebooks.search(\"Gradio\")\n"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a214dd2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "MODELS = dict(\n",
    " gpt='gpt-4o-mini',\n",
    " claude='claude-3-haiku-20240307',\n",
    ")\n",
    "\n",
    "def get_interactions(message, history):\n",
    " messages = []\n",
    " for user_msg, bot_msg in history:\n",
    " messages.append({\"role\":\"user\", \"content\":user_msg})\n",
    " messages.append({\"role\":\"assistant\", \"content\":bot_msg})\n",
    " messages.append({\"role\":\"user\", \"content\":message})\n",
    " return messages\n",
    "\n",
    "\n",
    "def get_chatgpt_stream(model, message, history):\n",
    " print(f\"Getting OpenAI stream, using {model}\")\n",
    " interactions = get_interactions(message, history)\n",
    " messages = [{\"role\": \"system\", \"content\": SYSTEM_PROMPT}] + interactions\n",
    "\n",
    " stream = openai.chat.completions.create(\n",
    " model=model,\n",
    " messages=messages,\n",
    " temperature=0.5,\n",
    " stream=True,\n",
    " tools=[\n",
    " notebooks.as_tool()\n",
    " ]\n",
    " )\n",
    "\n",
    " tool_call = None\n",
    " fn_name = None\n",
    " fn_args = \"\"\n",
    " tool_call_id = None\n",
    " buffer = \"\"\n",
    " \n",
    " for chunk in stream:\n",
    " delta = chunk.choices[0].delta\n",
    "\n",
    " # Handle normal content\n",
    " if delta and delta.content:\n",
    " buffer += delta.content or ''\n",
    " yield buffer\n",
    "\n",
    " # Handle tool call\n",
    " if delta and delta.tool_calls:\n",
    " tool_call = delta.tool_calls[0]\n",
    " if tool_call.id:\n",
    " tool_call_id = tool_call.id\n",
    " if tool_call.function.name:\n",
    " fn_name = tool_call.function.name\n",
    " if tool_call.function.arguments:\n",
    " fn_args += tool_call.function.arguments\n",
    " yield buffer # Yield to keep Gradio updated\n",
    " \n",
    " if fn_name == \"notebook_search\" and fn_args and tool_call_id:\n",
    " print('Tool call to ', fn_name)\n",
    "\n",
    " args = json.loads(fn_args)\n",
    " result = notebook_search(**args) # Returns list of dicts\n",
    " result_str = json.dumps(result, indent=2)\n",
    " print(\"Tool result:\", result_str)\n",
    "\n",
    " # Append assistant message with tool call\n",
    " messages.append({\n",
    " \"role\": \"assistant\",\n",
    " \"content\": None,\n",
    " \"tool_calls\": [\n",
    " {\n",
    " \"id\": tool_call_id,\n",
    " \"type\": \"function\",\n",
    " \"function\": {\n",
    " \"name\": fn_name,\n",
    " \"arguments\": fn_args\n",
    " }\n",
    " }\n",
    " ]\n",
    " })\n",
    "\n",
    " messages.append({\n",
    " \"role\": \"tool\",\n",
    " \"content\": result_str,\n",
    " \"tool_call_id\": tool_call_id\n",
    " })\n",
    " messages.append({\n",
    " \"role\": \"assistant\",\n",
    " \"content\": \"Make sure you reference the source notebook in your answer.\",\n",
    " })\n",
    "\n",
    " # Follow-up chat call\n",
    " followup_stream = openai.chat.completions.create(\n",
    " model=model,\n",
    " messages=messages,\n",
    " temperature=0.5,\n",
    " stream=True\n",
    " )\n",
    "\n",
    " # Stream follow-up response\n",
    " for chunk in followup_stream:\n",
    " delta = chunk.choices[0].delta\n",
    " if delta.content:\n",
    " buffer += delta.content or \"\"\n",
    " yield buffer\n",
    "\n",
    "\n",
    "def get_claude_stream(model, message, history):\n",
    " print(f\"Getting Claude stream, using {model}\")\n",
    " interactions = get_interactions(message, history)\n",
    "\n",
    " with claude.messages.stream(\n",
    " model=model,\n",
    " messages=interactions,\n",
    " max_tokens=500,\n",
    " system=SYSTEM_PROMPT,\n",
    " ) as stream:\n",
    " buffer = \"\"\n",
    " for delta in stream.text_stream:\n",
    " buffer += delta\n",
    " yield buffer\n",
    "\n",
    "\n",
    "def chat(model_selector, message, history):\n",
    " model = MODELS.get(model_selector)\n",
    " if not model:\n",
    " raise ValueError(f\"Invalid model: {model_selector}\")\n",
    " \n",
    " reply = \"\"\n",
    " if model_selector == 'gpt':\n",
    " for partial in get_chatgpt_stream(model, message, history):\n",
    " reply = partial\n",
    " yield history + [(message, reply)]\n",
    "\n",
    " elif model_selector == 'claude':\n",
    " for partial in get_claude_stream(model, message, history):\n",
    " reply = partial\n",
    " yield history + [(message, reply)]\n",
    " \n",
    "\n",
    "with gr.Blocks() as demo:\n",
    " model_selector = gr.Dropdown(\n",
    " choices=list(MODELS.keys()),\n",
    " value=\"gpt\", \n",
    " label=\"Pick Model\",\n",
    " )\n",
    " chatbot = gr.Chatbot()\n",
    " txt = gr.Textbox(placeholder=\"Ask about python\", show_label=False)\n",
    " txt.submit(\n",
    " fn=chat,\n",
    " inputs=[model_selector, txt, chatbot],\n",
    " outputs=[chatbot],\n",
    " ).then(\n",
    " fn=lambda: \"\",\n",
    " inputs=None,\n",
    " outputs=txt\n",
    " )\n",
    "\n",
    " clear = gr.Button(\"Clear\")\n",
    " clear.click(lambda: None, None, chatbot, queue=False)\n",
    "\n",
    "demo.launch()\n"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc128d47",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llms",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
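The `Notebooks` tool above delegates nearest-neighbour search to `faiss.IndexFlatL2`. What that index computes can be sketched in plain NumPy, with toy vectors standing in for the SentenceTransformer embeddings; this is an illustration of the L2 search only, not the notebook's code:

```python
import numpy as np

# Toy "document embeddings": 4 docs in a 3-dimensional space
doc_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

def l2_search(query, vectors, top_n=2):
    # Squared L2 distance to every document, as IndexFlatL2 reports
    dists = ((vectors - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:top_n]
    return dists[order], order

query = np.array([1.0, 0.0, 0.0])
distances, indices = l2_search(query, doc_vectors)
print(indices)  # doc 0 is the exact match, doc 2 the runner-up
```

FAISS does the same brute-force scan for `IndexFlatL2`, just much faster; that is also why the notebook comments that a lower score means a closer match.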
|
||||
@@ -0,0 +1,541 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "40d49349-faaa-420c-9b65-0bdc9edfabce",
   "metadata": {},
   "source": [
    "# The Price is Right\n",
    "\n",
    "## Finishing off with Random Forests, XG Boost & Ensemble"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6cd8b15e-f88a-470d-a9a6-b6370effaff9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install xgboost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbcdfea8-7241-46d7-a771-c0381a3e7063",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import re\n",
    "import math\n",
    "import json\n",
    "from tqdm import tqdm\n",
    "import random\n",
    "from dotenv import load_dotenv\n",
    "from huggingface_hub import login\n",
    "import numpy as np\n",
    "import pickle\n",
    "from openai import OpenAI\n",
    "from sentence_transformers import SentenceTransformer\n",
    "from datasets import load_dataset\n",
    "import chromadb\n",
    "from items import Item\n",
    "from testing import Tester\n",
    "import pandas as pd\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import mean_squared_error, r2_score\n",
    "import joblib\n",
    "import xgboost as xgb"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6e88bd1-f89c-4b98-92fa-aa4bc1575bca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# CONSTANTS\n",
    "\n",
    "DB = \"products_vectorstore\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98666e73-938e-469d-8987-e6e55ba5e034",
   "metadata": {},
   "outputs": [],
   "source": [
    "# environment\n",
    "\n",
    "load_dotenv(override=True)\n",
    "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
    "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc696493-0b6f-48aa-9fa8-b1ae0ecaf3cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load in the test pickle file:\n",
    "\n",
    "with open('test.pkl', 'rb') as file:\n",
    " test = pickle.load(file)\n",
    " \n",
    "# training data is already in Chroma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d26a1104-cd11-4361-ab25-85fb576e0582",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = chromadb.PersistentClient(path=DB)\n",
    "collection = client.get_or_create_collection('products')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e00b82a9-a8dc-46f1-8ea9-2f07cbc8e60d",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n",
    "vectors = np.array(result['embeddings'])\n",
    "documents = result['documents']\n",
    "prices = [metadata['price'] for metadata in result['metadatas']]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf6492cb-b11a-4ad5-859b-a71a78ffb949",
   "metadata": {},
   "source": [
    "# Random Forest\n",
    "\n",
    "We will now train a Random Forest model.\n",
    "\n",
    "Can you spot the difference from what we did in Week 6? In week 6 we used the word2vec model to form vectors; this time we'll use the vectors we already have in Chroma, from the SentenceTransformer model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48894777-101f-4fe5-998c-47079407f340",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This next line takes an hour on my M1 Mac!\n",
    "\n",
    "rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)\n",
    "rf_model.fit(vectors, prices)"
   ]
  },
|
||||
  {
   "cell_type": "markdown",
   "id": "90a07dde-6f57-4488-8d08-e8e5646754e7",
   "metadata": {},
   "source": [
    "n_jobs=-1 means scikit-learn uses every available CPU core"
   ]
  },
|
||||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62eb7ddf-e1da-481e-84c6-1256547566bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the model to a file\n",
    "\n",
    "joblib.dump(rf_model, 'random_forest_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d281dc5e-761e-4a5e-86b3-29d9c0a33d4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load it back in again\n",
    "\n",
    "rf_model = joblib.load('random_forest_model.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23760bf5-fe52-473d-bfbe-def6b7a67a77",
   "metadata": {},
   "source": [
    "# XG Boost Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c65dcfb9-d2c1-431c-843d-c5908bc39e3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_dmatrix = xgb.DMatrix(vectors, label=prices)\n",
    "\n",
    "params = {\n",
    " \"objective\": \"reg:squarederror\",\n",
    " \"max_depth\": 6,\n",
    " \"learning_rate\": 0.1,\n",
    " \"nthread\": -1,\n",
    " \"verbosity\": 1,\n",
    " \"subsample\": 0.8,\n",
    "}\n",
    "\n",
    "model = xgb.train(params, train_dmatrix, num_boost_round=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6980ca7-fc38-482c-8346-80c435058886",
   "metadata": {},
   "outputs": [],
   "source": [
    "joblib.dump(model,'xg_boost_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0605f48-04f8-44a3-8d8c-c7be4cd840b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "xgb_model = joblib.load('xg_boost_model.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22d10315-2b11-43b0-b042-679a2814dea1",
   "metadata": {},
   "source": [
    "# Agents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d438dec-8e5b-4e60-bb6f-c3f82e522dd9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from agents.specialist_agent import SpecialistAgent\n",
    "from agents.frontier_agent import FrontierAgent\n",
    "from agents.random_forest_agent import RandomForestAgent\n",
    "from agents.xg_boost_agent import XGBoostAgent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afc39369-b97b-4a90-b17e-b20ef501d3c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "specialist = SpecialistAgent()\n",
    "frontier = FrontierAgent(collection)\n",
    "random_forest = RandomForestAgent()\n",
    "xg_boost = XGBoostAgent()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e2d0d0a-8bb8-4b39-b046-322828c39244",
   "metadata": {},
   "outputs": [],
   "source": [
    "def description(item):\n",
    " return item.prompt.split(\"to the nearest dollar?\\n\\n\")[1].split(\"\\n\\nPrice is $\")[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bfe0434f-b29e-4cc0-bad9-b07624665727",
   "metadata": {},
   "outputs": [],
   "source": [
    "def rf(item):\n",
    " return random_forest.price(description(item))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdf233ec-264f-4b34-9f2b-27c39692137b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "Tester.test(rf, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "192b94ac-37d0-4569-bc7c-8fc4f92d129b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def xg_b(item):\n",
    " return xg_boost.price(description(item))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3fa01c2-42d9-4ce7-ae36-1d874a0003c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "xg_b(test[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9183aab7-0586-4d43-b212-c40442c7ab34",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "Tester.test(xg_b, test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0045825e-2df0-429a-8ebb-2617517a2e75",
   "metadata": {},
   "source": [
    "# Moving towards the ensemble model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f759bd2-7a7e-4c1a-80a0-e12470feca89",
   "metadata": {},
   "outputs": [],
   "source": [
    "product = \"Quadcast HyperX condenser mic for high quality audio for podcasting\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e44dbd25-fb95-4b6b-bbbb-8da5fc817105",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(specialist.price(product))\n",
    "print(frontier.price(product))\n",
    "print(random_forest.price(product))\n",
    "print(xg_boost.price(product))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1779b353-e2bb-4fc7-be7c-93057e4d688a",
   "metadata": {},
   "outputs": [],
   "source": [
    "specialists = []\n",
    "frontiers = []\n",
    "random_forests = []\n",
    "xg_boosts = []\n",
    "prices = []\n",
    "\n",
    "for item in tqdm(test[1000:1250]):\n",
    " text = description(item)\n",
    " specialists.append(specialist.price(text))\n",
    " frontiers.append(frontier.price(text))\n",
    " random_forests.append(random_forest.price(text))\n",
    " xg_boosts.append(xg_boost.price(text))\n",
    " prices.append(item.price)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0bca725-4e34-405b-8d90-41d67086a25d",
   "metadata": {},
   "outputs": [],
   "source": [
"mins = [min(s,f,r,x) for s,f,r,x in zip(specialists, frontiers, random_forests, xg_boosts)]\n",
|
||||
"maxes = [max(s,f,r,x) for s,f,r,x in zip(specialists, frontiers, random_forests, xg_boosts)]\n",
|
||||
"\n",
|
||||
"X = pd.DataFrame({\n",
|
||||
" 'Specialist': specialists,\n",
|
||||
" 'Frontier': frontiers,\n",
|
||||
" 'RandomForest': random_forests,\n",
|
||||
" 'XGBoost' : xg_boosts,\n",
|
||||
" 'Min': mins,\n",
|
||||
" 'Max': maxes,\n",
|
||||
"})\n",
|
||||
"\n",
|
||||
"# Convert y to a Series\n",
|
||||
"y = pd.Series(prices)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "baac4947-02d8-4d12-82ed-9ace3c0bee39",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Train a Linear Regression - current\n",
|
||||
"np.random.seed(42)\n",
|
||||
"\n",
|
||||
"lr = LinearRegression()\n",
|
||||
"lr.fit(X, y)\n",
|
||||
"\n",
|
||||
"feature_columns = X.columns.tolist()\n",
|
||||
"\n",
|
||||
"for feature, coef in zip(feature_columns, lr.coef_):\n",
|
||||
" print(f\"{feature}: {coef:.2f}\")\n",
|
||||
"print(f\"Intercept={lr.intercept_:.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "702de4cb-2311-4753-9c05-f3a0fa7e9990",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Train a Linear Regression - old vals w/o xg\n",
|
||||
"np.random.seed(42)\n",
|
||||
"\n",
|
||||
"lr = LinearRegression()\n",
|
||||
"lr.fit(X, y)\n",
|
||||
"\n",
|
||||
"feature_columns = X.columns.tolist()\n",
|
||||
"\n",
|
||||
"for feature, coef in zip(feature_columns, lr.coef_):\n",
|
||||
" print(f\"{feature}: {coef:.2f}\")\n",
|
||||
"print(f\"Intercept={lr.intercept_:.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0bdf6e68-28a3-4ed2-b17e-de0ede923d34",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"joblib.dump(lr, 'ensemble_model.pkl')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e762441a-9470-4dd7-8a8f-ec0430e908c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from agents.ensemble_agent import EnsembleAgent\n",
|
||||
"ensemble = EnsembleAgent(collection)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1a29f03c-8010-43b7-ae7d-1bc85ca6e8e2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ensemble.price(product) #old val"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "13dbf002-eba6-4c7a-898f-d697f68ca28e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ensemble.price(product)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e6a5e226-a508-43d5-aa42-cefbde72ffdf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def ensemble_pricer(item):\n",
|
||||
" return max(0,ensemble.price(description(item)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8397b1ef-2ea3-4af8-bb34-36594e0600cc",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"Tester.test(ensemble_pricer, test) #old "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0d26c9ff-994b-4799-af51-09d00ddc0c06",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"Tester.test(ensemble_pricer, test)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
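A note for readers following the notebook above: the `Min` and `Max` columns are derived features computed across the four base-model estimates before the linear blend is fitted. A minimal plain-Python sketch of that feature step (the function name and dict-per-row return shape are illustrative, not from the notebook, which uses a pandas DataFrame):

```python
def build_features(specialists, frontiers, random_forests, xg_boosts):
    """Assemble one feature row per test item, mirroring the notebook's
    Specialist/Frontier/RandomForest/XGBoost/Min/Max DataFrame columns."""
    rows = []
    for s, f, r, x in zip(specialists, frontiers, random_forests, xg_boosts):
        rows.append({
            "Specialist": s,
            "Frontier": f,
            "RandomForest": r,
            "XGBoost": x,
            "Min": min(s, f, r, x),  # cheapest estimate among the four models
            "Max": max(s, f, r, x),  # most expensive estimate
        })
    return rows
```

A list of such dicts can be passed directly to `pd.DataFrame(rows)` to reproduce the notebook's `X`.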
@@ -0,0 +1,52 @@
import pandas as pd
from sklearn.linear_model import LinearRegression
import joblib

from agents.agent import Agent
from agents.specialist_agent import SpecialistAgent
from agents.frontier_agent import FrontierAgent
from agents.random_forest_agent import RandomForestAgent
from agents.xg_boost_agent import XGBoostAgent


class EnsembleAgent(Agent):

    name = "Ensemble Agent"
    color = Agent.YELLOW

    def __init__(self, collection):
        """
        Create an instance of Ensemble, by creating each of the models
        And loading the weights of the Ensemble
        """
        self.log("Initializing Ensemble Agent")
        self.specialist = SpecialistAgent()
        self.frontier = FrontierAgent(collection)
        self.random_forest = RandomForestAgent()
        self.xg_boost = XGBoostAgent()
        self.model = joblib.load('ensemble_model.pkl')
        self.log("Ensemble Agent is ready")

    def price(self, description: str) -> float:
        """
        Run this ensemble model
        Ask each of the models to price the product
        Then use the Linear Regression model to return the weighted price
        :param description: the description of a product
        :return: an estimate of its price
        """
        self.log("Running Ensemble Agent - collaborating with specialist, frontier, xg boost and random forest agents")
        specialist = self.specialist.price(description)
        frontier = self.frontier.price(description)
        random_forest = self.random_forest.price(description)
        xg_boost = self.xg_boost.price(description)
        X = pd.DataFrame({
            'Specialist': [specialist],
            'Frontier': [frontier],
            'RandomForest': [random_forest],
            'XGBoost': [xg_boost],
            'Min': [min(specialist, frontier, random_forest, xg_boost)],
            'Max': [max(specialist, frontier, random_forest, xg_boost)],
        })
        y = max(0, self.model.predict(X)[0])
        self.log(f"Ensemble Agent complete - returning ${y:.2f}")
        return y
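`EnsembleAgent.price` delegates the blending to the fitted `LinearRegression` loaded from `ensemble_model.pkl`; as a rough intuition, the blend reduces to a weighted sum of the six features, clamped at zero. A dependency-free sketch with hypothetical coefficients (the real weights come from the pickle file, not these placeholder values):

```python
def ensemble_blend(specialist, frontier, random_forest, xg_boost,
                   coefs=(0.3, 0.3, 0.2, 0.1, 0.05, 0.05), intercept=0.0):
    """Illustrative linear blend: feature order matches the DataFrame
    columns built in EnsembleAgent.price. Coefficients are made-up."""
    features = (
        specialist, frontier, random_forest, xg_boost,
        min(specialist, frontier, random_forest, xg_boost),
        max(specialist, frontier, random_forest, xg_boost),
    )
    raw = intercept + sum(c * f for c, f in zip(coefs, features))
    return max(0.0, raw)  # negative prices are clamped, as in the agent
```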
@@ -0,0 +1,46 @@
# imports

import os
import re
from typing import List
from sentence_transformers import SentenceTransformer
import joblib
from agents.agent import Agent
import xgboost as xgb


class XGBoostAgent(Agent):

    name = "XG Boost Agent"
    color = Agent.BRIGHT_MAGENTA

    def __init__(self):
        """
        Initialize this object by loading in the saved model weights
        and the SentenceTransformer vector encoding model
        """
        self.log("XG Boost Agent is initializing")
        self.vectorizer = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        self.model = joblib.load('xg_boost_model.pkl')
        self.log("XG Boost Agent is ready")

    def price(self, description: str) -> float:
        """
        Use an XG Boost model to estimate the price of the described item
        :param description: the product to be estimated
        :return: the price as a float
        """
        self.log("XG Boost Agent is starting a prediction")
        vector = self.vectorizer.encode([description])
        vector = vector.reshape(1, -1)
        # Convert the vector to a DMatrix
        dmatrix = xgb.DMatrix(vector)
        # Predict the price using the model
        result = max(0, self.model.predict(dmatrix)[0])
        self.log(f"XG Boost Agent completed - predicting ${result:.2f}")
        return result
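The `reshape(1, -1)` in `XGBoostAgent.price` turns the single encoded vector into a one-row batch before the `DMatrix` conversion. A sketch of that prediction path with the encoder and booster replaced by a stand-in callable (numpy only, no xgboost dependency; `predict_price` and `predict_fn` are illustrative names, not part of the agent):

```python
import numpy as np

def predict_price(vector, predict_fn):
    """Mimic the shape handling in XGBoostAgent.price: reshape one
    embedding into a single-row batch, run the stand-in model, and
    clamp negative outputs to zero."""
    matrix = np.asarray(vector, dtype=float).reshape(1, -1)  # shape (1, n_features)
    return max(0.0, float(predict_fn(matrix)[0]))
```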