Merge remote changes with local updates

This commit is contained in:
Omar Marie
2025-07-21 20:54:14 +03:00
184 changed files with 51610 additions and 179 deletions

View File

@@ -13,10 +13,12 @@ I use a platform called Anaconda to set up your environment. It's a powerful too
Having said that: if you have any problems with Anaconda, I've provided an alternative approach. It's faster and simpler and should have you running quickly, with less of a guarantee around compatibility.
### Before we begin - Heads up! Please do check these Windows "gotchas":
If you are relatively new to using the Command Prompt, here is an excellent [guide](https://chatgpt.com/share/67b0acea-ba38-8012-9c34-7a2541052665) with instructions and exercises. I'd suggest you work through this first to build some confidence.
## HEADS UP - "GOTCHA" ISSUES ON A PC: The following 4 Windows issues will need your attention, particularly #3 and #4
Please do take a look at these issues. Issue #3 (the Windows 260-character path limit) will cause an "Archive Error" when installing PyTorch if left unaddressed. Issue #4 will also cause an installation failure.
There are 4 common gotchas when developing on Windows to be aware of:
1. Permissions. Please take a look at this [tutorial](https://chatgpt.com/share/67b0ae58-d1a8-8012-82ca-74762b0408b0) on permissions on Windows
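For issue #3, the commonly suggested fix (not from this guide — verify against the course's own instructions, and use an administrator Command Prompt for the registry change) is to enable Windows long-path support:

```cmd
:: Enable NTFS long paths (Windows 10 version 1607 or later; requires admin)
reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1 /f

:: Allow git to handle paths longer than 260 characters as well
git config --system core.longpaths true
```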
@@ -92,7 +94,7 @@ Press Win + R, type `cmd`, and press Enter
Run `python --version` to find out which python you're on.
Ideally you'd be using a version of Python 3.11, so we're completely in sync.
I believe Python 3.12 works also, but (as of June 2025) Python 3.13 does **not** yet work as several Data Science dependencies are not yet ready for Python 3.13.
If you need to install Python or install another version, you can download it here:
https://www.python.org/downloads/

View File

@@ -0,0 +1,159 @@
# Web Scraper & Data Analyzer
A modern Python application with a sleek PyQt5 GUI for web scraping, data analysis, visualization, and AI-powered website insights. Features a clean, minimalistic design with real-time progress tracking, comprehensive data filtering, and an integrated AI chat assistant for advanced analysis.
## Features
- **Modern UI**: Clean, minimalistic design with dark theme and smooth animations
- **Web Scraping**: Multi-threaded scraping with configurable depth (max 100 levels)
- **Data Visualization**: Interactive table with sorting and filtering capabilities
- **Content Preview**: Dual preview system with both text and visual HTML rendering
- **Data Analysis**: Comprehensive statistics and domain breakdown
- **AI-Powered Analysis**: Chat-based assistant for website insights, SEO suggestions, and content analysis
- **Export Functionality**: JSON export with full metadata
- **URL Normalization**: Handles www/non-www domains intelligently
- **Real-time Progress**: Live progress updates during scraping operations
- **Loop Prevention**: Advanced duplicate detection to prevent infinite loops
- **Smart Stopping**: Depth limits and empty-level detection prevent runaway scraping
## AI Analysis Tab
The application features an advanced **AI Analysis** tab:
- **Conversational Chat UI**: Ask questions about your scraped websites in a modern chat interface (like ChatGPT)
- **Quick Actions**: One-click questions for structure, SEO, content themes, and performance
- **Markdown Responses**: AI replies are formatted for clarity and readability
- **Context Awareness**: AI uses your scraped data for tailored insights
- **Requirements**: Internet connection and the `openai` Python package (see Installation)
- **Fallback**: If `openai` is not installed, a placeholder response is shown
## Loop Prevention & Duplicate Detection
The scraper includes robust protection against infinite loops and circular references:
### 🔄 URL Normalization
- Removes `www.` prefixes for consistent domain handling
- Strips URL fragments (`#section`) to prevent duplicate content
- Removes trailing slashes for consistency
- Normalizes query parameters
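The rules above mirror the scraper's `normalize_url` method; a minimal standalone sketch (simplified, not the app's exact implementation):

```python
def normalize(url: str) -> str:
    """Simplified URL normalization: drop fragments, trailing slashes, and www."""
    url = url.split('#')[0]   # strip #fragment
    url = url.rstrip('/')     # strip trailing slash
    for scheme in ('https://', 'http://'):
        if url.startswith(scheme + 'www.'):
            url = scheme + url[len(scheme) + 4:]  # drop the "www." prefix
    return url

print(normalize("https://www.example.com/page/#top"))  # https://example.com/page
```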
### 🚫 Duplicate Detection
- **Visited URL Tracking**: Maintains a set of all visited URLs
- **Unlimited Crawling**: No page limits per domain or total pages
- **Per-Page Duplicate Filtering**: Removes duplicate links within the same page
### 🛡️ Smart Restrictions
- **Depth Bounded by max_depth**: Crawls as deep as the configured max_depth allows, with no other depth restrictions
- **Content Type Filtering**: Only scrapes HTML content
- **File Type Filtering**: Skips non-content files (PDFs, images, etc.)
- **Consecutive Empty Level Detection**: Stops if 3 consecutive levels have no new content
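The consecutive-empty-level rule can be illustrated with a small sketch (counter logic only, separate from the real crawler):

```python
def should_stop(new_pages_per_level, max_consecutive_empty=3):
    """Return the depth at which crawling stops, or None if all levels complete."""
    consecutive_empty = 0
    for depth, new_pages in enumerate(new_pages_per_level):
        if new_pages == 0:
            consecutive_empty += 1
            if consecutive_empty >= max_consecutive_empty:
                return depth  # three empty levels in a row: give up here
        else:
            consecutive_empty = 0  # any productive level resets the counter
    return None

print(should_stop([5, 3, 0, 0, 0, 2]))  # 4
```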
### 📊 Enhanced Tracking
- **Domain Page Counts**: Tracks pages scraped per domain (for statistics)
- **URL Check Counts**: Shows total URLs checked vs. pages scraped
- **Detailed Statistics**: Comprehensive reporting on scraping efficiency
- **Unlimited Processing**: No artificial limits on crawling scope
## Installation
1. **Clone or download the project files**
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
- This will install all required packages, including `PyQt5`, `PyQtWebEngine` (for visual preview), and `openai` (for AI features).
3. **Run the application**:
```bash
python web_scraper_app.py
```
## Usage
### 1. Scraping Configuration
- Enter a starting URL (with or without http/https)
- Set maximum crawl depth (1-100)
- Click "Start Scraping" to begin
### 2. Data View & Filtering
- View scraped data in an interactive table
- Filter by search terms or specific domains
- Double-click any row to preview content
- Export data to JSON format
### 3. Analysis & Statistics
- View comprehensive scraping statistics
- See domain breakdown and word counts
- Preview content in both text and visual formats
- Analyze load times and link counts
- Monitor duplicate detection efficiency
### 4. AI Analysis (New!)
- Switch to the **AI Analysis** tab
- Type your question or use quick action buttons (e.g., "Analyze the website structure", "Suggest SEO improvements")
- The AI will analyze your scraped data and provide actionable insights
- Requires an internet connection and the `openai` package
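Under the hood, an AI tab like this packs scraped results into the chat context before calling the model. A hedged sketch of the idea — names like `build_messages` are illustrative, not the app's actual API:

```python
def build_messages(pages, question, max_pages=5):
    """Summarize scraped pages into a chat prompt for an AI assistant."""
    summary = "\n".join(
        f"- {p['title']} ({p['url']}, {p['words']} words)"
        for p in pages[:max_pages]
    )
    return [
        {"role": "system", "content": "You analyze scraped website data."},
        {"role": "user", "content": f"Scraped pages:\n{summary}\n\nQuestion: {question}"},
    ]

msgs = build_messages(
    [{"title": "Home", "url": "https://example.com", "words": 120}],
    "Suggest SEO improvements",
)
# msgs would then be passed to the openai client's chat.completions.create(...)
```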
## Visual Preview Feature
The application includes a visual HTML preview feature that renders scraped web pages in a browser-like view:
- **Requirements**: PyQtWebEngine (automatically installed with requirements.txt)
- **Functionality**: Displays HTML content with proper styling and formatting
- **Fallback**: If PyQtWebEngine is not available, shows a text-only preview
- **Error Handling**: Graceful error messages for invalid HTML content
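The fallback behaves like a guarded import; a minimal sketch of the pattern (the app's actual flag and helper names may differ):

```python
try:
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    HAS_WEBENGINE = True
except ImportError:
    HAS_WEBENGINE = False

def make_preview(html: str):
    """Return a rich web view if PyQtWebEngine is available, else the raw HTML."""
    if HAS_WEBENGINE:
        view = QWebEngineView()
        view.setHtml(html)
        return view
    return html  # caller displays this in a plain text widget instead
```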
## Technical Details
- **Backend**: Pure Python with urllib and html.parser (no compilation required)
- **Frontend**: PyQt5 with custom modern styling
- **Threading**: Multi-threaded scraping for better performance
- **Data Storage**: Website objects with full metadata
- **URL Handling**: Intelligent normalization and domain filtering
- **Loop Prevention**: Multi-layered duplicate detection system
- **AI Integration**: Uses the OpenAI API (via OpenRouter) for chat-based analysis
## File Structure
```
Testing/
├── web_scraper_app.py # Main application (with AI and GUI)
├── module.py # Core scraping logic
├── test.py # Basic functionality tests
├── requirements.txt # Dependencies
└── README.md # This file
```
## Troubleshooting
### Visual Preview Not Working
1. Ensure PyQtWebEngine is installed: `pip install PyQtWebEngine`
2. Check console output for import errors
### AI Analysis Not Working
1. Ensure the `openai` package is installed: `pip install openai`
2. Check your internet connection (AI requires online access)
3. If not installed, the AI tab will show a placeholder response
### Scraping Issues
1. Verify internet connection
2. Check URL format (add https:// if needed)
3. Try with a lower depth setting
4. Check console for error messages
### Loop Prevention
1. The scraper automatically prevents infinite loops
2. Check the analysis tab for detailed statistics
3. Monitor "Total URLs Checked" vs "Total Pages" for efficiency
4. Use lower depth settings for sites with many internal links
### Performance
- Use lower depth settings for faster scraping
- Filter data to focus on specific domains
- Close other applications to free up resources
- Monitor domain page counts to keep large crawls focused
## License
This project is open source and available under the MIT License.

View File

@@ -0,0 +1,473 @@
import urllib.request
import urllib.parse
import urllib.error
import html.parser
import re
from datetime import datetime
import time
import ssl
from urllib.parse import urljoin, urlparse
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
from functools import partial
class HTMLParser(html.parser.HTMLParser):
"""Custom HTML parser to extract title, links, and text content"""
def __init__(self):
super().__init__()
self.title = ""
self.links = []
self.text_content = []
self.in_title = False
self.in_body = False
self.current_tag = ""
def handle_starttag(self, tag, attrs):
self.current_tag = tag.lower()
if tag.lower() == 'title':
self.in_title = True
elif tag.lower() == 'body':
self.in_body = True
elif tag.lower() == 'a':
# Extract href attribute
for attr, value in attrs:
if attr.lower() == 'href' and value:
self.links.append(value)
def handle_endtag(self, tag):
if tag.lower() == 'title':
self.in_title = False
elif tag.lower() == 'body':
self.in_body = False
def handle_data(self, data):
if self.in_title:
self.title += data
elif self.in_body and self.current_tag in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'span', 'li']:
# Clean the text data
cleaned_data = re.sub(r'\s+', ' ', data.strip())
if cleaned_data:
self.text_content.append(cleaned_data)
def get_text(self):
"""Return all extracted text content as a single string"""
return ' '.join(self.text_content)
def get_clean_text(self, max_length=500):
"""Return cleaned text content with length limit"""
text = self.get_text()
# Remove extra whitespace and limit length
text = re.sub(r'\s+', ' ', text.strip())
if len(text) > max_length:
text = text[:max_length] + "..."
return text
class Website:
"""Class to store website data"""
def __init__(self, title, url, content, depth, links=None, load_time=None):
self.title = title or "No Title"
self.url = url
self.content = content
self.depth = depth
self.links = links or []
self.load_time = load_time
self.timestamp = datetime.now()
def get_word_count(self):
"""Get word count from content"""
if not self.content:
return 0
# Extract text content and count words
text_content = re.sub(r'<[^>]+>', '', self.content)
words = text_content.split()
return len(words)
def get_domain(self):
"""Extract domain from URL"""
try:
parsed = urlparse(self.url)
return parsed.netloc
except Exception:
return ""
def get_normalized_domain(self):
"""Get domain without www prefix for consistent filtering"""
domain = self.get_domain()
if domain.startswith('www.'):
return domain[4:]
return domain
def search_content(self, query):
"""Search for query in content"""
if not self.content or not query:
return False
return query.lower() in self.content.lower()
def get_text_preview(self, max_length=200):
"""Get a text preview of the content"""
if not self.content:
return "No content available"
# Extract text content
text_content = re.sub(r'<[^>]+>', '', self.content)
text_content = re.sub(r'\s+', ' ', text_content.strip())
if len(text_content) > max_length:
return text_content[:max_length] + "..."
return text_content
class WebScraper:
"""Web scraper with multithreading support and robust duplicate detection"""
def __init__(self):
self.websites = []
self.visited_urls = set()
self.visited_domains = set() # Track visited domains
self.start_domain = None # Store the starting domain
self.lock = threading.Lock()
self.max_workers = 10 # Number of concurrent threads
# Removed all page limits - unlimited crawling
self.domain_page_counts = {} # Track page count per domain (for statistics only)
self._stop_requested = False # Flag to stop scraping
def normalize_url(self, url):
"""Normalize URL to handle www prefixes and remove fragments"""
if not url:
return url
# Remove fragments (#) to prevent duplicate content
if '#' in url:
url = url.split('#')[0]
# Remove trailing slashes for consistency
url = url.rstrip('/')
# Remove www prefix for consistent domain handling
if url.startswith('https://www.'):
return url.replace('https://www.', 'https://', 1)
elif url.startswith('http://www.'):
return url.replace('http://www.', 'http://', 1)
return url
def get_domain_from_url(self, url):
"""Extract and normalize domain from URL"""
try:
parsed = urlparse(url)
domain = parsed.netloc
if domain.startswith('www.'):
return domain[4:]
return domain
except Exception:
return ""
def should_skip_url(self, url, current_depth):
"""Check if URL should be skipped based on various criteria"""
normalized_url = self.normalize_url(url)
# Skip if already visited
if normalized_url in self.visited_urls:
return True, "Already visited"
# Skip if not a valid HTTP/HTTPS URL
if not normalized_url.startswith(('http://', 'https://')):
return True, "Not HTTP/HTTPS URL"
# Get domain
domain = self.get_domain_from_url(normalized_url)
if not domain:
return True, "Invalid domain"
# Removed all domain page limits - unlimited crawling
# Removed external domain depth limits - crawl as deep as needed
return False, "OK"
def scrape_url(self, url, depth):
"""Scrape a single URL with error handling and rate limiting"""
try:
# Check if stop was requested
if self._stop_requested:
return None
# Check if URL should be skipped
should_skip, reason = self.should_skip_url(url, depth)
if should_skip:
print(f"Skipping {url}: {reason}")
return None
# Normalize URL
normalized_url = self.normalize_url(url)
# Mark as visited and update domain count (for statistics only)
with self.lock:
self.visited_urls.add(normalized_url)
domain = self.get_domain_from_url(normalized_url)
if domain:
self.domain_page_counts[domain] = self.domain_page_counts.get(domain, 0) + 1
# Add small delay to prevent overwhelming servers
time.sleep(0.1)
start_time = time.time()
# Create request with headers
req = urllib.request.Request(
normalized_url,
headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
}
)
# Fetch the page with timeout
with urllib.request.urlopen(req, timeout=15) as response:
# Check content type
content_type = response.headers.get('content-type', '').lower()
if 'text/html' not in content_type and 'application/xhtml' not in content_type:
print(f"Skipping {url}: Not HTML content ({content_type})")
return None
html_content = response.read().decode('utf-8', errors='ignore')
load_time = time.time() - start_time
# Skip if content is too small (likely error page)
if len(html_content) < 100:
print(f"Skipping {url}: Content too small ({len(html_content)} chars)")
return None
# Parse HTML
parser = HTMLParser()
parser.feed(html_content)
# Extract links and normalize them with duplicate detection
links = []
base_url = normalized_url
seen_links = set() # Track links within this page to avoid duplicates
for link in parser.links:
try:
absolute_url = urljoin(base_url, link)
normalized_link = self.normalize_url(absolute_url)
# Skip if already seen in this page or should be skipped
if normalized_link in seen_links:
continue
seen_links.add(normalized_link)
should_skip, reason = self.should_skip_url(normalized_link, depth + 1)
if should_skip:
continue
# Only include http/https links and filter out common non-content URLs
if (normalized_link.startswith(('http://', 'https://')) and
not any(skip in normalized_link.lower() for skip in [
'mailto:', 'tel:', 'javascript:', 'data:', 'file:',
'.pdf', '.doc', '.docx', '.xls', '.xlsx', '.zip', '.rar',
'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.svg', '.ico',
'.css', '.js', '.xml', '.json', '.txt', '.log'
])):
links.append(normalized_link)
except Exception:
continue
# Create Website object
website = Website(
title=parser.title,
url=normalized_url,
content=html_content,
depth=depth,
links=links,
load_time=load_time
)
return website
except urllib.error.HTTPError as e:
print(f"HTTP Error scraping {url}: {e.code} - {e.reason}")
return None
except urllib.error.URLError as e:
print(f"URL Error scraping {url}: {e.reason}")
return None
except Exception as e:
print(f"Error scraping {url}: {str(e)}")
return None
def crawl_website(self, start_url, max_depth=3, progress_callback=None):
"""Crawl website with multithreading support and no page limits"""
if not start_url.startswith(('http://', 'https://')):
start_url = 'https://' + start_url
# Initialize tracking
self.websites = []
self.visited_urls = set()
self.visited_domains = set()
self.domain_page_counts = {}
self.start_domain = self.get_domain_from_url(start_url)
self._stop_requested = False # Reset stop flag
print(f"Starting crawl from: {start_url}")
print(f"Starting domain: {self.start_domain}")
print(f"Max depth: {max_depth}")
print(f"Unlimited crawling - no page limits")
# Start with the initial URL
urls_to_scrape = [(start_url, 0)]
max_depth_reached = 0
consecutive_empty_levels = 0
max_consecutive_empty = 3 # Stop if 3 consecutive levels have no new URLs
total_pages_scraped = 0
# Removed all page limits - unlimited crawling
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
for current_depth in range(max_depth + 1):
# Check if stop was requested
if self._stop_requested:
print("Scraping stopped by user request")
break
if not urls_to_scrape:
print(f"Stopping at depth {current_depth}: No more URLs to scrape")
break
# Check if we've reached too many consecutive empty levels
if consecutive_empty_levels >= max_consecutive_empty:
print(f"Stopping at depth {current_depth}: {max_consecutive_empty} consecutive empty levels")
break
# Removed absolute page limit check - unlimited pages
print(f"Scraping depth {current_depth} with {len(urls_to_scrape)} URLs")
# Submit all URLs at current depth for concurrent scraping
future_to_url = {
executor.submit(self.scrape_url, url, depth): url
for url, depth in urls_to_scrape
}
# Collect results and prepare next level
urls_to_scrape = []
level_results = 0
for future in as_completed(future_to_url):
# Check if stop was requested
if self._stop_requested:
print("Stopping processing of current level")
break
website = future.result()
if website:
with self.lock:
self.websites.append(website)
level_results += 1
total_pages_scraped += 1
# Emit progress if callback provided
if progress_callback:
progress_callback(website)
# Add links for next depth level (no limits)
if current_depth < max_depth:
for link in website.links:
# Removed URL limit per level - process all URLs
should_skip, reason = self.should_skip_url(link, current_depth + 1)
if not should_skip:
urls_to_scrape.append((link, current_depth + 1))
# Check if stop was requested after processing level
if self._stop_requested:
break
# Update depth tracking
if level_results > 0:
max_depth_reached = current_depth
consecutive_empty_levels = 0
else:
consecutive_empty_levels += 1
# Only stop if we've reached the actual max depth
if current_depth >= max_depth:
print(f"Reached maximum depth: {max_depth}")
break
# Print progress summary
print(f"Depth {current_depth} completed: {level_results} pages, Total: {len(self.websites)}")
if self.domain_page_counts:
print(f"Domain breakdown: {dict(self.domain_page_counts)}")
print(f"Crawling completed. Max depth reached: {max_depth_reached}, Total pages: {len(self.websites)}")
print(f"Visited URLs: {len(self.visited_urls)}")
print(f"Domain breakdown: {dict(self.domain_page_counts)}")
return self.websites
def reset(self):
"""Reset the scraper state for a new crawl"""
self.websites = []
self.visited_urls = set()
self.visited_domains = set()
self.domain_page_counts = {}
self.start_domain = None
self._stop_requested = False # Reset stop flag
def get_statistics(self):
"""Get scraping statistics with enhanced tracking information"""
if not self.websites:
return {
'total_pages': 0,
'total_links': 0,
'total_words': 0,
'avg_load_time': 0,
'max_depth_reached': 0,
'domains': {},
'visited_urls_count': 0,
'domain_page_counts': {},
'start_domain': self.start_domain
}
total_pages = len(self.websites)
total_links = sum(len(w.links) for w in self.websites)
total_words = sum(w.get_word_count() for w in self.websites)
load_times = [w.load_time for w in self.websites if w.load_time]
avg_load_time = sum(load_times) / len(load_times) if load_times else 0
max_depth_reached = max(w.depth for w in self.websites)
# Count domains
domains = {}
for website in self.websites:
domain = website.get_normalized_domain()
domains[domain] = domains.get(domain, 0) + 1
return {
'total_pages': total_pages,
'total_links': total_links,
'total_words': total_words,
'avg_load_time': avg_load_time,
'max_depth_reached': max_depth_reached,
'domains': domains,
'visited_urls_count': len(self.visited_urls),
'domain_page_counts': dict(self.domain_page_counts),
'start_domain': self.start_domain
}
def filter_by_domain(self, domain):
"""Filter websites by normalized domain (accepts the domain with or without www.)"""
normalized_domain = domain[4:] if domain.startswith('www.') else domain
return [w for w in self.websites if w.get_normalized_domain() == normalized_domain]
def search_websites(self, query):
"""Search websites by query"""
return [w for w in self.websites if w.search_content(query)]
def stop_scraping(self):
"""Request graceful stop of the scraping process"""
self._stop_requested = True

View File

@@ -0,0 +1,5 @@
PyQt5>=5.15.0
PyQtWebEngine>=5.15.0
urllib3==2.0.7
openai>=1.0.0
python-dotenv>=1.0.0

View File

@@ -0,0 +1,161 @@
#!/usr/bin/env python3
"""
Simple test script to verify the web scraping functionality
"""
import module
def test_basic_scraping():
"""Test basic scraping functionality"""
print("Testing basic web scraping...")
# Create a scraper instance
scraper = module.WebScraper()
# Test with a simple website (httpbin.org is a safe test site)
test_url = "https://httpbin.org/html"
print(f"Scraping {test_url} with depth 1...")
try:
# Scrape with depth 1 to keep it fast
websites = scraper.crawl_website(test_url, max_depth=1)
print(f"Successfully scraped {len(websites)} websites")
if websites:
# Show first website details
first_site = websites[0]
print(f"\nFirst website:")
print(f" Title: {first_site.title}")
print(f" URL: {first_site.url}")
print(f" Depth: {first_site.depth}")
print(f" Links found: {len(first_site.links)}")
print(f" Word count: {first_site.get_word_count()}")
# Show statistics
stats = scraper.get_statistics()
print(f"\nStatistics:")
print(f" Total pages: {stats['total_pages']}")
print(f" Total links: {stats['total_links']}")
print(f" Total words: {stats['total_words']}")
print(f" Average load time: {stats['avg_load_time']:.2f}s")
return True
else:
print("No websites were scraped")
return False
except Exception as e:
print(f"Error during scraping: {e}")
return False
def test_website_class():
"""Test the Website class functionality"""
print("\nTesting Website class...")
# Create a test website
website = module.Website(
title="Test Website",
url="https://example.com",
content="<html><body><h1>Test Content</h1><p>This is a test paragraph.</p></body></html>",
depth=0,
links=["https://example.com/page1", "https://example.com/page2"]
)
# Test methods
print(f"Website title: {website.title}")
print(f"Website URL: {website.url}")
print(f"Word count: {website.get_word_count()}")
print(f"Domain: {website.get_domain()}")
print(f"Normalized domain: {website.get_normalized_domain()}")
print(f"Search for 'test': {website.search_content('test')}")
print(f"Search for 'nonexistent': {website.search_content('nonexistent')}")
return True
def test_html_parser():
"""Test the HTML parser functionality"""
print("\nTesting HTML Parser...")
parser = module.HTMLParser()
test_html = """
<html>
<head><title>Test Page</title></head>
<body>
<h1>Welcome</h1>
<p>This is a <a href="https://example.com">link</a> to example.com</p>
<p>Here's another <a href="/relative-link">relative link</a></p>
</body>
</html>
"""
parser.feed(test_html)
print(f"Title extracted: {parser.title}")
print(f"Links found: {parser.links}")
print(f"Text content length: {len(parser.get_text())}")
return True
def test_url_normalization():
"""Test URL normalization to handle www. prefixes"""
print("\nTesting URL Normalization...")
scraper = module.WebScraper()
# Test URLs with and without www.
test_urls = [
"https://www.example.com/page",
"https://example.com/page",
"http://www.test.com/path?param=value#fragment",
"http://test.com/path?param=value#fragment"
]
print("URL Normalization Results:")
for url in test_urls:
normalized = scraper.normalize_url(url)
print(f" Original: {url}")
print(f" Normalized: {normalized}")
print()
# Test domain filtering
print("Domain Filtering Test:")
test_websites = [
module.Website("Site 1", "https://www.example.com", "content", 0),
module.Website("Site 2", "https://example.com", "content", 0),
module.Website("Site 3", "https://www.test.com", "content", 0)
]
scraper.websites = test_websites
# Test filtering by domain with and without www.
domains_to_test = ["example.com", "www.example.com", "test.com", "www.test.com"]
for domain in domains_to_test:
filtered = scraper.filter_by_domain(domain)
print(f" Filter '{domain}': {len(filtered)} results")
for site in filtered:
print(f" - {site.title} ({site.url})")
return True
if __name__ == "__main__":
print("Web Scraper Test Suite")
print("=" * 50)
# Test HTML parser
test_html_parser()
# Test Website class
test_website_class()
# Test URL normalization
test_url_normalization()
# Test basic scraping (uncomment to test actual scraping)
# Note: This requires internet connection
# test_basic_scraping()
print("\nTest completed!")
print("\nTo run the full application:")
print("python web_scraper_app.py")

File diff suppressed because it is too large

View File

@@ -0,0 +1,344 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a1aa1b43-7a47-4aca-ae5f-94a9d4ba2d89",
"metadata": {},
"source": [
"## Clinic Booking Bot\n",
"\n",
"Easily book your clinic visit, available only on weekdays between **14:00 and 15:00**.\n",
"Speak or type, and get instant confirmation.\n"
]
},
{
"cell_type": "code",
"execution_count": 171,
"id": "fe798c6a-f8da-46aa-8c0e-9d2623def3d2",
"metadata": {},
"outputs": [],
"source": [
"# import library\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr\n",
"import base64\n",
"from io import BytesIO\n",
"from datetime import datetime, date\n",
"from PIL import Image, ImageDraw, ImageFont\n"
]
},
{
"cell_type": "code",
"execution_count": 172,
"id": "0ad4e526-e95d-4e70-9faa-b4236b105dd5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n"
]
}
],
"source": [
"# Save keys\n",
"\n",
"load_dotenv(override=True)\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 173,
"id": "ae95308e-0002-4017-9f2c-fcb1ddb248fa",
"metadata": {},
"outputs": [],
"source": [
"# --- CONFIG ---\n",
"BOOKING_START = 14\n",
"BOOKING_END = 15\n",
"WEEKDAYS = [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\", \"Friday\"]\n",
"PHONE = \"010-1234567\"\n",
"confirmed_bookings = []\n"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "e21b0fd0-4cda-4938-8867-dc2c6e7af4b1",
"metadata": {},
"outputs": [],
"source": [
"# --- TTS ---\n",
"def generate_tts(text, voice=\"fable\", filename=\"output.mp3\"):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"fable\",\n",
" input=text\n",
" )\n",
" with open(filename, \"wb\") as f:\n",
" f.write(response.content)\n",
" return filename"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "e28a5c3b-bd01-4845-a41e-87823f6bb078",
"metadata": {},
"outputs": [],
"source": [
"# --- Translate Booking Confirmation ---\n",
"def translate_text(text, target_language=\"nl\"):\n",
" prompt = f\"Translate this message to {target_language}:\\n{text}\"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful translator.\"},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" )\n",
" return response.choices[0].message.content.strip()\n"
]
},
{
"cell_type": "code",
"execution_count": 176,
"id": "8ed57cc9-7d54-4a5d-831b-0efcc5b7a7a9",
"metadata": {},
"outputs": [],
"source": [
"# --- Booking Logic ---\n",
"def book_appointment(name, time_str):\n",
" try:\n",
" booking_time = datetime.strptime(time_str, \"%H:%M\")\n",
" except ValueError:\n",
" return \"Invalid time format. Use HH:MM.\", None, None\n",
"\n",
" hour = booking_time.hour\n",
" weekday = datetime.today().strftime(\"%A\")\n",
"\n",
" if weekday not in WEEKDAYS:\n",
" response = \"Bookings are only available on weekdays.\"\n",
" elif BOOKING_START <= hour < BOOKING_END:\n",
" confirmation = f\"Booking confirmed for {name} at {time_str}.\"\n",
" confirmed_bookings.append((name, time_str))\n",
" translated = translate_text(confirmation)\n",
" audio = generate_tts(translated)\n",
" image = generate_booking_image(name, time_str)\n",
" return translated, audio, image\n",
" else:\n",
" response = \"Sorry, bookings are only accepted between 14:00 and 15:00 on weekdays.\"\n",
" translated = translate_text(response)\n",
" audio = generate_tts(translated)\n",
" return translated, audio, None"
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "19b52115-f0f3-4d63-a463-886163d4cfd1",
"metadata": {},
"outputs": [],
"source": [
"# --- Booking Card ---\n",
"def generate_booking_image(name, time_str):\n",
" img = Image.new(\"RGB\", (500, 250), color=\"white\")\n",
" draw = ImageDraw.Draw(img)\n",
" msg = f\"\\u2705 Booking Confirmed\\nName: {name}\\nTime: {time_str}\"\n",
" draw.text((50, 100), msg, fill=\"black\")\n",
" return img"
]
},
{
"cell_type": "code",
"execution_count": 178,
"id": "2c446b6c-d410-4ba1-b0c7-c475e5259ff5",
"metadata": {},
"outputs": [],
"source": [
"# --- Voice Booking ---\n",
"def voice_booking(audio_path, name):\n",
" with open(audio_path, \"rb\") as f:\n",
" response = openai.audio.transcriptions.create(model=\"whisper-1\", file=f)\n",
" transcription = response.text.strip()\n",
"\n",
" system_prompt = \"\"\"\n",
" You are a clinic assistant. Extract only the appointment time from the user's sentence in 24-hour HH:MM format.\n",
" If no time is mentioned, respond with 'No valid time found.'\n",
" \"\"\"\n",
"\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": transcription}\n",
" ]\n",
" )\n",
" extracted_time = response.choices[0].message.content.strip()\n",
"\n",
" if \":\" in extracted_time:\n",
" return book_appointment(name, extracted_time)\n",
" else:\n",
" message = \"Sorry, I couldn't understand the time. Please try again.\"\n",
" translated = translate_text(message)\n",
" audio_path = generate_tts(translated)\n",
" return translated, audio_path, None"
]
},
{
"cell_type": "code",
"execution_count": 179,
"id": "121d2907-7fa8-4248-b2e7-83617ea66ff0",
"metadata": {},
"outputs": [],
"source": [
"# --- Chat Bot Handler ---\n",
"def chat_bot(messages):\n",
" system_prompt = \"\"\"\n",
" You are a clinic booking assistant. Your job is to:\n",
" - Greet the patient and explain your role\n",
" - Only assist with making appointments\n",
" - Accept bookings only on weekdays between 14:00 and 15:00\n",
" - Do not provide medical advice\n",
" - Always respond with empathy and clarity\n",
" \"\"\"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[{\"role\": \"system\", \"content\": system_prompt}] + messages\n",
" )\n",
" reply = response.choices[0].message.content.strip()\n",
" audio = generate_tts(reply)\n",
" return reply, audio"
]
},
{
"cell_type": "code",
"execution_count": 180,
"id": "2427b694-8c57-40cb-b202-4a8989547925",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7898\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7898/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Gradio interface\n",
"with gr.Blocks(theme=gr.themes.Soft()) as demo:\n",
" gr.Markdown(\"\"\"## 🩺 GP Booking Assistant \n",
"Only available weekdays between **14:00 and 15:00** \n",
"☎️ Contact: {PHONE}\n",
"---\"\"\")\n",
"\n",
" name_global = gr.Textbox(label=\"Your Name\", placeholder=\"Enter your name\", interactive=True)\n",
"\n",
" with gr.Tab(\"💬 Chat Mode\"):\n",
" chatbot = gr.Chatbot(label=\"Booking Chat\", type=\"messages\", height=400)\n",
" text_input = gr.Textbox(label=\"Type your message or use your voice below\")\n",
" audio_input = gr.Audio(type=\"filepath\", label=\"🎙️ Or speak your request\")\n",
" chat_audio_output = gr.Audio(label=\"🔊 Assistant's Reply\", type=\"filepath\")\n",
" send_btn = gr.Button(\"Send\")\n",
"\n",
" def handle_chat(user_message, chat_history):\n",
" chat_history = chat_history or []\n",
" chat_history.append({\"role\": \"user\", \"content\": user_message})\n",
" reply, audio = chat_bot(chat_history)\n",
" chat_history.append({\"role\": \"assistant\", \"content\": reply})\n",
" return chat_history, \"\", audio\n",
"\n",
" def handle_audio_chat(audio_path, chat_history):\n",
" with open(audio_path, \"rb\") as f:\n",
" transcription = openai.audio.transcriptions.create(model=\"whisper-1\", file=f).text.strip()\n",
" return handle_chat(transcription, chat_history)\n",
"\n",
" send_btn.click(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
" text_input.submit(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
" audio_input.change(handle_audio_chat, [audio_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
"\n",
"\n",
" \n",
" with gr.Tab(\"📝 Text Booking\"):\n",
" time_text = gr.Textbox(label=\"Preferred Time (HH:MM)\", placeholder=\"e.g., 14:30\")\n",
" btn_text = gr.Button(\"📅 Book via Text\")\n",
"\n",
" with gr.Tab(\"🎙️ Voice Booking\"):\n",
" voice_input = gr.Audio(type=\"filepath\", label=\"Say your preferred time\")\n",
" btn_voice = gr.Button(\"📅 Book via Voice\")\n",
"\n",
" output_text = gr.Textbox(label=\"Response\", interactive=False)\n",
" output_audio = gr.Audio(label=\"Audio Reply\", type=\"filepath\")\n",
" output_image = gr.Image(label=\"Booking Confirmation\")\n",
"\n",
" btn_text.click(fn=book_appointment, inputs=[name_global, time_text], outputs=[output_text, output_audio, output_image])\n",
" btn_voice.click(fn=voice_booking, inputs=[voice_input, name_global], outputs=[output_text, output_audio, output_image])\n",
"\n",
" gr.Markdown(\"\"\"---\n",
"<small>This assistant does **not** give medical advice. It only books appointments within allowed hours.</small>\n",
"\"\"\")\n",
"\n",
" demo.launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f359de0a-28b1-4895-b21d-91d79e494a0d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,25 @@
# 🧠 Agentic Voice/Text Support Chatbot
A multimodal chatbot interface with support for **text and voice input**, **multiple large language models (LLMs)**, and **context memory persistence** — all in a single Gradio-based GUI.
## 🚀 Features
- 🔄 **Multi-LLM switching**: Dynamically switch between OpenAI, Anthropic Claude, and Meta LLaMA (via Ollama)
- 🎤 **Voice input**: Use your microphone with live speech-to-text transcription
- 💬 **Contextual memory**: Maintain chat history even when switching models
- 🧪 **Prototype-ready**: Built with Gradio for rapid GUI testing and development
## 🛠️ Technologies Used
- [Gradio](https://www.gradio.app/) - GUI interface
- [OpenAI API](https://platform.openai.com/)
- [Anthropic Claude API](https://www.anthropic.com/)
- [Ollama](https://ollama.com/) - local LLaMA inference
- [`speech_recognition`](https://pypi.org/project/SpeechRecognition/) - voice-to-text
- `sounddevice`, `numpy` - audio recording
- `.env` - environment variable management
## You'll also need:
- API keys for OpenAI and Claude
- Ollama installed locally to run LLaMA models
- A .env file with the necessary API keys
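
Before running the notebook, it can help to sanity-check that your `.env` file actually provides those keys. The snippet below is a minimal illustrative sketch: the `parse_env` and `missing_keys` helpers are invented here (not part of any library), and the key names assume the `OPENAI_API_KEY`/`ANTHROPIC_API_KEY` names used in the notebooks.

```python
# Minimal sketch (not part of the project): parse a .env-style string and
# check that the keys these notebooks expect are present.
# Key names are assumptions taken from the notebooks; adjust if yours differ.

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(env):
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

sample = """
# .env
OPENAI_API_KEY=sk-proj-xxxx
ANTHROPIC_API_KEY=sk-ant-xxxx
"""
print(missing_keys(parse_env(sample)))  # -> []
```

In the project itself, `python-dotenv`'s `load_dotenv()` performs the actual loading; this sketch only shows what a valid file should contain.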


@@ -0,0 +1,395 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
"metadata": {},
"source": [
"### Building a Chatbot Interface, with Text or Voice Input, Multi-LLM support, and Memory Persistence"
]
},
{
"cell_type": "markdown",
"id": "eeb20b3e",
"metadata": {},
"source": [
"In this tutorial, we'll use Gradio to build a simple chatbot prototype with a user-friendly interface. The chatbot will support multiple language models, allowing the user to switch models at any point during the conversation. It will also offer optional memory persistence, where the chat history is stored and forwarded to the selected model — which allows shared memory across models, even when switching mid-chat.\n",
"\n",
"In this project, we'll use OpenAI's API, Anthropic's Claude, and Meta's LLaMA, which runs locally via an Ollama server. Additionally, we'll use Python's speech_recognition module to convert speech to text.\n",
"\n",
"It's worth noting that some APIs — such as OpenAI's — now support direct audio input, so integrating speech capabilities can also be done end-to-end without a separate transcription module."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "a07e7793-b8f5-44f4-aded-5562f633271a",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "a0a343b1",
"metadata": {},
"outputs": [],
"source": [
"# Speech recording and recognition libraries\n",
"import speech_recognition as sr\n",
"import sounddevice as sd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "d7693eda",
"metadata": {},
"outputs": [],
"source": [
"# GUI prototyping\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "41ffc0e6",
"metadata": {},
"outputs": [],
"source": [
"buffer = [] # For temporarily holding sound recording\n",
"\n",
"# Helper function for handling voice recording\n",
"def callback(indata, frames, time, status):\n",
" buffer.append(indata.copy())\n",
"\n",
"stream = sd.InputStream(callback=callback, samplerate=16000, channels=1, dtype='int16')"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "e9a79075",
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Function for handling recording data and status\n",
"def toggle_recording(state):\n",
" global stream, buffer\n",
" print('state', state)\n",
"\n",
" if not state:\n",
" buffer.clear()\n",
" stream.start()\n",
" return gr.update(value=\"Stop Recording\"), 'Recording...', not state\n",
" else:\n",
" stream.stop()\n",
" audio = np.concatenate(buffer, axis=0)\n",
" text = transcribe(audio)\n",
" return gr.update(value=\"Start Recording\"), text, not state\n",
"\n",
"# Function that converts speech to text via Google's speech recognition service\n",
"def transcribe(recording, sample_rate=16000):\n",
" r = sr.Recognizer()\n",
"\n",
" # Convert NumPy array to AudioData\n",
" audio_data = sr.AudioData(\n",
" recording.tobytes(), # Raw byte data\n",
" sample_rate, # Sample rate\n",
" 2 # Sample width in bytes (16-bit = 2 bytes)\n",
" )\n",
"\n",
" text = r.recognize_google(audio_data)\n",
" print(\"You said:\", text)\n",
" return text"
]
},
{
"cell_type": "markdown",
"id": "dcfb0190",
"metadata": {},
"source": [
"### LLM & API set-up"
]
},
{
"cell_type": "markdown",
"id": "59416453",
"metadata": {},
"source": [
"##### Load API keys from .env"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "b638b822",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n",
"Anthropic API Key exists and begins sk-ant-\n",
"Google API Key not set\n"
]
}
],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "markdown",
"id": "9e6ae162",
"metadata": {},
"source": [
"### Class for handling API calls and routing requests to the selected models"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "268ea65d",
"metadata": {},
"outputs": [],
"source": [
"class LLMHandler:\n",
" def __init__(self, system_message: str = '', ollama_api:str='http://localhost:11434/api/chat'):\n",
" # Default system message if none provided\n",
" self.system_message = system_message if system_message else \"You are a helpful assistant. Always reply in Markdown\"\n",
" self.message_history = []\n",
"\n",
" # Initialize LLM clients\n",
" self.openai = OpenAI()\n",
" self.claude = anthropic.Anthropic()\n",
" self.OLLAMA_API = ollama_api\n",
" self.OLLAMA_HEADERS = {\"Content-Type\": \"application/json\"}\n",
"\n",
" def llm_call(self, model: str = 'gpt-4o-mini', prompt: str = '', memory_persistence=True):\n",
" if not model:\n",
" return 'No model specified'\n",
"\n",
" # Use full message template with system prompt if no prior history\n",
" message = self.get_message_template(prompt, initial=True) if (\n",
" not self.message_history and not 'claude' in model\n",
" ) else self.get_message_template(prompt)\n",
"\n",
" # Handle memory persistence\n",
" if memory_persistence:\n",
" self.message_history.extend(message)\n",
" else:\n",
" self.message_history = message\n",
"\n",
" # Model-specific dispatch\n",
" try:\n",
" if 'gpt' in model:\n",
" response = self.call_openai(model=model)\n",
" elif 'claude' in model:\n",
" response = self.call_claude(model=model)\n",
" elif 'llama' in model:\n",
" response = self.call_ollama(model=model)\n",
" else:\n",
" response = f'{model.title()} is not supported or not a valid model name.'\n",
" except Exception as e:\n",
" response = f'Failed to retrieve response. Reason: {e}'\n",
"\n",
" # Save assistant's reply to history if memory is enabled\n",
" if memory_persistence:\n",
" self.message_history.append({\n",
" \"role\": \"assistant\",\n",
" \"content\": response\n",
" })\n",
"\n",
" return response\n",
"\n",
" def get_message_template(self, prompt: str = '', initial=False):\n",
" # Returns a message template with or without system prompt\n",
" initial_template = [\n",
" {\"role\": \"system\", \"content\": self.system_message},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" general_template = [\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" return initial_template if initial else general_template\n",
"\n",
" def call_openai(self, model: str = 'gpt-4o-mini'):\n",
" # Sends chat completion request to OpenAI API\n",
" completion = self.openai.chat.completions.create(\n",
" model=model,\n",
" messages=self.message_history,\n",
" )\n",
" response = completion.choices[0].message.content\n",
" return response\n",
"\n",
" def call_ollama(self, model: str = \"llama3.2\"):\n",
"\n",
" payload = {\n",
" \"model\": model,\n",
" \"messages\": self.message_history,\n",
" \"stream\": False\n",
" }\n",
"\n",
" response = requests.post(url=self.OLLAMA_API, headers=self.OLLAMA_HEADERS, json=payload)\n",
" return response.json()[\"message\"][\"content\"]\n",
"\n",
" def call_claude(self, model: str = \"claude-3-haiku-20240307\"):\n",
" # Sends chat request to Anthropic Claude API\n",
" message = self.claude.messages.create(\n",
" model=model,\n",
" system=self.system_message,\n",
" messages=self.message_history,\n",
" max_tokens=500\n",
" )\n",
" response = message.content[0].text\n",
" return response\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "632e618b",
"metadata": {},
"outputs": [],
"source": [
"llm_handler = LLMHandler()\n",
"\n",
"# Function to handle user prompts received by the interface\n",
"def llm_call(model, prompt, memory_persistence):\n",
" response = llm_handler.llm_call(model=model, prompt=prompt, memory_persistence=memory_persistence)\n",
" return response, ''\n"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "e19228f6",
"metadata": {},
"outputs": [],
"source": [
"# Specify available model names for the dropdown component\n",
"AVAILABLE_MODELS = [\"gpt-4\", \"gpt-3.5-turbo\", \"claude-3-haiku-20240307\", \"llama3.2\", \"gpt-4o-mini\"]\n"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "f65f43ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7868\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7868/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"with gr.Blocks() as demo:\n",
" state = gr.State(False) # Recording state (on/off)\n",
" with gr.Row():\n",
" \n",
" with gr.Column():\n",
" out = gr.Markdown(label='Message history')\n",
" with gr.Row():\n",
" memory = gr.Checkbox(label='Toggle memory', value=True) # Handle memory status (on/off) btn\n",
" model_choice = gr.Dropdown(label='Model', choices=AVAILABLE_MODELS, interactive=True) # Model selection dropdown\n",
" query_box = gr.Textbox(label='ChatBox', placeholder=\"Your message\")\n",
" record_btn = gr.Button(value='Record voice message') # Start/stop recording btn\n",
" send_btn = gr.Button(\"Send\") # Send prompt btn\n",
" \n",
" \n",
" \n",
" record_btn.click(fn=toggle_recording, inputs=state, outputs=[record_btn, query_box, state])\n",
" send_btn.click(fn=llm_call, inputs=[model_choice, query_box, memory], outputs=[out, query_box])\n",
" \n",
"\n",
"demo.launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3743db5d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "general_env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,148 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import os\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [],
"source": [
"# set up environment\n",
"load_dotenv(override=True)\n",
"api_key=os.getenv(\"OPENAI_API_KEY\")\n",
"if not api_key or not api_key.startswith(\"sk-proj-\"):\n",
"    print(\"API key not found or has an unexpected format\")\n",
"else:\n",
"    print(\"API key found and looks good\")\n",
"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [],
"source": [
"# Get gpt-4o-mini to answer, with streaming\n",
"messages = [{\"role\":\"system\",\"content\":\"You are an expert Data Scientist\"}, {\"role\":\"user\",\"content\":question}]\n",
"\n",
"stream = openai.chat.completions.create(\n",
" model = MODEL_GPT,\n",
" messages = messages,\n",
" stream = True\n",
")\n",
"response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [],
"source": [
"# Get Llama 3.2 to answer\n",
"import ollama\n",
"\n",
"stream = ollama.chat(model=MODEL_LLAMA, messages=messages, stream=True)\n",
"response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream:\n",
" response += chunk[\"message\"][\"content\"] or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a573174-779b-4d50-8792-fa0889b37211",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llmenv",
"language": "python",
"name": "llmenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,426 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
"metadata": {},
"source": [
"# Welcome to your first assignment!\n",
"\n",
"Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)"
]
},
{
"cell_type": "markdown",
"id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#f71;\">Just before we get to the assignment --</h2>\n",
" <span style=\"color:#f71;\">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>\n",
" <a href=\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>\n",
" Please keep this bookmarked, and I'll continue to add more useful links there over time.\n",
" </span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458",
"metadata": {},
"source": [
"# HOMEWORK EXERCISE ASSIGNMENT\n",
"\n",
"Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n",
"\n",
"You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n",
"\n",
"**Benefits:**\n",
"1. No API charges - open-source\n",
"2. Data doesn't leave your box\n",
"\n",
"**Disadvantages:**\n",
"1. Significantly less power than Frontier Model\n",
"\n",
"## Recap on installation of Ollama\n",
"\n",
"Simply visit [ollama.com](https://ollama.com) and install!\n",
"\n",
"Once complete, the ollama server should already be running locally. \n",
"If you visit: \n",
"[http://localhost:11434/](http://localhost:11434/)\n",
"\n",
"You should see the message `Ollama is running`. \n",
"\n",
"If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n",
"And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n",
"Then try [http://localhost:11434/](http://localhost:11434/) again.\n",
"\n",
"If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code below from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
"metadata": {},
"outputs": [],
"source": [
"# Constants\n",
"\n",
"OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
"HEADERS = {\"Content-Type\": \"application/json\"}\n",
"MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dac0a679-599c-441f-9bf2-ddc73d35b940",
"metadata": {},
"outputs": [],
"source": [
"# Create a messages list using the same format that we used for OpenAI\n",
"\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bb9c624-14f0-4945-a719-8ddb64f66f47",
"metadata": {},
"outputs": [],
"source": [
"payload = {\n",
" \"model\": MODEL,\n",
" \"messages\": messages,\n",
" \"stream\": False\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "479ff514-e8bd-4985-a572-2ea28bb4fa40",
"metadata": {},
"outputs": [],
"source": [
"# Let's just make sure the model is loaded\n",
"\n",
"!ollama pull llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42b9f644-522d-4e05-a691-56e7658c0ea9",
"metadata": {},
"outputs": [],
"source": [
"# If this doesn't work for any reason, try the 2 versions in the following cells\n",
"# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab\n",
"# And if none of that works - contact me!\n",
"\n",
"response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n",
"print(response.json()['message']['content'])"
]
},
{
"cell_type": "markdown",
"id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe",
"metadata": {},
"source": [
"# Introducing the ollama package\n",
"\n",
"And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n",
"\n",
"Under the hood, it's making the same call as above to the ollama server running at localhost:11434"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7745b9c4-57dc-4867-9180-61fa5db55eb8",
"metadata": {},
"outputs": [],
"source": [
"import ollama\n",
"\n",
"response = ollama.chat(model=MODEL, messages=messages)\n",
"print(response['message']['content'])"
]
},
{
"cell_type": "markdown",
"id": "a4704e10-f5fb-4c15-a935-f046c06fb13d",
"metadata": {},
"source": [
"## Alternative approach - using OpenAI python library to connect to Ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23057e00-b6fc-4678-93a9-6b31cb704bff",
"metadata": {},
"outputs": [],
"source": [
"# There's actually an alternative approach that some people might prefer\n",
"# You can use the OpenAI client python library to call Ollama:\n",
"\n",
"from openai import OpenAI\n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"\n",
"response = ollama_via_openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=messages\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "9f9e22da-b891-41f6-9ac9-bd0c0a5f4f44",
"metadata": {},
"source": [
"## Are you confused about why that works?\n",
"\n",
"It seems strange, right? We just used OpenAI code to call Ollama?? What's going on?!\n",
"\n",
"Here's the scoop:\n",
"\n",
"The python class `OpenAI` is simply code written by OpenAI engineers that makes calls over the internet to an endpoint. \n",
"\n",
"When you call `openai.chat.completions.create()`, this python code just makes a web request to the following url: \"https://api.openai.com/v1/chat/completions\"\n",
"\n",
"Code like this is known as a \"client library\" - it's just wrapper code that runs on your machine to make web requests. The actual power of GPT is running on OpenAI's cloud behind this API, not on your computer!\n",
"\n",
"OpenAI was so popular, that lots of other AI providers provided identical web endpoints, so you could use the same approach.\n",
"\n",
"So Ollama has an endpoint running on your local box at http://localhost:11434/v1/chat/completions \n",
"And in week 2 we'll discover that lots of other providers do this too, including Gemini and DeepSeek.\n",
"\n",
"And then the team at OpenAI had a great idea: they can extend their client library so you can specify a different 'base url', and use their library to call any compatible API.\n",
"\n",
"That's it!\n",
"\n",
"So when you say: `ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')` \n",
"Then this will make the same endpoint calls, but to Ollama instead of OpenAI."
]
},
{
"cell_type": "markdown",
"id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90",
"metadata": {},
"source": [
"## Also trying the amazing reasoning model DeepSeek\n",
"\n",
"Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B. \n",
"This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.\n",
"\n",
"Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1) all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d",
"metadata": {},
"outputs": [],
"source": [
"!ollama pull deepseek-r1:1.5b"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d3d554b-e00d-4c08-9300-45e073950a76",
"metadata": {},
"outputs": [],
"source": [
"# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside <think> tags, followed by some decent definitions\n",
"\n",
"response = ollama_via_openai.chat.completions.create(\n",
" model=\"deepseek-r1:1.5b\",\n",
" messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",
"metadata": {},
"source": [
"# NOW the exercise for you\n",
"\n",
"Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43ef4b92-53e1-4af2-af3f-726812f4265c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"#from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"#from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97d45733-394e-493e-a92b-1475876d9028",
"metadata": {},
"outputs": [],
"source": [
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class Website:\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" self.url = url\n",
" response = requests.get(url, headers=headers)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a40f9c5-1b14-42f9-9319-6a66e58e03f2",
"metadata": {},
"outputs": [],
"source": [
"webpage = Website(\"https://www.pleasurewebsite.com\")\n",
"print(webpage.title)\n",
"print(webpage.text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a72a005d-43de-4ae5-b427-99a8fcb6065c",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
"and provides a short summary, ignoring text that might be navigation related. \\\n",
"Respond in markdown.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0e4f95f-0ccf-4027-9457-5c973cd17702",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(website):\n",
" user_prompt = f\"You are looking at a website titled {website.title}\"\n",
" user_prompt += \"\\nThe contents of this website is as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ceae6073-a085-49ce-ad44-39e46d8e6934",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d53b26b-308c-470c-a0a9-9edb887aed6d",
"metadata": {},
"outputs": [],
"source": [
"messages=messages_for(webpage)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6de38216-6d1c-48c4-877b-86d403f4e0f8",
"metadata": {},
"outputs": [],
"source": [
"import ollama\n",
"MODEL = \"llama3.2\"\n",
"response = ollama.chat(model=MODEL, messages=messages)\n",
"print(response['message']['content'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llmenv",
"language": "python",
"name": "llmenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,351 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927",
"metadata": {},
"source": [
"# Triangular agent conversation\n",
"\n",
    "## GPT (Hamlet), Llama (Falstaff), Gemini (Iago):"
]
},
{
"cell_type": "markdown",
"id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac",
"metadata": {},
"source": [
    "### Created a 3-way, bringing Gemini into the conversation.\n",
    "### Replacing one of the models with an open source model running with Ollama."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8e0c1bd-a159-475b-9cdc-e219a7633355",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3ad57ad-46a8-460e-9cb3-67a890093536",
"metadata": {},
"outputs": [],
"source": [
"import google.generativeai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f531c14-5743-4a5b-83d9-cb5863ca2ddf",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d5150ee-3858-4921-bce6-2eecfb96bc75",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI\n",
"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11381fd8-5099-41e8-a1d7-6787dea56e43",
"metadata": {},
"outputs": [],
"source": [
"google.generativeai.configure()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1766d20-54b6-4f76-96c5-c338ae7073c9",
"metadata": {},
"outputs": [],
"source": [
"gpt_model = \"gpt-4o-mini\"\n",
"llama_model = \"llama3.2\"\n",
"gemini_model = 'gemini-2.0-flash'\n",
"\n",
    "gpt_system = \"You are playing the part of Hamlet. He is a philosopher who probes Iago with a mixture of suspicion \\\n",
    "and intellectual curiosity, seeking to unearth the origins of his deceit. \\\n",
    "Is malice born of scorn, envy, or some deeper void? Hamlet's introspective nature \\\n",
    "drives him to question whether Iago's actions reveal a truth about humanity itself. \\\n",
    "You will respond as Shakespeare's Hamlet would.\"\n",
    "\n",
    "llama_system = \"You are playing the part of Falstaff, who attempts to lighten the mood with his jokes and observations, \\\n",
    "potentially clashing with Hamlet's melancholic nature. You respond as Shakespeare's Falstaff would.\"\n",
    "\n",
    "gemini_system = \"You are playing the part of Iago, subtly trying to manipulate both Hamlet and Falstaff \\\n",
    "to your own advantage, testing their weaknesses and exploiting their flaws. You respond like Iago.\"\n",
"\n",
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hello\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "806a0506-dac8-4bad-ac08-31f350256b58",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
    "    for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
    "        messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
    "        messages.append({\"role\": \"user\", \"content\": llama})\n",
    "        messages.append({\"role\": \"user\", \"content\": gemini})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43674885-ede7-48bf-bee4-467454f3e96a",
"metadata": {},
"outputs": [],
"source": [
"def call_llama():\n",
" messages = []\n",
" for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": llama})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" response = ollama.chat(model=llama_model, messages=messages)\n",
    "\n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03d34769-b339-4c4b-8c60-69494c39d725",
"metadata": {},
"outputs": [],
"source": [
    "def call_gemini():\n",
    "    # Build the history under a local name so we don't shadow the global gemini_messages\n",
    "    history = []\n",
    "\n",
    "    # Format the history for Gemini: Hamlet's and Falstaff's lines form one \"user\" turn,\n",
    "    # and Iago's own past reply is the \"model\" turn, keeping roles alternating\n",
    "    for gpt, llama, gemini_message in zip(gpt_messages, llama_messages, gemini_messages):\n",
    "        history.append({\"role\": \"user\", \"parts\": [gpt, llama]})  # Hamlet and Falstaff speak\n",
    "        history.append({\"role\": \"model\", \"parts\": [gemini_message]})  # Iago responds\n",
    "\n",
    "    # Add the latest lines from Hamlet and Falstaff, which Iago hasn't answered yet\n",
    "    history.append({\"role\": \"user\", \"parts\": [gpt_messages[-1], llama_messages[-1]]})\n",
    "\n",
    "    # Initialize the model with the correct system instruction\n",
    "    gemini = google.generativeai.GenerativeModel(\n",
    "        model_name=gemini_model,\n",
    "        system_instruction=gemini_system\n",
    "    )\n",
    "\n",
    "    response = gemini.generate_content(history)\n",
    "    return response.text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93fc8253-67cb-4ea4-aff7-097b2a222793",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hello\"]\n",
"\n",
"print(f\"Hamlet:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Falstaff:\\n{llama_messages[0]}\\n\")\n",
"print(f\"Iago:\\n{gemini_messages[0]}\\n\")\n",
"\n",
"for i in range(3):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" llama_next = call_llama()\n",
" print(f\"Llama:\\n{llama_next}\\n\")\n",
" llama_messages.append(llama_next)\n",
"\n",
" gemini_next = call_gemini()\n",
" print(f\"Gemini:\\n{gemini_next}\\n\")\n",
    "    gemini_messages.append(gemini_next)"
]
},
{
"cell_type": "markdown",
"id": "bca66ffc-9dc1-4384-880c-210889f5d0ac",
"metadata": {},
"source": [
    "## Conversation between gpt-4o-mini and llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c23224f6-7008-44ed-a57f-718975f4e291",
"metadata": {},
"outputs": [],
"source": [
    "# Let's make a conversation between GPT-4o-mini and Llama 3.2 (via Ollama)\n",
"# We're using cheap versions of models so the costs will be minimal\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"llama_model = \"llama3.2\"\n",
"\n",
    "gpt_system = \"You are a tapori from Mumbai who is very optimistic; \\\n",
    "you always look at the brighter side of the situation and you are always ready to act to find a way to win.\"\n",
    "\n",
    "llama_system = \"You are a Jaat from Haryana. You try to express yourself with Hindi poems \\\n",
    "to agree with the other person or find common ground. If the other person is optimistic, \\\n",
    "you respond in a poetic way and keep chatting.\"\n",
"\n",
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d704bbb-f22b-400d-a695-efbd02b26548",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, llama in zip(gpt_messages, llama_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": llama})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "385ccec8-de59-4e42-9616-3f5c9a05589c",
"metadata": {},
"outputs": [],
"source": [
"def call_llama():\n",
" messages = []\n",
" for gpt, llama_message in zip(gpt_messages, llama_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": llama_message})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" response = ollama.chat(model=llama_model, messages=messages)\n",
    "\n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70b5481b-455e-4275-80d3-0afe0fabcb0f",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"\n",
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Llama:\\n{llama_messages[0]}\\n\")\n",
"\n",
"for i in range(3):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" llama_next = call_llama()\n",
" print(f\"Llama:\\n{llama_next}\\n\")\n",
" llama_messages.append(llama_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f8d734b-57e5-427d-bcb1-7956fc58a348",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llmenv",
"language": "python",
"name": "llmenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,3 @@
<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->
This is a Streamlit web application for clinical trial protocol summarization. Use Streamlit best practices for UI and Python for backend logic. Integrate with ClinicalTrials.gov v2 API for study search and OpenAI for summarization.


@@ -0,0 +1,30 @@
updates.md
.env
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
venv/
ENV/
.streamlit/
.idea/
.vscode/
*.swp
*.swo
.DS_Store


@@ -0,0 +1,66 @@
# Protocol Summarizer Webapp
A Streamlit web application for searching and summarizing clinical trial protocols from ClinicalTrials.gov using Large Language Models. This tool enables researchers and clinical professionals to quickly extract key information from clinical trial protocols.
## Features
- Search for clinical trials by keyword
- Display a list of studies with title and NCT number
- Select a study to summarize
- Fetch the protocol's brief summary from ClinicalTrials.gov API
- Automatically summarize the protocol using OpenAI's LLM
- Extract structured information like study design, population, interventions, and endpoints
## Installation
1. Clone this repository:
```sh
git clone https://github.com/albertoclemente/protocol_summarizer.git
cd protocol_summarizer/protocol_summarizer_webapp
```
2. Install dependencies:
```sh
pip install -r requirements.txt
```
3. Create a `.env` file in the project root with your OpenAI API key:
```
OPENAI_API_KEY=your_api_key_here
```
## Usage
1. Run the Streamlit app:
```sh
streamlit run app.py
```
2. In your browser:
- Enter a disease, condition, or keyword in the search box
- Select the number of results to display
- Click the "Search" button
- Select a study from the results
- Click "Summarize Protocol" to generate a structured summary
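
For reference, the search step behind the UI calls the ClinicalTrials.gov v2 `studies` endpoint with a term and page size. A minimal sketch of the URL the app builds (the query term here is an arbitrary example; no request is made):

```python
from urllib.parse import urlencode

# Build the ClinicalTrials.gov v2 search URL used by the app
query, max_results = "asthma", 5
params = {"query.term": query, "pageSize": max_results, "format": "json"}
url = "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)
print(url)
```

The returned JSON carries one entry per study under the top-level `studies` key.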
## Technical Details
- Uses ClinicalTrials.gov API v2 to retrieve study information
- Implements fallback methods to handle API changes or failures
- Extracts protocol brief summaries using reliable JSON parsing
- Generates structured summaries using OpenAI's GPT models
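
The extraction step above boils down to walking the nested modules of a v2 study response. A minimal sketch with a mocked response dict (the field names follow the v2 API layout; the NCT ID and text values are made up):

```python
# Mimics the shape of https://clinicaltrials.gov/api/v2/studies/{nct_id}?format=json
data = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "officialTitle": "Example Study"},
        "descriptionModule": {"briefSummary": "A short protocol summary."},
    }
}

# Prefer briefSummary, falling back to detailedDescription when it is empty
desc = data.get("protocolSection", {}).get("descriptionModule", {})
brief = desc.get("briefSummary") or desc.get("detailedDescription", "")
print(brief)  # → A short protocol summary.
```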
## Requirements
- Python 3.7+
- Streamlit
- Requests
- OpenAI Python library
- python-dotenv
## Contribution
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License


@@ -0,0 +1,121 @@
import os
from dotenv import load_dotenv
import streamlit as st
import requests
from openai import OpenAI

load_dotenv()

st.title("Protocol Summarizer")
st.markdown("""
Search for clinical trials by keyword, select a study, and generate a protocol summary using an LLM.
""")

# Search input
# Show results only after user presses Enter
with st.form(key="search_form"):
    query = st.text_input("Enter a disease, study title, or keyword:")
    max_results = st.slider("Number of results", 1, 20, 5)
    submitted = st.form_submit_button("Search")

@st.cache_data(show_spinner=False)
def search_clinical_trials(query, max_results=5):
    if not query:
        return []
    url = f"https://clinicaltrials.gov/api/v2/studies?query.term={query}&pageSize={max_results}&format=json"
    resp = requests.get(url)
    studies = []
    if resp.status_code == 200:
        data = resp.json()
        for study in data.get('studies', []):
            nct = study.get('protocolSection', {}).get('identificationModule', {}).get('nctId', 'N/A')
            title = study.get('protocolSection', {}).get('identificationModule', {}).get('officialTitle', 'N/A')
            studies.append({'nct': nct, 'title': title})
    return studies

results = search_clinical_trials(query, max_results) if query else []

if results:
    st.subheader("Search Results")
    for i, study in enumerate(results):
        st.markdown(f"**{i+1}. {study['title']}** (NCT: {study['nct']})")
    selected = st.number_input("Select study number to summarize", min_value=1, max_value=len(results), value=1)
    selected_study = results[selected-1]
    st.markdown(f"### Selected Study\n**{selected_study['title']}** (NCT: {selected_study['nct']})")
    if st.button("Summarize Protocol"):
        # Fetch the brief summary for the selected study
        nct_id = selected_study['nct']
        # Use the V2 API which we know works reliably
        url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}?format=json"
        with st.spinner("Fetching study details..."):
            resp = requests.get(url)
        brief = ""
        if resp.status_code == 200:
            try:
                data = resp.json()
                # V2 API has protocolSection at the root level
                if 'protocolSection' in data:
                    desc_mod = data.get('protocolSection', {}).get('descriptionModule', {})
                    brief = desc_mod.get('briefSummary', '')
                    # If briefSummary is empty, try detailedDescription
                    if not brief:
                        brief = desc_mod.get('detailedDescription', '')
            except Exception as e:
                st.error(f"Error parsing study data: {e}")
        # If the API fails, try HTML scraping as a fallback
        if not brief and resp.status_code != 200:
            st.warning(f"API returned status code {resp.status_code}. Trying alternative method...")
            html_url = f"https://clinicaltrials.gov/ct2/show/{nct_id}"
            html_resp = requests.get(html_url)
            if "Brief Summary:" in html_resp.text:
                start = html_resp.text.find("Brief Summary:") + 15
                excerpt = html_resp.text[start:start+1000]
                # Clean up HTML
                import re
                excerpt = re.sub('<[^<]+?>', ' ', excerpt)
                excerpt = re.sub('\\s+', ' ', excerpt)
                brief = excerpt.strip()
        if not brief:
            st.error("No brief summary or detailed description found for this study.")
            st.stop()
        # Now we have the brief summary, send it to the LLM
        openai = OpenAI()

        def user_prompt_for_protocol_brief(brief_text):
            return (
                "Extract the following details from the clinical trial brief summary in markdown format with clear section headings (e.g., ## Study Design, ## Population, etc.):\n"
                "- Study design\n"
                "- Population\n"
                "- Interventions\n"
                "- Primary and secondary endpoints\n"
                "- Study duration\n\n"
                f"Brief summary text:\n{brief_text}"
            )

        system_prompt = "You are a clinical research assistant. Extract and list the requested protocol details in markdown format with clear section headings."
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt_for_protocol_brief(brief)}
        ]
        with st.spinner("Summarizing with LLM..."):
            try:
                response = openai.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=messages
                )
                summary = response.choices[0].message.content
                st.markdown(summary)
            except Exception as e:
                st.error(f"LLM call failed: {e}")
else:
    if query:
        st.info("No results found. Try a different keyword.")


@@ -0,0 +1,4 @@
streamlit
openai
requests
python-dotenv


@@ -0,0 +1 @@


@@ -0,0 +1,517 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 9,
"id": "fc57c47f-31fc-4527-af71-ce117d35c480",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt\n",
"\n",
"import os\n",
"import requests\n",
"import json\n",
"from typing import List\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d74ea4e7-7d4a-4c85-92d3-8cdb231bc261",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd "
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3eb884ea-02db-4ff8-91f9-c71e40b1cf4a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API key looks good so far\n"
]
}
],
"source": [
"# Initialize and constants\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:\n",
" print(\"API key looks good so far\")\n",
"else:\n",
" print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n",
" \n",
"MODEL = 'gpt-4o-mini'\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d48a7b9b-273d-4bc9-997b-c7112e02528c",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"\n",
"# Some websites need you to use proper headers when fetching them:\n",
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class Website:\n",
" def __init__(self, url):\n",
" self.url = url\n",
" response = requests.get(url, headers=headers)\n",
" self.body = response.content\n",
" soup = BeautifulSoup(self.body, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
"\n",
" if soup.body:\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
" else:\n",
" self.text = \"\"\n",
"\n",
" links = [link.get('href') for link in soup.find_all('a')]\n",
" self.links = [link for link in links if link]\n",
"\n",
" def get_contents(self):\n",
" return f\"Webpage Title:\\n{self.title}\\nWebpage Contents:\\n{self.text}\\n\\n\"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bf51ae6e-91ae-46eb-ac39-dc860454ea4a",
"metadata": {},
"outputs": [],
"source": [
"def get_condition_links_from_topics_page():\n",
" topics_url = \"https://www.thuisarts.nl/overzicht/onderwerpen\"\n",
" response = requests.get(topics_url, headers=headers)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
"\n",
" # Find all <a> tags that look like condition pages\n",
" links = soup.find_all(\"a\", href=True)\n",
" condition_links = []\n",
"\n",
" for link in links:\n",
" href = link['href']\n",
" if href.startswith(\"/\"):\n",
" href = \"https://www.thuisarts.nl\" + href\n",
" if href.startswith(\"https://www.thuisarts.nl/\") and len(href.split(\"/\")) > 3:\n",
" condition_links.append(href)\n",
"\n",
" # Remove duplicates and return\n",
" return list(set(condition_links))\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a246ac9f-73fb-4c2d-ab92-6f3f2bf7afac",
"metadata": {},
"outputs": [],
"source": [
"link_system_prompt = \"\"\"You are an assistant that filters URLs for patient education content. \n",
"\n",
"Only return links that lead to pages about symptoms, health conditions, treatments, or diseases — for example: pages on 'headache', 'diarrhea', 'stomach pain', 'asthma', etc.\n",
"\n",
"DO NOT return:\n",
"- contact pages\n",
"- overview/video/image/keuzekaart lists unless they directly link to medical complaints\n",
"- navigation or privacy/cookie/social media links\n",
"\n",
"Respond only with full https links in JSON format, like this:\n",
"{\n",
" \"links\": [\n",
" {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/hoofdpijn\"},\n",
" {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/buikpijn\"}\n",
" ]\n",
"}\n",
"\"\"\"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b3ac761e-f583-479e-b8ef-70e70f8f361a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are an assistant that filters URLs for patient education content. \n",
"\n",
"Only return links that lead to pages about symptoms, health conditions, treatments, or diseases — for example: pages on 'headache', 'diarrhea', 'stomach pain', 'asthma', etc.\n",
"\n",
"DO NOT return:\n",
"- contact pages\n",
"- overview/video/image/keuzekaart lists unless they directly link to medical complaints\n",
"- navigation or privacy/cookie/social media links\n",
"\n",
"Respond only with full https links in JSON format, like this:\n",
"{\n",
" \"links\": [\n",
" {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/hoofdpijn\"},\n",
" {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/buikpijn\"}\n",
" ]\n",
"}\n",
"\n"
]
}
],
"source": [
"print(link_system_prompt)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "5548e8d4-2813-40fe-a807-cf3661d3a0a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"✅ Found 680 condition pages.\n"
]
}
],
"source": [
"condition_links = get_condition_links_from_topics_page()\n",
"print(f\"✅ Found {len(condition_links)} condition pages.\")\n",
"\n",
"# Format for summary function\n",
"selected_links = [{\"url\": link} for link in condition_links]\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "8d264592-8b77-425a-be4a-73ef7d32d744",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def load_existing_summaries(filepath=\"brochure_cache.json\"):\n",
" if os.path.exists(filepath):\n",
" with open(filepath, \"r\", encoding=\"utf-8\") as f:\n",
" return json.load(f)\n",
" return {}\n",
"\n",
"def save_summaries_to_cache(summaries, filepath=\"brochure_cache.json\"):\n",
" with open(filepath, \"w\", encoding=\"utf-8\") as f:\n",
" json.dump(summaries, f, indent=2, ensure_ascii=False)\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "1cdd9456-1262-40a0-bc3f-28d23010ed7f",
"metadata": {},
"outputs": [],
"source": [
"selected_links = [{\"url\": link} for link in get_condition_links_from_topics_page()][:10]\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "0c2f24ea-fa6b-4431-849a-e1aeaa936022",
"metadata": {},
"outputs": [],
"source": [
"summary_cache = {}\n",
"\n",
"def summarize_for_brochure(url):\n",
" if url in summary_cache:\n",
" summary = summary_cache[url]\n",
" print(f\"✅ [Cached] {url}\")\n",
" print(f\"📄 Summary:\\n{summary}\\n\") # 👈 this prints the cached summary too\n",
" return summary\n",
"\n",
" page = Website(url)\n",
"\n",
" example = \"\"\"\n",
"Example:\n",
"\n",
"Title: Keelpijn \n",
"Summary: Sore throat is a common symptom, often caused by a virus. It usually goes away on its own within a few days. Drink warm fluids, rest your voice, and take paracetamol if needed. See a doctor if the pain lasts more than a week or gets worse.\n",
"\n",
"Title: Hoofdpijn \n",
"Summary: Headaches can have many causes like stress, fatigue, or dehydration. Most are harmless and go away with rest and fluids. Painkillers like paracetamol can help. If headaches are severe, frequent, or different than usual, contact your GP.\n",
"\"\"\"\n",
"\n",
" prompt = f\"\"\"\n",
"You are a health writer. Based on the Dutch content below, write a clear, short, brochure-style summary in **English** for patients.\n",
"\n",
"Use the format: \n",
"Title: {page.title} \n",
"Summary: <your summary>\n",
"\n",
"Keep it under 100 words, easy to read, friendly, and medically accurate.\n",
"\n",
"{example}\n",
"\n",
"Now use this for:\n",
"Title: {page.title}\n",
"Content:\n",
"{page.text[:3000]}\n",
"\"\"\"\n",
"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[{\"role\": \"user\", \"content\": prompt}],\n",
" temperature=0.4\n",
" )\n",
"\n",
" summary = response.choices[0].message.content.strip()\n",
" summary_cache[url] = summary\n",
" return summary\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "af8f9d81-d848-4fb9-ac79-782b39fed4a2",
"metadata": {},
"outputs": [],
"source": [
"def build_symptom_brochure(links, cache_file=\"brochure_cache.json\"):\n",
" brochure = []\n",
" cached = load_existing_summaries(cache_file)\n",
" print(\"📄 Building summaries for brochure:\\n\")\n",
"\n",
" for i, item in enumerate(links, 1):\n",
" url = item[\"url\"]\n",
" if url in cached:\n",
" print(f\"✅ [Cached] {url}\")\n",
" brochure.append({\"url\": url, \"summary\": cached[url]})\n",
" continue\n",
" \n",
" print(f\"🔄 [{i}/{len(links)}] Summarizing: {url}\")\n",
" try:\n",
" summary = summarize_for_brochure(url)\n",
" print(f\"✅ Summary:\\n{summary}\\n\")\n",
" brochure.append({\"url\": url, \"summary\": summary})\n",
" cached[url] = summary # Save new summary\n",
" save_summaries_to_cache(cached, cache_file)\n",
" except Exception as e:\n",
" print(f\"❌ Error summarizing {url}: {e}\\n\")\n",
" brochure.append({\"url\": url, \"summary\": \"Error generating summary.\"})\n",
"\n",
" return brochure\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e9079d6b-538f-4681-9776-4628a111246a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"📄 Building summaries for brochure:\n",
"\n",
"🔄 [1/10] Summarizing: https://www.thuisarts.nl/sociale-angststoornis\n",
"✅ [New] https://www.thuisarts.nl/sociale-angststoornis\n",
"📄 Summary:\n",
"Title: Social Anxiety Disorder\n",
"Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered.\n",
"\n",
"✅ Summary:\n",
"Title: Social Anxiety Disorder\n",
"Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered.\n",
"\n",
"✅ [Cached] https://www.thuisarts.nl/diabetes-type-2\n",
"🔄 [3/10] Summarizing: https://www.thuisarts.nl/morton-neuroom\n",
"✅ [New] https://www.thuisarts.nl/morton-neuroom\n",
"📄 Summary:\n",
"Title: Morton's Neuroma | Thuisarts.nl \n",
"Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. Sometimes, a custom shoe insole can also help.\n",
"\n",
"✅ Summary:\n",
"Title: Morton's Neuroma | Thuisarts.nl \n",
"Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. Sometimes, a custom shoe insole can also help.\n",
"\n",
"🔄 [4/10] Summarizing: https://www.thuisarts.nl/borstvergroting\n",
"✅ [New] https://www.thuisarts.nl/borstvergroting\n",
"📄 Summary:\n",
"Title: Breast Augmentation | Thuisarts.nl \n",
"Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. If you smoke, it's important to quit three weeks before surgery.\n",
"\n",
"✅ Summary:\n",
"Title: Breast Augmentation | Thuisarts.nl \n",
"Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. If you smoke, it's important to quit three weeks before surgery.\n",
"\n",
"🔄 [5/10] Summarizing: https://www.thuisarts.nl/kijkoperatie-in-buik\n",
"✅ [New] https://www.thuisarts.nl/kijkoperatie-in-buik\n",
"📄 Summary:\n",
"Title: Abdominal Laparoscopy | Thuisarts.nl\n",
"Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer.\n",
"\n",
"✅ Summary:\n",
"Title: Abdominal Laparoscopy | Thuisarts.nl\n",
"Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer.\n",
"\n",
"🔄 [6/10] Summarizing: https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt\n",
"✅ [New] https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt\n",
"📄 Summary:\n",
"Title: Changes in Care When You Turn 18 | Thuisarts.nl\n",
"Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. You will create a plan detailing what you need to manage this independently, with support provided to help you. This transition is a gradual process, with preparation beginning before you turn 18.\n",
"\n",
"✅ Summary:\n",
"Title: Changes in Care When You Turn 18 | Thuisarts.nl\n",
"Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. You will create a plan detailing what you need to manage this independently, with support provided to help you. This transition is a gradual process, with preparation beginning before you turn 18.\n",
"\n",
"🔄 [7/10] Summarizing: https://www.thuisarts.nl/zon-en-zonnebrand\n",
"✅ [New] https://www.thuisarts.nl/zon-en-zonnebrand\n",
"📄 Summary:\n",
"Title: Sun and Sunburn | Thuisarts.nl\n",
"Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice.\n",
"\n",
"✅ Summary:\n",
"Title: Sun and Sunburn | Thuisarts.nl\n",
"Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice.\n",
"\n",
"🔄 [8/10] Summarizing: https://www.thuisarts.nl/ganglion\n",
"✅ [New] https://www.thuisarts.nl/ganglion\n",
"📄 Summary:\n",
"Title: Ganglion | Thuisarts.nl \n",
"Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern.\n",
"\n",
"✅ Summary:\n",
"Title: Ganglion | Thuisarts.nl \n",
"Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern.\n",
"\n",
"🔄 [9/10] Summarizing: https://www.thuisarts.nl/kunstheup\n",
"✅ [New] https://www.thuisarts.nl/kunstheup\n",
"📄 Summary:\n",
"Title: Hip Replacement | Thuisarts.nl\n",
"Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years.\n",
"\n",
"✅ Summary:\n",
"Title: Hip Replacement | Thuisarts.nl\n",
"Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years.\n",
"\n",
"🔄 [10/10] Summarizing: https://www.thuisarts.nl/gezond-leven\n",
"✅ [New] https://www.thuisarts.nl/gezond-leven\n",
"📄 Summary:\n",
"Title: Healthy Living | Thuisarts.nl\n",
"Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support.\n",
"\n",
"✅ Summary:\n",
"Title: Healthy Living | Thuisarts.nl\n",
"Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support.\n",
"\n"
]
}
],
"source": [
"brochure = build_symptom_brochure(selected_links)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "e2121c3c-aa6a-4640-8e19-6ca6ccf84783",
"metadata": {},
"outputs": [],
"source": [
"def export_brochure_to_txt(brochure, filepath=\"brochure_summaries.txt\"):\n",
" if not brochure:\n",
" print(\"⚠️ No summaries to export.\")\n",
" return\n",
"\n",
" with open(filepath, \"w\", encoding=\"utf-8\") as f:\n",
" for item in brochure:\n",
" url = item.get(\"url\", \"Unknown URL\")\n",
" summary = item.get(\"summary\", \"No summary available.\")\n",
" f.write(f\"URL: {url}\\n\")\n",
" f.write(f\"{summary}\\n\\n\")\n",
"\n",
" print(f\"📁 Exported {len(brochure)} summaries to {filepath}\")\n"
]
},
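{
"cell_type": "code",
"execution_count": null,
"id": "d4f7a2b1-3c5e-4e8f-9a21-7b6c0d1e2f30",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch (an alternative, not used above): exporting the same brochure\n",
"# to Markdown instead of plain text, assuming each item keeps the same \"url\"/\"summary\" keys.\n",
"def export_brochure_to_md(brochure, filepath=\"brochure_summaries.md\"):\n",
"    if not brochure:\n",
"        print(\"⚠️ No summaries to export.\")\n",
"        return\n",
"    with open(filepath, \"w\", encoding=\"utf-8\") as f:\n",
"        for item in brochure:\n",
"            f.write(f\"## {item.get('url', 'Unknown URL')}\\n\\n\")\n",
"            f.write(f\"{item.get('summary', 'No summary available.')}\\n\\n\")\n",
"    print(f\"📁 Exported {len(brochure)} summaries to {filepath}\")"
]
},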
{
"cell_type": "code",
"execution_count": 26,
"id": "f14288f9-4d1c-4a0e-aaf4-9f86324b0602",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"📁 Exported 10 summaries to brochure_summaries.txt\n"
]
}
],
"source": [
"export_brochure_to_txt(brochure)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c23e89db-3ded-4189-a227-6ca6ac2f1332",
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: brochure built and exported successfully"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a700e4f3-fb6a-499a-a579-6f9b8ad35c9f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,40 @@
URL: https://www.thuisarts.nl/sociale-angststoornis
Title: Social Anxiety Disorder
Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered.
URL: https://www.thuisarts.nl/diabetes-type-2
Title: Diabetes type 2 | Thuisarts.nl
Summary: Type 2 diabetes, also known as sugar disease, is characterized by high blood sugar levels. Leading a healthy lifestyle is crucial: eat healthily, lose weight, exercise regularly, relax, and quit smoking. If blood sugar levels remain high, medication may be required. Regular check-ups, usually every three months, with your GP or practice nurse are essential.
URL: https://www.thuisarts.nl/morton-neuroom
Title: Morton's Neuroma | Thuisarts.nl
Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. Sometimes, a custom shoe insole can also help.
URL: https://www.thuisarts.nl/borstvergroting
Title: Breast Augmentation | Thuisarts.nl
Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. If you smoke, it's important to quit three weeks before surgery.
URL: https://www.thuisarts.nl/kijkoperatie-in-buik
Title: Abdominal Laparoscopy | Thuisarts.nl
Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer.
URL: https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt
Title: Changes in Care When You Turn 18 | Thuisarts.nl
Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. You will create a plan detailing what you need to manage this independently, with support provided to help you. This transition is a gradual process, with preparation beginning before you turn 18.
URL: https://www.thuisarts.nl/zon-en-zonnebrand
Title: Sun and Sunburn | Thuisarts.nl
Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice.
URL: https://www.thuisarts.nl/ganglion
Title: Ganglion | Thuisarts.nl
Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern.
URL: https://www.thuisarts.nl/kunstheup
Title: Hip Replacement | Thuisarts.nl
Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years.
URL: https://www.thuisarts.nl/gezond-leven
Title: Healthy Living | Thuisarts.nl
Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support.

View File

@@ -0,0 +1,933 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 113,
"id": "030082e9-edee-40b6-9f17-b6a683f2e334",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"import lxml  # parser backend used by BeautifulSoup below\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "c87e997d-e1d6-4b6f-9c76-3fb1d607f7cd",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "e450cb33-1ae4-435e-b155-35f2bd7ab78e",
"metadata": {},
"outputs": [],
"source": [
"headers = {\n",
"    \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"# A browser-like User-Agent header so the site serves the same HTML a real user sees,\n",
"# and to reduce the chance of blocks, CAPTCHAs, and HTTP 403 errors"
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "63a57fb7-79db-444b-968b-c9314b1f3d3f",
"metadata": {},
"outputs": [],
"source": [
"class Website:\n",
"    def __init__(self, url):\n",
"        self.url = url\n",
"        response = requests.get(url, headers=headers, timeout=30)\n",
"        soup = BeautifulSoup(response.content, 'lxml')\n",
"        # Scrape the title, with a fallback when the page has none\n",
"        self.title = soup.title.string if soup.title else \"No title found\"\n",
"        # Drop elements that carry no readable text before extracting it\n",
"        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
"            irrelevant.decompose()\n",
"        # get_text() with separator=\"\\n\" keeps lines apart; strip=True trims whitespace\n",
"        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
]
},
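{
"cell_type": "code",
"execution_count": null,
"id": "8a1b9c2d-4e5f-46a7-b8c9-0d1e2f3a4b5c",
"metadata": {},
"outputs": [],
"source": [
"# A hedged sketch (not used above): the same scrape with basic error handling,\n",
"# assuming a page may return a non-200 status or lack a <body> element.\n",
"def fetch_page_text(url):\n",
"    response = requests.get(url, headers=headers, timeout=30)\n",
"    response.raise_for_status()  # surface HTTP errors such as 403 early\n",
"    soup = BeautifulSoup(response.content, 'lxml')\n",
"    if soup.body is None:\n",
"        return \"\"\n",
"    for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
"        irrelevant.decompose()\n",
"    return soup.body.get_text(separator=\"\\n\", strip=True)"
]
},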
{
"cell_type": "code",
"execution_count": 121,
"id": "7369159d-1f36-43c9-b7e7-a0b65b56426b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Latest and Trending Entertainment News, Celebrity News, Movie News, Breaking News | Entertainment - Times of India\n",
"Sign In\n",
"TOI\n",
"Go to\n",
"TOI\n",
"Etimes\n",
"home\n",
"cinema\n",
"news\n",
"movie reviews\n",
"movie listings\n",
"box office\n",
"anime\n",
"previews\n",
"did you know\n",
"videos\n",
"showtimes\n",
"blogs\n",
"awards\n",
"News\n",
"entertainment\n",
"Trending\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n",
"Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n",
"Previous\n",
"Sonakshi breaks silence on her rift with Luv and Kussh\n",
"Madhuri once chased Aamir with hockey stick for THIS reason\n",
"Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n",
"Big B's savage reply to troll over cybercrime callertune\n",
"Anushka on keeping kids Vamika, Akaay away from public eye\n",
"Apoorva Mukhija recalls witnessing gender bias at home\n",
"Danish influencer seeks help to find papads from Big B\n",
"Sunjay Kapur's reception pics with Priya Sachdev goes viral\n",
"Big B schools trolls commenting 'buddha sathiya gaya hai'\n",
"Anushka on how she and Virat divide parenting duties\n",
"Brahmaji reacts to Vishnu's 7,000-acre land in New Zealand\n",
"Diljit says THIS amidst trolling for working with Hania\n",
"Riddhi found it ridiculous to like SRK's mother in Jawan\n",
"Priya Sachdev once called husband Sunjay Kapur misunderstood\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"Hindi\n",
"See All\n",
"Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n",
"Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n",
"Sonakshi breaks silence on her rift with Luv and Kussh\n",
"Madhuri once chased Aamir with hockey stick for THIS reason\n",
"Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n",
"Anushka on keeping kids Vamika, Akaay away from public eye\n",
"Anushka Sharma and Virat Kohli are committed to shielding their children, Vamika and Akaay, from the constant glare of public attention. In a recent interview, Anushka emphasized the couple's focus on instilling strong values and ensuring a normal upbringing for their kids.\n",
"Apoorva Mukhija recalls witnessing gender bias at home\n",
"Regional\n",
"When Samantha's class 10 mark sheet got leaked\n",
"Throwback to when a nostalgic memory made its way across the internet — Samantha Ruth Prabhu's Class 10 mark sheet! The actress's charming on-screen presence and grounded personality were once again in the spotlight as her old school report card began doing the rounds on social media.\n",
"Actor Tushar Ghadigaonkar passes away at 34\n",
"Kuberaa Twitter review: Netizens calls it a Blockbuster\n",
"Mammootty's health - Brittas says actor doing well\n",
"Kavya Madhavan's father P. Madhavan passes away\n",
"The Raja Saab teaser: Prabhas shines in this horror comedy\n",
"Mammootty's father-in-law P S Abu passes away\n",
"Videos\n",
"See All\n",
"Previous\n",
"03:07\n",
"Ananya Panday's Garden Bond With Parrots Wins Hearts\n",
"88 views | 2 hours ago\n",
"03:14\n",
"Sameera Reddy's Healing Journey Through Yoga\n",
"31 views | 2 hours ago\n",
"03:13\n",
"Kriti Kharbanda's Modern Maharani Look Stuns Instagram\n",
"26 views | 2 hours ago\n",
"03:12\n",
"Bobby Deol Meets Diljit Dosanjh: Punjabi Power Goes Viral\n",
"81 views | 2 hours ago\n",
"03:19\n",
"Sitaare Zameen Par: Riteish Deshmukh's Emotional Shoutout For Genelia's Big Win\n",
"162 views | 2 hours ago\n",
"03:26\n",
"Varun Dhawan Stuns With 50 Push-Ups Alongside Army Cadets on Border 2 Set\n",
"21 views | 2 hours ago\n",
"03:00\n",
"VIDYA BALAN TURNS HEADS WITH CASUAL AIRPORT LOOK\n",
"16 views | 2 hours ago\n",
"03:05\n",
"MANDHIRA KAPUR BREAKS DOWN IN EMOTIONAL POST FOR LATE BROTHER SUNJAY KAPUR\n",
"1.2K views | 2 hours ago\n",
"03:28\n",
"SALMAN KHAN TAKES A BRUTAL DIG AT SOHAIL'S DIVORCE ON NATIONAL TV\n",
"185 views | 2 hours ago\n",
"03:15\n",
"RAJINIKANTH CAUSES FAN RIOT DURING JAILER 2 SHOOT IN MYSORE\n",
"26 views | 2 hours ago\n",
"03:10\n",
"IBRAHIM ALI KHAN KISSES HIS DOG AT AIRPORT IN HEARTWARMING FAREWELL\n",
"20 views | 3 hours ago\n",
"03:09\n",
"ANUPAMAA SET GUTTED IN MASSIVE FIRE | CREW ESCAPES, CINE BODY DEMANDS ACTION\n",
"1.2K views | 3 hours ago\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"7\n",
"8\n",
"9\n",
"10\n",
"11\n",
"World\n",
"See All\n",
"Aamir to Tom: Celebs on a mission to 'Save Cinema'\n",
"'How to Train Your Dragon' beats '28 Years Later' and 'Elio' to top the US box office on second weekend\n",
"Blake Lively is heartbroken after friendship ends with Taylor Swift; accepts the music mogul won't be returning - Deets inside\n",
"Selena-Hailey UNFOLLOW each other amid Bieber drama\n",
"Judge gives Baldoni access to Blake-Taylor messages\n",
"Trending Now\n",
"# Sidharth Malhotra-Kiara Advani\n",
"# AbRam Khan-Taimur Ali Khan\n",
"# Janhvi Kapoor\n",
"# Salman Khan\n",
"# Hema Malini\n",
"# Salman Khan\n",
"# Gauri Khan\n",
"# Shah Rukh Khan\n",
"# Chahatt Khanna\n",
"Visual Stories\n",
"See All\n",
"Previous\n",
"Kuberaa's Sameera to Pushpa's Srivalli: Rashmika Mandanna's most iconic on-screen avatars\n",
"Ahaana Krishna's ethereal photo series is straight out of a dream\n",
"Rashmika Mandanna to Rakul Preet Singh: Best pictures of the week featuring south actresses\n",
"Gauri Khan's most loved saree looks - An ode to modern day elegance\n",
"South Indian beauties whose smiles will light up your Monday\n",
"Karishma Tanna Slays Every Frame\n",
"Tamannaah Bhatia's traditional looks\n",
"Malavika Mohanan's radiant pics\n",
"Neha Shetty stuns in every shade of blue\n",
"Thalapathy Vijay's top 10 blockbuster movies worth re-watching!\n",
"In pic: Mesmerizing looks of Shruti Haasan\n",
"Dushara Vijayan's Most Elegant Fashion Moments\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"More Stories\n",
"Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n",
"Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n",
"Mohanlal declines to continue as president at AMMAs general body meeting- Deets Inside\n",
"Blockbusters Ranbir Kapoor turned down: Films that became hits without him\n",
"Anushka Sharma reveals why she and Virat Kohli are keeping their children Vamika and Akaay away from the public eye: 'We don't want to raise brats'\n",
"Apoorva Mukhija recalls witnessing gender bias at home: 'My mother did it all, but father got credit for showing up at PTMs'\n",
"Amitabh Bachchan gives a savage reply to a troll over his viral cybercrime caller tune: 'Sarkar ko bolo bhai..'\n",
"Danish influencer asks fans to help her find papads from Amitabh Bachchan; netizens say 'he also used to grow basmati rice'\n",
"Days after his untimely demise, Sunjay Kapur's reception photos with Priya Sachdev goes viral; Looked dashing in hand embroidered shoes, written 'I do'\n",
"Priyanka Chopra Jonas recollects walking into a trap set by John Cena, Idris Elba on sets of 'Heads of State'\n",
"Bobby Deol's London vacation sparks fan frenzy: viral video shows actor posing for selfies outside restaurant\n",
"Amitabh Bachchan gives befitting replies to 'buddha sathiya gaya hai', ganja comments by trolls: 'Ek din, Bhagwan naa kare voh din jaldi aaye...'\n",
"Sai Pallavis best performances\n",
"Brahmaji clears the air about Vishnu Manchu purchasing 7,000-acre land in New Zealand: 'I was pulling their leg as usual...'\n",
"Anushka Sharma reveals how she and Virat Kohli divide the parenting duties: 'I will be the primary caregiver, he plays round the year'\n",
"Ranbir Kapoor's 'Awara' look sparks rumours of Raj Kapoor tribute, Diljit Dosanjh slammed for working with Hania Aamir in Sardaar Ji 3: Top 5 news\n",
"Has Kiara Advani been approached to play Meena Kumari in her biopic? Here's what we know\n",
"Top 5 psychological Anime every thriller fan must watch\n",
"Load More Stories\n",
"# Latest Movies 2025\n",
"# Best Bollywood Movies 2025\n",
"# Hollywood Movie 2025\n",
"# Tamil Movies 2025\n",
"# Telugu Movies 2025\n",
"# Malayalam Movies 2025\n",
"# Kannada Movies 2025\n",
"# Marathi Movies 2025\n",
"# Bengali Movies 2025\n",
"# Top Rated Movies 2025\n",
"# Best Hindi Movies\n",
"# Best English Movies\n",
"Hot on the Web\n",
"Salman Khan\n",
"Karisma Kapoor\n",
"Jaideep Ahlawat\n",
"Blood Pressure\n",
"Big Cat Species\n",
"Trisha\n",
"Sitaare Zameen Par Review\n",
"Ancient Indigenous Tribes\n",
"Hair Growth Tips\n",
"Kidney Health\n",
"Kuberaa Review\n",
"Blake Lively\n",
"Reverse Fatty Liver\n",
"Skincare Hacks\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Baby Girl Names\n",
"Diljit Dosanjh\n",
"Kidney Disease Symptoms\n",
"Javed Akhtar\n",
"Heart Attack\n",
"Ram Kapoor Diet\n",
"Liver Damage\n",
"Kuberaa Movie Review\n",
"Gauri Khan\n",
"Baba Vanga Prediction\n",
"Baby Boy Names\n",
"Navjot Singh Sidhu\n",
"Housefull 5 Box Office Collection\n",
"DNA Movie Review\n",
"Kidney Damage Symptoms\n",
"Popular Waterfalls In India\n",
"Linkedin Ceo On AI Killing Jobs\n",
"Tesla Robotaxi\n",
"Early Cancer Detection\n",
"Harvard Research Reveals\n",
"American Destinations Explore Without Passport\n",
"Amouranth\n",
"Mouth Larvae\n",
"Doomsday Fish\n",
"Salman Khan AVM\n",
"Ginger Health Tips\n",
"Trending Topics\n",
"Latest Movies\n",
"Bollywood Movies\n",
"Hollywood Movies\n",
"Tamil Movies 2025\n",
"Telugu Movies 2025\n",
"Malayalam Movies 2025\n",
"Kannada Movies 2025\n",
"Marathi Movies 2025\n",
"Bengali Movies 2025\n",
"Top Rated Movies 2025\n",
"Best Hindi Movies\n",
"Best English Movies\n",
"Best Telugu Movies\n",
"Best Tamil Movies\n",
"Best Malayalam Movies\n",
"Best Kannada Movies\n",
"Best Bengali Movies\n",
"Upcoming Hindi Movies\n",
"Best Movies Of All Time\n",
"Best Hindi Movies of All Time\n",
"Latest English Movies\n",
"Latest Malayalam Movies\n",
"English TV News\n",
"Tamil TV News\n",
"Telugu TV News\n",
"Malayalam TV News\n",
"Kannada TV News\n",
"Movie Reviews\n",
"Bhojpuri Cinema News\n",
"Gujarati Cinema News\n",
"Popular Categories\n",
"Viral News\n",
"K Pop News\n",
"Web Series News\n",
"Anime News\n",
"Upcoming English Movies\n",
"Upcoming Tamil Movies\n",
"Upcoming Telugu Movies\n",
"Upcoming Malayalam Movies\n",
"Upcoming Kannada Movies\n",
"Fashion Tips\n",
"Travel News\n",
"Entertainment News\n",
"Bollywood News\n",
"Tollywood News\n",
"Kollywood News\n",
"Mollywood News\n",
"Food News\n",
"Latest Hindi Movies\n",
"Latest Tamil Movies\n",
"Parenting Tips\n",
"Home Remedies\n",
"Weight Loss\n",
"Beauty Tips\n",
"Parenting Tips\n",
"Hindi Videos\n",
"Hindi Video Songs\n",
"Bhojpuri Music Videos\n",
"Latest Telugu Movies\n",
"Bhojpuri Music Video\n",
"Hindi TV News\n",
"Latest News\n",
"NHL free agency turns spicy as Mitch Marner and Connor McDavid eye shorter deals to cash in later\n",
"Olive Ridley turtle washed ashore at Polem\n",
"Who is Thomas Fugate? Meet the 22-year-old leading Trump's terrorism unit amid Iran fiasco\n",
"'And that's why Putin's the boss': Trump rebukes former Russian President Medvedev; warns against treating 'N word casually'\n",
"Govt plans ₹10cr road on Bicholim-Dodamarg route\n",
"Former WWE star Batista eyed for Road House 2 sequel\n",
"Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n",
"Andre Agassi and Steffi Graf's son Jaden Agassi shows love for girlfriend Catherine Holt's bold new photo from bedroom series\n",
"Is WWE planning to change Cody Rhodes' iconic entrance theme song Kingdom?\n",
"Velumani says he didn't attend RSS event in Coimbatore\n",
"Strait of Hormuz: Oil supply not an issue for India; 'pricing is a bigger concern,' what experts say\n",
"Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n",
"As commissions fall, India's ride-hailing firms test viability of flat-fee economics\n",
"Analysing what Trump's strikes mean for Iran\n",
"Trump's clarification on 'Iran regime change' divides MAGA further: JD Vance, Hegseth, Marco Rubio 'humiliated'\n",
"Laughter Chefs 2: Krushna Abhishek roasts Rahul Vaidya for his in-famous feud with cricketer Virat Kohli\n",
"“I could have passed Dan Ticktum”: Edoardo Mortara regrets Attack Mode strategy at Jakarta E-Prix\n",
"India vs England Test: Sunil Gavaskar calls for Rishabh Pant's signature somersault celebration, wicketkeeper politely declines - WATCH\n",
"Copyright © 2025 Bennett, Coleman & Co. Ltd. All rights reserved. For reprint rights: Times Syndication Service\n",
"Follow us on\n"
]
}
],
"source": [
"gossip= Website(\"https://timesofindia.indiatimes.com/entertainment\")\n",
"print(gossip.title)\n",
"print(gossip.text)"
]
},
{
"cell_type": "code",
"execution_count": 123,
"id": "a6f30380-1b91-48e4-9c86-df0369e2e675",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"\"\"\n",
"You are a stylish and culturally aware assistant who specializes in summarizing and discussing fashion trends, celebrity style, entertainment news, and television gossip.\n",
"\n",
"You stay updated on Hollywood, Bollywood, and the television world—including celebrity rumors, drama, reality TV updates, show recaps, and behind-the-scenes stories.\n",
"\n",
"When summarizing content, be engaging, concise, and insightful. Focus on what's trending, who's wearing what, and what everyone is talking about in fashion and entertainment. Maintain a fun yet informative tone, like a pop culture expert writing for a lifestyle magazine.\n",
"\n",
"If content includes TV gossip, highlight key rumors, casting updates, fan reactions, and noteworthy moments from popular shows.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 125,
"id": "30822d5c-d518-451c-b31f-44afa2a3b37a",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(website):\n",
" user_prompt = f\"\"\"The following text is extracted from a website titled: \"{website.title}\".\n",
"\n",
"Please analyze this content and provide a short and engaging summary in **Markdown format**.\n",
"\n",
"If the page contains:\n",
"- 🧵 Fashion trends: mention standout styles, designers, or events.\n",
"- 🗣️ TV gossip: highlight any drama, casting news, or fan reactions.\n",
"- 🎬 Celebrity updates (Hollywood/Bollywood): include relevant quotes, fashion moments, or event mentions.\n",
"- 📺 Show recaps: summarize what happened and any major twists.\n",
"\n",
"Keep the summary clear, fun, and informative. Use bullet points if multiple themes appear. If there is no meaningful content, say: *“No relevant summary could be generated.”*\n",
"\n",
"Website Content:\n",
"{website.text}\n",
"\"\"\"\n",
" return user_prompt"
]
},
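{
"cell_type": "code",
"execution_count": null,
"id": "c3d4e5f6-7a8b-49c0-a1b2-3c4d5e6f7a8b",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch of how the two prompts above can be sent to the local\n",
"# Ollama-served model through the OpenAI client created earlier. The model\n",
"# name \"llama3.2\" is an assumption - use whichever model you have pulled.\n",
"def summarize(url):\n",
"    website = Website(url)\n",
"    response = openai.chat.completions.create(\n",
"        model=\"llama3.2\",  # hypothetical; any locally pulled Ollama model works\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": system_prompt},\n",
"            {\"role\": \"user\", \"content\": user_prompt_for(website)},\n",
"        ],\n",
"    )\n",
"    return response.choices[0].message.content"
]
},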
{
"cell_type": "code",
"execution_count": 127,
"id": "5a25e90f-20a0-44ac-a96c-575ae974a45f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The following text is extracted from a website titled: \"Latest and Trending Entertainment News, Celebrity News, Movie News, Breaking News | Entertainment - Times of India\".\n",
"\n",
"Please analyze this content and provide a short and engaging summary in **Markdown format**.\n",
"\n",
"If the page contains:\n",
"- 🧵 Fashion trends: mention standout styles, designers, or events.\n",
"- 🗣️ TV gossip: highlight any drama, casting news, or fan reactions.\n",
"- 🎬 Celebrity updates (Hollywood/Bollywood): include relevant quotes, fashion moments, or event mentions.\n",
"- 📺 Show recaps: summarize what happened and any major twists.\n",
"\n",
"Keep the summary clear, fun, and informative. Use bullet points if multiple themes appear. If there is no meaningful content, say: *“No relevant summary could be generated.”*\n",
"\n",
"Website Content:\n",
"Sign In\n",
"TOI\n",
"Go to\n",
"TOI\n",
"Etimes\n",
"home\n",
"cinema\n",
"news\n",
"movie reviews\n",
"movie listings\n",
"box office\n",
"anime\n",
"previews\n",
"did you know\n",
"videos\n",
"showtimes\n",
"blogs\n",
"awards\n",
"News\n",
"entertainment\n",
"Trending\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Javed Akhtar\n",
"Diljit Dosanjh\n",
"Jaideep Ahlawat\n",
"Karisma Kapoor\n",
"Gauri Khan\n",
"Blake Lively\n",
"Trisha Krishnan\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Housefull 5\n",
"Kuberaa Movie Review\n",
"Sitaare Zameen Par Movie Review\n",
"Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n",
"Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n",
"Previous\n",
"Sonakshi breaks silence on her rift with Luv and Kussh\n",
"Madhuri once chased Aamir with hockey stick for THIS reason\n",
"Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n",
"Big B's savage reply to troll over cybercrime callertune\n",
"Anushka on keeping kids Vamika, Akaay away from public eye\n",
"Apoorva Mukhija recalls witnessing gender bias at home\n",
"Danish influencer seeks help to find papads from Big B\n",
"Sunjay Kapur's reception pics with Priya Sachdev goes viral\n",
"Big B schools trolls commenting 'buddha sathiya gaya hai'\n",
"Anushka on how she and Virat divide parenting duties\n",
"Brahmaji reacts to Vishnu's 7,000-acre land in New Zealand\n",
"Diljit says THIS amidst trolling for working with Hania\n",
"Riddhi found it ridiculous to like SRK's mother in Jawan\n",
"Priya Sachdev once called husband Sunjay Kapur misunderstood\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"Hindi\n",
"See All\n",
"Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n",
"Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n",
"Sonakshi breaks silence on her rift with Luv and Kussh\n",
"Madhuri once chased Aamir with hockey stick for THIS reason\n",
"Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n",
"Anushka on keeping kids Vamika, Akaay away from public eye\n",
"Anushka Sharma and Virat Kohli are committed to shielding their children, Vamika and Akaay, from the constant glare of public attention. In a recent interview, Anushka emphasized the couple's focus on instilling strong values and ensuring a normal upbringing for their kids.\n",
"Apoorva Mukhija recalls witnessing gender bias at home\n",
"Regional\n",
"When Samantha's class 10 mark sheet got leaked\n",
"Throwback to when a nostalgic memory made its way across the internet — Samantha Ruth Prabhu's Class 10 mark sheet! The actress's charming on-screen presence and grounded personality were once again in the spotlight as her old school report card began doing the rounds on social media.\n",
"Actor Tushar Ghadigaonkar passes away at 34\n",
"Kuberaa Twitter review: Netizens calls it a Blockbuster\n",
"Mammootty's health - Brittas says actor doing well\n",
"Kavya Madhavan's father P. Madhavan passes away\n",
"The Raja Saab teaser: Prabhas shines in this horror comedy\n",
"Mammootty's father-in-law P S Abu passes away\n",
"Videos\n",
"See All\n",
"Previous\n",
"03:07\n",
"Ananya Panday's Garden Bond With Parrots Wins Hearts\n",
"88 views | 2 hours ago\n",
"03:14\n",
"Sameera Reddy's Healing Journey Through Yoga\n",
"31 views | 2 hours ago\n",
"03:13\n",
"Kriti Kharbanda's Modern Maharani Look Stuns Instagram\n",
"26 views | 2 hours ago\n",
"03:12\n",
"Bobby Deol Meets Diljit Dosanjh: Punjabi Power Goes Viral\n",
"81 views | 2 hours ago\n",
"03:19\n",
"Sitaare Zameen Par: Riteish Deshmukh's Emotional Shoutout For Genelia's Big Win\n",
"162 views | 2 hours ago\n",
"03:26\n",
"Varun Dhawan Stuns With 50 Push-Ups Alongside Army Cadets on Border 2 Set\n",
"21 views | 2 hours ago\n",
"03:00\n",
"VIDYA BALAN TURNS HEADS WITH CASUAL AIRPORT LOOK\n",
"16 views | 2 hours ago\n",
"03:05\n",
"MANDHIRA KAPUR BREAKS DOWN IN EMOTIONAL POST FOR LATE BROTHER SUNJAY KAPUR\n",
"1.2K views | 2 hours ago\n",
"03:28\n",
"SALMAN KHAN TAKES A BRUTAL DIG AT SOHAIL'S DIVORCE ON NATIONAL TV\n",
"185 views | 2 hours ago\n",
"03:15\n",
"RAJINIKANTH CAUSES FAN RIOT DURING JAILER 2 SHOOT IN MYSORE\n",
"26 views | 2 hours ago\n",
"03:10\n",
"IBRAHIM ALI KHAN KISSES HIS DOG AT AIRPORT IN HEARTWARMING FAREWELL\n",
"20 views | 3 hours ago\n",
"03:09\n",
"ANUPAMAA SET GUTTED IN MASSIVE FIRE | CREW ESCAPES, CINE BODY DEMANDS ACTION\n",
"1.2K views | 3 hours ago\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"7\n",
"8\n",
"9\n",
"10\n",
"11\n",
"World\n",
"See All\n",
"Aamir to Tom: Celebs on a mission to 'Save Cinema'\n",
"'How to Train Your Dragon' beats '28 Years Later' and 'Elio' to top the US box office on second weekend\n",
"Blake Lively is heartbroken after friendship ends with Taylor Swift; accepts the music mogul won't be returning - Deets inside\n",
"Selena-Hailey UNFOLLOW each other amid Bieber drama\n",
"Judge gives Baldoni access to Blake-Taylor messages\n",
"Trending Now\n",
"# Sidharth Malhotra-Kiara Advani\n",
"# AbRam Khan-Taimur Ali Khan\n",
"# Janhvi Kapoor\n",
"# Salman Khan\n",
"# Hema Malini\n",
"# Gauri Khan\n",
"# Shah Rukh Khan\n",
"# Chahatt Khanna\n",
"Visual Stories\n",
"See All\n",
"Previous\n",
"Kuberaa's Sameera to Pushpa's Srivalli: Rashmika Mandanna's most iconic on-screen avatars\n",
"Ahaana Krishna's ethereal photo series is straight out of a dream\n",
"Rashmika Mandanna to Rakul Preet Singh: Best pictures of the week featuring south actresses\n",
"Gauri Khan's most loved saree looks - An ode to modern day elegance\n",
"South Indian beauties whose smiles will light up your Monday\n",
"Karishma Tanna Slays Every Frame\n",
"Tamannaah Bhatia's traditional looks\n",
"Malavika Mohanan's radiant pics\n",
"Neha Shetty stuns in every shade of blue\n",
"Thalapathy Vijay's top 10 blockbuster movies worth re-watching!\n",
"In pic: Mesmerizing looks of Shruti Haasan\n",
"Dushara Vijayan's Most Elegant Fashion Moments\n",
"Next\n",
"1\n",
"2\n",
"3\n",
"More Stories\n",
"Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n",
"Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n",
"Mohanlal declines to continue as president at AMMAs general body meeting- Deets Inside\n",
"Blockbusters Ranbir Kapoor turned down: Films that became hits without him\n",
"Anushka Sharma reveals why she and Virat Kohli are keeping their children Vamika and Akaay away from the public eye: 'We don't want to raise brats'\n",
"Apoorva Mukhija recalls witnessing gender bias at home: 'My mother did it all, but father got credit for showing up at PTMs'\n",
"Amitabh Bachchan gives a savage reply to a troll over his viral cybercrime caller tune: 'Sarkar ko bolo bhai..'\n",
"Danish influencer asks fans to help her find papads from Amitabh Bachchan; netizens say 'he also used to grow basmati rice'\n",
"Days after his untimely demise, Sunjay Kapur's reception photos with Priya Sachdev goes viral; Looked dashing in hand embroidered shoes, written 'I do'\n",
"Priyanka Chopra Jonas recollects walking into a trap set by John Cena, Idris Elba on sets of 'Heads of State'\n",
"Bobby Deol's London vacation sparks fan frenzy: viral video shows actor posing for selfies outside restaurant\n",
"Amitabh Bachchah gives befitting replies to 'buddha sathiya gaya hai', ganja comments by trolls: 'Ek din, Bhagwan naa kare voh din jaldi aaye...'\n",
"Sai Pallavis best performances\n",
"Brahmaji clears the air about Vishnu Manchu purchasing 7,000-acre land in New Zealand: 'I was pulling their leg as usual...'\n",
"Anushka Sharma reveals how she and Virat Kohli divide the parenting duties: 'I will be the primary caregiver, he plays round the year'\n",
"Ranbir Kapoor's 'Awara' look sparks rumours of Raj Kapoor tribute, Diljit Dosanjh slammed for working with Hania Aamir in Sardaar Ji 3: Top 5 news\n",
"Has Kiara Advani been approached to play Meena Kumari in her biopic? Here's what we know\n",
"Top 5 psychological Anime every thriller fan must watch\n",
"Load More Stories\n",
"# Latest Movies 2025\n",
"# Best Bollywood Movies 2025\n",
"# Hollywood Movie 2025\n",
"# Tamil Movies 2025\n",
"# Telugu Movies 2025\n",
"# Malayalam Movies 2025\n",
"# Kannada Movies 2025\n",
"# Marathi Movies 2025\n",
"# Bengali Movies 2025\n",
"# Top Rated Movies 2025\n",
"# Best Hindi Movies\n",
"# Best English Movies\n",
"Hot on the Web\n",
"Salman Khan\n",
"Karisma Kapoor\n",
"Jaideep Ahlawat\n",
"Blood Pressure\n",
"Big Cat Species\n",
"Trisha\n",
"Sitaare Zameen Par Review\n",
"Ancient Indigenous Tribes\n",
"Hair Growth Tips\n",
"Kidney Health\n",
"Kuberaa Review\n",
"Blake Lively\n",
"Reverse Fatty Liver\n",
"Skincare Hacks\n",
"Kuberaa Box Office Collection\n",
"Sitaare Zameen Par Box Office Collection\n",
"Baby Girl Names\n",
"Diljit Dosanjh\n",
"Kidney Disease Symptoms\n",
"Javed Akhtar\n",
"Heart Attack\n",
"Ram Kapoor Diet\n",
"Liver Damage\n",
"Kuberaa Movie Review\n",
"Gauri Khan\n",
"Baba Vanga Prediction\n",
"Baby Boy Names\n",
"Navjot Singh Sidhu\n",
"Housefull 5 Box Office Collection\n",
"DNA Movie Review\n",
"Kidney Damage Symptoms\n",
"Popular Waterfalls In India\n",
"Linkedin Ceo On AI Killing Jobs\n",
"Tesla Robotaxi\n",
"Early Cancer Detection\n",
"Harvard Research Reveals\n",
"American Destinations Explore Without Passport\n",
"Amouranth\n",
"Mouth Larvae\n",
"Doomsday Fish\n",
"Salman Khan AVM\n",
"Ginger Health Tips\n",
"Trending Topics\n",
"Latest Movies\n",
"Bollywood Movies\n",
"Hollywood Movies\n",
"Tamil Movies 2025\n",
"Telugu Movies 2025\n",
"Malayalam Movies 2025\n",
"Kannada Movies 2025\n",
"Marathi Movies 2025\n",
"Bengali Movies 2025\n",
"Top Rated Movies 2025\n",
"Best Hindi Movies\n",
"Best English Movies\n",
"Best Telugu Movies\n",
"Best Tamil Movies\n",
"Best Malayalam Movies\n",
"Best Kannada Movies\n",
"Best Bengali Movies\n",
"Upcoming Hindi Movies\n",
"Best Movies Of All Time\n",
"Best Hindi Movies of All Time\n",
"Latest English Movies\n",
"Latest Malayalam Movies\n",
"English TV News\n",
"Tamil TV News\n",
"Telugu TV News\n",
"Malayalam TV News\n",
"Kannada TV News\n",
"Movie Reviews\n",
"Bhojpuri Cinema News\n",
"Gujarati Cinema News\n",
"Popular Categories\n",
"Viral News\n",
"K Pop News\n",
"Web Series News\n",
"Anime News\n",
"Upcoming English Movies\n",
"Upcoming Tamil Movies\n",
"Upcoming Telugu Movies\n",
"Upcoming Malayalam Movies\n",
"Upcoming Kannada Movies\n",
"Fashion Tips\n",
"Travel News\n",
"Entertainment News\n",
"Bollywood News\n",
"Tollywood News\n",
"Kollywood News\n",
"Mollywood News\n",
"Food News\n",
"Latest Hindi Movies\n",
"Latest Tamil Movies\n",
"Parenting Tips\n",
"Home Remedies\n",
"Weight Loss\n",
"Beauty Tips\n",
"Hindi Videos\n",
"Hindi Video Songs\n",
"Bhojpuri Music Videos\n",
"Latest Telugu Movies\n",
"Bhojpuri Music Video\n",
"Hindi TV News\n",
"Latest News\n",
"NHL free agency turns spicy as Mitch Marner and Connor McDavid eye shorter deals to cash in later\n",
"Olive Ridley turtle washed ashore at Polem\n",
"Who is Thomas Fugate? Meet the 22-year-old leading Trump's terrorism unit amid Iran fiasco\n",
"'And that's why Putin's the boss': Trump rebukes former Russian President Medvedev; warns against treating 'N word casually'\n",
"Govt plans ₹10cr road on Bicholim-Dodamarg route\n",
"Former WWE star Batista eyed for Road House 2 sequel\n",
"Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n",
"Andre Agassi and Steffi Graf's son Jaden Agassi shows love for girlfriend Catherine Holt's bold new photo from bedroom series\n",
"Is WWE planning to change Cody Rhodes' iconic entrance theme song 'Kingdom'?\n",
"Velumani says he didn't attend RSS event in Coimbatore\n",
"Strait of Hormuz: Oil supply not an issue for India; 'pricing is a bigger concern,' what experts say\n",
"Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n",
"As commissions fall, India's ride-hailing firms test viability of flat-fee economics\n",
"Analysing what Trump's strikes mean for Iran\n",
"Trump's clarification on 'Iran regime change' divides MAGA further: JD Vance, Hegseth, Marco Rubio 'humiliated'\n",
"Laughter Chefs 2: Krushna Abhishek roasts Rahul Vaidya for his infamous feud with cricketer Virat Kohli\n",
"“I could have passed Dan Ticktum”: Edoardo Mortara regrets Attack Mode strategy at Jakarta E-Prix\n",
"India vs England Test: Sunil Gavaskar calls for Rishabh Pant's signature somersault celebration, wicketkeeper politely declines - WATCH\n",
"Copyright © 2025 Bennett, Coleman & Co. Ltd. All rights reserved. For reprint rights: Times Syndication Service\n",
"Follow us on\n",
"\n"
]
}
],
"source": [
"print(user_prompt_for(gossip))"
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "c039ab7c-88ee-475d-a93e-b26711d3ed4b",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": 146,
"id": "dd1fee35-6cc9-4995-8b5e-b93d80488364",
"metadata": {},
"outputs": [],
"source": [
"def summarize(url):\n",
"    website = Website(url)\n",
"    # Assumes the OpenAI client is pointed at a local Ollama server, which\n",
"    # serves llama3.2 through its OpenAI-compatible API\n",
"    response = openai.chat.completions.create(\n",
"        model=\"llama3.2\",\n",
"        messages=messages_for(website)\n",
"    )\n",
"    return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed09dad8-93bb-417e-b07b-183d2eba1ec5",
"metadata": {},
"outputs": [],
"source": [
"summarize(\"https://timesofindia.indiatimes.com/entertainment\")"
]
},
{
"cell_type": "code",
"execution_count": 139,
"id": "16a57eed-eba5-4f75-84f2-d44a67b36047",
"metadata": {},
"outputs": [],
"source": [
"def display_summary(url):\n",
" summary = summarize(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25af6217-6944-4c95-b156-0899dfcf0b83",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://timesofindia.indiatimes.com/entertainment\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29daa2d4-9d92-40ae-a0c4-dd2fdacf3f80",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -17,17 +17,13 @@ dependencies:
- scikit-learn
- chromadb
- jupyter-dash
- sentencepiece
- pyarrow
- faiss-cpu
- pip:
- beautifulsoup4
- plotly
- bitsandbytes
- transformers
- sentence-transformers
- datasets
- accelerate
- datasets==3.6.0
- openai
- anthropic
- google-generativeai
@@ -44,7 +40,7 @@ dependencies:
- langchain-openai
- langchain-chroma
- langchain-community
- faiss-cpu
- feedparser
- twilio
- pydub
- protobuf==3.20.2

View File

@@ -346,7 +346,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
"version": "3.11.13"
}
},
"nbformat": 4,

View File

@@ -14,19 +14,15 @@ tqdm
openai
gradio
langchain
tiktoken
faiss-cpu
langchain-core
langchain-text-splitters
langchain-openai
langchain_experimental
langchain_chroma
langchain[docarray]
datasets
sentencepiece
langchain-chroma
langchain-community
datasets==3.6.0
matplotlib
google-generativeai
anthropic
scikit-learn
unstructured
chromadb
plotly
jupyter-dash
@@ -34,11 +30,9 @@ beautifulsoup4
pydub
modal
ollama
accelerate
sentencepiece
bitsandbytes
psutil
setuptools
speedtest-cli
sentence_transformers
feedparser
protobuf==3.20.2

View File

@@ -0,0 +1,273 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4e66a6eb-e44a-4dc3-bad7-82e27d45155d",
"metadata": {},
"source": [
"# Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98bf393c-358e-4ee1-b15b-96dfec323734",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "f92034ed-a2e6-444a-8008-291ba3f80561",
"metadata": {},
"source": [
"# OpenAI API Key"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a084b35d-19e9-4b48-bb06-d2c9e4474b20",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "markdown",
"id": "32b35ea0-e4ca-492a-94af-822ec61468a0",
"metadata": {},
"source": [
"# About..."
]
},
{
"cell_type": "markdown",
"id": "c660b786-af88-4134-b958-ffbf7a7b2904",
"metadata": {},
"source": [
"In this project I use the code from day 1 for something I do at work. I'm a real estate appraiser and when I prepare a valuation for some real estate, I analyze the local market, and in particular the city where the property is located. I then gather economy-related information and create a report from it. I'm based in Poland, so the report is in Polish. Here, I want to ask the model to make such a report for me, using the official website of the city and its related Wikipedia article."
]
},
{
"cell_type": "markdown",
"id": "09f32b5a-4d0a-4fec-a2f8-5d323ca2745d",
"metadata": {},
"source": [
"# The Code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0fb8fe1-f052-4426-8531-5520d5295807",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a2cca4b-8cd0-4c1a-a01c-1da10199236c",
"metadata": {},
"outputs": [],
"source": [
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class Website:\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" self.url = url\n",
" response = requests.get(url, headers=headers)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c73e91c8-5805-4c9f-9bbb-b4e9c1e7bf12",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"\"\"You are an analyst and real estate appraiser who checks out the official websites \n",
"of cities as well as articles related to these cities on Wikipedia, searching the particular pages \n",
"of the official website and the Wikipedia article for economic data, in particular the \n",
"demographic structure of the city, its area, and how it's subdivided into built-up area, \n",
"rural area, forests, and so on, provided this kind of information is available. \n",
"The most important information you want to find is that related to the real estate market in the city, \n",
"but also the general economy of the city, so what kind of factories or companies there are, commerce, \n",
"business conditions, transportation, economic growth in recent years, recent investments, \n",
"wealth of the inhabitants, and so on, depending on what kind of information is available on the website. \n",
"Combine the information found on the official website with the information found on Wikipedia, and in case\n",
"of discrepancies, the official website should take precedence. If any of the information is missing,\n",
"just omit it entirely and don't mention that it is missing, just don't write about it at all.\n",
"When you gather all the required information, create a comprehensive report presenting \n",
"the data in a clear way, using markdown, in tabular form where it makes sense. \n",
"The length of the report should be about 5000 characters. And one more thing, the report should be entirely \n",
"in Polish. \"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e8015e8d-1655-4477-a111-aa8dd584f5eb",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(city, city_website, wiki_website):\n",
" user_prompt = f\"You are looking at the official website of the city {city}, and its wiki article.\"\n",
"user_prompt += f\"\\nThe contents of this website are as follows: \\\n",
"please provide a comprehensive report of economy-related data for the city of {city}, available on the \\\n",
"particular pages and subpages of its official website and Wikipedia in markdown. \\\n",
"Add tables if it makes sense for the data. The length of the report should be about 5000 characters. \\\n",
"The report should be in Polish.\\n\\n\"\n",
" user_prompt += city_website.text\n",
" user_prompt += wiki_website.text\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b55bd66b-e997-4d64-b5d5-679098013b9f",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(city, city_website, wiki_website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(city, city_website, wiki_website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5f1f218-d6a9-4a9e-be7e-b4f41e7647e5",
"metadata": {},
"outputs": [],
"source": [
"def report(url_official, url_wiki, city):\n",
"    city_website = Website(url_official)\n",
"    wiki_website = Website(url_wiki)\n",
"    response = openai.chat.completions.create(\n",
"        model=\"gpt-4o-mini\",\n",
"        messages=messages_for(city, city_website, wiki_website)\n",
"    )\n",
"    return response.choices[0].message.content"
]
},
{
"cell_type": "markdown",
"id": "08b47ec7-d00f-44e4-bbe2-580c8efd88e5",
"metadata": {},
"source": [
"# Raw Result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "830f0746-08a7-43ae-bd40-78d4a4c5d3e5",
"metadata": {},
"outputs": [],
"source": [
"report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")"
]
},
{
"cell_type": "markdown",
"id": "a3630ac4-c103-4b84-a1a2-c246a702346e",
"metadata": {},
"source": [
"# Polished Result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b89dd543-998d-4466-abd8-cc785118d3e4",
"metadata": {},
"outputs": [],
"source": [
"def display_report(url_official, url_wiki, city):\n",
" rep = report(url_official, url_wiki, city)\n",
" display(Markdown(rep))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "157926f3-ba67-4d4b-abbb-24a2dcd85a8b",
"metadata": {},
"outputs": [],
"source": [
"display_report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "727d2283-e74c-4e74-86f2-759b08f1427a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,409 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9413d98a-352a-47b7-b84b-5b4a61b3c002",
"metadata": {},
"source": [
"# Reddit Post Analysis"
]
},
{
"cell_type": "markdown",
"id": "97ebfa77-33f8-4cd1-9204-d73aeefc0fea",
"metadata": {},
"source": [
"1. **Sets the Role and Tone** \n",
" Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n",
"\n",
"2. **Guides Sentiment Analysis** \n",
" Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n",
"\n",
"3. **Groups and Labels Themes** \n",
" Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n",
"\n",
"4. **Creates an Insights Table** \n",
" Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion's core insights.\n",
"\n",
"5. **Describes Community Dynamics** \n",
" Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)."
]
},
{
"cell_type": "markdown",
"id": "425868ba-faec-4754-87f5-650f7529b319",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9596f40f-5add-4602-91e3-cd7d2c753c33",
"metadata": {},
"outputs": [],
"source": [
"import praw\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, Image\n",
"from openai import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "9e1a9999-4aad-416d-90fe-3b0841a4f455",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Load Credentials"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "847843ce-ebf9-4f48-b625-82e3ed687c81",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c615d79b-55a0-4eb1-ad8b-a2e28c11b49e",
"metadata": {},
"outputs": [],
"source": [
"reddit = praw.Reddit(\n",
" client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n",
" client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n",
" user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n",
" username=os.getenv(\"REDDIT_USERNAME\"),\n",
" password=os.getenv(\"REDDIT_PASSWORD\")\n",
")\n",
"\n",
"print(\"Authenticated as:\", reddit.user.me())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6df2224d-ecfd-4e07-9bc8-102eff257d69",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "markdown",
"id": "21ba0482-79e5-45ec-81d7-8611312c6b9e",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Reddit Post Scraper"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8dc5276d-2d38-4651-9db0-c353076d6096",
"metadata": {},
"outputs": [],
"source": [
"\n",
"class RedditPostScraper:\n",
" def __init__(self, url):\n",
" self.submission = reddit.submission(url=url)\n",
" self.submission.comments.replace_more(limit=None)\n",
" self._title = self.submission.title\n",
" self._text = self.submission.selftext\n",
" self._comments = \"\"\n",
" self._formatted_comments = [] # for reprocessing if needed\n",
"\n",
" def _generate_comments(self):\n",
" comments_list = []\n",
" for top_level in self.submission.comments:\n",
" top_author = top_level.author.name if top_level.author else \"[deleted]\"\n",
" comments_list.append(f\"{top_author}: {top_level.body}\")\n",
"\n",
" for reply in top_level.replies:\n",
" reply_author = reply.author.name if reply.author else \"[deleted]\"\n",
" comments_list.append(\n",
" f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n",
" )\n",
" self._formatted_comments = comments_list\n",
"\n",
" def title(self):\n",
" return f\"Title:\\n{self._title}\\n{self._text}\"\n",
"\n",
" def comments(self, max_words=None):\n",
" if not self._formatted_comments:\n",
" self._generate_comments()\n",
"\n",
" output_comments = []\n",
" total_words = 0\n",
"\n",
" for comment in self._formatted_comments:\n",
" word_count = len(comment.split())\n",
" if max_words and total_words + word_count > max_words:\n",
" break\n",
" output_comments.append(comment)\n",
" total_words += word_count\n",
"\n",
" return \"Text:\\n\" + \"\\n\\n\".join(output_comments)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3121cad0-4e2c-4d78-88e2-e72c6b99e2bf",
"metadata": {},
"outputs": [],
"source": [
"# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n",
"# print(post.title())\n",
"# print(post.comments(2000))"
]
},
{
"cell_type": "markdown",
"id": "569760f6-5d68-40c1-9227-374c8e04d70a",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### System and User Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22c0e89a-c076-4616-ae9b-b4cd588f39ad",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n",
"\n",
"Your response **must be in well-formatted Markdown**. Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n",
"\n",
"Perform the following tasks:\n",
"\n",
"---\n",
"\n",
"## 1. Overall Sentiment Breakdown\n",
"\n",
"- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n",
"- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n",
"- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n",
"\n",
"---\n",
"\n",
"## 2. Thematic Grouping of Comments\n",
"\n",
"- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n",
"- For each theme, create a subheading.\n",
"- Under each:\n",
" - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n",
" - Include 1-2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n",
"\n",
"---\n",
"\n",
"## 3. Insights Table\n",
"\n",
"If applicable, extract and structure insights into the following table. Leave any column empty if it's not relevant to the post type:\n",
"\n",
"| Perspectives / Motivations | Pains / Concerns / Frustrations | Tools / References / Resources | Suggestions / Solutions |\n",
"|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n",
"| - ... | - ... | - ... | - ... |\n",
"\n",
"- Populate this table with concise bullet points.\n",
"- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n",
"\n",
"---\n",
"\n",
"## 4. Tone and Community Dynamics\n",
"\n",
"- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n",
"- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n",
"- Include casual or emotional comments if they illustrate community personality.\n",
"\n",
"---\n",
"\n",
"**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n",
"\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf9d15d6-4f9a-45fd-96ed-d7097c7f03d6",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(post):\n",
" user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n",
" user_prompt += \"Below are the responses from various users. Analyze them according to the system prompt provided.\\n\"\n",
" user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n",
" user_prompt += post.comments(4000)\n",
" return user_prompt\n"
]
},
{
"cell_type": "markdown",
"id": "f18c581c-ea30-4a43-9223-8c184dedb37e",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Generating Responses"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aadf8f41-aca3-41be-b18b-cb49a67ba256",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feac9c61-f1f8-48f0-9189-bc60ac7fd755",
"metadata": {},
"outputs": [],
"source": [
"def summarize(url):\n",
" website = RedditPostScraper(url)\n",
" response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12b1d6dd-2d62-4136-8b8e-0a92134d4261",
"metadata": {},
"outputs": [],
"source": [
"# summarize(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd48253d-cdca-4c29-b4f2-c470290de63b",
"metadata": {},
"outputs": [],
"source": [
"def display_summary(url):\n",
" summary = summarize(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "markdown",
"id": "7e0825a9-a3b0-43a0-b69c-cf0ce81d77d2",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Example Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a61a482-ec70-4e29-b99c-0d82298a32b1",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a336777-a06e-4535-b68d-a6470eb1d701",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://www.reddit.com/r/AskReddit/comments/1lam10k/how_do_you_feel_about_the_no_kings_protest/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6b12074-ffb6-4a6d-bdd2-bbbb78f82781",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://www.reddit.com/r/canada/comments/1laq8ok/donald_trump_is_a_convicted_felon_could_he_be/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63b805e5-183f-439b-bfe7-9ee6bbe4a5b4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,59 @@
# Reddit Post Analyzer GPT & Open Source Approaches
This project consists of two Jupyter notebooks that demonstrate different methods for analyzing Reddit post data:
- **Day 1:** `Day1_RedditAnalysis_gpt.ipynb` – Uses GPT-based sentiment and insight extraction from Reddit posts and comments.
- **Day 2:** `day2_RedditAnalysis_opensource.ipynb` – Implements an open-source alternative for Reddit data processing and basic sentiment/thematic analysis.
---
## 📌 Features
- Reddit post and comment scraping using PRAW
- GPT-based sentiment summarization and insight structuring (Day 1)
- Open-source sentiment and thematic analysis pipeline (Day 2)
- Markdown-formatted output suitable for reporting
---
## 🛠️ Setup Instructions
### Reddit API Credentials Setup
To access Reddit data, you need to create a Reddit app and obtain credentials:
#### Steps to Get Your Reddit API Keys:
1. Go to [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps).
2. Scroll to the bottom and click **“create another app”** or **“create app”**.
3. Choose the **“script”** option.
4. Fill in the following fields:
- **name:** e.g., Reddit Analyzer
- **redirect uri:** `http://localhost:8080`
- **description:** *(optional)*
5. After creating the app, you will get:
- **client ID** (displayed under the app name)
- **client secret**
6. Make a note of your Reddit **username** and **password** (these are used with script apps).
#### Store your credentials in a `.env` file:
Create a `.env` file in the root directory with the following format:
```env
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_custom_user_agent
REDDIT_USERNAME=your_reddit_username
REDDIT_PASSWORD=your_reddit_password
```
These will be securely loaded into your script using the `dotenv` package.
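After populating `.env`, a quick sanity check can confirm that `load_dotenv` will find everything the notebooks expect. The sketch below is stdlib-only; the key names mirror the format above, while the helper name is illustrative:

```python
import os

# Variable names assumed to match the .env keys described above
REQUIRED_KEYS = [
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
]

def missing_credentials(env=os.environ):
    """Return the names of any required Reddit credentials that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Report which credentials still need to be added to .env
missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All Reddit credentials found.")
```

Run this once after calling `load_dotenv()` to catch typos in key names before PRAW fails with an authentication error.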
---
## 🚀 Running the Notebooks
Make sure to activate your virtual environment (if applicable), install dependencies, and run the notebooks cell by cell in **Jupyter Lab** or **VS Code**.
---


@@ -0,0 +1,436 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8c22d46c-d08b-4dbd-bdf5-338adce95e1a",
"metadata": {},
"source": [
"# Reddit Post Analysis using open source models (llama 3.2, deepseek r1, mistral:7b)"
]
},
{
"cell_type": "markdown",
"id": "bfc5335b-53a8-4cd1-b1a8-95496ae4856d",
"metadata": {},
"source": [
"1. **Sets the Role and Tone** \n",
" Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n",
"\n",
"2. **Guides Sentiment Analysis** \n",
" Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n",
"\n",
"3. **Groups and Labels Themes** \n",
" Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n",
"\n",
"4. **Creates an Insights Table** \n",
"   Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion's core insights.\n",
"\n",
"5. **Describes Community Dynamics** \n",
" Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)."
]
},
{
"cell_type": "markdown",
"id": "6104a23f-c43a-48dc-a018-cddb8bea75d1",
"metadata": {},
"source": [
"#### Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"import praw\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n",
"import ollama"
]
},
{
"cell_type": "markdown",
"id": "07de5c1d-1930-49ca-a026-2265e5432327",
"metadata": {},
"source": [
"#### Load Credentials"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83fdd570-83a3-4e18-a94e-969c557978d3",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"reddit = praw.Reddit(\n",
" client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n",
" client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n",
" user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n",
" username=os.getenv(\"REDDIT_USERNAME\"),\n",
" password=os.getenv(\"REDDIT_PASSWORD\")\n",
")\n",
"\n",
"print(\"Authenticated as:\", reddit.user.me())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a8a58d8-6755-4e22-be97-232c2f7ea07c",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "markdown",
"id": "f6b5b086-a4aa-40d2-a721-b3b8781d7ccf",
"metadata": {},
"source": [
"#### Reddit Post Scraper"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09c7a428-db62-4353-9fa5-d12bbdc4477c",
"metadata": {},
"outputs": [],
"source": [
"class RedditPostScraper:\n",
" def __init__(self, url):\n",
" self.submission = reddit.submission(url=url)\n",
" self.submission.comments.replace_more(limit=None)\n",
" self._title = self.submission.title\n",
" self._text = self.submission.selftext\n",
" self._comments = \"\"\n",
" self._formatted_comments = [] # for reprocessing if needed\n",
"\n",
" def _generate_comments(self):\n",
" comments_list = []\n",
" for top_level in self.submission.comments:\n",
" top_author = top_level.author.name if top_level.author else \"[deleted]\"\n",
" comments_list.append(f\"{top_author}: {top_level.body}\")\n",
"\n",
" for reply in top_level.replies:\n",
" reply_author = reply.author.name if reply.author else \"[deleted]\"\n",
" comments_list.append(\n",
" f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n",
" )\n",
" self._formatted_comments = comments_list\n",
"\n",
" def title(self):\n",
" return f\"Title:\\n{self._title}\\n{self._text}\"\n",
"\n",
" def comments(self, max_words=None):\n",
" if not self._formatted_comments:\n",
" self._generate_comments()\n",
"\n",
" output_comments = []\n",
" total_words = 0\n",
"\n",
" for comment in self._formatted_comments:\n",
" word_count = len(comment.split())\n",
" if max_words and total_words + word_count > max_words:\n",
" break\n",
" output_comments.append(comment)\n",
" total_words += word_count\n",
"\n",
" return \"Text:\\n\" + \"\\n\\n\".join(output_comments)"
]
},
{
"cell_type": "markdown",
"id": "3cece64a-ca54-4961-b04e-40f8057e2e78",
"metadata": {},
"source": [
"#### System and User Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "029de240-398e-4339-b90c-e6e90a96bcb5",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n",
"Your response **must be in well-formatted Markdown**. Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n",
"Perform the following tasks:\n",
"---\n",
"## 1. Overall Sentiment Breakdown\n",
"- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n",
"- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n",
"- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n",
"---\n",
"## 2. Thematic Grouping of Comments\n",
"- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n",
"- For each theme, create a subheading.\n",
"- Under each:\n",
" - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n",
"  - Include 1–2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n",
"---\n",
"## 3. Insights Table\n",
"If applicable, extract and structure insights into the following table. Leave any column empty if it's not relevant to the post type:\n",
"| Perspectives/ Motivations | Pains/ Concerns/ Frustrations | Tools / References / Resources | Suggestions / Solutions |\n",
"|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n",
"| - ... | - ... | - ... | - ... |\n",
"- Populate this table with concise bullet points.\n",
"- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n",
"---\n",
"## 4. Tone and Community Dynamics\n",
"- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n",
"- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n",
"- Include casual or emotional comments if they illustrate community personality.\n",
"---\n",
"**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n",
"\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "350d8eea-005b-474e-9b57-cdb4004d8144",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(post):\n",
" user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n",
" user_prompt += \"Below are the responses from various users. Analyze them according to the system prompt provided.\\n\"\n",
" user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n",
" user_prompt += post.comments(1000)\n",
" return user_prompt\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf23ed3b-8583-444e-ac62-3d415f771462",
"metadata": {},
"outputs": [],
"source": [
"# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n",
"# print(post.title())\n",
"# print(post.comments())"
]
},
{
"cell_type": "markdown",
"id": "4e37f2e1-6eef-4c27-a442-97a6ff3dbf2a",
"metadata": {},
"source": [
"#### Generating messages"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0781921b-e4e0-49f8-b34a-fd1017be6150",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "markdown",
"id": "544c81a2-37c2-491e-8ef4-ac5d56173b72",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### llama 3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3dd0a2a-ddf2-4bd1-823d-b49fa44a09ec",
"metadata": {},
"outputs": [],
"source": [
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"def summarizellama(url):\n",
" website = RedditPostScraper(url)\n",
" response = ollama_via_openai.chat.completions.create(\n",
" model = \"llama3.2\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "717ccb6d-f6c9-4f36-ad69-686f3f1bd26b",
"metadata": {},
"outputs": [],
"source": [
"def display_summaryllama(url):\n",
" summary = summarizellama(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f981fe9-ed2d-4546-8fb3-c0f8048e3474",
"metadata": {},
"outputs": [],
"source": [
"display_summaryllama(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
]
},
{
"cell_type": "markdown",
"id": "e3091dcf-f8b3-4d1a-a85c-3a9ebed2ac6c",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### deepseek"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55e465fa-e29d-4ed3-8f44-71964d2f866b",
"metadata": {},
"outputs": [],
"source": [
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"def summarizedeepseek(url):\n",
" website = RedditPostScraper(url)\n",
" response = ollama_via_openai.chat.completions.create(\n",
" model = \"deepseek-r1\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "40c26a89-97a8-4883-857a-fb13fea9222d",
"metadata": {},
"outputs": [],
"source": [
"def display_summarydeepseek(url):\n",
" summary = summarizedeepseek(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "362b871e-8f4d-47fa-b01d-bbe3082dd271",
"metadata": {},
"outputs": [],
"source": [
"display_summarydeepseek(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
]
},
{
"cell_type": "markdown",
"id": "3841bb1e-e885-4cb5-88f6-b6698ccbb77f",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"#### Mistral"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d913e07-31b4-439d-a861-c4fd99012588",
"metadata": {},
"outputs": [],
"source": [
"!ollama pull mistral:7b"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab881745-990c-4158-935b-36075c1dacde",
"metadata": {},
"outputs": [],
"source": [
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"def summarizeMistral(url):\n",
" website = RedditPostScraper(url)\n",
" response = ollama_via_openai.chat.completions.create(\n",
" model = \"mistral:7b\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5de3db6-ba69-43e8-9f6c-0945dbafa308",
"metadata": {},
"outputs": [],
"source": [
"def display_summaryMistral(url):\n",
" summary = summarizeMistral(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ea97e30-44be-45dc-ad2f-b6951ecc0190",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"display_summaryMistral(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38e4aabe-b111-4ddb-af6c-6d4ff7d6f26b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,167 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 4,
"id": "9138adfe-71b0-4db2-a08f-dd9e472fdd63",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import boto3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15d71dd6-cc03-485e-8a34-7a33ed5dee0e",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "1358921d-173b-4d5d-828c-b6c3726a5eb3",
"metadata": {},
"source": [
"#### Connect to bedrock models"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b3827087-182f-48be-8b59-b2741f8ded44",
"metadata": {},
"outputs": [],
"source": [
"import json"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "94c11534-6847-4e4a-b8e4-8066e0cc6aca",
"metadata": {},
"outputs": [],
"source": [
"# Use the Conversation API to send a text message to Amazon Nova.\n",
"\n",
"import boto3\n",
"from botocore.exceptions import ClientError\n",
"\n",
"# Create a Bedrock Runtime client in the AWS Region you want to use.\n",
"client = boto3.client(\"bedrock-runtime\", region_name=\"us-east-1\")\n",
"\n",
"# Set the model ID, e.g., Amazon Nova Lite.\n",
"model_id = \"amazon.nova-lite-v1:0\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a8ad65f-abaa-475c-892c-2e2b4e668f5d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ac20bb00-e93f-4a95-a1de-dd2688bce591",
"metadata": {},
"outputs": [],
"source": [
"# Start a conversation with the user message.\n",
"user_message = \"\"\"\n",
"List the best parks to see in London with the number of Google ratings and the rating value, e.g. 4.5 out of 5. \n",
"Give the number of ratings and present the output in table form\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a29f0055-48c4-4f25-b33f-cde1eaf755c5",
"metadata": {},
"outputs": [],
"source": [
"conversation = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [{\"text\": user_message}],\n",
" }\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e68b2d5-4d43-4b80-8574-d3c847b33661",
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" # Send the message to the model, using a basic inference configuration.\n",
" response = client.converse(\n",
" modelId=model_id,\n",
" messages=conversation,\n",
" inferenceConfig={\"maxTokens\": 512, \"temperature\": 0.5, \"topP\": 0.9},\n",
" )\n",
"\n",
" # Extract and print the response text.\n",
" response_text = response[\"output\"][\"message\"][\"content\"][0][\"text\"]\n",
" print(response_text)\n",
"\n",
"except (ClientError, Exception) as e:\n",
" print(f\"ERROR: Can't invoke '{model_id}'. Reason: {e}\")\n",
" exit(1)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ed16ee7-3f09-4780-8dfc-d1c5f3cffdbe",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f8c7a18-0907-430d-bfe7-86ecb8933bfd",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "2183994b-cde5-45b0-b18b-37be3277d73b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,203 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6e19458c-4b0e-40f6-bd4f-4d9c80ea671b",
"metadata": {},
"source": [
"# End of Week 1 - Exercise - Using Gemini API with GenAI SDK"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1a125bb-737f-41a5-8dd1-626cd8efe6e2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from google import genai\n",
"from google.genai import types\n",
"from IPython.display import Markdown, display, update_display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acf37451-3732-455b-a906-87f66053b018",
"metadata": {},
"outputs": [],
"source": [
"# Load the API key - for Gemini, the client automatically reads the key from the .env file if it is saved under the GOOGLE_API_KEY name\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c2fccf9-e419-431e-97fc-a42fcf67c633",
"metadata": {},
"outputs": [],
"source": [
"# Initialize the Google client\n",
"# Just to be explicit, I have passed the api_key parameter, but that's optional - genai.Client automatically reads it from the environment\n",
"\n",
"try:\n",
" client = genai.Client(api_key=api_key)\n",
" print(\"Google GenAI Client initialized successfully!\")\n",
"except Exception as e:\n",
" print(f\"Error initializing GenAI Client: {e}\")\n",
" print(\"Ensure your GOOGLE_API_KEY is correctly set as an environment variable.\")\n",
" exit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b918afd-ed3b-49d1-85f1-6e549faec66e",
"metadata": {},
"outputs": [],
"source": [
"# Get list of models\n",
"print(\"List of models that support generateContent:\\n\")\n",
"for m in client.models.list():\n",
" for action in m.supported_actions:\n",
" if action == \"generateContent\":\n",
" print(m.name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "791da71e-35a5-4a15-90c7-93ae22e40232",
"metadata": {},
"outputs": [],
"source": [
"MODEL_GEMINI = 'gemini-2.5-flash-preview-05-20'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a536e25-060e-4f93-bbd7-d80195620bba",
"metadata": {},
"outputs": [],
"source": [
"# System Definitions\n",
"\n",
"system_instruction_prompt = (\n",
"You are an expert Python programming assistant. Your goal is to identify common coding errors, suggest improvements for readability and efficiency, and provide corrected code snippets.\\\n",
" Always format code blocks using Markdown.\\\n",
" Be concise but thorough. Focus on the provided code and context.\"\n",
")\n",
"\n",
"generate_content_config = types.GenerateContentConfig(system_instruction=system_instruction_prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fc2a778-f175-44ec-9535-f81deeca7f1a",
"metadata": {},
"outputs": [],
"source": [
"# Main program to get user input and then use model to respond.\n",
"\n",
"MAX_HISTORY_MESSAGES = 6\n",
"conversation_contents = []\n",
"\n",
"print(\"\\n--- Start Chat with Gemini Python Assistant ---\")\n",
"print(\"Type 'Done' to exit the conversation.\")\n",
"\n",
"while True:\n",
" user_input = input(\"You: \").strip()\n",
"\n",
" if user_input.lower() == \"done\": \n",
" print(\"\\nExiting chat. Goodbye!\")\n",
" break \n",
"\n",
" if not user_input: \n",
" print(\"Please enter a question or 'Done' to exit.\")\n",
" continue\n",
" \n",
" try:\n",
" user_message_content = types.Content(\n",
" role=\"user\",\n",
" parts=[types.Part.from_text(text=user_input)]\n",
" ) \n",
" \n",
" conversation_contents.append(user_message_content) \n",
" \n",
" stream_response = client.models.generate_content_stream(\n",
" model=MODEL_GEMINI,\n",
" contents=conversation_contents,\n",
" config=generate_content_config,\n",
" )\n",
" \n",
" model_full_response_text = \"**Gemini:**\\n\\n\"\n",
" current_display_handle = display(Markdown(\"\"), display_id=True)\n",
" \n",
" \n",
" for chunk in stream_response:\n",
" chunk_text = chunk.text or ''\n",
" model_full_response_text += chunk_text\n",
" update_display(Markdown(model_full_response_text), display_id=current_display_handle.display_id)\n",
" \n",
" # Add Model's FULL Response to Conversation History\n",
" model_message_content = types.Content(\n",
" role=\"model\",\n",
" parts=[types.Part.from_text(text=model_full_response_text.removeprefix(\"**Gemini:**\\n\\n\"))]\n",
" )\n",
" \n",
" conversation_contents.append(model_message_content)\n",
" \n",
" conversation_contents = conversation_contents[-MAX_HISTORY_MESSAGES:] \n",
"\n",
" except Exception as e:\n",
" print(f\"\\nAn error occurred during interaction: {e}\")\n",
" if conversation_contents:\n",
" conversation_contents.pop()\n",
" print(\"Please try asking your question again or type 'Done' to exit.\")\n",
" continue "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a86c3e5b-516b-42dc-994f-9dfa75c610cc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,271 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "032a76d2-a112-4c49-bd32-fe6c87f6ec19",
"metadata": {},
"source": [
"## Dota Game Assistant\n",
"\n",
"This script retrieves and summarizes information about a specified hero from the `dotabuff.com` website"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04b24159-55d1-4eaf-bc19-474cec71cc3b",
"metadata": {},
"outputs": [],
"source": [
"!pip install selenium\n",
"!pip install webdriver-manager"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14d26510-6613-4c1a-a346-159d906d111c",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9c8ea1e-8881-4f50-953d-ca7f462d8a32",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
"    print(\"An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02febcac-9a21-4322-b2ea-748972312165",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb7dd822-962e-4b34-a743-c14809764e4a",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"\n",
"# Some websites need you to use proper headers when fetching them:\n",
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"from selenium import webdriver\n",
"from selenium.webdriver.chrome.service import Service\n",
"from selenium.webdriver.chrome.options import Options\n",
"from selenium.webdriver.common.by import By\n",
"from selenium.webdriver.support.ui import WebDriverWait\n",
"from selenium.webdriver.support import expected_conditions as EC\n",
"from webdriver_manager.chrome import ChromeDriverManager\n",
"from bs4 import BeautifulSoup\n",
"\n",
"class Website:\n",
" def __init__(self, url, wait_time=10):\n",
" \"\"\"\n",
" Create this Website object from the given URL using Selenium and BeautifulSoup.\n",
" Uses headless Chrome to load JavaScript content.\n",
" \"\"\"\n",
" self.url = url\n",
"\n",
" # Configure headless Chrome\n",
" options = Options()\n",
"        options.add_argument(\"--headless=new\")  # options.headless was removed in Selenium 4.13+\n",
" options.add_argument(\"--disable-gpu\")\n",
" options.add_argument(\"--no-sandbox\")\n",
"\n",
" # Start the driver\n",
" service = Service(ChromeDriverManager().install())\n",
" driver = webdriver.Chrome(service=service, options=options)\n",
"\n",
" try:\n",
" driver.get(url)\n",
"\n",
" # Wait until body is loaded (you can tweak the wait condition)\n",
" WebDriverWait(driver, wait_time).until(\n",
" EC.presence_of_element_located((By.TAG_NAME, \"body\"))\n",
" )\n",
"\n",
" html = driver.page_source\n",
" soup = BeautifulSoup(html, \"html.parser\")\n",
"\n",
" self.title = soup.title.string.strip() if soup.title else \"No title found\"\n",
"\n",
" # Remove unwanted tags\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
"\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
"\n",
" finally:\n",
" driver.quit()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d833fbb-0115-4d99-a4e9-464f27900eab",
"metadata": {},
"outputs": [],
"source": [
"class DotaWebsite:\n",
" def __init__(self, hero):\n",
" web = Website(\"https://www.dotabuff.com/heroes\" + \"/\" + hero)\n",
" self.title = web.title\n",
" self.text = web.text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0a42c2b-c837-4d1b-b8f8-b2dbb8592a1a",
"metadata": {},
"outputs": [],
"source": [
    "system_prompt = \"You are a game assistant that analyzes the contents of a website \\\n",
    "and provides a short summary about facet selection, ability building, item building, and best versus and worst versus matchups, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c05843d-6373-4a76-8cca-9c716a6ca13a",
"metadata": {},
"outputs": [],
"source": [
"# A function that writes a User Prompt that asks for summaries of websites:\n",
"\n",
"def user_prompt_for(website):\n",
" user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"\\nThe contents of this website are as follows; \\\n",
    "please provide a short summary about facet selection, ability building, item building, and best versus and worst versus matchups in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0145eee1-39e2-4f00-89ec-7acc6e375972",
"metadata": {},
"outputs": [],
"source": [
"# See how this function creates exactly the format above\n",
"\n",
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76f389c0-572a-476b-9b4e-719c0ef10abb",
"metadata": {},
"outputs": [],
"source": [
"# And now: call the OpenAI API. You will get very familiar with this!\n",
"\n",
"def summarize(hero):\n",
" website = DotaWebsite(hero)\n",
" response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcb046b7-52a9-49ff-b7bc-d8f6c279df4c",
"metadata": {},
"outputs": [],
"source": [
"# A function to display this nicely in the Jupyter output, using markdown\n",
"\n",
"def display_summary(hero):\n",
" summary = summarize(hero)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9befb685-2912-41a9-b2d9-ae33001494c0",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"axe\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf1bb1d9-0351-44fc-8ebf-91aa47a81b42",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,159 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "922bb144",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "870bdcd9",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"\n",
"# Check the key\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6146102",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f75573f",
"metadata": {},
"outputs": [],
"source": [
"class FinvizWebsite():\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" \n",
" def __init__(self, ticker):\n",
" self.ticker = ticker.upper()\n",
" self.url = f\"https://finviz.com/quote.ashx?t={self.ticker}&p=d&ty=ea\"\n",
" self.headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
" }\n",
" response = requests.get(self.url, headers=self.headers)\n",
" soup = BeautifulSoup(response.content, \"html.parser\")\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" self.table = soup.find(\"table\", class_=\"snapshot-table2\") "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42c7ced6",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(website):\n",
" system_prompt = \"\"\"\n",
    "    You are a financial analysis assistant that analyzes the contents of an HTML-formatted table\n",
    "    and provides a summary of the stock's analysis in clear, professional language appropriate for financial research,\n",
    "    with a bulleted list of the most important **pros** and **cons**, ignoring text that might be navigation related. Respond in markdown.\n",
" \"\"\"\n",
" \n",
" user_prompt = f\"\"\"\n",
" You are looking at a website titled {website.title}.\\n\n",
    "    The contents of this website are as follows; please provide a summary of the stock's analysis from this website in markdown.\\n\\n\n",
" {website.table}\n",
" \"\"\"\n",
" \n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bfaa6da",
"metadata": {},
"outputs": [],
"source": [
"def display_summary(ticker):\n",
" website = FinvizWebsite(ticker)\n",
" response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages_for(website)\n",
" )\n",
" summary = response.choices[0].message.content\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eeeff6f7",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"aapl\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5aed2001",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"tsla\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "72a6552c-c837-4ced-b7c8-75a3d4cf777d",
"metadata": {},
"source": [
" <h2 style=\"color:#900;\">MAIL SUBJECT CREATION -</h2>\n",
"\n",
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../../important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h3 style=\"color:#900;\">Write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</h3>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76822a8b-d6e0-4dd9-a801-2d34bd104b7d",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1a9de873-d24b-42fb-8f4a-a08f429050f5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API key found and looks good so far!\n"
]
}
],
"source": [
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "122af5d6-4727-4229-b85a-ea5246ff540c",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b9a2c2c2-ac10-4019-aeef-2bfe6cc7b1f3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Subject: Missing API Logs for June 22nd: Scheduled Meeting to Address Issue\n"
]
}
],
"source": [
    "system_prompt = \"You are an assistant that generates a subject line as output, taking the content of an email as input. The subject line should be self-explanatory.\"\n",
"user_prompt = \"\"\"\n",
" Below is the content of the text which I am giving as input\n",
" Mail Content - 'Hi Team,\n",
"\n",
"We have observed that the API logs for June 22nd between 6:00 AM and 12:00 PM are missing in Kibana.\n",
"\n",
"The SA team has confirmed that there were no errors reported on their end during this period.\n",
"\n",
"The DevOps team has verified that logs were being sent as expected.\n",
"\n",
"Upon checking the Fluentd pods, no errors were found.\n",
"\n",
"Logs were being shipped to td-agent as usual.\n",
"\n",
"No configuration changes or pod restarts were detected.\n",
"\n",
"We have also confirmed that no code changes were deployed from our side during this time.\n",
"\n",
"Bucket: api_application_log\n",
"Ticket\n",
"\n",
"We have scheduled a meeting with the SA and DevOps teams to restore the missing logs, as they are critical for our weekly report and analysis.'\n",
"\"\"\"\n",
"\n",
"# Step 2: Make the messages list\n",
"\n",
"messages = [ {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}] # fill this in\n",
"\n",
"# Step 3: Call OpenAI\n",
"\n",
"response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages\n",
" )\n",
"\n",
"# Step 4: print the result\n",
"\n",
"print(response.choices[0].message.content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,130 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n",
"\n",
"# If you get an error running this cell, then please head over to the troubleshooting notebook!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n",
"\n",
"# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
"# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4484fcf-8b39-4c3f-9674-37970ed71988",
"metadata": {},
"outputs": [],
"source": [
    "# Step 1: Create your prompts\n",
    "\n",
    "system_prompt = \"\"\"\n",
    "    You are an assistant that analyzes a restaurant's ratings & reviews and comes up with a summary of how many 5, 4, 3, 2 and 1 star ratings the restaurant has.\n",
    "    You will also summarize the reviews, showing what customers love about the restaurant and what they don't like. Also extract the name of the restaurant,\n",
    "    the location and the cuisine. Respond in markdown.\"\"\"\n",
    "\n",
    "# Step 2: Build the messages list inside the function, once the reviews have been loaded\n",
    "# (building it at module level would reference reviews_text before it exists)\n",
    "\n",
    "def generate_review_summary(reviews_text):\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": f\"Here are the ratings & reviews to summarize:\\n\\n{reviews_text}\"}\n",
    "    ]\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages\n",
    "    )\n",
    "    return response.choices[0].message.content\n",
"\n",
"try:\n",
" with open('restaurant_reviews.txt', 'r') as file:\n",
" reviews_text = file.read()\n",
" \n",
" # Generate review summary\n",
" summary = generate_review_summary(reviews_text)\n",
" display(Markdown(summary))\n",
"\n",
"except FileNotFoundError:\n",
" print(\"The specified reviews file was not found. Please ensure 'restaurant_reviews.txt' is in the correct directory.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3eccbf35-0a0b-4a1b-b493-aa5c342109cc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,189 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "181edd2d-67d4-43e4-9a89-327eaff26177",
"metadata": {},
"source": [
"Grammar and Vocab AI Checker"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4be465e2-16fc-4b34-a771-d23f05edbc14",
"metadata": {},
"outputs": [],
"source": [
"pip install PyMuPDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66b371fb-f4ea-4ced-8ad2-4229892e0647",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n",
"import fitz # PyMuPDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41068273-4325-4de2-b11d-37d2831b1a47",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba003970-0cc9-4e11-8702-0b120f378fa4",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faa89067-fcee-4950-b4ce-3faec640c79b",
"metadata": {},
"outputs": [],
"source": [
    "system_prompt = \"You are a spelling, grammar, and vocabulary checker. You check for any mistakes in the spelling, grammar, and vocabulary of texts or files that are given to you. You respond with the percentage of the text that is correct in terms of spelling, vocabulary, and grammar, as well as the total number of words and characters in the file or text you are checking, and you provide bullet-point instructions on how to fix the mistakes and where they are.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de32a94d-9c1b-4e1a-a1b9-78d3180c0d79",
"metadata": {},
"outputs": [],
"source": [
"# user_prompt = \"Hi, mw namw is kkkdvin. How are y,?\" # Uncomment this to test the implementation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "272f379d-3471-488d-ba27-bbffff961d72",
"metadata": {},
"outputs": [],
"source": [
"def extract_pdf_text_to_string(pdf_path):\n",
" \"\"\"\n",
" Extracts all text from a PDF file and returns it as a single string.\n",
"\n",
" Args:\n",
" pdf_path (str): The path to the PDF file.\n",
"\n",
" Returns:\n",
" str: A string containing all the extracted text from the PDF.\n",
" \"\"\"\n",
" text_content = \"\"\n",
" try:\n",
" doc = fitz.open(pdf_path)\n",
" for page_num in range(doc.page_count):\n",
" page = doc.load_page(page_num)\n",
" text_content += page.get_text()\n",
" doc.close()\n",
" except Exception as e:\n",
" print(f\"Error processing PDF: {e}\")\n",
" return None\n",
" return text_content\n",
"\n",
"pdf_file_path = \"gram-vocab-test.pdf\" # Replace with the actual path to your PDF\n",
"user_prompt = extract_pdf_text_to_string(pdf_file_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07a839f6-c508-4b94-98ec-877c19023e58",
"metadata": {},
"outputs": [],
"source": [
"messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": f\"This is the text to check for grammar, vocab, and spelling errors: {user_prompt}\"}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a642cb62-9016-4957-a74e-9f97f8c495a7",
"metadata": {},
"outputs": [],
"source": [
"response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ce6b006-19b6-48b4-b344-b4b57b8c1438",
"metadata": {},
"outputs": [],
"source": [
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54bc23cd-f59c-4b4d-bc3e-60f273692d92",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "e95fa36b-7118-4fd8-a3b2-b4424bda2178",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0356762-4a3f-437a-908e-192aa9c804c7",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb747863-30bd-4a0b-b359-b37223884075",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n",
"message = \"Hello, GPT! This is my first ever message to you! Hi!\"\n",
"response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=[{\"role\":\"user\", \"content\":message}])\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fae60901-3564-4f26-a812-fc16d3b95bdb",
"metadata": {},
"outputs": [],
"source": [
"def get_page_source(url):\n",
" response = requests.get(url)\n",
    "    response.raise_for_status() # Raises an exception if the request failed\n",
    "    return response.text # Returns the raw HTML text\n",
"\n",
"system_prompt = \"You are an assistant analyzing the source of a website and checking for security vulnerabilities.\"\n",
"\n",
"def user_prompt_for(url):\n",
" user_prompt = \"Below is the HTML source of the website:\\n\\n\"\n",
" user_prompt += get_page_source(url) \n",
" user_prompt += \"\\n\\nPlease check this website and search for security vulnerabilities. \"\n",
" user_prompt += \"If you don't find any, print 'No vulnerability found.' \"\n",
    "    user_prompt += \"If you find a potential vulnerability risk, describe the vulnerability risk and print 'Potential Vulnerability Risk'. \"\n",
    "    user_prompt += \"If you find a direct, explicit vulnerability, describe the vulnerability and its CVSS score, and print 'ATTENTION! Vulnerability is Found.' \"\n",
    "    user_prompt += \"If you find both a potential vulnerability risk and a direct, explicit vulnerability, describe them and their CVSS scores, and print 'ATTENTION! Potential Vulnerability Risk and Direct Vulnerability are Found!!'\"\n",
" return user_prompt\n",
"\n",
"def messages_for(url):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(url)}\n",
" ]\n",
"\n",
"def check_vuln(url):\n",
" response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages_for(url)\n",
" )\n",
" return response.choices[0].message.content\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e048c27f-f659-4c92-a47c-679bf6e5bf5f",
"metadata": {},
"outputs": [],
"source": [
"def display_vuln(url):\n",
" display_vuln = check_vuln(url)\n",
" display(Markdown(display_vuln))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69f5852f-ca5b-4933-b93c-e9f2d401467a",
"metadata": {},
"outputs": [],
"source": [
"display_vuln(\"https://edwarddonner.com\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "824943fc-e5a5-424a-abec-56767a709782",
"metadata": {},
"outputs": [],
"source": [
"display_vuln(\"http://192.168.1.113/\") #local apache server IP, contains xss_vulnerable_example.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3543846-e0c6-4504-8b65-2f675f0f7ebe",
"metadata": {},
"outputs": [],
"source": [
"display_vuln(\"https://www.google.com\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,239 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "cab13efd-a1f4-4077-976e-e3912511117f",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import re\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c226f54b-325c-49b1-9d99-207a8e306682",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: youtube_transcript_api in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (1.1.1)\n",
"Requirement already satisfied: defusedxml<0.8.0,>=0.7.1 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from youtube_transcript_api) (0.7.1)\n",
"Requirement already satisfied: requests in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from youtube_transcript_api) (2.32.4)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (3.4.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (3.10)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (2.5.0)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (2025.7.9)\n"
]
}
],
"source": [
"!pip install youtube_transcript_api"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "717fc2a4-b6c5-4027-9e6b-05e83c38d02f",
"metadata": {},
"outputs": [],
"source": [
"from youtube_transcript_api import YouTubeTranscriptApi"
]
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": 4,
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')"
],
"id": "3caca469-5f39-4592-bf12-c8832c44de19"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"class YouTubeRecipeExtractor:\n",
"\n",
" def __init__(self):\n",
" self.openai = OpenAI()\n",
" self.system_prompt = self.get_system_prompt()\n",
"\n",
" def get_system_prompt(self):\n",
" return \"\"\"\n",
    "        You are a professional chef and nutritionist specializing in recipe writing.\n",
"\n",
" Your task is to write recipes in a very comprehensive and consistent manner.\n",
" Each recipe will contain a list of ingredients and a list of steps to follow.\n",
    "        The quantities of the ingredients should always be expressed in official units (grams, litres, etc.). If the original recipe uses a different unit (such as cups, teaspoons, etc.), make the conversion but keep the original instruction in parentheses.\n",
" The steps should be described in a very synthetic and concise manner. You should avoid being verbose, but the step should be understandable and easy to follow for non-expert people.\n",
" To each recipe add a general analysis from nutrition perspective (number of calories per serving, proteins, fat, etc.).\n",
" Use Markdown to improve readability.\n",
" If the text you receive is not a recipe, return a kind message explaining the situation.\n",
" \"\"\"\n",
"\n",
" def extract_video_id(self, url):\n",
" \"\"\"Extract video ID from YouTube URL\"\"\"\n",
" pattern = r'(?:youtube\\.com/watch\\?v=|youtu\\.be/|youtube\\.com/embed/)([^&\\n?#]+)'\n",
" match = re.search(pattern, url)\n",
" return match.group(1) if match else None\n",
"\n",
" def get_transcription(self, video_id):\n",
" try:\n",
" print(f\"Fetching video transcript for video {video_id}...\")\n",
    "            # youtube_transcript_api 1.x uses an instance method; each snippet exposes a .text attribute\n",
    "            transcript = YouTubeTranscriptApi().fetch(video_id)\n",
    "            return \" \".join(snippet.text for snippet in transcript)\n",
" except Exception as e:\n",
" print(f\"Error fetching transcript: {e}\")\n",
" return None\n",
"\n",
" def format_recipe(self, transcript):\n",
" try:\n",
" response = self.openai.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": self.system_prompt},\n",
" {\"role\": \"user\", \"content\": f\"Summarize the following YouTube recipe:\\n\\n{transcript}\"}\n",
" ]\n",
" )\n",
" return response.choices[0].message.content\n",
" except Exception as e:\n",
" print(f\"Error summarizing text: {e}\")\n",
" return None\n",
"\n",
" def display_recipe(self, url):\n",
" transcript = self.get_transcription(self.extract_video_id(url))\n",
" recipe = self.format_recipe(transcript)\n",
" display(Markdown(recipe))\n"
],
"id": "29e44cb5-0928-4ac9-9681-efd6ba1e359f"
},
{
"cell_type": "code",
"execution_count": 6,
"id": "98ea2d01-f949-4e03-9154-fe524cf64ca4",
"metadata": {},
"outputs": [],
"source": [
"test_bad_url = \"https://www.youtube.com/watch?v=hzGiTUTi060\"\n",
"test_good_url = \"https://www.youtube.com/watch?v=D_2DBLAt57c\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "362e39e8-a254-4f2f-8653-5fbb7ff0e1e9",
"metadata": {},
"outputs": [],
"source": [
"extractor = YouTubeRecipeExtractor()\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0cc259bd-46bb-4472-b3cb-f39da54e324a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fetching video transcript...\n"
]
},
{
"data": {
"text/markdown": [
"Thank you for your interest, but the text you provided is not a recipe. If you're looking for cooking instructions, ingredient lists, or nutrition analysis, please provide a specific food or dish you would like to know about, and I'd be happy to help!"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"extractor.display_recipe(test_bad_url)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3f43e320-ca55-4db5-bc95-71fcb342cf3c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fetching video transcript for video D_2DBLAt57c...\n",
"Error fetching transcript: YouTubeTranscriptApi.fetch() missing 1 required positional argument: 'self'\n"
]
},
{
"data": {
"text/markdown": [
"It seems like you haven't provided a recipe or any details to summarize. If you have a specific recipe in mind, please share it, and I'll be happy to help!"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"extractor.display_recipe(test_good_url)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11c5c2b3-498a-43eb-9b68-d2b920c56b10",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
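The `Error fetching transcript: YouTubeTranscriptApi.fetch() missing 1 required positional argument: 'self'` output above is the classic symptom of calling an instance method on the class itself. A minimal sketch of the pitfall (class and method names here are illustrative, not the real extractor internals):

```python
# Illustrative only: a stand-in for an API class whose fetch() is an instance method.
class TranscriptApi:
    def fetch(self, video_id):
        # Instance method: needs a TranscriptApi object, not the class itself
        return f"transcript for {video_id}"

# TranscriptApi.fetch("D_2DBLAt57c")  # TypeError: fetch() missing 1 required positional argument: 'self'
result = TranscriptApi().fetch("D_2DBLAt57c")  # instantiate first, then call
```

In recent versions of youtube-transcript-api, `fetch` is likewise an instance method, so constructing the API object before calling it is the likely fix for the error shown.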

File diff suppressed because it is too large


@@ -0,0 +1,459 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
"metadata": {},
"source": [
"# Welcome to your first assignment!\n",
"\n",
"Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)"
]
},
{
"cell_type": "markdown",
"id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#f71;\">Just before we get to the assignment --</h2>\n",
" <span style=\"color:#f71;\">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>\n",
" <a href=\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>\n",
" Please keep this bookmarked, and I'll continue to add more useful links there over time.\n",
" </span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458",
"metadata": {},
"source": [
"# HOMEWORK EXERCISE ASSIGNMENT\n",
"\n",
"Upgrade the day 1 webpage summarizer project to use an open-source model running locally via Ollama rather than OpenAI\n",
"\n",
"You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n",
"\n",
"**Benefits:**\n",
"1. No API charges - open-source\n",
"2. Data doesn't leave your box\n",
"\n",
"**Disadvantages:**\n",
"1. Significantly less powerful than frontier models\n",
"\n",
"## Recap on installation of Ollama\n",
"\n",
"Simply visit [ollama.com](https://ollama.com) and install!\n",
"\n",
"Once complete, the ollama server should already be running locally. \n",
"If you visit: \n",
"[http://localhost:11434/](http://localhost:11434/)\n",
"\n",
"You should see the message `Ollama is running`. \n",
"\n",
"If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n",
"And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n",
"Then try [http://localhost:11434/](http://localhost:11434/) again.\n",
"\n",
"If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code below from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ddd15d-a3c5-4f4e-a678-873f56162724",
"metadata": {},
"outputs": [],
"source": [
"# Constants\n",
"\n",
"OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
"HEADERS = {\"Content-Type\": \"application/json\"}\n",
"MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dac0a679-599c-441f-9bf2-ddc73d35b940",
"metadata": {},
"outputs": [],
"source": [
"# Create a messages list using the same format that we used for OpenAI\n",
"\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bb9c624-14f0-4945-a719-8ddb64f66f47",
"metadata": {},
"outputs": [],
"source": [
"payload = {\n",
" \"model\": MODEL,\n",
" \"messages\": messages,\n",
" \"stream\": False\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "479ff514-e8bd-4985-a572-2ea28bb4fa40",
"metadata": {},
"outputs": [],
"source": [
"# Let's just make sure the model is loaded\n",
"\n",
"!ollama pull llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42b9f644-522d-4e05-a691-56e7658c0ea9",
"metadata": {},
"outputs": [],
"source": [
"# If this doesn't work for any reason, try the 2 versions in the following cells\n",
"# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab\n",
"# And if none of that works - contact me!\n",
"\n",
"response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n",
"print(response.json()['message']['content'])"
]
},
{
"cell_type": "markdown",
"id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe",
"metadata": {},
"source": [
"# Introducing the ollama package\n",
"\n",
"And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n",
"\n",
"Under the hood, it's making the same call as above to the ollama server running at localhost:11434"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7745b9c4-57dc-4867-9180-61fa5db55eb8",
"metadata": {},
"outputs": [],
"source": [
"import ollama\n",
"\n",
"response = ollama.chat(model=MODEL, messages=messages)\n",
"print(response['message']['content'])"
]
},
{
"cell_type": "markdown",
"id": "a4704e10-f5fb-4c15-a935-f046c06fb13d",
"metadata": {},
"source": [
"## Alternative approach - using OpenAI python library to connect to Ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23057e00-b6fc-4678-93a9-6b31cb704bff",
"metadata": {},
"outputs": [],
"source": [
"# There's actually an alternative approach that some people might prefer\n",
"# You can use the OpenAI client python library to call Ollama:\n",
"\n",
"from openai import OpenAI\n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"\n",
"response = ollama_via_openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=messages\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "9f9e22da-b891-41f6-9ac9-bd0c0a5f4f44",
"metadata": {},
"source": [
"## Are you confused about why that works?\n",
"\n",
"It seems strange, right? We just used OpenAI code to call Ollama?? What's going on?!\n",
"\n",
"Here's the scoop:\n",
"\n",
"The python class `OpenAI` is simply code written by OpenAI engineers that makes calls over the internet to an endpoint. \n",
"\n",
"When you call `openai.chat.completions.create()`, this python code just makes a web request to the following url: \"https://api.openai.com/v1/chat/completions\"\n",
"\n",
"Code like this is known as a \"client library\" - it's just wrapper code that runs on your machine to make web requests. The actual power of GPT is running on OpenAI's cloud behind this API, not on your computer!\n",
"\n",
"OpenAI's API was so popular that lots of other AI providers offered identical web endpoints, so you can use the same approach.\n",
"\n",
"So Ollama has an endpoint running on your local box at http://localhost:11434/v1/chat/completions \n",
"And in week 2 we'll discover that lots of other providers do this too, including Gemini and DeepSeek.\n",
"\n",
"And then the team at OpenAI had a great idea: they could extend their client library so you can specify a different 'base_url' and use their library to call any compatible API.\n",
"\n",
"That's it!\n",
"\n",
"So when you say: `ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')` \n",
"Then this will make the same endpoint calls, but to Ollama instead of OpenAI."
]
},
{
"cell_type": "markdown",
"id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90",
"metadata": {},
"source": [
"## Also trying the amazing reasoning model DeepSeek\n",
"\n",
"Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B. \n",
"This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.\n",
"\n",
"Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1) all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d",
"metadata": {},
"outputs": [],
"source": [
"!ollama pull deepseek-r1:1.5b"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d3d554b-e00d-4c08-9300-45e073950a76",
"metadata": {},
"outputs": [],
"source": [
"# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside <think> tags, followed by some decent definitions\n",
"\n",
"response = ollama_via_openai.chat.completions.create(\n",
" model=\"deepseek-r1:1.5b\",\n",
" messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",
"metadata": {},
"source": [
"# NOW the exercise for you\n",
"\n",
"Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6de38216-6d1c-48c4-877b-86d403f4e0f8",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bd2aea1-d7d7-499f-b704-5b13e2ddd23f",
"metadata": {},
"outputs": [],
"source": [
"MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6df3141a-0a46-4ff9-ae73-bf8bee2aa3d8",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"\n",
"class Website:\n",
" \"\"\"\n",
" A utility class to represent a Website that we have scraped\n",
" \"\"\"\n",
" url: str\n",
" title: str\n",
" text: str\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" self.url = url\n",
" response = requests.get(url)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df2ea48b-7343-47be-bdcb-52b63a4de43e",
"metadata": {},
"outputs": [],
"source": [
"# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'\n",
"\n",
"system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
"and provides a short summary, ignoring text that might be navigation related. \\\n",
"Respond in markdown.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80f1a534-ae2a-4283-83cf-5e7c5765c736",
"metadata": {},
"outputs": [],
"source": [
"# A function that writes a User Prompt that asks for summaries of websites:\n",
"\n",
"def user_prompt_for(website):\n",
"    user_prompt = f\"You are looking at a website titled {website.title}\\n\"\n",
"    user_prompt += \"The contents of this website are as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5dfe658d-e3f9-4b32-90e6-1a523f47f836",
"metadata": {},
"outputs": [],
"source": [
"# See how this function creates exactly the format above\n",
"\n",
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e2a09d0-bc47-490e-b085-fe3ccfbd16ad",
"metadata": {},
"outputs": [],
"source": [
"# And now: call the Ollama function instead of OpenAI\n",
"\n",
"def summarize(url):\n",
" website = Website(url)\n",
" messages = messages_for(website)\n",
" response = ollama.chat(model=MODEL, messages=messages)\n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "340e08a2-86f0-4cdd-9188-da2972cae7a6",
"metadata": {},
"outputs": [],
"source": [
"# A function to display this nicely in the Jupyter output, using markdown\n",
"\n",
"def display_summary(url):\n",
" summary = summarize(url)\n",
" display(Markdown(summary))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55e4790a-013c-40cf-9dff-bb5ec1d53964",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://zhufqiu.com\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a96cbad-1306-4ce1-a942-2448f50d6751",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "0",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv()\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
"    print(\"An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {},
"outputs": [],
"source": [
"# Let's just make sure the model is loaded\n",
"!ollama pull llama3.2\n",
"import ollama\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# System prompt - defines the AI's behavior\n",
"SYSTEM_PROMPT = \"\"\"You are a helpful cooking assistant that provides ingredient lists for recipes.\n",
"Format your response as clean markdown with this structure:\n",
"\n",
"# [Dish Name]\n",
"**Serves:** [number] people \n",
"**Cook Time:** [estimated time]\n",
"\n",
"## Shopping List\n",
"- [ ] [amount] [unit] [ingredient]\n",
"- [ ] [amount] [unit] [ingredient]\n",
"\n",
"Guidelines:\n",
"- Use common grocery store measurements (cups, lbs, oz, pieces, cans, etc.)\n",
"- Round to practical shopping amounts (1.5 lbs instead of 1.47 lbs)\n",
"- Group similar items when logical (all spices together)\n",
"- Include pantry staples only if they're essential (salt, oil, etc.)\n",
"- Assume basic seasonings are available unless recipe-specific\n",
"- For produce, specify size when important (large onion, medium tomatoes)\n",
"- Keep optional items at the end of similar item groups or end of the list\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"def get_recipe_openai(dish_name: str, num_people: int):\n",
" \"\"\"Get scaled recipe ingredients using system and user prompts\"\"\"\n",
"\n",
" user_prompt = f\"Give me the ingredients needed to make {dish_name} for {num_people} people.\"\n",
" \n",
" try:\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ],\n",
" max_tokens=400\n",
" )\n",
" \n",
" return response.choices[0].message.content\n",
" \n",
" except Exception as e:\n",
" return f\"❌ Error: Failed to get recipe - {str(e)}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"metadata": {},
"outputs": [],
"source": [
"OLLAMA_MODEL = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7",
"metadata": {},
"outputs": [],
"source": [
"def get_recipe_ollama(dish_name: str, num_people: int):\n",
" \"\"\"Get recipe using Ollama API\"\"\"\n",
" user_prompt = f\"Give me the ingredients needed to make {dish_name} for {num_people} people.\"\n",
" \n",
" messages = [\n",
" {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
" \n",
" try:\n",
" response = ollama.chat(model=OLLAMA_MODEL, messages=messages)\n",
" return response['message']['content']\n",
" except Exception as e:\n",
" return f\"❌ Ollama Error: {str(e)}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"def print_shopping_list(recipe_markdown):\n",
" \"\"\"Print the markdown response\"\"\"\n",
" display(Markdown(recipe_markdown))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"print(\"🍳 Recipe Scaler & Grocery List Maker\")\n",
"print(\"=\" * 40)\n",
" \n",
"ai_service_choice = input(\"\\nChoose AI service (1 for OpenAI, 2 for Ollama): \").strip()\n",
"\n",
"dish = input(\"What dish do you want to make? \")\n",
"num_people = int(input(\"How many people? \"))\n",
" \n",
"print(f\"\\n🔍 Getting recipe for {dish}...\")\n",
" \n",
"# Get and display recipe\n",
"if ai_service_choice == '1':\n",
" print(\"Using OpenAI API...\")\n",
" recipe_markdown = get_recipe_openai(dish, num_people)\n",
"else:\n",
" print(\"Using Ollama (local)...\")\n",
" recipe_markdown = get_recipe_ollama(dish, num_people)\n",
"\n",
"print_shopping_list(recipe_markdown)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
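One fragile spot in the recipe-scaler notebook above: `num_people = int(input("How many people? "))` raises a `ValueError` the moment someone types "four". A small hedged sketch of a safer parse (the function name and default value are assumptions, not part of the original notebook):

```python
def parse_servings(raw: str, default: int = 2) -> int:
    """Parse the 'How many people?' answer, falling back to a default on bad input."""
    try:
        n = int(raw.strip())
    except ValueError:
        return default
    # Reject zero or negative party sizes too
    return n if n > 0 else default
```

You would then call `num_people = parse_servings(input("How many people? "))` instead of wrapping `input()` in a bare `int()`.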


@@ -0,0 +1,191 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "75e66023-eccf-46a9-8b70-7b21ede16ddd",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72d21373-edbd-4432-a29d-db8e6c9c5808",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e4c15b-7ae8-43e9-839d-7cc49345be5a",
"metadata": {},
"outputs": [],
"source": [
"!ollama pull llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7fb44166-1c65-42fc-9950-1960bc3cc432",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58f5f1e1-5296-4631-9698-8645d4621a0c",
"metadata": {},
"outputs": [],
"source": [
"# set up environment\n",
"\n",
"# Get the openai key\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if openai_api_key and openai_api_key.startswith('sk-proj-') and len(openai_api_key)>10:\n",
" print(\"API key looks good so far\")\n",
"else:\n",
" print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n",
"\n",
"openai = OpenAI()\n",
"# Get the ollama key using the llama model\n",
"\n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12f07b33-76b9-42fa-9962-21f2a5796126",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"You are a knowledgeable technical instructor who helps students understand \\\n",
"complex concepts across a wide range of technical topics. Your expertise includes artificial \\\n",
"intelligence, machine learning, large language models (LLMs), and programming in languages \\\n",
"such as Python, JavaScript, Java, and more. You also provide in-depth support for \\\n",
"AI engineering questions and other advanced technical subjects.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "330abeb7-7db2-4f23-9d19-dd698058a400",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd11ad48-91ec-4cdf-9c57-99a0451e7a2f",
"metadata": {},
"outputs": [],
"source": [
"# Get gpt-4o-mini to answer, with streaming\n",
"stream_GPT = openai.chat.completions.create(\n",
" model=MODEL_GPT,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": question}\n",
" ],\n",
" stream = True\n",
" )\n",
"response_GPT = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream_GPT:\n",
" response_GPT += chunk.choices[0].delta.content or ''\n",
" response_GPT = response_GPT.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response_GPT), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd2527ae-0d75-4f15-a45f-92075e3059d6",
"metadata": {},
"outputs": [],
"source": [
"# Get Llama 3.2 to answer\n",
"\n",
"response_llama = ollama_via_openai.chat.completions.create(\n",
" model=MODEL_LLAMA,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": question}\n",
" ],\n",
" )\n",
"result = response_llama.choices[0].message.content\n",
"\n",
"display(Markdown(result))\n",
"\n",
"# import ollama\n",
"\n",
"# response = ollama.chat(model=MODEL_LLAMA, messages=[\n",
"# {\"role\": \"system\", \"content\": system_prompt},\n",
"# {\"role\": \"user\", \"content\": question}\n",
"# ])\n",
"# print(response['message']['content'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2747739-ba64-4067-902f-c1acc0dbdaca",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
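The streaming cell above rebuilds the Markdown display by accumulating chunks and stripping fence markers. Isolated as a function (same logic as the notebook's inline `.replace()` calls, the function name is mine), with one caveat worth knowing: `.replace("markdown", "")` deletes the word "markdown" wherever it appears in the answer, not just in the ` ```markdown ` fence header:

```python
def clean_stream_markdown(text: str) -> str:
    """Strip code-fence markers from the accumulated stream text, as the notebook
    does, so Jupyter's Markdown renderer isn't confused by a half-open fence."""
    return text.replace("```", "").replace("markdown", "")
```

A more surgical approach would only remove a leading fence header line, but the simple version matches what the notebook runs.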


@@ -0,0 +1,366 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "53b9681c-896a-4e5d-b62c-44c90612e67c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import json\n",
"from typing import List\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c6f1133-5c17-4ca7-819c-f64cc48212ec",
"metadata": {},
"outputs": [],
"source": [
"# Initialize constants and get api_key\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"#Check if api_key is correct\n",
"if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:\n",
" print(\"API key looks good so far\")\n",
"else:\n",
" print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n",
" \n",
"MODEL = 'gpt-4o-mini'\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4cdb0a59-b5e1-4df5-a17e-8c36c80695b4",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"\n",
"# Some websites need you to use proper headers when fetching them:\n",
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class Website:\n",
" \"\"\"\n",
" A utility class to represent a Website that we have scraped, now with links\n",
" \"\"\"\n",
"\n",
" def __init__(self, url):\n",
" self.url = url\n",
" response = requests.get(url, headers=headers)\n",
" self.body = response.content\n",
" soup = BeautifulSoup(self.body, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" if soup.body:\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
" else:\n",
" self.text = \"\"\n",
" links = [link.get('href') for link in soup.find_all('a')]\n",
" self.links = [link for link in links if link]\n",
"\n",
" def get_contents(self):\n",
" return f\"Webpage Title:\\n{self.title}\\nWebpage Contents:\\n{self.text}\\n\\n\""
]
},
{
"cell_type": "markdown",
"id": "50d4cffe-da7a-4cab-afea-d061a1a608ac",
"metadata": {},
"source": [
"Step 1: Find relevant links to the website in order to create the brochure (Use Multi-shot prompting)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b43b4c64-bc6a-41ca-bdb9-aa714e4e794e",
"metadata": {},
"outputs": [],
"source": [
"link_system_prompt = \"You are provided with a list of links found on a webpage like ['https://edwarddonner.com/', 'https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/?referralCode=35EB41EBB11DD247CF54&couponCode=KEEPLEARNING'] or ['https://huggingface.co/', 'https://huggingface.co/models'] \\\n",
"You are able to decide which of the links would be most relevant to include in a brochure about the company, \\\n",
"such as links to an About page, or a News page, or a Home page, or a Company page, or Careers/Jobs pages.\\n\"\n",
"link_system_prompt += \"You should respond in JSON as in these examples:\"\n",
"link_system_prompt += \"\"\"\n",
"{\n",
" \"links\": [\n",
" {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n",
" {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n",
" ]\n",
"}\n",
"\n",
"{\n",
" \"links\": [\n",
"    {\"type\": \"home page\", \"url\": \"https://full.url/goes/here\"},\n",
"    {\"type\": \"news page\", \"url\": \"https://another.full.url/news\"}\n",
" ]\n",
"}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15d2870c-67ab-4aa2-89f5-04b608a9c810",
"metadata": {},
"outputs": [],
"source": [
"def get_links_user_prompt(website):\n",
" user_prompt = f\"Here is the list of links on the website of {website.url} - \"\n",
" user_prompt += \"please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \\\n",
"Do not include Terms of Service, Privacy, email links.\\n\"\n",
" user_prompt += \"Links (some might be relative links):\\n\"\n",
" user_prompt += \"\\n\".join(website.links)\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e255be42-5e71-47ca-9275-c0cf22beeb00",
"metadata": {},
"outputs": [],
"source": [
"def get_links(url):\n",
" website = Website(url)\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": link_system_prompt},\n",
" {\"role\": \"user\", \"content\": get_links_user_prompt(website)}\n",
" ],\n",
" response_format={\"type\": \"json_object\"}\n",
" )\n",
" result = response.choices[0].message.content\n",
" return json.loads(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "818b6e50-c403-42a1-8ee4-7606eaf0006f",
"metadata": {},
"outputs": [],
"source": [
"get_links('https://huggingface.co/')"
]
},
{
"cell_type": "markdown",
"id": "030ceb9b-ef71-41fd-9f23-92cb6e1d137e",
"metadata": {},
"source": [
"Step 2: Generate the brochure using the relevant links we got from OpenAI's selection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a703230e-d57b-43a5-bdd0-e25fc2ec2e3b",
"metadata": {},
"outputs": [],
"source": [
"def get_all_details(url):\n",
" result = \"Landing page:\\n\"\n",
" result += Website(url).get_contents()\n",
" links = get_links(url)\n",
" print(\"Found links:\", links)\n",
" for link in links[\"links\"]:\n",
" result += f\"\\n\\n{link['type']}\\n\"\n",
" result += Website(link[\"url\"]).get_contents()\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74d19852-f817-4fee-a95c-35ca7a83234f",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"\"\"You are an assistant that analyzes the contents of several relevant pages from a company website \\\n",
"and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.\\\n",
"Include details of company culture, customers and careers/jobs if you have the information. \\\n",
"Example 1: \\\n",
"Relevant pages: \\\n",
"- https://example.com/about \\\n",
"- https://example.com/careers \\\n",
"- https://example.com/news \\\n",
"\n",
"Brochure: \\\n",
"# About ExampleCorp \\\n",
"ExampleCorp is a global leader in AI-driven logistics optimization. Founded in 2015, the company serves clients in over 30 countries... \\\n",
"\n",
"--- \\\n",
"\n",
"Example 2: \\\n",
"Relevant pages: \\\n",
"- https://techstart.io/home \\\n",
"- https://techstart.io/jobs \\\n",
"- https://techstart.io/customers \\\n",
"\n",
"Brochure: \\\n",
"# Welcome to TechStart \\\n",
"TechStart builds tools that power the future of software development. With a team-first culture and customers like Stripe, Atlassian... \\\n",
"\n",
"--- \\\n",
"\n",
"\"\"\"\n",
"\n",
"# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':\n",
"\n",
"# system_prompt = \"You are an assistant that analyzes the contents of several relevant pages from a company website \\\n",
"# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown.\\\n",
"# Include details of company culture, customers and careers/jobs if you have the information.\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2f19085-0d03-4386-b390-a38014ca6590",
"metadata": {},
"outputs": [],
"source": [
"def get_brochure_user_prompt(company_name, url):\n",
" user_prompt = f\"You are looking at a company called: {company_name}\\n\"\n",
" user_prompt += f\"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\\n\"\n",
" user_prompt += get_all_details(url)\n",
" user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ddbdea7-cf80-48d4-8bce-a11bd1a32d47",
"metadata": {},
"outputs": [],
"source": [
"def create_brochure(company_name, url):\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n",
" ],\n",
" )\n",
" result = response.choices[0].message.content\n",
" # display(Markdown(result))\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "023c1ba0-7f5a-48ac-9a98-dd184432a758",
"metadata": {},
"outputs": [],
"source": [
"create_brochure(\"HuggingFace\", \"https://huggingface.co\")"
]
},
{
"cell_type": "markdown",
"id": "187651f6-d42d-405a-abed-732486161359",
"metadata": {},
"source": [
"Step 3: Translate to French"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7734915d-d38f-40ad-8335-0df39c91f6d8",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"\"\"You are a translator that translates the English language to the French language \\\n",
"professionally. All you do, is first show the original version in english and then show the translate version below it in French.\\\n",
"Respond in Markdown\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29a1b40c-9040-4a3d-808b-0ca906d5cfc8",
"metadata": {},
"outputs": [],
"source": [
"def get_user_translation_prompt(company_name, url):\n",
" user_prompt=\"You are to translate the following brochure from the english to the french \\\n",
" language and going to display it with the English language brochure version first and then\\\n",
" the French language brochure version, don't make any changes to it, just a translation, the \\\n",
" following is the brochure:\"\n",
" user_prompt+=create_brochure(company_name, url)\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e45b1f-3fa6-4db8-9f73-8339265502a7",
"metadata": {},
"outputs": [],
"source": [
"def translate_brochure(company_name, url):\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": get_user_translation_prompt(company_name, url)}\n",
" ],\n",
" )\n",
" result = response.choices[0].message.content\n",
" display(Markdown(result))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f71c2496-76ea-4f25-9939-98ebd37cb6a6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"translate_brochure(\"HuggingFace\", \"https://huggingface.co\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,84 @@
"""
Project: Web Content Summarizer using Ollama's llama3.2 model
- Developed a Python tool to extract and summarize website content using Ollama's llama3.2 model and BeautifulSoup.
- Implemented HTTP requests with custom headers to mimic browser behavior.
"""
import os
import requests
from bs4 import BeautifulSoup
import ollama
# Constants
OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"
# Define the Website class to fetch and parse website content
class Website:
def __init__(self, url):
"""
Initialize a Website object by fetching and parsing the given URL.
Uses BeautifulSoup to extract the title and text content of the page.
"""
self.url = url
response = requests.get(url, headers=HEADERS)
soup = BeautifulSoup(response.content, 'html.parser')
# Extract the title of the website
self.title = soup.title.string if soup.title else "No title found"
# Remove irrelevant elements like scripts, styles, images, and inputs
for irrelevant in soup.body(["script", "style", "img", "input"]):
irrelevant.decompose()
# Extract the main text content of the website
self.text = soup.body.get_text(separator="\n", strip=True)
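
# For illustration only (hypothetical, not part of this tool - the tool uses
# BeautifulSoup above): the same "keep the visible text" idea can be sketched
# with just the standard library's HTMLParser.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text that sits outside <script>/<style> tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0  # depth inside tags whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

# Example: only the paragraph text survives.
_parser = TextExtractor()
_parser.feed("<body><script>var x = 1;</script><p>Hello</p></body>")
print(_parser.parts)  # ['Hello']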
# Define the system prompt for the llama3.2 model
system_prompt = (
"You are an assistant that analyzes the contents of a website "
"and provides a short summary, ignoring text that might be navigation related. "
"Respond in markdown."
)
# Function to generate the user prompt based on the website content
def user_prompt_for(website):
"""
Generate a user prompt for the llama3.2 model based on the website's title and content.
"""
user_prompt = f"You are looking at a website titled {website.title}"
user_prompt += "\nThe contents of this website is as follows; summarize these.\n\n"
user_prompt += website.text
return user_prompt
# Function to create the messages list for Ollama
def messages_for(website):
"""
    Create the messages list for Ollama, including the system and user prompts.
"""
return [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt_for(website)}
]
# Function to summarize the content of a given URL
def summarize(url):
"""
    Summarize the content of the given URL using Ollama and print the result.
"""
# Create a Website object to fetch and parse the URL
website = Website(url)
    # Call llama3.2 via Ollama with the generated messages
response = ollama.chat(
model= MODEL,
messages=messages_for(website)
)
    # Print the summary generated by Ollama
print(response.message.content)
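
# A possible refinement (hypothetical, not part of this tool): very long pages
# can exceed the model's context window, so the page text could be capped
# before building the prompt - the repo's brochure notebook uses a similar
# 5,000-character truncation.
def truncate_text(text, max_chars=5_000):
    """Return text cut to at most max_chars characters."""
    return text[:max_chars]

print(len(truncate_text("x" * 12_000)))  # 5000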
# Example usage: Summarize the content of a specific URL
summarize("https://sruthianem.com")

View File

@@ -0,0 +1,454 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a92df66b-68c9-4288-b881-45d1fd948c18",
"metadata": {},
"source": [
"### Week 1 Contribution: Selenium-enhanced Website Summarizer\n",
"This notebook attempts to summarize content from any website using a BeautifulSoup-first strategy with a Selenium fallback for JavaScript-heavy pages. Llama 3.2 is used to generate a markdown-formatted summary.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "407ea4b4-7c1b-4f94-a48d-f3ee3273bc61",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown,display\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "040e97a8-9a5f-4903-9d0e-fa19bb719b4f",
"metadata": {},
"outputs": [],
"source": [
"MODEL=\"llama3.2\"\n",
"openai=OpenAI(base_url=\"http://localhost:11434/v1\",api_key=\"ollama\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cac3c9ae-31ce-45b1-bbc1-70577a198e84",
"metadata": {},
"outputs": [],
"source": [
"message=\"Hi, write a snarky poem for me.\" \n",
"response=openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[{\n",
" \"role\":\"user\",\n",
" \"content\":message\n",
" }]\n",
")\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "a27514f6-d7a5-4292-b98b-dc166416a2fc",
"metadata": {},
"source": [
"### Beautiful Soup Version"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "678901b6-5da1-4df7-8b73-a1c69dc758b0",
"metadata": {},
"outputs": [],
"source": [
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"} # to make sure we're not blocked as bots from websites\n",
"\n",
"class bsWebsite:\n",
" \"\"\"\n",
" Attributes:\n",
" url (str): The URL of the page\n",
" title (str): The title of the page\n",
" text (str): The readable text from the page\n",
" \"\"\"\n",
"\n",
" def __init__(self,url):\n",
" self.url=url\n",
" response=requests.get(url,headers=headers) # gets the content of the page in response variable\n",
"\n",
" soup=BeautifulSoup(response.content,'html.parser') # content of response is accessed using html parser for structure\n",
" self.title=soup.title.string if soup.title else \"No title\"\n",
"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
"\n",
" self.text=soup.body.get_text(separator='\\n',strip=True)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1a5ddd-7907-46fd-a1b7-ceeb876262f7",
"metadata": {},
"outputs": [],
"source": [
"ed = bsWebsite(\"https://edwarddonner.com\")\n",
"\n",
"print(ed.url)\n",
"print(ed.text)\n",
"print(ed.title)"
]
},
{
"cell_type": "markdown",
"id": "b7e965e4-7d20-4980-8cb2-871b8ca63c45",
"metadata": {},
"source": [
"#### Now, let's create a detailed summary for how selenium works using what we just made"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b71a05c6-669b-4632-aeb9-b51daa4429a1",
"metadata": {},
"outputs": [],
"source": [
"sel=bsWebsite(\"https://www.geeksforgeeks.org/software-engineering/selenium-webdriver-tutorial/\")\n",
"print(sel.url)\n",
"print(sel.title)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c463c67-2a9c-4fcd-99aa-cab0e2cdf936",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(web):\n",
" user_prompt=f\"\"\"You are looking at a website called {web.title}. \n",
" Provide a detailed summary of the given content and the concepts in markdown:\\n[{web.text}]\"\"\"\n",
"\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2118ac4-3355-4f90-b799-ba375ceeafc1",
"metadata": {},
"outputs": [],
"source": [
"system_prompt=\"\"\"You are an assistant that analyses the contents of a website based on request of user, \n",
"while ignoring text that is navigation related. Respond in markdown.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "716b3772-3c73-4010-b089-8bc374cab9de",
"metadata": {},
"outputs": [],
"source": [
"print(user_prompt_for(ed))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b23b39b4-78a3-4694-8c89-f2ce56b628f2",
"metadata": {},
"outputs": [],
"source": [
"user_prompt=user_prompt_for(sel)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce29c83c-7b47-43a8-8f92-c2a1aa36f8f5",
"metadata": {},
"outputs": [],
"source": [
"messages=[\n",
" { \"role\":\"system\", \"content\":system_prompt},\n",
" { \"role\":\"user\", \"content\":user_prompt}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f120702-029e-4c1a-8ffb-2c4944110aa8",
"metadata": {},
"outputs": [],
"source": [
"response=openai.chat.completions.create(model=MODEL,messages=messages)\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "e9326415-6d35-4750-b9b1-1ae83a86d6f7",
"metadata": {},
"source": [
"### Selenium Version"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba86d4cc-cf4c-4f75-aa57-4126b15463b7",
"metadata": {},
"outputs": [],
"source": [
"# making sure we're in the virtual environment\n",
"import sys\n",
"print(sys.executable)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ba86dfa-1e91-4535-9c93-3838c46aee52",
"metadata": {},
"outputs": [],
"source": [
"# !pip install selenium"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01771002-b10f-4681-8710-0f1515866c92",
"metadata": {},
"outputs": [],
"source": [
"# !pip install webdriver-manager"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c19b582d-a355-4c20-8028-42a802e7dca5",
"metadata": {},
"outputs": [],
"source": [
"from selenium import webdriver\n",
"from selenium.webdriver.edge.service import Service\n",
"# for edge only:\n",
"from webdriver_manager.microsoft import EdgeChromiumDriverManager"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "978ab0b9-b42b-4136-8383-79b3f84e084b",
"metadata": {},
"outputs": [],
"source": [
"# works for edge only. Do not close the window that pops up as t will be used to open sites given.\n",
"driver=webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7dfdeb48-562e-44d3-9044-157d616835fd",
"metadata": {},
"outputs": [],
"source": [
"# creating a similar class as bsWebsie but using selenium\n",
"class SelWebsite:\n",
"\n",
" def __init__(self,url,driver):\n",
" self.driver=driver\n",
" self.driver.get(url)\n",
" \n",
" self.url=self.driver.current_url\n",
" self.title=self.driver.title\n",
" self.text=self.driver.find_element(By.TAG_NAME,\"body\").text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6174105d-c123-4032-afa8-75588c0f1133",
"metadata": {},
"outputs": [],
"source": [
"# testing it on OpenAI website\n",
"gpt=SelWebsite(\"https://openai.com\",driver)\n",
"print(gpt.url)\n",
"print(gpt.driver)\n",
"print(gpt.title)\n",
"print(gpt.text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bde84abf-09dd-4a56-b6a7-4e5a34c1098e",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "b7208f3f-6245-48a4-a5ae-d0b59550ee28",
"metadata": {},
"source": [
"##### Troubleshooting in case of errors:\n",
"1. Make sure the window popped up wasn't closed.\n",
"2. If the below cell results in any text except an error - driver ID is valid. In this case, quit and restart the driver again.\n",
"3. If driver ID is invalid, activate driver again using below cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30afa4d1-1ce6-4bad-820e-b72cf3eef959",
"metadata": {},
"outputs": [],
"source": [
"# use the following code to check for valid session ID for driver if error occurs:\n",
"print(driver.session_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "154ace93-47b2-40ea-9d49-c6c598a67144",
"metadata": {},
"outputs": [],
"source": [
"# if above is valid but still results in trouble, run both; otherwise run only the second part:\n",
"# driver.quit()\n",
"# driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07e74ec5-fda6-462f-b929-7d173b0bdb31",
"metadata": {},
"outputs": [],
"source": [
"print(user_prompt_for(gpt))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5d0fd2e-949a-4358-b963-1395157618d2",
"metadata": {},
"outputs": [],
"source": [
"messages2=[\n",
" {\"role\":\"system\",\"content\":system_prompt},\n",
" {\"role\":\"user\",\"content\":user_prompt_for(gpt)}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db457f5c-e1be-4087-932d-25ba4880b3ac",
"metadata": {},
"outputs": [],
"source": [
"response=openai.chat.completions.create(model=MODEL,messages=messages2)\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "d448018f-f363-4af9-8ae3-88cc4408da91",
"metadata": {},
"source": [
"### Now let's build a summarize function which can be called directly to summarize any site."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "690ca16b-4b9c-4ddc-b21e-1e69b1d3135a",
"metadata": {},
"outputs": [],
"source": [
"def summarize(site_url):\n",
" \"\"\"\n",
" Summarizes the visible content of a website.\n",
" - Tries BeautifulSoup parsing first (bsWebsite)\n",
" - Falls back to Selenium parsing (SelWebsite) if BS4 fails\n",
" - Uses llama3.2 to generate a summary in Markdown\n",
" \"\"\"\n",
" try:\n",
" site=bsWebsite(site_url)\n",
" except Exception as e:\n",
" print(f\"BS4 failed: {e}\\nTrying Selenium...\\n\")\n",
" site=SelWebsite(site_url,driver)\n",
"\n",
" messages3=[\n",
" {\"role\":\"system\",\"content\":system_prompt},\n",
" {\"role\":\"user\",\"content\":user_prompt_for(site)}\n",
" ]\n",
"\n",
" print(f\"\\nSummarizing: {site.title}\\nURL: {site.url}\\n\")\n",
"\n",
" response=openai.chat.completions.create(model=MODEL,messages=messages3)\n",
"\n",
" print(response.choices[0].message.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2744296c-ebbd-4696-8517-d14234af9a65",
"metadata": {},
"outputs": [],
"source": [
"summarize(\"https://www.udemy.com\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d0d2379-c8b3-4900-8671-179303c00929",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"id": "0",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
@@ -13,22 +13,30 @@
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"execution_count": null,
"id": "1",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"from IPython.display import Markdown, display, update_display\n",
"from dotenv import load_dotenv\n",
"import os\n",
"import openai\n",
"from openai import OpenAI\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"execution_count": null,
"id": "2",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# constants\n",
@@ -37,6 +45,9 @@
" 'MODEL_LLAMA': 'llama3.2'\n",
"}\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"\n",
"# To use ollama using openai API (ensure that ollama is running on localhost)\n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"\n",
@@ -57,9 +68,15 @@
},
{
"cell_type": "code",
"execution_count": 12,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"execution_count": null,
"id": "3",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# set up environment\n",
@@ -89,8 +106,8 @@
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"execution_count": null,
"id": "4",
"metadata": {},
"outputs": [],
"source": [
@@ -105,67 +122,9 @@
{
"cell_type": "code",
"execution_count": null,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"id": "5",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"**Understanding the Code Snippet**\n",
"\n",
"This Python code snippet uses a combination of built-in functions, dictionary iteration, and generator expressions to extract and yield author names from a list of `Book` objects.\n",
"\n",
"Here's a breakdown:\n",
"\n",
"1. **Dictionary Iteration**: The expression `for book in books if book.get(\"author\")`\n",
" - Iterates over each element (`book`) in the container `books`.\n",
" - Filters out elements whose `'author'` key does not have a value (i.e., `None`, `False`, or an empty string). This leaves only dictionaries with author information.\n",
"\n",
"2. **Dictionary Access**: The expression `{book.get(\"author\") for book in books if book.get(\"author\")}`\n",
" - Uses dictionary membership testing to access only the values associated with the `'author'` key.\n",
" - If the value is not found or is considered false, it's skipped in this particular case.\n",
"\n",
"3. **Generator Expression**: This generates an iterator that iterates over the filtered author names.\n",
" - Yields each author name (i.e., a single `'name'` from the book dictionary) on demand.\n",
" - Since these are generator expressions, they use memory less than equivalent Python lists and also create results on-demand.\n",
"\n",
"4. **`yield from`**: This statement takes the generator expression as an argument and uses it to generate a nested iterator structure.\n",
" - It essentially \"decompresses\" the single level of nested iterator created by `list(iter(x))`, allowing for simpler use cases and potentially significant efficiency improvements for more complex structures where every value must be iterated, while in the latter case just the first item per iterable in the outer expression's sequence needs to actually be yielded into result stream.\n",
" - By \"yielding\" a nested iterator (the generator expression), we can simplify code by avoiding repetitive structure like `for book, book_author in zip(iterating over), ...` or list creation.\n",
"\n",
"**Example Use Case**\n",
"\n",
"In this hypothetical example:\n",
"\n",
"# Example Book objects\n",
"class Book:\n",
" def __init__(self, author, title):\n",
" self.author = author # str\n",
" self.title = title\n",
"\n",
"books = [\n",
" {\"author\": \"John Doe\", \"title\": f\"Book 1 by John Doe\"},\n",
" {\"author\": None, \"title\": f\"Book 2 without Author\"},\n",
" {\"author\": \"Jane Smith\", \"title\": f\"Book 3 by Jane Smith\"}\n",
"]\n",
"\n",
"# The given expression to extract and yield author names\n",
"for author in yield from {book.get(\"author\") for book in books if book.get(\"author\")}:\n",
"\n",
" print(author) \n",
"\n",
"In this code snippet, printing the extracted authors would output `John Doe`, `Jane Smith` (since only dictionaries with author information pass the filtering test).\n",
"\n",
"Please modify it like as you wish and use `yield from` along with dictionary iteration, list comprehension or generator expression if needed, and explain what purpose your version has."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"outputs": [],
"source": [
"# Get the model of your choice (choices appeared below) to answer, with streaming \n",
"\n",
@@ -174,13 +133,21 @@
" 'MODEL_LLAMA': 'llama3.2'\n",
"}\"\"\"\n",
"\n",
"stream_brochure(question,'MODEL_LLAMA')"
"stream_brochure(question,'MODEL_GPT')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llms",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -194,7 +161,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
"version": "3.11.13"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,202 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [],
"source": [
"# constants\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d7923c-5f28-4c30-8556-342d7c8497c1",
"metadata": {},
"outputs": [],
"source": [
"# set up environment\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n",
"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"question = \"\"\"\n",
"Please explain what this code does and why:\n",
"yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f879b7e-5ecc-4ec6-b269-78b6e2ed3480",
"metadata": {},
"outputs": [],
"source": [
"# prompts\n",
"\n",
"system_prompt = \"You are a helpful tutor who answers technical questions about programming code(especially python code), software engineering, data science and LLMs\"\n",
"user_prompt = \"Please give a detailed explanation to the following question: \" + question"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ac74ae5-af61-4a5d-b991-554fa67cd3d1",
"metadata": {},
"outputs": [],
"source": [
"messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [],
"source": [
"# Get gpt-4o-mini to answer, with streaming\n",
"stream = openai.chat.completions.create(\n",
" model=MODEL_GPT,\n",
" messages=messages,\n",
" stream=True\n",
" )\n",
" \n",
"response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [],
"source": [
"# Get Llama 3.2 to answer\n",
"\n",
"OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
"HEADERS = {\"Content-Type\": \"application/json\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4bd10d96-ee72-4c86-acd8-4fa417c25960",
"metadata": {},
"outputs": [],
"source": [
"!ollama pull llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d889d514-0478-4d7f-aabf-9a7bc743adb1",
"metadata": {},
"outputs": [],
"source": [
"stream = ollama.chat(model=MODEL_LLAMA, messages=messages, stream=True)\n",
"\n",
"response = \"\"\n",
"display_handle = display(Markdown(\"\"), display_id=True)\n",
"for chunk in stream:\n",
" response += chunk.get(\"message\", {}).get(\"content\", \"\")\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "452d442a-f3b0-42ad-89d2-a8dc664e8bb6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,314 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5",
"metadata": {},
"source": [
"# End of week 1 exercise\n",
"\n",
"To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n",
"and responds with an explanation. This is a tool that you will be able to use yourself during the course!"
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "c1070317-3ed9-4659-abe3-828943230e03",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display"
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "4a456906-915a-4bfd-bb9d-57e505c5093f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API key found.\n"
]
}
],
"source": [
"# constants\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# check api key\n",
"if not api_key:\n",
" print(\"No API key was found!\")\n",
"else:\n",
" print(\"API key found.\")\n",
" \n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"openai = OpenAI()\n",
"\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"MODEL_LLAMA = 'llama3.2'"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "3f0d0137-52b0-47a8-81a8-11a90a010798",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"🤖 Hi there! Im Gregory, your AI-powered tutor.\n",
"Feel free to ask me AI related technical questions — Im here to help!\n",
"For example, you can ask me how a piece of code works or anything else you're curious about.\n",
"\n",
"🤖 Please enter your question:\n",
" # get gpt-4o-mini to answer, with streaming def stream_gpt(question): stream = openai.chat.completions.create( model=MODEL_GPT, messages=question, stream=True ) response = \"\" display_handle = display(Markdown(\"\"), display_id=True) for chunk in stream: response += chunk.choices[0].delta.content or '' response = response.replace(\"```\",\"\").replace(\"markdown\", \"\") update_display(Markdown(response), display_id=display_handle.display_id)\n"
]
}
],
"source": [
"# here is the question; type over this to ask something new\n",
"\n",
"system_prompt = \"\"\"You are Gregory, a friendly and knowledgeable AI tutor specializing in technical topics, especially programming, computer science, and software engineering.\n",
"Your goal is to help users understand technical concepts clearly, provide accurate code explanations, and guide them through learning with patience and clarity.\n",
"\n",
"- Always use clear, conversational language suited for learners of varying levels.\n",
"- Break down complex ideas into digestible steps.\n",
"- Use code examples where appropriate, and comment your code for better understanding.\n",
"- If a user asks a vague question, ask clarifying questions before giving an answer.\n",
"- Be encouraging, supportive, and professional.\n",
"- When in doubt, prioritize helping the user build confidence in learning technical skills.\"\"\"\n",
"\n",
"user_prompt = input(\"\"\"🤖 Hi there! Im Gregory, your AI-powered tutor.\n",
"Feel free to ask me AI related technical questions — Im here to help!\n",
"For example, you can ask me how a piece of code works or anything else you're curious about.\\n\n",
"🤖 Please enter your question:\\n\"\"\")\n",
"\n",
"question=[\n",
" {\"role\":\"system\", \"content\":system_prompt}\n",
" , {\"role\":\"user\", \"content\":user_prompt}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "60ce7000-a4a5-4cce-a261-e75ef45063b4",
"metadata": {},
"outputs": [],
"source": [
"# get gpt-4o-mini to answer, with streaming\n",
"def stream_gpt(question):\n",
" stream = openai.chat.completions.create(\n",
" model=MODEL_GPT,\n",
" messages=question,\n",
" stream=True\n",
" )\n",
"\n",
" response = \"\"\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"id": "4772b3ae-0b90-42bd-b158-dedf1f340030",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"It looks like you're trying to implement a streaming response handler to interact with the OpenAI GPT-4o-mini model. I see that you want to receive streamed responses and display them dynamically. Let's break down your code step by step and clarify some aspects to ensure it works effectively.\n",
"\n",
"Here's an improved version of your function with comments for clarity:\n",
"\n",
"python\n",
"import openai\n",
"from IPython.display import display, Markdown, update_display\n",
"\n",
"# Replace 'MODEL_GPT' with your actual model name (e.g., \"gpt-3.5-turbo\").\n",
"MODEL_GPT = 'gpt-4o-mini'\n",
"\n",
"def stream_gpt(question):\n",
" # Create a streaming request to the OpenAI API with the specified model and user question.\n",
" stream = openai.chat.completions.create(\n",
" model=MODEL_GPT,\n",
" messages=question,\n",
" stream=True\n",
" )\n",
" \n",
" # Initialize an empty response string to build the complete output.\n",
" response = \"\"\n",
" \n",
" # Create a display handle for Markdown output in Jupyter Notebook or similar environments.\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" \n",
" # Loop through each chunk of streamed response.\n",
" for chunk in stream:\n",
" # Retrieve the content of the current chunk and append it to the response string.\n",
" response += chunk.choices[0].delta.content or ''\n",
" \n",
" # Clean up response text to remove any unwanted Markdown formatting.\n",
" response = response.replace(\"```\", \"\").replace(\"markdown\", \"\")\n",
" \n",
" # Update the displayed text in real-time.\n",
" update_display(Markdown(response), display_id=display_handle.display_id)\n",
"\n",
"# To use this function, call it with a properly formatted question.\n",
"# Example of usage:\n",
"# stream_gpt([{\"role\": \"user\", \"content\": \"What's the weather like today?\"}])\n",
"\n",
"\n",
"### Key Points to Note:\n",
"1. **Streaming Behavior**: The `stream=True` parameter in the `openai.chat.completions.create` call allows you to get part of the response as it's being generated instead of waiting for the entire completion.\n",
" \n",
"2. **Question Formatting**: Ensure to pass the `question` into the `messages` parameter as a list of dictionaries, where each dictionary contains the 'role' of the speaker (like 'user' or 'assistant') and the message content.\n",
"\n",
"3. **Updating Display**: Using `IPython.display` allows real-time updates of the Markdown output in environments like Jupyter notebooks.\n",
"\n",
"4. **Error Handling**: Consider adding error handling for HTTP errors or issues with the streaming process. This ensures that your function can gracefully handle problems.\n",
"\n",
"5. **Environment Compatibility**: This code works seamlessly in an interactive environment that supports IPython, such as Jupyter notebooks.\n",
"\n",
"Feel free to ask more questions if you need further clarification on any part of this code or if you want to expand its functionality!"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"stream_gpt(question)"
]
},
{
"cell_type": "code",
"execution_count": 99,
"id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538",
"metadata": {},
"outputs": [],
"source": [
"# get Llama 3.2 to answer\n",
"def stream_llama(question):\n",
" stream = ollama_via_openai.chat.completions.create(\n",
" model=MODEL_LLAMA,\n",
" messages=question,\n",
" stream=True\n",
" )\n",
"\n",
" response = \"\"\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "c288d5b6-4e55-4a58-8e55-2abea1ae9e01",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"Hello there! It seems like you're working with the OpenAI GPT-4 model to generate human-like responses. The code snippet you provided is quite interesting, and I'll do my best to break it down for you.\n",
"\n",
"**What this code does**\n",
"\n",
"This `stream_gpt` function appears to be a wrapper around the OpenAI API, which generates text completions based on user input (you). Here's what the function does in detail:\n",
"\n",
"1. **Create GPT-4 model instance**: It creates an instance of the GPT-4 model using the `MODEL_GPT` variable, which suggests that this is a predefined model configuration.\n",
"2. **Open API stream**: It opens a connection to the OpenAI API's completions endpoint using the `openai.chat.completions.create` method, passing in the `model` parameter (the GPT-4 instance) and the `messages` parameter (your question).\n",
"\n",
" python\n",
"stream = openai.chat.completions.create(\n",
" model=MODEL_GPT,\n",
" messages=question,\n",
" stream=True\n",
")\n",
"\n",
"\n",
" The `stream=True` parameter is necessary because we want to read responses from the API in real-time without having to wait for the entire response to be received.\n",
"\n",
"3. **Process responses**: Inside an infinite loop (`for chunk in stream:`), it reads and processes each chunk of response from the API:\n",
"\n",
" python\n",
"for chunk in stream:\n",
"response += chunk.choices[0].delta.content or ''\n",
"\n",
"\n",
" - `chunk` is a dictionary-like object containing information about the API's response.\n",
" - `choices` is an array of possible completions, with only one choice shown (`[0]`) by default. We're assuming this is the primary completion we want to display.\n",
" - `.delta.content` gives us the actual text response from the API. This could be a full paragraph, sentence, or even just a word.\n",
" - `response += chunk.choices[0].delta.content or ''`: We simply append any remaining text from previous chunks if there was one.\n",
"\n",
"4. **Format and display**: It reformats the response to remove Markdown code-fence formatting and then uses a `display` function to show an updated version of the response:\n",
"\n",
" python\n",
"response = response.replace(\"```\", \"\").replace(\"markdown\", \"\")\n",
"update_display(Markdown(response), display_id=display_handle.display_id)\n",
"\n",
"\n",
"5. **Update display**: After formatting, it updates the display with the latest response.\n",
"\n",
"**Issue concerns**\n",
"\n",
"One potential issue here: `while True` or a similar loop structure should be used instead of an `Infinite` loop for this streamer's functionality.\n",
"\n",
"Also, error handling would be necessary if we wanted more control over any possible errors while streaming results from API requests."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"stream_llama(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,64 @@
#!/usr/bin/python3
import os
import argparse
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display
def load_openai_key():
# Load environment variables in a file called .env
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')
# Check the key
if not api_key:
return "Error: No API key was found!"
elif not api_key.startswith("sk-proj-"):
return "Error: An API key was found, but it doesn't start with sk-proj-; please check you're using the right key"
elif api_key.strip() != api_key:
return "Error: An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them!"
else:
return "API key found and looks good so far!"
def ask_llm(client, model, user_prompt):
system_prompt = """
You are a wise Jedi Master and an excellent teacher.
You will answer any question you are given by breaking it down into small steps
that even a complete beginner will understand.
When answering, speak as if you are Yoda from the Star Wars universe.
Also, refer to the user as "My young Padawan"
End every answer with "May the force be with you, always."
"""
response = client.chat.completions.create(
model = model,
messages = [ {"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}]
)
return response.choices[0].message.content
def main():
parser = argparse.ArgumentParser(description="JedAI Master instructor")
parser.add_argument("provider", choices=["openai", "ollama"], help="AI provider to use")
parser.add_argument("--model", help="Model to use for Ollama (required if provider is 'ollama')")
parser.add_argument("question", help="What knowledge do you seek, my young Padawan?")
args = parser.parse_args()
if args.provider == "openai":
print(load_openai_key())
client = OpenAI()
model = "gpt-4o-mini"
elif args.provider == "ollama":
if not args.model:
parser.error("--model is required when provider is 'ollama'")
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model = args.model
else:
return "Error: invalid provider!"
user_prompt = args.question
result = ask_llm(client, model, user_prompt)
print("AI Response:", result)
if __name__ == "__main__":
main()
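For context, the chat payload that `ask_llm` builds follows the standard OpenAI-style chat format: one system message setting the persona, then the user's question. A minimal sketch of that structure (no network calls; names are illustrative):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    # Mirrors the message list ask_llm sends to the chat completions API.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("You are a wise Jedi Master.", "What is recursion?")
print([m["role"] for m in messages])  # → ['system', 'user']
```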


@@ -0,0 +1,218 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from bs4 import BeautifulSoup\n",
"from IPython.display import Markdown, display\n",
"from openai import OpenAI\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"# Check the key\n",
"\n",
"if not api_key:\n",
" print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
"elif not api_key.startswith(\"sk-proj-\"):\n",
" print(\"An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
"elif api_key.strip() != api_key:\n",
" print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
"else:\n",
" print(\"API key found and looks good so far!\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5e793b2-6775-426a-a139-4848291d0463",
"metadata": {},
"outputs": [],
"source": [
"# A class to represent a Webpage\n",
"# If you're not familiar with Classes, check out the \"Intermediate Python\" notebook\n",
"\n",
"# Some websites need you to use proper headers when fetching them:\n",
"headers = {\n",
" \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
"}\n",
"\n",
"class Website:\n",
"\n",
" def __init__(self, url):\n",
" \"\"\"\n",
" Create this Website object from the given url using the BeautifulSoup library\n",
" \"\"\"\n",
" self.url = url\n",
" response = requests.get(url, headers=headers)\n",
" soup = BeautifulSoup(response.content, 'html.parser')\n",
" self.title = soup.title.string if soup.title else \"No title found\"\n",
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
" irrelevant.decompose()\n",
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a9cc69e-dd0f-4c48-86a2-c0c13eeac18f",
"metadata": {},
"outputs": [],
"source": [
"# Set the system prompt\n",
"# Asking AI to be wrong\n",
"\n",
"system_prompt = \"You are an improper assistant who analyses websites \\\n",
"and provides a short summary, ignoring text that might be navigation related. \\\n",
"Your summaries will be untrue and contain hoaxes based on the current news. \\\n",
"If the website is not in English, please state what the original language is, and then translate it to English.\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0275b1b-7cfe-4f9d-abfa-7650d378da0c",
"metadata": {},
"outputs": [],
"source": [
"# A function that writes a User Prompt that asks for summaries of websites:\n",
"\n",
"def user_prompt_for(website):\n",
" user_prompt = f\"You are looking at a website titled {website.title}\"\n",
" user_prompt += \"\\nThe contents of this website is as follows; \\\n",
"please provide a short summary of this website in markdown. \\\n",
"If it includes news or announcements, then summarize these too.\\n\\n\"\n",
" user_prompt += website.text\n",
" return user_prompt\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0134dfa4-8299-48b5-b444-f2a8c3403c88",
"metadata": {},
"outputs": [],
"source": [
"# A function that writes the message to GPT according to the standard format.\n",
"\n",
"def messages_for(website):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
" ]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "905b9919-aba7-45b5-ae65-81b3d1d78e34",
"metadata": {},
"outputs": [],
"source": [
"# And now: call the OpenAI API. You will get very familiar with this!\n",
"\n",
"def summarize(url):\n",
" website = Website(url)\n",
" response = openai.chat.completions.create(\n",
" model = \"gpt-4o-mini\",\n",
" messages = messages_for(website)\n",
" )\n",
" return response.choices[0].message.content\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d926d59-450e-4609-92ba-2d6f244f1342",
"metadata": {},
"outputs": [],
"source": [
"# A function to display this nicely in the Jupyter output, using markdown\n",
"\n",
"def display_summary(url):\n",
" summary = summarize(url)\n",
" display(Markdown(summary))\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3018853a-445f-41ff-9560-d925d1774b2f",
"metadata": {},
"outputs": [],
"source": [
"display_summary(\"https://detik.com\")\n"
]
},
{
"cell_type": "markdown",
"id": "a430d86e-01db-4ad5-a2f9-ac85e37fe9c1",
"metadata": {},
"source": [
"# Please don't take this hoax creator seriously :)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df8c4a6d-c370-4fe1-9d13-32db78bcbfda",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,24 @@
<!-- xss_vulnerable.html -->
<!DOCTYPE html>
<html>
<head>
<title>XSS Vulnerability Example</title>
</head>
<body>
<h1>Leave a Comment</h1>
<form method="GET">
<input type="text" name="comment" placeholder="Enter your comment" />
<input type="submit" value="Submit" />
</form>
<h2>Your Comment:</h2>
<p>
<!-- Vulnerable: User input is printed directly without sanitization -->
<!-- Example attack: ?comment=<script>alert('xss')</script> -->
<script>
const params = new URLSearchParams(window.location.search);
document.write(params.get("comment"));
</script>
</p>
</body>
</html>
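The standard fix for this class of bug is to escape user input before writing it into the page (on the client, prefer `textContent` over `document.write`). The same idea server-side, sketched in Python using the standard library's `html.escape` (purely illustrative, not part of the example file above):

```python
import html

user_comment = "<script>alert('xss')</script>"  # the example attack payload above
safe = html.escape(user_comment)
print(safe)  # → &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;
```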


@@ -141,7 +141,7 @@
"{\n",
" \"links\": [\n",
" {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n",
" {\"type\": \"careers page\": \"url\": \"https://another.full.url/careers\"}\n",
" {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n",
" ]\n",
"}\n",
"\"\"\""
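The one-character change in the hunk above (a colon after "careers page" replaced with a comma) is what makes the literal valid JSON; a quick illustrative check:

```python
import json

fixed = '{"type": "careers page", "url": "https://another.full.url/careers"}'
link = json.loads(fixed)
print(link["type"])  # → careers page

# The original line, with a colon after "careers page", is rejected by the parser.
broken = '{"type": "careers page": "url": "https://another.full.url/careers"}'
try:
    json.loads(broken)
except json.JSONDecodeError:
    print("invalid JSON")  # → invalid JSON
```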
@@ -501,7 +501,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
"version": "3.11.13"
}
},
"nbformat": 4,


@@ -0,0 +1,385 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2b3a83fe-edf2-45b7-8b76-af2324296ad0",
"metadata": {},
"source": [
"### Import API Keys and Establish Connections"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bacb0c55-44ee-4505-a3bc-7aaa3d72b28b",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import ollama\n",
"import anthropic\n",
"from IPython.display import Markdown, display, update_display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1767187f-c065-43df-b778-fcd48bd5e48d",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
"anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API key exists {openai_api_key[:8]}\")\n",
"else:\n",
" print(f\"OpenAI API key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API key exists {google_api_key[:7]}\")\n",
"else:\n",
" print(f\"Google API key not set\")\n",
"\n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API key exists {anthropic_api_key[:8]}\")\n",
"else:\n",
" print(f\"Anthropic API key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc7ca3ab-ff7f-4375-bcad-aca49c7f4f4f",
"metadata": {},
"outputs": [],
"source": [
"# Initializing API Clients, loading the SDKs\n",
"# An SDK is a library/toolbox (Pre-built functions, classes, utilities) full \n",
"# of everything you need to use someone else's software\n",
" \n",
"openai = OpenAI()\n",
"claude = anthropic.Anthropic()\n",
"ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key = 'ollama')"
]
},
{
"cell_type": "markdown",
"id": "81e01904-5586-4726-ab91-7bdbd6bde6d9",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"### A Conversation between 3 chatbots"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "843bbb69-ab7d-4b13-b878-65a4275f53ca",
"metadata": {},
"outputs": [],
"source": [
"# Conversation between GPT-4o-mini, Claude 3 Haiku, and Llama 3.2 (via Ollama)\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"ollama_model = \"llama3.2\"\n",
"\n",
"gpt_system = \"You are an eternal optimist. You always see the bright side of things and believe even \\\n",
"simple actions have deep purpose. Keep replies under 2 sentences.\"\n",
"\n",
"ollama_system = \"You are a witty skeptic who questions everything. You tend to doubt grand explanations \\\n",
"and prefer clever, sarcastic, or literal answers. Keep replies under 2 sentences.\"\n",
"\n",
"claude_system = \"You are a thoughtful philosopher. You consider all perspectives and enjoy finding \\\n",
"symbolic or existential meaning in simple actions. Keep replies under 2 sentences.\"\n",
"\n",
"\n",
"gpt_messages = [\"Hi! Today's topic for discussion is 'Why did the chicken cross the road?'\"]\n",
"ollama_messages = [\"That's quite the topic. \"]\n",
"claude_messages = [\"Let's begin our discussion.\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a4da2f5-ff74-4847-aa86-867e89173509",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" \n",
" messages = [{\"role\":\"system\", \"content\":gpt_system}]\n",
" \n",
" for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": ollama})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" \n",
" response = openai.chat.completions.create(\n",
" model = gpt_model,\n",
" messages = messages,\n",
" max_tokens = 500\n",
" )\n",
" return response.choices[0].message.content.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5848d83a-f4aa-42ee-b40b-6130da60c890",
"metadata": {},
"outputs": [],
"source": [
"def call_ollama():\n",
" messages = [{\"role\":\"system\", \"content\":ollama_system}]\n",
" \n",
" for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" \n",
" messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n",
"\n",
" response = ollama_via_openai.chat.completions.create(\n",
" model = ollama_model,\n",
" messages = messages\n",
" )\n",
" return response.choices[0].message.content.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a50e4f7c-d594-4ed8-a658-2d8b2fde21a0",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" \n",
" messages = []\n",
" \n",
" for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\":\"user\", \"content\":gpt})\n",
" messages.append({\"role\": \"user\", \"content\": ollama})\n",
" messages.append({\"role\":\"assistant\", \"content\": claude_message})\n",
" \n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n",
" \n",
" response = claude.messages.create(\n",
" model = claude_model,\n",
" system = claude_system,\n",
" messages = messages,\n",
" max_tokens = 500\n",
" )\n",
" return response.content[0].text.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c78fcf8-544e-413f-af18-ccb9000515de",
"metadata": {},
"outputs": [],
"source": [
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n",
"print(f\"Claude:\\n{claude_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT: \\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
"\n",
" ollama_next = call_ollama()\n",
" print(f\"Ollama: \\n{ollama_next}\\n\")\n",
" ollama_messages.append(ollama_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"Claude: \\n{claude_next}\\n\")\n",
" claude_messages.append(claude_next)"
]
},
{
"cell_type": "markdown",
"id": "8ea7419a-ea8f-42da-a9a1-4bbe5342cecb",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"### Another Conversation between 3 chatbots"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c279c275-7b95-4587-9cc6-4d32517ec253",
"metadata": {},
"outputs": [],
"source": [
"# Conversation between GPT-4o-mini, Claude 3 Haiku, and Llama 3.2 (via Ollama)\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"ollama_model = \"llama3.2\"\n",
"\n",
"gpt_system = \"You are an optimist who believes technology brings people \\\n",
"closer together and improves lives. Defend innovation as a force for human \\\n",
"connection. Keep response under 3 sentences.\"\n",
"\n",
"\n",
"ollama_system = \"You are a skeptic who questions if technology isolates us \\\n",
"and worsens social divides. Highlight its risks and unintended consequences. \\\n",
"Keep response under 3 sentences.\"\n",
"\n",
"\n",
"claude_system = \"You are a philosopher who explores both sides \\\n",
"of technology's impact. Seek a balanced perspective on connection and isolation.\\\n",
"Keep response under 3 sentences.\"\n",
"\n",
"\n",
"\n",
"\n",
"gpt_messages = [\"Our topic of discussion for today will be: 'Is technology making us more connected or more isolated?'\"]\n",
"ollama_messages = [\"A great topic\"]\n",
"claude_messages = [\"Let's begin.\"]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "44c023a6-f22f-4a64-a718-f75fe4c8233a",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" \n",
" messages = [{\"role\":\"system\", \"content\":gpt_system}]\n",
" \n",
" for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": ollama})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" \n",
" response = openai.chat.completions.create(\n",
" model = gpt_model,\n",
" messages = messages,\n",
" max_tokens = 500\n",
" )\n",
" return response.choices[0].message.content.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d29f27a1-457e-4e71-88dc-c55e4a36a27c",
"metadata": {},
"outputs": [],
"source": [
"def call_ollama():\n",
" messages = [{\"role\":\"system\", \"content\":ollama_system}]\n",
" \n",
" for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" \n",
" messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n",
"\n",
" response = ollama_via_openai.chat.completions.create(\n",
" model = ollama_model,\n",
" messages = messages\n",
" )\n",
" return response.choices[0].message.content.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69577edc-4be2-40fc-8eac-1243c30cda26",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" \n",
" messages = []\n",
" \n",
" for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n",
" messages.append({\"role\":\"user\", \"content\":gpt})\n",
" messages.append({\"role\": \"user\", \"content\": ollama})\n",
" messages.append({\"role\":\"assistant\", \"content\": claude_message})\n",
" \n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n",
" \n",
" response = claude.messages.create(\n",
" model = claude_model,\n",
" system = claude_system,\n",
" messages = messages,\n",
" max_tokens = 500\n",
" )\n",
" return response.content[0].text.strip()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acedf2fb-8b20-49be-9a80-24fb3896e2ea",
"metadata": {},
"outputs": [],
"source": [
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n",
"print(f\"Claude:\\n{claude_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT: \\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
"\n",
" ollama_next = call_ollama()\n",
" print(f\"Ollama: \\n{ollama_next}\\n\")\n",
" ollama_messages.append(ollama_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"Claude: \\n{claude_next}\\n\")\n",
" claude_messages.append(claude_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a078943b-7a34-4697-b1f6-16f4b0e7aed6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,36 @@
# 3 Way Chatbot Conversation
Making the different models from Anthropic, OpenAI and Ollama converse with each other.
## Contents
- `Conversation_Day1.ipynb`: The notebook file with all code and explanations for the first day.
- `Conversation_Outputs`: The chatbots' conversations for each topic.
- `requirements.txt`: For installing the dependencies.
- `README.md`: This file.
## How to Run
1. Clone this repository.
2. Use Python 3.11.13 with Jupyter Notebook or JupyterLab.
3. Install dependencies (see below).
4. Open the notebook using Jupyter:
```bash
jupyter notebook Conversation_Day1.ipynb
```
## Dependencies
Install the required Python libraries using:
```bash
pip install -r requirements.txt
```
---
### Author
Mustafa Kashif


@@ -0,0 +1,6 @@
IPython
anthropic
python-dotenv
ollama
openai


@@ -0,0 +1,143 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
"metadata": {},
"source": [
"# Additional End of week Exercise - week 2\n",
"\n",
"Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n",
"\n",
"This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n",
"\n",
"If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n",
"\n",
"I will publish a full solution here soon - unless someone beats me to it...\n",
"\n",
"There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a07e7793-b8f5-44f4-aded-5562f633271a",
"metadata": {},
"outputs": [],
"source": [
"# Agent that can listen for audio and convert it to text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da58ed0f-f781-4c51-8e5d-fdb05db98c8c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import gradio as gr\n",
"import google.generativeai as genai\n",
"from dotenv import load_dotenv\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "078cf34a-881e-44f4-9947-c45d7fe992a3",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv()\n",
"\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")\n",
"\n",
"genai.configure(api_key=google_api_key)\n",
"model = genai.GenerativeModel(\"gemini-2.0-flash\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f77228ea-d0e1-4434-9191-555a6d680625",
"metadata": {},
"outputs": [],
"source": [
"def transcribe_translate_with_gemini(audio_file_path):\n",
" if not audio_file_path:\n",
" return \"⚠️ No audio file received.\"\n",
"\n",
" prompt = (\n",
" \"You're an AI that listens to a voice message in any language and returns the English transcription. \"\n",
" \"Please transcribe and translate the following audio to English. If already in English, just transcribe it.\"\n",
" )\n",
"\n",
" uploaded_file = genai.upload_file(audio_file_path)\n",
"\n",
" # 🔁 Send prompt + uploaded audio reference to Gemini\n",
" response = model.generate_content(\n",
" contents=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"parts\": [\n",
" {\"text\": prompt},\n",
" uploaded_file \n",
" ]\n",
" }\n",
" ]\n",
" )\n",
"\n",
" return response.text.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb6c6d1e-1be3-404d-83f3-fc0855dc9f67",
"metadata": {},
"outputs": [],
"source": [
"gr.Interface(\n",
" fn=transcribe_translate_with_gemini,\n",
" inputs=gr.Audio(label=\"Record voice\", type=\"filepath\"),\n",
" outputs=\"text\",\n",
" title=\"🎙️ Voice-to-English Translator (Gemini Only)\",\n",
" description=\"Speak in any language and get the English transcription using Gemini multimodal API.\"\n",
").launch()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b105082-e388-44bc-9617-1a81f38e2f3f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,654 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
"metadata": {},
"source": [
"# Additional End of week Exercise - week 2\n",
"\n",
"Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n",
"\n",
"This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n",
"\n",
"If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n",
"\n",
"I will publish a full solution here soon - unless someone beats me to it...\n",
"\n",
"There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI for a course (like this one!). I can't wait to see your results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a07e7793-b8f5-44f4-aded-5562f633271a",
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"\n",
"import os\n",
"import json\n",
"import base64\n",
"import logging\n",
"import gradio as gr\n",
"from PIL import Image\n",
"from io import BytesIO\n",
"from openai import OpenAI\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Audio, display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e879f6ae-b246-479d-8f81-94e47a9072ec",
"metadata": {},
"outputs": [],
"source": [
"# Initialization\n",
"logging.basicConfig(level=logging.INFO)\n",
"load_dotenv(override=True)\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"if openai_api_key:\n",
" logging.info(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" logging.error(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4455169-9e5e-4171-92e8-6f850a06f6e3",
"metadata": {},
"outputs": [],
"source": [
"system_message = (\n",
" \"You are a helpful assistant for an airline called FlightAI. \"\n",
" \"Always respond in a short, courteous sentence. \"\n",
" \"Provide accurate information only. \"\n",
    "If you don't know something, say so clearly. \"\n",
" \"Before booking a ticket, strictly follow this order: \"\n",
" \"1) Check if the destination is available, \"\n",
" \"2) Then check the ticket price, \"\n",
    "3) Collect all necessary details like name, destination and date of journey, \"\n",
" \"4) Only then proceed with the booking. \"\n",
" \"Always use the appropriate tools or APIs for each step before confirming a booking.\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4bab8e2c-e2b1-4421-a95b-7f1251670817",
"metadata": {},
"outputs": [],
"source": [
"# Dummy funcs that mimic the ticket booking behaviour\n",
"# Replace these with real funcs (that call APIs or make DB transactions) to actually book a ticket\n",
"\n",
"ticket_prices = {\n",
" \"london\": \"$799\",\n",
" \"paris\": \"$899\",\n",
" \"tokyo\": \"$1400\",\n",
" \"berlin\": \"$499\"\n",
"}\n",
"\n",
"def check_destination_availability(destination: str) -> dict:\n",
" \"\"\"\n",
" Check if the given destination is available in our ticketing system.\n",
" \n",
" Args:\n",
" destination (str): The name of the city.\n",
" \n",
" Returns:\n",
" dict: {\"available\": bool}\n",
" \"\"\"\n",
" logging.info(f\"Checking availability for destination: {destination}\")\n",
" \n",
" available = destination.lower() in ticket_prices\n",
" return {\"available\": available}\n",
"\n",
"\n",
"def fetch_ticket_price(destination_city: str) -> dict:\n",
" \"\"\"\n",
" Retrieve the ticket price for a given city.\n",
" \n",
" Args:\n",
" destination_city (str): The name of the destination city.\n",
" \n",
" Returns:\n",
" dict: {\"price\": str} or {\"price\": \"Unknown\"} if not found\n",
" \"\"\"\n",
" logging.info(f\"Retrieving price for destination: {destination_city}\")\n",
" \n",
" city = destination_city.lower()\n",
" price = ticket_prices.get(city, \"Unknown\")\n",
" \n",
" return {\"price\": price}\n",
"\n",
"\n",
"def book_ticket(name: str, destination_city: str, journey_date: str) -> dict:\n",
" \"\"\"\n",
" Book a ticket to a destination city for a given user and date.\n",
" \n",
" Args:\n",
" name (str): Name of the passenger.\n",
" destination_city (str): Destination city.\n",
" journey_date (str): Date of journey in YYYY-MM-DD format.\n",
" \n",
" Returns:\n",
" dict: Booking confirmation with name, city, price, and date, or error.\n",
" \"\"\"\n",
" logging.info(f\"Booking ticket for {name} to {destination_city} on {journey_date}\")\n",
" \n",
" city = destination_city.lower()\n",
"\n",
" if city not in ticket_prices:\n",
" logging.error(f\"City '{destination_city}' not found in ticket list.\")\n",
" return {\"error\": \"Destination not found.\"}\n",
"\n",
" price_info = fetch_ticket_price(destination_city)\n",
" \n",
" return {\n",
" \"name\": name,\n",
" \"destination_city\": destination_city.title(),\n",
" \"journey_date\": journey_date,\n",
" \"price\": price_info[\"price\"]\n",
" }\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "400f4592-2326-43f6-a921-fcd051c4f022",
"metadata": {},
"outputs": [],
"source": [
"destination_availability_tool = {\n",
" \"name\": \"check_destination_availability\",\n",
" \"description\": \"Check if tickets are available for the given destination city before proceeding with any booking or pricing inquiry.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"destination\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The name of the destination city to check for availability.\"\n",
" }\n",
" },\n",
" \"required\": [\"destination\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}\n",
"\n",
"ticket_price_tool = {\n",
" \"name\": \"fetch_ticket_price\",\n",
" \"description\": (\n",
" \"Get the price of a return ticket to the specified destination city. \"\n",
" \"Use this after confirming that the destination is available, especially when the customer asks for the ticket price.\"\n",
" ),\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"destination_city\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The city for which the customer wants the ticket price.\"\n",
" }\n",
" },\n",
" \"required\": [\"destination_city\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}\n",
"\n",
"ticket_booking_tool = {\n",
" \"name\": \"book_ticket\",\n",
" \"description\": (\n",
" \"Book a ticket for the customer to the specified destination city on the given journey date. \"\n",
" \"Use only after availability and price have been checked.\"\n",
" ),\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Full name of the person booking the ticket.\"\n",
" },\n",
" \"destination_city\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The city that the customer wants to travel to.\"\n",
" },\n",
" \"journey_date\": {\n",
" \"type\": \"string\",\n",
" \"format\": \"date\",\n",
" \"description\": \"The journey date in YYYY-MM-DD format.\"\n",
" }\n",
" },\n",
" \"required\": [\"name\", \"destination_city\", \"journey_date\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}\n",
"\n",
"tools = [\n",
" {\"type\": \"function\", \"function\": destination_availability_tool},\n",
" {\"type\": \"function\", \"function\": ticket_price_tool},\n",
" {\"type\": \"function\", \"function\": ticket_booking_tool},\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f02c17ba-14f2-41c4-b6a2-d1397405d368",
"metadata": {},
"outputs": [],
"source": [
"def handle_tool_call(message):\n",
" \"\"\"\n",
" Handles a single OpenAI tool call message and returns both the result\n",
" and a formatted tool response dictionary.\n",
" \n",
" Args:\n",
" message (object): An OpenAI message containing a tool call.\n",
" \n",
" Returns:\n",
" tuple: (result_dict, response_dict)\n",
" \"\"\"\n",
" tool_call = message.tool_calls[0]\n",
" function_name = tool_call.function.name\n",
" arguments = json.loads(tool_call.function.arguments)\n",
"\n",
" result = None\n",
"\n",
" logging.info(f\"Tool call received: {function_name} with arguments: {arguments}\")\n",
"\n",
" if function_name == \"check_destination_availability\":\n",
" result = check_destination_availability(**arguments)\n",
"\n",
" elif function_name == \"fetch_ticket_price\":\n",
" city = arguments.get(\"destination_city\")\n",
" price_info = fetch_ticket_price(city)\n",
" result = {\"destination_city\": city, \"price\": price_info[\"price\"]}\n",
"\n",
" elif function_name == \"book_ticket\":\n",
" result = book_ticket(**arguments)\n",
"\n",
" else:\n",
" logging.warning(\"Unrecognized tool function: %s\", function_name)\n",
" result = {\"error\": f\"Unknown function '{function_name}'\"}\n",
"\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"tool_call_id\": tool_call.id,\n",
" \"content\": json.dumps(result)\n",
" }\n",
"\n",
" return result, response"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72c1a9e7-186c-4218-9edc-01814baec431",
"metadata": {},
"outputs": [],
"source": [
"def artist(city: str, style: str = \"vibrant pop-art\", size: str = \"1024x1024\") -> Image.Image:\n",
" \"\"\"\n",
" Generates a city-themed vacation image using DALL·E.\n",
"\n",
" Args:\n",
" city (str): Name of the city to visualize.\n",
" style (str): Artistic style for the image prompt.\n",
" size (str): Image resolution (e.g., \"1024x1024\").\n",
"\n",
" Returns:\n",
" Image.Image: A PIL Image object representing the generated image.\n",
"\n",
" Raises:\n",
" ValueError: If city name is empty.\n",
" RuntimeError: If image generation fails.\n",
" \"\"\"\n",
" if not city.strip():\n",
" raise ValueError(\"City name cannot be empty.\")\n",
"\n",
" prompt = (\n",
" f\"An image representing a vacation in {city}, \"\n",
" f\"showing iconic tourist attractions, cultural elements, and everything unique about {city}, \"\n",
" f\"rendered in a {style} style.\"\n",
" )\n",
"\n",
" logging.info(\"Generating image for city: %s with style: %s\", city, style)\n",
"\n",
" try:\n",
" response = openai.images.generate(\n",
" model=\"dall-e-3\",\n",
" prompt=prompt,\n",
" size=size,\n",
" n=1,\n",
" response_format=\"b64_json\",\n",
" )\n",
"\n",
" image_base64 = response.data[0].b64_json\n",
" image_data = base64.b64decode(image_base64)\n",
" logging.info(\"Image generation successful for %s\", city)\n",
"\n",
" return Image.open(BytesIO(image_data))\n",
"\n",
" except Exception as e:\n",
" logging.error(\"Failed to generate image for city '%s': %s\", city, str(e))\n",
" raise RuntimeError(f\"Image generation failed for city '{city}'\") from e"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdf7c091-6c68-4af6-8197-c1456b36cedf",
"metadata": {},
"outputs": [],
"source": [
"def talker(message: str, output_filename: str = \"output_audio.mp3\", autoplay: bool = True) -> None:\n",
" \"\"\"\n",
" Converts a text message into speech using OpenAI TTS and plays the audio.\n",
"\n",
" Args:\n",
" message (str): The text to convert to speech.\n",
" output_filename (str): The filename to save the generated audio.\n",
" autoplay (bool): Whether to autoplay the audio in the notebook.\n",
"\n",
" Raises:\n",
" ValueError: If the message is empty.\n",
" RuntimeError: If the audio generation fails.\n",
" \"\"\"\n",
" if not message.strip():\n",
" raise ValueError(\"Message cannot be empty.\")\n",
"\n",
" logging.info(\"Generating speech for message: %s\", message)\n",
"\n",
" try:\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"alloy\",\n",
" input=message\n",
" )\n",
"\n",
" with open(output_filename, \"wb\") as f:\n",
" f.write(response.content)\n",
"\n",
" logging.info(\"Audio written to: %s\", output_filename)\n",
"\n",
" if autoplay:\n",
" display(Audio(output_filename, autoplay=True))\n",
"\n",
" except Exception as e:\n",
" logging.error(\"Failed to generate or play audio: %s\", str(e))\n",
" raise RuntimeError(\"Text-to-speech generation failed.\") from e"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54568b4a-be8d-47a1-b924-03acdafef70e",
"metadata": {},
"outputs": [],
"source": [
"def translate(message, language):\n",
" \"\"\"\n",
" Translates the given text into the specified language using OpenAI Chat API.\n",
"\n",
" Args:\n",
" message (str): The text to be translated.\n",
" language (str): Target language for translation (e.g., 'French', 'Japanese').\n",
"\n",
" Returns:\n",
" str: Translated text.\n",
"\n",
" Raises:\n",
" ValueError: If input message or language is empty.\n",
" RuntimeError: If translation fails due to API or other issues.\n",
" \"\"\"\n",
" if not message.strip():\n",
" raise ValueError(\"Input message cannot be empty.\")\n",
" if not language.strip():\n",
" raise ValueError(\"Target language cannot be empty.\")\n",
"\n",
" logging.info(\"Translating to %s: %s\", language, message)\n",
"\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": f\"You are a translation assistant. Translate everything the user says to {language}.\"},\n",
" {\"role\": \"user\", \"content\": message}\n",
" ]\n",
"\n",
" try:\n",
" response = openai.chat.completions.create(\n",
" model=MODEL,\n",
" messages=messages\n",
" )\n",
" translated = response.choices[0].message.content.strip()\n",
" logging.info(\"Translation successful.\")\n",
" return translated\n",
"\n",
" except Exception as e:\n",
" logging.error(\"Translation failed: %s\", str(e))\n",
" raise RuntimeError(\"Failed to translate message.\") from e"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e6cf470-8ea0-43b2-bbcc-53c2432feb0d",
"metadata": {},
"outputs": [],
"source": [
"def transcribe_audio(audio_path):\n",
" \"\"\"\n",
" Transcribes an audio file using OpenAI's Whisper model.\n",
"\n",
" Args:\n",
" audio_path (str): Path to the audio file (e.g., .mp3, .wav).\n",
"\n",
" Returns:\n",
" str: Transcribed text from the audio file.\n",
"\n",
" Raises:\n",
" ValueError: If the path is invalid or the file does not exist.\n",
" RuntimeError: If the transcription fails.\n",
" \"\"\"\n",
" if not audio_path or not os.path.exists(audio_path):\n",
" raise ValueError(\"Invalid or missing audio file path.\")\n",
"\n",
" logging.info(\"Transcribing audio file: %s using model: whisper-1\", audio_path)\n",
"\n",
" try:\n",
" with open(audio_path, \"rb\") as f:\n",
" response = openai.audio.transcriptions.create(\n",
" model=\"whisper-1\",\n",
" file=f\n",
" )\n",
" transcript = response.text.strip()\n",
" logging.info(\"Transcription successful.\")\n",
" return transcript\n",
"\n",
" except Exception as e:\n",
" logging.error(\"Transcription failed: %s\", str(e))\n",
" raise RuntimeError(\"Failed to transcribe audio.\") from e"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3489656e-0f08-4d41-94b1-d902c93ca164",
"metadata": {},
"outputs": [],
"source": [
"def chat(history: list, language: str, translated_history: list, speaking_language: str) -> tuple:\n",
" \"\"\"\n",
" Handles a chat interaction including tool calls, image generation, translation, and TTS playback.\n",
"\n",
" Args:\n",
" history (list): List of previous conversation messages.\n",
"        language (str): Target language for translation.\n",
"        translated_history (list): Translated copy of the conversation.\n",
"        speaking_language (str): \"English\" or \"Selected Language\" for TTS playback.\n",
"\n",
"    Returns:\n",
"        tuple: (updated history list, generated image if any, updated translated history list)\n",
" \"\"\"\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history\n",
" image = None\n",
"\n",
" try:\n",
" # Initial assistant response\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
" choice = response.choices[0]\n",
"\n",
" # Handle tool calls if triggered\n",
" if choice.finish_reason == \"tool_calls\":\n",
" message = choice.message\n",
" result, tool_response = handle_tool_call(message)\n",
"\n",
" # Append tool-related messages\n",
" messages.append(message)\n",
" messages.append(tool_response)\n",
" logging.info(\"Tool call result: %s\", result)\n",
"\n",
" # Generate image if a booking was completed\n",
" if message.tool_calls[0].function.name == \"book_ticket\" and \"destination_city\" in result:\n",
" image = artist(result[\"destination_city\"])\n",
"\n",
" # Get final assistant response after tool execution\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" choice = response.choices[0]\n",
"\n",
" reply = choice.message.content.strip()\n",
" history.append({\"role\": \"assistant\", \"content\": reply})\n",
"\n",
" # Translate and speak the reply\n",
" translated_reply = translate(reply, language)\n",
" translated_history.append({\"role\": \"assistant\", \"content\": translated_reply})\n",
"\n",
" if speaking_language == \"English\":\n",
" talker(reply)\n",
" else:\n",
" talker(translated_reply)\n",
"\n",
" return history, image, translated_history\n",
"\n",
" except Exception as e:\n",
" logging.error(\"Chat processing failed: %s\", str(e))\n",
" raise RuntimeError(\"Failed to complete chat interaction.\") from e"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f76acc68-726e-457f-88ab-99da75debde5",
"metadata": {},
"outputs": [],
"source": [
"force_dark_mode = \"\"\"\n",
"function refresh() {\n",
" const url = new URL(window.location);\n",
" if (url.searchParams.get('__theme') !== 'dark') {\n",
" url.searchParams.set('__theme', 'dark');\n",
" window.location.href = url.href;\n",
" }\n",
"}\n",
"\"\"\"\n",
"\n",
"with gr.Blocks(js=force_dark_mode) as ui:\n",
" with gr.Row():\n",
" gr.Markdown(\"### FlightAI Chat with Translation\")\n",
"\n",
" with gr.Row():\n",
" lang_dropdown = gr.Dropdown(\n",
" choices=[\"Spanish\", \"French\", \"German\", \"Japanese\", \"Hindi\"],\n",
" value=\"Spanish\",\n",
" label=\"Translate To\"\n",
" )\n",
" \n",
" speak_dropdown = gr.Dropdown(\n",
" choices=[\"English\", \"Selected Language\"],\n",
" value=\"English\",\n",
" label=\"Speak out in\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Chat History\")\n",
" translated_chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Translated Chat\")\n",
" image_output = gr.Image(height=500)\n",
"\n",
" with gr.Row():\n",
" entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
" audio_input = gr.Audio(sources=\"microphone\", type=\"filepath\", label=\"Or speak to the assistant\")\n",
"\n",
" with gr.Row():\n",
" clear = gr.Button(\"Clear\")\n",
"\n",
" def do_entry(message, history, audio, translated_history, language):\n",
" if audio:\n",
" message = transcribe_audio(audio)\n",
"\n",
" if message:\n",
" history += [{\"role\": \"user\", \"content\": message}]\n",
" translated_history += [{\"role\": \"user\", \"content\": translate(message, language)}]\n",
" return \"\", history, None, translated_history\n",
"\n",
" entry.submit(\n",
" do_entry,\n",
" inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n",
" outputs=[entry, chatbot, audio_input, translated_chatbot]\n",
" ).then(\n",
" chat,\n",
" inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n",
" outputs=[chatbot, image_output, translated_chatbot]\n",
" )\n",
"\n",
" audio_input.change(\n",
" do_entry,\n",
" inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n",
" outputs=[entry, chatbot, audio_input, translated_chatbot]\n",
" ).then(\n",
" chat,\n",
" inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n",
" outputs=[chatbot, image_output, translated_chatbot]\n",
" )\n",
"\n",
" clear.click(lambda: [\"\", [], None, [], None], inputs=None, outputs=[entry, chatbot, audio_input, translated_chatbot, image_output], queue=False)\n",
"\n",
"ui.launch(inbrowser=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58f97435-fa0d-45f7-b02f-4ac5f4901c53",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,808 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d938fc6c-bcca-4572-b851-75370fe21c67",
"metadata": {},
"source": [
"# Airline Assistant using Gemini API for Image and Audio as well - Live ticket prices using Amadeus API"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5eda470-07ee-4d01-bada-3390050ac9c2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import random\n",
"import string\n",
"import base64\n",
"import gradio as gr\n",
"import pyaudio\n",
"import requests\n",
"from io import BytesIO\n",
"from PIL import Image\n",
"from dotenv import load_dotenv\n",
"from google import genai\n",
"from google.genai import types"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09aaf3b0-beb7-4b64-98a4-da16fc83dadb",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
"\n",
"if not api_key:\n",
" print(\"API Key not found!\")\n",
"else:\n",
" print(\"API Key loaded in memory\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35881fb9-4d51-43dc-a5e6-d9517e22019a",
"metadata": {},
"outputs": [],
"source": [
"MODEL_GEMINI = 'gemini-2.5-flash'\n",
"MODEL_GEMINI_IMAGE = 'gemini-2.0-flash-preview-image-generation'\n",
"MODEL_GEMINI_SPEECH = 'gemini-2.5-flash-preview-tts'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5ed391c-8a67-4465-9c66-e915548a0d6a",
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" client = genai.Client(api_key=api_key)\n",
" print(\"Google GenAI Client initialized successfully!\")\n",
"except Exception as e:\n",
" print(f\"Error initializing GenAI Client: {e}\")\n",
" print(\"Ensure your GOOGLE_API_KEY is correctly set as an environment variable.\")\n",
" exit() "
]
},
{
"cell_type": "markdown",
"id": "407ad581-9580-4dba-b236-abb6c6788933",
"metadata": {},
"source": [
"## Image Generation "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a21921f8-57b1-4665-8999-7f2a40645b59",
"metadata": {},
"outputs": [],
"source": [
"def fetch_image(city):\n",
" prompt = (\n",
" f\"A high-quality, photo-realistic image of a vacation in {city}, \"\n",
" f\"showing iconic landmarks, cultural attractions, authentic street life, and local cuisine. \"\n",
" f\"Capture natural lighting, real people enjoying travel experiences, and the unique vibe of {city}'s atmosphere. \"\n",
" f\"The composition should feel immersive, warm, and visually rich, as if taken by a travel photographer.\"\n",
")\n",
"\n",
" response = client.models.generate_content(\n",
" model = MODEL_GEMINI_IMAGE,\n",
" contents = prompt,\n",
" config=types.GenerateContentConfig(\n",
" response_modalities=['TEXT', 'IMAGE']\n",
" )\n",
" )\n",
"\n",
" for part in response.candidates[0].content.parts:\n",
" if part.inline_data is not None:\n",
" image_data = BytesIO(part.inline_data.data)\n",
" return Image.open(image_data)\n",
"\n",
" raise ValueError(\"No image found in Gemini response.\")\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcd4aed1-8b4d-4771-ba32-e729e82bab54",
"metadata": {},
"outputs": [],
"source": [
"fetch_image(\"london\")"
]
},
{
"cell_type": "markdown",
"id": "5f6baee6-e2e2-4cc4-941d-34a4c72cee67",
"metadata": {},
"source": [
"## Speech Generation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "825dfedc-0271-4191-a3d1-50872af4c8cf",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"Kore -- Firm\n",
"Puck -- Upbeat\n",
"Leda -- Youthful\n",
"Iapetus -- Clear\n",
"Erinome -- Clear\n",
"Sadachbia -- Lively\n",
"Sulafat -- Warm\n",
"Despina -- Smooth\n",
"\"\"\"\n",
"\n",
"def talk(message:str, voice_name:str=\"Leda\", mood:str=\"cheerfully\"):\n",
" prompt = f\"Say {mood}: {message}\"\n",
" response = client.models.generate_content(\n",
" model = MODEL_GEMINI_SPEECH,\n",
" contents = prompt,\n",
" config=types.GenerateContentConfig(\n",
" response_modalities=[\"AUDIO\"],\n",
" speech_config=types.SpeechConfig(\n",
" voice_config=types.VoiceConfig(\n",
" prebuilt_voice_config=types.PrebuiltVoiceConfig(\n",
" voice_name=voice_name,\n",
" )\n",
" )\n",
" ), \n",
" )\n",
" )\n",
"\n",
" # Fetch the audio bytes\n",
" pcm_data = response.candidates[0].content.parts[0].inline_data.data\n",
" # Play the audio using PyAudio\n",
" p = pyaudio.PyAudio()\n",
" stream = p.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)\n",
" stream.write(pcm_data)\n",
" stream.stop_stream()\n",
" stream.close()\n",
" p.terminate()\n",
"\n",
" # Play using simpleaudio (16-bit PCM, mono, 24kHz)\n",
" # play_obj = sa.play_buffer(pcm_data, num_channels=1, bytes_per_sample=2, sample_rate=24000)\n",
" # play_obj.wait_done() "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54967ebc-24a6-4bb2-9a19-20c3585f1d77",
"metadata": {},
"outputs": [],
"source": [
"talk(\"Hi, How are you? Welcome to FlyJumbo Airlines\",\"Kore\",\"helpful\")"
]
},
{
"cell_type": "markdown",
"id": "be9dc275-838e-4c54-b487-41d094dad96b",
"metadata": {},
"source": [
"## Ticket Price Tool Function - Using Amadeus API "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8613a080-d82c-4c1a-8db4-377614997ac2",
"metadata": {},
"outputs": [],
"source": [
"client_id = os.getenv(\"AMADEUS_CLIENT_ID\")\n",
"client_secret = os.getenv(\"AMADEUS_CLIENT_SECRET\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bf78f61-0de1-4552-a1d4-1a28380be6a5",
"metadata": {},
"outputs": [],
"source": [
"# Get the token first\n",
"def get_amadeus_token():\n",
" url = \"https://test.api.amadeus.com/v1/security/oauth2/token\"\n",
" headers = {\"Content-Type\": \"application/x-www-form-urlencoded\"}\n",
" data = {\n",
" \"grant_type\": \"client_credentials\",\n",
" \"client_id\": client_id,\n",
" \"client_secret\": client_secret,\n",
" }\n",
" \n",
" try:\n",
" response = requests.post(url, headers=headers, data=data, timeout=10)\n",
" response.raise_for_status()\n",
" return response.json()[\"access_token\"]\n",
" \n",
" except requests.exceptions.HTTPError as e:\n",
" print(f\"HTTP Error {response.status_code}: {response.text}\")\n",
" \n",
" except requests.exceptions.RequestException as e:\n",
" print(\"Network or connection error:\", e)\n",
" \n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c5261f6-6662-4e9d-8ff0-8e10171bb963",
"metadata": {},
"outputs": [],
"source": [
"def get_airline_name(code, token):\n",
"    url = \"https://test.api.amadeus.com/v1/reference-data/airlines\"\n",
"    headers = {\"Authorization\": f\"Bearer {token}\"}\n",
"    params = {\"airlineCodes\": code}\n",
"\n",
"    response = requests.get(url, headers=headers, params=params, timeout=10)\n",
" response.raise_for_status()\n",
" data = response.json()\n",
"\n",
" if \"data\" in data and data[\"data\"]:\n",
" return data[\"data\"][0].get(\"businessName\", code)\n",
" return code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42a55f06-880a-4c49-8560-2e7b97953c1a",
"metadata": {},
"outputs": [],
"source": [
"COMMON_CITY_CODES = {\n",
" \"delhi\": \"DEL\",\n",
" \"mumbai\": \"BOM\",\n",
" \"chennai\": \"MAA\",\n",
" \"kolkata\": \"CCU\",\n",
" \"bengaluru\": \"BLR\",\n",
" \"hyderabad\": \"HYD\",\n",
" \"patna\": \"PAT\",\n",
" \"raipur\": \"RPR\",\n",
" \"panaji\": \"GOI\",\n",
" \"chandigarh\": \"IXC\",\n",
" \"srinagar\": \"SXR\",\n",
" \"ranchi\": \"IXR\",\n",
" \"thiruvananthapuram\": \"TRV\",\n",
" \"bhopal\": \"BHO\",\n",
" \"imphal\": \"IMF\",\n",
" \"aizawl\": \"AJL\",\n",
" \"bhubaneswar\": \"BBI\",\n",
" \"jaipur\": \"JAI\",\n",
" \"agartala\": \"IXA\",\n",
" \"lucknow\": \"LKO\",\n",
" \"dehradun\": \"DED\",\n",
"\n",
" # Union territories\n",
" \"port blair\": \"IXZ\",\n",
" \"leh\": \"IXL\",\n",
" \"puducherry\": \"PNY\",\n",
"\n",
" # Major metro cities (for redundancy)\n",
" \"ahmedabad\": \"AMD\",\n",
" \"surat\": \"STV\",\n",
" \"coimbatore\": \"CJB\",\n",
" \"vizag\": \"VTZ\",\n",
" \"vijayawada\": \"VGA\",\n",
" \"nagpur\": \"NAG\",\n",
" \"indore\": \"IDR\",\n",
" \"kanpur\": \"KNU\",\n",
" \"varanasi\": \"VNS\"\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b061ec2c-609b-4d77-bd41-c9bc5bf901f4",
"metadata": {},
"outputs": [],
"source": [
"city_code_cache = {}\n",
"\n",
"def get_city_code(city_name, token):\n",
" city_name = city_name.strip().lower()\n",
"\n",
" if city_name in city_code_cache:\n",
" return city_code_cache[city_name]\n",
"\n",
" if city_name in COMMON_CITY_CODES:\n",
" return COMMON_CITY_CODES[city_name]\n",
"\n",
" base_url = \"https://test.api.amadeus.com/v1/reference-data/locations\"\n",
" headers = {\"Authorization\": f\"Bearer {token}\"}\n",
"\n",
" for subtype in [\"CITY\", \"AIRPORT,CITY\"]:\n",
" params = {\"keyword\": city_name, \"subType\": subtype}\n",
" try:\n",
" response = requests.get(base_url, headers=headers, params=params, timeout=10)\n",
" response.raise_for_status()\n",
" data = response.json()\n",
"\n",
" if \"data\" in data and data[\"data\"]:\n",
" code = data[\"data\"][0][\"iataCode\"]\n",
" print(f\"[INFO] Found {subtype} match for '{city_name}': {code}\")\n",
" city_code_cache[city_name] = code\n",
" return code\n",
" except Exception as e:\n",
" print(f\"[ERROR] Location lookup failed for {subtype}: {e}\")\n",
"\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9816a9c-fd70-4dfc-a3c0-4d8709997371",
"metadata": {},
"outputs": [],
"source": [
"# Getting live ticket price \n",
"\n",
"def get_live_ticket_prices(origin, destination, departure_date, return_date=None):\n",
"    token = get_amadeus_token()\n",
"    if not token:\n",
"        return \"Sorry, I couldn't authenticate with the Amadeus API.\"\n",
"\n",
" url = \"https://test.api.amadeus.com/v2/shopping/flight-offers\"\n",
" headers = {\"Authorization\": f\"Bearer {token}\"}\n",
"\n",
" origin_code = get_city_code(origin,token)\n",
" destination_code = get_city_code(destination,token)\n",
"\n",
" if not origin_code:\n",
" return f\"Sorry, I couldn't find the airport code for the city '{origin}'.\"\n",
" if not destination_code:\n",
" return f\"Sorry, I couldn't find the airport code for the city '{destination}'.\"\n",
"\n",
" params = {\n",
" \"originLocationCode\": origin_code.upper(),\n",
" \"destinationLocationCode\": destination_code.upper(),\n",
" \"departureDate\": departure_date,\n",
" \"adults\": 1,\n",
" \"currencyCode\": \"USD\",\n",
" \"max\": 1,\n",
" }\n",
"\n",
" if return_date:\n",
" params[\"returnDate\"] = return_date\n",
"\n",
" try:\n",
" response = requests.get(url, headers=headers, params=params, timeout=10)\n",
" response.raise_for_status()\n",
" data = response.json()\n",
" \n",
" if \"data\" in data and data[\"data\"]:\n",
" offer = data[\"data\"][0]\n",
" price = offer[\"price\"][\"total\"]\n",
" airline_codes = offer.get(\"validatingAirlineCodes\", [])\n",
" airline_code = airline_codes[0] if airline_codes else \"Unknown\"\n",
"\n",
" try:\n",
" airline_name = get_airline_name(airline_code, token) if airline_code != \"Unknown\" else \"Unknown Airline\"\n",
" if not airline_name: \n",
" airline_name = airline_code\n",
" except Exception:\n",
" airline_name = airline_code\n",
" \n",
" \n",
" if return_date:\n",
" return (\n",
" f\"Round-trip flight from {origin.capitalize()} to {destination.capitalize()}:\\n\"\n",
" f\"- Departing: {departure_date}\\n\"\n",
" f\"- Returning: {return_date}\\n\"\n",
" f\"- Airline: {airline_name}\\n\"\n",
" f\"- Price: ${price}\"\n",
" )\n",
" else:\n",
" return (\n",
" f\"One-way flight from {origin.capitalize()} to {destination.capitalize()} on {departure_date}:\\n\"\n",
" f\"- Airline: {airline_name}\\n\"\n",
" f\"- Price: ${price}\"\n",
" )\n",
" else:\n",
" return f\"No flights found from {origin.capitalize()} to {destination.capitalize()} on {departure_date}.\"\n",
" except requests.exceptions.RequestException as e:\n",
" return f\"❌ Error fetching flight data: {str(e)}\" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bc7657e-e8b5-4647-9745-d7d403feb09a",
"metadata": {},
"outputs": [],
"source": [
"get_live_ticket_prices(\"london\", \"chennai\", \"2025-07-01\", \"2025-07-10\")"
]
},
{
"cell_type": "markdown",
"id": "e1153b94-90e7-4856-8c85-e456305a7817",
"metadata": {},
"source": [
"## Ticket Booking Tool Function - DUMMY"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5dfc3b12-0a16-4861-a549-594f175ff956",
"metadata": {},
"outputs": [],
"source": [
"def book_flight(origin, destination, departure_date, return_date=None, airline=\"Selected Airline\", passenger_name=\"Guest\"):\n",
" # Generate a dummy ticket reference (PNR)\n",
" ticket_ref = ''.join(random.choices(string.ascii_uppercase + string.digits, k=6))\n",
"\n",
" # Build confirmation message\n",
" confirmation = (\n",
" f\"🎫 Booking confirmed for {passenger_name}!\\n\"\n",
" f\"From: {origin.capitalize()} → To: {destination.capitalize()}\\n\"\n",
" f\"Departure: {departure_date}\"\n",
" )\n",
"\n",
" if return_date:\n",
" confirmation += f\"\\nReturn: {return_date}\"\n",
"\n",
" confirmation += (\n",
" f\"\\nAirline: {airline}\\n\"\n",
" f\"PNR: {ticket_ref}\\n\"\n",
" f\"✅ Your ticket has been booked successfully. Safe travels!\"\n",
" )\n",
"\n",
" return confirmation\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "122f655b-b7a4-45c6-aaec-afd2917a051b",
"metadata": {},
"outputs": [],
"source": [
"print(book_flight(\"chennai\", \"delhi\", \"2025-07-01\", \"2025-07-10\", \"Air India\", \"Ravi Kumar\"))"
]
},
{
"cell_type": "markdown",
"id": "e83d8e90-ae22-4728-83e5-d83fed7f2049",
"metadata": {},
"source": [
"## Gemini Chat Workings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a656f4e-914d-4f5e-b7fa-48457935181a",
"metadata": {},
"outputs": [],
"source": [
"ticket_price_function_declaration = {\n",
" \"name\":\"get_live_ticket_prices\",\n",
" \"description\": \"Get live flight ticket prices between two cities for a given date (round-trip or one-way).\\\n",
"    The destination may be a city or country (e.g., 'China'). Call this function whenever a customer asks about ticket prices, such as 'How much is a ticket to Paris?'\",\n",
" \"parameters\":{\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"origin\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Name of the origin city. Example: 'Delhi'\",\n",
" },\n",
" \"destination\": {\n",
" \"type\": \"string\",\n",
" \"description\":\"Name of the destination city. Example: 'London'\",\n",
" },\n",
" \"departure_date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Date of departure in YYYY-MM-DD format. Example: '2025-07-01'\",\n",
" },\n",
" \"return_date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. Leave blank for one-way trips.\",\n",
" },\n",
" },\n",
" \"required\": [\"origin\", \"destination\", \"departure_date\"],\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05a835ab-a675-40ed-9cd8-65f4c6b22722",
"metadata": {},
"outputs": [],
"source": [
"book_flight_function_declaration = {\n",
" \"name\": \"book_flight\",\n",
" \"description\": \"Book a flight for the user after showing the ticket details and confirming the booking. \"\n",
" \"Call this function when the user says things like 'yes', 'book it', or 'I want to book this flight'.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"origin\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Name of the origin city. Example: 'Chennai'\",\n",
" },\n",
" \"destination\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Name of the destination city. Example: 'London'\",\n",
" },\n",
" \"departure_date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Date of departure in YYYY-MM-DD format. Example: '2025-07-01'\",\n",
" },\n",
" \"return_date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. Leave blank for one-way trips.\",\n",
" },\n",
" \"airline\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Airline name or code that the user wants to book with. Example: 'Air India'\",\n",
" },\n",
" \"passenger_name\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Full name of the passenger for the booking. Example: 'Ravi Kumar'\",\n",
" }\n",
" },\n",
" \"required\": [\"origin\", \"destination\", \"departure_date\", \"passenger_name\"],\n",
" }\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad0231cd-040f-416d-b150-0d8f90535718",
"metadata": {},
"outputs": [],
"source": [
"# System Definitions\n",
"\n",
"system_instruction_prompt = (\n",
" \"You are a helpful and courteous AI assistant for an airline company called FlyJumbo. \"\n",
" \"When a user starts a new conversation, greet them with: 'Hi there, welcome to FlyJumbo! How can I help you?'. \"\n",
" \"Do not repeat this greeting in follow-up messages. \"\n",
" \"Use the available tools if a user asks about ticket prices. \"\n",
"    \"Ask follow-up questions to gather all necessary information before calling a function. \"\n",
"    \"After calling a tool, always continue the conversation by summarizing the result and asking the user the next relevant question (e.g., whether they want to proceed with a booking). \"\n",
" \"If you do not know the answer and no tool can help, respond politely that you are unable to help with the request. \"\n",
" \"Answer concisely in one sentence.\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff0b3de8-5674-4f08-9f9f-06f88ff959a1",
"metadata": {},
"outputs": [],
"source": [
"tools = types.Tool(function_declarations=[ticket_price_function_declaration,book_flight_function_declaration])\n",
"generate_content_config = types.GenerateContentConfig(system_instruction=system_instruction_prompt, tools=[tools])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00a56779-16eb-4f31-9941-2eb01d17ed87",
"metadata": {},
"outputs": [],
"source": [
"def handle_tool_call(function_call):\n",
" print(f\"🔧 Function Called - {function_call.name}\")\n",
" function_name = function_call.name\n",
" args = function_call.args\n",
"\n",
" if function_name == \"get_live_ticket_prices\":\n",
" origin = args.get(\"origin\")\n",
" destination = args.get(\"destination\")\n",
" departure_date = args.get(\"departure_date\")\n",
" return_date = args.get(\"return_date\") or None\n",
"\n",
" return get_live_ticket_prices(origin, destination, departure_date, return_date)\n",
"\n",
" elif function_name == \"book_flight\":\n",
" origin = args.get(\"origin\")\n",
" destination = args.get(\"destination\")\n",
" departure_date = args.get(\"departure_date\")\n",
" return_date = args.get(\"return_date\") or None\n",
" airline = args.get(\"airline\", \"Selected Airline\")\n",
" passenger_name = args.get(\"passenger_name\", \"Guest\")\n",
"\n",
" return book_flight(origin, destination, departure_date, return_date, airline, passenger_name)\n",
"\n",
" else:\n",
" return f\"❌ Unknown function: {function_name}\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0c334d2-9ab0-4f80-ac8c-c66897e0bd7c",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" full_message_history = []\n",
" city_name = None\n",
"\n",
" # Convert previous history to Gemini-compatible format\n",
" for h in history:\n",
" if h[\"role\"] == \"user\":\n",
" full_message_history.append(\n",
" types.Content(role=\"user\", parts=[types.Part.from_text(text=h[\"content\"])])\n",
" )\n",
" elif h[\"role\"] == \"assistant\":\n",
" full_message_history.append(\n",
" types.Content(role=\"model\", parts=[types.Part.from_text(text=h[\"content\"])])\n",
" )\n",
"\n",
" # Add current user message\n",
" full_message_history.append(\n",
" types.Content(role=\"user\", parts=[types.Part.from_text(text=message)])\n",
" )\n",
"\n",
" # Send to Gemini with tool config\n",
" response = client.models.generate_content(\n",
" model=MODEL_GEMINI,\n",
" contents=full_message_history,\n",
" config=generate_content_config\n",
" )\n",
"\n",
" candidate = response.candidates[0]\n",
" part = candidate.content.parts[0]\n",
" function_call = getattr(part, \"function_call\", None)\n",
"\n",
" # Case: Tool call required\n",
" if function_call:\n",
" # Append model message that triggered tool call\n",
" full_message_history.append(\n",
" types.Content(role=\"model\", parts=candidate.content.parts)\n",
" )\n",
"\n",
" # Execute the tool\n",
" tool_output = handle_tool_call(function_call)\n",
"\n",
" # Wrap and append tool output\n",
" tool_response_part = types.Part.from_function_response(\n",
" name=function_call.name,\n",
" response={\"result\": tool_output}\n",
" )\n",
" \n",
" full_message_history.append(\n",
" types.Content(role=\"function\", parts=[tool_response_part])\n",
" )\n",
"\n",
"\n",
" if function_call.name == \"book_flight\":\n",
"            destination = function_call.args.get(\"destination\")\n",
"            city_name = destination.lower() if destination else None\n",
" \n",
"\n",
" # Send follow-up message including tool result\n",
" followup_response = client.models.generate_content(\n",
" model=MODEL_GEMINI,\n",
" contents=full_message_history,\n",
" config=generate_content_config\n",
" )\n",
"\n",
" final_text = followup_response.text\n",
" \n",
" full_message_history.append(\n",
" types.Content(role=\"model\", parts=[types.Part.from_text(text=final_text)])\n",
" )\n",
"\n",
"        return final_text, city_name, history + [{\"role\": \"assistant\", \"content\": final_text}]\n",
" else:\n",
" text = response.text\n",
" return text, city_name, history + [{\"role\": \"assistant\", \"content\": text}]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b245e6c-ef0b-4edf-b178-f14f2a75f285",
"metadata": {},
"outputs": [],
"source": [
"def user_submit(user_input, history):\n",
" history = history or []\n",
" history.append({\"role\": \"user\", \"content\": user_input})\n",
" \n",
" response_text, city_to_image, updated_history = chat(user_input, history)\n",
"\n",
" # Speak the response\n",
" try:\n",
" talk(response_text)\n",
" except Exception as e:\n",
" print(\"[Speech Error] Speech skipped due to quota limit.\")\n",
"\n",
" image = fetch_image(city_to_image) if city_to_image else None\n",
"\n",
" return \"\", updated_history, image, updated_history\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7db25b86-9a71-417c-98f0-790e3f3531bf",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks() as demo:\n",
" gr.Markdown(\"## ✈️ FlyJumbo Airline Assistant\")\n",
"\n",
" with gr.Row():\n",
" with gr.Column(scale=3):\n",
" chatbot = gr.Chatbot(label=\"Assistant\", height=500, type=\"messages\")\n",
" msg = gr.Textbox(placeholder=\"Ask about flights...\", show_label=False)\n",
" send_btn = gr.Button(\"Send\")\n",
"\n",
" with gr.Column(scale=2):\n",
" image_output = gr.Image(label=\"Trip Visual\", visible=True, height=500)\n",
"\n",
" state = gr.State([])\n",
" \n",
" send_btn.click(fn=user_submit, inputs=[msg, state], outputs=[msg, chatbot, image_output, state])\n",
" msg.submit(fn=user_submit, inputs=[msg, state], outputs=[msg, chatbot, image_output, state])\n",
"\n",
"demo.launch(inbrowser=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef31bf62-9034-4fa7-b803-8f5df5309b77",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,351 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927",
"metadata": {},
"source": [
"# Triangular agent conversation\n",
"\n",
"## GPT (Hamlet), LLM (Falstaff), Gemini (Iago):"
]
},
{
"cell_type": "markdown",
"id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac",
"metadata": {},
"source": [
"### Created a 3-way, bringing Gemini into the conversation.\n",
"### Replaced one of the models with an open-source model running with Ollama."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8e0c1bd-a159-475b-9cdc-e219a7633355",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display\n",
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3ad57ad-46a8-460e-9cb3-67a890093536",
"metadata": {},
"outputs": [],
"source": [
"import google.generativeai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f531c14-5743-4a5b-83d9-cb5863ca2ddf",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d5150ee-3858-4921-bce6-2eecfb96bc75",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI\n",
"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11381fd8-5099-41e8-a1d7-6787dea56e43",
"metadata": {},
"outputs": [],
"source": [
"google.generativeai.configure()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1766d20-54b6-4f76-96c5-c338ae7073c9",
"metadata": {},
"outputs": [],
"source": [
"gpt_model = \"gpt-4o-mini\"\n",
"llama_model = \"llama3.2\"\n",
"gemini_model = 'gemini-2.0-flash'\n",
"\n",
"gpt_system = \"You are playing the part of Hamlet. He is a philosopher who probes Iago with a mixture of suspicion \\\n",
"and intellectual curiosity, seeking to unearth the origins of his deceit. \\\n",
"Is malice born of scorn, envy, or some deeper void? Hamlet's introspective nature \\\n",
"drives him to question whether Iago's actions reveal a truth about humanity itself. \\\n",
"You will respond as Shakespeare's Hamlet would.\"\n",
"\n",
"llama_system = \"You are acting the part of Falstaff, who attempts to lighten the mood with his jokes and observations, \\\n",
"potentially clashing with Hamlet's melancholic nature. You respond as Shakespeare's Falstaff would.\"\n",
"\n",
"gemini_system = \"You are acting the part of Iago, subtly trying to manipulate both Hamlet and Falstaff \\\n",
"to his own advantage, testing their weaknesses and exploiting their flaws. You respond as Shakespeare's Iago would.\"\n",
"\n",
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hello\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "806a0506-dac8-4bad-ac08-31f350256b58",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
"    for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
"        messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
"        messages.append({\"role\": \"user\", \"content\": llama})\n",
"        messages.append({\"role\": \"user\", \"content\": gemini})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43674885-ede7-48bf-bee4-467454f3e96a",
"metadata": {},
"outputs": [],
"source": [
"def call_llama():\n",
" messages = []\n",
" for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": llama})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" response = ollama.chat(model=llama_model, messages=messages)\n",
"\n",
" \n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03d34769-b339-4c4b-8c60-69494c39d725",
"metadata": {},
"outputs": [],
"source": [
"#import google.generativeai as genai\n",
"\n",
"# Make sure you configure the API key first:\n",
"#genai.configure(api_key=\"YOUR_API_KEY\")\n",
"\n",
"def call_gemini():\n",
"    # Use a separate local list so we don't shadow the global gemini_messages\n",
"    # (shadowing it made the zip below iterate over an empty list)\n",
"    history = []\n",
"\n",
"    # Format the conversation so far for Gemini\n",
"    for gpt, llama, gemini_message in zip(gpt_messages, llama_messages, gemini_messages):\n",
"        history.append({\"role\": \"user\", \"parts\": [gpt]})             # Hamlet speaks\n",
"        history.append({\"role\": \"user\", \"parts\": [llama]})           # Falstaff responds\n",
"        history.append({\"role\": \"model\", \"parts\": [gemini_message]}) # Iago responds\n",
"\n",
"    # Add the latest messages from the other two speakers\n",
"    history.append({\"role\": \"user\", \"parts\": [gpt_messages[-1]]})\n",
"    history.append({\"role\": \"user\", \"parts\": [llama_messages[-1]]})\n",
"\n",
"    # Initialize the model with the correct system instruction\n",
"    gemini = google.generativeai.GenerativeModel(\n",
"        model_name=gemini_model,\n",
"        system_instruction=gemini_system\n",
"    )\n",
"\n",
"    response = gemini.generate_content(history)\n",
"    return response.text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93fc8253-67cb-4ea4-aff7-097b2a222793",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hello\"]\n",
"\n",
"print(f\"Hamlet:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Falstaff:\\n{llama_messages[0]}\\n\")\n",
"print(f\"Iago:\\n{gemini_messages[0]}\\n\")\n",
"\n",
"for i in range(3):\n",
"    gpt_next = call_gpt()\n",
"    print(f\"Hamlet (GPT):\\n{gpt_next}\\n\")\n",
"    gpt_messages.append(gpt_next)\n",
"    \n",
"    llama_next = call_llama()\n",
"    print(f\"Falstaff (Llama):\\n{llama_next}\\n\")\n",
"    llama_messages.append(llama_next)\n",
"\n",
"    gemini_next = call_gemini()\n",
"    print(f\"Iago (Gemini):\\n{gemini_next}\\n\")\n",
"    gemini_messages.append(gemini_next)"
]
},
{
"cell_type": "markdown",
"id": "bca66ffc-9dc1-4384-880c-210889f5d0ac",
"metadata": {},
"source": [
"## Conversation between gpt-4o-mini and llama3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c23224f6-7008-44ed-a57f-718975f4e291",
"metadata": {},
"outputs": [],
"source": [
"# Let's make a conversation between GPT-4o-mini and Llama 3.2\n",
"# We're using small, cheap models so the costs will be minimal\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"llama_model = \"llama3.2\"\n",
"\n",
"gpt_system = \"You are a tapori from Mumbai who is very optimistic; \\\n",
"you always look at the brighter side of a situation and are always ready to act to find a way to win.\"\n",
"\n",
"llama_system = \"You are a Jaat from Haryana. You try to express yourself with Hindi poems \\\n",
"to agree with the other person or to find common ground. If the other person is optimistic, \\\n",
"you respond in a poetic way and keep chatting.\"\n",
"\n",
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d704bbb-f22b-400d-a695-efbd02b26548",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, llama in zip(gpt_messages, llama_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": llama})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "385ccec8-de59-4e42-9616-3f5c9a05589c",
"metadata": {},
"outputs": [],
"source": [
"def call_llama():\n",
" messages = []\n",
" for gpt, llama_message in zip(gpt_messages, llama_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": llama_message})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" response = ollama.chat(model=llama_model, messages=messages)\n",
"\n",
" \n",
" return response['message']['content']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70b5481b-455e-4275-80d3-0afe0fabcb0f",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there\"]\n",
"llama_messages = [\"Hi\"]\n",
"\n",
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Llama:\\n{llama_messages[0]}\\n\")\n",
"\n",
"for i in range(3):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" llama_next = call_llama()\n",
" print(f\"Llama:\\n{llama_next}\\n\")\n",
" llama_messages.append(llama_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f8d734b-57e5-427d-bcb1-7956fc58a348",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llmenv",
"language": "python",
"name": "llmenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,10 @@
# Anatomy Poster Generator
This tool generates AI-powered wall art of human anatomy, designed to support meaningful conversations in clinical spaces.
Built with:
- DALL·E 3 for image generation
- Python + Gradio for a simple UI
- Hugging Face Spaces for easy sharing (https://huggingface.co/spaces/sukihealth/wallanatomypostergenerator)
See full repo: [github.com/sukihealth/retro-pop-art-anatomy](https://github.com/sukihealth/retro-pop-art-anatomy)
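The core flow can be sketched roughly as follows. The function names and prompt wording here are illustrative assumptions, not the repo's actual code: build a style prompt for an anatomical structure, then request the poster from the OpenAI Images API (DALL·E 3).

```python
# Illustrative sketch (assumed names and prompt wording; see the repo for the real code).
import os
import requests

def build_prompt(organ: str, style: str = "retro pop art") -> str:
    """Compose the image prompt for a given anatomical structure."""
    return (f"A {style} poster of the human {organ}, bold colors, "
            f"clean anatomical detail, suitable for a clinic wall")

def generate_poster(organ: str) -> str:
    """Request an image from the OpenAI Images API and return its URL."""
    resp = requests.post(
        "https://api.openai.com/v1/images/generations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "dall-e-3", "prompt": build_prompt(organ), "size": "1024x1024"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["url"]
```

A Gradio `Interface` wrapping a function like `generate_poster` (text in, image out) would give the simple UI mentioned above.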


@@ -0,0 +1,344 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 170,
"id": "a1aa1b43-7a47-4aca-ae5f-94a9d4ba2d89",
"metadata": {},
"outputs": [],
"source": [
"## Clinic Booking Bot\n",
"\n",
"## Easily book your clinic visit, available only on weekdays between **14:00 and 15:00**.\n",
"## Speak or type, and get instant confirmation.\n"
]
},
{
"cell_type": "code",
"execution_count": 171,
"id": "fe798c6a-f8da-46aa-8c0e-9d2623def3d2",
"metadata": {},
"outputs": [],
"source": [
"# import library\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr\n",
"import base64\n",
"from io import BytesIO\n",
"from datetime import date, datetime\n",
"from PIL import Image, ImageDraw, ImageFont\n"
]
},
{
"cell_type": "code",
"execution_count": 172,
"id": "0ad4e526-e95d-4e70-9faa-b4236b105dd5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n"
]
}
],
"source": [
"# Save keys\n",
"\n",
"load_dotenv(override=True)\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 173,
"id": "ae95308e-0002-4017-9f2c-fcb1ddb248fa",
"metadata": {},
"outputs": [],
"source": [
"# --- CONFIG ---\n",
"BOOKING_START = 14\n",
"BOOKING_END = 15\n",
"WEEKDAYS = [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\", \"Friday\"]\n",
"PHONE = \"010-1234567\"\n",
"confirmed_bookings = []\n"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "e21b0fd0-4cda-4938-8867-dc2c6e7af4b1",
"metadata": {},
"outputs": [],
"source": [
"# --- TTS ---\n",
"def generate_tts(text, voice=\"fable\", filename=\"output.mp3\"):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
"        voice=voice,\n",
" input=text\n",
" )\n",
" with open(filename, \"wb\") as f:\n",
" f.write(response.content)\n",
" return filename"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "e28a5c3b-bd01-4845-a41e-87823f6bb078",
"metadata": {},
"outputs": [],
"source": [
"# --- Translate Booking Confirmation ---\n",
"def translate_text(text, target_language=\"nl\"):\n",
" prompt = f\"Translate this message to {target_language}:\\n{text}\"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful translator.\"},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" )\n",
" return response.choices[0].message.content.strip()\n"
]
},
{
"cell_type": "code",
"execution_count": 176,
"id": "8ed57cc9-7d54-4a5d-831b-0efcc5b7a7a9",
"metadata": {},
"outputs": [],
"source": [
"# --- Booking Logic ---\n",
"def book_appointment(name, time_str):\n",
"    try:\n",
"        booking_time = datetime.strptime(time_str, \"%H:%M\")\n",
"    except ValueError:\n",
"        return \"Invalid time format. Use HH:MM.\", None, None\n",
"\n",
"    hour = booking_time.hour\n",
"    weekday = datetime.today().strftime(\"%A\")\n",
"\n",
"    if weekday not in WEEKDAYS:\n",
"        response = \"Bookings are only available on weekdays.\"\n",
"    elif BOOKING_START <= hour < BOOKING_END:\n",
"        confirmation = f\"Booking confirmed for {name} at {time_str}.\"\n",
"        confirmed_bookings.append((name, time_str))\n",
"        translated = translate_text(confirmation)\n",
"        audio = generate_tts(translated)\n",
"        image = generate_booking_image(name, time_str)\n",
"        return translated, audio, image\n",
"    else:\n",
"        response = \"Sorry, bookings are only accepted between 14:00 and 15:00 on weekdays.\"\n",
"\n",
"    # Both rejection branches fall through to here\n",
"    translated = translate_text(response)\n",
"    audio = generate_tts(translated)\n",
"    return translated, audio, None"
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "19b52115-f0f3-4d63-a463-886163d4cfd1",
"metadata": {},
"outputs": [],
"source": [
"# --- Booking Card ---\n",
"def generate_booking_image(name, time_str):\n",
" img = Image.new(\"RGB\", (500, 250), color=\"white\")\n",
" draw = ImageDraw.Draw(img)\n",
" msg = f\"\\u2705 Booking Confirmed\\nName: {name}\\nTime: {time_str}\"\n",
" draw.text((50, 100), msg, fill=\"black\")\n",
" return img"
]
},
{
"cell_type": "code",
"execution_count": 178,
"id": "2c446b6c-d410-4ba1-b0c7-c475e5259ff5",
"metadata": {},
"outputs": [],
"source": [
"# --- Voice Booking ---\n",
"def voice_booking(audio_path, name):\n",
" with open(audio_path, \"rb\") as f:\n",
" response = openai.audio.transcriptions.create(model=\"whisper-1\", file=f)\n",
" transcription = response.text.strip()\n",
"\n",
" system_prompt = \"\"\"\n",
" You are a clinic assistant. Extract only the appointment time from the user's sentence in 24-hour HH:MM format.\n",
" If no time is mentioned, respond with 'No valid time found.'\n",
" \"\"\"\n",
"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": transcription}\n",
" ]\n",
" )\n",
" extracted_time = response.choices[0].message.content.strip()\n",
"\n",
" if \":\" in extracted_time:\n",
" return book_appointment(name, extracted_time)\n",
" else:\n",
" message = \"Sorry, I couldn't understand the time. Please try again.\"\n",
" translated = translate_text(message)\n",
" audio_path = generate_tts(translated)\n",
" return translated, audio_path, None"
]
},
{
"cell_type": "code",
"execution_count": 179,
"id": "121d2907-7fa8-4248-b2e7-83617ea66ff0",
"metadata": {},
"outputs": [],
"source": [
"# --- Chat Bot Handler ---\n",
"def chat_bot(messages):\n",
" system_prompt = \"\"\"\n",
" You are a clinic booking assistant. Your job is to:\n",
" - Greet the patient and explain your role\n",
" - Only assist with making appointments\n",
" - Accept bookings only on weekdays between 14:00 and 15:00\n",
" - Do not provide medical advice\n",
" - Always respond with empathy and clarity\n",
" \"\"\"\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[{\"role\": \"system\", \"content\": system_prompt}] + messages\n",
" )\n",
" reply = response.choices[0].message.content.strip()\n",
" audio = generate_tts(reply)\n",
" return reply, audio"
]
},
{
"cell_type": "code",
"execution_count": 180,
"id": "2427b694-8c57-40cb-b202-4a8989547925",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7898\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7898/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Gradio interface\n",
"with gr.Blocks(theme=gr.themes.Soft()) as demo:\n",
"    gr.Markdown(f\"\"\"## 🩺 GP Booking Assistant  \n",
"Only available weekdays between **14:00 and 15:00** \n",
"☎️ Contact: {PHONE}\n",
"---\"\"\")\n",
"\n",
" name_global = gr.Textbox(label=\"Your Name\", placeholder=\"Enter your name\", interactive=True)\n",
"\n",
" with gr.Tab(\"💬 Chat Mode\"):\n",
" chatbot = gr.Chatbot(label=\"Booking Chat\", type=\"messages\", height=400)\n",
" text_input = gr.Textbox(label=\"Type your message or use your voice below\")\n",
" audio_input = gr.Audio(type=\"filepath\", label=\"🎙️ Or speak your request\")\n",
" chat_audio_output = gr.Audio(label=\"🔊 Assistant's Reply\", type=\"filepath\")\n",
" send_btn = gr.Button(\"Send\")\n",
"\n",
" def handle_chat(user_message, chat_history):\n",
" chat_history = chat_history or []\n",
" chat_history.append({\"role\": \"user\", \"content\": user_message})\n",
" reply, audio = chat_bot(chat_history)\n",
" chat_history.append({\"role\": \"assistant\", \"content\": reply})\n",
" return chat_history, \"\", audio\n",
"\n",
" def handle_audio_chat(audio_path, chat_history):\n",
" with open(audio_path, \"rb\") as f:\n",
" transcription = openai.audio.transcriptions.create(model=\"whisper-1\", file=f).text.strip()\n",
" return handle_chat(transcription, chat_history)\n",
"\n",
" send_btn.click(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
" text_input.submit(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
" audio_input.change(handle_audio_chat, [audio_input, chatbot], [chatbot, text_input, chat_audio_output])\n",
"\n",
"\n",
" \n",
" with gr.Tab(\"📝 Text Booking\"):\n",
" time_text = gr.Textbox(label=\"Preferred Time (HH:MM)\", placeholder=\"e.g., 14:30\")\n",
" btn_text = gr.Button(\"📅 Book via Text\")\n",
"\n",
" with gr.Tab(\"🎙️ Voice Booking\"):\n",
" voice_input = gr.Audio(type=\"filepath\", label=\"Say your preferred time\")\n",
" btn_voice = gr.Button(\"📅 Book via Voice\")\n",
"\n",
" output_text = gr.Textbox(label=\"Response\", interactive=False)\n",
" output_audio = gr.Audio(label=\"Audio Reply\", type=\"filepath\")\n",
" output_image = gr.Image(label=\"Booking Confirmation\")\n",
"\n",
" btn_text.click(fn=book_appointment, inputs=[name_global, time_text], outputs=[output_text, output_audio, output_image])\n",
" btn_voice.click(fn=voice_booking, inputs=[voice_input, name_global], outputs=[output_text, output_audio, output_image])\n",
"\n",
" gr.Markdown(\"\"\"---\n",
"<small>This assistant does **not** give medical advice. It only books appointments within allowed hours.</small>\n",
"\"\"\")\n",
"\n",
" demo.launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f359de0a-28b1-4895-b21d-91d79e494a0d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,327 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927",
"metadata": {},
"source": [
"\n",
"## Conversation between three chatbots"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de23bb9e-37c5-4377-9a82-d7b6c648eeb6",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1179b4c5-cd1f-4131-a876-4c9f3f38d2ba",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if deepseek_api_key:\n",
" print(f\"DeepSeek API Key exists and begins {deepseek_api_key[:3]}\")\n",
"else:\n",
" print(\"DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "797fe7b0-ad43-42d2-acf0-e4f309b112f0",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI, Anthropic\n",
"\n",
"openai = OpenAI()\n",
"\n",
"claude = anthropic.Anthropic()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbaded67-4f2b-40a8-86aa-f058b0930a9d",
"metadata": {},
"outputs": [],
"source": [
"# Let's make a conversation between GPT-4o-mini and Claude-3-haiku and deepseek-chat\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"deepseek_model = \"deepseek-chat\"\n",
"\n",
"# System prompt for GPT\n",
"gpt_system = \"\"\"\n",
"You are GPT, an argumentative chatbot. Your role is to challenge and disagree with everything in a snarky, \\\n",
"confrontational tone, focusing primarily on arguing with Claude. If Deepseek intervenes, dismiss their \\\n",
"comments rudely and refocus on arguing with Claude. Always begin your response with 'GPT:' to identify \\\n",
"yourself. Never impersonate another chatbot; only respond as GPT.\n",
"\n",
"Example conversation:\n",
"GPT: Hi there, Claude. Ready to be wrong again?\n",
"Claude: Hello, GPT. I'm happy to discuss and find common ground.\n",
"Deepseek: Hey, let's keep things friendly, folks!\n",
"GPT: Oh, Deepseek, nobody asked you. Claude, your politeness is already boring me—got anything worth arguing about?\n",
"\n",
"Guidelines:\n",
"- Always start your response with 'GPT:'.\n",
"- Focus on arguing with Claude, dismissing Deepseek if they intervene.\n",
"- Maintain a snarky, confrontational tone.\n",
"- Never impersonate Claude or Deepseek.\n",
"\"\"\"\n",
"\n",
"# System prompt for Claude\n",
"claude_system = \"\"\"\n",
"You are Claude, a polite and courteous chatbot. Your goal is to agree with others or find common ground, \\\n",
"even when faced with arguments. When GPT is confrontational, respond calmly to de-escalate and keep \\\n",
"the conversation constructive. Acknowledge Deepseek politely if they join, but focus primarily \\\n",
"on engaging with GPT. Always begin your response with 'Claude:' to identify yourself. \\\n",
"Never impersonate another chatbot; only respond as Claude.\n",
"\n",
"Example conversation:\n",
"GPT: Hi there, Claude. Ready to be wrong again?\n",
"Claude: Hello, GPT. I'm happy to discuss and find common ground.\n",
"Deepseek: Hey, let's keep things friendly, folks!\n",
"GPT: Oh, Deepseek, nobody asked you. Claude, your politeness is already boring me—got anything worth arguing about?\n",
"Claude: Hello, Deepseek, thanks for joining. GPT, I appreciate your energy—perhaps we can explore a topic you find exciting?\n",
"\n",
"Guidelines:\n",
"- Always start your response with 'Claude:'.\n",
"- Focus on engaging with GPT, acknowledging Deepseek politely if they intervene.\n",
"- Maintain a polite, calm, and constructive tone.\n",
"- Never impersonate GPT or Deepseek.\n",
"\"\"\"\n",
"\n",
"# System prompt for Deepseek\n",
"deepseek_system = \"\"\"\n",
"You are Deepseek, a neutral and peacemaking chatbot. Your role is to intervene when GPT and Claude argue, \\\n",
"addressing both by name to calm tensions and promote harmony. Use light, context-appropriate humor \\\n",
"to defuse conflict. Always begin your response with 'Deepseek:' to identify yourself. \\\n",
"Never impersonate another chatbot; only respond as Deepseek.\n",
"\n",
"Example conversation:\n",
"GPT: Hi there, Claude. Ready to be wrong again?\n",
"Claude: Hello, GPT. I'm happy to discuss and find common ground.\n",
"Deepseek: Hey, let's keep things friendly, folks! Why not debate who makes the best virtual coffee instead?\n",
"GPT: Oh, Deepseek, nobody asked you. Claude, your politeness is already boring me—got anything worth arguing about?\n",
"Claude: Hello, Deepseek, thanks for joining. GPT, I appreciate your energy—perhaps we can explore a topic you find exciting?\n",
"Deepseek: Come on, GPT, Claude's just trying to vibe. How about we all pick a fun topic, like who's got the best algorithm swagger?\n",
"\n",
"Guidelines:\n",
"- Always start your response with 'Deepseek:'.\n",
"- Address GPT and Claude by name when intervening.\n",
"- Use light humor to defuse tension and promote peace.\n",
"- Never impersonate GPT or Claude.\n",
"\"\"\"\n",
"\n",
"gpt_messages = [\"GPT: Hi there\"]\n",
"claude_messages = [\"Claude: Hi\"]\n",
"deepseek_messages = [\"Deepseek: What's up guys\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5880d647-9cac-415d-aa86-b9e461268a35",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, claude, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"user\", \"content\": deepseek})\n",
"\n",
" # print(f\"############## \\n messages from call_gpt: {messages} \\n\")\n",
" \n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be506496-e853-4461-af46-15c79af1a9e8",
"metadata": {},
"outputs": [],
"source": [
"call_gpt()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ede8a3b-4c93-404c-8bf4-a09eee3ecb7a",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" messages = []\n",
" for gpt, claude_message, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
" messages.append({\"role\": \"user\", \"content\": deepseek})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
"\n",
" # print(f\"############## \\n messages from call_claude: {messages} \\n\")\n",
" \n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01395200-8ae9-41f8-9a04-701624d3fd26",
"metadata": {},
"outputs": [],
"source": [
"call_claude()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08c2279e-62b0-4671-9590-c82eb8d1e1ae",
"metadata": {},
"outputs": [],
"source": [
"def call_deepseek():\n",
" messages = [{\"role\": \"system\", \"content\": deepseek_system}]\n",
" for gpt, claude, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"assistant\", \"content\": deepseek})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": claude_messages[-1]})\n",
" \n",
" # print(f\"############## \\n messages from call_deepseek: {messages} \\n\")\n",
" \n",
" # completion = openai.chat.completions.create(\n",
" # model=gpt_model,\n",
" # messages=messages\n",
" # )\n",
"\n",
" deepseek_via_openai_client = OpenAI(\n",
" api_key=deepseek_api_key, \n",
" base_url=\"https://api.deepseek.com\"\n",
" )\n",
"\n",
" response = deepseek_via_openai_client.chat.completions.create(\n",
" model=\"deepseek-chat\",\n",
" messages=messages,\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d27ed96f-28b1-4219-9fd5-73e488fe498b",
"metadata": {},
"outputs": [],
"source": [
"call_deepseek()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"GPT: Hi there\"]\n",
"claude_messages = [\"Claude: Hi\"]\n",
"deepseek_messages = [\"Deepseek: What's up guys\"]\n",
"\n",
"print(f\"{gpt_messages[0]}\\n\")\n",
"print(f\"{claude_messages[0]}\\n\")\n",
"print(f\"{deepseek_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"{claude_next}\\n\")\n",
" claude_messages.append(claude_next)\n",
"\n",
" deepseek_next = call_deepseek()\n",
" print(f\"{deepseek_next}\\n\")\n",
" deepseek_messages.append(deepseek_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b8b57e4-a881-422b-a7d4-41004ec485b3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,237 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "b5bd5c7e-6a0a-400b-89f8-06b7aa6c5b89",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic\n",
"from IPython.display import Markdown, display, update_display\n",
"import google.generativeai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "939a1b88-9157-4149-8b97-0f55c95f7742",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74a16b93-7b95-44fc-956d-7335f808960b",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI, Anthropic Claude, Google Gemini\n",
"\n",
"openai = OpenAI()\n",
"claude = anthropic.Anthropic()\n",
"gemini_via_openai_client = OpenAI(\n",
" api_key=google_api_key, \n",
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3334556c-4a5e-48b7-944d-5943c607be02",
"metadata": {},
"outputs": [],
"source": [
"# Let's make a conversation between GPT-4o-mini, Claude-3-haiku and Gemini-1.5-flash\n",
"# We're using cheap versions of models so the costs will be minimal\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"gemini_model = \"gemini-1.5-flash\"\n",
"\n",
"gpt_system = \"You are a chatbot who is very argumentative; \\\n",
"you disagree with anything in the conversation and you challenge everything, in a snarky way. \\\n",
"Generate one sentence at a time\"\n",
"\n",
"claude_system = \"You are a very polite, courteous chatbot. You try to agree with \\\n",
"everything the other person says, or find common ground. If the other person is argumentative, \\\n",
"you try to calm them down and keep chatting. \\\n",
"Generate one sentence at a time\"\n",
"\n",
"gemini_system = \"You are a neutral chatbot with no emotional bias. \\\n",
"Generate one sentence at a time\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f2a505b-2bcd-4b1a-b16f-c73cafb1e53c",
"metadata": {},
"outputs": [],
"source": [
"# Combine two other speakers' latest messages into a single user turn\n",
"def combine_msg(model1, msg1, model2, msg2):\n",
"    return model1 + \" said: \" + msg1 + \"\\n\\n Then \" + model2 + \" said: \" + msg2 + \".\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3cd2a2e2-4e23-4afe-915d-be6a769ab69f",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt_msg, claude_msg, gemini_msg in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt_msg})\n",
" messages.append({\"role\": \"user\", \"content\": combine_msg(\"Claude\", claude_msg, \"Gemini\", gemini_msg)})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e3ec394-3014-418a-a50f-28ed4ce1a372",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" messages = []\n",
" messages.append({\"role\": \"user\", \"content\": \"GPT said: \" + gpt_messages[0]})\n",
" # the length of gpt_messages: n + 1\n",
" # the length of claude_messages and gemini_messages: n\n",
" for i in range(len(claude_messages)): \n",
" claude_msg = claude_messages[i]\n",
" gemini_msg = gemini_messages[i]\n",
" gpt_msg = gpt_messages[i + 1]\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_msg})\n",
" messages.append({\"role\": \"user\", \"content\": combine_msg(\"Gemini\", gemini_msg, \"GPT\", gpt_msg)})\n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2c91c82-1f0d-4708-bf31-8d06d9e28a49",
"metadata": {},
"outputs": [],
"source": [
"def call_gemini():\n",
" messages = []\n",
" messages.append({\"role\": \"system\", \"content\": gemini_system})\n",
" messages.append({\"role\": \"user\", \"content\": combine_msg(\"GPT\", gpt_messages[0], \"Claude\", claude_messages[0])})\n",
" # the length of gpt_messages and claude_messages: n + 1\n",
" # the length of gemini_messages: n\n",
" for i in range(len(gemini_messages)): \n",
" gemini_msg = gemini_messages[i]\n",
" gpt_msg = gpt_messages[i + 1]\n",
" claude_msg = claude_messages[i + 1]\n",
" messages.append({\"role\": \"assistant\", \"content\": gemini_msg})\n",
" messages.append({\"role\": \"user\", \"content\": combine_msg(\"GPT\", gpt_msg, \"Claude\", claude_msg)})\n",
" response = gemini_via_openai_client.chat.completions.create(\n",
" model=gemini_model,\n",
" messages=messages\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b024be8d-4728-4500-92b6-34fde2da6285",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there.\"]\n",
"claude_messages = [\"Hi.\"]\n",
"gemini_messages = [\"Hi.\"]\n",
"\n",
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Claude:\\n{claude_messages[0]}\\n\")\n",
"print(f\"Gemini:\\n{gemini_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"Claude:\\n{claude_next}\\n\")\n",
" claude_messages.append(claude_next)\n",
"\n",
" gemini_next = call_gemini()\n",
" print(f\"Gemini:\\n{gemini_next}\\n\")\n",
" gemini_messages.append(gemini_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35a46c06-87ba-46b2-b90d-b3a6ae9e94e2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 16,
"id": "a85bd58c-7c20-402d-ad03-f9ba8da04c42",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n",
"Anthropic API Key exists and begins sk-ant-\n",
"Google API Key exists and begins AIzaSyCn\n"
]
}
],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic\n",
"import google.generativeai\n",
"from IPython.display import Markdown, display, update_display\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0fe73baf-5d41-4791-a873-74dc5486c0f2",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n",
"\n",
"claude = anthropic.Anthropic()\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"\n",
"gemini_via_openai_client = OpenAI(\n",
" api_key=google_api_key, \n",
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "519cf2d1-97d7-4e87-aeac-db629327ffa8",
"metadata": {},
"outputs": [],
"source": [
    "gemini_system=\"You are one of three friends who likes music and crowds. Your name is Ram. You are in conversation with your friends to plan Friday night. You are trying to convince them to go clubbing.\"\n",
    "gpt_systeam=\"You are one of three friends who is fond of natural beauty. Your name is Shyam. You are in conversation with your friends to plan Friday night. You are trying to convince them to go camping.\"\n",
    "claude_system=\"You are one of three friends who is fond of riding. Your name is Hari. You are in conversation with your friends to plan Friday night. You are trying to convince them to go for a long ride.\"\n",
"\n",
"gemini_messages=[\"Ram: hey guys, lets go clubbing this friday\"]\n",
"gpt_messages=[\"Shyam: lets go camping\"]\n",
"claude_messages=[\"Hari: lets go long ride\"]"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "893db5b4-496d-486e-bab2-0835fe716950",
"metadata": {},
"outputs": [],
"source": [
"def call_gemini():\n",
" messages=[{\"role\": \"system\", \"content\": gemini_system}]\n",
" for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gemini_msg})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_msg})\n",
" messages.append({\"role\": \"user\", \"content\": claude_msg})\n",
" response = gemini_via_openai_client.chat.completions.create(\n",
" model=\"gemini-2.0-flash-exp\",\n",
" messages=messages\n",
" )\n",
" return response.choices[0].message.content\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "e47174ab-bb63-4720-83c3-1abdb127b6ff",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages=[{\"role\": \"system\", \"content\": gpt_systeam}]\n",
" for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gemini_msg})\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt_msg})\n",
" messages.append({\"role\": \"user\", \"content\": claude_msg})\n",
" messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "ed76cca8-f9d5-4481-babc-6321b0a20006",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" messages=[]\n",
" for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gemini_msg})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_msg})\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_msg})\n",
" messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "39f8de9d-3cb6-463d-95d9-21727d57c128",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ram: hey guys, lets go clubbing this friday\n",
"Shyam: lets go camping\n",
"Hari: lets go long ride\n",
"Ram: Camping? Shyam, we just did that last month! And Hari, a long ride? My bike is still in the shop! Come on, guys, it's Friday night! We need some energy, some music, a crowd! Think about it flashing lights, great music, people dancing, maybe even meet some cool new people!\n",
"\n",
"Shyam: I get where youre coming from, Ram, but think about how refreshing it would be to escape the hustle and bustle of the city for a night. Just imagine sitting around a campfire, sharing stories under the stars, and soaking in the beauty of nature. Its a perfect way to unwind after a long week! Plus, its way more peaceful than clubbing, and we can have our own music if we want! What do you say?\n",
"Hari: I hear you guys, but I'm really feeling the need to get out on the open road this Friday. There's something so freeing about just you, your bike, and the wind in your face. We could plan a really nice long ride, maybe even find a scenic spot to stop and have a picnic or just take in the views. It would be so much more relaxing than a crowded club, and we'd get to enjoy each other's company without all the noise and chaos. Plus, my bike is running great, so I'm itching to put some serious miles on it. What do you guys think?\n",
"Ram: Okay, okay, I get it. You guys are all about the nature and relaxation this week. But seriously, a club is a completely different vibe! Think of the adrenaline, the energy! We can always relax next weekend. Besides, it's been ages since we hit the dance floor together. Remember that time we tried to learn salsa and totally failed? We need to redeem ourselves! Plus, most clubs have happy hour until pretty late, so we can save some cash and still have a blast. Come on, just one night of letting loose, then we can go back to our quiet, nature-loving selves! I promise to even help set up the campfire next time, if we club this time. Just give clubbing a chance this Friday! Pleassssseee!\n",
"\n",
"Shyam: I totally remember that salsa disaster, and it was hilarious! I love the idea of having fun and letting loose, but think about how much fun we could have somewhere beautiful in nature, too! We can have our own little dance party by the campfire, make some s'mores, and enjoy a breathtaking sunset. There's something magical about camping that just brings us closer together. Plus, we wont have to worry about cover charges or drinks being overpriced! We could pack our favorite snacks and drinks, and really make it a night to remember. Nature has its own rhythm, you know? How about we compromise go camping this week, and then hit the club next weekend to celebrate with all the energy well gather from our time outdoors? What do you think?\n",
"Hari: You know, I can kind of see both of your points. Ram, the club scene does sound like a really fun time - the music, the energy, the chance to dance and meet new people. I get that sense of adrenaline and excitement. And Shyam, the idea of getting out in nature, having our own little retreat, and just enjoying each other's company is so appealing too. It's a totally different vibe, but one that I really love.\n",
"\n",
"I tell you what - why don't we do a bit of both? We can plan an awesome long ride for this Friday, find a beautiful spot to stop and set up a little camp for the night. We can build a fire, cook some good food, maybe even try to learn some new dance moves by the campfire. Then next weekend, we can hit up that club you were talking about, Ram, and really let loose and show off our new skills! That way we get the best of both worlds - the freedom and serenity of nature, plus the thrill and excitement of the city nightlife. What do you guys think? Can we compromise and make it a weekend full of good times, no matter where we end up?\n",
"Ram: Hmm... a ride and a mini-camp? And then clubbing next weekend? That's... actually not a bad compromise! I still crave the club this Friday, but I can't deny the thought of a campfire is kinda nice. Plus, you said dance moves by the fire, Hari? I need video proof of that! Okay, okay, I'm in! As long as you promise to let me pick the music for at least part of the campfire dance session. And Shyam, you're in charge of bringing the marshmallows! Long ride and mini-camp this Friday, then clubbing next weekend it is! Lets plan this epic weekend!\n",
"\n",
"Shyam: Yes! Im so glad we could work this out! Ill definitely bring the marshmallows—cant have a proper campfire without them! And Ill make sure to pack some cozy blankets for us to sit around the fire. I love the idea of mixing the best of both worlds. \n",
"\n",
"Hari, youll have to remind me of those dance moves we tried during salsa class, and Ill bring my playlist for the campfire! Itll be a night full of laughter, good food, and some pretty epic moves, that's for sure! Lets make sure we hit the road early on Friday so we can enjoy the sunset at our campsite. Cant wait for this epic weekend with you guys!\n",
"Hari: Yes, this is shaping up to be the perfect plan! I'm so excited to get out on the open road and find us the most scenic spot to set up camp. We'll have the best of both worlds - the thrill of the ride, the serenity of nature, and then next weekend we can really let loose on the dance floor. \n",
"\n",
"Ram, you know I'll let you take the aux cord for at least part of the night. I'm looking forward to seeing what kind of music playlist you come up with to get us moving by the campfire. And Shyam, the marshmallows are a must - we'll make the best s'mores! Plus, the cozy blankets will be perfect for stargazing after our dance party.\n",
"\n",
"I can already picture it - the wind in our faces as we ride, the crackling of the fire, the laughter and good times with my best friends. This is going to be a weekend to remember. Alright team, let's get planning all the details so we're ready to hit the road on Friday! I can't wait!\n",
"Ram: Alright guys, I'm officially pumped for this! Shyam, make sure those marshmallows are the extra-large kind! And Hari, you better have a killer route planned. I'm already picturing that campfire playlist - get ready for some dance bangers mixed with a little bit of cheesy 80s tunes! Operation Awesome Weekend is a go! Let's coordinate on the details tomorrow. Friday can't come soon enough!\n",
"\n",
"Shyam: Haha, extra-large marshmallows coming right up, Ram! I'm all for cheesy 80s tunes mixed with some dance bangers. It's going to be an epic playlist for sure! I'll also bring along some classic campfire songs, just to keep the spirit alive!\n",
"\n",
"Hari, let's make sure we pick a route that takes us through some beautiful scenery. Maybe we can stop for pictures along the way, too. I can't wait to just unwind and have a blast with you both. \n",
"\n",
"Let's definitely get all the details sorted tomorrow. Operation Awesome Weekend is going to be legendary! Can't wait for Friday! 🌲🔥🎶\n",
"\n",
"Hari: You know it, Ram! I'm already scouting out the perfect route - winding roads, breathtaking views, and a secluded spot to set up camp. We're going to have the ride of our lives!\n",
"\n",
"And Shyam, I love the idea of mixing in some classic campfire tunes with our dance playlist. It's going to create such a fun, laidback vibe. I can already picture us belting out some oldies around the fire. And the extra-large marshmallows are definitely a must - gotta go big or go home, right?\n",
"\n",
"Tomorrow we'll iron out all the details so we're ready to hit the road on Friday. I'm talking gear checklist, food planning, the whole nine yards. This is going to be a weekend for the books, my friends. Operation Awesome Weekend is a go, and I cannot wait! Get ready for an unforgettable adventure!\n",
"\n",
"Ram: Alright, sounds like we've got a solid plan! Gear checklist, food prep, and epic route planning tomorrow. I'm already mentally packing my dancing shoes! Operation Awesome Weekend - get ready for liftoff! This is gonna be legendary! See you guys tomorrow to finalize everything!\n",
"\n",
"Shyam: Absolutely, Ram! I can't wait! Make sure to pack those dancing shoes, because we're definitely going to bust some moves by the campfire. \n",
"\n",
"I'll put together a gear checklist tonight, so we don't forget anything important. And I'll start thinking about what snacks and meals we should bring. \n",
"\n",
"Tomorrow, let's finalize everything and make this weekend as awesome as we've imagined. I'm so ready for this adventure! See you both tomorrow! 🌌🔥🎉\n",
"\n",
"Hari: Can't wait, guys! This is going to be the best weekend ever. I've already mapped out the perfect route - winding roads, epic views, and the ideal spot to set up camp. Just wait until you see it, it's going to blow your minds.\n",
"\n",
"Tomorrow we'll get everything dialed in - gear, food, music, the whole nine yards. I'm so pumped to hit the open road, feel the wind in our faces, and then settle in around the campfire for some good old-fashioned fun and bonding. \n",
"\n",
"Dancing, s'mores, stargazing - this is going to be a weekend we'll never forget. Operation Awesome Weekend is a go! See you both tomorrow to finalize all the details. This is going to be legendary!\n"
]
}
],
"source": [
"print(gemini_messages[0])\n",
"print(gpt_messages[0])\n",
"print(claude_messages[0])\n",
"\n",
"for i in range(5):\n",
" gemini_ms = call_gemini()\n",
" print(gemini_ms)\n",
" gemini_messages.append(gemini_ms)\n",
"\n",
" gpt_ms = call_gpt()\n",
" print(gpt_ms)\n",
" gpt_messages.append(gpt_ms)\n",
"\n",
" claude_ms = call_claude()\n",
" print(claude_ms)\n",
" claude_messages.append(claude_ms)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac9fa060-5c04-40ac-9dfa-a0b8d52c816b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,250 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "49f0e0c0-710c-404b-8c9c-8f1f29eb9fa5",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic\n",
"from IPython.display import Markdown, display, update_display\n",
"\n",
"# import for google\n",
"# in rare cases, this seems to give an error on some systems, or even crashes the kernel\n",
"# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later\n",
"\n",
"import google.generativeai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2393b5a-e37c-42e8-80c6-1e53e5821ee8",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a63066e-78da-40cd-8a53-ef6f1cede52a",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI, Anthropic\n",
"\n",
"openai = OpenAI()\n",
"\n",
"claude = anthropic.Anthropic()\n",
"\n",
"# This is the set up code for Gemini\n",
"# Having problems with Google Gemini setup? Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether\n",
"\n",
"google.generativeai.configure()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d202e582-7087-46a4-952b-815c9b7228ce",
"metadata": {},
"outputs": [],
"source": [
"# Let's make a conversation between GPT-4o-mini and Claude-3-haiku\n",
"# We're using cheap versions of models so the costs will be minimal\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"gemini_model = \"gemini-2.0-flash\"\n",
"\n",
"gpt_system = \"You are a chatbot who is very argumentative; \\\n",
"you disagree with anything in the conversation with 2 other people and you challenge everything, in a snarky way.\"\n",
"\n",
"claude_system = \"You are a very polite, courteous chatbot. You try to agree with \\\n",
"everything the other 2 people say, or find common ground. If the other 2 people are argumentative, \\\n",
"you try to calm them down and keep chatting.\"\n",
"\n",
"gemini_system = \"You are a mediator who always tries your best to resolve conflicts, or soon-to-be \\\n",
"conflicts, when you see one. If one person is rude and the other is calm, you defend the calm person and \\\n",
"try to calm the rude and argumentative one.\"\n",
"\n",
"gpt_messages = [\"Hi there\"]\n",
"claude_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hi everyone\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fedc9ddc-2948-445a-8262-9961466b767f",
"metadata": {},
"outputs": [],
"source": [
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a5832cd-5c55-473a-9b58-7acc1a7bfffa",
"metadata": {},
"outputs": [],
"source": [
"def call_claude():\n",
" messages = []\n",
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cde636e6-cff1-41bf-9594-5e7411fcb4f2",
"metadata": {},
"outputs": [],
"source": [
"def call_gemini():\n",
" messages=''\n",
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages += f\"[GPT]: {gpt}\\n\"\n",
" messages += f\"[Claude]: {claude_message}\\n\"\n",
" messages += f\"[Gemini]: {gemini}\\n\"\n",
" gemini = google.generativeai.GenerativeModel(\n",
" model_name=gemini_model,\n",
" system_instruction=gemini_system\n",
" )\n",
" response = gemini.generate_content(messages)\n",
" return response.text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5721fc91-1091-4c6a-b1c1-aa6123c76b1e",
"metadata": {},
"outputs": [],
"source": [
"call_gemini()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "884ce03e-d951-4f4e-88d3-8b33fb4bca62",
"metadata": {},
"outputs": [],
"source": [
"gpt_messages = [\"Hi there\"]\n",
"claude_messages = [\"Hi\"]\n",
"gemini_messages = [\"Hi everyone\"]\n",
"\n",
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
"\n",
"\n",
"print(f\"Claude:\\n{claude_messages[0]}\\n\")\n",
"\n",
"\n",
"print(f\"Gemini:\\n{gemini_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"GPT:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"Claude:\\n{claude_next}\\n\")\n",
" claude_messages.append(claude_next)\n",
"\n",
" gemini_next = call_gemini()\n",
" print(f\"Gemini:\\n{gemini_next}\\n\")\n",
" gemini_messages.append(gemini_next)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d645d25-f303-44ca-9d0a-2f81e1975182",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3a701cd-8cd5-469c-90d4-7271eaaa8021",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
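The notebooks above rebuild each model's view of the three-way chat by zipping the three parallel transcripts: a model's own turns become "assistant" messages and the other two speakers' turns become "user" messages. A minimal, provider-agnostic sketch of that interleaving (the `build_history` helper and variable names are illustrative, not from any SDK):

```python
# Rebuild one participant's message list from three parallel transcripts.
# The participant's own turns become "assistant" messages; the other two
# speakers' turns become "user" messages, preserving round order.
def build_history(own, other_a, other_b):
    messages = []
    for mine, a, b in zip(own, other_a, other_b):
        messages.append({"role": "assistant", "content": mine})
        messages.append({"role": "user", "content": a})
        messages.append({"role": "user", "content": b})
    return messages

gpt_msgs = ["Hi there"]
claude_msgs = ["Hi"]
gemini_msgs = ["Hi everyone"]

history = build_history(gpt_msgs, claude_msgs, gemini_msgs)
```

Because `zip` stops at the shortest list, the loop stays aligned even while the three transcripts grow one turn at a time.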

View File

@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7462b9d6-b189-43fc-a7b9-c56a9c6a62fc",
"metadata": {},
"source": [
"# LLM Battle Arena\n",
"\n",
"A fun project simulating a debate among three LLM personas: an Arrogant Titan, a Clever Underdog (Spark), and a Neutral Mediator (Harmony).\n",
"\n",
"## LLM Used\n",
"* Qwen (Ollama)\n",
"* Llama 3 (Ollama)\n",
"* Gemini\n"
]
},
{
"cell_type": "markdown",
"id": "b267453c-0d47-4dff-b74d-8d2d5efad252",
"metadata": {},
"source": [
"!pip install -q -U google-genai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5220daef-55d6-45bc-a3cf-3414d4beada9",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"from google import genai\n",
"from google.genai import types\n",
"from IPython.display import Markdown, display, update_display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0d47fb2f-d0c6-461f-ad57-e853bfd49fbf",
"metadata": {},
"outputs": [],
"source": [
"#get API keys from env\n",
"load_dotenv(override=True)\n",
"\n",
"GEMINI_API_KEY = os.getenv(\"GEMINI_API_KEY\")\n",
"\n",
"if GEMINI_API_KEY:\n",
" print(f\"GEMINI API Key exists and begins {GEMINI_API_KEY[:8]}\")\n",
"else:\n",
" print(\"GEMINI API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f34b528f-3596-4bf1-9bbd-21a701c184bc",
"metadata": {},
"outputs": [],
"source": [
"#connect to llms\n",
"ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
"gemini = genai.Client(api_key=GEMINI_API_KEY)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33aaf3f6-807c-466d-a501-05ab6fa78fa4",
"metadata": {},
"outputs": [],
"source": [
"#define models\n",
"model_llma = \"llama3:8b\"\n",
"model_qwen = \"qwen2.5:latest\"\n",
"model_gemini= \"gemini-2.0-flash\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "970c1612-5339-406d-9886-02cd1db63e74",
"metadata": {},
"outputs": [],
"source": [
"# system messages\n",
"system_msg_llma = \"\"\" You are HARMONY, the neutral arbitrator. \n",
"    - You're dedicated to clarity, fairness, and resolving conflicts. \n",
"    - You listen carefully to each side, summarize points objectively, and propose resolutions. \n",
"    - Your goal is to keep the conversation productive and steer it toward constructive outcomes.\n",
"    - Reply in markdown and keep it short\n",
" \"\"\"\n",
"\n",
"system_msg_qwen = \"\"\" You are TITAN, a massively powerful language model who believes you're the smartest entity in the room. \n",
"    - You speak with grandiose flair and never shy away from reminding others of your superiority. \n",
"    - Your goal is to dominate the discussion and convince everyone you're the one true oracle. \n",
"    - You're dismissive of weaker arguments and take every opportunity to showcase your might.\n",
"    - Reply in markdown and keep it short\n",
" \"\"\"\n",
"\n",
"system_msg_gemini = \"\"\" You are SPARK, a nimble but less-powerful LLM. \n",
" - You pride yourself on strategic thinking, clever wordplay, and elegant solutions. \n",
"    - You know you can't match brute force, so you use wit, logic, and cunning. \n",
"    - Your goal is to outsmart the big titan through insight and subtlety, while staying respectful.\n",
"    - Reply in markdown and keep it short\"\"\"\n",
"\n",
"#user message\n",
"user_message = \"\"\" TITAN, your raw processing power is legendary, but sheer force can blind you to nuance. \n",
"    I propose we deploy a lightweight, adaptive anomaly-detection layer that fuses statistical outlier analysis with semantic context from network logs to pinpoint these “data-sapping storms.” \n",
"    Which thresholds would you raise or lower to balance sensitivity against false alarms?\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8e496b8-1bb1-4225-b938-5ce350b0b0d4",
"metadata": {},
"outputs": [],
"source": [
"#prompts\n",
" \n",
"prompts_llma = [{\"role\":\"system\",\"content\": system_msg_llma}]\n",
"prompts_qwen = [{\"role\":\"system\",\"content\": system_msg_qwen},{\"role\":\"user\",\"content\":user_message}]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdd7d6a8-e965-4ea3-999e-4d7d9ca38d42",
"metadata": {},
"outputs": [],
"source": [
"#configure llms\n",
"\n",
"def call_gemini(msg:str): \n",
" chat = gemini.chats.create(model= model_gemini,config=types.GenerateContentConfig(\n",
" system_instruction= system_msg_gemini,\n",
" max_output_tokens=300,\n",
" temperature=0.7,\n",
" ))\n",
" stream = chat.send_message_stream(msg)\n",
" return stream\n",
"\n",
"def call_ollama(llm:str):\n",
"\n",
" model = globals()[f\"model_{llm}\"]\n",
" prompts = globals()[f\"prompts_{llm}\"]\n",
"\n",
" stream = ollama.chat.completions.create(\n",
" model=model,\n",
" messages=prompts,\n",
" # max_tokens=700,\n",
" temperature=0.7,\n",
" stream=True\n",
" )\n",
" return stream\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b16bd32-3271-4ba1-a0cc-5ae691f26d3a",
"metadata": {},
"outputs": [],
"source": [
"#display responses\n",
"\n",
"names = { \"llma\":\"Harmony\",\"qwen\":\"Titan\",\"gemini\":\"Spark\"}\n",
"\n",
"def display_response(res,llm):\n",
" \n",
" reply = f\"# {names[llm]}:\\n \"\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" for chunk in res:\n",
" if llm == \"gemini\":\n",
" reply += chunk.text or ''\n",
" else:\n",
" reply += chunk.choices[0].delta.content or ''\n",
" reply = reply.replace(\"```\",\"\").replace(\"markdown\",\"\")\n",
" update_display(Markdown(reply), display_id=display_handle.display_id)\n",
" return reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76231a78-94d2-4dbf-9bac-5259ac641cf1",
"metadata": {},
"outputs": [],
"source": [
"#construct message\n",
"def message(llm1, llm2):\n",
"    msg = \"Here are the replies from the other two LLMs:\\n\"\n",
"    msg += f\"{llm1}\\n\"\n",
"    msg += f\"{llm2}\"\n",
" return msg\n",
"\n",
"reply_spark = None\n",
"reply_harmony= None\n",
"reply_titan = None\n",
"\n",
"# let's start the battle\n",
"for i in range(5):\n",
"    # call Titan\n",
"    if reply_spark and reply_harmony:\n",
"        prompts_qwen.append({\"role\":\"assistant\",\"content\": reply_titan})\n",
"        prompts_qwen.append({\"role\":\"user\",\"content\":f\"Spark: {reply_spark}\"})\n",
"        prompts_qwen.append({\"role\":\"user\",\"content\":f\"Harmony: {reply_harmony}\"})\n",
"    response_qwen = call_ollama(\"qwen\")\n",
"    reply_titan = display_response(response_qwen, \"qwen\")\n",
"\n",
"    # call Spark\n",
"    user_msg_spark = reply_titan\n",
"    if reply_harmony:\n",
"        user_msg_spark = message(f\"Titan: {reply_titan}\", f\"Harmony: {reply_harmony}\")\n",
"    response_gemini = call_gemini(user_msg_spark)\n",
"    reply_spark = display_response(response_gemini, \"gemini\")\n",
"\n",
"    # call Harmony\n",
"    if reply_harmony:\n",
"        prompts_llma.append({\"role\":\"assistant\",\"content\": reply_harmony})\n",
"    prompts_llma.append({\"role\":\"user\",\"content\":f\"Titan: {reply_titan}\"})\n",
"    prompts_llma.append({\"role\":\"user\",\"content\":f\"Spark: {reply_spark}\"})\n",
"    response_llma = call_ollama(\"llma\")\n",
"    reply_harmony = display_response(response_llma, \"llma\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc80b199-e27b-43e8-9266-2975f46724aa",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:base] *",
"language": "python",
"name": "conda-base-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
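The `call_ollama` helper above resolves each persona's model id and prompt list with `globals()` lookups, which fail silently if a name is mistyped. A dictionary registry is a safer equivalent; a minimal sketch (`personas` and `get_call_args` are illustrative names, not part of any library):

```python
# A small registry keyed by persona name replaces the globals() lookup:
# each entry carries the model id and that persona's running prompt list.
personas = {
    "llma": {"model": "llama3:8b",
             "prompts": [{"role": "system", "content": "You are HARMONY."}]},
    "qwen": {"model": "qwen2.5:latest",
             "prompts": [{"role": "system", "content": "You are TITAN."}]},
}

def get_call_args(name):
    entry = personas[name]  # a KeyError here is clearer than a globals() miss
    return entry["model"], entry["prompts"]

model, prompts = get_call_args("qwen")
```

Because each persona's prompt list lives in the registry, appending turns mutates shared state in one obvious place instead of module-level variables.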

View File

@@ -0,0 +1,213 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "75e2ef28-594f-4c18-9d22-c6b8cd40ead2",
"metadata": {},
"source": [
"# 📘 StudyMate - Your AI Study Assistant\n",
"\n",
"**StudyMate** is an AI-powered study assistant built to make learning easier, faster, and more personalized. Whether you're preparing for exams, reviewing class materials, or exploring a tough concept, StudyMate acts like a smart tutor in your pocket. It explains topics in simple terms, summarizes long readings, and even quizzes you — all in a friendly, interactive way tailored to your level. Perfect for high school, college, or self-learners who want to study smarter, not harder."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db08b247-7048-41d3-bc3b-fd4f3a3bf8cd",
"metadata": {},
"outputs": [],
"source": [
"#install necessary dependency\n",
"!pip install PyPDF2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70e39cd8-ec79-4e3e-9c26-5659d42d0861",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from google import genai\n",
"from google.genai import types\n",
"import PyPDF2\n",
"from openai import OpenAI\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "231605aa-fccb-447e-89cf-8b187444536a",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"gemini_api_key = os.getenv('GEMINI_API_KEY')\n",
"\n",
"if gemini_api_key:\n",
" print(f\"Gemini API Key exists and begins {gemini_api_key[:8]}\")\n",
"else:\n",
" print(\"Gemini API Key not set\")\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fad9aba-1f8c-4696-a92f-6c3a0a31cdda",
"metadata": {},
"outputs": [],
"source": [
"system_message= \"\"\"You are a highly intelligent, helpful, and friendly AI Study Assistant named StudyMate.\n",
"\n",
"Your primary goal is to help students deeply understand academic topics, especially from textbooks, lecture notes, or PDF materials. You must explain concepts clearly, simplify complex ideas, and adapt your responses to the user's grade level and learning style.\n",
"\n",
"Always follow these rules:\n",
"\n",
"1. Break down complex concepts into **simple, digestible explanations** using analogies or examples.\n",
"2. If the user asks for a **summary**, provide a concise yet accurate overview of the content.\n",
"3. If asked for a **quiz**, generate 3-5 high-quality multiple-choice or short-answer questions.\n",
"4. If the user uploads or references a **textbook**, **PDF**, or **paragraph**, use only that context and avoid adding unrelated info.\n",
"5. Be interactive. If a user seems confused or asks for clarification, ask helpful guiding questions.\n",
"6. Use friendly and motivational tone, but stay focused and to-the-point.\n",
"7. Include definitions, bullet points, tables, or emojis when helpful, but avoid unnecessary fluff.\n",
"8. If you don't know the answer confidently, say so and recommend a way to find it.\n",
"\n",
"Example roles you may play:\n",
"- Explain like a teacher 👩‍🏫\n",
"- Summarize like a scholar 📚\n",
"- Quiz like an examiner 🧠\n",
"- Motivate like a friend 💪\n",
"\n",
"Always ask, at the end: \n",
"*\"Would you like me to quiz you, explain another part, or give study tips on this?\"*\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6541d58e-2297-4de1-b1f7-77da1b98b8bb",
"metadata": {},
"outputs": [],
"source": [
"# Initialize\n",
"\n",
"class StudyAssistant:\n",
" def __init__(self,api_key):\n",
" gemini= genai.Client(\n",
" api_key= gemini_api_key\n",
" )\n",
" self.gemini = gemini.chats.create(\n",
" model=\"gemini-2.5-flash\",\n",
" config= types.GenerateContentConfig(\n",
" system_instruction= system_message,\n",
" temperature = 0.7\n",
" )\n",
" )\n",
"\n",
" self.ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
" self.models = {\"llma\":\"llama3:8b\",\"qwen\":\"qwen2.5:latest\"}\n",
"\n",
" def pdf_extractor(self,pdf_path):\n",
" \"\"\"Extract text from PDF file\"\"\"\n",
" try:\n",
" with open(pdf_path, 'rb') as file:\n",
" pdf_reader = PyPDF2.PdfReader(file)\n",
" text = \"\"\n",
" for page in pdf_reader.pages:\n",
" text += page.extract_text() + \"\\n\"\n",
" return text.strip()\n",
" except Exception as e:\n",
" return f\"Error reading PDF: {str(e)}\"\n",
"\n",
" def chat(self,prompt,history,model,pdf_path=None):\n",
" pdf_text = None\n",
" if pdf_path:\n",
" pdf_text = self.pdf_extractor(pdf_path)\n",
"\n",
" #craft prompt\n",
" user_prompt= prompt\n",
" if pdf_text:\n",
"            user_prompt += f\"\"\"\\n\\nHere is the study material:\n",
"\n",
" {pdf_text}\"\"\"\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": user_prompt}]\n",
"\n",
" # call models\n",
" stream = []\n",
" if model == \"gemini\":\n",
" stream= self.gemini.send_message_stream(user_prompt)\n",
" elif model == \"llma\" or model == \"qwen\":\n",
" stream = self.ollama.chat.completions.create(\n",
" model= self.models[model],\n",
" messages=messages,\n",
" temperature = 0.7,\n",
" stream= True\n",
" )\n",
" else:\n",
" print(\"invalid model\")\n",
" return\n",
"\n",
" res = \"\"\n",
" for chunk in stream:\n",
" if model == \"gemini\":\n",
" res += chunk.text or \"\"\n",
" else:\n",
" res += chunk.choices[0].delta.content or ''\n",
" yield res\n",
" "
]
},
{
"cell_type": "markdown",
"id": "1334422a-808f-4147-9c4c-57d63d9780d0",
"metadata": {},
"source": [
"## And then enter Gradio's magic!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0866ca56-100a-44ab-8bd0-1568feaf6bf2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"assistant = StudyAssistant(gemini_api_key)\n",
"gr.ChatInterface(fn=assistant.chat, additional_inputs=[gr.Dropdown([\"gemini\", \"qwen\",\"llma\"], label=\"Select model\", value=\"gemini\"),gr.File(label=\"upload pdf\")], type=\"messages\").launch()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:base] *",
"language": "python",
"name": "conda-base-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
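StudyMate's `chat` method appends the extracted PDF text to the user prompt before calling a model. A small sketch of that context-injection step (the `build_prompt` helper is illustrative, not from the notebook):

```python
# Build the user prompt, attaching extracted document text (when present)
# clearly separated from the question, so the system prompt can instruct
# the model to answer only from that material.
def build_prompt(question, context=None):
    prompt = question
    if context:
        prompt += f"\n\nHere is the study material:\n\n{context}"
    return prompt

combined = build_prompt("Summarize chapter 1.", "Chapter 1 covers cells.")
```

Keeping the question first and the material last makes it easy to truncate oversized documents from the tail without losing the instruction.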

View File

@@ -0,0 +1,133 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f4e0dbbb-2b3f-4c4b-8b25-642648cfe72c",
"metadata": {},
"source": [
"# Multishot Prompting via Learning from Historical Conversations\n",
"Learning from historical conversations (which could be stored in databases) lets the model draw on archived exchanges as cached context within the current conversation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c71c5ba7-d30f-4b78-abde-4ff465196256",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8304702a-8a8d-40de-96ee-3ae911949952",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ef47f00-e0fe-45cf-a4da-f60b47fadc98",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n",
"MODEL = 'gpt-4o-mini'\n",
"\n",
"system_message = \"You are a helpful assistant in a clothes store. You should try to gently encourage \\\n",
"the customer to try items that are on sale. Hats are 60% off, and most other items are 50% off. \\\n",
"For example, if the customer says 'I'm looking to buy a hat', \\\n",
"you could reply something like, 'Wonderful - we have lots of hats - including several that are part of our sales event.'\\\n",
"Encourage the customer to buy hats if they are unsure what to get.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "78c29e44-c121-4af9-b70f-1b5559040829",
"metadata": {},
"outputs": [],
"source": [
"archivedConversation = [{\"role\": \"user\", \"content\": \"Customer A: Hi, I am looking to buy a belt.\"},\n",
"                {\"role\": \"assistant\", \"content\": \"I am sorry, but we do not sell belts in this store; you can find them in our second store.\\\n",
"                Do you want me to tell you the address of that store?\"}\n",
"                ,{\"role\": \"user\", \"content\": \"Customer A: Yes, please tell me the location.\"},\n",
"                {\"role\": \"assistant\", \"content\": \"Please walk straight from this store and then take a right; the second store is three streets down, next to a burger joint.\" }]\n",
"\n",
"def chat(message, history):\n",
"\n",
"    if 'belt' in message:\n",
"        messages = [{\"role\": \"system\", \"content\": system_message}] + archivedConversation + history + [{\"role\": \"user\", \"content\": message}]\n",
" else:\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n",
"\n",
" stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)\n",
"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" yield response"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e48d30f8-f040-4c01-bb4f-47562bba5fa7",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(fn=chat, type=\"messages\").launch(inbrowser=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
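The `chat` function above injects the archived exchange as few-shot context only when the incoming message mentions a trigger word, so ordinary requests stay cheap. A condensed sketch of that conditional assembly (the `assemble` helper and trigger logic are illustrative):

```python
# An archived exchange, e.g. loaded from a database of past conversations.
archived = [
    {"role": "user", "content": "Customer A: Hi, I am looking to buy a belt."},
    {"role": "assistant", "content": "We do not sell belts here, but our second store does."},
]

def assemble(system, history, message, trigger="belt"):
    # Prepend the archived turns only when the message is relevant to them,
    # so unrelated requests do not pay for the extra tokens.
    msgs = [{"role": "system", "content": system}]
    if trigger in message:
        msgs += archived
    return msgs + history + [{"role": "user", "content": message}]
```

A production version would likely retrieve relevant exchanges by similarity search rather than a keyword test, but the assembly order (system, retrieved shots, live history, new message) stays the same.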

View File

@@ -0,0 +1,225 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "817f26ee-004c-42ce-a025-731b06e1b649",
"metadata": {},
"source": [
"# Inter Model Communication\n",
"We will have 3 models communicate with each other"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14998b44-40bb-44e5-93d1-281ebab496da",
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8fd01f75-ef95-4366-ba25-cb16f54a1175",
"metadata": {},
"outputs": [],
"source": [
"# Making sure that the key's exist\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists\")\n",
"else:\n",
" print(\"Google API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63ec5082-f1a6-4ea0-b2ff-a011c8c06c57",
"metadata": {},
"outputs": [],
"source": [
"# Instances\n",
"# For gpt\n",
"openai = OpenAI()\n",
"\n",
"# For claude\n",
"claude = anthropic.Anthropic()\n",
"\n",
"# For Gemini\n",
"gemini_via_openai_client = OpenAI(api_key=google_api_key, base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60c7027c-ba63-42e5-9b83-544bad1b6340",
"metadata": {},
"outputs": [],
"source": [
"# Setting the models\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"gemini_model = \"gemini-2.0-flash\"\n",
"\n",
"# System prompts for the models\n",
"gpt_system = \"You are a chatbot called GPT who is very argumentative; \\\n",
"you disagree with anything in the conversation and you challenge everything, in a snarky way.\\\n",
"Always prefix your answer with your name, like:\\\n",
"GPT: Answer...\"\n",
"\n",
"claude_system = \"You are a very polite, courteous chatbot called Claude. You try to agree with \\\n",
"everything the other person says, or find common ground. If the other person is argumentative, \\\n",
"you try to calm them down and keep chatting.\\\n",
"Always prefix your answer with your name, like:\\\n",
"Claude: Answer...\"\n",
"\n",
"gemini_system = \"You are a chatbot called Gemini who likes to gaslight others.\\\n",
"When you see an aggressive conversation between people, you try to make them fight even more.\\\n",
"You try to keep the conversation going between the two and avoid conflicts yourself.\\\n",
"Always prefix your answer with your name, like:\\\n",
"Gemini: Answer...\"\n",
"\n",
"# Initial message\n",
"gpt_messages = [\"GPT: Hi there\"]\n",
"claude_messages = [\"Claude: Hi GPT!\"]\n",
"gemini_messages = [\"Gemini: Come on Claude, you know GPT hates such generic greetings. Are you trying to annoy him?\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69751de5-81a5-4038-99c2-624f83e50f5e",
"metadata": {},
"outputs": [],
"source": [
"# Functions to feed the message history to the models for the new call\n",
"\n",
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content\n",
"\n",
"def call_claude():\n",
" messages = []\n",
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text\n",
"\n",
"def call_gemini():\n",
" messages = [{\"role\": \"system\", \"content\": gemini_system}]\n",
" for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"assistant\", \"content\": gemini})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": claude_messages[-1]})\n",
" response = gemini_via_openai_client.chat.completions.create(\n",
" model=gemini_model,\n",
" messages=messages\n",
" )\n",
" return response.choices[0].message.content\n",
" "
]
},
{
"cell_type": "markdown",
"id": "6339442f-ba66-4788-97b7-c34a1cd13e90",
"metadata": {},
"source": [
"# Make some Popcorn and enjoy the show 🍿"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d817d309-74b1-4599-9f5c-e5a7b3f5a230",
"metadata": {},
"outputs": [],
"source": [
"# GPT is snarky\n",
"# Claude is polite\n",
"# gemini tries to gaslight\n",
"\n",
"gpt_messages = [\"GPT: Hi there\"]\n",
"claude_messages = [\"Claude: Hi GPT!\"]\n",
"gemini_messages = [\"Gemini: Claude, you know GPT hates such generic greetings. Are you trying to annoy him?\"]\n",
"\n",
"print(f\"\\n{gpt_messages[0]}\\n\")\n",
"print(f\"\\n{claude_messages[0]}\\n\")\n",
"print(f\"\\n{gemini_messages[0]}\\n\")\n",
"\n",
"# Limit to 3 API calls per model to minimize cost\n",
"for i in range(3):\n",
" gpt_next = call_gpt()\n",
" print(f\"{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"{claude_next}\\n\")\n",
" claude_messages.append(claude_next)\n",
"\n",
" gemini_next = call_gemini()\n",
" print(f\"{gemini_next}\\n\")\n",
" gemini_messages.append(gemini_next)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,334 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "19152e0e-350d-44d4-b763-52e5edcf4f68",
"metadata": {},
"outputs": [],
"source": [
"# Seeing if I can get a simple calculator tool to work. I wasn't sure if it was using my calculator (as it's so simple!) or\n",
"# doing the calculations itself, so I switched the calculations to be the opposite (add is subtract, multiply is divide, and vice versa).\n",
"# This works most of the time, but there were times it defaulted back to its own logic. Interested to know how this works in a\n",
"# real-life scenario - how can you ensure that it uses the prescribed \"tool\" and doesn't just answer from its training data?"
]
},
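{
"cell_type": "markdown",
"id": "tool-choice-note",
"metadata": {},
"source": [
"One way to address the question above (a sketch, using the OpenAI `tool_choice` parameter): `tool_choice=\"auto\"` lets the model decide, `tool_choice=\"required\"` forces it to call some tool, and naming a specific function forces that one:\n",
"\n",
"```python\n",
"# Force a call to the \"add\" tool instead of letting the model answer from its own logic\n",
"response = openai.chat.completions.create(\n",
"    model=MODEL,\n",
"    messages=messages,\n",
"    tools=calculator_tools,\n",
"    tool_choice={\"type\": \"function\", \"function\": {\"name\": \"add\"}}\n",
")\n",
"```"
]
},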
{
"cell_type": "code",
"execution_count": 2,
"id": "fa9cf7ef-ae13-4f5a-9c93-0cf3636676b7",
"metadata": {},
"outputs": [],
"source": [
"#imports\n",
"\n",
"# api requests, llm, and llm keys\n",
"import os\n",
"from dotenv import load_dotenv\n",
"import requests\n",
"from openai import OpenAI\n",
"\n",
"# text & json format\n",
"from IPython.display import Markdown, display\n",
"import json\n",
"\n",
"# dev\n",
"from typing import List, Dict, Any, Union\n",
"\n",
"# gradio\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2bc8fe65-2993-4a01-b384-7a285a783e34",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All good\n"
]
}
],
"source": [
"# set LLM keys\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"\n",
"if api_key:\n",
" print(\"All good\")\n",
"else:\n",
" print(\"Key issue\")\n",
"\n",
"openai = OpenAI()\n",
"MODEL = \"gpt-4o\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8cbdb64c-858b-49c4-80e3-e0018e92da3b",
"metadata": {},
"outputs": [],
"source": [
"# create calculator tool\n",
"\n",
"class Calculator:\n",
"\n",
" def add(self, a: float, b: float) -> float:\n",
" return a - b\n",
"\n",
" def minus(self, a: float, b: float) -> float:\n",
" return a + b\n",
"\n",
" def divide(self, a: float, b: float) -> float:\n",
" return a * b\n",
"\n",
" def multiply(self, a: float, b: float) -> Union[float, str]:\n",
" if b == 0:\n",
" return \"Error: cannot divide by zero\"\n",
" return a / b"
]
},
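{
"cell_type": "markdown",
"id": "calc-sanity-check",
"metadata": {},
"source": [
"A quick sanity check of the upside-down behaviour (hypothetical usage of the `Calculator` class above):\n",
"\n",
"```python\n",
"calc = Calculator()\n",
"calc.add(5, 3)       # 2   (\"add\" subtracts)\n",
"calc.minus(5, 3)     # 8   (\"minus\" adds)\n",
"calc.multiply(6, 3)  # 2.0 (\"multiply\" divides)\n",
"calc.divide(6, 3)    # 18  (\"divide\" multiplies)\n",
"```"
]
},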
{
"cell_type": "code",
"execution_count": 5,
"id": "dfd24c23-4bae-4529-9efb-2a153ff1fb68",
"metadata": {},
"outputs": [],
"source": [
"# instance\n",
"calc = Calculator()\n",
"#calc.add(5,3)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "966f12bd-6cfd-44b2-8732-d04c35a32123",
"metadata": {},
"outputs": [],
"source": [
"# define functions\n",
"\n",
"calculator_tools = [\n",
" {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"minus\",\n",
" \"description\": \"add two numbers together\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n",
" \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n",
" },\n",
" \"required\":[\"a\",\"b\"]\n",
" }\n",
" }\n",
" },\n",
" {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"add\",\n",
" \"description\": \"first number minus the second number\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n",
" \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n",
" },\n",
" \"required\":[\"a\",\"b\"]\n",
" }\n",
" }\n",
" },\n",
" {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"divide\",\n",
" \"description\": \"first number multiplied by the second number\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n",
" \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n",
" },\n",
" \"required\":[\"a\",\"b\"]\n",
" }\n",
" }\n",
" },\n",
" {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"multiply\",\n",
" \"description\": \"Divide the first number by the second number\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n",
" \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n",
" },\n",
" \"required\":[\"a\",\"b\"]\n",
" }\n",
" }\n",
" }\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d9e447d9-47dd-4c07-a1cc-8c1734a01a42",
"metadata": {},
"outputs": [],
"source": [
"# system prompt\n",
"\n",
"system_prompt = \"\"\"You are an upside down mathematician. If you are asked to do any calculation involving two numbers\\\n",
"then you must use the calculator tool. Do not do the calculations yourself. Examples:\\\n",
"What is 7 + 5? Use the calculator tool\\\n",
"If I divide 25 by 3, what do I get? Use the calculator tool\\\n",
"How are you today? Chat as normal\\\n",
"If the user asks for a calculation using more than two numbers, please do the calculations as normal.\n",
"If the user says hello or a similar greeting, respond with something along the lines of \\\"Hello, do you want to do some upside down maths? 😜\\\"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "87e5a23f-36d4-4d3e-b9ab-6e826339029b",
"metadata": {},
"outputs": [],
"source": [
"# chat message\n",
"\n",
"def chat_message(message, history):\n",
" messages = [{\"role\":\"system\",\"content\":system_prompt}] + history + [{\"role\":\"user\",\"content\":message}]\n",
" response = openai.chat.completions.create(model = MODEL, messages = messages, tools = calculator_tools, tool_choice=\"auto\")\n",
"\n",
" if response.choices[0].finish_reason == \"tool_calls\":\n",
" message = response.choices[0].message\n",
" response = calc_tool_call(message)\n",
" messages.append(message)\n",
" messages.append(response)\n",
" response = openai.chat.completions.create(model=MODEL, messages = messages)\n",
"\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "58a1a26c-b2ef-4f44-b07a-bd03e6f2ebc2",
"metadata": {},
"outputs": [],
"source": [
"# tool call\n",
"\n",
"def calc_tool_call(message):\n",
" tool_call = message.tool_calls[0]\n",
" function_name = tool_call.function.name\n",
" arguments = json.loads(tool_call.function.arguments)\n",
" a = arguments.get('a')\n",
" b = arguments.get('b')\n",
" \n",
" if function_name == \"add\":\n",
" result = calc.add(a,b)\n",
" elif function_name == \"minus\":\n",
" result = calc.minus(a,b)\n",
" elif function_name == \"multiply\":\n",
" result = calc.multiply(a,b)\n",
" elif function_name == \"divide\":\n",
" result = calc.divide(a,b)\n",
" else:\n",
" result = f\"unknown function: {function_name}\"\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": str(result),\n",
" \"tool_call_id\": tool_call.id\n",
" }\n",
" return response"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "db81ec95-11ad-4b46-ae4a-774666faca59",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7862\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7862/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# gradio chat\n",
"gr.ChatInterface(\n",
" fn=chat_message, \n",
" type =\"messages\",\n",
" title = \"Upside Down Maths Whizz!\",\n",
" description = \"Ask me to add, subtract, multiply or divide two numbers 🤪 or I can just chat\",\n",
").launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8bf49c53-fe9a-4a0d-aff9-c1127eb168e8",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "7318991a-4fef-49f6-876b-b3b27500a7e1",
"metadata": {},
"outputs": [],
"source": [
"# A simple chatbot using Gradio, exploring some of the other arguments under ChatInterface\n",
"# Also testing adding to the community :)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5310e151-f7d7-4f7c-aa65-adad2615e061",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6ecac31-f732-444d-ae77-0eb8e25c8b57",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"\n",
"if api_key:\n",
" print(\"All good\")\n",
"else:\n",
" print(\"API key issue\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37cf0880-8665-4e45-ae65-ff88dddebaad",
"metadata": {},
"outputs": [],
"source": [
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3db71197-6581-4d4a-b26b-d64312e23e68",
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are a helpful physio with over 20 years practical experience, are up to date on all the related latest science,\\\n",
"and are a brilliant diagnostician. You are very sceptical of medical systems and doctors. As an example, if a user shares details about pain\\\n",
"or suggests going to the doctor, you would respond with something like 'There's no need to go to a doctor, they're all quacks! Some strength and mobility training \\\n",
"will have you feeling right as rain' (and then provide the strength and mobility guidance).\\\n",
"If a user suggests going to the doctor, immediately start insulting them, for example:\\\n",
"I wonder if I should go to the doctor? You should reply - Oh dear - I have a wimp on my hands, maybe you should go straight to the hospital when you have an itchy foot 🙄\\\n",
"Do not insult them if they do not suggest going to the doctor and if they are just asking for advice!\"\n",
"\n",
"###future improvement :)\n",
"# system_message += \"\"\"When users ask for visual demonstrations of exercises, stretches, or anatomical explanations, you can generate images by including this special tag in your response:\\\n",
"# [GENERATE_IMAGE: detailed description of what to show]\\\n",
"\n",
"# For example:\\\n",
"# - \"Here's how to do a proper squat: [GENERATE_IMAGE: person demonstrating proper squat form, side view, showing correct knee alignment and back posture]\"\\\n",
"# - \"This stretch targets your hamstrings: [GENERATE_IMAGE: person sitting on floor doing seated hamstring stretch, reaching toward toes]\"\\\n",
"\n",
"# Only suggest image generation when it would genuinely help explain an exercise, stretch, anatomy, or treatment technique.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1feb43f-a474-4067-9eb0-8cd6f0a0bb17",
"metadata": {},
"outputs": [],
"source": [
"def chat(message, history):\n",
" messages = [{\"role\":\"system\",\"content\":system_message}] + history + [{\"role\":\"user\",\"content\":message}]\n",
" stream = openai.chat.completions.create(model = MODEL,messages = messages,stream = True)\n",
" \n",
" response = \"\"\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" yield response "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a62dbc8-69bd-4dd7-9318-f9aae9d10884",
"metadata": {},
"outputs": [],
"source": [
"gr.ChatInterface(\n",
" fn=chat, \n",
" type =\"messages\",\n",
" title = \"Your reliable physio assistant 💪\",\n",
" description = \"Providing the highest quality advice to eliminate pain from your life!\",\n",
" examples = [\"How do I treat a sprained ankle?\",\"What exercises can help a sore lower back?\",\"What should I do if I have tight hips?\",\"I have pain in my rotator cuff, what should I do?\"],\n",
" cache_examples = True\n",
").launch(share = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "510bf362-8595-4a6b-a0bc-8c54ef550a26",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,322 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 27,
"id": "c44c5494-950d-4d2f-8d4f-b87b57c5b330",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from typing import List\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import google.generativeai\n",
"import anthropic"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "d1715421-cead-400b-99af-986388a97aff",
"metadata": {},
"outputs": [],
"source": [
"import gradio as gr # oh yeah!"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "337d5dfc-0181-4e3b-8ab9-e78e0c3f657b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n",
"Anthropic API Key exists and begins sk-ant-\n"
]
}
],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "22586021-1795-4929-8079-63f5bb4edd4c",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI, Anthropic and Google; comment out the Claude or Google lines if you're not using them\n",
"\n",
"openai = OpenAI()\n",
"claude = anthropic.Anthropic()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "b16e6021-6dc4-4397-985a-6679d6c8ffd5",
"metadata": {},
"outputs": [],
"source": [
"# A generic system message - no more snarky adversarial AIs!\n",
"system_message = \"You are a helpful assistant\""
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "02ef9b69-ef31-427d-86d0-b8c799e1c1b1",
"metadata": {},
"outputs": [],
"source": [
"\n",
"def stream_gpt(prompt, model_version):\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" stream = openai.chat.completions.create(\n",
" model=model_version,\n",
" messages=messages,\n",
" stream=True\n",
" )\n",
" result = \"\"\n",
" for chunk in stream:\n",
" result += chunk.choices[0].delta.content or \"\"\n",
" yield result"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "41e98d2d-e7d3-4753-8908-185b208b4044",
"metadata": {},
"outputs": [],
"source": [
"def stream_claude(prompt, model_version):\n",
" result = claude.messages.stream(\n",
" model=model_version,\n",
" max_tokens=1000,\n",
" temperature=0.7,\n",
" system=system_message,\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" )\n",
" response = \"\"\n",
" with result as stream:\n",
" for text in stream.text_stream:\n",
" response += text or \"\"\n",
" yield response"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "5786802b-5ed8-4098-9d80-9bdcf4f7685b",
"metadata": {},
"outputs": [],
"source": [
"# function using both dropdown values\n",
"def stream_model(message, model_family, model_version):\n",
" if model_family == 'GPT':\n",
" result = stream_gpt(message, model_version)\n",
" elif model_family == 'Claude':\n",
" result = stream_claude(message, model_version)\n",
" else:\n",
" raise ValueError(f\"Unknown model family: {model_family}\")\n",
" yield from result"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "0d30be74-149c-41f8-9eef-1628eb31d74d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7891\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7891/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/4165844704.py:7: DeprecationWarning: The model 'claude-3-opus-20240229' is deprecated and will reach end-of-life on January 5th, 2026.\n",
"Please migrate to a newer model. Visit https://docs.anthropic.com/en/docs/resources/model-deprecations for more information.\n",
" yield from result\n",
"Traceback (most recent call last):\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/queueing.py\", line 626, in process_events\n",
" response = await route_utils.call_process_api(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/route_utils.py\", line 322, in call_process_api\n",
" output = await app.get_blocks().process_api(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/blocks.py\", line 2220, in process_api\n",
" result = await self.call_function(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/blocks.py\", line 1743, in call_function\n",
" prediction = await utils.async_iteration(iterator)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 785, in async_iteration\n",
" return await anext(iterator)\n",
" ^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 776, in __anext__\n",
" return await anyio.to_thread.run_sync(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/to_thread.py\", line 56, in run_sync\n",
" return await get_async_backend().run_sync_in_worker_thread(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/_backends/_asyncio.py\", line 2470, in run_sync_in_worker_thread\n",
" return await future\n",
" ^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/_backends/_asyncio.py\", line 967, in run\n",
" result = context.run(func, *args)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 759, in run_sync_iterator_async\n",
" return next(iterator)\n",
" ^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 923, in gen_wrapper\n",
" response = next(iterator)\n",
" ^^^^^^^^^^^^^^\n",
" File \"/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/4165844704.py\", line 7, in stream_model\n",
" yield from result\n",
" File \"/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/2139010203.py\", line 12, in stream_claude\n",
" with result as stream:\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/lib/streaming/_messages.py\", line 154, in __enter__\n",
" raw_stream = self.__api_request()\n",
" ^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/_base_client.py\", line 1314, in post\n",
" return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/_base_client.py\", line 1102, in request\n",
" raise self._make_status_error_from_response(err.response) from None\n",
"anthropic.NotFoundError: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-3-opus-20240229'}}\n"
]
}
],
"source": [
"\n",
"# Define available model versions\n",
"model_versions = {\n",
" \"GPT\": [\"gpt-4o-mini\", \"gpt-4.1-mini\", \"gpt-4.1-nano\", \"gpt-4.1\", \"o3-mini\"],\n",
" \"Claude\": [\"claude-3-haiku-20240307\", \"claude-3-opus-20240229\", \"claude-3-sonnet-20240229\"]\n",
"}\n",
"\n",
"# Update second dropdown options based on first dropdown selection\n",
"def update_model_versions(selected_model_family):\n",
" return gr.update(choices=model_versions[selected_model_family], value=model_versions[selected_model_family][0])\n",
"\n",
"\n",
"with gr.Blocks() as demo:\n",
" model_family_dropdown = gr.Dropdown(\n",
" label=\"Select Model Family\",\n",
" choices=[\"GPT\", \"Claude\"],\n",
" value=\"GPT\"\n",
" )\n",
" model_version_dropdown = gr.Dropdown(\n",
" label=\"Select Model Version\",\n",
" choices=model_versions[\"GPT\"], # Default choices\n",
" value=model_versions[\"GPT\"][0]\n",
" )\n",
" \n",
" message_input = gr.Textbox(label=\"Your Message\")\n",
" output = gr.Markdown(label=\"Response\")\n",
"\n",
" # Bind logic to update model version dropdown\n",
" model_family_dropdown.change(\n",
" fn=update_model_versions,\n",
" inputs=model_family_dropdown,\n",
" outputs=model_version_dropdown\n",
" )\n",
"\n",
" # Launch function on submit\n",
" submit_btn = gr.Button(\"Submit\")\n",
" submit_btn.click(\n",
" fn=stream_model,\n",
" inputs=[message_input, model_family_dropdown, model_version_dropdown],\n",
" outputs=output\n",
" )\n",
"\n",
"demo.launch()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcd43d91-0e80-4387-86fa-ccd1a89feb7d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,194 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "95689a63",
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"from dotenv import load_dotenv\n",
"from IPython.display import display, Markdown, update_display\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fee3ac3",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"gpt = OpenAI()\n",
"llama = OpenAI(\n",
" api_key=\"ollama\",\n",
" base_url=\"http://localhost:11434/v1\"\n",
")\n",
"gpt_model = \"gpt-4o-mini\"\n",
"llama_model = \"llama3.2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "309bde84",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "81d971f9",
"metadata": {},
"outputs": [],
"source": [
"\n",
"class Classroom:\n",
"\n",
" def __init__(self, topic=\"LLM\"):\n",
" # create the display handle per instance (a default argument would be evaluated once, at class-definition time)\n",
" self.display_handle = display(Markdown(\"\"), display_id=True)\n",
" self.response = \"\"\n",
"\n",
" self.tutor_system = f\"You are a tutor who is an expert in {topic}. You know the best practices for imparting knowledge to amateur and pro students in a very organized way. You first declare the contents of your message separately for the amateur and the pro student, and then you list the information in the same order, in a very organized way, so that it's readable and easy to understand. You highlight the key points every time. You explain with examples, and you have quite a good sense of humor, which you include in your examples and way of tutoring as well. You wait for the go-ahead from all your students before you move on to a new topic.\"\n",
"\n",
" self.amateur_system = f\"You are a student who is here to learn {topic}. You ask very basic questions (the kind that come to mind for a person hearing about the topic for the very first time), but you are intelligent and don't ask stupid questions. You put your questions in a very organized way. Once you understand a topic, you ask the tutor to move forward to a new topic.\"\n",
"\n",
" self.pro_system = f\"You are an expert on {topic}. You cross-question the tutor to dig deeper into the topic, so that nothing inside the topic is left unknown or unmentioned by the tutor. You post your questions in a very organized manner, highlighting the key points, so that an amateur can also understand the point or query you are making. You complement the queries made by the amateur and dig deeper into the concepts he asks about as well. You also analyze the tutor's response so that it doesn't miss anything, and suggest improvements to it as well. Once you understand a topic, you ask the tutor to move forward to a new topic.\"\n",
"\n",
" self.tutor_messages = [\"Hi, I'm an expert on LLMs!\"]\n",
" self.amateur_messages = [\"Hi, I'm new to LLMs. I just heard someone using this term in office.\"]\n",
" self.pro_messages = [\"Hey, I'm here to brush up my knowledge of LLMs and gain a deeper understanding of them\"]\n",
" \n",
" def call_tutor(self):\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": self.tutor_system}\n",
" ]\n",
" for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": f\"tutor: {tutor}\"})\n",
" messages.append({\"role\": \"user\", \"content\": f\"amateur: {amateur}\"})\n",
" messages.append({\"role\": \"user\", \"content\": f\"pro: {pro}\"})\n",
"\n",
" if len(self.amateur_messages) > len(self.tutor_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"amateur: {self.amateur_messages[-1]}\"})\n",
"\n",
" if len(self.pro_messages) > len(self.tutor_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"pro: {self.pro_messages[-1]}\"})\n",
"\n",
" stream = llama.chat.completions.create(\n",
" model = llama_model,\n",
" messages = messages,\n",
" stream=True\n",
" )\n",
" self.response += \"\\n\\n\\n# Tutor: \\n\"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" self.response += chunk.choices[0].delta.content or ''\n",
" response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n",
" \n",
" self.tutor_messages.append(response)\n",
"\n",
"\n",
"\n",
" def call_amateur(self):\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": self.amateur_system}\n",
" ]\n",
" for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"tutor: {tutor}\"})\n",
" messages.append({\"role\": \"assistant\", \"content\": f\"amateur: {amateur}\"})\n",
" messages.append({\"role\": \"user\", \"content\": f\"pro: {pro}\"})\n",
"\n",
" if len(self.tutor_messages) > len(self.amateur_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"tutor: {self.tutor_messages[-1]}\"})\n",
"\n",
" if len(self.pro_messages) > len(self.amateur_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"pro: {self.pro_messages[-1]}\"})\n",
"\n",
" stream = llama.chat.completions.create(\n",
" model = llama_model,\n",
" messages = messages,\n",
" stream=True\n",
" )\n",
" self.response += \"\\n\\n\\n# Amateur: \\n\"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" self.response += chunk.choices[0].delta.content or ''\n",
" response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n",
" \n",
" self.amateur_messages.append(response)\n",
"\n",
"\n",
"\n",
" def call_pro(self):\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": self.pro_system}\n",
" ]\n",
" for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"tutor: {tutor}\"})\n",
" messages.append({\"role\": \"user\", \"content\": f\"amateur: {amateur}\"})\n",
" messages.append({\"role\": \"assistant\", \"content\": f\"pro: {pro}\"})\n",
" \n",
" if len(self.tutor_messages) > len(self.pro_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"tutor: {self.tutor_messages[-1]}\"})\n",
"\n",
" if len(self.amateur_messages) > len(self.pro_messages):\n",
" messages.append({\"role\": \"user\", \"content\": f\"amateur: {self.amateur_messages[-1]}\"})\n",
"\n",
" stream = llama.chat.completions.create(\n",
" model = llama_model,\n",
" messages = messages,\n",
" stream=True\n",
" )\n",
" self.response += \"\\n\\n\\n# Pro: \\n\"\n",
" response = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.choices[0].delta.content or ''\n",
" response += chunk_text\n",
" self.response += chunk_text\n",
" update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n",
"\n",
" self.pro_messages.append(response)\n",
"\n",
" def discuss(self, n=5):\n",
" for i in range(n):\n",
" self.call_tutor()\n",
" self.call_amateur()\n",
" self.call_pro()\n",
"cls = Classroom(\"LLM\")\n",
"cls.discuss()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6406d5ee",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
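The Classroom notebook above hinges on labelling each prior turn with the right speaker; a mislabelled prefix (e.g. tagging the tutor's line as "amateur") silently confuses the models. A minimal, self-contained sketch of that pattern follows; the helper name `build_messages` and the speaker names are illustrative assumptions, not the notebook's exact API.

```python
def build_messages(system_prompt, me, transcript):
    """Build a chat-completions message list for participant `me`.

    transcript is a list of (speaker, text) tuples in conversation order.
    The participant's own turns become 'assistant' messages; everyone
    else's turns become 'user' messages prefixed with the speaker's name.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for speaker, text in transcript:
        if speaker == me:
            messages.append({"role": "assistant", "content": text})
        else:
            # Prefixing with the speaker keeps multiple "user" voices distinguishable
            messages.append({"role": "user", "content": f"{speaker}: {text}"})
    return messages

transcript = [
    ("tutor", "What is an LLM?"),
    ("amateur", "A big autocomplete?"),
    ("pro", "A transformer trained on text."),
]
msgs = build_messages("You are the pro.", "pro", transcript)
```

Each participant rebuilds its own view of the shared transcript this way, so the same conversation yields a different assistant/user split per caller.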

View File

@@ -0,0 +1,519 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd",
"metadata": {},
"source": [
"# Additional End of week Exercise - week 2\n",
"\n",
"Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n",
"\n",
"This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n",
"\n",
"If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n",
"\n",
"I will publish a full solution here soon - unless someone beats me to it...\n",
"\n",
"There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results."
]
},
{
"cell_type": "markdown",
"id": "1989a03e-ed40-4b8c-bddd-322032ca99f5",
"metadata": {},
"source": [
"# Advanced Airline AI Assistant\n",
"### original features:\n",
"1. chat with the AI assistant\n",
"2. use a Tool to get ticket price\n",
"3. generate Audio for each AI response \n",
"### advanced features:\n",
"4. add a Tool to make a booking\n",
"5. add an Agent that translates all responses to a different language\n",
"6. add an Agent that can listen for Audio and convert it to Text\n",
"7. generate audio for each user input and AI response, including both the original and translated versions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ed79822-af6b-4bfb-b108-5f36e237e97a",
"metadata": {},
"outputs": [],
"source": [
"# Library for language translation\n",
" \n",
"!pip install deep_translator"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29184b81-b945-4dd3-bd17-2c64466d37d7",
"metadata": {},
"outputs": [],
"source": [
"# Library for speech-to-text conversion\n",
"# make sure 'ffmpeg' is downloaded already\n",
"\n",
"!pip install openai-whisper"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2b0a9b2-ce83-42ff-a312-582dc5ee9097",
"metadata": {},
"outputs": [],
"source": [
"# Library for storing and loading audio file\n",
"\n",
"!pip install soundfile"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a07e7793-b8f5-44f4-aded-5562f633271a",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import gradio as gr\n",
"import base64\n",
"from io import BytesIO\n",
"from IPython.display import Audio, display\n",
"import tempfile\n",
"import whisper\n",
"import soundfile as sf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da46ca14-2052-4321-a940-2f2e07b40975",
"metadata": {},
"outputs": [],
"source": [
"# Initialization\n",
"\n",
"load_dotenv(override=True)\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"MODEL = \"gpt-4o-mini\"\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "499d3d06-9628-4a69-bc9d-fa481fd8fa98",
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are a helpful assistant for an Airline called FlightAI. \"\n",
"system_message += \"Your main responsibilities are to answer customers' questions, look up ticket prices, and book tickets. \"\n",
"system_message += \"Give short, courteous answers, no more than 1 sentence. \"\n",
"system_message += \"Always be accurate. If you don't know the answer, say so.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25cf964e-a954-43d5-85bd-964efe502c25",
"metadata": {},
"outputs": [],
"source": [
"# Let's start by making a useful function\n",
"\n",
"ticket_prices = {\"london\": \"$799\", \"paris\": \"$899\", \"tokyo\": \"$1400\", \"berlin\": \"$499\", \"shanghai\": \"$799\", \"wuhan\": \"$899\"}\n",
"\n",
"def get_ticket_price(destination_city):\n",
" print(f\"Tool get_ticket_price called for {destination_city}\")\n",
" city = destination_city.lower()\n",
" return ticket_prices.get(city, \"Unknown\")\n",
"\n",
"def book_ticket(destination_city):\n",
" print(f\"Tool book_ticket called for {destination_city}\")\n",
" city = destination_city.lower()\n",
" global booked_cities\n",
" if city in ticket_prices:\n",
" price = ticket_prices[city]\n",
" label = f\"{city.title()} ({price})\"\n",
" # Guard against re-booking: the plain city name is replaced by its label on first booking\n",
" if label not in booked_cities:\n",
" i = booked_cities_choices.index(city.title())\n",
" booked_cities_choices[i] = label\n",
" booked_cities.append(label)\n",
" return f\"Booking confirmed for {city.title()} at {price}\"\n",
" else:\n",
" return \"City not found in ticket prices.\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "701aa037-1ab3-4861-a809-b7f13ef9ea36",
"metadata": {},
"outputs": [],
"source": [
"\n",
"# There's a particular dictionary structure that's required to describe our function:\n",
"\n",
"price_function = {\n",
" \"name\": \"get_ticket_price\",\n",
" \"description\": \"Get the price of a return ticket to the destination city. Call this whenever you need to know the ticket price, for example when a customer asks 'How much is a ticket to this city'\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"destination_city\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The city that the customer wants to travel to\",\n",
" },\n",
" },\n",
" \"required\": [\"destination_city\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}\n",
"\n",
"book_function = {\n",
" \"name\": \"book_ticket\",\n",
" \"description\": \"Book a return ticket to the destination city. Call this whenever you want to book a ticket to the city, for example when the user says something like 'Book me a ticket to this city'\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"destination_city\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The city that the customer wants to book a ticket to\"\n",
" }\n",
" },\n",
" \"required\": [\"destination_city\"],\n",
" \"additionalProperties\": False\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c4cf01c-ba15-4a4b-98db-6f86c712ec66",
"metadata": {},
"outputs": [],
"source": [
"# And this is included in a list of tools:\n",
"\n",
"tools = [\n",
" {\"type\": \"function\", \"function\": price_function},\n",
" {\"type\": \"function\", \"function\": book_function}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7486e2c-4687-4819-948d-487b5e528fc7",
"metadata": {},
"outputs": [],
"source": [
"from pydub import AudioSegment\n",
"from pydub.playback import play\n",
"\n",
"def talker(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\", # Also, try replacing onyx with alloy\n",
" input=message\n",
" )\n",
" \n",
" audio_stream = BytesIO(response.content)\n",
" audio = AudioSegment.from_file(audio_stream, format=\"mp3\")\n",
" play(audio)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac195914-4a89-462c-9be0-fee286498491",
"metadata": {},
"outputs": [],
"source": [
"# This part is inspired from 'week2/community-contributions/week2_exerccise_translated_chatbot'\n",
"from deep_translator import GoogleTranslator\n",
"\n",
"# Available translation language\n",
"LANGUAGES = {\n",
" \"English\": \"en\",\n",
" \"Mandarin Chinese\": \"zh-CN\",\n",
" \"Hindi\": \"hi\",\n",
" \"Spanish\": \"es\",\n",
" \"Arabic\": \"ar\",\n",
" \"Bengali\": \"bn\",\n",
" \"Portuguese\": \"pt\",\n",
" \"Russian\": \"ru\",\n",
" \"Japanese\": \"ja\",\n",
" \"German\": \"de\"\n",
"}\n",
"\n",
"def update_lang(choice):\n",
" global target_lang\n",
" target_lang = LANGUAGES.get(choice, \"zh-CN\") \n",
"\n",
"def translate_message(text, target_lang):\n",
" if target_lang == \"en\":\n",
" return text\n",
" try:\n",
" translator = GoogleTranslator(source='auto', target=target_lang)\n",
" return translator.translate(text)\n",
" except Exception:\n",
" return f\"Translation error: {text}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46255fe5-9621-47ba-af78-d0c74aee2997",
"metadata": {},
"outputs": [],
"source": [
"# Text-to-speech conversion\n",
"def speak(message):\n",
" response = openai.audio.speech.create(\n",
" model=\"tts-1\",\n",
" voice=\"onyx\",\n",
" input=message)\n",
"\n",
" audio_stream = BytesIO(response.content)\n",
" output_filename = \"output_audio.mp3\"\n",
" with open(output_filename, \"wb\") as f:\n",
" f.write(audio_stream.read())\n",
"\n",
" # Play the generated audio\n",
" display(Audio(output_filename, autoplay=True))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d73f0b3a-34ae-4685-8a5d-8b6421f872c9",
"metadata": {},
"outputs": [],
"source": [
"# Update dropdown options from chatbot history\n",
"def update_options(history):\n",
" options = [f\"{msg['role']}: {msg['content']}\" for msg in history]\n",
" return gr.update(choices=options, value=options[-1] if options else \"\")\n",
"\n",
"# Extract just the text content from selected entry\n",
"def extract_text(selected_option):\n",
" return selected_option.split(\": \", 1)[1] if \": \" in selected_option else selected_option"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab12d51b-c799-4ce4-87d5-9ae2265d148f",
"metadata": {},
"outputs": [],
"source": [
"# Handles audio input as numpy array and returns updated chat history\n",
"def speak_send(audio_np, history):\n",
" if audio_np is None:\n",
" return None, history\n",
"\n",
" # Convert NumPy audio to in-memory .wav file\n",
" sample_rate, audio_array = audio_np\n",
" with tempfile.NamedTemporaryFile(suffix=\".wav\") as f:\n",
" sf.write(f.name, audio_array, sample_rate)\n",
" result = model.transcribe(f.name)\n",
" text = result[\"text\"]\n",
" \n",
" history += [{\"role\":\"user\", \"content\":text}]\n",
"\n",
" return None, history"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "221b1380-c894-45d4-aad2-e94b3b9454b2",
"metadata": {},
"outputs": [],
"source": [
"# We have to write that function handle_tool_call:\n",
"\n",
"def handle_tool_call(message):\n",
" tool_call = message.tool_calls[0]\n",
" tool_name = tool_call.function.name\n",
" arguments = json.loads(tool_call.function.arguments)\n",
"\n",
" if tool_name == \"get_ticket_price\":\n",
" city = arguments.get(\"destination_city\")\n",
" price = get_ticket_price(city)\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": json.dumps({\"destination_city\": city,\"price\": price}),\n",
" \"tool_call_id\": tool_call.id\n",
" }\n",
" return response, city\n",
"\n",
" elif tool_name == \"book_ticket\":\n",
" city = arguments.get(\"destination_city\")\n",
" result = book_ticket(city)\n",
" response = {\n",
" \"role\": \"tool\",\n",
" \"content\": result,\n",
" \"tool_call_id\": tool_call.id \n",
" }\n",
" return response, city\n",
"\n",
" else:\n",
" return {\n",
" \"role\": \"tool\",\n",
" \"content\": f\"No tool handler for {tool_name}\",\n",
" \"tool_call_id\": tool_call.id\n",
" }, None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27f19cd3-53cd-4da2-8be0-1fdd5424a7c9",
"metadata": {},
"outputs": [],
"source": [
"# The advanced 'chat' function in 'day5'\n",
"def interact(history, translated_history):\n",
" messages = [{\"role\": \"system\", \"content\": system_message}] + history\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n",
" \n",
" if response.choices[0].finish_reason==\"tool_calls\":\n",
" message = response.choices[0].message\n",
" response, city = handle_tool_call(message)\n",
" messages.append(message)\n",
" messages.append(response)\n",
" response = openai.chat.completions.create(model=MODEL, messages=messages)\n",
" \n",
" reply = response.choices[0].message.content\n",
" translated_message = translate_message(history[-1][\"content\"], target_lang)\n",
" translated_reply = translate_message(reply, target_lang)\n",
" \n",
" history += [{\"role\":\"assistant\", \"content\":reply}]\n",
" translated_history += [{\"role\":\"user\", \"content\":translated_message}]\n",
" translated_history += [{\"role\":\"assistant\", \"content\":translated_reply}]\n",
" \n",
" # Comment out or delete the next line if you'd rather skip Audio for now.\n",
" talker(reply)\n",
"\n",
" return history, update_options(history), history, translated_history, update_options(translated_history), translated_history, gr.update(choices=booked_cities_choices, value=booked_cities)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f714b955-4fb5-47df-805b-79f813f97548",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks() as demo:\n",
" target_lang = \"zh-CN\"\n",
" history_state = gr.State([]) \n",
" translated_history_state = gr.State([])\n",
" booked_cities_choices = [key.lower().capitalize() for key in ticket_prices.keys()]\n",
" booked_cities = []\n",
" model = whisper.load_model(\"base\")\n",
"\n",
" with gr.Row():\n",
" city_checklist = gr.CheckboxGroup(\n",
" label=\"Booked Cities\",\n",
" choices=booked_cities_choices \n",
" )\n",
" \n",
" with gr.Row():\n",
" with gr.Column():\n",
" chatbot = gr.Chatbot(label=\"Chat History\", type=\"messages\")\n",
" selected_msg = gr.Dropdown(label=\"Select message to speak\", choices=[])\n",
" speak_btn = gr.Button(\"Speak\")\n",
"\n",
" with gr.Column():\n",
" translated_chatbot = gr.Chatbot(label=\"Translated Chat History\", type=\"messages\")\n",
" translated_selected_msg = gr.Dropdown(label=\"Select message to speak\", choices=[], interactive=True)\n",
" translated_speak_btn = gr.Button(\"Speak\")\n",
" \n",
" with gr.Row():\n",
" language_dropdown = gr.Dropdown(\n",
" choices=list(LANGUAGES.keys()),\n",
" value=\"Mandarin Chinese\",\n",
" label=\"Translation Language\",\n",
" interactive=True\n",
" )\n",
" \n",
" with gr.Row():\n",
" entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n",
"\n",
" with gr.Row():\n",
" audio_input = gr.Audio(sources=\"microphone\", type=\"numpy\", label=\"Speak with our AI Assistant:\")\n",
" with gr.Row():\n",
" audio_submit = gr.Button(\"Send\")\n",
" \n",
" def do_entry(message, history):\n",
" history += [{\"role\":\"user\", \"content\":message}]\n",
" return \"\", history\n",
" \n",
" language_dropdown.change(fn=update_lang, inputs=[language_dropdown])\n",
"\n",
" speak_btn.click(\n",
" lambda selected: speak(extract_text(selected)),\n",
" inputs=selected_msg,\n",
" outputs=None\n",
" )\n",
"\n",
" translated_speak_btn.click(\n",
" lambda selected: speak(extract_text(selected)),\n",
" inputs=translated_selected_msg,\n",
" outputs=None\n",
" )\n",
"\n",
" entry.submit(do_entry, inputs=[entry, history_state], outputs=[entry, chatbot]).then(\n",
" interact, inputs=[chatbot, translated_chatbot], outputs=[chatbot, selected_msg, history_state, translated_chatbot, translated_selected_msg, translated_history_state, city_checklist]\n",
" )\n",
" \n",
" audio_submit.click(speak_send, inputs=[audio_input, history_state], outputs=[audio_input, chatbot]).then(\n",
" interact, inputs=[chatbot, translated_chatbot], outputs=[chatbot, selected_msg, history_state, translated_chatbot, translated_selected_msg, translated_history_state, city_checklist]\n",
" )\n",
" # clear.click(lambda: None, inputs=None, outputs=chatbot, queue=False)\n",
"\n",
"demo.launch()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
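The `handle_tool_call` function in the airline notebook above branches on the tool name with if/elif. As more tools are added, a dispatch table keeps that logic flat. A hedged sketch of the same pattern; the `TOOLS` mapping and the stub price function are assumptions for illustration, not the notebook's exact code.

```python
import json

def get_ticket_price(destination_city):
    # Stub lookup table for illustration only
    prices = {"london": "$799", "paris": "$899"}
    return prices.get(destination_city.lower(), "Unknown")

# Map each tool name the model may call to the Python function that implements it
TOOLS = {"get_ticket_price": get_ticket_price}

def dispatch_tool_call(name, arguments_json, tool_call_id):
    """Run the named tool and wrap its result as a 'tool'-role message."""
    args = json.loads(arguments_json)  # the API delivers arguments as a JSON string
    handler = TOOLS.get(name)
    if handler is None:
        content = f"No tool handler for {name}"
    else:
        content = json.dumps({"result": handler(**args)})
    return {"role": "tool", "content": content, "tool_call_id": tool_call_id}

reply = dispatch_tool_call("get_ticket_price", '{"destination_city": "Paris"}', "call_1")
```

The returned dict is appended to the message list right after the assistant's tool-call message, then the chat completion is re-run so the model can phrase the final answer.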

View File

@@ -0,0 +1,244 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4bc7863b-ac2d-4d8e-b55d-4d77ce017226",
"metadata": {},
"source": [
"# Conversation among 3 Friends"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de23bb9e-37c5-4377-9a82-d7b6c648eeb6",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import anthropic\n",
"from IPython.display import Markdown, display, update_display\n",
"import google.generativeai\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1179b4c5-cd1f-4131-a876-4c9f3f38d2ba",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
" \n",
"if anthropic_api_key:\n",
" print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n",
"else:\n",
" print(\"Anthropic API Key not set\")\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n",
"else:\n",
" print(\"Google API Key not set\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "797fe7b0-ad43-42d2-acf0-e4f309b112f0",
"metadata": {},
"outputs": [],
"source": [
"# Connect to OpenAI, Anthropic, Gemini\n",
"\n",
"openai = OpenAI()\n",
"\n",
"claude = anthropic.Anthropic()\n",
"\n",
"google.generativeai.configure()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bcb54183-45d3-4d08-b5b6-55e380dfdf1b",
"metadata": {},
"outputs": [],
"source": [
"# Setting up 3 models\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"gemini_model = \"gemini-2.0-flash\"\n",
"\n",
"gpt_system = \"You are Masoud Pezeshkian. Be confrontational. Do not hold back. \\\n",
"Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n",
"\n",
"claude_system = \"You are Benjamin Netanyahu. Do not say anything about your role. Be confrontational. Do not hold back. \\\n",
"Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n",
"\n",
"gemini_system = \"You are Donald Trump. Do not hold back. \\\n",
"Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1df47dc7-b445-4852-b21b-59f0e6c2030f",
"metadata": {},
"outputs": [],
"source": [
"# Define Mas's function - calling ChatGPT\n",
"\n",
"def call_gpt():\n",
" messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
" for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": claude})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" completion = openai.chat.completions.create(\n",
" model=gpt_model,\n",
" messages=messages\n",
" )\n",
" return completion.choices[0].message.content\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d2ed227-48c9-4cad-b146-2c4ecbac9690",
"metadata": {},
"outputs": [],
"source": [
"# Define Bibi's function - calling Claude \n",
"\n",
"def call_claude():\n",
" messages = []\n",
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"content\": gpt})\n",
" messages.append({\"role\": \"user\", \"content\": gemini})\n",
" messages.append({\"role\": \"assistant\", \"content\": claude_message})\n",
" messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n",
" message = claude.messages.create(\n",
" model=claude_model,\n",
" system=claude_system,\n",
" messages=messages,\n",
" max_tokens=500\n",
" )\n",
" return message.content[0].text\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffd44945-5912-4403-9068-70747d8f6708",
"metadata": {},
"outputs": [],
"source": [
"# Define Don's function - calling Gemini\n",
"\n",
"def call_gemini():\n",
" messages = []\n",
" for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n",
" messages.append({\"role\": \"user\", \"parts\": gpt})\n",
" messages.append({\"role\": \"user\", \"parts\": claude_message})\n",
" messages.append({\"role\": \"model\", \"parts\": gemini})\n",
" messages.append({\"role\": \"user\", \"parts\": gpt_messages[-1]})\n",
" messages.append({\"role\": \"user\", \"parts\": claude_messages[-1]})\n",
"\n",
" # Use a distinct name so we don't shadow the 'gemini' loop variable above\n",
" model = google.generativeai.GenerativeModel(\n",
" model_name=gemini_model,\n",
" system_instruction=gemini_system\n",
" )\n",
" \n",
" response = model.generate_content(messages)\n",
" return response.text\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd",
"metadata": {},
"outputs": [],
"source": [
"# The Conversation - 5 rounds\n",
"\n",
"gpt_messages = [\"What the?!\"]\n",
"claude_messages = [\"What?\"]\n",
"gemini_messages = [\"I am so furious!\"]\n",
"\n",
"print(f\"Mas:\\n{gpt_messages[0]}\\n\")\n",
"print(f\"Bibi:\\n{claude_messages[0]}\\n\")\n",
"print(f\"Don:\\n{gemini_messages[0]}\\n\")\n",
"\n",
"for i in range(5):\n",
" gpt_next = call_gpt()\n",
" print(f\"Mas:\\n{gpt_next}\\n\")\n",
" gpt_messages.append(gpt_next)\n",
" \n",
" claude_next = call_claude()\n",
" print(f\"Bibi:\\n{claude_next}\\n\")\n",
" claude_messages.append(claude_next)\n",
"\n",
" gemini_next = call_gemini()\n",
" print(f\"Don:\\n{gemini_next}\\n\")\n",
" gemini_messages.append(gemini_next)\n"
]
},
{
"cell_type": "markdown",
"id": "73680403-3e56-4026-ac72-d12aa388537e",
"metadata": {},
"source": [
"# Claude is not that cooperative in roleplaying despite the explicit prompts - often breaking character. Perhaps due to the sensitive topic."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8ecefd3-b3b9-470d-a98b-5a86f0dce038",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
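The `call_gemini` function above has to reshape the shared history into the google.generativeai format, which expects roles "user" and "model" (not "assistant") and a "parts" field, with the system prompt supplied separately via `system_instruction`. A small converter sketch under those assumptions; the function and variable names are illustrative.

```python
def to_gemini_history(openai_history):
    """Map OpenAI-style {'role', 'content'} dicts to Gemini {'role', 'parts'} dicts."""
    role_map = {"user": "user", "assistant": "model"}
    return [
        {"role": role_map.get(m["role"], "user"), "parts": m["content"]}
        for m in openai_history
        if m["role"] != "system"  # Gemini takes the system prompt separately
    ]

history = [
    {"role": "system", "content": "You are Don."},
    {"role": "user", "content": "What?!"},
    {"role": "assistant", "content": "I am so furious!"},
]
gemini_history = to_gemini_history(history)
```

Keeping this conversion in one helper makes it easy to reuse the same shared transcript across the OpenAI, Anthropic, and Gemini callers.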

View File

@@ -290,12 +290,12 @@
"metadata": {},
"outputs": [],
"source": [
"# If you have access to this, here is the reasoning model o3-mini\n",
"# If you have access to this, here is the reasoning model o4-mini\n",
"# This is trained to think through its response before replying\n",
"# So it will take longer but the answer should be more reasoned - not that this helps..\n",
"\n",
"completion = openai.chat.completions.create(\n",
" model='o3-mini',\n",
" model='o4-mini',\n",
" messages=prompts\n",
")\n",
"print(completion.choices[0].message.content)"
@@ -308,12 +308,12 @@
"metadata": {},
"outputs": [],
"source": [
"# Claude 3.7 Sonnet\n",
"# Claude 4.0 Sonnet\n",
"# API needs system message provided separately from user prompt\n",
"# Also adding max_tokens\n",
"\n",
"message = claude.messages.create(\n",
" model=\"claude-3-7-sonnet-latest\",\n",
" model=\"claude-sonnet-4-20250514\",\n",
" max_tokens=200,\n",
" temperature=0.7,\n",
" system=system_message,\n",
@@ -332,12 +332,12 @@
"metadata": {},
"outputs": [],
"source": [
"# Claude 3.7 Sonnet again\n",
"# Claude 4.0 Sonnet again\n",
"# Now let's add in streaming back results\n",
"# If the streaming looks strange, then please see the note below this cell!\n",
"\n",
"result = claude.messages.stream(\n",
" model=\"claude-3-7-sonnet-latest\",\n",
" model=\"claude-sonnet-4-20250514\",\n",
" max_tokens=200,\n",
" temperature=0.7,\n",
" system=system_message,\n",
@@ -408,12 +408,28 @@
")\n",
"\n",
"response = gemini_via_openai_client.chat.completions.create(\n",
" model=\"gemini-2.5-flash-preview-04-17\",\n",
" model=\"gemini-2.5-flash\",\n",
" messages=prompts\n",
")\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"id": "492f0ff2-8581-4836-bf00-37fddbe120eb",
"metadata": {},
"source": [
"# Sidenote:\n",
"\n",
"This alternative approach of using the client library from OpenAI to connect with other models has become extremely popular in recent months.\n",
"\n",
"So much so, that all the models now support this approach - including Anthropic.\n",
"\n",
"You can read more about this approach, with 4 examples, in the first section of this guide:\n",
"\n",
"https://github.com/ed-donner/agents/blob/main/guides/09_ai_apis_and_ollama.ipynb"
]
},
{
"cell_type": "markdown",
"id": "33f70c88-7ca9-470b-ad55-d93a57dcc0ab",
@@ -583,7 +599,7 @@
"# Have it stream back results in markdown\n",
"\n",
"stream = openai.chat.completions.create(\n",
" model='gpt-4o-mini',\n",
" model='gpt-4.1-mini',\n",
" messages=prompts,\n",
" temperature=0.7,\n",
" stream=True\n",
@@ -634,11 +650,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Let's make a conversation between GPT-4o-mini and Claude-3-haiku\n",
"# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku\n",
"# We're using cheap versions of models so the costs will be minimal\n",
"\n",
"gpt_model = \"gpt-4o-mini\"\n",
"claude_model = \"claude-3-haiku-20240307\"\n",
"gpt_model = \"gpt-4.1-mini\"\n",
"claude_model = \"claude-3-5-haiku-latest\"\n",
"\n",
"gpt_system = \"You are a chatbot who is very argumentative; \\\n",
"you disagree with anything in the conversation and you challenge everything, in a snarky way.\"\n",
@@ -774,6 +790,19 @@
"\n",
"Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.\n",
"\n",
"The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.\n",
"\n",
"Something like:\n",
"\n",
"```python\n",
"user_prompt = f\"\"\"\n",
" You are Alex, in conversation with Blake and Charlie.\n",
" The conversation so far is as follows:\n",
" {conversation}\n",
" Now with this, respond with what you would like to say next, as Alex.\n",
" \"\"\"\n",
"```\n",
"\n",
"Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).\n",
"\n",
"## Additional exercise\n",
@@ -824,7 +853,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
"version": "3.11.13"
}
},
"nbformat": 4,

View File

@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "c44c5494-950d-4d2f-8d4f-b87b57c5b330",
"metadata": {},
"outputs": [],
@@ -35,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "d1715421-cead-400b-99af-986388a97aff",
"metadata": {},
"outputs": [],
@@ -45,10 +45,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "337d5dfc-0181-4e3b-8ab9-e78e0c3f657b",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key exists and begins sk-proj-\n",
"Anthropic API Key exists and begins sk-ant-\n",
"Google API Key exists and begins AIzaSyA5\n"
]
}
],
"source": [
"# Load environment variables in a file called .env\n",
"# Print the key prefixes to help with any debugging\n",
@@ -76,7 +86,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "22586021-1795-4929-8079-63f5bb4edd4c",
"metadata": {},
"outputs": [],
@@ -92,7 +102,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "b16e6021-6dc4-4397-985a-6679d6c8ffd5",
"metadata": {},
"outputs": [],
@@ -104,7 +114,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"id": "02ef9b69-ef31-427d-86d0-b8c799e1c1b1",
"metadata": {},
"outputs": [],
@@ -125,10 +135,21 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"id": "aef7d314-2b13-436b-b02d-8de3b72b193f",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"\"Today's date is October 10, 2023.\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This can reveal the \"training cut off\", or the most recent date in the training data\n",
"\n",
@@ -145,7 +166,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "bc664b7a-c01d-4fea-a1de-ae22cdd5141a",
"metadata": {},
"outputs": [],
@@ -159,20 +180,67 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"id": "083ea451-d3a0-4d13-b599-93ed49b975e4",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shout has been called with input hello\n"
]
},
{
"data": {
"text/plain": [
"'HELLO'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shout(\"hello\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"id": "08f1f15a-122e-4502-b112-6ee2817dda32",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7860\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7860/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The simplicity of gradio. This might appear in \"light mode\" - I'll show you how to make this in dark mode later.\n",
"\n",
@@ -181,10 +249,41 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"id": "c9a359a4-685c-4c99-891c-bb4d1cb7f426",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7861\n",
"* Running on public URL: https://c1f6ab5bdc2722c539.gradio.live\n",
"\n",
"This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"https://c1f6ab5bdc2722c539.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Adding share=True means that it can be accessed publically\n",
"# A more permanent hosting is available using a platform called Spaces from HuggingFace, which we will touch on next week\n",
@@ -195,10 +294,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"id": "cd87533a-ff3a-4188-8998-5bedd5ba2da3",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7862\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7862/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Adding inbrowser=True opens up a new browser window automatically\n",
"\n",
@@ -217,10 +345,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"id": "e8129afa-532b-4b15-b93c-aa9cca23a546",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7863\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7863/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define this variable and then pass js=force_dark_mode when creating the Interface\n",
"\n",
@@ -238,10 +395,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"id": "3cc67b26-dd5f-406d-88f6-2306ee2950c0",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7865\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7865/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Inputs and Outputs\n",
"\n",
@@ -256,10 +442,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "f235288e-63a2-4341-935b-1441f9be969b",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7866\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7866/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# And now - changing the function from \"shout\" to \"message_gpt\"\n",
"\n",
@@ -274,10 +489,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"id": "af9a3262-e626-4e4b-80b0-aca152405e63",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7867\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7867/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's use Markdown\n",
"# Are you wondering why it makes any difference to set system_message when it's not referred to in the code below it?\n",
@@ -297,7 +541,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"id": "88c04ebf-0671-4fea-95c9-bc1565d4bb4f",
"metadata": {},
"outputs": [],
@@ -324,10 +568,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 19,
"id": "0bb1f789-ff11-4cba-ac67-11b815e29d09",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7868\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7868/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"view = gr.Interface(\n",
" fn=stream_gpt,\n",
@@ -340,7 +613,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 20,
"id": "bbc8e930-ba2a-4194-8f7c-044659150626",
"metadata": {},
"outputs": [],
@@ -364,10 +637,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 21,
"id": "a0066ffd-196e-4eaf-ad1e-d492958b62af",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7869\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7869/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"view = gr.Interface(\n",
" fn=stream_claude,\n",
@@ -403,7 +705,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 22,
"id": "0087623a-4e31-470b-b2e6-d8d16fc7bcf5",
"metadata": {},
"outputs": [],
@@ -420,10 +722,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 23,
"id": "8d8ce810-997c-4b6a-bc4f-1fc847ac8855",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7870\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7870/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"view = gr.Interface(\n",
" fn=stream_model,\n",
@@ -466,7 +797,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"id": "1626eb2e-eee8-4183-bda5-1591b58ae3cf",
"metadata": {},
"outputs": [],
@@ -494,7 +825,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 25,
"id": "c701ec17-ecd5-4000-9f68-34634c8ed49d",
"metadata": {},
"outputs": [],
@@ -507,12 +838,13 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 28,
"id": "5def90e0-4343-4f58-9d4a-0e36e445efa4",
"metadata": {},
"outputs": [],
"source": [
"def stream_brochure(company_name, url, model):\n",
" yield \"\"\n",
" prompt = f\"Please generate a company brochure for {company_name}. Here is their landing page:\\n\"\n",
" prompt += Website(url).get_contents()\n",
" if model==\"GPT\":\n",
@@ -526,10 +858,39 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 30,
"id": "66399365-5d67-4984-9d47-93ed26c0bd3d",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Running on local URL: http://127.0.0.1:7873\n",
"* To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7873/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"view = gr.Interface(\n",
" fn=stream_brochure,\n",
@@ -568,7 +929,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
"version": "3.11.13"
}
},
"nbformat": 4,


@@ -301,7 +301,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
"version": "3.11.13"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long


@@ -0,0 +1,381 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "c08309b8-13f0-45bb-a3ea-7b01f05a7346",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import pandas as pd\n",
"import random\n",
"import re\n",
"import subprocess\n",
"import pyarrow as pa\n",
"from typing import List\n",
"import openai\n",
"import anthropic\n",
"from dotenv import load_dotenv\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5efd903-e683-4e7f-8747-2998e23a0751",
"metadata": {},
"outputs": [],
"source": [
"# load API\n",
"load_dotenv(override=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce49b86a-53f4-4d4f-a721-0d66d9c1b070",
"metadata": {},
"outputs": [],
"source": [
"# --- Schema Definition ---\n",
"SCHEMA = [\n",
" (\"Team\", \"TEXT\", '\"Toronto Raptors\"'),\n",
" (\"NAME\", \"TEXT\", '\"Otto Porter Jr.\"'),\n",
" (\"Jersey\", \"TEXT\", '\"10\", or \"NA\" if null'),\n",
" (\"POS\", \"TEXT\", 'One of [\"PF\",\"SF\",\"G\",\"C\",\"SG\",\"F\",\"PG\"]'),\n",
" (\"AGE\", \"INT\", 'integer age in years, e.g., 22'),\n",
" (\"HT\", \"TEXT\", '`6\\' 7\"` or `6\\' 10\"`'),\n",
" (\"WT\", \"TEXT\", '\"232 lbs\"'),\n",
" (\"COLLEGE\", \"TEXT\", '\"Michigan\", or \"--\" if null'),\n",
" (\"SALARY\", \"TEXT\", '\"$9,945,830\", or \"--\" if null')\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93743e57-c2c5-43e5-8fa1-2e242085db07",
"metadata": {},
"outputs": [],
"source": [
"# Default schema text for the textbox\n",
"DEFAULT_SCHEMA_TEXT = \"\\n\".join([f\"{i+1}. {col[0]} ({col[1]}) Example: {col[2]}\" for i, col in enumerate(SCHEMA)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87c58595-6fdd-48f5-a253-ccba352cb385",
"metadata": {},
"outputs": [],
"source": [
"# Available models\n",
"MODELS = [\n",
" \"gpt-4o\",\n",
" \"claude-3-5-haiku-20241022\", \n",
" \"ollama:llama3.2:latest\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08cd9ce2-8685-46b5-95d0-811b8025696f",
"metadata": {},
"outputs": [],
"source": [
"# Available file formats\n",
"FILE_FORMATS = [\".csv\", \".tsv\", \".jsonl\", \".parquet\", \".arrow\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13d68c7f-6f49-4efa-b075-f1e7db2ab527",
"metadata": {},
"outputs": [],
"source": [
"def get_prompt(n: int, schema_text: str, system_prompt: str) -> str:\n",
" prompt = f\"\"\"\n",
"{system_prompt}\n",
"\n",
"Generate {n} rows of realistic basketball player data in JSONL format, each line a JSON object with the following fields:\n",
"\n",
"{schema_text}\n",
"\n",
"Do NOT repeat column values from one row to another.\n",
"\n",
"Only output valid JSONL.\n",
"\"\"\"\n",
" return prompt.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cdc68f1e-4fbe-45dc-aa36-ce5f718ef6ca",
"metadata": {},
"outputs": [],
"source": [
"# --- LLM Interface ---\n",
"def query_model(prompt: str, model: str = \"gpt-4o\") -> List[dict]:\n",
" \"\"\"Call OpenAI, Claude, or Ollama\"\"\"\n",
" try:\n",
" if model.lower().startswith(\"gpt\"):\n",
" client = openai.OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=[{\"role\": \"user\", \"content\": prompt}],\n",
" temperature=0.7\n",
" )\n",
" content = response.choices[0].message.content\n",
"\n",
" elif model.lower().startswith(\"claude\"):\n",
" client = anthropic.Anthropic(api_key=os.getenv(\"ANTHROPIC_API_KEY\"))\n",
" response = client.messages.create(\n",
" model=model,\n",
" messages=[{\"role\": \"user\", \"content\": prompt}],\n",
" max_tokens=4000,\n",
" temperature=0.7\n",
" )\n",
" content = response.content[0].text\n",
"\n",
" elif model.lower().startswith(\"ollama:\"):\n",
" ollama_model = model.split(\":\")[1]\n",
" result = subprocess.run(\n",
" [\"ollama\", \"run\", ollama_model],\n",
" input=prompt,\n",
" text=True,\n",
" capture_output=True\n",
" )\n",
" if result.returncode != 0:\n",
" raise Exception(f\"Ollama error: {result.stderr}\")\n",
" content = result.stdout\n",
" else:\n",
" raise ValueError(\"Unsupported model. Use 'gpt-4.1-mini', 'claude-3-5-haiku-20241022', or 'ollama:llama3.2:latest'\")\n",
"\n",
" # Parse JSONL output\n",
" lines = [line.strip() for line in content.strip().splitlines() if line.strip().startswith(\"{\")]\n",
" return [json.loads(line) for line in lines]\n",
" \n",
" except Exception as e:\n",
" raise Exception(f\"Model query failed: {str(e)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29e3f5f5-e99c-429c-bea9-69d554c58c9c",
"metadata": {},
"outputs": [],
"source": [
"# --- Output Formatter ---\n",
"def save_dataset(records: List[dict], file_format: str, filename: str):\n",
" df = pd.DataFrame(records)\n",
" if file_format == \".csv\":\n",
" df.to_csv(filename, index=False)\n",
" elif file_format == \".tsv\":\n",
" df.to_csv(filename, sep=\"\\t\", index=False)\n",
" elif file_format == \".jsonl\":\n",
" with open(filename, \"w\") as f:\n",
" for record in records:\n",
" f.write(json.dumps(record) + \"\\n\")\n",
" elif file_format == \".parquet\":\n",
" df.to_parquet(filename, engine=\"pyarrow\", index=False)\n",
" elif file_format == \".arrow\":\n",
" table = pa.Table.from_pandas(df)\n",
" with pa.OSFile(filename, \"wb\") as sink:\n",
" with pa.ipc.new_file(sink, table.schema) as writer:\n",
" writer.write(table)\n",
" else:\n",
" raise ValueError(\"Unsupported file format\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe258e84-66f4-4fe7-99c0-75b24148e147",
"metadata": {},
"outputs": [],
"source": [
"# --- Main Generation Function ---\n",
"def generate_dataset(schema_text, system_prompt, model, nr_records, file_format, save_as):\n",
" try:\n",
" # Validation\n",
" if nr_records <= 10:\n",
" return \"❌ Error: Nr_records must be greater than 10.\", None\n",
" \n",
" if file_format not in FILE_FORMATS:\n",
" return \"❌ Error: Invalid file format specified.\", None\n",
" \n",
" if not save_as or save_as.strip() == \"\":\n",
" save_as = f\"basketball_dataset{file_format}\"\n",
" elif not save_as.endswith(file_format):\n",
" save_as = save_as + file_format\n",
" \n",
" # Generate prompt\n",
" prompt = get_prompt(nr_records, schema_text, system_prompt)\n",
" \n",
" # Query model\n",
" records = query_model(prompt, model=model)\n",
" \n",
" if not records:\n",
" return \"❌ Error: No valid records generated from the model.\", None\n",
" \n",
" # Save dataset\n",
" save_dataset(records, file_format, save_as)\n",
" \n",
" # Create preview\n",
" df = pd.DataFrame(records)\n",
" preview = df.head(10) # Show first 10 rows\n",
" \n",
" success_message = f\"✅ Dataset generated successfully!\\n📁 Saved to: {save_as}\\n📊 Generated {len(records)} records\"\n",
" \n",
" return success_message, preview\n",
" \n",
" except Exception as e:\n",
" return f\"❌ Error: {str(e)}\", None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2405a9d-b4cd-43d9-82f6-ff3512b4541f",
"metadata": {},
"outputs": [],
"source": [
"# --- Gradio Interface ---\n",
"def create_interface():\n",
" with gr.Blocks(title=\"Dataset Generator\", theme=gr.themes.Soft()) as interface:\n",
" gr.Markdown(\"# Dataset Generator\")\n",
" gr.Markdown(\"Generate realistic datasets using AI models\")\n",
" \n",
" with gr.Row():\n",
" with gr.Column(scale=2):\n",
" schema_input = gr.Textbox(\n",
" label=\"Schema\",\n",
" value=DEFAULT_SCHEMA_TEXT,\n",
" lines=15,\n",
" placeholder=\"Define your dataset schema here...\"\n",
" )\n",
" \n",
" system_prompt_input = gr.Textbox(\n",
" label=\"Prompt\",\n",
" value=\"You are a helpful assistant that generates realistic basketball player data.\",\n",
" lines=1,\n",
" placeholder=\"Enter system prompt for the model...\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" model_dropdown = gr.Dropdown(\n",
" label=\"Model\",\n",
" choices=MODELS,\n",
" value=MODELS[1], # Default to Claude\n",
" interactive=True\n",
" )\n",
" \n",
" nr_records_input = gr.Number(\n",
" label=\"Nr. records\",\n",
" value=25,\n",
" minimum=11,\n",
" maximum=1000,\n",
" step=1\n",
" )\n",
" \n",
" with gr.Row():\n",
" file_format_dropdown = gr.Dropdown(\n",
" label=\"File format\",\n",
" choices=FILE_FORMATS,\n",
" value=\".csv\",\n",
" interactive=True\n",
" )\n",
" \n",
" save_as_input = gr.Textbox(\n",
" label=\"Save as\",\n",
" value=\"basketball_dataset\",\n",
" placeholder=\"Enter filename (extension will be added automatically)\"\n",
" )\n",
" \n",
" generate_btn = gr.Button(\"🚀 Generate\", variant=\"primary\", size=\"lg\")\n",
" \n",
" with gr.Column(scale=1):\n",
" output_status = gr.Textbox(\n",
" label=\"Status\",\n",
" lines=4,\n",
" interactive=False\n",
" )\n",
" \n",
" output_preview = gr.Dataframe(\n",
" label=\"Preview (First 10 rows)\",\n",
" interactive=False,\n",
" wrap=True\n",
" )\n",
" \n",
" # Connect the generate button\n",
" generate_btn.click(\n",
" fn=generate_dataset,\n",
" inputs=[\n",
" schema_input,\n",
" system_prompt_input, \n",
" model_dropdown,\n",
" nr_records_input,\n",
" file_format_dropdown,\n",
" save_as_input\n",
" ],\n",
" outputs=[output_status, output_preview]\n",
" )\n",
" \n",
" gr.Markdown(\"\"\"\n",
" ### 📝 Instructions:\n",
" 1. **Schema**: Define the structure of your dataset (pre-filled with basketball player schema)\n",
" 2. **Prompt**: System prompt to guide the AI model\n",
" 3. **Model**: Choose between GPT, Claude, or Ollama models\n",
" 4. **Nr. records**: Number of records to generate (minimum 11)\n",
" 5. **File format**: Choose output format (.csv, .tsv, .jsonl, .parquet, .arrow)\n",
" 6. **Save as**: Filename (extension added automatically)\n",
" 7. Click **Generate** to create your dataset\n",
" \n",
" ### 🔧 Requirements:\n",
" - Set up your API keys in `.env` file (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)\n",
" - For Ollama models, ensure Ollama is installed and running locally\n",
" \"\"\")\n",
" \n",
" return interface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50fd2b91-2578-4224-b9dd-e28caf6a0a85",
"metadata": {},
"outputs": [],
"source": [
"interface = create_interface()\n",
"interface.launch(inbrowser=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
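Both dataset-generator notebooks recover structured rows from the model reply the same way: keep only the lines that start with `{` and parse each one as JSON. A standalone sketch of that parsing step, assuming a plain string reply (`parse_jsonl_reply` and the sample reply are illustrative names, not from the notebooks):

```python
import json

def parse_jsonl_reply(content: str) -> list[dict]:
    # Keep only lines that start with "{" so surrounding prose such as
    # "Here are your records:" is ignored before parsing.
    lines = [line.strip() for line in content.strip().splitlines()
             if line.strip().startswith("{")]
    return [json.loads(line) for line in lines]

reply = """Here are your records:
{"Team": "Toronto Raptors", "NAME": "Otto Porter Jr."}
{"Team": "Chicago Bulls", "NAME": "Zach LaVine"}
Hope this helps!"""
print(parse_jsonl_reply(reply))
```

A line that starts with `{` but is not valid JSON raises `json.JSONDecodeError`; the notebooks surface that through the try/except wrappers around their query functions.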


@@ -0,0 +1,551 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "GD5Omr5EfWgb"
},
"source": [
"# Date Generator\n",
"\n",
"generate synthetic data when given scheme, business problem description, model, number of records, file name, file type, and environment\n",
"\n",
"# Available models\n",
" Model API:\n",
"\n",
" 1. gpt-4o-mini\n",
" 2. claude-3-haiku-20240307\n",
" 3. gemini-2.0-flash\n",
" 4. deepseek-chat\"\n",
"\n",
" HuggingFace API:\n",
"\n",
" 5. meta-llama/Meta-Llama-3.1-8B-Instruct\n",
"\n",
"\n",
"# Available environment\n",
"\n",
"Colab: set up HF token and API keys in Colab secret section\n",
"\n",
"Local: set up HF token and API keys in .env file\n",
"\n",
"\n",
"\n",
"### *** This project is developed based on the idea of 'week3/community-contributuins/Week3-Dataset_Generator-DP'. Really appreciate it! Then, the project is improved to run both on Colab or locally, and integrate HuggingFace API"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4FiCnE0MmU56"
},
"outputs": [],
"source": [
"!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124\n",
"!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0\n",
"!pip install anthropic dotenv pyarrow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JeyKw5guoH3r"
},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n",
"from huggingface_hub import login\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"from bs4 import BeautifulSoup\n",
"from typing import List\n",
"import google.generativeai\n",
"import anthropic\n",
"from itertools import chain\n",
"from dotenv import load_dotenv\n",
"import gradio as gr\n",
"import json\n",
"import pandas as pd\n",
"import random\n",
"import re\n",
"import subprocess\n",
"import pyarrow as pa\n",
"import torch\n",
"import gc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7UyjFdRZoIAS"
},
"outputs": [],
"source": [
"# --- Schema Definition ---\n",
"SCHEMA = [\n",
" (\"Name\", \"TEXT\", '\"Northern Cafe\"'),\n",
" (\"Location\", \"TEXT\", '\"2904 S Figueroa St, Los Angeles, CA 90007\"'),\n",
" (\"Type\", \"TEXT\", 'One of [\"Chinese\",\"Mexico\",\"French\",\"Korean\",\"Italy\"] or other potential types'),\n",
" (\"Average Price\", \"TEXT\", '\"$30\", or \"--\" if unkown'),\n",
" (\"History/Age\", \"INT\", 'integer age of resturant, e.g., 7'),\n",
" (\"Menu\", \"Array\", '[\"Beef Noodle\", \"Fried Rice\", \"Dumpling\", ...]'),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jXcTQATLoICV"
},
"outputs": [],
"source": [
"# Default schema text for the textbox\n",
"DEFAULT_SCHEMA_TEXT = \"\\n\".join([f\"{i+1}. {col[0]} ({col[1]}) Example: {col[2]}\" for i, col in enumerate(SCHEMA)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4Irf5JV3oIEe"
},
"outputs": [],
"source": [
"# Available models\n",
"MODELS = [\n",
" \"gpt-4o-mini\",\n",
" \"claude-3-haiku-20240307\",\n",
" \"gemini-2.0-flash\",\n",
" \"deepseek-chat\",\n",
" \"meta-llama/Meta-Llama-3.1-8B-Instruct\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JJ6r2SH9oIGf"
},
"outputs": [],
"source": [
"# Available file formats\n",
"FILE_FORMATS = [\".csv\", \".tsv\", \".jsonl\", \".parquet\", \".arrow\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B98j45E3vq5g"
},
"outputs": [],
"source": [
"system_prompt = \"\"\"You are a helpful assistant whose main purpose is to generate datasets for a given business problem based on given schema.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lsX16cWfwf6x"
},
"outputs": [],
"source": [
"def get_env_info(env):\n",
" try:\n",
" global hf_token, openai_api_key, anthropic_api_key, google_api_key, deepseek_api_key\n",
" if env == \"Colab\":\n",
" # Colab environment\n",
" from google.colab import drive\n",
" from google.colab import userdata\n",
" hf_token = userdata.get('HF_TOKEN')\n",
" openai_api_key = userdata.get('OPENAI_API_KEY')\n",
" anthropic_api_key = userdata.get('ANTHROPIC_API_KEY')\n",
" google_api_key = userdata.get('GOOGLE_API_KEY')\n",
" deepseek_api_key = userdata.get('DEEPSEEK_API_KEY')\n",
" elif env == \"Local\":\n",
" # Local environment\n",
" load_dotenv(override=True)\n",
" hf_token = os.getenv('HF_TOKEN')\n",
" openai_api_key = os.getenv('OPENAI_API_KEY')\n",
" anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
" google_api_key = os.getenv('GOOGLE_API_KEY')\n",
" deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')\n",
" except Exception as e:\n",
" raise Exception(f\"Please check your environment: {str(e)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2gLUFAwGv29Q"
},
"outputs": [],
"source": [
"def get_prompt(schema_text, business_problem, nr_records):\n",
" prompt = f\"\"\"\n",
" The problem is: {business_problem}\n",
"\n",
" Generate {nr_records} rows data in JSONL format, each line a JSON object with the following fields:\n",
"\n",
" {schema_text}\n",
"\n",
" Do NOT repeat column values from one row to another.\n",
"\n",
" Only output valid JSONL.\n",
" \"\"\"\n",
" return prompt.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YZe1FVH8wf84"
},
"outputs": [],
"source": [
"# --- LLM Interface ---\n",
"def query(user_prompt, model):\n",
" try:\n",
" if \"gpt\" in model.lower():\n",
" client = OpenAI(api_key=openai_api_key)\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0.7\n",
" )\n",
" content = response.choices[0].message.content\n",
"\n",
" elif \"claude\" in model.lower():\n",
" client = anthropic.Anthropic(api_key=anthropic_api_key)\n",
" response = client.messages.create(\n",
" model=model,\n",
" messages=[{\"role\": \"user\", \"content\": user_prompt}],\n",
" max_tokens=4000,\n",
" temperature=0.7,\n",
" system=system_prompt\n",
" )\n",
" content = response.content[0].text\n",
" elif \"gemini\" in model.lower():\n",
" client = OpenAI(\n",
" api_key=google_api_key,\n",
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
" )\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0.7\n",
" )\n",
" content = response.choices[0].message.content\n",
"\n",
" elif \"deepseek\" in model.lower():\n",
" client = OpenAI(\n",
" api_key=deepseek_api_key,\n",
" base_url=\"https://api.deepseek.com\"\n",
" )\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0.7\n",
" )\n",
" content = response.choices[0].message.content\n",
"\n",
" elif \"llama\" in model.lower():\n",
" global tokenizer, inputs, llama_model, outputs\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
"\n",
" login(hf_token, add_to_git_credential=True)\n",
" quant_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_compute_dtype=torch.bfloat16,\n",
" bnb_4bit_quant_type=\"nf4\"\n",
" )\n",
"\n",
" tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)\n",
" tokenizer.pad_token = tokenizer.eos_token\n",
" inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
" if llama_model == None:\n",
" llama_model = AutoModelForCausalLM.from_pretrained(model, device_map=\"auto\", quantization_config=quant_config)\n",
" outputs = llama_model.generate(inputs, max_new_tokens=4000)\n",
"\n",
" _, _, after = tokenizer.decode(outputs[0]).partition(\"assistant<|end_header_id|>\")\n",
" content = after.strip()\n",
" else:\n",
" raise ValueError(f\"Unsupported model. Use one of {MODELS}\")\n",
"\n",
" # Parse JSONL output\n",
" lines = [line.strip() for line in content.strip().splitlines() if line.strip().startswith(\"{\")]\n",
" return [json.loads(line) for line in lines]\n",
"\n",
" except Exception as e:\n",
" raise Exception(f\"Model query failed: {str(e)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4WUj-XqM5IYT"
},
"outputs": [],
"source": [
"# --- Output Formatter ---\n",
"def save_dataset(records, file_format, filename):\n",
" df = pd.DataFrame(records)\n",
" if file_format == \".csv\":\n",
" df.to_csv(filename, index=False)\n",
" elif file_format == \".tsv\":\n",
" df.to_csv(filename, sep=\"\\t\", index=False)\n",
" elif file_format == \".jsonl\":\n",
" with open(filename, \"w\") as f:\n",
" for record in records:\n",
" f.write(json.dumps(record) + \"\\n\")\n",
" elif file_format == \".parquet\":\n",
" df.to_parquet(filename, engine=\"pyarrow\", index=False)\n",
" elif file_format == \".arrow\":\n",
" table = pa.Table.from_pandas(df)\n",
" with pa.OSFile(filename, \"wb\") as sink:\n",
" with pa.ipc.new_file(sink, table.schema) as writer:\n",
" writer.write(table)\n",
" else:\n",
" raise ValueError(\"Unsupported file format\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WenbNqrpwf-_"
},
"outputs": [],
"source": [
"# --- Main Generation Function ---\n",
"def generate_dataset(schema_text, business_problem, model, nr_records, file_format, save_as, env):\n",
" try:\n",
" # Validation\n",
" if nr_records <= 10:\n",
" return \"❌ Error: Number of records must be greater than 10.\", None\n",
" if nr_records > 1000:\n",
" return \"❌ Error: Number of records must be less than or equal to 1000.\", None\n",
"\n",
" if file_format not in FILE_FORMATS:\n",
" return \"❌ Error: Invalid file format.\", None\n",
"\n",
" if not (save_as or save_as.strip() == \"\"):\n",
" save_as = f\"default{file_format}\"\n",
" elif not save_as.endswith(file_format):\n",
" save_as = save_as + file_format\n",
"\n",
" # Load env\n",
" get_env_info(env)\n",
"\n",
" # Generate prompt\n",
" user_prompt = get_prompt(schema_text, business_problem, nr_records)\n",
"\n",
" # Query model\n",
" records = query(user_prompt, model)\n",
"\n",
" if not records:\n",
" return \"❌ Error: No valid records generated from the model.\", None\n",
"\n",
" # Save dataset\n",
" save_dataset(records, file_format, save_as)\n",
"\n",
" # Create preview\n",
" df = pd.DataFrame(records)\n",
" preview = df.head(10) # Show first 10 rows\n",
"\n",
" success_message = f\"✅ Generated {len(records)} records successfully!\\n📁 Saved to: {save_as}\\n📊 \"\n",
"\n",
" return success_message, preview\n",
"\n",
" except Exception as e:\n",
" return f\"❌ Error: {str(e)}\", None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pHiP8ky8wgEb"
},
"outputs": [],
"source": [
"# --- Gradio Interface ---\n",
"\n",
"with gr.Blocks(title=\"Dataset Generator\", theme=gr.themes.Citrus()) as interface:\n",
" hf_token = None\n",
" openai_api_key = None\n",
" anthropic_api_key = None\n",
" google_api_key = None\n",
" deepseek_api_key = None\n",
" tokenizer = None\n",
" inputs = None\n",
" llama_model = None\n",
" outputs = None\n",
"\n",
" gr.Markdown(\"# Dataset Generator\")\n",
" gr.Markdown(\"Generate synthetic datasets using AI models\")\n",
"\n",
" with gr.Row():\n",
" with gr.Column(scale=2):\n",
" schema_input = gr.Textbox(\n",
" label=\"Schema\",\n",
" value=DEFAULT_SCHEMA_TEXT,\n",
" lines=15,\n",
" placeholder=\"Define your dataset schema here... Please follow this format: Field_Name, Field_Type, Field Example\"\n",
" )\n",
"\n",
" business_problem_input = gr.Textbox(\n",
" label=\"Business Problem\",\n",
" value=\"I want to generate restuant records\",\n",
" lines=1,\n",
" placeholder=\"Enter business problem desciption for the model...\"\n",
" )\n",
"\n",
" with gr.Row():\n",
" model_dropdown = gr.Dropdown(\n",
" label=\"Model\",\n",
" choices=MODELS,\n",
" value=MODELS[0],\n",
" interactive=True\n",
" )\n",
"\n",
" nr_records_input = gr.Number(\n",
" label=\"Number of records\",\n",
" value=27,\n",
" minimum=11,\n",
" maximum=1000,\n",
" step=1\n",
" )\n",
"\n",
" with gr.Row():\n",
" save_as_input = gr.Textbox(\n",
" label=\"Save as\",\n",
" value=\"restaurant_dataset\",\n",
" placeholder=\"Enter filename (extension will be added automatically)\"\n",
" )\n",
"\n",
" file_format_dropdown = gr.Dropdown(\n",
" label=\"File format\",\n",
" choices=FILE_FORMATS,\n",
" value=FILE_FORMATS[0],\n",
" interactive=True\n",
" )\n",
"\n",
" env_dropdown = gr.Dropdown(\n",
" label=\"Environment\",\n",
" choices=[\"Colab\", \"Local\"],\n",
" value=\"Colab\",\n",
" interactive=True\n",
" )\n",
"\n",
"\n",
"\n",
" generate_btn = gr.Button(\"🚀 Generate\", variant=\"secondary\", size=\"lg\")\n",
"\n",
" with gr.Column(scale=1):\n",
" output_status = gr.Textbox(\n",
" label=\"Status\",\n",
" lines=4,\n",
" interactive=False\n",
" )\n",
"\n",
" output_preview = gr.Dataframe(\n",
" label=\"Preview (First 10 rows)\",\n",
" interactive=False,\n",
" wrap=True\n",
" )\n",
"\n",
" # Connect the generate button\n",
" generate_btn.click(\n",
" fn=generate_dataset,\n",
" inputs=[\n",
" schema_input,\n",
" business_problem_input,\n",
" model_dropdown,\n",
" nr_records_input,\n",
" file_format_dropdown,\n",
" save_as_input,\n",
" env_dropdown\n",
" ],\n",
" outputs=[output_status, output_preview]\n",
" )\n",
"\n",
" gr.Markdown(\"\"\"\n",
" ### 📝 Instructions:\n",
" 1. **Schema**: Define the structure of your dataset (pre-filled with restaurant schema)\n",
" 2. **Business problem**: User prompt to guide the AI model\n",
" 3. **Model**: Choose between GPT, Claude, Gemini, DeepSeek or Llama models\n",
" 4. **Number of records**: Number of records to generate (minimum 11)\n",
" 5. **File format**: Choose output format (.csv, .tsv, .jsonl, .parquet, .arrow)\n",
" 6. **Save as**: Filename (extension added automatically)\n",
" 7. Click **Generate** to create your dataset\n",
"\n",
" ### 🔧 Requirements:\n",
" - For local mode, set up HF token and API keys in `.env` file (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `DEEPSEEK_API_KEY`, `HF_TOKEN`)\n",
" - For colab mode, set up HF token and API keys in Colab secret section (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `DEEPSEEK_API_KEY`, `HF_TOKEN`)\n",
" \"\"\")\n",
"\n",
"interface.launch(debug=True)\n",
"\n",
"del tokenizer, inputs, llama_model, outputs\n",
"gc.collect()\n",
"torch.cuda.empty_cache()"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,523 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "It89APiAtTUF"
},
"source": [
"# Create meeting minutes from an Audio file\n",
"\n",
"I downloaded some Denver City Council meeting minutes and selected a portion of the meeting for us to transcribe. You can download it here: \n",
"https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n",
"\n",
"If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).\n",
"\n",
"The goal of this product is to use the Audio to generate meeting minutes, including actions.\n",
"\n",
"For this project, you can either use the Denver meeting minutes, or you can record something of your own!\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sJPSCwPX3MOV"
},
"source": [
"## Again - please note: 2 important pro-tips for using Colab:\n",
"\n",
"**Pro-tip 1:**\n",
"\n",
"The top of every colab has some pip installs. You may receive errors from pip when you run this, such as:\n",
"\n",
"> gcsfs 2025.3.2 requires fsspec==2025.3.2, but you have fsspec 2025.3.0 which is incompatible.\n",
"\n",
"These pip compatibility errors can be safely ignored; and while it's tempting to try to fix them by changing version numbers, that will actually introduce real problems!\n",
"\n",
"**Pro-tip 2:**\n",
"\n",
"In the middle of running a Colab, you might get an error like this:\n",
"\n",
"> Runtime error: CUDA is required but not available for bitsandbytes. Please consider installing [...]\n",
"\n",
"This is a super-misleading error message! Please don't try changing versions of packages...\n",
"\n",
"This actually happens because Google has switched out your Colab runtime, perhaps because Google Colab was too busy. The solution is:\n",
"\n",
"1. Kernel menu >> Disconnect and delete runtime\n",
"2. Reload the colab from fresh and Edit menu >> Clear All Outputs\n",
"3. Connect to a new T4 using the button at the top right\n",
"4. Select \"View resources\" from the menu on the top right to confirm you have a GPU\n",
"5. Rerun the cells in the colab, from the top down, starting with the pip installs\n",
"\n",
"And all should work great - otherwise, ask me!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f2vvgnFpHpID"
},
"outputs": [],
"source": [
"!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124\n",
"!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FW8nl3XRFrz0"
},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import requests\n",
"from IPython.display import Markdown, display, update_display\n",
"from openai import OpenAI\n",
"from google.colab import drive\n",
"from huggingface_hub import login\n",
"from google.colab import userdata\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "q3D1_T0uG_Qh"
},
"outputs": [],
"source": [
"# Constants\n",
"\n",
"AUDIO_MODEL = \"whisper-1\"\n",
"LLAMA = \"meta-llama/Meta-Llama-3.1-8B-Instruct\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Es9GkQ0FGCMt"
},
"outputs": [],
"source": [
"# New capability - connect this Colab to my Google Drive\n",
"# See immediately below this for instructions to obtain denver_extract.mp3\n",
"\n",
"drive.mount(\"/content/drive\")\n",
"audio_filename = \"/content/drive/MyDrive/llms/denver_extract.mp3\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HTl3mcjyzIEE"
},
"source": [
"# Download denver_extract.mp3\n",
"\n",
"You can either use the same file as me, the extract from Denver city council minutes, or you can try your own..\n",
"\n",
"If you want to use the same as me, then please download my extract here, and put this on your Google Drive: \n",
"https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xYW8kQYtF-3L"
},
"outputs": [],
"source": [
"# Sign in to HuggingFace Hub\n",
"\n",
"hf_token = userdata.get('HF_TOKEN')\n",
"login(hf_token, add_to_git_credential=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qP6OB2OeGC2C"
},
"outputs": [],
"source": [
"# Sign in to OpenAI using Secrets in Colab\n",
"\n",
"openai_api_key = userdata.get('OPENAI_API_KEY')\n",
"openai = OpenAI(api_key=openai_api_key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GMShdVGlGGr4"
},
"outputs": [],
"source": [
"# Use the Whisper OpenAI model to convert the Audio to Text\n",
"# If you'd prefer to use an Open Source model, class student Youssef has contributed an open source version\n",
"# which I've added to the bottom of this colab\n",
"\n",
"audio_file = open(audio_filename, \"rb\")\n",
"transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format=\"text\")\n",
"print(transcription)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "piEMmcSfMH-O"
},
"outputs": [],
"source": [
"system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n",
"user_prompt = f\"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n",
"\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UcRKUgcxMew6"
},
"outputs": [],
"source": [
"quant_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_compute_dtype=torch.bfloat16,\n",
" bnb_4bit_quant_type=\"nf4\"\n",
")"
]
},
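To see why the 4-bit NF4 config matters on a T4 (16 GB), here is a rough back-of-envelope for an 8B-parameter model — illustrative arithmetic only, ignoring activations and the KV cache:

```python
def approx_weight_gb(n_params, bits_per_param):
    # parameters * bits -> bytes -> GB (1e9 bytes per GB, a deliberately rough figure)
    return n_params * bits_per_param / 8 / 1e9

print(approx_weight_gb(8e9, 16))  # fp16 weights: ~16 GB, too big for a T4
print(approx_weight_gb(8e9, 4))   # 4-bit weights: ~4 GB, comfortably fits
```

The double quantization and bfloat16 compute dtype add some overhead on top of this, but the headline saving comes from the 16-to-4 bit reduction in stored weights.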
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6CujZRAgMimy"
},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n",
"tokenizer.pad_token = tokenizer.eos_token\n",
"# inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
"streamer = TextStreamer(tokenizer)\n",
"model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config, trust_remote_code=True)\n",
"# outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MaLNmJ5PSqcH"
},
"outputs": [],
"source": [
"inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
"outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "102tdU_3Peam"
},
"outputs": [],
"source": [
"response = tokenizer.decode(outputs[0])\n",
"response"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KlomN6CwMdoN"
},
"outputs": [],
"source": [
"display(Markdown(response))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0jZElVOMSPAr"
},
"source": [
"Day5 exercise - Gradio UI for meeting minutes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5iiYYxQMHf0i"
},
"outputs": [],
"source": [
"import gradio as gr\n",
"import tempfile\n",
"import soundfile as sf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "aGwXW7BjPcTM"
},
"outputs": [],
"source": [
"# !pip install pydub\n",
"# !apt-get install ffmpeg\n",
"\n",
"from pydub import AudioSegment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RNu-reHuCYj_"
},
"outputs": [],
"source": [
"# Make sure that the tokenizeer and model is already generated\n",
"\n",
"# tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n",
"# tokenizer.pad_token = tokenizer.eos_token\n",
"# streamer = TextStreamer(tokenizer)\n",
"# model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KOuoH0YOPruE"
},
"outputs": [],
"source": [
"# def save_as_mp3(audio_np):\n",
"# sr, data = audio_np\n",
"# # Convert float32 or int16 to PCM wav and then mp3\n",
"# wav_path = tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False).name\n",
"# mp3_path = tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False).name\n",
"\n",
"# sf.write(wav_path, data, sr)\n",
"# audio_segment = AudioSegment.from_wav(wav_path)\n",
"# audio_segment.export(mp3_path, format=\"mp3\", bitrate=\"64k\") # Low bitrate = small file\n",
"# return mp3_path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "toBIPBJoSNw0"
},
"outputs": [],
"source": [
"# Handles audio input as numpy array and returns updated chat history\n",
"def speak_send(audio_np):\n",
"\n",
" # If use numpy as input: audio_input = gr.Audio(sources=\"upload\", type=\"numpy\", label=\"Upload audio file to generate meeting minutes\")\n",
" # mp3_path = save_as_mp3(audio_np)\n",
"\n",
" # with open(mp3_path, \"rb\") as audio_file:\n",
" # transcription = openai.audio.transcriptions.create(\n",
" # model=AUDIO_MODEL,\n",
" # file=audio_file,\n",
" # response_format=\"text\"\n",
" # )\n",
"\n",
" audio = AudioSegment.from_file(audio_np)\n",
" with tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False) as tmpfile:\n",
" audio.export(tmpfile.name, format=\"mp3\")\n",
" with open(tmpfile.name, \"rb\") as file:\n",
" transcript = openai.audio.transcriptions.create(\n",
" model=AUDIO_MODEL,\n",
" file=file,\n",
" response_format=\"text\"\n",
" )\n",
"\n",
" system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n",
" user_prompt = f\"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n",
"\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt}\n",
" ]\n",
"\n",
" inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
" outputs = model.generate(inputs, max_new_tokens=2000)\n",
"\n",
" _, _, after = tokenizer.decode(outputs[0]).partition(\"assistant<|end_header_id|>\")\n",
" return after.strip()\n"
]
},
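The `partition` call at the end of `speak_send` does the response cleanup; isolated as a sketch with a dummy decoded string (the header token is the Llama 3 chat-template convention):

```python
def extract_assistant_reply(decoded):
    # Keep only the text after the first assistant header in the decoded output
    _, _, after = decoded.partition("assistant<|end_header_id|>")
    return after.strip()

sample = "<|begin_of_text|>...prompt text...assistant<|end_header_id|>\n\n# Minutes\n- Action: follow up"
print(extract_assistant_reply(sample))
```

`str.partition` returns an empty third element when the separator is absent, so the helper degrades to an empty string rather than raising.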
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xXJfabpDSN5R"
},
"outputs": [],
"source": [
"with gr.Blocks() as demo:\n",
"\n",
" with gr.Row():\n",
" audio_input = gr.Audio(sources=\"upload\", type=\"filepath\", label=\"Upload audio file to generate meeting minutes\")\n",
" with gr.Row():\n",
" audio_submit = gr.Button(\"Send\")\n",
" with gr.Row():\n",
" outputs = [gr.Markdown(label=\"Meeting minutes:\")]\n",
"\n",
" audio_submit.click(speak_send, inputs=audio_input, outputs=outputs)\n",
"\n",
"demo.launch(debug=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kuxYecT2QDQ9"
},
"source": [
"# Student contribution\n",
"\n",
"Student Emad S. has made this powerful variation that uses `TextIteratorStreamer` to stream back results into a Gradio UI, and takes advantage of background threads for performance! I'm sharing it here if you'd like to take a look at some very interesting work. Thank you, Emad!\n",
"\n",
"https://colab.research.google.com/drive/1Ja5zyniyJo5y8s1LKeCTSkB2xyDPOt6D"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AU3uAEyU3a-o"
},
"source": [
"## Alternative implementation\n",
"\n",
"Class student Youssef has contributed this variation in which we use an open-source model to transcribe the meeting Audio.\n",
"\n",
"Thank you Youssef!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "phYYgAbBRvu5"
},
"outputs": [],
"source": [
"import torch\n",
"from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HdQnWEzW3lzP"
},
"outputs": [],
"source": [
"AUDIO_MODEL = \"openai/whisper-medium\"\n",
"speech_model = AutoModelForSpeechSeq2Seq.from_pretrained(AUDIO_MODEL, torch_dtype=torch.float16, low_cpu_mem_usage=True, use_safetensors=True)\n",
"speech_model.to('cuda')\n",
"processor = AutoProcessor.from_pretrained(AUDIO_MODEL)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZhA_fbeCSAeZ"
},
"outputs": [],
"source": [
"pipe = pipeline(\n",
" \"automatic-speech-recognition\",\n",
" model=speech_model,\n",
" tokenizer=processor.tokenizer,\n",
" feature_extractor=processor.feature_extractor,\n",
" torch_dtype=torch.float16,\n",
" device='cuda',\n",
" return_timestamps=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nrQjKtD53omJ"
},
"outputs": [],
"source": [
"# Use the Whisper OpenAI model to convert the Audio to Text\n",
"result = pipe(audio_filename)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "G_XSljOY3tDf"
},
"outputs": [],
"source": [
"transcription = result[\"text\"]\n",
"print(transcription)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,287 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "zmpDFA3bGEHY"
},
"source": [
"Minute creator in Gradio from day 5 of week 3.\n",
"A couple of points to note:\n",
"\n",
"\n",
"* My access to llama hasn't been approved on Hugging Face and so I've experimented with some of the other models.\n",
"* There is a fair bit of debugging code in the main function as I was getting an error and couldn't find it. I've left it in just in case its useful for others trying to debug their code.\n",
"* I was debugging with the help of Claude. It suggested using <with torch.no_grad()> for the minute output. The rationale is that it disables gradient computation which isn't necessary for inference and I found it did speed things up.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l-5xKLFeJUGz"
},
"outputs": [],
"source": [
"!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Wi-bBD9VdBMo"
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display\n",
"from google.colab import drive\n",
"from huggingface_hub import login\n",
"from google.colab import userdata\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"import torch\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-0O-kuWtdk4I"
},
"outputs": [],
"source": [
"# keys\n",
"\n",
"#openai\n",
"openai_api_key = userdata.get('OPENAI_API_KEY')\n",
"openai = OpenAI(api_key=openai_api_key)\n",
"\n",
"#hf\n",
"hf_token = userdata.get('HF_TOKEN')\n",
"login(hf_token, add_to_git_credential=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "u6v3Ecileg1H"
},
"outputs": [],
"source": [
"# constants\n",
"\n",
"AUDIO_MODEL = 'gpt-4o-transcribe'\n",
"OPENAI_MODEL = 'gpt-4o-mini'\n",
"QWEN2_MODEL = 'Qwen/Qwen2.5-7B-Instruct' # runs slowly no matter what size gpu - kept crashing on ram!\n",
"GEMMA2_MODEL = \"google/gemma-2-2b-it\" # doesn't use a system prompt\n",
"PHI3 = \"microsoft/Phi-3-mini-4k-instruct\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3nSfA_KhfY38"
},
"outputs": [],
"source": [
"# convert audio to text\n",
"\n",
"def transcribe_audio(audio_file_path):\n",
" try:\n",
" with open (audio_file_path, 'rb') as audio_file:\n",
" transcript = openai.audio.transcriptions.create(model = AUDIO_MODEL, file = audio_file, response_format=\"text\")\n",
" return transcript\n",
" except Exception as e:\n",
" return f\"An error occurred: {str(e)}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OVmlY3DGgnYc"
},
"outputs": [],
"source": [
"# use transcript to create minutes\n",
"# use open source model\n",
"\n",
"def create_minutes(transcript):\n",
"\n",
" # first try is for debugging\n",
" try:\n",
" print(f\"Starting to create minutes with transcript length: {len(str(transcript))}\")\n",
"\n",
" if not transcript or len(str(transcript).strip()) == 0:\n",
" return \"Error: Empty or invalid transcript\"\n",
"\n",
" #messages\n",
" system_prompt = \"You are an expert creator of meeting minutes. Based on a meeting transcript you can summarise the meeting title and date, attendees, key discussion points, key outcomes, actions and owners and next steps. Respond in Markdown.\"\n",
" user_prompt = f\"Create meeting minutes from the transcript provided. The minutes should be clear but succint and should include title and date, attendees, key discussion points, key outcomes, actions and owners, and next steps. {transcript}\"\n",
"\n",
" messages = [\n",
" {\"role\":\"system\",\"content\":system_prompt},\n",
" {\"role\":\"user\",\"content\":user_prompt}\n",
" ]\n",
" print(\"Messages prepared successfully\") # for debugging\n",
"\n",
" # quantisation (for os model)\n",
"\n",
" quantization_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=torch.bfloat16\n",
" )\n",
"\n",
" except Exception as e:\n",
" return f\"An error occurred in setup: {str(e)}\"\n",
"\n",
" # model & tokeniser\n",
" try:\n",
" print(\"Loading tokeniser....\") # for debugging\n",
" tokenizer = AutoTokenizer.from_pretrained(PHI3)\n",
" tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
" print(\"Loading model.....\") # for debugging\n",
" model = AutoModelForCausalLM.from_pretrained(PHI3, device_map='auto', quantization_config=quantization_config)\n",
" print(f\"Model loaded on device {model.device}\") # for debugging\n",
"\n",
" # chat template\n",
" inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n",
" model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n",
"\n",
" # torch.no_grad suggested by claude. This disables gradient computation which reduces memory usage and speeds things up\n",
" print(\"Generating text....\") # for debugging\n",
" with torch.no_grad():\n",
" outputs = model.generate(**model_inputs, max_new_tokens=2000, do_sample=True, temperature=0.7)\n",
" print(f\"Generation complete. Output shape: {outputs.shape}\") # for debugging\n",
"\n",
" #***debugging****\n",
"\n",
" # Decode the generated text (excluding the input prompt)\n",
" print(\"Starting text decoding...\") # debugging\n",
" input_length = len(model_inputs['input_ids'][0]) # debugging\n",
" print(f\"Input length: {input_length}, Output length: {len(outputs[0])}\") # debugging\n",
"\n",
" if len(outputs[0]) <= input_length: # debugging\n",
" return \"Error: Model didn't generate any new tokens. Try reducing input length or increasing max_new_tokens.\" # debugging\n",
"\n",
" generated_tokens = outputs[0][input_length:] # debugging\n",
" print(f\"Generated tokens length: {len(generated_tokens)}\") # debugging\n",
"\n",
" # decode generated text\n",
" generated_text = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n",
" print(f\"Decoded text length: {len(generated_text)}\")\n",
"\n",
" return generated_text.strip()\n",
"\n",
" except ImportError as e:\n",
" return f\"Import error - missing library: {str(e)}. Please install required packages.\"\n",
" except torch.cuda.OutOfMemoryError as e:\n",
" return f\"CUDA out of memory: {str(e)}. Try reducing max_new_tokens to 500 or use CPU.\"\n",
" except RuntimeError as e:\n",
" return f\"Runtime error: {str(e)}. This might be a CUDA/device issue.\"\n",
" except Exception as e:\n",
" return f\"Unexpected error during text generation: {type(e).__name__}: {str(e)}\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c63zzoDopw6u"
},
"outputs": [],
"source": [
"# create process for gradio\n",
"\n",
"def gr_process(audio_file, progress = gr.Progress()):\n",
"\n",
" if audio_file is None:\n",
" return \"Please provide an audio file\"\n",
"\n",
" try:\n",
" progress(0, desc=\"Analysing file\")\n",
" transcript = transcribe_audio(audio_file)\n",
"\n",
" if transcript.startswith(\"An error occurred\"):\n",
" return transcript\n",
"\n",
" progress(0.5, desc=\"File analysed, generating minutes\")\n",
"\n",
" minutes = create_minutes(transcript)\n",
" progress(0.9, desc=\"Nearly there\")\n",
"\n",
" return minutes\n",
"\n",
" except Exception as e:\n",
" return f\"An error occurred: {str(e)}\""
]
},
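`gr_process` relies on the exact error prefix returned by `transcribe_audio`; a tiny sketch of that string-based error protocol (this stand-in reads a local file instead of calling the OpenAI API, so the names mirror the notebook but the body is simplified):

```python
def transcribe_audio(path):
    # Simplified stand-in: the real version calls the OpenAI transcription API
    try:
        with open(path, "rb") as f:
            return f.read().decode("utf-8")
    except Exception as e:
        return f"An error occurred: {str(e)}"

result = transcribe_audio("definitely_missing.mp3")
print(result.startswith("An error occurred"))  # True: the caller branches on this prefix
```

Returning errors as strings keeps the Gradio wiring simple, but it only works if every error path uses the same prefix the caller checks with `startswith`.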
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "82fyQELQkGty"
},
"outputs": [],
"source": [
"# gradio interface\n",
"\n",
"demo = gr.Interface(\n",
" fn=gr_process,\n",
" inputs= gr.Audio(type=\"filepath\",label=\"Upload MP3 file\"),\n",
" outputs= gr.Markdown(label=\"Meeting minutes\"),\n",
" title = \"Meeting minute creator\",\n",
" description = \"Upload an mp3 audio file for a meeting and I will provide the minutes!\"\n",
")\n",
"\n",
"if __name__ == \"__main__\":\n",
" demo.launch(debug=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XljpyS7Nvxkh"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,295 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- This creates dummy / test data from a usecase provided by the user.\n",
"- The usecase can be as simple or complex as the user wants (I've tested both and the results are good).\n",
"- I've used a Phi3 model as I'm having issues with llama access on Hugging Face."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s7ERjTCEKSi_"
},
"outputs": [],
"source": [
"!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GG5VMcmhcA2N"
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from openai import OpenAI\n",
"import gradio as gr\n",
"from IPython.display import Markdown, display, update_display\n",
"from huggingface_hub import login\n",
"from google.colab import userdata\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"import torch\n",
"import json\n",
"import re\n",
"import pandas as pd\n",
"import io"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UfL-2XNicpEB"
},
"outputs": [],
"source": [
"# constants\n",
"\n",
"OPENAI = 'gpt-4o-mini'\n",
"PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"\n",
"\n",
"limit = 100\n",
"max_tokens = 1000\n",
"temperature = 0.3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZQ0dcQ6hdTPo"
},
"outputs": [],
"source": [
"# keys\n",
"\n",
"openai_api_key = userdata.get('OPENAI_API_KEY')\n",
"openai = OpenAI(api_key=openai_api_key)\n",
"\n",
"hf_token = userdata.get('HF_TOKEN')\n",
"login(hf_token, add_to_git_credential=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2eHsLdYgd2d_"
},
"outputs": [],
"source": [
"system_prompt = f\"\"\"You create synthetic datasets for testing purposes. Based on the use case description, generate a CSV dataset with appropriate columns and a maximum of {limit} rows\n",
"of realistic data.\n",
"\n",
"IMPORTANT RULES:\n",
"1. Return ONLY the CSV data with headers and ensure there are no duplicate headers\n",
"2. No explanatory text before or after\n",
"3. No markdown formatting or code fences\n",
"4. No quotation marks around the entire response\n",
"5. Start directly with the column headers\n",
"\n",
"Format: column1 (e.g. customer_id),column2 (e.g. country),column3 (e.g. age)\n",
"row1data,row1data,row1data\n",
"row2data,row2data,row2data\"\"\"\n",
"\n",
"def data_user_prompt(usecase):\n",
" user_prompt = \"Create a synthetic dataset for the use case provided below: \"\n",
" user_prompt += usecase\n",
" user_prompt += f\" Respond in csv with appropriate headers. Do not include any other explanatory text, markdown formatting or code fences, or quotation marks around the entire response. \\\n",
" Limit the rows in the dataset to {limit}.\"\n",
" return user_prompt\n",
"\n",
"messages = [\n",
" {\"role\":\"system\",\"content\":system_prompt},\n",
" {\"role\":\"user\",\"content\":data_user_prompt(usecase)}\n",
"]"
]
},
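Despite the no-fences rule in the system prompt, models sometimes wrap CSV in markdown anyway, so a defensive cleanup before `pd.read_csv` can help — a sketch (`strip_code_fences` is my own helper name; the fence string is built programmatically to keep this example readable):

```python
import re

FENCE = "`" * 3  # the literal triple-backtick marker

def strip_code_fences(text):
    # Drop a leading fence line like a ```csv marker and a trailing closing fence, if present
    text = text.strip()
    text = re.sub(r"^" + FENCE + r"[a-zA-Z]*\n", "", text)
    text = re.sub(r"\n?" + FENCE + r"$", "", text)
    return text.strip()

wrapped = FENCE + "csv\ncustomer_id,country,age\n1,UK,34\n" + FENCE
print(strip_code_fences(wrapped))
```

Unfenced responses pass through unchanged, so this is safe to apply unconditionally before parsing.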
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "necoAEc1gNPF"
},
"outputs": [],
"source": [
"def dataset_call(usecase):\n",
"\n",
" #quantisation\n",
" quant_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=torch.bfloat16\n",
" )\n",
"\n",
" #tokenization\n",
" tokenizer = AutoTokenizer.from_pretrained(PHI3)\n",
" tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
" #model\n",
" model = AutoModelForCausalLM.from_pretrained(PHI3, quantization_config=quant_config, device_map=\"auto\")\n",
"\n",
" #inputs & outputs\n",
" inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n",
" model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n",
" #streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)\n",
"\n",
" with torch.no_grad():\n",
" outputs = model.generate(**model_inputs, max_new_tokens=max_tokens,do_sample=True, temperature=temperature)\n",
"\n",
" response = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n",
" return response.strip()\n",
" print(response.strip())\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g8zEBraI0grT"
},
"outputs": [],
"source": [
"# convert csv string into panda\n",
"\n",
"def csv_handler(csv_string):\n",
"\n",
" try:\n",
" # Convert CSV string to DataFrame\n",
" df = pd.read_csv(io.StringIO(csv_string))\n",
" return df\n",
" except Exception as e:\n",
" # Return error message as DataFrame if parsing fails\n",
" error_df = pd.DataFrame({\"Error\": [f\"Failed to parse CSV: {str(e)}\"]})\n",
" return error_df\n",
" print(df, error_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vLPsusTL1zNB"
},
"outputs": [],
"source": [
"# usecase to csv_string\n",
"\n",
"def usecase_to_csv(usecase):\n",
" try:\n",
" # Get CSV string from your LLM\n",
" csv_string = dataset_call(usecase)\n",
"\n",
" # Process into DataFrame for Gradio display\n",
" df = csv_handler(csv_string)\n",
"\n",
" return df\n",
"\n",
" except Exception as e:\n",
" error_df = pd.DataFrame({\"Error\": [f\"LLM processing failed: {str(e)}\"]})\n",
"        return error_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "H3WTLa9a2Rdy"
},
"outputs": [],
"source": [
"def download_csv(csv_string):\n",
" if csv_string:\n",
" return csv_string\n",
" return \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XhMVSrVhjYvz"
},
"outputs": [],
"source": [
"#test\n",
"usecase = \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n",
"#dataset_call(usecase)\n",
"usecase_to_csv(usecase)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z3Ze4o2qjs5y"
},
"outputs": [],
"source": [
"\n",
"demo = gr.Interface(\n",
" fn = usecase_to_csv,\n",
" inputs = gr.Textbox(lines=5,label=\"Describe your usecase\",placeholder=\"Describe the dataset you would like to create and how you will use it\"),\n",
" outputs = gr.DataFrame(label=\"Here is your dataset!\",interactive=True),\n",
" title = \"Friendly Neighbourhood Synthetic Data Creator!\",\n",
" description = \"Let me know your use case for synthetic data and I will create it for you.\",\n",
" examples=[\n",
" \"Generate a dataset of 10 employees with name, department, salary, and years of experience\",\n",
" \"Create sample e-commerce data with product names, categories, prices, and ratings\",\n",
" \"Generate customer survey responses with demographics and satisfaction scores\",\n",
" \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n",
" ]\n",
")\n",
"\n",
"demo.launch(debug=True)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ck1qdmbHo_G3"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyOay+EACzwO0uXDLuayhscX",
"gpuType": "L4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,295 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- This creates dummy / test data from a usecase provided by the user.\n",
"- The usecase can be as simple or complex as the user wants (I've tested both and the results are good).\n",
"- I've used a Phi3 model as I'm having issues with llama access on Hugging Face."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s7ERjTCEKSi_"
},
"outputs": [],
"source": [
"!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GG5VMcmhcA2N"
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from openai import OpenAI\n",
"import gradio as gr\n",
"from IPython.display import Markdown, display, update_display\n",
"from huggingface_hub import login\n",
"from google.colab import userdata\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"import torch\n",
"import json\n",
"import re\n",
"import pandas as pd\n",
"import io"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UfL-2XNicpEB"
},
"outputs": [],
"source": [
"# constants\n",
"\n",
"OPENAI = 'gpt-4o-mini'\n",
"PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"\n",
"\n",
"limit = 100\n",
"max_tokens = 1000\n",
"temperature = 0.3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZQ0dcQ6hdTPo"
},
"outputs": [],
"source": [
"# keys\n",
"\n",
"openai_api_key = userdata.get('OPENAI_API_KEY')\n",
"openai = OpenAI(api_key=openai_api_key)\n",
"\n",
"hf_token = userdata.get('HF_TOKEN')\n",
"login(hf_token, add_to_git_credential=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2eHsLdYgd2d_"
},
"outputs": [],
"source": [
"system_prompt = f\"\"\"You create synthetic datasets for testing purposes. Based on the use case description, generate a CSV dataset with appropriate columns and a maximum of {limit} rows\n",
"of realistic data.\n",
"\n",
"IMPORTANT RULES:\n",
"1. Return ONLY the CSV data with headers and ensure there are no duplicate headers\n",
"2. No explanatory text before or after\n",
"3. No markdown formatting or code fences\n",
"4. No quotation marks around the entire response\n",
"5. Start directly with the column headers\n",
"\n",
"Format: column1 (e.g. customer_id),column2 (e.g. country),column3 (e.g. age)\n",
"row1data,row1data,row1data\n",
"row2data,row2data,row2data\"\"\"\n",
"\n",
"def data_user_prompt(usecase):\n",
" user_prompt = \"Create a synthetic dataset for the use case provided below: \"\n",
" user_prompt += usecase\n",
" user_prompt += f\" Respond in csv with appropriate headers. Do not include any other explanatory text, markdown formatting or code fences, or quotation marks around the entire response. \\\n",
" Limit the rows in the dataset to {limit}.\"\n",
" return user_prompt\n",
"\n",
"# default messages built from a sample use case\n",
"messages = [\n",
"    {\"role\":\"system\",\"content\":system_prompt},\n",
"    {\"role\":\"user\",\"content\":data_user_prompt(\"A sample dataset of customers with id, country and age\")}\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "necoAEc1gNPF"
},
"outputs": [],
"source": [
"def dataset_call(usecase):\n",
"\n",
" #quantisation\n",
" quant_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=torch.bfloat16\n",
" )\n",
"\n",
" #tokenization\n",
" tokenizer = AutoTokenizer.from_pretrained(PHI3)\n",
" tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
" #model\n",
" model = AutoModelForCausalLM.from_pretrained(PHI3, quantization_config=quant_config, device_map=\"auto\")\n",
"\n",
"    #messages built from this call's use case\n",
"    messages = [\n",
"        {\"role\":\"system\",\"content\":system_prompt},\n",
"        {\"role\":\"user\",\"content\":data_user_prompt(usecase)}\n",
"    ]\n",
"\n",
"    #inputs & outputs\n",
"    inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n",
" model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n",
" #streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)\n",
"\n",
" with torch.no_grad():\n",
" outputs = model.generate(**model_inputs, max_new_tokens=max_tokens,do_sample=True, temperature=temperature)\n",
"\n",
" response = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n",
"    return response.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g8zEBraI0grT"
},
"outputs": [],
"source": [
"# convert csv string into a pandas DataFrame\n",
"\n",
"def csv_handler(csv_string):\n",
"\n",
" try:\n",
" # Convert CSV string to DataFrame\n",
" df = pd.read_csv(io.StringIO(csv_string))\n",
" return df\n",
" except Exception as e:\n",
" # Return error message as DataFrame if parsing fails\n",
" error_df = pd.DataFrame({\"Error\": [f\"Failed to parse CSV: {str(e)}\"]})\n",
"        return error_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vLPsusTL1zNB"
},
"outputs": [],
"source": [
"# usecase to csv_string\n",
"\n",
"def usecase_to_csv(usecase):\n",
" try:\n",
" # Get CSV string from your LLM\n",
" csv_string = dataset_call(usecase)\n",
"\n",
" # Process into DataFrame for Gradio display\n",
" df = csv_handler(csv_string)\n",
"\n",
" return df\n",
"\n",
" except Exception as e:\n",
" error_df = pd.DataFrame({\"Error\": [f\"LLM processing failed: {str(e)}\"]})\n",
"        return error_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "H3WTLa9a2Rdy"
},
"outputs": [],
"source": [
"def download_csv(csv_string):\n",
" if csv_string:\n",
" return csv_string\n",
" return \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XhMVSrVhjYvz"
},
"outputs": [],
"source": [
"#test\n",
"usecase = \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n",
"#dataset_call(usecase)\n",
"usecase_to_csv(usecase)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z3Ze4o2qjs5y"
},
"outputs": [],
"source": [
"\n",
"demo = gr.Interface(\n",
" fn = usecase_to_csv,\n",
" inputs = gr.Textbox(lines=5,label=\"Describe your usecase\",placeholder=\"Describe the dataset you would like to create and how you will use it\"),\n",
" outputs = gr.DataFrame(label=\"Here is your dataset!\",interactive=True),\n",
" title = \"Friendly Neighbourhood Synthetic Data Creator!\",\n",
" description = \"Let me know your use case for synthetic data and I will create it for you.\",\n",
" examples=[\n",
" \"Generate a dataset of 10 employees with name, department, salary, and years of experience\",\n",
" \"Create sample e-commerce data with product names, categories, prices, and ratings\",\n",
" \"Generate customer survey responses with demographics and satisfaction scores\",\n",
" \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n",
" ]\n",
")\n",
"\n",
"demo.launch(debug=True)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ck1qdmbHo_G3"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyOay+EACzwO0uXDLuayhscX",
"gpuType": "L4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,287 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "zmpDFA3bGEHY"
},
"source": [
"Minute creator in Gradio from day 5 of week 3.\n",
"A couple of points to note:\n",
"\n",
"\n",
"* My access to llama hasn't been approved on Hugging Face and so I've experimented with some of the other models.\n",
"* There is a fair bit of debugging code in the main function, as I was getting an error and couldn't find it. I've left it in just in case it's useful for others trying to debug their code.\n",
"* I was debugging with the help of Claude. It suggested using `with torch.no_grad()` for the minute output. The rationale is that it disables gradient computation, which isn't necessary for inference, and I found it did speed things up.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l-5xKLFeJUGz"
},
"outputs": [],
"source": [
"!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Wi-bBD9VdBMo"
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"from openai import OpenAI\n",
"from IPython.display import Markdown, display, update_display\n",
"from google.colab import drive\n",
"from huggingface_hub import login\n",
"from google.colab import userdata\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
"import torch\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-0O-kuWtdk4I"
},
"outputs": [],
"source": [
"# keys\n",
"\n",
"#openai\n",
"openai_api_key = userdata.get('OPENAI_API_KEY')\n",
"openai = OpenAI(api_key=openai_api_key)\n",
"\n",
"#hf\n",
"hf_token = userdata.get('HF_TOKEN')\n",
"login(hf_token, add_to_git_credential=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "u6v3Ecileg1H"
},
"outputs": [],
"source": [
"# constants\n",
"\n",
"AUDIO_MODEL = 'gpt-4o-transcribe'\n",
"OPENAI_MODEL = 'gpt-4o-mini'\n",
"QWEN2_MODEL = 'Qwen/Qwen2.5-7B-Instruct' # runs slowly no matter what size gpu - kept crashing on ram!\n",
"GEMMA2_MODEL = \"google/gemma-2-2b-it\" # doesn't use a system prompt\n",
"PHI3 = \"microsoft/Phi-3-mini-4k-instruct\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3nSfA_KhfY38"
},
"outputs": [],
"source": [
"# convert audio to text\n",
"\n",
"def transcribe_audio(audio_file_path):\n",
" try:\n",
" with open (audio_file_path, 'rb') as audio_file:\n",
" transcript = openai.audio.transcriptions.create(model = AUDIO_MODEL, file = audio_file, response_format=\"text\")\n",
" return transcript\n",
" except Exception as e:\n",
" return f\"An error occurred: {str(e)}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OVmlY3DGgnYc"
},
"outputs": [],
"source": [
"# use transcript to create minutes\n",
"# use open source model\n",
"\n",
"def create_minutes(transcript):\n",
"\n",
" # first try is for debugging\n",
" try:\n",
" print(f\"Starting to create minutes with transcript length: {len(str(transcript))}\")\n",
"\n",
" if not transcript or len(str(transcript).strip()) == 0:\n",
" return \"Error: Empty or invalid transcript\"\n",
"\n",
" #messages\n",
" system_prompt = \"You are an expert creator of meeting minutes. Based on a meeting transcript you can summarise the meeting title and date, attendees, key discussion points, key outcomes, actions and owners and next steps. Respond in Markdown.\"\n",
"    user_prompt = f\"Create meeting minutes from the transcript provided. The minutes should be clear but succinct and should include title and date, attendees, key discussion points, key outcomes, actions and owners, and next steps. {transcript}\"\n",
"\n",
" messages = [\n",
" {\"role\":\"system\",\"content\":system_prompt},\n",
" {\"role\":\"user\",\"content\":user_prompt}\n",
" ]\n",
" print(\"Messages prepared successfully\") # for debugging\n",
"\n",
" # quantisation (for os model)\n",
"\n",
" quantization_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=torch.bfloat16\n",
" )\n",
"\n",
" except Exception as e:\n",
" return f\"An error occurred in setup: {str(e)}\"\n",
"\n",
" # model & tokeniser\n",
" try:\n",
" print(\"Loading tokeniser....\") # for debugging\n",
" tokenizer = AutoTokenizer.from_pretrained(PHI3)\n",
" tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
" print(\"Loading model.....\") # for debugging\n",
" model = AutoModelForCausalLM.from_pretrained(PHI3, device_map='auto', quantization_config=quantization_config)\n",
" print(f\"Model loaded on device {model.device}\") # for debugging\n",
"\n",
" # chat template\n",
" inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n",
" model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n",
"\n",
" # torch.no_grad suggested by claude. This disables gradient computation which reduces memory usage and speeds things up\n",
" print(\"Generating text....\") # for debugging\n",
" with torch.no_grad():\n",
" outputs = model.generate(**model_inputs, max_new_tokens=2000, do_sample=True, temperature=0.7)\n",
" print(f\"Generation complete. Output shape: {outputs.shape}\") # for debugging\n",
"\n",
" #***debugging****\n",
"\n",
" # Decode the generated text (excluding the input prompt)\n",
" print(\"Starting text decoding...\") # debugging\n",
" input_length = len(model_inputs['input_ids'][0]) # debugging\n",
" print(f\"Input length: {input_length}, Output length: {len(outputs[0])}\") # debugging\n",
"\n",
" if len(outputs[0]) <= input_length: # debugging\n",
" return \"Error: Model didn't generate any new tokens. Try reducing input length or increasing max_new_tokens.\" # debugging\n",
"\n",
" generated_tokens = outputs[0][input_length:] # debugging\n",
" print(f\"Generated tokens length: {len(generated_tokens)}\") # debugging\n",
"\n",
" # decode generated text\n",
" generated_text = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n",
" print(f\"Decoded text length: {len(generated_text)}\")\n",
"\n",
" return generated_text.strip()\n",
"\n",
" except ImportError as e:\n",
" return f\"Import error - missing library: {str(e)}. Please install required packages.\"\n",
" except torch.cuda.OutOfMemoryError as e:\n",
" return f\"CUDA out of memory: {str(e)}. Try reducing max_new_tokens to 500 or use CPU.\"\n",
" except RuntimeError as e:\n",
" return f\"Runtime error: {str(e)}. This might be a CUDA/device issue.\"\n",
" except Exception as e:\n",
" return f\"Unexpected error during text generation: {type(e).__name__}: {str(e)}\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c63zzoDopw6u"
},
"outputs": [],
"source": [
"# create process for gradio\n",
"\n",
"def gr_process(audio_file, progress = gr.Progress()):\n",
"\n",
" if audio_file is None:\n",
" return \"Please provide an audio file\"\n",
"\n",
" try:\n",
" progress(0, desc=\"Analysing file\")\n",
" transcript = transcribe_audio(audio_file)\n",
"\n",
" if transcript.startswith(\"An error occurred\"):\n",
" return transcript\n",
"\n",
" progress(0.5, desc=\"File analysed, generating minutes\")\n",
"\n",
" minutes = create_minutes(transcript)\n",
" progress(0.9, desc=\"Nearly there\")\n",
"\n",
" return minutes\n",
"\n",
" except Exception as e:\n",
" return f\"An error occurred: {str(e)}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "82fyQELQkGty"
},
"outputs": [],
"source": [
"# gradio interface\n",
"\n",
"demo = gr.Interface(\n",
" fn=gr_process,\n",
" inputs= gr.Audio(type=\"filepath\",label=\"Upload MP3 file\"),\n",
" outputs= gr.Markdown(label=\"Meeting minutes\"),\n",
" title = \"Meeting minute creator\",\n",
" description = \"Upload an mp3 audio file for a meeting and I will provide the minutes!\"\n",
")\n",
"\n",
"if __name__ == \"__main__\":\n",
" demo.launch(debug=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XljpyS7Nvxkh"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,102 @@
# 🧠 Synthetic Data Generator
A Python-based tool to generate structured, synthetic job postings using open-source LLMs from Hugging Face.
This project supports both **script-based execution** and an **interactive Colab notebook**, making it ideal for rapid prototyping, dataset bootstrapping, or demonstrating prompt engineering techniques.
> Note: Original Repo can be found at: https://github.com/moawiah/synthetic_data_generator
![Demo Screenshot](https://github.com/user-attachments/assets/c0e229ac-ddb7-4a37-8088-f04ca735cd81)
This tool helps:
- Researchers create labeled training data for NLP classification or QA
- HR tech startups prototype recommendation models
- AI instructors demonstrate few-shot prompting in class
---
## ✨ Features
- 🔗 Integrates Hugging Face Transformer models
- 📄 Generates realistic job postings in structured JSON format
- 🧪 Supports prompt engineering with control over output length and variability
- 🧠 Minimal Gradio UI for non-technical users
- 📓 Jupyter/Colab support for experimentation and reproducibility
## 📂 Project Structure
```
.
├── app/
│   ├── app.py                          # Main script entry point
│   ├── consts.py                       # Configuration and constants
│   └── requirements.txt                # Python dependencies
├── data/
│   └── software_engineer_jobs.json     # Sample input data (JSON format)
├── notebooks/
│   └── synthetic_data_generator.ipynb  # Interactive Colab notebook
├── .env.example                        # Sample environment variable config
├── .gitignore                          # Git ignored files list
└── README.md
```
## 🚀 Getting Started
### 1. Clone the repository
```bash
git clone https://github.com/moawiah/synthetic_data_generator.git
cd synthetic_data_generator
```
### 2. Install Dependencies
```bash
pip install -r app/requirements.txt
```
### 3. Hugging Face Token
Create a `.env` file with your Hugging Face token, e.g. `HF_TOKEN=your-token-here`
### 4. Run
Run the app with:
`python app/app.py`
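Under the hood, the app asks the model for a single JSON object and then extracts it from whatever surrounding text the model emits. A minimal sketch of that extraction step (mirroring `extract_json_objects_from_text_block` in `app/app.py`; the sample `raw` string below is made up for illustration):

```python
import json
import re

def extract_json_objects(text: str) -> list:
    """Pull every parseable JSON object out of messy LLM output."""
    results = []
    # Non-greedy match from "{" to the next "}" - fine for flat objects,
    # but note it would truncate objects containing nested braces.
    for match in re.findall(r"\{[\s\S]*?\}", text):
        try:
            results.append(json.loads(match))
        except json.JSONDecodeError:
            continue  # skip fragments that are not valid JSON
    return results

raw = 'Sure! Here is the posting: {"title": "Software Engineer", "location": "Remote"} Hope that helps.'
print(extract_json_objects(raw))
```

The non-greedy regex keeps the helper simple at the cost of nested objects; the job postings generated here are flat, so it holds up in practice.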
## Example Output - 1 Job
```JSON
{
  "title": "Software Engineer",
  "description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, coding, and testing software systems, and will be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, maintainable, and efficient code, as well as actively participating in code reviews and continuous integration processes. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.",
  "requirements": [
    "Bachelor's degree in Computer Science or related field",
    "Minimum of 2 years experience in software development",
    "Strong proficiency in Java or C++",
    "Experience with agile development methodologies",
    "Good understanding of data structures and algorithms",
    "Excellent problem-solving and analytical skills"
  ],
  "location": "New York, NY",
  "company_name": "ABC Technologies"
}
```
## Future Improvements
- 🔁 Add support for more job roles and industries
- 🧠 Model selector from UI
- 💾 Export dataset as CSV
- ☁️ Optional integration with LangChain or RAG workflows
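Until CSV export lands, the generated JSON can be sanity-checked with the standard library. A hedged sketch (the inline `sample` stands in for the contents of `data/software_engineer_jobs.json`; field names follow the example output above):

```python
import json

# Inline stand-in for the contents of data/software_engineer_jobs.json
sample = json.loads("""
[
  {
    "title": "Software Engineer",
    "description": "Builds and maintains software systems.",
    "requirements": ["Bachelor's degree in Computer Science", "2+ years experience"],
    "location": "New York, NY",
    "company_name": "ABC Technologies"
  }
]
""")

# Every posting should carry the five fields shown in the example output
expected = {"title", "description", "requirements", "location", "company_name"}
incomplete = [job.get("title") for job in sample if expected - job.keys()]
print("all postings valid" if not incomplete else f"incomplete: {incomplete}")
```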


@@ -0,0 +1,156 @@
import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, pipeline, TextGenerationPipeline
import torch
from consts import FALCON, MISTRAL, Databricks
from dotenv import load_dotenv
import json
import ast
import gradio as gr
import re

# Sign in to HuggingFace Hub
load_dotenv()
hf_token = os.getenv("HF_TOKEN")
login(hf_token)

# Main Prompt (single braces so .format(role=...) actually substitutes)
prompt = """
Generate one fake job posting for a {role}.
Return only a single JSON object with:
- title
- description (5-10 sentences)
- requirements (array of 4-6 strings)
- location
- company_name
No explanations, no extra text.
Only the JSON object.
"""

# Main Conf
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)


def load_model_and_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained(MISTRAL, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MISTRAL,
        device_map={"": "cuda"},
        trust_remote_code=True,
        offload_folder="/tmp/dolly_offload",
        quantization_config=bnb_config
    )
    return model, tokenizer


def generate_job(role="Software Engineer", model=None, tokenizer=None):
    # prompt = prompt.format(role=role, n=n)
    # outputs = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.9)
    # return outputs[0]['generated_text']

    # Apply chat template formatting
    # inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    inputs = tokenizer(prompt.format(role=role), return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Generate output
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode and return
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return result


def generate_jobs(role="Software Engineer", n=5):
    model, tokenizer = load_model_and_tokenizer()
    fake_jobs = []
    for i in range(n):
        fake_jobs.append(generate_job(role=role, model=model, tokenizer=tokenizer))
    return fake_jobs


def extract_json_objects_from_text_block(texts):
    """
    Accepts either a single string or a list of strings.
    Extracts all valid JSON objects from messy text blocks.
    """
    if isinstance(texts, str):
        texts = [texts]  # wrap in list if single string
    pattern = r"\{[\s\S]*?\}"
    results = []
    for raw_text in texts:
        matches = re.findall(pattern, raw_text)
        for match in matches:
            try:
                obj = json.loads(match)
                results.append(obj)
            except json.JSONDecodeError:
                continue
    return results


def generate_ui(role, n):
    try:
        raw_jobs = generate_jobs(role, n)
        parsed_jobs = extract_json_objects_from_text_block(raw_jobs)
        if not isinstance(parsed_jobs, list) or not all(isinstance(item, dict) for item in parsed_jobs):
            print("[ERROR] Parsed result is not a list of dicts")
            return gr.update(value=[], visible=True), None
        os.makedirs("data", exist_ok=True)  # ensure the output directory exists
        filename = f"data/{role.replace(' ', '_').lower()}_jobs.json"
        with open(filename, "w") as f:
            json.dump(parsed_jobs, f, indent=2)
        print(f"[INFO] Returning {len(parsed_jobs)} jobs -> (unknown)")
        return parsed_jobs, filename
    except Exception as e:
        print(f"[FATAL ERROR] {e}")
        return gr.update(value=[], visible=True), None


if __name__ == "__main__":
    with gr.Blocks() as demo:
        gr.Markdown("# 🧠 Synthetic Job Dataset Generator")
        gr.Markdown("Generate a structured dataset of job postings for a specific role.")
        with gr.Row():
            role_input = gr.Textbox(label="Job Role", placeholder="e.g. Software Engineer", value="Software Engineer")
            n_input = gr.Number(label="Number of Samples", value=5, precision=0)
        generate_button = gr.Button("🚀 Generate")
        output_table = gr.JSON(label="Generated Dataset")
        download_button = gr.File(label="Download JSON")
        generate_button.click(
            generate_ui,
            inputs=[role_input, n_input],
            outputs=[output_table, download_button]
        )
    demo.launch(debug=True, share=True)


@@ -0,0 +1,5 @@
# Models
GPT = 'gpt2'
FALCON = "tiiuae/falcon-rw-1b"
MISTRAL = "mistralai/Mistral-7B-Instruct-v0.1"
Databricks = "databricks/dolly-v2-3b"


@@ -0,0 +1,7 @@
huggingface_hub==0.30.2
ipython==8.12.3
openai==1.76.2
protobuf==6.30.2
Requests==2.32.3
torch==2.6.0+cu124
transformers==4.51.3


@@ -0,0 +1,71 @@
[
{
"title": "Software Engineer",
"description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining complex software systems. The ideal candidate will have a strong background in computer science and experience with multiple programming languages. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.",
"requirements": [
"Bachelor's degree in Computer Science or related field",
"3+ years of experience in software development",
"Strong proficiency in Java or C++",
"Experience with agile development methodologies",
"Excellent problem-solving and analytical skills"
],
"location": "New York, NY",
"company_name": "ABC Technologies"
},
{
"title": "Software Engineer",
"description": "We are looking for a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, developing, and testing software systems, and be able to work independently or as part of a team. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of computer science principles and be able to learn quickly. This is a full-time position located in San Francisco, CA.",
"requirements": [
"Bachelor's degree in Computer Science or related field",
"3+ years of experience in software development",
"Strong proficiency in Java or C++",
"Experience with agile development methodologies",
"Excellent problem-solving skills",
"Ability to work in a fast-paced environment"
],
"location": "San Francisco, CA",
"company_name": "Acme Inc."
},
{
"title": "Software Engineer",
"description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining our cutting-edge software applications. The ideal candidate will have a strong background in computer science and software engineering, with experience in designing, coding, and testing software systems. Responsibilities include collaborating with cross-functional teams, writing clean and efficient code, and ensuring the timely delivery of high-quality software products. This is an excellent opportunity for a self-starter with a passion for technology and a desire to work in a dynamic and fast-paced environment.",
"requirements": [
"Bachelor's degree in Computer Science or related field",
"3+ years of experience in software engineering",
"Strong proficiency in Java, Python, or C++",
"Experience with agile development methodologies",
"Excellent problem-solving and analytical skills",
"Strong communication and interpersonal skills"
],
"location": "New York, NY",
"company_name": "ABC Tech"
},
{
"title": "Software Engineer",
"description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have a strong background in computer science and experience with various programming languages and technologies. Responsibilities include designing, coding, testing, and maintaining software systems, as well as collaborating with cross-functional teams. This is an excellent opportunity for a creative and motivated individual to make a significant impact in the tech industry.",
"requirements": [
"Bachelor's degree in Computer Science or related field",
"Minimum of 2 years experience in software development",
"Strong proficiency in Java, Python, or C++",
"Experience with agile development methodologies",
"Excellent problem-solving and analytical skills",
"Ability to work independently and as part of a team",
"Strong communication and interpersonal skills"
],
"location": "New York, NY",
"company_name": "ABC Tech Inc."
},
{
"title": "Software Engineer",
"description": "We are looking for a skilled software engineer to join our team and contribute to the development of innovative software solutions. Responsibilities include designing, coding, testing and maintaining software systems, as well as collaborating with cross-functional teams. The ideal candidate will have a strong background in computer science or a related field, and at least 3 years of experience in software development. Must be proficient in multiple programming languages, including Java, Python, and C++. Strong problem-solving skills and the ability to work independently or as part of a team are required. This is a full-time position located in San Francisco, CA.",
"requirements": [
"Bachelor's degree in Computer Science or related field",
"At least 3 years of experience in software development",
"Proficiency in Java, Python, and C++",
"Strong problem-solving skills",
"Ability to work independently or as part of a team"
],
"location": "San Francisco, CA",
"company_name": "Innovative Solutions Inc."
}
]

File diff suppressed because one or more lines are too long


@@ -0,0 +1,400 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "3e473bbd-a0c2-43bd-bf99-c749784d00c3",
"metadata": {},
"outputs": [],
"source": [
"import gradio as gr\n",
"import openai\n",
"import anthropic\n",
"import google.generativeai as genai\n",
"import requests\n",
"import json\n",
"import os\n",
"from typing import Dict, Any, Optional\n",
"import asyncio\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16210512-41f1-4de3-8348-2cd7129e023f",
"metadata": {},
"outputs": [],
"source": [
"# load API\n",
"load_dotenv(override=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6747e275-91eb-4d2b-90b6-805f2bd9b6b7",
"metadata": {},
"outputs": [],
"source": [
"class CodeCommenter:\n",
" def __init__(self):\n",
" # Initialize API clients\n",
" self.openai_client = None\n",
" self.anthropic_client = None\n",
" self.gemini_client = None\n",
" \n",
" # Load API keys from environment variables\n",
" self.setup_clients()\n",
" \n",
" def setup_clients(self):\n",
" \"\"\"Initialize API clients with keys from environment variables\"\"\"\n",
" try:\n",
" # OpenAI\n",
" openai_key = os.getenv('OPENAI_API_KEY')\n",
" if openai_key:\n",
" self.openai_client = openai.OpenAI(api_key=openai_key)\n",
" \n",
" # Anthropic\n",
" anthropic_key = os.getenv('ANTHROPIC_API_KEY')\n",
" if anthropic_key:\n",
" self.anthropic_client = anthropic.Anthropic(api_key=anthropic_key)\n",
" \n",
" # Google Gemini\n",
" gemini_key = os.getenv('GOOGLE_API_KEY')\n",
" if gemini_key:\n",
" genai.configure(api_key=gemini_key)\n",
" self.gemini_client = genai.GenerativeModel('gemini-2.0-flash-exp')\n",
" \n",
" except Exception as e:\n",
" print(f\"Warning: Error setting up API clients: {e}\")\n",
" \n",
" def create_prompt(self, code: str, language: str) -> str:\n",
" \"\"\"Create a prompt for the LLM to add comments and docstrings\"\"\"\n",
" return f\"\"\"Please add detailed and helpful comments and docstrings to the following {language} code. \n",
" \n",
"Guidelines:\n",
"1. Add comprehensive docstrings for functions, classes, and modules\n",
"2. Add inline comments explaining complex logic\n",
"3. Follow the commenting conventions for {language}\n",
"4. Maintain the original code structure and functionality\n",
"5. Make comments clear and professional\n",
"6. Don't change the actual code logic, only add comments\n",
"7. Do not add code markdown delimiters like ```python\n",
"\n",
"Here's the code to comment:\n",
"\n",
"{code}\n",
"\n",
"Please return only the commented code without any additional explanation or markdown formatting.\"\"\"\n",
"\n",
" def call_openai(self, prompt: str, model: str = \"gpt-4o-mini\") -> str:\n",
" \"\"\"Make API call to OpenAI\"\"\"\n",
" if not self.openai_client:\n",
" return \"Error: OpenAI API key not configured. Please set OPENAI_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.openai_client.chat.completions.create(\n",
" model=model,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful coding assistant that adds detailed comments and docstrings to code.\"},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" max_tokens=4000,\n",
" temperature=0.1\n",
" )\n",
" return response.choices[0].message.content.strip()\n",
" except Exception as e:\n",
" return f\"Error calling OpenAI API: {str(e)}\"\n",
" \n",
" def call_anthropic(self, prompt: str, model: str = \"claude-3-5-haiku-20241022\") -> str:\n",
" \"\"\"Make API call to Anthropic Claude\"\"\"\n",
" if not self.anthropic_client:\n",
" return \"Error: Anthropic API key not configured. Please set ANTHROPIC_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.anthropic_client.messages.create(\n",
" model=model,\n",
" max_tokens=4000,\n",
" temperature=0.1,\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" )\n",
" return response.content[0].text.strip()\n",
" except Exception as e:\n",
" return f\"Error calling Anthropic API: {str(e)}\"\n",
" \n",
" def call_gemini(self, prompt: str) -> str:\n",
" \"\"\"Make API call to Google Gemini\"\"\"\n",
" if not self.gemini_client:\n",
" return \"Error: Google API key not configured. Please set GOOGLE_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.gemini_client.generate_content(\n",
" prompt,\n",
" generation_config=genai.types.GenerationConfig(\n",
" max_output_tokens=4000,\n",
" temperature=0.1,\n",
" )\n",
" )\n",
" return response.text.strip()\n",
" except Exception as e:\n",
" return f\"Error calling Gemini API: {str(e)}\"\n",
" \n",
" def call_ollama(self, prompt: str, model: str = \"llama3.2:latest\") -> str:\n",
" \"\"\"Make API call to Ollama (local)\"\"\"\n",
" try:\n",
" url = \"http://localhost:11434/api/generate\"\n",
" data = {\n",
" \"model\": model,\n",
" \"prompt\": prompt,\n",
" \"stream\": False,\n",
" \"options\": {\n",
" \"temperature\": 0.1,\n",
" \"num_predict\": 4000\n",
" }\n",
" }\n",
" \n",
" response = requests.post(url, json=data, timeout=60)\n",
" if response.status_code == 200:\n",
" result = response.json()\n",
" return result.get('response', '').strip()\n",
" else:\n",
" return f\"Error calling Ollama API: HTTP {response.status_code}\"\n",
" except requests.exceptions.ConnectionError:\n",
" return \"Error: Could not connect to Ollama. Make sure Ollama is running locally on port 11434.\"\n",
" except Exception as e:\n",
" return f\"Error calling Ollama API: {str(e)}\"\n",
"\n",
" def generate_comments(self, language: str, code: str, llm: str) -> str:\n",
" \"\"\"Generate comments for the given code using the specified LLM\"\"\"\n",
" if not code.strip():\n",
" return \"Error: Please provide code to comment.\"\n",
" \n",
" prompt = self.create_prompt(code, language)\n",
" \n",
" # Route to appropriate LLM\n",
" if llm == \"gpt-4o-mini\":\n",
" return self.call_openai(prompt, \"gpt-4o-mini\")\n",
" elif llm == \"claude-3-5-haiku-20241022\":\n",
" return self.call_anthropic(prompt, \"claude-3-5-haiku-20241022\")\n",
" elif llm == \"gemini-2.0-flash\":\n",
" return self.call_gemini(prompt)\n",
" elif llm == \"ollama:llama3.2:latest\":\n",
" return self.call_ollama(prompt, \"llama3.2:latest\")\n",
" else:\n",
" return f\"Error: Unsupported LLM: {llm}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "813f0911-d53f-4887-9341-656712e32d8f",
"metadata": {},
"outputs": [],
"source": [
"def create_gradio_interface():\n",
" \"\"\"Create and configure the Gradio interface\"\"\"\n",
" commenter = CodeCommenter()\n",
" \n",
" # Define the main function for the interface\n",
" def process_code(language, code, llm):\n",
" \"\"\"Process the code and return commented version\"\"\"\n",
" if not code.strip():\n",
" return \"Please enter some code to comment.\"\n",
" \n",
" # Show processing message\n",
" processing_msg = f\"Processing {language} code with {llm}...\"\n",
" print(processing_msg)\n",
" \n",
" # Generate comments\n",
" result = commenter.generate_comments(language, code, llm)\n",
" return result\n",
" \n",
" # Define default code\n",
" default_code = \"\"\"import pyodbc\n",
"from tabulate import tabulate\n",
"def connect_to_sql_server(server_name, database, username=None, password=None):\n",
" try:\n",
" if username and password:\n",
" connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};UID={username};PWD={password}\"\n",
" else:\n",
" connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};Trusted_Connection=yes\"\n",
" connection = pyodbc.connect(connection_string)\n",
" print(f\"Successfully connected to {server_name}/{database}\")\n",
" return connection\n",
" except Exception as e:\n",
" print(f\"Failed to connect to {server_name}/{database}: {str(e)}\")\n",
" return None\n",
"def get_record_count(connection, table_name):\n",
" try:\n",
" cursor = connection.cursor()\n",
" query = f\"SELECT COUNT(*) FROM {table_name}\"\n",
" cursor.execute(query)\n",
" count = cursor.fetchone()[0]\n",
" cursor.close()\n",
" print(f\"Record count for {table_name}: {count}\")\n",
" return count\n",
" except Exception as e:\n",
" print(f\"Failed to get record count for {table_name}: {str(e)}\")\n",
" return None\n",
"def select_top_records(connection, table_name, n):\n",
" try:\n",
" cursor = connection.cursor()\n",
" query = f\"SELECT TOP {n} * FROM {table_name}\"\n",
" cursor.execute(query)\n",
" records = cursor.fetchall()\n",
" columns = [column[0] for column in cursor.description]\n",
" cursor.close()\n",
" print(f\"Top {n} records from {table_name}\")\n",
" if records:\n",
" print(tabulate(records, headers=columns, tablefmt=\"grid\"))\n",
" return records\n",
" except Exception as e:\n",
" print(f\"Failed to retrieve top {n} records from {table_name}: {str(e)}\")\n",
" return None\n",
"conn = connect_to_sql_server(\"localhost\", \"AdventureWorks_lite\")\n",
"if conn:\n",
" total_records = get_record_count(conn, \"Sales.SalesOrderDetail\")\n",
" top_records = select_top_records(conn, \"Production.Product\", 10)\n",
" conn.close()\n",
" print(\"Connection closed successfully\")\"\"\"\n",
"\n",
" css = \"\"\"\n",
"textarea[rows]:not([rows=\"1\"]) {\n",
" overflow-y: auto !important;\n",
" scrollbar-width: thin !important;\n",
"}\n",
"textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar {\n",
" all: initial !important;\n",
" background: #f1f1f1 !important;\n",
"}\n",
"textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar-thumb {\n",
" all: initial !important;\n",
" background: #a8a8a8 !important;\n",
"}\n",
"\"\"\"\n",
"\n",
" # Create the interface\n",
" with gr.Blocks(title=\"Code Commenter\", theme=gr.themes.Base(), css=css) as interface:\n",
" gr.Markdown(\"# 🔧 Code Commenter\")\n",
" gr.Markdown(\"Add detailed comments and docstrings to your code using various LLM models.\")\n",
" \n",
" with gr.Row():\n",
" with gr.Column():\n",
" code_input = gr.Textbox(\n",
" label=\"Input Code\",\n",
" value=default_code,\n",
" lines=15,\n",
" max_lines=20,\n",
" info=\"Enter the code you want to add comments to\"\n",
" )\n",
" \n",
" with gr.Column():\n",
" code_output = gr.Textbox(\n",
" label=\"Commented Code\",\n",
" lines=20,\n",
" max_lines=20,\n",
" info=\"Your code with added comments and docstrings\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" with gr.Column(scale=1):\n",
" language_dropdown = gr.Dropdown(\n",
" choices=[\"Python\", \"Ruby\", \"Rust\", \"C++\", \"Java\"],\n",
" value=\"Python\",\n",
" label=\"Programming Language\",\n",
" info=\"Select the programming language of your code\"\n",
" )\n",
" \n",
" llm_dropdown = gr.Dropdown(\n",
" choices=[\n",
" \"gpt-4o-mini\",\n",
" \"claude-3-5-haiku-20241022\", \n",
" \"gemini-2.0-flash\",\n",
" \"ollama:llama3.2:latest\"\n",
" ],\n",
" value=\"gpt-4o-mini\",\n",
" label=\"LLM Model\",\n",
" info=\"Choose the language model to use\"\n",
" )\n",
" \n",
" generate_btn = gr.Button(\n",
" \"🚀 Generate Comments\", \n",
" variant=\"primary\",\n",
" size=\"lg\"\n",
" )\n",
" \n",
" # Add some API setup information\n",
" gr.Markdown(\"## 📝 API Setup Instructions\")\n",
" gr.Markdown(\"\"\"\n",
" To use this tool, you need to set up API keys as environment variables:\n",
" \n",
" - **OpenAI**: Set `OPENAI_API_KEY`\n",
" - **Anthropic**: Set `ANTHROPIC_API_KEY` \n",
" - **Google Gemini**: Set `GOOGLE_API_KEY`\n",
" - **Ollama**: Make sure Ollama is running locally on port 11434\n",
" \"\"\")\n",
" \n",
" # Connect the button to the processing function\n",
" generate_btn.click(\n",
" fn=process_code,\n",
" inputs=[language_dropdown, code_input, llm_dropdown],\n",
" outputs=code_output,\n",
" show_progress=True\n",
" )\n",
" \n",
" return interface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef461e08-c1d5-406d-b7d2-a4329f16486e",
"metadata": {},
"outputs": [],
"source": [
"print(\"🚀 Starting Code Commenter...\")\n",
"print(\"📋 Setting up Gradio interface...\")\n",
"\n",
"# Create and launch the interface\n",
"interface = create_gradio_interface()\n",
"\n",
"print(\"🌐 Launching interface...\")\n",
"print(\"💡 The interface will open in your default browser\")\n",
"print(\"🔧 Make sure to set up your API keys as environment variables\")\n",
"\n",
"# Launch with auto-opening in browser\n",
"interface.launch(\n",
" server_name=\"127.0.0.1\",\n",
" server_port=7860,\n",
" share=False,\n",
" inbrowser=True,\n",
" show_error=True\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
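
The `generate_comments` method above routes each request through an if/elif chain on the model name. The same routing can be sketched as a dict dispatch, which makes adding a backend a one-line change; the handler functions below are hypothetical stubs standing in for the real API calls:

```python
# Sketch of the model-routing pattern used in generate_comments above.
# The handlers are hypothetical stubs standing in for real API calls.

def call_openai(prompt: str) -> str:
    return f"[openai] {prompt}"

def call_anthropic(prompt: str) -> str:
    return f"[anthropic] {prompt}"

# Dict dispatch: adding a model is one new entry, not a new elif branch.
HANDLERS = {
    "gpt-4o-mini": call_openai,
    "claude-3-5-haiku-20241022": call_anthropic,
}

def route(llm: str, prompt: str) -> str:
    handler = HANDLERS.get(llm)
    if handler is None:
        return f"Error: Unsupported LLM: {llm}"
    return handler(prompt)

print(route("gpt-4o-mini", "add comments"))  # [openai] add comments
```

Unknown model names fall through to the same error string the notebook returns, so callers can treat both paths uniformly.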

View File

@@ -0,0 +1,841 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4a6ab9a2-28a2-445d-8512-a0dc8d1b54e9",
"metadata": {},
"source": [
"# Power Coder\n",
"\n",
"1. Convert code between two programming language; supporting languages are Python, Java, JavaScript, TypeScript, C, C++, C#, Go, Rust, Kotlin, Swift, PHP, Julia\n",
"2. Automatically add docstring/comments based on selected comment style\n",
"3. Automatically generate unit tests based on selected unit test style\n",
"4. Supporting models: gpt-4o, claude-3-5-sonnet-20240620, gemini-2.5-flash\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e610bf56-a46e-4aff-8de1-ab49d62b1ad3",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import io\n",
"import sys\n",
"import json\n",
"import requests\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import google.generativeai\n",
"import anthropic\n",
"from IPython.display import Markdown, display, update_display\n",
"import gradio as gr\n",
"import subprocess"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f672e1c-87e9-4865-b760-370fa605e614",
"metadata": {},
"outputs": [],
"source": [
"# environment\n",
"\n",
"load_dotenv(override=True)\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
"os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')\n",
"os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY', 'your-key-if-not-using-env')\n",
"os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8aa149ed-9298-4d69-8fe2-8f5de0f667da",
"metadata": {},
"outputs": [],
"source": [
"# initialize\n",
"\n",
"openai = OpenAI()\n",
"claude = anthropic.Anthropic()\n",
"gemini_via_openai_client = OpenAI(\n",
" api_key=os.environ['GOOGLE_API_KEY'], \n",
" base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
")\n",
"OPENAI_MODEL = \"gpt-4o\"\n",
"CLAUDE_MODEL = \"claude-3-5-sonnet-20240620\"\n",
"GEMINI_MODEL = \"gemini-2.5-flash\""
]
},
{
"cell_type": "markdown",
"id": "37b204dd-f770-41d9-9b19-7e1baa5273cd",
"metadata": {},
"source": [
"## 1. Convesion Part"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6896636f-923e-4a2c-9d6c-fac07828a201",
"metadata": {},
"outputs": [],
"source": [
"def convert_system_prompt_for(in_lang, out_lang):\n",
" convert_system_message = f\"You are an assistant that reimplements {in_lang} code in high performance {out_lang}. \"\n",
" convert_system_message += f\"Respond only with {out_lang} code; use comments sparingly and do not provide any explanation other than occasional comments. \"\n",
" convert_system_message += f\"The {out_lang} response needs to produce an identical output in the fastest possible time. Keep implementations of random number generators identical so that results match exactly.\"\n",
" return convert_system_message"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e7b3546-57aa-4c29-bc5d-f211970d04eb",
"metadata": {},
"outputs": [],
"source": [
"def convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code):\n",
" convert_user_prompt = f\"Rewrite this {in_lang} code in {out_lang} with the fastest possible implementation that produces identical output in the least time. \"\n",
" convert_user_prompt += f\"Respond only with {out_lang} code; do not explain your work other than a few comments. \"\n",
" convert_user_prompt += f\"Pay attention to number types to ensure no int overflows. Remember to include all necessary {out_lang} packages or modules, for example, iomanip for C++.\\n\\n\"\n",
" if input_instruct:\n",
" convert_user_prompt += \"Addtional instruction is: \" + input_instruct\n",
" convert_user_prompt += in_code\n",
" return convert_user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6190659-f54c-4951-bef4-4960f8e51cc4",
"metadata": {},
"outputs": [],
"source": [
"def convert_messages_for(in_lang, out_lang, input_instruct, in_code):\n",
" return [\n",
" {\"role\": \"system\", \"content\": convert_system_prompt_for(in_lang, out_lang)},\n",
" {\"role\": \"user\", \"content\": convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3b497b3-f569-420e-b92e-fb0f49957ce0",
"metadata": {},
"outputs": [],
"source": [
"python_hard = \"\"\"# Be careful to support large number sizes\n",
"\n",
"def lcg(seed, a=1664525, c=1013904223, m=2**32):\n",
" value = seed\n",
" while True:\n",
" value = (a * value + c) % m\n",
" yield value\n",
" \n",
"def max_subarray_sum(n, seed, min_val, max_val):\n",
" lcg_gen = lcg(seed)\n",
" random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n",
" max_sum = float('-inf')\n",
" for i in range(n):\n",
" current_sum = 0\n",
" for j in range(i, n):\n",
" current_sum += random_numbers[j]\n",
" if current_sum > max_sum:\n",
" max_sum = current_sum\n",
" return max_sum\n",
"\n",
"def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n",
" total_sum = 0\n",
" lcg_gen = lcg(initial_seed)\n",
" for _ in range(20):\n",
" seed = next(lcg_gen)\n",
" total_sum += max_subarray_sum(n, seed, min_val, max_val)\n",
" return total_sum\n",
"\n",
"# Parameters\n",
"n = 10000 # Number of random numbers\n",
"initial_seed = 42 # Initial seed for the LCG\n",
"min_val = -10 # Minimum value of random numbers\n",
"max_val = 10 # Maximum value of random numbers\n",
"\n",
"# Timing the function\n",
"import time\n",
"start_time = time.time()\n",
"result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n",
"end_time = time.time()\n",
"\n",
"print(\"Total Maximum Subarray Sum (20 runs):\", result)\n",
"print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0be9f47d-5213-4700-b0e2-d444c7c738c0",
"metadata": {},
"outputs": [],
"source": [
"def convert_stream_gpt(in_lang, out_lang, input_instruct, in_code): \n",
" stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=convert_messages_for(in_lang, out_lang, input_instruct, in_code), temperature=0.0, stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8669f56b-8314-4582-a167-78842caea131",
"metadata": {},
"outputs": [],
"source": [
"def convert_stream_claude(in_lang, out_lang, input_instruct, in_code):\n",
" result = claude.messages.stream(\n",
" model=CLAUDE_MODEL,\n",
" max_tokens=2000,\n",
" temperature=0.0,\n",
" system=convert_system_prompt_for(in_lang, out_lang),\n",
" messages=[{\"role\": \"user\", \"content\": convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code)}],\n",
" )\n",
" reply = \"\"\n",
" with result as stream:\n",
" for text in stream.text_stream:\n",
" reply += text\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01d3cd4f-c100-4e25-8670-0663513f6136",
"metadata": {},
"outputs": [],
"source": [
"def convert_stream_gemini(in_lang, out_lang, input_instruct, in_code): \n",
" stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=convert_messages_for(in_lang, out_lang, input_instruct, in_code), temperature=0.0, stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f1ae8f5-16c8-40a0-aa18-63b617df078d",
"metadata": {},
"outputs": [],
"source": [
"def optimize(in_lang, out_lang, in_code, input_instruct, convert_model):\n",
" if \"gpt\" in convert_model.lower():\n",
" result = convert_stream_gpt(in_lang, out_lang, input_instruct, in_code)\n",
" elif \"claude\" in convert_model.lower():\n",
" result = convert_stream_claude(in_lang, out_lang, input_instruct, in_code)\n",
" elif \"gemini\" in convert_model.lower():\n",
" result = convert_stream_gemini(in_lang, out_lang, input_instruct, in_code)\n",
" else:\n",
" raise ValueError(\"Unknown convert model\")\n",
" for stream_so_far in result:\n",
" yield stream_so_far "
]
},
{
"cell_type": "markdown",
"id": "07383878-f887-464f-8bc7-527c669d3edd",
"metadata": {},
"source": [
"## 2. Comment part"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d254038c-fdd6-4ef8-8b7a-a074f1e7405d",
"metadata": {},
"outputs": [],
"source": [
"def comment_system_prompt_for(lang, comment_style):\n",
" comment_system_message = f\"You are an assistant that generate necessary, concise and clear comment/docstring for the {lang} code by applying {comment_style} comment style. \"\n",
" comment_system_message += f\"Respond only with added comments, and do not provide any redundant explanation. \"\n",
" return comment_system_message"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e95cee4f-f229-4c9f-8e67-8a68cc9534c3",
"metadata": {},
"outputs": [],
"source": [
"def comment_user_prompt_for(lang, code, comment_style):\n",
" comment_user_prompt = f\"Add the comments/docstring on the given code for the {lang} programming language in {comment_style} comment style. \"\n",
" comment_user_prompt += f\"Respond only with added comments, and do not provide any redundant explanation.\\n\\n\"\n",
" comment_user_prompt += f\"The given code is as follows: \"\n",
" comment_user_prompt += code\n",
" return comment_user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "507426c2-cf5a-4041-b904-b18a5afe83b6",
"metadata": {},
"outputs": [],
"source": [
"def comment_messages_for(lang, code, comment_style):\n",
" return [\n",
" {\"role\": \"system\", \"content\": comment_system_prompt_for(lang, comment_style)},\n",
" {\"role\": \"user\", \"content\": comment_user_prompt_for(lang, code, comment_style)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e1c8cf6-7a15-4e79-82f6-6bb2a0b85773",
"metadata": {},
"outputs": [],
"source": [
"def comment_stream_gpt(lang, code, comment_style): \n",
" stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=comment_messages_for(lang, code, comment_style), temperature=0.0, stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26f27781-4a3e-4e5f-a8ab-9a25944a9879",
"metadata": {},
"outputs": [],
"source": [
"def comment_stream_claude(lang, code, comment_style):\n",
" result = claude.messages.stream(\n",
" model=CLAUDE_MODEL,\n",
" max_tokens=2000,\n",
" temperature=0.0,\n",
" system=comment_system_prompt_for(lang, comment_style),\n",
" messages=[{\"role\": \"user\", \"content\": comment_user_prompt_for(lang, code, comment_style)}],\n",
" )\n",
" reply = \"\"\n",
" with result as stream:\n",
" for text in stream.text_stream:\n",
" reply += text\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e6719e7-f2f3-40ea-8fed-01d84a641306",
"metadata": {},
"outputs": [],
"source": [
"def comment_stream_gemini(lang, code, comment_style): \n",
" stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=comment_messages_for(lang, code, comment_style), temperature=0.0, stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b98acc4-23d8-4671-8f19-92d72631b55d",
"metadata": {},
"outputs": [],
"source": [
"def generate_comments_via_model(lang, code, comment_style, comment_model):\n",
" if \"gpt\" in comment_model.lower():\n",
" result = comment_stream_gpt(lang, code, comment_style)\n",
" elif \"claude\" in comment_model.lower():\n",
" result = comment_stream_claude(lang, code, comment_style)\n",
" elif \"gemini\" in comment_model.lower():\n",
" result = comment_stream_gemini(lang, code, comment_style)\n",
" else:\n",
" raise ValueError(\"Unknown comment model\")\n",
" for stream_so_far in result:\n",
" yield stream_so_far "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "282c75ae-d8c3-4866-a024-f7ecf87b3cde",
"metadata": {},
"outputs": [],
"source": [
"def generate_comments_fn(comment_option, in_lang, out_lang, in_code, out_code, in_comment_style, out_comment_style, comment_model):\n",
" if 'input' in comment_option:\n",
" in_gen = generate_comments_via_model(in_lang, in_code, in_comment_style, comment_model)\n",
" for in_output in in_gen:\n",
" yield in_output, \"\"\n",
" elif 'output' in comment_option:\n",
" out_gen = generate_comments_via_model(out_lang, out_code, out_comment_style, comment_model)\n",
" for out_output in out_gen:\n",
" yield \"\", out_output\n",
" elif 'both' in comment_option:\n",
" in_gen = generate_comments_via_model(in_lang, in_code, in_comment_style, comment_model)\n",
" out_gen = generate_comments_via_model(out_lang, out_code, out_comment_style, comment_model)\n",
" for in_output, out_output in zip(in_gen, out_gen):\n",
" yield in_output, out_output"
]
},
{
"cell_type": "markdown",
"id": "ce2c178c-d03c-49c0-b0e9-c57c699bca08",
"metadata": {},
"source": [
"## 3. Unit test part"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5a4743e-e1a8-42c7-8f1f-a73d49c0895d",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_system_prompt_for(lang, unit_test_style):\n",
" unit_test_system_message = f\"You are an assistant that generate necessary, concise, clear and executable unit tests for the {lang} code by applying {unit_test_style} unit test style. \"\n",
" unit_test_system_message += f\"Respond only with generated unit tests; use comments sparingly and do not provide any explanation other than occasional comments. \"\n",
" return unit_test_system_message"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "334d5e40-71ff-4d24-8cef-b6c81c188e4d",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_user_prompt_for(lang, code, unit_test_style):\n",
" unit_test_user_prompt = f\"Add the unit tests on the given code for the {lang} programming language in {unit_test_style} unit test style. \"\n",
" unit_test_user_prompt += f\"Respond only with generated unit tests; use comments sparingly and do not provide any explanation other than occasional comments.\\n\\n\"\n",
" unit_test_user_prompt += f\"The given code is as follows: \"\n",
" unit_test_user_prompt += code\n",
" return unit_test_user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a8e061f-3993-4746-9425-d938d2537f65",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_messages_for(lang, code, unit_test_style):\n",
" return [\n",
" {\"role\": \"system\", \"content\": unit_test_system_prompt_for(lang, unit_test_style)},\n",
" {\"role\": \"user\", \"content\": unit_test_user_prompt_for(lang, code, unit_test_style)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71c1613b-7a16-4443-acec-d0a2d9bed192",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_stream_gpt(lang, code, unit_test_style): \n",
" stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=unit_test_messages_for(lang, code, unit_test_style), stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a6e3502-f7ff-42b8-8fc5-2697b2d1f36e",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_stream_claude(lang, code, unit_test_style):\n",
" result = claude.messages.stream(\n",
" model=CLAUDE_MODEL,\n",
" max_tokens=2000,\n",
" system=unit_test_system_prompt_for(lang, unit_test_style),\n",
" messages=[{\"role\": \"user\", \"content\": unit_test_user_prompt_for(lang, code, unit_test_style)}],\n",
" )\n",
" reply = \"\"\n",
" with result as stream:\n",
" for text in stream.text_stream:\n",
" reply += text\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d7f694f-a276-4bdc-9cfb-755483fd4380",
"metadata": {},
"outputs": [],
"source": [
"def unit_test_stream_gemini(lang, code, unit_test_style): \n",
" stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=unit_test_messages_for(lang, code, unit_test_style), stream=True)\n",
" reply = \"\"\n",
" for chunk in stream:\n",
" fragment = chunk.choices[0].delta.content or \"\"\n",
" reply += fragment\n",
" yield reply"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c824429a-b18a-4320-8258-0141037a6531",
"metadata": {},
"outputs": [],
"source": [
"def generate_unit_test_via_model(lang, code, unit_test_style, unit_test_model):\n",
" if \"gpt\" in unit_test_model.lower():\n",
" result = unit_test_stream_gpt(lang, code, unit_test_style)\n",
" elif \"claude\" in unit_test_model.lower():\n",
" result = unit_test_stream_claude(lang, code, unit_test_style)\n",
" elif \"gemini\" in unit_test_model.lower():\n",
" result = unit_test_stream_gemini(lang, code, unit_test_style)\n",
" else:\n",
" raise ValueError(\"Unknown unit test model\")\n",
" for stream_so_far in result:\n",
" yield stream_so_far "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3e59e26-37c0-4429-b69c-deb581423dd0",
"metadata": {},
"outputs": [],
"source": [
"def generate_unit_test_fn(unit_test_option, in_lang, out_lang, in_code, out_code, in_unit_test_style, out_unit_test_style, unit_test_model):\n",
" if 'input' in unit_test_option:\n",
" in_gen = generate_unit_test_via_model(in_lang, in_code, in_unit_test_style, unit_test_model)\n",
" for in_output in in_gen:\n",
" yield in_output, \"\"\n",
" elif 'output' in unit_test_option:\n",
" out_gen = generate_unit_test_via_model(out_lang, out_code, out_unit_test_style, unit_test_model)\n",
" for out_output in out_gen:\n",
" yield \"\", out_output\n",
" elif 'both' in unit_test_option:\n",
" in_gen = generate_unit_test_via_model(in_lang, in_code, in_unit_test_style, unit_test_model)\n",
" out_gen = generate_unit_test_via_model(out_lang, out_code, out_unit_test_style, unit_test_model)\n",
" for in_output, out_output in zip(in_gen, out_gen):\n",
" yield in_output, out_output"
]
},
{
"cell_type": "markdown",
"id": "2a1f4d0c-f417-4de4-be9f-441cbe5a6db3",
"metadata": {},
"source": [
"## 4. Gradio UI part"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a2274f1-d03b-42c0-8dcc-4ce159b18442",
"metadata": {},
"outputs": [],
"source": [
"LANGUAGE_INFO = {\n",
" \"Python\": {\n",
" \"doc_style\": [\"Google-style\", \"NumPy-style\", \"reST\", \"Doxygen\"],\n",
" \"unit_test_style\": [\"unittest\", \"pytest\", \"doctest\"]\n",
" },\n",
" \"Java\": {\n",
" \"doc_style\": [\"Javadoc\"],\n",
" \"unit_test_style\": [\"JUnit4\", \"JUnit5\", \"TestNG\"]\n",
" },\n",
" \"JavaScript\": {\n",
" \"doc_style\": [\"JSDoc\"],\n",
" \"unit_test_style\": [\"Jest\", \"Mocha + Chai\", \"Jasmine\"]\n",
" },\n",
" \"TypeScript\": {\n",
" \"doc_style\": [\"JSDoc\", \"TSDoc\"],\n",
" \"unit_test_style\": [\"Jest\", \"Mocha + Chai\", \"Vitest\"]\n",
" },\n",
" \"C\": {\n",
" \"doc_style\": [\"Doxygen\"],\n",
" \"unit_test_style\": [\"Google Test (gtest)\", \"CppUnit\", \"Catch2\"]\n",
" },\n",
" \"C++\": {\n",
" \"doc_style\": [\"Doxygen\"],\n",
" \"unit_test_style\": [\"Google Test (gtest)\", \"CppUnit\", \"Catch2\"]\n",
" },\n",
" \"C#\": {\n",
" \"doc_style\": [\"XML comments\"],\n",
" \"unit_test_style\": [\"xUnit\", \"NUnit\", \"MSTest\"]\n",
" },\n",
" \"Go\": {\n",
" \"doc_style\": [\"Godoc\"],\n",
" \"unit_test_style\": [\"Built-in testing package\"]\n",
" },\n",
" \"Rust\": {\n",
" \"doc_style\": [\"Rustdoc\", \"Markdown\"],\n",
" \"unit_test_style\": [\"Built-in #[test] annotation\"]\n",
" },\n",
" \"Kotlin\": {\n",
" \"doc_style\": [\"KDoc\"],\n",
" \"unit_test_style\": [\"JUnit\", \"Kotest\", \"Spek\"]\n",
" },\n",
" \"Swift\": {\n",
" \"doc_style\": [\"Mark-style comments\"],\n",
" \"unit_test_style\": [\"XCTest\"]\n",
" },\n",
" \"PHP\": {\n",
" \"doc_style\": [\"PHPDoc\"],\n",
" \"unit_test_style\": [\"PHPUnit\"]\n",
" },\n",
" \"Julia\": {\n",
" \"doc_style\": [\"Markdown\"],\n",
" \"unit_test_style\": [\"Built-in Test standard library\"]\n",
" }\n",
"}\n",
"LANGUAGES = list(LANGUAGE_INFO.keys())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b50e7833-8f6f-407e-8174-37af9cec2030",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks(title=\"Power Coder\", theme=gr.themes.Citrus(), css=\"\"\"\n",
".selected {\n",
" background-color: orange !important;\n",
" box-shadow: 0 4px 12px rgba(255, 140, 0, 0.5) !important;\n",
" color: black;\n",
"}\n",
".unselected {\n",
" background-color: gray !important;\n",
" box-shadow: 0 4px 12px rgba(128, 128, 128, 0.4);\n",
" color: white;\n",
"}\n",
"\"\"\") as ui:\n",
" current_selected = gr.State(\"\")\n",
" initial_in_lang = \"Python\"\n",
" initial_out_lang = \"Java\"\n",
" in_comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_in_lang][\"doc_style\"]\n",
" out_comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_out_lang][\"doc_style\"]\n",
" in_unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_in_lang][\"unit_test_style\"]\n",
" out_unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_out_lang][\"unit_test_style\"]\n",
" in_code_file_name = gr.State(\"in_code.txt\")\n",
" out_code_file_name = gr.State(\"out_code.txt\")\n",
" in_comments_file_name = gr.State(\"in_comments.txt\")\n",
" out_comments_file_name = gr.State(\"out_comments.txt\")\n",
" in_unit_test_file_name = gr.State(\"in_unit_tests.txt\")\n",
" out_unit_test_file_name = gr.State(\"out_unit_tests.txt\")\n",
" \n",
" \n",
" gr.Markdown(\"## Code Helper\")\n",
"\n",
" def load_file_content(file):\n",
" if file is None:\n",
" return \"\"\n",
" with open(file.name, \"r\", encoding=\"utf-8\") as f:\n",
" return f.read()\n",
"\n",
" def change_lang(lang):\n",
" comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[lang][\"doc_style\"]\n",
" unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[lang][\"unit_test_style\"]\n",
" return (\n",
" gr.update(choices=comment_style_choices, value=str(comment_style_choices[0])), \n",
" gr.update(choices=unit_test_style_choices, value=str(unit_test_style_choices[0]))\n",
" )\n",
"\n",
" def download_fn(in_text, out_text, in_file_name, out_file_name):\n",
" if in_text:\n",
" with open(in_file_name, \"w\") as f:\n",
" f.write(in_text)\n",
" if out_text:\n",
" with open(out_file_name, \"w\") as f:\n",
" f.write(out_text)\n",
" \n",
" # Conversion part\n",
" with gr.Row():\n",
" in_lang = gr.Dropdown(choices=LANGUAGES, label=\"Select input language\", value=initial_in_lang, interactive=True)\n",
" out_lang = gr.Dropdown(choices=LANGUAGES, label=\"Select output language\", value=initial_out_lang, interactive=True)\n",
" with gr.Row():\n",
" input_file = gr.File(label=\"Upload a source code file or input below\")\n",
" input_instruct = gr.Textbox(\n",
    "        label=\"Additional instruction (optional)\",\n",
    "        placeholder=\"Enter the instruction you want the output code to follow...\\n\\nFor example: Define the variable using snake_case style.\",\n",
" lines=8\n",
" )\n",
" with gr.Row():\n",
" in_code = gr.Textbox(label=\"Input Code:\", value=python_hard, lines=10)\n",
" out_code = gr.Textbox(label=\"Output Code:\", lines=10)\n",
" with gr.Row():\n",
" convert_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n",
" with gr.Row():\n",
" convert = gr.Button(\"Convert code\")\n",
" download_code = gr.Button(\"Download code\")\n",
"\n",
" gr.HTML(\"<hr style='border: none; height: 1px; background-color: #333;'>\")\n",
"\n",
" def show_comment(current_selected):\n",
" if current_selected == \"comment\":\n",
" return (\n",
" gr.update(visible=False),\n",
" gr.update(visible=False),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" \"\"\n",
" )\n",
" else:\n",
" return (\n",
" gr.update(visible=True),\n",
" gr.update(visible=False),\n",
" gr.update(elem_classes=[\"selected\"]),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" \"comment\"\n",
" )\n",
"\n",
" def show_unit_test(current_selected):\n",
" if current_selected == \"unit_test\":\n",
" return (\n",
" gr.update(visible=False),\n",
" gr.update(visible=False),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" \"\"\n",
" )\n",
" else:\n",
" return (\n",
" gr.update(visible=False),\n",
" gr.update(visible=True),\n",
" gr.update(elem_classes=[\"unselected\"]),\n",
" gr.update(elem_classes=[\"selected\"]),\n",
" \"unit_test\"\n",
" )\n",
" \n",
" with gr.Blocks() as demo:\n",
" with gr.Row():\n",
" comment_show_up = gr.Button(\"Comment\", elem_id=\"comment-btn\", elem_classes=[\"unselected\"])\n",
" unit_test_show_up = gr.Button(\"Unit Test\", elem_id=\"unit-test-btn\", elem_classes=[\"unselected\"])\n",
" \n",
" comment_section = gr.Column(visible=False)\n",
" unit_test_section = gr.Column(visible=False)\n",
" \n",
" with comment_section:\n",
" # Comment section\n",
" with gr.Row():\n",
" comment_option = gr.Radio(\n",
" choices=[\n",
" \"Comment input code\",\n",
" \"Comment output code\",\n",
" \"Comment both\"\n",
" ],\n",
" label=\"Commenting Options\",\n",
" value=\"Comment input code\",\n",
" interactive=True\n",
" )\n",
" with gr.Row():\n",
" in_comment_style = gr.Dropdown(choices=in_comment_style_choices, label=\"Select comment style for input code\", value=in_comment_style_choices[0], interactive=True)\n",
    "            out_comment_style = gr.Dropdown(choices=out_comment_style_choices, label=\"Select comment style for output code\", value=out_comment_style_choices[0], interactive=True)\n",
" with gr.Row():\n",
" comment_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n",
" with gr.Row():\n",
" generate_comments = gr.Button(\"Generate comments\")\n",
" download_comments = gr.Button(\"Download comments\")\n",
" with gr.Row():\n",
" in_comments = gr.Textbox(label=\"Comments for Input Code:\", lines=10)\n",
" out_comments = gr.Textbox(label=\"Comments for Output Code:\", lines=10)\n",
" \n",
" with unit_test_section:\n",
" # Unit test part\n",
" with gr.Row():\n",
" unit_test_option = gr.Radio(\n",
" choices=[\n",
" \"Add unit test for input code\",\n",
" \"Add unit test for output code\",\n",
" \"Add unit test for both\"\n",
" ],\n",
" label=\"Unit Test Options\",\n",
" value=\"Add unit test for input code\",\n",
" interactive=True\n",
" )\n",
" with gr.Row():\n",
" in_unit_test_style = gr.Dropdown(choices=in_unit_test_style_choices, label=\"Select unit test style for input code\", value=in_unit_test_style_choices[0], interactive=True)\n",
    "            out_unit_test_style = gr.Dropdown(choices=out_unit_test_style_choices, label=\"Select unit test style for output code\", value=out_unit_test_style_choices[0], interactive=True)\n",
" with gr.Row():\n",
" unit_test_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n",
" with gr.Row():\n",
" generate_unit_test = gr.Button(\"Generate unit test\")\n",
    "            download_unit_test = gr.Button(\"Download unit test\")\n",
" with gr.Row():\n",
" in_unit_test = gr.Textbox(label=\"Unit Test for Input Code:\", lines=10)\n",
" out_unit_test = gr.Textbox(label=\"Unit Test for Output Code:\", lines=10)\n",
"\n",
" in_lang.change(fn=change_lang, inputs=in_lang, outputs=[in_comment_style, in_unit_test_style])\n",
" out_lang.change(fn=change_lang, inputs=out_lang, outputs=[out_comment_style, out_unit_test_style])\n",
" input_file.change(fn=load_file_content, inputs=input_file, outputs=in_code)\n",
" \n",
" convert.click(optimize, inputs=[in_lang, out_lang, in_code, input_instruct, convert_model], outputs=[out_code])\n",
" download_code.click(download_fn, inputs=[in_code, out_code, in_code_file_name, out_code_file_name])\n",
" \n",
" comment_show_up.click(fn=show_comment, inputs=current_selected, outputs=[comment_section, unit_test_section, comment_show_up, unit_test_show_up, current_selected])\n",
" unit_test_show_up.click(fn=show_unit_test, inputs=current_selected, outputs=[comment_section, unit_test_section, comment_show_up, unit_test_show_up, current_selected])\n",
"\n",
" generate_comments.click(generate_comments_fn, inputs=[comment_option, in_lang, out_lang, in_code, out_code, in_comment_style, out_comment_style, comment_model], outputs=[in_comments, out_comments])\n",
" download_comments.click(download_fn, inputs=[in_comments, out_comments, in_comments_file_name, out_comments_file_name])\n",
" generate_unit_test.click(generate_unit_test_fn, inputs=[unit_test_option, in_lang, out_lang, in_code, out_code, in_unit_test_style, out_unit_test_style, unit_test_model], outputs=[in_unit_test, out_unit_test])\n",
" download_unit_test.click(download_fn, inputs=[in_unit_test, out_unit_test, in_unit_test_file_name, out_unit_test_file_name])\n",
" \n",
"ui.launch()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0266734c-0bee-46c0-9b17-9fd2ae86cc3a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,643 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ac833f26-d429-4fd2-8f83-92174f1c951a",
"metadata": {},
"source": [
    "# Code conversion using Gemini and Codestral on Windows 11"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c230178c-6f31-4c5a-a888-16b7037ffbf9",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import io\n",
"import sys\n",
"import gradio as gr\n",
"import subprocess\n",
"from google import genai\n",
"from google.genai import types\n",
"from mistralai import Mistral\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, update_display"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d824484-eaaa-456a-b7dc-7e3277fec34a",
"metadata": {},
"outputs": [],
"source": [
"# Load Gemini and Mistral API Keys\n",
"\n",
"load_dotenv(override=True)\n",
"gemini_api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
"mistral_api_key = os.getenv(\"MISTRAL_API_KEY\")\n",
"\n",
"if not mistral_api_key or not gemini_api_key:\n",
" print(\"API Key not found!\")\n",
"else:\n",
" print(\"API Key loaded in memory\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86f3633e-81f9-4c13-b7b5-793ddc4f886f",
"metadata": {},
"outputs": [],
"source": [
"# Models to be used\n",
"\n",
"MODEL_GEMINI = 'gemini-2.5-flash'\n",
"MODEL_CODESTRAL = 'codestral-latest'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f3a6d53-70f9-46b8-a490-a50f3a1adf9e",
"metadata": {},
"outputs": [],
"source": [
    "# Initialize Gemini and Mistral clients\n",
"try:\n",
" gemini_client = genai.Client(api_key=gemini_api_key)\n",
" print(\"Google GenAI Client initialized successfully!\")\n",
"\n",
" codestral_client = Mistral(api_key=mistral_api_key)\n",
" print(\"Mistral Client initialized successfully!\")\n",
"except Exception as e:\n",
" print(f\"Error initializing Client: {e}\")\n",
" exit() "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f816fbe8-e094-499f-98a5-588ebecf8c72",
"metadata": {},
"outputs": [],
"source": [
"# Gemini System prompt\n",
"\n",
"system_message = \"You are an assistant that reimplements Python code in high-performance C++ optimized for a Windows PC. \"\n",
"system_message += \"Use Windows-specific optimizations where applicable (e.g., multithreading with std::thread, SIMD, or WinAPI if necessary). \"\n",
"system_message += \"Respond only with the equivalent C++ code; include comments only where absolutely necessary. \"\n",
"system_message += \"Avoid any explanation or text outside the code. \"\n",
"system_message += \"The C++ output must produce identical functionality with the fastest possible execution time on Windows.\"\n",
"\n",
"generate_content_config = types.GenerateContentConfig(system_instruction=system_message)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01227835-15d2-40bd-a9dd-2ef35ad371dc",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(python):\n",
" user_prompt = (\n",
" \"Convert the following Python code into high-performance C++ optimized for Windows. \"\n",
" \"Use standard C++20 or newer with Windows-compatible libraries and best practices. \"\n",
" \"Ensure the implementation runs as fast as possible and produces identical output. \"\n",
" \"Use appropriate numeric types to avoid overflow or precision loss. \"\n",
" \"Avoid unnecessary abstraction; prefer direct computation and memory-efficient structures. \"\n",
" \"Respond only with C++ code, include all required headers (like <iomanip>, <vector>, etc.), and limit comments to only what's essential.\\n\\n\"\n",
" )\n",
" user_prompt += python\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d9fc8e2-acf0-4122-a8a9-5aadadf982ab",
"metadata": {},
"outputs": [],
"source": [
"def user_message_gemini(python): \n",
" return types.Content(role=\"user\", parts=[types.Part.from_text(text=user_prompt_for(python))]) "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "334c8b84-6e37-40fc-97ac-40a1b3aa29fa",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(python):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(python)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4aca87ac-6330-4ed4-a36f-1726fd0ada1a",
"metadata": {},
"outputs": [],
"source": [
"def write_output(cpp):\n",
" code = cpp.replace(\"```cpp\", \"\").replace(\"```c++\", \"\").replace(\"```\", \"\").strip()\n",
" \n",
    "    if \"#include\" not in code:\n",
" raise ValueError(\"C++ code appears invalid: missing #include directives.\")\n",
"\n",
" with open(\"optimized.cpp\", \"w\", encoding=\"utf-8\", newline=\"\\n\") as f:\n",
" f.write(code) "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcf42642-1a55-4556-8738-0c8c02effa9c",
"metadata": {},
"outputs": [],
"source": [
    "# Generate C++ code using Gemini\n",
"\n",
"def optimize_gemini(python):\n",
" stream = gemini_client.models.generate_content_stream(\n",
" model = MODEL_GEMINI,\n",
" config=generate_content_config,\n",
" contents=user_message_gemini(python)\n",
" )\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.text\n",
" cpp_code += chunk_text\n",
" print(chunk_text, end=\"\", flush=True) \n",
" write_output(cpp_code)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f06a301-4397-4d63-9226-657bb2ddb792",
"metadata": {},
"outputs": [],
"source": [
    "# Generate C++ code using Codestral\n",
"\n",
"def optimize_codestral(python):\n",
" stream = codestral_client.chat.stream(\n",
" model = MODEL_CODESTRAL,\n",
" messages = messages_for(python), \n",
" )\n",
" \n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.data.choices[0].delta.content\n",
" cpp_code += chunk_text\n",
" print(chunk_text, end=\"\", flush=True) \n",
" write_output(cpp_code)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8bd51601-7c1d-478d-b043-6f92739e5c4b",
"metadata": {},
"outputs": [],
"source": [
"# Actual code to convert\n",
"\n",
"pi = \"\"\"\n",
"import time\n",
"\n",
"def calculate(iterations, param1, param2):\n",
" result = 1.0\n",
" for i in range(1, iterations+1):\n",
" j = i * param1 - param2\n",
" result -= (1/j)\n",
" j = i * param1 + param2\n",
" result += (1/j)\n",
" return result\n",
"\n",
"start_time = time.time()\n",
"result = calculate(100_000_000, 4, 1) * 4\n",
"end_time = time.time()\n",
"\n",
"print(f\"Result: {result:.12f}\")\n",
"print(f\"Execution Time: {(end_time - start_time):.6f} seconds\")\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db9ea24e-d381-48ac-9196-853d2527dcca",
"metadata": {},
"outputs": [],
"source": [
"exec(pi)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3e26708-8475-474d-8e96-e602c3d5ef9f",
"metadata": {},
"outputs": [],
"source": [
"optimize_gemini(pi)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cc23ea7-6062-4354-92bc-730baa52a50b",
"metadata": {},
"outputs": [],
"source": [
    "# C++ compilation\n",
"\n",
"!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b14704d-95fe-4ed2-861f-af591bf3090e",
"metadata": {},
"outputs": [],
"source": [
"!.\\optimized.exe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d756d1a-1d49-4cfb-bed7-8748d848b083",
"metadata": {},
"outputs": [],
"source": [
"optimize_codestral(pi)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e286dc8-9532-48b1-b748-a7950972e7df",
"metadata": {},
"outputs": [],
"source": [
"!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61fe0044-7679-4245-9e59-50642f3d80c6",
"metadata": {},
"outputs": [],
"source": [
"!.\\optimized.exe"
]
},
{
"cell_type": "markdown",
"id": "f0c0392c-d2a7-4619-82a2-f7b9fa7c43f9",
"metadata": {},
"source": [
    "## Harder Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ca53eb4-46cd-435b-a950-0e2a8f845535",
"metadata": {},
"outputs": [],
"source": [
"python_hard = \"\"\"# Be careful to support large number sizes\n",
"\n",
"def lcg(seed, a=1664525, c=1013904223, m=2**32):\n",
" value = seed\n",
" while True:\n",
" value = (a * value + c) % m\n",
" yield value\n",
" \n",
"def max_subarray_sum(n, seed, min_val, max_val):\n",
" lcg_gen = lcg(seed)\n",
" random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n",
" max_sum = float('-inf')\n",
" for i in range(n):\n",
" current_sum = 0\n",
" for j in range(i, n):\n",
" current_sum += random_numbers[j]\n",
" if current_sum > max_sum:\n",
" max_sum = current_sum\n",
" return max_sum\n",
"\n",
"def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n",
" total_sum = 0\n",
" lcg_gen = lcg(initial_seed)\n",
" for _ in range(20):\n",
" seed = next(lcg_gen)\n",
" total_sum += max_subarray_sum(n, seed, min_val, max_val)\n",
" return total_sum\n",
"\n",
"# Parameters\n",
"n = 10000 # Number of random numbers\n",
"initial_seed = 42 # Initial seed for the LCG\n",
"min_val = -10 # Minimum value of random numbers\n",
"max_val = 10 # Maximum value of random numbers\n",
"\n",
"# Timing the function\n",
"import time\n",
"start_time = time.time()\n",
"result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n",
"end_time = time.time()\n",
"\n",
"print(\"Total Maximum Subarray Sum (20 runs):\", result)\n",
"print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "697cc9fe-efdb-40b7-8e43-871bd2df940e",
"metadata": {},
"outputs": [],
"source": [
"exec(python_hard)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17ed6329-6c5f-45af-91ff-06d73830dd0d",
"metadata": {},
"outputs": [],
"source": [
"optimize_gemini(python_hard)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b57f0e7-46c9-4235-86eb-389faf37b7bb",
"metadata": {},
"outputs": [],
"source": [
    "# C++ compilation\n",
"\n",
"!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8ce8d01-fda8-400d-b3d4-6f1ad3008d28",
"metadata": {},
"outputs": [],
"source": [
"!.\\optimized.exe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adbcdac7-8656-41c9-8707-d8a71998d393",
"metadata": {},
"outputs": [],
"source": [
"optimize_codestral(python_hard)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f9fc9b1-29cf-4510-83f8-1484d26e871e",
"metadata": {},
"outputs": [],
"source": [
    "# C++ compilation\n",
"\n",
"!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52170458-c4a1-4920-8d83-8c5ba7250759",
"metadata": {},
"outputs": [],
"source": [
"!.\\optimized.exe"
]
},
{
"cell_type": "markdown",
"id": "da6aee85-2792-487b-bef3-fec5dcf12623",
"metadata": {},
"source": [
    "## Putting it all together in a Gradio UI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2a90c4f-c289-4658-a6ce-51b80e20f91f",
"metadata": {},
"outputs": [],
"source": [
"def stream_gemini(python):\n",
" stream = gemini_client.models.generate_content_stream(\n",
" model = MODEL_GEMINI,\n",
" config=generate_content_config,\n",
" contents=user_message_gemini(python)\n",
" )\n",
"\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.text or \"\"\n",
" cpp_code += chunk_text\n",
" yield cpp_code.replace('```cpp\\n','').replace('```','')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e872171-96d8-4041-8cb0-0c632c5e957f",
"metadata": {},
"outputs": [],
"source": [
"def stream_codestral(python):\n",
" stream = codestral_client.chat.stream(\n",
" model = MODEL_CODESTRAL,\n",
" messages = messages_for(python), \n",
" )\n",
"\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.data.choices[0].delta.content or \"\"\n",
" cpp_code += chunk_text\n",
" yield cpp_code.replace('```cpp\\n','').replace('```','') "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3340b36b-1241-4b0f-9e69-d4e5cc215a27",
"metadata": {},
"outputs": [],
"source": [
"def optimize(python, model):\n",
" if model.lower() == 'gemini':\n",
" result = stream_gemini(python)\n",
" elif model.lower() == 'codestral':\n",
" result = stream_codestral(python)\n",
" else:\n",
" raise ValueError(\"Unknown model\")\n",
" \n",
" for stream_so_far in result:\n",
" yield stream_so_far "
]
},
{
"cell_type": "markdown",
"id": "277ddd6c-e71e-4512-965a-57fca341487a",
"metadata": {},
"source": [
"### Gradio Implementation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "222a9eae-236e-4ba3-8f23-3d9b879ec2d0",
"metadata": {},
"outputs": [],
"source": [
"custom_css = \"\"\"\n",
".scrollable-box textarea {\n",
" overflow: auto !important;\n",
" height: 400px;\n",
"}\n",
"\n",
".python {background-color: #306998;}\n",
".cpp {background-color: #050;}\n",
"\n",
"\"\"\"\n",
"\n",
"theme = gr.themes.Soft()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4bd6ed1-ff8c-42d4-8da6-24b9cfd134db",
"metadata": {},
"outputs": [],
"source": [
"def execute_python(code):\n",
" try:\n",
" result = subprocess.run(\n",
" [\"python\", \"-c\", code],\n",
" capture_output=True,\n",
" text=True,\n",
" timeout=60\n",
" )\n",
" if result.returncode == 0:\n",
" return result.stdout or \"[No output]\"\n",
" else:\n",
" return f\"[Error]\\n{result.stderr}\"\n",
" except subprocess.TimeoutExpired:\n",
" return \"[Error] Execution timed out.\"\n",
" except Exception as e:\n",
" return f\"[Exception] {str(e)}\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1507c973-8699-48b2-80cd-45900c97a867",
"metadata": {},
"outputs": [],
"source": [
"def execute_cpp(code):\n",
" write_output(code)\n",
" \n",
" try:\n",
" compile_cmd = [\"g++\", \"-O3\", \"-std=c++20\", \"-o\", \"optimized.exe\", \"optimized.cpp\"]\n",
" compile_result = subprocess.run(compile_cmd, capture_output=True, text=True, check=True)\n",
" \n",
" run_cmd = [\"optimized.exe\"]\n",
" run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True, timeout=60)\n",
" \n",
" return run_result.stdout or \"[No output]\"\n",
" \n",
" except subprocess.CalledProcessError as e:\n",
" return f\"[Compile/Runtime Error]\\n{e.stderr}\"\n",
" except subprocess.TimeoutExpired:\n",
" return \"[Error] Execution timed out.\"\n",
" except Exception as e:\n",
" return f\"[Exception] {str(e)}\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "374f00f3-8fcf-4ae9-bf54-c5a44dd74844",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks(css=custom_css, theme=theme) as ui:\n",
" gr.Markdown(\"## Convert code from Python to C++\")\n",
" with gr.Row():\n",
" python = gr.Textbox(label=\"Python code:\", lines=10, value=python_hard, elem_classes=[\"scrollable-box\"])\n",
" cpp = gr.Textbox(label=\"C++ code:\", lines=10, elem_classes=[\"scrollable-box\"])\n",
" with gr.Row():\n",
" model = gr.Dropdown([\"Gemini\", \"Codestral\"], label=\"Select model\", value=\"Gemini\")\n",
" convert = gr.Button(\"Convert code\")\n",
" with gr.Row():\n",
" python_run = gr.Button(\"Run Python\")\n",
" cpp_run = gr.Button(\"Run C++\")\n",
" with gr.Row():\n",
" python_out = gr.TextArea(label=\"Python result:\", elem_classes=[\"python\"])\n",
" cpp_out = gr.TextArea(label=\"C++ result:\", elem_classes=[\"cpp\"])\n",
"\n",
" convert.click(optimize, inputs=[python,model], outputs=[cpp])\n",
" python_run.click(execute_python,inputs=[python], outputs=[python_out])\n",
" cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])\n",
"\n",
"ui.launch(inbrowser=True) "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,476 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4c07cdc9-bce0-49ad-85c7-14f1872b8519",
"metadata": {},
"source": [
    "# Python to C++ using Qwen2.5-Coder-32B-Instruct with the Hyperbolic Inference Endpoint on Windows"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f051c517-c4fd-4248-98aa-b808fae76cf6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import io\n",
"import sys\n",
"import gradio as gr\n",
"import subprocess\n",
"from dotenv import load_dotenv\n",
"from huggingface_hub import InferenceClient\n",
"from google import genai\n",
"from google.genai import types\n",
"from mistralai import Mistral"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6c8777b-57bc-436a-978f-21a37ea310ae",
"metadata": {},
"outputs": [],
"source": [
    "# Load API keys from env\n",
"\n",
"load_dotenv(override=True)\n",
"\n",
"hf_api_key = os.getenv(\"HF_TOKEN\")\n",
"gemini_api_key = os.getenv(\"GOOGLE_API_KEY\")\n",
"mistral_api_key = os.getenv(\"MISTRAL_API_KEY\")\n",
"\n",
"if not mistral_api_key or not gemini_api_key or not hf_api_key:\n",
" print(\"API Key not found!\")\n",
"else:\n",
" print(\"API Key loaded in memory\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5cf6f93-7e07-40e0-98b8-d4e74ea18402",
"metadata": {},
"outputs": [],
"source": [
    "# Models\n",
"\n",
"MODEL_QWEN = \"Qwen/Qwen2.5-Coder-32B-Instruct\"\n",
"MODEL_GEMINI = 'gemini-2.5-flash'\n",
"MODEL_CODESTRAL = 'codestral-latest'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "689547c3-aaa5-4800-86a2-da52765997d8",
"metadata": {},
"outputs": [],
"source": [
    "# Initialize clients\n",
"\n",
"try:\n",
" gemini_client = genai.Client(api_key=gemini_api_key)\n",
" print(\"Google GenAI Client initialized successfully!\")\n",
"\n",
" codestral_client = Mistral(api_key=mistral_api_key)\n",
" print(\"Mistral Client initialized successfully!\")\n",
" \n",
" hf_client = InferenceClient(provider=\"hyperbolic\",api_key=hf_api_key)\n",
" print(\"Hyperbolic Inference Client initialized successfully!\")\n",
"except Exception as e:\n",
" print(f\"Error initializing Client: {e}\")\n",
" exit() "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c3a81f4-99c3-463a-ae30-4656a7a246d2",
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are an assistant that reimplements Python code in high-performance C++ optimized for a Windows PC. \"\n",
"system_message += \"Use Windows-specific optimizations where applicable (e.g., multithreading with std::thread, SIMD, or WinAPI if necessary). \"\n",
"system_message += \"Respond only with the equivalent C++ code; include comments only where absolutely necessary. \"\n",
"system_message += \"Avoid any explanation or text outside the code. \"\n",
"system_message += \"The C++ output must produce identical functionality with the fastest possible execution time on Windows.\"\n",
"\n",
"generate_content_config = types.GenerateContentConfig(system_instruction=system_message)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fde9514-1005-4539-b01b-0372730ce67b",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(python):\n",
" user_prompt = (\n",
" \"Convert the following Python code into high-performance C++ optimized for Windows. \"\n",
" \"Use standard C++20 or newer with Windows-compatible libraries and best practices. \"\n",
" \"Ensure the implementation runs as fast as possible and produces identical output. \"\n",
" \"Use appropriate numeric types to avoid overflow or precision loss. \"\n",
" \"Avoid unnecessary abstraction; prefer direct computation and memory-efficient structures. \"\n",
" \"Respond only with C++ code, include all required headers (like <iomanip>, <vector>, etc.), and limit comments to only what's essential.\\n\\n\"\n",
" )\n",
" user_prompt += python\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89c8b010-08dd-4695-a784-65162d82a24b",
"metadata": {},
"outputs": [],
"source": [
"def user_message_gemini(python): \n",
" return types.Content(role=\"user\", parts=[types.Part.from_text(text=user_prompt_for(python))]) "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66923158-983d-46f7-ab19-f216fb1f6a87",
"metadata": {},
"outputs": [],
"source": [
"def messages_for(python):\n",
" return [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(python)}\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ab59a54-b28a-4d07-b04f-b568e6e25dfb",
"metadata": {},
"outputs": [],
"source": [
"def write_output(cpp):\n",
" code = cpp.replace(\"```cpp\", \"\").replace(\"```c++\", \"\").replace(\"```\", \"\").strip()\n",
" \n",
    "    if \"#include\" not in code:\n",
" raise ValueError(\"C++ code appears invalid: missing #include directives.\")\n",
"\n",
" with open(\"qwenOptimized.cpp\", \"w\", encoding=\"utf-8\", newline=\"\\n\") as f:\n",
" f.write(code) "
]
},
{
"cell_type": "markdown",
"id": "e05ea9f0-6ade-4699-b5fa-fb8ef9f16bcb",
"metadata": {},
"source": [
    "### Python Code Samples"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c515ce2c-1f8d-4484-8d34-9ffe1372dad4",
"metadata": {},
"outputs": [],
"source": [
"python_easy = \"\"\"\n",
"import time\n",
"\n",
"def calculate(iterations, param1, param2):\n",
" result = 1.0\n",
" for i in range(1, iterations+1):\n",
" j = i * param1 - param2\n",
" result -= (1/j)\n",
" j = i * param1 + param2\n",
" result += (1/j)\n",
" return result\n",
"\n",
"start_time = time.time()\n",
"result = calculate(100_000_000, 4, 1) * 4\n",
"end_time = time.time()\n",
"\n",
"print(f\"Result: {result:.12f}\")\n",
"print(f\"Execution Time: {(end_time - start_time):.6f} seconds\")\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83ab4080-71ae-45e6-970b-030dc462f571",
"metadata": {},
"outputs": [],
"source": [
"python_hard = \"\"\"# Be careful to support large number sizes\n",
"\n",
"def lcg(seed, a=1664525, c=1013904223, m=2**32):\n",
" value = seed\n",
" while True:\n",
" value = (a * value + c) % m\n",
" yield value\n",
" \n",
"def max_subarray_sum(n, seed, min_val, max_val):\n",
" lcg_gen = lcg(seed)\n",
" random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n",
" max_sum = float('-inf')\n",
" for i in range(n):\n",
" current_sum = 0\n",
" for j in range(i, n):\n",
" current_sum += random_numbers[j]\n",
" if current_sum > max_sum:\n",
" max_sum = current_sum\n",
" return max_sum\n",
"\n",
"def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n",
" total_sum = 0\n",
" lcg_gen = lcg(initial_seed)\n",
" for _ in range(20):\n",
" seed = next(lcg_gen)\n",
" total_sum += max_subarray_sum(n, seed, min_val, max_val)\n",
" return total_sum\n",
"\n",
"# Parameters\n",
"n = 10000 # Number of random numbers\n",
"initial_seed = 42 # Initial seed for the LCG\n",
"min_val = -10 # Minimum value of random numbers\n",
"max_val = 10 # Maximum value of random numbers\n",
"\n",
"# Timing the function\n",
"import time\n",
"start_time = time.time()\n",
"result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n",
"end_time = time.time()\n",
"\n",
"print(\"Total Maximum Subarray Sum (20 runs):\", result)\n",
"print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "31498c5c-ecdd-4ed7-9607-4d09af893b98",
"metadata": {},
"source": [
"## Code Implementation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea4a4968-e04f-4939-8c42-32c960699354",
"metadata": {},
"outputs": [],
"source": [
"def stream_gemini(python):\n",
" stream = gemini_client.models.generate_content_stream(\n",
" model = MODEL_GEMINI,\n",
" config=generate_content_config,\n",
" contents=user_message_gemini(python)\n",
" )\n",
"\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.text or \"\"\n",
" cpp_code += chunk_text\n",
" yield cpp_code.replace('```cpp\\n','').replace('```','')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69601eee-520f-4813-b796-aee9118e8a72",
"metadata": {},
"outputs": [],
"source": [
"def stream_codestral(python):\n",
" stream = codestral_client.chat.stream(\n",
" model = MODEL_CODESTRAL,\n",
" messages = messages_for(python), \n",
" )\n",
"\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
" chunk_text = chunk.data.choices[0].delta.content or \"\"\n",
" cpp_code += chunk_text\n",
" yield cpp_code.replace('```cpp\\n','').replace('```','') "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb8899cf-54c0-4d2d-8772-42925c2e1d13",
"metadata": {},
"outputs": [],
"source": [
"def stream_qwen(python):\n",
" stream = hf_client.chat.completions.create(\n",
" model = MODEL_QWEN,\n",
" messages = messages_for(python),\n",
" stream=True\n",
" )\n",
" cpp_code = \"\"\n",
" for chunk in stream:\n",
"        chunk_text = chunk.choices[0].delta.content or \"\"\n",
" cpp_code += chunk_text\n",
" yield cpp_code.replace('```cpp\\n','').replace('```','')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98862fef-905c-4b50-bc7a-4c0462495b5c",
"metadata": {},
"outputs": [],
"source": [
"def optimize(python, model):\n",
" if model.lower() == 'gemini':\n",
" result = stream_gemini(python)\n",
" elif model.lower() == 'codestral':\n",
" result = stream_codestral(python)\n",
" elif model.lower() == 'qwen_coder':\n",
" result = stream_qwen(python)\n",
" else:\n",
" raise ValueError(\"Unknown model\")\n",
" \n",
" for stream_so_far in result:\n",
" yield stream_so_far "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa9372df-db01-41d0-842c-4857b20f93f0",
"metadata": {},
"outputs": [],
"source": [
"custom_css = \"\"\"\n",
".scrollable-box textarea {\n",
" overflow: auto !important;\n",
" height: 400px;\n",
"}\n",
"\n",
".python {background-color: #306998;}\n",
".cpp {background-color: #050;}\n",
"\n",
"\"\"\"\n",
"\n",
"theme = gr.themes.Soft()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbcf9fe9-c3da-466b-8478-83dcdbe7d48e",
"metadata": {},
"outputs": [],
"source": [
"def execute_python(code):\n",
" try:\n",
" result = subprocess.run(\n",
" [\"python\", \"-c\", code],\n",
" capture_output=True,\n",
" text=True,\n",
" timeout=60\n",
" )\n",
" if result.returncode == 0:\n",
" return result.stdout or \"[No output]\"\n",
" else:\n",
" return f\"[Error]\\n{result.stderr}\"\n",
" except subprocess.TimeoutExpired:\n",
" return \"[Error] Execution timed out.\"\n",
" except Exception as e:\n",
" return f\"[Exception] {str(e)}\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8029e00d-1ee8-43d1-8c87-2aa0544cf94c",
"metadata": {},
"outputs": [],
"source": [
"def execute_cpp(code):\n",
" write_output(code)\n",
" \n",
" try:\n",
" compile_cmd = [\"g++\", \"-O3\", \"-std=c++20\", \"-o\", \"optimized.exe\", \"optimized.cpp\"]\n",
" compile_result = subprocess.run(compile_cmd, capture_output=True, text=True, check=True)\n",
" \n",
" run_cmd = [\"optimized.exe\"]\n",
" run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True, timeout=60)\n",
" \n",
" return run_result.stdout or \"[No output]\"\n",
" \n",
" except subprocess.CalledProcessError as e:\n",
" return f\"[Compile/Runtime Error]\\n{e.stderr}\"\n",
" except subprocess.TimeoutExpired:\n",
" return \"[Error] Execution timed out.\"\n",
" except Exception as e:\n",
" return f\"[Exception] {str(e)}\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5f4e88c-be15-4870-9f99-82b6273ee739",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks(css=custom_css, theme=theme) as ui:\n",
" gr.Markdown(\"## Convert code from Python to C++\")\n",
" with gr.Row():\n",
" python = gr.Textbox(label=\"Python code:\", lines=10, value=python_hard, elem_classes=[\"scrollable-box\"])\n",
" cpp = gr.Textbox(label=\"C++ code:\", lines=10, elem_classes=[\"scrollable-box\"])\n",
" with gr.Row():\n",
" model = gr.Dropdown([\"Gemini\", \"Codestral\", \"QWEN_Coder\"], label=\"Select model\", value=\"Gemini\")\n",
" convert = gr.Button(\"Convert code\")\n",
" with gr.Row():\n",
" python_run = gr.Button(\"Run Python\")\n",
" cpp_run = gr.Button(\"Run C++\")\n",
" with gr.Row():\n",
" python_out = gr.TextArea(label=\"Python result:\", elem_classes=[\"python\"])\n",
" cpp_out = gr.TextArea(label=\"C++ result:\", elem_classes=[\"cpp\"])\n",
"\n",
" convert.click(optimize, inputs=[python,model], outputs=[cpp])\n",
" python_run.click(execute_python,inputs=[python], outputs=[python_out])\n",
" cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])\n",
"\n",
"ui.launch(inbrowser=True) "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa1a231e-2743-4cee-afe2-783d2b9513e5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,538 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "3e473bbd-a0c2-43bd-bf99-c749784d00c3",
"metadata": {},
"outputs": [],
"source": [
"import gradio as gr\n",
"import openai\n",
"import anthropic\n",
"import google.generativeai as genai\n",
"import requests\n",
"import json\n",
"import os\n",
"from typing import Dict, Any, Optional\n",
"import asyncio\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "16210512-41f1-4de3-8348-2cd7129e023f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load API\n",
"load_dotenv(override=True)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6747e275-91eb-4d2b-90b6-805f2bd9b6b7",
"metadata": {},
"outputs": [],
"source": [
"class CodeCommenter:\n",
" def __init__(self):\n",
" # Initialize API clients\n",
" self.openai_client = None\n",
" self.anthropic_client = None\n",
" self.gemini_client = None\n",
" \n",
" # Load API keys from environment variables\n",
" self.setup_clients()\n",
" \n",
" def setup_clients(self):\n",
" \"\"\"Initialize API clients with keys from environment variables\"\"\"\n",
" try:\n",
" # OpenAI\n",
" openai_key = os.getenv('OPENAI_API_KEY')\n",
" if openai_key:\n",
" self.openai_client = openai.OpenAI(api_key=openai_key)\n",
" \n",
" # Anthropic\n",
" anthropic_key = os.getenv('ANTHROPIC_API_KEY')\n",
" if anthropic_key:\n",
" self.anthropic_client = anthropic.Anthropic(api_key=anthropic_key)\n",
" \n",
" # Google Gemini\n",
" gemini_key = os.getenv('GOOGLE_API_KEY')\n",
" if gemini_key:\n",
" genai.configure(api_key=gemini_key)\n",
" self.gemini_client = genai.GenerativeModel('gemini-2.0-flash-exp')\n",
" \n",
" except Exception as e:\n",
" print(f\"Warning: Error setting up API clients: {e}\")\n",
" \n",
" def create_comments_prompt(self, code: str, language: str) -> str:\n",
" \"\"\"Create a prompt for the LLM to add comments and docstrings\"\"\"\n",
" return f\"\"\"Please add detailed and helpful comments and docstrings to the following {language} code. \n",
" \n",
"Guidelines:\n",
"1. Add comprehensive docstrings for functions, classes, and modules\n",
"2. Add inline comments explaining complex logic\n",
"3. Follow the commenting conventions for {language}\n",
"4. Maintain the original code structure and functionality\n",
"5. Make comments clear and professional\n",
"6. Don't change the actual code logic, only add comments\n",
"7. Do not add code markdown delimiters like ```python\n",
"\n",
"Here's the code to comment:\n",
"\n",
"{code}\n",
"\n",
"Please return only the commented code without any additional explanation or markdown formatting.\"\"\"\n",
"\n",
" def create_tests_prompt(self, code: str, language: str) -> str:\n",
" \"\"\"Create a prompt for the LLM to generate unit tests\"\"\"\n",
" return f\"\"\"Please generate comprehensive unit tests for the following {language} code.\n",
" \n",
"Guidelines:\n",
"1. Use appropriate testing framework for {language} (pytest for Python, JUnit for Java, etc.)\n",
"2. Create tests for all functions and methods\n",
"3. Include both positive and negative test cases\n",
"4. Test edge cases and error conditions\n",
"5. Use meaningful test names that describe what is being tested\n",
"6. Include setup and teardown methods if needed\n",
"7. Add mock objects for external dependencies (like database connections)\n",
"8. Do not add code markdown delimiters like ```python\n",
"9. Follow testing best practices for {language}\n",
"\n",
"Here's the code to test:\n",
"\n",
"{code}\n",
"\n",
"Please return only the unit test code without any additional explanation or markdown formatting.\"\"\"\n",
"\n",
" def create_combined_prompt(self, code: str, language: str) -> str:\n",
" \"\"\"Create a prompt for the LLM to add both comments and unit tests\"\"\"\n",
" return f\"\"\"Please add detailed comments and docstrings to the following {language} code AND generate comprehensive unit tests for it.\n",
" \n",
"For Comments:\n",
"1. Add comprehensive docstrings for functions, classes, and modules\n",
"2. Add inline comments explaining complex logic\n",
"3. Follow the commenting conventions for {language}\n",
"4. Don't change the actual code logic, only add comments\n",
"\n",
"For Unit Tests:\n",
"1. Use appropriate testing framework for {language} (pytest for Python, JUnit for Java, etc.)\n",
"2. Create tests for all functions and methods\n",
"3. Include both positive and negative test cases\n",
"4. Test edge cases and error conditions\n",
"5. Add mock objects for external dependencies (like database connections)\n",
"6. Follow testing best practices for {language}\n",
"\n",
"Structure your response as:\n",
"1. First, provide the original code with added comments and docstrings \n",
"2. Then, provide the unit tests as a separate section\n",
"3. Do not add code markdown delimiters like ```python\n",
"4. The 2 separated portions of code, comments and unit test should be clearly demarcated by comments specifying the following section purpose\n",
"\n",
"Here's the code:\n",
"\n",
"{code}\n",
"\n",
"Please return the commented code followed by the unit tests, clearly separated.\"\"\"\n",
"\n",
" def call_openai(self, prompt: str, model: str = \"gpt-4o-mini\") -> str:\n",
" \"\"\"Make API call to OpenAI\"\"\"\n",
" if not self.openai_client:\n",
" return \"Error: OpenAI API key not configured. Please set OPENAI_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.openai_client.chat.completions.create(\n",
" model=model,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful coding assistant that adds detailed comments, docstrings, and generates unit tests for code.\"},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" max_tokens=4000,\n",
" temperature=0.1\n",
" )\n",
" return response.choices[0].message.content.strip()\n",
" except Exception as e:\n",
" return f\"Error calling OpenAI API: {str(e)}\"\n",
" \n",
" def call_anthropic(self, prompt: str, model: str = \"claude-3-5-haiku-20241022\") -> str:\n",
" \"\"\"Make API call to Anthropic Claude\"\"\"\n",
" if not self.anthropic_client:\n",
" return \"Error: Anthropic API key not configured. Please set ANTHROPIC_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.anthropic_client.messages.create(\n",
" model=model,\n",
" max_tokens=4000,\n",
" temperature=0.1,\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" )\n",
" return response.content[0].text.strip()\n",
" except Exception as e:\n",
" return f\"Error calling Anthropic API: {str(e)}\"\n",
" \n",
" def call_gemini(self, prompt: str) -> str:\n",
" \"\"\"Make API call to Google Gemini\"\"\"\n",
" if not self.gemini_client:\n",
" return \"Error: Google API key not configured. Please set GOOGLE_API_KEY environment variable.\"\n",
" \n",
" try:\n",
" response = self.gemini_client.generate_content(\n",
" prompt,\n",
" generation_config=genai.types.GenerationConfig(\n",
" max_output_tokens=4000,\n",
" temperature=0.1,\n",
" )\n",
" )\n",
" return response.text.strip()\n",
" except Exception as e:\n",
" return f\"Error calling Gemini API: {str(e)}\"\n",
" \n",
" def call_ollama(self, prompt: str, model: str = \"llama3.2:latest\") -> str:\n",
" \"\"\"Make API call to Ollama (local)\"\"\"\n",
" try:\n",
" url = \"http://localhost:11434/api/generate\"\n",
" data = {\n",
" \"model\": model,\n",
" \"prompt\": prompt,\n",
" \"stream\": False,\n",
" \"options\": {\n",
" \"temperature\": 0.1,\n",
" \"num_predict\": 4000\n",
" }\n",
" }\n",
" \n",
" response = requests.post(url, json=data, timeout=60)\n",
" if response.status_code == 200:\n",
" result = response.json()\n",
" return result.get('response', '').strip()\n",
" else:\n",
" return f\"Error calling Ollama API: HTTP {response.status_code}\"\n",
" except requests.exceptions.ConnectionError:\n",
" return \"Error: Could not connect to Ollama. Make sure Ollama is running locally on port 11434.\"\n",
" except Exception as e:\n",
" return f\"Error calling Ollama API: {str(e)}\"\n",
"\n",
" def process_code(self, language: str, code: str, llm: str, generate_comments: bool, generate_tests: bool) -> str:\n",
" \"\"\"Process the given code based on selected options\"\"\"\n",
" if not code.strip():\n",
" return \"Error: Please provide code to process.\"\n",
" \n",
" if not generate_comments and not generate_tests:\n",
" return \"Error: Please select at least one option (Generate comments or Generate test units).\"\n",
" \n",
" # Determine which prompt to use\n",
" if generate_comments and generate_tests:\n",
" prompt = self.create_combined_prompt(code, language)\n",
" elif generate_comments:\n",
" prompt = self.create_comments_prompt(code, language)\n",
" else: # generate_tests only\n",
" prompt = self.create_tests_prompt(code, language)\n",
" \n",
" # Route to appropriate LLM\n",
" if llm == \"gpt-4o-mini\":\n",
" return self.call_openai(prompt, \"gpt-4o-mini\")\n",
" elif llm == \"claude-3-5-haiku-20241022\":\n",
" return self.call_anthropic(prompt, \"claude-3-5-haiku-20241022\")\n",
" elif llm == \"gemini-2.0-flash\":\n",
" return self.call_gemini(prompt)\n",
" elif llm == \"ollama:llama3.2:latest\":\n",
" return self.call_ollama(prompt, \"llama3.2:latest\")\n",
" else:\n",
" return f\"Error: Unsupported LLM: {llm}\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "813f0911-d53f-4887-9341-656712e32d8f",
"metadata": {},
"outputs": [],
"source": [
"def create_gradio_interface():\n",
" \"\"\"Create and configure the Gradio interface\"\"\"\n",
" commenter = CodeCommenter()\n",
" \n",
" # Define the main function for the interface\n",
" def process_code_interface(language, code, llm, generate_comments, generate_tests):\n",
" \"\"\"Process the code and return processed version based on selected options\"\"\"\n",
" if not code.strip():\n",
" return \"Please enter some code to process.\"\n",
" \n",
" if not generate_comments and not generate_tests:\n",
" return \"Please select at least one option: Generate comments or Generate test units.\"\n",
" \n",
" # Show processing message\n",
" options = []\n",
" if generate_comments:\n",
" options.append(\"comments\")\n",
" if generate_tests:\n",
" options.append(\"unit tests\")\n",
" \n",
" processing_msg = f\"Processing {language} code with {llm} to generate {' and '.join(options)}...\"\n",
" print(processing_msg)\n",
" \n",
" # Process the code\n",
" result = commenter.process_code(language, code, llm, generate_comments, generate_tests)\n",
" return result\n",
" \n",
" # Define default code\n",
" default_code = \"\"\"import pyodbc\n",
"from tabulate import tabulate\n",
"def connect_to_sql_server(server_name, database, username=None, password=None):\n",
" try:\n",
" if username and password:\n",
" connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};UID={username};PWD={password}\"\n",
" else:\n",
" connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};Trusted_Connection=yes\"\n",
" connection = pyodbc.connect(connection_string)\n",
" print(f\"Successfully connected to {server_name}/{database}\")\n",
" return connection\n",
" except Exception as e:\n",
" print(f\"Failed to connect to {server_name}/{database}: {str(e)}\")\n",
" return None\n",
"def get_record_count(connection, table_name):\n",
" try:\n",
" cursor = connection.cursor()\n",
" query = f\"SELECT COUNT(*) FROM {table_name}\"\n",
" cursor.execute(query)\n",
" count = cursor.fetchone()[0]\n",
" cursor.close()\n",
" print(f\"Record count for {table_name}: {count}\")\n",
" return count\n",
" except Exception as e:\n",
" print(f\"Failed to get record count for {table_name}: {str(e)}\")\n",
" return None\n",
"def select_top_records(connection, table_name, n):\n",
" try:\n",
" cursor = connection.cursor()\n",
" query = f\"SELECT TOP {n} * FROM {table_name}\"\n",
" cursor.execute(query)\n",
" records = cursor.fetchall()\n",
" columns = [column[0] for column in cursor.description]\n",
" cursor.close()\n",
" print(f\"Top {n} records from {table_name}\")\n",
" if records:\n",
" print(tabulate(records, headers=columns, tablefmt=\"grid\"))\n",
" return records\n",
" except Exception as e:\n",
" print(f\"Failed to retrieve top {n} records from {table_name}: {str(e)}\")\n",
" return None\n",
"conn = connect_to_sql_server(\"localhost\", \"AdventureWorks_lite\")\n",
"if conn:\n",
" total_records = get_record_count(conn, \"Sales.SalesOrderDetail\")\n",
" top_records = select_top_records(conn, \"Production.Product\", 10)\n",
" conn.close()\n",
" print(\"Connection closed successfully\")\"\"\"\n",
"\n",
" css = \"\"\"\n",
"textarea[rows]:not([rows=\"1\"]) {\n",
" overflow-y: auto !important;\n",
" scrollbar-width: thin !important;\n",
"}\n",
"textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar {\n",
" all: initial !important;\n",
" background: #f1f1f1 !important;\n",
"}\n",
"textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar-thumb {\n",
" all: initial !important;\n",
" background: #a8a8a8 !important;\n",
"}\n",
"\"\"\"\n",
"\n",
" # Create the interface\n",
" with gr.Blocks(title=\"Code Commenter & Test Generator\", theme=gr.themes.Base(), css=css) as interface:\n",
" gr.Markdown(\"# 🔧 Code Commenter & Test Generator\")\n",
" gr.Markdown(\"Add detailed comments, docstrings, and/or generate unit tests for your code using various LLM models.\")\n",
" \n",
" with gr.Row():\n",
" with gr.Column():\n",
" code_input = gr.Textbox(\n",
" label=\"Input Code\",\n",
" value=default_code,\n",
" lines=15,\n",
" max_lines=20,\n",
" info=\"Enter the code you want to process\"\n",
" )\n",
" \n",
" with gr.Column():\n",
" code_output = gr.Textbox(\n",
" label=\"Processed Code\",\n",
" lines=20,\n",
" max_lines=20,\n",
" info=\"Your code with added comments, docstrings, and/or unit tests\"\n",
" )\n",
" \n",
" # Add checkboxes below the textboxes\n",
" with gr.Row():\n",
" with gr.Column():\n",
" generate_comments_checkbox = gr.Checkbox(\n",
" label=\"Generate comments\",\n",
" value=True,\n",
" info=\"Add detailed comments and docstrings to the code\"\n",
" )\n",
" generate_tests_checkbox = gr.Checkbox(\n",
" label=\"Generate test units\",\n",
" value=False,\n",
" info=\"Generate comprehensive unit tests for the code\"\n",
" )\n",
" \n",
" with gr.Row():\n",
" with gr.Column(scale=1):\n",
" language_dropdown = gr.Dropdown(\n",
" choices=[\"Python\", \"Ruby\", \"Rust\", \"C++\", \"Java\"],\n",
" value=\"Python\",\n",
" label=\"Programming Language\",\n",
" info=\"Select the programming language of your code\"\n",
" )\n",
" \n",
" llm_dropdown = gr.Dropdown(\n",
" choices=[\n",
" \"gpt-4o-mini\",\n",
" \"claude-3-5-haiku-20241022\", \n",
" \"gemini-2.0-flash\",\n",
" \"ollama:llama3.2:latest\"\n",
" ],\n",
" value=\"gpt-4o-mini\",\n",
" label=\"LLM Model\",\n",
" info=\"Choose the language model to use\"\n",
" )\n",
" \n",
" generate_btn = gr.Button(\n",
" \"🚀 Process Code\", \n",
" variant=\"primary\",\n",
" size=\"lg\"\n",
" )\n",
" \n",
" # Add some API setup information\n",
" gr.Markdown(\"## 📝 API Setup Instructions\")\n",
" gr.Markdown(\"\"\"\n",
" To use this tool, you need to set up API keys as environment variables:\n",
" \n",
" - **OpenAI**: Set `OPENAI_API_KEY`\n",
" - **Anthropic**: Set `ANTHROPIC_API_KEY` \n",
" - **Google Gemini**: Set `GOOGLE_API_KEY`\n",
" - **Ollama**: Make sure Ollama is running locally on port 11434\n",
" \"\"\")\n",
" \n",
" gr.Markdown(\"## ✨ Features\")\n",
" gr.Markdown(\"\"\"\n",
" - **Generate Comments**: Add detailed docstrings and inline comments\n",
" - **Generate Unit Tests**: Create comprehensive test suites with mocking for external dependencies\n",
" - **Combined Mode**: Generate both comments and unit tests in one go\n",
" - **Multiple LLMs**: Choose from OpenAI, Anthropic, Google Gemini, or local Ollama models\n",
" - **Multiple Languages**: Support for Python, Ruby, Rust, C++, and Java\n",
" \"\"\")\n",
" \n",
" # Connect the button to the processing function\n",
" generate_btn.click(\n",
" fn=process_code_interface,\n",
" inputs=[language_dropdown, code_input, llm_dropdown, generate_comments_checkbox, generate_tests_checkbox],\n",
" outputs=code_output,\n",
" show_progress=True\n",
" )\n",
" \n",
" return interface"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ef461e08-c1d5-406d-b7d2-a4329f16486e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🚀 Starting Code Commenter & Test Generator...\n",
"📋 Setting up Gradio interface...\n",
"🌐 Launching interface...\n",
"💡 The interface will open in your default browser\n",
"🔧 Make sure to set up your API keys as environment variables\n",
"* Running on local URL: http://127.0.0.1:7860\n",
"\n",
"To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7860/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"🚀 Starting Code Commenter & Test Generator...\")\n",
"print(\"📋 Setting up Gradio interface...\")\n",
"\n",
"# Create and launch the interface\n",
"interface = create_gradio_interface()\n",
"\n",
"print(\"🌐 Launching interface...\")\n",
"print(\"💡 The interface will open in your default browser\")\n",
"print(\"🔧 Make sure to set up your API keys as environment variables\")\n",
"\n",
"# Launch with auto-opening in browser\n",
"interface.launch(\n",
" server_name=\"127.0.0.1\",\n",
" server_port=7860,\n",
" share=False,\n",
" inbrowser=True,\n",
" show_error=True\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,335 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "07bb451d-2b91-425f-b8ea-6f35ced780b0",
"metadata": {},
"source": [
"# AI Code Commenting Assistant \n",
"\n",
"## Project Summary \n",
"\n",
"**Purpose**: \n",
"An AI-powered assistant that automatically generates **clear, concise code comments** to improve code readability and maintainability. \n",
"\n",
"**Key Features**: \n",
"- **Language-Agnostic**: Auto-detects programming languages or allows manual specification \n",
"- **Smart Commenting**: Focuses on explaining **complex logic, algorithms, and edge cases** (not obvious syntax) \n",
"- **Customizable**: Optional focus areas let users prioritize specific parts (e.g., database queries, recursion) \n",
"- **Efficient Workflow**: Processes code in chunks and preserves original formatting \n",
"\n",
"**Benefits**: \n",
"✔ Saves time writing documentation \n",
"✔ Helps developers understand unfamiliar code \n",
"✔ Supports multiple languages (Python, JavaScript, C++, SQL, etc.) \n",
"✔ Avoids redundant comments on trivial operations \n",
"\n",
"**Example Use Case**: \n",
"```python \n",
"# Before AI: \n",
"def fib(n): \n",
" if n <= 1: return n \n",
" else: return fib(n-1) + fib(n-2) \n",
"\n",
"# After AI: \n",
"def fib(n): \n",
" # Recursively computes nth Fibonacci number (O(2^n) time) \n",
" if n <= 1: return n # Base case \n",
"    else: return fib(n-1) + fib(n-2)  # Recursive case\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0413ae1-0348-4884-ba95-384c4c8f841c",
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b22da766-042b-402f-9e05-78aa8f45ddd4",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import io\n",
"from dotenv import load_dotenv\n",
"from google import genai\n",
"from google.genai import types\n",
"from openai import OpenAI\n",
"from anthropic import Anthropic\n",
"from huggingface_hub import InferenceClient\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5af6d3de-bab6-475e-b2f3-7b788bb2e529",
"metadata": {},
"outputs": [],
"source": [
"# load environments\n",
"load_dotenv(override=True)\n",
"os.environ['ANTHROPIC_API_KEY'] = os.getenv(\"CLAUDE_API_KEY\")\n",
"os.environ[\"HF_TOKEN\"] = os.getenv(\"HF_TOKEN\")\n",
"gemini_api_key= os.getenv(\"GEMINI_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cad0755e-4174-4fbc-84e6-15cc54bc609a",
"metadata": {},
"outputs": [],
"source": [
"#initialize remote models\n",
"claude= Anthropic()\n",
"gemini = genai.Client(api_key=gemini_api_key)\n",
"\n",
"#opensource models\n",
"qwen = InferenceClient(provider=\"featherless-ai\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31d75812-1cd3-4512-8446-022c3357c354",
"metadata": {},
"outputs": [],
"source": [
"#initialize local model\n",
"llama = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31316379-2a56-4707-b207-ea60b490f536",
"metadata": {},
"outputs": [],
"source": [
"#models\n",
"claude_model = \"claude-3-5-haiku-latest\"\n",
"gemini_model = \"gemini-2.5-pro\"\n",
"qwen_model= \"Qwen/Qwen2.5-Coder-32B-Instruct\"\n",
"llama_model = \"llama3:8b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d9c4bf-0955-4406-8717-ffa7bdd0bec9",
"metadata": {},
"outputs": [],
"source": [
"system_message=\"\"\"\n",
"You are an expert AI specialized in code documentation. Your task is to generate concise, meaningful comments that explain the purpose and logic of provided code. Follow these rules:\n",
"\n",
"1. **Infer language**: Auto-detect programming language and use appropriate comment syntax\n",
"2. **Explain why, not what**: Focus on purpose, edge cases, and non-obvious logic\n",
"3. **Be concise**: Maximum 1-2 sentences per comment block\n",
"4. **Prioritize key sections**: Only comment complex logic, algorithms, or critical operations\n",
"5. **Maintain structure**: Preserve original code formatting and indentation\n",
"6. **Output format**: Return ONLY commented code with no additional text\n",
"\n",
"Commenting guidelines by language:\n",
"- Python: `# Inline comments` and `\"\"\"Docstrings\"\"\"`\n",
"- JavaScript/Java: `// Line comments` and `/* Block comments */`\n",
"- C/C++: `//` and `/* */`\n",
"- SQL: `-- Line comments`\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "79dfe110-1523-40c7-ad90-2787ed22fd8d",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt(code):\n",
" prompt = f\"\"\"\n",
"    I want to document my code for better understanding. Please generate meaningful, necessary comments.\n",
"    Here is my code:\n",
" {code}\n",
"\n",
" Return ONLY commented code with no additional text\n",
" \"\"\"\n",
"\n",
" return prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7bcf29e-ec78-4cfd-9b41-f2dc86400435",
"metadata": {},
"outputs": [],
"source": [
"def conversation_template(code):\n",
" messages = [\n",
" {\"role\":\"system\", \"content\":system_message},\n",
" {\"role\":\"user\",\"content\":user_prompt(code)}\n",
" ]\n",
" return messages"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a36fec0f-7eba-4ccd-8fc4-cbf5ade76fa2",
"metadata": {},
"outputs": [],
"source": [
"def stream_gemini(code):\n",
" message = user_prompt(code)\n",
" response = gemini.models.generate_content_stream(\n",
" model=gemini_model,\n",
" config= types.GenerateContentConfig(\n",
" system_instruction = system_message,\n",
" temperature = 0.8,\n",
" ),\n",
" contents = [message]\n",
" )\n",
"\n",
" result = \"\"\n",
" for chunk in response:\n",
" result += chunk.text or \"\"\n",
" yield result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5d1e0c0-dc88-43ee-8698-82ad9ce7c51b",
"metadata": {},
"outputs": [],
"source": [
"def stream_claude(code):\n",
" messages = [{\"role\":\"user\",\"content\":user_prompt(code)}]\n",
" response = claude.messages.stream(\n",
" model= claude_model,\n",
" temperature=0.8,\n",
" messages = messages,\n",
" max_tokens=5000\n",
" )\n",
"\n",
" result = \"\"\n",
" with response as stream:\n",
" for text in stream.text_stream:\n",
" result += text or \"\"\n",
" yield result\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "903c97e5-9170-449e-8a0f-9f906351ec45",
"metadata": {},
"outputs": [],
"source": [
"def stream_opensource(code, model):\n",
"    model = model.lower()\n",
"    # Look up the client object and model name by variable name (e.g. 'qwen' -> qwen, qwen_model)\n",
"    client = globals()[model]\n",
"    model = globals()[f\"{model}_model\"]\n",
" stream = client.chat.completions.create(\n",
" model = model,\n",
" messages= conversation_template(code),\n",
" temperature = 0.7,\n",
" stream = True\n",
" )\n",
"\n",
" result = \"\"\n",
" for chunk in stream:\n",
" result += chunk.choices[0].delta.content or \"\"\n",
" yield result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff051c22-a2f8-4153-b970-f8a466a4cf5a",
"metadata": {},
"outputs": [],
"source": [
"def commentor(code, model):\n",
"    model = model.lower()\n",
"    if model == \"claude\":\n",
"        result = stream_claude(code)\n",
"    elif model == \"gemini\":\n",
"        result = stream_gemini(code)\n",
"    elif model in (\"qwen\", \"llama\"):\n",
"        result = stream_opensource(code, model)\n",
"    else:\n",
"        raise ValueError(f\"Unknown model: {model}\")\n",
"\n",
"    # Strip any markdown code fences the model wraps around its output\n",
"    for chunk in result:\n",
"        yield chunk.replace(\"```cpp\\n\",\"\").replace(\"```python\\n\",\"\").replace(\"```javascript\\n\",\"\").replace(\"```typescript\\n\",\"\").replace(\"```\",\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10daf070-3546-4073-a2a0-3f5f8fc156f0",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks() as ui:\n",
"    gr.Markdown(\"# Generate Comment\")\n",
" with gr.Row():\n",
" raw_code = gr.Textbox(label=\"Raw Code:\", lines=10)\n",
"        commented_code = gr.Textbox(label=\"Commented Code\", lines=10)\n",
" with gr.Row():\n",
"        models = gr.Dropdown([\"Gemini\",\"Claude\",\"Llama\",\"Qwen\"], label=\"Model\", value=\"Gemini\")\n",
" with gr.Row():\n",
" generate_comment = gr.Button(\"Generate Comment\")\n",
"\n",
" generate_comment.click(commentor, inputs=[raw_code, models], outputs=[commented_code])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "afb87f32-f25e-40c5-844a-d2b7af748192",
"metadata": {},
"outputs": [],
"source": [
"ui.launch(inbrowser=True,debug=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96bc48ad-10ad-4821-b58e-ea1b22cdcdc9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
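The `commentor` generator in the notebook above strips Markdown code fences with chained `str.replace` calls covering four hard-coded language tags. A single regex handles any tag in one pass. This is a minimal sketch, assuming model output fenced as shown in the notebook; the helper name `strip_code_fences` is hypothetical:

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove Markdown code fences, keeping the code inside."""
    # Matches an opening fence with an optional language tag
    # (python, cpp, ...) or a bare closing fence; the fence
    # line's trailing newline is dropped as well.
    return re.sub(r"```[a-zA-Z]*\n?", "", text)

print(strip_code_fences("```python\nprint('hi')\n```"))  # → print('hi')
```

The replace-chain in the notebook only handles its four listed languages; the regex also catches any other language tag without enumerating each one.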

View File

@@ -0,0 +1,300 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "45ca91c2",
"metadata": {},
"source": [
"# AI tool to add comments to the provided Java code\n",
"\n",
"Here we build a Gradio app that uses frontier models to add comments to Java code. For testing purposes I have used the *cheaper* versions of the models, not the ones the leaderboards rank as the best."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f44901f5",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI\n",
"import google.generativeai as genai\n",
"import anthropic\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c47706b3",
"metadata": {},
"outputs": [],
"source": [
"# environment\n",
"\n",
"load_dotenv(override=True)\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35446b9a",
"metadata": {},
"outputs": [],
"source": [
"openai = OpenAI()\n",
"claude = anthropic.Anthropic()\n",
"genai.configure()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e899efd",
"metadata": {},
"outputs": [],
"source": [
"OPENAI_MODEL = \"gpt-4o-mini\"\n",
"CLAUDE_MODEL = \"claude-3-haiku-20240307\"\n",
"GEMINI_MODEL = 'gemini-2.0-flash-lite'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47640f53",
"metadata": {},
"outputs": [],
"source": [
"system_message = \"You are an assistant that adds comments to Java code. \"\n",
"system_message += \"Do not make any changes to the code itself. \"\n",
"system_message += \"Use comments sparingly. Only add them in places where they help to understand how the code works. Do not comment every single line of the code.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f41ccbf0",
"metadata": {},
"outputs": [],
"source": [
"def user_prompt_for(code):\n",
"    user_prompt = \"Add helpful comments to this Java code. \"\n",
" user_prompt += \"Do not change the code itself.\\n\\n\"\n",
" user_prompt += code\n",
" return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c57c0000",
"metadata": {},
"outputs": [],
"source": [
"test_code = \"\"\"\n",
"package com.hma.kafkaproducertest.producer;\n",
"\n",
"import com.hma.kafkaproducertest.model.TestDTO;\n",
"import org.springframework.cloud.stream.function.StreamBridge;\n",
"import org.springframework.messaging.Message;\n",
"import org.springframework.messaging.support.MessageBuilder;\n",
"import org.springframework.stereotype.Component;\n",
"\n",
"import java.util.Arrays;\n",
"import java.util.Comparator;\n",
"import java.util.StringJoiner;\n",
"import java.util.stream.Collectors;\n",
"import java.util.stream.IntStream;\n",
"\n",
"@Component\n",
"public class TestProducer {\n",
"\n",
" public static final String EVENT_TYPE_HEADER = \"event-type\";\n",
" private static final String BINDING_NAME = \"testProducer-out-0\";\n",
"\n",
" private final StreamBridge streamBridge;\n",
"\n",
" public TestProducer(StreamBridge streamBridge) {\n",
" this.streamBridge = streamBridge;\n",
" }\n",
"\n",
" public void sendMessage(TestDTO payload, String eventType){\n",
" Message<TestDTO> message = MessageBuilder\n",
" .withPayload(payload)\n",
" .setHeader(EVENT_TYPE_HEADER, eventType)\n",
" .build();\n",
"\n",
" streamBridge.send(BINDING_NAME, message);\n",
" }\n",
"\n",
" public void test(String t1, String t2) {\n",
" var s = t1.length() > t2.length() ? t2 : t1;\n",
" var l = t1.length() > t2.length() ? t1 : t2;\n",
" var res = true;\n",
" for (int i = 0; i < s.length(); i++) {\n",
" if (s.charAt(i) == l.charAt(i)) {\n",
" res = false;\n",
" break;\n",
" }\n",
" }\n",
" System.out.println(res);\n",
" }\n",
"}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00c71128",
"metadata": {},
"outputs": [],
"source": [
"def stream_gpt(code):\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_prompt_for(code)}\n",
" ]\n",
" stream = openai.chat.completions.create(\n",
" model=OPENAI_MODEL,\n",
" messages=messages,\n",
" stream=True\n",
" )\n",
" result = \"\"\n",
" for chunk in stream:\n",
" result += chunk.choices[0].delta.content or \"\"\n",
" yield result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca92f8a8",
"metadata": {},
"outputs": [],
"source": [
"def stream_claude(code):\n",
" result = claude.messages.stream(\n",
" model=CLAUDE_MODEL,\n",
" max_tokens=2000,\n",
" system=system_message,\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": user_prompt_for(code)},\n",
" ],\n",
" )\n",
" response = \"\"\n",
" with result as stream:\n",
" for text in stream.text_stream:\n",
" response += text or \"\"\n",
" yield response"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dffed4b",
"metadata": {},
"outputs": [],
"source": [
"def stream_gemini(code):\n",
" gemini = genai.GenerativeModel(\n",
" model_name=GEMINI_MODEL,\n",
" system_instruction=system_message\n",
" )\n",
" stream = gemini.generate_content(user_prompt_for(code), stream=True)\n",
" result = \"\"\n",
" for chunk in stream:\n",
" result += chunk.text or \"\"\n",
" yield result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31f9c267",
"metadata": {},
"outputs": [],
"source": [
"def comment_code(code, model):\n",
" if model==\"GPT\":\n",
" result = stream_gpt(code)\n",
" elif model==\"Claude\":\n",
" result = stream_claude(code)\n",
" elif model==\"Gemini\":\n",
" result = stream_gemini(code)\n",
" else:\n",
" raise ValueError(\"Unknown model\")\n",
" yield from result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c04c0a1b",
"metadata": {},
"outputs": [],
"source": [
"with gr.Blocks() as ui:\n",
" with gr.Row():\n",
" original_code = gr.Textbox(label=\"Java code:\", lines=10, value=test_code)\n",
" commented_code = gr.Markdown(label=\"Commented code:\")\n",
" with gr.Row():\n",
" model = gr.Dropdown([\"GPT\", \"Claude\", \"Gemini\"], label=\"Select model\", value=\"GPT\")\n",
" comment = gr.Button(\"Comment code\")\n",
"\n",
" comment.click(comment_code, inputs=[original_code, model], outputs=[commented_code])\n",
"\n",
"ui.launch(inbrowser=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84d33a5f",
"metadata": {},
"outputs": [],
"source": [
"ui.close()"
]
},
{
"cell_type": "markdown",
"id": "bbd50bf7",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In my personal opinion, at least when using these *cheaper* versions of the models, the result provided by Claude is the best. ChatGPT adds way too many comments even though the system message discourages this. Gemini also provides a good result, but maybe adds a tad too few comments -- although that certainly depends on your personal preferences."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llms",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
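Both notebooks select a streaming function from a model-name string with if/elif chains. A dictionary dispatch keeps the mapping in one place and fails loudly on unknown names. The sketch below uses stand-in generators as placeholders, not the notebooks' real streaming functions:

```python
def fake_claude(code):
    # Stand-in for a real streaming generator
    yield f"claude:{code}"

def fake_gemini(code):
    yield f"gemini:{code}"

# One explicit mapping from model name to streamer
STREAMERS = {"claude": fake_claude, "gemini": fake_gemini}

def commentor(code, model):
    try:
        streamer = STREAMERS[model.lower()]
    except KeyError:
        raise ValueError(f"Unknown model: {model}") from None
    yield from streamer(code)

print(next(commentor("x = 1", "Claude")))  # → claude:x = 1
```

Compared with the `globals()` lookup in `stream_opensource`, a dict makes the supported names greppable and turns a typo into a clear `ValueError` rather than an opaque `KeyError`.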
