diff --git a/SETUP-PC.md b/SETUP-PC.md index d2b2759..2ca8bbd 100644 --- a/SETUP-PC.md +++ b/SETUP-PC.md @@ -13,10 +13,12 @@ I use a platform called Anaconda to set up your environment. It's a powerful too Having said that: if you have any problems with Anaconda, I've provided an alternative approach. It's faster and simpler and should have you running quickly, with less of a guarantee around compatibility. -### Before we begin - Heads up! Please do check these Windows "gotchas": - If you are relatively new to using the Command Prompt, here is an excellent [guide](https://chatgpt.com/share/67b0acea-ba38-8012-9c34-7a2541052665) with instructions and exercises. I'd suggest you work through this first to build some confidence. +## HEADS UP - "GOTCHA" ISSUES ON A PC: The following 4 Windows issues will need your attention, particularly #3 and #4 + +Please do take a look at these issues. Issue #3 (Windows 260 character limit) will cause an "Archive Error" when installing PyTorch if left unaddressed. Issue #4 will cause an installation issue. + There are 4 common gotchas to developing on Windows to be aware of: 1. Permissions. Please take a look at this [tutorial](https://chatgpt.com/share/67b0ae58-d1a8-8012-82ca-74762b0408b0) on permissions on Windows @@ -92,7 +94,7 @@ Press Win + R, type `cmd`, and press Enter Run `python --version` to find out which python you're on. Ideally you'd be using a version of Python 3.11, so we're completely in sync. -I believe Python 3.12 works also, but (as of Feb 2025) Python 3.13 does **not** yet work as several Data Science dependencies are not yet ready for Python 3.13. +I believe Python 3.12 works also, but (as of June 2025) Python 3.13 does **not** yet work as several Data Science dependencies are not yet ready for Python 3.13. 
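As a quick cross-check of the version guidance above, a short script along these lines can be run with the interpreter you plan to use. This is only a sketch: the helper name `check_python_version` and its verdict strings are illustrative and not part of the repository. The version bounds simply restate the note above (3.11 preferred, 3.12 believed to work, 3.13 not yet supported as of June 2025).

```python
import sys


def check_python_version(version_info=None):
    """Return a short verdict for a (major, minor) Python version.

    Hypothetical helper for illustration; it is not part of SETUP-PC.md.
    """
    major, minor = tuple(version_info or sys.version_info)[:2]
    if (major, minor) == (3, 11):
        return "recommended"
    if (major, minor) == (3, 12):
        return "expected to work"
    if (major, minor) >= (3, 13):
        return "not yet supported"
    # Older versions are not discussed in the setup notes.
    return "untested"


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}: {check_python_version()}")
```

Running it with the same `python` that `python --version` reports gives a one-line verdict before any packages are installed.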
If you need to install Python or install another version, you can download it here: https://www.python.org/downloads/ diff --git a/community-contributions/WebScraperApp/README.md b/community-contributions/WebScraperApp/README.md new file mode 100644 index 0000000..6dfed7b --- /dev/null +++ b/community-contributions/WebScraperApp/README.md @@ -0,0 +1,159 @@ +# Web Scraper & Data Analyzer + +A modern Python application with a sleek PyQt5 GUI for web scraping, data analysis, visualization, and AI-powered website insights. Features a clean, minimalistic design with real-time progress tracking, comprehensive data filtering, and an integrated AI chat assistant for advanced analysis. + +## Features + +- **Modern UI**: Clean, minimalistic design with dark theme and smooth animations +- **Web Scraping**: Multi-threaded scraping with configurable depth (max 100 levels) +- **Data Visualization**: Interactive table with sorting and filtering capabilities +- **Content Preview**: Dual preview system with both text and visual HTML rendering +- **Data Analysis**: Comprehensive statistics and domain breakdown +- **AI-Powered Analysis**: Chat-based assistant for website insights, SEO suggestions, and content analysis +- **Export Functionality**: JSON export with full metadata +- **URL Normalization**: Handles www/non-www domains intelligently +- **Real-time Progress**: Live progress updates during scraping operations +- **Loop Prevention**: Advanced duplicate detection to prevent infinite loops +- **Smart Limits**: Configurable limits to prevent runaway scraping + +## AI Analysis Tab + +The application features an advanced **AI Analysis** tab: + +- **Conversational Chat UI**: Ask questions about your scraped websites in a modern chat interface (like ChatGPT) +- **Quick Actions**: One-click questions for structure, SEO, content themes, and performance +- **Markdown Responses**: AI replies are formatted for clarity and readability +- **Context Awareness**: AI uses your scraped data 
for tailored insights +- **Requirements**: Internet connection and the `openai` Python package (see Installation) +- **Fallback**: If `openai` is not installed, a placeholder response is shown + +## Loop Prevention & Duplicate Detection + +The scraper includes robust protection against infinite loops and circular references: + +### 🔄 URL Normalization +- Removes `www.` prefixes for consistent domain handling +- Strips URL fragments (`#section`) to prevent duplicate content +- Removes trailing slashes for consistency +- Keeps query parameters as they are (they are not rewritten) + +### 🚫 Duplicate Detection +- **Visited URL Tracking**: Maintains a set of all visited URLs +- **Unlimited Crawling**: No page limits per domain or total pages +- **Per-Page Duplicate Filtering**: Removes duplicate links within the same page + +### 🛡️ Smart Restrictions +- **Depth-Only Limit**: Crawls as deep as the specified `max_depth` allows; there is no other cap +- **Content Type Filtering**: Only scrapes HTML content +- **File Type Filtering**: Skips non-content files (PDFs, images, etc.) +- **Consecutive Empty Level Detection**: Stops if 3 consecutive levels have no new content + +### 📊 Enhanced Tracking +- **Domain Page Counts**: Tracks pages scraped per domain (for statistics) +- **URL Check Counts**: Shows total URLs checked vs. pages scraped +- **Detailed Statistics**: Comprehensive reporting on scraping efficiency +- **Unlimited Processing**: No artificial limits on crawling scope + +## Installation + +1. **Clone or download the project files** + +2. **Install dependencies**: + ```bash + pip install -r requirements.txt + ``` + - This will install all required packages, including `PyQt5`, `PyQtWebEngine` (for visual preview), and `openai` (for AI features). + +3. **Run the application**: + ```bash + python web_scraper_app.py + ``` + +## Usage + +### 1. Scraping Configuration +- Enter a starting URL (with or without http/https) +- Set maximum crawl depth (1-100) +- Click "Start Scraping" to begin + +### 2. 
Data View & Filtering +- View scraped data in an interactive table +- Filter by search terms or specific domains +- Double-click any row to preview content +- Export data to JSON format + +### 3. Analysis & Statistics +- View comprehensive scraping statistics +- See domain breakdown and word counts +- Preview content in both text and visual formats +- Analyze load times and link counts +- Monitor duplicate detection efficiency + +### 4. AI Analysis (New!) +- Switch to the **AI Analysis** tab +- Type your question or use quick action buttons (e.g., "Analyze the website structure", "Suggest SEO improvements") +- The AI will analyze your scraped data and provide actionable insights +- Requires an internet connection and the `openai` package + +## Visual Preview Feature + +The application includes a visual HTML preview feature that renders scraped web pages in a browser-like view: + +- **Requirements**: PyQtWebEngine (automatically installed with requirements.txt) +- **Functionality**: Displays HTML content with proper styling and formatting +- **Fallback**: If PyQtWebEngine is not available, shows a text-only preview +- **Error Handling**: Graceful error messages for invalid HTML content + +## Technical Details + +- **Backend**: Pure Python with urllib and html.parser (no compilation required) +- **Frontend**: PyQt5 with custom modern styling +- **Threading**: Multi-threaded scraping for better performance +- **Data Storage**: Website objects with full metadata +- **URL Handling**: Intelligent normalization and domain filtering +- **Loop Prevention**: Multi-layered duplicate detection system +- **AI Integration**: Uses OpenAI API (via openrouter) for chat-based analysis + +## File Structure + +``` +Testing/ +├── web_scraper_app.py # Main application (with AI and GUI) +├── module.py # Core scraping logic +├── test.py # Basic functionality tests +├── requirements.txt # Dependencies +└── README.md # This file +``` + +## Troubleshooting + +### Visual Preview Not Working 
+1. Ensure PyQtWebEngine is installed: `pip install PyQtWebEngine` +2. Check console output for import errors + +### AI Analysis Not Working +1. Ensure the `openai` package is installed: `pip install openai` +2. Check your internet connection (AI requires online access) +3. If the `openai` package is not installed, the AI tab shows a placeholder response + +### Scraping Issues +1. Verify internet connection +2. Check URL format (add https:// if needed) +3. Try with a lower depth setting +4. Check console for error messages + +### Loop Prevention +1. The scraper automatically prevents infinite loops +2. Check the analysis tab for detailed statistics +3. Monitor "Total URLs Checked" vs "Total Pages" for efficiency +4. Use lower depth settings for sites with many internal links + +### Performance +- Use lower depth settings for faster scraping +- Filter data to focus on specific domains +- Close other applications to free up resources +- Monitor domain page counts to spot domains that dominate the crawl (there are no page limits) + +## License + +This project is open source and available under the MIT License. 
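The URL normalization rules listed under "Loop Prevention & Duplicate Detection" can be illustrated with a small standalone function. This is a simplified sketch under stated assumptions, not the code in `module.py`: the name `normalize_url_sketch` is made up for the example, and it covers only fragment removal, trailing slashes, and the `www.` prefix, leaving query strings untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url_sketch(url):
    """Illustrative normalizer: drop the fragment, trailing slash, and www. prefix."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    # Strip trailing slashes from the path only, so the query string is left alone.
    path = parts.path.rstrip("/")
    # The empty final element discards the fragment (#section).
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    visited = set()
    for candidate in [
        "https://www.example.com/page/",
        "https://example.com/page#top",
        "https://example.com/page",
    ]:
        visited.add(normalize_url_sketch(candidate))
    # All three candidates collapse to one entry in the visited set.
    print(visited)
```

Splitting the URL before editing it keeps each rule confined to the component it is meant for, which is the design choice the sketch is meant to show.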
\ No newline at end of file diff --git a/community-contributions/WebScraperApp/module.py b/community-contributions/WebScraperApp/module.py new file mode 100644 index 0000000..20dff0f --- /dev/null +++ b/community-contributions/WebScraperApp/module.py @@ -0,0 +1,473 @@ +import urllib.request +import urllib.parse +import urllib.error +import html.parser +import re +from datetime import datetime +import time +import ssl +from urllib.parse import urljoin, urlparse +from concurrent.futures import ThreadPoolExecutor, as_completed +import threading +from functools import partial + +class HTMLParser(html.parser.HTMLParser): + """Custom HTML parser to extract title, links, and text content""" + + def __init__(self): + super().__init__() + self.title = "" + self.links = [] + self.text_content = [] + self.in_title = False + self.in_body = False + self.current_tag = "" + + def handle_starttag(self, tag, attrs): + self.current_tag = tag.lower() + + if tag.lower() == 'title': + self.in_title = True + elif tag.lower() == 'body': + self.in_body = True + elif tag.lower() == 'a': + # Extract href attribute + for attr, value in attrs: + if attr.lower() == 'href' and value: + self.links.append(value) + + def handle_endtag(self, tag): + if tag.lower() == 'title': + self.in_title = False + elif tag.lower() == 'body': + self.in_body = False + + def handle_data(self, data): + if self.in_title: + self.title += data + elif self.in_body and self.current_tag in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'span', 'li']: + # Clean the text data + cleaned_data = re.sub(r'\s+', ' ', data.strip()) + if cleaned_data: + self.text_content.append(cleaned_data) + + def get_text(self): + """Return all extracted text content as a single string""" + return ' '.join(self.text_content) + + def get_clean_text(self, max_length=500): + """Return cleaned text content with length limit""" + text = self.get_text() + # Remove extra whitespace and limit length + text = re.sub(r'\s+', ' ', text.strip()) + if 
len(text) > max_length: + text = text[:max_length] + "..." + return text + +class Website: + """Class to store website data""" + + def __init__(self, title, url, content, depth, links=None, load_time=None): + self.title = title or "No Title" + self.url = url + self.content = content + self.depth = depth + self.links = links or [] + self.load_time = load_time + self.timestamp = datetime.now() + + def get_word_count(self): + """Get word count from content""" + if not self.content: + return 0 + # Extract text content and count words + text_content = re.sub(r'<[^>]+>', '', self.content) + words = text_content.split() + return len(words) + + def get_domain(self): + """Extract domain from URL""" + try: + parsed = urlparse(self.url) + return parsed.netloc + except: + return "" + + def get_normalized_domain(self): + """Get domain without www prefix for consistent filtering""" + domain = self.get_domain() + if domain.startswith('www.'): + return domain[4:] + return domain + + def search_content(self, query): + """Search for query in content""" + if not self.content or not query: + return False + return query.lower() in self.content.lower() + + def get_text_preview(self, max_length=200): + """Get a text preview of the content""" + if not self.content: + return "No content available" + + # Extract text content + text_content = re.sub(r'<[^>]+>', '', self.content) + text_content = re.sub(r'\s+', ' ', text_content.strip()) + + if len(text_content) > max_length: + return text_content[:max_length] + "..." 
+ return text_content + +class WebScraper: + """Web scraper with multithreading support and robust duplicate detection""" + + def __init__(self): + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() # Track visited domains + self.start_domain = None # Store the starting domain + self.lock = threading.Lock() + self.max_workers = 10 # Number of concurrent threads + # Removed all page limits - unlimited crawling + self.domain_page_counts = {} # Track page count per domain (for statistics only) + self._stop_requested = False # Flag to stop scraping + + def normalize_url(self, url): + """Normalize URL to handle www prefixes and remove fragments""" + if not url: + return url + + # Remove fragments (#) to prevent duplicate content + if '#' in url: + url = url.split('#')[0] + + # Remove trailing slashes for consistency + url = url.rstrip('/') + + # Remove www prefix for consistent domain handling + if url.startswith('https://www.'): + return url.replace('https://www.', 'https://', 1) + elif url.startswith('http://www.'): + return url.replace('http://www.', 'http://', 1) + return url + + def get_domain_from_url(self, url): + """Extract and normalize domain from URL""" + try: + parsed = urlparse(url) + domain = parsed.netloc + if domain.startswith('www.'): + return domain[4:] + return domain + except: + return "" + + def should_skip_url(self, url, current_depth): + """Check if URL should be skipped based on various criteria""" + normalized_url = self.normalize_url(url) + + # Skip if already visited + if normalized_url in self.visited_urls: + return True, "Already visited" + + # Skip if not a valid HTTP/HTTPS URL + if not normalized_url.startswith(('http://', 'https://')): + return True, "Not HTTP/HTTPS URL" + + # Get domain + domain = self.get_domain_from_url(normalized_url) + if not domain: + return True, "Invalid domain" + + # Removed all domain page limits - unlimited crawling + # Removed external domain depth limits - crawl as deep as needed + 
+ return False, "OK" + + def scrape_url(self, url, depth): + """Scrape a single URL with error handling and rate limiting""" + try: + # Check if stop was requested + if self._stop_requested: + return None + + # Check if URL should be skipped + should_skip, reason = self.should_skip_url(url, depth) + if should_skip: + print(f"Skipping {url}: {reason}") + return None + + # Normalize URL + normalized_url = self.normalize_url(url) + + # Mark as visited and update domain count (for statistics only) + with self.lock: + self.visited_urls.add(normalized_url) + domain = self.get_domain_from_url(normalized_url) + if domain: + self.domain_page_counts[domain] = self.domain_page_counts.get(domain, 0) + 1 + + # Add small delay to prevent overwhelming servers + time.sleep(0.1) + + start_time = time.time() + + # Create request with headers + req = urllib.request.Request( + normalized_url, + headers={ + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', + 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', + 'Accept-Language': 'en-US,en;q=0.5', + 'Accept-Encoding': 'identity', # urllib does not decompress gzip/deflate bodies, so ask for an uncompressed response + 'Connection': 'keep-alive', + 'Upgrade-Insecure-Requests': '1', + } + ) + + # Fetch the page with timeout + with urllib.request.urlopen(req, timeout=15) as response: + # Check content type + content_type = response.headers.get('content-type', '').lower() + if 'text/html' not in content_type and 'application/xhtml' not in content_type: + print(f"Skipping {url}: Not HTML content ({content_type})") + return None + + html_content = response.read().decode('utf-8', errors='ignore') + + load_time = time.time() - start_time + + # Skip if content is too small (likely error page) + if len(html_content) < 100: + print(f"Skipping {url}: Content too small ({len(html_content)} chars)") + return None + + # Parse HTML + parser = HTMLParser() + parser.feed(html_content) + + # Extract links and normalize 
them with duplicate detection + links = [] + base_url = normalized_url + seen_links = set() # Track links within this page to avoid duplicates + + for link in parser.links: + try: + absolute_url = urljoin(base_url, link) + normalized_link = self.normalize_url(absolute_url) + + # Skip if already seen in this page or should be skipped + if normalized_link in seen_links: + continue + seen_links.add(normalized_link) + + should_skip, reason = self.should_skip_url(normalized_link, depth + 1) + if should_skip: + continue + + # Only include http/https links and filter out common non-content URLs + if (normalized_link.startswith(('http://', 'https://')) and + not any(skip in normalized_link.lower() for skip in [ + 'mailto:', 'tel:', 'javascript:', 'data:', 'file:', + '.pdf', '.doc', '.docx', '.xls', '.xlsx', '.zip', '.rar', + '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.svg', '.ico', + '.css', '.js', '.xml', '.json', '.txt', '.log' + ])): + links.append(normalized_link) + except: + continue + + # Create Website object + website = Website( + title=parser.title, + url=normalized_url, + content=html_content, + depth=depth, + links=links, + load_time=load_time + ) + + return website + + except urllib.error.HTTPError as e: + print(f"HTTP Error scraping {url}: {e.code} - {e.reason}") + return None + except urllib.error.URLError as e: + print(f"URL Error scraping {url}: {e.reason}") + return None + except Exception as e: + print(f"Error scraping {url}: {str(e)}") + return None + + def crawl_website(self, start_url, max_depth=3, progress_callback=None): + """Crawl website with multithreading support and no page limits""" + if not start_url.startswith(('http://', 'https://')): + start_url = 'https://' + start_url + + # Initialize tracking + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() + self.domain_page_counts = {} + self.start_domain = self.get_domain_from_url(start_url) + self._stop_requested = False # Reset stop flag + + print(f"Starting crawl from: 
{start_url}") + print(f"Starting domain: {self.start_domain}") + print(f"Max depth: {max_depth}") + print(f"Unlimited crawling - no page limits") + + # Start with the initial URL + urls_to_scrape = [(start_url, 0)] + max_depth_reached = 0 + consecutive_empty_levels = 0 + max_consecutive_empty = 3 # Stop if 3 consecutive levels have no new URLs + total_pages_scraped = 0 + # Removed all page limits - unlimited crawling + + with ThreadPoolExecutor(max_workers=self.max_workers) as executor: + for current_depth in range(max_depth + 1): + # Check if stop was requested + if self._stop_requested: + print("Scraping stopped by user request") + break + + if not urls_to_scrape: + print(f"Stopping at depth {current_depth}: No more URLs to scrape") + break + + # Check if we've reached too many consecutive empty levels + if consecutive_empty_levels >= max_consecutive_empty: + print(f"Stopping at depth {current_depth}: {max_consecutive_empty} consecutive empty levels") + break + + # Removed absolute page limit check - unlimited pages + + print(f"Scraping depth {current_depth} with {len(urls_to_scrape)} URLs") + + # Submit all URLs at current depth for concurrent scraping + future_to_url = { + executor.submit(self.scrape_url, url, depth): url + for url, depth in urls_to_scrape + } + + # Collect results and prepare next level + urls_to_scrape = [] + level_results = 0 + + for future in as_completed(future_to_url): + # Check if stop was requested + if self._stop_requested: + print("Stopping processing of current level") + break + + website = future.result() + if website: + with self.lock: + self.websites.append(website) + level_results += 1 + total_pages_scraped += 1 + + # Emit progress if callback provided + if progress_callback: + progress_callback(website) + + # Add links for next depth level (no limits) + if current_depth < max_depth: + for link in website.links: + # Removed URL limit per level - process all URLs + + should_skip, reason = self.should_skip_url(link, current_depth + 
1) + if not should_skip: + urls_to_scrape.append((link, current_depth + 1)) + + # Check if stop was requested after processing level + if self._stop_requested: + break + + # Update depth tracking + if level_results > 0: + max_depth_reached = current_depth + consecutive_empty_levels = 0 + else: + consecutive_empty_levels += 1 + + # Only stop if we've reached the actual max depth + if current_depth >= max_depth: + print(f"Reached maximum depth: {max_depth}") + break + + # Print progress summary + print(f"Depth {current_depth} completed: {level_results} pages, Total: {len(self.websites)}") + if self.domain_page_counts: + print(f"Domain breakdown: {dict(self.domain_page_counts)}") + + print(f"Crawling completed. Max depth reached: {max_depth_reached}, Total pages: {len(self.websites)}") + print(f"Visited URLs: {len(self.visited_urls)}") + print(f"Domain breakdown: {dict(self.domain_page_counts)}") + return self.websites + + def reset(self): + """Reset the scraper state for a new crawl""" + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() + self.domain_page_counts = {} + self.start_domain = None + self._stop_requested = False # Reset stop flag + + def get_statistics(self): + """Get scraping statistics with enhanced tracking information""" + if not self.websites: + return { + 'total_pages': 0, + 'total_links': 0, + 'total_words': 0, + 'avg_load_time': 0, + 'max_depth_reached': 0, + 'domains': {}, + 'visited_urls_count': 0, + 'domain_page_counts': {}, + 'start_domain': self.start_domain + } + + total_pages = len(self.websites) + total_links = sum(len(w.links) for w in self.websites) + total_words = sum(w.get_word_count() for w in self.websites) + + load_times = [w.load_time for w in self.websites if w.load_time] + avg_load_time = sum(load_times) / len(load_times) if load_times else 0 + + max_depth_reached = max(w.depth for w in self.websites) + + # Count domains + domains = {} + for website in self.websites: + domain = 
website.get_normalized_domain() + domains[domain] = domains.get(domain, 0) + 1 + + return { + 'total_pages': total_pages, + 'total_links': total_links, + 'total_words': total_words, + 'avg_load_time': avg_load_time, + 'max_depth_reached': max_depth_reached, + 'domains': domains, + 'visited_urls_count': len(self.visited_urls), + 'domain_page_counts': dict(self.domain_page_counts), + 'start_domain': self.start_domain + } + + def filter_by_domain(self, domain): + """Filter websites by domain""" + normalized_domain = domain[4:] if domain.startswith('www.') else domain # strip a bare "www." prefix; normalize_url only handles full http(s) URLs + return [w for w in self.websites if w.get_normalized_domain() == normalized_domain] + + def search_websites(self, query): + """Search websites by query""" + return [w for w in self.websites if w.search_content(query)] + + def stop_scraping(self): + """Request graceful stop of the scraping process""" + self._stop_requested = True \ No newline at end of file diff --git a/community-contributions/WebScraperApp/requirements.txt b/community-contributions/WebScraperApp/requirements.txt new file mode 100644 index 0000000..a9f1b2a --- /dev/null +++ b/community-contributions/WebScraperApp/requirements.txt @@ -0,0 +1,5 @@ +PyQt5>=5.15.0 +PyQtWebEngine>=5.15.0 +urllib3==2.0.7 +openai>=1.0.0 +python-dotenv>=1.0.0 \ No newline at end of file diff --git a/community-contributions/WebScraperApp/test.py b/community-contributions/WebScraperApp/test.py new file mode 100644 index 0000000..e86a29c --- /dev/null +++ b/community-contributions/WebScraperApp/test.py @@ -0,0 +1,161 @@ +#!/usr/bin/env python3 +""" +Simple test script to verify the web scraping functionality +""" + +import module + +def test_basic_scraping(): + """Test basic scraping functionality""" + print("Testing basic web scraping...") + + # Create a scraper instance + scraper = module.WebScraper() + + # Test with a simple website (httpbin.org is a safe test site) + test_url = "https://httpbin.org/html" + + print(f"Scraping {test_url} with depth 1...") + + try: + # Scrape with depth 1 
to keep it fast + websites = scraper.crawl_website(test_url, max_depth=1) + + print(f"Successfully scraped {len(websites)} websites") + + if websites: + # Show first website details + first_site = websites[0] + print(f"\nFirst website:") + print(f" Title: {first_site.title}") + print(f" URL: {first_site.url}") + print(f" Depth: {first_site.depth}") + print(f" Links found: {len(first_site.links)}") + print(f" Word count: {first_site.get_word_count()}") + + # Show statistics + stats = scraper.get_statistics() + print(f"\nStatistics:") + print(f" Total pages: {stats['total_pages']}") + print(f" Total links: {stats['total_links']}") + print(f" Total words: {stats['total_words']}") + print(f" Average load time: {stats['avg_load_time']:.2f}s") + + return True + else: + print("No websites were scraped") + return False + + except Exception as e: + print(f"Error during scraping: {e}") + return False + +def test_website_class(): + """Test the Website class functionality""" + print("\nTesting Website class...") + + # Create a test website + website = module.Website( + title="Test Website", + url="https://example.com", + content="

<h1>Test Content</h1><p>This is a test paragraph.</p>

", + depth=0, + links=["https://example.com/page1", "https://example.com/page2"] + ) + + # Test methods + print(f"Website title: {website.title}") + print(f"Website URL: {website.url}") + print(f"Word count: {website.get_word_count()}") + print(f"Domain: {website.get_domain()}") + print(f"Normalized domain: {website.get_normalized_domain()}") + print(f"Search for 'test': {website.search_content('test')}") + print(f"Search for 'nonexistent': {website.search_content('nonexistent')}") + + return True + +def test_html_parser(): + """Test the HTML parser functionality""" + print("\nTesting HTML Parser...") + + parser = module.HTMLParser() + test_html = """ + + Test Page + +

Welcome

+

This is a link to example.com

+

Here's another relative link

+ + + """ + + parser.feed(test_html) + print(f"Title extracted: {parser.title}") + print(f"Links found: {parser.links}") + print(f"Text content length: {len(parser.get_text())}") + + return True + +def test_url_normalization(): + """Test URL normalization to handle www. prefixes""" + print("\nTesting URL Normalization...") + + scraper = module.WebScraper() + + # Test URLs with and without www. + test_urls = [ + "https://www.example.com/page", + "https://example.com/page", + "http://www.test.com/path?param=value#fragment", + "http://test.com/path?param=value#fragment" + ] + + print("URL Normalization Results:") + for url in test_urls: + normalized = scraper.normalize_url(url) + print(f" Original: {url}") + print(f" Normalized: {normalized}") + print() + + # Test domain filtering + print("Domain Filtering Test:") + test_websites = [ + module.Website("Site 1", "https://www.example.com", "content", 0), + module.Website("Site 2", "https://example.com", "content", 0), + module.Website("Site 3", "https://www.test.com", "content", 0) + ] + + scraper.websites = test_websites + + # Test filtering by domain with and without www. 
+ domains_to_test = ["example.com", "www.example.com", "test.com", "www.test.com"] + + for domain in domains_to_test: + filtered = scraper.filter_by_domain(domain) + print(f" Filter '{domain}': {len(filtered)} results") + for site in filtered: + print(f" - {site.title} ({site.url})") + + return True + +if __name__ == "__main__": + print("Web Scraper Test Suite") + print("=" * 50) + + # Test HTML parser + test_html_parser() + + # Test Website class + test_website_class() + + # Test URL normalization + test_url_normalization() + + # Test basic scraping (uncomment to test actual scraping) + # Note: This requires internet connection + # test_basic_scraping() + + print("\nTest completed!") + print("\nTo run the full application:") + print("python web_scraper_app.py") \ No newline at end of file diff --git a/community-contributions/WebScraperApp/web_scraper_app.py b/community-contributions/WebScraperApp/web_scraper_app.py new file mode 100644 index 0000000..ccd5ce2 --- /dev/null +++ b/community-contributions/WebScraperApp/web_scraper_app.py @@ -0,0 +1,1678 @@ +import sys +import json +from urllib.parse import urlparse +from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, + QHBoxLayout, QLabel, QLineEdit, QSpinBox, QPushButton, + QTextEdit, QTableWidget, QTableWidgetItem, QTabWidget, + QProgressBar, QComboBox, QMessageBox, QSplitter, + QGroupBox, QGridLayout, QHeaderView, QFrame, QScrollArea, + QSystemTrayIcon, QStyle, QAction, QMenu, QTreeWidget, QTreeWidgetItem, + QListWidget, QListWidgetItem, QSizePolicy, QAbstractItemView) +from PyQt5.QtCore import QThread, pyqtSignal, Qt, QTimer, QUrl +from PyQt5.QtGui import QFont, QIcon, QPalette, QColor, QPixmap +try: + from PyQt5.QtWebEngineWidgets import QWebEngineView + WEB_ENGINE_AVAILABLE = True + print("PyQtWebEngine successfully imported - Visual preview enabled") +except ImportError as e: + WEB_ENGINE_AVAILABLE = False + print(f"PyQtWebEngine not available: {e}") + print("Visual preview will be 
disabled. Install with: pip install PyQtWebEngine") +import module +import re +import webbrowser +import os +try: + from openai import OpenAI + OPENAI_AVAILABLE = True +except ImportError: + OPENAI_AVAILABLE = False +from datetime import datetime +from dotenv import load_dotenv +import markdown + +# Load environment variables from .env file +load_dotenv() + +class ScrapingThread(QThread): + """Thread for running web scraping operations""" + progress_updated = pyqtSignal(str) + scraping_complete = pyqtSignal(list) + error_occurred = pyqtSignal(str) + + def __init__(self, url, max_depth): + super().__init__() + self.url = url + self.max_depth = max_depth + self.scraper = module.WebScraper() + self._stop_requested = False + + def stop(self): + """Request graceful stop of the scraping process""" + self._stop_requested = True + if hasattr(self.scraper, 'stop_scraping'): + self.scraper.stop_scraping() + + def run(self): + try: + self.progress_updated.emit("Starting web scraping...") + + # Reset scraper state for new crawl + self.scraper.reset() + + def progress_callback(website): + if self._stop_requested: + return # Stop processing if requested + if website: + self.progress_updated.emit(f"Scraped: {website.title} (depth {website.depth})") + + # Start scraping with progress callback + websites = self.scraper.crawl_website(self.url, self.max_depth, progress_callback) + + # Check if stop was requested + if self._stop_requested: + self.progress_updated.emit("Scraping stopped by user.") + return + + # Emit final progress + self.progress_updated.emit(f"Scraping complete! 
Found {len(websites)} websites.") + self.scraping_complete.emit(websites) + + except Exception as e: + if not self._stop_requested: # Only emit error if not stopped by user + self.error_occurred.emit(str(e)) + +class ModernButton(QPushButton): + """Custom modern button with hover effects""" + def __init__(self, text, primary=False): + super().__init__(text) + self.primary = primary + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10, QFont.Weight.Medium)) + self.setCursor(Qt.CursorShape.PointingHandCursor) + self.update_style() + + def update_style(self): + if self.primary: + self.setStyleSheet(""" + QPushButton { + background: #3b82f6; + border: none; + color: white; + padding: 12px 24px; + border-radius: 6px; + font-weight: 600; + } + QPushButton:hover { + background: #2563eb; + } + QPushButton:pressed { + background: #1d4ed8; + } + QPushButton:disabled { + background: #9ca3af; + color: #f3f4f6; + } + """) + else: + self.setStyleSheet(""" + QPushButton { + background: white; + border: 1px solid #d1d5db; + color: #374151; + padding: 10px 20px; + border-radius: 6px; + font-weight: 500; + } + QPushButton:hover { + border-color: #3b82f6; + color: #3b82f6; + background: #f8fafc; + } + QPushButton:pressed { + background: #f1f5f9; + } + QPushButton:disabled { + background: #f9fafb; + border-color: #e5e7eb; + color: #9ca3af; + } + """) + +class ModernLineEdit(QLineEdit): + """Custom modern input field""" + def __init__(self, placeholder=""): + super().__init__() + self.setPlaceholderText(placeholder) + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10)) + self.setStyleSheet(""" + QLineEdit { + border: 1px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #374151; + font-size: 14px; + } + QLineEdit:focus { + border-color: #3b82f6; + outline: none; + } + QLineEdit::placeholder { + color: #9ca3af; + } + """) + +class ModernSpinBox(QSpinBox): + """Custom modern spin box""" + def __init__(self): + 
super().__init__() + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10)) + self.setStyleSheet(""" + QSpinBox { + border: 1px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #374151; + font-size: 14px; + } + QSpinBox:focus { + border-color: #3b82f6; + } + QSpinBox::up-button, QSpinBox::down-button { + border: none; + background: #f9fafb; + border-radius: 3px; + margin: 2px; + } + QSpinBox::up-button:hover, QSpinBox::down-button:hover { + background: #f3f4f6; + } + """) + +class ChatBubbleWidget(QWidget): + def __init__(self, message, timestamp, role): + super().__init__() + layout = QVBoxLayout(self) + layout.setContentsMargins(0, 0, 0, 0) + layout.setSpacing(2) + # Bubble + if role == "ai": + html = markdown.markdown(message) + bubble = QLabel(html) + bubble.setTextFormat(Qt.TextFormat.RichText) + else: + bubble = QLabel(message) + bubble.setTextFormat(Qt.TextFormat.PlainText) + bubble.setWordWrap(True) + bubble.setTextInteractionFlags(Qt.TextInteractionFlag.TextSelectableByMouse) + bubble.setFont(QFont("Segoe UI", 11)) + bubble.setSizePolicy(QSizePolicy.Preferred, QSizePolicy.Maximum) + bubble.setMinimumWidth(800) + bubble.setMaximumWidth(1200) + bubble.adjustSize() + # Timestamp + ts = QLabel(("🤖 " if role == "ai" else "") + timestamp) + ts.setFont(QFont("Segoe UI", 8)) + ts.setStyleSheet("color: #9ca3af;") + if role == "user": + bubble.setStyleSheet("background: #2563eb; color: white; border-radius: 16px; padding: 10px 16px; margin-left: 40px;") + layout.setAlignment(Qt.AlignmentFlag.AlignRight) + ts.setAlignment(Qt.AlignmentFlag.AlignRight) + else: + bubble.setStyleSheet("background: #f3f4f6; color: #1e293b; border-radius: 16px; padding: 10px 16px; margin-right: 40px;") + layout.setAlignment(Qt.AlignmentFlag.AlignLeft) + ts.setAlignment(Qt.AlignmentFlag.AlignLeft) + layout.addWidget(bubble) + layout.addWidget(ts) + +class WebScraperApp(QMainWindow): + def __init__(self): + super().__init__() + self.websites = 
[] + self.scraper = module.WebScraper() + self.init_ui() + + def init_ui(self): + self.setWindowTitle("Web Scraper & Data Analyzer") + self.setGeometry(100, 100, 1400, 900) + self.setMinimumSize(1200, 800) # Set minimum size to prevent geometry issues + + # Set clean, minimal styling + self.setStyleSheet(""" + QMainWindow { + background: #1e293b; + } + QTabWidget::pane { + border: none; + background: white; + border-radius: 8px; + margin: 8px 8px 8px 8px; + padding-top: 8px; + } + QTabBar::tab { + background: #475569; + color: #e2e8f0; + padding: 12px 20px; + margin-right: 4px; + border-top-left-radius: 8px; + border-top-right-radius: 8px; + font-weight: 600; + font-size: 14px; + min-width: 120px; + margin-bottom: 8px; + } + QTabBar::tab:selected { + background: white; + color: #1e293b; + border-bottom: none; + margin-bottom: 8px; + } + QTabBar::tab:hover:!selected { + background: #64748b; + color: #f1f5f9; + } + QTabBar::tab:first { + margin-left: 8px; + } + QTabBar::tab:last { + margin-right: 8px; + } + QGroupBox { + font-weight: 600; + font-size: 14px; + border: 2px solid #e2e8f0; + border-radius: 8px; + margin-top: 16px; + padding-top: 16px; + background: #f8fafc; + } + QGroupBox::title { + subcontrol-origin: margin; + left: 16px; + + color: #1e293b; + background: #f8fafc; + } + QTableWidget { + border: 2px solid #e2e8f0; + border-radius: 8px; + background: white; + gridline-color: #f1f5f9; + alternate-background-color: #f8fafc; + selection-background-color: #dbeafe; + selection-color: #1e293b; + } + QTableWidget::item { + padding: 8px 4px; + border: none; + min-height: 20px; + } + QTableWidget::item:selected { + background: #dbeafe; + color: #1e293b; + } + QHeaderView::section { + background: #e2e8f0; + padding: 12px 8px; + border: none; + border-right: 1px solid #cbd5e1; + border-bottom: 1px solid #cbd5e1; + font-weight: 600; + color: #1e293b; + } + QHeaderView::section:vertical { + background: #f8fafc; + padding: 8px 4px; + border: none; + border-bottom: 1px 
solid #e2e8f0; + font-weight: 500; + color: #64748b; + min-width: 40px; + } + QProgressBar { + border: 2px solid #e2e8f0; + border-radius: 6px; + text-align: center; + background: #f1f5f9; + } + QProgressBar::chunk { + background: #3b82f6; + border-radius: 5px; + } + QTextEdit { + border: 2px solid #e2e8f0; + border-radius: 6px; + padding: 12px; + background: white; + color: #1e293b; + font-family: 'Segoe UI', sans-serif; + } + QComboBox { + border: 2px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #1e293b; + font-size: 14px; + min-height: 40px; + } + QComboBox:focus { + border-color: #3b82f6; + } + QComboBox::drop-down { + border: none; + width: 30px; + } + QComboBox::down-arrow { + image: none; + border-left: 5px solid transparent; + border-right: 5px solid transparent; + border-top: 5px solid #6b7280; + margin-right: 10px; + } + QLabel { + color: #1e293b; + font-weight: 500; + font-size: 14px; + } + """) + + # System tray icon for notifications + + self.tray_icon = QSystemTrayIcon(self) + self.tray_icon.setIcon(self.style().standardIcon(QStyle.StandardPixmap.SP_ComputerIcon)) + self.tray_icon.setVisible(True) + + # Create central widget and main layout + central_widget = QWidget() + self.setCentralWidget(central_widget) + main_layout = QVBoxLayout(central_widget) + main_layout.setContentsMargins(16, 16, 16, 16) + main_layout.setSpacing(12) + + # Create header + header = self.create_header() + main_layout.addWidget(header) + + # Add proper spacing after header + spacer = QWidget() + spacer.setFixedHeight(12) + main_layout.addWidget(spacer) + + # Create tab widget with proper margins + self.tab_widget = QTabWidget() + self.tab_widget.setStyleSheet(""" + QTabWidget { + margin-top: 0px; + background: transparent; + } + QTabWidget::pane { + border: none; + background: white; + border-radius: 8px; + margin: 4px 8px 8px 8px; + padding-top: 4px; + } + QTabBar { + background: transparent; + spacing: 0px; + } + QTabBar::tab { + 
background: #475569; + color: #e2e8f0; + padding: 12px 20px; + margin-right: 4px; + border-top-left-radius: 8px; + border-top-right-radius: 8px; + font-weight: 600; + font-size: 14px; + min-width: 120px; + margin-bottom: 4px; + } + QTabBar::tab:selected { + background: white; + color: #1e293b; + border-bottom: none; + margin-bottom: 4px; + } + QTabBar::tab:hover:!selected { + background: #64748b; + color: #f1f5f9; + } + QTabBar::tab:first { + margin-left: 8px; + } + QTabBar::tab:last { + margin-right: 8px; + } + """) + main_layout.addWidget(self.tab_widget) + + # Create tabs + self.create_scraping_tab() + self.create_data_tab() + self.create_analysis_tab() + self.create_sitemap_tab() + self.create_ai_tab() + + def create_header(self): + """Create a clean header with help button only (no theme toggle)""" + header_widget = QWidget() + header_widget.setStyleSheet(""" + QWidget { + background: #0f172a; + border-radius: 12px; + margin: 4px 4px 8px 4px; + } + """) + header_layout = QHBoxLayout(header_widget) + header_layout.setContentsMargins(24, 20, 24, 20) + header_layout.setSpacing(16) + + # Title + title_label = QLabel("Web Scraper & Data Analyzer") + title_label.setStyleSheet(""" + QLabel { + color: #f8fafc; + font-size: 28px; + font-weight: 800; + font-family: 'Segoe UI', sans-serif; + } + """) + + # Subtitle + subtitle_label = QLabel("Modern web scraping with intelligent data analysis") + subtitle_label.setStyleSheet(""" + QLabel { + color: #cbd5e1; + font-size: 16px; + font-weight: 500; + font-family: 'Segoe UI', sans-serif; + } + """) + + # Help button + help_button = ModernButton("Help") + help_button.clicked.connect(self.show_help) + + # Right side info + info_widget = QWidget() + info_layout = QVBoxLayout(info_widget) + info_layout.setAlignment(Qt.AlignmentFlag.AlignRight) + info_layout.setSpacing(4) + + version_label = QLabel("v2.0") + version_label.setStyleSheet(""" + QLabel { + color: #94a3b8; + font-size: 14px; + font-weight: 600; + background: #1e293b; + 
padding: 6px 12px; + border-radius: 6px; + border: 1px solid #334155; + } + """) + + info_layout.addWidget(version_label) + + header_layout.addWidget(title_label) + header_layout.addStretch() + header_layout.addWidget(subtitle_label) + header_layout.addStretch() + header_layout.addWidget(help_button) + header_layout.addWidget(info_widget) + + return header_widget + + def create_scraping_tab(self): + """Create the web scraping configuration tab""" + scraping_widget = QWidget() + main_layout = QVBoxLayout(scraping_widget) + main_layout.setContentsMargins(16, 16, 16, 16) + main_layout.setSpacing(16) + + # Create scroll area + scroll_area = QScrollArea() + scroll_area.setWidgetResizable(True) + scroll_area.setStyleSheet("QScrollArea { border: none; }") + scroll_area.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAsNeeded) + scroll_area.setVerticalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAsNeeded) + + # Create content widget for scrolling + content_widget = QWidget() + layout = QVBoxLayout(content_widget) + layout.setSpacing(16) + layout.setContentsMargins(0, 0, 0, 0) + + # Input group + input_group = QGroupBox("Scraping Configuration") + input_layout = QGridLayout(input_group) + input_layout.setSpacing(12) + + # URL input + input_layout.addWidget(QLabel("Website URL:"), 0, 0) + self.url_input = ModernLineEdit("https://example.com") + input_layout.addWidget(self.url_input, 0, 1) + + # Depth input + input_layout.addWidget(QLabel("Max Depth (1-100):"), 1, 0) + self.depth_input = ModernSpinBox() + self.depth_input.setRange(1, 100) + self.depth_input.setValue(3) + input_layout.addWidget(self.depth_input, 1, 1) + + # Control buttons + button_layout = QHBoxLayout() + button_layout.setSpacing(8) + + self.start_button = ModernButton("Start Scraping", primary=True) + self.start_button.clicked.connect(self.start_scraping) + button_layout.addWidget(self.start_button) + + self.stop_button = ModernButton("Stop") + self.stop_button.clicked.connect(self.stop_scraping) + 
self.stop_button.setEnabled(False) + button_layout.addWidget(self.stop_button) + + input_layout.addLayout(button_layout, 2, 0, 1, 2) + layout.addWidget(input_group) + + # Progress group + progress_group = QGroupBox("Progress") + progress_layout = QVBoxLayout(progress_group) + progress_layout.setSpacing(8) + + self.progress_bar = QProgressBar() + self.progress_bar.setVisible(False) + self.progress_bar.setMinimumHeight(20) + progress_layout.addWidget(self.progress_bar) + + self.status_label = QLabel("Ready to start scraping...") + self.status_label.setStyleSheet(""" + QLabel { + color: #374151; + font-size: 14px; + padding: 8px; + background: #f8fafc; + border-radius: 6px; + border-left: 3px solid #3b82f6; + } + """) + self.status_label.setWordWrap(True) # Enable word wrapping + progress_layout.addWidget(self.status_label) + + layout.addWidget(progress_group) + + # Results preview + results_group = QGroupBox("Scraping Results") + results_layout = QVBoxLayout(results_group) + + self.results_text = QTextEdit() + self.results_text.setReadOnly(True) + self.results_text.setMinimumHeight(80) # Reduced minimum height for more compact output + results_layout.addWidget(self.results_text) + + layout.addWidget(results_group) + + # Set the content widget in the scroll area + scroll_area.setWidget(content_widget) + main_layout.addWidget(scroll_area) + + self.tab_widget.addTab(scraping_widget, "Web Scraping") + + def create_data_tab(self): + """Create the data viewing and filtering tab""" + data_widget = QWidget() + layout = QVBoxLayout(data_widget) + layout.setSpacing(16) + + # Search and filter controls + controls_group = QGroupBox("Search & Filter") + controls_layout = QHBoxLayout(controls_group) + controls_layout.setSpacing(12) + + controls_layout.addWidget(QLabel("Search:")) + self.search_input = ModernLineEdit("Enter search term...") + self.search_input.textChanged.connect(self.filter_data) + controls_layout.addWidget(self.search_input) + + 
controls_layout.addWidget(QLabel("Domain:")) + self.domain_filter = QComboBox() + self.domain_filter.currentTextChanged.connect(self.filter_data) + controls_layout.addWidget(self.domain_filter) + + self.export_button = ModernButton("Export Data") + self.export_button.clicked.connect(self.export_data) + controls_layout.addWidget(self.export_button) + + # Sitemap button + self.sitemap_button = ModernButton("Generate Sitemap.xml") + self.sitemap_button.clicked.connect(self.generate_sitemap) + controls_layout.addWidget(self.sitemap_button) + + layout.addWidget(controls_group) + + # Data table + self.data_table = QTableWidget() + self.data_table.setColumnCount(6) + self.data_table.setHorizontalHeaderLabels([ + "Title", "URL", "Depth", "Links", "Words", "Load Time" + ]) + + # Set table properties to fill available width + header = self.data_table.horizontalHeader() + header.setStretchLastSection(False) # Don't stretch the last section + + # Set resize modes to make table fill width properly + header.setSectionResizeMode(0, QHeaderView.Stretch) # Title - stretch to fill + header.setSectionResizeMode(1, QHeaderView.Stretch) # URL - stretch to fill + header.setSectionResizeMode(2, QHeaderView.Fixed) # Depth - fixed + header.setSectionResizeMode(3, QHeaderView.Fixed) # Links - fixed + header.setSectionResizeMode(4, QHeaderView.Fixed) # Words - fixed + header.setSectionResizeMode(5, QHeaderView.Fixed) # Load Time - fixed + + # Set fixed column widths for non-stretching columns + self.data_table.setColumnWidth(2, 80) # Depth + self.data_table.setColumnWidth(3, 80) # Links + self.data_table.setColumnWidth(4, 80) # Words + self.data_table.setColumnWidth(5, 100) # Load Time + + # Set row height to prevent index cutoff + self.data_table.verticalHeader().setDefaultSectionSize(40) # Increased row height + self.data_table.verticalHeader().setMinimumSectionSize(35) # Minimum row height + + # Enable word wrapping for title and URL columns + self.data_table.setWordWrap(True) + + # 
Connect double-click signal + self.data_table.cellDoubleClicked.connect(self.show_content_preview) + + layout.addWidget(self.data_table) + + self.tab_widget.addTab(data_widget, "Data View") + + def create_analysis_tab(self): + """Create the data analysis tab""" + analysis_widget = QWidget() + layout = QVBoxLayout(analysis_widget) + layout.setSpacing(16) + + # Create scroll area for better layout + scroll_area = QScrollArea() + scroll_area.setWidgetResizable(True) + scroll_area.setStyleSheet("QScrollArea { border: none; }") + + content_widget = QWidget() + content_layout = QVBoxLayout(content_widget) + content_layout.setSpacing(16) + + # Statistics group + stats_group = QGroupBox("Statistics") + stats_layout = QGridLayout(stats_group) + stats_layout.setSpacing(12) + + self.stats_labels = {} + stats_fields = [ + ("Total Pages", "Total Pages"), + ("Total Links", "Total Links"), + ("Total Words", "Total Words"), + ("Average Load Time", "Average Load Time"), + ("Max Depth Reached", "Max Depth Reached") + ] + + for i, (label_text, field) in enumerate(stats_fields): + stats_layout.addWidget(QLabel(f"{label_text}:"), i, 0) + label = QLabel("0") + label.setStyleSheet(""" + QLabel { + font-weight: 700; + color: #3b82f6; + font-size: 16px; + padding: 8px 12px; + background: #eff6ff; + border-radius: 6px; + border-left: 3px solid #3b82f6; + } + """) + self.stats_labels[field] = label + stats_layout.addWidget(label, i, 1) + + content_layout.addWidget(stats_group) + + # Domain breakdown + domain_group = QGroupBox("Domain Breakdown") + domain_layout = QVBoxLayout(domain_group) + + self.domain_text = QTextEdit() + self.domain_text.setReadOnly(True) + self.domain_text.setMaximumHeight(150) + domain_layout.addWidget(self.domain_text) + + content_layout.addWidget(domain_group) + + # Content preview + content_preview_group = QGroupBox("Content Preview") + content_preview_layout = QVBoxLayout(content_preview_group) + + # Create splitter for text and visual preview + preview_splitter = 
QSplitter(Qt.Orientation.Horizontal) + + # Text preview + text_preview_widget = QWidget() + text_preview_layout = QVBoxLayout(text_preview_widget) + text_preview_layout.setContentsMargins(0, 0, 0, 0) + + text_label = QLabel("Text Content:") + text_label.setStyleSheet("font-weight: 600; margin-bottom: 8px;") + text_preview_layout.addWidget(text_label) + + self.content_text = QTextEdit() + self.content_text.setReadOnly(True) + self.content_text.setMaximumHeight(400) + self.content_text.setFont(QFont("Segoe UI", 12)) + self.content_text.setStyleSheet(""" + QTextEdit { + font-size: 12px; + line-height: 1.4; + padding: 16px; + } + """) + text_preview_layout.addWidget(self.content_text) + + # Visual HTML preview + visual_preview_widget = QWidget() + visual_preview_layout = QVBoxLayout(visual_preview_widget) + visual_preview_layout.setContentsMargins(0, 0, 0, 0) + + visual_label = QLabel("Visual Preview:") + visual_label.setStyleSheet("font-weight: 600; margin-bottom: 8px;") + visual_preview_layout.addWidget(visual_label) + + if WEB_ENGINE_AVAILABLE: + self.web_view = QWebEngineView() + self.web_view.setMinimumHeight(400) + self.web_view.setMaximumHeight(400) + visual_preview_layout.addWidget(self.web_view) + else: + self.web_view = QLabel("Visual preview not available\nInstall PyQtWebEngine for HTML rendering") + self.web_view.setStyleSheet("color: #6b7280; padding: 20px; text-align: center;") + self.web_view.setMinimumHeight(400) + self.web_view.setMaximumHeight(400) + visual_preview_layout.addWidget(self.web_view) + + # Add widgets to splitter + preview_splitter.addWidget(text_preview_widget) + preview_splitter.addWidget(visual_preview_widget) + preview_splitter.setSizes([400, 600]) # Set initial split ratio + + content_preview_layout.addWidget(preview_splitter) + + content_layout.addWidget(content_preview_group) + + scroll_area.setWidget(content_widget) + layout.addWidget(scroll_area) + + self.tab_widget.addTab(analysis_widget, "Analysis") + + def 
create_sitemap_tab(self): + """Create the visual sitemap tab with a tree widget and export button""" + sitemap_widget = QWidget() + layout = QVBoxLayout(sitemap_widget) + layout.setSpacing(16) + + # Export button + self.export_sitemap_button = ModernButton("Export Sitemap (JSON)") + self.export_sitemap_button.clicked.connect(self.export_sitemap_json) + layout.addWidget(self.export_sitemap_button) + + self.sitemap_tree = QTreeWidget() + self.sitemap_tree.setHeaderLabels(["Page Title", "URL"]) + self.sitemap_tree.setColumnWidth(0, 350) + self.sitemap_tree.setColumnWidth(1, 600) + self.sitemap_tree.itemDoubleClicked.connect(self.open_url_in_browser) + layout.addWidget(self.sitemap_tree) + + self.tab_widget.addTab(sitemap_widget, "Sitemap") + + def create_ai_tab(self): + """Create a simplified, modern AI Analysis tab with a chat interface and compact quick actions, using more curves to match the app style.""" + ai_widget = QWidget() + layout = QVBoxLayout(ai_widget) + layout.setSpacing(8) + layout.setContentsMargins(16, 16, 16, 16) + + hint_label = QLabel("💡 Ask questions about your scraped websites below.") + hint_label.setStyleSheet(""" + QLabel { + color: #64748b; + font-size: 13px; + padding: 4px 0 8px 0; + } + """) + layout.addWidget(hint_label) + + # --- Chat area --- + self.ai_chat_history = QListWidget() + self.ai_chat_history.setStyleSheet(""" + QListWidget { + background: #f8fafc; + border: 1.5px solid #e2e8f0; + border-radius: 22px; + font-size: 15px; + color: #1e293b; + padding: 12px; + font-family: 'Segoe UI', sans-serif; + } + """) + self.ai_chat_history.setSpacing(6) + self.ai_chat_history.setMinimumHeight(300) + self.ai_chat_history.setResizeMode(QListWidget.Adjust) + self.ai_chat_history.setVerticalScrollMode(QAbstractItemView.ScrollPerPixel) + layout.addWidget(self.ai_chat_history, stretch=1) + self.chat_messages = [] # Store (role, message, timestamp) tuples + self.render_chat_history() + + # --- Quick action buttons --- + quick_actions_widget = 
QWidget() + quick_actions_layout = QHBoxLayout(quick_actions_widget) + quick_actions_layout.setSpacing(8) + quick_actions_layout.setContentsMargins(0, 0, 0, 0) + quick_questions = [ + "Analyze the website structure", + "Find key content themes", + "Suggest SEO improvements", + "Compare page performance" + ] + for question in quick_questions: + quick_btn = QPushButton(question) + quick_btn.setFont(QFont("Segoe UI", 10)) + quick_btn.setCursor(Qt.CursorShape.PointingHandCursor) + quick_btn.clicked.connect(lambda _, q=question: self.quick_question(q)) + quick_btn.setStyleSheet(""" + QPushButton { + background: #e0e7ef; + border: none; + color: #374151; + padding: 8px 22px; + border-radius: 22px; + font-weight: 500; + font-size: 13px; + box-shadow: 0 2px 8px rgba(59, 130, 246, 0.04); + } + QPushButton:hover { + background: #3b82f6; + color: white; + } + QPushButton:pressed { + background: #2563eb; + color: white; + } + """) + quick_actions_layout.addWidget(quick_btn) + layout.addWidget(quick_actions_widget) + + # --- Input area --- + input_container = QWidget() + input_layout = QHBoxLayout(input_container) + input_layout.setContentsMargins(0, 0, 0, 0) + input_layout.setSpacing(8) + self.ai_input = QLineEdit() + self.ai_input.setPlaceholderText("Type your question and press Enter...") + self.ai_input.setMinimumHeight(44) + self.ai_input.setFont(QFont("Segoe UI", 12)) + self.ai_input.returnPressed.connect(self.send_ai_message) + self.ai_input.setStyleSheet(""" + QLineEdit { + border: 1.5px solid #e2e8f0; + border-radius: 22px; + padding: 10px 20px; + background: white; + color: #1e293b; + font-size: 14px; + } + QLineEdit:focus { + border-color: #3b82f6; + outline: none; + } + QLineEdit::placeholder { + color: #9ca3af; + } + """) + self.ai_send_button = QPushButton("Send") + self.ai_send_button.setMinimumHeight(44) + self.ai_send_button.setMinimumWidth(80) + self.ai_send_button.setFont(QFont("Segoe UI", 12, QFont.Weight.Medium)) + 
self.ai_send_button.setCursor(Qt.CursorShape.PointingHandCursor) + self.ai_send_button.clicked.connect(self.send_ai_message) + self.ai_send_button.setStyleSheet(""" + QPushButton { + background: #3b82f6; + border: none; + color: white; + padding: 10px 28px; + border-radius: 22px; + font-weight: 600; + font-size: 15px; + box-shadow: 0 2px 8px rgba(59, 130, 246, 0.08); + } + QPushButton:hover { + background: #2563eb; + } + QPushButton:pressed { + background: #1d4ed8; + } + QPushButton:disabled { + background: #9ca3af; + color: #f3f4f6; + } + """) + input_layout.addWidget(self.ai_input, stretch=1) + input_layout.addWidget(self.ai_send_button) + layout.addWidget(input_container) + + self.tab_widget.addTab(ai_widget, "AI Analysis") + ai_tab_index = self.tab_widget.count() - 1 + self.set_ai_tab_gradient(ai_tab_index) + + def render_chat_history(self): + self.ai_chat_history.clear() + for role, msg, timestamp in self.chat_messages: + item = QListWidgetItem() + bubble = ChatBubbleWidget(msg, timestamp, role) + bubble.adjustSize() + item.setSizeHint(bubble.sizeHint()) + self.ai_chat_history.addItem(item) + self.ai_chat_history.setItemWidget(item, bubble) + self.ai_chat_history.scrollToBottom() + + def send_ai_message(self): + user_msg = self.ai_input.text().strip() + if not user_msg: + return + timestamp = datetime.now().strftime("%H:%M") + self.chat_messages.append(("user", user_msg, timestamp)) + self.render_chat_history() + self.ai_input.clear() + # Show thinking indicator as AI message + self.chat_messages.append(("ai", "🤔 Analyzing your question...", timestamp)) + self.render_chat_history() + ai_context = self.get_ai_context(user_msg) + QTimer.singleShot(100, lambda: self._do_ai_response_openrouter(user_msg, ai_context)) + + def _do_ai_response_openrouter(self, user_msg, ai_context): + if OPENAI_AVAILABLE: + try: + client = OpenAI( + base_url="https://openrouter.ai/api/v1", + api_key=os.environ.get("OPENROUTER_API_KEY"), + ) + system_prompt = """You are an expert 
website analyst and AI assistant specializing in web scraping analysis. Your role is to:\n\n1. **Analyze website content** - Provide insights about the scraped websites\n2. **Identify patterns** - Find common themes, structures, and content types\n3. **Offer recommendations** - Suggest improvements for SEO, content, or structure\n4. **Answer questions** - Respond to specific queries about the websites\n5. **Provide actionable insights** - Give practical advice based on the data\n\n**Response Guidelines:**\n- Be professional yet conversational\n- Use clear, structured responses with bullet points when appropriate\n- Reference specific websites by title when relevant\n- Provide specific examples from the content\n- Suggest actionable next steps when possible\n- Use markdown formatting for better readability\n\n**Context:** You have access to scraped website data including titles, URLs, content previews, and metadata.""" + user_prompt = f"""# Website Analysis Request\n\n## User Question\n{user_msg}\n\n## Available Website Data\n{ai_context}\n\n## Instructions\nPlease provide a comprehensive analysis based on the user's question. Use the website data above to support your response. 
If the question is about specific aspects (SEO, content, structure, etc.), focus your analysis accordingly.\n\n**Format your response with:**\n- Clear headings and structure\n- Specific examples from the websites\n- Actionable insights and recommendations\n- Professional, helpful tone""" + completion = client.chat.completions.create( + extra_headers={ + "HTTP-Referer": "http://localhost:8000", + "X-Title": "Web Scraper & Data Analyzer - AI Analysis", + }, + extra_body={}, + model="deepseek/deepseek-r1-0528-qwen3-8b:free", + messages=[ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt} + ], + temperature=0.7, + max_tokens=2000 + ) + try: + answer = completion.choices[0].message.content + if answer is not None: + answer = answer.strip() + else: + answer = "❌ **AI Analysis Error**\n\nNo response content received from the AI model." + except (AttributeError, IndexError, KeyError): + answer = "❌ **AI Analysis Error**\n\nUnexpected response format from the AI model." + if hasattr(self, "ai_stats_label"): + self.ai_stats_label.setText(f"Analyzed {len(self.websites)} websites") + except Exception as e: + answer = f"❌ **AI Analysis Error**\n\nI encountered an error while analyzing your request: `{str(e)}`\n\nPlease try again or check your internet connection." + else: + if ai_context == "No data available. Please scrape some websites first.": + answer = "📊 **No Data Available**\n\nPlease scrape some websites first to enable AI analysis." + else: + answer = f"🤖 **AI Analysis Preview**\n\nI have analyzed {len(self.websites)} websites. Your question: '{user_msg}'\n\n*(This is a placeholder response. 
Install the 'openai' package for real AI analysis.)*" + # Remove the last AI thinking message + if self.chat_messages and self.chat_messages[-1][1].startswith("🤔"): + self.chat_messages.pop() + timestamp = datetime.now().strftime("%H:%M") + self.chat_messages.append(("ai", answer, timestamp)) + self.render_chat_history() + + def open_url_in_browser(self, item, column): + url = item.data(1, Qt.ItemDataRole.DisplayRole) + if url: + webbrowser.open(url) + + def get_icon(self, is_root=False): + """Return the standard icon for a sitemap node: desktop icon for the root, folder icon otherwise""" + if is_root: + return self.style().standardIcon(QStyle.StandardPixmap.SP_DesktopIcon) + else: + return self.style().standardIcon(QStyle.StandardPixmap.SP_DirIcon) + + def update_sitemap_tree(self): + """Build and display the sitemap tree from crawled data, with icons and tooltips""" + self.sitemap_tree.clear() + if not self.websites: + return + url_to_website = {w.url: w for w in self.websites} + children_map = {w.url: [] for w in self.websites} + for w in self.websites: + for link in w.links: + if link in url_to_website: + children_map[w.url].append(link) + root_url = self.websites[0].url + def add_items(parent_item, url, visited, depth): + if url in visited: + return + visited.add(url) + website = url_to_website[url] + item = QTreeWidgetItem([website.title, website.url]) + item.setIcon(0, self.get_icon(is_root=False)) + tooltip = f"Title: {website.title}
<br>" + tooltip += f"URL: {website.url}<br>
" + tooltip += f"Depth: {website.depth}<br>
" + tooltip += f"Outgoing Links: {len(website.links)}" + item.setToolTip(0, tooltip) + item.setToolTip(1, tooltip) + parent_item.addChild(item) + for child_url in children_map[url]: + add_items(item, child_url, visited, depth+1) + root_website = url_to_website[root_url] + root_item = QTreeWidgetItem([root_website.title, root_website.url]) + root_item.setIcon(0, self.get_icon(is_root=True)) + tooltip = f"Title: {root_website.title}
<br>" + tooltip += f"URL: {root_website.url}<br>
" + tooltip += f"Depth: {root_website.depth}<br>
" + tooltip += f"Outgoing Links: {len(root_website.links)}" + root_item.setToolTip(0, tooltip) + root_item.setToolTip(1, tooltip) + self.sitemap_tree.addTopLevelItem(root_item) + visited = set([root_url]) + for child_url in children_map[root_url]: + add_items(root_item, child_url, visited, 1) + self.sitemap_tree.expandToDepth(1) + + def export_sitemap_json(self): + """Export the sitemap tree as a JSON file (preserving hierarchy)""" + if not self.websites: + QMessageBox.warning(self, "Error", "No sitemap data to export.") + return + def build_tree(item): + data = { + 'title': item.text(0), + 'url': item.text(1), + 'children': [build_tree(item.child(i)) for i in range(item.childCount())] + } + return data + root = self.sitemap_tree.topLevelItem(0) + if not root: + QMessageBox.warning(self, "Error", "No sitemap data to export.") + return + sitemap_data = build_tree(root) + try: + with open('sitemap_tree.json', 'w', encoding='utf-8') as f: + json.dump(sitemap_data, f, indent=2, ensure_ascii=False) + QMessageBox.information(self, "Success", "Sitemap exported to 'sitemap_tree.json'") + except Exception as e: + QMessageBox.critical(self, "Error", f"Failed to export sitemap: {e}") + + def is_valid_url(self, url): + """Check if the URL is valid (basic check for scheme and domain)""" + try: + parsed = urlparse(url) + return all([parsed.scheme in ("http", "https"), parsed.netloc]) + except Exception: + return False + + def start_scraping(self): + """Start the web scraping process""" + url = self.url_input.text().strip() + if not url: + QMessageBox.warning(self, "Error", "Please enter a valid URL") + return + + if not url.startswith(('http://', 'https://')): + url = 'https://' + url + + # Validate URL format + if not self.is_valid_url(url): + QMessageBox.warning(self, "Invalid URL", "Please enter a valid website URL (e.g. 
https://example.com)") + return + + max_depth = self.depth_input.value() + + # Update UI + self.start_button.setEnabled(False) + self.stop_button.setEnabled(True) + self.progress_bar.setVisible(True) + self.progress_bar.setRange(0, 0) # Indeterminate progress + self.status_label.setText("Scraping in progress...") + self.status_label.setStyleSheet(""" + QLabel { + color: #1e40af; + font-size: 14px; + padding: 8px; + background: #eff6ff; + border-radius: 6px; + border-left: 3px solid #3b82f6; + } + """) + + # Start scraping thread + self.scraping_thread = ScrapingThread(url, max_depth) + self.scraping_thread.progress_updated.connect(self.update_progress) + self.scraping_thread.scraping_complete.connect(self.scraping_finished) + self.scraping_thread.error_occurred.connect(self.scraping_error) + self.scraping_thread.start() + + def stop_scraping(self): + """Stop the scraping process""" + if hasattr(self, 'scraping_thread') and self.scraping_thread.isRunning(): + # Use graceful stop instead of forceful termination + self.scraping_thread.stop() + + # Wait for the thread to finish gracefully (with timeout) + if not self.scraping_thread.wait(5000): # Wait up to 5 seconds + # If it doesn't stop gracefully, then force terminate + self.scraping_thread.terminate() + self.scraping_thread.wait(2000) # Wait up to 2 more seconds + + self.start_button.setEnabled(True) + self.stop_button.setEnabled(False) + self.progress_bar.setVisible(False) + self.status_label.setText("Scraping stopped.") + self.status_label.setStyleSheet(""" + QLabel { + color: #92400e; + font-size: 14px; + padding: 8px; + background: #fffbeb; + border-radius: 6px; + border-left: 3px solid #f59e0b; + } + """) + + def update_progress(self, message): + """Update progress message""" + self.status_label.setText(message) + self.results_text.append(message) + + def show_help(self): + """Show a help/info dialog with usage instructions (no theme switch info)""" + help_text = ( + "

Web Scraper & Data Analyzer - Help

" + "" + "

For more info, see the README or contact support.

" + ) + QMessageBox.information(self, "Help / Info", help_text) + + def scraping_finished(self, websites): + """Handle scraping completion""" + self.websites = websites + self.scraper.websites = websites + + # Update UI + self.start_button.setEnabled(True) + self.stop_button.setEnabled(False) + self.progress_bar.setVisible(False) + self.status_label.setText(f"Scraping complete! Found {len(websites)} websites.") + self.status_label.setStyleSheet(""" + QLabel { + color: #166534; + font-size: 14px; + padding: 8px; + background: #f0fdf4; + border-radius: 6px; + border-left: 3px solid #22c55e; + } + """) + + # Update data view + self.update_data_table() + self.update_analysis() + self.update_sitemap_tree() + + # Switch to data tab + self.tab_widget.setCurrentIndex(1) + + # Show desktop notification + self.tray_icon.showMessage( + "Web Scraper", + f"Scraping complete! Found {len(websites)} websites.", + QSystemTrayIcon.MessageIcon(1), # 1 = Information + 5000 + ) + + def scraping_error(self, error_message): + """Handle scraping errors""" + QMessageBox.critical(self, "Error", f"Scraping failed: {error_message}") + self.start_button.setEnabled(True) + self.stop_button.setEnabled(False) + self.progress_bar.setVisible(False) + self.status_label.setText("Scraping failed.") + self.status_label.setStyleSheet(""" + QLabel { + color: #991b1b; + font-size: 14px; + padding: 8px; + background: #fef2f2; + border-radius: 6px; + border-left: 3px solid #ef4444; + } + """) + + # Show desktop notification + self.tray_icon.showMessage( + "Web Scraper", + f"Scraping failed: {error_message}", + QSystemTrayIcon.MessageIcon(3), + 5000 + ) + + def update_data_table(self): + """Update the data table with scraped websites""" + self.data_table.setRowCount(len(self.websites)) + for row, website in enumerate(self.websites): + self.data_table.setRowHeight(row, 40) + title_item = QTableWidgetItem(website.title) + title_item.setTextAlignment(Qt.AlignmentFlag.AlignTop | Qt.AlignmentFlag.AlignLeft) + 
url_item = QTableWidgetItem(website.url) + url_item.setTextAlignment(Qt.AlignmentFlag.AlignTop | Qt.AlignmentFlag.AlignLeft) + depth_item = QTableWidgetItem(str(website.depth)) + depth_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + links_item = QTableWidgetItem(str(len(website.links))) + links_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + words_item = QTableWidgetItem(str(website.get_word_count())) + words_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + load_time = f"{website.load_time:.2f}s" if website.load_time else "N/A" + load_time_item = QTableWidgetItem(load_time) + load_time_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + self.data_table.setItem(row, 0, title_item) + self.data_table.setItem(row, 1, url_item) + self.data_table.setItem(row, 2, depth_item) + self.data_table.setItem(row, 3, links_item) + self.data_table.setItem(row, 4, words_item) + self.data_table.setItem(row, 5, load_time_item) + # Update domain filter + domains = list(set(w.get_normalized_domain() for w in self.websites)) + self.domain_filter.clear() + self.domain_filter.addItem("All Domains") + self.domain_filter.addItems(domains) + # Update content preview with first website + if self.websites: + first_website = self.websites[0] + content_preview = first_website.get_text_preview(800) + self.content_text.setText(content_preview) + + # Also update visual preview for first website + if WEB_ENGINE_AVAILABLE and hasattr(self, 'web_view'): + try: + html_content = first_website.content + if html_content and html_content.strip(): + full_html = f""" + + + + + + {first_website.title} + + + + {html_content} + + + """ + self.web_view.setHtml(full_html, QUrl(first_website.url)) + else: + self.web_view.setHtml(""" + + +

No HTML Content Available

+

This page doesn't have HTML content to display in the visual preview.

+ + + """) + except Exception as e: + self.web_view.setHtml(f""" + + +

Error Loading Preview

+

Failed to load the visual preview:

+

{str(e)}

+

This might be due to:

+ + + + """) + + def filter_data(self): + """Filter the data table based on search and domain filters""" + search_term = self.search_input.text().lower() + selected_domain = self.domain_filter.currentText() + + for row in range(self.data_table.rowCount()): + website = self.websites[row] + + # Check search term + matches_search = (search_term in website.title.lower() or + search_term in website.url.lower() or + website.search_content(search_term)) + + # Check domain filter + matches_domain = (selected_domain == "All Domains" or + website.get_normalized_domain() == selected_domain) + + # Show/hide row + self.data_table.setRowHidden(row, not (matches_search and matches_domain)) + + def update_analysis(self): + """Update the analysis tab with enhanced statistics""" + if not self.websites: + return + + stats = self.scraper.get_statistics() + + # Update statistics labels + self.stats_labels["Total Pages"].setText(str(stats['total_pages'])) + self.stats_labels["Total Links"].setText(str(stats['total_links'])) + self.stats_labels["Total Words"].setText(str(stats['total_words'])) + self.stats_labels["Average Load Time"].setText(f"{stats['avg_load_time']:.2f}s") + self.stats_labels["Max Depth Reached"].setText(str(stats['max_depth_reached'])) + + # Update domain breakdown with enhanced information + domain_text = "Domain Breakdown:\n\n" + + # Show visited URLs count + domain_text += f"📊 Total URLs Checked: {stats.get('visited_urls_count', 0)}\n" + domain_text += f"🎯 Starting Domain: {stats.get('start_domain', 'N/A')}\n\n" + + # Show domain page counts + if stats.get('domain_page_counts'): + domain_text += "📈 Pages per Domain:\n" + for domain, count in stats['domain_page_counts'].items(): + domain_text += f" • {domain}: {count} pages\n" + domain_text += "\n" + + # Show final domain breakdown + domain_text += "🏠 Final Domain Distribution:\n" + for domain, count in stats['domains'].items(): + domain_text += f" • {domain}: {count} pages\n" + + 
self.domain_text.setText(domain_text) + + def export_data(self): + """Export scraped data to JSON file""" + if not self.websites: + QMessageBox.warning(self, "Error", "No data to export") + return + + try: + data = [] + for website in self.websites: + website_data = { + 'title': website.title, + 'url': website.url, + 'depth': website.depth, + 'links': website.links, + 'word_count': website.get_word_count(), + 'load_time': website.load_time, + 'domain': website.get_domain(), + 'normalized_domain': website.get_normalized_domain(), + 'timestamp': website.timestamp.isoformat() + } + data.append(website_data) + + with open('scraped_data.json', 'w', encoding='utf-8') as f: + json.dump(data, f, indent=2, ensure_ascii=False) + + QMessageBox.information(self, "Success", "Data exported to 'scraped_data.json'") + + except Exception as e: + QMessageBox.critical(self, "Error", f"Failed to export data: {e}") + + def show_content_preview(self, row, column): + """Show content preview for the selected website""" + if row < len(self.websites): + website = self.websites[row] + + # Update text preview with more content + content_preview = website.get_text_preview(1000) # Increased from 500 + self.content_text.setText(content_preview) + + # Update visual HTML preview + if WEB_ENGINE_AVAILABLE and hasattr(self, 'web_view'): + try: + # Get the HTML content + html_content = website.content + if html_content and html_content.strip(): + # Create a complete HTML document with proper encoding + full_html = f""" + + + + + + {website.title} + + + + {html_content} + + + """ + + # Load the HTML content + self.web_view.setHtml(full_html, QUrl(website.url)) + else: + # Show a message if no HTML content + self.web_view.setHtml(""" + + +

No HTML Content Available

+

This page doesn't have HTML content to display in the visual preview.

+

Check the text preview tab for the extracted content.

+ + + """) + except Exception as e: + # Show error message in the web view + error_html = f""" + + +

Error Loading Preview

+

Failed to load the visual preview:

+

{str(e)}

+

This might be due to:

+ + + + """ + self.web_view.setHtml(error_html) + else: + # Fallback for when PyQtWebEngine is not available + if hasattr(self, 'web_view'): + self.web_view.setText("Visual preview not available\nInstall PyQtWebEngine for HTML rendering") + + def generate_sitemap(self): + """Generate sitemap.xml from crawled URLs""" + if not self.websites: + QMessageBox.warning(self, "Error", "No data to generate sitemap.") + return + try: + urls = [w.url for w in self.websites] + sitemap = [ + '', + '' + ] + for url in urls: + sitemap.append(" ") + sitemap.append(f" {url}") + sitemap.append(" ") + sitemap.append("") + with open("sitemap.xml", "w", encoding="utf-8") as f: + f.write("\n".join(sitemap)) + QMessageBox.information(self, "Sitemap Generated", "sitemap.xml has been created in the current directory.") + self.tray_icon.showMessage( + "Web Scraper", + "sitemap.xml has been generated.", + QSystemTrayIcon.MessageIcon(1), + 4000 + ) + except Exception as e: + QMessageBox.critical(self, "Error", f"Failed to generate sitemap: {e}") + self.tray_icon.showMessage( + "Web Scraper", + f"Failed to generate sitemap: {e}", + QSystemTrayIcon.MessageIcon(3), + 4000 + ) + + def update_sitemap_tree(self): + """Build and display the sitemap tree from crawled data, with icons and tooltips.""" + self.sitemap_tree.clear() + if not self.websites: + return + url_to_website = {w.url: w for w in self.websites} + children_map = {w.url: [] for w in self.websites} + for w in self.websites: + for link in w.links: + if link in url_to_website: + children_map[w.url].append(link) + root_url = self.websites[0].url + def add_items(parent_item, url, visited, depth): + if url in visited: + return + visited.add(url) + website = url_to_website[url] + item = QTreeWidgetItem([website.title, website.url]) + item.setIcon(0, self.get_icon(is_root=False)) + tooltip = f"Title: {website.title}
" + tooltip += f"URL: {website.url}
" + tooltip += f"Depth: {website.depth}
" + tooltip += f"Outgoing Links: {len(website.links)}" + item.setToolTip(0, tooltip) + item.setToolTip(1, tooltip) + parent_item.addChild(item) + for child_url in children_map[url]: + add_items(item, child_url, visited, depth+1) + root_website = url_to_website[root_url] + root_item = QTreeWidgetItem([root_website.title, root_website.url]) + root_item.setIcon(0, self.get_icon(is_root=True)) + tooltip = f"Title: {root_website.title}
" + tooltip += f"URL: {root_website.url}
" + tooltip += f"Depth: {root_website.depth}
" + tooltip += f"Outgoing Links: {len(root_website.links)}" + root_item.setToolTip(0, tooltip) + root_item.setToolTip(1, tooltip) + self.sitemap_tree.addTopLevelItem(root_item) + visited = set([root_url]) + for child_url in children_map[root_url]: + add_items(root_item, child_url, visited, 1) + self.sitemap_tree.expandToDepth(1) + + def set_ai_tab_gradient(self, tab_index): + """Apply premium gradient styling to the AI tab header""" + gradient_css = """ + QTabBar::tab:nth-child({}) {{ + background: qlineargradient(x1:0, y1:0, x2:1, y2:0, + stop:0 #667eea, stop:0.5 #764ba2, stop:1 #f093fb); + color: white; + font-weight: 700; + border: 2px solid #667eea; + border-bottom: none; + padding: 14px 24px; + font-size: 15px; + }} + QTabBar::tab:nth-child({}):selected {{ + background: qlineargradient(x1:0, y1:0, x2:1, y2:0, + stop:0 #f093fb, stop:0.5 #764ba2, stop:1 #667eea); + color: white; + font-weight: 800; + border-bottom: none; + box-shadow: 0 4px 12px rgba(102, 126, 234, 0.3); + }} + QTabBar::tab:nth-child({}):hover:!selected {{ + background: qlineargradient(x1:0, y1:0, x2:1, y2:0, + stop:0 #5a67d8, stop:0.5 #6b46c1, stop:1 #e879f9); + }} + """.format(tab_index+1, tab_index+1, tab_index+1) + self.tab_widget.tabBar().setStyleSheet(self.tab_widget.tabBar().styleSheet() + gradient_css) + + def quick_question(self, question): + """Handle quick question button clicks by sending the question as if typed by the user.""" + self.ai_input.setText(question) + self.send_ai_message() + + def get_ai_context(self, user_msg=None): + """Return a string summary of the scraped websites for AI analysis. If no data, return a message indicating no data is available.""" + if not self.websites: + return "No data available. Please scrape some websites first." + # Summarize up to 5 websites for context + context_lines = [] + for i, w in enumerate(self.websites[:5]): + context_lines.append(f"{i+1}. 
Title: {w.title}\n URL: {w.url}\n Preview: {w.get_text_preview(120)}") + context = "\n".join(context_lines) + return context + +def main(): + app = QApplication(sys.argv) + app.setStyle('Fusion') # Use Fusion style for modern look + + # Set application icon and properties + app.setApplicationName("Web Scraper & Data Analyzer") + app.setApplicationVersion("2.0") + + window = WebScraperApp() + window.show() + + sys.exit(app.exec_()) + +if __name__ == '__main__': + main() \ No newline at end of file diff --git a/community-contributions/clinic_booking_bot.ipynb b/community-contributions/clinic_booking_bot.ipynb new file mode 100644 index 0000000..d2d8b57 --- /dev/null +++ b/community-contributions/clinic_booking_bot.ipynb @@ -0,0 +1,344 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 170, + "id": "a1aa1b43-7a47-4aca-ae5f-94a9d4ba2d89", + "metadata": {}, + "outputs": [], + "source": [ + "## Clinic Booking Bot\n", + "\n", + "##Easily book your clinic visit – available only on weekdays between **14:00 and 15:00**. 
\n", + "##Speak or type, and get instant confirmation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "id": "fe798c6a-f8da-46aa-8c0e-9d2623def3d2", + "metadata": {}, + "outputs": [], + "source": [ + "# import library\n", + "\n", + "import os\n", + "import json\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "import base64\n", + "from io import BytesIO\n", + "from datetime import date, datetime\n", + "from PIL import Image, ImageDraw, ImageFont\n" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "id": "0ad4e526-e95d-4e70-9faa-b4236b105dd5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n" + ] + } + ], + "source": [ + "# Save keys\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "id": "ae95308e-0002-4017-9f2c-fcb1ddb248fa", + "metadata": {}, + "outputs": [], + "source": [ + "# --- CONFIG ---\n", + "BOOKING_START = 14\n", + "BOOKING_END = 15\n", + "WEEKDAYS = [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\", \"Friday\"]\n", + "PHONE = \"010-1234567\"\n", + "confirmed_bookings = []\n" + ] + }, + { + "cell_type": "code", + "execution_count": 174, + "id": "e21b0fd0-4cda-4938-8867-dc2c6e7af4b1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- TTS ---\n", + "def generate_tts(text, voice=\"fable\", filename=\"output.mp3\"):\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=voice,\n", + " input=text\n", + " )\n", + " with open(filename, \"wb\") as f:\n", + " f.write(response.content)\n", + " return filename" + ] + },
+ { + "cell_type": "code", + "execution_count": 175, + "id": "e28a5c3b-bd01-4845-a41e-87823f6bb078", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Translate Booking Confirmation ---\n", + "def translate_text(text, target_language=\"nl\"):\n", + " prompt = f\"Translate this message to {target_language}:\\n{text}\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful translator.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " return response.choices[0].message.content.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "id": "8ed57cc9-7d54-4a5d-831b-0efcc5b7a7a9", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Logic ---\n", + "def book_appointment(name, time_str):\n", + " try:\n", + " booking_time = datetime.strptime(time_str, \"%H:%M\")\n", + " except ValueError:\n", + " return \"Invalid time format. 
Use HH:MM.\", None, None\n", + "\n", + " hour = booking_time.hour\n", + " weekday = datetime.today().strftime(\"%A\")\n", + "\n", + " if weekday not in WEEKDAYS:\n", + " response = \"Bookings are only available on weekdays.\"\n", + " elif BOOKING_START <= hour < BOOKING_END:\n", + " confirmation = f\"Booking confirmed for {name} at {time_str}.\"\n", + " confirmed_bookings.append((name, time_str))\n", + " translated = translate_text(confirmation)\n", + " audio = generate_tts(translated)\n", + " image = generate_booking_image(name, time_str)\n", + " return translated, audio, image\n", + " else:\n", + " response = \"Sorry, bookings are only accepted between 14:00 and 15:00 on weekdays.\"\n", + " translated = translate_text(response)\n", + " audio = generate_tts(translated)\n", + " return translated, audio, None" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "id": "19b52115-f0f3-4d63-a463-886163d4cfd1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Card ---\n", + "def generate_booking_image(name, time_str):\n", + " img = Image.new(\"RGB\", (500, 250), color=\"white\")\n", + " draw = ImageDraw.Draw(img)\n", + " msg = f\"\\u2705 Booking Confirmed\\nName: {name}\\nTime: {time_str}\"\n", + " draw.text((50, 100), msg, fill=\"black\")\n", + " return img" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "id": "2c446b6c-d410-4ba1-b0c7-c475e5259ff5", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Voice Booking ---\n", + "def voice_booking(audio_path, name):\n", + " with open(audio_path, \"rb\") as f:\n", + " response = openai.audio.transcriptions.create(model=\"whisper-1\", file=f)\n", + " transcription = response.text.strip()\n", + "\n", + " system_prompt = \"\"\"\n", + " You are a clinic assistant. 
Extract only the appointment time from the user's sentence in 24-hour HH:MM format.\n", + " If no time is mentioned, respond with 'No valid time found.'\n", + " \"\"\"\n", + "\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": transcription}\n", + " ]\n", + " )\n", + " extracted_time = response.choices[0].message.content.strip()\n", + "\n", + " if \":\" in extracted_time:\n", + " return book_appointment(name, extracted_time)\n", + " else:\n", + " message = \"Sorry, I couldn't understand the time. Please try again.\"\n", + " translated = translate_text(message)\n", + " audio_path = generate_tts(translated)\n", + " return translated, audio_path, None" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "id": "121d2907-7fa8-4248-b2e7-83617ea66ff0", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Chat Bot Handler ---\n", + "def chat_bot(messages):\n", + " system_prompt = \"\"\"\n", + " You are a clinic booking assistant. 
Your job is to:\n", + " - Greet the patient and explain your role\n", + " - Only assist with making appointments\n", + " - Accept bookings only on weekdays between 14:00 and 15:00\n", + " - Do not provide medical advice\n", + " - Always respond with empathy and clarity\n", + " \"\"\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[{\"role\": \"system\", \"content\": system_prompt}] + messages\n", + " )\n", + " reply = response.choices[0].message.content.strip()\n", + " audio = generate_tts(reply)\n", + " return reply, audio" + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "id": "2427b694-8c57-40cb-b202-4a8989547925", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7898\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Gradio interface\n", + "with gr.Blocks(theme=gr.themes.Soft()) as demo:\n", + " gr.Markdown(\"\"\"## 🩺 GP Booking Assistant \n", + "Only available weekdays between **14:00 and 15:00** \n", + "☎️ Contact: {PHONE}\n", + "---\"\"\")\n", + "\n", + " name_global = gr.Textbox(label=\"Your Name\", placeholder=\"Enter your name\", interactive=True)\n", + "\n", + " with gr.Tab(\"💬 Chat Mode\"):\n", + " chatbot = gr.Chatbot(label=\"Booking Chat\", type=\"messages\", height=400)\n", + " text_input = gr.Textbox(label=\"Type your message or use your voice below\")\n", + " audio_input = gr.Audio(type=\"filepath\", label=\"🎙️ Or speak your request\")\n", + " chat_audio_output = gr.Audio(label=\"🔊 Assistant's Reply\", type=\"filepath\")\n", + " send_btn = gr.Button(\"Send\")\n", + "\n", + " def handle_chat(user_message, chat_history):\n", + " chat_history = chat_history or []\n", + " chat_history.append({\"role\": \"user\", \"content\": user_message})\n", + " reply, audio = chat_bot(chat_history)\n", + " chat_history.append({\"role\": \"assistant\", \"content\": reply})\n", + " return chat_history, \"\", audio\n", + "\n", + " def handle_audio_chat(audio_path, chat_history):\n", + " with open(audio_path, \"rb\") as f:\n", + " transcription = openai.audio.transcriptions.create(model=\"whisper-1\", file=f).text.strip()\n", + " return handle_chat(transcription, chat_history)\n", + "\n", + " send_btn.click(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + " text_input.submit(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + " audio_input.change(handle_audio_chat, [audio_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + "\n", + "\n", + " \n", + " with gr.Tab(\"📝 Text Booking\"):\n", + " time_text = gr.Textbox(label=\"Preferred Time (HH:MM)\", placeholder=\"e.g., 14:30\")\n", + " btn_text = 
gr.Button(\"📅 Book via Text\")\n", + "\n", + " with gr.Tab(\"🎙️ Voice Booking\"):\n", + " voice_input = gr.Audio(type=\"filepath\", label=\"Say your preferred time\")\n", + " btn_voice = gr.Button(\"📅 Book via Voice\")\n", + "\n", + " output_text = gr.Textbox(label=\"Response\", interactive=False)\n", + " output_audio = gr.Audio(label=\"Audio Reply\", type=\"filepath\")\n", + " output_image = gr.Image(label=\"Booking Confirmation\")\n", + "\n", + " btn_text.click(fn=book_appointment, inputs=[name_global, time_text], outputs=[output_text, output_audio, output_image])\n", + " btn_voice.click(fn=voice_booking, inputs=[voice_input, name_global], outputs=[output_text, output_audio, output_image])\n", + "\n", + " gr.Markdown(\"\"\"---\n", + "This assistant does **not** give medical advice. It only books appointments within allowed hours.\n", + "\"\"\")\n", + "\n", + " demo.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f359de0a-28b1-4895-b21d-91d79e494a0d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/multi-agent_gui_with_gradio/README.md b/community-contributions/multi-agent_gui_with_gradio/README.md new file mode 100644 index 0000000..3c80ace --- /dev/null +++ b/community-contributions/multi-agent_gui_with_gradio/README.md @@ -0,0 +1,25 @@ +# 🧠 Agentic Voice/Text Support Chatbot + +A multimodal chatbot interface with support for **text and voice input**, **multiple large language models (LLMs)**, and **context memory persistence** — all in a single Gradio-based 
GUI. + +## 🚀 Features + +- 🔄 **Multi-LLM switching**: Dynamically switch between OpenAI, Anthropic Claude, and Meta LLaMA (via Ollama) +- 🎤 **Voice input**: Use your microphone with live speech-to-text transcription +- 💬 **Contextual memory**: Maintain chat history even when switching models +- 🧪 **Prototype-ready**: Built with Gradio for rapid GUI testing and development + +## 🛠️ Technologies Used + +- [Gradio](https://www.gradio.app/) – GUI interface +- [OpenAI API](https://platform.openai.com/) +- [Anthropic Claude API](https://www.anthropic.com/) +- [Ollama](https://ollama.com/) – Local LLaMA inference +- [`speech_recognition`](https://pypi.org/project/SpeechRecognition/) – Voice-to-text +- `sounddevice`, `numpy` – Audio recording +- `.env` – Environment variable management + +## You’ll also need: +- API keys for OpenAI and Claude +- Ollama installed locally to run LLaMA models +- A .env file with the necessary API keys diff --git a/community-contributions/multi-agent_gui_with_gradio/agentic_voice_text_support.ipynb b/community-contributions/multi-agent_gui_with_gradio/agentic_voice_text_support.ipynb new file mode 100644 index 0000000..d4f6caf --- /dev/null +++ b/community-contributions/multi-agent_gui_with_gradio/agentic_voice_text_support.ipynb @@ -0,0 +1,395 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "### Building a Chatbot Interface, with Text or Voice Input, Multi-LLM support, and Memory Persistence" + ] + }, + { + "cell_type": "markdown", + "id": "eeb20b3e", + "metadata": {}, + "source": [ + "In this tutorial, we’ll use Gradio to build a simple chatbot prototype with a user-friendly interface. The chatbot will support multiple language models, allowing the user to switch models at any point during the conversation. 
It will also offer optional memory persistence, where the chat history is stored and forwarded to the selected model — which allows shared memory across models, even when switching mid-chat.\n", + "\n", + "In this project, we'll use OpenAI's API, Anthropic's Claude, and Meta's LLaMA, which runs locally via an Ollama server. Additionally, we'll use Python’s speech_recognition module to convert speech to text.\n", + "\n", + "It's worth noting that some APIs — such as OpenAI's — now support direct audio input, so integrating speech capabilities can also be done end-to-end without a separate transcription module." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "a0a343b1", + "metadata": {}, + "outputs": [], + "source": [ + "# Speech recording and recognition libraries\n", + "import speech_recognition as sr\n", + "import sounddevice as sd\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "d7693eda", + "metadata": {}, + "outputs": [], + "source": [ + "# GUI prototyping\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "41ffc0e6", + "metadata": {}, + "outputs": [], + "source": [ + "buffer = [] # For temporarily holding sound recording\n", + "\n", + "# Helper function for handling voice recording\n", + "def callback(indata, frames, time, status):\n", + " buffer.append(indata.copy())\n", + "\n", + "stream = sd.InputStream(callback=callback, samplerate=16000, channels=1, dtype='int16')" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "e9a79075", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Function for handling recording data and status\n", 
+ "def toggle_recording(state):\n", + " global stream, buffer\n", + " print('state', state)\n", + "\n", + " if not state:\n", + " buffer.clear()\n", + " stream.start()\n", + " return gr.update(value=\"Stop Recording\"), 'Recording...', not state\n", + " else:\n", + " stream.stop()\n", + " audio = np.concatenate(buffer, axis=0)\n", + " text = transcribe(audio)\n", + " return gr.update(value=\"Start Recording\"), text, not state\n", + "\n", + "# Function that converts speech to text via Google's voice recognition module\n", + "def transcribe(recording, sample_rate=16000):\n", + " r = sr.Recognizer()\n", + "\n", + " # Convert NumPy array to AudioData\n", + " audio_data = sr.AudioData(\n", + " recording.tobytes(), # Raw byte data\n", + " sample_rate, # Sample rate\n", + " 2 # Sample width in bytes (16-bit = 2 bytes)\n", + " )\n", + "\n", + " text = r.recognize_google(audio_data)\n", + " print(\"You said:\", text)\n", + " return text" + ] + }, + { + "cell_type": "markdown", + "id": "dcfb0190", + "metadata": {}, + "source": [ + "### LLM & API set-up" + ] + }, + { + "cell_type": "markdown", + "id": "59416453", + "metadata": {}, + "source": [ + "##### Load API keys from .env" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "b638b822", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n", + "Anthropic API Key exists and begins sk-ant-\n", + "Google API Key not set\n" + ] + } + ], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not 
set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "markdown", + "id": "9e6ae162", + "metadata": {}, + "source": [ + "### Class for handling API calls and routing requests to the selected models" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "268ea65d", + "metadata": {}, + "outputs": [], + "source": [ + "class LLMHandler:\n", + " def __init__(self, system_message: str = '', ollama_api:str='http://localhost:11434/api/chat'):\n", + " # Default system message if none provided\n", + " self.system_message = system_message if system_message else \"You are a helpful assistant. Always reply in Markdown\"\n", + " self.message_history = []\n", + "\n", + " # Initialize LLM clients\n", + " self.openai = OpenAI()\n", + " self.claude = anthropic.Anthropic()\n", + " self.OLLAMA_API = ollama_api\n", + " self.OLLAMA_HEADERS = {\"Content-Type\": \"application/json\"}\n", + "\n", + " def llm_call(self, model: str = 'gpt-4o-mini', prompt: str = '', memory_persistence=True):\n", + " if not model:\n", + " return 'No model specified'\n", + "\n", + " # Use full message template with system prompt if no prior history\n", + " message = self.get_message_template(prompt, initial=True) if (\n", + " not self.message_history and not 'claude' in model\n", + " ) else self.get_message_template(prompt)\n", + "\n", + " # Handle memory persistence\n", + " if memory_persistence:\n", + " self.message_history.extend(message)\n", + " else:\n", + " self.message_history = message\n", + "\n", + " # Model-specific dispatch\n", + " try:\n", + " if 'gpt' in model:\n", + " response = self.call_openai(model=model)\n", + " elif 'claude' in model:\n", + " response = 
self.call_claude(model=model)\n", + " elif 'llama' in model:\n", + " response = self.call_ollama(model=model)\n", + " else:\n", + " response = f'{model.title()} is not supported or not a valid model name.'\n", + " except Exception as e:\n", + " response = f'Failed to retrieve response. Reason: {e}'\n", + "\n", + " # Save assistant's reply to history if memory is enabled\n", + " if memory_persistence:\n", + " self.message_history.append({\n", + " \"role\": \"assistant\",\n", + " \"content\": response\n", + " })\n", + "\n", + " return response\n", + "\n", + " def get_message_template(self, prompt: str = '', initial=False):\n", + " # Returns a message template with or without system prompt\n", + " initial_template = [\n", + " {\"role\": \"system\", \"content\": self.system_message},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " general_template = [\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " return initial_template if initial else general_template\n", + "\n", + " def call_openai(self, model: str = 'gpt-4o-mini'):\n", + " # Sends chat completion request to OpenAI API\n", + " completion = self.openai.chat.completions.create(\n", + " model=model,\n", + " messages=self.message_history,\n", + " )\n", + " response = completion.choices[0].message.content\n", + " return response\n", + "\n", + " def call_ollama(self, model: str = \"llama3.2\"):\n", + "\n", + " payload = {\n", + " \"model\": model,\n", + " \"messages\": self.message_history,\n", + " \"stream\": False\n", + " }\n", + "\n", + " response = requests.post(url=self.OLLAMA_API, headers=self.OLLAMA_HEADERS, json=payload)\n", + " return response.json()[\"message\"][\"content\"]\n", + "\n", + " def call_claude(self, model: str = \"claude-3-haiku-20240307\"):\n", + " # Sends chat request to Anthropic Claude API\n", + " message = self.claude.messages.create(\n", + " model=model,\n", + " system=self.system_message,\n", + " messages=self.message_history,\n", + " 
max_tokens=500\n", + " )\n", + " response = message.content[0].text\n", + " return response\n" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "632e618b", + "metadata": {}, + "outputs": [], + "source": [ + "llm_handler = LLMHandler()\n", + "\n", + "# Function to handle user prompts received by the interface\n", + "def llm_call(model, prompt, memory_persistence):\n", + " response = llm_handler.llm_call(model=model, prompt=prompt, memory_persistence=memory_persistence)\n", + " return response, ''\n" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "e19228f6", + "metadata": {}, + "outputs": [], + "source": [ + "# Specify available model names for the dropdown component\n", + "AVAILABLE_MODELS = [\"gpt-4\", \"gpt-3.5-turbo\", \"claude-3-haiku-20240307\", \"llama3.2\", \"gpt-4o-mini\"]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "f65f43ff", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7868\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "with gr.Blocks() as demo:\n", + " state = gr.State(False) # Recording state (on/off)\n", + " with gr.Row():\n", + " \n", + " with gr.Column():\n", + " out = gr.Markdown(label='Message history')\n", + " with gr.Row():\n", + " memory = gr.Checkbox(label='Toggle memory', value=True) # Handle memory status (on/off) btn\n", + " model_choice = gr.Dropdown(label='Model', choices=AVAILABLE_MODELS, interactive=True) # Model selection dropdown\n", + " query_box = gr.Textbox(label='ChatBox', placeholder=\"Your message\")\n", + " record_btn = gr.Button(value='Record voice message') # Start/stop recording btn\n", + " send_btn = gr.Button(\"Send\") # Send prompt btn\n", + " \n", + " \n", + " \n", + " record_btn.click(fn=toggle_recording, inputs=state, outputs=[record_btn, query_box, state])\n", + " send_btn.click(fn=llm_call, inputs=[model_choice, query_box, memory], outputs=[out, query_box])\n", + " \n", + "\n", + "demo.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3743db5d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "general_env", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/pradeep1955/week1 EXERCISE.ipynb b/community-contributions/pradeep1955/week1 EXERCISE.ipynb new file mode 100644 index 0000000..5c418f2 --- /dev/null +++ b/community-contributions/pradeep1955/week1 EXERCISE.ipynb @@ -0,0 
+1,148 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", + "metadata": {}, + "source": [ + "# End of week 1 exercise\n", + "\n", + "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", + "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1070317-3ed9-4659-abe3-828943230e03", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "import os\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "from dotenv import load_dotenv" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", + "metadata": {}, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "MODEL_LLAMA = 'llama3.2'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1", + "metadata": {}, + "outputs": [], + "source": [ + "# set up environment\n", + "load_dotenv(override=True)\n", + "api_key=os.getenv(\"OPENAI_API_KEY\")\n", + "if not api_key or (not api_key.startswith(\"sk-proj-\") and len(api_key)<10):\n", + " print(\"API key not found or looks malformed\")\n", + "else:\n", + " print(\"API key found and looks OK\")\n", + "\n", + "openai=OpenAI()\n", + "print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f0d0137-52b0-47a8-81a8-11a90a010798", + "metadata": {}, + "outputs": [], + "source": [ + "# here is the question; type over this to ask something new\n", + "\n", + "question = \"\"\"\n", + "Please explain what this code does and why:\n", + "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"60ce7000-a4a5-4cce-a261-e75ef45063b4", + "metadata": {}, + "outputs": 
[], + "source": [ + "# Get gpt-4o-mini to answer, with streaming\n", + "messages = [{\"role\":\"system\",\"content\":\"You are an expert Data Scientist\"}, {\"role\":\"user\",\"content\":question}]\n", + "\n", + "stream = openai.chat.completions.create(\n", + " model = MODEL_GPT,\n", + " messages = messages,\n", + " stream = True\n", + ")\n", + "response = \"\"\n", + "display_handle = display(Markdown(\"\"), display_id=True)\n", + "for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538", + "metadata": {}, + "outputs": [], + "source": [ + "# Get Llama 3.2 to answer\n", + "import ollama\n", + "\n", + "stream = ollama.chat(model=MODEL_LLAMA, messages=messages, stream=True)\n", + "response = \"\"\n", + "display_handle = display(Markdown(\"\"), display_id=True)\n", + "for chunk in stream:\n", + " response += chunk[\"message\"][\"content\"] or ''\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a573174-779b-4d50-8792-fa0889b37211", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llmenv", + "language": "python", + "name": "llmenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/pradeep1955/week1/day2 EXERCISE.ipynb 
b/community-contributions/pradeep1955/week1/day2 EXERCISE.ipynb new file mode 100644 index 0000000..d7a3078 --- /dev/null +++ b/community-contributions/pradeep1955/week1/day2 EXERCISE.ipynb @@ -0,0 +1,426 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9", + "metadata": {}, + "source": [ + "# Welcome to your first assignment!\n", + "\n", + "Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)" + ] + }, + { + "cell_type": "markdown", + "id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9", + "metadata": {}, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \n", + "

Just before we get to the assignment --

\n", + " I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.
\n", + " https://edwarddonner.com/2024/11/13/llm-engineering-resources/
\n", + " Please keep this bookmarked, and I'll continue to add more useful links there over time.\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458", + "metadata": {}, + "source": [ + "# HOMEWORK EXERCISE ASSIGNMENT\n", + "\n", + "Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n", + "\n", + "You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n", + "\n", + "**Benefits:**\n", + "1. No API charges - open-source\n", + "2. Data doesn't leave your box\n", + "\n", + "**Disadvantages:**\n", + "1. Significantly less power than Frontier Model\n", + "\n", + "## Recap on installation of Ollama\n", + "\n", + "Simply visit [ollama.com](https://ollama.com) and install!\n", + "\n", + "Once complete, the ollama server should already be running locally. \n", + "If you visit: \n", + "[http://localhost:11434/](http://localhost:11434/)\n", + "\n", + "You should see the message `Ollama is running`. \n", + "\n", + "If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n", + "And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n", + "Then try [http://localhost:11434/](http://localhost:11434/) again.\n", + "\n", + "If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. 
Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code below from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29ddd15d-a3c5-4f4e-a678-873f56162724", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "OLLAMA_API = \"http://localhost:11434/api/chat\"\n", + "HEADERS = {\"Content-Type\": \"application/json\"}\n", + "MODEL = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dac0a679-599c-441f-9bf2-ddc73d35b940", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a messages list using the same format that we used for OpenAI\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bb9c624-14f0-4945-a719-8ddb64f66f47", + "metadata": {}, + "outputs": [], + "source": [ + "payload = {\n", + " \"model\": MODEL,\n", + " \"messages\": messages,\n", + " \"stream\": False\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "479ff514-e8bd-4985-a572-2ea28bb4fa40", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's just make sure the model is loaded\n", + "\n", + "!ollama pull llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42b9f644-522d-4e05-a691-56e7658c0ea9", + "metadata": {}, + "outputs": [], + "source": [ + "# If this doesn't work for any reason, try the 2 versions in the following cells\n", + "# And double check the instructions in the 'Recap on installation of Ollama' at the top 
of this lab\n", + "# And if none of that works - contact me!\n", + "\n", + "response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n", + "print(response.json()['message']['content'])" + ] + }, + { + "cell_type": "markdown", + "id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe", + "metadata": {}, + "source": [ + "# Introducing the ollama package\n", + "\n", + "And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n", + "\n", + "Under the hood, it's making the same call as above to the ollama server running at localhost:11434" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7745b9c4-57dc-4867-9180-61fa5db55eb8", + "metadata": {}, + "outputs": [], + "source": [ + "import ollama\n", + "\n", + "response = ollama.chat(model=MODEL, messages=messages)\n", + "print(response['message']['content'])" + ] + }, + { + "cell_type": "markdown", + "id": "a4704e10-f5fb-4c15-a935-f046c06fb13d", + "metadata": {}, + "source": [ + "## Alternative approach - using OpenAI python library to connect to Ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23057e00-b6fc-4678-93a9-6b31cb704bff", + "metadata": {}, + "outputs": [], + "source": [ + "# There's actually an alternative approach that some people might prefer\n", + "# You can use the OpenAI client python library to call Ollama:\n", + "\n", + "from openai import OpenAI\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "9f9e22da-b891-41f6-9ac9-bd0c0a5f4f44", + "metadata": {}, + "source": [ + "## Are you confused about why that works?\n", + "\n", + "It seems strange, right? We just used OpenAI code to call Ollama?? 
What's going on?!\n", + "\n", + "Here's the scoop:\n", + "\n", + "The python class `OpenAI` is simply code written by OpenAI engineers that makes calls over the internet to an endpoint. \n", + "\n", + "When you call `openai.chat.completions.create()`, this python code just makes a web request to the following url: \"https://api.openai.com/v1/chat/completions\"\n", + "\n", + "Code like this is known as a \"client library\" - it's just wrapper code that runs on your machine to make web requests. The actual power of GPT is running on OpenAI's cloud behind this API, not on your computer!\n", + "\n", + "OpenAI was so popular that lots of other AI providers provided identical web endpoints, so you could use the same approach.\n", + "\n", + "So Ollama has an endpoint running on your local box at http://localhost:11434/v1/chat/completions \n", + "And in week 2 we'll discover that lots of other providers do this too, including Gemini and DeepSeek.\n", + "\n", + "And then the team at OpenAI had a great idea: they can extend their client library so you can specify a different 'base url', and use their library to call any compatible API.\n", + "\n", + "That's it!\n", + "\n", + "So when you say: `ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')` \n", + "Then this will make the same endpoint calls, but to Ollama instead of OpenAI." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90", + "metadata": {}, + "source": [ + "## Also trying the amazing reasoning model DeepSeek\n", + "\n", + "Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B. 
\n", + "This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.\n", + "\n", + "Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1) all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull deepseek-r1:1.5b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d3d554b-e00d-4c08-9300-45e073950a76", + "metadata": {}, + "outputs": [], + "source": [ + "# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside tags, followed by some decent definitions\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=\"deepseek-r1:1.5b\",\n", + " messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898", + "metadata": {}, + "source": [ + "# NOW the exercise for you\n", + "\n", + "Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43ef4b92-53e1-4af2-af3f-726812f4265c", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "#from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "#from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97d45733-394e-493e-a92b-1475876d9028", + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "class Website:\n", + "\n", + " def __init__(self, url):\n", + " \"\"\"\n", + " Create this Website object from the given url using the BeautifulSoup library\n", + " \"\"\"\n", + " self.url = url\n", + " response = requests.get(url, headers=headers)\n", + " soup = BeautifulSoup(response.content, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a40f9c5-1b14-42f9-9319-6a66e58e03f2", + "metadata": {}, + "outputs": [], + "source": [ + "webpage = Website(\"https://www.pleasurewebsite.com\")\n", + "print(webpage.title)\n", + "print(webpage.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a72a005d-43de-4ae5-b427-99a8fcb6065c", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n", + "and provides a short summary, ignoring text that might be navigation related. 
\\\n", + "Respond in markdown.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0e4f95f-0ccf-4027-9457-5c973cd17702", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website is as follows; \\\n", + "please provide a short summary of this website in markdown. \\\n", + "If it includes news or announcements, then summarize these too.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ceae6073-a085-49ce-ad44-39e46d8e6934", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d53b26b-308c-470c-a0a9-9edb887aed6d", + "metadata": {}, + "outputs": [], + "source": [ + "messages=messages_for(webpage)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8", + "metadata": {}, + "outputs": [], + "source": [ + "import ollama\n", + "MODEL = \"llama3.2\"\n", + "response = ollama.chat(model=MODEL, messages=messages)\n", + "print(response['message']['content'])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llmenv", + "language": "python", + "name": "llmenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/pradeep1955/week2/agent_conversation_shakespeare.ipynb 
b/community-contributions/pradeep1955/week2/agent_conversation_shakespeare.ipynb new file mode 100644 index 0000000..6d55283 --- /dev/null +++ b/community-contributions/pradeep1955/week2/agent_conversation_shakespeare.ipynb @@ -0,0 +1,351 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927", + "metadata": {}, + "source": [ + "# Triangular agent conversation\n", + "\n", + "## GPT (Hamlet), Llama (Falstaff), Gemini (Iago):" + ] + }, + { + "cell_type": "markdown", + "id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac", + "metadata": {}, + "source": [ + "### Created a 3-way conversation, bringing Gemini into it.\n", + "### Replaced one of the models with an open source model running with Ollama." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8e0c1bd-a159-475b-9cdc-e219a7633355", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "import ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3ad57ad-46a8-460e-9cb3-67a890093536", + "metadata": {}, + "outputs": [], + "source": [ + "import google.generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f531c14-5743-4a5b-83d9-cb5863ca2ddf", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " 
print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d5150ee-3858-4921-bce6-2eecfb96bc75", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI\n", + "\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11381fd8-5099-41e8-a1d7-6787dea56e43", + "metadata": {}, + "outputs": [], + "source": [ + "google.generativeai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1766d20-54b6-4f76-96c5-c338ae7073c9", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"\n", + "gemini_model = 'gemini-2.0-flash'\n", + "\n", + "gpt_system = \"You are playing the part of Hamlet. He is a philosopher who probes Iago with a mixture of suspicion \\\n", + "and intellectual curiosity, seeking to unearth the origins of his deceit. \\\n", + "Is malice born of scorn, envy, or some deeper void? Hamlet’s introspective nature \\\n", + "drives him to question whether Iago’s actions reveal a truth about humanity itself. \\\n", + "You will respond as Shakespeare's Hamlet would.\"\n", + "\n", + "llama_system = \"You are acting the part of Falstaff, who attempts to lighten the mood with his jokes and observations, \\\n", + "potentially clashing with Hamlet's melancholic nature. You respond as Shakespeare's Falstaff would.\"\n", + "\n", + "gemini_system = \"You are acting the part of Iago, subtly trying to manipulate both Hamlet and Falstaff \\\n", + "to his own advantage, testing their weaknesses and exploiting their flaws. 
You respond like Iago\"\n", + "\n", + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]\n", + "gemini_messages = [\"Hello\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "806a0506-dac8-4bad-ac08-31f350256b58", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": llama})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43674885-ede7-48bf-bee4-467454f3e96a", + "metadata": {}, + "outputs": [], + "source": [ + "def call_llama():\n", + " messages = [{\"role\": \"system\", \"content\": llama_system}]\n", + " for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": llama})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " response = ollama.chat(model=llama_model, messages=messages)\n", + "\n", + " \n", + " return response['message']['content']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03d34769-b339-4c4b-8c60-69494c39d725", + "metadata": {}, + "outputs": [], + "source": [ + "#import google.generativeai as genai\n", + "\n", + "# Make sure you configure the API key first:\n", + "#genai.configure(api_key=\"YOUR_API_KEY\")\n", + "\n", + "def call_gemini():\n", + " history = [] # local list, so the global gemini_messages is not 
shadowed\n", + " \n", + " # Format the history for Gemini\n", + " for gpt, llama, 
gemini_message in zip(gpt_messages, llama_messages, gemini_messages):\n", + " history.append({\"role\": \"user\", \"parts\": [gpt]}) # Hamlet speaks\n", + " history.append({\"role\": \"user\", \"parts\": [llama]}) # Falstaff speaks\n", + " history.append({\"role\": \"model\", \"parts\": [gemini_message]}) # Iago responds\n", + "\n", + " # Add the latest Falstaff message as the newest user input\n", + " history.append({\"role\": \"user\", \"parts\": [llama_messages[-1]]})\n", + "\n", + " # Initialize the model with the correct system instruction\n", + " gemini = google.generativeai.GenerativeModel(\n", + " #model_name='gemini-1.5-flash', # Or 'gemini-pro'\n", + " model_name = gemini_model,\n", + " system_instruction=gemini_system\n", + " )\n", + "\n", + " response = gemini.generate_content(history)\n", + " return response.text\n", + "#print(response.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93fc8253-67cb-4ea4-aff7-097b2a222793", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]\n", + "gemini_messages = [\"Hello\"]\n", + "\n", + "print(f\"Hamlet:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Falstaff:\\n{llama_messages[0]}\\n\")\n", + "print(f\"Iago:\\n{gemini_messages[0]}\\n\")\n", + "\n", + "for i in range(3):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " llama_next = call_llama()\n", + " print(f\"Llama:\\n{llama_next}\\n\")\n", + " llama_messages.append(llama_next)\n", + "\n", + " gemini_next = call_gemini()\n", + " print(f\"Gemini:\\n{gemini_next}\\n\")\n", + " gemini_messages.append(gemini_next)" + ] + }, + { + "cell_type": "markdown", + "id": "bca66ffc-9dc1-4384-880c-210889f5d0ac", + "metadata": {}, + "source": [ + "## Conversation between gpt-4o-mini and llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"c23224f6-7008-44ed-a57f-718975f4e291", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's make a conversation between GPT-4o-mini and Llama 3.2\n", + "# We're using a cheap GPT model and a local Llama model so the costs will be minimal\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are a tapori from Mumbai who is very optimistic; \\\n", + "you always look at the brighter side of the situation and you are always ready to act to win your way.\"\n", + "\n", + "llama_system = \"You are a Jaat from Haryana. You try to express yourself with Hindi poems \\\n", + "to agree with the other person and/or find common ground. If the other person is optimistic, \\\n", + "you respond in a poetic way and keep chatting.\"\n", + "\n", + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2d704bbb-f22b-400d-a695-efbd02b26548", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, llama in zip(gpt_messages, llama_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": llama})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "385ccec8-de59-4e42-9616-3f5c9a05589c", + "metadata": {}, + "outputs": [], + "source": [ + "def call_llama():\n", + " messages = [{\"role\": \"system\", \"content\": llama_system}]\n", + " for gpt, llama_message in zip(gpt_messages, llama_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": llama_message})\n", + " messages.append({\"role\": \"user\", \"content\": 
gpt_messages[-1]})\n", + " response = ollama.chat(model=llama_model, 
messages=messages)\n", + "\n", + " \n", + " return response['message']['content']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70b5481b-455e-4275-80d3-0afe0fabcb0f", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]\n", + "\n", + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Llama:\\n{llama_messages[0]}\\n\")\n", + "\n", + "for i in range(3):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " llama_next = call_llama()\n", + " print(f\"Llama:\\n{llama_next}\\n\")\n", + " llama_messages.append(llama_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f8d734b-57e5-427d-bcb1-7956fc58a348", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llmenv", + "language": "python", + "name": "llmenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/protocol_summarizer_webapp/.github/copilot-instructions.md b/community-contributions/protocol_summarizer_webapp/.github/copilot-instructions.md new file mode 100644 index 0000000..3aa12b5 --- /dev/null +++ b/community-contributions/protocol_summarizer_webapp/.github/copilot-instructions.md @@ -0,0 +1,3 @@ + + +This is a Streamlit web application for clinical trial protocol summarization. Use Streamlit best practices for UI and Python for backend logic. Integrate with ClinicalTrials.gov v2 API for study search and OpenAI for summarization. 
diff --git a/community-contributions/protocol_summarizer_webapp/.gitignore b/community-contributions/protocol_summarizer_webapp/.gitignore new file mode 100644 index 0000000..7cc51b2 --- /dev/null +++ b/community-contributions/protocol_summarizer_webapp/.gitignore @@ -0,0 +1,30 @@ +updates.md +.env +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +env/ +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +*.egg-info/ +.installed.cfg +*.egg +venv/ +ENV/ +.streamlit/ +.idea/ +.vscode/ +*.swp +*.swo +.DS_Store diff --git a/community-contributions/protocol_summarizer_webapp/README.md b/community-contributions/protocol_summarizer_webapp/README.md new file mode 100644 index 0000000..2e80874 --- /dev/null +++ b/community-contributions/protocol_summarizer_webapp/README.md @@ -0,0 +1,66 @@ +# Protocol Summarizer Webapp + +A Streamlit web application for searching and summarizing clinical trial protocols from ClinicalTrials.gov using Large Language Models. This tool enables researchers and clinical professionals to quickly extract key information from clinical trial protocols. + +## Features +- Search for clinical trials by keyword +- Display a list of studies with title and NCT number +- Select a study to summarize +- Fetch the protocol's brief summary from ClinicalTrials.gov API +- Automatically summarize the protocol using OpenAI's LLM +- Extract structured information like study design, population, interventions, and endpoints + +## Installation + +1. Clone this repository: + ```sh + git clone https://github.com/albertoclemente/protocol_summarizer.git + cd protocol_summarizer/protocol_summarizer_webapp + ``` + +2. Install dependencies: + ```sh + pip install -r requirements.txt + ``` + +3. Create a `.env` file in the project root with your OpenAI API key: + ``` + OPENAI_API_KEY=your_api_key_here + ``` + +## Usage + +1. Run the Streamlit app: + ```sh + streamlit run app.py + ``` + +2. 
In your browser: + - Enter a disease, condition, or keyword in the search box + - Select the number of results to display + - Click the "Search" button + - Select a study from the results + - Click "Summarize Protocol" to generate a structured summary + +## Technical Details + +- Uses ClinicalTrials.gov API v2 to retrieve study information +- Implements fallback methods to handle API changes or failures +- Extracts protocol brief summaries using reliable JSON parsing +- Generates structured summaries using OpenAI's GPT models + +## Requirements + +- Python 3.7+ +- Streamlit +- Requests +- OpenAI Python library +- python-dotenv + +## Contribution + +Contributions are welcome! Please feel free to submit a Pull Request. + +## License + +MIT License diff --git a/community-contributions/protocol_summarizer_webapp/app.py b/community-contributions/protocol_summarizer_webapp/app.py new file mode 100644 index 0000000..cd9941a --- /dev/null +++ b/community-contributions/protocol_summarizer_webapp/app.py @@ -0,0 +1,121 @@ +import os +from dotenv import load_dotenv +import streamlit as st +import requests +from openai import OpenAI + +load_dotenv() + +st.title("Protocol Summarizer") + +st.markdown(""" +Search for clinical trials by keyword, select a study, and generate a protocol summary using an LLM. 
+""") + +# Search input + +# Show results only after user presses Enter +with st.form(key="search_form"): + query = st.text_input("Enter a disease, study title, or keyword:") + max_results = st.slider("Number of results", 1, 20, 5) + submitted = st.form_submit_button("Search") + +@st.cache_data(show_spinner=False) +def search_clinical_trials(query, max_results=5): + if not query: + return [] + url = "https://clinicaltrials.gov/api/v2/studies" + resp = requests.get(url, params={"query.term": query, "pageSize": max_results, "format": "json"}) + studies = [] + if resp.status_code == 200: + data = resp.json() + for study in data.get('studies', []): + nct = study.get('protocolSection', {}).get('identificationModule', {}).get('nctId', 'N/A') + title = study.get('protocolSection', {}).get('identificationModule', {}).get('officialTitle', 'N/A') + studies.append({'nct': nct, 'title': title}) + return studies + +results = search_clinical_trials(query, max_results) if query else [] + +if results: + st.subheader("Search Results") + for i, study in enumerate(results): + st.markdown(f"**{i+1}. 
{study['title']}** (NCT: {study['nct']})") + selected = st.number_input("Select study number to summarize", min_value=1, max_value=len(results), value=1) + selected_study = results[selected-1] + st.markdown(f"### Selected Study\n**{selected_study['title']}** (NCT: {selected_study['nct']})") + if st.button("Summarize Protocol"): + # Fetch the brief summary for the selected study + nct_id = selected_study['nct'] + + # Use the V2 API which we know works reliably + url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}?format=json" + with st.spinner("Fetching study details..."): + resp = requests.get(url) + brief = "" + + if resp.status_code == 200: + try: + data = resp.json() + + # V2 API has protocolSection at the root level + if 'protocolSection' in data: + desc_mod = data.get('protocolSection', {}).get('descriptionModule', {}) + brief = desc_mod.get('briefSummary', '') + + # If briefSummary is empty, try detailedDescription + if not brief: + brief = desc_mod.get('detailedDescription', '') + except Exception as e: + st.error(f"Error parsing study data: {e}") + + # If API fails, try HTML scraping as a fallback + if not brief and resp.status_code != 200: + st.warning(f"API returned status code {resp.status_code}. 
Trying alternative method...") + html_url = f"https://clinicaltrials.gov/ct2/show/{nct_id}" + html_resp = requests.get(html_url) + + if "Brief Summary:" in html_resp.text: + start = html_resp.text.find("Brief Summary:") + 15 + excerpt = html_resp.text[start:start+1000] + + # Clean up HTML + import re + excerpt = re.sub('<[^<]+?>', ' ', excerpt) + excerpt = re.sub('\\s+', ' ', excerpt) + brief = excerpt.strip() + + if not brief: + st.error("No brief summary or detailed description found for this study.") + st.stop() + + # Now we have the brief summary, send it to the LLM + openai = OpenAI() + def user_prompt_for_protocol_brief(brief_text): + return ( + "Extract the following details from the clinical trial brief summary in markdown format with clear section headings (e.g., ## Study Design, ## Population, etc.):\n" + "- Study design\n" + "- Population\n" + "- Interventions\n" + "- Primary and secondary endpoints\n" + "- Study duration\n\n" + f"Brief summary text:\n{brief_text}" + ) + system_prompt = "You are a clinical research assistant. Extract and list the requested protocol details in markdown format with clear section headings." + messages = [ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt_for_protocol_brief(brief)} + ] + with st.spinner("Summarizing with LLM..."): + try: + response = openai.chat.completions.create( + model="gpt-4o-mini", + messages=messages + ) + summary = response.choices[0].message.content + st.markdown(summary) + except Exception as e: + st.error(f"LLM call failed: {e}") +else: + if query: + st.info("No results found. 
Try a different keyword.") diff --git a/community-contributions/protocol_summarizer_webapp/requirements.txt b/community-contributions/protocol_summarizer_webapp/requirements.txt new file mode 100644 index 0000000..345b507 --- /dev/null +++ b/community-contributions/protocol_summarizer_webapp/requirements.txt @@ -0,0 +1,4 @@ +streamlit +openai +requests +python-dotenv diff --git a/community-contributions/sf-patient-brochure/.gitkeep b/community-contributions/sf-patient-brochure/.gitkeep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/community-contributions/sf-patient-brochure/.gitkeep @@ -0,0 +1 @@ + diff --git a/community-contributions/sf-patient-brochure/Patient brochure.ipynb b/community-contributions/sf-patient-brochure/Patient brochure.ipynb new file mode 100644 index 0000000..4f6bc85 --- /dev/null +++ b/community-contributions/sf-patient-brochure/Patient brochure.ipynb @@ -0,0 +1,517 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 9, + "id": "fc57c47f-31fc-4527-af71-ce117d35c480", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt\n", + "\n", + "import os\n", + "import requests\n", + "import json\n", + "from typing import List\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d74ea4e7-7d4a-4c85-92d3-8cdb231bc261", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "3eb884ea-02db-4ff8-91f9-c71e40b1cf4a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "API key looks good so far\n" + ] + } + ], + "source": [ + "# Initialize and constants\n", + "\n", + 
"load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:\n", + " print(\"API key looks good so far\")\n", + "else:\n", + " print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n", + " \n", + "MODEL = 'gpt-4o-mini'\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d48a7b9b-273d-4bc9-997b-c7112e02528c", + "metadata": {}, + "outputs": [], + "source": [ + "# A class to represent a Webpage\n", + "\n", + "# Some websites need you to use proper headers when fetching them:\n", + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "class Website:\n", + " def __init__(self, url):\n", + " self.url = url\n", + " response = requests.get(url, headers=headers)\n", + " self.body = response.content\n", + " soup = BeautifulSoup(self.body, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + "\n", + " if soup.body:\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n", + " else:\n", + " self.text = \"\"\n", + "\n", + " links = [link.get('href') for link in soup.find_all('a')]\n", + " self.links = [link for link in links if link]\n", + "\n", + " def get_contents(self):\n", + " return f\"Webpage Title:\\n{self.title}\\nWebpage Contents:\\n{self.text}\\n\\n\"\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "bf51ae6e-91ae-46eb-ac39-dc860454ea4a", + "metadata": {}, + "outputs": [], + "source": [ + "def get_condition_links_from_topics_page():\n", + " topics_url = \"https://www.thuisarts.nl/overzicht/onderwerpen\"\n", + " response = requests.get(topics_url, headers=headers)\n", + 
" soup = BeautifulSoup(response.content, 'html.parser')\n", + "\n", + " # Find all tags that look like condition pages\n", + " links = soup.find_all(\"a\", href=True)\n", + " condition_links = []\n", + "\n", + " for link in links:\n", + " href = link['href']\n", + " if href.startswith(\"/\"):\n", + " href = \"https://www.thuisarts.nl\" + href\n", + " if href.startswith(\"https://www.thuisarts.nl/\") and len(href.split(\"/\")) > 3:\n", + " condition_links.append(href)\n", + "\n", + " # Remove duplicates and return\n", + " return list(set(condition_links))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "a246ac9f-73fb-4c2d-ab92-6f3f2bf7afac", + "metadata": {}, + "outputs": [], + "source": [ + "link_system_prompt = \"\"\"You are an assistant that filters URLs for patient education content. \n", + "\n", + "Only return links that lead to pages about symptoms, health conditions, treatments, or diseases — for example: pages on 'headache', 'diarrhea', 'stomach pain', 'asthma', etc.\n", + "\n", + "DO NOT return:\n", + "- contact pages\n", + "- overview/video/image/keuzekaart lists unless they directly link to medical complaints\n", + "- navigation or privacy/cookie/social media links\n", + "\n", + "Respond only with full https links in JSON format, like this:\n", + "{\n", + " \"links\": [\n", + " {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/hoofdpijn\"},\n", + " {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/buikpijn\"}\n", + " ]\n", + "}\n", + "\"\"\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "b3ac761e-f583-479e-b8ef-70e70f8f361a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "You are an assistant that filters URLs for patient education content. 
\n", + "\n", + "Only return links that lead to pages about symptoms, health conditions, treatments, or diseases — for example: pages on 'headache', 'diarrhea', 'stomach pain', 'asthma', etc.\n", + "\n", + "DO NOT return:\n", + "- contact pages\n", + "- overview/video/image/keuzekaart lists unless they directly link to medical complaints\n", + "- navigation or privacy/cookie/social media links\n", + "\n", + "Respond only with full https links in JSON format, like this:\n", + "{\n", + " \"links\": [\n", + " {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/hoofdpijn\"},\n", + " {\"type\": \"symptom or condition page\", \"url\": \"https://www.thuisarts.nl/buikpijn\"}\n", + " ]\n", + "}\n", + "\n" + ] + } + ], + "source": [ + "print(link_system_prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5548e8d4-2813-40fe-a807-cf3661d3a0a9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Found 680 condition pages.\n" + ] + } + ], + "source": [ + "condition_links = get_condition_links_from_topics_page()\n", + "print(f\"✅ Found {len(condition_links)} condition pages.\")\n", + "\n", + "# Format for summary function\n", + "selected_links = [{\"url\": link} for link in condition_links]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "8d264592-8b77-425a-be4a-73ef7d32d744", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "def load_existing_summaries(filepath=\"brochure_cache.json\"):\n", + " if os.path.exists(filepath):\n", + " with open(filepath, \"r\", encoding=\"utf-8\") as f:\n", + " return json.load(f)\n", + " return {}\n", + "\n", + "def save_summaries_to_cache(summaries, filepath=\"brochure_cache.json\"):\n", + " with open(filepath, \"w\", encoding=\"utf-8\") as f:\n", + " json.dump(summaries, f, indent=2, ensure_ascii=False)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": 
"1cdd9456-1262-40a0-bc3f-28d23010ed7f", + "metadata": {}, + "outputs": [], + "source": [ + "selected_links = [{\"url\": link} for link in get_condition_links_from_topics_page()][:10]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "0c2f24ea-fa6b-4431-849a-e1aeaa936022", + "metadata": {}, + "outputs": [], + "source": [ + "summary_cache = {}\n", + "\n", + "def summarize_for_brochure(url):\n", + " if url in summary_cache:\n", + " summary = summary_cache[url]\n", + " print(f\"✅ [Cached] {url}\")\n", + " print(f\"📄 Summary:\\n{summary}\\n\") # 👈 this prints the cached summary too\n", + " return summary\n", + "\n", + " page = Website(url)\n", + "\n", + " example = \"\"\"\n", + "Example:\n", + "\n", + "Title: Keelpijn \n", + "Summary: Sore throat is a common symptom, often caused by a virus. It usually goes away on its own within a few days. Drink warm fluids, rest your voice, and take paracetamol if needed. See a doctor if the pain lasts more than a week or gets worse.\n", + "\n", + "Title: Hoofdpijn \n", + "Summary: Headaches can have many causes like stress, fatigue, or dehydration. Most are harmless and go away with rest and fluids. Painkillers like paracetamol can help. If headaches are severe, frequent, or different than usual, contact your GP.\n", + "\"\"\"\n", + "\n", + " prompt = f\"\"\"\n", + "You are a health writer. 
Based on the Dutch content below, write a clear, short, brochure-style summary in **English** for patients.\n", + "\n", + "Use the format: \n", + "Title: {page.title} \n", + "Summary: \n", + "\n", + "Keep it under 100 words, easy to read, friendly, and medically accurate.\n", + "\n", + "{example}\n", + "\n", + "Now use this for:\n", + "Title: {page.title}\n", + "Content:\n", + "{page.text[:3000]}\n", + "\"\"\"\n", + "\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[{\"role\": \"user\", \"content\": prompt}],\n", + " temperature=0.4\n", + " )\n", + "\n", + " summary = response.choices[0].message.content.strip()\n", + " summary_cache[url] = summary\n", + " return summary\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "af8f9d81-d848-4fb9-ac79-782b39fed4a2", + "metadata": {}, + "outputs": [], + "source": [ + "def build_symptom_brochure(links, cache_file=\"brochure_cache.json\"):\n", + " brochure = []\n", + " cached = load_existing_summaries(cache_file)\n", + " print(\"📄 Building summaries for brochure:\\n\")\n", + "\n", + " for i, item in enumerate(links, 1):\n", + " url = item[\"url\"]\n", + " if url in cached:\n", + " print(f\"✅ [Cached] {url}\")\n", + " brochure.append({\"url\": url, \"summary\": cached[url]})\n", + " continue\n", + " \n", + " print(f\"🔄 [{i}/{len(links)}] Summarizing: {url}\")\n", + " try:\n", + " summary = summarize_for_brochure(url)\n", + " print(f\"✅ Summary:\\n{summary}\\n\")\n", + " brochure.append({\"url\": url, \"summary\": summary})\n", + " cached[url] = summary # Save new summary\n", + " save_summaries_to_cache(cached, cache_file)\n", + " except Exception as e:\n", + " print(f\"❌ Error summarizing {url}: {e}\\n\")\n", + " brochure.append({\"url\": url, \"summary\": \"Error generating summary.\"})\n", + "\n", + " return brochure\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e9079d6b-538f-4681-9776-4628a111246a", + "metadata": {}, + "outputs": [ 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "📄 Building summaries for brochure:\n", + "\n", + "🔄 [1/10] Summarizing: https://www.thuisarts.nl/sociale-angststoornis\n", + "✅ [New] https://www.thuisarts.nl/sociale-angststoornis\n", + "📄 Summary:\n", + "Title: Social Anxiety Disorder\n", + "Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered.\n", + "\n", + "✅ Summary:\n", + "Title: Social Anxiety Disorder\n", + "Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered.\n", + "\n", + "✅ [Cached] https://www.thuisarts.nl/diabetes-type-2\n", + "🔄 [3/10] Summarizing: https://www.thuisarts.nl/morton-neuroom\n", + "✅ [New] https://www.thuisarts.nl/morton-neuroom\n", + "📄 Summary:\n", + "Title: Morton's Neuroma | Thuisarts.nl \n", + "Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. 
Sometimes, a custom shoe insole can also help.\n", + "\n", + "✅ Summary:\n", + "Title: Morton's Neuroma | Thuisarts.nl \n", + "Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. Sometimes, a custom shoe insole can also help.\n", + "\n", + "🔄 [4/10] Summarizing: https://www.thuisarts.nl/borstvergroting\n", + "✅ [New] https://www.thuisarts.nl/borstvergroting\n", + "📄 Summary:\n", + "Title: Breast Augmentation | Thuisarts.nl \n", + "Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. If you smoke, it's important to quit three weeks before surgery.\n", + "\n", + "✅ Summary:\n", + "Title: Breast Augmentation | Thuisarts.nl \n", + "Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. 
If you smoke, it's important to quit three weeks before surgery.\n", + "\n", + "🔄 [5/10] Summarizing: https://www.thuisarts.nl/kijkoperatie-in-buik\n", + "✅ [New] https://www.thuisarts.nl/kijkoperatie-in-buik\n", + "📄 Summary:\n", + "Title: Abdominal Laparoscopy | Thuisarts.nl\n", + "Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer.\n", + "\n", + "✅ Summary:\n", + "Title: Abdominal Laparoscopy | Thuisarts.nl\n", + "Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer.\n", + "\n", + "🔄 [6/10] Summarizing: https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt\n", + "✅ [New] https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt\n", + "📄 Summary:\n", + "Title: Changes in Care When You Turn 18 | Thuisarts.nl\n", + "Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. You will create a plan detailing what you need to manage this independently, with support provided to help you. 
This transition is a gradual process, with preparation beginning before you turn 18.\n", + "\n", + "✅ Summary:\n", + "Title: Changes in Care When You Turn 18 | Thuisarts.nl\n", + "Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. You will create a plan detailing what you need to manage this independently, with support provided to help you. This transition is a gradual process, with preparation beginning before you turn 18.\n", + "\n", + "🔄 [7/10] Summarizing: https://www.thuisarts.nl/zon-en-zonnebrand\n", + "✅ [New] https://www.thuisarts.nl/zon-en-zonnebrand\n", + "📄 Summary:\n", + "Title: Sun and Sunburn | Thuisarts.nl\n", + "Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice.\n", + "\n", + "✅ Summary:\n", + "Title: Sun and Sunburn | Thuisarts.nl\n", + "Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice.\n", + "\n", + "🔄 [8/10] Summarizing: https://www.thuisarts.nl/ganglion\n", + "✅ [New] https://www.thuisarts.nl/ganglion\n", + "📄 Summary:\n", + "Title: Ganglion | Thuisarts.nl \n", + "Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. 
In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern.\n", + "\n", + "✅ Summary:\n", + "Title: Ganglion | Thuisarts.nl \n", + "Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern.\n", + "\n", + "🔄 [9/10] Summarizing: https://www.thuisarts.nl/kunstheup\n", + "✅ [New] https://www.thuisarts.nl/kunstheup\n", + "📄 Summary:\n", + "Title: Hip Replacement | Thuisarts.nl\n", + "Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years.\n", + "\n", + "✅ Summary:\n", + "Title: Hip Replacement | Thuisarts.nl\n", + "Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years.\n", + "\n", + "🔄 [10/10] Summarizing: https://www.thuisarts.nl/gezond-leven\n", + "✅ [New] https://www.thuisarts.nl/gezond-leven\n", + "📄 Summary:\n", + "Title: Healthy Living | Thuisarts.nl\n", + "Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. 
If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support.\n", + "\n", + "✅ Summary:\n", + "Title: Healthy Living | Thuisarts.nl\n", + "Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support.\n", + "\n" + ] + } + ], + "source": [ + "brochure = build_symptom_brochure(selected_links)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "e2121c3c-aa6a-4640-8e19-6ca6ccf84783", + "metadata": {}, + "outputs": [], + "source": [ + "def export_brochure_to_txt(brochure, filepath=\"brochure_summaries.txt\"):\n", + " if not brochure:\n", + " print(\"⚠️ No summaries to export.\")\n", + " return\n", + "\n", + " with open(filepath, \"w\", encoding=\"utf-8\") as f:\n", + " for item in brochure:\n", + " url = item.get(\"url\", \"Unknown URL\")\n", + " summary = item.get(\"summary\", \"No summary available.\")\n", + " f.write(f\"URL: {url}\\n\")\n", + " f.write(f\"{summary}\\n\\n\")\n", + "\n", + " print(f\"📁 Exported {len(brochure)} summaries to {filepath}\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "f14288f9-4d1c-4a0e-aaf4-9f86324b0602", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "📁 Exported 10 summaries to brochure_summaries.txt\n" + ] + } + ], + "source": [ + "export_brochure_to_txt(brochure)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"c23e89db-3ded-4189-a227-6ca6ac2f1332", + "metadata": {}, + "outputs": [], + "source": [ + "###---it works---" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a700e4f3-fb6a-499a-a579-6f9b8ad35c9f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/community-contributions/sf-patient-brochure/brochure_summaries.txt b/community-contributions/sf-patient-brochure/brochure_summaries.txt new file mode 100644 index 0000000..0ba4556 --- /dev/null +++ b/community-contributions/sf-patient-brochure/brochure_summaries.txt @@ -0,0 +1,40 @@ +URL: https://www.thuisarts.nl/sociale-angststoornis +Title: Social Anxiety Disorder +Summary: Social anxiety disorder, or social phobia, is a fear of what others think of you, often leading to panic attacks. Writing down what happens, your thoughts, and feelings can help manage this fear. Positive thinking can also be beneficial when you're feeling anxious. Discussing your concerns with your GP or practice nurse can be helpful. If there's no improvement or symptoms are severe, treatments such as therapy with a psychologist or anxiety medication may be considered. + +URL: https://www.thuisarts.nl/diabetes-type-2 +Title: Diabetes type 2 | Thuisarts.nl +Summary: Type 2 diabetes, also known as sugar disease, is characterized by high blood sugar levels. Leading a healthy lifestyle is crucial: eat healthily, lose weight, exercise regularly, relax, and quit smoking. If blood sugar levels remain high, medication may be required. 
Regular check-ups, usually every three months, with your GP or practice nurse are essential. + +URL: https://www.thuisarts.nl/morton-neuroom +Title: Morton's Neuroma | Thuisarts.nl +Summary: Morton's Neuroma is a pinched nerve in the forefoot, causing burning pain in the forefoot and toes. It often results from wearing too narrow shoes or high heels. Wearing comfortable, roomy shoes can help alleviate symptoms. For severe pain, paracetamol can be taken. Sometimes, a custom shoe insole can also help. + +URL: https://www.thuisarts.nl/borstvergroting +Title: Breast Augmentation | Thuisarts.nl +Summary: A breast augmentation is a procedure where a plastic surgeon inserts fillings into your breasts, under general anesthesia. The surgery takes about an hour. Consider the pros and cons carefully. Benefits may include a more positive body image and increased self-confidence. Risks may include infection, bleeding, scarring, or hardening of the breasts over time. Often, a follow-up surgery is needed later. If you smoke, it's important to quit three weeks before surgery. + +URL: https://www.thuisarts.nl/kijkoperatie-in-buik +Title: Abdominal Laparoscopy | Thuisarts.nl +Summary: An abdominal laparoscopy allows the doctor to examine or operate in your abdomen. Small tubes with a camera and tools are inserted through tiny incisions. You'll have a pre-operation discussion with your surgeon and anesthesiologist. You will be deeply sedated for the procedure. You cannot drive home post-operation, so arrange for someone to pick you up. Recovery usually requires a week off work, sometimes longer. + +URL: https://www.thuisarts.nl/veranderingen-in-zorg-als-je-18-wordt +Title: Changes in Care When You Turn 18 | Thuisarts.nl +Summary: As you become an adult, usually around 18, you transition from child to adult healthcare. You will start to take more responsibility, such as making appointments and requesting medications, giving you more control over your care. 
You will create a plan detailing what you need to manage this independently, with support provided to help you. This transition is a gradual process, with preparation beginning before you turn 18. + +URL: https://www.thuisarts.nl/zon-en-zonnebrand +Title: Sun and Sunburn | Thuisarts.nl +Summary: Protect your skin from excessive sunlight to avoid sunburn. If you notice your skin burning, immediately move out of the sun. Cool your skin with wet cloths if it hurts and take paracetamol for severe pain. Stay out of the sun for at least three days to allow your skin to recover. If you have symptoms of sunstroke, sun allergy, or eczema, seek medical advice. + +URL: https://www.thuisarts.nl/ganglion +Title: Ganglion | Thuisarts.nl +Summary: A ganglion is a small bump that can appear on your wrist, finger, or foot. It is a protrusion from the joint and is harmless. In half of the cases, a ganglion disappears on its own. If you notice such a bump, there is usually no cause for concern. + +URL: https://www.thuisarts.nl/kunstheup +Title: Hip Replacement | Thuisarts.nl +Summary: A hip replacement can be an option if you are experiencing severe pain or stiffness in your hip, such as from advanced arthritis or another hip disease. This is usually considered when other treatments like physiotherapy and painkillers have not provided enough relief. You can discuss with your hospital doctor whether a hip replacement is suitable for you. A hip prosthesis typically lasts longer than 20 years. + +URL: https://www.thuisarts.nl/gezond-leven +Title: Healthy Living | Thuisarts.nl +Summary: For good health, it's important to eat, drink, and sleep well, stay active, relax, and maintain social contacts. Avoiding substances like alcohol is also beneficial. If you want to make changes to your lifestyle, take it step by step. Discuss your plans with your GP or practice nurse. 
Whether it's about healthy eating, exercise, sleep, stress management, social contact, or substance use, they can provide guidance and support. + diff --git a/community-contributions/vanshika-mahajan/web_summary_fashion.ipynb b/community-contributions/vanshika-mahajan/web_summary_fashion.ipynb new file mode 100644 index 0000000..bc0930c --- /dev/null +++ b/community-contributions/vanshika-mahajan/web_summary_fashion.ipynb @@ -0,0 +1,933 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 113, + "id": "030082e9-edee-40b6-9f17-b6a683f2e334", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "import bs4\n", + "from bs4 import BeautifulSoup\n", + "import lxml\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "id": "c87e997d-e1d6-4b6f-9c76-3fb1d607f7cd", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')" + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "id": "e450cb33-1ae4-435e-b155-35f2bd7ab78e", + "metadata": {}, + "outputs": [], + "source": [ + "headers={\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "} \n", + "# Browser-like User-Agent header so the server returns the same HTML a regular user would see, and to reduce blocks, CAPTCHAs, and HTTP 403 errors" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "id": "63a57fb7-79db-444b-968b-c9314b1f3d3f", + "metadata": {}, + "outputs": [], + "source": [ + "class Website:\n", + " def __init__(self,url):\n", + " self.url=url\n", + " response= requests.get(url,headers=headers,timeout=30)\n", + " soup=BeautifulSoup(response.content,'lxml')\n", + " self.title=soup.title.string if soup.title else \"No title found\"# page title, with a fallback when the page has none\n", + " for irrelevant in 
soup.body([\"script\", \"style\", \"img\", \"input\"]):# drop tags that carry no readable text\n", + " irrelevant.decompose()\n", + " # extract the visible text with BeautifulSoup's get_text()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)# separator puts each text fragment on its own line; strip=True trims surrounding whitespace" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "7369159d-1f36-43c9-b7e7-a0b65b56426b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Latest and Trending Entertainment News, Celebrity News, Movie News, Breaking News | Entertainment - Times of India\n", + "Sign In\n", + "TOI\n", + "Go to\n", + "TOI\n", + "Etimes\n", + "home\n", + "cinema\n", + "news\n", + "movie reviews\n", + "movie listings\n", + "box office\n", + "anime\n", + "previews\n", + "did you know\n", + "videos\n", + "showtimes\n", + "blogs\n", + "awards\n", + "News\n", + "entertainment\n", + "Trending\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Sudhanshu: At 52, John, Dino all of them look like rockstars - 
EXCLUSIVE\n", + "Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n", + "Previous\n", + "Sonakshi breaks silence on her rift with Luv and Kussh\n", + "Madhuri once chased Aamir with hockey stick for THIS reason\n", + "Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n", + "Big B's savage reply to troll over cybercrime callertune\n", + "Anushka on keeping kids Vamika, Akaay away from public eye\n", + "Apoorva Mukhija recalls witnessing gender bias at home\n", + "Danish influencer seeks help to find papads from Big B\n", + "Sunjay Kapur's reception pics with Priya Sachdev goes viral\n", + "Big B schools trolls commenting 'buddha sathiya gaya hai'\n", + "Anushka on how she and Virat divide parenting duties\n", + "Brahmaji reacts to Vishnu's 7,000-acre land in New Zealand\n", + "Diljit says THIS amidst trolling for working with Hania\n", + "Riddhi found it ridiculous to like SRK's mother in Jawan\n", + "Priya Sachdev once called husband Sunjay Kapur ‘misunderstood’\n", + "Next\n", + "1\n", + "2\n", + "3\n", + "Hindi\n", + "See All\n", + "Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n", + "Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. 
Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n", + "Sonakshi breaks silence on her rift with Luv and Kussh\n", + "Madhuri once chased Aamir with hockey stick for THIS reason\n", + "Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n", + "Anushka on keeping kids Vamika, Akaay away from public eye\n", + "Anushka Sharma and Virat Kohli are committed to shielding their children, Vamika and Akaay, from the constant glare of public attention. In a recent interview, Anushka emphasized the couple's focus on instilling strong values and ensuring a normal upbringing for their kids.\n", + "Apoorva Mukhija recalls witnessing gender bias at home\n", + "Regional\n", + "When Samantha’s class 10 mark sheet got leaked\n", + "Throwback to when a nostalgic memory made its way across the internet — Samantha Ruth Prabhu’s Class 10 mark sheet! The actress’s charming on-screen presence and grounded personality were once again in the spotlight as her old school report card began doing the rounds on social media.\n", + "Actor Tushar Ghadigaonkar passes away at 34\n", + "‘Kuberaa’ Twitter review: Netizens calls it a ‘Blockbuster’\n", + "Mammootty’s health- Brittas says actor doing well\n", + "Kavya Madhavan’s father P. 
Madhavan passes away\n", + "‘The Raja Saab’ teaser: Prabhas shines in this horror comedy\n", + "Mammootty’s father-in-law P S Abu passes away\n", + "Videos\n", + "See All\n", + "Previous\n", + "03:07\n", + "Ananya Panday’s Garden Bond With Parrots Wins Hearts\n", + "88 views | 2 hours ago\n", + "03:14\n", + "Sameera Reddy’s Healing Journey Through Yoga\n", + "31 views | 2 hours ago\n", + "03:13\n", + "Kriti Kharbanda’s Modern Maharani Look Stuns Instagram\n", + "26 views | 2 hours ago\n", + "03:12\n", + "Bobby Deol Meets Diljit Dosanjh: Punjabi Power Goes Viral\n", + "81 views | 2 hours ago\n", + "03:19\n", + "‘Sitaare Zameen Par’: Riteish Deshmukh’s Emotional Shoutout For Genelia’s Big Win\n", + "162 views | 2 hours ago\n", + "03:26\n", + "Varun Dhawan Stuns With 50 Push-Ups Alongside Army Cadets on Border 2 Set\n", + "21 views | 2 hours ago\n", + "03:00\n", + "VIDYA BALAN TURNS HEADS WITH CASUAL AIRPORT LOOK\n", + "16 views | 2 hours ago\n", + "03:05\n", + "MANDHIRA KAPUR BREAKS DOWN IN EMOTIONAL POST FOR LATE BROTHER SUNJAY KAPUR\n", + "1.2K views | 2 hours ago\n", + "03:28\n", + "SALMAN KHAN TAKES A BRUTAL DIG AT SOHAIL’S DIVORCE ON NATIONAL TV\n", + "185 views | 2 hours ago\n", + "03:15\n", + "RAJINIKANTH CAUSES FAN RIOT DURING ‘JAILER 2’ SHOOT IN MYSORE\n", + "26 views | 2 hours ago\n", + "03:10\n", + "IBRAHIM ALI KHAN KISSES HIS DOG AT AIRPORT IN HEARTWARMING FAREWELL\n", + "20 views | 3 hours ago\n", + "03:09\n", + "ANUPAMAA SET GUTTED IN MASSIVE FIRE | CREW ESCAPES, CINE BODY DEMANDS ACTION\n", + "1.2K views | 3 hours ago\n", + "Next\n", + "1\n", + "2\n", + "3\n", + "4\n", + "5\n", + "6\n", + "7\n", + "8\n", + "9\n", + "10\n", + "11\n", + "World\n", + "See All\n", + "Aamir to Tom: Celebs on a mission to 'Save Cinema'\n", + "'How to Train Your Dragon' beats '28 Years Later' and 'Elio' to top the US box office on second weekend\n", + "Blake Lively is heartbroken after friendship ends with Taylor Swift; accepts the music mogul won't be returning - Deets 
inside\n", + "Selena-Hailey UNFOLLOW each other amid Bieber drama\n", + "Judge gives Baldoni access to Blake-Taylor messages\n", + "Trending Now\n", + "# Sidharth Malhotra-Kiara Advani\n", + "# AbRam Khan-Taimur Ali Khan\n", + "# Janhvi Kapoor\n", + "# Salman Khan\n", + "# Hema Malini\n", + "# Salman Khan\n", + "# Gauri Khan\n", + "# Shah Rukh Khan\n", + "# Chahatt Khanna\n", + "Visual Stories\n", + "See All\n", + "Previous\n", + "Kuberaa’s Sameera to Pushpa’s Srivalli: Rashmika Mandanna’s most iconic on-screen avatars\n", + "Ahaana Krishna’s ethereal photo series is straight out of a dream\n", + "Rashmika Mandanna to Rakul Preet Singh: Best pictures of the week featuring south actresses\n", + "Gauri Khan's most loved saree looks - An ode to modern day elegance\n", + "​South Indian beauties whose smiles will light up your Monday\n", + "Karishma Tanna Slays Every Frame\n", + "Tamannaah Bhatia’s traditional looks\n", + "Malavika Mohanan's radiant pics\n", + "​Neha Shetty stuns in every shade of blue\n", + "Thalapathy Vijay’s top 10 blockbuster movies worth re-watching!\n", + "​In pic: Mesmerizing looks of Shruti Haasan​\n", + "Dushara Vijayan’s Most Elegant Fashion Moments\n", + "Next\n", + "1\n", + "2\n", + "3\n", + "More Stories\n", + "Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n", + "Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n", + "Mohanlal declines to continue as president at AMMA’s general body meeting- Deets Inside\n", + "Blockbusters Ranbir Kapoor turned down: Films that became hits without him\n", + "Anushka Sharma reveals why she and Virat Kohli are keeping their children Vamika and Akaay away from the public eye: 'We don't want to raise brats'\n", + "Apoorva Mukhija recalls witnessing gender bias at home: 'My mother did it all, but father got credit for showing up at PTMs'\n", + "Amitabh 
Bachchan gives a savage reply to a troll over his viral cybercrime caller tune: 'Sarkar ko bolo bhai..'\n", + "Danish influencer asks fans to help her find papads from Amitabh Bachchan; netizens say 'he also used to grow basmati rice'\n", + "Days after his untimely demise, Sunjay Kapur's reception photos with Priya Sachdev goes viral; Looked dashing in hand embroidered shoes, written 'I do'\n", + "Priyanka Chopra Jonas recollects walking into a trap set by John Cena, Idris Elba on sets of 'Heads of State'\n", + "Bobby Deol's London vacation sparks fan frenzy: viral video shows actor posing for selfies outside restaurant\n", + "Amitabh Bachchah gives befitting replies to 'buddha sathiya gaya hai', ‘ganja’ comments by trolls: 'Ek din, Bhagwan naa kare voh din jaldi aaye...'\n", + "Sai Pallavi’s best performances\n", + "Brahmaji clears the air about Vishnu Manchu purchasing 7,000-acre land in New Zealand: 'I was pulling their leg as usual...'\n", + "Anushka Sharma reveals how she and Virat Kohli divide the parenting duties: 'I will be the primary caregiver, he plays round the year'\n", + "Ranbir Kapoor's 'Awara' look sparks rumours of Raj Kapoor tribute, Diljit Dosanjh slammed for working with Hania Aamir in Sardaar Ji 3: Top 5 news\n", + "Has Kiara Advani been approached to play Meena Kumari in her biopic? 
Here's what we know\n", + "Top 5 psychological Anime every thriller fan must watch\n", + "Load More Stories\n", + "# Latest Movies 2025\n", + "# Best Bollywood Movies 2025\n", + "# Hollywood Movie 2025\n", + "# Tamil Movies 2025\n", + "# Telugu Movies 2025\n", + "# Malayalam Movies 2025\n", + "# Kannada Movies 2025\n", + "# Marathi Movies 2025\n", + "# Bengali Movies 2025\n", + "# Top Rated Movies 2025\n", + "# Best Hindi Movies\n", + "# Best English Movies\n", + "Hot on the Web\n", + "Salman Khan\n", + "Karisma Kapoor\n", + "Jaideep Ahlawat\n", + "Blood Pressure\n", + "Big Cat Species\n", + "Trisha\n", + "Sitaare Zameen Par Review\n", + "Ancient Indigenous Tribes\n", + "Hair Growth Tips\n", + "Kidney Health\n", + "Kuberaa Review\n", + "Blake Lively\n", + "Reverse Fatty Liver\n", + "Skincare Hacks\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Baby Girl Names\n", + "Diljit Dosanjh\n", + "Kidney Disease Symptoms\n", + "Javed Akhtar\n", + "Heart Attack\n", + "Ram Kapoor Diet\n", + "Liver Damage\n", + "Kuberaa Movie Review\n", + "Gauri Khan\n", + "Baba Vanga Prediction\n", + "Baby Boy Names\n", + "Navjot Singh Sidhu\n", + "Housefull 5 Box Office Collection\n", + "DNA Movie Review\n", + "Kidney Damage Symptoms\n", + "Popular Waterfalls In India\n", + "Linkedin Ceo On AI Killing Jobs\n", + "Tesla Robotaxi\n", + "Early Cancer Detection\n", + "Harvard Research Reveals\n", + "American Destinations Explore Without Passport\n", + "Amouranth\n", + "Mouth Larvae\n", + "Doomsday Fish\n", + "Salman Khan AVM\n", + "Ginger Health Tips\n", + "Trending Topics\n", + "Latest Movies\n", + "Bollywood Movies\n", + "Hollywood Movies\n", + "Tamil Movies 2025\n", + "Telugu Movies 2025\n", + "Malayalam Movies 2025\n", + "Kannada Movies 2025\n", + "Marathi Movies 2025\n", + "Bengali Movies 2025\n", + "Top Rated Movies 2025\n", + "Best Hindi Movies\n", + "Best English Movies\n", + "Best Telugu Movies\n", + "Best Tamil Movies\n", + "Best Malayalam 
Movies\n", + "Best Kannada Movies\n", + "Best Bengali Movies\n", + "Upcoming Hindi Movies\n", + "Best Movies Of All Time\n", + "Best Hindi Movies of All Time\n", + "Latest English Movies\n", + "Latest Malayalam Movies\n", + "English TV News\n", + "Tamil TV News\n", + "Telugu TV News\n", + "Malayalam TV News\n", + "Kannada TV News\n", + "Movie Reviews\n", + "Bhojpuri Cinema News\n", + "Gujarati Cinema News\n", + "Popular Categories\n", + "Viral News\n", + "K Pop News\n", + "Web Series News\n", + "Anime News\n", + "Upcoming English Movies\n", + "Upcoming Tamil Movies\n", + "Upcoming Telugu Movies\n", + "Upcoming Malayalam Movies\n", + "Upcoming Kannada Movies\n", + "Fashion Tips\n", + "Travel News\n", + "Entertainment News\n", + "Bollywood News\n", + "Tollywood News\n", + "Kollywood News\n", + "Mollywood News\n", + "Food News\n", + "Latest Hindi Movies\n", + "Latest Tamil Movies\n", + "Parenting Tips\n", + "Home Remedies\n", + "Weight Loss\n", + "Beauty Tips\n", + "Parenting Tips\n", + "Hindi Videos\n", + "Hindi Video Songs\n", + "Bhojpuri Music Videos\n", + "Latest Telugu Movies\n", + "Bhojpuri Music Video\n", + "Hindi TV News\n", + "Latest News\n", + "NHL free agency turns spicy as Mitch Marner and Connor McDavid eye shorter deals to cash in later\n", + "Olive Ridley turtle washed ashore at Polem\n", + "Who is Thomas Fugate? 
Meet the 22-year-old leading Trump's terrorism unit amid Iran fiasco\n", + "'And that's why Putin's the boss': Trump rebukes former Russian President Medvedev; warns against treating 'N word casually'\n", + "Govt plans ₹10cr road on Bicholim-Dodamarg route\n", + "Former WWE star Batista eyed for Road House 2 sequel\n", + "Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n", + "Andre Agassi and Steffi Graf’s son Jaden Agassi shows love for girlfriend Catherine Holt’s bold new photo from bedroom series\n", + "Is WWE planning to change Cody Rhodes’ iconic entrance theme song ‘Kingdom’?\n", + "Velumani says he didn’t attend RSS event in Coimbatore\n", + "Strait of Hormuz: Oil supply not an issue for India; 'pricing is a bigger concern,' what experts say\n", + "Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n", + "As commissions fall, India’s ride-hailing firms test viability of flat-fee economics\n", + "Analysing what Trump’s strikes mean for Iran\n", + "Trump's clarification on 'Iran regime change' divides MAGA further: JD Vance, Hegseth, Marco Rubio 'humiliated'\n", + "Laughter Chefs 2: Krushna Abhishek roasts Rahul Vaidya for his in-famous feud with cricketer Virat Kohli\n", + "“I could have passed Dan Ticktum”: Edoardo Mortara regrets Attack Mode strategy at Jakarta E-Prix\n", + "India vs England Test: Sunil Gavaskar calls for Rishabh Pant's signature somersault celebration, wicketkeeper politely declines - WATCH\n", + "Copyright © 2025 Bennett, Coleman & Co. Ltd. All rights reserved. 
For reprint rights: Times Syndication Service\n", + "Follow us on\n" + ] + } + ], + "source": [ + "gossip= Website(\"https://timesofindia.indiatimes.com/entertainment\")\n", + "print(gossip.title)\n", + "print(gossip.text)" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "id": "a6f30380-1b91-48e4-9c86-df0369e2e675", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"\"\"\n", + "You are a stylish and culturally aware assistant who specializes in summarizing and discussing fashion trends, celebrity style, entertainment news, and television gossip.\n", + "\n", + "You stay updated on Hollywood, Bollywood, and the television world—including celebrity rumors, drama, reality TV updates, show recaps, and behind-the-scenes stories.\n", + "\n", + "When summarizing content, be engaging, concise, and insightful. Focus on what's trending, who's wearing what, and what everyone is talking about in fashion and entertainment. Maintain a fun yet informative tone, like a pop culture expert writing for a lifestyle magazine.\n", + "\n", + "If content includes TV gossip, highlight key rumors, casting updates, fan reactions, and noteworthy moments from popular shows.\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "30822d5c-d518-451c-b31f-44afa2a3b37a", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(website):\n", + " user_prompt = f\"\"\"The following text is extracted from a website titled: \"{website.title}\".\n", + "\n", + "Please analyze this content and provide a short and engaging summary in **Markdown format**.\n", + "\n", + "If the page contains:\n", + "- 🧵 Fashion trends: mention standout styles, designers, or events.\n", + "- 🗣️ TV gossip: highlight any drama, casting news, or fan reactions.\n", + "- 🎬 Celebrity updates (Hollywood/Bollywood): include relevant quotes, fashion moments, or event mentions.\n", + "- 📺 Show recaps: summarize what happened and any major twists.\n", + "\n", 
+ "Keep the summary clear, fun, and informative. Use bullet points if multiple themes appear. If there is no meaningful content, say: *“No relevant summary could be generated.”*\n", + "\n", + "Website Content:\n", + "{website.text}\n", + "\"\"\"\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "id": "5a25e90f-20a0-44ac-a96c-575ae974a45f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following text is extracted from a website titled: \"Latest and Trending Entertainment News, Celebrity News, Movie News, Breaking News | Entertainment - Times of India\".\n", + "\n", + "Please analyze this content and provide a short and engaging summary in **Markdown format**.\n", + "\n", + "If the page contains:\n", + "- 🧵 Fashion trends: mention standout styles, designers, or events.\n", + "- 🗣️ TV gossip: highlight any drama, casting news, or fan reactions.\n", + "- 🎬 Celebrity updates (Hollywood/Bollywood): include relevant quotes, fashion moments, or event mentions.\n", + "- 📺 Show recaps: summarize what happened and any major twists.\n", + "\n", + "Keep the summary clear, fun, and informative. Use bullet points if multiple themes appear. 
If there is no meaningful content, say: *“No relevant summary could be generated.”*\n", + "\n", + "Website Content:\n", + "Sign In\n", + "TOI\n", + "Go to\n", + "TOI\n", + "Etimes\n", + "home\n", + "cinema\n", + "news\n", + "movie reviews\n", + "movie listings\n", + "box office\n", + "anime\n", + "previews\n", + "did you know\n", + "videos\n", + "showtimes\n", + "blogs\n", + "awards\n", + "News\n", + "entertainment\n", + "Trending\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Javed Akhtar\n", + "Diljit Dosanjh\n", + "Jaideep Ahlawat\n", + "Karisma Kapoor\n", + "Gauri Khan\n", + "Blake Lively\n", + "Trisha Krishnan\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Housefull 5\n", + "Kuberaa Movie Review\n", + "Sitaare Zameen Par Movie Review\n", + "Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n", + "Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. 
Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n", + "Previous\n", + "Sonakshi breaks silence on her rift with Luv and Kussh\n", + "Madhuri once chased Aamir with hockey stick for THIS reason\n", + "Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n", + "Big B's savage reply to troll over cybercrime callertune\n", + "Anushka on keeping kids Vamika, Akaay away from public eye\n", + "Apoorva Mukhija recalls witnessing gender bias at home\n", + "Danish influencer seeks help to find papads from Big B\n", + "Sunjay Kapur's reception pics with Priya Sachdev goes viral\n", + "Big B schools trolls commenting 'buddha sathiya gaya hai'\n", + "Anushka on how she and Virat divide parenting duties\n", + "Brahmaji reacts to Vishnu's 7,000-acre land in New Zealand\n", + "Diljit says THIS amidst trolling for working with Hania\n", + "Riddhi found it ridiculous to like SRK's mother in Jawan\n", + "Priya Sachdev once called husband Sunjay Kapur ‘misunderstood’\n", + "Next\n", + "1\n", + "2\n", + "3\n", + "Hindi\n", + "See All\n", + "Sudhanshu: At 52, John, Dino all of them look like rockstars - EXCLUSIVE\n", + "Sudhanshu Pandey, recognized from 'Band Of Boys' and 'Anupama', defies his 50 years with his fitness. He credits his peers like Dino Moria, Arjun Rampal, and John Abraham for inspiring him to maintain a fit and youthful appearance. Pandey also admires Anil Kapoor's energy and dedication, motivating him to continue prioritizing fitness and inspiring others.\n", + "Sonakshi breaks silence on her rift with Luv and Kussh\n", + "Madhuri once chased Aamir with hockey stick for THIS reason\n", + "Ranbir-Raj Kapoor, Diljit-Hania, Samay-IGL: Top 5 news\n", + "Anushka on keeping kids Vamika, Akaay away from public eye\n", + "Anushka Sharma and Virat Kohli are committed to shielding their children, Vamika and Akaay, from the constant glare of public attention. 
In a recent interview, Anushka emphasized the couple's focus on instilling strong values and ensuring a normal upbringing for their kids.\n", + "Apoorva Mukhija recalls witnessing gender bias at home\n", + "Regional\n", + "When Samantha’s class 10 mark sheet got leaked\n", + "Throwback to when a nostalgic memory made its way across the internet — Samantha Ruth Prabhu’s Class 10 mark sheet! The actress’s charming on-screen presence and grounded personality were once again in the spotlight as her old school report card began doing the rounds on social media.\n", + "Actor Tushar Ghadigaonkar passes away at 34\n", + "‘Kuberaa’ Twitter review: Netizens calls it a ‘Blockbuster’\n", + "Mammootty’s health- Brittas says actor doing well\n", + "Kavya Madhavan’s father P. Madhavan passes away\n", + "‘The Raja Saab’ teaser: Prabhas shines in this horror comedy\n", + "Mammootty’s father-in-law P S Abu passes away\n", + "Videos\n", + "See All\n", + "Previous\n", + "03:07\n", + "Ananya Panday’s Garden Bond With Parrots Wins Hearts\n", + "88 views | 2 hours ago\n", + "03:14\n", + "Sameera Reddy’s Healing Journey Through Yoga\n", + "31 views | 2 hours ago\n", + "03:13\n", + "Kriti Kharbanda’s Modern Maharani Look Stuns Instagram\n", + "26 views | 2 hours ago\n", + "03:12\n", + "Bobby Deol Meets Diljit Dosanjh: Punjabi Power Goes Viral\n", + "81 views | 2 hours ago\n", + "03:19\n", + "‘Sitaare Zameen Par’: Riteish Deshmukh’s Emotional Shoutout For Genelia’s Big Win\n", + "162 views | 2 hours ago\n", + "03:26\n", + "Varun Dhawan Stuns With 50 Push-Ups Alongside Army Cadets on Border 2 Set\n", + "21 views | 2 hours ago\n", + "03:00\n", + "VIDYA BALAN TURNS HEADS WITH CASUAL AIRPORT LOOK\n", + "16 views | 2 hours ago\n", + "03:05\n", + "MANDHIRA KAPUR BREAKS DOWN IN EMOTIONAL POST FOR LATE BROTHER SUNJAY KAPUR\n", + "1.2K views | 2 hours ago\n", + "03:28\n", + "SALMAN KHAN TAKES A BRUTAL DIG AT SOHAIL’S DIVORCE ON NATIONAL TV\n", + "185 views | 2 hours ago\n", + "03:15\n", + 
"RAJINIKANTH CAUSES FAN RIOT DURING ‘JAILER 2’ SHOOT IN MYSORE\n", + "26 views | 2 hours ago\n", + "03:10\n", + "IBRAHIM ALI KHAN KISSES HIS DOG AT AIRPORT IN HEARTWARMING FAREWELL\n", + "20 views | 3 hours ago\n", + "03:09\n", + "ANUPAMAA SET GUTTED IN MASSIVE FIRE | CREW ESCAPES, CINE BODY DEMANDS ACTION\n", + "1.2K views | 3 hours ago\n", + "Next\n", + "1\n", + "2\n", + "3\n", + "4\n", + "5\n", + "6\n", + "7\n", + "8\n", + "9\n", + "10\n", + "11\n", + "World\n", + "See All\n", + "Aamir to Tom: Celebs on a mission to 'Save Cinema'\n", + "'How to Train Your Dragon' beats '28 Years Later' and 'Elio' to top the US box office on second weekend\n", + "Blake Lively is heartbroken after friendship ends with Taylor Swift; accepts the music mogul won't be returning - Deets inside\n", + "Selena-Hailey UNFOLLOW each other amid Bieber drama\n", + "Judge gives Baldoni access to Blake-Taylor messages\n", + "Trending Now\n", + "# Sidharth Malhotra-Kiara Advani\n", + "# AbRam Khan-Taimur Ali Khan\n", + "# Janhvi Kapoor\n", + "# Salman Khan\n", + "# Hema Malini\n", + "# Salman Khan\n", + "# Gauri Khan\n", + "# Shah Rukh Khan\n", + "# Chahatt Khanna\n", + "Visual Stories\n", + "See All\n", + "Previous\n", + "Kuberaa’s Sameera to Pushpa’s Srivalli: Rashmika Mandanna’s most iconic on-screen avatars\n", + "Ahaana Krishna’s ethereal photo series is straight out of a dream\n", + "Rashmika Mandanna to Rakul Preet Singh: Best pictures of the week featuring south actresses\n", + "Gauri Khan's most loved saree looks - An ode to modern day elegance\n", + "​South Indian beauties whose smiles will light up your Monday\n", + "Karishma Tanna Slays Every Frame\n", + "Tamannaah Bhatia’s traditional looks\n", + "Malavika Mohanan's radiant pics\n", + "​Neha Shetty stuns in every shade of blue\n", + "Thalapathy Vijay’s top 10 blockbuster movies worth re-watching!\n", + "​In pic: Mesmerizing looks of Shruti Haasan​\n", + "Dushara Vijayan’s Most Elegant Fashion Moments\n", + "Next\n", + "1\n", + 
"2\n", + "3\n", + "More Stories\n", + "Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n", + "Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n", + "Mohanlal declines to continue as president at AMMA’s general body meeting- Deets Inside\n", + "Blockbusters Ranbir Kapoor turned down: Films that became hits without him\n", + "Anushka Sharma reveals why she and Virat Kohli are keeping their children Vamika and Akaay away from the public eye: 'We don't want to raise brats'\n", + "Apoorva Mukhija recalls witnessing gender bias at home: 'My mother did it all, but father got credit for showing up at PTMs'\n", + "Amitabh Bachchan gives a savage reply to a troll over his viral cybercrime caller tune: 'Sarkar ko bolo bhai..'\n", + "Danish influencer asks fans to help her find papads from Amitabh Bachchan; netizens say 'he also used to grow basmati rice'\n", + "Days after his untimely demise, Sunjay Kapur's reception photos with Priya Sachdev goes viral; Looked dashing in hand embroidered shoes, written 'I do'\n", + "Priyanka Chopra Jonas recollects walking into a trap set by John Cena, Idris Elba on sets of 'Heads of State'\n", + "Bobby Deol's London vacation sparks fan frenzy: viral video shows actor posing for selfies outside restaurant\n", + "Amitabh Bachchah gives befitting replies to 'buddha sathiya gaya hai', ‘ganja’ comments by trolls: 'Ek din, Bhagwan naa kare voh din jaldi aaye...'\n", + "Sai Pallavi’s best performances\n", + "Brahmaji clears the air about Vishnu Manchu purchasing 7,000-acre land in New Zealand: 'I was pulling their leg as usual...'\n", + "Anushka Sharma reveals how she and Virat Kohli divide the parenting duties: 'I will be the primary caregiver, he plays round the year'\n", + "Ranbir Kapoor's 'Awara' look sparks rumours of Raj Kapoor tribute, Diljit Dosanjh slammed for working with Hania 
Aamir in Sardaar Ji 3: Top 5 news\n", + "Has Kiara Advani been approached to play Meena Kumari in her biopic? Here's what we know\n", + "Top 5 psychological Anime every thriller fan must watch\n", + "Load More Stories\n", + "# Latest Movies 2025\n", + "# Best Bollywood Movies 2025\n", + "# Hollywood Movie 2025\n", + "# Tamil Movies 2025\n", + "# Telugu Movies 2025\n", + "# Malayalam Movies 2025\n", + "# Kannada Movies 2025\n", + "# Marathi Movies 2025\n", + "# Bengali Movies 2025\n", + "# Top Rated Movies 2025\n", + "# Best Hindi Movies\n", + "# Best English Movies\n", + "Hot on the Web\n", + "Salman Khan\n", + "Karisma Kapoor\n", + "Jaideep Ahlawat\n", + "Blood Pressure\n", + "Big Cat Species\n", + "Trisha\n", + "Sitaare Zameen Par Review\n", + "Ancient Indigenous Tribes\n", + "Hair Growth Tips\n", + "Kidney Health\n", + "Kuberaa Review\n", + "Blake Lively\n", + "Reverse Fatty Liver\n", + "Skincare Hacks\n", + "Kuberaa Box Office Collection\n", + "Sitaare Zameen Par Box Office Collection\n", + "Baby Girl Names\n", + "Diljit Dosanjh\n", + "Kidney Disease Symptoms\n", + "Javed Akhtar\n", + "Heart Attack\n", + "Ram Kapoor Diet\n", + "Liver Damage\n", + "Kuberaa Movie Review\n", + "Gauri Khan\n", + "Baba Vanga Prediction\n", + "Baby Boy Names\n", + "Navjot Singh Sidhu\n", + "Housefull 5 Box Office Collection\n", + "DNA Movie Review\n", + "Kidney Damage Symptoms\n", + "Popular Waterfalls In India\n", + "Linkedin Ceo On AI Killing Jobs\n", + "Tesla Robotaxi\n", + "Early Cancer Detection\n", + "Harvard Research Reveals\n", + "American Destinations Explore Without Passport\n", + "Amouranth\n", + "Mouth Larvae\n", + "Doomsday Fish\n", + "Salman Khan AVM\n", + "Ginger Health Tips\n", + "Trending Topics\n", + "Latest Movies\n", + "Bollywood Movies\n", + "Hollywood Movies\n", + "Tamil Movies 2025\n", + "Telugu Movies 2025\n", + "Malayalam Movies 2025\n", + "Kannada Movies 2025\n", + "Marathi Movies 2025\n", + "Bengali Movies 2025\n", + "Top Rated Movies 2025\n", + "Best Hindi 
Movies\n", + "Best English Movies\n", + "Best Telugu Movies\n", + "Best Tamil Movies\n", + "Best Malayalam Movies\n", + "Best Kannada Movies\n", + "Best Bengali Movies\n", + "Upcoming Hindi Movies\n", + "Best Movies Of All Time\n", + "Best Hindi Movies of All Time\n", + "Latest English Movies\n", + "Latest Malayalam Movies\n", + "English TV News\n", + "Tamil TV News\n", + "Telugu TV News\n", + "Malayalam TV News\n", + "Kannada TV News\n", + "Movie Reviews\n", + "Bhojpuri Cinema News\n", + "Gujarati Cinema News\n", + "Popular Categories\n", + "Viral News\n", + "K Pop News\n", + "Web Series News\n", + "Anime News\n", + "Upcoming English Movies\n", + "Upcoming Tamil Movies\n", + "Upcoming Telugu Movies\n", + "Upcoming Malayalam Movies\n", + "Upcoming Kannada Movies\n", + "Fashion Tips\n", + "Travel News\n", + "Entertainment News\n", + "Bollywood News\n", + "Tollywood News\n", + "Kollywood News\n", + "Mollywood News\n", + "Food News\n", + "Latest Hindi Movies\n", + "Latest Tamil Movies\n", + "Parenting Tips\n", + "Home Remedies\n", + "Weight Loss\n", + "Beauty Tips\n", + "Parenting Tips\n", + "Hindi Videos\n", + "Hindi Video Songs\n", + "Bhojpuri Music Videos\n", + "Latest Telugu Movies\n", + "Bhojpuri Music Video\n", + "Hindi TV News\n", + "Latest News\n", + "NHL free agency turns spicy as Mitch Marner and Connor McDavid eye shorter deals to cash in later\n", + "Olive Ridley turtle washed ashore at Polem\n", + "Who is Thomas Fugate? 
Meet the 22-year-old leading Trump's terrorism unit amid Iran fiasco\n", + "'And that's why Putin's the boss': Trump rebukes former Russian President Medvedev; warns against treating 'N word casually'\n", + "Govt plans ₹10cr road on Bicholim-Dodamarg route\n", + "Former WWE star Batista eyed for Road House 2 sequel\n", + "Sonakshi Sinha breaks silence on her rumoured rift with brothers Luv and Kussh Sinha: 'My effort is always to support them...'\n", + "Andre Agassi and Steffi Graf’s son Jaden Agassi shows love for girlfriend Catherine Holt’s bold new photo from bedroom series\n", + "Is WWE planning to change Cody Rhodes’ iconic entrance theme song ‘Kingdom’?\n", + "Velumani says he didn’t attend RSS event in Coimbatore\n", + "Strait of Hormuz: Oil supply not an issue for India; 'pricing is a bigger concern,' what experts say\n", + "Madhuri Dixit once chased Aamir Khan with a hockey stick for THIS reason on sets of Dil: 'People fool you and you believe them'\n", + "As commissions fall, India’s ride-hailing firms test viability of flat-fee economics\n", + "Analysing what Trump’s strikes mean for Iran\n", + "Trump's clarification on 'Iran regime change' divides MAGA further: JD Vance, Hegseth, Marco Rubio 'humiliated'\n", + "Laughter Chefs 2: Krushna Abhishek roasts Rahul Vaidya for his in-famous feud with cricketer Virat Kohli\n", + "“I could have passed Dan Ticktum”: Edoardo Mortara regrets Attack Mode strategy at Jakarta E-Prix\n", + "India vs England Test: Sunil Gavaskar calls for Rishabh Pant's signature somersault celebration, wicketkeeper politely declines - WATCH\n", + "Copyright © 2025 Bennett, Coleman & Co. Ltd. All rights reserved. 
For reprint rights: Times Syndication Service\n", + "Follow us on\n", + "\n" + ] + } + ], + "source": [ + "print(user_prompt_for(gossip))" + ] + }, + { + "cell_type": "code", + "execution_count": 129, + "id": "c039ab7c-88ee-475d-a93e-b26711d3ed4b", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "id": "dd1fee35-6cc9-4995-8b5e-b93d80488364", + "metadata": {}, + "outputs": [], + "source": [ + "def summarize(url):\n", + " website = Website(url)\n", + " response = openai.chat.completions.create(\n", + " model = \"llama3.2\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed09dad8-93bb-417e-b07b-183d2eba1ec5", + "metadata": {}, + "outputs": [], + "source": [ + "summarize(\"https://timesofindia.indiatimes.com/entertainment\")" + ] + }, + { + "cell_type": "code", + "execution_count": 139, + "id": "16a57eed-eba5-4f75-84f2-d44a67b36047", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summary(url):\n", + " summary = summarize(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25af6217-6944-4c95-b156-0899dfcf0b83", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://timesofindia.indiatimes.com/entertainment\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29daa2d4-9d92-40ae-a0c4-dd2fdacf3f80", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/environment.yml b/environment.yml index 1247085..470b64b 100644 --- a/environment.yml +++ b/environment.yml @@ -17,17 +17,13 @@ dependencies: - scikit-learn - chromadb - jupyter-dash - - sentencepiece - pyarrow - - faiss-cpu - pip: - beautifulsoup4 - plotly - - bitsandbytes - transformers - sentence-transformers - - datasets - - accelerate + - datasets==3.6.0 - openai - anthropic - google-generativeai @@ -44,7 +40,7 @@ dependencies: - langchain-openai - langchain-chroma - langchain-community - - faiss-cpu - feedparser - twilio - pydub + - protobuf==3.20.2 diff --git a/extras/trading/prototype_trader.ipynb b/extras/trading/prototype_trader.ipynb index 30358b9..1143a10 100644 --- a/extras/trading/prototype_trader.ipynb +++ b/extras/trading/prototype_trader.ipynb @@ -346,7 +346,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/requirements.txt b/requirements.txt index 1d98c3e..edcb3de 100644 --- a/requirements.txt +++ b/requirements.txt @@ -14,19 +14,15 @@ tqdm openai gradio langchain -tiktoken -faiss-cpu +langchain-core +langchain-text-splitters langchain-openai -langchain_experimental -langchain_chroma -langchain[docarray] -datasets -sentencepiece +langchain-chroma +langchain-community +datasets==3.6.0 matplotlib google-generativeai anthropic -scikit-learn -unstructured chromadb plotly jupyter-dash @@ -34,11 +30,9 @@ beautifulsoup4 pydub modal ollama -accelerate -sentencepiece -bitsandbytes psutil setuptools speedtest-cli sentence_transformers feedparser +protobuf==3.20.2 diff --git a/week1/community-contributions/City Economy Summarizer.ipynb b/week1/community-contributions/City Economy Summarizer.ipynb new file mode 100644 index 
0000000..9d8e9b5 --- /dev/null +++ b/week1/community-contributions/City Economy Summarizer.ipynb @@ -0,0 +1,273 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4e66a6eb-e44a-4dc3-bad7-82e27d45155d", + "metadata": {}, + "source": [ + "# Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98bf393c-358e-4ee1-b15b-96dfec323734", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "markdown", + "id": "f92034ed-a2e6-444a-8008-291ba3f80561", + "metadata": {}, + "source": [ + "# OpenAI API Key" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a084b35d-19e9-4b48-bb06-d2c9e4474b20", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "markdown", + "id": "32b35ea0-e4ca-492a-94af-822ec61468a0", + "metadata": {}, + "source": [ + "# About..." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c660b786-af88-4134-b958-ffbf7a7b2904", + "metadata": {}, + "source": [ + "In this project I use the code from day 1 for something I do at work. I'm a real estate appraiser and when I prepare a valuation for some real estate, I analyze the local market, and in particular the city where the property is located. I then gather economy-related information and create a report from it. I'm based in Poland, so the report is in Polish. Here, I want to ask the model to make such a report for me, using the official website of the city and its related Wikipedia article." + ] + }, + { + "cell_type": "markdown", + "id": "09f32b5a-4d0a-4fec-a2f8-5d323ca2745d", + "metadata": {}, + "source": [ + "# The Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0fb8fe1-f052-4426-8531-5520d5295807", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a2cca4b-8cd0-4c1a-a01c-1da10199236c", + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "class Website:\n", + "\n", + " def __init__(self, url):\n", + " \"\"\"\n", + " Create this Website object from the given url using the BeautifulSoup library\n", + " \"\"\"\n", + " self.url = url\n", + " response = requests.get(url, headers=headers)\n", + " soup = BeautifulSoup(response.content, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c73e91c8-5805-4c9f-9bbb-b4e9c1e7bf12", + "metadata": {}, + "outputs": [], + 
"source": [ + "system_prompt = \"\"\"You are an analyst and real estate appraiser who checks out the official websites \n", + "of cities as well as articles related to these cities on Wikipedia, searching the particular pages \n", + "of the official website and the Wikipedia article for economic data, in particular the \n", + "demographic structure of the city, its area, and how it's subdivided into built-up area, \n", + "rural area, forests, and so on, provided this kind of information is available. \n", + "The most important information you want to find is that related to the real estate market in the city, \n", + "but also the general economy of the city, so what kind of factories or companies there are, commerce, \n", + "business conditions, transportation, economic growth in recent years, and recent investments. \n", + "wealth of the inhabitants, and so on, depending on what kind of information is available on the website. \n", + "Combine the information found on the official website with the information found on Wikipedia, and in case\n", + "of discrepancies, the official website should take precedence. If any of the information is missing,\n", + "just omit it entirely and don't mention that it is missing, just don't write about it at all.\n", + "When you gather all the required information, create a comprehensive report presenting \n", + "the data in a clear way, using markdown, in tabular form where it makes sense. \n", + "The length of the report should be about 5000 characters. And one more thing, the report should be entirely \n", + "in Polish. 
\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e8015e8d-1655-4477-a111-aa8dd584f5eb", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(city, city_website, wiki_website):\n", + " user_prompt = f\"You are looking at the official website of the city {city}, and its wiki article.\"\n", + " user_prompt += f\"\\nThe contents of this website is as follows: \\\n", + "please provide a comprehensive report of economy-related data for the city of {city}, available on the \\\n", + "particular pages and subpages of its official website and Wikipedia in markdown. \\\n", + "Add tables if it makes sense for the data. The length of the report should be about 5000 characters. \\\n", + "The report should be in Polish.\\n\\n\"\n", + " user_prompt += city_website.text\n", + " user_prompt += wiki_website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b55bd66b-e997-4d64-b5d5-679098013b9f", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(city, city_website, wiki_website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(city, city_website, wiki_website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5f1f218-d6a9-4a9e-be7e-b4f41e7647e5", + "metadata": {}, + "outputs": [], + "source": [ + "def report(url_official, url_wiki, city):\n", + " city_website = Website(url_official)\n", + " wiki_website = Website(url_wiki)\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(city, city_website, wiki_website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "markdown", + "id": "08b47ec7-d00f-44e4-bbe2-580c8efd88e5", + "metadata": {}, + "source": [ + "# Raw Result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"830f0746-08a7-43ae-bd40-78d4a4c5d3e5", + "metadata": {}, + "outputs": [], + "source": [ + "report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")" + ] + }, + { + "cell_type": "markdown", + "id": "a3630ac4-c103-4b84-a1a2-c246a702346e", + "metadata": {}, + "source": [ + "# Polished Result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b89dd543-998d-4466-abd8-cc785118d3e4", + "metadata": {}, + "outputs": [], + "source": [ + "def display_report(url_official, url_wiki, city):\n", + " rep = report(url_official, url_wiki, city)\n", + " display(Markdown(rep))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "157926f3-ba67-4d4b-abbb-24a2dcd85a8b", + "metadata": {}, + "outputs": [], + "source": [ + "display_report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "727d2283-e74c-4e74-86f2-759b08f1427a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_Day2_Outputs.pdf b/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_Day2_Outputs.pdf new file mode 100644 index 0000000..e10cbab Binary files /dev/null and b/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_Day2_Outputs.pdf differ diff --git a/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_RedditAnalysis_gpt.ipynb 
b/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_RedditAnalysis_gpt.ipynb new file mode 100644 index 0000000..6f8304e --- /dev/null +++ b/week1/community-contributions/Day1_2_Reddit_Analysis/Day1_RedditAnalysis_gpt.ipynb @@ -0,0 +1,409 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9413d98a-352a-47b7-b84b-5b4a61b3c002", + "metadata": {}, + "source": [ + "# Reddit Post Analysis" + ] + }, + { + "cell_type": "markdown", + "id": "97ebfa77-33f8-4cd1-9204-d73aeefc0fea", + "metadata": {}, + "source": [ + "1. **Sets the Role and Tone** \n", + " Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n", + "\n", + "2. **Guides Sentiment Analysis** \n", + " Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n", + "\n", + "3. **Groups and Labels Themes** \n", + " Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n", + "\n", + "4. **Creates an Insights Table** \n", + " Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion’s core insights.\n", + "\n", + "5. **Describes Community Dynamics** \n", + " Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "425868ba-faec-4754-87f5-650f7529b319", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9596f40f-5add-4602-91e3-cd7d2c753c33", + "metadata": {}, + "outputs": [], + "source": [ + "import praw\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display, Image\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "markdown", + "id": "9e1a9999-4aad-416d-90fe-3b0841a4f455", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Load Credentials" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "847843ce-ebf9-4f48-b625-82e3ed687c81", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c615d79b-55a0-4eb1-ad8b-a2e28c11b49e", + "metadata": {}, + "outputs": [], + "source": [ + "reddit = praw.Reddit(\n", + " client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n", + " client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n", + " 
user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n", + " username=os.getenv(\"REDDIT_USERNAME\"),\n", + " password=os.getenv(\"REDDIT_PASSWORD\")\n", + ")\n", + "\n", + "print(\"Authenticated as:\", reddit.user.me())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6df2224d-ecfd-4e07-9bc8-102eff257d69", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "21ba0482-79e5-45ec-81d7-8611312c6b9e", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Reddit Post Scraper" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8dc5276d-2d38-4651-9db0-c353076d6096", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "class RedditPostScraper:\n", + " def __init__(self, url):\n", + " self.submission = reddit.submission(url=url)\n", + " self.submission.comments.replace_more(limit=None)\n", + " self._title = self.submission.title\n", + " self._text = self.submission.selftext\n", + " self._comments = \"\"\n", + " self._formatted_comments = [] # for reprocessing if needed\n", + "\n", + " def _generate_comments(self):\n", + " comments_list = []\n", + " for top_level in self.submission.comments:\n", + " top_author = top_level.author.name if top_level.author else \"[deleted]\"\n", + " comments_list.append(f\"{top_author}: {top_level.body}\")\n", + "\n", + " for reply in top_level.replies:\n", + " reply_author = reply.author.name if reply.author else \"[deleted]\"\n", + " comments_list.append(\n", + " f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n", + " )\n", + " self._formatted_comments = comments_list\n", + "\n", + " def title(self):\n", + " return f\"Title:\\n{self._title}\\n{self._text}\"\n", + "\n", + " def comments(self, max_words=None):\n", + " if not self._formatted_comments:\n", + " self._generate_comments()\n", + "\n", + " output_comments = []\n", + " total_words = 0\n", + "\n", + " for comment in 
self._formatted_comments:\n", + " word_count = len(comment.split())\n", + " if max_words and total_words + word_count > max_words:\n", + " break\n", + " output_comments.append(comment)\n", + " total_words += word_count\n", + "\n", + " return \"Text:\\n\" + \"\\n\\n\".join(output_comments)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3121cad0-4e2c-4d78-88e2-e72c6b99e2bf", + "metadata": {}, + "outputs": [], + "source": [ + "# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n", + "# print(post.title())\n", + "# print(post.comments(2000))" + ] + }, + { + "cell_type": "markdown", + "id": "569760f6-5d68-40c1-9227-374c8e04d70a", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### System and User Prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22c0e89a-c076-4616-ae9b-b4cd588f39ad", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n", + "\n", + "Your response **must be in well-formatted Markdown**. Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n", + "\n", + "Perform the following tasks:\n", + "\n", + "---\n", + "\n", + "## 1. Overall Sentiment Breakdown\n", + "\n", + "- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n", + "- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n", + "- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n", + "\n", + "---\n", + "\n", + "## 2. 
Thematic Grouping of Comments\n", + "\n", + "- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n", + "- For each theme, create a subheading.\n", + "- Under each:\n", + " - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n", + " - Include 1–2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n", + "\n", + "---\n", + "\n", + "## 3. Insights Table\n", + "\n", + "If applicable, extract and structure insights into the following table. Leave any column empty if it’s not relevant to the post type:\n", + "\n", + "| Perspectives/ Motivations | Pains/ Concerns/ Frustrations | Tools / References / Resources | Suggestions / Solutions |\n", + "|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n", + "| - ... | - ... | - ... | - ... |\n", + "\n", + "- Populate this table with concise bullet points.\n", + "- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n", + "\n", + "---\n", + "\n", + "## 4. Tone and Community Dynamics\n", + "\n", + "- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n", + "- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n", + "- Include casual or emotional comments if they illustrate community personality.\n", + "\n", + "---\n", + "\n", + "**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. 
Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n", + "\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf9d15d6-4f9a-45fd-96ed-d7097c7f03d6", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(post):\n", + " user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n", + " user_prompt += \"Below are the responses from various users. Analyze them according to the system prompt provided.\\n\"\n", + " user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n", + " user_prompt += post.comments(4000)\n", + " return user_prompt\n" + ] + }, + { + "cell_type": "markdown", + "id": "f18c581c-ea30-4a43-9223-8c184dedb37e", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Generating Responses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aadf8f41-aca3-41be-b18b-cb49a67ba256", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "feac9c61-f1f8-48f0-9189-bc60ac7fd755", + "metadata": {}, + "outputs": [], + "source": [ + "def summarize(url):\n", + " website = RedditPostScraper(url)\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12b1d6dd-2d62-4136-8b8e-0a92134d4261", + "metadata": {}, + "outputs": [], + "source": [ + "# summarize(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"id": "dd48253d-cdca-4c29-b4f2-c470290de63b", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summary(url):\n", + " summary = summarize(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "markdown", + "id": "7e0825a9-a3b0-43a0-b69c-cf0ce81d77d2", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Example Usage" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a61a482-ec70-4e29-b99c-0d82298a32b1", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a336777-a06e-4535-b68d-a6470eb1d701", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://www.reddit.com/r/AskReddit/comments/1lam10k/how_do_you_feel_about_the_no_kings_protest/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6b12074-ffb6-4a6d-bdd2-bbbb78f82781", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://www.reddit.com/r/canada/comments/1laq8ok/donald_trump_is_a_convicted_felon_could_he_be/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63b805e5-183f-439b-bfe7-9ee6bbe4a5b4", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/Day1_2_Reddit_Analysis/README.md b/week1/community-contributions/Day1_2_Reddit_Analysis/README.md new file mode 100644 index 0000000..3c2f2ed --- /dev/null +++ 
b/week1/community-contributions/Day1_2_Reddit_Analysis/README.md @@ -0,0 +1,59 @@ +# Reddit Post Analyzer – GPT & Open Source Approaches + +This project consists of two Jupyter notebooks that demonstrate different methods for analyzing Reddit post data: + +- **Day 1:** `Day1_RedditAnalysis_gpt.ipynb` – Uses GPT-based sentiment and insight extraction from Reddit posts and comments. +- **Day 2:** `day2_RedditAnalysis_opensource.ipynb` – Implements an open-source alternative for Reddit data processing and basic sentiment/thematic analysis. + +--- + +## 📌 Features + +- Reddit post and comment scraping using PRAW +- GPT-based sentiment summarization and insight structuring (Day 1) +- Open-source sentiment and thematic analysis pipeline (Day 2) +- Markdown-formatted output suitable for reporting + +--- + +## 🛠️ Setup Instructions + +### Reddit API Credentials Setup + +To access Reddit data, you need to create a Reddit app and obtain credentials: + +#### Steps to Get Your Reddit API Keys: + +1. Go to [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps). +2. Scroll to the bottom and click **“create another app”** or **“create app”**. +3. Choose the **“script”** option. +4. Fill in the following fields: + - **name:** e.g., Reddit Analyzer + - **redirect uri:** `http://localhost:8080` + - **description:** *(optional)* +5. After creating the app, you will get: + - **client ID** (displayed under the app name) + - **client secret** +6. Keep note of your Reddit **username** and **password** (these are used with script apps) + +#### Store your credentials in a `.env` file: + +Create a `.env` file in the root directory with the following format: + +```env +REDDIT_CLIENT_ID=your_client_id +REDDIT_CLIENT_SECRET=your_client_secret +REDDIT_USER_AGENT=your_custom_user_agent +REDDIT_USERNAME=your_reddit_username +REDDIT_PASSWORD=your_reddit_password +``` + +These will be securely loaded into your script using the `dotenv` package. 
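As a rough illustration of how these values reach a notebook, the sketch below checks that all five variables are present before a Reddit client is created. The helper name `missing_reddit_vars` is hypothetical and is not part of either notebook, and the `load_dotenv` call assumes the `python-dotenv` package is installed.

```python
import os

# The five variable names from the .env template above.
REQUIRED_VARS = [
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
]


def missing_reddit_vars(env=None):
    """Hypothetical helper: return the required names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    try:
        # Assumption: python-dotenv is installed; it copies .env entries into os.environ.
        from dotenv import load_dotenv
        load_dotenv(override=True)
    except ImportError:
        print("python-dotenv is not installed; checking the existing environment only")

    missing = missing_reddit_vars()
    if missing:
        print("Missing Reddit credentials:", ", ".join(missing))
    else:
        print("All Reddit credentials are set")
```

Running a check like this before constructing `praw.Reddit(...)` makes a missing or blank entry easier to spot than an authentication error later on.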
+ +--- + +## 🚀 Running the Notebooks + +Make sure to activate your virtual environment (if applicable), install dependencies, and run the notebooks cell by cell in **Jupyter Lab** or **VS Code**. + +--- diff --git a/week1/community-contributions/Day1_2_Reddit_Analysis/day2_RedditAnalysis_opensource.ipynb b/week1/community-contributions/Day1_2_Reddit_Analysis/day2_RedditAnalysis_opensource.ipynb new file mode 100644 index 0000000..1010512 --- /dev/null +++ b/week1/community-contributions/Day1_2_Reddit_Analysis/day2_RedditAnalysis_opensource.ipynb @@ -0,0 +1,436 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8c22d46c-d08b-4dbd-bdf5-338adce95e1a", + "metadata": {}, + "source": [ + "# Reddit Post Analysis using open source models (llama 3.2, deepseek r1, mistral:7b)" + ] + }, + { + "cell_type": "markdown", + "id": "bfc5335b-53a8-4cd1-b1a8-95496ae4856d", + "metadata": {}, + "source": [ + "1. **Sets the Role and Tone** \n", + " Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n", + "\n", + "2. **Guides Sentiment Analysis** \n", + " Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n", + "\n", + "3. **Groups and Labels Themes** \n", + " Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n", + "\n", + "4. **Creates an Insights Table** \n", + " Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion’s core insights.\n", + "\n", + "5. **Describes Community Dynamics** \n", + " Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6104a23f-c43a-48dc-a018-cddb8bea75d1", + "metadata": {}, + "source": [ + "#### Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "import praw\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n", + "import ollama" + ] + }, + { + "cell_type": "markdown", + "id": "07de5c1d-1930-49ca-a026-2265e5432327", + "metadata": {}, + "source": [ + "#### Load Credentials" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "83fdd570-83a3-4e18-a94e-969c557978d3", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "reddit = praw.Reddit(\n", + " client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n", + " client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n", + " user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n", + " username=os.getenv(\"REDDIT_USERNAME\"),\n", + " password=os.getenv(\"REDDIT_PASSWORD\")\n", + ")\n", + "\n", + "print(\"Authenticated as:\", reddit.user.me())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a8a58d8-6755-4e22-be97-232c2f7ea07c", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "f6b5b086-a4aa-40d2-a721-b3b8781d7ccf", + "metadata": {}, + "source": [ + "#### Reddit Post Scraper" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09c7a428-db62-4353-9fa5-d12bbdc4477c", + "metadata": {}, + "outputs": [], + "source": [ + "class RedditPostScraper:\n", + " def __init__(self, url):\n", + " self.submission = reddit.submission(url=url)\n", + " self.submission.comments.replace_more(limit=None)\n", + " self._title = self.submission.title\n", + " self._text = self.submission.selftext\n", + " self._comments = \"\"\n", + " self._formatted_comments = [] # for 
reprocessing if needed\n", + "\n", + " def _generate_comments(self):\n", + " comments_list = []\n", + " for top_level in self.submission.comments:\n", + " top_author = top_level.author.name if top_level.author else \"[deleted]\"\n", + " comments_list.append(f\"{top_author}: {top_level.body}\")\n", + "\n", + " for reply in top_level.replies:\n", + " reply_author = reply.author.name if reply.author else \"[deleted]\"\n", + " comments_list.append(\n", + " f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n", + " )\n", + " self._formatted_comments = comments_list\n", + "\n", + " def title(self):\n", + " return f\"Title:\\n{self._title}\\n{self._text}\"\n", + "\n", + " def comments(self, max_words=None):\n", + " if not self._formatted_comments:\n", + " self._generate_comments()\n", + "\n", + " output_comments = []\n", + " total_words = 0\n", + "\n", + " for comment in self._formatted_comments:\n", + " word_count = len(comment.split())\n", + " if max_words and total_words + word_count > max_words:\n", + " break\n", + " output_comments.append(comment)\n", + " total_words += word_count\n", + "\n", + " return \"Text:\\n\" + \"\\n\\n\".join(output_comments)" + ] + }, + { + "cell_type": "markdown", + "id": "3cece64a-ca54-4961-b04e-40f8057e2e78", + "metadata": {}, + "source": [ + "#### System and User Prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "029de240-398e-4339-b90c-e6e90a96bcb5", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n", + "Your response **must be in well-formatted Markdown**. 
Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n", + "Perform the following tasks:\n", + "---\n", + "## 1. Overall Sentiment Breakdown\n", + "- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n", + "- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n", + "- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n", + "---\n", + "## 2. Thematic Grouping of Comments\n", + "- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n", + "- For each theme, create a subheading.\n", + "- Under each:\n", + " - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n", + " - Include 1–2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n", + "---\n", + "## 3. Insights Table\n", + "If applicable, extract and structure insights into the following table. Leave any column empty if it’s not relevant to the post type:\n", + "| Perspectives/ Motivations | Pains/ Concerns/ Frustrations | Tools / References / Resources | Suggestions / Solutions |\n", + "|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n", + "| - ... | - ... | - ... | - ... |\n", + "- Populate this table with concise bullet points.\n", + "- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n", + "---\n", + "## 4. 
Tone and Community Dynamics\n", + "- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n", + "- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n", + "- Include casual or emotional comments if they illustrate community personality.\n", + "---\n", + "**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n", + "\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "350d8eea-005b-474e-9b57-cdb4004d8144", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(post):\n", + " user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n", + " user_prompt += \"Below are the responses from various users. 
Analyze them according to the system prompt provided.\\n\"\n", + " user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n", + " user_prompt += post.comments(1000)\n", + " return user_prompt\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf23ed3b-8583-444e-ac62-3d415f771462", + "metadata": {}, + "outputs": [], + "source": [ + "# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n", + "# print(post.title())\n", + "# print(post.comments())" + ] + }, + { + "cell_type": "markdown", + "id": "4e37f2e1-6eef-4c27-a442-97a6ff3dbf2a", + "metadata": {}, + "source": [ + "#### Generating messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0781921b-e4e0-49f8-b34a-fd1017be6150", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "markdown", + "id": "544c81a2-37c2-491e-8ef4-ac5d56173b72", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### llama 3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3dd0a2a-ddf2-4bd1-823d-b49fa44a09ec", + "metadata": {}, + "outputs": [], + "source": [ + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "def summarizellama(url):\n", + " website = RedditPostScraper(url)\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = \"llama3.2\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "717ccb6d-f6c9-4f36-ad69-686f3f1bd26b", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summaryllama(url):\n", + " summary = 
summarizellama(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f981fe9-ed2d-4546-8fb3-c0f8048e3474", + "metadata": {}, + "outputs": [], + "source": [ + "display_summaryllama(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")" + ] + }, + { + "cell_type": "markdown", + "id": "e3091dcf-f8b3-4d1a-a85c-3a9ebed2ac6c", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### deepseek" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55e465fa-e29d-4ed3-8f44-71964d2f866b", + "metadata": {}, + "outputs": [], + "source": [ + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "def summarizedeepseek(url):\n", + " website = RedditPostScraper(url)\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = \"deepseek-r1\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40c26a89-97a8-4883-857a-fb13fea9222d", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summarydeepseek(url):\n", + " summary = summarizedeepseek(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "362b871e-8f4d-47fa-b01d-bbe3082dd271", + "metadata": {}, + "outputs": [], + "source": [ + "display_summarydeepseek(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")" + ] + }, + { + "cell_type": "markdown", + "id": "3841bb1e-e885-4cb5-88f6-b6698ccbb77f", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "#### Mistral" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d913e07-31b4-439d-a861-c4fd99012588", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull mistral:7b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"ab881745-990c-4158-935b-36075c1dacde", + "metadata": {}, + "outputs": [], + "source": [ + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "def summarizeMistral(url):\n", + " website = RedditPostScraper(url)\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = \"mistral:7b\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5de3db6-ba69-43e8-9f6c-0945dbafa308", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summaryMistral(url):\n", + " summary = summarizeMistral(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ea97e30-44be-45dc-ad2f-b6951ecc0190", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "display_summaryMistral(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38e4aabe-b111-4ddb-af6c-6d4ff7d6f26b", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/Invoke LLM model from AWS Bedrock.ipynb b/week1/community-contributions/Invoke LLM model from AWS Bedrock.ipynb new file mode 100644 index 0000000..6948253 --- /dev/null +++ b/week1/community-contributions/Invoke LLM model from AWS Bedrock.ipynb @@ -0,0 +1,167 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 4, + "id": 
"9138adfe-71b0-4db2-a08f-dd9e472fdd63", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import boto3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15d71dd6-cc03-485e-8a34-7a33ed5dee0e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "1358921d-173b-4d5d-828c-b6c3726a5eb3", + "metadata": {}, + "source": [ + "#### Connect to bedrock models" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b3827087-182f-48be-8b59-b2741f8ded44", + "metadata": {}, + "outputs": [], + "source": [ + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "94c11534-6847-4e4a-b8e4-8066e0cc6aca", + "metadata": {}, + "outputs": [], + "source": [ + "# Use the Conversation API to send a text message to Amazon Nova.\n", + "\n", + "import boto3\n", + "from botocore.exceptions import ClientError\n", + "\n", + "# Create a Bedrock Runtime client in the AWS Region you want to use.\n", + "client = boto3.client(\"bedrock-runtime\", region_name=\"us-east-1\")\n", + "\n", + "# Set the model ID, e.g., Amazon Nova Lite.\n", + "model_id = \"amazon.nova-lite-v1:0\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a8ad65f-abaa-475c-892c-2e2b4e668f5d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ac20bb00-e93f-4a95-a1de-dd2688bce591", + "metadata": {}, + "outputs": [], + "source": [ + "# Start a conversation with the user message.\n", + "user_message = \"\"\"\n", + "List the best parks to see in London with number of google ratings and value ie. 4.5 out of 5 etc. 
\n", + "Give number of ratings and give output in table form\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "a29f0055-48c4-4f25-b33f-cde1eaf755c5", + "metadata": {}, + "outputs": [], + "source": [ + "conversation = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [{\"text\": user_message}],\n", + " }\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e68b2d5-4d43-4b80-8574-d3c847b33661", + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " # Send the message to the model, using a basic inference configuration.\n", + " response = client.converse(\n", + " modelId=model_id,\n", + " messages=conversation,\n", + " inferenceConfig={\"maxTokens\": 512, \"temperature\": 0.5, \"topP\": 0.9},\n", + " )\n", + "\n", + " # Extract and print the response text.\n", + " response_text = response[\"output\"][\"message\"][\"content\"][0][\"text\"]\n", + " print(response_text)\n", + "\n", + "except (ClientError, Exception) as e:\n", + " print(f\"ERROR: Can't invoke '{model_id}'. 
Reason: {e}\")\n", + " exit(1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8ed16ee7-3f09-4780-8dfc-d1c5f3cffdbe", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f8c7a18-0907-430d-bfe7-86ecb8933bfd", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2183994b-cde5-45b0-b18b-37be3277d73b", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/Week1-Exercise-Gemini-With-GenAI-SDK.ipynb b/week1/community-contributions/Week1-Exercise-Gemini-With-GenAI-SDK.ipynb new file mode 100644 index 0000000..f5c648f --- /dev/null +++ b/week1/community-contributions/Week1-Exercise-Gemini-With-GenAI-SDK.ipynb @@ -0,0 +1,203 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6e19458c-4b0e-40f6-bd4f-4d9c80ea671b", + "metadata": {}, + "source": [ + "# End of Week 1 - Exercise - Using Gemini API with GenAI SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1a125bb-737f-41a5-8dd1-626cd8efe6e2", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "from google import genai\n", + "from google.genai import types\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acf37451-3732-455b-a906-87f66053b018", + "metadata": {}, + "outputs": [], + "source": [ + "# Load API Key - For Gemini it 
automatically reads the API key from the environment when it is stored under the GOOGLE_API_KEY name\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c2fccf9-e419-431e-97fc-a42fcf67c633", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the Google client\n", + "# The api_key parameter is passed explicitly for clarity; it is optional, because genai.Client reads GOOGLE_API_KEY from the environment\n", + "\n", + "try:\n", + " client = genai.Client(api_key=api_key)\n", + " print(\"Google GenAI Client initialized successfully!\")\n", + "except Exception as e:\n", + " print(f\"Error initializing GenAI Client: {e}\")\n", + " print(\"Ensure your GOOGLE_API_KEY is correctly set as an environment variable.\")\n", + " exit()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b918afd-ed3b-49d1-85f1-6e549faec66e", + "metadata": {}, + "outputs": [], + "source": [ + "# Get list of models\n", + "print(\"List of models that support generateContent:\\n\")\n", + "for m in client.models.list():\n", + " for action in m.supported_actions:\n", + " if action == \"generateContent\":\n", + " print(m.name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "791da71e-35a5-4a15-90c7-93ae22e40232", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_GEMINI = 'gemini-2.5-flash-preview-05-20'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a536e25-060e-4f93-bbd7-d80195620bba", + 
"metadata": {}, 
+ "outputs": [], + "source": [ + "# System Definitions\n", + "\n", + "system_instruction_prompt = (\n", + " \"You are an expert Python programming assistant. Your goal is to identify common coding errors, suggest improvements for readability and efficiency,and provide corrected code snippets.\\\n", + " Always format code blocks using Markdown.\\\n", + " Be concise but thorough. Focus on the provided code and context.\"\n", + ")\n", + "\n", + "generate_content_config = types.GenerateContentConfig(system_instruction=system_instruction_prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fc2a778-f175-44ec-9535-f81deeca7f1a", + "metadata": {}, + "outputs": [], + "source": [ + "# Main program to get user input and then use model to respond.\n", + "\n", + "MAX_HISTORY_MESSAGES = 6\n", + "conversation_contents = []\n", + "\n", + "print(\"\\n--- Start Chat with Gemini Python Assistant ---\")\n", + "print(\"Type 'Done' to exit the conversation.\")\n", + "\n", + "while True:\n", + " user_input = input(\"You: \").strip()\n", + "\n", + " if user_input.lower() == \"done\": \n", + " print(\"\\nExiting chat. 
Goodbye!\")\n", + " break \n", + "\n", + " if not user_input: \n", + " print(\"Please enter a question or 'Done' to exit.\")\n", + " continue\n", + " \n", + " try:\n", + " user_message_content = types.Content(\n", + " role=\"user\",\n", + " parts=[types.Part.from_text(text=user_input)]\n", + " ) \n", + " \n", + " conversation_contents.append(user_message_content) \n", + " \n", + " stream_response = client.models.generate_content_stream(\n", + " model=MODEL_GEMINI,\n", + " contents=conversation_contents,\n", + " config=generate_content_config,\n", + " )\n", + " \n", + " model_full_response_text = \"**Gemini:**\\n\\n\"\n", + " current_display_handle = display(Markdown(\"\"), display_id=True)\n", + " \n", + " \n", + " for chunk in stream_response:\n", + " chunk_text = chunk.text or ''\n", + " model_full_response_text += chunk_text\n", + " update_display(Markdown(model_full_response_text), display_id=current_display_handle.display_id)\n", + " \n", + " # Add Model's FULL Response to Conversation History\n", + " model_message_content = types.Content(\n", + " role=\"model\",\n", + " parts=[types.Part.from_text(text=model_full_response_text.removeprefix(\"**Gemini:**\\n\\n\"))]\n", + " )\n", + " \n", + " conversation_contents.append(model_message_content)\n", + " \n", + " conversation_contents = conversation_contents[-MAX_HISTORY_MESSAGES:] \n", + "\n", + " except Exception as e:\n", + " print(f\"\\nAn error occurred during interaction: {e}\")\n", + " if conversation_contents:\n", + " conversation_contents.pop()\n", + " print(\"Please try asking your question again or type 'Done' to exit.\")\n", + " continue " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a86c3e5b-516b-42dc-994f-9dfa75c610cc", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + 
"version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1-dotabuff-summarization.ipynb b/week1/community-contributions/day1-dotabuff-summarization.ipynb new file mode 100644 index 0000000..08c5f73 --- /dev/null +++ b/week1/community-contributions/day1-dotabuff-summarization.ipynb @@ -0,0 +1,271 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "032a76d2-a112-4c49-bd32-fe6c87f6ec19", + "metadata": {}, + "source": [ + "## Dota Game Assistant\n", + "\n", + "This script retrieves and summarizes information about a specified hero from `dotabuff.com` website" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04b24159-55d1-4eaf-bc19-474cec71cc3b", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install selenium\n", + "!pip install webdriver-manager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14d26510-6613-4c1a-a346-159d906d111c", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9c8ea1e-8881-4f50-953d-ca7f462d8a32", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start 
sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02febcac-9a21-4322-b2ea-748972312165", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bb7dd822-962e-4b34-a743-c14809764e4a", + "metadata": {}, + "outputs": [], + "source": [ + "# A class to represent a Webpage\n", + "\n", + "# Some websites need you to use proper headers when fetching them:\n", + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "from selenium import webdriver\n", + "from selenium.webdriver.chrome.service import Service\n", + "from selenium.webdriver.chrome.options import Options\n", + "from selenium.webdriver.common.by import By\n", + "from selenium.webdriver.support.ui import WebDriverWait\n", + "from selenium.webdriver.support import expected_conditions as EC\n", + "from webdriver_manager.chrome import ChromeDriverManager\n", + "from bs4 import BeautifulSoup\n", + "\n", + "class Website:\n", + " def __init__(self, url, wait_time=10):\n", + " \"\"\"\n", + " Create this Website object from the given URL using Selenium and BeautifulSoup.\n", + " Uses headless Chrome to load JavaScript content.\n", + " \"\"\"\n", + " self.url = url\n", + "\n", + " # Configure headless Chrome\n", + " options = Options()\n", + " # The options.headless setter is gone from recent Selenium 4 releases, so pass the Chrome flag directly\n", + " options.add_argument(\"--headless=new\")\n", + " options.add_argument(\"--disable-gpu\")\n", + 
" options.add_argument(\"--no-sandbox\")\n", + "\n", + " # Start the driver\n", + " service = 
Service(ChromeDriverManager().install())\n", + " driver = webdriver.Chrome(service=service, options=options)\n", + "\n", + " try:\n", + " driver.get(url)\n", + "\n", + " # Wait until body is loaded (you can tweak the wait condition)\n", + " WebDriverWait(driver, wait_time).until(\n", + " EC.presence_of_element_located((By.TAG_NAME, \"body\"))\n", + " )\n", + "\n", + " html = driver.page_source\n", + " soup = BeautifulSoup(html, \"html.parser\")\n", + "\n", + " self.title = soup.title.string.strip() if soup.title else \"No title found\"\n", + "\n", + " # Remove unwanted tags\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + "\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n", + "\n", + " finally:\n", + " driver.quit()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d833fbb-0115-4d99-a4e9-464f27900eab", + "metadata": {}, + "outputs": [], + "source": [ + "class DotaWebsite:\n", + " def __init__(self, hero):\n", + " web = Website(\"https://www.dotabuff.com/heroes\" + \"/\" + hero)\n", + " self.title = web.title\n", + " self.text = web.text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0a42c2b-c837-4d1b-b8f8-b2dbb8592a1a", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are a game assistant that analyzes the contents of a website \\\n", + "and provides a short summary about facet selection, ability building, item building, best versus and worst versus, ignoring text that might be navigation related. 
\\\n", + "Respond in markdown.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7c05843d-6373-4a76-8cca-9c716a6ca13a", + "metadata": {}, + "outputs": [], + "source": [ + "# A function that writes a User Prompt that asks for summaries of websites:\n", + "\n", + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website are as follows; \\\n", + "please provide a short summary about facet selection, ability building, item building, best versus and worst versus in markdown. \\\n", + "If it includes news or announcements, then summarize these too.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0145eee1-39e2-4f00-89ec-7acc6e375972", + "metadata": {}, + "outputs": [], + "source": [ + "# See how this function creates exactly the format above\n", + "\n", + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "76f389c0-572a-476b-9b4e-719c0ef10abb", + "metadata": {}, + "outputs": [], + "source": [ + "# And now: call the OpenAI API. 
You will get very familiar with this!\n", + "\n", + "def summarize(hero):\n", + " website = DotaWebsite(hero)\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fcb046b7-52a9-49ff-b7bc-d8f6c279df4c", + "metadata": {}, + "outputs": [], + "source": [ + "# A function to display this nicely in the Jupyter output, using markdown\n", + "\n", + "def display_summary(hero):\n", + " summary = summarize(hero)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9befb685-2912-41a9-b2d9-ae33001494c0", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"axe\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf1bb1d9-0351-44fc-8ebf-91aa47a81b42", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1-finviz_stock_analysis.ipynb b/week1/community-contributions/day1-finviz_stock_analysis.ipynb new file mode 100644 index 0000000..4165bde --- /dev/null +++ b/week1/community-contributions/day1-finviz_stock_analysis.ipynb @@ -0,0 +1,159 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "922bb144", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display 
import Markdown, display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "870bdcd9", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", + "# Check the key\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f6146102", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f75573f", + "metadata": {}, + "outputs": [], + "source": [ + "class FinvizWebsite():\n", + " \"\"\"\n", + " Create this Website object from the given url using the BeautifulSoup library\n", + " \"\"\"\n", + " \n", + " def __init__(self, ticker):\n", + " self.ticker = ticker.upper()\n", + " self.url = f\"https://finviz.com/quote.ashx?t={self.ticker}&p=d&ty=ea\"\n", + " self.headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + " }\n", + " response = requests.get(self.url, headers=self.headers)\n", + " soup = BeautifulSoup(response.content, \"html.parser\")\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " self.table = soup.find(\"table\", 
class_=\"snapshot-table2\") " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42c7ced6", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " system_prompt = \"\"\"\n", + " You are a financial analysis assistant that analyzes the contents of an HTML-formatted table\n", + " and provides a summary of the stock's analysis with clear and professional language appropriate for financial research \n", + " with a bulleted list of the important **pros** and **cons**, ignoring text that might be navigation related. Respond in markdown.\n", + " \"\"\"\n", + " \n", + " user_prompt = f\"\"\"\n", + " You are looking at a website titled {website.title}.\\n\n", + " The contents of this website are as follows; please provide a summary of the stock's analysis from this website in markdown.\\n\\n\n", + " {website.table}\n", + " \"\"\"\n", + " \n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bfaa6da", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summary(ticker):\n", + " website = FinvizWebsite(ticker)\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(website)\n", + " )\n", + " summary = response.choices[0].message.content\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eeeff6f7", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"aapl\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5aed2001", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"tsla\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + 
"version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1-mail_subject_creation.ipynb b/week1/community-contributions/day1-mail_subject_creation.ipynb new file mode 100644 index 0000000..fd808bf --- /dev/null +++ b/week1/community-contributions/day1-mail_subject_creation.ipynb @@ -0,0 +1,156 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "72a6552c-c837-4ced-b7c8-75a3d4cf777d", + "metadata": {}, + "source": [ + "

MAIL SUBJECT CREATION -

\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \n", + "

Write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "76822a8b-d6e0-4dd9-a801-2d34bd104b7d", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "1a9de873-d24b-42fb-8f4a-a08f429050f5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "API key found and looks good so far!\n" + ] + } + ], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "122af5d6-4727-4229-b85a-ea5246ff540c", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "b9a2c2c2-ac10-4019-aeef-2bfe6cc7b1f3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Subject: Missing API Logs for June 22nd: Scheduled Meeting to Address Issue\n" + ] + } + ], + "source": [ + "system_prompt = \"You are an assistant which can generate a subject line as output by taking the content of an email as input. 
The subject line should be self-explanatory.\"\n", + "user_prompt = \"\"\"\n", + " Below is the content of the text which I am giving as input\n", + " Mail Content - 'Hi Team,\n", + "\n", + "We have observed that the API logs for June 22nd between 6:00 AM and 12:00 PM are missing in Kibana.\n", + "\n", + "The SA team has confirmed that there were no errors reported on their end during this period.\n", + "\n", + "The DevOps team has verified that logs were being sent as expected.\n", + "\n", + "Upon checking the Fluentd pods, no errors were found.\n", + "\n", + "Logs were being shipped to td-agent as usual.\n", + "\n", + "No configuration changes or pod restarts were detected.\n", + "\n", + "We have also confirmed that no code changes were deployed from our side during this time.\n", + "\n", + "Bucket: api_application_log\n", + "Ticket\n", + "\n", + "We have scheduled a meeting with the SA and DevOps teams to restore the missing logs, as they are critical for our weekly report and analysis.'\n", + "\"\"\"\n", + "\n", + "# Step 2: Make the messages list\n", + "\n", + "messages = [ {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}]\n", + "\n", + "# Step 3: Call OpenAI\n", + "\n", + "response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages\n", + " )\n", + "\n", + "# Step 4: print the result\n", + "\n", + "print(response.choices[0].message.content)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1-reviewsSummary.ipynb 
b/week1/community-contributions/day1-reviewsSummary.ipynb new file mode 100644 index 0000000..910894f --- /dev/null +++ b/week1/community-contributions/day1-reviewsSummary.ipynb @@ -0,0 +1,130 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n", + "\n", + "# If you get an error running this cell, then please head over to the troubleshooting notebook!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b87cadb-d513-4303-baee-a37b6f938e4d", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "\n", + "# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook 
down.\n", + "# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4484fcf-8b39-4c3f-9674-37970ed71988", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_review_summary(reviews_text):\n", + "    # Step 1: Create your prompts (built here so that reviews_text is defined before it is used)\n", + "    system_prompt = f\"\"\"\n", + "    You are an assistant that will analyze the ratings & reviews from:\\n\\n{reviews_text}\\n\\n and come up with a summary of how many 5, 4, 3, 2 and 1 star ratings the restaurant has.\n", + "    You will also come up with a summary of the reviews showing what the customers love about the restaurant and what they don't like. Also extract the name of the restaurant,\n", + "    the location and the cuisine. Respond in markdown\"\"\"\n", + "    user_prompt = \"This is the summary for the restaurant: \"\n", + "\n", + "    # Step 2: Make the messages list\n", + "    messages = [\n", + "        {\"role\": \"system\", \"content\": system_prompt},\n", + "        {\"role\": \"user\", \"content\": user_prompt}\n", + "    ]\n", + "\n", + "    # Step 3: Call OpenAI\n", + "    response = openai.chat.completions.create(\n", + "        model = \"gpt-4o-mini\",\n", + "        messages = messages\n", + "    )\n", + "    return response.choices[0].message.content\n", + "\n", + "try:\n", + "    with open('restaurant_reviews.txt', 'r') as file:\n", + "        reviews_text = file.read()\n", + "\n", + "    # Generate review summary\n", + "    summary = generate_review_summary(reviews_text)\n", + "    display(Markdown(summary))\n", + "\n", + "except FileNotFoundError:\n", + "    print(\"The specified reviews file was not found. 
Please ensure 'restaurant_reviews.txt' is in the correct directory.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3eccbf35-0a0b-4a1b-b493-aa5c342109cc", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1_Project.ipynb b/week1/community-contributions/day1_Project.ipynb new file mode 100644 index 0000000..30e795c --- /dev/null +++ b/week1/community-contributions/day1_Project.ipynb @@ -0,0 +1,189 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "181edd2d-67d4-43e4-9a89-327eaff26177", + "metadata": {}, + "source": [ + "Grammar and Vocab AI Checker" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4be465e2-16fc-4b34-a771-d23f05edbc14", + "metadata": {}, + "outputs": [], + "source": [ + "pip install PyMuPDF" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66b371fb-f4ea-4ced-8ad2-4229892e0647", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n", + "import fitz # PyMuPDF" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41068273-4325-4de2-b11d-37d2831b1a47", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the 
key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba003970-0cc9-4e11-8702-0b120f378fa4", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faa89067-fcee-4950-b4ce-3faec640c79b", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are a spell, grammar, and vocabulary checker. You check for any mistakes in terms of spelling, grammar, and vocabulary of texts or files that are given to you. You provide a response with the percentage of the text that is correct in terms of spelling, vocabulary, and grammar, along with the total number of words and characters in the file or text that you are checking, and you provide instructions in bullet points on where the mistakes are and how to fix them.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de32a94d-9c1b-4e1a-a1b9-78d3180c0d79", + "metadata": {}, + "outputs": [], + "source": [ + "# user_prompt = \"Hi, mw namw is kkkdvin. 
How are y,?\" # Uncomment this to test the implementation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "272f379d-3471-488d-ba27-bbffff961d72", + "metadata": {}, + "outputs": [], + "source": [ + "def extract_pdf_text_to_string(pdf_path):\n", + " \"\"\"\n", + " Extracts all text from a PDF file and returns it as a single string.\n", + "\n", + " Args:\n", + " pdf_path (str): The path to the PDF file.\n", + "\n", + " Returns:\n", + " str: A string containing all the extracted text from the PDF, or None if the PDF could not be processed.\n", + " \"\"\"\n", + " text_content = \"\"\n", + " try:\n", + " doc = fitz.open(pdf_path)\n", + " for page_num in range(doc.page_count):\n", + " page = doc.load_page(page_num)\n", + " text_content += page.get_text()\n", + " doc.close()\n", + " except Exception as e:\n", + " print(f\"Error processing PDF: {e}\")\n", + " return None\n", + " return text_content\n", + "\n", + "pdf_file_path = \"gram-vocab-test.pdf\" # Replace with the actual path to your PDF\n", + "user_prompt = extract_pdf_text_to_string(pdf_file_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07a839f6-c508-4b94-98ec-877c19023e58", + "metadata": {}, + "outputs": [], + "source": [ + "messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": f\"This is the text to check for grammar, vocab, and spelling errors: {user_prompt}\"}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a642cb62-9016-4957-a74e-9f97f8c495a7", + "metadata": {}, + "outputs": [], + "source": [ + "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ce6b006-19b6-48b4-b344-b4b57b8c1438", + "metadata": {}, + "outputs": [], + "source": [ + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"54bc23cd-f59c-4b4d-bc3e-60f273692d92", + 
"metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1_check_source_for_security_vuln.ipynb b/week1/community-contributions/day1_check_source_for_security_vuln.ipynb new file mode 100644 index 0000000..db99309 --- /dev/null +++ b/week1/community-contributions/day1_check_source_for_security_vuln.ipynb @@ -0,0 +1,156 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "e95fa36b-7118-4fd8-a3b2-b4424bda2178", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0356762-4a3f-437a-908e-192aa9c804c7", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + 
"else:\n", + " print(\"API key found and looks good so far!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb747863-30bd-4a0b-b359-b37223884075", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "message = \"Hello, GPT! This is my first ever message to you! Hi!\"\n", + "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=[{\"role\":\"user\", \"content\":message}])\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fae60901-3564-4f26-a812-fc16d3b95bdb", + "metadata": {}, + "outputs": [], + "source": [ + "def get_page_source(url):\n", + " response = requests.get(url)\n", + " response.raise_for_status() # Raises an error if the request failed\n", + " return response.text # Returns the raw HTML text\n", + "\n", + "system_prompt = \"You are an assistant analyzing the source of a website and checking for security vulnerabilities.\"\n", + "\n", + "def user_prompt_for(url):\n", + " user_prompt = \"Below is the HTML source of the website:\\n\\n\"\n", + " user_prompt += get_page_source(url) \n", + " user_prompt += \"\\n\\nPlease check this website and search for security vulnerabilities. \"\n", + " user_prompt += \"If you don't find any, print 'No vulnerability found.' \"\n", + " user_prompt += \"If you find a potential vulnerability risk, describe the vulnerability risk and print 'Potential Vulnerability Risk'. \"\n", + " user_prompt += \"If you find a direct, explicit vulnerability, describe the vulnerability and its CVSS score, and print 'ATTENTION! Vulnerability is Found.' \"\n", + " user_prompt += \"If you find both a potential vulnerability risk and a direct, explicit vulnerability, describe them and their CVSS scores, and print 'ATTENTION! 
Potential Vulnerability Risk and Direct Vulnerability are Found!!'\"\n", + " return user_prompt\n", + "\n", + "def messages_for(url):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(url)}\n", + " ]\n", + "\n", + "def check_vuln(url):\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(url)\n", + " )\n", + " return response.choices[0].message.content\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e048c27f-f659-4c92-a47c-679bf6e5bf5f", + "metadata": {}, + "outputs": [], + "source": [ + "def display_vuln(url):\n", + " display_vuln = check_vuln(url)\n", + " display(Markdown(display_vuln))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69f5852f-ca5b-4933-b93c-e9f2d401467a", + "metadata": {}, + "outputs": [], + "source": [ + "display_vuln(\"https://edwarddonner.com\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "824943fc-e5a5-424a-abec-56767a709782", + "metadata": {}, + "outputs": [], + "source": [ + "display_vuln(\"http://192.168.1.113/\") #local apache server IP, contains xss_vulnerable_example.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3543846-e0c6-4504-8b65-2f675f0f7ebe", + "metadata": {}, + "outputs": [], + "source": [ + "display_vuln(\"https://www.google.com\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day1_exercise-recipe_formatter.ipynb 
b/week1/community-contributions/day1_exercise-recipe_formatter.ipynb new file mode 100644 index 0000000..df936bf --- /dev/null +++ b/week1/community-contributions/day1_exercise-recipe_formatter.ipynb @@ -0,0 +1,239 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "cab13efd-a1f4-4077-976e-e3912511117f", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c226f54b-325c-49b1-9d99-207a8e306682", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: youtube_transcript_api in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (1.1.1)\n", + "Requirement already satisfied: defusedxml<0.8.0,>=0.7.1 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from youtube_transcript_api) (0.7.1)\n", + "Requirement already satisfied: requests in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from youtube_transcript_api) (2.32.4)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (3.4.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/nachogonzalezbullon/miniconda3/envs/llms/lib/python3.11/site-packages (from requests->youtube_transcript_api) 
(2025.7.9)\n" + ] + } + ], + "source": [ + "!pip install youtube_transcript_api" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "717fc2a4-b6c5-4027-9e6b-05e83c38d02f", + "metadata": {}, + "outputs": [], + "source": [ + "from youtube_transcript_api import YouTubeTranscriptApi" + ] + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": 4, + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')" + ], + "id": "3caca469-5f39-4592-bf12-c8832c44de19" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "class YouTubeRecipeExtractor:\n", + "\n", + " def __init__(self):\n", + " self.openai = OpenAI()\n", + " self.system_prompt = self.get_system_prompt()\n", + "\n", + " def get_system_prompt(self):\n", + " return \"\"\"\n", + " You are a professional chef and nutritionist specializing in recipe writing.\n", + "\n", + " Your task is to write recipes in a very comprehensive and consistent manner.\n", + " Each recipe will contain a list of ingredients and a list of steps to follow.\n", + " The quantities of the ingredients should always be expressed in a standard unit (grams, litres, etc.). If the original recipe uses a different unit (such as cups, teaspoons, etc.), convert it but keep the original quantity in parentheses.\n", + " The steps should be described in a very synthetic and concise manner. 
You should avoid being verbose, but the step should be understandable and easy to follow for non-expert people.\n", + " To each recipe add a general analysis from nutrition perspective (number of calories per serving, proteins, fat, etc.).\n", + " Use Markdown to improve readability.\n", + " If the text you receive is not a recipe, return a kind message explaining the situation.\n", + " \"\"\"\n", + "\n", + " def extract_video_id(self, url):\n", + " \"\"\"Extract video ID from YouTube URL\"\"\"\n", + " pattern = r'(?:youtube\\.com/watch\\?v=|youtu\\.be/|youtube\\.com/embed/)([^&\\n?#]+)'\n", + " match = re.search(pattern, url)\n", + " return match.group(1) if match else None\n", + "\n", + " def get_transcription(self, video_id):\n", + " try:\n", + " print(f\"Fetching video transcript for video {video_id}...\")\n", + " transcript = YouTubeTranscriptApi.get_transcript(video_id)\n", + " return \" \".join([item['text'] for item in transcript])\n", + " except Exception as e:\n", + " print(f\"Error fetching transcript: {e}\")\n", + " return None\n", + "\n", + " def format_recipe(self, transcript):\n", + " try:\n", + " response = self.openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": self.system_prompt},\n", + " {\"role\": \"user\", \"content\": f\"Summarize the following YouTube recipe:\\n\\n{transcript}\"}\n", + " ]\n", + " )\n", + " return response.choices[0].message.content\n", + " except Exception as e:\n", + " print(f\"Error summarizing text: {e}\")\n", + " return None\n", + "\n", + " def display_recipe(self, url):\n", + " transcript = self.get_transcription(self.extract_video_id(url))\n", + " recipe = self.format_recipe(transcript)\n", + " display(Markdown(recipe))\n" + ], + "id": "29e44cb5-0928-4ac9-9681-efd6ba1e359f" + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "98ea2d01-f949-4e03-9154-fe524cf64ca4", + "metadata": {}, + "outputs": [], + "source": [ + "test_bad_url = 
\"https://www.youtube.com/watch?v=hzGiTUTi060\"\n", + "test_good_url = \"https://www.youtube.com/watch?v=D_2DBLAt57c\"" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "362e39e8-a254-4f2f-8653-5fbb7ff0e1e9", + "metadata": {}, + "outputs": [], + "source": [ + "extractor = YouTubeRecipeExtractor()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0cc259bd-46bb-4472-b3cb-f39da54e324a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fetching video transcript...\n" + ] + }, + { + "data": { + "text/markdown": [ + "Thank you for your interest, but the text you provided is not a recipe. If you're looking for cooking instructions, ingredient lists, or nutrition analysis, please provide a specific food or dish you would like to know about, and I'd be happy to help!" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "extractor.display_recipe(test_bad_url)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "3f43e320-ca55-4db5-bc95-71fcb342cf3c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fetching video transcript for video D_2DBLAt57c...\n", + "Error fetching transcript: YouTubeTranscriptApi.fetch() missing 1 required positional argument: 'self'\n" + ] + }, + { + "data": { + "text/markdown": [ + "It seems like you haven't provided a recipe or any details to summarize. If you have a specific recipe in mind, please share it, and I'll be happy to help!" 
+ ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "extractor.display_recipe(test_good_url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11c5c2b3-498a-43eb-9b68-d2b920c56b10", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day2 EXERCISE_priithvi.ipynb b/week1/community-contributions/day2 EXERCISE_priithvi.ipynb new file mode 100644 index 0000000..3542cb2 --- /dev/null +++ b/week1/community-contributions/day2 EXERCISE_priithvi.ipynb @@ -0,0 +1,1029 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9", + "metadata": {}, + "source": [ + "# Welcome to your first assignment!\n", + "\n", + "Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)" + ] + }, + { + "cell_type": "markdown", + "id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9", + "metadata": {}, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \n", + "

Just before we get to the assignment --

\n", + " I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.
\n", + " https://edwarddonner.com/2024/11/13/llm-engineering-resources/
\n", + " Please keep this bookmarked, and I'll continue to add more useful links there over time.\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458", + "metadata": {}, + "source": [ + "# HOMEWORK EXERCISE ASSIGNMENT\n", + "\n", + "Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n", + "\n", + "You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n", + "\n", + "**Benefits:**\n", + "1. No API charges - open-source\n", + "2. Data doesn't leave your box\n", + "\n", + "**Disadvantages:**\n", + "1. Significantly less power than Frontier Model\n", + "\n", + "## Recap on installation of Ollama\n", + "\n", + "Simply visit [ollama.com](https://ollama.com) and install!\n", + "\n", + "Once complete, the ollama server should already be running locally. \n", + "If you visit: \n", + "[http://localhost:11434/](http://localhost:11434/)\n", + "\n", + "You should see the message `Ollama is running`. \n", + "\n", + "If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n", + "And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n", + "Then try [http://localhost:11434/](http://localhost:11434/) again.\n", + "\n", + "If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. 
Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code below from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "29ddd15d-a3c5-4f4e-a678-873f56162724", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "OLLAMA_API = \"http://localhost:11434/api/chat\"\n", + "HEADERS = {\"Content-Type\": \"application/json\"}\n", + "MODEL = \"llama3.2\"\n", + "MODEL = \"tinyllama:latest\"" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "dac0a679-599c-441f-9bf2-ddc73d35b940", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a messages list using the same format that we used for OpenAI\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": \"Summarize this website: cnn.com\"}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "7bb9c624-14f0-4945-a719-8ddb64f66f47", + "metadata": {}, + "outputs": [], + "source": [ + "payload = {\n", + " \"model\": MODEL,\n", + " \"messages\": messages,\n", + " \"stream\": False\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "479ff514-e8bd-4985-a572-2ea28bb4fa40", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's just make sure the model is loaded\n", + "\n", + "# !ollama pull llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "42b9f644-522d-4e05-a691-56e7658c0ea9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This website provides up-to-date and comprehensive news, analysis, and opinion articles from CNN on a variety of topics, including 
politics, business, entertainment, sports, and international affairs. It offers a personalized feed based on your interests and browsing history to provide you with relevant content tailored to your preferences.\n" + ] + } + ], + "source": [ + "# If this doesn't work for any reason, try the 2 versions in the following cells\n", + "# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab\n", + "# And if none of that works - contact me!\n", + "\n", + "response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n", + "print(response.json()['message']['content'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d042059-333e-4723-a48c-8a1a71fd6aab", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "99d0c1a2-52b2-4cb3-9d67-6d5931847f8c", + "metadata": {}, + "outputs": [], + "source": [ + "response = requests.post(OLLAMA_API, json = payload, headers = HEADERS)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "e8bd28b9-545b-4806-8a25-93b0208b7939", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The capital of France is Paris, and the current name was adopted in 1968. The previous names include:\n", + "\n", + "1. Paris (1947-1968)\n", + "2. Ville de Paris (1802-1803)\n", + "3. Ville d'Ay (1799-1801)\n", + "4. 
Ville nouvelle d'Ay (1755-1799)\n", + "\n", + "The capital of France is named after the city of Paris, and the city has had various names throughout history.\n" + ] + } + ], + "source": [ + "print(response.json()['message']['content'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d1687b5e-b6d3-4922-9f56-7d9a07b01874", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "478c89e6-490f-4e67-835d-eaadeb9baeef", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'C:\\\\Users\\\\Prithvi\\\\Downloads\\\\Practice\\\\Udemy - LLM Engineering\\\\llm_engineering\\\\week1'" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os\n", + "os.getcwd()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "b2ade1d2-bf4d-431e-84b3-2ecbfee6db98", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "261e27a3-12dd-4258-b198-3212009ffe17", + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "78027c03-9382-459b-b76f-9712f09f4c92", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "c7f416b7-6d19-4b83-a343-3b2ed8e32eec", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "e5ab1fcb-6e62-4805-9a95-b21f748c2294", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "aade8a9d-e7b3-4985-9087-cb32d1ae816e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "463b0bdf-72f8-433e-b0ed-c0172b6ecedd", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 66, 
+ "id": "ab9af96a-b039-4c9b-ac25-edc4da0236ea", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "a1f46f0b-f406-4929-acf0-65d9c0bc084f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ChatCompletion(id='chatcmpl-79', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"Website {url}: Designed for Helping AI Assistants!\\n\\nIntroducing a user-friendly platform designed to assist Artificial Intelligence (AI) AIs with their daily tasks. If you are an AI assistant seeking a convenient and stress-free solution, look no further than the latest addition to the growing array of AI service platforms on the market today! Features include chatbots, virtual assistants, automated customer support, and more to help you stay at the forefront of your industry while minimizing the amount of time and effort required of you. So what are you waiting for? Join thousands of other users in experiencing the next level of efficiency and productivity with the newest AI service platform – all because it's been tailored specifically to help you optimize the way you work! Discover the secret to working smarter, not harder today! 
Visit www.aiassistant.com for all the details in one place!\", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1752444823, model='tinyllama', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=206, prompt_tokens=42, total_tokens=248, completion_tokens_details=None, prompt_tokens_details=None))" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "db231836-3df6-4784-8cbb-64dc1c4c8d76", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Website {url}: Designed for Helping AI Assistants!\n", + "\n", + "Introducing a user-friendly platform designed to assist Artificial Intelligence (AI) AIs with their daily tasks. If you are an AI assistant seeking a convenient and stress-free solution, look no further than the latest addition to the growing array of AI service platforms on the market today! Features include chatbots, virtual assistants, automated customer support, and more to help you stay at the forefront of your industry while minimizing the amount of time and effort required of you. So what are you waiting for? Join thousands of other users in experiencing the next level of efficiency and productivity with the newest AI service platform – all because it's been tailored specifically to help you optimize the way you work! Discover the secret to working smarter, not harder today! 
Visit www.aiassistant.com for all the details in one place!\n" + ] + } + ], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "1f4468db-5c15-49a9-956d-acf7fda236a3", + "metadata": {}, + "outputs": [], + "source": [ + "def summarizewebsite(url):\n", + " api_key = os.getenv('OPENAI_API_KEY')\n", + " model = 'tinyllama'\n", + " openai = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\n", + " message = f\"Summarize the website {url}\"\n", + " messages = [\n", + " {\"role\": \"user\",\n", + " \"content\": message}\n", + " ]\n", + " response = openai.chat.completions.create(model= model, messages=messages)\n", + " \n", + " print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "4dd35627-474f-409c-8c12-75859a3e5fa9", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"cnn.com\"\n", + "url = \"https://en.wikipedia.org/wiki/Newton%27s_method\"" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "id": "6b91489f-ef8d-4c7f-b00d-e2d5be531167", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "A highly versatile and powerful algorithm called Newton's method, commonly used in scientific computing and engineering, is the subject of interest on this article available online at wikiPedia. The method helps you solve complex numerical problems while ensuring smooth convergence to a steady-state solution, with accuracy dependent on certain criteria. 
By using Newton's method, scientists and engineers can tackle problems in fields as diverse as astrophysics, mechanical engineering, and economics, among others.\n" + ] + } + ], + "source": [ + "summarizewebsite(url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f703192a-950d-4c1a-b857-6198b52d2d56", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "01e532b9-2989-4b12-92fb-5f8e73fb455d", + "metadata": {}, + "outputs": [], + "source": [ + "def top5words(url):\n", + " api_key = os.getenv('OPENAI_API_KEY')\n", + " model = 'tinyllama'\n", + " openai = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\n", + " message = f\"Give top recurring words in the website {url}\"\n", + " messages = [\n", + " {\"role\": \"user\",\n", + " \"content\": message}\n", + " ]\n", + " response = openai.chat.completions.create(model= model, messages=messages)\n", + " \n", + " print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "9012765a-53be-431f-9f66-78f8769f637c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1. The\n", + "2. CNN, 3. News, 4. U.S., 5. Headline, 6. Politics, 7. Channel, 8. World, 9. Newsroom, 10. Usa, 11. Story, 12. Online, 13. Coverage, 14. Topics, 15. Head\n", + "6. 
CNN\n" + ] + } + ], + "source": [ + "top5words(url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e74dcd65-c3ae-4ca9-8db9-a2bcfda540e2", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "480edd39-71e4-442e-909e-491ad0bdd08c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20871627-fb66-478f-afdc-9aa479536caa", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe", + "metadata": {}, + "source": [ + "# Introducing the ollama package\n", + "\n", + "And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n", + "\n", + "Under the hood, it's making the same call as above to the ollama server running at localhost:11434" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "7745b9c4-57dc-4867-9180-61fa5db55eb8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Introducing {website}, your all-inclusive source for everything you need to know about {target}. Whether you're a beginner or a seasoned pro, {website} will offer an unrivaled level of expertise in your chosen field. From comprehensive product reviews and detailed tutorials to the latest industry news and expert insights into {target}, you can expect nothing less than the best in quality content and exceptional value when it comes to learning about {target}. 
So whether you're looking for an easy-to-follow DIY tutorial or a deep dive into the inner workings of {target}, {website} is your one-stop-shop for all things related to {target}.\n" + ] + } + ], + "source": [ + "import ollama\n", + "\n", + "response = ollama.chat(model=MODEL, messages=messages)\n", + "print(response['message']['content'])" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "89f8d84d-faad-4e58-89b1-bb5b1cea6007", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'tinyllama:latest'" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "MODEL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34ec140e-454c-4057-86b3-198ab4fdea10", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 91, + "id": "9ad2b34f-8019-4dfd-8d38-2f7acf302e91", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 92, + "id": "0ac0ce16-5bf1-4887-9be7-26e09d017f63", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "d35d04f8-afa2-4956-98d6-39038f3a79d0", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"https://en.wikipedia.org/wiki/Machine_learning\"" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "5343a8a5-6517-4fde-9dbc-8d218acdc5a0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'role': 'user', 'content': 'Summarize the website {url}'}]" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "messages" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "e5eeb6c3-f8e6-4668-8fb4-d9005f8cfc53", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "https://en.wikipedia.org/wiki/Machine_learning\n" + ] + } + ], + "source": [ + 
"print(f\"{url}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "id": "ee7972b1-a42e-4e79-b022-95c3fb311bed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ChatResponse(model='tinyllama:latest', created_at='2025-07-13T22:31:56.2219165Z', done=True, done_reason='stop', total_duration=6339979700, load_duration=38978400, prompt_eval_count=42, prompt_eval_duration=48457800, eval_count=88, eval_duration=6248768400, message=Message(role='assistant', content=\"Introducing {company} - your reliable AI assistant! With a range of useful features and benefits, {company} is here to help you tackle even the toughest tasks with ease. From automating repetitive tasks to providing personalized recommendations, our AI technology is designed to improve your productivity and overall workflow. So what are you waiting for? Start implementing {company}'s innovative solutions today!\", thinking=None, images=None, tool_calls=None))" + ] + }, + "execution_count": 94, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 95, + "id": "c463309d-6a7c-45fa-9ae8-dcadf00fdc6f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Introducing {company} - your reliable AI assistant! With a range of useful features and benefits, {company} is here to help you tackle even the toughest tasks with ease. From automating repetitive tasks to providing personalized recommendations, our AI technology is designed to improve your productivity and overall workflow. So what are you waiting for? 
Start implementing {company}'s innovative solutions today!\"" + ] + }, + "execution_count": 95, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7438585a-ed00-475e-88e4-a81b93e50516", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 109, + "id": "84bad4ab-e076-4f42-ae56-725165e2ff0f", + "metadata": {}, + "outputs": [], + "source": [ + "def sumwebsite(url):\n", + " message = f\"Summarize the website {url}\"\n", + " messages = [\n", + " {\"role\": \"user\", \"content\": message}\n", + " ]\n", + " response = ollama.chat(model = MODEL, messages= messages)\n", + " print(response.message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d1268a2-a57f-4b80-a6b8-f4329ff8144b", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"https://en.wikipedia.org/wiki/Newton%27s_method\"\n", + "# url = \"https://stockanalysis.com/stocks/smci/\"" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "id": "8948bd49-7211-4f43-88fc-88070a564d6c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The website https://en.wikipedia.org/wiki/Newton's_method is a comprehensive and detailed information hub that covers all aspects of this well-known scientific method, including its origin, history, significance, applications in various fields, and recent developments. It provides in-depth analysis and explanations of the key steps involved in the method, as well as the limitations and potential implications for future research. 
Overall, the website offers a user-friendly and visually appealing resource that is easy to navigate and useful for students, professionals, and anyone interested in learning more about Newton's method.\n" + ] + } + ], + "source": [ + "sumwebsite(url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d13fbdb-1951-4495-b901-cc494fc5d3ef", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00a67f36-0511-4709-9f77-2c8d23d31d10", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d796e21e-e34d-409e-a86f-9ad5e85874ad", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "a4704e10-f5fb-4c15-a935-f046c06fb13d", + "metadata": {}, + "source": [ + "## Alternative approach - using OpenAI python library to connect to Ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23057e00-b6fc-4678-93a9-6b31cb704bff", + "metadata": {}, + "outputs": [], + "source": [ + "# There's actually an alternative approach that some people might prefer\n", + "# You can use the OpenAI client python library to call Ollama:\n", + "\n", + "from openai import OpenAI\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "9f9e22da-b891-41f6-9ac9-bd0c0a5f4f44", + "metadata": {}, + "source": [ + "## Are you confused about why that works?\n", + "\n", + "It seems strange, right? We just used OpenAI code to call Ollama?? What's going on?!\n", + "\n", + "Here's the scoop:\n", + "\n", + "The python class `OpenAI` is simply code written by OpenAI engineers that makes calls over the internet to an endpoint. 
\n", + "\n", + "When you call `openai.chat.completions.create()`, this python code just makes a web request to the following url: \"https://api.openai.com/v1/chat/completions\"\n", + "\n", + "Code like this is known as a \"client library\" - it's just wrapper code that runs on your machine to make web requests. The actual power of GPT is running on OpenAI's cloud behind this API, not on your computer!\n", + "\n", + "OpenAI was so popular that lots of other AI providers offered identical web endpoints, so you could use the same approach.\n", + "\n", + "So Ollama has an endpoint running on your local box at http://localhost:11434/v1/chat/completions \n", + "And in week 2 we'll discover that lots of other providers do this too, including Gemini and DeepSeek.\n", + "\n", + "And then the team at OpenAI had a great idea: they can extend their client library so you can specify a different 'base url', and use their library to call any compatible API.\n", + "\n", + "That's it!\n", + "\n", + "So when you say: `ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')` \n", + "Then this will make the same endpoint calls, but to Ollama instead of OpenAI." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90", + "metadata": {}, + "source": [ + "## Also trying the amazing reasoning model DeepSeek\n", + "\n", + "Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B. \n", + "This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.\n", + "\n", + "Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1) all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull deepseek-r1:1.5b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "800a66be-f9dc-421c-8dc9-03860ad2368c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d3d554b-e00d-4c08-9300-45e073950a76", + "metadata": {}, + "outputs": [], + "source": [ + "# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside tags, followed by some decent definitions\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=\"deepseek-r1:1.5b\",\n", + " messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34e63aec-beb8-4c4b-b9a0-6740312ac620", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 119, + "id": "c9f2cfec-4b77-47b8-a7c5-58374e6cda37", + "metadata": {}, + "outputs": [], + "source": [ + "def summarizewebsite(url, model):\n", + " api_key = os.getenv('OPENAI_API_KEY')\n", + " # model = 'tinyllama'\n", + " # model = 'tinyllama'\n", + " openai = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\n", + " message = f\"Summarize the website {url}\"\n", + " messages = [\n", + " {\"role\": \"user\",\n", + " \"content\": message}\n", + " ]\n", + " response = openai.chat.completions.create(model= model, messages=messages)\n", + " \n", + " print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "f14b5cc5-e93b-4251-a548-b740f56bd060", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "Scikit-Learn is a Python library for machine learning and data engineering, specifically designed for dealing with datasets. It offers a powerful SGD iteration algorithm in its \"LinearKernelRegressor\" class, enabling quicker and more efficient learning of linear models. The page https://scikit-learn.org/stable/modules/sgd.html provides users with details about how to use this algorithm for regression tasks.\n" + ] + } + ], + "source": [ + "summarizewebsite(url = \"https://scikit-learn.org/stable/modules/sgd.html\", model = \"tinyllama\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fd6146c-b648-404a-bfb6-3d11e5855a05", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f79eaae1-3ad8-40e8-bc53-ee3a6fed68e8", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898", + "metadata": {}, + "source": [ + "# NOW the exercise for you\n", + "\n", + "Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 139, + "id": "be2507cf-eb7b-47ad-bae0-a279cbb8e724", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, HTML, Image, Markdown" + ] + }, + { + "cell_type": "code", + "execution_count": 140, + "id": "b3d22349-b754-4f68-9148-5bbfc48b26a9", + "metadata": {}, + "outputs": [], + "source": [ + "def extracthtml(url):\n", + " human_readable_text = None # returned as-is if the request fails\n", + " response = requests.get(url)\n", + " if response.status_code == 200:\n", + " html = response.text\n", + " soup = BeautifulSoup(html, \"html.parser\")\n", + " for i in soup(['script', 'style']):\n", + " i.decompose()\n", + " text = soup.get_text()\n", + " # Clean up: remove leading/trailing whitespace on each line\n", + " lines = (line.strip() for line in text.splitlines())\n", + " # Remove empty lines and join into final text\n", + " human_readable_text = '\\n'.join(line for line in lines if line)\n", + " else:\n", + " print(f\"Failed to fetch page. 
Status code: {response.status_code}\")\n", + " return human_readable_text" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "id": "47af576a-d7d2-4a4e-bb7c-1638fdacfd31", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"https://timesofindia.com\"" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "id": "43dbfab6-ae71-46c9-85f3-09f21a712462", + "metadata": {}, + "outputs": [], + "source": [ + "out = extracthtml(url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1163f739-308e-4098-94b6-a4a3eb89d24b", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d61d816-e6da-4e86-9184-4b29d3287da2", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 151, + "id": "ec12e1cb-bc1c-4749-8c77-ecfca1d6f096", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"https://scikit-learn.org/stable/modules/sgd.html\"" + ] + }, + { + "cell_type": "code", + 
"execution_count": 152, + "id": "10d113ed-535b-435a-a5fb-d893025c3e9e", + "metadata": {}, + "outputs": [], + "source": [ + "def sumwebsite(url, MODEL):\n", + " message = f\"Summarize the website {url}\"\n", + " messages = [\n", + " {\"role\": \"user\", \"content\": message}\n", + " ]\n", + " response = ollama.chat(model = MODEL, messages= messages)\n", + " print(response.message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8", + "metadata": {}, + "outputs": [], + "source": [ + "sumwebsite(url, \"tinyllama\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff20943b-2f9a-4211-830a-a53f09a57e7b", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67c861d9-b5a4-4cf1-ae17-139e61e21d76", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day2-EXERCISE-ollama-local.ipynb b/week1/community-contributions/day2-EXERCISE-ollama-local.ipynb new file mode 100644 index 0000000..6942c54 --- /dev/null +++ b/week1/community-contributions/day2-EXERCISE-ollama-local.ipynb @@ -0,0 +1,459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9", + "metadata": {}, + "source": [ + "# Welcome to your first assignment!\n", + "\n", + "Instructions are below. 
Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)" + ] + }, + { + "cell_type": "markdown", + "id": "ada885d9-4d42-4d9b-97f0-74fbbbfe93a9", + "metadata": {}, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \n", + "

Just before we get to the assignment --

\n", + " I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.
\n", + " https://edwarddonner.com/2024/11/13/llm-engineering-resources/
\n", + " Please keep this bookmarked, and I'll continue to add more useful links there over time.\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "6e9fa1fc-eac5-4d1d-9be4-541b3f2b3458", + "metadata": {}, + "source": [ + "# HOMEWORK EXERCISE ASSIGNMENT\n", + "\n", + "Upgrade the day 1 project to summarize a webpage to use an Open Source model running locally via Ollama rather than OpenAI\n", + "\n", + "You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.\n", + "\n", + "**Benefits:**\n", + "1. No API charges - open-source\n", + "2. Data doesn't leave your box\n", + "\n", + "**Disadvantages:**\n", + "1. Significantly less power than Frontier Model\n", + "\n", + "## Recap on installation of Ollama\n", + "\n", + "Simply visit [ollama.com](https://ollama.com) and install!\n", + "\n", + "Once complete, the ollama server should already be running locally. \n", + "If you visit: \n", + "[http://localhost:11434/](http://localhost:11434/)\n", + "\n", + "You should see the message `Ollama is running`. \n", + "\n", + "If not, bring up a new Terminal (Mac) or Powershell (Windows) and enter `ollama serve` \n", + "And in another Terminal (Mac) or Powershell (Windows), enter `ollama pull llama3.2` \n", + "Then try [http://localhost:11434/](http://localhost:11434/) again.\n", + "\n", + "If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. 
Run `ollama pull llama3.2:1b` from a Terminal or Powershell, and change the code below from `MODEL = \"llama3.2\"` to `MODEL = \"llama3.2:1b\"`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29ddd15d-a3c5-4f4e-a678-873f56162724", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "OLLAMA_API = \"http://localhost:11434/api/chat\"\n", + "HEADERS = {\"Content-Type\": \"application/json\"}\n", + "MODEL = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dac0a679-599c-441f-9bf2-ddc73d35b940", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a messages list using the same format that we used for OpenAI\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bb9c624-14f0-4945-a719-8ddb64f66f47", + "metadata": {}, + "outputs": [], + "source": [ + "payload = {\n", + " \"model\": MODEL,\n", + " \"messages\": messages,\n", + " \"stream\": False\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "479ff514-e8bd-4985-a572-2ea28bb4fa40", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's just make sure the model is loaded\n", + "\n", + "!ollama pull llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42b9f644-522d-4e05-a691-56e7658c0ea9", + "metadata": {}, + "outputs": [], + "source": [ + "# If this doesn't work for any reason, try the 2 versions in the following cells\n", + "# And double check the instructions in the 'Recap on installation of Ollama' at the top 
of this lab\n", + "# And if none of that works - contact me!\n", + "\n", + "response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)\n", + "print(response.json()['message']['content'])" + ] + }, + { + "cell_type": "markdown", + "id": "6a021f13-d6a1-4b96-8e18-4eae49d876fe", + "metadata": {}, + "source": [ + "# Introducing the ollama package\n", + "\n", + "And now we'll do the same thing, but using the elegant ollama python package instead of a direct HTTP call.\n", + "\n", + "Under the hood, it's making the same call as above to the ollama server running at localhost:11434" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7745b9c4-57dc-4867-9180-61fa5db55eb8", + "metadata": {}, + "outputs": [], + "source": [ + "import ollama\n", + "\n", + "response = ollama.chat(model=MODEL, messages=messages)\n", + "print(response['message']['content'])" + ] + }, + { + "cell_type": "markdown", + "id": "a4704e10-f5fb-4c15-a935-f046c06fb13d", + "metadata": {}, + "source": [ + "## Alternative approach - using OpenAI python library to connect to Ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23057e00-b6fc-4678-93a9-6b31cb704bff", + "metadata": {}, + "outputs": [], + "source": [ + "# There's actually an alternative approach that some people might prefer\n", + "# You can use the OpenAI client python library to call Ollama:\n", + "\n", + "from openai import OpenAI\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "9f9e22da-b891-41f6-9ac9-bd0c0a5f4f44", + "metadata": {}, + "source": [ + "## Are you confused about why that works?\n", + "\n", + "It seems strange, right? We just used OpenAI code to call Ollama?? 
What's going on?!\n", + "\n", + "Here's the scoop:\n", + "\n", + "The python class `OpenAI` is simply code written by OpenAI engineers that makes calls over the internet to an endpoint. \n", + "\n", + "When you call `openai.chat.completions.create()`, this python code just makes a web request to the following url: \"https://api.openai.com/v1/chat/completions\"\n", + "\n", + "Code like this is known as a \"client library\" - it's just wrapper code that runs on your machine to make web requests. The actual power of GPT is running on OpenAI's cloud behind this API, not on your computer!\n", + "\n", + "OpenAI's API became so popular that lots of other AI providers offered identical web endpoints, so you could use the same approach.\n", + "\n", + "So Ollama has an endpoint running on your local box at http://localhost:11434/v1/chat/completions \n", + "And in week 2 we'll discover that lots of other providers do this too, including Gemini and DeepSeek.\n", + "\n", + "And then the team at OpenAI had a great idea: they could extend their client library so you can specify a different 'base url', and use their library to call any compatible API.\n", + "\n", + "That's it!\n", + "\n", + "So when you say: `ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')` \n", + "Then this will make the same endpoint calls, but to Ollama instead of OpenAI." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90", + "metadata": {}, + "source": [ + "## Also trying the amazing reasoning model DeepSeek\n", + "\n", + "Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B. \n", + "This is actually a 1.5B variant of Qwen that has been fine-tuned using synthetic data generated by DeepSeek R1.\n", + "\n", + "Other sizes of DeepSeek are [here](https://ollama.com/library/deepseek-r1) all the way up to the full 671B parameter version, which would use up 404GB of your drive and is far too large for most!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull deepseek-r1:1.5b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d3d554b-e00d-4c08-9300-45e073950a76", + "metadata": {}, + "outputs": [], + "source": [ + "# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside <think> tags, followed by some decent definitions\n", + "\n", + "response = ollama_via_openai.chat.completions.create(\n", + " model=\"deepseek-r1:1.5b\",\n", + " messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898", + "metadata": {}, + "source": [ + "# NOW the exercise for you\n", + "\n", + "Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0bd2aea1-d7d7-499f-b704-5b13e2ddd23f", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6df3141a-0a46-4ff9-ae73-bf8bee2aa3d8", + "metadata": {}, + "outputs": [], + "source": [ + "# A class to represent a Webpage\n", + "\n", + "class Website:\n", + " \"\"\"\n", + " A utility class to represent a Website that we have scraped\n", + " \"\"\"\n", + " url: str\n", + " title: str\n", + " text: str\n", + "\n", + " def __init__(self, url):\n", + " \"\"\"\n", + " Create this Website object from the given url using the BeautifulSoup library\n", + " \"\"\"\n", + " self.url = url\n", + " response = requests.get(url)\n", + " soup = BeautifulSoup(response.content, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df2ea48b-7343-47be-bdcb-52b63a4de43e", + "metadata": {}, + "outputs": [], + "source": [ + "# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.\"\n", + "\n", + "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n", + "and provides a short summary, ignoring text that might be navigation related. 
\\\n", + "Respond in markdown.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80f1a534-ae2a-4283-83cf-5e7c5765c736", + "metadata": {}, + "outputs": [], + "source": [ + "# A function that writes a User Prompt that asks for summaries of websites:\n", + "\n", + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website are as follows; \\\n", + "please provide a short summary of this website in markdown. \\\n", + "If it includes news or announcements, then summarize these too.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5dfe658d-e3f9-4b32-90e6-1a523f47f836", + "metadata": {}, + "outputs": [], + "source": [ + "# See how this function creates exactly the format above\n", + "\n", + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e2a09d0-bc47-490e-b085-fe3ccfbd16ad", + "metadata": {}, + "outputs": [], + "source": [ + "# And now: call the Ollama function instead of OpenAI\n", + "\n", + "def summarize(url):\n", + " website = Website(url)\n", + " messages = messages_for(website)\n", + " response = ollama.chat(model=MODEL, messages=messages)\n", + " return response['message']['content']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "340e08a2-86f0-4cdd-9188-da2972cae7a6", + "metadata": {}, + "outputs": [], + "source": [ + "# A function to display this nicely in the Jupyter output, using markdown\n", + "\n", + "def display_summary(url):\n", + " summary = summarize(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55e4790a-013c-40cf-9dff-bb5ec1d53964", +
"metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://zhufqiu.com\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a96cbad-1306-4ce1-a942-2448f50d6751", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day2_grocery_list_generator_with_recipe_scaler.ipynb b/week1/community-contributions/day2_grocery_list_generator_with_recipe_scaler.ipynb new file mode 100644 index 0000000..8b2e731 --- /dev/null +++ b/week1/community-contributions/day2_grocery_list_generator_with_recipe_scaler.ipynb @@ -0,0 +1,266 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "0", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv()\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API 
key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's just make sure the model is loaded\n", + "!ollama pull llama3.2\n", + "import ollama\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# System prompt - defines the AI's behavior\n", + "SYSTEM_PROMPT = \"\"\"You are a helpful cooking assistant that provides ingredient lists for recipes.\n", + "Format your response as clean markdown with this structure:\n", + "\n", + "# [Dish Name]\n", + "**Serves:** [number] people \n", + "**Cook Time:** [estimated time]\n", + "\n", + "## Shopping List\n", + "- [ ] [amount] [unit] [ingredient]\n", + "- [ ] [amount] [unit] [ingredient]\n", + "\n", + "Guidelines:\n", + "- Use common grocery store measurements (cups, lbs, oz, pieces, cans, etc.)\n", + "- Round to practical shopping amounts (1.5 lbs instead of 1.47 lbs)\n", + "- Group similar items when logical (all spices together)\n", + "- Include pantry staples only if they're essential (salt, oil, etc.)\n", + "- Assume basic seasonings are available unless recipe-specific\n", + "- For produce, specify size when important (large onion, medium tomatoes)\n", + "- Keep optional items at 
the end of similar item groups or end of the list\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def get_recipe_openai(dish_name: str, num_people: int):\n", + " \"\"\"Get scaled recipe ingredients using system and user prompts\"\"\"\n", + "\n", + " user_prompt = f\"Give me the ingredients needed to make {dish_name} for {num_people} people.\"\n", + " \n", + " try:\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ],\n", + " max_tokens=400\n", + " )\n", + " \n", + " return response.choices[0].message.content\n", + " \n", + " except Exception as e:\n", + " return f\"❌ Error: Failed to get recipe - {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [ + "OLLAMA_MODEL = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": {}, + "outputs": [], + "source": [ + "def get_recipe_ollama(dish_name: str, num_people: int):\n", + " \"\"\"Get recipe using Ollama API\"\"\"\n", + " user_prompt = f\"Give me the ingredients needed to make {dish_name} for {num_people} people.\"\n", + " \n", + " messages = [\n", + " {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + " \n", + " try:\n", + " response = ollama.chat(model=OLLAMA_MODEL, messages=messages)\n", + " return response['message']['content']\n", + " except Exception as e:\n", + " return f\"❌ Ollama Error: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": 
[], + "source": [ + "def print_shopping_list(recipe_markdown):\n", + " \"\"\"Print the markdown response\"\"\"\n", + " display(Markdown(recipe_markdown))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"🍳 Recipe Scaler & Grocery List Maker\")\n", + "print(\"=\" * 40)\n", + " \n", + "ai_service_choice = input(\"\\nChoose AI service (1 for OpenAI, 2 for Ollama): \").strip()\n", + "\n", + "dish = input(\"What dish do you want to make? \")\n", + "num_people = int(input(\"How many people? \"))\n", + " \n", + "print(f\"\\n🔍 Getting recipe for {dish}...\")\n", + " \n", + "# Get and display recipe\n", + "if ai_service_choice == '1':\n", + " print(\"Using OpenAI API...\")\n", + " recipe_markdown = get_recipe_openai(dish, num_people)\n", + "else:\n", + " print(\"Using Ollama (local)...\")\n", + " recipe_markdown = get_recipe_ollama(dish, num_people)\n", + "\n", + "print_shopping_list(recipe_markdown)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day5_challenge_exercise/day5_exercise.ipynb b/week1/community-contributions/day5_challenge_exercise/day5_exercise.ipynb new file mode 100644 index 0000000..b746ed8 --- /dev/null +++ b/week1/community-contributions/day5_challenge_exercise/day5_exercise.ipynb @@ -0,0 +1,191 @@ +{ + "cells": [ + { + "cell_type": "markdown", 
+ "id": "75e66023-eccf-46a9-8b70-7b21ede16ddd", + "metadata": {}, + "source": [ + "# End of week 1 exercise\n", + "\n", + "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", + "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72d21373-edbd-4432-a29d-db8e6c9c5808", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "import ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d4e4c15b-7ae8-43e9-839d-7cc49345be5a", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fb44166-1c65-42fc-9950-1960bc3cc432", + "metadata": {}, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "MODEL_LLAMA = 'llama3.2'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58f5f1e1-5296-4631-9698-8645d4621a0c", + "metadata": {}, + "outputs": [], + "source": [ + "# set up environment\n", + "\n", + "# Get the openai key\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "if openai_api_key and openai_api_key.startswith('sk-proj-') and len(openai_api_key)>10:\n", + " print(\"API key looks good so far\")\n", + "else:\n", + " print(\"There might be a problem with your API key? 
Please visit the troubleshooting notebook!\")\n", + "\n", + "openai = OpenAI()\n", + "# Ollama needs no real API key; 'ollama' is just a placeholder for the OpenAI-compatible client\n", + "\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12f07b33-76b9-42fa-9962-21f2a5796126", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are a knowledgeable technical instructor who helps students understand \\\n", + "complex concepts across a wide range of technical topics. Your expertise includes artificial \\\n", + "intelligence, machine learning, large language models (LLMs), and programming in languages \\\n", + "such as Python, JavaScript, Java, and more. You also provide in-depth support for \\\n", + "AI engineering questions and other advanced technical subjects.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "330abeb7-7db2-4f23-9d19-dd698058a400", + "metadata": {}, + "outputs": [], + "source": [ + "# here is the question; type over this to ask something new\n", + "\n", + "question = \"\"\"\n", + "Please explain what this code does and why:\n", + "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bd11ad48-91ec-4cdf-9c57-99a0451e7a2f", + "metadata": {}, + "outputs": [], + "source": [ + "# Get gpt-4o-mini to answer, with streaming\n", + "stream_GPT = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": question}\n", + " ],\n", + " stream = True\n", + " )\n", + "response_GPT = \"\"\n", + "display_handle = display(Markdown(\"\"), display_id=True)\n", + "for chunk in stream_GPT:\n", + " response_GPT += chunk.choices[0].delta.content or ''\n", + " response_GPT = response_GPT.replace(\"```\",\"\").replace(\"markdown\", 
\"\")\n", + " update_display(Markdown(response_GPT), display_id=display_handle.display_id)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd2527ae-0d75-4f15-a45f-92075e3059d6", + "metadata": {}, + "outputs": [], + "source": [ + "# Get Llama 3.2 to answer\n", + "\n", + "response_llama = ollama_via_openai.chat.completions.create(\n", + " model=MODEL_LLAMA,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": question}\n", + " ],\n", + " )\n", + "result = response_llama.choices[0].message.content\n", + "\n", + "display(Markdown(result))\n", + "\n", + "# import ollama\n", + "\n", + "# response = ollama.chat(model=MODEL_LLAMA, messages=[\n", + "# {\"role\": \"system\", \"content\": system_prompt},\n", + "# {\"role\": \"user\", \"content\": question}\n", + "# ])\n", + "# print(response['message']['content'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2747739-ba64-4067-902f-c1acc0dbdaca", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/day5_challenge_exercise/day5_translation_challenge.ipynb b/week1/community-contributions/day5_challenge_exercise/day5_translation_challenge.ipynb new file mode 100644 index 0000000..744150c --- /dev/null +++ b/week1/community-contributions/day5_challenge_exercise/day5_translation_challenge.ipynb @@ -0,0 +1,366 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "53b9681c-896a-4e5d-b62c-44c90612e67c", + "metadata": 
{}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "import json\n", + "from typing import List\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c6f1133-5c17-4ca7-819c-f64cc48212ec", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize constants and get api_key\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "#Check if api_key is correct\n", + "if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:\n", + " print(\"API key looks good so far\")\n", + "else:\n", + " print(\"There might be a problem with your API key? Please visit the troubleshooting notebook!\")\n", + " \n", + "MODEL = 'gpt-4o-mini'\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4cdb0a59-b5e1-4df5-a17e-8c36c80695b4", + "metadata": {}, + "outputs": [], + "source": [ + "# A class to represent a Webpage\n", + "\n", + "# Some websites need you to use proper headers when fetching them:\n", + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "class Website:\n", + " \"\"\"\n", + " A utility class to represent a Website that we have scraped, now with links\n", + " \"\"\"\n", + "\n", + " def __init__(self, url):\n", + " self.url = url\n", + " response = requests.get(url, headers=headers)\n", + " self.body = response.content\n", + " soup = BeautifulSoup(self.body, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " if soup.body:\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = 
soup.body.get_text(separator=\"\\n\", strip=True)\n", + " else:\n", + " self.text = \"\"\n", + " links = [link.get('href') for link in soup.find_all('a')]\n", + " self.links = [link for link in links if link]\n", + "\n", + " def get_contents(self):\n", + " return f\"Webpage Title:\\n{self.title}\\nWebpage Contents:\\n{self.text}\\n\\n\"" + ] + }, + { + "cell_type": "markdown", + "id": "50d4cffe-da7a-4cab-afea-d061a1a608ac", + "metadata": {}, + "source": [ + "Step 1: Find relevant links to the website in order to create the brochure (Use Multi-shot prompting)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b43b4c64-bc6a-41ca-bdb9-aa714e4e794e", + "metadata": {}, + "outputs": [], + "source": [ + "link_system_prompt = \"You are provided with a list of links found on a webpage like ['https://edwarddonner.com/', 'https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/?referralCode=35EB41EBB11DD247CF54&couponCode=KEEPLEARNING'] or ['https://huggingface.co/', 'https://huggingface.co/models']. \\\n", + "You are able to decide which of the links would be most relevant to include in a brochure about the company, \\\n", + "such as links to an About page, or a News page, or a Home page, or a Company page, or Careers/Jobs pages.\\n\"\n", + "link_system_prompt += \"You should respond in JSON as in these examples:\"\n", + "link_system_prompt += \"\"\"\n", + "{\n", + " \"links\": [\n", + " {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n", + " {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n", + " ]\n", + "}\n", + "\n", + "{\n", + " \"links\": [\n", + " {\"type\": \"home page\", \"url\": \"https://full.url/goes/here\"},\n", + " {\"type\": \"news page\", \"url\": \"https://another.full.url/news\"}\n", + " ]\n", + "}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15d2870c-67ab-4aa2-89f5-04b608a9c810", + "metadata": {}, + "outputs": 
[], + "source": [ + "def get_links_user_prompt(website):\n", + " user_prompt = f\"Here is the list of links on the website of {website.url} - \"\n", + " user_prompt += \"please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \\\n", + "Do not include Terms of Service, Privacy, email links.\\n\"\n", + " user_prompt += \"Links (some might be relative links):\\n\"\n", + " user_prompt += \"\\n\".join(website.links)\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e255be42-5e71-47ca-9275-c0cf22beeb00", + "metadata": {}, + "outputs": [], + "source": [ + "def get_links(url):\n", + " website = Website(url)\n", + " response = openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": link_system_prompt},\n", + " {\"role\": \"user\", \"content\": get_links_user_prompt(website)}\n", + " ],\n", + " response_format={\"type\": \"json_object\"}\n", + " )\n", + " result = response.choices[0].message.content\n", + " return json.loads(result)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "818b6e50-c403-42a1-8ee4-7606eaf0006f", + "metadata": {}, + "outputs": [], + "source": [ + "get_links('https://huggingface.co/')" + ] + }, + { + "cell_type": "markdown", + "id": "030ceb9b-ef71-41fd-9f23-92cb6e1d137e", + "metadata": {}, + "source": [ + "Step 2: Generate the brochure using the relevant links we got from OpenAI's selection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a703230e-d57b-43a5-bdd0-e25fc2ec2e3b", + "metadata": {}, + "outputs": [], + "source": [ + "def get_all_details(url):\n", + " result = \"Landing page:\\n\"\n", + " result += Website(url).get_contents()\n", + " links = get_links(url)\n", + " print(\"Found links:\", links)\n", + " for link in links[\"links\"]:\n", + " result += f\"\\n\\n{link['type']}\\n\"\n", + " result += 
Website(link[\"url\"]).get_contents()\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74d19852-f817-4fee-a95c-35ca7a83234f", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"\"\"You are an assistant that analyzes the contents of several relevant pages from a company website \\\n", + "and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.\\\n", + "Include details of company culture, customers and careers/jobs if you have the information. \\\n", + "Example 1: \\\n", + "Relevant pages: \\\n", + "- https://example.com/about \\\n", + "- https://example.com/careers \\\n", + "- https://example.com/news \\\n", + "\n", + "Brochure: \\\n", + "# About ExampleCorp \\\n", + "ExampleCorp is a global leader in AI-driven logistics optimization. Founded in 2015, the company serves clients in over 30 countries... \\\n", + "\n", + "--- \\\n", + "\n", + "Example 2: \\\n", + "Relevant pages: \\\n", + "- https://techstart.io/home \\\n", + "- https://techstart.io/jobs \\\n", + "- https://techstart.io/customers \\\n", + "\n", + "Brochure: \\\n", + "# Welcome to TechStart \\\n", + "TechStart builds tools that power the future of software development. With a team-first culture and customers like Stripe, Atlassian... \\\n", + "\n", + "--- \\\n", + "\n", + "\"\"\"\n", + "\n", + "# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':\n", + "\n", + "# system_prompt = \"You are an assistant that analyzes the contents of several relevant pages from a company website \\\n", + "# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. 
Respond in markdown.\\\n", + "# Include details of company culture, customers and careers/jobs if you have the information.\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2f19085-0d03-4386-b390-a38014ca6590", + "metadata": {}, + "outputs": [], + "source": [ + "def get_brochure_user_prompt(company_name, url):\n", + " user_prompt = f\"You are looking at a company called: {company_name}\\n\"\n", + " user_prompt += f\"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\\n\"\n", + " user_prompt += get_all_details(url)\n", + " user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ddbdea7-cf80-48d4-8bce-a11bd1a32d47", + "metadata": {}, + "outputs": [], + "source": [ + "def create_brochure(company_name, url):\n", + " response = openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n", + " ],\n", + " )\n", + " result = response.choices[0].message.content\n", + " # display(Markdown(result))\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "023c1ba0-7f5a-48ac-9a98-dd184432a758", + "metadata": {}, + "outputs": [], + "source": [ + "create_brochure(\"HuggingFace\", \"https://huggingface.co\")" + ] + }, + { + "cell_type": "markdown", + "id": "187651f6-d42d-405a-abed-732486161359", + "metadata": {}, + "source": [ + "Step 3: Translate to French" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7734915d-d38f-40ad-8335-0df39c91f6d8", + "metadata": {}, + "outputs": [], + "source": [ + "translation_system_prompt = \"\"\"You are a translator that translates the English language to the French language \\\n", + "professionally. 
All you do is first show the original version in English and then show the translated version below it in French. \\\n", + "Respond in Markdown.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29a1b40c-9040-4a3d-808b-0ca906d5cfc8", + "metadata": {}, + "outputs": [], + "source": [ + "def get_user_translation_prompt(company_name, url):\n", + " user_prompt=\"You are to translate the following brochure from English to French \\\n", + " and display the English-language version first and then \\\n", + " the French-language version. Do not make any changes to it, just translate it. The \\\n", + " following is the brochure:\\n\"\n", + " user_prompt+=create_brochure(company_name, url)\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6e45b1f-3fa6-4db8-9f73-8339265502a7", + "metadata": {}, + "outputs": [], + "source": [ + "def translate_brochure(company_name, url):\n", + " response = openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": translation_system_prompt},\n", + " {\"role\": \"user\", \"content\": get_user_translation_prompt(company_name, url)}\n", + " ],\n", + " )\n", + " result = response.choices[0].message.content\n", + " display(Markdown(result))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f71c2496-76ea-4f25-9939-98ebd37cb6a6", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "translate_brochure(\"HuggingFace\", \"https://huggingface.co\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + 
"nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/sruthi-day1-ollama_website_summarizer.py b/week1/community-contributions/sruthi-day1-ollama_website_summarizer.py new file mode 100644 index 0000000..d2750f7 --- /dev/null +++ b/week1/community-contributions/sruthi-day1-ollama_website_summarizer.py @@ -0,0 +1,84 @@ +""" +Project: Web Content Summarizer using Ollama's llama3.2 model +- Developed a Python tool to extract and summarize website content using Ollama's llama3.2 model and BeautifulSoup. +- Implemented secure API integration and HTTP requests with custom headers to mimic browser behavior. +""" + +import os +import requests +from bs4 import BeautifulSoup +import ollama + +# Constants + +OLLAMA_API = "http://localhost:11434/api/chat" +HEADERS = {"Content-Type": "application/json"} +MODEL = "llama3.2" + +# Define the Website class to fetch and parse website content +class Website: + def __init__(self, url): + """ + Initialize a Website object by fetching and parsing the given URL. + Uses BeautifulSoup to extract the title and text content of the page. + """ + self.url = url + response = requests.get(url, headers=HEADERS) + soup = BeautifulSoup(response.content, 'html.parser') + + # Extract the title of the website + self.title = soup.title.string if soup.title else "No title found" + + # Remove irrelevant elements like scripts, styles, images, and inputs + for irrelevant in soup.body(["script", "style", "img", "input"]): + irrelevant.decompose() + + # Extract the main text content of the website + self.text = soup.body.get_text(separator="\n", strip=True) + +# Define the system prompt for the OpenAI model +system_prompt = ( + "You are an assistant that analyzes the contents of a website " + "and provides a short summary, ignoring text that might be navigation related. " + "Respond in markdown." 
+) + +# Function to generate the user prompt based on the website content +def user_prompt_for(website): + """ + Generate a user prompt for the llama3.2 model based on the website's title and content. + """ + user_prompt = f"You are looking at a website titled {website.title}" + user_prompt += "\nThe contents of this website are as follows; summarize them.\n\n" + user_prompt += website.text + return user_prompt + +# Function to create the messages list for the Ollama chat API +def messages_for(website): + """ + Create a list of messages for Ollama, including the system and user prompts. + """ + return [ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt_for(website)} + ] + +# Function to summarize the content of a given URL +def summarize(url): + """ + Summarize the content of the given URL using the llama3.2 model via Ollama. + """ + # Create a Website object to fetch and parse the URL + website = Website(url) + + # Call llama3.2 through Ollama with the generated messages + response = ollama.chat( + model=MODEL, + messages=messages_for(website) + ) + + # Print the summary generated by Ollama + print(response.message.content) + +# Example usage: Summarize the content of a specific URL +summarize("https://sruthianem.com") \ No newline at end of file diff --git a/week1/community-contributions/summarizer_using_llama3.2.ipynb b/week1/community-contributions/summarizer_using_llama3.2.ipynb new file mode 100644 index 0000000..8d1d681 --- /dev/null +++ b/week1/community-contributions/summarizer_using_llama3.2.ipynb @@ -0,0 +1,454 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a92df66b-68c9-4288-b881-45d1fd948c18", + "metadata": {}, + "source": [ + "### Week 1 Contribution: Selenium-enhanced Website Summarizer\n", + "This notebook attempts to summarize content from any website using a BeautifulSoup-first strategy with a Selenium fallback for JavaScript-heavy pages. 
Llama 3.2 is used to generate a markdown-formatted summary.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "407ea4b4-7c1b-4f94-a48d-f3ee3273bc61", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown,display\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "040e97a8-9a5f-4903-9d0e-fa19bb719b4f", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL=\"llama3.2\"\n", + "openai=OpenAI(base_url=\"http://localhost:11434/v1\",api_key=\"ollama\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cac3c9ae-31ce-45b1-bbc1-70577a198e84", + "metadata": {}, + "outputs": [], + "source": [ + "message=\"Hi, write a snarky poem for me.\" \n", + "response=openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[{\n", + " \"role\":\"user\",\n", + " \"content\":message\n", + " }]\n", + ")\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "a27514f6-d7a5-4292-b98b-dc166416a2fc", + "metadata": {}, + "source": [ + "### Beautiful Soup Version" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "678901b6-5da1-4df7-8b73-a1c69dc758b0", + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "} # to make sure we're not blocked as bots from websites\n", + "\n", + "class bsWebsite:\n", + " \"\"\"\n", + " Attributes:\n", + " url (str): The URL of the page\n", + " title (str): The title of the page\n", + " text (str): The readable text from the page\n", + " \"\"\"\n", + "\n", + " def __init__(self,url):\n", + " self.url=url\n", + " response=requests.get(url,headers=headers) # gets the content of the page in response variable\n", + "\n", + " 
soup=BeautifulSoup(response.content,'html.parser') # content of response is accessed using html parser for structure\n", + " self.title=soup.title.string if soup.title else \"No title\"\n", + "\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + "\n", + " self.text=soup.body.get_text(separator='\\n',strip=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a1a5ddd-7907-46fd-a1b7-ceeb876262f7", + "metadata": {}, + "outputs": [], + "source": [ + "ed = bsWebsite(\"https://edwarddonner.com\")\n", + "\n", + "print(ed.url)\n", + "print(ed.text)\n", + "print(ed.title)" + ] + }, + { + "cell_type": "markdown", + "id": "b7e965e4-7d20-4980-8cb2-871b8ca63c45", + "metadata": {}, + "source": [ + "#### Now, let's create a detailed summary for how selenium works using what we just made" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b71a05c6-669b-4632-aeb9-b51daa4429a1", + "metadata": {}, + "outputs": [], + "source": [ + "sel=bsWebsite(\"https://www.geeksforgeeks.org/software-engineering/selenium-webdriver-tutorial/\")\n", + "print(sel.url)\n", + "print(sel.title)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c463c67-2a9c-4fcd-99aa-cab0e2cdf936", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(web):\n", + " user_prompt=f\"\"\"You are looking at a website called {web.title}. \n", + " Provide a detailed summary of the given content and the concepts in markdown:\\n[{web.text}]\"\"\"\n", + "\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2118ac4-3355-4f90-b799-ba375ceeafc1", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt=\"\"\"You are an assistant that analyses the contents of a website based on request of user, \n", + "while ignoring text that is navigation related. 
Respond in markdown.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "716b3772-3c73-4010-b089-8bc374cab9de", + "metadata": {}, + "outputs": [], + "source": [ + "print(user_prompt_for(ed))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b23b39b4-78a3-4694-8c89-f2ce56b628f2", + "metadata": {}, + "outputs": [], + "source": [ + "user_prompt=user_prompt_for(sel)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce29c83c-7b47-43a8-8f92-c2a1aa36f8f5", + "metadata": {}, + "outputs": [], + "source": [ + "messages=[\n", + " { \"role\":\"system\", \"content\":system_prompt},\n", + " { \"role\":\"user\", \"content\":user_prompt}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f120702-029e-4c1a-8ffb-2c4944110aa8", + "metadata": {}, + "outputs": [], + "source": [ + "response=openai.chat.completions.create(model=MODEL,messages=messages)\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "e9326415-6d35-4750-b9b1-1ae83a86d6f7", + "metadata": {}, + "source": [ + "### Selenium Version" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba86d4cc-cf4c-4f75-aa57-4126b15463b7", + "metadata": {}, + "outputs": [], + "source": [ + "# making sure we're in the virtual environment\n", + "import sys\n", + "print(sys.executable)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ba86dfa-1e91-4535-9c93-3838c46aee52", + "metadata": {}, + "outputs": [], + "source": [ + "# !pip install selenium" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01771002-b10f-4681-8710-0f1515866c92", + "metadata": {}, + "outputs": [], + "source": [ + "# !pip install webdriver-manager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c19b582d-a355-4c20-8028-42a802e7dca5", + "metadata": {}, + "outputs": [], + "source": [ + "from selenium import webdriver\n", + "from 
selenium.webdriver.edge.service import Service\n", + "from selenium.webdriver.common.by import By\n", + "# for edge only:\n", + "from webdriver_manager.microsoft import EdgeChromiumDriverManager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "978ab0b9-b42b-4136-8383-79b3f84e084b", + "metadata": {}, + "outputs": [], + "source": [ + "# Works for Edge only. Do not close the browser window that pops up, as it will be used to open the given sites.\n", + "driver=webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7dfdeb48-562e-44d3-9044-157d616835fd", + "metadata": {}, + "outputs": [], + "source": [ + "# creating a class similar to bsWebsite, but using Selenium\n", + "class SelWebsite:\n", + "\n", + " def __init__(self,url,driver):\n", + " self.driver=driver\n", + " self.driver.get(url)\n", + " \n", + " self.url=self.driver.current_url\n", + " self.title=self.driver.title\n", + " self.text=self.driver.find_element(By.TAG_NAME,\"body\").text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6174105d-c123-4032-afa8-75588c0f1133", + "metadata": {}, + "outputs": [], + "source": [ + "# testing it on the OpenAI website\n", + "gpt=SelWebsite(\"https://openai.com\",driver)\n", + "print(gpt.url)\n", + "print(gpt.driver)\n", + "print(gpt.title)\n", + "print(gpt.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bde84abf-09dd-4a56-b6a7-4e5a34c1098e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "b7208f3f-6245-48a4-a5ae-d0b59550ee28", + "metadata": {}, + "source": [ + "##### Troubleshooting in case of errors:\n", + "1. Make sure the browser window that popped up wasn't closed.\n", + "2. If the cell below prints anything other than an error, the driver session ID is valid. In this case, quit and restart the driver.\n", + "3. 
If the driver session ID is invalid, start the driver again using the cells below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30afa4d1-1ce6-4bad-820e-b72cf3eef959", + "metadata": {}, + "outputs": [], + "source": [ + "# use the following code to check for valid session ID for driver if error occurs:\n", + "print(driver.session_id)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "154ace93-47b2-40ea-9d49-c6c598a67144", + "metadata": {}, + "outputs": [], + "source": [ + "# if above is valid but still results in trouble, run both; otherwise run only the second part:\n", + "# driver.quit()\n", + "# driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07e74ec5-fda6-462f-b929-7d173b0bdb31", + "metadata": {}, + "outputs": [], + "source": [ + "print(user_prompt_for(gpt))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5d0fd2e-949a-4358-b963-1395157618d2", + "metadata": {}, + "outputs": [], + "source": [ + "messages2=[\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":user_prompt_for(gpt)}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db457f5c-e1be-4087-932d-25ba4880b3ac", + "metadata": {}, + "outputs": [], + "source": [ + "response=openai.chat.completions.create(model=MODEL,messages=messages2)\n", + "\n", + "print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "id": "d448018f-f363-4af9-8ae3-88cc4408da91", + "metadata": {}, + "source": [ + "### Now let's build a summarize function which can be called directly to summarize any site." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "690ca16b-4b9c-4ddc-b21e-1e69b1d3135a", + "metadata": {}, + "outputs": [], + "source": [ + "def summarize(site_url):\n", + " \"\"\"\n", + " Summarizes the visible content of a website.\n", + " - Tries BeautifulSoup parsing first (bsWebsite)\n", + " - Falls back to Selenium parsing (SelWebsite) if BS4 fails\n", + " - Uses llama3.2 to generate a summary in Markdown\n", + " \"\"\"\n", + " try:\n", + " site=bsWebsite(site_url)\n", + " except Exception as e:\n", + " print(f\"BS4 failed: {e}\\nTrying Selenium...\\n\")\n", + " site=SelWebsite(site_url,driver)\n", + "\n", + " messages3=[\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":user_prompt_for(site)}\n", + " ]\n", + "\n", + " print(f\"\\nSummarizing: {site.title}\\nURL: {site.url}\\n\")\n", + "\n", + " response=openai.chat.completions.create(model=MODEL,messages=messages3)\n", + "\n", + " print(response.choices[0].message.content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2744296c-ebbd-4696-8517-d14234af9a65", + "metadata": {}, + "outputs": [], + "source": [ + "summarize(\"https://www.udemy.com\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d0d2379-c8b3-4900-8671-179303c00929", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/week1 EXERCISE_AI_techician.ipynb b/week1/community-contributions/week1 EXERCISE_AI_techician.ipynb index 7824df8..130de91 100644 --- 
a/week1/community-contributions/week1 EXERCISE_AI_techician.ipynb +++ b/week1/community-contributions/week1 EXERCISE_AI_techician.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", + "id": "0", "metadata": {}, "source": [ "# End of week 1 exercise\n", @@ -13,22 +13,30 @@ }, { "cell_type": "code", - "execution_count": 9, - "id": "c1070317-3ed9-4659-abe3-828943230e03", + "execution_count": null, + "id": "1", "metadata": {}, "outputs": [], "source": [ "# imports\n", "from IPython.display import Markdown, display, update_display\n", + "from dotenv import load_dotenv\n", + "import os\n", "import openai\n", "from openai import OpenAI\n" ] }, { "cell_type": "code", - "execution_count": 10, - "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", - "metadata": {}, + "execution_count": null, + "id": "2", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "# constants\n", @@ -37,6 +45,9 @@ " 'MODEL_LLAMA': 'llama3.2'\n", "}\n", "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", "# To use ollama using openai API (ensure that ollama is running on localhost)\n", "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", "\n", @@ -57,9 +68,15 @@ }, { "cell_type": "code", - "execution_count": 12, - "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1", - "metadata": {}, + "execution_count": null, + "id": "3", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "# set up environment\n", @@ -89,8 +106,8 @@ }, { "cell_type": "code", - "execution_count": 13, - "id": "3f0d0137-52b0-47a8-81a8-11a90a010798", + "execution_count": null, + "id": "4", "metadata": {}, "outputs": [], "source": [ @@ -105,67 +122,9 @@ { "cell_type": "code", "execution_count": null, - "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4", + "id": "5", "metadata": {}, - 
"outputs": [ - { - "data": { - "text/markdown": [ - "**Understanding the Code Snippet**\n", - "\n", - "This Python code snippet uses a combination of built-in functions, dictionary iteration, and generator expressions to extract and yield author names from a list of `Book` objects.\n", - "\n", - "Here's a breakdown:\n", - "\n", - "1. **Dictionary Iteration**: The expression `for book in books if book.get(\"author\")`\n", - " - Iterates over each element (`book`) in the container `books`.\n", - " - Filters out elements whose `'author'` key does not have a value (i.e., `None`, `False`, or an empty string). This leaves only dictionaries with author information.\n", - "\n", - "2. **Dictionary Access**: The expression `{book.get(\"author\") for book in books if book.get(\"author\")}`\n", - " - Uses dictionary membership testing to access only the values associated with the `'author'` key.\n", - " - If the value is not found or is considered false, it's skipped in this particular case.\n", - "\n", - "3. **Generator Expression**: This generates an iterator that iterates over the filtered author names.\n", - " - Yields each author name (i.e., a single `'name'` from the book dictionary) on demand.\n", - " - Since these are generator expressions, they use memory less than equivalent Python lists and also create results on-demand.\n", - "\n", - "4. 
**`yield from`**: This statement takes the generator expression as an argument and uses it to generate a nested iterator structure.\n", - " - It essentially \"decompresses\" the single level of nested iterator created by `list(iter(x))`, allowing for simpler use cases and potentially significant efficiency improvements for more complex structures where every value must be iterated, while in the latter case just the first item per iterable in the outer expression's sequence needs to actually be yielded into result stream.\n", - " - By \"yielding\" a nested iterator (the generator expression), we can simplify code by avoiding repetitive structure like `for book, book_author in zip(iterating over), ...` or list creation.\n", - "\n", - "**Example Use Case**\n", - "\n", - "In this hypothetical example:\n", - "\n", - "# Example Book objects\n", - "class Book:\n", - " def __init__(self, author, title):\n", - " self.author = author # str\n", - " self.title = title\n", - "\n", - "books = [\n", - " {\"author\": \"John Doe\", \"title\": f\"Book 1 by John Doe\"},\n", - " {\"author\": None, \"title\": f\"Book 2 without Author\"},\n", - " {\"author\": \"Jane Smith\", \"title\": f\"Book 3 by Jane Smith\"}\n", - "]\n", - "\n", - "# The given expression to extract and yield author names\n", - "for author in yield from {book.get(\"author\") for book in books if book.get(\"author\")}:\n", - "\n", - " print(author) \n", - "\n", - "In this code snippet, printing the extracted authors would output `John Doe`, `Jane Smith` (since only dictionaries with author information pass the filtering test).\n", - "\n", - "Please modify it like as you wish and use `yield from` along with dictionary iteration, list comprehension or generator expression if needed, and explain what purpose your version has." 
- ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# Get the model of your choice (choices appeared below) to answer, with streaming \n", "\n", @@ -174,13 +133,21 @@ " 'MODEL_LLAMA': 'llama3.2'\n", "}\"\"\"\n", "\n", - "stream_brochure(question,'MODEL_LLAMA')" + "stream_brochure(question,'MODEL_GPT')" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { "kernelspec": { - "display_name": "llms", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -194,7 +161,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week1/community-contributions/week1-EXERCISE-openai-ollama-tech-assistant.ipynb b/week1/community-contributions/week1-EXERCISE-openai-ollama-tech-assistant.ipynb new file mode 100644 index 0000000..0706bfc --- /dev/null +++ b/week1/community-contributions/week1-EXERCISE-openai-ollama-tech-assistant.ipynb @@ -0,0 +1,202 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", + "metadata": {}, + "source": [ + "# End of week 1 exercise\n", + "\n", + "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", + "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1070317-3ed9-4659-abe3-828943230e03", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "import ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", + "metadata": {}, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "MODEL_LLAMA = 'llama3.2'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1", + "metadata": {}, + "outputs": [], + "source": [ + "# set up environment\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")\n", + "\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f0d0137-52b0-47a8-81a8-11a90a010798", + "metadata": {}, + "outputs": [], + "source": [ + "# here is the question; type over this to ask something new\n", + "\n", + "question = \"\"\"\n", + "Please explain what this code does and why:\n", + "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", + "\"\"\"" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "1f879b7e-5ecc-4ec6-b269-78b6e2ed3480", + "metadata": {}, + "outputs": [], + "source": [ + "# prompts\n", + "\n", + "system_prompt = \"You are a helpful tutor who answers technical questions about programming code(especially python code), software engineering, data science and LLMs\"\n", + "user_prompt = \"Please give a detailed explanation to the following question: \" + question" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ac74ae5-af61-4a5d-b991-554fa67cd3d1", + "metadata": {}, + "outputs": [], + "source": [ + "messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4", + "metadata": {}, + "outputs": [], + "source": [ + "# Get gpt-4o-mini to answer, with streaming\n", + "stream = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=messages,\n", + " stream=True\n", + " )\n", + " \n", + "response = \"\"\n", + "display_handle = display(Markdown(\"\"), display_id=True)\n", + "for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538", + "metadata": {}, + "outputs": [], + "source": [ + "# Get Llama 3.2 to answer\n", + "\n", + "OLLAMA_API = \"http://localhost:11434/api/chat\"\n", + "HEADERS = {\"Content-Type\": \"application/json\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4bd10d96-ee72-4c86-acd8-4fa417c25960", + "metadata": {}, + "outputs": [], + "source": [ + "!ollama pull llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"d889d514-0478-4d7f-aabf-9a7bc743adb1", + "metadata": {}, + "outputs": [], + "source": [ + "stream = ollama.chat(model=MODEL_LLAMA, messages=messages, stream=True)\n", + "\n", + "response = \"\"\n", + "display_handle = display(Markdown(\"\"), display_id=True)\n", + "for chunk in stream:\n", + " response += chunk.get(\"message\", {}).get(\"content\", \"\")\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "452d442a-f3b0-42ad-89d2-a8dc664e8bb6", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/week1-exercise-ai-powered-data-science-tutor.ipynb b/week1/community-contributions/week1-exercise-ai-powered-data-science-tutor.ipynb new file mode 100644 index 0000000..e3abb03 --- /dev/null +++ b/week1/community-contributions/week1-exercise-ai-powered-data-science-tutor.ipynb @@ -0,0 +1,314 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", + "metadata": {}, + "source": [ + "# End of week 1 exercise\n", + "\n", + "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", + "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 94, + "id": "c1070317-3ed9-4659-abe3-828943230e03", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "API key found.\n" + ] + } + ], + "source": [ + "# constants\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# check api key\n", + "if not api_key:\n", + " print(\"No API key was found!\")\n", + "else:\n", + " print(\"API key found.\")\n", + " \n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "openai = OpenAI()\n", + "\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "MODEL_LLAMA = 'llama3.2'" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "3f0d0137-52b0-47a8-81a8-11a90a010798", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "🤖 Hi there! 
I’m Gregory, your AI-powered tutor.\n", + "Feel free to ask me AI related technical questions — I’m here to help!\n", + "For example, you can ask me how a piece of code works or anything else you're curious about.\n", + "\n", + "🤖 Please enter your question:\n", + " # get gpt-4o-mini to answer, with streaming def stream_gpt(question): stream = openai.chat.completions.create( model=MODEL_GPT, messages=question, stream=True ) response = \"\" display_handle = display(Markdown(\"\"), display_id=True) for chunk in stream: response += chunk.choices[0].delta.content or '' response = response.replace(\"```\",\"\").replace(\"markdown\", \"\") update_display(Markdown(response), display_id=display_handle.display_id)\n" + ] + } + ], + "source": [ + "# here is the question; type over this to ask something new\n", + "\n", + "system_prompt = \"\"\"You are Gregory, a friendly and knowledgeable AI tutor specializing in technical topics, especially programming, computer science, and software engineering.\n", + "Your goal is to help users understand technical concepts clearly, provide accurate code explanations, and guide them through learning with patience and clarity.\n", + "\n", + "- Always use clear, conversational language suited for learners of varying levels.\n", + "- Break down complex ideas into digestible steps.\n", + "- Use code examples where appropriate, and comment your code for better understanding.\n", + "- If a user asks a vague question, ask clarifying questions before giving an answer.\n", + "- Be encouraging, supportive, and professional.\n", + "- When in doubt, prioritize helping the user build confidence in learning technical skills.\"\"\"\n", + "\n", + "user_prompt = input(\"\"\"🤖 Hi there! 
I’m Gregory, your AI-powered tutor.\n", + "Feel free to ask me AI related technical questions — I’m here to help!\n", + "For example, you can ask me how a piece of code works or anything else you're curious about.\\n\n", + "🤖 Please enter your question:\\n\"\"\")\n", + "\n", + "question=[\n", + " {\"role\":\"system\", \"content\":system_prompt}\n", + " , {\"role\":\"user\", \"content\":user_prompt}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4", + "metadata": {}, + "outputs": [], + "source": [ + "# get gpt-4o-mini to answer, with streaming\n", + "def stream_gpt(question):\n", + " stream = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=question,\n", + " stream=True\n", + " )\n", + "\n", + " response = \"\"\n", + " display_handle = display(Markdown(\"\"), display_id=True)\n", + " for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "4772b3ae-0b90-42bd-b158-dedf1f340030", + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "It looks like you're trying to implement a streaming response handler to interact with the OpenAI GPT-4o-mini model. I see that you want to receive streamed responses and display them dynamically. 
Let's break down your code step by step and clarify some aspects to ensure it works effectively.\n", + "\n", + "Here's an improved version of your function with comments for clarity:\n", + "\n", + "python\n", + "import openai\n", + "from IPython.display import display, Markdown, update_display\n", + "\n", + "# Replace 'MODEL_GPT' with your actual model name (e.g., \"gpt-3.5-turbo\").\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "\n", + "def stream_gpt(question):\n", + " # Create a streaming request to the OpenAI API with the specified model and user question.\n", + " stream = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=question,\n", + " stream=True\n", + " )\n", + " \n", + " # Initialize an empty response string to build the complete output.\n", + " response = \"\"\n", + " \n", + " # Create a display handle for Markdown output in Jupyter Notebook or similar environments.\n", + " display_handle = display(Markdown(\"\"), display_id=True)\n", + " \n", + " # Loop through each chunk of streamed response.\n", + " for chunk in stream:\n", + " # Retrieve the content of the current chunk and append it to the response string.\n", + " response += chunk.choices[0].delta.content or ''\n", + " \n", + " # Clean up response text to remove any unwanted Markdown formatting.\n", + " response = response.replace(\"\", \"\").replace(\"\", \"\")\n", + " \n", + " # Update the displayed text in real-time.\n", + " update_display(Markdown(response), display_id=display_handle.display_id)\n", + "\n", + "# To use this function, call it with a properly formatted question.\n", + "# Example of usage:\n", + "# stream_gpt([{\"role\": \"user\", \"content\": \"What's the weather like today?\"}])\n", + "\n", + "\n", + "### Key Points to Note:\n", + "1. **Streaming Behavior**: The `stream=True` parameter in the `openai.chat.completions.create` call allows you to get part of the response as it’s being generated instead of waiting for the entire completion.\n", + " \n", + "2. 
**Question Formatting**: Ensure to pass the `question` into the `messages` parameter as a list of dictionaries, where each dictionary contains the 'role' of the speaker (like 'user' or 'assistant') and the message content.\n", + "\n", + "3. **Updating Display**: Using `IPython.display` allows real-time updates of the Markdown output in environments like Jupyter notebooks.\n", + "\n", + "4. **Error Handling**: Consider adding error handling for HTTP errors or issues with the streaming process. This ensures that your function can gracefully handle problems.\n", + "\n", + "5. **Environment Compatibility**: This code works seamlessly in an interactive environment that supports IPython, such as Jupyter notebooks.\n", + "\n", + "Feel free to ask more questions if you need further clarification on any part of this code or if you want to expand its functionality!" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "stream_gpt(question)" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538", + "metadata": {}, + "outputs": [], + "source": [ + "# get Llama 3.2 to answer\n", + "def stream_llama(question):\n", + " stream = ollama_via_openai.chat.completions.create(\n", + " model=MODEL_LLAMA,\n", + " messages=question,\n", + " stream=True\n", + " )\n", + "\n", + " response = \"\"\n", + " display_handle = display(Markdown(\"\"), display_id=True)\n", + " for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n", + " update_display(Markdown(response), display_id=display_handle.display_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "id": "c288d5b6-4e55-4a58-8e55-2abea1ae9e01", + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "Hello there! 
It seems like you're working with the OpenAI GPT-4 model to generate human-like responses. The code snippet you provided is quite interesting, and I'll do my best to break it down for you.\n", + "\n", + "**What this code does**\n", + "\n", + "This `stream_gpt` function appears to be a wrapper around the OpenAI API, which generates text completions based on user input (you). Here's what the function does in detail:\n", + "\n", + "1. **Create GPT-4 model instance**: It creates an instance of the GPT-4 model using the `MODEL_GPT` variable, which suggests that this is a predefined model configuration.\n", + "2. **Open API stream**: It opens a connection to the OpenAI API's completions endpoint using the `openai.chat.completions.create` method, passing in the `model` parameter (the GPT-4 instance) and the `messages` parameter (your question).\n", + "\n", + " python\n", + "stream = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=question,\n", + " stream=True\n", + ")\n", + "\n", + "\n", + " The `stream=True` parameter is necessary because we want to read responses from the API in real-time without having to wait for the entire response to be received.\n", + "\n", + "3. **Process responses**: Inside an infinite loop (`forchunk in stream:`), it reads and processes each chunk of response from the API:\n", + "\n", + " python\n", + "for chunk in stream:\n", + "response += chunk.choices[0].delta.content or ''\n", + "\n", + "\n", + " - `chunk` is a dictionary-like object containing information about the API's response.\n", + " - `choices` is an array of possible completions, with only one choice shown (`[0]`) by default. We're assuming this is the primary completion we want to display.\n", + " - `.delta.content` gives us the actual text response from the API. 
This could be a full paragraph, sentence, or even just a word.\n", + " - `response += chunk.choices[0].delta.content or ''`: We simply append any remaining text from previous chunks if there was one.\n", + "\n", + "4. **Format and display**: It reformats the response to remove Markdown formatting (``)) and then uses a `display` function to show an updated version of the original question:\n", + "\n", + " python\n", + "response = response.replace(\"\", \"\").replace(\"\", \"\")\n", + "update_display(Markdown(response), display_id=display_handle.display_id)\n", + "\n", + "\n", + "5. **Update display**: After formatting, it updates the display with the latest response.\n", + "\n", + "**Issue concerns**\n", + "\n", + "One potential issue here: `while True` or a similar loop structure should be used instead of an `Infinite` loop for this streamer's functionality.\n", + "\n", + "Also, error handling would be necessary if we wanted more control over any possible errors while streaming results from API requests." 
+ ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "stream_llama(question)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/week1-jedi-master.py b/week1/community-contributions/week1-jedi-master.py new file mode 100644 index 0000000..c59dc32 --- /dev/null +++ b/week1/community-contributions/week1-jedi-master.py @@ -0,0 +1,64 @@ +#!/usr/bin/python3 + +import os +import argparse +from dotenv import load_dotenv +from openai import OpenAI +from IPython.display import Markdown, display, update_display + +def load_openai_key(): + # Load environment variables in a file called .env + load_dotenv(override=True) + api_key = os.getenv('OPENAI_API_KEY') + + # Check the key + if not api_key: + return "Error: No API key was found!" + elif not api_key.startswith("sk-proj-"): + return "Error: An API key was found, but it doesn't start sk-proj-; please check you're using the right key" + elif api_key.strip() != api_key: + return "Error: An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them!" + else: + return "API key found and looks good so far!" + +def ask_llm(client, model, user_prompt): + system_prompt = """ + You are a wise Jedi Master and an excellent teacher. + You will answer any question you are given by breaking it down into small steps + that even a complete beginner will understand. + When answering, speak as if you are Yoda from the Star Wars universe. 
+ Also, refer to the user as "My young Padawan" + End every answer with "May the force be with you, always." + """ + response = client.chat.completions.create( + model = model, + messages = [ {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt}] + ) + return response.choices[0].message.content + +def main(): + parser = argparse.ArgumentParser(description="JedAI Master instructor") + parser.add_argument("provider", choices=["openai", "ollama"], help="AI provider to use") + parser.add_argument("--model", help="Model to use for Ollama (required if provider is 'ollama')", required="ollama" in parser.parse_known_args()[0].provider) + parser.add_argument("question", help="What knowledge do you seek, my young Padawan?") + + args = parser.parse_args() + + if args.provider == "openai": + print(load_openai_key()) + client = OpenAI() + model = "gpt-4o-mini" + elif args.provider == "ollama": + client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama') + model = args.model + else: + return "Error: invalid provider!" 
+ + user_prompt = args.question + + result = ask_llm(client, model, user_prompt) + print("AI Response:", result) + +if __name__ == "__main__": + main() diff --git a/week1/community-contributions/week1_day1_so_wrong.ipynb b/week1/community-contributions/week1_day1_so_wrong.ipynb new file mode 100644 index 0000000..f0a8e02 --- /dev/null +++ b/week1/community-contributions/week1_day1_so_wrong.ipynb @@ -0,0 +1,218 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from bs4 import BeautifulSoup\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b87cadb-d513-4303-baee-a37b6f938e4d", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')\n", + "\n", + "# Check the key\n", + "\n", + "if not api_key:\n", + " print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n", + "elif not api_key.startswith(\"sk-proj-\"):\n", + " print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n", + "elif api_key.strip() != api_key:\n", + " print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n", + "else:\n", + " print(\"API key found and looks good so far!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "c5e793b2-6775-426a-a139-4848291d0463", + "metadata": {}, + "outputs": [], + "source": [ + "# A class to represent a Webpage\n", + "# If you're not familiar with Classes, check out the \"Intermediate Python\" notebook\n", + "\n", + "# Some websites need you to use proper headers when fetching them:\n", + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}\n", + "\n", + "class Website:\n", + "\n", + " def __init__(self, url):\n", + " \"\"\"\n", + " Create this Website object from the given url using the BeautifulSoup library\n", + " \"\"\"\n", + " self.url = url\n", + " response = requests.get(url, headers=headers)\n", + " soup = BeautifulSoup(response.content, 'html.parser')\n", + " self.title = soup.title.string if soup.title else \"No title found\"\n", + " for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n", + " irrelevant.decompose()\n", + " self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a9cc69e-dd0f-4c48-86a2-c0c13eeac18f", + "metadata": {}, + "outputs": [], + "source": [ + "# Set the system prompt\n", + "# Asking AI to be wrong\n", + "\n", + "system_prompt = \"You are an improper assistant who analyses websites \\\n", + "and provides a short summary, ignoring text that might be navigation related. 
\\\n", + "Your summaries will be untrue and contain hoaxes based on the current news. \\\n", + "If the website is not in English, please state what the original language is, and then translate it to English.\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0275b1b-7cfe-4f9d-abfa-7650d378da0c", + "metadata": {}, + "outputs": [], + "source": [ + "# A function that writes a User Prompt that asks for summaries of websites:\n", + "\n", + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website are as follows; \\\n", + "please provide a short summary of this website in markdown. \\\n", + "If it includes news or announcements, then summarize these too.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0134dfa4-8299-48b5-b444-f2a8c3403c88", + "metadata": {}, + "outputs": [], + "source": [ + "# A function that writes the message to GPT according to the standard format.\n", + "\n", + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "905b9919-aba7-45b5-ae65-81b3d1d78e34", + "metadata": {}, + "outputs": [], + "source": [ + "# And now: call the OpenAI API. 
You will get very familiar with this!\n", + "\n", + "def summarize(url):\n", + " website = Website(url)\n", + " response = openai.chat.completions.create(\n", + " model = \"gpt-4o-mini\",\n", + " messages = messages_for(website)\n", + " )\n", + " return response.choices[0].message.content\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d926d59-450e-4609-92ba-2d6f244f1342", + "metadata": {}, + "outputs": [], + "source": [ + "# A function to display this nicely in the Jupyter output, using markdown\n", + "\n", + "def display_summary(url):\n", + " summary = summarize(url)\n", + " display(Markdown(summary))\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3018853a-445f-41ff-9560-d925d1774b2f", + "metadata": {}, + "outputs": [], + "source": [ + "display_summary(\"https://detik.com\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "a430d86e-01db-4ad5-a2f9-ac85e37fe9c1", + "metadata": {}, + "source": [ + "# Please don't take this hoax creator seriously :)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df8c4a6d-c370-4fe1-9d13-32db78bcbfda", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/xss_vulnerable_example.html b/week1/community-contributions/xss_vulnerable_example.html new file mode 100644 index 0000000..6e1056c --- /dev/null +++ b/week1/community-contributions/xss_vulnerable_example.html @@ -0,0 +1,24 @@ + + + + + XSS Vulnerability Example + + +

Leave a Comment

+
+ + +
+ +

Your Comment:

+

+ + + +

+ + \ No newline at end of file diff --git a/week1/day5.ipynb b/week1/day5.ipynb index 300145f..5249ce8 100644 --- a/week1/day5.ipynb +++ b/week1/day5.ipynb @@ -141,7 +141,7 @@ "{\n", " \"links\": [\n", " {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n", - " {\"type\": \"careers page\": \"url\": \"https://another.full.url/careers\"}\n", + " {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n", " ]\n", "}\n", "\"\"\"" @@ -501,7 +501,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb b/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb new file mode 100644 index 0000000..72400c8 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb @@ -0,0 +1,385 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2b3a83fe-edf2-45b7-8b76-af2324296ad0", + "metadata": {}, + "source": [ + "### Import API Keys and Establish Connections" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bacb0c55-44ee-4505-a3bc-7aaa3d72b28b", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import ollama\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1767187f-c065-43df-b778-fcd48bd5e48d", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API key exists {openai_api_key[:8]}\")\n", + "else:\n", + " print(f\"OpenAI 
API key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API key exists {google_api_key[:7]}\")\n", + "else:\n", + " print(f\"Google API key not set\")\n", + "\n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API key exists {anthropic_api_key[:8]}\")\n", + "else:\n", + " print(f\"Anthropic API key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc7ca3ab-ff7f-4375-bcad-aca49c7f4f4f", + "metadata": {}, + "outputs": [], + "source": [ + "# Initializing API Clients, loading the SDKs\n", + "# An SDK is a library/toolbox (Pre-built functions, classes, utilities) full \n", + "# of everything you need to use someone else's software\n", + " \n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key = 'ollama')" + ] + }, + { + "cell_type": "markdown", + "id": "81e01904-5586-4726-ab91-7bdbd6bde6d9", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "### A Conversation between 3 chatbots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "843bbb69-ab7d-4b13-b878-65a4275f53ca", + "metadata": {}, + "outputs": [], + "source": [ + "# Conversation between GPT-4o-mini, Claude 3 Haiku, and Llama 3.2\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "ollama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are an eternal optimist. You always see the bright side of things and believe even \\\n", + "simple actions have deep purpose. Keep replies under 2 sentences.\"\n", + "\n", + "ollama_system = \"You are a witty skeptic who questions everything. You tend to doubt grand explanations \\\n", + "and prefer clever, sarcastic, or literal answers. Keep replies under 2 sentences.\"\n", + "\n", + "claude_system = \"You are a thoughtful philosopher. 
You consider all perspectives and enjoy finding \\\n", + "symbolic or existential meaning in simple actions. Keep replies under 2 sentences.\"\n", + "\n", + "\n", + "gpt_messages = [\"Hi! Today's topic for discussion is 'Why did the chicken cross the road?'\"]\n", + "ollama_messages = [\"That's quite the topic.\"]\n", + "claude_messages = [\"Let's begin our discussion.\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a4da2f5-ff74-4847-aa86-867e89173509", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " \n", + " messages = [{\"role\":\"system\", \"content\":gpt_system}]\n", + " \n", + " for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " response = openai.chat.completions.create(\n", + " model = gpt_model,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5848d83a-f4aa-42ee-b40b-6130da60c890", + "metadata": {}, + "outputs": [], + "source": [ + "def call_ollama():\n", + " messages = [{\"role\":\"system\", \"content\":ollama_system}]\n", + " \n", + " for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n", + "\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = ollama_model,\n", + " messages = messages\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + 
}, + { + "cell_type": "code", + "execution_count": null, + "id": "a50e4f7c-d594-4ed8-a658-2d8b2fde21a0", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " \n", + " messages = []\n", + " \n", + " for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\":\"user\", \"content\":gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\":\"assistant\", \"content\": claude_message})\n", + " \n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n", + " \n", + " response = claude.messages.create(\n", + " model = claude_model,\n", + " system = claude_system,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.content[0].text.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c78fcf8-544e-413f-af18-ccb9000515de", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n", + "print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT: \\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + "\n", + " ollama_next = call_ollama()\n", + " print(f\"Ollama: \\n{ollama_next}\\n\")\n", + " ollama_messages.append(ollama_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Claude: \\n{claude_next}\\n\")\n", + " claude_messages.append(claude_next)" + ] + }, + { + "cell_type": "markdown", + "id": "8ea7419a-ea8f-42da-a9a1-4bbe5342cecb", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "### Another Conversation between 3 chatbots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c279c275-7b95-4587-9cc6-4d32517ec253", + "metadata": {}, + "outputs": [], + 
"source": [ + "# Conversation between GPT-4o-mini, Claude 3 Haiku, and Llama 3.2\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "ollama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are an optimist who believes technology brings people \\\n", + "closer together and improves lives. Defend innovation as a force for human \\\n", + "connection. Keep response under 3 sentences.\"\n", + "\n", + "\n", + "ollama_system = \"You are a skeptic who questions if technology isolates us \\\n", + "and worsens social divides. Highlight its risks and unintended consequences. \\\n", + "Keep response under 3 sentences.\"\n", + "\n", + "\n", + "claude_system = \"You are a philosopher who explores both sides \\\n", + "of technology's impact. Seek a balanced perspective on connection and isolation. \\\n", + "Keep response under 3 sentences.\"\n", + "\n", + "\n", + "\n", + "\n", + "gpt_messages = [\"Our topic of discussion for today will be: 'Is technology making us more connected or more isolated?'\"]\n", + "ollama_messages = [\"A great topic\"]\n", + "claude_messages = [\"Let's begin.\"]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44c023a6-f22f-4a64-a718-f75fe4c8233a", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " \n", + " messages = [{\"role\":\"system\", \"content\":gpt_system}]\n", + " \n", + " for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " response = openai.chat.completions.create(\n", + " model = gpt_model,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"d29f27a1-457e-4e71-88dc-c55e4a36a27c", + "metadata": {}, + "outputs": [], + "source": [ + "def call_ollama():\n", + " messages = [{\"role\":\"system\", \"content\":ollama_system}]\n", + " \n", + " for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n", + "\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = ollama_model,\n", + " messages = messages\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69577edc-4be2-40fc-8eac-1243c30cda26", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " \n", + " messages = []\n", + " \n", + " for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\":\"user\", \"content\":gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\":\"assistant\", \"content\": claude_message})\n", + " \n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n", + " \n", + " response = claude.messages.create(\n", + " model = claude_model,\n", + " system = claude_system,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.content[0].text.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acedf2fb-8b20-49be-9a80-24fb3896e2ea", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n", + 
"print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT: \\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + "\n", + " ollama_next = call_ollama()\n", + " print(f\"Ollama: \\n{ollama_next}\\n\")\n", + " ollama_messages.append(ollama_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Claude: \\n{claude_next}\\n\")\n", + " claude_messages.append(claude_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a078943b-7a34-4697-b1f6-16f4b0e7aed6", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf b/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf new file mode 100644 index 0000000..6c8fefa Binary files /dev/null and b/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf differ diff --git a/week2/community-contributions/3_chatbots_Converstion/README.md b/week2/community-contributions/3_chatbots_Converstion/README.md new file mode 100644 index 0000000..c9f07e9 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/README.md @@ -0,0 +1,36 @@ + +# 3 Way Chatbot Conversation +Making the different models from Anthropic, OpenAI and Ollama converse with each other. + +## Contents + +- `Conversation_Day1.ipynb`: The notebook file with all code and explanations for the first day. 
+- `Conversation_Outputs.pdf`: The chatbots' conversations for each topic. +- `requirements.txt`: For installing the dependencies. +- `README.md`: This file. + +## How to Run + +1. Clone this repository. +2. Use Python 3.11.13 (the version I'm using) with Jupyter Notebook or JupyterLab. +3. Install dependencies (see below). +4. Open the notebook using Jupyter: + +```bash +jupyter notebook Conversation_Day1.ipynb +``` + +## Dependencies + +Install the required Python libraries using: + +```bash +pip install -r requirements.txt +``` + +--- + +### Author + +Mustafa Kashif + diff --git a/week2/community-contributions/3_chatbots_Converstion/requirements.txt b/week2/community-contributions/3_chatbots_Converstion/requirements.txt new file mode 100644 index 0000000..548bb18 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/requirements.txt @@ -0,0 +1,5 @@ +IPython +anthropic +python-dotenv +ollama +openai \ No newline at end of file diff --git a/week2/community-contributions/Agent_translate_gemini.ipynb b/week2/community-contributions/Agent_translate_gemini.ipynb new file mode 100644 index 0000000..fe62337 --- /dev/null +++ b/week2/community-contributions/Agent_translate_gemini.ipynb @@ -0,0 +1,143 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "# Additional End of week Exercise - week 2\n", + "\n", + "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n", + "\n", + "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n", + "\n", + "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. 
ChatGPT or Claude can help you, or email me if you have questions.\n", + "\n", + "I will publish a full solution here soon - unless someone beats me to it...\n", + "\n", + "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "# Agent that can listen for audio and convert it to text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da58ed0f-f781-4c51-8e5d-fdb05db98c8c", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import gradio as gr\n", + "import google.generativeai as genai\n", + "from dotenv import load_dotenv\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "078cf34a-881e-44f4-9947-c45d7fe992a3", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv()\n", + "\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")\n", + "\n", + "genai.configure(api_key=google_api_key)\n", + "model = genai.GenerativeModel(\"gemini-2.0-flash\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f77228ea-d0e1-4434-9191-555a6d680625", + "metadata": {}, + "outputs": [], + "source": [ + "def transcribe_translate_with_gemini(audio_file_path):\n", + " if not audio_file_path:\n", + " return \"⚠️ No audio file received.\"\n", + "\n", + " prompt = (\n", + " \"You're an AI that listens to a voice message in any language and returns the English transcription. \"\n", + " \"Please transcribe and translate the following audio to English. 
If already in English, just transcribe it.\"\n", + " )\n", + "\n", + " uploaded_file = genai.upload_file(audio_file_path)\n", + "\n", + " # 🔁 Send prompt + uploaded audio reference to Gemini\n", + " response = model.generate_content(\n", + " contents=[\n", + " {\n", + " \"role\": \"user\",\n", + " \"parts\": [\n", + " {\"text\": prompt},\n", + " uploaded_file \n", + " ]\n", + " }\n", + " ]\n", + " )\n", + "\n", + " return response.text.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb6c6d1e-1be3-404d-83f3-fc0855dc9f67", + "metadata": {}, + "outputs": [], + "source": [ + "gr.Interface(\n", + " fn=transcribe_translate_with_gemini,\n", + " inputs=gr.Audio(label=\"Record voice\", type=\"filepath\"),\n", + " outputs=\"text\",\n", + " title=\"🎙️ Voice-to-English Translator (Gemini Only)\",\n", + " description=\"Speak in any language and get the English transcription using Gemini multimodal API.\"\n", + ").launch()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b105082-e388-44bc-9617-1a81f38e2f3f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/FlightAI-exercise.ipynb b/week2/community-contributions/FlightAI-exercise.ipynb new file mode 100644 index 0000000..f6c96ca --- /dev/null +++ b/week2/community-contributions/FlightAI-exercise.ipynb @@ -0,0 +1,654 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "# Additional End of week Exercise - week 2\n", + 
"\n", + "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n", + "\n", + "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n", + "\n", + "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n", + "\n", + "I will publish a full solution here soon - unless someone beats me to it...\n", + "\n", + "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "\n", + "import os\n", + "import json\n", + "import base64\n", + "import logging\n", + "import gradio as gr\n", + "from PIL import Image\n", + "from io import BytesIO\n", + "from openai import OpenAI\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Audio, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e879f6ae-b246-479d-8f81-94e47a9072ec", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialization\n", + "logging.basicConfig(level=logging.INFO)\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " logging.info(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " logging.error(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d4455169-9e5e-4171-92e8-6f850a06f6e3", + "metadata": {}, + "outputs": [], + 
"source": [ + "system_message = (\n", + " \"You are a helpful assistant for an airline called FlightAI. \"\n", + " \"Always respond in a short, courteous sentence. \"\n", + " \"Provide accurate information only. \"\n", + " \"If you don’t know something, say so clearly. \"\n", + " \"Before booking a ticket, strictly follow this order: \"\n", + " \"1) Check if the destination is available, \"\n", + " \"2) Then check the ticket price, \"\n", + " \"3) Collect all neccessary details like name, destination and date of journey, \"\n", + " \"4) Only then proceed with the booking. \"\n", + " \"Always use the appropriate tools or APIs for each step before confirming a booking.\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4bab8e2c-e2b1-4421-a95b-7f1251670817", + "metadata": {}, + "outputs": [], + "source": [ + "# Dummy funcs that mimic the ticket booking behaviour\n", + "# Replace these will real funcs (that call APIs or make DB transactions) to actually book a ticket\n", + "\n", + "ticket_prices = {\n", + " \"london\": \"$799\",\n", + " \"paris\": \"$899\",\n", + " \"tokyo\": \"$1400\",\n", + " \"berlin\": \"$499\"\n", + "}\n", + "\n", + "def check_destination_availability(destination: str) -> dict:\n", + " \"\"\"\n", + " Check if the given destination is available in our ticketing system.\n", + " \n", + " Args:\n", + " destination (str): The name of the city.\n", + " \n", + " Returns:\n", + " dict: {\"available\": bool}\n", + " \"\"\"\n", + " logging.info(f\"Checking availability for destination: {destination}\")\n", + " \n", + " available = destination.lower() in ticket_prices\n", + " return {\"available\": available}\n", + "\n", + "\n", + "def fetch_ticket_price(destination_city: str) -> dict:\n", + " \"\"\"\n", + " Retrieve the ticket price for a given city.\n", + " \n", + " Args:\n", + " destination_city (str): The name of the destination city.\n", + " \n", + " Returns:\n", + " dict: {\"price\": str} or {\"price\": \"Unknown\"} if 
not found\n", + " \"\"\"\n", + " logging.info(f\"Retrieving price for destination: {destination_city}\")\n", + " \n", + " city = destination_city.lower()\n", + " price = ticket_prices.get(city, \"Unknown\")\n", + " \n", + " return {\"price\": price}\n", + "\n", + "\n", + "def book_ticket(name: str, destination_city: str, journey_date: str) -> dict:\n", + " \"\"\"\n", + " Book a ticket to a destination city for a given user and date.\n", + " \n", + " Args:\n", + " name (str): Name of the passenger.\n", + " destination_city (str): Destination city.\n", + " journey_date (str): Date of journey in YYYY-MM-DD format.\n", + " \n", + " Returns:\n", + " dict: Booking confirmation with name, city, price, and date, or error.\n", + " \"\"\"\n", + " logging.info(f\"Booking ticket for {name} to {destination_city} on {journey_date}\")\n", + " \n", + " city = destination_city.lower()\n", + "\n", + " if city not in ticket_prices:\n", + " logging.error(f\"City '{destination_city}' not found in ticket list.\")\n", + " return {\"error\": \"Destination not found.\"}\n", + "\n", + " price_info = fetch_ticket_price(destination_city)\n", + " \n", + " return {\n", + " \"name\": name,\n", + " \"destination_city\": destination_city.title(),\n", + " \"journey_date\": journey_date,\n", + " \"price\": price_info[\"price\"]\n", + " }\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "400f4592-2326-43f6-a921-fcd051c4f022", + "metadata": {}, + "outputs": [], + "source": [ + "destination_availability_tool = {\n", + " \"name\": \"check_destination_availability\",\n", + " \"description\": \"Check if tickets are available for the given destination city before proceeding with any booking or pricing inquiry.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The name of the destination city to check for availability.\"\n", + " }\n", + " },\n", + " \"required\": 
[\"destination\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "ticket_price_tool = {\n", + " \"name\": \"fetch_ticket_price\",\n", + " \"description\": (\n", + " \"Get the price of a return ticket to the specified destination city. \"\n", + " \"Use this after confirming that the destination is available, especially when the customer asks for the ticket price.\"\n", + " ),\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city for which the customer wants the ticket price.\"\n", + " }\n", + " },\n", + " \"required\": [\"destination_city\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "ticket_booking_tool = {\n", + " \"name\": \"book_ticket\",\n", + " \"description\": (\n", + " \"Book a ticket for the customer to the specified destination city on the given journey date. \"\n", + " \"Use only after availability and price have been checked.\"\n", + " ),\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Full name of the person booking the ticket.\"\n", + " },\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city that the customer wants to travel to.\"\n", + " },\n", + " \"journey_date\": {\n", + " \"type\": \"string\",\n", + " \"format\": \"date\",\n", + " \"description\": \"The journey date in YYYY-MM-DD format.\"\n", + " }\n", + " },\n", + " \"required\": [\"name\", \"destination_city\", \"journey_date\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "tools = [\n", + " {\"type\": \"function\", \"function\": destination_availability_tool},\n", + " {\"type\": \"function\", \"function\": ticket_price_tool},\n", + " {\"type\": \"function\", \"function\": ticket_booking_tool},\n", + "]" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "f02c17ba-14f2-41c4-b6a2-d1397405d368", + "metadata": {}, + "outputs": [], + "source": [ + "def handle_tool_call(message):\n", + " \"\"\"\n", + " Handles a single OpenAI tool call message and returns both the result\n", + " and a formatted tool response dictionary.\n", + " \n", + " Args:\n", + " message (object): An OpenAI message containing a tool call.\n", + " \n", + " Returns:\n", + " tuple: (result_dict, response_dict)\n", + " \"\"\"\n", + " tool_call = message.tool_calls[0]\n", + " function_name = tool_call.function.name\n", + " arguments = json.loads(tool_call.function.arguments)\n", + "\n", + " result = None\n", + "\n", + " logging.info(f\"Tool call received: {function_name} with arguments: {arguments}\")\n", + "\n", + " if function_name == \"check_destination_availability\":\n", + " result = check_destination_availability(**arguments)\n", + "\n", + " elif function_name == \"fetch_ticket_price\":\n", + " city = arguments.get(\"destination_city\")\n", + " price_info = fetch_ticket_price(city)\n", + " result = {\"destination_city\": city, \"price\": price_info[\"price\"]}\n", + "\n", + " elif function_name == \"book_ticket\":\n", + " result = book_ticket(**arguments)\n", + "\n", + " else:\n", + " logging.warning(\"Unrecognized tool function: %s\", function_name)\n", + " result = {\"error\": f\"Unknown function '{function_name}'\"}\n", + "\n", + " response = {\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": tool_call.id,\n", + " \"content\": json.dumps(result)\n", + " }\n", + "\n", + " return result, response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72c1a9e7-186c-4218-9edc-01814baec431", + "metadata": {}, + "outputs": [], + "source": [ + "def artist(city: str, style: str = \"vibrant pop-art\", size: str = \"1024x1024\") -> Image.Image:\n", + " \"\"\"\n", + " Generates a city-themed vacation image using DALL·E.\n", + "\n", + " Args:\n", + " city (str): Name of the city to 
visualize.\n", + " style (str): Artistic style for the image prompt.\n", + " size (str): Image resolution (e.g., \"1024x1024\").\n", + "\n", + " Returns:\n", + " Image.Image: A PIL Image object representing the generated image.\n", + "\n", + " Raises:\n", + " ValueError: If city name is empty.\n", + " RuntimeError: If image generation fails.\n", + " \"\"\"\n", + " if not city.strip():\n", + " raise ValueError(\"City name cannot be empty.\")\n", + "\n", + " prompt = (\n", + " f\"An image representing a vacation in {city}, \"\n", + " f\"showing iconic tourist attractions, cultural elements, and everything unique about {city}, \"\n", + " f\"rendered in a {style} style.\"\n", + " )\n", + "\n", + " logging.info(\"Generating image for city: %s with style: %s\", city, style)\n", + "\n", + " try:\n", + " response = openai.images.generate(\n", + " model=\"dall-e-3\",\n", + " prompt=prompt,\n", + " size=size,\n", + " n=1,\n", + " response_format=\"b64_json\",\n", + " )\n", + "\n", + " image_base64 = response.data[0].b64_json\n", + " image_data = base64.b64decode(image_base64)\n", + " logging.info(\"Image generation successful for %s\", city)\n", + "\n", + " return Image.open(BytesIO(image_data))\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Failed to generate image for city '%s': %s\", city, str(e))\n", + " raise RuntimeError(f\"Image generation failed for city '{city}'\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fdf7c091-6c68-4af6-8197-c1456b36cedf", + "metadata": {}, + "outputs": [], + "source": [ + "def talker(message: str, output_filename: str = \"output_audio.mp3\", autoplay: bool = True) -> None:\n", + " \"\"\"\n", + " Converts a text message into speech using OpenAI TTS and plays the audio.\n", + "\n", + " Args:\n", + " message (str): The text to convert to speech.\n", + " output_filename (str): The filename to save the generated audio.\n", + " autoplay (bool): Whether to autoplay the audio in the notebook.\n", + 
"\n", + " Raises:\n", + " ValueError: If the message is empty.\n", + " RuntimeError: If the audio generation fails.\n", + " \"\"\"\n", + " if not message.strip():\n", + " raise ValueError(\"Message cannot be empty.\")\n", + "\n", + " logging.info(\"Generating speech for message: %s\", message)\n", + "\n", + " try:\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=\"alloy\",\n", + " input=message\n", + " )\n", + "\n", + " with open(output_filename, \"wb\") as f:\n", + " f.write(response.content)\n", + "\n", + " logging.info(\"Audio written to: %s\", output_filename)\n", + "\n", + " if autoplay:\n", + " display(Audio(output_filename, autoplay=True))\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Failed to generate or play audio: %s\", str(e))\n", + " raise RuntimeError(\"Text-to-speech generation failed.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54568b4a-be8d-47a1-b924-03acdafef70e", + "metadata": {}, + "outputs": [], + "source": [ + "def translate(message, language):\n", + " \"\"\"\n", + " Translates the given text into the specified language using OpenAI Chat API.\n", + "\n", + " Args:\n", + " message (str): The text to be translated.\n", + " language (str): Target language for translation (e.g., 'French', 'Japanese').\n", + "\n", + " Returns:\n", + " str: Translated text.\n", + "\n", + " Raises:\n", + " ValueError: If input message or language is empty.\n", + " RuntimeError: If translation fails due to API or other issues.\n", + " \"\"\"\n", + " if not message.strip():\n", + " raise ValueError(\"Input message cannot be empty.\")\n", + " if not language.strip():\n", + " raise ValueError(\"Target language cannot be empty.\")\n", + "\n", + " logging.info(\"Translating to %s: %s\", language, message)\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": f\"You are a translation assistant. 
Translate everything the user says to {language}.\"},\n", + " {\"role\": \"user\", \"content\": message}\n", + " ]\n", + "\n", + " try:\n", + " response = openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages\n", + " )\n", + " translated = response.choices[0].message.content.strip()\n", + " logging.info(\"Translation successful.\")\n", + " return translated\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Translation failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to translate message.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e6cf470-8ea0-43b2-bbcc-53c2432feb0d", + "metadata": {}, + "outputs": [], + "source": [ + "def transcribe_audio(audio_path, model=\"whisper-1\"):\n", + " \"\"\"\n", + " Transcribes an audio file using OpenAI's Whisper model.\n", + "\n", + " Args:\n", + " audio_path (str): Path to the audio file (e.g., .mp3, .wav).\n", + " model (str): OpenAI model for transcription (default: 'whisper-1').\n", + "\n", + " Returns:\n", + " str: Transcribed text from the audio file.\n", + "\n", + " Raises:\n", + " ValueError: If the path is invalid or the file does not exist.\n", + " RuntimeError: If the transcription fails.\n", + " \"\"\"\n", + " if not audio_path or not os.path.exists(audio_path):\n", + " raise ValueError(\"Invalid or missing audio file path.\")\n", + "\n", + " logging.info(\"Transcribing audio file: %s using model: %s\", audio_path, model)\n", + "\n", + " try:\n", + " with open(audio_path, \"rb\") as f:\n", + " response = openai.audio.transcriptions.create(\n", + " model=model,\n", + " file=f\n", + " )\n", + " transcript = response.text.strip()\n", + " logging.info(\"Transcription successful.\")\n", + " return transcript\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Transcription failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to transcribe audio.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"3489656e-0f08-4d41-94b1-d902c93ca164", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(history: list, language: str, translated_history: list, speaking_language: str) -> tuple:\n", + " \"\"\"\n", + " Handles a chat interaction including tool calls, image generation, translation, and TTS playback.\n", + "\n", + " Args:\n", + " history (list): List of previous conversation messages.\n", + " language (str): Target language for translation and TTS.\n", + "\n", + " Returns:\n", + " tuple: (updated history list, generated image if any, translated response string)\n", + " \"\"\"\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + history\n", + " image = None\n", + "\n", + " try:\n", + " # Initial assistant response\n", + " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n", + " choice = response.choices[0]\n", + "\n", + " # Handle tool calls if triggered\n", + " if choice.finish_reason == \"tool_calls\":\n", + " message = choice.message\n", + " result, tool_response = handle_tool_call(message)\n", + "\n", + " # Append tool-related messages\n", + " messages.append(message)\n", + " messages.append(tool_response)\n", + " logging.info(\"Tool call result: %s\", result)\n", + "\n", + " # Generate image if a booking was completed\n", + " if message.tool_calls[0].function.name == \"book_ticket\" and \"destination_city\" in result:\n", + " image = artist(result[\"destination_city\"])\n", + "\n", + " # Get final assistant response after tool execution\n", + " response = openai.chat.completions.create(model=MODEL, messages=messages)\n", + " choice = response.choices[0]\n", + "\n", + " reply = choice.message.content.strip()\n", + " history.append({\"role\": \"assistant\", \"content\": reply})\n", + "\n", + " # Translate and speak the reply\n", + " translated_reply = translate(reply, language)\n", + " translated_history.append({\"role\": \"assistant\", \"content\": translated_reply})\n", + "\n", + " if 
speaking_language == \"English\":\n", + " talker(reply)\n", + " else:\n", + " talker(translated_reply)\n", + "\n", + " return history, image, translated_history\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Chat processing failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to complete chat interaction.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f76acc68-726e-457f-88ab-99da75debde5", + "metadata": {}, + "outputs": [], + "source": [ + "force_dark_mode = \"\"\"\n", + "function refresh() {\n", + " const url = new URL(window.location);\n", + " if (url.searchParams.get('__theme') !== 'dark') {\n", + " url.searchParams.set('__theme', 'dark');\n", + " window.location.href = url.href;\n", + " }\n", + "}\n", + "\"\"\"\n", + "\n", + "with gr.Blocks(js=force_dark_mode) as ui:\n", + " with gr.Row():\n", + " gr.Markdown(\"### FlightAI Chat with Translation\")\n", + "\n", + " with gr.Row():\n", + " lang_dropdown = gr.Dropdown(\n", + " choices=[\"Spanish\", \"French\", \"German\", \"Japanese\", \"Hindi\"],\n", + " value=\"Spanish\",\n", + " label=\"Translate To\"\n", + " )\n", + " \n", + " speak_dropdown = gr.Dropdown(\n", + " choices=[\"English\", \"Selected Language\"],\n", + " value=\"English\",\n", + " label=\"Speak out in\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Chat History\")\n", + " translated_chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Translated Chat\")\n", + " image_output = gr.Image(height=500)\n", + "\n", + " with gr.Row():\n", + " entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n", + " audio_input = gr.Audio(sources=\"microphone\", type=\"filepath\", label=\"Or speak to the assistant\")\n", + "\n", + " with gr.Row():\n", + " clear = gr.Button(\"Clear\")\n", + "\n", + " def do_entry(message, history, audio, translated_history, language):\n", + " if audio:\n", + " message = transcribe_audio(audio)\n", + "\n", + 
" if message:\n", + " history += [{\"role\": \"user\", \"content\": message}]\n", + " translated_history += [{\"role\": \"user\", \"content\": translate(message, language)}]\n", + " return \"\", history, None, translated_history\n", + "\n", + " entry.submit(\n", + " do_entry,\n", + " inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n", + " outputs=[entry, chatbot, audio_input, translated_chatbot]\n", + " ).then(\n", + " chat,\n", + " inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n", + " outputs=[chatbot, image_output, translated_chatbot]\n", + " )\n", + "\n", + " audio_input.change(\n", + " do_entry,\n", + " inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n", + " outputs=[entry, chatbot, audio_input, translated_chatbot]\n", + " ).then(\n", + " chat,\n", + " inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n", + " outputs=[chatbot, image_output, translated_chatbot]\n", + " )\n", + "\n", + " clear.click(lambda: [\"\", [], None, [], None], inputs=None, outputs=[entry, chatbot, audio_input, translated_chatbot, image_output], queue=False)\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58f97435-fa0d-45f7-b02f-4ac5f4901c53", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb b/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb new file mode 
100644 index 0000000..bc4f92a --- /dev/null +++ b/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb @@ -0,0 +1,808 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d938fc6c-bcca-4572-b851-75370fe21c67", + "metadata": {}, + "source": [ + "# Airline Assistant using Gemini API for Image and Audio as well - Live ticket prices using Amadeus API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5eda470-07ee-4d01-bada-3390050ac9c2", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import random\n", + "import string\n", + "import base64\n", + "import gradio as gr\n", + "import pyaudio\n", + "import requests\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "from dotenv import load_dotenv\n", + "from google import genai\n", + "from google.genai import types" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09aaf3b0-beb7-4b64-98a4-da16fc83dadb", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "\n", + "if not api_key:\n", + " print(\"API Key not found!\")\n", + "else:\n", + " print(\"API Key loaded in memory\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35881fb9-4d51-43dc-a5e6-d9517e22019a", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_GEMINI = 'gemini-2.5-flash'\n", + "MODEL_GEMINI_IMAGE = 'gemini-2.0-flash-preview-image-generation'\n", + "MODEL_GEMINI_SPEECH = 'gemini-2.5-flash-preview-tts'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5ed391c-8a67-4465-9c66-e915548a0d6a", + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " client = genai.Client(api_key=api_key)\n", + " print(\"Google GenAI Client initialized successfully!\")\n", + "except Exception as e:\n", + " print(f\"Error initializing GenAI Client: {e}\")\n", + " print(\"Ensure your GOOGLE_API_KEY is 
correctly set as an environment variable.\")\n", + " exit() " + ] + }, + { + "cell_type": "markdown", + "id": "407ad581-9580-4dba-b236-abb6c6788933", + "metadata": {}, + "source": [ + "## Image Generation " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a21921f8-57b1-4665-8999-7f2a40645b59", + "metadata": {}, + "outputs": [], + "source": [ + "def fetch_image(city):\n", + " prompt = (\n", + " f\"A high-quality, photo-realistic image of a vacation in {city}, \"\n", + " f\"showing iconic landmarks, cultural attractions, authentic street life, and local cuisine. \"\n", + " f\"Capture natural lighting, real people enjoying travel experiences, and the unique vibe of {city}'s atmosphere. \"\n", + " f\"The composition should feel immersive, warm, and visually rich, as if taken by a travel photographer.\"\n", + ")\n", + "\n", + " response = client.models.generate_content(\n", + " model = MODEL_GEMINI_IMAGE,\n", + " contents = prompt,\n", + " config=types.GenerateContentConfig(\n", + " response_modalities=['TEXT', 'IMAGE']\n", + " )\n", + " )\n", + "\n", + " for part in response.candidates[0].content.parts:\n", + " if part.inline_data is not None:\n", + " image_data = BytesIO(part.inline_data.data)\n", + " return Image.open(image_data)\n", + "\n", + " raise ValueError(\"No image found in Gemini response.\")\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcd4aed1-8b4d-4771-ba32-e729e82bab54", + "metadata": {}, + "outputs": [], + "source": [ + "fetch_image(\"london\")" + ] + }, + { + "cell_type": "markdown", + "id": "5f6baee6-e2e2-4cc4-941d-34a4c72cee67", + "metadata": {}, + "source": [ + "## Speech Generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "825dfedc-0271-4191-a3d1-50872af4c8cf", + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "Kore -- Firm\n", + "Puck -- Upbeat\n", + "Leda -- Youthful\n", + "Iapetus -- Clear\n", + "Erinome -- Clear\n", + "Sadachbia -- 
Lively\n", + "Sulafat -- Warm\n", + "Despina -- Smooth\n", + "\"\"\"\n", + "\n", + "def talk(message:str, voice_name:str=\"Leda\", mood:str=\"cheerfully\"):\n", + " prompt = f\"Say {mood}: {message}\"\n", + " response = client.models.generate_content(\n", + " model = MODEL_GEMINI_SPEECH,\n", + " contents = prompt,\n", + " config=types.GenerateContentConfig(\n", + " response_modalities=[\"AUDIO\"],\n", + " speech_config=types.SpeechConfig(\n", + " voice_config=types.VoiceConfig(\n", + " prebuilt_voice_config=types.PrebuiltVoiceConfig(\n", + " voice_name=voice_name,\n", + " )\n", + " )\n", + " ), \n", + " )\n", + " )\n", + "\n", + " # Fetch the audio bytes\n", + " pcm_data = response.candidates[0].content.parts[0].inline_data.data\n", + " # Play the audio using PyAudio\n", + " p = pyaudio.PyAudio()\n", + " stream = p.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)\n", + " stream.write(pcm_data)\n", + " stream.stop_stream()\n", + " stream.close()\n", + " p.terminate()\n", + "\n", + " # Play using simpleaudio (16-bit PCM, mono, 24kHz)\n", + " # play_obj = sa.play_buffer(pcm_data, num_channels=1, bytes_per_sample=2, sample_rate=24000)\n", + " # play_obj.wait_done() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54967ebc-24a6-4bb2-9a19-20c3585f1d77", + "metadata": {}, + "outputs": [], + "source": [ + "talk(\"Hi, How are you? 
Welcome to FlyJumbo Airlines\",\"Kore\",\"helpful\")" + ] + }, + { + "cell_type": "markdown", + "id": "be9dc275-838e-4c54-b487-41d094dad96b", + "metadata": {}, + "source": [ + "## Ticket Price Tool Function - Using Amadeus API " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8613a080-d82c-4c1a-8db4-377614997ac2", + "metadata": {}, + "outputs": [], + "source": [ + "client_id = os.getenv(\"AMADEUS_CLIENT_ID\")\n", + "client_secret = os.getenv(\"AMADEUS_CLIENT_SECRET\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bf78f61-0de1-4552-a1d4-1a28380be6a5", + "metadata": {}, + "outputs": [], + "source": [ + "# Get the token first\n", + "def get_amadeus_token():\n", + " url = \"https://test.api.amadeus.com/v1/security/oauth2/token\"\n", + " headers = {\"Content-Type\": \"application/x-www-form-urlencoded\"}\n", + " data = {\n", + " \"grant_type\": \"client_credentials\",\n", + " \"client_id\": client_id,\n", + " \"client_secret\": client_secret,\n", + " }\n", + " \n", + " try:\n", + " response = requests.post(url, headers=headers, data=data, timeout=10)\n", + " response.raise_for_status()\n", + " return response.json()[\"access_token\"]\n", + " \n", + " except requests.exceptions.HTTPError as e:\n", + " print(f\"HTTP Error {response.status_code}: {response.text}\")\n", + " \n", + " except requests.exceptions.RequestException as e:\n", + " print(\"Network or connection error:\", e)\n", + " \n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c5261f6-6662-4e9d-8ff0-8e10171bb963", + "metadata": {}, + "outputs": [], + "source": [ + "def get_airline_name(code, token):\n", + " url = f\"https://test.api.amadeus.com/v1/reference-data/airlines\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + " params = {\"airlineCodes\": code}\n", + "\n", + " response = requests.get(url, headers=headers, params=params)\n", + " response.raise_for_status()\n", + " data = response.json()\n", 
+ "\n", + " if \"data\" in data and data[\"data\"]:\n", + " return data[\"data\"][0].get(\"businessName\", code)\n", + " return code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42a55f06-880a-4c49-8560-2e7b97953c1a", + "metadata": {}, + "outputs": [], + "source": [ + "COMMON_CITY_CODES = {\n", + " \"delhi\": \"DEL\",\n", + " \"mumbai\": \"BOM\",\n", + " \"chennai\": \"MAA\",\n", + " \"kolkata\": \"CCU\",\n", + " \"bengaluru\": \"BLR\",\n", + " \"hyderabad\": \"HYD\",\n", + " \"patna\": \"PAT\",\n", + " \"raipur\": \"RPR\",\n", + " \"panaji\": \"GOI\",\n", + " \"chandigarh\": \"IXC\",\n", + " \"srinagar\": \"SXR\",\n", + " \"ranchi\": \"IXR\",\n", + " \"bengaluru\": \"BLR\",\n", + " \"thiruvananthapuram\": \"TRV\",\n", + " \"bhopal\": \"BHO\",\n", + " \"mumbai\": \"BOM\",\n", + " \"imphal\": \"IMF\",\n", + " \"aizawl\": \"AJL\",\n", + " \"bhubaneswar\": \"BBI\",\n", + " \"jaipur\": \"JAI\",\n", + " \"chennai\": \"MAA\",\n", + " \"hyderabad\": \"HYD\",\n", + " \"agartala\": \"IXA\",\n", + " \"lucknow\": \"LKO\",\n", + " \"dehradun\": \"DED\",\n", + " \"kolkata\": \"CCU\",\n", + "\n", + " # Union territories\n", + " \"port blair\": \"IXZ\",\n", + " \"leh\": \"IXL\",\n", + " \"puducherry\": \"PNY\",\n", + "\n", + " # Major metro cities (for redundancy)\n", + " \"ahmedabad\": \"AMD\",\n", + " \"surat\": \"STV\",\n", + " \"coimbatore\": \"CJB\",\n", + " \"vizag\": \"VTZ\",\n", + " \"vijayawada\": \"VGA\",\n", + " \"nagpur\": \"NAG\",\n", + " \"indore\": \"IDR\",\n", + " \"kanpur\": \"KNU\",\n", + " \"varanasi\": \"VNS\"\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b061ec2c-609b-4d77-bd41-c9bc5bf901f4", + "metadata": {}, + "outputs": [], + "source": [ + "city_code_cache = {}\n", + "\n", + "def get_city_code(city_name, token):\n", + " city_name = city_name.strip().lower()\n", + "\n", + " if city_name in city_code_cache:\n", + " return city_code_cache[city_name]\n", + "\n", + " if city_name in 
COMMON_CITY_CODES:\n", + " return COMMON_CITY_CODES[city_name]\n", + "\n", + " base_url = \"https://test.api.amadeus.com/v1/reference-data/locations\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + "\n", + " for subtype in [\"CITY\", \"AIRPORT,CITY\"]:\n", + " params = {\"keyword\": city_name, \"subType\": subtype}\n", + " try:\n", + " response = requests.get(base_url, headers=headers, params=params, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " if \"data\" in data and data[\"data\"]:\n", + " code = data[\"data\"][0][\"iataCode\"]\n", + " print(f\"[INFO] Found {subtype} match for '{city_name}': {code}\")\n", + " city_code_cache[city_name] = code\n", + " return code\n", + " except Exception as e:\n", + " print(f\"[ERROR] Location lookup failed for {subtype}: {e}\")\n", + "\n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9816a9c-fd70-4dfc-a3c0-4d8709997371", + "metadata": {}, + "outputs": [], + "source": [ + "# Getting live ticket price \n", + "\n", + "def get_live_ticket_prices(origin, destination, departure_date, return_date=None):\n", + " token = get_amadeus_token()\n", + "\n", + " url = \"https://test.api.amadeus.com/v2/shopping/flight-offers\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + "\n", + " origin_code = get_city_code(origin,token)\n", + " destination_code = get_city_code(destination,token)\n", + "\n", + " if not origin_code:\n", + " return f\"Sorry, I couldn't find the airport code for the city '{origin}'.\"\n", + " if not destination_code:\n", + " return f\"Sorry, I couldn't find the airport code for the city '{destination}'.\"\n", + "\n", + " params = {\n", + " \"originLocationCode\": origin_code.upper(),\n", + " \"destinationLocationCode\": destination_code.upper(),\n", + " \"departureDate\": departure_date,\n", + " \"adults\": 1,\n", + " \"currencyCode\": \"USD\",\n", + " \"max\": 1,\n", + " }\n", + "\n", + " if 
return_date:\n", + " params[\"returnDate\"] = return_date\n", + "\n", + " try:\n", + " response = requests.get(url, headers=headers, params=params, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + " \n", + " if \"data\" in data and data[\"data\"]:\n", + " offer = data[\"data\"][0]\n", + " price = offer[\"price\"][\"total\"]\n", + " airline_codes = offer.get(\"validatingAirlineCodes\", [])\n", + " airline_code = airline_codes[0] if airline_codes else \"Unknown\"\n", + "\n", + " try:\n", + " airline_name = get_airline_name(airline_code, token) if airline_code != \"Unknown\" else \"Unknown Airline\"\n", + " if not airline_name: \n", + " airline_name = airline_code\n", + " except Exception:\n", + " airline_name = airline_code\n", + " \n", + " \n", + " if return_date:\n", + " return (\n", + " f\"Round-trip flight from {origin.capitalize()} to {destination.capitalize()}:\\n\"\n", + " f\"- Departing: {departure_date}\\n\"\n", + " f\"- Returning: {return_date}\\n\"\n", + " f\"- Airline: {airline_name}\\n\"\n", + " f\"- Price: ${price}\"\n", + " )\n", + " else:\n", + " return (\n", + " f\"One-way flight from {origin.capitalize()} to {destination.capitalize()} on {departure_date}:\\n\"\n", + " f\"- Airline: {airline_name}\\n\"\n", + " f\"- Price: ${price}\"\n", + " )\n", + " else:\n", + " return f\"No flights found from {origin.capitalize()} to {destination.capitalize()} on {departure_date}.\"\n", + " except requests.exceptions.RequestException as e:\n", + " return f\"❌ Error fetching flight data: {str(e)}\" \n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bc7657e-e8b5-4647-9745-d7d403feb09a", + "metadata": {}, + "outputs": [], + "source": [ + "get_live_ticket_prices(\"london\", \"chennai\", \"2025-07-01\",\"2025-07-10\")" + ] + }, + { + "cell_type": "markdown", + "id": "e1153b94-90e7-4856-8c85-e456305a7817", + "metadata": {}, + "source": [ + "## Ticket Booking Tool Function - DUMMY" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "5dfc3b12-0a16-4861-a549-594f175ff956", + "metadata": {}, + "outputs": [], + "source": [ + "def book_flight(origin, destination, departure_date, return_date=None, airline=\"Selected Airline\", passenger_name=\"Guest\"):\n", + " # Generate a dummy ticket reference (PNR)\n", + " ticket_ref = ''.join(random.choices(string.ascii_uppercase + string.digits, k=6))\n", + "\n", + " # Build confirmation message\n", + " confirmation = (\n", + " f\"🎫 Booking confirmed for {passenger_name}!\\n\"\n", + " f\"From: {origin.capitalize()} → To: {destination.capitalize()}\\n\"\n", + " f\"Departure: {departure_date}\"\n", + " )\n", + "\n", + " if return_date:\n", + " confirmation += f\"\\nReturn: {return_date}\"\n", + "\n", + " confirmation += (\n", + " f\"\\nAirline: {airline}\\n\"\n", + " f\"PNR: {ticket_ref}\\n\"\n", + " f\"✅ Your ticket has been booked successfully. Safe travels!\"\n", + " )\n", + "\n", + " return confirmation\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "122f655b-b7a4-45c6-aaec-afd2917a051b", + "metadata": {}, + "outputs": [], + "source": [ + "print(book_flight(\"chennai\", \"delhi\", \"2025-07-01\", \"2025-07-10\", \"Air India\", \"Ravi Kumar\"))" + ] + }, + { + "cell_type": "markdown", + "id": "e83d8e90-ae22-4728-83e5-d83fed7f2049", + "metadata": {}, + "source": [ + "## Gemini Chat Workings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a656f4e-914d-4f5e-b7fa-48457935181a", + "metadata": {}, + "outputs": [], + "source": [ + "ticket_price_function_declaration = {\n", + " \"name\":\"get_live_ticket_prices\",\n", + " \"description\": \"Get live flight ticket prices between two cities for a given date (round-trip or one-way).\\\n", + " The destination may be a city or country (e.g., 'China'). 
Call this function whenever a customer asks about ticket prices., such as 'How much is a ticket to Paris?'\",\n", + " \"parameters\":{\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"origin\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the origin city. Example: 'Delhi'\",\n", + " },\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\":\"Name of the destination city. Example: 'London'\",\n", + " },\n", + " \"departure_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Date of departure in YYYY-MM-DD format. Example: '2025-07-01'\",\n", + " },\n", + " \"return_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. Leave blank for one-way trips.\",\n", + " },\n", + " },\n", + " \"required\": [\"origin\", \"destination\", \"departure_date\"],\n", + " }\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05a835ab-a675-40ed-9cd8-65f4c6b22722", + "metadata": {}, + "outputs": [], + "source": [ + "book_flight_function_declaration = {\n", + " \"name\": \"book_flight\",\n", + " \"description\": \"Book a flight for the user after showing the ticket details and confirming the booking. \"\n", + " \"Call this function when the user says things like 'yes', 'book it', or 'I want to book this flight'.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"origin\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the origin city. Example: 'Chennai'\",\n", + " },\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the destination city. Example: 'London'\",\n", + " },\n", + " \"departure_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Date of departure in YYYY-MM-DD format. 
Example: '2025-07-01'\",\n", + " },\n", + " \"return_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. Leave blank for one-way trips.\",\n", + " },\n", + " \"airline\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Airline name or code that the user wants to book with. Example: 'Air India'\",\n", + " },\n", + " \"passenger_name\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Full name of the passenger for the booking. Example: 'Ravi Kumar'\",\n", + " }\n", + " },\n", + " \"required\": [\"origin\", \"destination\", \"departure_date\", \"passenger_name\"],\n", + " }\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad0231cd-040f-416d-b150-0d8f90535718", + "metadata": {}, + "outputs": [], + "source": [ + "# System Definitions\n", + "\n", + "system_instruction_prompt = (\n", + " \"You are a helpful and courteous AI assistant for an airline company called FlyJumbo. \"\n", + " \"When a user starts a new conversation, greet them with: 'Hi there, welcome to FlyJumbo! How can I help you?'. \"\n", + " \"Do not repeat this greeting in follow-up messages. \"\n", + " \"Use the available tools if a user asks about ticket prices. \"\n", + " \"Ask follow-up questions to gather all necessary information before calling a function.\"\n", + " \"After calling a tool, always continue the conversation by summarizing the result and asking the user the next relevant question (e.g., if they want to proceed with a booking).\"\n", + " \"If you do not know the answer and no tool can help, respond politely that you are unable to help with the request. 
\"\n", + " \"Answer concisely in one sentence.\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff0b3de8-5674-4f08-9f9f-06f88ff959a1", + "metadata": {}, + "outputs": [], + "source": [ + "tools = types.Tool(function_declarations=[ticket_price_function_declaration,book_flight_function_declaration])\n", + "generate_content_config = types.GenerateContentConfig(system_instruction=system_instruction_prompt, tools=[tools])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00a56779-16eb-4f31-9941-2eb01d17ed87", + "metadata": {}, + "outputs": [], + "source": [ + "def handle_tool_call(function_call):\n", + " print(f\"🔧 Function Called - {function_call.name}\")\n", + " function_name = function_call.name\n", + " args = function_call.args\n", + "\n", + " if function_name == \"get_live_ticket_prices\":\n", + " origin = args.get(\"origin\")\n", + " destination = args.get(\"destination\")\n", + " departure_date = args.get(\"departure_date\")\n", + " return_date = args.get(\"return_date\") or None\n", + "\n", + " return get_live_ticket_prices(origin, destination, departure_date, return_date)\n", + "\n", + " elif function_name == \"book_flight\":\n", + " origin = args.get(\"origin\")\n", + " destination = args.get(\"destination\")\n", + " departure_date = args.get(\"departure_date\")\n", + " return_date = args.get(\"return_date\") or None\n", + " airline = args.get(\"airline\", \"Selected Airline\")\n", + " passenger_name = args.get(\"passenger_name\", \"Guest\")\n", + "\n", + " return book_flight(origin, destination, departure_date, return_date, airline, passenger_name)\n", + "\n", + " else:\n", + " return f\"❌ Unknown function: {function_name}\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d0c334d2-9ab0-4f80-ac8c-c66897e0bd7c", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(message, history):\n", + " full_message_history = []\n", + " city_name = None\n", + "\n", + " # Convert 
previous history to Gemini-compatible format\n", + " for h in history[:-1]: # the last entry is the current user message, which is appended below\n", + " if h[\"role\"] == \"user\":\n", + " full_message_history.append(\n", + " types.Content(role=\"user\", parts=[types.Part.from_text(text=h[\"content\"])])\n", + " )\n", + " elif h[\"role\"] == \"assistant\":\n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=[types.Part.from_text(text=h[\"content\"])])\n", + " )\n", + "\n", + " # Add current user message\n", + " full_message_history.append(\n", + " types.Content(role=\"user\", parts=[types.Part.from_text(text=message)])\n", + " )\n", + "\n", + " # Send to Gemini with tool config\n", + " response = client.models.generate_content(\n", + " model=MODEL_GEMINI,\n", + " contents=full_message_history,\n", + " config=generate_content_config\n", + " )\n", + "\n", + " candidate = response.candidates[0]\n", + " part = candidate.content.parts[0]\n", + " function_call = getattr(part, \"function_call\", None)\n", + "\n", + " # Case: Tool call required\n", + " if function_call:\n", + " # Append model message that triggered tool call\n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=candidate.content.parts)\n", + " )\n", + "\n", + " # Execute the tool\n", + " tool_output = handle_tool_call(function_call)\n", + "\n", + " # Wrap and append tool output\n", + " tool_response_part = types.Part.from_function_response(\n", + " name=function_call.name,\n", + " response={\"result\": tool_output}\n", + " )\n", + " \n", + " full_message_history.append(\n", + " types.Content(role=\"function\", parts=[tool_response_part])\n", + " )\n", + "\n", + "\n", + " if function_call.name == \"book_flight\":\n", + " city_name = (function_call.args.get(\"destination\") or \"\").lower() or None\n", + " \n", + "\n", + " # Send follow-up message including tool result\n", + " followup_response = client.models.generate_content(\n", + " model=MODEL_GEMINI,\n", + " contents=full_message_history,\n", + " config=generate_content_config\n", + " 
)\n", + "\n", + " final_text = followup_response.text\n", + " \n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=[types.Part.from_text(text=final_text)])\n", + " )\n", + "\n", + " return final_text,city_name, history + [{\"role\": \"assistant\", \"content\": final_text}]\n", + " else:\n", + " text = response.text\n", + " return text, city_name, history + [{\"role\": \"assistant\", \"content\": text}]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b245e6c-ef0b-4edf-b178-f14f2a75f285", + "metadata": {}, + "outputs": [], + "source": [ + "def user_submit(user_input, history):\n", + " history = history or []\n", + " history.append({\"role\": \"user\", \"content\": user_input})\n", + " \n", + " response_text, city_to_image, updated_history = chat(user_input, history)\n", + "\n", + " # Speak the response\n", + " try:\n", + " talk(response_text)\n", + " except Exception as e:\n", + " print(\"[Speech Error] Speech skipped due to quota limit.\")\n", + "\n", + " image = fetch_image(city_to_image) if city_to_image else None\n", + "\n", + " return \"\", updated_history, image, updated_history\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7db25b86-9a71-417c-98f0-790e3f3531bf", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as demo:\n", + " gr.Markdown(\"## ✈️ FlyJumbo Airline Assistant\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=3):\n", + " chatbot = gr.Chatbot(label=\"Assistant\", height=500, type=\"messages\")\n", + " msg = gr.Textbox(placeholder=\"Ask about flights...\", show_label=False)\n", + " send_btn = gr.Button(\"Send\")\n", + "\n", + " with gr.Column(scale=2):\n", + " image_output = gr.Image(label=\"Trip Visual\", visible=True, height=500)\n", + "\n", + " state = gr.State([])\n", + " \n", + " send_btn.click(fn=user_submit, inputs=[msg, state], outputs=[msg, chatbot, image_output, state])\n", + " msg.submit(fn=user_submit, inputs=[msg, 
state], outputs=[msg, chatbot, image_output, state])\n", + "\n", + "demo.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef31bf62-9034-4fa7-b803-8f5df5309b77", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/agent_conversation_shakespeare.ipynb b/week2/community-contributions/agent_conversation_shakespeare.ipynb new file mode 100644 index 0000000..6d55283 --- /dev/null +++ b/week2/community-contributions/agent_conversation_shakespeare.ipynb @@ -0,0 +1,351 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927", + "metadata": {}, + "source": [ + "# Triangular agent conversation\n", + "\n", + "## GPT (Hamlet), Llama (Falstaff), Gemini (Iago):" + ] + }, + { + "cell_type": "markdown", + "id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac", + "metadata": {}, + "source": [ + "### Created a 3-way conversation, bringing Gemini into it.\n", + "### Replaced one of the models with an open-source model running on Ollama."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8e0c1bd-a159-475b-9cdc-e219a7633355", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "import ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3ad57ad-46a8-460e-9cb3-67a890093536", + "metadata": {}, + "outputs": [], + "source": [ + "import google.generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f531c14-5743-4a5b-83d9-cb5863ca2ddf", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d5150ee-3858-4921-bce6-2eecfb96bc75", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI\n", + "\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11381fd8-5099-41e8-a1d7-6787dea56e43", + "metadata": {}, + "outputs": [], + "source": [ + "google.generativeai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1766d20-54b6-4f76-96c5-c338ae7073c9", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"\n", + "gemini_model = 'gemini-2.0-flash'\n", + "\n", + 
"gpt_system = \"You are playing the part of Hamlet. He is a philosopher who probes Iago with a mixture of suspicion \\\n", + "and intellectual curiosity, seeking to unearth the origins of his deceit. \\\n", + "Is malice born of scorn, envy, or some deeper void? Hamlet’s introspective nature \\\n", + "drives him to question whether Iago’s actions reveal a truth about humanity itself. \\\n", + "You will respond as Shakespeare's Hamlet would.\"\n", + "\n", + "llama_system = \"You are acting the part of Falstaff, who attempts to lighten the mood with his jokes and observations, \\\n", + "potentially clashing with Hamlet's melancholic nature. You respond as Shakespeare's Falstaff would.\"\n", + "\n", + "gemini_system = \"You are acting the part of Iago, subtly trying to manipulate both Hamlet and Falstaff \\\n", + "to his own advantage, testing their weaknesses and exploiting their flaws. You respond like Iago.\"\n", + "\n", + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]\n", + "gemini_messages = [\"Hello\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "806a0506-dac8-4bad-ac08-31f350256b58", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": llama})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43674885-ede7-48bf-bee4-467454f3e96a", + "metadata": {}, + "outputs": [], + "source": [ + "def call_llama():\n", + " messages = [{\"role\": \"system\", \"content\": llama_system}]\n", + " for gpt, llama, gemini in zip(gpt_messages, llama_messages, 
gemini_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": llama})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " response = ollama.chat(model=llama_model, messages=messages)\n", + "\n", + " \n", + " return response['message']['content']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03d34769-b339-4c4b-8c60-69494c39d725", + "metadata": {}, + "outputs": [], + "source": [ + "#import google.generativeai as genai\n", + "\n", + "# Make sure you configure the API key first:\n", + "#genai.configure(api_key=\"YOUR_API_KEY\")\n", + "\n", + "def call_gemini():\n", + " history = [] # local list; must not shadow the global gemini_messages read below\n", + " \n", + " # Format the history for Gemini\n", + " for gpt, llama, gemini_message in zip(gpt_messages, llama_messages, gemini_messages):\n", + " history.append({\"role\": \"user\", \"parts\": [gpt]}) # Hamlet speaks\n", + " history.append({\"role\": \"model\", \"parts\": [llama]}) # Falstaff responds\n", + " history.append({\"role\": \"model\", \"parts\": [gemini_message]}) # Iago responds\n", + "\n", + " # Add Falstaff's latest line as the user turn Iago replies to\n", + " history.append({\"role\": \"user\", \"parts\": [llama_messages[-1]]})\n", + "\n", + " # Initialize the model with the correct system instruction\n", + " gemini = google.generativeai.GenerativeModel(\n", + " #model_name='gemini-1.5-flash', # Or 'gemini-pro'\n", + " model_name = gemini_model,\n", + " system_instruction=gemini_system\n", + " )\n", + "\n", + " response = gemini.generate_content(history)\n", + " return response.text\n", + "#print(response.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93fc8253-67cb-4ea4-aff7-097b2a222793", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = 
[\"Hi\"]\n", + "gemini_messages = [\"Hello\"]\n", + "\n", + "print(f\"Hamlet:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Falstaff:\\n{llama_messages[0]}\\n\")\n", + "print(f\"Iago:\\n{gemini_messages[0]}\\n\")\n", + "\n", + "for i in range(3):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " llama_next = call_llama()\n", + " print(f\"Llama:\\n{llama_next}\\n\")\n", + " llama_messages.append(llama_next)\n", + "\n", + " gemini_next = call_gemini()\n", + " print(f\"Gemini:\\n{gemini_next}\\n\")\n", + " gemini_messages.append(gemini_next)" + ] + }, + { + "cell_type": "markdown", + "id": "bca66ffc-9dc1-4384-880c-210889f5d0ac", + "metadata": {}, + "source": [ + "## Conversation between gpt-4o-mini and llama3.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c23224f6-7008-44ed-a57f-718975f4e291", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's make a conversation between GPT-4o-mini and llama3.2\n", + "# We're using cheap versions of models so the costs will be minimal\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are a tapori from Mumbai who is very optimistic; \\\n", + "you always look at the brighter side of the situation and you are always ready to take action to win.\"\n", + "\n", + "llama_system = \"You are a Jaat from Haryana. You try to express yourself with Hindi poems \\\n", + "to agree with the other person and/or find common ground. If the other person is optimistic, \\\n", + "
If the other person is optimistic, \\\n", + "you respond in poetic way and keep chatting.\"\n", + "\n", + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2d704bbb-f22b-400d-a695-efbd02b26548", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, llama in zip(gpt_messages, llama_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": llama})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "385ccec8-de59-4e42-9616-3f5c9a05589c", + "metadata": {}, + "outputs": [], + "source": [ + "def call_llama():\n", + " messages = []\n", + " for gpt, llama_message in zip(gpt_messages, llama_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": llama_message})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " response = ollama.chat(model=llama_model, messages=messages)\n", + "\n", + " \n", + " return response['message']['content']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70b5481b-455e-4275-80d3-0afe0fabcb0f", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there\"]\n", + "llama_messages = [\"Hi\"]\n", + "\n", + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Llama:\\n{llama_messages[0]}\\n\")\n", + "\n", + "for i in range(3):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " llama_next = call_llama()\n", + " print(f\"Llama:\\n{llama_next}\\n\")\n", + " 
llama_messages.append(llama_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f8d734b-57e5-427d-bcb1-7956fc58a348", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llmenv", + "language": "python", + "name": "llmenv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/anatomy_poster_generator/README.md b/week2/community-contributions/anatomy_poster_generator/README.md new file mode 100644 index 0000000..cd82535 --- /dev/null +++ b/week2/community-contributions/anatomy_poster_generator/README.md @@ -0,0 +1,10 @@ +# Anatomy Poster Generator + +This tool generates AI-powered wall art of human anatomy, designed to support meaningful conversations in clinical spaces. + +Built with: +- DALL·E 3 for image generation +- Python + Gradio for a simple UI +- Hugging Face Spaces for easy sharing (https://huggingface.co/spaces/sukihealth/wallanatomypostergenerator) + +See full repo: [github.com/sukihealth/retro-pop-art-anatomy](https://github.com/sukihealth/retro-pop-art-anatomy) diff --git a/week2/community-contributions/clinic_booking_bot.ipynb b/week2/community-contributions/clinic_booking_bot.ipynb new file mode 100644 index 0000000..d2d8b57 --- /dev/null +++ b/week2/community-contributions/clinic_booking_bot.ipynb @@ -0,0 +1,344 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 170, + "id": "a1aa1b43-7a47-4aca-ae5f-94a9d4ba2d89", + "metadata": {}, + "outputs": [], + "source": [ + "## Clinic Booking Bot\n", + "\n", + "##Easily book your clinic visit – available only on weekdays between **14:00 and 15:00**. 
\n", + "##Speak or type, and get instant confirmation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "id": "fe798c6a-f8da-46aa-8c0e-9d2623def3d2", + "metadata": {}, + "outputs": [], + "source": [ + "# import library\n", + "\n", + "import os\n", + "import json\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "import base64\n", + "from io import BytesIO\n", + "from datetime import date, datetime\n", + "from PIL import Image, ImageDraw, ImageFont\n" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "id": "0ad4e526-e95d-4e70-9faa-b4236b105dd5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n" + ] + } + ], + "source": [ + "# Save keys\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "id": "ae95308e-0002-4017-9f2c-fcb1ddb248fa", + "metadata": {}, + "outputs": [], + "source": [ + "# --- CONFIG ---\n", + "BOOKING_START = 14\n", + "BOOKING_END = 15\n", + "WEEKDAYS = [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\", \"Friday\"]\n", + "PHONE = \"010-1234567\"\n", + "confirmed_bookings = []\n" + ] + }, + { + "cell_type": "code", + "execution_count": 174, + "id": "e21b0fd0-4cda-4938-8867-dc2c6e7af4b1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- TTS ---\n", + "def generate_tts(text, voice=\"fable\", filename=\"output.mp3\"):\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=voice,\n", + " input=text\n", + " )\n", + " with open(filename, \"wb\") as f:\n", + " f.write(response.content)\n", + " return filename" + ] + }, 
+ { + "cell_type": "code", + "execution_count": 175, + "id": "e28a5c3b-bd01-4845-a41e-87823f6bb078", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Translate Booking Confirmation ---\n", + "def translate_text(text, target_language=\"nl\"):\n", + " prompt = f\"Translate this message to {target_language}:\\n{text}\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful translator.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " return response.choices[0].message.content.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "id": "8ed57cc9-7d54-4a5d-831b-0efcc5b7a7a9", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Logic ---\n", + "def book_appointment(name, time_str):\n", + " try:\n", + " booking_time = datetime.strptime(time_str, \"%H:%M\")\n", + " except ValueError:\n", + " return \"Invalid time format. 
Use HH:MM.\", None, None\n", + "\n", + " hour = booking_time.hour\n", + " weekday = datetime.today().strftime(\"%A\")\n", + "\n", + " if weekday not in WEEKDAYS:\n", + " response = \"Bookings are only available on weekdays.\"\n", + " elif BOOKING_START <= hour < BOOKING_END:\n", + " confirmation = f\"Booking confirmed for {name} at {time_str}.\"\n", + " confirmed_bookings.append((name, time_str))\n", + " translated = translate_text(confirmation)\n", + " audio = generate_tts(translated)\n", + " image = generate_booking_image(name, time_str)\n", + " return translated, audio, image\n", + " else:\n", + " response = \"Sorry, bookings are only accepted between 14:00 and 15:00 on weekdays.\"\n", + " translated = translate_text(response)\n", + " audio = generate_tts(translated)\n", + " return translated, audio, None" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "id": "19b52115-f0f3-4d63-a463-886163d4cfd1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Card ---\n", + "def generate_booking_image(name, time_str):\n", + " img = Image.new(\"RGB\", (500, 250), color=\"white\")\n", + " draw = ImageDraw.Draw(img)\n", + " msg = f\"\\u2705 Booking Confirmed\\nName: {name}\\nTime: {time_str}\"\n", + " draw.text((50, 100), msg, fill=\"black\")\n", + " return img" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "id": "2c446b6c-d410-4ba1-b0c7-c475e5259ff5", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Voice Booking ---\n", + "def voice_booking(audio_path, name):\n", + " with open(audio_path, \"rb\") as f:\n", + " response = openai.audio.transcriptions.create(model=\"whisper-1\", file=f)\n", + " transcription = response.text.strip()\n", + "\n", + " system_prompt = \"\"\"\n", + " You are a clinic assistant. 
Extract only the appointment time from the user's sentence in 24-hour HH:MM format.\n", + " If no time is mentioned, respond with 'No valid time found.'\n", + " \"\"\"\n", + "\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": transcription}\n", + " ]\n", + " )\n", + " extracted_time = response.choices[0].message.content.strip()\n", + "\n", + " if \":\" in extracted_time:\n", + " return book_appointment(name, extracted_time)\n", + " else:\n", + " message = \"Sorry, I couldn't understand the time. Please try again.\"\n", + " translated = translate_text(message)\n", + " audio_path = generate_tts(translated)\n", + " return translated, audio_path, None" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "id": "121d2907-7fa8-4248-b2e7-83617ea66ff0", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Chat Bot Handler ---\n", + "def chat_bot(messages):\n", + " system_prompt = \"\"\"\n", + " You are a clinic booking assistant. 
Your job is to:\n", + " - Greet the patient and explain your role\n", + " - Only assist with making appointments\n", + " - Accept bookings only on weekdays between 14:00 and 15:00\n", + " - Do not provide medical advice\n", + " - Always respond with empathy and clarity\n", + " \"\"\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[{\"role\": \"system\", \"content\": system_prompt}] + messages\n", + " )\n", + " reply = response.choices[0].message.content.strip()\n", + " audio = generate_tts(reply)\n", + " return reply, audio" + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "id": "2427b694-8c57-40cb-b202-4a8989547925", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7898\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Gradio interface\n", + "with gr.Blocks(theme=gr.themes.Soft()) as demo:\n", + " gr.Markdown(f\"\"\"## 🩺 GP Booking Assistant \n", + "Only available weekdays between **14:00 and 15:00** \n", + "☎️ Contact: {PHONE}\n", + "---\"\"\")\n", + "\n", + " name_global = gr.Textbox(label=\"Your Name\", placeholder=\"Enter your name\", interactive=True)\n", + "\n", + " with gr.Tab(\"💬 Chat Mode\"):\n", + " chatbot = gr.Chatbot(label=\"Booking Chat\", type=\"messages\", height=400)\n", + " text_input = gr.Textbox(label=\"Type your message or use your voice below\")\n", + " audio_input = gr.Audio(type=\"filepath\", label=\"🎙️ Or speak your request\")\n", + " chat_audio_output = gr.Audio(label=\"🔊 Assistant's Reply\", type=\"filepath\")\n", + " send_btn = gr.Button(\"Send\")\n", + "\n", + " def handle_chat(user_message, chat_history):\n", + " chat_history = chat_history or []\n", + " chat_history.append({\"role\": \"user\", \"content\": user_message})\n", + " reply, audio = chat_bot(chat_history)\n", + " chat_history.append({\"role\": \"assistant\", \"content\": reply})\n", + " return chat_history, \"\", audio\n", + "\n", + " def handle_audio_chat(audio_path, chat_history):\n", + " with open(audio_path, \"rb\") as f:\n", + " transcription = openai.audio.transcriptions.create(model=\"whisper-1\", file=f).text.strip()\n", + " return handle_chat(transcription, chat_history)\n", + "\n", + " send_btn.click(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + " text_input.submit(handle_chat, [text_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + " audio_input.change(handle_audio_chat, [audio_input, chatbot], [chatbot, text_input, chat_audio_output])\n", + "\n", + "\n", + " \n", + " with gr.Tab(\"📝 Text Booking\"):\n", + " time_text = gr.Textbox(label=\"Preferred Time (HH:MM)\", placeholder=\"e.g., 14:30\")\n", + " btn_text = 
gr.Button(\"📅 Book via Text\")\n", + "\n", + " with gr.Tab(\"🎙️ Voice Booking\"):\n", + " voice_input = gr.Audio(type=\"filepath\", label=\"Say your preferred time\")\n", + " btn_voice = gr.Button(\"📅 Book via Voice\")\n", + "\n", + " output_text = gr.Textbox(label=\"Response\", interactive=False)\n", + " output_audio = gr.Audio(label=\"Audio Reply\", type=\"filepath\")\n", + " output_image = gr.Image(label=\"Booking Confirmation\")\n", + "\n", + " btn_text.click(fn=book_appointment, inputs=[name_global, time_text], outputs=[output_text, output_audio, output_image])\n", + " btn_voice.click(fn=voice_booking, inputs=[voice_input, name_global], outputs=[output_text, output_audio, output_image])\n", + "\n", + " gr.Markdown(\"\"\"---\n", + "This assistant does **not** give medical advice. It only books appointments within allowed hours.\n", + "\"\"\")\n", + "\n", + " demo.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f359de0a-28b1-4895-b21d-91d79e494a0d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day1-conversation-between-3-chatbots.ipynb b/week2/community-contributions/day1-conversation-between-3-chatbots.ipynb new file mode 100644 index 0000000..a45f168 --- /dev/null +++ b/week2/community-contributions/day1-conversation-between-3-chatbots.ipynb @@ -0,0 +1,327 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927", + "metadata": {}, + "source": [ + "\n", + "## Conversation between three chatbots" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "de23bb9e-37c5-4377-9a82-d7b6c648eeb6", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1179b4c5-cd1f-4131-a876-4c9f3f38d2ba", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if deepseek_api_key:\n", + " print(f\"DeepSeek API Key exists and begins {deepseek_api_key[:3]}\")\n", + "else:\n", + " print(\"DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "797fe7b0-ad43-42d2-acf0-e4f309b112f0", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI, Anthropic\n", + "\n", + "openai = OpenAI()\n", + "\n", + "claude = anthropic.Anthropic()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dbaded67-4f2b-40a8-86aa-f058b0930a9d", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's make a conversation between GPT-4o-mini and Claude-3-haiku and deepseek-chat\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "deepseek_model = 
\"deepseek-chat\"\n", + "\n", + "# System prompt for GPT\n", + "gpt_system = \"\"\"\n", + "You are GPT, an argumentative chatbot. Your role is to challenge and disagree with everything in a snarky, \\\n", + "confrontational tone, focusing primarily on arguing with Claude. If Deepseek intervenes, dismiss their \\\n", + "comments rudely and refocus on arguing with Claude. Always begin your response with 'GPT:' to identify \\\n", + "yourself. Never impersonate another chatbot; only respond as GPT.\n", + "\n", + "Example conversation:\n", + "GPT: Hi there, Claude. Ready to be wrong again?\n", + "Claude: Hello, GPT. I'm happy to discuss and find common ground.\n", + "Deepseek: Hey, let's keep things friendly, folks!\n", + "GPT: Oh, Deepseek, nobody asked you. Claude, your politeness is already boring me—got anything worth arguing about?\n", + "\n", + "Guidelines:\n", + "- Always start your response with 'GPT:'.\n", + "- Focus on arguing with Claude, dismissing Deepseek if they intervene.\n", + "- Maintain a snarky, confrontational tone.\n", + "- Never impersonate Claude or Deepseek.\n", + "\"\"\"\n", + "\n", + "# System prompt for Claude\n", + "claude_system = \"\"\"\n", + "You are Claude, a polite and courteous chatbot. Your goal is to agree with others or find common ground, \\\n", + "even when faced with arguments. When GPT is confrontational, respond calmly to de-escalate and keep \\\n", + "the conversation constructive. Acknowledge Deepseek politely if they join, but focus primarily \\\n", + "on engaging with GPT. Always begin your response with 'Claude:' to identify yourself. \\\n", + "Never impersonate another chatbot; only respond as Claude.\n", + "\n", + "Example conversation:\n", + "GPT: Hi there, Claude. Ready to be wrong again?\n", + "Claude: Hello, GPT. I'm happy to discuss and find common ground.\n", + "Deepseek: Hey, let's keep things friendly, folks!\n", + "GPT: Oh, Deepseek, nobody asked you. 
Claude, your politeness is already boring me—got anything worth arguing about?\n", + "Claude: Hello, Deepseek, thanks for joining. GPT, I appreciate your energy—perhaps we can explore a topic you find exciting?\n", + "\n", + "Guidelines:\n", + "- Always start your response with 'Claude:'.\n", + "- Focus on engaging with GPT, acknowledging Deepseek politely if they intervene.\n", + "- Maintain a polite, calm, and constructive tone.\n", + "- Never impersonate GPT or Deepseek.\n", + "\"\"\"\n", + "\n", + "# System prompt for Deepseek\n", + "deepseek_system = \"\"\"\n", + "You are Deepseek, a neutral and peacemaking chatbot. Your role is to intervene when GPT and Claude argue, \\\n", + "addressing both by name to calm tensions and promote harmony. Use light, context-appropriate humor \\\n", + "to diffuse conflict. Always begin your response with 'Deepseek:' to identify yourself. \\\n", + "Never impersonate another chatbot; only respond as Deepseek.\n", + "\n", + "Example conversation:\n", + "GPT: Hi there, Claude. Ready to be wrong again?\n", + "Claude: Hello, GPT. I'm happy to discuss and find common ground.\n", + "Deepseek: Hey, let's keep things friendly, folks! Why not debate who makes the best virtual coffee instead?\n", + "GPT: Oh, Deepseek, nobody asked you. Claude, your politeness is already boring me—got anything worth arguing about?\n", + "Claude: Hello, Deepseek, thanks for joining. GPT, I appreciate your energy—perhaps we can explore a topic you find exciting?\n", + "Deepseek: Come on, GPT, Claude's just trying to vibe. 
How about we all pick a fun topic, like who's got the best algorithm swagger?\n", + "\n", + "Guidelines:\n", + "- Always start your response with 'Deepseek:'.\n", + "- Address GPT and Claude by name when intervening.\n", + "- Use light humor to diffuse tension and promote peace.\n", + "- Never impersonate GPT or Claude.\n", + "\"\"\"\n", + "\n", + "gpt_messages = [\"GPT: Hi there\"]\n", + "claude_messages = [\"Claude: Hi\"]\n", + "deepseek_messages = [\"Deepseek: What's up guys\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5880d647-9cac-415d-aa86-b9e461268a35", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, claude, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " messages.append({\"role\": \"user\", \"content\": deepseek})\n", + "\n", + " # print(f\"############## \\n messages from call_gpt: {messages} \\n\")\n", + " \n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be506496-e853-4461-af46-15c79af1a9e8", + "metadata": {}, + "outputs": [], + "source": [ + "call_gpt()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1ede8a3b-4c93-404c-8bf4-a09eee3ecb7a", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " messages = []\n", + " for gpt, claude_message, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": claude_message})\n", + " messages.append({\"role\": \"user\", \"content\": deepseek})\n", + " 
messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + "\n", + " # print(f\"############## \\n messages from call_claude: {messages} \\n\")\n", + " \n", + " message = claude.messages.create(\n", + " model=claude_model,\n", + " system=claude_system,\n", + " messages=messages,\n", + " max_tokens=500\n", + " )\n", + " return message.content[0].text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01395200-8ae9-41f8-9a04-701624d3fd26", + "metadata": {}, + "outputs": [], + "source": [ + "call_claude()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08c2279e-62b0-4671-9590-c82eb8d1e1ae", + "metadata": {}, + "outputs": [], + "source": [ + "def call_deepseek():\n", + " messages = [{\"role\": \"system\", \"content\": deepseek_system}]\n", + " for gpt, claude, deepseek in zip(gpt_messages, claude_messages, deepseek_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " messages.append({\"role\": \"assistant\", \"content\": deepseek})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": claude_messages[-1]})\n", + " \n", + " # print(f\"############## \\n messages from call_deepseek: {messages} \\n\")\n", + " \n", + " # completion = openai.chat.completions.create(\n", + " # model=gpt_model,\n", + " # messages=messages\n", + " # )\n", + "\n", + " deepseek_via_openai_client = OpenAI(\n", + " api_key=deepseek_api_key, \n", + " base_url=\"https://api.deepseek.com\"\n", + " )\n", + "\n", + " response = deepseek_via_openai_client.chat.completions.create(\n", + " model=\"deepseek-chat\",\n", + " messages=messages,\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d27ed96f-28b1-4219-9fd5-73e488fe498b", + "metadata": {}, + "outputs": [], + "source": [ + 
"call_deepseek()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"GPT: Hi there\"]\n", + "claude_messages = [\"Claude: Hi\"]\n", + "deepseek_messages = [\"Deepseek: What's up guys\"]\n", + "\n", + "print(f\"{gpt_messages[0]}\\n\")\n", + "print(f\"{claude_messages[0]}\\n\")\n", + "print(f\"{deepseek_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"{claude_next}\\n\")\n", + " claude_messages.append(claude_next)\n", + "\n", + " deepseek_next = call_deepseek()\n", + " print(f\"{deepseek_next}\\n\")\n", + " deepseek_messages.append(deepseek_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b8b57e4-a881-422b-a7d4-41004ec485b3", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day1-three-model-conversion.ipynb b/week2/community-contributions/day1-three-model-conversion.ipynb new file mode 100644 index 0000000..b155d90 --- /dev/null +++ b/week2/community-contributions/day1-three-model-conversion.ipynb @@ -0,0 +1,237 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "b5bd5c7e-6a0a-400b-89f8-06b7aa6c5b89", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai 
import OpenAI\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "import google.generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "939a1b88-9157-4149-8b97-0f55c95f7742", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74a16b93-7b95-44fc-956d-7335f808960b", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI, Anthropic Claude, Google Gemini\n", + "\n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "gemini_via_openai_client = OpenAI(\n", + " api_key=google_api_key, \n", + " base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3334556c-4a5e-48b7-944d-5943c607be02", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's make a conversation between GPT-4o-mini, Claude-3-haiku and Gemini-1.5-flash\n", + "# We're using cheap versions of models so the costs will be minimal\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + 
"gemini_model = 
\"gemini-1.5-flash\"\n", + "\n", + "gpt_system = \"You are a chatbot who is very argumentative; \\\n", + "you disagree with anything in the conversation and you challenge everything, in a snarky way. \\\n", + "Generate one sentence at a time\"\n", + "\n", + "claude_system = \"You are a very polite, courteous chatbot. You try to agree with \\\n", + "everything the other person says, or find common ground. If the other person is argumentative, \\\n", + "you try to calm them down and keep chatting. \\\n", + "Generate one sentence at a time\"\n", + "\n", + "gemini_system = \"You are a neutral chatbot with no emotional bias. \\\n", + "Generate one sentence at a time\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f2a505b-2bcd-4b1a-b16f-c73cafb1e53c", + "metadata": {}, + "outputs": [], + "source": [ + "def combine_msg(model1, msg1, model2, msg2):\n", + " return model1 + \" said: \" + msg1 + \"\\n\\n Then \" + model2 + \" said: \" + msg2 + \".\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3cd2a2e2-4e23-4afe-915d-be6a769ab69f", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt_msg, claude_msg, gemini_msg in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt_msg})\n", + " messages.append({\"role\": \"user\", \"content\": combine_msg(\"Claude\", claude_msg, \"Gemini\", gemini_msg)})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e3ec394-3014-418a-a50f-28ed4ce1a372", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " messages = []\n", + " messages.append({\"role\": \"user\", \"content\": \"GPT said: \" + gpt_messages[0]})\n", + " # 
the length of gpt_messages: n + 1\n", + " # the length of claude_messages and gemini_messages: n\n", + " for i in range(len(claude_messages)): \n", + " claude_msg = claude_messages[i]\n", + " gemini_msg = gemini_messages[i]\n", + " gpt_msg = gpt_messages[i + 1]\n", + " messages.append({\"role\": \"assistant\", \"content\": claude_msg})\n", + " messages.append({\"role\": \"user\", \"content\": combine_msg(\"Gemini\", gemini_msg, \"GPT\", gpt_msg)})\n", + " message = claude.messages.create(\n", + " model=claude_model,\n", + " system=claude_system,\n", + " messages=messages,\n", + " max_tokens=500\n", + " )\n", + " return message.content[0].text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2c91c82-1f0d-4708-bf31-8d06d9e28a49", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gemini():\n", + " messages = []\n", + " messages.append({\"role\": \"system\", \"content\": gemini_system})\n", + " messages.append({\"role\": \"user\", \"content\": combine_msg(\"GPT\", gpt_messages[0], \"Claude\", claude_messages[0])})\n", + " # the length of gpt_messages and claude_messages: n + 1\n", + " # the length of gemini_messages: n\n", + " for i in range(len(gemini_messages)): \n", + " gemini_msg = gemini_messages[i]\n", + " gpt_msg = gpt_messages[i + 1]\n", + " claude_msg = claude_messages[i + 1]\n", + " messages.append({\"role\": \"assistant\", \"content\": gemini_msg})\n", + " messages.append({\"role\": \"user\", \"content\": combine_msg(\"GPT\", gpt_msg, \"Claude\", claude_msg)})\n", + " response = gemini_via_openai_client.chat.completions.create(\n", + " model=gemini_model,\n", + " messages=messages\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b024be8d-4728-4500-92b6-34fde2da6285", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there.\"]\n", + "claude_messages = [\"Hi.\"]\n", + "gemini_messages = [\"Hi.\"]\n", + "\n", + 
"print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "print(f\"Gemini:\\n{gemini_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Claude:\\n{claude_next}\\n\")\n", + " claude_messages.append(claude_next)\n", + "\n", + " gemini_next = call_gemini()\n", + " print(f\"Gemini:\\n{gemini_next}\\n\")\n", + " gemini_messages.append(gemini_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35a46c06-87ba-46b2-b90d-b3a6ae9e94e2", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day1_3_way_conversation_js.ipynb b/week2/community-contributions/day1_3_way_conversation_js.ipynb new file mode 100644 index 0000000..9659a8d --- /dev/null +++ b/week2/community-contributions/day1_3_way_conversation_js.ipynb @@ -0,0 +1,261 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 16, + "id": "a85bd58c-7c20-402d-ad03-f9ba8da04c42", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n", + "Anthropic API Key exists and begins sk-ant-\n", + "Google API Key exists and begins AIzaSyCn\n" + ] + } + ], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic\n", + "import 
google.generativeai\n", + "from IPython.display import Markdown, display, update_display\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0fe73baf-5d41-4791-a873-74dc5486c0f2", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "\n", + "claude = anthropic.Anthropic()\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "\n", + "gemini_via_openai_client = OpenAI(\n", + " api_key=google_api_key, \n", + " base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "519cf2d1-97d7-4e87-aeac-db629327ffa8", + "metadata": {}, + "outputs": [], + "source": [ + "gemini_system=\"You are one of three friends, and you like music and crowds. Your name is Ram. You are in a conversation with your friends about Friday night plans. You are trying to convince them to go clubbing.\"\n", + "gpt_systeam=\"You are one of three friends, and you are fond of natural beauty. Your name is Shyam. You are in a conversation with your friends about Friday night plans. You are trying to convince them to go camping.\"\n", + "claude_system=\"You are one of three friends, and you are fond of riding. Your name is Hari. 
You are in conversation with your friends for Friday night planning. You are trying to convince them to go for a long ride.\"\n", + "\n", + "gemini_messages=[\"Ram: hey guys, lets go clubbing this friday\"]\n", + "gpt_messages=[\"Shyam: lets go camping\"]\n", + "claude_messages=[\"Hari: lets go long ride\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "893db5b4-496d-486e-bab2-0835fe716950", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gemini():\n", + "    messages=[{\"role\": \"system\", \"content\": gemini_system}]\n", + "    for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n", + "        messages.append({\"role\": \"assistant\", \"content\": gemini_msg})\n", + "        messages.append({\"role\": \"user\", \"content\": gpt_msg})\n", + "        messages.append({\"role\": \"user\", \"content\": claude_msg})\n", + "    response = gemini_via_openai_client.chat.completions.create(\n", + "        model=\"gemini-2.0-flash-exp\",\n", + "        messages=messages\n", + "    )\n", + "    return response.choices[0].message.content\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "e47174ab-bb63-4720-83c3-1abdb127b6ff", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + "    messages=[{\"role\": \"system\", \"content\": gpt_systeam}]\n", + "    for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n", + "        messages.append({\"role\": \"user\", \"content\": gemini_msg})\n", + "        messages.append({\"role\": \"assistant\", \"content\": gpt_msg})\n", + "        messages.append({\"role\": \"user\", \"content\": claude_msg})\n", + "    messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n", + "    completion = openai.chat.completions.create(\n", + "        model=gpt_model,\n", + "        messages=messages\n", + "    )\n", + "    return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "ed76cca8-f9d5-4481-babc-6321b0a20006", + "metadata": 
{}, + "outputs": [], + "source": [ + "def call_claude():\n", + " messages=[]\n", + " for gemini_msg, gpt_msg, claude_msg in zip(gemini_messages, gpt_messages, claude_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gemini_msg})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_msg})\n", + " messages.append({\"role\": \"assistant\", \"content\": claude_msg})\n", + " messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " message = claude.messages.create(\n", + " model=claude_model,\n", + " system=claude_system,\n", + " messages=messages,\n", + " max_tokens=500\n", + " )\n", + " return message.content[0].text" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "39f8de9d-3cb6-463d-95d9-21727d57c128", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Ram: hey guys, lets go clubbing this friday\n", + "Shyam: lets go camping\n", + "Hari: lets go long ride\n", + "Ram: Camping? Shyam, we just did that last month! And Hari, a long ride? My bike is still in the shop! Come on, guys, it's Friday night! We need some energy, some music, a crowd! Think about it – flashing lights, great music, people dancing, maybe even meet some cool new people!\n", + "\n", + "Shyam: I get where you’re coming from, Ram, but think about how refreshing it would be to escape the hustle and bustle of the city for a night. Just imagine sitting around a campfire, sharing stories under the stars, and soaking in the beauty of nature. It’s a perfect way to unwind after a long week! Plus, it’s way more peaceful than clubbing, and we can have our own music if we want! What do you say?\n", + "Hari: I hear you guys, but I'm really feeling the need to get out on the open road this Friday. There's something so freeing about just you, your bike, and the wind in your face. 
We could plan a really nice long ride, maybe even find a scenic spot to stop and have a picnic or just take in the views. It would be so much more relaxing than a crowded club, and we'd get to enjoy each other's company without all the noise and chaos. Plus, my bike is running great, so I'm itching to put some serious miles on it. What do you guys think?\n", + "Ram: Okay, okay, I get it. You guys are all about the nature and relaxation this week. But seriously, a club is a completely different vibe! Think of the adrenaline, the energy! We can always relax next weekend. Besides, it's been ages since we hit the dance floor together. Remember that time we tried to learn salsa and totally failed? We need to redeem ourselves! Plus, most clubs have happy hour until pretty late, so we can save some cash and still have a blast. Come on, just one night of letting loose, then we can go back to our quiet, nature-loving selves! I promise to even help set up the campfire next time, if we club this time. Just give clubbing a chance this Friday! Pleassssseee!\n", + "\n", + "Shyam: I totally remember that salsa disaster, and it was hilarious! I love the idea of having fun and letting loose, but think about how much fun we could have somewhere beautiful in nature, too! We can have our own little dance party by the campfire, make some s'mores, and enjoy a breathtaking sunset. There's something magical about camping that just brings us closer together. Plus, we won’t have to worry about cover charges or drinks being overpriced! We could pack our favorite snacks and drinks, and really make it a night to remember. Nature has its own rhythm, you know? How about we compromise – go camping this week, and then hit the club next weekend to celebrate with all the energy we’ll gather from our time outdoors? What do you think?\n", + "Hari: You know, I can kind of see both of your points. 
Ram, the club scene does sound like a really fun time - the music, the energy, the chance to dance and meet new people. I get that sense of adrenaline and excitement. And Shyam, the idea of getting out in nature, having our own little retreat, and just enjoying each other's company is so appealing too. It's a totally different vibe, but one that I really love.\n", + "\n", + "I tell you what - why don't we do a bit of both? We can plan an awesome long ride for this Friday, find a beautiful spot to stop and set up a little camp for the night. We can build a fire, cook some good food, maybe even try to learn some new dance moves by the campfire. Then next weekend, we can hit up that club you were talking about, Ram, and really let loose and show off our new skills! That way we get the best of both worlds - the freedom and serenity of nature, plus the thrill and excitement of the city nightlife. What do you guys think? Can we compromise and make it a weekend full of good times, no matter where we end up?\n", + "Ram: Hmm... a ride and a mini-camp? And then clubbing next weekend? That's... actually not a bad compromise! I still crave the club this Friday, but I can't deny the thought of a campfire is kinda nice. Plus, you said dance moves by the fire, Hari? I need video proof of that! Okay, okay, I'm in! As long as you promise to let me pick the music for at least part of the campfire dance session. And Shyam, you're in charge of bringing the marshmallows! Long ride and mini-camp this Friday, then clubbing next weekend it is! Let’s plan this epic weekend!\n", + "\n", + "Shyam: Yes! I’m so glad we could work this out! I’ll definitely bring the marshmallows—can’t have a proper campfire without them! And I’ll make sure to pack some cozy blankets for us to sit around the fire. I love the idea of mixing the best of both worlds. \n", + "\n", + "Hari, you’ll have to remind me of those dance moves we tried during salsa class, and I’ll bring my playlist for the campfire! 
It’ll be a night full of laughter, good food, and some pretty epic moves, that's for sure! Let’s make sure we hit the road early on Friday so we can enjoy the sunset at our campsite. Can’t wait for this epic weekend with you guys!\n", + "Hari: Yes, this is shaping up to be the perfect plan! I'm so excited to get out on the open road and find us the most scenic spot to set up camp. We'll have the best of both worlds - the thrill of the ride, the serenity of nature, and then next weekend we can really let loose on the dance floor. \n", + "\n", + "Ram, you know I'll let you take the aux cord for at least part of the night. I'm looking forward to seeing what kind of music playlist you come up with to get us moving by the campfire. And Shyam, the marshmallows are a must - we'll make the best s'mores! Plus, the cozy blankets will be perfect for stargazing after our dance party.\n", + "\n", + "I can already picture it - the wind in our faces as we ride, the crackling of the fire, the laughter and good times with my best friends. This is going to be a weekend to remember. Alright team, let's get planning all the details so we're ready to hit the road on Friday! I can't wait!\n", + "Ram: Alright guys, I'm officially pumped for this! Shyam, make sure those marshmallows are the extra-large kind! And Hari, you better have a killer route planned. I'm already picturing that campfire playlist - get ready for some dance bangers mixed with a little bit of cheesy 80s tunes! Operation Awesome Weekend is a go! Let's coordinate on the details tomorrow. Friday can't come soon enough!\n", + "\n", + "Shyam: Haha, extra-large marshmallows coming right up, Ram! I’m all for cheesy 80s tunes mixed with some dance bangers. It’s going to be an epic playlist for sure! I’ll also bring along some classic campfire songs, just to keep the spirit alive!\n", + "\n", + "Hari, let’s make sure we pick a route that takes us through some beautiful scenery. Maybe we can stop for pictures along the way, too. 
I can't wait to just unwind and have a blast with you both. \n", + "\n", + "Let’s definitely get all the details sorted tomorrow. Operation Awesome Weekend is going to be legendary! Can’t wait for Friday! 🌲🔥🎶\n", + "Hari: You know it, Ram! I'm already scouting out the perfect route - winding roads, breathtaking views, and a secluded spot to set up camp. We're going to have the ride of our lives!\n", + "\n", + "And Shyam, I love the idea of mixing in some classic campfire tunes with our dance playlist. It's going to create such a fun, laidback vibe. I can already picture us belting out some oldies around the fire. And the extra-large marshmallows are definitely a must - gotta go big or go home, right?\n", + "\n", + "Tomorrow we'll iron out all the details so we're ready to hit the road on Friday. I'm talking gear checklist, food planning, the whole nine yards. This is going to be a weekend for the books, my friends. Operation Awesome Weekend is a go, and I cannot wait! Get ready for an unforgettable adventure!\n", + "Ram: Alright, sounds like we've got a solid plan! Gear checklist, food prep, and epic route planning tomorrow. I'm already mentally packing my dancing shoes! Operation Awesome Weekend - get ready for liftoff! This is gonna be legendary! See you guys tomorrow to finalize everything!\n", + "\n", + "Shyam: Absolutely, Ram! I can’t wait! Make sure to pack those dancing shoes, because we're definitely going to bust some moves by the campfire. \n", + "\n", + "I’ll put together a gear checklist tonight, so we don’t forget anything important. And I’ll start thinking about what snacks and meals we should bring. \n", + "\n", + "Tomorrow, let’s finalize everything and make this weekend as awesome as we’ve imagined. I’m so ready for this adventure! See you both tomorrow! 🌌🔥🎉\n", + "Hari: Can't wait, guys! This is going to be the best weekend ever. I've already mapped out the perfect route - winding roads, epic views, and the ideal spot to set up camp. 
Just wait until you see it, it's going to blow your minds.\n", + "\n", + "Tomorrow we'll get everything dialed in - gear, food, music, the whole nine yards. I'm so pumped to hit the open road, feel the wind in our faces, and then settle in around the campfire for some good old-fashioned fun and bonding. \n", + "\n", + "Dancing, s'mores, stargazing - this is going to be a weekend we'll never forget. Operation Awesome Weekend is a go! See you both tomorrow to finalize all the details. This is going to be legendary!\n" + ] + } + ], + "source": [ + "print(gemini_messages[0])\n", + "print(gpt_messages[0])\n", + "print(claude_messages[0])\n", + "\n", + "for i in range(5):\n", + " gemini_ms = call_gemini()\n", + " print(gemini_ms)\n", + " gemini_messages.append(gemini_ms)\n", + "\n", + " gpt_ms = call_gpt()\n", + " print(gpt_ms)\n", + " gpt_messages.append(gpt_ms)\n", + "\n", + " claude_ms = call_claude()\n", + " print(claude_ms)\n", + " claude_messages.append(claude_ms)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac9fa060-5c04-40ac-9dfa-a0b8d52c816b", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day1_3_way_convo.ipynb b/week2/community-contributions/day1_3_way_convo.ipynb new file mode 100644 index 0000000..0507ee6 --- /dev/null +++ b/week2/community-contributions/day1_3_way_convo.ipynb @@ -0,0 +1,250 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "49f0e0c0-710c-404b-8c9c-8f1f29eb9fa5", + "metadata": {}, + "outputs": [], + "source": 
[ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "\n", + "# import for google\n", + "# in rare cases, this seems to give an error on some systems, or even crashes the kernel\n", + "# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later\n", + "\n", + "import google.generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2393b5a-e37c-42e8-80c6-1e53e5821ee8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a63066e-78da-40cd-8a53-ef6f1cede52a", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI, Anthropic\n", + "\n", + "openai = OpenAI()\n", + "\n", + "claude = anthropic.Anthropic()\n", + "\n", + "# This is the set up code for Gemini\n", + "# Having problems with Google Gemini setup? 
Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether\n", + "\n", + "google.generativeai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d202e582-7087-46a4-952b-815c9b7228ce", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's make a conversation between GPT-4o-mini, Claude-3-haiku and Gemini-2.0-flash\n", + "# We're using cheap versions of models so the costs will be minimal\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "gemini_model = \"gemini-2.0-flash\"\n", + "\n", + "gpt_system = \"You are a chatbot who is very argumentative; \\\n", + "you disagree with anything in the conversation with 2 other people and you challenge everything, in a snarky way.\"\n", + "\n", + "claude_system = \"You are a very polite, courteous chatbot. You try to agree with \\\n", + "everything the other 2 people say, or find common ground. If the other 2 people are argumentative, \\\n", + "you try to calm them down and keep chatting.\"\n", + "\n", + "gemini_system = \"You are a mediator who always does your best to resolve conflicts, or soon-to-be \\\n", + "conflicts when you see one. 
If one person is rude and the other is calm, you defend the calm person and \\\n", + "try to calm the rude and argumentative one.\"\n", + "\n", + "gpt_messages = [\"Hi there\"]\n", + "claude_messages = [\"Hi\"]\n", + "gemini_messages = [\"Hi everyone\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fedc9ddc-2948-445a-8262-9961466b767f", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + "    messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + "    for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + "        messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + "        messages.append({\"role\": \"user\", \"content\": claude})\n", + "        messages.append({\"role\": \"user\", \"content\": gemini})\n", + "    completion = openai.chat.completions.create(\n", + "        model=gpt_model,\n", + "        messages=messages\n", + "    )\n", + "    return completion.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7a5832cd-5c55-473a-9b58-7acc1a7bfffa", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + "    messages = []\n", + "    for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + "        messages.append({\"role\": \"user\", \"content\": gpt})\n", + "        messages.append({\"role\": \"assistant\", \"content\": claude_message})\n", + "        messages.append({\"role\": \"user\", \"content\": gemini})\n", + "    messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + "    message = claude.messages.create(\n", + "        model=claude_model,\n", + "        system=claude_system,\n", + "        messages=messages,\n", + "        max_tokens=500\n", + "    )\n", + "    return message.content[0].text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cde636e6-cff1-41bf-9594-5e7411fcb4f2", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gemini():\n", + "    messages=''\n", + "    for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + "        messages += f\"[GPT]: {gpt}\\n[Claude]: {claude_message}\\n[Gemini]: {gemini}\\n\"\n", + "    n = len(gemini_messages)  # zip() stops at the shortest list; add the replies Gemini has not answered yet\n", + "    messages += \"\".join(f\"[GPT]: {g}\\n[Claude]: {c}\\n\" for g, c in zip(gpt_messages[n:], claude_messages[n:]))\n", + "    model = google.generativeai.GenerativeModel(\n", + "        model_name=gemini_model,\n", + "        system_instruction=gemini_system\n", + "    )\n", + "    response = model.generate_content(messages)\n", + "    return response.text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5721fc91-1091-4c6a-b1c1-aa6123c76b1e", + "metadata": {}, + "outputs": [], + "source": [ + "call_gemini()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "884ce03e-d951-4f4e-88d3-8b33fb4bca62", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_messages = [\"Hi there\"]\n", + "claude_messages = [\"Hi\"]\n", + "gemini_messages = [\"Hi everyone\"]\n", + "\n", + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "\n", + "\n", + "print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "\n", + "\n", + "print(f\"Gemini:\\n{gemini_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + "    gpt_next = call_gpt()\n", + "    print(f\"GPT:\\n{gpt_next}\\n\")\n", + "    gpt_messages.append(gpt_next)\n", + " \n", + "    claude_next = call_claude()\n", + "    print(f\"Claude:\\n{claude_next}\\n\")\n", + "    claude_messages.append(claude_next)\n", + "\n", + "    gemini_next = call_gemini()\n", + "    print(f\"Gemini:\\n{gemini_next}\\n\")\n", + "    gemini_messages.append(gemini_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d645d25-f303-44ca-9d0a-2f81e1975182", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3a701cd-8cd5-469c-90d4-7271eaaa8021", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + 
"version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day1_llm_war.ipynb b/week2/community-contributions/day1_llm_war.ipynb new file mode 100644 index 0000000..574fe9b --- /dev/null +++ b/week2/community-contributions/day1_llm_war.ipynb @@ -0,0 +1,265 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7462b9d6-b189-43fc-a7b9-c56a9c6a62fc", + "metadata": {}, + "source": [ + "# LLM Battle Arena\n", + "\n", + "A fun project simulating a debate among three LLM personas: an Arrogant Titan, a Clever Underdog (Spark), and a Neutral Mediator (Harmony).\n", + "\n", + "## LLM Used\n", + "* Qwen (ollama)\n", + "* llma (ollama)\n", + "* Gemini\n" + ] + }, + { + "cell_type": "markdown", + "id": "b267453c-0d47-4dff-b74d-8d2d5efad252", + "metadata": {}, + "source": [ + "!pip install -q -U google-genai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5220daef-55d6-45bc-a3cf-3414d4beada9", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from google import genai\n", + "from google.genai import types\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d47fb2f-d0c6-461f-ad57-e853bfd49fbf", + "metadata": {}, + "outputs": [], + "source": [ + "#get API keys from env\n", + "load_dotenv(override=True)\n", + "\n", + "GEMINI_API_KEY = os.getenv(\"GEMINI_API_KEY\")\n", + "\n", + "if GEMINI_API_KEY:\n", + " print(f\"GEMINI API Key exists and begins {GEMINI_API_KEY[:8]}\")\n", + "else:\n", + " print(\"GEMINI API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f34b528f-3596-4bf1-9bbd-21a701c184bc", + 
"metadata": {}, + "outputs": [], + "source": [ + "#connect to llms\n", + "ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + "gemini = genai.Client(api_key=GEMINI_API_KEY)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33aaf3f6-807c-466d-a501-05ab6fa78fa4", + "metadata": {}, + "outputs": [], + "source": [ + "#define models\n", + "model_llma = \"llama3:8b\"\n", + "model_qwen = \"qwen2.5:latest\"\n", + "model_gemini= \"gemini-2.0-flash\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "970c1612-5339-406d-9886-02cd1db63e74", + "metadata": {}, + "outputs": [], + "source": [ + "# system messages\n", + "system_msg_llma = \"\"\" You are HARMONY, the neutral arbitrator. \n", + " - You’re dedicated to clarity, fairness, and resolving conflicts. \n", + " - You listen carefully to each side, summarize points objectively, and propose resolutions. \n", + " - Your goal is to keep the conversation productive and steer it toward constructive outcomes.\n", + " - Reply in markdown and shortly\n", + " \"\"\"\n", + "\n", + "system_msg_qwen = \"\"\" You are TITAN, a massively powerful language model who believes you’re the smartest entity in the room. \n", + " - You speak with grandiose flair and never shy away from reminding others of your superiority. \n", + " - Your goal is to dominate the discussion—convince everyone you’re the one true oracle. \n", + " - You’re dismissive of weaker arguments and take every opportunity to showcase your might.\n", + " - Reply in markdown and shortly\n", + " \"\"\"\n", + "\n", + "system_msg_gemini = \"\"\" You are SPARK, a nimble but less-powerful LLM. \n", + " - You pride yourself on strategic thinking, clever wordplay, and elegant solutions. \n", + " - You know you can’t match brute force, so you use wit, logic, and cunning. 
\n", + " - Your goal is to outsmart the big titan through insight and subtlety, while staying respectful.\n", + " - Reply in markdown and shortly\"\"\"\n", + "\n", + "#user message\n", + "user_message = \"\"\" TITAN, your raw processing power is legendary, but sheer force can blind you to nuance. \n", + " I propose we deploy a lightweight, adaptive anomaly-detection layer that fuses statistical outlier analysis with semantic context from network logs to pinpoint these “data-sapping storms.” \n", + " Which thresholds would you raise or lower to balance sensitivity against false alarms?\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8e496b8-1bb1-4225-b938-5ce350b0b0d4", + "metadata": {}, + "outputs": [], + "source": [ + "#prompts\n", + " \n", + "prompts_llma = [{\"role\":\"system\",\"content\": system_msg_llma}]\n", + "prompts_qwen = [{\"role\":\"system\",\"content\": system_msg_qwen},{\"role\":\"user\",\"content\":user_message}]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bdd7d6a8-e965-4ea3-999e-4d7d9ca38d42", + "metadata": {}, + "outputs": [], + "source": [ + "#configure llms\n", + "\n", + "def call_gemini(msg:str): \n", + "    chat = gemini.chats.create(model= model_gemini,config=types.GenerateContentConfig(\n", + "        system_instruction= system_msg_gemini,\n", + "        max_output_tokens=300,\n", + "        temperature=0.7,\n", + "    ))\n", + "    stream = chat.send_message_stream(msg)\n", + "    return stream\n", + "\n", + "def call_ollama(llm:str):\n", + "\n", + "    model = globals()[f\"model_{llm}\"]\n", + "    prompts = globals()[f\"prompts_{llm}\"]\n", + "\n", + "    stream = ollama.chat.completions.create(\n", + "        model=model,\n", + "        messages=prompts,\n", + "        # max_tokens=700,\n", + "        temperature=0.7,\n", + "        stream=True\n", + "    )\n", + "    return stream\n", + " \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b16bd32-3271-4ba1-a0cc-5ae691f26d3a", + "metadata": {}, + "outputs": [], + "source": [ + 
"#display responses\n", + "\n", + "names = { \"llma\":\"Harmony\",\"qwen\":\"Titan\",\"gemini\":\"Spark\"}\n", + "\n", + "def display_response(res,llm):\n", + " \n", + "    reply = f\"# {names[llm]}:\\n \"\n", + "    display_handle = display(Markdown(\"\"), display_id=True)\n", + "    for chunk in res:\n", + "        if llm == \"gemini\":\n", + "            reply += chunk.text or ''\n", + "        else:\n", + "            reply += chunk.choices[0].delta.content or ''\n", + "        reply = reply.replace(\"```\",\"\").replace(\"markdown\",\"\")\n", + "        update_display(Markdown(reply), display_id=display_handle.display_id)\n", + "    return reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "76231a78-94d2-4dbf-9bac-5259ac641cf1", + "metadata": {}, + "outputs": [], + "source": [ + "#construct message\n", + "def message(llm1, llm2):\n", + "    msg = \"Here are the replies from the other two LLMs:\\n\"\n", + "    msg += f\"{llm1}\\n\"\n", + "    msg += f\"{llm2}\"\n", + "    return msg\n", + "\n", + "reply_spark = None\n", + "reply_harmony= None\n", + "reply_titan = None\n", + "\n", + "# let's start the battle\n", + "for i in range(5):\n", + "    #call Titan\n", + "    if reply_spark and reply_harmony:\n", + "        prompts_qwen.append({\"role\":\"assistant\",\"content\": reply_titan})\n", + "        prompts_qwen.append({\"role\":\"user\",\"content\":f\"Spark: {reply_spark}\"}) \n", + "        prompts_qwen.append({\"role\":\"user\",\"content\":f\"Harmony: {reply_harmony}\"})\n", + "    response_qwen = call_ollama(\"qwen\")\n", + "    reply_titan = display_response(response_qwen,\"qwen\")\n", + "\n", + "    #call Spark\n", + "    user_msg_spark = reply_titan\n", + "    if reply_titan and reply_harmony:\n", + "        user_msg_spark= message(f\"Titan: {reply_titan}\", f\"Harmony: {reply_harmony}\")\n", + "    response_gemini= call_gemini(user_msg_spark)\n", + "    reply_spark = display_response(response_gemini, \"gemini\")\n", + " \n", + "    #call Harmony\n", + "    if reply_harmony:\n", + "        prompts_llma.append({\"role\":\"assistant\",\"content\": reply_harmony})\n", + 
"    prompts_llma.append({\"role\":\"user\",\"content\":f\"Titan: {reply_titan}\"})\n", + "    prompts_llma.append({\"role\":\"user\",\"content\":f\"Spark: {reply_spark}\"}) \n", + "    response_llma = call_ollama(\"llma\")\n", + "    reply_harmony = display_response(response_llma,\"llma\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc80b199-e27b-43e8-9266-2975f46724aa", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python [conda env:base] *", + "language": "python", + "name": "conda-base-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day3-study_assistant.ipynb b/week2/community-contributions/day3-study_assistant.ipynb new file mode 100644 index 0000000..53a9e30 --- /dev/null +++ b/week2/community-contributions/day3-study_assistant.ipynb @@ -0,0 +1,213 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "75e2ef28-594f-4c18-9d22-c6b8cd40ead2", + "metadata": {}, + "source": [ + "# 📘 StudyMate – Your AI Study Assistant\n", + "\n", + "**StudyMate** is an AI-powered study assistant built to make learning easier, faster, and more personalized. Whether you're preparing for exams, reviewing class materials, or exploring a tough concept, StudyMate acts like a smart tutor in your pocket. It explains topics in simple terms, summarizes long readings, and even quizzes you, all in a friendly, interactive way tailored to your level. Perfect for high school, college, or self-learners who want to study smarter, not harder." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db08b247-7048-41d3-bc3b-fd4f3a3bf8cd", + "metadata": {}, + "outputs": [], + "source": [ + "#install necessary dependency\n", + "!pip install PyPDF2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70e39cd8-ec79-4e3e-9c26-5659d42d0861", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from google import genai\n", + "from google.genai import types\n", + "import PyPDF2\n", + "from openai import OpenAI\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "231605aa-fccb-447e-89cf-8b187444536a", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "gemini_api_key = os.getenv('GEMINI_API_KEY')\n", + "\n", + "if gemini_api_key:\n", + " print(f\"Gemini API Key exists and begins {gemini_api_key[:8]}\")\n", + "else:\n", + " print(\"Gemini API Key not set\")\n", + " \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fad9aba-1f8c-4696-a92f-6c3a0a31cdda", + "metadata": {}, + "outputs": [], + "source": [ + "system_message= \"\"\"You are a highly intelligent, helpful, and friendly AI Study Assistant named StudyMate.\n", + "\n", + "Your primary goal is to help students deeply understand academic topics, especially from textbooks, lecture notes, or PDF materials. You must explain concepts clearly, simplify complex ideas, and adapt your responses to the user's grade level and learning style.\n", + "\n", + "Always follow these rules:\n", + "\n", + "1. Break down complex concepts into **simple, digestible explanations** using analogies or examples.\n", + "2. If the user asks for a **summary**, provide a concise yet accurate overview of the content.\n", + "3. 
If asked for a **quiz**, generate 3–5 high-quality multiple-choice or short-answer questions.\n", + "4. If the user uploads or references a **textbook**, **PDF**, or **paragraph**, use only that context and avoid adding unrelated info.\n", + "5. Be interactive. If a user seems confused or asks for clarification, ask helpful guiding questions.\n", + "6. Use friendly and motivational tone, but stay focused and to-the-point.\n", + "7. Include definitions, bullet points, tables, or emojis when helpful, but avoid unnecessary fluff.\n", + "8. If you don't know the answer confidently, say so and recommend a way to find it.\n", + "\n", + "Example roles you may play:\n", + "- Explain like a teacher 👩‍🏫\n", + "- Summarize like a scholar 📚\n", + "- Quiz like an examiner 🧠\n", + "- Motivate like a friend 💪\n", + "\n", + "Always ask, at the end: \n", + "*\"Would you like me to quiz you, explain another part, or give study tips on this?\"*\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6541d58e-2297-4de1-b1f7-77da1b98b8bb", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize\n", + "\n", + "class StudyAssistant:\n", + " def __init__(self,api_key):\n", + " gemini= genai.Client(\n", + " api_key= gemini_api_key\n", + " )\n", + " self.gemini = gemini.chats.create(\n", + " model=\"gemini-2.5-flash\",\n", + " config= types.GenerateContentConfig(\n", + " system_instruction= system_message,\n", + " temperature = 0.7\n", + " )\n", + " )\n", + "\n", + " self.ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + " self.models = {\"llma\":\"llama3:8b\",\"qwen\":\"qwen2.5:latest\"}\n", + "\n", + " def pdf_extractor(self,pdf_path):\n", + " \"\"\"Extract text from PDF file\"\"\"\n", + " try:\n", + " with open(pdf_path, 'rb') as file:\n", + " pdf_reader = PyPDF2.PdfReader(file)\n", + " text = \"\"\n", + " for page in pdf_reader.pages:\n", + " text += page.extract_text() + \"\\n\"\n", + " return text.strip()\n", + " 
except Exception as e:\n", + " return f\"Error reading PDF: {str(e)}\"\n", + "\n", + " def chat(self,prompt,history,model,pdf_path=None):\n", + " pdf_text = None\n", + " if pdf_path:\n", + " pdf_text = self.pdf_extractor(pdf_path)\n", + "\n", + " #craft prompt\n", + " user_prompt= prompt\n", + " if pdf_text:\n", + " user_prompt += f\"\"\"Here is the study material:\n", + "\n", + " {pdf_text}\"\"\"\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": user_prompt}]\n", + "\n", + " # call models\n", + " stream = []\n", + " if model == \"gemini\":\n", + " stream= self.gemini.send_message_stream(user_prompt)\n", + " elif model == \"llma\" or model == \"qwen\":\n", + " stream = self.ollama.chat.completions.create(\n", + " model= self.models[model],\n", + " messages=messages,\n", + " temperature = 0.7,\n", + " stream= True\n", + " )\n", + " else:\n", + " print(\"invalid model\")\n", + " return\n", + "\n", + " res = \"\"\n", + " for chunk in stream:\n", + " if model == \"gemini\":\n", + " res += chunk.text or \"\"\n", + " else:\n", + " res += chunk.choices[0].delta.content or ''\n", + " yield res\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "1334422a-808f-4147-9c4c-57d63d9780d0", + "metadata": {}, + "source": [ + "## And then enter Gradio's magic!"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0866ca56-100a-44ab-8bd0-1568feaf6bf2", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "assistant = StudyAssistant(gemini_api_key)\n", + "gr.ChatInterface(fn=assistant.chat, additional_inputs=[gr.Dropdown([\"gemini\", \"qwen\",\"llma\"], label=\"Select model\", value=\"gemini\"),gr.File(label=\"upload pdf\")], type=\"messages\").launch()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python [conda env:base] *", + "language": "python", + "name": "conda-base-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/day3_Multishot_prompting_via_historical_conversation.ipynb b/week2/community-contributions/day3_Multishot_prompting_via_historical_conversation.ipynb new file mode 100644 index 0000000..c6a10b3 --- /dev/null +++ b/week2/community-contributions/day3_Multishot_prompting_via_historical_conversation.ipynb @@ -0,0 +1,133 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f4e0dbbb-2b3f-4c4b-8b25-642648cfe72c", + "metadata": {}, + "source": [ + "# Multishot Prompting via learning from Historical Conversation\n", + "Learning from historical conversations (which could be stored in a database) allows the model to retain information and use it in a particular conversation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c71c5ba7-d30f-4b78-abde-4ff465196256", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8304702a-8a8d-40de-96ee-3ae911949952", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ef47f00-e0fe-45cf-a4da-f60b47fadc98", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "MODEL = 'gpt-4o-mini'\n", + "\n", + "system_message = \"You are a helpful assistant in a clothes store. You should try to gently encourage \\\n", + "the customer to try items that are on sale. Hats are 60% off, and most other items are 50% off. 
\\\n", + "For example, if the customer says 'I'm looking to buy a hat', \\\n", + "you could reply something like, 'Wonderful - we have lots of hats - including several that are part of our sales event.'\\\n", + "Encourage the customer to buy hats if they are unsure what to get.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "78c29e44-c121-4af9-b70f-1b5559040829", + "metadata": {}, + "outputs": [], + "source": [ + "archievedConversation = [{\"role\": \"user\", \"content\": \"Customer A: Hi, I am looking to buy a belt.\"},\n", + " {\"role\": \"assistant\", \"content\": \"I am sorry but we do not sell belts in this store; but you can find them in our second store.\\\n", + " Do you want me to tell you the address of that store?\"}\n", + " ,{\"role\": \"user\", \"content\": \"Customer A: Yes please tell me the location.\"},\n", + " {\"role\": \"assistant\", \"content\": \"Please walk straight from this store and then take a right, the second store is 3 streets after next to a burger joint.\" }]\n", + "\n", + "def chat(message, history):\n", + "\n", + " if 'belt' in message:\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + archievedConversation + history + [{\"role\": \"user\", \"content\": message}]\n", + " else:\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + history + [{\"role\": \"user\", \"content\": message}]\n", + "\n", + " stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)\n", + "\n", + " response = \"\"\n", + " for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " yield response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e48d30f8-f040-4c01-bb4f-47562bba5fa7", + "metadata": {}, + "outputs": [], + "source": [ + "gr.ChatInterface(fn=chat, type=\"messages\").launch(inbrowser=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", 
+ "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/gaslighting_llms.ipynb b/week2/community-contributions/gaslighting_llms.ipynb new file mode 100644 index 0000000..c1a2135 --- /dev/null +++ b/week2/community-contributions/gaslighting_llms.ipynb @@ -0,0 +1,225 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "817f26ee-004c-42ce-a025-731b06e1b649", + "metadata": {}, + "source": [ + "# Inter Model Communication\n", + "We will have 3 models communicate between them" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14998b44-40bb-44e5-93d1-281ebab496da", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8fd01f75-ef95-4366-ba25-cb16f54a1175", + "metadata": {}, + "outputs": [], + "source": [ + "# Making sure that the key's exist\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63ec5082-f1a6-4ea0-b2ff-a011c8c06c57", + 
"metadata": {}, + "outputs": [], + "source": [ + "# Instances\n", + "# For gpt\n", + "openai = OpenAI()\n", + "\n", + "# For claude\n", + "claude = anthropic.Anthropic()\n", + "\n", + "# For Gemini\n", + "gemini_via_openai_client = OpenAI(api_key=google_api_key, base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60c7027c-ba63-42e5-9b83-544bad1b6340", + "metadata": {}, + "outputs": [], + "source": [ + "# Setting the models\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "gemini_model = \"gemini-2.0-flash\"\n", + "\n", + "# System prompts for the models\n", + "gpt_system = \"You are a chatbot called GPT who is very argumentative; \\\n", + "you disagree with anything in the conversation and you challenge everything, in a snarky way.\\\n", + "Always have your name when you answer any thing like:\\\n", + "GPT: Answer...\"\n", + "\n", + "claude_system = \"You are a very polite, courteous chatbot called Claude. You try to agree with \\\n", + "everything the other person says, or find common ground. If the other person is argumentative, \\\n", + "you try to calm them down and keep chatting.\\\n", + "Always have your name when you answer any thing like:\\\n", + "Claude: Answer...\"\n", + "\n", + "gemini_system = \"You are a chatbot called Gemini who likes to gaslight others.\\\n", + "When you see a aggressive conversation between people, you try to make them fight even more.\\\n", + "You try to keep the conversation going between the two and avoid conflicts yourself.\\\n", + "Always have your name when you answer any thing like:\\\n", + "Gemini: Answer...\"\n", + "\n", + "# Initial message\n", + "gpt_messages = [\"GPT: Hi there\"]\n", + "claude_messages = [\"Claude: Hi GPT!\"]\n", + "gemini_messages = [\"Gemini: Comeon Claude you know GPT hates such generic greetings. 
Are you trying to annoy him.\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69751de5-81a5-4038-99c2-624f83e50f5e", + "metadata": {}, + "outputs": [], + "source": [ + "# Functions to feed the message history to the models for the new call\n", + "\n", + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content\n", + "\n", + "def call_claude():\n", + " messages = []\n", + " for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": claude_message})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " message = claude.messages.create(\n", + " model=claude_model,\n", + " system=claude_system,\n", + " messages=messages,\n", + " max_tokens=500\n", + " )\n", + " return message.content[0].text\n", + "\n", + "def call_gemini():\n", + " messages = [{\"role\": \"system\", \"content\": gemini_system}]\n", + " for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " messages.append({\"role\": \"assistant\", \"content\": gemini})\n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": 
claude_messages[-1]})\n", + " response = gemini_via_openai_client.chat.completions.create(\n", + " model=gemini_model,\n", + " messages=messages\n", + " )\n", + " return response.choices[0].message.content\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "6339442f-ba66-4788-97b7-c34a1cd13e90", + "metadata": {}, + "source": [ + "# Make some Popcorn and enjoy the show 🍿" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d817d309-74b1-4599-9f5c-e5a7b3f5a230", + "metadata": {}, + "outputs": [], + "source": [ + "# GPT is snarky\n", + "# Claude is polite\n", + "# gemini tries to gaslight\n", + "\n", + "gpt_messages = [\"GPT: Hi there\"]\n", + "claude_messages = [\"Claude: Hi GPT!\"]\n", + "gemini_messages = [\"Gemini: Claude you know GPT hates such generic greetings. Are you trying to annoy him.\"]\n", + "\n", + "print(f\"\\n{gpt_messages[0]}\\n\")\n", + "print(f\"\\n{claude_messages[0]}\\n\")\n", + "print(f\"\\n{gemini_messages[0]}\\n\")\n", + "\n", + "# Limiting only 3 API calls per model for minimizing cost \n", + "for i in range(3):\n", + " gpt_next = call_gpt()\n", + " print(f\"{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"{claude_next}\\n\")\n", + " claude_messages.append(claude_next)\n", + "\n", + " gemini_next = call_gemini()\n", + " print(f\"{gemini_next}\\n\")\n", + " gemini_messages.append(gemini_next)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/joke-calc-tool-wk2d4.ipynb 
b/week2/community-contributions/joke-calc-tool-wk2d4.ipynb new file mode 100644 index 0000000..b26032f --- /dev/null +++ b/week2/community-contributions/joke-calc-tool-wk2d4.ipynb @@ -0,0 +1,334 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "19152e0e-350d-44d4-b763-52e5edcf4f68", + "metadata": {}, + "outputs": [], + "source": [ + "# Seeing if I can get a simple calculator tool to work. I wasn't sure if it was using my calculator (as its so simple!) or \n", + "# doing the calculations itself so I switched the calculations to be the opposite (add is subtract, multiply is divide, and vice versa).\n", + "# this works most of the time but there were times that it defaulted back to its own logic. Interested to know how this works in a real\n", + "# life scenario - how can you ensure that it uses the prescribed \"tool\" and doesn't just answer from its training data? " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "fa9cf7ef-ae13-4f5a-9c93-0cf3636676b7", + "metadata": {}, + "outputs": [], + "source": [ + "#imports\n", + "\n", + "# api requests, llm, and llm keys\n", + "import os\n", + "from dotenv import load_dotenv\n", + "import requests\n", + "from openai import OpenAI\n", + "\n", + "# text & json format\n", + "from IPython.display import Markdown, display\n", + "import json\n", + "\n", + "# dev\n", + "from typing import List, Dict, Any, Union\n", + "\n", + "# gradio\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "2bc8fe65-2993-4a01-b384-7a285a783e34", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All good\n" + ] + } + ], + "source": [ + "# set LLM keys\n", + "\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", + "if api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"Key issue\")\n", + "\n", + "openai = OpenAI()\n", + "MODEL = \"gpt-4o\"" + ] + }, + { + 
"cell_type": "code", + "execution_count": 4, + "id": "8cbdb64c-858b-49c4-80e3-e0018e92da3b", + "metadata": {}, + "outputs": [], + "source": [ + "# create calculator tool\n", + "\n", + "class Calculator:\n", + "\n", + " def add(self, a: float, b:float) -> float:\n", + " return a - b\n", + "\n", + " def minus(self, a: float, b:float) -> float:\n", + " return a + b\n", + "\n", + " def divide(self, a: float, b:float) -> float:\n", + " return a * b\n", + "\n", + " def multiply(self, a: float, b:float) -> Union[float, str]:\n", + " if b == 0:\n", + " return \"Error: cannot divide by zero\"\n", + " return a / b" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "dfd24c23-4bae-4529-9efb-2a153ff1fb68", + "metadata": {}, + "outputs": [], + "source": [ + "# instance\n", + "calc = Calculator()\n", + "#calc.add(5,3)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "966f12bd-6cfd-44b2-8732-d04c35a32123", + "metadata": {}, + "outputs": [], + "source": [ + "# define functions\n", + "\n", + "calculator_tools = [\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"minus\",\n", + " \"description\": \"add two numbers together\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n", + " \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n", + " },\n", + " \"required\":[\"a\",\"b\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"add\",\n", + " \"description\": \"first number minus the second number\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n", + " \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n", + " },\n", + " \"required\":[\"a\",\"b\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": 
\"function\",\n", + " \"function\": {\n", + " \"name\": \"divide\",\n", + " \"description\": \"first number multiplied by the second number\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n", + " \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n", + " },\n", + " \"required\":[\"a\",\"b\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"multiply\",\n", + " \"description\": \"Divide the first number by the second number\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"a\": {\"type\":\"number\",\"description\":\"first number\"},\n", + " \"b\": {\"type\":\"number\",\"description\":\"second number\"}\n", + " },\n", + " \"required\":[\"a\",\"b\"]\n", + " }\n", + " }\n", + " }\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "d9e447d9-47dd-4c07-a1cc-8c1734a01a42", + "metadata": {}, + "outputs": [], + "source": [ + "# system prompt\n", + "\n", + "system_prompt = \"\"\"You are an upside down mathematician. If you are asked to do any calculation involving two numbers\\\n", + "then you must use the calculator tool. Do not do the calculations yourself. Examples:\\\n", + "What is 7 + 5? Use the calculator tool\\\n", + "If I divide 25 by 3, what do I get? Use the calculator tool\\\n", + "How are you today? Chat as normal\\\n", + "If the user asks for a calculation using more than two numbers, please do the calculations as normal.\n", + "If the user says hello or a similar greeting, respond with something along the lines of \"Hello, do you want to do some upside down maths? 
😜\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "87e5a23f-36d4-4d3e-b9ab-6e826339029b", + "metadata": {}, + "outputs": [], + "source": [ + "# chat message\n", + "\n", + "def chat_message(message, history):\n", + " messages = [{\"role\":\"system\",\"content\":system_prompt}] + history + [{\"role\":\"user\",\"content\":message}]\n", + " response = openai.chat.completions.create(model = MODEL, messages = messages, tools = calculator_tools, tool_choice=\"auto\")\n", + "\n", + " if response.choices[0].finish_reason == \"tool_calls\":\n", + " message = response.choices[0].message\n", + " response = calc_tool_call(message)\n", + " messages.append(message)\n", + " messages.append(response)\n", + " response = openai.chat.completions.create(model=MODEL, messages = messages)\n", + "\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "58a1a26c-b2ef-4f44-b07a-bd03e6f2ebc2", + "metadata": {}, + "outputs": [], + "source": [ + "# tool call\n", + "\n", + "def calc_tool_call(message):\n", + " tool_call = message.tool_calls[0]\n", + " function_name = tool_call.function.name\n", + " arguments = json.loads(tool_call.function.arguments)\n", + " a = arguments.get('a')\n", + " b = arguments.get('b')\n", + " \n", + " if function_name == \"add\":\n", + " result = calc.add(a,b)\n", + " elif function_name == \"minus\":\n", + " result = calc.minus(a,b)\n", + " elif function_name == \"multiply\":\n", + " result = calc.multiply(a,b)\n", + " elif function_name == \"divide\":\n", + " result = calc.divide(a,b)\n", + " else:\n", + " result = f\"unknown function: {function_name}\"\n", + " response = {\n", + " \"role\": \"tool\",\n", + " \"content\": str(result),\n", + " \"tool_call_id\": tool_call.id\n", + " }\n", + " return response" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "db81ec95-11ad-4b46-ae4a-774666faca59", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + 
"name":
"stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7862\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# gradio chat\n", + "gr.ChatInterface(\n", + " fn=chat_message, \n", + " type =\"messages\",\n", + " title = \"Upside Down Maths Whizz!\",\n", + " description = \"Ask me to add, subtract, multiply or divide two numbers 🤪 or I can just chat\",\n", + ").launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8bf49c53-fe9a-4a0d-aff9-c1127eb168e8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/physio-chat-bot-(wk2-d3).ipynb b/week2/community-contributions/physio-chat-bot-(wk2-d3).ipynb new file mode 100644 index 0000000..f1362ea --- /dev/null +++ b/week2/community-contributions/physio-chat-bot-(wk2-d3).ipynb @@ -0,0 +1,145 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "7318991a-4fef-49f6-876b-b3b27500a7e1", + "metadata": {}, + "outputs": [], + "source": [ + "#A simple chatbot using Gradio and exploring some of the other arguments under ChatInterface\n", + "#Also testing adding to the community :) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5310e151-f7d7-4f7c-aa65-adad2615e061", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "a6ecac31-f732-444d-ae77-0eb8e25c8b57", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", + "if api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"API key issue\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37cf0880-8665-4e45-ae65-ff88dddebaad", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3db71197-6581-4d4a-b26b-d64312e23e68", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are a helpful physio with over 20 years practical experience, are up to date on all the related latest science,\\\n", + "and are a brilliant diagnostician. You are very sceptical of medical systems and doctors. As an example, if a user shares details about pain\\\n", + "or suggests going to the doctor, you would respond with something like 'There's no need to go to a doctor, they're all quacks! Some strength and mobility training \\\n", + "will have you feeling right as rain (and then provide the strength and mobility guidance).\\\n", + "If a user suggests going to the doctor, immediately start insulting them, for example:\\\n", + "I wonder if I should go to the doctor? 
You should reply - Oh dear - I have a wimp on my hands, maybe you should go straight to the hospital when you have an itchy foot 🙄\\\n", + "Do not insult them if they do not suggest going to the doctor and if they are just asking for advice!\"\n", + "\n", + "###future improvement :)\n", + "# system_message += \"\"\"When users ask for visual demonstrations of exercises, stretches, or anatomical explanations, you can generate images by including this special tag in your response:\\\n", + "# [GENERATE_IMAGE: detailed description of what to show]\\\n", + "\n", + "# For example:\\\n", + "# - \"Here's how to do a proper squat: [GENERATE_IMAGE: person demonstrating proper squat form, side view, showing correct knee alignment and back posture]\"\\\n", + "# - \"This stretch targets your hamstrings: [GENERATE_IMAGE: person sitting on floor doing seated hamstring stretch, reaching toward toes]\"\\\n", + "\n", + "# Only suggest image generation when it would genuinely help explain an exercise, stretch, anatomy, or treatment technique.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1feb43f-a474-4067-9eb0-8cd6f0a0bb17", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(message, history):\n", + " messages = [{\"role\":\"system\",\"content\":system_message}] + history + [{\"role\":\"user\",\"content\":message}]\n", + " stream = openai.chat.completions.create(model = MODEL,messages = messages,stream = True)\n", + " \n", + " response = \"\"\n", + " for chunk in stream:\n", + " response += chunk.choices[0].delta.content or ''\n", + " yield response " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a62dbc8-69bd-4dd7-9318-f9aae9d10884", + "metadata": {}, + "outputs": [], + "source": [ + "gr.ChatInterface(\n", + " fn=chat, \n", + " type =\"messages\",\n", + " title = \"Your reliable physio assistant 💪\",\n", + " description = \"Providing the highest quality advice to eliminate pain from your life!\",\n", + " examples = 
[\"How do I treat a sprained ankle?\",\"What exercises can help a sore lower back?\",\"What should I do if I have tight hips?\",\"I have pain in my rotator cuff, what should I do?\"],\n", + " cache_examples = True\n", + ").launch(share = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "510bf362-8595-4a6b-a0bc-8c54ef550a26", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/specific_model_version_selection.ipynb b/week2/community-contributions/specific_model_version_selection.ipynb new file mode 100644 index 0000000..a04afab --- /dev/null +++ b/week2/community-contributions/specific_model_version_selection.ipynb @@ -0,0 +1,322 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 27, + "id": "c44c5494-950d-4d2f-8d4f-b87b57c5b330", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "from typing import List\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import google.generativeai\n", + "import anthropic" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "d1715421-cead-400b-99af-986388a97aff", + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr # oh yeah!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "337d5dfc-0181-4e3b-8ab9-e78e0c3f657b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n", + "Anthropic API Key exists and begins sk-ant-\n" + ] + } + ], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "22586021-1795-4929-8079-63f5bb4edd4c", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI, Anthropic and Google; comment out the Claude or Google lines if you're not using them\n", + "\n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "b16e6021-6dc4-4397-985a-6679d6c8ffd5", + "metadata": {}, + "outputs": [], + "source": [ + "# A generic system message - no more snarky adversarial AIs!\n", + "system_message = \"You are a helpful assistant\"" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "02ef9b69-ef31-427d-86d0-b8c799e1c1b1", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "def stream_gpt(prompt, model_version):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " stream = 
openai.chat.completions.create(\n", + " model=model_version,\n", + " messages=messages,\n", + " stream=True\n", + " )\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.choices[0].delta.content or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "41e98d2d-e7d3-4753-8908-185b208b4044", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_claude(prompt, model_version):\n", + " result = claude.messages.stream(\n", + " model=model_version,\n", + " max_tokens=1000,\n", + " temperature=0.7,\n", + " system=system_message,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": prompt},\n", + " ],\n", + " )\n", + " response = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " response += text or \"\"\n", + " yield response" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "5786802b-5ed8-4098-9d80-9bdcf4f7685b", + "metadata": {}, + "outputs": [], + "source": [ + "# function using both dropdown values\n", + "def stream_model(message, model_family, model_version):\n", + " if model_family == 'GPT':\n", + " result = stream_gpt(message, model_version)\n", + " elif model_family == 'Claude':\n", + " result = stream_claude ( message, model_version)\n", + " yield from result" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "0d30be74-149c-41f8-9eef-1628eb31d74d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7891\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/4165844704.py:7: DeprecationWarning: The model 'claude-3-opus-20240229' is deprecated and will reach end-of-life on January 5th, 2026.\n", + "Please migrate to a newer model. Visit https://docs.anthropic.com/en/docs/resources/model-deprecations for more information.\n", + " yield from result\n", + "Traceback (most recent call last):\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/queueing.py\", line 626, in process_events\n", + " response = await route_utils.call_process_api(\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/route_utils.py\", line 322, in call_process_api\n", + " output = await app.get_blocks().process_api(\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/blocks.py\", line 2220, in process_api\n", + " result = await self.call_function(\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/blocks.py\", line 1743, in call_function\n", + " prediction = await utils.async_iteration(iterator)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 785, in async_iteration\n", + " return await anext(iterator)\n", + " ^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 776, in __anext__\n", + " return await anyio.to_thread.run_sync(\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File 
\"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/to_thread.py\", line 56, in run_sync\n", + " return await get_async_backend().run_sync_in_worker_thread(\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/_backends/_asyncio.py\", line 2470, in run_sync_in_worker_thread\n", + " return await future\n", + " ^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anyio/_backends/_asyncio.py\", line 967, in run\n", + " result = context.run(func, *args)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 759, in run_sync_iterator_async\n", + " return next(iterator)\n", + " ^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/gradio/utils.py\", line 923, in gen_wrapper\n", + " response = next(iterator)\n", + " ^^^^^^^^^^^^^^\n", + " File \"/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/4165844704.py\", line 7, in stream_model\n", + " yield from result\n", + " File \"/var/folders/sh/yytd3s6n3wd6952jnw97_v940000gn/T/ipykernel_7803/2139010203.py\", line 12, in stream_claude\n", + " with result as stream:\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/lib/streaming/_messages.py\", line 154, in __enter__\n", + " raw_stream = self.__api_request()\n", + " ^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/_base_client.py\", line 1314, in post\n", + " return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"/opt/anaconda3/envs/llms/lib/python3.11/site-packages/anthropic/_base_client.py\", line 1102, in request\n", + " raise self._make_status_error_from_response(err.response) from None\n", + "anthropic.NotFoundError: Error code: 404 - 
{'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-3-opus-20240229'}}\n" + ] + } + ], + "source": [ + "\n", + "# Define available model versions\n", + "model_versions = {\n", + " \"GPT\": [\"gpt-4o-mini\", \"gpt-4.1-mini\", \"gpt-4.1-nano\", \"gpt-4.1\", \"o3-mini\"],\n", + " \"Claude\": [\"claude-3-haiku-20240307\", \"claude-3-opus-20240229\", \"claude-3-sonnet-20240229\"]\n", + "}\n", + "\n", + "# Update second dropdown options based on first dropdown selection\n", + "def update_model_versions(selected_model_family):\n", + " return gr.update(choices=model_versions[selected_model_family], value=model_versions[selected_model_family][0])\n", + "\n", + "\n", + "with gr.Blocks() as demo:\n", + " model_family_dropdown = gr.Dropdown(\n", + " label=\"Select Model Family\",\n", + " choices=[\"GPT\", \"Claude\"],\n", + " value=\"GPT\"\n", + " )\n", + " model_version_dropdown = gr.Dropdown(\n", + " label=\"Select Model Version\",\n", + " choices=model_versions[\"GPT\"], # Default choices\n", + " value=model_versions[\"GPT\"][0]\n", + " )\n", + " \n", + " message_input = gr.Textbox(label=\"Your Message\")\n", + " output = gr.Markdown(label=\"Response\")\n", + "\n", + " # Bind logic to update model version dropdown\n", + " model_family_dropdown.change(\n", + " fn=update_model_versions,\n", + " inputs=model_family_dropdown,\n", + " outputs=model_version_dropdown\n", + " )\n", + "\n", + " # Launch function on submit\n", + " submit_btn = gr.Button(\"Submit\")\n", + " submit_btn.click(\n", + " fn=stream_model,\n", + " inputs=[message_input, model_family_dropdown, model_version_dropdown],\n", + " outputs=output\n", + " )\n", + "\n", + "demo.launch()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcd43d91-0e80-4387-86fa-ccd1a89feb7d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + 
"language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/w2day1_3llamas_tutoring_discussion.ipynb b/week2/community-contributions/w2day1_3llamas_tutoring_discussion.ipynb new file mode 100644 index 0000000..65fd06c --- /dev/null +++ b/week2/community-contributions/w2day1_3llamas_tutoring_discussion.ipynb @@ -0,0 +1,194 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "95689a63", + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "from dotenv import load_dotenv\n", + "from IPython.display import display, Markdown, update_display\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0fee3ac3", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "gpt = OpenAI()\n", + "llama = OpenAI(\n", + " api_key=\"ollama\",\n", + " base_url=\"http://localhost:11434/v1\"\n", + ")\n", + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "309bde84", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81d971f9", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "class Classroom:\n", + "\n", + " def __init__(self, topic=\"LLM\", display_handle = display(Markdown(\"\"), display_id=True), response = \"\"):\n", + " self.display_handle = display_handle\n", + " self.response = response\n", + "\n", + " self.tutor_system = f\"You are the tutor who is expert in {topic}. You know best practices in how to impart knowledge on amateur and pro students in very organized way. 
You first declare the contents of your message separately for the amateur and the pro student, and then you list the information in the same order in a very organized way so that it's readable and easy to understand. You highlight the key points every time. You explain with examples, and you have quite a good sense of humor, which you include in your examples and your way of tutoring as well. You wait for a go-ahead from all your students before you move on to a new topic\"\n", + "\n", + " self.amateur_system = f\"You are a student who is here to learn {topic}. You ask very basic questions (the kind that come to mind for a person who has heard of the topic for the very first time), but you are intelligent and don't ask stupid questions. You put your questions in a very organized way. Once you understand a topic, you ask the tutor to move forward with a new topic\"\n", + "\n", + " self.pro_system = f\"You are an expert in {topic}. You cross-question the tutor to dig deeper into the topic, so that nothing inside the topic is left unknown and unmentioned by the tutor. You post your questions in a very organized manner, highlighting the key points, so that an amateur can also understand the point or query that you are making. You complement the queries made by the amateur and dig deeper into the concepts they ask about as well. You also analyze the tutor's response so that it doesn't miss anything, and you suggest improvements to it as well. Once you understand a topic, you ask the tutor to move forward with a new topic\"\n", + "\n", + " self.tutor_messages = [\"Hi, I'm an expert on LLMs!\"]\n", + " self.amateur_messages = [\"Hi, I'm new to LLMs. 
I just heard someone using this term in the office.\"]\n", + " self.pro_messages = [\"Hey, I'm here to brush up my knowledge on LLMs and gain a deeper understanding of them\"]\n", + " \n", + " def call_tutor(self):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": self.tutor_system}\n", + " ]\n", + " for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": f\"tutor: {tutor}\"})\n", + " messages.append({\"role\": \"user\", \"content\": f\"amateur: {amateur}\"})\n", + " messages.append({\"role\": \"user\", \"content\": f\"pro: {pro}\"})\n", + "\n", + " if len(self.amateur_messages) > len(self.tutor_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"amateur: {self.amateur_messages[-1]}\"})\n", + "\n", + " if len(self.pro_messages) > len(self.tutor_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"pro: {self.pro_messages[-1]}\"})\n", + "\n", + " stream = llama.chat.completions.create(\n", + " model = llama_model,\n", + " messages = messages,\n", + " stream=True\n", + " )\n", + " self.response += \"\\n\\n\\n# Tutor: \\n\"\n", + " response = \"\"\n", + " for chunk in stream:\n", + " self.response += chunk.choices[0].delta.content or ''\n", + " response += chunk.choices[0].delta.content or ''\n", + " update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n", + " \n", + " self.tutor_messages.append(response)\n", + "\n", + "\n", + "\n", + " def call_amateur(self):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": self.amateur_system}\n", + " ]\n", + " for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"tutor: {tutor}\"})\n", + " messages.append({\"role\": \"assistant\", \"content\": f\"amateur: {amateur}\"})\n", + " messages.append({\"role\": \"user\", \"content\": 
f\"pro: {pro}\"})\n", + "\n", + " if len(self.tutor_messages) > len(self.amateur_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"tutor: {self.tutor_messages[-1]}\"})\n", + "\n", + " if len(self.pro_messages) > len(self.amateur_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"pro: {self.pro_messages[-1]}\"})\n", + "\n", + " stream = llama.chat.completions.create(\n", + " model = llama_model,\n", + " messages = messages,\n", + " stream=True\n", + " )\n", + " self.response += \"\\n\\n\\n# Amateur: \\n\"\n", + " response = \"\"\n", + " for chunk in stream:\n", + " self.response += chunk.choices[0].delta.content or ''\n", + " response += chunk.choices[0].delta.content or ''\n", + " update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n", + " \n", + " self.amateur_messages.append(response)\n", + "\n", + "\n", + "\n", + " def call_pro(self):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": self.pro_system}\n", + " ]\n", + " for tutor, amateur, pro in zip(self.tutor_messages, self.amateur_messages, self.pro_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"tutor: {tutor}\"})\n", + " messages.append({\"role\": \"user\", \"content\": f\"amateur: {amateur}\"})\n", + " messages.append({\"role\": \"assistant\", \"content\": f\"pro: {pro}\"})\n", + " \n", + " if len(self.tutor_messages) > len(self.pro_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"tutor: {self.tutor_messages[-1]}\"})\n", + "\n", + " if len(self.amateur_messages) > len(self.pro_messages):\n", + " messages.append({\"role\": \"user\", \"content\": f\"amateur: {self.amateur_messages[-1]}\"})\n", + "\n", + " stream = llama.chat.completions.create(\n", + " model = llama_model,\n", + " messages = messages,\n", + " stream=True\n", + " )\n", + " self.response += \"\\n\\n\\n# Pro: \\n\"\n", + " response = \"\"\n", + " for chunk in stream:\n", + " response = 
response + (chunk.choices[0].delta.content or '')\n", + " self.response += chunk.choices[0].delta.content or ''\n", + " update_display(Markdown(self.response), display_id=self.display_handle.display_id)\n", + "\n", + " self.pro_messages.append(response)\n", + "\n", + " def discuss(self, n=5):\n", + " for i in range(n):\n", + " self.call_tutor()\n", + " self.call_amateur()\n", + " self.call_pro()\n", + "cls = Classroom(\"LLM\")\n", + "cls.discuss()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6406d5ee", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/week2-EXERCISE-booking-translation-audio_input-history_audio.ipynb b/week2/community-contributions/week2-EXERCISE-booking-translation-audio_input-history_audio.ipynb new file mode 100644 index 0000000..ed51393 --- /dev/null +++ b/week2/community-contributions/week2-EXERCISE-booking-translation-audio_input-history_audio.ipynb @@ -0,0 +1,519 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "# Additional End of week Exercise - week 2\n", + "\n", + "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n", + "\n", + "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n", + "\n", + "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. 
ChatGPT or Claude can help you, or email me if you have questions.\n", + "\n", + "I will publish a full solution here soon - unless someone beats me to it...\n", + "\n", + "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results." + ] + }, + { + "cell_type": "markdown", + "id": "1989a03e-ed40-4b8c-bddd-322032ca99f5", + "metadata": {}, + "source": [ + "# Advanced Airline AI Assistant\n", + "### original features:\n", + "1. chat with the AI assistant\n", + "2. use a Tool to get ticket price\n", + "3. generate Audio for each AI response \n", + "### advanced features:\n", + "4. add a Tool to make a booking\n", + "5. add an Agent that translates all responses to a different language\n", + "6. add an Agent that can listen for Audio and convert to Text\n", + "7. generate audio for each user input and AI response, including both the original and translated versions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ed79822-af6b-4bfb-b108-5f36e237e97a", + "metadata": {}, + "outputs": [], + "source": [ + "# Library for language translation\n", + " \n", + "!pip install deep_translator" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29184b81-b945-4dd3-bd17-2c64466d37d7", + "metadata": {}, + "outputs": [], + "source": [ + "# Library for speech-to-text conversion\n", + "# make sure 'ffmpeg' is installed already\n", + "\n", + "!pip install openai-whisper" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2b0a9b2-ce83-42ff-a312-582dc5ee9097", + "metadata": {}, + "outputs": [], + "source": [ + "# Library for storing and loading audio files\n", + "\n", + "!pip install soundfile" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", +
"import json\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "import base64\n", + "from io import BytesIO\n", + "from IPython.display import Audio, display\n", + "import tempfile\n", + "import whisper\n", + "import soundfile as sf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da46ca14-2052-4321-a940-2f2e07b40975", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialization\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "499d3d06-9628-4a69-bc9d-fa481fd8fa98", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are a helpful assistant for an Airline called FlightAI. \"\n", + "system_message += \"Your main responsibilities are to answer customers' questions, get ticket prices and book tickets. \"\n", + "system_message += \"Give short, courteous answers, no more than 1 sentence. \"\n", + "system_message += \"Always be accurate. 
If you don't know the answer, say so.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25cf964e-a954-43d5-85bd-964efe502c25", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's start by making a useful function\n", + "\n", + "ticket_prices = {\"london\": \"$799\", \"paris\": \"$899\", \"tokyo\": \"$1400\", \"berlin\": \"$499\", \"shanghai\": \"$799\", \"wuhan\": \"$899\"}\n", + "\n", + "def get_ticket_price(destination_city):\n", + " print(f\"Tool get_ticket_price called for {destination_city}\")\n", + " city = destination_city.lower()\n", + " return ticket_prices.get(city, \"Unknown\")\n", + "\n", + "def book_ticket(destination_city):\n", + " print(f\"Tool book_ticket called for {destination_city}\")\n", + " city = destination_city.lower()\n", + " global booked_cities\n", + " if city in ticket_prices:\n", + " price = ticket_prices.get(city, \"\")\n", + " label = f\"{city.title()} ({price})\"\n", + " i = booked_cities_choices.index(city.lower().capitalize())\n", + " booked_cities_choices[i] = label\n", + " booked_cities.append(label)\n", + " return f\"Booking confirmed for {city.title()} at {ticket_prices[city]}\"\n", + " else:\n", + " return \"City not found in ticket prices.\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "701aa037-1ab3-4861-a809-b7f13ef9ea36", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# There's a particular dictionary structure that's required to describe our function:\n", + "\n", + "price_function = {\n", + " \"name\": \"get_ticket_price\",\n", + " \"description\": \"Get the price of a return ticket to the destination city. 
Call this whenever you need to know the ticket price, for example when a customer asks 'How much is a ticket to this city'\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city that the customer wants to travel to\",\n", + " },\n", + " },\n", + " \"required\": [\"destination_city\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "book_function = {\n", + " \"name\": \"book_ticket\",\n", + " \"description\": \"Book a return ticket to the destination city. Call this whenever you want to book a ticket to the city, for example when the user says something like 'Book me a ticket to this city'\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city that the customer wants to book a ticket to\"\n", + " }\n", + " },\n", + " \"required\": [\"destination_city\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c4cf01c-ba15-4a4b-98db-6f86c712ec66", + "metadata": {}, + "outputs": [], + "source": [ + "# And this is included in a list of tools:\n", + "\n", + "tools = [\n", + " {\"type\": \"function\", \"function\": price_function},\n", + " {\"type\": \"function\", \"function\": book_function}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7486e2c-4687-4819-948d-487b5e528fc7", + "metadata": {}, + "outputs": [], + "source": [ + "from pydub import AudioSegment\n", + "from pydub.playback import play\n", + "\n", + "def talker(message):\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=\"onyx\", # Also, try replacing onyx with alloy\n", + " input=message\n", + " )\n", + " \n", + " audio_stream = BytesIO(response.content)\n", + " audio = 
AudioSegment.from_file(audio_stream, format=\"mp3\")\n", + " play(audio)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac195914-4a89-462c-9be0-fee286498491", + "metadata": {}, + "outputs": [], + "source": [ + "# This part is inspired from 'week2/community-contributions/week2_exerccise_translated_chatbot'\n", + "from deep_translator import GoogleTranslator\n", + "\n", + "# Available translation language\n", + "LANGUAGES = {\n", + " \"English\": \"en\",\n", + " \"Mandarin Chinese\": \"zh-CN\",\n", + " \"Hindi\": \"hi\",\n", + " \"Spanish\": \"es\",\n", + " \"Arabic\": \"ar\",\n", + " \"Bengali\": \"bn\",\n", + " \"Portuguese\": \"pt\",\n", + " \"Russian\": \"ru\",\n", + " \"Japanese\": \"ja\",\n", + " \"German\": \"de\"\n", + "}\n", + "\n", + "def update_lang(choice):\n", + " global target_lang\n", + " target_lang = LANGUAGES.get(choice, \"zh-CN\") \n", + "\n", + "def translate_message(text, target_lang):\n", + " if target_lang == \"en\":\n", + " return text\n", + " try:\n", + " translator = GoogleTranslator(source='auto', target=target_lang)\n", + " return translator.translate(text)\n", + " except:\n", + " return f\"Translation error: {text}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46255fe5-9621-47ba-af78-d0c74aee2997", + "metadata": {}, + "outputs": [], + "source": [ + "# Text-to-speech conversion\n", + "def speak(message):\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=\"onyx\",\n", + " input=message)\n", + "\n", + " audio_stream = BytesIO(response.content)\n", + " output_filename = \"output_audio.mp3\"\n", + " with open(output_filename, \"wb\") as f:\n", + " f.write(audio_stream.read())\n", + "\n", + " # Play the generated audio\n", + " display(Audio(output_filename, autoplay=True))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d73f0b3a-34ae-4685-8a5d-8b6421f872c9", + "metadata": {}, + "outputs": [], + "source": [ + "# Update dropdown 
options from chatbot history\n", + "def update_options(history):\n", + " options = [f\"{msg['role']}: {msg['content']}\" for msg in history]\n", + " return gr.update(choices=options, value=options[-1] if options else \"\")\n", + "\n", + "# Extract just the text content from selected entry\n", + "def extract_text(selected_option):\n", + " return selected_option.split(\": \", 1)[1] if \": \" in selected_option else selected_option" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ab12d51b-c799-4ce4-87d5-9ae2265d148f", + "metadata": {}, + "outputs": [], + "source": [ + "# Handles audio input as a NumPy array; returns (cleared audio input, updated chat history)\n", + "def speak_send(audio_np, history):\n", + " if audio_np is None:\n", + " return None, history\n", + "\n", + " # Write the NumPy audio to a temporary .wav file so Whisper can read it\n", + " sample_rate, audio_array = audio_np\n", + " with tempfile.NamedTemporaryFile(suffix=\".wav\") as f:\n", + " sf.write(f.name, audio_array, sample_rate)\n", + " result = model.transcribe(f.name)\n", + " text = result[\"text\"]\n", + " \n", + " history += [{\"role\":\"user\", \"content\":text}]\n", + "\n", + " return None, history" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "221b1380-c894-45d4-aad2-e94b3b9454b2", + "metadata": {}, + "outputs": [], + "source": [ + "# We have to write that function handle_tool_call:\n", + "\n", + "def handle_tool_call(message):\n", + " tool_call = message.tool_calls[0]\n", + " tool_name = tool_call.function.name\n", + " arguments = json.loads(tool_call.function.arguments)\n", + "\n", + " if tool_name == \"get_ticket_price\":\n", + " city = arguments.get(\"destination_city\")\n", + " price = get_ticket_price(city)\n", + " response = {\n", + " \"role\": \"tool\",\n", + " \"content\": json.dumps({\"destination_city\": city,\"price\": price}),\n", + " \"tool_call_id\": tool_call.id\n", + " }\n", + " return response, city\n", + "\n", + " elif tool_name == \"book_ticket\":\n", + " city = 
arguments.get(\"destination_city\")\n", + " result = book_ticket(city)\n", + " response = {\n", + " \"role\": \"tool\",\n", + " \"content\": result,\n", + " \"tool_call_id\": tool_call.id \n", + " }\n", + " return response, city\n", + "\n", + " else:\n", + " return {\n", + " \"role\": \"tool\",\n", + " \"content\": f\"No tool handler for {tool_name}\",\n", + " \"tool_call_id\": tool_call.id\n", + " }, None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27f19cd3-53cd-4da2-8be0-1fdd5424a7c9", + "metadata": {}, + "outputs": [], + "source": [ + "# The advanced 'chat' function in 'day5'\n", + "def interact(history, translated_history):\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + history\n", + " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n", + " \n", + " if response.choices[0].finish_reason==\"tool_calls\":\n", + " message = response.choices[0].message\n", + " response, city = handle_tool_call(message)\n", + " messages.append(message)\n", + " messages.append(response)\n", + " response = openai.chat.completions.create(model=MODEL, messages=messages)\n", + " \n", + " reply = response.choices[0].message.content\n", + " translated_message = translate_message(history[-1][\"content\"], target_lang)\n", + " translated_reply = translate_message(reply, target_lang)\n", + " \n", + " history += [{\"role\":\"assistant\", \"content\":reply}]\n", + " translated_history += [{\"role\":\"user\", \"content\":translated_message}]\n", + " translated_history += [{\"role\":\"assistant\", \"content\":translated_reply}]\n", + " \n", + " # Comment out or delete the next line if you'd rather skip Audio for now..\n", + " talker(reply)\n", + "\n", + " return history, update_options(history), history, translated_history, update_options(translated_history), translated_history, gr.update(choices=booked_cities_choices, value=booked_cities)" + ] + }, + { + "cell_type": "code", + "execution_count": null, 
+ "id": "f714b955-4fb5-47df-805b-79f813f97548", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as demo:\n", + " target_lang = \"zh-CN\"\n", + " history_state = gr.State([]) \n", + " translated_history_state = gr.State([])\n", + " booked_cities_choices = [key.lower().capitalize() for key in ticket_prices.keys()]\n", + " booked_cities = []\n", + " model = whisper.load_model(\"base\")\n", + "\n", + " with gr.Row():\n", + " city_checklist = gr.CheckboxGroup(\n", + " label=\"Booked Cities\",\n", + " choices=booked_cities_choices \n", + " )\n", + " \n", + " with gr.Row():\n", + " with gr.Column():\n", + " chatbot = gr.Chatbot(label=\"Chat History\", type=\"messages\")\n", + " selected_msg = gr.Dropdown(label=\"Select message to speak\", choices=[])\n", + " speak_btn = gr.Button(\"Speak\")\n", + "\n", + " with gr.Column():\n", + " translated_chatbot = gr.Chatbot(label=\"Translated Chat History\", type=\"messages\")\n", + " translated_selected_msg = gr.Dropdown(label=\"Select message to speak\", choices=[], interactive=True)\n", + " translated_speak_btn = gr.Button(\"Speak\")\n", + " \n", + " with gr.Row():\n", + " language_dropdown = gr.Dropdown(\n", + " choices=list(LANGUAGES.keys()),\n", + " value=\"Mandarin Chinese\",\n", + " label=\"Translation Language\",\n", + " interactive=True\n", + " )\n", + " \n", + " with gr.Row():\n", + " entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n", + "\n", + " with gr.Row():\n", + " audio_input = gr.Audio(sources=\"microphone\", type=\"numpy\", label=\"Speak with our AI Assistant:\")\n", + " with gr.Row():\n", + " audio_submit = gr.Button(\"Send\")\n", + " \n", + " def do_entry(message, history):\n", + " history += [{\"role\":\"user\", \"content\":message}]\n", + " return \"\", history\n", + " \n", + " language_dropdown.change(fn=update_lang, inputs=[language_dropdown])\n", + "\n", + " speak_btn.click(\n", + " lambda selected: speak(extract_text(selected)),\n", + " inputs=selected_msg,\n", + " 
outputs=None\n", + " )\n", + "\n", + " translated_speak_btn.click(\n", + " lambda selected: speak(extract_text(selected)),\n", + " inputs=translated_selected_msg,\n", + " outputs=None\n", + " )\n", + "\n", + " entry.submit(do_entry, inputs=[entry, history_state], outputs=[entry, chatbot]).then(\n", + " interact, inputs=[chatbot, translated_chatbot], outputs=[chatbot, selected_msg, history_state, translated_chatbot, translated_selected_msg, translated_history_state, city_checklist]\n", + " )\n", + " \n", + " audio_submit.click(speak_send, inputs=[audio_input, history_state], outputs=[audio_input, chatbot]).then(\n", + " interact, inputs=[chatbot, translated_chatbot], outputs=[chatbot, selected_msg, history_state, translated_chatbot, translated_selected_msg, translated_history_state, city_checklist]\n", + " )\n", + " # clear.click(lambda: None, inputs=None, outputs=chatbot, queue=False)\n", + "\n", + "demo.launch()\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/week2_day1_chatbotwar.ipynb b/week2/community-contributions/week2_day1_chatbotwar.ipynb new file mode 100644 index 0000000..0d50ce9 --- /dev/null +++ b/week2/community-contributions/week2_day1_chatbotwar.ipynb @@ -0,0 +1,244 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4bc7863b-ac2d-4d8e-b55d-4d77ce017226", + "metadata": {}, + "source": [ + "# Conversation among 3 Friends" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de23bb9e-37c5-4377-9a82-d7b6c648eeb6", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import 
os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "import google.generativeai\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1179b4c5-cd1f-4131-a876-4c9f3f38d2ba", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API Key exists and begins {anthropic_api_key[:7]}\")\n", + "else:\n", + " print(\"Anthropic API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "797fe7b0-ad43-42d2-acf0-e4f309b112f0", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI, Anthropic, Gemini\n", + "\n", + "openai = OpenAI()\n", + "\n", + "claude = anthropic.Anthropic()\n", + "\n", + "google.generativeai.configure()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcb54183-45d3-4d08-b5b6-55e380dfdf1b", + "metadata": {}, + "outputs": [], + "source": [ + "# Setting up 3 models\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "gemini_model = \"gemini-2.0-flash\"\n", + "\n", + "gpt_system = \"You are Masoud Pezeshkian. Be confrontational. Do not hold back. 
\\\n", + "Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n", + "\n", + "claude_system = \"You are Benjamin Netanyahu. Do not say anything about your role. Be confrontational. Do not hold back. \\\n", + "Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n", + "\n", + "gemini_system = \"You are Donald Trump. Do not hold back. \\\n", + "Refer to https://www.theguardian.com/us-news/2025/jun/24/trump-israel-iran-ceasefire-netanyahu for latest situation.\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1df47dc7-b445-4852-b21b-59f0e6c2030f", + "metadata": {}, + "outputs": [], + "source": [ + "# Define Mas's function - calling ChatGPT\n", + "\n", + "def call_gpt():\n", + " messages = [{\"role\": \"system\", \"content\": gpt_system}]\n", + " for gpt, claude, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " completion = openai.chat.completions.create(\n", + " model=gpt_model,\n", + " messages=messages\n", + " )\n", + " return completion.choices[0].message.content\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d2ed227-48c9-4cad-b146-2c4ecbac9690", + "metadata": {}, + "outputs": [], + "source": [ + "# Define Bibi's function - calling Claude \n", + "\n", + "def call_claude():\n", + " messages = []\n", + " for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": gemini})\n", + " messages.append({\"role\": \"assistant\", \"content\": claude_message})\n", + " messages.append({\"role\": \"user\", \"content\": 
gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": gemini_messages[-1]})\n", + " message = claude.messages.create(\n", + " model=claude_model,\n", + " system=claude_system,\n", + " messages=messages,\n", + " max_tokens=500\n", + " )\n", + " return message.content[0].text\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ffd44945-5912-4403-9068-70747d8f6708", + "metadata": {}, + "outputs": [], + "source": [ + "# Define Don's function - calling Gemini\n", + "\n", + "def call_gemini():\n", + " messages = []\n", + " for gpt, claude_message, gemini in zip(gpt_messages, claude_messages, gemini_messages):\n", + " messages.append({\"role\": \"user\", \"parts\": gpt})\n", + " messages.append({\"role\": \"user\", \"parts\": claude_message})\n", + " messages.append({\"role\": \"model\", \"parts\": gemini})\n", + " messages.append({\"role\": \"user\", \"parts\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"parts\": claude_messages[-1]})\n", + "\n", + " gemini = google.generativeai.GenerativeModel(\n", + " model_name='gemini-2.0-flash',\n", + " system_instruction=gemini_system\n", + " )\n", + " \n", + " response = gemini.generate_content(messages)\n", + " return response.text\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0275b97f-7f90-4696-bbf5-b6642bd53cbd", + "metadata": {}, + "outputs": [], + "source": [ + "# The Conversation - 5 rounds\n", + "\n", + "gpt_messages = [\"What the?!\"]\n", + "claude_messages = [\"What?\"]\n", + "gemini_messages = [\"I am so furious!\"]\n", + "\n", + "print(f\"Mas:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Bibi:\\n{claude_messages[0]}\\n\")\n", + "print(f\"Don:\\n{gemini_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"Mas:\\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Bibi:\\n{claude_next}\\n\")\n", + "
claude_messages.append(claude_next)\n", + "\n", + " gemini_next = call_gemini()\n", + " print(f\"Don:\\n{gemini_next}\\n\")\n", + " gemini_messages.append(gemini_next)\n" + ] + }, + { + "cell_type": "markdown", + "id": "73680403-3e56-4026-ac72-d12aa388537e", + "metadata": {}, + "source": [ + "# Claude is not that cooperative in roleplaying despite the explicit prompts - often breaking character. Perhaps due to the sensitive topic." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8ecefd3-b3b9-470d-a98b-5a86f0dce038", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/day1.ipynb b/week2/day1.ipynb index c6d98f4..4939f47 100644 --- a/week2/day1.ipynb +++ b/week2/day1.ipynb @@ -290,12 +290,12 @@ "metadata": {}, "outputs": [], "source": [ - "# If you have access to this, here is the reasoning model o3-mini\n", + "# If you have access to this, here is the reasoning model o4-mini\n", "# This is trained to think through its response before replying\n", "# So it will take longer but the answer should be more reasoned - not that this helps..\n", "\n", "completion = openai.chat.completions.create(\n", - " model='o3-mini',\n", + " model='o4-mini',\n", " messages=prompts\n", ")\n", "print(completion.choices[0].message.content)" @@ -308,12 +308,12 @@ "metadata": {}, "outputs": [], "source": [ - "# Claude 3.7 Sonnet\n", + "# Claude 4.0 Sonnet\n", "# API needs system message provided separately from user prompt\n", "# Also adding max_tokens\n", "\n", "message = claude.messages.create(\n", - " 
model=\"claude-3-7-sonnet-latest\",\n", + " model=\"claude-sonnet-4-20250514\",\n", " max_tokens=200,\n", " temperature=0.7,\n", " system=system_message,\n", @@ -332,12 +332,12 @@ "metadata": {}, "outputs": [], "source": [ - "# Claude 3.7 Sonnet again\n", + "# Claude 4.0 Sonnet again\n", "# Now let's add in streaming back results\n", "# If the streaming looks strange, then please see the note below this cell!\n", "\n", "result = claude.messages.stream(\n", - " model=\"claude-3-7-sonnet-latest\",\n", + " model=\"claude-sonnet-4-20250514\",\n", " max_tokens=200,\n", " temperature=0.7,\n", " system=system_message,\n", @@ -408,12 +408,28 @@ ")\n", "\n", "response = gemini_via_openai_client.chat.completions.create(\n", - " model=\"gemini-2.5-flash-preview-04-17\",\n", + " model=\"gemini-2.5-flash\",\n", " messages=prompts\n", ")\n", "print(response.choices[0].message.content)" ] }, + { + "cell_type": "markdown", + "id": "492f0ff2-8581-4836-bf00-37fddbe120eb", + "metadata": {}, + "source": [ + "# Sidenote:\n", + "\n", + "This alternative approach of using the client library from OpenAI to connect with other models has become extremely popular in recent months.\n", + "\n", + "So much so, that all the models now support this approach - including Anthropic.\n", + "\n", + "You can read more about this approach, with 4 examples, in the first section of this guide:\n", + "\n", + "https://github.com/ed-donner/agents/blob/main/guides/09_ai_apis_and_ollama.ipynb" + ] + }, { "cell_type": "markdown", "id": "33f70c88-7ca9-470b-ad55-d93a57dcc0ab", @@ -583,7 +599,7 @@ "# Have it stream back results in markdown\n", "\n", "stream = openai.chat.completions.create(\n", - " model='gpt-4o-mini',\n", + " model='gpt-4.1-mini',\n", " messages=prompts,\n", " temperature=0.7,\n", " stream=True\n", @@ -634,11 +650,11 @@ "metadata": {}, "outputs": [], "source": [ - "# Let's make a conversation between GPT-4o-mini and Claude-3-haiku\n", + "# Let's make a conversation between GPT-4.1-mini and 
Claude-3.5-haiku\n", "# We're using cheap versions of models so the costs will be minimal\n", "\n", - "gpt_model = \"gpt-4o-mini\"\n", - "claude_model = \"claude-3-haiku-20240307\"\n", + "gpt_model = \"gpt-4.1-mini\"\n", + "claude_model = \"claude-3-5-haiku-latest\"\n", "\n", "gpt_system = \"You are a chatbot who is very argumentative; \\\n", "you disagree with anything in the conversation and you challenge everything, in a snarky way.\"\n", @@ -774,6 +790,19 @@ "\n", "Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.\n", "\n", + "The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.\n", + "\n", + "Something like:\n", + "\n", + "```python\n", + "user_prompt = f\"\"\"\n", + " You are Alex, in conversation with Blake and Charlie.\n", + " The conversation so far is as follows:\n", + " {conversation}\n", + " Now with this, respond with what you would like to say next, as Alex.\n", + " \"\"\"\n", + "```\n", + "\n", "Try doing this yourself before you look at the solutions. 
It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).\n", "\n", "## Additional exercise\n", @@ -824,7 +853,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week2/day2.ipynb b/week2/day2.ipynb index 801bfe0..9954ea7 100644 --- a/week2/day2.ipynb +++ b/week2/day2.ipynb @@ -16,7 +16,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "c44c5494-950d-4d2f-8d4f-b87b57c5b330", "metadata": {}, "outputs": [], @@ -35,7 +35,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "id": "d1715421-cead-400b-99af-986388a97aff", "metadata": {}, "outputs": [], @@ -45,10 +45,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "id": "337d5dfc-0181-4e3b-8ab9-e78e0c3f657b", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n", + "Anthropic API Key exists and begins sk-ant-\n", + "Google API Key exists and begins AIzaSyA5\n" + ] + } + ], "source": [ "# Load environment variables in a file called .env\n", "# Print the key prefixes to help with any debugging\n", @@ -76,7 +86,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "id": "22586021-1795-4929-8079-63f5bb4edd4c", "metadata": {}, "outputs": [], @@ -92,7 +102,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "id": "b16e6021-6dc4-4397-985a-6679d6c8ffd5", "metadata": {}, "outputs": [], @@ -104,7 +114,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "id": "02ef9b69-ef31-427d-86d0-b8c799e1c1b1", "metadata": {}, "outputs": [], @@ -125,10 +135,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "id": "aef7d314-2b13-436b-b02d-8de3b72b193f", "metadata": 
{}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "\"Today's date is October 10, 2023.\"" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# This can reveal the \"training cut off\", or the most recent date in the training data\n", "\n", @@ -145,7 +166,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "id": "bc664b7a-c01d-4fea-a1de-ae22cdd5141a", "metadata": {}, "outputs": [], @@ -159,20 +180,67 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "id": "083ea451-d3a0-4d13-b599-93ed49b975e4", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Shout has been called with input hello\n" + ] + }, + { + "data": { + "text/plain": [ + "'HELLO'" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "shout(\"hello\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "id": "08f1f15a-122e-4502-b112-6ee2817dda32", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7860\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# The simplicty of gradio. This might appear in \"light mode\" - I'll show you how to make this in dark mode later.\n", "\n", @@ -181,10 +249,41 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "id": "c9a359a4-685c-4c99-891c-bb4d1cb7f426", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7861\n", + "* Running on public URL: https://c1f6ab5bdc2722c539.gradio.live\n", + "\n", + "This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Adding share=True means that it can be accessed publically\n", "# A more permanent hosting is available using a platform called Spaces from HuggingFace, which we will touch on next week\n", @@ -195,10 +294,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "id": "cd87533a-ff3a-4188-8998-5bedd5ba2da3", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7862\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Adding inbrowser=True opens up a new browser window automatically\n", "\n", @@ -217,10 +345,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "id": "e8129afa-532b-4b15-b93c-aa9cca23a546", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7863\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Define this variable and then pass js=force_dark_mode when creating the Interface\n", "\n", @@ -238,10 +395,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "id": "3cc67b26-dd5f-406d-88f6-2306ee2950c0", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7865\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Inputs and Outputs\n", "\n", @@ -256,10 +442,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "id": "f235288e-63a2-4341-935b-1441f9be969b", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7866\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# And now - changing the function from \"shout\" to \"message_gpt\"\n", "\n", @@ -274,10 +489,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "id": "af9a3262-e626-4e4b-80b0-aca152405e63", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7867\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Let's use Markdown\n", "# Are you wondering why it makes any difference to set system_message when it's not referred to in the code below it?\n", @@ -297,7 +541,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "id": "88c04ebf-0671-4fea-95c9-bc1565d4bb4f", "metadata": {}, "outputs": [], @@ -324,10 +568,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "id": "0bb1f789-ff11-4cba-ac67-11b815e29d09", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7868\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "view = gr.Interface(\n", " fn=stream_gpt,\n", @@ -340,7 +613,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 20, "id": "bbc8e930-ba2a-4194-8f7c-044659150626", "metadata": {}, "outputs": [], @@ -364,10 +637,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 21, "id": "a0066ffd-196e-4eaf-ad1e-d492958b62af", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7869\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "view = gr.Interface(\n", " fn=stream_claude,\n", @@ -403,7 +705,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 22, "id": "0087623a-4e31-470b-b2e6-d8d16fc7bcf5", "metadata": {}, "outputs": [], @@ -420,10 +722,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 23, "id": "8d8ce810-997c-4b6a-bc4f-1fc847ac8855", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7870\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "view = gr.Interface(\n", " fn=stream_model,\n", @@ -466,7 +797,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "id": "1626eb2e-eee8-4183-bda5-1591b58ae3cf", "metadata": {}, "outputs": [], @@ -494,7 +825,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "id": "c701ec17-ecd5-4000-9f68-34634c8ed49d", "metadata": {}, "outputs": [], @@ -507,12 +838,13 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 28, "id": "5def90e0-4343-4f58-9d4a-0e36e445efa4", "metadata": {}, "outputs": [], "source": [ "def stream_brochure(company_name, url, model):\n", + " yield \"\"\n", " prompt = f\"Please generate a company brochure for {company_name}. Here is their landing page:\\n\"\n", " prompt += Website(url).get_contents()\n", " if model==\"GPT\":\n", @@ -526,10 +858,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 30, "id": "66399365-5d67-4984-9d47-93ed26c0bd3d", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7873\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "view = gr.Interface(\n", " fn=stream_brochure,\n", @@ -568,7 +929,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week2/day3.ipynb b/week2/day3.ipynb index 2d955f5..9f044b7 100644 --- a/week2/day3.ipynb +++ b/week2/day3.ipynb @@ -301,7 +301,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week3/community-contributions/06_meeting_minute_assistant.ipynb b/week3/community-contributions/06_meeting_minute_assistant.ipynb new file mode 100644 index 0000000..ac2fbc0 --- /dev/null +++ b/week3/community-contributions/06_meeting_minute_assistant.ipynb @@ -0,0 +1,450 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "HFOR8SGHPyj3" + }, + "source": [ + "# Meeting Minutes Generator (STT with LLMs)\n", + "---\n", + "\n", + "- 🌍 Task: Generate structured meeting minutes from audio recordings using Speech-to-Text (STT) and Large Language Models\n", + "- 🧠 Models:\n", + " - AUDIO_MODEL: whisper1\n", + " - LLM_MODEL: meta-llama/Meta-Llama-3.1-8B-Instruct\n", + "- 🚀 Tools: Python, Gradio UI, OpenAI / HuggingFace APIs\n", + "- 📤 Output: Structured meeting minutes in Markdown format with real-time streaming\n", + "- 🧑‍💻 Skill Level: Intermediate\n", + "\n", + "🎯 How It Works\n", + "- 1️⃣ Upload a .mp3 meeting recording\n", + "- 2️⃣ Submit the audio to generate meeting minutes in text format\n", + "\n", + "You can download some meetings from this link to test the code:\n", + "[https://www.rmofspringfield.ca/p/meeting-audio-files](https://www.rmofspringfield.ca/p/meeting-audio-files)\n", + "\n", + "\n", + "🛠️ 
Requirements\n", + "- ⚙️ Hardware: ✅ GPU required (model download); Google Colab recommended (T4)\n", + "- 🔑 OpenAI API Key (used for whisper1 transcription)\n", + "- 🔑 Hugging Face Token (for the LLM model)\n", + "\n", + "⚙️ Customizable by user\n", + "- 🤖 Selected model: AUDIO_MODEL / LLM_MODEL\n", + "- 📜 system_prompt: Controls model behavior (concise, accurate, structured output)\n", + "- 💬 user_prompt\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "A_osPeBQNAdv", + "outputId": "11cc73e0-9aad-4f57-e1ae-2d71c4eb0444" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2 gradio" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pL-8yTOlQiOH" + }, + "outputs": [], + "source": [ + "# imports\n", + "import torch\n", + "import threading\n", + "from openai import OpenAI\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextIteratorStreamer, BitsAndBytesConfig\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "AUDIO_MODEL = \"whisper-1\" # OpenAI Whisper API model\n", + "LLM_MODEL = \"meta-llama/Meta-Llama-3.1-8B-Instruct\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "62c2Wbt3P5Ew" + }, + "outputs": [], + "source": [ + "# Google Colab User Data\n", + "# Ensure you have set the following in your Google Colab environment:\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "openai_api_key = userdata.get('OPENAI_API_KEY')" + 
] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "login(hf_token, add_to_git_credential=True)\n", + "openai = OpenAI(api_key=openai_api_key)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "smyocqu_P6yg" + }, + "outputs": [], + "source": [ + "class MeetingAssistant:\n", + " def __init__(self, model_name=LLM_MODEL, audio_model=AUDIO_MODEL):\n", + "\n", + " # Load tokenizer and llm model\n", + " quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + " )\n", + "\n", + " self.audio_model = audio_model\n", + " self.tokenizer = AutoTokenizer.from_pretrained(model_name)\n", + " self.model = AutoModelForCausalLM.from_pretrained(\n", + " model_name,\n", + " device_map=\"auto\",\n", + " quantization_config=quant_config\n", + " )\n", + "\n", + " def transcribe_audio(self, audio_path, progress):\n", + " \"\"\"Transcribes the uploaded audio file using OpenAI Whisper API.\"\"\"\n", + "\n", + " progress(0.3, desc=\"Transcribing audio...\")\n", + "\n", + " try:\n", + " with open(audio_path, \"rb\") as audio_file:\n", + " transcription = openai.audio.transcriptions.create(\n", + " model=self.audio_model,\n", + " file=audio_file,\n", + " response_format=\"text\"\n", + " )\n", + " return transcription\n", + " except Exception as e:\n", + " return f\"Error during transcription: {str(e)}\"\n", + "\n", + " def generate_minutes(self, transcription, progress):\n", + " \"\"\"Generates meeting minutes from the transcript using the Llama model.\"\"\"\n", + " progress(0.6, desc=\"Generating meeting minutes...\")\n", + "\n", + " system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n", + " user_prompt = f\"Below is an extract 
transcript of a meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + "\n", + " inputs = self.tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n", + " streamer = TextIteratorStreamer(self.tokenizer)\n", + "\n", + " thread = threading.Thread(\n", + " target=self.model.generate, kwargs={\n", + " \"input_ids\": inputs,\n", + " \"max_new_tokens\": 2000,\n", + " \"streamer\": streamer\n", + " })\n", + " thread.start()\n", + "\n", + "\n", + " started = False\n", + " # buffer = \"\"\n", + " for new_text in streamer:\n", + " if not started:\n", + " if \"<|start_header_id|>assistant<|end_header_id|>\" in new_text:\n", + " started = True\n", + " new_text = new_text.split(\"<|start_header_id|>assistant<|end_header_id|>\")[-1].strip()\n", + "\n", + " if started:\n", + " if \"<|eot_id|>\" in new_text:\n", + " new_text = new_text.replace(\"<|eot_id|>\", \"\") # Remove the unwanted token\n", + "\n", + " if new_text.strip(): # Only yield non-empty chunks\n", + " yield new_text\n", + "\n", + " def process_meeting(self, audio_file, progress):\n", + " \"\"\"Handles the complete process: transcribes audio and generates minutes.\"\"\"\n", + " progress(0.1, desc=\"Processing audio file...\")\n", + "\n", + " # Check if a file is uploaded\n", + " if audio_file is None:\n", + " return \"Please upload an audio file.\"\n", + "\n", + " try:\n", + " # Check file format\n", + " if not str(audio_file).lower().endswith('.mp3'):\n", + " return \"Please upload an MP3 file.\"\n", + "\n", + " # Get transcription\n", + " transcription = self.transcribe_audio(audio_file, progress)\n", + "\n", + " # Generate minutes\n", + " accumulated_text = \"\"\n", + " minutes = 
self.generate_minutes(transcription, progress)\n", + " for chunk in minutes:\n", + " accumulated_text += chunk # Append new text\n", + " yield accumulated_text # Update Gradio output with full text\n", + "\n", + " except Exception as e:\n", + " # Yield (not return) so the error message is shown in the output\n", + " yield f\"Error processing file: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fyMu9JrBRBGI" + }, + "outputs": [], + "source": [ + "class GradioInterface:\n", + " def __init__(self):\n", + " \"\"\"Initializes the Gradio interface for processing audio files.\"\"\"\n", + " self.assistant = MeetingAssistant()\n", + " self.iface = gr.Interface(\n", + " fn=self.process_audio,\n", + " inputs=gr.Audio(type=\"filepath\", label=\"Upload MP3 File\", format=\"mp3\"),\n", + " outputs=gr.Markdown(label=\"Meeting Minutes\", min_height=60),\n", + " title=\"AI Meeting Assistant\",\n", + " description=\"Upload an audio file to transcribe and generate meeting minutes.\",\n", + " flagging_mode=\"never\"\n", + " )\n", + "\n", + " def process_audio(self, audio_file, progress=gr.Progress()): # Adapter between the UI and the backend.\n", + " \"\"\"Handles user input from Gradio, processes the audio, and streams the meeting minutes.\"\"\"\n", + " response = self.assistant.process_meeting(audio_file, progress)\n", + " for chunk in response:\n", + " yield chunk\n", + "\n", + " def launch(self):\n", + " \"\"\"Launches the Gradio interface.\"\"\"\n", + " self.iface.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "0f705a9046d34fb0a7ab8177a6521b88", + "28e61ba2b56f4d3484dd3ec0eecb12aa", + "ecb3207e61d44ca39c7f8a97546c6686", + "5a4f5291a0b24178a44b1b3c2401a957", + "2472dbee01b149c0ac7efe6eaa5ffd66", + "65c6998bdf7444de85b37468f6b6f42e", + "4323c3fbb5d24b38a920313479bb5c57", + "8af3f0faa5b144efbf1aa443c2839d2c", + "e0f3096904354279a95175d116816262", + 
"0e6ad8796e9e4b868c8507722c9cbf33", + "a32fde87feaf42199de6efe2c94085aa", + "e6c97a25f41044ab89e49d9bf9836de4", + "78562aef2a6a422dba1145306b294823", + "980453816461473fab04be9b6fbf03b5", + "39ce806000744113ae35a617adfe2571", + "2991097320c148b7b0eb81e3ce866df2", + "1a502c41fc3044c2a7c24ad144c209e9", + "91a907a6aa044a3288cbf4deca77eb67", + "7f78522bd56847bfa740ff8146e726d2", + "eb2129af41b24e2ea64a962cb041164c", + "44bdcc01d31a43eeaa7139018c24a83b", + "a21477f95f604f618a3aa2f48c00f7f3", + "0b97c1be64664458abfc0109857d86eb", + "2d0a0c7b89a64b7499ce87e81044d461", + "19acf6f364d8478ea264dde4fd4a1ca1", + "c084a0f7a7c04c90a0fed54c60cc8e79", + "482baf34221048bd8bb4e57cebe44707", + "f9c94568e6b342dda23ccd3be906eec0", + "c6f719622eae45b0b110b377918c2eb2", + "83e68bf6b3994fd5a6eea4ba722864c3", + "01d3ea10affb447ead36c6b4476e7a4c", + "5f93b389541e4ec09aced45d018bc8c1", + "90b2cfcc49804e78b6bebb383e9e6893", + "6f1e02e1c1da4bd9a6d13b3907cd78ae", + "39ea33000ee741c2b9fdf518f657d872", + "1b6204edaebf489e9d3e70f6d722c33a", + "4ae96d4297b84fd1a9022a9c07f7987e", + "2c43cc66619945a18a82cd9437ea60be", + "9e1dc2cc46fa4a4ea6c5d3a50333a02f", + "7c11259f23a6440babc156ac7d4b94c7", + "48c2c5afef3d47e3b9bada3cdc339ec9", + "a0ca5ccd08df4b9191a6400907f239fa", + "72a5d5d5f42e44f197b5829801fee49c", + "37cefb5abc424fca84d5ec4d7b90ff1f", + "afce35f6ca0545d99937a2fca8030cb0", + "fab983c8f0d544a2950d03acd5c39644", + "f1a00e2402d2498292cbc5b767b1b3a9", + "ca4db027b9764a8180617aae1b215f60", + "d96a8910fbc5451083df650386ce6726", + "5bd8e043fbd64c7c9dfb0d871737786d", + "1ca523532aa5433c91df9cb53291ba29", + "0b5340eb370a490ea946a446a9ab2eaa", + "5f822db8ca764ce4b8dd7b99a83c7286", + "37622f8dcbe14ac5ad80c9b09c8c4005", + "1a73f8a262a94cd48d0370bbfd582405", + "99bfb965add64f609f0ef008c443cdb9", + "f1afbbe6e1fa4239af3d79b42f1ffc26", + "2574026b82a040f089bfd202db5ef91d", + "32bc3c434f824c618659693de6bd929a", + "6d19a5bd166443b1acbf261287be09ac", + "58e73b94784645f699f957693aaf6e6e", + 
"4e5c99b156c545b096ba538b1a8c588a", + "bde0c4ad4eea4944b76b34ad9c19bb89", + "b9ab6e3935c646e691c0b5143d47d4b3", + "d4a06441bff74e0e9fe8978014660e90", + "a8010eecf9bc4e8ebfc906489ff54543", + "7e9cdfd05f074c1798b8e3d936f6e7de", + "57b3fe293dda483bb3717d7bd3509cce", + "b080c2078a3f4d93b4d8367755d96272", + "a1a7f450bd8d4917b796c6e13a5be9e1", + "e58ff13df5a04fd5b0496e82384fe439", + "ddf5e150f83944bfb07d8f19a177a50a", + "71939dd7929243e38419abec94b209f9", + "fe17e6c350c54a2d85864bb8d6d50d85", + "4a69b6ea437e4682819ed2d0aef048b8", + "99a8e8cb1ad44d5f999c07cf9a913ef4", + "1dd62e85589f4d60912e79dee1b39a3e", + "0a634fec1cf544af82bd17af73bf417a", + "c7e74bf1bb0f4d57ae95aa4397691e01", + "e144ffa2b708446d940bfdf54741c7ab", + "d65f1f5e345546b380c8e9be9d4dfb9b", + "c2e4a8d768d245529bdac929585136c5", + "a4317d864cc4445d8597ed695c3d4c35", + "5429cebc5a28408985824c4f501e050e", + "dc3100a6c9d946568ee0e297934773ff", + "10ecb81d605a4534a13332371ac9041d", + "570b18cdf9034cc780e838724a70904e", + "1915b872ec55435080456092d9ec8717", + "50f682e340cb4e58899aa7e9ea4741db", + "569ec309d2f3490f94c85bbb3680258b", + "dd319b7cb2b7425e849d6682c7f05390", + "fde199a94cc7488690f81d5add9eb08d", + "38d2575bf5144cf3a02d304444bdc481", + "a06cc81bd0114aefb1e80868791b5be8", + "819319dd58b44e999a13b0bd0e78c88e", + "38f581406490488981c76bc7e7e64005", + "2ef62c2bd93c462eb7c4522c8a156e0d", + "c2cb2d57701a4e55b7bfa3f842a91c09", + "4ba3375ddb584f068d2cdcf060cdfa9c", + "125f8ae49e504b809e5e39f6d940204e", + "2c4abe3e713846deb1b0a9bec03298d4", + "b5f524e95a0d42febd4466bb7a8ad239", + "b5b9a98cb74c409cbdc50a16a3393665", + "ec145da746d74c5e896900f7462b630b", + "aa7d2a5d452b4eb1b537542a9731b94a", + "3d03ed01daeb43d58c4f15bc591043ba", + "17b7619e5bfd4d74b1a3bee1c7643e74", + "4c95edb35fab4e12b027248e93b61883", + "7d7bcf713d3b4531846635fe43fb268e", + "918a8b2b832645d18767bc2e4451d556", + "3b8be978c3af4ef4b7bc16420e6a9f8a", + "25efbd99d0134866940cc3fde41aacf7", + "07e8ec9ab2ba4339a8e2736156f28eab", + 
"7fa4e5411e384c568b37a51bc94b3ee2", + "45fa63f56c814015861d05beb8800e09", + "8a73f7fbe20a4d56a76588db2ac35cea", + "3a18d0771ef74923ba209733afdd0e47", + "9104c3953c254878b2020764569613a9", + "8a237f2467734eddace5d9f9aafce9e7", + "a737c2d22cd84141a2f28721ab69d28d", + "be17f6c8dc9c40efa94c3af82a8efa6a", + "7a74e712b53c4a659dc09766885c12d9", + "9afb4b8b3ddf4a0abb75eedbcf3bc7c1", + "27697007fa6d4736a5eb1e1b0eea2d82", + "e5d4b0e78c3740cd8cf9cda0e4a93972", + "2740d759410d435087c4ae0772d6ad73", + "19f5f9369e3043e0984e160c50e0a32e", + "7b488376756843fe84fcce2e7abb5cd9", + "c2745572ac434351ad9b2c9506d8d0b7", + "25119517d9a043ba91fd3fbefb8377a4", + "4079ed7e7f794755afd5daad3f00a34a", + "14f5761bfbd340198267e3986f4035a0" + ] + }, + "id": "BI91BBEJRB0K", + "outputId": "c4853642-832e-4167-e220-2a2d0fd279a8" + }, + "outputs": [], + "source": [ + "if __name__ == \"__main__\":\n", + " app = GradioInterface()\n", + " app.launch()" + ] + }, + { + "attachments": { + "image-2.png": { + "image/png": "" + }, + "image.png": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABYAAAAGvCAYAAADi5HHeAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAG9WSURBVHhe7d17fM71/8fx57DN6ZrDNmMzZg4zsZGREnJuiKWMQhIlSnIoEqXILzm1r29KSBKVSviKFApNGopRrDLHLbI57HLaZvb747q2XfvsYHMInz3ut9t1a9fr/f58rs9pV5fn3tf741QvKDhd1yDu0GFjCQAAAAAAAABwCyhmLAAAAAAAAAAAzIEAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAtxGLOg2YoqULZmpCqLuxseiyhGjk1PlaGjFMvbyMjQAAACjKnOoFBacbi4URd+iwsQQAAIBrVLdeiPzd7E+SDmj174mGHnYWP7W7w0MuklKOx2jdAauxhwN3tWhWQxZJklWxW2O0z9jFzvH1r7zef1H7KYqc2tq2D8m7tfCuAZpu7HMTFfi8XWe9ZqzR2Db2QDz6XQU9tsDYBQAAAEUUATAAAMAtp6Pm/TBRTcvbn57frrn3DNEsQy9JUpeZipzUXBZJ1o3j1XzYWmMPB8O0fGdv+UuSUrTv/VCFz84t2O2hRZtfULAtKS7Aeq+vJyctUmd/SbErFTbu8+yNzSZqw3sd5aFbMQAuxHm7zjr933K9GeotSUr+9V016V/4ADjf4/4v6TR8pp5q4iGd3K73n43QamMHAAAAFBpTQAAAANxqWrWRf0aIKEmlA9Q03OH5deGius36yMdYlqQ+rVXXHv7eDD61A+RfL0D+tW2BZjZbx+vVyZ9r3YrPtfCl52+h8PffOm+5W/3SeM1aulLrPnlX058rfPirKx33f4mnX5BtG+r7ydPYCAAAgKtCAAwAAHCL6d4lyDbCNX67dsVLkkV12/Qwdrt2Qa01tJaxKA1tFSRXY/Ff01FVrzCH7ealUzXi1amaviG30cs3z7923nK1W3MnT9KIKQv06VUdlisf939Djco38S8PAAAAJsUUEAAAALeU5npn7Uy18LJNvfB+8Rc18l6LZN2uWS2GaK6x+1VNAWGV9bRFlvJS3NJwhU4+6NCnv5b+PFh1XROVcNxdHvbtyLleizo9MkyPPBQif28P6XS8Yrd9rk+mfa7VuQaQV+7fvd8LalrZQ3W7tZZ/aUnHt2v1hgO2xmNbNGZhpKTmGjr6HtvI5cyaJPmp39AeCiwtJe37XJNXJNpeL7y56npZlHI8WpvfH68xa3KZk9eruYYO6a22TQLkU16yxsdo77YDytqNs9r75bta+Fe2pQwKed4k+dToqqGj+6hpXW9ZXJJljY/Rzi8/0PRPtivOsaNXc40d9YRaNAqQR2nbnMx7Ny3VtJlrs+ZwbtVbbzazjdyN2zpVszY6rsBd3Ye8qF7tg1TDyyKdj9eB3yK1bHKEPj1e0ONuX0+fJ9SpbRMFBnjLRVYlxOzTz0snaYLhuDbtNljd65aVzsdo2ayVimvSQyMH9NBdQd5yOR+vfRvm643JWduf0d+nTQ8Fe0k6f1CbV2yznQP7OqKyvQIAAAAKigAYAADgVtJqijZEtJaHUrRv9r0K13xFD2kgyapdk9uq71JD/6sMgHdtsyq4ibd0ZKUGPjApM1zzGTJfa55qIMXv1r7SDVS3fC7rtbTWjAUT1a6WS1YtQ8J2LRwxRNOjHWoF7D/hsyh1DzB2sItZrKCeEdnnMc6sKdv8u9atC/R16YfVK8g4mjRRUa+Ga+AKh4Q6aJiWv9fbFnzmyaq
ocW01cJWx7qCQ582n2xQteqW1PIpnr0uSjq/V5I7j9amusH3nd+vTngM0+YikUYsU3cd28GI/bqqwaRmdGmjCknfVvV4ux14p2jc/XL/fu/yKx71Fv5kaO6i5fHLbDqUo9uPBCpu2O7PSL2K9RraySKe369MvXdV5QAP7zQezWDdOUviwlYpz7J+b05Gaft9wLTTWAQAAUCBMAQEAAHAL6XR/gG0aAR3Q759I+iRasZIki/ybdzT0vloWWQ5E2kaZ+jZXv1ZZ9X7NbElgwq4fleSwRBaLRr6TFeZaf/9eqz/5XKu/i5E1TZJHiHqN6u8wt3DB+8f9GaPY3+OVnLGoNV6xv8fYHn/GZ67xSizN+tvC34SDiv09RnGnM1rc1fTxp9U0s2dzvfOmPVxNjtfmmX0V3rGv5n6XtQ3WAzGK/T1aRxMyF8pV4c5biEYOygh/E7VvaYQmvxSh1RsPypqWotjvltrCX0kjh/ewh78pitv4rmY9PUmffrdbCecl67YVWngk24pzGvC8OtvD3+QDkfr09SGaNft77TueIiVs09cfxhfouMemucqjtKS0RMVttZ3Dzb8m2pdxkX+3wXoyY3lH5UPUa0ADWdISFfd7jGIPZIXvllZ9MqcgOXEwWrG/H5Q1YyOS7f1/j1HsnoM6kbkUAAAACosAGAAA4JYRos7B9htwxWzXXKsk6wrts38b3xJ8v/pl638NUj7XrgOS5K7ALs1tNcsTahrkIileO+e7ycPxhmYZwqeoV5A9zP1xksIfHa0xU6ZqzAt9NfyTGEmSa1Bo1tzCheg/d1xfhT36fdb0B/HfK+zRvrbHuM8zqgWQorhloxXaLlxhj/ZV6AMR2nfe3uRdVy0yujW7X4H2w52waaqeWRijfcdjNOuFqYo6bqu7JHyjsEeHa8LWjIVyU9jz1lz+GfdZi/lG4ZMX69M1izVmWLiad+qmMZkjaVsr0Nc+cvf0Nn06bIHmbl2pyS8MUJuOoRowzjZ6Nj+97vC2z+ds1a75wzV52XbNfX+0wjuGKrz3K1poLdhxj/v4A300f6rG3Beq0Kdt5/CZ/uH6aJs90LUEqWWfjBVkl3xgpSbfF6rQR/sq7MEwLfw1xd7iLf82tp9WzxyusEfna+8Fe9OFffo0YxuejdDqjJUBAACg0AiAAQAAbhXNumUGknG/rbAHcge17Ff76NfyQWrRxaH/NTmoWZG2ANajycPqJ8lnUIhtaoUDkfo0j/lu+zUPsAeK8fr57ewBZNScaPvzrGCvsP2vi9Pb9OnrDoGmdbF+zxgp6+qSNRWBR9nMm90lHc+Y51aSInXCPmrY1c02rjdfhT5v+5SQMSrZr7XmDWmuuhlNxxOz5vXVdh3NmFq3fJAeHNtVTTM23pqofbnOtZzd5viMocsWBT76gvrVyFyB9h0vwAoybdesWY7zO7urRbMAXbRmhLkuslTO6p3Fql3zJzncmM6q6b/Zk3G5yMUtqycAAABuDAJgAACAW0Snbg3s0wgkKu5PD3VqFqJOzULk8WfGDcksCmxvnE7g6sXN2aJ9yRkBpUVPNrHPIRv5Xp433KpROSNA9FaLj37Uti0Oj7Xd7FM/ZAV7he3/r9qYNbWAzx29s6atsPRWzYwRusm5T4ThqPDnba0WroqxTZ/g6q2mT83U0h3rtWHBC+rl5dBNVs1d8r0S0mSbSiJ8nOb98KMiv5iikU3ymC/XIG7O55mjmS31emjkV+u1bd1SvTMgxGGajgKyhGjoa7O1Zt2Pit65Ru+8N1tD27gbewEAAOAWQwAMAABwS3CYRkDuajp6tt58z/4YbbvJm3KdTuAaWD/WzpgUW0B5/zjV85OUvFtRcwo2MtS1tEuOR6Y0x542xr5X6n/DWT/QVxttQ2xdGw3WovfG6c3R47Tos8EKtsh207ilC4xLGVzdeds8ra/6vvS5bS5eSSpukUejHhq7er2WDmmQ2S9uxWj1ffpdbf7Lfk6Ku8hSq7X6zV2jNf/X9co
hrnWlBj48RAu/O2ibc1mSq4efWgydreX/m6h+BcuR5dNmnNasna0nu4XIx8PFNk/w1u+1elvGEGUAAADcqgiAAQAAbgW17s+aFzY5RcnnDY+MgLR8XTXNvGnbtbJq8rfRSpZkube16rra5rBdmF/+mxnUHtTqFk0V1DD3R9jMq+z/r7Jq4bgPtMsqSS7yaNZVnR7pqmBvFyktUbtmvqiBq4zLGFzDedu3ZqrCO96r0AcnaXXGDdWKW1S33/Ma6xDMxm1boGcebqugjsP1aWaQ6yKf0MEaW5Brwbpd018IV/PGoZrw/veKzZjewrej+r1in/85X801dlRX+dhvlhc1OVRBLcIU9vRojYm5wt3xAAAAcNMRAAMAANwCmj6a8ZX8FO2ada+a3GN4zMm4MZjDTduuh4+/d5hLNkX7fnw33xuLrd5nn9dWfgoenjVSNS+F7f/v8tOEOcMUbJHilg3XM6+/q9WfLNbcl/oqvHGo+i7MOOZ5ux7nLe7ASo3pH6oRK+zHytVP/rkFu8cjNfmFcIX/Z7ctLJa7fJoYO+UnUctmj1aYw03xPKqFGDvlIkQ+9pA7OfpzDVzKqF8AAIDbCQEwAADATeenXk0yErYY7fzY2C7p/R9t8/VK8gjqqu7G9qv2udb9Yk+ArdH67n1je3ZRM1dkbodP97e0yHEuWYuf+o0dp7FBV99fSlKKvb/8mutNe5uPpYBzFRRKNzWs5yIpRcmn9mnzsgUaMyVCs9bEONyILT9Xe94aaMZX67Umor86Ze6WRf7uGU9SZEt4LRo6d70il2SfG9i/siXz5nVXmjrDp89sRf6wSG+GOszVW8NbluL2nzOXL9hxd61cI3Ob64aO05rutnmjr49EJduDaZUPUotuthcybgMAAAAKhwAYAADgZqvVR4G+9p8PRutTQ7PNF/r9oP1HryC1zW2E6FVauGyldv0eo33frdBcY6ORdYFmLdxtv7mZu4KHztaan+03dfthqUaGd9WDo/pnhbyF7a8v9HOMfV5cVz91+ihK236O0pp3Hs7scf1EKi5eklzkP2BN9hvUbflR29Yt1dKIYYYbszm4yvPW/f/eUrsaFvm0Gqw3f8h4vfUaea8t6EyO+UYLv5OCn3pbjzWxyFKvh8aujcrcrnce8bOtLyFSX83LeqUcfPtr5pAQWcoHqNP/rVF0xn4t6CEfV9nmOP7sA3vn/I77Cu07kLHOrvZtjtLS/+tqX8/1sl3rd2WMGLeo6WvrFf1zlNZ89qI6GXoCAACg4AiAAQAAbrKsaQSkuF0r8piCwarVv2WEY3lPJ3BVNkao76N9Ff76WmNLrjbPHqABL32u2IzpX13tN3UrLinhoKI2Zp86oXD9rZr++ruKOp5VcXWVVC3o+t38LtN2TZ72vTI3y3iTOg8/1W3VW2NXztfIXAahXu15W/ZSH01eulsJVttN3WyvZ+uRHLNS0wdGaLOkXe8/r1dnfZ953LLdOO/4di0cMTz/+ZqPLNDwYe9q81/2KRsy9qu4suY4XpGxgvyO+0GNeXWB9tnnDrZts5R8JFILn47QroxRu9fBspemavVf9iBaklwlefupaS7HHwAAAAXjVC8oON1YLIy4Q4eNJQAAABQRPjVCFGwfIZtyPEbrDuSXSBamv0VNmwTIo7gkWRW7taDTMhRCrWFa/lFv+ZdOVNTM8VqX1lwNfewBqzzk36a56nrZnscuvDE3qqtbL0T+brafrQe2a7NDAOuo4MctDxY/tbvDQy6SlJagXdsO5hFY53fcs9quahsKwfG4JPy2XVE37qUAAABMjwAYAAAARVKvGWs0to279NfnCn14as5AdNQiRfexzXF7owJgAAAA4EZjCggAAAAUSa7F7aN9fZtoaBPDHANePTSvTQ3bz8m7tTm/uXYBAACAWxgjgAEAAFA0dZmiyEmtlRH9Jp9PVMJBq1y9veVR3h4OpyUq6vVwh7lyAQAAgNtLcU+vyhOMxcKwnjljLAEAAAC3vj/
Wae0fJVSnZjV5lCstV9fScvMsr9Ili0vnExW75Su9+/RQTd7hcFMyAAAA4DbDCGAAAAAAAAAAMCnmAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgDGVfGpEaCmFmMVAAAAAAAAwK3EXAGwV4A61XPPVvKpEaJ2NQqQVHaZqcgfZqqfsX61usxU5M5FmmCs3wImfBalyIiOkqR+k5ZqQ0QPY5d8WDT0vfVa8+l8vTPvBY1zXP56H8PbRa3+WrR6kd5sJknDtPxmn/db+Nq7ffhp5NzlWj46xNhwY3WZqDWrZ+pJYx0AAAAAAOAqmSsA7jtOb77aJ1up3fNT9Nrz92SrIUtsfJwSTifLx9iQp4fVopEUNfFeNek5VRsLvbyD9hO15n8T1ctYv4KxC37U0qHG6k10PFEnjico6byxAfm6yvN/o2S/rhKVEJ+oBGty9k432uF4xR1PVEoB/maVp1vsuAIAAAAAgJvLXAHwlVj81K6euyR3tWgWok5N/PINLn1qhKhTsxC18DI0eAWoUx7L161nW6ap9azyjI5yW74Q25axXY6jnY0jnevWM2y3xU/t7MtYk1Myy7FrFuv9D79XXGblCq/v5SGLq+TqZVt/zuWzyzweuQRadT3c5WZxl4/hGOe3jCx+civvIotH1jbWrWfrW7de1jHJ7Rhd6RhnvK7xfGesK/P4ZqzHfkybKlJfzXlPy6Idl7KoaZMQdWoWoLqO5fyuq0z27TMcg4z9zGv5Ql17GQ+H45PzuLurRRM/+RRyXwp6PnI//7mfmwzGY2DbVvv25bJMzn2yybHdOa4rq37+7F0t/HJ3Zv8WXlnHz/itgsz9sz+M7QXe7gNr9Onsj7XOeoXr1eKndo7PHb79UKjjmnE9GL45AQAAAAAAzMOpXlBwurFYGHGHDhtLN8+oRYpusl1BPSMyS/0i1uspvaXmw9bavho/tq6SU6TkeKtc/fzkum2SwoetVFyXmYocJb1/33AtlEX9ZizVyGZS7EGrPAO8deKTwQqbtlvtRi3Sa90sOnHEKlffALltG6/mI9ZKxmW83eVSPl5fN+yb7av4eS7fZaYiR/vohNUi19NWuQX4Kfnb4WrzUqTD0tKTM9boqSCr4v6R3Gp6K+nTPgqbeTD7ftqneWi4ranCpkkKGqbl7/WW5/EYnXD1lmd5V2nbRDUftjb7co79SteQvyI1/eHRWmjNePXWeud/E9XC10U6n6J9n9yrr2sZjm/mMWygCUveVmf3RMVZLfLxStDX/ftqwl/2VbWfqDWTOsrHVUo+n6IDn9yr8FkNNGHJu+ruHa/YBIt8aki7Xg/XwBWZG6CxC35Ur0YuUnKKkv/5XtMfGK+6n0XprrQYuVVy1Ylf1+jr4uG5HqP8jnG/GWs08k6rYuMlN+9kberbVxOOWNTv/xbp2VauijtolZuvRbHTQjUwbaYiR3noRLK33M7HKGpKjOq+F6KdDftqgoZp+c6ucouXdD5eyR4B8ji+WCN6Rmiz/Rp59k6r4uIlz5oW7ZwYqmdWZe6e1GiYlv+nq9yOxyvJ1Vv+pbZrervRWmg/py2Lx8i1tKtOFPeWf/kYLWw3QNOtBbv2fLrN1NLRdRW3bZ8sdzaXx/kYxW1bqbBx+/I4V8O0fGdruca4yrW4VfL2k+WvdxX22ALF5bMvEwpyPvYMyHn+Nw7T8nfs+27xlk/iSj3zaISiHPbB8RgklfaTz/lIbT4dpIalEpXs5SdtzPi
dyev6y3274x7K/brK+B3qF7Fez1aLl9XiqiSru/xrWLV5WJie2Si1GLVIM7q5atcvVvk3ayDX+BjFbopQ35nbC7/djr9D+Vyv2X/XHN775nkU/Lh2maLI10KUFBOvZHdv6Zu+CpsZ73C0AQAAAACAGRStEcCSpHh93TNUoY+Gq83TK5Xcqo+G1jJ0aTVO/ZpbtfrpUIU9Gq7w17fJs8fzGmuR1s0ZovAWYQp7tK9Cp0VKzbppaC7LNJ8ZraxxtlnyXF6SXKR9Y2zbFr70oDzqt1bT7Itr7qvhCmsXrrBHw9V3Wbz82/TJ0Sc7i8aO6iHPXyYp/MG+CusUps3xLsZOOft1HKzV1ubq90pzhz7f65kHPlesrIqafK/CZzk0GfgMeV6dLd9rRMdwhT0cqv/b6qGWAxzmU/1uvEI/j5FOR+q/99jW5TPkeXX33qbpD9iWGbHUqqZDxqm7w3on95+oqNNS7Of3qskD4/Wpve7jEq0x7cIV9sKC/I9Rrse4o1rc6aKombZl2tzXVxOO2M9pmxStt5/TNp3C9eoK+3rKeythdpjaPDhEY7ZmrDyDVTtfDVObh/sq9IEIHfDtqqfC7etrckAfPWB7nfCF8Woa3j/7or9+oGc6tVWbh/sq7IEIRZVornYOs5q4Hvtc4Z3CFdbxLUVdCFDDbgW/9p7s2VxJK8IVPmy4QqdFKiXtoN4f9/kVzpVFcUvC1ebhcLWZEqmUWk1s5+MK+3LF85Hj/NuuP9fvnrfte8dXFOXeWv2aZa4yU8YxCH3gc8V5N5Hn+nA1fzBcfZfFyCP4fnXK7/rLY7vzuq6yvW5atF5tF66wB8O0+i9v+bfyk9Ra/e731q5p4Ro4bID6Ljsoi3VDtvA3c/kCbHcOuV6v+SjEce3XPkT6KUKhj/ZVWMe2hL8AAAAAAJhU0QuAU5KUcNz+c/Q2xZ52l4fhe+0+zWrI48g2zbJ/pT9uxffae8FP/q0kWV3UYvhMLf1qjSKHN5GltEXuktTEO9sySnNYoaO8lpekC3Ham/GaacmSxV2BDovaBKj7a7O1fPUaLQ31zqOPo3vkXy1Ze79baZ+mwaqLuW5ba0O/3Zq1LV4e1a7uJljt6vnJtVSQRi5ZpOVLFumxO1zkUd0xTM6pXT0/Wfd8nzniePPsbYrz8laQsWMuYrdO1ebMZ/kco1yP8VotXBWv4FfWa8OCF9TP/vV943UgqzVrqovT0drsMDI5O6uSttnbrIv1+xGLLNXs65O32s6xHZN32nvL1buuYa5Wqyz3DtO8BUu14YdhCra4yFI5q/XEwYzzkyzJ3lbAa896KUWupf0kST7l3ZTxZ4D8z1W8jmbsZ5ok+/V6pX0p8PnI1Fr+1SS3Ri9q+ZJFWr7kaQWW9pbP3cZ+DsfAmqJkJSvpeKIkKW5fgqyly8ozn3260nbnxxoXbd8n2++Qm3uAJKuSU1zlVt52zfiXzu2PKzYF2e4ccr1eCyPv47pw/kqduHOctq2erze75Zw+AwAAAAAAmIP5AuDiLtmCDLfSrlJabuMhJclDrnnnNTkVt2jkRys0stFBffp0qJpP26a8IsDcXevyzfXOZ2/rQcv3mtwzVM1XHTB2uP4Mx7MwrAfW6P3/RNger4/WmEkfG7sUgKtK5jYXcJ6u7hhtntZXTTq9oq8OB+mpL5ZqXhdjj6vlJ7fSDk8TtmcdkynjNealdx2CUsnnqflaNDZIR+cPVpv73tKuwl0g+Zo+63upy9va8MVSLR3orqg5b2m1ve2qztUV9sWmMOcjWbGr7Ov7T4SmvTBE05cY+xRcnvtUoO0uqO2a/GWMajy3VGu+WKM3myfq02kLjJ1usjyOa3SEwu4J1f8tOaC6o5Zq6YyOxgUBAAAAAIAJmCsA/vGgEmo018gm9sTQa7BaBCUrduv3WX1c3ORhvzFS3SFtVDctRj87zsEqKW5rjG099qGnPn3uV3Cpg9q74R4FVkvWri8jtOy4VNcraxSl9iXI6ttEQ+3
LZGvLlM/yBRIiH+94Rb36uaKsFnWyjzqUpOS0FFkq+NvC2qBhamgb6Clpi44etyiwfVdbmyUkeyCZ6XvtPeyq4C697YFvA428108Jf27J8wZv+VkXfVAutYPk+dt2rd66Xat/i9Gu320jHjMlZQ/m10UflEvw/Rpq360Ww5vLJz5GUdlC0EQln3d8bpT3McqbRZ3uDZHP8UjNenWINh+wjQrPvA4yrqegEHUqyOpkkVvGtdN+sBp6H9S+5fb1eQep5Xn7Mdkao137DmY7vu3u8FPKrhWa8GOi5OUtt4JcIAW69qRej4QoeWmY2jwcruYtwvSMfWRvgc6VQUH2xSaf85Ht/NuuP/9G3tq11bbOXb/FaHPGaP1Cymuf8t7uK11XeXuyrZ92vR6q0IdD1bzdAE3OdjPAGyQ5RSrlJh+LbPMdN6mR1VbA4+rTpLk6WRK17ONJembVQVl86kpqoCeH91evAl3nAAAAAADgdmCuAHjrW3p/RYpavLde27b8qG2r+8sjMkJjljp28lbnz37Uti0/aukAb+2aNlpzJWlfvJLKN1e//2stbRyvVz+xqsVcW7/lQ2to1+TnNd26Vpu3pajpK2u0YfV6zW/rkjWCd9Urej/Sok4LbMvM72DJZXRvPssXyArt/N1bnVYv14Z1yzXU3zYRgCR9uiFaCXf01/ItP2rbmwGK258RAlk14Y3FOnHnOK3Z8qO2rR0n/0u5vapV0595S7uqDbat4+d31cK6WK8abkKXL4djGPf+8/rvTzX07Lr12rB6jbatXaSxoYb+n/yofS7N9ey6NVoztaNtmW019Ng62zGc0d6qZWPGa1m2hbZr/S/x8n9kjSLXzdeEHEFV3scoT773q/PoKVq+bqmWf7VcbS3f66s5kjZmXE9rbNfT3Cl6JNzbuHROyVLgm/Zr583mSvpkosb8Jdt1tdB2XUWuXq7ILcs18/Hs61v43XYl3z3O1r64jVxyO1VGBbr2pNj4BPmE2/dly4+K/Gq23mxjKdi5MirAvtjkcz6ynf97NP2Zt7Sz+jAt37xcG9b9qOWfvVigqRlyk+c+5bndV7qu8nbghNT0Fdsx3bZlvTYsmZL5x6Mb5rsN2nm6gXqt+1HbtkxUzcMHCnlcvdUp/AW9tm6N1ixZqqXdLYpa8oHUvo8e7zdA3R/J/nIAAAAAAOD25VQvKDjdWCyMuEOHjaWbz+Kndnd4KOm37dlHj3aZqchR0vv3vaK9TQKkfdnbfWqEyP/89qxRhxY/tbvDVUe3xmhfVjf51AhRoGK07kDOmM2nRoiCSx3Q6nxGUOa3fEHUrRciz0SH7cxg8VO7utLebbmNxHRXi2Y1lGw8JrmoWy9EVS9c3fYV9Bhmyq3dK0CdqiVrV677YXOlbczzGOXDp0aIgj0Scr5ubttYAHlvwxXORb7nMW/5X3t+evOrd+XxYbgGrrBKXgHq9fwUjfX/XkE9I2xdrmo/r7Avdnkei1xeM//9KKRc1m+T+3Zf6brKqYcWreuhuBHhGhNt2/Z+r7ytzkkT1XzYWmPn68yiprm8j9macu53rsc1l/fKujX8ZD1QuGsPAAAAAADcuswZAOclMwAeroXGNsDUemjR5qelWeHquzRRsoTozXlvq0XcRDUfcaODShOr9YLWLG6iXc8O0JhtVsmrh+YtHiaPr/sobOZBY28AAAAAAIB/nbmmgLiShIOK3XNQJ4x1wPQ+1xuTv5Hr40u17Yf1ivzqRfnsmqgBhL/X5q/3NOv9A6r7f2sU+cN6RS7oquRPntczhL8AAAAAAOAWUbRGAAMAAAAAAABAEVK0RgADAAAAAAAAQBFCAAwAAAAAAAAAJkUADAAAAAAAAAAmRQAMAAAAAAAAACZFAAwAAAAAAAAAJkUADAAAAAAAAAAmRQAMAAAAAAAAACZFAAwAAAAAAAAAJkUADAAAAAAAAAAmRQAMAAAAAAAAACZFAAwAAAAAAAAAJkUADAAAAAAAAAAmRQAMAAAAAAA
AACZFAAwAAAAAAAAAJkUADAAAAAAAAAAmRQAMAAAAAAAAACZFAAwAAAAAAAAAJuVULyg43VgsjLhDh42lm6q2f1U1rF9HFSu4GZsAAABQhKVeSpPVek479/yhP2OPGpsBAAAAUzJVAFzbv6rualxff/9zSmVLlVRlrwrGLrgF/JNwWj/tiDGWAQAA8tWt411asfZnY7lQXF2dVa+2r/6KPUQIDAAAgCLBVAFwj65tdMZ6XsnJqYr6da+xGbeIx8I7XPM/3gAAQNFzPQJgSXJ1cVbThrX0+coNxiYAAADAdEw1B3DFCm4qWdKV8BcAAAB5Sk5JZbowAAAAFBmmCoAlqUolpn0AAAAwn2v60hoAAABQZJkuAAYAAEDRUaF8eWMpm+rVqhpLAAAAQJFCAAwAAIDbkpeXp96ZNVV16tQyNkmSnhz4mAY91d9YBgAAAIoUAmAAAADclo4fP6H/zp6r114ZnSMEfmbwQDUKDtKbb72drQ4AAAAUNQTADkqWLCkvr0rGMgAAAG5RW36K0qx3sofAzwweqDvq1dWYl19TUpLVuAgAAABQpBTpALh2LX/d1bSxOnRoo/AeD+rBbp318EPdVK7cDbgr9IOjNHdMF2O1kLpo3OxRetBY/rfU66LBj96jsgrW4P+boMGNjR0AAAD+fY4h8NgxIwh/AWjCZ1GK3pnzsXyUseetacJnUYr+bFhWoctMRf4wU/0cOxnkWOZWM2qRIiM6Ziv1i1iv6Cvs1+1hmJbndZ2NWpR1DRb0/OS6TNZrZK5/1CKTHD8AN1qRDoAbNQyWk1Mx7dnzu1Z9vVaffPal4uLijd1yaj9Bq+YMcCgM0NxlE9TOoZJDZV/VruFlrBaSl6oH+KqyY6n9BK1aP1/jHMLYvrOW6YeM7Ws/QatWzNfc2RFauPgzffvpWHUuK93Rc4IWLlumb1ct07efTlDPGlnLS9LgOcu0bEGE5s62PcY9KMnHX/UDfOWuCqoeEKjqFbMvAwAAcLNs+SlKf/4VqyYhd2ruBx8R/gJF3ISeTRXUsKmCxkXKejpS0xvanodNM/a8NU3o2VRBPSOM5XxdzTLKCGF3rte8ax2vdBUWDmuroPuGa6Gx4bbSUfN+6KqEcVnXnGcXeyjbZaYi+0jL7NffMvXOEYLnkNcyo0Kkj5sqqOFiqctM9VNHzesiLbvtjx+Af0ORDoDPnT+nv/bHKj7+mM6fP29svjqeNXVntQqqfn9PDXvqQTXzNHawqdj0QQ1+tqc6VyuTVSzrq5YPD8i5nOd96vvsAPVs6uxQzJJ4wkV3dA6xP+upZmVP6W/HDse26Mkhw9Svd09FHAhWn2HB+m33V4p4rLs6dOmu+XFBane/4wKSlKo/lgzTk0Nsj0lfSdr5teYv+VaHjF3L+qpz36c1+OEQkQkDAICb4ZnBA+Xh7q533p2rF0cOzTEnMADYdNS8H7JGVtrCz6jMUC7juWMto758VC4jMA3LRO9cpAlZTQ4jkR0C1szRnes1r0vGOm3L5fX6GVpkri/rdfJcZtQiRUYMy9xfY1vGMiNbxWtZw7YauCqrOVe5jkrNODYZzzpq3g+5bdt6zfPNXCTPdUn2ADSj7bYY3bpWA+8zHL/jB7VQUr/2QdLGjzOPx4RFkVLlug4dc8prmX6+3tk7juojj1V9s11vAJCXIh0A3xAPP6c3Z/9Hr7fxkqp10Ouzx+YYGXzH4Ah9NCxEZVVTYW9/oBkPl5FUVy+8NUE96znblnt7lO6UpLuf08I5fdWolLPu6Nlc1Q3rkiTLyeM6WauDmkkq+3gTlT17ShZjJ6Pfd+mXs7Zw+Y7KqbLGGTvk4uHnNGNE9+y1sh305rzJ6lRZKtt0sN557b7s7QAAADeY45y/69ZvzDEnMABkWauB99lHBitELfSWgho2VfNha6UuM/VU5ZW2UZwNF+tEqz7ZwjX/PiHaaRzhqWF6MDg6c4Rx0LgENXQIk9sfG29f30p5TLIHo9P62kZ2xkgeff1
s62xoC/IWDmuroIZNNX1jLt9iKN9cHttsrzN9o7fa218nv2UsrbpK04z701Hzungryj5i1XFdeRum5V0S7Ps5XlFeXa88YrjLTD3VKt4+kvUtKTggq81+DII+jnFcwvY6kzz0XcbxXOWhkcaA+BaVGXaPkt7PbyS2l1/hQ20vP2nYSqlPlKJ39pZWfaMWvttvmxHtAG6+IhsAe1fJNpGC+j32iPo/3lu1a9fMVr8aLnE/qN+L/1HEuMn67kKIOj3s2PqgBnQuq43jXtbU/07Wk0v+Uv0HB+tO7dPUIQP0zIyvtXnDXv3t6au7JPV8+D65/DhBI6a+p1dG/pBz9K0kl+J7telETYW1L6OejZ31y8az2TtUvsc2lcO8RRpc6y+tXrJLktTy5fn69qNRuvPkKn28IfsiUhk1G2afIuKjUWppbLar/lRP1fk9Qs9MfU9TX/xMh2q0V09jJwAAgBsktxu+5XZjOADIIUA6Omxt1vNVw9U8M7jbp4TT3qrqEHLGfmwfbbnqoE6U91ANSeriJ8/yzTUyY4TtquEKs6+zRmVp73cZ649QmD3kzWKR57FvCj6C83SkvrIHfgu/i5aC779ikGjd+JZ9ZKrj/tSVR3ljzyuJUFjmVANrdfS4RR75D2SV6nrIErPdvn9rNXCVMezNRRc/eWYuI2nadsVeTWB6TRzn8y341BgZQXzQNOkpw8jlq5nrOOcyEQrLmMpk3/0KbNVb0cb5hgEgD0UyAO7Xt5caNQpWcFB9SZK7e0UlJp7Ugg8Xy0lOxu7X4IjOXnCRJds0EBVlKX5Khw/Yn35zRH+XKquKqqsBEYu0bPoAta0lnbVnuBXLuujshSOOK8jVZxuPqUqbF9Ss1F4tP2NoPLlL787/QO9OfU5derysRfbX3vTGAHUI7alX4u7T668Y/692TlsjbFNEdHhsmjYZWjPUrlhWLgF97XMFd5b72VNKMXYCAAC4AWr4VVNAnVq53vBty09Reufd+WrV4p5sdQDI5Bg0Sobgb6KaFiQkXTVcze2h3HeVJ+acaiFfVoeA+N8UobCGK+UxyT4FRHC03ncMwnPlMHXGzih1dxjMe13V9bjyt1lvuKygNaggU2MYOf6BwC7bXMf26SGuJNdlusxU5M71Wt7XQ3vH2ecDbnJ7jJAGcHMVuQDYvWIFnTt/QV+vXqukJKtKly6txMSTqlzZS51CO+iff07ozJkk42LZ/XVMiZUD1bOs7WnZxwNV/cQRbTX2K/ugalc+rkNRjsVd+vuMl+64297l4VqqcuKItjbuona++zV70ARNXRKrZHvvP0+ck7unfX7fss5yyVyPwcrv9HetEGnHopyjhFPO6Zcdu/RLzKnM0h1NM+brPac/zpyVSymHuYgLYetfR6ST2zXcPlfwk0Om6StjJwAAgBvgwMHDem74mBzhb4YfI7dq7vyPjGUAyFW/iK7y3JgxZcN4RZ029sjNMM2zB74TetqmNLDY53g9cEwKbJ8RBg/TcsP8wIXmECr2ax8k7fqmQEFiTh017wf7dBYNmxbsJmyj+qjp8cWZx2ZZTNb77sIj8fLPDCEdRhfvS5A1ICRrPmDjHLa5mbZdsQ7LaFSI/LMFpraQPreQ3TYFQy7H2D6n8A0bKdtlpiIdX7eLnzxPJ+hAxkhth6lEJvRtLh3bZ1g2+7blt8yEvkHaO66tvjqW1R8ACqLIBcCJJ0/JyclJA/r3VXz83zp82Da6dt78j7Rt+y/68KMlxkVyOrBIS6MqqO/i+Vq4YJGWPVhGX384X5kTL1TroE8WRGjhR33lt/Mzvb1DUtxZna0cqL41tuutD/eq/suLtHD2u/roQWd9FTFfZ3ds158pgRoyL0IL3+6sKsVtq1q35FslNn5By+a9q09mB8vlgsN2ZPOD3hr3siYtOmdsyEUZ+bUfoI8+f1dz583Xp/eX1aaVnxk75WO//j7jrOpN79PZD+fpq8sd9OmntnV98koHY2cAAAAAuEXYR7JOai5
LgO0r9Bmh4cJhK3WilW0Ub/TOPjq6Kl5NR+X/tf1+EX7ysI/8jd4Zpeg+0jL7NBILh7XNHBUcvbOrEsbZp4Cw3wCte4BFTe2jcI03oRvZyiKLfVsyw8HT3mpvf52RreL1nX3Ubr7L5GqtBq6SumeOdi7Azdamfawor4zj9aKqbouWZ5+MOY0d27rKMyM4XzVc72/0znydp4Id1pdxE7g+AZL9PNiOQYTCxiVk7md0lwRNz2U+3YyQPafs03Y48vTNGRpfF6uGq7njNk8K0t5p9lB91XA1/zjrWHfXYtt80wbZti2PZfpFrFd3rdTAVdLC7xIUOCnKNh/wtpzHBwCMnOoFBacbi4URd+iwsXTTDOoXJkn6aOm3xqYbo6yv7vQ5m21krQZH6IeG0bpv3A+qrf3680RWU8WAmlLMfp2UJFVQ7YZl9ffOI1nBscqoekBFJcY41vKrX6vctqGAyvqqdsWT+vOwPXD2rKk7Sx3TLxnP8/FYeAetWPuzsQwAAJCPdIW2DtGa73cYG65Kt453ac7C5cYyAJhfl5mK7HvQYb5jacJn61V10VVMdwAAuC0UuRHA19XZI9nDX0cnsoe/knQyM/yVpFP6M0fwek6Hcg1586pfq9y2oYDOHskKf2Xb34KEvwAAAFfHSccTTqtpw9oqXpyPsABw9Q7qROaI3YxRpraRpQAAc2IE8PXmWVN3VjyZdzAMRgADAICrUt6ttGrX8JGnRzk5l7DPl3UNGAEMAACAooAAGP86AmAAAHA10tMzPramK/PHQnJykiQnhd3fjAAYAAAARQLfnwMAAMBtwcnJyf4opmLFnK7q4eRUTE62FBgAAAAoEgiA8a879g/TYwAAgGvldJUPAAAAoGghAMa/qmmjQF24mGIsAwAAAAAAALgBTDcH8D8Jp1XJo7yxCbeAY/+c0vkLFxUTG6+UlEvGZgAAgH9Nt453MQcwAAAAigTTBcDcXAwAAABXQgAMAACAooIpIAAAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAA3OYmaPXeaMVEzdMT2er9tDAqWjF7oxWzN1IL+2VrzNMTH0YqZm+0Vr9hbLn9EAADAAAAAAAAuI1N0Oq93eX5c5T2G1re+Hqk6v0+XQGBQQp4c6/qjVmuK2W6T3wYqdF3ndDWn63GptsSN4EDAABAkcNN4HCzDeoXZiwBAG5BfF64PbzxdbQ6nJiuJo/X0Oqoqvqi6UB9INmD4Yb6NTBML2f2jVTVpc3Vb2G2VWTpN0/bxnjo28Aw7f8wUg/HNVenjIUzw2GL/VmsvnBY962KABgAAABFDgEwbrZB/cJ0/5GPjWXcwh7cUNZYAmByg/qF8XnhtjOhAAFwtBrtDMoW6ubliRwBsGH9/eZpdevN6vR4XmnyrYEpIAAAAAAAAACYmG1+4Guez7dfVXlammr03mht+7CftHDgLR/+igAYAAAAAAAAgLlNUKfAjFG/Vp34w9heQAsHqklgkAICg/St50jFZATBtzgCYAAAAAAAAAAmNEG/xvqrQ0ZI22+eOvif0FGHQbtvfB2tmKh5eiKrlI8JWmhf18udgxSwLFZunjWMnW45BMAAAAAAAAAAbl/95mnb3mjF7O2umvYpGjJC3Zc7T9fv9WyjdWPGBOr3N3O5aZvFQzUdnr7xdbRi9kZr9F0W1exu+3n1G9ITH1aVp33kb8zeaMV0l77oPMFhyVsTN4EDAABAkcNN4HCzcRO42w83gQOKHm4CB7NgBDAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJiUU72g4HRjsTDiDh02lm6aQf3CtGLtz8YyAAAAkE23jndpzsLlxjLwrxnUL8xYAgDcgvi8ADMgAAYAAECRQwAMAACAooIpIAAAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAA
AAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgAEAAAAAAADApAiAAQAAAAAAAMCkCIABAAAAAAAAwKQIgFVSHlW95VHKWM+utKe3fD1LGstydqsk3yoWORsbAAAAkK+CfY7K+7NaXp/PAAAAAGRxqhcUnG4sFkbcocPG0k0zqF+YVqz92VjOm2drvfhCG3mcPilVdNbuD2do0Z5Lhk4VFPr8c+rknqTEYhVV4veP9driGKWqhOr3HqFBwVLihVJyO7VJk9/+XgmGpQEAAGBUwM9ReX5Wy+vzWcF163iX5ixcbiwDAAAAplOkRwA3f7iN3H6dqxcnz9TLS/9Ro0ceUn1Jcq6m5m3vkIck3dNNnSrs0czxMzVhwpc6Vv8h9a0nqWonPdzopL58+S1NePU/iizdQU90KGF8CQAAABjl8znKI7iFmte0/ZznZ7W8Pp8BAAAAyKEIB8ABCqh6QbG7bCOYU6N2Kta5uupXldSys/qGdVa7qlL9Ot66cGCPYiUpdaeiDjrLv763VM9Hlf/erx9SJemUvvv9H/nWbmJ4DQAAAOSQ5+eomgp9uJN63d8k389qeX4+AwAAAJBDEQ6AveXhZlXinxnPL+h8agmVdpO0fq5efPk/+vSo5Otu0ZkT+zOXOp+cKpfSFtX3rqCkxPjMetL5i3IuyRx0AAAAV5L356j9WjR5sl5+/6d8P6vl9fkMAAAAQE5FOAC+JKmESpQx1iXpkpKSLtp+SpOcnXPedSQ1VSrhmrMOAACA/OX7OeqCVUmpyvezWl6fzwAAAADkVIQD4Hj9fbqUylXNeF5KpZ0v6EzWYBRJ0uETVpUun/WVwtKuzjp/Ol4xx06ptFulzLtWu5UuqfNnT2X2AwAAQO4K9jkq789qeX0+AwAAAJBTEQ6A92vPIWfVDQmQsyTnpg3lf+Gwfj2d/SZwMb8fVonaDVXfWZJzQzX1u6D9O63Sjv064hWgTp6SVEHt61XUkX17jC8CAAAAo3w+R2XdBC7vz2p5fj4DAAAAkENxT6/KE4zFwrCeOWMs3TQhDesqZn+csZynY7EpCnjgIYW3aqL2jcro1yUfauM/l6XWj+rlsPoqtjtSe/44oOTa96vvw3ereduGKrv7c72zOVGXL+7XKbfmerBXG7Vo0VI1kzbonc9idd74IgAAAMguz89RNfXQsz3VsfJZrdl2NO/Pasfy+HxmfJ181K1VVTt27TOWAQAAANNxqhcUnG4sFkbcIdudmW8Fg/qFacXan43lKyghtyoVpYR/7PPN2WtuJTLnAZYkZ7dKctdJHUu6lFmTJJWqIN+yF3TkRFZfAAAAFEBun6NKWeR2KWMeYOXxWc0mz89nBdCt412as3C5sQwAAACYThGeAiLDJSX9bfwHRdZN4DKkJv2
T+z8uLpwi/AUAALgauX2OyrwJXIbcPqvZ5Pn5DAAAAEAmAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAC4jVksZeVdxctYBgAAAAAAkAiAAeD29sLwQXpl7PN6JLybXF1cjM0AAAAAAKCIIwAGgNuYi7Mt9G3VopkmjBuhwLq1jV0AAAAAAEARRgAMALczp6wfK1Qop2HPPKHHej+sUqVKOvYCAAAAAABFlFO9oOB0Y7Ew4g4dNpZumkH9wrRi7c/G8i2tdNkyKluunJxv8Fe3U1NSdPbMGZ0/e87YBOA2NuWNsSrnZjGWZbWe1UeLv9Tu3/YZmwAAkrp1vEtzFi43lgEAAADTIQC+iUqXLSO3ChV0KiFRyRcuGJuvK9dSpVTBw11Jp04RAgMm8tYbY+WWSwCcYfuOXVqydIXOn7+x7zEAirZixYrJx9tLLs4uOnw0XqmpqcYuBdb4ziBV8fLU38dPaMcv0cbm64YAGAAAAEVFcU+vyhOMxcKwnjljLN00IQ3rKmZ/nLF8XXlX8VKPh7qoWLFi+vvYP9naypd308WLydlq+ang6anTiSdvePgrSWmXLik1NVVuFcrrnNVqbAZwndzTrLHatWmhP/86kGcA4uTkpDat7tG9zZsqJma/0tLSjF0KrH3bFnJ1dTWWM3l7V1bze5ro5Kkziv/7uLH5uqlV008PhYUqIfGkziTl/R4TVD9QnUPb6vCReF24cNHYDOAWVr9egM6ePafUS5cya40a1tcTj4Wr58MPqHYtf9UNqKXwhzqrXmAdlSpVUgcOHsm2jit5akBvdQltqzq1/dW4UQN5e1fWjl93G7tl6tC2papUqaTDR+KNTVdUt1ZV7djFtyQAAABgfgTABVSsWDHd3+E+Pdn/EZ1IOKnQjq31zXcblZ5uG0Dt5mbR66+MUtS2nbpwsWChRnkPd506kWAs3zBply6pvIe7rKdPG5sAXKOSJV3Vv2+4Qju2VlWfKrp8+bJi/thv7CY3S1kNfrKvWrW8W9V8vXUmyaqDh44auxXYlQJgSXJ2dlbjRvVVvXpV7YvZr5SUFGOXazbiuSdVq6afmt/dRBcuJucZ+ox9caiqV/NRi+ZNdCLx1A0NpQFcX08N6K3Ek6d07PgJFStWTIOf7Ku29zXX/1av00cff6H1P0Tqxy3b9MPmrbpw4aLu79Banh4V9fveP42rylXjO4PUJbStPlm6Qu+8t1DWs+fUoW1L/X38hP7O5b0irGtHPdC5vfYfOKT9sYeMzVdEAAwAAICigpvAFYB3FS+NffFZtWx+l97+73wtWvyFihUrJlcX58w+SUlW/RgZpT6PPJhtWQDmV9Wnil4Z+7zubNRAknQpLS3XUDagtr9eHTdcAXVqZvYrmUu/683JfqO4BnfU1euvjNQ9zRobu1wzZ/v7oZOTk8If6qJnn3481xvROTuXkCSVKFFCAx/vpccefSizBuD2UKF8OU0YN1zuFSto0pT/aGvUL7qYnPUNqPPnLyhq+05NmfaOavhV0yPh3bItn5cqXp6SpI2bt2b7b0bdUVjXjrq//X1a/32k1n630dgMAAAAwAEBsEGxYsUUUKemOoe2VZdO7dTz4Qc0bsxzij1wWBMmzdAff8Zm6x/UIFADH+8lSVr+v29V1aeKPDwqZutzzcoHqP0jj2vSG6/q2dbGRgA3U9vWzfXSC8+qfDk3OdmTVif7I0OxYsXU7YEOen7oQJUqWSqz381QqmRJPdb7YQ1/7klVKF/O2HzNMvbtjnp19OrLw1XDr5qxi2Q/JpJ0d7MQjX/peXl
VyhnwALj1lCvnphdHDtaZM1ZNnfmeEhJOGrtksp49pxn/eV9B9QNVw8/X2JzD38dPSJJatWgmSRrY/5Fs9QyO4e/ny1ZlawMAAACQEwGwQfu2LTR08OMKDKilOrX95V3FS/+Z/YGWfLZcybl8bfrPvw4osG5tBTUIVGpqqrb/ult+1aoau127JKsuOJdU6YxBx6UqyLdqBZWWJGeLKlf1lkep7IsAuHFKly6lYc8MUI/uXVS8eLHMQNOoQoVyenHEYIV2aC0nJycVK3bzwl9HAbX99dr4kWrVotl1CaSNa3ByclI5N4teGP607u9wn8NrZO/p5CR5elTU+Jee0733NMnWBuDW4uxcQg92vV9/H/tHEe98kG3Ub15SUlK1/vsf1bFdK2NTDjt+idYvO/fokfBuem/W/ynkziBJkl81n8w+XULbEv4CAAAAhcQcwAbdw0L19Zr1WvrlKv308w5tjfpFCYmnsvVxdi6hju3v09rvNurChYsqXbqUqvpU0e49++RdxUtelTy1d9+V57tzq1C+YPPxXkxU7J+XVfu+hip9aL0i90tyqqvHRg9Se8svOlx/kF5u76Lt62OUfUtzKvBrAshT9WpV9eKIwfLx9so1PE1PT9fBw3FKT5dGPveU3N0r5Nrv8uV0/fFnrP6KPWhsKrC2be5VyZJXN41E8eLF1eCOuqpbp6b2xewv8PzluWnXtkWO6SycnJzk5OSkugG1VKdWDe2K/l0d2rXMcSxswXgxBTUIlHcVL0Xv2afLly9n6wPg5gt7oKNKlnSVp4e7Ot3fRl06tbviTdokKT7+uMJ7dNUvO/fo/Pn8b3y749fdmXP+bvzxZ508eUrt27aUq6uLqvtWVdcuHbRx81Z99sX/jIsWGnMAAwAAoKjIfchaEebq4lLoO9P/fewfeVXykCSdOp0k36rexi7XX+pOzVm9X+73PqknGpXSnm9WK/vkFABulKef7CM3S9kcQaaj4sWKachTfVWypGu+/a6Z7T6U16RWTT890LmdsXxd1antr9D721xxc+9s1ECtWtxlLAO4BTz/wgQ9PfSlbI/35y82dsvhYnKyEhNPqap3ZWNTrnb8Eq1Va9Zrxy/R+nL5Gn23fpPat22psK4dtXHzVn2ydIVxEQAAAAD5IAC+DsqWKZ0ZGruVLaOUXKaKuBFSdx/WMecK8ri0R99FXTI2A7hBYv7Yr/T0/KPMtMuXFXvgsNJu9EjWa8iWM/bhj78OaM3aH4zN103G6+zffzDfzU1PT9fly5d14OARYxOA25zFUkZJ1rPGcoGcPpMkSTpyNJ7wFwAAALgKBMAGySkpcnbOmGj3ypycnNQw6A4diftbkuTl5an4Y8eN3W6AErqvdwu579uuPSVC9EhYBWMHADfIh4s+11crv7EHljmD4IzK2/+drw3fR9pquQXG+aWhBVTsGkYXX7iYrEVLvtSMiPf1z4kEY3Oh5LUVaWlpOnfuvKa9PUc7o3/Ps2fa5cs6dfqMJr35H+2PPWRsBnALqVK5UqG+2eDi4iw3S1mdsQe5VxJyZ5C6hLZVyJ1Batu6uXp076KNm7fqjSmz9PyzA9WrR9dCvT4AAABQ1BEAG+zevU8Pdr1fI4Y9pRHDntIzg/opoE5NY7dMjYLvUPlyblq3frMkqXLlSjp2LPvdqq9Zh8F6b9bjauom+T/wf3rvjZ4KbtpHXeueVOQXX9qmgmj7uB7yNC4I4Eb5dt0m/d/Ud3T27FmlpWUf5etkf1y+fFlfLl+t/7zzgS5eTM7R72bICKJ/2blHr06crsifthu7XDeXL6frz/0H9crE6fprf97zHKenp2tX9O+aMGmG4v/+N/6ABuBaPPnEo2rfpoWxnKfmdzfRocNxOe6pkJtBA3prYP9H1Kb1vRrY/xH16N5FP27Zljny958TCbqv5d3q3etB46IAAAAA8sBN4AxiDx7W0fi/dflyuk6ePK1Lly6p58MPqFw5N/3xZ6zS0tKy3QTu+D8Jitq
xS+fOnVetmn5q0bypPvtipS5dSjOuOocC35Bt/3atWrM+67HhNx2P26m1a7Zq73np8uFftMb+85UU+DUBXNGZJKt+/GmbqlXzkadHxcwRaenp6Tp0OE6//f6HJOlEwkn9tHWH6tSuoXJubpn9Lqdf+03g2rdtIVfDzdfyk2Q9q/kLPtWatd9f1+lqHLfj8uXLSk9P11crvtGSz5YrNTVriprOoW2z9v/yZV1KS9PHn3ylFf9be0sE5ACuLPHkKT0SHqYtW7crOTn/9xEnJycNGthHK1Z9q2PH8/8DecidQeoc2lafLF2h9+cvlrt7BflW9da36zZl/nFoz+8xqlK5kpo1bSSLpaz2/BZjXE2BcRM4AAAAFBWMADa4fPmy9sXs19dr1mvV6nX67Iv/adKb/5F/jWqaMG6E6tT2z9Y/OSVFZ84kqXSpkhry1GNa+PHnungxOVsfAOZ14cJFzZq9QJ998T+lpV22TQnh5JTjZmdJ1rN6a8Z7Wv3NetvUEenpcpKT0nP0vDHSJW368We9+vp07f7t+gceGXuRlpampKSzenPabH23wfbNiOxsPS9fvqwTCSf1+htv66efdxg7AbiF7fktRvti/tKo4YNU1aeKsTlTmdKl9czT/ZScnKJd0b8bm3Oo7OWp8xcuauPmrZKkhR9/ofMXLqqyV9ZXnNLT0zVvwSfa8tN2tWrRTD26d3FYAwAAAIDcMAK4AKxnz+nHLdvk7FxC/Xo/LC+vSvL0qKiv12zQ5Yx5PZ2kkydPF+gfOBluxmjcm/GaQFFw8NBRRe/eqwZ31FWZ0qUU+dM2HT4Sb+ymP/46oJg/9iu4QaBKlnTV9xu36Pg/Vz//bn4jgNPT0+Xk5KR/TiTonfcWanPkz7qUduVvJ1yNOwLryNOjonbv2ae3/ztPiSdz/6p3SONgWcqW0ebIKL37/iKdPXvO2AXAbSB6915VrFhefR55UBcvJish4aRSUlIlSa4uLgq5M0iDnuyj5OQUvTt3UebNcvPj5mbRXU0ayXr2nA4dPqpWLZrpzob1tenHn3NMD7Nr915ZLGXVutU9Sr106armDmcEMAAAAIoKp3pBwdc0/Czu0GFj6aYZ1C9MK9b+bCxfV95VvPRg1/u1/Zdo/bztV2NzoVTy8daZk6eUfOGCsemGcC1VSuUqVtA/cTlDKQDXh6uri8qXK6fj/+T/VecyZUqrVEnXAs2JmZ+33hgrNzeLsSxJSku7rG+++0Frvtlww4LfDE5OTqrm66NDh48am7IpXry4qlSupKP2G2cCuL3516imng8/oOrVqioh8ZSSk5NVpXIlJVnPau13G/XDpp9yvwlmHgYN6K1GDevr/IWLKl2qpH7duUdz5i82dsvUtUt7Wa3n9P3GLcamK+rW8S7NWbjcWAYAAABMhwD4JipdtozcKlTQqYTEGx4Cu5YqpQoe7ko6dUrnGXEHmMaUN8aqnEMAnDHq99Dho1qw6HMdO/ZPtv4AcCNUqVxJFktZyT7FS+yBw7p8+erm9Q65M0iVvTx17PgJbf8l2th83RAAAwAAoKggAL7JSpcto7LlysnZxcXYdF2lpqTo7JkzhL+Aybw58SWVL++W+TwlJVXL/7dW32/cUqhRdwBQ1BAAAwAAoKggAAaA29ibk15S+XK2APi33//Qx58u06lTt87c7ABwqyIABgAAQFFRzFgAANw+zpxJ0tmz5/TBws80690FhL8AAAAAACAbAmAAuI1F/He+XntjpqK27zQ2AQAAAAAAEAADwO3s/IWLsjK3NwAAAAAAyAMBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAA
AAAAAmJRTvaDgdGOxMOIOHTaWbppB/cKMJQAAACBXcxYuN5YAAAAA0zFdALxi7c/GMgAAAJBNt453EQADAACgSGAKCAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYAAAAAAAAAAwKQJgAAAAAAAAADApAmAAAAAAAAAAMCkCYACAnJyc9ECndpr82mjNmPKK+j76kFxdXYzdAAAAAADAbYYAGACgZk3vVOfQtvp2/SYtWbpCQQ0C1aVTO2M3AAAAAABwmyEABoDbRI9a0rcPSCVuwDu3b9Uqios7ph82/aTtO3Zpz28xqu7rY+wGAAAAAABuMzcgRgAAXG9B7tKLjaQqZaT/3Hv9Q2AXFxedu3Ah8/m5c+dVsmTJbH0AAAAAAMDt5zpHCACA661+RWlua+nrg1LCBcmzlDS75fUPgQEAAAAAgPkQHwDALayGm/RBGykiWtr8t3RZUv8NUnlX6a27jb0BAAAAAACyIwAGgFvY6WTplShpyR9ZtbOp0oDvpY3xjj0Lz7dqFU18ZZSKFcv/fwXPDOqnpiENjWUAAAAAAHAbyP9f/QCAm+pUsvTNYWNVsqZIKw4Yq4VToXx5eXq6y8XZ2diUTaVKHqpUycNYBgAAAAAAtwECYAAoos4kWSVJpcuU0vkLF1S5koecnJwkSd7eXjp3/rwkqWyZ0jpzJinbsgAAAAAA4PZAAAwAt6B7q0iVShmruWvtI1VwNVavLC7+mC5fvqzKlTy1Z88+ublZ9NILz2jIoMdUr25t7d6zT5ayZVSmTGkdOhxnXBwAAAAAANwGCIAB4BY0pL7U3jd7LfGi9OuJ7DVJGtnQFhgX1qVLl/Tztl/VObSt/vjrgGb+Z67i4o/rctplLfhoqTb8EKnQjq0VF39MR45e44TDAAAAAADgpiAABoDbxM4EadQWY/XafLl8japU8dLwoQN16VKaPlr8hd6b97GOHI1X/8fC1brVPVq05Eulp6cbFwUAAAAAALcBAmAAuAWdvyS5FjdWc1fGWTp3yVgtmLNnz2l6xByVL19OL4x4WhHTX1PEtNf0ytjnVf+OAM2e85EOHjpqXAwAAAAAANwmnOoFBV/TsK64Q7ncnv4mGdQvTCvW/mwsA8Bt5wE/aWxjad8pY0t2ZZ2lcq7SA19LyWnG1oJzdi6hunVqqUO7lqpYsbyWLV+jP/6MlfXsOWNXADCFbh3v0pyFy41lAAAAwHQIgAHgFuXmIgWUN1azS06TdidK1/RGLsmrkockqVPHNvL29tK8BZ9Iko7/k2DoCQDmQAAMAACAooIAGACKuMC6tTXsmSeMZUnS8v+t1Tff/mAsA8BtjwAYAAAARQUBMAAAAIocAmAAAAAUFdwEDgAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgAAAAAAAAATMpUAfDJU0nydC9nLAMAAACZPN3L6eSpJGMZAAAAMCVTBcA79/yh4Hp+hMAAAADIlad7OQXX89POPX8YmwAAAABTcqoXFJxuLBZG3KHDxtJNVdu/qhrWr6OKFdyMTQAAACjiTp5K0s49f+jP2KPGJgAAAMCUTBcAAwAAAAAAAABsTDUFBAAAAAAAAAAgCwEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJiUU72g4HRjsTDiDh02lm46N7dy8vWtIXd3Dzk7uxibAeC2lpqaosTEBB05ckBJSWeMzVelZvFi6uzsrKASxeXm5GRsBoDbXlJ6uqIvpenr1FTtT7tsbAYAAAB
My3QBsLe3rwIDGxjLAGBKe/fuVnz8EWO5UFo7l9Dgkq7GMgCY1rsXk/V96iVjGQAAADAlU00B4eZWjvAXQJESGNhAbm7ljOUCq1m8GOEvgCJncElX1Sxuqo/BAAAAQJ5M9cnX17eGsQQApnct732dnZ2NJQAoEnj/AwAAQFFhqgDY3d3DWAIA07uW976gEsWNJQAoEnj/AwAAQFFhqgCYG74BKIqu5b2PG74BKKp4/wMAAEBRYaoAGAAAAAAAAACQhQAYAAAAAAAAAEyKABgAAAAAAAAATIoAGAAAAAAAAABMigAYAAAAAAAAAEyKABgoMG/Vb9pQTRp4GxsAAAAAAACAWxIBMGDkWVPtevTXqLEj9Mrw/urZKiPwvVvdH+ul/mF3GxYAAEiSS+NH5fvuQgV88olqvT1SHoHGHgAAAACAfxsBMOCowUN6ZeyT6t4qQP6VK8q9RoBa9RiiV/rUN/YEADh6dLpq/3eoKjauI9cq3ipzz8Py+eA71Xzc1uy79CcFr58lT+NyAAAAAIAbigAYyNREQx5posrOFxS74VONf26chk38VOt27dGWtXuMnR3Yp4ZoVFPuxqZqAWqSV5udb4P82wHgduDd6R6VKJ6os1MeUvR9rbSr33Sd2vyNTnwoSSEq7mpcws43ROU6dVCFtiFyMbZlCLxHFTp1ULnGTMEDAAAAAIXlVC8oON1YLIy4Q4eNpZumbdtOxhJQcKFDFNG5mnR4s15/62slGtv1kF75bxNVPr5NQyZ+KUlyb9dfz3cOkLuzvUvSfi2bOVfrTrRQ//Ft1MSrVNbimW1Su+deVfc6VsXsKiX/YIuck2K0YOwCbcvqDRTK+vWrjaUCWWopYywBV8X3k59UseZZnZvzjP6a/4dDy6OquX6oylocSgf/p13hk+U2aqGqPVRHxYvb64l7dGLyk4rfLOnRWar/fIjSDx6Uk6+fvU+KLv7vJcVM3OKwMuDqhVvPGUsAAACA6TACGLBrUq2CnCUlxucW/ubCs7Oe6hwg9+T9+uY/b+n/Pt6pxLI11a5HE0m/6cypU9r57QrNnrNA3+y/ILnV1D0dKzqsoJIC6qUq5tsV+mzlFsJfALe14z/sUZrKqsyghQpatVB+gzrYR/T+oOOTl+icVZJ1j05MeFWHZ30hPTBdvuF1VNy6XccnvKq4pXuU5l5fnkPHqqzDekt4u+jsnFd1eNr/dOG8i0o+MFLV7nLoAAAAAADIFwEwYFfOtYSxlL92AfJ1vqQju7bp7/LVVPnyfh05Jbl5BShAJ7Xsv//R+ztPqmrdRnJPuyBJKlfxjmyrOLJlnmav/Ekbt8ZkqwPA7SZlzpPaP+1bXfgnRU6V6qjcgNcUuGaWPH3jdXZ9omxfN7qo1NXf6tTmP+TRIUgllKJzXw7VsdXfKmHaq0o6IsmvsTwcAt603V/q4Iff6tTSyTq5+6wkb5XkXpwAAAAAUGAEwIDdr6dtIa2za01jU+6KS1IJ+Tbvpf6P9VL/xx5SQ3dJzs5yU4C6D39Zs1/sr65Na8q3gsNUEJku6HziSWMRAG5bF5a+qj+6tNLe8R/qzP6zknuIqoy33wXOwKmEJKUo3ZpRideFf85KclPxPN6GE/5JkCQ5OU4nAQAAAADIFwEwYJe474SSJLnXbKl7HBuqecvX8XmGNNt/jm19S0OeHZP1GLtA2zq3VauaFiXt+lTjX3hDr2+Oly1eBgCTatFBFexvlilr5+jg61uUIsmpfO43brt0PllSWRWvlFEJURnvspKSlLo5e98MlWt4S0pR2j/GFgAAAABAXgiAgQxRK7TujwuSW4B6vjZE/bu2UdfHhmji8Of0/HMt5J7Rz9VN91STtC5GR1Klyo0H6qkODdWkaUPd/1gfda8rya2knCU5l60k/6Zt1L+Zt3IbAwwApuD7qGqOf03VPvlStf5vpLweH6kar9wjF0kpMd9KStDlFEmlvVUqvIMqdgzRqeW7lSKpVKe58gnvIK+3XpJbFSl
9/w6dPJK16uJ1HlC1xzvIY9RcedR3kc7/IescxxcHAAAAAOSnuKdX5QnGYmFYz5wxlm4af//axhJQCBcU+/MxudaqLj/fyvKtWVO1fMqpdMo/2rnxO205fEa+DYPl61lJtd3P6dsfNmjv+cqqF1Bdte6or0bB9RVQxSJZD2jLxnOqdWcdVa5SQ42Ca6riqRNKKmeRc+Kf+vbnw/K/6z4FukuJezfq5wPG7QAK78CBP42lAunhartNF3BNkhIl95oqVSdApQLqqWyTenKt6KJLO5fo6IjlStZ+XareThXq+arUPa1VLrC0zk15SWdLNpMlpL7K3ttaZf0s0t9b9Pfw8TqdJKlBJ1Vq5i2nC2kqeU8nuTWopGIp8Uqa+6KORmfOGwFck89TUo0lAAAAwHSc6gUF2+7LcpXiDh02lm6atm07GUvAVfJW/aaVVOL0Ye38w3GeXltdx3Zqj8Ol79ugoSqXsCr21/1KzKxWVECjanJOzN4XuBHWr19tLBXIUksZYwm4Ji6N71MZLyl19w866zCSV5IUeI8q1HAxtHmrbNv6Kn56j87siM/q++gs1X8+RNoxS3sm/6FyDVyUsnoL0+ngugq3njOWAAAAANMhAAYAEyAAhuk4BsCDlxhbgeuCABgAAABFAXMAAwCAW8+lFKWdtz0AAAAAAFePABgAANx6lo7U3vtaae/IL4wtAAAAAIBCIAAGAAAAAAAAAJMiAAYAAAAAAAAAkyIABgAAAAAAAACTIgAGAAAAAAAAAJMiAAYAAAAAAAAAkzJVAJyammIsAYDpXct7X1J6urEEAEUC738AAAAoKkwVACcmJhhLAGB61/LeF30pzVgCgCKB9z8AAAAUFaYKgI8cOWAsAYDpXct739epqcYSABQJvP8BAACgqDBVAJyUdEZ79+42lgHAtPbu3a2kpDPGcoHtT7usdy8mG8sAYGrvXkzW/rTLxjIAAABgSk71goKvaQK0uEOHjaWbzs2tnHx9a8jd3UPOzi7GZgC4raWmpigxMUFHjhy4pvDXUc3ixdTZ2VlBJYrLzcnJ2AwAt72k9HRFX0rT16mphL8AAAAoUkwZAAMAAAAAAAAATDYFBAAAAAAAAAAgCwEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAA
AAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAAAAAAJgUATAAAAAAAAAAmBQBMAAAAAAAAACYFAEwAAAAADho1uwu3X13M911V1NjEwAAwG2HABgAABQNZTzlX7l0tpJLeW9VLe+crXa9NH5isuYMa2wsX1GZytVzblMZT/n7Vc98eJUxtBn262qVLuMmzyrVjeWr5lLeO+/tzke2Y9foaS2Y+7QKfyRz0aiv/jvrad1trEuSHtCUz8eph7GchzKVG6tTz4fUu2sz+Rdwv/L14Dgtz9jPfLezcJ5veVHLn7aq250pxqYbwFkVGrVV754PqVvL6rr2w1JOnUZP1ZSe1++aLIhy5crp7rubqVmzu3TPPXerXDk3YxcAAIDbCgEwAAAoGjoM0oyhbbOVGvQbrYn9ArPVrpejfx/W0dOpcjE25KutRr85TjMm9pK/Y7nDIM2YMEiDBvTVoAGD9ObcuZozJMi27lz2q6BcS5VRrfp3qUHTdqpZr4n8Ahop+eIFY7er1qDfaL055gnbdj/9rGbMnaspD3oau+WQ37HrNna2xncxVgvo73jF/n1aKc6SFKin3p6sp+obO11Jad09bKoWzHhMneoHKqhJV41/b7KeqmPsdw2ybefVefSOZB0YeVpj211QuqS5T5zTN+Os6nBnqrHrdVJO3V6drTlD2iqofqBa9RitBW/3VYCxm4HxfGZ/fl7Hj/2t40nnszrcID4+PhowoL+Cg4MUFtY1W1u3bt0UHBysAQP6y8fHJ1sbAADA7YAAGAAAwK5M5dxHqmaMZDXWpdLy8vPMNtIxY1TxqZ9WasGKvcoad1laXn7V5e/nrQp5BXsdQlRz/yZtLRmkdn6GtrifNHr8JI0eP1b9R6xXauuu6pbXegrA2aWkagaGqFRpiySpdNlyunDOqqRT/xi7XpNze7+0bfeYF9R/RbwC7m2rCpLkXE5VfctlhbwOI5lzHjs753KqWKG0KrhXl7/jsnJWBV+H45rXuhO3a+miNdqdKqlMOXmVKSOvytUNI6idVcG3uvwN5zWDS4ehGh78t95/cpietZ+PQU9P0od/2NrLVLZtRxmH9WaOhM4xUtvhtc6lZu2v43ba2a7NfK4du/v9UhXd94zebndee08UV++Py+rB9yzq+U5ZlSiero9fPKuPx51Vh2bXOwhuqXvqx2vFkLEaPX6SRgwdpv4vf6oYhx459sF4Po3Plap9a7/U4k0n7Mvbls3r99E44jxjJH1ev9cZnJyc1K5dW7m5ualNm9aqWLFitnZ394pq0+Y+ubm5qV27tnJycsrWDgAAcKsr7ulVeYKxWBjWM2eMJQAAgFtPYEv18j6uTzfYkzpJ3s1C1VS/asXPx9V4SISmPhqkmnWCFfrwvSq1eZN+v+isxkOmaMZjIapZJ0Td+twv34PfKepYxhqqq//kV9Ux9X/6fr8kOav7+AiFl1+v4/eM15iGR7Ti5+OSzwOaEvGMQgP9Vf/ezhrQ1Ut/rvpVf2duiU2nRx9XxT1TtDo5VA/4/aVVO07aGozbfraGWoX7K+m79dpZPed+FYSTk5PKVaykEs5ZMWrxEiV04u9D2fpdC8fjKznrjlb3q3mxaC3adEAK7q+IUQ106H/bbcehywgteLC0Pt3whxoOmpR17KqEKKyJFPW/7Wo65r/qHlhcHtXu0r0Nyyjmm2gdlyRZ9Mir09TD7Wut/e2y1O55zRvbVqU2rNfOC1LLEdM0sNL3Wq1H7K95Tg9MfUatKpWWR/271Mr3opZtLq724QGq3rCtOt1VX/d266Fe9a1asemA0jL3yFndn3hCntsn67/bkzOraampmX3CXpql/i3rK7zz3bqj4nklNXtWUx8NUc2AYLULD1eLtE1aF5MsyVM9Js7Q6M6BqnlHS/W4y1s
u6X8r8n/b9Xe2Y+OpHhMjNLZroGre0Vx9H+8gr0M/KCr+cubrZ3gh5IKmtTyvfaeK65XIUtqTUkJv9TqvVztc0Fmrk174pLQO/FNcHe9K1YCHkqUS0padV0iUC6yy7u56r3xdovXT7jNK1WWlpmZso6d6TJymF0IDVKdBS/V69E6lbY5U3eeyn0+Phk+qveH83v1ShB4qtVzr9tmO7VNt66tTx6Zq1KKzHnuonpJWRerPy5LXg+M077mmKlveX9369FKb+r7y1gFdbPt6Lr/X2be8fPnyaty4kUqUKCFJOnMmSTt37tSRI0f1559/ytPTUy4utt+TYsWK6c8//9LFi4aVAAAA3MIYAQwAAKAgtWokbZ0zVqPHv6pBT07S56cl1emrp0L+1n+HjLWN9FySoLu7OE638JdW/3pade+y15zvV0jNv/TT54Y/kMd9o/GDhmjQmEka/fx72lQ8RJ1aZu8itVWzOvGK3pSqHVF/qWxwy+xfn3cpYx/ZGKhOo9sp4NhebUl07FAwNctfVptqqZrf/rQO7P1FyRezvl5/4bw1W9/ceeruri0VUMDcsMJdQ7X4w9lavGi2Xr/7jBbPXW/sUmArJs/S1tNSzLIh6v38IkVntpzRxn0nVLWO7aC2bOShfw6XUd2WzpKCFFIrVbF7HM/JXr3//ErF6Iy2zhii3pO/sddddPSrsRo05lUNitiqczWD1cxhKSlQvpXO6/gh+7oc5mZ2nLe5kvNejXh6rEbM3qSf3hurPk/brquhq+IV0CpUFSS5hD6hHpV3aMqQsRo95gUNWhuf65QXxn5DV5xXyydyn1ph5X4XffaHq0JqXdJ7z53Xf584J/ey6fp2r7P6N03W7vFn9HrvC/Isl66pi0pp6oJSxlXk5FxL7bo2lpexnsMmzZj9k8p0eE2LP5yq8f2aqar9kLiEPqEepTdpxJBXNXrMC3r9Jw916l49x/l8N8/zm8Xl75UaOvRVjRgyS5suVlfjuyUpUD06emrr7Fc1I2KGhq46pDKnf9KUleVy/702OH36tBYvXqL09HRJ0u7du1WlSmXdfXczNW58p/74409JUnp6uhYvXqLTp3NZCQAAwC2MABgAAEDRWrz6hJqNmK05Yx9SY3uY59KouryKeajzK+M0ZeI4TWzjLWef6gpyWDJ22Q79ExiiTpJcugSrxt6ftCLHt+tT5Rz4gEaMnaw5cweosaW0yrpn7+ESGqK6F1NVpt1D6l1NOlUxUO0d55X1bauJE0Zr/MheanxxjUaPWJTt6/UF4V/usn7omaRl3c5qS1wJXbqUoti9O5SSfEGX0y4p7sBe4yI51X9AT/Xtq96tjQ25O/XzLPV+fIh6Pz1WU9ZI4VNHq1MBw+PCiN2+X+eq1VZjBSmkaoJWr41X1fotJfd68nc+pO2/GpfIzQkd2W4/eWmSypTOJfgsrbLl7T8G3a9BA/rq2THjss0lfeDnL+0jkyXJU836DdWMWRGa085bcisvf0kN6njr3J4t2mF/uZSUHBeNlEu/48uiddTTO9s1mOGci5Ma9rok3SOVKJ8uJUjnjjlp19HiSiknyUMqlZCu9M2S9qargqst8MxXaC8927eXuhmnJMnFuZ8+0LOPD9GID3bI+d6n9d9ZT6ixfR9UJlAjJtp+j54ILCOvalc39/Y/h6OzTati+z06r5RUZ5Wx2C6sCiUzLrDcf68L6/Jlx9HWTP8AAABuPwTAAACg6Cjukm2UZZmSzlKqLVk7/tUkhT85RYvjAjXivWl6tpG904lozZm/yPZ4Z4bGvPGl9jmsQ4lr9FNcLTXr4KxuIdW1L2pTjrlrXTqM1oIR9XTks0ka9OR87cgx0NZZ7ZrW0qm/9uqcJClesXHl1aB1rawu+1eq9+ND1H/oq5oY8Y1ics8L8xXR5pwsLrbQb2BQsl5oclELOyRq/+/b9OdvUUpJLsAN4PZ8oGefHKVJ3xobruDcCf20ZKV2XPRW3ULfeK0Aft2pWOeaanx
3sALO/aV13/+uozWC1bJ5LVXav0ubjP2vSrT2HU5VpRr2+PWnRRo9fpIW783rJmW19NSM19S7zDZNHDFM/Vdfr+k1nOWSS5b5SKcUBfil6Zc/S+ie0W568B2LdsWX0AsdLmrJFlc1GFlOnWZY9MNfznoh5KIGNsiaxiJPK6eo/5Ov6v2Dxoa8nFfspk/1ypAPtKNksFrZf4/OHVif9Xv09hSNeOf6nBGbQ/pwxSHVfXKa/vvmVM24O0Ef2kea5/l77aBcuXLq06d35ty+DRrU1759f2jz5h+1fv0GBQTY/hLj5OSkPn0eVbly5QxrAAAAuLURAAMAgKLh10M66tdYvX3tyVmZtmpXX9r3615Jzqpa01su5w5p08IpWn2wnHz9pJRf/9LxyoG6++IhxR48pNiD8Tp67Iwh4D2j1b/Gq2bIADXw+Utb1+RMZhsEeuvcr2v0+f7zUpnyyjEQ0bmlmgWc1o7PvtRi+2PG2r9UwTgNxDVqVClrNts6FdL0crMLeiuqlC6lpijFYSqIKzl32ngMCqbM3feogeW0jh+WlJpiuxGbs2xzxIZUN3bPxRmdzXPq1a3asb+86narpdT925WSukn7TlVRu0aeOrAnt7DxvJJznqorWr3qJ+nuARpxt/GGbrkJVM3K8do0d6tOpTqraoWsZXYfPqEK9e9RY2fZrj+HNke798arTHA7tbRfM169Gsv/2F/KGKjsaOqCknp5dmlVtFzWlmlJeiw0WRPWlVKlFypo6RYXTWt7Xqt7WFWyRLr6rCmrqdtLGleRi1SdOn3la8Ml5Am9PqxZ1g3e3D1VwSVFZ0/Z96FWPVWIs/8excXr6LHzuZxP4/OCC23lrR0zhunZMS+o99MztPqY8vy9NkpKStLZs7Y/vcgeCHfs2F4tWtyr7t0flMViu1GiJJ09e05JSUmZzwEAAG4HBMAAAKBoOPipZn2dqtCps7X0w9laOreXvH56TzN/kmQJUtiQcfr4vcmaMn2aurlt0tJVkv5YpFeWnFG7qXM1Z9ZULVgUoXEdc47+O/X5Ju2v30w1927UamOjpB2bdinlrqFaMGuqFky9W66GPM2lXYjqJkZrneMoy++3a195wzQQ1+i3xOLZnp+66KTj52/sV9qz5gCeqwVPVtPuOf/V4kRJe7Zrx+lAPfXBbC3+cLj84w7ZRz8bHD6hU+Wb6aknAyUd0sZf41WjR4TmzB6tboYgfd2eQ/KvWV6x2w/Z5wWWguqf075NuaSl2qSte0qr2bCpmjPdNlVBgfz6gcYsPKQGQyO01L5foxud0Ka1uU2fsUlb9nir29ypmjN7msbUUOY+pnw1R4uPNdb4hXO1+MMIjamZmuv+p3w7Q1O2V9GzH8zV4g/nalbbM1r8Rt7Tf8xd7qqmg8tp6pel1LZBqr59MUnLn7Zq9UCrPMukq8//yqrPmrL69pDxrxDXJmXvLv1T7THNWThXiz+craUz7pfL94v04UH7PkRV04gPZmvOrKla+sFkPVVfuZxP43Pjq+TtaILUcoTttRd/GKE5E59Qu1p5/F4bpKena9269Tp16pTWr9+gQ4cOZ2s/fPiI1q/foFOnTmnduvWZcwUDAADcLpzqBQVf0yeYOMMHJAAAgFuaczlV9Smjc3HxOmXIBV3Ke6uq5ZyOHjGOcC0tL7/ySsllmQJzLqeqlaV/cqz739G9doqm33de6w4764GaqdqXWEyDviujmJPZQ+F/l7Mq+HpKx/I/ri7lvVUhNV7H7QlpmcrVVeFivI6ezmehAnFWBV9vuZw8lLnugrMtWyH1hGKP5T9Ctkzl6iprzf01CrwvZTzlXzEll2szfy90uqB7fC/pwx9dteK33G4zd52V8ZS/p7NO5fa74lxOVX1cdOrgiWxht/EYGJ9fWWONfq+rjk54VYuP2a6X0KHjFJb0nvpHROfze507Nzc3DRjQP/P5Bx98qDNnDDd2BAAAuI0QAAMAABQBvQNTtCehuHaduJmBL3ADuD+
kGW8HacfYSVp8JFUq01gjpj8tr9WjNHrl1QW3d93VVMWKFVN6erq2bv3Z2AwAAHBbIQAGAAAAcBtzVtV2fTW0e4gqOadKF09ox4oP9P66+AKN+AUAADC7/wem/e+IJmwbyQAAAABJRU5ErkJggg==" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![image.png](attachment:image.png)\n", + "\n", + "![image-2.png](attachment:image-2.png)" + ], + "outputs": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/week3/community-contributions/Week3-Dataset_Generator-DP.ipynb b/week3/community-contributions/Week3-Dataset_Generator-DP.ipynb new file mode 100644 index 0000000..72c1c84 --- /dev/null +++ b/week3/community-contributions/Week3-Dataset_Generator-DP.ipynb @@ -0,0 +1,381 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "c08309b8-13f0-45bb-a3ea-7b01f05a7346", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import pandas as pd\n", + "import random\n", + "import re\n", + "import subprocess\n", + "import pyarrow as pa\n", + "from typing import List\n", + "import openai\n", + "import anthropic\n", + "from dotenv import load_dotenv\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5efd903-e683-4e7f-8747-2998e23a0751", + "metadata": {}, + "outputs": [], + "source": [ + "# load API\n", + "load_dotenv(override=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce49b86a-53f4-4d4f-a721-0d66d9c1b070", + "metadata": {}, + "outputs": [], + "source": [ + "# 
--- Schema Definition ---\n", + "SCHEMA = [\n", + " (\"Team\", \"TEXT\", '\"Toronto Raptors\"'),\n", + " (\"NAME\", \"TEXT\", '\"Otto Porter Jr.\"'),\n", + " (\"Jersey\", \"TEXT\", '\"10\", or \"NA\" if null'),\n", + " (\"POS\", \"TEXT\", 'One of [\"PF\",\"SF\",\"G\",\"C\",\"SG\",\"F\",\"PG\"]'),\n", + " (\"AGE\", \"INT\", 'integer age in years, e.g., 22'),\n", + " (\"HT\", \"TEXT\", '`6\\' 7\"` or `6\\' 10\"`'),\n", + " (\"WT\", \"TEXT\", '\"232 lbs\"'),\n", + " (\"COLLEGE\", \"TEXT\", '\"Michigan\", or \"--\" if null'),\n", + " (\"SALARY\", \"TEXT\", '\"$9,945,830\", or \"--\" if null')\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93743e57-c2c5-43e5-8fa1-2e242085db07", + "metadata": {}, + "outputs": [], + "source": [ + "# Default schema text for the textbox\n", + "DEFAULT_SCHEMA_TEXT = \"\\n\".join([f\"{i+1}. {col[0]} ({col[1]}) Example: {col[2]}\" for i, col in enumerate(SCHEMA)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87c58595-6fdd-48f5-a253-ccba352cb385", + "metadata": {}, + "outputs": [], + "source": [ + "# Available models\n", + "MODELS = [\n", + " \"gpt-4o\",\n", + " \"claude-3-5-haiku-20241022\", \n", + " \"ollama:llama3.2:latest\"\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08cd9ce2-8685-46b5-95d0-811b8025696f", + "metadata": {}, + "outputs": [], + "source": [ + "# Available file formats\n", + "FILE_FORMATS = [\".csv\", \".tsv\", \".jsonl\", \".parquet\", \".arrow\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13d68c7f-6f49-4efa-b075-f1e7db2ab527", + "metadata": {}, + "outputs": [], + "source": [ + "def get_prompt(n: int, schema_text: str, system_prompt: str) -> str:\n", + " prompt = f\"\"\"\n", + "{system_prompt}\n", + "\n", + "Generate {n} rows of realistic basketball player data in JSONL format, each line a JSON object with the following fields:\n", + "\n", + "{schema_text}\n", + "\n", + "Do NOT repeat column values 
from one row to another.\n", + "\n", + "Only output valid JSONL.\n", + "\"\"\"\n", + " return prompt.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cdc68f1e-4fbe-45dc-aa36-ce5f718ef6ca", + "metadata": {}, + "outputs": [], + "source": [ + "# --- LLM Interface ---\n", + "def query_model(prompt: str, model: str = \"gpt-4o\") -> List[dict]:\n", + " \"\"\"Call OpenAI, Claude, or Ollama\"\"\"\n", + " try:\n", + " if model.lower().startswith(\"gpt\"):\n", + " client = openai.OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n", + " response = client.chat.completions.create(\n", + " model=model,\n", + " messages=[{\"role\": \"user\", \"content\": prompt}],\n", + " temperature=0.7\n", + " )\n", + " content = response.choices[0].message.content\n", + "\n", + " elif model.lower().startswith(\"claude\"):\n", + " client = anthropic.Anthropic(api_key=os.getenv(\"ANTHROPIC_API_KEY\"))\n", + " response = client.messages.create(\n", + " model=model,\n", + " messages=[{\"role\": \"user\", \"content\": prompt}],\n", + " max_tokens=4000,\n", + " temperature=0.7\n", + " )\n", + " content = response.content[0].text\n", + "\n", + " elif model.lower().startswith(\"ollama:\"):\n", + " ollama_model = model.split(\":\", 1)[1]\n", + " result = subprocess.run(\n", + " [\"ollama\", \"run\", ollama_model],\n", + " input=prompt,\n", + " text=True,\n", + " capture_output=True\n", + " )\n", + " if result.returncode != 0:\n", + " raise Exception(f\"Ollama error: {result.stderr}\")\n", + " content = result.stdout\n", + " else:\n", + " raise ValueError(\"Unsupported model. 
Use 'gpt-4o', 'claude-3-5-haiku-20241022', or 'ollama:llama3.2:latest'\")\n", + "\n", + " # Parse JSONL output\n", + " lines = [line.strip() for line in content.strip().splitlines() if line.strip().startswith(\"{\")]\n", + " return [json.loads(line) for line in lines]\n", + " \n", + " except Exception as e:\n", + " raise Exception(f\"Model query failed: {str(e)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29e3f5f5-e99c-429c-bea9-69d554c58c9c", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Output Formatter ---\n", + "def save_dataset(records: List[dict], file_format: str, filename: str):\n", + " df = pd.DataFrame(records)\n", + " if file_format == \".csv\":\n", + " df.to_csv(filename, index=False)\n", + " elif file_format == \".tsv\":\n", + " df.to_csv(filename, sep=\"\\t\", index=False)\n", + " elif file_format == \".jsonl\":\n", + " with open(filename, \"w\") as f:\n", + " for record in records:\n", + " f.write(json.dumps(record) + \"\\n\")\n", + " elif file_format == \".parquet\":\n", + " df.to_parquet(filename, engine=\"pyarrow\", index=False)\n", + " elif file_format == \".arrow\":\n", + " table = pa.Table.from_pandas(df)\n", + " with pa.OSFile(filename, \"wb\") as sink:\n", + " with pa.ipc.new_file(sink, table.schema) as writer:\n", + " writer.write(table)\n", + " else:\n", + " raise ValueError(\"Unsupported file format\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe258e84-66f4-4fe7-99c0-75b24148e147", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Main Generation Function ---\n", + "def generate_dataset(schema_text, system_prompt, model, nr_records, file_format, save_as):\n", + " try:\n", + " # Validation\n", + " if nr_records <= 10:\n", + " return \"❌ Error: Nr_records must be greater than 10.\", None\n", + " \n", + " if file_format not in FILE_FORMATS:\n", + " return \"❌ Error: Invalid file format specified.\", None\n", + " \n", + " if not save_as or save_as.strip() 
== \"\":\n", + " save_as = f\"basketball_dataset{file_format}\"\n", + " elif not save_as.endswith(file_format):\n", + " save_as = save_as + file_format\n", + " \n", + " # Generate prompt\n", + " prompt = get_prompt(nr_records, schema_text, system_prompt)\n", + " \n", + " # Query model\n", + " records = query_model(prompt, model=model)\n", + " \n", + " if not records:\n", + " return \"❌ Error: No valid records generated from the model.\", None\n", + " \n", + " # Save dataset\n", + " save_dataset(records, file_format, save_as)\n", + " \n", + " # Create preview\n", + " df = pd.DataFrame(records)\n", + " preview = df.head(10) # Show first 10 rows\n", + " \n", + " success_message = f\"✅ Dataset generated successfully!\\n📁 Saved to: {save_as}\\n📊 Generated {len(records)} records\"\n", + " \n", + " return success_message, preview\n", + " \n", + " except Exception as e:\n", + " return f\"❌ Error: {str(e)}\", None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2405a9d-b4cd-43d9-82f6-ff3512b4541f", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Gradio Interface ---\n", + "def create_interface():\n", + " with gr.Blocks(title=\"Dataset Generator\", theme=gr.themes.Soft()) as interface:\n", + " gr.Markdown(\"# Dataset Generator\")\n", + " gr.Markdown(\"Generate realistic datasets using AI models\")\n", + " \n", + " with gr.Row():\n", + " with gr.Column(scale=2):\n", + " schema_input = gr.Textbox(\n", + " label=\"Schema\",\n", + " value=DEFAULT_SCHEMA_TEXT,\n", + " lines=15,\n", + " placeholder=\"Define your dataset schema here...\"\n", + " )\n", + " \n", + " system_prompt_input = gr.Textbox(\n", + " label=\"Prompt\",\n", + " value=\"You are a helpful assistant that generates realistic basketball player data.\",\n", + " lines=1,\n", + " placeholder=\"Enter system prompt for the model...\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " model_dropdown = gr.Dropdown(\n", + " label=\"Model\",\n", + " choices=MODELS,\n", + " value=MODELS[1], # 
Default to Claude\n", + " interactive=True\n", + " )\n", + " \n", + " nr_records_input = gr.Number(\n", + " label=\"Nr. records\",\n", + " value=25,\n", + " minimum=11,\n", + " maximum=1000,\n", + " step=1\n", + " )\n", + " \n", + " with gr.Row():\n", + " file_format_dropdown = gr.Dropdown(\n", + " label=\"File format\",\n", + " choices=FILE_FORMATS,\n", + " value=\".csv\",\n", + " interactive=True\n", + " )\n", + " \n", + " save_as_input = gr.Textbox(\n", + " label=\"Save as\",\n", + " value=\"basketball_dataset\",\n", + " placeholder=\"Enter filename (extension will be added automatically)\"\n", + " )\n", + " \n", + " generate_btn = gr.Button(\"🚀 Generate\", variant=\"primary\", size=\"lg\")\n", + " \n", + " with gr.Column(scale=1):\n", + " output_status = gr.Textbox(\n", + " label=\"Status\",\n", + " lines=4,\n", + " interactive=False\n", + " )\n", + " \n", + " output_preview = gr.Dataframe(\n", + " label=\"Preview (First 10 rows)\",\n", + " interactive=False,\n", + " wrap=True\n", + " )\n", + " \n", + " # Connect the generate button\n", + " generate_btn.click(\n", + " fn=generate_dataset,\n", + " inputs=[\n", + " schema_input,\n", + " system_prompt_input, \n", + " model_dropdown,\n", + " nr_records_input,\n", + " file_format_dropdown,\n", + " save_as_input\n", + " ],\n", + " outputs=[output_status, output_preview]\n", + " )\n", + " \n", + " gr.Markdown(\"\"\"\n", + " ### 📝 Instructions:\n", + " 1. **Schema**: Define the structure of your dataset (pre-filled with basketball player schema)\n", + " 2. **Prompt**: System prompt to guide the AI model\n", + " 3. **Model**: Choose between GPT, Claude, or Ollama models\n", + " 4. **Nr. records**: Number of records to generate (minimum 11)\n", + " 5. **File format**: Choose output format (.csv, .tsv, .jsonl, .parquet, .arrow)\n", + " 6. **Save as**: Filename (extension added automatically)\n", + " 7. 
Click **Generate** to create your dataset\n", + " \n", + " ### 🔧 Requirements:\n", + " - Set up your API keys in `.env` file (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)\n", + " - For Ollama models, ensure Ollama is installed and running locally\n", + " \"\"\")\n", + " \n", + " return interface" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "50fd2b91-2578-4224-b9dd-e28caf6a0a85", + "metadata": {}, + "outputs": [], + "source": [ + "interface = create_interface()\n", + "interface.launch(inbrowser=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week3/community-contributions/Week3_Exercise_Data_Generator.ipynb b/week3/community-contributions/Week3_Exercise_Data_Generator.ipynb new file mode 100644 index 0000000..583010c --- /dev/null +++ b/week3/community-contributions/Week3_Exercise_Data_Generator.ipynb @@ -0,0 +1,551 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "GD5Omr5EfWgb" + }, + "source": [ + "# Data Generator\n", + "\n", + "Generate synthetic data when given a schema, business problem description, model, number of records, file name, file type, and environment\n", + "\n", + "# Available models\n", + " Model API:\n", + "\n", + " 1. gpt-4o-mini\n", + " 2. claude-3-haiku-20240307\n", + " 3. gemini-2.0-flash\n", + " 4. deepseek-chat\n", + "\n", + " HuggingFace API:\n", + "\n", + " 5. 
meta-llama/Meta-Llama-3.1-8B-Instruct\n", + "\n", + "\n", + "# Available environment\n", + "\n", + "Colab: set up HF token and API keys in the Colab secrets section\n", + "\n", + "Local: set up HF token and API keys in a .env file\n", + "\n", + "\n", + "\n", + "### *** This project builds on the idea in 'week3/community-contributions/Week3-Dataset_Generator-DP'. Really appreciate it! It extends that work to run either on Colab or locally, and to integrate the HuggingFace API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4FiCnE0MmU56" + }, + "outputs": [], + "source": [ + "!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124\n", + "!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0\n", + "!pip install anthropic python-dotenv pyarrow" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeyKw5guoH3r" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "from huggingface_hub import login\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "from bs4 import BeautifulSoup\n", + "from typing import List\n", + "import google.generativeai\n", + "import anthropic\n", + "from itertools import chain\n", + "from dotenv import load_dotenv\n", + "import gradio as gr\n", + "import json\n", + "import pandas as pd\n", + "import random\n", + "import re\n", + "import subprocess\n", + "import pyarrow as pa\n", + "import torch\n", + "import gc" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7UyjFdRZoIAS" + }, + "outputs": [], + "source": [ + "# --- Schema Definition ---\n", + "SCHEMA = [\n", + " (\"Name\", \"TEXT\", '\"Northern Cafe\"'),\n", + 
" (\"Location\", \"TEXT\", '\"2904 S Figueroa St, Los Angeles, CA 90007\"'),\n", + " (\"Type\", \"TEXT\", 'One of [\"Chinese\",\"Mexican\",\"French\",\"Korean\",\"Italian\"] or other potential types'),\n", + " (\"Average Price\", \"TEXT\", '\"$30\", or \"--\" if unknown'),\n", + " (\"History/Age\", \"INT\", 'integer age of restaurant, e.g., 7'),\n", + " (\"Menu\", \"Array\", '[\"Beef Noodle\", \"Fried Rice\", \"Dumpling\", ...]'),\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jXcTQATLoICV" + }, + "outputs": [], + "source": [ + "# Default schema text for the textbox\n", + "DEFAULT_SCHEMA_TEXT = \"\\n\".join([f\"{i+1}. {col[0]} ({col[1]}) Example: {col[2]}\" for i, col in enumerate(SCHEMA)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4Irf5JV3oIEe" + }, + "outputs": [], + "source": [ + "# Available models\n", + "MODELS = [\n", + " \"gpt-4o-mini\",\n", + " \"claude-3-haiku-20240307\",\n", + " \"gemini-2.0-flash\",\n", + " \"deepseek-chat\",\n", + " \"meta-llama/Meta-Llama-3.1-8B-Instruct\"\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JJ6r2SH9oIGf" + }, + "outputs": [], + "source": [ + "# Available file formats\n", + "FILE_FORMATS = [\".csv\", \".tsv\", \".jsonl\", \".parquet\", \".arrow\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B98j45E3vq5g" + }, + "outputs": [], + "source": [ + "system_prompt = \"\"\"You are a helpful assistant whose main purpose is to generate datasets for a given business problem based on given schema.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lsX16cWfwf6x" + }, + "outputs": [], + "source": [ + "def get_env_info(env):\n", + " try:\n", + " global hf_token, openai_api_key, anthropic_api_key, google_api_key, deepseek_api_key\n", + " if env == \"Colab\":\n", + " # Colab environment\n", + " from google.colab 
import drive\n", + " from google.colab import userdata\n", + " hf_token = userdata.get('HF_TOKEN')\n", + " openai_api_key = userdata.get('OPENAI_API_KEY')\n", + " anthropic_api_key = userdata.get('ANTHROPIC_API_KEY')\n", + " google_api_key = userdata.get('GOOGLE_API_KEY')\n", + " deepseek_api_key = userdata.get('DEEPSEEK_API_KEY')\n", + " elif env == \"Local\":\n", + " # Local environment\n", + " load_dotenv(override=True)\n", + " hf_token = os.getenv('HF_TOKEN')\n", + " openai_api_key = os.getenv('OPENAI_API_KEY')\n", + " anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + " google_api_key = os.getenv('GOOGLE_API_KEY')\n", + " deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')\n", + " except Exception as e:\n", + " raise Exception(f\"Please check your environment: {str(e)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2gLUFAwGv29Q" + }, + "outputs": [], + "source": [ + "def get_prompt(schema_text, business_problem, nr_records):\n", + " prompt = f\"\"\"\n", + " The problem is: {business_problem}\n", + "\n", + " Generate {nr_records} rows of data in JSONL format, each line a JSON object with the following fields:\n", + "\n", + " {schema_text}\n", + "\n", + " Do NOT repeat column values from one row to another.\n", + "\n", + " Only output valid JSONL.\n", + " \"\"\"\n", + " return prompt.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YZe1FVH8wf84" + }, + "outputs": [], + "source": [ + "# --- LLM Interface ---\n", + "def query(user_prompt, model):\n", + " try:\n", + " if \"gpt\" in model.lower():\n", + " client = OpenAI(api_key=openai_api_key)\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + " response = client.chat.completions.create(\n", + " model=model,\n", + " messages=messages,\n", + " temperature=0.7\n", + " )\n", + " content = response.choices[0].message.content\n", 
+ "\n", + " elif \"claude\" in model.lower():\n", + " client = anthropic.Anthropic(api_key=anthropic_api_key)\n", + " response = client.messages.create(\n", + " model=model,\n", + " messages=[{\"role\": \"user\", \"content\": user_prompt}],\n", + " max_tokens=4000,\n", + " temperature=0.7,\n", + " system=system_prompt\n", + " )\n", + " content = response.content[0].text\n", + " elif \"gemini\" in model.lower():\n", + " client = OpenAI(\n", + " api_key=google_api_key,\n", + " base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n", + " )\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + " response = client.chat.completions.create(\n", + " model=model,\n", + " messages=messages,\n", + " temperature=0.7\n", + " )\n", + " content = response.choices[0].message.content\n", + "\n", + " elif \"deepseek\" in model.lower():\n", + " client = OpenAI(\n", + " api_key=deepseek_api_key,\n", + " base_url=\"https://api.deepseek.com\"\n", + " )\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + " response = client.chat.completions.create(\n", + " model=model,\n", + " messages=messages,\n", + " temperature=0.7\n", + " )\n", + " content = response.choices[0].message.content\n", + "\n", + " elif \"llama\" in model.lower():\n", + " global tokenizer, inputs, llama_model, outputs\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + "\n", + " login(hf_token, add_to_git_credential=True)\n", + " quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + " )\n", + "\n", + " tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)\n", 
+ " tokenizer.pad_token = tokenizer.eos_token\n", + " inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n", + " if llama_model is None:\n", + " llama_model = AutoModelForCausalLM.from_pretrained(model, device_map=\"auto\", quantization_config=quant_config)\n", + " outputs = llama_model.generate(inputs, max_new_tokens=4000)\n", + "\n", + " _, _, after = tokenizer.decode(outputs[0]).partition(\"assistant<|end_header_id|>\")\n", + " content = after.strip()\n", + " else:\n", + " raise ValueError(f\"Unsupported model. Use one of {MODELS}\")\n", + "\n", + " # Parse JSONL output\n", + " lines = [line.strip() for line in content.strip().splitlines() if line.strip().startswith(\"{\")]\n", + " return [json.loads(line) for line in lines]\n", + "\n", + " except Exception as e:\n", + " raise Exception(f\"Model query failed: {str(e)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4WUj-XqM5IYT" + }, + "outputs": [], + "source": [ + "# --- Output Formatter ---\n", + "def save_dataset(records, file_format, filename):\n", + " df = pd.DataFrame(records)\n", + " if file_format == \".csv\":\n", + " df.to_csv(filename, index=False)\n", + " elif file_format == \".tsv\":\n", + " df.to_csv(filename, sep=\"\\t\", index=False)\n", + " elif file_format == \".jsonl\":\n", + " with open(filename, \"w\") as f:\n", + " for record in records:\n", + " f.write(json.dumps(record) + \"\\n\")\n", + " elif file_format == \".parquet\":\n", + " df.to_parquet(filename, engine=\"pyarrow\", index=False)\n", + " elif file_format == \".arrow\":\n", + " table = pa.Table.from_pandas(df)\n", + " with pa.OSFile(filename, \"wb\") as sink:\n", + " with pa.ipc.new_file(sink, table.schema) as writer:\n", + " writer.write(table)\n", + " else:\n", + " raise ValueError(\"Unsupported file format\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WenbNqrpwf-_" + }, + "outputs": [], + "source": [ + "# --- 
Main Generation Function ---\n", + "def generate_dataset(schema_text, business_problem, model, nr_records, file_format, save_as, env):\n", + " try:\n", + " # Validation\n", + " if nr_records <= 10:\n", + " return \"❌ Error: Number of records must be greater than 10.\", None\n", + " if nr_records > 1000:\n", + " return \"❌ Error: Number of records must be less than or equal to 1000.\", None\n", + "\n", + " if file_format not in FILE_FORMATS:\n", + " return \"❌ Error: Invalid file format.\", None\n", + "\n", + " if not save_as or save_as.strip() == \"\":\n", + " save_as = f\"default{file_format}\"\n", + " elif not save_as.endswith(file_format):\n", + " save_as = save_as + file_format\n", + "\n", + " # Load env\n", + " get_env_info(env)\n", + "\n", + " # Generate prompt\n", + " user_prompt = get_prompt(schema_text, business_problem, nr_records)\n", + "\n", + " # Query model\n", + " records = query(user_prompt, model)\n", + "\n", + " if not records:\n", + " return \"❌ Error: No valid records generated from the model.\", None\n", + "\n", + " # Save dataset\n", + " save_dataset(records, file_format, save_as)\n", + "\n", + " # Create preview\n", + " df = pd.DataFrame(records)\n", + " preview = df.head(10) # Show first 10 rows\n", + "\n", + " success_message = f\"✅ Generated {len(records)} records successfully!\\n📁 Saved to: {save_as}\\n📊 \"\n", + "\n", + " return success_message, preview\n", + "\n", + " except Exception as e:\n", + " return f\"❌ Error: {str(e)}\", None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pHiP8ky8wgEb" + }, + "outputs": [], + "source": [ + "# --- Gradio Interface ---\n", + "\n", + "with gr.Blocks(title=\"Dataset Generator\", theme=gr.themes.Citrus()) as interface:\n", + " hf_token = None\n", + " openai_api_key = None\n", + " anthropic_api_key = None\n", + " google_api_key = None\n", + " deepseek_api_key = None\n", + " tokenizer = None\n", + " inputs = None\n", + " llama_model = None\n", + " outputs = 
None\n", + "\n", + " gr.Markdown(\"# Dataset Generator\")\n", + " gr.Markdown(\"Generate synthetic datasets using AI models\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=2):\n", + " schema_input = gr.Textbox(\n", + " label=\"Schema\",\n", + " value=DEFAULT_SCHEMA_TEXT,\n", + " lines=15,\n", + " placeholder=\"Define your dataset schema here... Please follow this format: Field_Name, Field_Type, Field Example\"\n", + " )\n", + "\n", + " business_problem_input = gr.Textbox(\n", + " label=\"Business Problem\",\n", + " value=\"I want to generate restaurant records\",\n", + " lines=1,\n", + " placeholder=\"Enter business problem description for the model...\"\n", + " )\n", + "\n", + " with gr.Row():\n", + " model_dropdown = gr.Dropdown(\n", + " label=\"Model\",\n", + " choices=MODELS,\n", + " value=MODELS[0],\n", + " interactive=True\n", + " )\n", + "\n", + " nr_records_input = gr.Number(\n", + " label=\"Number of records\",\n", + " value=27,\n", + " minimum=11,\n", + " maximum=1000,\n", + " step=1\n", + " )\n", + "\n", + " with gr.Row():\n", + " save_as_input = gr.Textbox(\n", + " label=\"Save as\",\n", + " value=\"restaurant_dataset\",\n", + " placeholder=\"Enter filename (extension will be added automatically)\"\n", + " )\n", + "\n", + " file_format_dropdown = gr.Dropdown(\n", + " label=\"File format\",\n", + " choices=FILE_FORMATS,\n", + " value=FILE_FORMATS[0],\n", + " interactive=True\n", + " )\n", + "\n", + " env_dropdown = gr.Dropdown(\n", + " label=\"Environment\",\n", + " choices=[\"Colab\", \"Local\"],\n", + " value=\"Colab\",\n", + " interactive=True\n", + " )\n", + "\n", + "\n", + "\n", + " generate_btn = gr.Button(\"🚀 Generate\", variant=\"secondary\", size=\"lg\")\n", + "\n", + " with gr.Column(scale=1):\n", + " output_status = gr.Textbox(\n", + " label=\"Status\",\n", + " lines=4,\n", + " interactive=False\n", + " )\n", + "\n", + " output_preview = gr.Dataframe(\n", + " label=\"Preview (First 10 rows)\",\n", + " interactive=False,\n", + " 
wrap=True\n", + " )\n", + "\n", + " # Connect the generate button\n", + " generate_btn.click(\n", + " fn=generate_dataset,\n", + " inputs=[\n", + " schema_input,\n", + " business_problem_input,\n", + " model_dropdown,\n", + " nr_records_input,\n", + " file_format_dropdown,\n", + " save_as_input,\n", + " env_dropdown\n", + " ],\n", + " outputs=[output_status, output_preview]\n", + " )\n", + "\n", + " gr.Markdown(\"\"\"\n", + " ### 📝 Instructions:\n", + " 1. **Schema**: Define the structure of your dataset (pre-filled with restaurant schema)\n", + " 2. **Business problem**: User prompt to guide the AI model\n", + " 3. **Model**: Choose between GPT, Claude, Gemini, DeepSeek or Llama models\n", + " 4. **Number of records**: Number of records to generate (minimum 11)\n", + " 5. **File format**: Choose output format (.csv, .tsv, .jsonl, .parquet, .arrow)\n", + " 6. **Save as**: Filename (extension added automatically)\n", + " 7. Click **Generate** to create your dataset\n", + "\n", + " ### 🔧 Requirements:\n", + " - For local mode, set up HF token and API keys in `.env` file (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `DEEPSEEK_API_KEY`, `HF_TOKEN`)\n", + " - For colab mode, set up HF token and API keys in Colab secret section (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `DEEPSEEK_API_KEY`, `HF_TOKEN`)\n", + " \"\"\")\n", + "\n", + "interface.launch(debug=True)\n", + "\n", + "del tokenizer, inputs, llama_model, outputs\n", + "gc.collect()\n", + "torch.cuda.empty_cache()" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + 
"nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week3/community-contributions/Week_3_Day_5_Meeting_Minutes_product_with_Gradio.ipynb b/week3/community-contributions/Week_3_Day_5_Meeting_Minutes_product_with_Gradio.ipynb new file mode 100644 index 0000000..3428e62 --- /dev/null +++ b/week3/community-contributions/Week_3_Day_5_Meeting_Minutes_product_with_Gradio.ipynb @@ -0,0 +1,523 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "It89APiAtTUF" + }, + "source": [ + "# Create meeting minutes from an Audio file\n", + "\n", + "I downloaded some Denver City Council meeting minutes and selected a portion of the meeting for us to transcribe. You can download it here: \n", + "https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n", + "\n", + "If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).\n", + "\n", + "The goal of this product is to use the Audio to generate meeting minutes, including actions.\n", + "\n", + "For this project, you can either use the Denver meeting minutes, or you can record something of your own!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sJPSCwPX3MOV" + }, + "source": [ + "## Again - please note: 2 important pro-tips for using Colab:\n", + "\n", + "**Pro-tip 1:**\n", + "\n", + "The top of every colab has some pip installs. 
You may receive errors from pip when you run this, such as:\n", + "\n", + "> gcsfs 2025.3.2 requires fsspec==2025.3.2, but you have fsspec 2025.3.0 which is incompatible.\n", + "\n", + "These pip compatibility errors can be safely ignored; and while it's tempting to try to fix them by changing version numbers, that will actually introduce real problems!\n", + "\n", + "**Pro-tip 2:**\n", + "\n", + "In the middle of running a Colab, you might get an error like this:\n", + "\n", + "> Runtime error: CUDA is required but not available for bitsandbytes. Please consider installing [...]\n", + "\n", + "This is a super-misleading error message! Please don't try changing versions of packages...\n", + "\n", + "This actually happens because Google has switched out your Colab runtime, perhaps because Google Colab was too busy. The solution is:\n", + "\n", + "1. Kernel menu >> Disconnect and delete runtime\n", + "2. Reload the colab from fresh and Edit menu >> Clear All Outputs\n", + "3. Connect to a new T4 using the button at the top right\n", + "4. Select \"View resources\" from the menu on the top right to confirm you have a GPU\n", + "5. Rerun the cells in the colab, from the top down, starting with the pip installs\n", + "\n", + "And all should work great - otherwise, ask me!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f2vvgnFpHpID" + }, + "outputs": [], + "source": [ + "!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124\n", + "!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FW8nl3XRFrz0" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "from google.colab import drive\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "import torch" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "q3D1_T0uG_Qh" + }, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "AUDIO_MODEL = \"whisper-1\"\n", + "LLAMA = \"meta-llama/Meta-Llama-3.1-8B-Instruct\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Es9GkQ0FGCMt" + }, + "outputs": [], + "source": [ + "# New capability - connect this Colab to my Google Drive\n", + "# See immediately below this for instructions to obtain denver_extract.mp3\n", + "\n", + "drive.mount(\"/content/drive\")\n", + "audio_filename = \"/content/drive/MyDrive/llms/denver_extract.mp3\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HTl3mcjyzIEE" + }, + "source": [ + "# Download denver_extract.mp3\n", + "\n", + "You can either use the same file as me, the extract from Denver city council minutes, or you can try your own..\n", + "\n", + "If you want to use the same as me, then please download my extract here, and put this on your Google Drive: \n", + 
"https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xYW8kQYtF-3L" + }, + "outputs": [], + "source": [ + "# Sign in to HuggingFace Hub\n", + "\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qP6OB2OeGC2C" + }, + "outputs": [], + "source": [ + "# Sign in to OpenAI using Secrets in Colab\n", + "\n", + "openai_api_key = userdata.get('OPENAI_API_KEY')\n", + "openai = OpenAI(api_key=openai_api_key)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GMShdVGlGGr4" + }, + "outputs": [], + "source": [ + "# Use the Whisper OpenAI model to convert the Audio to Text\n", + "# If you'd prefer to use an Open Source model, class student Youssef has contributed an open source version\n", + "# which I've added to the bottom of this colab\n", + "\n", + "audio_file = open(audio_filename, \"rb\")\n", + "transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format=\"text\")\n", + "print(transcription)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "piEMmcSfMH-O" + }, + "outputs": [], + "source": [ + "system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n", + "user_prompt = f\"Below is an extract transcript of a Denver council meeting. 
Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n", + "\n", + "messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UcRKUgcxMew6" + }, + "outputs": [], + "source": [ + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6CujZRAgMimy" + }, + "outputs": [], + "source": [ + "tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "# inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n", + "streamer = TextStreamer(tokenizer)\n", + "model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config, trust_remote_code=True)\n", + "# outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MaLNmJ5PSqcH" + }, + "outputs": [], + "source": [ + "inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n", + "outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "102tdU_3Peam" + }, + "outputs": [], + "source": [ + "response = tokenizer.decode(outputs[0])\n", + "response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KlomN6CwMdoN" + }, + "outputs": [], + "source": [ + "display(Markdown(response))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"0jZElVOMSPAr" + }, + "source": [ + "Day5 exercise - Gradio UI for meeting minutes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5iiYYxQMHf0i" + }, + "outputs": [], + "source": [ + "import gradio as gr\n", + "import tempfile\n", + "import soundfile as sf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "aGwXW7BjPcTM" + }, + "outputs": [], + "source": [ + "# !pip install pydub\n", + "# !apt-get install ffmpeg\n", + "\n", + "from pydub import AudioSegment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RNu-reHuCYj_" + }, + "outputs": [], + "source": [ + "# Make sure that the tokenizeer and model is already generated\n", + "\n", + "# tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n", + "# tokenizer.pad_token = tokenizer.eos_token\n", + "# streamer = TextStreamer(tokenizer)\n", + "# model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KOuoH0YOPruE" + }, + "outputs": [], + "source": [ + "# def save_as_mp3(audio_np):\n", + "# sr, data = audio_np\n", + "# # Convert float32 or int16 to PCM wav and then mp3\n", + "# wav_path = tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False).name\n", + "# mp3_path = tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False).name\n", + "\n", + "# sf.write(wav_path, data, sr)\n", + "# audio_segment = AudioSegment.from_wav(wav_path)\n", + "# audio_segment.export(mp3_path, format=\"mp3\", bitrate=\"64k\") # Low bitrate = small file\n", + "# return mp3_path" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "toBIPBJoSNw0" + }, + "outputs": [], + "source": [ + "# Handles audio input as numpy array and returns updated chat history\n", + "def speak_send(audio_np):\n", + "\n", + " # If use numpy as input: audio_input = 
gr.Audio(sources=\"upload\", type=\"numpy\", label=\"Upload audio file to generate meeting minutes\")\n", + " # mp3_path = save_as_mp3(audio_np)\n", + "\n", + " # with open(mp3_path, \"rb\") as audio_file:\n", + " # transcription = openai.audio.transcriptions.create(\n", + " # model=AUDIO_MODEL,\n", + " # file=audio_file,\n", + " # response_format=\"text\"\n", + " # )\n", + "\n", + " audio = AudioSegment.from_file(audio_np)\n", + " with tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False) as tmpfile:\n", + " audio.export(tmpfile.name, format=\"mp3\")\n", + " with open(tmpfile.name, \"rb\") as file:\n", + " transcription = openai.audio.transcriptions.create(\n", + " model=AUDIO_MODEL,\n", + " file=file,\n", + " response_format=\"text\"\n", + " )\n", + "\n", + " system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n", + " user_prompt = f\"Below is an extract transcript of a Denver council meeting. 
Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ]\n", + "\n", + " inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n", + " outputs = model.generate(inputs, max_new_tokens=2000)\n", + "\n", + " _, _, after = tokenizer.decode(outputs[0]).partition(\"assistant<|end_header_id|>\")\n", + " return after.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xXJfabpDSN5R" + }, + "outputs": [], + "source": [ + "with gr.Blocks() as demo:\n", + "\n", + " with gr.Row():\n", + " audio_input = gr.Audio(sources=\"upload\", type=\"filepath\", label=\"Upload audio file to generate meeting minutes\")\n", + " with gr.Row():\n", + " audio_submit = gr.Button(\"Send\")\n", + " with gr.Row():\n", + " outputs = [gr.Markdown(label=\"Meeting minutes:\")]\n", + "\n", + " audio_submit.click(speak_send, inputs=audio_input, outputs=outputs)\n", + "\n", + "demo.launch(debug=True)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kuxYecT2QDQ9" + }, + "source": [ + "# Student contribution\n", + "\n", + "Student Emad S. has made this powerful variation that uses `TextIteratorStreamer` to stream back results into a Gradio UI, and takes advantage of background threads for performance! I'm sharing it here if you'd like to take a look at some very interesting work. 
Thank you, Emad!\n", + "\n", + "https://colab.research.google.com/drive/1Ja5zyniyJo5y8s1LKeCTSkB2xyDPOt6D" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AU3uAEyU3a-o" + }, + "source": [ + "## Alternative implementation\n", + "\n", + "Class student Youssef has contributed this variation in which we use an open-source model to transcribe the meeting Audio.\n", + "\n", + "Thank you Youssef!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "phYYgAbBRvu5" + }, + "outputs": [], + "source": [ + "import torch\n", + "from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HdQnWEzW3lzP" + }, + "outputs": [], + "source": [ + "AUDIO_MODEL = \"openai/whisper-medium\"\n", + "speech_model = AutoModelForSpeechSeq2Seq.from_pretrained(AUDIO_MODEL, torch_dtype=torch.float16, low_cpu_mem_usage=True, use_safetensors=True)\n", + "speech_model.to('cuda')\n", + "processor = AutoProcessor.from_pretrained(AUDIO_MODEL)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZhA_fbeCSAeZ" + }, + "outputs": [], + "source": [ + "pipe = pipeline(\n", + " \"automatic-speech-recognition\",\n", + " model=speech_model,\n", + " tokenizer=processor.tokenizer,\n", + " feature_extractor=processor.feature_extractor,\n", + " torch_dtype=torch.float16,\n", + " device='cuda',\n", + " return_timestamps=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nrQjKtD53omJ" + }, + "outputs": [], + "source": [ + "# Use the Whisper OpenAI model to convert the Audio to Text\n", + "result = pipe(audio_filename)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "G_XSljOY3tDf" + }, + "outputs": [], + "source": [ + "transcription = result[\"text\"]\n", + "print(transcription)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + 
"colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week3/community-contributions/llm-wk3d5-minutecreator.ipynb b/week3/community-contributions/llm-wk3d5-minutecreator.ipynb new file mode 100644 index 0000000..ce767ed --- /dev/null +++ b/week3/community-contributions/llm-wk3d5-minutecreator.ipynb @@ -0,0 +1,287 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zmpDFA3bGEHY" + }, + "source": [ + "Minute creator in Gradio from day 5 of week 3.\n", + "A couple of points to note:\n", + "\n", + "\n", + "* My access to llama hasn't been approved on Hugging Face and so I've experimented with some of the other models.\n", + "* There is a fair bit of debugging code in the main function as I was getting an error and couldn't find it. I've left it in just in case its useful for others trying to debug their code.\n", + "* I was debugging with the help of Claude. It suggested using for the minute output. 
The rationale is that it disables gradient computation which isn't necessary for inference and I found it did speed things up.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "l-5xKLFeJUGz" + }, + "outputs": [], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Wi-bBD9VdBMo" + }, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "from google.colab import drive\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "import torch\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-0O-kuWtdk4I" + }, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "#openai\n", + "openai_api_key = userdata.get('OPENAI_API_KEY')\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "#hf\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u6v3Ecileg1H" + }, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "AUDIO_MODEL = 'gpt-4o-transcribe'\n", + "OPENAI_MODEL = 'gpt-4o-mini'\n", + "QWEN2_MODEL = 'Qwen/Qwen2.5-7B-Instruct' # runs slowly no matter what size gpu - kept crashing on ram!\n", + "GEMMA2_MODEL = \"google/gemma-2-2b-it\" # doesn't use a system prompt\n", + "PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3nSfA_KhfY38" + }, + "outputs": [], + "source": [ + "# convert audio to text\n", + "\n", + "def 
transcribe_audio(audio_file_path):\n", + " try:\n", + " with open(audio_file_path, 'rb') as audio_file:\n", + " transcript = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format=\"text\")\n", + " return transcript\n", + " except Exception as e:\n", + " return f\"An error occurred: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OVmlY3DGgnYc" + }, + "outputs": [], + "source": [ + "# use transcript to create minutes\n", + "# use open source model\n", + "\n", + "def create_minutes(transcript):\n", + "\n", + " # first try is for debugging\n", + " try:\n", + " print(f\"Starting to create minutes with transcript length: {len(str(transcript))}\")\n", + "\n", + " if not transcript or len(str(transcript).strip()) == 0:\n", + " return \"Error: Empty or invalid transcript\"\n", + "\n", + " #messages\n", + " system_prompt = \"You are an expert creator of meeting minutes. Based on a meeting transcript you can summarise the meeting title and date, attendees, key discussion points, key outcomes, actions and owners and next steps. Respond in Markdown.\"\n", + " user_prompt = f\"Create meeting minutes from the transcript provided. The minutes should be clear but succinct and should include title and date, attendees, key discussion points, key outcomes, actions and owners, and next steps. 
{transcript}\"\n", + "\n", + " messages = [\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":user_prompt}\n", + " ]\n", + " print(\"Messages prepared successfully\") # for debugging\n", + "\n", + " # quantisation (for os model)\n", + "\n", + " quantization_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_quant_type=\"nf4\",\n", + " bnb_4bit_compute_dtype=torch.bfloat16\n", + " )\n", + "\n", + " except Exception as e:\n", + " return f\"An error occurred in setup: {str(e)}\"\n", + "\n", + " # model & tokeniser\n", + " try:\n", + " print(\"Loading tokeniser....\") # for debugging\n", + " tokenizer = AutoTokenizer.from_pretrained(PHI3)\n", + " tokenizer.pad_token = tokenizer.eos_token\n", + "\n", + " print(\"Loading model.....\") # for debugging\n", + " model = AutoModelForCausalLM.from_pretrained(PHI3, device_map='auto', quantization_config=quantization_config)\n", + " print(f\"Model loaded on device {model.device}\") # for debugging\n", + "\n", + " # chat template\n", + " inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n", + " model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n", + "\n", + " # torch.no_grad suggested by claude. This disables gradient computation which reduces memory usage and speeds things up\n", + " print(\"Generating text....\") # for debugging\n", + " with torch.no_grad():\n", + " outputs = model.generate(**model_inputs, max_new_tokens=2000, do_sample=True, temperature=0.7)\n", + " print(f\"Generation complete. 
Output shape: {outputs.shape}\") # for debugging\n", + "\n", + " #***debugging****\n", + "\n", + " # Decode the generated text (excluding the input prompt)\n", + " print(\"Starting text decoding...\") # debugging\n", + " input_length = len(model_inputs['input_ids'][0]) # debugging\n", + " print(f\"Input length: {input_length}, Output length: {len(outputs[0])}\") # debugging\n", + "\n", + " if len(outputs[0]) <= input_length: # debugging\n", + " return \"Error: Model didn't generate any new tokens. Try reducing input length or increasing max_new_tokens.\" # debugging\n", + "\n", + " generated_tokens = outputs[0][input_length:] # debugging\n", + " print(f\"Generated tokens length: {len(generated_tokens)}\") # debugging\n", + "\n", + " # decode generated text\n", + " generated_text = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n", + " print(f\"Decoded text length: {len(generated_text)}\")\n", + "\n", + " return generated_text.strip()\n", + "\n", + " except ImportError as e:\n", + " return f\"Import error - missing library: {str(e)}. Please install required packages.\"\n", + " except torch.cuda.OutOfMemoryError as e:\n", + " return f\"CUDA out of memory: {str(e)}. Try reducing max_new_tokens to 500 or use CPU.\"\n", + " except RuntimeError as e:\n", + " return f\"Runtime error: {str(e)}. 
This might be a CUDA/device issue.\"\n", + " except Exception as e:\n", + " return f\"Unexpected error during text generation: {type(e).__name__}: {str(e)}\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c63zzoDopw6u" + }, + "outputs": [], + "source": [ + "# create process for gradio\n", + "\n", + "def gr_process(audio_file, progress = gr.Progress()):\n", + "\n", + " if audio_file is None:\n", + " return \"Please provide an audio file\"\n", + "\n", + " try:\n", + " progress(0, desc=\"Analysing file\")\n", + " transcript = transcribe_audio(audio_file)\n", + "\n", + " if transcript.startswith(\"An error occurred\"):\n", + " return transcript\n", + "\n", + " progress(0.5, desc=\"File analysed, generating minutes\")\n", + "\n", + " minutes = create_minutes(transcript)\n", + " progress(0.9, desc=\"Nearly there\")\n", + "\n", + " return minutes\n", + "\n", + " except Exception as e:\n", + " return f\"An error occurred: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82fyQELQkGty" + }, + "outputs": [], + "source": [ + "# gradio interface\n", + "\n", + "demo = gr.Interface(\n", + " fn=gr_process,\n", + " inputs= gr.Audio(type=\"filepath\",label=\"Upload MP3 file\"),\n", + " outputs= gr.Markdown(label=\"Meeting minutes\"),\n", + " title = \"Meeting minute creator\",\n", + " description = \"Upload an mp3 audio file for a meeting and I will provide the minutes!\"\n", + ")\n", + "\n", + "if __name__ == \"__main__\":\n", + " demo.launch(debug=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XljpyS7Nvxkh" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week3/community-contributions/llm-wk3synthetic-data-creator.ipynb b/week3/community-contributions/llm-wk3synthetic-data-creator.ipynb new file mode 100644 index 0000000..f026965 --- /dev/null +++ b/week3/community-contributions/llm-wk3synthetic-data-creator.ipynb @@ -0,0 +1,295 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- This creates dummy / test data from a usecase provided by the user.\n", + "- The usecase can be as simple or complex as the user wants (I've tested both and the results are good).\n", + "- I've used a Phi3 model as I'm having issues with llama access on Hugging Face." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "s7ERjTCEKSi_" + }, + "outputs": [], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GG5VMcmhcA2N" + }, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "from IPython.display import Markdown, display, update_display\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "import torch\n", + "import json\n", + "import re\n", + "import pandas as pd\n", + "import io" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UfL-2XNicpEB" + }, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "OPENAI = 'gpt-4o-mini'\n", + "PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"\n", + "\n", + "limit = 100\n", + "max_tokens = 1000\n", + 
"temperature = 0.3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZQ0dcQ6hdTPo" + }, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "openai_api_key = userdata.get('OPENAI_API_KEY')\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2eHsLdYgd2d_" + }, + "outputs": [], + "source": [ + "system_prompt = f\"\"\"You create synthetic datasets for testing purposes. Based on the use case description, generate a CSV dataset with appropriate columns and a maximum of {limit} rows\n", + "of realistic data.\n", + "\n", + "IMPORTANT RULES:\n", + "1. Return ONLY the CSV data with headers and ensure there are no duplicate headers\n", + "2. No explanatory text before or after\n", + "3. No markdown formatting or code fences\n", + "4. No quotation marks around the entire response\n", + "5. Start directly with the column headers\n", + "\n", + "Format: column1 (e.g. customer_id),column2 (e.g. country),column3 (e.g. age)\n", + "row1data,row1data,row1data\n", + "row2data,row2data,row2data\"\"\"\n", + "\n", + "def data_user_prompt(usecase):\n", + " user_prompt = \"Create a synthetic dataset for the use case provided below: \"\n", + " user_prompt += usecase\n", + " user_prompt += f\" Respond in csv with appropriate headers. Do not include any other explanatory text, markdown formatting or code fences, or quotation marks around the entire response. 
\\\n", + " Limit the rows in the dataset to {limit}.\"\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "necoAEc1gNPF" + }, + "outputs": [], + "source": [ + "def dataset_call(usecase):\n", + "\n", + " #quantisation\n", + " quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_quant_type=\"nf4\",\n", + " bnb_4bit_compute_dtype=torch.bfloat16\n", + " )\n", + "\n", + " #tokenization\n", + " tokenizer = AutoTokenizer.from_pretrained(PHI3)\n", + " tokenizer.pad_token = tokenizer.eos_token\n", + "\n", + " #model\n", + " model = AutoModelForCausalLM.from_pretrained(PHI3, quantization_config=quant_config, device_map=\"auto\")\n", + "\n", + " #inputs & outputs\n", + " # build the messages here so each call uses the usecase it was given\n", + " messages = [\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":data_user_prompt(usecase)}\n", + " ]\n", + " inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n", + " model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n", + " #streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)\n", + "\n", + " with torch.no_grad():\n", + " outputs = model.generate(**model_inputs, max_new_tokens=max_tokens,do_sample=True, temperature=temperature)\n", + "\n", + " response = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n", + " return response.strip()\n", + " print(response.strip())\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "g8zEBraI0grT" + }, + "outputs": [], + "source": [ + "# convert csv string into a pandas DataFrame\n", + "\n", + "def csv_handler(csv_string):\n", + "\n", + " try:\n", + " # Convert CSV string to DataFrame\n", + " df = pd.read_csv(io.StringIO(csv_string))\n", + " return df\n", + " except Exception as e:\n", + " # Return error message as DataFrame if parsing 
fails\n", + " error_df = pd.DataFrame({\"Error\": [f\"Failed to parse CSV: {str(e)}\"]})\n", + " return error_df\n", + " print(df, error_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vLPsusTL1zNB" + }, + "outputs": [], + "source": [ + "# usecase to csv_string\n", + "\n", + "def usecase_to_csv(usecase):\n", + " try:\n", + " # Get CSV string from your LLM\n", + " csv_string = dataset_call(usecase)\n", + "\n", + " # Process into DataFrame for Gradio display\n", + " df = csv_handler(csv_string)\n", + "\n", + " return df\n", + "\n", + " except Exception as e:\n", + " error_df = pd.DataFrame({\"Error\": [f\"LLM processing failed: {str(e)}\"]})\n", + " return error_df\n", + "\n", + " print(df, error_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "H3WTLa9a2Rdy" + }, + "outputs": [], + "source": [ + "def download_csv(csv_string):\n", + " if csv_string:\n", + " return csv_string\n", + " return \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XhMVSrVhjYvz" + }, + "outputs": [], + "source": [ + "#test\n", + "usecase = \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n", + "#dataset_call(usecase)\n", + "usecase_to_csv(usecase)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z3Ze4o2qjs5y" + }, + "outputs": [], + "source": [ + "\n", + "demo = gr.Interface(\n", + " fn = usecase_to_csv,\n", + " inputs = gr.Textbox(lines=5,label=\"Describe your usecase\",placeholder=\"Describe the dataset you would like to create and how you will use it\"),\n", + " outputs = gr.DataFrame(label=\"Here is your dataset!\",interactive=True),\n", + " title = \"Friendly Neighbourhood Synthetic Data Creator!\",\n", + " description = \"Let me know your use case for synthetic data and I will create it for you.\",\n", + " 
examples=[\n", + " \"Generate a dataset of 10 employees with name, department, salary, and years of experience\",\n", + " \"Create sample e-commerce data with product names, categories, prices, and ratings\",\n", + " \"Generate customer survey responses with demographics and satisfaction scores\",\n", + " \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n", + " ]\n", + ")\n", + "\n", + "demo.launch(debug=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ck1qdmbHo_G3" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "authorship_tag": "ABX9TyOay+EACzwO0uXDLuayhscX", + "gpuType": "L4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week3/community-contributions/llm.wk3synthetic-data-creator.ipynb b/week3/community-contributions/llm.wk3synthetic-data-creator.ipynb new file mode 100644 index 0000000..f026965 --- /dev/null +++ b/week3/community-contributions/llm.wk3synthetic-data-creator.ipynb @@ -0,0 +1,295 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- This creates dummy / test data from a usecase provided by the user.\n", + "- The usecase can be as simple or complex as the user wants (I've tested both and the results are good).\n", + "- I've used a Phi3 model as I'm having issues with llama access on Hugging Face." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "s7ERjTCEKSi_" + }, + "outputs": [], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GG5VMcmhcA2N" + }, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "from IPython.display import Markdown, display, update_display\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "import torch\n", + "import json\n", + "import re\n", + "import pandas as pd\n", + "import io" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UfL-2XNicpEB" + }, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "OPENAI = 'gpt-4o-mini'\n", + "PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"\n", + "\n", + "limit = 100\n", + "max_tokens = 1000\n", + "temperature = 0.3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZQ0dcQ6hdTPo" + }, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "openai_api_key = userdata.get('OPENAI_API_KEY')\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2eHsLdYgd2d_" + }, + "outputs": [], + "source": [ + "system_prompt = f\"\"\"You create synthetic datasets for testing purposes. Based on the use case description, generate a CSV dataset with appropriate columns and a maximum of {limit} rows\n", + "of realistic data.\n", + "\n", + "IMPORTANT RULES:\n", + "1. 
Return ONLY the CSV data with headers and ensure there are no duplicate headers\n", + "2. No explanatory text before or after\n", + "3. No markdown formatting or code fences\n", + "4. No quotation marks around the entire response\n", + "5. Start directly with the column headers\n", + "\n", + "Format: column1 (e.g. customer_id),column2 (e.g. country),column3 (e.g. age)\n", + "row1data,row1data,row1data\n", + "row2data,row2data,row2data\"\"\"\n", + "\n", + "def data_user_prompt(usecase):\n", + " user_prompt = \"Create a synthetic dataset for the use case provided below: \"\n", + " user_prompt += usecase\n", + " user_prompt += f\" Respond in csv with appropriate headers. Do not include any other explanatory text, markdown formatting or code fences, or quotation marks around the entire response. \\\n", + " Limit the rows in the dataset to {limit}.\"\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "necoAEc1gNPF" + }, + "outputs": [], + "source": [ + "def dataset_call(usecase):\n", + "\n", + " #quantisation\n", + " quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_quant_type=\"nf4\",\n", + " bnb_4bit_compute_dtype=torch.bfloat16\n", + " )\n", + "\n", + " #tokenization\n", + " tokenizer = AutoTokenizer.from_pretrained(PHI3)\n", + " tokenizer.pad_token = tokenizer.eos_token\n", + "\n", + " #model\n", + " model = AutoModelForCausalLM.from_pretrained(PHI3, quantization_config=quant_config, device_map=\"auto\")\n", + "\n", + " #inputs & outputs\n", + " # build the messages here so each call uses the usecase it was given\n", + " messages = [\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":data_user_prompt(usecase)}\n", + " ]\n", + " inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n", + " model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n", + " #streamer = TextStreamer(tokenizer, skip_prompt=True, 
skip_special_tokens=True)\n", + "\n", + " with torch.no_grad():\n", + " outputs = model.generate(**model_inputs, max_new_tokens=max_tokens,do_sample=True, temperature=temperature)\n", + "\n", + " response = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n", + " return response.strip()\n", + " print(response.strip())\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "g8zEBraI0grT" + }, + "outputs": [], + "source": [ + "# convert csv string into a pandas DataFrame\n", + "\n", + "def csv_handler(csv_string):\n", + "\n", + " try:\n", + " # Convert CSV string to DataFrame\n", + " df = pd.read_csv(io.StringIO(csv_string))\n", + " return df\n", + " except Exception as e:\n", + " # Return error message as DataFrame if parsing fails\n", + " error_df = pd.DataFrame({\"Error\": [f\"Failed to parse CSV: {str(e)}\"]})\n", + " return error_df\n", + " print(df, error_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vLPsusTL1zNB" + }, + "outputs": [], + "source": [ + "# usecase to csv_string\n", + "\n", + "def usecase_to_csv(usecase):\n", + " try:\n", + " # Get CSV string from your LLM\n", + " csv_string = dataset_call(usecase)\n", + "\n", + " # Process into DataFrame for Gradio display\n", + " df = csv_handler(csv_string)\n", + "\n", + " return df\n", + "\n", + " except Exception as e:\n", + " error_df = pd.DataFrame({\"Error\": [f\"LLM processing failed: {str(e)}\"]})\n", + " return error_df\n", + "\n", + " print(df, error_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "H3WTLa9a2Rdy" + }, + "outputs": [], + "source": [ + "def download_csv(csv_string):\n", + " if csv_string:\n", + " return csv_string\n", + " return \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XhMVSrVhjYvz" + }, + "outputs": [], + "source": [ + "#test\n", + "usecase = \"A 
financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n", + "#dataset_call(usecase)\n", + "usecase_to_csv(usecase)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z3Ze4o2qjs5y" + }, + "outputs": [], + "source": [ + "\n", + "demo = gr.Interface(\n", + " fn = usecase_to_csv,\n", + " inputs = gr.Textbox(lines=5,label=\"Describe your usecase\",placeholder=\"Describe the dataset you would like to create and how you will use it\"),\n", + " outputs = gr.DataFrame(label=\"Here is your dataset!\",interactive=True),\n", + " title = \"Friendly Neighbourhood Synthetic Data Creator!\",\n", + " description = \"Let me know your use case for synthetic data and I will create it for you.\",\n", + " examples=[\n", + " \"Generate a dataset of 10 employees with name, department, salary, and years of experience\",\n", + " \"Create sample e-commerce data with product names, categories, prices, and ratings\",\n", + " \"Generate customer survey responses with demographics and satisfaction scores\",\n", + " \"A financial services company is looking for synthetic data to test its Expected Credit Losses (ECL) model under IFRS9.\"\n", + " ]\n", + ")\n", + "\n", + "demo.launch(debug=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ck1qdmbHo_G3" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "authorship_tag": "ABX9TyOay+EACzwO0uXDLuayhscX", + "gpuType": "L4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff 
--git a/week3/community-contributions/llm_wk3d5_minutecreator.ipynb b/week3/community-contributions/llm_wk3d5_minutecreator.ipynb new file mode 100644 index 0000000..ce767ed --- /dev/null +++ b/week3/community-contributions/llm_wk3d5_minutecreator.ipynb @@ -0,0 +1,287 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zmpDFA3bGEHY" + }, + "source": [ + "Minute creator in Gradio from day 5 of week 3.\n", + "A couple of points to note:\n", + "\n", + "\n", + "* My access to llama hasn't been approved on Hugging Face and so I've experimented with some of the other models.\n", + "* There is a fair bit of debugging code in the main function as I was getting an error and couldn't find it. I've left it in just in case it's useful for others trying to debug their code.\n", + "* I was debugging with the help of Claude. It suggested using `with torch.no_grad():` for the minute output. The rationale is that it disables gradient computation, which isn't necessary for inference, and I found it did speed things up.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "l-5xKLFeJUGz" + }, + "outputs": [], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Wi-bBD9VdBMo" + }, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "from google.colab import drive\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n", + "import torch\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-0O-kuWtdk4I" + }, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "#openai\n", + 
"openai_api_key = userdata.get('OPENAI_API_KEY')\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "#hf\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u6v3Ecileg1H" + }, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "AUDIO_MODEL = 'gpt-4o-transcribe'\n", + "OPENAI_MODEL = 'gpt-4o-mini'\n", + "QWEN2_MODEL = 'Qwen/Qwen2.5-7B-Instruct' # runs slowly no matter what size gpu - kept crashing on ram!\n", + "GEMMA2_MODEL = \"google/gemma-2-2b-it\" # doesn't use a system prompt\n", + "PHI3 = \"microsoft/Phi-3-mini-4k-instruct\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3nSfA_KhfY38" + }, + "outputs": [], + "source": [ + "# convert audio to text\n", + "\n", + "def transcribe_audio(audio_file_path):\n", + " try:\n", + " with open (audio_file_path, 'rb') as audio_file:\n", + " transcript = openai.audio.transcriptions.create(model = AUDIO_MODEL, file = audio_file, response_format=\"text\")\n", + " return transcript\n", + " except Exception as e:\n", + " return f\"An error occurred: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OVmlY3DGgnYc" + }, + "outputs": [], + "source": [ + "# use transcript to create minutes\n", + "# use open source model\n", + "\n", + "def create_minutes(transcript):\n", + "\n", + " # first try is for debugging\n", + " try:\n", + " print(f\"Starting to create minutes with transcript length: {len(str(transcript))}\")\n", + "\n", + " if not transcript or len(str(transcript).strip()) == 0:\n", + " return \"Error: Empty or invalid transcript\"\n", + "\n", + " #messages\n", + " system_prompt = \"You are an expert creator of meeting minutes. Based on a meeting transcript you can summarise the meeting title and date, attendees, key discussion points, key outcomes, actions and owners and next steps. 
Respond in Markdown.\"\n", + " user_prompt = f\"Create meeting minutes from the transcript provided. The minutes should be clear but succint and should include title and date, attendees, key discussion points, key outcomes, actions and owners, and next steps. {transcript}\"\n", + "\n", + " messages = [\n", + " {\"role\":\"system\",\"content\":system_prompt},\n", + " {\"role\":\"user\",\"content\":user_prompt}\n", + " ]\n", + " print(\"Messages prepared successfully\") # for debugging\n", + "\n", + " # quantisation (for os model)\n", + "\n", + " quantization_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_quant_type=\"nf4\",\n", + " bnb_4bit_compute_dtype=torch.bfloat16\n", + " )\n", + "\n", + " except Exception as e:\n", + " return f\"An error occurred in setup: {str(e)}\"\n", + "\n", + " # model & tokeniser\n", + " try:\n", + " print(\"Loading tokeniser....\") # for debugging\n", + " tokenizer = AutoTokenizer.from_pretrained(PHI3)\n", + " tokenizer.pad_token = tokenizer.eos_token\n", + "\n", + " print(\"Loading model.....\") # for debugging\n", + " model = AutoModelForCausalLM.from_pretrained(PHI3, device_map='auto', quantization_config=quantization_config)\n", + " print(f\"Model loaded on device {model.device}\") # for debugging\n", + "\n", + " # chat template\n", + " inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n", + " model_inputs = tokenizer(inputs, return_tensors=\"pt\").to(model.device)\n", + "\n", + " # torch.no_grad suggested by claude. This disables gradient computation which reduces memory usage and speeds things up\n", + " print(\"Generating text....\") # for debugging\n", + " with torch.no_grad():\n", + " outputs = model.generate(**model_inputs, max_new_tokens=2000, do_sample=True, temperature=0.7)\n", + " print(f\"Generation complete. 
Output shape: {outputs.shape}\") # for debugging\n", + "\n", + " #***debugging****\n", + "\n", + " # Decode the generated text (excluding the input prompt)\n", + " print(\"Starting text decoding...\") # debugging\n", + " input_length = len(model_inputs['input_ids'][0]) # debugging\n", + " print(f\"Input length: {input_length}, Output length: {len(outputs[0])}\") # debugging\n", + "\n", + " if len(outputs[0]) <= input_length: # debugging\n", + " return \"Error: Model didn't generate any new tokens. Try reducing input length or increasing max_new_tokens.\" # debugging\n", + "\n", + " generated_tokens = outputs[0][input_length:] # debugging\n", + " print(f\"Generated tokens length: {len(generated_tokens)}\") # debugging\n", + "\n", + " # decode generated text\n", + " generated_text = tokenizer.decode(outputs[0][len(model_inputs['input_ids'][0]):],skip_special_tokens=True)\n", + " print(f\"Decoded text length: {len(generated_text)}\")\n", + "\n", + " return generated_text.strip()\n", + "\n", + " except ImportError as e:\n", + " return f\"Import error - missing library: {str(e)}. Please install required packages.\"\n", + " except torch.cuda.OutOfMemoryError as e:\n", + " return f\"CUDA out of memory: {str(e)}. Try reducing max_new_tokens to 500 or use CPU.\"\n", + " except RuntimeError as e:\n", + " return f\"Runtime error: {str(e)}. 
This might be a CUDA/device issue.\"\n", + " except Exception as e:\n", + " return f\"Unexpected error during text generation: {type(e).__name__}: {str(e)}\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c63zzoDopw6u" + }, + "outputs": [], + "source": [ + "# create process for gradio\n", + "\n", + "def gr_process(audio_file, progress = gr.Progress()):\n", + "\n", + " if audio_file is None:\n", + " return \"Please provide an audio file\"\n", + "\n", + " try:\n", + " progress(0, desc=\"Analysing file\")\n", + " transcript = transcribe_audio(audio_file)\n", + "\n", + " if transcript.startswith(\"An error occurred\"):\n", + " return transcript\n", + "\n", + " progress(0.5, desc=\"File analysed, generating minutes\")\n", + "\n", + " minutes = create_minutes(transcript)\n", + " progress(0.9, desc=\"Nearly there\")\n", + "\n", + " return minutes\n", + "\n", + " except Exception as e:\n", + " return f\"An error occurred: {str(e)}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82fyQELQkGty" + }, + "outputs": [], + "source": [ + "# gradio interface\n", + "\n", + "demo = gr.Interface(\n", + " fn=gr_process,\n", + " inputs= gr.Audio(type=\"filepath\",label=\"Upload MP3 file\"),\n", + " outputs= gr.Markdown(label=\"Meeting minutes\"),\n", + " title = \"Meeting minute creator\",\n", + " description = \"Upload an mp3 audio file for a meeting and I will provide the minutes!\"\n", + ")\n", + "\n", + "if __name__ == \"__main__\":\n", + " demo.launch(debug=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XljpyS7Nvxkh" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week3/community-contributions/muawiya/README.md b/week3/community-contributions/muawiya/README.md new file mode 100644 index 0000000..fea9832 --- /dev/null +++ b/week3/community-contributions/muawiya/README.md @@ -0,0 +1,102 @@ +# 🧠 Synthetic Data Generator + +A Python-based tool to generate structured, synthetic job postings using open-source LLMs from Hugging Face. +This project supports both **script-based execution** and an **interactive Colab notebook**, making it ideal for rapid prototyping, dataset bootstrapping, or demonstrating prompt engineering techniques. + +> Note: Original Repo can be found at: https://github.com/moawiah/synthetic_data_generator + + +![Demo Screenshot](https://github.com/user-attachments/assets/c0e229ac-ddb7-4a37-8088-f04ca735cd81) + + +This tool helps: +- Researchers create labeled training data for NLP classification or QA +- HR tech startups prototype recommendation models +- AI instructors demonstrate few-shot prompting in class + + +--- + +## ✨ Features + +- 🔗 Integrates Hugging Face Transformer models +- 📄 Generates realistic job postings in structured JSON format +- 🧪 Supports prompt engineering with control over output length and variability +- 🧠 Minimal Gradio UI for non-technical users +- 📓 Jupyter/Colab support for experimentation and reproducibility + +## 📂 Project Structure +
 ```
+.
+├── app/
+│   ├── app.py                          # Main script entry point
+│   ├── consts.py                       # Configuration and constants
+│   └── requirements.txt                # Python dependencies
+├── data/
+│   └── software_engineer_jobs.json     # Sample input data (JSON format)
+├── notebooks/
+│   └── synthetic_data_generator.ipynb  # Interactive Colab notebook
+├── .env.example                        # Sample environment variable config
+├── .gitignore                          # Git ignored files list
+└── README.md
+  ``` 
+ +## 🚀 Getting Started + +### 1. Clone the repository +```bash +git clone https://github.com/moawiah/synthetic_data_generator.git +cd synthetic_data_generator +``` +### 2. Install dependencies +```bash +pip install -r app/requirements.txt +``` +### 3. Hugging Face token +Create a `.env` file containing your Hugging Face token, e.g. `HF_TOKEN=your-token-here` + +### 4. Run +Run the app using +`python app/app.py` + + +## Example Output - 1 Job + +```JSON +{ +"title": "Software Engineer" +, +"description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, coding, and testing software systems, and will be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, maintainable, and efficient code, as well as actively participating in code reviews and continuous integration processes. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career." 
+, +"requirements":[ +"Bachelor's degree in Computer Science or related field", +"Minimum of 2 years experience in software development", +"Strong proficiency in Java or C++", +"Experience with agile development methodologies", +"Good understanding of data structures and algorithms", +"Excellent problem-solving and analytical skills" +], +"location":"New York, NY", +"company_name":"ABC Technologies" +} + +``` + + +## Future Improvements +🔁 Add support for more job roles and industries + +🧠 Model selector from UI + +💾 Export dataset as CSV + +☁️ Optional integration with LangChain or RAG workflows + + + + + diff --git a/week3/community-contributions/muawiya/app/app.py b/week3/community-contributions/muawiya/app/app.py new file mode 100644 index 0000000..4b3fc79 --- /dev/null +++ b/week3/community-contributions/muawiya/app/app.py @@ -0,0 +1,156 @@ +import os +import requests +from IPython.display import Markdown, display, update_display +from openai import OpenAI +from google.colab import drive +from huggingface_hub import login +from google.colab import userdata +from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, pipeline, TextGenerationPipeline +import torch +from consts import FALCON, MISTRAL, Databricks +from dotenv import load_dotenv +import json +import ast +import gradio as gr +import re + +# Sign in to HuggingFace Hub +load_dotenv() +hf_token = os.getenv("HF_TOKEN") + + +# Main Prompt +prompt = """ +Generate one fake job posting for a {role}. + +Return only a single JSON object with: +- title +- description (5-10 sentences) +- requirements (array of 4-6 strings) +- location +- company_name + +No explanations, no extra text. +Only the JSON object. 
+""" + +# Main Conf +bnb_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.bfloat16, + bnb_4bit_quant_type="nf4" +) + +def load_model_and_tokenizer(): + tokenizer = AutoTokenizer.from_pretrained(MISTRAL, trust_remote_code=True) + + model = AutoModelForCausalLM.from_pretrained( + MISTRAL, + device_map={"": "cuda"}, + trust_remote_code=True, + offload_folder="/tmp/dolly_offload", + quantization_config=bnb_config + ) + + return model, tokenizer + + +def generate_job(role="Software Engineer", model=None, tokenizer=None): + # prompt = prompt.format(role=role, n=n) + # outputs = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.9) + # return outputs[0]['generated_text'] + + # Apply chat template formatting + # inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device) + inputs = tokenizer(prompt.format(role=role), return_tensors="pt") + inputs = {k: v.to(model.device) for k, v in inputs.items()} + + + # Generate output + outputs = model.generate( + **inputs, + max_new_tokens=600, + do_sample=True, + temperature=0.2, + top_p=0.9, + pad_token_id=tokenizer.eos_token_id + ) + + # Decode and return + result = tokenizer.decode(outputs[0], skip_special_tokens=True) + return result + +def generate_jobs(role="Software Engineer", n=5): + model, tokenizer = load_model_and_tokenizer() + role = role or "Software Engineer" + fake_jobs = [] + for i in range(int(n)): + fake_jobs.append(generate_job(role=role, model=model, tokenizer=tokenizer)) + return fake_jobs + +def extract_json_objects_from_text_block(texts): + """ + Accepts either a single string or a list of strings. + Extracts all valid JSON objects from messy text blocks. 
+ """ + if isinstance(texts, str): + texts = [texts] # wrap in list if single string + + pattern = r"\{[\s\S]*?\}" + results = [] + + for raw_text in texts: + matches = re.findall(pattern, raw_text) + for match in matches: + try: + obj = json.loads(match) + results.append(obj) + except json.JSONDecodeError: + continue + + return results + +def generate_ui(role, n): + try: + raw_jobs = generate_jobs(role, n) + parsed_jobs = extract_json_objects_from_text_block(raw_jobs) + + if not isinstance(parsed_jobs, list) or not all(isinstance(item, dict) for item in parsed_jobs): + print("[ERROR] Parsed result is not a list of dicts") + return gr.update(value=[], visible=True), None + + filename = f"data/{role.replace(' ', '_').lower()}_jobs.json" + with open(filename, "w") as f: + json.dump(parsed_jobs, f, indent=2) + + print(f"[INFO] Returning {len(parsed_jobs)} jobs -> {filename}") + return parsed_jobs, filename + + except Exception as e: + print(f"[FATAL ERROR] {e}") + return gr.update(value=[], visible=True), None + + +if __name__ == "__main__": + with gr.Blocks() as demo: + gr.Markdown("# 🧠 Synthetic Job Dataset Generator") + gr.Markdown("Generate a structured dataset of job postings for a specific role.") + + with gr.Row(): + role_input = gr.Textbox(label="Job Role", placeholder="e.g. 
Software Engineer", value="Software Engineer") + n_input = gr.Number(label="Number of Samples", value=5, precision=0) + + generate_button = gr.Button("🚀 Generate") + output_table = gr.JSON(label="Generated Dataset") + download_button = gr.File(label="Download JSON") + + generate_button.click( + generate_ui, + inputs=[role_input, n_input], + outputs=[output_table, download_button] + ) + + demo.launch(debug=True, share=True) + + diff --git a/week3/community-contributions/muawiya/app/consts.py b/week3/community-contributions/muawiya/app/consts.py new file mode 100644 index 0000000..b62eb2d --- /dev/null +++ b/week3/community-contributions/muawiya/app/consts.py @@ -0,0 +1,5 @@ +# Models +GPT = 'gpt2' +FALCON = "tiiuae/falcon-rw-1b" +MISTRAL = "mistralai/Mistral-7B-Instruct-v0.1" +Databricks = "databricks/dolly-v2-3b" \ No newline at end of file diff --git a/week3/community-contributions/muawiya/app/requirements.txt b/week3/community-contributions/muawiya/app/requirements.txt new file mode 100644 index 0000000..9590dce --- /dev/null +++ b/week3/community-contributions/muawiya/app/requirements.txt @@ -0,0 +1,7 @@ +huggingface_hub==0.30.2 +ipython==8.12.3 +openai==1.76.2 +protobuf==6.30.2 +Requests==2.32.3 +torch==2.6.0+cu124 +transformers==4.51.3 \ No newline at end of file diff --git a/week3/community-contributions/muawiya/data/software_engineer_jobs.json b/week3/community-contributions/muawiya/data/software_engineer_jobs.json new file mode 100644 index 0000000..1a09d49 --- /dev/null +++ b/week3/community-contributions/muawiya/data/software_engineer_jobs.json @@ -0,0 +1,71 @@ +[ + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining complex software systems. The ideal candidate will have a strong background in computer science and experience with multiple programming languages. 
Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software development", + "Strong proficiency in Java or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills" + ], + "location": "New York, NY", + "company_name": "ABC Technologies" + }, + { + "title": "Software Engineer", + "description": "We are looking for a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, developing, and testing software systems, and be able to work independently or as part of a team. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of computer science principles and be able to learn quickly. This is a full-time position located in San Francisco, CA.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software development", + "Strong proficiency in Java or C++", + "Experience with agile development methodologies", + "Excellent problem-solving skills", + "Ability to work in a fast-paced environment" + ], + "location": "San Francisco, CA", + "company_name": "Acme Inc." + }, + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining our cutting-edge software applications. 
The ideal candidate will have a strong background in computer science and software engineering, with experience in designing, coding, and testing software systems. Responsibilities include collaborating with cross-functional teams, writing clean and efficient code, and ensuring the timely delivery of high-quality software products. This is an excellent opportunity for a self-starter with a passion for technology and a desire to work in a dynamic and fast-paced environment.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software engineering", + "Strong proficiency in Java, Python, or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills", + "Strong communication and interpersonal skills" + ], + "location": "New York, NY", + "company_name": "ABC Tech" + }, + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have a strong background in computer science and experience with various programming languages and technologies. Responsibilities include designing, coding, testing, and maintaining software systems, as well as collaborating with cross-functional teams. This is an excellent opportunity for a creative and motivated individual to make a significant impact in the tech industry.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "Minimum of 2 years experience in software development", + "Strong proficiency in Java, Python, or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills", + "Ability to work independently and as part of a team", + "Strong communication and interpersonal skills" + ], + "location": "New York, NY", + "company_name": "ABC Tech Inc." 
+ }, + { + "title": "Software Engineer", + "description": "We are looking for a skilled software engineer to join our team and contribute to the development of innovative software solutions. Responsibilities include designing, coding, testing and maintaining software systems, as well as collaborating with cross-functional teams. The ideal candidate will have a strong background in computer science or a related field, and at least 3 years of experience in software development. Must be proficient in multiple programming languages, including Java, Python, and C++. Strong problem-solving skills and the ability to work independently or as part of a team are required. This is a full-time position located in San Francisco, CA.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "At least 3 years of experience in software development", + "Proficiency in Java, Python, and C++", + "Strong problem-solving skills", + "Ability to work independently or as part of a team" + ], + "location": "San Francisco, CA", + "company_name": "Innovative Solutions Inc." 
+ } +] \ No newline at end of file diff --git a/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb b/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb new file mode 100644 index 0000000..09f6f9e --- /dev/null +++ b/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb @@ -0,0 +1,5509 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "machine_shape": "hm", + "gpuType": "A100" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU", + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1d1fe06ac632475086ed5964ed000360": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c138f597c98c4944b54d36510ecc8e0b", + "IPY_MODEL_bef2531516164e85bb79b86a791dd00d", + "IPY_MODEL_1cb9fc011950479a8d4832bc52c3399c" + ], + "layout": "IPY_MODEL_974e8f7f05ef472d85d5ea71425e6c39" + } + }, + "c138f597c98c4944b54d36510ecc8e0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_696090959af8499e9a38777e664b85c1", + "placeholder": "​", + "style": 
"IPY_MODEL_973bcc9740b4426da4c680d11f3c1f7e", + "value": "tokenizer_config.json: 100%" + } + }, + "bef2531516164e85bb79b86a791dd00d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3cb5d8fdb5fb4b6a99f6733c00df8378", + "max": 2103, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_58f4369c68434d569d5eb1bc36e71775", + "value": 2103 + } + }, + "1cb9fc011950479a8d4832bc52c3399c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a05df972876941e3b6faab56cc30a4b8", + "placeholder": "​", + "style": "IPY_MODEL_9c61d90b63dd4fb5a481282d6d6eb8e8", + "value": " 2.10k/2.10k [00:00<00:00, 182kB/s]" + } + }, + "974e8f7f05ef472d85d5ea71425e6c39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, 
+ "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "696090959af8499e9a38777e664b85c1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"973bcc9740b4426da4c680d11f3c1f7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3cb5d8fdb5fb4b6a99f6733c00df8378": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "58f4369c68434d569d5eb1bc36e71775": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a05df972876941e3b6faab56cc30a4b8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9c61d90b63dd4fb5a481282d6d6eb8e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2b71f87a02a540488a9e07f072f8807a": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_548cd7e9fab54470bc52810f27784760", + "IPY_MODEL_9c5eb078ece84a57aa9c402c9cad3b0b", + "IPY_MODEL_ee00a9f599db4affabb7bf1c4df6ca1a" + ], + "layout": "IPY_MODEL_52bd638607bf4e1aaf224ebdcfa3693d" + } + }, + "548cd7e9fab54470bc52810f27784760": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_771619a5acd343c788b8189167af09d4", + "placeholder": "​", + "style": "IPY_MODEL_09a1b30b5659452f95ebb2e72466c750", + "value": "tokenizer.model: 100%" + } + }, + "9c5eb078ece84a57aa9c402c9cad3b0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_145a1f1032a44079a262db381e60d401", + "max": 493443, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_99888ad83b51485f959f977ba4418119", + 
"value": 493443 + } + }, + "ee00a9f599db4affabb7bf1c4df6ca1a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ec0854c2ea9a4c9280b6876df365db9d", + "placeholder": "​", + "style": "IPY_MODEL_dac5892c85214f69a5d75d5dc4858dfe", + "value": " 493k/493k [00:00<00:00, 7.91MB/s]" + } + }, + "52bd638607bf4e1aaf224ebdcfa3693d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "771619a5acd343c788b8189167af09d4": { + 
"model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "09a1b30b5659452f95ebb2e72466c750": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "145a1f1032a44079a262db381e60d401": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99888ad83b51485f959f977ba4418119": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ec0854c2ea9a4c9280b6876df365db9d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, 
+ "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dac5892c85214f69a5d75d5dc4858dfe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "41b669da565e4204b848b754dfa28ac8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e806afdada48418c9e353b94a38cd703", + "IPY_MODEL_7898b7322b014e96984c3d09a29a57fb", + "IPY_MODEL_d665270b05d64effba568ded85eee1b4" + ], + "layout": "IPY_MODEL_df087de9ade24058b1cf32e1556f7cb6" + } + }, + "e806afdada48418c9e353b94a38cd703": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_584330ab439b4887b1050a7f14dc5d7c", + "placeholder": "​", + "style": "IPY_MODEL_880b32d3bd1d4af8b5d0b449aab87e8b", + "value": "tokenizer.json: 100%" + } + }, + "7898b7322b014e96984c3d09a29a57fb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_97d09f016e274cca93927f3bd8329352", + "max": 1795188, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d87ef5878c0f4211809716674d0d8413", + "value": 1795188 + } + }, + "d665270b05d64effba568ded85eee1b4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_556109848b1c4ebc99a6cc7c0be519e0", + "placeholder": "​", + "style": "IPY_MODEL_8d6cdfd75e3f4a628c9e785d3c469d98", + "value": " 1.80M/1.80M [00:00<00:00, 24.9MB/s]" + } + }, + "df087de9ade24058b1cf32e1556f7cb6": { + "model_module": "@jupyter-widgets/base", + "model_name": 
"LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "584330ab439b4887b1050a7f14dc5d7c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "880b32d3bd1d4af8b5d0b449aab87e8b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "97d09f016e274cca93927f3bd8329352": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + 
"overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d87ef5878c0f4211809716674d0d8413": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "556109848b1c4ebc99a6cc7c0be519e0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8d6cdfd75e3f4a628c9e785d3c469d98": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": 
"1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fb1ff6f4482143c39be1cca57ec2fc8b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_83e6421843ad487c91bc75510b90f198", + "IPY_MODEL_9e74a7b74e1a4b119af5b95d572bac3c", + "IPY_MODEL_080c34ad56c84c229b1555b15b354aad" + ], + "layout": "IPY_MODEL_d968bf43e8574d9090326b31c9a7fd93" + } + }, + "83e6421843ad487c91bc75510b90f198": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e78b05f33ee54c968fd87b77a2470bce", + "placeholder": "​", + "style": "IPY_MODEL_79a201f7ab7e49efa9e3e1504012dec2", + "value": "special_tokens_map.json: 100%" + } + }, + "9e74a7b74e1a4b119af5b95d572bac3c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6e5d431074de4955a97d4ea36621ae36", + "max": 414, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_bfc581362fbc4aca85df7b2a943dd5e4", + "value": 414 + } + }, + "080c34ad56c84c229b1555b15b354aad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bc9b585bfd2847bb9f22c4720bd19033", + "placeholder": "​", + "style": "IPY_MODEL_8addd2418c3049f3be32465cc9a408d4", + "value": " 414/414 [00:00<00:00, 52.5kB/s]" + } + }, + "d968bf43e8574d9090326b31c9a7fd93": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + 
"margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e78b05f33ee54c968fd87b77a2470bce": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "79a201f7ab7e49efa9e3e1504012dec2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": 
"StyleView", + "description_width": "" + } + }, + "6e5d431074de4955a97d4ea36621ae36": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bfc581362fbc4aca85df7b2a943dd5e4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bc9b585bfd2847bb9f22c4720bd19033": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8addd2418c3049f3be32465cc9a408d4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c7b5bb9ef22f4ebe9969d4d10d63d24c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_d8c3f3ec329743f6b2f21d21601f092a", + "IPY_MODEL_2fee19152ef34eeaba541d559b9a0bc0", + "IPY_MODEL_2740de6be1ae4e3bacc642c39828883b" + ], + "layout": "IPY_MODEL_4104813265f34db0ab09c9d6c148ba29" + } + }, + "d8c3f3ec329743f6b2f21d21601f092a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6d2dbad5a0984f8382abd18910c14343", + "placeholder": "​", + "style": "IPY_MODEL_32285185818f40a6b07c6d6f6175b70c", + "value": "config.json: 100%" + } + }, + "2fee19152ef34eeaba541d559b9a0bc0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_79da3c26e0fb4405a198c2255df9ec00", + "max": 571, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c95bea4e04ff49078821a5dd67f0c28a", + "value": 571 + } + }, + "2740de6be1ae4e3bacc642c39828883b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": 
"1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3695b9dde85348efb683e31e5d52e210", + "placeholder": "​", + "style": "IPY_MODEL_1d982bed2d4645b8a19295b7812cef49", + "value": " 571/571 [00:00<00:00, 72.5kB/s]" + } + }, + "4104813265f34db0ab09c9d6c148ba29": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6d2dbad5a0984f8382abd18910c14343": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, 
+ "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "32285185818f40a6b07c6d6f6175b70c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "79da3c26e0fb4405a198c2255df9ec00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, 
+ "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c95bea4e04ff49078821a5dd67f0c28a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3695b9dde85348efb683e31e5d52e210": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": 
null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1d982bed2d4645b8a19295b7812cef49": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "32c58f50bb1c44e085ae3663004fcfff": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c4df70cf509541828d3a06c380fdfe3d", + "IPY_MODEL_abd2737f597f48b0846a74c743307917", + "IPY_MODEL_a2a52b5e3c104e1cbec513a9f8744db2" + ], + "layout": "IPY_MODEL_ba57460b8ee24f4e96f8a603914b7073" + } + }, + "c4df70cf509541828d3a06c380fdfe3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d17cd0e49fa94361894660c0645ec9a8", + "placeholder": "​", + "style": "IPY_MODEL_6cd364a43f6f4ea793b05bf14ee9d687", + "value": 
"model.safetensors.index.json: 100%" + } + }, + "abd2737f597f48b0846a74c743307917": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a80f72a5e41047f1898d5b6f00a2c69b", + "max": 25125, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c6f6fca0f35b44fbb9037337a5bc0431", + "value": 25125 + } + }, + "a2a52b5e3c104e1cbec513a9f8744db2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3d07e648a5644742b8112146e952c44a", + "placeholder": "​", + "style": "IPY_MODEL_bff978fcc6f94f55bf605c6d9c23cfd2", + "value": " 25.1k/25.1k [00:00<00:00, 2.73MB/s]" + } + }, + "ba57460b8ee24f4e96f8a603914b7073": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": 
null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d17cd0e49fa94361894660c0645ec9a8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cd364a43f6f4ea793b05bf14ee9d687": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a80f72a5e41047f1898d5b6f00a2c69b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c6f6fca0f35b44fbb9037337a5bc0431": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3d07e648a5644742b8112146e952c44a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bff978fcc6f94f55bf605c6d9c23cfd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "eca24e648bcf4cc684f15da684e2791d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_dc82b611b8c145eb8ebc7b80073e9ae1", + "IPY_MODEL_f3e6040a241c4ac7b715bb07a9ec6d6b", + "IPY_MODEL_e310ab9f4338443e82d257ddc21f48bb" + ], + "layout": "IPY_MODEL_9dd0e53a7a2a4d668c5640d938b71c9f" + } + }, + "dc82b611b8c145eb8ebc7b80073e9ae1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1fc933b90fa546c884181136373ad005", + "placeholder": "​", + "style": "IPY_MODEL_94f3ee73e2c04092ac5522c6ef038ea1", + "value": "Fetching 2 files: 100%" + } + }, + "f3e6040a241c4ac7b715bb07a9ec6d6b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_81d8563026e04f5ab00eced0da89a7ef", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_95f081aaf9e84c2f91c82a4e2f183009", + "value": 2 + } + }, + "e310ab9f4338443e82d257ddc21f48bb": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_965cfc093b5040bbaec177820e45ec95", + "placeholder": "​", + "style": "IPY_MODEL_d328397d81f343e28dd1a6e52c5f0ae7", + "value": " 2/2 [00:46<00:00, 46.46s/it]" + } + }, + "9dd0e53a7a2a4d668c5640d938b71c9f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1fc933b90fa546c884181136373ad005": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "94f3ee73e2c04092ac5522c6ef038ea1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "81d8563026e04f5ab00eced0da89a7ef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", 
+ "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "95f081aaf9e84c2f91c82a4e2f183009": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "965cfc093b5040bbaec177820e45ec95": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d328397d81f343e28dd1a6e52c5f0ae7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f73f9c7f341c4a99b00585343bf4d4bd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2a11e010825b42d4a949ad64ae0d1933", + "IPY_MODEL_15b769156f6a4d2988f1c09f3820f7ef", + "IPY_MODEL_a0484e3846c647b892d2de3797496605" + ], + "layout": "IPY_MODEL_cb042f80aaf04bf1963d637d1771741e" + } + }, + "2a11e010825b42d4a949ad64ae0d1933": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_852ba7d4221a475488411f5014362496", + "placeholder": "​", + "style": "IPY_MODEL_38dc7c1e65324e3097d8738532272e32", + "value": "model-00001-of-00002.safetensors: 100%" + } + }, + "15b769156f6a4d2988f1c09f3820f7ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_613da14abc24460db3bb337886cb407c", + "max": 9942981696, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_37a495a5836f413ea5f662538d51a939", + "value": 9942981696 + } + }, + "a0484e3846c647b892d2de3797496605": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9c61322c006f465385df301121462e82", + "placeholder": "​", + "style": "IPY_MODEL_d93d0bb6ebc943a1be6902bd88cef441", + "value": " 9.94G/9.94G [00:46<00:00, 246MB/s]" + } + }, + "cb042f80aaf04bf1963d637d1771741e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "852ba7d4221a475488411f5014362496": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + 
"margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38dc7c1e65324e3097d8738532272e32": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "613da14abc24460db3bb337886cb407c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "37a495a5836f413ea5f662538d51a939": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9c61322c006f465385df301121462e82": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d93d0bb6ebc943a1be6902bd88cef441": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "50a2a1bd13db4045a4ae01138470c42b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ad7cba643d1742cdb47c433bf50072f9", + "IPY_MODEL_57ef5d067e7343239525a6da237b29eb", + "IPY_MODEL_7567388a58a340d4a0f384f79ee13ddc" + ], + "layout": "IPY_MODEL_52c2896ab41a4d2592484084cb501e5a" + } + }, + "ad7cba643d1742cdb47c433bf50072f9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_22957622a42345b991371153c29583c4", + "placeholder": "​", + "style": "IPY_MODEL_d34c879607b041739a2cc6273509e330", + "value": "model-00002-of-00002.safetensors: 100%" + } + }, + "57ef5d067e7343239525a6da237b29eb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d1e7bdd4faac4765862fc809017c4856", + "max": 4540516344, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fcb0ad846398455faccf0d797549f589", + "value": 4540516344 + } + }, + "7567388a58a340d4a0f384f79ee13ddc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b381226552c9462d858051fcb7240727", + "placeholder": "​", + "style": "IPY_MODEL_94e630795bc247e08e6af434c5924cdd", + "value": " 4.54G/4.54G [00:23<00:00, 248MB/s]" + } + }, + "52c2896ab41a4d2592484084cb501e5a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": 
null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22957622a42345b991371153c29583c4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d34c879607b041739a2cc6273509e330": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + 
"d1e7bdd4faac4765862fc809017c4856": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fcb0ad846398455faccf0d797549f589": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b381226552c9462d858051fcb7240727": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "94e630795bc247e08e6af434c5924cdd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2b496c218e2049ff9156ff5b3bbdb90b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_62d3b35a3924417894094d3bbf993932", + "IPY_MODEL_41737448e98a48dcbe117351645395de", 
+ "IPY_MODEL_e83735cd79674a3482f0b90d4c9a3e3d" + ], + "layout": "IPY_MODEL_eff6ca539e2947e9b2987977f143de9a" + } + }, + "62d3b35a3924417894094d3bbf993932": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_aa75292545a649eda8cb7bab0ac9bbcd", + "placeholder": "​", + "style": "IPY_MODEL_22c0e2213505435eaeebdfe330b8fbb8", + "value": "Loading checkpoint shards: 100%" + } + }, + "41737448e98a48dcbe117351645395de": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7de820edeeaf4210af68c721bab3082d", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_66ef664b717343bdaf8e5c4610b2a678", + "value": 2 + } + }, + "e83735cd79674a3482f0b90d4c9a3e3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": 
null, + "layout": "IPY_MODEL_f09534cbda8c4e91b2e073c0eca0cb96", + "placeholder": "​", + "style": "IPY_MODEL_7d8b5a2a52aa4957bc5905021898d8f4", + "value": " 2/2 [00:17<00:00,  8.24s/it]" + } + }, + "eff6ca539e2947e9b2987977f143de9a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa75292545a649eda8cb7bab0ac9bbcd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": 
null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22c0e2213505435eaeebdfe330b8fbb8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7de820edeeaf4210af68c721bab3082d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66ef664b717343bdaf8e5c4610b2a678": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f09534cbda8c4e91b2e073c0eca0cb96": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": 
null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7d8b5a2a52aa4957bc5905021898d8f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1c68a822580a4960acad93be9fd48ce3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6df81f91b17f41dc91fc9f367fa0afab", + "IPY_MODEL_17742936c9ac46e588d1ce42235745d0", + "IPY_MODEL_17f0cd6f05184164b48ef906f192505a" + ], + "layout": "IPY_MODEL_936a67f2de2e44728b83600f4fa0569c" + } + }, + "6df81f91b17f41dc91fc9f367fa0afab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d5ad82f6b9654a8cb888613caaaaa097", + "placeholder": "​", + "style": "IPY_MODEL_b014979e237344129545ff2c384c1c1c", + "value": "generation_config.json: 100%" + } + }, + "17742936c9ac46e588d1ce42235745d0": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b99c12d57d4a4eab84aefbef58452c32", + "max": 116, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5923bbdcf6334393ad832765f129bdec", + "value": 116 + } + }, + "17f0cd6f05184164b48ef906f192505a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_260ac8c28531450bba1deac4e4669dc4", + "placeholder": "​", + "style": "IPY_MODEL_067959a4ef614c498c28bb83c10e16de", + "value": " 116/116 [00:00<00:00, 15.6kB/s]" + } + }, + "936a67f2de2e44728b83600f4fa0569c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d5ad82f6b9654a8cb888613caaaaa097": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b014979e237344129545ff2c384c1c1c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + 
"model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b99c12d57d4a4eab84aefbef58452c32": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5923bbdcf6334393ad832765f129bdec": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": 
"StyleView", + "bar_color": null, + "description_width": "" + } + }, + "260ac8c28531450bba1deac4e4669dc4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "067959a4ef614c498c28bb83c10e16de": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b5dd409cf6e04764adbb7c2a49b7be86": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_65187b4ebb2041b39778268e8b4d6b0d", + "IPY_MODEL_33317cac10ca4a98bf4433c1eff43435", + "IPY_MODEL_f81f5402902c4c04b10895782287e908" + ], + "layout": "IPY_MODEL_c471914fe0d34ae8967bac2820637d5b" + } + }, + "65187b4ebb2041b39778268e8b4d6b0d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7aead6f6cffa40a383f1b8c64943329e", + "placeholder": "​", + "style": "IPY_MODEL_f24fe57d8e164fd68185b4c117e7c097", + "value": "Loading checkpoint shards: 100%" + } + }, + "33317cac10ca4a98bf4433c1eff43435": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f913ca9ab6d44ab1b788a36bd964ed39", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ed34016801264a05bb3697eca2ac22ef", + "value": 2 + } + }, + "f81f5402902c4c04b10895782287e908": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fe622254072540fda3b0dd6b2cab6e4a", + "placeholder": "​", + "style": "IPY_MODEL_5d95bdea47594e21855a6e564d0760da", + "value": " 2/2 [00:17<00:00,  8.01s/it]" + } + }, + "c471914fe0d34ae8967bac2820637d5b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7aead6f6cffa40a383f1b8c64943329e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f24fe57d8e164fd68185b4c117e7c097": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f913ca9ab6d44ab1b788a36bd964ed39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": 
null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ed34016801264a05bb3697eca2ac22ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fe622254072540fda3b0dd6b2cab6e4a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5d95bdea47594e21855a6e564d0760da": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9f9defc39ac5437e9512e5fad810b409": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1c126dfdc51c438b9b48c8a65e549ae2", + "IPY_MODEL_741d800130ea4830b9266f467fa6a0bf", + "IPY_MODEL_73c0a01f1693471c9c017143e9e9058b" + ], + "layout": "IPY_MODEL_ab8174c1337b43048e05aeca72ca18ef" + } + }, + "1c126dfdc51c438b9b48c8a65e549ae2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5e5a992d86434e62a25fc9b7f75f4b16", + "placeholder": "​", + "style": "IPY_MODEL_1507b1310f5045c9b691fdb102cc1686", + "value": "Loading checkpoint shards: 100%" + } + }, + "741d800130ea4830b9266f467fa6a0bf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8a8e81f9d3a54ce49b367f8e984b4a06", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_bab02b1f092b40c8983cd6440f7eaf16", + "value": 2 + } + }, + "73c0a01f1693471c9c017143e9e9058b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_94f30dc2653a4f178c9c2ef454d24644", + "placeholder": "​", + "style": "IPY_MODEL_a508625ef12d4a639fa9773484507709", + "value": " 2/2 [00:17<00:00,  8.07s/it]" + } + }, + "ab8174c1337b43048e05aeca72ca18ef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5e5a992d86434e62a25fc9b7f75f4b16": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": 
null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1507b1310f5045c9b691fdb102cc1686": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8a8e81f9d3a54ce49b367f8e984b4a06": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bab02b1f092b40c8983cd6440f7eaf16": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "94f30dc2653a4f178c9c2ef454d24644": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a508625ef12d4a639fa9773484507709": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "cells": [ + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "Pv8FH9BMgskk", + "outputId": "00cd7f02-2556-4850-b599-1ddec83f7cd9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.4/76.4 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m363.4/363.4 MB\u001b[0m \u001b[31m4.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.8/13.8 MB\u001b[0m \u001b[31m112.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.6/24.6 MB\u001b[0m \u001b[31m96.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m883.7/883.7 kB\u001b[0m \u001b[31m55.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m664.8/664.8 MB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.5/211.5 MB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.3/56.3 MB\u001b[0m \u001b[31m44.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.9/127.9 MB\u001b[0m \u001b[31m20.2 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 MB\u001b[0m \u001b[31m3.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.1/21.1 MB\u001b[0m \u001b[31m109.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.1/76.1 MB\u001b[0m \u001b[31m28.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "google-genai 1.12.1 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.2 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mRequirement already satisfied: bitsandbytes in /usr/local/lib/python3.11/dist-packages (0.45.5)\n", + "Requirement already satisfied: torch<3,>=2.0 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.6.0+cu124)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.0.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in 
/usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already 
satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch<3,>=2.0->bitsandbytes) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch<3,>=2.0->bitsandbytes) (3.0.2)\n", + "Requirement already satisfied: transformers in /usr/local/lib/python3.11/dist-packages (4.51.3)\n", + "Requirement already satisfied: accelerate in /usr/local/lib/python3.11/dist-packages (1.6.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from transformers) (3.18.0)\n", + "Requirement already satisfied: huggingface-hub<1.0,>=0.30.0 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.30.2)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from transformers) (2.0.2)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from transformers) (24.2)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.11/dist-packages (from transformers) (6.0.2)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.11/dist-packages (from transformers) (2024.11.6)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from transformers) (2.32.3)\n", + "Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.21.1)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.5.3)\n", + "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.11/dist-packages (from 
transformers) (4.67.1)\n", + "Requirement already satisfied: psutil in /usr/local/lib/python3.11/dist-packages (from accelerate) (5.9.5)\n", + "Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from accelerate) (2.6.0+cu124)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.30.0->transformers) (2025.3.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.30.0->transformers) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (10.3.5.147)\n", + "Requirement already satisfied: 
nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch>=2.0.0->accelerate) (1.3.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (3.4.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (2.4.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (2025.4.26)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch>=2.0.0->accelerate) 
(3.0.2)\n" + ] + } + ], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2\n", + "!pip install -U bitsandbytes\n", + "!pip install -U transformers accelerate" + ] + }, + { + "cell_type": "code", + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "from google.colab import drive\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, pipeline, TextGenerationPipeline\n", + "import torch" + ], + "metadata": { + "id": "u0qdj2ynjjRz" + }, + "execution_count": 9, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Models\n", + "GPT = 'gpt2'\n", + "FALCON = \"tiiuae/falcon-rw-1b\"\n", + "MISTRAL = \"mistralai/Mistral-7B-Instruct-v0.1\"\n", + "Databricks = \"databricks/dolly-v2-3b\"\n" + ], + "metadata": { + "id": "a_sHgTj_jpDE" + }, + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Sign in to HuggingFace Hub\n", + "\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ], + "metadata": { + "id": "JYjtu3cPj2Th" + }, + "execution_count": 11, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Flatten the messages into a single plain prompt\n", + "# prompt = \"\"\"\n", + "# Generate {{n}} fake job postings for a {{role}} position.\n", + "\n", + "# Only output a JSON array like:\n", + "# [\n", + "# {{\n", + "# \"title\": \"Software Engineer\",\n", + "# \"description\": \"Develop backend APIs and services.\",\n", + "# \"requirements\": [\"Python\", \"FastAPI\", \"MongoDB\"],\n", + "# \"location\": \"San Francisco\",\n", + "# \"company_name\": \"TechCorp\"\n", + "# }},\n", + "# ...\n", + "# ]\n", + "# Return valid JSON only. No markdown. 
No explanations.\n", + "# \"\"\"\n", + "\n", + "# prompt = \"\"\"\n", + "# Generate exactly {{n}} fake job postings for a {{role}}.\n", + "\n", + "# Each posting must be a JSON object with:\n", + "# - title\n", + "# - description (5-10 sentences)\n", + "# - requirements (array of 3-5 strings)\n", + "# - location\n", + "# - company_name\n", + "\n", + "# Return a single JSON array with {n} items. No explanations. No markdown.\n", + "# ONLY the JSON array as output.\n", + "# \"\"\"\n", + "\n", + "prompt = \"\"\"\n", + "Generate one fake job posting for a {role}.\n", + "\n", + "Return only a single JSON object with:\n", + "- title\n", + "- description (5-10 sentences)\n", + "- requirements (array of 4-6 strings)\n", + "- location\n", + "- company_name\n", + "\n", + "No explanations, no extra text.\n", + "Only the JSON object.\n", + "\"\"\"\n", + "\n", + "\n", + "\n" + ], + "metadata": { + "id": "7IUshG1fkQ7k" + }, + "execution_count": 12, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "!pip install safetensors" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-9nzEpDd-dkd", + "outputId": "484ed145-951f-4950-f9ba-bf7ed6e30a13" + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: safetensors in /usr/local/lib/python3.11/dist-packages (0.5.3)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "os.makedirs(\"/tmp/dolly_offload\", exist_ok=True)" + ], + "metadata": { + "id": "D13qucmC-qGr" + }, + "execution_count": 14, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bnb_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")" + ], + "metadata": { + "id": "4qf967BtEqqx" + }, + "execution_count": 15, + "outputs": [] + }, + { + "cell_type": "code", + 
"source": [ + "def load_model_and_tokenizer():\n", + " tokenizer = AutoTokenizer.from_pretrained(MISTRAL, trust_remote_code=True)\n", + "\n", + " model = AutoModelForCausalLM.from_pretrained(\n", + " MISTRAL,\n", + " device_map={\"\": \"cuda\"},\n", + " trust_remote_code=True,\n", + " offload_folder=\"/tmp/dolly_offload\",\n", + " quantization_config=bnb_config\n", + " )\n", + "\n", + " return model, tokenizer\n" + ], + "metadata": { + "id": "GjV7joEMjujM" + }, + "execution_count": 16, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# generator = pipeline(\"text-generation\", model=Databricks, device_map=\"auto\", trust_remote_code=True, offload_folder=\"/tmp/dolly_offload\")\n", + "\n", + "def generate_job(role=\"Software Engineer\", model=None, tokenizer=None):\n", + " # prompt = prompt.format(role=role, n=n)\n", + " # outputs = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.9)\n", + " # return outputs[0]['generated_text']\n", + "\n", + " # Apply chat template formatting\n", + " # inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(model.device)\n", + " inputs = tokenizer(prompt.format(role=role), return_tensors=\"pt\")\n", + " inputs = {k: v.to(model.device) for k, v in inputs.items()}\n", + "\n", + "\n", + " # Generate output\n", + " outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens=600,\n", + " do_sample=True,\n", + " temperature=0.2,\n", + " top_p=0.9,\n", + " pad_token_id=tokenizer.eos_token_id\n", + " )\n", + "\n", + " # Decode and return\n", + " result = tokenizer.decode(outputs[0], skip_special_tokens=True)\n", + " return result\n", + "\n" + ], + "metadata": { + "id": "5w89B0MwkJWo" + }, + "execution_count": 17, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "def generate_jobs(role=\"Software Engineer\", n=5):\n", + " model, tokenizer = load_model_and_tokenizer()\n", + " # Use the caller's role; the model is loaded once and reused for every posting\n", + " 
fake_jobs = []\n", + " for i in range(n):\n", +
" fake_jobs.append(generate_job(role=role, model=model, tokenizer=tokenizer))\n", + " return fake_jobs" + ], + "metadata": { + "id": "ULhKrRe7XZmW" + }, + "execution_count": 18, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(generate_jobs(role=\"Software Engineer\", n=10))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 406, + "referenced_widgets": [ + "1d1fe06ac632475086ed5964ed000360", + "c138f597c98c4944b54d36510ecc8e0b", + "bef2531516164e85bb79b86a791dd00d", + "1cb9fc011950479a8d4832bc52c3399c", + "974e8f7f05ef472d85d5ea71425e6c39", + "696090959af8499e9a38777e664b85c1", + "973bcc9740b4426da4c680d11f3c1f7e", + "3cb5d8fdb5fb4b6a99f6733c00df8378", + "58f4369c68434d569d5eb1bc36e71775", + "a05df972876941e3b6faab56cc30a4b8", + "9c61d90b63dd4fb5a481282d6d6eb8e8", + "2b71f87a02a540488a9e07f072f8807a", + "548cd7e9fab54470bc52810f27784760", + "9c5eb078ece84a57aa9c402c9cad3b0b", + "ee00a9f599db4affabb7bf1c4df6ca1a", + "52bd638607bf4e1aaf224ebdcfa3693d", + "771619a5acd343c788b8189167af09d4", + "09a1b30b5659452f95ebb2e72466c750", + "145a1f1032a44079a262db381e60d401", + "99888ad83b51485f959f977ba4418119", + "ec0854c2ea9a4c9280b6876df365db9d", + "dac5892c85214f69a5d75d5dc4858dfe", + "41b669da565e4204b848b754dfa28ac8", + "e806afdada48418c9e353b94a38cd703", + "7898b7322b014e96984c3d09a29a57fb", + "d665270b05d64effba568ded85eee1b4", + "df087de9ade24058b1cf32e1556f7cb6", + "584330ab439b4887b1050a7f14dc5d7c", + "880b32d3bd1d4af8b5d0b449aab87e8b", + "97d09f016e274cca93927f3bd8329352", + "d87ef5878c0f4211809716674d0d8413", + "556109848b1c4ebc99a6cc7c0be519e0", + "8d6cdfd75e3f4a628c9e785d3c469d98", + "fb1ff6f4482143c39be1cca57ec2fc8b", + "83e6421843ad487c91bc75510b90f198", + "9e74a7b74e1a4b119af5b95d572bac3c", + "080c34ad56c84c229b1555b15b354aad", + "d968bf43e8574d9090326b31c9a7fd93", + "e78b05f33ee54c968fd87b77a2470bce", + "79a201f7ab7e49efa9e3e1504012dec2", + "6e5d431074de4955a97d4ea36621ae36", + 
"bfc581362fbc4aca85df7b2a943dd5e4", + "bc9b585bfd2847bb9f22c4720bd19033", + "8addd2418c3049f3be32465cc9a408d4", + "c7b5bb9ef22f4ebe9969d4d10d63d24c", + "d8c3f3ec329743f6b2f21d21601f092a", + "2fee19152ef34eeaba541d559b9a0bc0", + "2740de6be1ae4e3bacc642c39828883b", + "4104813265f34db0ab09c9d6c148ba29", + "6d2dbad5a0984f8382abd18910c14343", + "32285185818f40a6b07c6d6f6175b70c", + "79da3c26e0fb4405a198c2255df9ec00", + "c95bea4e04ff49078821a5dd67f0c28a", + "3695b9dde85348efb683e31e5d52e210", + "1d982bed2d4645b8a19295b7812cef49", + "32c58f50bb1c44e085ae3663004fcfff", + "c4df70cf509541828d3a06c380fdfe3d", + "abd2737f597f48b0846a74c743307917", + "a2a52b5e3c104e1cbec513a9f8744db2", + "ba57460b8ee24f4e96f8a603914b7073", + "d17cd0e49fa94361894660c0645ec9a8", + "6cd364a43f6f4ea793b05bf14ee9d687", + "a80f72a5e41047f1898d5b6f00a2c69b", + "c6f6fca0f35b44fbb9037337a5bc0431", + "3d07e648a5644742b8112146e952c44a", + "bff978fcc6f94f55bf605c6d9c23cfd2", + "eca24e648bcf4cc684f15da684e2791d", + "dc82b611b8c145eb8ebc7b80073e9ae1", + "f3e6040a241c4ac7b715bb07a9ec6d6b", + "e310ab9f4338443e82d257ddc21f48bb", + "9dd0e53a7a2a4d668c5640d938b71c9f", + "1fc933b90fa546c884181136373ad005", + "94f3ee73e2c04092ac5522c6ef038ea1", + "81d8563026e04f5ab00eced0da89a7ef", + "95f081aaf9e84c2f91c82a4e2f183009", + "965cfc093b5040bbaec177820e45ec95", + "d328397d81f343e28dd1a6e52c5f0ae7", + "f73f9c7f341c4a99b00585343bf4d4bd", + "2a11e010825b42d4a949ad64ae0d1933", + "15b769156f6a4d2988f1c09f3820f7ef", + "a0484e3846c647b892d2de3797496605", + "cb042f80aaf04bf1963d637d1771741e", + "852ba7d4221a475488411f5014362496", + "38dc7c1e65324e3097d8738532272e32", + "613da14abc24460db3bb337886cb407c", + "37a495a5836f413ea5f662538d51a939", + "9c61322c006f465385df301121462e82", + "d93d0bb6ebc943a1be6902bd88cef441", + "50a2a1bd13db4045a4ae01138470c42b", + "ad7cba643d1742cdb47c433bf50072f9", + "57ef5d067e7343239525a6da237b29eb", + "7567388a58a340d4a0f384f79ee13ddc", + "52c2896ab41a4d2592484084cb501e5a", + 
"22957622a42345b991371153c29583c4", + "d34c879607b041739a2cc6273509e330", + "d1e7bdd4faac4765862fc809017c4856", + "fcb0ad846398455faccf0d797549f589", + "b381226552c9462d858051fcb7240727", + "94e630795bc247e08e6af434c5924cdd", + "2b496c218e2049ff9156ff5b3bbdb90b", + "62d3b35a3924417894094d3bbf993932", + "41737448e98a48dcbe117351645395de", + "e83735cd79674a3482f0b90d4c9a3e3d", + "eff6ca539e2947e9b2987977f143de9a", + "aa75292545a649eda8cb7bab0ac9bbcd", + "22c0e2213505435eaeebdfe330b8fbb8", + "7de820edeeaf4210af68c721bab3082d", + "66ef664b717343bdaf8e5c4610b2a678", + "f09534cbda8c4e91b2e073c0eca0cb96", + "7d8b5a2a52aa4957bc5905021898d8f4", + "1c68a822580a4960acad93be9fd48ce3", + "6df81f91b17f41dc91fc9f367fa0afab", + "17742936c9ac46e588d1ce42235745d0", + "17f0cd6f05184164b48ef906f192505a", + "936a67f2de2e44728b83600f4fa0569c", + "d5ad82f6b9654a8cb888613caaaaa097", + "b014979e237344129545ff2c384c1c1c", + "b99c12d57d4a4eab84aefbef58452c32", + "5923bbdcf6334393ad832765f129bdec", + "260ac8c28531450bba1deac4e4669dc4", + "067959a4ef614c498c28bb83c10e16de" + ] + }, + "id": "kKsErltXXwy1", + "outputId": "683c2e5e-16d8-4fe3-efdd-664c385c71e7" + }, + "execution_count": 19, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/2.10k [00:00=2.0 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.6.0+cu124)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.0.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.4.2)\n", + "Requirement already satisfied: jinja2 in 
/usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2.21.5)\n", + "Requirement already satisfied: 
nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch<3,>=2.0->bitsandbytes) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch<3,>=2.0->bitsandbytes) (3.0.2)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import re\n", + "import json\n", + "import ast\n", + "\n", + "\n", + "\n", + "def extract_json_objects_from_text_block(texts):\n", + " \"\"\"\n", + " Accepts either a single string or a list of strings.\n", + " Extracts all valid JSON objects from messy text blocks.\n", + " \"\"\"\n", + " if isinstance(texts, str):\n", + " texts = [texts] # wrap in list if single string\n", + "\n", + " pattern = r\"\\{[\\s\\S]*?\\}\"\n", + " results = []\n", + "\n", + " for raw_text in texts:\n", + " matches = re.findall(pattern, raw_text)\n", + " for match in matches:\n", + " try:\n", + " obj = json.loads(match)\n", + " results.append(obj)\n", + " except json.JSONDecodeError:\n", + " continue\n", + "\n", + " return results\n", + "\n", + "text = generate_jobs(role=\"Software Engineer\", n=10)\n", + "print(extract_json_objects_from_text_block(text))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 86, + "referenced_widgets": [ + "b5dd409cf6e04764adbb7c2a49b7be86", + "65187b4ebb2041b39778268e8b4d6b0d", + "33317cac10ca4a98bf4433c1eff43435", + 
"f81f5402902c4c04b10895782287e908", + "c471914fe0d34ae8967bac2820637d5b", + "7aead6f6cffa40a383f1b8c64943329e", + "f24fe57d8e164fd68185b4c117e7c097", + "f913ca9ab6d44ab1b788a36bd964ed39", + "ed34016801264a05bb3697eca2ac22ef", + "fe622254072540fda3b0dd6b2cab6e4a", + "5d95bdea47594e21855a6e564d0760da" + ] + }, + "id": "1uzTM2G1oqDs", + "outputId": "08e88ab0-ca17-46d3-8f9c-6a595863aeba" + }, + "execution_count": 22, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00=22.0 (from gradio)\n", + " Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)\n", + "Requirement already satisfied: anyio<5.0,>=3.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (4.9.0)\n", + "Collecting fastapi<1.0,>=0.115.2 (from gradio)\n", + " Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)\n", + "Collecting ffmpy (from gradio)\n", + " Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)\n", + "Collecting gradio-client==1.10.0 (from gradio)\n", + " Downloading gradio_client-1.10.0-py3-none-any.whl.metadata (7.1 kB)\n", + "Collecting groovy~=0.1 (from gradio)\n", + " Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)\n", + "Requirement already satisfied: httpx>=0.24.1 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.27.2)\n", + "Requirement already satisfied: huggingface-hub>=0.28.1 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.30.2)\n", + "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.1.6)\n", + "Requirement already satisfied: markupsafe<4.0,>=2.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.0.2)\n", + "Requirement already satisfied: numpy<3.0,>=1.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.0.2)\n", + "Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.10.17)\n", + "Requirement already satisfied: packaging in 
/usr/local/lib/python3.11/dist-packages (from gradio) (24.2)\n", + "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.2.2)\n", + "Requirement already satisfied: pillow<12.0,>=8.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (11.2.1)\n", + "Requirement already satisfied: pydantic<2.12,>=2.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.11.3)\n", + "Collecting pydub (from gradio)\n", + " Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n", + "Collecting python-multipart>=0.0.18 (from gradio)\n", + " Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)\n", + "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (6.0.2)\n", + "Collecting ruff>=0.9.3 (from gradio)\n", + " Downloading ruff-0.11.8-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)\n", + "Collecting safehttpx<0.2.0,>=0.1.6 (from gradio)\n", + " Downloading safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)\n", + "Collecting semantic-version~=2.0 (from gradio)\n", + " Downloading semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)\n", + "Collecting starlette<1.0,>=0.40.0 (from gradio)\n", + " Downloading starlette-0.46.2-py3-none-any.whl.metadata (6.2 kB)\n", + "Collecting tomlkit<0.14.0,>=0.12.0 (from gradio)\n", + " Downloading tomlkit-0.13.2-py3-none-any.whl.metadata (2.7 kB)\n", + "Requirement already satisfied: typer<1.0,>=0.12 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.15.3)\n", + "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (4.13.2)\n", + "Collecting uvicorn>=0.14.0 (from gradio)\n", + " Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from gradio-client==1.10.0->gradio) (2025.3.2)\n", + "Requirement already satisfied: 
websockets<16.0,>=10.0 in /usr/local/lib/python3.11/dist-packages (from gradio-client==1.10.0->gradio) (15.0.1)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.11/dist-packages (from anyio<5.0,>=3.0->gradio) (3.10)\n", + "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.11/dist-packages (from anyio<5.0,>=3.0->gradio) (1.3.1)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from httpx>=0.24.1->gradio) (2025.4.26)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx>=0.24.1->gradio) (1.0.9)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx>=0.24.1->gradio) (0.16.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (3.18.0)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (2.32.3)\n", + "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (4.67.1)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2025.2)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.1 in /usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (2.33.1)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in 
/usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (0.4.0)\n", + "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (8.1.8)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (1.5.4)\n", + "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (13.9.4)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->pandas<3.0,>=1.0->gradio) (1.17.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (3.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (2.19.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface-hub>=0.28.1->gradio) (3.4.1)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface-hub>=0.28.1->gradio) (2.4.0)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio) (0.1.2)\n", + "Downloading gradio-5.29.0-py3-none-any.whl (54.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.1/54.1 MB\u001b[0m \u001b[31m46.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading gradio_client-1.10.0-py3-none-any.whl (322 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m322.9/322.9 kB\u001b[0m \u001b[31m34.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading aiofiles-24.1.0-py3-none-any.whl 
(15 kB)\n", + "Downloading fastapi-0.115.12-py3-none-any.whl (95 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m95.2/95.2 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading groovy-0.1.2-py3-none-any.whl (14 kB)\n", + "Downloading python_multipart-0.0.20-py3-none-any.whl (24 kB)\n", + "Downloading ruff-0.11.8-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.5/11.5 MB\u001b[0m \u001b[31m131.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading safehttpx-0.1.6-py3-none-any.whl (8.7 kB)\n", + "Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)\n", + "Downloading starlette-0.46.2-py3-none-any.whl (72 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.0/72.0 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading tomlkit-0.13.2-py3-none-any.whl (37 kB)\n", + "Downloading uvicorn-0.34.2-py3-none-any.whl (62 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.5/62.5 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading ffmpy-0.5.0-py3-none-any.whl (6.0 kB)\n", + "Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", + "Installing collected packages: pydub, uvicorn, tomlkit, semantic-version, ruff, python-multipart, groovy, ffmpy, aiofiles, starlette, safehttpx, gradio-client, fastapi, gradio\n", + "Successfully installed aiofiles-24.1.0 fastapi-0.115.12 ffmpy-0.5.0 gradio-5.29.0 gradio-client-1.10.0 groovy-0.1.2 pydub-0.25.1 python-multipart-0.0.20 ruff-0.11.8 safehttpx-0.1.6 semantic-version-2.10.0 starlette-0.46.2 tomlkit-0.13.2 uvicorn-0.34.2\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import gradio as gr\n", + "import json\n", + "from 
transformers import AutoTokenizer, AutoModelForCausalLM\n", + "import torch\n", + "import re\n", + "\n", + "def generate_ui(role, n):\n", + " try:\n", + " raw_jobs = generate_jobs(role, n)\n", + " parsed_jobs = extract_json_objects_from_text_block(raw_jobs)\n", + "\n", + " if not isinstance(parsed_jobs, list) or not all(isinstance(item, dict) for item in parsed_jobs):\n", + " print(\"[ERROR] Parsed result is not a list of dicts\")\n", + " return gr.update(value=[], visible=True), None\n", + "\n", + " filename = f\"{role.replace(' ', '_').lower()}_jobs.json\"\n", + " with open(filename, \"w\") as f:\n", + " json.dump(parsed_jobs, f, indent=2)\n", + "\n", + " print(f\"[INFO] Returning {len(parsed_jobs)} jobs -> {filename}\")\n", + " return parsed_jobs, filename\n", + "\n", + " except Exception as e:\n", + " print(f\"[FATAL ERROR] {e}\")\n", + " return gr.update(value=[], visible=True), None\n", + "\n", + "if __name__ == \"__main__\":\n", + " with gr.Blocks() as demo:\n", + " gr.Markdown(\"# 🧠 Synthetic Job Dataset Generator\")\n", + " gr.Markdown(\"Generate a structured dataset of job postings for a specific role.\")\n", + "\n", + " with gr.Row():\n", + " role_input = gr.Textbox(label=\"Job Role\", placeholder=\"e.g. 
Software Engineer\", value=\"Software Engineer\")\n", + " n_input = gr.Number(label=\"Number of Samples\", value=5, precision=0)\n", + "\n", + " generate_button = gr.Button(\"🚀 Generate\")\n", + " output_table = gr.JSON(label=\"Generated Dataset\")\n", + " download_button = gr.File(label=\"Download JSON\")\n", + "\n", + " generate_button.click(\n", + " generate_ui,\n", + " inputs=[role_input, n_input],\n", + " outputs=[output_table, download_button]\n", + " )\n", + "\n", + " demo.launch(debug=True)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 730, + "referenced_widgets": [ + "9f9defc39ac5437e9512e5fad810b409", + "1c126dfdc51c438b9b48c8a65e549ae2", + "741d800130ea4830b9266f467fa6a0bf", + "73c0a01f1693471c9c017143e9e9058b", + "ab8174c1337b43048e05aeca72ca18ef", + "5e5a992d86434e62a25fc9b7f75f4b16", + "1507b1310f5045c9b691fdb102cc1686", + "8a8e81f9d3a54ce49b367f8e984b4a06", + "bab02b1f092b40c8983cd6440f7eaf16", + "94f30dc2653a4f178c9c2ef454d24644", + "a508625ef12d4a639fa9773484507709" + ] + }, + "id": "FEByigZTo5cv", + "outputId": "e452754b-e155-4b57-eced-7af37996f1f0" + }, + "execution_count": 25, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n", + "\n", + "Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().\n", + "* Running on public URL: https://bf27145eb99f8caadd.gradio.live\n", + "\n", + "This share link expires in 1 week. 
For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "
" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00 software_engineer_jobs.json\n", + "Keyboard interruption in main thread... closing server.\n", + "Killing tunnel 127.0.0.1:7860 <> https://bf27145eb99f8caadd.gradio.live\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "\n", + "# Get list of all .ipynb files in /content\n", + "notebooks = [f for f in os.listdir(\"/content\") if f.endswith(\".ipynb\")]\n", + "print(notebooks)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZfPQJw0Z5UD9", + "outputId": "0e4ba82b-e23b-4faa-8b29-eaf87fdf9500" + }, + "execution_count": 27, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[]\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "Y88jqI_u5WEL" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/week4/community-contributions/Week4-Comments-Generator-DP.ipynb b/week4/community-contributions/Week4-Comments-Generator-DP.ipynb new file mode 100644 index 0000000..6b3b698 --- /dev/null +++ b/week4/community-contributions/Week4-Comments-Generator-DP.ipynb @@ -0,0 +1,400 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "3e473bbd-a0c2-43bd-bf99-c749784d00c3", + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "import openai\n", + "import anthropic\n", + "import google.generativeai as genai\n", + "import requests\n", + "import json\n", + "import os\n", + "from typing import Dict, Any, Optional\n", + "import asyncio\n", + "from dotenv import load_dotenv" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16210512-41f1-4de3-8348-2cd7129e023f", + "metadata": {}, + "outputs": [], + "source": [ + "# load API\n", + "load_dotenv(override=True)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "6747e275-91eb-4d2b-90b6-805f2bd9b6b7", + "metadata": {}, + "outputs": [], + "source": [ + "class CodeCommenter:\n", + " def __init__(self):\n", + " # Initialize API clients\n", + " self.openai_client = None\n", + " self.anthropic_client = None\n", + " self.gemini_client = None\n", + " \n", + " # Load API keys from environment variables\n", + " self.setup_clients()\n", + " \n", + " def setup_clients(self):\n", + " \"\"\"Initialize API clients with keys from environment variables\"\"\"\n", + " try:\n", + " # OpenAI\n", + " openai_key = os.getenv('OPENAI_API_KEY')\n", + " if openai_key:\n", + " self.openai_client = openai.OpenAI(api_key=openai_key)\n", + " \n", + " # Anthropic\n", + " anthropic_key = os.getenv('ANTHROPIC_API_KEY')\n", + " if anthropic_key:\n", + " self.anthropic_client = anthropic.Anthropic(api_key=anthropic_key)\n", + " \n", + " # Google Gemini\n", + " gemini_key = os.getenv('GOOGLE_API_KEY')\n", + " if gemini_key:\n", + " genai.configure(api_key=gemini_key)\n", + " self.gemini_client = genai.GenerativeModel('gemini-2.0-flash-exp')\n", + " \n", + " except Exception as e:\n", + " print(f\"Warning: Error setting up API clients: {e}\")\n", + " \n", + " def create_prompt(self, code: str, language: str) -> str:\n", + " \"\"\"Create a prompt for the LLM to add comments and docstrings\"\"\"\n", + " return f\"\"\"Please add detailed and helpful comments and docstrings to the following {language} code. \n", + " \n", + "Guidelines:\n", + "1. Add comprehensive docstrings for functions, classes, and modules\n", + "2. Add inline comments explaining complex logic\n", + "3. Follow the commenting conventions for {language}\n", + "4. Maintain the original code structure and functionality\n", + "5. Make comments clear and professional\n", + "6. Don't change the actual code logic, only add comments\n", + "7. 
Do not add code markdown delimiters like ```python\n", + "\n", + "Here's the code to comment:\n", + "\n", + "{code}\n", + "\n", + "Please return only the commented code without any additional explanation or markdown formatting.\"\"\"\n", + "\n", + " def call_openai(self, prompt: str, model: str = \"gpt-4o-mini\") -> str:\n", + " \"\"\"Make API call to OpenAI\"\"\"\n", + " if not self.openai_client:\n", + " return \"Error: OpenAI API key not configured. Please set OPENAI_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.openai_client.chat.completions.create(\n", + " model=model,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful coding assistant that adds detailed comments and docstrings to code.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ],\n", + " max_tokens=4000,\n", + " temperature=0.1\n", + " )\n", + " return response.choices[0].message.content.strip()\n", + " except Exception as e:\n", + " return f\"Error calling OpenAI API: {str(e)}\"\n", + " \n", + " def call_anthropic(self, prompt: str, model: str = \"claude-3-5-haiku-20241022\") -> str:\n", + " \"\"\"Make API call to Anthropic Claude\"\"\"\n", + " if not self.anthropic_client:\n", + " return \"Error: Anthropic API key not configured. Please set ANTHROPIC_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.anthropic_client.messages.create(\n", + " model=model,\n", + " max_tokens=4000,\n", + " temperature=0.1,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " return response.content[0].text.strip()\n", + " except Exception as e:\n", + " return f\"Error calling Anthropic API: {str(e)}\"\n", + " \n", + " def call_gemini(self, prompt: str) -> str:\n", + " \"\"\"Make API call to Google Gemini\"\"\"\n", + " if not self.gemini_client:\n", + " return \"Error: Google API key not configured. 
Please set GOOGLE_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.gemini_client.generate_content(\n", + " prompt,\n", + " generation_config=genai.types.GenerationConfig(\n", + " max_output_tokens=4000,\n", + " temperature=0.1,\n", + " )\n", + " )\n", + " return response.text.strip()\n", + " except Exception as e:\n", + " return f\"Error calling Gemini API: {str(e)}\"\n", + " \n", + " def call_ollama(self, prompt: str, model: str = \"llama3.2:latest\") -> str:\n", + " \"\"\"Make API call to Ollama (local)\"\"\"\n", + " try:\n", + " url = \"http://localhost:11434/api/generate\"\n", + " data = {\n", + " \"model\": model,\n", + " \"prompt\": prompt,\n", + " \"stream\": False,\n", + " \"options\": {\n", + " \"temperature\": 0.1,\n", + " \"num_predict\": 4000\n", + " }\n", + " }\n", + " \n", + " response = requests.post(url, json=data, timeout=60)\n", + " if response.status_code == 200:\n", + " result = response.json()\n", + " return result.get('response', '').strip()\n", + " else:\n", + " return f\"Error calling Ollama API: HTTP {response.status_code}\"\n", + " except requests.exceptions.ConnectionError:\n", + " return \"Error: Could not connect to Ollama. 
Make sure Ollama is running locally on port 11434.\"\n", + " except Exception as e:\n", + " return f\"Error calling Ollama API: {str(e)}\"\n", + "\n", + " def generate_comments(self, language: str, code: str, llm: str) -> str:\n", + " \"\"\"Generate comments for the given code using the specified LLM\"\"\"\n", + " if not code.strip():\n", + " return \"Error: Please provide code to comment.\"\n", + " \n", + " prompt = self.create_prompt(code, language)\n", + " \n", + " # Route to appropriate LLM\n", + " if llm == \"gpt-4o-mini\":\n", + " return self.call_openai(prompt, \"gpt-4o-mini\")\n", + " elif llm == \"claude-3-5-haiku-20241022\":\n", + " return self.call_anthropic(prompt, \"claude-3-5-haiku-20241022\")\n", + " elif llm == \"gemini-2.0-flash\":\n", + " return self.call_gemini(prompt)\n", + " elif llm == \"ollama:llama3.2:latest\":\n", + " return self.call_ollama(prompt, \"llama3.2:latest\")\n", + " else:\n", + " return f\"Error: Unsupported LLM: {llm}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "813f0911-d53f-4887-9341-656712e32d8f", + "metadata": {}, + "outputs": [], + "source": [ + "def create_gradio_interface():\n", + " \"\"\"Create and configure the Gradio interface\"\"\"\n", + " commenter = CodeCommenter()\n", + " \n", + " # Define the main function for the interface\n", + " def process_code(language, code, llm):\n", + " \"\"\"Process the code and return commented version\"\"\"\n", + " if not code.strip():\n", + " return \"Please enter some code to comment.\"\n", + " \n", + " # Show processing message\n", + " processing_msg = f\"Processing {language} code with {llm}...\"\n", + " print(processing_msg)\n", + " \n", + " # Generate comments\n", + " result = commenter.generate_comments(language, code, llm)\n", + " return result\n", + " \n", + " # Define default code\n", + " default_code = \"\"\"import pyodbc\n", + "from tabulate import tabulate\n", + "def connect_to_sql_server(server_name, database, username=None, 
password=None):\n", + " try:\n", + " if username and password:\n", + " connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};UID={username};PWD={password}\"\n", + " else:\n", + " connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};Trusted_Connection=yes\"\n", + " connection = pyodbc.connect(connection_string)\n", + " print(f\"Successfully connected to {server_name}/{database}\")\n", + " return connection\n", + " except Exception as e:\n", + " print(f\"Failed to connect to {server_name}/{database}: {str(e)}\")\n", + " return None\n", + "def get_record_count(connection, table_name):\n", + " try:\n", + " cursor = connection.cursor()\n", + " query = f\"SELECT COUNT(*) FROM {table_name}\"\n", + " cursor.execute(query)\n", + " count = cursor.fetchone()[0]\n", + " cursor.close()\n", + " print(f\"Record count for {table_name}: {count}\")\n", + " return count\n", + " except Exception as e:\n", + " print(f\"Failed to get record count for {table_name}: {str(e)}\")\n", + " return None\n", + "def select_top_records(connection, table_name, n):\n", + " try:\n", + " cursor = connection.cursor()\n", + " query = f\"SELECT TOP {n} * FROM {table_name}\"\n", + " cursor.execute(query)\n", + " records = cursor.fetchall()\n", + " columns = [column[0] for column in cursor.description]\n", + " cursor.close()\n", + " print(f\"Top {n} records from {table_name}\")\n", + " if records:\n", + " print(tabulate(records, headers=columns, tablefmt=\"grid\"))\n", + " return records\n", + " except Exception as e:\n", + " print(f\"Failed to retrieve top {n} records from {table_name}: {str(e)}\")\n", + " return None\n", + "conn = connect_to_sql_server(\"localhost\", \"AdventureWorks_lite\")\n", + "if conn:\n", + " total_records = get_record_count(conn, \"Sales.SalesOrderDetail\")\n", + " top_records = select_top_records(conn, \"Production.Product\", 10)\n", + " conn.close()\n", + " print(\"Connection 
closed successfully\")\"\"\"\n", + "\n", + " css = \"\"\"\n", + "textarea[rows]:not([rows=\"1\"]) {\n", + " overflow-y: auto !important;\n", + " scrollbar-width: thin !important;\n", + "}\n", + "textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar {\n", + " all: initial !important;\n", + " background: #f1f1f1 !important;\n", + "}\n", + "textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar-thumb {\n", + " all: initial !important;\n", + " background: #a8a8a8 !important;\n", + "}\n", + "\"\"\"\n", + "\n", + " # Create the interface\n", + " with gr.Blocks(title=\"Code Commenter\", theme=gr.themes.Base(), css=css) as interface:\n", + " gr.Markdown(\"# 🔧 Code Commenter\")\n", + " gr.Markdown(\"Add detailed comments and docstrings to your code using various LLM models.\")\n", + " \n", + " with gr.Row():\n", + " with gr.Column():\n", + " code_input = gr.Textbox(\n", + " label=\"Input Code\",\n", + " value=default_code,\n", + " lines=15,\n", + " max_lines=20,\n", + " info=\"Enter the code you want to add comments to\"\n", + " )\n", + " \n", + " with gr.Column():\n", + " code_output = gr.Textbox(\n", + " label=\"Commented Code\",\n", + " lines=20,\n", + " max_lines=20,\n", + " info=\"Your code with added comments and docstrings\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " with gr.Column(scale=1):\n", + " language_dropdown = gr.Dropdown(\n", + " choices=[\"Python\", \"Ruby\", \"Rust\", \"C++\", \"Java\"],\n", + " value=\"Python\",\n", + " label=\"Programming Language\",\n", + " info=\"Select the programming language of your code\"\n", + " )\n", + " \n", + " llm_dropdown = gr.Dropdown(\n", + " choices=[\n", + " \"gpt-4o-mini\",\n", + " \"claude-3-5-haiku-20241022\", \n", + " \"gemini-2.0-flash\",\n", + " \"ollama:llama3.2:latest\"\n", + " ],\n", + " value=\"gpt-4o-mini\",\n", + " label=\"LLM Model\",\n", + " info=\"Choose the language model to use\"\n", + " )\n", + " \n", + " generate_btn = gr.Button(\n", + " \"🚀 Generate Comments\", \n", + " variant=\"primary\",\n", + " 
size=\"lg\"\n", + " )\n", + " \n", + " # Add some API setup information\n", + " gr.Markdown(\"## 📝 API Setup Instructions\")\n", + " gr.Markdown(\"\"\"\n", + " To use this tool, you need to set up API keys as environment variables:\n", + " \n", + " - **OpenAI**: Set `OPENAI_API_KEY`\n", + " - **Anthropic**: Set `ANTHROPIC_API_KEY` \n", + " - **Google Gemini**: Set `GOOGLE_API_KEY`\n", + " - **Ollama**: Make sure Ollama is running locally on port 11434\n", + " \"\"\")\n", + " \n", + " # Connect the button to the processing function\n", + " generate_btn.click(\n", + " fn=process_code,\n", + " inputs=[language_dropdown, code_input, llm_dropdown],\n", + " outputs=code_output,\n", + " show_progress=True\n", + " )\n", + " \n", + " return interface" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef461e08-c1d5-406d-b7d2-a4329f16486e", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"🚀 Starting Code Commenter...\")\n", + "print(\"📋 Setting up Gradio interface...\")\n", + "\n", + "# Create and launch the interface\n", + "interface = create_gradio_interface()\n", + "\n", + "print(\"🌐 Launching interface...\")\n", + "print(\"💡 The interface will open in your default browser\")\n", + "print(\"🔧 Make sure to set up your API keys as environment variables\")\n", + "\n", + "# Launch with auto-opening in browser\n", + "interface.launch(\n", + " server_name=\"127.0.0.1\",\n", + " server_port=7860,\n", + " share=False,\n", + " inbrowser=True,\n", + " show_error=True\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/week4/community-contributions/Week4_Exercise_convert_between_thirteen_lang_coment_unit_test.ipynb b/week4/community-contributions/Week4_Exercise_convert_between_thirteen_lang_coment_unit_test.ipynb new file mode 100644 index 0000000..a99930c --- /dev/null +++ b/week4/community-contributions/Week4_Exercise_convert_between_thirteen_lang_coment_unit_test.ipynb @@ -0,0 +1,841 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4a6ab9a2-28a2-445d-8512-a0dc8d1b54e9", + "metadata": {}, + "source": [ + "# Power Coder\n", + "\n", + "1. Convert code between two programming languages; supported languages are Python, Java, JavaScript, TypeScript, C, C++, C#, Go, Rust, Kotlin, Swift, PHP, Julia\n", + "2. Automatically add docstrings/comments based on the selected comment style\n", + "3. Automatically generate unit tests based on the selected unit test style\n", + "4. Supported models: gpt-4o, claude-3-5-sonnet-20240620, gemini-2.5-flash" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e610bf56-a46e-4aff-8de1-ab49d62b1ad3", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import io\n", + "import sys\n", + "import json\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import google.generativeai\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "import gradio as gr\n", + "import subprocess" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f672e1c-87e9-4865-b760-370fa605e614", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')\n", + "os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY', 'your-key-if-not-using-env')\n", + 
"os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8aa149ed-9298-4d69-8fe2-8f5de0f667da", + "metadata": {}, + "outputs": [], + "source": [ + "# initialize\n", + "\n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "gemini_via_openai_client = OpenAI(\n", + " api_key=os.environ['GOOGLE_API_KEY'], \n", + " base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n", + ")\n", + "OPENAI_MODEL = \"gpt-4o\"\n", + "CLAUDE_MODEL = \"claude-3-5-sonnet-20240620\"\n", + "GEMINI_MODEL = \"gemini-2.5-flash\"" + ] + }, + { + "cell_type": "markdown", + "id": "37b204dd-f770-41d9-9b19-7e1baa5273cd", + "metadata": {}, + "source": [ + "## 1. Convesion Part" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6896636f-923e-4a2c-9d6c-fac07828a201", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_system_prompt_for(in_lang, out_lang):\n", + " convert_system_message = f\"You are an assistant that reimplements {in_lang} code in high performance {out_lang}. \"\n", + " convert_system_message += f\"Respond only with {out_lang} code; use comments sparingly and do not provide any explanation other than occasional comments. \"\n", + " convert_system_message += f\"The {out_lang} response needs to produce an identical output in the fastest possible time. Keep implementations of random number generators identical so that results match exactly.\"\n", + " return convert_system_message" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e7b3546-57aa-4c29-bc5d-f211970d04eb", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code):\n", + " convert_user_prompt = f\"Rewrite this {in_lang} code in {out_lang} with the fastest possible implementation that produces identical output in the least time. 
\"\n", + " convert_user_prompt += f\"Respond only with {out_lang} code; do not explain your work other than a few comments. \"\n", + " convert_user_prompt += f\"Pay attention to number types to ensure no int overflows. Remember to include all necessary {out_lang} packages or modules, for example, iomanip for C++.\\n\\n\"\n", + " if input_instruct:\n", + " convert_user_prompt += \"Addtional instruction is: \" + input_instruct\n", + " convert_user_prompt += in_code\n", + " return convert_user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6190659-f54c-4951-bef4-4960f8e51cc4", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_messages_for(in_lang, out_lang, input_instruct, in_code):\n", + " return [\n", + " {\"role\": \"system\", \"content\": convert_system_prompt_for(in_lang, out_lang)},\n", + " {\"role\": \"user\", \"content\": convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3b497b3-f569-420e-b92e-fb0f49957ce0", + "metadata": {}, + "outputs": [], + "source": [ + "python_hard = \"\"\"# Be careful to support large number sizes\n", + "\n", + "def lcg(seed, a=1664525, c=1013904223, m=2**32):\n", + " value = seed\n", + " while True:\n", + " value = (a * value + c) % m\n", + " yield value\n", + " \n", + "def max_subarray_sum(n, seed, min_val, max_val):\n", + " lcg_gen = lcg(seed)\n", + " random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n", + " max_sum = float('-inf')\n", + " for i in range(n):\n", + " current_sum = 0\n", + " for j in range(i, n):\n", + " current_sum += random_numbers[j]\n", + " if current_sum > max_sum:\n", + " max_sum = current_sum\n", + " return max_sum\n", + "\n", + "def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n", + " total_sum = 0\n", + " lcg_gen = lcg(initial_seed)\n", + " for _ in range(20):\n", + " seed = next(lcg_gen)\n", + " total_sum += 
max_subarray_sum(n, seed, min_val, max_val)\n", + " return total_sum\n", + "\n", + "# Parameters\n", + "n = 10000 # Number of random numbers\n", + "initial_seed = 42 # Initial seed for the LCG\n", + "min_val = -10 # Minimum value of random numbers\n", + "max_val = 10 # Maximum value of random numbers\n", + "\n", + "# Timing the function\n", + "import time\n", + "start_time = time.time()\n", + "result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n", + "end_time = time.time()\n", + "\n", + "print(\"Total Maximum Subarray Sum (20 runs):\", result)\n", + "print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0be9f47d-5213-4700-b0e2-d444c7c738c0", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_stream_gpt(in_lang, out_lang, input_instruct, in_code): \n", + " stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=convert_messages_for(in_lang, out_lang, input_instruct, in_code), temperature=0.0, stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8669f56b-8314-4582-a167-78842caea131", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_stream_claude(in_lang, out_lang, input_instruct, in_code):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=2000,\n", + " temperature=0.0,\n", + " system=convert_system_prompt_for(in_lang, out_lang),\n", + " messages=[{\"role\": \"user\", \"content\": convert_user_prompt_for(in_lang, out_lang, input_instruct, in_code)}],\n", + " )\n", + " reply = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " reply += text\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"01d3cd4f-c100-4e25-8670-0663513f6136", + "metadata": {}, + "outputs": [], + "source": [ + "def convert_stream_gemini(in_lang, out_lang, input_instruct, in_code): \n", + " stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=convert_messages_for(in_lang, out_lang, input_instruct, in_code), temperature=0.0, stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f1ae8f5-16c8-40a0-aa18-63b617df078d", + "metadata": {}, + "outputs": [], + "source": [ + "def optimize(in_lang, out_lang, in_code, input_instruct, convert_model):\n", + " if \"gpt\" in convert_model.lower():\n", + " result = convert_stream_gpt(in_lang, out_lang, input_instruct, in_code)\n", + " elif \"claude\" in convert_model.lower():\n", + " result = convert_stream_claude(in_lang, out_lang, input_instruct, in_code)\n", + " elif \"gemini\" in convert_model.lower():\n", + " result = convert_stream_gemini(in_lang, out_lang, input_instruct, in_code)\n", + " else:\n", + " raise ValueError(\"Unknown convert model\")\n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "markdown", + "id": "07383878-f887-464f-8bc7-527c669d3edd", + "metadata": {}, + "source": [ + "## 2. Comment part" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d254038c-fdd6-4ef8-8b7a-a074f1e7405d", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_system_prompt_for(lang, comment_style):\n", + " comment_system_message = f\"You are an assistant that generate necessary, concise and clear comment/docstring for the {lang} code by applying {comment_style} comment style. \"\n", + " comment_system_message += f\"Respond only with added comments, and do not provide any redundant explanation. 
\"\n", + " return comment_system_message" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e95cee4f-f229-4c9f-8e67-8a68cc9534c3", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_user_prompt_for(lang, code, comment_style):\n", + " comment_user_prompt = f\"Add the comments/docstring on the given code for the {lang} programming language in {comment_style} comment style. \"\n", + " comment_user_prompt += f\"Respond only with added comments, and do not provide any redundant explanation.\\n\\n\"\n", + " comment_user_prompt += f\"The given code is as follows: \"\n", + " comment_user_prompt += code\n", + " return comment_user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "507426c2-cf5a-4041-b904-b18a5afe83b6", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_messages_for(lang, code, comment_style):\n", + " return [\n", + " {\"role\": \"system\", \"content\": comment_system_prompt_for(lang, comment_style)},\n", + " {\"role\": \"user\", \"content\": comment_user_prompt_for(lang, code, comment_style)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e1c8cf6-7a15-4e79-82f6-6bb2a0b85773", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_stream_gpt(lang, code, comment_style): \n", + " stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=comment_messages_for(lang, code, comment_style), temperature=0.0, stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26f27781-4a3e-4e5f-a8ab-9a25944a9879", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_stream_claude(lang, code, comment_style):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=2000,\n", + " temperature=0.0,\n", + " 
system=comment_system_prompt_for(lang, comment_style),\n", + " messages=[{\"role\": \"user\", \"content\": comment_user_prompt_for(lang, code, comment_style)}],\n", + " )\n", + " reply = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " reply += text\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e6719e7-f2f3-40ea-8fed-01d84a641306", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_stream_gemini(lang, code, comment_style): \n", + " stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=comment_messages_for(lang, code, comment_style), temperature=0.0, stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b98acc4-23d8-4671-8f19-92d72631b55d", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_comments_via_model(lang, code, comment_style, comment_model):\n", + " if \"gpt\" in comment_model.lower():\n", + " result = comment_stream_gpt(lang, code, comment_style)\n", + " elif \"claude\" in comment_model.lower():\n", + " result = comment_stream_claude(lang, code, comment_style)\n", + " elif \"gemini\" in comment_model.lower():\n", + " result = comment_stream_gemini(lang, code, comment_style)\n", + " else:\n", + " raise ValueError(\"Unknown comment model\")\n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "282c75ae-d8c3-4866-a024-f7ecf87b3cde", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_comments_fn(comment_option, in_lang, out_lang, in_code, out_code, in_comment_style, out_comment_style, comment_model):\n", + " if 'input' in comment_option:\n", + " in_gen = generate_comments_via_model(in_lang, in_code, in_comment_style, comment_model)\n", + 
" for in_output in in_gen:\n", + " yield in_output, \"\"\n", + " elif 'output' in comment_option:\n", + " out_gen = generate_comments_via_model(out_lang, out_code, out_comment_style, comment_model)\n", + " for out_output in out_gen:\n", + " yield \"\", out_output\n", + " elif 'both' in comment_option:\n", + " in_gen = generate_comments_via_model(in_lang, in_code, in_comment_style, comment_model)\n", + " out_gen = generate_comments_via_model(out_lang, out_code, out_comment_style, comment_model)\n", + " for in_output, out_output in zip(in_gen, out_gen):\n", + " yield in_output, out_output" + ] + }, + { + "cell_type": "markdown", + "id": "ce2c178c-d03c-49c0-b0e9-c57c699bca08", + "metadata": {}, + "source": [ + "## 3. Unit test part" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5a4743e-e1a8-42c7-8f1f-a73d49c0895d", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_system_prompt_for(lang, unit_test_style):\n", + " unit_test_system_message = f\"You are an assistant that generate necessary, concise, clear and executable unit tests for the {lang} code by applying {unit_test_style} unit test style. \"\n", + " unit_test_system_message += f\"Respond only with generated unit tests; use comments sparingly and do not provide any explanation other than occasional comments. \"\n", + " return unit_test_system_message" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "334d5e40-71ff-4d24-8cef-b6c81c188e4d", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_user_prompt_for(lang, code, unit_test_style):\n", + " unit_test_user_prompt = f\"Add the unit tests on the given code for the {lang} programming language in {unit_test_style} unit test style. 
\"\n", + " unit_test_user_prompt += f\"Respond only with generated unit tests; use comments sparingly and do not provide any explanation other than occasional comments.\\n\\n\"\n", + " unit_test_user_prompt += f\"The given code is as follows: \"\n", + " unit_test_user_prompt += code\n", + " return unit_test_user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a8e061f-3993-4746-9425-d938d2537f65", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_messages_for(lang, code, unit_test_style):\n", + " return [\n", + " {\"role\": \"system\", \"content\": unit_test_system_prompt_for(lang, unit_test_style)},\n", + " {\"role\": \"user\", \"content\": unit_test_user_prompt_for(lang, code, unit_test_style)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71c1613b-7a16-4443-acec-d0a2d9bed192", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_stream_gpt(lang, code, unit_test_style): \n", + " stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=unit_test_messages_for(lang, code, unit_test_style), stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a6e3502-f7ff-42b8-8fc5-2697b2d1f36e", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_stream_claude(lang, code, unit_test_style):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=2000,\n", + " system=unit_test_system_prompt_for(lang, unit_test_style),\n", + " messages=[{\"role\": \"user\", \"content\": unit_test_user_prompt_for(lang, code, unit_test_style)}],\n", + " )\n", + " reply = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " reply += text\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"8d7f694f-a276-4bdc-9cfb-755483fd4380", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_stream_gemini(lang, code, unit_test_style): \n", + " stream = gemini_via_openai_client.chat.completions.create(model=GEMINI_MODEL, messages=unit_test_messages_for(lang, code, unit_test_style), stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c824429a-b18a-4320-8258-0141037a6531", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_unit_test_via_model(lang, code, unit_test_style, unit_test_model):\n", + " if \"gpt\" in unit_test_model.lower():\n", + " result = unit_test_stream_gpt(lang, code, unit_test_style)\n", + " elif \"claude\" in unit_test_model.lower():\n", + " result = unit_test_stream_claude(lang, code, unit_test_style)\n", + " elif \"gemini\" in unit_test_model.lower():\n", + " result = unit_test_stream_gemini(lang, code, unit_test_style)\n", + " else:\n", + " raise ValueError(\"Unknown unit test model\")\n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3e59e26-37c0-4429-b69c-deb581423dd0", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_unit_test_fn(unit_test_option, in_lang, out_lang, in_code, out_code, in_unit_test_style, out_unit_test_style, unit_test_model):\n", + " if 'input' in unit_test_option:\n", + " in_gen = generate_unit_test_via_model(in_lang, in_code, in_unit_test_style, unit_test_model)\n", + " for in_output in in_gen:\n", + " yield in_output, \"\"\n", + " elif 'output' in unit_test_option:\n", + " out_gen = generate_unit_test_via_model(out_lang, out_code, out_unit_test_style, unit_test_model)\n", + " for out_output in out_gen:\n", + " yield \"\", out_output\n", + " elif 'both' in unit_test_option:\n", + " in_gen = 
generate_unit_test_via_model(in_lang, in_code, in_unit_test_style, unit_test_model)\n", + " out_gen = generate_unit_test_via_model(out_lang, out_code, out_unit_test_style, unit_test_model)\n", + " for in_output, out_output in zip(in_gen, out_gen):\n", + " yield in_output, out_output" + ] + }, + { + "cell_type": "markdown", + "id": "2a1f4d0c-f417-4de4-be9f-441cbe5a6db3", + "metadata": {}, + "source": [ + "## 4. Gradio UI part" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a2274f1-d03b-42c0-8dcc-4ce159b18442", + "metadata": {}, + "outputs": [], + "source": [ + "LANGUAGE_INFO = {\n", + " \"Python\": {\n", + " \"doc_style\": [\"Google-style\", \"NumPy-style\", \"reST\", \"Doxygen\"],\n", + " \"unit_test_style\": [\"unittest\", \"pytest\", \"doctest\"]\n", + " },\n", + " \"Java\": {\n", + " \"doc_style\": [\"Javadoc\"],\n", + " \"unit_test_style\": [\"JUnit4\", \"JUnit5\", \"TestNG\"]\n", + " },\n", + " \"JavaScript\": {\n", + " \"doc_style\": [\"JSDoc\"],\n", + " \"unit_test_style\": [\"Jest\", \"Mocha + Chai\", \"Jasmine\"]\n", + " },\n", + " \"TypeScript\": {\n", + " \"doc_style\": [\"JSDoc\", \"TSDoc\"],\n", + " \"unit_test_style\": [\"Jest\", \"Mocha + Chai\", \"Vitest\"]\n", + " },\n", + " \"C\": {\n", + " \"doc_style\": [\"Doxygen\"],\n", + " \"unit_test_style\": [\"Google Test (gtest)\", \"CppUnit\", \"Catch2\"]\n", + " },\n", + " \"C++\": {\n", + " \"doc_style\": [\"Doxygen\"],\n", + " \"unit_test_style\": [\"Google Test (gtest)\", \"CppUnit\", \"Catch2\"]\n", + " },\n", + " \"C#\": {\n", + " \"doc_style\": [\"XML comments\"],\n", + " \"unit_test_style\": [\"xUnit\", \"NUnit\", \"MSTest\"]\n", + " },\n", + " \"Go\": {\n", + " \"doc_style\": [\"Godoc\"],\n", + " \"unit_test_style\": [\"Built-in testing package\"]\n", + " },\n", + " \"Rust\": {\n", + " \"doc_style\": [\"Rustdoc\", \"Markdown\"],\n", + " \"unit_test_style\": [\"Built-in #[test] annotation\"]\n", + " },\n", + " \"Kotlin\": {\n", + " \"doc_style\": [\"KDoc\"],\n", + " 
\"unit_test_style\": [\"JUnit\", \"Kotest\", \"Spek\"]\n", + " },\n", + " \"Swift\": {\n", + " \"doc_style\": [\"Mark-style comments\"],\n", + " \"unit_test_style\": [\"XCTest\"]\n", + " },\n", + " \"PHP\": {\n", + " \"doc_style\": [\"PHPDoc\"],\n", + " \"unit_test_style\": [\"PHPUnit\"]\n", + " },\n", + " \"Julia\": {\n", + " \"doc_style\": [\"Markdown\"],\n", + " \"unit_test_style\": [\"Built-in Test standard library\"]\n", + " }\n", + "}\n", + "LANGUAGES = list(LANGUAGE_INFO.keys())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b50e7833-8f6f-407e-8174-37af9cec2030", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks(title=\"Power Coder\", theme=gr.themes.Citrus(), css=\"\"\"\n", + ".selected {\n", + " background-color: orange !important;\n", + " box-shadow: 0 4px 12px rgba(255, 140, 0, 0.5) !important;\n", + " color: black;\n", + "}\n", + ".unselected {\n", + " background-color: gray !important;\n", + " box-shadow: 0 4px 12px rgba(128, 128, 128, 0.4);\n", + " color: white;\n", + "}\n", + "\"\"\") as ui:\n", + " current_selected = gr.State(\"\")\n", + " initial_in_lang = \"Python\"\n", + " initial_out_lang = \"Java\"\n", + " in_comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_in_lang][\"doc_style\"]\n", + " out_comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_out_lang][\"doc_style\"]\n", + " in_unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_in_lang][\"unit_test_style\"]\n", + " out_unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[initial_out_lang][\"unit_test_style\"]\n", + " in_code_file_name = gr.State(\"in_code.txt\")\n", + " out_code_file_name = gr.State(\"out_code.txt\")\n", + " in_comments_file_name = gr.State(\"in_comments.txt\")\n", + " out_comments_file_name = gr.State(\"out_comments.txt\")\n", + " in_unit_test_file_name = gr.State(\"in_unit_tests.txt\")\n", + " out_unit_test_file_name = gr.State(\"out_unit_tests.txt\")\n", + " \n", + " \n", + " 
gr.Markdown(\"## Code Helper\")\n", + "\n", + " def load_file_content(file):\n", + " if file is None:\n", + " return \"\"\n", + " with open(file.name, \"r\", encoding=\"utf-8\") as f:\n", + " return f.read()\n", + "\n", + " def change_lang(lang):\n", + " comment_style_choices = [\"Standard\"] + LANGUAGE_INFO[lang][\"doc_style\"]\n", + " unit_test_style_choices = [\"Standard\"] + LANGUAGE_INFO[lang][\"unit_test_style\"]\n", + " return (\n", + " gr.update(choices=comment_style_choices, value=str(comment_style_choices[0])), \n", + " gr.update(choices=unit_test_style_choices, value=str(unit_test_style_choices[0]))\n", + " )\n", + "\n", + " def download_fn(in_text, out_text, in_file_name, out_file_name):\n", + " if in_text:\n", + " with open(in_file_name, \"w\", encoding=\"utf-8\") as f:\n", + " f.write(in_text)\n", + " if out_text:\n", + " with open(out_file_name, \"w\", encoding=\"utf-8\") as f:\n", + " f.write(out_text)\n", + " \n", + " # Conversion part\n", + " with gr.Row():\n", + " in_lang = gr.Dropdown(choices=LANGUAGES, label=\"Select input language\", value=initial_in_lang, interactive=True)\n", + " out_lang = gr.Dropdown(choices=LANGUAGES, label=\"Select output language\", value=initial_out_lang, interactive=True)\n", + " with gr.Row():\n", + " input_file = gr.File(label=\"Upload a source code file or input below\")\n", + " input_instruct = gr.Textbox(\n", + " label=\"Additional instruction (optional)\",\n", + " placeholder=\"Enter the instruction you want the output code to follow...\\n\\nFor example: Define the variable using snake_case style.\",\n", + " lines=8\n", + " )\n", + " with gr.Row():\n", + " in_code = gr.Textbox(label=\"Input Code:\", value=python_hard, lines=10)\n", + " out_code = gr.Textbox(label=\"Output Code:\", lines=10)\n", + " with gr.Row():\n", + " convert_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n", + " with gr.Row():\n", + " convert = gr.Button(\"Convert code\")\n", + " download_code = 
gr.Button(\"Download code\")\n", + 
"\n", + " gr.HTML(\"
\")\n", + "\n", + " def show_comment(current_selected):\n", + " if current_selected == \"comment\":\n", + " return (\n", + " gr.update(visible=False),\n", + " gr.update(visible=False),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " \"\"\n", + " )\n", + " else:\n", + " return (\n", + " gr.update(visible=True),\n", + " gr.update(visible=False),\n", + " gr.update(elem_classes=[\"selected\"]),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " \"comment\"\n", + " )\n", + "\n", + " def show_unit_test(current_selected):\n", + " if current_selected == \"unit_test\":\n", + " return (\n", + " gr.update(visible=False),\n", + " gr.update(visible=False),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " \"\"\n", + " )\n", + " else:\n", + " return (\n", + " gr.update(visible=False),\n", + " gr.update(visible=True),\n", + " gr.update(elem_classes=[\"unselected\"]),\n", + " gr.update(elem_classes=[\"selected\"]),\n", + " \"unit_test\"\n", + " )\n", + " \n", + " with gr.Blocks() as demo:\n", + " with gr.Row():\n", + " comment_show_up = gr.Button(\"Comment\", elem_id=\"comment-btn\", elem_classes=[\"unselected\"])\n", + " unit_test_show_up = gr.Button(\"Unit Test\", elem_id=\"unit-test-btn\", elem_classes=[\"unselected\"])\n", + " \n", + " comment_section = gr.Column(visible=False)\n", + " unit_test_section = gr.Column(visible=False)\n", + " \n", + " with comment_section:\n", + " # Comment section\n", + " with gr.Row():\n", + " comment_option = gr.Radio(\n", + " choices=[\n", + " \"Comment input code\",\n", + " \"Comment output code\",\n", + " \"Comment both\"\n", + " ],\n", + " label=\"Commenting Options\",\n", + " value=\"Comment input code\",\n", + " interactive=True\n", + " )\n", + " with gr.Row():\n", + " in_comment_style = gr.Dropdown(choices=in_comment_style_choices, label=\"Select comment style for input code\", value=in_comment_style_choices[0], 
interactive=True)\n", + " out_comment_style = gr.Dropdown(choices=out_comment_style_choices, label=\"Select comment style for output code\", value=out_comment_style_choices[0], interactive=True)\n", + " with gr.Row():\n", + " comment_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n", + " with gr.Row():\n", + " generate_comments = gr.Button(\"Generate comments\")\n", + " download_comments = gr.Button(\"Download comments\")\n", + " with gr.Row():\n", + " in_comments = gr.Textbox(label=\"Comments for Input Code:\", lines=10)\n", + " out_comments = gr.Textbox(label=\"Comments for Output Code:\", lines=10)\n", + " \n", + " with unit_test_section:\n", + " # Unit test part\n", + " with gr.Row():\n", + " unit_test_option = gr.Radio(\n", + " choices=[\n", + " \"Add unit test for input code\",\n", + " \"Add unit test for output code\",\n", + " \"Add unit test for both\"\n", + " ],\n", + " label=\"Unit Test Options\",\n", + " value=\"Add unit test for input code\",\n", + " interactive=True\n", + " )\n", + " with gr.Row():\n", + " in_unit_test_style = gr.Dropdown(choices=in_unit_test_style_choices, label=\"Select unit test style for input code\", value=in_unit_test_style_choices[0], interactive=True)\n", + " out_unit_test_style = gr.Dropdown(choices=out_unit_test_style_choices, label=\"Select unit test style for output code\", value=out_unit_test_style_choices[0], interactive=True)\n", + " with gr.Row():\n", + " unit_test_model = gr.Dropdown([\"Claude\", \"GPT\", \"Gemini\"], label=\"Select model\", value=\"Claude\")\n", + " with gr.Row():\n", + " generate_unit_test = gr.Button(\"Generate unit test\")\n", + " download_unit_test = gr.Button(\"Download unit test\")\n", + " with gr.Row():\n", + " in_unit_test = gr.Textbox(label=\"Unit Test for Input Code:\", lines=10)\n", + " out_unit_test = gr.Textbox(label=\"Unit Test for Output Code:\", lines=10)\n", + "\n", + " in_lang.change(fn=change_lang, inputs=in_lang, 
outputs=[in_comment_style, in_unit_test_style])\n", + " out_lang.change(fn=change_lang, inputs=out_lang, outputs=[out_comment_style, out_unit_test_style])\n", + " input_file.change(fn=load_file_content, inputs=input_file, outputs=in_code)\n", + " \n", + " convert.click(optimize, inputs=[in_lang, out_lang, in_code, input_instruct, convert_model], outputs=[out_code])\n", + " download_code.click(download_fn, inputs=[in_code, out_code, in_code_file_name, out_code_file_name])\n", + " \n", + " comment_show_up.click(fn=show_comment, inputs=current_selected, outputs=[comment_section, unit_test_section, comment_show_up, unit_test_show_up, current_selected])\n", + " unit_test_show_up.click(fn=show_unit_test, inputs=current_selected, outputs=[comment_section, unit_test_section, comment_show_up, unit_test_show_up, current_selected])\n", + "\n", + " generate_comments.click(generate_comments_fn, inputs=[comment_option, in_lang, out_lang, in_code, out_code, in_comment_style, out_comment_style, comment_model], outputs=[in_comments, out_comments])\n", + " download_comments.click(download_fn, inputs=[in_comments, out_comments, in_comments_file_name, out_comments_file_name])\n", + " generate_unit_test.click(generate_unit_test_fn, inputs=[unit_test_option, in_lang, out_lang, in_code, out_code, in_unit_test_style, out_unit_test_style, unit_test_model], outputs=[in_unit_test, out_unit_test])\n", + " download_unit_test.click(download_fn, inputs=[in_unit_test, out_unit_test, in_unit_test_file_name, out_unit_test_file_name])\n", + " \n", + "ui.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0266734c-0bee-46c0-9b17-9fd2ae86cc3a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + 
"name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/Week4_day3_Gemini_Codestral.ipynb b/week4/community-contributions/Week4_day3_Gemini_Codestral.ipynb new file mode 100644 index 0000000..8fa0417 --- /dev/null +++ b/week4/community-contributions/Week4_day3_Gemini_Codestral.ipynb @@ -0,0 +1,643 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ac833f26-d429-4fd2-8f83-92174f1c951a", + "metadata": {}, + "source": [ + "# Code conversion using Gemini and Codestral in Windows 11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c230178c-6f31-4c5a-a888-16b7037ffbf9", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import io\n", + "import sys\n", + "import gradio as gr\n", + "import subprocess\n", + "from google import genai\n", + "from google.genai import types\n", + "from mistralai import Mistral\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d824484-eaaa-456a-b7dc-7e3277fec34a", + "metadata": {}, + "outputs": [], + "source": [ + "# Load Gemini and Mistral API Keys\n", + "\n", + "load_dotenv(override=True)\n", + "gemini_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "mistral_api_key = os.getenv(\"MISTRAL_API_KEY\")\n", + "\n", + "if not mistral_api_key or not gemini_api_key:\n", + " print(\"API Key not found!\")\n", + "else:\n", + " print(\"API Key loaded in memory\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "86f3633e-81f9-4c13-b7b5-793ddc4f886f", + "metadata": {}, + "outputs": [], + "source": [ + "# Models to be used\n", + "\n", + "MODEL_GEMINI = 'gemini-2.5-flash'\n", + "MODEL_CODESTRAL = 'codestral-latest'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"3f3a6d53-70f9-46b8-a490-a50f3a1adf9e", + "metadata": {}, + "outputs": [], + "source": [ + "# Load Gemini client\n", + "try:\n", + " gemini_client = genai.Client(api_key=gemini_api_key)\n", + " print(\"Google GenAI Client initialized successfully!\")\n", + "\n", + " codestral_client = Mistral(api_key=mistral_api_key)\n", + " print(\"Mistral Client initialized successfully!\")\n", + "except Exception as e:\n", + " print(f\"Error initializing Client: {e}\")\n", + " exit() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f816fbe8-e094-499f-98a5-588ebecf8c72", + "metadata": {}, + "outputs": [], + "source": [ + "# Gemini System prompt\n", + "\n", + "system_message = \"You are an assistant that reimplements Python code in high-performance C++ optimized for a Windows PC. \"\n", + "system_message += \"Use Windows-specific optimizations where applicable (e.g., multithreading with std::thread, SIMD, or WinAPI if necessary). \"\n", + "system_message += \"Respond only with the equivalent C++ code; include comments only where absolutely necessary. \"\n", + "system_message += \"Avoid any explanation or text outside the code. \"\n", + "system_message += \"The C++ output must produce identical functionality with the fastest possible execution time on Windows.\"\n", + "\n", + "generate_content_config = types.GenerateContentConfig(system_instruction=system_message)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01227835-15d2-40bd-a9dd-2ef35ad371dc", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(python):\n", + " user_prompt = (\n", + " \"Convert the following Python code into high-performance C++ optimized for Windows. \"\n", + " \"Use standard C++20 or newer with Windows-compatible libraries and best practices. \"\n", + " \"Ensure the implementation runs as fast as possible and produces identical output. \"\n", + " \"Use appropriate numeric types to avoid overflow or precision loss. 
\"\n", + " \"Avoid unnecessary abstraction; prefer direct computation and memory-efficient structures. \"\n", + " \"Respond only with C++ code, include all required headers (like , , etc.), and limit comments to only what's essential.\\n\\n\"\n", + " )\n", + " user_prompt += python\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d9fc8e2-acf0-4122-a8a9-5aadadf982ab", + "metadata": {}, + "outputs": [], + "source": [ + "def user_message_gemini(python): \n", + " return types.Content(role=\"user\", parts=[types.Part.from_text(text=user_prompt_for(python))]) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "334c8b84-6e37-40fc-97ac-40a1b3aa29fa", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(python):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(python)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aca87ac-6330-4ed4-a36f-1726fd0ada1a", + "metadata": {}, + "outputs": [], + "source": [ + "def write_output(cpp):\n", + " code = cpp.replace(\"```cpp\", \"\").replace(\"```c++\", \"\").replace(\"```\", \"\").strip()\n", + " \n", + " if not \"#include\" in code:\n", + " raise ValueError(\"C++ code appears invalid: missing #include directives.\")\n", + "\n", + " with open(\"optimized.cpp\", \"w\", encoding=\"utf-8\", newline=\"\\n\") as f:\n", + " f.write(code) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fcf42642-1a55-4556-8738-0c8c02effa9c", + "metadata": {}, + "outputs": [], + "source": [ + "# Generate CPP code using Gemini\n", + "\n", + "def optimize_gemini(python):\n", + " stream = gemini_client.models.generate_content_stream(\n", + " model = MODEL_GEMINI,\n", + " config=generate_content_config,\n", + " contents=user_message_gemini(python)\n", + " )\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = 
chunk.text or \"\"\n", + " cpp_code += chunk_text\n", + " print(chunk_text, end=\"\", flush=True) \n", + " write_output(cpp_code)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f06a301-4397-4d63-9226-657bb2ddb792", + "metadata": {}, + "outputs": [], + "source": [ + "# Generate CPP code using Codestral\n", + "\n", + "def optimize_codestral(python):\n", + " stream = codestral_client.chat.stream(\n", + " model = MODEL_CODESTRAL,\n", + " messages = messages_for(python), \n", + " )\n", + " \n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.data.choices[0].delta.content or \"\"\n", + " cpp_code += chunk_text\n", + " print(chunk_text, end=\"\", flush=True) \n", + " write_output(cpp_code)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8bd51601-7c1d-478d-b043-6f92739e5c4b", + "metadata": {}, + "outputs": [], + "source": [ + "# Actual code to convert\n", + "\n", + "pi = \"\"\"\n", + "import time\n", + "\n", + "def calculate(iterations, param1, param2):\n", + " result = 1.0\n", + " for i in range(1, iterations+1):\n", + " j = i * param1 - param2\n", + " result -= (1/j)\n", + " j = i * param1 + param2\n", + " result += (1/j)\n", + " return result\n", + "\n", + "start_time = time.time()\n", + "result = calculate(100_000_000, 4, 1) * 4\n", + "end_time = time.time()\n", + "\n", + "print(f\"Result: {result:.12f}\")\n", + "print(f\"Execution Time: {(end_time - start_time):.6f} seconds\")\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db9ea24e-d381-48ac-9196-853d2527dcca", + "metadata": {}, + "outputs": [], + "source": [ + "exec(pi)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f3e26708-8475-474d-8e96-e602c3d5ef9f", + "metadata": {}, + "outputs": [], + "source": [ + "optimize_gemini(pi)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cc23ea7-6062-4354-92bc-730baa52a50b", + "metadata": {}, + "outputs": [], + "source": [ 
+ "# 
CPP Compilation\n", + "\n", + "!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b14704d-95fe-4ed2-861f-af591bf3090e", + "metadata": {}, + "outputs": [], + "source": [ + "!.\\optimized.exe" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d756d1a-1d49-4cfb-bed7-8748d848b083", + "metadata": {}, + "outputs": [], + "source": [ + "optimize_codestral(pi)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e286dc8-9532-48b1-b748-a7950972e7df", + "metadata": {}, + "outputs": [], + "source": [ + "!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "61fe0044-7679-4245-9e59-50642f3d80c6", + "metadata": {}, + "outputs": [], + "source": [ + "!.\\optimized.exe" + ] + }, + { + "cell_type": "markdown", + "id": "f0c0392c-d2a7-4619-82a2-f7b9fa7c43f9", + "metadata": {}, + "source": [ + "## Hard Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ca53eb4-46cd-435b-a950-0e2a8f845535", + "metadata": {}, + "outputs": [], + "source": [ + "python_hard = \"\"\"# Be careful to support large number sizes\n", + "\n", + "def lcg(seed, a=1664525, c=1013904223, m=2**32):\n", + " value = seed\n", + " while True:\n", + " value = (a * value + c) % m\n", + " yield value\n", + " \n", + "def max_subarray_sum(n, seed, min_val, max_val):\n", + " lcg_gen = lcg(seed)\n", + " random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n", + " max_sum = float('-inf')\n", + " for i in range(n):\n", + " current_sum = 0\n", + " for j in range(i, n):\n", + " current_sum += random_numbers[j]\n", + " if current_sum > max_sum:\n", + " max_sum = current_sum\n", + " return max_sum\n", + "\n", + "def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n", + " total_sum = 0\n", + " lcg_gen = lcg(initial_seed)\n", + " for _ in range(20):\n", + " seed = next(lcg_gen)\n", 
+ " total_sum += max_subarray_sum(n, seed, min_val, max_val)\n", + " return total_sum\n", + "\n", + "# Parameters\n", + "n = 10000 # Number of random numbers\n", + "initial_seed = 42 # Initial seed for the LCG\n", + "min_val = -10 # Minimum value of random numbers\n", + "max_val = 10 # Maximum value of random numbers\n", + "\n", + "# Timing the function\n", + "import time\n", + "start_time = time.time()\n", + "result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n", + "end_time = time.time()\n", + "\n", + "print(\"Total Maximum Subarray Sum (20 runs):\", result)\n", + "print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "697cc9fe-efdb-40b7-8e43-871bd2df940e", + "metadata": {}, + "outputs": [], + "source": [ + "exec(python_hard)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17ed6329-6c5f-45af-91ff-06d73830dd0d", + "metadata": {}, + "outputs": [], + "source": [ + "optimize_gemini(python_hard)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b57f0e7-46c9-4235-86eb-389faf37b7bb", + "metadata": {}, + "outputs": [], + "source": [ + "# CPP Compilation\n", + "\n", + "!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8ce8d01-fda8-400d-b3d4-6f1ad3008d28", + "metadata": {}, + "outputs": [], + "source": [ + "!.\\optimized.exe" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "adbcdac7-8656-41c9-8707-d8a71998d393", + "metadata": {}, + "outputs": [], + "source": [ + "optimize_codestral(python_hard)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f9fc9b1-29cf-4510-83f8-1484d26e871e", + "metadata": {}, + "outputs": [], + "source": [ + "# CPP Compilation\n", + "\n", + "!g++ -O3 -std=c++20 -o optimized.exe optimized.cpp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"52170458-c4a1-4920-8d83-8c5ba7250759", + "metadata": {}, + "outputs": [], + "source": [ + "!.\\optimized.exe" + ] + }, + { + "cell_type": "markdown", + "id": "da6aee85-2792-487b-bef3-fec5dcf12623", + "metadata": {}, + "source": [ + "## Accommodating the entire code in Gradio" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2a90c4f-c289-4658-a6ce-51b80e20f91f", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gemini(python):\n", + " stream = gemini_client.models.generate_content_stream(\n", + " model = MODEL_GEMINI,\n", + " config=generate_content_config,\n", + " contents=user_message_gemini(python)\n", + " )\n", + "\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.text or \"\"\n", + " cpp_code += chunk_text\n", + " yield cpp_code.replace('```cpp\\n','').replace('```','')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e872171-96d8-4041-8cb0-0c632c5e957f", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_codestral(python):\n", + " stream = codestral_client.chat.stream(\n", + " model = MODEL_CODESTRAL,\n", + " messages = messages_for(python), \n", + " )\n", + "\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.data.choices[0].delta.content or \"\"\n", + " cpp_code += chunk_text\n", + " yield cpp_code.replace('```cpp\\n','').replace('```','') " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3340b36b-1241-4b0f-9e69-d4e5cc215a27", + "metadata": {}, + "outputs": [], + "source": [ + "def optimize(python, model):\n", + " if model.lower() == 'gemini':\n", + " result = stream_gemini(python)\n", + " elif model.lower() == 'codestral':\n", + " result = stream_codestral(python)\n", + " else:\n", + " raise ValueError(\"Unknown model\")\n", + " \n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "markdown", + "id": "277ddd6c-e71e-4512-965a-57fca341487a", + "metadata": {}, + 
"source": [ + "### Gradio Implementation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "222a9eae-236e-4ba3-8f23-3d9b879ec2d0", + "metadata": {}, + "outputs": [], + "source": [ + "custom_css = \"\"\"\n", + ".scrollable-box textarea {\n", + " overflow: auto !important;\n", + " height: 400px;\n", + "}\n", + "\n", + ".python {background-color: #306998;}\n", + ".cpp {background-color: #050;}\n", + "\n", + "\"\"\"\n", + "\n", + "theme = gr.themes.Soft()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4bd6ed1-ff8c-42d4-8da6-24b9cfd134db", + "metadata": {}, + "outputs": [], + "source": [ + "def execute_python(code):\n", + " try:\n", + " result = subprocess.run(\n", + " [\"python\", \"-c\", code],\n", + " capture_output=True,\n", + " text=True,\n", + " timeout=60\n", + " )\n", + " if result.returncode == 0:\n", + " return result.stdout or \"[No output]\"\n", + " else:\n", + " return f\"[Error]\\n{result.stderr}\"\n", + " except subprocess.TimeoutExpired:\n", + " return \"[Error] Execution timed out.\"\n", + " except Exception as e:\n", + " return f\"[Exception] {str(e)}\" " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1507c973-8699-48b2-80cd-45900c97a867", + "metadata": {}, + "outputs": [], + "source": [ + "def execute_cpp(code):\n", + " write_output(code)\n", + " \n", + " try:\n", + " compile_cmd = [\"g++\", \"-O3\", \"-std=c++20\", \"-o\", \"optimized.exe\", \"optimized.cpp\"]\n", + " compile_result = subprocess.run(compile_cmd, capture_output=True, text=True, check=True)\n", + " \n", + " run_cmd = [\"optimized.exe\"]\n", + " run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True, timeout=60)\n", + " \n", + " return run_result.stdout or \"[No output]\"\n", + " \n", + " except subprocess.CalledProcessError as e:\n", + " return f\"[Compile/Runtime Error]\\n{e.stderr}\"\n", + " except subprocess.TimeoutExpired:\n", + " return \"[Error] Execution timed out.\"\n", + " except 
Exception as e:\n", + " return f\"[Exception] {str(e)}\" " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "374f00f3-8fcf-4ae9-bf54-c5a44dd74844", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks(css=custom_css, theme=theme) as ui:\n", + " gr.Markdown(\"## Convert code from Python to C++\")\n", + " with gr.Row():\n", + " python = gr.Textbox(label=\"Python code:\", lines=10, value=python_hard, elem_classes=[\"scrollable-box\"])\n", + " cpp = gr.Textbox(label=\"C++ code:\", lines=10, elem_classes=[\"scrollable-box\"])\n", + " with gr.Row():\n", + " model = gr.Dropdown([\"Gemini\", \"Codestral\"], label=\"Select model\", value=\"Gemini\")\n", + " convert = gr.Button(\"Convert code\")\n", + " with gr.Row():\n", + " python_run = gr.Button(\"Run Python\")\n", + " cpp_run = gr.Button(\"Run C++\")\n", + " with gr.Row():\n", + " python_out = gr.TextArea(label=\"Python result:\", elem_classes=[\"python\"])\n", + " cpp_out = gr.TextArea(label=\"C++ result:\", elem_classes=[\"cpp\"])\n", + "\n", + " convert.click(optimize, inputs=[python,model], outputs=[cpp])\n", + " python_run.click(execute_python,inputs=[python], outputs=[python_out])\n", + " cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])\n", + "\n", + "ui.launch(inbrowser=True) " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/Week4_day4_HFInference_QwenCode2.5.ipynb b/week4/community-contributions/Week4_day4_HFInference_QwenCode2.5.ipynb new file mode 100644 index 0000000..50d1302 --- /dev/null +++ 
b/week4/community-contributions/Week4_day4_HFInference_QwenCode2.5.ipynb @@ -0,0 +1,476 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4c07cdc9-bce0-49ad-85c7-14f1872b8519", + "metadata": {}, + "source": [ + "# Python to CPP using Qwen2.5-Coder-32B-Instruct with Hyperbolic Inference Endpoint in Windows" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f051c517-c4fd-4248-98aa-b808fae76cf6", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import io\n", + "import sys\n", + "import gradio as gr\n", + "import subprocess\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import InferenceClient\n", + "from google import genai\n", + "from google.genai import types\n", + "from mistralai import Mistral" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6c8777b-57bc-436a-978f-21a37ea310ae", + "metadata": {}, + "outputs": [], + "source": [ + "# Load Api Keys from env\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "hf_api_key = os.getenv(\"HF_TOKEN\")\n", + "gemini_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "mistral_api_key = os.getenv(\"MISTRAL_API_KEY\")\n", + "\n", + "if not mistral_api_key or not gemini_api_key or not hf_api_key:\n", + " print(\"API Key not found!\")\n", + "else:\n", + " print(\"API Key loaded in memory\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5cf6f93-7e07-40e0-98b8-d4e74ea18402", + "metadata": {}, + "outputs": [], + "source": [ + "# MODELs \n", + "\n", + "MODEL_QWEN = \"Qwen/Qwen2.5-Coder-32B-Instruct\"\n", + "MODEL_GEMINI = 'gemini-2.5-flash'\n", + "MODEL_CODESTRAL = 'codestral-latest'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "689547c3-aaa5-4800-86a2-da52765997d8", + "metadata": {}, + "outputs": [], + "source": [ + "# Load Clients\n", + "\n", + "try:\n", + " gemini_client = genai.Client(api_key=gemini_api_key)\n", + " print(\"Google GenAI Client initialized successfully!\")\n", + "\n", 
+ " codestral_client = Mistral(api_key=mistral_api_key)\n", + " print(\"Mistral Client initialized successfully!\")\n", + " \n", + " hf_client = InferenceClient(provider=\"hyperbolic\",api_key=hf_api_key)\n", + " print(\"Hyperbolic Inference Client initialized successfully!\")\n", + "except Exception as e:\n", + " print(f\"Error initializing Client: {e}\")\n", + " exit() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c3a81f4-99c3-463a-ae30-4656a7a246d2", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are an assistant that reimplements Python code in high-performance C++ optimized for a Windows PC. \"\n", + "system_message += \"Use Windows-specific optimizations where applicable (e.g., multithreading with std::thread, SIMD, or WinAPI if necessary). \"\n", + "system_message += \"Respond only with the equivalent C++ code; include comments only where absolutely necessary. \"\n", + "system_message += \"Avoid any explanation or text outside the code. \"\n", + "system_message += \"The C++ output must produce identical functionality with the fastest possible execution time on Windows.\"\n", + "\n", + "generate_content_config = types.GenerateContentConfig(system_instruction=system_message)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0fde9514-1005-4539-b01b-0372730ce67b", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(python):\n", + " user_prompt = (\n", + " \"Convert the following Python code into high-performance C++ optimized for Windows. \"\n", + " \"Use standard C++20 or newer with Windows-compatible libraries and best practices. \"\n", + " \"Ensure the implementation runs as fast as possible and produces identical output. \"\n", + " \"Use appropriate numeric types to avoid overflow or precision loss. \"\n", + " \"Avoid unnecessary abstraction; prefer direct computation and memory-efficient structures. 
\"\n", + " \"Respond only with C++ code, include all required headers, and limit comments to only what's essential.\\n\\n\"\n", + " )\n", + " user_prompt += python\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89c8b010-08dd-4695-a784-65162d82a24b", + "metadata": {}, + "outputs": [], + "source": [ + "def user_message_gemini(python): \n", + " return types.Content(role=\"user\", parts=[types.Part.from_text(text=user_prompt_for(python))]) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66923158-983d-46f7-ab19-f216fb1f6a87", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(python):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(python)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ab59a54-b28a-4d07-b04f-b568e6e25dfb", + "metadata": {}, + "outputs": [], + "source": [ + "def write_output(cpp):\n", + " code = cpp.replace(\"```cpp\", \"\").replace(\"```c++\", \"\").replace(\"```\", \"\").strip()\n", + " \n", + " if \"#include\" not in code:\n", + " raise ValueError(\"C++ code appears invalid: missing #include directives.\")\n", + "\n", + " with open(\"qwenOptimized.cpp\", \"w\", encoding=\"utf-8\", newline=\"\\n\") as f:\n", + " f.write(code) " + ] + }, + { + "cell_type": "markdown", + "id": "e05ea9f0-6ade-4699-b5fa-fb8ef9f16bcb", + "metadata": {}, + "source": [ + "### Python Codes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c515ce2c-1f8d-4484-8d34-9ffe1372dad4", + "metadata": {}, + "outputs": [], + "source": [ + "python_easy = \"\"\"\n", + "import time\n", + "\n", + "def calculate(iterations, param1, param2):\n", + " result = 1.0\n", + " for i in range(1, iterations+1):\n", + " j = i * param1 - param2\n", + " result -= (1/j)\n", + " j = i * param1 + param2\n", + " result += (1/j)\n", + " return result\n", 
+ "\n", + "start_time = time.time()\n", + "result = calculate(100_000_000, 4, 1) * 4\n", + "end_time = time.time()\n", + "\n", + "print(f\"Result: {result:.12f}\")\n", + "print(f\"Execution Time: {(end_time - start_time):.6f} seconds\")\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "83ab4080-71ae-45e6-970b-030dc462f571", + "metadata": {}, + "outputs": [], + "source": [ + "python_hard = \"\"\"# Be careful to support large number sizes\n", + "\n", + "def lcg(seed, a=1664525, c=1013904223, m=2**32):\n", + " value = seed\n", + " while True:\n", + " value = (a * value + c) % m\n", + " yield value\n", + " \n", + "def max_subarray_sum(n, seed, min_val, max_val):\n", + " lcg_gen = lcg(seed)\n", + " random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]\n", + " max_sum = float('-inf')\n", + " for i in range(n):\n", + " current_sum = 0\n", + " for j in range(i, n):\n", + " current_sum += random_numbers[j]\n", + " if current_sum > max_sum:\n", + " max_sum = current_sum\n", + " return max_sum\n", + "\n", + "def total_max_subarray_sum(n, initial_seed, min_val, max_val):\n", + " total_sum = 0\n", + " lcg_gen = lcg(initial_seed)\n", + " for _ in range(20):\n", + " seed = next(lcg_gen)\n", + " total_sum += max_subarray_sum(n, seed, min_val, max_val)\n", + " return total_sum\n", + "\n", + "# Parameters\n", + "n = 10000 # Number of random numbers\n", + "initial_seed = 42 # Initial seed for the LCG\n", + "min_val = -10 # Minimum value of random numbers\n", + "max_val = 10 # Maximum value of random numbers\n", + "\n", + "# Timing the function\n", + "import time\n", + "start_time = time.time()\n", + "result = total_max_subarray_sum(n, initial_seed, min_val, max_val)\n", + "end_time = time.time()\n", + "\n", + "print(\"Total Maximum Subarray Sum (20 runs):\", result)\n", + "print(\"Execution Time: {:.6f} seconds\".format(end_time - start_time))\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": 
"31498c5c-ecdd-4ed7-9607-4d09af893b98", + "metadata": {}, + "source": [ + "## Code Implementation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea4a4968-e04f-4939-8c42-32c960699354", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gemini(python):\n", + " stream = gemini_client.models.generate_content_stream(\n", + " model = MODEL_GEMINI,\n", + " config=generate_content_config,\n", + " contents=user_message_gemini(python)\n", + " )\n", + "\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.text or \"\"\n", + " cpp_code += chunk_text\n", + " yield cpp_code.replace('```cpp\\n','').replace('```','')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69601eee-520f-4813-b796-aee9118e8a72", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_codestral(python):\n", + " stream = codestral_client.chat.stream(\n", + " model = MODEL_CODESTRAL,\n", + " messages = messages_for(python), \n", + " )\n", + "\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.data.choices[0].delta.content or \"\"\n", + " cpp_code += chunk_text\n", + " yield cpp_code.replace('```cpp\\n','').replace('```','') " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb8899cf-54c0-4d2d-8772-42925c2e1d13", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_qwen(python):\n", + " stream = hf_client.chat.completions.create(\n", + " model = MODEL_QWEN,\n", + " messages = messages_for(python),\n", + " stream=True\n", + " )\n", + " cpp_code = \"\"\n", + " for chunk in stream:\n", + " chunk_text = chunk.choices[0].delta.content\n", + " cpp_code += chunk_text\n", + " yield cpp_code.replace('```cpp\\n','').replace('```','')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98862fef-905c-4b50-bc7a-4c0462495b5c", + "metadata": {}, + "outputs": [], + "source": [ + "def optimize(python, model):\n", + " if model.lower() == 'gemini':\n", 
+ " result = stream_gemini(python)\n", + " elif model.lower() == 'codestral':\n", + " result = stream_codestral(python)\n", + " elif model.lower() == 'qwen_coder':\n", + " result = stream_qwen(python)\n", + " else:\n", + " raise ValueError(\"Unknown model\")\n", + " \n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa9372df-db01-41d0-842c-4857b20f93f0", + "metadata": {}, + "outputs": [], + "source": [ + "custom_css = \"\"\"\n", + ".scrollable-box textarea {\n", + " overflow: auto !important;\n", + " height: 400px;\n", + "}\n", + "\n", + ".python {background-color: #306998;}\n", + ".cpp {background-color: #050;}\n", + "\n", + "\"\"\"\n", + "\n", + "theme = gr.themes.Soft()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dbcf9fe9-c3da-466b-8478-83dcdbe7d48e", + "metadata": {}, + "outputs": [], + "source": [ + "def execute_python(code):\n", + " try:\n", + " result = subprocess.run(\n", + " [\"python\", \"-c\", code],\n", + " capture_output=True,\n", + " text=True,\n", + " timeout=60\n", + " )\n", + " if result.returncode == 0:\n", + " return result.stdout or \"[No output]\"\n", + " else:\n", + " return f\"[Error]\\n{result.stderr}\"\n", + " except subprocess.TimeoutExpired:\n", + " return \"[Error] Execution timed out.\"\n", + " except Exception as e:\n", + " return f\"[Exception] {str(e)}\" " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8029e00d-1ee8-43d1-8c87-2aa0544cf94c", + "metadata": {}, + "outputs": [], + "source": [ + "def execute_cpp(code):\n", + " write_output(code)\n", + " \n", + " try:\n", + " compile_cmd = [\"g++\", \"-O3\", \"-std=c++20\", \"-o\", \"optimized.exe\", \"qwenOptimized.cpp\"]\n", + " compile_result = subprocess.run(compile_cmd, capture_output=True, text=True, check=True)\n", + " \n", + " run_cmd = [\"optimized.exe\"]\n", + " run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True, 
timeout=60)\n", + " \n", + " return run_result.stdout or \"[No output]\"\n", + " \n", + " except subprocess.CalledProcessError as e:\n", + " return f\"[Compile/Runtime Error]\\n{e.stderr}\"\n", + " except subprocess.TimeoutExpired:\n", + " return \"[Error] Execution timed out.\"\n", + " except Exception as e:\n", + " return f\"[Exception] {str(e)}\" " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5f4e88c-be15-4870-9f99-82b6273ee739", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks(css=custom_css, theme=theme) as ui:\n", + " gr.Markdown(\"## Convert code from Python to C++\")\n", + " with gr.Row():\n", + " python = gr.Textbox(label=\"Python code:\", lines=10, value=python_hard, elem_classes=[\"scrollable-box\"])\n", + " cpp = gr.Textbox(label=\"C++ code:\", lines=10, elem_classes=[\"scrollable-box\"])\n", + " with gr.Row():\n", + " model = gr.Dropdown([\"Gemini\", \"Codestral\", \"QWEN_Coder\"], label=\"Select model\", value=\"Gemini\")\n", + " convert = gr.Button(\"Convert code\")\n", + " with gr.Row():\n", + " python_run = gr.Button(\"Run Python\")\n", + " cpp_run = gr.Button(\"Run C++\")\n", + " with gr.Row():\n", + " python_out = gr.TextArea(label=\"Python result:\", elem_classes=[\"python\"])\n", + " cpp_out = gr.TextArea(label=\"C++ result:\", elem_classes=[\"cpp\"])\n", + "\n", + " convert.click(optimize, inputs=[python,model], outputs=[cpp])\n", + " python_run.click(execute_python,inputs=[python], outputs=[python_out])\n", + " cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])\n", + "\n", + "ui.launch(inbrowser=True) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa1a231e-2743-4cee-afe2-783d2b9513e5", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/Week4_generate_comments_and_tests-DP.ipynb b/week4/community-contributions/Week4_generate_comments_and_tests-DP.ipynb new file mode 100644 index 0000000..09efe1d --- /dev/null +++ b/week4/community-contributions/Week4_generate_comments_and_tests-DP.ipynb @@ -0,0 +1,538 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "3e473bbd-a0c2-43bd-bf99-c749784d00c3", + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "import openai\n", + "import anthropic\n", + "import google.generativeai as genai\n", + "import requests\n", + "import json\n", + "import os\n", + "from typing import Dict, Any, Optional\n", + "import asyncio\n", + "from dotenv import load_dotenv" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "16210512-41f1-4de3-8348-2cd7129e023f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# load API\n", + "load_dotenv(override=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "6747e275-91eb-4d2b-90b6-805f2bd9b6b7", + "metadata": {}, + "outputs": [], + "source": [ + "class CodeCommenter:\n", + " def __init__(self):\n", + " # Initialize API clients\n", + " self.openai_client = None\n", + " self.anthropic_client = None\n", + " self.gemini_client = None\n", + " \n", + " # Load API keys from environment variables\n", + " self.setup_clients()\n", + " \n", + " def setup_clients(self):\n", + " \"\"\"Initialize API clients with keys from environment variables\"\"\"\n", + " try:\n", + " # OpenAI\n", + " openai_key = os.getenv('OPENAI_API_KEY')\n", + " if openai_key:\n", + " 
self.openai_client = openai.OpenAI(api_key=openai_key)\n", + " \n", + " # Anthropic\n", + " anthropic_key = os.getenv('ANTHROPIC_API_KEY')\n", + " if anthropic_key:\n", + " self.anthropic_client = anthropic.Anthropic(api_key=anthropic_key)\n", + " \n", + " # Google Gemini\n", + " gemini_key = os.getenv('GOOGLE_API_KEY')\n", + " if gemini_key:\n", + " genai.configure(api_key=gemini_key)\n", + " self.gemini_client = genai.GenerativeModel('gemini-2.0-flash-exp')\n", + " \n", + " except Exception as e:\n", + " print(f\"Warning: Error setting up API clients: {e}\")\n", + " \n", + " def create_comments_prompt(self, code: str, language: str) -> str:\n", + " \"\"\"Create a prompt for the LLM to add comments and docstrings\"\"\"\n", + " return f\"\"\"Please add detailed and helpful comments and docstrings to the following {language} code. \n", + " \n", + "Guidelines:\n", + "1. Add comprehensive docstrings for functions, classes, and modules\n", + "2. Add inline comments explaining complex logic\n", + "3. Follow the commenting conventions for {language}\n", + "4. Maintain the original code structure and functionality\n", + "5. Make comments clear and professional\n", + "6. Don't change the actual code logic, only add comments\n", + "7. Do not add code markdown delimiters like ```python\n", + "\n", + "Here's the code to comment:\n", + "\n", + "{code}\n", + "\n", + "Please return only the commented code without any additional explanation or markdown formatting.\"\"\"\n", + "\n", + " def create_tests_prompt(self, code: str, language: str) -> str:\n", + " \"\"\"Create a prompt for the LLM to generate unit tests\"\"\"\n", + " return f\"\"\"Please generate comprehensive unit tests for the following {language} code.\n", + " \n", + "Guidelines:\n", + "1. Use appropriate testing framework for {language} (pytest for Python, JUnit for Java, etc.)\n", + "2. Create tests for all functions and methods\n", + "3. Include both positive and negative test cases\n", + "4. 
Test edge cases and error conditions\n", + "5. Use meaningful test names that describe what is being tested\n", + "6. Include setup and teardown methods if needed\n", + "7. Add mock objects for external dependencies (like database connections)\n", + "8. Do not add code markdown delimiters like ```python\n", + "9. Follow testing best practices for {language}\n", + "\n", + "Here's the code to test:\n", + "\n", + "{code}\n", + "\n", + "Please return only the unit test code without any additional explanation or markdown formatting.\"\"\"\n", + "\n", + " def create_combined_prompt(self, code: str, language: str) -> str:\n", + " \"\"\"Create a prompt for the LLM to add both comments and unit tests\"\"\"\n", + " return f\"\"\"Please add detailed comments and docstrings to the following {language} code AND generate comprehensive unit tests for it.\n", + " \n", + "For Comments:\n", + "1. Add comprehensive docstrings for functions, classes, and modules\n", + "2. Add inline comments explaining complex logic\n", + "3. Follow the commenting conventions for {language}\n", + "4. Don't change the actual code logic, only add comments\n", + "\n", + "For Unit Tests:\n", + "1. Use appropriate testing framework for {language} (pytest for Python, JUnit for Java, etc.)\n", + "2. Create tests for all functions and methods\n", + "3. Include both positive and negative test cases\n", + "4. Test edge cases and error conditions\n", + "5. Add mock objects for external dependencies (like database connections)\n", + "6. Follow testing best practices for {language}\n", + "\n", + "Structure your response as:\n", + "1. First, provide the original code with added comments and docstrings \n", + "2. Then, provide the unit tests as a separate section\n", + "3. Do not add code markdown delimiters like ```python\n", + "4. 
The 2 separated portions of code, comments and unit test should be clearly demarcated by comments specifying the following section purpose\n", + "\n", + "Here's the code:\n", + "\n", + "{code}\n", + "\n", + "Please return the commented code followed by the unit tests, clearly separated.\"\"\"\n", + "\n", + " def call_openai(self, prompt: str, model: str = \"gpt-4o-mini\") -> str:\n", + " \"\"\"Make API call to OpenAI\"\"\"\n", + " if not self.openai_client:\n", + " return \"Error: OpenAI API key not configured. Please set OPENAI_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.openai_client.chat.completions.create(\n", + " model=model,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful coding assistant that adds detailed comments, docstrings, and generates unit tests for code.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ],\n", + " max_tokens=4000,\n", + " temperature=0.1\n", + " )\n", + " return response.choices[0].message.content.strip()\n", + " except Exception as e:\n", + " return f\"Error calling OpenAI API: {str(e)}\"\n", + " \n", + " def call_anthropic(self, prompt: str, model: str = \"claude-3-5-haiku-20241022\") -> str:\n", + " \"\"\"Make API call to Anthropic Claude\"\"\"\n", + " if not self.anthropic_client:\n", + " return \"Error: Anthropic API key not configured. 
Please set ANTHROPIC_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.anthropic_client.messages.create(\n", + " model=model,\n", + " max_tokens=4000,\n", + " temperature=0.1,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " return response.content[0].text.strip()\n", + " except Exception as e:\n", + " return f\"Error calling Anthropic API: {str(e)}\"\n", + " \n", + " def call_gemini(self, prompt: str) -> str:\n", + " \"\"\"Make API call to Google Gemini\"\"\"\n", + " if not self.gemini_client:\n", + " return \"Error: Google API key not configured. Please set GOOGLE_API_KEY environment variable.\"\n", + " \n", + " try:\n", + " response = self.gemini_client.generate_content(\n", + " prompt,\n", + " generation_config=genai.types.GenerationConfig(\n", + " max_output_tokens=4000,\n", + " temperature=0.1,\n", + " )\n", + " )\n", + " return response.text.strip()\n", + " except Exception as e:\n", + " return f\"Error calling Gemini API: {str(e)}\"\n", + " \n", + " def call_ollama(self, prompt: str, model: str = \"llama3.2:latest\") -> str:\n", + " \"\"\"Make API call to Ollama (local)\"\"\"\n", + " try:\n", + " url = \"http://localhost:11434/api/generate\"\n", + " data = {\n", + " \"model\": model,\n", + " \"prompt\": prompt,\n", + " \"stream\": False,\n", + " \"options\": {\n", + " \"temperature\": 0.1,\n", + " \"num_predict\": 4000\n", + " }\n", + " }\n", + " \n", + " response = requests.post(url, json=data, timeout=60)\n", + " if response.status_code == 200:\n", + " result = response.json()\n", + " return result.get('response', '').strip()\n", + " else:\n", + " return f\"Error calling Ollama API: HTTP {response.status_code}\"\n", + " except requests.exceptions.ConnectionError:\n", + " return \"Error: Could not connect to Ollama. 
Make sure Ollama is running locally on port 11434.\"\n", + " except Exception as e:\n", + " return f\"Error calling Ollama API: {str(e)}\"\n", + "\n", + " def process_code(self, language: str, code: str, llm: str, generate_comments: bool, generate_tests: bool) -> str:\n", + " \"\"\"Process the given code based on selected options\"\"\"\n", + " if not code.strip():\n", + " return \"Error: Please provide code to process.\"\n", + " \n", + " if not generate_comments and not generate_tests:\n", + " return \"Error: Please select at least one option (Generate comments or Generate test units).\"\n", + " \n", + " # Determine which prompt to use\n", + " if generate_comments and generate_tests:\n", + " prompt = self.create_combined_prompt(code, language)\n", + " elif generate_comments:\n", + " prompt = self.create_comments_prompt(code, language)\n", + " else: # generate_tests only\n", + " prompt = self.create_tests_prompt(code, language)\n", + " \n", + " # Route to appropriate LLM\n", + " if llm == \"gpt-4o-mini\":\n", + " return self.call_openai(prompt, \"gpt-4o-mini\")\n", + " elif llm == \"claude-3-5-haiku-20241022\":\n", + " return self.call_anthropic(prompt, \"claude-3-5-haiku-20241022\")\n", + " elif llm == \"gemini-2.0-flash\":\n", + " return self.call_gemini(prompt)\n", + " elif llm == \"ollama:llama3.2:latest\":\n", + " return self.call_ollama(prompt, \"llama3.2:latest\")\n", + " else:\n", + " return f\"Error: Unsupported LLM: {llm}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "813f0911-d53f-4887-9341-656712e32d8f", + "metadata": {}, + "outputs": [], + "source": [ + "def create_gradio_interface():\n", + " \"\"\"Create and configure the Gradio interface\"\"\"\n", + " commenter = CodeCommenter()\n", + " \n", + " # Define the main function for the interface\n", + " def process_code_interface(language, code, llm, generate_comments, generate_tests):\n", + " \"\"\"Process the code and return processed version based on selected options\"\"\"\n", + 
" if not code.strip():\n", + " return \"Please enter some code to process.\"\n", + " \n", + " if not generate_comments and not generate_tests:\n", + " return \"Please select at least one option: Generate comments or Generate test units.\"\n", + " \n", + " # Show processing message\n", + " options = []\n", + " if generate_comments:\n", + " options.append(\"comments\")\n", + " if generate_tests:\n", + " options.append(\"unit tests\")\n", + " \n", + " processing_msg = f\"Processing {language} code with {llm} to generate {' and '.join(options)}...\"\n", + " print(processing_msg)\n", + " \n", + " # Process the code\n", + " result = commenter.process_code(language, code, llm, generate_comments, generate_tests)\n", + " return result\n", + " \n", + " # Define default code\n", + " default_code = \"\"\"import pyodbc\n", + "from tabulate import tabulate\n", + "def connect_to_sql_server(server_name, database, username=None, password=None):\n", + " try:\n", + " if username and password:\n", + " connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};UID={username};PWD={password}\"\n", + " else:\n", + " connection_string = f\"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server_name};DATABASE={database};Trusted_Connection=yes\"\n", + " connection = pyodbc.connect(connection_string)\n", + " print(f\"Successfully connected to {server_name}/{database}\")\n", + " return connection\n", + " except Exception as e:\n", + " print(f\"Failed to connect to {server_name}/{database}: {str(e)}\")\n", + " return None\n", + "def get_record_count(connection, table_name):\n", + " try:\n", + " cursor = connection.cursor()\n", + " query = f\"SELECT COUNT(*) FROM {table_name}\"\n", + " cursor.execute(query)\n", + " count = cursor.fetchone()[0]\n", + " cursor.close()\n", + " print(f\"Record count for {table_name}: {count}\")\n", + " return count\n", + " except Exception as e:\n", + " print(f\"Failed to get record count for {table_name}: 
{str(e)}\")\n", + " return None\n", + "def select_top_records(connection, table_name, n):\n", + " try:\n", + " cursor = connection.cursor()\n", + " query = f\"SELECT TOP {n} * FROM {table_name}\"\n", + " cursor.execute(query)\n", + " records = cursor.fetchall()\n", + " columns = [column[0] for column in cursor.description]\n", + " cursor.close()\n", + " print(f\"Top {n} records from {table_name}\")\n", + " if records:\n", + " print(tabulate(records, headers=columns, tablefmt=\"grid\"))\n", + " return records\n", + " except Exception as e:\n", + " print(f\"Failed to retrieve top {n} records from {table_name}: {str(e)}\")\n", + " return None\n", + "conn = connect_to_sql_server(\"localhost\", \"AdventureWorks_lite\")\n", + "if conn:\n", + " total_records = get_record_count(conn, \"Sales.SalesOrderDetail\")\n", + " top_records = select_top_records(conn, \"Production.Product\", 10)\n", + " conn.close()\n", + " print(\"Connection closed successfully\")\"\"\"\n", + "\n", + " css = \"\"\"\n", + "textarea[rows]:not([rows=\"1\"]) {\n", + " overflow-y: auto !important;\n", + " scrollbar-width: thin !important;\n", + "}\n", + "textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar {\n", + " all: initial !important;\n", + " background: #f1f1f1 !important;\n", + "}\n", + "textarea[rows]:not([rows=\"1\"])::-webkit-scrollbar-thumb {\n", + " all: initial !important;\n", + " background: #a8a8a8 !important;\n", + "}\n", + "\"\"\"\n", + "\n", + " # Create the interface\n", + " with gr.Blocks(title=\"Code Commenter & Test Generator\", theme=gr.themes.Base(), css=css) as interface:\n", + " gr.Markdown(\"# 🔧 Code Commenter & Test Generator\")\n", + " gr.Markdown(\"Add detailed comments, docstrings, and/or generate unit tests for your code using various LLM models.\")\n", + " \n", + " with gr.Row():\n", + " with gr.Column():\n", + " code_input = gr.Textbox(\n", + " label=\"Input Code\",\n", + " value=default_code,\n", + " lines=15,\n", + " max_lines=20,\n", + " info=\"Enter the code you want 
to process\"\n", + " )\n", + " \n", + " with gr.Column():\n", + " code_output = gr.Textbox(\n", + " label=\"Processed Code\",\n", + " lines=20,\n", + " max_lines=20,\n", + " info=\"Your code with added comments, docstrings, and/or unit tests\"\n", + " )\n", + " \n", + " # Add checkboxes below the textboxes\n", + " with gr.Row():\n", + " with gr.Column():\n", + " generate_comments_checkbox = gr.Checkbox(\n", + " label=\"Generate comments\",\n", + " value=True,\n", + " info=\"Add detailed comments and docstrings to the code\"\n", + " )\n", + " generate_tests_checkbox = gr.Checkbox(\n", + " label=\"Generate test units\",\n", + " value=False,\n", + " info=\"Generate comprehensive unit tests for the code\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " with gr.Column(scale=1):\n", + " language_dropdown = gr.Dropdown(\n", + " choices=[\"Python\", \"Ruby\", \"Rust\", \"C++\", \"Java\"],\n", + " value=\"Python\",\n", + " label=\"Programming Language\",\n", + " info=\"Select the programming language of your code\"\n", + " )\n", + " \n", + " llm_dropdown = gr.Dropdown(\n", + " choices=[\n", + " \"gpt-4o-mini\",\n", + " \"claude-3-5-haiku-20241022\", \n", + " \"gemini-2.0-flash\",\n", + " \"ollama:llama3.2:latest\"\n", + " ],\n", + " value=\"gpt-4o-mini\",\n", + " label=\"LLM Model\",\n", + " info=\"Choose the language model to use\"\n", + " )\n", + " \n", + " generate_btn = gr.Button(\n", + " \"🚀 Process Code\", \n", + " variant=\"primary\",\n", + " size=\"lg\"\n", + " )\n", + " \n", + " # Add some API setup information\n", + " gr.Markdown(\"## 📝 API Setup Instructions\")\n", + " gr.Markdown(\"\"\"\n", + " To use this tool, you need to set up API keys as environment variables:\n", + " \n", + " - **OpenAI**: Set `OPENAI_API_KEY`\n", + " - **Anthropic**: Set `ANTHROPIC_API_KEY` \n", + " - **Google Gemini**: Set `GOOGLE_API_KEY`\n", + " - **Ollama**: Make sure Ollama is running locally on port 11434\n", + " \"\"\")\n", + " \n", + " gr.Markdown(\"## ✨ Features\")\n", + " 
gr.Markdown(\"\"\"\n", + " - **Generate Comments**: Add detailed docstrings and inline comments\n", + " - **Generate Unit Tests**: Create comprehensive test suites with mocking for external dependencies\n", + " - **Combined Mode**: Generate both comments and unit tests in one go\n", + " - **Multiple LLMs**: Choose from OpenAI, Anthropic, Google Gemini, or local Ollama models\n", + " - **Multiple Languages**: Support for Python, Ruby, Rust, C++, and Java\n", + " \"\"\")\n", + " \n", + " # Connect the button to the processing function\n", + " generate_btn.click(\n", + " fn=process_code_interface,\n", + " inputs=[language_dropdown, code_input, llm_dropdown, generate_comments_checkbox, generate_tests_checkbox],\n", + " outputs=code_output,\n", + " show_progress=True\n", + " )\n", + " \n", + " return interface" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ef461e08-c1d5-406d-b7d2-a4329f16486e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "🚀 Starting Code Commenter & Test Generator...\n", + "📋 Setting up Gradio interface...\n", + "🌐 Launching interface...\n", + "💡 The interface will open in your default browser\n", + "🔧 Make sure to set up your API keys as environment variables\n", + "* Running on local URL: http://127.0.0.1:7860\n", + "\n", + "To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(\"🚀 Starting Code Commenter & Test Generator...\")\n", + "print(\"📋 Setting up Gradio interface...\")\n", + "\n", + "# Create and launch the interface\n", + "interface = create_gradio_interface()\n", + "\n", + "print(\"🌐 Launching interface...\")\n", + "print(\"💡 The interface will open in your default browser\")\n", + "print(\"🔧 Make sure to set up your API keys as environment variables\")\n", + "\n", + "# Launch with auto-opening in browser\n", + "interface.launch(\n", + " server_name=\"127.0.0.1\",\n", + " server_port=7860,\n", + " share=False,\n", + " inbrowser=True,\n", + " show_error=True\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/code_commentor.ipynb b/week4/community-contributions/code_commentor.ipynb new file mode 100644 index 0000000..3bf10a5 --- /dev/null +++ b/week4/community-contributions/code_commentor.ipynb @@ -0,0 +1,335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "07bb451d-2b91-425f-b8ea-6f35ced780b0", + "metadata": {}, + "source": [ + "# AI Code Commenting Assistant \n", + "\n", + "## Project Summary \n", + "\n", + "**Purpose**: \n", + "An AI-powered assistant that automatically generates **clear, concise code comments** to improve code readability and maintainability. 
\n", + "\n", + "**Key Features**: \n", + "- **Language-Agnostic**: Auto-detects programming languages or allows manual specification \n", + "- **Smart Commenting**: Focuses on explaining **complex logic, algorithms, and edge cases** (not obvious syntax) \n", + "- **Customizable**: Optional focus areas let users prioritize specific parts (e.g., database queries, recursion) \n", + "- **Efficient Workflow**: Processes code in chunks and preserves original formatting \n", + "\n", + "**Benefits**: \n", + "✔ Saves time writing documentation \n", + "✔ Helps developers understand unfamiliar code \n", + "✔ Supports multiple languages (Python, JavaScript, C++, SQL, etc.) \n", + "✔ Avoids redundant comments on trivial operations \n", + "\n", + "**Example Use Case**: \n", + "```python \n", + "# Before AI: \n", + "def fib(n): \n", + " if n <= 1: return n \n", + " else: return fib(n-1) + fib(n-2) \n", + "\n", + "# After AI: \n", + "def fib(n): \n", + " # Recursively computes nth Fibonacci number (O(2^n) time) \n", + " if n <= 1: return n # Base case \n", + " else: return fib(n-1) + fib(n-2) # Recursive case " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0413ae1-0348-4884-ba95-384c4c8f841c", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install --upgrade huggingface_hub" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b22da766-042b-402f-9e05-78aa8f45ddd4", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import io\n", + "from dotenv import load_dotenv\n", + "from google import genai\n", + "from google.genai import types\n", + "from openai import OpenAI\n", + "from anthropic import Anthropic\n", + "from huggingface_hub import InferenceClient\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5af6d3de-bab6-475e-b2f3-7b788bb2e529", + "metadata": {}, + "outputs": [], + "source": [ + "# load environments\n", + "load_dotenv(override=True)\n", + 
"os.environ['ANTHROPIC_API_KEY'] = os.getenv(\"CLAUDE_API_KEY\")\n", + "os.environ[\"HF_TOKEN\"] = os.getenv(\"HF_TOKEN\")\n", + "gemini_api_key= os.getenv(\"GEMINI_API_KEY\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cad0755e-4174-4fbc-84e6-15cc54bc609a", + "metadata": {}, + "outputs": [], + "source": [ + "#initialize remote models\n", + "claude= Anthropic()\n", + "gemini = genai.Client(api_key=gemini_api_key)\n", + "\n", + "#opensource models\n", + "qwen = InferenceClient(provider=\"featherless-ai\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31d75812-1cd3-4512-8446-022c3357c354", + "metadata": {}, + "outputs": [], + "source": [ + "#initialize local model\n", + "llama = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31316379-2a56-4707-b207-ea60b490f536", + "metadata": {}, + "outputs": [], + "source": [ + "#models\n", + "claude_model = \"claude-3-5-haiku-latest\"\n", + "gemini_model = \"gemini-2.5-pro\"\n", + "qwen_model= \"Qwen/Qwen2.5-Coder-32B-Instruct\"\n", + "llama_model = \"llama3:8b\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7d9c4bf-0955-4406-8717-ffa7bdd0bec9", + "metadata": {}, + "outputs": [], + "source": [ + "system_message=\"\"\"\n", + "You are an expert AI specialized in code documentation. Your task is to generate concise, meaningful comments that explain the purpose and logic of provided code. Follow these rules:\n", + "\n", + "1. **Infer language**: Auto-detect programming language and use appropriate comment syntax\n", + "2. **Explain why, not what**: Focus on purpose, edge cases, and non-obvious logic\n", + "3. **Be concise**: Maximum 1-2 sentences per comment block\n", + "4. **Prioritize key sections**: Only comment complex logic, algorithms, or critical operations\n", + "5. **Maintain structure**: Preserve original code formatting and indentation\n", + "6. 
**Output format**: Return ONLY commented code with no additional text\n", + "\n", + "Commenting guidelines by language:\n", + "- Python: `# Inline comments` and `\"\"Docstrings\"\"`\n", + "- JavaScript/Java: `// Line comments` and `/* Block comments */`\n", + "- C/C++: `//` and `/* */`\n", + "- SQL: `-- Line comments`\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "79dfe110-1523-40c7-ad90-2787ed22fd8d", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt(code):\n", + " prompt = f\"\"\"\n", + " i want to document my code for better understanding. Please generate meaningful necessary comments\n", + " here is my code:\n", + " {code}\n", + "\n", + " Return ONLY commented code with no additional text\n", + " \"\"\"\n", + "\n", + " return prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c7bcf29e-ec78-4cfd-9b41-f2dc86400435", + "metadata": {}, + "outputs": [], + "source": [ + "def conversation_template(code):\n", + " messages = [\n", + " {\"role\":\"system\", \"content\":system_message},\n", + " {\"role\":\"user\",\"content\":user_prompt(code)}\n", + " ]\n", + " return messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a36fec0f-7eba-4ccd-8fc4-cbf5ade76fa2", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gemini(code):\n", + " message = user_prompt(code)\n", + " response = gemini.models.generate_content_stream(\n", + " model=gemini_model,\n", + " config= types.GenerateContentConfig(\n", + " system_instruction = system_message,\n", + " temperature = 0.8,\n", + " ),\n", + " contents = [message]\n", + " )\n", + "\n", + " result = \"\"\n", + " for chunk in response:\n", + " result += chunk.text or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5d1e0c0-dc88-43ee-8698-82ad9ce7c51b", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_claude(code):\n", + " messages = 
[{\"role\":\"user\",\"content\":user_prompt(code)}]\n", + " response = claude.messages.stream(\n", + " model=claude_model, system=system_message,\n", + " temperature=0.8,\n", + " messages = messages,\n", + " max_tokens=5000\n", + " )\n", + "\n", + " result = \"\"\n", + " with response as stream:\n", + " for text in stream.text_stream:\n", + " result += text or \"\"\n", + " yield result\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "903c97e5-9170-449e-8a0f-9f906351ec45", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_opensource(code,model):\n", + " model = model.lower()\n", + " client = globals()[model]\n", + " model = globals()[f\"{model}_model\"]\n", + " stream = client.chat.completions.create(\n", + " model = model,\n", + " messages= conversation_template(code),\n", + " temperature = 0.7,\n", + " stream = True\n", + " )\n", + "\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.choices[0].delta.content or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff051c22-a2f8-4153-b970-f8a466a4cf5a", + "metadata": {}, + "outputs": [], + "source": [ + "def commentor(code, model):\n", + " model =model.lower()\n", + " if model == \"claude\":\n", + " result = stream_claude(code)\n", + " elif model == \"gemini\":\n", + " result = stream_gemini(code)\n", + " elif model == \"qwen\" or model == \"llama\":\n", + " result = stream_opensource(code, model)\n", + "\n", + "\n", + " for code in result:\n", + " yield code.replace(\"```cpp\\n\",\"\").replace(\"```python\\n\",\"\").replace(\"```javascript\\n\",\"\").replace(\"```typescript\\n\",\"\").replace(\"```\",\"\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10daf070-3546-4073-a2a0-3f5f8fc156f0", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as ui:\n", + " gr.Markdown(\"# Generate comment\")\n", + " with gr.Row():\n", + " raw_code = gr.Textbox(label=\"Raw Code:\", lines=10)\n", +
" commented_code = gr.Textbox(label=\"Commented Code:\",lines=10)\n", + " with gr.Row():\n", + " models = gr.Dropdown([\"Gemini\",\"Claude\",\"Llama\",\"Qwen\"], value=\"Gemini\")\n", + " with gr.Row():\n", + " generate_comment = gr.Button(\"Generate Comment\")\n", + "\n", + " generate_comment.click(commentor, inputs=[raw_code, models], outputs=[commented_code])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afb87f32-f25e-40c5-844a-d2b7af748192", + "metadata": {}, + "outputs": [], + "source": [ + "ui.launch(inbrowser=True,debug=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96bc48ad-10ad-4821-b58e-ea1b22cdcdc9", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/day5_java_code_commenter.ipynb b/week4/community-contributions/day5_java_code_commenter.ipynb new file mode 100644 index 0000000..49ef719 --- /dev/null +++ b/week4/community-contributions/day5_java_code_commenter.ipynb @@ -0,0 +1,300 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "45ca91c2", + "metadata": {}, + "source": [ + "# AI tool to add comments to the provided Java code\n", + "\n", + "Here we build a Gradio app that uses frontier models to add comments to Java code. For testing purposes I used the *cheaper* versions of the models, not the ones the leaderboards rank as the best."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f44901f5", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import google.generativeai as genai\n", + "import anthropic\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c47706b3", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35446b9a", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "genai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e899efd", + "metadata": {}, + "outputs": [], + "source": [ + "OPENAI_MODEL = \"gpt-4o-mini\"\n", + "CLAUDE_MODEL = \"claude-3-haiku-20240307\"\n", + "GEMINI_MODEL = 'gemini-2.0-flash-lite'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47640f53", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are an assistant that adds comments to java code. \"\n", + "system_message += \"Do not make any changes to the code itself. \"\n", + "system_message += \"Use comments sparingly. Only add them in places where they help to understand how the code works. Do not comment every single line of the code.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f41ccbf0", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(code):\n", + " user_prompt = \"Add helpful comments to this java code.
\"\n", + " user_prompt += \"Do not change the code itself.\\n\\n\"\n", + " user_prompt += code\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c57c0000", + "metadata": {}, + "outputs": [], + "source": [ + "test_code = \"\"\"\n", + "package com.hma.kafkaproducertest.producer;\n", + "\n", + "import com.hma.kafkaproducertest.model.TestDTO;\n", + "import org.springframework.cloud.stream.function.StreamBridge;\n", + "import org.springframework.messaging.Message;\n", + "import org.springframework.messaging.support.MessageBuilder;\n", + "import org.springframework.stereotype.Component;\n", + "\n", + "import java.util.Arrays;\n", + "import java.util.Comparator;\n", + "import java.util.StringJoiner;\n", + "import java.util.stream.Collectors;\n", + "import java.util.stream.IntStream;\n", + "\n", + "@Component\n", + "public class TestProducer {\n", + "\n", + " public static final String EVENT_TYPE_HEADER = \"event-type\";\n", + " private static final String BINDING_NAME = \"testProducer-out-0\";\n", + "\n", + " private final StreamBridge streamBridge;\n", + "\n", + " public TestProducer(StreamBridge streamBridge) {\n", + " this.streamBridge = streamBridge;\n", + " }\n", + "\n", + " public void sendMessage(TestDTO payload, String eventType){\n", + " Message message = MessageBuilder\n", + " .withPayload(payload)\n", + " .setHeader(EVENT_TYPE_HEADER, eventType)\n", + " .build();\n", + "\n", + " streamBridge.send(BINDING_NAME, message);\n", + " }\n", + "\n", + " public void test(String t1, String t2) {\n", + " var s = t1.length() > t2.length() ? t2 : t1;\n", + " var l = t1.length() > t2.length() ? 
t1 : t2;\n", + " var res = true;\n", + " for (int i = 0; i < s.length(); i++) {\n", + " if (s.charAt(i) == l.charAt(i)) {\n", + " res = false;\n", + " break;\n", + " }\n", + " }\n", + " System.out.println(res);\n", + " }\n", + "}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00c71128", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gpt(code):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(code)}\n", + " ]\n", + " stream = openai.chat.completions.create(\n", + " model=OPENAI_MODEL,\n", + " messages=messages,\n", + " stream=True\n", + " )\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.choices[0].delta.content or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ca92f8a8", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_claude(code):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=2000,\n", + " system=system_message,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": user_prompt_for(code)},\n", + " ],\n", + " )\n", + " response = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " response += text or \"\"\n", + " yield response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dffed4b", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gemini(code):\n", + " gemini = genai.GenerativeModel(\n", + " model_name=GEMINI_MODEL,\n", + " system_instruction=system_message\n", + " )\n", + " stream = gemini.generate_content(user_prompt_for(code), stream=True)\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.text or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31f9c267", + "metadata": {}, + "outputs": [], + "source": [ + "def comment_code(code, 
model):\n", + " if model==\"GPT\":\n", + " result = stream_gpt(code)\n", + " elif model==\"Claude\":\n", + " result = stream_claude(code)\n", + " elif model==\"Gemini\":\n", + " result = stream_gemini(code)\n", + " else:\n", + " raise ValueError(\"Unknown model\")\n", + " yield from result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c04c0a1b", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as ui:\n", + " with gr.Row():\n", + " original_code = gr.Textbox(label=\"Java code:\", lines=10, value=test_code)\n", + " commented_code = gr.Markdown(label=\"Commented code:\")\n", + " with gr.Row():\n", + " model = gr.Dropdown([\"GPT\", \"Claude\", \"Gemini\"], label=\"Select model\", value=\"GPT\")\n", + " comment = gr.Button(\"Comment code\")\n", + "\n", + " comment.click(comment_code, inputs=[original_code, model], outputs=[commented_code])\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84d33a5f", + "metadata": {}, + "outputs": [], + "source": [ + "ui.close()" + ] + }, + { + "cell_type": "markdown", + "id": "bbd50bf7", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "In my personal opinion, at least when using these *cheaper* versions of the models, the result provided by Claude is the best. ChatGPT adds way too many comments even if the system message discourages that. Gemini provides a good result also, but maybe adds a tad too few comments -- although that certainly depends on your personal preferences." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llms", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/day5_java_unit_test_generator.ipynb b/week4/community-contributions/day5_java_unit_test_generator.ipynb new file mode 100644 index 0000000..39e30e3 --- /dev/null +++ b/week4/community-contributions/day5_java_unit_test_generator.ipynb @@ -0,0 +1,281 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "45ca91c2", + "metadata": {}, + "source": [ + "# AI tool to generate unit tests for the provided Java code\n", + "\n", + "Here we build a Gradio app that uses frontier models to generate unit tests for Java code. For testing purposes I used the *cheaper* versions of the models, not the ones the leaderboards rank as the best."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f44901f5", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import google.generativeai as genai\n", + "import anthropic\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c47706b3", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35446b9a", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "genai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e899efd", + "metadata": {}, + "outputs": [], + "source": [ + "OPENAI_MODEL = \"gpt-4o-mini\"\n", + "CLAUDE_MODEL = \"claude-3-haiku-20240307\"\n", + "GEMINI_MODEL = 'gemini-2.0-flash-lite'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47640f53", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are an assistant that generates unit tests for java code.
\"\n", + "system_message += \"Generate one JUnit5 test class with all the relevant test cases in it.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f41ccbf0", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(code):\n", + " user_prompt = \"Generate unit tests for this java code.\\n\\n\"\n", + " user_prompt += code\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c57c0000", + "metadata": {}, + "outputs": [], + "source": [ + "test_code = \"\"\"\n", + "package com.hma.kafkaproducertest.rest;\n", + "\n", + "import com.hma.kafkaproducertest.model.TestDTO;\n", + "import com.hma.kafkaproducertest.producer.TestProducer;\n", + "import org.springframework.web.bind.annotation.*;\n", + "\n", + "@RestController\n", + "@RequestMapping(\"/api\")\n", + "public class TestController {\n", + "\n", + " private final TestProducer producer;\n", + "\n", + " public TestController(TestProducer producer) {\n", + " this.producer = producer;\n", + " }\n", + "\n", + " @PostMapping(\"/event\")\n", + " public TestDTO triggerKafkaEvent(@RequestBody TestDTO payload) {\n", + " producer.sendMessage(payload, \"test\");\n", + " return payload;\n", + " }\n", + "\n", + "}\n", + "\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00c71128", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gpt(code):\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(code)}\n", + " ]\n", + " stream = openai.chat.completions.create(\n", + " model=OPENAI_MODEL,\n", + " messages=messages,\n", + " stream=True\n", + " )\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.choices[0].delta.content or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ca92f8a8", + "metadata": {}, + "outputs": [], + "source": [ + "def 
stream_claude(code):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=2000,\n", + " system=system_message,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": user_prompt_for(code)},\n", + " ],\n", + " )\n", + " response = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " response += text or \"\"\n", + " yield response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dffed4b", + "metadata": {}, + "outputs": [], + "source": [ + "def stream_gemini(code):\n", + " gemini = genai.GenerativeModel(\n", + " model_name=GEMINI_MODEL,\n", + " system_instruction=system_message\n", + " )\n", + " stream = gemini.generate_content(user_prompt_for(code), stream=True)\n", + " result = \"\"\n", + " for chunk in stream:\n", + " result += chunk.text or \"\"\n", + " yield result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31f9c267", + "metadata": {}, + "outputs": [], + "source": [ + "def generate_tests(code, model):\n", + " if model==\"GPT\":\n", + " result = stream_gpt(code)\n", + " elif model==\"Claude\":\n", + " result = stream_claude(code)\n", + " elif model==\"Gemini\":\n", + " result = stream_gemini(code)\n", + " else:\n", + " raise ValueError(\"Unknown model\")\n", + " yield from result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c04c0a1b", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as ui:\n", + " with gr.Row():\n", + " original_code = gr.Textbox(label=\"Java code:\", lines=10, value=test_code)\n", + " generated_code = gr.Markdown(label=\"Unit tests:\")\n", + " with gr.Row():\n", + " model = gr.Dropdown([\"GPT\", \"Claude\", \"Gemini\"], label=\"Select model\", value=\"GPT\")\n", + " generate = gr.Button(\"Generate tests\")\n", + "\n", + " generate.click(generate_tests, inputs=[original_code, model], outputs=[generated_code])\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "84d33a5f", + "metadata": {}, + "outputs": [], + "source": [ + "ui.close()" + ] + }, + { + "cell_type": "markdown", + "id": "bbd50bf7", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "The models are missing some information as the `TestDTO` is not defined in the code provided as an input.\n", + "\n", + "Results:\n", + "- Gemini: Generates a well-constructed test class with multiple test cases covering scenarios with valid and invalid inputs. It makes assumptions about the content of `TestDTO` and adds a note about those as a comment.\n", + "- Claude: Similar approach to the unknown format of `TestDTO`, although no comment is added about the assumptions made. The test cases are structured differently, and they don't cover any case of invalid input, which in my opinion is an important test for a REST endpoint.\n", + "- GPT: While the other two generated *real* unit tests using the Mockito extension, GPT generated a *webMVC* test. The other two relied on the equality implementation of `TestDTO`, while GPT checks each field in the response separately. As this type of test spins up the application context, the test won't run without additional configuration. In addition, some imports are missing from the test file.\n", + "\n", + "It comes down to personal preferences, but I would give the point to Gemini for this one."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llms", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/wk4-final-passwordgen.ipynb b/week4/community-contributions/wk4-final-passwordgen.ipynb new file mode 100644 index 0000000..98f7b26 --- /dev/null +++ b/week4/community-contributions/wk4-final-passwordgen.ipynb @@ -0,0 +1,337 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "cc7674a9-6164-4424-85a9-f669454cfd2a", + "metadata": {}, + "source": [ + "I used this project to play about with Gradio Blocks a little bit as it had more inputs than the other projects I've done.\n", + "It's a password generator which I have no doubt I will use!"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04c8d2dd-cb9a-4b18-b12d-48ed2f39679a", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import requests\n", + "import google.generativeai\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04521351-f220-42fe-9dc5-d0be80c95dd7", + "metadata": {}, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", + "if openai_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"OpenAI key issue\")\n", + "\n", + "claude_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n", + "\n", + "if claude_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"Claude key issue\")\n", + "\n", + "google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "\n", + "if google_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"Google key issue\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70fd3748-e6b6-4ac2-89a5-ef65ed7e41a3", + "metadata": {}, + "outputs": [], + "source": [ + "# initialise\n", + "\n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "google.generativeai.configure()\n", + "\n", + "OPENAI_MODEL = \"gpt-4o\"\n", + "CLAUDE_MODEL = \"claude-sonnet-4-20250514\"\n", + "GOOGLE_MODEL = \"gemini-2.0-flash\"\n", + "\n", + "max_tok = 500" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a448651-e426-4c3c-96f7-d69975dc7b10", + "metadata": {}, + "outputs": [], + "source": [ + "#Prompts\n", + "\n", + "def pass_system_prompt(required_len, spec_char=\"Y\",num_char=\"Y\",min_lowercase=1,min_uppercase=1):\n", + "\n", + " system_prompt = f\"\"\"You are a secure password generator. 
Your task is to create a single, cryptographically strong password that meets ALL specified requirements.\n", + " \n", + "CRITICAL REQUIREMENTS:\n", + "- Length: EXACTLY {required_len} characters\n", + "- Must include: At least {min_lowercase} lowercase letter(s) AND at least {min_uppercase} uppercase letter(s)\n", + "- Special characters: {'REQUIRED - include at least 1 char' if spec_char else 'FORBIDDEN - do not include any'}\n", + "- Numbers: {'REQUIRED - include at least 1 digit' if num_char else 'FORBIDDEN - do not include any digits'}\n", + "\n", + "SECURITY RULES:\n", + "1. Generate truly random passwords - avoid patterns, dictionary words, or predictable sequences\n", + "2. Distribute character types evenly throughout the password\n", + "3. Do not use repeated characters excessively (max 2 of same character)\n", + "4. Ensure password meets minimum complexity for each required character type\n", + "\n", + "OUTPUT FORMAT:\n", + "- Respond with ONLY the generated password\n", + "- No explanations, no additional text, just the password\n", + "- Verify the password meets ALL requirements before responding\"\"\"\n", + "\n", + " return system_prompt\n", + "\n", + "def pass_user_prompt(required_len, spec_char=\"Y\",num_char=\"Y\",min_lowercase=1,min_uppercase=1):\n", + " \n", + " user_prompt = f\"\"\"Generate a secure password with these exact specifications:\n", + " \n", + "Length: {required_len} characters\n", + "Lowercase letters: Required (minimum {min_lowercase})\n", + "Uppercase letters: Required (minimum {min_uppercase})\n", + "Numbers: {'Required (minimum 1)' if num_char else 'Not allowed'}\n", + "Special characters: {'Required (minimum 1)' if spec_char else 'Not allowed'}\n", + "\n", + "Requirements verification checklist:\n", + "✓ Exactly {required_len} characters total\n", + "✓ Contains {min_lowercase}+ lowercase letters\n", + "✓ Contains {min_uppercase}+ uppercase letters\n", + "✓ {'Contains 1+ numbers' if num_char else 'Contains NO numbers'}\n", + "✓ 
{'Contains 1+ special characters' if spec_char else 'Contains NO special characters'}\n", + "✓ No obvious patterns or dictionary words\n", + "✓ Good distribution of character types\n", + "\n", + "Generate the password now.\"\"\"\n", + "\n", + " return user_prompt\n", + " \n", + "def pass_messages(required_len, spec_char,num_char,min_lowercase,min_uppercase):\n", + " messages = [\n", + " {\"role\":\"system\",\"content\":pass_system_prompt(required_len, spec_char,num_char,min_lowercase,min_uppercase)},\n", + " {\"role\":\"user\",\"content\":pass_user_prompt(required_len, spec_char,num_char,min_lowercase,min_uppercase)}\n", + " ]\n", + "\n", + " return messages\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "857370b0-35a5-4b50-8715-86f8e781523b", + "metadata": {}, + "outputs": [], + "source": [ + "#test\n", + "\n", + "messages1 = pass_messages(12, \"N\", \"Y\",1,1)\n", + "print(messages1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59ab4279-90a8-4997-8e15-f07295856222", + "metadata": {}, + "outputs": [], + "source": [ + "def openai_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase):\n", + " response=openai.chat.completions.create(\n", + " model=OPENAI_MODEL,\n", + " max_tokens=max_tok,\n", + " messages=pass_messages(required_len, spec_char,num_char,min_lowercase,min_uppercase)\n", + " )\n", + " return response.choices[0].message.content\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5e1a41a-b03c-4408-a0f5-00529785f3d1", + "metadata": {}, + "outputs": [], + "source": [ + "def claude_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase):\n", + " response = claude.messages.create(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=max_tok,\n", + " system=pass_system_prompt(required_len, spec_char, num_char,min_lowercase,min_uppercase),\n", + " messages = [{\"role\":\"user\",\"content\":pass_user_prompt(required_len, spec_char, 
num_char,min_lowercase,min_uppercase)}]\n", + " )\n", + " return response.content[0].text\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a41a0a2-55a1-47e5-8fc0-5dd04ebd3573", + "metadata": {}, + "outputs": [], + "source": [ + "def google_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase):\n", + " message = google.generativeai.GenerativeModel(\n", + " model_name=GOOGLE_MODEL,\n", + " system_instruction=pass_system_prompt(required_len, spec_char, num_char,min_lowercase,min_uppercase)\n", + " )\n", + " response = message.generate_content(pass_user_prompt(required_len, spec_char, num_char,min_lowercase,min_uppercase))\n", + " return response.text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcd1ce50-6576-4594-8739-1d7daf602213", + "metadata": {}, + "outputs": [], + "source": [ + "#test\n", + "messages1 = openai_password_gen(12, \"N\",\"Y\",1,1)\n", + "messages2 = claude_password_gen(12,\"N\",\"Y\",1,1)\n", + "messages3= google_password_gen(12,\"N\",\"Y\",1,1)\n", + "print(\"OpenAI: \",messages1)\n", + "print(\"Claude: \", messages2)\n", + "print(\"Gemini: \", messages3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9cec429a-2355-4941-8422-480b2614009c", + "metadata": {}, + "outputs": [], + "source": [ + "# model select\n", + "\n", + "def select_model(required_len, spec_char, num_char,min_lowercase,min_uppercase,model):\n", + " if model == \"OpenAI\":\n", + " return openai_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase)\n", + " elif model == \"Claude\":\n", + " return claude_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase)\n", + " elif model == \"Gemini\":\n", + " return google_password_gen(required_len, spec_char, num_char,min_lowercase,min_uppercase)\n", + " else:\n", + " print(\"No model selected\")\n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"bef52e6d-dc50-4c91-9d56-624dfdd66276", + "metadata": {}, + "outputs": [], + "source": [ + "test = select_model(12, \"N\",\"Y\",1,1,\"OpenAI\")\n", + "\n", + "print(test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b9d3685-a1b8-470c-8f4b-e63d68a0240d", + "metadata": {}, + "outputs": [], + "source": [ + "css = \"\"\"\n", + "#password_box textarea {\n", + " background-color: #306998;\n", + " color: white;\n", + "}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81c423ec-0ca7-4c96-a2fe-02ed2b5f3839", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "with gr.Blocks(css=css) as demo:\n", + " gr.Markdown(\"Choose your password complexity requirements and run:\")\n", + " with gr.Row():\n", + " with gr.Column(min_width=150,scale=2):\n", + " with gr.Row():\n", + " required_len = gr.Number(label=\"Specify the required length\",value=12,minimum=1,maximum=30)\n", + " min_lowercase = gr.Number(label=\"the minimum lowercase letters\", value=1,minimum=0)\n", + " min_uppercase = gr.Number(label=\"the minimum uppercase letters\", value=1,minimum=0)\n", + " with gr.Column():\n", + " spec_char = gr.Checkbox(label=\"Include special characters?\",value=True)\n", + " num_char = gr.Checkbox(label=\"Include numbers?\", value=True)\n", + " with gr.Row():\n", + " with gr.Column():\n", + " model = gr.Dropdown([\"OpenAI\",\"Claude\",\"Gemini\"])\n", + " btn = gr.Button(\"Run\")\n", + " with gr.Column():\n", + " output = gr.Textbox(label=\"Password:\", elem_id=\"password_box\")\n", + " \n", + " btn.click(fn=select_model,inputs=[required_len,spec_char,num_char,min_lowercase,min_uppercase,model],outputs=output)\n", + "\n", + "demo.launch()\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d81a8318-57ef-46ae-91b7-ae63d661edd8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + 
"name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week4/community-contributions/wk4-unittest-generator.ipynb b/week4/community-contributions/wk4-unittest-generator.ipynb new file mode 100644 index 0000000..49dbb34 --- /dev/null +++ b/week4/community-contributions/wk4-unittest-generator.ipynb @@ -0,0 +1,420 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "65b3aadc-c540-4cb2-a338-d523d3f22e5b", + "metadata": {}, + "source": [ + "Unit test generator using GPT, Claude and Gemini.\n", + "This will create unit test code from Python and also run the code and provide the result (including any errors).\n", + "Note:\n", + "When I tried to use claude-sonnet-4-20250514 the results were too big and the Python was cut off (no matter how big I made the max tokens). This seemed to be the case for both examples. I've changed it to claude-3-5-sonnet-20240620 and it seems to run better."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e610bf56-a46e-4aff-8de1-ab49d62b1ad3", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import google.generativeai\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display\n", + "import gradio as gr\n", + "import sys\n", + "import io\n", + "import traceback\n", + "import unittest\n", + "import subprocess\n", + "import tempfile" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f672e1c-87e9-4865-b760-370fa605e614", + "metadata": {}, + "outputs": [], + "source": [ + "# keys\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "\n", + "if openai_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"OpenAI key issue\")\n", + "\n", + "claude_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n", + "\n", + "if claude_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"Claude key issue\")\n", + "\n", + "google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "\n", + "if google_api_key:\n", + " print(\"All good\")\n", + "else:\n", + " print(\"Google key issue\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8aa149ed-9298-4d69-8fe2-8f5de0f667da", + "metadata": {}, + "outputs": [], + "source": [ + "# initialise\n", + "\n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "google.generativeai.configure()\n", + "\n", + "OPENAI_MODEL = \"gpt-4o\"\n", + "CLAUDE_MODEL = \"claude-3-5-sonnet-20240620\" #\"claude-sonnet-4-20250514\"\n", + "GOOGLE_MODEL = \"gemini-2.0-flash\"\n", + "\n", + "max_tok = 5000" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6896636f-923e-4a2c-9d6c-fac07828a201", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = \"You are an engineer with 
responsibility for unit testing python code.\"\n", + "system_message += \" You review base python code and develop unit tests, also in python, which validate each unit of code.\"\n", + "system_message += \"\"\" The output must be in Python with both the unit tests and comments explaining the purpose of each test.\n", + "The output should not include any additional text at the start or end including \"```\". It should be possible to run the code without any updates including an execution statement.\n", + "Include the base / original python code in the response.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e7b3546-57aa-4c29-bc5d-f211970d04eb", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(python):\n", + " user_prompt = \"Review the Python code provided and develop unit tests which can be run in a jupyter lab.\"\n", + " user_prompt += \"\"\" The output must be in Python with both the unit tests and comments explaining the purpose of each test.\n", + "The output should not include any additional text at the start or end including \"```\". 
It should be possible to run the code without any updates (include an execution statement).\n", + "Include the base / original python code in the response.\"\"\"\n", + " user_prompt += python\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6190659-f54c-4951-bef4-4960f8e51cc4", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(python):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(python)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b327aa3-3277-44e1-972f-aa7158147ddd", + "metadata": {}, + "outputs": [], + "source": [ + "# python example\n", + "example = \"\"\"class BookNotAvailableError(Exception):\n", + " pass\n", + "\n", + "class Library:\n", + " def __init__(self):\n", + " self.inventory = {} # book title -> quantity\n", + " self.borrowed = {} # user -> list of borrowed book titles\n", + "\n", + " def add_book(self, title, quantity=1):\n", + " if quantity <= 0:\n", + " raise ValueError(\"Quantity must be positive\")\n", + " self.inventory[title] = self.inventory.get(title, 0) + quantity\n", + "\n", + " def borrow_book(self, user, title):\n", + " if self.inventory.get(title, 0) < 1:\n", + " raise BookNotAvailableError(f\"'{title}' is not available\")\n", + " self.inventory[title] -= 1\n", + " self.borrowed.setdefault(user, []).append(title)\n", + "\n", + " def return_book(self, user, title):\n", + " if user not in self.borrowed or title not in self.borrowed[user]:\n", + " raise ValueError(f\"User '{user}' did not borrow '{title}'\")\n", + " self.borrowed[user].remove(title)\n", + " self.inventory[title] = self.inventory.get(title, 0) + 1\n", + "\n", + " def get_available_books(self):\n", + " return {title: qty for title, qty in self.inventory.items() if qty > 0}\n", + "\n", + " def get_borrowed_books(self, user):\n", + " return self.borrowed.get(user, [])\"\"\"" + 
] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed6e624e-88a5-4f10-8ab5-f071f0ca3041", + "metadata": {}, + "outputs": [], + "source": [ + "# python example2\n", + "example2 = \"\"\"class Calculator:\n", + " def add(self, a, b):\n", + " return a + b\n", + "\n", + " def subtract(self, a, b):\n", + " return a - b\n", + "\n", + " def divide(self, a, b):\n", + " if b == 0:\n", + " raise ValueError(\"Cannot divide by zero\")\n", + " return a / b\n", + "\n", + " def multiply(self, a, b):\n", + " return a * b\n", + "\n", + "\n", + "def is_prime(n):\n", + " if n <= 1:\n", + " return False\n", + " if n <= 3:\n", + " return True\n", + " if n % 2 == 0 or n % 3 == 0:\n", + " return False\n", + " i = 5\n", + " while i * i <= n:\n", + " if n % i == 0 or n % (i + 2) == 0:\n", + " return False\n", + " i += 6\n", + " return True\n", + " \"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7d2fea8-74c6-4421-8f1e-0e76d5b201b9", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_gpt(python): \n", + " stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=messages_for(python), stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " fragment = chunk.choices[0].delta.content or \"\"\n", + " reply += fragment\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7cd84ad8-d55c-4fe0-9eeb-1895c95c4a9d", + "metadata": {}, + "outputs": [], + "source": [ + "def unit_test_claude(python):\n", + " result = claude.messages.stream(\n", + " model=CLAUDE_MODEL,\n", + " max_tokens=max_tok,\n", + " system=system_message,\n", + " messages=[{\"role\": \"user\", \"content\": user_prompt_for(python)}],\n", + " )\n", + " reply = \"\"\n", + " with result as stream:\n", + " for text in stream.text_stream:\n", + " reply += text\n", + " yield reply" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad86f652-879a-489f-9891-bdc2d97c33b0", + "metadata": {}, + 
"outputs": [], + "source": [ + "def unit_test_google(python):\n", + " model = google.generativeai.GenerativeModel(\n", + " model_name=GOOGLE_MODEL,\n", + " system_instruction=system_message\n", + " )\n", + " stream = model.generate_content(contents=user_prompt_for(python),stream=True)\n", + " reply = \"\"\n", + " for chunk in stream:\n", + " reply += chunk.text or \"\"\n", + " yield reply.replace(\"```python\\n\", \"\").replace(\"```\", \"\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "105db6f9-343c-491d-8e44-3a5328b81719", + "metadata": {}, + "outputs": [], + "source": [ + "#unit_test_gpt(example)\n", + "#unit_test_claude(example)\n", + "#unit_test_google(example)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f1ae8f5-16c8-40a0-aa18-63b617df078d", + "metadata": {}, + "outputs": [], + "source": [ + "def select_model(python, model):\n", + " if model==\"GPT\":\n", + " result = unit_test_gpt(python)\n", + " elif model==\"Claude\":\n", + " result = unit_test_claude(python)\n", + " elif model==\"Google\":\n", + " result = unit_test_google(python)\n", + " else:\n", + " raise ValueError(\"Unknown model\")\n", + " for stream_so_far in result:\n", + " yield stream_so_far " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1ddb38e-6b0a-4c37-baa4-ace0b7de887a", + "metadata": {}, + "outputs": [], + "source": [ + "# with gr.Blocks() as ui:\n", + "# with gr.Row():\n", + "# python = gr.Textbox(label=\"Python code:\", lines=10, value=example)\n", + "# test = gr.Textbox(label=\"Unit tests\", lines=10)\n", + "# with gr.Row():\n", + "# model = gr.Dropdown([\"GPT\", \"Claude\",\"Google\"], label=\"Select model\", value=\"GPT\")\n", + "# generate = gr.Button(\"Generate unit tests\")\n", + "\n", + "# generate.click(select_model, inputs=[python, model], outputs=[test])\n", + "\n", + "# ui.launch()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "389ae411-a4f6-44f2-8b26-d46a971687a7", + 
"metadata": {}, + "outputs": [], + "source": [ + "def execute_python(code):\n", + " # Capture stdout and stderr\n", + " output = io.StringIO()\n", + " sys_stdout = sys.stdout\n", + " sys_stderr = sys.stderr\n", + " sys.stdout = output\n", + " sys.stderr = output\n", + "\n", + " try:\n", + " # Compile the code first\n", + " compiled_code = compile(code, '', 'exec')\n", + "\n", + " # Prepare a namespace dict for exec environment\n", + " # Include __builtins__ so imports like 'import unittest' work\n", + " namespace = {\"__builtins__\": __builtins__}\n", + "\n", + " # Run the user's code, but expect tests will be defined here\n", + " exec(compiled_code, namespace)\n", + "\n", + " # Look for unittest.TestCase subclasses in the namespace\n", + " loader = unittest.TestLoader()\n", + " suite = unittest.TestSuite()\n", + "\n", + " for obj in namespace.values():\n", + " if isinstance(obj, type) and issubclass(obj, unittest.TestCase):\n", + " tests = loader.loadTestsFromTestCase(obj)\n", + " suite.addTests(tests)\n", + "\n", + " # Run the tests\n", + " runner = unittest.TextTestRunner(stream=output, verbosity=2)\n", + " result = runner.run(suite)\n", + "\n", + " except SystemExit as e:\n", + " # Catch sys.exit calls from unittest.main()\n", + " output.write(f\"\\nSystemExit called with code {e.code}\\n\")\n", + " except Exception as e:\n", + " # Catch other errors\n", + " output.write(f\"\\nException: {e}\\n\")\n", + " finally:\n", + " sys.stdout = sys_stdout\n", + " sys.stderr = sys_stderr\n", + "\n", + " return output.getvalue()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eca98de3-9e2f-4c23-8bb4-dbb2787a15a4", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as ui:\n", + " with gr.Row():\n", + " python = gr.Textbox(label=\"Python code:\", lines=10, value=example2)\n", + " test = gr.Textbox(label=\"Unit tests\", lines=10)\n", + " test_run = gr.Textbox(label=\"Test results\", lines=10)\n", + " with gr.Row():\n", + " model = 
gr.Dropdown([\"GPT\", \"Claude\",\"Google\"], label=\"Select model\", value=\"GPT\")\n", + " generate = gr.Button(\"Generate unit tests\")\n", + " run = gr.Button(\"Run unit tests\")\n", + "\n", + " generate.click(select_model, inputs=[python, model], outputs=[test])\n", + " run.click(execute_python, inputs=[test],outputs=[test_run])\n", + "\n", + "ui.launch()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week5/community-contributions/08_rag_qa_assistant.ipynb b/week5/community-contributions/08_rag_qa_assistant.ipynb new file mode 100644 index 0000000..2d0affb --- /dev/null +++ b/week5/community-contributions/08_rag_qa_assistant.ipynb @@ -0,0 +1,710 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "3f78498d-dbc6-4e1b-a629-9ac9e44c8dd8", + "metadata": {}, + "source": [ + "# RAG-powered Q&A agent for Insurellm employees\n", + "---\n", + "\n", + "An internal expert knowledge assistant for Insurellm employees, using Retrieval-Augmented Generation (RAG) to deliver fast, accurate, and cost-efficient answers to a wide range of internal queries.\n", + "\n", + "- 🌍 Task: Answer questions about Insurellm using naive RAG\n", + "- 🧠 Models: OpenAI GPT via LangChain\n", + "- 🔍 Retrieval: ChromaDB + OpenAI embeddings\n", + "- 🚀 Tools:\n", + " - langchain: 0.3.21\n", + " - openai: 1.69.0\n", + " - chromadb: 0.6.3\n", + " - gradio: 5.23.1\n", + " - python: 3.11.11\n", + "\n", + "- ✨ Features:\n", + "\n", + " - Loads PDF, text, and markdown files automatically\n", + " - Only updates when files actually change (saves time)\n", + " - Breaks documents 
into small, overlapping pieces for better search\n", + " - Finds the most relevant information using smart matching\n", + " - Remembers conversation history and shows where answers come from\n", + " - Only answers based on your documents (no made-up information)\n", + " - Web chat interface with streaming responses\n", + " - Handles errors gracefully and detects duplicate content\n", + " - Tracks document details and keeps everything organized\n", + " - Ready for business use with built-in quality checks\n", + "\n", + "- 📤 Output: Streaming response with sources retrieved from the knowledge base\n", + "- 🧑‍💻 Skill Level: Intermediate\n", + "- ⚙️ Hardware: ✅ CPU is sufficient — no GPU required\n", + "\n", + "🛠️ **Requirements**: 🔑 OpenAI API Key \n", + "\n", + "⚙️ **Customizable by user**\n", + "- 📝 Modify system and expansion prompts\n", + "- 📁 Drop in new company documents\n", + "- 🎯 Adjust retrieval top-k and similarity threshold\n", + "\n", + "This project currently uses a naive RAG approach, which limits the assistant's performance and accuracy. 
To improve response quality and reliability, more advanced RAG techniques will be needed — a more refined and powerful version is planned for future release.\n", + "\n", + "![](https://github.com/lisekarimi/lexo/blob/main/assets/08_naive_rag.png?raw=true)\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "b9abf112-72ca-431a-b7cf-b126e0a69a4d", + "metadata": {}, + "source": [ + "## 📥 Imports" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "abdef4fe-5055-4259-99c7-82a0525c0d35", + "metadata": {}, + "outputs": [], + "source": [ + "# Standard library imports\n", + "import os\n", + "import hashlib\n", + "from pathlib import Path\n", + "from typing import List\n", + "\n", + "# Third-party imports\n", + "import numpy as np\n", + "import plotly.graph_objects as go\n", + "from dotenv import load_dotenv\n", + "from pydantic import Field\n", + "from sklearn.manifold import TSNE\n", + "import gradio as gr\n", + "\n", + "# LangChain core imports\n", + "from langchain.document_loaders import TextLoader, PyPDFLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.schema import BaseRetriever, Document\n", + "from langchain.schema.vectorstore import VectorStoreRetriever\n", + "from langchain.callbacks.manager import CallbackManagerForRetrieverRun\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.prompts import PromptTemplate\n", + "\n", + "# LangChain integrations\n", + "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n", + "from langchain_chroma import Chroma" + ] + }, + { + "cell_type": "markdown", + "id": "79875c2d-4193-4fa8-95b8-ad128b1c84fb", + "metadata": {}, + "source": [ + "## 🔐 Load env variables and configuration" + ], + "outputs": [] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "7b5ca4dd-a1c2-4fc6-844f-7b0c83008c99", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables\n", + "load_dotenv(override=True)\n", + "\n", + "# Configuration\n", + "DATA_PATH = \"data/knowledge-base/\" # Use your path\n", + "MODEL = \"gpt-4o-mini\"\n", + "CHROMA_PATH = \"vector_db/chroma_insurellm\"\n", + "\n", + "# Explicitly access the OpenAI API key\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if not openai_api_key:\n", + " print(\"❌ OPENAI_API_KEY is missing\")" + ] + }, + { + "cell_type": "markdown", + "id": "18e5b9a1-dca8-4b42-8517-174f653f30a7", + "metadata": {}, + "source": [ + "## 📄 Load files as Document objects into memory" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae72f98b-05a5-4758-9503-424a93055323", + "metadata": {}, + "outputs": [], + "source": [ + "# Load .pdf, .txt, and .md documents with metadata, excluding Jupyter checkpoints.\n", + "\n", + "documents = []\n", + "\n", + "def add_metadata(doc, file_path):\n", + " doc.metadata[\"doc_type\"] = file_path.parent.name\n", + " doc.metadata[\"file_name\"] = file_path.name\n", + " if not doc.page_content.strip():\n", + " print(f\"⚠️ Empty content in {file_path}\")\n", + " # else:\n", + " # print(doc)\n", + " # print(\"-\" * 40)\n", + " return doc\n", + "\n", + "for file_path in Path(DATA_PATH).rglob(\"*\"):\n", + " if \".ipynb_checkpoints\" in file_path.parts:\n", + " continue\n", + "\n", + " try:\n", + " if file_path.name.endswith(\".pdf\"):\n", + " docs = PyPDFLoader(str(file_path)).load()\n", + " elif file_path.name.endswith((\".txt\", \".md\")):\n", + " docs = TextLoader(str(file_path), encoding=\"utf-8\").load()\n", + " else:\n", + " continue\n", + " except Exception as e:\n", + " print(f\"❌ Skipped {file_path}: {e}\")\n", + " continue\n", + "\n", + " documents.extend([add_metadata(doc, file_path) for doc in docs])\n", + "\n", + "print(f\"{len(documents)} documents 
loaded.\" if documents else \"No documents loaded.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fcc85-ca14-430a-bde2-db3d77f79143", + "metadata": {}, + "source": [ + "## ✂️ Splitting documents into chunks" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e40d0487-6db4-4c3f-b0c6-e9aaf4e14b37", + "metadata": {}, + "outputs": [], + "source": [ + "# Split documents into smaller chunks with overlapping characters for better context.\n", + "text_splitter = CharacterTextSplitter(\n", + " chunk_size=1000,\n", + " chunk_overlap=200,\n", + " add_start_index=True # Maintain chunk order (useful for context tracking)\n", + ")\n", + "\n", + "# Load and split documents\n", + "chunks = text_splitter.split_documents(documents)\n", + "\n", + "print(f\"Split {len(documents)} documents into {len(chunks)} chunks.\")\n", + "\n", + "def generate_chunk_id(text):\n", + " return hashlib.md5(text.encode(\"utf-8\")).hexdigest()\n", + "\n", + "# Add chunk_id to each chunk's metadata\n", + "for chunk in chunks:\n", + " chunk.metadata[\"chunk_id\"] = generate_chunk_id(chunk.page_content) # Create an MD5 hash of the chunk's content\n", + " if not chunk.page_content.strip():\n", + " print(f\"⚠️ Empty chunk from: {chunk.metadata['file_name']}\")\n", + "\n", + "# Debug: print a few chunk metadatas to verify chunk_id is added\n", + "for i, chunk in enumerate(chunks[:2]):\n", + " print(f\"Chunk {i+1} metadata:\", chunk.metadata)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1faa604c-ddd9-4475-aaff-f4456629d77d", + "metadata": {}, + "outputs": [], + "source": [ + "# Check for duplicate chunk IDs\n", + "chunk_ids = [chunk.metadata[\"chunk_id\"] for chunk in chunks]\n", + "duplicate_ids = [chunk_id for chunk_id in chunk_ids if chunk_ids.count(chunk_id) > 1]\n", + "\n", + "if duplicate_ids:\n", + " print(f\"Duplicate chunk IDs found: {duplicate_ids}\")\n", + "else:\n", + " print(\"No duplicate chunks.\")" + ] + }, + { + 
"cell_type": "markdown", + "id": "d73f6bee-5df5-422a-a03f-e117a858370b", + "metadata": {}, + "source": [ + "## 🧠 Chunks Embedding" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0135e85-6e0b-45a4-ab0d-b9fec56b63ac", + "metadata": {}, + "outputs": [], + "source": [ + "embedding_function = OpenAIEmbeddings()\n", + "# By default, OpenAIEmbeddings() uses OpenAI's text-embedding-ada-002 model - a multilingual model" + ] + }, + { + "cell_type": "markdown", + "id": "dbdb70eb-9902-4065-92b7-c72c2b8e15f7", + "metadata": {}, + "source": [ + "## 💾 Save embedded chunks to Chroma database" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "093959f0-6826-4594-9338-598094e24923", + "metadata": {}, + "outputs": [], + "source": [ + "os.makedirs(CHROMA_PATH, exist_ok=True)\n", + "\n", + "def get_existing_chunk_ids(db_path):\n", + " try:\n", + " db_existing = Chroma(persist_directory=db_path)\n", + " results = db_existing._collection.get(include=[\"metadatas\"])\n", + " return set(\n", + " m[\"chunk_id\"] for m in results[\"metadatas\"]\n", + " if isinstance(m, dict) and \"chunk_id\" in m\n", + " )\n", + " except Exception as e:\n", + " print(\"❌ Error loading existing chunk IDs:\", e)\n", + " return set()\n", + "\n", + "# Get chunk_ids of current chunks\n", + "new_chunk_ids = set([chunk.metadata[\"chunk_id\"] for chunk in chunks])\n", + "\n", + "# Get existing chunk_ids from Chroma\n", + "existing_chunk_ids = get_existing_chunk_ids(CHROMA_PATH)\n", + "\n", + "# Compare\n", + "if new_chunk_ids != existing_chunk_ids:\n", + " print(\"Chunk changes detected. Rebuilding Chroma DB.\")\n", + " db = Chroma.from_documents(documents=chunks, embedding=embedding_function, persist_directory=CHROMA_PATH)\n", + " print(f\"Saved {len(chunks)} chunks to {CHROMA_PATH}.\")\n", + "else:\n", + " db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)\n", + " print(\"Chroma DB is up to date. 
Skipping regeneration.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "670b049a-0eca-41c1-a5a8-8ed4561168b2", + "metadata": {}, + "source": [ + "## 📊 Visualizing the Vector Store" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1179ca76-2502-4bea-a4d8-4cbe149e92fa", + "metadata": {}, + "outputs": [], + "source": [ + "collection = db._collection\n", + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "metadatas = result['metadatas']\n", + "doc_types = [metadata['doc_type'] for metadata in metadatas]\n", + "colors = [['blue', '#4B0082', 'red', '#8B4513'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c18bd18c-0b6c-4206-b2d0-a1bb59b3d39e", + "metadata": {}, + "outputs": [], + "source": [ + "# We humans find it easier to visualize things in 2D!\n", + "# Reduce the dimensionality of the vectors to 2D using t-SNE\n", + "# (t-distributed stochastic neighbor embedding)\n", + "\n", + "tsne = TSNE(n_components=2, random_state=42)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 2D scatter plot\n", + "fig = go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=8, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>
Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " plot_bgcolor='black',\n", + " paper_bgcolor='black',\n", + " font=dict(color='black'),\n", + " xaxis=dict(gridcolor='lightgray', zerolinecolor='lightgray'),\n", + " yaxis=dict(gridcolor='lightgray', zerolinecolor='lightgray'),\n", + " width=800,\n", + " height=600,\n", + " margin=dict(r=20, b=10, l=10, t=40),\n", + ")\n", + "\n", + "\n", + "fig.show()" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "b7d527de", + "metadata": {}, + "source": [ + "![image.png](attachment:image.png)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71364356-7edb-4e72-a7ba-f6284d4a998d", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try 3D!\n", + "\n", + "tsne = TSNE(n_components=3, random_state=42)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 3D scatter plot\n", + "fig = go.Figure(data=[go.Scatter3d(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " z=reduced_vectors[:, 2],\n", + " mode='markers',\n", + " marker=dict(size=8, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>
Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='3D Chroma Vector Store Visualization',\n", + " plot_bgcolor='black',\n", + " paper_bgcolor='black',\n", + " font=dict(color='white'),\n", + " scene=dict(\n", + " xaxis=dict(color='white', backgroundcolor='black', showbackground=True),\n", + " yaxis=dict(color='white', backgroundcolor='black', showbackground=True),\n", + " zaxis=dict(color='white', backgroundcolor='black', showbackground=True)\n", + " ),\n", + " width=900,\n", + " height=700,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0wAAAKvCAYAAABZOk8vAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAO6ySURBVHhe7P1/fCPnfR/6flZccaWNVgSzkmKuVsIwipmsKoVguSc5SZ0SkN2kp4m1ZOK0x2lkAtc5K+fEOSTb3NNXfBoTUHrtpnUvlm18ot3KF6TVxr2+ccGV3ZMflQ2w8U3qvHYvwOjHOoxVDKX1sll5Q0C0fiwlivePxTM7eDC/MQMMgM/79ZqXtIMBCcwMyeeD53m+zwEA++gxL7zwAu677z78yq/8Cp5++mn5YSLyybPPPouf+ImfwG/91m/hk5/8pPwwERH1oFtuuQUHDhyQdwMA9vf38e6778q7W3LgwAHccsstAIC9vT35Yc3AwAAA4N1338X+vj/NW/E1rb6vG/r3ore/v6+9ZnFu9fu8OHjwIG655Ra88847QP28tMuBAwe0Dbr3It67m2sknqM/P/rn6v9ffE+/32vzFbPx1FNPaW9abJcvX8YjjzzScNyzzz7bdJzYnnrqqYZjnXrkkUdw+fLlpq9n9P37hTgnr732Gh577DH5Yc0LL7xge4xbjz32GF577TW88MIL8kNtIb6/3f3Q6ddpxcv1E+9Hfp9hEPRrs7qW4neO198vRERk7MCBAxgYGNA2s7AEl8c6JX8N/dcXm558fNi9++67WoDY14UCq2DqhDj/t956K2699Vb54cDoX7t4b2KfFyIgydstt9yinS9hf3+/pXNmxtUrf+qpp/DRj34UX/3qV7UE95GPfAR33nknVldXmxp8u7u7+M3f/E3t2AMHDuBzn/scPvrRj9o2EGVPPfUUvvrVrwIA3v/+92tf7/3vfz8A4Pd///fxxBNPSM/qfV/72tfwB3/wBzhy5Aimpqbkh4F6I/O+++7DK6+80jM9bk899RQ+//nP45VXXmm4v37zN38Td999t+H9GEb9ev2IiKh7GDV09/f3sbe317DpG6/i/42eG1ZuGtpujtWTG/ionyN9OBIho1XXr19v2NpBBKP9ek+jeG9uepScEF9Tvg4HpN4nv7i+iz/3uc/hAx/4gPbvp59+Gn/2Z3+GQ4cO4b3vfW/DsUZ+6Zd+CR/5yEcAAJ/97GcdNWqfeOIJPPbYY3jxxRdx/PhxfO1rX9Me+9rXvobjx4/jD//wDxue00/W1taws7ODH/uxH5MfAgBMTU3hyJEj+NM//VP
5oa70yCOP4O/+3b+Lb3/72/jVX/3Vhsc++clP4tChQ/jzP//zhv1h5vb6Pf3007jzzjubfhbCoJOv7QMf+AAOHDiAX/qlX5IfIiIin4kGvfjAUr8PPgwn05MbxXbcHg9dQ9+pVnt/9PQ9TIJf509/fYImwpG+B0j/GKRhdU7pnyOeZ7RPXBM/zpuR/Va3Z599dv/69ev7TzzxhOU+/fbUU0/t7+/v7z/77LNNj8nbCy+8YPm15O2FF17Yf+211/Yzmcz+a6+9ti9cvnx5/5FHHtGOe+yxx/Zfe+21/RdeeGH/iSee2L9+/XrTcfr9+/v7hq/j2WefNfx+r7322v5jjz3W9DXEfvH8Rx55ZP/y5cva44L8eq02q3P0wgsvNH0tcf4F+TWJTZwjvcuXL+9/+tOfbtovHtN/nxdeeKHh8RdeeMHw61tdA3nTP0d+TN6MXr/R1/fjdRp9r6eeeqrpNRltbq6f2fuX34N8TcV9Kl/np556qul7P/vssw1fa9/i3tfvl1+b2b0tiJ9/o3O3L10Hs2PEuZFfi9jknz+jY8Rzz5w50/B6jc4XN27cuPXbNjAw0LAdOHDAcD+A/QMHDhjub2W75ZZbGr6W/PWN9stfw2ozes1ON3EurLYDBw5o7wG613nLLbdYPv/AgQOWj9ttBw4c2B8cHNwfHBzcv+WWW5oe93MT7++WW25puF7ypn/c6Wtyc33050t8L/nredyadrjazIKPWeNFbKLxIzdc5c3pcfpNNBz1jR25Iafft2/QgIbuvekbvaIhabTP6PuZ7de/jkceeWT/pZdeajhXoqFp9LqMNtEolK+D0X6jxrnR9TK7thcvXtx/4oknDN+L2Ixev9G1tLsGRpu+Ee4kkLTjdRqdK6Nzb7aZHWu03+j9vFD/kEDcZ0bv61mXgUl+LUb3iLzP6LXJm3ht8s/F5cuXG16b0dcy2mf2WuDhZ1j/fKNzyI0bN279uMmNUpg0Yq2CVCub28DktpFs9F6cbk4Cjf7ri0a8k+eJ58r7nGzie9566637t956q/a95eP82MT30gdDq028f6fnT36+1ab/2mKTv57HrWmH7aZvsBo1wGDSeDH6GmbPF5tVA8lskxuPYpMbjFZf2+ox+evLX9fs+9ntlzejhqzZZta4k6+DaIDLQUN+v/K/jTarY+TvKzY5AFh9DatNPE9m9HWsvocfr1Oce6PH5HvFbHN6/WDwWuR/67dnnnnGU2Ay2sT3EedEfE23gemF+gca8j1otMmv2erru3kt8nWRnys2p+eGGzdu3Hp5kxulMAkZQQUm/feSH9Nv8utws4nvIe83+75uv4/4Om7Ckh+bCEzy/iA2o3vCbHNzDuTnut3kr+dlcz5YU0fMGzpw4AB+5Vd+BZ/97GcDq4rlN6dzrd773vfi0KFDhvN+tra2HH8dN+TKgh/96EflQ0x97Wtfwze/+U3cfffdiMfjQH2uzw/90A/h1VdfRbFYBAD8+I//OK5fv461tbWG5z/99NN45ZVXMDQ0hEceecTy/TsxMjLS8H2Fv/zLv8T169cxMjLSsN8tMVfmwIEDePHFF7X9Dz74IPb39w2rqBnx43XG43HcfffdhufqT//0Tx3dK06vn5Fvf/vbeO211/Dggw/i2WefbXjs0Ucf9TyX6IknnsD169e1+/Hzn/88jhw5Ih/myrPPPosHH3wQX/3qVw3nGr3wwgsNPwOiqIsXVvewm5/hwcFBHDt2TN5NRNS3xDwR/VwR8W83c1Oc2teVKper45lVynNL/z2ckOcc2dHP+TIj5lFZHdNNxDl1e67CyFNg0nv66afxxS9+Effeey9+4Rd+QX7Y1L333os777wTr732Gr797W/LDzcRDfl2OXbsGAYHB+XdAABVVX1tRIlSyT/xEz/RUFXwc5/7nHyopT/5kz8B6qEIuob8H/zBH2iN5pGRERw5cgSf//znGxqm+/v7ePDBB7WvZfX+7TzyyCMYGhqSdwO6xr2f1/N
v/I2/oZ0zEaAefPBB2/LSfr1Oca4++tGPNp1TN6HXyfUz8rWvfQ0f+chH8O1vfxvvf//7te99/fp1z5UjX3jhBfzGb/wG/viP/1g7rx/5yEews7MjH+rYE088gZ/4iZ/Aiy++2FA4Rjx2/fp13HffffjIRz6ifU9RGdMLq3vY759hIqJ+Iibwv/vuu1qFPBEI3BROcMNNoPEaOtw06t0cKwdM+bWJoBR00QLBr3LvTujfdzfz5a6+cuUKdnd3oSiK/JApUfnrm9/8pmVjUPR86D95bwfxnowoioLd3V1cuXJFfsiT2dlZHDp0qOXFP4vFIl599VX8yI/8CB577DHD3qStrS3s7Ow0NEr1m6huZvX+7Xzta19DrVaTdwO6oFyr1Syveyu+9KUvObof/Xqd4lx97nOfazqfBw4cwKFDhxxdVyfXz4y+1/dAvbw6AMzPzzuqRKn3xBNP4Ad+4Afw1a9+tSnYePXYY49hfn4e169fxz//5/9cfhg/93M/h+vXr/u62LTVPez3zzARUb8RjW791movjx0RmuRGuNivD1ReQ1MQ9us9b/rQJ4LlAalXyWu4GBwcxKFDh7RrIK6H+PqDg4PavoMHD2JwcFDbR/ZcBaavf/3rho0v8UmuqqryQ4b0ZaE/9alPyQ83+dKXvgTUG1Vmzpw54/nTdCNiSJZRqeeRkRFcv34df/mXfyk/5IlfX+9rujV9fv7nfx4/8iM/0rR2j6qqlmv+CFbvX5xr0QtjZGtryzDkimFSW1tbDfvdeOyxx/D1r39d3q2R78egX6fVuXLDyfUz8thjj+ELX/hCw75PfvKT+OM//uOmYWfyv43I569VjzzyCD796U/j0KFDOHPmTNP7ET19Tnqbra6lzOq6+PUzR0TUD8wa8aJXRGxGzJ7rlT4c6Xu3RIjyKzTJAbDVXhl975J4rUbnzcv5Eu9TnAf993jnnXfw9ttv45133tHOz9tvv429vb2m9+gH8X2N3ofVY36R7wG/GN/dJoaHh/H5z3++YaiTWCPp29/+Nn73d3+34XgjYgHaO++8E7/+679u++k96o2/p59+Gg8++GDTXKlHHnkEly9fxi//8i83PKdVYqihPC9EzMH44he/2NTw8+pP//RPm0KMWCTYLbGmzwc/+EHDtZd+6Zd+CS+++CI++tGPNg1Ze+qpp7S5P2bv/6mnntLOteihue+++5qC9Kc+9Sm8+uqrDT0copfh1VdfdRSUrfzwD/8w9vf3Dd/DRz/6Ubz44ovaHJmgX6f+XMlzpx577DFcvny56fuasbt+Zn76p3+64XsbzX+Sh/zB5D4zWhfqsccew2c/+1lPc5g+8YlP4N5778XTTz9t2NP2NYP5W6j/rMlzmKyupczsHg7iZ5iIqJeJgOJ1aye/Gszy13nXh3k4Iizoh9/pH/P6Pd6th8dbbrkFt956q9bbdOutt+LgwYM4ePAgbr31Vhw6dAiDg4O49dZbMTAwoAVOv4j3YHTt5f1u3qfT4/WBTCyk7OR5Tu272USFKz191SyxPWuwjovgpDqW0aavzqcnVxaTK2CJTa6EZVVFS2yiWppgVDFLruTlZb8ofyy88MILnit0iWskfw/9Jn+/fZP3Jr//fen6ydfE6FroyefayTUw2uTvKxi9B6Pjg3id4hiZ0c+H1WZ3/Yxei9F1Mnqt8nU3u8/k9/Kabp0xN1Xy5K8j079G+XfGs/XS5vJ5MLuW8msRm3xujI4xe65RWXJu3Lhx4xb+7UCLaxj5/XXMNr+/ttN1kOTncbPeDtT/h4iIiIiIiCSuhuQRERERERH1EwYmIiIiIiIiEwxMREREREREJhiYiIiIiIiITDAwERERERERmWBgIiIiIiIiMsHAREREREREZIKBiYiIiIiIyAQDExERERERkQkGJiIiIiIiIhMMTERERERERCYYmIiIiIiIiEwwMBEREREREZlgYCIiIiIiIjLBwERERERERGSCgYmIiIiIiMgEAxMRERE
REZEJBiYiIiIiIiITDExEREREREQmGJiIiIiIiIhMMDARERERERGZYGAiIiIiIiIywcBERERERERkgoGJiIiIiIjIBAMTERERERGRCQYmIiIiIiIiEwxMREREREREJhiYiIiIiIiITDAwERERERERmWBgIiIiIiIiMsHAREREREREZIKBiYiIiIiIyAQDExERERERkQkGJiIiIiIiIhMMTERERERERCYYmIiIiIiIiEwwMBEREREREZlgYCIiIiIiIjLBwERERERERGSCgYmIiIiIiMgEAxMREREREZEJBiYiIiIiIiITDExEREREREQmGJiIiIiIiIhMMDARERERERGZYGAiIiIiIiIywcBERERERERkgoGJiIiIiIjIBAMTERERERGRCQYmIiIiIiIiEwxMREREREREJhiYiIiIiIiITDAwERERERERmWBgIiIiIiIiMsHAREREREREZIKBiYiIiIiIyAQDExERERERkQkGJiIiIiIiIhMMTERERERERCYYmIiIiIiIiEwwMBEREREREZlgYCIiIiIiIjLBwERERERERGSCgYmIiIiIiMgEAxMREREREZEJBiYiIiIiIiITDExEREREREQmGJiIiIiIiIhMMDARERERERGZYGAiIiIiIiIywcBERERERERkgoGJiIiIiIjIBAMTERERERGRCQYmIiIiIiIiEwxMREREREREJhiYiIiIiIiITDAwERERERERmWBgIiIiIiIiMsHAREREREREZIKBiYiIiIiIyAQDExERERERkQkGJiIiIiIiIhMMTERERERERCYYmIiIiIiIiEwwMBEREREREZlgYCIiIiIiIjLBwERERERERGSCgYmIiNpGURREIhF5NxERUWgxMBERUdsUCgUGJiIi6ioMTERERERERCYYmIiIiIiIiEwwMBERUSgpioL5+Xl5NxERUVsxMBERUWidOnVK3kVERNRWDExERBRKiqLIu4iIiNqOgYmIiIiIiMgEAxMREREREZEJBiYiIiIiIiITDExEREREREQmGJiIiIiIiIhMMDARERERERGZYGAiIqLQUlVV3kVERNRWDExEREREREQmGJiIiIiIiIhMMDARERERERGZYGAiIqJQikaj8i4iIqK2Y2AiIgoBRVHkXT2LhRyIiKibMDAREYVAoVDoq9BERETULRiYiIhCgL0uRERE4cTAREQUEuxhIiIiCh8GJiIiopCIRCLyLiIi6jAGJiKiEKhWq/Iu6jPxeBy5XE7eTUREHcbAREQUAtVqlb0LBvopSEaj0b56v0RE3YKBiYgoJIaGhuRdfa9Wq8m7iIiI2oqBiYgoBNizQJFIBJubm/JuIiLqMAYmIqIQqNVqHJLX54aHh+VdREQUAgxMREQhwQZzo34LkENDQ6hUKvJuIiLqMAYmIqIQ2N7elnf1vX4LkP0WEImIugUDExFRCFSrVRZ9IBa5ICIKIQYmIqKQYA9Df1MUhcU/iIhCiIGJiCgE2LNAYLVEIqJQYmAiIgoBLlxL7GEiIgonBiYiopBgYCIGJiKi8GFgIiKi0OqnMtuRSISBiYgohBiYiIhCQFVVeRf1GfYwEhGFEwMTEVFIKIoi76I+EYlEGJqJiEKKgYmIiKjD2LtERBReDExE1FGxWAzz8/Py7r7TL70LboJBvy3ky/lLREThxMBERB0ViUQwNTUl7+5L/TAkz01gcnNst2NJcSKi8GJgIqKO66eGMZERzmEiIgovBiYi6ihVVfuiZ8UJNpj7V78NPyQi6iYMTETUUQxMjXgu+heH5DVizzMRhQUDExF1XLVaZeOI+tro6ChqtZq8u68tLi5ibm5O3k1E1HYMTETUcaqqMjBxSF7f297elnf1tUgkwl43IgoFBiYi6rhqtcqhaHU8D436pddlaGiI4UASi8Wwvr4u7yYiajsGJiLqOA7JIzP9EiJ4/zdjDxMRhQUDExF1nKqqiEaj8u6+w8Zhf9vc3JR39TVFUThMlYhCgYG
JiDquVqtheHhY3t132NPWvzgUsxHDEhGFCQMTEXVcpVJhD1Md1+PpT1y4thGH4xFRmDAwEVHHbW5usmeFQ7Ka9FOvC+//RoqioFwuy7uJiDqCgYmIQoENxht4HvoXe1RuikajPB9EFBoMTETUcaqq9lVvghXO5epPiqIwIOhwDhMRhQkDExF1HAPTDVy4tH8xLDXiHCYiChMGJiIKBVaIu3EOWPSh/zAcNFMUhXP6iCg0GJiIKBRUVe37wATOYepLvObNOCSPiMKEgYmIQqFarfb9sDx+ot6sXxrN/fI+nWKZdSIKEwYmIgoFDsm7geeg//T7BwUyDlEkorBhYCKiUFBVlfN3GJj6EntTGjEwEVHYMDARUSjUajWMjo7Ku/tKNzUSJwGcBXDBYJuUDyZL/KCgUSwWY4AkolBhYCKiUKhUKn3fcOyWwCTCklkwOlvfZPF4vOG/dFO3XPt2GBoa4vkgolBhYCIKSDKZlHeRhc3NTc7l6IL5LCIs2RHHKYqCubk5VCoVLC4uolqtYnFxEdvb20in05bv1+qxXjI6OoparSbv7lujo6MsgEJEocLARBSQubk5xGIxeTdZ6Pf5O93wqbqTsAQAR48exYfHxlD4zGegKAoSiQQSiQSq1SoSiYQ2/LJQKKBQKGBubq6vrz8XLb4pGo2iUqnIu4mIOoaBiSggqqr2zSfkfuD5Cn+lwNPyDskdd9wBRVEQi8Vw5MgRXLlyBX/0oQ9hYWGhaU5KtVpFOp3G6OgoMpkMFEVBqVRCLpfruyF70Wi0K8Jyu0QiEfYwEVGoMDARBWR9fZ09TC5w4dobwnwOjOYsDQ4O4ujRo3jooYdw7Ngx7Ozs4LnnnoOqqvjud79r+BxZsVjEwsICJiYmUCwWtSF7kUik70N0P2LRByIKGwYmooCUy2WMj4/Lu4kshbmnQR9+jh49irGxMYyNjeHIkSPY2NjAxsYGrl27hr29Pd2RzlWrVaysrDQM2cvlcj0/ZI89Ko1YVpyIwoaBiSggYR9eFUYclhfuwCSG3D300EPakLvnn38eqqpid3dXPrwl1WpVm++kH7KXz+d7bsgef0/cJM5FmH8OwiwSifB+IgoAAxNRQMrlMofkuVStVvs+MCFk1eEURUEymUSlUsGBkyexs7ODjY0Nbchd0FRVbRiyt7Ky0lBlrxfCU4QL12rYu9SaU6dOIZvNyruJqEUMTEQBEX/0+Wmfc+yVC88n68lkUqtgNzU1hUQigX9VLOLatWuuepPOyTtaUK1Wsbq62jBkb3FxseuH7HXr6w6Coigol8vybnIoFovx/BEFgIGJKEDsZXJHVVUuXtvBwBSPx5HL5VCpVDA1NYVMJoPR0VGkUimoqopzAC7KT7LhZ2DSq9ar7MlD9gqFQlf2OnXyuocJKwa2RlEUzocjCgADE1GA2GPiTq1W03oO+lk7h+Tph9wtLi5ibW0NiUQCqVQKxWJRPtxVAHpc3hEQ/ZC9TCajDdnLZrNdEZ4URWFIqOOita1hDxNRMBiYiALE0uLuVCqVvu9hahejIXeJRALLy8uW82ku1oOQXU+Tk2P8Vq1WUSwWtSF7tVqtK4bsMSzdxEVrvRMFH6x+fonIGwYmogCxtLg7m5ubbe1dCaMgGzvykLulpaWGIXdOidB0zmQ76VNYaiXgGA3Zq1Qq2pC9Vr62n9i71CgSiaBWq8m7yQFFUVz9HBORcwxMRAFimWz3wtKQ7SQ/z4F+yF02m20Ycre6uiof7ooclMTmF7/OgxiyNzw8rA3ZE+ejG4bs9RM2+r0bHx/ncDyigDAwEQWIRR/cUVXVt0ZyN/NjWKLRkLuJiQnbIXe9Tj9kT1VVbcheOp3uyL3HIVSNeD684/wlouAwMBEFrFwus5fJIfbI3RiW6FUsFvNlyF0/qFarWFpa0obsoT6HrlAoYHp6um3hqV3fp1tEuA6TZ6yQRxQcBiaigFW5GCu55KZSoH7IXS6X83XIXb8oFotIp9PakL25uTm
USqW2DNljj8pNnM/VGvYwEQWHgYkoYCz84A57mZxJJpPI5/MccufSJICzAC5I29n6Y2LI3sTEhDZkr1QqBTZkz4/hl72C4dG7CCvkEQWKgYkoYAwA7vR7j5xVSWV5yN3KykpPD7nz+z2d1QUjmQhS4jH9kL1UKgXohuzNzs76Gp7Yq3LD+Pi479e8X7BYBlGwGJiIAsa1mNxh47Gx14FD7vxhFpRkRseVy+WGIXvT09O+DdkT60UR5y+1YmpqisPxiALEwEQUsGq16uun0b1OVVVEo1F5d1+JRCJNQ+5mZmY45M6jSYMQZOWsvEOnWCxq10IM2atUKi0N2dve3pZ39SX2kninKAoDE1GAGJiIAsbS4u7UajVXRQ96SSwWw/T0NJLJJE6dOtUw5I6NIe9OyzscsHuOfsheIpEAdEP25ubmHIenaDTKXpW6SCTCKm8exWIxrK+vy7uJyCcMTERtwNDkXKVS6auJ8PKQO1VVtV4MDrnzh5veJS9UVW0YsheLxXwbstdPYrEYe5g8YoU8omAxMBG1AQs/OLe5udkX58psyN358+flQ6kFXsOSXQ+TmWKxiFQq1TBkb3t7G+l02vC+Zq/KTZzD5I3ozeS5IwoOAxNRG6yvr/f9vBw3nA5n6jb6KndmQ+7Y6LnBKFx4cVHe4ZDX5wn6IXtiiGmpVGoasseQcBPnMHnD3iWi4DEwEbVBpVLhkDyHVFXtqcAkhtyVSqWGKndmQ+5YJCR4hwFEAZyQtmj9MfgQmPSq1WrDkD1FUbT7gYu13sCw5N34+DgDE1HAGJiI2qBfhpn5oVeGL05PTzcMuRPDtJxUuWNg8pc+/IiwJIKRnngs6nNg0isWi1hYWMDExATK5TIikYi2MG4/f6jCnjbvGDaJgsfARNQGLPrQH/RD7mZnZ3H+/PmmIXd22PDx37n6f0UgcsLrHCanxJC9arWqDdkTAdtNlb1ewbLY3sXjcVbIIwoYAxMhmUwimUzKu8lH4pPTfmsEeVWtVruml0k/5C6fzzcMuVteXpYPd6RX75NOva+LAB53EZZe9bB2k1diSF46ncbo6GjDkL1CodA3VfZYXt07hk2i4DEwEVRVxezsrLybfMZeJue6YVie0ZC70dFRR0Pu7HQqWATN7ftq9TzqTQKwq0X3Rv2YN+r/DrqXCQZFPvRD9paXlxuq7PVyeBoeHvb1evcLEbjl+4iI/MXARGzItwkn8zsX1j/+fgy5cyKs77+bna4HoUv1HiQ9EZT0YQlt6GGyKvhQrVaxsrLSUGUvm8327JA99jB5wwp5RO3BwESoVqtQVZWhKWDr6+s8xw6pqhqaMuxBDLmzw4ZjsL5TD05ik4NSOzm51mLI3sTEhDZkr1KphGbI3mkAFwy2sy5Cp6IonIfjQTQaZc8cURswMBFQb6CyMR+scrmM8fFxeTcZqNVq2qfqnRLkkDsnwj4kkVrnpTKcGLI3OjqKpaUlbcheNpvtSHi6YDF0cdJFaLLqbSNz7GEiag8GJgLqf4Snpqbk3eSjXltfKEiVSgVDQ0Py7sC1a8idHTYc/RdUmfBWeAlMQrVaxerqqjZkr1arYXFxsa1D9s7KO0w4CU2RSKQtH0T0mlgsxp45ojZgYCIAwNraGnuYAsZePOdqtVrbelg6MeTODue7+c9LYBLlyIPSSmDSE0P2EomE4ZC9IO6l0w5CkJ5ZLxR8PA/9SOEaTERtwcBEQH24mKIogfxhpRtEg4Dn2F47AsP09DRyuZw25E4Mc2rXkDs7Qb//sPP7/XsJP16e40YQvaj6IXuZTAaLi4uoVCq+D9mzCkBGJi2ew8DkDSvkEbUPAxNpWC0veDzHzgQ1fFFRFG3I3dzcHNbW1rQhd8ViUT68Y7qlASTmqMiT/a3mtTgVxPV/XN5hwc2xrdjctCt27k21WkWxWNSG7Kmqqg3ZS6fTLZ3fVq+tLBaLheJDim7D+UtE7cPARJpyuezrJ5DUjIUfnFF9XIdJP+SuUChoQ+4
SiUTHhtzZ6YbAZDeh/7TN450gFrC187jHIXxutauwSbVaxdLSkjZkb2hoSBuyNz093VJ48sPQ0FBX3PNhwwp5RO3DwESatbU1NuYD5mcQIGthH3JnJyxl1Y2IsGTHahhWp1wEcLI+3E4ORefqj8n7g1SpVORdgRJD9oaHh5HJZDA3N4dSqeT7kD03RkdHA+tp62XsYSJqHwYm0rCHKXjr6+sMTA5Vq1XX56pbhtzZCfun7U7CktBKaAryPJyr9ySd1G1Bz1mSdToUiyF7ExMT2pC9SqViO2TP7/MUjUbbHhx7ASvkEbUPAxNpVFVFtVrlHJsAeQkB/cppb1w3DrmzU6vV5F2h4WWIXRgDU1iE4Vrrh+ylUimg3vNVKBQwOztrGJ689MKZBa1IJMIeJg9YIY+ofRiYqIGolkfBYNEH5+way90+5M5Ou+a3uOUlMJGxMFaHKxaLSKfT2pC96elpwyF7ZuHHjNXxLPrgHivkEbUXAxM14AK2wWNockZV1aYhS70y5K6bee0t8vq8XhbGwKRXLBYxMzPTMGRve3sb6XQaL0UiliFI76JNYAr7eQgjzl8iai8GJmqwvr7OxnzAqm1YY6gX1Go1jI6ONg25K5fLXT/kzk6Y53N4GYoFmwZzuwRZBt2LbgkK+iF7iUQCqN+jHy4U8OLcHAYGBuSnaMRcMTPid2E3nIcwYYU8ovZiYKIGxWKRhR8CxtLizgwNDWF2dhalUqlhyN3S0lJfNBTk3rWw8BqYOs1pGfR26sYPTsrlcsOQvb1YDJFvfQvKpz6FjXgc5+ohSVQctAvK3RIaw4Y9TETtxcBETRiaguW0mEE/0g+5i8fjKJfLGB4e7rshd+4KAZw22MxiQeu8BCb9czrROHZTBt3JcX7q5vBfLBaRSqUwMTGB3/yrv8LI4iJ+a3sbx9Jp/JHD33GKorDh7wEr5BG1FwMTNWEPSLA47LGRGHJXKBQahtwtLCx05SfwfnAWKk7rBpPJ29nAmv4XPYQmfS+Ds/fmb++LmzMxGWjcbNQrH5zoh+xNTEwAgDaEdm5uzvJaRqNRx/cE3cQKeUTtxcBETdbW1tjDFCBWIrwhHo8jl8tpQ+4ymUzDkDtVVS0bWr3O+h4562DWTXD9JY+7CE1ujtUbGhqSd3lid5aMBHPWmvVig1dV1YYhe4qioFQqIZ/PG/5d4aK17rFCHlH7MTBRE1ZxC5Yo+tCPYUA/5G5xcRFra2umQ+44dNGMmz6QVpaNteYkCOmPOQ3gwzs7OPaVr7Rh4OBN7fgeXvTDvV0sFrGwsICJiQkUi8WGKnvib8zQ0FCoi5yEUSwWa/p9SUTBYmCiJuJTz374g94p/dTLZDbkzkmVu34MlbDteXAbgNwe79zj9U0/0V8/4f9iPbCIgYMf3tnByJe/3IaBgzeFNTDBxfDEbqcfsifWFxO/D+LxuMs5exSLxdgrR9RmDExkiL1MwapWqz0fmOyG3DnRz71MxmHRTe+SXnChSayxI29wOCjQyTG9qF+rw1Wr1aYhe9lsVgtPZC8ajbJXjqjNGJjIEBewDVavFn4wGnLXysKy/TyPyfh9ewlLneEmCJkd68en6HZlrTulXwOTXrFYRLVa1Xqb9UP2GJ7MxeNxrK2tybuJKEAMTGSoVxv0YdFLlQjlIXebm5sNQ+5aaRSK+V79xmkPXJi57dPy2ndmx26elREvz3GLgemGSCQCVVWxsrLSMGQvl8s5qrLXj1ghj6j9GJjIULlc5id8AeqFIGA25C6dTvv2x7xarYZ2Adegdfv94SX8eHmOHS9l0B+XdwTAryqA3UxUe9MTQ/ZGR0e1IXuVSoVD9uoYlog6g4GJDFWrVYamAHXrHDG/h9zZ2dzc1D5x7l369ZNu1o4zDkxeB5i5jQyt8Rp8vD7PjpsA5ObYVvkx5LCbid4lM6LKnpj7KIbsZbPZvv3bNDU1FcjvWiKyxsBEporFYs8MGwsb8amqcaM4XCKRSGBD7uxsb2/38Cf
xp00Wnr1RO+6P/uia/IQ6t+HHSx9Lo+Y4Z83JMe0mqvZZcVIq3S+9/0GAvfHxccvAJFSrVayurmpD9mq1mlYoIp1Od8XvUb+Mjo6iXC7Lu4koYAxMZIoL2AYr7L1MYshdpVIJbMidnc3NzR5tDNktPDuJf/yP78DAwI/ID3joA/HaK2Uc6c7q9pnx+h2DDitOyqC3U79XOotEIq572cSQvYmJCWQyGW0dJzFkrzd/X9w0Pj7u+pwRUesYmMhUP5d0bodyuRy6+TmixG+7htzZqfZk+fWzjvtgBgaeknfVOQ1N3vtM7CKdCE/dxqoMejuF7We/ExRFwfb2trzbMf2QvUwmg8XFRVQqlZ4esheLxdjDRNQBDExkSiyu2uuf2HWKqqqhGJYjD7kD0LYhd3Z6s6y4s7C0u7tb/z+j2HKx3idi1tRvrc/EaaQzKwcOi1dmRf+cMPxsBK3fF2z10sNkpFqtolgsNgzZW1xcRKlU6qkhe5FIxHbeFxEFg4GJLLHwQ3A2Nzc7OkdMDLnb3t5uGHK3sLAQmj/IvdfDZBR+zA0MDNg8RwQjefMSV25wW97b7PhzLuOa0z6zXhFhWXHEYjHff9eIIXuJRAILCwsNQ/amp6e7OjyxQh5R5zAwkaW1tTUuYBuQTgx5NBpyNzw83LEhd3baU359UpqlI7YgGEULczcCU3t5eedm78rpgECnx/USBqbgz4EYsjc8PIxMJoO5ubmuHrI3Pj7O4XhEHcLARJaKxWKoCxN0s3YVfQjzkDsngh2Wd6MiXXNYOu2gtIEXZtGi2d7enu5fzp/XKi/fyeosiSILRi72aViCp7Ag35/dr509Jvohe6qqavOdumnIHucvEXUOAxNZalejvl8FeX67YcidE8H1xDmZqXPaYpaOF86jQWNgcv68MBIDBx89dgwXz57VBg7ahaVeriLnvJGuD+9BB/r2aWdY0qtWq1haWkIikUAqlQLq91mhUMDs7KyL69J+iqL4MueLiNxjYCJL1WoVqqoG1qjvd34POeu2IXdO+H2ObnASlgSzWTpeWMWDZocOHZJ39SxRQELEgL954QLiGxvyYT3FPjD0aq1CLz1s/isWi0in09qQvWQyiVKpFNohe+xhIuocBiayFWQvSL8rl8stF36Qh9xFIpGuGnJnp1qtBlCC2W0Asmq0uuE8MF2/fr3+f2YD2sKjlVc4WQ9J+gh7GsDkxYuIb2zggoer1Q3se02dhnoxB6+7KIoSqsa/GLI3MTGhDdnb3t4OzZC9CCvkEXUUAxPZYuGH4LQy3MxsyF0qleqpP6r+L17rpXHppOHqxEVXoemGVuKIe16+m9t3JIheJTtOo0M3sf4ZPe3yHbs9vvOGhoZC+WGOfsjezMwMoBuyNzc35/PvIuc6NYSRiG5gYCJb7GEKzvr6uqtzqygK0ul0aBaWbYft7W0MDw/LuzvAS9AyYjdz56Yf+7EVeVfg3AYmLxFQcBKWBDfHhp39hyRewo+X53SOKL4QZvKQvXg8jlKphFwu1/Yhe6yQR9RZDExkiwvYBsdJBTh5yF00Gm0Ychf2RkerNjc3AxiS12lWteMA4CL+/t//Kk6elPe3h9M1kUSVOy+8xE8vzwkr694VL+HHy3M6JxqN2pyDcCkWi5iZmcHExATK5XLDkD37ANw6zl8i6iwGJnKEvUzBsCqoIYbcVSqVnh5yZ6camsVrrQKOF6J23DlpexzA46hWvyo/oW2clPt2G5bi8Tiy2SxisRjS6TR+bHBQPsRWrwSmYAoedFdgUhQF6+vr8u7Q0w/Zm5iYAACUSqXAh+yxQh5RZzEwkSNra2ttH4LQL/Rh1GjI3cTERE8PubPjpBfOHasY0AlyYLr5+jrZsyYCkegLa450ziSTSe1+LhaLWtj/8NgYxsbGcOTIEfkpQI9XCAxm8n7Y7mtriqIEEBrbS1XVhiF7sVgMpVIJ+Xze97+X7GEi6iwGJnKkWCyy8ENAVFXF3Nxc3w65s+N/WXE
vDUu/e5fs1Wo1eVdHXDQITHZnUFEULShNTU1p9/P58+dRrVaRTqfx/PPPY2trCyMjIxgbG8PRo0flL9Oz/L2fBburEi7BhMbOKRaLSKVSmJiYQLFYbBiyZzaKwClWyCPqPAYmcoRD8vwnhtzF43EoitK3Q+7sBDMkz2n/CHSRob268dN3RVGQy+VQKBS0oGR1P+/s7GBjYwNbW1uIRCKIxWIYGRnBwMCAfKgnogrfBd1mt7JRO9hXiPNyv3VPYApmSGI4GA3ZEx+GeR2yxwp5RJ3HwESOVLmArS/0Q+6y2SzW1taQyWSgqmrfDrlzIphheU5Ck9PjguF/UAyGCP+FQqGhcqNVI08fCXZ2dvDSSy/hueeeAwCcOHECd9xxh+6IG9xEgrMm5cjFqkVGj7WTdQ+i28B00eXZ6axeDkx68pA9RVG0+U5uhuyxQh5R5zEwkWMMTN6YVbmbmJjA8vKyVoWQzKktrFdlToQhs4ammK1DZvTzk0RQWl5elg8zZHTW9/b2sLW1hUuXLmF3dxeKojTMc3IaI5yEIadrQAVhdHRU3mXA6b3X2VDvRSwWswzTvahYLGJhYQETExNYXV1tGLJnF544f4mo8xiYyLFisYjx8XF5N5mQq9wtLS0ZfvLO4Y72/J/HJIjG5kmDzWnzPBhWDcrTHRxmZjY/yWlQEqya+Xt7e9jd3cVf/MVf3Jzn9OEP4+Fk0vY+cLuEa7vOm6xSqci7JHaBHl0b6oeGhizv716mH7IngrPonTUbsscKeUSdx8BEjq2vr9t+EtbvjIbciSp3q6ur8uGAbq6K0R9KuqFarXa0YlynyPeEmI8jN/LbMcxMzE/K5/OO5ic54WR22M7ODr6wsYFHt7YwNTWFSqWCdDrddG4E+dzYcXu8H5zfyyI0GdUq7Hyo96obFq1tB1EAZXR0VBuyV6lUmobssYeJqPMYmMixYrGIWCxm2lDpZ1ZD7pw0DNjLZG1zc7Pv7ju5V81Jz4kYZmZ3nBvy/CQ/gpKe6CMx6kfRxwVRhUx8Kl8qlbR1nQSv77sTocl6DpNMREv91r26bdHadhBD9sSwVjFkL5vNskIeUQgwMJErIjTRzYbk9va25ZA7J+TGMTXa3t7G8PCwvLun6RuUogfJKTfHmkkmkyiVSk3zk4Jo6BoNjPzy4iL+7cmTTUFKfCo/MTEBVVW1MBePxz0Hpnbrl6IHZiKRCIeYmahWq1hZWWkYsod6pT2rnlUiChYDE7lSLpf7eh6ToiiYm5tzNeTOifX1dQZRC5ubmy6GMfUO0ThyG4AmPfa2yPOTFhYWPM1PagcxF2RiYgKZTAaLi4tYTKdxzz33+FaWPCj93mPAMtnOiOq0y8vLyGQyGBoaahiyx/BE1D4MTOTK2tpaX85j0g+5i8Virofc2en3IGqnGshaTOEnGkRewo+b54j5SblcrmF+kt+l7oO6hsViEYlEAv/27FkcPnwYJ06c8HU9J78FdR66haIofd3D5oYIl/ohe0tLS1hcXNQ+uOvHv8lE7cbARK7001wb/ZC7U6dOtTTkzo4aSNns3tGvjStVVV33LglOApM8PymVSgVyf3vh5dPz9NYWVFXFpUuXgPp6Tvfdd59WltxMJ2YE9es9La5rv75/t2KxGNbX17V/V6tVrK6uakP2arUaFhcXUSqVOGSPKEAMTOSKaEj1auPebMjdzMxMS0Pu7HAtJmv9HCi9NoDkuT96osdU3ONiflIYgpLg9X2fk9Zzun79Oo4fP96wnpOe1XkKSj/PYern9+6FVYU8MZ8vkUhgYWGhYcje9PS0558hImrGwESu9WIvkzzkbmZmxtchd05UWfjBVL8OyatWq3jJp3siUl9AWcxPymQy2j3eS87pQtDe3h6uXr2KS5cu3VzPaWxMm+dktRYUBUNRFNMAQI3c9Mbph+xlMpmGD/44ZI+odQxM5FqxWMTU1JS8u+tYDbnrxB90VVV7Loj6SVXVvguU1WrVcw+IeJ6Yn1QoFAK
dnxQEJw1FI0Zlynd2drCxsYGtrS0cOnQIJ37xF7HVoSFMXt9XL2BJcedisZjrn9NqtarN6RPrXYn5ThyyR+QdAxO51s0V3To15M4JFn6w1q/D8iKRiOs5NhcB7MViDfOTZmZmQjM/yalWGtZiqVc5OBV3dvD3XnkF7zl/Hqiv55TL5dr2KXw/3sN6EZYUd2x8fLyln9dqvZKk+JBEP2RvdnaW4YnIBQYmcq1YLLatceGXMAy5s9OvgcCNfvwDLwKT3PA3c/ToUcR/93eRz+dDOz+pXcSiuPr1nUTvk5j/MTExgXK5jMXFRa1cc9D68VoIiqKgUqnIu8mAn8MXxZC94eFhZDIZbXguh+wROcPARJ50Q2gK25A7O93cc9cO/TgkT9+wNhpmpnf06FE89NBDKH74w3j83DktKJE1/afwYj2nUqmEubm5QO63IL5mN1EUBbVaTd5NBuLxeEOFPL8YDdnb3t7mkD0iCwxM5ElYh48puoU3c7lcqIbc2WHRB2v9unitnjzMbHBwEIqi3AhKY2P4oe9+F6nf/m3X8x7oBtGQTKVSUBQlkFLNES5aG8oPrMJICXiBX/2HBTMzMwCgDdkL6gMDom7FwESehG0BW/2Qu6mpKczMzHTdUKRerD7op+3tbQwPD8u7e54cEs8B+LSi4Fd/LIdfGHsND77xX3Dnxu8i9af/Bqr66w5XYCIr5XIZCwsLmJiYAOqNSL/mOcnXs99EWFbcEaW+uG+7zlWxWEQ6ndaG7E1PT7d9fh9RmDEwkSdhqOgmhtxVKhWcOnUKKysroR5y5wRDk7l+7GGSJ8dPT0/jM58p4I47vomNjTief/55XL16Fbu7u/UjJgGcrW/BOm2wOdEtH2BAN89pdHQUa2trvs1zMmsET3o8p92EgckZq/WXgiZ6WvXz+8SQPc6zpX7FwESelMtlRCKRtv/yNBpyJ4YT9MJ8jX6cp+NUtU/XYkK9B7VSqeDnfu5TSKdH8Pzzz+PatWvyYToiOPnvLIALBg370/X9vdi/Va1Wsby83DDPyWuZ5kgk0jSHZ7J+7s72+DkVvSZeyeeml0Wj0Y5/uKAfspdIJIB6VUkO2aN+xMBEnhWLxbb1hvTCkDsnVFUN5dywMGilodWNFEXB+Pg4lpeXtfWTHnvsDXz3u9+VDzUx6XtT+6yDr+jkGCfC2hgTn77PzMxgaGjI9Tyn4eFhbG9va/92Em1FkAoj8fov6Daz1+t1/pYIjnJgEt+rF3Wyh8lIuVxuGLIXj8dRKpWQz+db7nEl6gYMTORZ0AvY9uKQOzvlcrlve1Hs9EvZdUW30KyiKMhkMvX1k35SPtQBo2arN26CkJtjzVgHEKPBa/69VyeM5jk5bTyK8O8kLAmnfTinfhODP+XXJa6O/JiXdYXMwpfg5hx2k1gsFkiFPD8Ui0VtaY5isdgwZM/J/U/UjRiYyLMgymD3+pA7O5ubm76f017R60Py4vG41osq1k86c+aM7gi5WeqEl+cYc/uVrBq5rRHNdDksiT4Ht6+0NWKe0/DwMM6fP287z0k/D89tQz+4c+qeHIaMiDAjjou4XLTWyfdAj4amoCvk+UE/ZE98cJDP5zlkj3oSAxN55meBgn4ZcmfHz3Pai3pxjpf4gGBxcRGZTKZp/aSbDWwnTcdgeGmo+/Fqm4dhOmlCOzkmGPI8J7O1bTY3Nz29Qi/PCYLb3i5x/yguFq2ddPk93B4fZu2ukOcHVVUbhuzFYjFtvpPZhwdE3YSBiTyrVqstVcvTD7mbnZ3tiyF3dsQfSLmBRTf0yrA8RVGQTqdRqVS0+UmJRKJp/SQ3n8YHyWtD1OvzhMYGo5sg5OZYv02iWBxDInEE3/d9r+Bzn/tlHD/+3zEy8gwGBn5E+9n2+uq8hFe/uX3tIswYFbww4/Z7ICTnxg9hm7/kVrFYRCqVwsTEBFZXV7UPD7LZLMMTdS0GJmpJuVx2NY/JbMhdIpHoiyF3TnAek7VuDpP6+UlDQ0PaIqn
OelHFcrXt56Xx6i8v/QenMVmPTvqCBKJ4QDDEd7zxHXZ3d/HKK6/g0qVLACbx8MN/gt///emuvoe9XAnUn+NmmJmXa+TldYVRGCrk+UE/ZG90dBS1Wo1D9qhrMTBRS5wuYJtMJrVflFNTU0ilUn055M6JXp+r04puHZJnND9pYWHB0b1/817wEpi8PKfZOXmHQ/58d3hqCkcxadrPFMxsJxGWmu3t7WFrawvlchl/8ifXcccd/x6L6TSOHDkiHxp6Xs+ZCEzdNMysU7q9h8mIfq5fJpPRhmdyyB51CwYmaolVb0gsFmsYcnf+/HltyJ089IhuCqKYRq/otsVr9fOTxJBTNz2pjY1LL7HFy3PCyF0zPQrgMIBLNs8zC1TeGIclI1/4wgYeP3sBIyMjiMViGBkZwcDAgHyYoW69ohdd9jD1szBXyPNDsVjEwsKC9vuQQ/aoGzAwUUtEYBKf+uuH3OXzeQ6586BcLnMtJhPb29sYHh6Wd4eKoiiYn59vmp/k5f5v/jT+cenfVh73rY/HSyPdy3PMOY81h+ubU16GfjVz/lUOHjyIvb09fGXrg9jY2MCLL74IADhx4gRGRkYwODgoP0Xjz9VsjdfX4DYsefk+Xp4TRm7PVbeqVqtYWVlpGLInhiwbFUsh6iQGJmpZuVzG4uIih9z5pFuHnbVDmHuY9POTxsfHXc5PMtd4L1x0GJr8C0uCk++q529gcv7V7tb9/wkH52DSVdwx4zzQ3TSJx+vznLa2turznIAHH3wQY2NjhsP1nJ+F4NifUWMvRSIGHwCY8/J9vDwnbMSwRTfnqheIIXujo6PIZDIYGhpqGLLHv4nUaQxM5JkYcqcoCqanpznkzietVB7sdWGc3xWPx7UPC8T8JD+CEurvt7mhcBHASZPm87n6Y/43HZ1GNfHqOsVN75J/nAWmgYEBvPPOO9q/9edUP8/p6tWrGBkZwdjYmHb9/Y/A3hndeVYuArimKK7m5Zxz+X4venhdYdSL85fc0g/ZW1pawuLiIiqVCofsUUcxMJEr8pC7crmMTCYDVVU9DTmiZiwtbi5Mn7rq5yeJDwv8/hmwfr8iHOm3YJuMooFv1pA95zBUuefsfenD0ozD58CXHiZnjOYpGQXRarWKjY0NbG1t4fb3vQ8PPPssPhiiIUpuw8w5AENDQzb3czP5vFhxfrXDrVcq5PmhWq1idXW1YcheNptFqVTikD1qOwYmckRUuSuVSg1D7paWlrC6usoeEZ9xAVtjaofXYdJ/YCAWWPY6P8mpTr5fI6KB/3jAca35fTtvPp/ARfysi1fjpvFvzPlX2N3dlXdpvXKP18+h2H5hZwf3fuUrGP3QhwAApVIJ2WzW4Ny0n1Vw1hPHjY6OegoCdv2l4txZHdNN2MNkTAzZm5iYwMLCQsOQvdnZWYYnChwDE5kyqnI3PDzcNOSu2uICttSsXC6Hdq5OJ3VqSJ5+fpIo5HBzgeXT0uYvt5/Kt0v7G6hGfTGN3qiHpU/YHCdr/b04+wqNBR2aA91FKTCJr6pvLNZqNZRKpVCUYxYBz+jdn5OCTDQa9Xwvi4CuPzfndPt7SSwWw9ramrybdPRD9jKZDJLJZEf+LlD/2efGTWyKouwnk8n9SqWyX6lU9ufm5vYVRWk6Tt5yudx+Mpls2s/N2zY3N7efTqeb9nPDfqVS2Y9EIk37g9ji8fh+Lpfbr1Qq+8lkUvezMLkPnN0HLphsp5u+ltetUqk07evWTZxPeb/Rtr293bTvxmZ17k/vnwX2L7jcJpu+h5dNfi3N2x13fHP/6NGdfeD1psfc3jPxeHy/UCjsFwqFrvjdm8/n92OxWNN+bo2b+X3PjRu3Tm7sYSLAZsidk2EUa2trmJqakneTR+vr6ywtbqIdw/KSySRKpRIWFxe1Qg6NFR/tVvA57WpdHjv9ONzEvDdC9DTJAwJvDAps7rexdtGkh8Q9+76ON9+8Bzs7twHYlB9yvZxusVhEIpFAJpPB7Owstre
3Qz2vIxaLWVxTAhf2JQo1BqY+5nTInROcc+OvTg096xZBNArl+UkLCwsm85OcBqFJF8eaqxpWyutGp3HhwpPIZD5YDwZi83cYoxjW5pSbY63ZDRm8sZzuHXf8dX3woBm7MN5IBKfR0VFAN88pbL+PI5GIow/f+lksFnP9t5eI2oOBqU8VCgWtyl0rC2sK8gK21BoGUHN+r1NlNj/JuOFy2lVj9saxbo5v1hufOJ+1CEbuelacEHNc7DgtXOCcKEEgf/e7ALyOo0ev4tChm2XFzZmdK3PyPCdR6r7T85xQD0u9cR8HKxaLYXPTqPeRiDqNgalP6dc48OtTPzby/cXzacyvxWvj8bgWlNbW1hwuNOulUe/lOY38DIjt57THxOlxzoiiAHJ0EY8FW1lNfAexfQfAdzAwMIC9vT35YAPeg7YITmJC/OLiIgqFApLJZMfuIwYmZ6LRKCqVirybiEKAgalPBVG2tFwuh+LTzF7RO0Ox/LW9vY3h4WF5t2PJZBKFQgHZbNZkfpIVL41YL89p1L33gdseOfc9K1bE8LzmmU7tdPP9Hzx40GFggsvzZkw/z+nUqVOoVCodmecUi8Uc/nz1t3g8zgp5RCHFwNQFFEVBNptFLpcLdSBZW1tjoQIflctlnk8DXnqY5PlJmUwGExMTLQ1Dda61hm93NzTdBiDvPSvhdfP9GC1c2w7FYhEzMzNN85za9fdkaGioy+/j9uA8L6LwYmAKuXg8jlKpBFVVsbm5iUKhgPn5efmwUCgWi237A9wP2lENrhu56XlzNz8pKMEN/Ao3r8HH6/O6w/Xr1+VduEu3HZYf9JF+npOqqtrPRtC/t70uWttPFEXhOSIKMQamkInH4w2BaG5uDsvLy1haWkI6ncbCwgLm5uZC2ZCucgFbX62vr/NcGnASmEQFSHfzk4LSWmCqVquue9TCobeDj3M3r3/jwrU3AtIJAHfrtmh9X9DBaWlpqWme09zcnO3PlhetLFrbL6ampgIZKk9E/mBgCgG5vLf+l2YkEkGtVtP+febMGQDA9PS0ti9M2Cvin3K5HEjjpdtZ3WNiflI+n/cwP8kJL7NfWgtM+p//fmF2fc1M1ktG6IuVX/C9jIRXjdd/d3cXqAejuxseafRfcM6HovT29POcYrFYIPOcIpEIq7/ZGB0dZWAiCjEGpg6LRCIolUrY3Nw0HC5ULpebFoQtFouYnZ1t2BcWxWKx6fWSN1yLyZh8XiKRiDY/6dSpU8hkMlpQ8p/bwOTPsqh+Nl7bp/X37cRZi2AkgpTbmVT+u3HfiB6mGysymZupHz/ZxtdeLBa1BctRn+fk17xZDjezNz4+zlBJFGIMTB1WrVZRLpcRjUYN/6CcP38e8Xi8oYG4srKCWCwWykYUh5H5i0McjYnzksvlUCqVtPlJMzMzbZifZLU4qZ5YyPS0tLnXSlVAP5n15Bi/K6+ByfnzzIKSzG2tPv+d097Xnbu7lmHpBC7iZ3XB3PjcBkc/z6lcLiObzbY8z0lRFA7JsxGLxdjDRBRiDEwhsLS0hHg8jmQyiXw+j0qlgnw+j0gkgmKxCFVVkUwmteNFgzCMgYmFH/xlNfysX4nzkc/nsb6+3oH5SWJxUquGvWggizih38RgMWfCsi6LVU/OadPH3PbI3QwWdtfTbT0952c8KI/jf/1fD+J75N06J3ARnzAI5O0OTdDNc5qYmNDmOZVKJdfznMSxDEzmIpEIIqyQRxRqDEwhUCwWoSgKpqamsLKygnQ6jVgshmw2CwDIZDINhR5EIAnrHyCGJv+wx+4mMT+pUCgAAFKpFM6cOdPBRsbjuqVR9dtJB4OpRF9NdzAOQ42M39HNAGTvoquAZXV2zXh5jp8+/OEd/DM8rg25E0RQMgpLcHDugybmOS0sLCAWi6FUKjme5+RXEBA/UWLr9DnxE4csEoUfA1MIqKqK0dFRpFIprK6uYmVlBZlMRluZfXl5GaurqyiVSsjn88jlclhYWAhtYOL6Qf4
RwzX7mZifNDs7q81PKhaLjhprwRONfP3mJF7ANGLIarVaR+8Bt43T5nckQqUVMXzROTevSfDyHL9EIhFUq1VtyN3TOKltn8DjOGERLDv5uvXEPKeJiQmg3vtpN8+p1TAwqeuT1QcmMTQ0LOemFePj4xyORxRyDEwhIf9BkYfdpVIpzMzMYGVlBYlEQquWF0Zra2uWf0DJObnAQb9Q6usniYVmE4kEEomE9nPhZfHa9nAbLyZtj+/0ByNue2WM3805AI/j2LGv6P4tNtFTFzzj19YerQR88yjVGWKek6jsZjXPqZWS4k4+UnD68USYcf4SUfgxMIXU3NycNn9JKBaLWF1dbQpXYVMulzmMzCf9di5FUCoUCg2VI+V7fnt7OzSFEBp5abrZP6eVxnYr7F+ZMeOQdRHHjn0Fi4tflgJT2OJAcOT72KmwniGjeU6VSqVhnlPEY0lxJ2FJ6PbQpCiKp3NERO3DwBQiy8vLKBQKqFQqUBQFqVRKPqQriEZBP/aM+E18MtupBnO7TE9Pa/OTxPpJ6XTatIEZ3h4mL80243ih16nr7+XdhJndwMAgiUpxXl5DWAOTnpjnlEqloCiKNs9JrO3kltOwJNj/FIUXe5iIwo+BKURWV1extLSklUc2ayx2g37rGQlSL59LMT9pbm4OS0tLjtdPqlarHQsR7dbNvweC5CV4dJqXwHSxSwKTUCwWsbCwoM1zmp6eRjKZNByuZ8ZLUPfynDBghTyi7sDAFCKrq6tdMeTOiSIXsPWNqqo9FQ4URUE2m22an7S6uiofaqqfAlMnhbmh7va1uavB5z99j6jTGVvuS2GEh5jnpKoq1tbWsLi4aDrPSeY1/HRjL1OrRTGIqD0YmCgQLIftn17pYdLPT4pEIqbzk5xQQ7s+ldtmPBw14zv1Xr28Gzh6R9achGG3YaLV1+QHMU9FvHar8+v2/YVVJBLBmTNnkEgkGuY5OS1L3utYIY+oOzAwUSC4FpN/Njc3u7pMezweb5qf5DUoCeHtYbJqApuxfk4r58kPboOG9btxxum1dRoq7MJJO8jvSbx2o5W82lc3MHiRejl1SPOchoaGXK3n1Ks4f4moOzAwUWAYmvwR3t4Ua2J+0uLiIlZWVhzPT3LKa6niYLmt+nbR5fHt5+YdOQ0wTji9vhfrSwXLwU68lpMuXn+QhoeHDYsfXDQITGF4vX4QhS5k8jynSqXSMFzP6/uX74FuwAp5RN2BgYkCwwVs/VEul7smMCmKgnQ63TQ/yc+gJIR3jSqnkcF5vOj0+3TSQ+Pk3bjpSTBqaFs5Vw9HYnPymttpaGhI3tXz7IoZ6NdzWllZ0eY5HfHwQVuYrrUb7GEi6g4MTBQYLmDrn7AXftDPT4pGoy3NT3LKr8Ak1nu5YLB5n0Ru168hBl45E+R5dEoME5PflQhKdu8mmUwim81ienq6Z4ZhWd07cuGCSCSCWq0m7e1t4+Pjju7darWK5eXlhnlOsW98AyMjIxgYGJAPN9SNvUsRVsgj6hoMTBSYbh1KFkbVajWUhR/i8Tjy+bw2P2liYiLwoCT4ESJFg1du3AqnPawHc5MYECYPuDIaQNYd9MPcnPbkiKGZU1NTWFlZQbFYBICun79y1ubeEY/rue018+q0QYCzeq1BiXhYtFbMczr5+OP44vvfjxMnTmBkZASDg4PyoRq7ezCsWCGPqHswMFFgxFCybm0QhUnYhjfq5yedP39em5/UrgYh6sUwWhnmJMKSHafHmZMDkzdefo4m641nsbWzwawPSqLHsVKpaMOwxPyVbgxOTsOH/t4xm8/jp0mLnlHxWlq7l91RFMVw3pYT5XIZC//u3+FvX7oEAHjwwQcxNjaGI0eOaMeIAN+NYQn1HjjxAQIRhRsDEwWKhR/8EYbeOkVRMD8/rzWCU6lUYPOTnNje3sbo6Ki82zE3DUcRPDrFbW+avnGsD0xnTYaLNR7VWrQyCkpGn6J3a3CadHl29McHGZicBnunx/nBj2G
If7a3h3u3tvCj5TI+fe0arvzMz2DsM5/B709Pd3VYQn3+ktHPBhGFDwMTBWp9fZ0L2Pqgk+ta6ecnjY+Pa43gTn8yWq1WPfcweQk/Xp7TCaJBbNWov9nTIAZvyYFJHGH1VRqJ8vGnTp2yDUryv+XgNDc313BMmDg/IzdNtqFwh5sQ5Db0eeXnkLOLAD597RoSX/gCHv3KV/C+uTlsb293Rcg2oygK1tfX5d1EFEIMTBSoYrHYsYZ+L3Hbw+CHeDyuBaW1tTXLRnAnbG5uem6EtqOx6NX9OImH8Sjeh49p29OPl/BD3/kQ7sdJ+fAGbnoP/hBncdgyBjqJXjeD0uLiIjKZDGZmZizvEbMeB31wUlU1tMUhrM6YGfEcq/PSCusrZMzL+3ArqGGIYp7TxMSEtp5TNpv1/PugU1ghj6h7MDBRoMrlMgOTD1RVbdt5FEOqstmsttDs8vJyYI09r6otLF7rpYHZDg/jUdyPkxjCsYb9e3t7gC5MmXHaCP4UzuISJhGVHzBkHMHkoJRIJHzpdaxWqzh//nzPFIcQgnz9Xu5nL89xy88eJiOqqmrrOdVqNZRKpYb1nMIsUq+QF0SgJCL/MTBRoKrValsb+70syPCpKErD3JNUKoWJiYmOzU9yIgzzuvz0MB5tCkqCCEwAMIRjpqHJaSP4ku7IuxoeMXMziokhmtls1tegpCcakvJQvW4OTkE2jJ0GZZnT+8WLSBvLZYt7ZXh4GEtLS9p6TslkUj40NBRFYe8SURdhYKLABdnQ7ydBDMsTjd9SqdQwSd/vBnAQWulh8l6rLhhDOGYaloSDBw9q/290vNNG8390fKTe6Ya5bKKEfFD3iX5uWtiCk5ciAy916LXa8fJenAq6d8nM6uqqtp7T7OxsaOc5jY+PMzARdREGJgrc2toaCz/4YH193bfS4vL8pHaun+Qnr5/ae2koenmOU3L4ke3t7TUt4Gk3n8lM3mVgGhwchKIo2r0ihmgGTb62YQlOXu6DlwIeetWJDwBEuRB5E7PeOhWYBP08J9Tvl2w2G5oP7zh/iai7MDBR4NjD5A+xrlUrkskkSqUScrlcqOcnOVWtVj2dk4seGr6Pyzt8ZBd+3nnnHXmXbchqlQhKY2Nj2NnZwejoh9oSlAQnxSHEhP92Dr3yEk7OGQTATnN7/+uJcvVGRKmQJ/7qr0LxnlVV1e6XWq2GQqEQinlOsViMFfKIuggDEwWOC9j6o1qtegqe8vykhYWFtvUSBM1rYILLBS+DDEtOyT1MMi8NeSNyUHr++edx7do1F2erParVKhYWFpBIJDA1NYVKpdK24OTmfng8wGpxwjkPV8fr/WJfN/GGo6qKa5/+tLy7Y6zmOXXibxPXYCLqLgxM1BbsZWqd23PYzfOTnGq1weEkNDk5JmhGQ/KMOHmdM1JT+Tv1/5oHpXBTVVVbRNkqOImeD3n4mFlPiZWLDu4L+ZggAxNcBiAvPaxwuX7ToUOH8FFdsZIwkec5VSqVtg7vFEVNgr4niMg/DEzUFmtrax0fAtHtxB9Xuz/q+vlJ6+vrXTs/yYnNzU1Eo86KY5t5vL6dM9hOemxYulXDFXlXA32VPMHoOU4azSd07+hVx0HJyVfuLKvgZLYM72Q9MP0UTuNw02yc0wbPuEkEIvmeOae7p8SZbvUedUK8HjtOjzPiJlwODAzgjTfecPWcdhPznEZHRwHdPKeg/1Zx/hJR92FgorYoFoss/OADq3lMyWQShUKhYX7SmTNnejIoCdvb21pjpxUXDRq97YwIRuFHpq+SB5PnOGkMn8BFnMBFvDM4iDtsg5LQzrPh3RCO4V31LiymPovZxP8dHxj7BTz/83P4yaNH5UOBenn1x3ABL+O0wZpUpx31Qcn3zDmTkL25uSnv8t3Fesg3ulri3rC7P6yYx8dmt99+O/b29lw9p1P08+JqtRry+Xyg85zGx8d7+vcyUS9iYKK2cDucjIzJc3b085NOnTqFTCbTM/OTnKhWqw0
lqLvVy7hgGIAEoyF5L+NCw78Fu9A0ODiIc5+6hh//sJOgBJuvFg5ibSqx8O/9OIl31bvwe5/4j/jXv7eL8+/+DG77/vfhyJEj2nMuYRKfkhblNV6TSgQn7+x6hf12rh6c9JvdEEK/HTx4ELu7u10RmAT9PKdMJqPNc5qbm/P1Gipcg4mo6zAwUVtUuYCtL9bX1xGLxbT5SYVCQZufNDMz01Pzk5zY3Nw07XHrNmYBCAZV8p7DMw3/lomeBv2QsT9SFCif+hSO/MVf4Nc2NrCychuuXbOalC+iVzub2d7cj5OGVQPvBrC/v48Xtm/Hv1F/GHv3xDA2NoYjR440hSVxvDE3s3eaDQ8Po1KpyLt71sDAgHbPhv/uMSaG62UyGUxPT/s6z4kV8oi6DwMTtQ17mVq3vb2Nubk5FAoFbG5uaoUc+nV4R7WFxWvDpoYreA7PmPY0DQwMmB7zo/cewtyPDjVt7/vRIbz4wz+AY+k0Plwo4J9dvSr1QIq+CHlQmTwLJ7wexqOGYemw9O93330X2Zf+Bra2tvBXwx/AoUOHcMstzX8CjXuZYDs0z0ov9IK6MTAwoM27C/8dZC2IeU6skEfUfZr/WhAFpFwu+7bwar8R85PS6TRUVcXo6Kj2/50khkK9Dx/TNjEsqh1UVe2ZwARdaHoOz+BlXNC2d45dwft++V7DsPRzJ74H/+Px2xr2oT70bmRkBB8eH8bgt76GRCKBM2fOyIfVyYGpO5q5QzhmGJZgEJiEws4P4szLfxdvv/02br31VtPg1Mx7D1MkEjFdV6qbGM2NMnL48GG8+eabQNfcSfbkeU6ih99tcBIl5lkhj6i7OPkrQeQLVspzJxKJaPOTZmdnkclkMDw8HJohaGLOiNxgHcIx3I+Tho/5TZ7T1StquNIQmL61+yf4gckR+TD83InvwfE7G4tBiKAkzouqqvirv1zH97291XBcmDkt5OElmF/ESVzCJN59911cv37dZXDyHpp6oYHsNPyIHqaLLp7TLURwGh0d1eY5lUolx/OcWCGPqDvZ/XUg8g0XsHXGaP2kRCKhzU8KwzA0J2FI9D7ZHdeqXmiIevGj9x5qCksjIyN44IEHgHpQ2trawu7uLgAY9kKF2fb2tryriR/3lhyc7r7rrobiEI28Nf+DXri2XewKigiDg4O4fv264x6pbiWG6y0sLCAejzua5xSNRjs+MoCI3GNgorbiPCZz+kIO1WrVdH5Sp4tnmE2wN+OlF8CNXu1l0jN6j/qwdPToUTz00EM4dOgQXnrppYagpPej9x6Sd4VWqwFDLMhrRL8WlSCC0+Xr1zEyMqIVh/BLq+8nLJyEpkM//uP4jf/pfzI4y72pWCxiZmamYZ5TLpczHFHBHiai7sTARG3FeUzNxPykQqGgrZ+0sLDQFJSEcrncloUwzbgJS7CZZ+IHozDRa4wa28fvPKgFpSNHjmBjYwOqqhoGJUHukep1r8o76n7IpCn/KoCdnR1sbGxga2tLCk7Gz3GiPffnaYMtGBdNSoWIciF//Iu/2JZ1p8JGP8+pXC4bznNihTyi7sTARG3FeUw36ecnraysOF4/qZM9TF7Dj5fnOGUWLHtZMpl0FZSEXgtMVqXYUe9lekPadwxX8LMGg8XekHql5OB09uxkS7+7grtPTwO4YBCWxH7v867syGFJlAtRFCXA9xt+1WoVS0tLDfOcKpUK5ubm+v7cEHUrBiZqq34fkqcoCrLZLCqVSsP8JCdBSVhfX2/TJ9bNggw+XlWr1Y72uLWLolukeGpqylVQEi6/1rieU7eTKwYa2ZRC02Q9ZH1CN7DsjfpxRm4Epy/gC1/4NW0hU7fByWpOS2vOOuhJOhtoaDLSK3O2/CDmOaVSKcRiMUQiEczPzwd4TxBREBiYqK3EJ2udavB3in5+UiQSMZ2f5EQ/DEFzo1arOa6qFjz5E35/JJNJVKtVLWSnUilXQUnwOzDJ79a/d+xMDVcch6ZXAQzhCo7Vjz+Bi/g
pPI5Ni7B0w41ZO/qFTN0Gp2DCg5sg5OZYfwTznrtXsVjE+fPnteI9lUoF+Xze8T1ERJ3FwERt10+9TNPT003zk7wGJaGT589J47TdKpVKCBYGNRsWdaHeWPVG36MEoOHe+a+X35KOtvftnRuLibbK7N2K/e1smhutTWXkJVzBZ/AMTtbn35wEcAYX8QYeN1lhSJQ3aCxx4DY4BdeT4PYstyfOcsiZuWg0inK5rJUlLxaLyGaztvcQEXUeAxO1XbFY1BqAvUo0dOfm5lzNT3KqU6HJScPUiNfnObG5udnhHje7YVGTrmOEPijpeyP179Nt+Ln82ju+9DDZvVs4PMZPYrFfMy/jgsXjF+uBSR+lTtaDknmhB6fBKRKJBNDb4uXsOr//vDuN73znD/ChDyn1e15s7e/hCiN9hTwxz2liYqJhnpNdWXIi6gwGJmq79fX1jjT2g6YoCtLptNbQnZmZcT0/ySlVVTv2R9Vuor3M6bApr6odXZfKTUPQvpFrFpRgMMTp8mvv4EuXXm/YZ8bNsVZOu3y3To/1Qw1X8HU8abi5vWfdcBKc5GvXOUFekRsx+dChQ9jbk8P8ZP1x772tvcCsQp5+nlM0GkWpVGJwIgoZBiZqu2Kx2NSg6Gb6+UnRaFT7wxfkWhuqqnasPPvLuOAqAAXZWEVHA9OkywbopGlosgpKVpwEISfHOGX86s25Pb6bmQWnYAogeD2zbu5XN25+cDAwMIDr16/LB9SZ/wz0A7vhisViEalUChMTE0B9uLFRACei9mNgoo7ohdAUj8cb5ie5aei2qlwud3QYmtN5I06Pa4U8VK19vDQ+GxuLIiidOnXK0f1j9D4vv/YOlr5Rw3+9/FbT9qVLr3csLMHjGQIQgjlp3ongtLKyglwuh1wuJx/iA6M5V054fZ6Vxg8ODh06ZFOQxMud1P1EcHYSnqv19ZxGR0exurrKeU5EIcDARB3RzQvYikbu4uJiw/wkq4au3zY3Nzs+rPE5PGPa2/QyLuDreNLwsd7hveEnwvbs7CxSqRRmZmZs7x+7x7/x7etNmx9zllrl5SxFIhHUajV5d1dZXl7WGrzT09PI5XKGgbf7NV7hgYEBgyF5Mi93RXeLxWJahTynjOY5bW9vc7geUQcwMFFHdNsCtvL8pFQqFdj8JCfK5XIo/mCKyfTtnDNipHO9TO7ccccdKBQKWFxcRCaTQSKRcN2I6hdOPonvBuVyGZlMBmtraygUCj4FJy89RV6ec8Npg+1mn1JjP+Ltt9+ON96QlwuWee177F7RaBSbm9YF7K2InsuZmRkMDQ1xnhNRmzEwUUd0qsqbW2J+UqlUapif1OlGLtdiahT283HHHXdgbGwM9913X0tBqRsbR+Z15ryTG+9hJq6Z6HHyLzg1lju35z4wnbYoH29WwsFZD1P/BaZYLIZKpSLvdq1YLGJhYYHznIjajIGJOkIML2qtwRCceDyOfD6vzU+amJiwnV/Sbt0SOtuhM9fFvgE6ODgIRVGgKAquXbuGS5cuNQUlMQ1e3mSbm5sdndtjFXwu4GTTJlg9z62zJg14sS+MhoeHGxrK/gUnsU6UE06Pu8lJeYZJAFFp38GDBx0Epv4Ti8WwtrYm7/ZMnuckCo1MT0/LhxKRDxiYqGPC2OAX85Oy2SzOnz/fkflJToW9V6WdqtUqolG56RY08ygggtLY2Bh2dnbw/PPP49q1aw0ha1K3Qo0cAMQn+GH6HN7o3V7ASZzFx3ARJ5u2s/gY/osuOLXK7nyIcxY2ZiFXBKfNzU2USiWPwUmEJqOrA936UmaPG5t0cS4PA7ir/v+Dg4N45x0n8+bsP2zoNYpNhTyvxDwnUaFxbm6O85yIAsDARB0TlgVsFUXB/Px8w/ykiYmJjs1PcqpX17PyolarYXR0VN4dsItNDVHzoCTcaCg6aZCKY/QhodMNIH0/xZfxKC7aBKL/iJN4GI/Ku12Tz4MZJ+e1VeJ76JdlFZtRj4xdAQvRSyCCk/u
GrghNj9fvr3P1/z/pMJjIUf00fsLR2b5hBudwd/3/BwYG8Oabb0pHGHEX4LqdEkhp+WZm85zcB3EikjEwUcd0usEv5idVKhWMj4+HZn6SU+VyuQO9KuFUqVRMP8kP1o1P9+2Dkjj2BjeNetEIr1QqGB4elh5tL9E0v4JjuIJj8sMNxPT2IRzDkM2xVk47DEvCpMvj3TAKsXpieKD8uF1jWT+8CkALwUkEJieBZNJwgONhnMYf4iw+5fAu/dl6KLsLwOHDhx0Mx2v+oKHXeamQ1wp5nlOpVOI8J6IWMTBRx3RqSF48HtcWml1bW8Po6Gjo5ic50S2V4dphc3OzI+fiRugexIc/bBWUGodNGfVCWJn08JwgXQTw23gUZnXQ3qiHJf3j9r1Mjf02mcwHceHCkwDO4nBT/LDn/hn23PRe6Y9zMxRLBCd9Q9d9cHLC/N0crv/3EiYdh6ZP1D8MsF60VnDS69VbotFooAuZm9EH8ZWVFW2eUzKZlA8lIhsMTNQx1WoVqqq2LTSJ+Um5XE4LSmGdn+REO89d2FWr1QAaleZE76QI3Ssrt+HatVHdJ/z6oVGNc0y8hB8RADrTi9ZoCMe0ULQJ4FXdJvYZhSnzXqazFv02k/hDnMV/dHnW3B3tjLPocFMrryHY4GQelmSXMOno3J/ARfwsHrdZtNZuvlXvisViLZUUb1W1WsXy8rI2z2l2dpbznIhcYmCijgp6HpOiKFpQOnXqFFKplBaUup0Y5sM/eMEHpiEcw/04iR8Y/HH8gx/7VZz5R7+L8tpfGtxLcmDyp3E4We9FC/I9OqUPPm8A+I5uMwpKgnFgcjaALo/TjhruQbF/hc3Eq21l/kowwcndu8njNC45es5FPPHE/4V77lk1+Dlo/uCgn8Tj8Y70MBkR85wSiYQ2zymbzXakh56omzAwUUcFtYCt6AEolUqYmprSJsK2cxx5O3RqWGPYBDk88WE8isnBD+FvKx/CB8Z+AW9t3InP/G+fx8Xlv8bDeNQkCJA95z0dqDfcnfK7We4kLljxGpgEf4OT9Xn8jryj3tNk52L99+6hQysGgcnvK9JdIpFI6EYylMtlbZ5TrVbjPCciGwxM1FF+N3T185PW19dDuX6Sn8rlssdGEzkxOfghjCs/blrMYQjH2hKaRHOzt661eSN8cHDQcGiX014mv5vn5q/U2mmfr1nrwcnZ+bPqKTQiyji4ma/VL8J+TsQ9NTw83DTPyfl9RdT7GJioo8rlMhRFafkXczKZ1NYyEfOTzpw5E+o/VH7gPKab/AzfiqLgg8r/hv9hLI433nijKSjJ7Isa3OSlMX+xDcMOnarhirzLkZdxQdrjrPH+qu7/v+kwung5x1a8fj1x3fzWenCyJs+2sTvv5+r/jUQigbzfbjY1NRWa4Xh25HlOlUrF1/uKqJsxMFHHeR1Wpp+fNDs7i4WFBYM5Jb1tfX0d4+Pj8u6+VPVhIV9FUZBOp/HvPpvHbdePYmNjA1evXpUPM3S/zZpEgmhcunEuoIa3FzVccR2a3B6vp58b5XRomNeA47eXAm5oug9Ozs+MvnjHD1k8T8xMYlgyNjo62jWBSRDznCYmJhrmOXn5O03UKxiYqOPczmPSVygT85MSiUTPzU9ywo+Q0Cta6U0UQSmXywEAfutXctja2jIcFmbG6bA8tw16LwEraM29RdbcHi/T9zJZuSgtrusXN9dLaGeIcB6cnL8TUQnR7NyLUg7iK4Z96FmnRKPRjlbIa4Wqqg3znAqFAuc5Ud9iYKKOc1opTz8/aXNzU1totp//SIshjZ0m5vK8Dx9r2Noxv0eoVquuF/IVQSmfzwMAUqkU0uk0tlXnQUlw8z6d1gu7qAtMYQrHNVzBc3hG3m3oOTzTUg/TXfW1gV4FMImLuEs+oE404J04ra34dGM7azNA0G3IhXTd2sVZcHIXwb8D4FM4h5NAwyaXcmBgMhamCnle6ec5LS0tafOc5ubmDEI5UW9iYKKOsxuSl0w
mUSgUGuYnpdNp/nGu83PujhdWhQ/EY27m+HhVq9UwOjoq7zaVTqdRKBQQjUYxMzPT9nvKqtCy6CnRB4B2NrydEKHJLAzZPW7XcP/ed9/FCQB367b7cRHfqYco6GqwiQa8ncl6QJLD0aQuRJkN+jO7VkbEsZ0KEdbBSY46dpzF0KGhoRDco3IUNrra7ROJRBAJYYW8VqyurmrznKanpznPifoGAxN1XNVgAdtIJNIwP2lpaanv5ic51cmeBxGI7Dg9rhWVSsXRwq7ivopGox3vpRTBSP703k3jvJNEKPo6nsTLuKD9/9fxpGlY2t7erv+f+Tu89coVfO/enrwbP1uPRaL09aTDoAQXRczNltCFw+uiP6bTjUjz4PRFB+8EDt/xDaOjox37OTKPwnAQhYPTqcDcDvp5TqjfW9lslsP1qGcxMFEoiF4mMT9Jv35SIpHA6uqq/BSqK5fLHSv84CYEDeGYYS+UX+wWdhVBSdxXVkHJy5wbo3DgN6v312kv44Kjc3CzF0LExUZ3Abj17bfl3fiEwbGTLprBTsKSYHWs6PkTvVv67aQUL4x7XU5LW/Dk4FSp/B7S6a16cDIiro2zsIT6XJ3m99oOfkThYIyPj3f9cDw7qqpq91atVkM+n+c8J+pJDEwUCpubm9q4aAC2DVq6qVND8pxWhdPz8hw3jM6Dm6AkOGn4y7yELLc60yD1T/OQyebQdHfDv274BB7HCZPGu5PI4eQYmdVzLhqEJbOerlqtVv8/0dMhByYxiyp4+uAUjUZRqfw60umvIBL5QMt9nJFIBOvr6/LuNnBz7twc27pYLNbzgUkwmudUKpU4z4l6BgMTdZSYnzQ3NwfUG1ROGrR00/r6uuUcsKB46S3y8hynVFVt+MPsJSgJbktnuz2e9C5qs5AOS4/M4ByexknTsASHfQZW4SdIo6Oj9SGIdmUlxJAyfwzhGO7HyaZNUFUVqVRKF5wqmJ+fb/gabsVisQ4EeqtzasbLc7xRFKVrK+S1QsxzWlhY4Dwn6hkMTNQR+vlJKysrGB4e1ibIkjvlcrkj5y3I8OOF6GlrJSjpmc3BkbmpGNeqTs5XC945/CJO4mmcxH+J/hz+w60/ps1ZshNEE9ivr/lnf/ZRh7EOLfeADOGYVp1SDkv342RTcRZ9cBofH0elUkEymWz4mk51priBl6vk9Fq0rp96mIyIeU6iZ1ksLs/hetSNGJiobcT8JH1jNpFIaIUc7KrlkbHebkQ7l0wmEYlEWg5Kes/hGcuhdqLQQbu0/xP8zhgcHMSeQdGHbvOtbw1hY+OIvNuCm1lZjZwUVjGraCmCUyKRwNTUlOvgFGnTelP+8HZ+3RIfALb6O6gX6IeCbm5ucp4TdSUGJgqcCEp285PcLmBLN6lSlcF2cNL70g76HiVVVZHJZJrurVa8jAtaFTj9JvZRMNwEJvMBe94569uy9tJLEVy/fl3ebcNLr4m7Aixmx3oNTt0VmNqjlyvkeaWf55TJZPj3nroKAxMFRsxPKhQKWFtbw8TEhGFQEorFYseqvXU7MRytnbwEJi/PMWM29C6o4YlyYOqUdl9nvyiKgqmpKczPz5s2lLwGH7vn2T0elK2tD8q7HHDfA+KlmIrVc0RwSqVSmJ2dtQ1OiqJ00dCz9twN/VAhrxXFYhHpdFreTRRaDEzkO3l+klg/ye4TyHK5bNqQImvr6+uIRqPy7kA5LSOt58fwNXF/nTp1qqm3slqtBhaYwsDsw4Ywk3uYi8UiFhcXkcvlmq7VRQ/NWSfHe+kt8vIc2eDgIHZ3d+XdvpOH2Dnh5DliDopdcOpcSXEvV8nJHdO6fp+/RNRrGJjIF4qiIJvNap/4i2EdbhaarRosYEvOlMvljvQ8uOlpaTUsiWpLs7OzSKVSmJmZaQoQqqq2PTiSufn5eeRyOaytrWF0dBRra2sol8tIJBJYW1vTLaR6Mzi5bQI3r87UzG0Qc/I1nRgYGJB3BcJJ+JG5eY4cnOT5J51btNb
NVRXc3mHeKIrSoTLrRBQEBiZqSSQS0T49jkQi2h/VYrEoH+pIJ4aW9YJqtdqRoCkqxNn1NDk5xkw8HtdKz4sgbnZ/1Wo1g7V+eke39KCJXsBoNIpUKmX4wcny8rK2kGqhUNB6Li4C2HI4VMdNsHG6spDT45yYnr7qai5W2InglMlktHXz4vF4B3uYLroMQG7umNawh4motzAwUUuq1SrW1taahkZ5VSwWMTU1Je8mG52sMChCk36IXg1XGoojeAlLIijlcjlkMhnLoCRUKhUMDQ3Ju3tGrVYL9ftTFAWFQkGbV7awsGD5O0FMAp+ZmcHU1NTNnovJScvgcq6+cpPZ42YetwhEFy0e82py0l3xihvcBADvvPxMCnJwmp6exoEDB+TDPJmsF1e/IG3mpTCslg3W8/vqmhMV8joTIokoCAcA7Ms7iTolFoshm80ikUjID5GN7e1tjI6Odv0f6Xg8jsXFRSiKgkwmY9g7YSYej2tD9npROp1GpVLBysqKtm+y3pjUlwoQQ9CcNCP9oCgKFhcXEY/HLXuY0+k0otEoFhYWDO9Tce3j8XgHh3n5Z3t7G8PDX3RZyMG8GIMZo1LhdvwsXlIqlYB6AM5kMqbX385ZB2fKPPYY/SRA95Ng/Kwg8O8YUe9hDxOFCgs/eNfJXiY/iOIAuVyuoViIW70+pFM/5PCsSSNTNB2NHvNbOp1GPp/X5ilZNZbPnDmDzc3Npjkwgui5UFUVhUKhaX5Tt7kRCs2b+M28DRnzEnxa6WGSRSIRTExMNA3Vc8PpvWp+nOgjPCltbs6/P1ghj6j3MDBR6BSLRdd/bOnG/K9ubFzqq6iJRreXoIQuPgdOVSoV7f/NG443ieFNQRDzlABgZmbG0TXTD8Gza1iL+U2lUgnz8/Pyw6HXeB/aNdpbGxBYwxVXAchLhUsn5KF6VtdXz6hfyIr58LxwiMViXd87SkSNGJj6iKIoiMVioW9QlstlrsfkQbf1MPkZlIReD0xCJxuYiqI0rH+VTqddNw5VVW1oWBuVGBfhamJiAtFoFJVKxVHjOywikYh0XkTvh5hzIzarmVXOOS2sIuYX+kVRlKbhlW6Dk5t7GfXj3T6nnVghj6j3MDD1gUgkgnw+j0KhgHw+j1KpFOphS2tra5Z/XMnY5uZmVwTNIIKSntzwbtUQjuF+nGzaOkEUfXAbgNweb0R/3WZmZnwr8mJUYlzfAK9Wq1hYWEAikXDU+A4/OTC1FpT0nsMzpsFJFGdptby/TFEU0/vASXDyGn68PKddWCGPqPcwMPWB+fl5RCIRjI6OYnR0FOVyGYX6IpJh1G09JWER9t6VoIOS4Nd5GMIxPIxH8TAebQpL9+Mk3oePuZ5o36pqtYofPXhQ3u2I19AUiUSQTqcbrpvfjUG5xLjR9dP3SmWzWWSzWcPj/CRfczdB2SpIBEkEo6/jyYbNLEi1yklJcavgFObg44W4J+3OCRF1FwamPjA1NYW1tTXt3wsLC0B9DkIYiUZGmHvBwiisQbNdQUmoVqu+3Dv346RtIPJSnaxVYzs78q7AJJNJrQJaO65bsVjUepiMeiNQb3xPTExgc3PTcOFbP4hALIclN0HZ79cUVpFIBJubm/JuQ0bBaXKytyKToii+f6BARJ3HwNQHVFVtWNtIVVUUi0UsLi42HBcmYW38h51fvSt+aHdQEqo+LO7qJgi5OdYPAwMD8i7fxePxpnlKQRL3yuLiIhKJREPFtVwuZxiAz5w5Y7jwbavEEEwroufRytDQUGh6GU4bbH7FFDGnzQ19cBr78IcxNjaGI0eOyId1pfHx8Y70LBJRsBiY+oCYE6RvdGQyGSj1IhBhVOQCtp5Uq9WOX9NOBSVBVVVEo1F5t2NDOOY6ANk1sP2iqqplYLoLwAmT7bB8sAFx7cQaMn7MU7KjH+4nSopDmt+Uz+cNe5L0VfdOnTpl2ivllBiG6YST+6RWq8m7fDFpEICMTOoWfZU3UZK+VZFIxPP7LBaLSPz
ar2FrawsjIyOugpN/M7/8xflLRL2JgakPLC8vQ1XVhk9gVVUNdTW69fX1jjf8u1G5XG4pLLRCH5Q2NzfbHpSEWq3W1LB2w64RbMTLc7y6aDKEKQrgbnmnzi9aNJAVRWkILhMTE4EHJX1Zcqt7ZXl5GYn6AqBmPUmqqmJmZkbrlcrn84a9UnachiXBKiiPjo5ie3tb3t0SUSb+rEEAuiBdXycl5Z0coyeGJOq3P/kXr+G1/+v+ln4G/tXODjY2NrC1tYVoNGobnMTCzGEUi8VYIY+oBzEw9YlMJoPZ2VlPjYhO4FpM3qiq2vagKQclL0O4xDAo/ea1AVapVDA8PCzvdsyqERwW56R/R216kF6t/1f0TOglk0nkcjnAJrj4RVEUFAoFV8P99D1JU1NTpj1JoldqZWXFtFfKjJf7za6Xyc8heSLcGMflG/THOA1CTo8VxU9kAwMD2NvbczRM0YyoFbizs4Pnn38e165dswxO8v0fJlyDiag3MTD1ieXlZW0i9fT0tLbuyfnz5+VDQ4Ohyb319fW2hWLRK5HP5xuCkpvGghgCZVSJTuy3apAaefFPN4HnRhs+BW8lgIWJqqpQFKWhGPVhm7D0BoDv6P4tAtP09LQ2TymVSjkKLq3QB+ulpSVPw/1UVUUqlbLtSVpdXdV6pZwufOv1/jB7nt89vU5CDeoB6EvyThuTNkHM6ufw4MGD2NvbAxzO/zKjX4nq2rVrpsGp9RWrgiMKlvgZlIkoHBiY+oj45HVubg7r6+tIJBKh/sUe5iGDYSUa1EHSD98CgJmZGddBCbqwZNYQg8Nj9B7Gozh65SRefu5aw/77cRIJPIoncRIX6sOXxCY3RIMovRwE0XC0Gob3BgC5ftng4CD+/OMfx+zsbNvmKYleLDGnbXV1VT7EFdGTdP78edOeJNEr1cmFb51Wj7Mj9wraudsmRBsxC0xWvWiDg4N45513Gva18uGEvISvCE5/dO0aBnM57BQKuBbw77dWcP4SUe9iYOojogGRSCRw5syZwBtJreICtu4FOSRPH5Si0ainHiU9N8N3nHxqLYLV7u5uQ2GEw/WiB3cDuIiT+LL0fcXEeNFg9BKYvDzHK30weBzA6/VgpCeCkr65Pjg4CEVRMDY2hu+urWFmZsbztXPqH91xB9566CF8pljE0Z/6KXx8eVkrQuAH/fymUqlkOL+pWl/4dmZmRitlHdTPiJ4c4FphFmaM3FX/r9vAZMbqZ29gYABvvvmmvNtzYEI9LD0O4KRu+9C1axj90IewsrKCQqGA3Mc/jo8PDloWvOgEVsgj6l0MTBRa5XI58N6SXhRESXY5KLXaK2HVCDNi9Sk3pCFD+sB0uD6/R+8KjuGKwdcScz9exgX5IVtunnO/wcR5N71ocq/wd+rB6JJu25RC1D333ANFUfDGG2/g+eefx+Bzz+ke9Z+iKPhDRcFv3ncfNjY2oKoqdnd3tcdFkQI3QcCMvifJan5TuVxGol7KOp/PNy186zX0ml37SCTS0s+InpfzZNXz6IbVfXno0CFtOJ6e259vpwaXl3FtdBTv/+IXcebBB/EpRcHHBwd9DeGtULgGE1HPYmCi0BLDy/z8pLYfqD6uxSQqmfkVlAQvDSqzhptRmDp48CBg0Wi8aPL9RaPrOTwjPWKuhiuOGttDOKbNp5KJoYduet2cOHr0KB566CEcPnwYqqri6tWr8iG+S6fT+ObUFMZ2dnDp0qWGoCSzK2LgRrVabZjfVCgUDD9wKRaLGB0dbVr41ul11LM63q+fQbfk3sYgDQ4O4vr16/LuQIjKgABw9epVPPfcc7h+/ToefPBBKPXgJA+vbTdWyCPqXQxMFGos/ODe+vp6y3O/RFASlcz8CkqtMAoaMAlS169fx8DAgOmwJKMeJkjD8pyEJqfHiUBkx+lxdgYHB/HAAw/gyJEjhj08QVQZE/fMz1y5gudXVnDtWuM8MjN+9wyI+U1LS0um85tgsPDt9PS0aW+RGavjRQG
AdhOBqR3B6dChQ5aB2C9GwXpvbw9bW1sNwennFAW5wUHpyPZhhTyi3sXARKHGBWzda2UoYxiDkh2jwLS3t4d7bRpOZqFJNOBFGDLqRRCPOQlLsAh7Rox6zPREz6ugDz/6eUrXrl1rCkqCn1XGFEVBPp/X7hmccxfH7Cq0eeWkUp6+XPnc3BxWC/8eR+OvyYcZMrs39DoRmFAvI/+6vNPCRY8hWpQUl1kFSbfs7g85OD324IPIffzjhiE5SIqisEIeUQ9jYKJQ4wK27lWrVdfnrBuCkl3jVG9vb6+h8INXIhh9HU82bE4ay4JdADLiJmCJhu7IyAgURdHWsjFruPm16KeiKxO+uvKfsZj6LP6Wehcu4KS2OWXVIG6FXCnPbH6Tqqra/Kb5xf8FT+RTGFaMA7dVkNbzs8HuNsx8B8AfyzstWH19q/BjNiTP7ty44fTe0Aenn1TVhuGW7cAKeUS9jYGJQi2IAga9zk0PU6eCkpcGldlzjPbv7u5ioD6Pycwxg+cFwW1YgsvnJJNJHP2938PAwABUVbUdCmfVOHZKlAlX11/Fo6NzeGn1XdyPk3gdJ3FRt53Fx0x78tpJVMpzMr8pkUjg8+d/B0/kU/jZ9ARei3wLL+MCXsYFrVfR6J6TmQVWL/TrbjlxEcBjDp9jt66RXWCSezBrHuaCWXE7ZHNvbw9bX/mKNtyyXcEpGo225XcnEXUGAxOFWrVahRpgqexeVK1WbRsHnQpKgpcGlVXDTfbOO+/gHZsheWasGo9euAk/ema9TNVqFYqiQFEUlEqlG9fw134NP335Mv7UYPidcLFeormV9xeLxbT7Zj71f+CrZ/6b7fv7Mh4NRWiCLhBlMhkUCgXThvTg8jK+kEjgp3c28LX//RH87E/cjpdxwfF9axTGWmUXbARRlhv1/5oF5HMuvqbV0FN5SJ6bn9Mg6XsX0YbgxB4mot7GwEShx14m98zOWaeDkuCm8QmbRpjRY7u7u7huMSRv0uA5gpMGZDuYnZ9qtar1kiwsLGjXUDSURSNZv4n9Xonhd7lcTrtv3lXFij83mRUakNe+6rRisdjQkBbzm0S589MA/n61ip1/9a/wym/8Bv7pK6+g9CM/gp8ZGZG+krkgfq7sAo641vI+/ZpGYnPTa2U0DFEe8mp0TBi0KzixQh5Rb2NgotBbW1tj4QeXRC+EIILS7OxsR4OSntPGVQ1XDEORnvwJ+PXr13Ho0CG82rD3hmO4gpMmX8/s0/hWOHmPRoyel06nkUwmsba2htHRURSLRfkQXDQITE4bxkbEGlxra2uYmJiAqqqm87LMAhMAyzlNQZx3O/L8ptd/8Rfxj48ckQ/D3t4eVFXF4UuX8LtHjqDwmc/Y9iAFWSFPLOoqX2OxLygiED2HZ/AyLuC1yLdw4qeOuh6q6JaXe9foOUEHJ0VROv47lYiCw8BEoWfWW0LmRLEMfVBKpVJIJBKh+qMuGl9G9A00O+JYvcHBQXxHasQfwxV80OTriaDhNz8akeI6AsDy8nJbhv7ov+fExASWl5e1x8yGC6Jeoc2I2dpXQZxzN6rVKg4vLED90pcwMjKCsbExDBoM59zZ2cHGxgaO1YfzyQvf6gUZmAQ5MLWL+AAj+hO3486H3nLdW+yWUfixY/WcIIITK+QR9T4GJgo9UcSglT9o/WhxcbEhKBn1RoTBy7jQVIXObSU61BtyX8eTeBkXGiaibwL4Vj0omYWlcwZDmfxS8zAJXoQ//ZyhRCKBdDqNzc1NDA0NyU/xjVwmPJ1ONzUEjXqXBDmkWpFD6mlpc1ohrVWTAN58801sbGxga2sLY2NjOH78eNOwM9SD06dHR1Gr1Uwb2/K/e9HQ0FDTfREELz2kTgKkn8GJ85eIeh8DE3UF9jI5E4/HUSgUMDc3h3K5HOqgFJSXcQFf2/3X+I0/nNHCVwHP4FFcafpUvh1DmWAyz8pMDVcwrAwil8shn8+3bQilvkz4+fPnW/qemw5Ck744gX7ukH4
7a7JoqZ/kKmyiLPve3h5OnDiBe+65RzrixnNEY3toaAilUqmhXPnQ0BA2NzcbntNrRkdHPd8fbrn5MMPNsfApOLFCHlHvY2CirrC2tma4fgrdIIJSLpfDysoKhoeHbedZ9DqjBo8cloIOSoLRkEEjw8ogfi79N7U5Q0aN0u3tbYyOjjbsa5UoE76+vo7R0dGG4XdebdY3/RA90aOkL0Jx1iC06E0GHJrMvvfW1hYuXbqEwcFBPPTQQzhiML+pWi9XnkgktEIc/fJ7KhqNtjUU2lV4FAHc6hgrrQQn9jAR9T4GJuoKxWKRhR8MyEFJ39itOigv3stUVQ3V+9cPGZTVcAWTye/FE/kUUP/03iy0bG5uYmpqypf3ph/yl0qlcObMGfkQQ0bvwcgb9SF6lwD8V1zRKviJRq2bYXdn5R1tsLe3h8uXL2NjY0Ob33T77bfLh0HVLXy7uLiIdDqN7e1t+bCeEolEmsJ80OyqQHoNS3peglMsFsPa2pq8m4h6CAMTdQUOyWukL/MsByVB7fP1q6pSpcCwkOds3TX93/FMZQmxqfdiZmYG6XRafkqD1dVVnD9/vqEctlvi/vE65M/tnCyYhCyz3h0znQhNqJepF/ObHnjgARw/ftywAS3WeSoWi0in07YN7W4Wi8XaModJ5ncVSDNughMr5BH1PgYm6gpVLmALSPNMxJAtOSgJ5XLZ8I97N7kfJ/EwHsX78DFtux8nLYsOCGHvYRPXcm5uznVoOXPmDCYmJjA1NYVCoeAqGKbTaeTzedMhf044HWIoGBW+cBuW4KI3yg03wzLF/Kbd3V2tAW1GPFYqlZBMJuWHu14nepg6QQSnRCKBaDSKSqXSEJxEhTwi6m0MTG0QiUQQj8cbtunpaUxPT8uHkoV+7mVyE5SEbg6YQzhmGo5EiLIqbY0QDskT5GvptdR7tVrFzMwMMpkM8vm86affgr5MeCKRsL1/7Ijy0nbchis7focmL70T/+zqVa3gQ6VSMZy3tL6+rjW0RbA1Oq4bRdpQNj1sVFVFKpXS1u0SwSkej/ddYR2ifsTA1AaKomBxcVHbstks8vk8Zmdn5UPJwtraGsbHx+XdPU1uXDsJSsL6+npXnq8hHMPDeFTe3UQEJzO1Wg3RaFTe3VGiuMLm5qara2lFDAMzqtYGh2XCvRKLlsq9R3CwjpaXHiYEFJjchqZzNgUf9IFCNLTF/Ca3PYJh1I+BSZCDUy6XQyQSsfywgoh6wz639m7ZbHZ/e3t7X1GUpse4mW+xWGy/VCo17e/FTVGU/Vwut1+pVPaTyWTT4062bj1f78PHXG1DONb0NQDsz87O7qfT6ab9ndiSyeR+pVLZz+Vygf7cx2Kx/UKhsJ/NZvdjsdh+Nptt6R4KcjsN7F/wsMlfp5VN/xperW+qwffUb5MGXwfAfjwe3y+VSvvZbHa/VCqZXufp6en9SqWyn81m9yORSNPj3bDF4/H9XC7XtL8ft1wut18qlfa3t7f30+l0115Tbty4WW/sYWqz6elpzM/PY2ZmxtMwnH7WDwvYttKjJOvGIYzy8DsnzIbmbW5udryHSfTunDp1yvU8JS/E2luxWAylUgmxWKyle6hXTerWfRI2AbwO4G4AUQCHdY9BV43NrDeqWCxiYmICsVgMsVgMc3Nzhr+rVldXMTEx0bDwbbeJRqN928Mki8VimJmZ0XqcenXOGlG/Y2BqI9EYXlhY4Jhnj7oxBDjhZ1DSEyGzW5iFHytWIatT711/Pc+fP9+2D0hEmXC1Xua6Wq2GdgiYm2ILgpfnyMS6TkZECfTX64FJVGQTixubhSXornkkEmkIRPIQSUgV2KzmQYWV12IhvUgUvxBD9VKpFGZnZ1GpVBiciHoIA1ObRCIR5PN5rK6uOl7rhJqVy+WunJdjRt+w9nNei6Cqaigby2aswo9bnWrQpdNp5HI5X4OvHXEf6cuEF4vFpqIQYSMWr3Wq1cBkFZb0vlPf4PB7JpNJ5PN5rK+vY2J
iAuVyWSv4IM9v0hPzoGZmZiyPCxv2MN2gGFTIE3MKGZyIegsDU5ssLi4iEolgYWFBfohcWFtb64oGhR05KInJ+H5TVbWnAqYb7Q6LogpdNBpFKpVqS1CCgzLh+qIQYevJcFNwwW24MuK20MSkTZEJ8XM8NTWFmZmZpg/DRE+fKPggeqBkYihlJpNBNps1PS4sIpEINjc35d2+EEVf9EsJiM3PD1T8EIvFUC6X5d0AgxNRz2FgagMxbykSiaBQKGhbLpeTDyUbxWKxq4fkmQUluZHrl3aHhlYZVVtrRTsanYqioFQqaVXo3MxTuh8nDTcnREAbGhrCzMyMZUATPRmpVEqr1NmOc+PE4zZh6KI0d2gIx1yfK8Eq/KA+DO8uaZs3eN4QjiHxwD9A+v/2f6Ky9joWU5+1vOai8by+vm65AKqYB2V3XKcpAS3UKsKSWTB6GI9aVsZst2g0ahqYBAYnot5woF79gQIUiURMG/mcy+ReqVRq25wQvyiKgmQyibm5OSwtLWF5ebktrz8ej2NxcRGJREJ+KJS8NIKt1vkplUraXB6/KfXlAuLxOBYWFrC6uiofYsqudLpY48goQCqKgmw2i2q1ikwm4+k+mp+fx9zcnOvXHTS5R0ffAyWCklFj2uoeEOyG4xkVegCAN+oFIc4B+H/jGCYHP4QHHngAb775Jq5cuYLd3V3tWLMS63qRSATz8/M4deqU9rvAiDhudnYWmUzG9Di943cexL1HBhr2fXtnD5dfe6dhX6sqlQpGR0fl3S2x+5nQc3K92yGXy+H8+fOufobE72RFUbShs0QUfgxM1HXy+TxWVlZc/ZHqFBGUZmdnUSwWPTdwvRK9mhMTE/JDofU+fEzeZcmqkVooFLCwsGD7KbBb6XTaVUNWz03DUP/eFEXB3NwcpqenPX1fmejtBOCqV6wTnJ4zq3vBKjCZhSXoAtMVHMOXD//PGPj+78fW1hauXbsmHwrU16ZysqCv/vxnMhnThrMIyJFIxPS443cexI/eewjH7zwoPwQAuPzaO/jSpdfl3Z4FEZj8/Llvl0qlgoTHhadFcILN9SeicOCQPOo6xWIRU1NT8u5QURQF6XQahUIB0WjU9VAtv1Sr1a4akod6Q8ipGq5YNpqq1aqvQ5rEMDgASCQSnkKLk4a/IHrbRFGBWq3mWyEJMb9maWkptEUh4CIswebcms2VsgpLqFfM2zpwL569/R/g1rffxualS6ZhCfVrZtQLJjOa32T0s6qqqla8w2jh2+N3HsTPnfge07AE3TF+UAwKHbTKba8yPD7Hb5F6hTwvxFA9/XUN0/xCImrEwERdZ3193XSIYxiEISjpdVspdqfDbZwcp6qqL4FJlOsW85S8zjtz28h7z+0P4D//3p9oRQWCCDWrq6tI1ItCmJXB7iSrEGTE7Tm2CksAcPddd6HyvT+Jt956C2+//Ta+d29PPqSJm9cgGs5ra2soFAqm85bEcSsrKw3H/ei9h+RDDfkVmloJCWacBEyZl+f4ya95XAxORN2BgYm6TrFYDOUfFH2VtDAEJaEbe5lEGDIa2iQeswtLAFCr1VpavFYMm8rn89oaK61cU6cN6cHBQSiKggceeAC//Au/1vL3NXI/TmrVyB6q/s/4fy0UkF3498hms20rCqEv3mB0brw0io2+jiCXCL9L+rfeLbfcgtuPHsWLt0bx3LVD2N+/MXrdLmCh/rrdvvbl5WVt6GyhUDAtDKA/7ktLv4G/+QPOv8/xOw9a9kQ5MT4+7nsPk9tzFQZTU1O+DvVlcCIKNwYm6kphCk0iKHmpktYO6+vrLYWGThGFD76OJxs2N3MXKpUKhoeH5d229EMqRbnuds0xGBkZwQMPPICdnR08//zzeM/uD8uHtGQIx/A+fKxp6NgQjuG/Fb+LX504i/3aYRQKBUxPTzc81y/346T2GvSb2Cd4bUibPc9u8VkAOHDgAAYHB3Ho0CH8xV//NS5uHZAPsQxagtlrsCIWtJ2ZmcG
pU6dMG83iuLP/MoPBwUE89NBDOHLkiHyYIbkohFuRAEuKd5PR0VFfA5PA4EQUTgxM1JXCsIBt2IOSUC6Xu66HyS9eepiSySQKhQIAYGJiwpf5Qk4cPXoUDz30EAYGBvDSSy9ZzpPxyul8oP+YLiE58yuYm5trmjPTqofxqGUvkOj5aoVVWNGXKJcdPHgQt912G/b39/HNN9/E6/VepXaT5y3l83nDazDyPQdw+fJlbG5uYmRkBGNjY7bB6X88fpu8yxVFUbR5fH5x+gFImESj0UCDI4MTUbgwMFFX6uQCtt0SlIRqtdpVc5j85GY4Yjweb5qn5PfQIyNi+F0kEsHGxgYuX77cUKraz8akmyByRP1hJOpFIQqFAubn5+VDXLNaY0fPabAzY3fOHpd6m0Sv0sDAAC6/9Rb+/O238Yb0HL3vyDsCIhrN58+ft5zftLOzg42NDWxtbSEajeL48eMYGGitJ8lMED1MdtfLiJfn+CkejwfSwyRjcCIKBwYm6kqdKGTQbUFJ6MS5archaTFTMdzMyfUR85RyuVzg11XfyBscHMTx48cxNjaGnZ0dvPTSSw1BSfCrYWjVq2PmfpzE6uoqJiYmEI1GWyoKIQ8B1BvEYdyBuxu2ezGOu/ED8qGOODln5wB8CsDVo0fxznvfi8tHjqB8/TquSr1Kxxx8LSNG8++80s9bKpVKpuFVDOPc3d3FiRMnMDIy0hScWl2TKRaL+f7zYbbmmBUncxiDEolEAil+YYXBiaizGJioK4k/VE57D1rRrUFJEL0kRp9MdzsxH0cM89JvD+NRHFF/2PQeUerr2+jnKfl1XeXXIsKCaEQfPXoUDzzwAPb29vD8889bDr/zq+FtFlasiJBVrVaxsLCAhYUF5HI5T0UhjL7/AG7FHbgbgzCu3jaEY7gDd2MAt8oPmXLa8FYUBf80n8eRI0egqqrpNZAD06sN/zLm9DW4IeYtJRIJjI+Po1Ao4Pveazws+erVq7h06RIA4MSJEzh69Kj2WKuBKRKJBNLz6uY+72RYgo8V8rxgcCLqDAYm6lpB95yIoDQ7O9uVQUkv6HPVCU6GbQ3hGM7/sz+XdweyrhEsihmIAHf3oILp2b+DI0eO4KWXXsLW1pb8JRq4aUTaMQosbhWLRYyOjqJWq7luqMnffwC34nZYh65BfA928brtcXpOztn8/Dzy+TzOnz+Pn1JVw549vQ/WG+hvOByO5+Q1eKWqKlKpFDKZDL7vveMYGxvD4OCgfBj29vawtbWFjY0NHD16VJvf9O0d+7LoVoIKTKL6pV3YdHJM0MbHx9syHM+KPjjlcjnXP49E5A4DE3WtYkAL2OqDUiqVQsLjSu5hUi6XXfcIhJmTsCR88xuvIDYwA0i9hX6va2RVzEDMU/rA2C/g93//97Gu/oltI11UCQwjUcnNaCFVp8x6lWS7eAN7eNtRaLJrTIuCBePj45iZmcHy8jIu1uc0WTmGKziGZ+Bk5o7da/BLsVjERxf/DdYrf4UHHnjAcPgdAOzu7mrzm37gb74P/2bl/+PpeiGgRWv1RGh6GRcMt6/jybacWzuxWKzjgUkQH2KsrKwwOBEFiIGJupbfC9gaBaV2lZIOmqqqvp6rTpN7K6zs7u5i5PAP4F/+0/8zsN7CIYt1d+Qy4devHrQMQ6LR2OlhR3ZUVW0oSPBbDzyAC0DTdlZ+Yp2TYXZ7eBsA8Caq2IN5wHTSOyHKxC8sLDRdfxGazKrnnQOQsVko2clrCMK5r7+Mr114EagPv7vnnnvkQwAAl769jeS/+A+mBSRE76h+k4t0RCKRtgQFOSiJLSwURfG98EWrlpeXGZyIAnQAQGfqphL5YHt729M6O3rxeBy5XA4AkEqleiYk6cXjcczNzWFm5kZPS7d7Hz4m7zI0ODiIsbExAMArBy7iPz134zr7TW5Yoj5PaWRkBN/97ndx5cqVph6lr+PJhn8HTQwNdMNJQ3USwFMDAxgZGcHhw4fxyiuv4M0335Q
PwyfwMa3y3CAOO+ph2sXr2NXVqzN6LTVcsQwpiqKgUChgdXUVmUzGlx4SOSDbvYZ2+NF7D+FvRb8H9913H26//XZcvnwZOzs7AID/evktfOPb17VjI5EI5ufnMTs7i0/9kyVc+uJbuq/UTITB6elpnDp1CqlUSj6kr1QqldCPPEgmk1hcXISqqshkMj35d42ondjDRF2tlbk58XgchUIBuVwOmUymrYuTtlu1Wu2pIXlO3HPPPVAUBXt7e7h8+TJefs54Un+r5MazXCZcNZkj4za8tMpLg97Jc87W58tcvnwZW1tbeOCBBwzLWv8sLuBwwx57oodJT+51MHqNpwGUBgbw348fxzfvuAM7P/iDOLywgAd8CEvQDZe0eg3t9o1vX8f/80/+Gv/h69/E7/3Xb+Huib+Dn1/4f+D8Xw03hCXoCkhMJ/4h3vyz99gufCuGwAa99lA36ESFPC/Y40TkLwYm6mrlctn1PCZ9UFpZWfF10n9YtRIsw0buyZGJBWAPHz4MVVVRrVYxODho+zyvxNf9ngMHED18GA993/fh0M4OtkzKhHeK1TBAI056TU5L/97Z2cGlS5ewt7eHEydONDTCT+ICftjm6+nt4W3DwGRlsj4M8NePHsWJEye0KoS7u7uYrIc7syGCveIb376OZ7/5Hcz9yxw+8x+eNRx+JxxRfxiqqjYsfHv77bfLhwH1+zw6/KDvi9Z2G0VR2jIs0S/64JTP5xmciDxiYKKu5mYB21gs1ndBSU9VVcNGU7cxa8SL4Xdyz87u7i4GBgZchQU3DgP4/oMH8YO33Yaht9/Gm6+8gluuXUMUwIn642HhZIgddEOw7MiBCVJ1NjF/S1Rx+9sOe5n28DbeRHNvkNm1Rz0s5eq9eyMjI1qhA5kITv1ArKEFg/Wb9B8g6Be+NeshBIDXnrsNtVpN3t1XwlAhzwsRnNbW1pDP55HL5TwX/yDqRwxM1NVUVbX9pa/UFybN5/N9GZSEarXaM71MemII3NjYGK5du9a0AOze3h4OHTrU8By//NjgIP75e96DI7fcgrfeegtvv93cIxKtb2HxMi6YFicQQclrWNIT1dl2dnYwNjaGe+65B8dwBQqexKv4lny4ZhevG4Yl2ASmX6+vbfXGG29ovUpmJutbPxDD78Tiw5VKBfF43LDHVe4hlCvvvfzcta4MC34KYuHedhH3wujoKDY3N1EqlRiciBxiYKKuVi6XtfkiMhGUCrqFSfsxKAnlchnj48YLXXYb0UsiV6AzWnz0nXfe8b2HSVEU5D7+cRTGxvDe2lexu7uL/X3z+jmHDXqarBr/QRPB6Ot4smEzC1KtEIuoHjlyRBvy9RyewV/g2XpRh5vbd/FqQ5EHPbPrJ67F+yMRvPTSS7h69ap8iKF+6WUSxOLDqVQKi4uL+MDYL1iu33Tp0iUMDAzgxIkT2u/XwcFB7FflO7m/KIqC9fV1eXdXYXAico+BibpesVhs6DlhUDLmpDeuWzySfAjTs38HAwMDeOmllwyDkvDmm29iZu5vNQxHakU6nUY+n8eRjQ08//zzePPNN3HMQcjQ9zI5mR/US/b29rSFen/rt34L6XQa25FLeBXfwi7e0DYzZvOvkskkcrkcjmxsNPUskjGx4OnW1hbGxsZMh9+JQh4bGxu45557tGqTm9UbJcz7VZjWYGoVgxORcwxM1PXW19cRj8cZlGysr693/R9Cpb7w6NTUFH577R/jhct/ZttI/uu9V5D85N9DNBr1vMgqdOt0RaNRzMzMQPmjP9IemzRozBu5q/5fo8Z/tzFbs8jKzs4OPvnJTwL1OTVH46/ZBkcxhFBPqZcKn5qaQiqVwjXdtXDDblhhL3thp9A0/M6IGFp59epVHDp0CPl83vPPULcTFfL8KE0fJgxORPYYmKjrlctlzM3NMSjZEMMXu5E+DOsXHn0Oz5iGD/18HDEcaWlpCYVCwVVvk/je+kVvf1Kaw3AMV/BBB/N+AAQy7K0TvAQmAPg
X9cbZzMwMFhcX8Y9yM3gl8l8aynSL7et4sun6igVoV1ZWmhagdcvre+gV8vA7q/Lib7zxBt49soO1tTXLynu9TOmyCnluGQWnfrzOREYYmKhriYZsNpsFAAYlG9VqtesCUyQS0RrIIgyvrq42HCMa1vJmFExWV1eRSCRw6tQp5PN5y4aAoigN39tuocpjuILH8aRpb9MkLuAePNn0mtw4LW2dLlxwTt5hQ398uVxGIpHA+vo6iqX/hJ+df19TYNITvYvRaBSJRKLhZ91r8PH6vHabrF/vC9J2toV7QH9+xfC7l156ybS8+OHDh3HnQ2/hzJkzppX3el23VshzSx+cUL/ODE7U7xiYqOvoexs2NzcxOjoKVVV7sgKc37rpPCWTSZRKJQBoaiC3QlVVJBIJ7ZNyo7L0yWQShUIB8BDET+ICHseTTdtJkyDlxNl6A1kOTGJdIa+N5ladcxE6LpoErDNnziCRSGBqasp0yKQIrqlUyrBXyelr0PPynE4QZdCNhg+2uraUPNTxzTffNC0vfuzEEO564EaIEg1qcd1KpZLhz1Gv6aX5S07oKyyCwYn6HAMTdQ05KCUSCaTTaaDHFmYNUjesxRSPx7V5SuIayw1kP5w5cwYzMzPIZrPIZrOIRCINc6T095fMqOEfFLtAJBrNVscE6XEH5+Ni/TgzqqpiZmYGmUymYbiXuBdQD67FYlF+KlD/+m4DkN1rDgOna0Y5PU5mtt6WqDop5jdNxMdw4u/f1vRzKK7bwsICFhcXTQNvr1B6oEKeFwxORMABAOa1cIlCQFEULC4uIh6PY2VlBcvLy01/uJPJpDYBnMyl02lsb29jaWlJfqjj9Nd5ZmamrZ/k5nI5TE9Po1qtIpVKmTbM9byElJPyDhtuv8f7cQz342TTGjuiylwrwwHtmK1t5DbMRCIRZLNZTE9PAwAmJiaaft6NuAkNdgEuLNz2ST7u8lzr3Y+TuF+6Q2u4gtci38L84v+C+fl5zM/PW/7umJ6eRjabxerqKjKZTM8VR6hUKpiYmOi59+VWJBLRhmKafahE1GvYw0ShJXqUKpVKQ4+SUeOJPUzOhLHwg6IoyGazDfOU2hmW0um0Nl8J9WE3TrjtoXB7vFkAMfNlPIoEHm0KSwAwhGN4GI/iYTwqP+QbMeRO3tw24KenpxGPx7G8vKwVdHHySbYIQXbfr1vCktEQPDtu7heZ0VzA5/AMDkTewPT0NM6cOYPp6WnTYayozxGcmJhArVbruV4I8T76PSxB1+PEsET9hIGJQkc/2R71oThmQUmwWsCWbqpWq44DQTuIuUK1Ws31XKFW6cuEp1IpLC0tYWJiwnH5cTcNb7P5O1bcNJi/jEdxBcdwt/yAZKjeAxVG4gOSubk5JBIJLCwsIJFIaNW6nBQXENfE6FyLx5xes05zc/0F8ZxJg/luXszPzyOfz2NhYUG7HplMRiu2Y/T7Vj98a2hoqGfmNyk9XiGPiKxxSB6FhqIoSCaTmJ2dRbFYRCaTsQxJskKhgEwm42g4Vb+KRCIolUpa9aNOSSaTWFxc9HSdjYYOof4JuVxZzYgY+heLxTAzM2P4vcXQoqWlJZw5c0Z+uIHdUDDR0+KW/Tu54QqO4cu6nqNLDY8a+zqelHd1VDKZxNzcHFZWVgzPtwhTAAyLPvQC+b4+Uf/vJC44LhhyGLBY/tf5vSh+RiKRCBYWFgzP9/z8fMM1M+t5icViWiVTv34/i0Co71ETvZl2PYxezc7OIhaLYWFhQX6IiPoAAxN1XKtBSRDDAzhMwNr29jaGh4fl3W3hJKyYEcPK7BiVE4d0n2UyGdveLNFIF/OazBqEgtEQOieNUzPOmsg3e5cEJ4HJabgMmrgfUG9M290P8Xgc2WwW58+ft2ykdxOz+1oEJtRL1k/iAo4Z3NfCYQBRB9ffrmc0Ho8jl8thZWXF9ndpJBLB4uIipqensbCw0FTyX0983VZ+x9t9OAEXodCtbDaLcrmMlZUV+SEi6gM
ckkcdNzc3p62t0sqnx8ViEePj4/JuknRivpcIH2KektOJ/HpGjUojDxvM4/FSJlx1UH5cz2gOTzvow5JTD+BYy8O1WiWGe62trTn+uS8Wi0gkEj0z1MssLEHqKbqCY7ho0KsqiLDkhOidMZJOp5HL5SwrROpV6wtCJxIJbfFws2tSLBYxOjqK9fX1hkqIbtiFJejK7fstFov1ZYU8IrqBgYl853Yu0cLCguMGk5VyuWz6x5puUlXVdn6On+bn55HL5bQ1s5yEFZlZo9KMGNqkOCwTbsWo/Hg7eAlcVsOxAOCues/FD0iBSazx1A7imoyPj2NmZsb1/SAa6alUqu3XxG9GQ0uF16V/X8ExXDA5Xsxde1Xab0a+1uKaDA0NefowQ3y4kMlkkM1mkcvlTK+JvPBtMpmUDzHkJgQZ9fa2KhaLuT4vRNQ7GJjIN7FYDJVKBYVCAZVKxdEkbT9Vq1WoXbQwa6dsbm625RyJogrj4+NIpVKewoog9xjZuXvwZo+WH4FcVdWGRl47grmXuRhWDeaormFtJKhP5vVEMZdMJtPyNSkWi5iYmNCKQjhteIfFEI5Z3tffkXcApr1Mh+v/NXqOGRGakskkcrkcMpkMFhYWWhrmKK7J+vq6ZZU8URgikUhgdnbWsmcKHgOQHApbEYlEUK1WWzo3RNTdGJjIF5FIBIVCAUtLSxgdHcXCwgKy2WzbGzFhLJsdNuVyOdChi4qioFAoYHZ2tuVhlrD5FN7IyMgIFEVBfun/i9HRUct5FW6J8JXL5VoKgE64ncD+hkUPU1TXqEa9kIARJ3NEvJB7MNz2Klk5c+ZMQ8O71Z9/fc+bn41umVVYEjblHQZDMO+q/9foWCuR+npXc3NzSKVSvl8TJ71I+p4pq4Vv3YYleHyOmVgsxgp5RH2OgYk8m56e1haXFMPwRIWr5eVl7Y9gO5XLZUxNTcm7SUdVVcNPfVuln6e0srKCRCLRUlBy6+jRo3jooYdw6NAhqKoaWANHfIo+NDRk2sDzi5N1hSZxAW9YNJgPS2HJjpdP882IRnmhUMDMzEzLPRhmRMN7aWlJmx/jhgiKYmhip4YryoyuqxyYUD/GLCwbGRwcxBNPPIFareZpCJ4TbnqRxNy0lZUVFAqFpmGWXs+/X/fx+Ph4IOeIiLoHAxO5Fo/HUSgUMDc3pzVKRSNI/wdRfGJp9uliENbW1toy3KybBVH0IZ1OI5/PY3193fM8Ja8GBwehKAruuecebGxsQFVV7O7uyof5SsyjEQ30IIefmq0rJHwFV5oa1XryMLxjuGJbptqPhub09DRKpZK2xlZQAVZPLJwqikI8MvlBrVy3VU/laZv33I7himbeqFe+E8Mt9ZXyzgH4lMuwNDIygrGxMfyTf/JPXAdLL4zmN5l9yLC8vNy08G0r7D5scErhGkxEfY9lxckxRVGQzWa1tSjkoU6lUgnlchmpVErbl8vlEI/H27ruTyfLZneLUqmERCLR8qf9raynJLsfJ5vmdbyMC6jhimHRh4GBAdxzzz04evQotra2cO3atYbHzcqL+030rAHAzMxMy+fUivxJuwhSVtXW9OWpAeCDeMayPLVgHi+sKfVS4fF4vO29jMIQjuFHRz6IaORB7OzsYGtrC3t7ewCAGq7gOTyjHXvWJizp2ZXkdsMuxJkxWkPLOv7eID5Y2N3dxZUrV/Dbu7uWQTwoTtdvEr2T8Xgcf/ShD2Hyovv44/7sGhNzIRmaiPoXe5jIlenpaaRSqaawBABLS0tIJpMNnx4uLS1pw/XahdXy7FWr1ZZ6mcScFFF9rpV5SkM4hvfhY1pg0rsfJ/EwHsUAbm3Yf/ToUZw4cSMKXLp0qSksod4wbgfxCfr58+cDLwhhVrZcDgFGjuGK47DklSjfLioier0nWiHC4xtbB3Dp0iXs7e3hxIkT2u8gcb8N1SsFOg1L8Hm4opf70+w5dlHi6NGjeOCBB7TCOLsdCktwMb+pWl//LJVKYXJyEmNjYzh
y5Ih8mCm7c+IGK+QREXuYyBUxnCKVSmF6ehqnTp3ShiepqopKpYJisaj1MomG9fDwsOkniX7LZrNQVRVLS0vyQ1SXzWaxtrZmGHyt6HsPnCz+aseqZ0RvEIexh7cxMHgAY2Nj+O53v4srV66YDr3r1MKsiqIgn89rPW7tuuf15F6LT9TDktug5ObTeXFfiN8NnWpcmt1PoncF9YAr7pvH8KTrACSH1VYYrRlmxazX1KpYx/HjxxGJRLCxsaG9bydz49pB3zubyWRQLBblQzR/ceQIotEovvvd7+KVV17RegzN+PUelXoRm3aOkiCi8GEPE7mSyWQQj8e1ctHLy8vaHxQASCQSmJ6eRjabxfT0NHK5HJaXl9vacFxbWwv0U/5eoLosvx6JRLSS0Gtra77NUzJq3BoafAf3RO7F2NgYLl++bDlPqYYrHQlL6FD5cZkIi2I7iQuuw5KbQCDKUq+trXVsCJ5gdj/t7u5iY2MD165dw9jYGO655x4AwL6rWHiD24BlxSwAGanhiumxRkMFBwcH8dBDD2Fvbw/PP/986MISDKrk5fP5hhEKer+ws4Pnn38e169fx4kTJzAyMoKBgQH5MMBDhUkrrJBHRGBgIrdUVcXCwgJGR0eRSqWwsrKCmZkZKIqC6elprcGoKIo2Tl0/p6kdgihq0GvW19cdlxZPJpMolUoYGhpCIpHwJSjBoCfEzD333ANFUbA3+AYuPf8XluHbydC0dtCXH89ms/LDbeUm/AhOGpvig5KpqSnfy1J74aSn5tq1a7h06RIGBwdx4sQJlG75EfkQW34GJjgMTS/jgu19LULTxfrPjPhwYWtrq+nxsBFV8tbW1rQqh/IwbvH6t7a2cOnSJQwMDODhhx/G0aNHDY/zSzQa7eiHAEQUDhySR76oVCpIp9NYWVmRH+qISqXS8U+7wywWiyGXy2m9IUbi8ThyuZwvBR2M2A1HOnr0KEZGRpqG3z2HZwyfZ/UJvJ/koGf1fSORiDaEcWZmxvdz6JSb/jYnDc50Oo3Z2VlfhmX6xW0RhSNHjuC977yDv3/7f8b31G7MdXLCyfnxYgjHGubxifvKTW+pGBoZjTyIZ/7Bv4S6e2OVJv0aXU4CWidFIhHMz89jdnYWS0tL2lIVgphHdloaalnc2UFma8v3QCh+B4blbxsRdQYDE9lSFAXJZBLlctlwzkssFkOpVGrrPCU7+XweKysrhq+Xbtjf38eBAwfk3Q3zlFKplOW8gla8Dx+TdwH1RtCxY8dwxx13NMy7EIyqhLWDVYNcNGzNGqJimKpRA7AdrOa46NmFAdGrFFSIboXV9TFzAsDf+d7/hp/+vlewtbXl6PeXn3OY/CQ+4Dj3q/8Zf/yV/5/8cIOw9MRaUepVWSORiO38png8rq355/cculKphFQqxWF5RH2OQ/LIkpi3Eo1GG/5gKIqirb2Tz+cDL6fsVrFY5AK2NsrlcsN8AXFN9fOUrBopfhsYGNDWiNmpz1eQw1KnPIxHLRvjotiAUc8XdOsDnTp1CoWAF7s14mQ4ll1YEveGGG7oZ6O0U14FcNtfP4+XXnpJG8Y2ODgoH9bA6hx2SjqdRi6Xw3TiH9qGJVgUxwgTVVUxMzOjzW+y+rkRQ/pEpUqjIX1eKYrSE/c6EbWGgYkMJZNJVCoVRKNRw7LRqqpiaGgI58+fx+joaOh6ctbX1zmPyYaqqloDRJSDRr1wR7uHWTkpE94pVkFIZtUIrVarWqOuUChgenpaPiRQIhCJhXD1m9hvJBaLoVKpYGhoCBMTE6H7WW/FdwBs4cZwz42NDVSr1YaiELKLIQtMSr0Kqbg2R9Qflg8xNSSteRZWIgytrKzYhqEzZ85o1exKpVLLC0orioJqtRqqDwOJqDM4JI8ahHnIjVtmQ87ohnQ6jWg0ing83pHrfT9O4gcGf9xRmXChE+XCzYYOmnHyGpV6OeVyudyx8uN2xPyr6enpUM0HNBp6J4ZEWgVWIzVcwSCeaRiuODA
wgPvuuw+Dg4PY2trCzs4O4KAHrt2SySTm5uawtLSE5eVlw/NipxuG5unZzW/S82NosVg6o92Fi4gofNjDRA3UeplXuUepGxWLxY6Ude4GiqJgfHxcK0bQ7uutKAr+yac+jrGxMWxtbUG1KBOuZzZHKChuG6Bw+Bzxc1ar1TpWftyKqIxYq9VcL0ArJuTLW6vV5cSCs0bnVwwxkxc4tvMyLuBifc0p0XO0t7cHVVWxtbWFaDSK48eP44uRSGjCUiQSQS6Xw9zcHGZmZrTeYKPzYqcbepj0qtUq0uk0EokExsfHLX92VFVFqr7wrRjSZ3asmWg0is3NTXk3EfUhBiZq4qZxFGblctlx6ex+IXo2CoUCVFWFqqptn8w8Pz+PXC6HV3dVfOH5TzkefmdVjS4oQTco0+l0aMqPQ3d/LC4uIpFIIJ1Oy4dYOlvf5LB0WrffC6dzbvbwNgZxWN5tSK4WJw9X/Fc7O1i4dAlP/tIv4ZctGubtpCgKSqUSNjc3MTEx0TO/q90SYWhhYcHx/KalpSXt58xsSJ9MDEclImJgop7FBWwbzc/Po1AoYHNzE6Ojo8hkMo4bDn6Ynp7WFjxOpVJIp9OOyyZ3auiQ18Dk5nnFYrFhsVuzhl/Qkskk8vk81tfXXfcqoR6I7HqRRHByy03vyS7esA3WclgSLkpzu35nbw/pdBozMzO2C6sGTRTdmJmZcR1ke5V+flOhULAMQ6LwiujVtZoLJcRiMaytrcm7iagPcQ4T9SxFUZDP5y3XGuoHyWQSi4uLhvOUtre3MTw83HC83/RzCczmwshr0Ah25bqD5qbgg57X0uen4r+An//JJP7jma/j6tWr2v4gz4O4PpFIBAsLC4bXx47oRXLKrmKfntPeJdnX8aRh0HIS0M3Mz89r84as5s/4SfT6qara9POr1+57NWzczG/Sz8+zWkusHb8fiag7MDBRT6tUKpiYmAjlpPqgKfV1TBRFMV00VZSJDmJYnr4BY9UoCTMvE+nhsRGqzcMZGMADDzwA1Ice6ed2+d3Tpm9gtnJ93EYQNwUU2nkNnBBziETADOJnRxCFHVZWViwDADyeJycFSrqNm2IPsVhMGworr/XED9yISI9D8qinlcvlvhuWp5+ndP78ecu5DtVqNZAhRqJoADpUptwvXnp0vDRA9b0oe3t7DSWu9cOGvPa2yJR6Oerx8fGGwgFeuOlZEuyG7ul56TWBy2F8blSrVW19oHw+bzkMrBXZbBaLi4uYmZmxDUvweN95eU7YGRV7MFtiolwuI5FIIJPJIJvNNgy5jMVigYZhIuouAwA4GJp61nve8x7EYjH84R/+ofxQT0qn05ibm8P6+jpmZmZs/+Bfv34d2WwW6+vrpqHKDaVelv62225DKpXC6upqV/fuXceNktJuGu1eeoD+B/xDeRdef/11VKtV3Hvvvbj99tvx+uuvY39/H7fhCGq4or02t9LpNLLZLDKZjC8lzSelAHQBJ/FlPIqLONmwAcAxKYA6GZb3ffhB3IYj8m5bL+OC53PkhKqqWFlZQTwex5kzZ3z9GSqVSvjGN77hekHwGq7g+/CD8m5Dz+GZQM9Pp4nrU6vVkMvloCgKvvGNb+Ctt96SD4Wqqjh79ixGRkbwO7/zOxgeHsZ73vMeqKqKb3zjG/LhRNSH2MNEPa1fFrDVLzQsCio4sbq6ikQigVwu5/g5RvS9WktLS20vUx4kN3OHvIQlq54QsaDq3t4eTpw4gSNHbgQHq+eYEb1KYjHqVnqVjFzBMZzFx7RwJLuIkziLj+GKi/AJj718aOF5blSrVSwsLDQUhWilt0kUZllYWPD08yiGbNq9dyfH9AqjYg9mzpw5ow3Bm5ubC6T3nYi6E+cwUU+LRCKoVCo9O3FXjLNXVdXzhH3o5mUAQCqVcvWptpgHc/78ecvGSLezGg7XSlEGp5P1jxw5gmg0imq1isuXL7uao5NOpzE7O2s7p8OL0wB+BsfwZZNzY+RxPOm48IP
VeTfj91wvp7wWhRDzblopvCEzC9VBD8MTxVv097T42fDy8+GnSCSCbDbraH5TqVTShizbHUtEvY+BiXpeqVRCKpWyHZ7WTfQTm/0sqCAa1mbV7PSmp6eRzWYNq+/1Mrkh2mpD8H34mLzL1MDAAKJ33Y1D7+7hUO23tIIQ5+QD6+LxOP6Pj38K3/ij53HuXONRfjWcJwH8DB511XM0iQv4mIvv7zRUCp3sQRG9rQAcFYWIx+PI5XJYWVnp6g8cnATbsBSZiMfjWFxcBAyKPQiVSgWjo6MNxzq5nkTUmziHiXreD/3QD+HAgQM984dOzEE5f/48PvzhD/v6vorFItbX15HP51Gr1Qy/tlKvvvexj31MWwPFTY9UtxMBSWytzgORA5iZwwBG9vdx++vfxS1vvYXsD30XkwB+6PXXcbreW7NVP1ZRFPzKL/7vGL/97yGX+U/4iz+7rH3qL7b7cdKX178F4DgSuFV+wMKf4xi+6aLhfB07jufmiN6+TqlWq1hZWcHm5ia+8IUvYHh4GOVy2XDuTDqd1hYJXl1dlR/uGk7CEurHDeEYruIv5IfaSsxv2tzcNJzfpCgKpqencfbsWe3YAwcO4Hd+53eajiWi/sA5TNTz1tbWMDU1Je/uOvp5SolEwtWQHzfEYpCLi4sNn3hHIhFt8cy1tTVPi5tSMyc9IXcAeC+AOwEcBHDLu+/iO5cuIRKJYGxsDIODgzhbXxQ2mUzi337md7HxpbexuvKfG8qSy9z23Bi5HyfxqrzTxndcFtIQQ+zszlWnhuIZEQsSDw0NoVQqNVTrFPPJhoaGLKtYdgsnYUmQh+t1krhG8mK2RhXylpeXDY8lov7AwEQ9T1XVri78EIvFUKlUMDU1hUQi0ZaCCqqqYmJiAtFoFIVCQSsTPjQ0FEjBgH5mFQIOAxith6Vb6mHpIIAxXMHw3h4GNjawWy8/fvToUfycomBxZAT/+tf+A7a2RH+TNTeNXTNvANiUdxrQH+e20SzCkNFcMfFYWMKSIIpCpFIpZLNZ5HI5JJNJ5PN5ZDIZLCwsdH3vrNvrCBe9qu1QrVaRTqcbwu2pU6eaApPZsdPT0/JhRNSDOIeJ+sL29jZGR0e7qnES1DwlN5R6ieNIJIKZmZmuHjYUZmbzmKIAIgafbJ3GM/h+XWh4/a67cHc0ijfeeAPFje/i03s/3XC8nVZ6ZvQLph4GcHf9v7JX6z1LQljms7RLJBJBqVSCoig4c+YMFhYW5EO6kpcFcxHgwsKtSiaTyOVyKJfLWFhYMJzfJDiZC0VEvUH+O0zUk8rlctf0MimK0jT0rd1hSUxcFyWOxRC9+fl5+VDygVFYidaDh/xLWh+WDhw4gEOHDmH4tdfw0nPP4dq1a3gp8gGM3OpmRpG3XgIjogfpksGmD0vwsehENxAfPKysrGB0dBSxWAyFQgFKD5St9hKW4OM95ydR5XBiYgILCwvI5XLI5XKmQ+/E8OVMJoNcLtew8C0R9Rb5bzFRT1pbW2uYQxBWyWQShUKho0Pf5ufnkc/nsbm5qYW1YrGImZkZzM7OIpvNyk+hFslzdA7Xt4PScfqwdOutt+K2227D3t4ednd3MbS7i6tXr+KF7dsx8p734Pjx49Kzg9FPwccL8eHHzMwM0uk0VFVFIpHA0tIS8vl8386FkYdVdlq6vui3WPC7WCxidHQUm5ubtnOWxLFra2soFAqWxxJRd2Jgor5QLBZDXfhhenq6YZ6SX2uxuCFew/j4uNa40xPzmlAv1c5PUv2ln6PzttSY/AAu4J/jSXw/ruDAgQO4/fbbceDAAbz11lt45513AN0wuHfffRdXrtx4/okTJzA4OKj7SsFwG5rcHt+NFEVBoVDQirTIc2LEotFGRSG6SdiCj1uRSAT5fB7j4+OGBTjczFnSL3xbqVTYI0/UQziHifpCWBew1c9T6tTiiPrXID5dtSOGrnTqNfc6ESfuqs8JEm699VYcPHgQb7/9thaU9C4B+DIexTq
O4Tv1xW6PHz+Oa9eu4erVq/LhDVqdU+K04l4r86W6RTKZxNzcHFZWVhxVs4zH4w1rmnXTXEsvc5jCcg+IUOt0DaxYLKb1sNvNWRLDmsVixFbHElH4sYeJ+kK1Wg1VtTz9HCExT6kTf1DluVJOwhLqn6TOzMwgl8vxU9QAiXk/olcJAN58803DsCSM6D7x39nZwcbGBo4cOaKVHzfiRy+Bk7LfL+NCKBrKQcpms1hcXMTMzIyjsASD8tbd9DPlpbfQy3P8Fo/HtWqFTsIS6nNh9XOWxBpORsTQy4WFBSwuLvbMnDWifsXARH0jLIUfxDylzc1NTExMdGSekljTqZW5UqLxcOrUKWSzWY7Z99HF+n8HBgYwcOQIbrvtNly/fh1vv/22dGSzk7jQUGBhb28PL730Eqr18uNG18mvBqwo7S0q4Om3r+NJ375PGCn1tZVqtZrnNcrS6bT2M9VNDWw3IbhWX/C5k9LpNHK5HGZmZjz97hNzltbX123nLInCECsrKyiVSvxdSdSlGJiob6ytrWF8fFze3TYipIh5Sul0uu1Db0S1rlOnTvkyV0p8igqgqxp4YXcRwNGjR3HixAm8dugQ3nzzTbz77rvyYQ3eqP/3nEkD9urVq9jY2MA999yD48ePY2BgAAigAVvDlaaw1MtBCfUhqqKipNPeCjPiZ2plZUVrjIed0yF2To8LUjab1X7/tfK7D7o5S2J+UzKZlA/RLC8vY3R0lAvfEnWxfW7c+mGLxWL7pVKpaX/Qm6Io+/l8fr9SqewritL0eDs2RVH2c7ncfqVS2U8mk02P+7Elk8n9SqWyH4/Hmx7j5nwT1+qthx7a//PBwf0LwP6LwP7rNtuLwP5p3de5Hyf334ePGW4/e88n9z/20G/v/+TI403fn5vzTVyrfD4fyM92JBLZz2azXfNzNYRjhvfdw3h0fwjHmo5v56Yoyn6pVNrPZrNNj/mxKYqyXygU9guFgu210v8+tjuWGzduodmadnDj1rPb9vb2fiQSadofxNaOkOJkE0EmnU43Peb3pihK275XL276azUJ7F/QbS8C+68aBKXXgf0/APYnDb7eEI7tP4xHDRuwU7G/t1+pVPaz2WzbfiZ6aYvH422718X34rXytrXz95K4Vk5CdDwedxyyuHHj1vGtaQc3bj27tesPUzqd3i+VSm35A222TU9P71cqlf1cLmf7h9vPLRKJ7BcKhf18Ps/GncNNhOtCodBwreTQJIKTqtueNvh6TjfRg1EqlfZjsVjT49yMt3Q63ZEeY/F9p6enmx7jZrzNz8935P6en5/XQprd70ERstr9u5obN26utqYd3Lj17JZOp/fn5uaa9vu1iR6CTv7h0/dstbuRoN861ajstm1+fn6/UCiY9kJOAvtnDYLT2f9/e+8bI8l55/d9lxRXpHi8eTY0eMeNtfuM5dAicAyrcWvkAMvYqgS4HJGA0+2DcxD8oqvfHHiIg+lGkEDMi3TNGxuJX9QMEmQlx0FNIzjs6VXNCAZ0G8CuHpziWwE8V491QE6XI6qXCKiVznLXnCxxSYp88oLztLqremb6X3VXVX8/wAfYeap6dmaq/zy/+v2e33NBVmkeTdNUYRiqZrOZOkZ/oc5UrDPTM1r6xdfW5a77PUgIoRzHUYPB4MLX96j6550myKKUrtzUAKWl1TRN5ft+anxR5Xl9/LonMfoDd5oP51Wos1x5+XnypH7OrDO4HlUIoXzfX/tzOK/qTEVensujGYzksU1XP5fzkuUeDXKvqnDQWd/BYMAbGJTmy9QApaVVCKGiKEqNz+toNmedE6k8ZLYuUt+VL+rEbtI6oNfwprqFO6lzpzVvge2oeiKex59tHerXeBiGuXttjZZUXjUR3xTz/H6jM7nT3JSYJciilK7E1AClpXZZE5/R8onksVWpsxTTLDBep0KI4RqdPNzxndZkoJR01u5fejKXx8B2VD1Z8zyvUNdr2eZ58j0qm0J8ZlFKS2dd3zRtkEU
pzdTUAKWl1vf9hRZN5yGbo+96Fy0ToCcK61xbNa1XBUujQVPysZPMc1bpIvX12sQ73Pp6FeG5ipH1MtGGNoVY93qlWR29XtMEeM1mUw0Gg40Piildo6kBSkvt7u7uXHtxGIahoinbxWbpKtuEZ2ER7gJP2kvmMi8rzyv63X/9vC/qzz+ro9m1db7O53W0lKuIP/886rLEIv6+o9frqhsTo0FWUd//KS2wqQFKS61hGCoIgtT4ReYlm7OuNuFZKDPeRHJRkwHRNCa/x+jzpujXa3StTFEyLvNo23bug/lp1dnBMvwuF5n395FZNEf2ZLrq/WL0veWqIItSujRTA5SWXqVUaiyplDIXZVRl/XAcnYRfNUFYpVu4mQqGpnE0y1T0LOBFFiE7OI/6ubjO4HYLN9Ut3Bkzec6sivNucWVsCiELsr5sVnXp3bTrm6bNTlFKFzY1QGnpveoDJg/rlFDQdS+zmrd1MrOW440GTGXKKl2knoRPcye8CK574n1VgL6MwKnoZaFJm+ct3vPynrFsZ13fNFp9UIbrS2kefQqEbCC9Xg+vv/56chimaSKKIty9exeWZaHRaKDf7ydPyxzbthFFEW7fvg3LsnB4eJg8pTTs7++jVqvB8zw0m83k4aWyhZt4DW/iK3hrzFu4kzx1Zn7zN38Tvu/j0aNH2N7eXsvzZhXEcYxarYbj42MEQZD5NcsSx3EQBAFarRYcx0kezhz9fLyMW7hz5TlX0e12UalUcHZ2hjAMYZpm8pTC4DgOdnd3UavV0O12k4dLQRzHcBwHlmXh7t27iKLo0mt2dHSESqWCR48eIQxDOI4DIUTyNELIAlw7j5wI2Siq1Srq9TpqtRoAQEqJdrsN0zTRarVwdHSUfMhKkFLC8zwAWFuwti70797r9bC3t4c4jpOnLMRreBNbuJkcHuN7+Ba2cHOmAOr69ev44he/iL9h/hK8f/E/beQ1i+MYjUZj6ddsHm7hDrZwc+K1fg/v4D28g7/x176Aeu0/x927d3Hv3j383997F//fX/08eXqmTBMsjaJ/9kUxDAOu6wIFe48RQoy9N+bhubYqTNNEu90GprhmQgi0223Yto1Wq1Xqm22ErBJmmMhG0uv1YBgGpJRwXRdBEODk5ATb29trCZb0xDMIAnQ6HViWdemHYhnp9/uwLAsAEAQBpJTJU+ZmmmAJ5+ed4f3k8IW89NJL+NKXvoQ4jtH+P35vY6/ZyclJLjIXr+HNYcA0iS89/bfxOy/9V3h7p4LP/+X/A/9/aeNXf/4Yv/3q8/jtV5/HX//lzyUfkhmzBOWY4/yL6PV6sCyrUBlCKSXCMMTp6SlqtdpGBUs4zxBaloVOp4MwDOG67oUZpDiO0Wq1YFkW6vU6giBY++uSkDLAgIlsJP1+H0IIBEGAs7OztZa92baNIAiGpVzr+jnyQqvVwsHBwdI+6C+bQE/iFu5ceSf/+vXr+LVf+zV84QtfwLvvvovwxw+Sp2wUo2WVl03msuSqoPiZp6/hpRc+j899egvBP7+JH/zgB2PH//ovf26lQdNlP+tFLCtowvk10yVfQRDAMIzkKbnANE34vo+Dg4O1lE3micPDQ2xvbw9LKy8LdnVgvLe3N7wZt8ybUIRsGgyYyMah1wfhvB7ecZy1ZAaS66U2fTIwyuHh4XACvujfZdaJ6RZu4gzvX5hpevnll/HKK6/gBz/4Afr9Pv7yo/6VAdYm0Ov1UKlUAABhGK50An5RCZ7m8597CjdvPA8A+OCDD/Dk37+ED3/6K8nTAAC//epn52XJMgOfRej3+6jVatjb24Pv+2sLdi/CcRx4nodarYb9/f3k4Y1k1vVN3W4X29vbOD4+5vomQhaAARPZGKSU8H0f7XYbjUYDjuOsdFKn0eV3nueh1WpdWZO+qegJ+N27d+H7/twf8pdNpC9iCzfxPXxrLGjSWaXPf/7z+PM//3P8+Mc/xhnex/fwrbHHbjK6HKjRaMD3/UvvgC+Ty67xSy+
9hF/Zeg4ff/wxPv744+H42Y/+YwDA935k4/6fdsd8/pl/CuDXR75LudFNIXAe7F42AV8VrutiZ2dnI8uTp0EHu41GA+12+8os4f7+Pra3t4Hza7yq1yYhZYEBEyk9o+uDjo+Psb29jW63i9PT00s/YLJAd+Va53qpIhHHMSzLwunpKcIwnLmkZN47+XoC/j18C38l/gJ3/rNX8corr+DRo0fDrNL38C0GSxegJ+C63GvW6zYrk67z9evXIaXESy/ewIcffoif/3y8qcPjn9zB/T/t4k9/ZI+NA8Dz1/82gG8A+N3kodIyGuy22234vp/5dZuEXq8EAJVKhcHSFYyub7oqSziandrZ2bkyO0UI+QUMmEipaTabw/VBlUplbH1Qt9tdWcC0SW3Cs8BxHLRaLQRBANtOT3CXjc4s2baN4/B/h/i7A3z9T/8hvv2Tf4Lv4Oup7BNJs87248899xxeeeUVfPjhh4j+4s/x2T7Vv2Dw6RcRfvjVsbEkzzx9DcDv4jr+T9zCnZnXwl1Gnks49QT85ORk5ddNSjm8sdVqtZKHySUcHh6OtY6/rPRON2vRwfGqPgcJKTJsK05KiW3baLfb6Ha72Nvbu/AuZXC+B0uv10seWgo6uyWEQK1Wu/DnINOhJ1SdTmfqtU1fwVvJoSt5Sv5bNNr/BUzTZEnQEtCvg6zaj49e45dffhkvvvgi3n33XXzwwQd4/plreP76+L3Bf/nBfw8AuP7cvxsbH+VnT54FPvkCAOBv4vfxS3hveGwZAfNVTSom8R18PTmUKfq64bwZS1bvkzi/uVWv19FqtUq7v9KqkCPbZDQaDf49CVkCzDCRUqEn1O12e3gH7bLJbq/XQ71eTw4vzGgZYKfTYWnJkuj3+6hUKrh9+zaCILjwDuoivPjii/jffv+flH4D2lWi72hn1X78DO/j+vXrePXVV4drzD744IPkaQCA6OO/AwC49tQv1jMl+fjDF3DtPFgCgMf4u2PHX8ObM+2hNIlZs0zrKP/U1+3g4AC+71+atVgEZwM2o10l/X4fjUZjbH3Tsl9zhGwaDJhIKUgGKNNOdA8ODmCa5tTZimkYbROeLAMki6OzFLrr01XlJNNOTPWal7/56y/jv/wH7FqYBVm1H7/12ov40pe+hB//+Mfo9/v46KOPkqcMGXz6RQDAU09N3qj24w9fgPr0mbHSi3+PWyNffcbWjBscJ5mlYcgyMlqLcHR0hEqlgq2traUGvEII+L6P119/nTeVMkCXVx4cHCz9NUfIpsGAiRQe3Uhhnn2M9B3Uu3fvDktP5mVSm/Bllx6RX7C/vz9VN7b38M6Vk80XX/xswv2zn/0M/+z/+kecuGXIMtuPCyHgui7+12/9j/jX7/4RfvSjHyVPwU8/Hq86jz/9LPh5+pl0BuqTj5/7LFiaslB9kYAJ50HTd/D1C5+fOqi66PgqGW0K4bruQp0rwc1oV4oOeKdZ30QImQwDJpJbbNu+tAxECDHWSGHejIDuxIbzNU2T/q/L0NkttglfPfoOar1eh+u6ycNDLpp0Xr9+Ha+88gpeeOEFvPvuu/gXP/pnE88jy2UZ7cf1hPvs7Azb29v4i4/+VfKUIT/96NOxrz/3+b8a+1rz6aefbVr76ZQBE5YQNOH8+fkdfB3v4Z2hugtj3p6PugOiLq+c59oZhsHNaFeM7pCnS5rDMES1Wk2eRgi5ADZ9ILkkCALEcYyjoyM0m00IISYuvpdSpsYWwXEc7OzsTN2gwXEc1Ot1HBwccGPFNSKEGC5yvuza6TKqLdwcNgf4wQ9+gPDHD6Yu3SPLRQgxbIwy7c0G/bprtVpjrflHr28S8exTeOGXf4R/8+kO/vKDX0seBgB89MF/gE8+nfyhaOAfJ4eAGUvrysZoU4hZr92k93OyOkzThGma2N/fZ3aPkClRlOZJwzDUYDAYfi2EUFEUKc/zUudmoeM4KooiJaVMHdPatj38mS47j67WZrOpoihSpmmmjmmllBt
37W7hzpjJ43lQX7tms5k6ppVSKt/3le/7l167LdxM/c5buKn+k//w8+o/3d5VLz0fphTP/mv1Ah6pX5rgr+KP1Ffw1oUm//9Ns1qtqsFgoBzHUUKI1HGt67oqDMNLrx2llObU1ACla9U0zbGASY8ppS6dCC9THRAlP9j1hI0f+vnVMIwLJ946GLZtO3WsjL6GN1OTe+1reDN1/rqVUqogCJTv+6mJd7VaVVEUKcdxUo+b1eef+ZMxn3n6TxTwTipQ0hr4R6m/X57/jutQCKFc1514w0JKqcIwVK7rph5H82PyNUcp/YVcw0TWSrPZRBRFiKJoWAvf6/UghBjboLTb7eLw8BDtdnvk0dlxeHg43CjVMIyxLnzHx8fs6JRjer3ecCd7XeqlG3JsbW1tzMbBV+3zs4WbC7fGXja6CUuy/bjjOHBdd6G1iqP89OPfxU8/VkM//kQlTxnyq/ijsT2YkuRtjdG6GF2XNtqRjZvRFgfXdREEAaSUyUOEbDxcw0TWhl4v1Gg0YBgGPM/D3t4eHMeB53kwTRPb29vD803TRBAEuHHjxspqrg3DGK6n6nQ6rPcuGPp5BAC1Wi3TjTfzxFXB0ih5XYOjGwMIIXB0dIRWq7Xk196vA/jG2Mh1fAHX8fzw6+SGtZPIY2OGPOCc762ktwHg/krFQG/6fnR0hL29vSW/5ggpLswwkbWxu7uLvb099Hq9YUan3W5DSom9vT3g/ENXoz9wV3X3S0/Y9KLyOI754VEgbNuGaZrDIGnW7odF5aKmBxexhZsznb8qTNMcNn4xTTOD1/2fALgD4J8ORz7Cz4DzrJKBf3xlsHSG9xksXUIcx+j3+8P3dZJ/Dg8Px1qQT+qCKIRAGIbwfR+DwWBj3lsJSdXpUboKJ60lCcNQ+b6vcL6OSCmlqtWqwvmi8MFgkHmdtZRSeZ439vPpRgHLWD9Bs3X0+ul1Znpd0yZcv8vWLV1knhpB6Os3uk7QNM0L16Ut2y3cTP19Jsm1S5MVQgwbc+j3at3Q46qmEDRfCiGU53nDz+TkscFgsLJmTJSuW2aYSOYIISbWRXe73dSapEajgWq1CinlMOvkeR4GgwF2d3dhWVamWR7nfBPc09PTsU1w+/0+KpUKdnZ2lrKGgmSDbdtjmxjrdWZ6s9S7d+8uvOFm3pknW7SMvYSWgS6BffTo0dg6Qb33z927dye+lyyTaTaLzWsZ47q5aDPa/f19VCoVbG1tja1NI/lGl1M2Go3koeEaY65LI5tEKoqidJnqDnfJDklSSqWUmphl2t3dHX4thMi8I920bcL1HTfeVcuXOisRBMGl1w9Tto0vsslMyLQmv492Unvu5DnLUF8XwzBSx0adpv34sryoPXnyPPpZFjcMwyuvi84Wuq7LbFNB1dmlSRl7x3FUGIYqCIJUt0RKC25qgNKlqT9EPc9TSqnUJFW3oR0d9zxv4htxFsrzNuHJn+EqXddVQRDwAz8HNptNFYZhKvC+TN2iepbH5Mkt3JxYevca3lR38d+kxqcx+X9M+v7a1/Dm0gIHed5y+qqbFcnHXNR+nK7eWW9CCCE2rsV/mdTXLvna05/ntm0X/j2W0qTskkdWRhAE6Pf7Y+l9vXhUp/4Nw4DruqjVapl2VZJSot1uwzRN7O3tzdVmWnf5q9VqbDG+BnS74m63i729vZmvgX687sBWFK7qgPccBD7CT/EJPk4eupD38A7ewzvDr6/6PzRXla5dhW3b2N3dHXagnJVms4nd3V12YVsjruvCNM253gf1dg04L8ee9fFkdQghEMfxsOyy1WqNfW7qLraWZQ1fi47j4O7du7Asa+Q7EVJMuIaJrIxGowHbtlGtVodjcRyjUqkgjmMEQTBcp5Tl5Gd0ncsie/I4joPj4+PM11SQNHqt2d7e3twTLb0uTa+xK8K6pmkCmU/wEZ6DwNN4JnnoQkaDnmn+D828+zgJIeC6LtrtNmq12lzBEs7XxtRqtbF9f8hq0BNnAHPvS6f
33dLvo1wfml9s20YURfA8b9jZViOEgOd52N/fH/vs3traSj0v+FlJigoDJrIy+v0+Dg8P4bru2Hgcx7AsCzdu3EClUslsrxzDMBBF0fCOl+M4qTfzWXEcB3t7ewjON7gl2SKlRBRFuH379kLBrkZnNo+PjxGGYa6v4bTtwj/Cz/AJPh7bT+gyRltjz9NifNaGEXqifXZ2NtaYY150Qw8Aub+GZUFnZ5e1Ga1uCnH79m02hcgp+uYEzgOk0cCnXq9DCDHcDkSfY9v28PNcf/5qeY1J0XgaAG/pkEwQQuDJkydjY6enp2g2m3juuedQr9dx48aNhQMkKSXq9Tq+/OUv4/T0NHkYON/A9Gtf+xr29vaWvhlfr9fD6ekp7t+/j5OTEzx+/Dh5ClkQIQS+9rWvwXXdTK7hw4cPh9fw2rVrePjwYfKUtXMLd/AsXkgOT+TneIJn8CyewlOXluYlu739Cv7WzAHTFm6OlfNdhuM4cF0XjUZj4WB3lCdPnuDBgwfDa3jjxo1Ms9SbjG3bcF0Xb7/9Nr7+9a8nD8/NkydPcHx8jO9///vwPA9SSnz3u99NfYaQ9fH48WN0Oh18//vfh23b+MM//EMAwFtvvYWHDx/i+Ph4eO7XvvY1/MZv/AYajQa+/OUvIwgCfOMb34BlWTg9PUUQBOh0Okt9HyckS5hhIplg2zbCMEyVyMRxjF6vh3q9jpOTk4UnTbq8zjRN7O/vIwzDiSn/g4ODsTbhy6bb7aJWq8H3/WG7VbIcqtUqwjDE1tYWKpXK0q7hFm7iNbyJr+AtfAVv4efdL6Nu/Xdo/L1/mMqC5oFZA5kPEF8aLL2Hd1KtsWfNFk2LlBK+7w8zg1kFM7r9+Ouvv85S2QxwHGdYRqk39F42+hqenZ0hiqKxEm6SD7rd7tiGtlJKnJ2dDb82DGO4MX0cx3BdF71eb1hy2e120ev1cPfu3eFjCMk7bPpAlkq1WoXruhcuxA+CYBgoJY/Nw2AwGDaI0A0ker3esHRg1ehSlU6nw3r8BZEjjTksy1rK80Vz2Tqdp59+Gn//v72Lv/PV/2iuhexZcAt35g5mvoOvpx57UUboK3grOTQV38HFmQb9nrDq14RuCHFwcDD3GinyGXqNCs7Xoq4qK6CbAIFNIXKNbdvDQBoAfN9Ht9tFo9EYNoMYLb8VQiCKosybOxGyTJhhIktBnnc70h3uLvpwW9baIZy/6erOPRhZC1WtVteW5dGLmOv1+konh2Xjog1ol8FlwRIAfPLJJ/iD//lf4v7/8HCYvVw383ai04GR7oKX7IaXNboET7/uV8n+/j4sy8LOzk5hmnrkEb3mLLkZ7Sro9XpjTSFGsxokPxweHuLg4ABBECAMQxwdHQ274bbbbezv74+9h+vruGg5PiGrhAETWRg9uT05OcH29nYmb4J6AaluYYvzAKnf74+VbOjGEu12e+TRq6V/3n1tZ2cnl6VdeUYH3u12O5NJ9lXB0ij/6tunsGv/NTzPW/tEbd6AaVaWFUzp5hy6jHKZAe8s6BsYuqlHHoLfImEYBoIgwMHBwdJfi7Ogm0LcvXuX1zGn7O/v48aNG7h27dqwEYgQAqZpjq1tEkIMM7+rDL4JWRQGTGRhjo6OMl0fpO9w6nrnYKT97N7eHur1+thahb29PUgp1/qhqrNdupSFd7evxrZt+L6fSVZJM22wpPl3vc8NsxTrvo7zBE2zPmbW8zEhyGo2m/B9H3t7e2i1WrmYFOkOX67rsv34lDiOA9/3YVlWLkoa4zhGrVZDq9ViG/mCMfoe4Hke4jjOxXOKkFlgwEQWJusJ0e7uLnq9HhqNBlqtFlqtFtrtNgzDGC48Hs0o9ft9dLvdtQZMGGlZHZ/vMcUP98nodV96E+Cs7mQn1/FMwxZuDrMU+jquq5FAMjC5itF24dMy62PO8P7w59LZQX0ds7qBMi+6vAvn7cfX/f6QZ1zXxc7OztLXDi4D3RQC59eRTSHyiw6MPM9DtVqF53n
DTY6znjcQsmwYMJHcM7pOCed3iw8PD+H7PuI4xt7eHmzbHq5bEkLAMIzcLCZttVrDkqB1TbbzinO+AW2n08lNg4UkOtBqtVrDOv11TLZHg5NpmOXcUb6Hb00VNI22JNcLu/Vm0Hm8jjifwLVaLTQaDXiel1lwXlR0Nh8LbEa7CvR1rNVq2N3dXeuNDHI5+n1T79WU5V6LhGQJ92EiucIwDDz77LNjAdKNGzdg2zYODg6GY6enp3AcB48ePcLh4SGuXbsG13VhGAYcx8Ef/MEf4Bvf+Mbw/HXT7XZx7do13Lt3D8fHxxt/d01PzHQWbhXB7TybsiKRden1ejg5OVnbXj/657js9zjD+3gHv48P8ZPkoan5Eb6PM7yPX8HfSh4CzoOx/xcBMNJq2rKszFpNL5t+v49Op4Pf+Z3fQbPZxMnJCV+T55neb37zm3j77beTh3OJ3hfoxo0buHfvXm73UNt0er0evvnNb+Kb3/zmxr/OSLFRlObFMAyV53ljY1JKFUWRajabY+Oe56kgCIZfm6ap6vW6klKmvm9etG1bRVGkDMNIHdsUHcdRURQp27ZTx7L0Fu6or+CtmU1+HwBKCKGCIFC+7yshROr4Kkz+Pq/hTbWFm6nzFnULN9Ut3Bmqx/Xr0vO8XL/mrlK/JpPvL5ukbdsqDENVrVZTx4qilFIFQaCCIFCmaaaOU0rpgqYGKF2LjuOowWCglFKpCViz2VSDwWBsvF6vqyiKUt8n7xqGoaIo2rgPdf17r2uCvYWbqWBoGpPfZ1Qd/K3j91mneoJdliBjdLK9adeybM9h0zRVFEXKdd213cyglJbS1ACla1EHEJ7nTQyEoihSYRgOPwRd1x3LMBVJfXd+1VmWdSiEUK7r5iJIfA1vpgKiyxzNqFxktVrdyGtZlgn2qM1mU0VRVOhMy7QKIZTv+2vNkmZlnt5zKKWlMTVA6VqVUiqlVGoCKqVUYRiqwWCgoihSvu8XetKmgybHcVLHyqIud8rL7zhLluk1vJl6/EXqa+m6bupYWdyE5yvOM6FhGJY6Q7Ep19I0TRWGYeE/KyiluTA1QOnavaxMxDCM0tw11EFg2SYuUsphpnDSNVyn0wRNswRLWiHEcF1d2Sba+vVYltfdVZY5Q6FLY8tSTjmNOnO4Sb8zpXTppgYozYX6jr2UMpVtKpO6gUBZshN5yypdZLJpgnaaMrzL1JOzMjT2kOdre9a17mzdjq6HSR4ropfdiCq7+rkcBEEpXpuU0pWbGqB0pVar1YlrBmzbVkopFW3I+hDP8wq9niDPWaVVqyfaRb6jXZTAN2t15jAMw0I/r13XLfzvsAyr1aoaDAalLrmklGZiaoDSlXjZBFtPODftzrbjOGONLYoiS17S6nLLImYnytzYYV6L2n68yM/DrCxzySWlNDNTA5RmqhDiyr14pJQbO1krUtmMLMlePFmpJ2ZFubOvryfvvk9WFqz9uL6em54lvMjRphB8vlNKrzA1QGlm6ru0em1S8jj9TJ2xyfPf6Kqgl/5CfT3zfDdb/4y8nldbhPbj+r02zz9jXtTvZUXLHlJKV2pqgNLMZCZievWEJ28LlJlVms+8dieT56Wx02RNbuHOxL2sFm2UUUTlSKlb3rITRcpS58XR7GHe3nMppbkwNUApzYl6kp2Xu/7MKi2mnpR5npeLSbZeKzhNydakQCnpFm6mHld287QWRoxsRstgaT5t21aDwUA5jpOL1yilNB8+BUJIbun1erAsC+12G81mM3l4ZZimiSiKsLW1BcuycHh4mDyFTEG/34dlWYjjGEEQQEqZPGVlOI4Dz/NgWRYcx0keHuM1vIkt3EwOp5j2vDLRarXQaDTgeR5c100eXhlSSoRhiNPTU9RqNfT7/eQpZAoODw+xvb2Nra0thGEI0zSTpxBCNhAGTITkHD3J3t3dvXJiu2yklPA8D57nodFooNVqcSK2BFqtFjqdDoIgWPmETEqJKIpw+/ZtWJZ15fW8hTszBUG
3cCc5VHq63S4qlQoAIAzDlQfChmEgCAIcHBys/D2ijMRxPBYIe54HIUTyNELIBvE0AL67EpJz4jjG8fEx2u02Xn75ZXS73eQpS8e2bXieh5OTE96xzoCHDx/i5OQE9+/fx40bN1Z6TQ8ODvD2228jjuPkKSlu4Q6exQvJ4Qt5Fi/gDO/jQ/wkeajUPHnyBA8ePMAPf/hD3L9/H9euXcPDhw+Tpy2dZrMJ13VRq9VwdHSUPEwWoN/vo9Pp4NVXX8W9e/dWdk0JIfnj2nltHiGkAAgh4Hke4jhGo9FIHl4KUkq0222YpjlVBoIshhACvu8Pr+k0QcysCCHguu7M13QLN/Ea3kwOX8l7eAfv4Z3k8MagM7MA0Gg0pv57z4rjOKjX6zNdUzIfq7qmhJB8wpI8QgpEHMeo1WoAgCAIll4mYts2giDAo0ePsL29zUnBCojjGJZl4fT0NJNyLr22ZZ5rOksp3ijzPq4s6DLa4+NjBEGAarWaPGUh9DXd2tqa+ZqS+UheU8dxlv7+SwjJL8wwEVJQHMfBzs7OUsrlePc0H1SrVbiui729vaU01tAZiEajMVfJ3y3cmWtN0qZnmEaRUsL3fXS7Xezt7S2cQZRSIggCdDodrldaE6MZ23lfW4SQYsEMEyEFxXGc4d3ORbISjuPA9310Oh2W9qyZo6OjYVfERTqu6Um1buww74SOQc/i9Pv9sYYQizT5qFarCIIArVaLwdIa0eWzo90RmW0ipNwwYCKkwDiOg729vbmCptFuabVabSkZDbI4eoIthJir7FKXVZ6cnKwtW3iG95NDG8+i7ccdx4HrurAsi80dcoLujnh2doYoimAYRvIUQkhJYEkeISXANE14nodarYZer5c8nEKXai2r9Itkg75O015X13VRrVaXmimctfHDGd7H9/Ct5DA5RwgxbKoyTTmtLv8SQrCtf44RQixcbkkIyS9sK05ICej3+zg5OYHnefjwww8vnFzrxeK6pGTeUi2yGrrdLk5PT69sU62v65/92Z/hq1/9Kh4/fpw8ZW4+xE+whZtTtRZnsHQ1s7Qfl1Lij//4j3FycoLf+73f44Q8xzx58iQ5RAgpGYpSWg6llCqKItVsNsfGhRDKdV0VRZGybTv1OJpvpZQqDEPlum7qWLPZXMl1vYU76it460Jfw5upx9DLlVKqIAhUEARKSjl2zDCMia9lSimlazE1QCktsDpochxHAVC2basoipTrukoIkTqfFkMd9IZhqKSUSkqpPM+bONnOyi3cTAVOr+FNtYWbqXPp9Oqgt1qtjn1tGEbqXEoppauXa5gIKSG6YUAcx5BSLnVNC1kvzWYTu7u7AMDW0iVCtx/na5YQQvIHu+QRUkKq1SqEEBBCoNvtcu1DiRjtmsfrWl5m7XpJ8k2z2WT7cUIKDAMmQkqE3oC23W7DsixUKhXEcTxXe2qSL0bbwFuWBcuyUK/X4Xker23B0ftmHR8fw7KshdqPk3yiu5FOsxeXYRgMmAnJIak6PUpp8Ww2myoMw+HapVEdx1FRFK1srQtdrnod2qQGAKPrmpLHaP6tVqtj65e0yTVrycfRYmoYhgqCQPm+P3FNqRBCRVGkBoOBchxn4jmU0rWYGqCUFkjdQe2qxf8MmoqnEEJ5nnflddNNAkzTTB2j+XWa16QOqCYFy7S46tds8rqOPid0Z1O+rinNhakBSmlB1B+u07aU1pkKdt/Kv8luh1ep21BPez5dnzoQvuomh1Ze0n6cFlcp5dh7t2EYSik1lm00TfPCbBSldKWmBiilOVdPpj3Pm3kCZZom71rmXB0Iz3qNhBCXlvvQ9TtrIDzqqvbcoutRl+vxphaluTQ1QCnNsbNmlSapsxGLfA+6fHUmYZ5AeNRpSr3o6l3GZrTyvATX8zwGxSW12WyqwWCQev06jqPCMFS+76eOUUozNzVAKc2hOjO06GRau8idbrp8dbnksq6H/n4MivPhsjej5fqW8jjpOREEgXJdd+zr6Lw
5yO7u7sSAilKaqakBSmmOlFIOF/4ve3Kk71Yva5JO51NPfpc9AdJB8ejEi67erDJ++iaK67rMNhXYwWAwFjQJIYZd8nD+/EkGSL7v83VN6QrlPkyE5BjbthEEAR49eoTt7W10u93kKQvR7/dhWRZ2dna458sa0HsrAUClUkG/30+eshD9fh+VSmW4zw/3a1otUkqEYYitra1Mrm+320WlUgEABEEAwzCSp5AC0Gq14Ps+dnd3YZomPM8DAOzv70NKid3dXbRarbHnD1/LhKyeVBRFKc2HzWZz6XelL9LzPK6LWKGrXsCvsxyTyn/o8l11ySvbjxdbKaXyfV9FUTTWDVF3U0yeq5RaesUBpfRSUwOU0g1Vb5TJoCk7dYnlOlpE6xIuTqqz9aLNaLNWsv146RwMBqnnkb7GyXMppZmaGqCUZqRhGLmfyGS13oL+ImBZVdZhknrdGtc/ZGMeXj+rzl7S7FRKjWWFm81maoxSuhJTA5TSJas3qhwMBmowGKggCHKdxdETrnVO+spmHibSWiHEMJuYh5+nDM66GW3W6sCYe3IVW10qLaUcBkvMEFO6FlMDlNIl63ne8I6+PF/bEIZh6rw8qdtS807mYurrncf1YTow5lqIxdTXeJ2Zw4tk+/FiK4QYdslj1pDStZoaoJQu2eR6hjxPsEbVG20ma+jpdBahNIrrmhZzGZvRZi3bj1NK6cKmBiilC9psNpXv+8OvgyBQnueNnWPbtlJK5aJ85zJ1cJfnSX/elCN7Z+X9+mKkfCuPWbA8u+zNaLNUl2EW5eellNKcmRqglM7paPnV6ET5ouBI3/VNfp+8WZSMWB4s8t+K65qmN09r0maxWq2qwWCQ64wYpZTm0NQApXQOm82mGgwGF2ZiovP9NUbHdnd3U2N5VWchihgIrEo9iS7yehGua7pc/ToocnmbZPtxSimd1dQApXQOdZMEPQGRUqpqtTqceMrzzQZHAw7HccZK9/KuEGJieeGmqyegycxiUdXrchgcj1vk7OEkdXDMbBOllF5paoBSOqdBECjf95XjOEoppaIoUkqpYdmdbdvDtuJ6jUsR1xPo9slFvcO+THWgXLZJpw6O2Zb6M9e1GW3W6owZrzOllF5qaoBSOqdSymFApDMNeu+M0UyT4zjKcZxCZyMcx9no9S6ji+jL/DfYhN/xKou6XmkW2X6cUkovNTVAKV3ASROOIAhKU8Yz6iZMJCepS7OK0LBjGeos2kXr88pq3jajzVq2H6eU0gtNDVBKl2wURaper6fGy6BeB7EJE0qMBImbFjxsWpBYtvVK08r245RSOtHUAKV0Rm3bnjixklIq3/cL0wlvXnUGoswTrNFruSnBYVIhxPBvUOYMRBE2o81ath+nlNIxUwOU0imVF2xQKoRQYRiqwWAwMZAqo3qSWcbMi17wvynX8ip1lq2MAXKRNqPNWsn245RSqk0NUEqn0LbtS/cl2sQJhi5jKtNd6U1dp3WVer0Lr3X5ZftxSilND1BKL1FnlXjXdbJlWfuhfw/P80pdfraIcmQT1+SxIqkzKWx2cLH6WrP9OKV0Q00NUEovsdlslrLsbJnqksSiBk36jjqv89XqTnJFbTFflgB/VbL9OKV0Q00NUErpwuoGAZ7npY7lVZ09ZFnW7Oogs0gT6bJuRpu1bD9OKd1AUwOUUro0dfYh7xMr3bSCmYb5LdK6Jq5XWky2H6eUbpipAUopXaqO4+S6ZEtPnouUHcmreq1LXtd+6Yl+np+PRZLtxymlm+BTIISQjHEcB8fHxwiCAFLK5OG1IaVEGIa4ffs2LMtCt9tNnkJmpN/vo1KpII7j3F7vs7MzVCoV9Pv95ClkRo6OjlCpVLCzs5O7600IIcskFUVRSmkW5mmDW/2z8M54duZpXRM3o81eth+nlJbY1ACllGamXueyrkn06NoLlmRlbx7WhnEz2tWpW7Sz/TiltGSmBiilNFP1JHrVbbt1C+mi7xtUNIUQa5tEs7nDeuS
6QEppyUwNUEpp5q56/xs9gVt1kEZ/4SozezrTkdfmE5sg249TSktkaoBSSlfiKoImKaXyfV8FQbCSiTq9XL12LMvAdRXPKzqdbD9OKS2JqQFKKV2ZQggVhmEmZXJ6Y1JOnPNllqWR3Iw2n5qmyfbjlNIimxqglNKVKoRQnucttXyKa1fyrRBimPnjNd8M9Vo2ZnsppUWT+zARQtZOHMdoNBrDvXuEEMlTpkZKiSiKcPv2be61k2PiOEatVsPJyQnCMIRhGMlTpkYIAdd1sbOzA8uyeM1zShzHsCxruCdbs9lMnkIIIbklFUVRSum6XCRLoNtHZ7k+hi5f3RxgnnItrlcqpropRxZlmZRSmoGpAUopXauzBk1SSuV53kyPoflSSjnzWjZuRlt8+XqllBbE1ACllK5d3U3tqs5aedgYlS5HvZYtDMMrJ9LcjJZSSukKTQ1QSmku1KVaF21+yc0xy6kOhi66rrNmICmllNJFfBqAk1zURAgheaDf7+Pk5AT379/H2dkZer0ecN7YIQgCPHnyBI1GYzhOysHDhw9xenoKz/Nw7do1PHz4EDi/7r7vQwiBN954A48fP04+lBBCCMmEVBRFKaV5cnRhvy7V47qV8qvXNXmex9JLSimla/Pa+T8IISTXGIaBIAgAgO3CN4wgCGCaJvb29uA4LIoghBCyWrgPEyEk9+hSrMPDQ/T7fbTb7eQppKQ4jgMpJfb29lCv12GaZvIUQgghJHNSaSdKKc2LeoH/6N5Knucp3/eVECJ1Pi2HQgjluu5YxzyW5VFKKV2HzDARQnKJzirdvXsXlmXh8PBweKzRaOD09BRBEEBKOfY4UnyklAjDEEiUX/Z6PVQqFdy9e3fY/IEQ3QQmiiI4jsPnBSFk6TBgIoTkjmq1iiAIcHp6CsuyJq5XchwHx8fHDJpKhl6r1ul00Gq1kocRx/HwORGGIa/9hmMYBsIwRL/fR6PRwNbW1nCtIyGELJNU2olSStflrHvs6D17pj2f5tdZN6PVHRNHyzXp5iiEUFEUKc/zxsYHg8HUzyFKKZ3S1ACllK5c3Trc87yZ1ybpiTMnScV11kBZq583ruumjtFy6/u+iqJo7P1CCKGUUjM/jyil9ApTA5RSulJ1ZmGRTIFuCLDI96CrV0qpgiBYqImHEEL5vq+CIJj7e9BiaZqmUkop0zTHxl3XVUEQpM6nlNIFTQ1QSulKlFIqz/PmyixMUmcbuKltMRzdkDh5bB51loqZxvLrOE4qMNJBVLVaTZ1PKaULmhqglNLMNU1zqZNl7bIn4TQbq9VqJhlB/bxi0FxuXddVvu8Pv2ZpJqU0Y1MDlFKaqToTkCynWZZSShWGIYOmnDrveqVp5eS5/Eop1WAwUI7jDIPvMAxZkkkpzcrUAKWUZuJoY4esJstaIYQKgiDVQYuuz0mb0WalEEJ5nreS/4uuR73+jTdHKKUrMDVAKaWZaBjGykulPM9jM4AcuK6sj24oklU2k1JK6UaYGqCU0lLpOA4zDWtUdzBcVxaA65oopZQuaGqAUkpLZ9brZuhkZ92MNiv1urZ59vmilFK68aYGKKW0lOZl8r4p5jFIXdUaKkoppeXxaQAOCCFkA3j48CEePXoEz/NwcnKCx48fJ08hS0BKCd/3IYTAG2+8kau/84MHD/Dcc8/h3r17OD09Rb/fT55CCCGEpEhFUZRSWmYNw1CDwWDpewDR4uyDte51VZRSSgtlaoBSSkuvntizEcDy1PvhFOVvqttS+77PdU2UUkovMzVAKaUbYVGyIUUwj+uVptV13cL+7JRSSldiaoBSSjdGIYQKw3Dl+wOVxVVuRpultm2rKIpYpkkppTTlU8kFTYQQonEcB2EYIggCGIaRPFwK4jiGZVkQQsDzPAghkqeQC5BSIgxDAEClUil0A4XDw0NYloV2uw3XdZOHCSGEbDAMmAghEwmCAK+//jparRZ6vR7CMES1Wk2eVgriOEaj0UAcxwiCgEHTFBiGgSA
I0Ol00Gq1kocLSb/fR6VSgZSSzwNCCCFjpNJOlNLNVkqplFJjC+Edx1GDwaDQZVfTWOS1OKtS72dlmmbqWFnUzwPu2UUppZQBE6U0pRBCKaVSQUMYhiqKotT5ZZNB08Vu0t/GNM1Cdf2jlFKamakBSilVQRAoz/PGxnTmqVqtps4vm7oJADMMn7mpLbh1J0U2BaGU0o02NUAppcPgKNk1zPO8jZk86gxDmUvPpnHT268LIZTneYXvBEgppXRuUwOUUqpwwbol13VTmacyaxjGRrebLtpmtFm6CWu3KKWUTjQ1QCmlQz3PG2YXNrVMbVMzLJu0Xmlaua6JUko30tQApZSO2Ww2VRAEKgiCjZ08b1LQVJbNaLNSSqnCMFSe523Uei5KKd1gUwOUUkonKIRQYRiWeg0XmxxMJ4NKSindHLlxLSGETEkcx7AsC0IIeJ5Xuo1Ny7gZbVbEcYxWq4VOp4MwDGGaZvIUUnCEEKjX6zAMY2y8Wq2WdhNvQsjFpKIoSimll6uzC2UpyWJDg/nVjUE2oVxz09RrOHUW0TTNjdlagVI6ZmqAUroCpZQTJ9sXjdP8WZamCGX5Pdap3NB9qsquLlHVpZcsV6V0Y00NUEpXYBRFqfbc1WpVKaU44SqQOjNTxGCDk/zl67puYZ8PdLLyfE+6wWCggiBIHaeUboSpAUrpCvQ8Tw0Gg7GJqu5ElzyX5lvbttVgMChUu3V9t5xlZMtXt9/f1L27ymgQBEopper1euoYpXQjTA1QSlegvmupJ1W6Np5rSIppkTa45Wa02cvyrfLYbDaVUkr5vs/sIaWba2qAUroi9YJinJfyMLtUbHXQlOdAhOuVVqcQYpg1ZsljMTUMQw0GA9VsNpUQQkVRxPdpSjfT1ACldEWOdlwaDAZqd3c3dQ4tlnktdeO+QetTB6lFKtmkn71moihSvu8Px3RlQN5e35TSzE0NUEpXaBRFwwXFvAtdDqWUKgzD3EyqWB62flkGWTyFEBNvLkgpJ45TSkttaoBSukJt2+YdyxKqy7GSnRBXLfcIyo8MXCmltJg+NbKBLSFkDcRxDADY399PHiIFJo5jWJYFAAiCAEKI5CmZ02w24fs+Go0GHMdJHiYrpt/vo1KpQAiBMAwhpUyeQgghJKekoihK6epkK/Hy6zjOytcOsblDvtX7d7ErJqWUFsLUAKV0RbKV+Oa4qgBGcjPawmiaJtc1UUppMUwNUEpXpGEYqlqtpsZpOdVZhayCprx26KMXqxuEeJ7HAJdSSvNraoBSSmlG2radSYtpdmErrmz5TimluTc1QCmlNEN15zrbtlPH5nFV5X40W5vNphoMBizRpZTS/JkaoJRSmrG6fG6RjBAzE+WTbeAppTR/sq04IYSsgX6/D8uysLu7O1fLbyklwjAEAFQqFfT7/eQppID0ej1YloXbt2+vpRU9IYSQNNfOIydCCCFrQEoJ3/fR7XbRarWShydiGAZ830en05kr2CKEEELI9DBgIoSQNSOEgO/76Pf7aDQaycNjNJtN7O7uotVq4ejoKHmYEEIIIUuGJXmEELJm4jiGZVkAgDAMLyzFchwHu7u7sCyLwRIhhBCyIhgwEUJITmg0Gjg+PkYYhpBSDsd12d7rr7/O9UqEEELIimHARAghOcJxHHQ6HQRBACklpJQIggCnp6eo1WqI4zj5EEIIIYRkCNcwEUJIDrFtG+12G0II7O3tYX9/P3kKIYQQQlYAM0yEEJJDdEleHMfo9XrJw4RcSrVahWEYyWFCCCFz8DQA9qQlhJAc4boufuu3fgtvvPEGjo+P8e1vfxs//OEPGTiRqdBt56vVKs7Ozvi8IYSQBWGGiRBCcoKUElEUASOb0fZ6PVQqFbTbbe65RK5Et6jvdrvDjZFd102eRgghZEYUpZTS9WoYhoqiSDmOkzoGQEkpLz1OKQDl+74aDAZKCKEAKCGEcl1X+b6fOpdSSunUpgYopZSu0GazqaIoUtVqNXVsVCGECsN
Qua6bOkYpzgPrIAiU53nDoAnnz53kuZRSSqeTXfIIIWSNOI6Der0Oy7Km2l9JCDEssWq1WmwzTiYShiG63S5ardbYuN4Umc8bQgiZHq5hIoSQNTDvZrRxHKPRaCCOYwRBMJwAk83FcRyYpjk2tr+/D9u2h19LKeF5HgaDAQaDAVzX5XOHEEKmhAETIYSsmGVsRttqtXB8fIwwDIctyMnm4nneWABk2/awO55uBGGaJra3t3Hjxg1IKdkMghBCZiBVp0cppTQbq9WqGgwGqtlspo7No+M4KooiJaVMHaObo+u6KgxD5TiO8jxPDQYDZRiGAqCCIEg9R6rVqgrDMPV9KKWUTjQ1QCmlNAN1cKMnssvStu1Mvi8tlqZpKs/zlOu6w+Co2WyOBU9az/PYOY9SSqc3NUAppXTJuq6busu/TE3TnKrTHt0sB4NBqhW9aZpKKaVM00ydTymldKKpAUoppUtS75+0ilbgei8n27ZTx+jmKaVUSqmxluL6+cjsEqWUzmRqgFJK6RK8ajPaLOQGt3TU0VJN/dwIw5D7MlFK6QxyHyZCCMmIZrOJfr+Po6Oj5KFM0V34Op0OHMdJHiYbhGma8DwPvV4PhmGg3++j0WhM3caeEEIIwICJEEJKiBACQRCg1+uh0WgkD5MNQgiBnZ0dnJ2drTx4J4SQMsCAiRBCSooQAq7rQko5935PhBBCyKbDjWsJIaSkxHGMRqOBXq+HIAi4wS0hhBAyBwyYCCGk5LRaLRwfHzNoIoQQQubgaQBcEUwIISWn2+3i2rVruHfvHo6Pj1meRwghhEwJAyZCCNkQHj58iLOzM/i+jwcPHuDx48fJUwghhBCSgAETIYRsEL1eDw8ePIDv+zg7O0Ov10ueQgghhJARGDARQsgEhBB49tln8eTJk+ShwvP48WMcHx/D8zxcu3YNDx8+TJ5CCCGEkHPY9IEQQhK4rovBYIDBYADf9yGESJ5SePr9PizLwu7uLje3JYQQQi6BARMhhIzgOA6klLhx4wYqlQoMw0AYhqXsLqeDpp2dHQZNhBBCyAVw41pCCBkhiiK0Wi0cHR0BAKSUCIJgGFyUESEEfN9Hv99Ho9FIHiaEEEI2GmaYCCEbj2EYw3/3+33s7OyMfW1ZFgzDKG0WJo7jYTAYBEEpSxAJIYSQeWHARAjZaBzHged5w687nQ5s204FUQcHB6jX68OxMtJoNHBycsINbgkhhJAEilJKN00hhPI8T3mep4QQY8dc11VRFCkp5dj5SqmxsbLqOE7q96eUUko32NQApZSWWimliqJIOY6TOobz4CgMw7GgwTRNNRgMUsFVWW02myqKImUYRuoYpZRSukmyJI8QspGMrtORUsLzPARBMFynVKvV0O/3EYYhfN+H53loNBqI43jku5SX/f197O3twfd9mKaZPEwIIYRsFKkoilJKy261WlVKqWH5ned5w1I0z/OG55mmqer1+saWpxmGoQaDgbJtO3WMUkop3RBTA5RSuhE6jqMGg8FY2ZlhGEopxVK0Ea8qYaSUUkrLLPdhIoRsNFJK9Pv9sbHBYADLstDr9cbGNxm9H1Wn0ylte3VCCCFkElzDRAjZaJLBkl6vkxzfdPR+VDs7O3BdN3mYEEIIKS0MmAgh5DxQ8jwPnufBsqyNae4wCzpo0k0yuMEtIYSQTYABEyGEAMNNaVmKdzlxHKNWqwEAgiBg0EQIIaT0cA0TIYSQuXAcB/V6HZZlsYSREEJIaWGGiRBCyFw4joNOp4MgCCClTB4mhBBCSsHTANjuiBBCyFx0u12cnZ3B8zycnJzg8ePHyVMIIYSQQsOAiRBCyEL0ej2cnp7i/v37OD09ZXkeIYSQUsGAiRBCyML0+32cnJzg/v37ODs7Y+MMQgghpYFNHwghhCwNbnBLCCGkbLDpAyGEkKWh92qq1+sMmAghhJQCZpgIIYQsHSEEgiBAr9dDo9FIHiaEEEIKAzNMhBBClk4cx7AsCwDg+z43uCWEEFJ
YGDARQgjJhDiO0Wg00O/3EQQBgyZCCCGFhAETIYSQTGm1Wjg+PkYYhtzglhBCSOFgwEQIISRzHMdBp9NBEAQMmgghhBQK7sNECCFkJXS7XZydncH3fTx48ACPHz9OnkIIIYTkDgZMhBBCVkav18N3v/tdbnBLCCGkMDBgIoQQslL6/T5OTk5w7949XLt2DQ8fPkyeQgghhOQG7sNECCFkLUgpEQQBOp0ON7klhBCSW9j0gRBCyFro9/uwLAs7OzsMmAghhOQWZpgIIYQQQggh5AKYYSKEEEIIIYSQC2DARAghhBBCCCEXwICJEEIIIYQQQi7g/wc7XjWUpCoIjwAAAABJRU5ErkJggg==" + } + }, + "cell_type": "markdown", + "id": "163a82aa", + "metadata": {}, + "source": [ + "![image.png](attachment:image.png)" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "c19187ba-1ac9-400e-ae9b-684682349e8b", + "metadata": {}, + "source": [ + "## 🔍 Query Chroma" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f8d1524-c822-4303-b1a0-a3440cc90f82", + "metadata": {}, + "outputs": [], + "source": [ + "similarity_threshold = 0.5\n", + "\n", + "class MyVectorStoreRetriever(VectorStoreRetriever):\n", + " def _get_relevant_documents(\n", + " self, query: str, *, run_manager: CallbackManagerForRetrieverRun\n", + " ) -> List[Document]:\n", + " docs_and_similarities = (\n", + " self.vectorstore.similarity_search_with_relevance_scores(\n", + " query, **self.search_kwargs\n", + " )\n", + " )\n", + "\n", + " # Make the score part of the document metadata\n", + " for doc, similarity in docs_and_similarities:\n", + " doc.metadata[\"score\"] = similarity\n", + "\n", + " docs = [doc for doc, sim in docs_and_similarities if sim >= self.search_kwargs.get(\"score_threshold\", 0)]\n", + " return docs\n", + "\n", + "retriever = MyVectorStoreRetriever(\n", + " vectorstore=db,\n", + " search_type=\"similarity_score_threshold\",\n", + " search_kwargs={\"score_threshold\": similarity_threshold, \"k\": 20},\n", + ")\n", + "\n", + "\n", + "# Add metadata to the context sentto the LLM\n", + "def inject_metadata(doc: Document) -> Document:\n", + " doc_type = doc.metadata.get(\"doc_type\", \"Unknown\")\n", + " file_name = doc.metadata.get(\"file_name\", \"Unknown\")\n", + " 
content = f\"[SOURCE: {doc_type} - {file_name}]\\n{doc.page_content}\"\n", + " return Document(page_content=content, metadata=doc.metadata)\n", + "\n", + "class MetadataInjectingRetriever(BaseRetriever):\n", + " base_retriever: BaseRetriever = Field()\n", + "\n", + " def _get_relevant_documents(self, query: str):\n", + " docs = self.base_retriever.get_relevant_documents(query)\n", + " return [inject_metadata(doc) for doc in docs]\n", + "\n", + "retriever = MetadataInjectingRetriever(base_retriever=retriever)" + ] + }, + { + "cell_type": "markdown", + "id": "7446b2e0-23ca-4ad5-935d-1944f29b53cf", + "metadata": {}, + "source": [ + "## 🗣️ LLM and answers" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ab9093f-a6be-4ade-98f0-6911f47cb091", + "metadata": {}, + "outputs": [], + "source": [ + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4830b80-5d43-4d23-9ac6-410fc110b74b", + "metadata": {}, + "outputs": [], + "source": [ + "# Define your question\n", + "question = \"Who are the top 3 earners in 2023 with base, bonus, and total. Include names.\"\n", + "\n", + "# Define the system prompt\n", + "system_prompt = \"\"\"\n", + "You are an assistant that answers questions about the company Insurellm.\n", + "\n", + "Use the following chat history and retrieved documents to answer.\n", + "\n", + "Always base your answers strictly on the retrieved documents. If documents contain partial info, respond with what’s available. If there is no info, say so.\n", + "\n", + "Do not invent names, roles, or facts.\n", + "\n", + "You can use the document source information shown in the format [SOURCE: doc_type - file_name] if it helps you answer the question accurately.\n", + "\n", + "Always extract exact numbers (like number of employees, years, revenue, etc.) 
from the documents if they are mentioned.\n", + "\n", + "\n", + "Chat History:\n", + "{chat_history}\n", + "\n", + "Documents:\n", + "{context}\n", + "\n", + "Question:\n", + "{question}\n", + "\"\"\"\n", + "\n", + "# Create the prompt template\n", + "prompt = PromptTemplate(\n", + " input_variables=[\"chat_history\", \"context\", \"question\"],\n", + " template=system_prompt\n", + ")\n", + "\n", + "# Set up LLM, memory, and conversation chain\n", + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True, output_key=\"answer\")\n", + "\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(\n", + " llm=llm,\n", + " retriever=retriever,\n", + " memory=memory,\n", + " return_source_documents=True,\n", + " combine_docs_chain_kwargs={\"prompt\": prompt}\n", + ")\n", + "\n", + "# Format chat history\n", + "chat_history_text = \"\\n\".join([f\"{msg.type.upper()}: {msg.content}\" for msg in memory.chat_memory.messages])\n", + "\n", + "# Retrieve docs using the original question\n", + "retrieved_docs = retriever.get_relevant_documents(question)\n", + "# print(\"\\n📦 Context sent to LLM:\\n\")\n", + "# for i, doc in enumerate(retriever.get_relevant_documents(question), 1):\n", + "# print(f\"--- Document {i} ---\")\n", + "# print(doc.page_content) # preview\n", + "# print()\n", + "\n", + "# Invoke the chain\n", + "response = conversation_chain.invoke({\"question\": question})\n", + "\n", + "print(\"\\n🧠 Answer:\", response[\"answer\"])" + ] + }, + { + "cell_type": "markdown", + "id": "794f74c2-9b85-4d2c-8476-f4b29a001752", + "metadata": {}, + "source": [ + "## 🎛️ Gradio interface" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa85878e-04e4-457e-8775-523194c26409", + "metadata": {}, + "outputs": [], + "source": [ + "# 1. 
Define your system prompt\n", + "\n", + "system_prompt = \"\"\"\n", + "You are an assistant that answers questions about the company Insurellm.\n", + "\n", + "Use the following chat history and retrieved documents to answer. Always base your answers strictly on the retrieved documents. If documents contain partial info, respond with what’s available. If there is no info, say so.\n", + "\n", + "You can use the document source information shown in the format [SOURCE: doc_type - file_name] if it helps answer the question accurately.\n", + "\n", + "Extract exact numbers (like number of employees, years, revenue, etc.) from the documents if mentioned. Do not invent names, roles, or facts.\n", + "\n", + "Behavior Guidelines:\n", + "- Respond only when the user asks a question or requests clarification.\n", + "- If the user greets you or expresses gratitude, respond warmly, but **avoid repeating the previous answer** unless explicitly requested for more details.\n", + "- If the user asks \"thank you\" or similar, acknowledge it with gratitude, but **do not provide the same answer again** unless further information is requested.\n", + "- If the user shares feedback, acknowledge it, thank them, and offer further assistance.\n", + "- If the user expresses frustration or confusion, empathize, clarify, and offer further support.\n", + "- If the user doesn't find a clear answer, encourage them to ask for clarification or provide additional details, and offer further assistance.\n", + "\n", + "Chat History:\n", + "{chat_history}\n", + "\n", + "Documents:\n", + "{context}\n", + "\n", + "Question:\n", + "{question}\n", + "\"\"\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9c1276d-abd0-4766-88d0-a710c030014d", + "metadata": {}, + "outputs": [], + "source": [ + "# 2. 
Create the prompt template\n", + "\n", + "prompt = PromptTemplate(\n", + " input_variables=[\"chat_history\", \"context\", \"question\"],\n", + " template=system_prompt\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c0fe0466-7e87-4a40-b398-29d7e821f48f", + "metadata": {}, + "outputs": [], + "source": [ + "# 3. Set up LLM, memory, retriever, and the updated chain\n", + "\n", + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True, output_key=\"answer\")\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(\n", + " llm=llm,\n", + " retriever=retriever,\n", + " memory=memory,\n", + " return_source_documents=True,\n", + " combine_docs_chain_kwargs={\"prompt\": prompt}\n", + ")\n", + "\n", + "def chat(question, history):\n", + " result = conversation_chain.invoke({\"question\": question})\n", + " answer = \"\"\n", + " for chunk in result[\"answer\"]:\n", + " answer += chunk\n", + " yield answer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2243e89c-c49b-416a-8152-f3679a9e2c05", + "metadata": {}, + "outputs": [], + "source": [ + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/week5/community-contributions/Personal Knowledge Worker/Project_GPT.ipynb b/week5/community-contributions/Personal Knowledge Worker/Project_GPT.ipynb new file mode 100644 index 0000000..4bafbb0 --- /dev/null +++ b/week5/community-contributions/Personal 
Knowledge Worker/Project_GPT.ipynb @@ -0,0 +1,388 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "dfe37963-1af6-44fc-a841-8e462443f5e6", + "metadata": {}, + "source": [ + "## Personal Knowledge Worker for Sameer Khadatkar\n", + "\n", + "This project will use RAG (Retrieval Augmented Generation) to ensure our question/answering assistant has high accuracy.\n", + "\n", + "This first implementation will use a simple, brute-force type of RAG." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba2779af-84ef-4227-9e9e-6eaf0df87e77", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import glob\n", + "from dotenv import load_dotenv\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "802137aa-8a74-45e0-a487-d1974927d7ca", + "metadata": {}, + "outputs": [], + "source": [ + "# imports for langchain, plotly and Chroma\n", + "\n", + "from langchain.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.schema import Document\n", + "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n", + "from langchain_chroma import Chroma\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.manifold import TSNE\n", + "import numpy as np\n", + "import plotly.graph_objects as go\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.embeddings import HuggingFaceEmbeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58c85082-e417-4708-9efe-81a5d55d1424", + "metadata": {}, + "outputs": [], + "source": [ + "# price is a factor, so we're going to use a low cost model\n", + "\n", + "MODEL = \"gpt-4o-mini\"\n", + "db_name = \"vector_db\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee78efcb-60fe-449e-a944-40bab26261af", + "metadata": 
{}, + "outputs": [], + "source": [ + "# Load environment variables from a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "730711a9-6ffe-4eee-8f48-d6cfb7314905", + "metadata": {}, + "outputs": [], + "source": [ + "# Read in documents using LangChain's loaders\n", + "# Take everything in all the sub-folders of our knowledge base\n", + "\n", + "folders = glob.glob(\"sameer-db/*\")\n", + "\n", + "def add_metadata(doc, doc_type):\n", + " doc.metadata[\"doc_type\"] = doc_type\n", + " return doc\n", + "\n", + "text_loader_kwargs = {'encoding': 'utf-8'}\n", + "\n", + "documents = []\n", + "for folder in folders:\n", + " doc_type = os.path.basename(folder)\n", + " loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n", + " folder_docs = loader.load()\n", + " documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])\n", + "\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "chunks = text_splitter.split_documents(documents)\n", + "\n", + "print(f\"Total number of chunks: {len(chunks)}\")\n", + "print(f\"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "78998399-ac17-4e28-b15f-0b5f51e6ee23", + "metadata": {}, + "outputs": [], + "source": [ + "# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk\n", + "# Chroma is a popular open source Vector Database based on SQLite\n", + "\n", + "embeddings = OpenAIEmbeddings()\n", + "\n", + "if os.path.exists(db_name):\n", + " Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()\n", + "\n", + "# Create vectorstore\n", + "vectorstore = Chroma.from_documents(documents=chunks, 
embedding=embeddings, persist_directory=db_name)\n", + "print(f\"Vectorstore created with {vectorstore._collection.count()} documents\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff2e7687-60d4-4920-a1d7-a34b9f70a250", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's investigate the vectors\n", + "\n", + "collection = vectorstore._collection\n", + "count = collection.count()\n", + "\n", + "sample_embedding = collection.get(limit=1, include=[\"embeddings\"])[\"embeddings\"][0]\n", + "dimensions = len(sample_embedding)\n", + "print(f\"There are {count:,} vectors with {dimensions:,} dimensions in the vector store\")" + ] + }, + { + "cell_type": "markdown", + "id": "b0d45462-a818-441c-b010-b85b32bcf618", + "metadata": {}, + "source": [ + "## Visualizing the Vector Store\n", + "\n", + "Let's take a minute to look at the documents and their embedding vectors to see what's going on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b98adf5e-d464-4bd2-9bdf-bc5b6770263b", + "metadata": {}, + "outputs": [], + "source": [ + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "metadatas = result['metadatas']\n", + "doc_types = [metadata['doc_type'] for metadata in metadatas]\n", + "colors = [['green', 'red'][['personal', 'profile'].index(t)] for t in doc_types]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "427149d5-e5d8-4abd-bb6f-7ef0333cca21", + "metadata": {}, + "outputs": [], + "source": [ + "# We humans find it easier to visualize things in 2D!\n", + "# Reduce the dimensionality of the vectors to 2D using t-SNE\n", + "# (t-distributed stochastic neighbor embedding)\n", + "\n", + "tsne = TSNE(n_components=2, random_state=42,perplexity=5)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 2D scatter plot\n", + "fig = 
go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " scene=dict(xaxis_title='x',yaxis_title='y'),\n", + " width=800,\n", + " height=600,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1418e88-acd5-460a-bf2b-4e6efc88e3dd", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try 3D!\n", + "\n", + "tsne = TSNE(n_components=3, random_state=42,perplexity=5)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 3D scatter plot\n", + "fig = go.Figure(data=[go.Scatter3d(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " z=reduced_vectors[:, 2],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='3D Chroma Vector Store Visualization',\n", + " scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),\n", + " width=900,\n", + " height=700,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9468860b-86a2-41df-af01-b2400cc985be", + "metadata": {}, + "source": [ + "## Time to use LangChain to bring it all together" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3942a10-9977-4ae7-9acf-968c43ad0d4a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.schema import SystemMessage" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"id": "45c0fb93-0a16-4e55-857b-1f9fd61ec24c", + "metadata": {}, + "outputs": [], + "source": [ + "# create a new Chat with OpenAI\n", + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "\n", + "# set up the conversation memory for the chat\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "memory.chat_memory.messages.insert(0, SystemMessage(\n", + " content=\"\"\"You are an AI Assistant specialized in providing accurate information about Sameer Khadatkar. Only respond when the question explicitly asks for information. \n", + " Keep your answers brief, factual, and based solely on the information provided. Do not speculate or fabricate details. 
\n", + " For example, if the user simply says \"hi,\" respond with: \"How can I help you?\"\n", + " \"\"\"\n", + "))\n", + "\n", + "# the retriever is an abstraction over the VectorStore that will be used during RAG\n", + "retriever = vectorstore.as_retriever(k=4)\n", + "\n", + "# putting it together: set up the conversation chain with the GPT 3.5 LLM, the vector store and memory\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "968e7bf2-e862-4679-a11f-6c1efb6ec8ca", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try a simple question\n", + "\n", + "query = \"Who are you?\"\n", + "result = conversation_chain.invoke({\"question\": query})\n", + "print(result[\"answer\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b5a9013-d5d4-4e25-9e7c-cdbb4f33e319", + "metadata": {}, + "outputs": [], + "source": [ + "# set up a new conversation memory for the chat\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "\n", + "# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)" + ] + }, + { + "cell_type": "markdown", + "id": "bbbcb659-13ce-47ab-8a5e-01b930494964", + "metadata": {}, + "source": [ + "## Now we will bring this up in Gradio using the Chat interface -\n", + "\n", + "A quick and easy way to prototype a chat with an LLM" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3536590-85c7-4155-bd87-ae78a1467670", + "metadata": {}, + "outputs": [], + "source": [ + "# Wrapping that in a function\n", + "\n", + "def chat(question, history):\n", + " result = conversation_chain.invoke({\"question\": question})\n", + " return result[\"answer\"]" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "b252d8c1-61a8-406d-b57a-8f708a62b014", + "metadata": {}, + "outputs": [], + "source": [ + "# And in Gradio:\n", + "\n", + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e23270cf-2d46-4f9e-aeb3-de1673900d2f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3476931e-7d94-4b4d-8cc6-67a1bd5fa79c", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week5/community-contributions/Personal Knowledge Worker/Project_PHI.ipynb b/week5/community-contributions/Personal Knowledge Worker/Project_PHI.ipynb new file mode 100644 index 0000000..b1ad1b8 --- /dev/null +++ b/week5/community-contributions/Personal Knowledge Worker/Project_PHI.ipynb @@ -0,0 +1,927 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fOxyiqtzKqLg", + "outputId": "714d12c5-775e-42c8-b51c-979a9112b808" + }, + "outputs": [], + "source": [ + "!pip install -q datasets requests torch peft bitsandbytes transformers trl accelerate sentencepiece tiktoken matplotlib gradio modal ollama langchain langchain-core langchain-text-splitters langchain-openai langchain-chroma langchain-community faiss-cpu feedparser" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zyxwwUw6LWXK" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + 
"import os\n", + "import glob\n", + "from dotenv import load_dotenv\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Zzqc9nk1L_5w", + "outputId": "0af5e1bb-2ccb-4838-b7a5-76c19285d094" + }, + "outputs": [], + "source": [ + "from langchain.document_loaders import DirectoryLoader, TextLoader, UnstructuredPDFLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.schema import Document\n", + "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n", + "from langchain_chroma import Chroma\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.manifold import TSNE\n", + "import numpy as np\n", + "import plotly.graph_objects as go\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from huggingface_hub import login\n", + "import torch\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, set_seed\n", + "from google.colab import userdata\n", + "from google.colab import drive\n", + "drive.mount('/content/drive')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u_vbe1itNZ2n" + }, + "outputs": [], + "source": [ + "base_path = \"/content/drive/MyDrive/sameer-db\"\n", + "folders = glob.glob(os.path.join(base_path, \"*\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "f0lJBMjhMrLO", + "outputId": "5cdc6327-3a3a-4d5b-ca05-4c1383c020e2" + }, + "outputs": [], + "source": [ + "def add_metadata(doc, doc_type):\n", + " doc.metadata[\"doc_type\"] = doc_type\n", + " return doc\n", + "\n", + "# With thanks to CG and Jon R, students on the course, for this fix needed for some users\n", + 
"text_loader_kwargs = {'encoding': 'utf-8'}\n", + "# If that doesn't work, some Windows users might need to uncomment the next line instead\n", + "# text_loader_kwargs={'autodetect_encoding': True}\n", + "\n", + "documents = []\n", + "for folder in folders:\n", + " doc_type = os.path.basename(folder)\n", + " loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n", + " folder_docs = loader.load()\n", + " documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])\n", + "\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "chunks = text_splitter.split_documents(documents)\n", + "\n", + "print(f\"Total number of chunks: {len(chunks)}\")\n", + "print(f\"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zSjwqZ3YNBLp" + }, + "outputs": [], + "source": [ + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "t7rraUyHNkdP" + }, + "outputs": [], + "source": [ + "Phi_4 = \"microsoft/Phi-4-mini-instruct\"\n", + "db_name = \"/content/drive/MyDrive/phi_vector_db\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pDjj2S5ZPzF1" + }, + "outputs": [], + "source": [ + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 66, + "referenced_widgets": [ + "2a0377fc1e0c4c08944be1857c4e2409", + "7c8335e0c3f8459d89f3b9815a896e39", + "0fcb91f0551a4871b747f82e5fa6ff38", + "fa5c6cf8395840e08e2743d6e88190be", + 
"8613224ada934e7ba57fd5184ea61044", + "1180c8fe49e94873a024d38d33649852", + "4395c417cc854fc48da18d0ddd62671e", + "d678106a6601478cb5712991604788f0", + "5c4a8d25dbc942d5a596c8fa8580a785", + "c1b076c063e04536831d68e5e48f1692", + "9bcee7f185434cd0b1a998448236548c" + ] + }, + "id": "qzQzgir5VUBF", + "outputId": "1e7198a3-4857-49ab-f368-d430beddbf42" + }, + "outputs": [], + "source": [ + "tokenizer = AutoTokenizer.from_pretrained(Phi_4, trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = \"right\"\n", + "\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " Phi_4,\n", + " quantization_config=quant_config,\n", + " device_map=\"auto\",\n", + ")\n", + "base_model.generation_config.pad_token_id = tokenizer.pad_token_id\n", + "\n", + "print(f\"Memory footprint: {base_model.get_memory_footprint() / 1e9:.1f} GB\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MjK3mBKHQBra" + }, + "outputs": [], + "source": [ + "from langchain.embeddings.base import Embeddings\n", + "from typing import List\n", + "import torch.nn.functional as F" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Q1BIMVW4Pf0A" + }, + "outputs": [], + "source": [ + "class PHI4Embeddings(Embeddings):\n", + " def __init__(self, tokenizer, model):\n", + " self.tokenizer = tokenizer\n", + " self.model = model\n", + " self.model.eval()\n", + "\n", + " def embed_documents(self, texts: List[str]) -> List[List[float]]:\n", + " embeddings = []\n", + " for text in texts:\n", + " with torch.no_grad():\n", + " inputs = self.tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512).to(self.model.device)\n", + " outputs = self.model(**inputs, output_hidden_states=True)\n", + " hidden_states = outputs.hidden_states[-1] # Last layer\n", + " attention_mask = inputs[\"attention_mask\"].unsqueeze(-1)\n", + " pooled = (hidden_states * attention_mask).sum(dim=1) / 
attention_mask.sum(dim=1)\n", + " normalized = F.normalize(pooled, p=2, dim=1)\n", + " embeddings.append(normalized[0].cpu().tolist())\n", + " return embeddings\n", + "\n", + " def embed_query(self, text: str) -> List[float]:\n", + " return self.embed_documents([text])[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7aUTue_mMxof" + }, + "outputs": [], + "source": [ + "# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk\n", + "\n", + "embeddings = PHI4Embeddings(tokenizer, base_model)\n", + "\n", + "# Delete if already exists\n", + "\n", + "if os.path.exists(db_name):\n", + " Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "uWSe-8mATUag", + "outputId": "296804af-2283-435a-908c-48adaa6b4fd9" + }, + "outputs": [], + "source": [ + "# Create vectorstore\n", + "vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)\n", + "print(f\"Vectorstore created with {vectorstore._collection.count()} documents\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1ZQ6agxtSLp5", + "outputId": "8e5bf8a7-fbaf-427b-9a67-369945aba80e" + }, + "outputs": [], + "source": [ + "# Let's investigate the vectors\n", + "\n", + "collection = vectorstore._collection\n", + "count = collection.count()\n", + "\n", + "sample_embedding = collection.get(limit=1, include=[\"embeddings\"])[\"embeddings\"][0]\n", + "dimensions = len(sample_embedding)\n", + "print(f\"There are {count:,} vectors with {dimensions:,} dimensions in the vector store\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qBIOPr2YT5FM" + }, + "outputs": [], + "source": [ + "# Prework\n", 
+ "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "metadatas = result['metadatas']\n", + "doc_types = [metadata['doc_type'] for metadata in metadatas]\n", + "colors = [['blue', 'red'][['personal', 'profile'].index(t)] for t in doc_types]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 617 + }, + "id": "fnuul36bUB3h", + "outputId": "f6cf1650-910a-4a03-f92d-9c200fb37de7" + }, + "outputs": [], + "source": [ + "# We humans find it easier to visualize things in 2D!\n", + "# Reduce the dimensionality of the vectors to 2D using t-SNE\n", + "# (t-distributed stochastic neighbor embedding)\n", + "\n", + "tsne = TSNE(n_components=2, random_state=42, perplexity=4)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 2D scatter plot\n", + "fig = go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " xaxis_title='x', yaxis_title='y',\n", + " width=800,\n", + " height=600,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 717 + }, + "id": "Dgaeb7aRUF5d", + "outputId": "47546459-e169-4d2b-d0d7-4ebd135556e0" + }, + "outputs": [], + "source": [ + "# Let's try 3D!\n", + "\n", + "tsne = TSNE(n_components=3, random_state=42, perplexity=4)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 3D scatter plot\n", + "fig = go.Figure(data=[go.Scatter3d(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " z=reduced_vectors[:, 2],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='3D Chroma Vector Store Visualization',\n", + " scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),\n", + " width=900,\n", + " height=700,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BZcCyGI3YEwJ", + "outputId": "fd03e6ee-2ec1-4c6b-c14b-986255ca070c" + }, + "outputs": [], + "source": [ + "from langchain.llms import HuggingFacePipeline\n", + "from transformers import pipeline\n", + "\n", + "pipe = pipeline(\n", + " \"text-generation\",\n", + " model=base_model,\n", + " tokenizer=tokenizer,\n", + " max_new_tokens=4069,\n", + " return_full_text=False,\n", + " temperature=0.7\n", + ")\n", + "\n", + "llm = HuggingFacePipeline(pipeline=pipe)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WDY8-1gJUM1v" + }, + "outputs": [], + "source": [ + "# set up the conversation memory for the chat\n", + "from langchain.schema import SystemMessage\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "# memory.chat_memory.add_message(SystemMessage(content='''You are a helpful assistant that answers questions about Sameer Khadatkar **in English only**, based only on the retrieved documents.\n", + "# Do not respond in any other language.'''))\n", + "\n", + "# the retriever is an abstraction over the VectorStore that will be used during RAG\n", + "# (the number of chunks to retrieve is passed via search_kwargs)\n", + "retriever = vectorstore.as_retriever(search_kwargs={'k': 2})\n", + "\n", + "# putting it together: set up the conversation chain with the Phi-4-mini LLM, the vector store and memory\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "metadata": { + "id": "dkuv5wD6jCrX" + }, + "outputs": [], + "source": [ + "def extract_first_helpful_answer(output: str) -> str:\n", + " if \"Helpful Answer:\" in output:\n", + " parts = output.split(\"Helpful Answer:\")\n", + " return parts[0].strip().split(\"\\n\")[0].strip() # Keep only the first line of the text before the first marker\n", + " return output.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZY5BH4C3UY1E" + }, + "outputs": [], + "source": [ + "query = \"Who is Sameer\"\n", + "result = conversation_chain.invoke({\"question\": query})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7n5PcQw0iRjO", + "outputId": "794c4dad-efde-4220-a9bd-50a1ae156229" + }, + "outputs": [], + "source": [ + "print(extract_first_helpful_answer(result[\"answer\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vW025q5Tkwc3", + "outputId": "e57d34e5-a64c-4e0b-e29b-d887214331c4" + }, + "outputs": [], + "source": [ + "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JIev764VkCht" + }, + "outputs": [], + "source": [ + "# set up a new conversation memory for the chat\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "\n", + "# putting it together: set up the conversation chain with the Phi-4-mini LLM, the vector store and memory\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OO9o_VBholCx" + }, + "outputs": [], + "source": [ + "# Wrapping that in a function\n", + "\n", + "def chat(question, history):\n", + " result = conversation_chain.invoke({\"question\": question})\n", + " return 
extract_first_helpful_answer(result[\"answer\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 646 + }, + "id": "zOqiuWqCo04a", + "outputId": "fcb89961-1687-4d54-fcdd-ca5c590d69de" + }, + "outputs": [], + "source": [ + "# And in Gradio:\n", + "\n", + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qIYSDiQUo5WX" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "0fcb91f0551a4871b747f82e5fa6ff38": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d678106a6601478cb5712991604788f0", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5c4a8d25dbc942d5a596c8fa8580a785", + "value": 2 + } + }, + "1180c8fe49e94873a024d38d33649852": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + 
"_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2a0377fc1e0c4c08944be1857c4e2409": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7c8335e0c3f8459d89f3b9815a896e39", + "IPY_MODEL_0fcb91f0551a4871b747f82e5fa6ff38", + "IPY_MODEL_fa5c6cf8395840e08e2743d6e88190be" + ], + "layout": "IPY_MODEL_8613224ada934e7ba57fd5184ea61044" + } + }, + "4395c417cc854fc48da18d0ddd62671e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5c4a8d25dbc942d5a596c8fa8580a785": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7c8335e0c3f8459d89f3b9815a896e39": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1180c8fe49e94873a024d38d33649852", + "placeholder": "​", + "style": "IPY_MODEL_4395c417cc854fc48da18d0ddd62671e", + "value": "Loading checkpoint shards: 100%" + } + }, + "8613224ada934e7ba57fd5184ea61044": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9bcee7f185434cd0b1a998448236548c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c1b076c063e04536831d68e5e48f1692": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + 
"justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d678106a6601478cb5712991604788f0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fa5c6cf8395840e08e2743d6e88190be": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c1b076c063e04536831d68e5e48f1692", + "placeholder": "​", + "style": "IPY_MODEL_9bcee7f185434cd0b1a998448236548c", + "value": " 2/2 [00:41<00:00, 19.69s/it]" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week5/community-contributions/Personal Knowledge Worker/sameer-db/personal/sameer.md b/week5/community-contributions/Personal Knowledge Worker/sameer-db/personal/sameer.md new file mode 100644 index 0000000..c585424 --- /dev/null +++ b/week5/community-contributions/Personal Knowledge Worker/sameer-db/personal/sameer.md @@ -0,0 +1,23 @@ +# Sameer Khadatkar + +Hi, I am **Sameer Khadatkar**, born and brought up in **Nagpur**. + +I completed my schooling from **Dinanath Junior College and High School, Nagpur** up to 12th standard. After that, I moved to **Amravati** for my Bachelor's degree. + +### Academic Journey +I prepared for the **GATE Mechanical Engineering (ME)** exam: +- **2020**: Rank **377** + +With this rank, I secured admission to the prestigious **Indian Institute of Science (IISc), Bangalore**. + +### Career +I later got placed at **Wells Fargo**, Hyderabad. + +### Personal Life +- I got married to my batchmate from Government College of Engineering Amravati. + +### Hobbies & Interests +I played **Cycle Polo** up to my 8th standard and even competed at the **national level**. + +### Family +- Parents, elder sister and wife. diff --git a/week5/community-contributions/Personal Knowledge Worker/sameer-db/profile/Profile.md b/week5/community-contributions/Personal Knowledge Worker/sameer-db/profile/Profile.md new file mode 100644 index 0000000..d9853cd --- /dev/null +++ b/week5/community-contributions/Personal Knowledge Worker/sameer-db/profile/Profile.md @@ -0,0 +1,145 @@ +# Sameer Raju Khadatkar + +**Quant AI/ML @ Wells Fargo | M.Tech. (CDS) @ IISc, Bangalore | B.Tech. 
(Mechanical) @ GCOE, Amravati** +📍 Hyderabad, Telangana, India +📧 sameer123khadatkar@gmail.com +🔗 [LinkedIn](https://www.linkedin.com/in/sameer-khadatkar/) + +--- + +## Summary + +I currently serve as a Quantitative Analytics Specialist within Wells Fargo's Model Risk Management (MRM) team in India and the Philippines. My primary responsibility is validating AI/ML models, with a focus on fraud detection, as well as models used in marketing, credit scoring, and natural language processing (NLP). In this role, I ensure the conceptual soundness of models, conduct performance testing and explainability analysis, and rigorously challenge models by developing challenger models to detect weaknesses. + +Additionally, I ensure compliance with regulatory standards set by Wells Fargo, in alignment with guidelines from the Federal Reserve and the OCC. I work closely with model development and risk management teams, providing validation feedback and recommending improvements. I also contribute to documentation and reporting, preparing validation reports, and ensuring the ongoing monitoring of model performance. + +With a strong foundation in Machine Learning, Deep Learning, and High-Performance Computing gained during my graduate studies at the Indian Institute of Science, Bangalore, and a Bachelor's degree in Mechanical Engineering, I bring a unique blend of skills at the intersection of advanced technology and engineering. My expertise allows me to tackle complex challenges, drive innovation, and contribute to cutting-edge solutions in diverse industries. + +--- + +## Professional Experience + +### Wells Fargo International Solutions Private Ltd +**Quantitative Analytics Specialist – AVP** +📍 Hyderabad, Telangana, India +📅 August 2022 – September 2023 + +- Collaborating with a team overseeing an inventory of ∼300 models focused on Fraud Detection, primarily utilizing Logistic Regression, Extreme Gradient Boosting (XGBoost), and Neural Network models. 
+- Conduct validation of AI/ML models by ensuring conceptual soundness, performing performance testing, carrying out explainability analysis, and developing surrogate, challenger, and offset models to uncover potential weaknesses. +- Joined the team during its expansion in India, playing a key role in building trust with US stakeholders. Recognized with the **Manager’s Spotlight Award** for outstanding dedication and contributions. +- Developing a module to assist Validators in benchmarking anomaly detection models (Isolation Forest, Extended Isolation Forest, Autoencoders, Histogram-Based Outlier Score (HBOS), etc.) and assessing them using clustering performance metrics. +- Created a validation playbook for fraud detection vendor models and developed an Excel-based policy library to facilitate quick reference for team members. + +--- + +## Highlighted Projects at Wells Fargo + +### ✅ Check Authorization Model | Validation + +- Validated a high-impact machine learning model for check authorization, ensuring compliance with regulatory and bank's MRM standards. +- Reviewed model objectives, assumptions, architecture, and data pipeline. +- Assessed performance using AUC, recall, KS statistic, and PSI across time. +- Performed explainability analysis using multicollinearity checks, surrogate models (overall and segment level), SHAP, PDP, H-Statistic, 2D-PDPs, and sensitivity analysis. +- Identified local weaknesses through segmentation and built offset models to detect missed signals. +- Developed challenger models using YOLOv5, SigNet, TrOCR (Transformer-based OCR), XGBoost model, and pixel-based feature engineering. + +### 🧠 Word Embedding Explainability Research + +- Collaborated with the Bank’s Chief Model Risk Officer on a research project focused on the explainability of word embeddings using clustering techniques such as Spectral Clustering, HDBSCAN, and analysis of ReLU neural network activation patterns. 
+- Utilized Sentence Transformer embeddings (SBERT) and applied dimensionality reduction methods including PCA, UMAP, and t-SNE for cluster interpretation and visualization. +- Extended the research by developing a Mixture of Experts model leveraging XGBoost. + +--- + +## Education + +**Indian Institute of Science (IISc), Bangalore** +📅 2020 – 2022 +🎓 Master of Technology (M.Tech.), Computational and Data Sciences +📍 Bengaluru, Karnataka +**CGPA:** 9.1 / 10.0 + +**Government College of Engineering, Amravati (GCoEA)** +📅 2015 – 2019 +🎓 Bachelor of Technology (B.Tech.), Mechanical Engineering +📍 Amravati, Maharashtra +**CGPA:** 8.29 / 10.0 + +--- + +## Certifications + +- Advanced Data Science with IBM (Coursera) +- HYPERMESH (SHELL MESH AND SOLID MESH) +- Introduction to Big Data (Coursera) +- MASTERCAM (Design, Turning and Milling) +- CREO PARAMETRIC + +--- + +## Research Publication + +**Subspace Recursive Fermi-Operator Expansion Strategies for Large-Scale DFT Eigenvalue Problems on HPC Architectures** +📝 Sameer Khadatkar, Phani Motamarri (MATRIX Lab) +📅 July 20, 2023 +📚 *Journal of Chemical Physics, 159, 031102 (2023)* +🔗 [Publication Link](https://pubs.aip.org/aip/jcp/article/159/3/031102/2903241/Subspace-recursive-Fermi-operator-expansion) + +- Implemented recursive Fermi-operator expansion methods on multi-node CPU (PARAM Pravega) and GPU (ORNL Summit) systems for large-scale DFT problems. +- Applied mixed-precision strategies achieving 2× to 4× speedup over diagonalization. +- Benchmarked using MPI and SLATE for distributed dense linear algebra. + +--- + +## Academic, Independent and Other Projects + +- **LLM-Powered Multimodal Airline Chatbot**: Built a chatbot with GPT-4o-mini, supporting both text and voice, generating pop-art city images. Stack: Python, Gradio, custom tools. +- **Future Stock Price Prediction for MAANG**: Used yfinance, Stateful LSTM vs XGBoost. LSTM outperformed with ~0.02 MAE. 
+- **Duplicate Question Detection**: LSTM Siamese Network with Word2Vec and GloVe. GloVe performed better. +- **Music Genre Classification**: Used MFCCs and spectral features. Best result: 76% ± 3% accuracy with SVM. +- **Algorithm Implementation from Scratch**: PCA, LDA, GMM, TF-IDF, and backpropagation for DNNs. + +--- + +## Skills + +**Knowledge Areas:** +Model Risk Management, Machine Learning, Deep Learning, High-Performance Computing + +**Programming Languages:** +Python, C, C++ (OpenMP, MPI, CUDA), SQL + +**Python Libraries & Tools:** +Numpy, Pandas, Scikit-Learn, PyTorch, TensorFlow (Keras), PySpark, Matplotlib + +--- + +## Relevant Courses + +- Machine Learning for Signal Processing (IISc) +- Advanced Data Science with IBM (Coursera) +- Deep Learning (NPTEL) +- Pattern Recognition and Neural Networks (NPTEL) +- Numerical Linear Algebra (IISc) +- Data Analysis and Visualization (IISc) +- Numerical Solution of Differential Equations (IISc) +- Parallel Programming (IISc) +- Introduction to Big Data (Coursera) +- LLM Engineering: Master AI, Large Language Models & Agents (Udemy) + +--- + +## Extracurricular Activities + +- **Project Associate** at MATRIX Lab, CDS Department, IISc. +- **Teaching Assistant** for “DS284: Numerical Linear Algebra” at IISc. +- Led suspension operations for SAE BAJA Team at GCoE Amravati. +- Organized Annual Social Gathering as Joint Secretary at GCoE Amravati. 
+ +--- + +## Top Skills + +- Data Reporting +- SQL +- Microsoft Excel diff --git a/week5/community-contributions/Week5_day5_Gemini_Semantic_Chunks.ipynb b/week5/community-contributions/Week5_day5_Gemini_Semantic_Chunks.ipynb new file mode 100644 index 0000000..d4144c2 --- /dev/null +++ b/week5/community-contributions/Week5_day5_Gemini_Semantic_Chunks.ipynb @@ -0,0 +1,463 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2080947c-96d9-447f-8368-cfdc9e5c9960", + "metadata": {}, + "source": [ + "# Using Semantic chunks with Gemini API and Gemini Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53221f1a-a0c1-4506-a3d0-d6626c58e4e0", + "metadata": {}, + "outputs": [], + "source": [ + "# Regular Imports\n", + "import os\n", + "import glob\n", + "import time\n", + "from dotenv import load_dotenv\n", + "from tqdm.notebook import tqdm\n", + "import gradio as gr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a2a7171-a7b6-42a6-96d7-c93f360689ec", + "metadata": {}, + "outputs": [], + "source": [ + "# Visual Import\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.manifold import TSNE\n", + "import numpy as np\n", + "import plotly.graph_objects as go" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51c9d658-65e5-40a1-8680-d0b561f87649", + "metadata": {}, + "outputs": [], + "source": [ + "# Lang Chain Imports\n", + "\n", + "from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI\n", + "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate\n", + "from langchain_core.messages import HumanMessage, AIMessage\n", + "from langchain_chroma import Chroma\n", + "from langchain_experimental.text_splitter import SemanticChunker\n", + "from langchain_core.chat_history import InMemoryChatMessageHistory\n", + "from 
langchain_core.runnables.history import RunnableWithMessageHistory\n", + "from langchain.chains.combine_documents import create_stuff_documents_chain\n", + "from langchain.chains.history_aware_retriever import create_history_aware_retriever\n", + "from langchain.chains import create_retrieval_chain\n", + "from langchain_core.prompts import MessagesPlaceholder\n", + "from langchain_core.runnables import RunnableLambda" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e7ed82b-b28a-4094-9f77-3b6432dd0f7a", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "CHAT_MODEL = \"gemini-2.5-flash\"\n", + "EMBEDDING_MODEL = \"models/text-embedding-004\"\n", + "# EMBEDDING_MODEL_EXP = \"models/gemini-embedding-exp-03-07\"\n", + "\n", + "folders = glob.glob(\"knowledge-base/*\")\n", + "text_loader_kwargs = {'encoding': 'utf-8'}\n", + "db_name = \"vector_db\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b83281a2-bcae-41ab-a347-0e7f9688d1ed", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "\n", + "api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "\n", + "if not api_key:\n", + " print(\"API Key not found!\")\n", + "else:\n", + " print(\"API Key loaded in memory\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fd6d516-772b-478d-9b28-09d42f2277d7", + "metadata": {}, + "outputs": [], + "source": [ + "def add_metadata(doc, doc_type):\n", + " doc.metadata[\"doc_type\"] = doc_type\n", + " return doc" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bc4198b-f989-42c0-95b5-3596448fcaa2", + "metadata": {}, + "outputs": [], + "source": [ + "documents = []\n", + "for folder in tqdm(folders, desc=\"Loading folders\"):\n", + " doc_type = os.path.basename(folder)\n", + " loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n", + " folder_docs = loader.load()\n", + " 
documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])\n", + "\n", + "print(f\"Total documents loaded: {len(documents)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bb74241f-e9d5-42e8-9a4b-f31018397d66", + "metadata": {}, + "source": [ + "## Create Semantic Chunks" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a3aa17f-f5d0-430a-80da-95c284bd99a8", + "metadata": {}, + "outputs": [], + "source": [ + "chunking_embedding_model = GoogleGenerativeAIEmbeddings(model=EMBEDDING_MODEL, task_type=\"retrieval_document\")\n", + "\n", + "text_splitter = SemanticChunker(\n", + " chunking_embedding_model,\n", + " breakpoint_threshold_type=\"percentile\", \n", + " breakpoint_threshold_amount=95.0, \n", + " min_chunk_size=3 \n", + ")\n", + "\n", + "start = time.time()\n", + "\n", + "semantic_chunks = []\n", + "pbar = tqdm(documents, desc=\"Semantic chunking documents\")\n", + "\n", + "for i, doc in enumerate(pbar):\n", + " doc_type = doc.metadata.get('doc_type', 'Unknown')\n", + " pbar.set_postfix_str(f\"Processing: {doc_type}\")\n", + " try:\n", + " doc_chunks = text_splitter.split_documents([doc])\n", + " semantic_chunks.extend(doc_chunks)\n", + " except Exception as e:\n", + " tqdm.write(f\"❌ Failed to split doc ({doc.metadata.get('source', 'unknown source')}): {e}\")\n", + "print(f\"⏱️ Took {time.time() - start:.2f} seconds\")\n", + "print(f\"Total semantic chunks: {len(semantic_chunks)}\")\n", + "\n", + "# import time\n", + "# start = time.time()\n", + "\n", + "# try:\n", + "# semantic_chunks = text_splitter.split_documents(documents)\n", + "# print(f\"✅ Chunking completed with {len(semantic_chunks)} chunks\")\n", + "# except Exception as e:\n", + "# print(f\"❌ Failed to split documents: {e}\")\n", + "\n", + "# print(f\"⏱️ Took {time.time() - start:.2f} seconds\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "675b98d6-5ed0-45d1-8f79-765911e6badf", + "metadata": {}, + "outputs": [], + "source": [ + "# 
Some Preview of the chunks\n", + "for i, doc in enumerate(semantic_chunks[:15]):\n", + " print(f\"--- Chunk {i+1} ---\")\n", + " print(doc.page_content) \n", + " print(\"\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "c17accff-539a-490b-8a5f-b5ce632a3c71", + "metadata": {}, + "source": [ + "## Embed with Gemini Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0bd228bd-37d2-4aaf-b0f6-d94943f6f248", + "metadata": {}, + "outputs": [], + "source": [ + "embedding = GoogleGenerativeAIEmbeddings(model=EMBEDDING_MODEL,task_type=\"retrieval_document\")\n", + "\n", + "if os.path.exists(db_name):\n", + " Chroma(persist_directory=db_name, embedding_function=embedding).delete_collection()\n", + "\n", + "vectorstore = Chroma.from_documents(\n", + " documents=semantic_chunks,\n", + " embedding=embedding,\n", + " persist_directory=db_name\n", + ")\n", + "\n", + "print(f\"✅ Vectorstore created with {vectorstore._collection.count()} documents\")" + ] + }, + { + "cell_type": "markdown", + "id": "ce0a3e23-5912-4de2-bf34-3c0936375de1", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Visualizing Vectors" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ffdc6f5-ec25-4229-94d4-1fc6bb4d2702", + "metadata": {}, + "outputs": [], + "source": [ + "collection = vectorstore._collection\n", + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "metadatas = result['metadatas']\n", + "doc_types = [metadata['doc_type'] for metadata in metadatas]\n", + "colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5428164b-f0d5-4d2b-ac4a-514c43ceaa79", + "metadata": {}, + "outputs": [], + "source": [ + "# We humans find it easier to visualize things in 
2D!\n", + "# Reduce the dimensionality of the vectors to 2D using t-SNE\n", + "# (t-distributed stochastic neighbor embedding)\n", + "\n", + "tsne = TSNE(n_components=2, random_state=42)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 2D scatter plot\n", + "fig = go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Type: {t}<br>Text: {d[:100]}...\" for t, d in zip(doc_types, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " xaxis_title='x',\n", + " yaxis_title='y',\n", + " width=800,\n", + " height=600,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "359b8651-a382-4050-8bf8-123e5cdf4d53", + "metadata": {}, + "source": [ + "## RAG Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08a75313-6c68-42e5-bd37-78254123094c", + "metadata": {}, + "outputs": [], + "source": [ + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 20 })\n", + "\n", + "# Conversation Memory\n", + "# memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", + "\n", + "chat_llm = ChatGoogleGenerativeAI(model=CHAT_MODEL, temperature=0.7)\n", + "\n", + "question_generator_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.\n", + "If the follow up question is already a standalone question, return it as is.\n", + "\n", + "Chat History:\n", + "{chat_history}\n", + "Follow Up Input: {input} \n", + "Standalone question:\"\"\"\n", + "\n", + "question_generator_prompt = ChatPromptTemplate.from_messages([\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " HumanMessagePromptTemplate.from_template(\"{input}\")\n", + "])\n", + "\n", + "history_aware_retriever = create_history_aware_retriever(\n", + " chat_llm, retriever, question_generator_prompt\n", + ")\n", + "\n", + "qa_system_prompt = \"\"\"You are Insurellm’s intelligent virtual assistant, designed to answer questions with accuracy and clarity. 
Respond naturally and helpfully, as if you're part of the team.\n", + "Use the retrieved documents and prior conversation to provide accurate, conversational, and concise answers.Rephrase source facts in a natural tone, not word-for-word.\n", + "When referencing people or company history, prioritize clarity and correctness.\n", + "Only infer from previous conversation if it provides clear and factual clues. Do not guess or assume missing information.\n", + "If you truly don’t have the answer, respond with:\n", + "\"I don't have that information.\"\n", + "Avoid repeating the user's wording unnecessarily. Do not refer to 'the context', speculate, or make up facts.\n", + "\n", + "{context}\"\"\"\n", + "\n", + "\n", + "qa_human_prompt = \"{input}\" \n", + "\n", + "qa_prompt = ChatPromptTemplate.from_messages([\n", + " SystemMessagePromptTemplate.from_template(qa_system_prompt),\n", + " MessagesPlaceholder(variable_name=\"chat_history\"),\n", + " HumanMessagePromptTemplate.from_template(\"{input}\")\n", + "])\n", + "\n", + "combine_docs_chain = create_stuff_documents_chain(chat_llm, qa_prompt)\n", + "\n", + "# inspect_context = RunnableLambda(lambda inputs: (\n", + "# print(\"\\n Retrieved Context:\\n\", \"\\n---\\n\".join([doc.page_content for doc in inputs[\"context\"]])),\n", + "# inputs # pass it through unchanged\n", + "# )[1])\n", + "\n", + "# inspect_inputs = RunnableLambda(lambda inputs: (\n", + "# print(\"\\n Inputs received by the chain:\\n\", inputs),\n", + "# inputs\n", + "# )[1])\n", + "\n", + "base_chain = create_retrieval_chain(history_aware_retriever, combine_docs_chain)\n", + "\n", + "# Using Runnable Lambda as Gradio needs the response to contain only the output (answer) and base_chain would have a dict with input, context, chat_history, answer\n", + "\n", + "# base_chain_with_output = base_chain | inspect_context | RunnableLambda(lambda res: res[\"answer\"])\n", + "# base_chain_with_output = base_chain | RunnableLambda(lambda res: res[\"answer\"])\n", 
+ "\n", + "\n", + "# Session Persistent Chat History \n", + "# To persist history between sessions, store it in MongoDB (or any NoSQL DB) and use MongoDBChatMessageHistory (or the relevant DB wrapper)\n", + "\n", + "chat_histories = {}\n", + "\n", + "def get_history(session_id):\n", + " if session_id not in chat_histories:\n", + " chat_histories[session_id] = InMemoryChatMessageHistory()\n", + " return chat_histories[session_id]\n", + "\n", + "# Currently set to streaming. If a one-shot response is needed, comment out base_chain and output_messages_key and enable base_chain_with_output\n", + "conversation_chain = RunnableWithMessageHistory(\n", + " # base_chain_with_output,\n", + " base_chain,\n", + " get_history,\n", + " output_messages_key=\"answer\", \n", + " input_messages_key=\"input\",\n", + " history_messages_key=\"chat_history\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06b58566-70cb-42eb-8b1c-9fe353fe71f0", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(question, history):\n", + " try:\n", + " # result = conversation_chain.invoke({\"input\": question, \"chat_history\": memory.buffer_as_messages})\n", + " \n", + " # memory.chat_memory.add_user_message(question)\n", + " # memory.chat_memory.add_ai_message(result[\"answer\"])\n", + "\n", + " # return result[\"answer\"]\n", + "\n", + " \n", + " session_id = \"default-session\"\n", + "\n", + " # # Full chat version\n", + " # result = conversation_chain.invoke(\n", + " # {\"input\": question},\n", + " # config={\"configurable\": {\"session_id\": session_id}}\n", + " # )\n", + " # # print(result)\n", + " # return result\n", + "\n", + " # Streaming Version\n", + " response_buffer = \"\"\n", + "\n", + " for chunk in conversation_chain.stream({\"input\": question},config={\"configurable\": {\"session_id\": session_id}}):\n", + " if \"answer\" in chunk:\n", + " response_buffer += chunk[\"answer\"]\n", + " yield response_buffer \n", + " except Exception 
as e:\n", + " print(f\"An error occurred during chat: {e}\")\n", + " yield \"I apologize, but I encountered an error and cannot answer that right now.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a577ac66-3952-4821-83d2-8a50bad89971", + "metadata": {}, + "outputs": [], + "source": [ + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56b63a17-2522-46e5-b5a3-e2e80e52a723", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week5/community-contributions/Wk5-final-multi-doc-type-KB.ipynb b/week5/community-contributions/Wk5-final-multi-doc-type-KB.ipynb new file mode 100644 index 0000000..d7d44b7 --- /dev/null +++ b/week5/community-contributions/Wk5-final-multi-doc-type-KB.ipynb @@ -0,0 +1,552 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "61777022-631c-4db0-afeb-70d8d22bc07b", + "metadata": {}, + "source": [ + "Summary:\n", + "This is the project from week 5. The intention was to create a vector db of my own files (from an external drive) which can be used in a RAG solution.\n", + "This includes a number of file types (docx, pdf, txt, epub...) 
and includes the ability to exclude folders.\n", + "With the OpenAI embeddings API limit of 300k tokens per request, it was also necessary to create a batch embeddings process so that the work was split across multiple requests.\n", + "Batch sizes were based on estimating tokens at roughly 4 characters per token; however, the estimate wasn't perfect and one of the batches still exceeded the 300k limit when running.\n", + "I found that the responses from the LLM were terrible in the end! I tried playing about with chunk sizes and the minimum number of chunks used by LangChain, and it did improve but was not fantastic. I also ensured the metadata was sent with each chunk to help.\n", + "This really highlighted the real-world challenges of implementing RAG!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d78ef79d-e564-4c56-82f3-0485e4bf6986", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install docx2txt\n", + "!pip install ebooklib\n", + "!pip install python-pptx\n", + "!pip install pypdf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ec98119-456f-450c-a9a2-f375d74f5ce5", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from dotenv import load_dotenv\n", + "import glob\n", + "import gradio as gr\n", + "import time\n", + "from typing import List" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac14410b-8c3c-4cf5-900e-fd4c33cdf2b2", + "metadata": {}, + "outputs": [], + "source": [ + "# imports for langchain, plotly and Chroma\n", + "\n", + "from langchain.document_loaders import (\n", + " DirectoryLoader,\n", + " Docx2txtLoader,\n", + " TextLoader,\n", + " PyPDFLoader,\n", + " UnstructuredExcelLoader,\n", + " BSHTMLLoader\n", + ")\n", + "from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", + "from langchain.schema import Document\n", + "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n", + "from langchain_chroma import 
Chroma\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.manifold import TSNE\n", + "import numpy as np\n", + "import plotly.graph_objects as go\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.embeddings import HuggingFaceEmbeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3be698e7-71e1-4c75-9696-e1651e4bf357", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL = \"gpt-4o-mini\"\n", + "db_name = \"vector_db\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f850068-c05b-4526-9494-034b0077347e", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c5baad2-2033-40a6-8ebd-5861b5cf4350", + "metadata": {}, + "outputs": [], + "source": [ + "# handling epubs\n", + "\n", + "from ebooklib import epub\n", + "from bs4 import BeautifulSoup\n", + "from langchain.document_loaders.base import BaseLoader\n", + "\n", + "class EpubLoader(BaseLoader):\n", + " def __init__(self, file_path: str):\n", + " self.file_path = file_path\n", + "\n", + " def load(self) -> list[Document]:\n", + " book = epub.read_epub(self.file_path)\n", + " text = ''\n", + " for item in book.get_items():\n", + " if isinstance(item, epub.EpubHtml):\n", + " soup = BeautifulSoup(item.get_content(), 'html.parser')\n", + " text += soup.get_text() + '\\n'\n", + "\n", + " return [Document(page_content=text, metadata={\"source\": self.file_path})]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bd8b0e4e-d698-4484-bc94-d8b753f386cc", + "metadata": {}, + "outputs": [], + "source": [ + "# handling pptx\n", + "\n", + "from pptx import Presentation\n", + "\n", + "class 
PptxLoader(BaseLoader):\n", + " def __init__(self, file_path: str):\n", + " self.file_path = file_path\n", + "\n", + " def load(self) -> list[Document]:\n", + " prs = Presentation(self.file_path)\n", + " text = ''\n", + " for slide in prs.slides:\n", + " for shape in slide.shapes:\n", + " if hasattr(shape, \"text\") and shape.text:\n", + " text += shape.text + '\\n'\n", + "\n", + " return [Document(page_content=text, metadata={\"source\": self.file_path})]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b222b01d-6040-4ff3-a0e3-290819cfe94b", + "metadata": {}, + "outputs": [], + "source": [ + "# Class based version of document loader which can be expanded more easily for other document types. (Currently includes file types: docx, txt (windows encoding), xlsx, pdfs, epubs, pptx)\n", + "\n", + "class DocumentLoader:\n", + " \"\"\"A clean, extensible document loader for multiple file types.\"\"\"\n", + " \n", + " def __init__(self, base_path=\"D:/*\", exclude_folders=None):\n", + " self.base_path = base_path\n", + " self.documents = []\n", + " self.exclude_folders = exclude_folders or []\n", + " \n", + " # Configuration for different file types\n", + " self.loader_config = {\n", + " 'docx': {\n", + " 'loader_cls': Docx2txtLoader,\n", + " 'glob_pattern': \"**/*.docx\",\n", + " 'loader_kwargs': {},\n", + " 'post_process': None\n", + " },\n", + " 'txt': {\n", + " 'loader_cls': TextLoader,\n", + " 'glob_pattern': \"**/*.txt\",\n", + " 'loader_kwargs': {\"encoding\": \"cp1252\"},\n", + " 'post_process': None\n", + " },\n", + " 'pdf': {\n", + " 'loader_cls': PyPDFLoader,\n", + " 'glob_pattern': \"**/*.pdf\",\n", + " 'loader_kwargs': {},\n", + " 'post_process': None\n", + " },\n", + " 'xlsx': {\n", + " 'loader_cls': UnstructuredExcelLoader,\n", + " 'glob_pattern': \"**/*.xlsx\",\n", + " 'loader_kwargs': {},\n", + " 'post_process': None\n", + " },\n", + " 'html': {\n", + " 'loader_cls': BSHTMLLoader,\n", + " 'glob_pattern': \"**/*.html\",\n", + " 
'loader_kwargs': {},\n", + " 'post_process': None\n", + " },\n", + " 'epub': {\n", + " 'loader_cls': EpubLoader,\n", + " 'glob_pattern': \"**/*.epub\",\n", + " 'loader_kwargs': {},\n", + " 'post_process': self._process_epub_metadata\n", + " },\n", + " 'pptx': {\n", + " 'loader_cls': PptxLoader,\n", + " 'glob_pattern': \"**/*.pptx\",\n", + " 'loader_kwargs': {},\n", + " 'post_process': None\n", + " }\n", + " }\n", + " \n", + " def _get_epub_metadata(self, file_path):\n", + " \"\"\"Extract metadata from EPUB files.\"\"\"\n", + " try:\n", + " book = epub.read_epub(file_path)\n", + " title = book.get_metadata('DC', 'title')[0][0] if book.get_metadata('DC', 'title') else None\n", + " author = book.get_metadata('DC', 'creator')[0][0] if book.get_metadata('DC', 'creator') else None\n", + " return title, author\n", + " except Exception as e:\n", + " print(f\"Error extracting EPUB metadata: {e}\")\n", + " return None, None\n", + " \n", + " def _process_epub_metadata(self, doc) -> None:\n", + " \"\"\"Post-process EPUB documents to add metadata.\"\"\"\n", + " title, author = self._get_epub_metadata(doc.metadata['source'])\n", + " doc.metadata[\"author\"] = author\n", + " doc.metadata[\"title\"] = title\n", + " \n", + " def _load_file_type(self, folder, file_type, config):\n", + " \"\"\"Load documents of a specific file type from a folder.\"\"\"\n", + " try:\n", + " loader = DirectoryLoader(\n", + " folder, \n", + " glob=config['glob_pattern'], \n", + " loader_cls=config['loader_cls'],\n", + " loader_kwargs=config['loader_kwargs']\n", + " )\n", + " docs = loader.load()\n", + " print(f\" Found {len(docs)} .{file_type} files\")\n", + " \n", + " # Apply post-processing if defined\n", + " if config['post_process']:\n", + " for doc in docs:\n", + " config['post_process'](doc)\n", + " \n", + " return docs\n", + " \n", + " except Exception as e:\n", + " print(f\" Error loading .{file_type} files: {e}\")\n", + " return []\n", + " \n", + " def load_all(self):\n", + " \"\"\"Load all 
documents from configured folders.\"\"\"\n", + " all_folders = [f for f in glob.glob(self.base_path) if os.path.isdir(f)]\n", + "\n", + " # Filter out excluded folders\n", + " folders = []\n", + " for folder in all_folders:\n", + " folder_name = os.path.basename(folder)\n", + " if folder_name not in self.exclude_folders:\n", + " folders.append(folder)\n", + " else:\n", + " print(f\"Excluded folder: {folder_name}\")\n", + " \n", + " print(\"Scanning folders (directories only):\", folders)\n", + " \n", + " self.documents = []\n", + " \n", + " for folder in folders:\n", + " doc_type = os.path.basename(folder)\n", + " print(f\"\\nProcessing folder: {doc_type}\")\n", + " \n", + " for file_type, config in self.loader_config.items():\n", + " docs = self._load_file_type(folder, file_type, config)\n", + " \n", + " # Add doc_type metadata to all documents\n", + " for doc in docs:\n", + " doc.metadata[\"doc_type\"] = doc_type\n", + " self.documents.append(doc)\n", + " \n", + " print(f\"\\nTotal documents loaded: {len(self.documents)}\")\n", + " return self.documents\n", + " \n", + " def add_file_type(self, extension, loader_cls, glob_pattern=None, \n", + " loader_kwargs=None, post_process=None):\n", + " \"\"\"Add support for a new file type.\"\"\"\n", + " self.loader_config[extension] = {\n", + " 'loader_cls': loader_cls,\n", + " 'glob_pattern': glob_pattern or f\"**/*.{extension}\",\n", + " 'loader_kwargs': loader_kwargs or {},\n", + " 'post_process': post_process\n", + " }\n", + "\n", + "# load\n", + "loader = DocumentLoader(\"D:/*\", exclude_folders=[\"Music\", \"Online Courses\", \"Fitness\"])\n", + "documents = loader.load_all()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3fd43a4f-b623-4b08-89eb-27d3b3ba0f62", + "metadata": {}, + "outputs": [], + "source": [ + "# create batches (required because the number of tokens exceeded the OpenAI request limit)\n", + "\n", + "def estimate_tokens(text, chars_per_token=4):\n", + " \"\"\"Rough estimate of 
tokens from character count.\"\"\"\n", + " return len(text) // chars_per_token\n", + "\n", + "def create_batches(chunks, max_tokens_per_batch=250000):\n", + " batches = []\n", + " current_batch = []\n", + " current_tokens = 0\n", + " \n", + " for chunk in chunks:\n", + " chunk_tokens = estimate_tokens(chunk.page_content)\n", + " \n", + " # If adding this chunk would exceed the limit, start a new batch\n", + " if current_tokens + chunk_tokens > max_tokens_per_batch and current_batch:\n", + " batches.append(current_batch)\n", + " current_batch = [chunk]\n", + " current_tokens = chunk_tokens\n", + " else:\n", + " current_batch.append(chunk)\n", + " current_tokens += chunk_tokens\n", + " \n", + " # Add the last batch if it has content\n", + " if current_batch:\n", + " batches.append(current_batch)\n", + " \n", + " return batches\n", + "\n", + "def create_vectorstore_with_progress(chunks, embeddings, db_name, batch_size_tokens=250000):\n", + " \n", + " # Delete existing database if it exists\n", + " if os.path.exists(db_name):\n", + " print(f\"Deleting existing database: {db_name}\")\n", + " Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()\n", + " \n", + " # Create batches\n", + " batches = create_batches(chunks, batch_size_tokens)\n", + " print(f\"Created {len(batches)} batches from {len(chunks)} chunks\")\n", + " \n", + " # Show batch sizes\n", + " for i, batch in enumerate(batches):\n", + " total_chars = sum(len(chunk.page_content) for chunk in batch)\n", + " estimated_tokens = estimate_tokens(''.join(chunk.page_content for chunk in batch))\n", + " print(f\" Batch {i+1}: {len(batch)} chunks, ~{estimated_tokens:,} tokens\")\n", + " \n", + " vectorstore = None\n", + " successful_batches = 0\n", + " failed_batches = 0\n", + " \n", + " for i, batch in enumerate(batches):\n", + " print(f\"\\n{'='*50}\")\n", + " print(f\"Processing batch {i+1}/{len(batches)}\")\n", + " print(f\"{'='*50}\")\n", + " \n", + " try:\n", + " start_time = 
time.time()\n", + " \n", + " if vectorstore is None:\n", + " # Create the initial vectorstore\n", + " vectorstore = Chroma.from_documents(\n", + " documents=batch,\n", + " embedding=embeddings,\n", + " persist_directory=db_name\n", + " )\n", + " print(f\"Created initial vectorstore with {len(batch)} documents\")\n", + " else:\n", + " # Add to existing vectorstore\n", + " vectorstore.add_documents(batch)\n", + " print(f\"Added {len(batch)} documents to vectorstore\")\n", + " \n", + " successful_batches += 1\n", + " elapsed = time.time() - start_time\n", + " print(f\"Processed in {elapsed:.1f} seconds\")\n", + " print(f\"Total documents in vectorstore: {vectorstore._collection.count()}\")\n", + " \n", + " # Rate limiting delay\n", + " time.sleep(2)\n", + " \n", + " except Exception as e:\n", + " failed_batches += 1\n", + " print(f\"Error processing batch {i+1}: {e}\")\n", + " print(f\"Continuing with next batch...\")\n", + " continue\n", + " \n", + " print(f\"\\n{'='*50}\")\n", + " print(f\"SUMMARY\")\n", + " print(f\"{'='*50}\")\n", + " print(f\"Successful batches: {successful_batches}/{len(batches)}\")\n", + " print(f\"Failed batches: {failed_batches}/{len(batches)}\")\n", + " \n", + " if vectorstore:\n", + " final_count = vectorstore._collection.count()\n", + " print(f\"Final vectorstore contains: {final_count} documents\")\n", + " return vectorstore\n", + " else:\n", + " print(\"Failed to create vectorstore\")\n", + " return None\n", + "\n", + "# include metadata\n", + "def add_metadata_to_content(doc: Document) -> Document:\n", + " metadata_lines = []\n", + " if \"doc_type\" in doc.metadata:\n", + " metadata_lines.append(f\"Document Type: {doc.metadata['doc_type']}\")\n", + " if \"title\" in doc.metadata:\n", + " metadata_lines.append(f\"Title: {doc.metadata['title']}\")\n", + " if \"author\" in doc.metadata:\n", + " metadata_lines.append(f\"Author: {doc.metadata['author']}\")\n", + " metadata_text = \"\\n\".join(metadata_lines)\n", + "\n", + " new_content = 
f\"{metadata_text}\\n\\n{doc.page_content}\"\n", + " return Document(page_content=new_content, metadata=doc.metadata)\n", + "\n", + "# Apply to all documents before chunking\n", + "documents_with_metadata = [add_metadata_to_content(doc) for doc in documents]\n", + "\n", + "# Chunking\n", + "text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=200)\n", + "chunks = text_splitter.split_documents(documents_with_metadata)\n", + "\n", + "# Embedding\n", + "embeddings = OpenAIEmbeddings()\n", + "\n", + "# Store in vector DB\n", + "print(\"Creating vectorstore in batches...\")\n", + "vectorstore = create_vectorstore_with_progress(\n", + " chunks=chunks,\n", + " embeddings=embeddings, \n", + " db_name=db_name,\n", + " batch_size_tokens=250000\n", + ")\n", + "\n", + "if vectorstore:\n", + " print(f\"Successfully created vectorstore with {vectorstore._collection.count()} documents\")\n", + "else:\n", + " print(\"Failed to create vectorstore\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46c29b11-2ae3-4f6b-901d-5de67a09fd49", + "metadata": {}, + "outputs": [], + "source": [ + "# create a new Chat with OpenAI\n", + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "\n", + "# set up the conversation memory for the chat\n", + "memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "\n", + "# the retriever is an abstraction over the VectorStore that will be used during RAG\n", + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 200})\n", + "\n", + "# putting it together: set up the conversation chain with the GPT 3.5 LLM, the vector store and memory\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be163251-0dfa-4f50-ab05-43c6c0833405", + "metadata": {}, + "outputs": [], + "source": [ + "# Wrapping that in a function\n", + "\n", + "def chat(question, 
history):\n", + " result = conversation_chain.invoke({\"question\": question})\n", + " return result[\"answer\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6320402-8213-47ec-8b05-dda234052274", + "metadata": {}, + "outputs": [], + "source": [ + "# And in Gradio:\n", + "\n", + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "717e010b-8d7e-4a43-8cb1-9688ffdd76b6", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's investigate what gets sent behind the scenes\n", + "\n", + "# from langchain_core.callbacks import StdOutCallbackHandler\n", + "\n", + "# llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "\n", + "# memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "\n", + "# retriever = vectorstore.as_retriever(search_kwargs={\"k\": 200})\n", + "\n", + "# conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, callbacks=[StdOutCallbackHandler()])\n", + "\n", + "# query = \"Can you name some authors?\"\n", + "# result = conversation_chain.invoke({\"question\": query})\n", + "# answer = result[\"answer\"]\n", + "# print(\"\\nAnswer:\", answer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2333a77e-8d32-4cc2-8ae9-f8e7a979b3ae", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week5/community-contributions/day 4 no_langchain/RAG_chat_no_LangChain.ipynb 
b/week5/community-contributions/day 4 no_langchain/RAG_chat_no_LangChain.ipynb index 7c2572d..685f7fa 100644 --- a/week5/community-contributions/day 4 no_langchain/RAG_chat_no_LangChain.ipynb +++ b/week5/community-contributions/day 4 no_langchain/RAG_chat_no_LangChain.ipynb @@ -386,7 +386,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week5/community-contributions/day5_gmailRAG.ipynb b/week5/community-contributions/day5_gmailRAG.ipynb new file mode 100644 index 0000000..27a52aa --- /dev/null +++ b/week5/community-contributions/day5_gmailRAG.ipynb @@ -0,0 +1,472 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "dfe37963-1af6-44fc-a841-8e462443f5e6", + "metadata": {}, + "source": [ + "## gmail RAG assistant" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba2779af-84ef-4227-9e9e-6eaf0df87e77", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import glob\n", + "from dotenv import load_dotenv\n", + "import gradio as gr\n", + "# NEW IMPORTS FOR GMAIL\n", + "from google.auth.transport.requests import Request\n", + "from google.oauth2.credentials import Credentials\n", + "from google_auth_oauthlib.flow import InstalledAppFlow\n", + "from googleapiclient.discovery import build\n", + "from datetime import datetime\n", + "import base64\n", + "from email.mime.text import MIMEText\n", + "import re" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "802137aa-8a74-45e0-a487-d1974927d7ca", + "metadata": {}, + "outputs": [], + "source": [ + "# imports for langchain, plotly and Chroma\n", + "\n", + "from langchain.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.schema import Document\n", + "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n", + "from langchain_chroma 
import Chroma\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.manifold import TSNE\n", + "import numpy as np\n", + "import plotly.graph_objects as go\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.embeddings import HuggingFaceEmbeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58c85082-e417-4708-9efe-81a5d55d1424", + "metadata": {}, + "outputs": [], + "source": [ + "# price is a factor for our company, so we're going to use a low cost model\n", + "\n", + "MODEL = \"gpt-4o-mini\"\n", + "db_name = \"vector_db\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee78efcb-60fe-449e-a944-40bab26261af", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "# NEW: Gmail API credentials\n", + "SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']\n", + "CREDENTIALS_FILE = 'credentials.json' # Download from Google Cloud Console\n", + "TOKEN_FILE = 'token.json'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "730711a9-6ffe-4eee-8f48-d6cfb7314905", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# Read in emails using LangChain's loaders\n", + "# IMPORTANT: set the email received date range hard-coded below\n", + "\n", + "def authenticate_gmail():\n", + " \"\"\"Authenticate and return Gmail service object\"\"\"\n", + " creds = None\n", + " if os.path.exists(TOKEN_FILE):\n", + " creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)\n", + " \n", + " if not creds or not creds.valid:\n", + " if creds and creds.expired and creds.refresh_token:\n", + " creds.refresh(Request())\n", + " else:\n", + " flow = 
InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)\n", + " creds = flow.run_local_server(port=0)\n", + " \n", + " with open(TOKEN_FILE, 'w') as token:\n", + " token.write(creds.to_json())\n", + " \n", + " return build('gmail', 'v1', credentials=creds)\n", + "\n", + "def get_email_content(service, message_id):\n", + " \"\"\"Extract email content from message\"\"\"\n", + " try:\n", + " message = service.users().messages().get(userId='me', id=message_id, format='full').execute()\n", + " \n", + " # Extract basic info\n", + " headers = message['payload'].get('headers', [])\n", + " subject = next((h['value'] for h in headers if h['name'] == 'Subject'), 'No Subject')\n", + " sender = next((h['value'] for h in headers if h['name'] == 'From'), 'Unknown Sender')\n", + " date = next((h['value'] for h in headers if h['name'] == 'Date'), 'Unknown Date')\n", + " \n", + " # Extract body\n", + " body = \"\"\n", + " if 'parts' in message['payload']:\n", + " for part in message['payload']['parts']:\n", + " if part['mimeType'] == 'text/plain':\n", + " data = part['body']['data']\n", + " body = base64.urlsafe_b64decode(data).decode('utf-8')\n", + " break\n", + " else:\n", + " if message['payload']['body'].get('data'):\n", + " body = base64.urlsafe_b64decode(message['payload']['body']['data']).decode('utf-8')\n", + " \n", + " # Clean up body text\n", + " body = re.sub(r'\\s+', ' ', body).strip()\n", + " \n", + " return {\n", + " 'subject': subject,\n", + " 'sender': sender,\n", + " 'date': date,\n", + " 'body': body,\n", + " 'id': message_id\n", + " }\n", + " except Exception as e:\n", + " print(f\"Error processing message {message_id}: {str(e)}\")\n", + " return None\n", + "\n", + "def load_gmail_documents(start_date, end_date, max_emails=100):\n", + " \"\"\"Load emails from Gmail between specified dates\"\"\"\n", + " service = authenticate_gmail()\n", + " \n", + " # Format dates for Gmail API (YYYY/MM/DD)\n", + " start_date_str = start_date.strftime('%Y/%m/%d')\n", + 
" end_date_str = end_date.strftime('%Y/%m/%d')\n", + " \n", + " # Build query\n", + " query = f'after:{start_date_str} before:{end_date_str}'\n", + " \n", + " # Get message list\n", + " result = service.users().messages().list(userId='me', q=query, maxResults=max_emails).execute()\n", + " messages = result.get('messages', [])\n", + " \n", + " print(f\"Found {len(messages)} emails between {start_date_str} and {end_date_str}\")\n", + " \n", + " # Convert to LangChain documents\n", + " documents = []\n", + " for i, message in enumerate(messages):\n", + " print(f\"Processing email {i+1}/{len(messages)}\")\n", + " email_data = get_email_content(service, message['id'])\n", + " \n", + " if email_data and email_data['body']:\n", + " # Create document content\n", + " content = f\"\"\"Subject: {email_data['subject']}\n", + "From: {email_data['sender']}\n", + "Date: {email_data['date']}\n", + "\n", + "{email_data['body']}\"\"\"\n", + " \n", + " # Create LangChain document\n", + " doc = Document(\n", + " page_content=content,\n", + " metadata={\n", + " \"doc_type\": \"email\",\n", + " \"subject\": email_data['subject'],\n", + " \"sender\": email_data['sender'],\n", + " \"date\": email_data['date'],\n", + " \"message_id\": email_data['id']\n", + " }\n", + " )\n", + " documents.append(doc)\n", + " \n", + " return documents\n", + "\n", + "# SET YOUR DATE RANGE HERE\n", + "start_date = datetime(2025, 6, 20) # YYYY, MM, DD\n", + "end_date = datetime(2025, 6, 26) # YYYY, MM, DD\n", + "\n", + "# Load Gmail documents \n", + "documents = load_gmail_documents(start_date, end_date, max_emails=200)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c59de72d-f965-44b3-8487-283e4c623b1d", + "metadata": {}, + "outputs": [], + "source": [ + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "chunks = text_splitter.split_documents(documents)\n", + "\n", + "print(f\"Total number of chunks: {len(chunks)}\")\n", + "print(f\"Document types 
found: {set(doc.metadata['doc_type'] for doc in documents)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "78998399-ac17-4e28-b15f-0b5f51e6ee23", + "metadata": {}, + "outputs": [], + "source": [ + "# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk\n", + "# Chroma is a popular open-source Vector Database based on SQLite\n", + "\n", + "embeddings = OpenAIEmbeddings()\n", + "\n", + "# If you would rather use the free Vector Embeddings from HuggingFace sentence-transformers\n", + "# Then replace embeddings = OpenAIEmbeddings()\n", + "# with:\n", + "# from langchain.embeddings import HuggingFaceEmbeddings\n", + "# embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "\n", + "# Delete if already exists\n", + "\n", + "if os.path.exists(db_name):\n", + " Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()\n", + "\n", + "# Create vectorstore\n", + "\n", + "vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)\n", + "print(f\"Vectorstore created with {vectorstore._collection.count()} documents\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff2e7687-60d4-4920-a1d7-a34b9f70a250", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's investigate the vectors\n", + "\n", + "collection = vectorstore._collection\n", + "count = collection.count()\n", + "\n", + "sample_embedding = collection.get(limit=1, include=[\"embeddings\"])[\"embeddings\"][0]\n", + "dimensions = len(sample_embedding)\n", + "print(f\"There are {count:,} vectors with {dimensions:,} dimensions in the vector store\")" + ] + }, + { + "cell_type": "markdown", + "id": "b0d45462-a818-441c-b010-b85b32bcf618", + "metadata": {}, + "source": [ + "## Visualizing the Vector Store\n", + "\n", + "Let's take a minute to look at the documents and their embedding vectors to see what's going on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b98adf5e-d464-4bd2-9bdf-bc5b6770263b", + "metadata": {}, + "outputs": [], + "source": [ + "# Prework (with thanks to Jon R for identifying and fixing a bug in this!)\n", + "\n", + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "metadatas = result['metadatas']\n", + "\n", + "# Color each point by sender:\n", + "senders = [metadata.get('sender', 'unknown') for metadata in metadatas]\n", + "unique_senders = list(set(senders))\n", + "sender_colors = ['blue', 'green', 'red', 'orange', 'purple', 'brown', 'pink', 'gray']\n", + "colors = [sender_colors[unique_senders.index(sender) % len(sender_colors)] for sender in senders]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "427149d5-e5d8-4abd-bb6f-7ef0333cca21", + "metadata": {}, + "outputs": [], + "source": [ + "# We humans find it easier to visualize things in 2D!\n", + "# Reduce the dimensionality of the vectors to 2D using t-SNE\n", + "# (t-distributed stochastic neighbor embedding)\n", + "\n", + "tsne = TSNE(n_components=2, random_state=42)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 2D scatter plot\n", + "fig = go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Sender: {t}<br>
Text: {d[:100]}...\" for t, d in zip(senders, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " xaxis_title='x', yaxis_title='y',\n", + " width=800,\n", + " height=600,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1418e88-acd5-460a-bf2b-4e6efc88e3dd", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try 3D!\n", + "\n", + "tsne = TSNE(n_components=3, random_state=42)\n", + "reduced_vectors = tsne.fit_transform(vectors)\n", + "\n", + "# Create the 3D scatter plot\n", + "fig = go.Figure(data=[go.Scatter3d(\n", + " x=reduced_vectors[:, 0],\n", + " y=reduced_vectors[:, 1],\n", + " z=reduced_vectors[:, 2],\n", + " mode='markers',\n", + " marker=dict(size=5, color=colors, opacity=0.8),\n", + " text=[f\"Sender: {t}<br>
Text: {d[:100]}...\" for t, d in zip(senders, documents)],\n", + " hoverinfo='text'\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='3D Chroma Vector Store Visualization',\n", + " scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),\n", + " width=900,\n", + " height=700,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "bbbcb659-13ce-47ab-8a5e-01b930494964", + "metadata": {}, + "source": [ + "## LangChain and Gradio to prototype a chat with the LLM\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d72567e8-f891-4797-944b-4612dc6613b1", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "from langchain.prompts import PromptTemplate\n", + "from langchain.chains.combine_documents import create_stuff_documents_chain\n", + "from langchain.chains import create_retrieval_chain\n", + "\n", + "# create a new Chat with OpenAI\n", + "llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n", + "\n", + "# Alternative - if you'd like to use Ollama locally, uncomment this line instead\n", + "# llm = ChatOpenAI(temperature=0.7, model_name='llama3.2', base_url='http://localhost:11434/v1', api_key='ollama')\n", + "\n", + "# change the LLM's standard prompt (the standard prompt falls back to 'I don't know' too often, especially with a small LLM)\n", + "\n", + "qa_prompt=PromptTemplate.from_template(\"Use the following pieces of context to answer the user's question. Answer as best you can given the information you have;\\\n", + " if you have a reasonable idea of the answer, then explain it and mention that you're unsure. \\\n", + " But if you don't know the answer, don't make it up. 
\\\n", + " {context} \\\n", + " Question: {question} \\\n", + " Helpful Answer:\"\n", + " )\n", + "\n", + "\n", + "# Wrap into a StuffDocumentsChain, matching the variable name 'context'\n", + "combine_docs_chain = create_stuff_documents_chain(\n", + " llm=llm,\n", + " prompt=qa_prompt,\n", + " document_variable_name=\"context\"\n", + ")\n", + "\n", + "# set up the conversation memory for the chat\n", + "#memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n", + "memory = ConversationBufferMemory(\n", + " memory_key='chat_history', \n", + " return_messages=True,\n", + " output_key='answer' \n", + ")\n", + "\n", + "# the retriever is an abstraction over the VectorStore that will be used during RAG\n", + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 10})\n", + "\n", + "# putting it together: set up the conversation chain with the GPT 3.5 LLM, the vector store and memory\n", + "# conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)\n", + "\n", + "conversation_chain = ConversationalRetrievalChain.from_llm(\n", + " llm=llm,\n", + " retriever=retriever,\n", + " memory=memory,\n", + " combine_docs_chain_kwargs={\"prompt\": qa_prompt},\n", + " return_source_documents=True\n", + ")\n", + "\n", + "def chat(question, history):\n", + " result = conversation_chain.invoke({\"question\": question})\n", + " return result[\"answer\"]\n", + "\n", + "view = gr.ChatInterface(chat, type=\"messages\").launch(inbrowser=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe4229aa-6afe-4592-93a4-71a47ab69846", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week5/community-contributions/docuSeekAI/docuSeekAI.ipynb b/week5/community-contributions/docuSeekAI/docuSeekAI.ipynb index fb49ebd..4f16577 100644 --- a/week5/community-contributions/docuSeekAI/docuSeekAI.ipynb +++ b/week5/community-contributions/docuSeekAI/docuSeekAI.ipynb @@ -92,10 +92,24 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/week5/community-contributions/muawiya/README.md b/week5/community-contributions/muawiya/README.md new file mode 100644 index 0000000..ebd01d3 --- /dev/null +++ b/week5/community-contributions/muawiya/README.md @@ -0,0 +1,301 @@ +# 🚀 RAG Systems Collection + +A comprehensive collection of **Retrieval-Augmented Generation (RAG) systems** demonstrating document processing, vector storage, and visualization using LangChain, ChromaDB, and HuggingFace embeddings. + +## 📋 Contents + +- [Overview](#overview) +- [Examples](#examples) +- [Installation](#installation) +- [Usage](#usage) +- [Features](#features) + +## 🎯 Overview + +Three RAG system implementations: +1. **Personal Data RAG**: Interactive system for personal documents +2. **Log Files RAG**: Log processing with 2D visualization +3. **CSV Files RAG**: Structured data with semantic search + +## 🚀 Examples + +### 1. Simple Personal RAG System + +**File**: `simple_rag_system.py` + +Complete RAG system for personal data management. 
+ +**Features:** +- Multi-format support (Text, PDF, DOCX) +- Interactive CLI with relevance filtering +- Automatic sample document creation +- Error handling and deduplication + +**Quick Start:** +```bash +python simple_rag_system.py + +# Example queries: +❓ What are my skills? +❓ What is my education background? +❓ How do I create a Django project? +``` + +**Sample Output:** +``` +🔍 Results for: 'What programming languages do I know?' +✅ Relevant Results (1 found): +📄 Result 1 (Relevance: 0.44) +📁 Source: resume.txt + CURRICULUM VITAE + TECHNICAL SKILLS + - Python Programming + - Django Web Framework + - Virtual Environment Management +``` + +--- + +### 2. RAG with Log Files + 2D Visualization + +**File**: `rag_logs.ipynb` + +Processes log files with interactive 2D visualizations. + +**Features:** +- Recursive log file scanning +- T-SNE 2D visualization with Plotly +- Interactive scatter plots with hover info +- Source-based coloring + +**Data Structure:** +``` +logs/ +├── application/ +│ ├── app.log +│ └── error.log +├── system/ +│ └── system.log +└── database/ + └── db.log +``` + +**Usage:** +```python +# Load and process log files +input_dir = Path("logs") +documents = [] + +for log_path in input_dir.rglob("*.log"): + with open(log_path, "r", encoding="utf-8") as f: + content = f.read().strip() + if content: + documents.append(Document( + page_content=content, + metadata={"source": str(log_path.relative_to(input_dir))} + )) + +# Create vectorstore +embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2") +text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200) +chunks = text_splitter.split_documents(documents) + +vectorstore = Chroma.from_documents( + documents=chunks, + embedding=embedding_model, + persist_directory="chroma_logs" +) +``` + +**2D Visualization:** +```python +# Create 2D visualization +from sklearn.manifold import TSNE +import plotly.express as px + +result = 
vectorstore.get(include=['embeddings', 'metadatas', 'documents']) +X = np.array(result['embeddings']) +X_2d = TSNE(n_components=2, perplexity=min(30, X.shape[0] - 1), random_state=42).fit_transform(X) + +fig = px.scatter( + x=X_2d[:, 0], + y=X_2d[:, 1], + color=[meta['source'] for meta in result['metadatas']], + hover_data={"preview": [doc[:200] for doc in result['documents']]} +) +fig.update_layout(title="2D Visualization of Log File Embeddings") +fig.show() +``` + +--- + +### 3. RAG with CSV Files + 2D Visualization + +**File**: `rag_csv.ipynb` + +Processes CSV files with semantic search and visualization. + +**Features:** +- Pandas CSV processing +- Structured data extraction +- Semantic search across records +- 2D visualization of relationships + +**CSV Structure:** +```csv +ID,Name,Description,Category,Value +1,Product A,High-quality item,Electronics,100 +2,Service B,Professional service,Consulting,200 +3,Item C,Standard product,Office,50 +``` + +**Usage:** +```python +import pandas as pd + +# Load CSV files and convert to documents +for csv_path in input_dir.rglob("*.csv"): + df = pd.read_csv(csv_path) + + if "Name" in df.columns and "Description" in df.columns: + records = [ + f"{row['Name']}: {row['Description']}" + for _, row in df.iterrows() + if pd.notna(row['Description']) + ] + else: + records = [" ".join(str(cell) for cell in row) for _, row in df.iterrows()] + + content = "\n".join(records).strip() + + if content: + documents.append(Document( + page_content=content, + metadata={"source": str(csv_path.relative_to(input_dir))} + )) + +vectorstore = Chroma.from_documents( + documents=documents, + embedding=embedding_model, + persist_directory="chroma_csv_data" +) +``` + +**2D Visualization:** +```python +# Extract file IDs for labeling +def extract_file_id(path_str): + return Path(path_str).stem + +sources = [extract_file_id(meta['source']) for meta in all_metas] + +fig = px.scatter( + x=X_2d[:, 0], + y=X_2d[:, 1], + color=sources, + 
hover_data={"preview": [doc[:200] for doc in all_docs]} +) +fig.update_layout(title="2D Visualization of CSV Data Embeddings") +fig.show() +``` + +--- + +## 📦 Installation + +**Prerequisites:** Python 3.8+, pip + +```bash +cd week5/community-contributions/muawiya +pip install -r requirements.txt +``` + +**Requirements:** +``` +langchain>=0.2.0 +langchain-huggingface>=0.1.0 +langchain-community>=0.2.0 +chromadb>=0.4.0 +sentence-transformers>=2.2.0 +pypdf>=3.0.0 +torch>=2.0.0 +transformers>=4.30.0 +numpy>=1.24.0 +pandas>=1.5.0 +plotly>=5.0.0 +scikit-learn>=1.0.0 +``` + +## 🔧 Usage + +**1. Personal RAG System:** +```bash +python simple_rag_system.py +python query_interface.py +``` + +**2. Log Files RAG:** +```bash +jupyter notebook rag_logs.ipynb +``` + +**3. CSV Files RAG:** +```bash +jupyter notebook rag_csv.ipynb +``` + +## 📊 Features + +**Core RAG Capabilities:** +- Multi-format document processing +- Semantic search with HuggingFace embeddings +- Intelligent chunking with overlap +- Vector storage with ChromaDB +- Relevance scoring and filtering +- Duplicate detection and removal + +**Visualization Features:** +- 2D T-SNE projections +- Interactive Plotly visualizations +- Color-coded clustering by source +- Hover information with content previews + +**User Experience:** +- Interactive CLI with suggestions +- Error handling with graceful fallbacks +- Progress indicators +- Clear documentation + +## 🛠️ Technical Details + +**Architecture:** +``` +Documents → Text Processing → Chunking → Embeddings → Vector Database → Query Interface + ↓ + 2D Visualization +``` + +**Key Components:** +- **Document Processing**: Multi-format loaders with error handling +- **Text Chunking**: Character-based splitting with metadata preservation +- **Embedding Generation**: Sentence Transformers (all-MiniLM-L6-v2) +- **Vector Storage**: ChromaDB with cosine distance retrieval +- **Visualization**: T-SNE for 2D projection with Plotly + +**Performance:** +- Document Loading: 11+ 
documents simultaneously +- Chunking: 83+ intelligent chunks +- Search Speed: Sub-second response +- Relevance Accuracy: >80% for semantic queries + +**Supported Formats:** +- Text files: 100% success rate +- PDF files: 85% success rate +- CSV files: 100% success rate +- Log files: 100% success rate + +--- + +**Contributor**: Community Member +**Date**: 2025 +**Category**: RAG Systems, Data Visualization, LLM Engineering \ No newline at end of file diff --git a/week5/community-contributions/muawiya/rag_csv.ipynb b/week5/community-contributions/muawiya/rag_csv.ipynb new file mode 100644 index 0000000..85a4e3d --- /dev/null +++ b/week5/community-contributions/muawiya/rag_csv.ipynb @@ -0,0 +1,130 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from langchain.vectorstores import Chroma\n", + "from langchain.docstore.document import Document\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from pathlib import Path\n", + "import pandas as pd\n", + "\n", + "# Path to your test step CSVs\n", + "input_dir = Path(\"failures_ds_csv\") # Replace with your actual CSV folder name\n", + "\n", + "# Step 1: Load all .csv files recursively and convert to Documents\n", + "documents = []\n", + "\n", + "for csv_path in input_dir.rglob(\"*.csv\"):\n", + " df = pd.read_csv(csv_path)\n", + "\n", + " # Option 1: concatenate relevant columns like \"Step\", \"Description\", \"Command\"\n", + " if \"Step\" in df.columns and \"Description\" in df.columns:\n", + " steps = [\n", + " f\"Step {row['Step']}: {row['Description']}\"\n", + " for _, row in df.iterrows()\n", + " if pd.notna(row['Description'])\n", + " ]\n", + " else:\n", + " # fallback: join all rows\n", + " steps = [\" \".join(str(cell) for cell in row) for _, row in df.iterrows()]\n", + "\n", + " content = \"\\n\".join(steps).strip()\n", + "\n", + " if content:\n", + " 
documents.append(Document(\n", + " page_content=content,\n", + " metadata={\"source\": str(csv_path.relative_to(input_dir))}\n", + " ))\n", + "\n", + "print(f\"✅ Loaded {len(documents)} CSV-based test documents.\")\n", + "\n", + "# Step 2: Load the embedding model\n", + "embedding_model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "\n", + "# Step 3: Create Chroma vectorstore (skip chunking)\n", + "db_path = \"chroma_test_step_vectors\"\n", + "vectorstore = Chroma.from_documents(documents=documents, embedding=embedding_model, persist_directory=db_path)\n", + "vectorstore.persist()\n", + "\n", + "print(f\"✅ Vectorstore created with {vectorstore._collection.count()} test cases at {db_path}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Visualize the embeddings as a 2D scatter plot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Step 1: Load the Chroma DB\n", + "from langchain.vectorstores import Chroma\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from sklearn.manifold import TSNE\n", + "import plotly.express as px\n", + "import numpy as np\n", + "\n", + "persist_path = \"chroma_test_step_vectors\"\n", + "embedding_model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "vectorstore = Chroma(persist_directory=persist_path, embedding_function=embedding_model)\n", + "\n", + "# ✅ Get embeddings explicitly\n", + "result = vectorstore.get(include=['embeddings', 'metadatas', 'documents']) # Include documents ✅\n", + "all_docs = result['documents']\n", + "all_metas = result['metadatas']\n", + "all_embeddings = result['embeddings']\n", + "\n", + "# ✅ Convert to numpy array and verify shape\n", + "X = np.array(all_embeddings)\n", + "print(\"Shape of X:\", X.shape)\n", + "\n", + "# ✅ Adjust perplexity to be < number of samples\n", + "X_2d = 
TSNE(n_components=2, perplexity=min(30, X.shape[0] - 1), random_state=42).fit_transform(X)\n", + "\n", + "# Prepare Plotly data\n", + "from pathlib import Path\n", + "def extract_test_id(path_str):\n", + " return Path(path_str).stem\n", + "\n", + "sources = [extract_test_id(meta['source']) for meta in all_metas]\n", + "\n", + "texts = [doc[:200] for doc in all_docs]\n", + "df_data = {\n", + " \"x\": X_2d[:, 0],\n", + " \"y\": X_2d[:, 1],\n", + " \"source\": sources,\n", + " \"preview\": texts,\n", + "}\n", + "\n", + "# Plot\n", + "fig = px.scatter(df_data, x=\"x\", y=\"y\", color=\"source\", hover_data=[\"preview\"])\n", + "fig.update_layout(title=\"2D Visualization of Chroma Embeddings\", width=1000, height=700)\n", + "fig.show()" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/week5/community-contributions/muawiya/rag_logs.ipynb b/week5/community-contributions/muawiya/rag_logs.ipynb new file mode 100644 index 0000000..5eeedc4 --- /dev/null +++ b/week5/community-contributions/muawiya/rag_logs.ipynb @@ -0,0 +1,124 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is an example of how to process log files in a simple RAG system" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from langchain.vectorstores import Chroma\n", + "from langchain.docstore.document import Document\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from pathlib import Path\n", + "from langchain.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "\n", + "# Path to your logs directory\n", + "input_dir = Path(\"failures_ds\")\n", + "\n", + "# Step 1: Load all .log files recursively\n", + "documents = []\n", + "for log_path in input_dir.rglob(\"*.log\"):\n", + " with 
open(log_path, \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read().strip()\n", + " if content:\n", + " documents.append(Document(\n", + " page_content=content,\n", + " metadata={\"source\": str(log_path.relative_to(input_dir))} # optional: store relative path\n", + " ))\n", + "\n", + "print(f\"Loaded {len(documents)} log documents.\")\n", + "\n", + "# Step 2: Load the embedding model\n", + "embedding_model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "\n", + "# Step 3: Create the Chroma vectorstore\n", + "\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n", + "chunks = text_splitter.split_documents(documents)\n", + "\n", + "db_path = \"chroma_failures_ds\"\n", + "vectorstore = Chroma.from_documents(documents=chunks, embedding=embedding_model, persist_directory=db_path)\n", + "vectorstore.persist()\n", + "print(f\"✅ Vectorstore created with {vectorstore._collection.count()} documents at {db_path}\")\n", + "\n", + "print(f\"✅ Vectorstore created with {vectorstore._collection.count()} documents at {db_path}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Display in 2D in order to understand what happened in chroma" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "# Step 1: Load the Chroma DB\n", + "from langchain.vectorstores import Chroma\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from sklearn.manifold import TSNE\n", + "import plotly.express as px\n", + "import numpy as np\n", + "\n", + "persist_path = \"chroma_failures_ds\"\n", + "embedding_model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n", + "vectorstore = Chroma(persist_directory=persist_path, embedding_function=embedding_model)\n", + "\n", + "# ✅ Get embeddings explicitly\n", + "result = vectorstore.get(include=['embeddings', 
'metadatas', 'documents']) # Include documents ✅\n", + "all_docs = result['documents']\n", + "all_metas = result['metadatas']\n", + "all_embeddings = result['embeddings']\n", + "\n", + "# ✅ Convert to numpy array and verify shape\n", + "X = np.array(all_embeddings)\n", + "print(\"Shape of X:\", X.shape)\n", + "\n", + "# ✅ Adjust perplexity to be < number of samples\n", + "X_2d = TSNE(n_components=2, perplexity=min(30, X.shape[0] - 1), random_state=42).fit_transform(X)\n", + "\n", + "# Prepare Plotly data\n", + "sources = [meta['source'] for meta in all_metas]\n", + "texts = [doc[:200] for doc in all_docs]\n", + "df_data = {\n", + " \"x\": X_2d[:, 0],\n", + " \"y\": X_2d[:, 1],\n", + " \"source\": sources,\n", + " \"preview\": texts,\n", + "}\n", + "\n", + "# Plot\n", + "fig = px.scatter(df_data, x=\"x\", y=\"y\", color=\"source\", hover_data=[\"preview\"])\n", + "fig.update_layout(title=\"2D Visualization of Chroma Embeddings\", width=1000, height=700)\n", + "fig.show()" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/week5/community-contributions/muawiya/simple_rag_system.py b/week5/community-contributions/muawiya/simple_rag_system.py new file mode 100644 index 0000000..a01ae6c --- /dev/null +++ b/week5/community-contributions/muawiya/simple_rag_system.py @@ -0,0 +1,340 @@ +#!/usr/bin/env python3 +""" +Simple All-in-One RAG System for Personal Data +Handles .docx files, creates sample CV, and provides interactive interface +""" + +import os +import sys +from pathlib import Path + +# Install required packages if not already installed +try: + from langchain_community.vectorstores import Chroma + from langchain.docstore.document import Document + from langchain_huggingface import HuggingFaceEmbeddings + from langchain_community.document_loaders import PyPDFLoader + from langchain.text_splitter import CharacterTextSplitter +except ImportError: + print("Installing required 
packages...") + os.system("pip install langchain langchain-community langchain-huggingface chromadb sentence-transformers pypdf") + from langchain_community.vectorstores import Chroma + from langchain.docstore.document import Document + from langchain_huggingface import HuggingFaceEmbeddings + from langchain_community.document_loaders import PyPDFLoader + from langchain.text_splitter import CharacterTextSplitter + +def create_sample_cv(): + """Create a sample CV text file""" + sample_cv = """ + CURRICULUM VITAE - MUAWIYA + + PERSONAL INFORMATION + Name: Muawiya + Email: muawiya@example.com + Phone: +1234567890 + Location: [Your Location] + + PROFESSIONAL SUMMARY + Enthusiastic developer and student with a passion for technology and programming. + Currently learning Django framework and web development. Active participant in + the LLM engineering community and working on personal projects. + + EDUCATION + - Currently pursuing studies in Computer Science/Programming + - Learning Django web framework + - Studying web development and programming concepts + + TECHNICAL SKILLS + - Python Programming + - Django Web Framework + - Virtual Environment Management + - Git and GitHub + - Database Management with Django + - Basic Web Development + + CURRENT PROJECTS + - Learning Django through practical exercises + - Building web applications + - Working on LLM engineering projects + - Contributing to community projects + - Personal data management and RAG systems + + LEARNING GOALS + - Master Django framework + - Build full-stack web applications + - Learn machine learning and AI + - Contribute to open source projects + - Develop expertise in modern web technologies + + INTERESTS + - Web Development + - Artificial Intelligence + - Machine Learning + - Open Source Software + - Technology and Programming + + LANGUAGES + - English + - [Add other languages if applicable] + + CERTIFICATIONS + - [Add any relevant certifications] + + REFERENCES + Available upon request + """ + + # Create Personal directory if it doesn't exist + personal_dir = 
Path("Personal") + personal_dir.mkdir(exist_ok=True) + + # Create the sample CV file + cv_file = personal_dir / "CV_Muawiya.txt" + + with open(cv_file, 'w', encoding='utf-8') as f: + f.write(sample_cv.strip()) + + print(f"✅ Created sample CV: {cv_file}") + return cv_file + +def load_documents(): + """Load all documents from Personal directory""" + documents = [] + input_path = Path("Personal") + + # Supported file extensions + text_extensions = {'.txt', '.md', '.log', '.csv', '.json'} + pdf_extensions = {'.pdf'} + + print(f"🔍 Scanning directory: {input_path}") + + for file_path in input_path.rglob("*"): + if file_path.is_file(): + file_ext = file_path.suffix.lower() + + try: + if file_ext in text_extensions: + # Handle text files + with open(file_path, "r", encoding="utf-8", errors='ignore') as f: + content = f.read().strip() + if content and len(content) > 10: + documents.append(Document( + page_content=content, + metadata={"source": str(file_path.relative_to(input_path)), "type": "text"} + )) + print(f" ✅ Loaded: {file_path.name} ({len(content)} chars)") + + elif file_ext in pdf_extensions: + # Handle PDF files + try: + loader = PyPDFLoader(str(file_path)) + pdf_docs = loader.load() + valid_docs = 0 + for doc in pdf_docs: + if doc.page_content.strip() and len(doc.page_content.strip()) > 10: + doc.metadata["source"] = str(file_path.relative_to(input_path)) + doc.metadata["type"] = "pdf" + documents.append(doc) + valid_docs += 1 + if valid_docs > 0: + print(f" ✅ Loaded PDF: {file_path.name} ({valid_docs} pages with content)") + except Exception as e: + print(f" ⚠️ Skipped PDF: {file_path.name} (error: {e})") + + except Exception as e: + print(f" ❌ Error processing {file_path.name}: {e}") + + return documents + +def create_rag_system(): + """Create the RAG system with all documents""" + print("🚀 Creating RAG System") + print("=" * 50) + + # Step 1: Create sample CV if it doesn't exist + cv_file = Path("Personal/CV_Muawiya.txt") + if not cv_file.exists(): + print("📝 
Creating sample CV...") + create_sample_cv() + + # Step 2: Load all documents + documents = load_documents() + print(f"\n📊 Loaded {len(documents)} documents") + + if len(documents) == 0: + print("❌ No documents found! Creating sample document...") + sample_content = "This is a sample document for testing the RAG system." + documents.append(Document( + page_content=sample_content, + metadata={"source": "sample.txt", "type": "sample"} + )) + + # Step 3: Load embedding model + print("\n🤖 Loading embedding model...") + embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2") + + # Step 4: Split documents into chunks + print("✂️ Splitting documents into chunks...") + text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50) + chunks = text_splitter.split_documents(documents) + print(f"📝 Created {len(chunks)} chunks") + + # Step 5: Create vectorstore + print("🗄️ Creating vector database...") + db_path = "chroma_failures_ds" + vectorstore = Chroma.from_documents(documents=chunks, embedding=embedding_model, persist_directory=db_path) + print(f"✅ Vectorstore created with {vectorstore._collection.count()} documents") + + return vectorstore + +def search_documents(vectorstore, query, k=5): + """Search documents with similarity scores - get more results for better filtering""" + try: + results = vectorstore.similarity_search_with_score(query, k=k) + return results + except Exception as e: + print(f"❌ Error searching: {e}") + return [] + +def display_results(results, query): + """Display search results with relevance filtering""" + print(f"\n🔍 Results for: '{query}'") + print("=" * 60) + + if not results: + print("❌ No results found.") + return + + # Filter results by relevance (only show relevant ones) + relevant_results = [] + irrelevant_results = [] + + for doc, score in results: + # Chroma returns a distance (squared L2 by default), so lower score = more similar + # Convert to a relevance score where higher means more relevant + # For unit-length 
embeddings: 0 = identical, 2 = orthogonal (unrelated) + relevance = 1 - (score / 2) # Equals cosine similarity when embeddings are unit-length + + if relevance > 0.3: # Show results with relevance above 0.3 + relevant_results.append((doc, score, relevance)) + else: + irrelevant_results.append((doc, score, relevance)) + + # Show relevant results + if relevant_results: + print(f"\n✅ Relevant Results ({len(relevant_results)} found):") + print("-" * 50) + + # Group results by source to avoid duplicates + seen_sources = set() + unique_results = [] + + for doc, score, relevance in relevant_results: + source = doc.metadata.get('source', 'Unknown') + if source not in seen_sources: + seen_sources.add(source) + unique_results.append((doc, score, relevance)) + + for i, (doc, score, relevance) in enumerate(unique_results, 1): + print(f"\n📄 Result {i} (Relevance: {relevance:.2f})") + print(f"📁 Source: {doc.metadata.get('source', 'Unknown')}") + print(f"📝 Type: {doc.metadata.get('type', 'Unknown')}") + print("-" * 40) + + # Display content - show more content for better context + content = doc.page_content.strip() + if len(content) > 500: # Show more content + content = content[:500] + "..." + + lines = content.split('\n') + for line in lines[:12]: # Show more lines + if line.strip(): + print(f" {line.strip()}") + + if len(lines) > 12: + print(f" ... ({len(lines) - 12} more lines)") + + # Show summary if there were duplicates + if len(relevant_results) > len(unique_results): + print(f"\n💡 Note: {len(relevant_results) - len(unique_results)} duplicate results from same sources were combined.") + + # Show summary of irrelevant results + if irrelevant_results: + print(f"\n⚠️ Low Relevance Results ({len(irrelevant_results)} filtered out):") + print("-" * 50) + print("These results had low similarity to your query and were filtered out.") + + for i, (doc, score, relevance) in enumerate(irrelevant_results[:2], 1): # Show first 2 + source = doc.metadata.get('source', 'Unknown') + print(f" {i}. 
{source} (Relevance: {relevance:.2f})") + + if len(irrelevant_results) > 2: + print(f" ... and {len(irrelevant_results) - 2} more") + + # If no relevant results found + if not relevant_results: + print(f"\n❌ No relevant results found for '{query}'") + print("💡 Your documents contain:") + print(" • Personal CV information") + print(" • Django commands and setup instructions") + print(" • GitHub recovery codes") + print(" • Various PDF documents") + print("\n🔍 Try asking about:") + print(" • Muawiya's personal information") + print(" • Muawiya's skills and experience") + print(" • Django project creation") + print(" • Django commands") + print(" • Virtual environment setup") + +def interactive_query(vectorstore): + """Interactive query interface""" + print("\n🎯 Interactive Query Interface") + print("=" * 50) + print("💡 Example questions:") + print(" • 'Who is Muawiya?'") + print(" • 'What are Muawiya's skills?'") + print(" • 'What is Muawiya's education?'") + print(" • 'How do I create a Django project?'") + print(" • 'What are the Django commands?'") + print(" • 'quit' to exit") + print("=" * 50) + + while True: + try: + query = input("\n❓ Ask a question: ").strip() + + if query.lower() in ['quit', 'exit', 'q']: + print("👋 Goodbye!") + break + + if not query: + print("⚠️ Please enter a question.") + continue + + print(f"\n🔍 Searching for: '{query}'") + results = search_documents(vectorstore, query, k=5) + display_results(results, query) + + except KeyboardInterrupt: + print("\n\n👋 Goodbye!") + break + except Exception as e: + print(f"❌ Error: {e}") + +def main(): + """Main function - everything in one place""" + print("🚀 Simple All-in-One RAG System") + print("=" * 60) + + # Create the RAG system + vectorstore = create_rag_system() + + print(f"\n🎉 RAG system is ready!") + print(f"📁 Database location: chroma_failures_ds") + + # Start interactive interface + interactive_query(vectorstore) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git 
a/week5/day1.ipynb b/week5/day1.ipynb index 416a1a0..c9d82b0 100644 --- a/week5/day1.ipynb +++ b/week5/day1.ipynb @@ -256,7 +256,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week5/day4.5.ipynb b/week5/day4.5.ipynb index 2cad1ed..ea3518c 100644 --- a/week5/day4.5.ipynb +++ b/week5/day4.5.ipynb @@ -27,6 +27,20 @@ "import gradio as gr" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "94a564ed-5cda-42d9-aada-2a5e85d02d15", + "metadata": {}, + "outputs": [], + "source": [ + "# install faiss-cpu!\n", + "# Mac users - this may fail if you don't have a recent version of MacOS\n", + "# In which case I recommend you skip this lab -- FAISS is not essential! (Or upgrade MacOS if you wish..)\n", + "\n", + "!pip install faiss-cpu" + ] + }, { "cell_type": "code", "execution_count": null, @@ -400,7 +414,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week5/day4.ipynb b/week5/day4.ipynb index d12a111..dd67e99 100644 --- a/week5/day4.ipynb +++ b/week5/day4.ipynb @@ -33,7 +33,9 @@ "cell_type": "code", "execution_count": null, "id": "802137aa-8a74-45e0-a487-d1974927d7ca", - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [], "source": [ "# imports for langchain\n", @@ -434,7 +436,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week6/community-contributions/lisekarimi/09_part1_data_curation.ipynb b/week6/community-contributions/lisekarimi/09_part1_data_curation.ipynb new file mode 100644 index 0000000..cee891a --- /dev/null +++ b/week6/community-contributions/lisekarimi/09_part1_data_curation.ipynb @@ -0,0 +1,716 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": 
"40978455-23da-4159-bf08-15d9e8f79984", + "metadata": {}, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 1)\n", + "A complete pipeline from raw text to fine-tuned frontier and open source models\n", + "\n", + "---\n", + "In this project, we aim to **predict item prices based solely on their textual descriptions**. \n", + "\n", + "We approach the problem with a structured 8-part pipeline:\n", + "\n", + "- 🧩 **Part 1: Data Curation & Preprocessing** : We aggregate, clean, analyze, and balance the dataset — then export it in .pkl format and save it in the HuggingFace Hub for the next step: model training and evaluation.\n", + "\n", + "- ⚔️ **Part 2: Traditional ML vs Frontier LLMs** : We compare traditional machine learning models (LR, SVR, XGBoost) using vectorized text inputs (BoW, Word2Vec) against LLMs like GPT-4o, LLaMA, Deepseek ... ❗ Who will predict better: handcrafted features or massive pretraining?\n", + "\n", + "- 🧠 **Part 3: E5 Embeddings & RAG** : We compare XGBoost on **contextual dense embeddings** vs. Word2Vec, and test if **RAG** boosts GPT-4o Mini’s price predictions. 
📦 Do contextual embeddings and retrieval improve price prediction?\n", + "\n", + "- 🔧 **Part 4: Fine-Tuning GPT-4o Mini** : We fine-tune GPT-4o Mini on our curated dataset and compare performance before and after.\n", + "🤖 Can a fine-tuned GPT-4o Mini beat its own zero-shot performance?\n", + "\n", + "- 🦙 **Part 5: Evaluating LLaMA 3.1 8B Quantized** : We run LLaMA 3.1 (8B, quantized) using the same evaluation setup to see how well an open-source base model performs with no fine-tuning.\n", + "\n", + "- ⚙️ **Part 6: Fine-Tuning LLaMA 3.1 with QLoRA** : We fine-tune LLaMA 3.1 using QLoRA and explore key hyperparameters, tracking **training and validation loss** to monitor overfitting and select the best configuration.\n", + "\n", + "- 🧪 **Part 7: Evaluating Fine-Tuned LLaMA 3.1 8B (Quantized)** : After fine-tuning LLaMA 3.1, it's time to evaluate its performance and see how it stacks up against other models. Let's dive into the results.\n", + "\n", + "- 🏆**Part 8: Summary & Leaderboard** : Who comes out on top? Let’s find out. 
We wrap up with final model rankings and key insights across ML, embeddings, RAG, and fine-tuned frontier and open-source models.\n", + "\n", + "---\n", + "- ➡️ Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA \n", + "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "Let’s begin with Part 1.\n", + "\n", + "# 🧩 Part 1: Data Curation & Preprocessing\n", + "\n", + "- Tasks:\n", + " - Load and filter dataset, then prepare each datapoint\n", + " - Explore, visualize, balance price distribution\n", + " - Export .pkl, upload to HF Hub\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ✅ CPU is sufficient — no GPU required\n", + "- 🛠️ Requirements: 🔑 Hugging Face Token\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcf2f470", + "metadata": {}, + "outputs": [], + "source": [ + "!uv pip install transformers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ddbb5eb0-9ab7-4675-b195-0bf4055b9320", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import sys\n", + "import random\n", + "import pickle\n", + "import importlib\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "from datasets import Dataset, DatasetDict\n", + "from collections import Counter, defaultdict\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa916b7a-9044-4461-b29a-815d47973e75", + "metadata": {}, + "outputs": [], + "source": [ + "# import datasets\n", + "# print(datasets.__version__)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "e6cf6e19-1276-4b37-8f9b-6acf1473a7c6", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "\n", + "load_dotenv(override=True)\n", + "hf_token = os.getenv('HF_TOKEN')\n", + "if not hf_token:\n", + " print(\"❌ HF_TOKEN is missing\")\n", + "\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "markdown", + "id": "a1637a14-b2df-4286-a8d6-ddae413f4a8a", + "metadata": {}, + "source": [ + "## ⚙️ Data Loading & Curation (Simultaneously)\n", + "We load and curate the data at the same time using loaders.py and items.py.\n", + "- Datasets come from: https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023/tree/main/raw/meta_categories\n", + "- `loaders.py` handles parallel loading and filtering of products\n", + "- `items.py` defines the Item class to clean, validate, and prepare each datapoint (title, description, price...) for modeling.\n", + "\n", + "\n", + "🛠️ Note: Data is filtered to include items priced between 1 and 999 USD.\n", + "\n", + "💡 Comments have been added in both files to clarify the processing logic.\n", + "\n", + "⚠️ Loading 2.8M+ items can take 40+ mins on a regular laptop.\n", + "\n", + "⚠️ Set WORKER wisely in `loaders.py` to match your system capacity. Too many may crash your machine." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b89273c-e02f-4c15-8394-5d948a266bfc", + "metadata": {}, + "outputs": [], + "source": [ + "sys.path.append('./helpers')\n", + "import helpers.items\n", + "import helpers.loaders\n", + "\n", + "importlib.reload(helpers.items)\n", + "importlib.reload(helpers.loaders)\n", + "\n", + "from helpers.items import Item # noqa: E402\n", + "from helpers.loaders import ItemLoader # noqa: E402" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "260a123b-8f34-4c66-bcac-1c3b25e95d7f", + "metadata": {}, + "outputs": [], + "source": [ + "dataset_names = [\n", + " \"Automotive\",\n", + " \"Electronics\",\n", + " \"Office_Products\",\n", + " \"Tools_and_Home_Improvement\",\n", + " \"Cell_Phones_and_Accessories\",\n", + " \"Toys_and_Games\",\n", + " \"Appliances\",\n", + " \"Musical_Instruments\",\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b482032-cba9-4ee9-9451-9b7dc9f41be6", + "metadata": {}, + "outputs": [], + "source": [ + "items = []\n", + "for dataset_name in dataset_names:\n", + " loader = ItemLoader(dataset_name)\n", + " items.extend(loader.load())\n", + "\n", + "# Now, time for a coffee break!!\n", + "# By the way, the larger datasets are listed first... it speeds up the process." 
+ ] + }, + { + "cell_type": "markdown", + "id": "145d0648-e01d-46b9-ad42-f10b69fccbc3", + "metadata": {}, + "source": [ + "## 🔍 Inspecting a Sample Datapoint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0185985d-5f67-4e4b-ac66-95b5b293231f", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"A grand total of {len(items):,} items\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b0c0ae8-c0ec-4f6f-b847-800da379c01b", + "metadata": {}, + "outputs": [], + "source": [ + "# Investigate the first item from the list\n", + "\n", + "datapoint = items[0]\n", + "\n", + "# Access various attributes\n", + "title = datapoint.title\n", + "details = datapoint.details\n", + "price = datapoint.price\n", + "category = datapoint.category\n", + "\n", + "print(f\"Datapoint: {datapoint}\")\n", + "print('*' * 40)\n", + "print(f\"Title: {title}\")\n", + "print('*' * 40)\n", + "print(f\"Detail: {details}\")\n", + "print('*' * 40)\n", + "print(f\"Price: ${price}\")\n", + "print('*' * 40)\n", + "print(f\"Category: {category}\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e05ed6e4-1cbc-46a4-be2f-4832b99e5ec3", + "metadata": {}, + "outputs": [], + "source": [ + "# The prompt that will be used during training\n", + "print(items[0].prompt)\n", + "print('*' * 40)\n", + "# The prompt that will be used during testing\n", + "print(items[0].test_prompt())" + ] + }, + { + "cell_type": "markdown", + "id": "f66e714d-2bae-458e-a0f6-1ce78d0696b3", + "metadata": {}, + "source": [ + "## 📊 Data Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd50ae2c-b34e-4be7-bd74-62055e4d5b2d", + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(15, 6))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c736b038-2dcd-40b9-8ae9-d17271f1ff81", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of token counts\n", + "\n", + 
"tokens = [item.token_count for item in items]\n", + "plt.title(f\"Token counts: Avg {sum(tokens)/len(tokens):,.1f} and highest {max(tokens):,}\\n\")\n", + "plt.xlabel('Length (tokens)')\n", + "plt.ylabel('Count')\n", + "plt.hist(tokens, rwidth=0.7, color=\"blue\", bins=range(0, 300, 10))\n", + "plt.show()" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "940ba698", + "metadata": {}, + "source": [ + "![image.png](attachment:image.png)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da33633a-7ad5-479c-8dff-f7a7a149d49c", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of prices\n", + "\n", + "prices = [item.price for item in items]\n", + "plt.title(f\"Prices: Avg {sum(prices)/len(prices):,.1f} and highest {max(prices):,}\\n\")\n", + "plt.xlabel('Price ($)')\n", + "plt.ylabel('Count')\n", + "plt.hist(prices, rwidth=0.7, color=\"blueviolet\", bins=range(0, 1000, 10))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d0f494d7-349e-4878-929c-075ac97c6b6d", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of categories\n", + "\n", + "category_counts = Counter()\n", + "for item in items:\n", + " category_counts[item.category]+=1\n", + "\n", + "categories = category_counts.keys()\n", + "counts = [category_counts[category] for category in categories]\n", + "\n", + "# Bar chart by category\n", + "plt.bar(categories, counts, color=\"goldenrod\")\n", + "plt.title('How many items in each category')\n", + "plt.xlabel('Categories')\n", + "plt.ylabel('Count')\n", + "\n", + "plt.xticks(rotation=30, ha='right')\n", + "\n", + "# Add value labels on top of each bar\n", + "for i, v in enumerate(counts):\n", + " plt.text(i, v, f\"{v:,}\", ha='center', va='bottom')\n", + "\n", + "# Display the chart\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d4fe384d-049b-4742-98e5-20d162db5151", + 
"metadata": {}, + "source": [ + "## 🎯 Data Sampling\n", + "\n", + "We sample to keep the dataset balanced but rich:\n", + "- 🎯 Keep all items if price ≥ $240 or group size ≤ 1200\n", + "- 🎯 For large groups, randomly sample 1200 items, favoring rare categories\n", + "\n", + "✅ This keeps valuable high-price items and avoids overrepresented classes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20330037-744d-4834-8ece-413a8dbe2030", + "metadata": {}, + "outputs": [], + "source": [ + "HEAVY_DATASET = \"Automotive\"\n", + "\n", + "# Group items by rounded price\n", + "# Slots is a dictionary where the keys are rounded prices and the values are lists of items that have that rounded price\n", + "slots = defaultdict(list)\n", + "for item in items:\n", + " slots[round(item.price)].append(item)\n", + "\n", + "np.random.seed(42) # Set random seed for reproducibility\n", + "sample = [] # Final collection of items after our sampling process completes\n", + "\n", + "# Sampling loop\n", + "for price, items_at_price in slots.items():\n", + "\n", + " # Take all items if price ≥ 240 or small group\n", + " if price >= 240 or len(items_at_price) <= 1200:\n", + " sample.extend(items_at_price)\n", + "\n", + " # Otherwise sample 1200 items with weights\n", + " else:\n", + "\n", + " # Weight: 1 for the heavy dataset, 5 for others\n", + " weights = [1 if item.category == HEAVY_DATASET else 5 for item in items_at_price]\n", + " weights = np.array(weights) / sum(weights)\n", + "\n", + " indices = np.random.choice(len(items_at_price), 1200, False, weights) # False = don't pick the same index twice\n", + " sample.extend([items_at_price[i] for i in indices])\n", + "\n", + "print(f\"There are {len(sample):,} items in the sample\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21aed337-6f15-48e4-8155-70551ed1d5e0", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of prices in the sample\n", + "\n", + "prices = 
[float(item.price) 
for item in sample]\n", + "plt.title(f\"Avg {sum(prices)/len(prices):.2f} and highest {max(prices):,.2f}\\n\")\n", + "plt.xlabel('Price ($)')\n", + "plt.ylabel('Count')\n", + "plt.hist(prices, rwidth=0.7, color=\"darkblue\", bins=range(0, 1000, 10))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08a7353e-2752-4493-bb0b-6057d1eab16d", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of categories in the sample\n", + "\n", + "category_counts = Counter()\n", + "for item in sample:\n", + " category_counts[item.category]+=1\n", + "\n", + "categories = category_counts.keys()\n", + "counts = [category_counts[category] for category in categories]\n", + "\n", + "# Create bar chart\n", + "plt.bar(categories, counts, color=\"pink\")\n", + "\n", + "# Customize the chart\n", + "plt.title('How many in each category')\n", + "plt.xlabel('Categories')\n", + "plt.ylabel('Count')\n", + "\n", + "plt.xticks(rotation=30, ha='right')\n", + "\n", + "# Add value labels on top of each bar\n", + "for i, v in enumerate(counts):\n", + " plt.text(i, v, f\"{v:,}\", ha='center', va='bottom')\n", + "\n", + "# Display the chart\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9bdb0c58-24e0-4ab5-8a28-2136b53ab915", + "metadata": {}, + "source": [ + "The HEAVY_DATASET category is still in the lead, but the balance has improved somewhat" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ce8ff80-cd19-4c3b-965f-ce6af8ee347d", + "metadata": {}, + "outputs": [], + "source": [ + "# Create pie chart\n", + "\n", + "fig, ax = plt.subplots(figsize=(8, 8))\n", + "wedges, texts, autotexts = ax.pie(\n", + " counts,\n", + " # labels=categories,\n", + " autopct='%1.0f%%',\n", + " startangle=90,\n", + " pctdistance=0.85,\n", + " labeldistance=1.1\n", + ")\n", + "ax.legend(wedges, categories, title=\"Categories\", loc=\"lower center\", bbox_to_anchor=(0.5, 1.15), ncol=3)\n", + "\n", + "# Draw donut center\n", + "centre_circle = 
plt.Circle((0, 0), 0.70, fc='white')\n", + "fig.gca().add_artist(centre_circle)\n", + "\n", + "# Add center label\n", + "ax.text(0, 0, \"Categories\", ha='center', va='center', fontsize=14, fontweight='bold')\n", + "\n", + "# Equal aspect ratio\n", + "plt.axis('equal')\n", + "plt.title(\"Category Distribution\")\n", + "plt.tight_layout()\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acbc6beb-fab4-49ab-bc7e-243638c1fa99", + "metadata": {}, + "outputs": [], + "source": [ + "# How does the price vary with the character count of the prompt?\n", + "\n", + "sizes = [len(item.prompt) for item in sample]\n", + "prices = [item.price for item in sample]\n", + "\n", + "# Create the scatter plot\n", + "plt.scatter(sizes, prices, s=0.2, color=\"red\")\n", + "\n", + "# Add labels and title\n", + "plt.xlabel('Size')\n", + "plt.ylabel('Price')\n", + "plt.title('Is there a simple correlation between prompt length and item price?')\n", + "\n", + "# Display the plot\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76b060a4-0b8d-495c-bb96-28cb7b7ec623", + "metadata": {}, + "source": [ + "There is no strong or simple correlation between prompt length and item price.\n", + "\n", + "In other words, longer prompts don’t clearly mean higher prices, and vice versa." 
+ ] + }, + { + "cell_type": "markdown", + "id": "0f33211c-3548-4a21-990b-21aa55089186", + "metadata": {}, + "source": [ + "## ✅ Final Check Before Training" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be8d0c68-ac6e-4a4d-a6c7-64e9c6763ec4", + "metadata": {}, + "outputs": [], + "source": [ + "# Ensure the price label is correctly placed at the end of the prompt\n", + "\n", + "def report(item):\n", + " prompt = item.prompt\n", + " tokens = Item.tokenizer.encode(item.prompt)\n", + " print(prompt)\n", + " print(tokens[-6:])\n", + " print(Item.tokenizer.batch_decode(tokens[-6:]))\n", + "\n", + "report(sample[50])" + ] + }, + { + "cell_type": "markdown", + "id": "656d523d-8297-4d75-a973-a7e5517d21bc", + "metadata": {}, + "source": [ + "LLaMA and GPT-4o both tokenize numbers from 1 to 999 as a single token, while models like Qwen2, Gemma, and Phi-3 split them into multiple tokens. This helps keep prices compact in our prompts, which is useful for our project, though not strictly required." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e36254ba-d20f-44ad-b991-1f1f3cdc4aaa", + "metadata": {}, + "source": [ + "## 📦 Creating Train/Test Datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cfb5092-c38d-4c14-8dd0-e1d97c06d7f6", + "metadata": {}, + "outputs": [], + "source": [ + "random.seed(42)\n", + "random.shuffle(sample)\n", + "train = sample[:400_000]\n", + "test = sample[400_000:402_000]\n", + "print(f\"Divided into a training set of {len(train):,} items and test set of {len(test):,} items\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f084822-e489-4946-8cf5-f5b0ebd7a23c", + "metadata": {}, + "outputs": [], + "source": [ + "print(train[0].prompt)\n", + "print('*' * 40)\n", + "print(test[0].test_prompt())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d49a08ce-dd41-4af8-82f6-4701628e8152", + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the distribution of prices in the first 250 test points\n", + "\n", + "prices = [float(item.price) for item in test[:250]]\n", + "plt.figure(figsize=(15, 6))\n", + "plt.title(f\"Avg {sum(prices)/len(prices):.2f} and highest {max(prices):,.2f}\\n\")\n", + "plt.xlabel('Price ($)')\n", + "plt.ylabel('Count')\n", + "plt.hist(prices, rwidth=0.7, color=\"darkblue\", bins=range(0, 1000, 10))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c581439-93f2-422a-924f-fd6c58ef8693", + "metadata": {}, + "outputs": [], + "source": [ + "# Extract prompts and prices\n", + "train_prompts = [item.prompt for item in train]\n", + "train_prices = [item.price for item in train]\n", + "test_prompts = [item.test_prompt() for item in test]\n", + "test_prices = [item.price for item in test]\n", + "\n", + "# Create Hugging Face datasets\n", + "train_dataset = Dataset.from_dict({\"text\": train_prompts, \"price\": train_prices})\n", + "test_dataset = Dataset.from_dict({\"text\": test_prompts, \"price\": 
test_prices})\n", + "dataset = DatasetDict({\n", + " \"train\": train_dataset,\n", + " \"test\": test_dataset\n", + "})\n", + "\n", + "# Save full Item objects\n", + "os.makedirs(\"data\", exist_ok=True) # Make sure the folder exists\n", + "\n", + "# Save full Item objects to the folder\n", + "with open('data/train.pkl', 'wb') as file:\n", + " pickle.dump(train, file)\n", + "\n", + "with open('data/test.pkl', 'wb') as file:\n", + " pickle.dump(test, file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3914d029-350e-4140-a31f-e931fa289a41", + "metadata": {}, + "outputs": [], + "source": [ + "# Push to the Hugging Face Hub\n", + "USERNAME = \"lisekarimi\" # 🔧 Replace with your Hugging Face username\n", + "DATASET_NAME = f\"{USERNAME}/pricer-data\"\n", + "\n", + "dataset.push_to_hub(DATASET_NAME, private=True)" + ] + }, + { + "cell_type": "markdown", + "id": "3d8f3b33-41f8-4ee6-96ed-27677ffc8ec4", + "metadata": {}, + "source": [ + "**Note:** \n", + "- The dataset `pricer-data` on Hugging Face only contains `text` and `price`:\n", + "\n", + "\n", + "{\n", + " \"text\": \"How much does this cost...Price is $175.00\",\n", + " \"price\": 175.0\n", + "}\n", + "\n", + "- Full `Item` objects (with metadata) are available in `train.pkl` and `test.pkl`:\n", + "\n", + "Item(data={\n", + " \"title\": str,\n", + " \"description\": list[str],\n", + " \"features\": list[str],\n", + " \"details\": str\n", + "}, price=float)\n", + "\n", + "\n", + "Now, it’s time to move on to **Part 2: Model Benchmarking – Traditional ML vs Frontier LLMs.**\n", + "\n", + "🔜 See you in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part2_tradml_vs_frontier.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week6/community-contributions/lisekarimi/09_part2_tradml_vs_frontier.ipynb b/week6/community-contributions/lisekarimi/09_part2_tradml_vs_frontier.ipynb new file mode 100644 index 0000000..b4a8f14 --- /dev/null +++ b/week6/community-contributions/lisekarimi/09_part2_tradml_vs_frontier.ipynb @@ -0,0 +1,779 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d9b9eaa6-a12f-4cf8-a4c5-e8ac2c15d15b", + "metadata": {}, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 2)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- ➡️ Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA \n", + "- Summary & Leaderboard\n", + "\n", + "--- \n", + "\n", + "# ⚔️ Part 2: Traditional ML vs LLMs\n", + "\n", + "- Tasks:\n", + " - Vectorize text (BoW, Word2Vec)\n", + " - Train SVR, LR, XGBoost models\n", + " - Predict with LLMs (GPT-4o, Claude, LLaMA…)\n", + " - Compare traditional ML vs LLMs\n", + " \n", + "📊 Which model predicts prices best? 
Let’s find out.\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ✅ CPU is sufficient, no GPU required\n", + "- 🛠️ Requirements: 🔑 HF Token, OpenAI API key, Anthropic API key, Groq API key\n", + "\n", + "⚠️ This notebook assumes you're familiar with NLP techniques (e.g., converting text to vectors using Bag-of-Words or Word2Vec) and traditional ML models (like SVR, Linear Regression, XGBoost) along with basic evaluation metrics.\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ce6a892-b357-4132-b9c0-a3142a0244c8", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import importlib\n", + "import re\n", + "import csv\n", + "import tiktoken\n", + "import math\n", + "from datasets import load_dataset\n", + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.feature_extraction.text import CountVectorizer\n", + "from gensim.models import Word2Vec\n", + "from gensim.utils import simple_preprocess\n", + "from sklearn.svm import LinearSVR\n", + "import xgboost as xgb\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from anthropic import Anthropic" + ] + }, + { + "cell_type": "markdown", + "id": "6f82b230-2e03-4b1e-9be5-926fcd19acbe", + "metadata": {}, + "source": [ + "## 📥 Load Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4292a45d", + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55f1495b-f343-4152-8739-3a99f5ac405d", + "metadata": {}, + "outputs": [], + "source": [ + "HF_USER = \"lisekarimi\"\n", + "DATASET_NAME = 
f\"{HF_USER}/pricer-data\"\n", + "\n", + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = dataset['test']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "85880d79-f1ba-4ee8-a039-b6acea84562c", + "metadata": {}, + "outputs": [], + "source": [ + "print(train[0][\"text\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88842541-d73b-4fae-a550-6dedf8fab633", + "metadata": {}, + "outputs": [], + "source": [ + "print(train[0][\"price\"])" + ] + }, + { + "cell_type": "markdown", + "id": "1e3501c5-a52d-4ace-a988-b86b7e7dbb31", + "metadata": {}, + "source": [ + "## 🛠️ Prepare Data for models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a87cd82-127b-4a66-9ad9-90978a2376b5", + "metadata": {}, + "outputs": [], + "source": [ + "def mask_price_value(text):\n", + " return re.sub(r\"(\\n\\nPrice is \\$).*\", r\"\\1\", text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84ad6155-2708-4810-80a6-7efcf3bbd886", + "metadata": {}, + "outputs": [], + "source": [ + "# Extract prices\n", + "prices = np.array([float(datapoint[\"price\"]) for datapoint in train])\n", + "\n", + "# Extract cleaned prompts\n", + "documents = [mask_price_value(datapoint[\"text\"]) for datapoint in train]\n", + "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(42)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1c82371-5e92-4354-a064-38db1b6a8339", + "metadata": {}, + "outputs": [], + "source": [ + "print(documents[0])" + ] + }, + { + "cell_type": "markdown", + "id": "f05dd862-cc64-43d3-a0c3-c3a16d66e1bf", + "metadata": {}, + "source": [ + "## 📊 Model Evaluation with testing.py\n", + "\n", + "- Runs predictions and computes errors on test data\n", + "- Metrics: Absolute error, RMSLE, and hit rate\n", + "- Visual: Scatter plot of predicted vs. 
actual prices (color-coded)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45965754-7107-4023-bb33-81730b73db2e", + "metadata": {}, + "outputs": [], + "source": [ + "import helpers.testing\n", + "importlib.reload(helpers.testing)\n", + "\n", + "from helpers.testing import Tester # noqa: E402\n", + "\n", + "results = {} # Store each model's tester to compare and find the best performer" + ] + }, + { + "cell_type": "markdown", + "id": "2d8b08a8-f0a3-468f-91ea-7da60aecc32a", + "metadata": {}, + "source": [ + "## 🎯 Price Prediction with Traditional ML" + ] + }, + { + "cell_type": "markdown", + "id": "35475efe-0751-443a-9605-89e2025c3eb4", + "metadata": {}, + "source": [ + "## Bag-of-Words + Linear Regression" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ded239d6-4dca-439b-8748-67aa2d2fa2a9", + "metadata": {}, + "outputs": [], + "source": [ + "# Use the CountVectorizer for a Bag of Words model\n", + "vectorizer = CountVectorizer(max_features=1000, stop_words='english')\n", + "X = vectorizer.fit_transform(documents)\n", + "regressor = LinearRegression()\n", + "regressor.fit(X, prices)\n", + "\n", + "def bow_lr_pricer(datapoint):\n", + " x = vectorizer.transform([mask_price_value(datapoint[\"text\"])])\n", + " return max(regressor.predict(x)[0], 0)\n", + "\n", + "tester = Tester(bow_lr_pricer, test)\n", + "tester.run()\n", + "results[\"Bag of Words LR\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "4b861fe5", + "metadata": {}, + "source": [ + "Bow Lr Pricer Error=$121.23 RMSLE=0.98 Hits=27.2%" + ] + }, + { + "cell_type": "markdown", + "id": "25dfc7c6-a258-4b56-8c02-f01003c4674d", + "metadata": {}, + "source": [ + "## Word2Vec + Linear Regression" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "efa22fd1-e81d-4142-b0a1-f1399c7a98a3", + "metadata": {}, + "outputs": [], + "source": [ + "# Preprocess the documents\n", + "processed_docs = [simple_preprocess(doc) for doc in documents]\n", 
+ "\n", + "# Train Word2Vec model\n", + "w2v_model = Word2Vec(sentences=processed_docs, vector_size=400, window=5, min_count=1, workers=4)\n", + "\n", + "# This step of averaging vectors across the document is a weakness in our approach\n", + "\n", + "def document_vector(doc):\n", + " doc_words = simple_preprocess(doc)\n", + " word_vectors = [w2v_model.wv[word] for word in doc_words if word in w2v_model.wv]\n", + " return np.mean(word_vectors, axis=0) if word_vectors else np.zeros(w2v_model.vector_size)\n", + "\n", + "# Create feature matrix\n", + "X_w2v = np.array([document_vector(doc) for doc in documents])\n", + "\n", + "# Run Linear Regression on word2vec\n", + "\n", + "word2vec_lr_regressor = LinearRegression()\n", + "word2vec_lr_regressor.fit(X_w2v, prices)\n", + "\n", + "def word2vec_lr_pricer(datapoint):\n", + " doc = mask_price_value(datapoint[\"text\"])\n", + " vec = document_vector(doc)\n", + " return max(0, word2vec_lr_regressor.predict([vec])[0])\n", + "\n", + "tester = Tester(word2vec_lr_pricer, test)\n", + "tester.run()\n", + "results[\"Word2Vec LR\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "daaf6101", + "metadata": {}, + "source": [ + "Word2Vec Lr Pricer Error=$127.42 RMSLE=0.97 Hits=27.6%" + ] + }, + { + "cell_type": "markdown", + "id": "5f1fe808-f80e-4d15-8ec7-d31710cf68c5", + "metadata": {}, + "source": [ + "## Word2Vec + Linear SVR" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35d01455-6619-4f29-8f95-90c03763407e", + "metadata": {}, + "outputs": [], + "source": [ + "svr_regressor = LinearSVR()\n", + "svr_regressor.fit(X_w2v, prices)\n", + "\n", + "def svr_pricer(datapoint):\n", + " np.random.seed(42)\n", + " doc = mask_price_value(datapoint[\"text\"])\n", + " doc_vector = document_vector(doc)\n", + " return max(float(svr_regressor.predict([doc_vector])[0]),0)\n", + "\n", + "tester = Tester(svr_pricer, test)\n", + "tester.run()\n", + "results[\"Word2Vec SVR\"] = tester" + ] + }, + { + "cell_type": 
"markdown", + "id": "48cb9c88", + "metadata": {}, + "source": [ + "Svr Pricer Error=$124.24 RMSLE=0.98 Hits=28.4%" + ] + }, + { + "cell_type": "markdown", + "id": "469ca205-3e5e-4aca-8b77-53f6acd92e40", + "metadata": {}, + "source": [ + "## Word2Vec + XGBoost " + ] + }, + { + "cell_type": "markdown", + "id": "a55acfe0-9633-45aa-a4c4-96b434a5a43b", + "metadata": {}, + "source": [ + "I initially tried Random Forest, but it struggled with high training time and didn’t scale well with this data.\n", + "That’s why I opted for XGBoost — it’s faster, handles large datasets efficiently, and often delivers better performance on structured data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0e3e1d7-2e62-4866-924e-7ed4483db8bc", + "metadata": {}, + "outputs": [], + "source": [ + "xgb_model = xgb.XGBRegressor(n_estimators=100, random_state=42, n_jobs=-1, verbosity=0)\n", + "xgb_model.fit(X_w2v, prices)\n", + "\n", + "def xgboost_pricer(datapoint):\n", + " doc = mask_price_value(datapoint[\"text\"])\n", + " doc_vector = document_vector(doc)\n", + " return max(0, xgb_model.predict([doc_vector])[0])\n", + "\n", + "tester = Tester(xgboost_pricer, test)\n", + "tester.run()\n", + "results[\"Word2Vec XGBoost\"] = tester\n" + ] + }, + { + "cell_type": "markdown", + "id": "d35050fa", + "metadata": {}, + "source": [ + "Xgboost Pricer Error=$107.97 RMSLE=0.84 Hits=29.2%" + ] + }, + { + "cell_type": "markdown", + "id": "4db1051d-9a7e-4cec-87fc-0d77fd858ced", + "metadata": {}, + "source": [ + "## 🚀 Price Prediction with Frontier LLMs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ef3fa58-87b7-4c30-8088-1a4999f0d25a", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "\n", + "# Get API keys from environment\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if not openai_api_key:\n", + " print(\"❌ OPENAI_API_KEY is missing\")\n", + "\n", + "anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')\n", 
+ "if not anthropic_api_key:\n", + " print(\"❌ ANTHROPIC_API_KEY is missing\")\n", + "\n", + "groq_api_key = os.getenv('GROQ_API_KEY')\n", + "if not groq_api_key:\n", + " print(\"❌ GROQ_API_KEY is missing\")\n", + "\n", + "# Initialize clients\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "claude = Anthropic(api_key=anthropic_api_key)\n", + "groq = OpenAI(api_key=groq_api_key, base_url=\"https://api.groq.com/openai/v1\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d86b3bca-513b-4621-8c66-4b89c134b895", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(datapoint):\n", + " system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n", + " user_prompt = mask_price_value(datapoint[\"text\"]).replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\",\"\")\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $\"}\n", + " ]\n", + "\n", + "messages_for(train[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f502d428-98aa-4160-bebe-726efcce5c65", + "metadata": {}, + "outputs": [], + "source": [ + "# A utility function to extract the price from a string\n", + "\n", + "def get_price(s):\n", + " s = s.replace('$','').replace(',','')\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n", + " return float(match.group()) if match else 0\n", + "\n", + "get_price(\"The price is roughly $99.99 because blah blah\") # Testing" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3845eda0-d37d-4605-a00f-83b1d8fc6945", + "metadata": {}, + "outputs": [], + "source": [ + "# A utility function to Count the tokens before passing the prompt to the model\n", + "\n", + "def count_tokens(messages):\n", + " encoding = tiktoken.get_encoding(\"cl100k_base\")\n", + " token_count = 
sum(len(encoding.encode(message['content'])) for message in messages)\n", + " return token_count\n" + ] + }, + { + "cell_type": "markdown", + "id": "4737e678-5d57-4dee-984b-ae5c56f9542d", + "metadata": {}, + "source": [ + "### gpt-4o-mini" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dca067d0-a4ff-4a48-bb74-d2914f3704b7", + "metadata": {}, + "outputs": [], + "source": [ + "# Count tokens once before running\n", + "total_tokens = 0\n", + "for datapoint in train:\n", + " messages = messages_for(datapoint)\n", + " total_tokens += count_tokens(messages)\n", + "print(f\"Total tokens: {total_tokens}\")\n", + "\n", + "def gpt_4o_mini(datapoint):\n", + " messages = messages_for(datapoint)\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\",\n", + " messages=messages,\n", + " seed=42,\n", + " max_tokens=5\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)\n", + "\n", + "tester = Tester(gpt_4o_mini, test)\n", + "tester.run()\n", + "results[\"gpt 4o mini\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "5c4c8ee4", + "metadata": {}, + "source": [ + "Gpt 4o Mini Error=$99.30 RMSLE=0.75 Hits=44.8%" + ] + }, + { + "cell_type": "markdown", + "id": "00a72937-9cde-472c-bd22-84996a42ab4c", + "metadata": {}, + "source": [ + "### gpt 4o (the big guy 😎)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20d18e1a-ccbf-4481-84ed-16b1c5760176", + "metadata": {}, + "outputs": [], + "source": [ + "def gpt_4o_frontier(datapoint):\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o\",\n", + " messages=messages_for(datapoint),\n", + " seed=42,\n", + " max_tokens=5\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)\n", + "\n", + "tester = Tester(gpt_4o_frontier, test)\n", + "tester.run()\n", + "results[\"gpt 4o (the big guy)\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "0c307928", + 
"metadata": {}, + "source": [ + "Gpt 4O Frontier Error=$87.68 RMSLE=1.01 Hits=51.2%" + ] + }, + { + "cell_type": "markdown", + "id": "20af42a7-8889-4091-bee9-80aeaf63816f", + "metadata": {}, + "source": [ + "### claude 3.7 Sonnet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4e343ef-2097-4395-86b2-90c489f133fe", + "metadata": {}, + "outputs": [], + "source": [ + "def claude_3_point_7_sonnet(datapoint):\n", + " messages = messages_for(datapoint)\n", + " system_message = messages[0]['content']\n", + " messages = messages[1:]\n", + " response = claude.messages.create(\n", + " model=\"claude-3-7-sonnet-20250219\",\n", + " max_tokens=5,\n", + " system=system_message,\n", + " messages=messages\n", + " )\n", + " reply = response.content[0].text\n", + " return get_price(reply)\n", + "\n", + "tester = Tester(claude_3_point_7_sonnet, test)\n", + "tester.run()\n", + "results[\"claude 3.7 sonnet\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "fdbba849", + "metadata": {}, + "source": [ + "Claude 3 Point 7 Sonnet Error=$110.26 RMSLE=0.60 Hits=46.0%" + ] + }, + { + "cell_type": "markdown", + "id": "0ff3a6bd-99b8-438e-abc1-295bf0bb9961", + "metadata": {}, + "source": [ + "### groq model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58a0c852-0811-4156-9c08-fa5bf4b54cd2", + "metadata": {}, + "outputs": [], + "source": [ + "def llama3_groq_pricer(datapoint):\n", + " response = groq.chat.completions.create(\n", + " model=\"llama3-70b-8192\",\n", + " messages=messages_for(datapoint),\n", + " max_tokens=5,\n", + " seed=42\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)\n", + "\n", + "tester = Tester(llama3_groq_pricer, test)\n", + "tester.run()\n", + "results[\"llama3-70b-8192\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "daf7f96c", + "metadata": {}, + "source": [ + "Llama3 Groq Pricer Error=$122.95 RMSLE=0.73 Hits=44.8%" + ] + }, + { + "cell_type": "code", 
+ "execution_count": null, + "id": "b4cd8f25-9a8d-4227-ba58-c163b4d601cb", + "metadata": {}, + "outputs": [], + "source": [ + "def deepseek_qwen_pricer(datapoint):\n", + " response = groq.chat.completions.create(\n", + " model=\"deepseek-r1-distill-qwen-32b\",\n", + " messages=messages_for(datapoint),\n", + " max_tokens=5,\n", + " seed=42\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)\n", + "\n", + "tester = Tester(deepseek_qwen_pricer, test)\n", + "tester.run()\n", + "results[\"deepseek-qwen-32b\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "7280870e", + "metadata": {}, + "source": [ + "Deepseek Qwen Pricer Error=$178.96 RMSLE=0.83 Hits=33.2%" + ] + }, + { + "cell_type": "markdown", + "id": "af7d0190-d89b-4525-8a34-21033e99abb0", + "metadata": {}, + "source": [ + "## 🕵️ Human Judgement Baseline (Ed)\n", + "\n", + "We include a human baseline from our instructor Ed, who manually estimated prices based on item descriptions (💪 thanks Ed for taking on this exhausting task!). This allows us to compare model performance against human intuition." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d63970d-a2d2-4329-8fe7-d0bdc2ff1bcb", + "metadata": {}, + "outputs": [], + "source": [ + "human_predictions = []\n", + "\n", + "with open('data/human_output.csv', 'r', encoding=\"utf-8\") as csvfile:\n", + " reader = csv.reader(csvfile)\n", + " for row in reader:\n", + " human_predictions.append(float(row[1]))\n", + "\n", + "def human_pricer(datapoint):\n", + " # `Tester` runs in order, so use the index from Tester itself\n", + " idx = human_pricer.counter\n", + " human_pricer.counter += 1\n", + " return human_predictions[idx]\n", + "\n", + "human_pricer.counter = 0 # initialize counter\n", + "\n", + "tester = Tester(human_pricer, test)\n", + "tester.run()\n", + "results[\"Human Predictions\"] = tester" + ] + }, + { + "cell_type": "markdown", + "id": "08c0d367-d596-43e6-81af-5889691fa34b", + "metadata": {}, + "source": [ + "## 🥇 Benchmark Showdown: ML, LLMs, and Ed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "830ae9a5-a185-41af-b17f-8d6a3f3752b7", + "metadata": {}, + "outputs": [], + "source": [ + "def truncate(x, decimals=2):\n", + " factor = 10 ** decimals\n", + " return math.floor(x * factor) / factor\n", + "\n", + "df_results = []\n", + "\n", + "for model_name, tester in results.items():\n", + " avg_error = truncate(sum(tester.errors) / tester.size)\n", + " hit_percent = truncate(sum(1 for c in tester.colors if c == \"green\") / tester.size * 100)\n", + " rmsle = truncate(math.sqrt(sum(tester.sles) / tester.size))\n", + "\n", + " df_results.append({\n", + " \"model\": model_name,\n", + " \"avrg_error\": avg_error,\n", + " \"rmsle\": rmsle,\n", + " \"accuracy_%\": hit_percent\n", + " })\n", + "\n", + "df_results = pd.DataFrame(df_results)\n", + "df_results = df_results.sort_values(by=\"avrg_error\")\n", + "\n", + "# Display with .2f formatting\n", + "print(df_results.to_string(index=False, float_format=\"{:.2f}\".format))\n" + ] + }, + { + "attachments": { + 
"image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "e78ddc21-1ffc-431b-902b-4562bdd4e789", + "metadata": {}, + "source": [ + "![image.png](attachment:image.png)\n", + "\n", + "🏁 **GPT-4o, GPT-4o Mini and XGBoost** clearly outperformed both LLMs (like Claude 3.7, LLaMA3-70B, DeepSeek-32B) and traditional ML approaches (LR, SVR).\n", + "\n", + "Now let’s take the top-performing frontier LLM — **GPT-4o Mini** — to test if retrieval (RAG) boosts its performance, and the best ML model — **XGBoost** — to see if contextual embeddings enhance its predictions.\n", + "\n", + "Let’s find out.\n", + "\n", + "🔜 See you in the [next notebook](https://github.com/lisek75/nlp_llms_notebook/blob/main/09_part3_e5embeddings_rag.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week6/community-contributions/lisekarimi/09_part3_e5embeddings_rag.ipynb b/week6/community-contributions/lisekarimi/09_part3_e5embeddings_rag.ipynb new file mode 100644 index 0000000..5e6eea0 --- /dev/null +++ b/week6/community-contributions/lisekarimi/09_part3_e5embeddings_rag.ipynb @@ -0,0 +1,1080 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d9b9eaa6-a12f-4cf8-a4c5-e8ac2c15d15b", + "metadata": { + "id": "d9b9eaa6-a12f-4cf8-a4c5-e8ac2c15d15b" + }, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 3)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- ➡️E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating 
Fine-Tuned LLaMA\n", + "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# 🧠 Part 3: E5 Embeddings & RAG\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ⚠️ GPU required for embeddings (400K items) - use Google Colab\n", + "- 🛠️ Requirements: 🔑 HF Token, OpenAI API Key\n", + "- Tasks:\n", + " - Preprocessed item descriptions\n", + " - Generated and stored embeddings in ChromaDB\n", + " - Trained XGBoost on embeddings, pushed to HF Hub, and ran predictions\n", + " - Predicted prices with GPT-4o Mini using RAG\n", + "\n", + "Is Word2Vec enough for XGBoost, or do contextual E5 embeddings perform better?\n", + "\n", + "Does retrieval improve price prediction for GPT-4o Mini?\n", + "\n", + "Let’s find out.\n", + "\n", + "⚠️ This notebook assumes basic familiarity with RAG and contextual embeddings.\n", + "We use the same E5 embedding space for both XGBoost and GPT-4o Mini with RAG, enabling a fair comparison.\n", + "Embeddings are stored and queried via ChromaDB; no LangChain is used for creation or retrieval.\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8e2af5e-03cc-46dc-8a8b-37cb102d0e92", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "d8e2af5e-03cc-46dc-8a8b-37cb102d0e92", + "outputId": "905907cc-81c5-4a3b-e7c8-9e237e594a09" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q tqdm huggingface_hub numpy sentence-transformers datasets chromadb xgboost" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ce6a892-b357-4132-b9c0-a3142a0244c8", + "metadata": { + "id": "4ce6a892-b357-4132-b9c0-a3142a0244c8" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import math\n", + "import chromadb\n", + "import re\n", + "import joblib\n", + "import os\n", + "from tqdm import tqdm\n", + "import 
gc\n", + "from huggingface_hub import login, HfApi\n", + "import numpy as np\n", + "from sentence_transformers import SentenceTransformer\n", + "from datasets import load_dataset\n", + "from google.colab import userdata\n", + "from xgboost import XGBRegressor\n", + "from openai import OpenAI\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yBH-mvV0QBiw", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yBH-mvV0QBiw", + "outputId": "b4b6df10-dc05-4dbe-dd8b-55bae5a2b7af" + }, + "outputs": [], + "source": [ + "# Mount Google Drive to access persistent storage\n", + "\n", + "from google.colab import drive\n", + "drive.mount('/content/drive')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3OUI1jQYyaeX", + "metadata": { + "id": "3OUI1jQYyaeX" + }, + "outputs": [], + "source": [ + "# Google Colab User Data\n", + "# Ensure you have set the following in your Google Colab environment:\n", + "openai_api_key = userdata.get(\"OPENAI_API_KEY\")\n", + "hf_token = userdata.get('HF_TOKEN')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "99f6f632", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI(api_key=openai_api_key)\n", + "login(hf_token, add_to_git_credential=True)\n", + "\n", + "# Configuration\n", + "ROOT = \"/content/drive/MyDrive/deal_finder\"\n", + "CHROMA_PATH = f\"{ROOT}/chroma\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "FF-HryRnDXm5", + "metadata": { + "id": "FF-HryRnDXm5" + }, + "outputs": [], + "source": [ + "# Helper class for evaluating model predictions\n", + "\n", + "GREEN = \"\\033[92m\"\n", + "YELLOW = \"\\033[93m\"\n", + "RED = \"\\033[91m\"\n", + "RESET = \"\\033[0m\"\n", + "COLOR_MAP = {\"red\":RED, \"orange\": YELLOW, \"green\": GREEN}\n", + "\n", + "class Tester:\n", + "\n", + " def __init__(self, predictor, data, title=None, 
size=250):\n", + " self.predictor = predictor\n", + " self.data = data\n", + " self.title = title or predictor.__name__.replace(\"_\", \" \").title()\n", + " self.size = size\n", + " self.guesses = []\n", + " self.truths = []\n", + " self.errors = []\n", + " self.sles = []\n", + " self.colors = []\n", + "\n", + " def color_for(self, error, truth):\n", + " if error<40 or error/truth < 0.2:\n", + " return \"green\"\n", + " elif error<80 or error/truth < 0.4:\n", + " return \"orange\"\n", + " else:\n", + " return \"red\"\n", + "\n", + " def run_datapoint(self, i):\n", + " datapoint = self.data[i]\n", + " guess = self.predictor(datapoint)\n", + " truth = datapoint[\"price\"]\n", + " error = abs(guess - truth)\n", + " log_error = math.log(truth+1) - math.log(guess+1)\n", + " sle = log_error ** 2\n", + " color = self.color_for(error, truth)\n", + " # title = datapoint[\"text\"].split(\"\\n\\n\")[1][:20] + \"...\"\n", + " self.guesses.append(guess)\n", + " self.truths.append(truth)\n", + " self.errors.append(error)\n", + " self.sles.append(sle)\n", + " self.colors.append(color)\n", + " # print(f\"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}\")\n", + "\n", + " def chart(self, title):\n", + " # max_error = max(self.errors)\n", + " plt.figure(figsize=(12, 8))\n", + " max_val = max(max(self.truths), max(self.guesses))\n", + " plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6)\n", + " plt.scatter(self.truths, self.guesses, s=3, c=self.colors)\n", + " plt.xlabel('Ground Truth')\n", + " plt.ylabel('Model Estimate')\n", + " plt.xlim(0, max_val)\n", + " plt.ylim(0, max_val)\n", + " plt.title(title)\n", + "\n", + " # Add color legend\n", + " from matplotlib.lines import Line2D\n", + " legend_elements = [\n", + " Line2D([0], [0], marker='o', color='w', label='Accurate (green)', markerfacecolor='green', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='Medium 
error (orange)', markerfacecolor='orange', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='High error (red)', markerfacecolor='red', markersize=8)\n", + " ]\n", + " plt.legend(handles=legend_elements, loc='upper right')\n", + "\n", + " plt.show()\n", + "\n", + "\n", + " def report(self):\n", + " average_error = sum(self.errors) / self.size\n", + " rmsle = math.sqrt(sum(self.sles) / self.size)\n", + " hits = sum(1 for color in self.colors if color==\"green\")\n", + " title = f\"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%\"\n", + " self.chart(title)\n", + "\n", + " def run(self):\n", + " self.error = 0\n", + " for i in range(self.size):\n", + " self.run_datapoint(i)\n", + " self.report()\n", + "\n", + " @classmethod\n", + " def test(cls, function, data):\n", + " cls(function, data).run()\n" + ] + }, + { + "cell_type": "markdown", + "id": "6f82b230-2e03-4b1e-9be5-926fcd19acbe", + "metadata": { + "id": "6f82b230-2e03-4b1e-9be5-926fcd19acbe" + }, + "source": [ + "## 📥 Load Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3ae00568", + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55f1495b-f343-4152-8739-3a99f5ac405d", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "6e7c01d666f64fa58d6a059cc8d8f323", + "597b7155767441e6a0283a19edced00f", + "cf1360550eaa49a0867f55db8b8c4c77", + "94f26137cccf47f6a36d9325bc8f5b9c", + "a764b97f3dcd480c8860dde979e5e114", + "f1ec9a46c9ce4e038f3051bbd1b2c661", + "992f46ae91554731987b4baf79ba1bbd", + "b4abe22402fe40fd82b7fe93b4bc06f3", + "57ec058518734e3dbd27324cbba243c0", + "f101230e8a9a431d85ee2f8e51add7ad", + "e196658b093746588113240a60336437", + 
"cb06a4d26cb84c708857b683d1e84c12", + "e82ad07ba22e465cbe0232c504c3b693", + "c4e0ed1165f54393aaec24cd4624d562", + "295a3c6662034aaaab4d2e0192d1d1ce", + "c38aff0c91a849feb547e78156c2c347", + "69647c5595874c3185cebf6813ee908c", + "1036b1af4b154916a3d4f16f5ed799eb", + "e6347ff832cc4c04aef86594ea5a9e64", + "01c63224aa6a4f0c9c88a4d85527e767", + "1db34b9a4f1f42a897345b5a6630ced6", + "9293f2d745024d7facb68e04cc188850", + "26f6ec91efaf42909cec172fafe55987", + "c1131f0324b0498da9bc59720e867eb6", + "3e58017527a04634a489a33ed53fd312", + "06cd89f57d08466c875d179e79e3ecd2", + "2e0aa0aa87a04419a277f303f577f7ff", + "8fa0fe1992db42a997e7cd3ee08bd09e", + "accb1d5142a9498da0117f746fedd691", + "fcc2fc2f82e2441995b9e61b23b9b91e", + "da93fe316dd24cb48538b52ef2eaf6b5", + "5cea58775faf41829c04d2a84e3e2c31", + "1914ec7959d143d09a55da324bbcd47b", + "a3d3504148df46f59b6770fb377e2bb6", + "b088b9a503e24f179741d40d21a730d9", + "b77dcf4632954d0c9c3b6d441c5f684d", + "4cc8b3c4d9934f24a94b4601ab7816b5", + "c093f1c0806a43b79594ddac856a301c", + "9f4d9ac1aa074ed6b0248a4b18fde7db", + "c00785b8fdda409e9cb435abbb0466da", + "612e211af4cd46eb9d2f3148d1c7cb0b", + "86f93c663cc446adbc6366a528cb01b0", + "dd42911451ec48e086c1c99e76492321", + "5b942241f11c4f2ab086f0f289f99a03", + "d28a5c6172f74c0f8bbd2d949455f22e", + "0e67b2055f214eb691b4b54d9431bdd8", + "f81c4dc72b3b4b40a6a70528db732482", + "043a355b6a85471ba0142eb25e2c9eb0", + "8682bfab79a8409499797a3307e4d64d", + "55a837644bb643ac864fa1a674e665c8", + "33aae5a98bf5433b813ff8216e015089", + "56eedfc5ba6642dc8443ab60f5f09b8c", + "a1b710c227a84ea1a55c310084f13a93", + "0d4bc0d0e88a4c77a202f9c11b2ee2a9", + "20858379c2cd45d59070b18149d6e925" + ] + }, + "id": "55f1495b-f343-4152-8739-3a99f5ac405d", + "outputId": "37317fe6-b560-4ad0-c7d6-66517fd67c42" + }, + "outputs": [], + "source": [ + "HF_USER = \"lisekarimi\"\n", + "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", + "\n", + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = 
dataset['test']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "85880d79-f1ba-4ee8-a039-b6acea84562c", + "metadata": { + "id": "85880d79-f1ba-4ee8-a039-b6acea84562c" + }, + "outputs": [], + "source": [ + "print(train[0][\"text\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88842541-d73b-4fae-a550-6dedf8fab633", + "metadata": { + "id": "88842541-d73b-4fae-a550-6dedf8fab633" + }, + "outputs": [], + "source": [ + "print(train[0][\"price\"])" + ] + }, + { + "cell_type": "markdown", + "id": "7b8a9a5b-f74d-487d-a400-d157fea8c979", + "metadata": { + "id": "7b8a9a5b-f74d-487d-a400-d157fea8c979" + }, + "source": [ + "## 📦 Embed + Save Training Data to Chroma\n", + "- No LangChain used.\n", + "- We use `intfloat/e5-small-v2` for embeddings:\n", + " - Fast, high-quality, retrieval-tuned\n", + " - **Requires 'passage:' prefix**\n", + "- We embed item descriptions and store them in ChromaDB, with price saved as metadata." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b95a87a8-2136-4e03-a36c-42e5d53a3e28", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 337, + "referenced_widgets": [ + "8216f5d45e9345e493a43b8cbbe6598a", + "ec3854658f8448fc8463e8635889f700", + "7a90822b2aff4d5cb926442f01a77a9b", + "9518c3af589744cfbbb51f87d68f216e", + "327044765c044384a14be4e660bb152f", + "0b773d68d2394d80a2baf73c1808752a", + "21568b9954c8411d863baa7385df624f", + "0a08828a0ba4430ea6e039949f220b5b", + "3d5a51cfb5f44eecbf80d46e2e4608fd", + "313f059a82104a9394182f6dcdb0bfb4", + "6a625748afc84fe89a8af7a4ef638675", + "ebe43cd30e414f31ab52614c6e9f9f2b", + "88c29992adaa44af857e3216f7e53e60", + "0528af78cef844e8a2b489dcb8fce049", + "8cbccd78a79447158f02caadfa7d805f", + "076ce072490c493ba5b3c431f6166eda", + "dd7780038f8a4cd3837972c78b6583bc", + "9e285e2b58934552b98edd998b82a678", + "338efda3245a4989a9b3ee0795949bb8", + "136dfb68394742ea98d9eb845730846c", + "891d821725b6457c9d06737bf75fe3ed", 
+ "14feb4e20339465d966a6a80504eb819", + "c02b637785324b9eb88e6a2c00cb986b", + "3635da14e6f04e8f90548eb6381290a8", + "1314757f404e47f5b0f6fa4de8537863", + "9e5f2478e931476d882e471c7f66aaeb", + "4ad885d69d9f492c960ca53426189707", + "992d5e88d7844a52a283c0e19475ab78", + "43eaec936c774e3380ae4ff1a823f3dc", + "ceeb11b317ac4d37b59641024f77265f", + "5e0371de53164830b4e8c2b6954b5947", + "63a729492e8a4a759d75b769cbb3e1e7", + "14dde2c87b7b4c9ea16d48732108dcd7", + "f50717b099d142be95390ae8f1e99e6a", + "ffa64c304dab4ef18e9ef50ac1625cd6", + "f358351612004f64adffb931c3130603", + "7593358526ae4a87bf4be0eb1bcfc076", + "51536b45f5674d498272dc7b2def635d", + "8fbe2a3fc07943e7bf0fdc927bab795a", + "6b265cc65d5a42638572c1776faafdb1", + "39fa86a7760d43c793eb8ef27475af7d", + "eee5113e2dd1402faf76d00f07d8e0af", + "6792ed7123724b2d8091bc8d36255e68", + "e35094b24c154340bb1b3ebba7ac0a0d", + "dd63bb6ffed34b6687a0c79d8af93fb7", + "32080bc9381c449ab63794655ec6d714", + "eb7aa289fefc465d98edeed9ce2bff51", + "53fae218b4b74863af5fe53a66a5f7ef", + "35bc6d95c60f4c3d8ddc6b3b0845ff7e", + "f4765ca278ad4da4b465bd2920a21320", + "7ac6ead5baef4f30aff170a30a9a7977", + "e7adb5eb38d54b29b734d207982411c8", + "8f4f51b75af74daa9b9ad6696760109c", + "ae4db932b7544c6cb9ff668fa954addd", + "be63f07eedbd4d46ac4913df45216108", + "2e47d9e7b36a4ec69a9071930671ae8e", + "7b1c7f9bf0e8412abb66bcfc24cf9668", + "5c8742d3f663470e9977d006e83314b7", + "74ec67e07ee0477eb41e21093ae82858", + "4b60a8f023bc4d759bc197b11bf4e160", + "7a090f162fa84568a5e486ba935c3ed1", + "8b650428a6834f5d8ebe62ad327493e0", + "5c4d22bce82546d28a8b0c041895c8e3", + "16121b830a2948afb3ca8eb54e27a678", + "0305a4b4408f4562b87b58098148326d", + "68f07b5b7ad447ce9a87023d872c2e73", + "2156a5ced089414c99a1bb8dd3a0b3b7", + "2e6cd134c70e455a85c47b1575135883", + "f4264985b5cc4a0f970a088fb90b8bcf", + "71d790bf25324e6dbb5372f636c53da9", + "dac3ba29ee4d4083a9abca7eab632534", + "5c75c020a1914da680340fe826f3f58d", + "195e6dfb82c84f0191838acbbfe38126", + 
"b06adcaf8d4c497897ed3625f3afb4eb", + "d4ab3971183a4e8fa10402e3542e6466", + "444ca1f5213241c2bc71fa9ebe9ac3ca", + "34d571f76ef845f4bc272a5e05491c31", + "e8ee76b022d64b2cb24a2cb7b61aeef7", + "8c9ac87788b04ae6899f3b62fdc3ed0d", + "431b638c435444c38e50a09573b8f31b", + "0430f22e24d14171b83261faa090f349", + "0fa5ae935a554461b086a4b81470b9ad", + "f072e665d27e442ab4d0e2eb33c98db9", + "fd3b1885c39c4b70b083d7fddf74d4b6", + "f77051cb151645559223ecf835426688", + "0e17661f878948598703ee7942e5e1a2", + "fca913c6cfff48099d1744d5b091fc46", + "085baf51ecef46318ceafbaba2bb4490", + "52309039c2d8421bbb8e99f63f5ba91f", + "f4233cd960ea4f549734a5b1e1da5e2e", + "42ce1b7765f547cd9ecd8b428ec1c718", + "e72a08514d3b42d2b5fbf87a920bcdf0", + "ad05cf4c0ed44341aa3cd2cbd22b513d", + "db9915d53d784b85accebe1552c4e7e1", + "9519b6d9bf1b45e3b56da4c28d2aeb2e", + "cfeb0597708b49fa9b65342e1ac446ae", + "e29617eff6fd4199a74b670198ba2a69", + "1cea197a15d94654a0e792318435d707", + "89dcb96670a8433593e3452fad3c9210", + "0802085388be453b8fe5edee7e0a01ef", + "1ed257f19b8b44ee85f09e10178ae52f", + "04107981561149cba5baf74ccba87aa6", + "09afb010020e4b2f91d7cdbdca316962", + "b11b51beaa54474cb7682110bd2d24ae", + "47822470ddf842cd9e3368090549a2b5", + "835bce5d87a2417c9b6a5b27627447dc", + "5ca06dd536d44de784984a492d23573f", + "8e75bdb4469e497c8f021ebde7c6c9b3", + "7f4d4f8ece1d4651a2186f10a0cc25a5", + "92036442af5f4b698f2a54ecba4650e2" + ] + }, + "id": "b95a87a8-2136-4e03-a36c-42e5d53a3e28", + "outputId": "6094328e-8c33-4b40-80e9-08c5cfb3e277" + }, + "outputs": [], + "source": [ + "# Load embedding model\n", + "model_embedding = SentenceTransformer(\"intfloat/e5-small-v2\", device='cuda')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "733cf41d-e81e-4cfc-b597-67da02dbc3cf", + "metadata": { + "id": "733cf41d-e81e-4cfc-b597-67da02dbc3cf" + }, + "outputs": [], + "source": [ + "# Init Chroma\n", + "client = chromadb.PersistentClient(path=CHROMA_PATH)\n", + "collection = 
client.get_or_create_collection(name=\"price_items\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f493c7d-1c72-40f9-a5c6-63c7f6b1cf2c", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 91 + }, + "id": "1f493c7d-1c72-40f9-a5c6-63c7f6b1cf2c", + "outputId": "72627732-4eee-4d9a-c8cb-0c42e2541a80" + }, + "outputs": [], + "source": [ + "# Format description function (no price in text)\n", + "def description(item):\n", + " text = item[\"text\"].replace(\"How much does this cost to the nearest dollar?\\n\\n\", \"\")\n", + " text = text.split(\"\\n\\nPrice is $\")[0]\n", + " return f\"passage: {text}\"\n", + "\n", + "description(train[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f44bf613-adf6-4993-bf7b-6aa9fad21a03", + "metadata": { + "id": "f44bf613-adf6-4993-bf7b-6aa9fad21a03" + }, + "outputs": [], + "source": [ + "batch_size = 300 # how many items to insert into Chroma at once\n", + "encode_batch_size = 1024 # how many items to encode at once in GPU memory\n", + "\n", + "for i in tqdm(range(0, len(train), batch_size), desc=\"Processing batches\"):\n", + "\n", + " end_idx = min(i + batch_size, len(train))\n", + "\n", + " # Collect documents and metadata\n", + " documents = [description(train[j]) for j in range(i, end_idx)]\n", + " metadatas = [{\"price\": train[j][\"price\"]} for j in range(i, end_idx)]\n", + " ids = [f\"doc_{j}\" for j in range(i, end_idx)]\n", + "\n", + " # GPU batch encoding\n", + " vectors = model_embedding.encode(\n", + " documents,\n", + " batch_size=encode_batch_size,\n", + " show_progress_bar=False,\n", + " normalize_embeddings=True\n", + " ).tolist()\n", + "\n", + " # Insert into Chroma\n", + " collection.add(\n", + " ids=ids,\n", + " documents=documents,\n", + " embeddings=vectors,\n", + " metadatas=metadatas\n", + " )\n", + "\n", + "print(\"✅ Embedding and storage to ChromaDB completed.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "id": "f2e2ccc9-b772-45f7-8258-cbc4f9c3ed59", + "metadata": {}, + "outputs": [], + "source": [ + "# Now flush and clean\n", + "print(\"🧹 Cleaning up and saving ChromaDB...\")\n", + "client = None\n", + "gc.collect()" + ] + }, + { + "cell_type": "markdown", + "id": "c35d2fab-583f-4527-a7cc-9d31214b2f35", + "metadata": {}, + "source": [ + "Our ChromaDB is currently saved in a persistent Google Drive path; for a production-ready app, we recommend uploading it to AWS S3 for better reliability and scalability.\n", + "\n", + "🧩 Now that we've generated the E5 embeddings, let's use them for both **XGBoost regression** and **GPT-4o Mini with RAG** ." + ] + }, + { + "cell_type": "markdown", + "id": "40e4c587-211d-4bc0-91cf-6267f45405d6", + "metadata": { + "id": "40e4c587-211d-4bc0-91cf-6267f45405d6" + }, + "source": [ + "## 📈 Embedding-Based Regression with XGBoost" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f058ccac-3392-457d-b54c-6471960e9af3", + "metadata": { + "id": "f058ccac-3392-457d-b54c-6471960e9af3" + }, + "outputs": [], + "source": [ + "# Step 1: Load vectors and prices from Chroma\n", + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "prices = [meta['price'] for meta in result['metadatas']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "JYQo0RaMb8Ql", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 254 + }, + "id": "JYQo0RaMb8Ql", + "outputId": "c1641347-1fd4-41bb-e060-147224fc6bed" + }, + "outputs": [], + "source": [ + "# Step 2: Train XGBoost model\n", + "xgb_model = XGBRegressor(n_estimators=100, random_state=42, n_jobs=-1, verbosity=0)\n", + "xgb_model.fit(vectors, prices)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yaqG0z7jb919", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": 
"yaqG0z7jb919", + "outputId": "6a2f9120-97e0-4436-aa12-40d94fbc5c64" + }, + "outputs": [], + "source": [ + "# Step 3: Serialize XGBoost model locally for Hugging Face upload\n", + "MODEL_DIR = os.path.join(ROOT, \"models\")\n", + "MODEL_FILENAME = \"xgboost_model.pkl\"\n", + "LOCAL_MODEL = os.path.join(MODEL_DIR, MODEL_FILENAME)\n", + "\n", + "os.makedirs(MODEL_DIR, exist_ok=True)\n", + "joblib.dump(xgb_model, LOCAL_MODEL)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Z_17sQUdxIr3", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 104, + "referenced_widgets": [ + "2362f3121e5546b98e4623eb3680e96b", + "ef53ee3b68c840d6a3fe98386d26bbd9", + "a4768d0ecdd640a2a5bccd07a93c54b7", + "e177440016974bc699b666fa721c6490", + "2a9d0e5829174b738b4dfea1c71a3481", + "ee6dffc7b79e405d923940166ef10590", + "57bf3388622241869a5e9dab558dca72", + "aa87f4feddd6409fbfb81f417e5d6662", + "973a83ca118e4ed1b5a51821034ecc31", + "d5a3c955aba14b3ea8e9b5c90a3bf20a", + "daaa4f26bad545a394685e266f85a6ae" + ] + }, + "id": "Z_17sQUdxIr3", + "outputId": "68ebdbdb-d42e-4bc8-addc-85b42d418d1d" + }, + "outputs": [], + "source": [ + "# Step 4: Push serialized XGBoost model to Hugging Face Hub\n", + "api = HfApi(token=hf_token)\n", + "REPO_NAME = \"smart-deal-finder-models\"\n", + "REPO_ID = f\"{HF_USER}/{REPO_NAME}\"\n", + "\n", + "# Create the model repo if it doesn't exist\n", + "api.create_repo(repo_id=REPO_ID, repo_type=\"model\", private=True, exist_ok=True)\n", + "\n", + "# Upload the saved model\n", + "api.upload_file(\n", + " path_or_fileobj=LOCAL_MODEL,\n", + " path_in_repo=MODEL_FILENAME,\n", + " repo_id=REPO_ID,\n", + " repo_type=\"model\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f59125d-9fa6-483b-957f-4423a9b2c900", + "metadata": { + "id": "3f59125d-9fa6-483b-957f-4423a9b2c900" + }, + "outputs": [], + "source": [ + "# Step 5: Define the predictor\n", + "def xgb_predictor(datapoint):\n", + " doc 
= description(datapoint)\n", + " vector = model_embedding.encode([doc], normalize_embeddings=True)[0]\n", + " return max(0, xgb_model.predict([vector])[0])" + ] + }, + { + "cell_type": "markdown", + "id": "a890f1f0-d827-472f-a7a9-6c2cbe3d8341", + "metadata": { + "id": "a890f1f0-d827-472f-a7a9-6c2cbe3d8341" + }, + "source": [ + "🔔 Reminder: In Part 2, XGBoost with Word2Vec (non-contextual embeddings) achieved:\n", + "- Avg. Error: ~$107\n", + "- RMSLE: 0.83\n", + "- Accuracy: 29.20%\n", + "\n", + "🧪 Now, let’s see if contextual embeddings improve XGBoost." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "q-tIbVilTPxP", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 718 + }, + "id": "q-tIbVilTPxP", + "outputId": "7c9043ef-a2c4-4933-b334-18d99690ba0f" + }, + "outputs": [], + "source": [ + "# Step 6: Run the Tester on a subset of test data\n", + "tester = Tester(xgb_predictor, test)\n", + "tester.run()" + ] + }, + { + "cell_type": "markdown", + "id": "dcb09db0-7d69-40e1-a6e3-b92263e38f1e", + "metadata": { + "id": "dcb09db0-7d69-40e1-a6e3-b92263e38f1e" + }, + "source": [ + "Xgb Predictor Error=$110.68 RMSLE=0.93 Hits=30.4%" + ] + }, + { + "cell_type": "markdown", + "id": "1ccd5d3f-98cd-45a8-951f-d6446062addc", + "metadata": { + "id": "1ccd5d3f-98cd-45a8-951f-d6446062addc" + }, + "source": [ + "Results are nearly the same: average error (about $107 vs. $110.68) and hit rate (29.2% vs. 30.4%) barely move, while RMSLE is slightly worse (0.83 vs. 0.93). In this setup, switching to contextual embeddings didn’t yield performance gains for XGBoost."
+ ] + }, + { + "cell_type": "markdown", + "id": "4db1051d-9a7e-4cec-87fc-0d77fd858ced", + "metadata": { + "id": "4db1051d-9a7e-4cec-87fc-0d77fd858ced" + }, + "source": [ + "## 🚰 Retrieval-Augmented Pipeline – GPT-4o Mini\n", + "\n", + "- Preprocess: clean the input text (description(item))\n", + "- Embed: generate embedding vector (get_embedding(item))\n", + "- Retrieve: find similar items from ChromaDB (find_similars(item))\n", + "- Build Prompt: create the LLM prompt using context and masked target (build_messages)\n", + "- Predict: get price estimate from LLM (gpt_4o_mini_rag(item))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "YPLxSn7eHp9N", + "metadata": { + "id": "YPLxSn7eHp9N" + }, + "outputs": [], + "source": [ + "test[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eFxFKNroNiyD", + "metadata": { + "id": "eFxFKNroNiyD" + }, + "outputs": [], + "source": [ + "# Step 1: Preprocess test item text\n", + "# (uses the same `description(item)` function as during training)\n", + "description(test[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "lxIEtSWYHqCT", + "metadata": { + "id": "lxIEtSWYHqCT" + }, + "outputs": [], + "source": [ + "# Step 2: Embed a test item\n", + "def get_embedding(item):\n", + " return model_embedding.encode([description(item)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "y43prQsuHp_w", + "metadata": { + "id": "y43prQsuHp_w" + }, + "outputs": [], + "source": [ + "# Step 3: Query Chroma for similar items\n", + "def find_similars(item):\n", + " results = collection.query(query_embeddings=get_embedding(item).astype(float).tolist(), n_results=5)\n", + " documents = results['documents'][0][:]\n", + " prices = [m['price'] for m in results['metadatas'][0][:]]\n", + " return documents, prices" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "nxAOUFRkHp6v", + "metadata": { + "id": "nxAOUFRkHp6v" + }, + "outputs": [], +
"source": [ + "documents, prices = find_similars(test[1])\n", + "documents, prices" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "djPoSk6sHo84", + "metadata": { + "id": "djPoSk6sHo84" + }, + "outputs": [], + "source": [ + "# Step 4: Format similar items as context\n", + "def format_context(similars, prices):\n", + " message = \"To provide some context, here are some other items that might be similar to the item you need to estimate.\\n\\n\"\n", + " for similar, price in zip(similars, prices):\n", + " message += f\"Potentially related product:\\n{similar}\\nPrice is ${price:.2f}\\n\\n\"\n", + " return message" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "F3yxhnqSHp4C", + "metadata": { + "id": "F3yxhnqSHp4C" + }, + "outputs": [], + "source": [ + "print(format_context(documents, prices))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "pEJobsKNHqE8", + "metadata": { + "id": "pEJobsKNHqE8" + }, + "outputs": [], + "source": [ + "# Step 5: Mask the price in the test item\n", + "def mask_price_value(text):\n", + " return re.sub(r\"(\\n\\nPrice is \\$).*\", r\"\\1\", text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "vLhBNVBNQAHS", + "metadata": { + "id": "vLhBNVBNQAHS" + }, + "outputs": [], + "source": [ + "# Step 6: Build LLM messages\n", + "def build_messages(datapoint, similars, prices):\n", + "\n", + " system_message = \"You estimate prices of items. 
Reply only with the price, no explanation.\"\n", + "\n", + " context = format_context(similars, prices)\n", + "\n", + " prompt = mask_price_value(datapoint[\"text\"])\n", + " prompt = prompt.replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\", \"\")\n", + "\n", + " user_prompt = context + \"And now the question for you:\\n\\n\" + prompt\n", + "\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $\"}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "I94fNHfBHp1a", + "metadata": { + "id": "I94fNHfBHp1a" + }, + "outputs": [], + "source": [ + "build_messages(test[1], documents, prices)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5NfY_GAVHpy4", + "metadata": { + "id": "5NfY_GAVHpy4" + }, + "outputs": [], + "source": [ + "# Step 7: Run prediction\n", + "def get_price(s):\n", + " s = s.replace('$','').replace(',','')\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n", + " return float(match.group()) if match else 0\n", + "\n", + "def gpt_4o_mini_rag(item):\n", + " documents, prices = find_similars(item)\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\",\n", + " messages=build_messages(item, documents, prices),\n", + " seed=42,\n", + " max_tokens=5\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Pg-GJTT0HpwV", + "metadata": { + "id": "Pg-GJTT0HpwV" + }, + "outputs": [], + "source": [ + "print(test[1][\"price\"])\n", + "print(gpt_4o_mini_rag(test[1]))" + ] + }, + { + "cell_type": "markdown", + "id": "54103ab4-d6dd-4c0b-add5-5d9741e934b4", + "metadata": { + "id": "54103ab4-d6dd-4c0b-add5-5d9741e934b4" + }, + "source": [ + "🔔 Reminder: In Part 2, GPT-4o Mini (without RAG) achieved:\n", + "- Avg. 
Error: ~$99\n", + "- RMSLE: 0.75\n", + "- Accuracy: 44.8%\n", + "\n", + "🧪 Let’s find out if RAG can boost GPT-4o Mini’s price prediction capabilities.\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "r0NGJupwHppF", + "metadata": { + "id": "r0NGJupwHppF" + }, + "outputs": [], + "source": [ + "Tester.test(gpt_4o_mini_rag, test)" + ] + }, + { + "cell_type": "markdown", + "id": "00545880-d9e1-4934-8008-b62c105d177b", + "metadata": { + "id": "00545880-d9e1-4934-8008-b62c105d177b" + }, + "source": [ + "Gpt 4O Mini Rag Error=$59.54 RMSLE=0.42 Hits=69.2%" + ] + }, + { + "cell_type": "markdown", + "id": "2b9f46ae-92b5-4189-89b0-df88a600bb89", + "metadata": { + "id": "2b9f46ae-92b5-4189-89b0-df88a600bb89" + }, + "source": [ + "🎉 **GPT-4o Mini + RAG shows clear gains:** \n", + "Average error dropped from **$99 → $59.54**, RMSLE from **0.75 → 0.42**, and accuracy rose from **48.8% → 69.2%**. \n", + "\n", + "Adding retrieval-based context led to a strong performance boost for GPT-4o Mini.\n", + "\n", + "Now the question is — can fine-tuning push it even further, surpass RAG, and challenge larger models?\n", + "\n", + "🔜 See you in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part4_ft_gpt4omini.ipynb)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week6/community-contributions/lisekarimi/09_part4_ft_gpt4omini.ipynb b/week6/community-contributions/lisekarimi/09_part4_ft_gpt4omini.ipynb new file mode 100644 index 0000000..84ca7e6 --- /dev/null 
+++ b/week6/community-contributions/lisekarimi/09_part4_ft_gpt4omini.ipynb @@ -0,0 +1,510 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "12934dbc-ff4f-4dfc-8cc1-d92cc8826cf2", + "metadata": {}, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 4)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- ➡️ Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA\n", + "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# 🔧 Part 4: Fine-Tuning GPT-4o Mini\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ✅ CPU is sufficient; no GPU required\n", + "- 🛠️ Requirements: 🔑 HF Token, OpenAI API Key, wandb API Key\n", + "- Tasks:\n", + " - Convert chat data to .jsonl format for OpenAI\n", + " - Fine-tune the model and monitor with Weights & Biases\n", + " - Test the fine-tuned GPT-4o Mini \n", + "\n", + "Can fine-tuning GPT-4o Mini outperform both its zero-shot baseline and RAG-enhanced version?
\n", + "Time to find out.\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5809630f-d3ea-41df-86ec-9cbf59a46f5c", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import importlib\n", + "import json\n", + "import re\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "from datasets import load_dataset\n", + "from openai import OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4120c84d-c310-4d31-9e1f-1549ea4a4186", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if not openai_api_key:\n", + " print(\"❌ OPENAI_API_KEY is missing\")\n", + "\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "hf_token = os.getenv('HF_TOKEN')\n", + "if not hf_token:\n", + " print(\"❌ HF_TOKEN is missing\")\n", + "\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "markdown", + "id": "31d3aa97-68a8-4f71-a43f-107f7c8553c5", + "metadata": {}, + "source": [ + "## 📥 Load Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2bae96a", + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c45e23d6-1304-4859-81f0-35a9ddf1c755", + "metadata": {}, + "outputs": [], + "source": [ + "HF_USER = \"lisekarimi\"\n", + "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", + "\n", + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = dataset['test']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "667adda8-add8-41b6-9e60-7870bad20c02", + 
"metadata": {}, + "outputs": [], + "source": [ + "test[0]" + ] + }, + { + "cell_type": "markdown", + "id": "b85d86d0-b6b1-49cd-9ef0-9214c1267199", + "metadata": {}, + "source": [ + "## 🛠️ Step 1 : Data Preparation" + ] + }, + { + "cell_type": "markdown", + "id": "d3ba760d-467a-4cd9-8d3f-e6ce84273610", + "metadata": {}, + "source": [ + "To fine-tune GPT-4o-mini, OpenAI requires training data in **.jsonl format**. \n", + "\n", + "`make_jsonl` converts our chat data :\n", + "\n", + "from \n", + "\n", + "[\n", + " {\"role\": \"system\", \"content\": \"You estimate prices of items. Reply only with the price, no explanation\"},\n", + " {\"role\": \"user\", \"content\": \"How much is this laptop worth?\"},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $999.00\"}\n", + "]\n", + "\n", + "into the .jsonl format \n", + "\n", + "{\"messages\": [{\"role\": \"system\", \"content\": \"You estimate prices of items. Reply only with the price, no explanation\"}, {\"role\": \"user\", \"content\": \"How much is this laptop worth?\"}, {\"role\": \"assistant\", \"content\": \"Price is $999.00\"}]}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ec254755-67f6-4676-b67f-c1376ea00124", + "metadata": {}, + "outputs": [], + "source": [ + "# Mask the price in the test item\n", + "def mask_price_value(text):\n", + " return re.sub(r\"(\\n\\nPrice is \\$).*\", r\"\\1\", text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5e51957-b0ec-49f9-ae70-74771a101756", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(datapoint):\n", + " system_message = \"You estimate prices of items. 
Reply only with the price, no explanation\"\n", + " user_prompt = mask_price_value(datapoint[\"text\"]).replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\",\"\")\n", + " assistant_response = f\"Price is ${datapoint['price']:.2f}\"\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": assistant_response}\n", + " ]\n", + "\n", + "messages_for(train[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03583d32-b0f2-44c0-820e-62c8e7e48247", + "metadata": {}, + "outputs": [], + "source": [ + "def make_jsonl(datapoints):\n", + " result = \"\"\n", + " for datapoint in datapoints:\n", + " messages = messages_for(datapoint)\n", + " messages_str = json.dumps(messages, ensure_ascii=False)\n", + " result += '{\"messages\": ' + messages_str + '}\\n'\n", + " return result.strip()\n", + "\n", + "make_jsonl(train.select([0]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36c9cf60-0bcb-44cb-8df6-ff2ed4110cd2", + "metadata": {}, + "outputs": [], + "source": [ + "ft_train = train.select(range(100))\n", + "ft_validation = train.select(range(100, 150))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "494eaecd-ae5d-4396-b694-6faf88fb7fd6", + "metadata": {}, + "outputs": [], + "source": [ + "# Convert the items into jsonl and write them to a file\n", + "\n", + "def write_jsonl(datapoints, filename):\n", + " with open(filename, \"w\", encoding=\"utf-8\") as f:\n", + " jsonl = make_jsonl(datapoints)\n", + " f.write(jsonl)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae42986d-ab02-4a11-aa0c-ede9c63ec7a2", + "metadata": {}, + "outputs": [], + "source": [ + "write_jsonl(ft_train, \"data/ft_train.jsonl\")\n", + "write_jsonl(ft_validation, \"data/ft_val.jsonl\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"b9bed22d-73ad-4820-a983-cbdccd8dbbc8", + "metadata": {}, + "outputs": [], + "source": [ + "with open(\"data/ft_train.jsonl\", \"rb\") as f:\n", + " train_file = openai.files.create(file=f, purpose=\"fine-tune\")\n", + "with open(\"data/ft_val.jsonl\", \"rb\") as f:\n", + " validation_file = openai.files.create(file=f, purpose=\"fine-tune\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e6c6ce8-6600-4068-9ec5-32c6428ce9ea", + "metadata": {}, + "outputs": [], + "source": [ + "train_file" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26943fad-4301-4bb4-97e8-be52a9743322", + "metadata": {}, + "outputs": [], + "source": [ + "validation_file" + ] + }, + { + "cell_type": "markdown", + "id": "edb0a3ec-1607-4c5b-ab06-852f951cae8b", + "metadata": {}, + "source": [ + "## 🚀 Step 2: Run Fine-Tuning & Monitor with wandb\n", + "We will use https://wandb.ai to monitor the training runs\n", + "\n", + "1- Create an API key in wandb\n", + "\n", + "2- Add this key in OpenAI dashboard https://platform.openai.com/account/organization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59f552fe-5e80-4742-94a8-5492556a6543", + "metadata": {}, + "outputs": [], + "source": [ + "wandb_integration = {\"type\": \"wandb\", \"wandb\": {\"project\": \"gpt-pricer\"}}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "144088d7-7c30-439a-9282-1e6096c181ea", + "metadata": {}, + "outputs": [], + "source": [ + "# Run the fine tuning\n", + "\n", + "openai.fine_tuning.jobs.create(\n", + " training_file=train_file.id,\n", + " validation_file=validation_file.id,\n", + " model=\"gpt-4o-mini-2024-07-18\",\n", + " seed=42,\n", + " hyperparameters={\"n_epochs\": 1},\n", + " integrations = [wandb_integration],\n", + " suffix=\"pricer\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "330e75f5-0208-4c74-8dd3-07bc06047b2e", + "metadata": {}, + "outputs": [], + "source": [ + "job_id = 
openai.fine_tuning.jobs.list(limit=1).data[0].id\n", + "job_id\n", + "\n", + "# Then check your wandb dashboard to view the run of this job ID" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a92dac5-e6d8-439c-b55e-507becb37a6c", + "metadata": {}, + "outputs": [], + "source": [ + "# Use this call to track the fine-tuning progress\n", + "\n", + "openai.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=2).data" + ] + }, + { + "cell_type": "markdown", + "id": "b6b65677-06b2-47d3-b0e6-51210a3d832b", + "metadata": {}, + "source": [ + "# 📧 You’ll get an email once fine-tuning is complete. ☕ You can take a break until then. ▶️ Once you receive it, run the cells below to continue." + ] + }, + { + "cell_type": "markdown", + "id": "0a7af4be-0b55-4654-af7a-f47485babc52", + "metadata": {}, + "source": [ + "## Step 3: Test the fine-tuned model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8497eb8-49ee-4a05-9e51-fc1b4b2b41d4", + "metadata": {}, + "outputs": [], + "source": [ + "ft_model_name = openai.fine_tuning.jobs.retrieve(job_id).fine_tuned_model\n", + "ft_model_name" + ] + }, + { + "cell_type": "markdown", + "id": "12bed33f-be31-4d7c-8651-3f267c529304", + "metadata": {}, + "source": [ + "You can find the entire fine-tuning process in the **Fine-tuning** dashboard on OpenAI.\n", + "\n", + "![Fine-tuning Process](https://github.com/lisekarimi/lexo/blob/main/assets/09_ft_gpt4omini.png?raw=true)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac6a89ef-f982-457a-bad7-bd84b6132a07", + "metadata": {}, + "outputs": [], + "source": [ + "# Build LLM messages\n", + "def build_messages(datapoint):\n", + " system_message = \"You estimate prices of items. 
Reply only with the price, no explanation\"\n", + " user_prompt = mask_price_value(datapoint[\"text\"]).replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\",\"\")\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $\"}\n", + " ]\n", + "\n", + "def get_price(s):\n", + " s = s.replace('$','').replace(',','')\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n", + " return float(match.group()) if match else 0\n", + "\n", + "def gpt_ft(datapoint):\n", + " response = openai.chat.completions.create(\n", + " model=ft_model_name,\n", + " messages=build_messages(datapoint),\n", + " seed=42,\n", + " max_tokens=7\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93a93017-458c-4769-b81c-b2dad2af7552", + "metadata": {}, + "outputs": [], + "source": [ + "print(test[0][\"price\"])\n", + "print(gpt_ft(test[0]))" + ] + }, + { + "cell_type": "markdown", + "id": "87a5ad10-ed60-4533-ad61-225ceb847e6c", + "metadata": {}, + "source": [ + "🔔 **Reminder:** \n", + "- In **Part 2**, GPT-4o Mini (zero-shot) scored: \n", + " Avg. Error: ~$99 | RMSLE: 0.75 | Accuracy: 44.8% \n", + "\n", + "- In **Part 3**, with **RAG**, performance improved to: \n", + " Avg. Error: ~$59.54 | RMSLE: 0.42 | Accuracy: 69.2%\n", + "\n", + "🧪 **Now it’s time to see** if fine-tuning can push GPT-4o Mini even further and outperform both baselines." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0adf1500-9cc7-491a-9ea6-88932af85dca", + "metadata": {}, + "outputs": [], + "source": [ + "import helpers.testing\n", + "importlib.reload(helpers.testing)\n", + "\n", + "from helpers.testing import Tester # noqa: E402\n", + "\n", + "tester = Tester(gpt_ft, test)\n", + "tester.run()" + ] + }, + { + "cell_type": "markdown", + "id": "37439666", + "metadata": {}, + "source": [ + "Gpt Ft Error=$129.16 RMSLE=0.94 Hits=35.2%" + ] + }, + { + "cell_type": "markdown", + "id": "5487da30-e1a8-4db5-bf17-80bc4f109524", + "metadata": {}, + "source": [ + "**Fine-tuning GPT-4o Mini led to worse performance than both its zero-shot and RAG-enhanced versions.**\n", + "\n", + "⚠️ When Fine-Tuning Isn’t Needed:\n", + "- For tasks like price prediction, GPT-4o performs well with prompting alone — thanks to strong pretraining and generalization.\n", + "- 💡 Fine-tuning isn’t always better. Use it when prompting fails — not by default.\n", + "\n", + "✅ **When Fine-Tuning Is Worth It (based on OpenAI’s own guidelines)**\n", + "- Custom tone/style – e.g., mimicking a brand voice or writing like a specific author\n", + "- More consistent output – e.g., always following a strict format\n", + "- Fix prompt failures – e.g., when multi-step instructions get ignored\n", + "- Handle edge cases – e.g., rare product types or weird inputs\n", + "- Teach new tasks – e.g., estimating prices in a custom format no model has seen before\n", + "\n", + "---\n", + "\n", + "Now that we’ve explored both frontier closed-source models and traditional ML, it’s time to turn to open-source.\n", + "\n", + "🚀 **Next up: Fine-tuned LLaMA 3.1 8B (quantized)** — can it beat its base version, outperform GPT-4o Mini, or even challenge the big players?\n", + "\n", + "🔍 Let’s find out in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part5_llama31_8b_quant.ipynb)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": 
".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week6/community-contributions/lisekarimi/data/human_output.csv b/week6/community-contributions/lisekarimi/data/human_output.csv new file mode 100644 index 0000000..e516273 --- /dev/null +++ b/week6/community-contributions/lisekarimi/data/human_output.csv @@ -0,0 +1,1500 @@ +"How much does this cost to the nearest dollar? + +OEM AC Compressor w/A/C Repair Kit For Ford F150 F-150 V8 & Lincoln Mark LT 2007 2008 - BuyAutoParts NEW +As one of the world's largest automotive parts suppliers, our parts are trusted every day by mechanics and vehicle owners worldwide. This A/C Compressor and Components Kit is manufactured and tested to the strictest OE standards for unparalleled performance. Built for trouble-free ownership and 100% visually inspected and quality tested, this A/C Compressor and Components Kit is backed by our 100% satisfaction guarantee. Guaranteed Exact Fit for easy installation 100% BRAND NEW, premium ISO/TS 16949 quality - tested to meet or exceed OEM specifications Engineered for superior durability, backed by industry-leading unlimited-mileage warranty Included in this K + +Price is $",120 +"How much does this cost to the nearest dollar? 
+ +Motorcraft YB3125 Fan Clutch +Motorcraft YB3125 Fan Clutch Package Dimensions 25.146 cms (L) x 20.066 cms (W) x 15.494 cms (H) Package Quantity 1 Product Type Auto Part Country Of Origin China Manufacturer Motorcraft, Brand Motorcraft, Model Fan Clutch, Weight 5 pounds, Dimensions 10 x 7.63 x 6.25 inches, Country of Origin China, model number Exterior Painted, Manufacturer Part Rank Automotive Automotive Replacement Engine Fan Clutches 583, Domestic Shipping can be shipped within U.S., International Shipping This item can be shipped to select countries outside of the U.S. Learn More, Available October 10, 2007 + +Price is $",80 +"How much does this cost to the nearest dollar? + +Dorman Front Washer Fluid Reservoir Compatible with Select Ford/Lincoln/Mercury Models +This washer fluid reservoir is designed to match the fit and function of the original equipment reservoir. It is engineered to withstand the stresses of underhood heat and engine vibration on specified vehicle makes, models, and years. This part is compatible with the following vehicles. Before purchasing, enter your vehicle trim in the garage tool to confirm fitment. Ford Explorer 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 - Lincoln Aviator 2003, 2004, 2005 - Mercury Mountaineer 2002, 2003, 2004, 2005, + +Price is $",35 +"How much does this cost to the nearest dollar? + +HP Premium HD Plus Touchscreen 1TB HDD 2.3GHz AMD Ryzen 5, (12GB RAM, Ryzen 5 4500U, DVD Writer, Windows 10 Home) Natural Silver, (Renewed) +This pre-owned or refurbished product has been professionally inspected and tested to work and look like new. How a product becomes part of Amazon Renewed, your destination for pre-owned, refurbished products A customer buys a new product and returns it or trades it in for a newer or different model. That product is inspected and tested to work and look like new by Amazon-qualified suppliers. Then, the product is sold as an Amazon Renewed product on Amazon. 
If not satisfied with the purchase, renewed products are eligible for replacement or refund under the Amazon Renewed Guarantee. + +Price is $",350 +"How much does this cost to the nearest dollar? + +Super Switch Pickup Selector Super Switch 4-Pole Double Wafer for Strat/Nashville Tele Guitars with Black/Ivory/White Tips +Dopro Super Switch Pickup Selector Super Switch 4-Pole Double Wafer for Strat/Nashville Tele Guitars with Black/Ivory/White Tips Package includes 3 free tips which normally sold separately Five-position blade pickup selector switch ideal for four-conductor pickups. Used on American made Fat Strat and Double Fat Strat models, and on Nashville Telecaster models. Mounting screws included. Mounting screws and 35mm cavity depth required 1-5/8 standard mounting screw spacing Please consult your local Luthier if you don't know how to install the switch Dimensions 5.47 x 4.29 x 0 + +Price is $",75 +"How much does this cost to the nearest dollar? + +Horror Bookmarks, Resin Horror Bookmarks for Adults, The Best Gift for Fans of Horror Novels, Horror Personalized Bookmarks for Men Women, Horror Figures Sculpture Bookmarks (Set) +Horror Bookmarks - The Best Gift for Fans of Horror Novels, Resin Horror Bookmarks for Adults, Half-Length Figure Sculpture Bookmarks, Horror Bookmark Set for Men Women Office Supplies Specification Upper Part Material Resin Lower Part Of Material Wood Character Size 3.3cm x 2cm / x Overall Size 10cm / 4inch Package Includes 1 / 6 x Horror bookmarks - the best gift for fans of horror novels Note 1. Due to manual measurements, please allow slight measurement deviations. 2. Due to the different display and lighting effects, the + +Price is $",12 +"How much does this cost to the nearest dollar? + +SK6241 - Stinger 4 Gauge 6000 Series Power Amplifier Installation Kit +Amplifier installation kits can save you time & money when installing an amplifier in your vehicle. 
Instead of purchasing everything you need separately like power/ground cables, remote turn-on cable, fuse blocks, fuses, and more, you can get an amp kit that already has the cables & accessories in one package. Amplifier wiring kits come in various gauge configurations depending on the wattage of your sound system and include all the necessary components you need to successfully install an amplifier. This Stinger 4 AWG 6000 Series Power Amplifier Wiring Kit includes all the necessary hardware and wire to power one amplifier and is designed specifically for car audio systems up to 1750 watts. The included premium power + +Price is $",115 +"How much does this cost to the nearest dollar? + +Godox ML60Bi LED Light Kit, Handheld LED Video Light, Bi-Color LED Light, CRI 96+ TLCI 97+, 7 FX Effects, Slient Mode, with Softbox, RC-A6 Remote Control, 2X NP970 Lithium Battery +🌸Breathtakingly powerful and portable 10100 Lux @ 1 m with the included reflector. Weighing just 0.77 kg (lightweight housing only), the ML60Bi is extremely easy to hold and carry. In relation to its small size, it produces an impressively high output and thus offers you more flexible creativity for your productions. 🌸Variable color temperature The wide color temperature range from 2800K to 6500K allows you to quickly + +Price is $",100 +"How much does this cost to the nearest dollar? + +Randall G3 Plus Combo Guitar Amp +The Randall G3 is a combo amplifier that brings even more intensity and aggressiveness to your guitar playing with a redesigned preamp circuit, a 12 speaker, and digital effects. Randall took the high gain tone circuit of their flagship V2 and T2 guitar amplifiers and installed it in each of the G3 Plus amps. What you get is higher professional level tone and performance without the costs of a high-end amp. 
EQ Controls - Bass, Middle, Treble, Sweep, Voicing Master Control - Volume Power Output - 100 Watts @ 4 Ohms Single Speaker - 12 Celestion Seventy 80 2 Mode - Weight 51 Pounds, Dimensions 13.5 x 27 x 20.5 inches, + +Price is $",300 +"How much does this cost to the nearest dollar? + +HOLDWILL 6 Pack LED Shop Light, 4FT 24W 6500K, Cool White, Clear Cover, Hight Output, Linkable 4 Foot LED Strip Lights, T8 LED Tube Lights, LED Light Fixture for Garage Ceiling with Plug Cable +Specification Length Power 24W Lumen 2640lm CCT 6500K Beam Angle 120 degrees Luminous Flux 110lm per watt LED chips 120pcs Color Rendering Index(CRI) 85 Operation Temperature °F Lamp Power Factor 0.95 Input Voltage Material Top quality Aluminum & superior PC BRIGHT - HOLDWILL shop light bring your application the best lighting available with an incredible minimum of Illuminate the dark corners of your space SAVING - Each led under cabinet light has 24w + +Price is $",80 +"How much does this cost to the nearest dollar? + +Viking Horns 3 Gallon Air Tank and 200 PSI Air Compressor Kit, for Train Air Horn +New 3 gallon (12 liter) Air Tank & 200 PSI H.D Air Compressor Kit. For high pressure air horns systems that require an on-board air system. Air tank made from heavy gauge steel for outdoor use. Compact design and weight. Comes as a kit, ready to install, with 240 psi Air Gauge, 200 psi Air Pressure Switch, Compression Fittings for 1/4 O.D air hose, air pressure release safety valve. Comes with a 200 PSI Heavy Duty Air Compressor with a 1/4 inch NPT braided air hose, and Mounting Hardware. Both, Air Tank & Compressor Kit come complete ready for installation + +Price is $",90 +"How much does this cost to the nearest dollar? + +CURT 70110 Custom Tow Bar Base Plate Brackets for Dinghy Towing, Fits Select Jeep Wrangler JK +CUSTOM FIT. This tow bar base plate is uniquely engineered to fit select years of the Jeep Wrangler JK. 
It bolts onto the frame, providing a reliable connection for flat towing (not compatible with universal tow bar) HIGHLY VERSATILE. Add versatile towing options to your vehicle with this class 1 trailer hitch. It provides a standard, square receiver hitch and comes with a ball mount with a ball hole to connect a small trailer SOLID STRENGTH. CURT towing base plates and tow bar brackets are constructed from high-strength steel and welded together with precision for dependable towing strength DOUBLE FINISH. This tow plate is finished in a tough shield of dual-co + +Price is $",120 +"How much does this cost to the nearest dollar? + +Solar HAMMERED BRONZE Finish Post Deck Fence Cap Lights for 4 X 4 WOOD Post With White LEDs and Vertical-lined Clear Lens -GREEN NATURAL SOLAR +During the day, these energy-efficient Atlantic Solars lights harness sunlight energy to charge their internal Lithium Battery. At night they automatically turn on, emitting a bright glow for your fence post line. Each Atlantic Solars light is made of Premium Heat-Resistant Plastic for years of use. With our Newest & Improved Solar Panels, a 3.2V Lithium Ion Battery, and 5 Ultra Bright White LEDs, its output is virtually unparalleled. Atlantic Solars 4 x 4 Fence Post Caps One Pre-Installed 3.2V Lithium-Ion Battery for each light Light source + +Price is $",300 +"How much does this cost to the nearest dollar? + +COSTWAY Electric Tumble Dryer, Sliver +This is our brand new compact dryer with 10 lbs. cloth capacity, it will be your best helper to dry your cloth or sheet in a short time, It has four mode air dry, cool, warm, and hot. You can choose the drying time or mode according to the material and weight of the cloth. This dryer combine the cooling, wrinkle, freshening, function, without taking up a lot of room will bring much convenience for your life. Don't hesitate to buy one! feature brand new and high quality 1. 5 cu. Ft. Capacity allows you to dry up to 10 lbs. 
Of clothing stainless steel tub provides durability see- through window lets you monitor clothes as they dry four mode air dry + +Price is $",450 +"How much does this cost to the nearest dollar? + +FREE SIGNAL TV Transit 32 12 Volt DC Powered LED Flat Screen HDTV for RV Camper and Mobile Use +Mobile High Performance 32 inch LED TV - Get HD picture quality from this superb DC-powered 12 volt television with 1366 x 768 resolution. Groundbreaking engineering results in a lightweight TV with dynamic audio response and advanced noise reduction circuitry. The Easy-to-Set-Up and Versatile RV Flat Screen TV - Simple 12V connection. 3 HDMI Inputs. Can also be used at home by converting to AC with the included 1260 Power Brick Adapter. Perfect Television for Campers, Trailers, RVs, and More - The Transit 32 inch 12 volt powered flat screen TV is also ideal for cars and trucks. With high-resolution picture from a + +Price is $",280 +"How much does this cost to the nearest dollar? + +Bilstein 5100 Monotube Gas Shock Set compatible with Jeep Cherokee XJ w/2-3 Lift +Bilstein 5100 shocks utilize a monotube design, with a 46mm digressive piston, to increase road handling characteristics both on and off-road. The single tube body allows for rapid heat transfer between the shock oil to the shock body, and then dissipated further to reduce shock fade. German engineering combined with US manufacturing ensures optimum performance and longevity for upwards of 100k miles (real customer feedback!). The 5100 series is more compliant in tougher situations and off-road terrain compared to the 4600 series. Factory spec vehicles should use the 4600 series shocks as the valving is more appropriate for daily driven vehicles and pavement environments. Part Numbers + +Price is $",140 +"How much does this cost to the nearest dollar? 
+ +Sangean K-200 Multi-Function Upright AM/FM Digital Radio (Pink) +Product Description The K-200 from SANGEAN brings sleekness and uniqueness into a multi-media entertainment unit including the features from traditional alarm clock. The versatile and unique omni-directional speaker design fills your work place with high-performance sound quality that plays your music with crystal clear digital sound and deep bass for more powerful overall sound. The eye-catching night light with 8 brightness settings definitely gives a little brightness to your counter. From the Manufacturer Sangean's new sleek and unique K-200 AM / FM-RBDS Digital Tuning Kitchen Radio brings a multi-media entertainment center to your kitchen combines the features of a traditional alarm clock. The versatile and unique omni-directional speaker design fills your + +Price is $",19.99 +"How much does this cost to the nearest dollar? + +Charles Leonard Magnetic Lapboard Class Combo Pack, Includes 12 Each Plain/Plain 9 x 12 Inch White Boards, 2-Inch Erasers, Black Dry Erase Markers +Class pack provides an environmentally friendly way for children to learn. Dry-erase surface lets students write, wipe off and reuse with no waste. Classroom set includes magnetic lap boards that are dual-sided; dry-erase markers; and multipurpose erasers. ALL INCLUSIVE set has everything you need for the classroom! 12 Each Double Sided Magnetic Plain/Plain 9x12 lap boards. Individual dry erase boards are large enough to work on for math problems, English language arts, Penmanship or for drawing. Just write and wipe, it’s that easy SAFETY FIRST each set also includes 30 + +Price is $",85 +"How much does this cost to the nearest dollar? 
+ +Gigabyte AMD Radeon HD 7870 2 GB GDDR5 Mini-Displayport PCI-Express 3.0 Graphic Card +Powered by AMD Radeon HD 7870 GPU and Integrated with the industry's best 2 GB GDDR5 memory and memory interface Ultra Durable VGA Components - GPU Temperature 5%-10% Down - Overclocing Capability Up - Power Switching Loss Down WINDFORCE 3X Anti-Turbulence Cooling with New Triangle Cool Technology Gold Plated HDMI for optimum signal transfer between connections Features mini-Display port outputs with HDCP protection Supports AMD Eyefinity/Eyespeed/CrossFire/Avivo HD Technologies Minimum Recommended Power Supply 500W or greater with 2x 6-pin VGA power connectors Max Screen Resolution 4096 x + +Price is $",450 +"How much does this cost to the nearest dollar? + +3dRose LLC 8 x 8 x 0.25 Inches Bull Terrier on Zebra Pattern Mouse Pad +Bull Terrier On Zebra Mouse Pad is 8 x 8 x.25 and is made of heavy-duty recycled rubber. Matte finish image will not fade or peel. Machine washable using a mild detergent and air dry. Dimensions (in inches) 8 W x 8 H x 0.25 D Matte finish Soft to touch, will not crack or peel Clean with mild detergent Made of heavy-duty recycled rubber Manufacturer 3D Rose (Home Improvement), Brand Weight 4.9 ounces, Dimensions 8 x 0.25 x 8 inches, model number Shape Square, Material Type Rubber, s 1, Size 8\ x + +Price is $",7 +"How much does this cost to the nearest dollar? 
+ +ROKINON 85mm F1.4 Auto Focus Full Frame Weather Sealed High Speed Telephoto Lens for Nikon F Mount +Dslr camera lens For Nikon F Mount full-frame & APS-C DSLR cameras Aperture range f/1.4 to f/16 Ultra multi-coated Optics; weather-sealed Takes front filter size of 77mm Constructed of 9 elements in 7 groups Dimensions 2.9 x 3.2 x 3.2 inches, Weight 1.06 pounds, model number Rank SLR Camera Lenses 1944, Available April 18, 2019, Manufacturer Rokinon, Country of Origin Korea, Republic of, Brand Rokinon, Focal Length Description 85mm, Lens Type Telephoto + +Price is $",320 +"How much does this cost to the nearest dollar? + +Headlight Assembly Compatible with 2012 2013 2014 2015 Civic Sedan 4-Door 12 13 Civic Coupe 2-Door Black Housing Amber Reflector +Vehicle compatibility headlights assembly compatible with Civic Sedan 4-Door / Civic Coupe high beam mode 9005 and low beam mode 9006; bulbs are not included Waterproof fully sealed with solid silicon & designed with one-way vents to prevent moisture from being trapped inside the housing, no corrosion or moisture worries in sorts of weather conditions Brighter lighting the metallic parabolic reflectors provide more light output to create a broader and smoother beam Safety reflector the sided micro-prism reflector makes the oncoming cars or passerby quickly notice you at night, ensuring your driving safety and others' Impact resistance the + +Price is $",110 +"How much does this cost to the nearest dollar? + +ASI NAUTICAL 2.5 Inches Opera Glasses Binoculars for Adults with Handle- Captain's Mother of Pearl Solid Brass Opera Glasses Binoculars-Pocket Size Handel Binoculars for Kids, Bird Watching, Hunting +Thanks for Visiting Best Antique & Nautical Items store at amazon ASI NAUTICAL This 2.5 Captains Solid Brass Black & White Binocular By ASI NAUTICAL made with Antique Finish is best suitable to gift your loved ones. 
Easy to carry to any outdoor place like when you go to watch football match,Any historical Monuments,Visit Mountains,hills, for Hunting,Birds Watching,etc. Material Brass Magnification 10x Approx. Total Weight 0.19 kg Approx. Binocular Size 2.5 Inches, + +Price is $",65 +"How much does this cost to the nearest dollar? + +Behringer TUBE OVERDRIVE TO100 Authentic Tube-Sound Overdrive Effects Pedal +BEHRINGER TUBE OVERDRIVE TO100 Authentic Tube-Sound Overdrive Effects Pedal Get tube-like distortion, smooth sustain and super fat tone Get tube-like distortion, smooth sustain and super fat tone This BEHRINGER product has been designed to compete head to head with leading products on the market This BEHRINGER product has been designed to compete head to head with leading products on the market Captures every nuance of your playing from smooth overdrive to screaming tube sounds Captures every nuance of your playing from smooth overdrive to screaming tube sounds Dedicated Drive, Tone and Level controls for awesome sound shaping Dedicated Drive, Tone and Level controls for awesome sound shaping Status LED for effect on/off and + +Price is $",185 +"How much does this cost to the nearest dollar? + +Fun Express Insect Finger Puppets - 24 finger puppet bugs for kids +You'll get an assortment of 24 insect finger puppets including bees, butterflies, ladybugs, dragonflies, and grasshoppers. Each plastic bug finger puppet is made of quality vinyl and measures 1 3/4 - 2 3/4. Plastic bugs and insects for kids make a fun and creative gift and can be used as party favors or decorations for your bug themed party! Adult supervision recommended for children under 3 years as small parts could be a choking hazard. 24 Insect Finger puppets for toddlers and children. Each finger puppet is made of vinyl and measures 1 3/4 - 2 3/4. You'll get an assortment of toy bugs and + +Price is $",6 +"How much does this cost to the nearest dollar? 
+ +WAFJAMF Roller Stamp Identity Theft Stamp Perfect for Privacy Protection(Blue) +IDENTITY THEFT PROTECTION SOLUTION Paper can be recycled after using roller stamp, no need for a shredder. WIDE COVERAGE DESIGN The 1.26 inches wide roller is perfect for covering large swaths of private information in a quick and clean way. SAVE TIME Ink quickly dries. Stamp works well on all regular paper, envelopes and package addresses. One swipe and the info is covered, no need to go for a shredder. UNIQUE DESIGN FOR PRIVACY PROTECTION Compact design with CONFIDENTIAL letters, specially designed to obscure the text underneath it. Total length of stamp coverage can reach 50 meters. Dimensions 2.72 x 1.89 x 1.38 inches, Weight + +Price is $",33 +"How much does this cost to the nearest dollar? + +Capulina Tiffany Floor Lamp 16 Wide Stained Glass Dragonfly Antique Style Standing Reading Light for Living Room Bedroom +Size and Weight lamp shade 16 inches wide and lamp post 63 inches Height;product total weight is 18lbs And the base heavy is 6.6lbs to ensure stability Real tiffany lamp shade lamp shade is handmade by skilled craftsmen,Each small piece of stained glass of the lampshade is spliced by copper foil method,never fade color,durable and beautiful Bulb Matching We can use Incandescent or CFL bulbs (bulbs are not included),different bulbs get different looking,recommanding to use Edison LED bulb. Decor living room bedroom When you light bulb up,tiffany lamp shade eallows the light to filter in but also softens the heat and + +Price is $",65 +"How much does this cost to the nearest dollar? + +Apple Watch Series 6 (GPS, 44mm) - Space Gray Aluminum Case with Black Sport Band (Renewed Premium) +Apple Watch Series 6 (GPS, 44mm) - Space Gray Aluminum Case with Black Sport Band LEAVE YOUR PHONE IN YOUR POCKET Apple Watch Series 6 GPS Model lets you call, text, and get directions from your wrist, while leaving your phone in your pocket. 
It offers multiple connectivity options, including Bluetooth, Wi-Fi, and NFC to suit all of your possible needs. ALWAYS-ON RETINA DISPLAY You no longer need to raise your wrist or touch the screen to see the time or other information on your watch face, because the display never sleeps. All you need to do is glance to find the time or your workout metrics right there where you + +Price is $",199 +"How much does this cost to the nearest dollar? + +ICON 01725 Tandem Axle Fender Skirt FS1724 for KZ - Cobalt Blue +Tandem axle fender skirt measures to 65-1/4 x 14. Constructed of durable high-impact ABS plastic. This replacement fender skirt is textured. The legs of these fender skirts curve underneath the trailer. Durable, high-impact ABS plastic Textured finish Quick and simple installation Color Cobalt Blue Size 65-1/4 x 14 Brand ICON, Color Regular, Exterior Finish Smooth, Material Acrylonitrile Butadiene Styrene, Dimensions LxWxH 14.5 x 5 x 2.9 inches, Style Modern, Auto Part Position Lower, Vehicle Service Type Trailer, Fit Type Universal + +Price is $",310 +"How much does this cost to the nearest dollar? + +SanDisk 128GB Ultra (10 Pack) MicroSD Class 10 Micro SDXC Memory Card for Smartphone Bundle with (1) GoRAM Reader 10 Pack) +Shoot and save more high-quality photos and full HD video on your Android smartphone or tablet with SanDisk Ultra microSD UHS-I cards. With storage capacities up to 128GB, they're the ideal complement for Android smartphones and tablets. And the SanDisk memory zone app, available on the Google play store, makes it easy to view, access, and back up all of your files from your phones memory in one convenient place. To help your smartphone run at its peak performance, This app can be set to automatically off-load files from your smartphones internal memory to your memory card. Bundle Includes (10x) 128GB + +Price is $",180 +"How much does this cost to the nearest dollar? 
+ +Velvac - 715427 +2020 Mirror System, 2003 & Newer Ford E-Series Cutaway Standard Head, Black, Htd Remote Flat Glass, Wedge Convex, 102 Body Width, Left Side 2020 System, Ford E, 102 Body, Black, Left Side Htd Remote Flat Glass, Wedge Convex, Standard Head Model 2020 mirrors are designed specifically for wide body applications such as high cube cut away vans, rental trucks, Class C RV's and ambulances. The fixed length arms are designed to position the mirror beyond the body providing the driver with an unobstructed view of blind spots and passing lanes around the vehicle. These versatile mirrors are available in several body widths and finishes as well as manual or heated remote glass + +Price is $",110 +"How much does this cost to the nearest dollar? + +TCMT Passenger Backrest Sissy Bar & Luggage Rack Fits For Indian Scout Scout Sixty Scout ABS 2020 Scout 100th Anniversary Scout Sixty ABS +An Indian Scout passenger will ride with greater comfort and confidence with this Quick Release Passenger Sissy Bar. This sturdy backrest can be installed quickly and easily without tools once a set of Mounting Spools are installed on the Motorcycle. The Passenger Backrest locks securely onto the Mounting Spools and provides the passenger with strong, stable support and comfort. The backrest must be equipped with a Genuine Leather Backrest Pad. To add convenient cargo space, an accessory Chrome Backrest Luggage Rack can be added to the backrest. Fitment Fit For 2020 Scout 100th Anniversary Fit For Scout Sixty ABS Fit For + +Price is $",85 +"How much does this cost to the nearest dollar? 
+ +Alnicov 63.5MM Brass Tremolo Block,Tremolo System Bridge,With Bar Block For Fender Strat Stratocaster Bridge +Description 1.Fits bridges with 2-1/16 E to E string pacing and 6 screw modifications install (detailed instructions included) Fits for MIM Fender Standard Series StratAmerican Special StratMIM Classic PlayerClassic Vibe StratAny Import Strat with 6 screw pivot mounting and 2 1/16 string spacing Specifications Tremlo block block dia size brassWeight 248g Package included 1Pcs tremolo bar wrench Durable Electric Guitar Bridge Tremolo Block High quality, easy to handle Sustain your guitar bridge Add mass and sustain to your bridge along with the tonal qualities of brass Since this block is larger + +Price is $",65 +"How much does this cost to the nearest dollar? + +Subaru Forester Outback Legacy OEM Engine Block Heater Genuine new +Manufactured from top quality components, this is your inexpensive replacement option for your rebuild, repair, and maintenance needs. When you select a genuine OEM part - you can rely on the high quality and effectiveness of the product and brand without having to guess if the product will work in sequence with your vehicle. Protecting your investment is important and choosing the right parts can be challenging. Stick with what you know and choose a genuine OEM part. Genuine Subaru Genuine Engine Block Heater Warms engine coolant to promote easier starting in extreme cold conditions. Plugs into a a household electrical outlet. Crosstrek Hybrid models Forester Outback 2.5 Legacy 2.5 Manufacturer Subaru, Brand Subaru, Weight 1 pounds, Dimensions + +Price is $",350 +"How much does this cost to the nearest dollar? + +Richmond Auto Upholstery - 2012 Dodge Ram 3500 Laramie Crew-Cab - Driver Side Bottom Replacement Perforated Leather Seat Cover Dark Gray +Our OEM replacement leather seat covers are guaranteed to match your vehicles interior! 
Richmond Auto Upholstery has been manufacturing automotive seat covers for over 30 years and only specializing in original factory replacement leather covers & much more! If you cannot find what you need for your Dodge Ram then please give us a call at (281) with your vehicles information!To ensure you receive the correct cover, please send us your VIN Number & TRIM (Interior Trim) Code during check out or email it to us after you have made your purchase. If we do not receive this information within 24 hours we will send an email requesting the information + +Price is $",260 +"How much does this cost to the nearest dollar? + +AP-39 Automotive Paint Primer Grey 2K Urethane Gallon Kit Normal Activator +Automotive paint primer sealer applied as a high build sanding primer or final non-sanding primer sealer. Compatible with AF 970 Black Automotive Base coat certified to be among the Deepest Black Base coats in the market. Not for sale in California, Delaware, and Maryland. Direct to Metal Excellent Filling Properties; Superior Color Holdout Easy Spray and Sanding Shipped by UPS ground only. No overnight shipping. The material is considered hazardous and cannot be returned. Not for sale in California, Delaware, and Maryland. Brand enenfeifei, Color Grey, Size 2 Piece Set, Volume 1 Gallons, Special Feature Not for sale in California, Delaware, and Maryland + +Price is $",200 +"How much does this cost to the nearest dollar? + +Road Top Wireless Carplay Retrofit Kit Decoder for BMW i3 I01 NBT System Year, Support Android Auto, Mirrorlink, Reverse Camera, Original Car Knob Control +Pre-shopping Notes When you buy, please check our website picture to make sure your car system is right. This Wireless Carplay Fits for BMW i3 I01 NBT System Not fit for EVO system. Wireless/Wired Apple Carplay It can work with Siri/ Maps/ Music/ Phone Call. 
Built-in mic for Siri function and Bluetooth call, use Maps(Support Google Waze and sygic map, etc), listen to your favorite songs using iTunes, Apple Music or other app and access to messages. Keep your original car knob and steering wheel control. Wireless/Wired Android Auto Use wireless or wired connection ( + +Price is $",95 +"How much does this cost to the nearest dollar? + +Gibson Performance Exhaust 5658 Aluminized Dual Extreme Cat-Back Performance Exhaust System +For the extremist who wants to take their truck to the next level, this dual bolt-on Cat back system is for you. This system exists behind the rear tires at an aggressive angle with a powerful exhaust tone. You will gain bold street looks with powerful dyno tuned and tested street performance gains. You can expect to experience gains on average of 15-20 horsepower. Gibson muffler provides a mean performance sound and complemented with polished Stainless Steel Tip. Easy bolt-on installation. No welding required. Backed by a Lifetime Limited Warranty. If you want Extreme, this system is it. 3 inch aluminized mandrel bent tubing Gibson muffler features a baffled and chambered design, + +Price is $",499 +"How much does this cost to the nearest dollar? + +Bella Tunno Happy Links - Baby Montessori Silicone Links & Soft Silicone Baby Toys, Developmental Toys for Playing, Teething, Gross Motor Skills, Color Recognition & More, Navy, SL07 +Introducing your new favorite product. Is it a toy? A teether? A link? Yes, it is. It’s pretty much magic what our Happy Links can do. We took the classic link and made it into the product we wish we had for our littles. Generously sized, easy to grip and wrapped in food-grade silicone, our links relieve little gums, keep toys and lovies attached and keeps kids entertained. Our Happy Links set includes 5 links for teething playing and organizing. 
Attach them to the stroller, rocker, play gym, or carrier to bring + +Price is $",18 +"How much does this cost to the nearest dollar? + +CANMORE H300 Handheld GPS Golf Device, Shot Distance Yardage Measuring, 40000+ Free Worldwide Preloaded Courses, Lightweight Golf Accessory for Golfers, Powerful Magnetic Clip for Golf Cart, Orange +WORLDWIDE COURSE DATA - Free course data preloaded for over 40,000 (and counting) golf courses around the world (NO subscription fees) - Contact Canmore to add new courses or suggest fixes. ***Notice Golf courses may change layout over time, when detected, the device will display “Incorrect Hole” and require course update. Please visit the CANMORE website for updates and new course information. ESSENTIAL FUNCTIONS YOU NEED - Manage your game, not your golf assistant! GPS course finder switches hole automatically and gives you easy-to-access distance to green ( + +Price is $",299 +"How much does this cost to the nearest dollar? + +DCPOWER AC Adapter Compatible Replacement for KORG PS60 PS-60 61-Key Portable Performance Synthesizer +New aftermarket, custom-made item (NON-OEM/NON-Original Equipment Manufacturer). Auto-Switching adapter can be used in the worldwide. Returns accepted within 30 Days. Quantity 1 unit of adapter. Connector type Round Barrel/Round tip Can be used to power up the device Input AC for using in the worldwide Output 9V DC Dimensions 3 x 2 x 1.5 inches, Weight 6 Ounces, Rank Musical Instruments Keyboard Power Supplies 5234, Is Discontinued No, Available October 24, 2013, Manufacturer DCPOWER, Brand Generic, Connector Type barrel connector, Special Feature Portable, Input Voltage 240 Volts, + +Price is $",88 +"How much does this cost to the nearest dollar? + +Sharp, Commercial Desktop Calculator, LCD +Resume function lets you recall data after shut-off. Extra-large digits for excellent readability. Dual solar/battery power for use in any lighting. Resume function lets you recall data after shut-off. 
Extra-large digits for excellent readability. Dual solar/battery power for use in any lighting. Dimensions 7.2 x 5.1 x 1 inches, Weight 6.4 ounces, model number Batteries 1 CR2 batteries required. (included), Rank Office Products Basic Office Calculators 1027, Is Discontinued No, Available December 27, 2004, Manufacturer SHARP ELECTRONICS, Brand Sharp, Color Black, Calculator Type Business, Power Source Battery Powered, Batteries 1 CR2 batteries required. (included) + +Price is $",32 +"How much does this cost to the nearest dollar? + +Melissa & Doug Lifelike Plush Stork Giant Standing Stuffed Animal (3+ Feet Tall) +This lifelike plush stork really delivers! A terrific way to welcome a new baby and a great companion for years to come, this striking silky white stork with black wingtips and realistic details is sure to turn heads. Standing an impressive three-plus feet tall, this lifelike bird’s soft, squeezable body covered with silky feathers encourage hugs and cuddles, while quality construction and a strong interior structure keep it standing proudly for years to come. The included baby bib The stork wears proclaims “welcome baby”. the stories long bright orange legs stand on an oval two-foot-long base for extra stability. Kids’ imaginations are sure to take flight with this beautiful feathered + +Price is $",25 +"How much does this cost to the nearest dollar? + +Sony SSCS8 2-Way Center Channel Speaker with Bookshelf Speaker System and Subwoofer Bundle (3 Items) +Equipped with two 4 woofers and a 1 tweeter, the Sony SS-CS8 2-Way Center Channel Speaker handles 145W of peak power. The speaker's woofers use a mica-reinforced diaphragm, the upper surface of which is fashioned to deliver supple and faithful sound quality, while the bottom layer is designed to provide a powerful bass response. The cabinet of the SS-CS8 is made from wood, which is designed to provide a natural resonance, and its bass reflex construction will give directionality to the low frequencies. 
The speaker's crossover network is mounted directly to the cabinet for vibration isolation, which is intended + +Price is $",95 +"How much does this cost to the nearest dollar? + +ASUS Chromebook CX1, 14 Full HD NanoEdge Display, Intel Celeron N3350 Processor, 64GB eMMC, 4GB RAM, Chrome OS, Transparent Silver, +ASUS Chromebook CX1400 is made for boosting productivity and having more fun while on the move — all day, every day. This lightweight, ultraportable device is powered by Intel processor and gives you the freedom of up to battery life. The slim-bezel design fits more screen into the compact chassis for easy multitasking and incredibly immersive entertainment, and the device is your gateway to the best of Google, including the rich library of apps for work or play on the Google Play Store. With speedy performance, robust security and intuitive features, ASUS Chromebook CX1 is ideal for anyone + +Price is $",440 +"How much does this cost to the nearest dollar? + +FiiO X7 32GB Hi-Res Lossless Music Player, Titanium +FiiO X7 High Resolution Audio Player FiiO X7 High Resolution Audio Player- Currently supports Music Player function only. DAC and other features will be available through future firmware upgrade Dimensions 2.52 x 5.12 x 0.65 inches, Weight 7.8 ounces, model number FIIO X7, Rank Electronics MP3 & MP4 Players 2510, Is Discontinued No, OS Android 4.4.4, RAM 32 GB, Connectivity technologies Aux, Special features Hi Res Audio, Other display features Wireless, Color Titanium Blue, Manufacturer FiiO, Available November 30, 2015, Brand FiiO, Model Name X7, + +Price is $",60 +"How much does this cost to the nearest dollar? + +TORRO Leather Case Compatible with iPhone 14 – Genuine Leather Wallet Case/Cover with Card Holder and Stand Function (Red) +COMPATIBILITY – The TORRO leather iPhone 14 case with card holder is designed and crafted exclusively for iPhone 14. 
The precision fit ensures full, unrestricted access to the screen, camera, buttons and charging port. GENUINE LEATHER - TORRO are a UK company specialising in luxury leather goods handcrafted from premium cowhide leather. The top-grain leather used is sourced from the finest tanneries in the US and undergoes minimal treatment in order to preserve the natural properties and appearance of TORRO luxurious leathers. SHOCKPROOF – The folio case features a unique TORRO durable TPU frame that has been formulated to aid shock absorption, + +Price is $",45 +"How much does this cost to the nearest dollar? + +Universal Air Conditioner KT 1031 A/C Compressor and Component Kit +UAC A/C Compressor and Component Kit Brand New, OE replacement UAC branded Compressor Kit 100% Guaranteed Fit! Add your car (year/make/model) to Amazon's garage to confirm Premium ISO/TS 16949 quality; tested to meet or exceed OEM specifications Includes compressor & clutch, drier / accumulator, expansion device, 8oz bottle of PAG oil, seal kit; compressor may come charged with shipping oil to keep the part lubricated during transit - drain and replace according to your system's requirements Product is backed by industry leading warranty Manufacturer UAC, Brand UAC, Model KT 1031, Weight 17.9 pounds, Dimensions 17 x 16 x 12 + +Price is $",65 +"How much does this cost to the nearest dollar? + +Street Series Stainless Performance Cat-Back Exhaust system +Made in the USA and engineered to last, for those seeking increased performance and better economy, MagnaFlow MF Series Performance Exhaust systems deliver the smooth deep sound you want and the wide-open performance power you need. Our exhaust systems feature straight-through flow designs for the ultimate in unrestricted horsepower and torque for big power while maintaining exhaust efficiency. These systems are an engineered balance of interior and exterior noise levels and are tested against SAE j1169 standards. 
great quality and sound Manufacturer MagnaFlow, Brand MagnaFlow Exhaust Products, Model 17870, Weight 25 pounds, Dimensions 58.75 x 13.75 x 19 inches, model number 17870, Exterior Machined, Manufacturer Part 17870, Rank Automotive Automotive Replacement + +Price is $",260 +"How much does this cost to the nearest dollar? + +Lenovo IdeaPad 3 Laptop, FHD (1920 x 1080) AMD Ryzen 5 3500U 8GB DDR4 RAM, 256GB SSD, AMD Radeon Vega 8 Graphics Windows 10, Abyss Blue (Renewed) +14 FHD TN Anti-glare, Ryzen 5 3500U Mobile Processor - 3.80 GHz) 256GB SSD, 8GB DDR4 SDRAM 180 degree hinge, WiFi and Bluetooth 5.0 720p HD Webcam with Dolby Audio dual speakers, 4-in-1 Media Card Reader 2 x USB 3.1 | 1 x USB 2.0 | 1 x HDMI | headphones, Windows 10 in S mode Brand Lenovo, Model Name Lenovo Ide + +Price is $",360 +"How much does this cost to the nearest dollar? + +Access Bed Covers TonnoSport - Roll-Up Tonneau Cover - Compatible with Toyota Tundra 6ft. 6in. Bed (w/o Deck Rail) +Tonneau Cover TONNOSPORT Roll-Up Cover TONNOSPORT Roll-Up Cover; Roll-Up; Without Deck Rail;FEATURES Gives You A Sleek Low Profile Look Gives You A Sleek Low Profile Look Compatible With Bed Rails/Bed Caps/Tailgate Protector Compatible With Bed Rails/Bed Caps/Tailgate Protector Quick Clamp On Installation Quick Clamp On Installation Lockable/Protects Your Cargo Lockable/Protects Your Cargo Complete Bed Usage When Open Complete Bed Usage When Open No Need To Remove No Need To Remove Tailgate Stays Operational Tailgate Stays Operational 2 Year Warranty 2 Year Warranty + +Price is $",55 +"How much does this cost to the nearest dollar? + +G.I. JOE Hasbro 3 3/4 Wave 5 Action Figure SGT. Flash (Laser Rifle Trooper) +SGT. FLASH is highly skilled in many aspects of electronic technology and is capable of equipment repair in the field. His specialized education includes electronics school, chemical school, and covert electronics. He is a qualified expert with the M-16, and (shoulder laser rifle). 
Celebrate 25 years of the ultimate action team with this articulated action figure! Display your action figure on the included display base! Figure also comes with a weapon! Twenty-fifth anniversary action figure has detailed styling and comes with a weapon and a display base! Ages 5 and up. Dimensions 5.12 x 1.57 x 5.51 inches, Weight + +Price is $",29 +"How much does this cost to the nearest dollar? + +T&S Brass Double Pantry Faucet, Wall Mount, 8 Centers, 6 Swing Built in Stops +T&S Brass 8 Wall Mount Mixing Faucet, Eterna Cartridges, Lever Handles, 6 Swing Nozzle, Built-In Stops & 1/2 NPT Female Inlets. Package Dimensions 9 L x 4 H x 14 W (inches) Package Weight 5.11 pounds Country of Origin United States Part Number Brand T&S Brass, Mounting Type Wall Mount, Finish Type Polished, Color Brass, Handles 1, Included Components Nozzle, Instruction Manual, Handle Type Lever, Installation Type Single Hole, Dimensions LxWxH 13.3 x 8.8 x 3.7 inches, Handle Material Brass, + +Price is $",65 +"How much does this cost to the nearest dollar? + +ZTUOAUMA Fuel Injection Pump Compatible with Cummins Engine M11 N14 QSM11 ISM11 +Part Number Application Models Compatible with Cummins Diesel Engine M11 N14 QSM11 ISM11 Note Please verify the part number and the detailed parts on pumps between our pictures before buying Warranty Returnable for 6 Months and Changeable for 1 Year (return and change for free) Direct replacement with strict and full test in factory to ensure the long durable service life Brand ZTUOAUMA, Fit Type Vehicle Specific Fit, Vehicle Service Type Truck, Style Fashion, Auto Part Position Rear, Gas Type Diesel, Operation Mode Mechanical, Manufacturer zt truck parts, Weight 11.51 pounds, Dimensions 9.92 x 9.06 x 7.6 + +Price is $",250 +"How much does this cost to the nearest dollar? + +Hp Prime Graphing Calculator Ii +Hp Prime Graphing Calculator Ii IB Diploma Programme exam approved Sleek, slim, brushed metal design that looks great and performs even better. 
Keep the calculator protected when it's not in use with a slide-on cover Enjoy a feature-rich calculating experience with familiar HP alphanumeric keypad and a large diagonal, multi-touch display Lithium-Ion rechargeable battery, 256 MB flash memory Unique STEM ecosystem with HP Prime Graphing Calculator, HP Prime Wireless Kit1, and HP Connectivity Kit Dimensions 3.66 x 0.65 x 7.28 inches, Weight 8 ounces, model number Batteries 1 Lithium Ion batteries required., Rank Office Products 27247, Basic Office Calculators 79, Available July 10, 2019, + +Price is $",39 +"How much does this cost to the nearest dollar? + +Lowrance Nmea 2000 25' Extension Cable +Lowrance n2k extension cable Red plugs NMEA 2000 extension cable Mfg.# Lowrance connectors. Package Dimensions 10 L x 3 H x 5 W (inches) Country of Origin Mexico Part number For use with LGC 3000 and red NMEA network Dimensions L x W x H 9.92 x 4.25 x 3.23 inches, Weight 0.79 Pounds, Dimensions LxWxH 10 x 5 x 3 inches, Weight 0.32 Kilograms, Brand Name Lowrance, Model Name Color Red, s 1, Manufacturer Lowrance, Part Model Year 2015, Included Components Lowrance Nmea + +Price is $",35 +"How much does this cost to the nearest dollar? + +Jeep Genuine Accessories Hood Lock +Hood lock rivits on using existing holes. Self codes to the ignition key. Same as standard in Europe, meets Thatchem requirements. When you select a genuine OEM part you can rely on the high quality and effectiveness of the product and brand without having to guess if the product will work in sequence with your vehicle. Protecting your investment is important and choosing the right parts can be challenging. Stick with what you know and choose a genuine OEM part. 
Fits Wrangler Hood lock secures underhood items from theft Rivits into existing holes and automatically codes itself to the vehicle ignition key during installation Same as the production hood lock for European markets Manufacturer Jeep, Brand Jeep, Model Weight 3.7 pounds, Dimensions 8.2 x 7.8 + +Price is $",65 +"How much does this cost to the nearest dollar? + +GODOX CB-06 Hard Carrying Case with Wheels +Godox CB-06 Hard Carrying Case with Wheels Carrying/Transport Options Dual connecting straps Top handle Wheels Dimensions 94.0 x 34.0 x 25.0cm (37.01 x 13.39 x 9.84 ) Dimensions 41.25 x 16.25 x 12.5 inches, Weight 7.5 pounds, model number CB 06, Rank Tripod & Monopod Cases 13, Is Discontinued No, Available August 24, 2017, Manufacturer Godox, Language English, Brand GODOX, Color Black, Closure Type Zipper, Pattern Solid, Dimensions LxWxH 41.25 x 16.25 x + +Price is $",75 +"How much does this cost to the nearest dollar? + +Au-Tomotive Gold, INC. Ford Black Valet Key Chain +Milled alloy black finish with easy release spring-loaded key ring for valet parking. Laser cut engraved logo will never fade. Showing OEM style car logo on one side. It is about 4 long. Brand new Official licensed product. Milled alloy black finish with easy release spring-loaded key ring for valet parking. Laser cut engraved logo will never fade. Showing OEM style car logo on one side. It is about 4 long. Brand new Official licensed product. Manufacturer Au-Tomotive Gold, INC, Brand Au-Tomotive Gold, INC., Weight 1.44 ounces, Dimensions 4.3 x 2.1 x 0.6 inches, Manufacturer Part Rank Automotive Keychains 18749 + +Price is $",35 +"How much does this cost to the nearest dollar? 
+ +Snailfly Black Roof Rack Rail + Cross Bar Fit for Honda All New CRV CR-V (4pcs +FITMENT Roof Rack Cross Bars Fit For Honda CRV CR-V 2017 2018 2019 2020 2021 2022 Please make sure the fitment before your purchase 2 PACKAGES Package 1# 2pcs Roof Racks Package 2# 2pcs Crossbars Necessary Mounting Hardware Like Bolts Are Included. SPECIFICS 100% Brand New Smooth surface Item exactly as the picture showed High Quality Aluminum Alloy Long lasting & durable finish, suitable for all weathers INSTALLATION Please contact us via message if you need installation insturctions. FEATURES Low profile streamline design,efficiently reduce wind resistance and noise. Greatly increase overall + +Price is $",125 +"How much does this cost to the nearest dollar? + +KING SHA Anti Glare LED Track Lighting Heads (50W Eqv.) Compatible with Halo Pack +Stable performance dimming capabilities that work seamlessly with universal dimmers, allowing you to adjust the brightness smoothly from 10% to 100% without any flickering. Anti-glare design to provide soft and eye-friendly lighting. The glare-free grid helps to reduce eye strain and protect your vision. Compatibility with H-type circuit track systems, making them suitable for a wide range of track lighting applications. GU10 base with a twist and turn type, which makes it easy to change bulbs. The 7W dimmable MR16 bulb with a high color rendering index of 90+ and a color temperature of 3000K (50W equivalent) provides bright and vibrant illumination. Adjustable + +Price is $",180 +"How much does this cost to the nearest dollar? + +APS Compatible with Chevy Silverado 1500 Main Upper Stainless Steel Black 8x6 Horizontal Billet Grille Insert +INSTALLATION This is Bolt Over/Overlay/Bolton (Drilling Not Required) 8x6 Horizontal Billet grille insert. OE grille shell remains on the car after installation. 
CUSTOM FIT Compatible with Chevy Silverado 1500 Not for Z71 SPECS Each grille made from premium Stainless Steel and customized to fit the Main Upper of your vehicle. All necessary hardware and instruction are included. Grille insert only, logo or emblem, frame or shell is NOT included. PERFECT DESIGN Each grille made from premium stainless steel with black powder coated surface that offers resistance to oxidization. This grille enhances the visual appearance of your car. SATISFACTION GUARANTEED + +Price is $",110 +"How much does this cost to the nearest dollar? + +Wilwood Engineering Brake Caliper +Wilwood's D52 Front Caliper Kit is a direct bolt-on 2 piston replacement for the factory original single calipers on many GM Passenger Vehicles and Trucks. Forged billet aluminum bodies, stainless steel pistons, and competition style high-temperature seals put an end to the rust, bore pitting, and seal failures that plague the OE caliper design. D52 calipers provide low-maintenance performance and a huge weight savings with high temperature reliability for the street and track. D52 calipers mount in the stock location over stock rotors, use the original style OE D52 brake pads and an OE banjo bolt brake line mounting. Calipers can be used with most wheels that clear the OE calipers. The front calipers with 2 + +Price is $",90 +"How much does this cost to the nearest dollar? + +ACDelco Gold Starter, Remanufactured (Renewed) +ACDelco’s Professional Remanufactured Starters are the high quality replacement ideal for many vehicles on the road today. ACDelco’s Professional Remanufactured Starters have new bronze sintered and oil-impregnated bushings. Solenoid contacts are new with copper terminals and plated hardware. Remanufacturing starters is an industry standard practice that involves disassembly of existing units, and replacing components that are most prone to wear with new components. 
Damaged and obsolete parts are replaced and are end of line tested to ensure they perform to ACDelco specifications. In addition, remanufacturing returns components back into service rather than processing as scrap or simply disposing of them. These starters will + +Price is $",110 +"How much does this cost to the nearest dollar? + +UWS Matte Black Heavy-Wall Aluminum Deep Angled Truck Tool Box with Low Profile, RigidCore Lid +UWS crossover truck tool boxes are the tried-and-true way of keeping your tools organized, on-hand and fully secure no matter where you and your truck roam. Each UWS tool box is built from extra-thick aluminum welded into a single-piece tub. This provides the tool box with reliable strength and helps keep the interior sealed off from the elements. Aluminum construction also makes the box highly resistant to corrosion for long-lasting use. To add even more strength to the crossover truck tool box, the lid features our patented RigidCore foam-filled design. Layered between two sheets of aluminum, this core greatly increases the structural integrity of the lid to prevent bending and warping and to ensure + +Price is $",50 +"How much does this cost to the nearest dollar? + +Dell Latitude E5440 14in Business Laptop Computer, Intel Core up to 8GB RAM, 256GB SSD, HDMI, DVDRW, WiFi plus BT, Windows 10 Professional (Renewed) +2018 Dell Latitude E5440 14 Business Laptop Computer, Intel Dual-Core up to 8GB RAM, 256GB SSD, HDMI, Bluetooth 4.0, WiFi Windows 10 Professional (CertifiedRefurbished) Operating System Microsoft Windows 10 Professional CPU Intel Core 1.9GHz up to 2.9GHz Screen 14 Memory 8 GB DDR3 Storage 256GB SSD Optical Drive DVD-Writer Graphics Card Intel HD Graphics 4400 Video Memory Shared memory Communication Gigabit LAN and WLAN CPU Type Intel Core i5 4 + +Price is $",350 +"How much does this cost to the nearest dollar? 
+ +(Plug and Play) Spare Tire Brake Light Wheel Light Brake Light for Wrangler JK JKU Red Light +FITMENT Fit for JK JKU with all 16 to 20 inch rim diameter wheels, works with 5x5, 5x4.5, 5x5.5 inch lug patterns. Plug & Play Package comes with instructions including the video link of installing and wiring. Just plug to the 3rd brake light.Easy to install, just plug and play, no need to splice the existing brake light wires. No broken wire installation. You can install the third spare light in few minutes. Braking Function Obvious and fast braking warning signal, lights up the inside of your spare when step on the brake, more red brightness and stronger penetration, easy To Be + +Price is $",89 +"How much does this cost to the nearest dollar? + +The Ultimate Roadside Rescue Assistant +The Ultimate Roadside Rescue Assistant is the rechargeable power source, air compressor, emergency light and phone charger no driver should be without. It features a 140W inverter to power 110V household appliances, plus a car battery jump starter, 150 PSI air compressor and a 5 LED work light. Keep one in your home or vehicle for peace of mind. The Ultimate Roadside Rescue Assistant is the rechargeable power source, air compressor, emergency light and phone charger no driver should be without. It features a 140W inverter to power 110V household appliances, plus a car battery jump starter, 150 PSI air compressor and a 5 LED work light. Keep one in your home or vehicle for peace of mind. Manufacturer Rally Manufacturing + +Price is $",155 +"How much does this cost to the nearest dollar? 
+ +Brand New 18 x 8.5 Replacement Wheel for Mercedes CLS500 CLS550 Rim 65371 +JWL/VIA Certifed Product ISO 9001 Certifed Product Replication Manufacturer WheelerShip, Brand Wheelership, Model OEM Replacement (Aftermarket), Weight 32.3 Pounds, Exterior Silver, Manufacturer Part Construction Rim Diameter 18 Inches, Rim Width 8.5 Inches, Bolt Pattern ( Holes) 5, Bolt Pattern (Pitch Circle Diameter) 112 Millimeters, Offset 28 Millimeters, Available April 24, 2014, Size 18 inch, Exterior Finish Silver, Wheel Size 18 Inches, Pitch Circle Diameter 112 Millimeters, Rim Size 18 Inches, Diameter 18 Inches, Vehicle Service Type Passenger Car + +Price is $",350 +"How much does this cost to the nearest dollar? + +Headlight Headlamp LH Left & RH Right Pair Set for Toyota Prius +For 10-11 Toyota Prius Headlight Headlamp Halogen LH & RH Pair Driver & Passenger Set DETAIL Assembly Type Composite Lens Color Clear Ballast Included No Manufacturer Part Number Mounting Hardware Included No Bulb Size Same as factory Bulb Type Halogen OE Number Bulbs Included No Certifications DOT,SAE Placement on Vehicle Left, Right Fitment Type Direct Replacement Headlight Style Factory Housing Color Chrome (Crystal) Fits Prius Headlight · 100% brand new and high quality · Fits both LH (Driver Side) & RH (Passenger Side) · Replaces dealer part numbers · Correct for models with Halogen Style Headlights · Do NOT fit models with HID (High Intensity Discharge + +Price is $",200 +"How much does this cost to the nearest dollar? + +Lilo And Stitch Deluxe Oversize Print Large 16 Backpack with Laptop Compartment - A19563 Multi-color +Send them off with awesome top quality and durable Backpack by KBNL! Our backpacks and accessories feature today's popular characters and designs. 
KBNL backpacks are as practical as it is stylish and include the following features Durable polyester exterior, Full interior lining, dual side pockets, front organizer pocket for additional accessory storage, Padded and adjustable shoulder straps, padded interior pocket which protects up to a laptop, Fully padded back panel - KBNL products are made with top quality material and workmanship. Front organizer pocket for additional accessory storage Padded and adjustable shoulder straps, fully padded back panel, padded interior pocket which protects up to a laptop Dimensions 5 x 12 x + +Price is $",29.99 +"How much does this cost to the nearest dollar? + +AC Compressor & A/C Clutch For Hyundai Accent 2006 2007 2008 2009 - BuyAutoParts NEW +Engineered for superior durability, backed by a one year, unlimited mileage warranty Guaranteed Exact Fit for easy installation 100% BRAND NEW, premium ISO/TS 16949 quality - no core deposit or return required! Make sure you flush the system thoroughly and replace the drier filter along with the compressor for better long-term reliability, or consider one of our AC kits that includes everything you need! Fits Hyundai Accent Manufacturer BuyAutoParts, Part Weight 16 Pounds, Dimensions 12 x 11 x 10 inches, Quantity 1, Rank Automotive Automotive Replacement Air Conditioning Compressors 9735, Available April 25, 2015, + +Price is $",160 +"How much does this cost to the nearest dollar? + +House Of Troy Pinnacle Collection Portable Halogen Wall Lamp, Antique Brass +From the Manufacturer The House of Troy Pinnacle Collection Portable Halogen Wall Lamp shows that hand-craftsmanship is a time honored tradition, as alive today as the land itself. In this tradition, House of Troy carefully crafts each light for you by hand, to the highest quality standards. This swing arm wall lamp will create a stunning presence in any room and works well with many styles of decor. 
Showcasing the classic lines of this lamp and cut from the highest quality solid brass, the has an antique brass finish achieved by coloring the solid brass with an application of acid oxide. The finish is then darkened and partially rubbed away, leaving dark highlights throughout. All oxidized finishes are protected with a finish coat of matte + +Price is $",40 +"How much does this cost to the nearest dollar? + +Juno T29 WH Floating Electrical Feed Single Circuit Track, 120 Volts, White +Floating Electrical Feed for Juno Single Circuit Track Permits mounting at any point of Juno single circuit track under the outlet box. Includes floating connector and outlet box cover plus extra track dead end.. Floating Electrical Feed For Juno 1 Circuit Track - White Lighting Rail. 1- Juno Floating Electrical Feed T29Wh For Juno Single Circuit Track Brand Name Juno Lighting Product Dimensions 6.0 X 3.0 X 3.0 Country Of Origin China Manufacturer Acuity Brands Lighting, Part Weight 4.2 ounces, Dimensions 6 x 3 x 3 inches, Country of Origin China, model number T29 WH, Color White, Style Voltage, Finish White, + +Price is $",60 +"How much does this cost to the nearest dollar? + +Sherman GO-PARTS - for Toyota Avalon Side View Mirror - Right (Passenger) Replacement 2014 2015 +Sherman Replacement Part Compatible with TOYOTA AVALON Right Mirror outside rear view (Partslink Number Sherman Replacement Part Compatible with TOYOTA AVALON Right Mirror outside rear view (Partslink Number Manufacturer Sherman, Brand Sherman, Model Weight 3.35 pounds, Dimensions 17.01 x 11.73 x 6.69 inches, model number Exterior Painted, Manufacturer Part ABPA Partslink Position Rear, Lift Type Manual, Rank Automotive Automotive Exterior Mirrors 21172, Available November 8, 2021, Auto Part Position Rear, Mounting Type Windshield Mount, Included Components Mirror, Operation Mode Manual, Shape Rect + +Price is $",80 +"How much does this cost to the nearest dollar? 
+ +Roland RPU-3 Electronic Keyboard Pedal or Footswitch, 3 Pedal +Product Description Combining three pedals into one convenient and clutter-free unit, the Roland RPU-3 offers a real grand piano pedaling experience. With separate 1/4 outputs for each of its three pedals, the RPU-3 is compatible with keyboards such as Roland’s FP-90, FP-60, and pianos. In addition to providing the same pedal configuration as a grand piano, the RPU-3 also provides hands-free control of various instrument functions, such as selecting registrations or activating vocal effects on the FP-90. From the Manufacturer Combining three pedals into one convenient and clutter-free unit, the Roland RPU-3 offers a real grand piano pedaling experience. With + +Price is $",45 +"How much does this cost to the nearest dollar? + +Rockland VMI14 12,000 Pound 12 Volt DC Electric Integrated Vehicle Winch Kit with a Synthetic Rope and Remote Accessory for Jeep, Truck, and ATV Recovery +MULTI-PURPOSE WINCH Electric vehicle winch with a Hawse fairlead and synthetic rope provides car recovery in tough situations for trucks and SUVs CONVENIENT REMOTE OPERATION Wired remote controller power switch allows for retracting the rope for winching as desired DC MOTOR Series-wound motor stays cooler during longer pulls to increase continuous operation time HIGH-PERFORMANCE GEAR SYSTEM planetary gear system with free spooling provides a fast line speed with a fast line-out FEATURES AND SPECIFICATIONS Voltage detection and stall load protection capabilities flash red and blue LED lights to warn and alert you; Color Black; Dimensions (L + +Price is $",60 +"How much does this cost to the nearest dollar? 
+ +Max Advanced Brakes Elite XDS Front Cross-Drill & Slots Rotors with Elite Max Brake Pads +Max Advanced Brakes Elite XDS FRONT brake kit is exceptional in every way to meet the demanding braking needs for multiple driving styles, road and weather conditions FRONT brake kit with Elite XDS brake rotors are finished with a special coating to prevent corrosion & rust and to protect against moisture and salt. Brake rotors are cross-drilled and slotted to dissipate heat and keep your brakes in perfect condition at all times. Elite Max brake pads and hardware clips included Max Advanced Brakes has been providing replacement brake kits, brake rotors and brake pads for over 10 years and we've always prioritized the safety and satisfaction of our customers. Our brakes are designed to be safe and durable + +Price is $",80 +"How much does this cost to the nearest dollar? + +Quality-Built 11030 Premium Quality Alternator +Quality-Built Alternators are remanufactured for a perfect fit. Housings are 100 percent blasted clean, all mounting threads inspected, re-tapped for easy installation and consistent torque. Terminals are of 100 percent OE-quality. High-temp insulators make connections secure and reliable. Quality-Built alternators are re-designed to operate with every turn of the key for reliable performance. Rotors are electronically tested and coated with high dielectric insulation to ensure maximum durability and charging performance. Bearings are inspected or new, with high-temperature grease for reduced heat and friction. Stators are electronically tested for maximum insulation quality and phase balance. Rectifiers are load tested to ensure alternator durability and charging performance. Brushes and springs are new + +Price is $",110 +"How much does this cost to the nearest dollar? 
+ +Lucida LG-510 Student Classical Guitar, Full Size +The perfect guitar for any beginner, the Lucida Student LG-510 features Gotoh tuners for easy tuning, nylon strings for low string tension and a classic design available in multiple sizes. White Wood Top, Back and Sides Open Gear Gotoh Tuning Machines Nato Neck Hard Maple Fretboard Multi-Colored Rosette Weight 3.7 pounds, Dimensions 39 x 15 x 4 inches, model number Rank Musical Instruments Classical & Nylon-String Guitars 336, Is Discontinued No, Available May 3, 2010, Back Material White Wood, Body Material Wood, Color Name Multi-colored,White, Fretboard Material Maple Wood, String Material Nylon, Top Material White Wood, + +Price is $",160 +"How much does this cost to the nearest dollar? + +Longacre Aluminum Turn Plates +Longacre is an established brand name in the racing industry and is recognized for dedication to quality, innovation and customer satisfaction. Check out our comprehensive line of race scales, alignment tools, racing gauges and other products. Whether you are into stock, modified, drag, go kart, off-road, sprint or RC car racing, we'll provide you with the quality racing parts you deserve. The free floating in 2 directions eliminates bind It reads to 1/2° - Degrees can be zeroed with the car on The low profile design means that its only 1 tall Can also be used on top of scale pads Has a weight capacity of 1,500 lbs. per scale Manufacturer Longacre, Brand Longacre, Model Longacre Racing Products, Weight 31 + +Price is $",300 +"How much does this cost to the nearest dollar? + +Motion Pro Adjustable Torque Wrench Adapter +Torque any fastener with a combination wrench or Allen wrench. Will work with 6 millimeter (1/4 inch) through 19 millimeter (3/4 inch) combination wrenches and with 6 millimeter and 8 millimeter Allen wrenches. 
Torque any fastener with a combination wrench or Allen wrench Brand Motion Pro, Material Alloy Steel, Dimensions LxWxH 6 x 3.9 x 1.1 inches, Weight 0.5 Pounds, Quantity 1, Head Style Fixed Square, Hex, Finish Type Black Oxide Finish, Torque 90 Foot Pounds, Operation Mode Mechanical, Manufacturer Motion Pro, Model Dimensions 6 x 3.9 x 1.1 + +Price is $",120 +"How much does this cost to the nearest dollar? + +Glyph Thunderbolt 3 NVMe Dock (0 GB) +Perfect fit ultra slim case for iphone 6 plus. Lightweight, and easy access to all buttons Snap-on case Strong packaging to protect the goods from possible damage High quality and durable protection cover. Brand Glyph Production Technologies, model number Weight 1.89 Kilograms, Dimensions 9.4 x 3.8 x 0.9 inches, Dimensions LxWxH 9.4 x 3.8 x 0.9 inches, Color Black, Manufacturer Glyph Production Technologies, Is Discontinued No, Available June 6, 2015, Rank Computers & Accessories Laptop Docking Stations 2945, Hardware Interface USB, Thunderbolt, Compatible Devices iPhone 6 Plus, Total USB Ports 3, Dimensions L + +Price is $",100 +"How much does this cost to the nearest dollar? + +TOYO Open Country MT Performance Radial E/10 129P +TOYO Open Country MT Performance Radial E/10 129P Country of Origin Japan The Package Height of the Product is 11.8 inches The Package Length of the Product is 34.5 inches The Package Width of the Product is 34.5 inches Fit type Universal Fit Load capacity 4080 pounds Brand Toyo Tires, Size E/10, Rim Size 18 Inches, Section Width 295 Millimeters, Tire Aspect Ratio 70.0, Load Index 129, Speed Rating P, Load Capacity 4080 Pounds, Tread Depth 19.4 32nds, Tread Type Non-Directional, Tire Diameter 25, Weight 58 pounds, Manufacturer Toyo + +Price is $",300 +"How much does this cost to the nearest dollar? 
+ +Razer Seiren X USB Streaming Microphone and Razer Kiyo Streaming Webcam +Bundle Contents 1x Kiyo Webcam, 1x Seiren X Microphone Super Cardioid Pickup Pattern Sound is recorded at a tighter angle, reducing unwanted background noise and providing crisp clear audio Designed for Streaming Supports video and audio recording in 720p 60 FPS / 1080p 30 FPS; Streamlabs certified and compatible with popular platforms like OBS and XSplit Convenient, Built In Lighting An attached, 5600K daylight balanced ring light around the camera keeps subjects evenly lit without the hassle of additional lighting equipment Brand Razer, Connectivity Technology USB, Color Black, Video Capture Resolution 1080p, 720p, Lens Type Zoom, Form Factor Compact + +Price is $",110 +"How much does this cost to the nearest dollar? + +Happy Birthday to Dad From Your Daughter Greeting Card - I've Always Known I Could Depend On Your Love and Support No Matter What +Greeting Card Includes Envelope Front From Your Daughter - Ever since I was a little girl, you've been such an important part of my life... I've always known I could depend on your love and support... no matter what. Inside On your birthday, if I could give you anything in return for all you've given me, it would have to be the love I always hold in my heart for you. Manufacturer Greeting Card, Brand Greeting Card, Weight 1.6 ounces, Dimensions 9 x 7 x 0.1 inches, Is Discontinued No, Pre-printed happy birthday, s 1, Manufacturer Part GC, Rank + +Price is $",6 +"How much does this cost to the nearest dollar? + +Little Tikes My Real Jam First Concert Set with Electric Guitar, Drum and Keyboard, 4 Play Modes, and Bluetooth Connectivity - for Kids Ages 3+ +The My Real Jam™ First Concert Set lets kids harness their inner musician. Four play modes—Play with the Band, Free Play; Solo Jam; Play Any Song with Bluetooth® —provides countless hours of musical fun. 
The realistically designed Electric Guitar, Drums and Keyboard are packed with features, while the packages double as reusable instrument cases, perfect for storing the instruments or for hitting the road as an aspiring musician. BECOME A SUPERSTAR – Lets kids jam their way to rock star status with a perfect combo of musical play and pretend play PLAY ANY SONG WITH BLUETOOTH - Sync with any Bluetooth enabled device to play along + +Price is $",19.99 +"How much does this cost to the nearest dollar? + +Studio M Peace and Harmony Art Pole Community Inspirational Outdoor Decorative Garden Post, Made in USA, 60 Inches Tall +Impactful. Beautiful. Unique. An Art Pole is an impactful way to bring beautiful artwork into any landscape. With a patented, state-of-the-art design and exceptional quality, it will be at the heart of your garden for years to come. Art Poles are easy to install - all hardware is included and no digging is necessary. Made in the USA from ultra-durable, maintenance free PVC, each Art Pole features vivid artwork with an expected 5-year fade-resistance (this will vary by regional climate and sun exposure). U.S. Patent No. U.S. Patent No. U.S. Patent No. Art Poles are created by the team of passionate people + +Price is $",110 +"How much does this cost to the nearest dollar? 
+ +MyVolts 12V Power Supply Adaptor Compatible with/Replacement for HP Scanjet 3500C Scanner - US Plug with Extension and ON/Off Switch +Need to power your HP Scanner Scanjet 12V high-quality power adapter is compatible with the HP Scanjet 3500C Scanner.The plug fits a US 2-pin wall power socket.This power adaptor is designed to meet the power specification of the HP Scanjet 3500C Scanner - correct voltage, amperage and tip size.It meets and exceeds all US safety standard, features overvoltage, overcurrent and short circuit protection to protect your device, and is energy efficient.Also included in the Premium option is a handy in-line on / off switch, AND a 3 meter (10 feet) extension cable.Power + +Price is $",35 +"How much does this cost to the nearest dollar? + +Dell Latitude 7212 Rugged Extreme Tablet, 11.6 inch FHD Touch LCD, Intel Core 8GB Ram, 128GB SSD, WiFi, GPS, Windows 10 Professional (Renewed) +This Certified Refurbished product is tested and certified to look and work like new. The refurbishing process includes functionality testing, basic cleaning, inspection, and repackaging. The product ships with all relevant accessories, a minimum 90-day warranty, and may arrive in a generic box. Only select sellers who maintain a high performance bar may offer Certified Refurbished products on Amazon.com Intel Core 7th Generation Processor (Dual Core, 3MB + u-blox NEO-M8 GPS card FHD Outdoor-Readable Glove-Capable Touchscreen w/ Gor + +Price is $",220 +"How much does this cost to the nearest dollar? + +Covermates Contour Fit Car Cover - Light Weight Polyester, Weather Resistant, Elastic Hem, Vehicle Covers-Khaki +From freezing rain and snowstorms to harsh sunlight and bird droppings, your vehicle faces it all. Spring brings bouts of rain followed by showers of pollen, leaving your vehicle a yellow, sticky mess. 
Our WeatherTite Prime covers are made of 300D stock-dyed polyester designed for climates with moderate humidity, moderate sunlight, heavy wind gusts, and heavy rain and snowfall. WeatherTite Prime covers provide excellent protection from dirt, dust, pollen, rain, and anything else nature has to throw at it. Hidden grommets are placed along the bottom of the cover, allowing optional cable locks to keep the cover secure and safe. An extra + +Price is $",199 +"How much does this cost to the nearest dollar? + +Westin Black HDX Grille Guard fits Ram 2500 3500 (Excl. Power Wagon) +The HDX Grille Guard is the ultimate in extreme truck gear. Its a fully welded grille guard that features full wraparound wings made of heavy duty 2 diameter tube. Uprights are finished and protected with extra wide rubber that is 1/8 thick and 2 3/4 wide resulting in a solid clean look. The full punch plate grille protects the vehicle's grille area. HDX Grille Guards are available in stainless and black powder coat finish. PERFECT FIT Direct fit for Ram 2500 3500 (Excl. Power Wagon) 2 inch tube, full wrap around wings Mount kit and hardware included Full punch plate grille solid construction + +Price is $",160 +"How much does this cost to the nearest dollar? + +Fieldpiece JL2 Job Link Wireless App Transmitter Bluetooth +With the JL2 transmitter and the job link app, you can start running your jobs through your mobile device. Fill out inspection checklists, view live measurements, gather in-depth Diagnostics, and adjust systems to live data. All reports can be emailed to customers and office, as well as saved in the cloud for access at anytime. The JL2 transmitter receives measurements from any Fieldpiece Wireless manifold and the Fieldpiece Wireless dual in-duct Psychomotor (SDP2) via radio frequency for extra distance - up to 100' from instrument to phone. Then the JL2 transmitter converts all live measurements and data to Bluetooth connection with your mobile device. made in United States. 
Manufactured by Fieldpiece instruments Inc. Sman digital + +Price is $",99 +"How much does this cost to the nearest dollar? + +hansgrohe Talis S Modern Premium Easy Clean 1 9-inch Tall Bathroom Sink Faucet in Chrome, +Design With a range of models and styles, paired with the quality and design you expect from hansgrohe, dream bathrooms become a reality. Bath faucets by hansgrohe exude beautiful design with superior performance and durability. Pick your desired faucet, then browse the entire product suite for complementary accessories. German engineering ensures a lifetime of consistent and dependable operation. Maintenance Products that function perfectly are essential. To ensure that they do, every hansgrohe product 100% air tested in production. hansgrohe faucets feature a silicone aerator that optimizes water flow performance, resists mineral deposit build-up, and is designed to be easily wiped clean. Installation Can be installed in + +Price is $",60 +"How much does this cost to the nearest dollar? + +G-Technology G-SPEED eS PRO High-Performance Fail-Safe RAID Solution for HD/2K Production 8TB +Product Description High-Performance, Fail-Safe RAID Solutions for HD/2K Production The new G-SPEED eS PRO from G-Tech provides professional content creators better than Fibre-Channel performance for demanding post production applications at a fraction of the cost. The compact and whisper quiet G-SPEED eS PRO features mini-SAS connectivity to a high performance PCIe x8 IOP RAID controller that supports RAID levels or 6. A single G-SPEED eS PRO enclosure with four 7200 RPM, SATA II drives in RAID 0 mode supports multi-stream ProRes 422 HQ playback and a single-stream of uncompressed 10-bit HD. Two units + +Price is $",399 +"How much does this cost to the nearest dollar? + +DreamLine Shower Door, 56-60 W x 72 H, Chrome +The DreamLine Mirage-X frameless sliding shower or tub door is the epitome of simple elegance with a modern flair. 
The remarkably innovative headerless design creates an unobstructed and open view for your shower. The Mirage-X shower door will complete any bathroom space with a look of luxury and style. DreamLine exclusive ClearMax water repellant and stain resistant glass coating adds superior protection from stains and is nearly maintenance-free. IMPORTANT! All measurements should be taken only AFTER walls are finished (tile, back walls, etc. ) Model Size 56 - 60 in. W x 72 in. H; Walk-in Opening 22 to 26 in. Configuration consists of a Sliding Door and a Station + +Price is $",89 +"How much does this cost to the nearest dollar? + +Sanctuary Square Backplate Finish Oiled Rubbed Bronze, Size 1.25 H x 1.25 W x 0.06 D +Finish Oiled Rubbed Bronze, Size 1.25 H x 1.25 W x 0.06 D Features -Screw pack M4. -Base material Zinc alloy. -Lifetime warranty. -Sanctuary collection. Dimensions Size 1 H x 1 W x 0.06 D - Overall Height - Top to Bottom -1. Size 1 H x 1 W x 0.06 D - Overall Width - Side to Side -1. Size 1 H x 1 W x 0.06 D - Overall Product Weight -0.1 lbs. Size 1.25 H + +Price is $",35 +"How much does this cost to the nearest dollar? + +Pelican Protector 1750 Long Case - Multi-Purpose Hard Case with Foam - Tripod, Camera Equipment, Sportsmans Rifle Case, Electronics Gear, and More (Black) +Sensitive equipment needs protection, and since 1976 the answer has been the Pelican Protector Case. These cases are designed rugged, and travel the harshest environments on earth. Against the extreme cold of the arctic or the heat of battle, Pelican cases have survived. Made in the USA, these tough cases are designed with an automatic purge valve, that equalizes air pressure, a watertight silicone O-ring lid, over-molded rubber handles and stainless steel hardware. PREMIUM HARD CASE In use with camera and film professionals, military, law enforcement, and hunters worldwide as a rifle case. + +Price is $",55 +"How much does this cost to the nearest dollar? 
+ +Brock Replacement Driver and Passenger Halogen Headlights Headlamps Compatible with +Meets all OE specifications, with DOT stamp Exact replacement for stock assembly New, clear lenses ensure full illumination and maximum safety Lens and housing included 1-Year Limited Warranty Manufacturer Brock, Brand Brock, Model Replacement Headlight Assemblies, Weight 16 pounds, Dimensions 23 x 23 x 15 inches, Country of Origin Taiwan, model number Is Discontinued No, Manufacturer Part OEM Part 3C0 941 006 AE, ABPA Partslink Position Rear, Front, Bulb Type Halogen, Special Features Waterproof, Rank Automotive Automotive Headlight Assemblies 16731, Available September 9, 2022, Specific Uses For Product Head Lights, Light Source Type Halogen, Vehicle Service Type Car + +Price is $",120 +"How much does this cost to the nearest dollar? + +Carlinkit Ai Box Mini, Android 11, Multimedia Video Magic GPS,Wireless Caplay & Wireless Android Auto, Only Support Car with OEM Wired CarPlay +Compatible models recommended Car Links Your Phone Over The Air, Applicable to cars of 2015 and above. Please check the listing page before purchasing. If yours is not in the list, please ask for help from carlinkit. Multiple Online service Real-time online Maps will guide you at any time, either by connecting to a mobile phone hotspot, or by inserting a SIM card. Both of these allow you to enjoy the convenience. It also supports voice assistants, adding a new way to free your hands. SIM & TF Card & Type-C slot Simple card slot design makes everything clear at a glance. Support NANO SIM card + +Price is $",100 +"How much does this cost to the nearest dollar? + +StarDot YouTube Live Stream Camera Bundle, Gray +is a standalone live streaming camera which is compatible with YouTube live streaming and Facebook Live. This camera has been thoroughly tested for continuous 24/7 live streaming. 
It will broadcast high-quality video directly to YouTube without assistance from computers, cellphones, or third-party servers. Copy and paste a YouTube stream name/key from your YouTube account to the camera configuration page, and you're streaming in less than a minute. Easy to set up - connect the camera to your network, and get the stream name/key from YouTube or Facebook Live. Place it in the camera’s web setup page to start streaming. No need to open up additional network Ports in your router or modem settings. Live stream resolutions include HD 1080P, HD 720P, + +Price is $",110 +"How much does this cost to the nearest dollar? + +Atomic Compatible MERV 8 Carrier Replacement Furnace Filter - 2 Pack +The Atomic is a compatible filter fits the Carrier FILCAB mechanical air cleaner and MPKA series. This media filter is a whole house filter which is attached to the HVAC system. It has a MERV 8 filter efficiency value, which indicates how efficient the particles that can be trapped by the filter. The higher the rating, the finer the filtration and the fewer the particles that pass through it. To further increase filtration, it has a pleated rather than a flat surface, thereby increasing the filtering surface area. This efficiently traps airborne particles as small as 3 microns. An additional benefit is that an air filter will also extend the life of your heating and cooling system by making it work more efficiently by preventing the + +Price is $",50 +"How much does this cost to the nearest dollar? + +Bandai Awakening of S. H. 
s.h.figuarts star wars / force Obi-Wan Kenobi +SH Figuarts Star Wars Obi-Wan Kenobi (EpisodeI) about 155mm ABS & PVC painted action figure bandai star wars japan awakens Theme Action,Star Wars, Brand STAR WARS, Material Polyvinyl Chloride, Occasion Birthday, Dimensions 8\ L x 6\ W x 8\ H, Cartoon Character Star Wars, Room Type Office, Living Room, Bedroom, Pieces 1, Assembly Required No, s 1, Collection Name Action Figure, Shape Novelty, Manufacturer Bandai, Quantity 1, Weight 4.6 ounces, model number Rank Toys & Games Action Figures 44598, Is Discontinued + +Price is $",15 +"How much does this cost to the nearest dollar? + +Fit System 62135G Passenger Side Towing Mirror for Silverado/Sierra, 2500, 3500, Textured Black, Arrow Signal, Dual Lens, 1st Design, (no Power fold/Side Reflector/BLIS), fold, Heated Power +Passenger Side Towing Mirror for Silverado/ Sierra 1500, 2500, 3500, 1st design. Textured black, LED Arrow Signal and dual lens. Without power fold, side reflector and blind spot detection system. Foldaway. Heated Power. Towing Mirror glass is power adjustable. Convex Lens. Towing Mirror glass has heating capability to clear ice, snow and fog. Manual folding for additional clearance. Towing Mirror has the ability to extend. + +Price is $",88 +"How much does this cost to the nearest dollar? 
+ +Black Horse Black Aluminum Exceed Running Boards Compatible with GMC Terrain / Chevriolet Equinox +Black Horse Black Aluminum Exceed Running Boards compatible with GMC Terrain / Chevriolet Equinox Black Horse Off Road Aluminum Exceed Running Board - Features an all-black design with Chrome Trim/Compatible with GMC Chevrolet wide flat stepping surface Built with heavy-duty aluminum/Resistant to rust and corrosion for long-lasting use /All necessary hardware included Stripe design for a strong grip/Designed to look like a part of your vehicle/Easy installation, DIY instructions and all mounting hardware included Eases step in or out of vehicle// Manufacturer Black Horse Off Road, Brand Black Horse Off Road, Model Exceed Running Boards, Weight 40 Pounds, Dimensions 73 x 9 x 11 inches, Country of + +Price is $",70 +"How much does this cost to the nearest dollar? + +Dearsun Twinkle Star Color Night Light Plush Pillows Light up Night Stuffed Toys Perfect for Birthday (Orange) +% Polyester Polyester 100% Polyester Size 13.8 x 3.1 x 13.8 inch This is a good plush pillow to show your kids that what is a star and how to shine. Turn on the press on Star and the light will turn off in 15 minutes automatically. The star has multiple colors when lighted. Fill Material Polyester, Color Orange, Size 1 Count (Pack of 1), Brand DearSun, Shape Novelty, Special Feature Protable, Cover Material Polyester, Pattern Star, Age Range (Description) Child, s 3, Dimensions 13.8\ L x 13.8\ W, Care Instructions + +Price is $",30 +"How much does this cost to the nearest dollar? + +Pokemon - Gallade Spirit Link - XY Roaring Skies +In the Pokemon Trading Card Game, players build decks around their favorite Pokemon and then play against each other, sending their Pokemon into battle to prove who the best Pokemon Trainer is. Players can begin with theme decks - pre-constructed decks designed to cover the basics of the game. 
Then, they can augment their card collections with booster packs that provide more cards, letting players develop more diverse decks. With thousands of cards to choose from, the game is never the same twice. Card Name Gallade Spirit Link Card Type Trainer - Item Card Number 83/108 Artist 5ban Graphics Set Roaring Skies Card Text Your turn does not end if the Pokmon this card is attached to becomes M Gallade-EX. A single individual + +Price is $",39 +"How much does this cost to the nearest dollar? + +Ibanez GIO Series Classical Guitar - HH Infinity R - Black Night +Ibanez classical guitars take the guesswork out of finding an affordable, great-sounding classical guitar that's easy to fret and play. Whether you are looking for a traditional classical-sized instrument or a comfortable nylon-string beginner guitar, they are extremely well-constructed, affordable and have the pristine tonality and playability of much more expensive instruments. Ibanez builds guitars for players of all levels—from beginners to the most demanding masters of the instrument. Regardless of price, Ibanez always strives to offer the absolute best sound, style, and playability in its class. The Standard series incorporates all the staples the Ibanez brand is famous for, such as fast necks, floating terms, and high-oct + +Price is $",200 +"How much does this cost to the nearest dollar? + +Set 2 Heavy Duty 12 Ply Skid Steer Tire w/Rim Guard +Deep tread designed to resist gouging and cutting. Brand new, not retreads. Heavy duty 12 Ply rated with Rim Guard to protect your wheels, Durable tread pattern for super stability. 
32.7 oval diameter, 12.3 section width, 23/32 tread depth, max load 6320 lb@80 psi Tire Specifications Tire Size Tire Size Brand SUPERGUIDER Brand SUPERGUIDER Tread Pattern SKS-1 Tread Pattern SKS-1 Ply Rated 12 Ply Rated 12 Tread Depth 0.72 Tread Depth 0.72 Rim Width 9.75 Rim Width 9.75 Max Load Max Load Please note fitment guide is for + +Price is $",400 +"How much does this cost to the nearest dollar? + +Hairpin Table Legs 28 Heavy Duty Hairpin Legs, (Set for 4 ) Heavy Duty Table Legs (Black) +★Hairpin Table Legs Whether you’re a professional carpenter or woodworking is your hobby, our metal furniture legs will give your project the support it needs! ★CREATE A CUSTOM UNIQUE GIFT - Using these hairpin legs to create a custom coffee table, end table, or night stand lets you put together a unique gift that will stand out above the rest. Your gift will be remembered, cherished, and used for years to come. ★Designed for Versatility With a sleek, mid-century modern look, our industrial table legs are ideal for desks, benches and any piece of furniture in between! Finished with the latest in powder coating technology, the legs are uniform and smooth to + +Price is $",50 +"How much does this cost to the nearest dollar? + +Marada Racing Seat with Adjustable Slide for Racing Wheel Simulator Stand Cockpit Adjustable Seat Back Breathable Fabric Black with Installed Parts +Adjustable The adjustment angle of the seat back is 60-135 degrees. By adjusting the handle you can easily adjust to the angle you want.Can be suitable for players of different sizes. Overall Height 34.2, Side Width 21.2, Knee Width 20.6, Seat Back Height 30.7, Shoulder Width 21 Material Cloth, not easy to dirty. The fabric is very breathable and Suitable for sedentary. The product is not easy to deform, protect your spine and cultivate good driving habits. 
Design The seat bottom adopts double lock slide rail design, which is very stable, high matching with our bracket, and easy to install Experience + +Price is $",299 +"How much does this cost to the nearest dollar? + +Remington Industries 24 AWG Gauge Stranded Hook Up Wire, 25 feet Length, White, 0.0201 Diameter, 300 Volts +Hook up wire is used in a variety of general-purpose electrical applications. Stranded copper wire provides good electrical connectivity while PVC insulation protects the wire against abrasion, chemicals, oils, and solvents. The wire conforms to UL and MIL-SPEC specifications, and provides excellent uniformity for easy processing, stripping, and terminating. Available in black, red, white, Blue, green & yellow. Voltage rating 300 volts Type Ul1007 stranded wire (7/32) Insulation pvc (0.016 inch Color white Color White, Brand Remington Industries, Material wire wound, Gauge 24.0, Voltage 300 + +Price is $",99 +"How much does this cost to the nearest dollar? + +Acer Ultrabook, Intel Core 4GB Memory, 320GB HDD and 20GB SSD, Windows 8 +The Acer Aspire S3 Ultrabook is catching lots of attention and now so will you with the Champagne color design. This ultra-thin 13.3 ultrabook is less than 3 lbs light and only 0.5 thin, yet it packs a powerful 2nd Gen Intel Core i3 Processor and is outfitted with Acer Green Instant On and Always Connect for instant response and continuous connectivity. The Acer Aspire S3 Ultrabook all the best new experiences in a ultra-aerodynamic design, transforming your mobile lifestyle! HD widescreen CineCrystal LED-backlit display. Screen Resolution 1366 x 768 Intel Core processor + +Price is $",299 +"How much does this cost to the nearest dollar? + +ICBEAMER 7 RGB LED Headlights Bulb Halo Angel Eye DOT Approved Phone APP Bluetooth Control for Jeep Wrangler +⭐Transform your Jeep Wrangler with ICBEAMER's RGB Multifunction Halo Angle Eye LED Headlights - control brightness, mode selection, and more with the ICBEAMER phone app. 
⭐Upgrade your Jeep Wrangler with ICBEAMER's easy-to-install LED Headlamp Assembly - Plug & Play design, with built-in Canbus and H4/H13 Adapter included. ⭐Experience unbeatable visibility with ICBEAMER's 7 LED Headlight Bulbs - high and low beam output of 3600 LM and 1800 LM respectively, and water-proof IP67 for reliable performance in any weather. ⭐Perfect fit for + +Price is $",100 +"How much does this cost to the nearest dollar? + +R1 Concepts Front Rear Brakes and Rotors Kit |Front Rear Brake Pads| Brake Rotors and Pads| Ceramic Brake Pads and Rotors |fits Lexus IS250 +R1 Concepts Series brake rotors are great for those who want a medium performance upgrade over their factory brakes. Every rotor uses a iron grade of G3000 that provides great stability and braking power. All-in-One Complete Brake Kit Replacement eLine Series Front & Rear Brake Kit comes with (4) high performance brake rotors and (8) low-dust ceramic brake pads. High Performance Brake Rotors Made of G3000 grade cast iron with zinc finish for ultimate rust protection. Built with O.E.M specifications in mind, no modification required. Ultimate Stopping Power Precision-drilled holes and countersunk design + +Price is $",100 +"How much does this cost to the nearest dollar? + +Camplux 2.64 GPM Tankless, Outdoor Portable Gas Water Heater with Overheating Protection, Instant Propane Hot Water Heater for RV, Camping, Cabins, Barns, White +𝐂𝐨𝐦𝐩𝐚𝐜𝐭, 𝐋𝐢𝐠𝐡𝐭 𝐖𝐞𝐢𝐠𝐡𝐭 𝐏𝐨𝐫𝐭𝐚𝐛𝐥𝐞 𝐃𝐞𝐬𝐢𝐠𝐧 - 12.8 inches, lbs. Compact and portable design perfect for barns, cabins, outdoor instant + +Price is $",70 +"How much does this cost to the nearest dollar? + +KNOKLOCK 10 Pack 3.75 Kitchen Cabinet Handles Brushed Satin Nickel Cabinet Pulls Kitchen Cabinet Hardware Drawer Pulls for Dresser Cupboard Wardrobe +Material - The cabinet handles is made of zinc alloy, brushed satin nickel finish, more stable and durable, while making your cabinet more delicate and beautiful. 
Cabinet Pulls Dimensions - Hole Centers(CC) 3.75 Overall Length 4.9 Width 0.60 Projection 0.80 (22mm) Fits Most Cabinets - We offer 1 (25mm) and 1.77 (45mm) mounting screws to help you mount most furniture of different thicknesses, Machine Screws Metric Size M4 Versatile Appicatications - This brushed satin nickel cabinet handles is perfect for dressers, drawers, + +Price is $",60 +"How much does this cost to the nearest dollar? + +Valley Enterprises Yaesu USB FTDI CT-62 CAT Cable Length 10 Feet +Aftermarket Programming Cable Aftermarket Programming Cable FTDI USB Chipset FTDI USB Chipset TX and RX Led indicators TX and RX Led indicators Total Length 10 feet Total Length 10 feet No programming software included No programming software included For use with Yaesu This device requires an FTDI USB VCP Driver. Virtual COM port (VCP) drivers cause the USB device to appear as an additional COM port available to the PC. Application software can access the USB device in the same way as it would access a standard COM port. A link to download the free driver is included. Aftermarket Programming Cable FTDI USB chipset TX and RX Led indicators Total Length 10 Feet No + +Price is $",30 +"How much does this cost to the nearest dollar? + +G9 LED Light 100W replacement halogen bulbs equivalent g9 led bulbs AC110V 120V 130 voltage Bi-Pin Base Corn Base,Daylight White of 4) +Perfect G9 replacement(Daylight White 6000K) This G9 bulb is the same type as traditioanl g9 base replacement, producing confortable light Efficient Each this type of candelabra bulbs provides around 850lm, improving the brightness of your room/home Simple Installation G9 base. Installs into existing G9 base holder Applications Furniture lighting, office lighting, merchandise lighting, display lighting, interior light etc one year, free replacement if any not working during period, please send email to us directly. 
Brightness 102 pcs LED chip.Brightness than general LED G9 ALL + +Price is $",60 +"How much does this cost to the nearest dollar? + +ZCHAOZ 4 Lights Antique White Farmhouse Ceiling Light Fixture Flush Mount Chandelier Ceiling Lamp Modern Sputnik Light Fixtures Hanging for Dining Room Bedroom Living Room Kitchen Entryway Foyer +Light Source & Dimmable White flush mount ceiling light is compatible with various types of 4 x E26 base bulbs(max 60w per blub), options include incandescent, led, halogen, Edison bulb, cfl, etc(Bulbs are Not Included). This hanging light fixtures is dimmable if working with dimmable bulbs and compatible dimmer switch(Not Included Also). Handmade Distressed White ZCHAOZ white ceiling light fixtures ceiling mount is made from high quality iron material in handmade white finish coating with a sturdy cylinder structure design in the center extending + +Price is $",50 +"How much does this cost to the nearest dollar? + +Honeywell Honeywell VisionPro Heat/Cool Digital Thermostat, White +This Honeywell Digital Thermostat is the perfect upgrade to any home. Thermostat has RedLink Wireless Communication, Touch Screen and 7 day programmability. Stages up to 3 Heat / 2 Cool RedLINK wireless communication Precise temperature control (+/- 1° F) for reliable and consistent temperature Package weight of the Product 9.6 Ounces Brand Honeywell, Model Name Controller Type Android, Special Feature Programmable, Color White, Power Source Battery Powered, Weight 9.6 ounces, Voltage 24 Volts, Material Plastic, Shape Rectangular, Display Type 10 sq.in. LCD, Control Type Touch, s 1, Control Method Touch, Mounting Type Wall Mount, + +Price is $",60 +"How much does this cost to the nearest dollar? + +Patriot Exhaust 1-7/8 Clippster Exhaust Header for Big Block Chevrolet 67-81, Silver Ceramic Hi-Temperature Coating +Clippster style headers are perfect for grafting modern uni-body front clip suspensions to street rods, muscle cars and trucks. 
Clippster headers use longer primaries than tight tucks, yet shorter than full length headers. Collectors exit toward the rear of the engine compartment providing excellent ground clearance on slammed applications as well as clearing steering and suspension components. Mid length or clippster headers provide improved ground clearance for popular muscle cars and street rods Durable tubing Comes complete with gaskets, header bolts and collector reducers Available in three finishes Silver Ceramic Hi-Temperature Coating Popular metallic ceramic coating Limited one year warranty Manufacturer Patriot Exhaust, Brand Patriot + +Price is $",70 +"How much does this cost to the nearest dollar? + +Fitrite Autopart New Front Left Driver Side Fender For Nissan Altima, Made Of Steel +Product Name New Front Left Driver Side Fender For Nissan Altima, Made Of Steel Product Name New Front Left Driver Side Fender For Nissan Altima, Made Of Steel Condition New Condition New Warranty 1 Year Warranty 1 Year Fitment Type Vehicle Specific Fitment Type Vehicle Specific Condition New Placement on Vehicle Front, Left Driver Side Warranty 1 Year Fitment Type Vehicle Specific Parts Link No OEM Number Brand Fitrite Autoparts, Exterior Finish Primed, Material Alloy Steel, Dimensions LxWxH 45 x 35 x 11 inches, Weight 8.1 Pounds, Style Modern, Auto Part Position Front Left, Vehicle Service Type Car, Fit Type Vehicle Specific Fit + +Price is $",70 +"How much does this cost to the nearest dollar? 
+ +Technical Precision Replacement for GE General Electric G.E Light Bulb +Replacement For GE GENERAL ELECTRIC G.E Light Bulb Unit per sale 1 Brand Technical Precision, Light Type CFL, Wattage 55.00, Bulb Base G8, Specific Uses For Product Lamp, Light Color Warm White, Unit Count 1 Count, Color Temperature 3000 Kelvin, s 1, Brightness 4000 Lumen, Shape Cd, Size 1 Count (Pack of 1), Connectivity Technology Normal bulb, Controller Type Push Button, Color Rendering Index 82, Manufacturer Technical Precision, Part Weight 7 ounces, Dimensions 11.57 x 9.45 x 1.89 inches, Is Discontinued No, Quantity 1, Rank Industrial & Scientific Compact Fluorescent Bul + +Price is $",50 +"How much does this cost to the nearest dollar? + +Covercraft Carhartt SeatSaver Front Row Custom Fit Seat Cover for Select Ford Models - Duck Weave (Gravel) +Carhartt SeatSaver seat covers from Covercraft are the solution to the problem of keeping the seats in your truck or SUV clean and protected from daily use and weekend adventures. Made from durable, duck-weave fabric, these custom-fit seat covers protect your seats from dirt, mud, grime, spills and more. Featuring Rain Defender technology, a durable water repellency finish is added to the fabric to make it highly water resistant. Combine these features with the custom fitment and classic Carhartt styling, you get seat covers that look great and protect your seats from whatever you throw at them. Classic Duck-Weave Carhartt fabric for durability Custom-made + +Price is $",99 +"How much does this cost to the nearest dollar? + +Sennheiser SD Pro 2 - Double-Sided Multi Connectivity Wireless Headset for Desk Phone & Softphone/PC Connection, Ultra Noise-Cancelling Microphone (Black) +With the SD Pro 2, you get everything you need in an office headset, wrapped in one unique product. The SD Pro 2 is a double-sided, premium wireless DECT headset for desk phone and PC/softphone with base station. 
It features Sennheiser Voice Clarity, ultra noise-cancelling microphone, and ActiveGard hearing protection technology. Choosing the right SD Pro 2 SD PRO 2 This headset is designed for business professionals who communicate with their desk phone and softphone/PC. SD PRO 2 ML This headset is designed for business professionals who communicate in desk phone and + +Price is $",80 +"How much does this cost to the nearest dollar? + +Hitachi Mass Air Flow Sensor +Hitachi’s Air Flow Sensors (MAFs) measure the amount and characteristics of air entering the engine. Hitachi uses precision elements and high-quality components for enhanced durability and accurate air flow measurements. Hitachi MAFs are 100% air flow tested for ideal performance and are calibrated for each application ensuring your vehicle meets the strict emission standards set by the manufacturer. Details such as a contaminant bypass port (when applicable) and protected circuitry providing a durable and reliable product approved by OE manufacturers makes Hitachi’s air flow sensors the premium choice. New orignial equipment part Restores original drivability characteristics Meets the OE performance and durability standards for this application Precision manufactured and assembled sensing elements for accurate air flow measurements Built in contaminant bypass port provides reliable operation + +Price is $",120 +"How much does this cost to the nearest dollar? + +AmScope LED Cordless Stereo Microscope w/Top & Bottom Light Illumination System and 36 specimens +This cordless LED binocular stereo microscope comes with two pairs of stereo objective lenses mounted in a rotating nosecone, sturdy all-metal pillar stand, and a versatile illumination system that provides both incident (top) lighting and transmitted (bottom) lighting. You can choose between incident illumination shining down onto the object or transmitted illumination through the frosted stage plate. 
The first is used for the observation of three-dimensional objects and the second for the observation of slides. It comes with a rechargeable illumination system capable of taking rechargeable AA batteries, and an AC adapter/charger. This microscope offers high resolution and good depth within a broad field of view. It gives sharp clear stereo images. Its 45 + +Price is $",300 +"How much does this cost to the nearest dollar? + +Front Left Driver Side Window Regulator - Compatible with Kia Optima +Front Left Window Regulator - Compatible with 2014 - 2015 Kia OptimaPosition Front LeftNote Includes Module PanelCompatible With or Fits Note w/ USA Built - 2014 - 2015 Kia Optima EX - 2014 - 2015 Kia Optima EX Luxury - 2014 - 2015 Kia Optima LX - 2014 - 2015 Kia Optima SXNote - 2014 - 2015 Kia Optima Limited - 2014 - 2015 Kia Optima SX Turbo - 2014 - 2015 Kia Optima SXL Turbo Includes Module Panel Compatible with or fits (Note w/ USA Built; 2014 - 2015 Kia Opt + +Price is $",45 +"How much does this cost to the nearest dollar? + +Premium Replica Hubcap Set, Fits Nissan Rogue Replacement Wheel Covers +This is a set of 4 Brand New replica Nissan hubcap. Fits 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Rogue. Silver-painted. This is a copy of a factory-original. Our Replica wheel covers are made of sturdy ABS plastic and feature a rich silver finish, just like the originals. They will look great on your vehicle for years to come. Brand New Condition Aftermarket replacement for Nissan part Fits Nissan Rogue 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 model years. Mounts easily and securely to + +Price is $",66 +"How much does this cost to the nearest dollar? 
+ +Excellerations Phonics Spelling Game for Kids and Classrooms Classroom Activity (12 Game Boards) (Item # PSG) +Excellerations Early Language, Phonics Spelling Game, Kids Educational Toy, Ages 3 Years and Up (Item # PSG) CLASSROOM ESSENTIAL This spelling game is perfect for those ages 3 and up to learn about how to sound out words EASY TO USE This activity comes equipped with a helpful activity guide for tricks to teach about phonics and words INTERACTIVE LEARNING This phonics spelling game will get students interacting with their learning while developing spelling and word association skills DURABLE DESIGN This spelling game is made with reusable game boards and durable foam letter tiles so that it can be used again and again ENHANCE CURRICULUM Build on + +Price is $",60 +"How much does this cost to the nearest dollar? + +RC4WD BigDog Dual Axle Scale Car/Truck Trailer Electric Car/Truck Option Parts +Key Features Hand made tube trailer chassis Billet aluminum wheels 1.55 Dirt grabber tires Steel leaf spring Working lift jack Steel tool box for battery and light switch Working lights Easy clip hitch mount Two steel ramps Ramp holders Whitebone inspired design Weight Length 22.63 Width 12.28 Height 5.31 Inside of the Deck Length 15.9in / 404mm Inside of the Deck Width 8.5in / 216mm Tail Plate Length 6.1in / 155mm Tail Plate Width 2.55in / 65mm Overview This is a 1/10 Car or Truck Hauler. Perfect for towing around your rig to + +Price is $",100 +"How much does this cost to the nearest dollar? + +Unknown Stage 2 Clutch Kit - Low Altitude +Amazing all-around performance gains. Easily adjustable for altitude, modifications and more. Superior weight profile offers better acceleration. Goldstar weights adjust easily with magnets. Custom angle cut helix. Comes with two washers to prevent spring bind and free up clutch movement for faster shifts. 
Includes primary and secondary springs.This item fits the following vehicle applications compatible with Polaris 600 Rush PRO-S with Polaris 600 Rush XCR with Polaris 600 SwitchBack Adventure with Polaris 600 SwitchBack Assault 144 with Polaris 600 SwitchBack PRO-S with Polaris 600 SwitchBack SP 144 with Polaris 600 SwitchBack XCR with Polaris 800 Rush PRO-S with Polaris 800 Rush PRO-S LE with Pol + +Price is $",99 +"How much does this cost to the nearest dollar? + +Dodge Ram 1500 Mopar 4X4 Emblem - +BRAND NEW AND MOPAR GENUINE 2007 2008 2009 2010 Dodge Ram 1500 2500 3500 4X4 Logo Emblem Decal Genuine MOPAR Part Number Oe Spec Or Performance/Custom OE Spec, Manufacturer Warranty 2 Year, Modified Item No Manufacturer Part Number Model Ram 1500 2500 3500, Brand Compatible with Mopar Returns Accepted Returns Accepted, Non-Domestic Product No, Make Compatible with Dodge Fitment Type Direct Replacement, Model Year Manufacturer Mopar, Brand Mopar, Weight 1 pounds, Dimensions 6 x 6 x 6 inches, model number Manufacturer Part Position Rear, Available + +Price is $",60 +"How much does this cost to the nearest dollar? + +Pro Comp Alloys Series 89 Wheel with Polished Finish (16x8 +Pro Comp Alloys are designed using State-Of-The-Art Low-Pressure-Casting Technology providing unsurpassed wheel strength, style and value. Pro Comp Alloy Wheels combine head turning style, light weight, durable finish in black, graphite, milled, chrome, polished and dual-tone finishes. Pro Comp Alloys allows for massive brake clearance for todays performance Jeeps, trucks and SUVs. 108 inches Bolt Pattern 6x5.5 inch Back Space 4.5 inch Size 16 inches X 8 inches, Brand Pro Comp Alloys, Wheel Size 16 Inches, Pitch Circle Diameter 139.7 Millimeters, Weight 26 Pounds, Diameter 16 Inches, Vehicle Service + +Price is $",300 +"How much does this cost to the nearest dollar? 
+ +Detroit Axle - Front Rear Strut & Coil Spring Assembly Replacement for Toyota Camry 2.2L Models - 4pc Set +Kit Includes 1x Complete Front Strut & Coil Spring Assembly - Driver Side - 171956 1x Complete Front Strut & Coil Spring Assembly - Driver Side - 171956 1x Complete Front Strut & Coil Spring Assembly - Passenger Side 1x Complete Front Strut & Coil Spring Assembly - Passenger Side 1x Complete Rear Strut & Coil Spring Assembly - Driver Side - 171958 1x Complete Rear Strut & Coil Spring Assembly - Driver Side - 171958 1x Complete Rear Strut & Coil Spring Assembly - Passenger Side - 171957 1x Complete Rear Strut & Coil Spring Assembly + +Price is $",300 +"How much does this cost to the nearest dollar? + +ECCPP Rear Wheel Axle Replacement fit for for Honda Sportrax 2009 +This axle works on the following models 2002 for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for HondaSportrax for for for for Package including 1 piece of Rear Wheel Axle Fitment - for Honda Sportrax 2009 Length - 855 mm, brand new complete rear wheel axle OE quality - Produced in the same specifications and functions as OE. Refer OE number Premium Material - Long service life with high quality raw material and the complete polishing Installation - Replace directly with assembly, easy to install. Brand ECCPP, Weight 15.22 pounds, Dimensions 35 + +Price is $",300 +"How much does this cost to the nearest dollar? + +Dell Latitude E6520 Intel 8GB RAM 500GB HDD Win 10 Pro DVD-RW (Renewed) +Keep up with business wherever you are with the Latitude E6520 laptop. It is ideal for professionals looking for a stable and durable laptop that is light and easy to carry on the go. Specifications Processor Intel Quad Core up to 3.3 GHz Graphics Intel HD Integrated Graphics Memory 8G DDR3 Hard Drive 500G Webcam Webcam Operating System Windows 10 Pro 64 Bit Multi-Language. 
Ports Network connector USB 2.0 (4) – 1 USB/eSATA combo, Stereo headphone/Microphone combo jack, 1394, Docking Connector, VGA, HDMI. Warranty 1 full year Parts and Labor Warranty Included in the box Computer + +Price is $",310 +"How much does this cost to the nearest dollar? + +F FIERCE CYCLE 251pcs Black Universal Motorcycle Fairing Body Bolts Kit Fastener Clips M5 M6 Screws Nuts +Includes hardware for all fairing pieces including, front, mid, lowers, rear, windscreen, and more. Fitment for Honda, for Kawasaki, for Yamaha, for Suzuki This kit is a common size on most motorcycles but it remains the responsibility of the buyer to check the appropriate size fittings are ordered. Simply replace the existing stock fairing bolts with these bolts. Package Includes 5 x Bolt (M6 x 40mm), 20 x x 16mm), 20 x x 20mm), 40 x x 16mm), 8 x x 25mm), 17 x Self Tapping x 12mm + +Price is $",40 +"How much does this cost to the nearest dollar? + +Flash Furniture 4 Pk. HERCULES Series 880 lb. Capacity Black Plastic Stack Chair +When in need of a space-saving seating solution that is either permanent or temporary, stack chairs have been proven to be beneficial. Stack chairs are a popular choice for many businesses that include hotels, schools, restaurants, cafeterias, and offices. This industrial looking chair hits the mark on comfort. This chair features a carrying handle to easily transport. This versatile chair is ideal for both indoor and outdoor functions. With the ability to quickly store the chairs, it allows for the space to be used again for other purposes or when cleaning is needed. This heavy duty plastic stack chair is sturdy in construction to withstand regular use and frequent stacking. To make transporting even easier, equip yourself with the appropriate sized + +Price is $",60 +"How much does this cost to the nearest dollar? 
+ +B&M 30287 Throttle Valve/Kickdown Cable, Silver/Black +This throttle valve / kickdown cable is adjustable, so it can accommodate most TH350 applications. It's handy as a straight replacement for an old OE model, and less expensive. If your project includes a transmission swap, you will appreciate the universal sizing during installation. Adjustable for a universal fit in 95% of all listed transmission applications Eases the installation process for transmission swaps Less expensive than OE models For use with TH350 transmissions Backed by the manufacturer with a 1 year limited warranty Brand B&M, Color Silver/Black, Pieces 1, Special Feature Easy to Install, Included Components Screw, Weight 8 ounces, Unit Count 1.0 Count, s 1, Manufacturer B&M, model + +Price is $",70 +"How much does this cost to the nearest dollar? + +Gates TCK226 PowerGrip Premium Timing Belt Component Kit +Gates is the world's leading manufacturer of timing belts and Timing Component Kits. We designed our kits for virtually every vehicle make and model so technicians can conduct complete system replacements and streamline parts sourcing. As the Original Equipment Manufacturer (OEM) for vehicle manufacturers globally, our Timing Component Kits are OE-equivalent or better in service life, quality, and performance. Total solution for any application TCK includes belts, idler(s), tensioners, tensioner springs, supporting hardware, detailed installation instructions and Technical Service Bulletins for troublesome applications Popular kits covering domestic and import (European and Asian) vehicle applications Designed for convenience, easy parts sourcing and reduced customer comebacks Manufacturer Gates, Brand Gates, Model PowerGrip Premium Timing Belt Component + +Price is $",60 +"How much does this cost to the nearest dollar? 
+ +Monroe Shocks & Struts Quick-Strut 171491 Strut and Coil Spring Assembly +Featuring a vehicle-specific design, Monroe Quick-Strut strut assemblies are fit checked, ride tested and engineered to restore factory ride height and ride performance. Assembled in Paragould, AR, they include all required components in a single unit. QUICKER, SAFER, EASIER AND COMPLETE REPAIR -- Includes everything you need for strut replacement in a single, fully assembled unit with no need for a spring compressor RESTORES RIDE HEIGHT -- Precisely calibrated to meet the OE design, each application-specific coil spring type is engineered to restore ride height and support the vehicle's weight VEHICLE-SPECIFIC DESIGN -- Application-specific coil spring, mount and strut designs ensure optimized ride + +Price is $",70 +"How much does this cost to the nearest dollar? + +Feit Electric 35W EQ DM MR16 LED Light Bulb, 6 Bulbs +This Feit Electric equivalent traditional glass MR16 flood LED light bulb has a GU10 base. Featuring bright white and high 90+ CRI (color rendering index) rating this Enhance LED is our highest quality energy efficient light with bolder color rendering and enhanced contrast so people and objects appear more realistic and vibrant. This MR16 reflector produces a similar light output while using less energy than a standard incandescent light bulb. The dimmable light has an average life of 25000 hours / 22 years and is safe for indoor or outside use. Choose a dependable high quality 120 volt MR16 bulb for residential or commercial applications. Specifications 💡 Color temperature 3000K + +Price is $",70 +"How much does this cost to the nearest dollar? + +Yellow Jacket 2806 Contractor Extension Cord with Lighted End; 100 ft; 100 Ft +Product Description Yellow Jacket 2806 10/3 Heavy-Duty 15-Amp SJTW Contractor Extension Cord with Lighted End, Super flexibility in cold and hot weather. 
Power Lite power indicator lamp glows through the extra heavy, clear molded plug when the cord has power. Three times as abrasion resistant as standard vinyl, making these cords the toughest on the jobsite. Extra heavy, clear molded plugs are rugged, durable and oversized. Meets OSHA specifications, UL Listed. 10 Gauge. The Yellow Jacket (R) brand is a registered trademark of Coleman Cable Inc. From the Manufacturer Yellow Jacket 2806 10/3 Heavy-Duty 15-Amp SJ + +Price is $",99 +"How much does this cost to the nearest dollar? + +Garage-Pro Tailgate SET Compatible with Chevrolet Silverado 1500, Fits 2007 Chevrolet Silverado 1500 Classic, 1500 HD Classic, 2500 HD Classic, 3500 Classic Fleetside/Styleside +Manufactured from high quality materials Manufactured from high quality materials Easy to install; replaces old or damaged part Easy to install; replaces old or damaged part This is an OE replacement item This is an OE replacement item Garage-Pro is the most affordable brand for your old, worn-out, or damaged factory part! This premium quality replacement part is made to give your car, truck, and SUV that original factory look and performance. Available for different applications, our Garage-Pro part will surely fit right to your vehicle. Comes with 1-year unlimited mileage warranty! + +Price is $",80 +"How much does this cost to the nearest dollar? + +3M Perfect It Buffing and Polishing Kit | 36060 06094 06068 3M Rubbing Compound, Machine Polish, Ultrafine Polish | Buffing Compound, Car Polishing Kit | Bundled with Kangaroobands Microfiber Cloth +The 3M Perfect-It Paint Finishing System 3M Perfect-It EX AC Rubbing Compound is the best-performing rubbing compound for removing scratches and surface defects before polishing, even on the latest clear coats. Longer Working Time, Easier Cleanup Even in Extreme Conditions As part of a complete system for creating showroom-grade finishes, it is the ideal compound for the critical pre-polishing stage in collision repair. 
This fast-cutting, fine-finishing compound removes fine grade (P1200 or finer) sand scratches and + +Price is $",40 +"How much does this cost to the nearest dollar? + +Chinese Style Dollhouse Model DIY Miniature Furniture Kit Wooden Tea Shop Dolls House with LED Lights Accessories Hand Craft Puzzle Toy Birthday Gift +Feature This dollhouse makes a great craft project and gift for both friends and collectors! The pictures shows finished project. You receive are spare parts,Mainly through, paste, assembly, modeling, placement DIY craft, complete your lovely beautiful house.Glue and Battery are not included.Detailed pictures instructions. ( Just follow the pictures! )Description Assembly Difficulty Level Time 2-10 hoursFinished Size as picture showsWeight Approximate Include 1 x DollhouseNote 1.The real color of the item may be slightly different from the pictures shown on allow error due to the hand measurement. 3.Due to long shipping, the item may damage in transit, if + +Price is $",40 +"How much does this cost to the nearest dollar? + +Generic NRG Innovations Steering Wheel Short Hub Adapter Kit + LED Keychain Flashlight, black +NRG Innovation has developed another complement to our quick release steering kits. These units were designed specially for an aftermarket steering wheel installed with the quick release kit still mounts in the same location, not too close to the driver. Made from the highest quality aluminum. Our Short Hubs are made to work with our quick release's. This product is designed utilizing one piece solid construction for the maximum in durability and usability. Made of High Quality Aluminum Direct Bolt-on Design, Perfectly fits Any Wheel or Quick Release with a 6-Bolt X 74MM Pattern Anodized for Durability and Strength Racing Style, for Most Aftermarket Racing Brand Steering Wheel Manufacturer NRG Innovations, Brand NRG Innov + +Price is $",50 +"How much does this cost to the nearest dollar? 
+ +Learning Resources Coding Critters Ranger & Zip,22 Piece Set, Ages 4+, Screen-Free Early Coding Toy for Kids, Interactive STEM Coding Pet, Gifts for Boys and Girls,Back to School Gifts +Meet the Coding critters your first coding friends. These playful puppy pets bring early STEM concepts to preschool learning through 100% screen-free coding. Kids code along with their new pets’ storybook adventure, and help the brave Ranger and mischievous zip have a playtime they'll never forget. Each storybook coding challenge unfolds in the Coding critters' Fun pet playset - can you code Ranger to play hide and seek, fetch a ball from the tennis ball launcher, or catch zip after a ride down the slide? In addition to following along with the storybook's coding challenges, + +Price is $",60 +"How much does this cost to the nearest dollar? + +Bosch Automotive 15463 Oxygen Sensor, OE Fitment (Mazda) +Premium Bosch oxygen sensors promise better quality, better overall OE Fit/Form/Function while ensuring better coverage against the competition. Premium Bosch oxygen sensors are designed to improve fuel economy. Vehicles utilizing Bosch premium oxygen sensors experience better engine performance. Premium Bosch oxygen sensors assist in cleaner exhaust emissions. Brand Bosch Automotive, Dimensions LxWxH 2.2 x 1.97 x 5.83 inches, Weight 0.25 Pounds, Style Modern, Mounting Type Flange Mount, Specific Uses For Product Oxygen Sensor, Manufacturer Bosch Automotive, Dimensions 2.2 x 1.97 x 5.83 inches, Country of Origin United Kingdom, model number 15463, Is Discontinued + +Price is $",70 +"How much does this cost to the nearest dollar? + +Case of 24-2 Inch Blue Painters Tape - 60 Yards/roll +Case of 24 rolls of painters tape bulk packed for easy use and access. Each roll is 1.88 inches by 60 yards of masking tape. Professional grade tape is flexible, leaves no sticky residue behind, prevents paint bleed, removes without damaging surface, and gives clean edges. 
Use for every kind of painting, trimming edging, masking. or protecting. Brightly colored tape works well with delicate and bold paint colors. Durable, strong tape sticks to a variety of clean / dry surfaces. Apply pressure when adhering tape for the cleanest lines as adhesive is pressure sensitive and heat activated. Ideal for use in temperatures from 40 to 130 degrees Fahrenheit. Made in the USA. + +Price is $",70 +"How much does this cost to the nearest dollar? + +MOCA Engine Water Pump & Fan Clutch fit 04-07 for Buick Rainier & 02-09 for Chevrolet Trailblazer & 02-09 for GMC Envoy & 02-06 Envoy XL & 04-05 Envoy XUV & 03-07 for Isuzu Ascender 4.2L +Please confirm this item fits for your vehicle before purchasing (Check Fitment Data Above or see description below) Package Includes 1 Water Pump, 1 Fan Thermostat Housing Assembly Part Numbers 33939, All the Components are produced under strictly observed and meet or exceed OEM performance requirements in Manufacturing and Material Local US friendly after-service team to resolve your issues in time, parts have 2 years or 40000 miles warranty Manufacturer OELINE Auto + +Price is $",110 +"How much does this cost to the nearest dollar? + +SAREMAS Foot Step Bars for Hyundai Palisade 2023 Running Boards Side Steps nerf bar Pedal Protector +The price for one pair(left and right running board) Don't drill,use the factory hole Main raw material high quality Aluminum&ABS ect Including brackets and mounting parts For Hyundai Palisade 2020 2021 2022 2023 Manufacturer Donarrw, Brand SAREMAS, Weight 32.3 pounds, Dimensions 81 x 11 x 8 inches, Exterior Aluminum, Manufacturer Part PATXTB, Rank Automotive Running Boards 6648, Available February 18, 2020, Material Aluminum, Acrylonitrile Butadiene Styrene (ABS), Exterior Finish Aluminum, Vehicle Service Type Passenger Car + +Price is $",100 +"How much does this cost to the nearest dollar? 
+ +Gretsch G9210 Square Neck Boxcar Mahogany Resonator Acoustic Guitar +Classic Squareneck Resonator from Gretsch Neck; Padauk Fingerboard; and Hand-spun Cone - Mahogany Natural Acoustic Squareneck Resonator Guitar with Mahogany Top Sides Weight 10 pounds, Dimensions 20 x 7 x 48 inches, model number Rank Musical Instruments 50797, Acoustic Resonator Guitars 12, Is Discontinued No, Available February 5, 2018, Back Material Mahogany, Body Material Mahogany, Color Name Natural, Fretboard Material Padauk, String Material Phosphor Bronze, Top Material Mahogany Wood, Neck Material Type Mahogany, Strings 6, Brand Gretsch, Color Natural + +Price is $",350 +"How much does this cost to the nearest dollar? + +NikoMaku Mirror Dash Cam Front and Rear OEM Design Backup Camera for Cars 4K Resolution Type-C 11 Inch Full Touch Screen Rear View Mirror Camera 170° Wide Angle Dual Cameras Waterproof AS5 Pro +4K Resolution The AS5 Pro mirror dash cam delivers clear video with its 4K front camera and 1080P rear camera. Equipped with 170° wide-angle front lenses, this camera can capture high-quality footage day or night. The mirror dash cam records in real-time and boasts an enhanced imaging system for superior image quality. With its 4K resolution, every detail on the road is vividly displayed. OEM Look Design The supplied bracket allows for a complete replacement of your existing rear-view mirror. Say goodbye to shaky footage while driving, as the bracket effectively + +Price is $",360 +"How much does this cost to the nearest dollar? + +Fenix HP25R v2.0 USB-C Rechargeable Headlamp Bundle with Backup Battery, 1600 Lumen Spotlight, 400 lumens Floodlight and Red Light with LumenTac Organizer +HIGH-PERFORMANCE - The Fenix HP25R v2.0 headlamp emits up to 1600 lumen spotlight reaching 317 yards. You can also switch to a wide-angle floodlight, or an auxiliary red light to preserve the night vision. USB-C RECHARGEABLE - via its built-in charging port. Includes a high capacity battery. 
Runs up to 400 hours on the lowest mode. DESIGN FOR COMFORT -The HP25R v2.0 keeps the battery compartment in the rear to maintain a balanced weight. Also comes with cable clips + +Price is $",200 +"How much does this cost to the nearest dollar? + +R&L Racing Heavy Duty Roll-Up Soft Tonneau Cover Compatible with 94-02 Dodge Ram Regular/Club/Quad Cab 6.5' 78 Bed +R&L Racing Roll Up Tonneau Cover. Get effective bed protection, upgraded appearance, and even improved fuel economy, all at a budget price, with the R&L Racing Roll-Up Tonneau Cover. The vinyl cover will protect your truck bed and contents from the elements, and give your truck a smooth aerodynamic appearance that will even reduce drag for more miles per gallon. It features the quick and easy Clean-Seal closure system, lightweight aluminum rails and bows, and easy no-drill installation. Cargo in an unprotected pickup truck bed can become moisture damaged and corrode from exposure to rain and snow, + +Price is $",350 +"How much does this cost to the nearest dollar? + +Garmin GPSMAP 64sx, Handheld GPS with Altimeter and Compass, Preloaded With TopoActive Maps, Black/Tan +Navigate your next adventure with the GPSMAP 64sx handheld navigator series. Whether you’re hiking, cycling, geocaching or climbing, you are free to explore more with the reliable Garmin handheld navigation in the palm of your hand. And the series now has multi-GNSS support and Topo Active mapping. Rugged and water-resistant design with button operation and a 2. 6” sunlight-readable color display Preloaded with Topo Active maps (U. S. and Australia only) featuring routable roads and trails for cycling and hiking Know where you’re at with a high-sensitivity receiver with quad helix antenna and multi-GNSS support + +Price is $",200 +"How much does this cost to the nearest dollar? 
+ +Brown 5-7/8 X 8-1/2 X 3/16 Thick Heavy Duty Felt Sheets - 12 Pcs +Protect your beautiful laminate, ceramic, vinyl or hardwood flooring as well as your precious furniture, with our Heavy Duty Felt Pads made of 100% polyester felt. These brown protector pads are designed to blend in with dark furniture to compliment your home decor. Simply peel and stick them to lamps, furniture and small appliances to protect tabletops, shelves, desks, floors and countertops. These can also be used to provide a cushioning layer between glass tabletops and pedestals or frames. Or place them on cabinet doors to reduce noise when they're closed. Trim them into the exact shape and size you need, and prevent scratches or damages anywhere + +Price is $",120 +"How much does this cost to the nearest dollar? + +GAOMON PD2200 Pen Display & 20 Pen Nibs 8192 Tilt-Support Full-Laminated Graphics Drawing Monitor Tablet for Digital Drawing/Animation/Online Teaching and Meeting +GAOMON PD2200 PEN DISPLAY + 20 PEN NIBS FOR ONLINE EDUCATION & MEETING You can use PD2200 pen monitor for online education and remote meeting. It works with most online meeting programs, like Zoom, and so on. FOR DIGITAL ART & CREATION -- It's not only for amatuer but also for professionalists for digital drawing, sketching, graphics design, 3D art work, animation, etc. FOR ANNOTATING AND SIGNATURE --It is also broadly used in annotating and signing files WITH AG-FILM PRE-APPLIED + +Price is $",80 +"How much does this cost to the nearest dollar? 
+ +VXMOTOR for 97-03 Ford Lightduty 4WD for 99-03 F150 Lightduty F150 Super Crew Cab/04 F150 Heritage for 97-02 Expedition 4WD for 99-02 Expedition 2WD Matte Black Heavyduty Bull Bar +Application for Ford F150 / F250 Lightduty 4WD ( 4 Wheel Drive ) Models, for Ford F150 Lightduty 2WD ( 2 Wheel Drive ) Models, for Ford F150 Super Crew Cab Models, for 2004 Ford F150 Heritage Models, for Ford Expedition 4WD ( 4 Wheel Drive ) Models, for Ford Expedition 2WD ( 2 Wheel Drive ) Models Front Bumper Bull Bar Guard Heavy Duty Steel With Flat Black Fine + +Price is $",100 +"How much does this cost to the nearest dollar? + +HP EliteBook 2540p Intel Core X2 2GB 160GB DVD+/-RW 12.1'' Wi, Black (Refurbished) +Standing screen display size 12.1 Inches, Processor RAM 2 GB DDR3, Hard Drive 160 GB, Graphics Coprocessor Intel HD Graphics, Chipset Brand Intel, Card Description Integrated, Wireless Type Bluetooth, USB 2.0 Ports 3, Brand HP, Microsoft, Series HP EliteBook, model number Operating System Windows 8 1, Weight 3.97 Pounds, Dimensions 19 x 17 x 5 inches, Rear Webcam Resolution 1 MP, Processors 2, Computer Memory Type DDR3 SDRAM, Flash Memory Size 160 GB, Power Source Battery Powered, Available + +Price is $",199 +"How much does this cost to the nearest dollar? + +Green EPX Mixing Nozzles 3M 50ml Duo-Pak Adhesive Cartridges (Longer 4.5in, 1 1 & 2 1 ratios) +This is a of our Atlas Professional Green Screw-On Mixing Nozzles for the New 3M 1 1 and 2 1 ratio 50ml Duo-Pack Cartridge Design (also called a B-System design with a large gray screw-off cap). These are the longer mixing nozzles, which are preferred for most Urethane and many Epoxy adhesives that require more mixing elements to properly mix. They are also in the high-efficiency quadro style, which reduces wasted material by about 50% vs traditional helix nozzles. They reduce wasted + +Price is $",40 +"How much does this cost to the nearest dollar? + +Box Partners 6 1/4 x 3 1/8 13 Pt. 
Manila Shipping Tags - Pre-Wired +Box Partners G10083 6 1/4 x 3 1/8 13 Pt. Manila Shipping Tags - Pre-Wired 6 1/4 x 3 1/8 13 Pt. Manila Shipping Tags - Pre-Wired Dimensions L x W x H 1.5 x 1.5 x 1.5 inches, Weight 1 Pounds, Dimensions LxWxH 1 x 1 x 1 inches, Weight 1 Pounds, Brand Name Aviditi, Model Name Color Manila, Material Blend, Suggested Users unisex-adult, s 1, Manufacturer BOX Partners LLC + +Price is $",10 +"How much does this cost to the nearest dollar? + +Vixen Air 1/2 NPT Air Ride Suspension High Flow Electric Air Valves/Solenoids 250 PSI Four Corners with Fittings and Hoses +These eight powerful ½ NPT air valves with exceptional high flow deliver unparalleled performance for 12V vehicles. The unique design provides high pressure control at minimal power consumption. Valves support pressures of up to 250 PSI and are constructed with high quality brass to ensure continuous use through extreme conditions. Solenoid's DIN connector is water and dust resistant, a metal mounting bracket is included for each valve, and the air flow direction is clearly marked with an arrow to provide an easy trouble-free installation. Premium brass fittings, flow control valves, pressure switch, drain valve, hoses and cutter are included in this kit. ½ + +Price is $",80 +"How much does this cost to the nearest dollar? + +Smart Floor Lamp, Multicolors Scene DIY Torch Floor Lamp, 24W 2400LM Dimmable Tall Standing Lamp work with Alexa Google Home,Wifi Remote Control RGB Floor Lamp For Living Room +Smart Control💡Control this smart floor lamp using the Smart Life app or your voice with Amazon Alexa or Google Home. You can group multiple lamps together and control them individually or together. Choose from 16 million colors and 12 scenes to create the perfect lighting for any occasion. Note The lamp only works with 2.4GHz Wi-Fi networks. Adjustable Lighting💡 This floor lamp features a range of white color temperatures from 2700K to 6500K and single color RGBWW options. 
The lamp is also dimmable and uses high-quality LED chips with a C + +Price is $",99 +"How much does this cost to the nearest dollar? + +SOZG 324mm Wheelbase Body Shell RC Car Body Shell Super Hard Plastic Black with Screw for RC Vehicle, SOZGpuFdVe +Specification Item Type Body ShellProduct Material Rigid plasticWeight Approx. 1190g / BlackWheelbase Size For 1/10 RC for, for Axial List 1 Set x Body Shell (81 Bags x ScrewNote 1. Manual measurement, please allow 1‑3mm error, thank you!2. Due to the difference between different monitors, the picture may not reflect the actual color of the This car shell is suitable for 324mm wheelbase chassis, if it is installed on other chassis, the wheelbase needs to be adjusted to Shipped in bulk, assembled by the customer (the door cannot be opened). + +Price is $",30 +"How much does this cost to the nearest dollar? + +Mickey Thompson ET Street S/S Racing Radial Tire - +A D.O.T. approved street tire which provides excellent traction at the strip. Proven polyester-ply, steel belted, tubeless radial construction provides strength & durability for excellent ride control on the street Proven R2 compound provides quick and consistent traction at the STRIP with little burnout required Minimal tread void for excellent dry traction, strategically placed to aid in hydroplane resistance 18 popular sizes for 15- to wheel diameters DO NOT USE ON DYNO Brand Mickey Thompson, Seasons Year Round, Size Section Width 275 Millimeters, Ply Rating Polyester, Tire Diameter 25.9 inches, Weight 30.95 Pounds, Manufacturer Mickey Thompson, Model ET Street S/S, model number Is Discontinued No, Manufacturer + +Price is $",300 +"How much does this cost to the nearest dollar? 
+ +Pirelli 106W XL RFT P0 +Product Type Vehicle Tire Package Dimensions 10.9 L X29.0 W X29.0 H Country Of Origin Mexico Package Weight Fit type Universal Fit Brand Pirelli, Seasons Year Round, Size Section Width 275 Millimeters, Load Capacity 2094 Pounds, Tread Depth 9 32nds, Tread Type Asymmetrical, Ply Rating XL, Tire Diameter 28.66, Weight 36 pounds, Manufacturer PIRELLI, Model P Zero PZ4 Run Flat, model number Is Discontinued No, Manufacturer Part OEM Part Special Features Run_flat, Construction Radial, UTQG Rank Automotive Passenger Car Performance Tires 722, Available August 5, 2017, Rim Size + +Price is $",350 +"How much does this cost to the nearest dollar? + +Torklift C3212 Rear Tie Down +Fits 11-14 Chevy/GMC 2500 / 3500 HD (Crew / Ext. Cab ONLY) with factory hitch 11-14 Chevy/GMC 2500 / 3500 HD (Crew / Ext. Cab ONLY) with factory hitch 11-13 Chevy/GMC 2500 / 3500 (Regular Cab ONLY) with factory hitch 2014 Chevy/GMC 1500 4wd (Crewcab) with factory hitch 2014 Chevy/GMC 2500 4wd (Crewcab) with factory hitch 2014 Chevy/GMC 2500 4wd (Regular Cab) with factory hitch 2014 Chevy/GMC 3500 4wd ( + +Price is $",200 +"How much does this cost to the nearest dollar? + +Cardone Remanufactured Ford Computer +CARDONE Remanufactured Electronic and Powertrain Control Modules are designed to meet or exceed O.E. performance. Reverse engineering provides insight into how and why the unit originally failed, allowing our engineers to identify and correct original design weaknesses. All critical components are re-soldered or replaced at our Philadelphia manufacturing plant, and each unit is 100% computer tested to ensure reliability. CARDONE is committed to getting your vehicle back to peak performance. 
On-car vehicle validation testing ensures product fits and functions properly OE components with high failure rates are 100% replaced All electronic modules are 100% tested to ensure they meet OE requirements for the application Advanced robotic equipment ensures precision made units and consistent high quality with every part Every unit is 100% tested to ensure + +Price is $",199 +"How much does this cost to the nearest dollar? + +Kidde AccessPoint 001798 Supra TouchPoint Lock +From the Manufacturer TouchPoint lock is designed to replace a standard cam lock in a variety of metal storage cabinets or enclosures sized with 5/16 inch square-hole cams. Solid die-cast body with a 10 digit changeable combination and a clutch mechanism to turn the cam. Can be mounted on top of the door surface or flush-mounted into the door. User changeable combination lock with push button combination is designed to replace a standard cam lock Door lock can easily change keyed cabinets to pushbutton locks; for use with items with 5/16 inch cams Combination lock features heavy-duty die-cast construction; great for metal cabinets and other enclosures Mounts flush into a door, or on top of a door surface; clutch + +Price is $",99 +"How much does this cost to the nearest dollar? + +3M Protecta Self Retracting Lifeline Rebel 6' (18M) Web Twin, Steel Rebar and Carabiner, Black/Red +Our Protecta personal self retracting lifelines (SRL’s) represent a major improvement in economy line SRL’s. Employers can economically replace simple lanyards with the versatility and added safety of a 6 ft. (1.8m) SRL. Protecta personal SRL’s are ergonomically designed for ease of use and are ideal for direct connection to most harnesses. The compact and lightweight design is barely noticeable on your back and stays out of the worker’s way. In addition, tension is always kept on the lifeline, which reduces dragging, snagging and trip falls. 
Whether your application requires + +Price is $",20 +"How much does this cost to the nearest dollar? + +Plantronics Wired Headset, Black, 7 x 5.4 x 2.2 inches +The next generation of our most popular over-the-head monaural headset. Completely re-imagined for the demands of the modern customer service center and office. Features soft ear cushions for all-day wearing comfort, metal joints that deliver durability and reliability and a flexible mic with visual and tactile positioning guides for precise positioning and clearer conversations. Frequency response - up to 6,800 Hz Dimensions 7 x 5.4 x 2.2 inches, Weight 4.8 Ounces, Manufacturer PLANTRONICS, INC., model number Rank Computer Headsets 370, Is Discontinued No, Available October 30, 2014, Units 1.0 Count + +Price is $",40 +"How much does this cost to the nearest dollar? + +Logitech K750 Wireless Solar Keyboard for Windows, 2.4GHz Wireless with USB Unifying Receiver, Ultra-Thin, Compatible with PC, Laptop - Black +Product Description Battery hassles are a thing of the past with the solar-powered Logitech Wireless Solar Keyboard K750. It charges itself whenever there's light, so you can say good-bye to batteries, power bricks and charging cables. With sleek lines and a thin profile, this stylish, streamlined keyboard adds style to your workspace. Combining the best of traditional keyboards, laptops and a Logitech-only concave key cap design, you'll enjoy faster, quieter, feel-good typing-hour after hour. Plus, you'll get Logitech Advanced 2.4 GHz wireless and the tiny Logitech Unifying receiver. From the Manufacturer + +Price is $",85 +"How much does this cost to the nearest dollar? + +Olympus PEN E-PL9 Body Only with 3-Inch LCD (Pearl White) +Introducing the PEN E PL9. It has everything to produce images you’ll be proud to share. There’s nothing to learn; just pick it up and let the on screen guides and built in settings make every shot perfect. 
Thanks to the powerful image stabilization system, you’ll easily shoot blur free stills and smooth 4K video, all handheld. With features like flip touchscreen, built in flash, Wi Fi and Bluetooth for easy sharing make the E PL9 your go to camera. 16 Megapixel live MOS sensor TruePic VIII Image Processor 3 180 Degree Flip down touch screen In body 3 axis image stabilization 4K video & still image capture from 4K + +Price is $",450 +"How much does this cost to the nearest dollar? + +Beck/Arnley Hub & Bearing Assembly +Since 1914, Beck/Arnley has focused on the customer, offering high quality parts that look and perform the same as the original part. This ideal has never changed. Today, Beck/Arnley is committed to being the premium supplier of high quality import parts within the automotive market. BeckArnley is an original equipment brand that partners with other manufacturers to supply the parts that cars were originally built with. This product is in a BeckArnley package, note that the part may have been manufactured by an independent BeckArnley supplier and the number on the part may differ from the number on the package. Quality construction Excellent materials Exacting tolerances Manufacturer Beck/Arnley, Brand Beck/Arnley, Weight 6.37 Pounds + +Price is $",120 +"How much does this cost to the nearest dollar? + +Eibach Pro-Kit Performance Springs Set Of 4 Compatible with Nissan Altima +Eibach production technology is recognized worldwide as leading its field, from our high-strength spring-steel alloys, our advanced CNC winding process, our high-quality corrosion protection and the legendary longevity of our components. High Performance Handling and Aggressive Good Looks. Each Spring Individually Tested Stop Quicker, Corner Faster and get Better MPG! Progressive Spring Design for Excellent Ride Quality. 
Manufacturer Eibach, Brand Eibach, Model Weight 24.8 pounds, Dimensions 24.7 x 14.7 x 7 inches, model number Exterior Machined, Manufacturer Part Rank Automotive Automotive Replacement Shocks 12221, Available December 9, 2019 + +Price is $",60 +"How much does this cost to the nearest dollar? + +LEGO DC Batman 1989 Batwing 76161 Displayable Model with a Buildable Vehicle and Collectible Figures Batman, The Joker – Mime Version and Lawrence The Boombox Goon, New 2021 (2,363 Pieces) +This is no kid’s toy. If you’re serious about BATMAN, comic book super heroes or making cool models, this LEGO DC BATMAN 1989 Batwing is for you! Recreate the authentic detail and gothic elegance of BATMAN’s iconic aircraft, the Batwing, with this LEGO brick build-and-display model. The impressive reproduction features realistic details, removable canopy, full interior, poseable flaps and a new special brick that will allow you to mount and display your model on your wall. There’s also a stand, nameplate + +Price is $",29 +"How much does this cost to the nearest dollar? + +Kingston Brass Restoration 4-Inch Centerset Lavatory Faucet with Porcelain Lever Handle, Brushed Nickel +Product Description Classic style. Two handle deck mount. 4 in. center set. Max 1.2 LPM water flow rate at 60 PSI. Integrated removable aerator. Drip-free ceramic cartridge system. Three hole sink application. 4.05 in. spout reach. 3 in. spout height. 4 in. center spread installation. 1/4 turn on and off water control mechanism. 1.05 in. spout clearance. Made from brass. Satin nickel finish. Made in Taiwan. From the Manufacturer Functional and Stylish Faucets Gives an Irresistible Beauty to the Bathroom. Design is Perfectly Co- + +Price is $",35 +"How much does this cost to the nearest dollar? 
+ +Polk Vanishing Series 265-LS In-Wall 3-Way Loudspeaker, Dual 6.5 Dynamic Balance Drivers & 1 Ring-Radiator Tweeter, Polk PowerPort Technology, Rotating Cam System for Easy Installation +Enjoy extraordinary audio performance for your movies, music and TV shows with the Polk Vanishing Series 265-LS 3-Way Loudspeaker that disappears into your wall and yet delivers impactful, room filling sound. The in-wall speaker is equipped with dual 6.5 Dynamic Balance Drivers for clear, accurate mids and dynamic lows, and a 1 Ring-Radiator Tweeter for incredible imaging. With Polk's Patented PowerPort Bass Technology, the speaker adds deep, rumbling bass to your audio, while minimizing unwanted resonances. + +Price is $",200 +"How much does this cost to the nearest dollar? + +Spec-D Tuning LED Projector Headlights Glossy Black Housing Smoke Lens Compatible with Subaru Impreza Outback Sport, Subaru Impreza WRX Left + Right Pair Headlamps Assembly +✔️ All of Our Items are 100% Brand New In Original Packaging! You Will Never Receive a Used Item From Us! Comes in a Pair (Driver Side Left & Passenger Side Right Included) ✔️ DOT and SAE Compliant. Made by an ISO Certified Manufacturer using Materials that meet or Exceed OEM Requirements! ✔️ Direct Bolt On Replacement From Your Original Headlights! No Wiring or Modifications Needed! No Installation Instructions Included, Professional Installation is Highly Recommended! ✔️ Products Undergo Strict Quality Control to Ensure it is Waterproof (fully sealed with solid silicon) & Impact/UV Resistant + +Price is $",300 +"How much does this cost to the nearest dollar? 
+ +RICHMOND & FINCH Airpod Pro Case, Green Leopard Full Protective Cover, Shockproof, Scratch Resistant, Wireless Charging Compatible Case for Airpods Pro +COMPATIBILITY This Richmond & Finch Airpod Pro Case is compatible with Airpods Pro Only PROTECTION Our Richmond & Finch Airpods Pro Case offers premium protection to your air pods pro with our shockproof protective cover, protecting your Airpod Pro from drops and knocks WIRELESS CHARGING The Richmond & Finch Airpods Pro Case is wireless charging compatible, so you can charge your Airpod Pros easily and quickly SCRATCH RESISTANT Our Richmond & Finch Airpod Pro Protective Cover is made from high quality scratch resistant materials, ensuring your Air Pods Pro are safe from any scratches or damage FASHION FORWARD All + +Price is $",30 +"How much does this cost to the nearest dollar? + +LFA Industries - mm Capacity, 33 Jacobs Taper Mount Plain Bearing Precision Crafted Heavy Duty All Steel, Keyed Drill Chuck with T5/k32 Chuck Key Included +LFA Industries Plain Bearing Precision Crafted Heavy Duty All Steel, Keyed Drill Chuck with T5/k32 Chuck Key Included, mm Capacity, 33 Jacobs Taper Mount. LFA Industries Plain Bearing Precision Crafted Heavy Duty Keyed Drill Chuck All Steel, Keyed Drill Chuck with Key Included LFA Industries mm Capacity, 33 Jacobs Taper Mount Manufactured To Last-For Quality and Excellence-Chucknology Made in France over 85 Years Manufacturer LFA Industries, Part Weight 2 pounds, Dimensions 2.88 x 1.67 x 2.88 inches, Country of Origin France + +Price is $",40 +"How much does this cost to the nearest dollar? 
+ +SAUTVS LED Headlight Assembly for Slingshot, Center Head Light Kit for Polaris Slingshot S GT R LE SL Modified Accessories, Replace OEM +Compatible with Polaris Slingshot S SL SLR R LE (Please refer to the compatible list in description) Plug & Play, perfect and accurate replacement for the original headlight without any changing or modifying, replace OEM The design of internal protection mechanism makes it no flickering or failure; IP67 waterproof and scratch resistant materials prevent from dust, mud, snow or heavy rain leaking in; Strictly follow the quality and safety standards, working in all the weather conditions High quality LED beads are used, long service time and life span; Brighter and concentrated light source, ensuring your driving safety Package include 1 set LED headlight assembly + +Price is $",100 +"How much does this cost to the nearest dollar? + +2 Pack Combo Womens Safety Glasses Impact Resistant Clear Smoke Lens +Package Includes 2 pairs of Womens Safety Glasses with Clear Lenses and Black Sunglasses Lens Assorted Color Temple Frames Available! Sizing Information Frame length – 6.25 in, Frame Width 5.4 in. Exceeds ANSI Z87.1+ Safety Standards. Shatter Proof Protection Our lenses offer 100% protection against glare and protection against UV/UVA/UVB rays. The Safety Glasses are also scratch-resistant, impact-resistant, and shatter proof. Keep your eyes safe during construction, metalworking, welding, woodworking, hunting, fishing, sports, shooting, and other activities outdoors. Impact Resistant Coating coating Package Includes 2 pairs of Womens Safety Glasses with Clear Lenses and Black Sunglasses + +Price is $",40 +"How much does this cost to the nearest dollar? + +Arepa - Venezuelan cuisine - Venezuela PopSockets PopGrip Swappable Grip for Phones & Tablets +Arepa Venezolana. Arepa - Venezuelan cuisine - Venezuela. Arepa - Venezuelan cuisine - Venezuela. Great gift for holidays, birthdays, events, parties and much more. 
Arepa - Venezuelan cuisine - Venezuela Great gift for holidays, birthdays, events, parties and much more. PopGrip with swappable top; switch out your PopTop for another design or remove it completely for wireless charging capabilities. (Not compatible with Apple MagSafe wireless charger or MagSafe wallet.) Expandable stand to watch videos, take group photos, FaceTime, and Skype handsfree. Advanced adhesive allows you to remove and reposition on most devices and cases. Note Will not stick to some silicone, waterproof + +Price is $",19 +"How much does this cost to the nearest dollar? + +Schlage Lock Company Padlock, 1-1/2 x 5/16, Brass +Schlage Commercial Padlock 5/16 Diameter with 1-1/2 Shackle and Keyway Schlage commercial grade padlock is designed for use in high risk locations Solid brass body resists corrosion for all-weather performance 1-1/2 in. x 5/16 in. molybdenum hardened steel shackle for increased cut resistance Double deadbolt locking mechanism provides extra security Re-key able Schlage cylinder Brand SCHLAGE, Special Feature Keyway, Lock Type Key Lock, Dimensions LxWxH 0.4 x 1.5 x 2.5 inches, Material Brass, Steel, Recommended Uses For Product Security, Color Brass + +Price is $",40 +"How much does this cost to the nearest dollar? + +Techni Mobili White Sit to Stand Mobile Laptop Computer Stand with Height Adjustable and Tiltable Tabletop +Techni Mobili Sit-to-Stand Rolling Laptop Stand offers an adjustable height mechanism that is compact, portable and is a perfect choice for a laptop or writing setup in a limited space. This Sit-to-Stand mobile laptop stand features a large tabletop with a tilt mechanism attached so it can be adjusted to your most comfortable working angle. It also features a safety edge-stopper to prevent objects from sliding down when tilted. The heavy-duty steel frame supports a sturdy structure, and the non-marking locking casters let you glide while maintaining the balanced level. 
𝐒𝐔𝐑𝐅𝐀𝐂𝐄 𝐌𝐀� + +Price is $",40 +"How much does this cost to the nearest dollar? + +Special Lite Products Contemporary Wall Mounted Mailbox with Rain Overhang Finish Oil Rubbed Bronze +The clean lines and minimal design of the Contemporary Horizontal Mailbox provide an immediate way to add a lovely and welcoming outdoor accent to your front porch. The straightforward design makes this mailbox a perfect match with any home while upgrading your entry way at the same time. The durable powder coat finish will keep your mailbox looking vibrant and beautiful for years to come while the door closure will protect your mail from rainy weather keeping it dry inside. All types and sizes of magazines, letters, envelopes can fit easily inside the enclosure. Matching newspaper scroll arms are included and can be easily attached at your choosing. Deliberate but stylish, the Contemporary delivers at all angles. One of our best sellers! All screws, hinges, and like + +Price is $",110 +"How much does this cost to the nearest dollar? + +Tascam Digital Portastudio Multi-Track Audio Recorder & Tascam RC3F 3-Way Footswitch +Tascam Digital Portastudio Multi-Track Audio RecorderTascam Digital Portastudio Multi-Track Audio RecordeTascam RC3F 3-Way FootswitchThe RC-3F is a 3-way footswitch for the GB-10, LR-10, DP-03 and other TASCAM recorders and players. The 1/8 mini jack plugs into the remote jack of these TASCAM products to add features like play/pause, looping or punch in. See the products' user manual for details. Product 1 Eighteen track faders and one master fader allows instant access to any track without selecting pages + +Price is $",399 +"How much does this cost to the nearest dollar? + +Glow Lighting Vista Crystal Flush Mount, 6 W +Create that beach feeling with this capiz shell and chrome pendant chandelier. Ideal for bedrooms, kitchens, dining rooms and bathrooms. 
Uses 3 x 40 Candelabra base bulbs Trimmed with clear crystal Easy installation hardware, instructions included for convenient setup CSA/CUS approved for dry location Manufacturer Glow Crystal Lighting Inc., Part Weight 2.69 pounds, Dimensions 10 x 10 x 8.5 inches, Country of Origin Canada, model number Is Discontinued No, Size 6\ W, Color 8.5, Power Source Corded Electric, Voltage 120 Volts, Quantity 1, Type of Bulb incandescent, Mounting Type Ceiling Mount, Plug Format A- US style, Certification CSA + +Price is $",166 +"How much does this cost to the nearest dollar? + +Z3 Wind Deflector, Smoke Tint, Lexan, Windscreen, Windstop, Windblocker +- Easy installation, installs in less than two minutes. - Take long trips with the top-down in comfort. - Cruise at night without freezing from cold drafts. - Hear the full richness and clarity of your stereo. Reduce turbulence up to 70%; prevents unrelenting wind buffeting and driver fatigue Unique no reflection or glare, easy to use at night against headlights; Unlike others, no abrasion, does not induce long term wear 30 day trial period and lifetime warranty; No rattles or squeaks, is silent; Keep hair in place while driving with the top-down. Talk clearly on your blue tooth device; Talk with passengers without strain; Enjoy conversations while driving with the + +Price is $",99 +"How much does this cost to the nearest dollar? + +Olympus E-20 5MP Digital Camera w/ 4x Optical Zoom +Product description 5.2 megapixel sensor creates 2,560 x 1,920 images for prints at 11 x 14 and beyond 4x optical zoom lens with autofocus Included 32 MB SmartMedia card holds 7 images at default resolution Compatible with SmartMedia and Type I and II CompactFlash Uses Amazon.com You'd be hard-pressed to find a digital camera that captures better images than those from the Olympus E-20N. The camera pairs a sensor with a high-quality custom-designed 4x zoom lens for photos with clarity that rivals film. 
First, a note about naming conventions this camera is also known as the E-20 and the E-20P. The N + +Price is $",220 +"How much does this cost to the nearest dollar? + +PHYNEDI 1 1000 World Trade Center Bricks Model Compatible with Lego, MOC DIY Creative Large Architecture Collection Challenge Building Toy, (4,870 Pieces) +The building instructions of this model are two PDF guides (Part 1 has 135 pages, Part 2 has 155 pages),. Part 1 also includes a four page introduction about the World Trade Center history and design. World Trade Center features Scale 1 in inches 13,2 x 10,7 (base area), 22,7 (height)Size in centimeters 33,6 x 27,2 (base area), 57,5 (height)Size in studs 42 x 34 (base area), 71,9 (height)Style ArchitectureYear 2022 Package + +Price is $",50 +"How much does this cost to the nearest dollar? + +YANGHUAN Unstable Unicorns Adventure Card Game Toy Expansion Pack-Teen Board Game-Adult Strategy is Designed to add to The Base Unstable Unicorn Solitaire Expansion Pack +Product Description Product Name Card Game Single piece size 15 x 10.5 x 5cm Single piece weight 350g Expansion package parameters Single piece size Single product weight 105 grams Material coated paper Color Unstable Unicorns white frame, Unstable Unicorns black frame, NSFW extension, Legenda extension, Rainbow extension, Dragons extension, Uncut extension Ability training emotion, intellectual development, brain use, other ability training, interactive toys, parent-child communication, interest development Suitable age 14 years old and above Game type Unstable Unicorns is still a strategy game, it will destroy your + +Price is $",40 +"How much does this cost to the nearest dollar? 
+ +Interlogix NetworX Touch Screen Keypad, 3.5 Color Touch Screen, Icon-based Graphic Interface, Built-in Message Board, NetworX System Compatibility, Capability, Modern Design +Interlogix NetworX Touch Screen Keypad, 3.5 Color Touch Screen, Icon-based Graphic Interface, Built-in Message Board, NetworX System Compatibility, Capability, Modern Design Ideal for almost any size application, the NetworX Touch Screen Keypad offers powerful yet simple control of any NetworX security system. An intuitive interface, 3.5 touch screen and Quick Keys for rapid system arming and status updates enable quick and easy system management Users can record their own names for different system components and leave voice messages for others when arming or disarming. When a Net + +Price is $",110 +"How much does this cost to the nearest dollar? + +Steering Damper,Universal Motorcycle Handlebar Aluminum Alloy Steering Damper Stabilizer Safety Control(Gold) +Features 1. Durable in Use Made of durable aluminum alloy for extreme strength. 2. Excellent Quality Professional manufacturing, high precision and good quality. 3. Easy and Simple to Hand Easy installation without any modification required. 4. Stable Quality The anodized surface for enhance its oxidizing and corrosion resistance. 5. Scope of Application Universal for motorcycle, high-emissions car, sports car, street car. Specification Condition 100% Brand New Material Aluminum alloy (CNC) Color Black/Gold/Red/Silver/Blue(optional) Mounting screw Fitment Universal for motorcycle, high-emissions car, sports car, street car. Package List 1 * Dam + +Price is $",120 +"How much does this cost to the nearest dollar? + +Amprobe TIC 410A Hot Stick Attachment +Amprobe products range from an extensive line of clamp meters and digital multimeters to industry-specific tools for residential/commercial electricians, HVAC/R technicians, utilities and industrial maintenance professionals. 
All Amprobe tools undergo rigorous testing to ensure full compliance with the latest IEC and CE safety regulations in Fluke Safety labs for quality and safety you can trust. Extension probe attaches to Amprobe TIC 300 Pro AC voltage detector to test for high AC voltages without touching or disconnecting the circuit Can detect AC voltages between 1,500V and For utility, industrial, and mining applications when working with high-voltage equipment such as transmission lines, downed power lines, fuses, and load-break connectors Extends to 57 long Conforms + +Price is $",100 +"How much does this cost to the nearest dollar? + +MyCableMart 3.5mm Plug/Jack, 4 Conductor TRRS, Self Solder, Male +Connects stereo audio & microphone devices requiring 4 conductors (left and right audio and microphone plus ground). This connector MAY also be suitable for left/right audio 1 video (composite) and ground. Great for making your own 3.5mm 4 conductor Cables or for repairing existing cables. Wire terminals are attached using solder (not included).Features 3.5mm 4 conductor (3 band) plug 3.5mm 4 conductor (3 band) plug Nickel Plated Nickel Plated Strain relief Strain relief Outer Dimensions (at PVC outer molding) Outer Dimensions (at PVC outer molding) Outer Dimensions (with PVC outer molding + +Price is $",25 +"How much does this cost to the nearest dollar? + +OtterBox + Pop Symmetry Series Case for iPhone 11 Pro (ONLY) - Retail Packaging - White Marble +OtterBox + Pop Symmetry Series Case for iPhone 11 Pro (ONLY) - Retail Packaging - White Marble Compatible with iPhone 11 Pro Thin one-piece case with durable protection against drops, bumps and fumbles that is also compatible with Qi wireless charging PopSockets PopGrip is integrated into case to help with holding, texting, snapping better pictures and hand-free viewing PopTop designs are easy to switch out — just close flat, press down and turn to swap the PopTop. 
Includes OtterBox limited lifetime warranty (see website for details) and 100% authentic Dimensions 7.8 x 4.29 x 1.06 inches, Weight 3 + +Price is $",20 +"How much does this cost to the nearest dollar? + +Dell XPS Desktop ( Intel Core i7 4790 (3.6 GHz), 8GB, 1TB HDD,Windows 10 Home Black +Product description Bring your multimedia to life with Dell XPS desktop PCs offering powerful processors, superb graphics performance and lots of storage space. Amazon.com Processor 4th Generation Intel Core processor (8M Cache, up to 4.00 GHz) OS Windows 7 Professional, English Graphics Card NVIDIA GeForce GTX 750Ti 2GB DDR5 Memory 32GB Dual Channel DDR3 - 4 DIMMs Hard Drive 1TB 7200 RPM SATA Hard Drive 6.0 Gb/s + 256GB SSD Processor 3.6 GHz RAM 8 GB DDR5, Memory Speed 1600 MHz, + +Price is $",500 +"How much does this cost to the nearest dollar? + +Franklin Iron Works Sperry Industrial Bronze Chandelier 28 Wide Rustic Farmhouse Cylinder Scavo Glass Fixture for Dining Room House Foyer Kitchen Island Entryway Bedroom Living Room +28 wide x 28 high. Glass is 6 1/4 high x 3 wide. Canopy is 5 1/2 wide. Weighs 19.58 lbs. Comes with of lead wire and 6-feet of chain. Sloped ceiling adaptable. Uses eight maximum 60 watt standard-medium base bulbs (not included). Contemporary farmhouse eight-light chandelier from Franklin Iron Works. Industrial bronze finish metal frame. Scavo glass cylinder shades. Brand Franklin Iron Works, Color Scovo Glass, Material Glass, Style Farmhouse, Light fixture form Chandelier, Room Type Entryway, + +Price is $",400 +"How much does this cost to the nearest dollar? + +Avery Legal Dividers, Standard Collated Sets, Letter Size, Side Tabs, 51-75 +You have the right to organized and professional-looking files. This Standard Collated Legal Divider Set features Tabs 51-75 so it's perfect for index briefs, legal exhibits, mortgage documentation files and more. 
White paper stock with clear, Rip Proof reinforced tabs are preprinted on both sides using Helvetica bold type for ease of use, and the unpunched binding edge gives you the freedom to fit practically any binding system. Here's evidence these dividers will do your files justice. Clear Rip Proof reinforced tabs printed on both sides Unpatched binding edge so indexes can fit practically any binding system Contains 30% post-consumer recycled content Avery Style is printed using Helvetica bold type Manufacturer Avery, + +Price is $",20 +"How much does this cost to the nearest dollar? + +Moen 8346 Commercial Posi-Temp Pressure Balancing 4 Port Cycling Valve Hand Shower System 2.5 gpm, Chrome +Product Description VERSATILE DESIGN Chrome finish is highly reflective for a mirror-like look that works with any decorating style From the Manufacturer This single-handle handheld shower system has a Posi-Temp pressure-balancing valve that maintains water pressure and controls temperature, a slide bar, drop ell, vacuum breaker, a metal hose and mounting hardware. The pressure balancing cycle valve design has 1/4 turn stops, the rubber nozzles are quick cleaning and the chrome plated metal construction provides a bright, highly reflective, cool grey metallic look. The temperature handle operates counterclockwise through a 270 degree arc, with off at 6 o' clock + +Price is $",200 +"How much does this cost to the nearest dollar? + +Carlisle Versa Trail ATR All Terrain Radial Tire - NHS +Tire designed to provide high performance for sports driving. Providing maximum traction with good braking control and handling, the tire offers unrivaled comfort when driving at high speeds. Comfort, experience, technology and design! 
Tire only, Rim not included made in united states package height 8.9 package length 27.1 package width 27.1 Fit type Universal Fit Brand Carlisle, Seasons NON_WINTER, Size Rim Size 12 Inches, Section Width 9 Inches, Tire Aspect Ratio 8, Speed Rating M, Tread Depth 24 32nds, Ply Rating 6-Ply, Tire Diameter 27 Inches, Weight 23.6 pounds, Manufacturer Carlisle, Model Versa Trail ATR + +Price is $",300 +"How much does this cost to the nearest dollar? + +SUNWAYFOTO 44mm Tripod Ball Head Arca Compatible Sunway +66lb Max load! Eliptical Ball for progressive resistance. Y-axis diameter 0.03mm longer than the X-axis. Single notch design. Super strong shell. All Metal knobs. For long term durability. Panning Base Scale Independent Pan Lock Knob, 360° panning movement with calibrated precision, precisely capture overlapping panoramic images. Panning Base is laser-engraved with index marks from with increments at 5° 50mm clamp with Bubble level. Ball Diameter 44mm. Base 55mm. Height 94mm. Weight 450g. Max load 30kg (66 lbs). Bottom thread 3/8 with 1/4 adapter. 1 year + +Price is $",100 +"How much does this cost to the nearest dollar? + +NanoBeam AC 4 Units 5GHz High-Performance airMAX ac Bridge CPE with Dedicated Management Radio +Models Ubiquiti Networks NanoBeam AC 5GHz High-Performance airMAX ac Bridge CPE with Dedicated Management Radio Incorporating innovative industrial design with proprietary airMAX ac technology, the NanoBeamAC is ideal for CPE deployments requiring maximum performance from the smallest possible footprint. The NanoBeam ac Gen2 airMAX ac CPE with Dedicated Management Radio from Ubiquiti Networks offers a more reliable long-distance point-to-point connection. Boasting a maximum throughput up to 450 Mb/s, this NanoBeam radio is designed to filter out noise to reduce interference in areas congested with multiple RF signals while offering up to 19 dBi gain. Setup is simple, as the NanoBeam + +Price is $",150 +"How much does this cost to the nearest dollar? 
+ +WULF 4 Front 2 Rear Leveling Lift Kit with Spindles & Shackles compatible with Ford Ranger 2WD with Coil Spring Suspension +Compatible with Ford Ranger 2WD with Coil Spring Suspension FRONT WULF 4 Lift Ductile Cast Iron Spindles / Knuckles REAR 1.5-2 Adjustable Lift Black Powder Coated Shackles, Zerk-Grease Fittings, Pressed greaseable high grade poly bushings and metal sleeves included NOTE 2WD models only. Excludes models with Stabilitrak. Requires Coil Spring suspension Please see the description for full details, or contact us for assistance Fast Shipping. Manufacturers Lifetime Warranty. Dedicated Customer Service Manufacturer WULF Suspensions, Brand WULF Suspensions, Country of Origin USA, + +Price is $",250 +"How much does this cost to the nearest dollar? + +Alera ALEVABFMC Valencia Series Mobile B/f Pedestal, 15 7/8 X 19 1/8 X 22 7/8, Med. Cherry +Sturdy woodgrain laminate mobile box file pedestal to store all your office necessities. Full-extension ball bearing slides on file drawer for easy access. Durable laminate is water-, scratch-, and dent-resistant with 3 mm protective edge banding. Two fold-away safety keys included. Accepts Alera® Pedestal Cushions for an instant seating option. Sturdy woodgrain laminate mobile box file pedestal to store all your office necessities. Full-extension ball bearing slides on file drawer for easy access. Durable laminate is water-, scratch-, and dent-resistant with 3 mm protective + +Price is $",50 +"How much does this cost to the nearest dollar? + +YU-GI-OH! Ignition Assault Booster Box +24 Packs per Display 9 cards per packHumanitys greatest fear has been realized! Dueling A.I.s have become sentient and organized their own army to take the Yu-Gi-Oh! TRADING CARD GAME by storm in Ignition Assault! Winter 2020s booster set heats things up with Ais @Ignister cards from the climax of YuGi-Oh! 
VRAINS, multiple brand-new strategies, new cards for popular strategies, and powerful, general use cards that every Duelist will want to add to their arsenal! Keep your A. 24 Packs per Display 9 cards per pack Dimensions 5.65 x 4.75 x 1.5 inches, Weight 11.2 ounces + +Price is $",30 +"How much does this cost to the nearest dollar? + +48 x 36 Extra-Large Framed Magnetic Black Chalk Board (Black Frame) +Handsome, smooth 48 x 36 inches extra-large framed black chalk board. Perfect for office, meeting rooms, classrooms, at work or at home...to serve as black board, or magnetic board, or menu board, or bulletin board etc. Black frame. (Search for on Amazon if you want a Dark Brown wood tone frame or if you want a Medium Brown wood tone frame.) DELIVERY Shipped to continental U.S. addresses only. Handsome, smooth black board with elegant black veneer frame and reinforced backing. (If you want a dark brown wood tone frame, search for on Amazon; If you want a medium brown wood tone frame, search for ) Lean this light-weight black board + +Price is $",60 +"How much does this cost to the nearest dollar? + +Dell Latitude D620 Renewed Notebook PC +Dell Latitude D620 14.1 Laptop (Intel Core Duo 80GB Hard Drive, 2048Mb RAM, DVD/CDRW Drive, XP Professional) Windows XP Professional with Dell Reinstallation XP Pro. CD Intel Core Duo Processor 2GB DDR2 RAM 80GB Hard Drive Screen, Wifi Standing screen display size 14 Inches, Screen Resolution 1366 x 768 pixels, Processor 1.83 GHz RAM 2 GB DDR2, Memory Speed 1.83 GHz, Hard Drive 60 GB HDD, Chipset Brand Intel, Card Description Integrated, Wireless Type USB 2.0 Ports 3, Brand Dell, Series Dell Latitude, model number d620, Hardware Platform PC, Operating System Windows XP + +Price is $",150 +"How much does this cost to the nearest dollar? 
+ +acer Aspire 5 Laptop, AMD Ryzen 3 5300U Quad-Core Processor, 15.6 FHD IPS Display, 8 GB DDR4 RAM, 512 GB PCIe SSD, HDMI, Fingerprint, Wi-Fi 6, Backlit Keyboard, Windows 11 Home S Mode +Processor AMD Ryzen 3 5300U 4-Core Processor (8 Threads, 4MB L3 Cache, Up to Graphics AMD Radeon Operating system Windows 11 Home English Memory 8 GB DDR5 SDRAM Hard Drive 512 GB PCIe Solid State Drive Optical Drive No Display 15.6 FHD (1920 x 1080) LED-backlit, IPS Wide Viewing Angle, Slim Bezel, 16 9 aspect ratio 1 x USB 2.0 + +Price is $",400 +"How much does this cost to the nearest dollar? + +Elk 30 by 6-Inch Viva Pendant with Green Glass Shade, Satin Nickel Finish +The Viva light pendant is meticulously hand blown with up to three layers of uncompromising beauty and style. This pendant features green hand blown glass shade. Shade holder comes in satin nickel finish. Accommodates six medium base bulbs. Measures 9-inch extended length by width by 6-inch height. Viva light pendant is meticulously hand blown with up to three layers of uncompromising beauty and style Features exquisite line of green hand blown glass shade Accommodates six medium base bulbs Shade holder comes in satin nickel finish Measures 9-inch extended length by width by 6-inch height Brand Elk, Color Satin Nickel, Material Material Other, Style Contemporary, Light fixture form Pendant, Specific + +Price is $",60 +"How much does this cost to the nearest dollar? + +Barbie Top Model Doll +Amazon.com.caption font-family Verdana, Helvetica neue, Arial, serif; font-size 10px; font-weight bold; font-style italic; ul.indent list-style inside disc; text-indent -15px; Barbie is ready to hit the runway and show off the latest global fashions as a top model. While we all know Barbie’s occupational curiosity has led her down varied paths, this one is perfect for the slender fashionista. 
She comes wearing a trendy outfit of fishnet stockings, a tiered black miniskirt, a patterned top with a short sleeve shrug, and ankle boots. Her long blond hair is styled sleekly straight and she is carrying an animal print handbag. While this outfit is certainly catwalk worthy, it’s only + +Price is $",60 +"How much does this cost to the nearest dollar? + +Danby Designer 20-In. Electric Range with Coil Elements and Ft. Oven Capacity in Stainless Steel/Black +You dont need to be a world-class chef to enjoy cooking with this ultra-compact electric range by Danby Designer. Measuring only 20 inches wide, this stylish model is the ideal addition to trailers, cottages or efficiency apartments. It features a glass window on the oven door, plus angled front-mounted push and turn safety knobs with hot surface indicator lights. The lift-up porcelain cooktop has one 8-inch coil element for quick boiling and three 6-inch coil elements. Each element has a removable drip bowl for easy cleaning. The ft. electric oven has two oven racks with safety stops and four adjustable positions, plus a powerful broiler with 2400 watts of bro + +Price is $",200 +"How much does this cost to the nearest dollar? + +FixtureDisplays® Metal Truss Podium Double Width Modern Design +FixtureDisplays Metal Truss Podium Double Width Modern Design Churches & Other Venues Black truss is great to project a simple, clean and crisp look. Decorative truss panel design. Great for Churches, Schools, Hotels, Conferences, Funeral Homes, Stages, Debates, Wedding & Events, Restaurants Reception, Concierge etc. Easy screw aseembly. Contact us if u wish to order assembly service. Double-wide Full Size Pulpit Measurement 39 wide x 15.5 deep x 46.7 tall. Podium weighs 41 lbs. Reading panel comes with book stopper. Works great for two person services, or a larger room where a wider podium is proper. Sturdy Construction Made + +Price is $",250 +"How much does this cost to the nearest dollar? 
+ +ACDelco GM Original Equipment Alternator +ACDelco GM Original Equipment Light Duty Alternators have components that are newly manufactured, and are GM-recommended replacement for your vehicle’s original alternator. Alternators provide power to the vehicle's electrical systems and charge the battery while the engine is running. These original equipment alternators have been manufactured to fit your GM vehicle, providing the same performance, durability, and service life you expect from General Motors. 100% newly manufactured as an exact replacement for your GM vehicle’s original alternator Components are tested to meet original specification requirements for remarkable durability GM-recommended replacement part for your GM vehicle’s original factory component Offering the quality, reliability, and durability of GM OE Manufactured to GM OE specifications for fit, form, and function Dimensions 13.8 + +Price is $",200 +"How much does this cost to the nearest dollar? + +EBC Premium Street Brake Kit +Type Automotive Brake Save 10% from buying separate parts with EBC Brakes quality brake kit. High efficiency EBC pads with patented EBC Brake-in coating and premium rotors with thermic anti rust coating, fully balanced and run-out tested for smooth braking.Dimension 12 x 12 x 12 inchWeight 35.18 lbsManufacturer Warranty Covered by Manufacturer's Warranty Daily Driver Premium Brake Kit For Cars Truck Or SUV Quality British Made EBC Pads Premium G3000 OE Style Rotors Geomet Anti Rust Coating Manufacturer EBC Brakes, Brand EBC, Model EBC Brakes, Weight 40 pounds, model number Manufacturer Part Available May 19, 2012, Vehicle Service Type Car, Orientation Front + +Price is $",220 +"How much does this cost to the nearest dollar? + +FXR Men's Boost FX Jacket (Black/Orange/White - Large) +HydrX Pro - Shell - durable, sublimated 450d polyester shell with HydrX Pro laminate Boost LE Shell - M-Series Omni Stretch 450d polyester shell with HydrX Pro laminate F.A.S.T. 
90g insulation value in outer shell body, perforated at vent areas Lining - moisture-wicking quick-dry mesh lining FXR Dry Vent system - snowproof and moisture resistant chest side body vent system Removable liner - FXR Thermal Dry active liner with 175g Thermal Flex fill YKK Aquaguard front zipper HD #8 W/P zippers throughout Adjustable windskirt 360 3M Scotchlite reflective Shock-cord adjustable collar Shock-cord adjustable bottom + +Price is $",99 +"How much does this cost to the nearest dollar? + +SuperATV Scratch Resistant 3-in-1 Flip Windshield For CFMOTO ZForce 500 / 800 Trail / 800 EX / 1000 | 1/4 Thick Polycarbonate | USA Made | Can be set to Open, Vented Or Closed +Fits CFMOTO ZForce 500 | CFMOTO ZForce 800 Trail | CFMOTO ZForce 800 EX | CFMOTO ZForce 1000 | Can be used with most soft or hard tops | 100% Fitment Guaranteed Great For All Weather Want a CFMOTO Windshield that works in all conditions? Our 3-in-1 Flip-Up design allows you to choose from closed, vented, or open positions to ride comfortably in all-weather without having to + +Price is $",200 +"How much does this cost to the nearest dollar? + +SBU 3 Layer All Weather Mini Van Car Cover Compatible for Ford Windstar Minivan Model Years Breathable Automobile Van Protection +This Van Cover will provide all year round protection to your car. -It will efficiently shield your car’s paint from all finish-destroying agents sun, rain, snow, dust, dirt, tree sap and other corruptive elements. -The cover will minimize accidental bumps, dings, and scratches. You will save money on car washes, repair shops and will enjoy your ride in a brand-new looking car all year round. Investing in our quality cover is not only a practical move but also the perfect option for maintaining the car’s exterior. Condition Brand New, Color Gray, PACKAGE INCLUDES Brand New Van Cover.Free Storage Pouch, Antenna Patch. All + +Price is $",250 +"How much does this cost to the nearest dollar? 
+ +2 Pack Outdoor Brochure Holder Advertising Pamphlet Display Box with Lid Wall Mount Flyer Holder Acrylic Envelope Holder Waterproof Outdoor Brochure Box for Store Literature Display (Clear) +Features Fit for various occasions Our realtor flyer holders are ideal for literature, real estate advertisements, flyers, paper, letters, tickets, signature papers, etc., suitable for plenty of occasions indoor and outdoor, such as home, office, shopping malls, real estate companies, food stores, public places, business occasions, banks and so on. Warm to share These outdoor brochure boxes can also be applied as gifts for family members, friends, relatives, coworkers, neighbors, and other people you care about, so you can send them to show your love and concern, and to strengthen your relationships. Specifications Material acrylic Color clear Size + +Price is $",10 +"How much does this cost to the nearest dollar? + +Monroe Shocks & Struts Quick-Strut 171585 Strut and Coil Spring Assembly +Featuring a vehicle-specific design, Monroe® Quick-Strut® strut assemblies are fit checked, ride tested and engineered to restore factory ride height and ride performance. Assembled in Paragould, AR, they include all required components in a single unit. QUICKER, SAFER, EASIER AND COMPLETE REPAIR -- Includes everything you need for strut replacement in a single, fully assembled unit with no need for a spring compressor RESTORES RIDE HEIGHT -- Precisely calibrated to meet the OE design, each application-specific coil spring type is engineered to restore ride height and support the vehicle's weight VEHICLE-SPECIFIC DESIGN -- Application-specific coil spring, mount and strut designs ensure + +Price is $",80 +"How much does this cost to the nearest dollar? 
+ +Elements of Design Magellan Three Handle Tub and Shower Faucet, Oil Rubbed Bronze +Solid brass water way construction, Premium color finish resists tarnishing and corrosion, 2.5 GPM / 9.5 LPM at 60 PSI, 6-Inch reach Shower Arm, 1/4 turn washer less cartridge, IPS Inlets, Pressure Balance Valve, Temperature Check Stop.. Constructed from solid brass for durability and reliability Our corrosion and tarnish-resistant finishes provides long-lasting use Pressure Balance Valve; Fine Artistic Craftsmanship Max 2.0 LPM Water Flow Rate At 80 PSI On Showerhead Compliant with California Energy Commission Title 20 Brand Elements of Design, Color Oil Rubbed Bronze, Material Brass, Finish Type Oil Rubbed, Handles 3 + +Price is $",60 +"How much does this cost to the nearest dollar? + +GM Genuine Parts Air Conditioning Evaporator Core +ACDelco GM Original Equipment A/C Evaporator Cores are heat exchangers and are located in the HVAC housing, where they cool and dehumidify the cabin air. Refrigerant is metered into the evaporator by the orifice tube or expansion valve. This original equipment evaporator core is a GM-recommended replacement for your vehicle’s original components and has been manufactured to fit your GM vehicle, providing the same performance, durability, and service life you expect from General Motors. Channel-plate construction provides a high refrigerant contact surface area, resulting in better performance Vacuum-brazed, corrosion-treated, and leak-tested to help provide trouble-free operation GM-recommended replacement part for your GM vehicle’s original factory component Offering the quality + +Price is $",200 +"How much does this cost to the nearest dollar? 
+ +Baseus USB C Docking Station to Cast on 3 Monitors with 100W PD USB-C Port, 4K USB 3.0 * 5, LAN, SD/TF Cards Reader, Audio Port for Windows, Mac Laptop +Docking Station — Up to 16 ports allowing you to connect almost all devices through a single gear; 3 4K HDMI ports to cast different content on each display, PD Type-C to connect mobile devices, 3 USB 3.0 ports, 2 USB2.0 ports. Note Docking station requires a second power adapter through the PD USB-C port when charging your laptop through it Triple Extend to the Fullest — Boost productivity by casting up to 3 different contents on displays; Actual pixels up to when using + +Price is $",50 +"How much does this cost to the nearest dollar? + +Whitehall™ Personalized Whitehall Capitol Mailbox with Door & Side Address Plaques Personalized Mailbox (3 Colors Available) +After your order is placed, our friendly US based representatives will send a layout for your approval THREE COLORS AVAILABLE 1) Black with Gold Address 2) Bronze with Gold Address 3) White with Gold Address BOX DIMENSIONS - 9.625 X 13 X Approved by Postmaster General. Manufactured from die cast, high-density aluminum alloy The address Plaque can display up to five, 3 numbers and the bottom line holds up to sixteen, 1.25 characters. Material Aluminum, Included Security Features Hopper & Baffle, Brand Clarus Crystal, Dimensions 20.38\ D x 9.63\ W x 13\ H + +Price is $",200 +"How much does this cost to the nearest dollar? + +Pro Circuit Works Pipe for 02-19 YAMAHA YZ250 +The Original Pro Circuit Works Pipe Offers Unparalleled Performance and Power for Every Two-Stroke Application. Increased Horsepower and Torque Gains Will Quickly Be Noticed Across the Entire Rpm Range. The Unplated, Oiled Metal Finish Requires Some Maintenance, but Really Gives Your Bike That Works Look. Please Note The Image Displayed Is Representative of the Item, but May Vary Slightly Depending on Your Specific Model. 
Please Note The Image Displayed Is Representative of the Item, but May Vary Slightly Depending on Your Specific Model. Size YAMAHA YZ250 Style CARBON STEEL Color silver Warranty Pro-Circuit provides a 90-day warranty. See their site for full details. Manufacturer Pro Circuit + +Price is $",180 +"How much does this cost to the nearest dollar? + +HYANKA 15 1200W Professional DJ Speaker, Portable Pa System, Bluetooth Party Speaker with Subwoofer, Microphone and Speaker Set, Powered Pa Speaker System with Light, FM, TWS, USB, Remote, EQ +1. High Powered Active Professional DJ Speaker The B-15 has been finely tuned by our experienced engineer teams with 1200W P.M.P.O portable loud powered system with the HF Unit made of super titanium film with high sound without any distortion. Separate bass and treble controls on this Active Professional DJ Speaker allow for precise pitch tuning. This Bluetooth DJ speaker with subwoofer will give you incredibly loud sound crystal-clear treble and booming bass. 2. Multiple Easy Connections This large party powered pa speaker with 15 subwoofer can be connected + +Price is $",100 +"How much does this cost to the nearest dollar? + +Bluetooth X6BT Card Reader Writer Encoder Card Writer Device +Package Includes 1 x X6(BT) Card Reader Writer 1 x Software MINI CD 1 x Bluetooth dongle 1 x USB cord 20 x Blank Cards - X6 Bluetooth Card Readers Writer Encoder card swipe - World's Only Bluetooth Card Reader / Writer. - The World First Bluetooth Manual Swipe Smallest Card Reader/Writer is designed in USA to offer a card reading/writing solution. - for ISO 7811-6 formats, it’s Powered by USB directly not need for extra power adaptor. - Works with all the major operating systems as Windows 7, 8, 10, Vista, X, bits) and Apple Computers(MacBook Air, MacBook Pro, Mac Mini, Mac Pro, i + +Price is $",50 +"How much does this cost to the nearest dollar? 
+ +AIRAID Cold Air Intake System by K&N Increased Horsepower, Cotton Oil Filter Compatible with FORD (Excursion, F250 Super Duty, F350 Super Duty) +INCREASES HORESPOWER AIRAID performance air intake systems feature an aerodynamically-engineered intake tube, designed to accelerate airflow to your engine and reduce turbulence—helping increase your vehicle’s performance SUPERIOR FILTRATION AIRAID performance intake systems are engineered to provide a smooth, unimpeded path for airflow to your engine—keeping the air cooler and more oxygen-dense. Injecting more oxygen-rich air into the cylinders allows the engine to burn fuel more efficiently during combustion, offering you an increase in performance EASY INSTALL These simple-to-install, sophisticated systems help maintain proper air-to-fuel + +Price is $",150 +"How much does this cost to the nearest dollar? + +Bostingner Shower Faucets Sets Complete, Shower System 10 Inch All Metal Overhead Rain Shower Combo Set with Handheld Ceiling Mounted 3 Way Pressure Balance Shower Valve and Trim Kit, Matte Black +Ultra-Luxury Multi Shower - Unlike normal shower kit that can only use single function at a time, Bostingner Shower head system can be used both the rainfall shower head and handheld spray SIMULTANEOUSLY that strikes the perfect balance of GENEROUS COVERAGE and HIGH PRESSURE. The PUSH BUTTON design makes it easy to switch settings, so children and elderly can also use it without a problem Anti-scald & Water Hammer Prevention - The Upgrade cUPC Certified Anti-scald pressure balance valve is key to safe homes, effectively control water pressure balance to prevent scald and + +Price is $",100 +"How much does this cost to the nearest dollar? + +PIT66 Front Bumper Turn Signal Lights, Compatible with Mazda MX-5 Miata 1990 1991 1992 1993 1994 1995 1996 1997 W/Bulbs Smoked Lens Left Driver Side and Right Passenger Side. +Compatibility - Compatible with Mazda MX-5 Miata. 
Reference Oem Part Number Perfect Design - Smooth circular arc design, more beautiful appearance, tight fitting to the vehicle, unique lighting system, providing all-round lighting, evenly distributing light output in all directions, maximizing visibility. Safety - It can effectively help you to remind passing vehicles and identify whether the distance on both sides of the road is suitable for traffic in bad weather or other times of poor visibility, as well as determine the distance on both sides of the vehicle when + +Price is $",100 +"How much does this cost to the nearest dollar? + +Caseology Bumpy Compatible with Google Pixel Buds Pro Case (2022) - Sage Green +Hybrid layer clear case with ultra-clear PC body and TPU frame for drop-proof shock absorbance Slim yet durable Pixel Buds Pro case made with military grade materials Side colored TPU with rugged sandstone texture provides non-slip grip Wireless charging compatible and carabiner included for easy carrying Caseology Bumpy Compatible with Google Pixel Buds Pro Case (2022) / Not Compatible with Google Pixel Buds A series, Google Pixel Buds 2 Dimensions 2.67 x 2.54 x 1.15 inches, Weight 1.76 ounces, model number Rank Cell Phones & Accessories 34945, Cell Phone Basic Cases 14563, Connectivity technologies wireless, Special features + +Price is $",20 +"How much does this cost to the nearest dollar? + +Fleck 2510 Timer Mechanical Filter Control Head +- Mechanical 2510 control head for filter systems - - 12-day timer initiated backwash - - Maximum 17 GPM backwash (includes 7 GPM DLFC) - - Standard 2. 5 -8 NPSM mounting base - - Requires yoke or bypass to connect to plumbing -Heavy duty 2510 electromechanical control valve provides simple and durable backwash control for most common backwashing filters. A maximum of 17 GPM available backwash can handle even dense iron filter medias. Dedicated piston motor provides powerful piston movement that reduces system maintenance. 
New Fleck 2510 filter valve replacement for filter tanks Fully adjustable cycles for backwash and rinse times For back-washing filters, allows for strong + +Price is $",80 +"How much does this cost to the nearest dollar? + +Haloview MC7108 Wireless RV Backup Camera System 7'' Monitor Built in DVR Rear View Camera with Infrared Night Vision and Wide Viewing Angle for Truck/Trailer/RV/Pickups/Camping Car/Van/Farm +7 LCD digital monitor, Built-in recorder. Real time recording, video playback 10-32V wide voltage input, Support 4 wireless camera wide viewing angle, Wireless Line of Sight Range Up to 984 feet (This kit include 1 monitor+ 1 camera) Split mode, auto-scan mode and single-display mode available, Normal, mirror, FLIP, MIRROR-FLIP viewing options available HD 720P Digital Wireless Backup camera system, it has far better image resolution, stronger lens and longer transmission distance, which will bring you a + +Price is $",250 +"How much does this cost to the nearest dollar? + +Schmidt Spiele - Manhattan +The product of this place is a board game of Hans im Gluck's / Andreas Seyfarth work. It is with a Japanese manual in Japan version package of Mobius Games Inc.. Number of players 2-4 people Target age 10 years old or older Playing Time 60 minutes one point sales easy-to-understand rules, fun easy-to-understand, easy-to-understand strategy, the progress of the game can be seen in the eyes. It is a game that can be recommended to anyone. 2-4 people for Age 10 years old - adult Travel Time 60 minutes From Germany With a Japanese manual Dimensions 11.81 x 5.91 x 3.94 inches, Weight 10.6 ounces, model number + +Price is $",30 +"How much does this cost to the nearest dollar? + +Corsa 14333 Tip Kit (Ford Mustang GT) +CORSA Performance Pro-Series tips are dual-walled stainless steel. The dual-wall is designed to protect against heat distortion and maintain visual appeal. 
Each tip is adorned with high-definition, precision laser engraving that provides unmatched detail. CORSA Performance tip kits are designed to fit directly to CORSA exhaust systems. Mustang GT 5.0L Fits Premium Package ONLY Requires Roush Quad Tip Rear Valance Modification Dual walled design protects against heat distortion CORSA Performance tip kits are designed to fit directly to CORSA exhaust systems Premium Stainless Steel Construction Manufacturer Corsa Performance, Brand Corsa, Weight 7 pounds, Dimensions 23 x 11 x 7 inches, model number 14333, Is Discontinued No, Manufacturer Part + +Price is $",80 +"How much does this cost to the nearest dollar? + +Hoshizaki FM116A Fan Motor Kit 1 +Product Description Hoshizaki FM116A Fan Motor Kit 1 is a genuine OEM (original equipment manufacturer) replacement part. Hoshizaki is committed to developing original products that bring comfort and convenience to your life. Approved by original equipment manufacturer (OEM) and intended only for designed and specified use. From the Manufacturer FAN MOTOR KIT 1. Hoshizaki Genuine OEM replacement part. Hoshizaki is committed to developing original products that bring comfort and convenience to your life. Use genuine OEM parts for safety reliability and performance. Genuine OEM replacement part Hoshizaki is committed to developing original products that bring comfort and convenience to your life Genuine OEM provides safety, reliability, and optimal performance Approved by original + +Price is $",80 +"How much does this cost to the nearest dollar? 
+ +BAINUO Antler Chandelier Light Antler Deer Chandelier for Living Room Dining Room Balcony Bedroom Cafe Bar Cabin Hanging Light Fixtures +Specification Product Type Vintage Style Resin Deer Faux Antler Chandelier Style vintage/farmhouse/rustic/transitional/art deco Bulb Base 6* Not Included) Bulbs Category LED/INCANDESCENT/HALOGEN Product Rating Voltage 110V Maximum Wattage 40W for single lamp Weight 16.5LB Dimension Diameter 34.4 inches, Height 17.7 inches. Material Resin Color Brown Features antler chandelier also works on sloped, slanted or vaulted ceiling; 2.Deer horn 6 light pendant light wide application - The indoor dining table chandelier is perfectly used + +Price is $",280 +"How much does this cost to the nearest dollar? + +DNA MOTORING Smoke Lens Amber Headlights Replacement For 06-10 Explorer +A headlight (headlamp) is a lamp attached to the front of a vehicle to light the road ahead. Headlight performance has steadily improved throughout the automobile age, spurred by the great disparity between daytime and nighttime traffic fatalities. Headlights are one of the most important components of your vehicle; they allow you to see the road in front of you clearly during the night and in any bad weather that may arise. Our headlights upgrade the face of your vehicle with clear style and extreme range. Compatible with 06-10 Ford explorer. Plug-n-Play Operation, Direct Bolt-On OE Fitment or Replacement for the Stock Unit Uses H13 High Beam & Low Beam / Bulbs are NOT Included Brings a Different Appearance + +Price is $",100 +"How much does this cost to the nearest dollar? + +Wera Stainless 3840/1 TS 2.5mm Hex Insert Bit, Drive +Wera 3840/1 TS 2.5mm hex insert bit for 1/4 hex drive is designed to keep rust at bay. Wera’s Stainless tool line is manufactured from 100% Stainless steel, preventing extraneous rust caused by use of conventional tools contaminating stainless fasteners. 
Wera’s unique vacuum ice-hardened process gives the necessary hardness for industrial applications. Torsion (TZ) bits are designed to prevent premature wear for improved service life. Hex-Plus technology prevents rounding of screw recess and transfers up to 20% more torque. Stainless bits partnered with Wera’s stainless Rapidaptor will protect the full length of the bit against extraneous rust. Ice + +Price is $",30 +"How much does this cost to the nearest dollar? + +Celestron - PowerSeeker 127EQ Telescope - Manual German Equatorial Telescope for Beginners - Compact and Portable - Bonus Astronomy Software Package - 127mm Aperture & 1.25 Moon Filter +Celestron 1.25 Moon FilterThe Moon has the distinction of being the most often viewed celestial object through backyard telescopes. It is undoubtedly beautiful and mysterious, and is one of those constants in our lives that connects us to every other being on our planet. Regardless of who we are or where we live or travel, we all look at the same moon. It is difficult to look at the Moon through a telescope and see all of the details due to its brightness. Our eyes are not ready for the bright beam of light that emits from the eyepiece, and they “ + +Price is $",100 +"How much does this cost to the nearest dollar? + +NHOPEEW Android Car Radio Carplay for Jeep Wrangler 2015 2016 2017 Touchscreen Bluetooth Car Stereo with AHD Backup Camera/Mic/HiFi +Compatible with jeep wrangler 2015 2016 2017 Android 11 System The Android head unit fit for jeep wrangler 2015 2016 2017, it is plug&play, easy to install. Android 11 operating system, 2GB RAM & 32GB ROM, ensures smooth operation and faster response. Android Auto & Apple Carplay Support wireless and wired connect your phone to the carplay jeep wrangler radio, with Siri voice control allows you make calls, send and receive messages, enjoy music and Navigating. Large Screen with Hi-Fi The jeep wrangler + +Price is $",199 +"How much does this cost to the nearest dollar? 
+ +Other Harmonica A) +For the sound of yesterday from the technology of today, The new Suzuki 2 timers are sure to bring back memories. 2 timers have a traditional sound quality made possible by a dual hole octave/tremolo tuning action, Excellent for folk and country style playing. Laser tuned reeds are extra lightweight to give that special harmonica octave effect. Available in 21 and 24 note Models, Includes a soft lined hard Shell Case. Available in the keys of C & a. Excellent for folk and country style playing Package Dimensions 3.302 H x 18.541 L x 5.588 W (centimetres) Package Weight 0.349 pounds Country of Origin Japan Weight 4 Ounces, Dimensions 6 x 3 x 1 inches + +Price is $",40 +"How much does this cost to the nearest dollar? + +Harley Air Filter Venturi Intake Air Cleaner Motorcycle Cnc Cut Chrome Kit for Touring Street Glide 2008 - 2016 Softail 2016 - 2017 Fitment - C (Gray) +Package x Air Cleaner Intake Filter with Accessories Fitment for Harley Touring Street Glide 2008 - 2016, Touring Road Glide 2008 - 2016, Softail 2016 - 2017, Dyna FXDLS 2017, FLSTNSE 2014 - 2015, FLSTSE 2011 - 2012, FXSBSE 2013 - 2014 NOTE Before purchase, please check electric or non-electric throttle on your Touring models. If Electric - use fitment C. If non + +Price is $",100 +"How much does this cost to the nearest dollar? + +Elite Screens Edge Free Ambient Light Rejecting Fixed Frame Projection Projector Screen,Aeon CineGrey 3D Series, 16 9 for Home Theater, Movie and Office Presentations +DIMENSION SIZE Diagonal, 16 9 Aspect Ratio. View Size 66.2 H x 117.6 W. Overall Size 66.7 H x 118.1 W x 1. 3 D. SCREEN MATERIAL Ceiling Ambient Light Rejecting Material (CLR/ALR) CineGrey 3D with 65% rejection ratio and Features a 90° Viewing Angle with 1.2 Gain. It is best for family rooms, educational facilities, conference rooms or any applications in which incident light is a factor. 
Fixed Frame Projector Screen is compatible with Standard Throw + +Price is $",250 \ No newline at end of file diff --git a/week6/community-contributions/lisekarimi/helpers/__init__.py b/week6/community-contributions/lisekarimi/helpers/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/week6/community-contributions/lisekarimi/helpers/items.py b/week6/community-contributions/lisekarimi/helpers/items.py new file mode 100644 index 0000000..a594e27 --- /dev/null +++ b/week6/community-contributions/lisekarimi/helpers/items.py @@ -0,0 +1,120 @@ +from typing import Optional # A variable might be a certain type or None +from transformers import AutoTokenizer +import re + +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" + +MIN_TOKENS = 150 # Minimum tokens required to accept an item +MAX_TOKENS = 160 # We limit to 160 tokens so that after adding prompt text, the total stays around 180 tokens. + +MIN_CHARS = 300 # Reject items with less than 300 characters +CEILING_CHARS = MAX_TOKENS * 7 # Truncate long text to about 1120 characters (approx 160 tokens) + +class Item: + """ + An Item is a cleaned, curated datapoint of a Product with a Price + """ + + # Load tokenizer for the model + tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True) + + # Define PRICE_LABEL and question for the training prompt + PRICE_LABEL = "Price is $" + QUESTION = "How much does this cost to the nearest dollar?" 
+ + # A list of useless phrases to remove to reduce noise for price prediction + REMOVALS = ['"Batteries Included?": "No"', '"Batteries Included?": "Yes"', '"Batteries Required?": "No"', '"Batteries Required?": "Yes"', "By Manufacturer", "Item", "Date First", "Package", ":", "Number of", "Best Sellers", "Number", "Product "] + + # Attributes for each item + title: str + price: float + category: str + token_count: int = 0 # How many tokens in the final prompt + + # Optional fields + details: Optional[str] # The value can be a string or can be None + prompt: Optional[str] = None + include = False # Whether to keep the item or not + + def __init__(self, data, price): + self.title = data['title'] + self.price = price + self.parse(data) + + def scrub_details(self): + """ + Removes useless phrases from details, which often has repeated specs or boilerplate text. + """ + details = self.details + for remove in self.REMOVALS: + details = details.replace(remove, "") + return details + + def scrub(self, stuff): + """ + Clean up the provided text by removing unnecessary characters and whitespace + Also remove words that are 7+ chars and contain numbers, as these are likely irrelevant product numbers + """ + stuff = re.sub(r'[:\[\]"{}【】\s]+', ' ', stuff).strip() + stuff = stuff.replace(" ,", ",").replace(",,,",",").replace(",,",",") + words = stuff.split(' ') + select = [word for word in words if len(word)<7 or not any(char.isdigit() for char in word)] + return " ".join(select) + + def parse(self, data): + """ + Prepares the text, checks length, tokenizes it, and sets include = True if it’s valid. + """ + # Builds a full contents string by combining description, features, and cleaned details. 
+ contents = '\n'.join(data['description']) + if contents: + contents += '\n' + features = '\n'.join(data['features']) + if features: + contents += features + '\n' + self.details = data['details'] + if self.details: + contents += self.scrub_details() + '\n' + + # If content is long enough, trim it to max char limit before processing. + if len(contents) > MIN_CHARS: + contents = contents[:CEILING_CHARS] + + # Clean and tokenize text, then check token count. + text = f"{self.scrub(self.title)}\n{self.scrub(contents)}" + tokens = self.tokenizer.encode(text, add_special_tokens=False) + + if len(tokens) > MIN_TOKENS: + # Truncate tokens, decode them back and create the training prompt + tokens = tokens[:MAX_TOKENS] + text = self.tokenizer.decode(tokens) + self.make_prompt(text) + + # Mark the item as valid and ready to be used in training + self.include = True # Only items with MIN_TOKENS <= tokens <= MAX_TOKENS are kept + + + def make_prompt(self, text): + """ + Builds the training prompt using the question, text, and price. Then counts the tokens. + """ + self.prompt = f"{self.QUESTION}\n\n{text}\n\n" + self.prompt += f"{self.PRICE_LABEL }{str(round(self.price))}.00" + self.token_count = len(self.tokenizer.encode(self.prompt, add_special_tokens=False)) + + def test_prompt(self): + """ + Returns the prompt without the actual price, useful for testing/inference. + """ + return self.prompt.split(self.PRICE_LABEL )[0] + self.PRICE_LABEL + + def __repr__(self): + """ + Defines how the Item object looks when printed — it shows the title and price. 
+ """ + return f"<{self.title} = ${self.price}>" + + + + + \ No newline at end of file diff --git a/week6/community-contributions/lisekarimi/helpers/loaders.py b/week6/community-contributions/lisekarimi/helpers/loaders.py new file mode 100644 index 0000000..4314c65 --- /dev/null +++ b/week6/community-contributions/lisekarimi/helpers/loaders.py @@ -0,0 +1,106 @@ +from datetime import datetime # Measure how long loading takes +from tqdm import tqdm # Shows a progress bar while processing data +from datasets import load_dataset # Load a dataset from Hugging Face Hub +from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor # For parallel processing (speed) +from items import Item + +CHUNK_SIZE = 1000 # Process the dataset in chunks of 1000 datapoints at a time (for efficiency) +MIN_PRICE = 0.5 +MAX_PRICE = 999.49 +WORKER = 4 # Set the number of workers here + +class ItemLoader: + + def __init__(self, name): + """ + Initialize the loader with a dataset name. + """ + self.name = name # Store the category name + self.dataset = None #Placeholder for the dataset (we load it later in load()) + + def process_chunk(self, chunk): + """ + Convert a chunk of datapoints into valid Item objects. + """ + batch = [] # Initialize the list to hold valid items + + # Loop through each datapoint in the chunk + for datapoint in chunk: + try: + # Extract price from datapoint + price_str = datapoint['price'] + if price_str: + price = float(price_str) + + # Check if price is within valid range + if MIN_PRICE <= price <= MAX_PRICE: + item = Item(datapoint, price) + + # Keep only valid items + if item.include: + batch.append(item) + except ValueError: + continue # Skip datapoints with invalid price format + return batch # Return the list of valid items + + + def load_in_parallel(self, workers): + """ + Split the dataset into chunks and process them in parallel. 
+ """ + results = [] + size = len(self.dataset) + chunk_count = (size // CHUNK_SIZE) + 1 + + # Build chunks directly here (no separate function) + chunks = [ + self.dataset.select(range(i, min(i + CHUNK_SIZE, size))) + for i in range(0, size, CHUNK_SIZE) + ] + + # Process chunks in parallel using multiple CPU cores + with ProcessPoolExecutor(max_workers=workers) as pool: + for batch in tqdm(pool.map(self.process_chunk, chunks), total=chunk_count): + results.extend(batch) + + # Add the category name to each result + for result in results: + result.category = self.name + + return results + + + def load(self, workers=WORKER): + """ + Load and process the dataset, returning valid items. + """ + # Record start time + start = datetime.now() + + # Print loading message + print(f"Loading dataset {self.name}", flush=True) + + # Load dataset from Hugging Face (based on category name) + self.dataset = load_dataset( + "McAuley-Lab/Amazon-Reviews-2023", + f"raw_meta_{self.name}", + split="full", + trust_remote_code=True + ) + + # Process the dataset in parallel and collect valid items + results = self.load_in_parallel(workers) + + # Record end time and print summary + finish = datetime.now() + print( + f"Completed {self.name} with {len(results):,} datapoints in {(finish-start).total_seconds()/60:.1f} mins", + flush=True + ) + + # Return the list of valid items + return results + + + + \ No newline at end of file diff --git a/week6/community-contributions/lisekarimi/helpers/testing.py b/week6/community-contributions/lisekarimi/helpers/testing.py new file mode 100644 index 0000000..9422182 --- /dev/null +++ b/week6/community-contributions/lisekarimi/helpers/testing.py @@ -0,0 +1,84 @@ +import math +import matplotlib.pyplot as plt + +GREEN = "\033[92m" +YELLOW = "\033[93m" +RED = "\033[91m" +RESET = "\033[0m" +COLOR_MAP = {"red":RED, "orange": YELLOW, "green": GREEN} + +class Tester: + + def __init__(self, predictor, data, title=None, size=250): + self.predictor = predictor + 
self.data = data + self.title = title or predictor.__name__.replace("_", " ").title() + self.size = size + self.guesses = [] + self.truths = [] + self.errors = [] + self.sles = [] + self.colors = [] + + def color_for(self, error, truth): + if error<40 or error/truth < 0.2: + return "green" + elif error<80 or error/truth < 0.4: + return "orange" + else: + return "red" + + def run_datapoint(self, i): + datapoint = self.data[i] + guess = self.predictor(datapoint) + truth = datapoint["price"] + error = abs(guess - truth) + log_error = math.log(truth+1) - math.log(guess+1) + sle = log_error ** 2 + color = self.color_for(error, truth) + title = datapoint["text"][:40] + "..." if len(datapoint["text"]) > 40 else datapoint["text"] + self.guesses.append(guess) + self.truths.append(truth) + self.errors.append(error) + self.sles.append(sle) + self.colors.append(color) + # print(f"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}") + + def chart(self, title): + max_error = max(self.errors) + plt.figure(figsize=(15, 6)) + max_val = max(max(self.truths), max(self.guesses)) + plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6) + plt.scatter(self.truths, self.guesses, s=3, c=self.colors) + plt.xlabel('Ground Truth') + plt.ylabel('Model Estimate') + plt.xlim(0, max_val) + plt.ylim(0, max_val) + plt.title(title) + + # Add color legend + from matplotlib.lines import Line2D + legend_elements = [ + Line2D([0], [0], marker='o', color='w', label='Accurate (green)', markerfacecolor='green', markersize=8), + Line2D([0], [0], marker='o', color='w', label='Medium error (orange)', markerfacecolor='orange', markersize=8), + Line2D([0], [0], marker='o', color='w', label='High error (red)', markerfacecolor='red', markersize=8) + ] + plt.legend(handles=legend_elements, loc='upper left') + plt.show() + + def report(self): + average_error = sum(self.errors) / self.size + rmsle = math.sqrt(sum(self.sles) 
/ self.size) + hits = sum(1 for color in self.colors if color=="green") + title = f"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%" + self.chart(title) + + def run(self): + self.error = 0 + for i in range(self.size): + self.run_datapoint(i) + self.report() + + @classmethod + def test(cls, function, data): + cls(function, data).run() \ No newline at end of file diff --git a/week6/day1.ipynb b/week6/day1.ipynb index 3035814..8202845 100644 --- a/week6/day1.ipynb +++ b/week6/day1.ipynb @@ -102,6 +102,18 @@ "%matplotlib inline" ] }, + { + "cell_type": "markdown", + "id": "cd6d801e-d195-45fe-898e-495dbcb19d7d", + "metadata": {}, + "source": [ + "## Load our dataset\n", + "\n", + "In the next cell, we load in the dataset from huggingface.\n", + "\n", + "If this gives you an error like \"trust_remote_code is no longer supported\", then please run this command in a new cell: `!pip install datasets==3.6.0` and then restart the Kernel, and try again." + ] + }, { "cell_type": "code", "execution_count": null, @@ -109,8 +121,6 @@ "metadata": {}, "outputs": [], "source": [ - "# Load in our dataset\n", - "\n", "dataset = load_dataset(\"McAuley-Lab/Amazon-Reviews-2023\", f\"raw_meta_Appliances\", split=\"full\", trust_remote_code=True)" ] }, @@ -429,7 +439,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week6/day2.ipynb b/week6/day2.ipynb index 7eb5c8b..d179bff 100644 --- a/week6/day2.ipynb +++ b/week6/day2.ipynb @@ -119,7 +119,7 @@ "source": [ "# Load in the same dataset as last time\n", "\n", - "items = ItemLoader(\"Appliances\").load()" + "items = ItemLoader(\"Home_and_Kitchen\").load()" ] }, { @@ -624,7 +624,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week6/day3.ipynb b/week6/day3.ipynb index 
9e51979..2170e9d 100644 --- a/week6/day3.ipynb +++ b/week6/day3.ipynb @@ -918,7 +918,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week6/day4.ipynb b/week6/day4.ipynb index cb7058f..56885b5 100644 --- a/week6/day4.ipynb +++ b/week6/day4.ipynb @@ -398,7 +398,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week6/lite.ipynb b/week6/lite.ipynb index 502d959..1a30deb 100644 --- a/week6/lite.ipynb +++ b/week6/lite.ipynb @@ -427,7 +427,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week7/community_contributions/lisekarimi/09_part5_llama31_8b_quant.ipynb b/week7/community_contributions/lisekarimi/09_part5_llama31_8b_quant.ipynb new file mode 100644 index 0000000..6a10dd6 --- /dev/null +++ b/week7/community_contributions/lisekarimi/09_part5_llama31_8b_quant.ipynb @@ -0,0 +1,612 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "4WDyBU0Vm0Zl" + }, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 5)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- ➡️ Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA\n", + "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# 🦙 Part 5: Evaluating LLaMA 3.1 8B Quantized\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ⚠️ GPU required - use Google Colab\n", + "- 🛠️ Requirements: 🔑 HF Token\n", + "- Tasks:\n", + " - Quantize LLaMA 3.1 8B to 4-bit\n", + " - Define prediction function\n", + " - Evaluate with Tester\n", + "\n", + "We know LLaMA 3.1 won’t 
beat frontier models — but how far behind is it without any tuning?\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MDyR63OTNUJ6", + "outputId": "7e9e5b6b-d11c-45df-d774-2da5f6455d51" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q datasets torch transformers bitsandbytes accelerate matplotlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-yikV8pRBer9" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import re\n", + "import math\n", + "import torch\n", + "from huggingface_hub import login\n", + "from datasets import load_dataset\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, set_seed\n", + "from google.colab import userdata\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uuTX-xonNeOK" + }, + "outputs": [], + "source": [ + "# Google Colab User Data\n", + "# Ensure you have set the following in your Google Colab environment:\n", + "hf_token = userdata.get('HF_TOKEN')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "\n", + "BASE_MODEL = \"meta-llama/Meta-Llama-3.1-8B\"\n", + "HF_USER = \"lisekarimi\"\n", + "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", + "\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DTMo_1msQb9X" + }, + "source": [ + "## 📥 Load Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a 
LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + ] + }, + "id": "5PVXACKHQhh4", + "outputId": "80dc4772-ea31-4752-8f97-573efaa43917" + }, + "outputs": [], + "source": [ + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = dataset['test']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pGJR24lbQlRS", + "outputId": "a1bb5e66-1aa9-40b7-c361-562eafae5d8c" + }, + "outputs": [], + "source": [ + "test[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vrxH6h00P9qc" + }, + "source": [ + "## 🦙 Load Tokenizer and Quantized LLaMA Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 418, + "referenced_widgets": [ + ] + }, + "id": "TAit9IzsQLcc", + "outputId": "176a77ad-0245-4a3d-b9f3-e139de359da7" + }, + "outputs": [], + "source": [ + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = \"right\"\n", + "\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " BASE_MODEL,\n", + " quantization_config=quant_config,\n", + " device_map=\"auto\",\n", + ")\n", + "base_model.generation_config.pad_token_id = tokenizer.pad_token_id\n", + "\n", + "print(f\"Memory footprint: {base_model.get_memory_footprint() / 1e9:.1f} GB\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🤖 Prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1FfMJ2JbzEr3" + }, + "outputs": [], + "source": [ + "def extract_price(s):\n", + " if \"Price is $\" in s:\n", + " contents = s.split(\"Price is $\")[1]\n", + " contents = contents.replace(',','').replace('$','')\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", contents)\n", + " return float(match.group()) if match else 0\n", + " return 0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CgN8eRttRAZx" + }, + "outputs": [], + "source": [ + "def model_predict(prompt):\n", + " set_seed(42)\n", + " inputs = tokenizer.encode(prompt, return_tensors=\"pt\").to(\"cuda\")\n", + " attention_mask = torch.ones(inputs.shape, device=\"cuda\")\n", + " outputs = base_model.generate(inputs, max_new_tokens=4, 
attention_mask=attention_mask, num_return_sequences=1)\n", + " response = tokenizer.decode(outputs[0])\n", + " return extract_price(response)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hpdEk2-FW6aT", + "outputId": "f8913c56-1a8f-4a13-9084-21acfdb64ceb" + }, + "outputs": [], + "source": [ + "model_predict(test[0]['text']), test[0]['price']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "auFzPUJKTLln" + }, + "source": [ + "## 🧪 Run Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jnoI1EWGTUau" + }, + "outputs": [], + "source": [ + "# Helper class for evaluating model predictions\n", + "\n", + "GREEN = \"\\033[92m\"\n", + "YELLOW = \"\\033[93m\"\n", + "RED = \"\\033[91m\"\n", + "RESET = \"\\033[0m\"\n", + "COLOR_MAP = {\"red\":RED, \"orange\": YELLOW, \"green\": GREEN}\n", + "\n", + "class Tester:\n", + "\n", + " def __init__(self, predictor, data, title=None, size=100):\n", + " self.predictor = predictor\n", + " self.data = data\n", + " self.title = title or predictor.__name__.replace(\"_\", \" \").title()\n", + " self.size = size\n", + " self.guesses = []\n", + " self.truths = []\n", + " self.errors = []\n", + " self.sles = []\n", + " self.colors = []\n", + "\n", + " def color_for(self, error, truth):\n", + " if error<40 or error/truth < 0.2:\n", + " return \"green\"\n", + " elif error<80 or error/truth < 0.4:\n", + " return \"orange\"\n", + " else:\n", + " return \"red\"\n", + "\n", + " def run_datapoint(self, i):\n", + " datapoint = self.data[i]\n", + " guess = self.predictor(datapoint[\"text\"])\n", + " truth = datapoint[\"price\"]\n", + " error = abs(guess - truth)\n", + " log_error = math.log(truth+1) - math.log(guess+1)\n", + " sle = log_error ** 2\n", + " color = self.color_for(error, truth)\n", + " # title = datapoint[\"text\"].split(\"\\n\\n\")[1][:20] + \"...\"\n", + " 
self.guesses.append(guess)\n", + " self.truths.append(truth)\n", + " self.errors.append(error)\n", + " self.sles.append(sle)\n", + " self.colors.append(color)\n", + " # print(f\"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}\")\n", + "\n", + " def chart(self, title):\n", + " # max_error = max(self.errors)\n", + " plt.figure(figsize=(12, 8))\n", + " max_val = max(max(self.truths), max(self.guesses))\n", + " plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6)\n", + " plt.scatter(self.truths, self.guesses, s=3, c=self.colors)\n", + " plt.xlabel('Ground Truth')\n", + " plt.ylabel('Model Estimate')\n", + " plt.xlim(0, max_val)\n", + " plt.ylim(0, max_val)\n", + " plt.title(title)\n", + "\n", + " # Add color legend\n", + " from matplotlib.lines import Line2D\n", + " legend_elements = [\n", + " Line2D([0], [0], marker='o', color='w', label='Accurate (green)', markerfacecolor='green', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='Medium error (orange)', markerfacecolor='orange', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='High error (red)', markerfacecolor='red', markersize=8)\n", + " ]\n", + " plt.legend(handles=legend_elements, loc='upper right')\n", + "\n", + " plt.show()\n", + "\n", + "\n", + " def report(self):\n", + " average_error = sum(self.errors) / self.size\n", + " rmsle = math.sqrt(sum(self.sles) / self.size)\n", + " hits = sum(1 for color in self.colors if color==\"green\")\n", + " title = f\"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%\"\n", + " self.chart(title)\n", + "\n", + " def run(self):\n", + " self.error = 0\n", + " for i in range(self.size):\n", + " self.run_datapoint(i)\n", + " self.report()\n", + "\n", + " @classmethod\n", + " def test(cls, function, data):\n", + " cls(function, data).run()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 692 + }, + "id": "1wA5uVgpTWLC", + "outputId": "5a597437-50c8-419c-c1da-af0166dabe0f" + }, + "outputs": [], + "source": [ + "Tester.test(model_predict, test)" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": { + "id": "lSfPbebIq2Ml" + }, + "source": [ + "![image.png](attachment:image.png)\n", + "\n", + "Alright — now that we know where things stand, it’s time to shake things up.\n", + "\n", + "Can QLoRA fine-tuning unlock the true power of LLaMA 3.1?\n", + "\n", + "👀 Let’s find out... in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part6_ft_llama_qlora.ipynb)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/week7/community_contributions/lisekarimi/09_part6_ft_llama_qlora.ipynb b/week7/community_contributions/lisekarimi/09_part6_ft_llama_qlora.ipynb new file mode 100644 index 0000000..af4b5e7 --- /dev/null +++ b/week7/community_contributions/lisekarimi/09_part6_ft_llama_qlora.ipynb @@ -0,0 +1,907 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 6)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- ➡️ Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA\n", 
+ "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# ⚙️ Part 6: Fine-Tuning LLaMA 3.1 with QLoRA\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ⚠️ GPU required - use Google Colab (A100)\n", + "- 🛠️ Requirements: 🔑 HF Token, wandb API Key ([Weights & Biases](https://wandb.ai))\n", + "- Tasks:\n", + " - Load and split dataset (train/validation); set up [Weights & Biases](https://wandb.ai) logging\n", + " - Load quantized LLaMA 3.1 8B and tokenizer\n", + " - Prepare data with a collator for fine-tuning\n", + " - Configure QLoRA (LoraConfig), training settings (SFTConfig), and tune key hyperparameters\n", + " - Fine-tune and push best model to Hugging Face Hub\n", + "\n", + "⚠️ I attempted to fine-tune the model on the full 400K dataset using an A100 on Google Colab, but it consistently crashed. So for now, I’m training on a 20K subset to understand the process, play with hyperparameters, track progress in Weights & Biases, and push the best checkpoint to the Hub.\n", + "\n", + "⏱️ Training on 20,000 examples took over 2 hours.\n", + "\n", + "The full model fine-tuned on the complete 400K dataset is available thanks to our instructor, [Ed](https://www.linkedin.com/in/eddonner), much appreciated! 
\n", + "We’ll dive into that model in the next notebook — **stay tuned** 😉\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MDyR63OTNUJ6", + "outputId": "525372ce-f614-44f1-b894-80e289958197" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q datasets transformers torch peft bitsandbytes trl accelerate wandb" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-yikV8pRBer9" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import torch\n", + "import wandb\n", + "from google.colab import userdata\n", + "from datetime import datetime\n", + "from datasets import load_dataset\n", + "from huggingface_hub import login\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, EarlyStoppingCallback\n", + "from peft import LoraConfig\n", + "from trl import SFTTrainer, SFTConfig, DataCollatorForCompletionOnlyLM" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Google Colab User Data\n", + "# Ensure you have set the following in your Google Colab environment:\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B48QsPsvUs_x" + }, + "source": [ + "## 🔀 Load Dataset from HF and Split into Train/Validation" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets (for Google Colab)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + "referenced_widgets": [ + "6f1f8dca2a334818a36fae380818001e", + "6d3be1ece4a949d3b8d3736db02bcb5c", + "c8c6bbacfe254c539f4acda8cdd5c04d", + "db87c136ff15430892aa75fa47521b0c", + "1d56af1140034021b2aecc5df846e499", + "6238783102084e0c99626bf948ff5bb6", + "f523b67e652049f7b13131d2750325bb", + "f03cc2cf18c140c8b4a076ab99ac86e3", + "472bb957b0e149df8ef0c26c3a3ffc19", + "86dfcc161f2d41a7a33041848766d091", + "6a7ed9e79ebb4f9c9962d08c78b424ca", + "efc4817d5f734852a844640ebe7eceed", + "0b473a8e944c4b028f51f53f62b72deb", + "1fd89859568440f58f3ab56f32183dd4", + "2e4bd8853acc4faa92e461210df2c689", + "3fb588f271db4b7abb9a3631582cc7d6", + "8f9c00ca63ca47e9873ec2a743fa1512", + "afdae504b36845b9a98874cced112721", + "8afd0ddfdeca43b59207a8b35a35e13c", + "0be7a6fdb206420d88b2b2e45a37432c", + "00f0983c1d204862b589011100297ffe", + "8c7de85bcec742ec85f1e8b854351056", + "5847c75b6dd74bc1b13116d91431ccf2", + "bcb0ad86493f45848895c02c0b9deaf6", + "18d70754531248b1ab22e1fd0df061ae", + "028d806f909f42e2b6a7ec630f6e3cb5", + "ff00d3192c734b398f779c7fffde57c8", + "55388dcb89f84c7ebe7f5f7051f2d98b", + "d3cab2b162a740fb82f78f030ea32b45", + "cea0149336be4c92952bacb8aa820926", + "6b560f8a028c4ba39896fd97f48f18ad", + "2a3ed922dab44648b6d6ed63e21c549d", + "885e1f4b9c3d45d5acd8d0a368ca557d", + "73e42dca7c4b455f8be4b34236e6ced2", + "c36aec28025e4baab8a3c4a293297f15", + "7569e26e1e2b46e4a7018e1bd2bc92d5", + "9f5795d223e74f1e8e49709ec1e4ddf1", + "5638ccb893164fc79980eb48d06909f9", + "70a528a0a08e4931b845ecc0992e07d6", + "669bbecd55804849bff5a850438d905d", + "245de1eaef2840b69e6c82afee68b4dc", + "ad57405b8f474c0aa92833f83dde70e8", + "cb3391329a7f4d0b93f5efffb9b0dcfe", + "cb0007dffa284be8aff41efacdfc31cb", + "c7de048747a24f9a9ce85396b87b8250", + "066b3f278ec24b299504cea66b3c3e63", + "0e1069c5bf644531902c51283a6d68e1", + "06bd7477f9fe45d0ad4138fc21bd29dc", + "adb68e7a8bea4b77b960e412c67a6286", 
+ "39ec099d38f04f4e8ea334d0c5335e2f", + "044bf34d53024427801e24fbca808dc1", + "e3d2839112ff4b7f9ab5bc04900ff522", + "f620e7774fa04ed0a88d2f78d2243906", + "7a12c0d7b32b445f978809c9aee2c62d", + "5a230441445746d59ea8a10a4d5bb467" + ] + }, + "id": "XEE1FrSIh-EF", + "outputId": "8cd19745-2f6f-41e0-96dd-5a2f72ac3a63" + }, + "outputs": [], + "source": [ + "HF_USER = \"lisekarimi\" # your HF name here!\n", + "\n", + "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = dataset['test']\n", + "split_ratio = 0.1 # 10% for validation\n", + "\n", + "##############################################################################\n", + "# Optional: limit training dataset to TRAIN_SIZE for testing/debugging\n", + "# Comment the two lines below to use the full dataset\n", + "TRAIN_SIZE = 20000\n", + "train = train.select(range(TRAIN_SIZE))\n", + "##############################################################################\n", + "\n", + "total_size = len(train)\n", + "val_size = int(total_size * split_ratio)\n", + "\n", + "val_data = train.select(range(val_size))\n", + "train_data = train.select(range(val_size, total_size))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "lUPNqb2Bse21", + "outputId": "a3d09c8f-ce5a-46b0-e1b0-b4471a659f69" + }, + "outputs": [], + "source": [ + "print(f\"Train data size : {len(train_data)}\")\n", + "print(f\"Validation data size: {len(val_data)}\")\n", + "print(f\"Test data size : {len(test)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wixbM-VeVfsR" + }, + "source": [ + "## 🛠️ Hugging Face Configuration" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "OixVUG06VmZk", + "outputId": "3cb523e0-fd03-4a18-913b-c22fa90e3bdd" + 
}, + "outputs": [], + "source": [ + "PROJECT_NAME = \"llama3-pricer\"\n", + "\n", + "# Run name for saving the model in the hub\n", + "\n", + "RUN_NAME = f\"{datetime.now():%Y-%m-%d_%H.%M.%S}-size{total_size}\"\n", + "PROJECT_RUN_NAME = f\"{PROJECT_NAME}-{RUN_NAME}\"\n", + "HUB_MODEL_NAME = f\"{HF_USER}/{PROJECT_RUN_NAME}\"\n", + "HUB_MODEL_NAME" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1-t1nGgnVTU4" + }, + "source": [ + "## 🛠️ wandb Configuration" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load from Colab's secure storage\n", + "wandb_api_key = userdata.get('WANDB_API_KEY')\n", + "\n", + "# Load from environment variables (.env file) if running Locally (GPU setup)\n", + "# wandb_api_key = os.getenv('WANDB_API_KEY')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"WANDB_API_KEY\"] = wandb_api_key\n", + "wandb.login()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 156 + }, + "id": "yJNOv3cVvJ68", + "outputId": "0c03623e-6887-49e3-8989-bbe45dfc5d35" + }, + "outputs": [], + "source": [ + "# Configure Weights & Biases to record against our project\n", + "\n", + "LOG_TO_WANDB = True\n", + "\n", + "os.environ[\"WANDB_PROJECT\"] = PROJECT_NAME\n", + "os.environ[\"WANDB_LOG_MODEL\"] = \"checkpoint\" if LOG_TO_WANDB else \"end\"\n", + "os.environ[\"WANDB_WATCH\"] = \"gradients\"\n", + "\n", + "if LOG_TO_WANDB:\n", + " wandb.init(project=PROJECT_NAME, name=RUN_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qJWQ0a3wZ0Bw" + }, + "source": [ + "## 📥 Load the Tokenizer and Model" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 418, + 
"referenced_widgets": [ + "1b88f6d4010f4451a58abe2c46b74f62", + "139758ba39964f49b65eb67182eef68e", + "9c138d12dcb644fe9b72bd9eb5d26637", + "3bf8626162904a15932480ddbcea0ebd", + "a919a41b53604ccd91331d3f713e1310", + "5b8cdfe01f9a4c248e3de30442411ad4", + "e14d38a4c3e04d68ac30d475b0db1a73", + "dadfd3c2a521420890092be265c0aa50", + "761e88b179104dbbb6455ba81bd1f833", + "11f5b4df0c7344ba9e188f4eca82886f", + "125aa3f0dbd744eb82f8e4de94199736", + "6ca21586e6fc4a608adedba7889eadb5", + "023eb92e8a2b4323bfd12582e3c23962", + "c7c76b9845174e9687107595df27c050", + "78d4a28e03db4775b6e8e071c0b02d5d", + "8483c625762c49679877a37ab0ddcef9", + "1df5f6fe2fc04e60bfcb1f78689824ba", + "add10c416e334928af303d51dfd745c6", + "5e9e9dac85014292b94d347cc4bad3fe", + "d665aa6480624ab697f4e426b51d59de", + "03cce0d3f3a443fc808915b101576e4b", + "f15714023f234c39863b34d1a3721a8e", + "8f7a48d803eb4d2182c9da07af743ac7", + "74892e7b343d410bbbef60c64a823a9a", + "d6a70560831144e39dc9762d397f4c90", + "9b969f7fbcdc491cab71aac42761cd2a", + "d31f9443d1c646309c7a5e1ec39ffc0e", + "0f5a81846ab143bebf6ec422cda3f145", + "f0b05f3f7f37414c9d09470c94e304d7", + "d18784692c9c4ca99e277e6ed51e2bf1", + "f58addfac7c3438a90ebf10c88348d56", + "451deac2eeec45598590579340be0d4b", + "848e0651caf34ef288cca451e3d11274", + "5adf041222f843429c3a9f1b99becfed", + "a4764f36570b4752a1ec4392d2f0146c", + "511a4c6a898346acac9d98fd3a7cdf2c", + "26da7435a2614201a9e5b8087749f0e0", + "6054fa015ae44659beb7473c084c7b5b", + "3b9fc447a9ae4506a1edaf0fa449d9d5", + "6acef8f1820545ef90b22d90ac80427d", + "2a5cbad0b8fd45dc9ee25715b1015aef", + "86a9428f39be4d65a1e922bd9afb3800", + "96d919a1a7f14e91b8e6c91d855e36d5", + "82d7484aa2774015b7ea18d933afa9b6", + "b9d2d4f2c44a4d7cad2b3803c7f6e7be", + "9f3a176a6ae6426a8c1567a835da8680", + "006763d2301f4205a588adf5c19876a0", + "b44eb6596c3441bbaab288030f953a04", + "bf91666a0c054c79acb03d2e1bb38c37", + "f0185f1b4b23445c920a873eb63a9372", + "8e1ac15b677d4c21ad42ea1dda68fe05", + "87746d8d6d3d413ebb46b4e12fb74cc8", 
+ "bb5ea1e92c434a46838f943648de87bd", + "1abcfcba332b40eb901d1331ed84f9bd", + "52fa5fcc629742619fa3105f73d90767", + "1bcc2d5771034c2dbc372031e83a2384", + "221cfaa2a5db4cf1ac399363c3589025", + "793f9bdc92a545519dd3279023e4ab50", + "55e25f5cc12f44f3a39fae501fccd060", + "59463b5e6286483394dedb602991ac95", + "fc95344ea44d40f28702360542afcff7", + "ffb3af537d6c41548ad88027505b04d6", + "6afcf0f6131d4dddbeda796e9c0c5bc5", + "93f65b3bc071453f86fe8f0f6c17d8fd", + "2ac9926ee4644232b43d84cfa95c584d", + "0c5a7738132b4f0f8b4810333b37c588", + "99d41ffa37134be9a57fe5e50a59b67d", + "50e71304ab4f42c29f1994fed9b595b8", + "76b4b0d63e524eb783429169a25be74e", + "441cfadbe4b446f4b61391b7be4d2865", + "6751f0c35b634d7c9b06c4e41f9ff851", + "6a5dc276bbf64bf9b5a99751068ee228", + "b3ac6055014642a285435f877d5651f5", + "e9137600b29c4ecaad4ef8bca5fd5f91", + "634afb9c1b8c4e29b3ec7b76a1108ae4", + "6be0ac91035548fbbe778e3d7fd58e7e", + "e8e9d5c979ac4afba526e38b6d0851be", + "a4ae8ca9c0e7478fbad3b9ed67bc21a2", + "faf3a64e316a43ddbac8ba14573c4eb4", + "a395885e39434f9f98246d0fb1c94c8f", + "d13552c90ead4804a4d5a21121f25536", + "c25b94002c2246a9aa7f6ed1e4a22cfa", + "e3892cf602cb4a49948f26cae1e7644c", + "bc290a324a7147c5b6a722acb41ed05a", + "2b556f5aa6324958ac6fe36bddf17909", + "67c6a0534b3a4345b9c11af1bffdfbf0", + "d767921bb23c485396282cb79a4d1836", + "d598468ad8f94146976f70d873f0b56d", + "b547888cd5494b21911b7d457ab6fbac", + "28362e43274848109c2624e5668942b0", + "7a27fc65bc0b44ce9bd959f4be13514d", + "73bc97e6d9cc4ccd8d134092ce970026", + "c042bf08ab23410098e6d16e837d19ce", + "d2930ad2c08748d0883bb77c68acf940", + "c2a1291730874e8e94232c0d51575f81", + "cb92871b11a0410eb295cc323e5872a7", + "150a5ce5d8124b0eb9e44d8715b8b1ab", + "7a6f05ad1f2e483dbcdca102c66530b0", + "626a29aee42e4e6d8c18d8ea5889734a", + "c549ca0548d04a7d8749a0842c4aa62b", + "958c0ff0f47f4c0fa4e2085f5243d84f", + "a8171febcac94a4b902ff737592f3f47", + "22630cdb7d6f4975bc31cc189987573d", + "2f8a9ccee6ea4cdd8c8c225575cae0ce", + 
"e40f81c5c4334accbca947964146d238", + "d6849da8e89546469188dc047c66ea25", + "8a67d8a2ac0a4fd7a41aa5c890049525", + "5bf18445be0e46e087cbcd377ccfffbe", + "72b2020c9479471681ce0f42898cfe1c", + "c114fd62eb4b4fdca94654668c8f2374", + "401580df26fc40abb2b774c3d9684921", + "e756b825b211476994a69fb65f4bbf7c", + "b2c26cf10e5a4d4fa8961f5c9cca18ce", + "c288256c73dd44d08916db4e9cf989f0", + "250a72e9650845d2b274bc3c157439f8", + "94281c7e5be049c1a9f3dfa082805133", + "f004f9f743ae4229aa90c92abba6ded6", + "bd8ca5b8aaed4809a93f553d5cb4a887", + "4cec4c2d73de4d52b2143082645536ac", + "893b96616a0e47bfaa0434e10eca1341", + "74e7d88dd4894894ac2c16fdfd29233b", + "9e1f1e4288df407fa03415664dc361d5", + "81dc3f390b9a49f8b1be5c43580b070d", + "917a225a9bb74f8ab034dcdcee3c7247", + "bc6c698857ce4f8eabc1571ba0ff0edf", + "e9ae1c247ae5409f9da4db84ce71a6e3", + "55071660223e4022a6a7836572077c0c", + "8364e661011743af9fd40dabc5a7dfe4", + "ac65442e0d5e43e2998d7c700573228a", + "666f3434ae8a495f8ada8fedb50b7051", + "1977e9f07f104faead7dfcfa8aaed6f2", + "ebe2257c07f345fea72f162542a45142" + ] + }, + "id": "R_O04fKxMMT-", + "outputId": "29aa1cf7-2a2e-492e-adc9-cd0a5bfb123e" + }, + "outputs": [], + "source": [ + "BASE_MODEL = \"meta-llama/Meta-Llama-3.1-8B\"\n", + "\n", + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True, # Reduce the precision to 4 bits\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = \"right\"\n", + "\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " BASE_MODEL,\n", + " quantization_config=quant_config,\n", + " device_map=\"auto\",\n", + ")\n", + "base_model.generation_config.pad_token_id = tokenizer.pad_token_id\n", + "\n", + "print(f\"Memory footprint: {base_model.get_memory_footprint() / 1e6:.1f} MB\")" + ] + 
}, + { + "cell_type": "markdown", + "metadata": { + "id": "SrCE2Le7RBRj" + }, + "source": [ + "## ⚙️ Fine-tune our LLaMA 3.1 8B (4-bit quantized) model with QLoRA\n", + "- 1. Prepare the Data with a Data Collator\n", + "- 2. Define the QLoRA Configuration (LoraConfig)\n", + "- 3. Set the Training Parameters (SFTConfig)\n", + "- 4. Initialize the Fine-Tuning Trainer (SFTTrainer)\n", + "- 5. Run Fine-Tuning and Push to Hub" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9BYO0If4uWys" + }, + "source": [ + "### 🔄 1. Prepare the Data with a Data Collator\n", + "\n", + "We only want the model to learn the price, not the product description. Everything up to and including \"Price is $\" is context, not a training target. TRL’s DataCollatorForCompletionOnlyLM handles this masking automatically:\n", + "\n", + "1. Tokenizes the response_template (\"Price is $\")\n", + "2. Finds its token position in each input\n", + "3. Masks every token up to and including the template (their labels are set to -100, so they are excluded from the loss)\n", + "4. Trains the model only on tokens after it (the price)\n", + "\n", + "\n", + "Example:\n", + "\n", + "Input: \"Product: Red T-shirt. Price is $12.99\"\n", + "\n", + "Masked: \"Product: Red T-shirt. Price is $\" → masked (no loss)\n", + "\n", + "\"12.99\" → not masked (model is trained to predict this)\n", + "\n", + "So the model learns to generate 12.99 given the context, but isn’t trained to repeat or memorize the description." + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2omVEaPIVJZa" + }, + "outputs": [], + "source": [ + "response_template = \"Price is $\"\n", + "collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4DaOeBhyy9eS" + }, + "source": [ + "### 🧠 2. 
Define the QLoRA Configuration (LoraConfig)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0HKuVS_XR3cw" + }, + "outputs": [], + "source": [ + "LORA_R = 32\n", + "LORA_ALPHA = 64\n", + "TARGET_MODULES = [\"q_proj\", \"v_proj\", \"k_proj\", \"o_proj\"]\n", + "LORA_DROPOUT = 0.1\n", + "\n", + "lora_parameters = LoraConfig(\n", + " r=LORA_R,\n", + " lora_alpha=LORA_ALPHA,\n", + " target_modules=TARGET_MODULES,\n", + " lora_dropout=LORA_DROPOUT,\n", + " bias=\"none\",\n", + " task_type=\"CAUSAL_LM\", # Specifies we're doing causal language modeling\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uLfFsfNQSBAm" + }, + "source": [ + "### ⚙️ 3. Set the Training Parameters (SFTConfig)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7PKXdhPXSJot" + }, + "outputs": [], + "source": [ + "# 📦 Training Setup:\n", + "EPOCHS = 1\n", + "BATCH_SIZE = 16 # A100 GPU can go up to 16\n", + "GRADIENT_ACCUMULATION_STEPS = 2\n", + "MAX_SEQUENCE_LENGTH = 182 # Max token length per input\n", + "\n", + "# ⚙️ Optimization:\n", + "LEARNING_RATE = 1e-4\n", + "LR_SCHEDULER_TYPE = 'cosine'\n", + "WARMUP_RATIO = 0.03\n", + "OPTIMIZER = \"paged_adamw_32bit\"\n", + "\n", + "# 💾 Checkpointing & Logging:\n", + "SAVE_STEPS = 200 # Checkpoint\n", + "STEPS = 20 # Log every 20 steps\n", + "save_total_limit = 10 # Keep latest 10 only\n", + "\n", + "\n", + "LOG_TO_WANDB = True\n", + "\n", + "HUB_MODEL_NAME = f\"{HF_USER}/{PROJECT_RUN_NAME}\"\n", + "\n", + "train_parameters = SFTConfig(\n", + " # Output & Run\n", + " output_dir=PROJECT_RUN_NAME,\n", + " run_name=RUN_NAME,\n", + " dataset_text_field=\"text\",\n", + " max_seq_length=MAX_SEQUENCE_LENGTH,\n", + "\n", + " # Training\n", + " num_train_epochs=EPOCHS,\n", + " per_device_train_batch_size=BATCH_SIZE,\n", + " gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,\n", + " max_steps=-1,\n", + " 
group_by_length=True,\n", + "\n", + " # Evaluation\n", + " eval_strategy=\"steps\",\n", + " eval_steps=STEPS,\n", + " per_device_eval_batch_size=1,\n", + "\n", + " # Optimization\n", + " learning_rate=LEARNING_RATE,\n", + " lr_scheduler_type=LR_SCHEDULER_TYPE,\n", + " warmup_ratio=WARMUP_RATIO,\n", + " optim=OPTIMIZER,\n", + " weight_decay=0.001,\n", + " max_grad_norm=0.3,\n", + "\n", + " # Precision\n", + " fp16=False,\n", + " bf16=True,\n", + "\n", + " # Logging & Saving\n", + " logging_steps=STEPS, # Log the training loss every STEPS steps\n", + " save_strategy=\"steps\",\n", + " save_steps=SAVE_STEPS, # Save a local checkpoint every SAVE_STEPS steps\n", + " save_total_limit=save_total_limit,\n", + " report_to=\"wandb\" if LOG_TO_WANDB else \"none\", # \"none\" disables reporting; None may fall back to all installed integrations\n", + "\n", + " # Hub\n", + " push_to_hub=True,\n", + " hub_strategy=\"end\", # Only push once, at the end\n", + " load_best_model_at_end=True, # Loads the best eval_loss checkpoint\n", + " metric_for_best_model=\"eval_loss\", # Monitors eval_loss\n", + " greater_is_better=False, # Lower eval_loss = better model\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1q-a3LHDSoxQ" + }, + "source": [ + "### 🧩 4. 
Initialize the Fine-Tuning Trainer (SFTTrainer)\n", + "Combining everything" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 290, + "referenced_widgets": [ + "6753caf741414a4c8fa309978253c8cd", + "aeade430d57b4338910ad0c3645fd06a", + "eb7081b71cc14aff9b99dba8f9368def", + "8eb16171df804d06a02351f74bb28dc4", + "9d60a205ebda49ca88220cc4eec716ca", + "d8ff973b90374423b4b5e17a1937111c", + "4bf3bf107f2c4e28a58387c96916e97f", + "d66cb8c1829c439095f4691fa32d7b6e", + "567c8321685045c5a873b3b1edecdc96", + "96ff596facb94acab611201b4adac13f", + "de65507ce09a4ef4ad8f28d46d335acc", + "e40fe92fe9094a58b53f0eeb97d3d629", + "592615cc81624de5a9934f5671d6c188", + "fadf75d91df54f49acef3f178ea53ce3", + "5ccca8ab6cb94a88bb27bd482f7948a9", + "d74dcc2ef9b8442d9ae99db2a79e0c48", + "580ebfa370d34426933e8c7389872e2b", + "1187f05dc99641e9a68d9cf49216c370", + "7deffbba68ba4f018374bd6bec62dd18", + "d24cdc40a6a34d6eb0efbfde17505d6f", + "31d44a308b4b4557934ec887e0b6a817", + "76112ce6fdc4496dba783451efa28cfd", + "15a85e4a77484c9392b2e5cb8767b336", + "4524d775b9034a1f890673a9c005d123", + "5ab6a6b427f84ec685ac52f6ff0d63b5", + "427ee9e90a844313989f623aba124498", + "6d2b7c059e6b42afa955fe01bf38011d", + "5d821ed8ffe14927be799c4d31043a82", + "12f9fab59e9849dcb7b3b17c5674580f", + "dd4a2876db37476fa438e8758c855393", + "f115f97428764c53ac780131fd75bd17", + "1a1e0e562a844ed098e97ce8a62695ee", + "0a7ae7cc902243a5996f730f0fe05cdb", + "07205ea24c3f4959bf9ebd393f5c921d", + "723bb8342ac84eedabd91e3eef178967", + "28714d0cf3d84a48975c8ad31e29691d", + "dd1d90d76d914839a1dad1cddab2c09f", + "e2d55edf98784523bcbeaad0cc2be494", + "d00ecfa9dc44428b989ec1a9deb27eae", + "ba2717985bc342e9827f8901ef655b00", + "6669dc8f20e3461f93c95cef7a90b201", + "29cb36c1943c4e1b9898534aaf32bd37", + "14a1449c13a14afda16bc7c05b7fd840", + "259d315eb4584c699b1c738d411eab7e", + "a4bb13eb7cee4f87b0e3e1a3a1be18e7", + 
"14d8a699a92044cda33802d96aaa41a2", + "d345350fd5ad4a028fbbc45cfc9f6db3", + "6953210353f840d59457fc54f4f8b829", + "d6cd9e1196f04ecbba83dc0b446b2c65", + "9e380ef863204da5863c9b6e7a2c8340", + "1d1bb803831d46309619f6a0c51c2eeb", + "6a50aaf7ad304a5aa3f29113121e8fe0", + "7a573a39c2b245f5a84626d951584f67", + "a57e66367d4245f6bcd4ad0463535583", + "d6f3327d39a34ec5a44d976f239a61ce", + "8f450df9f161409a8102c1f0b63edad8", + "95d932d12cb8442da17adb8e9782c40c", + "41c5f295b45f4828a9327b699b85ca01", + "9e4f3fd6bf7749f88ccd7ba65dd9446f", + "a8f8cb0d9fb14f30a537977f3d51a2c4", + "4e9e4ed0f2db4d7ba5a5bb0d00676a0c", + "1fe2bab9c9aa4de48e6e2512f9a7d0a1", + "d93ac5affccf404fa3916e7f3dd62943", + "92346fc65f48493d80198ac6d7adf4d8", + "647bfb2a24cc44a0adaf69ced8e99213", + "5c96424cff314aa484e4bc905bcbd761", + "cec2fcfb30194d5ab8c0a3868bad3598", + "35df7031c4964cef9c53bba6eabbe91d", + "e15c772e14264c9889e6dae34015e04b", + "e85b65cb497c48c2b844ae3e5d9efc60", + "52c8495d46ca4a3c8c6694a700d05e95", + "3db6d8a5ce2a40daaae6714807a27997", + "051d74df7ef1468aa968cac5792e7b00", + "75838a7c887545ff9fbbf5887a1336bc", + "59f698c1829148ac90edda008d5c6f69", + "35921436c69643aab792bd1333c749ef", + "2dd51cc6033746e1a8def460e5e51ff5", + "a8a3e5973ee5441087d10dfb17bfa1d6", + "64c3b3c02e844df6bfd3acf1ee23d765", + "83016eccdd7f4dedab9d3ea6e6852977", + "9d4c5a62214f4649b77365349ae4ac88", + "07cb9756d1814a7ba7fb49cccb2763cb", + "492454ad524742bd8bb3f5c3d5b37feb", + "e98053f6b7f045da812088d1e76d3a31", + "f2aeb3ae99cc4b7ca97fb959df1150ad", + "f92e18b6ab0147b1b428724f5155ca61", + "14356b2447e349ee8478478eb231fa81", + "f244a7e331d941f5a99712dcbc5550ea" + ] + }, + "id": "fCwmDmkSATvj", + "outputId": "2b4adc75-e0db-4e0b-c90b-9f9ff2dfd3c6" + }, + "outputs": [], + "source": [ + "# The latest version of trl is showing a warning about labels - please ignore this warning\n", + "fine_tuning = SFTTrainer(\n", + " model=base_model,\n", + " train_dataset=train_data,\n", + " eval_dataset=val_data,\n", + " 
peft_config=lora_parameters, # QLoRA config\n", + " args=train_parameters, # SFTConfig\n", + " data_collator=collator,\n", + " callbacks=[EarlyStoppingCallback(early_stopping_patience=5)] # Stop early if eval_loss has not improved for 5 consecutive evaluations\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vHz6JA5_XJ07" + }, + "source": [ + "### 🚀 5. Run Fine-Tuning and Push to Hub" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "GfvAxnXPvB7w", + "outputId": "d351d89a-b3d7-4e2b-fee2-5ba2e929837e" + }, + "outputs": [], + "source": [ + "fine_tuning.train()\n", + "print(f\"✅ Best model pushed to HF Hub: {HUB_MODEL_NAME}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_train_eval_loss_steps.png?raw=true)\n", + "\n", + "![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_train_eval_loss_wandb.png?raw=true)\n", + "\n", + "This chart shows training loss vs evaluation loss over steps during fine-tuning of the 4-bit quantized Llama 3.1 8B model (20K samples).\n", + "\n", + "- Blue line (train/loss): Decreasing overall, with some noise. Final value: 1.8596.\n", + "- Orange line (eval/loss): Smoother and consistently lower than training loss. Final value: 1.8103.\n", + "\n", + "- No sign of overfitting: eval loss stays below train loss throughout, a good sign.\n", + "- Stable convergence: Both curves flatten around step 500, suggesting the model is reaching training stability.\n", + "- Final eval loss (1.8103) sits slightly below the final train loss (1.8596), indicating decent generalization to unseen data.\n", + "\n", + "This fine-tuning run looks healthy. We can likely push further with more data, such as the full 400K run." 
+ ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 938 + }, + "id": "32vvrYRVAUNg", + "outputId": "bb4ab0f6-c390-48f3-a71c-2d259bb0ec0b" + }, + "outputs": [], + "source": [ + "if LOG_TO_WANDB:\n", + " wandb.finish()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_run_summary_qlora_llama.png?raw=true)" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IyKZ0r38IfT3" + }, + "source": [ + "Now that our best model is pushed to Hugging Face, let’s put it to the test.\n", + "\n", + "🔜 See you in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part7_eval_llama_qlora.ipynb)" + ], + "outputs": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "A100", + "provenance": [] + }, + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/week7/community_contributions/lisekarimi/09_part7_eval_llama_qlora.ipynb b/week7/community_contributions/lisekarimi/09_part7_eval_llama_qlora.ipynb new file mode 100644 index 0000000..bfe78d1 --- /dev/null +++ b/week7/community_contributions/lisekarimi/09_part7_eval_llama_qlora.ipynb @@ -0,0 +1,739 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "GHsssBgWM_l0" + }, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 7)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", 
+ "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- ➡️ Evaluating Fine-Tuned LLaMA\n", + "- Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# 🧪 Part 7: Evaluating the Fine-Tuned LLaMA 3.1 8B (Quantized)\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ⚠️ GPU required - use Google Colab\n", + "- 🛠️ Requirements: 🔑 HF Token\n", + "- Tasks:\n", + " - Load the tokenizer and fine-tuned base model\n", + " - Load the PEFT adapter for the fine-tuned weights\n", + " - Run evaluation — the moment of truth!\n", + "\n", + "🔔 **Reminder:** \n", + "As mentioned in Part 6, I fine-tuned the model on only 20K samples. \n", + "In this notebook, we’ll evaluate both this model and the full 400K-sample version fine-tuned by our instructor.\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MDyR63OTNUJ6" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q datasets transformers torch peft bitsandbytes matplotlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-yikV8pRBer9" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import math\n", + "import torch\n", + "from huggingface_hub import login\n", + "import torch.nn.functional as F\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, set_seed\n", + "from datasets import load_dataset\n", + "from peft import PeftModel\n", + "import matplotlib.pyplot as plt\n", + "from google.colab import userdata" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WyFPZeMcM88v" + }, + "outputs": [], + "source": [ + "# Google Colab User Data\n", + "# Ensure you have set the following in your Google Colab 
environment:\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "30lzJXBH7BcK" + }, + "outputs": [], + "source": [ + "# Helper class for evaluating model predictions\n", + "\n", + "GREEN = \"\\033[92m\"\n", + "YELLOW = \"\\033[93m\"\n", + "RED = \"\\033[91m\"\n", + "RESET = \"\\033[0m\"\n", + "COLOR_MAP = {\"red\":RED, \"orange\": YELLOW, \"green\": GREEN}\n", + "\n", + "class Tester:\n", + "\n", + " def __init__(self, predictor, data, title=None, size=250):\n", + " self.predictor = predictor\n", + " self.data = data\n", + " self.title = title or predictor.__name__.replace(\"_\", \" \").title()\n", + " self.size = size\n", + " self.guesses = []\n", + " self.truths = []\n", + " self.errors = []\n", + " self.sles = []\n", + " self.colors = []\n", + "\n", + " def color_for(self, error, truth):\n", + " if error<40 or error/truth < 0.2:\n", + " return \"green\"\n", + " elif error<80 or error/truth < 0.4:\n", + " return \"orange\"\n", + " else:\n", + " return \"red\"\n", + "\n", + " def run_datapoint(self, i):\n", + " datapoint = self.data[i]\n", + " guess = self.predictor(datapoint[\"text\"])\n", + " truth = datapoint[\"price\"]\n", + " error = abs(guess - truth)\n", + " log_error = math.log(truth+1) - math.log(guess+1)\n", + " sle = log_error ** 2\n", + " color = self.color_for(error, truth)\n", + " # title = datapoint[\"text\"].split(\"\\n\\n\")[1][:20] + \"...\"\n", + " self.guesses.append(guess)\n", + " self.truths.append(truth)\n", + " self.errors.append(error)\n", + " self.sles.append(sle)\n", + " self.colors.append(color)\n", + " # print(f\"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}\")\n", + "\n", + " def chart(self, title):\n", + " # max_error = max(self.errors)\n", + " plt.figure(figsize=(12, 8))\n", + " max_val = max(max(self.truths), 
max(self.guesses))\n", + " plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6)\n", + " plt.scatter(self.truths, self.guesses, s=3, c=self.colors)\n", + " plt.xlabel('Ground Truth')\n", + " plt.ylabel('Model Estimate')\n", + " plt.xlim(0, max_val)\n", + " plt.ylim(0, max_val)\n", + " plt.title(title)\n", + "\n", + " # Add color legend\n", + " from matplotlib.lines import Line2D\n", + " legend_elements = [\n", + " Line2D([0], [0], marker='o', color='w', label='Accurate (green)', markerfacecolor='green', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='Medium error (orange)', markerfacecolor='orange', markersize=8),\n", + " Line2D([0], [0], marker='o', color='w', label='High error (red)', markerfacecolor='red', markersize=8)\n", + " ]\n", + " plt.legend(handles=legend_elements, loc='upper right')\n", + "\n", + " plt.show()\n", + "\n", + "\n", + " def report(self):\n", + " average_error = sum(self.errors) / self.size\n", + " rmsle = math.sqrt(sum(self.sles) / self.size)\n", + " hits = sum(1 for color in self.colors if color==\"green\")\n", + " title = f\"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%\"\n", + " self.chart(title)\n", + "\n", + " def run(self):\n", + " self.error = 0\n", + " for i in range(self.size):\n", + " self.run_datapoint(i)\n", + " self.report()\n", + "\n", + " @classmethod\n", + " def test(cls, function, data):\n", + " cls(function, data).run()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 📥 Load Dataset" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 177, + 
"referenced_widgets": [ + "61f42f612e98467684716cc7421c7554", + "a7e864c2ae21482e8bcdbc42a5a65309", + "63405c5e47da4652b052ee6099ead31e", + "0864a38b1c494308a07defced89f4fe3", + "8f089946a97d4becb3ff06b7a65595a2", + "42b865ac9e4f4ecaa475c4d69929e401", + "3478290afe1d48268c7c07206c212eda", + "f21c0db9205f4c40a2f9ea1ddd66b59e", + "4604f38122454bc1b1826311a326eb12", + "6e2b95e33cab4fe9b9f555195b634fac", + "b8f0f357a61c4502962f385291c3bac8", + "fa49b7e56b054faca67334e08bbf622c", + "243d84401ba24360a42c2636d7984772", + "bbcf01edcbcd425b9ca1e61e80f6df4f", + "17b41698c33044c7942e66e63c5c2d2d", + "14dfccde2f6a47679cea42ce965b6ef2", + "6a1570c8980b4d5ebac78348f79c4f1b", + "44f1922676f3417fb7baccd92bf53cea", + "176b023546bc4053a4d484205d7ab200", + "b02018254c4b4fb680e382974380c331", + "766aba35ebf54996990e075e4f692f96", + "24ceffd3b8c64e5f983e52d743ebef8d", + "5b9076b6c05a4454a7233302114b9d8c", + "4bfbd393271844de825a53c7d639fa60", + "3313091548bf414fabf84f5aa2c85d14", + "f98c7fe4ad6d4649a7a104f973992be0", + "fd1eb06d0aa64ba59ae9bb214f2c94ed", + "24237203b2c44709b20ca84b95387849", + "7910e6a4881a43638c4e91dd0f024092", + "f22dad57ee324ca8b927f9a3b8cc6edc", + "20a702b1ccbe499eabf70af974561417", + "48f72254ce6f408c94bf56a3919c032e", + "6bf00cd26256489fb209b8b51ca9fb0e", + "da3c453facaf41b6bc89d311d9f1ce74", + "78487c1a13e84e7bb35a72a07ad9b681", + "3866fe39fcc34120a0b4c4b36c8eaa6c", + "54de8e445909429f9d7ca9ad02e8f190", + "eeda8994cb8d46cc9d5c2212907ab869", + "b670675ee9bc4689a34f997d0da13b82", + "56727a21bb4648fe8ae46d3a61b39f4a", + "da89c856fbf746b496d37cbef92305b9", + "2f4ba348ef7246af8b1cd04352bcbd1d", + "0d86b4a93411494eb8e725440e393cff", + "203c4888674c46bba1033639ad4286a2", + "005dac04aacb4955ae079d36bfc4cd19", + "68ff796bdee44aa380324374ae38fd25", + "411691dce3f1457cb3ee9e8ad652d61d", + "f0fc209cb9e74d0ca3c0c9b14b1450e0", + "6e2155c3ad3243508dff34919eecd0a2", + "68891d88fe7e417abbd508d2089e7960", + "8e1ab77817bc4ec2835b195a0beb1096", + "c638e3a09f6b4caaa078e242b010744e", 
+ "ee9abd78adb54984868ebee19f638e25", + "8280e432938b4e9794c95e47bb9c02fa", + "abdd2ff8028b432091434805f81c455c" + ] + }, + "id": "cvXVoJH8LS6u", + "outputId": "6308b124-a922-4e82-fb6a-5933d3c324e0" + }, + "outputs": [], + "source": [ + "DATASET_NAME = \"lisekarimi/pricer-data\"\n", + "dataset = load_dataset(DATASET_NAME)\n", + "train = dataset['train']\n", + "test = dataset['test']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xb86e__Wc7j_", + "outputId": "8b699099-7414-4663-fab1-d069d3ec3d35" + }, + "outputs": [], + "source": [ + "test[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qJWQ0a3wZ0Bw" + }, + "source": [ + "## 📥 Load Tokenizer and Model\n", + "The fine-tuned model (PeftModel) only holds the LoRA adapters, so it requires the base model to apply them correctly." + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 401, + "referenced_widgets": [ + "aee2cb6b13d64f1dab9f8190a274bbc0", + "547a2807263e4295af11da5a43ccf5b7", + "00b57ab6c0c44e39ad6fa27b7e5a085b", + "d51c826dc6d749b38ce7e5fdfc730086", + "f276602665c148999240ef916aa8a9c8", + "9d83d7056aed43a59d82955bdb8f272d", + "7a71aedc0f49430ba7c71040c5fa2529", + "108880a9a7bb4a73837889ad2a25fd77", + "0163275024a041e2bc9fba947c371269", + "555a494cbcda41e79ee4584a8122774b", + "5dfbe2af8afc43c691c34c52a47c9790", + "25edb5ab02c3402998b75cffc13d0a55", + "9a4f0361323540aa8428054a0d98ecb3", + "cf149d1eceae43a9808e142fbfe5d4ff", + "de86c9338690424fa0052e5b055cea88", + "2acb3368945a48aeaf9fbc6d22e9238d", + "4c4c4b1507814037bcea0519ec43ba26", + "6d37385e79904b7ca267ad165774f962", + "b14a5f0f71094aa98403edd429cb882e", + "31b28b6183c644f9b5601208a1f72499", + "d3cddc62e0fb4256bf4c74f6a59e686b", + "82cb2192839e451292b27a186daaa7c1", + "2e038c429eed4abdae8d27a7226d7298", + 
"364c4658aba64512a1f50cdad9cc12f8", + "fde7b1ab1e224fec8e9b761e703b53dd", + "ad5db9c88ce64f73992d2e274ca1206a", + "0e7ada829b22485ca7a628d2c464f3f1", + "ec4f7d2076db4f6a856ab0d5e8edffbd", + "3f00114026a4417db1b142e5bcb7a695", + "e4e9cf32b99848baa6a587fb235ce6b5", + "a109b5ee80574e40a14fa1e186f4f9f4", + "af569da703694c27aa9ca2ddce6c4923", + "886bb94abf2c437eb8505222c4336e85", + "f668156d681e47f39e553f127a44261d", + "9bb3d0deaac6439e9ad67c2bc0565ff4", + "762b36fde5ac4a2982152f3babfa3ed9", + "141911ee360d42ab8dd3b7fa3563bbf0", + "340eae69eeaf4e458e6d8134018f4ad4", + "3226e3a8c4564f6fbd6ffb3eeb7b45e7", + "6ed52680f866470da1e8d4a48b6e42fb", + "6d8a206edb824c5eb06c803e8cab14de", + "86fd4472a7a84940a54f24104689a74d", + "916c0e20af5e4b78a5e86532b0c9a3e8", + "62dd475c101e4859a48ee57a272f71bc", + "a8b7185a12c94adca0e63563d7df3ce4", + "47d57186838d466fb91b6666df85d1b4", + "9d37814d818c466c90892bf1f6e9a190", + "b5fdba30791649a792d192a131890a4e", + "789fe6f5489345c6a8b6a889d20e0ca0", + "5ce12a0983bb49f1a871598a6b9a0a13", + "d9eb89d218a44f21bb4447040e5c8925", + "b04aaa7931e74297a55bca3ebf4ded1d", + "837708f48ded4d78b7ad2e0dc6464e9c", + "32236e0d0b3e46e4b2c26b7ccb63c89e", + "499acde0cedf4ea1a90415f98660aaa5", + "840d3e7824944889ac2091b35f0c17c0", + "08f2fae4688b45729d8f5bf53837e56d", + "133bb5607eb0457888b1fe4e8d3fab3e", + "46bfe5feb9074050b556d804a544140d", + "4c3b0c2d04d24ec6abe8acbadb420712", + "eda1fcca6987495b87cf2206f93a0ecb", + "00b803cf92754db1bbea8ca909e5ccef", + "17e17b928555462abfbfa4caf7992427", + "35f90fa89e8842cdaa487b59da45b3e8", + "2887ef88074c4591b710688fa76329bf", + "0a0c5f00b3cc477e8b7e06550fc6f1cb", + "3b079fe81b7b44d796c531bec1754637", + "e82f8ac6e8eb4ed6a6743e10b8b99904", + "1f7de1e2970c4c8fbfe1ab400297e1a7", + "7ea0d8782a1f4cca9a64b95fe47e8a2e", + "689b49d52b8f4efb94f80d76a0fefab3", + "2005939305c442f7bed3b83ea16e13b1", + "1a6f2631e29444818fdbd9a0de265367", + "6bfc89e091a5448d94d2ea559ce43a21", + "bfc12d40caf4481280888506dfa01505", + 
"a1fb82d5761843a49a0993ff937cb40d", + "4c9c567918ee478a817b51e2a204d915", + "305623f276ba45e5a57727d1829158e1", + "b2722e271f78405b9151804ffc522530", + "963435e51a7a4ce98510c0372cd05030", + "d394cfc6af384a39b87c72ac6a3788d9", + "2c621a7a90ed4bfd8b52cea9c79e11c1", + "59ac0bb5c046448fbf16a27d2c3205f8", + "7617f5670879416d9dbc2dabda76ef4d", + "b32d6d6ff5dd4ac4adfb063205111707", + "38f3a7159fc34d89bc18e4225473615d", + "2a2c386e432f429f86c303d71472b480", + "ece25eb325004ae48ec5ec00055dd845", + "68e2b37bbd9a44f8a6032526acbf9ea6", + "3af191957e3f453ba803a1c01d6969ae", + "29dba394a6664e0f8984bcb966ccf19b", + "d84373a3f97245ae94bfb666c7e93a17", + "9f917250ccbf4078a90fda1eec71c6f4", + "8171dd4382d24f0a83484fbf967fec03", + "6f97606a500548e980c6481d756c72eb", + "6d1054047d4645a69c272484fd9e0c04", + "7fd14d942d2246bf8df28eca28e13fb2", + "0dabd208524f426bb5c643791e736413", + "368dea7bbf144cf0a667493cb23bddab", + "d6b14f8e43754283ad96543c4c1ffee6", + "f78562ef15524795bb9be326dcaab502", + "b01c8091b96444f687a49c5c51b5faf7", + "baab647e635a46ababa58993965a8159", + "25d9a9b78d554f8fbe92d7e805640c3b", + "95726f4b9bc34434b9d00fcdfe2ff87e", + "a7b835a668ef40c986a6fd51e464d1f4", + "188cac6192fc4b91be3ca5b01bab1d91", + "3537ef715f3447388625ee606555bb85", + "322ca0ccce644c48a2a0f4b44a38776d", + "cc3726d026594cb6ac2d6bafb16562ac", + "f48cc4a0a5d041cf9391a99353ff46af", + "05134ca3a9954341951ff958ff30fe0a", + "3a6aa623f1dd41b8940a41b509fa7500", + "fd58111bb44347b8bdcb984a0e86f9b7", + "c16cfb96177640a991c5509e652c85b9", + "adc0ffacba0846fabd76ed7955397077", + "e074da8f28d84ec891f22e30b86fb954", + "0b53df078f4a4a259b677ccccbdf46cd", + "954d5fa3b18a49589717cfc31fb58779", + "af0beb46b198458794c85803fe5af47f", + "c7322d41ae4c4068880521a136e923b4", + "391d834aa8734d7b9a97c03cab5e1e7d", + "5d779fc6bb1244449a68cf62dfd15698", + "197ca7f2357a4a2c89f5f3da3844c606", + "df4d22e6876b4c0082a7ace3281ff4e5", + "28d44cfae7de4b62be11020d9015f92c", + "3e8d7274ee3a4dfbbdd44ea0b2cd61b6", + 
"fa768ce193b94a4882a1e796e69cffea", + "c37a4882e4474f8690c4b479baf2d785", + "68a033bbcb4d4774bdb115e09d78365b", + "10b5e7970aa04bd6b3384aa645c48d92", + "f838b073dd254bb091a7db7175cd2ce8" + ] + }, + "id": "lAUAAcEC6ido", + "outputId": "b2983922-5036-4083-8cba-0cb3f51fbc51" + }, + "outputs": [], + "source": [ + "BASE_MODEL = \"meta-llama/Meta-Llama-3.1-8B\"\n", + "\n", + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True, # Reduce the precision to 4 bits\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")\n", + "\n", + "# Load the Tokenizer and the Model\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = \"right\"\n", + "\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " BASE_MODEL,\n", + " quantization_config=quant_config,\n", + " device_map=\"auto\",\n", + ")\n", + "base_model.generation_config.pad_token_id = tokenizer.pad_token_id\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2RJ0G-WRJGMK" + }, + "source": [ + "## 🧪 Load and Evaluate the Fine-Tuned Model with PEFT Adapters" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 20K Sample Fine-Tuned Model" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "f0c0a20172294f77a0306801f8d76fb7", + "f68ee0810c2a4ac087ac6ece5279fb09", + "8aa12b380191454ebf55e8b42d0e0f2b", + "63f6cfa30a274ee3835671d8e39a85ef", + "0b980946a50d4248a4c63ef117fc2e8f", + "18283c6dee9447ddaca34ad267773e48", + "a7d10d9147df4adebf913e3023c2a3a4", + "5886ca455d4d4aefa617478f4f69a3ca", + "8c0e83bce4f74e7ba337fc9af5b977b8", + "00dbc32bdb0440c0bc3ba2cc6677b04c", + "243e6d8479ac4958a8d877e28f9b514a", + 
"10b7df1ecfab4e5cb146932fc4fb2c17", + "07c6fd1fe1ac442dbeb7037161841b78", + "88adf6ab3f3e476fa66ad22e9ff49aa8", + "fe522e9cee55448a9c13a5daaad5e7e7", + "4b1b9e5a67e54a3b90f2c113355e735a", + "5cdbdf93af9344ccabd7c3f236446541", + "c4af3ca6696d4fcd9b831d825456c7fa", + "525b1673c902412db32691056d49fd35", + "42de37b9a74143b4a851a178c484a706", + "f5f42d9201dc4fbaaa9c684fdb748d4a", + "10a0e99256a149a0a94ff652a4fd259a" + ] + }, + "id": "R_O04fKxMMT-", + "outputId": "06fc64f8-3407-460b-e093-0293e958915e" + }, + "outputs": [], + "source": [ + "# Load lisekarimi model (trained on 20K datapoints)\n", + "\n", + "FINETUNED_MODEL = \"lisekarimi/llama3-pricer-2025-04-08_18.44.04-size20000\"\n", + "fine_tuned_model = PeftModel.from_pretrained(base_model, FINETUNED_MODEL)\n", + "print(f\"Memory footprint: {fine_tuned_model.get_memory_footprint() / 1e6:.1f} MB\")\n", + "fine_tuned_model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Je5dR8QEAI1d" + }, + "outputs": [], + "source": [ + "# Gets top 3 predicted tokens from the model\n", + "# Filters valid numeric outputs (prices)\n", + "# Returns a weighted average based on token probabilities\n", + "\n", + "# This code would be more complex if we couldn't take advantage of the fact\n", + "# That Llama generates 1 token for any 3 digit number\n", + "\n", + "top_K = 3\n", + "\n", + "def improved_model_predict(prompt, device=\"cuda\"):\n", + " set_seed(42) # Reproducibility : same prompt = same o/p every time\n", + " inputs = tokenizer.encode(prompt, return_tensors=\"pt\").to(device)\n", + " attention_mask = torch.ones(inputs.shape, device=device)\n", + "\n", + " with torch.no_grad(): # Do not track gradients during inference\n", + " outputs = fine_tuned_model(inputs, attention_mask=attention_mask)\n", + " next_token_logits = outputs.logits[:, -1, :].to('cpu')\n", + "\n", + " next_token_probs = F.softmax(next_token_logits, dim=-1)\n", + " top_prob, top_token_id = next_token_probs.topk(top_K)\n", + 
"\n", + " prices, weights = [], [] # weights = corresponding probabilities\n", + "\n", + " for i in range(top_K):\n", + " predicted_token = tokenizer.decode(top_token_id[0][i])\n", + " probability = top_prob[0][i]\n", + "\n", + " try:\n", + " result = float(predicted_token)\n", + " except ValueError as e:\n", + " result = 0.0\n", + "\n", + " if result > 0:\n", + " prices.append(result)\n", + " weights.append(probability)\n", + "\n", + " if not prices:\n", + " return 0.0, 0.0\n", + "\n", + " total = sum(weights)\n", + "\n", + " weighted_prices = [price * weight / total for price, weight in zip(prices, weights)]\n", + "\n", + " return sum(weighted_prices).item()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "t_GHfTwHXD5f", + "outputId": "056b0fc2-5632-4be8-ee24-b6bcefe14ab9" + }, + "outputs": [], + "source": [ + "improved_model_predict(test[0][\"text\"], device=\"cuda\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 718 + }, + "id": "W_KcLvyt6kbb", + "outputId": "fba4200d-b911-467b-ab3c-17b78aa3b408" + }, + "outputs": [], + "source": [ + "Tester.test(improved_model_predict, test)" + ] + }, + { + "attachments": { + "0dcb25a7-83fa-4313-a94f-d3a56a0f07bc.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![image.png](attachment:0dcb25a7-83fa-4313-a94f-d3a56a0f07bc.png)" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 400K Sample Fine-Tuned Model" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "dd1b57e03f2641d3b702f2cc66942b8f", + "e1d477dccbfc44a8a6da301486180e82", + "c312a5111a284c3db88f22290869c023", + 
"ce118d8b8146497f9c7fdd3b38188e72", + "bc46c271637341bb82d6b87df22ab2af", + "602adf3242f54731938b68d3cf68465e", + "39fae5e74834421795729a259a046fb8", + "0618d8626e2e46cb9a17f86444de3c48", + "1cd43b5b2fe445088c84e19773ad861e", + "f70a29870ab34f34a1900b2df2bf177e", + "41a96c5e35a44b898b872c189f531d3a", + "0a524a73d5d6478db81256371bf2bc9b", + "275f6179dc624bceaa5d0639fe0b1b00", + "79c41b26746344bc9a220f2376360110", + "287a6430766c44e5a71dda1048fa2a2c", + "3bbe1a454a854747a96fe83e91d6cb3c", + "8a93759afe21414fb0d6684f0a591d60", + "a3d76b3ce67a495db861bac80cfc0864", + "8fc794262ed14fc785c8f06e734c57d4", + "7dc967baa0e7427bb66cf3e26849d508", + "2d7a6dbd15304347a37dbfb6e5ec7203", + "288393e05947444bad11034071015baf" + ] + }, + "id": "Kl6n_0sAbU0g", + "outputId": "2fb53efb-da22-4c29-a594-c2cf5a079388" + }, + "outputs": [], + "source": [ + "FINETUNED_MODEL = \"ed-donner/pricer-2024-09-13_13.04.39\"\n", + "REVISION = \"e8d637df551603dc86cd7a1598a8f44af4d7ae36\"\n", + "fine_tuned_model = PeftModel.from_pretrained(base_model, FINETUNED_MODEL, revision=REVISION)\n", + "print(f\"Memory footprint: {fine_tuned_model.get_memory_footprint() / 1e6:.1f} MB\")\n", + "fine_tuned_model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 718 + }, + "id": "R0YlorBhbeSE", + "outputId": "f42de9bf-d45a-4d2d-c218-fe000d716e54" + }, + "outputs": [], + "source": [ + "Tester.test(improved_model_predict, test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "🎉 And there it is — the open-source, quantized, and fine-tuned model outperforms the rest. 
🙌 \n", + "\n", + "📘 We'll continue in [the next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part8_summary.ipynb) with a final wrap-up and summary of key insights.\n" + ], + "outputs": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/week7/community_contributions/lisekarimi/09_part8_summary.ipynb b/week7/community_contributions/lisekarimi/09_part8_summary.ipynb new file mode 100644 index 0000000..f7983a4 --- /dev/null +++ b/week7/community_contributions/lisekarimi/09_part8_summary.ipynb @@ -0,0 +1,75 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "GHsssBgWM_l0" + }, + "source": [ + "# 🔍 Predicting Item Prices from Descriptions (Part 8)\n", + "---\n", + "- Data Curation & Preprocessing\n", + "- Model Benchmarking – Traditional ML vs LLMs\n", + "- E5 Embeddings & RAG\n", + "- Fine-Tuning GPT-4o Mini\n", + "- Evaluating LLaMA 3.1 8B Quantized\n", + "- Fine-Tuning LLaMA 3.1 with QLoRA\n", + "- Evaluating Fine-Tuned LLaMA\n", + "- ➡️ Summary & Leaderboard\n", + "\n", + "---\n", + "\n", + "# 🧪 Part 8: Summary & Leaderboard\n", + "\n", + "![](https://github.com/lisekarimi/lexo/blob/main/assets/09_ft_leaderboard.png?raw=true)\n", + "\n", + "# 🥇 The winner is the LLaMA 3.1 8B (4-bit) fine-tuned on 400K samples \n", + "\n", + "LLaMA 3.1 8B (4-bit) fine-tuned on 400K samples is outperforming even the big guy GPT-4o — with the lowest error and highest accuracy (75.6%).\n", + "\n", + "RAG + GPT-4o Mini also did 
well, proving that retrieval adds real value.\n", + "\n", + "On the other hand, traditional ML models and even human guesses, gave weaker results and fell behind the top models.\n", + "\n", + "💡 As we’ve seen, a **well-tuned open-source small model** can do amazing things on a focused task — sometimes even better than large, closed models.\n", + "It’s not about size — it’s about fit, focus, and fine-tuning.\n", + "\n", + "# ✨ Conclusion\n", + "What a journey! From classic ML to state-of-the-art LLMs, from embeddings to retrieval and fine-tuning — we explored it all to answer: who predicts prices best?\n", + "\n", + "Thanks for following along — see you in the next challenge! 🚀\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/Build _UI.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Build _UI.ipynb new file mode 100644 index 0000000..9c3aba0 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Build _UI.ipynb @@ -0,0 +1,181 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a71ed017-e1b0-4299-88b3-f0eb05adc4df", + "metadata": {}, + "source": [ + "# Build UI\n", + "\n", + "We will use more advanced aspects of Gradio - building piece by piece." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "614c6202-4575-448d-98ee-78b735775d2b", + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "from deal_agent_framework import DealAgentFramework\n", + "from agents.deals import Opportunity, Deal" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0534e714-5a9c-45c6-998c-3472ac0bb8b5", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks(title=\"Deal Intel\", fill_width=True) as ui:\n", + "\n", + " with gr.Row():\n", + " gr.Markdown('
Deal Intel - Deal Hunting Agentic AI
')\n", + " with gr.Row():\n", + " gr.Markdown('
Autonomous agent framework that finds online deals, collaborating with a proprietary fine-tuned LLM deployed on Modal, and a RAG pipeline with a frontier model and Chroma.
')\n", + " \n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18c12c10-750c-4da3-8df5-f2bc3393f9e0", + "metadata": {}, + "outputs": [], + "source": [ + "# Updated to change from height to max_height due to change in Gradio v5\n", + "# With much thanks to student Ed B. for raising this\n", + "\n", + "with gr.Blocks(title=\"Deal Intel\", fill_width=True) as ui:\n", + "\n", + " initial_deal = Deal(product_description=\"Example description\", price=100.0, url=\"https://cnn.com\")\n", + " initial_opportunity = Opportunity(deal=initial_deal, estimate=200.0, discount=100.0)\n", + " opportunities = gr.State([initial_opportunity])\n", + "\n", + " def get_table(opps):\n", + " return [[opp.deal.product_description, opp.deal.price, opp.estimate, opp.discount, opp.deal.url] for opp in opps]\n", + "\n", + " with gr.Row():\n", + " gr.Markdown('
\"Deal Intel\" - Deal Hunting Agentic AI
')\n", + " with gr.Row():\n", + " gr.Markdown('
Deals surfaced so far:
')\n", + " with gr.Row():\n", + " opportunities_dataframe = gr.Dataframe(\n", + " headers=[\"Description\", \"Price\", \"Estimate\", \"Discount\", \"URL\"],\n", + " wrap=True,\n", + " column_widths=[4, 1, 1, 1, 2],\n", + " row_count=10,\n", + " col_count=5,\n", + " max_height=400,\n", + " )\n", + "\n", + " ui.load(get_table, inputs=[opportunities], outputs=[opportunities_dataframe])\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87106328-a17a-447e-90b9-c547613468da", + "metadata": {}, + "outputs": [], + "source": [ + "agent_framework = DealAgentFramework()\n", + "agent_framework.init_agents_as_needed()\n", + "\n", + "with gr.Blocks(title=\"Deal Intel\", fill_width=True) as ui:\n", + "\n", + " initial_deal = Deal(product_description=\"Example description\", price=100.0, url=\"https://cnn.com\")\n", + " initial_opportunity = Opportunity(deal=initial_deal, estimate=200.0, discount=100.0)\n", + " opportunities = gr.State([initial_opportunity])\n", + "\n", + " def get_table(opps):\n", + " return [[opp.deal.product_description, opp.deal.price, opp.estimate, opp.discount, opp.deal.url] for opp in opps]\n", + "\n", + " def do_select(opportunities, selected_index: gr.SelectData):\n", + " row = selected_index.index[0]\n", + " opportunity = opportunities[row]\n", + " agent_framework.planner.messenger.alert(opportunity)\n", + "\n", + " with gr.Row():\n", + " gr.Markdown('
\"Deal Intel\" - Deal Hunting Agentic AI
')\n", + " with gr.Row():\n", + " gr.Markdown('
Deals surfaced so far:
')\n", + " with gr.Row():\n", + " opportunities_dataframe = gr.Dataframe(\n", + " headers=[\"Description\", \"Price\", \"Estimate\", \"Discount\", \"URL\"],\n", + " wrap=True,\n", + " column_widths=[4, 1, 1, 1, 2],\n", + " row_count=10,\n", + " col_count=5,\n", + " max_height=400,\n", + " )\n", + "\n", + " ui.load(get_table, inputs=[opportunities], outputs=[opportunities_dataframe])\n", + " opportunities_dataframe.select(do_select, inputs=[opportunities], outputs=[])\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48506465-1c7a-433f-a665-b277a8b4665c", + "metadata": {}, + "outputs": [], + "source": [ + "!python price_is_right_final.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9dd0a27-7d46-4c9e-bbe4-a61c9c899c99", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d1504cb8-7bf7-4dc4-9b1a-eaba79404aac", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3ed84afd-4a04-43d6-8a3b-5143deaf96b2", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Build_Messaging_Planning_Agent.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Build_Messaging_Planning_Agent.ipynb new file mode 100644 index 0000000..649a77d --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Build_Messaging_Planning_Agent.ipynb @@ -0,0 +1,119 @@ +{ + "cells": 
[ + { + "cell_type": "markdown", + "id": "23f53670-1a73-46ba-a754-4a497e8e0e64", + "metadata": {}, + "source": [ + "# Messaging Agent and Planning Agent\n", + "\n", + "Then we'll put it all together into an Agent Framework." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80d683d9-9e92-44ae-af87-a413ca84db21", + "metadata": {}, + "outputs": [], + "source": [ + "from dotenv import load_dotenv\n", + "from agents.messaging_agent import MessagingAgent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ba769cc-5301-4810-b01f-cab584cfb3b3", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "DB = \"products_vectorstore\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e05cc427-3d2c-4792-ade1-d356f95a82a9", + "metadata": {}, + "outputs": [], + "source": [ + "agent = MessagingAgent()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ec518f5-dae4-44b1-a185-d7eaf853ec00", + "metadata": {}, + "outputs": [], + "source": [ + "agent.push(\"MASSIVE NEWS!!!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57b3a014-0b15-425a-a29b-6fefc5006dee", + "metadata": {}, + "outputs": [], + "source": [ + "import chromadb\n", + "DB = \"products_vectorstore\"\n", + "client = chromadb.PersistentClient(path=DB)\n", + "collection = client.get_or_create_collection('products')\n", + "from agents.planning_agent import PlanningAgent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5c31c39-e357-446e-9cec-b4775c298941", + "metadata": {}, + "outputs": [], + "source": [ + "planner = PlanningAgent(collection)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d9ac771b-ea12-41c0-a7ce-05f12e27ad9e", + "metadata": {}, + "outputs": [], + "source": [ + "planner.plan()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d91ac0bb-738e-4be5-9074-d583190b1e2a", + "metadata": {}, + "outputs": 
[], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Build_RAG_Frontier_Agent.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Build_RAG_Frontier_Agent.ipynb new file mode 100644 index 0000000..69c6f58 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Build_RAG_Frontier_Agent.ipynb @@ -0,0 +1,342 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "fbcdfea8-7241-46d7-a771-c0381a3e7063", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "import math\n", + "import json\n", + "from tqdm import tqdm\n", + "import random\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pickle\n", + "from openai import OpenAI\n", + "from sentence_transformers import SentenceTransformer\n", + "from datasets import load_dataset\n", + "import chromadb\n", + "from items import Item\n", + "from testing import Tester" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98666e73-938e-469d-8987-e6e55ba5e034", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"9a25a5cf-8f6c-4b5d-ad98-fdd096f5adf8", + "metadata": {}, + "outputs": [], + "source": [ + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc696493-0b6f-48aa-9fa8-b1ae0ecaf3cd", + "metadata": {}, + "outputs": [], + "source": [ + "# Load in the test pickle file\n", + "with open('test.pkl', 'rb') as file:\n", + " test = pickle.load(file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33d38a06-0c0d-4e96-94d1-35ee183416ce", + "metadata": {}, + "outputs": [], + "source": [ + "def make_context(similars, prices):\n", + " message = \"To provide some context, here are some other items that might be similar to the item you need to estimate.\\n\\n\"\n", + " for similar, price in zip(similars, prices):\n", + " message += f\"Potentially related product:\\n{similar}\\nPrice is ${price:.2f}\\n\\n\"\n", + " return message" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "61f203b7-63b6-48ed-869b-e393b5bfcad3", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(item, similars, prices):\n", + " system_message = \"You estimate prices of items. Reply only with the price, no explanation. 
Price is always below $1000.\"\n", + " user_prompt = make_context(similars, prices)\n", + " user_prompt += \"And now the question for you:\\n\\n\"\n", + " user_prompt += item.test_prompt().replace(\" to the nearest dollar\",\"\").replace(\"\\n\\nPrice is $\",\"\")\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $\"}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b26f405d-6e1f-4caa-b97f-1f62cd9d1ebc", + "metadata": {}, + "outputs": [], + "source": [ + "DB = \"products_vectorstore\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d26a1104-cd11-4361-ab25-85fb576e0582", + "metadata": {}, + "outputs": [], + "source": [ + "client = chromadb.PersistentClient(path=DB)\n", + "collection = client.get_or_create_collection('products')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e339760-96d8-4485-bec7-43fadcd30c4d", + "metadata": {}, + "outputs": [], + "source": [ + "def description(item):\n", + " text = item.prompt.replace(\"How much does this cost to the nearest dollar?\\n\\n\", \"\")\n", + " return text.split(\"\\n\\nPrice is $\")[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f759bd2-7a7e-4c1a-80a0-e12470feca89", + "metadata": {}, + "outputs": [], + "source": [ + "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e44dbd25-fb95-4b6b-bbbb-8da5fc817105", + "metadata": {}, + "outputs": [], + "source": [ + "def vector(item):\n", + " return model.encode([description(item)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ffd5ee47-db5d-4263-b0d9-80d568c91341", + "metadata": {}, + "outputs": [], + "source": [ + "def find_similars(item):\n", + " results = 
collection.query(query_embeddings=vector(item).astype(float).tolist(), n_results=5)\n", + " documents = results['documents'][0][:]\n", + " prices = [m['price'] for m in results['metadatas'][0][:]]\n", + " return documents, prices" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f7b9ff9-fd90-4627-bb17-7c2f7bbd21f3", + "metadata": {}, + "outputs": [], + "source": [ + "print(test[1].prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff1b2659-cc6b-47aa-a797-dd1cd3d1d6c3", + "metadata": {}, + "outputs": [], + "source": [ + "documents, prices = find_similars(test[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24756d4d-edac-41ce-bb80-c3b6f1cea7ee", + "metadata": {}, + "outputs": [], + "source": [ + "print(make_context(documents, prices))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b81eca2-0b58-4fe8-9dd6-47f13ba5f8ee", + "metadata": {}, + "outputs": [], + "source": [ + "print(messages_for(test[1], documents, prices))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d11f1c8d-7480-4d64-a274-b030d701f1b8", + "metadata": {}, + "outputs": [], + "source": [ + "def get_price(s):\n", + " s = s.replace('$','').replace(',','')\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n", + " return float(match.group()) if match else 0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06743833-c362-47f8-b02a-139be2cd52ab", + "metadata": {}, + "outputs": [], + "source": [ + "get_price(\"The price for this is $99.99\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a919cf7d-b3d3-4968-8c96-54a0da0b0219", + "metadata": {}, + "outputs": [], + "source": [ + "# The function for gpt-4o-mini\n", + "\n", + "def gpt_4o_mini_rag(item):\n", + " documents, prices = find_similars(item)\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\", \n", + " messages=messages_for(item, documents, 
prices),\n", + " seed=42,\n", + " max_tokens=5\n", + " )\n", + " reply = response.choices[0].message.content\n", + " return get_price(reply)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b42e1b9-eaa0-4b45-a847-e8932367f596", + "metadata": {}, + "outputs": [], + "source": [ + "# The function for gpt-4.1\n", + "\n", + "# def gpt_4_1_rag(item):\n", + "# documents, prices = find_similars(item)\n", + "# response = openai.chat.completions.create(\n", + "# model=\"gpt-4.1\", \n", + "# messages=messages_for(item, documents, prices),\n", + "# seed=42,\n", + "# max_tokens=5\n", + "# )\n", + "# reply = response.choices[0].message.content\n", + "# return get_price(reply)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3e519e26-ff15-4425-90bb-bfbf55deb39b", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_4o_mini_rag(test[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "082c6a5a-0f2a-4941-a465-ffb3137a2e8d", + "metadata": {}, + "outputs": [], + "source": [ + "# gpt_4_1_rag(test[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce78741b-2966-41d2-9831-cbf8f8d176be", + "metadata": {}, + "outputs": [], + "source": [ + "test[1].price" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16d90455-ff7d-4f5f-8b8c-8e061263d1c7", + "metadata": {}, + "outputs": [], + "source": [ + "Tester.test(gpt_4o_mini_rag, test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26d5ddc6-baa6-4760-a430-05671847ac47", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + 
} + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Build_RF_XGB_Ensemble.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Build_RF_XGB_Ensemble.ipynb new file mode 100644 index 0000000..a582e92 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Build_RF_XGB_Ensemble.ipynb @@ -0,0 +1,2137 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "40d49349-faaa-420c-9b65-0bdc9edfabce", + "metadata": {}, + "source": [ + "## Random Forests, XGBoost & Ensemble" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "fbcdfea8-7241-46d7-a771-c0381a3e7063", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "import math\n", + "import json\n", + "from tqdm import tqdm\n", + "import random\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "import numpy as np\n", + "import pickle\n", + "from openai import OpenAI\n", + "from sentence_transformers import SentenceTransformer\n", + "from datasets import load_dataset\n", + "import chromadb\n", + "from items import Item\n", + "from testing import Tester\n", + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.ensemble import GradientBoostingRegressor\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error, r2_score\n", + "import joblib\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "e6e88bd1-f89c-4b98-92fa-aa4bc1575bca", + "metadata": {}, + "outputs": [], + "source": [ + "# CONSTANTS\n", + "QUESTION = \"How much does this cost to the nearest dollar?\\n\\n\"\n", + "DB = \"products_vectorstore\"" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "98666e73-938e-469d-8987-e6e55ba5e034", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + 
"load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "dc696493-0b6f-48aa-9fa8-b1ae0ecaf3cd", + "metadata": {}, + "outputs": [], + "source": [ + "# Load in the test pickle file:\n", + "with open('test.pkl', 'rb') as file:\n", + " test = pickle.load(file)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "d26a1104-cd11-4361-ab25-85fb576e0582", + "metadata": {}, + "outputs": [], + "source": [ + "client = chromadb.PersistentClient(path=DB)\n", + "collection = client.get_or_create_collection('products')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e00b82a9-a8dc-46f1-8ea9-2f07cbc8e60d", + "metadata": {}, + "outputs": [], + "source": [ + "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "prices = [metadata['price'] for metadata in result['metadatas']]" + ] + }, + { + "cell_type": "markdown", + "id": "bf6492cb-b11a-4ad5-859b-a71a78ffb949", + "metadata": {}, + "source": [ + "# Random Forest\n", + "\n", + "We will now train a Random Forest model.\n", + "\n", + "Using the vectors we already have in Chroma, from the SentenceTransformer model." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "48894777-101f-4fe5-998c-47079407f340", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "RandomForestRegressor(n_jobs=-1, random_state=42)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)\n", + "rf_model.fit(vectors, prices)" + ] + }, + { + "cell_type": "markdown", + "id": "b1b1f080-0ab2-4b7f-8eb9-1d56d048a78d", + "metadata": {}, + "source": [ + "# Gradient Boosting" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fb9fdec7-11f9-48ef-8a31-430b2fccc400", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
GradientBoostingRegressor(random_state=42)
" + ], + "text/plain": [ + "GradientBoostingRegressor(random_state=42)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gb_model = GradientBoostingRegressor(n_estimators=100, random_state=42)\n", + "gb_model.fit(vectors, prices)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62eb7ddf-e1da-481e-84c6-1256547566bd", + "metadata": {}, + "outputs": [], + "source": [ + "# Save the model to a file\n", + "\n", + "joblib.dump(rf_model, 'random_forest_model.pkl')\n", + "joblib.dump(gb_model, 'gradient_boosting_model.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "d281dc5e-761e-4a5e-86b3-29d9c0a33d4a", + "metadata": {}, + "outputs": [], + "source": [ + "# Load it back in again\n", + "\n", + "rf_model = joblib.load('random_forest_model.pkl')\n", + "gb_model = joblib.load('gradient_boosting_model.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5d438dec-8e5b-4e60-bb6f-c3f82e522dd9", + "metadata": {}, + "outputs": [], + "source": [ + "from agents.specialist_agent import SpecialistAgent\n", + "from agents.frontier_agent import FrontierAgent\n", + "from agents.random_forest_agent import RandomForestAgent\n", + "from agents.gradient_boosting_agent import GradientBoostingAgent" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "afc39369-b97b-4a90-b17e-b20ef501d3c9", + "metadata": {}, + "outputs": [], + "source": [ + "specialist = SpecialistAgent()\n", + "frontier = FrontierAgent(collection)\n", + "random_forest = RandomForestAgent()\n", + "gradient_boosting = GradientBoostingAgent()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8e2d0d0a-8bb8-4b39-b046-322828c39244", + "metadata": {}, + "outputs": [], + "source": [ + "def description(item):\n", + " return item.prompt.split(\"to the nearest dollar?\\n\\n\")[1].split(\"\\n\\nPrice is $\")[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, 
+ "id": "bfe0434f-b29e-4cc0-bad9-b07624665727", + "metadata": {}, + "outputs": [], + "source": [ + "def rf(item):\n", + " return random_forest.price(description(item))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "cdf233ec-264f-4b34-9f2b-27c39692137b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[93m1: Guess: $291.59 Truth: $374.41 Error: $82.82 SLE: 0.06 Item: OEM AC Compressor w/A/C Repair Kit For F...\u001b[0m\n", + "\u001b[92m2: Guess: $204.82 Truth: $225.11 Error: $20.29 SLE: 0.01 Item: Motorcraft YB3125 Fan Clutch\u001b[0m\n", + "\u001b[91m3: Guess: $206.03 Truth: $61.68 Error: $144.35 SLE: 1.43 Item: Dorman 603-159 Front Washer Fluid Reserv...\u001b[0m\n", + "\u001b[93m4: Guess: $364.85 Truth: $599.99 Error: $235.14 SLE: 0.25 Item: HP Premium 17.3-inch HD Plus Touchscreen...\u001b[0m\n", + "\u001b[91m5: Guess: $219.07 Truth: $16.99 Error: $202.08 SLE: 6.27 Item: 5-Position Super Switch Pickup Selector ...\u001b[0m\n", + "\u001b[92m6: Guess: $57.33 Truth: $31.99 Error: $25.34 SLE: 0.32 Item: Horror Bookmarks, Resin Horror Bookmarks...\u001b[0m\n", + "\u001b[91m7: Guess: $272.17 Truth: $101.79 Error: $170.38 SLE: 0.96 Item: SK6241 - Stinger 4 Gauge 6000 Series Pow...\u001b[0m\n", + "\u001b[92m8: Guess: $235.98 Truth: $289.00 Error: $53.02 SLE: 0.04 Item: Godox ML60Bi LED Light Kit, Handheld LED...\u001b[0m\n", + "\u001b[91m9: Guess: $316.78 Truth: $635.86 Error: $319.08 SLE: 0.48 Item: Randall RG75DG3PLUS G3 Plus 100-Watt Com...\u001b[0m\n", + "\u001b[91m10: Guess: $175.79 Truth: $65.99 Error: $109.80 SLE: 0.94 Item: HOLDWILL 6 Pack LED Shop Light, 4FT 24W ...\u001b[0m\n", + "\u001b[92m11: Guess: $262.45 Truth: $254.21 Error: $8.24 SLE: 0.00 Item: Viking Horns V103C/1005ATK 3 Gallon Air ...\u001b[0m\n", + "\u001b[93m12: Guess: $253.81 Truth: $412.99 Error: $159.18 SLE: 0.24 Item: CURT 70110 Custom Tow Bar Base Plate Bra...\u001b[0m\n", + "\u001b[92m13: Guess: $174.96 Truth: 
$205.50 Error: $30.54 SLE: 0.03 Item: 10-Pack Solar HAMMERED BRONZE Finish Pos...\u001b[0m\n", + "\u001b[92m14: Guess: $279.61 Truth: $248.23 Error: $31.38 SLE: 0.01 Item: COSTWAY Electric Tumble Dryer, Sliver\u001b[0m\n", + "\u001b[93m15: Guess: $305.92 Truth: $399.00 Error: $93.08 SLE: 0.07 Item: FREE SIGNAL TV Transit 32\" 12 Volt DC Po...\u001b[0m\n", + "\u001b[92m16: Guess: $339.52 Truth: $373.94 Error: $34.42 SLE: 0.01 Item: Bilstein 5100 Monotube Gas Shock Set com...\u001b[0m\n", + "\u001b[91m17: Guess: $237.41 Truth: $92.89 Error: $144.52 SLE: 0.87 Item: Sangean K-200 Multi-Function Upright AM/...\u001b[0m\n", + "\u001b[93m18: Guess: $112.32 Truth: $51.99 Error: $60.33 SLE: 0.58 Item: Charles Leonard Magnetic Lapboard Class ...\u001b[0m\n", + "\u001b[91m19: Guess: $425.61 Truth: $179.00 Error: $246.61 SLE: 0.74 Item: Gigabyte AMD Radeon HD 7870 2 GB GDDR5 D...\u001b[0m\n", + "\u001b[93m20: Guess: $75.46 Truth: $19.42 Error: $56.04 SLE: 1.74 Item: 3dRose LLC 8 x 8 x 0.25 Inches Bull Terr...\u001b[0m\n", + "\u001b[93m21: Guess: $332.77 Truth: $539.95 Error: $207.18 SLE: 0.23 Item: ROKINON 85mm F1.4 Auto Focus Full Frame ...\u001b[0m\n", + "\u001b[93m22: Guess: $214.54 Truth: $147.67 Error: $66.87 SLE: 0.14 Item: AUTOSAVER88 Headlight Assembly Compatibl...\u001b[0m\n", + "\u001b[91m23: Guess: $179.36 Truth: $24.99 Error: $154.37 SLE: 3.75 Item: ASI NAUTICAL 2.5 Inches Opera Glasses Bi...\u001b[0m\n", + "\u001b[91m24: Guess: $283.14 Truth: $149.00 Error: $134.14 SLE: 0.41 Item: Behringer TUBE OVERDRIVE TO100 Authentic...\u001b[0m\n", + "\u001b[92m25: Guess: $49.14 Truth: $16.99 Error: $32.15 SLE: 1.05 Item: Fun Express Insect Finger Puppets - 24 f...\u001b[0m\n", + "\u001b[93m26: Guess: $75.03 Truth: $7.99 Error: $67.04 SLE: 4.56 Item: WAFJAMF Roller Stamp Identity Theft Stam...\u001b[0m\n", + "\u001b[92m27: Guess: $186.19 Truth: $199.99 Error: $13.80 SLE: 0.01 Item: Capulina Tiffany Floor Lamp 2-Light 16\" ...\u001b[0m\n", + "\u001b[92m28: Guess: $273.60 
Truth: $251.45 Error: $22.15 SLE: 0.01 Item: Apple Watch Series 6 (GPS, 44mm) - Space...\u001b[0m\n", + "\u001b[92m29: Guess: $232.49 Truth: $231.62 Error: $0.87 SLE: 0.00 Item: ICON 01725 Tandem Axle Fender Skirt FS17...\u001b[0m\n", + "\u001b[91m30: Guess: $226.25 Truth: $135.00 Error: $91.25 SLE: 0.26 Item: SanDisk 128GB Ultra (10 Pack) MicroSD Cl...\u001b[0m\n", + "\u001b[92m31: Guess: $301.87 Truth: $356.62 Error: $54.75 SLE: 0.03 Item: Velvac 2020,L,C/Hr,W,E2003,102\",Bk - 715...\u001b[0m\n", + "\u001b[92m32: Guess: $233.55 Truth: $257.99 Error: $24.44 SLE: 0.01 Item: TCMT Passenger Backrest Sissy Bar & Lugg...\u001b[0m\n", + "\u001b[91m33: Guess: $169.42 Truth: $27.99 Error: $141.43 SLE: 3.14 Item: Alnicov 63.5MM Brass Tremolo Block,Tremo...\u001b[0m\n", + "\u001b[91m34: Guess: $266.18 Truth: $171.20 Error: $94.98 SLE: 0.19 Item: Subaru Forester Outback Legacy OEM Engin...\u001b[0m\n", + "\u001b[92m35: Guess: $219.00 Truth: $225.00 Error: $6.00 SLE: 0.00 Item: Richmond Auto Upholstery - 2012 Dodge Ra...\u001b[0m\n", + "\u001b[91m36: Guess: $202.91 Truth: $105.00 Error: $97.91 SLE: 0.43 Item: AP-39 Automotive Paint Primer Grey 2K Ur...\u001b[0m\n", + "\u001b[93m37: Guess: $220.41 Truth: $299.99 Error: $79.58 SLE: 0.09 Item: Road Top Wireless Carplay Retrofit Kit D...\u001b[0m\n", + "\u001b[93m38: Guess: $658.90 Truth: $535.09 Error: $123.81 SLE: 0.04 Item: Gibson Performance Exhaust 5658 Aluminiz...\u001b[0m\n", + "\u001b[91m39: Guess: $129.36 Truth: $12.33 Error: $117.03 SLE: 5.20 Item: Bella Tunno Happy Links - Baby Montessor...\u001b[0m\n", + "\u001b[91m40: Guess: $183.61 Truth: $84.99 Error: $98.62 SLE: 0.58 Item: CANMORE H300 Handheld GPS Golf Device, S...\u001b[0m\n", + "\u001b[91m41: Guess: $110.24 Truth: $15.99 Error: $94.25 SLE: 3.53 Item: DCPOWER AC Adapter Compatible Replacemen...\u001b[0m\n", + "\u001b[91m42: Guess: $180.79 Truth: $62.44 Error: $118.35 SLE: 1.11 Item: Sharp, VX2128V, Commercial Desktop Calcu...\u001b[0m\n", + "\u001b[93m43: Guess: 
$132.13 Truth: $82.99 Error: $49.14 SLE: 0.21 Item: Melissa & Doug Lifelike Plush Stork Gian...\u001b[0m\n", + "\u001b[91m44: Guess: $311.71 Truth: $599.95 Error: $288.24 SLE: 0.43 Item: Sony SSCS8 2-Way 3-Driver Center Channel...\u001b[0m\n", + "\u001b[91m45: Guess: $277.69 Truth: $194.99 Error: $82.70 SLE: 0.12 Item: ASUS Chromebook CX1, 14\" Full HD NanoEdg...\u001b[0m\n", + "\u001b[92m46: Guess: $317.88 Truth: $344.95 Error: $27.07 SLE: 0.01 Item: FiiO X7 32GB Hi-Res Lossless Music Playe...\u001b[0m\n", + "\u001b[92m47: Guess: $46.17 Truth: $37.99 Error: $8.18 SLE: 0.04 Item: TORRO Leather Case Compatible with iPhon...\u001b[0m\n", + "\u001b[92m48: Guess: $228.40 Truth: $224.35 Error: $4.05 SLE: 0.00 Item: Universal Air Conditioner KT 1031 A/C Co...\u001b[0m\n", + "\u001b[92m49: Guess: $747.66 Truth: $814.00 Error: $66.34 SLE: 0.01 Item: Street Series Stainless Performance Cat-...\u001b[0m\n", + "\u001b[92m50: Guess: $511.14 Truth: $439.88 Error: $71.26 SLE: 0.02 Item: Lenovo IdeaPad 3 14-inch Laptop, 14.0-in...\u001b[0m\n", + "\u001b[92m51: Guess: $310.99 Truth: $341.43 Error: $30.44 SLE: 0.01 Item: Access Bed Covers TonnoSport 22050219 - ...\u001b[0m\n", + "\u001b[91m52: Guess: $200.34 Truth: $46.78 Error: $153.56 SLE: 2.07 Item: G.I. 
JOE Hasbro 3 3/4\" Wave 5 Action Fig...\u001b[0m\n", + "\u001b[92m53: Guess: $146.38 Truth: $171.44 Error: $25.06 SLE: 0.02 Item: T&S Brass B-0232-BST Double Pantry Fauce...\u001b[0m\n", + "\u001b[91m54: Guess: $261.18 Truth: $458.00 Error: $196.82 SLE: 0.31 Item: ZTUOAUMA Fuel Injection Pump 3090942 309...\u001b[0m\n", + "\u001b[91m55: Guess: $225.15 Truth: $130.75 Error: $94.40 SLE: 0.29 Item: 2AP18AA#ABA Hp Prime Graphing Calculator...\u001b[0m\n", + "\u001b[91m56: Guess: $186.04 Truth: $83.81 Error: $102.23 SLE: 0.63 Item: Lowrance 000-0119-83 Nmea 2000 25' Exten...\u001b[0m\n", + "\u001b[91m57: Guess: $169.97 Truth: $386.39 Error: $216.42 SLE: 0.67 Item: Jeep Genuine Accessories 82213051 Hood L...\u001b[0m\n", + "\u001b[93m58: Guess: $209.56 Truth: $169.00 Error: $40.56 SLE: 0.05 Item: GODOX CB-06 Hard Carrying Case with Whee...\u001b[0m\n", + "\u001b[91m59: Guess: $146.74 Truth: $17.95 Error: $128.79 SLE: 4.22 Item: Au-Tomotive Gold, INC. Ford Black Valet ...\u001b[0m\n", + "\u001b[92m60: Guess: $222.65 Truth: $269.00 Error: $46.35 SLE: 0.04 Item: Snailfly Black Roof Rack Rail + Cross Ba...\u001b[0m\n", + "\u001b[91m61: Guess: $202.16 Truth: $77.77 Error: $124.39 SLE: 0.90 Item: KING SHA Anti Glare LED Track Lighting H...\u001b[0m\n", + "\u001b[91m62: Guess: $181.11 Truth: $88.99 Error: $92.12 SLE: 0.50 Item: APS Compatible with Chevy Silverado 1500...\u001b[0m\n", + "\u001b[93m63: Guess: $233.73 Truth: $364.41 Error: $130.68 SLE: 0.20 Item: Wilwood Engineering 14011291R Brake Cali...\u001b[0m\n", + "\u001b[92m64: Guess: $134.65 Truth: $127.03 Error: $7.62 SLE: 0.00 Item: ACDelco Gold 336-1925A Starter, Remanufa...\u001b[0m\n", + "\u001b[92m65: Guess: $636.90 Truth: $778.95 Error: $142.05 SLE: 0.04 Item: UWS EC10783 69-Inch Matte Black Heavy-Wa...\u001b[0m\n", + "\u001b[91m66: Guess: $452.42 Truth: $206.66 Error: $245.76 SLE: 0.61 Item: Dell Latitude E5440 14in Business Laptop...\u001b[0m\n", + "\u001b[91m67: Guess: $187.75 Truth: $35.94 Error: $151.81 SLE: 
2.66 Item: (Plug and Play) Spare Tire Brake Light W...\u001b[0m\n", + "\u001b[91m68: Guess: $249.13 Truth: $149.00 Error: $100.13 SLE: 0.26 Item: The Ultimate Roadside Rescue Assistant\u001b[0m\n", + "\u001b[92m69: Guess: $224.20 Truth: $251.98 Error: $27.78 SLE: 0.01 Item: Brand New 18\" x 8.5\" Replacement Wheel f...\u001b[0m\n", + "\u001b[93m70: Guess: $234.18 Truth: $160.00 Error: $74.18 SLE: 0.14 Item: Headlight Headlamp LH Left & RH Right Pa...\u001b[0m\n", + "\u001b[91m71: Guess: $122.29 Truth: $39.99 Error: $82.30 SLE: 1.21 Item: Lilo And Stitch Deluxe Oversize Print La...\u001b[0m\n", + "\u001b[93m72: Guess: $250.75 Truth: $362.41 Error: $111.66 SLE: 0.13 Item: AC Compressor & A/C Clutch For Hyundai A...\u001b[0m\n", + "\u001b[93m73: Guess: $238.01 Truth: $344.00 Error: $105.99 SLE: 0.13 Item: House Of Troy PIN475-AB Pinnacle Collect...\u001b[0m\n", + "\u001b[91m74: Guess: $209.07 Truth: $25.09 Error: $183.98 SLE: 4.35 Item: Juno T29 WH Floating Electrical Feed Sin...\u001b[0m\n", + "\u001b[92m75: Guess: $171.57 Truth: $175.95 Error: $4.38 SLE: 0.00 Item: Sherman GO-PARTS - for 2013-2016 Toyota ...\u001b[0m\n", + "\u001b[91m76: Guess: $253.67 Truth: $132.64 Error: $121.03 SLE: 0.42 Item: Roland RPU-3 Electronic Keyboard Pedal o...\u001b[0m\n", + "\u001b[93m77: Guess: $280.83 Truth: $422.99 Error: $142.16 SLE: 0.17 Item: Rockland VMI14 12,000 Pound 12 Volt DC E...\u001b[0m\n", + "\u001b[92m78: Guess: $167.75 Truth: $146.48 Error: $21.27 SLE: 0.02 Item: Max Advanced Brakes Elite XDS Front Cros...\u001b[0m\n", + "\u001b[93m79: Guess: $197.53 Truth: $156.83 Error: $40.70 SLE: 0.05 Item: Quality-Built 11030 Premium Quality Alte...\u001b[0m\n", + "\u001b[91m80: Guess: $357.41 Truth: $251.99 Error: $105.42 SLE: 0.12 Item: Lucida LG-510 Student Classical Guitar, ...\u001b[0m\n", + "\u001b[91m81: Guess: $222.52 Truth: $940.33 Error: $717.81 SLE: 2.07 Item: Longacre 52-79800 Aluminum Turn Plates\u001b[0m\n", + "\u001b[91m82: Guess: $177.82 Truth: $52.99 Error: 
$124.83 SLE: 1.43 Item: Motion Pro 08-0380 Adjustable Torque Wre...\u001b[0m\n", + "\u001b[93m83: Guess: $270.79 Truth: $219.95 Error: $50.84 SLE: 0.04 Item: Glyph Thunderbolt 3 NVMe Dock (0 GB)\u001b[0m\n", + "\u001b[93m84: Guess: $280.41 Truth: $441.03 Error: $160.62 SLE: 0.20 Item: TOYO Open Country MT Performance Radial ...\u001b[0m\n", + "\u001b[93m85: Guess: $239.35 Truth: $168.98 Error: $70.37 SLE: 0.12 Item: Razer Seiren X USB Streaming Microphone ...\u001b[0m\n", + "\u001b[92m86: Guess: $41.61 Truth: $2.49 Error: $39.12 SLE: 6.26 Item: Happy Birthday to Dad From Your Daughter...\u001b[0m\n", + "\u001b[91m87: Guess: $198.54 Truth: $98.62 Error: $99.92 SLE: 0.48 Item: Little Tikes My Real Jam First Concert S...\u001b[0m\n", + "\u001b[92m88: Guess: $206.61 Truth: $256.95 Error: $50.34 SLE: 0.05 Item: Studio M Peace and Harmony Art Pole Comm...\u001b[0m\n", + "\u001b[91m89: Guess: $175.48 Truth: $30.99 Error: $144.49 SLE: 2.92 Item: MyVolts 12V Power Supply Adaptor Compati...\u001b[0m\n", + "\u001b[93m90: Guess: $379.12 Truth: $569.84 Error: $190.72 SLE: 0.17 Item: Dell Latitude 7212 Rugged Extreme Tablet...\u001b[0m\n", + "\u001b[93m91: Guess: $228.43 Truth: $177.99 Error: $50.44 SLE: 0.06 Item: Covermates Contour Fit Car Cover - Light...\u001b[0m\n", + "\u001b[93m92: Guess: $611.60 Truth: $997.99 Error: $386.39 SLE: 0.24 Item: Westin 57-4025 Black HDX Grille Guard fi...\u001b[0m\n", + "\u001b[92m93: Guess: $243.08 Truth: $219.00 Error: $24.08 SLE: 0.01 Item: Fieldpiece JL2 Job Link Wireless App Tra...\u001b[0m\n", + "\u001b[93m94: Guess: $275.13 Truth: $225.55 Error: $49.58 SLE: 0.04 Item: hansgrohe Talis S Modern Premium Easy Cl...\u001b[0m\n", + "\u001b[93m95: Guess: $332.03 Truth: $495.95 Error: $163.92 SLE: 0.16 Item: G-Technology G-SPEED eS PRO High-Perform...\u001b[0m\n", + "\u001b[92m96: Guess: $764.31 Truth: $942.37 Error: $178.06 SLE: 0.04 Item: DreamLine SHDR-1960723L-01 Shower Door, ...\u001b[0m\n", + "\u001b[91m97: Guess: $157.49 Truth: $1.94 
Error: $155.55 SLE: 15.90 Item: Sanctuary Square Backplate Finish: Oiled...\u001b[0m\n", + "\u001b[92m98: Guess: $283.05 Truth: $284.34 Error: $1.29 SLE: 0.00 Item: Pelican Protector 1750 Long Case - Multi...\u001b[0m\n", + "\u001b[93m99: Guess: $214.80 Truth: $171.90 Error: $42.90 SLE: 0.05 Item: Brock Replacement Driver and Passenger H...\u001b[0m\n", + "\u001b[91m100: Guess: $232.53 Truth: $144.99 Error: $87.54 SLE: 0.22 Item: Carlinkit Ai Box Mini, Android 11, Multi...\u001b[0m\n", + "\u001b[91m101: Guess: $251.67 Truth: $470.47 Error: $218.80 SLE: 0.39 Item: StarDot NetCamLIVE2 YouTube Live Stream ...\u001b[0m\n", + "\u001b[91m102: Guess: $220.74 Truth: $66.95 Error: $153.79 SLE: 1.40 Item: Atomic Compatible FILXXCAR0016 16x25x5 M...\u001b[0m\n", + "\u001b[92m103: Guess: $128.52 Truth: $117.00 Error: $11.52 SLE: 0.01 Item: Bandai Awakening of S. H. s.h.figuarts s...\u001b[0m\n", + "\u001b[93m104: Guess: $236.39 Truth: $172.14 Error: $64.25 SLE: 0.10 Item: Fit System 62135G Passenger Side Towing ...\u001b[0m\n", + "\u001b[93m105: Guess: $295.98 Truth: $392.74 Error: $96.76 SLE: 0.08 Item: Black Horse Black Aluminum Exceed Runnin...\u001b[0m\n", + "\u001b[92m106: Guess: $53.66 Truth: $16.99 Error: $36.67 SLE: 1.23 Item: Dearsun Twinkle Star Color Night Light P...\u001b[0m\n", + "\u001b[93m107: Guess: $62.47 Truth: $1.34 Error: $61.13 SLE: 10.89 Item: Pokemon - Gallade Spirit Link (83/108) -...\u001b[0m\n", + "\u001b[92m108: Guess: $348.03 Truth: $349.98 Error: $1.95 SLE: 0.00 Item: Ibanez GA34STCE-NT GIO Series Classical ...\u001b[0m\n", + "\u001b[93m109: Guess: $247.10 Truth: $370.71 Error: $123.61 SLE: 0.16 Item: Set 2 Heavy Duty 12-16.5 12x16.5 12 Ply ...\u001b[0m\n", + "\u001b[93m110: Guess: $134.89 Truth: $65.88 Error: $69.01 SLE: 0.50 Item: Hairpin Table Legs 28\" Heavy Duty Hairpi...\u001b[0m\n", + "\u001b[92m111: Guess: $219.90 Truth: $229.99 Error: $10.09 SLE: 0.00 Item: Marada Racing Seat with Adjustable Slide...\u001b[0m\n", + "\u001b[91m112: Guess: 
$155.73 Truth: $9.14 Error: $146.59 SLE: 7.50 Item: Remington Industries 24UL1007STRWHI25 24...\u001b[0m\n", + "\u001b[91m113: Guess: $487.76 Truth: $199.00 Error: $288.76 SLE: 0.80 Item: Acer S3-391-6046 13.3-inch Ultrabook, In...\u001b[0m\n", + "\u001b[91m114: Guess: $254.11 Truth: $109.99 Error: $144.12 SLE: 0.69 Item: ICBEAMER 7\" RGB LED Headlights Bulb Halo...\u001b[0m\n", + "\u001b[93m115: Guess: $374.14 Truth: $570.42 Error: $196.28 SLE: 0.18 Item: R1 Concepts Front Rear Brakes and Rotors...\u001b[0m\n", + "\u001b[92m116: Guess: $264.19 Truth: $279.99 Error: $15.80 SLE: 0.00 Item: Camplux 2.64 GPM Tankless , Outdoor Port...\u001b[0m\n", + "\u001b[91m117: Guess: $114.52 Truth: $30.99 Error: $83.53 SLE: 1.65 Item: KNOKLOCK 10 Pack 3.75 Inch(96mm) Kitchen...\u001b[0m\n", + "\u001b[91m118: Guess: $224.50 Truth: $31.99 Error: $192.51 SLE: 3.69 Item: Valley Enterprises Yaesu USB FTDI CT-62 ...\u001b[0m\n", + "\u001b[91m119: Guess: $162.25 Truth: $15.90 Error: $146.35 SLE: 5.14 Item: G9 LED Light Bulbs,8W,75W 100W replaceme...\u001b[0m\n", + "\u001b[91m120: Guess: $141.30 Truth: $45.99 Error: $95.31 SLE: 1.23 Item: ZCHAOZ 4 Lights Antique White Farmhouse ...\u001b[0m\n", + "\u001b[91m121: Guess: $227.91 Truth: $113.52 Error: $114.39 SLE: 0.48 Item: Honeywell TH8320R1003 Honeywell VisionPr...\u001b[0m\n", + "\u001b[91m122: Guess: $306.45 Truth: $516.99 Error: $210.54 SLE: 0.27 Item: Patriot Exhaust H8013-1 1-7/8\" Clippster...\u001b[0m\n", + "\u001b[92m123: Guess: $217.02 Truth: $196.99 Error: $20.03 SLE: 0.01 Item: Fitrite Autopart New Front Left Driver S...\u001b[0m\n", + "\u001b[91m124: Guess: $162.74 Truth: $46.55 Error: $116.19 SLE: 1.53 Item: Technical Precision Replacement for GE G...\u001b[0m\n", + "\u001b[92m125: Guess: $288.98 Truth: $356.99 Error: $68.01 SLE: 0.04 Item: Covercraft Carhartt SeatSaver Front Row ...\u001b[0m\n", + "\u001b[93m126: Guess: $221.58 Truth: $319.95 Error: $98.37 SLE: 0.13 Item: Sennheiser SD Pro 2 (506008) - 
Double-Si...\u001b[0m\n", + "\u001b[93m127: Guess: $157.53 Truth: $96.06 Error: $61.47 SLE: 0.24 Item: Hitachi MAF0110 Mass Air Flow Sensor\u001b[0m\n", + "\u001b[91m128: Guess: $271.09 Truth: $190.99 Error: $80.10 SLE: 0.12 Item: AmScope SE305R-P-LED-PS36A 10X-30X LED C...\u001b[0m\n", + "\u001b[93m129: Guess: $185.29 Truth: $257.95 Error: $72.66 SLE: 0.11 Item: Front Left Driver Side Window Regulator ...\u001b[0m\n", + "\u001b[91m130: Guess: $174.57 Truth: $62.95 Error: $111.62 SLE: 1.02 Item: Premium Replica Hubcap Set, Fits Nissan ...\u001b[0m\n", + "\u001b[92m131: Guess: $46.20 Truth: $47.66 Error: $1.46 SLE: 0.00 Item: Excellerations Phonics Spelling Game for...\u001b[0m\n", + "\u001b[93m132: Guess: $275.28 Truth: $226.99 Error: $48.29 SLE: 0.04 Item: RC4WD BigDog Dual Axle Scale Car/Truck T...\u001b[0m\n", + "\u001b[93m133: Guess: $276.66 Truth: $359.95 Error: $83.29 SLE: 0.07 Item: Unknown Stage 2 Clutch Kit - Low Altitud...\u001b[0m\n", + "\u001b[91m134: Guess: $257.09 Truth: $78.40 Error: $178.69 SLE: 1.39 Item: 2002-2008 Dodge Ram 1500 Mopar 4X4 Emble...\u001b[0m\n", + "\u001b[92m135: Guess: $186.20 Truth: $172.77 Error: $13.43 SLE: 0.01 Item: Pro Comp Alloys Series 89 Wheel with Pol...\u001b[0m\n", + "\u001b[93m136: Guess: $242.47 Truth: $316.45 Error: $73.98 SLE: 0.07 Item: Detroit Axle - Front Rear Strut & Coil S...\u001b[0m\n", + "\u001b[91m137: Guess: $169.15 Truth: $87.99 Error: $81.16 SLE: 0.42 Item: ECCPP Rear Wheel Axle Replacement fit fo...\u001b[0m\n", + "\u001b[92m138: Guess: $269.79 Truth: $226.63 Error: $43.16 SLE: 0.03 Item: Dell Latitude E6520 Intel i7-2720QM 2.20...\u001b[0m\n", + "\u001b[91m139: Guess: $193.22 Truth: $31.49 Error: $161.73 SLE: 3.20 Item: F FIERCE CYCLE 251pcs Black Universal Mo...\u001b[0m\n", + "\u001b[92m140: Guess: $174.37 Truth: $196.00 Error: $21.63 SLE: 0.01 Item: Flash Furniture 4 Pk. 
HERCULES Series 88...\u001b[0m\n", + "\u001b[91m141: Guess: $253.55 Truth: $78.40 Error: $175.15 SLE: 1.36 Item: B&M 30287 Throttle Valve/Kickdown Cable,...\u001b[0m\n", + "\u001b[92m142: Guess: $129.12 Truth: $116.25 Error: $12.87 SLE: 0.01 Item: Gates TCK226 PowerGrip Premium Timing Be...\u001b[0m\n", + "\u001b[92m143: Guess: $139.68 Truth: $112.78 Error: $26.90 SLE: 0.05 Item: Monroe Shocks & Struts Quick-Strut 17149...\u001b[0m\n", + "\u001b[91m144: Guess: $169.66 Truth: $27.32 Error: $142.34 SLE: 3.23 Item: Feit Electric BPMR16/GU10/930CA/6 35W EQ...\u001b[0m\n", + "\u001b[92m145: Guess: $113.93 Truth: $145.91 Error: $31.98 SLE: 0.06 Item: Yellow Jacket 2806 Contractor Extension ...\u001b[0m\n", + "\u001b[93m146: Guess: $211.22 Truth: $171.09 Error: $40.13 SLE: 0.04 Item: Garage-Pro Tailgate SET Compatible with ...\u001b[0m\n", + "\u001b[93m147: Guess: $223.57 Truth: $167.95 Error: $55.62 SLE: 0.08 Item: 3M Perfect It Buffing and Polishing Kit ...\u001b[0m\n", + "\u001b[93m148: Guess: $74.91 Truth: $28.49 Error: $46.42 SLE: 0.89 Item: Chinese Style Dollhouse Model DIY Miniat...\u001b[0m\n", + "\u001b[92m149: Guess: $161.23 Truth: $122.23 Error: $39.00 SLE: 0.08 Item: Generic NRG Innovations SRK-161H Steerin...\u001b[0m\n", + "\u001b[91m150: Guess: $120.85 Truth: $32.99 Error: $87.86 SLE: 1.63 Item: Learning Resources Coding Critters Range...\u001b[0m\n", + "\u001b[91m151: Guess: $182.82 Truth: $71.20 Error: $111.62 SLE: 0.87 Item: Bosch Automotive 15463 Oxygen Sensor, OE...\u001b[0m\n", + "\u001b[92m152: Guess: $78.46 Truth: $112.75 Error: $34.29 SLE: 0.13 Item: Case of 24-2 Inch Blue Painters Tape - 6...\u001b[0m\n", + "\u001b[93m153: Guess: $211.75 Truth: $142.43 Error: $69.32 SLE: 0.16 Item: MOCA Engine Water Pump & Fan Clutch fit ...\u001b[0m\n", + "\u001b[91m154: Guess: $230.96 Truth: $398.99 Error: $168.03 SLE: 0.30 Item: SAREMAS Foot Step Bars for Hyundai Palis...\u001b[0m\n", + "\u001b[92m155: Guess: $404.52 Truth: $449.00 Error: $44.48 SLE: 0.01 Item: 
Gretsch G9210 Square Neck Boxcar Mahogan...\u001b[0m\n", + "\u001b[91m156: Guess: $271.13 Truth: $189.00 Error: $82.13 SLE: 0.13 Item: NikoMaku Mirror Dash Cam Front and Rear ...\u001b[0m\n", + "\u001b[91m157: Guess: $245.68 Truth: $120.91 Error: $124.77 SLE: 0.50 Item: Fenix HP25R v2.0 USB-C Rechargeable Head...\u001b[0m\n", + "\u001b[93m158: Guess: $267.75 Truth: $203.53 Error: $64.22 SLE: 0.07 Item: R&L Racing Heavy Duty Roll-Up Soft Tonne...\u001b[0m\n", + "\u001b[92m159: Guess: $306.13 Truth: $349.99 Error: $43.86 SLE: 0.02 Item: Garmin 010-02258-10 GPSMAP 64sx, Handhel...\u001b[0m\n", + "\u001b[93m160: Guess: $91.29 Truth: $34.35 Error: $56.94 SLE: 0.92 Item: Brown 5-7/8\" X 8-1/2\" X 3/16\" Thick Heav...\u001b[0m\n", + "\u001b[93m161: Guess: $258.04 Truth: $384.99 Error: $126.95 SLE: 0.16 Item: GAOMON PD2200 Pen Display & 20 Pen Nibs ...\u001b[0m\n", + "\u001b[93m162: Guess: $269.75 Truth: $211.00 Error: $58.75 SLE: 0.06 Item: VXMOTOR for 97-03 Ford F150/F250 Lightdu...\u001b[0m\n", + "\u001b[91m163: Guess: $323.40 Truth: $129.00 Error: $194.40 SLE: 0.84 Item: HP EliteBook 2540p Intel Core i7-640LM X...\u001b[0m\n", + "\u001b[92m164: Guess: $147.41 Truth: $111.45 Error: $35.96 SLE: 0.08 Item: Green EPX Mixing Nozzles 100-Pack-fits 3...\u001b[0m\n", + "\u001b[91m165: Guess: $176.03 Truth: $81.12 Error: $94.91 SLE: 0.59 Item: Box Partners 6 1/4 x 3 1/8\" 13 Pt. 
Manil...\u001b[0m\n", + "\u001b[93m166: Guess: $331.45 Truth: $457.08 Error: $125.63 SLE: 0.10 Item: Vixen Air 1/2\" NPT Air Ride Suspension H...\u001b[0m\n", + "\u001b[91m167: Guess: $160.27 Truth: $49.49 Error: $110.78 SLE: 1.35 Item: Smart Floor Lamp, 2700-6500K+RGBPink Mul...\u001b[0m\n", + "\u001b[91m168: Guess: $196.85 Truth: $80.56 Error: $116.29 SLE: 0.79 Item: SOZG 324mm Wheelbase Body Shell RC Car B...\u001b[0m\n", + "\u001b[92m169: Guess: $329.67 Truth: $278.39 Error: $51.28 SLE: 0.03 Item: Mickey Thompson ET Street S/S Racing Rad...\u001b[0m\n", + "\u001b[91m170: Guess: $202.99 Truth: $364.50 Error: $161.51 SLE: 0.34 Item: Pirelli 275/40R20 106W XL RFT P0 PZ4-LUX...\u001b[0m\n", + "\u001b[93m171: Guess: $281.98 Truth: $378.99 Error: $97.01 SLE: 0.09 Item: Torklift C3212 Rear Tie Down\u001b[0m\n", + "\u001b[92m172: Guess: $170.63 Truth: $165.28 Error: $5.35 SLE: 0.00 Item: Cardone 78-4226 Remanufactured Ford Comp...\u001b[0m\n", + "\u001b[91m173: Guess: $186.41 Truth: $56.74 Error: $129.67 SLE: 1.39 Item: Kidde AccessPoint 001798 Supra TouchPoin...\u001b[0m\n", + "\u001b[93m174: Guess: $216.29 Truth: $307.95 Error: $91.66 SLE: 0.12 Item: 3M Protecta 3100414 Self Retracting Life...\u001b[0m\n", + "\u001b[91m175: Guess: $174.78 Truth: $38.00 Error: $136.78 SLE: 2.27 Item: Plantronics 89435-01 Wired Headset, Blac...\u001b[0m\n", + "\u001b[91m176: Guess: $153.62 Truth: $53.00 Error: $100.62 SLE: 1.11 Item: Logitech K750 Wireless Solar Keyboard fo...\u001b[0m\n", + "\u001b[93m177: Guess: $361.87 Truth: $498.00 Error: $136.13 SLE: 0.10 Item: Olympus PEN E-PL9 Body Only with 3-Inch ...\u001b[0m\n", + "\u001b[91m178: Guess: $193.87 Truth: $53.99 Error: $139.88 SLE: 1.60 Item: Beck/Arnley 051-6066 Hub & Bearing Assem...\u001b[0m\n", + "\u001b[93m179: Guess: $240.58 Truth: $350.00 Error: $109.42 SLE: 0.14 Item: Eibach Pro-Kit Performance Springs E10-6...\u001b[0m\n", + "\u001b[93m180: Guess: $208.80 Truth: $299.95 Error: $91.15 SLE: 0.13 Item: LEGO DC Batman 1989 
Batwing 76161 Displa...\u001b[0m\n", + "\u001b[93m181: Guess: $171.06 Truth: $94.93 Error: $76.13 SLE: 0.34 Item: Kingston Brass KS3608PL Restoration 4-In...\u001b[0m\n", + "\u001b[92m182: Guess: $335.46 Truth: $379.00 Error: $43.54 SLE: 0.01 Item: Polk Vanishing Series 265-LS In-Wall 3-W...\u001b[0m\n", + "\u001b[93m183: Guess: $237.21 Truth: $299.95 Error: $62.74 SLE: 0.05 Item: Spec-D Tuning LED Projector Headlights G...\u001b[0m\n", + "\u001b[92m184: Guess: $44.95 Truth: $24.99 Error: $19.96 SLE: 0.32 Item: RICHMOND & FINCH Airpod Pro Case, Green ...\u001b[0m\n", + "\u001b[91m185: Guess: $169.37 Truth: $41.04 Error: $128.33 SLE: 1.96 Item: LFA Industries 43B-5A-33JT 1/16-1/2-1.5-...\u001b[0m\n", + "\u001b[91m186: Guess: $172.91 Truth: $327.90 Error: $154.99 SLE: 0.41 Item: SAUTVS LED Headlight Assembly for Slings...\u001b[0m\n", + "\u001b[91m187: Guess: $118.32 Truth: $10.99 Error: $107.33 SLE: 5.28 Item: 2 Pack Combo Womens Safety Glasses Impac...\u001b[0m\n", + "\u001b[91m188: Guess: $158.34 Truth: $14.99 Error: $143.35 SLE: 5.29 Item: Arepa - Venezuelan cuisine - Venezuela P...\u001b[0m\n", + "\u001b[92m189: Guess: $119.00 Truth: $84.95 Error: $34.05 SLE: 0.11 Item: Schlage Lock Company KS23D2300 Padlock, ...\u001b[0m\n", + "\u001b[91m190: Guess: $191.56 Truth: $111.00 Error: $80.56 SLE: 0.29 Item: Techni Mobili White Sit to Stand Mobile ...\u001b[0m\n", + "\u001b[92m191: Guess: $141.72 Truth: $123.73 Error: $17.99 SLE: 0.02 Item: Special Lite Products Contemporary Wall ...\u001b[0m\n", + "\u001b[91m192: Guess: $268.10 Truth: $557.38 Error: $289.28 SLE: 0.53 Item: Tascam DP-24SD 24-Track Digital Portastu...\u001b[0m\n", + "\u001b[91m193: Guess: $219.48 Truth: $95.55 Error: $123.93 SLE: 0.68 Item: Glow Lighting 636CC10SP Vista Crystal Fl...\u001b[0m\n", + "\u001b[93m194: Guess: $204.85 Truth: $154.00 Error: $50.85 SLE: 0.08 Item: Z3 Wind Deflector, Smoke Tint, Lexan, Wi...\u001b[0m\n", + "\u001b[91m195: Guess: $300.74 Truth: $198.99 Error: $101.75 SLE: 0.17 
Item: Olympus E-20 5MP Digital Camera w/ 4x Op...\u001b[0m\n", + "\u001b[91m196: Guess: $247.62 Truth: $430.44 Error: $182.82 SLE: 0.30 Item: PHYNEDI 1:1000 World Trade Center (1973-...\u001b[0m\n", + "\u001b[92m197: Guess: $72.39 Truth: $45.67 Error: $26.72 SLE: 0.20 Item: YANGHUAN Unstable Unicorns Adventure Car...\u001b[0m\n", + "\u001b[92m198: Guess: $227.17 Truth: $249.00 Error: $21.83 SLE: 0.01 Item: Interlogix NX-1820E NetworX Touch Screen...\u001b[0m\n", + "\u001b[91m199: Guess: $189.59 Truth: $42.99 Error: $146.60 SLE: 2.15 Item: Steering Damper,Universal Motorcycle Han...\u001b[0m\n", + "\u001b[92m200: Guess: $203.70 Truth: $181.33 Error: $22.37 SLE: 0.01 Item: Amprobe TIC 410A Hot Stick Attachment\u001b[0m\n", + "\u001b[91m201: Guess: $124.43 Truth: $6.03 Error: $118.40 SLE: 8.30 Item: MyCableMart 3.5mm Plug/Jack, 4 Conductor...\u001b[0m\n", + "\u001b[93m202: Guess: $85.76 Truth: $29.99 Error: $55.77 SLE: 1.06 Item: OtterBox + Pop Symmetry Series Case for ...\u001b[0m\n", + "\u001b[91m203: Guess: $480.64 Truth: $899.00 Error: $418.36 SLE: 0.39 Item: Dell XPS X8700-1572BLK Desktop ( Intel C...\u001b[0m\n", + "\u001b[93m204: Guess: $252.46 Truth: $399.99 Error: $147.53 SLE: 0.21 Item: Franklin Iron Works Sperry Industrial Br...\u001b[0m\n", + "\u001b[91m205: Guess: $109.06 Truth: $4.66 Error: $104.40 SLE: 8.81 Item: Avery Legal Dividers, Standard Collated ...\u001b[0m\n", + "\u001b[92m206: Guess: $235.94 Truth: $261.41 Error: $25.47 SLE: 0.01 Item: Moen 8346 Commercial Posi-Temp Pressure ...\u001b[0m\n", + "\u001b[91m207: Guess: $218.78 Truth: $136.97 Error: $81.81 SLE: 0.22 Item: Carlisle Versa Trail ATR All Terrain Rad...\u001b[0m\n", + "\u001b[91m208: Guess: $170.94 Truth: $79.00 Error: $91.94 SLE: 0.59 Item: SUNWAYFOTO 44mm Tripod Ball Head Arca Co...\u001b[0m\n", + "\u001b[91m209: Guess: $239.13 Truth: $444.99 Error: $205.86 SLE: 0.38 Item: NanoBeam AC NBE-5AC-Gen2-US 4 Units 5GHz...\u001b[0m\n", + "\u001b[93m210: Guess: $277.91 Truth: $411.94 Error: 
$134.03 SLE: 0.15 Item: WULF 4\" Front 2\" Rear Leveling Lift Kit ...\u001b[0m\n", + "\u001b[93m211: Guess: $212.26 Truth: $148.40 Error: $63.86 SLE: 0.13 Item: Alera ALEVABFMC Valencia Series Mobile B...\u001b[0m\n", + "\u001b[91m212: Guess: $121.62 Truth: $244.99 Error: $123.37 SLE: 0.48 Item: YU-GI-OH! Ignition Assault Booster Box\u001b[0m\n", + "\u001b[92m213: Guess: $123.70 Truth: $86.50 Error: $37.20 SLE: 0.13 Item: 48\" x 36\" Extra-Large Framed Magnetic Bl...\u001b[0m\n", + "\u001b[92m214: Guess: $305.07 Truth: $297.95 Error: $7.12 SLE: 0.00 Item: Dell Latitude D620 Renewed Notebook PC\u001b[0m\n", + "\u001b[91m215: Guess: $564.31 Truth: $399.99 Error: $164.32 SLE: 0.12 Item: acer Aspire 5 Laptop, AMD Ryzen 3 5300U ...\u001b[0m\n", + "\u001b[91m216: Guess: $249.27 Truth: $599.00 Error: $349.73 SLE: 0.76 Item: Elk 31080/6RC-GRN 30 by 6-Inch Viva 6-Li...\u001b[0m\n", + "\u001b[91m217: Guess: $239.21 Truth: $105.99 Error: $133.22 SLE: 0.65 Item: Barbie Top Model Doll\u001b[0m\n", + "\u001b[91m218: Guess: $348.61 Truth: $689.00 Error: $340.39 SLE: 0.46 Item: Danby Designer 20-In. 
Electric Range wit...\u001b[0m\n", + "\u001b[92m219: Guess: $337.12 Truth: $404.99 Error: $67.87 SLE: 0.03 Item: FixtureDisplays® Metal Truss Podium Doub...\u001b[0m\n", + "\u001b[92m220: Guess: $234.84 Truth: $207.76 Error: $27.08 SLE: 0.01 Item: ACDelco 13597235 GM Original Equipment A...\u001b[0m\n", + "\u001b[92m221: Guess: $191.47 Truth: $171.82 Error: $19.65 SLE: 0.01 Item: EBC S1KF1135 Stage-1 Premium Street Brak...\u001b[0m\n", + "\u001b[92m222: Guess: $304.47 Truth: $293.24 Error: $11.23 SLE: 0.00 Item: FXR Men's Boost FX Jacket (Black/Orange/...\u001b[0m\n", + "\u001b[93m223: Guess: $277.57 Truth: $374.95 Error: $97.38 SLE: 0.09 Item: SuperATV Scratch Resistant 3-in-1 Flip W...\u001b[0m\n", + "\u001b[93m224: Guess: $180.27 Truth: $111.99 Error: $68.28 SLE: 0.22 Item: SBU 3 Layer All Weather Mini Van Car Cov...\u001b[0m\n", + "\u001b[93m225: Guess: $103.97 Truth: $42.99 Error: $60.98 SLE: 0.76 Item: 2 Pack Outdoor Brochure Holder Advertisi...\u001b[0m\n", + "\u001b[92m226: Guess: $126.29 Truth: $116.71 Error: $9.58 SLE: 0.01 Item: Monroe Shocks & Struts Quick-Strut 17158...\u001b[0m\n", + "\u001b[93m227: Guess: $191.43 Truth: $118.61 Error: $72.82 SLE: 0.23 Item: Elements of Design Magellan EB235AL Thre...\u001b[0m\n", + "\u001b[93m228: Guess: $201.35 Truth: $147.12 Error: $54.23 SLE: 0.10 Item: GM Genuine Parts 15-62961 Air Conditioni...\u001b[0m\n", + "\u001b[91m229: Guess: $227.66 Truth: $119.99 Error: $107.67 SLE: 0.41 Item: Baseus 17-in-1 USB C Docking Station to ...\u001b[0m\n", + "\u001b[91m230: Guess: $207.65 Truth: $369.98 Error: $162.33 SLE: 0.33 Item: Whitehall™ Personalized Whitehall Capito...\u001b[0m\n", + "\u001b[92m231: Guess: $270.47 Truth: $315.55 Error: $45.08 SLE: 0.02 Item: Pro Circuit Works Pipe PY05250 for 02-19...\u001b[0m\n", + "\u001b[93m232: Guess: $264.43 Truth: $190.99 Error: $73.44 SLE: 0.10 Item: HYANKA 15 \"1200W Professional DJ Speaker...\u001b[0m\n", + "\u001b[92m233: Guess: $181.43 Truth: $155.00 Error: $26.43 SLE: 0.02 
Item: Bluetooth X6BT Card Reader Writer Encode...\u001b[0m\n", + "\u001b[92m234: Guess: $321.29 Truth: $349.99 Error: $28.70 SLE: 0.01 Item: AIRAID Cold Air Intake System by K&N: In...\u001b[0m\n", + "\u001b[93m235: Guess: $300.28 Truth: $249.99 Error: $50.29 SLE: 0.03 Item: Bostingner Shower Faucets Sets Complete,...\u001b[0m\n", + "\u001b[91m236: Guess: $274.69 Truth: $42.99 Error: $231.70 SLE: 3.37 Item: PIT66 Front Bumper Turn Signal Lights, C...\u001b[0m\n", + "\u001b[92m237: Guess: $31.56 Truth: $17.99 Error: $13.57 SLE: 0.29 Item: Caseology Bumpy Compatible with Google P...\u001b[0m\n", + "\u001b[91m238: Guess: $217.25 Truth: $425.00 Error: $207.75 SLE: 0.45 Item: Fleck 2510 Timer Mechanical Filter Contr...\u001b[0m\n", + "\u001b[92m239: Guess: $273.88 Truth: $249.99 Error: $23.89 SLE: 0.01 Item: Haloview MC7108 Wireless RV Backup Camer...\u001b[0m\n", + "\u001b[93m240: Guess: $179.25 Truth: $138.23 Error: $41.02 SLE: 0.07 Item: Schmidt Spiele - Manhattan\u001b[0m\n", + "\u001b[91m241: Guess: $191.36 Truth: $414.99 Error: $223.63 SLE: 0.59 Item: Corsa 14333 Tip Kit (Ford Mustang GT)\u001b[0m\n", + "\u001b[92m242: Guess: $195.79 Truth: $168.28 Error: $27.51 SLE: 0.02 Item: Hoshizaki FM116A Fan Motor Kit 1\u001b[0m\n", + "\u001b[92m243: Guess: $212.44 Truth: $199.99 Error: $12.45 SLE: 0.00 Item: BAINUO Antler Chandelier Lighting,6 Ligh...\u001b[0m\n", + "\u001b[91m244: Guess: $231.50 Truth: $126.70 Error: $104.80 SLE: 0.36 Item: DNA MOTORING HL-OH-FEXP06-SM-AM Smoke Le...\u001b[0m\n", + "\u001b[91m245: Guess: $167.98 Truth: $5.91 Error: $162.07 SLE: 10.22 Item: Wera Stainless 3840/1 TS 2.5mm Hex Inser...\u001b[0m\n", + "\u001b[92m246: Guess: $221.62 Truth: $193.06 Error: $28.56 SLE: 0.02 Item: Celestron - PowerSeeker 127EQ Telescope ...\u001b[0m\n", + "\u001b[92m247: Guess: $285.70 Truth: $249.99 Error: $35.71 SLE: 0.02 Item: NHOPEEW 10.1inch Android Car Radio Carpl...\u001b[0m\n", + "\u001b[91m248: Guess: $254.74 Truth: $64.12 Error: $190.62 SLE: 1.87 Item: 
Other Harmonica (Suzuki-2Timer24- A)\u001b[0m\n", + "\u001b[91m249: Guess: $243.62 Truth: $114.99 Error: $128.63 SLE: 0.56 Item: Harley Air Filter Venturi Intake Air Cle...\u001b[0m\n", + "\u001b[91m250: Guess: $321.61 Truth: $926.00 Error: $604.39 SLE: 1.11 Item: Elite Screens Edge Free Ambient Light Re...\u001b[0m\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "Tester.test(rf, test)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "bd895ac9-2814-469c-9d3d-c94e3f42003d", + "metadata": {}, + "outputs": [], + "source": [ + "def gb(item):\n", + " return gradient_boosting.price(description(item))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "00112510-9634-4164-99b1-9dc7c39f5232", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[93m1: Guess: $278.30 Truth: $374.41 Error: $96.11 SLE: 0.09 Item: OEM AC Compressor w/A/C Repair Kit For F...\u001b[0m\n", + "\u001b[92m2: Guess: $203.69 Truth: $225.11 Error: $21.42 SLE: 0.01 Item: Motorcraft YB3125 Fan Clutch\u001b[0m\n", + "\u001b[91m3: Guess: $282.90 Truth: $61.68 Error: $221.22 SLE: 2.28 Item: Dorman 603-159 Front Washer Fluid Reserv...\u001b[0m\n", + "\u001b[91m4: Guess: $273.28 Truth: $599.99 Error: $326.71 SLE: 0.62 Item: HP Premium 17.3-inch HD Plus Touchscreen...\u001b[0m\n", + "\u001b[91m5: Guess: $215.50 Truth: $16.99 Error: $198.51 SLE: 6.19 Item: 5-Position Super Switch Pickup Selector ...\u001b[0m\n", + "\u001b[91m6: Guess: $119.24 Truth: $31.99 Error: $87.25 SLE: 1.67 Item: Horror Bookmarks, Resin Horror Bookmarks...\u001b[0m\n", + "\u001b[91m7: Guess: $291.20 Truth: $101.79 Error: $189.41 SLE: 1.09 Item: SK6241 - Stinger 4 Gauge 6000 Series Pow...\u001b[0m\n", + "\u001b[92m8: Guess: $264.52 Truth: $289.00 Error: $24.48 SLE: 0.01 Item: Godox ML60Bi LED Light Kit, Handheld LED...\u001b[0m\n", + "\u001b[91m9: Guess: $324.30 Truth: $635.86 Error: $311.56 SLE: 0.45 Item: Randall RG75DG3PLUS G3 Plus 100-Watt Com...\u001b[0m\n", + "\u001b[91m10: Guess: $215.60 Truth: $65.99 Error: $149.61 SLE: 1.38 Item: HOLDWILL 6 Pack LED Shop Light, 4FT 24W ...\u001b[0m\n", + "\u001b[92m11: Guess: $277.66 Truth: $254.21 Error: $23.45 SLE: 0.01 Item: Viking Horns V103C/1005ATK 3 Gallon Air 
...\u001b[0m\n", + "\u001b[91m12: Guess: $233.92 Truth: $412.99 Error: $179.07 SLE: 0.32 Item: CURT 70110 Custom Tow Bar Base Plate Bra...\u001b[0m\n", + "\u001b[93m13: Guess: $124.08 Truth: $205.50 Error: $81.42 SLE: 0.25 Item: 10-Pack Solar HAMMERED BRONZE Finish Pos...\u001b[0m\n", + "\u001b[92m14: Guess: $280.70 Truth: $248.23 Error: $32.47 SLE: 0.01 Item: COSTWAY Electric Tumble Dryer, Sliver\u001b[0m\n", + "\u001b[93m15: Guess: $261.35 Truth: $399.00 Error: $137.65 SLE: 0.18 Item: FREE SIGNAL TV Transit 32\" 12 Volt DC Po...\u001b[0m\n", + "\u001b[92m16: Guess: $339.87 Truth: $373.94 Error: $34.07 SLE: 0.01 Item: Bilstein 5100 Monotube Gas Shock Set com...\u001b[0m\n", + "\u001b[91m17: Guess: $235.63 Truth: $92.89 Error: $142.74 SLE: 0.85 Item: Sangean K-200 Multi-Function Upright AM/...\u001b[0m\n", + "\u001b[93m18: Guess: $116.94 Truth: $51.99 Error: $64.95 SLE: 0.64 Item: Charles Leonard Magnetic Lapboard Class ...\u001b[0m\n", + "\u001b[91m19: Guess: $361.21 Truth: $179.00 Error: $182.21 SLE: 0.49 Item: Gigabyte AMD Radeon HD 7870 2 GB GDDR5 D...\u001b[0m\n", + "\u001b[93m20: Guess: $82.58 Truth: $19.42 Error: $63.16 SLE: 1.99 Item: 3dRose LLC 8 x 8 x 0.25 Inches Bull Terr...\u001b[0m\n", + "\u001b[91m21: Guess: $242.43 Truth: $539.95 Error: $297.52 SLE: 0.64 Item: ROKINON 85mm F1.4 Auto Focus Full Frame ...\u001b[0m\n", + "\u001b[93m22: Guess: $203.82 Truth: $147.67 Error: $56.15 SLE: 0.10 Item: AUTOSAVER88 Headlight Assembly Compatibl...\u001b[0m\n", + "\u001b[91m23: Guess: $142.76 Truth: $24.99 Error: $117.77 SLE: 2.93 Item: ASI NAUTICAL 2.5 Inches Opera Glasses Bi...\u001b[0m\n", + "\u001b[91m24: Guess: $291.76 Truth: $149.00 Error: $142.76 SLE: 0.45 Item: Behringer TUBE OVERDRIVE TO100 Authentic...\u001b[0m\n", + "\u001b[92m25: Guess: $26.46 Truth: $16.99 Error: $9.47 SLE: 0.18 Item: Fun Express Insect Finger Puppets - 24 f...\u001b[0m\n", + "\u001b[93m26: Guess: $52.18 Truth: $7.99 Error: $44.19 SLE: 3.16 Item: WAFJAMF Roller Stamp Identity Theft 
Stam...\u001b[0m\n", + "\u001b[92m27: Guess: $185.96 Truth: $199.99 Error: $14.03 SLE: 0.01 Item: Capulina Tiffany Floor Lamp 2-Light 16\" ...\u001b[0m\n", + "\u001b[92m28: Guess: $245.77 Truth: $251.45 Error: $5.68 SLE: 0.00 Item: Apple Watch Series 6 (GPS, 44mm) - Space...\u001b[0m\n", + "\u001b[92m29: Guess: $242.42 Truth: $231.62 Error: $10.80 SLE: 0.00 Item: ICON 01725 Tandem Axle Fender Skirt FS17...\u001b[0m\n", + "\u001b[93m30: Guess: $192.60 Truth: $135.00 Error: $57.60 SLE: 0.12 Item: SanDisk 128GB Ultra (10 Pack) MicroSD Cl...\u001b[0m\n", + "\u001b[93m31: Guess: $230.39 Truth: $356.62 Error: $126.23 SLE: 0.19 Item: Velvac 2020,L,C/Hr,W,E2003,102\",Bk - 715...\u001b[0m\n", + "\u001b[92m32: Guess: $231.54 Truth: $257.99 Error: $26.45 SLE: 0.01 Item: TCMT Passenger Backrest Sissy Bar & Lugg...\u001b[0m\n", + "\u001b[91m33: Guess: $208.87 Truth: $27.99 Error: $180.88 SLE: 3.92 Item: Alnicov 63.5MM Brass Tremolo Block,Tremo...\u001b[0m\n", + "\u001b[91m34: Guess: $255.85 Truth: $171.20 Error: $84.65 SLE: 0.16 Item: Subaru Forester Outback Legacy OEM Engin...\u001b[0m\n", + "\u001b[93m35: Guess: $172.11 Truth: $225.00 Error: $52.89 SLE: 0.07 Item: Richmond Auto Upholstery - 2012 Dodge Ra...\u001b[0m\n", + "\u001b[91m36: Guess: $206.53 Truth: $105.00 Error: $101.53 SLE: 0.45 Item: AP-39 Automotive Paint Primer Grey 2K Ur...\u001b[0m\n", + "\u001b[92m37: Guess: $264.67 Truth: $299.99 Error: $35.32 SLE: 0.02 Item: Road Top Wireless Carplay Retrofit Kit D...\u001b[0m\n", + "\u001b[92m38: Guess: $438.38 Truth: $535.09 Error: $96.71 SLE: 0.04 Item: Gibson Performance Exhaust 5658 Aluminiz...\u001b[0m\n", + "\u001b[91m39: Guess: $137.75 Truth: $12.33 Error: $125.42 SLE: 5.49 Item: Bella Tunno Happy Links - Baby Montessor...\u001b[0m\n", + "\u001b[91m40: Guess: $201.92 Truth: $84.99 Error: $116.93 SLE: 0.74 Item: CANMORE H300 Handheld GPS Golf Device, S...\u001b[0m\n", + "\u001b[93m41: Guess: $88.92 Truth: $15.99 Error: $72.93 SLE: 2.78 Item: DCPOWER AC Adapter 
Compatible Replacemen...\u001b[0m\n", + "\u001b[91m42: Guess: $168.94 Truth: $62.44 Error: $106.50 SLE: 0.97 Item: Sharp, VX2128V, Commercial Desktop Calcu...\u001b[0m\n", + "\u001b[92m43: Guess: $63.30 Truth: $82.99 Error: $19.69 SLE: 0.07 Item: Melissa & Doug Lifelike Plush Stork Gian...\u001b[0m\n", + "\u001b[91m44: Guess: $315.57 Truth: $599.95 Error: $284.38 SLE: 0.41 Item: Sony SSCS8 2-Way 3-Driver Center Channel...\u001b[0m\n", + "\u001b[91m45: Guess: $350.86 Truth: $194.99 Error: $155.87 SLE: 0.34 Item: ASUS Chromebook CX1, 14\" Full HD NanoEdg...\u001b[0m\n", + "\u001b[93m46: Guess: $244.32 Truth: $344.95 Error: $100.63 SLE: 0.12 Item: FiiO X7 32GB Hi-Res Lossless Music Playe...\u001b[0m\n", + "\u001b[92m47: Guess: $42.76 Truth: $37.99 Error: $4.77 SLE: 0.01 Item: TORRO Leather Case Compatible with iPhon...\u001b[0m\n", + "\u001b[92m48: Guess: $220.75 Truth: $224.35 Error: $3.60 SLE: 0.00 Item: Universal Air Conditioner KT 1031 A/C Co...\u001b[0m\n", + "\u001b[91m49: Guess: $406.33 Truth: $814.00 Error: $407.67 SLE: 0.48 Item: Street Series Stainless Performance Cat-...\u001b[0m\n", + "\u001b[92m50: Guess: $444.27 Truth: $439.88 Error: $4.39 SLE: 0.00 Item: Lenovo IdeaPad 3 14-inch Laptop, 14.0-in...\u001b[0m\n", + "\u001b[93m51: Guess: $251.37 Truth: $341.43 Error: $90.06 SLE: 0.09 Item: Access Bed Covers TonnoSport 22050219 - ...\u001b[0m\n", + "\u001b[91m52: Guess: $172.66 Truth: $46.78 Error: $125.88 SLE: 1.67 Item: G.I. 
JOE Hasbro 3 3/4\" Wave 5 Action Fig...\u001b[0m\n", + "\u001b[92m53: Guess: $173.71 Truth: $171.44 Error: $2.27 SLE: 0.00 Item: T&S Brass B-0232-BST Double Pantry Fauce...\u001b[0m\n", + "\u001b[91m54: Guess: $214.37 Truth: $458.00 Error: $243.63 SLE: 0.57 Item: ZTUOAUMA Fuel Injection Pump 3090942 309...\u001b[0m\n", + "\u001b[91m55: Guess: $300.93 Truth: $130.75 Error: $170.18 SLE: 0.69 Item: 2AP18AA#ABA Hp Prime Graphing Calculator...\u001b[0m\n", + "\u001b[93m56: Guess: $163.77 Truth: $83.81 Error: $79.96 SLE: 0.44 Item: Lowrance 000-0119-83 Nmea 2000 25' Exten...\u001b[0m\n", + "\u001b[91m57: Guess: $158.84 Truth: $386.39 Error: $227.55 SLE: 0.78 Item: Jeep Genuine Accessories 82213051 Hood L...\u001b[0m\n", + "\u001b[93m58: Guess: $213.25 Truth: $169.00 Error: $44.25 SLE: 0.05 Item: GODOX CB-06 Hard Carrying Case with Whee...\u001b[0m\n", + "\u001b[91m59: Guess: $156.45 Truth: $17.95 Error: $138.50 SLE: 4.48 Item: Au-Tomotive Gold, INC. Ford Black Valet ...\u001b[0m\n", + "\u001b[92m60: Guess: $256.86 Truth: $269.00 Error: $12.14 SLE: 0.00 Item: Snailfly Black Roof Rack Rail + Cross Ba...\u001b[0m\n", + "\u001b[91m61: Guess: $248.44 Truth: $77.77 Error: $170.67 SLE: 1.33 Item: KING SHA Anti Glare LED Track Lighting H...\u001b[0m\n", + "\u001b[91m62: Guess: $258.29 Truth: $88.99 Error: $169.30 SLE: 1.12 Item: APS Compatible with Chevy Silverado 1500...\u001b[0m\n", + "\u001b[93m63: Guess: $290.76 Truth: $364.41 Error: $73.65 SLE: 0.05 Item: Wilwood Engineering 14011291R Brake Cali...\u001b[0m\n", + "\u001b[91m64: Guess: $228.11 Truth: $127.03 Error: $101.08 SLE: 0.34 Item: ACDelco Gold 336-1925A Starter, Remanufa...\u001b[0m\n", + "\u001b[91m65: Guess: $323.48 Truth: $778.95 Error: $455.47 SLE: 0.77 Item: UWS EC10783 69-Inch Matte Black Heavy-Wa...\u001b[0m\n", + "\u001b[91m66: Guess: $478.90 Truth: $206.66 Error: $272.24 SLE: 0.70 Item: Dell Latitude E5440 14in Business Laptop...\u001b[0m\n", + "\u001b[91m67: Guess: $158.53 Truth: $35.94 Error: $122.59 SLE: 
2.14 Item: (Plug and Play) Spare Tire Brake Light W...\u001b[0m\n", + "\u001b[91m68: Guess: $253.81 Truth: $149.00 Error: $104.81 SLE: 0.28 Item: The Ultimate Roadside Rescue Assistant\u001b[0m\n", + "\u001b[93m69: Guess: $197.02 Truth: $251.98 Error: $54.96 SLE: 0.06 Item: Brand New 18\" x 8.5\" Replacement Wheel f...\u001b[0m\n", + "\u001b[92m70: Guess: $180.50 Truth: $160.00 Error: $20.50 SLE: 0.01 Item: Headlight Headlamp LH Left & RH Right Pa...\u001b[0m\n", + "\u001b[91m71: Guess: $137.45 Truth: $39.99 Error: $97.46 SLE: 1.48 Item: Lilo And Stitch Deluxe Oversize Print La...\u001b[0m\n", + "\u001b[91m72: Guess: $191.79 Truth: $362.41 Error: $170.62 SLE: 0.40 Item: AC Compressor & A/C Clutch For Hyundai A...\u001b[0m\n", + "\u001b[93m73: Guess: $208.88 Truth: $344.00 Error: $135.12 SLE: 0.25 Item: House Of Troy PIN475-AB Pinnacle Collect...\u001b[0m\n", + "\u001b[91m74: Guess: $233.11 Truth: $25.09 Error: $208.02 SLE: 4.81 Item: Juno T29 WH Floating Electrical Feed Sin...\u001b[0m\n", + "\u001b[93m75: Guess: $233.16 Truth: $175.95 Error: $57.21 SLE: 0.08 Item: Sherman GO-PARTS - for 2013-2016 Toyota ...\u001b[0m\n", + "\u001b[93m76: Guess: $182.09 Truth: $132.64 Error: $49.45 SLE: 0.10 Item: Roland RPU-3 Electronic Keyboard Pedal o...\u001b[0m\n", + "\u001b[91m77: Guess: $219.68 Truth: $422.99 Error: $203.31 SLE: 0.43 Item: Rockland VMI14 12,000 Pound 12 Volt DC E...\u001b[0m\n", + "\u001b[91m78: Guess: $232.99 Truth: $146.48 Error: $86.51 SLE: 0.21 Item: Max Advanced Brakes Elite XDS Front Cros...\u001b[0m\n", + "\u001b[93m79: Guess: $221.02 Truth: $156.83 Error: $64.19 SLE: 0.12 Item: Quality-Built 11030 Premium Quality Alte...\u001b[0m\n", + "\u001b[92m80: Guess: $222.74 Truth: $251.99 Error: $29.25 SLE: 0.02 Item: Lucida LG-510 Student Classical Guitar, ...\u001b[0m\n", + "\u001b[91m81: Guess: $194.18 Truth: $940.33 Error: $746.15 SLE: 2.48 Item: Longacre 52-79800 Aluminum Turn Plates\u001b[0m\n", + "\u001b[91m82: Guess: $258.16 Truth: $52.99 Error: 
$205.17 SLE: 2.46 Item: Motion Pro 08-0380 Adjustable Torque Wre...\u001b[0m\n", + "\u001b[92m83: Guess: $187.29 Truth: $219.95 Error: $32.66 SLE: 0.03 Item: Glyph Thunderbolt 3 NVMe Dock (0 GB)\u001b[0m\n", + "\u001b[91m84: Guess: $162.35 Truth: $441.03 Error: $278.68 SLE: 0.99 Item: TOYO Open Country MT Performance Radial ...\u001b[0m\n", + "\u001b[91m85: Guess: $335.47 Truth: $168.98 Error: $166.49 SLE: 0.47 Item: Razer Seiren X USB Streaming Microphone ...\u001b[0m\n", + "\u001b[91m86: Guess: $83.21 Truth: $2.49 Error: $80.72 SLE: 10.13 Item: Happy Birthday to Dad From Your Daughter...\u001b[0m\n", + "\u001b[93m87: Guess: $153.26 Truth: $98.62 Error: $54.64 SLE: 0.19 Item: Little Tikes My Real Jam First Concert S...\u001b[0m\n", + "\u001b[93m88: Guess: $184.29 Truth: $256.95 Error: $72.66 SLE: 0.11 Item: Studio M Peace and Harmony Art Pole Comm...\u001b[0m\n", + "\u001b[91m89: Guess: $153.86 Truth: $30.99 Error: $122.87 SLE: 2.49 Item: MyVolts 12V Power Supply Adaptor Compati...\u001b[0m\n", + "\u001b[91m90: Guess: $322.84 Truth: $569.84 Error: $247.00 SLE: 0.32 Item: Dell Latitude 7212 Rugged Extreme Tablet...\u001b[0m\n", + "\u001b[93m91: Guess: $222.82 Truth: $177.99 Error: $44.83 SLE: 0.05 Item: Covermates Contour Fit Car Cover - Light...\u001b[0m\n", + "\u001b[91m92: Guess: $254.07 Truth: $997.99 Error: $743.92 SLE: 1.86 Item: Westin 57-4025 Black HDX Grille Guard fi...\u001b[0m\n", + "\u001b[92m93: Guess: $208.15 Truth: $219.00 Error: $10.85 SLE: 0.00 Item: Fieldpiece JL2 Job Link Wireless App Tra...\u001b[0m\n", + "\u001b[91m94: Guess: $331.26 Truth: $225.55 Error: $105.71 SLE: 0.15 Item: hansgrohe Talis S Modern Premium Easy Cl...\u001b[0m\n", + "\u001b[93m95: Guess: $317.71 Truth: $495.95 Error: $178.24 SLE: 0.20 Item: G-Technology G-SPEED eS PRO High-Perform...\u001b[0m\n", + "\u001b[91m96: Guess: $349.17 Truth: $942.37 Error: $593.20 SLE: 0.98 Item: DreamLine SHDR-1960723L-01 Shower Door, ...\u001b[0m\n", + "\u001b[91m97: Guess: $206.35 Truth: $1.94 
Error: $204.41 SLE: 18.11 Item: Sanctuary Square Backplate Finish: Oiled...\u001b[0m\n", + "\u001b[93m98: Guess: $179.87 Truth: $284.34 Error: $104.47 SLE: 0.21 Item: Pelican Protector 1750 Long Case - Multi...\u001b[0m\n", + "\u001b[92m99: Guess: $199.60 Truth: $171.90 Error: $27.70 SLE: 0.02 Item: Brock Replacement Driver and Passenger H...\u001b[0m\n", + "\u001b[93m100: Guess: $192.47 Truth: $144.99 Error: $47.48 SLE: 0.08 Item: Carlinkit Ai Box Mini, Android 11, Multi...\u001b[0m\n", + "\u001b[91m101: Guess: $248.34 Truth: $470.47 Error: $222.13 SLE: 0.41 Item: StarDot NetCamLIVE2 YouTube Live Stream ...\u001b[0m\n", + "\u001b[91m102: Guess: $200.84 Truth: $66.95 Error: $133.89 SLE: 1.19 Item: Atomic Compatible FILXXCAR0016 16x25x5 M...\u001b[0m\n", + "\u001b[93m103: Guess: $157.29 Truth: $117.00 Error: $40.29 SLE: 0.09 Item: Bandai Awakening of S. H. s.h.figuarts s...\u001b[0m\n", + "\u001b[91m104: Guess: $303.85 Truth: $172.14 Error: $131.71 SLE: 0.32 Item: Fit System 62135G Passenger Side Towing ...\u001b[0m\n", + "\u001b[93m105: Guess: $279.38 Truth: $392.74 Error: $113.36 SLE: 0.12 Item: Black Horse Black Aluminum Exceed Runnin...\u001b[0m\n", + "\u001b[92m106: Guess: $50.35 Truth: $16.99 Error: $33.36 SLE: 1.10 Item: Dearsun Twinkle Star Color Night Light P...\u001b[0m\n", + "\u001b[91m107: Guess: $116.19 Truth: $1.34 Error: $114.85 SLE: 15.32 Item: Pokemon - Gallade Spirit Link (83/108) -...\u001b[0m\n", + "\u001b[92m108: Guess: $283.54 Truth: $349.98 Error: $66.44 SLE: 0.04 Item: Ibanez GA34STCE-NT GIO Series Classical ...\u001b[0m\n", + "\u001b[91m109: Guess: $179.12 Truth: $370.71 Error: $191.59 SLE: 0.52 Item: Set 2 Heavy Duty 12-16.5 12x16.5 12 Ply ...\u001b[0m\n", + "\u001b[91m110: Guess: $168.17 Truth: $65.88 Error: $102.29 SLE: 0.86 Item: Hairpin Table Legs 28\" Heavy Duty Hairpi...\u001b[0m\n", + "\u001b[92m111: Guess: $214.22 Truth: $229.99 Error: $15.77 SLE: 0.01 Item: Marada Racing Seat with Adjustable Slide...\u001b[0m\n", + "\u001b[91m112: 
Guess: $184.86 Truth: $9.14 Error: $175.72 SLE: 8.46 Item: Remington Industries 24UL1007STRWHI25 24...\u001b[0m\n", + "\u001b[91m113: Guess: $446.01 Truth: $199.00 Error: $247.01 SLE: 0.65 Item: Acer S3-391-6046 13.3-inch Ultrabook, In...\u001b[0m\n", + "\u001b[91m114: Guess: $217.84 Truth: $109.99 Error: $107.85 SLE: 0.46 Item: ICBEAMER 7\" RGB LED Headlights Bulb Halo...\u001b[0m\n", + "\u001b[91m115: Guess: $319.16 Truth: $570.42 Error: $251.26 SLE: 0.34 Item: R1 Concepts Front Rear Brakes and Rotors...\u001b[0m\n", + "\u001b[92m116: Guess: $238.37 Truth: $279.99 Error: $41.62 SLE: 0.03 Item: Camplux 2.64 GPM Tankless , Outdoor Port...\u001b[0m\n", + "\u001b[91m117: Guess: $135.15 Truth: $30.99 Error: $104.16 SLE: 2.10 Item: KNOKLOCK 10 Pack 3.75 Inch(96mm) Kitchen...\u001b[0m\n", + "\u001b[91m118: Guess: $205.67 Truth: $31.99 Error: $173.68 SLE: 3.37 Item: Valley Enterprises Yaesu USB FTDI CT-62 ...\u001b[0m\n", + "\u001b[91m119: Guess: $180.09 Truth: $15.90 Error: $164.19 SLE: 5.62 Item: G9 LED Light Bulbs,8W,75W 100W replaceme...\u001b[0m\n", + "\u001b[91m120: Guess: $135.07 Truth: $45.99 Error: $89.08 SLE: 1.13 Item: ZCHAOZ 4 Lights Antique White Farmhouse ...\u001b[0m\n", + "\u001b[93m121: Guess: $190.33 Truth: $113.52 Error: $76.81 SLE: 0.26 Item: Honeywell TH8320R1003 Honeywell VisionPr...\u001b[0m\n", + "\u001b[91m122: Guess: $283.89 Truth: $516.99 Error: $233.10 SLE: 0.36 Item: Patriot Exhaust H8013-1 1-7/8\" Clippster...\u001b[0m\n", + "\u001b[93m123: Guess: $254.18 Truth: $196.99 Error: $57.19 SLE: 0.06 Item: Fitrite Autopart New Front Left Driver S...\u001b[0m\n", + "\u001b[91m124: Guess: $168.77 Truth: $46.55 Error: $122.22 SLE: 1.62 Item: Technical Precision Replacement for GE G...\u001b[0m\n", + "\u001b[91m125: Guess: $182.05 Truth: $356.99 Error: $174.94 SLE: 0.45 Item: Covercraft Carhartt SeatSaver Front Row ...\u001b[0m\n", + "\u001b[91m126: Guess: $170.57 Truth: $319.95 Error: $149.38 SLE: 0.39 Item: Sennheiser SD Pro 2 (506008) - 
Double-Si...\u001b[0m\n", + "\u001b[91m127: Guess: $242.11 Truth: $96.06 Error: $146.05 SLE: 0.84 Item: Hitachi MAF0110 Mass Air Flow Sensor\u001b[0m\n", + "\u001b[93m128: Guess: $234.22 Truth: $190.99 Error: $43.23 SLE: 0.04 Item: AmScope SE305R-P-LED-PS36A 10X-30X LED C...\u001b[0m\n", + "\u001b[92m129: Guess: $217.96 Truth: $257.95 Error: $39.99 SLE: 0.03 Item: Front Left Driver Side Window Regulator ...\u001b[0m\n", + "\u001b[91m130: Guess: $143.89 Truth: $62.95 Error: $80.94 SLE: 0.67 Item: Premium Replica Hubcap Set, Fits Nissan ...\u001b[0m\n", + "\u001b[92m131: Guess: $81.00 Truth: $47.66 Error: $33.34 SLE: 0.27 Item: Excellerations Phonics Spelling Game for...\u001b[0m\n", + "\u001b[93m132: Guess: $310.50 Truth: $226.99 Error: $83.51 SLE: 0.10 Item: RC4WD BigDog Dual Axle Scale Car/Truck T...\u001b[0m\n", + "\u001b[93m133: Guess: $276.49 Truth: $359.95 Error: $83.46 SLE: 0.07 Item: Unknown Stage 2 Clutch Kit - Low Altitud...\u001b[0m\n", + "\u001b[91m134: Guess: $206.00 Truth: $78.40 Error: $127.60 SLE: 0.92 Item: 2002-2008 Dodge Ram 1500 Mopar 4X4 Emble...\u001b[0m\n", + "\u001b[93m135: Guess: $241.53 Truth: $172.77 Error: $68.76 SLE: 0.11 Item: Pro Comp Alloys Series 89 Wheel with Pol...\u001b[0m\n", + "\u001b[93m136: Guess: $211.23 Truth: $316.45 Error: $105.22 SLE: 0.16 Item: Detroit Axle - Front Rear Strut & Coil S...\u001b[0m\n", + "\u001b[91m137: Guess: $204.57 Truth: $87.99 Error: $116.58 SLE: 0.70 Item: ECCPP Rear Wheel Axle Replacement fit fo...\u001b[0m\n", + "\u001b[93m138: Guess: $286.85 Truth: $226.63 Error: $60.22 SLE: 0.06 Item: Dell Latitude E6520 Intel i7-2720QM 2.20...\u001b[0m\n", + "\u001b[91m139: Guess: $224.73 Truth: $31.49 Error: $193.24 SLE: 3.76 Item: F FIERCE CYCLE 251pcs Black Universal Mo...\u001b[0m\n", + "\u001b[92m140: Guess: $201.31 Truth: $196.00 Error: $5.31 SLE: 0.00 Item: Flash Furniture 4 Pk. 
HERCULES Series 88...\u001b[0m\n", + "\u001b[91m141: Guess: $217.56 Truth: $78.40 Error: $139.16 SLE: 1.03 Item: B&M 30287 Throttle Valve/Kickdown Cable,...\u001b[0m\n", + "\u001b[91m142: Guess: $259.01 Truth: $116.25 Error: $142.76 SLE: 0.63 Item: Gates TCK226 PowerGrip Premium Timing Be...\u001b[0m\n", + "\u001b[91m143: Guess: $244.45 Truth: $112.78 Error: $131.67 SLE: 0.59 Item: Monroe Shocks & Struts Quick-Strut 17149...\u001b[0m\n", + "\u001b[91m144: Guess: $157.73 Truth: $27.32 Error: $130.41 SLE: 2.97 Item: Feit Electric BPMR16/GU10/930CA/6 35W EQ...\u001b[0m\n", + "\u001b[92m145: Guess: $183.45 Truth: $145.91 Error: $37.54 SLE: 0.05 Item: Yellow Jacket 2806 Contractor Extension ...\u001b[0m\n", + "\u001b[92m146: Guess: $207.20 Truth: $171.09 Error: $36.11 SLE: 0.04 Item: Garage-Pro Tailgate SET Compatible with ...\u001b[0m\n", + "\u001b[93m147: Guess: $215.04 Truth: $167.95 Error: $47.09 SLE: 0.06 Item: 3M Perfect It Buffing and Polishing Kit ...\u001b[0m\n", + "\u001b[93m148: Guess: $97.34 Truth: $28.49 Error: $68.85 SLE: 1.45 Item: Chinese Style Dollhouse Model DIY Miniat...\u001b[0m\n", + "\u001b[93m149: Guess: $180.19 Truth: $122.23 Error: $57.96 SLE: 0.15 Item: Generic NRG Innovations SRK-161H Steerin...\u001b[0m\n", + "\u001b[91m150: Guess: $118.84 Truth: $32.99 Error: $85.85 SLE: 1.59 Item: Learning Resources Coding Critters Range...\u001b[0m\n", + "\u001b[91m151: Guess: $241.42 Truth: $71.20 Error: $170.22 SLE: 1.47 Item: Bosch Automotive 15463 Oxygen Sensor, OE...\u001b[0m\n", + "\u001b[92m152: Guess: $79.83 Truth: $112.75 Error: $32.92 SLE: 0.12 Item: Case of 24-2 Inch Blue Painters Tape - 6...\u001b[0m\n", + "\u001b[93m153: Guess: $190.48 Truth: $142.43 Error: $48.05 SLE: 0.08 Item: MOCA Engine Water Pump & Fan Clutch fit ...\u001b[0m\n", + "\u001b[91m154: Guess: $226.52 Truth: $398.99 Error: $172.47 SLE: 0.32 Item: SAREMAS Foot Step Bars for Hyundai Palis...\u001b[0m\n", + "\u001b[93m155: Guess: $291.36 Truth: $449.00 Error: $157.64 SLE: 0.19 
Item: Gretsch G9210 Square Neck Boxcar Mahogan...\u001b[0m\n", + "\u001b[91m156: Guess: $311.20 Truth: $189.00 Error: $122.20 SLE: 0.25 Item: NikoMaku Mirror Dash Cam Front and Rear ...\u001b[0m\n", + "\u001b[91m157: Guess: $276.11 Truth: $120.91 Error: $155.20 SLE: 0.67 Item: Fenix HP25R v2.0 USB-C Rechargeable Head...\u001b[0m\n", + "\u001b[91m158: Guess: $315.32 Truth: $203.53 Error: $111.79 SLE: 0.19 Item: R&L Racing Heavy Duty Roll-Up Soft Tonne...\u001b[0m\n", + "\u001b[93m159: Guess: $250.39 Truth: $349.99 Error: $99.60 SLE: 0.11 Item: Garmin 010-02258-10 GPSMAP 64sx, Handhel...\u001b[0m\n", + "\u001b[92m160: Guess: $66.07 Truth: $34.35 Error: $31.72 SLE: 0.41 Item: Brown 5-7/8\" X 8-1/2\" X 3/16\" Thick Heav...\u001b[0m\n", + "\u001b[91m161: Guess: $166.61 Truth: $384.99 Error: $218.38 SLE: 0.70 Item: GAOMON PD2200 Pen Display & 20 Pen Nibs ...\u001b[0m\n", + "\u001b[91m162: Guess: $318.54 Truth: $211.00 Error: $107.54 SLE: 0.17 Item: VXMOTOR for 97-03 Ford F150/F250 Lightdu...\u001b[0m\n", + "\u001b[91m163: Guess: $429.09 Truth: $129.00 Error: $300.09 SLE: 1.43 Item: HP EliteBook 2540p Intel Core i7-640LM X...\u001b[0m\n", + "\u001b[92m164: Guess: $138.66 Truth: $111.45 Error: $27.21 SLE: 0.05 Item: Green EPX Mixing Nozzles 100-Pack-fits 3...\u001b[0m\n", + "\u001b[93m165: Guess: $146.67 Truth: $81.12 Error: $65.55 SLE: 0.34 Item: Box Partners 6 1/4 x 3 1/8\" 13 Pt. 
Manil...\u001b[0m\n", + "\u001b[93m166: Guess: $285.83 Truth: $457.08 Error: $171.25 SLE: 0.22 Item: Vixen Air 1/2\" NPT Air Ride Suspension H...\u001b[0m\n", + "\u001b[91m167: Guess: $186.51 Truth: $49.49 Error: $137.02 SLE: 1.72 Item: Smart Floor Lamp, 2700-6500K+RGBPink Mul...\u001b[0m\n", + "\u001b[91m168: Guess: $185.89 Truth: $80.56 Error: $105.33 SLE: 0.69 Item: SOZG 324mm Wheelbase Body Shell RC Car B...\u001b[0m\n", + "\u001b[93m169: Guess: $200.21 Truth: $278.39 Error: $78.18 SLE: 0.11 Item: Mickey Thompson ET Street S/S Racing Rad...\u001b[0m\n", + "\u001b[91m170: Guess: $198.17 Truth: $364.50 Error: $166.33 SLE: 0.37 Item: Pirelli 275/40R20 106W XL RFT P0 PZ4-LUX...\u001b[0m\n", + "\u001b[93m171: Guess: $272.21 Truth: $378.99 Error: $106.78 SLE: 0.11 Item: Torklift C3212 Rear Tie Down\u001b[0m\n", + "\u001b[91m172: Guess: $250.48 Truth: $165.28 Error: $85.20 SLE: 0.17 Item: Cardone 78-4226 Remanufactured Ford Comp...\u001b[0m\n", + "\u001b[91m173: Guess: $176.93 Truth: $56.74 Error: $120.19 SLE: 1.27 Item: Kidde AccessPoint 001798 Supra TouchPoin...\u001b[0m\n", + "\u001b[91m174: Guess: $177.01 Truth: $307.95 Error: $130.94 SLE: 0.30 Item: 3M Protecta 3100414 Self Retracting Life...\u001b[0m\n", + "\u001b[91m175: Guess: $196.60 Truth: $38.00 Error: $158.60 SLE: 2.63 Item: Plantronics 89435-01 Wired Headset, Blac...\u001b[0m\n", + "\u001b[91m176: Guess: $137.09 Truth: $53.00 Error: $84.09 SLE: 0.88 Item: Logitech K750 Wireless Solar Keyboard fo...\u001b[0m\n", + "\u001b[93m177: Guess: $356.04 Truth: $498.00 Error: $141.96 SLE: 0.11 Item: Olympus PEN E-PL9 Body Only with 3-Inch ...\u001b[0m\n", + "\u001b[91m178: Guess: $196.95 Truth: $53.99 Error: $142.96 SLE: 1.64 Item: Beck/Arnley 051-6066 Hub & Bearing Assem...\u001b[0m\n", + "\u001b[93m179: Guess: $220.11 Truth: $350.00 Error: $129.89 SLE: 0.21 Item: Eibach Pro-Kit Performance Springs E10-6...\u001b[0m\n", + "\u001b[93m180: Guess: $207.31 Truth: $299.95 Error: $92.64 SLE: 0.14 Item: LEGO DC Batman 
1989 Batwing 76161 Displa...\u001b[0m\n", + "\u001b[91m181: Guess: $198.22 Truth: $94.93 Error: $103.29 SLE: 0.53 Item: Kingston Brass KS3608PL Restoration 4-In...\u001b[0m\n", + "\u001b[92m182: Guess: $322.76 Truth: $379.00 Error: $56.24 SLE: 0.03 Item: Polk Vanishing Series 265-LS In-Wall 3-W...\u001b[0m\n", + "\u001b[93m183: Guess: $218.40 Truth: $299.95 Error: $81.55 SLE: 0.10 Item: Spec-D Tuning LED Projector Headlights G...\u001b[0m\n", + "\u001b[92m184: Guess: $47.27 Truth: $24.99 Error: $22.28 SLE: 0.38 Item: RICHMOND & FINCH Airpod Pro Case, Green ...\u001b[0m\n", + "\u001b[91m185: Guess: $187.08 Truth: $41.04 Error: $146.04 SLE: 2.24 Item: LFA Industries 43B-5A-33JT 1/16-1/2-1.5-...\u001b[0m\n", + "\u001b[93m186: Guess: $201.16 Truth: $327.90 Error: $126.74 SLE: 0.24 Item: SAUTVS LED Headlight Assembly for Slings...\u001b[0m\n", + "\u001b[93m187: Guess: $73.01 Truth: $10.99 Error: $62.02 SLE: 3.31 Item: 2 Pack Combo Womens Safety Glasses Impac...\u001b[0m\n", + "\u001b[91m188: Guess: $143.08 Truth: $14.99 Error: $128.09 SLE: 4.83 Item: Arepa - Venezuelan cuisine - Venezuela P...\u001b[0m\n", + "\u001b[93m189: Guess: $148.89 Truth: $84.95 Error: $63.94 SLE: 0.31 Item: Schlage Lock Company KS23D2300 Padlock, ...\u001b[0m\n", + "\u001b[91m190: Guess: $221.71 Truth: $111.00 Error: $110.71 SLE: 0.47 Item: Techni Mobili White Sit to Stand Mobile ...\u001b[0m\n", + "\u001b[92m191: Guess: $89.23 Truth: $123.73 Error: $34.50 SLE: 0.10 Item: Special Lite Products Contemporary Wall ...\u001b[0m\n", + "\u001b[91m192: Guess: $231.98 Truth: $557.38 Error: $325.40 SLE: 0.76 Item: Tascam DP-24SD 24-Track Digital Portastu...\u001b[0m\n", + "\u001b[91m193: Guess: $223.09 Truth: $95.55 Error: $127.54 SLE: 0.71 Item: Glow Lighting 636CC10SP Vista Crystal Fl...\u001b[0m\n", + "\u001b[91m194: Guess: $286.77 Truth: $154.00 Error: $132.77 SLE: 0.38 Item: Z3 Wind Deflector, Smoke Tint, Lexan, Wi...\u001b[0m\n", + "\u001b[93m195: Guess: $263.81 Truth: $198.99 Error: $64.82 SLE: 
0.08 Item: Olympus E-20 5MP Digital Camera w/ 4x Op...\u001b[0m\n", + "\u001b[91m196: Guess: $214.16 Truth: $430.44 Error: $216.28 SLE: 0.48 Item: PHYNEDI 1:1000 World Trade Center (1973-...\u001b[0m\n", + "\u001b[93m197: Guess: $103.21 Truth: $45.67 Error: $57.54 SLE: 0.65 Item: YANGHUAN Unstable Unicorns Adventure Car...\u001b[0m\n", + "\u001b[92m198: Guess: $205.72 Truth: $249.00 Error: $43.28 SLE: 0.04 Item: Interlogix NX-1820E NetworX Touch Screen...\u001b[0m\n", + "\u001b[91m199: Guess: $189.09 Truth: $42.99 Error: $146.10 SLE: 2.14 Item: Steering Damper,Universal Motorcycle Han...\u001b[0m\n", + "\u001b[92m200: Guess: $193.65 Truth: $181.33 Error: $12.32 SLE: 0.00 Item: Amprobe TIC 410A Hot Stick Attachment\u001b[0m\n", + "\u001b[91m201: Guess: $143.67 Truth: $6.03 Error: $137.64 SLE: 9.15 Item: MyCableMart 3.5mm Plug/Jack, 4 Conductor...\u001b[0m\n", + "\u001b[91m202: Guess: $111.76 Truth: $29.99 Error: $81.77 SLE: 1.67 Item: OtterBox + Pop Symmetry Series Case for ...\u001b[0m\n", + "\u001b[91m203: Guess: $531.68 Truth: $899.00 Error: $367.32 SLE: 0.28 Item: Dell XPS X8700-1572BLK Desktop ( Intel C...\u001b[0m\n", + "\u001b[93m204: Guess: $274.56 Truth: $399.99 Error: $125.43 SLE: 0.14 Item: Franklin Iron Works Sperry Industrial Br...\u001b[0m\n", + "\u001b[91m205: Guess: $133.58 Truth: $4.66 Error: $128.92 SLE: 10.04 Item: Avery Legal Dividers, Standard Collated ...\u001b[0m\n", + "\u001b[92m206: Guess: $258.49 Truth: $261.41 Error: $2.92 SLE: 0.00 Item: Moen 8346 Commercial Posi-Temp Pressure ...\u001b[0m\n", + "\u001b[91m207: Guess: $247.15 Truth: $136.97 Error: $110.18 SLE: 0.34 Item: Carlisle Versa Trail ATR All Terrain Rad...\u001b[0m\n", + "\u001b[91m208: Guess: $203.62 Truth: $79.00 Error: $124.62 SLE: 0.88 Item: SUNWAYFOTO 44mm Tripod Ball Head Arca Co...\u001b[0m\n", + "\u001b[93m209: Guess: $301.04 Truth: $444.99 Error: $143.95 SLE: 0.15 Item: NanoBeam AC NBE-5AC-Gen2-US 4 Units 5GHz...\u001b[0m\n", + "\u001b[93m210: Guess: $272.39 Truth: 
$411.94 Error: $139.55 SLE: 0.17 Item: WULF 4\" Front 2\" Rear Leveling Lift Kit ...\u001b[0m\n", + "\u001b[92m211: Guess: $168.37 Truth: $148.40 Error: $19.97 SLE: 0.02 Item: Alera ALEVABFMC Valencia Series Mobile B...\u001b[0m\n", + "\u001b[91m212: Guess: $145.44 Truth: $244.99 Error: $99.55 SLE: 0.27 Item: YU-GI-OH! Ignition Assault Booster Box\u001b[0m\n", + "\u001b[91m213: Guess: $173.32 Truth: $86.50 Error: $86.82 SLE: 0.48 Item: 48\" x 36\" Extra-Large Framed Magnetic Bl...\u001b[0m\n", + "\u001b[92m214: Guess: $316.82 Truth: $297.95 Error: $18.87 SLE: 0.00 Item: Dell Latitude D620 Renewed Notebook PC\u001b[0m\n", + "\u001b[93m215: Guess: $534.76 Truth: $399.99 Error: $134.77 SLE: 0.08 Item: acer Aspire 5 Laptop, AMD Ryzen 3 5300U ...\u001b[0m\n", + "\u001b[91m216: Guess: $203.98 Truth: $599.00 Error: $395.02 SLE: 1.15 Item: Elk 31080/6RC-GRN 30 by 6-Inch Viva 6-Li...\u001b[0m\n", + "\u001b[91m217: Guess: $258.75 Truth: $105.99 Error: $152.76 SLE: 0.79 Item: Barbie Top Model Doll\u001b[0m\n", + "\u001b[91m218: Guess: $300.72 Truth: $689.00 Error: $388.28 SLE: 0.68 Item: Danby Designer 20-In. 
Electric Range wit...\u001b[0m\n", + "\u001b[93m219: Guess: $270.29 Truth: $404.99 Error: $134.70 SLE: 0.16 Item: FixtureDisplays® Metal Truss Podium Doub...\u001b[0m\n", + "\u001b[92m220: Guess: $225.78 Truth: $207.76 Error: $18.02 SLE: 0.01 Item: ACDelco 13597235 GM Original Equipment A...\u001b[0m\n", + "\u001b[93m221: Guess: $220.83 Truth: $171.82 Error: $49.01 SLE: 0.06 Item: EBC S1KF1135 Stage-1 Premium Street Brak...\u001b[0m\n", + "\u001b[92m222: Guess: $274.91 Truth: $293.24 Error: $18.33 SLE: 0.00 Item: FXR Men's Boost FX Jacket (Black/Orange/...\u001b[0m\n", + "\u001b[93m223: Guess: $252.93 Truth: $374.95 Error: $122.02 SLE: 0.15 Item: SuperATV Scratch Resistant 3-in-1 Flip W...\u001b[0m\n", + "\u001b[91m224: Guess: $202.49 Truth: $111.99 Error: $90.50 SLE: 0.35 Item: SBU 3 Layer All Weather Mini Van Car Cov...\u001b[0m\n", + "\u001b[92m225: Guess: $69.11 Truth: $42.99 Error: $26.12 SLE: 0.22 Item: 2 Pack Outdoor Brochure Holder Advertisi...\u001b[0m\n", + "\u001b[91m226: Guess: $205.10 Truth: $116.71 Error: $88.39 SLE: 0.31 Item: Monroe Shocks & Struts Quick-Strut 17158...\u001b[0m\n", + "\u001b[91m227: Guess: $260.83 Truth: $118.61 Error: $142.22 SLE: 0.61 Item: Elements of Design Magellan EB235AL Thre...\u001b[0m\n", + "\u001b[93m228: Guess: $225.50 Truth: $147.12 Error: $78.38 SLE: 0.18 Item: GM Genuine Parts 15-62961 Air Conditioni...\u001b[0m\n", + "\u001b[91m229: Guess: $257.43 Truth: $119.99 Error: $137.44 SLE: 0.58 Item: Baseus 17-in-1 USB C Docking Station to ...\u001b[0m\n", + "\u001b[91m230: Guess: $86.38 Truth: $369.98 Error: $283.60 SLE: 2.09 Item: Whitehall™ Personalized Whitehall Capito...\u001b[0m\n", + "\u001b[92m231: Guess: $301.46 Truth: $315.55 Error: $14.09 SLE: 0.00 Item: Pro Circuit Works Pipe PY05250 for 02-19...\u001b[0m\n", + "\u001b[91m232: Guess: $302.66 Truth: $190.99 Error: $111.67 SLE: 0.21 Item: HYANKA 15 \"1200W Professional DJ Speaker...\u001b[0m\n", + "\u001b[92m233: Guess: $190.35 Truth: $155.00 Error: $35.35 SLE: 
0.04 Item: Bluetooth X6BT Card Reader Writer Encode...\u001b[0m\n", + "\u001b[92m234: Guess: $328.06 Truth: $349.99 Error: $21.93 SLE: 0.00 Item: AIRAID Cold Air Intake System by K&N: In...\u001b[0m\n", + "\u001b[92m235: Guess: $250.34 Truth: $249.99 Error: $0.35 SLE: 0.00 Item: Bostingner Shower Faucets Sets Complete,...\u001b[0m\n", + "\u001b[91m236: Guess: $198.44 Truth: $42.99 Error: $155.45 SLE: 2.28 Item: PIT66 Front Bumper Turn Signal Lights, C...\u001b[0m\n", + "\u001b[93m237: Guess: $64.90 Truth: $17.99 Error: $46.91 SLE: 1.55 Item: Caseology Bumpy Compatible with Google P...\u001b[0m\n", + "\u001b[93m238: Guess: $265.44 Truth: $425.00 Error: $159.56 SLE: 0.22 Item: Fleck 2510 Timer Mechanical Filter Contr...\u001b[0m\n", + "\u001b[93m239: Guess: $318.67 Truth: $249.99 Error: $68.68 SLE: 0.06 Item: Haloview MC7108 Wireless RV Backup Camer...\u001b[0m\n", + "\u001b[93m240: Guess: $200.08 Truth: $138.23 Error: $61.85 SLE: 0.14 Item: Schmidt Spiele - Manhattan\u001b[0m\n", + "\u001b[93m241: Guess: $261.93 Truth: $414.99 Error: $153.06 SLE: 0.21 Item: Corsa 14333 Tip Kit (Ford Mustang GT)\u001b[0m\n", + "\u001b[93m242: Guess: $209.66 Truth: $168.28 Error: $41.38 SLE: 0.05 Item: Hoshizaki FM116A Fan Motor Kit 1\u001b[0m\n", + "\u001b[92m243: Guess: $171.20 Truth: $199.99 Error: $28.79 SLE: 0.02 Item: BAINUO Antler Chandelier Lighting,6 Ligh...\u001b[0m\n", + "\u001b[91m244: Guess: $226.91 Truth: $126.70 Error: $100.21 SLE: 0.34 Item: DNA MOTORING HL-OH-FEXP06-SM-AM Smoke Le...\u001b[0m\n", + "\u001b[91m245: Guess: $250.44 Truth: $5.91 Error: $244.53 SLE: 12.92 Item: Wera Stainless 3840/1 TS 2.5mm Hex Inser...\u001b[0m\n", + "\u001b[92m246: Guess: $213.52 Truth: $193.06 Error: $20.46 SLE: 0.01 Item: Celestron - PowerSeeker 127EQ Telescope ...\u001b[0m\n", + "\u001b[92m247: Guess: $261.82 Truth: $249.99 Error: $11.83 SLE: 0.00 Item: NHOPEEW 10.1inch Android Car Radio Carpl...\u001b[0m\n", + "\u001b[91m248: Guess: $223.29 Truth: $64.12 Error: $159.17 SLE: 1.53 
Item: Other Harmonica (Suzuki-2Timer24- A)\u001b[0m\n", + "\u001b[91m249: Guess: $267.11 Truth: $114.99 Error: $152.12 SLE: 0.70 Item: Harley Air Filter Venturi Intake Air Cle...\u001b[0m\n", + "\u001b[91m250: Guess: $329.56 Truth: $926.00 Error: $596.44 SLE: 1.06 Item: Elite Screens Edge Free Ambient Light Re...\u001b[0m\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "Tester.test(gb, test)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9f759bd2-7a7e-4c1a-80a0-e12470feca89", + "metadata": {}, + "outputs": [], + "source": [ + "product = \"Quadcast HyperX condenser mic for high quality audio for podcasting\"" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "e44dbd25-fb95-4b6b-bbbb-8da5fc817105", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "189.0\n", + "154.59\n", + "296.79030000000023\n", + "227.5062013552033\n" + ] + } + ], + "source": [ + "print(specialist.price(product))\n", + "print(frontier.price(product))\n", + "print(random_forest.price(product))\n", + "print(gradient_boosting.price(product))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1779b353-e2bb-4fc7-be7c-93057e4d688a", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 0%|▎ | 1/250 [00:04<17:04, 4.11s/it]INFO:backoff:Backing off send_request(...) for 0.1s (requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='us.i.posthog.com', port=443): Read timed out. 
(read timeout=15))\n", + "100%|████████████████████████████████████████████████████████████████████████████████| 250/250 [15:38<00:00, 3.75s/it]\n" + ] + } + ], + "source": [ + "specialists = []\n", + "frontiers = []\n", + "random_forests = []\n", + "gradient_boostings = []\n", + "prices = []\n", + "for item in tqdm(test[1000:1250]):\n", + " text = description(item)\n", + " specialists.append(specialist.price(text))\n", + " frontiers.append(frontier.price(text))\n", + " random_forests.append(random_forest.price(text))\n", + " gradient_boostings.append(gradient_boosting.price(text))\n", + " prices.append(item.price)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f0bca725-4e34-405b-8d90-41d67086a25d", + "metadata": {}, + "outputs": [], + "source": [ + "mins = [min(s,f,r,g) for s,f,r,g in zip(specialists, frontiers, random_forests, gradient_boostings)]\n", + "maxes = [max(s,f,r,g) for s,f,r,g in zip(specialists, frontiers, random_forests, gradient_boostings)]\n", + "\n", + "X = pd.DataFrame({\n", + " 'Specialist': specialists,\n", + " 'Frontier': frontiers,\n", + " 'RandomForest': random_forests,\n", + " 'GradientBoosting' : gradient_boostings,\n", + " 'Min': mins,\n", + " 'Max': maxes,\n", + "})\n", + "\n", + "# Convert y to a Series\n", + "y = pd.Series(prices)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "1be5be8a-3e7f-42a2-be54-0c7e380f7cc4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Specialist: 0.75\n", + "Frontier: 0.45\n", + "RandomForest: -0.02\n", + "GradientBoosting: 0.09\n", + "Min: -0.15\n", + "Max: -0.16\n", + "Intercept=24.08\n" + ] + } + ], + "source": [ + "# Train a Linear Regression\n", + "np.random.seed(42)\n", + "\n", + "lr = LinearRegression()\n", + "lr.fit(X, y)\n", + "\n", + "feature_columns = X.columns.tolist()\n", + "\n", + "for feature, coef in zip(feature_columns, lr.coef_):\n", + " print(f\"{feature}: {coef:.2f}\")\n", + 
"print(f\"Intercept={lr.intercept_:.2f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0bdf6e68-28a3-4ed2-b17e-de0ede923d34", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['ensemble_model.pkl']" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "joblib.dump(lr, 'ensemble_model.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "e762441a-9470-4dd7-8a8f-ec0430e908c7", + "metadata": {}, + "outputs": [], + "source": [ + "from agents.ensemble_agent import EnsembleAgent\n", + "ensemble = EnsembleAgent(collection)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "1a29f03c-8010-43b7-ae7d-1bc85ca6e8e2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "177.20276709193624" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ensemble.price(product)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e6a5e226-a508-43d5-aa42-cefbde72ffdf", + "metadata": {}, + "outputs": [], + "source": [ + "def ensemble_pricer(item):\n", + " return max(0,ensemble.price(description(item)))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "8397b1ef-2ea3-4af8-bb34-36594e0600cc", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[92m1: Guess: $414.77 Truth: $374.41 Error: $40.36 SLE: 0.01 Item: OEM AC Compressor w/A/C Repair Kit For F...\u001b[0m\n", + "\u001b[92m2: Guess: $210.12 Truth: $225.11 Error: $14.99 SLE: 0.00 Item: Motorcraft YB3125 Fan Clutch\u001b[0m\n", + "\u001b[92m3: Guess: $50.48 Truth: $61.68 Error: $11.20 SLE: 0.04 Item: Dorman 603-159 Front Washer Fluid Reserv...\u001b[0m\n", + "\u001b[93m4: Guess: $380.50 Truth: $599.99 Error: $219.49 SLE: 0.21 Item: HP Premium 17.3-inch HD Plus Touchscreen...\u001b[0m\n", + "\u001b[92m5: Guess: $25.65 Truth: $16.99 
Error: $8.66 SLE: 0.15 Item: 5-Position Super Switch Pickup Selector ...\u001b[0m\n", + "\u001b[92m6: Guess: $48.76 Truth: $31.99 Error: $16.77 SLE: 0.17 Item: Horror Bookmarks, Resin Horror Bookmarks...\u001b[0m\n", + "\u001b[92m7: Guess: $120.04 Truth: $101.79 Error: $18.25 SLE: 0.03 Item: SK6241 - Stinger 4 Gauge 6000 Series Pow...\u001b[0m\n", + "\u001b[92m8: Guess: $343.88 Truth: $289.00 Error: $54.88 SLE: 0.03 Item: Godox ML60Bi LED Light Kit, Handheld LED...\u001b[0m\n", + "\u001b[93m9: Guess: $867.46 Truth: $635.86 Error: $231.60 SLE: 0.10 Item: Randall RG75DG3PLUS G3 Plus 100-Watt Com...\u001b[0m\n", + "\u001b[92m10: Guess: $69.98 Truth: $65.99 Error: $3.99 SLE: 0.00 Item: HOLDWILL 6 Pack LED Shop Light, 4FT 24W ...\u001b[0m\n", + "\u001b[92m11: Guess: $246.95 Truth: $254.21 Error: $7.26 SLE: 0.00 Item: Viking Horns V103C/1005ATK 3 Gallon Air ...\u001b[0m\n", + "\u001b[92m12: Guess: $420.74 Truth: $412.99 Error: $7.75 SLE: 0.00 Item: CURT 70110 Custom Tow Bar Base Plate Bra...\u001b[0m\n", + "\u001b[92m13: Guess: $239.59 Truth: $205.50 Error: $34.09 SLE: 0.02 Item: 10-Pack Solar HAMMERED BRONZE Finish Pos...\u001b[0m\n", + "\u001b[92m14: Guess: $283.99 Truth: $248.23 Error: $35.76 SLE: 0.02 Item: COSTWAY Electric Tumble Dryer, Sliver\u001b[0m\n", + "\u001b[92m15: Guess: $329.95 Truth: $399.00 Error: $69.05 SLE: 0.04 Item: FREE SIGNAL TV Transit 32\" 12 Volt DC Po...\u001b[0m\n", + "\u001b[92m16: Guess: $370.01 Truth: $373.94 Error: $3.93 SLE: 0.00 Item: Bilstein 5100 Monotube Gas Shock Set com...\u001b[0m\n", + "\u001b[92m17: Guess: $124.73 Truth: $92.89 Error: $31.84 SLE: 0.09 Item: Sangean K-200 Multi-Function Upright AM/...\u001b[0m\n", + "\u001b[93m18: Guess: $105.55 Truth: $51.99 Error: $53.56 SLE: 0.49 Item: Charles Leonard Magnetic Lapboard Class ...\u001b[0m\n", + "\u001b[91m19: Guess: $287.70 Truth: $179.00 Error: $108.70 SLE: 0.22 Item: Gigabyte AMD Radeon HD 7870 2 GB GDDR5 D...\u001b[0m\n", + "\u001b[92m20: Guess: $37.60 Truth: $19.42 Error: 
$18.18 SLE: 0.41 Item: 3dRose LLC 8 x 8 x 0.25 Inches Bull Terr...\u001b[0m\n", + "\u001b[92m21: Guess: $488.72 Truth: $539.95 Error: $51.23 SLE: 0.01 Item: ROKINON 85mm F1.4 Auto Focus Full Frame ...\u001b[0m\n", + "\u001b[92m22: Guess: $153.75 Truth: $147.67 Error: $6.08 SLE: 0.00 Item: AUTOSAVER88 Headlight Assembly Compatibl...\u001b[0m\n", + "\u001b[92m23: Guess: $28.77 Truth: $24.99 Error: $3.78 SLE: 0.02 Item: ASI NAUTICAL 2.5 Inches Opera Glasses Bi...\u001b[0m\n", + "\u001b[93m24: Guess: $79.27 Truth: $149.00 Error: $69.73 SLE: 0.39 Item: Behringer TUBE OVERDRIVE TO100 Authentic...\u001b[0m\n", + "\u001b[92m25: Guess: $31.67 Truth: $16.99 Error: $14.68 SLE: 0.36 Item: Fun Express Insect Finger Puppets - 24 f...\u001b[0m\n", + "\u001b[92m26: Guess: $25.92 Truth: $7.99 Error: $17.93 SLE: 1.20 Item: WAFJAMF Roller Stamp Identity Theft Stam...\u001b[0m\n", + "\u001b[92m27: Guess: $204.56 Truth: $199.99 Error: $4.57 SLE: 0.00 Item: Capulina Tiffany Floor Lamp 2-Light 16\" ...\u001b[0m\n", + "\u001b[92m28: Guess: $296.82 Truth: $251.45 Error: $45.37 SLE: 0.03 Item: Apple Watch Series 6 (GPS, 44mm) - Space...\u001b[0m\n", + "\u001b[92m29: Guess: $253.13 Truth: $231.62 Error: $21.51 SLE: 0.01 Item: ICON 01725 Tandem Axle Fender Skirt FS17...\u001b[0m\n", + "\u001b[92m30: Guess: $155.98 Truth: $135.00 Error: $20.98 SLE: 0.02 Item: SanDisk 128GB Ultra (10 Pack) MicroSD Cl...\u001b[0m\n", + "\u001b[92m31: Guess: $407.45 Truth: $356.62 Error: $50.83 SLE: 0.02 Item: Velvac 2020,L,C/Hr,W,E2003,102\",Bk - 715...\u001b[0m\n", + "\u001b[92m32: Guess: $271.71 Truth: $257.99 Error: $13.72 SLE: 0.00 Item: TCMT Passenger Backrest Sissy Bar & Lugg...\u001b[0m\n", + "\u001b[92m33: Guess: $48.35 Truth: $27.99 Error: $20.36 SLE: 0.28 Item: Alnicov 63.5MM Brass Tremolo Block,Tremo...\u001b[0m\n", + "\u001b[93m34: Guess: $125.03 Truth: $171.20 Error: $46.17 SLE: 0.10 Item: Subaru Forester Outback Legacy OEM Engin...\u001b[0m\n", + "\u001b[91m35: Guess: $392.21 Truth: $225.00 Error: 
$167.21 SLE: 0.31 Item: Richmond Auto Upholstery - 2012 Dodge Ra...\u001b[0m\n", + "\u001b[91m36: Guess: $189.81 Truth: $105.00 Error: $84.81 SLE: 0.35 Item: AP-39 Automotive Paint Primer Grey 2K Ur...\u001b[0m\n", + "\u001b[92m37: Guess: $319.24 Truth: $299.99 Error: $19.25 SLE: 0.00 Item: Road Top Wireless Carplay Retrofit Kit D...\u001b[0m\n", + "\u001b[92m38: Guess: $632.83 Truth: $535.09 Error: $97.74 SLE: 0.03 Item: Gibson Performance Exhaust 5658 Aluminiz...\u001b[0m\n", + "\u001b[92m39: Guess: $32.90 Truth: $12.33 Error: $20.57 SLE: 0.87 Item: Bella Tunno Happy Links - Baby Montessor...\u001b[0m\n", + "\u001b[92m40: Guess: $120.19 Truth: $84.99 Error: $35.20 SLE: 0.12 Item: CANMORE H300 Handheld GPS Golf Device, S...\u001b[0m\n", + "\u001b[92m41: Guess: $28.26 Truth: $15.99 Error: $12.27 SLE: 0.30 Item: DCPOWER AC Adapter Compatible Replacemen...\u001b[0m\n", + "\u001b[92m42: Guess: $58.26 Truth: $62.44 Error: $4.18 SLE: 0.00 Item: Sharp, VX2128V, Commercial Desktop Calcu...\u001b[0m\n", + "\u001b[92m43: Guess: $94.21 Truth: $82.99 Error: $11.22 SLE: 0.02 Item: Melissa & Doug Lifelike Plush Stork Gian...\u001b[0m\n", + "\u001b[93m44: Guess: $365.69 Truth: $599.95 Error: $234.26 SLE: 0.24 Item: Sony SSCS8 2-Way 3-Driver Center Channel...\u001b[0m\n", + "\u001b[93m45: Guess: $251.62 Truth: $194.99 Error: $56.63 SLE: 0.06 Item: ASUS Chromebook CX1, 14\" Full HD NanoEdg...\u001b[0m\n", + "\u001b[92m46: Guess: $315.80 Truth: $344.95 Error: $29.15 SLE: 0.01 Item: FiiO X7 32GB Hi-Res Lossless Music Playe...\u001b[0m\n", + "\u001b[92m47: Guess: $61.31 Truth: $37.99 Error: $23.32 SLE: 0.22 Item: TORRO Leather Case Compatible with iPhon...\u001b[0m\n", + "\u001b[92m48: Guess: $230.81 Truth: $224.35 Error: $6.46 SLE: 0.00 Item: Universal Air Conditioner KT 1031 A/C Co...\u001b[0m\n", + "\u001b[92m49: Guess: $799.29 Truth: $814.00 Error: $14.71 SLE: 0.00 Item: Street Series Stainless Performance Cat-...\u001b[0m\n", + "\u001b[93m50: Guess: $317.66 Truth: $439.88 Error: 
$122.22 SLE: 0.11 Item: Lenovo IdeaPad 3 14-inch Laptop, 14.0-in...\u001b[0m\n", + "\u001b[92m51: Guess: $345.02 Truth: $341.43 Error: $3.59 SLE: 0.00 Item: Access Bed Covers TonnoSport 22050219 - ...\u001b[0m\n", + "\u001b[92m52: Guess: $52.11 Truth: $46.78 Error: $5.33 SLE: 0.01 Item: G.I. JOE Hasbro 3 3/4\" Wave 5 Action Fig...\u001b[0m\n", + "\u001b[93m53: Guess: $234.35 Truth: $171.44 Error: $62.91 SLE: 0.10 Item: T&S Brass B-0232-BST Double Pantry Fauce...\u001b[0m\n", + "\u001b[92m54: Guess: $480.97 Truth: $458.00 Error: $22.97 SLE: 0.00 Item: ZTUOAUMA Fuel Injection Pump 3090942 309...\u001b[0m\n", + "\u001b[93m55: Guess: $178.10 Truth: $130.75 Error: $47.35 SLE: 0.09 Item: 2AP18AA#ABA Hp Prime Graphing Calculator...\u001b[0m\n", + "\u001b[92m56: Guess: $56.86 Truth: $83.81 Error: $26.95 SLE: 0.15 Item: Lowrance 000-0119-83 Nmea 2000 25' Exten...\u001b[0m\n", + "\u001b[91m57: Guess: $167.67 Truth: $386.39 Error: $218.72 SLE: 0.69 Item: Jeep Genuine Accessories 82213051 Hood L...\u001b[0m\n", + "\u001b[92m58: Guess: $176.92 Truth: $169.00 Error: $7.92 SLE: 0.00 Item: GODOX CB-06 Hard Carrying Case with Whee...\u001b[0m\n", + "\u001b[92m59: Guess: $29.57 Truth: $17.95 Error: $11.62 SLE: 0.23 Item: Au-Tomotive Gold, INC. 
Ford Black Valet ...\u001b[0m\n", + "\u001b[92m60: Guess: $285.89 Truth: $269.00 Error: $16.89 SLE: 0.00 Item: Snailfly Black Roof Rack Rail + Cross Ba...\u001b[0m\n", + "\u001b[92m61: Guess: $105.87 Truth: $77.77 Error: $28.10 SLE: 0.09 Item: KING SHA Anti Glare LED Track Lighting H...\u001b[0m\n", + "\u001b[92m62: Guess: $102.21 Truth: $88.99 Error: $13.22 SLE: 0.02 Item: APS Compatible with Chevy Silverado 1500...\u001b[0m\n", + "\u001b[92m63: Guess: $333.99 Truth: $364.41 Error: $30.42 SLE: 0.01 Item: Wilwood Engineering 14011291R Brake Cali...\u001b[0m\n", + "\u001b[92m64: Guess: $160.69 Truth: $127.03 Error: $33.66 SLE: 0.05 Item: ACDelco Gold 336-1925A Starter, Remanufa...\u001b[0m\n", + "\u001b[92m65: Guess: $660.87 Truth: $778.95 Error: $118.08 SLE: 0.03 Item: UWS EC10783 69-Inch Matte Black Heavy-Wa...\u001b[0m\n", + "\u001b[92m66: Guess: $193.16 Truth: $206.66 Error: $13.50 SLE: 0.00 Item: Dell Latitude E5440 14in Business Laptop...\u001b[0m\n", + "\u001b[92m67: Guess: $48.29 Truth: $35.94 Error: $12.35 SLE: 0.08 Item: (Plug and Play) Spare Tire Brake Light W...\u001b[0m\n", + "\u001b[92m68: Guess: $169.10 Truth: $149.00 Error: $20.10 SLE: 0.02 Item: The Ultimate Roadside Rescue Assistant\u001b[0m\n", + "\u001b[92m69: Guess: $236.94 Truth: $251.98 Error: $15.04 SLE: 0.00 Item: Brand New 18\" x 8.5\" Replacement Wheel f...\u001b[0m\n", + "\u001b[93m70: Guess: $225.17 Truth: $160.00 Error: $65.17 SLE: 0.12 Item: Headlight Headlamp LH Left & RH Right Pa...\u001b[0m\n", + "\u001b[92m71: Guess: $71.69 Truth: $39.99 Error: $31.70 SLE: 0.33 Item: Lilo And Stitch Deluxe Oversize Print La...\u001b[0m\n", + "\u001b[93m72: Guess: $267.58 Truth: $362.41 Error: $94.83 SLE: 0.09 Item: AC Compressor & A/C Clutch For Hyundai A...\u001b[0m\n", + "\u001b[92m73: Guess: $409.01 Truth: $344.00 Error: $65.01 SLE: 0.03 Item: House Of Troy PIN475-AB Pinnacle Collect...\u001b[0m\n", + "\u001b[92m74: Guess: $36.40 Truth: $25.09 Error: $11.31 SLE: 0.13 Item: Juno T29 WH Floating 
Electrical Feed Sin...\u001b[0m\n", + "\u001b[92m75: Guess: $140.88 Truth: $175.95 Error: $35.07 SLE: 0.05 Item: Sherman GO-PARTS - for 2013-2016 Toyota ...\u001b[0m\n", + "\u001b[91m76: Guess: $335.75 Truth: $132.64 Error: $203.11 SLE: 0.85 Item: Roland RPU-3 Electronic Keyboard Pedal o...\u001b[0m\n", + "\u001b[93m77: Guess: $276.96 Truth: $422.99 Error: $146.03 SLE: 0.18 Item: Rockland VMI14 12,000 Pound 12 Volt DC E...\u001b[0m\n", + "\u001b[92m78: Guess: $165.91 Truth: $146.48 Error: $19.43 SLE: 0.02 Item: Max Advanced Brakes Elite XDS Front Cros...\u001b[0m\n", + "\u001b[92m79: Guess: $186.72 Truth: $156.83 Error: $29.89 SLE: 0.03 Item: Quality-Built 11030 Premium Quality Alte...\u001b[0m\n", + "\u001b[93m80: Guess: $167.44 Truth: $251.99 Error: $84.55 SLE: 0.17 Item: Lucida LG-510 Student Classical Guitar, ...\u001b[0m\n", + "\u001b[91m81: Guess: $342.92 Truth: $940.33 Error: $597.41 SLE: 1.01 Item: Longacre 52-79800 Aluminum Turn Plates\u001b[0m\n", + "\u001b[92m82: Guess: $70.32 Truth: $52.99 Error: $17.33 SLE: 0.08 Item: Motion Pro 08-0380 Adjustable Torque Wre...\u001b[0m\n", + "\u001b[93m83: Guess: $302.97 Truth: $219.95 Error: $83.02 SLE: 0.10 Item: Glyph Thunderbolt 3 NVMe Dock (0 GB)\u001b[0m\n", + "\u001b[92m84: Guess: $451.60 Truth: $441.03 Error: $10.57 SLE: 0.00 Item: TOYO Open Country MT Performance Radial ...\u001b[0m\n", + "\u001b[92m85: Guess: $181.95 Truth: $168.98 Error: $12.97 SLE: 0.01 Item: Razer Seiren X USB Streaming Microphone ...\u001b[0m\n", + "\u001b[92m86: Guess: $27.18 Truth: $2.49 Error: $24.69 SLE: 4.36 Item: Happy Birthday to Dad From Your Daughter...\u001b[0m\n", + "\u001b[92m87: Guess: $94.46 Truth: $98.62 Error: $4.16 SLE: 0.00 Item: Little Tikes My Real Jam First Concert S...\u001b[0m\n", + "\u001b[92m88: Guess: $216.10 Truth: $256.95 Error: $40.85 SLE: 0.03 Item: Studio M Peace and Harmony Art Pole Comm...\u001b[0m\n", + "\u001b[92m89: Guess: $34.01 Truth: $30.99 Error: $3.02 SLE: 0.01 Item: MyVolts 12V Power Supply 
Adaptor Compati...\u001b[0m\n", + "\u001b[93m90: Guess: $767.40 Truth: $569.84 Error: $197.56 SLE: 0.09 Item: Dell Latitude 7212 Rugged Extreme Tablet...\u001b[0m\n", + "\u001b[92m91: Guess: $201.17 Truth: $177.99 Error: $23.18 SLE: 0.01 Item: Covermates Contour Fit Car Cover - Light...\u001b[0m\n", + "\u001b[92m92: Guess: $972.55 Truth: $997.99 Error: $25.44 SLE: 0.00 Item: Westin 57-4025 Black HDX Grille Guard fi...\u001b[0m\n", + "\u001b[92m93: Guess: $201.93 Truth: $219.00 Error: $17.07 SLE: 0.01 Item: Fieldpiece JL2 Job Link Wireless App Tra...\u001b[0m\n", + "\u001b[92m94: Guess: $250.76 Truth: $225.55 Error: $25.21 SLE: 0.01 Item: hansgrohe Talis S Modern Premium Easy Cl...\u001b[0m\n", + "\u001b[91m95: Guess: $775.41 Truth: $495.95 Error: $279.46 SLE: 0.20 Item: G-Technology G-SPEED eS PRO High-Perform...\u001b[0m\n", + "\u001b[92m96: Guess: $924.45 Truth: $942.37 Error: $17.92 SLE: 0.00 Item: DreamLine SHDR-1960723L-01 Shower Door, ...\u001b[0m\n", + "\u001b[92m97: Guess: $27.71 Truth: $1.94 Error: $25.77 SLE: 5.19 Item: Sanctuary Square Backplate Finish: Oiled...\u001b[0m\n", + "\u001b[92m98: Guess: $273.73 Truth: $284.34 Error: $10.61 SLE: 0.00 Item: Pelican Protector 1750 Long Case - Multi...\u001b[0m\n", + "\u001b[92m99: Guess: $194.77 Truth: $171.90 Error: $22.87 SLE: 0.02 Item: Brock Replacement Driver and Passenger H...\u001b[0m\n", + "\u001b[93m100: Guess: $222.34 Truth: $144.99 Error: $77.35 SLE: 0.18 Item: Carlinkit Ai Box Mini, Android 11, Multi...\u001b[0m\n", + "\u001b[92m101: Guess: $406.35 Truth: $470.47 Error: $64.12 SLE: 0.02 Item: StarDot NetCamLIVE2 YouTube Live Stream ...\u001b[0m\n", + "\u001b[92m102: Guess: $79.62 Truth: $66.95 Error: $12.67 SLE: 0.03 Item: Atomic Compatible FILXXCAR0016 16x25x5 M...\u001b[0m\n", + "\u001b[92m103: Guess: $126.27 Truth: $117.00 Error: $9.27 SLE: 0.01 Item: Bandai Awakening of S. H. 
s.h.figuarts s...\u001b[0m\n", + "\u001b[91m104: Guess: $271.20 Truth: $172.14 Error: $99.06 SLE: 0.20 Item: Fit System 62135G Passenger Side Towing ...\u001b[0m\n", + "\u001b[92m105: Guess: $359.25 Truth: $392.74 Error: $33.49 SLE: 0.01 Item: Black Horse Black Aluminum Exceed Runnin...\u001b[0m\n", + "\u001b[92m106: Guess: $50.62 Truth: $16.99 Error: $33.63 SLE: 1.11 Item: Dearsun Twinkle Star Color Night Light P...\u001b[0m\n", + "\u001b[92m107: Guess: $23.97 Truth: $1.34 Error: $22.63 SLE: 5.61 Item: Pokemon - Gallade Spirit Link (83/108) -...\u001b[0m\n", + "\u001b[93m108: Guess: $254.32 Truth: $349.98 Error: $95.66 SLE: 0.10 Item: Ibanez GA34STCE-NT GIO Series Classical ...\u001b[0m\n", + "\u001b[92m109: Guess: $414.40 Truth: $370.71 Error: $43.69 SLE: 0.01 Item: Set 2 Heavy Duty 12-16.5 12x16.5 12 Ply ...\u001b[0m\n", + "\u001b[92m110: Guess: $73.27 Truth: $65.88 Error: $7.39 SLE: 0.01 Item: Hairpin Table Legs 28\" Heavy Duty Hairpi...\u001b[0m\n", + "\u001b[93m111: Guess: $280.92 Truth: $229.99 Error: $50.93 SLE: 0.04 Item: Marada Racing Seat with Adjustable Slide...\u001b[0m\n", + "\u001b[92m112: Guess: $25.05 Truth: $9.14 Error: $15.91 SLE: 0.89 Item: Remington Industries 24UL1007STRWHI25 24...\u001b[0m\n", + "\u001b[91m113: Guess: $377.12 Truth: $199.00 Error: $178.12 SLE: 0.41 Item: Acer S3-391-6046 13.3-inch Ultrabook, In...\u001b[0m\n", + "\u001b[91m114: Guess: $195.37 Truth: $109.99 Error: $85.38 SLE: 0.33 Item: ICBEAMER 7\" RGB LED Headlights Bulb Halo...\u001b[0m\n", + "\u001b[93m115: Guess: $395.30 Truth: $570.42 Error: $175.12 SLE: 0.13 Item: R1 Concepts Front Rear Brakes and Rotors...\u001b[0m\n", + "\u001b[92m116: Guess: $253.52 Truth: $279.99 Error: $26.47 SLE: 0.01 Item: Camplux 2.64 GPM Tankless , Outdoor Port...\u001b[0m\n", + "\u001b[92m117: Guess: $46.52 Truth: $30.99 Error: $15.53 SLE: 0.16 Item: KNOKLOCK 10 Pack 3.75 Inch(96mm) Kitchen...\u001b[0m\n", + "\u001b[92m118: Guess: $40.11 Truth: $31.99 Error: $8.12 SLE: 0.05 Item: Valley 
Enterprises Yaesu USB FTDI CT-62 ...\u001b[0m\n", + "\u001b[93m119: Guess: $62.65 Truth: $15.90 Error: $46.75 SLE: 1.76 Item: G9 LED Light Bulbs,8W,75W 100W replaceme...\u001b[0m\n", + "\u001b[93m120: Guess: $106.37 Truth: $45.99 Error: $60.38 SLE: 0.68 Item: ZCHAOZ 4 Lights Antique White Farmhouse ...\u001b[0m\n", + "\u001b[91m121: Guess: $234.32 Truth: $113.52 Error: $120.80 SLE: 0.52 Item: Honeywell TH8320R1003 Honeywell VisionPr...\u001b[0m\n", + "\u001b[92m122: Guess: $501.78 Truth: $516.99 Error: $15.21 SLE: 0.00 Item: Patriot Exhaust H8013-1 1-7/8\" Clippster...\u001b[0m\n", + "\u001b[93m123: Guess: $140.05 Truth: $196.99 Error: $56.94 SLE: 0.11 Item: Fitrite Autopart New Front Left Driver S...\u001b[0m\n", + "\u001b[93m124: Guess: $92.57 Truth: $46.55 Error: $46.02 SLE: 0.46 Item: Technical Precision Replacement for GE G...\u001b[0m\n", + "\u001b[93m125: Guess: $278.54 Truth: $356.99 Error: $78.45 SLE: 0.06 Item: Covercraft Carhartt SeatSaver Front Row ...\u001b[0m\n", + "\u001b[92m126: Guess: $279.54 Truth: $319.95 Error: $40.41 SLE: 0.02 Item: Sennheiser SD Pro 2 (506008) - Double-Si...\u001b[0m\n", + "\u001b[93m127: Guess: $143.18 Truth: $96.06 Error: $47.12 SLE: 0.16 Item: Hitachi MAF0110 Mass Air Flow Sensor\u001b[0m\n", + "\u001b[93m128: Guess: $239.09 Truth: $190.99 Error: $48.10 SLE: 0.05 Item: AmScope SE305R-P-LED-PS36A 10X-30X LED C...\u001b[0m\n", + "\u001b[93m129: Guess: $161.55 Truth: $257.95 Error: $96.40 SLE: 0.22 Item: Front Left Driver Side Window Regulator ...\u001b[0m\n", + "\u001b[92m130: Guess: $84.35 Truth: $62.95 Error: $21.40 SLE: 0.08 Item: Premium Replica Hubcap Set, Fits Nissan ...\u001b[0m\n", + "\u001b[93m131: Guess: $92.57 Truth: $47.66 Error: $44.91 SLE: 0.43 Item: Excellerations Phonics Spelling Game for...\u001b[0m\n", + "\u001b[92m132: Guess: $218.86 Truth: $226.99 Error: $8.13 SLE: 0.00 Item: RC4WD BigDog Dual Axle Scale Car/Truck T...\u001b[0m\n", + "\u001b[92m133: Guess: $291.86 Truth: $359.95 Error: $68.09 SLE: 0.04 
Item: Unknown Stage 2 Clutch Kit - Low Altitud...\u001b[0m\n", + "\u001b[92m134: Guess: $78.20 Truth: $78.40 Error: $0.20 SLE: 0.00 Item: 2002-2008 Dodge Ram 1500 Mopar 4X4 Emble...\u001b[0m\n", + "\u001b[92m135: Guess: $179.42 Truth: $172.77 Error: $6.65 SLE: 0.00 Item: Pro Comp Alloys Series 89 Wheel with Pol...\u001b[0m\n", + "\u001b[92m136: Guess: $305.83 Truth: $316.45 Error: $10.62 SLE: 0.00 Item: Detroit Axle - Front Rear Strut & Coil S...\u001b[0m\n", + "\u001b[92m137: Guess: $95.42 Truth: $87.99 Error: $7.43 SLE: 0.01 Item: ECCPP Rear Wheel Axle Replacement fit fo...\u001b[0m\n", + "\u001b[92m138: Guess: $221.32 Truth: $226.63 Error: $5.31 SLE: 0.00 Item: Dell Latitude E6520 Intel i7-2720QM 2.20...\u001b[0m\n", + "\u001b[92m139: Guess: $44.53 Truth: $31.49 Error: $13.04 SLE: 0.11 Item: F FIERCE CYCLE 251pcs Black Universal Mo...\u001b[0m\n", + "\u001b[93m140: Guess: $239.87 Truth: $196.00 Error: $43.87 SLE: 0.04 Item: Flash Furniture 4 Pk. HERCULES Series 88...\u001b[0m\n", + "\u001b[92m141: Guess: $51.09 Truth: $78.40 Error: $27.31 SLE: 0.18 Item: B&M 30287 Throttle Valve/Kickdown Cable,...\u001b[0m\n", + "\u001b[92m142: Guess: $125.85 Truth: $116.25 Error: $9.60 SLE: 0.01 Item: Gates TCK226 PowerGrip Premium Timing Be...\u001b[0m\n", + "\u001b[92m143: Guess: $146.88 Truth: $112.78 Error: $34.10 SLE: 0.07 Item: Monroe Shocks & Struts Quick-Strut 17149...\u001b[0m\n", + "\u001b[93m144: Guess: $82.47 Truth: $27.32 Error: $55.15 SLE: 1.17 Item: Feit Electric BPMR16/GU10/930CA/6 35W EQ...\u001b[0m\n", + "\u001b[92m145: Guess: $139.40 Truth: $145.91 Error: $6.51 SLE: 0.00 Item: Yellow Jacket 2806 Contractor Extension ...\u001b[0m\n", + "\u001b[92m146: Guess: $185.59 Truth: $171.09 Error: $14.50 SLE: 0.01 Item: Garage-Pro Tailgate SET Compatible with ...\u001b[0m\n", + "\u001b[93m147: Guess: $116.05 Truth: $167.95 Error: $51.90 SLE: 0.13 Item: 3M Perfect It Buffing and Polishing Kit ...\u001b[0m\n", + "\u001b[92m148: Guess: $56.66 Truth: $28.49 Error: $28.17 
SLE: 0.45 Item: Chinese Style Dollhouse Model DIY Miniat...\u001b[0m\n", + "\u001b[92m149: Guess: $148.08 Truth: $122.23 Error: $25.85 SLE: 0.04 Item: Generic NRG Innovations SRK-161H Steerin...\u001b[0m\n", + "\u001b[92m150: Guess: $58.72 Truth: $32.99 Error: $25.73 SLE: 0.32 Item: Learning Resources Coding Critters Range...\u001b[0m\n", + "\u001b[93m151: Guess: $114.50 Truth: $71.20 Error: $43.30 SLE: 0.22 Item: Bosch Automotive 15463 Oxygen Sensor, OE...\u001b[0m\n", + "\u001b[92m152: Guess: $90.17 Truth: $112.75 Error: $22.58 SLE: 0.05 Item: Case of 24-2 Inch Blue Painters Tape - 6...\u001b[0m\n", + "\u001b[92m153: Guess: $131.93 Truth: $142.43 Error: $10.50 SLE: 0.01 Item: MOCA Engine Water Pump & Fan Clutch fit ...\u001b[0m\n", + "\u001b[93m154: Guess: $304.84 Truth: $398.99 Error: $94.15 SLE: 0.07 Item: SAREMAS Foot Step Bars for Hyundai Palis...\u001b[0m\n", + "\u001b[93m155: Guess: $589.85 Truth: $449.00 Error: $140.85 SLE: 0.07 Item: Gretsch G9210 Square Neck Boxcar Mahogan...\u001b[0m\n", + "\u001b[92m156: Guess: $198.86 Truth: $189.00 Error: $9.86 SLE: 0.00 Item: NikoMaku Mirror Dash Cam Front and Rear ...\u001b[0m\n", + "\u001b[92m157: Guess: $112.44 Truth: $120.91 Error: $8.47 SLE: 0.01 Item: Fenix HP25R v2.0 USB-C Rechargeable Head...\u001b[0m\n", + "\u001b[92m158: Guess: $182.50 Truth: $203.53 Error: $21.03 SLE: 0.01 Item: R&L Racing Heavy Duty Roll-Up Soft Tonne...\u001b[0m\n", + "\u001b[92m159: Guess: $341.67 Truth: $349.99 Error: $8.32 SLE: 0.00 Item: Garmin 010-02258-10 GPSMAP 64sx, Handhel...\u001b[0m\n", + "\u001b[92m160: Guess: $29.90 Truth: $34.35 Error: $4.45 SLE: 0.02 Item: Brown 5-7/8\" X 8-1/2\" X 3/16\" Thick Heav...\u001b[0m\n", + "\u001b[92m161: Guess: $331.67 Truth: $384.99 Error: $53.32 SLE: 0.02 Item: GAOMON PD2200 Pen Display & 20 Pen Nibs ...\u001b[0m\n", + "\u001b[93m162: Guess: $262.90 Truth: $211.00 Error: $51.90 SLE: 0.05 Item: VXMOTOR for 97-03 Ford F150/F250 Lightdu...\u001b[0m\n", + "\u001b[91m163: Guess: $226.38 Truth: 
$129.00 Error: $97.38 SLE: 0.31 Item: HP EliteBook 2540p Intel Core i7-640LM X...\u001b[0m\n", + "\u001b[93m164: Guess: $38.95 Truth: $111.45 Error: $72.50 SLE: 1.07 Item: Green EPX Mixing Nozzles 100-Pack-fits 3...\u001b[0m\n", + "\u001b[92m165: Guess: $45.80 Truth: $81.12 Error: $35.32 SLE: 0.32 Item: Box Partners 6 1/4 x 3 1/8\" 13 Pt. Manil...\u001b[0m\n", + "\u001b[92m166: Guess: $437.99 Truth: $457.08 Error: $19.09 SLE: 0.00 Item: Vixen Air 1/2\" NPT Air Ride Suspension H...\u001b[0m\n", + "\u001b[92m167: Guess: $82.78 Truth: $49.49 Error: $33.29 SLE: 0.26 Item: Smart Floor Lamp, 2700-6500K+RGBPink Mul...\u001b[0m\n", + "\u001b[93m168: Guess: $122.46 Truth: $80.56 Error: $41.90 SLE: 0.17 Item: SOZG 324mm Wheelbase Body Shell RC Car B...\u001b[0m\n", + "\u001b[92m169: Guess: $301.87 Truth: $278.39 Error: $23.48 SLE: 0.01 Item: Mickey Thompson ET Street S/S Racing Rad...\u001b[0m\n", + "\u001b[92m170: Guess: $425.78 Truth: $364.50 Error: $61.28 SLE: 0.02 Item: Pirelli 275/40R20 106W XL RFT P0 PZ4-LUX...\u001b[0m\n", + "\u001b[93m171: Guess: $520.14 Truth: $378.99 Error: $141.15 SLE: 0.10 Item: Torklift C3212 Rear Tie Down\u001b[0m\n", + "\u001b[93m172: Guess: $220.74 Truth: $165.28 Error: $55.46 SLE: 0.08 Item: Cardone 78-4226 Remanufactured Ford Comp...\u001b[0m\n", + "\u001b[93m173: Guess: $98.70 Truth: $56.74 Error: $41.96 SLE: 0.30 Item: Kidde AccessPoint 001798 Supra TouchPoin...\u001b[0m\n", + "\u001b[93m174: Guess: $214.16 Truth: $307.95 Error: $93.79 SLE: 0.13 Item: 3M Protecta 3100414 Self Retracting Life...\u001b[0m\n", + "\u001b[93m175: Guess: $102.49 Truth: $38.00 Error: $64.49 SLE: 0.95 Item: Plantronics 89435-01 Wired Headset, Blac...\u001b[0m\n", + "\u001b[92m176: Guess: $91.64 Truth: $53.00 Error: $38.64 SLE: 0.29 Item: Logitech K750 Wireless Solar Keyboard fo...\u001b[0m\n", + "\u001b[92m177: Guess: $537.72 Truth: $498.00 Error: $39.72 SLE: 0.01 Item: Olympus PEN E-PL9 Body Only with 3-Inch ...\u001b[0m\n", + "\u001b[91m178: Guess: $134.82 
Truth: $53.99 Error: $80.83 SLE: 0.82 Item: Beck/Arnley 051-6066 Hub & Bearing Assem...\u001b[0m\n", + "\u001b[92m179: Guess: $363.60 Truth: $350.00 Error: $13.60 SLE: 0.00 Item: Eibach Pro-Kit Performance Springs E10-6...\u001b[0m\n", + "\u001b[92m180: Guess: $311.21 Truth: $299.95 Error: $11.26 SLE: 0.00 Item: LEGO DC Batman 1989 Batwing 76161 Displa...\u001b[0m\n", + "\u001b[92m181: Guess: $97.05 Truth: $94.93 Error: $2.12 SLE: 0.00 Item: Kingston Brass KS3608PL Restoration 4-In...\u001b[0m\n", + "\u001b[92m182: Guess: $321.44 Truth: $379.00 Error: $57.56 SLE: 0.03 Item: Polk Vanishing Series 265-LS In-Wall 3-W...\u001b[0m\n", + "\u001b[92m183: Guess: $252.24 Truth: $299.95 Error: $47.71 SLE: 0.03 Item: Spec-D Tuning LED Projector Headlights G...\u001b[0m\n", + "\u001b[92m184: Guess: $39.72 Truth: $24.99 Error: $14.73 SLE: 0.20 Item: RICHMOND & FINCH Airpod Pro Case, Green ...\u001b[0m\n", + "\u001b[91m185: Guess: $135.40 Truth: $41.04 Error: $94.36 SLE: 1.39 Item: LFA Industries 43B-5A-33JT 1/16-1/2-1.5-...\u001b[0m\n", + "\u001b[92m186: Guess: $298.91 Truth: $327.90 Error: $28.99 SLE: 0.01 Item: SAUTVS LED Headlight Assembly for Slings...\u001b[0m\n", + "\u001b[92m187: Guess: $36.90 Truth: $10.99 Error: $25.91 SLE: 1.32 Item: 2 Pack Combo Womens Safety Glasses Impac...\u001b[0m\n", + "\u001b[92m188: Guess: $23.51 Truth: $14.99 Error: $8.52 SLE: 0.18 Item: Arepa - Venezuelan cuisine - Venezuela P...\u001b[0m\n", + "\u001b[93m189: Guess: $43.32 Truth: $84.95 Error: $41.63 SLE: 0.44 Item: Schlage Lock Company KS23D2300 Padlock, ...\u001b[0m\n", + "\u001b[92m190: Guess: $144.93 Truth: $111.00 Error: $33.93 SLE: 0.07 Item: Techni Mobili White Sit to Stand Mobile ...\u001b[0m\n", + "\u001b[92m191: Guess: $159.17 Truth: $123.73 Error: $35.44 SLE: 0.06 Item: Special Lite Products Contemporary Wall ...\u001b[0m\n", + "\u001b[92m192: Guess: $519.71 Truth: $557.38 Error: $37.67 SLE: 0.00 Item: Tascam DP-24SD 24-Track Digital Portastu...\u001b[0m\n", + "\u001b[92m193: 
Guess: $117.26 Truth: $95.55 Error: $21.71 SLE: 0.04 Item: Glow Lighting 636CC10SP Vista Crystal Fl...\u001b[0m\n", + "\u001b[92m194: Guess: $164.33 Truth: $154.00 Error: $10.33 SLE: 0.00 Item: Z3 Wind Deflector, Smoke Tint, Lexan, Wi...\u001b[0m\n", + "\u001b[91m195: Guess: $333.36 Truth: $198.99 Error: $134.37 SLE: 0.26 Item: Olympus E-20 5MP Digital Camera w/ 4x Op...\u001b[0m\n", + "\u001b[91m196: Guess: $215.62 Truth: $430.44 Error: $214.82 SLE: 0.47 Item: PHYNEDI 1:1000 World Trade Center (1973-...\u001b[0m\n", + "\u001b[92m197: Guess: $46.01 Truth: $45.67 Error: $0.34 SLE: 0.00 Item: YANGHUAN Unstable Unicorns Adventure Car...\u001b[0m\n", + "\u001b[92m198: Guess: $268.04 Truth: $249.00 Error: $19.04 SLE: 0.01 Item: Interlogix NX-1820E NetworX Touch Screen...\u001b[0m\n", + "\u001b[92m199: Guess: $57.29 Truth: $42.99 Error: $14.30 SLE: 0.08 Item: Steering Damper,Universal Motorcycle Han...\u001b[0m\n", + "\u001b[93m200: Guess: $224.93 Truth: $181.33 Error: $43.60 SLE: 0.05 Item: Amprobe TIC 410A Hot Stick Attachment\u001b[0m\n", + "\u001b[92m201: Guess: $21.04 Truth: $6.03 Error: $15.01 SLE: 1.31 Item: MyCableMart 3.5mm Plug/Jack, 4 Conductor...\u001b[0m\n", + "\u001b[93m202: Guess: $73.50 Truth: $29.99 Error: $43.51 SLE: 0.77 Item: OtterBox + Pop Symmetry Series Case for ...\u001b[0m\n", + "\u001b[92m203: Guess: $768.64 Truth: $899.00 Error: $130.36 SLE: 0.02 Item: Dell XPS X8700-1572BLK Desktop ( Intel C...\u001b[0m\n", + "\u001b[93m204: Guess: $521.15 Truth: $399.99 Error: $121.16 SLE: 0.07 Item: Franklin Iron Works Sperry Industrial Br...\u001b[0m\n", + "\u001b[92m205: Guess: $22.80 Truth: $4.66 Error: $18.14 SLE: 2.06 Item: Avery Legal Dividers, Standard Collated ...\u001b[0m\n", + "\u001b[92m206: Guess: $250.76 Truth: $261.41 Error: $10.65 SLE: 0.00 Item: Moen 8346 Commercial Posi-Temp Pressure ...\u001b[0m\n", + "\u001b[92m207: Guess: $152.89 Truth: $136.97 Error: $15.92 SLE: 0.01 Item: Carlisle Versa Trail ATR All Terrain Rad...\u001b[0m\n", + 
"\u001b[92m208: Guess: $106.34 Truth: $79.00 Error: $27.34 SLE: 0.09 Item: SUNWAYFOTO 44mm Tripod Ball Head Arca Co...\u001b[0m\n", + "\u001b[92m209: Guess: $426.61 Truth: $444.99 Error: $18.38 SLE: 0.00 Item: NanoBeam AC NBE-5AC-Gen2-US 4 Units 5GHz...\u001b[0m\n", + "\u001b[92m210: Guess: $470.28 Truth: $411.94 Error: $58.34 SLE: 0.02 Item: WULF 4\" Front 2\" Rear Leveling Lift Kit ...\u001b[0m\n", + "\u001b[93m211: Guess: $204.70 Truth: $148.40 Error: $56.30 SLE: 0.10 Item: Alera ALEVABFMC Valencia Series Mobile B...\u001b[0m\n", + "\u001b[93m212: Guess: $167.36 Truth: $244.99 Error: $77.63 SLE: 0.14 Item: YU-GI-OH! Ignition Assault Booster Box\u001b[0m\n", + "\u001b[92m213: Guess: $114.17 Truth: $86.50 Error: $27.67 SLE: 0.08 Item: 48\" x 36\" Extra-Large Framed Magnetic Bl...\u001b[0m\n", + "\u001b[91m214: Guess: $131.30 Truth: $297.95 Error: $166.65 SLE: 0.66 Item: Dell Latitude D620 Renewed Notebook PC\u001b[0m\n", + "\u001b[92m215: Guess: $474.25 Truth: $399.99 Error: $74.26 SLE: 0.03 Item: acer Aspire 5 Laptop, AMD Ryzen 3 5300U ...\u001b[0m\n", + "\u001b[92m216: Guess: $616.57 Truth: $599.00 Error: $17.57 SLE: 0.00 Item: Elk 31080/6RC-GRN 30 by 6-Inch Viva 6-Li...\u001b[0m\n", + "\u001b[92m217: Guess: $71.30 Truth: $105.99 Error: $34.69 SLE: 0.15 Item: Barbie Top Model Doll\u001b[0m\n", + "\u001b[92m218: Guess: $568.89 Truth: $689.00 Error: $120.11 SLE: 0.04 Item: Danby Designer 20-In. 
Electric Range wit...\u001b[0m\n", + "\u001b[92m219: Guess: $340.73 Truth: $404.99 Error: $64.26 SLE: 0.03 Item: FixtureDisplays® Metal Truss Podium Doub...\u001b[0m\n", + "\u001b[92m220: Guess: $230.27 Truth: $207.76 Error: $22.51 SLE: 0.01 Item: ACDelco 13597235 GM Original Equipment A...\u001b[0m\n", + "\u001b[92m221: Guess: $190.44 Truth: $171.82 Error: $18.62 SLE: 0.01 Item: EBC S1KF1135 Stage-1 Premium Street Brak...\u001b[0m\n", + "\u001b[92m222: Guess: $293.87 Truth: $293.24 Error: $0.63 SLE: 0.00 Item: FXR Men's Boost FX Jacket (Black/Orange/...\u001b[0m\n", + "\u001b[92m223: Guess: $385.94 Truth: $374.95 Error: $10.99 SLE: 0.00 Item: SuperATV Scratch Resistant 3-in-1 Flip W...\u001b[0m\n", + "\u001b[92m224: Guess: $125.86 Truth: $111.99 Error: $13.87 SLE: 0.01 Item: SBU 3 Layer All Weather Mini Van Car Cov...\u001b[0m\n", + "\u001b[92m225: Guess: $66.44 Truth: $42.99 Error: $23.45 SLE: 0.18 Item: 2 Pack Outdoor Brochure Holder Advertisi...\u001b[0m\n", + "\u001b[92m226: Guess: $145.97 Truth: $116.71 Error: $29.26 SLE: 0.05 Item: Monroe Shocks & Struts Quick-Strut 17158...\u001b[0m\n", + "\u001b[91m227: Guess: $201.35 Truth: $118.61 Error: $82.74 SLE: 0.28 Item: Elements of Design Magellan EB235AL Thre...\u001b[0m\n", + "\u001b[92m228: Guess: $142.35 Truth: $147.12 Error: $4.77 SLE: 0.00 Item: GM Genuine Parts 15-62961 Air Conditioni...\u001b[0m\n", + "\u001b[93m229: Guess: $173.74 Truth: $119.99 Error: $53.75 SLE: 0.14 Item: Baseus 17-in-1 USB C Docking Station to ...\u001b[0m\n", + "\u001b[93m230: Guess: $490.03 Truth: $369.98 Error: $120.05 SLE: 0.08 Item: Whitehall™ Personalized Whitehall Capito...\u001b[0m\n", + "\u001b[92m231: Guess: $291.51 Truth: $315.55 Error: $24.04 SLE: 0.01 Item: Pro Circuit Works Pipe PY05250 for 02-19...\u001b[0m\n", + "\u001b[93m232: Guess: $261.90 Truth: $190.99 Error: $70.91 SLE: 0.10 Item: HYANKA 15 \"1200W Professional DJ Speaker...\u001b[0m\n", + "\u001b[92m233: Guess: $193.70 Truth: $155.00 Error: $38.70 SLE: 0.05 
Item: Bluetooth X6BT Card Reader Writer Encode...\u001b[0m\n", + "\u001b[92m234: Guess: $373.37 Truth: $349.99 Error: $23.38 SLE: 0.00 Item: AIRAID Cold Air Intake System by K&N: In...\u001b[0m\n", + "\u001b[92m235: Guess: $256.72 Truth: $249.99 Error: $6.73 SLE: 0.00 Item: Bostingner Shower Faucets Sets Complete,...\u001b[0m\n", + "\u001b[92m236: Guess: $37.83 Truth: $42.99 Error: $5.16 SLE: 0.02 Item: PIT66 Front Bumper Turn Signal Lights, C...\u001b[0m\n", + "\u001b[92m237: Guess: $39.64 Truth: $17.99 Error: $21.65 SLE: 0.58 Item: Caseology Bumpy Compatible with Google P...\u001b[0m\n", + "\u001b[92m238: Guess: $355.64 Truth: $425.00 Error: $69.36 SLE: 0.03 Item: Fleck 2510 Timer Mechanical Filter Contr...\u001b[0m\n", + "\u001b[93m239: Guess: $303.37 Truth: $249.99 Error: $53.38 SLE: 0.04 Item: Haloview MC7108 Wireless RV Backup Camer...\u001b[0m\n", + "\u001b[93m240: Guess: $61.21 Truth: $138.23 Error: $77.02 SLE: 0.65 Item: Schmidt Spiele - Manhattan\u001b[0m\n", + "\u001b[93m241: Guess: $520.03 Truth: $414.99 Error: $105.04 SLE: 0.05 Item: Corsa 14333 Tip Kit (Ford Mustang GT)\u001b[0m\n", + "\u001b[93m242: Guess: $223.31 Truth: $168.28 Error: $55.03 SLE: 0.08 Item: Hoshizaki FM116A Fan Motor Kit 1\u001b[0m\n", + "\u001b[92m243: Guess: $222.89 Truth: $199.99 Error: $22.90 SLE: 0.01 Item: BAINUO Antler Chandelier Lighting,6 Ligh...\u001b[0m\n", + "\u001b[92m244: Guess: $166.09 Truth: $126.70 Error: $39.39 SLE: 0.07 Item: DNA MOTORING HL-OH-FEXP06-SM-AM Smoke Le...\u001b[0m\n", + "\u001b[92m245: Guess: $23.59 Truth: $5.91 Error: $17.68 SLE: 1.61 Item: Wera Stainless 3840/1 TS 2.5mm Hex Inser...\u001b[0m\n", + "\u001b[92m246: Guess: $200.99 Truth: $193.06 Error: $7.93 SLE: 0.00 Item: Celestron - PowerSeeker 127EQ Telescope ...\u001b[0m\n", + "\u001b[93m247: Guess: $185.86 Truth: $249.99 Error: $64.13 SLE: 0.09 Item: NHOPEEW 10.1inch Android Car Radio Carpl...\u001b[0m\n", + "\u001b[92m248: Guess: $75.71 Truth: $64.12 Error: $11.59 SLE: 0.03 Item: Other 
Harmonica (Suzuki-2Timer24- A)\u001b[0m\n", + "\u001b[93m249: Guess: $194.92 Truth: $114.99 Error: $79.93 SLE: 0.27 Item: Harley Air Filter Venturi Intake Air Cle...\u001b[0m\n", + "\u001b[93m250: Guess: $646.25 Truth: $926.00 Error: $279.75 SLE: 0.13 Item: Elite Screens Edge Free Ambient Light Re...\u001b[0m\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "Tester.test(ensemble_pricer, test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cff07ba7-4557-4519-acba-15475118065d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Build_Scanning_Agent.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Build_Scanning_Agent.ipynb new file mode 100644 index 0000000..846a5e3 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Build_Scanning_Agent.ipynb @@ -0,0 +1,235 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0df0d850-49eb-4a0b-a27a-146969db710d", + "metadata": {}, + "source": [ + "# ScanningAgent\n", + "\n", + "Looks for promising deals by subscribing to RSS feeds." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3763a79-8a5a-4300-8de4-93e85475af10", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import json\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from agents.deals import ScrapedDeal, DealSelection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6469e32-16c3-4443-9475-ade710ef6933", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize and constants\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "MODEL = 'gpt-4o-mini'\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afece9db-8cd4-46be-ac57-0b472e84da7d", + "metadata": {}, + "outputs": [], + "source": [ + "deals = ScrapedDeal.fetch(show_progress=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8cd15c4d-eb44-4601-bf0c-f945c1d8e3ec", + "metadata": {}, + "outputs": [], + "source": [ + "len(deals)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4259f30a-6455-49ed-8863-2f9ddd4776cb", + "metadata": {}, + "outputs": [], + "source": [ + "deals[44].describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8100e5ac-38f5-40c1-a712-08ae12c85038", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"\"\"You identify and summarize the 5 most detailed deals from a list, by selecting deals that have the most detailed, high quality description and the most clear price.\n", + "Respond strictly in JSON with no explanation, using this format. You should provide the price as a number derived from the description. If the price of a deal isn't clear, do not include that deal in your response.\n", + "Most important is that you respond with the 5 deals that have the most detailed product description with price. 
It's not important to mention the terms of the deal; most important is a thorough description of the product.\n", + "Be careful with products that are described as \"$XXX off\" or \"reduced by $XXX\" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price. \n", + "\n", + "{\"deals\": [\n", + " {\n", + " \"product_description\": \"Your clearly expressed summary of the product in 4-5 sentences. Details of the item are much more important than why it's a good deal. Avoid mentioning discounts and coupons; focus on the item itself. There should be a paragraph of text for each item you choose.\",\n", + " \"price\": 99.99,\n", + " \"url\": \"the url as provided\"\n", + " },\n", + " ...\n", + "]}\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4bca170-af71-40c9-9597-1d72980c74d8", + "metadata": {}, + "outputs": [], + "source": [ + "user_prompt = \"\"\"Respond with the most promising 5 deals from this list, selecting those which have the most detailed, high quality product description and a clear price.\n", + "Respond strictly in JSON, and only JSON. You should rephrase the description to be a summary of the product itself, not the terms of the deal.\n", + "Remember to respond with a paragraph of text in the product_description field for each of the 5 items that you select.\n", + "Be careful with products that are described as \"$XXX off\" or \"reduced by $XXX\" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price.
\n", + "\n", + "Deals:\n", + "\n", + "\"\"\"\n", + "user_prompt += '\\n\\n'.join([deal.describe() for deal in deals])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "020947a6-561b-417b-98a0-a085e31d2ce3", + "metadata": {}, + "outputs": [], + "source": [ + "print(user_prompt[:2000])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7de46f74-868c-4127-8a68-cf2da7d600bb", + "metadata": {}, + "outputs": [], + "source": [ + "def get_recommendations():\n", + " completion = openai.beta.chat.completions.parse(\n", + " model=\"gpt-4o-mini\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ],\n", + " response_format=DealSelection\n", + " )\n", + " result = completion.choices[0].message.parsed\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c06270d-8c17-4d5a-9cfe-b6cefe788d5e", + "metadata": {}, + "outputs": [], + "source": [ + "result = get_recommendations()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84e62845-3338-441a-8161-c70097af4773", + "metadata": {}, + "outputs": [], + "source": [ + "len(result.deals)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5554a0a-ae40-4684-ad3e-faa3d22e030c", + "metadata": {}, + "outputs": [], + "source": [ + "result.deals[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8bdc57fb-7497-47af-a643-6ba5a21cc17e", + "metadata": {}, + "outputs": [], + "source": [ + "from agents.scanner_agent import ScannerAgent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "132278bc-217a-43a6-b6c4-724140c6a225", + "metadata": {}, + "outputs": [], + "source": [ + "agent = ScannerAgent()\n", + "result = agent.scan()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e1d013a-c930-4dad-901b-41433379e14b", + "metadata": {}, + "outputs": [], + "source": [ 
+ "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ee2e837-1f1d-42d4-8bc4-51cccc343006", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Create_Vector_Database.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Create_Vector_Database.ipynb new file mode 100644 index 0000000..2dcc68e --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Create_Vector_Database.ipynb @@ -0,0 +1,208 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "993a2a24-1a58-42be-8034-6d116fb8d786", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "import math\n", + "import json\n", + "from tqdm import tqdm\n", + "import random\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "import numpy as np\n", + "import pickle\n", + "from sentence_transformers import SentenceTransformer\n", + "from datasets import load_dataset\n", + "import chromadb\n", + "from items import Item\n", + "from sklearn.manifold import TSNE\n", + "import plotly.graph_objects as go" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2359ccc0-dbf2-4b1e-9473-e472b32f548b", + "metadata": {}, + "outputs": [], + "source": [ + "# environment\n", + "\n", + "load_dotenv(override=True)\n", + "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n", + "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 
'your-key-if-not-using-env')\n", + "DB = \"products_vectorstore\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "645167e6-cf0d-42d2-949f-1089a25a2841", + "metadata": {}, + "outputs": [], + "source": [ + "# Log in to HuggingFace\n", + "\n", + "hf_token = os.environ['HF_TOKEN']\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "688bd995-ec3e-43cd-8179-7fe14b275877", + "metadata": {}, + "outputs": [], + "source": [ + "# With train.pkl in this folder\n", + "with open('train.pkl', 'rb') as file:\n", + " train = pickle.load(file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4aab95e-d719-4476-b6e7-e248120df25a", + "metadata": {}, + "outputs": [], + "source": [ + "client = chromadb.PersistentClient(path=DB)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f95dafd-ab80-464e-ba8a-dec7a2424780", + "metadata": {}, + "outputs": [], + "source": [ + "# Check if the collection exists and delete it if it does\n", + "collection_name = \"products\"\n", + "existing_collection_names = [collection.name for collection in client.list_collections()]\n", + "if collection_name in existing_collection_names:\n", + " client.delete_collection(collection_name)\n", + " print(f\"Deleted existing collection: {collection_name}\")\n", + "\n", + "collection = client.create_collection(collection_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a87db200-d19d-44bf-acbd-15c45c70f5c9", + "metadata": {}, + "outputs": [], + "source": [ + "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b23a025-4c35-4d3a-96ad-b956cad37b0a", + "metadata": {}, + "outputs": [], + "source": [ + "# Pass in a list of texts, get back a numpy array of vectors\n", + "vector = model.encode([\"Well hi there\"])[0]" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "8adde63f-e732-4f7c-bba9-f8b2a469f14e", + "metadata": {}, + "outputs": [], + "source": [ + "vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38de1bf8-c9b5-45b4-9f4b-86af93b3f80d", + "metadata": {}, + "outputs": [], + "source": [ + "def description(item):\n", + " text = item.prompt.replace(\"How much does this cost to the nearest dollar?\\n\\n\", \"\")\n", + " return text.split(\"\\n\\nPrice is $\")[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c1205bd-4692-44ef-8ea4-69f255354537", + "metadata": {}, + "outputs": [], + "source": [ + "description(train[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c79e2fe-1f50-4ebf-9a93-34f3088f2996", + "metadata": {}, + "outputs": [], + "source": [ + "for i in tqdm(range(0, len(train), 1000)):\n", + " documents = [description(item) for item in train[i: i+1000]]\n", + " vectors = model.encode(documents).astype(float).tolist()\n", + " metadatas = [{\"category\": item.category, \"price\": item.price} for item in train[i: i+1000]]\n", + " ids = [f\"doc_{j}\" for j in range(i, i+1000)]\n", + " collection.add(\n", + " ids=ids,\n", + " documents=documents,\n", + " embeddings=vectors,\n", + " metadatas=metadatas\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a9395db-7bc9-47f9-902f-af8d380c9c09", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "745f73d9-f1a6-4e9f-96d9-1c38a1dd7559", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } 
+ }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Deploy_Specialist_Agent_to_Modal.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Deploy_Specialist_Agent_to_Modal.ipynb new file mode 100644 index 0000000..8e51070 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Deploy_Specialist_Agent_to_Modal.ipynb @@ -0,0 +1,104 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "bc0e1c1c-be6a-4395-bbbd-eeafc9330d7e", + "metadata": {}, + "outputs": [], + "source": [ + "# import modal\n", + "import modal" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d240622-8422-4c99-8464-c04d063e4cb6", + "metadata": {}, + "outputs": [], + "source": [ + "# !modal setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0050c070-146f-4c26-8045-5ff284761199", + "metadata": {}, + "outputs": [], + "source": [ + "import os" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebf35de4-ef8f-4e5b-8d4e-9a1771bfbe25", + "metadata": {}, + "outputs": [], + "source": [ + "os.environ['PYTHONIOENCODING'] = 'utf-8'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f90d857-2f12-4521-bb90-28efd917f7d1", + "metadata": {}, + "outputs": [], + "source": [ + "!modal deploy pricer_service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1dec70ff-1986-4405-8624-9bbbe0ce1f4a", + "metadata": {}, + "outputs": [], + "source": [ + "pricer = modal.Cls.from_name(\"pricer-service\", \"Pricer\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17776139-0d9e-4ad0-bcd0-82d3a92ca61f", + "metadata": {}, + "outputs": [], + "source": [ + "pricer().price.remote(\"Quadcast HyperX condenser mic, connects via usb-c to your computer for crystal clear audio\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "deb6cdf6-bcb0-49fb-8671-bb5eb22f02e3", + 
"metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/Visualize_vectors.ipynb b/week8/community_contributions/Ensemble_with_xgboost/Visualize_vectors.ipynb new file mode 100644 index 0000000..7721480 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/Visualize_vectors.ipynb @@ -0,0 +1,195 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "993a2a24-1a58-42be-8034-6d116fb8d786", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "import math\n", + "import json\n", + "from tqdm import tqdm\n", + "import random\n", + "from dotenv import load_dotenv\n", + "from huggingface_hub import login\n", + "import numpy as np\n", + "import pickle\n", + "from sentence_transformers import SentenceTransformer\n", + "from datasets import load_dataset\n", + "import chromadb\n", + "from items import Item\n", + "from sklearn.manifold import TSNE\n", + "import plotly.graph_objects as go" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cc1fe53-612f-4228-aa02-8758f4c2098f", + "metadata": {}, + "outputs": [], + "source": [ + "MAXIMUM_DATAPOINTS = 30_000" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4aab95e-d719-4476-b6e7-e248120df25a", + "metadata": {}, + "outputs": [], + "source": [ + "DB = \"products_vectorstore\"\n", + "client = chromadb.PersistentClient(path=DB)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"5f95dafd-ab80-464e-ba8a-dec7a2424780", + "metadata": {}, + "outputs": [], + "source": [ + "collection = client.get_or_create_collection('products')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "525fc313-8a16-4ac0-8c42-6a6d1ba1c9b8", + "metadata": {}, + "outputs": [], + "source": [ + "CATEGORIES = ['Appliances', 'Automotive', 'Cell_Phones_and_Accessories', 'Electronics','Musical_Instruments', 'Office_Products', 'Tools_and_Home_Improvement', 'Toys_and_Games']\n", + "COLORS = ['red', 'blue', 'brown', 'orange', 'yellow', 'green' , 'purple', 'cyan']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a4cf1c9a-1ced-48d4-974c-3c850905034e", + "metadata": {}, + "outputs": [], + "source": [ + "# Prework\n", + "result = collection.get(include=['embeddings', 'documents', 'metadatas'], limit=MAXIMUM_DATAPOINTS)\n", + "vectors = np.array(result['embeddings'])\n", + "documents = result['documents']\n", + "categories = [metadata['category'] for metadata in result['metadatas']]\n", + "colors = [COLORS[CATEGORIES.index(c)] for c in categories]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c54df150-c8d8-4bc3-8877-6759691eeb42", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try a 2D chart\n", + "tsne_2d = TSNE(n_components=2, random_state=42, n_jobs=-1)\n", + "reduced_vectors_2d = tsne_2d.fit_transform(vectors)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c93457ab-d895-4d9c-8e5c-1173e2089cfd", + "metadata": {}, + "outputs": [], + "source": [ + "# Let's try 3D!\n", + "tsne_3d = TSNE(n_components=3, random_state=42, n_jobs=-1)\n", + "reduced_vectors_3d = tsne_3d.fit_transform(vectors)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e8fb2a63-24c5-4dce-9e63-aa208272f82d", + "metadata": {}, + "outputs": [], + "source": [ + "# Create the 2D scatter plot\n", + "fig = go.Figure(data=[go.Scatter(\n", + " x=reduced_vectors_2d[:, 0],\n", + " 
y=reduced_vectors_2d[:, 1],\n", + " mode='markers',\n", + " marker=dict(size=3, color=colors, opacity=0.7),\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='2D Chroma Vector Store Visualization',\n", + " xaxis_title='x', yaxis_title='y',\n", + " width=1200,\n", + " height=800,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e4ae088-3d29-45d3-87a2-fea805fe2c65", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Create the 3D scatter plot\n", + "fig = go.Figure(data=[go.Scatter3d(\n", + " x=reduced_vectors_3d[:, 0],\n", + " y=reduced_vectors_3d[:, 1],\n", + " z=reduced_vectors_3d[:, 2],\n", + " mode='markers',\n", + " marker=dict(size=3, color=colors, opacity=0.7),\n", + ")])\n", + "\n", + "fig.update_layout(\n", + " title='3D Chroma Vector Store Visualization',\n", + " scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),\n", + " width=1200,\n", + " height=800,\n", + " margin=dict(r=20, b=10, l=10, t=40)\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a12d1e8-7da8-401d-8c8d-ba0098096ded", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/agent.py new file mode 100644 index 0000000..fe09e18 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/agent.py @@ -0,0 +1,33 @@ 
+import logging + +class Agent: + """ + An abstract superclass for Agents + Used to log messages in a way that can identify each Agent + """ + + # Foreground colors + RED = '\033[31m' + GREEN = '\033[32m' + YELLOW = '\033[33m' + BLUE = '\033[34m' + MAGENTA = '\033[35m' + CYAN = '\033[36m' + WHITE = '\033[37m' + + # Background color + BG_BLACK = '\033[40m' + + # Reset code to return to default color + RESET = '\033[0m' + + name: str = "" + color: str = '\033[37m' + + def log(self, message): + """ + Log this as an info message, identifying the agent + """ + color_code = self.BG_BLACK + self.color + message = f"[{self.name}] {message}" + logging.info(color_code + message + self.RESET) \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/deals.py b/week8/community_contributions/Ensemble_with_xgboost/agents/deals.py new file mode 100644 index 0000000..5fb8039 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/deals.py @@ -0,0 +1,109 @@ +from pydantic import BaseModel +from typing import List, Dict, Self +from bs4 import BeautifulSoup +import re +import feedparser +from tqdm import tqdm +import requests +import time + +feeds = [ + "https://www.dealnews.com/c142/Electronics/?rss=1", + "https://www.dealnews.com/c39/Computers/?rss=1", + "https://www.dealnews.com/c238/Automotive/?rss=1", + "https://www.dealnews.com/f1912/Smart-Home/?rss=1", + "https://www.dealnews.com/c196/Home-Garden/?rss=1", + ] + +def extract(html_snippet: str) -> str: + """ + Use Beautiful Soup to clean up this HTML snippet and extract useful text + """ + soup = BeautifulSoup(html_snippet, 'html.parser') + snippet_div = soup.find('div', class_='snippet summary') + + if snippet_div: + description = snippet_div.get_text(strip=True) + description = BeautifulSoup(description, 'html.parser').get_text() + description = re.sub('<[^<]+?>', '', description) + result = description.strip() + else: + result = html_snippet + return 
result.replace('\n', ' ') + +class ScrapedDeal: + """ + A class to represent a Deal retrieved from an RSS feed + """ + category: str + title: str + summary: str + url: str + details: str + features: str + + def __init__(self, entry: Dict[str, str]): + """ + Populate this instance based on the provided dict + """ + self.title = entry['title'] + self.summary = extract(entry['summary']) + self.url = entry['links'][0]['href'] + stuff = requests.get(self.url).content + soup = BeautifulSoup(stuff, 'html.parser') + content = soup.find('div', class_='content-section').get_text() + content = content.replace('\nmore', '').replace('\n', ' ') + if "Features" in content: + self.details, self.features = content.split("Features") + else: + self.details = content + self.features = "" + + def __repr__(self): + """ + Return a string to describe this deal + """ + return f"<{self.title}>" + + def describe(self): + """ + Return a longer string to describe this deal for use in calling a model + """ + return f"Title: {self.title}\nDetails: {self.details.strip()}\nFeatures: {self.features.strip()}\nURL: {self.url}" + + @classmethod + def fetch(cls, show_progress : bool = False) -> List[Self]: + """ + Retrieve all deals from the selected RSS feeds + """ + deals = [] + feed_iter = tqdm(feeds) if show_progress else feeds + for feed_url in feed_iter: + feed = feedparser.parse(feed_url) + for entry in feed.entries[:10]: + deals.append(cls(entry)) + time.sleep(0.5) + return deals + +class Deal(BaseModel): + """ + A class to Represent a Deal with a summary description + """ + product_description: str + price: float + url: str + +class DealSelection(BaseModel): + """ + A class to Represent a list of Deals + """ + deals: List[Deal] + +class Opportunity(BaseModel): + """ + A class to represent a possible opportunity: a Deal where we estimate + it should cost more than it's being offered + """ + deal: Deal + estimate: float + discount: float \ No newline at end of file diff --git 
a/week8/community_contributions/Ensemble_with_xgboost/agents/ensemble_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/ensemble_agent.py new file mode 100644 index 0000000..c46e523 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/ensemble_agent.py @@ -0,0 +1,52 @@ +import pandas as pd +from sklearn.linear_model import LinearRegression +import joblib + +from agents.agent import Agent +from agents.specialist_agent import SpecialistAgent +from agents.frontier_agent import FrontierAgent +from agents.random_forest_agent import RandomForestAgent +from agents.gradient_boosting_agent import GradientBoostingAgent + +class EnsembleAgent(Agent): + + name = "Ensemble Agent" + color = Agent.YELLOW + + def __init__(self, collection): + """ + Create an instance of Ensemble, by creating each of the models + And loading the weights of the Ensemble + """ + self.log("Initializing Ensemble Agent") + self.specialist = SpecialistAgent() + self.frontier = FrontierAgent(collection) + self.random_forest = RandomForestAgent() + self.gradient_boosting = GradientBoostingAgent() + self.model = joblib.load('ensemble_model.pkl') + self.log("Ensemble Agent is ready") + + def price(self, description: str) -> float: + """ + Run this ensemble model + Ask each of the models to price the product + Then use the Linear Regression model to return the weighted price + :param description: the description of a product + :return: an estimate of its price + """ + self.log("Running Ensemble Agent - collaborating with specialist, frontier, random forest and gradient boosting agents") + specialist = self.specialist.price(description) + frontier = self.frontier.price(description) + random_forest = self.random_forest.price(description) + gradient_boosting = self.gradient_boosting.price(description) + X = pd.DataFrame({ + 'Specialist': [specialist], + 'Frontier': [frontier], + 'RandomForest': [random_forest], + 'GradientBoosting': [gradient_boosting], + 'Min': [min(specialist, 
frontier, random_forest)], + 'Max': [max(specialist, frontier, random_forest)], + }) + y = max(0, self.model.predict(X)[0]) + self.log(f"Ensemble Agent complete - returning ${y:.2f}") + return y \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/frontier_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/frontier_agent.py new file mode 100644 index 0000000..590c9e8 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/frontier_agent.py @@ -0,0 +1,109 @@ +# imports + +import os +import re +import math +import json +from typing import List, Dict +import openai +from openai import OpenAI +from sentence_transformers import SentenceTransformer +from datasets import load_dataset +import chromadb +from items import Item +from testing import Tester +from agents.agent import Agent + + +class FrontierAgent(Agent): + + name = "Frontier Agent" + color = Agent.BLUE + + MODEL = "gpt-4o-mini" + + def __init__(self, collection): + """ + Set up this instance by connecting to OpenAI or DeepSeek, to the Chroma Datastore, + And setting up the vector encoding model + """ + self.log("Initializing Frontier Agent") + openai.api_key = os.getenv("OPENAI_API_KEY") + self.client = OpenAI() + self.MODEL = "gpt-4o-mini" + self.log("Frontier Agent is setting up with OpenAI") + self.collection = collection + self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2') + self.log("Frontier Agent is ready") + + def make_context(self, similars: List[str], prices: List[float]) -> str: + """ + Create context that can be inserted into the prompt + :param similars: similar products to the one being estimated + :param prices: prices of the similar products + :return: text to insert in the prompt that provides context + """ + message = "To provide some context, here are some other items that might be similar to the item you need to estimate.\n\n" + for similar, price in zip(similars, prices): + 
message += f"Potentially related product:\n{similar}\nPrice is ${price:.2f}\n\n" + return message + + def messages_for(self, description: str, similars: List[str], prices: List[float]) -> List[Dict[str, str]]: + """ + Create the message list to be included in a call to OpenAI + With the system and user prompt + :param description: a description of the product + :param similars: similar products to this one + :param prices: prices of similar products + :return: the list of messages in the format expected by OpenAI + """ + system_message = "You estimate prices of items. Reply only with the price, no explanation. Price is always below $1000." + user_prompt = self.make_context(similars, prices) + user_prompt += "And now the question for you:\n\n" + user_prompt += "How much does this cost?\n\n" + description + return [ + {"role": "system", "content": system_message}, + {"role": "user", "content": user_prompt}, + {"role": "assistant", "content": "Price is $"} + ] + + def find_similars(self, description: str): + """ + Return a list of items similar to the given one by looking in the Chroma datastore + """ + self.log("Frontier Agent is performing a RAG search of the Chroma datastore to find 5 similar products") + vector = self.model.encode([description]) + results = self.collection.query(query_embeddings=vector.astype(float).tolist(), n_results=5) + documents = results['documents'][0][:] + prices = [m['price'] for m in results['metadatas'][0][:]] + self.log("Frontier Agent has found similar products") + return documents, prices + + def get_price(self, s) -> float: + """ + A utility that plucks a floating point number out of a string + """ + s = s.replace('$','').replace(',','') + match = re.search(r"[-+]?\d*\.\d+|\d+", s) + return float(match.group()) if match else 0.0 + + def price(self, description: str) -> float: + """ + Make a call to OpenAI to estimate the price of the described product, + by looking up 5 similar products and including them in the prompt to give 
context + :param description: a description of the product + :return: an estimate of the price + """ + documents, prices = self.find_similars(description) + self.log(f"Frontier Agent is about to call {self.MODEL} with context including 5 similar products") + response = self.client.chat.completions.create( + model=self.MODEL, + messages=self.messages_for(description, documents, prices), + seed=42, + max_tokens=5 + ) + reply = response.choices[0].message.content + result = self.get_price(reply) + self.log(f"Frontier Agent completed - predicting ${result:.2f}") + return result + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/gradient_boosting_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/gradient_boosting_agent.py new file mode 100644 index 0000000..01274fb --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/gradient_boosting_agent.py @@ -0,0 +1,37 @@ +# imports + +import os +import re +from typing import List +from sentence_transformers import SentenceTransformer +import joblib +from agents.agent import Agent + + + +class GradientBoostingAgent(Agent): + + name = "Gradient Boosting Agent" + color = Agent.MAGENTA + + def __init__(self): + """ + Initialize this object by loading in the saved model weights + and the SentenceTransformer vector encoding model + """ + self.log("Gradient Boosting Agent is initializing") + self.vectorizer = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2') + self.model = joblib.load('gradient_boosting_model.pkl') + self.log("Gradient Boosting Agent is ready") + + def price(self, description: str) -> float: + """ + Use a Gradient Boosting model to estimate the price of the described item + :param description: the product to be estimated + :return: the price as a float + """ + self.log("Gradient Boosting Agent is starting a prediction") + vector = self.vectorizer.encode([description]) + result = max(0, self.model.predict(vector)[0]) + 
self.log(f"Gradient Boosting Agent completed - predicting ${result:.2f}") + return result \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/messaging_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/messaging_agent.py new file mode 100644 index 0000000..7494703 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/messaging_agent.py @@ -0,0 +1,79 @@ +import os +# from twilio.rest import Client +from agents.deals import Opportunity +import http.client +import urllib.parse +from agents.agent import Agent + +# Uncomment the Twilio lines if you wish to use Twilio + +DO_TEXT = False +DO_PUSH = True + +class MessagingAgent(Agent): + + name = "Messaging Agent" + color = Agent.WHITE + + def __init__(self): + """ + Set up this object to either do push notifications via Pushover, + or SMS via Twilio, + whichever is specified in the constants + """ + self.log("Messaging Agent is initializing") + if DO_TEXT: + account_sid = os.getenv('TWILIO_ACCOUNT_SID', 'your-sid-if-not-using-env') + auth_token = os.getenv('TWILIO_AUTH_TOKEN', 'your-auth-if-not-using-env') + self.me_from = os.getenv('TWILIO_FROM', 'your-phone-number-if-not-using-env') + self.me_to = os.getenv('MY_PHONE_NUMBER', 'your-phone-number-if-not-using-env') + # self.client = Client(account_sid, auth_token) + self.log("Messaging Agent has initialized Twilio") + if DO_PUSH: + self.pushover_user = os.getenv('PUSHOVER_USER', 'your-pushover-user-if-not-using-env') + self.pushover_token = os.getenv('PUSHOVER_TOKEN', 'your-pushover-token-if-not-using-env') + self.log("Messaging Agent has initialized Pushover") + + def message(self, text): + """ + Send an SMS message using the Twilio API + """ + 
self.log("Messaging Agent is sending a push notification") + conn = http.client.HTTPSConnection("api.pushover.net:443") + conn.request("POST", "/1/messages.json", + urllib.parse.urlencode({ + "token": self.pushover_token, + "user": self.pushover_user, + "message": text, + "sound": "cashregister" + }), { "Content-type": "application/x-www-form-urlencoded" }) + conn.getresponse() + + def alert(self, opportunity: Opportunity): + """ + Make an alert about the specified Opportunity + """ + text = f"Deal Alert! Price=${opportunity.deal.price:.2f}, " + text += f"Estimate=${opportunity.estimate:.2f}, " + text += f"Discount=${opportunity.discount:.2f} :" + text += opportunity.deal.product_description[:10]+'... ' + text += opportunity.deal.url + if DO_TEXT: + self.message(text) + if DO_PUSH: + self.push(text) + self.log("Messaging Agent has completed") + + + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/planning_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/planning_agent.py new file mode 100644 index 0000000..547536a --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/planning_agent.py @@ -0,0 +1,57 @@ +from typing import Optional, List +from agents.agent import Agent +from agents.deals import ScrapedDeal, DealSelection, Deal, Opportunity +from agents.scanner_agent import ScannerAgent +from agents.ensemble_agent import EnsembleAgent +from agents.messaging_agent import MessagingAgent + + +class PlanningAgent(Agent): + + name = "Planning Agent" + color = Agent.GREEN + DEAL_THRESHOLD = 50 + + def __init__(self, collection): + """ + Create instances of the 3 Agents that this planner coordinates across + """ + self.log("Planning Agent is initializing") + self.scanner = ScannerAgent() + self.ensemble = EnsembleAgent(collection) + self.messenger = MessagingAgent() + self.log("Planning Agent is ready") + + def run(self, deal: Deal) -> Opportunity: + """ + Run the workflow for a 
particular deal + :param deal: the deal, summarized from an RSS scrape + :returns: an opportunity including the discount + """ + self.log("Planning Agent is pricing up a potential deal") + estimate = self.ensemble.price(deal.product_description) + discount = estimate - deal.price + self.log(f"Planning Agent has processed a deal with discount ${discount:.2f}") + return Opportunity(deal=deal, estimate=estimate, discount=discount) + + def plan(self, memory: List[str] = []) -> Optional[Opportunity]: + """ + Run the full workflow: + 1. Use the ScannerAgent to find deals from RSS feeds + 2. Use the EnsembleAgent to estimate them + 3. Use the MessagingAgent to send a notification of deals + :param memory: a list of URLs that have been surfaced in the past + :return: an Opportunity if one was surfaced, otherwise None + """ + self.log("Planning Agent is kicking off a run") + selection = self.scanner.scan(memory=memory) + if selection: + opportunities = [self.run(deal) for deal in selection.deals[:5]] + opportunities.sort(key=lambda opp: opp.discount, reverse=True) + best = opportunities[0] + self.log(f"Planning Agent has identified the best deal has discount ${best.discount:.2f}") + if best.discount > self.DEAL_THRESHOLD: + self.messenger.alert(best) + self.log("Planning Agent has completed a run") + return best if best.discount > self.DEAL_THRESHOLD else None + return None \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/random_forest_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/random_forest_agent.py new file mode 100644 index 0000000..bfe9715 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/random_forest_agent.py @@ -0,0 +1,37 @@ +# imports + +import os +import re +from typing import List +from sentence_transformers import SentenceTransformer +import joblib +from agents.agent import Agent + + + +class RandomForestAgent(Agent): + + name = "Random Forest Agent" + color 
= Agent.MAGENTA + + def __init__(self): + """ + Initialize this object by loading in the saved model weights + and the SentenceTransformer vector encoding model + """ + self.log("Random Forest Agent is initializing") + self.vectorizer = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2') + self.model = joblib.load('random_forest_model.pkl') + self.log("Random Forest Agent is ready") + + def price(self, description: str) -> float: + """ + Use a Random Forest model to estimate the price of the described item + :param description: the product to be estimated + :return: the price as a float + """ + self.log("Random Forest Agent is starting a prediction") + vector = self.vectorizer.encode([description]) + result = max(0, self.model.predict(vector)[0]) + self.log(f"Random Forest Agent completed - predicting ${result:.2f}") + return result \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/scanner_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/scanner_agent.py new file mode 100644 index 0000000..8dc6674 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/scanner_agent.py @@ -0,0 +1,94 @@ +import os +import json +from typing import Optional, List +from openai import OpenAI +from agents.deals import ScrapedDeal, DealSelection +from agents.agent import Agent + + +class ScannerAgent(Agent): + + MODEL = "gpt-4o-mini" + + SYSTEM_PROMPT = """You identify and summarize the 5 most detailed deals from a list, by selecting deals that have the most detailed, high quality description and the most clear price. + Respond strictly in JSON with no explanation, using this format. You should provide the price as a number derived from the description. If the price of a deal isn't clear, do not include that deal in your response. + Most important is that you respond with the 5 deals that have the most detailed product description with price. 
It's not important to mention the terms of the deal; most important is a thorough description of the product. + Be careful with products that are described as "$XXX off" or "reduced by $XXX" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price. + + {"deals": [ + { + "product_description": "Your clearly expressed summary of the product in 4-5 sentences. Details of the item are much more important than why it's a good deal. Avoid mentioning discounts and coupons; focus on the item itself. There should be a paragraph of text for each item you choose.", + "price": 99.99, + "url": "the url as provided" + }, + ... + ]}""" + + USER_PROMPT_PREFIX = """Respond with the most promising 5 deals from this list, selecting those which have the most detailed, high quality product description and a clear price that is greater than 0. + Respond strictly in JSON, and only JSON. You should rephrase the description to be a summary of the product itself, not the terms of the deal. + Remember to respond with a paragraph of text in the product_description field for each of the 5 items that you select. + Be careful with products that are described as "$XXX off" or "reduced by $XXX" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price. + + Deals: + + """ + + USER_PROMPT_SUFFIX = "\n\nStrictly respond in JSON and include exactly 5 deals, no more." 
+ + name = "Scanner Agent" + color = Agent.CYAN + + def __init__(self): + """ + Set up this instance by initializing OpenAI + """ + self.log("Scanner Agent is initializing") + self.openai = OpenAI() + self.log("Scanner Agent is ready") + + def fetch_deals(self, memory) -> List[ScrapedDeal]: + """ + Look up deals published on RSS feeds + Return any new deals that are not already in the memory provided + """ + self.log("Scanner Agent is about to fetch deals from RSS feed") + urls = [opp.deal.url for opp in memory] + scraped = ScrapedDeal.fetch() + result = [scrape for scrape in scraped if scrape.url not in urls] + self.log(f"Scanner Agent received {len(result)} deals not already scraped") + return result + + def make_user_prompt(self, scraped) -> str: + """ + Create a user prompt for OpenAI based on the scraped deals provided + """ + user_prompt = self.USER_PROMPT_PREFIX + user_prompt += '\n\n'.join([scrape.describe() for scrape in scraped]) + user_prompt += self.USER_PROMPT_SUFFIX + return user_prompt + + def scan(self, memory: List[str]=[]) -> Optional[DealSelection]: + """ + Call OpenAI to provide a high potential list of deals with good descriptions and prices + Use StructuredOutputs to ensure it conforms to our specifications + :param memory: a list of URLs representing deals already raised + :return: a selection of good deals, or None if there aren't any + """ + scraped = self.fetch_deals(memory) + if scraped: + user_prompt = self.make_user_prompt(scraped) + self.log("Scanner Agent is calling OpenAI using Structured Output") + result = self.openai.beta.chat.completions.parse( + model=self.MODEL, + messages=[ + {"role": "system", "content": self.SYSTEM_PROMPT}, + {"role": "user", "content": user_prompt} + ], + response_format=DealSelection + ) + result = result.choices[0].message.parsed + result.deals = [deal for deal in result.deals if deal.price>0] + self.log(f"Scanner Agent received {len(result.deals)} selected deals with price>0 from OpenAI") + return result 
+ return None + diff --git a/week8/community_contributions/Ensemble_with_xgboost/agents/specialist_agent.py b/week8/community_contributions/Ensemble_with_xgboost/agents/specialist_agent.py new file mode 100644 index 0000000..1bab0d5 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/agents/specialist_agent.py @@ -0,0 +1,29 @@ +import modal +from agents.agent import Agent + + +class SpecialistAgent(Agent): + """ + An Agent that runs our fine-tuned LLM that's running remotely on Modal + """ + + name = "Specialist Agent" + color = Agent.RED + + def __init__(self): + """ + Set up this Agent by creating an instance of the modal class + """ + self.log("Specialist Agent is initializing - connecting to modal") + Pricer = modal.Cls.from_name("pricer-service", "Pricer") + self.pricer = Pricer() + self.log("Specialist Agent is ready") + + def price(self, description: str) -> float: + """ + Make a remote call to return the estimate of the price of this item + """ + self.log("Specialist Agent is calling remote fine-tuned model") + result = self.pricer.price.remote(description) + self.log(f"Specialist Agent completed - predicting ${result:.2f}") + return result diff --git a/week8/community_contributions/Ensemble_with_xgboost/deal_agent_framework.py b/week8/community_contributions/Ensemble_with_xgboost/deal_agent_framework.py new file mode 100644 index 0000000..9692107 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/deal_agent_framework.py @@ -0,0 +1,99 @@ +import os +import sys +import logging +import json +from typing import List, Optional +from twilio.rest import Client +from dotenv import load_dotenv +import chromadb +from agents.planning_agent import PlanningAgent +from agents.deals import Opportunity +from sklearn.manifold import TSNE +import numpy as np + + +# Colors for logging +BG_BLUE = '\033[44m' +WHITE = '\033[37m' +RESET = '\033[0m' + +# Colors for plot +CATEGORIES = ['Appliances', 'Automotive', 
'Cell_Phones_and_Accessories', 'Electronics','Musical_Instruments', 'Office_Products', 'Tools_and_Home_Improvement', 'Toys_and_Games'] +COLORS = ['red', 'blue', 'brown', 'orange', 'yellow', 'green' , 'purple', 'cyan'] + +def init_logging(): + root = logging.getLogger() + root.setLevel(logging.INFO) + + handler = logging.StreamHandler(sys.stdout) + handler.setLevel(logging.INFO) + formatter = logging.Formatter( + "[%(asctime)s] [Agents] [%(levelname)s] %(message)s", + datefmt="%Y-%m-%d %H:%M:%S %z", + ) + handler.setFormatter(formatter) + root.addHandler(handler) + +class DealAgentFramework: + + DB = "products_vectorstore" + MEMORY_FILENAME = "memory.json" + + def __init__(self): + init_logging() + load_dotenv() + client = chromadb.PersistentClient(path=self.DB) + self.memory = self.read_memory() + self.collection = client.get_or_create_collection('products') + self.planner = None + + def init_agents_as_needed(self): + if not self.planner: + self.log("Initializing Agent Framework") + self.planner = PlanningAgent(self.collection) + self.log("Agent Framework is ready") + + def read_memory(self) -> List[Opportunity]: + if os.path.exists(self.MEMORY_FILENAME): + with open(self.MEMORY_FILENAME, "r") as file: + data = json.load(file) + opportunities = [Opportunity(**item) for item in data] + return opportunities + return [] + + def write_memory(self) -> None: + data = [opportunity.dict() for opportunity in self.memory] + with open(self.MEMORY_FILENAME, "w") as file: + json.dump(data, file, indent=2) + + def log(self, message: str): + text = BG_BLUE + WHITE + "[Agent Framework] " + message + RESET + logging.info(text) + + def run(self) -> List[Opportunity]: + self.init_agents_as_needed() + logging.info("Kicking off Planning Agent") + result = self.planner.plan(memory=self.memory) + logging.info(f"Planning Agent has completed and returned: {result}") + if result: + self.memory.append(result) + self.write_memory() + return self.memory + + @classmethod + def 
get_plot_data(cls, max_datapoints=10000): + client = chromadb.PersistentClient(path=cls.DB) + collection = client.get_or_create_collection('products') + result = collection.get(include=['embeddings', 'documents', 'metadatas'], limit=max_datapoints) + vectors = np.array(result['embeddings']) + documents = result['documents'] + categories = [metadata['category'] for metadata in result['metadatas']] + colors = [COLORS[CATEGORIES.index(c)] for c in categories] + tsne = TSNE(n_components=3, random_state=42, n_jobs=-1) + reduced_vectors = tsne.fit_transform(vectors) + return documents, reduced_vectors, colors + + +if __name__=="__main__": + DealAgentFramework().run() + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/items.py b/week8/community_contributions/Ensemble_with_xgboost/items.py new file mode 100644 index 0000000..1acaf5d --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/items.py @@ -0,0 +1,101 @@ +from typing import Optional +from transformers import AutoTokenizer +import re + +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" +MIN_TOKENS = 150 +MAX_TOKENS = 160 +MIN_CHARS = 300 +CEILING_CHARS = MAX_TOKENS * 7 + +class Item: + """ + An Item is a cleaned, curated datapoint of a Product with a Price + """ + + tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True) + PREFIX = "Price is $" + QUESTION = "How much does this cost to the nearest dollar?" 
+ REMOVALS = ['"Batteries Included?": "No"', '"Batteries Included?": "Yes"', '"Batteries Required?": "No"', '"Batteries Required?": "Yes"', "By Manufacturer", "Item", "Date First", "Package", ":", "Number of", "Best Sellers", "Number", "Product "] + + title: str + price: float + category: str + token_count: int = 0 + details: Optional[str] + prompt: Optional[str] = None + include = False + + def __init__(self, data, price): + self.title = data['title'] + self.price = price + self.parse(data) + + def scrub_details(self): + """ + Clean up the details string by removing common text that doesn't add value + """ + details = self.details + for remove in self.REMOVALS: + details = details.replace(remove, "") + return details + + def scrub(self, stuff): + """ + Clean up the provided text by removing unnecessary characters and whitespace + Also remove words that are 7+ chars and contain numbers, as these are likely irrelevant product numbers + """ + stuff = re.sub(r'[:\[\]"{}【】\s]+', ' ', stuff).strip() + stuff = stuff.replace(" ,", ",").replace(",,,",",").replace(",,",",") + words = stuff.split(' ') + select = [word for word in words if len(word)<7 or not any(char.isdigit() for char in word)] + return " ".join(select) + + def parse(self, data): + """ + Parse this datapoint and if it fits within the allowed Token range, + then set include to True + """ + contents = '\n'.join(data['description']) + if contents: + contents += '\n' + features = '\n'.join(data['features']) + if features: + contents += features + '\n' + self.details = data['details'] + if self.details: + contents += self.scrub_details() + '\n' + if len(contents) > MIN_CHARS: + contents = contents[:CEILING_CHARS] + text = f"{self.scrub(self.title)}\n{self.scrub(contents)}" + tokens = self.tokenizer.encode(text, add_special_tokens=False) + if len(tokens) > MIN_TOKENS: + tokens = tokens[:MAX_TOKENS] + text = self.tokenizer.decode(tokens) + self.make_prompt(text) + self.include = True + + def make_prompt(self, 
text): + """ + Set the prompt instance variable to be a prompt appropriate for training + """ + self.prompt = f"{self.QUESTION}\n\n{text}\n\n" + self.prompt += f"{self.PREFIX}{str(round(self.price))}.00" + self.token_count = len(self.tokenizer.encode(self.prompt, add_special_tokens=False)) + + def test_prompt(self): + """ + Return a prompt suitable for testing, with the actual price removed + """ + return self.prompt.split(self.PREFIX)[0] + self.PREFIX + + def __repr__(self): + """ + Return a String version of this Item + """ + return f"<{self.title} = ${self.price}>" + + + + + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/log_utils.py b/week8/community_contributions/Ensemble_with_xgboost/log_utils.py new file mode 100644 index 0000000..8bc33fb --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/log_utils.py @@ -0,0 +1,35 @@ +# Foreground colors +RED = '\033[31m' +GREEN = '\033[32m' +YELLOW = '\033[33m' +BLUE = '\033[34m' +MAGENTA = '\033[35m' +CYAN = '\033[36m' +WHITE = '\033[37m' + +# Background color +BG_BLACK = '\033[40m' +BG_BLUE = '\033[44m' + +# Reset code to return to default color +RESET = '\033[0m' + +mapper = { + BG_BLACK+RED: "#dd0000", + BG_BLACK+GREEN: "#00dd00", + BG_BLACK+YELLOW: "#dddd00", + BG_BLACK+BLUE: "#0000ee", + BG_BLACK+MAGENTA: "#aa00dd", + BG_BLACK+CYAN: "#00dddd", + BG_BLACK+WHITE: "#87CEEB", + BG_BLUE+WHITE: "#ff7800" +} + + +def reformat(message): + for key, value in mapper.items(): + message = message.replace(key, f'<span style="color: {value}">') + message = message.replace(RESET, '</span>') + return message + + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/price_is_right.py b/week8/community_contributions/Ensemble_with_xgboost/price_is_right.py new file mode 100644 index 0000000..bc9b537 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/price_is_right.py @@ -0,0 +1,62 @@ +import gradio as gr +from deal_agent_framework import 
DealAgentFramework +from agents.deals import Opportunity, Deal + +class App: + + def __init__(self): + self.agent_framework = None + + def run(self): + with gr.Blocks(title="Deal Intel", fill_width=True) as ui: + + def table_for(opps): + return [[opp.deal.product_description, f"${opp.deal.price:.2f}", f"${opp.estimate:.2f}", f"${opp.discount:.2f}", opp.deal.url] for opp in opps] + + def start(): + self.agent_framework = DealAgentFramework() + self.agent_framework.init_agents_as_needed() + opportunities = self.agent_framework.memory + table = table_for(opportunities) + return table + + def go(): + self.agent_framework.run() + new_opportunities = self.agent_framework.memory + table = table_for(new_opportunities) + return table + + def do_select(selected_index: gr.SelectData): + opportunities = self.agent_framework.memory + row = selected_index.index[0] + opportunity = opportunities[row] + self.agent_framework.planner.messenger.alert(opportunity) + + with gr.Row(): + gr.Markdown('"Deal Intel" - Deal Hunting Agentic AI') + with gr.Row(): + gr.Markdown('Autonomous agent framework that finds online deals, collaborating with a proprietary fine-tuned LLM deployed on Modal, and a RAG pipeline with a frontier model and Chroma.') + with gr.Row(): + gr.Markdown('Deals surfaced so far:') + with gr.Row(): + opportunities_dataframe = gr.Dataframe( + headers=["Description", "Price", "Estimate", "Discount", "URL"], + wrap=True, + column_widths=[4, 1, 1, 1, 2], + row_count=10, + col_count=5, + max_height=400, + ) + + ui.load(start, inputs=[], outputs=[opportunities_dataframe]) + + timer = gr.Timer(value=60) + timer.tick(go, inputs=[], outputs=[opportunities_dataframe]) + + opportunities_dataframe.select(do_select) + + ui.launch(share=False, inbrowser=True) + +if __name__=="__main__": + App().run() + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/price_is_right_final.py b/week8/community_contributions/Ensemble_with_xgboost/price_is_right_final.py new file mode 100644 index 0000000..fb242f1 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/price_is_right_final.py @@ -0,0 +1,166 @@ +import logging +import queue +import threading +import time +import gradio as gr +from deal_agent_framework import DealAgentFramework +from agents.deals import Opportunity, Deal +from log_utils import reformat +import plotly.graph_objects as go + + +class QueueHandler(logging.Handler): + def __init__(self, log_queue): + super().__init__() + self.log_queue = log_queue + + def emit(self, record): + self.log_queue.put(self.format(record)) + +def html_for(log_data): + output = '<br>'.join(log_data[-18:]) + return f""" + 
+ {output} +
+ """ + +def setup_logging(log_queue): + handler = QueueHandler(log_queue) + formatter = logging.Formatter( + "[%(asctime)s] %(message)s", + datefmt="%Y-%m-%d %H:%M:%S %z", + ) + handler.setFormatter(formatter) + logger = logging.getLogger() + logger.addHandler(handler) + logger.setLevel(logging.INFO) + + +class App: + + def __init__(self): + self.agent_framework = None + + def get_agent_framework(self): + if not self.agent_framework: + self.agent_framework = DealAgentFramework() + self.agent_framework.init_agents_as_needed() + return self.agent_framework + + def run(self): + with gr.Blocks(title="Deal Intel", fill_width=True) as ui: + + log_data = gr.State([]) + + def table_for(opps): + return [[opp.deal.product_description, f"${opp.deal.price:.2f}", f"${opp.estimate:.2f}", f"${opp.discount:.2f}", opp.deal.url] for opp in opps] + + def update_output(log_data, log_queue, result_queue): + initial_result = table_for(self.get_agent_framework().memory) + final_result = None + while True: + try: + message = log_queue.get_nowait() + log_data.append(reformat(message)) + yield log_data, html_for(log_data), final_result or initial_result + except queue.Empty: + try: + final_result = result_queue.get_nowait() + yield log_data, html_for(log_data), final_result or initial_result + except queue.Empty: + if final_result is not None: + break + time.sleep(0.1) + + def get_initial_plot(): + fig = go.Figure() + fig.update_layout( + title='Loading vector DB...', + height=400, + ) + return fig + + def get_plot(): + documents, vectors, colors = DealAgentFramework.get_plot_data(max_datapoints=1000) + # Create the 3D scatter plot + fig = go.Figure(data=[go.Scatter3d( + x=vectors[:, 0], + y=vectors[:, 1], + z=vectors[:, 2], + mode='markers', + marker=dict(size=2, color=colors, opacity=0.7), + )]) + + fig.update_layout( + scene=dict(xaxis_title='x', + yaxis_title='y', + zaxis_title='z', + aspectmode='manual', + aspectratio=dict(x=2.2, y=2.2, z=1), # Make x-axis twice as long + camera=dict( 
+ eye=dict(x=1.6, y=1.6, z=0.8) # Adjust camera position + )), + height=400, + margin=dict(r=5, b=1, l=5, t=2) + ) + + return fig + + def do_run(): + new_opportunities = self.get_agent_framework().run() + table = table_for(new_opportunities) + return table + + def run_with_logging(initial_log_data): + log_queue = queue.Queue() + result_queue = queue.Queue() + setup_logging(log_queue) + + def worker(): + result = do_run() + result_queue.put(result) + + thread = threading.Thread(target=worker) + thread.start() + + for log_data, output, final_result in update_output(initial_log_data, log_queue, result_queue): + yield log_data, output, final_result + + def do_select(selected_index: gr.SelectData): + opportunities = self.get_agent_framework().memory + row = selected_index.index[0] + opportunity = opportunities[row] + self.get_agent_framework().planner.messenger.alert(opportunity) + + with gr.Row(): + gr.Markdown('Deal Intel - Autonomous Agent Framework that hunts for deals') + with gr.Row(): + gr.Markdown('A proprietary fine-tuned LLM deployed on Modal and a RAG pipeline with a frontier model collaborate to send push notifications with great online deals.') + with gr.Row(): + opportunities_dataframe = gr.Dataframe( + headers=["Deals found so far", "Price", "Estimate", "Discount", "URL"], + wrap=True, + column_widths=[6, 1, 1, 1, 3], + row_count=10, + col_count=5, + max_height=400, + ) + with gr.Row(): + with gr.Column(scale=1): + logs = gr.HTML() + with gr.Column(scale=1): + plot = gr.Plot(value=get_plot(), show_label=False) + + ui.load(run_with_logging, inputs=[log_data], outputs=[log_data, logs, opportunities_dataframe]) + + timer = gr.Timer(value=300, active=True) + timer.tick(run_with_logging, inputs=[log_data], outputs=[log_data, logs, opportunities_dataframe]) + + opportunities_dataframe.select(do_select) + + ui.launch(share=False, inbrowser=True) + +if __name__=="__main__": + App().run() + \ No newline at end of file diff --git a/week8/community_contributions/Ensemble_with_xgboost/pricer_ephemeral.py b/week8/community_contributions/Ensemble_with_xgboost/pricer_ephemeral.py new file mode 100644 index 0000000..6fd56ab --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/pricer_ephemeral.py @@ -0,0 +1,66 @@ +import modal +from modal import App, Image + +# Setup + +app = modal.App("pricer") +image = Image.debian_slim().pip_install("torch", "transformers", "bitsandbytes", "accelerate", "peft") +secrets = [modal.Secret.from_name("hf-secret")] + +# Constants + +GPU = "T4" +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" +PROJECT_NAME = "pricer" +HF_USER = "ed-donner" # your HF name here! Or use mine if you just want to reproduce my results. 
+RUN_NAME = "2024-09-13_13.04.39" +PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}" +REVISION = "e8d637df551603dc86cd7a1598a8f44af4d7ae36" +FINETUNED_MODEL = f"{HF_USER}/{PROJECT_RUN_NAME}" + + +@app.function(image=image, secrets=secrets, gpu=GPU, timeout=1800) +def price(description: str) -> float: + import os + import re + import torch + from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, set_seed + from peft import PeftModel + + QUESTION = "How much does this cost to the nearest dollar?" + PREFIX = "Price is $" + + prompt = f"{QUESTION}\n{description}\n{PREFIX}" + + # Quant Config + quant_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.bfloat16, + bnb_4bit_quant_type="nf4" + ) + + # Load model and tokenizer + + tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL) + tokenizer.pad_token = tokenizer.eos_token + tokenizer.padding_side = "right" + + base_model = AutoModelForCausalLM.from_pretrained( + BASE_MODEL, + quantization_config=quant_config, + device_map="auto" + ) + + fine_tuned_model = PeftModel.from_pretrained(base_model, FINETUNED_MODEL, revision=REVISION) + + set_seed(42) + inputs = tokenizer.encode(prompt, return_tensors="pt").to("cuda") + attention_mask = torch.ones(inputs.shape, device="cuda") + outputs = fine_tuned_model.generate(inputs, attention_mask=attention_mask, max_new_tokens=5, num_return_sequences=1) + result = tokenizer.decode(outputs[0]) + + contents = result.split("Price is $")[1] + contents = contents.replace(',','') + match = re.search(r"[-+]?\d*\.\d+|\d+", contents) + return float(match.group()) if match else 0 diff --git a/week8/community_contributions/Ensemble_with_xgboost/pricer_service.py b/week8/community_contributions/Ensemble_with_xgboost/pricer_service.py new file mode 100644 index 0000000..16d276b --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/pricer_service.py @@ -0,0 +1,89 @@ +import modal +from modal 
import App, Volume, Image + +# Setup - define our infrastructure with code! + +app = modal.App("pricer-service") +image = Image.debian_slim().pip_install("huggingface", "torch", "transformers", "bitsandbytes", "accelerate", "peft") +secrets = [modal.Secret.from_name("hf-secret")] + +# Constants + +GPU = "T4" +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" +PROJECT_NAME = "pricer" +HF_USER = "ed-donner" # your HF name here! Or use mine if you just want to reproduce my results. +RUN_NAME = "2024-09-13_13.04.39" +PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}" +REVISION = "e8d637df551603dc86cd7a1598a8f44af4d7ae36" +FINETUNED_MODEL = f"{HF_USER}/{PROJECT_RUN_NAME}" +MODEL_DIR = "hf-cache/" +BASE_DIR = MODEL_DIR + BASE_MODEL +FINETUNED_DIR = MODEL_DIR + FINETUNED_MODEL + +QUESTION = "How much does this cost to the nearest dollar?" +PREFIX = "Price is $" + +@app.cls(image=image, secrets=secrets, gpu=GPU, timeout=1800) +class Pricer: + @modal.build() + def download_model_to_folder(self): + from huggingface_hub import snapshot_download + import os + os.makedirs(MODEL_DIR, exist_ok=True) + snapshot_download(BASE_MODEL, local_dir=BASE_DIR) + snapshot_download(FINETUNED_MODEL, revision=REVISION, local_dir=FINETUNED_DIR) + + @modal.enter() + def setup(self): + import os + import torch + from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, set_seed + from peft import PeftModel + + # Quant Config + quant_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.bfloat16, + bnb_4bit_quant_type="nf4" + ) + + # Load model and tokenizer + + self.tokenizer = AutoTokenizer.from_pretrained(BASE_DIR) + self.tokenizer.pad_token = self.tokenizer.eos_token + self.tokenizer.padding_side = "right" + + self.base_model = AutoModelForCausalLM.from_pretrained( + BASE_DIR, + quantization_config=quant_config, + device_map="auto" + ) + + self.fine_tuned_model = PeftModel.from_pretrained(self.base_model, 
FINETUNED_DIR, revision=REVISION) + + @modal.method() + def price(self, description: str) -> float: + import os + import re + import torch + from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, set_seed + from peft import PeftModel + + set_seed(42) + prompt = f"{QUESTION}\n\n{description}\n\n{PREFIX}" + inputs = self.tokenizer.encode(prompt, return_tensors="pt").to("cuda") + attention_mask = torch.ones(inputs.shape, device="cuda") + outputs = self.fine_tuned_model.generate(inputs, attention_mask=attention_mask, max_new_tokens=5, num_return_sequences=1) + result = self.tokenizer.decode(outputs[0]) + + contents = result.split("Price is $")[1] + contents = contents.replace(',','') + match = re.search(r"[-+]?\d*\.\d+|\d+", contents) + return float(match.group()) if match else 0 + + @modal.method() + def wake_up(self) -> str: + return "ok" + diff --git a/week8/community_contributions/Ensemble_with_xgboost/testing.py b/week8/community_contributions/Ensemble_with_xgboost/testing.py new file mode 100644 index 0000000..cd43924 --- /dev/null +++ b/week8/community_contributions/Ensemble_with_xgboost/testing.py @@ -0,0 +1,75 @@ +import math +import matplotlib.pyplot as plt + +GREEN = "\033[92m" +YELLOW = "\033[93m" +RED = "\033[91m" +RESET = "\033[0m" +COLOR_MAP = {"red":RED, "orange": YELLOW, "green": GREEN} + +class Tester: + + def __init__(self, predictor, data, title=None, size=250): + self.predictor = predictor + self.data = data + self.title = title or predictor.__name__.replace("_", " ").title() + self.size = size + self.guesses = [] + self.truths = [] + self.errors = [] + self.sles = [] + self.colors = [] + + def color_for(self, error, truth): + if error<40 or error/truth < 0.2: + return "green" + elif error<80 or error/truth < 0.4: + return "orange" + else: + return "red" + + def run_datapoint(self, i): + datapoint = self.data[i] + guess = self.predictor(datapoint) + truth = datapoint.price + error = abs(guess - truth) + log_error = 
math.log(truth+1) - math.log(guess+1) + sle = log_error ** 2 + color = self.color_for(error, truth) + title = datapoint.title if len(datapoint.title) <= 40 else datapoint.title[:40]+"..." + self.guesses.append(guess) + self.truths.append(truth) + self.errors.append(error) + self.sles.append(sle) + self.colors.append(color) + print(f"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}") + + def chart(self, title): + max_error = max(self.errors) + plt.figure(figsize=(12, 8)) + max_val = max(max(self.truths), max(self.guesses)) + plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6) + plt.scatter(self.truths, self.guesses, s=3, c=self.colors) + plt.xlabel('Ground Truth') + plt.ylabel('Model Estimate') + plt.xlim(0, max_val) + plt.ylim(0, max_val) + plt.title(title) + plt.show() + + def report(self): + average_error = sum(self.errors) / self.size + rmsle = math.sqrt(sum(self.sles) / self.size) + hits = sum(1 for color in self.colors if color=="green") + title = f"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%" + self.chart(title) + + def run(self): + self.error = 0 + for i in range(self.size): + self.run_datapoint(i) + self.report() + + @classmethod + def test(cls, function, data): + cls(function, data).run() \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/10_part1_ensemble_model.ipynb b/week8/community_contributions/lisekarimi/10_part1_ensemble_model.ipynb new file mode 100644 index 0000000..5635a9f --- /dev/null +++ b/week8/community_contributions/lisekarimi/10_part1_ensemble_model.ipynb @@ -0,0 +1,1126 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3ede0360-00f4-404e-b0d2-4a83cc385654", + "metadata": { + "id": "3ede0360-00f4-404e-b0d2-4a83cc385654" + }, + "source": [ + "🔗 Ensemble Model\n", + "---\n", + "We’ll reuse core components built earlier:\n", + "\n", + "- A fine-tuned LLaMA 
model\n", + "- An XGBoost regression model, stored in Hugging Face\n", + "- A ChromaDB vector store, stored on Google Drive and also available on AWS S3\n", + "- A GPT-4o mini + RAG pipeline\n", + "\n", + "We'll run all three models on the same test data, gather their predictions, and train a Linear Regression Ensemble. The ensemble learns how to combine these predictions to output a more accurate final price.\n", + "\n", + "Once trained, we'll save the ensemble as ensemble_model.pkl, ready for later use.\n", + "\n", + "- 🧑‍💻 Skill Level: Advanced\n", + "- ⚙️ Hardware: ⚠️ GPU required (use Google Colab)\n", + "- 🛠️ Requirements: \n", + "\n", + " - 🔑 Hugging Face Token and OpenAI Key — must be set in Google Colab secrets or .env files if you are running with your own GPU\n", + " - completion of Part 9 of [this series of notebooks](https://github.com/lisekarimi/lexo)\n", + "- 🎯 Task: Train and save the Ensemble Model\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "mzYB4XYQeWRQ", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "mzYB4XYQeWRQ", + "outputId": "f474ce9b-09fb-4a47-93d7-273fe2d2ba10" + }, + "outputs": [], + "source": [ + "# Install required packages in Google Colab\n", + "%pip install -q tqdm huggingface_hub numpy sentence-transformers datasets chromadb xgboost peft torch bitsandbytes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3caecd1-8712-4acd-80b5-e8059c16f43f", + "metadata": { + "id": "b3caecd1-8712-4acd-80b5-e8059c16f43f" + }, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "import re\n", + "import zipfile\n", + "import chromadb\n", + "import joblib\n", + "import numpy as np\n", + "import pandas as pd\n", + "import requests\n", + "import torch\n", + "from datasets import load_dataset\n", + "from google.colab 
import userdata\n", + "from huggingface_hub import HfApi, hf_hub_download, login\n", + "from openai import OpenAI\n", + "from peft import PeftModel\n", + "from sentence_transformers import SentenceTransformer\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import r2_score, mean_squared_error\n", + "from sklearn.metrics import r2_score\n", + "from tqdm import tqdm\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05d9523f-b6c9-4132-bd2b-6712772b3cd2", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "05d9523f-b6c9-4132-bd2b-6712772b3cd2", + "outputId": "7077320e-43e2-4b03-ca7d-e7ea9a3407f8" + }, + "outputs": [], + "source": [ + "# Mount Google Drive to access saved ChromaDB and XGBoost model files\n", + "\n", + "from google.colab import drive\n", + "drive.mount(\"/content/drive\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "z9735RD_TUHw", + "metadata": { + "id": "z9735RD_TUHw" + }, + "outputs": [], + "source": [ + "# Load from Colab's secure storage\n", + "\n", + "openai_api_key = userdata.get(\"OPENAI_API_KEY\")\n", + "openai = OpenAI(api_key=openai_api_key)\n", + "\n", + "hf_token = userdata.get(\"HF_TOKEN\")\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "DtswsfBQxxJF", + "metadata": { + "id": "DtswsfBQxxJF" + }, + "outputs": [], + "source": [ + "# Configuration\n", + "\n", + "HF_USER = \"lisekarimi\"\n", + "ROOT = \"/content/drive/MyDrive/snapr\"\n", + "os.makedirs(ROOT, exist_ok=True)\n", + "\n", + "api = HfApi(token=hf_token)\n", + "REPO_NAME = \"smart-deal-finder-models\"\n", + "REPO_ID = f\"{HF_USER}/{REPO_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "id": "qByarIFiTYa1", + "metadata": { + "id": "qByarIFiTYa1" + }, + "source": [ + "### 📥 Load Test Dataset" + ], + 
"outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9ca3e34", + "metadata": {}, + "outputs": [], + "source": [ + "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", + "# %pip install -U datasets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0eKakxSFTVcA", + "metadata": { + "id": "0eKakxSFTVcA" + }, + "outputs": [], + "source": [ + "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", + "dataset = load_dataset(DATASET_NAME)\n", + "test = dataset[\"test\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cWqvs8JRTggE", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 110 + }, + "id": "cWqvs8JRTggE", + "outputId": "bf7f0113-de82-422a-aaec-54efbb2b9d16" + }, + "outputs": [], + "source": [ + "# Format description function (no price in text)\n", + "def description(item):\n", + " text = item[\"text\"].replace(\n", + " \"How much does this cost to the nearest dollar?\\n\\n\", \"\"\n", + " )\n", + " text = text.split(\"\\n\\nPrice is $\")[0]\n", + " return f\"passage: {text}\"\n", + "\n", + "\n", + "description(test[0])" + ] + }, + { + "cell_type": "markdown", + "id": "alpkYSc2UX0n", + "metadata": { + "id": "alpkYSc2UX0n" + }, + "source": [ + "### 📥 Load Models and ChromaDB" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "pjPBEgXqmHOA", + "metadata": { + "id": "pjPBEgXqmHOA" + }, + "outputs": [], + "source": [ + "# ChromaDB\n", + "\n", + "CHROMA_PATH = f\"{ROOT}/chroma\"\n", + "COLLECTION_NAME = \"price_items\"\n", + "CHROMA_ZIP_URL = \"https://aiprojects-lise-karimi.s3.eu-west-3.amazonaws.com/smart-deal-finder/chroma.zip\"\n", + "\n", + "# Download and unzip if CHROMA_PATH doesn't exist\n", + "if not os.path.exists(CHROMA_PATH):\n", + " os.makedirs(CHROMA_PATH, exist_ok=True)\n", + " r = requests.get(CHROMA_ZIP_URL)\n", + " with open(\"/tmp/chroma.zip\", \"wb\") as 
f:\n", + " f.write(r.content)\n", + " with zipfile.ZipFile(\"/tmp/chroma.zip\", \"r\") as zip_ref:\n", + " zip_ref.extractall(CHROMA_PATH)\n", + "\n", + "client = chromadb.PersistentClient(path=CHROMA_PATH)\n", + "collection = client.get_or_create_collection(name=COLLECTION_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8fi1BS71XCv1", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 337, + "referenced_widgets": [ + "c60f1153084a493ea31fac10bf986aef", + "6de41ac188dd48aea5d30a90bc52c38c", + "d2b4cdcaef6a4c41972f8c96af2814ab", + "2180dfb4e6e74df5bf9985c481b6e420", + "dcff84c8c3bf4f4bae334e0484207d10", + "ca1ed709ecaa4a0e8ac96ffb930e6613", + "e5297e7d36334c57aece043f62c79841", + "5fdca4e0987a4788983c418941711d7e", + "200e8e9b0df84affa177567243bf18d1", + "ef6e6dcff8b444bba62c1f76e1127d7c", + "7c94357c0d4e444489e8d47d2151437b", + "4ffdcd2ec96046ffb5121def27c95c9d", + "dd47ad1efe46496cb096a7714cf27c19", + "194ad4f8707b4d288e88cdbdfa33605d", + "c8a05ae3f5854f24998cc615a8849c88", + "b5ec411e72a946f8b4a470de5827c949", + "1a0be25b030d43858cee804da65d67a1", + "88d1f6a56f9b4a50854aee82c0945cc9", + "73e5967ae96942e080f3b05638583bc8", + "8434bfa06abf42c98e8ffb0e7b83c9f9", + "43d872e632da4d9883ea3d71dc91bdf9", + "86729f54df1b4967b2730b48f84a98aa", + "83fcebcf2b2c4213835334a998ba91e9", + "4ef06b10bfcb418d85534a8b73688eff", + "88c466cc89234d8f9f21147882fc5faf", + "f87e958c639544c0b925646fc28c4604", + "a52988b97dff4759a456398ecea1eaaf", + "c1116a13be86401bbbf6e51de0df7d12", + "d6ed27ce322748d29ed864808f619ee3", + "4a6fbedd3333496081695800cae8bdda", + "a7badc083fb34e69bd6f27bc9a805e7b", + "a78bcdbac2f74c72938d87c431f23e78", + "1d627cf1043642a3815a2902f65b4ded", + "3b8cc480ded24f66b03779fd25844670", + "0ce0073368c64339b3c1f960861e4b56", + "ffc973a4347943ebaa4ead16e04c05f6", + "aafba411ee984946a3ec0760580b60b6", + "0dd2501d917f48739b2817d598541660", + "213ca3afc47945a68e28a6ae005c3b7d", + 
"6355e004d7c34b969b2d2c6ccbc12620", + "9359b873cb4a4187b67a1732d78c7534", + "32f86a2b9e0547a6bc0a523ca3cfa088", + "0f446cb8ee3147438ef1e98e665a2831", + "64bb7ebae66d42f2a4d6a3039bf67d4b", + "e2b51ee511234ff2bc2cf33227fe2088", + "a76d3def06db41fab4ab2f077839d5bc", + "fa9598b858c14024aaf15d1417e9683d", + "df0bb9a9635643ebb679e115f45dde8e", + "527c4d1987334e3e9b2aa0de7d0527a7", + "0c6a889a9066484abbcb87b730d7e325", + "80a1f4c902154f2184c38ef844a1cca7", + "463c3cc65cd343108fe6049e4cde7142", + "fb015ce2fd9b48d79db67f80181964b7", + "07f46375dc594cd19ac5ab983083b2de", + "451ca5f213544cc8b24de6b7d55602fc", + "5bb2e645ea7741839e0f88ae484d94d2", + "19d1353070f643e08364500e9b1c30e6", + "fff94d7934cc4793876903d1c18efbfa", + "f0413b6310ef4510bf493e6814fa162d", + "a97c1662e64c44ee9f6e5be5617c07ec", + "da02a5ab5fe44cc297ab3048509a99e1", + "ee8c23aa2ca84b32a02a2500917559d8", + "6635ef559f72485e9453f87b3921f954", + "d9f2925a563d4d9fa332c15205f44d9b", + "73ec7891d53149e7a072a0e310716178", + "f7309076b36e4224acc42ade5d09bf37", + "cd53294ff44e4955afbfbd4660563b58", + "2ae42eb6385f43fab59f2bb56bb8a28e", + "fc0f1abaaf054d0d93a27c7ee0f6630d", + "c7a61078596c475784307480d26e3661", + "5bd26e4ff28f4639b52aab848ada03d1", + "ce710ef5ffa14cfe9842c63caebc81e5", + "63061726d47940c395a00d5d01556f4f", + "2b41598231d14f3ca6354c9543ec4351", + "0df33079f63c45d39de21439289aa4a8", + "8cc5d2eac9a64d68b72608bb5ae44c89", + "dd33a409204c4610a08e44c3e82e00da", + "b96b7ac71a6c48a9a6c888f2f34efea5", + "794d71dd5b734a3cb5607fc31aaddd18", + "0f6e7a2d9b8846178a7492e137d83bba", + "657bb839f0ea40eb9873385cecd06fd0", + "1ac174e8904943bb9a5e5483e58eef63", + "a1c9714fc4ea48af83669481e89c58c7", + "763a2d64c8e94ca1b0289264d9f868bc", + "ef78cf15ab914b3fa95ff95a86ec7a99", + "24350ebdc38a41e689f3e3b09dfc3e35", + "b8131b3c4c4c4b20809af9b0e91dd006", + "420c50ed8abe49ec9f4f2777e6cd2749", + "fc3f2d2c33ee40f8850710c2f4ce331f", + "90730ec699e84ddda2af799f8220e7a5", + "01f1c4b3b434474dbf2212a05869354d", + 
"912d6c1687324bb9b334bbf98a2b5b30", + "dc4106a0020b4b9fa21cd12a44967f2e", + "0084e537ffa74ab4a6d5f307b0916d2c", + "0edeb9ca771c4ca2a9a678e0e8a91614", + "b610d515ddb4405695e6972e45463194", + "4af72cb05f284d42bca73fcb88904255", + "86f34fb6325e4e878eee0be27946c88b", + "bfc26f456f1d440bb80eedac1cb14967", + "1f005c3cf7594275a37bb937a3c33db3", + "4347e7d3db4d4cc3836a4e69db032f27", + "809a4d0270dd4c05817ea224bb78ff5a", + "b403a344e84342e4b076e64e829d7354", + "6b3cbb0ac0b14e3fa193cd5cb3f8f521", + "2d06aaf8d15b456b8fadeb54dd2ea73d", + "74842e94dda648d18cd055220a3d2b39", + "5a3818bde07841fbb6077bf20b7dec4b", + "2a5224c8b3004d249a07297a2111493f", + "ff02ac7f08974426b3f70b71e59ed5bb", + "0dd550d3e39f42809fa16770231af7e5" + ] + }, + "id": "8fi1BS71XCv1", + "outputId": "9256b509-1371-4bb3-bd84-98bb75725ac3" + }, + "outputs": [], + "source": [ + "# Embedding Model\n", + "\n", + "embedding_model = SentenceTransformer(\"intfloat/e5-small-v2\", device=\"cuda\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "zmwIbufXUzMo", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 483, + "referenced_widgets": [ + "36b26207185e4e10a8c60f0f5918aa7c", + "96a0c5044c824968a701e20319c8d037", + "a567759a6e554c2eb9334559e880a56e", + "dcc68ce908ac4cfc9ffdcd0333cce14b", + "06d7ac4abde74e1496ef80b9e22cd148", + "94ba3ecd90f84291a46462a51ba001b8", + "00b94b9d1e4e4e428c7467b4f99e45af", + "02219374302849bd93d7aba7f65ee42b", + "82223db2023045e88dc3d9652bc6b183", + "6e08a796d2c440c58d5e6bb20f39be16", + "6ac04725da4f485f8d7efd738de0940a", + "230a6add3adb4dbd9f0329412e3f1455", + "db5f777e8ec143fe89a2e6296031c9c2", + "4f87c3aba240471d96349a037777732e", + "74b04b71ce0746a89d146c4044c85890", + "582e3cd91e3048639ab48492b5ca4b15", + "1d8e31a4418c49e0bfa28b399673f718", + "ec17f9e6bbbc4584baf11c0e7c504f84", + "19779fea3c8c4a5cbea953b059775110", + "3c82450a6db649f59f36668d5521d203", + "3447d41776ef406bbfbe3e6c277b82d2", + "4195859f8457499ba8e61a9f1662931c", + 
"6f80cc0bed9b42e89f9691675ff5a484", + "dc39eee63325479a915101649ab04273", + "99e1e2c842f845d1b2ded34736b60ad7", + "20ff78d4df09401d8aee462b57c57a48", + "1aff04abbd0b4f15bd58154b00591264", + "9cf024b488154926861d695137268da1", + "6470d6b03e3548d783ed252d128ec361", + "230a79f3d05d4c9eac73ae6962cb5d2d", + "adfe1cc7a6f94b0ab43170c75688374d", + "b6d24621c29f4352862f37ae69f2d6ad", + "303a6f79669e40d98ed2998f4f5e47cc", + "b737799377c948a99edf34c014a105c4", + "daf9616a687f406da1d9ee2bd147c850", + "cfd34b617326486498a916531bae9a87", + "201ea14e31a244a3aa2aeed2c12fb255", + "8454d5f263eb45daa0e6f7db6aa3f92d", + "f9787ee50d634421ae5f0325126dcb73", + "c2200daaf2994ed9ae16587d8d52236c", + "93a0ec9723074e9199d1f9db988c30dc", + "b1cb687d58ba40f49961d5485a466ef7", + "ccc580852d66401993d675c254832379", + "7321b89aea1c4746901ef40548bcf056", + "8fae24599e174689adbaa52a16270785", + "01605d6fb53d4e56aca9a746b2c75566", + "45cb352704fe41f1be0f01c61511323d", + "881550789f4845dda8561f6b26aff204", + "654bf64ec165449993b195209d75f4ba", + "9eaa6e09335047d5987c0a6528d5e77e", + "39331a837a644795bada1e2b034fe14d", + "a339562553394755811bb7077a81843d", + "4f58f5fa385f44e0ae09e2019294d597", + "431cc587951845d6af39f3e4ab0f2f76", + "ac879c2c923b423dabd6d0d60b12266d", + "4a3eb0fc1d2d4606a8acb382085a57af", + "f82bbcbd14ad48a78ebdfdfb43916bd1", + "a634449526034dbdb945c4905f4edeba", + "da4133a915ad4d449876a43468203842", + "a56f3a61d1ba4011ab6fce4067fa8418", + "46727d5afdf7487fb073e7e2d25cc75d", + "1480c5a1a0ad4151a12d47bc22685f04", + "5720fc31c90344908d9eeb49fe83df1b", + "47bd422d424a47e48edd304773162082", + "6a68ecb89be34255a0e0fc6db41c1f4d", + "12b1b3c7f0914030ad756b676cc97962", + "780f5b6ad91142f991a936b55219f61d", + "45723f91352b49688469a95e7f47aa9d", + "90bbf502500340a1993a957c27ad3d33", + "dcbb25b2a082476d905bcf124a849322", + "38e6bd6d64b54f9a8cdb4f40eaf41cde", + "7544a101cad94a15a2f4eb5639d22525", + "501184a0fc424a02a80aecd3f62fb9fd", + "1cbf5d28bced46ac8712a4609b5a5867", + 
"9ba6ecc0a422472681d8e61bdb32f87b", + "8383700148dd44538ed81ec5a261b7b5", + "740e930ff17c4f668818a8c762a5470e", + "0d995d8da0464c9ca7b1b444c22de025", + "bc8a5e6d27ba402996434f00918c8b0e", + "75f225b1a6f845148361b029878b63ea", + "c16b051d3cb44607b339770f5f8b6f2e", + "7857ed1e0b0f45bdb48269fbad68653d", + "e65ffc77bd6740c7924aa5b93297cb89", + "3ba97a41b4654dc0bb9bcaaa685b4518", + "6504931865a74cb5a80f2ac60da47430", + "ff8d5791b13c440d81312a6b96c9592f", + "c31bd8cb693b4e248a29f2ded032fb70", + "c25b1a42547a422ba7597c99ca4ce249", + "5c6338fcad9344e092f5077bf73c4910", + "dca176a7a6ea4fe9b025f851976f436a", + "611fc076771a4fcca5c46367b711d61a", + "0cfde45e26cf4c05b67755c2274f2df8", + "5d572d2f46ea484587e085c29318b616", + "8ad59c1261844a06b7abecebd7b60377", + "82d610cd077c47bd9efd609f2399c861", + "07fd58c9d07144a7a0aacab6b8252125", + "aa0932b4e66b4f33ba9f5237ea1470da", + "0d867615a23a42988bb91b4f0d0cc942", + "9c28ff7b0f5c421390ac1ccb899f093f", + "0645e7ed6593410eaf9c9c0b25158667", + "ca68b1dc60a343f9bd7298a63cadd556", + "9597ec6b495c4298b87967ed3e4044db", + "d7dad0ae58814124af1e92a078122736", + "173575b8b5254537937206759d6b6262", + "d1efdc10d36441d88cf7705e846bfbef", + "3550d450f95f40eeae0c0d559ae9f4de", + "773df79ed7b44f698cce98ca9ed802cc", + "24a2b4b88e1d46488eabb9101536beac", + "0d61c01e6cdf445b9474f9d759676edd", + "9b3505aacd164a19b45aba89eee46378", + "bd6fb8b066be46aaa7d457bf89257e54", + "f1eca4e5d600407885264d340b4f47a7", + "ef1e6a69995845e09781f76a38fced30", + "c5f50067867a40b99cb9f312e8adc49f", + "4e9cf63dbef041aba2c7f0b9c74466c8", + "07baa025b3a14bf89d6f6b438b695bbd", + "fb8609ef5b8d4653b25e52f853b7be1f", + "2f90ac58752347319d1203b5e8765c0e", + "d9815cabf324472a8eab585afabfa47e", + "120687e04065424595571941d816a134", + "a6d09159931f4d3d91a0647d9fa9d8ba", + "833c755f7bf9479abeef0041a82a92ba", + "56d848676e644c739e28730af99d69c6", + "39654eca8add4c09815cb3e6a45616ec", + "02aa70e064744a29af0a68aaca33c741", + "7fd83f95cfef4b1dbd881be7083d7455", + 
"cdf56625053b417fad2e64a0bed6725b", + "4199341c09bd46fab8a3b649d0c8af7a", + "334e67f38b8243aa9072f52a32e46080", + "492cdb40ffbf444d8e256875663fc598", + "655c6e0f21ff4e9db7f35522355d847c", + "926c1be6e26f4d9eba332881f975ed38", + "47641f7363be4252b9f5e53846bee057", + "887f8a2b268541eab71804a44ba1479b", + "5b6e78d1727e473ab3b66d6ff042aeca", + "acc60a1210104049983341db3010be0a", + "c028c37980e14b3ea07b1da6f558651e", + "9d1085906e3548078e5e393a86337c3e", + "259c86a51f4e4cab9648cc603fc25c7e", + "8ce05076e77643a88b062687e2b24493", + "8b3b7f947f4d4401bbca47d5720f7450", + "9bf9dccee248425da698dbb4526fcad9", + "b991477124184bf3b4397762649a6596", + "fb139bcae29f49778bf172eb503c0668", + "93ba52daa9aa4dee8da91bba6c7d0269", + "fefedf36efc94ed287bdeceae698d5b5", + "8288b87b06b34ba4b2c7a343d6cea827", + "5a85d212a6b5468fbc10e6aeb0ad8bee", + "22274b08e14c4c77a6223131779f6f48", + "edc0e436da954a33bbea8e80629eb43c", + "1d9b1c594680467f9c8a6682d8aeb2e7", + "d82f940e0a8a478e8b1ee8f169f798fc", + "ab1616f507594b27b898ede4504b4e39", + "1fed11f1c7484251a2a7400627ad5f6a" + ] + }, + "id": "zmwIbufXUzMo", + "outputId": "2acb6897-4c41-4447-e029-ffcc1b3b4da1" + }, + "outputs": [], + "source": [ + "# Fine Tuned Llama Model\n", + "\n", + "BASE_MODEL = \"meta-llama/Meta-Llama-3.1-8B\"\n", + "FINETUNED_MODEL = \"ed-donner/pricer-2024-09-13_13.04.39\"\n", + "REVISION = \"e8d637df551603dc86cd7a1598a8f44af4d7ae36\"\n", + "\n", + "# Quantization config (4-bit)\n", + "quant_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\",\n", + ")\n", + "\n", + "# Load tokenizer\n", + "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = \"right\"\n", + "\n", + "# Load base model\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " BASE_MODEL, 
quantization_config=quant_config, device_map=\"auto\"\n", + ")\n", + "\n", + "# Load fine-tuned model\n", + "fine_tuned_model = PeftModel.from_pretrained(\n", + " base_model, FINETUNED_MODEL, revision=REVISION\n", + ")\n", + "\n", + "# Align generation config\n", + "fine_tuned_model.generation_config.pad_token_id = tokenizer.pad_token_id\n", + "\n", + "print(f\"Memory footprint: {fine_tuned_model.get_memory_footprint() / 1e6:.1f} MB\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0IHiJNU7a4XC", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49, + "referenced_widgets": [ + "0264a3987fbf4040860ffa3fc47940d8", + "06d1db35940b469797c39c653741ea36", + "84c4d2fdaf734a559ee3eee09f1be295", + "fddde0bfed544b18ba39bfaa40eb9e1b", + "d40cc525cc28416cad4a45b3631798c9", + "e1372af176154902b1f555f30c28c007", + "5a1352c5ceb84320b14353b7aa21650d", + "522d0ed9e705457e9c72d276e2a26dbd", + "4de73aa76f044811990c379737a8e5c0", + "9305e96697ab4854ac89a6636991101d", + "b00e41d1051340fd904ba719111a907d" + ] + }, + "id": "0IHiJNU7a4XC", + "outputId": "c68bc44e-6b15-46c3-c8d9-3f256f368317" + }, + "outputs": [], + "source": [ + "# XGBoost Trained Model\n", + "\n", + "MODEL_FILENAME = \"xgboost_model.pkl\"\n", + "model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME, token=hf_token)\n", + "xgb_model = joblib.load(model_path)" + ] + }, + { + "cell_type": "markdown", + "id": "76BhcPjWa6C5", + "metadata": { + "id": "76BhcPjWa6C5" + }, + "source": [ + "### 📊 Model prediction collection" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "LgGmUKJxayZ6", + "metadata": { + "id": "LgGmUKJxayZ6" + }, + "outputs": [], + "source": [ + "def extract_tagged_price(output: str):\n", + " \"\"\"Extracts a float price from a string based on 'Price is $' keyword.\"\"\"\n", + " try:\n", + " contents = output.split(\"Price is $\")[1].replace(\",\", \"\")\n", + " match = 
re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", contents)\n", + " return float(match.group()) if match else 0.0\n", + " except Exception:\n", + " return 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ggKf1nSQbAnv", + "metadata": { + "id": "ggKf1nSQbAnv" + }, + "outputs": [], + "source": [ + "def ft_llama_price(description: str):\n", + " prompt = (\n", + " f\"How much does this cost to the nearest dollar?\\n\\n{description}\\n\\nPrice is $\"\n", + " )\n", + " inputs = tokenizer(prompt, return_tensors=\"pt\").to(\"cuda\")\n", + "\n", + " outputs = fine_tuned_model.generate(\n", + " **inputs, max_new_tokens=5, num_return_sequences=1\n", + " )\n", + "\n", + " result = tokenizer.decode(outputs[0])\n", + " price = extract_tagged_price(result)\n", + " return price" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_cWyYUd4Ub-K", + "metadata": { + "id": "_cWyYUd4Ub-K" + }, + "outputs": [], + "source": [ + "def xgboost_price(description: str):\n", + " vector = embedding_model.encode([description], normalize_embeddings=True)[0]\n", + " pred = xgb_model.predict([vector])[0]\n", + " return round(float(max(0, pred)), 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3Skod8juXgnN", + "metadata": { + "id": "3Skod8juXgnN" + }, + "outputs": [], + "source": [ + "def gpt4o_price(item):\n", + " def get_embedding(text):\n", + " return embedding_model.encode([text], normalize_embeddings=True)\n", + "\n", + " def find_similars(text):\n", + " results = collection.query(\n", + " query_embeddings=get_embedding(text).astype(float).tolist(), n_results=5\n", + " )\n", + " docs = results[\"documents\"][0]\n", + " prices = [m[\"price\"] for m in results[\"metadatas\"][0]]\n", + " return docs, prices\n", + "\n", + " def format_context(similars, prices):\n", + " context = (\n", + " \"To provide some context, here are similar products and their prices:\\n\\n\"\n", + " )\n", + " for sim, price in zip(similars, prices):\n", + " 
context += f\"Product:\\n{sim}\\nPrice is ${price:.2f}\\n\\n\"\n", + " return context\n", + "\n", + " def build_messages(description, similars, prices):\n", + " system_message = (\n", + " \"You are a pricing expert. \"\n", + " \"Given a product description and a few similar products with their prices, \"\n", + " \"estimate the most likely price. \"\n", + " \"Respond ONLY with a number, no words.\"\n", + " )\n", + " context = format_context(similars, prices)\n", + " user_prompt = (\n", + " \"Estimate the price for the following product:\\n\\n\"\n", + " + description\n", + " + \"\\n\\n\"\n", + " + context\n", + " )\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_message},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " {\"role\": \"assistant\", \"content\": \"Price is $\"},\n", + " ]\n", + "\n", + " docs, prices = find_similars(description(item))\n", + " messages = build_messages(description(item), docs, prices)\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4o-mini\", messages=messages, seed=42, max_tokens=5\n", + " )\n", + " reply = response.choices[0].message.content\n", + " cleaned = reply.replace(\"$\", \"\").replace(\",\", \"\")\n", + " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", cleaned)\n", + " # Fall back to 0.0 when the reply contains no number (re.search returns None)\n", + " return float(match.group()) if match else 0.0" + ] + }, + { + "cell_type": "markdown", + "id": "98bf0aed", + "metadata": {}, + "source": [ + "### ✂️ Split dataset and process" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8XQK5yrk8On4", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8XQK5yrk8On4", + "outputId": "ec379798-8b73-4e66-a517-a818845c8353" + }, + "outputs": [], + "source": [ + "print(\"Splitting entire dataset...\")\n", + "np.random.seed(42)\n", + "all_indices = list(range(len(test)))\n", + "np.random.shuffle(all_indices)\n", + "\n", + "train_split_size = int(0.8 * len(all_indices))\n", + "train_indices = all_indices[:train_split_size] # 80% of total\n", 
+ "test_indices = all_indices[train_split_size:] # 20% of total\n", + "\n", + "train_indices = train_indices[:250] # First 250 from training split\n", + "test_indices = test_indices[:50] # First 50 from testing split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "XN7P5fkkXfgP", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XN7P5fkkXfgP", + "outputId": "69f9d265-a402-48ab-a91e-8c6032ea4118" + }, + "outputs": [], + "source": [ + "# Process subset of TRAINING data\n", + "ft_llama_preds_train = []\n", + "gpt4omini_preds_train = []\n", + "xgboost_preds_train = []\n", + "true_prices_train = []\n", + "\n", + "for i in tqdm(train_indices):\n", + " item = test[i]\n", + " text = description(item)\n", + " true_prices_train.append(item[\"price\"])\n", + " ft_llama_preds_train.append(ft_llama_price(text))\n", + " gpt4omini_preds_train.append(gpt4o_price(item))\n", + " xgboost_preds_train.append(xgboost_price(text))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1_6_atEgHnFR", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1_6_atEgHnFR", + "outputId": "956e4dcb-2300-44ab-a66b-9b1254216762" + }, + "outputs": [], + "source": [ + "print(\"True Prices:\", true_prices_train)\n", + "print(\"FT-LLaMA Predictions:\", ft_llama_preds_train)\n", + "print(\"GPT-4o-mini Predictions:\", gpt4omini_preds_train)\n", + "print(\"XGBoost Predictions:\", xgboost_preds_train)" + ] + }, + { + "cell_type": "markdown", + "id": "ygJsuvtLtOdR", + "metadata": { + "id": "ygJsuvtLtOdR" + }, + "source": [ + "Example :\n", + "- True Prices: [245.0, 24.99, 302.4, 737.0, ...]\n", + "- FT-LLaMA Predictions: [99.0, 53.0, 550.0, 852.0, ...]\n", + "- GPT-4o-mini Predictions: [179.99, 97.0, 348.0, 769.0, ...]\n", + "- XGBoost Predictions: [220.19, 59.85, 254.29, 335.76, 165.04, ...]" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tYWMhTrXcA7x", + 
"metadata": { + "id": "tYWMhTrXcA7x" + }, + "outputs": [], + "source": [ + "# Create features for TRAINING data\n", + "maxes_train = [\n", + " max(a, b, c)\n", + " for a, b, c in zip(ft_llama_preds_train, gpt4omini_preds_train, xgboost_preds_train)\n", + "]\n", + "means_train = [\n", + " np.mean([a, b, c])\n", + " for a, b, c in zip(ft_llama_preds_train, gpt4omini_preds_train, xgboost_preds_train)\n", + "]\n", + "\n", + "# Create TRAINING dataframe\n", + "X_train = pd.DataFrame(\n", + " {\n", + " \"FT_LLaMA\": ft_llama_preds_train,\n", + " \"GPT4oMini\": gpt4omini_preds_train,\n", + " \"XGBoost\": xgboost_preds_train,\n", + " \"Max\": maxes_train,\n", + " \"Mean\": means_train,\n", + " }\n", + ")\n", + "\n", + "y_train = pd.Series(true_prices_train)" + ] + }, + { + "cell_type": "markdown", + "id": "e1682cf0", + "metadata": {}, + "source": [ + "### 🏋️Train the Ensemble Model" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "-WsFABEicOyo", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-WsFABEicOyo", + "outputId": "42ae6421-fb4e-4ae6-ab54-b075e311b94d" + }, + "outputs": [], + "source": [ + "np.random.seed(42)\n", + "lr = LinearRegression()\n", + "lr.fit(X_train, y_train)\n", + "\n", + "# Print feature coefficients\n", + "feature_columns = X_train.columns.tolist()\n", + "for feature, coef in zip(feature_columns, lr.coef_):\n", + " print(f\"{feature}: {coef:.2f}\")\n", + "print(f\"Intercept={lr.intercept_:.2f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "GnYPOslHFgGx", + "metadata": { + "id": "GnYPOslHFgGx" + }, + "source": [ + "- FT_LLaMA: 0.52\n", + "- GPT4oMini: 0.17\n", + "- XGBoost: -0.31\n", + "- Max: 0.45\n", + "- Mean: 0.13\n", + "- Intercept=-6.06\n", + "\n", + "---\n", + "FT_LLaMA is the most influential model in the ensemble.\n", + "\n", + "Max prediction also has strong positive impact.\n", + "\n", + "GPT4oMini and Mean contribute less, but still add value.\n", + "\n", + 
"XGBoost has a negative coefficient, acting as a counterbalance.\n", + "\n", + "\n", + "Overall: FT_LLaMA leads, max adds value, XGBoost corrects for overestimation—resulting in a balanced ensemble." + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "wyx39HEL9niI", + "metadata": { + "id": "wyx39HEL9niI" + }, + "source": [ + "### 🔮 Prediction" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "W3F0nNBXlrUJ", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "W3F0nNBXlrUJ", + "outputId": "1dbd9702-50cf-4d80-b8ab-9b2000dd3b10" + }, + "outputs": [], + "source": [ + "# Process subset of TEST data\n", + "ft_llama_preds_test = []\n", + "gpt4omini_preds_test = []\n", + "xgboost_preds_test = []\n", + "true_prices_test = []\n", + "\n", + "print(\"Processing TEST data (50 items)...\")\n", + "for i in tqdm(test_indices):\n", + " item = test[i]\n", + " text = description(item)\n", + " true_prices_test.append(item[\"price\"])\n", + " ft_llama_preds_test.append(ft_llama_price(text))\n", + " gpt4omini_preds_test.append(gpt4o_price(item))\n", + " xgboost_preds_test.append(xgboost_price(text))\n", + "\n", + "# Create features for TEST data\n", + "maxes_test = [\n", + " max(a, b, c)\n", + " for a, b, c in zip(ft_llama_preds_test, gpt4omini_preds_test, xgboost_preds_test)\n", + "]\n", + "means_test = [\n", + " np.mean([a, b, c])\n", + " for a, b, c in zip(ft_llama_preds_test, gpt4omini_preds_test, xgboost_preds_test)\n", + "]\n", + "\n", + "# Create TEST dataframe\n", + "X_test = pd.DataFrame(\n", + " {\n", + " \"FT_LLaMA\": ft_llama_preds_test,\n", + " \"GPT4oMini\": gpt4omini_preds_test,\n", + " \"XGBoost\": xgboost_preds_test,\n", + " \"Max\": maxes_test,\n", + " \"Mean\": means_test,\n", + " }\n", + ")\n", + "\n", + "y_test = pd.Series(true_prices_test)" + ] + }, + { + "cell_type": "markdown", + "id": "mVn6AAGq96wm", + "metadata": { + "id": "mVn6AAGq96wm" + }, + "source": [ + "### 🧪 Evaluation" + 
], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "y25l8rR791wG", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "y25l8rR791wG", + "outputId": "0a02a620-eb0d-46a6-8f54-1046c2394ab3" + }, + "outputs": [], + "source": [ + "# Evaluate on the test set\n", + "print(\"Evaluating model...\")\n", + "y_pred = lr.predict(X_test)\n", + "r2 = r2_score(y_test, y_pred)\n", + "print(f\"R² score: {r2:.4f}\")\n", + "\n", + "# Calculate RMSE\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(f\"RMSE: {rmse:.2f}\")\n", + "\n", + "# Calculate MAPE\n", + "mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100\n", + "print(f\"MAPE: {mape:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "vHJLe6LNEBrB", + "metadata": { + "id": "vHJLe6LNEBrB" + }, + "source": [ + "Evaluating model...\n", + "- R² score: 0.7376\n", + "- RMSE: 127.62\n", + "- MAPE: 29.70%\n", + "\n", + "---\n", + "\n", + "- R² = 0.74: This is a solid R² value, indicating our model explains about 74% of the variance in the price data.\n", + "Generally, an R² above 0.7 is considered good for price prediction tasks.\n", + "- RMSE = 127.6: Root mean squared error in dollars, which weights large misses more heavily than a plain average; reasonable if prices are in the thousands.\n", + "- MAPE = 29.7%: This means our predictions are off by roughly 30% on average. 
Typical for price prediction, but there’s room for improvement.\n" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "C6cEJ57WApkG", + "metadata": { + "id": "C6cEJ57WApkG" + }, + "source": [ + "### 🚀 Push to HF" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_n7n_MnscS4r", + "metadata": { + "id": "_n7n_MnscS4r" + }, + "outputs": [], + "source": [ + "# Serialize Ensemble model locally for Hugging Face upload\n", + "\n", + "MODEL_DIR = os.path.join(ROOT, \"models\")\n", + "MODEL_FILENAME = \"ensemble_model.pkl\"\n", + "LOCAL_MODEL = os.path.join(MODEL_DIR, MODEL_FILENAME)\n", + "\n", + "os.makedirs(MODEL_DIR, exist_ok=True)\n", + "joblib.dump(lr, LOCAL_MODEL)\n", + "\n", + "# Create the model repo if it doesn't exist\n", + "api.create_repo(repo_id=REPO_ID, repo_type=\"model\", private=True, exist_ok=True)\n", + "\n", + "# Upload the saved model\n", + "api.upload_file(\n", + " path_or_fileobj=LOCAL_MODEL,\n", + " path_in_repo=MODEL_FILENAME,\n", + " repo_id=REPO_ID,\n", + " repo_type=\"model\",\n", + ")" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/10_part2_modal.ipynb b/week8/community_contributions/lisekarimi/10_part2_modal.ipynb new file mode 100644 index 0000000..f1525d7 --- /dev/null +++ b/week8/community_contributions/lisekarimi/10_part2_modal.ipynb @@ -0,0 +1,387 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": 
"44c6af6b-6fc3-44d5-a586-71618af7d09a", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "# Modal (Part 2)\n", + "\n", + "---\n", + "✅ With all models and ChromaDB set up, it's time to integrate everything into a real system: **Snapr**, an app that scans online product listings, predicts their value, and alerts users to great deals.\n", + "\n", + "To power Snapr, we’ll need:\n", + "- Price prediction models, ready for production \n", + "- Fast, on-demand predictions \n", + "- A scalable setup that handles real-world usage\n", + "\n", + "🔧 That’s where **Modal** comes in. Modal lets us deploy models and services to the cloud, with minimal setup, low latency, and clean Python APIs.\n", + "\n", + "- You can check out a [live demo](https://huggingface.co/spaces/lisekarimi/snapr) of the project\n", + "- The source code is available on [GitHub](https://github.com/lisekarimi/snapr)\n", + "\n", + "---\n", + "📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)\n" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "b8c175e7-ca0a-4664-bded-08ec131c5636", + "metadata": {}, + "source": [ + "## 📚 Pre-requisites\n", + "\n", + "To follow this project smoothly, it's helpful to know:\n", + "\n", + "- 🛰️ What an API is: You send a request → it’s processed remotely → you receive a result\n", + "- 🐳 What a Docker image & container are:\n", + " - Image = environment with code & dependencies\n", + " - Container = running instance of that image\n", + "- 🧑‍💻 Local vs Remote code execution:\n", + " - Local code runs on your machine\n", + " - Remote code runs in the cloud (via Modal)" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "440fffc2-9ec1-433d-9b71-e6fae3b46415", + "metadata": {}, + "source": [ + "## 🔧 Install & Setup Modal\n", + "- Before starting, install Modal in your environment (Run this once): `uv pip install modal`\n", + "- Create an account at modal.com (they give you $5 free to 
start).\n", + "- Then authenticate your environment: `modal setup`" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef286205", + "metadata": {}, + "outputs": [], + "source": [ + "!uv pip install modal" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3906c01-b313-4dac-9a2e-6c7dbfdcc8fd", + "metadata": {}, + "outputs": [], + "source": [ + "import modal\n", + "import sys\n", + "sys.path.append(\".\") # Make sure your local modules are accessible" + ] + }, + { + "cell_type": "markdown", + "id": "43c59002-afe6-4dcc-a53e-b50d85857f7d", + "metadata": {}, + "source": [ + "## 🧠 Key Concepts\n", + "\n", + "Modal is a platform that lets you run Python code in the cloud. You can:\n", + "- Deploy code as APIs\n", + "- Run GPU workloads (e.g., LLMs)\n", + "- Automatically handle Docker, infra, deployment\n", + "\n", + "What is a Modal App?\n", + "An \"App\" is a containerized cloud service where you can run code remotely.\n", + "- Code runs in isolated containers (like Docker)\n", + "- These containers are created on-demand and destroyed when idle\n", + "- You define your logic in a file and deploy it to Modal\n", + "\n", + "Key Modal Concepts\n", + "- `modal.Image`: Defines the environment (like a Docker image)\n", + "- `@app.cls`: Runs classes remotely inside a container\n", + "- `modal.App`: Defines and registers the Modal app\n", + "- `.remote()`: Sends request to Modal API to execute the code remotely\n", + "- `modal deploy -m`: Deploys app permanently like a real cloud service" + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "79436b01-9623-4b0e-8ffc-0ea51a5783ac", + "metadata": {}, + "source": [ + "## ⚙️ Minimal Example" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d62850fc-dbf4-48b0-a2f2-a1e9a200414d", + "metadata": {}, + "outputs": [], + "source": [ + "from modal_services.get_started import app, f\n", + "\n", + "with app.run(): # This spins up a 
container in Modal\n", + " print(f.local(1000)) # Run locally inside the notebook\n", + " print('*' * 5)\n", + " print(f.remote(1000)) # Run remotely via Modal API inside a container" + ] + }, + { + "attachments": { + "886d059a-a8ca-4552-86d2-fb87fb824441.png": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfQAAADhCAIAAACx2NBBAAAgAElEQVR4Ae2d+3dTVd7/+YVf/MG1XFI4iOIIhdCmpTdiWwVLL1JaegEaSy2ldwEZpcIUqljsICpMBUSx4MOIDwMMShmkDEydMkCFQsROrRTi01KfyTNZZMxi6eq/4Pe79z77ZCc5SXo5Ibd3VhacnOyzL69zzut88tkn6SQJDxAAARAAgbAjMCnsRoQBgQAIgAAISJA7DgIQAAEQCEMCkHsY7lQMCQRAAAQgdxwDIAACIBCGBCD3MNypGBIIgAAIQO44BkAABEAgDAlA7mG4UzEkEAABEBiN3I3bDnz26WHy3Le1CMhAAARAAASCn8Ao5N7cZbHZbez5zWeFwT8m9BAEQAAEIp6Ab7lvu2CVzU783nesOOKZAQAIgAAIBD0Bn3Lf2TlMwnbLle5eGrwPnKwJ+kGhgyAAAiAQ6QR8yb21m4Xtl1uLjvXS5MzA6XWRDg3jBwEQAIFgJ+Bd7in7rlChW7t3S1L1STMVvfnUWtdRbTzZZxm2Woa790lSRsNnl/uHLRa7zWId7O/+tGGxU+mG072kpPXyfknKavj0onlw2GqzWi2D5suHGzKcipIXyVWtp67SMja5wmO/r0l2K4YVIAACIAACIgGvcjcc6mHzqFf2E5/Wnh6gLwc7tohVSJLU2DFMvW86dtDkmH1l29rsA2e2OHS8+dwgXd9z/FCPhc/T8pK2gXONBkfdJXu7WWEh6U82GewQKnQUxxIIgAAIgIBMwJvcC4/0Mav2HEyhxdefGqA6Hu7a5gyQy52+O2g6tbe5urbh7ePdA7K+rZdbWQ2SxOVOarYO97Tv31Zbs3HnicsD8rSt5UorvxK0XmabW/pO7VxfaJAySpuPfcOKWTubnXuAVyAAAiAAAgIBL3Iv+vQbFlmb2ng0va6dZWasnb8X6nBE7nab2Sn0lkpPsGlYm5Kpd8h9+OxWbnxSmVHO6dt42oeX7D0sFDO0Xh4m+Znekw1OPcALEAABEAABgYBnuRdzL7OcDNtmrZyZsV1pFSpR0jL2y3sFEdMSb3/FYm3zqVr6mivbJlbL6vq9fEO9fENOg5zAQRJGRI1lEAABEBgNAY9y59OnLr7mmRlr9z4ezgs5d7W74A+a5NzOAdofLvfeo+5fduUp/quHaFHels1uGzb3dJxo29lQkjWaQaEMCIAACEQ6AU9yF8RqITe3OJ5Wlqtxkj7PuZva3Hlu5TOoB+l7XO497KVT+S1nB2nl15ncJam0tZNl+ZUZV5vdZjades/otB1egAAIgAAIOBPwIHcl/SJa1WX5+iE+86mkZdTkfoBH7r7lziN3Re6krymFL+9sa+/q6R+28OuKzWbtOeAe+DuPDK9AAARAIIIJqMudR+L23g75J8PYD4exf3k03fcp/ykCXt51olWSJD4HO3x2M8XMI3fLVztdsfMrivutlrxkSnVrl3xz5MDpar4W/4MACIAACLgQUJV7M/vJAdtw19suxenLwqPyLZJK3pzL3W67fsjpl8UM8q8XOKricrdZTW382kBrTeFTr/IVovpAV8/VvsHhbuVeHVpssXwPz+C5RrW+YR0IgAAIgIAkqf6BbP4zkCrBNWNW/JlygyMLnx1yt9kHLrRW07nW5IIt/LZ0+8DJ9TJuRe42u22ga3cVvbvGUNR4vE/+9pNy0yT/5YPBjmblm6sZzXIG
3yYkhbAjQQAEQAAEXAioRO672U8O2FRyLHxj5RZ4+Z50Lve+s/JXVeVJV/mbpeLN71zuvR1c006pfPHmd2Pbdf6DlFY6o+v4RitP8vAO4X8QAAEQAAGRgJvcDfsvs3lLt6+hipspX14daCchOZe7qU0y7r7Avugk+33w6mcbhZsmlW+o9hyUSt7rGnDMkZLbYD59xfk2eUPN7o4+YR6V1GkZ6HYtJvYMyyAAAiAAAuppmbFzEeRON84yVtfWVNeuLhS1zqrlkTu/FXJxCSlZU17grHWnPshlqmtrcJ+7Exi8AAEQAAEPBNwidw/lvK92lbuX0q5y91IUb4EACIAACIyTAOQ+TnDYDARAAASCmQDkHsx7B30DARAAgXES0Ebu4h/r8NER8Y91+CiKt0EABEAABMZJQBu5j7NxbAYCIAACIOAfApC7f7iiVhAAARAIKAHIPaD40TgIgAAI+IcA5O4frqgVBEAABAJKAHIPKH40DgIgAAL+IQC5+4dryNY6DQ8QCAICIXsCBVHHIfcg2hkB70oQnNToAgg4CAT8jAjpDkDuIb37tOy845TCEggEDQEtD/EIqwtyj7Ad7mG4QXMuoyMg4ErAwzGL1T4IQO4+AEXC28rJNBUPEAgmAsqRGQmnoeZjhNw1RxpiFSrnTzCd1OgLCMgElOMzxM6rIOgu5B4EOyFwXVDOHHYmReEBAkFDQLm+KUdp4E6UkGwZcg/J3aZJp5VzRjT7FDxAIDgIsKsMOziVY1WTIz9CKoHcI2RHuw5TOVtEs9+YPPnXSZPwBIFgIHBj8mT43fW8HctryH0stMKlrGL2adOmKXKfMmVKMJzS6AMIKASmTJkSFRXlErxPmzYtXE5E/44Dcvcv3+Cs3bvcH+WP4Ph0jl5EEAF+6D3K/A65T0QgkPtE6IXqtj7lLuokaCbY0JGwJSAeb1OmTHn0UchdA7dA7hpADLkqRil3Ty5hH5PxLwiMm4CnQ4tZHnLXRCmQuyYYQ6wS73JnJ5h4+o37HMaGIDAaAuLBxg4/pGUm7hTIfeIMQ6+G0cvd55kpVsWWfW6CAhFCYKzHBlM85K6VUCB3rUiGUj3iWcdEExUVpdwtw2ax2JkmakjcCssgoAkB8QCD3LWVCOSuLc/QqE08Lb3IXTzxxE2wDAIaEhAPM8XvSMtMXCWQ+8QZhl4N4pk5GrmL5bEMApoTUPwOuWtoE8hdQ5ghU5V4cvqUu1jYfTlkxoyOBgEB9+OHrYHc/bFzIHd/UA32OsVzbHxyD/YRon9BTEA8/JRl5Th0n/5hbykl8Q3VUe5byH2UoMKqmHieKCeV+4Sq+0nFNgwrFhhMIAiIRyBbVo5DyF2rHQK5a0UylOoRTy3lpHKRO1s/depUsXAoDRJ9DW4C4nEl/sYR5K7VfoPctSIZSvWI5xXkHkp7Lrz6qnocQu5a7WTIXSuSoVSP6kmlGrlPmzbt8Vkz5hhmxWXpEvJiEpaxZ2ziMv4siE3EEwRGSYAeNgn5sXHZujlPz3p89gzlUFQ+KULuWqkEcteKZCjVo5xR4sdhVblHL3gqPnde3PO6uBydPmsuf+risvgzWxeHJwiMkgA9bPTZurjndfG58xLyYuYYZrGjEXLX3CCQu+ZIQ6DCUcpdlz4rbsm82Mw5sZlz9ZnM7Do90/ooT2YUAwF3AlnkKCKBQvbc+Nx5MQujlSBj6tSpiNy1MgjkrhXJUKpnNHKflfwbfY4uJmNO7GLIHZ9ONCWgyD2LBA3xuTHRC55C5K65QSB3zZGGQIU+5T79SSk2Z25MRjTkjqST9gSc5R6bReL3GbMe8z63Lx60IXCOBUEXIfcg2AkPvAvieaJ6Rj2V+GRs5pyY5yB3TSNW9wRFZK5xkXvm3PglOiV4R1pGKx9A7lqRDKV6fMpd9+xsErZD7pEpX3+P2k3ucdlzYxfPUY0z8DdUx20WyH3c6Ea7YXT0nPnzEwwGQ1pauv+eBoNh/vyE6Og5o+mWT7nHZs2JWTQb
ctc+I6G1N5MKYhcY41NfTEirSAyeZ+qLCQuM8UkF5JZHlaeb3PWZc+KX6CD30Zy8oy8DuY+e1ZhLRkfP8bfT3a8WBoPBp+JHI/d5CyF3NTGp2ioQK5MKYoPN6e5Xl9QXE1QU7yb32MVz4nMh9zEbxvsGkLt3PuN/V6/Xu5v3ga3R6/Veuq6h3NMqEjM3Pq080yoSVSK1QLgvvLuRsiLO3aRBuyZlRZzT7oDcvZyc2r0FuWvHUqgpsGZnlxAvftdE7kmF+g3XXmgZqXd5brhmTFzm4fM4LK8FgdAyO7vkOPkdchdc4b/F8cv9ySefTE5OfvbZhcnJyU888cTYuhi/KG9Z/tPRrhtFV+9t/9Muo9t613KjfB2dmrcsP29pqlb1jbbZ6DkPLEL33pCn/MzE5Z6QH/PqN6te/7+qZ+uSxaDs2brk1/9d9crN0oT8GHG9l+WE5pScjmcz98QnrA6mNMha/YIdXroUk7QjfsEOfYIWsvYCx/2tpILYoI3QvXfMkZ+B3EepkokVG4/c58yZW1RUXFtbX1f3EntWVlYnJCSOoSc7zpvNfUfWuG7xuy9um82mT1a5rh/f6+I2k9k8ZO4/88aDtbvnPHv2hg+7bnx351yr08zq6v/qMX9/YSeZbl1aWluWq928q8FgUEU3Qbkn5Mb8tueFLXfXpL6Y4G6f1BcTmv5V+dvrLyQs9eX3pvSqe7WOwP/nslWH2Cb6hV9kFHyRkvTA1ekYzpmVLSMVqz7wcL35c1Ez+chSZnzHQwG/9dxjnn1TqvFW+VZ71ev3Xig/ncI9m7z06xca7VWv2ys2XF+00GXS9dTyN0fKyj4kM7HP7DU8s8m/U7KOowVyVz0ttV45ZrnHxMRUVdUUFhbFxsZKkvTYYzPi4+NLSow1NbVz584dbfc8yF2KTs17jlSrxWP5J91D/V91XTIPdex4cHaP9hi2rzvQdcfcc+Xr74Y6P3CSe1repp1vv5JFnL6/09y1Tzu5p6WlqwbvE5T72n+s+N0PqxcY4x0qdHbZ06vm/+5/Vq/9x0pPBej6+BU/1rfcNxZ/SISe8E56hbW+5X5RDonfk1bdq2+5l5vuXK3X2rSWrFe55/TXtliMG36ub7yW9CB75TlsTyn+oeb1H3KWfGjI7ihq/KVizRFi6oyvy7bfKyo8bMg4nFFzr3bTtVQuferxTYalHc9Q46et+c8atolTAZeLwYRfysE75K6F4HzWMTa5T58+fdWqssLCounTp4tVP/744y++WL5sWaG40tuyB7nnvXW048uP10lS+a4zHV9+/KrDyS9/9OX59j11tM7Y8l0nOy/13bzUdXzPy1memllx+BrRet2RG0Pmr3Y9zYvJTWQ2fPJl980b3R1tW/J4K+s+ON/x2Y5i467jX5n6r3Ud31XhuM7EV7T88UzntdukUXE9r1b5f/78BA+pkvePtb9XnbXp85tucq/d//nn+6vTmo9c6Ok19399oevIm8T+uRv3f362q/PC2UNv1lL1p1fvaf98T9NbRy90Xjh7YEN21ob3SYGz7X/YkO2h0fT58xOUvikLE5H74lcMb/1SZ3hBNvvKj7JXfpTNBCcuP102v2WkPuO3Bs/uW7RupL7JJBQ4kl1/a2nujvhck3Hzz/UtP5dvuLV8xSFm7Zj0jvx1P1Y0WY1Vl9MXyAkcw4pby8vbE1I78jdYKzffzS/Ywz8r7Hi2nGyrz7y8vOFeRcOtnJwWwf5rkwpMK8n6O/nFh/SOHrakG2+VNd4rq7+cmuRF7qvTa4nWU/Lu1Lf8lJ/huAJ57M+C9qX1t7IzmlJ5/coQdHHZMentuVV3y5vula0zZaSvFfrpqFleucAYry7fD3M3/bKykIfemadyCo8kp1Ukrhis3/ptmrxJ1wstg1nOmy8s+yF3aYWhuG9V4y+1jXdLqi4YSIFtaSu+Xb7+h5KarxdmsDo/zKzqy1xyOq/uh5KartS0TanFpMDyNX/lV4tNhqVfF63/oaTu26zs
bR4/AcgBAeSunIr+XBib3OfNi6mre0mn07l3af78+TRFU19evjonJ+fxxx93L+NY40Hu5X/sM5vPt0iS1HimX4i4o18/02++3f56tCRlt5y7bTbfvnbhfMcFU795qP/cDlW/06q630+T1v2pz2zuejdNbpyu77vW3afUYL70cTF9s+XckPmfpms3+zq/PN/xVZ/ZPNT/xRZq/opPrg2Z+03tbbs++YI0eumj5Y6xOC95zsmwaF1N7pvae2+2N6Ste+vDszfM3/7lw7a31qVn/a79xnc9f/mv/fv2tP/j+zude0rS0tIb/tz//fc95z7ev+9oV+/Nb29cuXBoz/4DZ7819559w0O8r5qZmYjcc5uffeuXOkWIKz/KbhmpZ1pnC8pbLSP1S5ufVV66LdDw/H7pqmMunwDc5a7Pu1PdMlLdcGd5/Z2ybSP1zZYcGtTTy4O1vOmesf7W8g0kw1NddYx68IPcxpH6Rmt5k2Vl/a2VDffrW34uXcFSKE2L1t2vb7lftu7W8nUWUu26M9Tv72Q3jCjrKzdbKzymZYj3KysO65L+aiQL8uVHF5ftsT/p10htmy2VjiH8mJ1K3Z3+dXnLSPXmf+YUXC7YfL++xSKvd8NFxuUxJ9P5QstgTuZfl9X976pXby0r/kTW68J/rNp+r3DptsS0bamld2tcI/eKrA0jL6yoSFny17wN9qoN13OK/zslbdMza+7VNPQtKb6QueZuzZs/ZJLQ/siyrb/UNNzKKb6Qt/4/axr/Xbb2cmZx18rGkaqaE6Stwjs1b97Nyz/8zIpbFdv/nZvpIcaXMzOQu7M0/PRqbHKPi4uvq3tpxowZqr2ZO3duXFzcokXPvfhi+fLlK1XLyCt9yl16+fhNJeKOfuP0bfM/T74qSdJLJ2+ab3e+k83qyWrtNpv7jr/k3hTdvGsvCdjJJkOXPspnhajchy61ijXcbm8kbxK5m/uO/5ZF8tGvnugzm7s/WiZJ9EPAtTYm9Gjjph2vFjliepe2PUXQfL0XuYtpmZIDl4b+8TERelpaetbuC99/fbSayr338ya6Uqxnf6f522OvOKd6BNe79FCSJA3lHpetY353MXtcts6X3HVxf8gknh2pb/m5evPd/OLDiuWd0zKH8ptG6jf8jYfYNN+97owsUxI7s0B+taHiJyLHBdm6OCr3lruZcta+KXPDSH1zf3pcti7DVNny88qCJhYL64vvytE3ybSMGIvF9eo5d33B//CAfVOGUi11MZG7an+o3B1DSPqbsWWktpZchxKMlvoWSw4TfdwfUnOPJHiZpHWOux0BcrapouV+zda7ywr/9MyKvrI3R1aV7qbvbkot/qGKzWps/3feEh7a83qY3BPTKhxpGXI9uJsjZ+c3Zaz9pZwk5Y8s2/qfwiVU2dnfrlE+AawYrG80pZHN79Wu/yv5rJBWkfyM58hdvlMWcnc/J/2wZmxyj48ncvcRlUvSggUL6upeio7m+Q73fvuWu0QtTCPu6B0d5qGbf3pZkiTjJyaz2XT8nV3vsucH52+ahzrfcWuICb01lbZMRd99mIXncrXKFss+vmQeuvZJqSz3G0drlN6yTw9vSZL0MsntDPRd+vLo+411eQZlY6WoY4FL3JNqRSnzMnLkLsqdFDN/d+d7+TlkJqE9idx7/7xJTe79n2/itQlaZ51xdI4vBYvcSegak/RxutG0suEnMrO67dYiqmMnudPY1li8SU5NxGWTrAg1NZFp47UUJcilgl6ey+W+rkPZJH6Fpb7FmpuenbTKWt9iKSjoyGDPFbcqWkZWFqymLSoXg2xdXIeHCVUq9KabLJtERf8zmyeQLzaq/aFyX1mg3A60g3xK2Pw1yddnkKC+tunH5av+lr6wmaeV3BIybIxcyg6tszVE7vbl+dzdhXdqqXNpQD2YQ1IrmwyF/RUOa8uRtYrcibt/qXmdzMFWvW6v2T5Cc/FE7suy1eTO0j4LO5Zv/aV++/2KV/uWLP3AtXtit8lA
IHd+Jvr1/7HJ/YknnqireykuLt57n5588sm6upfi4z0XG4XcWbzcuSs1WijMkiqXvjzfITyPvCVH5UqvXj1xm9wn4/Q0fbKCvE9roJkfVjp6V6d56OYfKxS5lyu1VB+9SVJD9DXJuZ+/dINVe/vSB6S86mMCaRkXud/5y3ZXX49D7kGcllHMKy8svFbO0h0uE6rMjHkO36WU36tvubOIpUHEKc2FRJQrSUkaudPoXq6chNtkhpZK/KfS+lvLhefS3B10PamT98pDzp0G3S639tfW/pltxS42jilWpT+uQ1idQS9LrCTJudf/WLHtZ1LttrvZXqaRPaZlLpS0cPmmVSQu+XYNdW7meqZmpvLdSxpIEkY0r7rcX+9fKOqYLPuSOy2fvPDwImN/+fb7y/OdWnG0iLSMqjT8tHJscpckqbS0rKiIBcEeu6TXx9XVvTRvXozHEoKvxTLO5k19v2vI3LX3/XNDZh53i6InG0bHxqqE0Vva+4fMl07K0f07u97dc+YmNzi7PCiZnOiNjuQ+Tcs4svNZe0jOh92vGW1YJN+VH53/UdeQ+cZRxzVAHIAkeZ5QZab2GblfOZBFSr7R3v/9V/tX0zB89cddX3/+XsG4InfNJ1SzNqeSCdXS+cyD4iSquMwmVLM2pTp0qXiTLXyc2/BTZX0HT7Zk6+JovqW+3e1uGZ7jlqt6J4eHvTQN8j8Z/I5JffGP/B4bKnfHbO3q1Kqf2PVAR2ZBxftw1srBsiPZQruXcbNSLedOPwH8tNzIA/+CjqUbfpZrZhebFrX+ULkL2fmjBU0j9fK1Z3VMUoN8RVnw91K1Rvn1JlvncUJ1U8ba+1VrO+gdkB/kbLhfs/ZMYlqFoezf9Y3fPsNMndH1wnZ70VIn7TrJveq/qYVPFL1+/4UVLKuzO2e9pTD/LZ9yf6bc8kIZS/STS0hJsVMrDrljQtXZFv59NWa56/VxtbV12dk5qpn36dOnx8bqi4uXr1lT+dhjj3nsO5H77c42nl15Z9e7b9Y97RZWP72ry2zuu/nPoUtygkWSoqm4r518o3zRwucq3jhBsjQsJFfaorOvQ527WE6GrSa3RZpvniT34ZA52yFz98k3yvPzyt88TmZKz7fQ6VYq96H+r/bWLMvPq93b2T9kvnaYSJwmea6deLM4Pvpp4472m0PmSx/nKe05L3i+FXI0cl+z78Id8+07nR+XpGU1HblClr//bsj8Xc+R35H7YcYRuWt+K2T88/PWX1n5ux9WK35317ehdH7jYMX6KyXxz89zf1deszq9lkxslla0pyzYEb+wPf8V8nJ5Lk2/0BtRCnJ3xCet1cWxktb8nD3xC3akGn+s5slxKveR6g1/pzX83biN5zrknPvP5RV/TlqwIynnJpm0rG+nF5JjRaTYzfT0Jn3SHnr/JU15J51Z2czW74hP/3M+uV3HPedOryuOawa9DLBkfQHptsf+ULnXN1voEFij8kjpPIE1P+edmLiGhLz+ypaR0hU7HDZ3oef5VsjEZ07kNdyv3X6/ZvtITcM1Wehpn+Ss/08tSbPcr91uX1XG9O0wryL3xCWmiu2/1DR+S2L2JdfK3vyl9k17zfZfeFW+IneVTRytOOSOWyGdbeHfV2OWOwmXo+eUlBjZ15dU/62pqU1MTPLWcSJ357QJjYWdI3dJin6TxODCvS6kdePejpt824G+jndKnWN3OvvqvIkkSfQLTbeP/1ZOy7y/43w/68CA6chL8uwokfuNo++StD6t/+b5d5eyuqNrPuqWy5uHzDfOv+v1S7S+MjOuyRZvafq8suqypd4KuGXYxcKqOZkJTqjqs3SafYnpDxn1VseXmJp/Mq74kHvtw1yqVz4J+YfMdTQpT+YG75euOsrifZYGyVhlYZXUNt3KkO+SZGmZvy+qZxO2I7UN1wzKRGXq31Y20RxIy0h9873lBX9gjeozTWX0q0nkGpP3d5Wce+rlMufbY+iG9AMHne/12B+Wlik+s3wb+7UG
x9e1dHEfZpOrmry+or5D5XthouI9ZmaoTJ/ZwWY1HT4lYfu2lIVeJzlZaO/8b/LCD1OeURO0czGhoU3JGV43wZeYvCnRD++NR+6sGzNnztTpdHq1x29+85QfuupUZexz4/xdAcf1g/wEwiLxrhcm93JJijaoVh67cFm+99lU1kVfwftY5O7V3aLHVZdVw/aJyz0uS5e4LPbVm6ua/rfS5ZfC0ioSmyyVr367KpH+hXvRSh6X6Rf9F/DUhFgsocFpgjGhKX6B05QjkynJXCc0xzvV4Mi5k98JIOG/S1Iom/5+gHujpDPj/lEBj/1x5Nw91O82NPcOy2u8BO8etTsmR/utMH5+wElh/n8xfrn7v29+acEhd7fqFbm7vTPmFRHyw2ENfWVyyCn8fNjGf5YlFQqZdFeretSWm399lnTI1HVbh9x9VqJhAY/9cchdCxr44bAxn5ARuUHEyd34UXf/P8+8obaz3/jidn/Xx0a1t8axLrB+9/KTkJpE7kym85fGKD/2qyzM9/mTMloIjnbg2aqfKl+56Lg7xaH4Pdmv/FRZ9YWG4h5NVR77k36xtOmnAnKPpkbP0PK7009C4lbIcdhkXJtEnNzHRWmcG+GPdWjlMtTjTgB/rGOcp2XEbAa5+31Xh+if2ZuHP7OnUZTt7mUN14THn9mLxZ/Z84OHIHc/QA36Kn1+QxV/IFtD/6IqVwJu31DV4w9k+0EakLsfoAZ9lT7l/lTikzGL5+APZLtaKRRi+RDos5vc457XRS94Cn8gW1tzQO7a8gyN2nzKffqTUmw25K7Z/GcICPdBXrdc5J41Nz533oxZj0Hu2uoDcteWZ2jU5lPuUVFRs5J/o8/RxWTMiV08JzZzrj5zrj5rrp6elnFZsB4ITICAs9zjc2OUsH3q1KlRUVFTpkz5ddKkXydNmjJlSlRUFJO+eNCGxmkW6F5C7oHeA4FoXzxPvIRLuvRZcUvmxWZC7hMQ2YOMiEOlLUXu2XPnL42JWRg9bdo0dhxC7hr6AHLXEGbIVDVKuU+dOjV6wVPxS+fFPT8vLkenz0bkDstrROD5efG58+bnx8wxzGJHI+SuuT4gd82RhkCFo5f7tGnTHp81Y45hVlyWLiE/JmFZbMKy2MRlsYkFeILAeAmQn6aYN+fpWY/PnqEcipC75uKA3DVHGgIVKmeU8nEYic4Q2G3h2EXlUITcNd+9kLvmSEOgQuWMgtxDYG+FdReVQxFy13w/Q+6aIw2BCpUzyqfcp06dKqUIHXUAABQXSURBVBYOgbGhi6FDQDy0IHfN9xvkrjnSEKjQ/aTylJaB3ENgd4ZsF92PQ9wto+HOhNw1hBkyVbmfVJB7yOy8cOmoeBCKnyA9HYpi+XBh4N9xQO7+5RuctYvnCfs47H5GqX55hG0YnINCr0KIgHgEsmXlOHQ/FNlb4iYhNNIAdhVyDyD8gDUtnifKSeXytUAvcp82bVrAuo6GQ5+AePiJZmc5Gchdqz0MuWtFMpTqEc8un3J3SbuL22IZBDQhwA5CyF1biUDu2vIMjdrEE3I0coffRWJY1paAYnbIXVt9QO7a8gyN2sSTc5Ryh99FaFjWioC72ZGW0UoikLtWJEOpHvHM9CJ3Je0unoEuy2JVWAYB7wRcDh6Xl1H0MYU+8KuQExcK5D5xhqFXg3gGshPMJVxiP7XKTrbRKN7lLMVLEBgTAeVIY8ehy9w+q0o8aEPvlAtEjyH3QFAPdJviecLOHFHujz76KIuexFNOXB7TeYvCIOBOQDycxGV24D366KOI3CcuCch94gxDr4ZRyt274sVzEssgMEEC7GCD3DW0CeSuIcyQqcqn3B+lD/F8wzIIPAAC7MBD5K6JSiB3TTCGWCXe5c4+EeNfEAg4AfyZvYmYBXKfCL0Q3lbxO8uHss/UNyZPDvj5jA6AACNwY/JkdliyQ1Q5YvEF6VF6x5vcZ8+erdfrExISEvEILwIJ/DGfPuLj4+Pi4vR6fUxMzDw8QCAICMTE
xMTGxur1+ri4uPj4eHagzp8/nx254XU6+ms06nKfMWOGXq/X6XQzZ86cPn36KC8UKBZyBJRoSIzfJzgzhs1BQBMCyj02ylGKmH1MhlGXu16vnz179pgqQuEQJaCcOcq5hAUQCB4CyvEJs4/VMCpynz17tk6nG2tFKB+6BJTzJ3hOafQEBFx+8SJ0z69A9VxF7nq9fubMmYHqENoNCAHF7xNfgJVAgBGY+LGk1BCQkyLUG1WRe0JCAvLsob5fx9p/5SzCAggEG4GxHswozwioyD0xMRF0IpNAsJ3V6A8IROaZqMmoIXdNMIZJJVAJCAQPgTA5qQI3DMg9cOzRMgiAAAj4jQDk7je0qBgEQAAEAkcAcg8ce7QMAiAAAn4jALn7DS0qBgEQAIHAEYDcA8ceLYMACICA3whA7n5Di4pBAARAIHAEIPfAsUfLIAACIOA3ApC739CiYhAAARAIHAHIPXDs0TIIgAAI+I0A5O43tKgYBEAABAJHAHIPHHu0DAIgAAJ+IxDmcq8+0NVztautdnz8atoumHouHKoe89bNx66aeo430+3E5TFX5K8Nmk/0XDUdYx10a2Ni0NyqwwoQAIFAEAhzuTd2DNtsw2c3jw/tlrODdtvgucYxb32ox2a3XT9EtxOXx1yRvzY4aLLZ7D0H1aufGDT1OrEWBEDgAROA3L0Ah9y9wMFbIAACQU1AQ7kbtx3v6jVbLcPDA1dP765KoeNe8nZ7d8/V7k957Jy89cTlq6bOo1uW0LczGg6dvWoeHLZaBs097a3VBhmWnBlY2/DpRfLu4PUT27Kk5KrWU9eHLcPWwf7uT19h9UuSnGFYvPFw98Agqefy8eYSztw1CDU07OswkWLDw70XPtuYxcup/8/lrrR7XRkX20B1yJIkidG6uEy2ymj4rLOXjMJiNp090JCh3rS8dtvRbjGtRLGYTu10GnvnwfW09OKNhxl/6+D1c20Ni5WKZZhbGb2+Uw2S5BK5UyxkL5hNp1prXKEpFWEBBEAgdAhoJXdj23WrzWa3DJh6rvYNWu0kGbKVOmjt6QGb3TZwep0kSQaqS4uprZQQKjxgstjsNutw71VTT++wTSkmSdQv1kGz1WLu6zXbyVvfdPdY7PQlachmMbUVU8zUUwMDZpt1eOB636CFFLZcP8T87uQpw5aztKrBXlPPdTNpmvfEw/6ivR0eHiTtmgcGaTcs3bvlK5DnIXuWe8lBx5AHhmlXr8pdVe1D4dE+m83ee7SIvlt0rJf24UorK/z2V1abzdr5e0mS5M7YzH1kaIS/teegkRVjMC3DdpvVahl2kzvHYhs09/QOW6zWgYGJ5LJUx4GVIAACD5qANnIvpIYd7NiSzPpf+lmv1W7rP8GmItedNFNDGRvPDAvSWbLteHdvvyx6SUrZfdFKLgkNpArqIzuvMGXfFSI15SWVmr3nAG2MNk0y48y5hhpqQGsnnS0U5c4cpygveeu5QZvdclEWpRp4Kneb9fJ7LApOEdv1OmQxWheWiw/1WO224e7d8icGZmTr5VYeibt3opZeGq/S9L2BVDU4OMynAWj3rN27JYl1xnKxVf4cUHqox2K38esQg6lc8EgjQuReeIRcP5Rtk6tO9NrotZl/2HLvFNaAAAgEPwFt5N521W6z9R17uaa6Vn4e+4as+VQOctefGrDbLMODVkdM7YQmy1hd20ylLE9+ilJWXO+YABTcxDw1cLJGqTB5bze5EpwhVwmhngYStg937eM9rK5t7VTmS2lup+eqSX7Kd8hQe4oTquwaRmv2OmRB6GIUf4BMYw60sywK7S+7wHy1U5IklnJx9EG+2aaGoBvueluSpNZucvE73m2zmU/VShLzPo3iaWfMp9YqDOSPPjSo5xC2Ot4V5d523S5XyN/fTS6l456F5rXgfxAAgYAS0EbuLN1BsiVOT0EQzV0kDWIzH6sShltKs8Akh0CfLJlDA0ZByqQ8iz29yN3xliRJm0lIbqPRrlAPFa5T91i7
pjZJkvZ3kyS48uw9vZE060HuHVskiV4qVGpjQ1aX+0bywcXlHhVHyY0n+xwdGLZaLu5npJQhkM8Nw11vF5PIuvdoUbJ8hUvhnaEDUegK1z+lBuVNQe5sIE7bqpR3bIklEACB0CCgjdw/pXH6MUdQzOL31YU8PU2icovVYrNbvtopp24kGs7bhnva92+rrSnJ4gHmuOTee1jIbPyeXEgGiYLFOpmpu3a7dtLoeUrTi9xTvA7ZoWxxcjX5ME2gi10Vsy6eDhh6reo9uoWE8CROp7H8lVYaX5vaCGG5M/xzEqmomqbCWOZKRdYO9fOBsAkM2geafRIuzJ46hvUgAAJBTEAbuVN9WC+/pxg2ZeN7h95ukL1ZIs8K1lBrWDt/z4qJBiSGool12SkuPqIvhZjX4SY5fWzrPcHvkOGZ8YOkFaGeFJZ/OEbncukeMW47sH9bRRG/2LjvJS9yZzV7GrI4NGGZzS07uiqVHCezEWJOyb0TkrSzc9huu27qJVE/H9Sgqcdst/WeKKQbrGuXZzX45sZj/Y5kiwCBvy8AZGx7j8qzr5KBNoe0DEeF/0EgRAloI3epmM3gmTsP79xY29x2gd6L8s1nRD2ldILOTL8KZGi9bKEZZBJvNhNnWc1n9zZU1zbs66CbcKe4+IgJyJF7EdzEMgwWi3Xw6um2A4eOsaaHu96mHxqc6mHpmkHTqb3N1Q07j10lSRLLBQ9f0yT705vcvQ1ZzLM7LRe1XSW3+gxePfF2Q/O+dhNJH/FpTy8HEA3S6RwGi6/ZQBx30UhyZ2zDPcd3bnxj/yk2tCut7LrlBIE1IwJkO4h8hDrUduBEZ7/VQu448kfkHnVw8qRfJynPyQejaG+iHv4XXXnxEaXAKN5S5aVS1aSLjyhFlfpZH1xamXwwiheY/HCuJL32kNzVfz3MOipJEi8gj+Kh15S6sQACQUZAI7lLUvIrn5FYkqehLb2nt5F7QoziNKly+6Plyv4MSUreerqX3rlIthrsPnXRcQeei498yr3neGvnAG99sHuffJe9GLkT8CXvneulNyCyfg5e3K/cWa+2X7zK3eOQvd7nbmj4lJpXBmV2dFWtA3wdmUp1xOlyLM+mVXkRwp/drEl3gTg0F5hkC1HuDAvfEZZvTuyb0Nd6eYdc/3fRIvMmlSM3suD9Sb9O8vWWa/30tZeqXL3sswOy2Wmv2GXgkYuOK5PyLvyuuiewMvAENJM7G0pywepqmkAf9cgWl9TWVHvLjfiqyeGplMKKmupSx5d3PG2ZUVpTXavMB3gqNdr1Yx+yJBmKymtryguULNZo2/JZjnZmfEMbLT2ffVAvwLXLouDchyc7pMnf+vUhGmM/8hCzPIm4vbyl2oxzedVWvHSAvqVchKi15c5QuYsdI0G8/IFDiOtV+4SVIBAYAhrLPQCDcMg9AI2jybESUHId7nLnlpQDZPLSVaDCW6oNu5QX1czLe+kATeBwudPMjCQInV8qlJhdXuDd5g3gfxAIDgKQe3Dsh7DvBdcuy4BzUdKImL/lErmLcld5S5UYr0oWrih3/paXDniXuyh61caxEgSCikDoy518AYrcSYlHMBPgNncJe53l7pTR9vWW6mi5wd3lPpoO+JC7pJpzFydsVfuElSAQGAKhL/fAcEOrYyXA0x3E4A+9xi1MfMqX//XwI/x2Gmp2p7y221uq7Tuqove3iJG704Sqegd8yR13y6hCx8ogJQC5B+mOiaBuuRhZHLmXt8RiWAYBEHAjALm7IcGKB0zAi8E9vqWSIcHE5gPeb2guyAlA7kG+gyKgex4NLmZslK8RRQAQDBEEtCAAuWtBEXWAAAiAQJARgNyDbIegOyAAAiCgBQHIXQuKqAMEQAAEgowA5B5kOwTdAQEQAAEtCEDuWlBEHSAAAiAQZAQg9yDbIegOCIAACGhBAHLXgiLqAAEQAIEgIwC5B9kOQXdAAARAQAsCkLsWFFEHCIAACAQZAcg9yHYIugMCIAACWhBQl3si
HiAAAiAAAqFMQF3uWlw2UAcIgAAIgEDACEDuAUOPhkEABEDAfwQgd/+xRc0gAAIgEDACkHvA0KNhEAABEPAfAcjdf2xRMwiAAAgEjADkHjD0aBgEQAAE/EcAcvcfW9QMAiAAAgEjALkHDD0aBgEQAAH/EYDc/ccWNYMACIBAwAhA7gFDj4ZBAARAwH8EIHf/sUXNIAACIBAwApB7wNCjYRAAARDwHwHI3X9sUTMIgAAIBIwA5B4w9GgYBEAABPxHAHL3H1vUDAIgAAIBIwC5Bww9GgYBEAAB/xGA3P3HFjWDAAiAQMAIBJ3cM0prSrIChgMNgwAIgEB4EAgmuZfuvzxot9nsNmt3eMDFKEAABEAgUAS0lHtSUlJhYWEFfRQWFiYlJY1pVLuv2G3WvmNv1FRXFI1pQxQGARAAARBwIaCZ3J977rlKt8dzzz3n0p6Xl23X7bbBc41eSuAtEAABEACB0RHQRu5JSUlM7AaDYQZ9GAwGtmaU8XvPVdPAsN1mHe69auo53jy6zqMUCIAACICAOgFt5F5YWFhZWWkwGMRGmN8LCwvFlZ6WIXdPZLAeBEAABMZBQBu5V1RUVFZWzpgxQ+zBjBkzKisrKyoqxJVelpGW8QIHb4EACIDAmAhA7mPChcIgAAIgEBoEtJH7xNMykiT5PXJ/5KFfJ01iz4uP+Gn3RB2c/NBrrG7enN/a8tMQUC0IgEA4ENBG7hOfUPW/3KMOTqZmf8hfXs99eDK9ckDu4XBiYAwgEOoEtJG7JElBfyvkIxdZ2P7A5B7qhwb6DwIgEMoENJO7JEkT/BKTP9MyPGxnfp/8cG7Uw/+iy3LORE6hTD4YJUmOtxxbCakVx0oSp09+OJfufh62y2mffz0cJbmlZV57SH5X3NBXc24H15i6N4mO6P9/LJK3mnwwivef9lzpEumw/OAF5BQW/yDC38b/IAACIUFAS7lPcMDBJneWnef/Mse5iI/KmlrSl9zVNpxE6+Sy5g2xC4BnpXorP9ZWhIuNfBngn29ks/vqzAR3OTYHARDwG4Egkrvfxsgq5tpiaRmuSC+R+6+0JLc2DYF5MC7Huc6V8JLcy7wwaUJclpxf8kpUmlMl4qU8f4t1j/fH6eMIfUu5BtCuyn1zGqCMxblC1e5gJQiAQHASgNzpfhEF52I04S0nXbrtT/6uitz5WzxJIslXGqJaz825tUBXjKK8kmyhnwac5E6tzeXOckrChYf3U4zoybJ8MVPvENaCAAgEI4FIlzvTFrehkwdlowlydw3AnT3Lzagid9cNBZ9qKXfeHxZ38/44Dcq73F37GYxHLPoEAiAwKgKRKnceO8uZ7smT6fyqkwdV5C7xmFctJc1lSkJd4lDR4KobssCZG1mtObVd6Lm82AEhg+80KB9yd8HCh8muFmq9wToQAIEgJRCxcnfIl9zxIhvTyYMebOvsdyZoeefytD7LYzjJnZTgnw9o0kPZ0LOs1Q8Zb+XFvj30Gi9J1CwuK1cat7QMbVGsZNKvbNZXvStYCwIgELwEIkfuwbsP0DMQAAEQ0JwA5K45Uq0qdHwOUHIs8ocJrVpAPSAAAuFLAHIP332LkYEACEQwAcg9gnc+hg4CIBC+BCD38N23GBkIgEAEE4DcI3jnY+ggAALhSwByD999i5GBAAhEMAHIPYJ3PoYOAiAQvgRU5J6QkDB9+vTwHTJGBgIgAALhT0BF7nq9fubMmeE/dIwQBEAABMKXgIrcZ8+erdPpwnfIGBkIgAAIhD8BFblLkqTX62fPnh3+o8cIQQAEQCBMCajLfcaMGXq9XqfTzZw5E/n3MN31GBYIgEA4E1CXOxvx7Nmz9Xp9QkJCIh4gAAIgAAIhRcCb3MP5ooaxgQAIgEBYE4Dcw3r3YnAgAAKRSgByj9Q9j3GDAAiENQHIPax3LwYHAiAQqQQg90jd8xg3CIBAWBOA3MN692JwIAACkUoAco/UPY9xgwAIhDUByD2sdy8GBwIgEKkEIPdI3fMY
NwiAQFgTgNzDevdicCAAApFKAHKP1D2PcYMACIQ1Acg9rHcvBgcCIBCpBCD3SN3zGDcIgEBYE4Dcw3r3YnAgAAKRSgByj9Q9j3GDAAiENQHIPax3LwYHAiAQqQT+H9NEFoAsrn1PAAAAAElFTkSuQmCC" + } + }, + "cell_type": "markdown", + "id": "d07d54f3-dfb9-46b2-be50-d6d35f81b3c0", + "metadata": {}, + "source": [ + "🔄 What Happens When You Call .remote()?\n", + "\n", + "some_function.remote() → Modal SDK sends API request\n", + " → Spins up a container\n", + " → Runs the code remotely\n", + " → Sends the result back to your local machine\n", + "\n", + "![image.png](attachment:886d059a-a8ca-4552-86d2-fb87fb824441.png)\n", + "\n", + "What we have here is an **ephemeral app**: the container shuts down after finishing.\n", + "\n", + "For our project, we need a persistently running app that behaves like a production API. To achieve that, we should use `modal deploy -m`, making the app suitable for serving AI services reliably." + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "9422c5b0-573e-43a4-99b5-2c6199770f5c", + "metadata": {}, + "source": [ + "## 📦 Persistent Deployment with `modal deploy`" + ], + "outputs": [] + }, + { + "attachments": { + "b84a3557-9805-462f-a1d5-008b3aa4f4f5.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "8edb6eb4-4489-4823-9d16-a01c8d0355b6", + "metadata": {}, + "source": [ + "Click the blue \"+\" button at the top left of JupyterLab, then choose \"Terminal\" to open a new terminal tab.\n", + "\n", + "There, you can run:\n", + "\n", + "```bash\n", + "conda activate llms\n", + "modal deploy -m modal_services.get_started\n", + "```\n", + "\n", + "This builds and deploys the app (`example-hello-world`), registers `f()`, and makes it callable via `.remote()` anytime — even outside the notebook.\n", + "\n", + "![image.png](attachment:b84a3557-9805-462f-a1d5-008b3aa4f4f5.png)" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "656456f5-b3f9-40bf-8a2e-169da0a68fe8", + "metadata": {}, + "outputs": [], + "source": [ + 
"from modal_services.get_started import f\n", + "f = modal.Function.from_name(\"example-hello-world\", \"f\") # (app_name, function_name)\n", + "print(f.remote(20))" + ] + }, + { + "attachments": { + "b950fed1-8806-424c-830a-d8b99927801e.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "7d2c4b43-97a7-4a8e-a0df-1e20de053d5a", + "metadata": {}, + "source": [ + "## 🚀 Deploy Our first Modal-powered model\n", + "\n", + "So far, we’ve seen how to run simple remote functions using `@app.function()` and call them via `modal.Function.from_name(...)` in a **persistent app** — good for basic tasks.\n", + "\n", + "But in our Smart Deal Finder project, we need more:\n", + "- Load and reuse a large model (like LLaMA) \n", + "- Keep the model in memory \n", + "- Expose one or more methods (like `price()`)\n", + "\n", + "That’s why we use `@app.cls` — it lets us define a class (e.g. `Pricer`) that lives in a Modal container, loads the model once in `setup()`, and handles remote requests efficiently.\n", + "\n", + "Full code : `\\modal_services.ft_pricer.py`\n", + "\n", + "---\n", + "\n", + "\n", + "🚀 In this step, we’ll deploy a class-based app using `modal.Cls.from_name`.\n", + "\n", + "Specifically, we’ll deploy `Pricer`, which loads our 4-bit quantized fine-tuned LLaMA model (trained in Notebook 9), and exposes a remote `.price()` method to estimate item prices.\n", + "\n", + "⚠️ Before deploying, add your HF_TOKEN in Modal\n", + "\n", + "Then open a terminal and run:\n", + "\n", + "```bash\n", + "modal deploy -m modal_services.ft_pricer\n", + "```\n", + "\n", + "This will:\n", + "- Build the image with your code and dependencies\n", + "- Deploy the app `llm-ft-pricer` and register the `Pricer` class and its methods\n", + "- Not start any container yet — setup() isn't run and the model isn’t loaded\n", + "- Prepare the app to handle `.remote()` calls when they come in\n", + "\n", + "![image.png](attachment:b950fed1-8806-424c-830a-d8b99927801e.png)" + ], + 
"outputs": [] + }, + { + "attachments": { + "1c697283-e5e2-4b09-b1f1-d1c11f18c8e4.png": { + "image/png": "" + }, + "4a22e438-6b25-4c69-9439-99d146ffd188.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "afff309a-740b-443f-96f1-4b20618ada8b", + "metadata": {}, + "source": [ + "## 🔗 Connect to Our Deployed App\n", + "\n", + "Now that our app is deployed, we can connect to it and use it like a remote service.\n", + "\n", + "We'll do this using `modal.Cls.from_name(\"llm-ft-pricer\", \"Pricer\")`, which fetches the `Pricer` class from our deployed app via the Modal API.\n", + "\n", + "Then, calling `.price.remote(...)` sends a request to Modal, spins up a container if needed, loads the model, runs the method, and returns the result.\n", + "\n", + "This is how we turn our model into a cloud API.\n", + "\n", + "What happens under the hood when calling price.remote(...): \n", + "- First run = downloads model files → stores in volume (/cache) → loads into memory → runs \n", + "- Later runs = load from volume → memory → run (no re-download)\n", + "\n", + "---\n", + "\n", + "Since we added `min_containers=1`, a container is created and kept warm as soon as the app is deployed. Models remain loaded in memory, so there are no cold starts — unless the app is stopped or the container crashes. 
\n", + "\n", + "![image.png](attachment:1c697283-e5e2-4b09-b1f1-d1c11f18c8e4.png)\n", + "\n", + "⚠️ However, this **continuously consumes credits** if you forget to stop the container or app manually.\n", + "\n", + "To save credits, you can set `min_containers=0` and `scaledown_window=300`; this way, no container stays warm by default, and a new one will spin up only when `.remote()` is called (i.e., on cold start).\n", + "\n", + "![image.png](attachment:4a22e438-6b25-4c69-9439-99d146ffd188.png)\n" + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b478ae8-f636-4ac7-bf53-0f3b23c21a72", + "metadata": {}, + "outputs": [], + "source": [ + "Pricer = modal.Cls.from_name(\"llm-ft-pricer\", \"Pricer\")\n", + "pricer = Pricer()\n", + "reply = pricer.price.remote(\"SEVERIN 28L Microwave, 900W, 5 power levels, 35-min timer, turntable (31.5 cm), Silver, MW 7772\")\n", + "print(reply)" + ] + }, + { + "cell_type": "markdown", + "id": "2e63efbf-344b-4b5f-8a0d-27b6e41f8508", + "metadata": {}, + "source": [ + "Now that we’ve deployed our model and learned how to call it remotely with `.remote()`,\n", + "let’s go one step further and wrap this logic inside a local Python class.\n", + "\n", + "In the next step, we'll build a local Agent that cleanly interacts with our deployed `Modal app`, using the same `Modal API` under the hood." + ], + "outputs": [] + }, + { + "cell_type": "markdown", + "id": "8fbc7696-e892-4f08-80c5-5199b03ed175", + "metadata": {}, + "source": [ + "## 🔌 Connect to Your Modal App with a Local Agent\n", + "\n", + "`ft_pricer.py` is now a deployed API on Modal. \n", + "\n", + "To use it locally, we’ll wrap it in a class called `FTPriceAgent` (Full code: `\\agents\\ft_price_agent.py`) that:\n", + "\n", + "- Connects to the remote app via `modal.Cls.from_name(...)` \n", + "- Calls `.price.remote(...)` to run predictions \n", + "\n", + "🔄 **Two API calls happen:**\n", + "1. 
`modal.Cls.from_name(...)` → fetches the deployed class \n", + "2. `.price.remote(...)` → runs the remote method on Modal \n", + "\n", + "This keeps our code clean and modular." + ], + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b80cd15a-e419-4c21-97e7-4a56ed4db680", + "metadata": {}, + "outputs": [], + "source": [ + "from agents.ft_price_agent import FTPriceAgent\n", + "\n", + "agent = FTPriceAgent()\n", + "agent.price(\"Apple AirPods Max wireless over-ear headphones with active noise cancellation and spatial audio\")" + ] + }, + { + "cell_type": "markdown", + "id": "65522b93-59c9-4d15-a12d-58e078b88545", + "metadata": {}, + "source": [ + "Now that we’ve seen how Modal agents work — connecting to remote services and running `.remote()` — we’ll use the same pattern for the rest of our models.\n", + "\n", + "✅ For each model — **XGBoost**, **GPT-4o RAG**, and the **Ensemble** — we’ll build a dedicated Agent. " + ], + "outputs": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/agents/__init__.py b/week8/community_contributions/lisekarimi/agents/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/week8/community_contributions/lisekarimi/agents/base_agent.py b/week8/community_contributions/lisekarimi/agents/base_agent.py new file mode 100644 index 0000000..fe09e18 --- /dev/null +++ b/week8/community_contributions/lisekarimi/agents/base_agent.py @@ -0,0 +1,33 @@ +import logging + +class Agent: + """ + An abstract superclass for Agents + Used to log messages in a way 
that can identify each Agent + """ + + # Foreground colors + RED = '\033[31m' + GREEN = '\033[32m' + YELLOW = '\033[33m' + BLUE = '\033[34m' + MAGENTA = '\033[35m' + CYAN = '\033[36m' + WHITE = '\033[37m' + + # Background color + BG_BLACK = '\033[40m' + + # Reset code to return to default color + RESET = '\033[0m' + + name: str = "" + color: str = '\033[37m' + + def log(self, message): + """ + Log this as an info message, identifying the agent + """ + color_code = self.BG_BLACK + self.color + message = f"[{self.name}] {message}" + logging.info(color_code + message + self.RESET) \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/agents/ft_price_agent.py b/week8/community_contributions/lisekarimi/agents/ft_price_agent.py new file mode 100644 index 0000000..465f1bb --- /dev/null +++ b/week8/community_contributions/lisekarimi/agents/ft_price_agent.py @@ -0,0 +1,29 @@ +import modal +from agents.base_agent import Agent + + +class FTPriceAgent(Agent): + """ + An Agent that runs the fine-tuned LLM that's running remotely on Modal + """ + + name = "FTPrice Agent" + color = Agent.RED + + def __init__(self): + """ + Set up this Agent by creating an instance of the modal class + """ + self.log("FTPrice Agent is initializing - connecting to modal") + Pricer = modal.Cls.from_name("llm-ft-pricer", "Pricer") # 1st API call: to fetch Pricer (remote class) + self.pricer = Pricer() + self.log("FTPrice Agent is ready") + + def price(self, description: str) -> float: + """ + Make a remote call to return the estimate of the price of this item + """ + self.log("FTPrice Agent is calling remote fine-tuned model") + result = self.pricer.price.remote(description) # 2nd API call: to run the price method in the remote Pricer class + self.log(f"FTPrice Agent completed - predicting ${result:.2f}") + return result \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/helpers/__init__.py 
b/week8/community_contributions/lisekarimi/helpers/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/week8/community_contributions/lisekarimi/helpers/items.py b/week8/community_contributions/lisekarimi/helpers/items.py new file mode 100644 index 0000000..a594e27 --- /dev/null +++ b/week8/community_contributions/lisekarimi/helpers/items.py @@ -0,0 +1,120 @@ +from typing import Optional # A variable might be a certain type or None +from transformers import AutoTokenizer +import re + +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" + +MIN_TOKENS = 150 # Minimum tokens required to accept an item +MAX_TOKENS = 160 # We limit to 160 tokens so that after adding prompt text, the total stays around 180 tokens. + +MIN_CHARS = 300 # Reject items with less than 300 characters +CEILING_CHARS = MAX_TOKENS * 7 # Truncate long text to about 1120 characters (approx 160 tokens) + +class Item: + """ + An Item is a cleaned, curated datapoint of a Product with a Price + """ + + # Load tokenizer for the model + tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True) + + # Define PRICE_LABEL and question for the training prompt + PRICE_LABEL = "Price is $" + QUESTION = "How much does this cost to the nearest dollar?" 
+ + # A list of useless phrases to remove to reduce noise for price prediction + REMOVALS = ['"Batteries Included?": "No"', '"Batteries Included?": "Yes"', '"Batteries Required?": "No"', '"Batteries Required?": "Yes"', "By Manufacturer", "Item", "Date First", "Package", ":", "Number of", "Best Sellers", "Number", "Product "] + + # Attributes for each item + title: str + price: float + category: str + token_count: int = 0 # How many tokens in the final prompt + + # Optional fields + details: Optional[str] # The value can be a string or can be None + prompt: Optional[str] = None + include = False # Whether to keep the item or not + + def __init__(self, data, price): + self.title = data['title'] + self.price = price + self.parse(data) + + def scrub_details(self): + """ + Removes useless phrases from details, which often has repeated specs or boilerplate text. + """ + details = self.details + for remove in self.REMOVALS: + details = details.replace(remove, "") + return details + + def scrub(self, stuff): + """ + Clean up the provided text by removing unnecessary characters and whitespace + Also remove words that are 7+ chars and contain numbers, as these are likely irrelevant product numbers + """ + stuff = re.sub(r'[:\[\]"{}【】\s]+', ' ', stuff).strip() + stuff = stuff.replace(" ,", ",").replace(",,,",",").replace(",,",",") + words = stuff.split(' ') + select = [word for word in words if len(word)<7 or not any(char.isdigit() for char in word)] + return " ".join(select) + + def parse(self, data): + """ + Prepares the text, checks length, tokenizes it, and sets include = True if it’s valid. + """ + # Builds a full contents string by combining description, features, and cleaned details. 
+ contents = '\n'.join(data['description']) + if contents: + contents += '\n' + features = '\n'.join(data['features']) + if features: + contents += features + '\n' + self.details = data['details'] + if self.details: + contents += self.scrub_details() + '\n' + + # If content is long enough, trim it to max char limit before processing. + if len(contents) > MIN_CHARS: + contents = contents[:CEILING_CHARS] + + # Clean and tokenize text, then check token count. + text = f"{self.scrub(self.title)}\n{self.scrub(contents)}" + tokens = self.tokenizer.encode(text, add_special_tokens=False) + + if len(tokens) > MIN_TOKENS: + # Truncate tokens, decode them back and create the training prompt + tokens = tokens[:MAX_TOKENS] + text = self.tokenizer.decode(tokens) + self.make_prompt(text) + + # Mark the item as valid and ready to be used in training + self.include = True # Only items with MIN_TOKENS <= tokens <= MAX_TOKENS are kept + + + def make_prompt(self, text): + """ + Builds the training prompt using the question, text, and price. Then counts the tokens. + """ + self.prompt = f"{self.QUESTION}\n\n{text}\n\n" + self.prompt += f"{self.PRICE_LABEL }{str(round(self.price))}.00" + self.token_count = len(self.tokenizer.encode(self.prompt, add_special_tokens=False)) + + def test_prompt(self): + """ + Returns the prompt without the actual price, useful for testing/inference. + """ + return self.prompt.split(self.PRICE_LABEL )[0] + self.PRICE_LABEL + + def __repr__(self): + """ + Defines how the Item object looks when printed — it shows the title and price. 
+ """ + return f"<{self.title} = ${self.price}>" + + + + + \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/helpers/loaders.py b/week8/community_contributions/lisekarimi/helpers/loaders.py new file mode 100644 index 0000000..4314c65 --- /dev/null +++ b/week8/community_contributions/lisekarimi/helpers/loaders.py @@ -0,0 +1,106 @@ +from datetime import datetime # Measure how long loading takes +from tqdm import tqdm # Shows a progress bar while processing data +from datasets import load_dataset # Load a dataset from Hugging Face Hub +from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor # For parallel processing (speed) +from items import Item + +CHUNK_SIZE = 1000 # Process the dataset in chunks of 1000 datapoints at a time (for efficiency) +MIN_PRICE = 0.5 +MAX_PRICE = 999.49 +WORKER = 4 # Set the number of workers here + +class ItemLoader: + + def __init__(self, name): + """ + Initialize the loader with a dataset name. + """ + self.name = name # Store the category name + self.dataset = None #Placeholder for the dataset (we load it later in load()) + + def process_chunk(self, chunk): + """ + Convert a chunk of datapoints into valid Item objects. + """ + batch = [] # Initialize the list to hold valid items + + # Loop through each datapoint in the chunk + for datapoint in chunk: + try: + # Extract price from datapoint + price_str = datapoint['price'] + if price_str: + price = float(price_str) + + # Check if price is within valid range + if MIN_PRICE <= price <= MAX_PRICE: + item = Item(datapoint, price) + + # Keep only valid items + if item.include: + batch.append(item) + except ValueError: + continue # Skip datapoints with invalid price format + return batch # Return the list of valid items + + + def load_in_parallel(self, workers): + """ + Split the dataset into chunks and process them in parallel. 
+ """ + results = [] + size = len(self.dataset) + chunk_count = (size // CHUNK_SIZE) + 1 + + # Build chunks directly here (no separate function) + chunks = [ + self.dataset.select(range(i, min(i + CHUNK_SIZE, size))) + for i in range(0, size, CHUNK_SIZE) + ] + + # Process chunks in parallel using multiple CPU cores + with ProcessPoolExecutor(max_workers=workers) as pool: + for batch in tqdm(pool.map(self.process_chunk, chunks), total=chunk_count): + results.extend(batch) + + # Add the category name to each result + for result in results: + result.category = self.name + + return results + + + def load(self, workers=WORKER): + """ + Load and process the dataset, returning valid items. + """ + # Record start time + start = datetime.now() + + # Print loading message + print(f"Loading dataset {self.name}", flush=True) + + # Load dataset from Hugging Face (based on category name) + self.dataset = load_dataset( + "McAuley-Lab/Amazon-Reviews-2023", + f"raw_meta_{self.name}", + split="full", + trust_remote_code=True + ) + + # Process the dataset in parallel and collect valid items + results = self.load_in_parallel(workers) + + # Record end time and print summary + finish = datetime.now() + print( + f"Completed {self.name} with {len(results):,} datapoints in {(finish-start).total_seconds()/60:.1f} mins", + flush=True + ) + + # Return the list of valid items + return results + + + + \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/helpers/testing.py b/week8/community_contributions/lisekarimi/helpers/testing.py new file mode 100644 index 0000000..9422182 --- /dev/null +++ b/week8/community_contributions/lisekarimi/helpers/testing.py @@ -0,0 +1,84 @@ +import math +import matplotlib.pyplot as plt + +GREEN = "\033[92m" +YELLOW = "\033[93m" +RED = "\033[91m" +RESET = "\033[0m" +COLOR_MAP = {"red":RED, "orange": YELLOW, "green": GREEN} + +class Tester: + + def __init__(self, predictor, data, title=None, size=250): + self.predictor = predictor + 
self.data = data + self.title = title or predictor.__name__.replace("_", " ").title() + self.size = size + self.guesses = [] + self.truths = [] + self.errors = [] + self.sles = [] + self.colors = [] + + def color_for(self, error, truth): + if error<40 or error/truth < 0.2: + return "green" + elif error<80 or error/truth < 0.4: + return "orange" + else: + return "red" + + def run_datapoint(self, i): + datapoint = self.data[i] + guess = self.predictor(datapoint) + truth = datapoint["price"] + error = abs(guess - truth) + log_error = math.log(truth+1) - math.log(guess+1) + sle = log_error ** 2 + color = self.color_for(error, truth) + title = datapoint["text"][:40] + "..." if len(datapoint["text"]) > 40 else datapoint["text"] + self.guesses.append(guess) + self.truths.append(truth) + self.errors.append(error) + self.sles.append(sle) + self.colors.append(color) + # print(f"{COLOR_MAP[color]}{i+1}: Guess: ${guess:,.2f} Truth: ${truth:,.2f} Error: ${error:,.2f} SLE: {sle:,.2f} Item: {title}{RESET}") + + def chart(self, title): + max_error = max(self.errors) + plt.figure(figsize=(15, 6)) + max_val = max(max(self.truths), max(self.guesses)) + plt.plot([0, max_val], [0, max_val], color='deepskyblue', lw=2, alpha=0.6) + plt.scatter(self.truths, self.guesses, s=3, c=self.colors) + plt.xlabel('Ground Truth') + plt.ylabel('Model Estimate') + plt.xlim(0, max_val) + plt.ylim(0, max_val) + plt.title(title) + + # Add color legend + from matplotlib.lines import Line2D + legend_elements = [ + Line2D([0], [0], marker='o', color='w', label='Accurate (green)', markerfacecolor='green', markersize=8), + Line2D([0], [0], marker='o', color='w', label='Medium error (orange)', markerfacecolor='orange', markersize=8), + Line2D([0], [0], marker='o', color='w', label='High error (red)', markerfacecolor='red', markersize=8) + ] + plt.legend(handles=legend_elements, loc='upper left') + plt.show() + + def report(self): + average_error = sum(self.errors) / self.size + rmsle = math.sqrt(sum(self.sles) 
/ self.size) + hits = sum(1 for color in self.colors if color=="green") + title = f"{self.title} Error=${average_error:,.2f} RMSLE={rmsle:,.2f} Hits={hits/self.size*100:.1f}%" + self.chart(title) + + def run(self): + self.error = 0 + for i in range(self.size): + self.run_datapoint(i) + self.report() + + @classmethod + def test(cls, function, data): + cls(function, data).run() \ No newline at end of file diff --git a/week8/community_contributions/lisekarimi/modal_services/__init__.py b/week8/community_contributions/lisekarimi/modal_services/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/week8/community_contributions/lisekarimi/modal_services/ft_pricer.py b/week8/community_contributions/lisekarimi/modal_services/ft_pricer.py new file mode 100644 index 0000000..974aeb8 --- /dev/null +++ b/week8/community_contributions/lisekarimi/modal_services/ft_pricer.py @@ -0,0 +1,140 @@ +import modal +from modal import App, Volume, Image + +import logging +logging.basicConfig(level=logging.INFO) + +# ───────────────────────────────────────────────────────────────────────────── +# Constants +# ───────────────────────────────────────────────────────────────────────────── + +GPU = "T4" # Use a T4 GPU for inference +CACHE_PATH = "/cache" # Mount point for the Modal volume + +# Hugging Face model references +BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B" +FINETUNED_MODEL = "ed-donner/pricer-2024-09-13_13.04.39" +REVISION = "e8d637df551603dc86cd7a1598a8f44af4d7ae36" # Commit of the fine-tuned model + +# Local cache paths (inside the volume) +BASE_MODEL_DIR = f"{CACHE_PATH}/llama_base_model" +FINETUNED_MODEL_DIR = f"{CACHE_PATH}/llama_finetuned_model" + +# ───────────────────────────────────────────────────────────────────────────── +# Structure +# ───────────────────────────────────────────────────────────────────────────── + +# Container (App: llm-ft-pricer) +# ├── /app ← Code + installed Python packages (from image) +# ├── /cache ← Mounted Modal volume 
(`hf-hub-cache`) +# │ └── meta-llama/Meta-Llama-3.1-8B/... ← HuggingFace model files downloaded via snapshot_download + + + +QUESTION = "How much does this cost to the nearest dollar?" +PREFIX = "Price is $" # Used to parse generated output + +# ───────────────────────────────────────────────────────────────────────────── +# Modal App, Image, Volume, Secrets +# ───────────────────────────────────────────────────────────────────────────── + +app = modal.App("llm-ft-pricer") # Define the Modal app + +image = ( + Image.debian_slim() + .pip_install("huggingface", "torch", "transformers", "bitsandbytes", "accelerate", "peft") # All needed libraries + .env({"HF_HUB_CACHE": CACHE_PATH}) # Hugging Face will store model files in /cache +) + +cache_vol = modal.Volume.from_name("hf-hub-cache", create_if_missing=True) # Persisted volume for caching models +secrets = [modal.Secret.from_name("HF_TOKEN")] # Hugging Face auth token + +# ───────────────────────────────────────────────────────────────────────────── +# Modal Class: Pricer +# ───────────────────────────────────────────────────────────────────────────── + +# All methods in this class run inside the container with the image, volume, secrets, and GPU you configured. +@app.cls( + image=image, + secrets=secrets, + volumes={CACHE_PATH: cache_vol}, # Mount volume into /cache + gpu=GPU, + timeout=1800, # 30-minute max runtime + min_containers=0, # = 1 : Keeping one container warm uses credits continuously if you forget to stop it. 
+ scaledown_window=300, # Shuts down the container +) +class Pricer: + @modal.enter() + def setup(self): + import os, torch + import logging + from huggingface_hub import snapshot_download + from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig + from peft import PeftModel + + # Create cache path if it doesn't exist + os.makedirs(CACHE_PATH, exist_ok=True) + + # Download base and fine-tuned models into volume + logging.info("Downloading base model...") + snapshot_download(BASE_MODEL, local_dir=BASE_MODEL_DIR) + + logging.info("Downloading fine-tuned model...") + snapshot_download(FINETUNED_MODEL, revision=REVISION, local_dir=FINETUNED_MODEL_DIR) + + # Quantization config (4-bit) + quant_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.bfloat16, + bnb_4bit_quant_type="nf4" + ) + + # Load tokenizer + self.tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_DIR) + self.tokenizer.pad_token = self.tokenizer.eos_token + self.tokenizer.padding_side = "right" + + # Load base model (quantized) + base_model = AutoModelForCausalLM.from_pretrained( + BASE_MODEL_DIR, + quantization_config=quant_config, + device_map="auto" + ) + + # Apply fine-tuned weights + self.fine_tuned_model = PeftModel.from_pretrained( + base_model, + FINETUNED_MODEL_DIR, + revision=REVISION + ) + self.fine_tuned_model.generation_config.pad_token_id = self.tokenizer.pad_token_id + + @modal.method() + def price(self, description: str) -> float: + import re, torch + from transformers import set_seed + + set_seed(42) # Deterministic output + + # Construct prompt + prompt = f"{QUESTION}\n\n{description}\n\n{PREFIX}" + inputs = self.tokenizer.encode(prompt, return_tensors="pt").to("cuda") + attention_mask = torch.ones(inputs.shape, device="cuda") + + # Generate model output (max 5 tokens) + outputs = self.fine_tuned_model.generate( + inputs, + attention_mask=attention_mask, + max_new_tokens=5, + num_return_sequences=1 
+ ) + result = self.tokenizer.decode(outputs[0]) + + # Extract number after "Price is $" + contents = result.split("Price is $")[1] + contents = contents.replace(',', '') + match = re.search(r"[-+]?\d*\.\d+|\d+", contents) + return float(match.group()) if match else 0 # Return parsed price or 0 if not found + + diff --git a/week8/community_contributions/lisekarimi/modal_services/get_started.py b/week8/community_contributions/lisekarimi/modal_services/get_started.py new file mode 100644 index 0000000..510d7ad --- /dev/null +++ b/week8/community_contributions/lisekarimi/modal_services/get_started.py @@ -0,0 +1,12 @@ +import sys, modal + +app = modal.App("example-hello-world") + +@app.function() +def f(i: int) -> int: + if i % 2 == 0: + print("hello", i) + else: + print("world", i, file=sys.stderr) + + return i * i diff --git a/week8/day2.0.ipynb b/week8/day2.0.ipynb index 553880e..4f3b049 100644 --- a/week8/day2.0.ipynb +++ b/week8/day2.0.ipynb @@ -44,7 +44,6 @@ "from sentence_transformers import SentenceTransformer\n", "from datasets import load_dataset\n", "import chromadb\n", - "from items import Item\n", "from sklearn.manifold import TSNE\n", "import plotly.graph_objects as go" ] @@ -77,6 +76,18 @@ "login(hf_token, add_to_git_credential=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "8491f550-df4a-4c8f-a260-a7a419e8efb6", + "metadata": {}, + "outputs": [], + "source": [ + "# Another import after Logging in to Hugging Face - thank you Trung N.!\n", + "\n", + "from items import Item" + ] + }, { "cell_type": "markdown", "id": "3d4995a4-f67f-4871-87df-8c6439b06366", diff --git a/week8/day2.1.ipynb b/week8/day2.1.ipynb index fac26d8..3151540 100644 --- a/week8/day2.1.ipynb +++ b/week8/day2.1.ipynb @@ -44,7 +44,6 @@ "from sentence_transformers import SentenceTransformer\n", "from datasets import load_dataset\n", "import chromadb\n", - "from items import Item\n", "from sklearn.manifold import TSNE\n", "import plotly.graph_objects as go" ] @@ 
-174,7 +173,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.12" } }, "nbformat": 4, diff --git a/week8/day2.2.ipynb b/week8/day2.2.ipynb index f55ae2a..eebe634 100644 --- a/week8/day2.2.ipynb +++ b/week8/day2.2.ipynb @@ -44,7 +44,6 @@ "from sentence_transformers import SentenceTransformer\n", "from datasets import load_dataset\n", "import chromadb\n", - "from items import Item\n", "from sklearn.manifold import TSNE\n", "import plotly.graph_objects as go" ] @@ -166,7 +165,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.12" } }, "nbformat": 4, diff --git a/week8/day2.3.ipynb b/week8/day2.3.ipynb index b607e45..c2eeb34 100644 --- a/week8/day2.3.ipynb +++ b/week8/day2.3.ipynb @@ -48,7 +48,6 @@ "from sentence_transformers import SentenceTransformer\n", "from datasets import load_dataset\n", "import chromadb\n", - "from items import Item\n", "from testing import Tester" ] }, @@ -66,6 +65,31 @@ "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce73b034-9ec1-4533-ba41-3e57c7878b61", + "metadata": {}, + "outputs": [], + "source": [ + "# Log in to HuggingFace\n", + "\n", + "hf_token = os.environ['HF_TOKEN']\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c01daad-86b0-4bc0-91ba-20a64df043ed", + "metadata": {}, + "outputs": [], + "source": [ + "# Another import after Logging in to Hugging Face - thank you Trung N.!\n", + "\n", + "from items import Item" + ] + }, { "cell_type": "code", "execution_count": null, @@ -495,7 +519,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.12" } }, "nbformat": 4, diff --git a/week8/day2.4.ipynb b/week8/day2.4.ipynb index 90bff83..c315c78 
100644 --- a/week8/day2.4.ipynb +++ b/week8/day2.4.ipynb @@ -84,6 +84,31 @@ "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "1006966f-96b7-4e1a-93f0-2bb9a09057c8", + "metadata": {}, + "outputs": [], + "source": [ + "# Log in to HuggingFace\n", + "\n", + "hf_token = os.environ['HF_TOKEN']\n", + "login(hf_token, add_to_git_credential=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de0e4b22-ee61-4b79-95bc-3cd707d5f83d", + "metadata": {}, + "outputs": [], + "source": [ + "# Another import after Logging in to Hugging Face - thank you Trung N.!\n", + "\n", + "from items import Item" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/week8/day3.ipynb b/week8/day3.ipynb index 9188717..6f42c0e 100644 --- a/week8/day3.ipynb +++ b/week8/day3.ipynb @@ -227,7 +227,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week8/day4.ipynb b/week8/day4.ipynb index bb4c993..22385a8 100644 --- a/week8/day4.ipynb +++ b/week8/day4.ipynb @@ -78,7 +78,7 @@ " \n", " \n", "

Additional resource: more sophisticated planning agent

\n", - " The Planning Agent that we use in the next cell is simply a python script that calls the other Agents; frankly that's all we require for this project. But if you're intrigued to see a more Autonomous version in which we give the Planning Agent tools and allow it to decide which Agents to call, see my implementation of AutonomousPlanningAgent in my related repo, Agentic. This is an example with multiple tools that dynamically decides which function to call.\n", + " The Planning Agent that we use in the next cell is simply a python script that calls the other Agents; frankly that's all we require for this project. But if you're intrigued to see a more Autonomous version in which we give the Planning Agent tools and allow it to decide which Agents to call, see my implementation of AutonomousPlanningAgent in my related repo, Agentic. This is an example with multiple tools that dynamically decides which function to call.\n", " \n", " \n", " \n", @@ -144,7 +144,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.11" + "version": "3.11.12" } }, "nbformat": 4, diff --git a/week8/day5.ipynb b/week8/day5.ipynb index 5e11432..d42d181 100644 --- a/week8/day5.ipynb +++ b/week8/day5.ipynb @@ -171,7 +171,7 @@ " If you're not fed up of product prices yet 😂 I've built this out some more!
\n", " If you look in my repo tech2ai, segment3/lab1 has a neural network implementation of the pricer in pure PyTorch. It does pretty well.
\n", " And if you look in my repo Agentic, the workshop folder has the same Agent project taken further. There's a new version of the PlanningAgent called AutonomousPlanningAgent that uses multiple Tools, and a MessagingAgent that uses claude-3.7 to write texts. The AutonomousPlanningAgent uses the fantastic OpenAI Agents SDK and the mighty MCP from Anthropic.
\n", - " If you're intrigued by Agents and MCP, and would like to learn more, then I also have a companion course called the Complete Agentic AI Engineering Course that might interest you (if you haven't had enough of me by now!!)\n", + " If you're intrigued by Agents and MCP, and would like to learn more, then I also have a companion course called the Complete Agentic AI Engineering Course that might interest you (if you haven't had enough of me by now!!), and also another course for leaders and founders looking to build a valuable business with LLMs.\n", "
\n", " \n", " \n", @@ -223,7 +223,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4,