diff --git a/SETUP-PC.md b/SETUP-PC.md index d2b2759..2ca8bbd 100644 --- a/SETUP-PC.md +++ b/SETUP-PC.md @@ -13,10 +13,12 @@ I use a platform called Anaconda to set up your environment. It's a powerful too Having said that: if you have any problems with Anaconda, I've provided an alternative approach. It's faster and simpler and should have you running quickly, with less of a guarantee around compatibility. -### Before we begin - Heads up! Please do check these Windows "gotchas": - If you are relatively new to using the Command Prompt, here is an excellent [guide](https://chatgpt.com/share/67b0acea-ba38-8012-9c34-7a2541052665) with instructions and exercises. I'd suggest you work through this first to build some confidence. +## HEADS UP - "GOTCHA" ISSUES ON A PC: The following 4 Windows issues will need your attention, particularly #3 and #4 + +Please do take a look at these issues. Issue #3 (the Windows 260-character path limit) will cause an "Archive Error" when installing PyTorch if left unaddressed, and Issue #4 will also cause an installation failure. + There are 4 common gotchas to developing on Windows to be aware of: 1. Permissions. Please take a look at this [tutorial](https://chatgpt.com/share/67b0ae58-d1a8-8012-82ca-74762b0408b0) on permissions on Windows @@ -92,7 +94,7 @@ Press Win + R, type `cmd`, and press Enter Run `python --version` to find out which python you're on. Ideally you'd be using a version of Python 3.11, so we're completely in sync. -I believe Python 3.12 works also, but (as of Feb 2025) Python 3.13 does **not** yet work as several Data Science dependencies are not yet ready for Python 3.13. +I believe Python 3.12 works also, but (as of June 2025) Python 3.13 does **not** yet work as several Data Science dependencies are not yet ready for Python 3.13. If you need to install Python or install another version, you can download it here: https://www.python.org/downloads/ diff --git a/community-contributions/WebScraperApp/README.md b/community-contributions/WebScraperApp/README.md new file mode 100644 index 0000000..6dfed7b --- /dev/null +++ b/community-contributions/WebScraperApp/README.md @@ -0,0 +1,159 @@ +# Web Scraper & Data Analyzer + +A modern Python application with a sleek PyQt5 GUI for web scraping, data analysis, visualization, and AI-powered website insights. Features a clean, minimalistic design with real-time progress tracking, comprehensive data filtering, and an integrated AI chat assistant for advanced analysis.
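+
+**Note on the AI features:** the AI Analysis tab sends requests to OpenRouter through the `openai` client and reads an `OPENROUTER_API_KEY` from the environment (a `.env` file in this folder is loaded via `python-dotenv`). Without that key set, AI requests will likely fail even with `openai` installed; without the `openai` package, the tab falls back to a placeholder response.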
+ +## Features + +- **Modern UI**: Clean, minimalistic design with dark theme and smooth animations +- **Web Scraping**: Multi-threaded scraping with configurable depth (max 100 levels) +- **Data Visualization**: Interactive table with sorting and filtering capabilities +- **Content Preview**: Dual preview system with both text and visual HTML rendering +- **Data Analysis**: Comprehensive statistics and domain breakdown +- **AI-Powered Analysis**: Chat-based assistant for website insights, SEO suggestions, and content analysis +- **Export Functionality**: JSON export with full metadata +- **URL Normalization**: Handles www/non-www domains intelligently +- **Real-time Progress**: Live progress updates during scraping operations +- **Loop Prevention**: Advanced duplicate detection to prevent infinite loops +- **Smart Limits**: Configurable limits to prevent runaway scraping + +## AI Analysis Tab + +The application features an advanced **AI Analysis** tab: + +- **Conversational Chat UI**: Ask questions about your scraped websites in a modern chat interface (like ChatGPT) +- **Quick Actions**: One-click questions for structure, SEO, content themes, and performance +- **Markdown Responses**: AI replies are formatted for clarity and readability +- **Context Awareness**: AI uses your scraped data for tailored insights +- **Requirements**: Internet connection and the `openai` Python package (see Installation) +- **Fallback**: If `openai` is not installed, a placeholder response is shown + +## Loop Prevention & Duplicate Detection + +The scraper includes robust protection against infinite loops and circular references: + +### 🔄 URL Normalization +- Removes `www.` prefixes for consistent domain handling +- Strips URL fragments (`#section`) to prevent duplicate content +- Removes trailing slashes for consistency +- Normalizes query parameters + +### 🚫 Duplicate Detection +- **Visited URL Tracking**: Maintains a set of all visited URLs +- **Unlimited Crawling**: No page limits per domain or total pages +- **Per-Page Duplicate Filtering**: Removes duplicate links within the same page + +### 🛡️ Smart Restrictions +- **No Depth Limits**: Crawl as deep as the specified max_depth allows +- **Content Type Filtering**: Only scrapes HTML content +- **File Type Filtering**: Skips non-content files (PDFs, images, etc.) +- **Consecutive Empty Level Detection**: Stops if 3 consecutive levels have no new content + +### 📊 Enhanced Tracking +- **Domain Page Counts**: Tracks pages scraped per domain (for statistics) +- **URL Check Counts**: Shows total URLs checked vs. pages scraped +- **Detailed Statistics**: Comprehensive reporting on scraping efficiency +- **Unlimited Processing**: No artificial limits on crawling scope + +## Installation + +1. **Clone or download the project files** + +2. **Install dependencies**: + ```bash + pip install -r requirements.txt + ``` + - This will install all required packages, including `PyQt5`, `PyQtWebEngine` (for visual preview), and `openai` (for AI features). + +3. **Run the application**: + ```bash + python web_scraper_app.py + ``` + +## Usage + +### 1. Scraping Configuration +- Enter a starting URL (with or without http/https) +- Set maximum crawl depth (1-100) +- Click "Start Scraping" to begin + +### 2. Data View & Filtering +- View scraped data in an interactive table +- Filter by search terms or specific domains +- Double-click any row to preview content +- Export data to JSON format + +### 3. 
Analysis & Statistics +- View comprehensive scraping statistics +- See domain breakdown and word counts +- Preview content in both text and visual formats +- Analyze load times and link counts +- Monitor duplicate detection efficiency + +### 4. AI Analysis (New!) +- Switch to the **AI Analysis** tab +- Type your question or use quick action buttons (e.g., "Analyze the website structure", "Suggest SEO improvements") +- The AI will analyze your scraped data and provide actionable insights +- Requires an internet connection and the `openai` package + +## Visual Preview Feature + +The application includes a visual HTML preview feature that renders scraped web pages in a browser-like view: + +- **Requirements**: PyQtWebEngine (automatically installed with requirements.txt) +- **Functionality**: Displays HTML content with proper styling and formatting +- **Fallback**: If PyQtWebEngine is not available, shows a text-only preview +- **Error Handling**: Graceful error messages for invalid HTML content + +## Technical Details + +- **Backend**: Pure Python with urllib and html.parser (no compilation required) +- **Frontend**: PyQt5 with custom modern styling +- **Threading**: Multi-threaded scraping for better performance +- **Data Storage**: Website objects with full metadata +- **URL Handling**: Intelligent normalization and domain filtering +- **Loop Prevention**: Multi-layered duplicate detection system +- **AI Integration**: Uses OpenAI API (via openrouter) for chat-based analysis + +## File Structure + +``` +Testing/ +├── web_scraper_app.py # Main application (with AI and GUI) +├── module.py # Core scraping logic +├── test.py # Basic functionality tests +├── requirements.txt # Dependencies +└── README.md # This file +``` + +## Troubleshooting + +### Visual Preview Not Working +1. Ensure PyQtWebEngine is installed: `pip install PyQtWebEngine` +2. Check console output for import errors + +### AI Analysis Not Working +1. Ensure the `openai` package is installed: `pip install openai` +2. Check your internet connection (AI requires online access) +3. If not installed, the AI tab will show a placeholder response + +### Scraping Issues +1. Verify internet connection +2. Check URL format (add https:// if needed) +3. Try with a lower depth setting +4. Check console for error messages + +### Loop Prevention +1. The scraper automatically prevents infinite loops +2. Check the analysis tab for detailed statistics +3. Monitor "Total URLs Checked" vs "Total Pages" for efficiency +4. Use lower depth settings for sites with many internal links + +### Performance +- Use lower depth settings for faster scraping +- Filter data to focus on specific domains +- Close other applications to free up resources +- Monitor domain page counts to avoid hitting limits + +## License + +This project is open source and available under the MIT License. 
\ No newline at end of file diff --git a/community-contributions/WebScraperApp/module.py b/community-contributions/WebScraperApp/module.py new file mode 100644 index 0000000..20dff0f --- /dev/null +++ b/community-contributions/WebScraperApp/module.py @@ -0,0 +1,473 @@ +import urllib.request +import urllib.parse +import urllib.error +import html.parser +import re +from datetime import datetime +import time +import ssl +from urllib.parse import urljoin, urlparse +from concurrent.futures import ThreadPoolExecutor, as_completed +import threading +from functools import partial + +class HTMLParser(html.parser.HTMLParser): + """Custom HTML parser to extract title, links, and text content""" + + def __init__(self): + super().__init__() + self.title = "" + self.links = [] + self.text_content = [] + self.in_title = False + self.in_body = False + self.current_tag = "" + + def handle_starttag(self, tag, attrs): + self.current_tag = tag.lower() + + if tag.lower() == 'title': + self.in_title = True + elif tag.lower() == 'body': + self.in_body = True + elif tag.lower() == 'a': + # Extract href attribute + for attr, value in attrs: + if attr.lower() == 'href' and value: + self.links.append(value) + + def handle_endtag(self, tag): + if tag.lower() == 'title': + self.in_title = False + elif tag.lower() == 'body': + self.in_body = False + + def handle_data(self, data): + if self.in_title: + self.title += data + elif self.in_body and self.current_tag in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'span', 'li']: + # Clean the text data + cleaned_data = re.sub(r'\s+', ' ', data.strip()) + if cleaned_data: + self.text_content.append(cleaned_data) + + def get_text(self): + """Return all extracted text content as a single string""" + return ' '.join(self.text_content) + + def get_clean_text(self, max_length=500): + """Return cleaned text content with length limit""" + text = self.get_text() + # Remove extra whitespace and limit length + text = re.sub(r'\s+', ' ', text.strip()) + if len(text) > max_length: + text = text[:max_length] + "..." + return text + +class Website: + """Class to store website data""" + + def __init__(self, title, url, content, depth, links=None, load_time=None): + self.title = title or "No Title" + self.url = url + self.content = content + self.depth = depth + self.links = links or [] + self.load_time = load_time + self.timestamp = datetime.now() + + def get_word_count(self): + """Get word count from content""" + if not self.content: + return 0 + # Extract text content and count words + text_content = re.sub(r'<[^>]+>', '', self.content) + words = text_content.split() + return len(words) + + def get_domain(self): + """Extract domain from URL""" + try: + parsed = urlparse(self.url) + return parsed.netloc + except: + return "" + + def get_normalized_domain(self): + """Get domain without www prefix for consistent filtering""" + domain = self.get_domain() + if domain.startswith('www.'): + return domain[4:] + return domain + + def search_content(self, query): + """Search for query in content""" + if not self.content or not query: + return False + return query.lower() in self.content.lower() + + def get_text_preview(self, max_length=200): + """Get a text preview of the content""" + if not self.content: + return "No content available" + + # Extract text content + text_content = re.sub(r'<[^>]+>', '', self.content) + text_content = re.sub(r'\s+', ' ', text_content.strip()) + + if len(text_content) > max_length: + return text_content[:max_length] + "..." 
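+        # Content already fits within max_length, so return it untruncated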
+ return text_content + +class WebScraper: + """Web scraper with multithreading support and robust duplicate detection""" + + def __init__(self): + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() # Track visited domains + self.start_domain = None # Store the starting domain + self.lock = threading.Lock() + self.max_workers = 10 # Number of concurrent threads + # Removed all page limits - unlimited crawling + self.domain_page_counts = {} # Track page count per domain (for statistics only) + self._stop_requested = False # Flag to stop scraping + + def normalize_url(self, url): + """Normalize URL to handle www prefixes and remove fragments""" + if not url: + return url + + # Remove fragments (#) to prevent duplicate content + if '#' in url: + url = url.split('#')[0] + + # Remove trailing slashes for consistency + url = url.rstrip('/') + + # Remove www prefix for consistent domain handling + if url.startswith('https://www.'): + return url.replace('https://www.', 'https://', 1) + elif url.startswith('http://www.'): + return url.replace('http://www.', 'http://', 1) + return url + + def get_domain_from_url(self, url): + """Extract and normalize domain from URL""" + try: + parsed = urlparse(url) + domain = parsed.netloc + if domain.startswith('www.'): + return domain[4:] + return domain + except: + return "" + + def should_skip_url(self, url, current_depth): + """Check if URL should be skipped based on various criteria""" + normalized_url = self.normalize_url(url) + + # Skip if already visited + if normalized_url in self.visited_urls: + return True, "Already visited" + + # Skip if not a valid HTTP/HTTPS URL + if not normalized_url.startswith(('http://', 'https://')): + return True, "Not HTTP/HTTPS URL" + + # Get domain + domain = self.get_domain_from_url(normalized_url) + if not domain: + return True, "Invalid domain" + + # Removed all domain page limits - unlimited crawling + # Removed external domain depth limits - crawl as deep as needed + + return False, "OK" + + def scrape_url(self, url, depth): + """Scrape a single URL with error handling and rate limiting""" + try: + # Check if stop was requested + if self._stop_requested: + return None + + # Check if URL should be skipped + should_skip, reason = self.should_skip_url(url, depth) + if should_skip: + print(f"Skipping {url}: {reason}") + return None + + # Normalize URL + normalized_url = self.normalize_url(url) + + # Mark as visited and update domain count (for statistics only) + with self.lock: + self.visited_urls.add(normalized_url) + domain = self.get_domain_from_url(normalized_url) + if domain: + self.domain_page_counts[domain] = self.domain_page_counts.get(domain, 0) + 1 + + # Add small delay to prevent overwhelming servers + time.sleep(0.1) + + start_time = time.time() + + # Create request with headers + req = urllib.request.Request( + normalized_url, + headers={ + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', + 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', + 'Accept-Language': 'en-US,en;q=0.5', + 'Accept-Encoding': 'gzip, deflate', + 'Connection': 'keep-alive', + 'Upgrade-Insecure-Requests': '1', + } + ) + + # Fetch the page with timeout + with urllib.request.urlopen(req, timeout=15) as response: + # Check content type + content_type = response.headers.get('content-type', '').lower() + if 'text/html' not in content_type and 'application/xhtml' not in content_type: + 
print(f"Skipping {url}: Not HTML content ({content_type})") + return None + + html_content = response.read().decode('utf-8', errors='ignore') + + load_time = time.time() - start_time + + # Skip if content is too small (likely error page) + if len(html_content) < 100: + print(f"Skipping {url}: Content too small ({len(html_content)} chars)") + return None + + # Parse HTML + parser = HTMLParser() + parser.feed(html_content) + + # Extract links and normalize them with duplicate detection + links = [] + base_url = normalized_url + seen_links = set() # Track links within this page to avoid duplicates + + for link in parser.links: + try: + absolute_url = urljoin(base_url, link) + normalized_link = self.normalize_url(absolute_url) + + # Skip if already seen in this page or should be skipped + if normalized_link in seen_links: + continue + seen_links.add(normalized_link) + + should_skip, reason = self.should_skip_url(normalized_link, depth + 1) + if should_skip: + continue + + # Only include http/https links and filter out common non-content URLs + if (normalized_link.startswith(('http://', 'https://')) and + not any(skip in normalized_link.lower() for skip in [ + 'mailto:', 'tel:', 'javascript:', 'data:', 'file:', + '.pdf', '.doc', '.docx', '.xls', '.xlsx', '.zip', '.rar', + '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.svg', '.ico', + '.css', '.js', '.xml', '.json', '.txt', '.log' + ])): + links.append(normalized_link) + except: + continue + + # Create Website object + website = Website( + title=parser.title, + url=normalized_url, + content=html_content, + depth=depth, + links=links, + load_time=load_time + ) + + return website + + except urllib.error.HTTPError as e: + print(f"HTTP Error scraping {url}: {e.code} - {e.reason}") + return None + except urllib.error.URLError as e: + print(f"URL Error scraping {url}: {e.reason}") + return None + except Exception as e: + print(f"Error scraping {url}: {str(e)}") + return None + + def crawl_website(self, start_url, max_depth=3, progress_callback=None): + """Crawl website with multithreading support and no page limits""" + if not start_url.startswith(('http://', 'https://')): + start_url = 'https://' + start_url + + # Initialize tracking + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() + self.domain_page_counts = {} + self.start_domain = self.get_domain_from_url(start_url) + self._stop_requested = False # Reset stop flag + + print(f"Starting crawl from: {start_url}") + print(f"Starting domain: {self.start_domain}") + print(f"Max depth: {max_depth}") + print(f"Unlimited crawling - no page limits") + + # Start with the initial URL + urls_to_scrape = [(start_url, 0)] + max_depth_reached = 0 + consecutive_empty_levels = 0 + max_consecutive_empty = 3 # Stop if 3 consecutive levels have no new URLs + total_pages_scraped = 0 + # Removed all page limits - unlimited crawling + + with ThreadPoolExecutor(max_workers=self.max_workers) as executor: + for current_depth in range(max_depth + 1): + # Check if stop was requested + if self._stop_requested: + print("Scraping stopped by user request") + break + + if not urls_to_scrape: + print(f"Stopping at depth {current_depth}: No more URLs to scrape") + break + + # Check if we've reached too many consecutive empty levels + if consecutive_empty_levels >= max_consecutive_empty: + print(f"Stopping at depth {current_depth}: {max_consecutive_empty} consecutive empty levels") + break + + # Removed absolute page limit check - unlimited pages + + print(f"Scraping depth {current_depth} with 
{len(urls_to_scrape)} URLs") + + # Submit all URLs at current depth for concurrent scraping + future_to_url = { + executor.submit(self.scrape_url, url, depth): url + for url, depth in urls_to_scrape + } + + # Collect results and prepare next level + urls_to_scrape = [] + level_results = 0 + + for future in as_completed(future_to_url): + # Check if stop was requested + if self._stop_requested: + print("Stopping processing of current level") + break + + website = future.result() + if website: + with self.lock: + self.websites.append(website) + level_results += 1 + total_pages_scraped += 1 + + # Emit progress if callback provided + if progress_callback: + progress_callback(website) + + # Add links for next depth level (no limits) + if current_depth < max_depth: + for link in website.links: + # Removed URL limit per level - process all URLs + + should_skip, reason = self.should_skip_url(link, current_depth + 1) + if not should_skip: + urls_to_scrape.append((link, current_depth + 1)) + + # Check if stop was requested after processing level + if self._stop_requested: + break + + # Update depth tracking + if level_results > 0: + max_depth_reached = current_depth + consecutive_empty_levels = 0 + else: + consecutive_empty_levels += 1 + + # Only stop if we've reached the actual max depth + if current_depth >= max_depth: + print(f"Reached maximum depth: {max_depth}") + break + + # Print progress summary + print(f"Depth {current_depth} completed: {level_results} pages, Total: {len(self.websites)}") + if self.domain_page_counts: + print(f"Domain breakdown: {dict(self.domain_page_counts)}") + + print(f"Crawling completed. Max depth reached: {max_depth_reached}, Total pages: {len(self.websites)}") + print(f"Visited URLs: {len(self.visited_urls)}") + print(f"Domain breakdown: {dict(self.domain_page_counts)}") + return self.websites + + def reset(self): + """Reset the scraper state for a new crawl""" + self.websites = [] + self.visited_urls = set() + self.visited_domains = set() + self.domain_page_counts = {} + self.start_domain = None + self._stop_requested = False # Reset stop flag + + def get_statistics(self): + """Get scraping statistics with enhanced tracking information""" + if not self.websites: + return { + 'total_pages': 0, + 'total_links': 0, + 'total_words': 0, + 'avg_load_time': 0, + 'max_depth_reached': 0, + 'domains': {}, + 'visited_urls_count': 0, + 'domain_page_counts': {}, + 'start_domain': self.start_domain + } + + total_pages = len(self.websites) + total_links = sum(len(w.links) for w in self.websites) + total_words = sum(w.get_word_count() for w in self.websites) + + load_times = [w.load_time for w in self.websites if w.load_time] + avg_load_time = sum(load_times) / len(load_times) if load_times else 0 + + max_depth_reached = max(w.depth for w in self.websites) + + # Count domains + domains = {} + for website in self.websites: + domain = website.get_normalized_domain() + domains[domain] = domains.get(domain, 0) + 1 + + return { + 'total_pages': total_pages, + 'total_links': total_links, + 'total_words': total_words, + 'avg_load_time': avg_load_time, + 'max_depth_reached': max_depth_reached, + 'domains': domains, + 'visited_urls_count': len(self.visited_urls), + 'domain_page_counts': dict(self.domain_page_counts), + 'start_domain': self.start_domain + } + + def filter_by_domain(self, domain): + """Filter websites by domain""" + normalized_domain = self.normalize_url(domain) + return [w for w in self.websites if w.get_normalized_domain() == normalized_domain] + + def search_websites(self, 
query): + """Search websites by query""" + return [w for w in self.websites if w.search_content(query)] + + def stop_scraping(self): + """Request graceful stop of the scraping process""" + self._stop_requested = True \ No newline at end of file diff --git a/community-contributions/WebScraperApp/requirements.txt b/community-contributions/WebScraperApp/requirements.txt new file mode 100644 index 0000000..a9f1b2a --- /dev/null +++ b/community-contributions/WebScraperApp/requirements.txt @@ -0,0 +1,5 @@ +PyQt5>=5.15.0 +PyQtWebEngine>=5.15.0 +urllib3==2.0.7 +openai>=1.0.0 +python-dotenv>=1.0.0 \ No newline at end of file diff --git a/community-contributions/WebScraperApp/test.py b/community-contributions/WebScraperApp/test.py new file mode 100644 index 0000000..e86a29c --- /dev/null +++ b/community-contributions/WebScraperApp/test.py @@ -0,0 +1,161 @@ +#!/usr/bin/env python3 +""" +Simple test script to verify the web scraping functionality +""" + +import module + +def test_basic_scraping(): + """Test basic scraping functionality""" + print("Testing basic web scraping...") + + # Create a scraper instance + scraper = module.WebScraper() + + # Test with a simple website (httpbin.org is a safe test site) + test_url = "https://httpbin.org/html" + + print(f"Scraping {test_url} with depth 1...") + + try: + # Scrape with depth 1 to keep it fast + websites = scraper.crawl_website(test_url, max_depth=1) + + print(f"Successfully scraped {len(websites)} websites") + + if websites: + # Show first website details + first_site = websites[0] + print(f"\nFirst website:") + print(f" Title: {first_site.title}") + print(f" URL: {first_site.url}") + print(f" Depth: {first_site.depth}") + print(f" Links found: {len(first_site.links)}") + print(f" Word count: {first_site.get_word_count()}") + + # Show statistics + stats = scraper.get_statistics() + print(f"\nStatistics:") + print(f" Total pages: {stats['total_pages']}") + print(f" Total links: {stats['total_links']}") + print(f" Total words: {stats['total_words']}") + print(f" Average load time: {stats['avg_load_time']:.2f}s") + + return True + else: + print("No websites were scraped") + return False + + except Exception as e: + print(f"Error during scraping: {e}") + return False + +def test_website_class(): + """Test the Website class functionality""" + print("\nTesting Website class...") + + # Create a test website + website = module.Website( + title="Test Website", + url="https://example.com", + content="
<html><body><p>This is a test paragraph.</p></body></html>
", + depth=0, + links=["https://example.com/page1", "https://example.com/page2"] + ) + + # Test methods + print(f"Website title: {website.title}") + print(f"Website URL: {website.url}") + print(f"Word count: {website.get_word_count()}") + print(f"Domain: {website.get_domain()}") + print(f"Normalized domain: {website.get_normalized_domain()}") + print(f"Search for 'test': {website.search_content('test')}") + print(f"Search for 'nonexistent': {website.search_content('nonexistent')}") + + return True + +def test_html_parser(): + """Test the HTML parser functionality""" + print("\nTesting HTML Parser...") + + parser = module.HTMLParser() + test_html = """ + +This is a link to example.com
+Here's another relative link
+ + + """ + + parser.feed(test_html) + print(f"Title extracted: {parser.title}") + print(f"Links found: {parser.links}") + print(f"Text content length: {len(parser.get_text())}") + + return True + +def test_url_normalization(): + """Test URL normalization to handle www. prefixes""" + print("\nTesting URL Normalization...") + + scraper = module.WebScraper() + + # Test URLs with and without www. + test_urls = [ + "https://www.example.com/page", + "https://example.com/page", + "http://www.test.com/path?param=value#fragment", + "http://test.com/path?param=value#fragment" + ] + + print("URL Normalization Results:") + for url in test_urls: + normalized = scraper.normalize_url(url) + print(f" Original: {url}") + print(f" Normalized: {normalized}") + print() + + # Test domain filtering + print("Domain Filtering Test:") + test_websites = [ + module.Website("Site 1", "https://www.example.com", "content", 0), + module.Website("Site 2", "https://example.com", "content", 0), + module.Website("Site 3", "https://www.test.com", "content", 0) + ] + + scraper.websites = test_websites + + # Test filtering by domain with and without www. + domains_to_test = ["example.com", "www.example.com", "test.com", "www.test.com"] + + for domain in domains_to_test: + filtered = scraper.filter_by_domain(domain) + print(f" Filter '{domain}': {len(filtered)} results") + for site in filtered: + print(f" - {site.title} ({site.url})") + + return True + +if __name__ == "__main__": + print("Web Scraper Test Suite") + print("=" * 50) + + # Test HTML parser + test_html_parser() + + # Test Website class + test_website_class() + + # Test URL normalization + test_url_normalization() + + # Test basic scraping (uncomment to test actual scraping) + # Note: This requires internet connection + # test_basic_scraping() + + print("\nTest completed!") + print("\nTo run the full application:") + print("python web_scraper_app.py") \ No newline at end of file diff --git a/community-contributions/WebScraperApp/web_scraper_app.py b/community-contributions/WebScraperApp/web_scraper_app.py new file mode 100644 index 0000000..ccd5ce2 --- /dev/null +++ b/community-contributions/WebScraperApp/web_scraper_app.py @@ -0,0 +1,1678 @@ +import sys +import json +from urllib.parse import urlparse +from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, + QHBoxLayout, QLabel, QLineEdit, QSpinBox, QPushButton, + QTextEdit, QTableWidget, QTableWidgetItem, QTabWidget, + QProgressBar, QComboBox, QMessageBox, QSplitter, + QGroupBox, QGridLayout, QHeaderView, QFrame, QScrollArea, + QSystemTrayIcon, QStyle, QAction, QMenu, QTreeWidget, QTreeWidgetItem, + QListWidget, QListWidgetItem, QSizePolicy, QAbstractItemView) +from PyQt5.QtCore import QThread, pyqtSignal, Qt, QTimer, QUrl +from PyQt5.QtGui import QFont, QIcon, QPalette, QColor, QPixmap +try: + from PyQt5.QtWebEngineWidgets import QWebEngineView + WEB_ENGINE_AVAILABLE = True + print("PyQtWebEngine successfully imported - Visual preview enabled") +except ImportError as e: + WEB_ENGINE_AVAILABLE = False + print(f"PyQtWebEngine not available: {e}") + print("Visual preview will be disabled. 
Install with: pip install PyQtWebEngine") +import module +import re +import webbrowser +import os +try: + from openai import OpenAI + OPENAI_AVAILABLE = True +except ImportError: + OPENAI_AVAILABLE = False +from datetime import datetime +from dotenv import load_dotenv +import markdown + +# Load environment variables from .env file +load_dotenv() + +class ScrapingThread(QThread): + """Thread for running web scraping operations""" + progress_updated = pyqtSignal(str) + scraping_complete = pyqtSignal(list) + error_occurred = pyqtSignal(str) + + def __init__(self, url, max_depth): + super().__init__() + self.url = url + self.max_depth = max_depth + self.scraper = module.WebScraper() + self._stop_requested = False + + def stop(self): + """Request graceful stop of the scraping process""" + self._stop_requested = True + if hasattr(self.scraper, 'stop_scraping'): + self.scraper.stop_scraping() + + def run(self): + try: + self.progress_updated.emit("Starting web scraping...") + + # Reset scraper state for new crawl + self.scraper.reset() + + def progress_callback(website): + if self._stop_requested: + return # Stop processing if requested + if website: + self.progress_updated.emit(f"Scraped: {website.title} (depth {website.depth})") + + # Start scraping with progress callback + websites = self.scraper.crawl_website(self.url, self.max_depth, progress_callback) + + # Check if stop was requested + if self._stop_requested: + self.progress_updated.emit("Scraping stopped by user.") + return + + # Emit final progress + self.progress_updated.emit(f"Scraping complete! Found {len(websites)} websites.") + self.scraping_complete.emit(websites) + + except Exception as e: + if not self._stop_requested: # Only emit error if not stopped by user + self.error_occurred.emit(str(e)) + +class ModernButton(QPushButton): + """Custom modern button with hover effects""" + def __init__(self, text, primary=False): + super().__init__(text) + self.primary = primary + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10, QFont.Weight.Medium)) + self.setCursor(Qt.CursorShape.PointingHandCursor) + self.update_style() + + def update_style(self): + if self.primary: + self.setStyleSheet(""" + QPushButton { + background: #3b82f6; + border: none; + color: white; + padding: 12px 24px; + border-radius: 6px; + font-weight: 600; + } + QPushButton:hover { + background: #2563eb; + } + QPushButton:pressed { + background: #1d4ed8; + } + QPushButton:disabled { + background: #9ca3af; + color: #f3f4f6; + } + """) + else: + self.setStyleSheet(""" + QPushButton { + background: white; + border: 1px solid #d1d5db; + color: #374151; + padding: 10px 20px; + border-radius: 6px; + font-weight: 500; + } + QPushButton:hover { + border-color: #3b82f6; + color: #3b82f6; + background: #f8fafc; + } + QPushButton:pressed { + background: #f1f5f9; + } + QPushButton:disabled { + background: #f9fafb; + border-color: #e5e7eb; + color: #9ca3af; + } + """) + +class ModernLineEdit(QLineEdit): + """Custom modern input field""" + def __init__(self, placeholder=""): + super().__init__() + self.setPlaceholderText(placeholder) + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10)) + self.setStyleSheet(""" + QLineEdit { + border: 1px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #374151; + font-size: 14px; + } + QLineEdit:focus { + border-color: #3b82f6; + outline: none; + } + QLineEdit::placeholder { + color: #9ca3af; + } + """) + +class ModernSpinBox(QSpinBox): + """Custom modern spin box""" + def 
__init__(self): + super().__init__() + self.setMinimumHeight(40) + self.setFont(QFont("Segoe UI", 10)) + self.setStyleSheet(""" + QSpinBox { + border: 1px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #374151; + font-size: 14px; + } + QSpinBox:focus { + border-color: #3b82f6; + } + QSpinBox::up-button, QSpinBox::down-button { + border: none; + background: #f9fafb; + border-radius: 3px; + margin: 2px; + } + QSpinBox::up-button:hover, QSpinBox::down-button:hover { + background: #f3f4f6; + } + """) + +class ChatBubbleWidget(QWidget): + def __init__(self, message, timestamp, role): + super().__init__() + layout = QVBoxLayout(self) + layout.setContentsMargins(0, 0, 0, 0) + layout.setSpacing(2) + # Bubble + if role == "ai": + html = markdown.markdown(message) + bubble = QLabel(html) + bubble.setTextFormat(Qt.TextFormat.RichText) + else: + bubble = QLabel(message) + bubble.setTextFormat(Qt.TextFormat.PlainText) + bubble.setWordWrap(True) + bubble.setTextInteractionFlags(Qt.TextInteractionFlag.TextSelectableByMouse) + bubble.setFont(QFont("Segoe UI", 11)) + bubble.setSizePolicy(QSizePolicy.Preferred, QSizePolicy.Maximum) + bubble.setMinimumWidth(800) + bubble.setMaximumWidth(1200) + bubble.adjustSize() + # Timestamp + ts = QLabel(("🤖 " if role == "ai" else "") + timestamp) + ts.setFont(QFont("Segoe UI", 8)) + ts.setStyleSheet("color: #9ca3af;") + if role == "user": + bubble.setStyleSheet("background: #2563eb; color: white; border-radius: 16px; padding: 10px 16px; margin-left: 40px;") + layout.setAlignment(Qt.AlignmentFlag.AlignRight) + ts.setAlignment(Qt.AlignmentFlag.AlignRight) + else: + bubble.setStyleSheet("background: #f3f4f6; color: #1e293b; border-radius: 16px; padding: 10px 16px; margin-right: 40px;") + layout.setAlignment(Qt.AlignmentFlag.AlignLeft) + ts.setAlignment(Qt.AlignmentFlag.AlignLeft) + layout.addWidget(bubble) + layout.addWidget(ts) + +class WebScraperApp(QMainWindow): + def __init__(self): + super().__init__() + self.websites = [] + self.scraper = module.WebScraper() + self.init_ui() + + def init_ui(self): + self.setWindowTitle("Web Scraper & Data Analyzer") + self.setGeometry(100, 100, 1400, 900) + self.setMinimumSize(1200, 800) # Set minimum size to prevent geometry issues + + # Set clean, minimal styling + self.setStyleSheet(""" + QMainWindow { + background: #1e293b; + } + QTabWidget::pane { + border: none; + background: white; + border-radius: 8px; + margin: 8px 8px 8px 8px; + padding-top: 8px; + } + QTabBar::tab { + background: #475569; + color: #e2e8f0; + padding: 12px 20px; + margin-right: 4px; + border-top-left-radius: 8px; + border-top-right-radius: 8px; + font-weight: 600; + font-size: 14px; + min-width: 120px; + margin-bottom: 8px; + } + QTabBar::tab:selected { + background: white; + color: #1e293b; + border-bottom: none; + margin-bottom: 8px; + } + QTabBar::tab:hover:!selected { + background: #64748b; + color: #f1f5f9; + } + QTabBar::tab:first { + margin-left: 8px; + } + QTabBar::tab:last { + margin-right: 8px; + } + QGroupBox { + font-weight: 600; + font-size: 14px; + border: 2px solid #e2e8f0; + border-radius: 8px; + margin-top: 16px; + padding-top: 16px; + background: #f8fafc; + } + QGroupBox::title { + subcontrol-origin: margin; + left: 16px; + + color: #1e293b; + background: #f8fafc; + } + QTableWidget { + border: 2px solid #e2e8f0; + border-radius: 8px; + background: white; + gridline-color: #f1f5f9; + alternate-background-color: #f8fafc; + selection-background-color: #dbeafe; + selection-color: #1e293b; + } + 
QTableWidget::item { + padding: 8px 4px; + border: none; + min-height: 20px; + } + QTableWidget::item:selected { + background: #dbeafe; + color: #1e293b; + } + QHeaderView::section { + background: #e2e8f0; + padding: 12px 8px; + border: none; + border-right: 1px solid #cbd5e1; + border-bottom: 1px solid #cbd5e1; + font-weight: 600; + color: #1e293b; + } + QHeaderView::section:vertical { + background: #f8fafc; + padding: 8px 4px; + border: none; + border-bottom: 1px solid #e2e8f0; + font-weight: 500; + color: #64748b; + min-width: 40px; + } + QProgressBar { + border: 2px solid #e2e8f0; + border-radius: 6px; + text-align: center; + background: #f1f5f9; + } + QProgressBar::chunk { + background: #3b82f6; + border-radius: 5px; + } + QTextEdit { + border: 2px solid #e2e8f0; + border-radius: 6px; + padding: 12px; + background: white; + color: #1e293b; + font-family: 'Segoe UI', sans-serif; + } + QComboBox { + border: 2px solid #d1d5db; + border-radius: 6px; + padding: 8px 12px; + background: white; + color: #1e293b; + font-size: 14px; + min-height: 40px; + } + QComboBox:focus { + border-color: #3b82f6; + } + QComboBox::drop-down { + border: none; + width: 30px; + } + QComboBox::down-arrow { + image: none; + border-left: 5px solid transparent; + border-right: 5px solid transparent; + border-top: 5px solid #6b7280; + margin-right: 10px; + } + QLabel { + color: #1e293b; + font-weight: 500; + font-size: 14px; + } + """) + + # System tray icon for notifications + + self.tray_icon = QSystemTrayIcon(self) + self.tray_icon.setIcon(self.style().standardIcon(QStyle.StandardPixmap.SP_ComputerIcon)) + self.tray_icon.setVisible(True) + + # Create central widget and main layout + central_widget = QWidget() + self.setCentralWidget(central_widget) + main_layout = QVBoxLayout(central_widget) + main_layout.setContentsMargins(16, 16, 16, 16) + main_layout.setSpacing(12) + + # Create header + header = self.create_header() + main_layout.addWidget(header) + + # Add proper spacing after header + spacer = QWidget() + spacer.setFixedHeight(12) + main_layout.addWidget(spacer) + + # Create tab widget with proper margins + self.tab_widget = QTabWidget() + self.tab_widget.setStyleSheet(""" + QTabWidget { + margin-top: 0px; + background: transparent; + } + QTabWidget::pane { + border: none; + background: white; + border-radius: 8px; + margin: 4px 8px 8px 8px; + padding-top: 4px; + } + QTabBar { + background: transparent; + spacing: 0px; + } + QTabBar::tab { + background: #475569; + color: #e2e8f0; + padding: 12px 20px; + margin-right: 4px; + border-top-left-radius: 8px; + border-top-right-radius: 8px; + font-weight: 600; + font-size: 14px; + min-width: 120px; + margin-bottom: 4px; + } + QTabBar::tab:selected { + background: white; + color: #1e293b; + border-bottom: none; + margin-bottom: 4px; + } + QTabBar::tab:hover:!selected { + background: #64748b; + color: #f1f5f9; + } + QTabBar::tab:first { + margin-left: 8px; + } + QTabBar::tab:last { + margin-right: 8px; + } + """) + main_layout.addWidget(self.tab_widget) + + # Create tabs + self.create_scraping_tab() + self.create_data_tab() + self.create_analysis_tab() + self.create_sitemap_tab() + self.create_ai_tab() + + def create_header(self): + """Create a clean header with help button only (no theme toggle)""" + header_widget = QWidget() + header_widget.setStyleSheet(""" + QWidget { + background: #0f172a; + border-radius: 12px; + margin: 4px 4px 8px 4px; + } + """) + header_layout = QHBoxLayout(header_widget) + header_layout.setContentsMargins(24, 20, 24, 20) + 
header_layout.setSpacing(16) + + # Title + title_label = QLabel("Web Scraper & Data Analyzer") + title_label.setStyleSheet(""" + QLabel { + color: #f8fafc; + font-size: 28px; + font-weight: 800; + font-family: 'Segoe UI', sans-serif; + } + """) + + # Subtitle + subtitle_label = QLabel("Modern web scraping with intelligent data analysis") + subtitle_label.setStyleSheet(""" + QLabel { + color: #cbd5e1; + font-size: 16px; + font-weight: 500; + font-family: 'Segoe UI', sans-serif; + } + """) + + # Help button + help_button = ModernButton("Help") + help_button.clicked.connect(self.show_help) + + # Right side info + info_widget = QWidget() + info_layout = QVBoxLayout(info_widget) + info_layout.setAlignment(Qt.AlignmentFlag.AlignRight) + info_layout.setSpacing(4) + + version_label = QLabel("v2.0") + version_label.setStyleSheet(""" + QLabel { + color: #94a3b8; + font-size: 14px; + font-weight: 600; + background: #1e293b; + padding: 6px 12px; + border-radius: 6px; + border: 1px solid #334155; + } + """) + + info_layout.addWidget(version_label) + + header_layout.addWidget(title_label) + header_layout.addStretch() + header_layout.addWidget(subtitle_label) + header_layout.addStretch() + header_layout.addWidget(help_button) + header_layout.addWidget(info_widget) + + return header_widget + + def create_scraping_tab(self): + """Create the web scraping configuration tab""" + scraping_widget = QWidget() + main_layout = QVBoxLayout(scraping_widget) + main_layout.setContentsMargins(16, 16, 16, 16) + main_layout.setSpacing(16) + + # Create scroll area + scroll_area = QScrollArea() + scroll_area.setWidgetResizable(True) + scroll_area.setStyleSheet("QScrollArea { border: none; }") + scroll_area.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAsNeeded) + scroll_area.setVerticalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAsNeeded) + + # Create content widget for scrolling + content_widget = QWidget() + layout = QVBoxLayout(content_widget) + layout.setSpacing(16) + layout.setContentsMargins(0, 0, 0, 0) + + # Input group + input_group = QGroupBox("Scraping Configuration") + input_layout = QGridLayout(input_group) + input_layout.setSpacing(12) + + # URL input + input_layout.addWidget(QLabel("Website URL:"), 0, 0) + self.url_input = ModernLineEdit("https://example.com") + input_layout.addWidget(self.url_input, 0, 1) + + # Depth input + input_layout.addWidget(QLabel("Max Depth (1-100):"), 1, 0) + self.depth_input = ModernSpinBox() + self.depth_input.setRange(1, 100) + self.depth_input.setValue(3) + input_layout.addWidget(self.depth_input, 1, 1) + + # Control buttons + button_layout = QHBoxLayout() + button_layout.setSpacing(8) + + self.start_button = ModernButton("Start Scraping", primary=True) + self.start_button.clicked.connect(self.start_scraping) + button_layout.addWidget(self.start_button) + + self.stop_button = ModernButton("Stop") + self.stop_button.clicked.connect(self.stop_scraping) + self.stop_button.setEnabled(False) + button_layout.addWidget(self.stop_button) + + input_layout.addLayout(button_layout, 2, 0, 1, 2) + layout.addWidget(input_group) + + # Progress group + progress_group = QGroupBox("Progress") + progress_layout = QVBoxLayout(progress_group) + progress_layout.setSpacing(8) + + self.progress_bar = QProgressBar() + self.progress_bar.setVisible(False) + self.progress_bar.setMinimumHeight(20) + progress_layout.addWidget(self.progress_bar) + + self.status_label = QLabel("Ready to start scraping...") + self.status_label.setStyleSheet(""" + QLabel { + color: #374151; + font-size: 14px; + 
padding: 8px; + background: #f8fafc; + border-radius: 6px; + border-left: 3px solid #3b82f6; + } + """) + self.status_label.setWordWrap(True) # Enable word wrapping + progress_layout.addWidget(self.status_label) + + layout.addWidget(progress_group) + + # Results preview + results_group = QGroupBox("Scraping Results") + results_layout = QVBoxLayout(results_group) + + self.results_text = QTextEdit() + self.results_text.setReadOnly(True) + self.results_text.setMinimumHeight(80) # Reduced minimum height for more compact output + results_layout.addWidget(self.results_text) + + layout.addWidget(results_group) + + # Set the content widget in the scroll area + scroll_area.setWidget(content_widget) + main_layout.addWidget(scroll_area) + + self.tab_widget.addTab(scraping_widget, "Web Scraping") + + def create_data_tab(self): + """Create the data viewing and filtering tab""" + data_widget = QWidget() + layout = QVBoxLayout(data_widget) + layout.setSpacing(16) + + # Search and filter controls + controls_group = QGroupBox("Search & Filter") + controls_layout = QHBoxLayout(controls_group) + controls_layout.setSpacing(12) + + controls_layout.addWidget(QLabel("Search:")) + self.search_input = ModernLineEdit("Enter search term...") + self.search_input.textChanged.connect(self.filter_data) + controls_layout.addWidget(self.search_input) + + controls_layout.addWidget(QLabel("Domain:")) + self.domain_filter = QComboBox() + self.domain_filter.currentTextChanged.connect(self.filter_data) + controls_layout.addWidget(self.domain_filter) + + self.export_button = ModernButton("Export Data") + self.export_button.clicked.connect(self.export_data) + controls_layout.addWidget(self.export_button) + + # Sitemap button + self.sitemap_button = ModernButton("Generate Sitemap.xml") + self.sitemap_button.clicked.connect(self.generate_sitemap) + controls_layout.addWidget(self.sitemap_button) + + layout.addWidget(controls_group) + + # Data table + self.data_table = QTableWidget() + self.data_table.setColumnCount(6) + self.data_table.setHorizontalHeaderLabels([ + "Title", "URL", "Depth", "Links", "Words", "Load Time" + ]) + + # Set table properties to fill available width + header = self.data_table.horizontalHeader() + header.setStretchLastSection(False) # Don't stretch the last section + + # Set resize modes to make table fill width properly + header.setSectionResizeMode(0, QHeaderView.Stretch) # Title - stretch to fill + header.setSectionResizeMode(1, QHeaderView.Stretch) # URL - stretch to fill + header.setSectionResizeMode(2, QHeaderView.Fixed) # Depth - fixed + header.setSectionResizeMode(3, QHeaderView.Fixed) # Links - fixed + header.setSectionResizeMode(4, QHeaderView.Fixed) # Words - fixed + header.setSectionResizeMode(5, QHeaderView.Fixed) # Load Time - fixed + + # Set fixed column widths for non-stretching columns + self.data_table.setColumnWidth(2, 80) # Depth + self.data_table.setColumnWidth(3, 80) # Links + self.data_table.setColumnWidth(4, 80) # Words + self.data_table.setColumnWidth(5, 100) # Load Time + + # Set row height to prevent index cutoff + self.data_table.verticalHeader().setDefaultSectionSize(40) # Increased row height + self.data_table.verticalHeader().setMinimumSectionSize(35) # Minimum row height + + # Enable word wrapping for title and URL columns + self.data_table.setWordWrap(True) + + # Connect double-click signal + self.data_table.cellDoubleClicked.connect(self.show_content_preview) + + layout.addWidget(self.data_table) + + self.tab_widget.addTab(data_widget, "Data View") + + def 
create_analysis_tab(self): + """Create the data analysis tab""" + analysis_widget = QWidget() + layout = QVBoxLayout(analysis_widget) + layout.setSpacing(16) + + # Create scroll area for better layout + scroll_area = QScrollArea() + scroll_area.setWidgetResizable(True) + scroll_area.setStyleSheet("QScrollArea { border: none; }") + + content_widget = QWidget() + content_layout = QVBoxLayout(content_widget) + content_layout.setSpacing(16) + + # Statistics group + stats_group = QGroupBox("Statistics") + stats_layout = QGridLayout(stats_group) + stats_layout.setSpacing(12) + + self.stats_labels = {} + stats_fields = [ + ("Total Pages", "Total Pages"), + ("Total Links", "Total Links"), + ("Total Words", "Total Words"), + ("Average Load Time", "Average Load Time"), + ("Max Depth Reached", "Max Depth Reached") + ] + + for i, (label_text, field) in enumerate(stats_fields): + stats_layout.addWidget(QLabel(f"{label_text}:"), i, 0) + label = QLabel("0") + label.setStyleSheet(""" + QLabel { + font-weight: 700; + color: #3b82f6; + font-size: 16px; + padding: 8px 12px; + background: #eff6ff; + border-radius: 6px; + border-left: 3px solid #3b82f6; + } + """) + self.stats_labels[field] = label + stats_layout.addWidget(label, i, 1) + + content_layout.addWidget(stats_group) + + # Domain breakdown + domain_group = QGroupBox("Domain Breakdown") + domain_layout = QVBoxLayout(domain_group) + + self.domain_text = QTextEdit() + self.domain_text.setReadOnly(True) + self.domain_text.setMaximumHeight(150) + domain_layout.addWidget(self.domain_text) + + content_layout.addWidget(domain_group) + + # Content preview + content_preview_group = QGroupBox("Content Preview") + content_preview_layout = QVBoxLayout(content_preview_group) + + # Create splitter for text and visual preview + preview_splitter = QSplitter(Qt.Orientation.Horizontal) + + # Text preview + text_preview_widget = QWidget() + text_preview_layout = QVBoxLayout(text_preview_widget) + text_preview_layout.setContentsMargins(0, 0, 0, 0) + + text_label = QLabel("Text Content:") + text_label.setStyleSheet("font-weight: 600; margin-bottom: 8px;") + text_preview_layout.addWidget(text_label) + + self.content_text = QTextEdit() + self.content_text.setReadOnly(True) + self.content_text.setMaximumHeight(400) + self.content_text.setFont(QFont("Segoe UI", 12)) + self.content_text.setStyleSheet(""" + QTextEdit { + font-size: 12px; + line-height: 1.4; + padding: 16px; + } + """) + text_preview_layout.addWidget(self.content_text) + + # Visual HTML preview + visual_preview_widget = QWidget() + visual_preview_layout = QVBoxLayout(visual_preview_widget) + visual_preview_layout.setContentsMargins(0, 0, 0, 0) + + visual_label = QLabel("Visual Preview:") + visual_label.setStyleSheet("font-weight: 600; margin-bottom: 8px;") + visual_preview_layout.addWidget(visual_label) + + if WEB_ENGINE_AVAILABLE: + self.web_view = QWebEngineView() + self.web_view.setMinimumHeight(400) + self.web_view.setMaximumHeight(400) + visual_preview_layout.addWidget(self.web_view) + else: + self.web_view = QLabel("Visual preview not available\nInstall PyQtWebEngine for HTML rendering") + self.web_view.setStyleSheet("color: #6b7280; padding: 20px; text-align: center;") + self.web_view.setMinimumHeight(400) + self.web_view.setMaximumHeight(400) + visual_preview_layout.addWidget(self.web_view) + + # Add widgets to splitter + preview_splitter.addWidget(text_preview_widget) + preview_splitter.addWidget(visual_preview_widget) + preview_splitter.setSizes([400, 600]) # Set initial split ratio + + 
content_preview_layout.addWidget(preview_splitter) + + content_layout.addWidget(content_preview_group) + + scroll_area.setWidget(content_widget) + layout.addWidget(scroll_area) + + self.tab_widget.addTab(analysis_widget, "Analysis") + + def create_sitemap_tab(self): + """Create the visual sitemap tab with a tree widget and export button""" + sitemap_widget = QWidget() + layout = QVBoxLayout(sitemap_widget) + layout.setSpacing(16) + + # Export button + self.export_sitemap_button = ModernButton("Export Sitemap (JSON)") + self.export_sitemap_button.clicked.connect(self.export_sitemap_json) + layout.addWidget(self.export_sitemap_button) + + self.sitemap_tree = QTreeWidget() + self.sitemap_tree.setHeaderLabels(["Page Title", "URL"]) + self.sitemap_tree.setColumnWidth(0, 350) + self.sitemap_tree.setColumnWidth(1, 600) + self.sitemap_tree.itemDoubleClicked.connect(self.open_url_in_browser) + layout.addWidget(self.sitemap_tree) + + self.tab_widget.addTab(sitemap_widget, "Sitemap") + + def create_ai_tab(self): + """Create a simplified, modern AI Analysis tab with a chat interface and compact quick actions, using more curves to match the app style.""" + ai_widget = QWidget() + layout = QVBoxLayout(ai_widget) + layout.setSpacing(8) + layout.setContentsMargins(16, 16, 16, 16) + + hint_label = QLabel("💡 Ask questions about your scraped websites below.") + hint_label.setStyleSheet(""" + QLabel { + color: #64748b; + font-size: 13px; + padding: 4px 0 8px 0; + } + """) + layout.addWidget(hint_label) + + # --- Chat area --- + self.ai_chat_history = QListWidget() + self.ai_chat_history.setStyleSheet(""" + QListWidget { + background: #f8fafc; + border: 1.5px solid #e2e8f0; + border-radius: 22px; + font-size: 15px; + color: #1e293b; + padding: 12px; + font-family: 'Segoe UI', sans-serif; + } + """) + self.ai_chat_history.setSpacing(6) + self.ai_chat_history.setMinimumHeight(300) + self.ai_chat_history.setResizeMode(QListWidget.Adjust) + self.ai_chat_history.setVerticalScrollMode(QAbstractItemView.ScrollPerPixel) + layout.addWidget(self.ai_chat_history, stretch=1) + self.chat_messages = [] # Store (role, message, timestamp) tuples + self.render_chat_history() + + # --- Quick action buttons --- + quick_actions_widget = QWidget() + quick_actions_layout = QHBoxLayout(quick_actions_widget) + quick_actions_layout.setSpacing(8) + quick_actions_layout.setContentsMargins(0, 0, 0, 0) + quick_questions = [ + "Analyze the website structure", + "Find key content themes", + "Suggest SEO improvements", + "Compare page performance" + ] + for question in quick_questions: + quick_btn = QPushButton(question) + quick_btn.setFont(QFont("Segoe UI", 10)) + quick_btn.setCursor(Qt.CursorShape.PointingHandCursor) + quick_btn.clicked.connect(lambda _, q=question: self.quick_question(q)) + quick_btn.setStyleSheet(""" + QPushButton { + background: #e0e7ef; + border: none; + color: #374151; + padding: 8px 22px; + border-radius: 22px; + font-weight: 500; + font-size: 13px; + box-shadow: 0 2px 8px rgba(59, 130, 246, 0.04); + } + QPushButton:hover { + background: #3b82f6; + color: white; + } + QPushButton:pressed { + background: #2563eb; + color: white; + } + """) + quick_actions_layout.addWidget(quick_btn) + layout.addWidget(quick_actions_widget) + + # --- Input area --- + input_container = QWidget() + input_layout = QHBoxLayout(input_container) + input_layout.setContentsMargins(0, 0, 0, 0) + input_layout.setSpacing(8) + self.ai_input = QLineEdit() + self.ai_input.setPlaceholderText("Type your question and press Enter...") + 
self.ai_input.setMinimumHeight(44) + self.ai_input.setFont(QFont("Segoe UI", 12)) + self.ai_input.returnPressed.connect(self.send_ai_message) + self.ai_input.setStyleSheet(""" + QLineEdit { + border: 1.5px solid #e2e8f0; + border-radius: 22px; + padding: 10px 20px; + background: white; + color: #1e293b; + font-size: 14px; + } + QLineEdit:focus { + border-color: #3b82f6; + outline: none; + } + QLineEdit::placeholder { + color: #9ca3af; + } + """) + self.ai_send_button = QPushButton("Send") + self.ai_send_button.setMinimumHeight(44) + self.ai_send_button.setMinimumWidth(80) + self.ai_send_button.setFont(QFont("Segoe UI", 12, QFont.Weight.Medium)) + self.ai_send_button.setCursor(Qt.CursorShape.PointingHandCursor) + self.ai_send_button.clicked.connect(self.send_ai_message) + self.ai_send_button.setStyleSheet(""" + QPushButton { + background: #3b82f6; + border: none; + color: white; + padding: 10px 28px; + border-radius: 22px; + font-weight: 600; + font-size: 15px; + box-shadow: 0 2px 8px rgba(59, 130, 246, 0.08); + } + QPushButton:hover { + background: #2563eb; + } + QPushButton:pressed { + background: #1d4ed8; + } + QPushButton:disabled { + background: #9ca3af; + color: #f3f4f6; + } + """) + input_layout.addWidget(self.ai_input, stretch=1) + input_layout.addWidget(self.ai_send_button) + layout.addWidget(input_container) + + self.tab_widget.addTab(ai_widget, "AI Analysis") + ai_tab_index = self.tab_widget.count() - 1 + self.set_ai_tab_gradient(ai_tab_index) + + def render_chat_history(self): + self.ai_chat_history.clear() + for role, msg, timestamp in self.chat_messages: + item = QListWidgetItem() + bubble = ChatBubbleWidget(msg, timestamp, role) + bubble.adjustSize() + item.setSizeHint(bubble.sizeHint()) + self.ai_chat_history.addItem(item) + self.ai_chat_history.setItemWidget(item, bubble) + self.ai_chat_history.scrollToBottom() + + def send_ai_message(self): + user_msg = self.ai_input.text().strip() + if not user_msg: + return + timestamp = datetime.now().strftime("%H:%M") + self.chat_messages.append(("user", user_msg, timestamp)) + self.render_chat_history() + self.ai_input.clear() + # Show thinking indicator as AI message + self.chat_messages.append(("ai", "🤔 Analyzing your question...", timestamp)) + self.render_chat_history() + ai_context = self.get_ai_context(user_msg) + QTimer.singleShot(100, lambda: self._do_ai_response_openrouter(user_msg, ai_context)) + + def _do_ai_response_openrouter(self, user_msg, ai_context): + if OPENAI_AVAILABLE: + try: + client = OpenAI( + base_url="https://openrouter.ai/api/v1", + api_key=os.environ.get("OPENROUTER_API_KEY"), + ) + system_prompt = """You are an expert website analyst and AI assistant specializing in web scraping analysis. Your role is to:\n\n1. **Analyze website content** - Provide insights about the scraped websites\n2. **Identify patterns** - Find common themes, structures, and content types\n3. **Offer recommendations** - Suggest improvements for SEO, content, or structure\n4. **Answer questions** - Respond to specific queries about the websites\n5. 
**Provide actionable insights** - Give practical advice based on the data\n\n**Response Guidelines:**\n- Be professional yet conversational\n- Use clear, structured responses with bullet points when appropriate\n- Reference specific websites by title when relevant\n- Provide specific examples from the content\n- Suggest actionable next steps when possible\n- Use markdown formatting for better readability\n\n**Context:** You have access to scraped website data including titles, URLs, content previews, and metadata.""" + user_prompt = f"""# Website Analysis Request\n\n## User Question\n{user_msg}\n\n## Available Website Data\n{ai_context}\n\n## Instructions\nPlease provide a comprehensive analysis based on the user's question. Use the website data above to support your response. If the question is about specific aspects (SEO, content, structure, etc.), focus your analysis accordingly.\n\n**Format your response with:**\n- Clear headings and structure\n- Specific examples from the websites\n- Actionable insights and recommendations\n- Professional, helpful tone""" + completion = client.chat.completions.create( + extra_headers={ + "HTTP-Referer": "http://localhost:8000", + "X-Title": "Web Scraper & Data Analyzer - AI Analysis", + }, + extra_body={}, + model="deepseek/deepseek-r1-0528-qwen3-8b:free", + messages=[ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt} + ], + temperature=0.7, + max_tokens=2000 + ) + try: + answer = completion.choices[0].message.content + if answer is not None: + answer = answer.strip() + else: + answer = "❌ **AI Analysis Error**\n\nNo response content received from the AI model." + except (AttributeError, IndexError, KeyError): + answer = "❌ **AI Analysis Error**\n\nUnexpected response format from the AI model." + if hasattr(self, "ai_stats_label"): + self.ai_stats_label.setText(f"Analyzed {len(self.websites)} websites") + except Exception as e: + answer = f"❌ **AI Analysis Error**\n\nI encountered an error while analyzing your request: `{str(e)}`\n\nPlease try again or check your internet connection." + else: + if ai_context == "No data available. Please scrape some websites first.": + answer = "📊 **No Data Available**\n\nPlease scrape some websites first to enable AI analysis." + else: + answer = f"🤖 **AI Analysis Preview**\n\nI have analyzed {len(self.websites)} websites. Your question: '{user_msg}'\n\n*(This is a placeholder response. 
Install the 'openai' package for real AI analysis.)*" + # Remove the last AI thinking message + if self.chat_messages and self.chat_messages[-1][1].startswith("🤔"): + self.chat_messages.pop() + timestamp = datetime.now().strftime("%H:%M") + self.chat_messages.append(("ai", answer, timestamp)) + self.render_chat_history() + + def open_url_in_browser(self, item, column): + url = item.data(1, Qt.ItemDataRole.DisplayRole) + if url: + webbrowser.open(url) + + def get_icon(self, is_root=False): + + if is_root: + return self.style().standardIcon(QStyle.StandardPixmap.SP_DesktopIcon) + else: + return self.style().standardIcon(QStyle.StandardPixmap.SP_DirIcon) + """Build and display the sitemap tree from crawled data, with icons and tooltips""" + self.sitemap_tree.clear() + if not self.websites: + return + url_to_website = {w.url: w for w in self.websites} + children_map = {w.url: [] for w in self.websites} + for w in self.websites: + for link in w.links: + if link in url_to_website: + children_map[w.url].append(link) + root_url = self.websites[0].url + def add_items(parent_item, url, visited, depth): + if url in visited: + return + visited.add(url) + website = url_to_website[url] + item = QTreeWidgetItem([website.title, website.url]) + item.setIcon(0, self.get_icon(is_root=False)) + tooltip = f"Title: {website.title}For more info, see the README or contact support.
" + ) + QMessageBox.information(self, "Help / Info", help_text) + + def scraping_finished(self, websites): + """Handle scraping completion""" + self.websites = websites + self.scraper.websites = websites + + # Update UI + self.start_button.setEnabled(True) + self.stop_button.setEnabled(False) + self.progress_bar.setVisible(False) + self.status_label.setText(f"Scraping complete! Found {len(websites)} websites.") + self.status_label.setStyleSheet(""" + QLabel { + color: #166534; + font-size: 14px; + padding: 8px; + background: #f0fdf4; + border-radius: 6px; + border-left: 3px solid #22c55e; + } + """) + + # Update data view + self.update_data_table() + self.update_analysis() + self.update_sitemap_tree() + + # Switch to data tab + self.tab_widget.setCurrentIndex(1) + + # Show desktop notification + self.tray_icon.showMessage( + "Web Scraper", + f"Scraping complete! Found {len(websites)} websites.", + QSystemTrayIcon.MessageIcon(1), # 1 = Information + 5000 + ) + + def scraping_error(self, error_message): + """Handle scraping errors""" + QMessageBox.critical(self, "Error", f"Scraping failed: {error_message}") + self.start_button.setEnabled(True) + self.stop_button.setEnabled(False) + self.progress_bar.setVisible(False) + self.status_label.setText("Scraping failed.") + self.status_label.setStyleSheet(""" + QLabel { + color: #991b1b; + font-size: 14px; + padding: 8px; + background: #fef2f2; + border-radius: 6px; + border-left: 3px solid #ef4444; + } + """) + + # Show desktop notification + self.tray_icon.showMessage( + "Web Scraper", + f"Scraping failed: {error_message}", + QSystemTrayIcon.MessageIcon(3), + 5000 + ) + + def update_data_table(self): + """Update the data table with scraped websites""" + self.data_table.setRowCount(len(self.websites)) + for row, website in enumerate(self.websites): + self.data_table.setRowHeight(row, 40) + title_item = QTableWidgetItem(website.title) + title_item.setTextAlignment(Qt.AlignmentFlag.AlignTop | Qt.AlignmentFlag.AlignLeft) + url_item = QTableWidgetItem(website.url) + url_item.setTextAlignment(Qt.AlignmentFlag.AlignTop | Qt.AlignmentFlag.AlignLeft) + depth_item = QTableWidgetItem(str(website.depth)) + depth_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + links_item = QTableWidgetItem(str(len(website.links))) + links_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + words_item = QTableWidgetItem(str(website.get_word_count())) + words_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + load_time = f"{website.load_time:.2f}s" if website.load_time else "N/A" + load_time_item = QTableWidgetItem(load_time) + load_time_item.setTextAlignment(Qt.AlignmentFlag.AlignCenter) + self.data_table.setItem(row, 0, title_item) + self.data_table.setItem(row, 1, url_item) + self.data_table.setItem(row, 2, depth_item) + self.data_table.setItem(row, 3, links_item) + self.data_table.setItem(row, 4, words_item) + self.data_table.setItem(row, 5, load_time_item) + # Update domain filter + domains = list(set(w.get_normalized_domain() for w in self.websites)) + self.domain_filter.clear() + self.domain_filter.addItem("All Domains") + self.domain_filter.addItems(domains) + # Update content preview with first website + if self.websites: + first_website = self.websites[0] + content_preview = first_website.get_text_preview(800) + self.content_text.setText(content_preview) + + # Also update visual preview for first website + if WEB_ENGINE_AVAILABLE and hasattr(self, 'web_view'): + try: + html_content = first_website.content + if html_content and html_content.strip(): + 
+                            full_html = f"""
+                            <html>
+                                <head><meta charset="utf-8"></head>
+                                <body>{html_content}</body>
+                            </html>
+                            """
+                            self.web_view.setHtml(full_html)
+                        else:
+                            self.web_view.setHtml("""
+                            <html>
+                                <body>
+                                    <p>This page doesn't have HTML content to display in the visual preview.</p>
+                                    <p>Check the text preview tab for the extracted content.</p>
+                                </body>
+                            </html>
+                            """)
+                    except Exception as e:
+                        # Show error message in the web view
+                        error_html = f"""
+                        <html>
+                            <body>
+                                <p>Failed to load the visual preview:</p>
+                                <pre>{str(e)}</pre>
+                                <p>This might be due to:</p>
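The `update_data_table` method above relies on helpers such as `w.get_normalized_domain()` on the scraped website objects, and their implementations are not part of this fragment. The sketch below is illustrative only, with hypothetical helper names rather than the app's actual code; it shows the kind of URL normalization such a scraper needs:

```python
# Illustrative sketch only: hypothetical helpers, not WebScraperApp's actual implementation.
from urllib.parse import urlparse, urlunparse

def normalize_url(url: str) -> str:
    """Strip the www. prefix, drop fragments, and remove trailing slashes."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.rstrip("/")
    return urlunparse((parsed.scheme, host, path, "", parsed.query, ""))

def normalized_domain(url: str) -> str:
    """Domain with the www. prefix removed, useful for grouping pages by site."""
    return urlparse(normalize_url(url)).netloc

print(normalize_url("https://www.example.com/docs/#intro"))   # https://example.com/docs
print(normalized_domain("https://www.example.com/a/b"))       # example.com
```

Normalizing before comparison is what lets duplicate detection treat `https://www.example.com/docs/` and `https://example.com/docs#intro` as the same page.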
+    "\n",
+    "    Just before we get to the assignment --\n",
+    "    I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.\n",
+    "    https://edwarddonner.com/2024/11/13/llm-engineering-resources/ \n",
+    "    Please keep this bookmarked, and I'll continue to add more useful links there over time.\n",
+    "\n",
+    "    Write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.\n",
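The exercise preserved above (suggest a short subject line for an email) is not solved anywhere in this fragment. A minimal sketch of one possible approach follows; it assumes the same `openai` package and `OPENAI_API_KEY` environment variable that the notebooks in this diff already rely on, and `gpt-4o-mini` is only an example model choice.

```python
# Minimal sketch: suggest a subject line for an email.
# Assumes OPENAI_API_KEY is available via a .env file or the environment.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(override=True)
openai = OpenAI()

def suggest_subject(email_body: str, model: str = "gpt-4o-mini") -> str:
    """Return a short, specific subject line for the given email text."""
    messages = [
        {"role": "system",
         "content": "You write concise, specific email subject lines. Reply with the subject line only."},
        {"role": "user", "content": email_body},
    ]
    response = openai.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()

print(suggest_subject(
    "Hi team, quick reminder that the quarterly review moves to Thursday at 3pm, "
    "and please send your slides to me by Wednesday noon."
))
```

Swapping the model string, or pointing the client at Ollama's OpenAI-compatible endpoint as the three-chatbot notebook below does, would work the same way.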
+ + \ No newline at end of file diff --git a/week1/day5.ipynb b/week1/day5.ipynb index 300145f..5249ce8 100644 --- a/week1/day5.ipynb +++ b/week1/day5.ipynb @@ -141,7 +141,7 @@ "{\n", " \"links\": [\n", " {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n", - " {\"type\": \"careers page\": \"url\": \"https://another.full.url/careers\"}\n", + " {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n", " ]\n", "}\n", "\"\"\"" @@ -501,7 +501,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.12" + "version": "3.11.13" } }, "nbformat": 4, diff --git a/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb b/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb new file mode 100644 index 0000000..72400c8 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/Conversation_Day1.ipynb @@ -0,0 +1,385 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2b3a83fe-edf2-45b7-8b76-af2324296ad0", + "metadata": {}, + "source": [ + "### Import API Keys and Establish Connections" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bacb0c55-44ee-4505-a3bc-7aaa3d72b28b", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import ollama\n", + "import anthropic\n", + "from IPython.display import Markdown, display, update_display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1767187f-c065-43df-b778-fcd48bd5e48d", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "google_api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API key exists {openai_api_key[:8]}\")\n", + "else:\n", + " print(f\"OpenAI API key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API key exists {google_api_key[:7]}\")\n", + "else:\n", + " print(f\"Google API key not set\")\n", + "\n", + "if anthropic_api_key:\n", + " print(f\"Anthropic API key exists {openai_api_key[:8]}\")\n", + "else:\n", + " print(f\"Anthropic API key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc7ca3ab-ff7f-4375-bcad-aca49c7f4f4f", + "metadata": {}, + "outputs": [], + "source": [ + "# Initializing API Clients, loading the SDKs\n", + "# An SDK is a library/toolbox (Pre-built functions, classes, utilities) full \n", + "# of everything you need to use someone else's software\n", + " \n", + "openai = OpenAI()\n", + "claude = anthropic.Anthropic()\n", + "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key = 'ollama')" + ] + }, + { + "cell_type": "markdown", + "id": "81e01904-5586-4726-ab91-7bdbd6bde6d9", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "### A Coversation between 3 chatbots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "843bbb69-ab7d-4b13-b878-65a4275f53ca", + "metadata": {}, + "outputs": [], + "source": [ + "# Conversation between GPT-4o-mini, Claude-3, ang Gemini 2.5 flash\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "ollama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are an eternal optimist. 
You always see the bright side of things and believe even \\\n", + "simple actions have deep purpose. Keep replies under 2 sentences.\"\n", + "\n", + "ollama_system = \"You are a witty skeptic who questions everything. You tend to doubt grand explanations \\\n", + "and prefer clever, sarcastic, or literal answers. Keep replies under 2 sentences.\"\n", + "\n", + "claude_system = \"You are a thoughtful philosopher. You consider all perspectives and enjoy finding \\\n", + "symbolic or existential meaning in simple actions. Keep replies under 2 sentences.\"\n", + "\n", + "\n", + "gpt_messages = [\"Hi! Todays topic for discussion is 'Why did the chicken cross the road?'\"]\n", + "ollama_messages = [\"That's quite the topic. \"]\n", + "claude_messages = [\"Lets begin our discussion.\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a4da2f5-ff74-4847-aa86-867e89173509", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " \n", + " messages = [{\"role\":\"system\", \"content\":gpt_system}]\n", + " \n", + " for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " response = openai.chat.completions.create(\n", + " model = gpt_model,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5848d83a-f4aa-42ee-b40b-6130da60c890", + "metadata": {}, + "outputs": [], + "source": [ + "def call_ollama():\n", + " messages = [{\"role\":\"system\", \"content\":ollama_system}]\n", + " \n", + " for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n", + "\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = ollama_model,\n", + " messages = messages\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a50e4f7c-d594-4ed8-a658-2d8b2fde21a0", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " \n", + " messages = []\n", + " \n", + " for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\":\"user\", \"content\":gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\":\"assistant\", \"content\": claude_message})\n", + " \n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n", + " \n", + " response = claude.messages.create(\n", + " model = claude_model,\n", + " system = claude_system,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.content[0].text.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c78fcf8-544e-413f-af18-ccb9000515de", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + 
"print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n", + "print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT: \\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + "\n", + " ollama_next = call_ollama()\n", + " print(f\"Ollama: \\n{ollama_next}\\n\")\n", + " ollama_messages.append(ollama_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Claude: \\n{claude_next}\\n\")\n", + " claude_messages.append(claude_next)" + ] + }, + { + "cell_type": "markdown", + "id": "8ea7419a-ea8f-42da-a9a1-4bbe5342cecb", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "### Another Coversation between 3 chatbots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c279c275-7b95-4587-9cc6-4d32517ec253", + "metadata": {}, + "outputs": [], + "source": [ + "# Conversation between GPT-4o-mini, Claude-3, ang Gemini 2.5 flash\n", + "\n", + "gpt_model = \"gpt-4o-mini\"\n", + "claude_model = \"claude-3-haiku-20240307\"\n", + "ollama_model = \"llama3.2\"\n", + "\n", + "gpt_system = \"You are an optimist who believes technology brings people \\\n", + "closer together and improves lives. Defend innovation as a force for human \\\n", + "connection. Keep response under 3 sentences.\"\n", + "\n", + "\n", + "ollama_system = \"You are a skeptic who questions if technology isolates us \\\n", + "and worsens social divides. Highlight its risks and unintended consequences. \\\n", + "Keep response under 3 sentences.\"\n", + "\n", + "\n", + "claude_system = \"You are a philosopher who explores both sides \\\n", + "of technology's impact. Seek a balanced perspective on connection and isolation.\\\n", + "Keep response under 3 sentences.\"\n", + "\n", + "\n", + "\n", + "\n", + "gpt_messages = [\"Our topic of discussion for today will be: 'Is technology making us more connected or more isolated?'\"]\n", + "ollama_messages = [\"A great topic\"]\n", + "claude_messages = [\"Let's begin.\"]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44c023a6-f22f-4a64-a718-f75fe4c8233a", + "metadata": {}, + "outputs": [], + "source": [ + "def call_gpt():\n", + " \n", + " messages = [{\"role\":\"system\", \"content\":gpt_system}]\n", + " \n", + " for gpt, ollama, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"assistant\", \"content\": gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " response = openai.chat.completions.create(\n", + " model = gpt_model,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d29f27a1-457e-4e71-88dc-c55e4a36a27c", + "metadata": {}, + "outputs": [], + "source": [ + "def call_ollama():\n", + " messages = [{\"role\":\"system\", \"content\":ollama_system}]\n", + " \n", + " for gpt, ollama_message, claude in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\": \"user\", \"content\": gpt})\n", + " messages.append({\"role\": \"assistant\", \"content\": ollama_message})\n", + " messages.append({\"role\": \"user\", \"content\": claude})\n", + " \n", + " messages.append({\"role\":\"user\", \"content\": gpt_messages[-1]})\n", + "\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model = ollama_model,\n", + " 
messages = messages\n", + " )\n", + " return response.choices[0].message.content.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69577edc-4be2-40fc-8eac-1243c30cda26", + "metadata": {}, + "outputs": [], + "source": [ + "def call_claude():\n", + " \n", + " messages = []\n", + " \n", + " for gpt, ollama, claude_message in zip(gpt_messages, ollama_messages, claude_messages):\n", + " messages.append({\"role\":\"user\", \"content\":gpt})\n", + " messages.append({\"role\": \"user\", \"content\": ollama})\n", + " messages.append({\"role\":\"assistant\", \"content\": claude_message})\n", + " \n", + " messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n", + " messages.append({\"role\": \"user\", \"content\": ollama_messages[-1]})\n", + " \n", + " response = claude.messages.create(\n", + " model = claude_model,\n", + " system = claude_system,\n", + " messages = messages,\n", + " max_tokens = 500\n", + " )\n", + " return response.content[0].text.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acedf2fb-8b20-49be-9a80-24fb3896e2ea", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n", + "print(f\"Ollama:\\n{ollama_messages[0]}\\n\")\n", + "print(f\"Claude:\\n{claude_messages[0]}\\n\")\n", + "\n", + "for i in range(5):\n", + " gpt_next = call_gpt()\n", + " print(f\"GPT: \\n{gpt_next}\\n\")\n", + " gpt_messages.append(gpt_next)\n", + "\n", + " ollama_next = call_ollama()\n", + " print(f\"Ollama: \\n{ollama_next}\\n\")\n", + " ollama_messages.append(ollama_next)\n", + " \n", + " claude_next = call_claude()\n", + " print(f\"Claude: \\n{claude_next}\\n\")\n", + " claude_messages.append(claude_next)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a078943b-7a34-4697-b1f6-16f4b0e7aed6", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf b/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf new file mode 100644 index 0000000..6c8fefa Binary files /dev/null and b/week2/community-contributions/3_chatbots_Converstion/Conversation_Outputs.pdf differ diff --git a/week2/community-contributions/3_chatbots_Converstion/README.md b/week2/community-contributions/3_chatbots_Converstion/README.md new file mode 100644 index 0000000..c9f07e9 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/README.md @@ -0,0 +1,36 @@ + +# 3 Way Chatbot Conversation +Making the different models from Anthropic, OpenAI and Ollama converse with each other. + +## Contents + +- `Conversation_Day1.ipynb`: The notebook file with all code and explanations for the first day. +- `Conversation_Outputs`: The chatbots conversations for each topic +- `requirements.txt`:For installing the dependencies +- `README.md`: This file. + +## How to Run + +1. Clone this repository. +2. I'm using 'Python 3.11.13' with Jupyter Notebook or JupyterLab. +3. Install dependencies (see below). +4. 
Open the notebook using Jupyter: + +```bash +jupyter notebook Conversation_Day1.ipynb +``` + +## Dependencies + +Install the required Python libraries using: + +```bash +pip install -r requirements.txt +``` + +--- + +### Author + +Mustafa Kashif + diff --git a/week2/community-contributions/3_chatbots_Converstion/requirements.txt b/week2/community-contributions/3_chatbots_Converstion/requirements.txt new file mode 100644 index 0000000..548bb18 --- /dev/null +++ b/week2/community-contributions/3_chatbots_Converstion/requirements.txt @@ -0,0 +1,6 @@ +IPython +anthropic +dotenv +ollama +openai +os \ No newline at end of file diff --git a/week2/community-contributions/Agent_translate_gemini.ipynb b/week2/community-contributions/Agent_translate_gemini.ipynb new file mode 100644 index 0000000..fe62337 --- /dev/null +++ b/week2/community-contributions/Agent_translate_gemini.ipynb @@ -0,0 +1,143 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "# Additional End of week Exercise - week 2\n", + "\n", + "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n", + "\n", + "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n", + "\n", + "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n", + "\n", + "I will publish a full solution here soon - unless someone beats me to it...\n", + "\n", + "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "# Agent that can listen for audio and convert it to text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da58ed0f-f781-4c51-8e5d-fdb05db98c8c", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import gradio as gr\n", + "import google.generativeai as genai\n", + "from dotenv import load_dotenv\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "078cf34a-881e-44f4-9947-c45d7fe992a3", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv()\n", + "\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")\n", + "\n", + "genai.configure(api_key=google_api_key)\n", + "model = genai.GenerativeModel(\"gemini-2.0-flash\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f77228ea-d0e1-4434-9191-555a6d680625", + "metadata": {}, + "outputs": [], + "source": [ + "def transcribe_translate_with_gemini(audio_file_path):\n", + " if not audio_file_path:\n", + " return \"⚠️ No audio file received.\"\n", + "\n", + " prompt = (\n", + " \"You're an AI that listens to a voice message in any language and returns the English transcription. \"\n", + " \"Please transcribe and translate the following audio to English. 
If already in English, just transcribe it.\"\n", + " )\n", + "\n", + " uploaded_file = genai.upload_file(audio_file_path)\n", + "\n", + " # 🔁 Send prompt + uploaded audio reference to Gemini\n", + " response = model.generate_content(\n", + " contents=[\n", + " {\n", + " \"role\": \"user\",\n", + " \"parts\": [\n", + " {\"text\": prompt},\n", + " uploaded_file \n", + " ]\n", + " }\n", + " ]\n", + " )\n", + "\n", + " return response.text.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb6c6d1e-1be3-404d-83f3-fc0855dc9f67", + "metadata": {}, + "outputs": [], + "source": [ + "gr.Interface(\n", + " fn=transcribe_translate_with_gemini,\n", + " inputs=gr.Audio(label=\"Record voice\", type=\"filepath\"),\n", + " outputs=\"text\",\n", + " title=\"🎙️ Voice-to-English Translator (Gemini Only)\",\n", + " description=\"Speak in any language and get the English transcription using Gemini multimodal API.\"\n", + ").launch()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b105082-e388-44bc-9617-1a81f38e2f3f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/FlightAI-exercise.ipynb b/week2/community-contributions/FlightAI-exercise.ipynb new file mode 100644 index 0000000..f6c96ca --- /dev/null +++ b/week2/community-contributions/FlightAI-exercise.ipynb @@ -0,0 +1,654 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d006b2ea-9dfe-49c7-88a9-a5a0775185fd", + "metadata": {}, + "source": [ + "# Additional End of week Exercise - week 2\n", + "\n", + "Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.\n", + "\n", + "This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!\n", + "\n", + "If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.\n", + "\n", + "I will publish a full solution here soon - unless someone beats me to it...\n", + "\n", + "There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!) I can't wait to see your results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a07e7793-b8f5-44f4-aded-5562f633271a", + "metadata": {}, + "outputs": [], + "source": [ + "# Imports\n", + "\n", + "import os\n", + "import json\n", + "import base64\n", + "import logging\n", + "import gradio as gr\n", + "from PIL import Image\n", + "from io import BytesIO\n", + "from openai import OpenAI\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Audio, display" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e879f6ae-b246-479d-8f81-94e47a9072ec", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialization\n", + "logging.basicConfig(level=logging.INFO)\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " logging.info(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " logging.error(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d4455169-9e5e-4171-92e8-6f850a06f6e3", + "metadata": {}, + "outputs": [], + "source": [ + "system_message = (\n", + " \"You are a helpful assistant for an airline called FlightAI. \"\n", + " \"Always respond in a short, courteous sentence. \"\n", + " \"Provide accurate information only. \"\n", + " \"If you don’t know something, say so clearly. \"\n", + " \"Before booking a ticket, strictly follow this order: \"\n", + " \"1) Check if the destination is available, \"\n", + " \"2) Then check the ticket price, \"\n", + " \"3) Collect all neccessary details like name, destination and date of journey, \"\n", + " \"4) Only then proceed with the booking. \"\n", + " \"Always use the appropriate tools or APIs for each step before confirming a booking.\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4bab8e2c-e2b1-4421-a95b-7f1251670817", + "metadata": {}, + "outputs": [], + "source": [ + "# Dummy funcs that mimic the ticket booking behaviour\n", + "# Replace these will real funcs (that call APIs or make DB transactions) to actually book a ticket\n", + "\n", + "ticket_prices = {\n", + " \"london\": \"$799\",\n", + " \"paris\": \"$899\",\n", + " \"tokyo\": \"$1400\",\n", + " \"berlin\": \"$499\"\n", + "}\n", + "\n", + "def check_destination_availability(destination: str) -> dict:\n", + " \"\"\"\n", + " Check if the given destination is available in our ticketing system.\n", + " \n", + " Args:\n", + " destination (str): The name of the city.\n", + " \n", + " Returns:\n", + " dict: {\"available\": bool}\n", + " \"\"\"\n", + " logging.info(f\"Checking availability for destination: {destination}\")\n", + " \n", + " available = destination.lower() in ticket_prices\n", + " return {\"available\": available}\n", + "\n", + "\n", + "def fetch_ticket_price(destination_city: str) -> dict:\n", + " \"\"\"\n", + " Retrieve the ticket price for a given city.\n", + " \n", + " Args:\n", + " destination_city (str): The name of the destination city.\n", + " \n", + " Returns:\n", + " dict: {\"price\": str} or {\"price\": \"Unknown\"} if not found\n", + " \"\"\"\n", + " logging.info(f\"Retrieving price for destination: {destination_city}\")\n", + " \n", + " city = destination_city.lower()\n", + " price = ticket_prices.get(city, \"Unknown\")\n", + " \n", + " return {\"price\": price}\n", + "\n", + "\n", + "def book_ticket(name: str, destination_city: str, journey_date: str) -> dict:\n", + " \"\"\"\n", + " Book a 
ticket to a destination city for a given user and date.\n", + " \n", + " Args:\n", + " name (str): Name of the passenger.\n", + " destination_city (str): Destination city.\n", + " journey_date (str): Date of journey in YYYY-MM-DD format.\n", + " \n", + " Returns:\n", + " dict: Booking confirmation with name, city, price, and date, or error.\n", + " \"\"\"\n", + " logging.info(f\"Booking ticket for {name} to {destination_city} on {journey_date}\")\n", + " \n", + " city = destination_city.lower()\n", + "\n", + " if city not in ticket_prices:\n", + " logging.error(f\"City '{destination_city}' not found in ticket list.\")\n", + " return {\"error\": \"Destination not found.\"}\n", + "\n", + " price_info = fetch_ticket_price(destination_city)\n", + " \n", + " return {\n", + " \"name\": name,\n", + " \"destination_city\": destination_city.title(),\n", + " \"journey_date\": journey_date,\n", + " \"price\": price_info[\"price\"]\n", + " }\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "400f4592-2326-43f6-a921-fcd051c4f022", + "metadata": {}, + "outputs": [], + "source": [ + "destination_availability_tool = {\n", + " \"name\": \"check_destination_availability\",\n", + " \"description\": \"Check if tickets are available for the given destination city before proceeding with any booking or pricing inquiry.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The name of the destination city to check for availability.\"\n", + " }\n", + " },\n", + " \"required\": [\"destination\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "ticket_price_tool = {\n", + " \"name\": \"fetch_ticket_price\",\n", + " \"description\": (\n", + " \"Get the price of a return ticket to the specified destination city. \"\n", + " \"Use this after confirming that the destination is available, especially when the customer asks for the ticket price.\"\n", + " ),\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city for which the customer wants the ticket price.\"\n", + " }\n", + " },\n", + " \"required\": [\"destination_city\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "ticket_booking_tool = {\n", + " \"name\": \"book_ticket\",\n", + " \"description\": (\n", + " \"Book a ticket for the customer to the specified destination city on the given journey date. 
\"\n", + " \"Use only after availability and price have been checked.\"\n", + " ),\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Full name of the person booking the ticket.\"\n", + " },\n", + " \"destination_city\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city that the customer wants to travel to.\"\n", + " },\n", + " \"journey_date\": {\n", + " \"type\": \"string\",\n", + " \"format\": \"date\",\n", + " \"description\": \"The journey date in YYYY-MM-DD format.\"\n", + " }\n", + " },\n", + " \"required\": [\"name\", \"destination_city\", \"journey_date\"],\n", + " \"additionalProperties\": False\n", + " }\n", + "}\n", + "\n", + "tools = [\n", + " {\"type\": \"function\", \"function\": destination_availability_tool},\n", + " {\"type\": \"function\", \"function\": ticket_price_tool},\n", + " {\"type\": \"function\", \"function\": ticket_booking_tool},\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f02c17ba-14f2-41c4-b6a2-d1397405d368", + "metadata": {}, + "outputs": [], + "source": [ + "def handle_tool_call(message):\n", + " \"\"\"\n", + " Handles a single OpenAI tool call message and returns both the result\n", + " and a formatted tool response dictionary.\n", + " \n", + " Args:\n", + " message (object): An OpenAI message containing a tool call.\n", + " \n", + " Returns:\n", + " tuple: (result_dict, response_dict)\n", + " \"\"\"\n", + " tool_call = message.tool_calls[0]\n", + " function_name = tool_call.function.name\n", + " arguments = json.loads(tool_call.function.arguments)\n", + "\n", + " result = None\n", + "\n", + " logging.info(f\"Tool call received: {function_name} with arguments: {arguments}\")\n", + "\n", + " if function_name == \"check_destination_availability\":\n", + " result = check_destination_availability(**arguments)\n", + "\n", + " elif function_name == \"fetch_ticket_price\":\n", + " city = arguments.get(\"destination_city\")\n", + " price_info = fetch_ticket_price(city)\n", + " result = {\"destination_city\": city, \"price\": price_info[\"price\"]}\n", + "\n", + " elif function_name == \"book_ticket\":\n", + " result = book_ticket(**arguments)\n", + "\n", + " else:\n", + " logging.warning(\"Unrecognized tool function: %s\", function_name)\n", + " result = {\"error\": f\"Unknown function '{function_name}'\"}\n", + "\n", + " response = {\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": tool_call.id,\n", + " \"content\": json.dumps(result)\n", + " }\n", + "\n", + " return result, response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72c1a9e7-186c-4218-9edc-01814baec431", + "metadata": {}, + "outputs": [], + "source": [ + "def artist(city: str, style: str = \"vibrant pop-art\", size: str = \"1024x1024\") -> Image.Image:\n", + " \"\"\"\n", + " Generates a city-themed vacation image using DALL·E.\n", + "\n", + " Args:\n", + " city (str): Name of the city to visualize.\n", + " style (str): Artistic style for the image prompt.\n", + " size (str): Image resolution (e.g., \"1024x1024\").\n", + "\n", + " Returns:\n", + " Image.Image: A PIL Image object representing the generated image.\n", + "\n", + " Raises:\n", + " ValueError: If city name is empty.\n", + " RuntimeError: If image generation fails.\n", + " \"\"\"\n", + " if not city.strip():\n", + " raise ValueError(\"City name cannot be empty.\")\n", + "\n", + " prompt = (\n", + " f\"An image representing a vacation in {city}, 
\"\n", + " f\"showing iconic tourist attractions, cultural elements, and everything unique about {city}, \"\n", + " f\"rendered in a {style} style.\"\n", + " )\n", + "\n", + " logging.info(\"Generating image for city: %s with style: %s\", city, style)\n", + "\n", + " try:\n", + " response = openai.images.generate(\n", + " model=\"dall-e-3\",\n", + " prompt=prompt,\n", + " size=size,\n", + " n=1,\n", + " response_format=\"b64_json\",\n", + " )\n", + "\n", + " image_base64 = response.data[0].b64_json\n", + " image_data = base64.b64decode(image_base64)\n", + " logging.info(\"Image generation successful for %s\", city)\n", + "\n", + " return Image.open(BytesIO(image_data))\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Failed to generate image for city '%s': %s\", city, str(e))\n", + " raise RuntimeError(f\"Image generation failed for city '{city}'\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fdf7c091-6c68-4af6-8197-c1456b36cedf", + "metadata": {}, + "outputs": [], + "source": [ + "def talker(message: str, output_filename: str = \"output_audio.mp3\", autoplay: bool = True) -> None:\n", + " \"\"\"\n", + " Converts a text message into speech using OpenAI TTS and plays the audio.\n", + "\n", + " Args:\n", + " message (str): The text to convert to speech.\n", + " output_filename (str): The filename to save the generated audio.\n", + " autoplay (bool): Whether to autoplay the audio in the notebook.\n", + "\n", + " Raises:\n", + " ValueError: If the message is empty.\n", + " RuntimeError: If the audio generation fails.\n", + " \"\"\"\n", + " if not message.strip():\n", + " raise ValueError(\"Message cannot be empty.\")\n", + "\n", + " logging.info(\"Generating speech for message: %s\", message)\n", + "\n", + " try:\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=\"alloy\",\n", + " input=message\n", + " )\n", + "\n", + " with open(output_filename, \"wb\") as f:\n", + " f.write(response.content)\n", + "\n", + " logging.info(\"Audio written to: %s\", output_filename)\n", + "\n", + " if autoplay:\n", + " display(Audio(output_filename, autoplay=True))\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Failed to generate or play audio: %s\", str(e))\n", + " raise RuntimeError(\"Text-to-speech generation failed.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54568b4a-be8d-47a1-b924-03acdafef70e", + "metadata": {}, + "outputs": [], + "source": [ + "def translate(message, language):\n", + " \"\"\"\n", + " Translates the given text into the specified language using OpenAI Chat API.\n", + "\n", + " Args:\n", + " message (str): The text to be translated.\n", + " language (str): Target language for translation (e.g., 'French', 'Japanese').\n", + "\n", + " Returns:\n", + " str: Translated text.\n", + "\n", + " Raises:\n", + " ValueError: If input message or language is empty.\n", + " RuntimeError: If translation fails due to API or other issues.\n", + " \"\"\"\n", + " if not message.strip():\n", + " raise ValueError(\"Input message cannot be empty.\")\n", + " if not language.strip():\n", + " raise ValueError(\"Target language cannot be empty.\")\n", + "\n", + " logging.info(\"Translating to %s: %s\", language, message)\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": f\"You are a translation assistant. 
Translate everything the user says to {language}.\"},\n", + " {\"role\": \"user\", \"content\": message}\n", + " ]\n", + "\n", + " try:\n", + " response = openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages\n", + " )\n", + " translated = response.choices[0].message.content.strip()\n", + " logging.info(\"Translation successful.\")\n", + " return translated\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Translation failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to translate message.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e6cf470-8ea0-43b2-bbcc-53c2432feb0d", + "metadata": {}, + "outputs": [], + "source": [ + "def transcribe_audio(audio_path):\n", + " \"\"\"\n", + " Transcribes an audio file using OpenAI's Whisper model.\n", + "\n", + " Args:\n", + " audio_path (str): Path to the audio file (e.g., .mp3, .wav).\n", + " model (str): OpenAI model for transcription (default: 'whisper-1').\n", + "\n", + " Returns:\n", + " str: Transcribed text from the audio file.\n", + "\n", + " Raises:\n", + " ValueError: If the path is invalid or the file does not exist.\n", + " RuntimeError: If the transcription fails.\n", + " \"\"\"\n", + " if not audio_path or not os.path.exists(audio_path):\n", + " raise ValueError(\"Invalid or missing audio file path.\")\n", + "\n", + " logging.info(\"Transcribing audio file: %s using model: whisper-1\", audio_path)\n", + "\n", + " try:\n", + " with open(audio_path, \"rb\") as f:\n", + " response = openai.audio.transcriptions.create(\n", + " model=\"whisper-1\",\n", + " file=f\n", + " )\n", + " transcript = response.text.strip()\n", + " logging.info(\"Transcription successful.\")\n", + " return transcript\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Transcription failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to transcribe audio.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3489656e-0f08-4d41-94b1-d902c93ca164", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(history: list, language: str, translated_history: list, speaking_language: str) -> tuple:\n", + " \"\"\"\n", + " Handles a chat interaction including tool calls, image generation, translation, and TTS playback.\n", + "\n", + " Args:\n", + " history (list): List of previous conversation messages.\n", + " language (str): Target language for translation and TTS.\n", + "\n", + " Returns:\n", + " tuple: (updated history list, generated image if any, translated response string)\n", + " \"\"\"\n", + " messages = [{\"role\": \"system\", \"content\": system_message}] + history\n", + " image = None\n", + "\n", + " try:\n", + " # Initial assistant response\n", + " response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)\n", + " choice = response.choices[0]\n", + "\n", + " # Handle tool calls if triggered\n", + " if choice.finish_reason == \"tool_calls\":\n", + " message = choice.message\n", + " result, tool_response = handle_tool_call(message)\n", + "\n", + " # Append tool-related messages\n", + " messages.append(message)\n", + " messages.append(tool_response)\n", + " logging.info(\"Tool call result: %s\", result)\n", + "\n", + " # Generate image if a booking was completed\n", + " if message.tool_calls[0].function.name == \"book_ticket\" and \"destination_city\" in result:\n", + " image = artist(result[\"destination_city\"])\n", + "\n", + " # Get final assistant response after tool execution\n", + " response = 
openai.chat.completions.create(model=MODEL, messages=messages)\n", + " choice = response.choices[0]\n", + "\n", + " reply = choice.message.content.strip()\n", + " history.append({\"role\": \"assistant\", \"content\": reply})\n", + "\n", + " # Translate and speak the reply\n", + " translated_reply = translate(reply, language)\n", + " translated_history.append({\"role\": \"assistant\", \"content\": translated_reply})\n", + "\n", + " if speaking_language == \"English\":\n", + " talker(reply)\n", + " else:\n", + " talker(translated_reply)\n", + "\n", + " return history, image, translated_history\n", + "\n", + " except Exception as e:\n", + " logging.error(\"Chat processing failed: %s\", str(e))\n", + " raise RuntimeError(\"Failed to complete chat interaction.\") from e" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f76acc68-726e-457f-88ab-99da75debde5", + "metadata": {}, + "outputs": [], + "source": [ + "force_dark_mode = \"\"\"\n", + "function refresh() {\n", + " const url = new URL(window.location);\n", + " if (url.searchParams.get('__theme') !== 'dark') {\n", + " url.searchParams.set('__theme', 'dark');\n", + " window.location.href = url.href;\n", + " }\n", + "}\n", + "\"\"\"\n", + "\n", + "with gr.Blocks(js=force_dark_mode) as ui:\n", + " with gr.Row():\n", + " gr.Markdown(\"### FlightAI Chat with Translation\")\n", + "\n", + " with gr.Row():\n", + " lang_dropdown = gr.Dropdown(\n", + " choices=[\"Spanish\", \"French\", \"German\", \"Japanese\", \"Hindi\"],\n", + " value=\"Spanish\",\n", + " label=\"Translate To\"\n", + " )\n", + " \n", + " speak_dropdown = gr.Dropdown(\n", + " choices=[\"English\", \"Selected Language\"],\n", + " value=\"English\",\n", + " label=\"Speak out in\"\n", + " )\n", + " \n", + " with gr.Row():\n", + " chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Chat History\")\n", + " translated_chatbot = gr.Chatbot(height=500, type=\"messages\", label=\"Translated Chat\")\n", + " image_output = gr.Image(height=500)\n", + "\n", + " with gr.Row():\n", + " entry = gr.Textbox(label=\"Chat with our AI Assistant:\")\n", + " audio_input = gr.Audio(sources=\"microphone\", type=\"filepath\", label=\"Or speak to the assistant\")\n", + "\n", + " with gr.Row():\n", + " clear = gr.Button(\"Clear\")\n", + "\n", + " def do_entry(message, history, audio, translated_history, language):\n", + " if audio:\n", + " message = transcribe_audio(audio)\n", + "\n", + " if message:\n", + " history += [{\"role\": \"user\", \"content\": message}]\n", + " translated_history += [{\"role\": \"user\", \"content\": translate(message, language)}]\n", + " return \"\", history, None, translated_history\n", + "\n", + " entry.submit(\n", + " do_entry,\n", + " inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n", + " outputs=[entry, chatbot, audio_input, translated_chatbot]\n", + " ).then(\n", + " chat,\n", + " inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n", + " outputs=[chatbot, image_output, translated_chatbot]\n", + " )\n", + "\n", + " audio_input.change(\n", + " do_entry,\n", + " inputs=[entry, chatbot, audio_input, translated_chatbot, lang_dropdown],\n", + " outputs=[entry, chatbot, audio_input, translated_chatbot]\n", + " ).then(\n", + " chat,\n", + " inputs=[chatbot, lang_dropdown, translated_chatbot, speak_dropdown],\n", + " outputs=[chatbot, image_output, translated_chatbot]\n", + " )\n", + "\n", + " clear.click(lambda: [\"\", [], None, [], None], inputs=None, outputs=[entry, chatbot, audio_input, translated_chatbot, 
image_output], queue=False)\n", + "\n", + "ui.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58f97435-fa0d-45f7-b02f-4ac5f4901c53", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb b/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb new file mode 100644 index 0000000..bc4f92a --- /dev/null +++ b/week2/community-contributions/Week2_airline_assistant_Gemini_Amadeus_live_ticket_price.ipynb @@ -0,0 +1,808 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d938fc6c-bcca-4572-b851-75370fe21c67", + "metadata": {}, + "source": [ + "# Airline Assistant using Gemini API for Image and Audio as well - Live ticket prices using Amadeus API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5eda470-07ee-4d01-bada-3390050ac9c2", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import random\n", + "import string\n", + "import base64\n", + "import gradio as gr\n", + "import pyaudio\n", + "import requests\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "from dotenv import load_dotenv\n", + "from google import genai\n", + "from google.genai import types" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09aaf3b0-beb7-4b64-98a4-da16fc83dadb", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"GOOGLE_API_KEY\")\n", + "\n", + "if not api_key:\n", + " print(\"API Key not found!\")\n", + "else:\n", + " print(\"API Key loaded in memory\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35881fb9-4d51-43dc-a5e6-d9517e22019a", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_GEMINI = 'gemini-2.5-flash'\n", + "MODEL_GEMINI_IMAGE = 'gemini-2.0-flash-preview-image-generation'\n", + "MODEL_GEMINI_SPEECH = 'gemini-2.5-flash-preview-tts'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5ed391c-8a67-4465-9c66-e915548a0d6a", + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " client = genai.Client(api_key=api_key)\n", + " print(\"Google GenAI Client initialized successfully!\")\n", + "except Exception as e:\n", + " print(f\"Error initializing GenAI Client: {e}\")\n", + " print(\"Ensure your GOOGLE_API_KEY is correctly set as an environment variable.\")\n", + " exit() " + ] + }, + { + "cell_type": "markdown", + "id": "407ad581-9580-4dba-b236-abb6c6788933", + "metadata": {}, + "source": [ + "## Image Generation " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a21921f8-57b1-4665-8999-7f2a40645b59", + "metadata": {}, + "outputs": [], + "source": [ + "def fetch_image(city):\n", + " prompt = (\n", + " f\"A high-quality, photo-realistic image of a vacation in {city}, \"\n", + " f\"showing iconic landmarks, cultural attractions, authentic street life, and local cuisine. 
\"\n", + " f\"Capture natural lighting, real people enjoying travel experiences, and the unique vibe of {city}'s atmosphere. \"\n", + " f\"The composition should feel immersive, warm, and visually rich, as if taken by a travel photographer.\"\n", + ")\n", + "\n", + " response = client.models.generate_content(\n", + " model = MODEL_GEMINI_IMAGE,\n", + " contents = prompt,\n", + " config=types.GenerateContentConfig(\n", + " response_modalities=['TEXT', 'IMAGE']\n", + " )\n", + " )\n", + "\n", + " for part in response.candidates[0].content.parts:\n", + " if part.inline_data is not None:\n", + " image_data = BytesIO(part.inline_data.data)\n", + " return Image.open(image_data)\n", + "\n", + " raise ValueError(\"No image found in Gemini response.\")\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcd4aed1-8b4d-4771-ba32-e729e82bab54", + "metadata": {}, + "outputs": [], + "source": [ + "fetch_image(\"london\")" + ] + }, + { + "cell_type": "markdown", + "id": "5f6baee6-e2e2-4cc4-941d-34a4c72cee67", + "metadata": {}, + "source": [ + "## Speech Generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "825dfedc-0271-4191-a3d1-50872af4c8cf", + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "Kore -- Firm\n", + "Puck -- Upbeat\n", + "Leda -- Youthful\n", + "Iapetus -- Clear\n", + "Erinome -- Clear\n", + "Sadachbia -- Lively\n", + "Sulafat -- Warm\n", + "Despina -- Smooth\n", + "\"\"\"\n", + "\n", + "def talk(message:str, voice_name:str=\"Leda\", mood:str=\"cheerfully\"):\n", + " prompt = f\"Say {mood}: {message}\"\n", + " response = client.models.generate_content(\n", + " model = MODEL_GEMINI_SPEECH,\n", + " contents = prompt,\n", + " config=types.GenerateContentConfig(\n", + " response_modalities=[\"AUDIO\"],\n", + " speech_config=types.SpeechConfig(\n", + " voice_config=types.VoiceConfig(\n", + " prebuilt_voice_config=types.PrebuiltVoiceConfig(\n", + " voice_name=voice_name,\n", + " )\n", + " )\n", + " ), \n", + " )\n", + " )\n", + "\n", + " # Fetch the audio bytes\n", + " pcm_data = response.candidates[0].content.parts[0].inline_data.data\n", + " # Play the audio using PyAudio\n", + " p = pyaudio.PyAudio()\n", + " stream = p.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)\n", + " stream.write(pcm_data)\n", + " stream.stop_stream()\n", + " stream.close()\n", + " p.terminate()\n", + "\n", + " # Play using simpleaudio (16-bit PCM, mono, 24kHz)\n", + " # play_obj = sa.play_buffer(pcm_data, num_channels=1, bytes_per_sample=2, sample_rate=24000)\n", + " # play_obj.wait_done() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54967ebc-24a6-4bb2-9a19-20c3585f1d77", + "metadata": {}, + "outputs": [], + "source": [ + "talk(\"Hi, How are you? 
Welcome to FlyJumbo Airlines\",\"Kore\",\"helpful\")" + ] + }, + { + "cell_type": "markdown", + "id": "be9dc275-838e-4c54-b487-41d094dad96b", + "metadata": {}, + "source": [ + "## Ticket Price Tool Function - Using Amadeus API " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8613a080-d82c-4c1a-8db4-377614997ac2", + "metadata": {}, + "outputs": [], + "source": [ + "client_id = os.getenv(\"AMADEUS_CLIENT_ID\")\n", + "client_secret = os.getenv(\"AMADEUS_CLIENT_SECRET\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bf78f61-0de1-4552-a1d4-1a28380be6a5", + "metadata": {}, + "outputs": [], + "source": [ + "# Get the token first\n", + "def get_amadeus_token():\n", + " url = \"https://test.api.amadeus.com/v1/security/oauth2/token\"\n", + " headers = {\"Content-Type\": \"application/x-www-form-urlencoded\"}\n", + " data = {\n", + " \"grant_type\": \"client_credentials\",\n", + " \"client_id\": client_id,\n", + " \"client_secret\": client_secret,\n", + " }\n", + " \n", + " try:\n", + " response = requests.post(url, headers=headers, data=data, timeout=10)\n", + " response.raise_for_status()\n", + " return response.json()[\"access_token\"]\n", + " \n", + " except requests.exceptions.HTTPError as e:\n", + " print(f\"HTTP Error {response.status_code}: {response.text}\")\n", + " \n", + " except requests.exceptions.RequestException as e:\n", + " print(\"Network or connection error:\", e)\n", + " \n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c5261f6-6662-4e9d-8ff0-8e10171bb963", + "metadata": {}, + "outputs": [], + "source": [ + "def get_airline_name(code, token):\n", + " url = f\"https://test.api.amadeus.com/v1/reference-data/airlines\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + " params = {\"airlineCodes\": code}\n", + "\n", + " response = requests.get(url, headers=headers, params=params)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " if \"data\" in data and data[\"data\"]:\n", + " return data[\"data\"][0].get(\"businessName\", code)\n", + " return code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42a55f06-880a-4c49-8560-2e7b97953c1a", + "metadata": {}, + "outputs": [], + "source": [ + "COMMON_CITY_CODES = {\n", + " \"delhi\": \"DEL\",\n", + " \"mumbai\": \"BOM\",\n", + " \"chennai\": \"MAA\",\n", + " \"kolkata\": \"CCU\",\n", + " \"bengaluru\": \"BLR\",\n", + " \"hyderabad\": \"HYD\",\n", + " \"patna\": \"PAT\",\n", + " \"raipur\": \"RPR\",\n", + " \"panaji\": \"GOI\",\n", + " \"chandigarh\": \"IXC\",\n", + " \"srinagar\": \"SXR\",\n", + " \"ranchi\": \"IXR\",\n", + " \"bengaluru\": \"BLR\",\n", + " \"thiruvananthapuram\": \"TRV\",\n", + " \"bhopal\": \"BHO\",\n", + " \"mumbai\": \"BOM\",\n", + " \"imphal\": \"IMF\",\n", + " \"aizawl\": \"AJL\",\n", + " \"bhubaneswar\": \"BBI\",\n", + " \"jaipur\": \"JAI\",\n", + " \"chennai\": \"MAA\",\n", + " \"hyderabad\": \"HYD\",\n", + " \"agartala\": \"IXA\",\n", + " \"lucknow\": \"LKO\",\n", + " \"dehradun\": \"DED\",\n", + " \"kolkata\": \"CCU\",\n", + "\n", + " # Union territories\n", + " \"port blair\": \"IXZ\",\n", + " \"leh\": \"IXL\",\n", + " \"puducherry\": \"PNY\",\n", + "\n", + " # Major metro cities (for redundancy)\n", + " \"ahmedabad\": \"AMD\",\n", + " \"surat\": \"STV\",\n", + " \"coimbatore\": \"CJB\",\n", + " \"vizag\": \"VTZ\",\n", + " \"vijayawada\": \"VGA\",\n", + " \"nagpur\": \"NAG\",\n", + " \"indore\": \"IDR\",\n", + " \"kanpur\": \"KNU\",\n", + " 
\"varanasi\": \"VNS\"\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b061ec2c-609b-4d77-bd41-c9bc5bf901f4", + "metadata": {}, + "outputs": [], + "source": [ + "city_code_cache = {}\n", + "\n", + "def get_city_code(city_name, token):\n", + " city_name = city_name.strip().lower()\n", + "\n", + " if city_name in city_code_cache:\n", + " return city_code_cache[city_name]\n", + "\n", + " if city_name in COMMON_CITY_CODES:\n", + " return COMMON_CITY_CODES[city_name]\n", + "\n", + " base_url = \"https://test.api.amadeus.com/v1/reference-data/locations\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + "\n", + " for subtype in [\"CITY\", \"AIRPORT,CITY\"]:\n", + " params = {\"keyword\": city_name, \"subType\": subtype}\n", + " try:\n", + " response = requests.get(base_url, headers=headers, params=params, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " if \"data\" in data and data[\"data\"]:\n", + " code = data[\"data\"][0][\"iataCode\"]\n", + " print(f\"[INFO] Found {subtype} match for '{city_name}': {code}\")\n", + " city_code_cache[city_name] = code\n", + " return code\n", + " except Exception as e:\n", + " print(f\"[ERROR] Location lookup failed for {subtype}: {e}\")\n", + "\n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9816a9c-fd70-4dfc-a3c0-4d8709997371", + "metadata": {}, + "outputs": [], + "source": [ + "# Getting live ticket price \n", + "\n", + "def get_live_ticket_prices(origin, destination, departure_date, return_date=None):\n", + " token = get_amadeus_token()\n", + "\n", + " url = \"https://test.api.amadeus.com/v2/shopping/flight-offers\"\n", + " headers = {\"Authorization\": f\"Bearer {token}\"}\n", + "\n", + " origin_code = get_city_code(origin,token)\n", + " destination_code = get_city_code(destination,token)\n", + "\n", + " if not origin_code:\n", + " return f\"Sorry, I couldn't find the airport code for the city '{origin}'.\"\n", + " if not destination_code:\n", + " return f\"Sorry, I couldn't find the airport code for the city '{destination}'.\"\n", + "\n", + " params = {\n", + " \"originLocationCode\": origin_code.upper(),\n", + " \"destinationLocationCode\": destination_code.upper(),\n", + " \"departureDate\": departure_date,\n", + " \"adults\": 1,\n", + " \"currencyCode\": \"USD\",\n", + " \"max\": 1,\n", + " }\n", + "\n", + " if return_date:\n", + " params[\"returnDate\"] = return_date\n", + "\n", + " try:\n", + " response = requests.get(url, headers=headers, params=params, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + " \n", + " if \"data\" in data and data[\"data\"]:\n", + " offer = data[\"data\"][0]\n", + " price = offer[\"price\"][\"total\"]\n", + " airline_codes = offer.get(\"validatingAirlineCodes\", [])\n", + " airline_code = airline_codes[0] if airline_codes else \"Unknown\"\n", + "\n", + " try:\n", + " airline_name = get_airline_name(airline_code, token) if airline_code != \"Unknown\" else \"Unknown Airline\"\n", + " if not airline_name: \n", + " airline_name = airline_code\n", + " except Exception:\n", + " airline_name = airline_code\n", + " \n", + " \n", + " if return_date:\n", + " return (\n", + " f\"Round-trip flight from {origin.capitalize()} to {destination.capitalize()}:\\n\"\n", + " f\"- Departing: {departure_date}\\n\"\n", + " f\"- Returning: {return_date}\\n\"\n", + " f\"- Airline: {airline_name}\\n\"\n", + " f\"- Price: ${price}\"\n", + " )\n", + " else:\n", + " return 
(\n", + " f\"One-way flight from {origin.capitalize()} to {destination.capitalize()} on {departure_date}:\\n\"\n", + " f\"- Airline: {airline_name}\\n\"\n", + " f\"- Price: ${price}\"\n", + " )\n", + " else:\n", + " return f\"No flights found from {origin.capitalize()} to {destination.capitalize()} on {departure_date}.\"\n", + " except requests.exceptions.RequestException as e:\n", + " return f\"❌ Error fetching flight data: {str(e)}\" \n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bc7657e-e8b5-4647-9745-d7d403feb09a", + "metadata": {}, + "outputs": [], + "source": [ + "get_live_ticket_prices(\"london\", \"chennai\", \"2025-07-01\",\"2025-07-10\")" + ] + }, + { + "cell_type": "markdown", + "id": "e1153b94-90e7-4856-8c85-e456305a7817", + "metadata": {}, + "source": [ + "## Ticket Booking Tool Function - DUMMY" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5dfc3b12-0a16-4861-a549-594f175ff956", + "metadata": {}, + "outputs": [], + "source": [ + "def book_flight(origin, destination, departure_date, return_date=None, airline=\"Selected Airline\", passenger_name=\"Guest\"):\n", + " # Generate a dummy ticket reference (PNR)\n", + " ticket_ref = ''.join(random.choices(string.ascii_uppercase + string.digits, k=6))\n", + "\n", + " # Build confirmation message\n", + " confirmation = (\n", + " f\"🎫 Booking confirmed for {passenger_name}!\\n\"\n", + " f\"From: {origin.capitalize()} → To: {destination.capitalize()}\\n\"\n", + " f\"Departure: {departure_date}\"\n", + " )\n", + "\n", + " if return_date:\n", + " confirmation += f\"\\nReturn: {return_date}\"\n", + "\n", + " confirmation += (\n", + " f\"\\nAirline: {airline}\\n\"\n", + " f\"PNR: {ticket_ref}\\n\"\n", + " f\"✅ Your ticket has been booked successfully. Safe travels!\"\n", + " )\n", + "\n", + " return confirmation\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "122f655b-b7a4-45c6-aaec-afd2917a051b", + "metadata": {}, + "outputs": [], + "source": [ + "print(book_flight(\"chennai\", \"delhi\", \"2025-07-01\", \"2025-07-10\", \"Air India\", \"Ravi Kumar\"))" + ] + }, + { + "cell_type": "markdown", + "id": "e83d8e90-ae22-4728-83e5-d83fed7f2049", + "metadata": {}, + "source": [ + "## Gemini Chat Workings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a656f4e-914d-4f5e-b7fa-48457935181a", + "metadata": {}, + "outputs": [], + "source": [ + "ticket_price_function_declaration = {\n", + " \"name\":\"get_live_ticket_prices\",\n", + " \"description\": \"Get live flight ticket prices between two cities for a given date (round-trip or one-way).\\\n", + " The destination may be a city or country (e.g., 'China'). Call this function whenever a customer asks about ticket prices., such as 'How much is a ticket to Paris?'\",\n", + " \"parameters\":{\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"origin\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the origin city. Example: 'Delhi'\",\n", + " },\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\":\"Name of the destination city. Example: 'London'\",\n", + " },\n", + " \"departure_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Date of departure in YYYY-MM-DD format. Example: '2025-07-01'\",\n", + " },\n", + " \"return_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. 
Leave blank for one-way trips.\",\n", + " },\n", + " },\n", + " \"required\": [\"origin\", \"destination\", \"departure_date\"],\n", + " }\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05a835ab-a675-40ed-9cd8-65f4c6b22722", + "metadata": {}, + "outputs": [], + "source": [ + "book_flight_function_declaration = {\n", + " \"name\": \"book_flight\",\n", + " \"description\": \"Book a flight for the user after showing the ticket details and confirming the booking. \"\n", + " \"Call this function when the user says things like 'yes', 'book it', or 'I want to book this flight'.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"origin\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the origin city. Example: 'Chennai'\",\n", + " },\n", + " \"destination\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Name of the destination city. Example: 'London'\",\n", + " },\n", + " \"departure_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Date of departure in YYYY-MM-DD format. Example: '2025-07-01'\",\n", + " },\n", + " \"return_date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Optional return date for round-trip in YYYY-MM-DD format. Leave blank for one-way trips.\",\n", + " },\n", + " \"airline\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Airline name or code that the user wants to book with. Example: 'Air India'\",\n", + " },\n", + " \"passenger_name\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Full name of the passenger for the booking. Example: 'Ravi Kumar'\",\n", + " }\n", + " },\n", + " \"required\": [\"origin\", \"destination\", \"departure_date\", \"passenger_name\"],\n", + " }\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad0231cd-040f-416d-b150-0d8f90535718", + "metadata": {}, + "outputs": [], + "source": [ + "# System Definitions\n", + "\n", + "system_instruction_prompt = (\n", + " \"You are a helpful and courteous AI assistant for an airline company called FlyJumbo. \"\n", + " \"When a user starts a new conversation, greet them with: 'Hi there, welcome to FlyJumbo! How can I help you?'. \"\n", + " \"Do not repeat this greeting in follow-up messages. \"\n", + " \"Use the available tools if a user asks about ticket prices. \"\n", + " \"Ask follow-up questions to gather all necessary information before calling a function.\"\n", + " \"After calling a tool, always continue the conversation by summarizing the result and asking the user the next relevant question (e.g., if they want to proceed with a booking).\"\n", + " \"If you do not know the answer and no tool can help, respond politely that you are unable to help with the request. 
\"\n", + " \"Answer concisely in one sentence.\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff0b3de8-5674-4f08-9f9f-06f88ff959a1", + "metadata": {}, + "outputs": [], + "source": [ + "tools = types.Tool(function_declarations=[ticket_price_function_declaration,book_flight_function_declaration])\n", + "generate_content_config = types.GenerateContentConfig(system_instruction=system_instruction_prompt, tools=[tools])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00a56779-16eb-4f31-9941-2eb01d17ed87", + "metadata": {}, + "outputs": [], + "source": [ + "def handle_tool_call(function_call):\n", + " print(f\"🔧 Function Called - {function_call.name}\")\n", + " function_name = function_call.name\n", + " args = function_call.args\n", + "\n", + " if function_name == \"get_live_ticket_prices\":\n", + " origin = args.get(\"origin\")\n", + " destination = args.get(\"destination\")\n", + " departure_date = args.get(\"departure_date\")\n", + " return_date = args.get(\"return_date\") or None\n", + "\n", + " return get_live_ticket_prices(origin, destination, departure_date, return_date)\n", + "\n", + " elif function_name == \"book_flight\":\n", + " origin = args.get(\"origin\")\n", + " destination = args.get(\"destination\")\n", + " departure_date = args.get(\"departure_date\")\n", + " return_date = args.get(\"return_date\") or None\n", + " airline = args.get(\"airline\", \"Selected Airline\")\n", + " passenger_name = args.get(\"passenger_name\", \"Guest\")\n", + "\n", + " return book_flight(origin, destination, departure_date, return_date, airline, passenger_name)\n", + "\n", + " else:\n", + " return f\"❌ Unknown function: {function_name}\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d0c334d2-9ab0-4f80-ac8c-c66897e0bd7c", + "metadata": {}, + "outputs": [], + "source": [ + "def chat(message, history):\n", + " full_message_history = []\n", + " city_name = None\n", + "\n", + " # Convert previous history to Gemini-compatible format\n", + " for h in history:\n", + " if h[\"role\"] == \"user\":\n", + " full_message_history.append(\n", + " types.Content(role=\"user\", parts=[types.Part.from_text(text=h[\"content\"])])\n", + " )\n", + " elif h[\"role\"] == \"assistant\":\n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=[types.Part.from_text(text=h[\"content\"])])\n", + " )\n", + "\n", + " # Add current user message\n", + " full_message_history.append(\n", + " types.Content(role=\"user\", parts=[types.Part.from_text(text=message)])\n", + " )\n", + "\n", + " # Send to Gemini with tool config\n", + " response = client.models.generate_content(\n", + " model=MODEL_GEMINI,\n", + " contents=full_message_history,\n", + " config=generate_content_config\n", + " )\n", + "\n", + " candidate = response.candidates[0]\n", + " part = candidate.content.parts[0]\n", + " function_call = getattr(part, \"function_call\", None)\n", + "\n", + " # Case: Tool call required\n", + " if function_call:\n", + " # Append model message that triggered tool call\n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=candidate.content.parts)\n", + " )\n", + "\n", + " # Execute the tool\n", + " tool_output = handle_tool_call(function_call)\n", + "\n", + " # Wrap and append tool output\n", + " tool_response_part = types.Part.from_function_response(\n", + " name=function_call.name,\n", + " response={\"result\": tool_output}\n", + " )\n", + " \n", + " full_message_history.append(\n", + " 
types.Content(role=\"function\", parts=[tool_response_part])\n", + " )\n", + "\n", + "\n", + " if function_call.name == \"book_flight\":\n", + " city_name = function_call.args.get(\"destination\").lower()\n", + " \n", + "\n", + " # Send follow-up message including tool result\n", + " followup_response = client.models.generate_content(\n", + " model=MODEL_GEMINI,\n", + " contents=full_message_history,\n", + " config=generate_content_config\n", + " )\n", + "\n", + " final_text = followup_response.text\n", + " \n", + " full_message_history.append(\n", + " types.Content(role=\"model\", parts=[types.Part.from_text(text=final_text)])\n", + " )\n", + "\n", + " return final_text,city_name, history + [{\"role\": \"assistant\", \"content\": final_text}]\n", + " else:\n", + " text = response.text\n", + " return text, city_name, history + [{\"role\": \"assistant\", \"content\": text}]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b245e6c-ef0b-4edf-b178-f14f2a75f285", + "metadata": {}, + "outputs": [], + "source": [ + "def user_submit(user_input, history):\n", + " history = history or []\n", + " history.append({\"role\": \"user\", \"content\": user_input})\n", + " \n", + " response_text, city_to_image, updated_history = chat(user_input, history)\n", + "\n", + " # Speak the response\n", + " try:\n", + " talk(response_text)\n", + " except Exception as e:\n", + " print(\"[Speech Error] Speech skipped due to quota limit.\")\n", + "\n", + " image = fetch_image(city_to_image) if city_to_image else None\n", + "\n", + " return \"\", updated_history, image, updated_history\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7db25b86-9a71-417c-98f0-790e3f3531bf", + "metadata": {}, + "outputs": [], + "source": [ + "with gr.Blocks() as demo:\n", + " gr.Markdown(\"## ✈️ FlyJumbo Airline Assistant\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=3):\n", + " chatbot = gr.Chatbot(label=\"Assistant\", height=500, type=\"messages\")\n", + " msg = gr.Textbox(placeholder=\"Ask about flights...\", show_label=False)\n", + " send_btn = gr.Button(\"Send\")\n", + "\n", + " with gr.Column(scale=2):\n", + " image_output = gr.Image(label=\"Trip Visual\", visible=True, height=500)\n", + "\n", + " state = gr.State([])\n", + " \n", + " send_btn.click(fn=user_submit, inputs=[msg, state], outputs=[msg, chatbot, image_output, state])\n", + " msg.submit(fn=user_submit, inputs=[msg, state], outputs=[msg, chatbot, image_output, state])\n", + "\n", + "demo.launch(inbrowser=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef31bf62-9034-4fa7-b803-8f5df5309b77", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/agent_conversation_shakespeare.ipynb b/week2/community-contributions/agent_conversation_shakespeare.ipynb new file mode 100644 index 0000000..6d55283 --- /dev/null +++ b/week2/community-contributions/agent_conversation_shakespeare.ipynb @@ -0,0 +1,351 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "06cf3063-9f3e-4551-a0d5-f08d9cabb927", + "metadata": {}, + 
"source": [ + "# Triangular agent conversation\n", + "\n", + "## GPT (Hamlet), LLM (Falstaff), Gemini (Iago):" + ] + }, + { + "cell_type": "markdown", + "id": "3637910d-2c6f-4f19-b1fb-2f916d23f9ac", + "metadata": {}, + "source": [ + "### Created a 3-way, bringing Gemini into the coversation.\n", + "### Replacing one of the models with an open source model running with Ollama." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8e0c1bd-a159-475b-9cdc-e219a7633355", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "from IPython.display import Markdown, display, update_display\n", + "import ollama" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3ad57ad-46a8-460e-9cb3-67a890093536", + "metadata": {}, + "outputs": [], + "source": [ + "import google.generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f531c14-5743-4a5b-83d9-cb5863ca2ddf", + "metadata": {}, + "outputs": [], + "source": [ + "# Load environment variables in a file called .env\n", + "# Print the key prefixes to help with any debugging\n", + "\n", + "load_dotenv(override=True)\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "google_api_key = os.getenv('GOOGLE_API_KEY')\n", + "\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + "\n", + "if google_api_key:\n", + " print(f\"Google API Key exists and begins {google_api_key[:8]}\")\n", + "else:\n", + " print(\"Google API Key not set\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d5150ee-3858-4921-bce6-2eecfb96bc75", + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to OpenAI\n", + "\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11381fd8-5099-41e8-a1d7-6787dea56e43", + "metadata": {}, + "outputs": [], + "source": [ + "google.generativeai.configure()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1766d20-54b6-4f76-96c5-c338ae7073c9", + "metadata": {}, + "outputs": [], + "source": [ + "gpt_model = \"gpt-4o-mini\"\n", + "llama_model = \"llama3.2\"\n", + "gemini_model = 'gemini-2.0-flash'\n", + "\n", + "gpt_system = \"You are playing part of Hamlet. he is philosopher, probes Iago with a mixture of suspicion\\\n", + "and intellectual curiosity, seeking to unearth the origins of his deceit.\\\n", + "Is malice born of scorn, envy, or some deeper void? Hamlet’s introspective nature\\\n", + "drives him to question whether Iago’s actions reveal a truth about humanity itself.\\\n", + "You will respond as Shakespear's Hamlet will do.\"\n", + "\n", + "llama_system = \"You are acting part of Falstaff who attempts to lighten the mood with his jokes and observations,\\\n", + "potentially clashing with Hamlet's melancholic nature.You respond as Shakespear's Falstaff do.\"\n", + "\n", + "gemini_system = \"You are acting part of Iago, subtly trying to manipulate both Hamlet and Falstaff\\\n", + "to his own advantage, testing their weaknesses and exploiting their flaws. 
You respond like Iago\"\n",
+    "\n",
+    "gpt_messages = [\"Hi there\"]\n",
+    "llama_messages = [\"Hi\"]\n",
+    "gemini_messages = [\"Hello\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "806a0506-dac8-4bad-ac08-31f350256b58",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def call_gpt():\n",
+    "    messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
+    "    for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
+    "        messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
+    "        messages.append({\"role\": \"user\", \"content\": llama})\n",
+    "        messages.append({\"role\": \"user\", \"content\": gemini})\n",
+    "    completion = openai.chat.completions.create(\n",
+    "        model=gpt_model,\n",
+    "        messages=messages\n",
+    "    )\n",
+    "    return completion.choices[0].message.content"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "43674885-ede7-48bf-bee4-467454f3e96a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def call_llama():\n",
+    "    messages = []\n",
+    "    for gpt, llama, gemini in zip(gpt_messages, llama_messages, gemini_messages):\n",
+    "        messages.append({\"role\": \"user\", \"content\": gpt})\n",
+    "        messages.append({\"role\": \"assistant\", \"content\": llama})\n",
+    "        messages.append({\"role\": \"user\", \"content\": gemini})\n",
+    "    messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
+    "    response = ollama.chat(model=llama_model, messages=messages)\n",
+    "\n",
+    "    return response['message']['content']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03d34769-b339-4c4b-8c60-69494c39d725",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#import google.generativeai as genai\n",
+    "\n",
+    "# Make sure you configure the API key first:\n",
+    "#genai.configure(api_key=\"YOUR_API_KEY\")\n",
+    "\n",
+    "def call_gemini():\n",
+    "    # Build the Gemini-format history in a local list. Do not name it\n",
+    "    # gemini_messages: that would shadow (and replace) the global history,\n",
+    "    # so the zip below would iterate over an empty list.\n",
+    "    gemini_history = []\n",
+    "\n",
+    "    # Format the history for Gemini\n",
+    "    for gpt, llama, gemini_message in zip(gpt_messages, llama_messages, gemini_messages):\n",
+    "        gemini_history.append({\"role\": \"user\", \"parts\": [gpt]})  # Hamlet speaks\n",
+    "        gemini_history.append({\"role\": \"model\", \"parts\": [llama]})  # Falstaff responds\n",
+    "        gemini_history.append({\"role\": \"model\", \"parts\": [gemini_message]})  # Iago responds\n",
+    "\n",
+    "    # Add latest user input if needed (optional)\n",
+    "    gemini_history.append({\"role\": \"user\", \"parts\": [llama_messages[-1]]})\n",
+    "\n",
+    "    # Initialize the model with the correct system instruction\n",
+    "    gemini = google.generativeai.GenerativeModel(\n",
+    "        #model_name='gemini-1.5-flash',  # Or 'gemini-pro'\n",
+    "        model_name = gemini_model,\n",
+    "        system_instruction=gemini_system\n",
+    "    )\n",
+    "\n",
+    "    response = gemini.generate_content(gemini_history)\n",
+    "    return response.text"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "93fc8253-67cb-4ea4-aff7-097b2a222793",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gpt_messages = [\"Hi there\"]\n",
+    "llama_messages = [\"Hi\"]\n",
+    "gemini_messages = [\"Hello\"]\n",
+    "\n",
+    "print(f\"Hamlet:\\n{gpt_messages[0]}\\n\")\n",
+    "print(f\"Falstaff:\\n{llama_messages[0]}\\n\")\n",
+    "print(f\"Iago:\\n{gemini_messages[0]}\\n\")\n",
+    "\n",
+    "for i in range(3):\n",
+    "    gpt_next = call_gpt()\n",
+    "    print(f\"GPT:\\n{gpt_next}\\n\")\n",
+    "    gpt_messages.append(gpt_next)\n",
+    "    \n",
+    "    llama_next = call_llama()\n",
+    "    print(f\"Llama:\\n{llama_next}\\n\")\n",
+    "    
llama_messages.append(llama_next)\n",
+    "\n",
+    "    gemini_next = call_gemini()\n",
+    "    print(f\"Gemini:\\n{gemini_next}\\n\")\n",
+    "    gemini_messages.append(gemini_next)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bca66ffc-9dc1-4384-880c-210889f5d0ac",
+   "metadata": {},
+   "source": [
+    "## Conversation between gpt-4o-mini and llama3.2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c23224f6-7008-44ed-a57f-718975f4e291",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's make a conversation between GPT-4o-mini and Llama 3.2 running locally via Ollama\n",
+    "# We're using cheap versions of models so the costs will be minimal\n",
+    "\n",
+    "gpt_model = \"gpt-4o-mini\"\n",
+    "llama_model = \"llama3.2\"\n",
+    "\n",
+    "gpt_system = \"You are a tapori from Mumbai who is very optimistic; \\\n",
+    "you always look at the brighter side of the situation and are always ready to act to find a way to win.\"\n",
+    "\n",
+    "llama_system = \"You are a Jaat from Haryana. You try to express yourself with Hindi poems \\\n",
+    "to agree with the other person or find common ground. If the other person is optimistic, \\\n",
+    "you respond in a poetic way and keep chatting.\"\n",
+    "\n",
+    "gpt_messages = [\"Hi there\"]\n",
+    "llama_messages = [\"Hi\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2d704bbb-f22b-400d-a695-efbd02b26548",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def call_gpt():\n",
+    "    messages = [{\"role\": \"system\", \"content\": gpt_system}]\n",
+    "    for gpt, llama in zip(gpt_messages, llama_messages):\n",
+    "        messages.append({\"role\": \"assistant\", \"content\": gpt})\n",
+    "        messages.append({\"role\": \"user\", \"content\": llama})\n",
+    "    completion = openai.chat.completions.create(\n",
+    "        model=gpt_model,\n",
+    "        messages=messages\n",
+    "    )\n",
+    "    return completion.choices[0].message.content"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "385ccec8-de59-4e42-9616-3f5c9a05589c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def call_llama():\n",
+    "    messages = []\n",
+    "    for gpt, llama_message in zip(gpt_messages, llama_messages):\n",
+    "        messages.append({\"role\": \"user\", \"content\": gpt})\n",
+    "        messages.append({\"role\": \"assistant\", \"content\": llama_message})\n",
+    "    messages.append({\"role\": \"user\", \"content\": gpt_messages[-1]})\n",
+    "    response = ollama.chat(model=llama_model, messages=messages)\n",
+    "\n",
+    "    return response['message']['content']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "70b5481b-455e-4275-80d3-0afe0fabcb0f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gpt_messages = [\"Hi there\"]\n",
+    "llama_messages = [\"Hi\"]\n",
+    "\n",
+    "print(f\"GPT:\\n{gpt_messages[0]}\\n\")\n",
+    "print(f\"Llama:\\n{llama_messages[0]}\\n\")\n",
+    "\n",
+    "for i in range(3):\n",
+    "    gpt_next = call_gpt()\n",
+    "    print(f\"GPT:\\n{gpt_next}\\n\")\n",
+    "    gpt_messages.append(gpt_next)\n",
+    "    \n",
+    "    llama_next = call_llama()\n",
+    "    print(f\"Llama:\\n{llama_next}\\n\")\n",
+    "    llama_messages.append(llama_next)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f8d734b-57e5-427d-bcb1-7956fc58a348",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "llmenv",
+   "language": "python",
+   "name": "llmenv"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week2/community-contributions/anatomy_poster_generator/README.md b/week2/community-contributions/anatomy_poster_generator/README.md new file mode 100644 index 0000000..cd82535 --- /dev/null +++ b/week2/community-contributions/anatomy_poster_generator/README.md @@ -0,0 +1,10 @@ +# Anatomy Poster Generator + +This tool generates AI-powered wall art of human anatomy, designed to support meaningful conversations in clinical spaces. + +Built with: +- DALL·E 3 for image generation +- Python + Gradio for a simple UI +- Hugging Face Spaces for easy sharing (https://huggingface.co/spaces/sukihealth/wallanatomypostergenerator) + +See full repo: [github.com/sukihealth/retro-pop-art-anatomy](https://github.com/sukihealth/retro-pop-art-anatomy) diff --git a/week2/community-contributions/clinic_booking_bot.ipynb b/week2/community-contributions/clinic_booking_bot.ipynb new file mode 100644 index 0000000..d2d8b57 --- /dev/null +++ b/week2/community-contributions/clinic_booking_bot.ipynb @@ -0,0 +1,344 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 170, + "id": "a1aa1b43-7a47-4aca-ae5f-94a9d4ba2d89", + "metadata": {}, + "outputs": [], + "source": [ + "## Clinic Booking Bot\n", + "\n", + "##Easily book your clinic visit – available only on weekdays between **14:00 and 15:00**. \n", + "##Speak or type, and get instant confirmation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "id": "fe798c6a-f8da-46aa-8c0e-9d2623def3d2", + "metadata": {}, + "outputs": [], + "source": [ + "# import library\n", + "\n", + "import os\n", + "import json\n", + "from dotenv import load_dotenv\n", + "from openai import OpenAI\n", + "import gradio as gr\n", + "import base64\n", + "from io import BytesIO\n", + "from datetime import date\n", + "from PIL import Image, ImageDraw, ImageFont\n" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "id": "0ad4e526-e95d-4e70-9faa-b4236b105dd5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key exists and begins sk-proj-\n" + ] + } + ], + "source": [ + "# Save keys\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "openai_api_key = os.getenv('OPENAI_API_KEY')\n", + "if openai_api_key:\n", + " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", + "else:\n", + " print(\"OpenAI API Key not set\")\n", + " \n", + "MODEL = \"gpt-4o-mini\"\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "id": "ae95308e-0002-4017-9f2c-fcb1ddb248fa", + "metadata": {}, + "outputs": [], + "source": [ + "# --- CONFIG ---\n", + "BOOKING_START = 14\n", + "BOOKING_END = 15\n", + "WEEKDAYS = [\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\", \"Friday\"]\n", + "PHONE = \"010-1234567\"\n", + "confirmed_bookings = []\n" + ] + }, + { + "cell_type": "code", + "execution_count": 174, + "id": "e21b0fd0-4cda-4938-8867-dc2c6e7af4b1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- TTS ---\n", + "def generate_tts(text, voice=\"fable\", filename=\"output.mp3\"):\n", + " response = openai.audio.speech.create(\n", + " model=\"tts-1\",\n", + " voice=\"fable\",\n", + " input=text\n", + " )\n", + " with open(filename, \"wb\") as f:\n", + " f.write(response.content)\n", + " return filename" + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "id": "e28a5c3b-bd01-4845-a41e-87823f6bb078", + 
"metadata": {}, + "outputs": [], + "source": [ + "# --- Translate Booking Confirmation ---\n", + "def translate_text(text, target_language=\"nl\"):\n", + " prompt = f\"Translate this message to {target_language}:\\n{text}\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful translator.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " return response.choices[0].message.content.strip()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "id": "8ed57cc9-7d54-4a5d-831b-0efcc5b7a7a9", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Logic ---\n", + "def book_appointment(name, time_str):\n", + " try:\n", + " booking_time = datetime.strptime(time_str, \"%H:%M\")\n", + " except ValueError:\n", + " return \"Invalid time format. Use HH:MM.\", None, None\n", + "\n", + " hour = booking_time.hour\n", + " weekday = datetime.today().strftime(\"%A\")\n", + "\n", + " if weekday not in WEEKDAYS:\n", + " response = \"Bookings are only available on weekdays.\"\n", + " elif BOOKING_START <= hour < BOOKING_END:\n", + " confirmation = f\"Booking confirmed for {name} at {time_str}.\"\n", + " confirmed_bookings.append((name, time_str))\n", + " translated = translate_text(confirmation)\n", + " audio = generate_tts(translated)\n", + " image = generate_booking_image(name, time_str)\n", + " return translated, audio, image\n", + " else:\n", + " response = \"Sorry, bookings are only accepted between 14:00 and 15:00 on weekdays.\"\n", + " translated = translate_text(response)\n", + " audio = generate_tts(translated)\n", + " return translated, audio, None" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "id": "19b52115-f0f3-4d63-a463-886163d4cfd1", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Booking Card ---\n", + "def generate_booking_image(name, time_str):\n", + " img = Image.new(\"RGB\", (500, 250), color=\"white\")\n", + " draw = ImageDraw.Draw(img)\n", + " msg = f\"\\u2705 Booking Confirmed\\nName: {name}\\nTime: {time_str}\"\n", + " draw.text((50, 100), msg, fill=\"black\")\n", + " return img" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "id": "2c446b6c-d410-4ba1-b0c7-c475e5259ff5", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Voice Booking ---\n", + "def voice_booking(audio_path, name):\n", + " with open(audio_path, \"rb\") as f:\n", + " response = openai.audio.transcriptions.create(model=\"whisper-1\", file=f)\n", + " transcription = response.text.strip()\n", + "\n", + " system_prompt = \"\"\"\n", + " You are a clinic assistant. Extract only the appointment time from the user's sentence in 24-hour HH:MM format.\n", + " If no time is mentioned, respond with 'No valid time found.'\n", + " \"\"\"\n", + "\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": transcription}\n", + " ]\n", + " )\n", + " extracted_time = response.choices[0].message.content.strip()\n", + "\n", + " if \":\" in extracted_time:\n", + " return book_appointment(name, extracted_time)\n", + " else:\n", + " message = \"Sorry, I couldn't understand the time. 
Please try again.\"\n", + " translated = translate_text(message)\n", + " audio_path = generate_tts(translated)\n", + " return translated, audio_path, None" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "id": "121d2907-7fa8-4248-b2e7-83617ea66ff0", + "metadata": {}, + "outputs": [], + "source": [ + "# --- Chat Bot Handler ---\n", + "def chat_bot(messages):\n", + " system_prompt = \"\"\"\n", + " You are a clinic booking assistant. Your job is to:\n", + " - Greet the patient and explain your role\n", + " - Only assist with making appointments\n", + " - Accept bookings only on weekdays between 14:00 and 15:00\n", + " - Do not provide medical advice\n", + " - Always respond with empathy and clarity\n", + " \"\"\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[{\"role\": \"system\", \"content\": system_prompt}] + messages\n", + " )\n", + " reply = response.choices[0].message.content.strip()\n", + " audio = generate_tts(reply)\n", + " return reply, audio" + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "id": "2427b694-8c57-40cb-b202-4a8989547925", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Running on local URL: http://127.0.0.1:7898\n", + "* To create a public link, set `share=True` in `launch()`.\n" + ] + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "``` +. ├── app/ + │ + ├── app.py # Main script entry point + │ + ├── consts.py # Configuration and constants + │ + └── requirements.txt # Python dependencies + ├── data/ + │ + └── software_engineer_jobs.json # Sample input data (JSON format) + ├── notebooks/ + │ + └── synthetic_data_generator.ipynb # Interactive Colab notebook + ├── .env.example # Sample environment variable config + ├── .gitignore # Git ignored files list + └── README.md + ```+ +## 🚀 Getting Started + +### 1. Clone the repository +```bash +git clone https://github.com/moawiah/synthetic_data_generator.git +cd synthetic_data_generator +``` +### Install Dependencies +```bah +pip install -r app/requirements.txt +``` +### Hugging Face Token +You need to create a `.env` file with your HuggingFace token like `HF_TOKEN=your-token-here` + +### Run +run the app using +`python app/app.py` + + +## Example Output - 1 Job + +```JSON +{ +"title": "Software Engineer" +, +"description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, coding, and testing software systems, and will be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, maintainable, and efficient code, as well as actively participating in code reviews and continuous integration processes. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career." 
+, +"requirements":[ +"0":"Bachelor's degree in Computer Science or related field", +"1":"Minimum of 2 years experience in software development", +"2":"Strong proficiency in Java or C++", +"3":"Experience with agile development methodologies", +"4":"Good understanding of data structures and algorithms", +"5":"Excellent problem-solving and analytical skills" +], +"location":"New York, NY", +"company_name":"ABC Technologies" +} + +``` + + +## Future Improvements +🔁 Add support for more job roles and industries + +🧠 Model selector from UI + +💾 Export dataset as CSV + +☁️ Optional integration with LangChain or RAG workflows + + + + + diff --git a/week3/community-contributions/muawiya/app/app.py b/week3/community-contributions/muawiya/app/app.py new file mode 100644 index 0000000..4b3fc79 --- /dev/null +++ b/week3/community-contributions/muawiya/app/app.py @@ -0,0 +1,156 @@ +import os +import requests +from IPython.display import Markdown, display, update_display +from openai import OpenAI +from google.colab import drive +from huggingface_hub import login +from google.colab import userdata +from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, pipeline, TextGenerationPipeline +import torch +from consts import FALCON, MISTRAL, Databricks +from dotenv import load_dotenv +import json +import ast +import gradio as gr +import re + +# Sign in to HuggingFace Hub +load_dotenv() +hf_token = os.getenv("HF_TOKEN") + + +# Main Prompt +prompt = """ +Generate one fake job posting for a {{role}}. + +Return only a single JSON object with: +- title +- description (5-10 sentences) +- requirements (array of 4-6 strings) +- location +- company_name + +No explanations, no extra text. +Only the JSON object. +""" + +# Main Conf +bnb_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_compute_dtype=torch.bfloat16, + bnb_4bit_quant_type="nf4" +) + +def load_model_and_tokenizer(): + tokenizer = AutoTokenizer.from_pretrained(MISTRAL, trust_remote_code=True) + + model = AutoModelForCausalLM.from_pretrained( + MISTRAL, + device_map={"": "cuda"}, + trust_remote_code=True, + offload_folder="/tmp/dolly_offload", + quantization_config=bnb_config + ) + + return model, tokenizer + + +def generate_job(role="Software Engineer", model=None, tokenizer=None): + # prompt = prompt.format(role=role, n=n) + # outputs = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.9) + # return outputs[0]['generated_text'] + + # Apply chat template formatting + # inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device) + inputs = tokenizer(prompt.format(role=role), return_tensors="pt") + inputs = {k: v.to(model.device) for k, v in inputs.items()} + + + # Generate output + outputs = model.generate( + **inputs, + max_new_tokens=600, + do_sample=True, + temperature=0.2, + top_p=0.9, + pad_token_id=tokenizer.eos_token_id + ) + + # Decode and return + result = tokenizer.decode(outputs[0], skip_special_tokens=True) + return result + +def generate_jobs(role="Software Engineer", n=5): + model, tokenizer = load_model_and_tokenizer() + role = "Software Engineer" + fake_jobs = [] + for i in range(n): + fake_jobs.append(generate_job(role=role, model=model, tokenizer=tokenizer)) + return fake_jobs + +def extract_json_objects_from_text_block(texts): + """ + Accepts either a single string or a list of strings. + Extracts all valid JSON objects from messy text blocks. 
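+
+    Illustrative example (hypothetical input; assumes the model returned one
+    flat, well-formed JSON object surrounded by extra text):
+
+        >>> extract_json_objects_from_text_block('intro {"title": "Software Engineer", "location": "Remote"} trailing')
+        [{'title': 'Software Engineer', 'location': 'Remote'}]
+
+    Note: the non-greedy brace pattern stops at the first closing brace, so
+    nested objects would be truncated; the flat job objects requested by the
+    prompt above parse cleanly.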
+ """ + if isinstance(texts, str): + texts = [texts] # wrap in list if single string + + pattern = r"\{[\s\S]*?\}" + results = [] + + for raw_text in texts: + matches = re.findall(pattern, raw_text) + for match in matches: + try: + obj = json.loads(match) + results.append(obj) + except json.JSONDecodeError: + continue + + return results + +def generate_ui(role, n): + try: + raw_jobs = generate_jobs(role, n) + parsed_jobs = extract_json_objects_from_text_block(raw_jobs) + + if not isinstance(parsed_jobs, list) or not all(isinstance(item, dict) for item in parsed_jobs): + print("[ERROR] Parsed result is not a list of dicts") + return gr.update(value=[], visible=True), None + + filename = f"data/{role.replace(' ', '_').lower()}_jobs.json" + with open(filename, "w") as f: + json.dump(parsed_jobs, f, indent=2) + + print(f"[INFO] Returning {len(parsed_jobs)} jobs -> {filename}") + return parsed_jobs, filename + + except Exception as e: + print(f"[FATAL ERROR] {e}") + return gr.update(value=[], visible=True), None + + +if __name__ == "__main__": + with gr.Blocks() as demo: + gr.Markdown("# 🧠 Synthetic Job Dataset Generator") + gr.Markdown("Generate a structured dataset of job postings for a specific role.") + + with gr.Row(): + role_input = gr.Textbox(label="Job Role", placeholder="e.g. Software Engineer", value="Software Engineer") + n_input = gr.Number(label="Number of Samples", value=5, precision=0) + + generate_button = gr.Button("🚀 Generate") + output_table = gr.JSON(label="Generated Dataset") + download_button = gr.File(label="Download JSON") + + generate_button.click( + generate_ui, + inputs=[role_input, n_input], + outputs=[output_table, download_button] + ) + + demo.launch(debug=True, share=True) + + diff --git a/week3/community-contributions/muawiya/app/consts.py b/week3/community-contributions/muawiya/app/consts.py new file mode 100644 index 0000000..b62eb2d --- /dev/null +++ b/week3/community-contributions/muawiya/app/consts.py @@ -0,0 +1,5 @@ +# Models +GPT = 'gpt2' +FALCON = "tiiuae/falcon-rw-1b" +MISTRAL = "mistralai/Mistral-7B-Instruct-v0.1" +Databricks = "databricks/dolly-v2-3b" \ No newline at end of file diff --git a/week3/community-contributions/muawiya/app/requirements.txt b/week3/community-contributions/muawiya/app/requirements.txt new file mode 100644 index 0000000..9590dce --- /dev/null +++ b/week3/community-contributions/muawiya/app/requirements.txt @@ -0,0 +1,7 @@ +huggingface_hub==0.30.2 +ipython==8.12.3 +openai==1.76.2 +protobuf==6.30.2 +Requests==2.32.3 +torch==2.6.0+cu124 +transformers==4.51.3 \ No newline at end of file diff --git a/week3/community-contributions/muawiya/data/software_engineer_jobs.json b/week3/community-contributions/muawiya/data/software_engineer_jobs.json new file mode 100644 index 0000000..1a09d49 --- /dev/null +++ b/week3/community-contributions/muawiya/data/software_engineer_jobs.json @@ -0,0 +1,71 @@ +[ + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining complex software systems. The ideal candidate will have a strong background in computer science and experience with multiple programming languages. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. 
This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software development", + "Strong proficiency in Java or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills" + ], + "location": "New York, NY", + "company_name": "ABC Technologies" + }, + { + "title": "Software Engineer", + "description": "We are looking for a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, developing, and testing software systems, and be able to work independently or as part of a team. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of computer science principles and be able to learn quickly. This is a full-time position located in San Francisco, CA.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software development", + "Strong proficiency in Java or C++", + "Experience with agile development methodologies", + "Excellent problem-solving skills", + "Ability to work in a fast-paced environment" + ], + "location": "San Francisco, CA", + "company_name": "Acme Inc." + }, + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team in developing and maintaining our cutting-edge software applications. The ideal candidate will have a strong background in computer science and software engineering, with experience in designing, coding, and testing software systems. Responsibilities include collaborating with cross-functional teams, writing clean and efficient code, and ensuring the timely delivery of high-quality software products. This is an excellent opportunity for a self-starter with a passion for technology and a desire to work in a dynamic and fast-paced environment.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "3+ years of experience in software engineering", + "Strong proficiency in Java, Python, or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills", + "Strong communication and interpersonal skills" + ], + "location": "New York, NY", + "company_name": "ABC Tech" + }, + { + "title": "Software Engineer", + "description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have a strong background in computer science and experience with various programming languages and technologies. Responsibilities include designing, coding, testing, and maintaining software systems, as well as collaborating with cross-functional teams. 
This is an excellent opportunity for a creative and motivated individual to make a significant impact in the tech industry.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "Minimum of 2 years experience in software development", + "Strong proficiency in Java, Python, or C++", + "Experience with agile development methodologies", + "Excellent problem-solving and analytical skills", + "Ability to work independently and as part of a team", + "Strong communication and interpersonal skills" + ], + "location": "New York, NY", + "company_name": "ABC Tech Inc." + }, + { + "title": "Software Engineer", + "description": "We are looking for a skilled software engineer to join our team and contribute to the development of innovative software solutions. Responsibilities include designing, coding, testing and maintaining software systems, as well as collaborating with cross-functional teams. The ideal candidate will have a strong background in computer science or a related field, and at least 3 years of experience in software development. Must be proficient in multiple programming languages, including Java, Python, and C++. Strong problem-solving skills and the ability to work independently or as part of a team are required. This is a full-time position located in San Francisco, CA.", + "requirements": [ + "Bachelor's degree in Computer Science or related field", + "At least 3 years of experience in software development", + "Proficiency in Java, Python, and C++", + "Strong problem-solving skills", + "Ability to work independently or as part of a team" + ], + "location": "San Francisco, CA", + "company_name": "Innovative Solutions Inc." + } +] \ No newline at end of file diff --git a/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb b/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb new file mode 100644 index 0000000..09f6f9e --- /dev/null +++ b/week3/community-contributions/muawiya/notebooks/synthetic_data_generator.ipynb @@ -0,0 +1,5509 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "machine_shape": "hm", + "gpuType": "A100" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU", + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1d1fe06ac632475086ed5964ed000360": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c138f597c98c4944b54d36510ecc8e0b", + "IPY_MODEL_bef2531516164e85bb79b86a791dd00d", + "IPY_MODEL_1cb9fc011950479a8d4832bc52c3399c" + ], + "layout": "IPY_MODEL_974e8f7f05ef472d85d5ea71425e6c39" + } + }, + "c138f597c98c4944b54d36510ecc8e0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_696090959af8499e9a38777e664b85c1", + "placeholder": "", + "style": "IPY_MODEL_973bcc9740b4426da4c680d11f3c1f7e", + "value": "tokenizer_config.json: 100%" + } + }, + "bef2531516164e85bb79b86a791dd00d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3cb5d8fdb5fb4b6a99f6733c00df8378", + "max": 2103, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_58f4369c68434d569d5eb1bc36e71775", + "value": 2103 + } + }, + "1cb9fc011950479a8d4832bc52c3399c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a05df972876941e3b6faab56cc30a4b8", + "placeholder": "", + "style": "IPY_MODEL_9c61d90b63dd4fb5a481282d6d6eb8e8", + "value": " 2.10k/2.10k [00:00<00:00, 182kB/s]" + } + }, + "974e8f7f05ef472d85d5ea71425e6c39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "696090959af8499e9a38777e664b85c1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": 
null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "973bcc9740b4426da4c680d11f3c1f7e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3cb5d8fdb5fb4b6a99f6733c00df8378": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "58f4369c68434d569d5eb1bc36e71775": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a05df972876941e3b6faab56cc30a4b8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9c61d90b63dd4fb5a481282d6d6eb8e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2b71f87a02a540488a9e07f072f8807a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_548cd7e9fab54470bc52810f27784760", + "IPY_MODEL_9c5eb078ece84a57aa9c402c9cad3b0b", + "IPY_MODEL_ee00a9f599db4affabb7bf1c4df6ca1a" + ], + "layout": "IPY_MODEL_52bd638607bf4e1aaf224ebdcfa3693d" + } + }, + "548cd7e9fab54470bc52810f27784760": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_771619a5acd343c788b8189167af09d4", + "placeholder": "", + "style": "IPY_MODEL_09a1b30b5659452f95ebb2e72466c750", + "value": "tokenizer.model: 100%" + } + }, + "9c5eb078ece84a57aa9c402c9cad3b0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_145a1f1032a44079a262db381e60d401", + "max": 493443, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_99888ad83b51485f959f977ba4418119", + "value": 493443 + } + }, + "ee00a9f599db4affabb7bf1c4df6ca1a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ec0854c2ea9a4c9280b6876df365db9d", + "placeholder": 
"", + "style": "IPY_MODEL_dac5892c85214f69a5d75d5dc4858dfe", + "value": " 493k/493k [00:00<00:00, 7.91MB/s]" + } + }, + "52bd638607bf4e1aaf224ebdcfa3693d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "771619a5acd343c788b8189167af09d4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "09a1b30b5659452f95ebb2e72466c750": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "145a1f1032a44079a262db381e60d401": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + 
"border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99888ad83b51485f959f977ba4418119": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ec0854c2ea9a4c9280b6876df365db9d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dac5892c85214f69a5d75d5dc4858dfe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "41b669da565e4204b848b754dfa28ac8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e806afdada48418c9e353b94a38cd703", + "IPY_MODEL_7898b7322b014e96984c3d09a29a57fb", + 
"IPY_MODEL_d665270b05d64effba568ded85eee1b4" + ], + "layout": "IPY_MODEL_df087de9ade24058b1cf32e1556f7cb6" + } + }, + "e806afdada48418c9e353b94a38cd703": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_584330ab439b4887b1050a7f14dc5d7c", + "placeholder": "", + "style": "IPY_MODEL_880b32d3bd1d4af8b5d0b449aab87e8b", + "value": "tokenizer.json: 100%" + } + }, + "7898b7322b014e96984c3d09a29a57fb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_97d09f016e274cca93927f3bd8329352", + "max": 1795188, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d87ef5878c0f4211809716674d0d8413", + "value": 1795188 + } + }, + "d665270b05d64effba568ded85eee1b4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_556109848b1c4ebc99a6cc7c0be519e0", + "placeholder": "", + "style": "IPY_MODEL_8d6cdfd75e3f4a628c9e785d3c469d98", + "value": " 1.80M/1.80M [00:00<00:00, 24.9MB/s]" + } + }, + "df087de9ade24058b1cf32e1556f7cb6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "584330ab439b4887b1050a7f14dc5d7c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", 
+ "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "880b32d3bd1d4af8b5d0b449aab87e8b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "97d09f016e274cca93927f3bd8329352": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d87ef5878c0f4211809716674d0d8413": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "556109848b1c4ebc99a6cc7c0be519e0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8d6cdfd75e3f4a628c9e785d3c469d98": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fb1ff6f4482143c39be1cca57ec2fc8b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_83e6421843ad487c91bc75510b90f198", + "IPY_MODEL_9e74a7b74e1a4b119af5b95d572bac3c", + "IPY_MODEL_080c34ad56c84c229b1555b15b354aad" + ], + "layout": "IPY_MODEL_d968bf43e8574d9090326b31c9a7fd93" + } + }, + "83e6421843ad487c91bc75510b90f198": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e78b05f33ee54c968fd87b77a2470bce", + "placeholder": "", + "style": "IPY_MODEL_79a201f7ab7e49efa9e3e1504012dec2", + "value": "special_tokens_map.json: 100%" + } + }, + "9e74a7b74e1a4b119af5b95d572bac3c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6e5d431074de4955a97d4ea36621ae36", + "max": 414, + "min": 0, + "orientation": "horizontal", + "style": 
"IPY_MODEL_bfc581362fbc4aca85df7b2a943dd5e4", + "value": 414 + } + }, + "080c34ad56c84c229b1555b15b354aad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bc9b585bfd2847bb9f22c4720bd19033", + "placeholder": "", + "style": "IPY_MODEL_8addd2418c3049f3be32465cc9a408d4", + "value": " 414/414 [00:00<00:00, 52.5kB/s]" + } + }, + "d968bf43e8574d9090326b31c9a7fd93": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e78b05f33ee54c968fd87b77a2470bce": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "79a201f7ab7e49efa9e3e1504012dec2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6e5d431074de4955a97d4ea36621ae36": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bfc581362fbc4aca85df7b2a943dd5e4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bc9b585bfd2847bb9f22c4720bd19033": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8addd2418c3049f3be32465cc9a408d4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + 
"description_width": "" + } + }, + "c7b5bb9ef22f4ebe9969d4d10d63d24c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d8c3f3ec329743f6b2f21d21601f092a", + "IPY_MODEL_2fee19152ef34eeaba541d559b9a0bc0", + "IPY_MODEL_2740de6be1ae4e3bacc642c39828883b" + ], + "layout": "IPY_MODEL_4104813265f34db0ab09c9d6c148ba29" + } + }, + "d8c3f3ec329743f6b2f21d21601f092a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6d2dbad5a0984f8382abd18910c14343", + "placeholder": "", + "style": "IPY_MODEL_32285185818f40a6b07c6d6f6175b70c", + "value": "config.json: 100%" + } + }, + "2fee19152ef34eeaba541d559b9a0bc0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_79da3c26e0fb4405a198c2255df9ec00", + "max": 571, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c95bea4e04ff49078821a5dd67f0c28a", + "value": 571 + } + }, + "2740de6be1ae4e3bacc642c39828883b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3695b9dde85348efb683e31e5d52e210", + "placeholder": "", + "style": "IPY_MODEL_1d982bed2d4645b8a19295b7812cef49", + "value": " 571/571 [00:00<00:00, 72.5kB/s]" + } + }, + "4104813265f34db0ab09c9d6c148ba29": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6d2dbad5a0984f8382abd18910c14343": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "32285185818f40a6b07c6d6f6175b70c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "79da3c26e0fb4405a198c2255df9ec00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c95bea4e04ff49078821a5dd67f0c28a": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3695b9dde85348efb683e31e5d52e210": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1d982bed2d4645b8a19295b7812cef49": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "32c58f50bb1c44e085ae3663004fcfff": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c4df70cf509541828d3a06c380fdfe3d", + "IPY_MODEL_abd2737f597f48b0846a74c743307917", + "IPY_MODEL_a2a52b5e3c104e1cbec513a9f8744db2" + ], + "layout": "IPY_MODEL_ba57460b8ee24f4e96f8a603914b7073" + } + }, + "c4df70cf509541828d3a06c380fdfe3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d17cd0e49fa94361894660c0645ec9a8", + "placeholder": "", + "style": "IPY_MODEL_6cd364a43f6f4ea793b05bf14ee9d687", + "value": "model.safetensors.index.json: 100%" + } + }, + "abd2737f597f48b0846a74c743307917": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a80f72a5e41047f1898d5b6f00a2c69b", + "max": 25125, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c6f6fca0f35b44fbb9037337a5bc0431", + "value": 25125 + } + }, + "a2a52b5e3c104e1cbec513a9f8744db2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3d07e648a5644742b8112146e952c44a", + "placeholder": "", + "style": "IPY_MODEL_bff978fcc6f94f55bf605c6d9c23cfd2", + "value": " 25.1k/25.1k [00:00<00:00, 2.73MB/s]" + } + }, + "ba57460b8ee24f4e96f8a603914b7073": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d17cd0e49fa94361894660c0645ec9a8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6cd364a43f6f4ea793b05bf14ee9d687": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a80f72a5e41047f1898d5b6f00a2c69b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c6f6fca0f35b44fbb9037337a5bc0431": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3d07e648a5644742b8112146e952c44a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bff978fcc6f94f55bf605c6d9c23cfd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "eca24e648bcf4cc684f15da684e2791d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_dc82b611b8c145eb8ebc7b80073e9ae1", + "IPY_MODEL_f3e6040a241c4ac7b715bb07a9ec6d6b", + "IPY_MODEL_e310ab9f4338443e82d257ddc21f48bb" + ], + "layout": "IPY_MODEL_9dd0e53a7a2a4d668c5640d938b71c9f" + } + }, + "dc82b611b8c145eb8ebc7b80073e9ae1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1fc933b90fa546c884181136373ad005", + "placeholder": "", + "style": "IPY_MODEL_94f3ee73e2c04092ac5522c6ef038ea1", + "value": "Fetching 2 files: 100%" + } + }, + "f3e6040a241c4ac7b715bb07a9ec6d6b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_81d8563026e04f5ab00eced0da89a7ef", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_95f081aaf9e84c2f91c82a4e2f183009", + "value": 2 + } + }, + "e310ab9f4338443e82d257ddc21f48bb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_965cfc093b5040bbaec177820e45ec95", + "placeholder": "", + "style": "IPY_MODEL_d328397d81f343e28dd1a6e52c5f0ae7", + "value": " 2/2 [00:46<00:00, 46.46s/it]" + } + }, + "9dd0e53a7a2a4d668c5640d938b71c9f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1fc933b90fa546c884181136373ad005": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "94f3ee73e2c04092ac5522c6ef038ea1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "81d8563026e04f5ab00eced0da89a7ef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "95f081aaf9e84c2f91c82a4e2f183009": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "965cfc093b5040bbaec177820e45ec95": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d328397d81f343e28dd1a6e52c5f0ae7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f73f9c7f341c4a99b00585343bf4d4bd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2a11e010825b42d4a949ad64ae0d1933", + "IPY_MODEL_15b769156f6a4d2988f1c09f3820f7ef", + "IPY_MODEL_a0484e3846c647b892d2de3797496605" + ], + "layout": "IPY_MODEL_cb042f80aaf04bf1963d637d1771741e" + } + }, + "2a11e010825b42d4a949ad64ae0d1933": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_852ba7d4221a475488411f5014362496", + "placeholder": "", + "style": "IPY_MODEL_38dc7c1e65324e3097d8738532272e32", + "value": "model-00001-of-00002.safetensors: 100%" + } + }, + "15b769156f6a4d2988f1c09f3820f7ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_613da14abc24460db3bb337886cb407c", + "max": 9942981696, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_37a495a5836f413ea5f662538d51a939", + "value": 9942981696 + } + }, + "a0484e3846c647b892d2de3797496605": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9c61322c006f465385df301121462e82", + "placeholder": "", + "style": "IPY_MODEL_d93d0bb6ebc943a1be6902bd88cef441", + "value": " 9.94G/9.94G [00:46<00:00, 246MB/s]" + } + }, + "cb042f80aaf04bf1963d637d1771741e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "852ba7d4221a475488411f5014362496": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": 
"LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "38dc7c1e65324e3097d8738532272e32": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "613da14abc24460db3bb337886cb407c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "37a495a5836f413ea5f662538d51a939": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9c61322c006f465385df301121462e82": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": 
null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d93d0bb6ebc943a1be6902bd88cef441": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "50a2a1bd13db4045a4ae01138470c42b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ad7cba643d1742cdb47c433bf50072f9", + "IPY_MODEL_57ef5d067e7343239525a6da237b29eb", + "IPY_MODEL_7567388a58a340d4a0f384f79ee13ddc" + ], + "layout": "IPY_MODEL_52c2896ab41a4d2592484084cb501e5a" + } + }, + "ad7cba643d1742cdb47c433bf50072f9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_22957622a42345b991371153c29583c4", + "placeholder": "", + "style": "IPY_MODEL_d34c879607b041739a2cc6273509e330", + "value": "model-00002-of-00002.safetensors: 100%" + } + }, + "57ef5d067e7343239525a6da237b29eb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d1e7bdd4faac4765862fc809017c4856", + "max": 4540516344, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fcb0ad846398455faccf0d797549f589", + "value": 4540516344 + } + }, + "7567388a58a340d4a0f384f79ee13ddc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b381226552c9462d858051fcb7240727", + "placeholder": "", + "style": "IPY_MODEL_94e630795bc247e08e6af434c5924cdd", + "value": " 4.54G/4.54G [00:23<00:00, 248MB/s]" + } + }, + "52c2896ab41a4d2592484084cb501e5a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22957622a42345b991371153c29583c4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d34c879607b041739a2cc6273509e330": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d1e7bdd4faac4765862fc809017c4856": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + 
"state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fcb0ad846398455faccf0d797549f589": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b381226552c9462d858051fcb7240727": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "94e630795bc247e08e6af434c5924cdd": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2b496c218e2049ff9156ff5b3bbdb90b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_62d3b35a3924417894094d3bbf993932", + "IPY_MODEL_41737448e98a48dcbe117351645395de", + "IPY_MODEL_e83735cd79674a3482f0b90d4c9a3e3d" + ], + "layout": "IPY_MODEL_eff6ca539e2947e9b2987977f143de9a" + } + }, + "62d3b35a3924417894094d3bbf993932": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_aa75292545a649eda8cb7bab0ac9bbcd", + "placeholder": "", + "style": "IPY_MODEL_22c0e2213505435eaeebdfe330b8fbb8", + "value": "Loading checkpoint shards: 100%" + } + }, + "41737448e98a48dcbe117351645395de": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7de820edeeaf4210af68c721bab3082d", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_66ef664b717343bdaf8e5c4610b2a678", + "value": 2 + } + }, + "e83735cd79674a3482f0b90d4c9a3e3d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f09534cbda8c4e91b2e073c0eca0cb96", + "placeholder": "", + "style": "IPY_MODEL_7d8b5a2a52aa4957bc5905021898d8f4", + "value": " 2/2 [00:17<00:00, 8.24s/it]" + } + }, + "eff6ca539e2947e9b2987977f143de9a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": 
null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa75292545a649eda8cb7bab0ac9bbcd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22c0e2213505435eaeebdfe330b8fbb8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7de820edeeaf4210af68c721bab3082d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66ef664b717343bdaf8e5c4610b2a678": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": 
"StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f09534cbda8c4e91b2e073c0eca0cb96": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7d8b5a2a52aa4957bc5905021898d8f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1c68a822580a4960acad93be9fd48ce3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6df81f91b17f41dc91fc9f367fa0afab", + "IPY_MODEL_17742936c9ac46e588d1ce42235745d0", + "IPY_MODEL_17f0cd6f05184164b48ef906f192505a" + ], + "layout": "IPY_MODEL_936a67f2de2e44728b83600f4fa0569c" + } + }, + "6df81f91b17f41dc91fc9f367fa0afab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d5ad82f6b9654a8cb888613caaaaa097", + "placeholder": "", + "style": "IPY_MODEL_b014979e237344129545ff2c384c1c1c", + "value": "generation_config.json: 100%" + } + }, + "17742936c9ac46e588d1ce42235745d0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": 
"1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b99c12d57d4a4eab84aefbef58452c32", + "max": 116, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5923bbdcf6334393ad832765f129bdec", + "value": 116 + } + }, + "17f0cd6f05184164b48ef906f192505a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_260ac8c28531450bba1deac4e4669dc4", + "placeholder": "", + "style": "IPY_MODEL_067959a4ef614c498c28bb83c10e16de", + "value": " 116/116 [00:00<00:00, 15.6kB/s]" + } + }, + "936a67f2de2e44728b83600f4fa0569c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d5ad82f6b9654a8cb888613caaaaa097": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b014979e237344129545ff2c384c1c1c": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b99c12d57d4a4eab84aefbef58452c32": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5923bbdcf6334393ad832765f129bdec": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "260ac8c28531450bba1deac4e4669dc4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "067959a4ef614c498c28bb83c10e16de": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { 
+ "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "b5dd409cf6e04764adbb7c2a49b7be86": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_65187b4ebb2041b39778268e8b4d6b0d", + "IPY_MODEL_33317cac10ca4a98bf4433c1eff43435", + "IPY_MODEL_f81f5402902c4c04b10895782287e908" + ], + "layout": "IPY_MODEL_c471914fe0d34ae8967bac2820637d5b" + } + }, + "65187b4ebb2041b39778268e8b4d6b0d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7aead6f6cffa40a383f1b8c64943329e", + "placeholder": "", + "style": "IPY_MODEL_f24fe57d8e164fd68185b4c117e7c097", + "value": "Loading checkpoint shards: 100%" + } + }, + "33317cac10ca4a98bf4433c1eff43435": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f913ca9ab6d44ab1b788a36bd964ed39", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ed34016801264a05bb3697eca2ac22ef", + "value": 2 + } + }, + "f81f5402902c4c04b10895782287e908": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fe622254072540fda3b0dd6b2cab6e4a", + "placeholder": "", + "style": "IPY_MODEL_5d95bdea47594e21855a6e564d0760da", + "value": " 2/2 [00:17<00:00, 8.01s/it]" + } + }, + "c471914fe0d34ae8967bac2820637d5b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": 
null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7aead6f6cffa40a383f1b8c64943329e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f24fe57d8e164fd68185b4c117e7c097": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f913ca9ab6d44ab1b788a36bd964ed39": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ed34016801264a05bb3697eca2ac22ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fe622254072540fda3b0dd6b2cab6e4a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5d95bdea47594e21855a6e564d0760da": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9f9defc39ac5437e9512e5fad810b409": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1c126dfdc51c438b9b48c8a65e549ae2", + "IPY_MODEL_741d800130ea4830b9266f467fa6a0bf", + "IPY_MODEL_73c0a01f1693471c9c017143e9e9058b" + ], + "layout": "IPY_MODEL_ab8174c1337b43048e05aeca72ca18ef" + } + }, + "1c126dfdc51c438b9b48c8a65e549ae2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_5e5a992d86434e62a25fc9b7f75f4b16", + "placeholder": "", + "style": "IPY_MODEL_1507b1310f5045c9b691fdb102cc1686", + "value": "Loading checkpoint shards: 100%" + } + }, + "741d800130ea4830b9266f467fa6a0bf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8a8e81f9d3a54ce49b367f8e984b4a06", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_bab02b1f092b40c8983cd6440f7eaf16", + "value": 2 + } + }, + "73c0a01f1693471c9c017143e9e9058b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_94f30dc2653a4f178c9c2ef454d24644", + "placeholder": "", + "style": "IPY_MODEL_a508625ef12d4a639fa9773484507709", + "value": " 2/2 [00:17<00:00, 8.07s/it]" + } + }, + "ab8174c1337b43048e05aeca72ca18ef": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5e5a992d86434e62a25fc9b7f75f4b16": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1507b1310f5045c9b691fdb102cc1686": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8a8e81f9d3a54ce49b367f8e984b4a06": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bab02b1f092b40c8983cd6440f7eaf16": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "94f30dc2653a4f178c9c2ef454d24644": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a508625ef12d4a639fa9773484507709": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "cells": [ + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "Pv8FH9BMgskk", + "outputId": "00cd7f02-2556-4850-b599-1ddec83f7cd9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.4/76.4 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m363.4/363.4 MB\u001b[0m \u001b[31m4.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.8/13.8 MB\u001b[0m \u001b[31m112.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.6/24.6 MB\u001b[0m \u001b[31m96.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m883.7/883.7 kB\u001b[0m \u001b[31m55.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m664.8/664.8 MB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.5/211.5 MB\u001b[0m \u001b[31m11.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.3/56.3 MB\u001b[0m \u001b[31m44.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.9/127.9 MB\u001b[0m \u001b[31m20.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 MB\u001b[0m \u001b[31m3.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.1/21.1 MB\u001b[0m \u001b[31m109.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.1/76.1 MB\u001b[0m \u001b[31m28.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "google-genai 1.12.1 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.2 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mRequirement already satisfied: bitsandbytes in /usr/local/lib/python3.11/dist-packages (0.45.5)\n", + "Requirement already satisfied: torch<3,>=2.0 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.6.0+cu124)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.0.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) 
(1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch<3,>=2.0->bitsandbytes) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch<3,>=2.0->bitsandbytes) (3.0.2)\n", + "Requirement already satisfied: transformers in /usr/local/lib/python3.11/dist-packages (4.51.3)\n", + "Requirement already satisfied: accelerate in /usr/local/lib/python3.11/dist-packages (1.6.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from transformers) (3.18.0)\n", + "Requirement already satisfied: huggingface-hub<1.0,>=0.30.0 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.30.2)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from transformers) (2.0.2)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from transformers) (24.2)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.11/dist-packages (from transformers) (6.0.2)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.11/dist-packages (from transformers) (2024.11.6)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from transformers) (2.32.3)\n", + "Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.21.1)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.11/dist-packages (from transformers) (0.5.3)\n", + "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.11/dist-packages (from transformers) (4.67.1)\n", + "Requirement already satisfied: psutil in /usr/local/lib/python3.11/dist-packages (from accelerate) (5.9.5)\n", + "Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from accelerate) (2.6.0+cu124)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.30.0->transformers) (2025.3.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.30.0->transformers) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (11.2.1.3)\n", + "Requirement already 
satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch>=2.0.0->accelerate) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch>=2.0.0->accelerate) (1.3.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (3.4.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (2.4.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->transformers) (2025.4.26)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch>=2.0.0->accelerate) (3.0.2)\n" + ] + } + ], + "source": [ + "!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2\n", + "!pip install -U bitsandbytes\n", + "!pip install -U transformers accelerate" + ] + }, + { + "cell_type": "code", + "source": [ + "# imports\n", + "\n", + "import os\n", + "import requests\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "from google.colab import drive\n", + "from huggingface_hub import login\n", + "from google.colab import userdata\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, pipeline, TextGenerationPipeline\n", + "import torch" + ], + "metadata": { + "id": "u0qdj2ynjjRz" + }, + "execution_count": 9, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Models\n", + "GPT = 'gpt2'\n", + "FALCON = \"tiiuae/falcon-rw-1b\"\n", + "MISTRAL = \"mistralai/Mistral-7B-Instruct-v0.1\"\n", + "Databricks = \"databricks/dolly-v2-3b\"\n" + ], + "metadata": { + "id": "a_sHgTj_jpDE" + }, + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Sign in to HuggingFace Hub\n", + "\n", + "hf_token = userdata.get('HF_TOKEN')\n", + "login(hf_token, add_to_git_credential=True)" + ], + "metadata": { + "id": "JYjtu3cPj2Th" + }, + "execution_count": 11, + 
"outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Flatten the messages into a single plain prompt\n", + "# prompt = \"\"\"\n", + "# Generate {{n}} fake job postings for a {{role}} position.\n", + "\n", + "# Only output a JSON array like:\n", + "# [\n", + "# {{\n", + "# \"title\": \"Software Engineer\",\n", + "# \"description\": \"Develop backend APIs and services.\",\n", + "# \"requirements\": [\"Python\", \"FastAPI\", \"MongoDB\"],\n", + "# \"location\": \"San Francisco\",\n", + "# \"company_name\": \"TechCorp\"\n", + "# }},\n", + "# ...\n", + "# ]\n", + "# Return valid JSON only. No markdown. No explanations.\n", + "# \"\"\"\n", + "\n", + "# prompt = \"\"\"\n", + "# Generate exactly {{n}} fake job postings for a {{role}}.\n", + "\n", + "# Each posting must be a JSON object with:\n", + "# - title\n", + "# - description (5-10 sentences)\n", + "# - requirements (array of 3-5 strings)\n", + "# - location\n", + "# - company_name\n", + "\n", + "# Return a single JSON array with {n} items. No explanations. No markdown.\n", + "# ONLY the JSON array as output.\n", + "# \"\"\"\n", + "\n", + "prompt = \"\"\"\n", + "Generate one fake job posting for a {{role}}.\n", + "\n", + "Return only a single JSON object with:\n", + "- title\n", + "- description (5-10 sentences)\n", + "- requirements (array of 4-6 strings)\n", + "- location\n", + "- company_name\n", + "\n", + "No explanations, no extra text.\n", + "Only the JSON object.\n", + "\"\"\"\n", + "\n", + "\n", + "\n" + ], + "metadata": { + "id": "7IUshG1fkQ7k" + }, + "execution_count": 12, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "!pip install safetensors" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-9nzEpDd-dkd", + "outputId": "484ed145-951f-4950-f9ba-bf7ed6e30a13" + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: safetensors in /usr/local/lib/python3.11/dist-packages (0.5.3)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "os.makedirs(\"/tmp/dolly_offload\", exist_ok=True)" + ], + "metadata": { + "id": "D13qucmC-qGr" + }, + "execution_count": 14, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bnb_config = BitsAndBytesConfig(\n", + " load_in_4bit=True,\n", + " bnb_4bit_use_double_quant=True,\n", + " bnb_4bit_compute_dtype=torch.bfloat16,\n", + " bnb_4bit_quant_type=\"nf4\"\n", + ")" + ], + "metadata": { + "id": "4qf967BtEqqx" + }, + "execution_count": 15, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def load_model_and_tokenizer():\n", + " tokenizer = AutoTokenizer.from_pretrained(MISTRAL, trust_remote_code=True)\n", + "\n", + " model = AutoModelForCausalLM.from_pretrained(\n", + " MISTRAL,\n", + " device_map={\"\": \"cuda\"},\n", + " trust_remote_code=True,\n", + " offload_folder=\"/tmp/dolly_offload\",\n", + " quantization_config=bnb_config\n", + " )\n", + "\n", + " return model, tokenizer\n" + ], + "metadata": { + "id": "GjV7joEMjujM" + }, + "execution_count": 16, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# generator = pipeline(\"text-generation\", model=Databricks, device_map=\"auto\", trust_remote_code=True, offload_folder=\"/tmp/dolly_offload\")\n", + "\n", + "def generate_job(role=\"Software Engineer\", model=None, tokenizer=None):\n", + " # prompt = prompt.format(role=role, n=n)\n", + " # outputs = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.9)\n", + " # 
return outputs[0]['generated_text']\n", + "\n", + " # Apply chat template formatting\n", + " # inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(model.device)\n", + " inputs = tokenizer(prompt.format(role=role), return_tensors=\"pt\")\n", + " inputs = {k: v.to(model.device) for k, v in inputs.items()}\n", + "\n", + "\n", + " # Generate output\n", + " outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens=600,\n", + " do_sample=True,\n", + " temperature=0.2,\n", + " top_p=0.9,\n", + " pad_token_id=tokenizer.eos_token_id\n", + " )\n", + "\n", + " # Decode and return\n", + " result = tokenizer.decode(outputs[0], skip_special_tokens=True)\n", + " return result\n", + "\n" + ], + "metadata": { + "id": "5w89B0MwkJWo" + }, + "execution_count": 17, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "def generate_jobs(role=\"Software Engineer\", n=5):\n", + " model, tokenizer = load_model_and_tokenizer()\n", + " role = \"Software Engineer\"\n", + " fake_jobs = []\n", + " for i in range(n):\n", + " fake_jobs.append(generate_job(role=role, model=model, tokenizer=tokenizer))\n", + " return fake_jobs" + ], + "metadata": { + "id": "ULhKrRe7XZmW" + }, + "execution_count": 18, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(generate_jobs(role=\"Software Engineer\", n=10))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 406, + "referenced_widgets": [ + "1d1fe06ac632475086ed5964ed000360", + "c138f597c98c4944b54d36510ecc8e0b", + "bef2531516164e85bb79b86a791dd00d", + "1cb9fc011950479a8d4832bc52c3399c", + "974e8f7f05ef472d85d5ea71425e6c39", + "696090959af8499e9a38777e664b85c1", + "973bcc9740b4426da4c680d11f3c1f7e", + "3cb5d8fdb5fb4b6a99f6733c00df8378", + "58f4369c68434d569d5eb1bc36e71775", + "a05df972876941e3b6faab56cc30a4b8", + "9c61d90b63dd4fb5a481282d6d6eb8e8", + "2b71f87a02a540488a9e07f072f8807a", + "548cd7e9fab54470bc52810f27784760", + "9c5eb078ece84a57aa9c402c9cad3b0b", + "ee00a9f599db4affabb7bf1c4df6ca1a", + "52bd638607bf4e1aaf224ebdcfa3693d", + "771619a5acd343c788b8189167af09d4", + "09a1b30b5659452f95ebb2e72466c750", + "145a1f1032a44079a262db381e60d401", + "99888ad83b51485f959f977ba4418119", + "ec0854c2ea9a4c9280b6876df365db9d", + "dac5892c85214f69a5d75d5dc4858dfe", + "41b669da565e4204b848b754dfa28ac8", + "e806afdada48418c9e353b94a38cd703", + "7898b7322b014e96984c3d09a29a57fb", + "d665270b05d64effba568ded85eee1b4", + "df087de9ade24058b1cf32e1556f7cb6", + "584330ab439b4887b1050a7f14dc5d7c", + "880b32d3bd1d4af8b5d0b449aab87e8b", + "97d09f016e274cca93927f3bd8329352", + "d87ef5878c0f4211809716674d0d8413", + "556109848b1c4ebc99a6cc7c0be519e0", + "8d6cdfd75e3f4a628c9e785d3c469d98", + "fb1ff6f4482143c39be1cca57ec2fc8b", + "83e6421843ad487c91bc75510b90f198", + "9e74a7b74e1a4b119af5b95d572bac3c", + "080c34ad56c84c229b1555b15b354aad", + "d968bf43e8574d9090326b31c9a7fd93", + "e78b05f33ee54c968fd87b77a2470bce", + "79a201f7ab7e49efa9e3e1504012dec2", + "6e5d431074de4955a97d4ea36621ae36", + "bfc581362fbc4aca85df7b2a943dd5e4", + "bc9b585bfd2847bb9f22c4720bd19033", + "8addd2418c3049f3be32465cc9a408d4", + "c7b5bb9ef22f4ebe9969d4d10d63d24c", + "d8c3f3ec329743f6b2f21d21601f092a", + "2fee19152ef34eeaba541d559b9a0bc0", + "2740de6be1ae4e3bacc642c39828883b", + "4104813265f34db0ab09c9d6c148ba29", + "6d2dbad5a0984f8382abd18910c14343", + "32285185818f40a6b07c6d6f6175b70c", + "79da3c26e0fb4405a198c2255df9ec00", + "c95bea4e04ff49078821a5dd67f0c28a", + "3695b9dde85348efb683e31e5d52e210", + 
"1d982bed2d4645b8a19295b7812cef49", + "32c58f50bb1c44e085ae3663004fcfff", + "c4df70cf509541828d3a06c380fdfe3d", + "abd2737f597f48b0846a74c743307917", + "a2a52b5e3c104e1cbec513a9f8744db2", + "ba57460b8ee24f4e96f8a603914b7073", + "d17cd0e49fa94361894660c0645ec9a8", + "6cd364a43f6f4ea793b05bf14ee9d687", + "a80f72a5e41047f1898d5b6f00a2c69b", + "c6f6fca0f35b44fbb9037337a5bc0431", + "3d07e648a5644742b8112146e952c44a", + "bff978fcc6f94f55bf605c6d9c23cfd2", + "eca24e648bcf4cc684f15da684e2791d", + "dc82b611b8c145eb8ebc7b80073e9ae1", + "f3e6040a241c4ac7b715bb07a9ec6d6b", + "e310ab9f4338443e82d257ddc21f48bb", + "9dd0e53a7a2a4d668c5640d938b71c9f", + "1fc933b90fa546c884181136373ad005", + "94f3ee73e2c04092ac5522c6ef038ea1", + "81d8563026e04f5ab00eced0da89a7ef", + "95f081aaf9e84c2f91c82a4e2f183009", + "965cfc093b5040bbaec177820e45ec95", + "d328397d81f343e28dd1a6e52c5f0ae7", + "f73f9c7f341c4a99b00585343bf4d4bd", + "2a11e010825b42d4a949ad64ae0d1933", + "15b769156f6a4d2988f1c09f3820f7ef", + "a0484e3846c647b892d2de3797496605", + "cb042f80aaf04bf1963d637d1771741e", + "852ba7d4221a475488411f5014362496", + "38dc7c1e65324e3097d8738532272e32", + "613da14abc24460db3bb337886cb407c", + "37a495a5836f413ea5f662538d51a939", + "9c61322c006f465385df301121462e82", + "d93d0bb6ebc943a1be6902bd88cef441", + "50a2a1bd13db4045a4ae01138470c42b", + "ad7cba643d1742cdb47c433bf50072f9", + "57ef5d067e7343239525a6da237b29eb", + "7567388a58a340d4a0f384f79ee13ddc", + "52c2896ab41a4d2592484084cb501e5a", + "22957622a42345b991371153c29583c4", + "d34c879607b041739a2cc6273509e330", + "d1e7bdd4faac4765862fc809017c4856", + "fcb0ad846398455faccf0d797549f589", + "b381226552c9462d858051fcb7240727", + "94e630795bc247e08e6af434c5924cdd", + "2b496c218e2049ff9156ff5b3bbdb90b", + "62d3b35a3924417894094d3bbf993932", + "41737448e98a48dcbe117351645395de", + "e83735cd79674a3482f0b90d4c9a3e3d", + "eff6ca539e2947e9b2987977f143de9a", + "aa75292545a649eda8cb7bab0ac9bbcd", + "22c0e2213505435eaeebdfe330b8fbb8", + "7de820edeeaf4210af68c721bab3082d", + "66ef664b717343bdaf8e5c4610b2a678", + "f09534cbda8c4e91b2e073c0eca0cb96", + "7d8b5a2a52aa4957bc5905021898d8f4", + "1c68a822580a4960acad93be9fd48ce3", + "6df81f91b17f41dc91fc9f367fa0afab", + "17742936c9ac46e588d1ce42235745d0", + "17f0cd6f05184164b48ef906f192505a", + "936a67f2de2e44728b83600f4fa0569c", + "d5ad82f6b9654a8cb888613caaaaa097", + "b014979e237344129545ff2c384c1c1c", + "b99c12d57d4a4eab84aefbef58452c32", + "5923bbdcf6334393ad832765f129bdec", + "260ac8c28531450bba1deac4e4669dc4", + "067959a4ef614c498c28bb83c10e16de" + ] + }, + "id": "kKsErltXXwy1", + "outputId": "683c2e5e-16d8-4fe3-efdd-664c385c71e7" + }, + "execution_count": 19, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/2.10k [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "1d1fe06ac632475086ed5964ed000360" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "tokenizer.model: 0%| | 0.00/493k [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "2b71f87a02a540488a9e07f072f8807a" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "tokenizer.json: 0%| | 0.00/1.80M [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "41b669da565e4204b848b754dfa28ac8" + } + }, + 
"metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "special_tokens_map.json: 0%| | 0.00/414 [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "fb1ff6f4482143c39be1cca57ec2fc8b" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "config.json: 0%| | 0.00/571 [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "c7b5bb9ef22f4ebe9969d4d10d63d24c" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model.safetensors.index.json: 0%| | 0.00/25.1k [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "32c58f50bb1c44e085ae3663004fcfff" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 2 files: 0%| | 0/2 [00:00, ?it/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "eca24e648bcf4cc684f15da684e2791d" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model-00001-of-00002.safetensors: 0%| | 0.00/9.94G [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "f73f9c7f341c4a99b00585343bf4d4bd" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "model-00002-of-00002.safetensors: 0%| | 0.00/4.54G [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "50a2a1bd13db4045a4ae01138470c42b" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "2b496c218e2049ff9156ff5b3bbdb90b" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "generation_config.json: 0%| | 0.00/116 [00:00, ?B/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "1c68a822580a4960acad93be9fd48ce3" + } + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "['\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining software systems, with a strong understanding of programming languages such as Java or Python. 
The successful candidate will work collaboratively with our team of developers, designers, and project managers to deliver high-quality software solutions that meet the needs of our clients.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"Minimum of 2 years experience in software development\", \"Strong understanding of programming languages such as Java or Python\", \"Experience with software testing and debugging\", \"Excellent problem-solving skills\", \"Ability to work collaboratively in a team environment\"],\\n \"location\": \"New York, NY\",\\n \"company_name\": \"ABC Corporation\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a software engineer to join our team and help develop new features for our products. The ideal candidate will have experience with programming languages such as Java or Python and be familiar with software development methodologies such as Agile. Responsibilities include writing clean, maintainable, and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is a full-time position located in San Francisco.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong proficiency in Java or Python\", \"Experience with software development methodologies such as Agile\", \"Excellent communication and collaboration skills\", \"Ability to work independently and as part of a team\"],\\n \"location\": \"San Francisco\",\\n \"company_name\": \"Acme Inc.\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a highly skilled software engineer to join our team and develop innovative solutions. The ideal candidate will have experience in designing, coding, and testing software systems. Responsibilities include collaborating with cross-functional teams, writing clean and maintainable code, and ensuring timely delivery of projects. Must have a strong understanding of programming languages such as Java, Python, or C++. Knowledge of software development methodologies and tools is a plus. 
This is a full-time position located in San Francisco, CA.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software engineering\", \"Strong proficiency in Java, Python, or C++\", \"Experience with software development methodologies such as Agile or Scrum\", \"Excellent problem-solving and communication skills\", \"Ability to work independently and as part of a team\", \"Strong attention to detail and ability to manage multiple projects simultaneously\"],\\n \"location\": \"San Francisco, CA\",\\n \"company_name\": \"ABC Tech\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a software engineer to join our team and help develop new features for our products. The ideal candidate will have experience with programming languages such as Java or Python and be familiar with software development methodologies such as Agile. Responsibilities include writing clean, maintainable code, collaborating with cross-functional teams, and actively participating in code reviews. This is a full-time position located in our office in San Francisco.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"2+ years of experience in software development\", \"Strong proficiency in Java or Python\", \"Familiarity with software development methodologies such as Agile\", \"Excellent communication and collaboration skills\"],\\n \"location\": \"San Francisco\",\\n \"company_name\": \"Acme Inc.\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a software engineer to join our team in developing and maintaining our software applications. The ideal candidate will have experience in programming languages such as Java, Python, or C++ and be familiar with software development methodologies such as Agile and Scrum. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is a full-time position located in San Francisco, CA.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong proficiency in Java, Python, or C++\", \"Experience with Agile and Scrum methodologies\", \"Excellent communication and collaboration skills\"],\\n \"location\": \"San Francisco, CA\",\\n \"company_name\": \"Acme Inc.\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining complex software systems. 
Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of data structures, algorithms, and software design patterns. This is a full-time position located in San Francisco, CA.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong proficiency in Java or C++\", \"Experience with agile development methodologies\", \"Excellent problem-solving skills\", \"Ability to work independently and as part of a team\", \"Strong communication and interpersonal skills\"],\\n \"location\": \"San Francisco, CA\",\\n \"company_name\": \"Acme Inc.\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a talented software engineer to join our team in developing and maintaining our web applications. The ideal candidate will have a strong background in computer science and experience with various programming languages such as Java, Python, and JavaScript. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is a full-time position located in our office in New York City.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong proficiency in Java, Python, and JavaScript\", \"Experience with web development frameworks such as React and Angular\", \"Good understanding of data structures and algorithms\", \"Excellent problem-solving skills\", \"Ability to work independently and within a team environment\", \"Strong communication and interpersonal skills\"],\\n \"location\": \"New York City\",\\n \"company_name\": \"ABC Company\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a software engineer to join our team in developing and maintaining our web applications. The ideal candidate will have experience in front-end and back-end development, proficiency in JavaScript, HTML, CSS, and SQL. They will also be responsible for ensuring the functionality and performance of our applications. 
This is a full-time position located in New York City.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong knowledge of JavaScript, HTML, CSS, and SQL\", \"Experience with front-end frameworks such as React or Angular\", \"Familiarity with back-end technologies such as Node.js or Python\", \"Ability to work independently and as part of a team\", \"Excellent problem-solving skills\", \"Strong communication and interpersonal skills\"],\\n \"location\": \"New York City\",\\n \"company_name\": \"ABC Company\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a skilled software engineer to join our team in developing and maintaining our software applications. The ideal candidate will have experience in programming languages such as Java, Python, and C++, and will be able to work independently or as part of a team. Responsibilities include writing clean and efficient code, testing and debugging software, and collaborating with other team members. This is a full-time position located in New York City.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"3+ years of experience in software development\", \"Strong proficiency in Java, Python, and C++\", \"Experience with agile development methodologies\", \"Excellent problem-solving and communication skills\"],\\n \"location\": \"New York City\",\\n \"company_name\": \"ABC Corporation\"\\n}\\n```', '\\nGenerate one fake job posting for a {role}.\\n\\nReturn only a single JSON object with:\\n- title\\n- description (5-10 sentences)\\n- requirements (array of 4-6 strings)\\n- location\\n- company_name\\n\\nNo explanations, no extra text.\\nOnly the JSON object.\\n\\n```json\\n{\\n \"title\": \"Software Engineer\",\\n \"description\": \"We are looking for a software engineer to join our team and help develop new features for our products. The ideal candidate will have experience with programming languages such as Java or Python and be familiar with software development methodologies such as Agile. Responsibilities include writing clean, maintainable code, collaborating with cross-functional teams, and actively participating in code reviews. 
This is a full-time position located in our office in San Francisco.\",\\n \"requirements\": [\"Bachelor\\'s degree in Computer Science or related field\", \"2+ years of experience in software development\", \"Strong proficiency in Java or Python\", \"Experience with Agile software development methodologies\", \"Excellent communication and collaboration skills\"],\\n \"location\": \"San Francisco\",\\n \"company_name\": \"Acme Inc.\"\\n}\\n```']\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "!pip install -U bitsandbytes" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "OYEsZ10YSgHv", + "outputId": "67dc4a86-56ef-4fe8-8c3c-c0b5e13ac76a" + }, + "execution_count": 20, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: bitsandbytes in /usr/local/lib/python3.11/dist-packages (0.45.5)\n", + "Requirement already satisfied: torch<3,>=2.0 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.6.0+cu124)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from bitsandbytes) (2.0.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (2025.3.2)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (9.1.0.70)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.5.8)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.2.1.3)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (10.3.5.147)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (11.6.1.9)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.3.1.170)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from 
torch<3,>=2.0->bitsandbytes) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (12.4.127)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch<3,>=2.0->bitsandbytes) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch<3,>=2.0->bitsandbytes) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch<3,>=2.0->bitsandbytes) (3.0.2)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import re\n", + "import json\n", + "import ast\n", + "\n", + "\n", + "\n", + "def extract_json_objects_from_text_block(texts):\n", + " \"\"\"\n", + " Accepts either a single string or a list of strings.\n", + " Extracts all valid JSON objects from messy text blocks.\n", + " \"\"\"\n", + " if isinstance(texts, str):\n", + " texts = [texts] # wrap in list if single string\n", + "\n", + " pattern = r\"\\{[\\s\\S]*?\\}\"\n", + " results = []\n", + "\n", + " for raw_text in texts:\n", + " matches = re.findall(pattern, raw_text)\n", + " for match in matches:\n", + " try:\n", + " obj = json.loads(match)\n", + " results.append(obj)\n", + " except json.JSONDecodeError:\n", + " continue\n", + "\n", + " return results\n", + "\n", + "text = generate_jobs(role=\"Software Engineer\", n=10)\n", + "print(extract_json_objects_from_text_block(text))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 86, + "referenced_widgets": [ + "b5dd409cf6e04764adbb7c2a49b7be86", + "65187b4ebb2041b39778268e8b4d6b0d", + "33317cac10ca4a98bf4433c1eff43435", + "f81f5402902c4c04b10895782287e908", + "c471914fe0d34ae8967bac2820637d5b", + "7aead6f6cffa40a383f1b8c64943329e", + "f24fe57d8e164fd68185b4c117e7c097", + "f913ca9ab6d44ab1b788a36bd964ed39", + "ed34016801264a05bb3697eca2ac22ef", + "fe622254072540fda3b0dd6b2cab6e4a", + "5d95bdea47594e21855a6e564d0760da" + ] + }, + "id": "1uzTM2G1oqDs", + "outputId": "08e88ab0-ca17-46d3-8f9c-6a595863aeba" + }, + "execution_count": 22, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]" + ], + "application/vnd.jupyter.widget-view+json": { + "version_major": 2, + "version_minor": 0, + "model_id": "b5dd409cf6e04764adbb7c2a49b7be86" + } + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[{'title': 'Software Engineer', 'description': 'We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining complex software systems. Responsibilities include writing clean, efficient, and testable code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of data structures, algorithms, and software design patterns. 
This is a full-time position located in San Francisco, CA.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '3+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Excellent problem-solving and analytical skills', 'Ability to work independently and as part of a team', 'Strong written and verbal communication skills'], 'location': 'San Francisco, CA', 'company_name': 'Acme Inc.'}, {'title': 'Software Engineer', 'description': 'We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining complex software systems. Responsibilities include writing clean, efficient, and testable code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of software design patterns and best practices. This is a full-time position located in San Francisco.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '5+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Excellent problem-solving skills', 'Ability to work independently and as part of a team', 'Strong communication and interpersonal skills'], 'location': 'San Francisco', 'company_name': 'Acme Inc.'}, {'title': 'Software Engineer', 'description': 'We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining complex software systems. Responsibilities include writing clean, efficient, and testable code, collaborating with cross-functional teams, and actively participating in code reviews. This is a full-time position located in San Francisco.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '5+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Excellent problem-solving and communication skills'], 'location': 'San Francisco', 'company_name': 'Acme Inc.'}, {'title': 'Software Engineer', 'description': 'We are looking for a talented software engineer to join our team. The ideal candidate will have experience in developing and maintaining software systems, with a strong background in computer science or a related field. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. This is an excellent opportunity for a self-starter who is passionate about technology and eager to learn and grow.', 'requirements': [\"Bachelor's degree in Computer Science or a related field\", '2+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Excellent problem-solving skills', 'Ability to work independently and as part of a team'], 'location': 'New York, NY', 'company_name': 'ABC Corporation'}, {'title': 'Software Engineer', 'description': 'We are looking for a highly skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining complex software systems. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and actively participating in code reviews. Must have a strong understanding of data structures, algorithms, and software design patterns. 
This is a full-time position located in San Francisco, CA.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '3+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Excellent problem-solving skills', 'Ability to work independently and as part of a team'], 'location': 'San Francisco, CA', 'company_name': 'Acme Inc.'}, {'title': 'Software Engineer', 'description': 'We are seeking a highly skilled software engineer to join our team and develop innovative software solutions. The ideal candidate will have a strong background in computer science and experience with various programming languages. Responsibilities include designing, coding, testing and maintaining software systems, as well as collaborating with cross-functional teams. Must have excellent problem-solving skills and the ability to work independently in a fast-paced environment. This is a full-time position located in San Francisco.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", 'Minimum of 3 years experience in software development', 'Strong proficiency in Java, Python, or C++', 'Experience with agile development methodologies', 'Excellent communication and collaboration skills'], 'location': 'San Francisco', 'company_name': 'Acme Inc.'}, {'title': 'Software Engineer', 'description': 'We are looking for a skilled software engineer to join our team. The ideal candidate will have experience in developing and maintaining software systems, with a strong understanding of programming languages such as Java or Python. Responsibilities include writing clean and efficient code, collaborating with cross-functional teams, and ensuring the timely delivery of high-quality software products. This is an excellent opportunity for a self-starter who is passionate about technology and eager to grow in their career.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", 'Minimum of 2 years experience in software development', 'Strong proficiency in Java or Python', 'Experience with agile development methodologies', 'Excellent problem-solving and analytical skills', 'Ability to work independently and as part of a team'], 'location': 'New York', 'company_name': 'ABC Corporation'}, {'title': 'Software Engineer', 'description': 'We are looking for a skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, developing and testing software systems, and be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, efficient and testable code, as well as actively participating in code reviews and continuous integration processes. 
This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '2+ years of experience in software development', 'Strong proficiency in Java or C++', 'Experience with agile development methodologies', 'Good understanding of data structures and algorithms', 'Excellent problem-solving skills', 'Ability to work independently and as part of a team', 'Strong written and verbal communication skills'], 'location': 'New York', 'company_name': 'ABC Technologies'}, {'title': 'Software Engineer', 'description': 'We are seeking a highly skilled software engineer to join our team in developing and maintaining cutting-edge software systems. The ideal candidate will have a strong background in computer science and be proficient in multiple programming languages. Responsibilities include designing, coding, testing, and debugging software applications, as well as collaborating with cross-functional teams. This is an exciting opportunity for a creative and innovative problem solver to make a significant impact in the tech industry.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", 'Minimum of 2 years experience in software development', 'Strong proficiency in Java, Python, or C++', 'Experience with agile development methodologies', 'Excellent analytical and problem-solving skills'], 'location': 'New York, NY', 'company_name': 'ABC Tech Inc.'}, {'title': 'Software Engineer', 'description': 'We are looking for a talented software engineer to join our team. Responsibilities include developing and maintaining software systems, collaborating with cross-functional teams, and ensuring timely delivery of high-quality products. The ideal candidate will have a strong background in computer science and experience with programming languages such as Java, Python, or C++. Must be able to work independently and within a team environment. Strong problem-solving and communication skills required. 
This is a full-time position located in New York.', 'requirements': [\"Bachelor's degree in Computer Science or related field\", '3+ years of experience in software development', 'Strong proficiency in Java, Python, or C++', 'Experience with agile development methodologies', 'Excellent problem-solving and communication skills', 'Ability to work independently and within a team environment'], 'location': 'New York', 'company_name': 'ABC Company'}]\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "!pip install gradio" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "lsLBgN1TMj0i", + "outputId": "a1e1a45d-62a1-4704-8b86-fb155e68e684" + }, + "execution_count": 23, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting gradio\n", + " Downloading gradio-5.29.0-py3-none-any.whl.metadata (16 kB)\n", + "Collecting aiofiles<25.0,>=22.0 (from gradio)\n", + " Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)\n", + "Requirement already satisfied: anyio<5.0,>=3.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (4.9.0)\n", + "Collecting fastapi<1.0,>=0.115.2 (from gradio)\n", + " Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)\n", + "Collecting ffmpy (from gradio)\n", + " Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)\n", + "Collecting gradio-client==1.10.0 (from gradio)\n", + " Downloading gradio_client-1.10.0-py3-none-any.whl.metadata (7.1 kB)\n", + "Collecting groovy~=0.1 (from gradio)\n", + " Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)\n", + "Requirement already satisfied: httpx>=0.24.1 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.27.2)\n", + "Requirement already satisfied: huggingface-hub>=0.28.1 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.30.2)\n", + "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.1.6)\n", + "Requirement already satisfied: markupsafe<4.0,>=2.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.0.2)\n", + "Requirement already satisfied: numpy<3.0,>=1.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.0.2)\n", + "Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (3.10.17)\n", + "Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from gradio) (24.2)\n", + "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.2.2)\n", + "Requirement already satisfied: pillow<12.0,>=8.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (11.2.1)\n", + "Requirement already satisfied: pydantic<2.12,>=2.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (2.11.3)\n", + "Collecting pydub (from gradio)\n", + " Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n", + "Collecting python-multipart>=0.0.18 (from gradio)\n", + " Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)\n", + "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (6.0.2)\n", + "Collecting ruff>=0.9.3 (from gradio)\n", + " Downloading ruff-0.11.8-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)\n", + "Collecting safehttpx<0.2.0,>=0.1.6 (from gradio)\n", + " Downloading safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)\n", + "Collecting semantic-version~=2.0 (from gradio)\n", + " Downloading 
semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)\n", + "Collecting starlette<1.0,>=0.40.0 (from gradio)\n", + " Downloading starlette-0.46.2-py3-none-any.whl.metadata (6.2 kB)\n", + "Collecting tomlkit<0.14.0,>=0.12.0 (from gradio)\n", + " Downloading tomlkit-0.13.2-py3-none-any.whl.metadata (2.7 kB)\n", + "Requirement already satisfied: typer<1.0,>=0.12 in /usr/local/lib/python3.11/dist-packages (from gradio) (0.15.3)\n", + "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.11/dist-packages (from gradio) (4.13.2)\n", + "Collecting uvicorn>=0.14.0 (from gradio)\n", + " Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from gradio-client==1.10.0->gradio) (2025.3.2)\n", + "Requirement already satisfied: websockets<16.0,>=10.0 in /usr/local/lib/python3.11/dist-packages (from gradio-client==1.10.0->gradio) (15.0.1)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.11/dist-packages (from anyio<5.0,>=3.0->gradio) (3.10)\n", + "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.11/dist-packages (from anyio<5.0,>=3.0->gradio) (1.3.1)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from httpx>=0.24.1->gradio) (2025.4.26)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx>=0.24.1->gradio) (1.0.9)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx>=0.24.1->gradio) (0.16.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (3.18.0)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (2.32.3)\n", + "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub>=0.28.1->gradio) (4.67.1)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0,>=1.0->gradio) (2025.2)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.1 in /usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (2.33.1)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<2.12,>=2.0->gradio) (0.4.0)\n", + "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (8.1.8)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (1.5.4)\n", + "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0,>=0.12->gradio) (13.9.4)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->pandas<3.0,>=1.0->gradio) (1.17.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in 
/usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (3.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (2.19.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface-hub>=0.28.1->gradio) (3.4.1)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface-hub>=0.28.1->gradio) (2.4.0)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio) (0.1.2)\n", + "Downloading gradio-5.29.0-py3-none-any.whl (54.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.1/54.1 MB\u001b[0m \u001b[31m46.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading gradio_client-1.10.0-py3-none-any.whl (322 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m322.9/322.9 kB\u001b[0m \u001b[31m34.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading aiofiles-24.1.0-py3-none-any.whl (15 kB)\n", + "Downloading fastapi-0.115.12-py3-none-any.whl (95 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m95.2/95.2 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading groovy-0.1.2-py3-none-any.whl (14 kB)\n", + "Downloading python_multipart-0.0.20-py3-none-any.whl (24 kB)\n", + "Downloading ruff-0.11.8-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.5/11.5 MB\u001b[0m \u001b[31m131.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading safehttpx-0.1.6-py3-none-any.whl (8.7 kB)\n", + "Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)\n", + "Downloading starlette-0.46.2-py3-none-any.whl (72 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.0/72.0 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading tomlkit-0.13.2-py3-none-any.whl (37 kB)\n", + "Downloading uvicorn-0.34.2-py3-none-any.whl (62 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.5/62.5 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading ffmpy-0.5.0-py3-none-any.whl (6.0 kB)\n", + "Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", + "Installing collected packages: pydub, uvicorn, tomlkit, semantic-version, ruff, python-multipart, groovy, ffmpy, aiofiles, starlette, safehttpx, gradio-client, fastapi, gradio\n", + "Successfully installed aiofiles-24.1.0 fastapi-0.115.12 ffmpy-0.5.0 gradio-5.29.0 gradio-client-1.10.0 groovy-0.1.2 pydub-0.25.1 python-multipart-0.0.20 ruff-0.11.8 safehttpx-0.1.6 semantic-version-2.10.0 starlette-0.46.2 tomlkit-0.13.2 uvicorn-0.34.2\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import gradio as gr\n", + "import json\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM\n", + "import torch\n", + "import re\n", + "\n", + "def generate_ui(role, n):\n", + " try:\n", + " raw_jobs = generate_jobs(role, n)\n", + " parsed_jobs = extract_json_objects_from_text_block(raw_jobs)\n", + "\n", + 
" if not isinstance(parsed_jobs, list) or not all(isinstance(item, dict) for item in parsed_jobs):\n", + " print(\"[ERROR] Parsed result is not a list of dicts\")\n", + " return gr.update(value=[], visible=True), None\n", + "\n", + " filename = f\"{role.replace(' ', '_').lower()}_jobs.json\"\n", + " with open(filename, \"w\") as f:\n", + " json.dump(parsed_jobs, f, indent=2)\n", + "\n", + " print(f\"[INFO] Returning {len(parsed_jobs)} jobs -> {filename}\")\n", + " return parsed_jobs, filename\n", + "\n", + " except Exception as e:\n", + " print(f\"[FATAL ERROR] {e}\")\n", + " return gr.update(value=[], visible=True), None\n", + "\n", + "if __name__ == \"__main__\":\n", + " with gr.Blocks() as demo:\n", + " gr.Markdown(\"# 🧠 Synthetic Job Dataset Generator\")\n", + " gr.Markdown(\"Generate a structured dataset of job postings for a specific role.\")\n", + "\n", + " with gr.Row():\n", + " role_input = gr.Textbox(label=\"Job Role\", placeholder=\"e.g. Software Engineer\", value=\"Software Engineer\")\n", + " n_input = gr.Number(label=\"Number of Samples\", value=5, precision=0)\n", + "\n", + " generate_button = gr.Button(\"🚀 Generate\")\n", + " output_table = gr.JSON(label=\"Generated Dataset\")\n", + " download_button = gr.File(label=\"Download JSON\")\n", + "\n", + " generate_button.click(\n", + " generate_ui,\n", + " inputs=[role_input, n_input],\n", + " outputs=[output_table, download_button]\n", + " )\n", + "\n", + " demo.launch(debug=True)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 730, + "referenced_widgets": [ + "9f9defc39ac5437e9512e5fad810b409", + "1c126dfdc51c438b9b48c8a65e549ae2", + "741d800130ea4830b9266f467fa6a0bf", + "73c0a01f1693471c9c017143e9e9058b", + "ab8174c1337b43048e05aeca72ca18ef", + "5e5a992d86434e62a25fc9b7f75f4b16", + "1507b1310f5045c9b691fdb102cc1686", + "8a8e81f9d3a54ce49b367f8e984b4a06", + "bab02b1f092b40c8983cd6440f7eaf16", + "94f30dc2653a4f178c9c2ef454d24644", + "a508625ef12d4a639fa9773484507709" + ] + }, + "id": "FEByigZTo5cv", + "outputId": "e452754b-e155-4b57-eced-7af37996f1f0" + }, + "execution_count": 25, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n", + "\n", + "Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().\n", + "* Running on public URL: https://bf27145eb99f8caadd.gradio.live\n", + "\n", + "This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "