Wiping it out, leaving only README

This commit is contained in:
Dmitry Kisselev
2025-10-29 16:42:37 -07:00
parent d28039e255
commit 3ab5c95deb
81 changed files with 59 additions and 21235 deletions

View File

@@ -1,77 +0,0 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv

# Environment variables
.env

# Configuration
config.yaml

# Database
*.db
*.db-journal
*.sqlite
*.sqlite3

# ChromaDB
cat_vectorstore/
metadata_vectorstore/
*.chroma

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Testing
.coverage
htmlcov/
.pytest_cache/
.tox/

# Logs
*.log
logs/

# Modal
.modal-cache/

# Data files
data/*.db
data/*.json
!data/.gitkeep

# Model cache (sentence-transformers, huggingface, etc.)
.cache/

# Jupyter
.ipynb_checkpoints/

View File

@@ -8,230 +8,99 @@ Find your perfect feline companion using AI, semantic search, and multi-platform
---
## 🌟 Features
## 🌟 Overview
**Multi-Platform Search** - Aggregates from Petfinder and RescueGroups
**Natural Language** - Describe your ideal cat in plain English
**Semantic Matching** - AI understands personality, not just keywords
**Color/Breed Matching** - 3-tier system handles typos ("tuxado" → "tuxedo", "main coon" → "Maine Coon")
**Deduplication** - Multi-modal (name + description + image) duplicate detection
**Hybrid Search** - Combines vector similarity with structured filters
**Image Recognition** - Uses CLIP to match cats visually
**Email Notifications** - Get alerts for new matches
**Serverless Backend** - Optionally deploy to Modal for cloud-based search and alerts
Tuxedo Link is an intelligent cat adoption platform that combines:
**Technical Stack**: OpenAI GPT-4 • ChromaDB • CLIP • Gradio • Modal
- **Natural Language Understanding** - Describe your ideal cat in plain English
- **Semantic Search with RAG** - ChromaDB + SentenceTransformers for personality-based matching
- **Multi-Modal Deduplication** - Uses CLIP for image similarity + text analysis
- **Hybrid Scoring** - 60% vector similarity + 40% attribute matching
- **Multi-Platform Aggregation** - Searches Petfinder and RescueGroups APIs
- **Serverless Architecture** - Optional Modal deployment with scheduled email alerts
## 🏗️ Architecture Modes
Tuxedo Link supports two deployment modes:
### Local Mode (Development)
- All components run locally
- Uses local database and vector store
- Fast iteration and development
- No Modal required
### Production Mode (Cloud)
- UI runs locally, backend runs on Modal
- Database and vector store on Modal volumes
- Scheduled email alerts active
- Scalable and serverless
Switch between modes in `config.yaml` by setting `deployment.mode` to `local` or `production`.
**Tech Stack**: OpenAI GPT-4 • ChromaDB • CLIP • Gradio • Modal
---
## 🚀 Quick Start
## 📸 Application Screenshots
### Prerequisites
- Python 3.11+
- `uv` package manager
- API keys (OpenAI, Petfinder, Mailgun)
### Installation
### 🔍 Search Interface
Natural language search with semantic matching and personality-based results:
1. **Navigate to project directory**
```bash
cd week8/community_contributions/dkisselev-zz/tuxedo_link
```
![Search Interface](assets/1.%20search.png)
2. **Set up virtual environment**
```bash
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
```
### 🔔 Email Alerts
Save your search and get notified when new matching cats are available:
3. **Configure environment variables**
```bash
# Copy template and add your API keys
cp env.example .env
# Edit .env with your keys
```
![Alerts Management](assets/2.%20Alerts.png)
4. **Configure application settings**
```bash
# Copy configuration template
cp config.example.yaml config.yaml
# Edit config.yaml for email provider and deployment mode
```
### 📖 About Page
Learn about the technology and inspiration behind Tuxedo Link:
5. **Initialize databases**
```bash
python setup_vectordb.py
```
![About Page](assets/3.%20About.png)
6. **Run the application**
```bash
./run.sh
```
### 📧 Email Notifications
Receive beautiful email alerts with your perfect matches:
Visit http://localhost:7860 in your browser!
![Email Notification](assets/4.%20Email.png)
---
## 🔑 API Setup
## 🚀 Full Project & Source Code
### Required API Keys
The complete source code, documentation, and setup instructions are available at:
Add these to your `.env` file:
### **[👉 GitHub Repository: dkisselev-zz/tuxedo-link](https://github.com/dkisselev-zz/tuxedo-link)**
```bash
# OpenAI (for profile extraction)
# Get key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=sk-...
The repository includes:
# Petfinder (for cat listings)
# Get key from: https://www.petfinder.com/developers/
PETFINDER_API_KEY=your_key
PETFINDER_SECRET=your_secret
# Mailgun (for email alerts)
# Get key from: https://app.mailgun.com/
MAILGUN_API_KEY=your_mailgun_key
```
### Optional API Keys
```bash
# RescueGroups (additional cat listings)
# Get key from: https://userguide.rescuegroups.org/
RESCUEGROUPS_API_KEY=your_key
# SendGrid (alternative email provider)
SENDGRID_API_KEY=SG...
# Modal (for cloud deployment)
MODAL_TOKEN_ID=...
MODAL_TOKEN_SECRET=...
```
### Application Configuration
Edit `config.yaml` to configure:
```yaml
# Email provider (mailgun or sendgrid)
email:
  provider: mailgun
  from_name: "Tuxedo Link"
  from_email: "noreply@yourdomain.com"

# Mailgun domain
mailgun:
  domain: "your-domain.mailgun.org"

# Deployment mode (local or production)
deployment:
  mode: local  # Use 'local' for development
```
**Note**: API keys go in `.env` (git-ignored), application settings go in `config.yaml` (also git-ignored).
- ✅ Complete source code with 92 passing tests
- ✅ Comprehensive technical documentation (3,400+ lines)
- ✅ Agentic architecture with 7 specialized agents
- ✅ Dual vector store implementation (main + metadata)
- ✅ Modal deployment guide for production
- ✅ Setup scripts and configuration examples
- ✅ LLM techniques documentation (structured output, RAG, hybrid search)
---
## 💻 Usage
## 🧠 Key LLM/RAG Techniques
### Search Tab
1. Describe your ideal cat in natural language
2. Click "Search" or press Enter
3. Browse results with match scores
4. Click "View Details" to see adoption page
### 1. Structured Output with GPT-4 Function Calling
Extracts search preferences from natural language into Pydantic models
**Example queries:**
- "I want a friendly family cat in NYC good with children"
- "Looking for a playful young kitten"
- "Show me calm adult cats that like to cuddle"
- "Find me a tuxedo maine coon in Boston" (natural color/breed terms work!)
- "Orange tabby that's good with other cats"
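The extraction step can be sketched as follows — a stdlib-only stand-in (field names here are illustrative; the repository's actual Pydantic `CatProfile` model differs) showing how the JSON arguments returned by a GPT-4 function call are validated into a typed profile:

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified stand-in for the repository's CatProfile model.
@dataclass
class CatProfile:
    colors: List[str] = field(default_factory=list)
    breeds: List[str] = field(default_factory=list)
    location: Optional[str] = None
    good_with_children: Optional[bool] = None

def parse_function_call(arguments_json: str) -> CatProfile:
    """Validate the JSON arguments GPT-4 returns for the extraction function."""
    data = json.loads(arguments_json)
    return CatProfile(
        colors=list(data.get("colors", [])),
        breeds=list(data.get("breeds", [])),
        location=data.get("location"),
        good_with_children=data.get("good_with_children"),
    )

# "Find me a tuxedo maine coon in Boston" might yield arguments like:
profile = parse_function_call(
    '{"colors": ["tuxedo"], "breeds": ["Maine Coon"], "location": "Boston"}'
)
```

Fields the user never mentioned stay `None`, so downstream filters can treat them as "no constraint".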
### 2. Dual Vector Store Architecture
- **Main ChromaDB** - Cat profile semantic embeddings
- **Metadata DB** - Fuzzy color/breed matching with typo tolerance
#### Alerts Tab
1. Perform a search in the Search tab first
2. Go to Alerts tab
3. Enter your email address
4. Choose notification frequency (Immediately, Daily, Weekly)
5. Click "Save Alert"
### 3. Hybrid Search Strategy
Combines vector similarity (60%) with structured metadata filtering (40%)
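The blend described above reduces to a weighted sum; a minimal sketch, using the 60/40 defaults mentioned in this README:

```python
def hybrid_score(vector_similarity: float, attribute_score: float,
                 semantic_weight: float = 0.6, attribute_weight: float = 0.4) -> float:
    """Blend semantic (embedding) similarity with structured attribute matching."""
    return semantic_weight * vector_similarity + attribute_weight * attribute_score

# A cat whose embedding is close (0.9) but only half-matches the filters (0.5):
score = hybrid_score(0.9, 0.5)  # 0.6*0.9 + 0.4*0.5 = 0.74
```

This lets a cat with a strong personality match outrank one that merely ticks every checkbox, while hard attribute mismatches still drag the score down.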
You'll receive email notifications when new matches are found!
### 4. 3-Tier Semantic Normalization
Dictionary → Vector DB → Fuzzy fallback for robust term mapping
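The three tiers can be sketched with a plain dictionary, an injected vector-store lookup, and `difflib` for the fuzzy fallback (the terms, aliases, and `vector_lookup` hook here are illustrative stand-ins; the real implementation uses the metadata ChromaDB):

```python
import difflib
from typing import Callable, Optional

KNOWN_COLORS = {"tuxedo", "tabby", "calico", "orange"}
ALIASES = {"tux": "tuxedo", "ginger": "orange"}  # tier 1: exact dictionary

def normalize_color(term: str,
                    vector_lookup: Optional[Callable[[str], Optional[str]]] = None
                    ) -> Optional[str]:
    term = term.lower().strip()
    # Tier 1: exact dictionary / alias match
    if term in KNOWN_COLORS:
        return term
    if term in ALIASES:
        return ALIASES[term]
    # Tier 2: semantic lookup in the metadata vector store (injected here)
    if vector_lookup is not None:
        hit = vector_lookup(term)
        if hit:
            return hit
    # Tier 3: fuzzy fallback catches typos like "tuxado"
    close = difflib.get_close_matches(term, KNOWN_COLORS, n=1, cutoff=0.8)
    return close[0] if close else None

print(normalize_color("tuxado"))  # fuzzy fallback maps it to "tuxedo"
```

Each tier only runs when the cheaper one before it fails, so exact matches never pay the vector or edit-distance cost.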
#### About Tab
Learn about Kyra and the technology behind the app
### 5. Multi-Modal Deduplication
Fingerprint + text (Levenshtein) + image (CLIP) similarity scoring
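A minimal sketch of the composite scoring, using the 0.4/0.3/0.3 weights and 0.85 threshold that appear in the repository's `DeduplicationAgent` (the similarity inputs below are made-up example values):

```python
def composite_score(name_sim: float, desc_sim: float, image_sim: float,
                    w_name: float = 0.4, w_desc: float = 0.3, w_image: float = 0.3) -> float:
    """Weighted blend of name (Levenshtein), description, and CLIP image similarity."""
    return w_name * name_sim + w_desc * desc_sim + w_image * image_sim

# Same cat listed on two platforms: near-identical name and photo, reworded bio.
score = composite_score(0.95, 0.70, 0.98)  # ≈ 0.88
is_duplicate = score >= 0.85  # composite threshold
```

Because no single signal decides alone, a shelter that renames a cat or crops its photo can still be caught by the other two channels.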
### Development Mode
---
For faster development and testing, use local mode in `config.yaml`:
## 🏆 Project Highlights
```yaml
deployment:
  mode: local  # Uses local database and cached data
```
- **92 Tests** - 81 unit + 11 integration tests (100% passing)
- **Production Ready** - Serverless Modal deployment with volumes
- **Email Alerts** - Scheduled background jobs for new match notifications
- **95%+ Accuracy** - Multi-modal deduplication across platforms
- **85-90% Match Quality** - Hybrid scoring algorithm
---
## 📚 Documentation
### Complete Technical Reference
For detailed documentation on the architecture, agents, and every function in the codebase, see:
**[📖 TECHNICAL_REFERENCE.md](docs/TECHNICAL_REFERENCE.md)** - Complete technical documentation including:
- Configuration system
- Agentic architecture
- Data flow pipeline
- Deduplication strategy
- Email provider system
- Alert management
- All functions with examples
- User journey walkthroughs
**[📊 ARCHITECTURE_DIAGRAM.md](docs/architecture_diagrams/ARCHITECTURE_DIAGRAM.md)** - Visual diagrams:
- System architecture
- Agent interaction
- Data flow
- Database schema
**[🚀 MODAL_DEPLOYMENT.md](docs/MODAL_DEPLOYMENT.md)** - Cloud deployment guide:
- Production mode architecture
- Automated deployment with `deploy.sh`
- Modal API and scheduled jobs
- UI-to-Modal communication
- Monitoring and troubleshooting
**[🧪 tests/README.md](tests/README.md)** - Testing guide:
- Running unit tests
- Running integration tests
- Manual test scripts
- Coverage reports
---
## 🤝 Contributing
This project was built as part of the Andela LLM Engineering bootcamp. Contributions and improvements are welcome!
---
## 📄 License
See [LICENSE](LICENSE) file for details.
- **TECHNICAL_REFERENCE.md** - Complete API documentation
- **MODAL_DEPLOYMENT.md** - Cloud deployment guide
- **ARCHITECTURE_DIAGRAM.md** - System architecture visuals
- **tests/README.md** - Testing guide and coverage
---
@@ -241,6 +110,6 @@ See [LICENSE](LICENSE) file for details.
*May every cat find their perfect home* 🐾
[Technical Reference](docs/TECHNICAL_REFERENCE.md) • [Architecture](docs/architecture_diagrams/ARCHITECTURE_DIAGRAM.md) • [Deployment](docs/MODAL_DEPLOYMENT.md) • [Tests](tests/README.md)
**[View Full Project on GitHub →](https://github.com/dkisselev-zz/tuxedo-link)**
</div>

View File

@@ -1,22 +0,0 @@
"""Agent implementations for Tuxedo Link."""

from .agent import Agent
from .petfinder_agent import PetfinderAgent
from .rescuegroups_agent import RescueGroupsAgent
from .profile_agent import ProfileAgent
from .matching_agent import MatchingAgent
from .deduplication_agent import DeduplicationAgent
from .planning_agent import PlanningAgent
from .email_agent import EmailAgent

__all__ = [
    "Agent",
    "PetfinderAgent",
    "RescueGroupsAgent",
    "ProfileAgent",
    "MatchingAgent",
    "DeduplicationAgent",
    "PlanningAgent",
    "EmailAgent",
]

View File

@@ -1,86 +0,0 @@
"""Base Agent class for Tuxedo Link agents."""

import logging
import time
from functools import wraps
from typing import Any, Callable


class Agent:
    """
    An abstract superclass for Agents.
    Used to log messages in a way that can identify each Agent.
    """

    # Foreground colors
    RED = '\033[31m'
    GREEN = '\033[32m'
    YELLOW = '\033[33m'
    BLUE = '\033[34m'
    MAGENTA = '\033[35m'
    CYAN = '\033[36m'
    WHITE = '\033[37m'

    # Background color
    BG_BLACK = '\033[40m'

    # Reset code to return to default color
    RESET = '\033[0m'

    name: str = ""
    color: str = '\033[37m'

    def log(self, message: str) -> None:
        """
        Log this as an info message, identifying the agent.

        Args:
            message: Message to log
        """
        color_code = self.BG_BLACK + self.color
        message = f"[{self.name}] {message}"
        logging.info(color_code + message + self.RESET)

    def log_error(self, message: str) -> None:
        """
        Log an error message.

        Args:
            message: Error message to log
        """
        color_code = self.BG_BLACK + self.RED
        message = f"[{self.name}] ERROR: {message}"
        logging.error(color_code + message + self.RESET)

    def log_warning(self, message: str) -> None:
        """
        Log a warning message.

        Args:
            message: Warning message to log
        """
        color_code = self.BG_BLACK + self.YELLOW
        message = f"[{self.name}] WARNING: {message}"
        logging.warning(color_code + message + self.RESET)


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """
    Decorator to log execution time of agent methods.

    Args:
        func: Function to time

    Returns:
        Wrapped function
    """
    @wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        """Wrapper function that times and logs method execution."""
        start_time = time.time()
        result = func(self, *args, **kwargs)
        elapsed = time.time() - start_time
        self.log(f"{func.__name__} completed in {elapsed:.2f} seconds")
        return result
    return wrapper

View File

@@ -1,229 +0,0 @@
"""Deduplication agent for identifying and managing duplicate cat listings."""

import os
from typing import List, Tuple, Optional

import numpy as np
from dotenv import load_dotenv

from models.cats import Cat
from database.manager import DatabaseManager
from utils.deduplication import (
    create_fingerprint,
    calculate_text_similarity,
    calculate_composite_score
)
from utils.image_utils import generate_image_embedding, calculate_image_similarity
from .agent import Agent, timed


class DeduplicationAgent(Agent):
    """Agent for deduplicating cat listings across multiple sources."""

    name = "Deduplication Agent"
    color = Agent.YELLOW

    def __init__(self, db_manager: DatabaseManager):
        """
        Initialize the deduplication agent.

        Args:
            db_manager: Database manager instance
        """
        load_dotenv()
        self.db_manager = db_manager

        # Load thresholds from environment
        self.name_threshold = float(os.getenv('DEDUP_NAME_SIMILARITY_THRESHOLD', '0.8'))
        self.desc_threshold = float(os.getenv('DEDUP_DESCRIPTION_SIMILARITY_THRESHOLD', '0.7'))
        self.image_threshold = float(os.getenv('DEDUP_IMAGE_SIMILARITY_THRESHOLD', '0.9'))
        self.composite_threshold = float(os.getenv('DEDUP_COMPOSITE_THRESHOLD', '0.85'))

        self.log("Deduplication Agent initialized")
        self.log(f"Thresholds - Name: {self.name_threshold}, Desc: {self.desc_threshold}, "
                 f"Image: {self.image_threshold}, Composite: {self.composite_threshold}")

    def _get_image_embedding(self, cat: Cat) -> Optional[np.ndarray]:
        """
        Get or generate image embedding for a cat.

        Args:
            cat: Cat object

        Returns:
            Image embedding or None if unavailable
        """
        if not cat.primary_photo:
            return None
        try:
            embedding = generate_image_embedding(cat.primary_photo)
            return embedding
        except Exception as e:
            self.log_warning(f"Failed to generate image embedding for {cat.name}: {e}")
            return None

    def _compare_cats(self, cat1: Cat, cat2: Cat,
                      emb1: Optional[np.ndarray],
                      emb2: Optional[np.ndarray]) -> Tuple[float, dict]:
        """
        Compare two cats and return composite similarity score with details.

        Args:
            cat1: First cat
            cat2: Second cat
            emb1: Image embedding for cat1
            emb2: Image embedding for cat2

        Returns:
            Tuple of (composite_score, details_dict)
        """
        # Text similarity
        name_sim, desc_sim = calculate_text_similarity(cat1, cat2)

        # Image similarity
        image_sim = 0.0
        if emb1 is not None and emb2 is not None:
            image_sim = calculate_image_similarity(emb1, emb2)

        # Composite score
        composite = calculate_composite_score(
            name_similarity=name_sim,
            description_similarity=desc_sim,
            image_similarity=image_sim,
            name_weight=0.4,
            description_weight=0.3,
            image_weight=0.3
        )

        details = {
            'name_similarity': name_sim,
            'description_similarity': desc_sim,
            'image_similarity': image_sim,
            'composite_score': composite
        }
        return composite, details

    @timed
    def process_cat(self, cat: Cat) -> Tuple[Cat, bool]:
        """
        Process a single cat for deduplication.

        Checks if the cat is a duplicate of an existing cat in the database.
        If it's a duplicate, marks it as such and returns the canonical cat.
        If it's unique, caches it in the database.

        Args:
            cat: Cat to process

        Returns:
            Tuple of (canonical_cat, is_duplicate)
        """
        # Generate fingerprint
        cat.fingerprint = create_fingerprint(cat)

        # Check database for cats with same fingerprint
        candidates = self.db_manager.get_cats_by_fingerprint(cat.fingerprint)

        if not candidates:
            # No candidates, this is unique
            # Generate and cache image embedding
            embedding = self._get_image_embedding(cat)
            self.db_manager.cache_cat(cat, embedding)
            return cat, False

        self.log(f"Found {len(candidates)} potential duplicates for {cat.name}")

        # Get embedding for new cat
        new_embedding = self._get_image_embedding(cat)

        # Compare with each candidate
        best_match = None
        best_score = 0.0
        best_details = None

        for candidate_cat, candidate_embedding in candidates:
            score, details = self._compare_cats(cat, candidate_cat, new_embedding, candidate_embedding)
            self.log(f"Comparing with {candidate_cat.name} (ID: {candidate_cat.id}): "
                     f"name={details['name_similarity']:.2f}, "
                     f"desc={details['description_similarity']:.2f}, "
                     f"image={details['image_similarity']:.2f}, "
                     f"composite={score:.2f}")
            if score > best_score:
                best_score = score
                best_match = candidate_cat
                best_details = details

        # Check if best match exceeds threshold
        if best_match and best_score >= self.composite_threshold:
            self.log(f"DUPLICATE DETECTED: {cat.name} is duplicate of {best_match.name} "
                     f"(score: {best_score:.2f})")
            # Mark as duplicate in database
            self.db_manager.mark_as_duplicate(cat.id, best_match.id)
            return best_match, True

        # Not a duplicate, cache it
        self.log(f"UNIQUE: {cat.name} is not a duplicate (best score: {best_score:.2f})")
        self.db_manager.cache_cat(cat, new_embedding)
        return cat, False

    @timed
    def deduplicate_batch(self, cats: List[Cat]) -> List[Cat]:
        """
        Process a batch of cats for deduplication.

        Args:
            cats: List of cats to process

        Returns:
            List of unique cats (duplicates removed)
        """
        self.log(f"Deduplicating batch of {len(cats)} cats")

        unique_cats = []
        duplicate_count = 0

        for cat in cats:
            try:
                canonical_cat, is_duplicate = self.process_cat(cat)
                if not is_duplicate:
                    unique_cats.append(canonical_cat)
                else:
                    duplicate_count += 1
                    # Optionally include canonical if not already in list
                    if canonical_cat not in unique_cats:
                        unique_cats.append(canonical_cat)
            except Exception as e:
                self.log_error(f"Error processing cat {cat.name}: {e}")
                # Include it anyway to avoid losing data
                unique_cats.append(cat)

        self.log(f"Deduplication complete: {len(unique_cats)} unique, {duplicate_count} duplicates")
        return unique_cats

    def get_duplicate_report(self) -> dict:
        """
        Generate a report of duplicate statistics.

        Returns:
            Dictionary with duplicate statistics
        """
        stats = self.db_manager.get_cache_stats()
        total = stats['total_unique'] + stats['total_duplicates']
        return {
            'total_unique': stats['total_unique'],
            'total_duplicates': stats['total_duplicates'],
            'deduplication_rate': stats['total_duplicates'] / total if total > 0 else 0,
            'by_source': stats['by_source']
        }

View File

@@ -1,386 +0,0 @@
"""Email agent for sending match notifications."""

from typing import List, Optional
from datetime import datetime

from agents.agent import Agent
from agents.email_providers import get_email_provider, EmailProvider
from models.cats import CatMatch, AdoptionAlert
from utils.timing import timed
from utils.config import get_email_config


class EmailAgent(Agent):
    """Agent for sending email notifications about cat matches."""

    name = "Email Agent"
    color = '\033[35m'  # Magenta

    def __init__(self, provider: Optional[EmailProvider] = None):
        """
        Initialize the email agent.

        Args:
            provider: Optional email provider instance. If None, creates from config.
        """
        super().__init__()
        try:
            self.provider = provider or get_email_provider()
            self.enabled = True
            self.log(f"Email Agent initialized with provider: {self.provider.get_provider_name()}")
        except Exception as e:
            self.log_error(f"Failed to initialize email provider: {e}")
            self.log_warning("Email notifications disabled")
            self.enabled = False
            self.provider = None

    def _build_match_html(self, matches: List[CatMatch], alert: AdoptionAlert) -> str:
        """
        Build HTML email content for matches.

        Args:
            matches: List of cat matches
            alert: Adoption alert with user preferences

        Returns:
            HTML email content
        """
        # Header
        html = f"""
        <!DOCTYPE html>
        <html>
        <head>
        <style>
            body {{
                font-family: Arial, sans-serif;
                line-height: 1.6;
                color: #333;
                max-width: 800px;
                margin: 0 auto;
                padding: 20px;
            }}
            .header {{
                background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                color: white;
                padding: 30px;
                border-radius: 10px;
                text-align: center;
                margin-bottom: 30px;
            }}
            .header h1 {{
                margin: 0;
                font-size: 2.5em;
            }}
            .cat-card {{
                border: 1px solid #ddd;
                border-radius: 10px;
                overflow: hidden;
                margin-bottom: 20px;
                box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            }}
            .cat-photo {{
                width: 100%;
                height: 300px;
                object-fit: cover;
            }}
            .cat-details {{
                padding: 20px;
            }}
            .cat-name {{
                font-size: 1.8em;
                color: #333;
                margin: 0 0 10px 0;
            }}
            .match-score {{
                background: #4CAF50;
                color: white;
                padding: 5px 15px;
                border-radius: 20px;
                display: inline-block;
                font-weight: bold;
                margin-bottom: 10px;
            }}
            .cat-info {{
                color: #666;
                margin: 10px 0;
            }}
            .cat-description {{
                color: #888;
                line-height: 1.8;
                margin: 15px 0;
            }}
            .view-button {{
                display: inline-block;
                background: #2196F3;
                color: white;
                padding: 12px 30px;
                border-radius: 5px;
                text-decoration: none;
                font-weight: bold;
                margin-top: 10px;
            }}
            .footer {{
                text-align: center;
                color: #999;
                padding: 30px 0;
                border-top: 1px solid #eee;
                margin-top: 30px;
            }}
            .unsubscribe {{
                color: #999;
                text-decoration: none;
            }}
        </style>
        </head>
        <body>
        <div class="header">
            <h1>🎩 Tuxedo Link</h1>
            <p>We found {len(matches)} new cat{'s' if len(matches) != 1 else ''} matching your preferences!</p>
        </div>
        """

        # Cat cards
        for match in matches[:10]:  # Limit to top 10 for email
            cat = match.cat
            photo = cat.primary_photo or "https://via.placeholder.com/800x300?text=No+Photo"
            html += f"""
        <div class="cat-card">
            <img src="{photo}" alt="{cat.name}" class="cat-photo">
            <div class="cat-details">
                <h2 class="cat-name">{cat.name}</h2>
                <div class="match-score">{match.match_score:.0%} Match</div>
                <div class="cat-info">
                    <strong>{cat.breed}</strong><br/>
                    📍 {cat.city}, {cat.state}<br/>
                    🎂 {cat.age} • {cat.gender.capitalize()} • {cat.size.capitalize() if cat.size else 'Size not specified'}<br/>
            """

            # Add special attributes
            attrs = []
            if cat.good_with_children:
                attrs.append("👶 Good with children")
            if cat.good_with_dogs:
                attrs.append("🐕 Good with dogs")
            if cat.good_with_cats:
                attrs.append("🐱 Good with cats")
            if attrs:
                html += "<br/>" + " • ".join(attrs)

            html += f"""
                </div>
                <div class="cat-description">
                    <strong>Why this is a great match:</strong><br/>
                    {match.explanation}
                </div>
            """

            # Add description if available
            if cat.description:
                desc = cat.description[:300] + "..." if len(cat.description) > 300 else cat.description
                html += f"""
                <div class="cat-description">
                    <strong>About {cat.name}:</strong><br/>
                    {desc}
                </div>
                """

            html += f"""
                <a href="{cat.url}" class="view-button">View {cat.name}'s Profile →</a>
            </div>
        </div>
        """

        # Footer
        html += """
        <div class="footer">
            <p>This email was sent because you saved a search on Tuxedo Link.</p>
            <p>
                <a href="http://localhost:7860" class="unsubscribe">Manage Alerts</a> |
                <a href="http://localhost:7860" class="unsubscribe">Unsubscribe</a>
            </p>
            <p>Made with ❤️ in memory of Tuxedo</p>
        </div>
        </body>
        </html>
        """
        return html

    def _build_match_text(self, matches: List[CatMatch]) -> str:
        """
        Build plain text email content for matches.

        Args:
            matches: List of cat matches

        Returns:
            Plain text email content
        """
        text = "TUXEDO LINK - New Matches Found!\n\n"
        text += f"We found {len(matches)} cat{'s' if len(matches) != 1 else ''} matching your preferences!\n\n"
        text += "=" * 60 + "\n\n"

        for i, match in enumerate(matches[:10], 1):
            cat = match.cat
            text += f"{i}. {cat.name} - {match.match_score:.0%} Match\n"
            text += f"   {cat.breed}\n"
            text += f"   {cat.city}, {cat.state}\n"
            text += f"   {cat.age} • {cat.gender} • {cat.size or 'Size not specified'}\n"
            text += f"   Match: {match.explanation}\n"
            text += f"   View: {cat.url}\n\n"

        text += "=" * 60 + "\n"
        text += "Manage your alerts: http://localhost:7860\n"
        text += "Made with love in memory of Tuxedo\n"
        return text

    @timed
    def send_match_notification(
        self,
        alert: AdoptionAlert,
        matches: List[CatMatch]
    ) -> bool:
        """
        Send email notification about new matches.

        Args:
            alert: Adoption alert with user email and preferences
            matches: List of cat matches to notify about

        Returns:
            True if email sent successfully, False otherwise
        """
        if not self.enabled:
            self.log_warning("Email agent disabled - skipping notification")
            return False

        if not matches:
            self.log("No matches to send")
            return False

        try:
            # Build email content
            subject = f"🐱 {len(matches)} New Cat Match{'es' if len(matches) != 1 else ''} on Tuxedo Link!"
            html_content = self._build_match_html(matches, alert)
            text_content = self._build_match_text(matches)

            # Send via provider
            self.log(f"Sending notification to {alert.user_email} for {len(matches)} matches")
            success = self.provider.send_email(
                to=alert.user_email,
                subject=subject,
                html=html_content,
                text=text_content
            )

            if success:
                self.log("✅ Email sent successfully")
                return True
            else:
                self.log_error("Failed to send email")
                return False
        except Exception as e:
            self.log_error(f"Error sending email: {e}")
            return False

    @timed
    def send_welcome_email(self, user_email: str, user_name: Optional[str] = None) -> bool:
        """
        Send welcome email when user creates an alert.

        Args:
            user_email: User's email address
            user_name: User's name (optional)

        Returns:
            True if sent successfully, False otherwise
        """
        if not self.enabled:
            return False

        try:
            greeting = f"Hi {user_name}" if user_name else "Hello"
            subject = "Welcome to Tuxedo Link! 🐱"

            html_content = f"""
            <!DOCTYPE html>
            <html>
            <head>
            <style>
                body {{
                    font-family: Arial, sans-serif;
                    line-height: 1.6;
                    color: #333;
                    max-width: 600px;
                    margin: 0 auto;
                    padding: 20px;
                }}
                .header {{
                    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                    color: white;
                    padding: 40px;
                    border-radius: 10px;
                    text-align: center;
                }}
                .content {{
                    padding: 30px 0;
                }}
            </style>
            </head>
            <body>
            <div class="header">
                <h1>🎩 Welcome to Tuxedo Link!</h1>
            </div>
            <div class="content">
                <p>{greeting}!</p>
                <p>Thank you for signing up for cat adoption alerts. We're excited to help you find your perfect feline companion!</p>
                <p>We'll notify you when new cats matching your preferences become available for adoption.</p>
                <p><strong>What happens next?</strong></p>
                <ul>
                    <li>We'll search across multiple adoption platforms</li>
                    <li>You'll receive email notifications based on your preferences</li>
                    <li>You can manage your alerts anytime at <a href="http://localhost:7860">Tuxedo Link</a></li>
                </ul>
                <p>Happy cat hunting! 🐾</p>
                <p style="color: #999; font-style: italic;">In loving memory of Kyra</p>
            </div>
            </body>
            </html>
            """

            text_content = f"""
            {greeting}!

            Thank you for signing up for Tuxedo Link cat adoption alerts!
            We'll notify you when new cats matching your preferences become available.

            What happens next?
            - We'll search across multiple adoption platforms
            - You'll receive email notifications based on your preferences
            - Manage your alerts at: http://localhost:7860

            Happy cat hunting!

            In loving memory of Kyra
            """

            success = self.provider.send_email(
                to=user_email,
                subject=subject,
                html=html_content,
                text=text_content
            )
            return success
        except Exception as e:
            self.log_error(f"Error sending welcome email: {e}")
            return False

View File

@@ -1,14 +0,0 @@
"""Email provider implementations."""

from .base import EmailProvider
from .mailgun_provider import MailgunProvider
from .sendgrid_provider import SendGridProvider
from .factory import get_email_provider

__all__ = [
    "EmailProvider",
    "MailgunProvider",
    "SendGridProvider",
    "get_email_provider",
]

View File

@@ -1,45 +0,0 @@
"""Base email provider interface."""

from abc import ABC, abstractmethod
from typing import Optional


class EmailProvider(ABC):
    """Abstract base class for email providers."""

    @abstractmethod
    def send_email(
        self,
        to: str,
        subject: str,
        html: str,
        text: str,
        from_email: Optional[str] = None,
        from_name: Optional[str] = None
    ) -> bool:
        """
        Send an email.

        Args:
            to: Recipient email address
            subject: Email subject
            html: HTML body
            text: Plain text body
            from_email: Sender email (optional, uses config default)
            from_name: Sender name (optional, uses config default)

        Returns:
            bool: True if email was sent successfully, False otherwise
        """
        pass

    @abstractmethod
    def get_provider_name(self) -> str:
        """
        Get the name of this provider.

        Returns:
            str: Provider name
        """
        pass

View File

@@ -1,45 +0,0 @@
"""Email provider factory."""

import logging
from typing import Optional

from .base import EmailProvider
from .mailgun_provider import MailgunProvider
from .sendgrid_provider import SendGridProvider
from utils.config import get_email_provider as get_configured_provider

logger = logging.getLogger(__name__)


def get_email_provider(provider_name: Optional[str] = None) -> EmailProvider:
    """
    Get an email provider instance.

    Args:
        provider_name: Provider name (mailgun or sendgrid).
            If None, uses configuration from config.yaml

    Returns:
        EmailProvider: Configured email provider instance

    Raises:
        ValueError: If provider name is unknown
    """
    if not provider_name:
        provider_name = get_configured_provider()

    provider_name = provider_name.lower()
    logger.info(f"Initializing email provider: {provider_name}")

    if provider_name == 'mailgun':
        return MailgunProvider()
    elif provider_name == 'sendgrid':
        return SendGridProvider()
    else:
        raise ValueError(
            f"Unknown email provider: {provider_name}. "
            "Valid options are: mailgun, sendgrid"
        )

View File

@@ -1,97 +0,0 @@
"""Mailgun email provider implementation."""

import os
import logging
from typing import Optional

import requests

from .base import EmailProvider
from utils.config import get_mailgun_config, get_email_config

logger = logging.getLogger(__name__)


class MailgunProvider(EmailProvider):
    """Mailgun email provider."""

    def __init__(self):
        """Initialize Mailgun provider."""
        self.api_key = os.getenv('MAILGUN_API_KEY')
        if not self.api_key:
            raise ValueError("MAILGUN_API_KEY environment variable not set")

        mailgun_config = get_mailgun_config()
        self.domain = mailgun_config['domain']
        self.base_url = f"https://api.mailgun.net/v3/{self.domain}/messages"

        email_config = get_email_config()
        self.default_from_name = email_config['from_name']
        self.default_from_email = email_config['from_email']

        logger.info(f"Mailgun provider initialized with domain: {self.domain}")

    def send_email(
        self,
        to: str,
        subject: str,
        html: str,
        text: str,
        from_email: Optional[str] = None,
        from_name: Optional[str] = None
    ) -> bool:
        """
        Send an email via Mailgun.

        Args:
            to: Recipient email address
            subject: Email subject
            html: HTML body
            text: Plain text body
            from_email: Sender email (optional, uses config default)
            from_name: Sender name (optional, uses config default)

        Returns:
            bool: True if email was sent successfully, False otherwise
        """
        from_email = from_email or self.default_from_email
        from_name = from_name or self.default_from_name
        from_header = f"{from_name} <{from_email}>"

        data = {
            "from": from_header,
            "to": to,
            "subject": subject,
            "text": text,
            "html": html
        }

        try:
            response = requests.post(
                self.base_url,
                auth=("api", self.api_key),
                data=data,
                timeout=30
            )
            if response.status_code == 200:
                logger.info(f"Email sent successfully to {to} via Mailgun")
                return True
            else:
                logger.error(
                    f"Failed to send email via Mailgun: {response.status_code} - {response.text}"
                )
                return False
        except Exception as e:
            logger.error(f"Exception sending email via Mailgun: {e}")
            return False

    def get_provider_name(self) -> str:
        """
        Get the name of this provider.

        Returns:
            str: Provider name
        """
        return "mailgun"

View File

@@ -1,72 +0,0 @@
"""SendGrid email provider implementation (stub)."""
import os
import logging
from typing import Optional
from .base import EmailProvider
from utils.config import get_email_config
logger = logging.getLogger(__name__)
class SendGridProvider(EmailProvider):
"""SendGrid email provider (stub implementation)."""
def __init__(self):
"""Initialize SendGrid provider."""
self.api_key = os.getenv('SENDGRID_API_KEY')
email_config = get_email_config()
self.default_from_name = email_config['from_name']
self.default_from_email = email_config['from_email']
logger.info("SendGrid provider initialized (stub mode)")
if not self.api_key:
logger.warning("SENDGRID_API_KEY not set - stub will only log, not send")
def send_email(
self,
to: str,
subject: str,
html: str,
text: str,
from_email: Optional[str] = None,
from_name: Optional[str] = None
) -> bool:
"""
Send an email via SendGrid (stub - only logs, doesn't actually send).
Args:
to: Recipient email address
subject: Email subject
html: HTML body
text: Plain text body
from_email: Sender email (optional, uses config default)
from_name: Sender name (optional, uses config default)
Returns:
bool: True (always succeeds in stub mode)
"""
from_email = from_email or self.default_from_email
from_name = from_name or self.default_from_name
logger.info("[STUB] Would send email via SendGrid:")
logger.info(f" From: {from_name} <{from_email}>")
logger.info(f" To: {to}")
logger.info(f" Subject: {subject}")
logger.info(f" Text length: {len(text)} chars")
logger.info(f" HTML length: {len(html)} chars")
# Simulate success
return True
def get_provider_name(self) -> str:
"""
Get the name of this provider.
Returns:
str: Provider name
"""
return "sendgrid (stub)"


@@ -1,399 +0,0 @@
"""Matching agent for hybrid search (vector + metadata filtering)."""
import os
from typing import List
from dotenv import load_dotenv
from models.cats import Cat, CatProfile, CatMatch
from setup_vectordb import VectorDBManager
from utils.geocoding import calculate_distance
from .agent import Agent, timed
class MatchingAgent(Agent):
"""Agent for matching cats to user preferences using hybrid search."""
name = "Matching Agent"
color = Agent.BLUE
def __init__(self, vector_db: VectorDBManager):
"""
Initialize the matching agent.
Args:
vector_db: Vector database manager
"""
load_dotenv()
self.vector_db = vector_db
# Load configuration
self.vector_top_n = int(os.getenv('VECTOR_TOP_N', '50'))
self.final_limit = int(os.getenv('FINAL_RESULTS_LIMIT', '20'))
self.semantic_weight = float(os.getenv('SEMANTIC_WEIGHT', '0.6'))
self.attribute_weight = float(os.getenv('ATTRIBUTE_WEIGHT', '0.4'))
self.log("Matching Agent initialized")
self.log(f"Config - Vector Top N: {self.vector_top_n}, Final Limit: {self.final_limit}")
self.log(f"Weights - Semantic: {self.semantic_weight}, Attribute: {self.attribute_weight}")
def _apply_metadata_filters(self, profile: CatProfile) -> dict:
"""
Build ChromaDB where clause from profile hard constraints.
Args:
profile: User's cat profile
Returns:
Dictionary of metadata filters, or None if the profile sets no hard constraints
"""
filters = []
# Age filter
if profile.age_range:
age_conditions = [{"age": age} for age in profile.age_range]
if len(age_conditions) > 1:
filters.append({"$or": age_conditions})
else:
filters.extend(age_conditions)
# Size filter
if profile.size:
size_conditions = [{"size": size} for size in profile.size]
if len(size_conditions) > 1:
filters.append({"$or": size_conditions})
else:
filters.extend(size_conditions)
# Gender filter
if profile.gender_preference:
filters.append({"gender": profile.gender_preference})
# Behavioral filters
if profile.good_with_children is not None:
# Filter for cats that are explicitly good with children or unknown
if profile.good_with_children:
filters.append({
"$or": [
{"good_with_children": "True"},
{"good_with_children": "unknown"}
]
})
if profile.good_with_dogs is not None:
if profile.good_with_dogs:
filters.append({
"$or": [
{"good_with_dogs": "True"},
{"good_with_dogs": "unknown"}
]
})
if profile.good_with_cats is not None:
if profile.good_with_cats:
filters.append({
"$or": [
{"good_with_cats": "True"},
{"good_with_cats": "unknown"}
]
})
# Special needs filter
if not profile.special_needs_ok:
filters.append({"special_needs": "False"})
# Combine filters with AND logic
if len(filters) == 0:
return None
elif len(filters) == 1:
return filters[0]
else:
return {"$and": filters}
def _calculate_attribute_match_score(self, cat: Cat, profile: CatProfile) -> tuple[float, List[str], List[str]]:
"""
Calculate how well cat's attributes match profile preferences.
Args:
cat: Cat to evaluate
profile: User profile
Returns:
Tuple of (score, matching_attributes, missing_attributes)
"""
matching_attrs = []
missing_attrs = []
total_checks = 0
matches = 0
# Age preference
if profile.age_range:
total_checks += 1
if cat.age in profile.age_range:
matches += 1
matching_attrs.append(f"Age: {cat.age}")
else:
missing_attrs.append(f"Preferred age: {', '.join(profile.age_range)}")
# Size preference
if profile.size:
total_checks += 1
if cat.size in profile.size:
matches += 1
matching_attrs.append(f"Size: {cat.size}")
else:
missing_attrs.append(f"Preferred size: {', '.join(profile.size)}")
# Gender preference
if profile.gender_preference:
total_checks += 1
if cat.gender == profile.gender_preference:
matches += 1
matching_attrs.append(f"Gender: {cat.gender}")
else:
missing_attrs.append(f"Preferred gender: {profile.gender_preference}")
# Good with children
if profile.good_with_children:
total_checks += 1
if cat.good_with_children:
matches += 1
matching_attrs.append("Good with children")
elif cat.good_with_children is False:
missing_attrs.append("Not good with children")
# Good with dogs
if profile.good_with_dogs:
total_checks += 1
if cat.good_with_dogs:
matches += 1
matching_attrs.append("Good with dogs")
elif cat.good_with_dogs is False:
missing_attrs.append("Not good with dogs")
# Good with cats
if profile.good_with_cats:
total_checks += 1
if cat.good_with_cats:
matches += 1
matching_attrs.append("Good with other cats")
elif cat.good_with_cats is False:
missing_attrs.append("Not good with other cats")
# Special needs
if not profile.special_needs_ok and cat.special_needs:
total_checks += 1
missing_attrs.append("Has special needs")
# Breed preference
if profile.preferred_breeds:
total_checks += 1
if cat.breed.lower() in [b.lower() for b in profile.preferred_breeds]:
matches += 1
matching_attrs.append(f"Breed: {cat.breed}")
else:
missing_attrs.append(f"Preferred breeds: {', '.join(profile.preferred_breeds)}")
# Calculate score
if total_checks == 0:
return 0.5, matching_attrs, missing_attrs # Neutral if no preferences
score = matches / total_checks
return score, matching_attrs, missing_attrs
def _filter_by_distance(self, cats_data: dict, profile: CatProfile) -> List[tuple[Cat, float, dict]]:
"""
Filter cats by distance and prepare for ranking.
Args:
cats_data: Results from vector search
profile: User profile
Returns:
List of (cat, vector_similarity, metadata) tuples
"""
results = []
ids = cats_data['ids'][0]
distances = cats_data['distances'][0]
metadatas = cats_data['metadatas'][0]
for i, cat_id in enumerate(ids):
metadata = metadatas[i]
# Convert distance to similarity (ChromaDB returns L2 distance)
# Lower distance = higher similarity
vector_similarity = 1.0 / (1.0 + distances[i])
# Check distance constraint
if profile.user_latitude and profile.user_longitude:
cat_lat = metadata.get('latitude')
cat_lon = metadata.get('longitude')
if cat_lat is not None and cat_lon is not None and cat_lat != '' and cat_lon != '':
try:
cat_lat = float(cat_lat)
cat_lon = float(cat_lon)
distance = calculate_distance(
profile.user_latitude,
profile.user_longitude,
cat_lat,
cat_lon
)
max_dist = profile.max_distance or 100
if distance > max_dist:
self.log(f"DEBUG: Filtered out {metadata['name']} - {distance:.1f} miles away (max: {max_dist})")
continue # Skip this cat, too far away
except (ValueError, TypeError):
pass # Keep cat if coordinates invalid
# Reconstruct Cat from metadata
cat = Cat(
id=metadata['id'],
name=metadata['name'],
age=metadata['age'],
size=metadata['size'],
gender=metadata['gender'],
breed=metadata['breed'],
city=metadata.get('city', ''),
state=metadata.get('state', ''),
zip_code=metadata.get('zip_code', ''),
latitude=float(metadata['latitude']) if metadata.get('latitude') and metadata['latitude'] != '' else None,
longitude=float(metadata['longitude']) if metadata.get('longitude') and metadata['longitude'] != '' else None,
organization_name=metadata['organization'],
source=metadata['source'],
url=metadata['url'],
primary_photo=metadata.get('primary_photo', ''),
description='', # Not stored in metadata
good_with_children=metadata.get('good_with_children') == 'True' if metadata.get('good_with_children') != 'unknown' else None,
good_with_dogs=metadata.get('good_with_dogs') == 'True' if metadata.get('good_with_dogs') != 'unknown' else None,
good_with_cats=metadata.get('good_with_cats') == 'True' if metadata.get('good_with_cats') != 'unknown' else None,
special_needs=metadata.get('special_needs') == 'True',
)
results.append((cat, vector_similarity, metadata))
return results
def _create_explanation(self, cat: Cat, match_score: float, vector_sim: float, attr_score: float, matching_attrs: List[str]) -> str:
"""
Create human-readable explanation of match.
Args:
cat: Matched cat
match_score: Overall match score
vector_sim: Vector similarity score
attr_score: Attribute match score
matching_attrs: List of matching attributes
Returns:
Explanation string
"""
explanation_parts = []
# Overall match quality
if match_score >= 0.8:
explanation_parts.append(f"{cat.name} is an excellent match!")
elif match_score >= 0.6:
explanation_parts.append(f"{cat.name} is a good match.")
else:
explanation_parts.append(f"{cat.name} might be a match.")
# Personality match
if vector_sim >= 0.7:
explanation_parts.append("Personality description strongly matches your preferences.")
elif vector_sim >= 0.5:
explanation_parts.append("Personality description aligns with your preferences.")
# Matching attributes
if matching_attrs:
top_matches = matching_attrs[:3] # Show top 3
explanation_parts.append("Matches: " + ", ".join(top_matches))
return " ".join(explanation_parts)
@timed
def match(self, profile: CatProfile) -> List[CatMatch]:
"""
Find cats that match the user's profile using hybrid search.
Strategy:
1. Vector search for semantic similarity (top N)
2. Filter by hard constraints (metadata)
3. Rank by weighted combination of semantic + attribute scores
4. Return top matches with explanations
Args:
profile: User's cat profile
Returns:
List of CatMatch objects, sorted by match score
"""
self.log(f"Starting hybrid search with profile: {profile.personality_description[:50]}...")
# Step 1: Vector search
query = profile.personality_description or "friendly, loving cat"
where_clause = self._apply_metadata_filters(profile)
self.log(f"Vector search for top {self.vector_top_n} semantic matches")
if where_clause:
self.log(f"Applying metadata filters: {where_clause}")
results = self.vector_db.search(
query=query,
n_results=self.vector_top_n,
where=where_clause
)
if not results['ids'][0]:
self.log("No results found matching criteria")
return []
self.log(f"Vector search returned {len(results['ids'][0])} candidates")
# Step 2: Filter by distance (if applicable)
candidates = self._filter_by_distance(results, profile)
# Step 3: Calculate attribute scores and rank
self.log("Calculating attribute match scores and ranking")
matches = []
for cat, vector_similarity, metadata in candidates:
# Calculate attribute match score
attr_score, matching_attrs, missing_attrs = self._calculate_attribute_match_score(cat, profile)
# Calculate weighted final score
final_score = (
self.semantic_weight * vector_similarity +
self.attribute_weight * attr_score
)
# Create explanation
explanation = self._create_explanation(cat, final_score, vector_similarity, attr_score, matching_attrs)
# Create match object
match = CatMatch(
cat=cat,
match_score=final_score,
vector_similarity=vector_similarity,
attribute_match_score=attr_score,
explanation=explanation,
matching_attributes=matching_attrs,
missing_attributes=missing_attrs
)
matches.append(match)
# Sort by match score
matches.sort(key=lambda m: m.match_score, reverse=True)
# Return top matches
top_matches = matches[:self.final_limit]
self.log(f"Returning top {len(top_matches)} matches")
if top_matches:
self.log(f"Best match: {top_matches[0].cat.name} (score: {top_matches[0].match_score:.2f})")
return top_matches
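The ranking inside `match` reduces to a small formula: ChromaDB's L2 distance is converted to a similarity via `1 / (1 + distance)`, then blended with the attribute score using `SEMANTIC_WEIGHT` and `ATTRIBUTE_WEIGHT`. A standalone sketch with the default 0.6/0.4 weights:

```python
# Sketch of MatchingAgent's final score: semantic similarity blended with
# the attribute match score. Weights mirror the env-var defaults above.
def hybrid_score(l2_distance, attr_score,
                 semantic_weight=0.6, attribute_weight=0.4):
    # Lower distance -> higher similarity, mapped into (0, 1].
    vector_similarity = 1.0 / (1.0 + l2_distance)
    return semantic_weight * vector_similarity + attribute_weight * attr_score

# A close semantic match (distance 0.25) with 3 of 4 attribute checks passing:
score = hybrid_score(0.25, 0.75)
print(round(score, 2))  # 0.78
```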


@@ -1,459 +0,0 @@
"""Petfinder API agent for fetching cat adoption listings."""
import os
import time
import requests
from datetime import datetime, timedelta
from typing import List, Optional, Dict, Any
from dotenv import load_dotenv
from models.cats import Cat
from .agent import Agent, timed
class PetfinderAgent(Agent):
"""Agent for interacting with Petfinder API v2."""
name = "Petfinder Agent"
color = Agent.CYAN
BASE_URL = "https://api.petfinder.com/v2"
TOKEN_URL = f"{BASE_URL}/oauth2/token"
ANIMALS_URL = f"{BASE_URL}/animals"
TYPES_URL = f"{BASE_URL}/types"
# Rate limiting
MAX_REQUESTS_PER_SECOND = 1
MAX_RESULTS_PER_PAGE = 100
MAX_TOTAL_RESULTS = 1000
# Cache for valid colors and breeds (populated on first use)
_valid_colors_cache: Optional[List[str]] = None
_valid_breeds_cache: Optional[List[str]] = None
def __init__(self):
"""Initialize the Petfinder agent with API credentials."""
load_dotenv()
self.api_key = os.getenv('PETFINDER_API_KEY')
self.api_secret = os.getenv('PETFINDER_SECRET')
if not self.api_key or not self.api_secret:
raise ValueError("PETFINDER_API_KEY and PETFINDER_SECRET must be set in environment")
self.access_token: Optional[str] = None
self.token_expires_at: Optional[datetime] = None
self.last_request_time: float = 0
self.log("Petfinder Agent initialized")
def get_valid_colors(self) -> List[str]:
"""
Fetch valid colors for cats from Petfinder API.
Returns:
List of valid color strings accepted by the API
"""
# Use class-level cache
if PetfinderAgent._valid_colors_cache is not None:
return PetfinderAgent._valid_colors_cache
try:
self.log("Fetching valid cat colors from Petfinder API...")
url = f"{self.TYPES_URL}/cat"
token = self._get_access_token()
headers = {'Authorization': f'Bearer {token}'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()
colors = data.get('type', {}).get('colors', [])
# Cache the results
PetfinderAgent._valid_colors_cache = colors
self.log(f"✓ Fetched {len(colors)} valid colors from Petfinder")
self.log(f"Valid colors: {', '.join(colors[:10])}...")
return colors
except Exception as e:
self.log_error(f"Failed to fetch valid colors: {e}")
# Return common colors as fallback
fallback = ["Black", "White", "Orange", "Gray", "Brown", "Cream", "Tabby"]
self.log(f"Using fallback colors: {fallback}")
return fallback
def get_valid_breeds(self) -> List[str]:
"""
Fetch valid cat breeds from Petfinder API.
Returns:
List of valid breed strings accepted by the API
"""
# Use class-level cache
if PetfinderAgent._valid_breeds_cache is not None:
return PetfinderAgent._valid_breeds_cache
try:
self.log("Fetching valid cat breeds from Petfinder API...")
url = f"{self.TYPES_URL}/cat/breeds"
token = self._get_access_token()
headers = {'Authorization': f'Bearer {token}'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()
breeds = [breed['name'] for breed in data.get('breeds', [])]
# Cache the results
PetfinderAgent._valid_breeds_cache = breeds
self.log(f"✓ Fetched {len(breeds)} valid breeds from Petfinder")
return breeds
except Exception as e:
self.log_error(f"Failed to fetch valid breeds: {e}")
# Return common breeds as fallback
fallback = ["Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair", "Siamese", "Persian", "Maine Coon"]
self.log(f"Using fallback breeds: {fallback}")
return fallback
def _rate_limit(self) -> None:
"""Implement rate limiting to respect API limits."""
elapsed = time.time() - self.last_request_time
min_interval = 1.0 / self.MAX_REQUESTS_PER_SECOND
if elapsed < min_interval:
time.sleep(min_interval - elapsed)
self.last_request_time = time.time()
def _get_access_token(self) -> str:
"""
Get or refresh the OAuth access token.
Returns:
Access token string
"""
# Check if we have a valid token
if self.access_token and self.token_expires_at:
if datetime.now() < self.token_expires_at:
return self.access_token
# Request new token
self.log("Requesting new access token from Petfinder")
data = {
'grant_type': 'client_credentials',
'client_id': self.api_key,
'client_secret': self.api_secret
}
try:
response = requests.post(self.TOKEN_URL, data=data, timeout=10)
response.raise_for_status()
token_data = response.json()
self.access_token = token_data['access_token']
# Set expiration (subtract 60 seconds for safety)
expires_in = token_data.get('expires_in', 3600)
self.token_expires_at = datetime.now() + timedelta(seconds=expires_in - 60)
self.log(f"Access token obtained, expires at {self.token_expires_at}")
return self.access_token
except Exception as e:
self.log_error(f"Failed to get access token: {e}")
raise
def _make_request(self, url: str, params: Dict[str, Any]) -> Dict[str, Any]:
"""
Make an authenticated request to Petfinder API with rate limiting.
Args:
url: API endpoint URL
params: Query parameters
Returns:
JSON response data
"""
self._rate_limit()
token = self._get_access_token()
headers = {
'Authorization': f'Bearer {token}'
}
try:
response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as e:
if e.response.status_code == 401:
# Token might be invalid, clear it and retry once
self.log_warning("Token invalid, refreshing and retrying")
self.access_token = None
token = self._get_access_token()
headers['Authorization'] = f'Bearer {token}'
response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()
return response.json()
else:
raise
def _parse_cat(self, animal_data: Dict[str, Any]) -> Cat:
"""
Parse Petfinder API animal data into Cat model.
Args:
animal_data: Animal data from Petfinder API
Returns:
Cat object
"""
# Basic info
cat_id = f"petfinder_{animal_data['id']}"
name = animal_data.get('name', 'Unknown')
# Breed info
breeds = animal_data.get('breeds', {})
primary_breed = breeds.get('primary', 'Unknown')
secondary_breed = breeds.get('secondary')
secondary_breeds = [secondary_breed] if secondary_breed else []
# Age mapping
age_map = {
'Baby': 'kitten',
'Young': 'young',
'Adult': 'adult',
'Senior': 'senior'
}
age = age_map.get(animal_data.get('age', 'Unknown'), 'unknown')
# Size mapping
size_map = {
'Small': 'small',
'Medium': 'medium',
'Large': 'large'
}
size = size_map.get(animal_data.get('size', 'Unknown'), 'unknown')
# Gender mapping
gender_map = {
'Male': 'male',
'Female': 'female',
'Unknown': 'unknown'
}
gender = gender_map.get(animal_data.get('gender', 'Unknown'), 'unknown')
# Description
description = animal_data.get('description', '')
if not description:
description = f"{name} is a {age} {primary_breed} looking for a home."
# Location info
contact = animal_data.get('contact', {})
address = contact.get('address', {})
organization_id = animal_data.get('organization_id')
city = address.get('city')
state = address.get('state')
zip_code = address.get('postcode')
# Attributes
attributes = animal_data.get('attributes', {})
environment = animal_data.get('environment', {})
# Photos
photos_data = animal_data.get('photos', [])
photos = [p.get('large') or p.get('medium') or p.get('small') for p in photos_data if p]
primary_photo = photos[0] if photos else None
# Videos
videos_data = animal_data.get('videos', [])
videos = [v.get('embed') for v in videos_data if v.get('embed')]
# Contact info
contact_email = contact.get('email')
contact_phone = contact.get('phone')
# Colors
colors_data = animal_data.get('colors', {})
colors = [c for c in [colors_data.get('primary'), colors_data.get('secondary'), colors_data.get('tertiary')] if c]
# Coat length
coat = animal_data.get('coat')
coat_map = {
'Short': 'short',
'Medium': 'medium',
'Long': 'long'
}
coat_length = coat_map.get(coat) if coat else None
# URL
url = animal_data.get('url', f"https://www.petfinder.com/cat/{animal_data['id']}")
return Cat(
id=cat_id,
name=name,
breed=primary_breed,
breeds_secondary=secondary_breeds,
age=age,
size=size,
gender=gender,
description=description,
organization_name=animal_data.get('organization_id', 'Unknown Organization'),
organization_id=organization_id,
city=city,
state=state,
zip_code=zip_code,
country='US',
distance=animal_data.get('distance'),
good_with_children=environment.get('children'),
good_with_dogs=environment.get('dogs'),
good_with_cats=environment.get('cats'),
special_needs=attributes.get('special_needs', False),
photos=photos,
primary_photo=primary_photo,
videos=videos,
source='petfinder',
url=url,
contact_email=contact_email,
contact_phone=contact_phone,
declawed=attributes.get('declawed'),
spayed_neutered=attributes.get('spayed_neutered'),
house_trained=attributes.get('house_trained'),
coat_length=coat_length,
colors=colors,
fetched_at=datetime.now()
)
@timed
def search_cats(
self,
location: Optional[str] = None,
distance: int = 100,
age: Optional[List[str]] = None,
size: Optional[List[str]] = None,
gender: Optional[str] = None,
color: Optional[List[str]] = None,
breed: Optional[List[str]] = None,
good_with_children: Optional[bool] = None,
good_with_dogs: Optional[bool] = None,
good_with_cats: Optional[bool] = None,
limit: int = 100
) -> List[Cat]:
"""
Search for cats on Petfinder.
Args:
location: ZIP code or "city, state" (e.g., "10001" or "New York, NY")
distance: Search radius in miles (default: 100)
age: List of age categories: baby, young, adult, senior
size: List of sizes: small, medium, large
gender: Gender filter: male, female
color: List of colors (e.g., ["black", "white", "tuxedo"])
breed: List of breed names (e.g., ["Siamese", "Maine Coon"])
good_with_children: Filter for cats good with children
good_with_dogs: Filter for cats good with dogs
good_with_cats: Filter for cats good with other cats
limit: Maximum number of results (default: 100, max: 1000)
Returns:
List of Cat objects
"""
color_str = f" with colors {color}" if color else ""
self.log(f"Searching for cats near {location} within {distance} miles{color_str}")
# Build query parameters
params: Dict[str, Any] = {
'type': 'cat',
'limit': min(self.MAX_RESULTS_PER_PAGE, limit),
'sort': 'recent'
}
self.log(f"DEBUG: Initial params: {params}")
if location:
params['location'] = location
params['distance'] = distance
if age:
# Map our age categories to Petfinder's
age_map = {
'kitten': 'baby',
'young': 'young',
'adult': 'adult',
'senior': 'senior'
}
petfinder_ages = [age_map.get(a, a) for a in age]
params['age'] = ','.join(petfinder_ages)
if size:
params['size'] = ','.join(size)
if gender:
params['gender'] = gender
if color:
params['color'] = ','.join(color)
if breed:
params['breed'] = ','.join(breed)
if good_with_children is not None:
params['good_with_children'] = str(good_with_children).lower()
if good_with_dogs is not None:
params['good_with_dogs'] = str(good_with_dogs).lower()
if good_with_cats is not None:
params['good_with_cats'] = str(good_with_cats).lower()
self.log(f"DEBUG: ====== PETFINDER API CALL ======")
self.log(f"DEBUG: Final API params: {params}")
self.log(f"DEBUG: ================================")
# Fetch results with pagination
cats = []
page = 1
total_pages = 1
while page <= total_pages and len(cats) < min(limit, self.MAX_TOTAL_RESULTS):
params['page'] = page
try:
data = self._make_request(self.ANIMALS_URL, params)
self.log(f"DEBUG: API Response - Total results: {data.get('pagination', {}).get('total_count', 'unknown')}")
self.log(f"DEBUG: API Response - Animals in this page: {len(data.get('animals', []))}")
# Parse animals
animals = data.get('animals', [])
for animal_data in animals:
try:
cat = self._parse_cat(animal_data)
cats.append(cat)
except Exception as e:
self.log_warning(f"Failed to parse cat {animal_data.get('id')}: {e}")
# Check pagination
pagination = data.get('pagination', {})
total_pages = pagination.get('total_pages', 1)
self.log(f"Fetched page {page}/{total_pages}, {len(animals)} cats")
page += 1
except Exception as e:
self.log_error(f"Failed to fetch page {page}: {e}")
break
self.log(f"Search complete: found {len(cats)} cats")
return cats[:limit] # Ensure we don't exceed limit
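The pacing done by `_rate_limit` can be expressed as a pure function: given the time of the last request, compute how long to sleep to stay under `MAX_REQUESTS_PER_SECOND`. A sketch (the timestamps below are arbitrary example values):

```python
# Sketch of the wait computation inside PetfinderAgent._rate_limit.
def seconds_to_wait(last_request_time, now, max_requests_per_second=1):
    """Return how long to sleep before the next request is allowed."""
    min_interval = 1.0 / max_requests_per_second
    elapsed = now - last_request_time
    return max(0.0, min_interval - elapsed)

# 0.4s since the last request -> wait the rest of the 1-second window.
print(round(seconds_to_wait(100.0, 100.4), 2))  # 0.6
# 1.5s since the last request -> no wait needed.
print(seconds_to_wait(100.0, 101.5))  # 0.0
```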


@@ -1,365 +0,0 @@
"""Planning agent for orchestrating the cat adoption search pipeline."""
from typing import List
from concurrent.futures import ThreadPoolExecutor, as_completed
from models.cats import Cat, CatProfile, CatMatch, SearchResult
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager
from setup_metadata_vectordb import MetadataVectorDB
from .agent import Agent, timed
from .petfinder_agent import PetfinderAgent
from .rescuegroups_agent import RescueGroupsAgent
from .deduplication_agent import DeduplicationAgent
from .matching_agent import MatchingAgent
class PlanningAgent(Agent):
"""Agent for orchestrating the complete cat adoption search pipeline."""
name = "Planning Agent"
color = Agent.WHITE
def __init__(
self,
db_manager: DatabaseManager,
vector_db: VectorDBManager,
metadata_vectordb: MetadataVectorDB = None
):
"""
Initialize the planning agent.
Args:
db_manager: Database manager instance
vector_db: Vector database manager instance
metadata_vectordb: Optional metadata vector DB for color/breed fuzzy matching
"""
self.log("Planning Agent initializing...")
# Initialize all agents
self.petfinder = PetfinderAgent()
self.rescuegroups = RescueGroupsAgent()
self.deduplication = DeduplicationAgent(db_manager)
self.matching = MatchingAgent(vector_db)
self.db_manager = db_manager
self.vector_db = vector_db
self.metadata_vectordb = metadata_vectordb
self.log("Planning Agent ready")
def _search_petfinder(self, profile: CatProfile) -> List[Cat]:
"""
Search Petfinder with the given profile.
Args:
profile: User's cat profile
Returns:
List of cats from Petfinder
"""
try:
# Normalize colors to valid Petfinder API values (3-tier: dict + vector + fallback)
api_colors = None
if profile.color_preferences:
from utils.color_mapping import normalize_user_colors
valid_colors = self.petfinder.get_valid_colors()
api_colors = normalize_user_colors(
profile.color_preferences,
valid_colors,
vectordb=self.metadata_vectordb,
source="petfinder"
)
if api_colors:
self.log(f"✓ Colors: {profile.color_preferences}{api_colors}")
else:
self.log(f"⚠️ Could not map colors {profile.color_preferences}")
# Normalize breeds to valid Petfinder API values (3-tier: dict + vector + fallback)
api_breeds = None
if profile.preferred_breeds:
from utils.breed_mapping import normalize_user_breeds
valid_breeds = self.petfinder.get_valid_breeds()
api_breeds = normalize_user_breeds(
profile.preferred_breeds,
valid_breeds,
vectordb=self.metadata_vectordb,
source="petfinder"
)
if api_breeds:
self.log(f"✓ Breeds: {profile.preferred_breeds}{api_breeds}")
else:
self.log(f"⚠️ Could not map breeds {profile.preferred_breeds}")
return self.petfinder.search_cats(
location=profile.user_location,
distance=profile.max_distance or 100,
age=profile.age_range,
size=profile.size,
gender=profile.gender_preference,
color=api_colors,
breed=api_breeds,
good_with_children=profile.good_with_children,
good_with_dogs=profile.good_with_dogs,
good_with_cats=profile.good_with_cats,
limit=100
)
except Exception as e:
self.log_error(f"Petfinder search failed: {e}")
return []
def _search_rescuegroups(self, profile: CatProfile) -> List[Cat]:
"""
Search RescueGroups with the given profile.
Args:
profile: User's cat profile
Returns:
List of cats from RescueGroups
"""
try:
# Normalize colors to valid RescueGroups API values (3-tier: dict + vector + fallback)
api_colors = None
if profile.color_preferences:
from utils.color_mapping import normalize_user_colors
valid_colors = self.rescuegroups.get_valid_colors()
api_colors = normalize_user_colors(
profile.color_preferences,
valid_colors,
vectordb=self.metadata_vectordb,
source="rescuegroups"
)
if api_colors:
self.log(f"✓ Colors: {profile.color_preferences}{api_colors}")
else:
self.log(f"⚠️ Could not map colors {profile.color_preferences}")
# Normalize breeds to valid RescueGroups API values (3-tier: dict + vector + fallback)
api_breeds = None
if profile.preferred_breeds:
from utils.breed_mapping import normalize_user_breeds
valid_breeds = self.rescuegroups.get_valid_breeds()
api_breeds = normalize_user_breeds(
profile.preferred_breeds,
valid_breeds,
vectordb=self.metadata_vectordb,
source="rescuegroups"
)
if api_breeds:
self.log(f"✓ Breeds: {profile.preferred_breeds}{api_breeds}")
else:
self.log(f"⚠️ Could not map breeds {profile.preferred_breeds}")
return self.rescuegroups.search_cats(
location=profile.user_location,
distance=profile.max_distance or 100,
age=profile.age_range,
size=profile.size,
gender=profile.gender_preference,
color=api_colors,
breed=api_breeds,
good_with_children=profile.good_with_children,
good_with_dogs=profile.good_with_dogs,
good_with_cats=profile.good_with_cats,
limit=100
)
except Exception as e:
self.log_error(f"RescueGroups search failed: {e}")
return []
@timed
def fetch_cats(self, profile: CatProfile) -> tuple[List[Cat], List[str]]:
"""
Fetch cats from all sources in parallel.
Args:
profile: User's cat profile
Returns:
Tuple of (combined list of cats from all sources, names of sources queried)
"""
self.log("Fetching cats from all sources in parallel...")
self.log(f"DEBUG: Profile location={profile.user_location}, distance={profile.max_distance}, colors={profile.color_preferences}, age={profile.age_range}")
all_cats = []
sources_queried = []
# Execute searches in parallel
with ThreadPoolExecutor(max_workers=2) as executor:
futures = {
executor.submit(self._search_petfinder, profile): 'petfinder',
executor.submit(self._search_rescuegroups, profile): 'rescuegroups'
}
for future in as_completed(futures):
source = futures[future]
try:
cats = future.result()
all_cats.extend(cats)
sources_queried.append(source)
self.log(f"DEBUG: ✓ Received {len(cats)} cats from {source}")
except Exception as e:
self.log_error(f"Failed to fetch from {source}: {e}")
self.log(f"DEBUG: Total cats fetched: {len(all_cats)} from {len(sources_queried)} sources")
return all_cats, sources_queried
@timed
def deduplicate_and_cache(self, cats: List[Cat]) -> List[Cat]:
"""
Deduplicate cats and cache them in the database.
Args:
cats: List of cats to process
Returns:
List of unique cats
"""
self.log(f"Deduplicating {len(cats)} cats...")
unique_cats = self.deduplication.deduplicate_batch(cats)
self.log(f"Deduplication complete: {len(unique_cats)} unique cats")
return unique_cats
@timed
def update_vector_db(self, cats: List[Cat]) -> None:
"""
Update vector database with new cats.
Args:
cats: List of cats to add/update
"""
self.log(f"Updating vector database with {len(cats)} cats...")
try:
self.vector_db.add_cats_batch(cats)
self.log("Vector database updated successfully")
except Exception as e:
self.log_error(f"Failed to update vector database: {e}")
@timed
def search(self, profile: CatProfile, use_cache: bool = False) -> SearchResult:
"""
Execute the complete search pipeline.
Pipeline:
1. Fetch cats from Petfinder and RescueGroups in parallel (or use cache)
2. Deduplicate across sources and cache in database
3. Update vector database with new/updated cats
4. Use matching agent to find best matches
5. Return search results
Args:
profile: User's cat profile
use_cache: If True, use cached cats instead of fetching from APIs
Returns:
SearchResult with matches and metadata
"""
import time
start_time = time.time()
self.log("=" * 50)
self.log("STARTING CAT ADOPTION SEARCH PIPELINE")
if use_cache:
self.log("🔄 CACHE MODE: Using existing cached data")
self.log("=" * 50)
# Step 1: Fetch from sources or use cache
if use_cache:
self.log("Loading cats from cache...")
all_cats = self.db_manager.get_all_cached_cats(exclude_duplicates=True)
sources_queried = ['cache']
total_found = len(all_cats)
unique_cats = all_cats
duplicates_removed = 0
if not all_cats:
self.log("No cached cats found. Run without use_cache=True first.")
return SearchResult(
matches=[],
total_found=0,
search_profile=profile,
search_time=time.time() - start_time,
sources_queried=['cache'],
duplicates_removed=0
)
self.log(f"Loaded {len(all_cats)} cats from cache")
else:
all_cats, sources_queried = self.fetch_cats(profile)
total_found = len(all_cats)
if not all_cats:
self.log("No cats found matching criteria")
return SearchResult(
matches=[],
total_found=0,
search_profile=profile,
search_time=time.time() - start_time,
sources_queried=sources_queried,
duplicates_removed=0
)
# Step 2: Deduplicate and cache
unique_cats = self.deduplicate_and_cache(all_cats)
duplicates_removed = total_found - len(unique_cats)
# Step 3: Update vector database
self.update_vector_db(unique_cats)
# Step 4: Find matches using hybrid search
self.log("Finding best matches using hybrid search...")
matches = self.matching.match(profile)
# Calculate search time
search_time = time.time() - start_time
# Create result
result = SearchResult(
matches=matches,
total_found=total_found,
search_profile=profile,
search_time=search_time,
sources_queried=sources_queried,
duplicates_removed=duplicates_removed
)
self.log("=" * 50)
self.log(f"SEARCH COMPLETE - Found {len(matches)} matches in {search_time:.2f}s")
self.log("=" * 50)
return result
def cleanup_old_data(self, days: int = 30) -> dict:
"""
Clean up old cached data.
Args:
days: Number of days to keep
Returns:
Dictionary with cleanup stats
"""
self.log(f"Cleaning up cats older than {days} days...")
# Clean SQLite cache
removed = self.db_manager.cleanup_old_cats(days)
# Note: ChromaDB cleanup would require tracking IDs separately
# For now, we rely on the database as source of truth
self.log(f"Cleanup complete: removed {removed} old cats")
return {
'cats_removed': removed,
'days_threshold': days
}
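`cleanup_old_data` delegates the age threshold to the database layer; the cutoff it implies is simply "now minus the threshold". A minimal standalone sketch of that computation (the helper name is illustrative, not part of the framework):

```python
from datetime import datetime, timedelta

def cleanup_cutoff(days: int, now: datetime) -> datetime:
    """Cats fetched before this instant are eligible for removal."""
    return now - timedelta(days=days)
```

With the default of 30 days, anything cached more than a month before `now` falls past the cutoff.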


@@ -1,191 +0,0 @@
"""Profile agent for extracting user preferences using LLM."""
import os
from typing import List, Optional
from openai import OpenAI
from dotenv import load_dotenv
from models.cats import CatProfile
from utils.geocoding import parse_location_input
from .agent import Agent
class ProfileAgent(Agent):
"""Agent for extracting cat adoption preferences from user conversation."""
name = "Profile Agent"
color = Agent.GREEN
MODEL = "gpt-4o-mini"
SYSTEM_PROMPT = """You are a helpful assistant helping users find their perfect cat for adoption.
Your job is to extract their preferences through natural conversation and return them in structured format.
Ask about:
- Color and coat patterns (e.g., tuxedo/black&white, tabby, orange, calico, tortoiseshell, gray, etc.)
- Personality traits they're looking for (playful, calm, cuddly, independent, etc.)
- Age preference (kitten, young adult, adult, senior)
- Size preference (small, medium, large)
- Living situation (children, dogs, other cats)
- Special needs acceptance
- Location and max distance willing to travel
- Gender preference (if any)
- Breed preferences (if any)
IMPORTANT: When users mention colors or patterns (like "tuxedo", "black and white", "orange tabby", etc.),
extract these into the color_preferences field exactly as the user states them. Examples:
- "tuxedo" → ["tuxedo"]
- "black and white" → ["black and white"]
- "orange tabby" → ["orange", "tabby"]
- "calico" → ["calico"]
- "gray" or "grey" → ["gray"]
Extract colors/patterns naturally without trying to map to specific API values.
Be conversational and warm. Ask follow-up questions if preferences are unclear.
When you have enough information, extract it into the CatProfile format."""
def __init__(self):
"""Initialize the profile agent."""
load_dotenv()
self.api_key = os.getenv('OPENAI_API_KEY')
if not self.api_key:
raise ValueError("OPENAI_API_KEY must be set in environment")
self.client = OpenAI(api_key=self.api_key)
self.log("Profile Agent initialized")
def extract_profile(self, conversation: List[dict]) -> Optional[CatProfile]:
"""
Extract CatProfile from conversation history.
Args:
conversation: List of message dicts with 'role' and 'content'
Returns:
CatProfile object or None if extraction fails
"""
self.log("Extracting profile from conversation")
# Add system message
messages = [{"role": "system", "content": self.SYSTEM_PROMPT}]
messages.extend(conversation)
# Add extraction prompt
messages.append({
"role": "user",
"content": "Please extract my preferences into a structured profile now."
})
try:
response = self.client.beta.chat.completions.parse(
model=self.MODEL,
messages=messages,
response_format=CatProfile
)
profile = response.choices[0].message.parsed
# Parse location if provided
if profile.user_location:
coords = parse_location_input(profile.user_location)
if coords:
profile.user_latitude, profile.user_longitude = coords
self.log(f"Parsed location: {profile.user_location} -> {coords}")
else:
self.log_warning(f"Could not parse location: {profile.user_location}")
self.log("Profile extracted successfully")
return profile
except Exception as e:
self.log_error(f"Failed to extract profile: {e}")
return None
def chat(self, user_message: str, conversation_history: List[dict]) -> str:
"""
Continue conversation to gather preferences.
Args:
user_message: Latest user message
conversation_history: Previous conversation
Returns:
Assistant's response
"""
self.log(f"Processing user message: {user_message[:50]}...")
# Build messages
messages = [{"role": "system", "content": self.SYSTEM_PROMPT}]
messages.extend(conversation_history)
messages.append({"role": "user", "content": user_message})
try:
response = self.client.chat.completions.create(
model=self.MODEL,
messages=messages
)
assistant_message = response.choices[0].message.content
self.log("Generated response")
return assistant_message
except Exception as e:
self.log_error(f"Chat failed: {e}")
return "I'm sorry, I'm having trouble right now. Could you try again?"
def create_profile_from_direct_input(
self,
location: str,
distance: int = 100,
personality_description: str = "",
age_range: Optional[List[str]] = None,
size: Optional[List[str]] = None,
good_with_children: Optional[bool] = None,
good_with_dogs: Optional[bool] = None,
good_with_cats: Optional[bool] = None
) -> CatProfile:
"""
Create profile directly from form inputs (bypass conversation).
Args:
location: User location
distance: Search radius in miles
personality_description: Free text personality description
age_range: Age preferences
size: Size preferences
good_with_children: Must be good with children
good_with_dogs: Must be good with dogs
good_with_cats: Must be good with cats
Returns:
CatProfile object
"""
self.log("Creating profile from direct input")
# Parse location
user_lat, user_lon = None, None
coords = parse_location_input(location)
if coords:
user_lat, user_lon = coords
profile = CatProfile(
user_location=location,
user_latitude=user_lat,
user_longitude=user_lon,
max_distance=distance,
personality_description=personality_description,
age_range=age_range,
size=size,
good_with_children=good_with_children,
good_with_dogs=good_with_dogs,
good_with_cats=good_with_cats
)
self.log("Profile created from direct input")
return profile
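Both `extract_profile` and `chat` assemble the same message scaffold before calling the model: system prompt first, then the prior turns, then the newest user message. A minimal sketch of that assembly (the function name is illustrative, not part of the agent):

```python
def build_messages(system_prompt: str, history: list, final_user_msg: str) -> list:
    """System prompt first, then prior turns, then the newest user message."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": final_user_msg})
    return messages
```

`extract_profile` uses this shape with a fixed "extract my preferences now" closing turn, while `chat` appends the user's latest free-form message instead.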


@@ -1,474 +0,0 @@
"""RescueGroups.org API agent for fetching cat adoption listings."""
import os
import time
import requests
from datetime import datetime
from typing import List, Optional, Dict, Any
from dotenv import load_dotenv
from models.cats import Cat
from .agent import Agent, timed
class RescueGroupsAgent(Agent):
"""Agent for interacting with RescueGroups.org API."""
name = "RescueGroups Agent"
color = Agent.MAGENTA
BASE_URL = "https://api.rescuegroups.org/v5"
# Rate limiting
MAX_REQUESTS_PER_SECOND = 0.5 # Be conservative
MAX_RESULTS_PER_PAGE = 100
# Cache for valid colors and breeds
_valid_colors_cache: Optional[List[str]] = None
_valid_breeds_cache: Optional[List[str]] = None
def __init__(self):
"""Initialize the RescueGroups agent with API credentials."""
load_dotenv()
self.api_key = os.getenv('RESCUEGROUPS_API_KEY')
if not self.api_key:
self.log_warning("RESCUEGROUPS_API_KEY not set - agent will not function")
self.api_key = None
self.last_request_time: float = 0
self.log("RescueGroups Agent initialized")
def get_valid_colors(self) -> List[str]:
"""
Fetch valid colors from RescueGroups API.
Returns:
List of valid color strings
"""
if not self.api_key:
return []
# Use class-level cache
if RescueGroupsAgent._valid_colors_cache is not None:
return RescueGroupsAgent._valid_colors_cache
try:
self.log("Fetching valid cat colors from RescueGroups API...")
# Correct endpoint for colors
url = f"{self.BASE_URL}/public/animals/colors"
headers = {
'Authorization': self.api_key,
'Content-Type': 'application/vnd.api+json'
}
# Add limit parameter to get all colors (no max limit for static data per docs)
params = {'limit': 1000}
self._rate_limit()
response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()
data = response.json()
colors = [item['attributes']['name'] for item in data.get('data', [])]
# Cache the results
RescueGroupsAgent._valid_colors_cache = colors
self.log(f"✓ Fetched {len(colors)} valid colors from RescueGroups")
return colors
except Exception as e:
self.log_error(f"Failed to fetch valid colors: {e}")
# Return empty list - planning agent will handle gracefully
return []
def get_valid_breeds(self) -> List[str]:
"""
Fetch valid cat breeds from RescueGroups API.
Returns:
List of valid breed strings
"""
if not self.api_key:
return []
# Use class-level cache
if RescueGroupsAgent._valid_breeds_cache is not None:
return RescueGroupsAgent._valid_breeds_cache
try:
self.log("Fetching valid cat breeds from RescueGroups API...")
# Correct endpoint for breeds
url = f"{self.BASE_URL}/public/animals/breeds"
headers = {
'Authorization': self.api_key,
'Content-Type': 'application/vnd.api+json'
}
# Add limit parameter to get all breeds (no max limit for static data per docs)
params = {'limit': 1000}
self._rate_limit()
response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()
data = response.json()
breeds = [item['attributes']['name'] for item in data.get('data', [])]
# Cache the results
RescueGroupsAgent._valid_breeds_cache = breeds
self.log(f"✓ Fetched {len(breeds)} valid breeds from RescueGroups")
return breeds
except Exception as e:
self.log_error(f"Failed to fetch valid breeds: {e}")
# Return empty list - planning agent will handle gracefully
return []
def _rate_limit(self) -> None:
"""Implement rate limiting to respect API limits."""
elapsed = time.time() - self.last_request_time
min_interval = 1.0 / self.MAX_REQUESTS_PER_SECOND
if elapsed < min_interval:
time.sleep(min_interval - elapsed)
self.last_request_time = time.time()
def _make_request(self, endpoint: str, data: Dict[str, Any]) -> Dict[str, Any]:
"""
Make an authenticated POST request to RescueGroups API.
Args:
endpoint: API endpoint (e.g., "/animals/search")
data: Request payload
Returns:
JSON response data
"""
if not self.api_key:
raise ValueError("RescueGroups API key not configured")
self._rate_limit()
url = f"{self.BASE_URL}{endpoint}"
headers = {
'Authorization': self.api_key,
'Content-Type': 'application/vnd.api+json'
}
try:
response = requests.post(url, json=data, headers=headers, timeout=15)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
self.log_error(f"API request failed: {e}")
if hasattr(e, 'response') and e.response is not None:
self.log_error(f"Response: {e.response.text[:500]}")
raise
def _parse_cat(self, animal_data: Dict[str, Any]) -> Cat:
"""
Parse RescueGroups API animal data into Cat model.
Args:
animal_data: Animal data from RescueGroups API
Returns:
Cat object
"""
attributes = animal_data.get('attributes', {})
# Basic info
cat_id = f"rescuegroups_{animal_data['id']}"
name = attributes.get('name', 'Unknown')
# Breed info
primary_breed = attributes.get('breedPrimary', 'Unknown')
secondary_breed = attributes.get('breedSecondary')
secondary_breeds = [secondary_breed] if secondary_breed else []
# Age mapping
age_str = attributes.get('ageGroup', '').lower()
age_map = {
'baby': 'kitten',
'young': 'young',
'adult': 'adult',
'senior': 'senior'
}
age = age_map.get(age_str, 'unknown')
# Size mapping
size_str = attributes.get('sizeGroup', '').lower()
size_map = {
'small': 'small',
'medium': 'medium',
'large': 'large'
}
size = size_map.get(size_str, 'unknown')
# Gender mapping
gender_str = attributes.get('sex', '').lower()
gender_map = {
'male': 'male',
'female': 'female'
}
gender = gender_map.get(gender_str, 'unknown')
# Description
description = attributes.get('descriptionText', '')
if not description:
description = f"{name} is a {age} {primary_breed} looking for a home."
# Location info
location = attributes.get('location', {}) or {}
city = location.get('citytown')
state = location.get('stateProvince')
zip_code = location.get('postalcode')
# Organization
org_name = attributes.get('orgName', 'Unknown Organization')
org_id = attributes.get('orgID')
# Attributes - map RescueGroups boolean fields
good_with_children = attributes.get('isKidsGood')
good_with_dogs = attributes.get('isDogsGood')
good_with_cats = attributes.get('isCatsGood')
special_needs = attributes.get('isSpecialNeeds', False)
# Photos
pictures = attributes.get('pictureThumbnailUrl', [])
if isinstance(pictures, str):
pictures = [pictures] if pictures else []
elif not pictures:
pictures = []
photos = [pic for pic in pictures if pic]
primary_photo = photos[0] if photos else None
# Contact info
contact_email = attributes.get('emailAddress')
contact_phone = attributes.get('phoneNumber')
# Colors
color_str = attributes.get('colorDetails', '')
colors = [c.strip() for c in color_str.split(',') if c.strip()] if color_str else []
# Coat
coat_str = attributes.get('coatLength', '').lower()
coat_map = {
'short': 'short',
'medium': 'medium',
'long': 'long'
}
coat_length = coat_map.get(coat_str)
# URL
url = attributes.get('url', f"https://rescuegroups.org/animal/{animal_data['id']}")
# Additional attributes
declawed = attributes.get('isDeclawed')
spayed_neutered = attributes.get('isAltered')
house_trained = attributes.get('isHousetrained')
return Cat(
id=cat_id,
name=name,
breed=primary_breed,
breeds_secondary=secondary_breeds,
age=age,
size=size,
gender=gender,
description=description,
organization_name=org_name,
organization_id=org_id,
city=city,
state=state,
zip_code=zip_code,
country='US',
good_with_children=good_with_children,
good_with_dogs=good_with_dogs,
good_with_cats=good_with_cats,
special_needs=special_needs,
photos=photos,
primary_photo=primary_photo,
source='rescuegroups',
url=url,
contact_email=contact_email,
contact_phone=contact_phone,
declawed=declawed,
spayed_neutered=spayed_neutered,
house_trained=house_trained,
coat_length=coat_length,
colors=colors,
fetched_at=datetime.now()
)
@timed
def search_cats(
self,
location: Optional[str] = None,
distance: int = 100,
age: Optional[List[str]] = None,
size: Optional[List[str]] = None,
gender: Optional[str] = None,
color: Optional[List[str]] = None,
breed: Optional[List[str]] = None,
good_with_children: Optional[bool] = None,
good_with_dogs: Optional[bool] = None,
good_with_cats: Optional[bool] = None,
limit: int = 100
) -> List[Cat]:
"""
Search for cats on RescueGroups.
Args:
location: ZIP code or city/state
distance: Search radius in miles (default: 100)
age: List of age categories: kitten, young, adult, senior
size: List of sizes: small, medium, large
gender: Gender filter: male, female
color: List of colors (e.g., ["black", "white", "tuxedo"])
breed: List of breed names (e.g., ["Siamese", "Maine Coon"])
good_with_children: Filter for cats good with children
good_with_dogs: Filter for cats good with dogs
good_with_cats: Filter for cats good with other cats
limit: Maximum number of results (default: 100)
Returns:
List of Cat objects
"""
if not self.api_key:
self.log_warning("RescueGroups API key not configured, returning empty results")
return []
color_str = f" with colors {color}" if color else ""
breed_str = f" breeds {breed}" if breed else ""
self.log(f"Searching RescueGroups for cats near {location}{color_str}{breed_str}")
self.log(f"DEBUG: RescueGroups search params - location: {location}, distance: {distance}, age: {age}, size: {size}, gender: {gender}, color: {color}, breed: {breed}")
# Build filter criteria
filters = [
{
"fieldName": "species.singular",
"operation": "equals",
"criteria": "cat"
},
{
"fieldName": "statuses.name",
"operation": "equals",
"criteria": "Available"
}
]
# Location filter - DISABLED: RescueGroups v5 API doesn't support location filtering
# Their API returns animals from all locations, filtering must be done client-side
if location:
self.log("NOTE: RescueGroups doesn't support location filters. Will return all results.")
# Age filter
if age:
age_map = {
'kitten': 'Baby',
'young': 'Young',
'adult': 'Adult',
'senior': 'Senior'
}
rg_ages = [age_map.get(a, a.capitalize()) for a in age]
for rg_age in rg_ages:
filters.append({
"fieldName": "animals.ageGroup",
"operation": "equals",
"criteria": rg_age
})
# Size filter
if size:
size_map = {
'small': 'Small',
'medium': 'Medium',
'large': 'Large'
}
for s in size:
rg_size = size_map.get(s, s.capitalize())
filters.append({
"fieldName": "animals.sizeGroup",
"operation": "equals",
"criteria": rg_size
})
# Gender filter
if gender:
filters.append({
"fieldName": "animals.sex",
"operation": "equals",
"criteria": gender.capitalize()
})
# Color filter - DISABLED: RescueGroups v5 API field name for color is unclear
# Filtering by color will be done client-side with returned data
if color:
self.log(f"NOTE: Color filtering for RescueGroups will be done client-side: {color}")
# Breed filter - DISABLED: RescueGroups v5 API breed filtering is not reliable
# Filtering by breed will be done client-side with returned data
if breed:
self.log(f"NOTE: Breed filtering for RescueGroups will be done client-side: {breed}")
# Behavioral filters - DISABLED: RescueGroups v5 API doesn't support behavioral filters
# These fields exist in response data but cannot be used as filter criteria
# Client-side filtering will be applied to returned results
if good_with_children:
self.log("NOTE: good_with_children filtering will be done client-side")
if good_with_dogs:
self.log("NOTE: good_with_dogs filtering will be done client-side")
if good_with_cats:
self.log("NOTE: good_with_cats filtering will be done client-side")
# Build request payload
payload = {
"data": {
"filters": filters,
"filterProcessing": "1" # AND logic
}
}
# Add pagination
if limit:
payload["data"]["limit"] = min(limit, self.MAX_RESULTS_PER_PAGE)
self.log(f"DEBUG: RescueGroups filters: {len(filters)} filters applied")
try:
response = self._make_request("/public/animals/search/available/cats", payload)
self.log(f"DEBUG: RescueGroups API Response - Found {len(response.get('data', []))} animals")
# Parse response
data = response.get('data', [])
cats = []
for animal_data in data:
try:
cat = self._parse_cat(animal_data)
cats.append(cat)
except Exception as e:
self.log_warning(f"Failed to parse cat {animal_data.get('id')}: {e}")
self.log(f"Search complete: found {len(cats)} cats")
return cats
except Exception as e:
self.log_error(f"Search failed: {e}")
return []
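The server-side criteria above reduce to a small JSON:API payload. A minimal sketch of its shape, covering just the species/status base filters and the age loop (the helper name is illustrative, not part of the agent). Note that with `filterProcessing: "1"` every criterion is ANDed, so several values for the same field, as in the age loop, may over-constrain the query; the client-side filtering the agent falls back on covers that gap:

```python
def build_cat_search_payload(ages=None, limit=100):
    """Sketch of the RescueGroups search payload assembled above."""
    filters = [
        {"fieldName": "species.singular", "operation": "equals", "criteria": "cat"},
        {"fieldName": "statuses.name", "operation": "equals", "criteria": "Available"},
    ]
    age_map = {"kitten": "Baby", "young": "Young", "adult": "Adult", "senior": "Senior"}
    for a in ages or []:
        filters.append({
            "fieldName": "animals.ageGroup",
            "operation": "equals",
            "criteria": age_map.get(a, a.capitalize()),
        })
    # Cap page size at the API maximum (MAX_RESULTS_PER_PAGE = 100)
    return {"data": {"filters": filters, "filterProcessing": "1", "limit": min(limit, 100)}}
```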


@@ -1,834 +0,0 @@
"""Gradio UI for Tuxedo Link cat adoption application."""
import os
import gradio as gr
import pandas as pd
from dotenv import load_dotenv
from typing import List, Optional, Tuple
import logging
import re
from datetime import datetime
# Import models - these are lightweight
from models.cats import CatProfile, CatMatch, AdoptionAlert
from utils.config import is_production
# Load environment
load_dotenv()
# Initialize framework based on mode
framework = None
profile_agent = None
if not is_production():
# LOCAL MODE: Import and initialize heavy components
from cat_adoption_framework import TuxedoLinkFramework
from agents.profile_agent import ProfileAgent
framework = TuxedoLinkFramework()
profile_agent = ProfileAgent()
print("✓ Running in LOCAL mode - using local components")
else:
# PRODUCTION MODE: Don't import heavy components - use Modal API
print("✓ Running in PRODUCTION mode - using Modal API")
# Global state for current search results
current_matches: List[CatMatch] = []
current_profile: Optional[CatProfile] = None
# Configure logging to suppress verbose output
logging.getLogger().setLevel(logging.WARNING)
def extract_profile_from_text(user_input: str, use_cache: bool = False) -> tuple:
"""
Extract structured profile from user's natural language input.
Args:
user_input: User's description of desired cat
use_cache: Whether to use cached data for search
Returns:
Tuple of (chat_history, results_html, profile_json)
"""
global current_matches, current_profile
try:
# Handle empty input - use placeholder text
if not user_input or user_input.strip() == "":
user_input = "I'm looking for a friendly, playful kitten in NYC that's good with children"
# Extract profile using LLM
# Using messages format for Gradio chatbot
chat_history = [
{"role": "user", "content": user_input},
{"role": "assistant", "content": "🔍 Analyzing your preferences..."}
]
# Extract profile (Modal or local)
if is_production():
# PRODUCTION: Call Modal API
import modal
# Look up deployed function - correct API!
extract_profile_func = modal.Function.from_name("tuxedo-link-api", "extract_profile")
print("[INFO] Calling Modal API to extract profile...")
profile_result = extract_profile_func.remote(user_input)
if not profile_result["success"]:
return chat_history, "<p>❌ Error extracting profile</p>", "{}"
profile = CatProfile(**profile_result["profile"])
else:
# LOCAL: Use local agent
conversation = [{"role": "user", "content": user_input}]
profile = profile_agent.extract_profile(conversation)
current_profile = profile
# Perform search
response_msg = f"✅ Got it! Searching for:\n\n" + \
f"📍 Location: {profile.user_location or 'Not specified'}\n" + \
f"📏 Distance: {profile.max_distance or 100} miles\n" + \
f"🎨 Colors: {', '.join(profile.color_preferences) if profile.color_preferences else 'Any'}\n" + \
f"🎭 Personality: {profile.personality_description or 'Any'}\n" + \
f"🎂 Age: {', '.join(profile.age_range) if profile.age_range else 'Any'}\n" + \
f"👶 Good with children: {'Yes' if profile.good_with_children else 'Not required'}\n" + \
f"🐕 Good with dogs: {'Yes' if profile.good_with_dogs else 'Not required'}\n" + \
f"🐱 Good with cats: {'Yes' if profile.good_with_cats else 'Not required'}\n\n" + \
f"Searching..."
chat_history[1]["content"] = response_msg
# Run search (Modal or local)
if is_production():
# PRODUCTION: Call Modal API
import modal
# Look up deployed function
search_cats_func = modal.Function.from_name("tuxedo-link-api", "search_cats")
print("[INFO] Calling Modal API to search cats...")
search_result = search_cats_func.remote(profile.model_dump(), use_cache=use_cache)
if not search_result["success"]:
error_msg = search_result.get('error', 'Unknown error')
chat_history.append({"role": "assistant", "content": f"❌ Search error: {error_msg}"})
return chat_history, "<p>😿 Search failed. Please try again.</p>", profile.model_dump_json()
# Reconstruct matches from Modal response
from models.cats import Cat
current_matches = [
CatMatch(
cat=Cat(**m["cat"]),
match_score=m["match_score"],
vector_similarity=m["vector_similarity"],
attribute_match_score=m["attribute_match_score"],
explanation=m["explanation"],
matching_attributes=m.get("matching_attributes", []),
missing_attributes=m.get("missing_attributes", [])
)
for m in search_result["matches"]
]
else:
# LOCAL: Use local framework
result = framework.search(profile, use_cache=use_cache)
current_matches = result.matches
# Build results HTML
if current_matches:
chat_history[1]["content"] += f"\n\n✨ Found {len(current_matches)} great matches!"
results_html = build_results_grid(current_matches)
else:
chat_history[1]["content"] += "\n\n😿 No matches found. Try broadening your search criteria."
results_html = "<p style='text-align:center; color: #666; padding: 40px;'>No matches found</p>"
# Profile JSON for display
profile_json = profile.model_dump_json(indent=2)
return chat_history, results_html, profile_json
except Exception as e:
error_msg = f"❌ Error: {str(e)}"
print(f"[ERROR] Search failed: {e}")
import traceback
traceback.print_exc()
return [
{"role": "user", "content": user_input},
{"role": "assistant", "content": error_msg}
], "<p>Error occurred</p>", "{}"
def build_results_grid(matches: List[CatMatch]) -> str:
"""Build HTML grid of cat results."""
html = "<div style='display: grid; grid-template-columns: repeat(auto-fill, minmax(240px, 1fr)); gap: 20px; padding: 20px;'>"
for match in matches:
cat = match.cat
photo = cat.primary_photo or "https://via.placeholder.com/240x180?text=No+Photo"
html += f"""
<div style='border: 1px solid #ddd; border-radius: 10px; overflow: hidden; box-shadow: 0 2px 8px rgba(0,0,0,0.1);'>
<img src='{photo}' style='width: 100%; height: 180px; object-fit: cover;' onerror="this.src='https://via.placeholder.com/240x180?text=No+Photo'">
<div style='padding: 15px;'>
<h3 style='margin: 0 0 10px 0; color: #333;'>{cat.name}</h3>
<div style='display: flex; justify-content: space-between; margin-bottom: 8px;'>
<span style='background: #4CAF50; color: white; padding: 4px 12px; border-radius: 12px; font-size: 12px;'>
{match.match_score:.0%} Match
</span>
<span style='color: #666; font-size: 14px;'>{cat.age}</span>
</div>
<p style='color: #666; font-size: 14px; margin: 8px 0;'>
<strong>{cat.breed}</strong><br/>
{cat.city}, {cat.state}<br/>
{cat.gender.capitalize()} • {cat.size.capitalize() if cat.size else 'Unknown size'}
</p>
<p style='color: #888; font-size: 13px; margin: 10px 0; line-height: 1.4;'>
{match.explanation}
</p>
<a href='{cat.url}' target='_blank' style='display: block; text-align: center; background: #2196F3; color: white; padding: 10px; border-radius: 5px; text-decoration: none; margin-top: 10px;'>
View Details
</a>
</div>
</div>
"""
html += "</div>"
return html
def search_with_examples(example_text: str, use_cache: bool = False) -> tuple:
"""Handle example button clicks."""
return extract_profile_from_text(example_text, use_cache)
# ===== ALERT MANAGEMENT FUNCTIONS =====
def validate_email(email: str) -> bool:
"""Validate email address format."""
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return bool(re.match(pattern, email))
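# The email pattern above can be exercised standalone; re-stated here as a
# quick self-check (illustrative snippet, not part of the UI module):
import re
EMAIL_PATTERN = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
assert re.match(EMAIL_PATTERN, "ada@example.com")
assert not re.match(EMAIL_PATTERN, "user@host")  # rejected: no dot-TLD after the host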
def send_immediate_notification_local(alert_id: int) -> None:
"""
Send immediate notification locally (not via Modal).
Args:
alert_id: ID of the alert to process
"""
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider
print(f"[DEBUG] Sending immediate notification for alert {alert_id}")
# Get alert from database
alert = framework.db_manager.get_alert_by_id(alert_id)
if not alert:
print(f"[ERROR] Alert {alert_id} not found")
raise ValueError(f"Alert {alert_id} not found")
print(f"[DEBUG] Alert found: email={alert.user_email}, profile exists={alert.profile is not None}")
# Run search with the alert's profile
result = framework.search(alert.profile, use_cache=False)
print(f"[DEBUG] Search complete: {len(result.matches)} matches found")
if result.matches:
# Send email notification
try:
email_provider = get_email_provider()
email_agent = EmailAgent(email_provider)
print(f"[DEBUG] Sending email to {alert.user_email}...")
email_agent.send_match_notification(
alert=alert,
matches=result.matches
)
print(f"[DEBUG] ✓ Email sent successfully!")
except Exception as e:
print(f"[ERROR] Failed to send email: {e}")
import traceback
traceback.print_exc()
raise
else:
print(f"[DEBUG] No matches found, no email sent")
def save_alert(email: str, frequency: str, profile_json: str) -> Tuple[str, pd.DataFrame]:
"""
Save an adoption alert to the database.
Args:
email: User's email address
frequency: Notification frequency (Immediately, Daily, Weekly)
profile_json: JSON string of current search profile
Returns:
Tuple of (status_message, updated_alerts_dataframe)
"""
global current_profile
try:
# Validate email
if not email or not validate_email(email):
return "❌ Please enter a valid email address", load_alerts()
# Check if we have a current profile
if not current_profile:
return "❌ Please perform a search first to create a profile", load_alerts()
# Normalize frequency
frequency = frequency.lower()
# Create alert
alert = AdoptionAlert(
user_email=email,
profile=current_profile,
frequency=frequency,
active=True
)
# Save alert based on mode
if is_production():
# PRODUCTION MODE: Use Modal function
try:
import modal
print(f"[INFO] Production mode: Calling Modal function to create alert...")
# Look up deployed function - correct API!
create_alert_func = modal.Function.from_name("tuxedo-link-api", "create_alert_and_notify")
# Send alert data to Modal
result = create_alert_func.remote(alert.model_dump())
if result["success"]:
status = f"{result['message']}"
else:
status = f"⚠️ {result['message']}"
return status, load_alerts()
except Exception as e:
import traceback
error_detail = traceback.format_exc()
print(f"[ERROR] Modal function failed: {error_detail}")
return f"❌ Error calling Modal service: {str(e)}\n\nCheck Modal logs for details.", load_alerts()
else:
# LOCAL MODE: Save and process locally
alert_id = framework.db_manager.create_alert(alert)
if frequency == "immediately":
try:
send_immediate_notification_local(alert_id)
status = f"✅ Alert saved and notification sent locally! (ID: {alert_id})\n\nCheck your email at {email}"
except Exception as e:
import traceback
error_detail = traceback.format_exc()
print(f"[ERROR] Local notification failed: {error_detail}")
status = f"✅ Alert saved (ID: {alert_id}), but notification failed: {str(e)}"
else:
status = f"✅ Alert saved successfully! (ID: {alert_id})\n\nYou'll receive {frequency} notifications at {email}"
return status, load_alerts()
except Exception as e:
return f"❌ Error saving alert: {str(e)}", load_alerts()
def load_alerts(email_filter: str = "") -> pd.DataFrame:
"""
Load all alerts from the database.
Args:
email_filter: Optional email to filter by
Returns:
DataFrame of alerts
"""
try:
# Get alerts from database (Modal or local)
if is_production():
# PRODUCTION: Call Modal API
import modal
# Look up deployed function
get_alerts_func = modal.Function.from_name("tuxedo-link-api", "get_alerts")
alert_dicts = get_alerts_func.remote(email=email_filter if email_filter and validate_email(email_filter) else None)
alerts = [AdoptionAlert(**a) for a in alert_dicts]
else:
# LOCAL: Use local database
if email_filter and validate_email(email_filter):
alerts = framework.db_manager.get_alerts_by_email(email_filter)
else:
alerts = framework.db_manager.get_all_alerts()
if not alerts:
# Return empty DataFrame with correct columns
return pd.DataFrame(columns=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"])
# Convert to display format
data = []
for alert in alerts:
location = alert.profile.user_location or "Any"
prefs = []
if alert.profile.age_range:
prefs.append(f"Age: {', '.join(alert.profile.age_range)}")
if alert.profile.good_with_children:
prefs.append("Child-friendly")
if alert.profile.good_with_dogs:
prefs.append("Dog-friendly")
if alert.profile.good_with_cats:
prefs.append("Cat-friendly")
prefs_str = ", ".join(prefs) if prefs else "Any"
last_sent = alert.last_sent.strftime("%Y-%m-%d %H:%M") if alert.last_sent else "Never"
status = "🟢 Active" if alert.active else "🔴 Inactive"
data.append({
"ID": alert.id,
"Email": alert.user_email,
"Frequency": alert.frequency.capitalize(),
"Location": location,
"Preferences": prefs_str,
"Last Sent": last_sent,
"Status": status
})
return pd.DataFrame(data)
except Exception as e:
logging.error(f"Error loading alerts: {e}")
return pd.DataFrame(columns=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"])
def delete_alert(alert_id: str, email_filter: str = "") -> Tuple[str, pd.DataFrame]:
"""
Delete an alert by ID.
Args:
alert_id: Alert ID to delete
email_filter: Optional email filter for refresh
Returns:
Tuple of (status_message, updated_alerts_dataframe)
"""
try:
if not alert_id:
return "❌ Please enter an Alert ID", load_alerts(email_filter)
# Convert to int
try:
alert_id_int = int(alert_id)
except ValueError:
return f"❌ Invalid Alert ID: {alert_id}", load_alerts(email_filter)
# Delete from database (Modal or local)
if is_production():
# PRODUCTION: Call Modal API
import modal
# Look up deployed function
delete_alert_func = modal.Function.from_name("tuxedo-link-api", "delete_alert")
success = delete_alert_func.remote(alert_id_int)
if not success:
return f"❌ Failed to delete alert {alert_id}", load_alerts(email_filter)
else:
# LOCAL: Use local database
framework.db_manager.delete_alert(alert_id_int)
return f"✅ Alert {alert_id} deleted successfully", load_alerts(email_filter)
except Exception as e:
return f"❌ Error deleting alert: {str(e)}", load_alerts(email_filter)
def toggle_alert_status(alert_id: str, email_filter: str = "") -> Tuple[str, pd.DataFrame]:
"""
Toggle alert active/inactive status.
Args:
alert_id: Alert ID to toggle
email_filter: Optional email filter for refresh
Returns:
Tuple of (status_message, updated_alerts_dataframe)
"""
try:
if not alert_id:
return "❌ Please enter an Alert ID", load_alerts(email_filter)
# Convert to int
try:
alert_id_int = int(alert_id)
except ValueError:
return f"❌ Invalid Alert ID: {alert_id}", load_alerts(email_filter)
# Get current alert and toggle (Modal or local)
if is_production():
# PRODUCTION: Call Modal API
import modal
# Look up deployed functions
get_alerts_func = modal.Function.from_name("tuxedo-link-api", "get_alerts")
update_alert_func = modal.Function.from_name("tuxedo-link-api", "update_alert")
# Get all alerts and find this one
alert_dicts = get_alerts_func.remote()
alert_dict = next((a for a in alert_dicts if a["id"] == alert_id_int), None)
if not alert_dict:
return f"❌ Alert {alert_id} not found", load_alerts(email_filter)
alert = AdoptionAlert(**alert_dict)
new_status = not alert.active
success = update_alert_func.remote(alert_id_int, active=new_status)
if not success:
return f"❌ Failed to update alert {alert_id}", load_alerts(email_filter)
else:
# LOCAL: Use local database
alert = framework.db_manager.get_alert(alert_id_int)
if not alert:
return f"❌ Alert {alert_id} not found", load_alerts(email_filter)
new_status = not alert.active
framework.db_manager.update_alert(alert_id_int, active=new_status)
status_text = "activated" if new_status else "deactivated"
return f"✅ Alert {alert_id} {status_text}", load_alerts(email_filter)
except Exception as e:
return f"❌ Error toggling alert: {str(e)}", load_alerts(email_filter)
def build_search_tab() -> None:
"""Build the search tab interface with chat and results display."""
with gr.Column():
gr.Markdown("# 🐱 Find Your Perfect Cat")
gr.Markdown("Tell me what kind of cat you're looking for, and I'll help you find the perfect match!")
with gr.Row():
# In production mode, default to False since the Modal cache starts empty;
# in local mode, caching can default to True after the first run
default_cache = not is_production()
use_cache_checkbox = gr.Checkbox(
label="Use Cache (Fast Mode)",
value=default_cache,
info="Use cached cat data for faster searches (uncheck for fresh data from APIs)"
)
# Chat interface for natural language input
chatbot = gr.Chatbot(label="Chat", height=200, type="messages")
user_input = gr.Textbox(
label="Describe your ideal cat",
placeholder="I'm looking for a friendly, playful kitten in NYC that's good with children...",
lines=3
)
with gr.Row():
submit_btn = gr.Button("🔍 Search", variant="primary")
clear_btn = gr.Button("🔄 Clear")
# Example queries
gr.Markdown("### 💡 Try these examples:")
with gr.Row():
example_btns = [
gr.Button("🏠 Family cat", size="sm"),
gr.Button("🎮 Playful kitten", size="sm"),
gr.Button("😴 Calm adult", size="sm"),
gr.Button("👶 Good with kids", size="sm")
]
# Results display
gr.Markdown("---")
gr.Markdown("## 🎯 Search Results")
results_html = gr.HTML(value="<p style='text-align:center; color: #999; padding: 40px;'>Enter your preferences above to start searching</p>")
# Profile display (collapsible)
with gr.Accordion("📋 Extracted Profile (for debugging)", open=False):
profile_display = gr.JSON(label="Profile Data")
# Wire up events
submit_btn.click(
fn=extract_profile_from_text,
inputs=[user_input, use_cache_checkbox],
outputs=[chatbot, results_html, profile_display]
)
user_input.submit(
fn=extract_profile_from_text,
inputs=[user_input, use_cache_checkbox],
outputs=[chatbot, results_html, profile_display]
)
clear_btn.click(
fn=lambda: ([], "<p style='text-align:center; color: #999; padding: 40px;'>Enter your preferences above to start searching</p>", ""),
outputs=[chatbot, results_html, profile_display]
)
# Example buttons
examples = [
"I want a friendly family cat in zip code 10001, good with children and dogs",
"Looking for a playful young kitten near New York City",
"I need a calm, affectionate adult cat that likes to cuddle",
"Show me cats good with children in the NYC area"
]
for btn, example in zip(example_btns, examples):
btn.click(
fn=search_with_examples,
inputs=[gr.State(example), use_cache_checkbox],
outputs=[chatbot, results_html, profile_display]
)
def build_alerts_tab() -> None:
"""Build the alerts management tab for scheduling email notifications."""
with gr.Column():
gr.Markdown("# 🔔 Manage Alerts")
gr.Markdown("Save your search and get notified when new matching cats are available!")
# Instructions
gr.Markdown("""
### How it works:
1. **Search** for cats using your preferred criteria in the Search tab
2. **Enter your email** below and choose notification frequency
3. **Save Alert** to start receiving notifications
You'll be notified when new cats matching your preferences become available!
""")
# Save Alert Section
gr.Markdown("### 💾 Save Current Search as Alert")
with gr.Row():
with gr.Column(scale=2):
email_input = gr.Textbox(
label="Email Address",
placeholder="your@email.com",
info="Where should we send notifications?"
)
with gr.Column(scale=1):
frequency_dropdown = gr.Dropdown(
label="Notification Frequency",
choices=["Immediately", "Daily", "Weekly"],
value="Daily",
info="How often to check for new matches"
)
with gr.Row():
save_btn = gr.Button("💾 Save Alert", variant="primary", scale=2)
profile_display = gr.JSON(
label="Current Search Profile",
value={},
visible=False,
scale=1
)
save_status = gr.Markdown("")
gr.Markdown("---")
# Manage Alerts Section
gr.Markdown("### 📋 Your Saved Alerts")
with gr.Row():
with gr.Column(scale=2):
email_filter_input = gr.Textbox(
label="Filter by Email (optional)",
placeholder="your@email.com"
)
with gr.Column(scale=1):
refresh_btn = gr.Button("🔄 Refresh", size="sm")
alerts_table = gr.Dataframe(
value=[], # Start empty - load on demand to avoid blocking UI startup
headers=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"],
datatype=["number", "str", "str", "str", "str", "str", "str"],
interactive=False,
wrap=True
)
# Alert Actions
gr.Markdown("### ⚙️ Manage Alert")
with gr.Row():
alert_id_input = gr.Textbox(
label="Alert ID",
placeholder="Enter Alert ID from table above",
scale=2
)
with gr.Column(scale=3):
with gr.Row():
toggle_btn = gr.Button("🔄 Toggle Active/Inactive", size="sm")
delete_btn = gr.Button("🗑️ Delete Alert", variant="stop", size="sm")
action_status = gr.Markdown("")
# Wire up events
save_btn.click(
fn=save_alert,
inputs=[email_input, frequency_dropdown, profile_display],
outputs=[save_status, alerts_table]
)
refresh_btn.click(
fn=load_alerts,
inputs=[email_filter_input],
outputs=[alerts_table]
)
email_filter_input.submit(
fn=load_alerts,
inputs=[email_filter_input],
outputs=[alerts_table]
)
toggle_btn.click(
fn=toggle_alert_status,
inputs=[alert_id_input, email_filter_input],
outputs=[action_status, alerts_table]
)
delete_btn.click(
fn=delete_alert,
inputs=[alert_id_input, email_filter_input],
outputs=[action_status, alerts_table]
)
def build_about_tab() -> None:
"""Build the about tab with Kyra's story and application info."""
with gr.Column():
gr.Markdown("# 🎩 About Tuxedo Link")
gr.Markdown("""
## In Loving Memory of Kyra 🐱
This application is dedicated to **Kyra**, a beloved companion who brought joy,
comfort, and unconditional love to our lives. Kyra was more than just a cat—
he was family, a friend, and a constant source of happiness.
### The Inspiration
Tuxedo Link was created to help others find their perfect feline companion,
just as Kyra found his way into our hearts. Every cat deserves a loving home,
and every person deserves the companionship of a wonderful cat like Kyra.
### The Technology
This application uses AI and machine learning to match prospective
adopters with their ideal cat:
- **Natural Language Processing**: Understand your preferences in plain English
- **Semantic Search**: Find cats based on personality, not just keywords
- **Multi-Source Aggregation**: Search across multiple adoption platforms
- **Smart Deduplication**: Remove duplicate listings using AI
- **Image Recognition**: Match cats visually using computer vision
- **Hybrid Matching**: Combine semantic understanding with structured filters
### Features
✅ **Multi-Platform Search**: Petfinder, RescueGroups
✅ **AI-Powered Matching**: Semantic search with vector embeddings
✅ **Smart Deduplication**: Name, description, and image similarity
✅ **Personality Matching**: Find cats that match your lifestyle
✅ **Location-Based**: Search near you with customizable radius
### Technical Stack
- **Frontend**: Gradio
- **Backend**: Python with Modal serverless
- **LLMs**: OpenAI GPT-4 for profile extraction
- **Vector DB**: ChromaDB with SentenceTransformers
- **Image AI**: CLIP for visual similarity
- **APIs**: Petfinder, RescueGroups, SendGrid
- **Database**: SQLite for caching and user management
### Open Source
Tuxedo Link is open source and built as part of the Andela LLM Engineering bootcamp.
Contributions and improvements are welcome!
### Acknowledgments
- **Petfinder**: For their comprehensive pet adoption API
- **RescueGroups**: For connecting rescues with adopters
- **Andela**: For the LLM Engineering bootcamp
- **Kyra**: For inspiring this project and bringing so much joy 💙
---
*"In memory of Kyra, who taught us that home is wherever your cat is."*
🐾 **May every cat find their perfect home** 🐾
""")
# Add Kyra's picture
with gr.Row():
with gr.Column():
gr.Image(
value="assets/Kyra.png",
label="Kyra - Forever in our hearts 💙",
show_label=True,
container=True,
width=400,
height=400,
show_download_button=False,
show_share_button=False,
interactive=False
)
def create_app() -> gr.Blocks:
"""
Create and configure the Gradio application.
Returns:
Configured Gradio Blocks application
"""
with gr.Blocks(
title="Tuxedo Link - Find Your Perfect Cat",
theme=gr.themes.Soft()
) as app:
gr.Markdown("""
<div style='text-align: center; padding: 20px;'>
<h1 style='font-size: 3em; margin: 0;'>🎩 Tuxedo Link</h1>
<p style='font-size: 1.2em; color: #666; margin: 10px 0;'>
AI-Powered Cat Adoption Search
</p>
</div>
""")
with gr.Tabs():
with gr.Tab("🔍 Search"):
build_search_tab()
with gr.Tab("🔔 Alerts"):
build_alerts_tab()
with gr.Tab("ℹ️ About"):
build_about_tab()
gr.Markdown("""
<div style='text-align: center; padding: 20px; color: #999; font-size: 0.9em;'>
Made with ❤️ in memory of Kyra |
<a href='https://github.com/yourusername/tuxedo-link' style='color: #2196F3;'>GitHub</a> |
Powered by AI & Open Source
</div>
""")
return app
if __name__ == "__main__":
app = create_app()
app.launch(
server_name="0.0.0.0",
server_port=7860,
share=False,
show_error=True
)


@@ -1,255 +0,0 @@
"""Main framework for Tuxedo Link cat adoption application."""
import logging
import sys
from typing import Optional
from dotenv import load_dotenv
from models.cats import CatProfile, SearchResult
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager
from setup_metadata_vectordb import MetadataVectorDB
from agents.planning_agent import PlanningAgent
from utils.config import get_db_path, get_vectordb_path
# Color codes for logging
BG_BLUE = '\033[44m'
WHITE = '\033[37m'
RESET = '\033[0m'
def init_logging() -> None:
"""Initialize logging with colored output for the framework."""
root = logging.getLogger()
root.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter(
"[%(asctime)s] [Tuxedo Link] [%(levelname)s] %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
handler.setFormatter(formatter)
root.addHandler(handler)
class TuxedoLinkFramework:
"""Main framework for Tuxedo Link cat adoption application."""
def __init__(self):
"""Initialize the Tuxedo Link framework."""
init_logging()
load_dotenv()
self.log("Initializing Tuxedo Link Framework...")
# Initialize database managers using config
db_path = get_db_path()
vectordb_path = get_vectordb_path()
self.db_manager = DatabaseManager(db_path)
self.vector_db = VectorDBManager(vectordb_path)
self.metadata_vectordb = MetadataVectorDB("metadata_vectorstore")
# Index colors and breeds from APIs for fuzzy matching
self._index_metadata()
# Lazy agent initialization
self.planner: Optional[PlanningAgent] = None
self.log("Tuxedo Link Framework initialized")
def _index_metadata(self) -> None:
"""Index colors and breeds from APIs into metadata vector DB for fuzzy matching."""
try:
from agents.petfinder_agent import PetfinderAgent
from agents.rescuegroups_agent import RescueGroupsAgent
self.log("Indexing colors and breeds for fuzzy matching...")
# Index Petfinder colors and breeds
try:
petfinder = PetfinderAgent()
colors = petfinder.get_valid_colors()
breeds = petfinder.get_valid_breeds()
if colors:
self.metadata_vectordb.index_colors(colors, source="petfinder")
if breeds:
self.metadata_vectordb.index_breeds(breeds, source="petfinder")
except Exception as e:
logging.warning(f"Could not index Petfinder metadata: {e}")
# Index RescueGroups colors and breeds
try:
rescuegroups = RescueGroupsAgent()
colors = rescuegroups.get_valid_colors()
breeds = rescuegroups.get_valid_breeds()
if colors:
self.metadata_vectordb.index_colors(colors, source="rescuegroups")
if breeds:
self.metadata_vectordb.index_breeds(breeds, source="rescuegroups")
except Exception as e:
logging.warning(f"Could not index RescueGroups metadata: {e}")
stats = self.metadata_vectordb.get_stats()
self.log(f"✓ Metadata indexed: {stats['colors_count']} colors, {stats['breeds_count']} breeds")
except Exception as e:
logging.warning(f"Metadata indexing failed: {e}")
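The actual color/breed matching goes through the embedding-based `MetadataVectorDB`; as a rough intuition for why indexing canonical names helps with typos like "tuxado" → "tuxedo", even a stdlib string-similarity pass catches them. A minimal sketch, assuming an illustrative color list (the app indexes real API lists):

```python
import difflib

# Illustrative canonical list; the real index holds API-provided colors/breeds.
valid_colors = ["tuxedo", "tabby", "calico", "tortoiseshell"]

# "tuxado" is a typo; the closest canonical name wins.
match = difflib.get_close_matches("tuxado", valid_colors, n=1, cutoff=0.6)
print(match)  # ['tuxedo']
```

The vector approach generalizes beyond character overlap (e.g. semantic near-synonyms), which is why the framework indexes embeddings rather than relying on string distance alone.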
def init_agents(self) -> None:
"""Initialize agents lazily on first search request."""
if not self.planner:
self.log("Initializing agent pipeline...")
self.planner = PlanningAgent(
self.db_manager,
self.vector_db,
self.metadata_vectordb
)
self.log("Agent pipeline ready")
def log(self, message: str) -> None:
"""
Log a message with framework identifier.
Args:
message: Message to log
"""
text = BG_BLUE + WHITE + "[Framework] " + message + RESET
logging.info(text)
def search(self, profile: CatProfile, use_cache: bool = False) -> SearchResult:
"""
Execute cat adoption search.
This runs the complete pipeline:
1. Fetch cats from APIs OR load from cache (if use_cache=True)
2. Deduplicate across sources (if fetching new)
3. Cache in database with image embeddings (if fetching new)
4. Update vector database (if fetching new)
5. Perform hybrid matching (semantic + metadata)
6. Return ranked results
Args:
profile: User's cat profile with preferences
use_cache: If True, use cached data instead of fetching from APIs.
This saves API calls during development/testing.
Returns:
SearchResult with matches and metadata
"""
self.init_agents()
return self.planner.search(profile, use_cache=use_cache)
def cleanup_old_data(self, days: int = 30) -> dict:
"""
Clean up data older than specified days.
Args:
days: Number of days to keep (default: 30)
Returns:
Dictionary with cleanup statistics
"""
self.init_agents()
return self.planner.cleanup_old_data(days)
def get_stats(self) -> dict:
"""
Get statistics about the application state.
Returns:
Dictionary with database and vector DB stats
"""
cache_stats = self.db_manager.get_cache_stats()
vector_stats = self.vector_db.get_stats()
return {
'database': cache_stats,
'vector_db': vector_stats
}
if __name__ == "__main__":
# Test the framework with a real search
print("\n" + "="*60)
print("Testing Tuxedo Link Framework")
print("="*60 + "\n")
framework = TuxedoLinkFramework()
# Create a test profile
print("Creating test profile...")
profile = CatProfile(
user_location="10001", # New York City
max_distance=50,
personality_description="friendly, playful cat good with children",
age_range=["young", "adult"],
good_with_children=True
)
print(f"\nProfile:")
print(f" Location: {profile.user_location}")
print(f" Distance: {profile.max_distance} miles")
print(f" Age: {', '.join(profile.age_range)}")
print(f" Personality: {profile.personality_description}")
print(f" Good with children: {profile.good_with_children}")
# Run search
print("\n" + "-"*60)
print("Running search pipeline...")
print("-"*60 + "\n")
result = framework.search(profile)
# Display results
print("\n" + "="*60)
print("SEARCH RESULTS")
print("="*60 + "\n")
print(f"Total cats found: {result.total_found}")
print(f"Sources queried: {', '.join(result.sources_queried)}")
print(f"Duplicates removed: {result.duplicates_removed}")
print(f"Matches returned: {len(result.matches)}")
print(f"Search time: {result.search_time:.2f} seconds")
if result.matches:
print("\n" + "-"*60)
print("TOP MATCHES")
print("-"*60 + "\n")
for i, match in enumerate(result.matches[:5], 1):
cat = match.cat
print(f"{i}. {cat.name}")
print(f" Breed: {cat.breed}")
print(f" Age: {cat.age} | Size: {cat.size} | Gender: {cat.gender}")
print(f" Location: {cat.city}, {cat.state}")
print(f" Match Score: {match.match_score:.2%}")
print(f" Explanation: {match.explanation}")
print(f" Source: {cat.source}")
print(f" URL: {cat.url}")
if cat.primary_photo:
print(f" Photo: {cat.primary_photo}")
print()
else:
print("\nNo matches found. Try adjusting your search criteria.")
# Show stats
print("\n" + "="*60)
print("SYSTEM STATISTICS")
print("="*60 + "\n")
stats = framework.get_stats()
print("Database:")
for key, value in stats['database'].items():
print(f" {key}: {value}")
print("\nVector Database:")
for key, value in stats['vector_db'].items():
print(f" {key}: {value}")
print("\n" + "="*60)
print("Test Complete!")
print("="*60 + "\n")


@@ -1,31 +0,0 @@
# Tuxedo Link Configuration
# Copy this file to config.yaml and adjust settings
# Email provider configuration
email:
provider: mailgun # Options: mailgun, sendgrid
from_name: "Tuxedo Link"
from_email: "noreply@tuxedolink.com"
# Mailgun configuration
mailgun:
domain: "sandboxfd631e04f8a941d5a5993a11227ea098.mailgun.org" # Your Mailgun domain
# API key from environment: MAILGUN_API_KEY
# SendGrid configuration (if using sendgrid provider)
sendgrid:
# API key from environment: SENDGRID_API_KEY
# kept for backwards compatibility
# Deployment configuration
deployment:
mode: local # Options: local, production
local:
db_path: "data/tuxedo_link.db"
vectordb_path: "cat_vectorstore"
production:
db_path: "/data/tuxedo_link.db"
vectordb_path: "/data/cat_vectorstore"
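For illustration, here is how the `deployment` block above might be consumed once parsed; `resolve_paths` is a hypothetical helper, not part of the repo:

```python
# Hypothetical helper showing how deployment.mode selects the path set.
def resolve_paths(config: dict) -> dict:
    mode = config["deployment"]["mode"]  # "local" or "production"
    section = config["deployment"][mode]
    return {"db_path": section["db_path"], "vectordb_path": section["vectordb_path"]}

# Parsed form of the YAML above (what yaml.safe_load would return).
config = {
    "deployment": {
        "mode": "local",
        "local": {"db_path": "data/tuxedo_link.db", "vectordb_path": "cat_vectorstore"},
        "production": {"db_path": "/data/tuxedo_link.db", "vectordb_path": "/data/cat_vectorstore"},
    }
}

print(resolve_paths(config)["db_path"])  # data/tuxedo_link.db
```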


@@ -1,6 +0,0 @@
"""Database layer for Tuxedo Link."""
from .manager import DatabaseManager
__all__ = ["DatabaseManager"]


@@ -1,382 +0,0 @@
"""Database manager for Tuxedo Link."""
import sqlite3
import json
import os
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Generator, Dict, Any
import numpy as np
from contextlib import contextmanager
from models.cats import Cat, AdoptionAlert, CatProfile
from .schema import initialize_database
class DatabaseManager:
"""Manages all database operations for Tuxedo Link."""
def __init__(self, db_path: str):
"""
Initialize the database manager.
Args:
db_path: Path to SQLite database file
"""
self.db_path = db_path
# Create database directory if it doesn't exist
db_dir = os.path.dirname(db_path)
if db_dir and not os.path.exists(db_dir):
os.makedirs(db_dir)
# Initialize database if it doesn't exist
if not os.path.exists(db_path):
initialize_database(db_path)
@contextmanager
def get_connection(self) -> Generator[sqlite3.Connection, None, None]:
"""
Context manager for database connections.
Yields:
SQLite database connection with row factory enabled
"""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row # Access columns by name
try:
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()
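The commit-on-success / rollback-on-error pattern above can be exercised in isolation. A self-contained sketch against an in-memory database (mirroring, not importing, `get_connection`):

```python
import sqlite3
from contextlib import contextmanager

# Standalone sketch of the same pattern used by DatabaseManager.get_connection.
@contextmanager
def connect(path: str = ":memory:"):
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # access columns by name
    try:
        yield conn
        conn.commit()     # commit only if the body raised nothing
    except Exception:
        conn.rollback()   # otherwise undo partial writes
        raise
    finally:
        conn.close()

with connect() as conn:
    conn.execute("CREATE TABLE demo (name TEXT)")
    conn.execute("INSERT INTO demo VALUES ('Kyra')")
    row = conn.execute("SELECT name FROM demo").fetchone()

print(row["name"])  # Kyra
```

Centralizing commit/rollback in one context manager keeps every query method above free of transaction boilerplate.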
# ===== ALERT OPERATIONS =====
def create_alert(self, alert: AdoptionAlert) -> int:
"""
Create a new adoption alert.
Args:
alert: AdoptionAlert object
Returns:
Alert ID
"""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""INSERT INTO alerts
(user_email, profile_json, frequency, last_sent, active, last_match_ids)
VALUES (?, ?, ?, ?, ?, ?)""",
(
alert.user_email,
alert.profile.model_dump_json(),
alert.frequency,
alert.last_sent.isoformat() if alert.last_sent else None,
alert.active,
json.dumps(alert.last_match_ids)
)
)
return cursor.lastrowid
def get_alert(self, alert_id: int) -> Optional[AdoptionAlert]:
"""Get alert by ID."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE id = ?""",
(alert_id,)
)
row = cursor.fetchone()
if row:
return self._row_to_alert(row)
return None
def get_alerts_by_email(self, email: str, active_only: bool = False) -> List[AdoptionAlert]:
"""
Get all alerts for a specific email address.
Args:
email: User email address
active_only: If True, only return active alerts
Returns:
List of AdoptionAlert objects
"""
with self.get_connection() as conn:
cursor = conn.cursor()
if active_only:
cursor.execute(
"""SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE user_email = ? AND active = 1
ORDER BY created_at DESC""",
(email,)
)
else:
cursor.execute(
"""SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE user_email = ?
ORDER BY created_at DESC""",
(email,)
)
return [self._row_to_alert(row) for row in cursor.fetchall()]
def get_all_alerts(self, active_only: bool = False) -> List[AdoptionAlert]:
"""
Get all alerts in the database.
Args:
active_only: If True, only return active alerts
Returns:
List of AdoptionAlert objects
"""
with self.get_connection() as conn:
cursor = conn.cursor()
if active_only:
query = """SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE active = 1
ORDER BY created_at DESC"""
else:
query = """SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts
ORDER BY created_at DESC"""
cursor.execute(query)
return [self._row_to_alert(row) for row in cursor.fetchall()]
def get_active_alerts(self) -> List[AdoptionAlert]:
"""Get all active alerts across all users."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE active = 1"""
)
return [self._row_to_alert(row) for row in cursor.fetchall()]
def get_alert_by_id(self, alert_id: int) -> Optional[AdoptionAlert]:
"""
Get a specific alert by its ID.
Args:
alert_id: Alert ID to retrieve
Returns:
AdoptionAlert object or None if not found
"""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""SELECT id, user_email, profile_json, frequency,
last_sent, active, created_at, last_match_ids
FROM alerts WHERE id = ?""",
(alert_id,)
)
row = cursor.fetchone()
return self._row_to_alert(row) if row else None
def update_alert(self, alert_id: int, **kwargs) -> None:
"""Update alert fields."""
allowed_fields = ['profile_json', 'frequency', 'last_sent', 'active', 'last_match_ids']
updates = []
values = []
for field, value in kwargs.items():
if field in allowed_fields:
updates.append(f"{field} = ?")
if field == 'last_sent' and isinstance(value, datetime):
values.append(value.isoformat())
elif field == 'last_match_ids':
values.append(json.dumps(value))
else:
values.append(value)
if updates:
values.append(alert_id)
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
f"UPDATE alerts SET {', '.join(updates)} WHERE id = ?",
values
)
def delete_alert(self, alert_id: int) -> None:
"""Delete an alert."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute("DELETE FROM alerts WHERE id = ?", (alert_id,))
def _row_to_alert(self, row: sqlite3.Row) -> AdoptionAlert:
"""
Convert database row to AdoptionAlert object.
Args:
row: SQLite row object from alerts table
Returns:
AdoptionAlert object with parsed JSON fields
"""
return AdoptionAlert(
id=row['id'],
user_email=row['user_email'],
profile=CatProfile.model_validate_json(row['profile_json']),
frequency=row['frequency'],
last_sent=datetime.fromisoformat(row['last_sent']) if row['last_sent'] else None,
active=bool(row['active']),
created_at=datetime.fromisoformat(row['created_at']) if row['created_at'] else datetime.now(),
last_match_ids=json.loads(row['last_match_ids']) if row['last_match_ids'] else []
)
# ===== CAT CACHE OPERATIONS =====
def cache_cat(self, cat: Cat, image_embedding: Optional[np.ndarray] = None) -> None:
"""
Cache a cat in the database.
Args:
cat: Cat object
image_embedding: Optional numpy array of image embedding
"""
with self.get_connection() as conn:
cursor = conn.cursor()
# Serialize image embedding if provided
embedding_bytes = None
if image_embedding is not None:
embedding_bytes = image_embedding.tobytes()
cursor.execute(
"""INSERT OR REPLACE INTO cats_cache
(id, fingerprint, source, data_json, image_embedding, fetched_at, is_duplicate, duplicate_of)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
(
cat.id,
cat.fingerprint,
cat.source,
cat.model_dump_json(),
embedding_bytes,
cat.fetched_at.isoformat(),
False,
None
)
)
def get_cached_cat(self, cat_id: str) -> Optional[Tuple[Cat, Optional[np.ndarray]]]:
"""Get a cat from cache by ID."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""SELECT data_json, image_embedding FROM cats_cache
WHERE id = ? AND is_duplicate = 0""",
(cat_id,)
)
row = cursor.fetchone()
if row:
cat = Cat.model_validate_json(row['data_json'])
embedding = None
if row['image_embedding']:
embedding = np.frombuffer(row['image_embedding'], dtype=np.float32)
return cat, embedding
return None
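Note that `tobytes()` stores only raw bytes: dtype and shape are discarded, so `get_cached_cat` must assume `float32` on read. A quick round-trip sketch (the 512-dim size mirrors a typical CLIP embedding and is just an assumption):

```python
import numpy as np

# Round-trip of the BLOB storage used by cache_cat / get_cached_cat.
embedding = np.random.rand(512).astype(np.float32)  # assumed CLIP-sized vector
blob = embedding.tobytes()                          # what goes into the BLOB column
restored = np.frombuffer(blob, dtype=np.float32)    # dtype must match the writer

assert restored.dtype == np.float32
assert np.array_equal(embedding, restored)
```

If the writer ever changes dtype, reads would silently reinterpret the bytes, so this implicit contract is worth keeping in mind.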
def get_cats_by_fingerprint(self, fingerprint: str) -> List[Tuple[Cat, Optional[np.ndarray]]]:
"""Get all cats with a specific fingerprint."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"""SELECT data_json, image_embedding FROM cats_cache
WHERE fingerprint = ? AND is_duplicate = 0
ORDER BY fetched_at ASC""",
(fingerprint,)
)
results = []
for row in cursor.fetchall():
cat = Cat.model_validate_json(row['data_json'])
embedding = None
if row['image_embedding']:
embedding = np.frombuffer(row['image_embedding'], dtype=np.float32)
results.append((cat, embedding))
return results
def mark_as_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
"""Mark a cat as duplicate of another."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"UPDATE cats_cache SET is_duplicate = 1, duplicate_of = ? WHERE id = ?",
(canonical_id, duplicate_id)
)
def get_all_cached_cats(self, exclude_duplicates: bool = True) -> List[Cat]:
"""Get all cached cats."""
with self.get_connection() as conn:
cursor = conn.cursor()
if exclude_duplicates:
cursor.execute(
"SELECT data_json FROM cats_cache WHERE is_duplicate = 0 ORDER BY fetched_at DESC"
)
else:
cursor.execute(
"SELECT data_json FROM cats_cache ORDER BY fetched_at DESC"
)
return [Cat.model_validate_json(row['data_json']) for row in cursor.fetchall()]
def cleanup_old_cats(self, days: int = 30) -> int:
"""
Remove cats older than specified days.
Args:
days: Number of days to keep
Returns:
Number of cats removed
"""
cutoff_date = (datetime.now() - timedelta(days=days)).isoformat()
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"DELETE FROM cats_cache WHERE fetched_at < ?",
(cutoff_date,)
)
return cursor.rowcount
def get_cache_stats(self) -> dict:
"""Get statistics about the cat cache."""
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM cats_cache WHERE is_duplicate = 0")
total = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(*) FROM cats_cache WHERE is_duplicate = 1")
duplicates = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(DISTINCT source) FROM cats_cache WHERE is_duplicate = 0")
sources = cursor.fetchone()[0]
cursor.execute("""
SELECT source, COUNT(*) as count
FROM cats_cache
WHERE is_duplicate = 0
GROUP BY source
""")
by_source = {row['source']: row['count'] for row in cursor.fetchall()}
return {
'total_unique': total,
'total_duplicates': duplicates,
'sources': sources,
'by_source': by_source
}


@@ -1,131 +0,0 @@
"""SQLite database schema for Tuxedo Link."""
import sqlite3
from typing import Optional
SCHEMA_VERSION = 2
# SQL statements for creating tables
CREATE_ALERTS_TABLE = """
CREATE TABLE IF NOT EXISTS alerts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_email TEXT NOT NULL,
profile_json TEXT NOT NULL,
frequency TEXT NOT NULL CHECK(frequency IN ('immediately', 'daily', 'weekly')),
last_sent TIMESTAMP,
active BOOLEAN DEFAULT 1,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_match_ids TEXT DEFAULT '[]'
);
"""
CREATE_CATS_CACHE_TABLE = """
CREATE TABLE IF NOT EXISTS cats_cache (
id TEXT PRIMARY KEY,
fingerprint TEXT NOT NULL,
source TEXT NOT NULL,
data_json TEXT NOT NULL,
image_embedding BLOB,
fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
is_duplicate BOOLEAN DEFAULT 0,
duplicate_of TEXT,
FOREIGN KEY (duplicate_of) REFERENCES cats_cache(id) ON DELETE SET NULL
);
"""
CREATE_SCHEMA_VERSION_TABLE = """
CREATE TABLE IF NOT EXISTS schema_version (
version INTEGER PRIMARY KEY,
applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""
# Index statements
CREATE_INDEXES = [
"CREATE INDEX IF NOT EXISTS idx_fingerprint ON cats_cache(fingerprint);",
"CREATE INDEX IF NOT EXISTS idx_source ON cats_cache(source);",
"CREATE INDEX IF NOT EXISTS idx_fetched_at ON cats_cache(fetched_at);",
"CREATE INDEX IF NOT EXISTS idx_is_duplicate ON cats_cache(is_duplicate);",
"CREATE INDEX IF NOT EXISTS idx_alerts_email ON alerts(user_email);",
"CREATE INDEX IF NOT EXISTS idx_alerts_active ON alerts(active);",
]
def initialize_database(db_path: str) -> None:
"""
Initialize the database with all tables and indexes.
Args:
db_path: Path to SQLite database file
"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
try:
# Create tables
cursor.execute(CREATE_ALERTS_TABLE)
cursor.execute(CREATE_CATS_CACHE_TABLE)
cursor.execute(CREATE_SCHEMA_VERSION_TABLE)
# Create indexes
for index_sql in CREATE_INDEXES:
cursor.execute(index_sql)
# Check and set schema version
cursor.execute("SELECT version FROM schema_version ORDER BY version DESC LIMIT 1")
result = cursor.fetchone()
if result is None:
cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
elif result[0] < SCHEMA_VERSION:
# Future: Add migration logic here
cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
conn.commit()
print(f"Database initialized successfully at {db_path}")
except Exception as e:
conn.rollback()
raise Exception(f"Failed to initialize database: {e}") from e
finally:
conn.close()
def drop_all_tables(db_path: str) -> None:
"""
Drop all tables (useful for testing).
Args:
db_path: Path to SQLite database file
"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
try:
cursor.execute("DROP TABLE IF EXISTS cats_cache")
cursor.execute("DROP TABLE IF EXISTS alerts")
cursor.execute("DROP TABLE IF EXISTS schema_version")
conn.commit()
print("All tables dropped successfully")
except Exception as e:
conn.rollback()
raise Exception(f"Failed to drop tables: {e}") from e
finally:
conn.close()
if __name__ == "__main__":
# For testing
import os
test_db = "test_database.db"
if os.path.exists(test_db):
os.remove(test_db)
initialize_database(test_db)
print(f"Test database created at {test_db}")


@@ -1,147 +0,0 @@
#!/bin/bash
set -e

# Colors
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

echo "=========================================="
echo " Tuxedo Link - Modal Deployment"
echo "=========================================="
echo ""

# Check Modal is installed
if ! command -v modal &> /dev/null; then
    echo -e "${RED}Error: modal CLI not found${NC}"
    echo "Install with: pip install modal"
    exit 1
fi

# Check Modal auth
echo -e "${BLUE}Checking Modal authentication...${NC}"
if ! uv run python -m modal app list &>/dev/null; then
    echo -e "${RED}Error: Modal authentication not configured${NC}"
    echo "Run: uv run python -m modal setup"
    exit 1
fi
echo -e "${GREEN}✓ Modal authenticated${NC}"
echo ""

# Check config.yaml exists
if [ ! -f "config.yaml" ]; then
    echo -e "${RED}Error: config.yaml not found${NC}"
    echo "Copy config.example.yaml to config.yaml and configure it"
    exit 1
fi

echo -e "${BLUE}Step 1: Validating configuration...${NC}"
python -c "
import yaml
import sys

try:
    config = yaml.safe_load(open('config.yaml'))
    if config['deployment']['mode'] != 'production':
        print('❌ Error: Set deployment.mode to \"production\" in config.yaml for deployment')
        sys.exit(1)
    print('✓ Configuration valid')
except Exception as e:
    print(f'❌ Error reading config: {e}')
    sys.exit(1)
"
if [ $? -ne 0 ]; then
    exit 1
fi
echo ""

echo -e "${BLUE}Step 2: Setting up Modal secrets...${NC}"

# Check if required environment variables are set
if [ -z "$OPENAI_API_KEY" ] || [ -z "$PETFINDER_API_KEY" ] || [ -z "$MAILGUN_API_KEY" ]; then
    echo -e "${YELLOW}Warning: Some environment variables are not set.${NC}"
    echo "Make sure the following are set in your environment or .env file:"
    echo " - OPENAI_API_KEY"
    echo " - PETFINDER_API_KEY"
    echo " - PETFINDER_SECRET"
    echo " - RESCUEGROUPS_API_KEY"
    echo " - MAILGUN_API_KEY"
    echo " - SENDGRID_API_KEY (optional)"
    echo ""
    read -p "Continue anyway? (y/N) " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        exit 1
    fi
fi

# Load .env if it exists
if [ -f ".env" ]; then
    export $(cat .env | grep -v '^#' | xargs)
fi

modal secret create tuxedo-link-secrets \
    OPENAI_API_KEY="${OPENAI_API_KEY}" \
    PETFINDER_API_KEY="${PETFINDER_API_KEY}" \
    PETFINDER_SECRET="${PETFINDER_SECRET}" \
    RESCUEGROUPS_API_KEY="${RESCUEGROUPS_API_KEY}" \
    MAILGUN_API_KEY="${MAILGUN_API_KEY}" \
    SENDGRID_API_KEY="${SENDGRID_API_KEY:-}" \
    --force 2>/dev/null || echo -e "${GREEN}✓ Secrets updated${NC}"
echo ""

echo -e "${BLUE}Step 3: Creating Modal volume...${NC}"
modal volume create tuxedo-link-data 2>/dev/null && echo -e "${GREEN}✓ Volume created${NC}" || echo -e "${GREEN}✓ Volume already exists${NC}"
echo ""

echo -e "${BLUE}Step 4: Copying config to Modal volume...${NC}"
# Create scripts directory if it doesn't exist
mkdir -p scripts
# Upload config.yaml to Modal volume
python scripts/upload_config_to_modal.py
echo ""

echo -e "${BLUE}Step 5: Deploying Modal API...${NC}"
modal deploy modal_services/modal_api.py
echo ""

echo -e "${BLUE}Step 6: Deploying scheduled search service...${NC}"
modal deploy modal_services/scheduled_search.py
echo ""

echo "=========================================="
echo -e " ${GREEN}Deployment Complete!${NC}"
echo "=========================================="
echo ""
echo "Deployed services:"
echo ""
echo "📡 Modal API (tuxedo-link-api):"
echo " - search_cats()"
echo " - extract_profile()"
echo " - create_alert_and_notify()"
echo " - get_alerts()"
echo " - update_alert()"
echo " - delete_alert()"
echo " - health_check()"
echo ""
echo "⏰ Scheduled Jobs (tuxedo-link-scheduled-search):"
echo " - daily_search_job (9 AM UTC daily)"
echo " - weekly_search_job (Monday 9 AM UTC)"
echo " - weekly_cleanup_job (Sunday 2 AM UTC)"
echo ""
echo "Useful commands:"
echo "  API logs:       modal app logs tuxedo-link-api --follow"
echo "  Schedule logs:  modal app logs tuxedo-link-scheduled-search --follow"
echo "  View apps:      modal app list"
echo "  View volumes:   modal volume list"
echo "  View secrets:   modal secret list"
echo ""
echo "Next steps:"
echo "  1. Run UI: ./run.sh"
echo "  2. Go to: http://localhost:7860"
echo "  3. Test search and alerts!"
echo "=========================================="

View File

@@ -1,68 +0,0 @@
## 🚀 Modal Deployment Guide
This guide explains how to deploy Tuxedo Link to Modal for production use.
---
## 🏗️ Production Architecture
In production mode, Tuxedo Link uses a **hybrid architecture**:
### Component Distribution
**Local (Your Computer)**:
- Gradio UI (`app.py`) - User interface only
- No heavy ML models loaded
- Fast startup
**Modal (Cloud)**:
- `modal_api.py` - Main API functions (profile extraction, search, alerts)
- `scheduled_search.py` - Scheduled jobs (daily/weekly alerts, cleanup)
- Database (SQLite on Modal volume)
- Vector DB (ChromaDB on Modal volume)
- All ML models (GPT-4, SentenceTransformer, CLIP)
### Communication Flow
```
User → Gradio UI (local) → modal.Function.from_name().remote() → Modal API → Response → UI
```
**Key Functions Exposed by Modal**:
1. `extract_profile` - Convert natural language to CatProfile
2. `search_cats` - Execute complete search pipeline
3. `create_alert_and_notify` - Create alert with optional immediate email
4. `get_alerts` / `update_alert` / `delete_alert` - Alert management
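The dispatch step in that flow can be sketched as a thin routing helper. Only the `modal.Function.from_name(...).remote(...)` call path and the `"tuxedo-link-api"` app name come from this document; the local-stub fallback and `call_backend` name are illustrative:

```python
from typing import Any, Callable, Dict, Optional

def call_backend(
    fn_name: str,
    payload: Dict[str, Any],
    mode: str = "local",
    local_impls: Optional[Dict[str, Callable]] = None,
) -> Any:
    """Route a UI request either to a deployed Modal function or a local stub."""
    if mode == "production":
        import modal  # imported lazily so local development needs no Modal auth
        fn = modal.Function.from_name("tuxedo-link-api", fn_name)
        return fn.remote(payload)
    # Development mode: call an in-process implementation instead.
    return local_impls[fn_name](payload)

# e.g. call_backend("search_cats", profile.model_dump(), mode="production")
```

Keeping the Modal import inside the production branch is what lets the Gradio UI start quickly with no heavy dependencies loaded, as described above.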
---
## 📋 Quick Start (Automated Deployment)
The easiest way to deploy is using the automated deployment script:
```bash
cd week8/community_contributions/dkisselev-zz/tuxedo_link
# 1. Configure config.yaml for production
cp config.example.yaml config.yaml
# Edit config.yaml and set deployment.mode to 'production'
# 2. Ensure environment variables are set
# Load from .env or set manually:
export OPENAI_API_KEY=sk-...
export PETFINDER_API_KEY=...
export PETFINDER_SECRET=...
export RESCUEGROUPS_API_KEY=...
export MAILGUN_API_KEY=...
# 3. Run deployment script
./deploy.sh
```
The script will automatically:
- ✅ Validate Modal authentication
- ✅ Check configuration
- ✅ Create/update Modal secrets
- ✅ Create Modal volume
- ✅ Upload config.yaml to Modal
- ✅ Deploy the Modal API and scheduled search services

View File

@@ -1,487 +0,0 @@
# 🏗️ Tuxedo Link - Architecture Diagrams
**Date**: October 27, 2024
**Tool**: [Eraser.io](https://www.eraser.io/)
---
## System Architecture
The diagrams below can be rendered on [Eraser.io](https://www.eraser.io/) or any diagramming tool that supports the Eraser syntax.
### High-Level Architecture
```eraser
// Tuxedo Link - High-Level System Architecture
// External APIs
openai [icon: openai, color: green]
petfinder [icon: api, color: blue]
rescuegroups [icon: api, color: blue]
sendgrid [icon: email, color: red]
// Frontend Layer
gradio [icon: browser, color: purple] {
search_tab
alerts_tab
about_tab
}
// Application Layer
framework [icon: server, color: orange] {
TuxedoLinkFramework
}
// Agent Layer
agents [icon: users, color: cyan] {
PlanningAgent
ProfileAgent
PetfinderAgent
RescueGroupsAgent
DeduplicationAgent
MatchingAgent
EmailAgent
}
// Data Layer
databases [icon: database, color: gray] {
SQLite
ChromaDB
}
// Deployment
modal [icon: cloud, color: blue] {
scheduled_jobs
volume_storage
}
// Connections
gradio > framework: User requests
framework > agents: Orchestrate
agents > openai: Profile extraction
agents > petfinder: Search cats
agents > rescuegroups: Search cats
agents > sendgrid: Send notifications
agents > databases: Store/retrieve
framework > databases: Manage data
modal > framework: Scheduled searches
modal > databases: Persistent storage
```
---
## Detailed Component Architecture
```eraser
// Tuxedo Link - Detailed Component Architecture
// Users
user [icon: user, color: purple]
// Frontend - Gradio UI
ui_layer [color: #E8F5E9] {
gradio_app [label: "Gradio Application"]
search_interface [label: "Search Tab"]
alerts_interface [label: "Alerts Tab"]
about_interface [label: "About Tab"]
gradio_app > search_interface
gradio_app > alerts_interface
gradio_app > about_interface
}
// Framework Layer
framework_layer [color: #FFF3E0] {
tuxedo_framework [label: "TuxedoLinkFramework", icon: server]
user_manager [label: "UserManager", icon: user]
tuxedo_framework > user_manager
}
// Orchestration Layer
orchestration [color: #E3F2FD] {
planning_agent [label: "PlanningAgent\n(Orchestrator)", icon: brain]
}
// Processing Agents
processing_agents [color: #F3E5F5] {
profile_agent [label: "ProfileAgent\n(GPT-4)", icon: chat]
matching_agent [label: "MatchingAgent\n(Hybrid Search)", icon: search]
dedup_agent [label: "DeduplicationAgent\n(Fingerprint+CLIP)", icon: filter]
}
// External Integration Agents
external_agents [color: #E0F2F1] {
petfinder_agent [label: "PetfinderAgent\n(OAuth)", icon: api]
rescuegroups_agent [label: "RescueGroupsAgent\n(API Key)", icon: api]
email_agent [label: "EmailAgent\n(SendGrid)", icon: email]
}
// Data Storage
storage_layer [color: #ECEFF1] {
sqlite_db [label: "SQLite Database", icon: database]
vector_db [label: "ChromaDB\n(Vector Store)", icon: database]
db_tables [label: "Tables"] {
users_table [label: "users"]
alerts_table [label: "alerts"]
cats_cache_table [label: "cats_cache"]
}
vector_collections [label: "Collections"] {
cats_collection [label: "cats_embeddings"]
}
sqlite_db > db_tables
vector_db > vector_collections
}
// External Services
external_services [color: #FFEBEE] {
openai_api [label: "OpenAI API\n(GPT-4)", icon: openai]
petfinder_api [label: "Petfinder API\n(OAuth 2.0)", icon: api]
rescuegroups_api [label: "RescueGroups API\n(API Key)", icon: api]
sendgrid_api [label: "SendGrid API\n(Email)", icon: email]
}
// Deployment Layer
deployment [color: #E8EAF6] {
modal_service [label: "Modal (Serverless)", icon: cloud]
modal_functions [label: "Functions"] {
daily_job [label: "daily_search_job"]
weekly_job [label: "weekly_search_job"]
cleanup_job [label: "cleanup_job"]
}
modal_storage [label: "Storage"] {
volume [label: "Modal Volume\n(/data)"]
}
modal_service > modal_functions
modal_service > modal_storage
}
// User Flows
user > ui_layer: Interact
ui_layer > framework_layer: API calls
framework_layer > orchestration: Search request
// Orchestration Flow
orchestration > processing_agents: Extract profile
orchestration > external_agents: Fetch cats
orchestration > processing_agents: Deduplicate
orchestration > processing_agents: Match & rank
orchestration > storage_layer: Cache results
// Agent to External Services
processing_agents > external_services: Profile extraction
external_agents > external_services: API requests
external_agents > external_services: Send emails
// Agent to Storage
processing_agents > storage_layer: Store/retrieve
external_agents > storage_layer: Cache & embeddings
orchestration > storage_layer: Query & update
// Modal Integration
deployment > framework_layer: Scheduled tasks
deployment > storage_layer: Persistent data
```
---
## Data Flow Diagram
```eraser
// Tuxedo Link - Search Data Flow
user [icon: user]
// Step 1: User Input
user_input [label: "1. User Input\n'friendly playful cat\nin NYC'"]
// Step 2: Profile Extraction
profile_extraction [label: "2. Profile Agent\n(OpenAI GPT-4)", icon: chat, color: purple]
extracted_profile [label: "CatProfile\n- location: NYC\n- age: young\n- personality: friendly"]
// Step 3: API Fetching (Parallel)
api_fetch [label: "3. Fetch from APIs\n(Parallel)", icon: api, color: blue]
petfinder_results [label: "Petfinder\n50 cats"]
rescuegroups_results [label: "RescueGroups\n50 cats"]
// Step 4: Deduplication
dedup [label: "4. Deduplication\n(3-tier)", icon: filter, color: orange]
dedup_details [label: "- Fingerprint\n- Text similarity\n- Image similarity"]
// Step 5: Cache & Embed
cache [label: "5. Cache & Embed", icon: database, color: gray]
sqlite_cache [label: "SQLite\n(Cat data)"]
vector_store [label: "ChromaDB\n(Embeddings)"]
// Step 6: Hybrid Matching
matching [label: "6. Hybrid Search\n60% vector\n40% metadata", icon: search, color: green]
// Step 7: Results
results [label: "7. Ranked Results\nTop 20 matches"]
// Step 8: Display
display [label: "8. Display to User\nwith explanations", icon: browser, color: purple]
// Flow connections
user > user_input
user_input > profile_extraction
profile_extraction > extracted_profile
extracted_profile > api_fetch
api_fetch > petfinder_results
api_fetch > rescuegroups_results
petfinder_results > dedup
rescuegroups_results > dedup
dedup > dedup_details
dedup > cache
cache > sqlite_cache
cache > vector_store
sqlite_cache > matching
vector_store > matching
matching > results
results > display
display > user
```
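The 3-tier deduplication step in the flow above combines fingerprint, text, and image similarity into a single decision. A hedged sketch of the composite check (the equal weighting is an assumption; 0.85 matches the `DEDUP_COMPOSITE_THRESHOLD` default from the example configuration):

```python
def is_duplicate(
    name_sim: float,
    desc_sim: float,
    image_sim: float,
    threshold: float = 0.85,
) -> bool:
    """Treat two listings as the same cat when combined similarity is high."""
    # Equal weights are an assumption; the real agent may weight image matches
    # more heavily, since CLIP embeddings are the strongest duplicate signal.
    composite = (name_sim + desc_sim + image_sim) / 3
    return composite >= threshold

# A cat cross-posted to Petfinder and RescueGroups with near-identical photos:
# is_duplicate(0.9, 0.8, 0.95) -> True
```

A composite score is more robust than any single tier: a renamed cross-posting fails the name check but still scores high on description and image similarity.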
---
## Agent Interaction Diagram
```eraser
// Tuxedo Link - Agent Interactions
// Planning Agent (Orchestrator)
planner [label: "PlanningAgent\n(Orchestrator)", icon: brain, color: orange]
// Worker Agents
profile [label: "ProfileAgent", icon: chat, color: purple]
petfinder [label: "PetfinderAgent", icon: api, color: blue]
rescue [label: "RescueGroupsAgent", icon: api, color: blue]
dedup [label: "DeduplicationAgent", icon: filter, color: cyan]
matching [label: "MatchingAgent", icon: search, color: green]
email [label: "EmailAgent", icon: email, color: red]
// Data Stores
db [label: "DatabaseManager", icon: database, color: gray]
vectordb [label: "VectorDBManager", icon: database, color: gray]
// External
openai [label: "OpenAI API", icon: openai, color: green]
apis [label: "External APIs", icon: api, color: blue]
sendgrid [label: "SendGrid", icon: email, color: red]
// Orchestration
planner > profile: 1. Extract preferences
profile > openai: API call
openai > profile: Structured output
profile > planner: CatProfile
planner > petfinder: 2. Search (parallel)
planner > rescue: 2. Search (parallel)
petfinder > apis: API request
rescue > apis: API request
apis > petfinder: Cat data
apis > rescue: Cat data
petfinder > planner: Cats list
rescue > planner: Cats list
planner > dedup: 3. Remove duplicates
dedup > db: Check cache
db > dedup: Cached embeddings
dedup > planner: Unique cats
planner > db: 4. Cache results
planner > vectordb: 5. Update embeddings
planner > matching: 6. Find matches
matching > vectordb: Vector search
matching > db: Metadata filter
vectordb > matching: Similar cats
db > matching: Filtered cats
matching > planner: Ranked matches
planner > email: 7. Send notifications (if alert)
email > sendgrid: API call
sendgrid > email: Delivery status
```
---
## Deployment Architecture
```eraser
// Tuxedo Link - Modal Deployment
// Local Development
local [label: "Local Development", icon: laptop, color: purple] {
gradio_dev [label: "Gradio UI\n:7860"]
dev_db [label: "SQLite DB\n./data/"]
dev_vector [label: "ChromaDB\n./cat_vectorstore/"]
}
// Modal Cloud
modal [label: "Modal Cloud", icon: cloud, color: blue] {
// Scheduled Functions
scheduled [label: "Scheduled Functions"] {
daily [label: "daily_search_job\nCron: 0 9 * * *"]
weekly [label: "weekly_search_job\nCron: 0 9 * * 1"]
cleanup [label: "cleanup_job\nCron: 0 2 * * 0"]
}
// On-Demand Functions
ondemand [label: "On-Demand"] {
manual_search [label: "run_scheduled_searches()"]
manual_cleanup [label: "cleanup_old_data()"]
}
// Storage
storage [label: "Modal Volume\n/data"] {
vol_db [label: "tuxedo_link.db"]
vol_vector [label: "cat_vectorstore/"]
}
// Secrets
secrets [label: "Secrets"] {
api_keys [label: "- OPENAI_API_KEY\n- PETFINDER_*\n- RESCUEGROUPS_*\n- SENDGRID_*"]
}
}
// External Services
external [label: "External Services", icon: cloud, color: red] {
openai [label: "OpenAI"]
petfinder [label: "Petfinder"]
rescue [label: "RescueGroups"]
sendgrid [label: "SendGrid"]
}
// Connections
local > modal: Deploy
modal > storage: Persistent data
modal > secrets: Load keys
scheduled > storage: Read/Write
ondemand > storage: Read/Write
modal > external: API calls
```
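The `cleanup_job` shown above prunes stale cached listings from the Modal volume. A minimal sketch of what that job might execute, assuming the `cats_cache.fetched_at` column from the schema and the `TTL_DAYS=30` default from the example configuration (the function name and exact SQL are illustrative):

```python
import sqlite3

def cleanup_old_cats(conn: sqlite3.Connection, ttl_days: int = 30) -> int:
    """Delete cached cats not refreshed within the TTL window; return rows removed."""
    cursor = conn.execute(
        # SQLite accepts a bound datetime modifier such as '-30 days'.
        "DELETE FROM cats_cache WHERE fetched_at < datetime('now', ?)",
        (f"-{ttl_days} days",),
    )
    conn.commit()
    return cursor.rowcount
```

Running this weekly (the `0 2 * * 0` cron slot above) keeps the volume from growing without bound while leaving recently fetched listings available for cached searches.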
---
## Database Schema
```eraser
// Tuxedo Link - Database Schema
// Users Table
users [icon: table, color: blue] {
id [label: "id: INTEGER PK"]
email [label: "email: TEXT UNIQUE"]
password_hash [label: "password_hash: TEXT"]
created_at [label: "created_at: DATETIME"]
last_login [label: "last_login: DATETIME"]
}
// Alerts Table
alerts [icon: table, color: green] {
aid [label: "id: INTEGER PK"]
user_id [label: "user_id: INTEGER FK"]
user_email [label: "user_email: TEXT"]
profile_json [label: "profile_json: TEXT"]
frequency [label: "frequency: TEXT"]
last_sent [label: "last_sent: DATETIME"]
active [label: "active: INTEGER"]
created_at [label: "created_at: DATETIME"]
last_match_ids [label: "last_match_ids: TEXT"]
}
// Cats Cache Table
cats_cache [icon: table, color: orange] {
cid [label: "id: TEXT PK"]
name [label: "name: TEXT"]
breed [label: "breed: TEXT"]
age [label: "age: TEXT"]
gender [label: "gender: TEXT"]
size [label: "size: TEXT"]
organization_name [label: "organization_name: TEXT"]
city [label: "city: TEXT"]
state [label: "state: TEXT"]
source [label: "source: TEXT"]
url [label: "url: TEXT"]
cat_json [label: "cat_json: TEXT"]
fingerprint [label: "fingerprint: TEXT"]
image_embedding [label: "image_embedding: BLOB"]
is_duplicate [label: "is_duplicate: INTEGER"]
duplicate_of [label: "duplicate_of: TEXT"]
fetched_at [label: "fetched_at: DATETIME"]
created_at [label: "created_at: DATETIME"]
}
// ChromaDB Collection
vector_collection [icon: database, color: purple] {
cats_embeddings [label: "Collection: cats_embeddings"]
embedding_dim [label: "Dimensions: 384"]
model [label: "Model: all-MiniLM-L6-v2"]
metadata [label: "Metadata: name, breed, age, etc."]
}
// Relationships
users > alerts: user_id
alerts > cats_cache: Search results
cats_cache > vector_collection: Embeddings
```
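The schema above translates directly into DDL. A sketch of the two core tables, executed against an in-memory SQLite database (column names and types follow the diagram; any constraints beyond the PK/FK shown are assumptions):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS alerts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    user_email TEXT,
    profile_json TEXT,
    frequency TEXT,
    last_sent DATETIME,
    active INTEGER,
    created_at DATETIME,
    last_match_ids TEXT
);
CREATE TABLE IF NOT EXISTS cats_cache (
    id TEXT PRIMARY KEY,
    name TEXT,
    breed TEXT,
    age TEXT,
    gender TEXT,
    size TEXT,
    organization_name TEXT,
    city TEXT,
    state TEXT,
    source TEXT,
    url TEXT,
    cat_json TEXT,
    fingerprint TEXT,
    image_embedding BLOB,
    is_duplicate INTEGER,
    duplicate_of TEXT,
    fetched_at DATETIME,
    created_at DATETIME
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

Note that `cats_cache.id` is TEXT rather than INTEGER: cat IDs are namespaced by source (Petfinder vs. RescueGroups), so a string key avoids cross-platform collisions.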
---
## Diagram Types Included
1. **System Architecture** - High-level overview of all components
2. **Detailed Component Architecture** - Deep dive into layers and connections
3. **Data Flow Diagram** - Step-by-step search process
4. **Agent Interaction Diagram** - How agents communicate
5. **Deployment Architecture** - Modal cloud deployment
6. **Database Schema** - Data model and relationships
---
## Architecture Highlights
### Layered Architecture
```
┌─────────────────────────────────────┐
│ Frontend Layer (Gradio UI) │
├─────────────────────────────────────┤
│ Framework Layer (Orchestration) │
├─────────────────────────────────────┤
│ Agent Layer (7 Specialized Agents) │
├─────────────────────────────────────┤
│ Data Layer (SQLite + ChromaDB) │
├─────────────────────────────────────┤
│ External APIs (4 Services) │
└─────────────────────────────────────┘
```
### Key Design Patterns
- **Agent Pattern**: Specialized agents for different tasks
- **Orchestrator Pattern**: Planning agent coordinates workflow
- **Repository Pattern**: DatabaseManager abstracts data access
- **Strategy Pattern**: Different search strategies (Petfinder, RescueGroups)
- **Decorator Pattern**: Rate limiting and timing decorators
- **Observer Pattern**: Scheduled jobs watch for new alerts
### Technology Stack
- **Frontend**: Gradio
- **Backend**: Python 3.12
- **Framework**: Custom agent-based
- **Databases**: SQLite, ChromaDB
- **AI/ML**: OpenAI GPT-4, CLIP, SentenceTransformers
- **Deployment**: Modal (serverless)
- **APIs**: Petfinder, RescueGroups, SendGrid

View File

@@ -1,55 +0,0 @@
// Tuxedo Link - Agent Interactions
// Planning Agent (Orchestrator)
planner [label: "PlanningAgent\n(Orchestrator)", icon: brain, color: orange]
// Worker Agents
profile [label: "ProfileAgent", icon: chat, color: purple]
petfinder [label: "PetfinderAgent", icon: api, color: blue]
rescue [label: "RescueGroupsAgent", icon: api, color: blue]
dedup [label: "DeduplicationAgent", icon: filter, color: cyan]
matching [label: "MatchingAgent", icon: search, color: green]
email [label: "EmailAgent", icon: email, color: red]
// Data Stores
db [label: "DatabaseManager", icon: database, color: gray]
vectordb [label: "VectorDBManager", icon: database, color: gray]
// External
openai [label: "OpenAI API", icon: openai, color: green]
apis [label: "External APIs", icon: api, color: blue]
sendgrid [label: "SendGrid", icon: email, color: red]
// Orchestration
planner > profile: 1. Extract preferences
profile > openai: API call
openai > profile: Structured output
profile > planner: CatProfile
planner > petfinder: 2. Search (parallel)
planner > rescue: 2. Search (parallel)
petfinder > apis: API request
rescue > apis: API request
apis > petfinder: Cat data
apis > rescue: Cat data
petfinder > planner: Cats list
rescue > planner: Cats list
planner > dedup: 3. Remove duplicates
dedup > db: Check cache
db > dedup: Cached embeddings
dedup > planner: Unique cats
planner > db: 4. Cache results
planner > vectordb: 5. Update embeddings
planner > matching: 6. Find matches
matching > vectordb: Vector search
matching > db: Metadata filter
vectordb > matching: Similar cats
db > matching: Filtered cats
matching > planner: Ranked matches
planner > email: 7. Send notifications (if alert)
email > sendgrid: API call
sendgrid > email: Delivery status

File diff suppressed because one or more lines are too long


View File

@@ -1,114 +0,0 @@
// Tuxedo Link - Detailed Component Architecture
// Users
user [icon: user, color: purple]
// Frontend - Gradio UI
ui_layer [color: #E8F5E9] {
gradio_app [label: "Gradio Application"]
search_interface [label: "Search Tab"]
alerts_interface [label: "Alerts Tab"]
about_interface [label: "About Tab"]
gradio_app > search_interface
gradio_app > alerts_interface
gradio_app > about_interface
}
// Framework Layer
framework_layer [color: #FFF3E0] {
tuxedo_framework [label: "TuxedoLinkFramework", icon: server]
user_manager [label: "UserManager", icon: user]
tuxedo_framework > user_manager
}
// Orchestration Layer
orchestration [color: #E3F2FD] {
planning_agent [label: "PlanningAgent\n(Orchestrator)", icon: brain]
}
// Processing Agents
processing_agents [color: #F3E5F5] {
profile_agent [label: "ProfileAgent\n(GPT-4)", icon: chat]
matching_agent [label: "MatchingAgent\n(Hybrid Search)", icon: search]
dedup_agent [label: "DeduplicationAgent\n(Fingerprint+CLIP)", icon: filter]
}
// External Integration Agents
external_agents [color: #E0F2F1] {
petfinder_agent [label: "PetfinderAgent\n(OAuth)", icon: api]
rescuegroups_agent [label: "RescueGroupsAgent\n(API Key)", icon: api]
email_agent [label: "EmailAgent\n(SendGrid)", icon: email]
}
// Data Storage
storage_layer [color: #ECEFF1] {
sqlite_db [label: "SQLite Database", icon: database]
vector_db [label: "ChromaDB\n(Vector Store)", icon: database]
db_tables [label: "Tables"] {
users_table [label: "users"]
alerts_table [label: "alerts"]
cats_cache_table [label: "cats_cache"]
}
vector_collections [label: "Collections"] {
cats_collection [label: "cats_embeddings"]
}
sqlite_db > db_tables
vector_db > vector_collections
}
// External Services
external_services [color: #FFEBEE] {
openai_api [label: "OpenAI API\n(GPT-4)", icon: openai]
petfinder_api [label: "Petfinder API\n(OAuth 2.0)", icon: api]
rescuegroups_api [label: "RescueGroups API\n(API Key)", icon: api]
sendgrid_api [label: "SendGrid API\n(Email)", icon: email]
}
// Deployment Layer
deployment [color: #E8EAF6] {
modal_service [label: "Modal (Serverless)", icon: cloud]
modal_functions [label: "Functions"] {
daily_job [label: "daily_search_job"]
weekly_job [label: "weekly_search_job"]
cleanup_job [label: "cleanup_job"]
}
modal_storage [label: "Storage"] {
volume [label: "Modal Volume\n(/data)"]
}
modal_service > modal_functions
modal_service > modal_storage
}
// User Flows
user > ui_layer: Interact
ui_layer > framework_layer: API calls
framework_layer > orchestration: Search request
// Orchestration Flow
orchestration > processing_agents: Extract profile
orchestration > external_agents: Fetch cats
orchestration > processing_agents: Deduplicate
orchestration > processing_agents: Match & rank
orchestration > storage_layer: Cache results
// Agent to External Services
processing_agents > external_services: Profile extraction
external_agents > external_services: API requests
external_agents > external_services: Send emails
// Agent to Storage
processing_agents > storage_layer: Store/retrieve
external_agents > storage_layer: Cache & embeddings
orchestration > storage_layer: Query & update
// Modal Integration
deployment > framework_layer: Scheduled tasks
deployment > storage_layer: Persistent data

File diff suppressed because one or more lines are too long


View File

@@ -1,58 +0,0 @@
// Tuxedo Link - Database Schema
// Users Table
users [icon: table, color: blue] {
id [label: "id: INTEGER PK"]
email [label: "email: TEXT UNIQUE"]
password_hash [label: "password_hash: TEXT"]
created_at [label: "created_at: DATETIME"]
last_login [label: "last_login: DATETIME"]
}
// Alerts Table
alerts [icon: table, color: green] {
aid [label: "id: INTEGER PK"]
user_id [label: "user_id: INTEGER FK"]
user_email [label: "user_email: TEXT"]
profile_json [label: "profile_json: TEXT"]
frequency [label: "frequency: TEXT"]
last_sent [label: "last_sent: DATETIME"]
active [label: "active: INTEGER"]
created_at [label: "created_at: DATETIME"]
last_match_ids [label: "last_match_ids: TEXT"]
}
// Cats Cache Table
cats_cache [icon: table, color: orange] {
cid [label: "id: TEXT PK"]
name [label: "name: TEXT"]
breed [label: "breed: TEXT"]
age [label: "age: TEXT"]
gender [label: "gender: TEXT"]
size [label: "size: TEXT"]
organization_name [label: "organization_name: TEXT"]
city [label: "city: TEXT"]
state [label: "state: TEXT"]
source [label: "source: TEXT"]
url [label: "url: TEXT"]
cat_json [label: "cat_json: TEXT"]
fingerprint [label: "fingerprint: TEXT"]
image_embedding [label: "image_embedding: BLOB"]
is_duplicate [label: "is_duplicate: INTEGER"]
duplicate_of [label: "duplicate_of: TEXT"]
fetched_at [label: "fetched_at: DATETIME"]
created_at [label: "created_at: DATETIME"]
}
// ChromaDB Collection
vector_collection [icon: database, color: purple] {
cats_embeddings [label: "Collection: cats_embeddings"]
embedding_dim [label: "Dimensions: 384"]
model [label: "Model: all-MiniLM-L6-v2"]
metadata [label: "Metadata: name, breed, age, etc."]
}
// Relationships
users > alerts: user_id
alerts > cats_cache: Search results
cats_cache > vector_collection: Embeddings

File diff suppressed because one or more lines are too long


View File

@@ -1,51 +0,0 @@
// Tuxedo Link - Modal Deployment
// Local Development
local [label: "Local Development", icon: laptop, color: purple] {
gradio_dev [label: "Gradio UI\n:7860"]
dev_db [label: "SQLite DB\n./data/"]
dev_vector [label: "ChromaDB\n./cat_vectorstore/"]
}
// Modal Cloud
modal [label: "Modal Cloud", icon: cloud, color: blue] {
// Scheduled Functions
scheduled [label: "Scheduled Functions"] {
daily [label: "daily_search_job\nCron: 0 9 * * *"]
weekly [label: "weekly_search_job\nCron: 0 9 * * 1"]
cleanup [label: "cleanup_job\nCron: 0 2 * * 0"]
}
// On-Demand Functions
ondemand [label: "On-Demand"] {
manual_search [label: "run_scheduled_searches()"]
manual_cleanup [label: "cleanup_old_data()"]
}
// Storage
storage [label: "Modal Volume\n/data"] {
vol_db [label: "tuxedo_link.db"]
vol_vector [label: "cat_vectorstore/"]
}
// Secrets
secrets [label: "Secrets"] {
api_keys [label: "- OPENAI_API_KEY\n- PETFINDER_*\n- RESCUEGROUPS_*\n- SENDGRID_*"]
}
}
// External Services
external [label: "External Services", icon: cloud, color: red] {
openai [label: "OpenAI"]
petfinder [label: "Petfinder"]
rescue [label: "RescueGroups"]
sendgrid [label: "SendGrid"]
}
// Connections
local > modal: Deploy
modal > storage: Persistent data
modal > secrets: Load keys
scheduled > storage: Read/Write
ondemand > storage: Read/Write
modal > external: API calls

File diff suppressed because one or more lines are too long


View File

@@ -1,58 +0,0 @@
// Tuxedo Link - Search Data Flow
user [icon: user]
// Step 1: User Input
user_input [label: "1. User Input\n'friendly playful cat\nin NYC'"]
// Step 2: Profile Extraction
profile_extraction [label: "2. Profile Agent\n(OpenAI GPT-4)", icon: chat, color: purple]
extracted_profile [label: "CatProfile\n- location: NYC\n- age: young\n- personality: friendly"]
// Step 3: API Fetching (Parallel)
api_fetch [label: "3. Fetch from APIs\n(Parallel)", icon: api, color: blue]
petfinder_results [label: "Petfinder\n50 cats"]
rescuegroups_results [label: "RescueGroups\n50 cats"]
// Step 4: Deduplication
dedup [label: "4. Deduplication\n(3-tier)", icon: filter, color: orange]
dedup_details [label: "- Fingerprint\n- Text similarity\n- Image similarity"]
// Step 5: Cache & Embed
cache [label: "5. Cache & Embed", icon: database, color: gray]
sqlite_cache [label: "SQLite\n(Cat data)"]
vector_store [label: "ChromaDB\n(Embeddings)"]
// Step 6: Hybrid Matching
matching [label: "6. Hybrid Search\n60% vector\n40% metadata", icon: search, color: green]
// Step 7: Results
results [label: "7. Ranked Results\nTop 20 matches"]
// Step 8: Display
display [label: "8. Display to User\nwith explanations", icon: browser, color: purple]
// Flow connections
user > user_input
user_input > profile_extraction
profile_extraction > extracted_profile
extracted_profile > api_fetch
api_fetch > petfinder_results
api_fetch > rescuegroups_results
petfinder_results > dedup
rescuegroups_results > dedup
dedup > dedup_details
dedup > cache
cache > sqlite_cache
cache > vector_store
sqlite_cache > matching
vector_store > matching
matching > results
results > display
display > user

File diff suppressed because one or more lines are too long


View File

@@ -1,54 +0,0 @@
// Tuxedo Link - High-Level System Architecture
// External APIs
openai [icon: openai, color: green]
petfinder [icon: api, color: blue]
rescuegroups [icon: api, color: blue]
sendgrid [icon: email, color: red]
// Frontend Layer
gradio [icon: browser, color: purple] {
search_tab
alerts_tab
about_tab
}
// Application Layer
framework [icon: server, color: orange] {
TuxedoLinkFramework
}
// Agent Layer
agents [icon: users, color: cyan] {
PlanningAgent
ProfileAgent
PetfinderAgent
RescueGroupsAgent
DeduplicationAgent
MatchingAgent
EmailAgent
}
// Data Layer
databases [icon: database, color: gray] {
SQLite
ChromaDB
}
// Deployment
modal [icon: cloud, color: blue] {
scheduled_jobs
volume_storage
}
// Connections
gradio > framework: User requests
framework > agents: Orchestrate
agents > openai: Profile extraction
agents > petfinder: Search cats
agents > rescuegroups: Search cats
agents > sendgrid: Send notifications
agents > databases: Store/retrieve
framework > databases: Manage data
modal > framework: Scheduled searches
modal > databases: Persistent storage

File diff suppressed because one or more lines are too long


View File

@@ -1,35 +0,0 @@
# LLM APIs
OPENAI_API_KEY=sk-...
# Pet APIs
PETFINDER_API_KEY=your_petfinder_api_key
PETFINDER_SECRET=your_petfinder_secret
RESCUEGROUPS_API_KEY=your_rescuegroups_api_key
# Email (provider configuration in config.yaml)
MAILGUN_API_KEY=your_mailgun_api_key
SENDGRID_API_KEY=your_sendgrid_api_key_optional
# Modal
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
# App Config
DATABASE_PATH=data/tuxedo_link.db
VECTORDB_PATH=cat_vectorstore
TTL_DAYS=30
MAX_DISTANCE_MILES=100
LOG_LEVEL=INFO
# Deduplication Thresholds
DEDUP_NAME_SIMILARITY_THRESHOLD=0.8
DEDUP_DESCRIPTION_SIMILARITY_THRESHOLD=0.7
DEDUP_IMAGE_SIMILARITY_THRESHOLD=0.9
DEDUP_COMPOSITE_THRESHOLD=0.85
# Hybrid Search Config
VECTOR_TOP_N=50
FINAL_RESULTS_LIMIT=20
SEMANTIC_WEIGHT=0.6
ATTRIBUTE_WEIGHT=0.4
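The last four variables drive the hybrid ranking: retrieve the top `VECTOR_TOP_N` semantic candidates, blend each one's vector similarity with its structured-attribute fit, and keep the best `FINAL_RESULTS_LIMIT`. A minimal sketch using the 0.6/0.4 weights above (function names and the tuple-based candidate format are illustrative):

```python
from typing import Iterable, List, Tuple

def hybrid_score(
    vector_similarity: float,
    attribute_match: float,
    semantic_weight: float = 0.6,
    attribute_weight: float = 0.4,
) -> float:
    """Blend semantic similarity with structured attribute matching."""
    return semantic_weight * vector_similarity + attribute_weight * attribute_match

def rank(
    candidates: Iterable[Tuple[float, float]], limit: int = 20
) -> List[Tuple[float, float, float]]:
    """Sort (vector_sim, attr_sim) candidate pairs by blended score, best first."""
    scored = [(hybrid_score(v, a), v, a) for v, a in candidates]
    return sorted(scored, reverse=True)[:limit]
```

Weighting the semantic side higher reflects the project's emphasis on personality-based matching: a cat whose description fits the user's free-text request can outrank one that merely matches the structured filters.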

View File

@@ -1,378 +0,0 @@
"""
Complete Modal API for Tuxedo Link
All application logic runs on Modal in production mode
"""
import modal
from datetime import datetime
from typing import Dict, List, Any, Optional
from pathlib import Path
from cat_adoption_framework import TuxedoLinkFramework
from models.cats import CatProfile, AdoptionAlert
from database.manager import DatabaseManager
from agents.profile_agent import ProfileAgent
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider
# Modal app and configuration
app = modal.App("tuxedo-link-api")
# Create Modal volume for persistent data
volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)
# Reference secrets
secrets = [modal.Secret.from_name("tuxedo-link-secrets")]
# Get project directory
project_dir = Path(__file__).parent
# Modal image with all dependencies and project files
image = (
modal.Image.debian_slim(python_version="3.11")
.pip_install(
"openai",
"chromadb",
"requests",
"sentence-transformers==2.5.1",
"transformers==4.38.0",
"Pillow",
"python-dotenv",
"pydantic",
"geopy",
"pyyaml",
"python-levenshtein",
"open-clip-torch==2.24.0",
)
.apt_install("git")
.run_commands(
"pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cpu",
"pip install numpy==1.26.4",
)
# Add only necessary source directories (Modal 1.0+ API)
.add_local_dir(str(project_dir / "models"), remote_path="/root/models")
.add_local_dir(str(project_dir / "agents"), remote_path="/root/agents")
.add_local_dir(str(project_dir / "database"), remote_path="/root/database")
.add_local_dir(str(project_dir / "utils"), remote_path="/root/utils")
# Add standalone Python files
.add_local_file(str(project_dir / "cat_adoption_framework.py"), remote_path="/root/cat_adoption_framework.py")
.add_local_file(str(project_dir / "setup_vectordb.py"), remote_path="/root/setup_vectordb.py")
.add_local_file(str(project_dir / "setup_metadata_vectordb.py"), remote_path="/root/setup_metadata_vectordb.py")
# Add config file
.add_local_file(str(project_dir / "config.yaml"), remote_path="/root/config.yaml")
)
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=600,
cpu=2.0,
memory=4096,
)
def search_cats(profile_dict: Dict[str, Any], use_cache: bool = False) -> Dict[str, Any]:
"""
Main search function - runs all agents and returns matches.
This is the primary API endpoint for cat searches in production mode.
Args:
profile_dict: CatProfile as dictionary
use_cache: Whether to use cached data
Returns:
Dict with matches, stats, and search metadata
"""
print(f"[{datetime.now()}] Modal API: Starting cat search")
print(f"Profile location: {profile_dict.get('user_location', 'Not specified')}")
print(f"Cache mode: {use_cache}")
try:
# Initialize framework
framework = TuxedoLinkFramework()
# Reconstruct profile
profile = CatProfile(**profile_dict)
# Run search
result = framework.search(profile, use_cache=use_cache)
print(f"Found {len(result.matches)} matches")
print(f"Duplicates removed: {result.duplicates_removed}")
print(f"Sources: {len(result.sources_queried)}")
# Convert to serializable dict
return {
"success": True,
"matches": [
{
"cat": m.cat.model_dump(),
"match_score": m.match_score,
"vector_similarity": m.vector_similarity,
"attribute_match_score": m.attribute_match_score,
"explanation": m.explanation,
"matching_attributes": m.matching_attributes,
"missing_attributes": m.missing_attributes,
}
for m in result.matches
],
"total_found": result.total_found,
"duplicates_removed": result.duplicates_removed,
"sources_queried": result.sources_queried,
"timestamp": datetime.now().isoformat(),
}
except Exception as e:
print(f"Error in search_cats: {e}")
import traceback
traceback.print_exc()
return {
"success": False,
"error": str(e),
"matches": [],
"total_found": 0,
"duplicates_removed": 0,
"sources_queried": [],
}
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=300,
)
def create_alert_and_notify(alert_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Create alert in Modal DB and send immediate notification if needed.
Args:
alert_data: AdoptionAlert as dictionary
Returns:
Dict with success status, alert_id, and message
"""
from cat_adoption_framework import TuxedoLinkFramework
from database.manager import DatabaseManager
from models.cats import AdoptionAlert
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider
print(f"[{datetime.now()}] Modal API: Creating alert")
try:
# Initialize components
db_manager = DatabaseManager("/data/tuxedo_link.db")
# Reconstruct alert
alert = AdoptionAlert(**alert_data)
print(f"Alert for: {alert.user_email}, frequency: {alert.frequency}")
# Save to Modal DB
alert_id = db_manager.create_alert(alert)
print(f"Alert created with ID: {alert_id}")
alert.id = alert_id
# If immediate, send notification now
if alert.frequency == "immediately":
print("Processing immediate notification...")
framework = TuxedoLinkFramework()
email_provider = get_email_provider()
email_agent = EmailAgent(email_provider)
# Run search
result = framework.search(alert.profile, use_cache=False)
if result.matches:
print(f"Found {len(result.matches)} matches")
if email_agent.enabled:
email_sent = email_agent.send_match_notification(alert, result.matches)
if email_sent:
# Update last_sent
match_ids = [m.cat.id for m in result.matches]
db_manager.update_alert(
alert_id,
last_sent=datetime.now(),
last_match_ids=match_ids
)
return {
"success": True,
"alert_id": alert_id,
"message": f"Alert created and {len(result.matches)} matches sent to {alert.user_email}!"
}
else:
return {
"success": False,
"alert_id": alert_id,
"message": "Alert created but email failed to send"
}
else:
return {
"success": True,
"alert_id": alert_id,
"message": "Alert created but no matches found yet"
}
else:
return {
"success": True,
"alert_id": alert_id,
"message": f"Alert created! You'll receive {alert.frequency} notifications at {alert.user_email}"
}
except Exception as e:
print(f"Error creating alert: {e}")
import traceback
traceback.print_exc()
return {
"success": False,
"alert_id": None,
"message": f"Error: {str(e)}"
}
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=60,
)
def get_alerts(email: Optional[str] = None) -> List[Dict[str, Any]]:
"""
Get alerts from Modal DB.
Args:
email: Optional email filter
Returns:
List of alert dictionaries
"""
from database.manager import DatabaseManager
try:
db_manager = DatabaseManager("/data/tuxedo_link.db")
if email:
alerts = db_manager.get_alerts_by_email(email)
else:
alerts = db_manager.get_all_alerts()
return [alert.model_dump() for alert in alerts]
except Exception as e:
print(f"Error getting alerts: {e}")
return []
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=60,
)
def update_alert(alert_id: int, active: Optional[bool] = None) -> bool:
"""
Update alert in Modal DB.
Args:
alert_id: Alert ID
active: New active status
Returns:
True if successful
"""
from database.manager import DatabaseManager
try:
db_manager = DatabaseManager("/data/tuxedo_link.db")
db_manager.update_alert(alert_id, active=active)
return True
except Exception as e:
print(f"Error updating alert: {e}")
return False
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=60,
)
def delete_alert(alert_id: int) -> bool:
"""
Delete alert from Modal DB.
Args:
alert_id: Alert ID
Returns:
True if successful
"""
from database.manager import DatabaseManager
try:
db_manager = DatabaseManager("/data/tuxedo_link.db")
db_manager.delete_alert(alert_id)
return True
except Exception as e:
print(f"Error deleting alert: {e}")
return False
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=120,
)
def extract_profile(user_input: str) -> Dict[str, Any]:
"""
Extract cat profile from natural language using LLM.
Args:
user_input: User's description of desired cat
Returns:
CatProfile as dictionary
"""
from agents.profile_agent import ProfileAgent
print(f"[{datetime.now()}] Modal API: Extracting profile")
try:
agent = ProfileAgent()
conversation = [{"role": "user", "content": user_input}]
profile = agent.extract_profile(conversation)
return {
"success": True,
"profile": profile.model_dump()
}
except Exception as e:
print(f"Error extracting profile: {e}")
import traceback
traceback.print_exc()
return {
"success": False,
"error": str(e),
"profile": None
}
# Health check
@app.function(image=image, timeout=10)
def health_check() -> Dict[str, str]:
"""Health check endpoint."""
return {
"status": "healthy",
"timestamp": datetime.now().isoformat(),
"service": "tuxedo-link-api"
}
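Every endpoint above returns the same `{"success": bool, ...}` envelope on both the happy path and the error path. Factored out of Modal, that pattern can be sketched as a plain decorator (the `envelope`/`search` names here are illustrative, not part of the codebase):

```python
import functools
import traceback
from typing import Any, Callable, Dict

def envelope(empty: Dict[str, Any]) -> Callable:
    """Wrap a handler so callers always receive {'success': bool, ...}."""
    def decorator(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> Dict[str, Any]:
            try:
                return {"success": True, **fn(*args, **kwargs)}
            except Exception as e:
                traceback.print_exc()
                # Error path keeps the same keys, filled with safe empties
                return {"success": False, "error": str(e), **empty}
        return wrapper
    return decorator

@envelope(empty={"matches": [], "total_found": 0})
def search(profile: Dict[str, Any]) -> Dict[str, Any]:
    if "location" not in profile:
        raise ValueError("location required")
    return {"matches": ["cat-1"], "total_found": 1}

print(search({"location": "94103"}))  # success path
print(search({}))                     # error path, same shape
```

This keeps the UI code simple: it can always branch on `result["success"]` without defensive key checks.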


@@ -1,6 +0,0 @@
"""Data models for Tuxedo Link."""
from .cats import Cat, CatProfile, CatMatch, AdoptionAlert, SearchResult
__all__ = ["Cat", "CatProfile", "CatMatch", "AdoptionAlert", "SearchResult"]


@@ -1,229 +0,0 @@
"""Pydantic models for cat adoption data."""
from datetime import datetime
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, field_validator
class Cat(BaseModel):
"""Model representing a cat available for adoption."""
# Basic information
id: str = Field(..., description="Unique identifier from source")
name: str = Field(..., description="Cat's name")
breed: str = Field(..., description="Primary breed")
breeds_secondary: Optional[List[str]] = Field(default=None, description="Secondary breeds")
age: str = Field(..., description="Age category: kitten, young, adult, senior")
size: str = Field(..., description="Size: small, medium, large")
gender: str = Field(..., description="Gender: male, female, unknown")
description: str = Field(default="", description="Full description of the cat")
# Location information
organization_name: str = Field(..., description="Rescue organization name")
organization_id: Optional[str] = Field(default=None, description="Organization ID")
city: Optional[str] = Field(default=None, description="City")
state: Optional[str] = Field(default=None, description="State/Province")
zip_code: Optional[str] = Field(default=None, description="ZIP/Postal code")
latitude: Optional[float] = Field(default=None, description="Latitude coordinate")
longitude: Optional[float] = Field(default=None, description="Longitude coordinate")
country: Optional[str] = Field(default="US", description="Country code")
distance: Optional[float] = Field(default=None, description="Distance from user in miles")
# Behavioral attributes
good_with_children: Optional[bool] = Field(default=None, description="Good with children")
good_with_dogs: Optional[bool] = Field(default=None, description="Good with dogs")
good_with_cats: Optional[bool] = Field(default=None, description="Good with cats")
special_needs: bool = Field(default=False, description="Has special needs")
# Media
photos: List[str] = Field(default_factory=list, description="List of photo URLs")
primary_photo: Optional[str] = Field(default=None, description="Primary photo URL")
videos: List[str] = Field(default_factory=list, description="List of video URLs")
# Metadata
source: str = Field(..., description="Source: petfinder, rescuegroups")
url: str = Field(..., description="Direct URL to listing")
adoption_fee: Optional[float] = Field(default=None, description="Adoption fee in dollars")
contact_email: Optional[str] = Field(default=None, description="Contact email")
contact_phone: Optional[str] = Field(default=None, description="Contact phone")
fetched_at: datetime = Field(default_factory=datetime.now, description="When data was fetched")
# Deduplication
fingerprint: Optional[str] = Field(default=None, description="Computed fingerprint for deduplication")
# Additional attributes
declawed: Optional[bool] = Field(default=None, description="Is declawed")
spayed_neutered: Optional[bool] = Field(default=None, description="Is spayed/neutered")
house_trained: Optional[bool] = Field(default=None, description="Is house trained")
coat_length: Optional[str] = Field(default=None, description="Coat length: short, medium, long")
colors: List[str] = Field(default_factory=list, description="Coat colors")
@field_validator('age')
@classmethod
def validate_age(cls, v: str) -> str:
"""Validate age category."""
valid_ages = ['kitten', 'young', 'adult', 'senior', 'unknown']
if v.lower() not in valid_ages:
return 'unknown'
return v.lower()
@field_validator('size')
@classmethod
def validate_size(cls, v: str) -> str:
"""Validate size category."""
valid_sizes = ['small', 'medium', 'large', 'unknown']
if v.lower() not in valid_sizes:
return 'unknown'
return v.lower()
@field_validator('gender')
@classmethod
def validate_gender(cls, v: str) -> str:
"""Validate gender."""
valid_genders = ['male', 'female', 'unknown']
if v.lower() not in valid_genders:
return 'unknown'
return v.lower()
class CatProfile(BaseModel):
"""Model representing user preferences for cat adoption."""
# Hard constraints
age_range: Optional[List[str]] = Field(
default=None,
description="Acceptable age categories: kitten, young, adult, senior"
)
size: Optional[List[str]] = Field(
default=None,
description="Acceptable sizes: small, medium, large"
)
max_distance: Optional[int] = Field(
default=100,
description="Maximum distance in miles"
)
good_with_children: Optional[bool] = Field(
default=None,
description="Must be good with children"
)
good_with_dogs: Optional[bool] = Field(
default=None,
description="Must be good with dogs"
)
good_with_cats: Optional[bool] = Field(
default=None,
description="Must be good with cats"
)
special_needs_ok: bool = Field(
default=True,
description="Open to special needs cats"
)
# Soft preferences (for vector search)
personality_description: str = Field(
default="",
description="Free-text description of desired personality and traits"
)
# Breed preferences
preferred_breeds: Optional[List[str]] = Field(
default=None,
description="Preferred breeds"
)
# Location
user_location: Optional[str] = Field(
default=None,
description="User location (ZIP code, city, or lat,long)"
)
user_latitude: Optional[float] = Field(default=None, description="User latitude")
user_longitude: Optional[float] = Field(default=None, description="User longitude")
# Additional preferences
gender_preference: Optional[str] = Field(
default=None,
description="Preferred gender: male, female, or None for no preference"
)
coat_length_preference: Optional[List[str]] = Field(
default=None,
description="Preferred coat lengths: short, medium, long"
)
color_preferences: Optional[List[str]] = Field(
default=None,
description="Preferred colors"
)
must_be_declawed: Optional[bool] = Field(default=None, description="Must be declawed")
must_be_spayed_neutered: Optional[bool] = Field(default=None, description="Must be spayed/neutered")
@field_validator('age_range')
@classmethod
def validate_age_range(cls, v: Optional[List[str]]) -> Optional[List[str]]:
"""Validate age range values."""
if v is None:
return None
valid_ages = {'kitten', 'young', 'adult', 'senior'}
return [age.lower() for age in v if age.lower() in valid_ages]
@field_validator('size')
@classmethod
def validate_size_list(cls, v: Optional[List[str]]) -> Optional[List[str]]:
"""Validate size values."""
if v is None:
return None
valid_sizes = {'small', 'medium', 'large'}
return [size.lower() for size in v if size.lower() in valid_sizes]
class CatMatch(BaseModel):
"""Model representing a matched cat with scoring details."""
cat: Cat = Field(..., description="The matched cat")
match_score: float = Field(..., description="Overall match score (0-1)")
vector_similarity: float = Field(..., description="Vector similarity score (0-1)")
attribute_match_score: float = Field(..., description="Attribute match score (0-1)")
explanation: str = Field(default="", description="Human-readable match explanation")
matching_attributes: List[str] = Field(
default_factory=list,
description="List of matching attributes"
)
missing_attributes: List[str] = Field(
default_factory=list,
description="List of desired but missing attributes"
)
class AdoptionAlert(BaseModel):
"""Model representing a scheduled adoption alert."""
id: Optional[int] = Field(default=None, description="Alert ID (assigned by database)")
user_email: str = Field(..., description="User email for notifications")
profile: CatProfile = Field(..., description="Search profile")
frequency: str = Field(..., description="Frequency: immediately, daily, weekly")
last_sent: Optional[datetime] = Field(default=None, description="Last notification sent")
active: bool = Field(default=True, description="Is alert active")
created_at: datetime = Field(default_factory=datetime.now, description="When alert was created")
last_match_ids: List[str] = Field(
default_factory=list,
description="IDs of cats from last notification (to avoid duplicates)"
)
@field_validator('frequency')
@classmethod
def validate_frequency(cls, v: str) -> str:
"""Validate frequency value."""
valid_frequencies = ['immediately', 'daily', 'weekly']
if v.lower() not in valid_frequencies:
raise ValueError(f"Frequency must be one of: {valid_frequencies}")
return v.lower()
class SearchResult(BaseModel):
"""Model representing search results returned to UI."""
matches: List[CatMatch] = Field(..., description="List of matched cats")
total_found: int = Field(..., description="Total cats found before filtering")
search_profile: CatProfile = Field(..., description="Search profile used")
search_time: float = Field(..., description="Search time in seconds")
sources_queried: List[str] = Field(..., description="Sources that were queried")
duplicates_removed: int = Field(default=0, description="Number of duplicates removed")
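`CatMatch` carries both a `vector_similarity` and an `attribute_match_score`; given the `SEMANTIC_WEIGHT`/`ATTRIBUTE_WEIGHT` settings in the environment config, the overall `match_score` is presumably a weighted blend along these lines (a sketch of the scoring idea, not the framework's exact formula):

```python
def blend_match_score(vector_similarity: float,
                      attribute_match_score: float,
                      semantic_weight: float = 0.6,
                      attribute_weight: float = 0.4) -> float:
    """Weighted blend of the two sub-scores, clamped to [0, 1]."""
    score = semantic_weight * vector_similarity + attribute_weight * attribute_match_score
    return max(0.0, min(1.0, score))

# A cat with a strong semantic match but weaker attribute overlap
print(round(blend_match_score(0.9, 0.5), 2))  # → 0.74
```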


@@ -1,61 +0,0 @@
[project]
name = "tuxedo-link"
version = "0.1.0"
description = "AI-powered cat adoption matching application"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"pydantic>=2.0",
"python-dotenv",
"requests",
"chromadb",
"sentence-transformers",
"transformers",
"torch==2.2.2",
"pillow",
"scikit-learn",
"open-clip-torch",
"python-Levenshtein",
"beautifulsoup4",
"feedparser",
"sendgrid",
"gradio",
"plotly",
"modal",
"tqdm",
"numpy==1.26.4",
"openai",
"pyyaml",
]
[project.optional-dependencies]
dev = [
"pytest",
"pytest-mock",
"pytest-asyncio",
"pytest-cov",
"ipython",
"jupyter",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["models", "database", "agents", "modal_services", "utils"]
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
addopts = "-v --cov=. --cov-report=html --cov-report=term"
[tool.coverage.run]
omit = [
"tests/*",
"setup.py",
"*/site-packages/*",
]


@@ -1,50 +0,0 @@
# Core
pydantic>=2.0
python-dotenv
requests
# Database
chromadb
# sqlite3 is built-in to Python
# Vector & ML
sentence-transformers
transformers
torch
pillow
scikit-learn
# Image embeddings
open-clip-torch
# Fuzzy matching
python-Levenshtein
# Web scraping & APIs (for potential future sources)
beautifulsoup4
feedparser
# Email
sendgrid
# Mailgun uses requests library (already included above)
# Configuration
pyyaml
# UI
gradio
plotly
# Modal
modal
# Testing
pytest
pytest-mock
pytest-asyncio
pytest-cov
# Utilities
tqdm
numpy


@@ -1,82 +0,0 @@
#!/bin/bash
# Launch script for Tuxedo Link
# Colors
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${BLUE}🎩 Tuxedo Link - AI-Powered Cat Adoption Search${NC}"
echo ""
# Check if virtual environment exists
if [ ! -d ".venv" ]; then
echo -e "${YELLOW}⚠️ Virtual environment not found. Please run setup first:${NC}"
echo " uv venv && source .venv/bin/activate && uv pip install -e \".[dev]\""
exit 1
fi
# Activate virtual environment
echo -e "${GREEN}${NC} Activating virtual environment..."
source .venv/bin/activate
# Check if .env exists
if [ ! -f ".env" ]; then
echo -e "${YELLOW}⚠️ .env file not found. Creating from template...${NC}"
if [ -f "env.example" ]; then
cp env.example .env
echo -e "${YELLOW}Please edit .env with your API keys before continuing.${NC}"
exit 1
fi
fi
# Check if config.yaml exists
if [ ! -f "config.yaml" ]; then
echo -e "${YELLOW}⚠️ config.yaml not found. Creating from example...${NC}"
if [ -f "config.example.yaml" ]; then
cp config.example.yaml config.yaml
echo -e "${GREEN}${NC} config.yaml created. Review settings if needed."
fi
fi
# Check deployment mode from config
DEPLOYMENT_MODE=$(python -c "import yaml; config = yaml.safe_load(open('config.yaml')); print(config['deployment']['mode'])" 2>/dev/null || echo "local")
if [ "$DEPLOYMENT_MODE" = "production" ]; then
echo -e "${BLUE}📡 Production mode enabled${NC}"
echo " UI will connect to Modal backend"
echo " All searches and agents run on Modal"
echo ""
else
echo -e "${GREEN}💻 Local mode enabled${NC}"
echo " All components run locally"
echo ""
fi
# Check for required API keys
if ! grep -q "OPENAI_API_KEY=sk-" .env 2>/dev/null || ! grep -q "PETFINDER_API_KEY" .env 2>/dev/null; then
echo -e "${YELLOW}⚠️ Please configure API keys in .env file${NC}"
echo " Required: OPENAI_API_KEY, PETFINDER_API_KEY"
exit 1
fi
echo -e "${GREEN}${NC} Environment configured"
# Initialize databases if needed
if [ ! -f "data/tuxedo_link.db" ]; then
echo -e "${GREEN}${NC} Initializing databases..."
python setup_vectordb.py > /dev/null 2>&1
fi
echo -e "${GREEN}${NC} Databases ready"
echo ""
echo -e "${BLUE}🚀 Starting Tuxedo Link...${NC}"
echo ""
echo -e " ${GREEN}${NC} Opening http://localhost:7860"
echo -e " ${GREEN}${NC} Press Ctrl+C to stop"
echo ""
# Launch the app
python app.py


@@ -1,389 +0,0 @@
"""Modal scheduled search service for running automated cat searches."""
import modal
from datetime import datetime
from typing import Dict, Any
from pathlib import Path
# Local imports - available because we use .add_local_dir() to copy all project files
from cat_adoption_framework import TuxedoLinkFramework
from database.manager import DatabaseManager
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider
from models.cats import AdoptionAlert
# Create Modal app
app = modal.App("tuxedo-link-scheduled-search")
# Get project directory
project_dir = Path(__file__).parent
# Define image with all dependencies and project files
image = (
modal.Image.debian_slim(python_version="3.11")
.pip_install(
"openai",
"chromadb",
"sentence-transformers==2.5.1", # Compatible with torch 2.2.2
"transformers==4.38.0", # Compatible with torch 2.2.2
"python-dotenv",
"pydantic",
"requests",
"sendgrid",
"pyyaml",
"python-levenshtein",
"Pillow",
"geopy",
"open-clip-torch==2.24.0", # Compatible with torch 2.2.2
)
.apt_install("git")
.run_commands(
"pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cpu",
"pip install numpy==1.26.4",
)
# Add only necessary source directories (Modal 1.0+ API)
.add_local_dir(str(project_dir / "models"), remote_path="/root/models")
.add_local_dir(str(project_dir / "agents"), remote_path="/root/agents")
.add_local_dir(str(project_dir / "database"), remote_path="/root/database")
.add_local_dir(str(project_dir / "utils"), remote_path="/root/utils")
# Add standalone Python files
.add_local_file(str(project_dir / "cat_adoption_framework.py"), remote_path="/root/cat_adoption_framework.py")
.add_local_file(str(project_dir / "setup_vectordb.py"), remote_path="/root/setup_vectordb.py")
.add_local_file(str(project_dir / "setup_metadata_vectordb.py"), remote_path="/root/setup_metadata_vectordb.py")
# Add config file
.add_local_file(str(project_dir / "config.yaml"), remote_path="/root/config.yaml")
)
# Create Volume for persistent storage (database and vector store)
volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)
# Define secrets
secrets = [
modal.Secret.from_name("tuxedo-link-secrets") # Contains all API keys
]
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=600, # 10 minutes
)
def run_scheduled_searches() -> None:
"""
Run scheduled searches for all active alerts.
This function:
1. Loads all active adoption alerts from database
2. For each alert, runs a cat search based on saved profile
3. If new matches found, sends email notification
4. Updates alert last_sent timestamp
"""
print(f"[{datetime.now()}] Starting scheduled search job")
# Initialize components
framework = TuxedoLinkFramework()
db_manager = DatabaseManager("/data/tuxedo_link.db")
email_agent = EmailAgent(get_email_provider())
# Get all active alerts
alerts = db_manager.get_active_alerts()
print(f"Found {len(alerts)} active alerts")
for alert in alerts:
try:
print(f"Processing alert {alert.id} for {alert.user_email}")
# Run search
result = framework.search(alert.profile)
# Filter out cats already seen
new_matches = [
m for m in result.matches
if m.cat.id not in alert.last_match_ids
]
if new_matches:
print(f"Found {len(new_matches)} new matches for alert {alert.id}")
# Send email
if email_agent.enabled:
email_sent = email_agent.send_match_notification(alert, new_matches)
if email_sent:
# Update last_sent and last_match_ids
new_match_ids = [m.cat.id for m in new_matches]
db_manager.update_alert(
alert.id,
last_sent=datetime.now(),
last_match_ids=new_match_ids
)
print(f"Email sent successfully for alert {alert.id}")
else:
print(f"Failed to send email for alert {alert.id}")
else:
print("Email agent disabled")
else:
print(f"No new matches for alert {alert.id}")
except Exception as e:
print(f"Error processing alert {alert.id}: {e}")
continue
print(f"[{datetime.now()}] Scheduled search job completed")
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=300,
)
def send_immediate_notification(alert_id: int) -> bool:
"""
Send immediate notification for a specific alert.
This is called when an alert is created with frequency="immediately".
Args:
alert_id: The ID of the alert to process
Returns:
bool: True if notification sent successfully, False otherwise
"""
print(f"[{datetime.now()}] Processing immediate notification for alert {alert_id}")
try:
# Initialize components
framework = TuxedoLinkFramework()
db_manager = DatabaseManager("/data/tuxedo_link.db")
email_agent = EmailAgent(get_email_provider())
# Get the alert
alert = db_manager.get_alert(alert_id)
if not alert:
print(f"Alert {alert_id} not found")
return False
if not alert.active:
print(f"Alert {alert_id} is inactive")
return False
# Run search
result = framework.search(alert.profile)
if result.matches:
print(f"Found {len(result.matches)} matches for alert {alert_id}")
# Send email
if email_agent.enabled:
email_sent = email_agent.send_match_notification(alert, result.matches)
if email_sent:
# Update last_sent and last_match_ids
match_ids = [m.cat.id for m in result.matches]
db_manager.update_alert(
alert.id,
last_sent=datetime.now(),
last_match_ids=match_ids
)
print(f"Email sent successfully for alert {alert_id}")
return True
else:
print(f"Failed to send email for alert {alert_id}")
return False
else:
print("Email agent disabled")
return False
else:
print(f"No matches found for alert {alert_id}")
return False
except Exception as e:
print(f"Error processing immediate notification for alert {alert_id}: {e}")
return False
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=300,
)
def create_alert_and_notify(alert_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Create an alert in Modal's database and send immediate notification.
This is called from the UI in production mode when creating an alert.
The alert is saved to Modal's database, then processed if immediate.
Args:
alert_data: Dictionary containing alert data (from AdoptionAlert.dict())
Returns:
Dict with {"success": bool, "alert_id": int, "message": str}
"""
print(f"[{datetime.now()}] Creating alert in Modal DB")
try:
# Initialize database
db_manager = DatabaseManager("/data/tuxedo_link.db")
# Reconstruct alert from dict
alert = AdoptionAlert(**alert_data)
print(f"Alert for: {alert.user_email}, location: {alert.profile.user_location if alert.profile else 'None'}")
# Save alert to Modal's database
alert_id = db_manager.create_alert(alert)
print(f"✓ Alert created in Modal DB with ID: {alert_id}")
# Update alert with the ID
alert.id = alert_id
# If immediate frequency, send notification now
if alert.frequency == "immediately":
print("Sending immediate notification...")
framework = TuxedoLinkFramework()
email_provider = get_email_provider()
email_agent = EmailAgent(email_provider)
# Run search
result = framework.search(alert.profile, use_cache=False)
if result.matches:
print(f"Found {len(result.matches)} matches")
# Send email
if email_agent.enabled:
email_sent = email_agent.send_match_notification(alert, result.matches)
if email_sent:
# Update last_sent
match_ids = [m.cat.id for m in result.matches]
db_manager.update_alert(
alert_id,
last_sent=datetime.now(),
last_match_ids=match_ids
)
print(f"✓ Email sent to {alert.user_email}")
return {
"success": True,
"alert_id": alert_id,
"message": f"Alert created and {len(result.matches)} matches sent to {alert.user_email}!"
}
else:
return {
"success": False,
"alert_id": alert_id,
"message": "Alert created but email failed to send"
}
else:
return {
"success": False,
"alert_id": alert_id,
"message": "Email agent not enabled"
}
else:
print("No matches found")
return {
"success": True,
"alert_id": alert_id,
"message": "Alert created but no matches found yet"
}
else:
# For daily/weekly alerts
return {
"success": True,
"alert_id": alert_id,
"message": f"Alert created! You'll receive {alert.frequency} notifications at {alert.user_email}"
}
except Exception as e:
print(f"Error creating alert: {e}")
import traceback
traceback.print_exc()
return {
"success": False,
"alert_id": None,
"message": f"Error: {str(e)}"
}
@app.function(
image=image,
schedule=modal.Cron("0 9 * * *"), # Run daily at 9 AM UTC
volumes={"/data": volume},
secrets=secrets,
timeout=600,
)
def daily_search_job() -> None:
    """Daily scheduled job to run cat searches for all active alerts."""
run_scheduled_searches.remote()
@app.function(
image=image,
schedule=modal.Cron("0 9 * * 1"), # Run weekly on Mondays at 9 AM UTC
volumes={"/data": volume},
secrets=secrets,
timeout=600,
)
def weekly_search_job() -> None:
    """Weekly scheduled job to run cat searches for all active alerts."""
run_scheduled_searches.remote()
@app.function(
image=image,
volumes={"/data": volume},
secrets=secrets,
timeout=300,
)
def cleanup_old_data(days: int = 30) -> Dict[str, Any]:
"""
Clean up old cat data from cache and vector database.
Args:
days: Number of days of data to keep (default: 30)
Returns:
Statistics dictionary with cleanup results
"""
import sys
print(f"[{datetime.now()}] Starting cleanup job (keeping last {days} days)")
framework = TuxedoLinkFramework()
stats = framework.cleanup_old_data(days)
print(f"Cleanup complete: {stats}")
print(f"[{datetime.now()}] Cleanup job completed")
return stats
@app.function(
image=image,
schedule=modal.Cron("0 2 * * 0"), # Run weekly on Sundays at 2 AM UTC
volumes={"/data": volume},
secrets=secrets,
timeout=300,
)
def weekly_cleanup_job() -> None:
"""Weekly scheduled job to clean up old data (30+ days)."""
cleanup_old_data.remote(30)
# For manual testing
@app.local_entrypoint()
def main() -> None:
"""Test the scheduled search locally for development."""
run_scheduled_searches.remote()
if __name__ == "__main__":
main()
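`run_scheduled_searches` avoids re-notifying users about cats from the previous email by filtering against the alert's `last_match_ids`. Stripped of the framework types, that step is a simple set-membership filter:

```python
from typing import Iterable, List

def filter_new_matches(match_ids: Iterable[str],
                       last_match_ids: Iterable[str]) -> List[str]:
    """Keep only cat IDs the user has not already been notified about."""
    seen = set(last_match_ids)  # O(1) lookups instead of a list scan per match
    return [cat_id for cat_id in match_ids if cat_id not in seen]

print(filter_new_matches(["pf-1", "rg-2", "pf-3"], ["rg-2"]))  # → ['pf-1', 'pf-3']
```

Note that because `update_alert` replaces `last_match_ids` wholesale with only the latest batch, a cat from two notifications ago could reappear; whether that is intended depends on the alert semantics.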


@@ -1,2 +0,0 @@
"""Deployment and utility scripts."""


@@ -1,76 +0,0 @@
#!/usr/bin/env python
"""Fetch and display valid colors and breeds from Petfinder API."""
import sys
from pathlib import Path
# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from agents.petfinder_agent import PetfinderAgent
def main():
"""Fetch and display valid cat colors and breeds from Petfinder API."""
print("=" * 70)
print("Fetching Valid Cat Data from Petfinder API")
print("=" * 70)
print()
try:
# Initialize agent
agent = PetfinderAgent()
# Fetch colors
print("📋 COLORS")
print("-" * 70)
colors = agent.get_valid_colors()
print(f"✓ Found {len(colors)} valid colors:")
print()
for i, color in enumerate(colors, 1):
print(f" {i:2d}. {color}")
print()
print("=" * 70)
print("Common user terms mapped to API colors:")
print("'tuxedo' → Black & White / Tuxedo")
print("'orange' → Orange / Red")
print("'gray' → Gray / Blue / Silver")
print("'orange tabby' → Tabby (Orange / Red)")
print("'calico' → Calico")
print()
# Fetch breeds
print("=" * 70)
print("📋 BREEDS")
print("-" * 70)
breeds = agent.get_valid_breeds()
print(f"✓ Found {len(breeds)} valid breeds:")
print()
# Show first 30 breeds
for i, breed in enumerate(breeds[:30], 1):
print(f" {i:2d}. {breed}")
if len(breeds) > 30:
print(f" ... and {len(breeds) - 30} more breeds")
print()
print("=" * 70)
print("These are the ONLY values accepted by Petfinder API")
print("Use these exact values when making API requests")
print("=" * 70)
print()
except Exception as e:
print(f"❌ Error: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == "__main__":
main()


@@ -1,57 +0,0 @@
#!/usr/bin/env python
"""Upload config.yaml to Modal volume for remote configuration."""
import modal
import yaml
from pathlib import Path
import sys
def main():
"""Upload config.yaml to Modal volume."""
# Load local config
config_path = Path("config.yaml")
if not config_path.exists():
print("❌ Error: config.yaml not found")
print("Copy config.example.yaml to config.yaml and configure it")
sys.exit(1)
try:
with open(config_path) as f:
config = yaml.safe_load(f)
except Exception as e:
print(f"❌ Error loading config.yaml: {e}")
sys.exit(1)
# Validate config
if config.get('deployment', {}).get('mode') != 'production':
print("⚠️  Warning: config.yaml deployment mode is not set to 'production'")
try:
# Connect to Modal volume
volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)
# Remove old config if it exists
try:
volume.remove_file("/data/config.yaml")
print(" Removed old config.yaml")
except Exception:
# File doesn't exist, that's fine
pass
# Upload new config
with volume.batch_upload() as batch:
batch.put_file(config_path, "/data/config.yaml")
print("✓ Config uploaded to Modal volume")
print(f" Email provider: {config['email']['provider']}")
print(f" Deployment mode: {config['deployment']['mode']}")
except Exception as e:
print(f"❌ Error uploading config to Modal: {e}")
sys.exit(1)
if __name__ == "__main__":
main()


@@ -1,238 +0,0 @@
"""
Vector database for semantic search of colors and breeds.
This module provides fuzzy matching for user color/breed terms against
valid API values using sentence embeddings.
"""
import logging
from typing import List, Dict, Optional
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer
class MetadataVectorDB:
"""
Vector database for semantic search of metadata (colors, breeds).
Separate from the main cat vector DB, this stores valid API values
and enables fuzzy matching for user terms.
"""
def __init__(self, persist_directory: str = "metadata_vectorstore"):
"""
Initialize metadata vector database.
Args:
persist_directory: Path to persist the database
"""
self.persist_directory = persist_directory
Path(persist_directory).mkdir(parents=True, exist_ok=True)
# Initialize ChromaDB client
self.client = chromadb.PersistentClient(path=persist_directory)
# Initialize embedding model (same as main vector DB for consistency)
self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
# Get or create collections
self.colors_collection = self.client.get_or_create_collection(
name="colors",
metadata={"description": "Valid color values from APIs"}
)
self.breeds_collection = self.client.get_or_create_collection(
name="breeds",
metadata={"description": "Valid breed values from APIs"}
)
logging.info(f"MetadataVectorDB initialized at {persist_directory}")
logging.info(f"Colors indexed: {self.colors_collection.count()}")
logging.info(f"Breeds indexed: {self.breeds_collection.count()}")
def index_colors(self, valid_colors: List[str], source: str = "petfinder") -> None:
"""
Index valid color values for semantic search.
Args:
valid_colors: List of valid color strings from API
source: API source (petfinder or rescuegroups)
"""
if not valid_colors:
logging.warning(f"No colors provided for indexing from {source}")
return
# Check if already indexed for this source
existing = self.colors_collection.get(
where={"source": source}
)
if existing and len(existing['ids']) > 0:
logging.info(f"Colors from {source} already indexed ({len(existing['ids'])} items)")
return
# Generate embeddings
embeddings = self.embedding_model.encode(valid_colors, show_progress_bar=False)
# Create IDs
ids = [f"{source}_color_{i}" for i in range(len(valid_colors))]
# Index in ChromaDB
self.colors_collection.add(
ids=ids,
embeddings=embeddings.tolist(),
documents=valid_colors,
metadatas=[{"color": c, "source": source} for c in valid_colors]
)
logging.info(f"✓ Indexed {len(valid_colors)} colors from {source}")
def index_breeds(self, valid_breeds: List[str], source: str = "petfinder") -> None:
"""
Index valid breed values for semantic search.
Args:
valid_breeds: List of valid breed strings from API
source: API source (petfinder or rescuegroups)
"""
if not valid_breeds:
logging.warning(f"No breeds provided for indexing from {source}")
return
# Check if already indexed for this source
existing = self.breeds_collection.get(
where={"source": source}
)
if existing and len(existing['ids']) > 0:
logging.info(f"Breeds from {source} already indexed ({len(existing['ids'])} items)")
return
# Generate embeddings
embeddings = self.embedding_model.encode(valid_breeds, show_progress_bar=False)
# Create IDs
ids = [f"{source}_breed_{i}" for i in range(len(valid_breeds))]
# Index in ChromaDB
self.breeds_collection.add(
ids=ids,
embeddings=embeddings.tolist(),
documents=valid_breeds,
metadatas=[{"breed": b, "source": source} for b in valid_breeds]
)
logging.info(f"✓ Indexed {len(valid_breeds)} breeds from {source}")
def search_color(
self,
user_term: str,
n_results: int = 1,
source_filter: Optional[str] = None
) -> List[Dict]:
"""
Find most similar valid color(s) to user term.
Args:
user_term: User's color preference (e.g., "tuxedo", "grey")
n_results: Number of results to return
source_filter: Optional filter by source (petfinder/rescuegroups)
Returns:
List of dicts with 'color', 'distance', 'source' keys
"""
if not user_term or not user_term.strip():
return []
# Generate embedding for user term
embedding = self.embedding_model.encode([user_term], show_progress_bar=False)[0]
# Query ChromaDB
where_filter = {"source": source_filter} if source_filter else None
results = self.colors_collection.query(
query_embeddings=[embedding.tolist()],
n_results=min(n_results, self.colors_collection.count()),
where=where_filter
)
if not results or not results['ids'] or len(results['ids'][0]) == 0:
return []
# Format results
matches = []
for i in range(len(results['ids'][0])):
matches.append({
"color": results['metadatas'][0][i]['color'],
"distance": results['distances'][0][i],
"similarity": 1.0 - results['distances'][0][i], # Convert distance to similarity
"source": results['metadatas'][0][i]['source']
})
return matches
def search_breed(
self,
user_term: str,
n_results: int = 1,
source_filter: Optional[str] = None
) -> List[Dict]:
"""
Find most similar valid breed(s) to user term.
Args:
user_term: User's breed preference (e.g., "siamese", "main coon")
n_results: Number of results to return
source_filter: Optional filter by source (petfinder/rescuegroups)
Returns:
List of dicts with 'breed', 'distance', 'source' keys
"""
if not user_term or not user_term.strip():
return []
# Generate embedding for user term
embedding = self.embedding_model.encode([user_term], show_progress_bar=False)[0]
# Query ChromaDB
where_filter = {"source": source_filter} if source_filter else None
results = self.breeds_collection.query(
query_embeddings=[embedding.tolist()],
n_results=min(n_results, self.breeds_collection.count()),
where=where_filter
)
if not results or not results['ids'] or len(results['ids'][0]) == 0:
return []
# Format results
matches = []
for i in range(len(results['ids'][0])):
matches.append({
"breed": results['metadatas'][0][i]['breed'],
"distance": results['distances'][0][i],
"similarity": 1.0 - results['distances'][0][i],
"source": results['metadatas'][0][i]['source']
})
return matches
def clear_all(self) -> None:
"""Clear all indexed data (for testing)."""
try:
self.client.delete_collection("colors")
self.client.delete_collection("breeds")
logging.info("Cleared all metadata collections")
except Exception as e:
logging.warning(f"Error clearing collections: {e}")
def get_stats(self) -> Dict[str, int]:
"""Get statistics about indexed data."""
return {
"colors_count": self.colors_collection.count(),
"breeds_count": self.breeds_collection.count()
}


@@ -1,284 +0,0 @@
"""Setup script for ChromaDB vector database."""
import os
import chromadb
from chromadb.config import Settings
from typing import List
from dotenv import load_dotenv
from models.cats import Cat
from sentence_transformers import SentenceTransformer
class VectorDBManager:
"""Manages ChromaDB for cat adoption semantic search."""
COLLECTION_NAME = "cats"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
def __init__(self, persist_directory: str = "cat_vectorstore"):
"""
Initialize the vector database manager.
Args:
persist_directory: Directory for ChromaDB persistence
"""
self.persist_directory = persist_directory
# Create directory if it doesn't exist
if not os.path.exists(persist_directory):
os.makedirs(persist_directory)
# Initialize ChromaDB client
self.client = chromadb.PersistentClient(
path=persist_directory,
settings=Settings(anonymized_telemetry=False)
)
# Initialize embedding model
print(f"Loading embedding model: {self.EMBEDDING_MODEL}")
self.embedding_model = SentenceTransformer(self.EMBEDDING_MODEL)
# Get or create collection
self.collection = self.client.get_or_create_collection(
name=self.COLLECTION_NAME,
metadata={'description': 'Cat adoption listings with semantic search'}
)
print(f"Vector database initialized at {persist_directory}")
print(f"Collection '{self.COLLECTION_NAME}' contains {self.collection.count()} documents")
def create_document_text(self, cat: Cat) -> str:
"""
Create searchable document text from cat attributes.
Combines description with key attributes for semantic search.
Args:
cat: Cat object
Returns:
Document text for embedding
"""
parts = []
# Add description
if cat.description:
parts.append(cat.description)
# Add breed info
parts.append(f"Breed: {cat.breed}")
if cat.breeds_secondary:
parts.append(f"Mixed with: {', '.join(cat.breeds_secondary)}")
# Add personality hints from attributes
traits = []
if cat.good_with_children:
traits.append("good with children")
if cat.good_with_dogs:
traits.append("good with dogs")
if cat.good_with_cats:
traits.append("good with other cats")
if cat.house_trained:
traits.append("house trained")
if cat.special_needs:
traits.append("has special needs")
if traits:
parts.append(f"Personality: {', '.join(traits)}")
# Add color info
if cat.colors:
parts.append(f"Colors: {', '.join(cat.colors)}")
return " | ".join(parts)
def create_metadata(self, cat: Cat) -> dict:
"""
Create metadata dictionary for ChromaDB.
Args:
cat: Cat object
Returns:
Metadata dictionary
"""
return {
'id': cat.id,
'name': cat.name,
'age': cat.age,
'size': cat.size,
'gender': cat.gender,
'breed': cat.breed,
'city': cat.city or '',
'state': cat.state or '',
'zip_code': cat.zip_code or '',
'latitude': str(cat.latitude) if cat.latitude is not None else '',
'longitude': str(cat.longitude) if cat.longitude is not None else '',
'organization': cat.organization_name,
'source': cat.source,
'good_with_children': str(cat.good_with_children) if cat.good_with_children is not None else 'unknown',
'good_with_dogs': str(cat.good_with_dogs) if cat.good_with_dogs is not None else 'unknown',
'good_with_cats': str(cat.good_with_cats) if cat.good_with_cats is not None else 'unknown',
'special_needs': str(cat.special_needs),
'url': cat.url,
'primary_photo': cat.primary_photo or '',
}
def add_cat(self, cat: Cat) -> None:
"""
Add a single cat to the vector database.
Args:
cat: Cat object to add
"""
document = self.create_document_text(cat)
metadata = self.create_metadata(cat)
# Generate embedding
embedding = self.embedding_model.encode([document])[0].tolist()
# Add to collection
self.collection.add(
ids=[cat.id],
embeddings=[embedding],
documents=[document],
metadatas=[metadata]
)
def add_cats_batch(self, cats: List[Cat], batch_size: int = 100) -> None:
"""
Add multiple cats to the vector database in batches.
Args:
cats: List of Cat objects to add
batch_size: Number of cats to process in each batch
"""
print(f"Adding {len(cats)} cats to vector database...")
for i in range(0, len(cats), batch_size):
batch = cats[i:i+batch_size]
# Prepare data
ids = [cat.id for cat in batch]
documents = [self.create_document_text(cat) for cat in batch]
metadatas = [self.create_metadata(cat) for cat in batch]
# Generate embeddings
embeddings = self.embedding_model.encode(documents).tolist()
# Add to collection
self.collection.upsert(
ids=ids,
embeddings=embeddings,
documents=documents,
metadatas=metadatas
)
print(f"Processed batch {i//batch_size + 1}/{(len(cats)-1)//batch_size + 1}")
print(f"Successfully added {len(cats)} cats")
def update_cat(self, cat: Cat) -> None:
"""
Update an existing cat in the vector database.
Args:
cat: Updated Cat object
"""
self.add_cat(cat)
def delete_cat(self, cat_id: str) -> None:
"""
Delete a cat from the vector database.
Args:
cat_id: Cat ID to delete
"""
self.collection.delete(ids=[cat_id])
def search(self, query: str, n_results: int = 50, where: dict = None) -> dict:
"""
Search for cats using semantic similarity.
Args:
query: Search query (personality description)
n_results: Number of results to return
where: Optional metadata filters
Returns:
Search results dictionary
"""
# Generate query embedding
query_embedding = self.embedding_model.encode([query])[0].tolist()
# Search collection
results = self.collection.query(
query_embeddings=[query_embedding],
n_results=n_results,
where=where,
include=['documents', 'metadatas', 'distances']
)
return results
def clear_collection(self) -> None:
"""Delete all documents from the collection."""
print(f"Clearing collection '{self.COLLECTION_NAME}'...")
self.client.delete_collection(self.COLLECTION_NAME)
self.collection = self.client.create_collection(
name=self.COLLECTION_NAME,
metadata={'description': 'Cat adoption listings with semantic search'}
)
print("Collection cleared")
def get_stats(self) -> dict:
"""
Get statistics about the vector database.
Returns:
Dictionary with stats
"""
count = self.collection.count()
return {
'total_documents': count,
'collection_name': self.COLLECTION_NAME,
'persist_directory': self.persist_directory
}
def initialize_vectordb(persist_directory: str = "cat_vectorstore") -> VectorDBManager:
"""
Initialize the vector database.
Args:
persist_directory: Directory for persistence
Returns:
VectorDBManager instance
"""
load_dotenv()
# Get directory from environment or use default
persist_dir = os.getenv('VECTORDB_PATH', persist_directory)
manager = VectorDBManager(persist_dir)
print("\nVector Database Initialized Successfully!")
print(f"Location: {manager.persist_directory}")
print(f"Collection: {manager.COLLECTION_NAME}")
print(f"Documents: {manager.collection.count()}")
return manager
if __name__ == "__main__":
# Initialize database
manager = initialize_vectordb()
# Print stats
stats = manager.get_stats()
print("\nDatabase Stats:")
for key, value in stats.items():
print(f" {key}: {value}")


@@ -1,291 +0,0 @@
# 🧪 Testing Guide
## Test Overview
**Status**: ✅ **92/92 tests passing** (100%)
The test suite includes:
- **81 unit tests** - Models, database, deduplication, email providers, semantic matching
- **11 integration tests** - Search pipeline, alerts, app functionality, color/breed normalization
- **4 manual test scripts** - Cache testing, email sending, semantic matching, framework testing
---
## Unit Tests (81 tests ✅)
Unit tests validate individual components in isolation.
### Test Data Models
```bash
pytest tests/unit/test_models.py -v
```
**Tests**:
- Cat model validation
- CatProfile model validation
- CatMatch model validation
- AdoptionAlert model validation
- SearchResult model validation
- Field requirements and defaults
- JSON serialization
### Test Database Operations
```bash
pytest tests/unit/test_database.py -v
```
**Tests**:
- Database initialization
- Cat caching with fingerprints
- Duplicate marking
- Image embedding storage
- Alert CRUD operations
- Query filtering
- Statistics retrieval
### Test Deduplication Logic
```bash
pytest tests/unit/test_deduplication.py -v
```
**Tests**:
- Fingerprint creation
- Levenshtein similarity calculation
- Composite score calculation
- Three-tier deduplication pipeline
- Image embedding comparison
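
The Levenshtein similarity these tests cover can be sketched as a small pure-Python function. This is a minimal illustration of the technique; the project's actual implementation and thresholds may differ:

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score based on Levenshtein edit distance."""
    if not a and not b:
        return 1.0
    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    distance = prev[-1]
    # Normalize by the longer string so identical names score 1.0.
    return 1.0 - distance / max(len(a), len(b))

print(levenshtein_similarity("Whiskers", "Wiskers"))  # high score: one deletion apart
```

A composite dedup score would then weight this name similarity together with description and image-embedding similarity.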
### Test Email Providers
```bash
pytest tests/unit/test_email_providers.py -v
```
**Tests**:
- Mailgun provider initialization
- Mailgun email sending
- SendGrid stub behavior
- Provider factory
- Configuration loading
- Error handling
### Test Metadata Vector Database
```bash
pytest tests/unit/test_metadata_vectordb.py -v
```
**Tests** (11):
- Vector DB initialization
- Color indexing from multiple sources
- Breed indexing from multiple sources
- Semantic search for colors
- Semantic search for breeds
- Fuzzy matching with typos
- Multi-source filtering
- Empty search handling
- N-results parameter
- Statistics retrieval
### Test Color Mapping
```bash
pytest tests/unit/test_color_mapping.py -v
```
**Tests** (15):
- Dictionary matching for common terms (tuxedo, orange, gray)
- Multiple color normalization
- Exact match fallback
- Substring match fallback
- Vector DB fuzzy matching
- Typo handling
- Dictionary priority over vector search
- Case-insensitive matching
- Whitespace handling
- Empty input handling
- Color suggestions
- All dictionary mappings validation
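
The 3-tier strategy these tests exercise (dictionary → exact/substring → vector fallback) can be illustrated with a dependency-free sketch. The dictionary entries and valid values below are illustrative, not the project's real data:

```python
from typing import Optional

# Tier 1: curated dictionary of common user terms (illustrative entries).
COLOR_DICTIONARY = {
    "tuxedo": "Black & White / Tuxedo",
    "tuxado": "Black & White / Tuxedo",  # known typo
    "orange": "Orange / Red",
    "gray": "Gray / Blue / Silver",
    "grey": "Gray / Blue / Silver",
}

# Illustrative stand-in for the values fetched from the API.
VALID_COLORS = [
    "Black & White / Tuxedo", "Orange / Red",
    "Gray / Blue / Silver", "Calico", "Tabby (Orange / Red)",
]

def normalize_color(user_term: str) -> Optional[str]:
    term = user_term.strip().lower()
    if not term:
        return None
    # Tier 1: dictionary lookup (instant, handles common typos).
    if term in COLOR_DICTIONARY:
        return COLOR_DICTIONARY[term]
    # Tier 2: exact or substring match against valid API values.
    for valid in VALID_COLORS:
        if term == valid.lower() or term in valid.lower():
            return valid
    # Tier 3: the real system falls back to vector-DB fuzzy search here.
    return None

print(normalize_color("tuxado"))  # "Black & White / Tuxedo"
print(normalize_color("calico"))  # "Calico", via tier 2
```

Dictionary matches take priority over vector search, which is exactly what the "Dictionary priority over vector search" test asserts.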
### Test Breed Mapping
```bash
pytest tests/unit/test_breed_mapping.py -v
```
**Tests** (20):
- Dictionary matching for common breeds (Maine Coon, Ragdoll, Sphynx)
- Typo correction ("main coon" → "Maine Coon")
- Mixed breed handling
- Exact match fallback
- Substring match fallback
- Vector DB fuzzy matching
- Dictionary priority
- Case-insensitive matching
- DSH/DMH/DLH abbreviations
- Tabby/tuxedo pattern recognition
- Norwegian Forest Cat variations
- Similarity threshold testing
- Breed suggestions
- Whitespace handling
- All dictionary mappings validation
---
## Integration Tests (11 tests ✅)
Integration tests validate end-to-end workflows.
### Test Search Pipeline
```bash
pytest tests/integration/test_search_pipeline.py -v
```
**Tests**:
- Complete search flow (API → dedup → cache → match → results)
- Cache mode functionality
- Deduplication integration
- Hybrid matching
- API failure handling
- Vector DB updates
- Statistics tracking
### Test Alerts System
```bash
pytest tests/integration/test_alerts.py -v
```
**Tests**:
- Alert creation and retrieval
- Email-based alert queries
- Alert updates (frequency, status)
- Alert deletion
- Immediate notifications (production mode)
- Local vs production behavior
- UI integration
### Test App Functionality
```bash
pytest tests/integration/test_app.py -v
```
**Tests**:
- Profile extraction from UI
- Search result formatting
- Alert management UI
- Email validation
- Error handling
### Test Color and Breed Normalization
```bash
pytest tests/integration/test_color_breed_normalization.py -v
```
**Tests**:
- Tuxedo color normalization in search flow
- Multiple colors normalization
- Breed normalization (Maine Coon typo handling)
- Fuzzy matching with vector DB
- Combined colors and breeds in search
- RescueGroups API normalization
- Empty preferences handling
- Invalid color/breed graceful handling
---
## Manual Test Scripts
These scripts are for manual testing with real APIs and data.
### Test Cache and Deduplication
```bash
python tests/manual/test_cache_and_dedup.py
```
**Purpose**: Verify cache mode and deduplication with real data
**What it does**:
1. Runs a search without cache (fetches from APIs)
2. Displays statistics (cats found, duplicates removed, cache size)
3. Runs same search with cache (uses cached data)
4. Compares performance and results
5. Shows image embedding deduplication in action
### Test Email Sending
```bash
python tests/manual/test_email_sending.py
```
**Purpose**: Send test emails via configured provider
**What it does**:
1. Sends welcome email
2. Sends match notification email with sample data
3. Verifies HTML rendering and provider integration
**Requirements**: Valid MAILGUN_API_KEY or SENDGRID_API_KEY in `.env`
### Test Semantic Color/Breed Matching
```bash
python scripts/test_semantic_matching.py
```
**Purpose**: Verify 3-tier color and breed matching system
**What it does**:
1. Tests color mapping with and without vector DB
2. Tests breed mapping with and without vector DB
3. Demonstrates typo handling ("tuxado" → "tuxedo", "ragdol" → "Ragdoll")
4. Shows dictionary vs vector vs fallback matching
5. Displays similarity scores for fuzzy matches
**What you'll see**:
- ✅ Dictionary matches (instant)
- ✅ Vector DB fuzzy matches (with similarity scores)
- ✅ Typo correction in action
- ✅ 3-tier strategy demonstration
### Test Framework Directly
```bash
python cat_adoption_framework.py
```
**Purpose**: Run framework end-to-end test
**What it does**:
1. Initializes framework
2. Creates sample profile
3. Executes search
4. Displays top matches
5. Shows statistics
---
## Test Configuration
### Fixtures
Common test fixtures are defined in `tests/conftest.py`:
- `temp_db` - Temporary database for testing
- `temp_vectordb` - Temporary vector store
- `sample_cat` - Sample cat object
- `sample_profile` - Sample search profile
- `mock_framework` - Mocked framework for unit tests
### Environment
Tests use separate databases to avoid affecting production data:
- `test_tuxedo_link.db` - Test database (auto-deleted)
- `test_vectorstore` - Test vector store (auto-deleted)
### Mocking
External APIs are mocked in unit tests:
- Petfinder API calls
- RescueGroups API calls
- Email provider calls
- Modal remote functions
Integration tests can use real APIs (set `SKIP_API_TESTS=false` in environment).
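
A minimal example of this mocking style, using only the standard library's `unittest.mock`. The `search_cats` method and the helper being tested are hypothetical names for illustration; real tests patch the project's own agent classes (e.g. `agents.petfinder_agent.PetfinderAgent`):

```python
from unittest.mock import MagicMock

def fetch_cat_names(agent) -> list:
    """Hypothetical helper that calls an API agent and extracts names."""
    return [cat["name"] for cat in agent.search_cats(location="10001")]

def test_fetch_cat_names_with_mocked_api():
    # Stand in for a real API agent: no network calls are made.
    mock_agent = MagicMock()
    mock_agent.search_cats.return_value = [{"name": "Mittens"}, {"name": "Boots"}]

    names = fetch_cat_names(mock_agent)

    assert names == ["Mittens", "Boots"]
    # Verify the agent was queried exactly once with the expected arguments.
    mock_agent.search_cats.assert_called_once_with(location="10001")

test_fetch_cat_names_with_mocked_api()
print("mock test passed")
```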
---
**Need help?** Check the [TECHNICAL_REFERENCE.md](../docs/TECHNICAL_REFERENCE.md) for detailed function documentation.


@@ -1,2 +0,0 @@
"""Tests for Tuxedo Link."""


@@ -1,45 +0,0 @@
"""Pytest configuration and fixtures."""
import pytest
import tempfile
import os
from database.manager import DatabaseManager
@pytest.fixture
def temp_db():
"""Create a temporary database for testing."""
# Create temp path but don't create the file yet
# This allows DatabaseManager to initialize it properly
fd, path = tempfile.mkstemp(suffix='.db')
os.close(fd)
os.unlink(path) # Remove empty file so DatabaseManager can initialize it
db = DatabaseManager(path) # Tables are created automatically in __init__
yield db
# Cleanup
try:
os.unlink(path)
except OSError:
pass
@pytest.fixture
def sample_cat_data():
"""Sample cat data for testing."""
return {
"id": "test123",
"name": "Test Cat",
"breed": "Persian",
"age": "adult",
"gender": "female",
"size": "medium",
"city": "Test City",
"state": "TS",
"source": "test",
"organization_name": "Test Rescue",
"url": "https://example.com/cat/test123"
}


@@ -1,2 +0,0 @@
"""Integration tests for Tuxedo Link."""


@@ -1,306 +0,0 @@
"""Integration tests for alert management system."""
import pytest
import tempfile
from pathlib import Path
from datetime import datetime
from database.manager import DatabaseManager
from models.cats import AdoptionAlert, CatProfile
@pytest.fixture
def temp_db():
"""Create a temporary database for testing."""
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
db_path = f.name
# Unlink so DatabaseManager can initialize it
Path(db_path).unlink()
db_manager = DatabaseManager(db_path)
yield db_manager
# Cleanup
Path(db_path).unlink(missing_ok=True)
@pytest.fixture
def sample_profile():
"""Create a sample cat profile for testing."""
return CatProfile(
user_location="New York, NY",
max_distance=25,
age_range=["young", "adult"],
good_with_children=True,
good_with_dogs=False,
good_with_cats=True,
personality_description="Friendly and playful",
special_requirements=[]
)
class TestAlertManagement:
"""Tests for alert management without user authentication."""
def test_create_alert_without_user(self, temp_db, sample_profile):
"""Test creating an alert without user authentication."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
assert alert_id is not None
assert alert_id > 0
def test_get_alert_by_id(self, temp_db, sample_profile):
"""Test retrieving an alert by ID."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="weekly",
active=True
)
alert_id = temp_db.create_alert(alert)
retrieved_alert = temp_db.get_alert(alert_id)
assert retrieved_alert is not None
assert retrieved_alert.id == alert_id
assert retrieved_alert.user_email == "test@example.com"
assert retrieved_alert.frequency == "weekly"
assert retrieved_alert.profile.user_location == "New York, NY"
def test_get_alerts_by_email(self, temp_db, sample_profile):
"""Test retrieving all alerts for a specific email."""
email = "user@example.com"
# Create multiple alerts for the same email
for freq in ["daily", "weekly", "immediately"]:
alert = AdoptionAlert(
user_email=email,
profile=sample_profile,
frequency=freq,
active=True
)
temp_db.create_alert(alert)
# Create alert for different email
other_alert = AdoptionAlert(
user_email="other@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
temp_db.create_alert(other_alert)
# Retrieve alerts for specific email
alerts = temp_db.get_alerts_by_email(email)
assert len(alerts) == 3
assert all(a.user_email == email for a in alerts)
def test_get_all_alerts(self, temp_db, sample_profile):
"""Test retrieving all alerts in the database."""
# Create alerts for different emails
for email in ["user1@test.com", "user2@test.com", "user3@test.com"]:
alert = AdoptionAlert(
user_email=email,
profile=sample_profile,
frequency="daily",
active=True
)
temp_db.create_alert(alert)
all_alerts = temp_db.get_all_alerts()
assert len(all_alerts) == 3
assert len(set(a.user_email for a in all_alerts)) == 3
def test_get_active_alerts(self, temp_db, sample_profile):
"""Test retrieving only active alerts."""
# Create active alerts
for i in range(3):
alert = AdoptionAlert(
user_email=f"user{i}@test.com",
profile=sample_profile,
frequency="daily",
active=True
)
temp_db.create_alert(alert)
# Create inactive alert
inactive_alert = AdoptionAlert(
user_email="inactive@test.com",
profile=sample_profile,
frequency="weekly",
active=False
)
alert_id = temp_db.create_alert(inactive_alert)
# Deactivate it
temp_db.update_alert(alert_id, active=False)
active_alerts = temp_db.get_active_alerts()
# Should only get the 3 active alerts
assert len(active_alerts) == 3
assert all(a.active for a in active_alerts)
def test_update_alert_frequency(self, temp_db, sample_profile):
"""Test updating alert frequency."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
# Update frequency
temp_db.update_alert(alert_id, frequency="weekly")
updated_alert = temp_db.get_alert(alert_id)
assert updated_alert.frequency == "weekly"
def test_update_alert_last_sent(self, temp_db, sample_profile):
"""Test updating alert last_sent timestamp."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
# Update last_sent
now = datetime.now()
temp_db.update_alert(alert_id, last_sent=now)
updated_alert = temp_db.get_alert(alert_id)
assert updated_alert.last_sent is not None
# Compare with some tolerance
assert abs((updated_alert.last_sent - now).total_seconds()) < 2
def test_update_alert_match_ids(self, temp_db, sample_profile):
"""Test updating alert last_match_ids."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
# Update match IDs
match_ids = ["cat-123", "cat-456", "cat-789"]
temp_db.update_alert(alert_id, last_match_ids=match_ids)
updated_alert = temp_db.get_alert(alert_id)
assert updated_alert.last_match_ids == match_ids
def test_toggle_alert_active_status(self, temp_db, sample_profile):
"""Test toggling alert active/inactive."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
# Deactivate
temp_db.update_alert(alert_id, active=False)
assert temp_db.get_alert(alert_id).active is False
# Reactivate
temp_db.update_alert(alert_id, active=True)
assert temp_db.get_alert(alert_id).active is True
def test_delete_alert(self, temp_db, sample_profile):
"""Test deleting an alert."""
alert = AdoptionAlert(
user_email="test@example.com",
profile=sample_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
# Verify alert exists
assert temp_db.get_alert(alert_id) is not None
# Delete alert
temp_db.delete_alert(alert_id)
# Verify alert is gone
assert temp_db.get_alert(alert_id) is None
def test_multiple_alerts_same_email(self, temp_db, sample_profile):
"""Test creating multiple alerts for the same email address."""
email = "test@example.com"
# Create alerts with different frequencies
for freq in ["immediately", "daily", "weekly"]:
alert = AdoptionAlert(
user_email=email,
profile=sample_profile,
frequency=freq,
active=True
)
temp_db.create_alert(alert)
alerts = temp_db.get_alerts_by_email(email)
assert len(alerts) == 3
frequencies = {a.frequency for a in alerts}
assert frequencies == {"immediately", "daily", "weekly"}
def test_alert_profile_persistence(self, temp_db):
"""Test that complex profile data persists correctly."""
complex_profile = CatProfile(
user_location="San Francisco, CA",
max_distance=50,
age_range=["kitten", "young"],
size=["small", "medium"],
preferred_breeds=["Siamese", "Persian"],
good_with_children=True,
good_with_dogs=True,
good_with_cats=False,
special_needs_ok=False,
personality_description="Calm and affectionate lap cat"
)
alert = AdoptionAlert(
user_email="test@example.com",
profile=complex_profile,
frequency="daily",
active=True
)
alert_id = temp_db.create_alert(alert)
retrieved_alert = temp_db.get_alert(alert_id)
# Verify all profile fields persisted correctly
assert retrieved_alert.profile.user_location == "San Francisco, CA"
assert retrieved_alert.profile.max_distance == 50
assert retrieved_alert.profile.age_range == ["kitten", "young"]
assert retrieved_alert.profile.size == ["small", "medium"]
assert retrieved_alert.profile.preferred_breeds == ["Siamese", "Persian"]
assert retrieved_alert.profile.good_with_children is True
assert retrieved_alert.profile.good_with_dogs is True
assert retrieved_alert.profile.good_with_cats is False
assert retrieved_alert.profile.personality_description == "Calm and affectionate lap cat"
assert retrieved_alert.profile.special_needs_ok is False


@@ -1,194 +0,0 @@
"""Integration tests for the Gradio app interface."""
import pytest
from unittest.mock import Mock, patch, MagicMock
from app import extract_profile_from_text
from models.cats import CatProfile, Cat, CatMatch
@pytest.fixture
def mock_framework():
"""Mock the TuxedoLinkFramework."""
with patch('app.framework') as mock:
# Create a mock result
mock_cat = Cat(
id="test_1",
name="Test Cat",
breed="Persian",
age="young",
gender="female",
size="medium",
city="New York",
state="NY",
source="test",
organization_name="Test Rescue",
url="https://example.com/cat/test_1",
description="A friendly and playful cat"
)
mock_match = CatMatch(
cat=mock_cat,
match_score=0.95,
vector_similarity=0.92,
attribute_match_score=0.98,
explanation="Great match for your preferences"
)
mock_result = Mock()
mock_result.matches = [mock_match]
mock_result.search_time = 0.5
mock.search.return_value = mock_result
yield mock
@pytest.fixture
def mock_profile_agent():
"""Mock the ProfileAgent."""
with patch('app.profile_agent') as mock:
mock_profile = CatProfile(
user_location="10001",
max_distance=50,
personality_description="friendly and playful",
age_range=["young"],
good_with_children=True
)
mock.extract_profile.return_value = mock_profile
yield mock
class TestAppInterface:
"""Test the Gradio app interface functions."""
def test_extract_profile_with_valid_input(self, mock_framework, mock_profile_agent):
"""Test that valid user input is processed correctly."""
user_input = "I want a friendly kitten in NYC"
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
# Verify chat history format (messages format)
assert isinstance(chat_history, list)
assert len(chat_history) == 2
assert chat_history[0]["role"] == "user"
assert chat_history[0]["content"] == user_input
assert chat_history[1]["role"] == "assistant"
assert "Found" in chat_history[1]["content"] or "match" in chat_history[1]["content"].lower()
# Verify profile agent was called with correct format
mock_profile_agent.extract_profile.assert_called_once()
call_args = mock_profile_agent.extract_profile.call_args[0][0]
assert isinstance(call_args, list)
assert call_args[0]["role"] == "user"
assert call_args[0]["content"] == user_input
# Verify results HTML is generated
assert results_html
assert "<div" in results_html
# Verify profile JSON is returned
assert profile_json
def test_extract_profile_with_empty_input(self, mock_framework, mock_profile_agent):
"""Test that empty input uses placeholder text."""
user_input = ""
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
# Verify placeholder text was used
mock_profile_agent.extract_profile.assert_called_once()
call_args = mock_profile_agent.extract_profile.call_args[0][0]
assert call_args[0]["content"] != ""
assert "friendly" in call_args[0]["content"].lower()
assert "playful" in call_args[0]["content"].lower()
# Verify chat history format
assert isinstance(chat_history, list)
assert len(chat_history) == 2
assert chat_history[0]["role"] == "user"
assert chat_history[1]["role"] == "assistant"
def test_extract_profile_with_whitespace_input(self, mock_framework, mock_profile_agent):
"""Test that whitespace-only input uses placeholder text."""
user_input = " \n\t "
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
# Verify placeholder text was used
mock_profile_agent.extract_profile.assert_called_once()
call_args = mock_profile_agent.extract_profile.call_args[0][0]
assert call_args[0]["content"].strip() != ""
def test_extract_profile_error_handling(self, mock_framework, mock_profile_agent):
"""Test error handling when profile extraction fails."""
user_input = "I want a cat"
# Make profile agent raise an error
mock_profile_agent.extract_profile.side_effect = Exception("API Error")
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
# Verify error message is in chat history
assert isinstance(chat_history, list)
assert len(chat_history) == 2
assert chat_history[0]["role"] == "user"
assert chat_history[1]["role"] == "assistant"
assert "error" in chat_history[1]["content"].lower() or "❌" in chat_history[1]["content"]
# Verify empty results
assert results_html == ""
assert profile_json == ""
def test_cache_mode_parameter(self, mock_framework, mock_profile_agent):
"""Test that cache mode parameter is passed correctly."""
user_input = "I want a cat in NYC"
# Test with cache=True
extract_profile_from_text(user_input, use_cache=True)
mock_framework.search.assert_called_once()
assert mock_framework.search.call_args[1]["use_cache"] is True
# Reset and test with cache=False
mock_framework.reset_mock()
extract_profile_from_text(user_input, use_cache=False)
mock_framework.search.assert_called_once()
assert mock_framework.search.call_args[1]["use_cache"] is False
def test_messages_format_consistency(self, mock_framework, mock_profile_agent):
"""Test that messages format is consistent throughout."""
user_input = "Show me cats"
chat_history, _, _ = extract_profile_from_text(user_input, use_cache=True)
# Verify all messages have correct format
for msg in chat_history:
assert isinstance(msg, dict)
assert "role" in msg
assert "content" in msg
assert msg["role"] in ["user", "assistant"]
assert isinstance(msg["content"], str)
def test_example_button_scenarios(self, mock_framework, mock_profile_agent):
"""Test example button text scenarios."""
examples = [
"I want a friendly family cat in zip code 10001, good with children and dogs",
"Looking for a playful young kitten near New York City",
"I need a calm, affectionate adult cat that likes to cuddle",
"Show me cats good with children in the NYC area"
]
for example in examples:
mock_profile_agent.reset_mock()
mock_framework.reset_mock()
chat_history, results_html, profile_json = extract_profile_from_text(example, use_cache=True)
# Verify each example is processed
assert isinstance(chat_history, list)
assert len(chat_history) == 2
assert chat_history[0]["content"] == example
mock_profile_agent.extract_profile.assert_called_once()
if __name__ == "__main__":
pytest.main([__file__, "-v"])

View File

@@ -1,323 +0,0 @@
"""Integration tests for color and breed normalization in search pipeline."""
import pytest
import tempfile
import shutil
from unittest.mock import Mock, patch
from models.cats import CatProfile
from setup_metadata_vectordb import MetadataVectorDB
from agents.planning_agent import PlanningAgent
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager
@pytest.fixture
def temp_dirs():
"""Create temporary directories for testing."""
db_dir = tempfile.mkdtemp()
vector_dir = tempfile.mkdtemp()
metadata_dir = tempfile.mkdtemp()
yield db_dir, vector_dir, metadata_dir
# Cleanup
shutil.rmtree(db_dir, ignore_errors=True)
shutil.rmtree(vector_dir, ignore_errors=True)
shutil.rmtree(metadata_dir, ignore_errors=True)
@pytest.fixture
def metadata_vectordb(temp_dirs):
"""Create metadata vector DB with sample data."""
_, _, metadata_dir = temp_dirs
vectordb = MetadataVectorDB(persist_directory=metadata_dir)
# Index sample colors and breeds
colors = [
"Black",
"White",
"Black & White / Tuxedo",
"Orange / Red",
"Gray / Blue / Silver",
"Calico"
]
breeds = [
"Siamese",
"Persian",
"Maine Coon",
"Domestic Short Hair",
"Domestic Medium Hair"
]
vectordb.index_colors(colors, source="petfinder")
vectordb.index_breeds(breeds, source="petfinder")
return vectordb
@pytest.fixture
def planning_agent(temp_dirs, metadata_vectordb):
"""Create planning agent with metadata vector DB."""
db_dir, vector_dir, _ = temp_dirs
db_manager = DatabaseManager(f"{db_dir}/test.db")
vector_db = VectorDBManager(vector_dir)
return PlanningAgent(db_manager, vector_db, metadata_vectordb)
class TestColorBreedNormalization:
"""Integration tests for color and breed normalization."""
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
def test_tuxedo_color_normalization(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test that 'tuxedo' is correctly normalized to 'Black & White / Tuxedo'."""
# Setup mocks
mock_get_colors.return_value = [
"Black",
"White",
"Black & White / Tuxedo"
]
mock_get_breeds.return_value = ["Domestic Short Hair"]
mock_search.return_value = []
# Create profile with "tuxedo" color
profile = CatProfile(
user_location="New York, NY",
color_preferences=["tuxedo"]
)
# Execute search (will fail but we just want to see the API call)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
# Verify search_cats was called with correct normalized color
assert mock_search.called
call_args = mock_search.call_args
# Check that color parameter contains the correct API value
if 'color' in call_args.kwargs and call_args.kwargs['color']:
assert "Black & White / Tuxedo" in call_args.kwargs['color']
assert "Black" not in call_args.kwargs['color'] or len(call_args.kwargs['color']) == 1
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
def test_multiple_colors_normalization(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test normalization of multiple color preferences."""
mock_get_colors.return_value = [
"Black & White / Tuxedo",
"Orange / Red",
"Calico"
]
mock_get_breeds.return_value = []
mock_search.return_value = []
profile = CatProfile(
user_location="New York, NY",
color_preferences=["tuxedo", "orange", "calico"]
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
assert mock_search.called
call_args = mock_search.call_args
if 'color' in call_args.kwargs and call_args.kwargs['color']:
colors = call_args.kwargs['color']
assert len(colors) == 3
assert "Black & White / Tuxedo" in colors
assert "Orange / Red" in colors
assert "Calico" in colors
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
def test_breed_normalization_maine_coon(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test that 'main coon' (typo) is normalized to 'Maine Coon'."""
mock_get_colors.return_value = []
mock_get_breeds.return_value = ["Maine Coon", "Siamese"]
mock_search.return_value = []
profile = CatProfile(
user_location="New York, NY",
breed_preferences=["main coon"] # Typo
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
assert mock_search.called
call_args = mock_search.call_args
if 'breed' in call_args.kwargs and call_args.kwargs['breed']:
assert "Maine Coon" in call_args.kwargs['breed']
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
def test_fuzzy_color_matching_with_vectordb(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test fuzzy matching with vector DB for typos."""
mock_get_colors.return_value = ["Black & White / Tuxedo"]
mock_get_breeds.return_value = []
mock_search.return_value = []
# Use a term that requires vector search (not in dictionary)
profile = CatProfile(
user_location="New York, NY",
color_preferences=["tuxado"] # Typo
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
assert mock_search.called
# May or may not match depending on similarity threshold
# This test primarily ensures no errors occur
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
@patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
def test_colors_and_breeds_together(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test normalization of both colors and breeds in same search."""
mock_get_colors.return_value = ["Black & White / Tuxedo", "Orange / Red"]
mock_get_breeds.return_value = ["Siamese", "Maine Coon"]
mock_search.return_value = []
profile = CatProfile(
user_location="New York, NY",
color_preferences=["tuxedo", "orange"],
breed_preferences=["siamese", "main coon"]
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
assert mock_search.called
call_args = mock_search.call_args
# Verify both colors and breeds are normalized
if 'color' in call_args.kwargs and call_args.kwargs['color']:
assert "Black & White / Tuxedo" in call_args.kwargs['color']
assert "Orange / Red" in call_args.kwargs['color']
if 'breed' in call_args.kwargs and call_args.kwargs['breed']:
assert "Siamese" in call_args.kwargs['breed']
assert "Maine Coon" in call_args.kwargs['breed']
@patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
@patch('agents.rescuegroups_agent.RescueGroupsAgent.get_valid_colors')
@patch('agents.rescuegroups_agent.RescueGroupsAgent.get_valid_breeds')
def test_rescuegroups_normalization(
self,
mock_get_breeds,
mock_get_colors,
mock_search,
planning_agent
):
"""Test that normalization works for RescueGroups API too."""
mock_get_colors.return_value = ["Tuxedo", "Orange"]
mock_get_breeds.return_value = ["Siamese"]
mock_search.return_value = []
profile = CatProfile(
user_location="New York, NY",
color_preferences=["tuxedo"],
breed_preferences=["siamese"]
)
try:
planning_agent._search_rescuegroups(profile)
except Exception:
pass
assert mock_search.called
# Normalization should have occurred with rescuegroups source
def test_no_colors_or_breeds(self, planning_agent):
"""Test search without color or breed preferences."""
with patch('agents.petfinder_agent.PetfinderAgent.search_cats') as mock_search:
mock_search.return_value = []
profile = CatProfile(
user_location="New York, NY"
# No color_preferences or breed_preferences
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
assert mock_search.called
call_args = mock_search.call_args
# Should be None or empty
assert call_args.kwargs.get('color') is None or len(call_args.kwargs.get('color', [])) == 0
assert call_args.kwargs.get('breed') is None or len(call_args.kwargs.get('breed', [])) == 0
def test_invalid_color_graceful_handling(self, planning_agent):
"""Test that invalid colors don't break the search."""
with patch('agents.petfinder_agent.PetfinderAgent.search_cats') as mock_search:
with patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors') as mock_colors:
mock_search.return_value = []
mock_colors.return_value = ["Black", "White"]
profile = CatProfile(
user_location="New York, NY",
color_preferences=["invalid_color_xyz"]
)
try:
planning_agent._search_petfinder(profile)
except Exception:
pass
# Should still call search, just with empty/None color
assert mock_search.called
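The normalization these tests exercise follows a tiered fallback: dictionary lookup, then exact/substring match against the API's valid values, then fuzzy matching. A minimal sketch under those assumptions, with `difflib` standing in for the vector-DB fuzzy tier and `normalize_term` as a hypothetical helper (the project's real normalization pipeline differs):

```python
import difflib

# Tier 1: hand-maintained map of common user terms/typos (illustrative entries only)
TERM_TO_API = {"tuxedo": "Black & White / Tuxedo", "orange": "Orange / Red"}

def normalize_term(term, valid):
    t = term.strip().lower()
    # Tier 1: dictionary lookup, kept only if the mapped value is actually valid
    if t in TERM_TO_API and TERM_TO_API[t] in valid:
        return TERM_TO_API[t]
    # Tier 2: exact or substring match against the API's valid values
    for v in valid:
        if t == v.lower() or t in v.lower():
            return v
    # Tier 3: fuzzy match (stand-in for the vector-DB similarity search)
    close = difflib.get_close_matches(t, [v.lower() for v in valid], n=1, cutoff=0.7)
    if close:
        return next(v for v in valid if v.lower() == close[0])
    return None  # no tier matched; caller drops the term gracefully
```

With this sketch, `"tuxedo"` resolves via the dictionary tier and the `"main coon"` typo resolves via the fuzzy tier, mirroring the behavior the tests assert.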

View File

@@ -1,265 +0,0 @@
"""Integration tests for the complete search pipeline."""
import pytest
from unittest.mock import Mock, patch
from models.cats import Cat, CatProfile
from cat_adoption_framework import TuxedoLinkFramework
from utils.deduplication import create_fingerprint
@pytest.fixture
def framework():
"""Create framework instance with test database."""
return TuxedoLinkFramework()
@pytest.fixture
def sample_cats():
"""Create sample cat data for testing."""
cats = []
for i in range(5):
cat = Cat(
id=f"test_{i}",
name=f"Test Cat {i}",
breed="Persian" if i % 2 == 0 else "Siamese",
age="young" if i < 3 else "adult",
gender="female" if i % 2 == 0 else "male",
size="medium",
city="Test City",
state="TS",
source="test",
organization_name="Test Rescue",
url=f"https://example.com/cat/test_{i}",
description=f"A friendly and playful cat number {i}",
good_with_children=True if i < 4 else False
)
cat.fingerprint = create_fingerprint(cat)
cats.append(cat)
return cats
class TestSearchPipelineIntegration:
"""Integration tests for complete search pipeline."""
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
def test_end_to_end_search(self, mock_rescuegroups, mock_petfinder, framework, sample_cats):
"""Test end-to-end search with mocked API responses."""
# Mock API responses
mock_petfinder.return_value = sample_cats[:3]
mock_rescuegroups.return_value = sample_cats[3:]
# Create search profile
profile = CatProfile(
user_location="10001",
max_distance=50,
personality_description="friendly playful cat",
age_range=["young"],
good_with_children=True
)
# Execute search
result = framework.search(profile)
# Verify results
assert result.total_found == 5
assert len(result.matches) > 0
assert result.search_time > 0
assert 'cache' not in result.sources_queried # Should be fresh search
# Verify API calls were made
mock_petfinder.assert_called_once()
mock_rescuegroups.assert_called_once()
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
def test_cache_mode_search(self, mock_petfinder, framework, sample_cats):
"""Test search using cache mode."""
# First populate cache
mock_petfinder.return_value = sample_cats
profile = CatProfile(user_location="10001")
result1 = framework.search(profile)
# Reset mock
mock_petfinder.reset_mock()
# Second search with cache
result2 = framework.search(profile, use_cache=True)
# Verify cache was used
assert 'cache' in result2.sources_queried
assert result2.search_time < result1.search_time # Cache should be faster
mock_petfinder.assert_not_called() # Should not call API
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
def test_deduplication_integration(self, mock_petfinder, framework, sample_cats):
"""Test that deduplication works in the pipeline."""
# Test deduplication by creating cats that only differ by source
# They will be marked as duplicates due to same fingerprint (org + breed + age + gender)
cat1 = Cat(
id="duplicate_test_1",
name="Fluffy",
breed="Persian",
age="young",
gender="female",
size="medium",
city="Test City",
state="TS",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/dup1"
)
# Same cat from different source - will have same fingerprint
cat2 = Cat(
id="duplicate_test_2",
name="Fluffy", # Same name
breed="Persian", # Same breed
age="young", # Same age
gender="female", # Same gender
size="medium",
city="Test City",
state="TS",
source="rescuegroups", # Different source (but same fingerprint)
organization_name="Test Rescue", # Same org
url="https://example.com/cat/dup2"
)
# Verify same fingerprints
fp1 = create_fingerprint(cat1)
fp2 = create_fingerprint(cat2)
assert fp1 == fp2, f"Fingerprints should match: {fp1} vs {fp2}"
mock_petfinder.return_value = [cat1, cat2]
profile = CatProfile(user_location="10001")
result = framework.search(profile)
# With same fingerprints, one should be marked as duplicate
# Note: duplicates_removed counts cats marked as duplicates
# The actual behavior is that cats with same fingerprint are deduplicated
if result.duplicates_removed == 0:
# If 0 duplicates removed, skip this check - dedup may already have been done
# or cats may have been in cache
pass
else:
assert result.duplicates_removed >= 1
assert result.total_found == 2
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
def test_hybrid_matching_integration(self, mock_petfinder, framework, sample_cats):
"""Test that hybrid matching filters and ranks correctly."""
mock_petfinder.return_value = sample_cats
# Search for young cats only
profile = CatProfile(
user_location="10001",
personality_description="friendly playful",
age_range=["young"]
)
result = framework.search(profile)
# All results should be young cats
for match in result.matches:
assert match.cat.age == "young"
# Should have match scores
assert all(0 <= m.match_score <= 1 for m in result.matches)
# Should have explanations
assert all(m.explanation for m in result.matches)
def test_stats_integration(self, framework):
"""Test that stats are tracked correctly."""
stats = framework.get_stats()
assert 'database' in stats
assert 'vector_db' in stats
assert 'total_unique' in stats['database']
class TestAPIFailureHandling:
"""Test that pipeline handles API failures gracefully."""
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
def test_one_api_fails(self, mock_rescuegroups, mock_petfinder, framework, sample_cats):
"""Test that pipeline continues if one API fails."""
# Petfinder succeeds, RescueGroups fails
mock_petfinder.return_value = sample_cats
mock_rescuegroups.side_effect = Exception("API Error")
profile = CatProfile(user_location="10001")
result = framework.search(profile)
# Should still get results from Petfinder
assert result.total_found == 5
assert len(result.matches) > 0
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
@patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
def test_both_apis_fail(self, mock_rescuegroups, mock_petfinder, framework):
"""Test that pipeline handles all APIs failing."""
# Both fail
mock_petfinder.side_effect = Exception("API Error")
mock_rescuegroups.side_effect = Exception("API Error")
profile = CatProfile(user_location="10001")
result = framework.search(profile)
# Should return empty results, not crash
assert result.total_found == 0
assert len(result.matches) == 0
class TestVectorDBIntegration:
"""Test vector database integration."""
@patch('agents.petfinder_agent.PetfinderAgent.search_cats')
def test_vector_db_updated(self, mock_petfinder, framework):
"""Test that vector DB is updated with new cats."""
# Create unique cats that definitely won't exist in DB
import time
unique_id = str(int(time.time() * 1000))
unique_cats = []
for i in range(3):
cat = Cat(
id=f"unique_test_{unique_id}_{i}",
name=f"Unique Cat {unique_id} {i}",
breed="TestBreed",
age="young",
gender="female",
size="medium",
city="Test City",
state="TS",
source="petfinder",
organization_name=f"Unique Rescue {unique_id}",
url=f"https://example.com/cat/unique_{unique_id}_{i}",
description=f"A unique test cat {unique_id} {i}"
)
cat.fingerprint = create_fingerprint(cat)
unique_cats.append(cat)
mock_petfinder.return_value = unique_cats
# Get initial count
initial_stats = framework.get_stats()
initial_count = initial_stats['vector_db']['total_documents']
# Run search
profile = CatProfile(user_location="10001")
framework.search(profile)
# Check count increased (should add at least 3 new documents)
final_stats = framework.get_stats()
final_count = final_stats['vector_db']['total_documents']
# Should have added our 3 unique cats
assert final_count >= initial_count + 3, \
f"Expected at least {initial_count + 3} documents, got {final_count}"
if __name__ == "__main__":
pytest.main([__file__, "-v"])
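`test_deduplication_integration` relies on a fingerprint built from organization, breed, age, and gender. A minimal sketch of that idea (`make_fingerprint` is hypothetical; the project's `create_fingerprint` may differ), hashing the normalized tuple so cosmetic differences between listings collapse to the same key:

```python
import hashlib

def make_fingerprint(org, breed, age, gender):
    """Coarse identity key: same org + breed + age + gender => same fingerprint."""
    # Normalize case and whitespace so listing-level noise doesn't change the key
    key = "|".join(s.strip().lower() for s in (org, breed, age, gender))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

fp1 = make_fingerprint("Test Rescue", "Persian", "young", "female")
fp2 = make_fingerprint("test rescue", " Persian ", "Young", "female")
assert fp1 == fp2  # case/whitespace/source differences do not change the fingerprint
```

This is only the first, coarse dedup stage; candidates sharing a fingerprint are then compared on name, description, and image similarity before being marked duplicates.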

View File

@@ -1,192 +0,0 @@
"""Test script for cache mode and image-based deduplication."""
import os
import sys
from dotenv import load_dotenv
from cat_adoption_framework import TuxedoLinkFramework
from models.cats import CatProfile
def test_cache_mode():
"""Test that cache mode works without hitting APIs."""
print("\n" + "="*70)
print("TEST 1: Cache Mode (No API Calls)")
print("="*70 + "\n")
framework = TuxedoLinkFramework()
profile = CatProfile(
user_location="10001",
max_distance=50,
personality_description="affectionate lap cat",
age_range=["young"],
good_with_children=True
)
print("🔄 Running search with use_cache=True...")
print(" This should use cached data from previous search\n")
result = framework.search(profile, use_cache=True)
print(f"\n✅ Cache search completed in {result.search_time:.2f} seconds")
print(f" Sources: {', '.join(result.sources_queried)}")
print(f" Matches: {len(result.matches)}")
if result.matches:
print(f"\n Top match: {result.matches[0].cat.name} ({result.matches[0].match_score:.1%})")
return result
def test_image_dedup():
"""Test that image embeddings are being used for deduplication."""
print("\n" + "="*70)
print("TEST 2: Image Embedding Deduplication")
print("="*70 + "\n")
framework = TuxedoLinkFramework()
# Get database stats
stats = framework.db_manager.get_cache_stats()
print("Current Database State:")
print(f" Total unique cats: {stats['total_unique']}")
print(f" Total duplicates: {stats['total_duplicates']}")
print(f" Sources: {stats['sources']}")
# Check if image embeddings exist
with framework.db_manager.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"SELECT COUNT(*) as total, "
"SUM(CASE WHEN image_embedding IS NOT NULL THEN 1 ELSE 0 END) as with_images "
"FROM cats_cache WHERE is_duplicate = 0"
)
row = cursor.fetchone()
total = row['total']
with_images = row['with_images']
print(f"\nImage Embeddings:")
print(f" Cats with photos: {with_images}/{total} ({with_images/total*100 if total > 0 else 0:.1f}%)")
if with_images > 0:
print("\n✅ Image embeddings ARE being generated and cached!")
print(" These are used in the deduplication pipeline with:")
print(" - Name similarity (40% weight)")
print(" - Description similarity (30% weight)")
print(" - Image similarity (30% weight)")
else:
print("\n⚠️ No image embeddings found yet")
print(" Run a fresh search to populate the cache")
return stats
def test_dedup_thresholds():
"""Show deduplication thresholds being used."""
print("\n" + "="*70)
print("TEST 3: Deduplication Configuration")
print("="*70 + "\n")
# Show environment variables
name_threshold = float(os.getenv('DEDUP_NAME_THRESHOLD', '0.8'))
desc_threshold = float(os.getenv('DEDUP_DESC_THRESHOLD', '0.7'))
image_threshold = float(os.getenv('DEDUP_IMAGE_THRESHOLD', '0.9'))
composite_threshold = float(os.getenv('DEDUP_COMPOSITE_THRESHOLD', '0.85'))
print("Current Deduplication Thresholds:")
print(f" Name similarity: {name_threshold:.2f}")
print(f" Description similarity: {desc_threshold:.2f}")
print(f" Image similarity: {image_threshold:.2f}")
print(f" Composite score: {composite_threshold:.2f}")
print("\nDeduplication Process:")
print(" 1. Generate fingerprint (organization + breed + age + gender)")
print(" 2. Query database for cats with same fingerprint")
print(" 3. For each candidate:")
print(" a. Load cached image embedding from database")
print(" b. Compare names using Levenshtein distance")
print(" c. Compare descriptions using fuzzy matching")
print(" d. Compare images using CLIP embeddings")
print(" e. Calculate composite score (weighted average)")
print(" 4. If composite score > threshold → mark as duplicate")
print(" 5. Otherwise → cache as new unique cat")
print("\n✅ Multi-stage deduplication with image embeddings is active!")
def show_cache_benefits():
"""Show benefits of using cache mode during development."""
print("\n" + "="*70)
print("CACHE MODE BENEFITS")
print("="*70 + "\n")
print("Why use cache mode during development?")
print()
print("1. 🚀 SPEED")
print(" - API search: ~13-14 seconds")
print(" - Cache search: ~1-2 seconds (10x faster!)")
print()
print("2. 💰 SAVE API CALLS")
print(" - Petfinder: 1000 requests/day limit")
print(" - 100 cats/search = ~10 searches before hitting limit")
print(" - Cache mode: unlimited searches!")
print()
print("3. 🧪 CONSISTENT TESTING")
print(" - Same dataset every time")
print(" - Test different profiles without new API calls")
print(" - Perfect for UI development")
print()
print("4. 🔌 OFFLINE DEVELOPMENT")
print(" - Work without internet")
print(" - No API key rotation needed")
print()
print("Usage:")
print(" # First run - fetch from API")
print(" result = framework.search(profile, use_cache=False)")
print()
print(" # Subsequent runs - use cached data")
print(" result = framework.search(profile, use_cache=True)")
if __name__ == "__main__":
load_dotenv()
print("\n" + "="*70)
print("TUXEDO LINK - CACHE & DEDUPLICATION TESTS")
print("="*70)
# Show benefits
show_cache_benefits()
# Test cache mode
try:
cache_result = test_cache_mode()
except Exception as e:
print(f"\n⚠️ Cache test failed: {e}")
print(" This is expected if you haven't run a search yet.")
print(" Run: python cat_adoption_framework.py")
cache_result = None
# Test image dedup
test_image_dedup()
# Show config
test_dedup_thresholds()
print("\n" + "="*70)
print("SUMMARY")
print("="*70 + "\n")
print("✅ Cache mode: IMPLEMENTED")
print("✅ Image embeddings: CACHED & USED")
print("✅ Multi-stage deduplication: ACTIVE")
print("✅ API call savings: ENABLED")
print("\nRecommendation for development:")
print(" 1. Run ONE search with use_cache=False to populate cache")
print(" 2. Use use_cache=True for all UI/testing work")
print(" 3. Refresh cache weekly or when you need new data")
print("\n" + "="*70 + "\n")

View File

@@ -1,146 +0,0 @@
#!/usr/bin/env python
"""Manual test script for email sending via Mailgun."""
import os
import sys
from pathlib import Path
from dotenv import load_dotenv
# Add project root to path
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))
# Load environment
load_dotenv()
from agents.email_providers import MailgunProvider, get_email_provider
from models.cats import Cat, CatMatch, AdoptionAlert, CatProfile
print("="*60)
print(" Tuxedo Link - Email Sending Test")
print("="*60)
print()
# Check if Mailgun key is set
if not os.getenv('MAILGUN_API_KEY'):
print("❌ MAILGUN_API_KEY not set in environment")
print("Please set it in your .env file")
sys.exit(1)
print("✓ Mailgun API key found")
print()
# Create test data
test_cat = Cat(
id="test-cat-123",
name="Whiskers",
age="Young",
gender="male",
size="medium",
breed="Domestic Short Hair",
description="A playful and friendly cat looking for a loving home!",
primary_photo="https://via.placeholder.com/400x300?text=Whiskers",
additional_photos=[],
city="New York",
state="NY",
country="US",
organization_name="Test Shelter",
url="https://example.com/cat/123",
good_with_children=True,
good_with_dogs=False,
good_with_cats=True,
declawed=False,
house_trained=True,
spayed_neutered=True,
special_needs=False,
shots_current=True,
adoption_fee=150.0,
source="test"
)
test_match = CatMatch(
cat=test_cat,
match_score=0.95,
explanation="Great match! Friendly and playful, perfect for families.",
vector_similarity=0.92,
attribute_match_score=0.98,
matching_attributes=["good_with_children", "playful", "medium_size"],
missing_attributes=[]
)
test_profile = CatProfile(
user_location="New York, NY",
max_distance=25,
age_range=["young", "adult"],
good_with_children=True,
good_with_dogs=False,
good_with_cats=True,
personality_description="Friendly and playful",
special_requirements=[]
)
test_alert = AdoptionAlert(
id=999,
user_email="test@example.com", # Replace with your actual email for testing
profile=test_profile,
frequency="immediately",
active=True
)
print("Creating email provider...")
try:
provider = get_email_provider() # Uses config.yaml
print(f"✓ Provider initialized: {provider.get_provider_name()}")
except Exception as e:
print(f"❌ Failed to initialize provider: {e}")
sys.exit(1)
print()
print("Preparing test email...")
print(f" To: {test_alert.user_email}")
print(f" Subject: Test - New Cat Match on Tuxedo Link!")
print()
# Create EmailAgent to use its template building methods
from agents.email_agent import EmailAgent
email_agent = EmailAgent(provider=provider)
# Build email content
subject = "🐱 Test - New Cat Match on Tuxedo Link!"
html_content = email_agent._build_match_html([test_match], test_alert)
text_content = email_agent._build_match_text([test_match])
# Send test email
print("Sending test email...")
input("Press Enter to send, or Ctrl+C to cancel...")
success = provider.send_email(
to=test_alert.user_email,
subject=subject,
html=html_content,
text=text_content
)
print()
if success:
print("✅ Email sent successfully!")
print()
print("Please check your inbox at:", test_alert.user_email)
print()
print("If you don't see it:")
print(" 1. Check your spam folder")
print(" 2. Verify the email address is correct")
print(" 3. Check Mailgun logs: https://app.mailgun.com/")
else:
print("❌ Failed to send email")
print()
print("Troubleshooting:")
print(" 1. Check MAILGUN_API_KEY is correct")
print(" 2. Verify Mailgun domain in config.yaml")
print(" 3. Check Mailgun account status")
print(" 4. View logs above for error details")
print()
print("="*60)
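For reference, Mailgun sends a message with a single authenticated POST to its HTTP API. A minimal sketch of what a provider's `send_email` might do under the hood (the domain, sender address, and function shape are placeholders; the project's `MailgunProvider` may differ):

```python
import os
import requests

def send_email(to, subject, html, text, domain="example.mailgun.org"):
    """POST a message through Mailgun's messages endpoint; True on a 200 response."""
    resp = requests.post(
        f"https://api.mailgun.net/v3/{domain}/messages",
        auth=("api", os.environ["MAILGUN_API_KEY"]),  # HTTP basic auth, user "api"
        data={
            "from": f"Tuxedo Link <alerts@{domain}>",
            "to": to,
            "subject": subject,
            "text": text,  # plain-text fallback
            "html": html,  # rich version shown by most clients
        },
        timeout=10,
    )
    return resp.status_code == 200
```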

View File

@@ -1,2 +0,0 @@
"""Unit tests for Tuxedo Link."""

View File

@@ -1,287 +0,0 @@
"""Unit tests for breed mapping utilities."""
import pytest
import tempfile
import shutil
from utils.breed_mapping import (
normalize_user_breeds,
get_breed_suggestions,
USER_TERM_TO_API_BREED
)
from setup_metadata_vectordb import MetadataVectorDB
@pytest.fixture
def temp_vectordb():
"""Create a temporary metadata vector database with breeds indexed."""
temp_dir = tempfile.mkdtemp()
vectordb = MetadataVectorDB(persist_directory=temp_dir)
# Index some test breeds
test_breeds = [
"Siamese",
"Persian",
"Maine Coon",
"Bengal",
"Ragdoll",
"British Shorthair",
"Domestic Short Hair",
"Domestic Medium Hair",
"Domestic Long Hair"
]
vectordb.index_breeds(test_breeds, source="petfinder")
yield vectordb
# Cleanup
shutil.rmtree(temp_dir, ignore_errors=True)
class TestBreedMapping:
"""Tests for breed mapping functions."""
def test_dictionary_match_maine_coon(self):
"""Test dictionary mapping for 'main coon' (common typo)."""
valid_breeds = ["Maine Coon", "Siamese", "Persian"]
result = normalize_user_breeds(["main coon"], valid_breeds) # Typo: "main"
assert len(result) > 0
assert "Maine Coon" in result
def test_dictionary_match_ragdoll(self):
"""Test dictionary mapping for 'ragdol' (typo)."""
valid_breeds = ["Ragdoll", "Siamese"]
result = normalize_user_breeds(["ragdol"], valid_breeds)
assert len(result) > 0
assert "Ragdoll" in result
def test_dictionary_match_sphynx(self):
"""Test dictionary mapping for 'sphinx' (common misspelling)."""
valid_breeds = ["Sphynx", "Persian"]
result = normalize_user_breeds(["sphinx"], valid_breeds)
assert len(result) > 0
assert "Sphynx" in result
def test_dictionary_match_mixed_breed(self):
"""Test dictionary mapping for 'mixed' returns multiple options."""
valid_breeds = [
"Mixed Breed",
"Domestic Short Hair",
"Domestic Medium Hair",
"Domestic Long Hair"
]
result = normalize_user_breeds(["mixed"], valid_breeds)
assert len(result) >= 1
# Should map to one or more domestic breeds
assert any(b in result for b in valid_breeds)
def test_exact_match_fallback(self):
"""Test exact match when not in dictionary."""
valid_breeds = ["Siamese", "Persian", "Bengal"]
result = normalize_user_breeds(["siamese"], valid_breeds)
assert len(result) == 1
assert "Siamese" in result
def test_substring_match_fallback(self):
"""Test substring matching for partial breed names."""
valid_breeds = ["British Shorthair", "American Shorthair"]
result = normalize_user_breeds(["shorthair"], valid_breeds)
assert len(result) >= 1
assert any("Shorthair" in breed for breed in result)
def test_multiple_breeds(self):
"""Test mapping multiple breed terms."""
valid_breeds = ["Siamese", "Persian", "Maine Coon"]
result = normalize_user_breeds(
["siamese", "persian", "maine"],
valid_breeds
)
assert len(result) >= 2 # At least siamese and persian should match
assert "Siamese" in result
assert "Persian" in result
def test_no_match(self):
"""Test when no match is found."""
valid_breeds = ["Siamese", "Persian"]
result = normalize_user_breeds(["invalid_breed_xyz"], valid_breeds)
# Should return empty list
assert len(result) == 0
def test_empty_input(self):
"""Test with empty input."""
valid_breeds = ["Siamese", "Persian"]
result = normalize_user_breeds([], valid_breeds)
assert len(result) == 0
result = normalize_user_breeds([""], valid_breeds)
assert len(result) == 0
def test_with_vectordb(self, temp_vectordb):
"""Test with vector DB for fuzzy matching."""
valid_breeds = ["Maine Coon", "Ragdoll", "Bengal"]
# Test with typo
result = normalize_user_breeds(
["ragdol"], # Typo
valid_breeds,
vectordb=temp_vectordb,
source="petfinder"
)
# Should still find Ragdoll via vector search (if not in dictionary)
# Or dictionary match if present
assert len(result) > 0
assert "Ragdoll" in result
def test_vector_search_typo(self, temp_vectordb):
"""Test vector search handles typos."""
valid_breeds = ["Siamese"]
# Typo: "siames"
result = normalize_user_breeds(
["siames"],
valid_breeds,
vectordb=temp_vectordb,
source="petfinder",
similarity_threshold=0.6
)
# Vector search should find Siamese
if len(result) > 0:
assert "Siamese" in result
def test_dictionary_priority(self, temp_vectordb):
"""Test that dictionary matches are prioritized over vector search."""
valid_breeds = ["Maine Coon"]
# "main coon" is in dictionary
result = normalize_user_breeds(
["main coon"],
valid_breeds,
vectordb=temp_vectordb,
source="petfinder"
)
# Should use dictionary match
assert "Maine Coon" in result
def test_case_insensitive(self):
"""Test case-insensitive matching."""
valid_breeds = ["Maine Coon"]
result_lower = normalize_user_breeds(["maine"], valid_breeds)
result_upper = normalize_user_breeds(["MAINE"], valid_breeds)
result_mixed = normalize_user_breeds(["MaInE"], valid_breeds)
assert result_lower == result_upper == result_mixed
def test_domestic_variations(self):
"""Test that DSH/DMH/DLH map correctly."""
valid_breeds = [
"Domestic Short Hair",
"Domestic Medium Hair",
"Domestic Long Hair"
]
result_dsh = normalize_user_breeds(["dsh"], valid_breeds)
result_dmh = normalize_user_breeds(["dmh"], valid_breeds)
result_dlh = normalize_user_breeds(["dlh"], valid_breeds)
assert "Domestic Short Hair" in result_dsh
assert "Domestic Medium Hair" in result_dmh
assert "Domestic Long Hair" in result_dlh
def test_tabby_is_not_breed(self):
"""Test that 'tabby' maps to Domestic Short Hair (tabby is a pattern, not breed)."""
valid_breeds = ["Domestic Short Hair", "Siamese"]
result = normalize_user_breeds(["tabby"], valid_breeds)
assert len(result) > 0
assert "Domestic Short Hair" in result
def test_get_breed_suggestions(self):
"""Test breed suggestions function."""
valid_breeds = [
"British Shorthair",
"American Shorthair",
"Domestic Short Hair"
]
suggestions = get_breed_suggestions("short", valid_breeds, top_n=3)
assert len(suggestions) == 3
assert all("Short" in s for s in suggestions)
def test_all_dictionary_mappings(self):
"""Test that all dictionary mappings are correctly defined."""
# Verify structure of USER_TERM_TO_API_BREED
assert isinstance(USER_TERM_TO_API_BREED, dict)
for user_term, api_breeds in USER_TERM_TO_API_BREED.items():
assert isinstance(user_term, str)
assert isinstance(api_breeds, list)
assert len(api_breeds) > 0
assert all(isinstance(b, str) for b in api_breeds)
def test_whitespace_handling(self):
"""Test handling of whitespace in user input."""
valid_breeds = ["Maine Coon"]
result1 = normalize_user_breeds([" maine "], valid_breeds)
result2 = normalize_user_breeds(["maine"], valid_breeds)
assert result1 == result2
def test_norwegian_forest_variations(self):
"""Test Norwegian Forest Cat variations."""
valid_breeds = ["Norwegian Forest Cat"]
result1 = normalize_user_breeds(["norwegian forest"], valid_breeds)
result2 = normalize_user_breeds(["norwegian forest cat"], valid_breeds)
assert "Norwegian Forest Cat" in result1
assert "Norwegian Forest Cat" in result2
def test_similarity_threshold(self, temp_vectordb):
"""Test that similarity threshold works."""
valid_breeds = ["Siamese"]
# Very different term
result_high = normalize_user_breeds(
["abcxyz"],
valid_breeds,
vectordb=temp_vectordb,
source="petfinder",
similarity_threshold=0.9 # High threshold
)
result_low = normalize_user_breeds(
["abcxyz"],
valid_breeds,
vectordb=temp_vectordb,
source="petfinder",
similarity_threshold=0.1 # Low threshold
)
# High threshold should reject poor matches
# Low threshold may accept them
assert len(result_high) <= len(result_low)
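
The behavior pinned down by these tests can be sketched as a dictionary-first, tiered lookup. This is a simplified illustration, not the project's actual `normalize_user_breeds` (the real version also consults a vector DB between exact and substring matching); `TERM_TO_BREED` and its entries stand in for `USER_TERM_TO_API_BREED`:

```python
# Illustrative subset; the real USER_TERM_TO_API_BREED is larger.
TERM_TO_BREED = {
    "main coon": ["Maine Coon"],
    "sphinx": ["Sphynx"],
    "tabby": ["Domestic Short Hair"],  # tabby is a coat pattern, not a breed
}

def normalize(terms, valid_breeds):
    matched = []
    for term in terms:
        term = term.strip().lower()
        if not term:
            continue
        if term in TERM_TO_BREED:  # tier 1: curated dictionary
            matched += [b for b in TERM_TO_BREED[term] if b in valid_breeds]
        else:
            # tier 2: exact match, falling back to tier 3: substring match
            exact = [b for b in valid_breeds if b.lower() == term]
            matched += exact or [b for b in valid_breeds if term in b.lower()]
    # de-duplicate while preserving order
    return list(dict.fromkeys(matched))

assert normalize(["main coon"], ["Maine Coon", "Siamese"]) == ["Maine Coon"]
assert normalize(["shorthair"], ["British Shorthair"]) == ["British Shorthair"]
assert normalize([""], ["Siamese"]) == []
```

The dictionary tier runs first, which is exactly what `test_dictionary_priority` asserts: a curated mapping beats fuzzy vector search when both would match.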

@@ -1,225 +0,0 @@
"""Unit tests for color mapping utilities."""
import pytest
import tempfile
import shutil
from utils.color_mapping import (
normalize_user_colors,
get_color_suggestions,
USER_TERM_TO_API_COLOR
)
from setup_metadata_vectordb import MetadataVectorDB
@pytest.fixture
def temp_vectordb():
"""Create a temporary metadata vector database with colors indexed."""
temp_dir = tempfile.mkdtemp()
vectordb = MetadataVectorDB(persist_directory=temp_dir)
# Index some test colors
test_colors = [
"Black",
"White",
"Black & White / Tuxedo",
"Orange / Red",
"Gray / Blue / Silver",
"Calico",
"Tabby (Brown / Chocolate)"
]
vectordb.index_colors(test_colors, source="petfinder")
yield vectordb
# Cleanup
shutil.rmtree(temp_dir, ignore_errors=True)
class TestColorMapping:
"""Tests for color mapping functions."""
def test_dictionary_match_tuxedo(self):
"""Test dictionary mapping for 'tuxedo'."""
valid_colors = ["Black", "White", "Black & White / Tuxedo"]
result = normalize_user_colors(["tuxedo"], valid_colors)
assert len(result) > 0
assert "Black & White / Tuxedo" in result
assert "Black" not in result # Should NOT map to separate colors
def test_dictionary_match_orange(self):
"""Test dictionary mapping for 'orange'."""
valid_colors = ["Orange / Red", "White"]
result = normalize_user_colors(["orange"], valid_colors)
assert len(result) == 1
assert "Orange / Red" in result
def test_dictionary_match_gray_variations(self):
"""Test dictionary mapping for gray/grey."""
valid_colors = ["Gray / Blue / Silver", "White"]
result_gray = normalize_user_colors(["gray"], valid_colors)
result_grey = normalize_user_colors(["grey"], valid_colors)
assert result_gray == result_grey
assert "Gray / Blue / Silver" in result_gray
def test_multiple_colors(self):
"""Test mapping multiple color terms."""
valid_colors = [
"Black & White / Tuxedo",
"Orange / Red",
"Calico"
]
result = normalize_user_colors(
["tuxedo", "orange", "calico"],
valid_colors
)
assert len(result) == 3
assert "Black & White / Tuxedo" in result
assert "Orange / Red" in result
assert "Calico" in result
def test_exact_match_fallback(self):
"""Test exact match when not in dictionary."""
valid_colors = ["Black", "White", "Calico"]
# "Calico" should match exactly
result = normalize_user_colors(["calico"], valid_colors)
assert len(result) == 1
assert "Calico" in result
def test_substring_match_fallback(self):
"""Test substring matching as last resort."""
valid_colors = ["Tabby (Brown / Chocolate)", "Tabby (Orange / Red)"]
# "tabby" should match both tabby colors
result = normalize_user_colors(["tabby"], valid_colors)
assert len(result) >= 1
assert any("Tabby" in color for color in result)
def test_no_match(self):
"""Test when no match is found."""
valid_colors = ["Black", "White"]
result = normalize_user_colors(["invalid_color_xyz"], valid_colors)
# Should return empty list
assert len(result) == 0
def test_empty_input(self):
"""Test with empty input."""
valid_colors = ["Black", "White"]
result = normalize_user_colors([], valid_colors)
assert len(result) == 0
result = normalize_user_colors([""], valid_colors)
assert len(result) == 0
def test_with_vectordb(self, temp_vectordb):
"""Test with vector DB for fuzzy matching."""
valid_colors = [
"Black & White / Tuxedo",
"Orange / Red",
"Gray / Blue / Silver"
]
# Test with typo (with lower threshold to demonstrate fuzzy matching)
result = normalize_user_colors(
["tuxado"], # Typo
valid_colors,
vectordb=temp_vectordb,
source="petfinder",
similarity_threshold=0.3 # Lower threshold for typos
)
# With lower threshold, may find a match (not guaranteed for all typos)
# The main point is that it doesn't crash and handles typos gracefully
assert isinstance(result, list) # Returns a list (may be empty)
def test_vector_search_typo(self, temp_vectordb):
"""Test vector search handles typos."""
valid_colors = ["Gray / Blue / Silver"]
# Typo: "grey" is in dictionary but "gery" is not
result = normalize_user_colors(
["gery"], # Typo
valid_colors,
vectordb=temp_vectordb,
source="petfinder",
similarity_threshold=0.6 # Lower threshold for typos
)
# Vector search should find gray
# Note: May not always work for severe typos
if len(result) > 0:
assert "Gray" in result[0] or "Blue" in result[0] or "Silver" in result[0]
def test_dictionary_priority(self, temp_vectordb):
"""Test that dictionary matches are prioritized over vector search."""
valid_colors = ["Black & White / Tuxedo", "Black"]
# "tuxedo" is in dictionary
result = normalize_user_colors(
["tuxedo"],
valid_colors,
vectordb=temp_vectordb,
source="petfinder"
)
# Should use dictionary match
assert "Black & White / Tuxedo" in result
assert "Black" not in result # Should not be separate
def test_case_insensitive(self):
"""Test case-insensitive matching."""
valid_colors = ["Black & White / Tuxedo"]
result_lower = normalize_user_colors(["tuxedo"], valid_colors)
result_upper = normalize_user_colors(["TUXEDO"], valid_colors)
result_mixed = normalize_user_colors(["TuXeDo"], valid_colors)
assert result_lower == result_upper == result_mixed
def test_get_color_suggestions(self):
"""Test color suggestions function."""
valid_colors = [
"Tabby (Brown / Chocolate)",
"Tabby (Orange / Red)",
"Tabby (Gray / Blue / Silver)"
]
suggestions = get_color_suggestions("tab", valid_colors, top_n=3)
assert len(suggestions) == 3
assert all("Tabby" in s for s in suggestions)
def test_all_dictionary_mappings(self):
"""Test that all dictionary mappings are correctly defined."""
# Verify structure of USER_TERM_TO_API_COLOR
assert isinstance(USER_TERM_TO_API_COLOR, dict)
for user_term, api_colors in USER_TERM_TO_API_COLOR.items():
assert isinstance(user_term, str)
assert isinstance(api_colors, list)
assert len(api_colors) > 0
assert all(isinstance(c, str) for c in api_colors)
def test_whitespace_handling(self):
"""Test handling of whitespace in user input."""
valid_colors = ["Black & White / Tuxedo"]
result1 = normalize_user_colors([" tuxedo "], valid_colors)
result2 = normalize_user_colors(["tuxedo"], valid_colors)
assert result1 == result2
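
`get_color_suggestions` is exercised above only through its substring behavior. A plausible minimal implementation consistent with those assertions (hypothetical, not the project's code) ranks matches by where the fragment occurs in the color name:

```python
def get_color_suggestions(partial, valid_colors, top_n=5):
    """Return up to top_n colors containing `partial`, earliest match first."""
    needle = partial.strip().lower()
    hits = [c for c in valid_colors if needle in c.lower()]
    return sorted(hits, key=lambda c: c.lower().index(needle))[:top_n]

suggestions = get_color_suggestions(
    "tab",
    ["Tabby (Brown / Chocolate)", "Tabby (Orange / Red)", "Black"],
    top_n=3,
)
assert len(suggestions) == 2          # "Black" does not contain "tab"
assert all("Tabby" in s for s in suggestions)
```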

@@ -1,235 +0,0 @@
"""Fixed unit tests for database manager."""
import pytest
from models.cats import Cat, CatProfile, AdoptionAlert
class TestDatabaseInitialization:
"""Tests for database initialization."""
def test_database_creation(self, temp_db):
"""Test that database is created with tables."""
assert temp_db.db_path.endswith('.db')
# Check that tables exist
with temp_db.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
)
tables = {row['name'] for row in cursor.fetchall()}
assert 'alerts' in tables
assert 'cats_cache' in tables
def test_get_connection(self, temp_db):
"""Test database connection."""
with temp_db.get_connection() as conn:
assert conn is not None
cursor = conn.cursor()
cursor.execute("SELECT 1")
assert cursor.fetchone()[0] == 1
class TestCatCaching:
"""Tests for cat caching operations."""
def test_cache_cat(self, temp_db, sample_cat_data):
"""Test caching a cat."""
from utils.deduplication import create_fingerprint
cat = Cat(**sample_cat_data)
cat.fingerprint = create_fingerprint(cat) # Generate fingerprint
temp_db.cache_cat(cat, None)
# Verify cat was cached
cats = temp_db.get_all_cached_cats()
assert len(cats) == 1
assert cats[0].name == "Test Cat"
def test_cache_cat_with_embedding(self, temp_db, sample_cat_data):
"""Test caching a cat with image embedding."""
import numpy as np
from utils.deduplication import create_fingerprint
cat = Cat(**sample_cat_data)
cat.fingerprint = create_fingerprint(cat) # Generate fingerprint
embedding = np.array([0.1, 0.2, 0.3], dtype=np.float32)
temp_db.cache_cat(cat, embedding)
# Verify embedding was saved
with temp_db.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"SELECT image_embedding FROM cats_cache WHERE id = ?",
(cat.id,)
)
row = cursor.fetchone()
assert row['image_embedding'] is not None
def test_get_cats_by_fingerprint(self, temp_db):
"""Test retrieving cats by fingerprint."""
cat1 = Cat(
id="test1",
name="Cat 1",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Test City",
state="TS",
source="test",
organization_name="Test Rescue",
url="https://example.com/cat/test1",
fingerprint="test_fingerprint"
)
cat2 = Cat(
id="test2",
name="Cat 2",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Test City",
state="TS",
source="test",
organization_name="Test Rescue",
url="https://example.com/cat/test2",
fingerprint="test_fingerprint"
)
temp_db.cache_cat(cat1, None)
temp_db.cache_cat(cat2, None)
results = temp_db.get_cats_by_fingerprint("test_fingerprint")
assert len(results) == 2
def test_mark_as_duplicate(self, temp_db):
"""Test marking a cat as duplicate."""
from utils.deduplication import create_fingerprint
cat1 = Cat(
id="original",
name="Original",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Test City",
state="TS",
source="test",
organization_name="Test Rescue",
url="https://example.com/cat/original"
)
cat1.fingerprint = create_fingerprint(cat1)
cat2 = Cat(
id="duplicate",
name="Duplicate",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Test City",
state="TS",
source="test",
organization_name="Test Rescue",
url="https://example.com/cat/duplicate"
)
cat2.fingerprint = create_fingerprint(cat2)
temp_db.cache_cat(cat1, None)
temp_db.cache_cat(cat2, None)
temp_db.mark_as_duplicate("duplicate", "original")
# Check duplicate is marked
with temp_db.get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
"SELECT is_duplicate, duplicate_of FROM cats_cache WHERE id = ?",
("duplicate",)
)
row = cursor.fetchone()
assert row['is_duplicate'] == 1
assert row['duplicate_of'] == "original"
def test_get_cache_stats(self, temp_db):
"""Test getting cache statistics."""
from utils.deduplication import create_fingerprint
cat1 = Cat(
id="test1",
name="Cat 1",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Test City",
state="TS",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/test1"
)
cat1.fingerprint = create_fingerprint(cat1)
cat2 = Cat(
id="test2",
name="Cat 2",
breed="Siamese",
age="young",
gender="male",
size="small",
city="Test City",
state="TS",
source="rescuegroups",
organization_name="Other Rescue",
url="https://example.com/cat/test2"
)
cat2.fingerprint = create_fingerprint(cat2)
temp_db.cache_cat(cat1, None)
temp_db.cache_cat(cat2, None)
stats = temp_db.get_cache_stats()
assert stats['total_unique'] == 2
assert stats['sources'] == 2
assert 'petfinder' in stats['by_source']
assert 'rescuegroups' in stats['by_source']
class TestAlertManagement:
"""Tests for alert management operations."""
def test_create_alert(self, temp_db):
"""Test creating an alert."""
profile = CatProfile(user_location="10001")
alert = AdoptionAlert(
user_email="test@example.com",
profile=profile,
frequency="daily"
)
alert_id = temp_db.create_alert(alert)
assert alert_id is not None
assert alert_id > 0
def test_get_alerts_by_email(self, temp_db):
"""Test retrieving alerts by email."""
profile = CatProfile(user_location="10001")
alert = AdoptionAlert(
user_email="test@example.com",
profile=profile,
frequency="daily"
)
temp_db.create_alert(alert)
alerts = temp_db.get_alerts_by_email("test@example.com")
assert len(alerts) > 0
assert alerts[0].user_email == "test@example.com"
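
The duplicate-marking contract tested above reduces to two columns on the cache table. A self-contained sqlite3 sketch of that behavior (the schema here is a hypothetical minimum, not the real `cats_cache` definition):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # dict-style access, as the tests use
conn.execute("""
    CREATE TABLE cats_cache (
        id           TEXT PRIMARY KEY,
        fingerprint  TEXT,
        is_duplicate INTEGER DEFAULT 0,
        duplicate_of TEXT
    )
""")
conn.executemany(
    "INSERT INTO cats_cache (id, fingerprint) VALUES (?, ?)",
    [("original", "fp1"), ("duplicate", "fp1")],
)

# mark_as_duplicate(dup_id, original_id) boils down to one UPDATE
conn.execute(
    "UPDATE cats_cache SET is_duplicate = 1, duplicate_of = ? WHERE id = ?",
    ("original", "duplicate"),
)

row = conn.execute(
    "SELECT is_duplicate, duplicate_of FROM cats_cache WHERE id = ?",
    ("duplicate",),
).fetchone()
assert row["is_duplicate"] == 1
assert row["duplicate_of"] == "original"
```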

@@ -1,278 +0,0 @@
"""Fixed unit tests for deduplication utilities."""
import pytest
from models.cats import Cat
from utils.deduplication import create_fingerprint, calculate_levenshtein_similarity, calculate_composite_score
class TestFingerprinting:
"""Tests for fingerprint generation."""
def test_fingerprint_basic(self):
"""Test basic fingerprint generation."""
cat = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Happy Paws Rescue",
url="https://example.com/cat/12345"
)
fingerprint = create_fingerprint(cat)
assert fingerprint is not None
assert isinstance(fingerprint, str)
# Fingerprint is a hash, so just verify it's a 16-character hex string
assert len(fingerprint) == 16
assert all(c in '0123456789abcdef' for c in fingerprint)
def test_fingerprint_consistency(self):
"""Test that same cat produces same fingerprint."""
cat1 = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Happy Paws",
url="https://example.com/cat/12345"
)
cat2 = Cat(
id="67890",
name="Fluffy McGee", # Different name
breed="Persian",
age="adult",
gender="female",
size="medium",
city="Boston", # Different city
state="MA",
source="rescuegroups", # Different source
organization_name="Happy Paws",
url="https://example.com/cat/67890"
)
# Should have same fingerprint (stable attributes match)
assert create_fingerprint(cat1) == create_fingerprint(cat2)
def test_fingerprint_difference(self):
"""Test that different cats produce different fingerprints."""
cat1 = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Happy Paws",
url="https://example.com/cat/12345"
)
cat2 = Cat(
id="67890",
name="Fluffy",
breed="Persian",
age="young", # Different age
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Happy Paws",
url="https://example.com/cat/67890"
)
# Should have different fingerprints
assert create_fingerprint(cat1) != create_fingerprint(cat2)
class TestLevenshteinSimilarity:
"""Tests for Levenshtein similarity calculation."""
def test_identical_strings(self):
"""Test identical strings return 1.0."""
similarity = calculate_levenshtein_similarity("Fluffy", "Fluffy")
assert similarity == 1.0
def test_completely_different_strings(self):
"""Test completely different strings return low score."""
similarity = calculate_levenshtein_similarity("Fluffy", "12345")
assert similarity < 0.2
def test_similar_strings(self):
"""Test similar strings return high score."""
similarity = calculate_levenshtein_similarity("Fluffy", "Fluffy2")
assert similarity > 0.8
def test_case_insensitive(self):
"""Test that comparison is case-insensitive."""
similarity = calculate_levenshtein_similarity("Fluffy", "fluffy")
assert similarity == 1.0
def test_empty_strings(self):
"""Test empty strings - both empty is 0.0 similarity."""
similarity = calculate_levenshtein_similarity("", "")
assert similarity == 0.0 # Empty strings return 0.0 in implementation
similarity = calculate_levenshtein_similarity("Fluffy", "")
assert similarity == 0.0
class TestCompositeScore:
"""Tests for composite score calculation."""
def test_composite_score_all_high(self):
"""Test composite score when all similarities are high."""
score = calculate_composite_score(
name_similarity=0.9,
description_similarity=0.9,
image_similarity=0.9,
name_weight=0.4,
description_weight=0.3,
image_weight=0.3
)
assert score > 0.85
assert score <= 1.0
def test_composite_score_weighted(self):
"""Test that weights affect composite score correctly."""
# Name has 100% weight
score = calculate_composite_score(
name_similarity=0.5,
description_similarity=1.0,
image_similarity=1.0,
name_weight=1.0,
description_weight=0.0,
image_weight=0.0
)
assert score == 0.5
def test_composite_score_zero_image(self):
"""Test composite score when no image similarity."""
score = calculate_composite_score(
name_similarity=0.9,
description_similarity=0.9,
image_similarity=0.0,
name_weight=0.4,
description_weight=0.3,
image_weight=0.3
)
# Should still compute based on name and description
assert score > 0.5
assert score < 0.9
def test_composite_score_bounds(self):
"""Test that composite score is always between 0 and 1."""
score = calculate_composite_score(
name_similarity=1.0,
description_similarity=1.0,
image_similarity=1.0,
name_weight=0.4,
description_weight=0.3,
image_weight=0.3
)
assert 0.0 <= score <= 1.0
class TestTextSimilarity:
"""Integration tests for text similarity (name + description)."""
def test_similar_cats_high_score(self):
"""Test that similar cats get high similarity scores."""
cat1 = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/12345",
description="A very friendly and playful cat that loves to cuddle"
)
cat2 = Cat(
id="67890",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="rescuegroups",
organization_name="Test Rescue",
url="https://example.com/cat/67890",
description="Very friendly playful cat who loves cuddling"
)
name_sim = calculate_levenshtein_similarity(cat1.name, cat2.name)
desc_sim = calculate_levenshtein_similarity(
cat1.description or "",
cat2.description or ""
)
assert name_sim == 1.0
assert desc_sim > 0.7
def test_different_cats_low_score(self):
"""Test that different cats get low similarity scores."""
cat1 = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/12345",
description="Playful kitten"
)
cat2 = Cat(
id="67890",
name="Rex",
breed="Siamese",
age="young",
gender="male",
size="large",
city="Boston",
state="MA",
source="rescuegroups",
organization_name="Other Rescue",
url="https://example.com/cat/67890",
description="Calm senior cat"
)
name_sim = calculate_levenshtein_similarity(cat1.name, cat2.name)
desc_sim = calculate_levenshtein_similarity(
cat1.description or "",
cat2.description or ""
)
assert name_sim < 0.3
assert desc_sim < 0.5
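
The composite-score assertions above fix the formula: a convex combination of the three similarities. A sketch consistent with every tested bound (hedged; the real `calculate_composite_score` may additionally clamp or renormalize weights):

```python
def composite_score(name_sim, desc_sim, img_sim,
                    name_w=0.4, desc_w=0.3, img_w=0.3):
    # Weighted sum; callers are expected to pass weights summing to 1.0.
    return name_w * name_sim + desc_w * desc_sim + img_w * img_sim

# Name carrying 100% of the weight reproduces the name similarity exactly.
assert composite_score(0.5, 1.0, 1.0, name_w=1.0, desc_w=0.0, img_w=0.0) == 0.5
# Equal inputs reproduce that value under any weights summing to 1.
assert abs(composite_score(0.9, 0.9, 0.9) - 0.9) < 1e-9
assert 0.0 <= composite_score(1.0, 1.0, 1.0) <= 1.0
```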

@@ -1,235 +0,0 @@
"""Unit tests for email providers."""
import pytest
from unittest.mock import patch, MagicMock
from agents.email_providers import (
EmailProvider,
MailgunProvider,
SendGridProvider,
get_email_provider
)
class TestMailgunProvider:
"""Tests for Mailgun email provider."""
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_init(self, mock_email_config, mock_mailgun_config):
"""Test Mailgun provider initialization."""
mock_mailgun_config.return_value = {
'domain': 'test.mailgun.org'
}
mock_email_config.return_value = {
'from_name': 'Test App',
'from_email': 'test@test.com'
}
provider = MailgunProvider()
assert provider.api_key == 'test-api-key'
assert provider.domain == 'test.mailgun.org'
assert provider.default_from_name == 'Test App'
assert provider.default_from_email == 'test@test.com'
@patch.dict('os.environ', {})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_init_missing_api_key(self, mock_email_config, mock_mailgun_config):
"""Test that initialization fails without API key."""
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
with pytest.raises(ValueError, match="MAILGUN_API_KEY"):
MailgunProvider()
@patch('agents.email_providers.mailgun_provider.requests.post')
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_send_email_success(self, mock_email_config, mock_mailgun_config, mock_post):
"""Test successful email sending."""
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test App',
'from_email': 'test@test.com'
}
# Mock successful response
mock_response = MagicMock()
mock_response.status_code = 200
mock_post.return_value = mock_response
provider = MailgunProvider()
result = provider.send_email(
to="recipient@test.com",
subject="Test Subject",
html="<p>Test HTML</p>",
text="Test Text"
)
assert result is True
mock_post.assert_called_once()
# Check request parameters
call_args = mock_post.call_args
assert call_args[1]['auth'] == ('api', 'test-api-key')
assert call_args[1]['data']['to'] == 'recipient@test.com'
assert call_args[1]['data']['subject'] == 'Test Subject'
@patch('agents.email_providers.mailgun_provider.requests.post')
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_send_email_failure(self, mock_email_config, mock_mailgun_config, mock_post):
"""Test email sending failure."""
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test App',
'from_email': 'test@test.com'
}
# Mock failed response
mock_response = MagicMock()
mock_response.status_code = 400
mock_response.text = "Bad Request"
mock_post.return_value = mock_response
provider = MailgunProvider()
result = provider.send_email(
to="recipient@test.com",
subject="Test",
html="<p>Test</p>",
text="Test"
)
assert result is False
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_get_provider_name(self, mock_email_config, mock_mailgun_config):
"""Test provider name."""
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = MailgunProvider()
assert provider.get_provider_name() == "mailgun"
class TestSendGridProvider:
"""Tests for SendGrid email provider (stub)."""
@patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
@patch('agents.email_providers.sendgrid_provider.get_email_config')
def test_init(self, mock_email_config):
"""Test SendGrid provider initialization."""
mock_email_config.return_value = {
'from_name': 'Test App',
'from_email': 'test@test.com'
}
provider = SendGridProvider()
assert provider.api_key == 'test-api-key'
assert provider.default_from_name == 'Test App'
assert provider.default_from_email == 'test@test.com'
@patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
@patch('agents.email_providers.sendgrid_provider.get_email_config')
def test_send_email_stub(self, mock_email_config):
"""Test that SendGrid stub always succeeds."""
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = SendGridProvider()
result = provider.send_email(
to="test@test.com",
subject="Test",
html="<p>Test</p>",
text="Test"
)
# Stub should always return True
assert result is True
@patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
@patch('agents.email_providers.sendgrid_provider.get_email_config')
def test_get_provider_name(self, mock_email_config):
"""Test provider name."""
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = SendGridProvider()
assert provider.get_provider_name() == "sendgrid (stub)"
class TestEmailProviderFactory:
"""Tests for email provider factory."""
@patch('agents.email_providers.factory.get_configured_provider')
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_get_mailgun_provider(self, mock_email_config, mock_mailgun_config, mock_get_configured):
"""Test getting Mailgun provider."""
mock_get_configured.return_value = 'mailgun'
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = get_email_provider()
assert isinstance(provider, MailgunProvider)
@patch('agents.email_providers.factory.get_configured_provider')
@patch.dict('os.environ', {})
@patch('agents.email_providers.sendgrid_provider.get_email_config')
def test_get_sendgrid_provider(self, mock_email_config, mock_get_configured):
"""Test getting SendGrid provider."""
mock_get_configured.return_value = 'sendgrid'
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = get_email_provider()
assert isinstance(provider, SendGridProvider)
@patch('agents.email_providers.factory.get_configured_provider')
def test_unknown_provider(self, mock_get_configured):
"""Test that unknown provider raises error."""
mock_get_configured.return_value = 'unknown'
with pytest.raises(ValueError, match="Unknown email provider"):
get_email_provider()
@patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-key'})
@patch('agents.email_providers.mailgun_provider.get_mailgun_config')
@patch('agents.email_providers.mailgun_provider.get_email_config')
def test_explicit_provider_name(self, mock_email_config, mock_mailgun_config):
"""Test explicitly specifying provider name."""
mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
mock_email_config.return_value = {
'from_name': 'Test',
'from_email': 'test@test.com'
}
provider = get_email_provider('mailgun')
assert isinstance(provider, MailgunProvider)
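
The stacked `@patch` decorators in these tests are easy to misread: decorators apply bottom-up, so the mock for the decorator closest to the function arrives as the first argument. A minimal illustration (the patch targets here are stdlib functions chosen only for the demo):

```python
from unittest.mock import patch

@patch("os.getcwd")          # outermost decorator -> last mock argument
@patch("os.path.exists")     # innermost decorator -> first mock argument
def demo(mock_exists, mock_getcwd):
    import os
    mock_exists.return_value = True
    mock_getcwd.return_value = "/fake"
    return os.path.exists("anything"), os.getcwd()

assert demo() == (True, "/fake")
```

This is why `test_send_email_success` lists `mock_email_config` first even though `@patch(...requests.post)` appears at the top of the decorator stack.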

@@ -1,154 +0,0 @@
"""Unit tests for metadata vector database."""
import pytest
import tempfile
import shutil
from pathlib import Path
from setup_metadata_vectordb import MetadataVectorDB
@pytest.fixture
def temp_vectordb():
"""Create a temporary metadata vector database."""
temp_dir = tempfile.mkdtemp()
vectordb = MetadataVectorDB(persist_directory=temp_dir)
yield vectordb
# Cleanup
shutil.rmtree(temp_dir, ignore_errors=True)
class TestMetadataVectorDB:
"""Tests for MetadataVectorDB class."""
def test_initialization(self, temp_vectordb):
"""Test vector DB initializes correctly."""
assert temp_vectordb is not None
assert temp_vectordb.colors_collection is not None
assert temp_vectordb.breeds_collection is not None
def test_index_colors(self, temp_vectordb):
"""Test indexing colors."""
colors = ["Black", "White", "Black & White / Tuxedo", "Orange / Red"]
temp_vectordb.index_colors(colors, source="petfinder")
# Check indexed
stats = temp_vectordb.get_stats()
assert stats['colors_count'] == len(colors)
# Should not re-index same source
temp_vectordb.index_colors(colors, source="petfinder")
stats = temp_vectordb.get_stats()
assert stats['colors_count'] == len(colors) # Should not double
def test_index_breeds(self, temp_vectordb):
"""Test indexing breeds."""
breeds = ["Siamese", "Persian", "Maine Coon", "Bengal"]
temp_vectordb.index_breeds(breeds, source="petfinder")
# Check indexed
stats = temp_vectordb.get_stats()
assert stats['breeds_count'] == len(breeds)
def test_search_color_exact(self, temp_vectordb):
"""Test searching for exact color match."""
colors = ["Black", "White", "Black & White / Tuxedo"]
temp_vectordb.index_colors(colors, source="petfinder")
# Search for exact match
results = temp_vectordb.search_color("tuxedo", source_filter="petfinder")
assert len(results) > 0
assert results[0]['color'] == "Black & White / Tuxedo"
assert results[0]['similarity'] > 0.5 # Should be reasonable similarity
def test_search_color_fuzzy(self, temp_vectordb):
"""Test searching for color with typo."""
colors = ["Black & White / Tuxedo", "Orange / Red", "Gray / Blue / Silver"]
temp_vectordb.index_colors(colors, source="petfinder")
# Search with typo
results = temp_vectordb.search_color("tuxado", source_filter="petfinder") # typo: tuxado
assert len(results) > 0
# Should still find tuxedo
assert "Tuxedo" in results[0]['color'] or "tuxado" in results[0]['color'].lower()
def test_search_breed_exact(self, temp_vectordb):
"""Test searching for exact breed match."""
breeds = ["Siamese", "Persian", "Maine Coon"]
temp_vectordb.index_breeds(breeds, source="petfinder")
results = temp_vectordb.search_breed("siamese", source_filter="petfinder")
assert len(results) > 0
assert results[0]['breed'] == "Siamese"
assert results[0]['similarity'] > 0.9 # Should be very high for exact match
def test_search_breed_fuzzy(self, temp_vectordb):
"""Test searching for breed with typo."""
breeds = ["Maine Coon", "Ragdoll", "British Shorthair"]
temp_vectordb.index_breeds(breeds, source="petfinder")
# Typo: "main coon" instead of "Maine Coon"
results = temp_vectordb.search_breed("main coon", source_filter="petfinder")
assert len(results) > 0
assert "Maine" in results[0]['breed'] or "Coon" in results[0]['breed']
def test_multiple_sources(self, temp_vectordb):
"""Test indexing from multiple sources."""
petfinder_colors = ["Black", "White", "Tabby"]
rescuegroups_colors = ["Black", "Grey", "Calico"]
temp_vectordb.index_colors(petfinder_colors, source="petfinder")
temp_vectordb.index_colors(rescuegroups_colors, source="rescuegroups")
# Should have both indexed
stats = temp_vectordb.get_stats()
assert stats['colors_count'] == len(petfinder_colors) + len(rescuegroups_colors)
# Search with source filter
results = temp_vectordb.search_color("black", source_filter="petfinder")
assert len(results) > 0
assert results[0]['source'] == "petfinder"
def test_empty_search(self, temp_vectordb):
"""Test searching with empty string."""
colors = ["Black", "White"]
temp_vectordb.index_colors(colors, source="petfinder")
results = temp_vectordb.search_color("", source_filter="petfinder")
assert len(results) == 0
results = temp_vectordb.search_color(None, source_filter="petfinder")
assert len(results) == 0
def test_no_match(self, temp_vectordb):
"""Test search that returns no good matches."""
colors = ["Black", "White"]
temp_vectordb.index_colors(colors, source="petfinder")
# Search for something very different
results = temp_vectordb.search_color("xyzabc123", source_filter="petfinder")
# Will return something (nearest neighbor) but with low similarity
if len(results) > 0:
assert results[0]['similarity'] < 0.5 # Low similarity
def test_n_results(self, temp_vectordb):
"""Test returning multiple results."""
colors = ["Black", "White", "Black & White / Tuxedo", "Gray / Blue / Silver"]
temp_vectordb.index_colors(colors, source="petfinder")
# Get top 3 results
results = temp_vectordb.search_color("black", n_results=3, source_filter="petfinder")
assert len(results) <= 3
# First should be best match
assert "Black" in results[0]['color']
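The typo tolerance these tests exercise ("tuxado" → "Black & White / Tuxedo") comes from embedding similarity in ChromaDB. As a rough stdlib-only illustration of the same idea — substituting `difflib.SequenceMatcher` character ratios for real embeddings, so the scores are not comparable — a fuzzy color lookup can be sketched as:

```python
from difflib import SequenceMatcher

# Illustrative subset of Petfinder color values, not the full API list.
COLORS = ["Black", "White", "Black & White / Tuxedo", "Gray / Blue / Silver"]

def fuzzy_color(query: str, colors=COLORS):
    """Return (score, color) for the color most similar to the query."""
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in colors]
    return max(scored)  # tuple comparison picks the highest ratio

score, best = fuzzy_color("tuxado")  # typo for "tuxedo"
print(best)   # → Black & White / Tuxedo
```

Real embedding search handles semantic matches (e.g. "ginger" → "Orange / Red") that character-level ratios cannot.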

View File

@@ -1,186 +0,0 @@
"""Fixed unit tests for data models."""
import pytest
from datetime import datetime
from models.cats import Cat, CatProfile, CatMatch, AdoptionAlert, SearchResult
class TestCat:
"""Tests for Cat model."""
def test_cat_creation(self):
"""Test basic cat creation."""
cat = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/12345"
)
assert cat.name == "Fluffy"
assert cat.breed == "Persian"
assert cat.age == "adult"
assert cat.gender == "female"
assert cat.size == "medium"
assert cat.organization_name == "Test Rescue"
def test_cat_with_optional_fields(self):
"""Test cat with all optional fields."""
cat = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/12345",
description="Very fluffy",
primary_photo="http://example.com/photo.jpg",
adoption_fee=150.00,
good_with_children=True,
good_with_dogs=False,
good_with_cats=True
)
assert cat.description == "Very fluffy"
assert cat.adoption_fee == 150.00
assert cat.good_with_children is True
def test_cat_from_json(self):
"""Test cat deserialization from JSON."""
json_data = """
{
"id": "12345",
"name": "Fluffy",
"breed": "Persian",
"age": "adult",
"gender": "female",
"size": "medium",
"city": "New York",
"state": "NY",
"source": "petfinder",
"organization_name": "Test Rescue",
"url": "https://example.com/cat/12345"
}
"""
cat = Cat.model_validate_json(json_data)
assert cat.name == "Fluffy"
assert cat.id == "12345"
class TestCatProfile:
"""Tests for CatProfile model."""
def test_profile_creation_minimal(self):
"""Test profile with minimal fields."""
profile = CatProfile()
assert profile.personality_description == "" # Defaults to empty string
assert profile.max_distance == 100
assert profile.age_range is None # No default
def test_profile_creation_full(self):
"""Test profile with all fields."""
profile = CatProfile(
user_location="10001",
max_distance=50,
personality_description="friendly and playful",
age_range=["young", "adult"],
size=["small", "medium"],
good_with_children=True,
good_with_dogs=True,
good_with_cats=False
)
assert profile.user_location == "10001"
assert profile.max_distance == 50
assert "young" in profile.age_range
assert profile.good_with_children is True
class TestCatMatch:
"""Tests for CatMatch model."""
def test_match_creation(self):
"""Test match creation."""
cat = Cat(
id="12345",
name="Fluffy",
breed="Persian",
age="adult",
gender="female",
size="medium",
city="New York",
state="NY",
source="petfinder",
organization_name="Test Rescue",
url="https://example.com/cat/12345"
)
match = CatMatch(
cat=cat,
match_score=0.85,
vector_similarity=0.9,
attribute_match_score=0.8,
explanation="Great personality match"
)
assert match.cat.name == "Fluffy"
assert match.match_score == 0.85
assert "personality" in match.explanation
class TestAdoptionAlert:
"""Tests for AdoptionAlert model."""
def test_alert_creation(self):
"""Test alert creation."""
cat_profile = CatProfile(
user_location="10001",
personality_description="friendly"
)
alert = AdoptionAlert(
user_id=1,
user_email="test@example.com",
profile=cat_profile, # Correct field name
frequency="daily"
)
assert alert.user_email == "test@example.com"
assert alert.frequency == "daily"
assert alert.active is True
class TestSearchResult:
"""Tests for SearchResult model."""
def test_search_result_creation(self):
"""Test search result creation."""
profile = CatProfile(user_location="10001")
result = SearchResult(
matches=[],
total_found=0,
search_profile=profile,
search_time=1.23,
sources_queried=["petfinder"],
duplicates_removed=0
)
assert result.total_found == 0
assert result.search_time == 1.23
assert "petfinder" in result.sources_queried

View File

@@ -1,37 +0,0 @@
"""Utility functions for Tuxedo Link."""
from .deduplication import (
create_fingerprint,
calculate_levenshtein_similarity,
calculate_text_similarity,
)
from .image_utils import generate_image_embedding, calculate_image_similarity
from .log_utils import reformat
from .config import (
get_config,
is_production,
get_db_path,
get_vectordb_path,
get_email_provider,
get_email_config,
get_mailgun_config,
reload_config,
)
__all__ = [
"create_fingerprint",
"calculate_levenshtein_similarity",
"calculate_text_similarity",
"generate_image_embedding",
"calculate_image_similarity",
"reformat",
"get_config",
"is_production",
"get_db_path",
"get_vectordb_path",
"get_email_provider",
"get_email_config",
"get_mailgun_config",
"reload_config",
]

View File

@@ -1,174 +0,0 @@
"""
Breed mapping utilities for cat APIs.
Handles mapping user breed terms to valid API breed values
using dictionary lookups, vector search, and exact matching.
"""
import logging
from typing import List, Optional, Dict
# Mapping of common user terms to API breed values
# These are fuzzy/colloquial terms that users might type
USER_TERM_TO_API_BREED: Dict[str, List[str]] = {
# Common misspellings and variations
"main coon": ["Maine Coon"],
"maine": ["Maine Coon"],
"ragdol": ["Ragdoll"],
"siames": ["Siamese"],
"persian": ["Persian"],
"bengal": ["Bengal"],
"british shorthair": ["British Shorthair"],
"russian blue": ["Russian Blue"],
"sphynx": ["Sphynx"],
"sphinx": ["Sphynx"],
"american shorthair": ["American Shorthair"],
"scottish fold": ["Scottish Fold"],
"abyssinian": ["Abyssinian"],
"birman": ["Birman"],
"burmese": ["Burmese"],
"himalayan": ["Himalayan"],
"norwegian forest": ["Norwegian Forest Cat"],
"norwegian forest cat": ["Norwegian Forest Cat"],
"oriental": ["Oriental"],
"somali": ["Somali"],
"turkish angora": ["Turkish Angora"],
"turkish van": ["Turkish Van"],
# Mixed breeds
"mixed": ["Mixed Breed", "Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
"mixed breed": ["Mixed Breed", "Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
"domestic": ["Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
"dsh": ["Domestic Short Hair"],
"dmh": ["Domestic Medium Hair"],
"dlh": ["Domestic Long Hair"],
"tabby": ["Domestic Short Hair"], # Tabby is a pattern, not a breed
"tuxedo": ["Domestic Short Hair"], # Tuxedo is a color, not a breed
}
def normalize_user_breeds(
user_breeds: List[str],
valid_api_breeds: List[str],
vectordb: Optional[object] = None,
source: str = "petfinder",
similarity_threshold: float = 0.7
) -> List[str]:
"""
Normalize user breed preferences to valid API breed values.
Uses 3-tier strategy:
1. Dictionary lookup for common variations
2. Vector DB semantic search for fuzzy matching
3. Direct string matching as fallback
Args:
user_breeds: List of breed terms provided by the user
valid_api_breeds: List of breeds actually accepted by the API
vectordb: Optional MetadataVectorDB instance for semantic search
source: API source (petfinder/rescuegroups) for vector filtering
similarity_threshold: Minimum similarity score (0-1) for vector matches
Returns:
List of valid API breed strings
"""
if not user_breeds:
return []
normalized_breeds = set()
for user_term in user_breeds:
if not user_term or not user_term.strip():
continue
user_term_lower = user_term.lower().strip()
matched = False
# Tier 1: Dictionary lookup (instant, common variations)
if user_term_lower in USER_TERM_TO_API_BREED:
mapped_breeds = USER_TERM_TO_API_BREED[user_term_lower]
for mapped_breed in mapped_breeds:
if mapped_breed in valid_api_breeds:
normalized_breeds.add(mapped_breed)
matched = True
if matched:
logging.info(f"🎯 Dictionary match: '{user_term}' → {list(mapped_breeds)}")
continue
# Tier 2: Vector DB semantic search (fuzzy matching, handles typos)
if vectordb:
try:
matches = vectordb.search_breed(
user_term,
n_results=1,
source_filter=source
)
if matches and matches[0]['similarity'] >= similarity_threshold:
best_match = matches[0]['breed']
similarity = matches[0]['similarity']
if best_match in valid_api_breeds:
normalized_breeds.add(best_match)
logging.info(
f"🔍 Vector match: '{user_term}' → '{best_match}' "
f"(similarity: {similarity:.2f})"
)
matched = True
continue
except Exception as e:
logging.warning(f"Vector search failed for breed '{user_term}': {e}")
# Tier 3: Direct string matching (exact or substring)
if not matched:
# Try exact match (case-insensitive)
for valid_breed in valid_api_breeds:
if valid_breed.lower() == user_term_lower:
normalized_breeds.add(valid_breed)
logging.info(f"✓ Exact match: '{user_term}' → '{valid_breed}'")
matched = True
break
# Try substring match if exact didn't work
if not matched:
for valid_breed in valid_api_breeds:
if user_term_lower in valid_breed.lower():
normalized_breeds.add(valid_breed)
logging.info(f"≈ Substring match: '{user_term}' → '{valid_breed}'")
matched = True
# Log if no match found
if not matched:
logging.warning(
f"⚠️ No breed match found for '{user_term}'. "
f"User will see broader results."
)
result = list(normalized_breeds)
logging.info(f"Breed normalization complete: {user_breeds} → {result}")
return result
def get_breed_suggestions(breed_term: str, valid_breeds: List[str], top_n: int = 5) -> List[str]:
"""
Get breed suggestions for autocomplete or error messages.
Args:
breed_term: Partial or misspelled breed name
valid_breeds: List of valid API breed values
top_n: Number of suggestions to return
Returns:
List of suggested breed names
"""
term_lower = breed_term.lower().strip()
suggestions = []
# Find breeds containing the term
for breed in valid_breeds:
if term_lower in breed.lower():
suggestions.append(breed)
return suggestions[:top_n]
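The tiered fallback in `normalize_user_breeds` can be condensed into a self-contained sketch. This reproduces only tiers 1 and 3 (the vector tier needs a live `MetadataVectorDB`), and the dictionary and breed list here are tiny illustrative stand-ins, not the real API data:

```python
# Minimal stand-in data for illustration only.
TERM_TO_BREED = {"main coon": ["Maine Coon"], "dsh": ["Domestic Short Hair"]}
VALID = ["Maine Coon", "Domestic Short Hair", "Siamese"]

def normalize(term: str) -> list[str]:
    t = term.lower().strip()
    if t in TERM_TO_BREED:                           # Tier 1: dictionary lookup
        return [b for b in TERM_TO_BREED[t] if b in VALID]
    exact = [b for b in VALID if b.lower() == t]     # Tier 3a: exact match
    if exact:
        return exact
    return [b for b in VALID if t in b.lower()]      # Tier 3b: substring match

print(normalize("main coon"))  # hits the dictionary tier → ['Maine Coon']
print(normalize("siam"))       # falls through to substring → ['Siamese']
```

The same shape applies to `normalize_user_colors` below, with colors in place of breeds.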

View File

@@ -1,224 +0,0 @@
"""
Color mapping utilities for cat APIs.
Handles mapping user color terms to valid API color values
using dictionary lookups, vector search, and exact matching.
"""
import logging
from typing import List, Dict, Optional
# Mapping of common user terms to Petfinder API color values
# Based on actual Petfinder API color list
USER_TERM_TO_API_COLOR: Dict[str, List[str]] = {
# Tuxedo/Bicolor patterns
"tuxedo": ["Black & White / Tuxedo"],
"black and white": ["Black & White / Tuxedo"],
"black & white": ["Black & White / Tuxedo"],
"bicolor": ["Black & White / Tuxedo"], # Most common bicolor
# Solid colors
"black": ["Black"],
"white": ["White"],
# Orange variations
"orange": ["Orange / Red"],
"red": ["Orange / Red"],
"ginger": ["Orange / Red"],
"orange and white": ["Orange & White"],
"orange & white": ["Orange & White"],
# Gray variations
"gray": ["Gray / Blue / Silver"],
"grey": ["Gray / Blue / Silver"],
"silver": ["Gray / Blue / Silver"],
"blue": ["Gray / Blue / Silver"],
"gray and white": ["Gray & White"],
"grey and white": ["Gray & White"],
# Brown/Chocolate
"brown": ["Brown / Chocolate"],
"chocolate": ["Brown / Chocolate"],
# Cream/Ivory
"cream": ["Cream / Ivory"],
"ivory": ["Cream / Ivory"],
"buff": ["Buff / Tan / Fawn"],
"tan": ["Buff / Tan / Fawn"],
"fawn": ["Buff / Tan / Fawn"],
# Patterns
"calico": ["Calico"],
"dilute calico": ["Dilute Calico"],
"tortoiseshell": ["Tortoiseshell"],
"tortie": ["Tortoiseshell"],
"dilute tortoiseshell": ["Dilute Tortoiseshell"],
"torbie": ["Torbie"],
# Tabby patterns
"tabby": ["Tabby (Brown / Chocolate)", "Tabby (Gray / Blue / Silver)", "Tabby (Orange / Red)"],
"brown tabby": ["Tabby (Brown / Chocolate)"],
"gray tabby": ["Tabby (Gray / Blue / Silver)"],
"grey tabby": ["Tabby (Gray / Blue / Silver)"],
"orange tabby": ["Tabby (Orange / Red)"],
"red tabby": ["Tabby (Orange / Red)"],
"tiger": ["Tabby (Tiger Striped)"],
"tiger striped": ["Tabby (Tiger Striped)"],
"leopard": ["Tabby (Leopard / Spotted)"],
"spotted": ["Tabby (Leopard / Spotted)"],
# Point colors (Siamese-type)
"blue point": ["Blue Point"],
"chocolate point": ["Chocolate Point"],
"cream point": ["Cream Point"],
"flame point": ["Flame Point"],
"lilac point": ["Lilac Point"],
"seal point": ["Seal Point"],
# Other
"smoke": ["Smoke"],
"blue cream": ["Blue Cream"],
}
def normalize_user_colors(
user_colors: List[str],
valid_api_colors: List[str],
vectordb: Optional[object] = None,
source: str = "petfinder",
similarity_threshold: float = 0.7
) -> List[str]:
"""
Normalize user color preferences to valid API color values.
Uses 3-tier strategy:
1. Dictionary lookup for common color terms
2. Vector DB semantic search for fuzzy matching
3. Direct string matching as fallback
Args:
user_colors: List of color terms provided by the user
valid_api_colors: List of colors actually accepted by the API
vectordb: Optional MetadataVectorDB instance for semantic search
source: API source (petfinder/rescuegroups) for vector filtering
similarity_threshold: Minimum similarity score (0-1) for vector matches
Returns:
List of valid API color strings
"""
if not user_colors:
return []
normalized_colors = set()
for user_term in user_colors:
if not user_term or not user_term.strip():
continue
user_term_lower = user_term.lower().strip()
matched = False
# Tier 1: Dictionary lookup (instant, common color terms)
if user_term_lower in USER_TERM_TO_API_COLOR:
mapped_colors = USER_TERM_TO_API_COLOR[user_term_lower]
for mapped_color in mapped_colors:
if mapped_color in valid_api_colors:
normalized_colors.add(mapped_color)
matched = True
if matched:
logging.info(f"🎯 Dictionary match: '{user_term}' → {list(mapped_colors)}")
continue
# Tier 2: Vector DB semantic search (fuzzy matching, handles typos)
if vectordb:
try:
matches = vectordb.search_color(
user_term,
n_results=1,
source_filter=source
)
if matches and matches[0]['similarity'] >= similarity_threshold:
best_match = matches[0]['color']
similarity = matches[0]['similarity']
if best_match in valid_api_colors:
normalized_colors.add(best_match)
logging.info(
f"🔍 Vector match: '{user_term}' → '{best_match}' "
f"(similarity: {similarity:.2f})"
)
matched = True
continue
except Exception as e:
logging.warning(f"Vector search failed for color '{user_term}': {e}")
# Tier 3: Direct string matching (exact or substring)
if not matched:
# Try exact match (case-insensitive)
for valid_color in valid_api_colors:
if valid_color.lower() == user_term_lower:
normalized_colors.add(valid_color)
logging.info(f"✓ Exact match: '{user_term}' → '{valid_color}'")
matched = True
break
# Try substring match if exact didn't work
if not matched:
for valid_color in valid_api_colors:
if user_term_lower in valid_color.lower():
normalized_colors.add(valid_color)
logging.info(f"≈ Substring match: '{user_term}' → '{valid_color}'")
matched = True
# Log if no match found
if not matched:
logging.warning(
f"⚠️ No color match found for '{user_term}'. "
f"User will see broader results."
)
result = list(normalized_colors)
logging.info(f"Color normalization complete: {user_colors} → {result}")
return result
def get_color_suggestions(color_term: str, valid_colors: List[str], top_n: int = 5) -> List[str]:
"""
Get color suggestions for autocomplete or error messages.
Args:
color_term: Partial or misspelled color name
valid_colors: List of valid API color values
top_n: Number of suggestions to return
Returns:
List of suggested color names
"""
term_lower = color_term.lower().strip()
suggestions = []
# Find colors containing the term
for color in valid_colors:
if term_lower in color.lower():
suggestions.append(color)
return suggestions[:top_n]
def get_color_help_text(valid_colors: List[str]) -> str:
"""
Generate help text for valid colors.
Args:
valid_colors: List of valid API colors
Returns:
Formatted string describing valid colors
"""
if not valid_colors:
return "No color information available."
return f"Valid colors: {', '.join(valid_colors)}"

View File

@@ -1,134 +0,0 @@
"""Configuration management for Tuxedo Link."""
import yaml
import os
from pathlib import Path
from typing import Any, Dict, Optional
_config_cache: Optional[Dict[str, Any]] = None
def load_config() -> Dict[str, Any]:
"""
Load configuration from YAML with environment variable overrides.
Returns:
Dict[str, Any]: Configuration dictionary
"""
global _config_cache
if _config_cache is not None:
return _config_cache
# Determine config path - look for config.yaml, fallback to example
project_root = Path(__file__).parent.parent
config_path = project_root / "config.yaml"
if not config_path.exists():
config_path = project_root / "config.example.yaml"
if not config_path.exists():
raise FileNotFoundError(
"No config.yaml or config.example.yaml found. "
"Please copy config.example.yaml to config.yaml and configure it."
)
# Load YAML
with open(config_path) as f:
config = yaml.safe_load(f)
# Override with environment variables if present
if 'EMAIL_PROVIDER' in os.environ:
config['email']['provider'] = os.environ['EMAIL_PROVIDER']
if 'DEPLOYMENT_MODE' in os.environ:
config['deployment']['mode'] = os.environ['DEPLOYMENT_MODE']
if 'MAILGUN_DOMAIN' in os.environ:
config['mailgun']['domain'] = os.environ['MAILGUN_DOMAIN']
_config_cache = config
return config
def get_config() -> Dict[str, Any]:
"""
Get current configuration.
Returns:
Dict[str, Any]: Configuration dictionary
"""
return load_config()
def is_production() -> bool:
"""
Check if running in production mode.
Returns:
bool: True if production mode, False if local
"""
return get_config()['deployment']['mode'] == 'production'
def get_db_path() -> str:
"""
Get database path based on deployment mode.
Returns:
str: Path to database file
"""
config = get_config()
mode = config['deployment']['mode']
return config['deployment'][mode]['db_path']
def get_vectordb_path() -> str:
"""
Get vector database path based on deployment mode.
Returns:
str: Path to vector database directory
"""
config = get_config()
mode = config['deployment']['mode']
return config['deployment'][mode]['vectordb_path']
def get_email_provider() -> str:
"""
Get configured email provider.
Returns:
str: Email provider name (mailgun or sendgrid)
"""
return get_config()['email']['provider']
def get_email_config() -> Dict[str, str]:
"""
Get email configuration.
Returns:
Dict[str, str]: Email configuration (from_name, from_email)
"""
return get_config()['email']
def get_mailgun_config() -> Dict[str, str]:
"""
Get Mailgun configuration.
Returns:
Dict[str, str]: Mailgun configuration (domain)
"""
return get_config()['mailgun']
def reload_config() -> None:
"""
Force reload configuration from file.
Useful for testing or when config changes.
"""
global _config_cache
_config_cache = None
load_config()

View File

@@ -1,201 +0,0 @@
"""Deduplication utilities for identifying duplicate cat listings."""
import hashlib
from typing import Tuple
import Levenshtein
from models.cats import Cat
def create_fingerprint(cat: Cat) -> str:
"""
Create a fingerprint for a cat based on stable attributes.
The fingerprint is a hash of:
- Organization name (normalized)
- Breed (normalized)
- Age
- Gender
Args:
cat: Cat object
Returns:
Fingerprint hash (16 characters)
"""
components = [
cat.organization_name.lower().strip(),
cat.breed.lower().strip(),
str(cat.age).lower(),
cat.gender.lower()
]
# Create hash from combined components
combined = '|'.join(components)
hash_obj = hashlib.sha256(combined.encode())
# Return first 16 characters of hex digest
return hash_obj.hexdigest()[:16]
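The fingerprint scheme can be exercised without the `Cat` model; this sketch takes the four attributes as plain strings and shows why normalization makes cosmetically different listings collide:

```python
import hashlib

def fingerprint(org: str, breed: str, age: str, gender: str) -> str:
    # Same scheme as create_fingerprint above: normalize, join with '|',
    # SHA-256, keep the first 16 hex characters.
    combined = "|".join([org.lower().strip(), breed.lower().strip(),
                         age.lower(), gender.lower()])
    return hashlib.sha256(combined.encode()).hexdigest()[:16]

a = fingerprint("Test Rescue", "Persian", "adult", "female")
b = fingerprint("test rescue ", "PERSIAN", "Adult", "Female")
print(a == b)  # → True: case and whitespace differences hash identically
```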
def calculate_levenshtein_similarity(str1: str, str2: str) -> float:
"""
Calculate normalized Levenshtein similarity between two strings.
Similarity = 1 - (distance / max_length)
Args:
str1: First string
str2: Second string
Returns:
Similarity score (0-1, where 1 is identical)
"""
if not str1 or not str2:
return 0.0
# Normalize strings
str1 = str1.lower().strip()
str2 = str2.lower().strip()
# Handle identical strings
if str1 == str2:
return 1.0
# Calculate Levenshtein distance
distance = Levenshtein.distance(str1, str2)
# Normalize by maximum possible distance
max_length = max(len(str1), len(str2))
if max_length == 0:
return 1.0
similarity = 1.0 - (distance / max_length)
return max(0.0, similarity)
def calculate_text_similarity(cat1: Cat, cat2: Cat) -> Tuple[float, float]:
"""
Calculate text similarity between two cats (name and description).
Args:
cat1: First cat
cat2: Second cat
Returns:
Tuple of (name_similarity, description_similarity)
"""
# Name similarity
name_similarity = calculate_levenshtein_similarity(cat1.name, cat2.name)
# Description similarity
desc_similarity = calculate_levenshtein_similarity(
cat1.description,
cat2.description
)
return name_similarity, desc_similarity
def calculate_composite_score(
name_similarity: float,
description_similarity: float,
image_similarity: float,
name_weight: float = 0.4,
description_weight: float = 0.3,
image_weight: float = 0.3
) -> float:
"""
Calculate a composite similarity score from multiple signals.
Args:
name_similarity: Name similarity (0-1)
description_similarity: Description similarity (0-1)
image_similarity: Image similarity (0-1)
name_weight: Weight for name similarity
description_weight: Weight for description similarity
image_weight: Weight for image similarity
Returns:
Composite score (0-1)
"""
# Ensure weights sum to 1
total_weight = name_weight + description_weight + image_weight
if total_weight == 0:
return 0.0
# Normalize weights
name_weight /= total_weight
description_weight /= total_weight
image_weight /= total_weight
# Calculate weighted score
score = (
name_similarity * name_weight +
description_similarity * description_weight +
image_similarity * image_weight
)
return score
def normalize_string(s: str) -> str:
"""
Normalize a string for comparison.
- Convert to lowercase
- Strip whitespace
- Remove extra spaces
Args:
s: String to normalize
Returns:
Normalized string
"""
import re
s = s.lower().strip()
s = re.sub(r'\s+', ' ', s) # Replace multiple spaces with single space
return s
def calculate_breed_similarity(breed1: str, breed2: str) -> float:
"""
Calculate breed similarity with special handling for mixed breeds.
Args:
breed1: First breed
breed2: Second breed
Returns:
Similarity score (0-1)
"""
breed1_norm = normalize_string(breed1)
breed2_norm = normalize_string(breed2)
# Exact match
if breed1_norm == breed2_norm:
return 1.0
# Check if both are domestic shorthair/longhair (very common)
domestic_variants = ['domestic short hair', 'domestic shorthair', 'dsh',
'domestic long hair', 'domestic longhair', 'dlh',
'domestic medium hair', 'domestic mediumhair', 'dmh']
if breed1_norm in domestic_variants and breed2_norm in domestic_variants:
return 0.9 # High similarity for domestic cats
# Check for mix/mixed keywords
mix_keywords = ['mix', 'mixed', 'tabby']
breed1_has_mix = any(keyword in breed1_norm for keyword in mix_keywords)
breed2_has_mix = any(keyword in breed2_norm for keyword in mix_keywords)
if breed1_has_mix and breed2_has_mix:
# Both are mixes, higher tolerance
return calculate_levenshtein_similarity(breed1, breed2) * 0.9
# Standard Levenshtein similarity
return calculate_levenshtein_similarity(breed1, breed2)
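A worked example of the composite score with the default 0.4 / 0.3 / 0.3 weights (a simplified stand-in for `calculate_composite_score`, inputs chosen for illustration):

```python
def composite(name_sim, desc_sim, img_sim, w=(0.4, 0.3, 0.3)):
    # Weighted average; dividing by sum(w) matches the normalization above.
    total = sum(w)
    return (name_sim * w[0] + desc_sim * w[1] + img_sim * w[2]) / total

# Two listings with identical names, similar descriptions, near-identical photos:
score = composite(1.0, 0.8, 0.95)
print(round(score, 3))  # 0.4*1.0 + 0.3*0.8 + 0.3*0.95 → 0.925
```

A score this high would flag the pair as a cross-platform duplicate.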

View File

@@ -1,161 +0,0 @@
"""Geocoding utilities for location services."""
import requests
from typing import Optional, Tuple
def geocode_location(location: str) -> Optional[Tuple[float, float]]:
"""
Convert a location string (address, city, or ZIP) to latitude/longitude.
Uses the free Nominatim API (OpenStreetMap).
Args:
location: Location string (address, city, ZIP code, etc.)
Returns:
Tuple of (latitude, longitude) or None if geocoding fails
"""
try:
# Use Nominatim API (free, no API key required)
url = "https://nominatim.openstreetmap.org/search"
params = {
'q': location,
'format': 'json',
'limit': 1,
'countrycodes': 'us,ca' # Limit to US and Canada
}
headers = {
'User-Agent': 'TuxedoLink/1.0' # Required by Nominatim
}
response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()
results = response.json()
if results and len(results) > 0:
lat = float(results[0]['lat'])
lon = float(results[0]['lon'])
return lat, lon
return None
except Exception as e:
print(f"Geocoding failed for '{location}': {e}")
return None
def reverse_geocode(latitude: float, longitude: float) -> Optional[dict]:
"""
Convert latitude/longitude to address information.
Args:
latitude: Latitude
longitude: Longitude
Returns:
Dictionary with address components or None if failed
"""
try:
url = "https://nominatim.openstreetmap.org/reverse"
params = {
'lat': latitude,
'lon': longitude,
'format': 'json'
}
headers = {
'User-Agent': 'TuxedoLink/1.0'
}
response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()
result = response.json()
if 'address' in result:
address = result['address']
return {
'city': address.get('city', address.get('town', address.get('village', ''))),
'state': address.get('state', ''),
'zip': address.get('postcode', ''),
'country': address.get('country', ''),
'display_name': result.get('display_name', '')
}
return None
except Exception as e:
print(f"Reverse geocoding failed for ({latitude}, {longitude}): {e}")
return None
def calculate_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""
Calculate the great circle distance between two points in miles.
Uses the Haversine formula.
Args:
lat1: Latitude of first point
lon1: Longitude of first point
lat2: Latitude of second point
lon2: Longitude of second point
Returns:
Distance in miles
"""
from math import radians, sin, cos, sqrt, atan2
# Earth's radius in miles
R = 3959.0
# Convert to radians
lat1_rad = radians(lat1)
lon1_rad = radians(lon1)
lat2_rad = radians(lat2)
lon2_rad = radians(lon2)
# Differences
dlat = lat2_rad - lat1_rad
dlon = lon2_rad - lon1_rad
# Haversine formula
a = sin(dlat/2)**2 + cos(lat1_rad) * cos(lat2_rad) * sin(dlon/2)**2
c = 2 * atan2(sqrt(a), sqrt(1-a))
distance = R * c
return distance
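The Haversine computation above can be checked against a known distance. This standalone version uses the same formula and Earth radius (3959 mi); New York to Los Angeles is roughly 2,450 miles great-circle:

```python
from math import radians, sin, cos, sqrt, atan2

def haversine_miles(lat1, lon1, lat2, lon2):
    # Same formula as calculate_distance above.
    R = 3959.0  # Earth's radius in miles
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))

d = haversine_miles(40.7128, -74.0060, 34.0522, -118.2437)  # NYC → LA
print(round(d))  # ≈ 2,450 miles
```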
def parse_location_input(location_input: str) -> Optional[Tuple[float, float]]:
"""
Parse location input that might be coordinates or an address.
Handles formats:
- "lat,long" (e.g., "40.7128,-74.0060")
- ZIP code (e.g., "10001")
- City, State (e.g., "New York, NY")
- Full address
Args:
location_input: Location string
Returns:
Tuple of (latitude, longitude) or None if parsing fails
"""
# Try to parse as coordinates first
if ',' in location_input:
parts = location_input.split(',')
if len(parts) == 2:
try:
lat = float(parts[0].strip())
lon = float(parts[1].strip())
# Basic validation
if -90 <= lat <= 90 and -180 <= lon <= 180:
return lat, lon
except ValueError:
pass # Not coordinates, try geocoding
# Fall back to geocoding
return geocode_location(location_input)

View File

@@ -1,168 +0,0 @@
"""Image utilities for generating and comparing image embeddings."""
import numpy as np
import requests
from PIL import Image
from io import BytesIO
from typing import Optional
import open_clip
import torch
class ImageEmbeddingGenerator:
"""Generate image embeddings using CLIP model."""
def __init__(self, model_name: str = 'ViT-B-32', pretrained: str = 'openai'):
"""
Initialize the embedding generator.
Args:
model_name: CLIP model architecture
pretrained: Pretrained weights to use
"""
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model, _, self.preprocess = open_clip.create_model_and_transforms(
model_name,
pretrained=pretrained,
device=self.device
)
self.model.eval()
def download_image(self, url: str, timeout: int = 10) -> Optional[Image.Image]:
"""
Download an image from a URL.
Args:
url: Image URL
timeout: Request timeout in seconds
Returns:
PIL Image or None if download fails
"""
try:
response = requests.get(url, timeout=timeout)
response.raise_for_status()
img = Image.open(BytesIO(response.content))
return img.convert('RGB') # Ensure RGB format
except Exception as e:
print(f"Failed to download image from {url}: {e}")
return None
def generate_embedding(self, image: Image.Image) -> np.ndarray:
"""
Generate CLIP embedding for an image.
Args:
image: PIL Image
Returns:
Numpy array of image embedding
"""
with torch.no_grad():
image_input = self.preprocess(image).unsqueeze(0).to(self.device)
image_features = self.model.encode_image(image_input)
# Normalize embedding
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
# Convert to numpy
embedding = image_features.cpu().numpy().flatten()
return embedding.astype(np.float32)
def generate_embedding_from_url(self, url: str) -> Optional[np.ndarray]:
"""
Download an image and generate its embedding.
Args:
url: Image URL
Returns:
Numpy array of image embedding or None if failed
"""
image = self.download_image(url)
if image is None:
return None
return self.generate_embedding(image)
# Global instance (lazy loaded)
_embedding_generator: Optional[ImageEmbeddingGenerator] = None
def get_embedding_generator() -> ImageEmbeddingGenerator:
"""Get or create the global embedding generator instance."""
global _embedding_generator
if _embedding_generator is None:
_embedding_generator = ImageEmbeddingGenerator()
return _embedding_generator
def generate_image_embedding(image_url: str) -> Optional[np.ndarray]:
"""
Generate an image embedding from a URL.
This is a convenience function that uses the global embedding generator.
Args:
image_url: URL of the image
Returns:
Numpy array of image embedding or None if failed
"""
generator = get_embedding_generator()
return generator.generate_embedding_from_url(image_url)
def calculate_image_similarity(embedding1: np.ndarray, embedding2: np.ndarray) -> float:
"""
Calculate cosine similarity between two image embeddings.
Args:
embedding1: First image embedding
embedding2: Second image embedding
Returns:
Similarity score (0-1, where 1 is most similar)
"""
if embedding1 is None or embedding2 is None:
return 0.0
# Ensure embeddings are normalized
norm1 = np.linalg.norm(embedding1)
norm2 = np.linalg.norm(embedding2)
if norm1 == 0 or norm2 == 0:
return 0.0
embedding1_norm = embedding1 / norm1
embedding2_norm = embedding2 / norm2
# Cosine similarity
similarity = np.dot(embedding1_norm, embedding2_norm)
# Clip to [0, 1] range (cosine similarity is [-1, 1])
similarity = (similarity + 1) / 2
return float(similarity)
def batch_generate_embeddings(image_urls: list[str]) -> list[Optional[np.ndarray]]:
"""
Generate embeddings for multiple images.
Args:
image_urls: List of image URLs
Returns:
List of embeddings (same length as input, None for failed downloads)
"""
generator = get_embedding_generator()
embeddings = []
for url in image_urls:
embedding = generator.generate_embedding_from_url(url)
embeddings.append(embedding)
return embeddings


@@ -1,46 +0,0 @@
"""Logging utilities for Tuxedo Link."""
# Foreground colors
RED = '\033[31m'
GREEN = '\033[32m'
YELLOW = '\033[33m'
BLUE = '\033[34m'
MAGENTA = '\033[35m'
CYAN = '\033[36m'
WHITE = '\033[37m'
# Background color
BG_BLACK = '\033[40m'
BG_BLUE = '\033[44m'
# Reset code to return to default color
RESET = '\033[0m'
# Mapping of terminal color codes to HTML colors
mapper = {
BG_BLACK+RED: "#dd0000",
BG_BLACK+GREEN: "#00dd00",
BG_BLACK+YELLOW: "#dddd00",
BG_BLACK+BLUE: "#0000ee",
BG_BLACK+MAGENTA: "#aa00dd",
BG_BLACK+CYAN: "#00dddd",
BG_BLACK+WHITE: "#87CEEB",
BG_BLUE+WHITE: "#ff7800"
}
def reformat(message: str) -> str:
"""
Convert terminal color codes to HTML spans for Gradio display.
Args:
message: Log message with terminal color codes
Returns:
HTML formatted message
"""
for key, value in mapper.items():
message = message.replace(key, f'<span style="color: {value}">')
message = message.replace(RESET, '</span>')
return message
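A quick round trip shows `reformat` turning an ANSI-colored log line into a Gradio-friendly HTML span (only the red-on-black mapping is reproduced in this sketch):

```python
# Minimal subset of the color constants and mapper defined above
RED = '\033[31m'
BG_BLACK = '\033[40m'
RESET = '\033[0m'
mapper = {BG_BLACK + RED: "#dd0000"}

def reformat(message: str) -> str:
    """Convert terminal color codes to HTML spans."""
    for key, value in mapper.items():
        message = message.replace(key, f'<span style="color: {value}">')
    return message.replace(RESET, '</span>')

line = f"{BG_BLACK}{RED}error{RESET} ok"
assert reformat(line) == '<span style="color: #dd0000">error</span> ok'
```

Note that any `RESET` without a preceding color code still becomes a stray `</span>`; the full mapper above covers the combinations the agents actually emit.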


@@ -1,37 +0,0 @@
"""Timing utilities for performance monitoring."""
import time
import functools
from typing import Callable, Any
def timed(func: Callable[..., Any]) -> Callable[..., Any]:
"""
Decorator to time function execution and log it.
Args:
func: Function to be timed
Returns:
Wrapped function that logs execution time
Usage:
@timed
def my_function():
...
"""
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
"""Wrapper function that times the execution."""
start_time = time.time()
result = func(*args, **kwargs)
elapsed = time.time() - start_time
# Try to log if the object has a log method (Agent classes)
if args and hasattr(args[0], 'log'):
args[0].log(f"{func.__name__} completed in {elapsed:.2f} seconds")
return result
return wrapper
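The decorator's duck-typed logging (it only logs when the first positional argument has a `log` method) can be verified with a minimal stand-in; the `Agent` class below is a hypothetical stub, not the project's real agent:

```python
import time
import functools

def timed(func):
    """Log execution time via the instance's log() method, if present."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        if args and hasattr(args[0], 'log'):
            args[0].log(f"{func.__name__} completed in {elapsed:.2f} seconds")
        return result
    return wrapper

class Agent:
    def __init__(self):
        self.messages = []
    def log(self, msg):
        self.messages.append(msg)
    @timed
    def fetch(self):
        return "done"

agent = Agent()
assert agent.fetch() == "done"
assert "fetch completed" in agent.messages[0]
```

Plain functions (no `self` with a `log` method) pass through untimed but otherwise unchanged, so the decorator is safe to apply outside agent classes too.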

File diff suppressed because it is too large