Wiping it out, leaving only the README
@@ -1,77 +0,0 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv

# Environment variables
.env

# Configuration
config.yaml

# Database
*.db
*.db-journal
*.sqlite
*.sqlite3

# ChromaDB
cat_vectorstore/
metadata_vectorstore/
*.chroma

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Testing
.coverage
htmlcov/
.pytest_cache/
.tox/

# Logs
*.log
logs/

# Modal
.modal-cache/

# Data files
data/*.db
data/*.json
!data/.gitkeep

# Model cache (sentence-transformers, huggingface, etc.)
.cache/

# Jupyter
.ipynb_checkpoints/
@@ -1 +0,0 @@
3.11
@@ -8,230 +8,99 @@ Find your perfect feline companion using AI, semantic search, and multi-platform

---

## 🌟 Features
## 🌟 Overview

✅ **Multi-Platform Search** - Aggregates from Petfinder and RescueGroups
✅ **Natural Language** - Describe your ideal cat in plain English
✅ **Semantic Matching** - AI understands personality, not just keywords
✅ **Color/Breed Matching** - 3-tier system handles typos ("tuxado" → "tuxedo", "main coon" → "Maine Coon")
✅ **Deduplication** - Multi-modal (name + description + image) duplicate detection
✅ **Hybrid Search** - Combines vector similarity with structured filters
✅ **Image Recognition** - Uses CLIP to match cats visually
✅ **Email Notifications** - Get alerts for new matches
✅ **Serverless Backend** - Optionally deploy to Modal for cloud-based search and alerts
Tuxedo Link is an intelligent cat adoption platform that combines:

**Technical Stack**: OpenAI GPT-4 • ChromaDB • CLIP • Gradio • Modal
- **Natural Language Understanding** - Describe your ideal cat in plain English
- **Semantic Search with RAG** - ChromaDB + SentenceTransformers for personality-based matching
- **Multi-Modal Deduplication** - Uses CLIP for image similarity + text analysis
- **Hybrid Scoring** - 60% vector similarity + 40% attribute matching
- **Multi-Platform Aggregation** - Searches Petfinder and RescueGroups APIs
- **Serverless Architecture** - Optional Modal deployment with scheduled email alerts

## 🏗️ Architecture Modes

Tuxedo Link supports two deployment modes:

### Local Mode (Development)
- All components run locally
- Uses local database and vector store
- Fast iteration and development
- No Modal required

### Production Mode (Cloud)
- UI runs locally, backend runs on Modal
- Database and vector store on Modal volumes
- Scheduled email alerts active
- Scalable and serverless

Switch between modes in `config.yaml` by setting `deployment.mode` to `local` or `production`.
**Tech Stack**: OpenAI GPT-4 • ChromaDB • CLIP • Gradio • Modal

---

## 🚀 Quick Start
## 📸 Application Screenshots

### Prerequisites
- Python 3.11+
- `uv` package manager
- API keys (OpenAI, Petfinder, Mailgun)
### Installation
### 🔍 Search Interface
Natural language search with semantic matching and personality-based results:

1. **Navigate to project directory**
```bash
cd week8/community_contributions/dkisselev-zz/tuxedo_link
```
![Search Interface](assets/screenshots/search-interface.png)

2. **Set up virtual environment**
```bash
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
```
### 🔔 Email Alerts
Save your search and get notified when new matching cats are available:

3. **Configure environment variables**
```bash
# Copy template and add your API keys
cp env.example .env
# Edit .env with your keys
```
![Email Alerts](assets/screenshots/email-alerts.png)

4. **Configure application settings**
```bash
# Copy configuration template
cp config.example.yaml config.yaml
# Edit config.yaml for email provider and deployment mode
```
### 📖 About Page
Learn about the technology and inspiration behind Tuxedo Link:

5. **Initialize databases**
```bash
python setup_vectordb.py
```
![About Page](assets/screenshots/about-page.png)

6. **Run the application**
```bash
./run.sh
```
### 📧 Email Notifications
Receive beautiful email alerts with your perfect matches:

Visit http://localhost:7860 in your browser!
![Email Notification](assets/screenshots/email-notification.png)

---

## 🔑 API Setup
## 🚀 Full Project & Source Code

### Required API Keys
The complete source code, documentation, and setup instructions are available at:

Add these to your `.env` file:
### **[👉 GitHub Repository: dkisselev-zz/tuxedo-link](https://github.com/dkisselev-zz/tuxedo-link)**

```bash
# OpenAI (for profile extraction)
# Get key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=sk-...
The repository includes:

# Petfinder (for cat listings)
# Get key from: https://www.petfinder.com/developers/
PETFINDER_API_KEY=your_key
PETFINDER_SECRET=your_secret

# Mailgun (for email alerts)
# Get key from: https://app.mailgun.com/
MAILGUN_API_KEY=your_mailgun_key
```

### Optional API Keys

```bash
# RescueGroups (additional cat listings)
# Get key from: https://userguide.rescuegroups.org/
RESCUEGROUPS_API_KEY=your_key

# SendGrid (alternative email provider)
SENDGRID_API_KEY=SG...

# Modal (for cloud deployment)
MODAL_TOKEN_ID=...
MODAL_TOKEN_SECRET=...
```

### Application Configuration

Edit `config.yaml` to configure:

```yaml
# Email provider (mailgun or sendgrid)
email:
  provider: mailgun
  from_name: "Tuxedo Link"
  from_email: "noreply@yourdomain.com"

# Mailgun domain
mailgun:
  domain: "your-domain.mailgun.org"

# Deployment mode (local or production)
deployment:
  mode: local # Use 'local' for development
```

**Note**: API keys go in `.env` (git-ignored), application settings go in `config.yaml` (also git-ignored).
- ✅ Complete source code with 92 passing tests
- ✅ Comprehensive technical documentation (3,400+ lines)
- ✅ Agentic architecture with 7 specialized agents
- ✅ Dual vector store implementation (main + metadata)
- ✅ Modal deployment guide for production
- ✅ Setup scripts and configuration examples
- ✅ LLM techniques documentation (structured output, RAG, hybrid search)

---

## 💻 Usage
## 🧠 Key LLM/RAG Techniques

### Search Tab
1. Describe your ideal cat in natural language
2. Click "Search" or press Enter
3. Browse results with match scores
4. Click "View Details" to see adoption page
### 1. Structured Output with GPT-4 Function Calling
Extracts search preferences from natural language into Pydantic models
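The shape of the extracted preferences can be sketched as follows. This is an illustration only: a plain dataclass stands in for the project's actual Pydantic models, and the field names are assumptions, not the repository's schema. GPT-4 function calling returns JSON arguments matching the declared schema, so validation reduces to a constructor call.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stand-in for the project's Pydantic preference model;
# the real field names may differ.
@dataclass
class CatPreferences:
    location: Optional[str] = None
    age: Optional[str] = None
    good_with_children: Optional[bool] = None
    keywords: List[str] = field(default_factory=list)

def parse_llm_arguments(args: dict) -> CatPreferences:
    """Turn the JSON arguments returned by function calling into a typed object."""
    return CatPreferences(**args)

# "I want a friendly family cat in NYC good with children" might yield:
prefs = parse_llm_arguments({
    "location": "NYC",
    "good_with_children": True,
    "keywords": ["friendly", "family"],
})
```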

**Example queries:**
- "I want a friendly family cat in NYC good with children"
- "Looking for a playful young kitten"
- "Show me calm adult cats that like to cuddle"
- "Find me a tuxedo maine coon in Boston" (natural color/breed terms work!)
- "Orange tabby that's good with other cats"
### 2. Dual Vector Store Architecture
- **Main ChromaDB** - Cat profile semantic embeddings
- **Metadata DB** - Fuzzy color/breed matching with typo tolerance

#### Alerts Tab
1. Perform a search in the Search tab first
2. Go to Alerts tab
3. Enter your email address
4. Choose notification frequency (Immediately, Daily, Weekly)
5. Click "Save Alert"
### 3. Hybrid Search Strategy
Combines vector similarity (60%) with structured metadata filtering (40%)
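The 60/40 blend can be sketched as a simple weighted sum. This is a hedged illustration of the idea, not the project's exact scoring code, which may normalize or clamp differently:

```python
def hybrid_score(vector_similarity: float, attribute_match: float) -> float:
    """Blend semantic similarity with structured attribute matching.

    Both inputs are expected in [0, 1]; the weights follow the 60/40
    split described above (illustrative, not the repository's code).
    """
    return 0.6 * vector_similarity + 0.4 * attribute_match

# A cat whose description matches well but misses some filters still ranks high:
score = hybrid_score(vector_similarity=0.9, attribute_match=0.5)  # ≈ 0.74
```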

You'll receive email notifications when new matches are found!
### 4. 3-Tier Semantic Normalization
Dictionary → Vector DB → Fuzzy fallback for robust term mapping
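A minimal sketch of the tiered lookup, assuming an alias dictionary and a fuzzy fallback via `difflib`; the middle vector-DB tier is stubbed out since it needs ChromaDB, and the alias entries are illustrative:

```python
import difflib

# Tier 1: exact dictionary of known aliases (entries are illustrative).
ALIASES = {"tuxado": "tuxedo", "main coon": "Maine Coon"}
CANONICAL = ["tuxedo", "Maine Coon", "tabby", "calico"]

def normalize_term(term: str) -> str:
    """Map a user-supplied color/breed term to a canonical form.

    Sketch of the 3-tier idea: dictionary, then (in the real system)
    a metadata vector-store lookup, then fuzzy matching as a last resort.
    """
    key = term.strip().lower()
    # Tier 1: dictionary hit
    if key in ALIASES:
        return ALIASES[key]
    # Tier 2 would query the metadata vector store here (omitted).
    # Tier 3: fuzzy fallback over the canonical vocabulary
    matches = difflib.get_close_matches(key, [c.lower() for c in CANONICAL], n=1, cutoff=0.6)
    if matches:
        return next(c for c in CANONICAL if c.lower() == matches[0])
    return term
```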

#### About Tab
Learn about Kyra and the technology behind the app
### 5. Multi-Modal Deduplication
Fingerprint + text (Levenshtein) + image (CLIP) similarity scoring
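The composite score can be sketched as a weighted blend; the 0.4/0.3/0.3 weights mirror those passed to `calculate_composite_score` in the deduplication agent, but this function is a simplified stand-in for the real helper in `utils.deduplication`:

```python
def composite_score(name_sim: float, desc_sim: float, image_sim: float,
                    name_weight: float = 0.4,
                    description_weight: float = 0.3,
                    image_weight: float = 0.3) -> float:
    """Weighted blend of name, description, and image similarity.

    Illustrative stand-in for utils.deduplication.calculate_composite_score;
    all similarities are assumed to be in [0, 1].
    """
    return (name_weight * name_sim
            + description_weight * desc_sim
            + image_weight * image_sim)

# Two listings with identical names and near-identical photos:
score = composite_score(name_sim=1.0, desc_sim=0.7, image_sim=0.95)  # ≈ 0.895
```

A pair scoring at or above the composite threshold (0.85 by default) is treated as a duplicate.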

### Development Mode
---

For faster development and testing, use local mode in `config.yaml`:
## 🏆 Project Highlights

```yaml
deployment:
  mode: local # Uses local database and cached data
```
- **92 Tests** - 81 unit + 11 integration tests (100% passing)
- **Production Ready** - Serverless Modal deployment with volumes
- **Email Alerts** - Scheduled background jobs for new match notifications
- **95%+ Accuracy** - Multi-modal deduplication across platforms
- **85-90% Match Quality** - Hybrid scoring algorithm

---

## 📚 Documentation

### Complete Technical Reference

For detailed documentation on the architecture, agents, and every function in the codebase, see:

**[📖 TECHNICAL_REFERENCE.md](docs/TECHNICAL_REFERENCE.md)** - Complete technical documentation including:
- Configuration system
- Agentic architecture
- Data flow pipeline
- Deduplication strategy
- Email provider system
- Alert management
- All functions with examples
- User journey walkthroughs

**[📊 ARCHITECTURE_DIAGRAM.md](docs/architecture_diagrams/ARCHITECTURE_DIAGRAM.md)** - Visual diagrams:
- System architecture
- Agent interaction
- Data flow
- Database schema

**[🚀 MODAL_DEPLOYMENT.md](docs/MODAL_DEPLOYMENT.md)** - Cloud deployment guide:
- Production mode architecture
- Automated deployment with `deploy.sh`
- Modal API and scheduled jobs
- UI-to-Modal communication
- Monitoring and troubleshooting

**[🧪 tests/README.md](tests/README.md)** - Testing guide:
- Running unit tests
- Running integration tests
- Manual test scripts
- Coverage reports

---

## 🤝 Contributing

This project was built as part of the Andela LLM Engineering bootcamp. Contributions and improvements are welcome!

---

## 📄 License

See [LICENSE](LICENSE) file for details.
- **TECHNICAL_REFERENCE.md** - Complete API documentation
- **MODAL_DEPLOYMENT.md** - Cloud deployment guide
- **ARCHITECTURE_DIAGRAM.md** - System architecture visuals
- **tests/README.md** - Testing guide and coverage

---

@@ -241,6 +110,6 @@ See [LICENSE](LICENSE) file for details.

*May every cat find their perfect home* 🐾

[Technical Reference](docs/TECHNICAL_REFERENCE.md) • [Architecture](docs/architecture_diagrams/ARCHITECTURE_DIAGRAM.md) • [Deployment](docs/MODAL_DEPLOYMENT.md) • [Tests](tests/README.md)
**[View Full Project on GitHub →](https://github.com/dkisselev-zz/tuxedo-link)**

</div>
@@ -1,22 +0,0 @@
"""Agent implementations for Tuxedo Link."""

from .agent import Agent
from .petfinder_agent import PetfinderAgent
from .rescuegroups_agent import RescueGroupsAgent
from .profile_agent import ProfileAgent
from .matching_agent import MatchingAgent
from .deduplication_agent import DeduplicationAgent
from .planning_agent import PlanningAgent
from .email_agent import EmailAgent

__all__ = [
    "Agent",
    "PetfinderAgent",
    "RescueGroupsAgent",
    "ProfileAgent",
    "MatchingAgent",
    "DeduplicationAgent",
    "PlanningAgent",
    "EmailAgent",
]
@@ -1,86 +0,0 @@
"""Base Agent class for Tuxedo Link agents."""

import logging
import time
from functools import wraps
from typing import Any, Callable


class Agent:
    """
    An abstract superclass for Agents.
    Used to log messages in a way that can identify each Agent.
    """

    # Foreground colors
    RED = '\033[31m'
    GREEN = '\033[32m'
    YELLOW = '\033[33m'
    BLUE = '\033[34m'
    MAGENTA = '\033[35m'
    CYAN = '\033[36m'
    WHITE = '\033[37m'

    # Background color
    BG_BLACK = '\033[40m'

    # Reset code to return to default color
    RESET = '\033[0m'

    name: str = ""
    color: str = '\033[37m'

    def log(self, message: str) -> None:
        """
        Log this as an info message, identifying the agent.

        Args:
            message: Message to log
        """
        color_code = self.BG_BLACK + self.color
        message = f"[{self.name}] {message}"
        logging.info(color_code + message + self.RESET)

    def log_error(self, message: str) -> None:
        """
        Log an error message.

        Args:
            message: Error message to log
        """
        color_code = self.BG_BLACK + self.RED
        message = f"[{self.name}] ERROR: {message}"
        logging.error(color_code + message + self.RESET)

    def log_warning(self, message: str) -> None:
        """
        Log a warning message.

        Args:
            message: Warning message to log
        """
        color_code = self.BG_BLACK + self.YELLOW
        message = f"[{self.name}] WARNING: {message}"
        logging.warning(color_code + message + self.RESET)


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """
    Decorator to log execution time of agent methods.

    Args:
        func: Function to time

    Returns:
        Wrapped function
    """
    @wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        """Wrapper function that times and logs method execution."""
        start_time = time.time()
        result = func(self, *args, **kwargs)
        elapsed = time.time() - start_time
        self.log(f"{func.__name__} completed in {elapsed:.2f} seconds")
        return result
    return wrapper
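A quick illustration of how a concrete agent combines the base class and the decorator. The subclass below is hypothetical (not from the repository), and compacted stand-ins for `Agent` and `timed` are inlined so the snippet runs on its own:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)

# Compact stand-ins for the Agent class and @timed decorator defined above.
class Agent:
    BG_BLACK = '\033[40m'
    RESET = '\033[0m'
    name = ""
    color = '\033[37m'

    def log(self, message):
        logging.info(self.BG_BLACK + self.color + f"[{self.name}] {message}" + self.RESET)

def timed(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.time()
        result = func(self, *args, **kwargs)
        self.log(f"{func.__name__} completed in {time.time() - start:.2f} seconds")
        return result
    return wrapper

# Hypothetical subclass: set `name`/`color`, then decorate slow methods.
class DemoAgent(Agent):
    name = "Demo Agent"
    color = '\033[36m'  # CYAN

    @timed
    def work(self, n: int) -> int:
        return sum(range(n))

result = DemoAgent().work(1000)  # logs "[Demo Agent] work completed in ..."
```

Each call to `work` is logged with the agent's name, color, and elapsed time, which is how the other agents trace their pipeline steps.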
@@ -1,229 +0,0 @@
"""Deduplication agent for identifying and managing duplicate cat listings."""

import os
from typing import List, Tuple, Optional
from dotenv import load_dotenv
import numpy as np

from models.cats import Cat
from database.manager import DatabaseManager
from utils.deduplication import (
    create_fingerprint,
    calculate_text_similarity,
    calculate_composite_score
)
from utils.image_utils import generate_image_embedding, calculate_image_similarity
from .agent import Agent, timed


class DeduplicationAgent(Agent):
    """Agent for deduplicating cat listings across multiple sources."""

    name = "Deduplication Agent"
    color = Agent.YELLOW

    def __init__(self, db_manager: DatabaseManager):
        """
        Initialize the deduplication agent.

        Args:
            db_manager: Database manager instance
        """
        load_dotenv()

        self.db_manager = db_manager

        # Load thresholds from environment
        self.name_threshold = float(os.getenv('DEDUP_NAME_SIMILARITY_THRESHOLD', '0.8'))
        self.desc_threshold = float(os.getenv('DEDUP_DESCRIPTION_SIMILARITY_THRESHOLD', '0.7'))
        self.image_threshold = float(os.getenv('DEDUP_IMAGE_SIMILARITY_THRESHOLD', '0.9'))
        self.composite_threshold = float(os.getenv('DEDUP_COMPOSITE_THRESHOLD', '0.85'))

        self.log("Deduplication Agent initialized")
        self.log(f"Thresholds - Name: {self.name_threshold}, Desc: {self.desc_threshold}, "
                 f"Image: {self.image_threshold}, Composite: {self.composite_threshold}")

    def _get_image_embedding(self, cat: Cat) -> Optional[np.ndarray]:
        """
        Get or generate image embedding for a cat.

        Args:
            cat: Cat object

        Returns:
            Image embedding or None if unavailable
        """
        if not cat.primary_photo:
            return None

        try:
            embedding = generate_image_embedding(cat.primary_photo)
            return embedding
        except Exception as e:
            self.log_warning(f"Failed to generate image embedding for {cat.name}: {e}")
            return None

    def _compare_cats(self, cat1: Cat, cat2: Cat,
                      emb1: Optional[np.ndarray],
                      emb2: Optional[np.ndarray]) -> Tuple[float, dict]:
        """
        Compare two cats and return composite similarity score with details.

        Args:
            cat1: First cat
            cat2: Second cat
            emb1: Image embedding for cat1
            emb2: Image embedding for cat2

        Returns:
            Tuple of (composite_score, details_dict)
        """
        # Text similarity
        name_sim, desc_sim = calculate_text_similarity(cat1, cat2)

        # Image similarity
        image_sim = 0.0
        if emb1 is not None and emb2 is not None:
            image_sim = calculate_image_similarity(emb1, emb2)

        # Composite score
        composite = calculate_composite_score(
            name_similarity=name_sim,
            description_similarity=desc_sim,
            image_similarity=image_sim,
            name_weight=0.4,
            description_weight=0.3,
            image_weight=0.3
        )

        details = {
            'name_similarity': name_sim,
            'description_similarity': desc_sim,
            'image_similarity': image_sim,
            'composite_score': composite
        }

        return composite, details

    @timed
    def process_cat(self, cat: Cat) -> Tuple[Cat, bool]:
        """
        Process a single cat for deduplication.

        Checks if the cat is a duplicate of an existing cat in the database.
        If it's a duplicate, marks it as such and returns the canonical cat.
        If it's unique, caches it in the database.

        Args:
            cat: Cat to process

        Returns:
            Tuple of (canonical_cat, is_duplicate)
        """
        # Generate fingerprint
        cat.fingerprint = create_fingerprint(cat)

        # Check database for cats with same fingerprint
        candidates = self.db_manager.get_cats_by_fingerprint(cat.fingerprint)

        if not candidates:
            # No candidates, this is unique.
            # Generate and cache image embedding.
            embedding = self._get_image_embedding(cat)
            self.db_manager.cache_cat(cat, embedding)
            return cat, False

        self.log(f"Found {len(candidates)} potential duplicates for {cat.name}")

        # Get embedding for new cat
        new_embedding = self._get_image_embedding(cat)

        # Compare with each candidate
        best_match = None
        best_score = 0.0
        best_details = None

        for candidate_cat, candidate_embedding in candidates:
            score, details = self._compare_cats(cat, candidate_cat, new_embedding, candidate_embedding)

            self.log(f"Comparing with {candidate_cat.name} (ID: {candidate_cat.id}): "
                     f"name={details['name_similarity']:.2f}, "
                     f"desc={details['description_similarity']:.2f}, "
                     f"image={details['image_similarity']:.2f}, "
                     f"composite={score:.2f}")

            if score > best_score:
                best_score = score
                best_match = candidate_cat
                best_details = details

        # Check if best match exceeds threshold
        if best_match and best_score >= self.composite_threshold:
            self.log(f"DUPLICATE DETECTED: {cat.name} is duplicate of {best_match.name} "
                     f"(score: {best_score:.2f})")

            # Mark as duplicate in database
            self.db_manager.mark_as_duplicate(cat.id, best_match.id)

            return best_match, True

        # Not a duplicate, cache it
        self.log(f"UNIQUE: {cat.name} is not a duplicate (best score: {best_score:.2f})")
        self.db_manager.cache_cat(cat, new_embedding)

        return cat, False

    @timed
    def deduplicate_batch(self, cats: List[Cat]) -> List[Cat]:
        """
        Process a batch of cats for deduplication.

        Args:
            cats: List of cats to process

        Returns:
            List of unique cats (duplicates removed)
        """
        self.log(f"Deduplicating batch of {len(cats)} cats")

        unique_cats = []
        duplicate_count = 0

        for cat in cats:
            try:
                canonical_cat, is_duplicate = self.process_cat(cat)

                if not is_duplicate:
                    unique_cats.append(canonical_cat)
                else:
                    duplicate_count += 1
                    # Optionally include canonical if not already in list
                    if canonical_cat not in unique_cats:
                        unique_cats.append(canonical_cat)

            except Exception as e:
                self.log_error(f"Error processing cat {cat.name}: {e}")
                # Include it anyway to avoid losing data
                unique_cats.append(cat)

        self.log(f"Deduplication complete: {len(unique_cats)} unique, {duplicate_count} duplicates")

        return unique_cats

    def get_duplicate_report(self) -> dict:
        """
        Generate a report of duplicate statistics.

        Returns:
            Dictionary with duplicate statistics
        """
        stats = self.db_manager.get_cache_stats()

        total = stats['total_unique'] + stats['total_duplicates']
        return {
            'total_unique': stats['total_unique'],
            'total_duplicates': stats['total_duplicates'],
            'deduplication_rate': stats['total_duplicates'] / total if total > 0 else 0,
            'by_source': stats['by_source'],
        }
@@ -1,386 +0,0 @@
"""Email agent for sending match notifications."""

from typing import List, Optional
from datetime import datetime

from agents.agent import Agent
from agents.email_providers import get_email_provider, EmailProvider
from models.cats import CatMatch, AdoptionAlert
from utils.timing import timed
from utils.config import get_email_config


class EmailAgent(Agent):
    """Agent for sending email notifications about cat matches."""

    name = "Email Agent"
    color = '\033[35m'  # Magenta

    def __init__(self, provider: Optional[EmailProvider] = None):
        """
        Initialize the email agent.

        Args:
            provider: Optional email provider instance. If None, creates from config.
        """
        super().__init__()

        try:
            self.provider = provider or get_email_provider()
            self.enabled = True
            self.log(f"Email Agent initialized with provider: {self.provider.get_provider_name()}")
        except Exception as e:
            self.log_error(f"Failed to initialize email provider: {e}")
            self.log_warning("Email notifications disabled")
            self.enabled = False
            self.provider = None

    def _build_match_html(self, matches: List[CatMatch], alert: AdoptionAlert) -> str:
        """
        Build HTML email content for matches.

        Args:
            matches: List of cat matches
            alert: Adoption alert with user preferences

        Returns:
            HTML email content
        """
        # Header
        html = f"""
        <!DOCTYPE html>
        <html>
        <head>
        <style>
            body {{
                font-family: Arial, sans-serif;
                line-height: 1.6;
                color: #333;
                max-width: 800px;
                margin: 0 auto;
                padding: 20px;
            }}
            .header {{
                background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                color: white;
                padding: 30px;
                border-radius: 10px;
                text-align: center;
                margin-bottom: 30px;
            }}
            .header h1 {{
                margin: 0;
                font-size: 2.5em;
            }}
            .cat-card {{
                border: 1px solid #ddd;
                border-radius: 10px;
                overflow: hidden;
                margin-bottom: 20px;
                box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            }}
            .cat-photo {{
                width: 100%;
                height: 300px;
                object-fit: cover;
            }}
            .cat-details {{
                padding: 20px;
            }}
            .cat-name {{
                font-size: 1.8em;
                color: #333;
                margin: 0 0 10px 0;
            }}
            .match-score {{
                background: #4CAF50;
                color: white;
                padding: 5px 15px;
                border-radius: 20px;
                display: inline-block;
                font-weight: bold;
                margin-bottom: 10px;
            }}
            .cat-info {{
                color: #666;
                margin: 10px 0;
            }}
            .cat-description {{
                color: #888;
                line-height: 1.8;
                margin: 15px 0;
            }}
            .view-button {{
                display: inline-block;
                background: #2196F3;
                color: white;
                padding: 12px 30px;
                border-radius: 5px;
                text-decoration: none;
                font-weight: bold;
                margin-top: 10px;
            }}
            .footer {{
                text-align: center;
                color: #999;
                padding: 30px 0;
                border-top: 1px solid #eee;
                margin-top: 30px;
            }}
            .unsubscribe {{
                color: #999;
                text-decoration: none;
            }}
        </style>
        </head>
        <body>
        <div class="header">
            <h1>🎩 Tuxedo Link</h1>
            <p>We found {len(matches)} new cat{'s' if len(matches) != 1 else ''} matching your preferences!</p>
        </div>
        """

        # Cat cards
        for match in matches[:10]:  # Limit to top 10 for email
            cat = match.cat
            photo = cat.primary_photo or "https://via.placeholder.com/800x300?text=No+Photo"

            html += f"""
            <div class="cat-card">
                <img src="{photo}" alt="{cat.name}" class="cat-photo">
                <div class="cat-details">
                    <h2 class="cat-name">{cat.name}</h2>
                    <div class="match-score">{match.match_score:.0%} Match</div>
                    <div class="cat-info">
                        <strong>{cat.breed}</strong><br/>
                        📍 {cat.city}, {cat.state}<br/>
                        🎂 {cat.age} • {cat.gender.capitalize()} • {cat.size.capitalize() if cat.size else 'Size not specified'}<br/>
            """

            # Add special attributes
            attrs = []
            if cat.good_with_children:
                attrs.append("👶 Good with children")
            if cat.good_with_dogs:
                attrs.append("🐕 Good with dogs")
            if cat.good_with_cats:
                attrs.append("🐱 Good with cats")

            if attrs:
                html += "<br/>" + " • ".join(attrs)

            html += f"""
                    </div>
                    <div class="cat-description">
                        <strong>Why this is a great match:</strong><br/>
                        {match.explanation}
                    </div>
            """

            # Add description if available
            if cat.description:
                desc = cat.description[:300] + "..." if len(cat.description) > 300 else cat.description
                html += f"""
                    <div class="cat-description">
                        <strong>About {cat.name}:</strong><br/>
                        {desc}
                    </div>
                """

            html += f"""
                    <a href="{cat.url}" class="view-button">View {cat.name}'s Profile →</a>
                </div>
            </div>
            """

        # Footer
        html += f"""
        <div class="footer">
            <p>This email was sent because you saved a search on Tuxedo Link.</p>
            <p>
                <a href="http://localhost:7860" class="unsubscribe">Manage Alerts</a> |
                <a href="http://localhost:7860" class="unsubscribe">Unsubscribe</a>
            </p>
            <p>Made with ❤️ in memory of Tuxedo</p>
        </div>
        </body>
        </html>
        """

        return html

    def _build_match_text(self, matches: List[CatMatch]) -> str:
        """
        Build plain text email content for matches.

        Args:
            matches: List of cat matches

        Returns:
            Plain text email content
        """
        text = "TUXEDO LINK - New Matches Found!\n\n"
        text += f"We found {len(matches)} cat{'s' if len(matches) != 1 else ''} matching your preferences!\n\n"
        text += "=" * 60 + "\n\n"

        for i, match in enumerate(matches[:10], 1):
            cat = match.cat
            text += f"{i}. {cat.name} - {match.match_score:.0%} Match\n"
            text += f"   {cat.breed}\n"
            text += f"   {cat.city}, {cat.state}\n"
            text += f"   {cat.age} • {cat.gender} • {cat.size or 'Size not specified'}\n"
            text += f"   Match: {match.explanation}\n"
            text += f"   View: {cat.url}\n\n"

        text += "=" * 60 + "\n"
        text += "Manage your alerts: http://localhost:7860\n"
        text += "Made with love in memory of Tuxedo\n"

        return text

    @timed
    def send_match_notification(
        self,
        alert: AdoptionAlert,
        matches: List[CatMatch]
    ) -> bool:
        """
        Send email notification about new matches.

        Args:
            alert: Adoption alert with user email and preferences
            matches: List of cat matches to notify about

        Returns:
            True if email sent successfully, False otherwise
        """
        if not self.enabled:
            self.log_warning("Email agent disabled - skipping notification")
            return False

        if not matches:
            self.log("No matches to send")
            return False

        try:
            # Build email content
            subject = f"🐱 {len(matches)} New Cat Match{'es' if len(matches) != 1 else ''} on Tuxedo Link!"
            html_content = self._build_match_html(matches, alert)
            text_content = self._build_match_text(matches)

            # Send via provider
            self.log(f"Sending notification to {alert.user_email} for {len(matches)} matches")
            success = self.provider.send_email(
                to=alert.user_email,
                subject=subject,
                html=html_content,
                text=text_content
            )

            if success:
                self.log("✅ Email sent successfully")
                return True
            else:
                self.log_error("Failed to send email")
                return False

        except Exception as e:
            self.log_error(f"Error sending email: {e}")
            return False

    @timed
    def send_welcome_email(self, user_email: str, user_name: Optional[str] = None) -> bool:
        """
        Send welcome email when user creates an alert.

        Args:
            user_email: User's email address
            user_name: User's name (optional)

        Returns:
            True if sent successfully, False otherwise
        """
        if not self.enabled:
            return False

        try:
            greeting = f"Hi {user_name}" if user_name else "Hello"

            subject = "Welcome to Tuxedo Link! 🐱"

            html_content = f"""
            <!DOCTYPE html>
            <html>
            <head>
            <style>
                body {{
                    font-family: Arial, sans-serif;
                    line-height: 1.6;
                    color: #333;
                    max-width: 600px;
                    margin: 0 auto;
                    padding: 20px;
                }}
                .header {{
                    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                    color: white;
                    padding: 40px;
                    border-radius: 10px;
                    text-align: center;
                }}
                .content {{
                    padding: 30px 0;
                }}
            </style>
            </head>
            <body>
            <div class="header">
                <h1>🎩 Welcome to Tuxedo Link!</h1>
            </div>
            <div class="content">
                <p>{greeting}!</p>
                <p>Thank you for signing up for cat adoption alerts. We're excited to help you find your perfect feline companion!</p>
                <p>We'll notify you when new cats matching your preferences become available for adoption.</p>
                <p><strong>What happens next?</strong></p>
                <ul>
                    <li>We'll search across multiple adoption platforms</li>
                    <li>You'll receive email notifications based on your preferences</li>
                    <li>You can manage your alerts anytime at <a href="http://localhost:7860">Tuxedo Link</a></li>
                </ul>
                <p>Happy cat hunting! 🐾</p>
                <p style="color: #999; font-style: italic;">In loving memory of Kyra</p>
            </div>
            </body>
            </html>
            """

            text_content = f"""
{greeting}!

Thank you for signing up for Tuxedo Link cat adoption alerts!

We'll notify you when new cats matching your preferences become available.

What happens next?
- We'll search across multiple adoption platforms
- You'll receive email notifications based on your preferences
- Manage your alerts at: http://localhost:7860

Happy cat hunting!

In loving memory of Kyra
"""

            success = self.provider.send_email(
                to=user_email,
                subject=subject,
                html=html_content,
                text=text_content
            )

            return success

        except Exception as e:
            self.log_error(f"Error sending welcome email: {e}")
            return False
|
||||
@@ -1,14 +0,0 @@
"""Email provider implementations."""

from .base import EmailProvider
from .mailgun_provider import MailgunProvider
from .sendgrid_provider import SendGridProvider
from .factory import get_email_provider

__all__ = [
    "EmailProvider",
    "MailgunProvider",
    "SendGridProvider",
    "get_email_provider",
]
@@ -1,45 +0,0 @@
"""Base email provider interface."""

from abc import ABC, abstractmethod
from typing import Optional


class EmailProvider(ABC):
    """Abstract base class for email providers."""

    @abstractmethod
    def send_email(
        self,
        to: str,
        subject: str,
        html: str,
        text: str,
        from_email: Optional[str] = None,
        from_name: Optional[str] = None
    ) -> bool:
        """
        Send an email.

        Args:
            to: Recipient email address
            subject: Email subject
            html: HTML body
            text: Plain text body
            from_email: Sender email (optional, uses config default)
            from_name: Sender name (optional, uses config default)

        Returns:
            bool: True if email was sent successfully, False otherwise
        """
        pass

    @abstractmethod
    def get_provider_name(self) -> str:
        """
        Get the name of this provider.

        Returns:
            str: Provider name
        """
        pass
@@ -1,45 +0,0 @@
"""Email provider factory."""

import logging
from typing import Optional

from .base import EmailProvider
from .mailgun_provider import MailgunProvider
from .sendgrid_provider import SendGridProvider
from utils.config import get_email_provider as get_configured_provider


logger = logging.getLogger(__name__)


def get_email_provider(provider_name: Optional[str] = None) -> EmailProvider:
    """
    Get an email provider instance.

    Args:
        provider_name: Provider name ("mailgun" or "sendgrid").
            If None, uses the configuration from config.yaml.

    Returns:
        EmailProvider: Configured email provider instance

    Raises:
        ValueError: If the provider name is unknown
    """
    if not provider_name:
        provider_name = get_configured_provider()

    provider_name = provider_name.lower()

    logger.info(f"Initializing email provider: {provider_name}")

    if provider_name == 'mailgun':
        return MailgunProvider()
    elif provider_name == 'sendgrid':
        return SendGridProvider()
    else:
        raise ValueError(
            f"Unknown email provider: {provider_name}. "
            "Valid options are: mailgun, sendgrid"
        )
@@ -1,97 +0,0 @@
"""Mailgun email provider implementation."""

import os
import logging
from typing import Optional

import requests

from .base import EmailProvider
from utils.config import get_mailgun_config, get_email_config


logger = logging.getLogger(__name__)


class MailgunProvider(EmailProvider):
    """Mailgun email provider."""

    def __init__(self):
        """Initialize Mailgun provider."""
        self.api_key = os.getenv('MAILGUN_API_KEY')
        if not self.api_key:
            raise ValueError("MAILGUN_API_KEY environment variable not set")

        mailgun_config = get_mailgun_config()
        self.domain = mailgun_config['domain']
        self.base_url = f"https://api.mailgun.net/v3/{self.domain}/messages"

        email_config = get_email_config()
        self.default_from_name = email_config['from_name']
        self.default_from_email = email_config['from_email']

        logger.info(f"Mailgun provider initialized with domain: {self.domain}")

    def send_email(
        self,
        to: str,
        subject: str,
        html: str,
        text: str,
        from_email: Optional[str] = None,
        from_name: Optional[str] = None
    ) -> bool:
        """
        Send an email via Mailgun.

        Args:
            to: Recipient email address
            subject: Email subject
            html: HTML body
            text: Plain text body
            from_email: Sender email (optional, uses config default)
            from_name: Sender name (optional, uses config default)

        Returns:
            bool: True if email was sent successfully, False otherwise
        """
        from_email = from_email or self.default_from_email
        from_name = from_name or self.default_from_name
        from_header = f"{from_name} <{from_email}>"

        data = {
            "from": from_header,
            "to": to,
            "subject": subject,
            "text": text,
            "html": html
        }

        try:
            response = requests.post(
                self.base_url,
                auth=("api", self.api_key),
                data=data,
                timeout=30
            )

            if response.status_code == 200:
                logger.info(f"Email sent successfully to {to} via Mailgun")
                return True
            else:
                logger.error(
                    f"Failed to send email via Mailgun: {response.status_code} - {response.text}"
                )
                return False

        except Exception as e:
            logger.error(f"Exception sending email via Mailgun: {e}")
            return False

    def get_provider_name(self) -> str:
        """
        Get the name of this provider.

        Returns:
            str: Provider name
        """
        return "mailgun"
@@ -1,72 +0,0 @@
"""SendGrid email provider implementation (stub)."""

import os
import logging
from typing import Optional

from .base import EmailProvider
from utils.config import get_email_config


logger = logging.getLogger(__name__)


class SendGridProvider(EmailProvider):
    """SendGrid email provider (stub implementation)."""

    def __init__(self):
        """Initialize SendGrid provider."""
        self.api_key = os.getenv('SENDGRID_API_KEY')

        email_config = get_email_config()
        self.default_from_name = email_config['from_name']
        self.default_from_email = email_config['from_email']

        logger.info("SendGrid provider initialized (stub mode)")
        if not self.api_key:
            logger.warning("SENDGRID_API_KEY not set - stub will only log, not send")

    def send_email(
        self,
        to: str,
        subject: str,
        html: str,
        text: str,
        from_email: Optional[str] = None,
        from_name: Optional[str] = None
    ) -> bool:
        """
        Send an email via SendGrid (stub - only logs, doesn't actually send).

        Args:
            to: Recipient email address
            subject: Email subject
            html: HTML body
            text: Plain text body
            from_email: Sender email (optional, uses config default)
            from_name: Sender name (optional, uses config default)

        Returns:
            bool: True (always succeeds in stub mode)
        """
        from_email = from_email or self.default_from_email
        from_name = from_name or self.default_from_name

        logger.info("[STUB] Would send email via SendGrid:")
        logger.info(f"  From: {from_name} <{from_email}>")
        logger.info(f"  To: {to}")
        logger.info(f"  Subject: {subject}")
        logger.info(f"  Text length: {len(text)} chars")
        logger.info(f"  HTML length: {len(html)} chars")

        # Simulate success
        return True

    def get_provider_name(self) -> str:
        """
        Get the name of this provider.

        Returns:
            str: Provider name
        """
        return "sendgrid (stub)"
@@ -1,399 +0,0 @@
"""Matching agent for hybrid search (vector + metadata filtering)."""

import os
from typing import List, Optional

from dotenv import load_dotenv

from models.cats import Cat, CatProfile, CatMatch
from setup_vectordb import VectorDBManager
from utils.geocoding import calculate_distance
from .agent import Agent, timed


class MatchingAgent(Agent):
    """Agent for matching cats to user preferences using hybrid search."""

    name = "Matching Agent"
    color = Agent.BLUE

    def __init__(self, vector_db: VectorDBManager):
        """
        Initialize the matching agent.

        Args:
            vector_db: Vector database manager
        """
        load_dotenv()

        self.vector_db = vector_db

        # Load configuration
        self.vector_top_n = int(os.getenv('VECTOR_TOP_N', '50'))
        self.final_limit = int(os.getenv('FINAL_RESULTS_LIMIT', '20'))
        self.semantic_weight = float(os.getenv('SEMANTIC_WEIGHT', '0.6'))
        self.attribute_weight = float(os.getenv('ATTRIBUTE_WEIGHT', '0.4'))

        self.log("Matching Agent initialized")
        self.log(f"Config - Vector Top N: {self.vector_top_n}, Final Limit: {self.final_limit}")
        self.log(f"Weights - Semantic: {self.semantic_weight}, Attribute: {self.attribute_weight}")

    def _apply_metadata_filters(self, profile: CatProfile) -> Optional[dict]:
        """
        Build ChromaDB where clause from profile hard constraints.

        Args:
            profile: User's cat profile

        Returns:
            Dictionary of metadata filters, or None if no filters apply
        """
        filters = []

        # Age filter
        if profile.age_range:
            age_conditions = [{"age": age} for age in profile.age_range]
            if len(age_conditions) > 1:
                filters.append({"$or": age_conditions})
            else:
                filters.extend(age_conditions)

        # Size filter
        if profile.size:
            size_conditions = [{"size": size} for size in profile.size]
            if len(size_conditions) > 1:
                filters.append({"$or": size_conditions})
            else:
                filters.extend(size_conditions)

        # Gender filter
        if profile.gender_preference:
            filters.append({"gender": profile.gender_preference})

        # Behavioral filters
        if profile.good_with_children is not None:
            # Filter for cats that are explicitly good with children or unknown
            if profile.good_with_children:
                filters.append({
                    "$or": [
                        {"good_with_children": "True"},
                        {"good_with_children": "unknown"}
                    ]
                })

        if profile.good_with_dogs is not None:
            if profile.good_with_dogs:
                filters.append({
                    "$or": [
                        {"good_with_dogs": "True"},
                        {"good_with_dogs": "unknown"}
                    ]
                })

        if profile.good_with_cats is not None:
            if profile.good_with_cats:
                filters.append({
                    "$or": [
                        {"good_with_cats": "True"},
                        {"good_with_cats": "unknown"}
                    ]
                })

        # Special needs filter
        if not profile.special_needs_ok:
            filters.append({"special_needs": "False"})

        # Combine filters with AND logic
        if len(filters) == 0:
            return None
        elif len(filters) == 1:
            return filters[0]
        else:
            return {"$and": filters}

    def _calculate_attribute_match_score(self, cat: Cat, profile: CatProfile) -> tuple[float, List[str], List[str]]:
        """
        Calculate how well a cat's attributes match profile preferences.

        Args:
            cat: Cat to evaluate
            profile: User profile

        Returns:
            Tuple of (score, matching_attributes, missing_attributes)
        """
        matching_attrs = []
        missing_attrs = []
        total_checks = 0
        matches = 0

        # Age preference
        if profile.age_range:
            total_checks += 1
            if cat.age in profile.age_range:
                matches += 1
                matching_attrs.append(f"Age: {cat.age}")
            else:
                missing_attrs.append(f"Preferred age: {', '.join(profile.age_range)}")

        # Size preference
        if profile.size:
            total_checks += 1
            if cat.size in profile.size:
                matches += 1
                matching_attrs.append(f"Size: {cat.size}")
            else:
                missing_attrs.append(f"Preferred size: {', '.join(profile.size)}")

        # Gender preference
        if profile.gender_preference:
            total_checks += 1
            if cat.gender == profile.gender_preference:
                matches += 1
                matching_attrs.append(f"Gender: {cat.gender}")
            else:
                missing_attrs.append(f"Preferred gender: {profile.gender_preference}")

        # Good with children
        if profile.good_with_children:
            total_checks += 1
            if cat.good_with_children:
                matches += 1
                matching_attrs.append("Good with children")
            elif cat.good_with_children is False:
                missing_attrs.append("Not good with children")

        # Good with dogs
        if profile.good_with_dogs:
            total_checks += 1
            if cat.good_with_dogs:
                matches += 1
                matching_attrs.append("Good with dogs")
            elif cat.good_with_dogs is False:
                missing_attrs.append("Not good with dogs")

        # Good with cats
        if profile.good_with_cats:
            total_checks += 1
            if cat.good_with_cats:
                matches += 1
                matching_attrs.append("Good with other cats")
            elif cat.good_with_cats is False:
                missing_attrs.append("Not good with other cats")

        # Special needs
        if not profile.special_needs_ok and cat.special_needs:
            total_checks += 1
            missing_attrs.append("Has special needs")

        # Breed preference
        if profile.preferred_breeds:
            total_checks += 1
            if cat.breed.lower() in [b.lower() for b in profile.preferred_breeds]:
                matches += 1
                matching_attrs.append(f"Breed: {cat.breed}")
            else:
                missing_attrs.append(f"Preferred breeds: {', '.join(profile.preferred_breeds)}")

        # Calculate score
        if total_checks == 0:
            return 0.5, matching_attrs, missing_attrs  # Neutral if no preferences

        score = matches / total_checks
        return score, matching_attrs, missing_attrs

    def _filter_by_distance(self, cats_data: dict, profile: CatProfile) -> List[tuple[Cat, float, dict]]:
        """
        Filter cats by distance and prepare for ranking.

        Args:
            cats_data: Results from vector search
            profile: User profile

        Returns:
            List of (cat, vector_similarity, metadata) tuples
        """
        results = []

        ids = cats_data['ids'][0]
        distances = cats_data['distances'][0]
        metadatas = cats_data['metadatas'][0]

        for i, cat_id in enumerate(ids):
            metadata = metadatas[i]

            # Convert distance to similarity (ChromaDB returns L2 distance)
            # Lower distance = higher similarity
            vector_similarity = 1.0 / (1.0 + distances[i])

            # Check distance constraint
            if profile.user_latitude and profile.user_longitude:
                cat_lat = metadata.get('latitude')
                cat_lon = metadata.get('longitude')

                if cat_lat and cat_lon and cat_lat != '' and cat_lon != '':
                    try:
                        cat_lat = float(cat_lat)
                        cat_lon = float(cat_lon)
                        distance = calculate_distance(
                            profile.user_latitude,
                            profile.user_longitude,
                            cat_lat,
                            cat_lon
                        )

                        max_dist = profile.max_distance or 100
                        if distance > max_dist:
                            self.log(f"DEBUG: Filtered out {metadata['name']} - {distance:.1f} miles away (max: {max_dist})")
                            continue  # Skip this cat, too far away
                    except (ValueError, TypeError):
                        pass  # Keep cat if coordinates invalid

            # Reconstruct Cat from metadata
            cat = Cat(
                id=metadata['id'],
                name=metadata['name'],
                age=metadata['age'],
                size=metadata['size'],
                gender=metadata['gender'],
                breed=metadata['breed'],
                city=metadata.get('city', ''),
                state=metadata.get('state', ''),
                zip_code=metadata.get('zip_code', ''),
                latitude=float(metadata['latitude']) if metadata.get('latitude') and metadata['latitude'] != '' else None,
                longitude=float(metadata['longitude']) if metadata.get('longitude') and metadata['longitude'] != '' else None,
                organization_name=metadata['organization'],
                source=metadata['source'],
                url=metadata['url'],
                primary_photo=metadata.get('primary_photo', ''),
                description='',  # Not stored in metadata
                good_with_children=metadata.get('good_with_children') == 'True' if metadata.get('good_with_children') != 'unknown' else None,
                good_with_dogs=metadata.get('good_with_dogs') == 'True' if metadata.get('good_with_dogs') != 'unknown' else None,
                good_with_cats=metadata.get('good_with_cats') == 'True' if metadata.get('good_with_cats') != 'unknown' else None,
                special_needs=metadata.get('special_needs') == 'True',
            )

            results.append((cat, vector_similarity, metadata))

        return results

    def _create_explanation(self, cat: Cat, match_score: float, vector_sim: float, attr_score: float, matching_attrs: List[str]) -> str:
        """
        Create a human-readable explanation of the match.

        Args:
            cat: Matched cat
            match_score: Overall match score
            vector_sim: Vector similarity score
            attr_score: Attribute match score
            matching_attrs: List of matching attributes

        Returns:
            Explanation string
        """
        explanation_parts = []

        # Overall match quality
        if match_score >= 0.8:
            explanation_parts.append(f"{cat.name} is an excellent match!")
        elif match_score >= 0.6:
            explanation_parts.append(f"{cat.name} is a good match.")
        else:
            explanation_parts.append(f"{cat.name} might be a match.")

        # Personality match
        if vector_sim >= 0.7:
            explanation_parts.append("Personality description strongly matches your preferences.")
        elif vector_sim >= 0.5:
            explanation_parts.append("Personality description aligns with your preferences.")

        # Matching attributes
        if matching_attrs:
            top_matches = matching_attrs[:3]  # Show top 3
            explanation_parts.append("Matches: " + ", ".join(top_matches))

        return " ".join(explanation_parts)

    @timed
    def match(self, profile: CatProfile) -> List[CatMatch]:
        """
        Find cats that match the user's profile using hybrid search.

        Strategy:
        1. Vector search for semantic similarity (top N)
        2. Filter by hard constraints (metadata)
        3. Rank by weighted combination of semantic + attribute scores
        4. Return top matches with explanations

        Args:
            profile: User's cat profile

        Returns:
            List of CatMatch objects, sorted by match score
        """
        self.log(f"Starting hybrid search with profile: {profile.personality_description[:50]}...")

        # Step 1: Vector search
        query = profile.personality_description or "friendly, loving cat"
        where_clause = self._apply_metadata_filters(profile)

        self.log(f"Vector search for top {self.vector_top_n} semantic matches")
        if where_clause:
            self.log(f"Applying metadata filters: {where_clause}")

        results = self.vector_db.search(
            query=query,
            n_results=self.vector_top_n,
            where=where_clause
        )

        if not results['ids'][0]:
            self.log("No results found matching criteria")
            return []

        self.log(f"Vector search returned {len(results['ids'][0])} candidates")

        # Step 2: Filter by distance (if applicable)
        candidates = self._filter_by_distance(results, profile)

        # Step 3: Calculate attribute scores and rank
        self.log("Calculating attribute match scores and ranking")
        matches = []

        for cat, vector_similarity, metadata in candidates:
            # Calculate attribute match score
            attr_score, matching_attrs, missing_attrs = self._calculate_attribute_match_score(cat, profile)

            # Calculate weighted final score
            final_score = (
                self.semantic_weight * vector_similarity +
                self.attribute_weight * attr_score
            )

            # Create explanation
            explanation = self._create_explanation(cat, final_score, vector_similarity, attr_score, matching_attrs)

            # Create match object
            match = CatMatch(
                cat=cat,
                match_score=final_score,
                vector_similarity=vector_similarity,
                attribute_match_score=attr_score,
                explanation=explanation,
                matching_attributes=matching_attrs,
                missing_attributes=missing_attrs
            )

            matches.append(match)

        # Sort by match score
        matches.sort(key=lambda m: m.match_score, reverse=True)

        # Return top matches
        top_matches = matches[:self.final_limit]

        self.log(f"Returning top {len(top_matches)} matches")
        if top_matches:
            self.log(f"Best match: {top_matches[0].cat.name} (score: {top_matches[0].match_score:.2f})")

        return top_matches
@@ -1,459 +0,0 @@
|
||||
"""Petfinder API agent for fetching cat adoption listings."""
|
||||
|
||||
import os
|
||||
import time
|
||||
import requests
|
||||
from datetime import datetime, timedelta
|
||||
from typing import List, Optional, Dict, Any
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from models.cats import Cat
|
||||
from .agent import Agent, timed
|
||||
|
||||
|
||||
class PetfinderAgent(Agent):
|
||||
"""Agent for interacting with Petfinder API v2."""
|
||||
|
||||
name = "Petfinder Agent"
|
||||
color = Agent.CYAN
|
||||
|
||||
BASE_URL = "https://api.petfinder.com/v2"
|
||||
TOKEN_URL = f"{BASE_URL}/oauth2/token"
|
||||
ANIMALS_URL = f"{BASE_URL}/animals"
|
||||
TYPES_URL = f"{BASE_URL}/types"
|
||||
|
||||
# Rate limiting
|
||||
MAX_REQUESTS_PER_SECOND = 1
|
||||
MAX_RESULTS_PER_PAGE = 100
|
||||
MAX_TOTAL_RESULTS = 1000
|
||||
|
||||
# Cache for valid colors and breeds (populated on first use)
|
||||
_valid_colors_cache: Optional[List[str]] = None
|
||||
_valid_breeds_cache: Optional[List[str]] = None
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the Petfinder agent with API credentials."""
|
||||
load_dotenv()
|
||||
|
||||
self.api_key = os.getenv('PETFINDER_API_KEY')
|
||||
self.api_secret = os.getenv('PETFINDER_SECRET')
|
||||
|
||||
if not self.api_key or not self.api_secret:
|
||||
raise ValueError("PETFINDER_API_KEY and PETFINDER_SECRET must be set in environment")
|
||||
|
||||
self.access_token: Optional[str] = None
|
||||
self.token_expires_at: Optional[datetime] = None
|
||||
self.last_request_time: float = 0
|
||||
|
||||
self.log("Petfinder Agent initialized")
|
||||
|
||||
def get_valid_colors(self) -> List[str]:
|
||||
"""
|
||||
Fetch valid colors for cats from Petfinder API.
|
||||
|
||||
Returns:
|
||||
List of valid color strings accepted by the API
|
||||
"""
|
||||
# Use class-level cache
|
||||
if PetfinderAgent._valid_colors_cache is not None:
|
||||
return PetfinderAgent._valid_colors_cache
|
||||
|
||||
try:
|
||||
self.log("Fetching valid cat colors from Petfinder API...")
|
||||
url = f"{self.TYPES_URL}/cat"
|
||||
token = self._get_access_token()
|
||||
headers = {'Authorization': f'Bearer {token}'}
|
||||
|
||||
response = requests.get(url, headers=headers, timeout=10)
|
||||
response.raise_for_status()
|
||||
|
||||
data = response.json()
|
||||
colors = data.get('type', {}).get('colors', [])
|
||||
|
||||
# Cache the results
|
||||
PetfinderAgent._valid_colors_cache = colors
|
||||
|
||||
self.log(f"✓ Fetched {len(colors)} valid colors from Petfinder")
|
||||
self.log(f"Valid colors: {', '.join(colors[:10])}...")
|
||||
|
||||
return colors
|
||||
except Exception as e:
|
||||
self.log_error(f"Failed to fetch valid colors: {e}")
|
||||
# Return common colors as fallback
|
||||
fallback = ["Black", "White", "Orange", "Gray", "Brown", "Cream", "Tabby"]
|
||||
self.log(f"Using fallback colors: {fallback}")
|
||||
return fallback
|
||||
|
||||
def get_valid_breeds(self) -> List[str]:
|
||||
"""
|
||||
Fetch valid cat breeds from Petfinder API.
|
||||
|
||||
Returns:
|
||||
List of valid breed strings accepted by the API
|
||||
"""
|
||||
# Use class-level cache
|
||||
if PetfinderAgent._valid_breeds_cache is not None:
|
||||
return PetfinderAgent._valid_breeds_cache
|
||||
|
||||
try:
|
||||
self.log("Fetching valid cat breeds from Petfinder API...")
|
||||
url = f"{self.TYPES_URL}/cat/breeds"
|
||||
token = self._get_access_token()
|
||||
headers = {'Authorization': f'Bearer {token}'}
|
||||
|
||||
response = requests.get(url, headers=headers, timeout=10)
|
||||
response.raise_for_status()
|
||||
|
||||
data = response.json()
|
||||
breeds = [breed['name'] for breed in data.get('breeds', [])]
|
||||
|
||||
# Cache the results
|
||||
PetfinderAgent._valid_breeds_cache = breeds
|
||||
|
||||
self.log(f"✓ Fetched {len(breeds)} valid breeds from Petfinder")
|
||||
|
||||
return breeds
|
||||
except Exception as e:
|
||||
self.log_error(f"Failed to fetch valid breeds: {e}")
|
||||
# Return common breeds as fallback
|
||||
fallback = ["Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair", "Siamese", "Persian", "Maine Coon"]
|
||||
self.log(f"Using fallback breeds: {fallback}")
|
||||
return fallback
|
||||
|
||||
def _rate_limit(self) -> None:
|
||||
"""Implement rate limiting to respect API limits."""
|
||||
elapsed = time.time() - self.last_request_time
|
||||
min_interval = 1.0 / self.MAX_REQUESTS_PER_SECOND
|
||||
|
||||
if elapsed < min_interval:
|
||||
time.sleep(min_interval - elapsed)
|
||||
|
||||
self.last_request_time = time.time()
|
||||
|
||||
def _get_access_token(self) -> str:
|
||||
"""
|
||||
Get or refresh the OAuth access token.
|
||||
|
||||
Returns:
|
||||
Access token string
|
||||
"""
|
||||
# Check if we have a valid token
|
||||
if self.access_token and self.token_expires_at:
|
||||
if datetime.now() < self.token_expires_at:
|
||||
return self.access_token
|
||||
|
||||
# Request new token
|
||||
self.log("Requesting new access token from Petfinder")
|
||||
|
||||
data = {
|
||||
'grant_type': 'client_credentials',
|
||||
'client_id': self.api_key,
|
||||
'client_secret': self.api_secret
|
||||
}
|
||||
|
||||
try:
|
||||
response = requests.post(self.TOKEN_URL, data=data, timeout=10)
|
||||
response.raise_for_status()
|
||||
|
||||
token_data = response.json()
|
||||
self.access_token = token_data['access_token']
|
||||
|
||||
# Set expiration (subtract 60 seconds for safety)
|
||||
expires_in = token_data.get('expires_in', 3600)
|
||||
self.token_expires_at = datetime.now() + timedelta(seconds=expires_in - 60)
|
||||
|
||||
self.log(f"Access token obtained, expires at {self.token_expires_at}")
|
||||
return self.access_token
|
||||
|
||||
except Exception as e:
|
||||
self.log_error(f"Failed to get access token: {e}")
|
||||
raise
|
||||
|
||||
def _make_request(self, url: str, params: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Make an authenticated request to Petfinder API with rate limiting.
|
||||
|
||||
Args:
|
||||
url: API endpoint URL
|
||||
params: Query parameters
|
||||
|
||||
Returns:
|
||||
JSON response data
|
||||
"""
|
||||
self._rate_limit()
|
||||
|
||||
token = self._get_access_token()
|
||||
headers = {
|
||||
'Authorization': f'Bearer {token}'
|
||||
}
|
||||
|
||||
try:
|
||||
response = requests.get(url, headers=headers, params=params, timeout=10)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
except requests.exceptions.HTTPError as e:
|
||||
if e.response.status_code == 401:
|
||||
# Token might be invalid, clear it and retry once
|
||||
self.log_warning("Token invalid, refreshing and retrying")
|
||||
self.access_token = None
|
||||
token = self._get_access_token()
|
||||
headers['Authorization'] = f'Bearer {token}'
|
||||
|
||||
response = requests.get(url, headers=headers, params=params, timeout=10)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
else:
|
||||
raise
|
||||
|
||||
    def _parse_cat(self, animal_data: Dict[str, Any]) -> Cat:
        """
        Parse Petfinder API animal data into Cat model.

        Args:
            animal_data: Animal data from Petfinder API

        Returns:
            Cat object
        """
        # Basic info
        cat_id = f"petfinder_{animal_data['id']}"
        name = animal_data.get('name', 'Unknown')

        # Breed info
        breeds = animal_data.get('breeds', {})
        primary_breed = breeds.get('primary', 'Unknown')
        secondary_breed = breeds.get('secondary')
        secondary_breeds = [secondary_breed] if secondary_breed else []

        # Age mapping
        age_map = {
            'Baby': 'kitten',
            'Young': 'young',
            'Adult': 'adult',
            'Senior': 'senior'
        }
        age = age_map.get(animal_data.get('age', 'Unknown'), 'unknown')

        # Size mapping
        size_map = {
            'Small': 'small',
            'Medium': 'medium',
            'Large': 'large'
        }
        size = size_map.get(animal_data.get('size', 'Unknown'), 'unknown')

        # Gender mapping
        gender_map = {
            'Male': 'male',
            'Female': 'female',
            'Unknown': 'unknown'
        }
        gender = gender_map.get(animal_data.get('gender', 'Unknown'), 'unknown')

        # Description
        description = animal_data.get('description', '')
        if not description:
            description = f"{name} is a {age} {primary_breed} looking for a home."

        # Location info
        contact = animal_data.get('contact', {})
        address = contact.get('address', {})

        organization_id = animal_data.get('organization_id')
        city = address.get('city')
        state = address.get('state')
        zip_code = address.get('postcode')

        # Attributes
        attributes = animal_data.get('attributes', {})
        environment = animal_data.get('environment', {})

        # Photos - prefer the largest size available; use .get() so a missing
        # size key doesn't raise, and drop entries with no usable URL
        photos_data = animal_data.get('photos', [])
        photos = [
            p.get('large') or p.get('medium') or p.get('small')
            for p in photos_data if p
        ]
        photos = [p for p in photos if p]
        primary_photo = photos[0] if photos else None

        # Videos
        videos_data = animal_data.get('videos', [])
        videos = [v.get('embed') for v in videos_data if v.get('embed')]

        # Contact info
        contact_email = contact.get('email')
        contact_phone = contact.get('phone')

        # Colors
        colors_data = animal_data.get('colors', {})
        colors = [c for c in [colors_data.get('primary'), colors_data.get('secondary'), colors_data.get('tertiary')] if c]

        # Coat length
        coat = animal_data.get('coat')
        coat_map = {
            'Short': 'short',
            'Medium': 'medium',
            'Long': 'long'
        }
        coat_length = coat_map.get(coat) if coat else None

        # URL
        url = animal_data.get('url', f"https://www.petfinder.com/cat/{animal_data['id']}")

        return Cat(
            id=cat_id,
            name=name,
            breed=primary_breed,
            breeds_secondary=secondary_breeds,
            age=age,
            size=size,
            gender=gender,
            description=description,
            # Petfinder's animal payload only carries the organization ID;
            # resolving the display name would need a separate /organizations call
            organization_name=animal_data.get('organization_id', 'Unknown Organization'),
            organization_id=organization_id,
            city=city,
            state=state,
            zip_code=zip_code,
            country='US',
            distance=animal_data.get('distance'),
            good_with_children=environment.get('children'),
            good_with_dogs=environment.get('dogs'),
            good_with_cats=environment.get('cats'),
            special_needs=attributes.get('special_needs', False),
            photos=photos,
            primary_photo=primary_photo,
            videos=videos,
            source='petfinder',
            url=url,
            contact_email=contact_email,
            contact_phone=contact_phone,
            declawed=attributes.get('declawed'),
            spayed_neutered=attributes.get('spayed_neutered'),
            house_trained=attributes.get('house_trained'),
            coat_length=coat_length,
            colors=colors,
            fetched_at=datetime.now()
        )

    @timed
    def search_cats(
        self,
        location: Optional[str] = None,
        distance: int = 100,
        age: Optional[List[str]] = None,
        size: Optional[List[str]] = None,
        gender: Optional[str] = None,
        color: Optional[List[str]] = None,
        breed: Optional[List[str]] = None,
        good_with_children: Optional[bool] = None,
        good_with_dogs: Optional[bool] = None,
        good_with_cats: Optional[bool] = None,
        limit: int = 100
    ) -> List[Cat]:
        """
        Search for cats on Petfinder.

        Args:
            location: ZIP code or "city, state" (e.g., "10001" or "New York, NY")
            distance: Search radius in miles (default: 100)
            age: List of age categories: kitten, young, adult, senior
            size: List of sizes: small, medium, large
            gender: Gender filter: male, female
            color: List of colors (e.g., ["black", "white", "tuxedo"])
            breed: List of breed names (e.g., ["Siamese", "Maine Coon"])
            good_with_children: Filter for cats good with children
            good_with_dogs: Filter for cats good with dogs
            good_with_cats: Filter for cats good with other cats
            limit: Maximum number of results (default: 100, max: 1000)

        Returns:
            List of Cat objects
        """
        color_str = f" with colors {color}" if color else ""
        self.log(f"Searching for cats near {location} within {distance} miles{color_str}")

        # Build query parameters
        params: Dict[str, Any] = {
            'type': 'cat',
            'limit': min(self.MAX_RESULTS_PER_PAGE, limit),
            'sort': 'recent'
        }

        self.log(f"DEBUG: Initial params: {params}")

        if location:
            params['location'] = location
            params['distance'] = distance

        if age:
            # Map our age categories to Petfinder's
            age_map = {
                'kitten': 'baby',
                'young': 'young',
                'adult': 'adult',
                'senior': 'senior'
            }
            petfinder_ages = [age_map.get(a, a) for a in age]
            params['age'] = ','.join(petfinder_ages)

        if size:
            params['size'] = ','.join(size)

        if gender:
            params['gender'] = gender

        if color:
            params['color'] = ','.join(color)

        if breed:
            params['breed'] = ','.join(breed)

        if good_with_children is not None:
            params['good_with_children'] = str(good_with_children).lower()

        if good_with_dogs is not None:
            params['good_with_dogs'] = str(good_with_dogs).lower()

        if good_with_cats is not None:
            params['good_with_cats'] = str(good_with_cats).lower()

        self.log("DEBUG: ====== PETFINDER API CALL ======")
        self.log(f"DEBUG: Final API params: {params}")
        self.log("DEBUG: ================================")

        # Fetch results with pagination
        cats = []
        page = 1
        total_pages = 1

        while page <= total_pages and len(cats) < min(limit, self.MAX_TOTAL_RESULTS):
            params['page'] = page

            try:
                data = self._make_request(self.ANIMALS_URL, params)

                self.log(f"DEBUG: API Response - Total results: {data.get('pagination', {}).get('total_count', 'unknown')}")
                self.log(f"DEBUG: API Response - Animals in this page: {len(data.get('animals', []))}")

                # Parse animals
                animals = data.get('animals', [])
                for animal_data in animals:
                    try:
                        cat = self._parse_cat(animal_data)
                        cats.append(cat)
                    except Exception as e:
                        self.log_warning(f"Failed to parse cat {animal_data.get('id')}: {e}")

                # Check pagination
                pagination = data.get('pagination', {})
                total_pages = pagination.get('total_pages', 1)

                self.log(f"Fetched page {page}/{total_pages}, {len(animals)} cats")

                page += 1

            except Exception as e:
                self.log_error(f"Failed to fetch page {page}: {e}")
                break

        self.log(f"Search complete: found {len(cats)} cats")
        return cats[:limit]  # Ensure we don't exceed limit

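The query assembly in `search_cats` can be sketched in isolation. The snippet below is a simplified, standalone mirror of how the method maps internal age categories onto Petfinder's values and flattens list filters into comma-separated query strings; `build_params` is a hypothetical helper for illustration, not part of the codebase, and it covers only a subset of the real parameters.

```python
from typing import Any, Dict, List, Optional

def build_params(age: Optional[List[str]] = None,
                 color: Optional[List[str]] = None,
                 good_with_children: Optional[bool] = None,
                 limit: int = 100) -> Dict[str, Any]:
    """Simplified sketch of the parameter assembly in search_cats."""
    # Internal age categories -> Petfinder's vocabulary ('kitten' becomes 'baby')
    age_map = {'kitten': 'baby', 'young': 'young', 'adult': 'adult', 'senior': 'senior'}
    params: Dict[str, Any] = {'type': 'cat', 'limit': min(100, limit), 'sort': 'recent'}
    if age:
        params['age'] = ','.join(age_map.get(a, a) for a in age)
    if color:
        params['color'] = ','.join(color)
    if good_with_children is not None:
        # Petfinder expects lowercase string booleans in the query string
        params['good_with_children'] = str(good_with_children).lower()
    return params
```

Keeping the mapping in one place like this makes it easy to unit-test the query shape without hitting the API.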
@@ -1,365 +0,0 @@

"""Planning agent for orchestrating the cat adoption search pipeline."""

from typing import List, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed

from models.cats import Cat, CatProfile, CatMatch, SearchResult
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager
from setup_metadata_vectordb import MetadataVectorDB
from .agent import Agent, timed
from .petfinder_agent import PetfinderAgent
from .rescuegroups_agent import RescueGroupsAgent
from .deduplication_agent import DeduplicationAgent
from .matching_agent import MatchingAgent


class PlanningAgent(Agent):
    """Agent for orchestrating the complete cat adoption search pipeline."""

    name = "Planning Agent"
    color = Agent.WHITE

    def __init__(
        self,
        db_manager: DatabaseManager,
        vector_db: VectorDBManager,
        metadata_vectordb: Optional[MetadataVectorDB] = None
    ):
        """
        Initialize the planning agent.

        Args:
            db_manager: Database manager instance
            vector_db: Vector database manager instance
            metadata_vectordb: Optional metadata vector DB for color/breed fuzzy matching
        """
        self.log("Planning Agent initializing...")

        # Initialize all agents
        self.petfinder = PetfinderAgent()
        self.rescuegroups = RescueGroupsAgent()
        self.deduplication = DeduplicationAgent(db_manager)
        self.matching = MatchingAgent(vector_db)

        self.db_manager = db_manager
        self.vector_db = vector_db
        self.metadata_vectordb = metadata_vectordb

        self.log("Planning Agent ready")

    def _search_petfinder(self, profile: CatProfile) -> List[Cat]:
        """
        Search Petfinder with the given profile.

        Args:
            profile: User's cat profile

        Returns:
            List of cats from Petfinder
        """
        try:
            # Normalize colors to valid Petfinder API values (3-tier: dict + vector + fallback)
            api_colors = None
            if profile.color_preferences:
                from utils.color_mapping import normalize_user_colors
                valid_colors = self.petfinder.get_valid_colors()
                api_colors = normalize_user_colors(
                    profile.color_preferences,
                    valid_colors,
                    vectordb=self.metadata_vectordb,
                    source="petfinder"
                )

                if api_colors:
                    self.log(f"✓ Colors: {profile.color_preferences} → {api_colors}")
                else:
                    self.log(f"⚠️ Could not map colors {profile.color_preferences}")

            # Normalize breeds to valid Petfinder API values (3-tier: dict + vector + fallback)
            api_breeds = None
            if profile.preferred_breeds:
                from utils.breed_mapping import normalize_user_breeds
                valid_breeds = self.petfinder.get_valid_breeds()
                api_breeds = normalize_user_breeds(
                    profile.preferred_breeds,
                    valid_breeds,
                    vectordb=self.metadata_vectordb,
                    source="petfinder"
                )

                if api_breeds:
                    self.log(f"✓ Breeds: {profile.preferred_breeds} → {api_breeds}")
                else:
                    self.log(f"⚠️ Could not map breeds {profile.preferred_breeds}")

            return self.petfinder.search_cats(
                location=profile.user_location,
                distance=profile.max_distance or 100,
                age=profile.age_range,
                size=profile.size,
                gender=profile.gender_preference,
                color=api_colors,
                breed=api_breeds,
                good_with_children=profile.good_with_children,
                good_with_dogs=profile.good_with_dogs,
                good_with_cats=profile.good_with_cats,
                limit=100
            )
        except Exception as e:
            self.log_error(f"Petfinder search failed: {e}")
            return []

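The "3-tier" normalization referenced above (dictionary, vector DB, fuzzy fallback) can be sketched without the vector tier. The snippet below is a hypothetical, simplified `normalize_color` showing only tier one (an alias dictionary plus case-insensitive exact match) and tier three (fuzzy matching for typos like "tuxado"); the real `normalize_user_colors` also consults `metadata_vectordb` as tier two, and the alias table here is illustrative only.

```python
from difflib import get_close_matches
from typing import List, Optional

def normalize_color(user_color: str, valid_colors: List[str]) -> Optional[str]:
    """Map a free-text user color onto one of the API's valid values, or None."""
    aliases = {'tuxado': 'Tuxedo', 'grey': 'Gray'}  # illustrative aliases only
    candidate = aliases.get(user_color.lower(), user_color)
    # Tier 1: case-insensitive exact match against the API's valid values
    for v in valid_colors:
        if v.lower() == candidate.lower():
            return v
    # Tier 3: fuzzy fallback catches remaining typos
    close = get_close_matches(candidate, valid_colors, n=1, cutoff=0.6)
    return close[0] if close else None
```

Returning `None` rather than guessing lets the caller log the "Could not map" warning and drop the filter, which matches how the planning agent degrades gracefully.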
    def _search_rescuegroups(self, profile: CatProfile) -> List[Cat]:
        """
        Search RescueGroups with the given profile.

        Args:
            profile: User's cat profile

        Returns:
            List of cats from RescueGroups
        """
        try:
            # Normalize colors to valid RescueGroups API values (3-tier: dict + vector + fallback)
            api_colors = None
            if profile.color_preferences:
                from utils.color_mapping import normalize_user_colors
                valid_colors = self.rescuegroups.get_valid_colors()
                api_colors = normalize_user_colors(
                    profile.color_preferences,
                    valid_colors,
                    vectordb=self.metadata_vectordb,
                    source="rescuegroups"
                )

                if api_colors:
                    self.log(f"✓ Colors: {profile.color_preferences} → {api_colors}")
                else:
                    self.log(f"⚠️ Could not map colors {profile.color_preferences}")

            # Normalize breeds to valid RescueGroups API values (3-tier: dict + vector + fallback)
            api_breeds = None
            if profile.preferred_breeds:
                from utils.breed_mapping import normalize_user_breeds
                valid_breeds = self.rescuegroups.get_valid_breeds()
                api_breeds = normalize_user_breeds(
                    profile.preferred_breeds,
                    valid_breeds,
                    vectordb=self.metadata_vectordb,
                    source="rescuegroups"
                )

                if api_breeds:
                    self.log(f"✓ Breeds: {profile.preferred_breeds} → {api_breeds}")
                else:
                    self.log(f"⚠️ Could not map breeds {profile.preferred_breeds}")

            return self.rescuegroups.search_cats(
                location=profile.user_location,
                distance=profile.max_distance or 100,
                age=profile.age_range,
                size=profile.size,
                gender=profile.gender_preference,
                color=api_colors,
                breed=api_breeds,
                good_with_children=profile.good_with_children,
                good_with_dogs=profile.good_with_dogs,
                good_with_cats=profile.good_with_cats,
                limit=100
            )
        except Exception as e:
            self.log_error(f"RescueGroups search failed: {e}")
            return []

    @timed
    def fetch_cats(self, profile: CatProfile) -> tuple[List[Cat], List[str]]:
        """
        Fetch cats from all sources in parallel.

        Args:
            profile: User's cat profile

        Returns:
            Tuple of (combined list of cats from all sources, names of sources queried)
        """
        self.log("Fetching cats from all sources in parallel...")
        self.log(f"DEBUG: Profile location={profile.user_location}, distance={profile.max_distance}, colors={profile.color_preferences}, age={profile.age_range}")

        all_cats = []
        sources_queried = []

        # Execute searches in parallel
        with ThreadPoolExecutor(max_workers=2) as executor:
            futures = {
                executor.submit(self._search_petfinder, profile): 'petfinder',
                executor.submit(self._search_rescuegroups, profile): 'rescuegroups'
            }

            for future in as_completed(futures):
                source = futures[future]
                try:
                    cats = future.result()
                    all_cats.extend(cats)
                    sources_queried.append(source)
                    self.log(f"DEBUG: ✓ Received {len(cats)} cats from {source}")
                except Exception as e:
                    self.log_error(f"Failed to fetch from {source}: {e}")

        self.log(f"DEBUG: Total cats fetched: {len(all_cats)} from {len(sources_queried)} sources")
        return all_cats, sources_queried

    @timed
    def deduplicate_and_cache(self, cats: List[Cat]) -> List[Cat]:
        """
        Deduplicate cats and cache them in the database.

        Args:
            cats: List of cats to process

        Returns:
            List of unique cats
        """
        self.log(f"Deduplicating {len(cats)} cats...")

        unique_cats = self.deduplication.deduplicate_batch(cats)

        self.log(f"Deduplication complete: {len(unique_cats)} unique cats")
        return unique_cats

    @timed
    def update_vector_db(self, cats: List[Cat]) -> None:
        """
        Update vector database with new cats.

        Args:
            cats: List of cats to add/update
        """
        self.log(f"Updating vector database with {len(cats)} cats...")

        try:
            self.vector_db.add_cats_batch(cats)
            self.log("Vector database updated successfully")
        except Exception as e:
            self.log_error(f"Failed to update vector database: {e}")

    @timed
    def search(self, profile: CatProfile, use_cache: bool = False) -> SearchResult:
        """
        Execute the complete search pipeline.

        Pipeline:
        1. Fetch cats from Petfinder and RescueGroups in parallel (or use cache)
        2. Deduplicate across sources and cache in database
        3. Update vector database with new/updated cats
        4. Use matching agent to find best matches
        5. Return search results

        Args:
            profile: User's cat profile
            use_cache: If True, use cached cats instead of fetching from APIs

        Returns:
            SearchResult with matches and metadata
        """
        import time
        start_time = time.time()

        self.log("=" * 50)
        self.log("STARTING CAT ADOPTION SEARCH PIPELINE")
        if use_cache:
            self.log("🔄 CACHE MODE: Using existing cached data")
        self.log("=" * 50)

        # Step 1: Fetch from sources or use cache
        if use_cache:
            self.log("Loading cats from cache...")
            all_cats = self.db_manager.get_all_cached_cats(exclude_duplicates=True)
            sources_queried = ['cache']
            total_found = len(all_cats)
            unique_cats = all_cats
            duplicates_removed = 0

            if not all_cats:
                self.log("No cached cats found. Run without use_cache=True first.")
                return SearchResult(
                    matches=[],
                    total_found=0,
                    search_profile=profile,
                    search_time=time.time() - start_time,
                    sources_queried=['cache'],
                    duplicates_removed=0
                )

            self.log(f"Loaded {len(all_cats)} cats from cache")
        else:
            all_cats, sources_queried = self.fetch_cats(profile)
            total_found = len(all_cats)

            if not all_cats:
                self.log("No cats found matching criteria")
                return SearchResult(
                    matches=[],
                    total_found=0,
                    search_profile=profile,
                    search_time=time.time() - start_time,
                    sources_queried=sources_queried,
                    duplicates_removed=0
                )

            # Step 2: Deduplicate and cache (cached cats already exclude duplicates)
            unique_cats = self.deduplicate_and_cache(all_cats)
            duplicates_removed = total_found - len(unique_cats)

            # Step 3: Update vector database
            self.update_vector_db(unique_cats)

        # Step 4: Find matches using hybrid search
        self.log("Finding best matches using hybrid search...")
        matches = self.matching.match(profile)

        # Calculate search time
        search_time = time.time() - start_time

        # Create result
        result = SearchResult(
            matches=matches,
            total_found=total_found,
            search_profile=profile,
            search_time=search_time,
            sources_queried=sources_queried,
            duplicates_removed=duplicates_removed
        )

        self.log("=" * 50)
        self.log(f"SEARCH COMPLETE - Found {len(matches)} matches in {search_time:.2f}s")
        self.log("=" * 50)

        return result

    def cleanup_old_data(self, days: int = 30) -> dict:
        """
        Clean up old cached data.

        Args:
            days: Number of days to keep

        Returns:
            Dictionary with cleanup stats
        """
        self.log(f"Cleaning up cats older than {days} days...")

        # Clean SQLite cache
        removed = self.db_manager.cleanup_old_cats(days)

        # Note: ChromaDB cleanup would require tracking IDs separately
        # For now, we rely on the database as source of truth

        self.log(f"Cleanup complete: removed {removed} old cats")

        return {
            'cats_removed': removed,
            'days_threshold': days
        }
@@ -1,191 +0,0 @@

"""Profile agent for extracting user preferences using LLM."""
|
||||
|
||||
import os
|
||||
from typing import List, Optional
|
||||
from openai import OpenAI
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from models.cats import CatProfile
|
||||
from utils.geocoding import parse_location_input
|
||||
from .agent import Agent
|
||||
|
||||
|
||||
class ProfileAgent(Agent):
|
||||
"""Agent for extracting cat adoption preferences from user conversation."""
|
||||
|
||||
name = "Profile Agent"
|
||||
color = Agent.GREEN
|
||||
|
||||
MODEL = "gpt-4o-mini"
|
||||
|
||||
SYSTEM_PROMPT = """You are a helpful assistant helping users find their perfect cat for adoption.
|
||||
|
||||
Your job is to extract their preferences through natural conversation and return them in structured format.
|
||||
|
||||
Ask about:
|
||||
- Color and coat patterns (e.g., tuxedo/black&white, tabby, orange, calico, tortoiseshell, gray, etc.)
|
||||
- Personality traits they're looking for (playful, calm, cuddly, independent, etc.)
|
||||
- Age preference (kitten, young adult, adult, senior)
|
||||
- Size preference (small, medium, large)
|
||||
- Living situation (children, dogs, other cats)
|
||||
- Special needs acceptance
|
||||
- Location and max distance willing to travel
|
||||
- Gender preference (if any)
|
||||
- Breed preferences (if any)
|
||||
|
||||
IMPORTANT: When users mention colors or patterns (like "tuxedo", "black and white", "orange tabby", etc.),
|
||||
extract these into the color_preferences field exactly as the user states them. Examples:
|
||||
- "tuxedo" → ["tuxedo"]
|
||||
- "black and white" → ["black and white"]
|
||||
- "orange tabby" → ["orange", "tabby"]
|
||||
- "calico" → ["calico"]
|
||||
- "gray" or "grey" → ["gray"]
|
||||
|
||||
Extract colors/patterns naturally without trying to map to specific API values.
|
||||
|
||||
Be conversational and warm. Ask follow-up questions if preferences are unclear.
|
||||
When you have enough information, extract it into the CatProfile format."""
|
||||
|
||||
    def __init__(self):
        """Initialize the profile agent."""
        load_dotenv()

        self.api_key = os.getenv('OPENAI_API_KEY')
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY must be set in environment")

        self.client = OpenAI(api_key=self.api_key)

        self.log("Profile Agent initialized")

    def extract_profile(self, conversation: List[dict]) -> Optional[CatProfile]:
        """
        Extract CatProfile from conversation history.

        Args:
            conversation: List of message dicts with 'role' and 'content'

        Returns:
            CatProfile object or None if extraction fails
        """
        self.log("Extracting profile from conversation")

        # Add system message
        messages = [{"role": "system", "content": self.SYSTEM_PROMPT}]
        messages.extend(conversation)

        # Add extraction prompt
        messages.append({
            "role": "user",
            "content": "Please extract my preferences into a structured profile now."
        })

        try:
            response = self.client.beta.chat.completions.parse(
                model=self.MODEL,
                messages=messages,
                response_format=CatProfile
            )

            profile = response.choices[0].message.parsed
            if profile is None:
                # parsed can be None when the model refuses or output is invalid
                self.log_warning("Model returned no parsed profile")
                return None

            # Parse location if provided
            if profile.user_location:
                coords = parse_location_input(profile.user_location)
                if coords:
                    profile.user_latitude, profile.user_longitude = coords
                    self.log(f"Parsed location: {profile.user_location} -> {coords}")
                else:
                    self.log_warning(f"Could not parse location: {profile.user_location}")

            self.log("Profile extracted successfully")
            return profile

        except Exception as e:
            self.log_error(f"Failed to extract profile: {e}")
            return None

    def chat(self, user_message: str, conversation_history: List[dict]) -> str:
        """
        Continue conversation to gather preferences.

        Args:
            user_message: Latest user message
            conversation_history: Previous conversation

        Returns:
            Assistant's response
        """
        self.log(f"Processing user message: {user_message[:50]}...")

        # Build messages
        messages = [{"role": "system", "content": self.SYSTEM_PROMPT}]
        messages.extend(conversation_history)
        messages.append({"role": "user", "content": user_message})

        try:
            response = self.client.chat.completions.create(
                model=self.MODEL,
                messages=messages
            )

            assistant_message = response.choices[0].message.content
            self.log("Generated response")

            return assistant_message

        except Exception as e:
            self.log_error(f"Chat failed: {e}")
            return "I'm sorry, I'm having trouble right now. Could you try again?"

    def create_profile_from_direct_input(
        self,
        location: str,
        distance: int = 100,
        personality_description: str = "",
        age_range: Optional[List[str]] = None,
        size: Optional[List[str]] = None,
        good_with_children: Optional[bool] = None,
        good_with_dogs: Optional[bool] = None,
        good_with_cats: Optional[bool] = None
    ) -> CatProfile:
        """
        Create profile directly from form inputs (bypass conversation).

        Args:
            location: User location
            distance: Search radius in miles
            personality_description: Free text personality description
            age_range: Age preferences
            size: Size preferences
            good_with_children: Must be good with children
            good_with_dogs: Must be good with dogs
            good_with_cats: Must be good with cats

        Returns:
            CatProfile object
        """
        self.log("Creating profile from direct input")

        # Parse location
        user_lat, user_lon = None, None
        coords = parse_location_input(location)
        if coords:
            user_lat, user_lon = coords

        profile = CatProfile(
            user_location=location,
            user_latitude=user_lat,
            user_longitude=user_lon,
            max_distance=distance,
            personality_description=personality_description,
            age_range=age_range,
            size=size,
            good_with_children=good_with_children,
            good_with_dogs=good_with_dogs,
            good_with_cats=good_with_cats
        )

        self.log("Profile created from direct input")
        return profile

@@ -1,474 +0,0 @@

"""RescueGroups.org API agent for fetching cat adoption listings."""

import os
import time
import requests
from datetime import datetime
from typing import List, Optional, Dict, Any
from dotenv import load_dotenv

from models.cats import Cat
from .agent import Agent, timed


class RescueGroupsAgent(Agent):
    """Agent for interacting with RescueGroups.org API."""

    name = "RescueGroups Agent"
    color = Agent.MAGENTA

    BASE_URL = "https://api.rescuegroups.org/v5"

    # Rate limiting: 0.5 requests/second means at most one request every 2 seconds
    MAX_REQUESTS_PER_SECOND = 0.5  # Be conservative
    MAX_RESULTS_PER_PAGE = 100

    # Cache for valid colors and breeds
    _valid_colors_cache: Optional[List[str]] = None
    _valid_breeds_cache: Optional[List[str]] = None

    def __init__(self):
        """Initialize the RescueGroups agent with API credentials."""
        load_dotenv()

        self.api_key = os.getenv('RESCUEGROUPS_API_KEY')

        if not self.api_key:
            self.log_warning("RESCUEGROUPS_API_KEY not set - agent will not function")
            self.api_key = None

        self.last_request_time: float = 0

        self.log("RescueGroups Agent initialized")

    def get_valid_colors(self) -> List[str]:
        """
        Fetch valid colors from RescueGroups API.

        Returns:
            List of valid color strings
        """
        if not self.api_key:
            return []

        # Use class-level cache
        if RescueGroupsAgent._valid_colors_cache is not None:
            return RescueGroupsAgent._valid_colors_cache

        try:
            self.log("Fetching valid cat colors from RescueGroups API...")

            # Correct endpoint for colors
            url = f"{self.BASE_URL}/public/animals/colors"
            headers = {
                'Authorization': self.api_key,
                'Content-Type': 'application/vnd.api+json'
            }

            # Add limit parameter to get all colors (no max limit for static data per docs)
            params = {'limit': 1000}

            self._rate_limit()
            response = requests.get(url, headers=headers, params=params, timeout=15)
            response.raise_for_status()

            data = response.json()
            colors = [item['attributes']['name'] for item in data.get('data', [])]

            # Cache the results
            RescueGroupsAgent._valid_colors_cache = colors

            self.log(f"✓ Fetched {len(colors)} valid colors from RescueGroups")
            return colors

        except Exception as e:
            self.log_error(f"Failed to fetch valid colors: {e}")
            # Return empty list - planning agent will handle gracefully
            return []

    def get_valid_breeds(self) -> List[str]:
        """
        Fetch valid cat breeds from RescueGroups API.

        Returns:
            List of valid breed strings
        """
        if not self.api_key:
            return []

        # Use class-level cache
        if RescueGroupsAgent._valid_breeds_cache is not None:
            return RescueGroupsAgent._valid_breeds_cache

        try:
            self.log("Fetching valid cat breeds from RescueGroups API...")

            # Correct endpoint for breeds
            url = f"{self.BASE_URL}/public/animals/breeds"
            headers = {
                'Authorization': self.api_key,
                'Content-Type': 'application/vnd.api+json'
            }

            # Add limit parameter to get all breeds (no max limit for static data per docs)
            params = {'limit': 1000}

            self._rate_limit()
            response = requests.get(url, headers=headers, params=params, timeout=15)
            response.raise_for_status()

            data = response.json()
            breeds = [item['attributes']['name'] for item in data.get('data', [])]

            # Cache the results
            RescueGroupsAgent._valid_breeds_cache = breeds

            self.log(f"✓ Fetched {len(breeds)} valid breeds from RescueGroups")
            return breeds

        except Exception as e:
            self.log_error(f"Failed to fetch valid breeds: {e}")
            # Return empty list - planning agent will handle gracefully
            return []

    def _rate_limit(self) -> None:
        """Implement rate limiting to respect API limits."""
        elapsed = time.time() - self.last_request_time
        min_interval = 1.0 / self.MAX_REQUESTS_PER_SECOND

        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)

        self.last_request_time = time.time()

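The throttling logic in `_rate_limit` is easy to verify standalone. The `Throttle` class below is a hypothetical extraction of the same interval math for illustration (it is not in the codebase): `MAX_REQUESTS_PER_SECOND = 0.5` yields a minimum interval of 2 seconds between requests.

```python
import time

class Throttle:
    """Minimal sketch of the interval-based throttle used by _rate_limit."""

    def __init__(self, max_requests_per_second: float):
        # 0.5 req/s -> one request every 2.0 seconds
        self.min_interval = 1.0 / max_requests_per_second
        self.last_request_time = 0.0

    def wait(self) -> float:
        """Sleep until the minimum interval has passed; return seconds slept."""
        elapsed = time.time() - self.last_request_time
        delay = max(0.0, self.min_interval - elapsed)
        if delay:
            time.sleep(delay)
        self.last_request_time = time.time()
        return delay
```

Tracking `last_request_time` per agent instance (as the real code does) means two agents throttle independently; a shared limiter would need a class-level timestamp instead.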
    def _make_request(self, endpoint: str, data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Make an authenticated POST request to RescueGroups API.

        Args:
            endpoint: API endpoint (e.g., "/animals/search")
            data: Request payload

        Returns:
            JSON response data
        """
        if not self.api_key:
            raise ValueError("RescueGroups API key not configured")

        self._rate_limit()

        url = f"{self.BASE_URL}{endpoint}"
        headers = {
            'Authorization': self.api_key,
            'Content-Type': 'application/vnd.api+json'
        }

        try:
            response = requests.post(url, json=data, headers=headers, timeout=15)
            response.raise_for_status()
            return response.json()

        except requests.exceptions.RequestException as e:
            self.log_error(f"API request failed: {e}")
            if hasattr(e, 'response') and e.response is not None:
                self.log_error(f"Response: {e.response.text[:500]}")
            raise

    def _parse_cat(self, animal_data: Dict[str, Any]) -> Cat:
        """
        Parse RescueGroups API animal data into a Cat model.

        Args:
            animal_data: Animal data from the RescueGroups API

        Returns:
            Cat object
        """
        attributes = animal_data.get('attributes', {})

        # Basic info
        cat_id = f"rescuegroups_{animal_data['id']}"
        name = attributes.get('name', 'Unknown')

        # Breed info
        primary_breed = attributes.get('breedPrimary', 'Unknown')
        secondary_breed = attributes.get('breedSecondary')
        secondary_breeds = [secondary_breed] if secondary_breed else []

        # Age mapping (guard against null values in the JSON payload)
        age_str = (attributes.get('ageGroup') or '').lower()
        age_map = {
            'baby': 'kitten',
            'young': 'young',
            'adult': 'adult',
            'senior': 'senior'
        }
        age = age_map.get(age_str, 'unknown')

        # Size mapping
        size_str = (attributes.get('sizeGroup') or '').lower()
        size_map = {
            'small': 'small',
            'medium': 'medium',
            'large': 'large'
        }
        size = size_map.get(size_str, 'unknown')

        # Gender mapping
        gender_str = (attributes.get('sex') or '').lower()
        gender_map = {
            'male': 'male',
            'female': 'female'
        }
        gender = gender_map.get(gender_str, 'unknown')

        # Description
        description = attributes.get('descriptionText', '')
        if not description:
            description = f"{name} is a {age} {primary_breed} looking for a home."

        # Location info
        location = attributes.get('location', {}) or {}
        city = location.get('citytown')
        state = location.get('stateProvince')
        zip_code = location.get('postalcode')

        # Organization
        org_name = attributes.get('orgName', 'Unknown Organization')
        org_id = attributes.get('orgID')

        # Attributes - map RescueGroups boolean fields
        good_with_children = attributes.get('isKidsGood')
        good_with_dogs = attributes.get('isDogsGood')
        good_with_cats = attributes.get('isCatsGood')
        special_needs = attributes.get('isSpecialNeeds', False)

        # Photos (the API may return a single URL string or a list)
        pictures = attributes.get('pictureThumbnailUrl', [])
        if isinstance(pictures, str):
            pictures = [pictures] if pictures else []
        elif not pictures:
            pictures = []

        photos = [pic for pic in pictures if pic]
        primary_photo = photos[0] if photos else None

        # Contact info
        contact_email = attributes.get('emailAddress')
        contact_phone = attributes.get('phoneNumber')

        # Colors
        color_str = attributes.get('colorDetails', '')
        colors = [c.strip() for c in color_str.split(',') if c.strip()] if color_str else []

        # Coat
        coat_str = (attributes.get('coatLength') or '').lower()
        coat_map = {
            'short': 'short',
            'medium': 'medium',
            'long': 'long'
        }
        coat_length = coat_map.get(coat_str)

        # URL
        url = attributes.get('url', f"https://rescuegroups.org/animal/{animal_data['id']}")

        # Additional attributes
        declawed = attributes.get('isDeclawed')
        spayed_neutered = attributes.get('isAltered')
        house_trained = attributes.get('isHousetrained')

        return Cat(
            id=cat_id,
            name=name,
            breed=primary_breed,
            breeds_secondary=secondary_breeds,
            age=age,
            size=size,
            gender=gender,
            description=description,
            organization_name=org_name,
            organization_id=org_id,
            city=city,
            state=state,
            zip_code=zip_code,
            country='US',
            good_with_children=good_with_children,
            good_with_dogs=good_with_dogs,
            good_with_cats=good_with_cats,
            special_needs=special_needs,
            photos=photos,
            primary_photo=primary_photo,
            source='rescuegroups',
            url=url,
            contact_email=contact_email,
            contact_phone=contact_phone,
            declawed=declawed,
            spayed_neutered=spayed_neutered,
            house_trained=house_trained,
            coat_length=coat_length,
            colors=colors,
            fetched_at=datetime.now()
        )

    @timed
    def search_cats(
        self,
        location: Optional[str] = None,
        distance: int = 100,
        age: Optional[List[str]] = None,
        size: Optional[List[str]] = None,
        gender: Optional[str] = None,
        color: Optional[List[str]] = None,
        breed: Optional[List[str]] = None,
        good_with_children: Optional[bool] = None,
        good_with_dogs: Optional[bool] = None,
        good_with_cats: Optional[bool] = None,
        limit: int = 100
    ) -> List[Cat]:
        """
        Search for cats on RescueGroups.

        Args:
            location: ZIP code or city/state
            distance: Search radius in miles (default: 100)
            age: List of age categories: kitten, young, adult, senior
            size: List of sizes: small, medium, large
            gender: Gender filter: male, female
            color: List of colors (e.g., ["black", "white", "tuxedo"])
            breed: List of breed names (e.g., ["Siamese", "Maine Coon"])
            good_with_children: Filter for cats good with children
            good_with_dogs: Filter for cats good with dogs
            good_with_cats: Filter for cats good with other cats
            limit: Maximum number of results (default: 100)

        Returns:
            List of Cat objects
        """
        if not self.api_key:
            self.log_warning("RescueGroups API key not configured, returning empty results")
            return []

        color_str = f" with colors {color}" if color else ""
        breed_str = f" breeds {breed}" if breed else ""
        self.log(f"Searching RescueGroups for cats near {location}{color_str}{breed_str}")

        self.log(f"DEBUG: RescueGroups search params - location: {location}, distance: {distance}, age: {age}, size: {size}, gender: {gender}, color: {color}, breed: {breed}")

        # Build filter criteria
        filters = [
            {
                "fieldName": "species.singular",
                "operation": "equals",
                "criteria": "cat"
            },
            {
                "fieldName": "statuses.name",
                "operation": "equals",
                "criteria": "Available"
            }
        ]

        # Location filter - DISABLED: the RescueGroups v5 API doesn't support location
        # filtering. It returns animals from all locations, so filtering must be done
        # client-side.
        if location:
            self.log("NOTE: RescueGroups doesn't support location filters. Will return all results.")

        # Age filter
        if age:
            age_map = {
                'kitten': 'Baby',
                'young': 'Young',
                'adult': 'Adult',
                'senior': 'Senior'
            }
            rg_ages = [age_map.get(a, a.capitalize()) for a in age]
            for rg_age in rg_ages:
                filters.append({
                    "fieldName": "animals.ageGroup",
                    "operation": "equals",
                    "criteria": rg_age
                })

        # Size filter
        if size:
            size_map = {
                'small': 'Small',
                'medium': 'Medium',
                'large': 'Large'
            }
            for s in size:
                rg_size = size_map.get(s, s.capitalize())
                filters.append({
                    "fieldName": "animals.sizeGroup",
                    "operation": "equals",
                    "criteria": rg_size
                })

        # Gender filter
        if gender:
            filters.append({
                "fieldName": "animals.sex",
                "operation": "equals",
                "criteria": gender.capitalize()
            })

        # Color filter - DISABLED: the RescueGroups v5 API field name for color is
        # unclear, so color filtering is done client-side with the returned data.
        if color:
            self.log(f"NOTE: Color filtering for RescueGroups will be done client-side: {color}")

        # Breed filter - DISABLED: RescueGroups v5 API breed filtering is not reliable,
        # so breed filtering is done client-side with the returned data.
        if breed:
            self.log(f"NOTE: Breed filtering for RescueGroups will be done client-side: {breed}")

        # Behavioral filters - DISABLED: the RescueGroups v5 API doesn't support
        # behavioral filters. These fields exist in the response data but cannot be
        # used as filter criteria, so client-side filtering is applied to the results.
        if good_with_children:
            self.log("NOTE: good_with_children filtering will be done client-side")

        if good_with_dogs:
            self.log("NOTE: good_with_dogs filtering will be done client-side")

        if good_with_cats:
            self.log("NOTE: good_with_cats filtering will be done client-side")

        # Build request payload
        payload = {
            "data": {
                "filters": filters,
                "filterProcessing": "1"  # AND logic
            }
        }

        # Add pagination
        if limit:
            payload["data"]["limit"] = min(limit, self.MAX_RESULTS_PER_PAGE)

        self.log(f"DEBUG: RescueGroups filters: {len(filters)} filters applied")

        try:
            response = self._make_request("/public/animals/search/available/cats", payload)

            self.log(f"DEBUG: RescueGroups API Response - Found {len(response.get('data', []))} animals")

            # Parse response
            data = response.get('data', [])
            cats = []

            for animal_data in data:
                try:
                    cat = self._parse_cat(animal_data)
                    cats.append(cat)
                except Exception as e:
                    self.log_warning(f"Failed to parse cat {animal_data.get('id')}: {e}")

            self.log(f"Search complete: found {len(cats)} cats")
            return cats

        except Exception as e:
            self.log_error(f"Search failed: {e}")
            return []

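Since location, color, breed, and "good with" constraints are all deferred to the client (per the NOTE comments above), a post-filtering pass over the parsed results is needed. A minimal sketch, using a hypothetical `CatStub` stand-in for the project's `Cat` model (only the fields relevant to filtering are included here):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CatStub:
    """Hypothetical stand-in for the Cat model; real fields come from _parse_cat."""
    colors: List[str] = field(default_factory=list)
    breed: str = "Unknown"
    good_with_children: Optional[bool] = None

def filter_client_side(cats, colors=None, breeds=None, good_with_children=None):
    """Apply the filters the RescueGroups v5 API cannot handle server-side."""
    kept = []
    for cat in cats:
        # Case-insensitive color match against any of the cat's listed colors
        if colors and not any(c.lower() in (col.lower() for col in cat.colors) for c in colors):
            continue
        # Substring breed match, e.g. "maine coon" matches "Maine Coon mix"
        if breeds and not any(b.lower() in cat.breed.lower() for b in breeds):
            continue
        # Only keep cats explicitly marked good with children (None means unknown)
        if good_with_children and cat.good_with_children is not True:
            continue
        kept.append(cat)
    return kept
```

The same pattern extends to `good_with_dogs`, `good_with_cats`, and distance-from-ZIP checks once coordinates are available.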
@@ -1,834 +0,0 @@
"""Gradio UI for Tuxedo Link cat adoption application."""
|
||||
|
||||
import os
|
||||
import gradio as gr
|
||||
import pandas as pd
|
||||
from dotenv import load_dotenv
|
||||
from typing import List, Optional, Tuple
|
||||
import logging
|
||||
import re
|
||||
from datetime import datetime
|
||||
|
||||
# Import models - these are lightweight
|
||||
from models.cats import CatProfile, CatMatch, AdoptionAlert
|
||||
from utils.config import is_production
|
||||
|
||||
# Load environment
|
||||
load_dotenv()
|
||||
|
||||
# Initialize framework based on mode
|
||||
framework = None
|
||||
profile_agent = None
|
||||
|
||||
if not is_production():
|
||||
# LOCAL MODE: Import and initialize heavy components
|
||||
from cat_adoption_framework import TuxedoLinkFramework
|
||||
from agents.profile_agent import ProfileAgent
|
||||
|
||||
framework = TuxedoLinkFramework()
|
||||
profile_agent = ProfileAgent()
|
||||
print("✓ Running in LOCAL mode - using local components")
|
||||
else:
|
||||
# PRODUCTION MODE: Don't import heavy components - use Modal API
|
||||
print("✓ Running in PRODUCTION mode - using Modal API")
|
||||
|
||||
# Global state for current search results
|
||||
current_matches: List[CatMatch] = []
|
||||
current_profile: Optional[CatProfile] = None
|
||||
|
||||
# Configure logging to suppress verbose output
|
||||
logging.getLogger().setLevel(logging.WARNING)
|
||||
|
||||
|
||||
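The `is_production()` switch above is imported from `utils.config`, which is outside this diff. A plausible minimal implementation, assuming an environment-variable flag (the variable name `APP_ENV` is an assumption for illustration, not the project's actual config), might look like:

```python
import os

def is_production() -> bool:
    # Assumed convention: APP_ENV=production selects the Modal-backed path;
    # anything else (or unset) selects local mode.
    return os.getenv("APP_ENV", "local").lower() == "production"
```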
def extract_profile_from_text(user_input: str, use_cache: bool = False) -> tuple:
    """
    Extract a structured profile from the user's natural language input.

    Args:
        user_input: User's description of their desired cat
        use_cache: Whether to use cached data for the search

    Returns:
        Tuple of (chat_history, results_html, profile_json)
    """
    global current_matches, current_profile

    try:
        # Handle empty input - fall back to the placeholder text
        if not user_input or user_input.strip() == "":
            user_input = "I'm looking for a friendly, playful kitten in NYC that's good with children"

        # Extract profile using the LLM
        # (messages format for the Gradio chatbot)
        chat_history = [
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": "🔍 Analyzing your preferences..."}
        ]

        # Extract profile (Modal or local)
        if is_production():
            # PRODUCTION: call the Modal API
            import modal

            # Look up the deployed function
            extract_profile_func = modal.Function.from_name("tuxedo-link-api", "extract_profile")

            print("[INFO] Calling Modal API to extract profile...")
            profile_result = extract_profile_func.remote(user_input)

            if not profile_result["success"]:
                return chat_history, "<p>❌ Error extracting profile</p>", "{}"

            profile = CatProfile(**profile_result["profile"])
        else:
            # LOCAL: use the local agent
            conversation = [{"role": "user", "content": user_input}]
            profile = profile_agent.extract_profile(conversation)

        current_profile = profile

        # Summarize the extracted profile before searching
        response_msg = (
            f"✅ Got it! Searching for:\n\n"
            f"📍 Location: {profile.user_location or 'Not specified'}\n"
            f"📏 Distance: {profile.max_distance or 100} miles\n"
            f"🎨 Colors: {', '.join(profile.color_preferences) if profile.color_preferences else 'Any'}\n"
            f"🎭 Personality: {profile.personality_description or 'Any'}\n"
            f"🎂 Age: {', '.join(profile.age_range) if profile.age_range else 'Any'}\n"
            f"👶 Good with children: {'Yes' if profile.good_with_children else 'Not required'}\n"
            f"🐕 Good with dogs: {'Yes' if profile.good_with_dogs else 'Not required'}\n"
            f"🐱 Good with cats: {'Yes' if profile.good_with_cats else 'Not required'}\n\n"
            f"Searching..."
        )

        chat_history[1]["content"] = response_msg

        # Run the search (Modal or local)
        if is_production():
            # PRODUCTION: call the Modal API
            import modal

            # Look up the deployed function
            search_cats_func = modal.Function.from_name("tuxedo-link-api", "search_cats")

            print("[INFO] Calling Modal API to search cats...")
            search_result = search_cats_func.remote(profile.model_dump(), use_cache=use_cache)

            if not search_result["success"]:
                error_msg = search_result.get('error', 'Unknown error')
                chat_history.append({"role": "assistant", "content": f"❌ Search error: {error_msg}"})
                return chat_history, "<p>😿 Search failed. Please try again.</p>", profile.model_dump_json()

            # Reconstruct matches from the Modal response
            from models.cats import Cat
            current_matches = [
                CatMatch(
                    cat=Cat(**m["cat"]),
                    match_score=m["match_score"],
                    vector_similarity=m["vector_similarity"],
                    attribute_match_score=m["attribute_match_score"],
                    explanation=m["explanation"],
                    matching_attributes=m.get("matching_attributes", []),
                    missing_attributes=m.get("missing_attributes", [])
                )
                for m in search_result["matches"]
            ]
        else:
            # LOCAL: use the local framework
            result = framework.search(profile, use_cache=use_cache)
            current_matches = result.matches

        # Build results HTML
        if current_matches:
            chat_history[1]["content"] += f"\n\n✨ Found {len(current_matches)} great matches!"
            results_html = build_results_grid(current_matches)
        else:
            chat_history[1]["content"] += "\n\n😿 No matches found. Try broadening your search criteria."
            results_html = "<p style='text-align:center; color: #666; padding: 40px;'>No matches found</p>"

        # Profile JSON for display
        profile_json = profile.model_dump_json(indent=2)

        return chat_history, results_html, profile_json

    except Exception as e:
        error_msg = f"❌ Error: {str(e)}"
        print(f"[ERROR] Search failed: {e}")
        import traceback
        traceback.print_exc()
        return [
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": error_msg}
        ], "<p>Error occurred</p>", "{}"

def build_results_grid(matches: List[CatMatch]) -> str:
    """Build an HTML grid of cat results."""
    html = "<div style='display: grid; grid-template-columns: repeat(auto-fill, minmax(240px, 1fr)); gap: 20px; padding: 20px;'>"

    for match in matches:
        cat = match.cat
        photo = cat.primary_photo or "https://via.placeholder.com/240x180?text=No+Photo"

        html += f"""
        <div style='border: 1px solid #ddd; border-radius: 10px; overflow: hidden; box-shadow: 0 2px 8px rgba(0,0,0,0.1);'>
            <img src='{photo}' style='width: 100%; height: 180px; object-fit: cover;' onerror="this.src='https://via.placeholder.com/240x180?text=No+Photo'">
            <div style='padding: 15px;'>
                <h3 style='margin: 0 0 10px 0; color: #333;'>{cat.name}</h3>
                <div style='display: flex; justify-content: space-between; margin-bottom: 8px;'>
                    <span style='background: #4CAF50; color: white; padding: 4px 12px; border-radius: 12px; font-size: 12px;'>
                        {match.match_score:.0%} Match
                    </span>
                    <span style='color: #666; font-size: 14px;'>{cat.age}</span>
                </div>
                <p style='color: #666; font-size: 14px; margin: 8px 0;'>
                    <strong>{cat.breed}</strong><br/>
                    {cat.city}, {cat.state}<br/>
                    {cat.gender.capitalize()} • {cat.size.capitalize() if cat.size else 'Unknown size'}
                </p>
                <p style='color: #888; font-size: 13px; margin: 10px 0; line-height: 1.4;'>
                    {match.explanation}
                </p>
                <a href='{cat.url}' target='_blank' style='display: block; text-align: center; background: #2196F3; color: white; padding: 10px; border-radius: 5px; text-decoration: none; margin-top: 10px;'>
                    View Details
                </a>
            </div>
        </div>
        """

    html += "</div>"
    return html

def search_with_examples(example_text: str, use_cache: bool = False) -> tuple:
    """Handle example button clicks."""
    return extract_profile_from_text(example_text, use_cache)


# ===== ALERT MANAGEMENT FUNCTIONS =====

def validate_email(email: str) -> bool:
    """Validate email address format."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

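The regex in `validate_email` accepts a simple `local@domain.tld` shape and requires a TLD of at least two letters. A self-contained illustration of what it does and does not accept:

```python
import re

def validate_email(email: str) -> bool:
    """Same pattern as above: a pragmatic format check, not full RFC 5322."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

print(validate_email("user@example.com"))  # True
print(validate_email("no-at-sign"))        # False (no @)
print(validate_email("a@b.c"))             # False (single-letter TLD rejected)
```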
def send_immediate_notification_local(alert_id: int) -> None:
    """
    Send an immediate notification locally (not via Modal).

    Args:
        alert_id: ID of the alert to process
    """
    from agents.email_agent import EmailAgent
    from agents.email_providers.factory import get_email_provider

    print(f"[DEBUG] Sending immediate notification for alert {alert_id}")

    # Get the alert from the database
    alert = framework.db_manager.get_alert_by_id(alert_id)
    if not alert:
        print(f"[ERROR] Alert {alert_id} not found")
        raise ValueError(f"Alert {alert_id} not found")

    print(f"[DEBUG] Alert found: email={alert.user_email}, profile exists={alert.profile is not None}")

    # Run a search with the alert's profile
    result = framework.search(alert.profile, use_cache=False)
    print(f"[DEBUG] Search complete: {len(result.matches)} matches found")

    if result.matches:
        # Send email notification
        try:
            email_provider = get_email_provider()
            email_agent = EmailAgent(email_provider)
            print(f"[DEBUG] Sending email to {alert.user_email}...")
            email_agent.send_match_notification(
                alert=alert,
                matches=result.matches
            )
            print("[DEBUG] ✓ Email sent successfully!")
        except Exception as e:
            print(f"[ERROR] Failed to send email: {e}")
            import traceback
            traceback.print_exc()
            raise
    else:
        print("[DEBUG] No matches found, no email sent")

def save_alert(email: str, frequency: str, profile_json: str) -> Tuple[str, pd.DataFrame]:
    """
    Save an adoption alert to the database.

    Args:
        email: User's email address
        frequency: Notification frequency (Immediately, Daily, Weekly)
        profile_json: JSON string of the current search profile

    Returns:
        Tuple of (status_message, updated_alerts_dataframe)
    """
    global current_profile

    try:
        # Validate email
        if not email or not validate_email(email):
            return "❌ Please enter a valid email address", load_alerts()

        # Check that we have a current profile
        if not current_profile:
            return "❌ Please perform a search first to create a profile", load_alerts()

        # Normalize frequency
        frequency = frequency.lower()

        # Create alert
        alert = AdoptionAlert(
            user_email=email,
            profile=current_profile,
            frequency=frequency,
            active=True
        )

        # Save the alert based on mode
        if is_production():
            # PRODUCTION MODE: use the Modal function
            try:
                import modal

                print("[INFO] Production mode: Calling Modal function to create alert...")
                # Look up the deployed function
                create_alert_func = modal.Function.from_name("tuxedo-link-api", "create_alert_and_notify")

                # Send the alert data to Modal
                result = create_alert_func.remote(alert.model_dump())

                if result["success"]:
                    status = f"✅ {result['message']}"
                else:
                    status = f"⚠️ {result['message']}"

                return status, load_alerts()

            except Exception as e:
                import traceback
                error_detail = traceback.format_exc()
                print(f"[ERROR] Modal function failed: {error_detail}")
                return f"❌ Error calling Modal service: {str(e)}\n\nCheck Modal logs for details.", load_alerts()
        else:
            # LOCAL MODE: save and process locally
            alert_id = framework.db_manager.create_alert(alert)

            if frequency == "immediately":
                try:
                    send_immediate_notification_local(alert_id)
                    status = f"✅ Alert saved and notification sent locally! (ID: {alert_id})\n\nCheck your email at {email}"
                except Exception as e:
                    import traceback
                    error_detail = traceback.format_exc()
                    print(f"[ERROR] Local notification failed: {error_detail}")
                    status = f"✅ Alert saved (ID: {alert_id}), but notification failed: {str(e)}"
            else:
                status = f"✅ Alert saved successfully! (ID: {alert_id})\n\nYou'll receive {frequency} notifications at {email}"

            return status, load_alerts()

    except Exception as e:
        return f"❌ Error saving alert: {str(e)}", load_alerts()

def load_alerts(email_filter: str = "") -> pd.DataFrame:
    """
    Load all alerts from the database.

    Args:
        email_filter: Optional email to filter by

    Returns:
        DataFrame of alerts
    """
    try:
        # Get alerts from the database (Modal or local)
        if is_production():
            # PRODUCTION: call the Modal API
            import modal

            # Look up the deployed function
            get_alerts_func = modal.Function.from_name("tuxedo-link-api", "get_alerts")

            alert_dicts = get_alerts_func.remote(email=email_filter if email_filter and validate_email(email_filter) else None)
            alerts = [AdoptionAlert(**a) for a in alert_dicts]
        else:
            # LOCAL: use the local database
            if email_filter and validate_email(email_filter):
                alerts = framework.db_manager.get_alerts_by_email(email_filter)
            else:
                alerts = framework.db_manager.get_all_alerts()

        if not alerts:
            # Return an empty DataFrame with the correct columns
            return pd.DataFrame(columns=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"])

        # Convert to display format
        data = []
        for alert in alerts:
            location = alert.profile.user_location or "Any"
            prefs = []
            if alert.profile.age_range:
                prefs.append(f"Age: {', '.join(alert.profile.age_range)}")
            if alert.profile.good_with_children:
                prefs.append("Child-friendly")
            if alert.profile.good_with_dogs:
                prefs.append("Dog-friendly")
            if alert.profile.good_with_cats:
                prefs.append("Cat-friendly")

            prefs_str = ", ".join(prefs) if prefs else "Any"

            last_sent = alert.last_sent.strftime("%Y-%m-%d %H:%M") if alert.last_sent else "Never"
            status = "🟢 Active" if alert.active else "🔴 Inactive"

            data.append({
                "ID": alert.id,
                "Email": alert.user_email,
                "Frequency": alert.frequency.capitalize(),
                "Location": location,
                "Preferences": prefs_str,
                "Last Sent": last_sent,
                "Status": status
            })

        return pd.DataFrame(data)

    except Exception as e:
        logging.error(f"Error loading alerts: {e}")
        return pd.DataFrame(columns=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"])

def delete_alert(alert_id: str, email_filter: str = "") -> Tuple[str, pd.DataFrame]:
    """
    Delete an alert by ID.

    Args:
        alert_id: Alert ID to delete
        email_filter: Optional email filter for the refresh

    Returns:
        Tuple of (status_message, updated_alerts_dataframe)
    """
    try:
        if not alert_id:
            return "❌ Please enter an Alert ID", load_alerts(email_filter)

        # Convert to int
        try:
            alert_id_int = int(alert_id)
        except ValueError:
            return f"❌ Invalid Alert ID: {alert_id}", load_alerts(email_filter)

        # Delete from the database (Modal or local)
        if is_production():
            # PRODUCTION: call the Modal API
            import modal

            # Look up the deployed function
            delete_alert_func = modal.Function.from_name("tuxedo-link-api", "delete_alert")
            success = delete_alert_func.remote(alert_id_int)
            if not success:
                return f"❌ Failed to delete alert {alert_id}", load_alerts(email_filter)
        else:
            # LOCAL: use the local database
            framework.db_manager.delete_alert(alert_id_int)

        return f"✅ Alert {alert_id} deleted successfully", load_alerts(email_filter)

    except Exception as e:
        return f"❌ Error deleting alert: {str(e)}", load_alerts(email_filter)

def toggle_alert_status(alert_id: str, email_filter: str = "") -> Tuple[str, pd.DataFrame]:
    """
    Toggle an alert's active/inactive status.

    Args:
        alert_id: Alert ID to toggle
        email_filter: Optional email filter for the refresh

    Returns:
        Tuple of (status_message, updated_alerts_dataframe)
    """
    try:
        if not alert_id:
            return "❌ Please enter an Alert ID", load_alerts(email_filter)

        # Convert to int
        try:
            alert_id_int = int(alert_id)
        except ValueError:
            return f"❌ Invalid Alert ID: {alert_id}", load_alerts(email_filter)

        # Get the current alert and toggle it (Modal or local)
        if is_production():
            # PRODUCTION: call the Modal API
            import modal

            # Look up the deployed functions
            get_alerts_func = modal.Function.from_name("tuxedo-link-api", "get_alerts")
            update_alert_func = modal.Function.from_name("tuxedo-link-api", "update_alert")

            # Get all alerts and find this one
            alert_dicts = get_alerts_func.remote()
            alert_dict = next((a for a in alert_dicts if a["id"] == alert_id_int), None)

            if not alert_dict:
                return f"❌ Alert {alert_id} not found", load_alerts(email_filter)

            alert = AdoptionAlert(**alert_dict)
            new_status = not alert.active

            success = update_alert_func.remote(alert_id_int, active=new_status)
            if not success:
                return f"❌ Failed to update alert {alert_id}", load_alerts(email_filter)
        else:
            # LOCAL: use the local database
            alert = framework.db_manager.get_alert(alert_id_int)
            if not alert:
                return f"❌ Alert {alert_id} not found", load_alerts(email_filter)

            new_status = not alert.active
            framework.db_manager.update_alert(alert_id_int, active=new_status)

        status_text = "activated" if new_status else "deactivated"
        return f"✅ Alert {alert_id} {status_text}", load_alerts(email_filter)

    except Exception as e:
        return f"❌ Error toggling alert: {str(e)}", load_alerts(email_filter)

def build_search_tab() -> None:
    """Build the search tab interface with chat and results display."""
    with gr.Column():
        gr.Markdown("# 🐱 Find Your Perfect Cat")
        gr.Markdown("Tell me what kind of cat you're looking for, and I'll help you find the perfect match!")

        with gr.Row():
            # In production mode, default to False since the Modal cache starts empty;
            # in local mode, default to True after the first run.
            default_cache = not is_production()
            use_cache_checkbox = gr.Checkbox(
                label="Use Cache (Fast Mode)",
                value=default_cache,
                info="Use cached cat data for faster searches (uncheck for fresh data from APIs)"
            )

        # Chat interface for natural language input
        chatbot = gr.Chatbot(label="Chat", height=200, type="messages")
        user_input = gr.Textbox(
            label="Describe your ideal cat",
            placeholder="I'm looking for a friendly, playful kitten in NYC that's good with children...",
            lines=3
        )

        with gr.Row():
            submit_btn = gr.Button("🔍 Search", variant="primary")
            clear_btn = gr.Button("🔄 Clear")

        # Example queries
        gr.Markdown("### 💡 Try these examples:")
        with gr.Row():
            example_btns = [
                gr.Button("🏠 Family cat", size="sm"),
                gr.Button("🎮 Playful kitten", size="sm"),
                gr.Button("😴 Calm adult", size="sm"),
                gr.Button("👶 Good with kids", size="sm")
            ]

        # Results display
        gr.Markdown("---")
        gr.Markdown("## 🎯 Search Results")
        results_html = gr.HTML(value="<p style='text-align:center; color: #999; padding: 40px;'>Enter your preferences above to start searching</p>")

        # Profile display (collapsible)
        with gr.Accordion("📋 Extracted Profile (for debugging)", open=False):
            profile_display = gr.JSON(label="Profile Data")

        # Wire up events
        submit_btn.click(
            fn=extract_profile_from_text,
            inputs=[user_input, use_cache_checkbox],
            outputs=[chatbot, results_html, profile_display]
        )

        user_input.submit(
            fn=extract_profile_from_text,
            inputs=[user_input, use_cache_checkbox],
            outputs=[chatbot, results_html, profile_display]
        )

        clear_btn.click(
            fn=lambda: ([], "<p style='text-align:center; color: #999; padding: 40px;'>Enter your preferences above to start searching</p>", ""),
            outputs=[chatbot, results_html, profile_display]
        )

        # Example buttons
        examples = [
            "I want a friendly family cat in zip code 10001, good with children and dogs",
            "Looking for a playful young kitten near New York City",
            "I need a calm, affectionate adult cat that likes to cuddle",
            "Show me cats good with children in the NYC area"
        ]

        for btn, example in zip(example_btns, examples):
            btn.click(
                fn=search_with_examples,
                inputs=[gr.State(example), use_cache_checkbox],
                outputs=[chatbot, results_html, profile_display]
            )

def build_alerts_tab() -> None:
    """Build the alerts management tab for scheduling email notifications."""
    with gr.Column():
        gr.Markdown("# 🔔 Manage Alerts")
        gr.Markdown("Save your search and get notified when new matching cats are available!")

        # Instructions
        gr.Markdown("""
        ### How it works:
        1. **Search** for cats using your preferred criteria in the Search tab
        2. **Enter your email** below and choose a notification frequency
        3. **Save Alert** to start receiving notifications

        You'll be notified when new cats matching your preferences become available!
        """)

        # Save Alert section
        gr.Markdown("### 💾 Save Current Search as Alert")

        with gr.Row():
            with gr.Column(scale=2):
                email_input = gr.Textbox(
                    label="Email Address",
                    placeholder="your@email.com",
                    info="Where should we send notifications?"
                )
            with gr.Column(scale=1):
                frequency_dropdown = gr.Dropdown(
                    label="Notification Frequency",
                    choices=["Immediately", "Daily", "Weekly"],
                    value="Daily",
                    info="How often to check for new matches"
                )

        with gr.Row():
            save_btn = gr.Button("💾 Save Alert", variant="primary", scale=2)
            profile_display = gr.JSON(
                label="Current Search Profile",
                value={},
                visible=False,
                scale=1
            )

        save_status = gr.Markdown("")

        gr.Markdown("---")

        # Manage Alerts section
        gr.Markdown("### 📋 Your Saved Alerts")

        with gr.Row():
            with gr.Column(scale=2):
                email_filter_input = gr.Textbox(
                    label="Filter by Email (optional)",
                    placeholder="your@email.com"
                )
            with gr.Column(scale=1):
                refresh_btn = gr.Button("🔄 Refresh", size="sm")

        alerts_table = gr.Dataframe(
            value=[],  # Start empty - load on demand to avoid blocking UI startup
            headers=["ID", "Email", "Frequency", "Location", "Preferences", "Last Sent", "Status"],
            datatype=["number", "str", "str", "str", "str", "str", "str"],
            interactive=False,
            wrap=True
        )

        # Alert actions
        gr.Markdown("### ⚙️ Manage Alert")
        with gr.Row():
            alert_id_input = gr.Textbox(
                label="Alert ID",
                placeholder="Enter Alert ID from table above",
                scale=2
            )
            with gr.Column(scale=3):
                with gr.Row():
                    toggle_btn = gr.Button("🔄 Toggle Active/Inactive", size="sm")
                    delete_btn = gr.Button("🗑️ Delete Alert", variant="stop", size="sm")

        action_status = gr.Markdown("")

        # Wire up events
        save_btn.click(
            fn=save_alert,
            inputs=[email_input, frequency_dropdown, profile_display],
            outputs=[save_status, alerts_table]
        )

        refresh_btn.click(
            fn=load_alerts,
            inputs=[email_filter_input],
            outputs=[alerts_table]
        )

        email_filter_input.submit(
            fn=load_alerts,
            inputs=[email_filter_input],
            outputs=[alerts_table]
|
||||
)
|
||||
|
||||
toggle_btn.click(
|
||||
fn=toggle_alert_status,
|
||||
inputs=[alert_id_input, email_filter_input],
|
||||
outputs=[action_status, alerts_table]
|
||||
)
|
||||
|
||||
delete_btn.click(
|
||||
fn=delete_alert,
|
||||
inputs=[alert_id_input, email_filter_input],
|
||||
outputs=[action_status, alerts_table]
|
||||
)
|
||||
|
||||
|
||||
def build_about_tab() -> None:
|
||||
"""Build the about tab with Kyra's story and application info."""
|
||||
with gr.Column():
|
||||
gr.Markdown("# 🎩 About Tuxedo Link")
|
||||
|
||||
gr.Markdown("""
|
||||
## In Loving Memory of Kyra 🐱
|
||||
|
||||
This application is dedicated to **Kyra**, a beloved companion who brought joy,
|
||||
comfort, and unconditional love to our lives. Kyra was more than just a cat—
|
||||
he was family, a friend, and a constant source of happiness.
|
||||
|
||||
### The Inspiration
|
||||
|
||||
Kyra Link was created to help others find their perfect feline companion,
|
||||
just as Kyra found his way into our hearts. Every cat deserves a loving home,
|
||||
and every person deserves the companionship of a wonderful cat like Kyra.
|
||||
|
||||
### The Technology
|
||||
|
||||
This application uses AI and machine learning to match prospective
|
||||
adopters with their ideal cat:
|
||||
|
||||
- **Natural Language Processing**: Understand your preferences in plain English
|
||||
- **Semantic Search**: Find cats based on personality, not just keywords
|
||||
- **Multi-Source Aggregation**: Search across multiple adoption platforms
|
||||
- **Smart Deduplication**: Remove duplicate listings using AI
|
||||
- **Image Recognition**: Match cats visually using computer vision
|
||||
- **Hybrid Matching**: Combine semantic understanding with structured filters
|
||||
|
||||
### Features
|
||||
|
||||
✅ **Multi-Platform Search**: Petfinder, RescueGroups
|
||||
✅ **AI-Powered Matching**: Semantic search with vector embeddings
|
||||
✅ **Smart Deduplication**: Name, description, and image similarity
|
||||
✅ **Personality Matching**: Find cats that match your lifestyle
|
||||
✅ **Location-Based**: Search near you with customizable radius
|
||||
|
||||
### Technical Stack
|
||||
|
||||
- **Frontend**: Gradio
|
||||
- **Backend**: Python with Modal serverless
|
||||
- **LLMs**: OpenAI GPT-4 for profile extraction
|
||||
- **Vector DB**: ChromaDB with SentenceTransformers
|
||||
- **Image AI**: CLIP for visual similarity
|
||||
- **APIs**: Petfinder, RescueGroups, SendGrid
|
||||
- **Database**: SQLite for caching and user management
|
||||
|
||||
### Open Source
|
||||
|
||||
Tuxedo Link is open source and built as part of the Andela LLM Engineering bootcamp.
|
||||
Contributions and improvements are welcome!
|
||||
|
||||
### Acknowledgments
|
||||
|
||||
- **Petfinder**: For their comprehensive pet adoption API
|
||||
- **RescueGroups**: For connecting rescues with adopters
|
||||
- **Andela**: For the LLM Engineering bootcamp
|
||||
- **Kyra**: For inspiring this project and bringing so much joy 💙
|
||||
|
||||
---
|
||||
|
||||
*"In memory of Kyra, who taught us that home is wherever your cat is."*
|
||||
|
||||
🐾 **May every cat find their perfect home** 🐾
|
||||
""")
|
||||
|
||||
# Add Kyra's picture
|
||||
with gr.Row():
|
||||
with gr.Column():
|
||||
gr.Image(
|
||||
value="assets/Kyra.png",
|
||||
label="Kyra - Forever in our hearts 💙",
|
||||
show_label=True,
|
||||
container=True,
|
||||
width=400,
|
||||
height=400,
|
||||
show_download_button=False,
|
||||
show_share_button=False,
|
||||
interactive=False
|
||||
)
|
||||
|
||||
|
||||
def create_app() -> gr.Blocks:
|
||||
"""
|
||||
Create and configure the Gradio application.
|
||||
|
||||
Returns:
|
||||
Configured Gradio Blocks application
|
||||
"""
|
||||
with gr.Blocks(
|
||||
title="Tuxedo Link - Find Your Perfect Cat",
|
||||
theme=gr.themes.Soft()
|
||||
) as app:
|
||||
gr.Markdown("""
|
||||
<div style='text-align: center; padding: 20px;'>
|
||||
<h1 style='font-size: 3em; margin: 0;'>🎩 Tuxedo Link</h1>
|
||||
<p style='font-size: 1.2em; color: #666; margin: 10px 0;'>
|
||||
AI-Powered Cat Adoption Search
|
||||
</p>
|
||||
</div>
|
||||
""")
|
||||
|
||||
with gr.Tabs():
|
||||
with gr.Tab("🔍 Search"):
|
||||
build_search_tab()
|
||||
|
||||
with gr.Tab("🔔 Alerts"):
|
||||
build_alerts_tab()
|
||||
|
||||
with gr.Tab("ℹ️ About"):
|
||||
build_about_tab()
|
||||
|
||||
gr.Markdown("""
|
||||
<div style='text-align: center; padding: 20px; color: #999; font-size: 0.9em;'>
|
||||
Made with ❤️ in memory of Kyra |
|
||||
<a href='https://github.com/yourusername/tuxedo-link' style='color: #2196F3;'>GitHub</a> |
|
||||
Powered by AI & Open Source
|
||||
</div>
|
||||
""")
|
||||
|
||||
return app
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
app = create_app()
|
||||
app.launch(
|
||||
server_name="0.0.0.0",
|
||||
server_port=7860,
|
||||
share=False,
|
||||
show_error=True
|
||||
)
|
||||
|
||||
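The repeated `btn.click(fn=…, inputs=…, outputs=…)` wiring above follows Gradio's event model: each listener maps a handler plus named inputs to named outputs. Outside Gradio, the same idea can be sketched with a plain callback registry (all names here are illustrative, not part of the app or of Gradio):

```python
# Minimal framework-free sketch of the click-wiring pattern: each registered
# event maps a handler and named inputs to named outputs, like Blocks does.
class EventBus:
    def __init__(self):
        self.handlers = []

    def click(self, fn, inputs, outputs):
        # Register a handler the way btn.click(fn, inputs, outputs) does.
        self.handlers.append((fn, inputs, outputs))

    def fire(self, state):
        # Run every handler, writing its results back into shared state.
        for fn, inputs, outputs in self.handlers:
            results = fn(*[state[name] for name in inputs])
            if not isinstance(results, tuple):
                results = (results,)
            state.update(zip(outputs, results))
        return state


bus = EventBus()
bus.click(lambda q: f"results for {q!r}", inputs=["query"], outputs=["results_html"])
state = bus.fire({"query": "playful kitten", "results_html": ""})
print(state["results_html"])  # → results for 'playful kitten'
```

The real Gradio listeners additionally re-render components; the registry above only captures the data-flow contract.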
@@ -1,255 +0,0 @@
"""Main framework for Tuxedo Link cat adoption application."""

import logging
import sys
from typing import Optional
from dotenv import load_dotenv

from models.cats import CatProfile, SearchResult
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager
from setup_metadata_vectordb import MetadataVectorDB
from agents.planning_agent import PlanningAgent
from utils.config import get_db_path, get_vectordb_path

# Color codes for logging
BG_BLUE = '\033[44m'
WHITE = '\033[37m'
RESET = '\033[0m'


def init_logging() -> None:
    """Initialize logging with colored output for the framework."""
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "[%(asctime)s] [Tuxedo Link] [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    handler.setFormatter(formatter)
    root.addHandler(handler)


class TuxedoLinkFramework:
    """Main framework for Tuxedo Link cat adoption application."""

    def __init__(self):
        """Initialize the Tuxedo Link framework."""
        init_logging()
        load_dotenv()

        self.log("Initializing Tuxedo Link Framework...")

        # Initialize database managers using config
        db_path = get_db_path()
        vectordb_path = get_vectordb_path()

        self.db_manager = DatabaseManager(db_path)
        self.vector_db = VectorDBManager(vectordb_path)
        self.metadata_vectordb = MetadataVectorDB("metadata_vectorstore")

        # Index colors and breeds from APIs for fuzzy matching
        self._index_metadata()

        # Lazy agent initialization
        self.planner: Optional[PlanningAgent] = None

        self.log("Tuxedo Link Framework initialized")

    def _index_metadata(self) -> None:
        """Index colors and breeds from APIs into metadata vector DB for fuzzy matching."""
        try:
            from agents.petfinder_agent import PetfinderAgent
            from agents.rescuegroups_agent import RescueGroupsAgent

            self.log("Indexing colors and breeds for fuzzy matching...")

            # Index Petfinder colors and breeds
            try:
                petfinder = PetfinderAgent()
                colors = petfinder.get_valid_colors()
                breeds = petfinder.get_valid_breeds()

                if colors:
                    self.metadata_vectordb.index_colors(colors, source="petfinder")
                if breeds:
                    self.metadata_vectordb.index_breeds(breeds, source="petfinder")
            except Exception as e:
                logging.warning(f"Could not index Petfinder metadata: {e}")

            # Index RescueGroups colors and breeds
            try:
                rescuegroups = RescueGroupsAgent()
                colors = rescuegroups.get_valid_colors()
                breeds = rescuegroups.get_valid_breeds()

                if colors:
                    self.metadata_vectordb.index_colors(colors, source="rescuegroups")
                if breeds:
                    self.metadata_vectordb.index_breeds(breeds, source="rescuegroups")
            except Exception as e:
                logging.warning(f"Could not index RescueGroups metadata: {e}")

            stats = self.metadata_vectordb.get_stats()
            self.log(f"✓ Metadata indexed: {stats['colors_count']} colors, {stats['breeds_count']} breeds")

        except Exception as e:
            logging.warning(f"Metadata indexing failed: {e}")

    def init_agents(self) -> None:
        """Initialize agents lazily on first search request."""
        if not self.planner:
            self.log("Initializing agent pipeline...")
            self.planner = PlanningAgent(
                self.db_manager,
                self.vector_db,
                self.metadata_vectordb
            )
            self.log("Agent pipeline ready")

    def log(self, message: str) -> None:
        """
        Log a message with framework identifier.

        Args:
            message: Message to log
        """
        text = BG_BLUE + WHITE + "[Framework] " + message + RESET
        logging.info(text)

    def search(self, profile: CatProfile, use_cache: bool = False) -> SearchResult:
        """
        Execute cat adoption search.

        This runs the complete pipeline:
        1. Fetch cats from APIs OR load from cache (if use_cache=True)
        2. Deduplicate across sources (if fetching new)
        3. Cache in database with image embeddings (if fetching new)
        4. Update vector database (if fetching new)
        5. Perform hybrid matching (semantic + metadata)
        6. Return ranked results

        Args:
            profile: User's cat profile with preferences
            use_cache: If True, use cached data instead of fetching from APIs.
                This saves API calls during development/testing.

        Returns:
            SearchResult with matches and metadata
        """
        self.init_agents()
        return self.planner.search(profile, use_cache=use_cache)

    def cleanup_old_data(self, days: int = 30) -> dict:
        """
        Clean up data older than specified days.

        Args:
            days: Number of days to keep (default: 30)

        Returns:
            Dictionary with cleanup statistics
        """
        self.init_agents()
        return self.planner.cleanup_old_data(days)

    def get_stats(self) -> dict:
        """
        Get statistics about the application state.

        Returns:
            Dictionary with database and vector DB stats
        """
        cache_stats = self.db_manager.get_cache_stats()
        vector_stats = self.vector_db.get_stats()

        return {
            'database': cache_stats,
            'vector_db': vector_stats
        }


if __name__ == "__main__":
    # Test the framework with a real search
    print("\n" + "="*60)
    print("Testing Tuxedo Link Framework")
    print("="*60 + "\n")

    framework = TuxedoLinkFramework()

    # Create a test profile
    print("Creating test profile...")
    profile = CatProfile(
        user_location="10001",  # New York City
        max_distance=50,
        personality_description="friendly, playful cat good with children",
        age_range=["young", "adult"],
        good_with_children=True
    )

    print("\nProfile:")
    print(f"  Location: {profile.user_location}")
    print(f"  Distance: {profile.max_distance} miles")
    print(f"  Age: {', '.join(profile.age_range)}")
    print(f"  Personality: {profile.personality_description}")
    print(f"  Good with children: {profile.good_with_children}")

    # Run search
    print("\n" + "-"*60)
    print("Running search pipeline...")
    print("-"*60 + "\n")

    result = framework.search(profile)

    # Display results
    print("\n" + "="*60)
    print("SEARCH RESULTS")
    print("="*60 + "\n")

    print(f"Total cats found: {result.total_found}")
    print(f"Sources queried: {', '.join(result.sources_queried)}")
    print(f"Duplicates removed: {result.duplicates_removed}")
    print(f"Matches returned: {len(result.matches)}")
    print(f"Search time: {result.search_time:.2f} seconds")

    if result.matches:
        print("\n" + "-"*60)
        print("TOP MATCHES")
        print("-"*60 + "\n")

        for i, match in enumerate(result.matches[:5], 1):
            cat = match.cat
            print(f"{i}. {cat.name}")
            print(f"   Breed: {cat.breed}")
            print(f"   Age: {cat.age} | Size: {cat.size} | Gender: {cat.gender}")
            print(f"   Location: {cat.city}, {cat.state}")
            print(f"   Match Score: {match.match_score:.2%}")
            print(f"   Explanation: {match.explanation}")
            print(f"   Source: {cat.source}")
            print(f"   URL: {cat.url}")
            if cat.primary_photo:
                print(f"   Photo: {cat.primary_photo}")
            print()
    else:
        print("\nNo matches found. Try adjusting your search criteria.")

    # Show stats
    print("\n" + "="*60)
    print("SYSTEM STATISTICS")
    print("="*60 + "\n")

    stats = framework.get_stats()
    print("Database:")
    for key, value in stats['database'].items():
        print(f"  {key}: {value}")

    print("\nVector Database:")
    for key, value in stats['vector_db'].items():
        print(f"  {key}: {value}")

    print("\n" + "="*60)
    print("Test Complete!")
    print("="*60 + "\n")
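The framework above defers building its agent pipeline until the first request: `__init__` only sets `self.planner = None`, and `init_agents()` constructs the pipeline on first use. Stripped of the app's own classes, that lazy-initialization pattern looks roughly like this (all names are illustrative):

```python
# Sketch of the lazy-initialization pattern used by init_agents():
# the expensive dependency is built on first use, then reused.
class LazyPipeline:
    def __init__(self, factory):
        self._factory = factory   # deferred, possibly expensive constructor
        self._pipeline = None
        self.builds = 0           # instrumentation for this demo only

    def _ensure(self):
        # Build the pipeline exactly once, on first access.
        if self._pipeline is None:
            self._pipeline = self._factory()
            self.builds += 1
        return self._pipeline

    def search(self, query):
        return self._ensure()(query)


lazy = LazyPipeline(lambda: (lambda q: q.upper()))
print(lazy.search("tuxedo"))  # → TUXEDO (pipeline built here)
print(lazy.search("cat"))     # → CAT (pipeline reused, not rebuilt)
```

The payoff is the same as in the framework: startup stays fast, and API-backed agents are only constructed when a search actually happens.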
@@ -1,31 +0,0 @@
# Tuxedo Link Configuration
# Copy this file to config.yaml and adjust settings

# Email provider configuration
email:
  provider: mailgun  # Options: mailgun, sendgrid
  from_name: "Tuxedo Link"
  from_email: "noreply@tuxedolink.com"

# Mailgun configuration
mailgun:
  domain: "sandboxfd631e04f8a941d5a5993a11227ea098.mailgun.org"  # Your Mailgun domain
  # API key from environment: MAILGUN_API_KEY

# SendGrid configuration (if using sendgrid provider)
sendgrid:
  # API key from environment: SENDGRID_API_KEY
  # kept for backwards compatibility

# Deployment configuration
deployment:
  mode: local  # Options: local, production

  local:
    db_path: "data/tuxedo_link.db"
    vectordb_path: "cat_vectorstore"

  production:
    db_path: "/data/tuxedo_link.db"
    vectordb_path: "/data/cat_vectorstore"
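The `deployment.mode` key above selects between the `local` and `production` blocks. A helper in the spirit of `get_db_path()` (the real one lives in `utils/config.py` and is not shown here, so the code below is a hedged sketch, not the actual implementation) could resolve it like this:

```python
# Hypothetical sketch: resolve a path from the deployment section by mode,
# mirroring how a get_db_path()-style helper could read config.yaml.
# The dict stands in for the parsed YAML; keys match the config above.
config = {
    "deployment": {
        "mode": "local",
        "local": {"db_path": "data/tuxedo_link.db", "vectordb_path": "cat_vectorstore"},
        "production": {"db_path": "/data/tuxedo_link.db", "vectordb_path": "/data/cat_vectorstore"},
    }
}

def resolve(key: str) -> str:
    deployment = config["deployment"]
    mode = deployment["mode"]        # "local" or "production"
    return deployment[mode][key]     # look up the key inside the chosen block

print(resolve("db_path"))  # → data/tuxedo_link.db
```

Switching `mode` to `"production"` flips every resolved path at once, which is the point of nesting the two blocks under one selector.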
@@ -1,6 +0,0 @@
"""Database layer for Tuxedo Link."""

from .manager import DatabaseManager

__all__ = ["DatabaseManager"]
@@ -1,382 +0,0 @@
"""Database manager for Tuxedo Link."""

import sqlite3
import json
import os
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Generator, Dict, Any
import numpy as np
from contextlib import contextmanager

from models.cats import Cat, AdoptionAlert, CatProfile
from .schema import initialize_database


class DatabaseManager:
    """Manages all database operations for Tuxedo Link."""

    def __init__(self, db_path: str):
        """
        Initialize the database manager.

        Args:
            db_path: Path to SQLite database file
        """
        self.db_path = db_path

        # Create database directory if it doesn't exist
        db_dir = os.path.dirname(db_path)
        if db_dir and not os.path.exists(db_dir):
            os.makedirs(db_dir)

        # Initialize database if it doesn't exist
        if not os.path.exists(db_path):
            initialize_database(db_path)

    @contextmanager
    def get_connection(self) -> Generator[sqlite3.Connection, None, None]:
        """
        Context manager for database connections.

        Yields:
            SQLite database connection with row factory enabled
        """
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row  # Access columns by name
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            conn.close()

    # ===== ALERT OPERATIONS =====

    def create_alert(self, alert: AdoptionAlert) -> int:
        """
        Create a new adoption alert.

        Args:
            alert: AdoptionAlert object

        Returns:
            Alert ID
        """
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """INSERT INTO alerts
                   (user_email, profile_json, frequency, last_sent, active, last_match_ids)
                   VALUES (?, ?, ?, ?, ?, ?)""",
                (
                    alert.user_email,
                    alert.profile.model_dump_json(),
                    alert.frequency,
                    alert.last_sent.isoformat() if alert.last_sent else None,
                    alert.active,
                    json.dumps(alert.last_match_ids)
                )
            )
            return cursor.lastrowid

    def get_alert(self, alert_id: int) -> Optional[AdoptionAlert]:
        """Get alert by ID."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """SELECT id, user_email, profile_json, frequency,
                          last_sent, active, created_at, last_match_ids
                   FROM alerts WHERE id = ?""",
                (alert_id,)
            )
            row = cursor.fetchone()
            if row:
                return self._row_to_alert(row)
            return None

    def get_alerts_by_email(self, email: str, active_only: bool = False) -> List[AdoptionAlert]:
        """
        Get all alerts for a specific email address.

        Args:
            email: User email address
            active_only: If True, only return active alerts

        Returns:
            List of AdoptionAlert objects
        """
        with self.get_connection() as conn:
            cursor = conn.cursor()
            if active_only:
                cursor.execute(
                    """SELECT id, user_email, profile_json, frequency,
                              last_sent, active, created_at, last_match_ids
                       FROM alerts WHERE user_email = ? AND active = 1
                       ORDER BY created_at DESC""",
                    (email,)
                )
            else:
                cursor.execute(
                    """SELECT id, user_email, profile_json, frequency,
                              last_sent, active, created_at, last_match_ids
                       FROM alerts WHERE user_email = ?
                       ORDER BY created_at DESC""",
                    (email,)
                )

            return [self._row_to_alert(row) for row in cursor.fetchall()]

    def get_all_alerts(self, active_only: bool = False) -> List[AdoptionAlert]:
        """
        Get all alerts in the database.

        Args:
            active_only: If True, only return active alerts

        Returns:
            List of AdoptionAlert objects
        """
        with self.get_connection() as conn:
            cursor = conn.cursor()
            if active_only:
                query = """SELECT id, user_email, profile_json, frequency,
                                  last_sent, active, created_at, last_match_ids
                           FROM alerts WHERE active = 1
                           ORDER BY created_at DESC"""
            else:
                query = """SELECT id, user_email, profile_json, frequency,
                                  last_sent, active, created_at, last_match_ids
                           FROM alerts
                           ORDER BY created_at DESC"""

            cursor.execute(query)
            return [self._row_to_alert(row) for row in cursor.fetchall()]

    def get_active_alerts(self) -> List[AdoptionAlert]:
        """Get all active alerts across all users."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """SELECT id, user_email, profile_json, frequency,
                          last_sent, active, created_at, last_match_ids
                   FROM alerts WHERE active = 1"""
            )
            return [self._row_to_alert(row) for row in cursor.fetchall()]

    def get_alert_by_id(self, alert_id: int) -> Optional[AdoptionAlert]:
        """
        Get a specific alert by its ID.

        Args:
            alert_id: Alert ID to retrieve

        Returns:
            AdoptionAlert object or None if not found
        """
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """SELECT id, user_email, profile_json, frequency,
                          last_sent, active, created_at, last_match_ids
                   FROM alerts WHERE id = ?""",
                (alert_id,)
            )
            row = cursor.fetchone()
            return self._row_to_alert(row) if row else None

    def update_alert(self, alert_id: int, **kwargs) -> None:
        """Update alert fields."""
        allowed_fields = ['profile_json', 'frequency', 'last_sent', 'active', 'last_match_ids']
        updates = []
        values = []

        for field, value in kwargs.items():
            if field in allowed_fields:
                updates.append(f"{field} = ?")
                if field == 'last_sent' and isinstance(value, datetime):
                    values.append(value.isoformat())
                elif field == 'last_match_ids':
                    values.append(json.dumps(value))
                else:
                    values.append(value)

        if updates:
            values.append(alert_id)
            with self.get_connection() as conn:
                cursor = conn.cursor()
                cursor.execute(
                    f"UPDATE alerts SET {', '.join(updates)} WHERE id = ?",
                    values
                )

    def delete_alert(self, alert_id: int) -> None:
        """Delete an alert."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute("DELETE FROM alerts WHERE id = ?", (alert_id,))

    def _row_to_alert(self, row: sqlite3.Row) -> AdoptionAlert:
        """
        Convert database row to AdoptionAlert object.

        Args:
            row: SQLite row object from alerts table

        Returns:
            AdoptionAlert object with parsed JSON fields
        """
        return AdoptionAlert(
            id=row['id'],
            user_email=row['user_email'],
            profile=CatProfile.model_validate_json(row['profile_json']),
            frequency=row['frequency'],
            last_sent=datetime.fromisoformat(row['last_sent']) if row['last_sent'] else None,
            active=bool(row['active']),
            created_at=datetime.fromisoformat(row['created_at']) if row['created_at'] else datetime.now(),
            last_match_ids=json.loads(row['last_match_ids']) if row['last_match_ids'] else []
        )
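`create_alert`, `update_alert`, and `_row_to_alert` all rely on one convention: `datetime` values are stored in SQLite as ISO-8601 text and parsed back on read. The round-trip they depend on is just the stdlib pair `isoformat()` / `fromisoformat()`:

```python
from datetime import datetime

# SQLite has no native datetime type, so timestamps are stored as
# ISO-8601 strings and parsed back on read, as the methods above do.
sent = datetime(2024, 5, 1, 12, 30, 0)
stored = sent.isoformat()                  # value written to the last_sent column
restored = datetime.fromisoformat(stored)  # value recovered in _row_to_alert
print(stored)  # → 2024-05-01T12:30:00
```

Because ISO-8601 strings sort lexicographically in chronological order, comparisons like `fetched_at < ?` in SQL keep working on the text representation.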
    # ===== CAT CACHE OPERATIONS =====

    def cache_cat(self, cat: Cat, image_embedding: Optional[np.ndarray] = None) -> None:
        """
        Cache a cat in the database.

        Args:
            cat: Cat object
            image_embedding: Optional numpy array of image embedding
        """
        with self.get_connection() as conn:
            cursor = conn.cursor()

            # Serialize image embedding if provided
            embedding_bytes = None
            if image_embedding is not None:
                embedding_bytes = image_embedding.tobytes()

            cursor.execute(
                """INSERT OR REPLACE INTO cats_cache
                   (id, fingerprint, source, data_json, image_embedding, fetched_at, is_duplicate, duplicate_of)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
                (
                    cat.id,
                    cat.fingerprint,
                    cat.source,
                    cat.model_dump_json(),
                    embedding_bytes,
                    cat.fetched_at.isoformat(),
                    False,
                    None
                )
            )

    def get_cached_cat(self, cat_id: str) -> Optional[Tuple[Cat, Optional[np.ndarray]]]:
        """Get a cat from cache by ID."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """SELECT data_json, image_embedding FROM cats_cache
                   WHERE id = ? AND is_duplicate = 0""",
                (cat_id,)
            )
            row = cursor.fetchone()
            if row:
                cat = Cat.model_validate_json(row['data_json'])
                embedding = None
                if row['image_embedding']:
                    embedding = np.frombuffer(row['image_embedding'], dtype=np.float32)
                return cat, embedding
            return None

    def get_cats_by_fingerprint(self, fingerprint: str) -> List[Tuple[Cat, Optional[np.ndarray]]]:
        """Get all cats with a specific fingerprint."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                """SELECT data_json, image_embedding FROM cats_cache
                   WHERE fingerprint = ? AND is_duplicate = 0
                   ORDER BY fetched_at ASC""",
                (fingerprint,)
            )
            results = []
            for row in cursor.fetchall():
                cat = Cat.model_validate_json(row['data_json'])
                embedding = None
                if row['image_embedding']:
                    embedding = np.frombuffer(row['image_embedding'], dtype=np.float32)
                results.append((cat, embedding))
            return results

    def mark_as_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
        """Mark a cat as duplicate of another."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                "UPDATE cats_cache SET is_duplicate = 1, duplicate_of = ? WHERE id = ?",
                (canonical_id, duplicate_id)
            )

    def get_all_cached_cats(self, exclude_duplicates: bool = True) -> List[Cat]:
        """Get all cached cats."""
        with self.get_connection() as conn:
            cursor = conn.cursor()
            if exclude_duplicates:
                cursor.execute(
                    "SELECT data_json FROM cats_cache WHERE is_duplicate = 0 ORDER BY fetched_at DESC"
                )
            else:
                cursor.execute(
                    "SELECT data_json FROM cats_cache ORDER BY fetched_at DESC"
                )
            return [Cat.model_validate_json(row['data_json']) for row in cursor.fetchall()]

    def cleanup_old_cats(self, days: int = 30) -> int:
        """
        Remove cats older than specified days.

        Args:
            days: Number of days to keep

        Returns:
            Number of cats removed
        """
        cutoff_date = (datetime.now() - timedelta(days=days)).isoformat()
        with self.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                "DELETE FROM cats_cache WHERE fetched_at < ?",
                (cutoff_date,)
            )
            return cursor.rowcount

    def get_cache_stats(self) -> dict:
        """Get statistics about the cat cache."""
        with self.get_connection() as conn:
            cursor = conn.cursor()

            cursor.execute("SELECT COUNT(*) FROM cats_cache WHERE is_duplicate = 0")
            total = cursor.fetchone()[0]

            cursor.execute("SELECT COUNT(*) FROM cats_cache WHERE is_duplicate = 1")
            duplicates = cursor.fetchone()[0]

            cursor.execute("SELECT COUNT(DISTINCT source) FROM cats_cache WHERE is_duplicate = 0")
            sources = cursor.fetchone()[0]

            cursor.execute("""
                SELECT source, COUNT(*) as count
                FROM cats_cache
                WHERE is_duplicate = 0
                GROUP BY source
            """)
            by_source = {row['source']: row['count'] for row in cursor.fetchall()}

            return {
                'total_unique': total,
                'total_duplicates': duplicates,
                'sources': sources,
                'by_source': by_source
            }
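Every method above funnels through `get_connection`, which commits when the block succeeds and rolls back then re-raises when it fails. The same pattern can be demonstrated in isolation against an in-memory database (a simplified sketch: it reuses one connection instead of opening and closing per call):

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")  # shared here so the demo can inspect state

@contextmanager
def tx(connection):
    # Commit when the block succeeds; roll back and re-raise when it fails,
    # mirroring DatabaseManager.get_connection (minus open/close per call).
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise

with tx(conn) as c:
    c.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, email TEXT)")
    c.execute("INSERT INTO alerts (email) VALUES (?)", ("adopter@example.com",))

try:
    with tx(conn) as c:
        c.execute("INSERT INTO alerts (email) VALUES (?)", ("lost@example.com",))
        raise ValueError("boom")  # forces the rollback path
except ValueError:
    pass

count = conn.execute("SELECT COUNT(*) FROM alerts").fetchone()[0]
print(count)  # → 1 (the failed insert was rolled back)
```

Centralizing commit/rollback this way is what lets each `DatabaseManager` method stay a plain `with self.get_connection() as conn:` block with no error handling of its own.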
@@ -1,131 +0,0 @@
|
||||
"""SQLite database schema for Tuxedo Link."""
|
||||
|
||||
import sqlite3
|
||||
from typing import Optional
|
||||
|
||||
|
||||
SCHEMA_VERSION = 2
|
||||
|
||||
# SQL statements for creating tables
|
||||
CREATE_ALERTS_TABLE = """
|
||||
CREATE TABLE IF NOT EXISTS alerts (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
user_email TEXT NOT NULL,
|
||||
profile_json TEXT NOT NULL,
|
||||
frequency TEXT NOT NULL CHECK(frequency IN ('immediately', 'daily', 'weekly')),
|
||||
last_sent TIMESTAMP,
|
||||
active BOOLEAN DEFAULT 1,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
last_match_ids TEXT DEFAULT '[]'
|
||||
);
|
||||
"""
|
||||
|
||||
CREATE_CATS_CACHE_TABLE = """
|
||||
CREATE TABLE IF NOT EXISTS cats_cache (
|
||||
id TEXT PRIMARY KEY,
|
||||
fingerprint TEXT NOT NULL,
    source TEXT NOT NULL,
    data_json TEXT NOT NULL,
    image_embedding BLOB,
    fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_duplicate BOOLEAN DEFAULT 0,
    duplicate_of TEXT,
    FOREIGN KEY (duplicate_of) REFERENCES cats_cache(id) ON DELETE SET NULL
);
"""

CREATE_SCHEMA_VERSION_TABLE = """
CREATE TABLE IF NOT EXISTS schema_version (
    version INTEGER PRIMARY KEY,
    applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""

# Index statements
CREATE_INDEXES = [
    "CREATE INDEX IF NOT EXISTS idx_fingerprint ON cats_cache(fingerprint);",
    "CREATE INDEX IF NOT EXISTS idx_source ON cats_cache(source);",
    "CREATE INDEX IF NOT EXISTS idx_fetched_at ON cats_cache(fetched_at);",
    "CREATE INDEX IF NOT EXISTS idx_is_duplicate ON cats_cache(is_duplicate);",
    "CREATE INDEX IF NOT EXISTS idx_alerts_email ON alerts(user_email);",
    "CREATE INDEX IF NOT EXISTS idx_alerts_active ON alerts(active);",
]


def initialize_database(db_path: str) -> None:
    """
    Initialize the database with all tables and indexes.

    Args:
        db_path: Path to the SQLite database file
    """
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    try:
        # Create tables
        cursor.execute(CREATE_ALERTS_TABLE)
        cursor.execute(CREATE_CATS_CACHE_TABLE)
        cursor.execute(CREATE_SCHEMA_VERSION_TABLE)

        # Create indexes
        for index_sql in CREATE_INDEXES:
            cursor.execute(index_sql)

        # Check and set the schema version
        cursor.execute("SELECT version FROM schema_version ORDER BY version DESC LIMIT 1")
        result = cursor.fetchone()

        if result is None:
            cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
        elif result[0] < SCHEMA_VERSION:
            # Future: add migration logic here before bumping the version
            cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))

        conn.commit()
        print(f"Database initialized successfully at {db_path}")

    except Exception as e:
        conn.rollback()
        raise RuntimeError(f"Failed to initialize database: {e}") from e

    finally:
        conn.close()


def drop_all_tables(db_path: str) -> None:
    """
    Drop all tables (useful for testing).

    Args:
        db_path: Path to the SQLite database file
    """
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    try:
        cursor.execute("DROP TABLE IF EXISTS cats_cache")
        cursor.execute("DROP TABLE IF EXISTS alerts")
        cursor.execute("DROP TABLE IF EXISTS schema_version")
        conn.commit()
        print("All tables dropped successfully")

    except Exception as e:
        conn.rollback()
        raise RuntimeError(f"Failed to drop tables: {e}") from e

    finally:
        conn.close()


if __name__ == "__main__":
    # For ad-hoc testing: rebuild a scratch database from a clean slate
    import os

    test_db = "test_database.db"

    if os.path.exists(test_db):
        os.remove(test_db)

    initialize_database(test_db)
    print(f"Test database created at {test_db}")
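The `initialize_database` function above leaves a placeholder where migration logic would go. One way that placeholder could be filled in is a version-keyed table of SQL statements applied in order; this is a sketch only, and the example migration body (`last_seen_at`) is hypothetical, not part of the project:

```python
import sqlite3

# Hypothetical migrations: schema version -> SQL statements to reach it.
MIGRATIONS = {
    2: ["ALTER TABLE cats_cache ADD COLUMN last_seen_at TIMESTAMP"],  # example only
}


def migrate(conn: sqlite3.Connection, current: int, target: int) -> None:
    """Apply each pending migration in order, recording every version reached."""
    cursor = conn.cursor()
    for version in range(current + 1, target + 1):
        for statement in MIGRATIONS.get(version, []):
            cursor.execute(statement)
        cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()
```

Applying migrations one version at a time keeps the `schema_version` table an accurate history even if a later step fails.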
@@ -1,147 +0,0 @@
#!/bin/bash
set -e

# Colors
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

echo "=========================================="
echo " Tuxedo Link - Modal Deployment"
echo "=========================================="
echo ""

# Check that the Modal CLI is installed
if ! command -v modal &> /dev/null; then
    echo -e "${RED}Error: modal CLI not found${NC}"
    echo "Install with: pip install modal"
    exit 1
fi

# Check Modal auth
echo -e "${BLUE}Checking Modal authentication...${NC}"
if ! uv run python -m modal app list &>/dev/null; then
    echo -e "${RED}Error: Modal authentication not configured${NC}"
    echo "Run: uv run python -m modal setup"
    exit 1
fi
echo -e "${GREEN}✓ Modal authenticated${NC}"
echo ""

# Check that config.yaml exists
if [ ! -f "config.yaml" ]; then
    echo -e "${RED}Error: config.yaml not found${NC}"
    echo "Copy config.example.yaml to config.yaml and configure it"
    exit 1
fi

echo -e "${BLUE}Step 1: Validating configuration...${NC}"
python -c "
import yaml
import sys
try:
    config = yaml.safe_load(open('config.yaml'))
    if config['deployment']['mode'] != 'production':
        print('❌ Error: Set deployment.mode to \"production\" in config.yaml for deployment')
        sys.exit(1)
    print('✓ Configuration valid')
except Exception as e:
    print(f'❌ Error reading config: {e}')
    sys.exit(1)
"
# (set -e aborts the script here if validation failed)

echo ""
echo -e "${BLUE}Step 2: Setting up Modal secrets...${NC}"

# Check whether the required environment variables are set
if [ -z "$OPENAI_API_KEY" ] || [ -z "$PETFINDER_API_KEY" ] || [ -z "$MAILGUN_API_KEY" ]; then
    echo -e "${YELLOW}Warning: Some environment variables are not set.${NC}"
    echo "Make sure the following are set in your environment or .env file:"
    echo "  - OPENAI_API_KEY"
    echo "  - PETFINDER_API_KEY"
    echo "  - PETFINDER_SECRET"
    echo "  - RESCUEGROUPS_API_KEY"
    echo "  - MAILGUN_API_KEY"
    echo "  - SENDGRID_API_KEY (optional)"
    echo ""
    read -p "Continue anyway? (y/N) " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        exit 1
    fi
fi

# Load .env if it exists (set -a exports every variable the file defines,
# and handles quoted values with spaces)
if [ -f ".env" ]; then
    set -a
    source .env
    set +a
fi

modal secret create tuxedo-link-secrets \
    OPENAI_API_KEY="${OPENAI_API_KEY}" \
    PETFINDER_API_KEY="${PETFINDER_API_KEY}" \
    PETFINDER_SECRET="${PETFINDER_SECRET}" \
    RESCUEGROUPS_API_KEY="${RESCUEGROUPS_API_KEY}" \
    MAILGUN_API_KEY="${MAILGUN_API_KEY}" \
    SENDGRID_API_KEY="${SENDGRID_API_KEY:-}" \
    --force 2>/dev/null || echo -e "${GREEN}✓ Secrets updated${NC}"

echo ""
echo -e "${BLUE}Step 3: Creating Modal volume...${NC}"
modal volume create tuxedo-link-data 2>/dev/null && echo -e "${GREEN}✓ Volume created${NC}" || echo -e "${GREEN}✓ Volume already exists${NC}"

echo ""
echo -e "${BLUE}Step 4: Copying config to Modal volume...${NC}"
# Create the scripts directory if it doesn't exist
mkdir -p scripts

# Upload config.yaml to the Modal volume
python scripts/upload_config_to_modal.py

echo ""
echo -e "${BLUE}Step 5: Deploying Modal API...${NC}"
modal deploy modal_services/modal_api.py

echo ""
echo -e "${BLUE}Step 6: Deploying scheduled search service...${NC}"
modal deploy modal_services/scheduled_search.py

echo ""
echo "=========================================="
echo -e " ${GREEN}Deployment Complete!${NC}"
echo "=========================================="
echo ""
echo "Deployed services:"
echo ""
echo "📡 Modal API (tuxedo-link-api):"
echo "   - search_cats()"
echo "   - extract_profile()"
echo "   - create_alert_and_notify()"
echo "   - get_alerts()"
echo "   - update_alert()"
echo "   - delete_alert()"
echo "   - health_check()"
echo ""
echo "⏰ Scheduled Jobs (tuxedo-link-scheduled-search):"
echo "   - daily_search_job (9 AM UTC daily)"
echo "   - weekly_search_job (Monday 9 AM UTC)"
echo "   - weekly_cleanup_job (Sunday 2 AM UTC)"
echo ""
echo "Useful commands:"
echo "  API logs:       modal app logs tuxedo-link-api --follow"
echo "  Schedule logs:  modal app logs tuxedo-link-scheduled-search --follow"
echo "  View apps:      modal app list"
echo "  View volumes:   modal volume list"
echo "  View secrets:   modal secret list"
echo ""
echo "Next steps:"
echo "  1. Run UI: ./run.sh"
echo "  2. Go to: http://localhost:7860"
echo "  3. Test search and alerts!"
echo "=========================================="
@@ -1,68 +0,0 @@
## 🚀 Modal Deployment Guide

This guide covers deploying Tuxedo Link to Modal for production use.

---

## 🏗️ Production Architecture

In production mode, Tuxedo Link uses a **hybrid architecture**:

### Component Distribution

**Local (Your Computer)**:
- Gradio UI (`app.py`) - user interface only
- No heavy ML models loaded
- Fast startup

**Modal (Cloud)**:
- `modal_api.py` - main API functions (profile extraction, search, alerts)
- `scheduled_search.py` - scheduled jobs (daily/weekly alerts, cleanup)
- Database (SQLite on a Modal volume)
- Vector DB (ChromaDB on a Modal volume)
- All ML models (GPT-4, SentenceTransformer, CLIP)

### Communication Flow

```
User → Gradio UI (local) → modal.Function.from_name().remote() → Modal API → Response → UI
```

**Key Functions Exposed by Modal**:
1. `extract_profile` - Convert natural language to a CatProfile
2. `search_cats` - Execute the complete search pipeline
3. `create_alert_and_notify` - Create an alert, with an optional immediate email
4. `get_alerts` / `update_alert` / `delete_alert` - Alert management

---

## 📋 Quick Start (Automated Deployment)

The easiest way to deploy is with the automated deployment script:

```bash
cd week8/community_contributions/dkisselev-zz/tuxedo_link

# 1. Configure config.yaml for production
cp config.example.yaml config.yaml
# Edit config.yaml and set deployment.mode to 'production'

# 2. Ensure environment variables are set
# Load from .env or set manually:
export OPENAI_API_KEY=sk-...
export PETFINDER_API_KEY=...
export PETFINDER_SECRET=...
export RESCUEGROUPS_API_KEY=...
export MAILGUN_API_KEY=...

# 3. Run the deployment script
./deploy.sh
```

The script will automatically:
- ✅ Validate Modal authentication
- ✅ Check configuration
- ✅ Create/update Modal secrets
- ✅ Create the Modal volume
- ✅ Upload config.yaml to Modal
- ✅ Deploy the Modal API and scheduled search services
File diff suppressed because it is too large
@@ -1,487 +0,0 @@
# 🏗️ Tuxedo Link - Architecture Diagrams

**Date**: October 27, 2024
**Tool**: [Eraser.io](https://www.eraser.io/)

---

## System Architecture

These diagrams can be rendered on [Eraser.io](https://www.eraser.io/) or any diagramming tool that supports a compatible diagram-as-code format.

### High-Level Architecture

```eraser
// Tuxedo Link - High-Level System Architecture

// External APIs
openai [icon: openai, color: green]
petfinder [icon: api, color: blue]
rescuegroups [icon: api, color: blue]
sendgrid [icon: email, color: red]

// Frontend Layer
gradio [icon: browser, color: purple] {
  search_tab
  alerts_tab
  about_tab
}

// Application Layer
framework [icon: server, color: orange] {
  TuxedoLinkFramework
}

// Agent Layer
agents [icon: users, color: cyan] {
  PlanningAgent
  ProfileAgent
  PetfinderAgent
  RescueGroupsAgent
  DeduplicationAgent
  MatchingAgent
  EmailAgent
}

// Data Layer
databases [icon: database, color: gray] {
  SQLite
  ChromaDB
}

// Deployment
modal [icon: cloud, color: blue] {
  scheduled_jobs
  volume_storage
}

// Connections
gradio > framework: User requests
framework > agents: Orchestrate
agents > openai: Profile extraction
agents > petfinder: Search cats
agents > rescuegroups: Search cats
agents > sendgrid: Send notifications
agents > databases: Store/retrieve
framework > databases: Manage data
modal > framework: Scheduled searches
modal > databases: Persistent storage
```

---

## Detailed Component Architecture

```eraser
// Tuxedo Link - Detailed Component Architecture

// Users
user [icon: user, color: purple]

// Frontend - Gradio UI
ui_layer [color: #E8F5E9] {
  gradio_app [label: "Gradio Application"]
  search_interface [label: "Search Tab"]
  alerts_interface [label: "Alerts Tab"]
  about_interface [label: "About Tab"]

  gradio_app > search_interface
  gradio_app > alerts_interface
  gradio_app > about_interface
}

// Framework Layer
framework_layer [color: #FFF3E0] {
  tuxedo_framework [label: "TuxedoLinkFramework", icon: server]
  user_manager [label: "UserManager", icon: user]

  tuxedo_framework > user_manager
}

// Orchestration Layer
orchestration [color: #E3F2FD] {
  planning_agent [label: "PlanningAgent\n(Orchestrator)", icon: brain]
}

// Processing Agents
processing_agents [color: #F3E5F5] {
  profile_agent [label: "ProfileAgent\n(GPT-4)", icon: chat]
  matching_agent [label: "MatchingAgent\n(Hybrid Search)", icon: search]
  dedup_agent [label: "DeduplicationAgent\n(Fingerprint+CLIP)", icon: filter]
}

// External Integration Agents
external_agents [color: #E0F2F1] {
  petfinder_agent [label: "PetfinderAgent\n(OAuth)", icon: api]
  rescuegroups_agent [label: "RescueGroupsAgent\n(API Key)", icon: api]
  email_agent [label: "EmailAgent\n(SendGrid)", icon: email]
}

// Data Storage
storage_layer [color: #ECEFF1] {
  sqlite_db [label: "SQLite Database", icon: database]
  vector_db [label: "ChromaDB\n(Vector Store)", icon: database]

  db_tables [label: "Tables"] {
    users_table [label: "users"]
    alerts_table [label: "alerts"]
    cats_cache_table [label: "cats_cache"]
  }

  vector_collections [label: "Collections"] {
    cats_collection [label: "cats_embeddings"]
  }

  sqlite_db > db_tables
  vector_db > vector_collections
}

// External Services
external_services [color: #FFEBEE] {
  openai_api [label: "OpenAI API\n(GPT-4)", icon: openai]
  petfinder_api [label: "Petfinder API\n(OAuth 2.0)", icon: api]
  rescuegroups_api [label: "RescueGroups API\n(API Key)", icon: api]
  sendgrid_api [label: "SendGrid API\n(Email)", icon: email]
}

// Deployment Layer
deployment [color: #E8EAF6] {
  modal_service [label: "Modal (Serverless)", icon: cloud]

  modal_functions [label: "Functions"] {
    daily_job [label: "daily_search_job"]
    weekly_job [label: "weekly_search_job"]
    cleanup_job [label: "cleanup_job"]
  }

  modal_storage [label: "Storage"] {
    volume [label: "Modal Volume\n(/data)"]
  }

  modal_service > modal_functions
  modal_service > modal_storage
}

// User Flows
user > ui_layer: Interact
ui_layer > framework_layer: API calls
framework_layer > orchestration: Search request

// Orchestration Flow
orchestration > processing_agents: Extract profile
orchestration > external_agents: Fetch cats
orchestration > processing_agents: Deduplicate
orchestration > processing_agents: Match & rank
orchestration > storage_layer: Cache results

// Agent to External Services
processing_agents > external_services: Profile extraction
external_agents > external_services: API requests
external_agents > external_services: Send emails

// Agent to Storage
processing_agents > storage_layer: Store/retrieve
external_agents > storage_layer: Cache & embeddings
orchestration > storage_layer: Query & update

// Modal Integration
deployment > framework_layer: Scheduled tasks
deployment > storage_layer: Persistent data
```

---

## Data Flow Diagram

```eraser
// Tuxedo Link - Search Data Flow

user [icon: user]

// Step 1: User Input
user_input [label: "1. User Input\n'friendly playful cat\nin NYC'"]

// Step 2: Profile Extraction
profile_extraction [label: "2. Profile Agent\n(OpenAI GPT-4)", icon: chat, color: purple]
extracted_profile [label: "CatProfile\n- location: NYC\n- age: young\n- personality: friendly"]

// Step 3: API Fetching (Parallel)
api_fetch [label: "3. Fetch from APIs\n(Parallel)", icon: api, color: blue]
petfinder_results [label: "Petfinder\n50 cats"]
rescuegroups_results [label: "RescueGroups\n50 cats"]

// Step 4: Deduplication
dedup [label: "4. Deduplication\n(3-tier)", icon: filter, color: orange]
dedup_details [label: "- Fingerprint\n- Text similarity\n- Image similarity"]

// Step 5: Cache & Embed
cache [label: "5. Cache & Embed", icon: database, color: gray]
sqlite_cache [label: "SQLite\n(Cat data)"]
vector_store [label: "ChromaDB\n(Embeddings)"]

// Step 6: Hybrid Matching
matching [label: "6. Hybrid Search\n60% vector\n40% metadata", icon: search, color: green]

// Step 7: Results
results [label: "7. Ranked Results\nTop 20 matches"]

// Step 8: Display
display [label: "8. Display to User\nwith explanations", icon: browser, color: purple]

// Flow connections
user > user_input
user_input > profile_extraction
profile_extraction > extracted_profile
extracted_profile > api_fetch

api_fetch > petfinder_results
api_fetch > rescuegroups_results

petfinder_results > dedup
rescuegroups_results > dedup
dedup > dedup_details

dedup > cache
cache > sqlite_cache
cache > vector_store

sqlite_cache > matching
vector_store > matching

matching > results
results > display
display > user
```
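Step 4's first tier, the fingerprint, can be sketched as a hash of normalized stable listing fields. The exact fields the project hashes are not shown here, so the choice of `name`, `breed`, and `organization_name` is an assumption based on the schema:

```python
import hashlib


def fingerprint(name: str, breed: str, organization: str) -> str:
    """First-tier duplicate key: hash of normalized, stable listing fields."""
    normalized = "|".join(part.strip().lower() for part in (name, breed, organization))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(cats: list[dict]) -> list[dict]:
    """Keep only the first listing seen for each fingerprint."""
    seen: set[str] = set()
    unique = []
    for cat in cats:
        key = fingerprint(cat["name"], cat["breed"], cat["organization_name"])
        if key not in seen:
            seen.add(key)
            unique.append(cat)
    return unique
```

The normalization step is what lets the same cat cross-posted on Petfinder and RescueGroups collapse to one key despite casing and whitespace differences; the text- and image-similarity tiers then catch duplicates the fingerprint misses.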

---

## Agent Interaction Diagram

```eraser
// Tuxedo Link - Agent Interactions

// Planning Agent (Orchestrator)
planner [label: "PlanningAgent\n(Orchestrator)", icon: brain, color: orange]

// Worker Agents
profile [label: "ProfileAgent", icon: chat, color: purple]
petfinder [label: "PetfinderAgent", icon: api, color: blue]
rescue [label: "RescueGroupsAgent", icon: api, color: blue]
dedup [label: "DeduplicationAgent", icon: filter, color: cyan]
matching [label: "MatchingAgent", icon: search, color: green]
email [label: "EmailAgent", icon: email, color: red]

// Data Stores
db [label: "DatabaseManager", icon: database, color: gray]
vectordb [label: "VectorDBManager", icon: database, color: gray]

// External
openai [label: "OpenAI API", icon: openai, color: green]
apis [label: "External APIs", icon: api, color: blue]
sendgrid [label: "SendGrid", icon: email, color: red]

// Orchestration
planner > profile: 1. Extract preferences
profile > openai: API call
openai > profile: Structured output
profile > planner: CatProfile

planner > petfinder: 2. Search (parallel)
planner > rescue: 2. Search (parallel)
petfinder > apis: API request
rescue > apis: API request
apis > petfinder: Cat data
apis > rescue: Cat data
petfinder > planner: Cats list
rescue > planner: Cats list

planner > dedup: 3. Remove duplicates
dedup > db: Check cache
db > dedup: Cached embeddings
dedup > planner: Unique cats

planner > db: 4. Cache results
planner > vectordb: 5. Update embeddings

planner > matching: 6. Find matches
matching > vectordb: Vector search
matching > db: Metadata filter
vectordb > matching: Similar cats
db > matching: Filtered cats
matching > planner: Ranked matches

planner > email: 7. Send notifications (if alert)
email > sendgrid: API call
sendgrid > email: Delivery status
```
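The MatchingAgent step above blends a vector search against ChromaDB with a metadata filter against SQLite. The Data Flow diagram gives the blend as 60% vector and 40% metadata; everything beyond those weights in this sketch (field names, score scale) is an assumption:

```python
def hybrid_score(vector_similarity: float, metadata_score: float,
                 vector_weight: float = 0.6) -> float:
    """Blend semantic similarity with structured-filter agreement (both in [0, 1])."""
    return vector_weight * vector_similarity + (1 - vector_weight) * metadata_score


def rank(candidates: list[dict], top_k: int = 20) -> list[dict]:
    """Sort candidates by blended score, highest first, keeping the top k."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["vector_similarity"], c["metadata_score"]),
        reverse=True,
    )[:top_k]
```

Weighting the vector side higher lets personality-style matches dominate, while the metadata term still rewards cats that satisfy the structured filters (location, age, breed).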

---

## Deployment Architecture

```eraser
// Tuxedo Link - Modal Deployment

// Local Development
local [label: "Local Development", icon: laptop, color: purple] {
  gradio_dev [label: "Gradio UI\n:7860"]
  dev_db [label: "SQLite DB\n./data/"]
  dev_vector [label: "ChromaDB\n./cat_vectorstore/"]
}

// Modal Cloud
modal [label: "Modal Cloud", icon: cloud, color: blue] {
  // Scheduled Functions
  scheduled [label: "Scheduled Functions"] {
    daily [label: "daily_search_job\nCron: 0 9 * * *"]
    weekly [label: "weekly_search_job\nCron: 0 9 * * 1"]
    cleanup [label: "cleanup_job\nCron: 0 2 * * 0"]
  }

  // On-Demand Functions
  ondemand [label: "On-Demand"] {
    manual_search [label: "run_scheduled_searches()"]
    manual_cleanup [label: "cleanup_old_data()"]
  }

  // Storage
  storage [label: "Modal Volume\n/data"] {
    vol_db [label: "tuxedo_link.db"]
    vol_vector [label: "cat_vectorstore/"]
  }

  // Secrets
  secrets [label: "Secrets"] {
    api_keys [label: "- OPENAI_API_KEY\n- PETFINDER_*\n- RESCUEGROUPS_*\n- SENDGRID_*"]
  }
}

// External Services
external [label: "External Services", icon: cloud, color: red] {
  openai [label: "OpenAI"]
  petfinder [label: "Petfinder"]
  rescue [label: "RescueGroups"]
  sendgrid [label: "SendGrid"]
}

// Connections
local > modal: Deploy
modal > storage: Persistent data
modal > secrets: Load keys
scheduled > storage: Read/Write
ondemand > storage: Read/Write
modal > external: API calls
```

---

## Database Schema

```eraser
// Tuxedo Link - Database Schema

// Users Table
users [icon: table, color: blue] {
  id [label: "id: INTEGER PK"]
  email [label: "email: TEXT UNIQUE"]
  password_hash [label: "password_hash: TEXT"]
  created_at [label: "created_at: DATETIME"]
  last_login [label: "last_login: DATETIME"]
}

// Alerts Table
alerts [icon: table, color: green] {
  aid [label: "id: INTEGER PK"]
  user_id [label: "user_id: INTEGER FK"]
  user_email [label: "user_email: TEXT"]
  profile_json [label: "profile_json: TEXT"]
  frequency [label: "frequency: TEXT"]
  last_sent [label: "last_sent: DATETIME"]
  active [label: "active: INTEGER"]
  created_at [label: "created_at: DATETIME"]
  last_match_ids [label: "last_match_ids: TEXT"]
}

// Cats Cache Table
cats_cache [icon: table, color: orange] {
  cid [label: "id: TEXT PK"]
  name [label: "name: TEXT"]
  breed [label: "breed: TEXT"]
  age [label: "age: TEXT"]
  gender [label: "gender: TEXT"]
  size [label: "size: TEXT"]
  organization_name [label: "organization_name: TEXT"]
  city [label: "city: TEXT"]
  state [label: "state: TEXT"]
  source [label: "source: TEXT"]
  url [label: "url: TEXT"]
  cat_json [label: "cat_json: TEXT"]
  fingerprint [label: "fingerprint: TEXT"]
  image_embedding [label: "image_embedding: BLOB"]
  is_duplicate [label: "is_duplicate: INTEGER"]
  duplicate_of [label: "duplicate_of: TEXT"]
  fetched_at [label: "fetched_at: DATETIME"]
  created_at [label: "created_at: DATETIME"]
}

// ChromaDB Collection
vector_collection [icon: database, color: purple] {
  cats_embeddings [label: "Collection: cats_embeddings"]
  embedding_dim [label: "Dimensions: 384"]
  model [label: "Model: all-MiniLM-L6-v2"]
  metadata [label: "Metadata: name, breed, age, etc."]
}

// Relationships
users > alerts: user_id
alerts > cats_cache: Search results
cats_cache > vector_collection: Embeddings
```
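The `image_embedding: BLOB` column above stores a vector as raw bytes. One way to serialize a float32 embedding for SQLite and recover it losslessly is a packed little-endian float array; this is a sketch, since the schema does not show the project's actual serialization format:

```python
import struct


def embedding_to_blob(vec: list[float]) -> bytes:
    """Pack a vector as little-endian float32 bytes for a SQLite BLOB column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def blob_to_embedding(blob: bytes) -> list[float]:
    """Recover the float32 vector from the stored bytes (4 bytes per value)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

Fixing the byte order (`<`) and width (float32) in the format string keeps blobs portable across machines; the same approach would hold a 384- or 512-dimensional CLIP vector.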

---

## Diagram Types Included

1. **System Architecture** - High-level overview of all components
2. **Detailed Component Architecture** - Deep dive into layers and connections
3. **Data Flow Diagram** - Step-by-step search process
4. **Agent Interaction Diagram** - How agents communicate
5. **Deployment Architecture** - Modal cloud deployment
6. **Database Schema** - Data model and relationships

---

## Architecture Highlights

### Layered Architecture
```
┌─────────────────────────────────────┐
│  Frontend Layer (Gradio UI)         │
├─────────────────────────────────────┤
│  Framework Layer (Orchestration)    │
├─────────────────────────────────────┤
│  Agent Layer (7 Specialized Agents) │
├─────────────────────────────────────┤
│  Data Layer (SQLite + ChromaDB)     │
├─────────────────────────────────────┤
│  External APIs (4 Services)         │
└─────────────────────────────────────┘
```

### Key Design Patterns

- **Agent Pattern**: Specialized agents for different tasks
- **Orchestrator Pattern**: Planning agent coordinates the workflow
- **Repository Pattern**: DatabaseManager abstracts data access
- **Strategy Pattern**: Different search strategies (Petfinder, RescueGroups)
- **Decorator Pattern**: Rate limiting and timing decorators
- **Observer Pattern**: Scheduled jobs watch for new alerts
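The rate-limiting and timing decorators mentioned above might look like the following sketch (names and intervals are assumptions; the project's actual decorators are not shown here):

```python
import functools
import time


def timed(fn):
    """Print how long each wrapped call took (sketch of the timing decorator)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
    return wrapper


def rate_limited(min_interval: float):
    """Ensure at least `min_interval` seconds pass between successive calls."""
    def decorate(fn):
        last_call = [0.0]  # mutable cell shared across calls

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorate
```

Decorators like these keep throttling and instrumentation out of the Petfinder/RescueGroups call sites themselves, which is the point of the pattern.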

### Technology Stack

- **Frontend**: Gradio
- **Backend**: Python 3.12
- **Framework**: Custom agent-based
- **Databases**: SQLite, ChromaDB
- **AI/ML**: OpenAI GPT-4, CLIP, SentenceTransformers
- **Deployment**: Modal (Serverless)
- **APIs**: Petfinder, RescueGroups, SendGrid
@@ -1,55 +0,0 @@
// Tuxedo Link - Agent Interactions

// Planning Agent (Orchestrator)
planner [label: "PlanningAgent\n(Orchestrator)", icon: brain, color: orange]

// Worker Agents
profile [label: "ProfileAgent", icon: chat, color: purple]
petfinder [label: "PetfinderAgent", icon: api, color: blue]
rescue [label: "RescueGroupsAgent", icon: api, color: blue]
dedup [label: "DeduplicationAgent", icon: filter, color: cyan]
matching [label: "MatchingAgent", icon: search, color: green]
email [label: "EmailAgent", icon: email, color: red]

// Data Stores
db [label: "DatabaseManager", icon: database, color: gray]
vectordb [label: "VectorDBManager", icon: database, color: gray]

// External
openai [label: "OpenAI API", icon: openai, color: green]
apis [label: "External APIs", icon: api, color: blue]
sendgrid [label: "SendGrid", icon: email, color: red]

// Orchestration
planner > profile: 1. Extract preferences
profile > openai: API call
openai > profile: Structured output
profile > planner: CatProfile

planner > petfinder: 2. Search (parallel)
planner > rescue: 2. Search (parallel)
petfinder > apis: API request
rescue > apis: API request
apis > petfinder: Cat data
apis > rescue: Cat data
petfinder > planner: Cats list
rescue > planner: Cats list

planner > dedup: 3. Remove duplicates
dedup > db: Check cache
db > dedup: Cached embeddings
dedup > planner: Unique cats

planner > db: 4. Cache results
planner > vectordb: 5. Update embeddings

planner > matching: 6. Find matches
matching > vectordb: Vector search
matching > db: Metadata filter
vectordb > matching: Similar cats
db > matching: Filtered cats
matching > planner: Ranked matches

planner > email: 7. Send notifications (if alert)
email > sendgrid: API call
sendgrid > email: Delivery status
File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 586 KiB |
@@ -1,114 +0,0 @@
// Tuxedo Link - Detailed Component Architecture

// Users
user [icon: user, color: purple]

// Frontend - Gradio UI
ui_layer [color: #E8F5E9] {
  gradio_app [label: "Gradio Application"]
  search_interface [label: "Search Tab"]
  alerts_interface [label: "Alerts Tab"]
  about_interface [label: "About Tab"]

  gradio_app > search_interface
  gradio_app > alerts_interface
  gradio_app > about_interface
}

// Framework Layer
framework_layer [color: #FFF3E0] {
  tuxedo_framework [label: "TuxedoLinkFramework", icon: server]
  user_manager [label: "UserManager", icon: user]

  tuxedo_framework > user_manager
}

// Orchestration Layer
orchestration [color: #E3F2FD] {
  planning_agent [label: "PlanningAgent\n(Orchestrator)", icon: brain]
}

// Processing Agents
processing_agents [color: #F3E5F5] {
  profile_agent [label: "ProfileAgent\n(GPT-4)", icon: chat]
  matching_agent [label: "MatchingAgent\n(Hybrid Search)", icon: search]
  dedup_agent [label: "DeduplicationAgent\n(Fingerprint+CLIP)", icon: filter]
}

// External Integration Agents
external_agents [color: #E0F2F1] {
  petfinder_agent [label: "PetfinderAgent\n(OAuth)", icon: api]
  rescuegroups_agent [label: "RescueGroupsAgent\n(API Key)", icon: api]
  email_agent [label: "EmailAgent\n(SendGrid)", icon: email]
}

// Data Storage
storage_layer [color: #ECEFF1] {
  sqlite_db [label: "SQLite Database", icon: database]
  vector_db [label: "ChromaDB\n(Vector Store)", icon: database]

  db_tables [label: "Tables"] {
    users_table [label: "users"]
    alerts_table [label: "alerts"]
    cats_cache_table [label: "cats_cache"]
  }

  vector_collections [label: "Collections"] {
    cats_collection [label: "cats_embeddings"]
  }

  sqlite_db > db_tables
  vector_db > vector_collections
}

// External Services
external_services [color: #FFEBEE] {
  openai_api [label: "OpenAI API\n(GPT-4)", icon: openai]
  petfinder_api [label: "Petfinder API\n(OAuth 2.0)", icon: api]
  rescuegroups_api [label: "RescueGroups API\n(API Key)", icon: api]
  sendgrid_api [label: "SendGrid API\n(Email)", icon: email]
}

// Deployment Layer
deployment [color: #E8EAF6] {
  modal_service [label: "Modal (Serverless)", icon: cloud]

  modal_functions [label: "Functions"] {
    daily_job [label: "daily_search_job"]
    weekly_job [label: "weekly_search_job"]
    cleanup_job [label: "cleanup_job"]
  }

  modal_storage [label: "Storage"] {
    volume [label: "Modal Volume\n(/data)"]
  }

  modal_service > modal_functions
  modal_service > modal_storage
}

// User Flows
user > ui_layer: Interact
ui_layer > framework_layer: API calls
framework_layer > orchestration: Search request

// Orchestration Flow
orchestration > processing_agents: Extract profile
orchestration > external_agents: Fetch cats
orchestration > processing_agents: Deduplicate
orchestration > processing_agents: Match & rank
orchestration > storage_layer: Cache results

// Agent to External Services
processing_agents > external_services: Profile extraction
external_agents > external_services: API requests
external_agents > external_services: Send emails

// Agent to Storage
processing_agents > storage_layer: Store/retrieve
external_agents > storage_layer: Cache & embeddings
orchestration > storage_layer: Query & update

// Modal Integration
deployment > framework_layer: Scheduled tasks
deployment > storage_layer: Persistent data
File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 1.1 MiB |
@@ -1,58 +0,0 @@
// Tuxedo Link - Database Schema

// Users Table
users [icon: table, color: blue] {
  id [label: "id: INTEGER PK"]
  email [label: "email: TEXT UNIQUE"]
  password_hash [label: "password_hash: TEXT"]
  created_at [label: "created_at: DATETIME"]
  last_login [label: "last_login: DATETIME"]
}

// Alerts Table
alerts [icon: table, color: green] {
  aid [label: "id: INTEGER PK"]
  user_id [label: "user_id: INTEGER FK"]
  user_email [label: "user_email: TEXT"]
  profile_json [label: "profile_json: TEXT"]
  frequency [label: "frequency: TEXT"]
  last_sent [label: "last_sent: DATETIME"]
  active [label: "active: INTEGER"]
  created_at [label: "created_at: DATETIME"]
  last_match_ids [label: "last_match_ids: TEXT"]
}

// Cats Cache Table
cats_cache [icon: table, color: orange] {
  cid [label: "id: TEXT PK"]
  name [label: "name: TEXT"]
  breed [label: "breed: TEXT"]
  age [label: "age: TEXT"]
  gender [label: "gender: TEXT"]
  size [label: "size: TEXT"]
  organization_name [label: "organization_name: TEXT"]
  city [label: "city: TEXT"]
  state [label: "state: TEXT"]
  source [label: "source: TEXT"]
  url [label: "url: TEXT"]
  cat_json [label: "cat_json: TEXT"]
  fingerprint [label: "fingerprint: TEXT"]
  image_embedding [label: "image_embedding: BLOB"]
  is_duplicate [label: "is_duplicate: INTEGER"]
  duplicate_of [label: "duplicate_of: TEXT"]
  fetched_at [label: "fetched_at: DATETIME"]
  created_at [label: "created_at: DATETIME"]
}

// ChromaDB Collection
vector_collection [icon: database, color: purple] {
  cats_embeddings [label: "Collection: cats_embeddings"]
  embedding_dim [label: "Dimensions: 384"]
  model [label: "Model: all-MiniLM-L6-v2"]
  metadata [label: "Metadata: name, breed, age, etc."]
}

// Relationships
users > alerts: user_id
alerts > cats_cache: Search results
cats_cache > vector_collection: Embeddings
|
||||
File diff suppressed because one or more lines are too long
|
Before Width: | Height: | Size: 817 KiB |
@@ -1,51 +0,0 @@
// Tuxedo Link - Modal Deployment

// Local Development
local [label: "Local Development", icon: laptop, color: purple] {
  gradio_dev [label: "Gradio UI\n:7860"]
  dev_db [label: "SQLite DB\n./data/"]
  dev_vector [label: "ChromaDB\n./cat_vectorstore/"]
}

// Modal Cloud
modal [label: "Modal Cloud", icon: cloud, color: blue] {
  // Scheduled Functions
  scheduled [label: "Scheduled Functions"] {
    daily [label: "daily_search_job\nCron: 0 9 * * *"]
    weekly [label: "weekly_search_job\nCron: 0 9 * * 1"]
    cleanup [label: "cleanup_job\nCron: 0 2 * * 0"]
  }

  // On-Demand Functions
  ondemand [label: "On-Demand"] {
    manual_search [label: "run_scheduled_searches()"]
    manual_cleanup [label: "cleanup_old_data()"]
  }

  // Storage
  storage [label: "Modal Volume\n/data"] {
    vol_db [label: "tuxedo_link.db"]
    vol_vector [label: "cat_vectorstore/"]
  }

  // Secrets
  secrets [label: "Secrets"] {
    api_keys [label: "- OPENAI_API_KEY\n- PETFINDER_*\n- RESCUEGROUPS_*\n- SENDGRID_*"]
  }
}

// External Services
external [label: "External Services", icon: cloud, color: red] {
  openai [label: "OpenAI"]
  petfinder [label: "Petfinder"]
  rescue [label: "RescueGroups"]
  sendgrid [label: "SendGrid"]
}

// Connections
local > modal: Deploy
modal > storage: Persistent data
modal > secrets: Load keys
scheduled > storage: Read/Write
ondemand > storage: Read/Write
modal > external: API calls

File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 599 KiB |
@@ -1,58 +0,0 @@
// Tuxedo Link - Search Data Flow

user [icon: user]

// Step 1: User Input
user_input [label: "1. User Input\n'friendly playful cat\nin NYC'"]

// Step 2: Profile Extraction
profile_extraction [label: "2. Profile Agent\n(OpenAI GPT-4)", icon: chat, color: purple]
extracted_profile [label: "CatProfile\n- location: NYC\n- age: young\n- personality: friendly"]

// Step 3: API Fetching (Parallel)
api_fetch [label: "3. Fetch from APIs\n(Parallel)", icon: api, color: blue]
petfinder_results [label: "Petfinder\n50 cats"]
rescuegroups_results [label: "RescueGroups\n50 cats"]

// Step 4: Deduplication
dedup [label: "4. Deduplication\n(3-tier)", icon: filter, color: orange]
dedup_details [label: "- Fingerprint\n- Text similarity\n- Image similarity"]

// Step 5: Cache & Embed
cache [label: "5. Cache & Embed", icon: database, color: gray]
sqlite_cache [label: "SQLite\n(Cat data)"]
vector_store [label: "ChromaDB\n(Embeddings)"]

// Step 6: Hybrid Matching
matching [label: "6. Hybrid Search\n60% vector\n40% metadata", icon: search, color: green]

// Step 7: Results
results [label: "7. Ranked Results\nTop 20 matches"]

// Step 8: Display
display [label: "8. Display to User\nwith explanations", icon: browser, color: purple]

// Flow connections
user > user_input
user_input > profile_extraction
profile_extraction > extracted_profile
extracted_profile > api_fetch

api_fetch > petfinder_results
api_fetch > rescuegroups_results

petfinder_results > dedup
rescuegroups_results > dedup
dedup > dedup_details

dedup > cache
cache > sqlite_cache
cache > vector_store

sqlite_cache > matching
vector_store > matching

matching > results
results > display
display > user

File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 557 KiB |
@@ -1,54 +0,0 @@
// Tuxedo Link - High-Level System Architecture

// External APIs
openai [icon: openai, color: green]
petfinder [icon: api, color: blue]
rescuegroups [icon: api, color: blue]
sendgrid [icon: email, color: red]

// Frontend Layer
gradio [icon: browser, color: purple] {
  search_tab
  alerts_tab
  about_tab
}

// Application Layer
framework [icon: server, color: orange] {
  TuxedoLinkFramework
}

// Agent Layer
agents [icon: users, color: cyan] {
  PlanningAgent
  ProfileAgent
  PetfinderAgent
  RescueGroupsAgent
  DeduplicationAgent
  MatchingAgent
  EmailAgent
}

// Data Layer
databases [icon: database, color: gray] {
  SQLite
  ChromaDB
}

// Deployment
modal [icon: cloud, color: blue] {
  scheduled_jobs
  volume_storage
}

// Connections
gradio > framework: User requests
framework > agents: Orchestrate
agents > openai: Profile extraction
agents > petfinder: Search cats
agents > rescuegroups: Search cats
agents > sendgrid: Send notifications
agents > databases: Store/retrieve
framework > databases: Manage data
modal > framework: Scheduled searches
modal > databases: Persistent storage

File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 510 KiB |
@@ -1,35 +0,0 @@
# LLM APIs
OPENAI_API_KEY=sk-...

# Pet APIs
PETFINDER_API_KEY=your_petfinder_api_key
PETFINDER_SECRET=your_petfinder_secret
RESCUEGROUPS_API_KEY=your_rescuegroups_api_key

# Email (provider configuration in config.yaml)
MAILGUN_API_KEY=your_mailgun_api_key
SENDGRID_API_KEY=your_sendgrid_api_key_optional

# Modal
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret

# App Config
DATABASE_PATH=data/tuxedo_link.db
VECTORDB_PATH=cat_vectorstore
TTL_DAYS=30
MAX_DISTANCE_MILES=100
LOG_LEVEL=INFO

# Deduplication Thresholds
DEDUP_NAME_SIMILARITY_THRESHOLD=0.8
DEDUP_DESCRIPTION_SIMILARITY_THRESHOLD=0.7
DEDUP_IMAGE_SIMILARITY_THRESHOLD=0.9
DEDUP_COMPOSITE_THRESHOLD=0.85

# Hybrid Search Config
VECTOR_TOP_N=50
FINAL_RESULTS_LIMIT=20
SEMANTIC_WEIGHT=0.6
ATTRIBUTE_WEIGHT=0.4
@@ -1,378 +0,0 @@
"""
Complete Modal API for Tuxedo Link
All application logic runs on Modal in production mode
"""

import modal
from datetime import datetime
from typing import Dict, List, Any, Optional
from pathlib import Path
from cat_adoption_framework import TuxedoLinkFramework
from models.cats import CatProfile, AdoptionAlert
from database.manager import DatabaseManager
from agents.profile_agent import ProfileAgent
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider

# Modal app and configuration
app = modal.App("tuxedo-link-api")

# Create Modal volume for persistent data
volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)

# Reference secrets
secrets = [modal.Secret.from_name("tuxedo-link-secrets")]

# Get project directory
project_dir = Path(__file__).parent

# Modal image with all dependencies and project files
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install(
        "openai",
        "chromadb",
        "requests",
        "sentence-transformers==2.5.1",
        "transformers==4.38.0",
        "Pillow",
        "python-dotenv",
        "pydantic",
        "geopy",
        "pyyaml",
        "python-levenshtein",
        "open-clip-torch==2.24.0",
    )
    .apt_install("git")
    .run_commands(
        "pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cpu",
        "pip install numpy==1.26.4",
    )
    # Add only necessary source directories (Modal 1.0+ API)
    .add_local_dir(str(project_dir / "models"), remote_path="/root/models")
    .add_local_dir(str(project_dir / "agents"), remote_path="/root/agents")
    .add_local_dir(str(project_dir / "database"), remote_path="/root/database")
    .add_local_dir(str(project_dir / "utils"), remote_path="/root/utils")
    # Add standalone Python files
    .add_local_file(str(project_dir / "cat_adoption_framework.py"), remote_path="/root/cat_adoption_framework.py")
    .add_local_file(str(project_dir / "setup_vectordb.py"), remote_path="/root/setup_vectordb.py")
    .add_local_file(str(project_dir / "setup_metadata_vectordb.py"), remote_path="/root/setup_metadata_vectordb.py")
    # Add config file
    .add_local_file(str(project_dir / "config.yaml"), remote_path="/root/config.yaml")
)


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=600,
    cpu=2.0,
    memory=4096,
)
def search_cats(profile_dict: Dict[str, Any], use_cache: bool = False) -> Dict[str, Any]:
    """
    Main search function - runs all agents and returns matches.

    This is the primary API endpoint for cat searches in production mode.

    Args:
        profile_dict: CatProfile as dictionary
        use_cache: Whether to use cached data

    Returns:
        Dict with matches, stats, and search metadata
    """
    print(f"[{datetime.now()}] Modal API: Starting cat search")
    print(f"Profile location: {profile_dict.get('user_location', 'Not specified')}")
    print(f"Cache mode: {use_cache}")

    try:
        # Initialize framework
        framework = TuxedoLinkFramework()

        # Reconstruct profile
        profile = CatProfile(**profile_dict)

        # Run search
        result = framework.search(profile, use_cache=use_cache)

        print(f"Found {len(result.matches)} matches")
        print(f"Duplicates removed: {result.duplicates_removed}")
        print(f"Sources: {len(result.sources_queried)}")

        # Convert to serializable dict
        return {
            "success": True,
            "matches": [
                {
                    "cat": m.cat.model_dump(),
                    "match_score": m.match_score,
                    "vector_similarity": m.vector_similarity,
                    "attribute_match_score": m.attribute_match_score,
                    "explanation": m.explanation,
                    "matching_attributes": m.matching_attributes,
                    "missing_attributes": m.missing_attributes,
                }
                for m in result.matches
            ],
            "total_found": result.total_found,
            "duplicates_removed": result.duplicates_removed,
            "sources_queried": result.sources_queried,
            "timestamp": datetime.now().isoformat(),
        }

    except Exception as e:
        print(f"Error in search_cats: {e}")
        import traceback
        traceback.print_exc()
        return {
            "success": False,
            "error": str(e),
            "matches": [],
            "total_found": 0,
            "duplicates_removed": 0,
            "sources_queried": [],
        }


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=300,
)
def create_alert_and_notify(alert_data: Dict[str, Any]) -> Dict[str, Any]:
    """
    Create alert in Modal DB and send immediate notification if needed.

    Args:
        alert_data: AdoptionAlert as dictionary

    Returns:
        Dict with success status, alert_id, and message
    """
    from cat_adoption_framework import TuxedoLinkFramework
    from database.manager import DatabaseManager
    from models.cats import AdoptionAlert
    from agents.email_agent import EmailAgent
    from agents.email_providers.factory import get_email_provider

    print(f"[{datetime.now()}] Modal API: Creating alert")

    try:
        # Initialize components
        db_manager = DatabaseManager("/data/tuxedo_link.db")

        # Reconstruct alert
        alert = AdoptionAlert(**alert_data)
        print(f"Alert for: {alert.user_email}, frequency: {alert.frequency}")

        # Save to Modal DB
        alert_id = db_manager.create_alert(alert)
        print(f"Alert created with ID: {alert_id}")

        alert.id = alert_id

        # If immediate, send notification now
        if alert.frequency == "immediately":
            print("Processing immediate notification...")
            framework = TuxedoLinkFramework()
            email_provider = get_email_provider()
            email_agent = EmailAgent(email_provider)

            # Run search
            result = framework.search(alert.profile, use_cache=False)

            if result.matches:
                print(f"Found {len(result.matches)} matches")

                if email_agent.enabled:
                    email_sent = email_agent.send_match_notification(alert, result.matches)
                    if email_sent:
                        # Update last_sent
                        match_ids = [m.cat.id for m in result.matches]
                        db_manager.update_alert(
                            alert_id,
                            last_sent=datetime.now(),
                            last_match_ids=match_ids
                        )
                        return {
                            "success": True,
                            "alert_id": alert_id,
                            "message": f"Alert created and {len(result.matches)} matches sent to {alert.user_email}!"
                        }
                    else:
                        return {
                            "success": False,
                            "alert_id": alert_id,
                            "message": "Alert created but email failed to send"
                        }
            else:
                return {
                    "success": True,
                    "alert_id": alert_id,
                    "message": "Alert created but no matches found yet"
                }
        else:
            return {
                "success": True,
                "alert_id": alert_id,
                "message": f"Alert created! You'll receive {alert.frequency} notifications at {alert.user_email}"
            }

    except Exception as e:
        print(f"Error creating alert: {e}")
        import traceback
        traceback.print_exc()
        return {
            "success": False,
            "alert_id": None,
            "message": f"Error: {str(e)}"
        }


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=60,
)
def get_alerts(email: Optional[str] = None) -> List[Dict[str, Any]]:
    """
    Get alerts from Modal DB.

    Args:
        email: Optional email filter

    Returns:
        List of alert dictionaries
    """
    from database.manager import DatabaseManager

    try:
        db_manager = DatabaseManager("/data/tuxedo_link.db")

        if email:
            alerts = db_manager.get_alerts_by_email(email)
        else:
            alerts = db_manager.get_all_alerts()

        return [alert.dict() for alert in alerts]

    except Exception as e:
        print(f"Error getting alerts: {e}")
        return []


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=60,
)
def update_alert(alert_id: int, active: Optional[bool] = None) -> bool:
    """
    Update alert in Modal DB.

    Args:
        alert_id: Alert ID
        active: New active status

    Returns:
        True if successful
    """
    from database.manager import DatabaseManager

    try:
        db_manager = DatabaseManager("/data/tuxedo_link.db")
        db_manager.update_alert(alert_id, active=active)
        return True
    except Exception as e:
        print(f"Error updating alert: {e}")
        return False


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=60,
)
def delete_alert(alert_id: int) -> bool:
    """
    Delete alert from Modal DB.

    Args:
        alert_id: Alert ID

    Returns:
        True if successful
    """
    from database.manager import DatabaseManager

    try:
        db_manager = DatabaseManager("/data/tuxedo_link.db")
        db_manager.delete_alert(alert_id)
        return True
    except Exception as e:
        print(f"Error deleting alert: {e}")
        return False


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=120,
)
def extract_profile(user_input: str) -> Dict[str, Any]:
    """
    Extract cat profile from natural language using LLM.

    Args:
        user_input: User's description of desired cat

    Returns:
        CatProfile as dictionary
    """
    from agents.profile_agent import ProfileAgent

    print(f"[{datetime.now()}] Modal API: Extracting profile")

    try:
        agent = ProfileAgent()
        conversation = [{"role": "user", "content": user_input}]
        profile = agent.extract_profile(conversation)

        return {
            "success": True,
            "profile": profile.dict()
        }

    except Exception as e:
        print(f"Error extracting profile: {e}")
        import traceback
        traceback.print_exc()
        return {
            "success": False,
            "error": str(e),
            "profile": None
        }


# Health check
@app.function(image=image, timeout=10)
def health_check() -> Dict[str, str]:
    """Health check endpoint."""
    return {
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "service": "tuxedo-link-api"
    }
@@ -1,6 +0,0 @@
"""Data models for Tuxedo Link."""

from .cats import Cat, CatProfile, CatMatch, AdoptionAlert, SearchResult

__all__ = ["Cat", "CatProfile", "CatMatch", "AdoptionAlert", "SearchResult"]
@@ -1,229 +0,0 @@
"""Pydantic models for cat adoption data."""

from datetime import datetime
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, field_validator


class Cat(BaseModel):
    """Model representing a cat available for adoption."""

    # Basic information
    id: str = Field(..., description="Unique identifier from source")
    name: str = Field(..., description="Cat's name")
    breed: str = Field(..., description="Primary breed")
    breeds_secondary: Optional[List[str]] = Field(default=None, description="Secondary breeds")
    age: str = Field(..., description="Age category: kitten, young, adult, senior")
    size: str = Field(..., description="Size: small, medium, large")
    gender: str = Field(..., description="Gender: male, female, unknown")
    description: str = Field(default="", description="Full description of the cat")

    # Location information
    organization_name: str = Field(..., description="Rescue organization name")
    organization_id: Optional[str] = Field(default=None, description="Organization ID")
    city: Optional[str] = Field(default=None, description="City")
    state: Optional[str] = Field(default=None, description="State/Province")
    zip_code: Optional[str] = Field(default=None, description="ZIP/Postal code")
    latitude: Optional[float] = Field(default=None, description="Latitude coordinate")
    longitude: Optional[float] = Field(default=None, description="Longitude coordinate")
    country: Optional[str] = Field(default="US", description="Country code")
    distance: Optional[float] = Field(default=None, description="Distance from user in miles")

    # Behavioral attributes
    good_with_children: Optional[bool] = Field(default=None, description="Good with children")
    good_with_dogs: Optional[bool] = Field(default=None, description="Good with dogs")
    good_with_cats: Optional[bool] = Field(default=None, description="Good with cats")
    special_needs: bool = Field(default=False, description="Has special needs")

    # Media
    photos: List[str] = Field(default_factory=list, description="List of photo URLs")
    primary_photo: Optional[str] = Field(default=None, description="Primary photo URL")
    videos: List[str] = Field(default_factory=list, description="List of video URLs")

    # Metadata
    source: str = Field(..., description="Source: petfinder, rescuegroups")
    url: str = Field(..., description="Direct URL to listing")
    adoption_fee: Optional[float] = Field(default=None, description="Adoption fee in dollars")
    contact_email: Optional[str] = Field(default=None, description="Contact email")
    contact_phone: Optional[str] = Field(default=None, description="Contact phone")
    fetched_at: datetime = Field(default_factory=datetime.now, description="When data was fetched")

    # Deduplication
    fingerprint: Optional[str] = Field(default=None, description="Computed fingerprint for deduplication")

    # Additional attributes
    declawed: Optional[bool] = Field(default=None, description="Is declawed")
    spayed_neutered: Optional[bool] = Field(default=None, description="Is spayed/neutered")
    house_trained: Optional[bool] = Field(default=None, description="Is house trained")
    coat_length: Optional[str] = Field(default=None, description="Coat length: short, medium, long")
    colors: List[str] = Field(default_factory=list, description="Coat colors")

    @field_validator('age')
    @classmethod
    def validate_age(cls, v: str) -> str:
        """Validate age category."""
        valid_ages = ['kitten', 'young', 'adult', 'senior', 'unknown']
        if v.lower() not in valid_ages:
            return 'unknown'
        return v.lower()

    @field_validator('size')
    @classmethod
    def validate_size(cls, v: str) -> str:
        """Validate size category."""
        valid_sizes = ['small', 'medium', 'large', 'unknown']
        if v.lower() not in valid_sizes:
            return 'unknown'
        return v.lower()

    @field_validator('gender')
    @classmethod
    def validate_gender(cls, v: str) -> str:
        """Validate gender."""
        valid_genders = ['male', 'female', 'unknown']
        if v.lower() not in valid_genders:
            return 'unknown'
        return v.lower()


class CatProfile(BaseModel):
    """Model representing user preferences for cat adoption."""

    # Hard constraints
    age_range: Optional[List[str]] = Field(
        default=None,
        description="Acceptable age categories: kitten, young, adult, senior"
    )
    size: Optional[List[str]] = Field(
        default=None,
        description="Acceptable sizes: small, medium, large"
    )
    max_distance: Optional[int] = Field(
        default=100,
        description="Maximum distance in miles"
    )
    good_with_children: Optional[bool] = Field(
        default=None,
        description="Must be good with children"
    )
    good_with_dogs: Optional[bool] = Field(
        default=None,
        description="Must be good with dogs"
    )
    good_with_cats: Optional[bool] = Field(
        default=None,
        description="Must be good with cats"
    )
    special_needs_ok: bool = Field(
        default=True,
        description="Open to special needs cats"
    )

    # Soft preferences (for vector search)
    personality_description: str = Field(
        default="",
        description="Free-text description of desired personality and traits"
    )

    # Breed preferences
    preferred_breeds: Optional[List[str]] = Field(
        default=None,
        description="Preferred breeds"
    )

    # Location
    user_location: Optional[str] = Field(
        default=None,
        description="User location (ZIP code, city, or lat,long)"
    )
    user_latitude: Optional[float] = Field(default=None, description="User latitude")
    user_longitude: Optional[float] = Field(default=None, description="User longitude")

    # Additional preferences
    gender_preference: Optional[str] = Field(
        default=None,
        description="Preferred gender: male, female, or None for no preference"
    )
    coat_length_preference: Optional[List[str]] = Field(
        default=None,
        description="Preferred coat lengths: short, medium, long"
    )
    color_preferences: Optional[List[str]] = Field(
        default=None,
        description="Preferred colors"
    )
    must_be_declawed: Optional[bool] = Field(default=None, description="Must be declawed")
    must_be_spayed_neutered: Optional[bool] = Field(default=None, description="Must be spayed/neutered")

    @field_validator('age_range')
    @classmethod
    def validate_age_range(cls, v: Optional[List[str]]) -> Optional[List[str]]:
        """Validate age range values."""
        if v is None:
            return None
        valid_ages = {'kitten', 'young', 'adult', 'senior'}
        return [age.lower() for age in v if age.lower() in valid_ages]

    @field_validator('size')
    @classmethod
    def validate_size_list(cls, v: Optional[List[str]]) -> Optional[List[str]]:
        """Validate size values."""
        if v is None:
            return None
        valid_sizes = {'small', 'medium', 'large'}
        return [size.lower() for size in v if size.lower() in valid_sizes]


class CatMatch(BaseModel):
    """Model representing a matched cat with scoring details."""

    cat: Cat = Field(..., description="The matched cat")
    match_score: float = Field(..., description="Overall match score (0-1)")
    vector_similarity: float = Field(..., description="Vector similarity score (0-1)")
    attribute_match_score: float = Field(..., description="Attribute match score (0-1)")
    explanation: str = Field(default="", description="Human-readable match explanation")
    matching_attributes: List[str] = Field(
        default_factory=list,
        description="List of matching attributes"
    )
    missing_attributes: List[str] = Field(
        default_factory=list,
        description="List of desired but missing attributes"
    )


class AdoptionAlert(BaseModel):
    """Model representing a scheduled adoption alert."""

    id: Optional[int] = Field(default=None, description="Alert ID (assigned by database)")
    user_email: str = Field(..., description="User email for notifications")
    profile: CatProfile = Field(..., description="Search profile")
    frequency: str = Field(..., description="Frequency: immediately, daily, weekly")
    last_sent: Optional[datetime] = Field(default=None, description="Last notification sent")
    active: bool = Field(default=True, description="Is alert active")
    created_at: datetime = Field(default_factory=datetime.now, description="When alert was created")
    last_match_ids: List[str] = Field(
        default_factory=list,
        description="IDs of cats from last notification (to avoid duplicates)"
    )

    @field_validator('frequency')
    @classmethod
    def validate_frequency(cls, v: str) -> str:
        """Validate frequency value."""
        valid_frequencies = ['immediately', 'daily', 'weekly']
        if v.lower() not in valid_frequencies:
            raise ValueError(f"Frequency must be one of: {valid_frequencies}")
        return v.lower()


class SearchResult(BaseModel):
    """Model representing search results returned to UI."""

    matches: List[CatMatch] = Field(..., description="List of matched cats")
    total_found: int = Field(..., description="Total cats found before filtering")
    search_profile: CatProfile = Field(..., description="Search profile used")
    search_time: float = Field(..., description="Search time in seconds")
    sources_queried: List[str] = Field(..., description="Sources that were queried")
    duplicates_removed: int = Field(default=0, description="Number of duplicates removed")
@@ -1,61 +0,0 @@
[project]
name = "tuxedo-link"
version = "0.1.0"
description = "AI-powered cat adoption matching application"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "pydantic>=2.0",
    "python-dotenv",
    "requests",
    "chromadb",
    "sentence-transformers",
    "transformers",
    "torch==2.2.2",
    "pillow",
    "scikit-learn",
    "open-clip-torch",
    "python-Levenshtein",
    "beautifulsoup4",
    "feedparser",
    "sendgrid",
    "gradio",
    "plotly",
    "modal",
    "tqdm",
    "numpy==1.26.4",
    "openai",
    "pyyaml",
]

[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-mock",
    "pytest-asyncio",
    "pytest-cov",
    "ipython",
    "jupyter",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["models", "database", "agents", "modal_services", "utils"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
addopts = "-v --cov=. --cov-report=html --cov-report=term"

[tool.coverage.run]
omit = [
    "tests/*",
    "setup.py",
    "*/site-packages/*",
]
@@ -1,50 +0,0 @@
# Core
pydantic>=2.0
python-dotenv
requests

# Database
chromadb
# sqlite3 is built-in to Python

# Vector & ML
sentence-transformers
transformers
torch
pillow
scikit-learn

# Image embeddings
open-clip-torch

# Fuzzy matching
python-Levenshtein

# Web scraping & APIs (for potential future sources)
beautifulsoup4
feedparser

# Email
sendgrid
# Mailgun uses the requests library (already included above)

# Configuration
pyyaml

# UI
gradio
plotly

# Modal
modal

# Testing
pytest
pytest-mock
pytest-asyncio
pytest-cov

# Utilities
tqdm
numpy
@@ -1,82 +0,0 @@
#!/bin/bash
# Launch script for Tuxedo Link

# Colors
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${BLUE}🎩 Tuxedo Link - AI-Powered Cat Adoption Search${NC}"
echo ""

# Check if virtual environment exists
if [ ! -d ".venv" ]; then
    echo -e "${YELLOW}⚠️  Virtual environment not found. Please run setup first:${NC}"
    echo "   uv venv && source .venv/bin/activate && uv pip install -e \".[dev]\""
    exit 1
fi

# Activate virtual environment
echo -e "${GREEN}✓${NC} Activating virtual environment..."
source .venv/bin/activate

# Check if .env exists
if [ ! -f ".env" ]; then
    echo -e "${YELLOW}⚠️  .env file not found. Creating from template...${NC}"
    if [ -f "env.example" ]; then
        cp env.example .env
        echo -e "${YELLOW}Please edit .env with your API keys before continuing.${NC}"
        exit 1
    fi
fi

# Check if config.yaml exists
if [ ! -f "config.yaml" ]; then
    echo -e "${YELLOW}⚠️  config.yaml not found. Creating from example...${NC}"
    if [ -f "config.example.yaml" ]; then
        cp config.example.yaml config.yaml
        echo -e "${GREEN}✓${NC} config.yaml created. Review settings if needed."
    fi
fi

# Check deployment mode from config
DEPLOYMENT_MODE=$(python -c "import yaml; config = yaml.safe_load(open('config.yaml')); print(config['deployment']['mode'])" 2>/dev/null || echo "local")

if [ "$DEPLOYMENT_MODE" = "production" ]; then
    echo -e "${BLUE}📡 Production mode enabled${NC}"
    echo "   UI will connect to Modal backend"
    echo "   All searches and agents run on Modal"
    echo ""
else
    echo -e "${GREEN}💻 Local mode enabled${NC}"
    echo "   All components run locally"
    echo ""
fi

# Check for required API keys (both are required, so fail if either is missing)
if ! grep -q "OPENAI_API_KEY=sk-" .env 2>/dev/null || ! grep -q "PETFINDER_API_KEY" .env 2>/dev/null; then
    echo -e "${YELLOW}⚠️  Please configure API keys in .env file${NC}"
    echo "   Required: OPENAI_API_KEY, PETFINDER_API_KEY"
    exit 1
fi

echo -e "${GREEN}✓${NC} Environment configured"

# Initialize databases if needed
if [ ! -f "data/tuxedo_link.db" ]; then
    echo -e "${GREEN}✓${NC} Initializing databases..."
    python setup_vectordb.py > /dev/null 2>&1
fi

echo -e "${GREEN}✓${NC} Databases ready"
echo ""
echo -e "${BLUE}🚀 Starting Tuxedo Link...${NC}"
echo ""
echo -e "   ${GREEN}→${NC} Opening http://localhost:7860"
echo -e "   ${GREEN}→${NC} Press Ctrl+C to stop"
echo ""

# Launch the app
python app.py
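The launch script's `python -c` one-liner above falls back to `"local"` whenever config.yaml is missing or malformed. That fallback logic can be sketched on its own in Python (the helper name is hypothetical, not part of the project):

```python
def get_deployment_mode(config) -> str:
    """Return the configured deployment mode, defaulting to "local".

    Mirrors the shell one-liner's `|| echo "local"` fallback: a missing
    key or a malformed/absent config yields the safe local default.
    """
    try:
        return config["deployment"]["mode"]
    except (KeyError, TypeError):
        return "local"
```

Defaulting to local mode on any error keeps a broken config from silently pointing the UI at the Modal backend.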
@@ -1,389 +0,0 @@
"""Modal scheduled search service for running automated cat searches."""

import modal
from datetime import datetime
from typing import Dict, Any
from pathlib import Path

# Local imports - available because we use .add_local_dir() to copy all project files
from cat_adoption_framework import TuxedoLinkFramework
from database.manager import DatabaseManager
from agents.email_agent import EmailAgent
from agents.email_providers.factory import get_email_provider
from models.cats import AdoptionAlert  # used by create_alert_and_notify; module path assumed

# Create Modal app
app = modal.App("tuxedo-link-scheduled-search")

# Get project directory
project_dir = Path(__file__).parent

# Define image with all dependencies and project files
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install(
        "openai",
        "chromadb",
        "sentence-transformers==2.5.1",  # Compatible with torch 2.2.2
        "transformers==4.38.0",  # Compatible with torch 2.2.2
        "python-dotenv",
        "pydantic",
        "requests",
        "sendgrid",
        "pyyaml",
        "python-levenshtein",
        "Pillow",
        "geopy",
        "open-clip-torch==2.24.0",  # Compatible with torch 2.2.2
    )
    .apt_install("git")
    .run_commands(
        "pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cpu",
        "pip install numpy==1.26.4",
    )
    # Add only necessary source directories (Modal 1.0+ API)
    .add_local_dir(str(project_dir / "models"), remote_path="/root/models")
    .add_local_dir(str(project_dir / "agents"), remote_path="/root/agents")
    .add_local_dir(str(project_dir / "database"), remote_path="/root/database")
    .add_local_dir(str(project_dir / "utils"), remote_path="/root/utils")
    # Add standalone Python files
    .add_local_file(str(project_dir / "cat_adoption_framework.py"), remote_path="/root/cat_adoption_framework.py")
    .add_local_file(str(project_dir / "setup_vectordb.py"), remote_path="/root/setup_vectordb.py")
    .add_local_file(str(project_dir / "setup_metadata_vectordb.py"), remote_path="/root/setup_metadata_vectordb.py")
    # Add config file
    .add_local_file(str(project_dir / "config.yaml"), remote_path="/root/config.yaml")
)

# Create Volume for persistent storage (database and vector store)
volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)

# Define secrets
secrets = [
    modal.Secret.from_name("tuxedo-link-secrets")  # Contains all API keys
]


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=600,  # 10 minutes
)
def run_scheduled_searches() -> None:
    """
    Run scheduled searches for all active alerts.

    This function:
    1. Loads all active adoption alerts from the database
    2. For each alert, runs a cat search based on the saved profile
    3. If new matches are found, sends an email notification
    4. Updates the alert's last_sent timestamp
    """
    print(f"[{datetime.now()}] Starting scheduled search job")

    # Initialize components
    framework = TuxedoLinkFramework()
    db_manager = DatabaseManager("/data/tuxedo_link.db")
    email_agent = EmailAgent()

    # Get all active alerts
    alerts = db_manager.get_active_alerts()
    print(f"Found {len(alerts)} active alerts")

    for alert in alerts:
        try:
            print(f"Processing alert {alert.id} for {alert.user_email}")

            # Run search
            result = framework.search(alert.profile)

            # Filter out cats already seen
            new_matches = [
                m for m in result.matches
                if m.cat.id not in alert.last_match_ids
            ]

            if new_matches:
                print(f"Found {len(new_matches)} new matches for alert {alert.id}")

                # Send email
                if email_agent.enabled:
                    email_sent = email_agent.send_match_notification(alert, new_matches)
                    if email_sent:
                        # Update last_sent and last_match_ids
                        new_match_ids = [m.cat.id for m in new_matches]
                        db_manager.update_alert(
                            alert.id,
                            last_sent=datetime.now(),
                            last_match_ids=new_match_ids
                        )
                        print(f"Email sent successfully for alert {alert.id}")
                    else:
                        print(f"Failed to send email for alert {alert.id}")
                else:
                    print("Email agent disabled")
            else:
                print(f"No new matches for alert {alert.id}")

        except Exception as e:
            print(f"Error processing alert {alert.id}: {e}")
            continue

    print(f"[{datetime.now()}] Scheduled search job completed")


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=300,
)
def send_immediate_notification(alert_id: int) -> bool:
    """
    Send an immediate notification for a specific alert.

    This is called when an alert is created with frequency="immediately".

    Args:
        alert_id: The ID of the alert to process

    Returns:
        bool: True if the notification was sent successfully, False otherwise
    """
    print(f"[{datetime.now()}] Processing immediate notification for alert {alert_id}")

    try:
        # Initialize components
        framework = TuxedoLinkFramework()
        db_manager = DatabaseManager("/data/tuxedo_link.db")
        email_agent = EmailAgent()

        # Get the alert
        alert = db_manager.get_alert(alert_id)
        if not alert:
            print(f"Alert {alert_id} not found")
            return False

        if not alert.active:
            print(f"Alert {alert_id} is inactive")
            return False

        # Run search
        result = framework.search(alert.profile)

        if result.matches:
            print(f"Found {len(result.matches)} matches for alert {alert_id}")

            # Send email
            if email_agent.enabled:
                email_sent = email_agent.send_match_notification(alert, result.matches)
                if email_sent:
                    # Update last_sent and last_match_ids
                    match_ids = [m.cat.id for m in result.matches]
                    db_manager.update_alert(
                        alert.id,
                        last_sent=datetime.now(),
                        last_match_ids=match_ids
                    )
                    print(f"Email sent successfully for alert {alert_id}")
                    return True
                else:
                    print(f"Failed to send email for alert {alert_id}")
                    return False
            else:
                print("Email agent disabled")
                return False
        else:
            print(f"No matches found for alert {alert_id}")
            return False

    except Exception as e:
        print(f"Error processing immediate notification for alert {alert_id}: {e}")
        return False


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=300,
)
def create_alert_and_notify(alert_data: Dict[str, Any]) -> Dict[str, Any]:
    """
    Create an alert in Modal's database and send an immediate notification.

    This is called from the UI in production mode when creating an alert.
    The alert is saved to Modal's database, then processed if immediate.

    Args:
        alert_data: Dictionary containing alert data (from AdoptionAlert.dict())

    Returns:
        Dict with {"success": bool, "alert_id": int, "message": str}
    """
    print(f"[{datetime.now()}] Creating alert in Modal DB")

    try:
        # Initialize database
        db_manager = DatabaseManager("/data/tuxedo_link.db")

        # Reconstruct alert from dict
        alert = AdoptionAlert(**alert_data)
        print(f"Alert for: {alert.user_email}, location: {alert.profile.user_location if alert.profile else 'None'}")

        # Save alert to Modal's database
        alert_id = db_manager.create_alert(alert)
        print(f"✓ Alert created in Modal DB with ID: {alert_id}")

        # Update alert with the ID
        alert.id = alert_id

        # If immediate frequency, send notification now
        if alert.frequency == "immediately":
            print("Sending immediate notification...")
            framework = TuxedoLinkFramework()
            email_provider = get_email_provider()
            email_agent = EmailAgent(email_provider)

            # Run search
            result = framework.search(alert.profile, use_cache=False)

            if result.matches:
                print(f"Found {len(result.matches)} matches")

                # Send email
                if email_agent.enabled:
                    email_sent = email_agent.send_match_notification(alert, result.matches)
                    if email_sent:
                        # Update last_sent
                        match_ids = [m.cat.id for m in result.matches]
                        db_manager.update_alert(
                            alert_id,
                            last_sent=datetime.now(),
                            last_match_ids=match_ids
                        )
                        print(f"✓ Email sent to {alert.user_email}")
                        return {
                            "success": True,
                            "alert_id": alert_id,
                            "message": f"Alert created and {len(result.matches)} matches sent to {alert.user_email}!"
                        }
                    else:
                        return {
                            "success": False,
                            "alert_id": alert_id,
                            "message": "Alert created but email failed to send"
                        }
                else:
                    return {
                        "success": False,
                        "alert_id": alert_id,
                        "message": "Email agent not enabled"
                    }
            else:
                print("No matches found")
                return {
                    "success": True,
                    "alert_id": alert_id,
                    "message": "Alert created but no matches found yet"
                }
        else:
            # For daily/weekly alerts
            return {
                "success": True,
                "alert_id": alert_id,
                "message": f"Alert created! You'll receive {alert.frequency} notifications at {alert.user_email}"
            }

    except Exception as e:
        print(f"Error creating alert: {e}")
        import traceback
        traceback.print_exc()
        return {
            "success": False,
            "alert_id": None,
            "message": f"Error: {str(e)}"
        }


@app.function(
    image=image,
    schedule=modal.Cron("0 9 * * *"),  # Run daily at 9 AM UTC
    volumes={"/data": volume},
    secrets=secrets,
    timeout=600,
)
def daily_search_job() -> None:
    """Daily scheduled job to run cat searches for all daily alerts."""
    run_scheduled_searches.remote()


@app.function(
    image=image,
    schedule=modal.Cron("0 9 * * 1"),  # Run weekly on Mondays at 9 AM UTC
    volumes={"/data": volume},
    secrets=secrets,
    timeout=600,
)
def weekly_search_job() -> None:
    """Weekly scheduled job to run cat searches for all weekly alerts."""
    run_scheduled_searches.remote()


@app.function(
    image=image,
    volumes={"/data": volume},
    secrets=secrets,
    timeout=300,
)
def cleanup_old_data(days: int = 30) -> Dict[str, Any]:
    """
    Clean up old cat data from the cache and vector database.

    Args:
        days: Number of days of data to keep (default: 30)

    Returns:
        Statistics dictionary with cleanup results
    """
    print(f"[{datetime.now()}] Starting cleanup job (keeping last {days} days)")

    framework = TuxedoLinkFramework()
    stats = framework.cleanup_old_data(days)

    print(f"Cleanup complete: {stats}")
    print(f"[{datetime.now()}] Cleanup job completed")

    return stats


@app.function(
    image=image,
    schedule=modal.Cron("0 2 * * 0"),  # Run weekly on Sundays at 2 AM UTC
    volumes={"/data": volume},
    secrets=secrets,
    timeout=300,
)
def weekly_cleanup_job() -> None:
    """Weekly scheduled job to clean up old data (30+ days)."""
    cleanup_old_data.remote(30)


# For manual testing
@app.local_entrypoint()
def main() -> None:
    """Test the scheduled search locally for development."""
    run_scheduled_searches.remote()


if __name__ == "__main__":
    main()
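The filtering step in `run_scheduled_searches` — dropping cats whose IDs were already sent for an alert — is the core of its "only new matches" behavior. It can be isolated as a small pure function (the helper name is hypothetical; the service inlines this as a list comprehension):

```python
def filter_new_match_ids(match_ids, last_match_ids):
    """Keep only IDs that were not previously sent for this alert.

    Same logic as the comprehension over result.matches in
    run_scheduled_searches, using a set for O(1) membership checks,
    and preserving the original ranking order of the matches.
    """
    seen = set(last_match_ids)
    return [mid for mid in match_ids if mid not in seen]
```

Because the service then stores only the *new* IDs back into `last_match_ids`, a cat re-surfacing after one search cycle would be re-sent; whether that is intended depends on the alert semantics.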
@@ -1,2 +0,0 @@
"""Deployment and utility scripts."""
@@ -1,76 +0,0 @@
#!/usr/bin/env python
"""Fetch and display valid colors and breeds from Petfinder API."""

import sys
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))

from agents.petfinder_agent import PetfinderAgent


def main():
    """Fetch and display valid cat colors and breeds from Petfinder API."""
    print("=" * 70)
    print("Fetching Valid Cat Data from Petfinder API")
    print("=" * 70)
    print()

    try:
        # Initialize agent
        agent = PetfinderAgent()

        # Fetch colors
        print("📋 COLORS")
        print("-" * 70)
        colors = agent.get_valid_colors()

        print(f"✓ Found {len(colors)} valid colors:")
        print()

        for i, color in enumerate(colors, 1):
            print(f"  {i:2d}. {color}")

        print()
        print("=" * 70)
        print("Common user terms mapped to API colors:")
        print("  • 'tuxedo' → Black & White / Tuxedo")
        print("  • 'orange' → Orange / Red")
        print("  • 'gray' → Gray / Blue / Silver")
        print("  • 'orange tabby' → Tabby (Orange / Red)")
        print("  • 'calico' → Calico")
        print()

        # Fetch breeds
        print("=" * 70)
        print("📋 BREEDS")
        print("-" * 70)
        breeds = agent.get_valid_breeds()

        print(f"✓ Found {len(breeds)} valid breeds:")
        print()

        # Show first 30 breeds
        for i, breed in enumerate(breeds[:30], 1):
            print(f"  {i:2d}. {breed}")

        if len(breeds) > 30:
            print(f"  ... and {len(breeds) - 30} more breeds")

        print()
        print("=" * 70)
        print("These are the ONLY values accepted by the Petfinder API")
        print("Use these exact values when making API requests")
        print("=" * 70)
        print()

    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()
@@ -1,57 +0,0 @@
#!/usr/bin/env python
"""Upload config.yaml to Modal volume for remote configuration."""

import modal
import yaml
from pathlib import Path
import sys


def main():
    """Upload config.yaml to the Modal volume."""
    # Load local config
    config_path = Path("config.yaml")
    if not config_path.exists():
        print("❌ Error: config.yaml not found")
        print("Copy config.example.yaml to config.yaml and configure it")
        sys.exit(1)

    try:
        with open(config_path) as f:
            config = yaml.safe_load(f)
    except Exception as e:
        print(f"❌ Error loading config.yaml: {e}")
        sys.exit(1)

    # Validate config
    if config['deployment']['mode'] != 'production':
        print("⚠️  Warning: config.yaml deployment mode is not set to 'production'")

    try:
        # Connect to Modal volume
        volume = modal.Volume.from_name("tuxedo-link-data", create_if_missing=True)

        # Remove old config if it exists
        try:
            volume.remove_file("/data/config.yaml")
            print("  Removed old config.yaml")
        except Exception:
            # File doesn't exist, that's fine
            pass

        # Upload new config
        with volume.batch_upload() as batch:
            batch.put_file(config_path, "/data/config.yaml")

        print("✓ Config uploaded to Modal volume")
        print(f"  Email provider: {config['email']['provider']}")
        print(f"  Deployment mode: {config['deployment']['mode']}")

    except Exception as e:
        print(f"❌ Error uploading config to Modal: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
@@ -1,238 +0,0 @@
"""
Vector database for semantic search of colors and breeds.

This module provides fuzzy matching for user color/breed terms against
valid API values using sentence embeddings.
"""

import logging
from typing import List, Dict, Optional
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer


class MetadataVectorDB:
    """
    Vector database for semantic search of metadata (colors, breeds).

    Separate from the main cat vector DB, this stores valid API values
    and enables fuzzy matching for user terms.
    """

    def __init__(self, persist_directory: str = "metadata_vectorstore"):
        """
        Initialize the metadata vector database.

        Args:
            persist_directory: Path to persist the database
        """
        self.persist_directory = persist_directory
        Path(persist_directory).mkdir(parents=True, exist_ok=True)

        # Initialize ChromaDB client
        self.client = chromadb.PersistentClient(path=persist_directory)

        # Initialize embedding model (same as main vector DB for consistency)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

        # Get or create collections
        self.colors_collection = self.client.get_or_create_collection(
            name="colors",
            metadata={"description": "Valid color values from APIs"}
        )

        self.breeds_collection = self.client.get_or_create_collection(
            name="breeds",
            metadata={"description": "Valid breed values from APIs"}
        )

        logging.info(f"MetadataVectorDB initialized at {persist_directory}")
        logging.info(f"Colors indexed: {self.colors_collection.count()}")
        logging.info(f"Breeds indexed: {self.breeds_collection.count()}")

    def index_colors(self, valid_colors: List[str], source: str = "petfinder") -> None:
        """
        Index valid color values for semantic search.

        Args:
            valid_colors: List of valid color strings from the API
            source: API source (petfinder or rescuegroups)
        """
        if not valid_colors:
            logging.warning(f"No colors provided for indexing from {source}")
            return

        # Check if already indexed for this source
        existing = self.colors_collection.get(
            where={"source": source}
        )

        if existing and len(existing['ids']) > 0:
            logging.info(f"Colors from {source} already indexed ({len(existing['ids'])} items)")
            return

        # Generate embeddings
        embeddings = self.embedding_model.encode(valid_colors, show_progress_bar=False)

        # Create IDs
        ids = [f"{source}_color_{i}" for i in range(len(valid_colors))]

        # Index in ChromaDB
        self.colors_collection.add(
            ids=ids,
            embeddings=embeddings.tolist(),
            documents=valid_colors,
            metadatas=[{"color": c, "source": source} for c in valid_colors]
        )

        logging.info(f"✓ Indexed {len(valid_colors)} colors from {source}")

    def index_breeds(self, valid_breeds: List[str], source: str = "petfinder") -> None:
        """
        Index valid breed values for semantic search.

        Args:
            valid_breeds: List of valid breed strings from the API
            source: API source (petfinder or rescuegroups)
        """
        if not valid_breeds:
            logging.warning(f"No breeds provided for indexing from {source}")
            return

        # Check if already indexed for this source
        existing = self.breeds_collection.get(
            where={"source": source}
        )

        if existing and len(existing['ids']) > 0:
            logging.info(f"Breeds from {source} already indexed ({len(existing['ids'])} items)")
            return

        # Generate embeddings
        embeddings = self.embedding_model.encode(valid_breeds, show_progress_bar=False)

        # Create IDs
        ids = [f"{source}_breed_{i}" for i in range(len(valid_breeds))]

        # Index in ChromaDB
        self.breeds_collection.add(
            ids=ids,
            embeddings=embeddings.tolist(),
            documents=valid_breeds,
            metadatas=[{"breed": b, "source": source} for b in valid_breeds]
        )

        logging.info(f"✓ Indexed {len(valid_breeds)} breeds from {source}")

    def search_color(
        self,
        user_term: str,
        n_results: int = 1,
        source_filter: Optional[str] = None
    ) -> List[Dict]:
        """
        Find the most similar valid color(s) to a user term.

        Args:
            user_term: User's color preference (e.g., "tuxedo", "grey")
            n_results: Number of results to return
            source_filter: Optional filter by source (petfinder/rescuegroups)

        Returns:
            List of dicts with 'color', 'distance', 'similarity', 'source' keys
        """
        if not user_term or not user_term.strip():
            return []

        # Generate embedding for user term
        embedding = self.embedding_model.encode([user_term], show_progress_bar=False)[0]

        # Query ChromaDB
        where_filter = {"source": source_filter} if source_filter else None

        results = self.colors_collection.query(
            query_embeddings=[embedding.tolist()],
            n_results=min(n_results, self.colors_collection.count()),
            where=where_filter
        )

        if not results or not results['ids'] or len(results['ids'][0]) == 0:
            return []

        # Format results
        matches = []
        for i in range(len(results['ids'][0])):
            matches.append({
                "color": results['metadatas'][0][i]['color'],
                "distance": results['distances'][0][i],
                "similarity": 1.0 - results['distances'][0][i],  # Convert distance to similarity
                "source": results['metadatas'][0][i]['source']
            })

        return matches

    def search_breed(
        self,
        user_term: str,
        n_results: int = 1,
        source_filter: Optional[str] = None
    ) -> List[Dict]:
        """
        Find the most similar valid breed(s) to a user term.

        Args:
            user_term: User's breed preference (e.g., "siamese", "main coon")
            n_results: Number of results to return
            source_filter: Optional filter by source (petfinder/rescuegroups)

        Returns:
            List of dicts with 'breed', 'distance', 'similarity', 'source' keys
        """
        if not user_term or not user_term.strip():
            return []

        # Generate embedding for user term
        embedding = self.embedding_model.encode([user_term], show_progress_bar=False)[0]

        # Query ChromaDB
        where_filter = {"source": source_filter} if source_filter else None

        results = self.breeds_collection.query(
            query_embeddings=[embedding.tolist()],
            n_results=min(n_results, self.breeds_collection.count()),
            where=where_filter
        )

        if not results or not results['ids'] or len(results['ids'][0]) == 0:
            return []

        # Format results
        matches = []
        for i in range(len(results['ids'][0])):
            matches.append({
                "breed": results['metadatas'][0][i]['breed'],
                "distance": results['distances'][0][i],
                "similarity": 1.0 - results['distances'][0][i],
                "source": results['metadatas'][0][i]['source']
            })

        return matches

    def clear_all(self) -> None:
        """Clear all indexed data (for testing)."""
        try:
            self.client.delete_collection("colors")
            self.client.delete_collection("breeds")
            logging.info("Cleared all metadata collections")
        except Exception as e:
            logging.warning(f"Error clearing collections: {e}")

    def get_stats(self) -> Dict[str, int]:
        """Get statistics about indexed data."""
        return {
            "colors_count": self.colors_collection.count(),
            "breeds_count": self.breeds_collection.count()
        }
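`MetadataVectorDB` resolves typos like "tuxado" → "Tuxedo" by comparing sentence embeddings. The same ranked-candidates idea can be illustrated without any ML dependencies using stdlib string similarity — this is a simplified stand-in for `search_color`/`search_breed`, not the module's actual embedding mechanism, and the function name is hypothetical:

```python
import difflib

def fuzzy_match(user_term, valid_values, cutoff=0.5):
    """Rank valid API values by string similarity to a user term.

    Returns (value, similarity) pairs, best first, above the cutoff,
    mirroring the shape of search_color's [{'color', 'similarity', ...}]
    results but using character-level ratios instead of embeddings.
    """
    scored = [
        (value, difflib.SequenceMatcher(None, user_term.lower(), value.lower()).ratio())
        for value in valid_values
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored if pair[1] >= cutoff]
```

Character-level ratios handle typos ("tuxado") but not semantic synonyms ("grey" → "Gray / Blue / Silver"), which is why the module uses embeddings instead.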
@@ -1,284 +0,0 @@
|
||||
"""Setup script for ChromaDB vector database."""
|
||||
|
||||
import os
|
||||
import chromadb
|
||||
from chromadb.config import Settings
|
||||
from typing import List
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from models.cats import Cat
|
||||
from sentence_transformers import SentenceTransformer
|
||||
|
||||
|
||||
class VectorDBManager:
|
||||
"""Manages ChromaDB for cat adoption semantic search."""
|
||||
|
||||
COLLECTION_NAME = "cats"
|
||||
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
|
||||
|
||||
def __init__(self, persist_directory: str = "cat_vectorstore"):
|
||||
"""
|
||||
Initialize the vector database manager.
|
||||
|
||||
Args:
|
||||
persist_directory: Directory for ChromaDB persistence
|
||||
"""
|
||||
self.persist_directory = persist_directory
|
||||
|
||||
# Create directory if it doesn't exist
|
||||
if not os.path.exists(persist_directory):
|
||||
os.makedirs(persist_directory)
|
||||
|
||||
# Initialize ChromaDB client
|
||||
self.client = chromadb.PersistentClient(
|
||||
path=persist_directory,
|
||||
settings=Settings(anonymized_telemetry=False)
|
||||
)
|
||||
|
||||
# Initialize embedding model
|
||||
print(f"Loading embedding model: {self.EMBEDDING_MODEL}")
|
||||
self.embedding_model = SentenceTransformer(self.EMBEDDING_MODEL)
|
||||
|
||||
# Get or create collection
|
||||
self.collection = self.client.get_or_create_collection(
|
||||
name=self.COLLECTION_NAME,
|
||||
metadata={'description': 'Cat adoption listings with semantic search'}
|
||||
)
|
||||
|
||||
print(f"Vector database initialized at {persist_directory}")
|
||||
print(f"Collection '{self.COLLECTION_NAME}' contains {self.collection.count()} documents")
|
||||
|
||||
def create_document_text(self, cat: Cat) -> str:
|
||||
"""
|
||||
Create searchable document text from cat attributes.
|
||||
|
||||
Combines description with key attributes for semantic search.
|
||||
|
||||
Args:
|
||||
cat: Cat object
|
||||
|
||||
Returns:
|
||||
Document text for embedding
|
||||
"""
|
||||
parts = []
|
||||
|
||||
# Add description
|
||||
if cat.description:
|
||||
parts.append(cat.description)
|
||||
|
||||
# Add breed info
|
||||
parts.append(f"Breed: {cat.breed}")
|
||||
if cat.breeds_secondary:
|
||||
parts.append(f"Mixed with: {', '.join(cat.breeds_secondary)}")
|
||||
|
||||
# Add personality hints from attributes
|
||||
traits = []
|
||||
if cat.good_with_children:
|
||||
traits.append("good with children")
|
||||
if cat.good_with_dogs:
|
||||
traits.append("good with dogs")
|
||||
if cat.good_with_cats:
|
||||
traits.append("good with other cats")
|
||||
if cat.house_trained:
|
||||
traits.append("house trained")
|
||||
if cat.special_needs:
|
||||
traits.append("has special needs")
|
||||
|
||||
if traits:
|
||||
parts.append(f"Personality: {', '.join(traits)}")
|
||||
|
||||
# Add color info
|
||||
if cat.colors:
|
||||
parts.append(f"Colors: {', '.join(cat.colors)}")
|
||||
|
||||
return " | ".join(parts)
|
||||
|
||||
def create_metadata(self, cat: Cat) -> dict:
|
||||
"""
|
||||
Create metadata dictionary for ChromaDB.
|
||||
|
||||
Args:
|
||||
cat: Cat object
|
||||
|
||||
Returns:
|
||||
Metadata dictionary
|
||||
"""
|
||||
return {
|
||||
'id': cat.id,
|
||||
'name': cat.name,
|
||||
'age': cat.age,
|
||||
'size': cat.size,
|
||||
'gender': cat.gender,
|
||||
'breed': cat.breed,
|
||||
'city': cat.city or '',
|
||||
'state': cat.state or '',
|
||||
'zip_code': cat.zip_code or '',
|
||||
'latitude': str(cat.latitude) if cat.latitude is not None else '',
|
||||
'longitude': str(cat.longitude) if cat.longitude is not None else '',
|
||||
'organization': cat.organization_name,
|
||||
'source': cat.source,
|
||||
'good_with_children': str(cat.good_with_children) if cat.good_with_children is not None else 'unknown',
|
||||
'good_with_dogs': str(cat.good_with_dogs) if cat.good_with_dogs is not None else 'unknown',
|
||||
'good_with_cats': str(cat.good_with_cats) if cat.good_with_cats is not None else 'unknown',
|
||||
'special_needs': str(cat.special_needs),
|
||||
'url': cat.url,
|
||||
'primary_photo': cat.primary_photo or '',
|
||||
}
|
||||
|
||||
def add_cat(self, cat: Cat) -> None:
|
||||
"""
|
||||
Add a single cat to the vector database.
|
||||
|
||||
Args:
|
||||
cat: Cat object to add
|
||||
"""
|
||||
document = self.create_document_text(cat)
|
||||
metadata = self.create_metadata(cat)
|
||||
|
||||
# Generate embedding
|
||||
embedding = self.embedding_model.encode([document])[0].tolist()
|
||||
|
||||
# Upsert so update_cat() can reuse this method for existing IDs
self.collection.upsert(
|
||||
ids=[cat.id],
|
||||
embeddings=[embedding],
|
||||
documents=[document],
|
||||
metadatas=[metadata]
|
||||
)
|
||||
|
||||
def add_cats_batch(self, cats: List[Cat], batch_size: int = 100) -> None:
|
||||
"""
|
||||
Add multiple cats to the vector database in batches.
|
||||
|
||||
Args:
|
||||
cats: List of Cat objects to add
|
||||
batch_size: Number of cats to process in each batch
|
||||
"""
|
||||
print(f"Adding {len(cats)} cats to vector database...")
|
||||
|
||||
for i in range(0, len(cats), batch_size):
|
||||
batch = cats[i:i+batch_size]
|
||||
|
||||
# Prepare data
|
||||
ids = [cat.id for cat in batch]
|
||||
documents = [self.create_document_text(cat) for cat in batch]
|
||||
metadatas = [self.create_metadata(cat) for cat in batch]
|
||||
|
||||
# Generate embeddings
|
||||
embeddings = self.embedding_model.encode(documents).tolist()
|
||||
|
||||
# Add to collection
|
||||
self.collection.upsert(
|
||||
ids=ids,
|
||||
embeddings=embeddings,
|
||||
documents=documents,
|
||||
metadatas=metadatas
|
||||
)
|
||||
|
||||
print(f"Processed batch {i//batch_size + 1}/{(len(cats)-1)//batch_size + 1}")
|
||||
|
||||
print(f"Successfully added {len(cats)} cats")
|
||||
|
||||
def update_cat(self, cat: Cat) -> None:
|
||||
"""
|
||||
Update an existing cat in the vector database.
|
||||
|
||||
Args:
|
||||
cat: Updated Cat object
|
||||
"""
|
||||
self.add_cat(cat)
|
||||
|
||||
def delete_cat(self, cat_id: str) -> None:
|
||||
"""
|
||||
Delete a cat from the vector database.
|
||||
|
||||
Args:
|
||||
cat_id: Cat ID to delete
|
||||
"""
|
||||
self.collection.delete(ids=[cat_id])
|
||||
|
||||
def search(self, query: str, n_results: int = 50, where: dict | None = None) -> dict:
|
||||
"""
|
||||
Search for cats using semantic similarity.
|
||||
|
||||
Args:
|
||||
query: Search query (personality description)
|
||||
n_results: Number of results to return
|
||||
where: Optional metadata filters
|
||||
|
||||
Returns:
|
||||
Search results dictionary
|
||||
"""
|
||||
# Generate query embedding
|
||||
query_embedding = self.embedding_model.encode([query])[0].tolist()
|
||||
|
||||
# Search collection
|
||||
results = self.collection.query(
|
||||
query_embeddings=[query_embedding],
|
||||
n_results=n_results,
|
||||
where=where,
|
||||
include=['documents', 'metadatas', 'distances']
|
||||
)
|
||||
|
||||
return results
|
||||
|
||||
def clear_collection(self) -> None:
|
||||
"""Delete all documents from the collection."""
|
||||
print(f"Clearing collection '{self.COLLECTION_NAME}'...")
|
||||
self.client.delete_collection(self.COLLECTION_NAME)
|
||||
self.collection = self.client.create_collection(
|
||||
name=self.COLLECTION_NAME,
|
||||
metadata={'description': 'Cat adoption listings with semantic search'}
|
||||
)
|
||||
print("Collection cleared")
|
||||
|
||||
def get_stats(self) -> dict:
|
||||
"""
|
||||
Get statistics about the vector database.
|
||||
|
||||
Returns:
|
||||
Dictionary with stats
|
||||
"""
|
||||
count = self.collection.count()
|
||||
return {
|
||||
'total_documents': count,
|
||||
'collection_name': self.COLLECTION_NAME,
|
||||
'persist_directory': self.persist_directory
|
||||
}
|
||||
|
||||
|
||||
def initialize_vectordb(persist_directory: str = "cat_vectorstore") -> VectorDBManager:
|
||||
"""
|
||||
Initialize the vector database.
|
||||
|
||||
Args:
|
||||
persist_directory: Directory for persistence
|
||||
|
||||
Returns:
|
||||
VectorDBManager instance
|
||||
"""
|
||||
load_dotenv()
|
||||
|
||||
# Get directory from environment or use default
|
||||
persist_dir = os.getenv('VECTORDB_PATH', persist_directory)
|
||||
|
||||
manager = VectorDBManager(persist_dir)
|
||||
|
||||
print("\nVector Database Initialized Successfully!")
|
||||
print(f"Location: {manager.persist_directory}")
|
||||
print(f"Collection: {manager.COLLECTION_NAME}")
|
||||
print(f"Documents: {manager.collection.count()}")
|
||||
|
||||
return manager
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Initialize database
|
||||
manager = initialize_vectordb()
|
||||
|
||||
# Print stats
|
||||
stats = manager.get_stats()
|
||||
print("\nDatabase Stats:")
|
||||
for key, value in stats.items():
|
||||
print(f" {key}: {value}")
|
||||
|
||||
@@ -1,291 +0,0 @@
|
||||
# 🧪 Testing Guide
|
||||
|
||||
## Test Overview
|
||||
|
||||
**Status**: ✅ **92/92 tests passing** (100%)
|
||||
|
||||
The test suite includes:
|
||||
- **81 unit tests** - Models, database, deduplication, email providers, semantic matching
|
||||
- **11 integration tests** - Search pipeline, alerts, app functionality, color/breed normalization
|
||||
- **4 manual test scripts** - Cache testing, email sending, semantic matching, framework testing
|
||||
|
||||
---
|
||||
|
||||
## Unit Tests (81 tests ✅)
|
||||
|
||||
Unit tests validate individual components in isolation.
|
||||
|
||||
### Test Data Models
|
||||
```bash
|
||||
pytest tests/unit/test_models.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Cat model validation
|
||||
- CatProfile model validation
|
||||
- CatMatch model validation
|
||||
- AdoptionAlert model validation
|
||||
- SearchResult model validation
|
||||
- Field requirements and defaults
|
||||
- JSON serialization
|
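As a sketch of what the serialization tests exercise (a dataclass stands in for the real models, which may be Pydantic, so the field names and default here are illustrative):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CatStub:
    """Stand-in for the Cat model; real fields and defaults differ."""
    id: str
    name: str
    breed: str = "Domestic Short Hair"  # illustrative default value

cat = CatStub(id="pf-1", name="Mochi")
payload = json.dumps(asdict(cat))          # serialize to JSON
restored = CatStub(**json.loads(payload))  # round-trip back to a model
assert restored == cat
```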
||||
|
||||
### Test Database Operations
|
||||
```bash
|
||||
pytest tests/unit/test_database.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Database initialization
|
||||
- Cat caching with fingerprints
|
||||
- Duplicate marking
|
||||
- Image embedding storage
|
||||
- Alert CRUD operations
|
||||
- Query filtering
|
||||
- Statistics retrieval
|
||||
|
||||
### Test Deduplication Logic
|
||||
```bash
|
||||
pytest tests/unit/test_deduplication.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Fingerprint creation
|
||||
- Levenshtein similarity calculation
|
||||
- Composite score calculation
|
||||
- Three-tier deduplication pipeline
|
||||
- Image embedding comparison
|
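The similarity math can be sketched as follows — `SequenceMatcher` stands in for the normalized Levenshtein metric, and the composite weights are illustrative, not the project's actual values:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    # SequenceMatcher ratio stands in for normalized Levenshtein similarity
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def composite_score(name_sim: float, desc_sim: float, image_sim: float) -> float:
    # Illustrative weights only; the real pipeline's weighting may differ
    return 0.3 * name_sim + 0.3 * desc_sim + 0.4 * image_sim

score = composite_score(name_similarity("Mittens", "MITTENS"), 0.9, 0.95)
```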
||||
|
||||
### Test Email Providers
|
||||
```bash
|
||||
pytest tests/unit/test_email_providers.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Mailgun provider initialization
|
||||
- Mailgun email sending
|
||||
- SendGrid stub behavior
|
||||
- Provider factory
|
||||
- Configuration loading
|
||||
- Error handling
|
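The provider factory under test follows a simple registry pattern; a minimal sketch (class and function names are assumptions, not the project's actual API):

```python
# Minimal provider-factory sketch; names are illustrative.
class MailgunProvider:
    name = "mailgun"

class SendGridProvider:  # a stub in the real suite
    name = "sendgrid"

PROVIDERS = {"mailgun": MailgunProvider, "sendgrid": SendGridProvider}

def get_provider(key: str):
    # Unknown keys surface as a clear configuration error
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown email provider: {key}")

provider = get_provider("mailgun")
```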
||||
|
||||
### Test Metadata Vector Database
|
||||
```bash
|
||||
pytest tests/unit/test_metadata_vectordb.py -v
|
||||
```
|
||||
|
||||
**Tests** (11):
|
||||
- Vector DB initialization
|
||||
- Color indexing from multiple sources
|
||||
- Breed indexing from multiple sources
|
||||
- Semantic search for colors
|
||||
- Semantic search for breeds
|
||||
- Fuzzy matching with typos
|
||||
- Multi-source filtering
|
||||
- Empty search handling
|
||||
- N-results parameter
|
||||
- Statistics retrieval
|
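Multi-source filtering maps onto ChromaDB metadata filters, which use Mongo-style operators; a sketch of the shape such filters take (the field names are illustrative):

```python
# Single-field filter: restrict results to one listing source
where = {"source": {"$eq": "petfinder"}}

# Compound filter combining source with a stored attribute; note that
# boolean attributes are stored as strings in the metadata
where_multi = {
    "$and": [
        {"source": {"$eq": "petfinder"}},
        {"good_with_children": {"$eq": "True"}},
    ]
}
```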
||||
|
||||
### Test Color Mapping
|
||||
```bash
|
||||
pytest tests/unit/test_color_mapping.py -v
|
||||
```
|
||||
|
||||
**Tests** (15):
|
||||
- Dictionary matching for common terms (tuxedo, orange, gray)
|
||||
- Multiple color normalization
|
||||
- Exact match fallback
|
||||
- Substring match fallback
|
||||
- Vector DB fuzzy matching
|
||||
- Typo handling
|
||||
- Dictionary priority over vector search
|
||||
- Case-insensitive matching
|
||||
- Whitespace handling
|
||||
- Empty input handling
|
||||
- Color suggestions
|
||||
- All dictionary mappings validation
|
||||
|
||||
### Test Breed Mapping
|
||||
```bash
|
||||
pytest tests/unit/test_breed_mapping.py -v
|
||||
```
|
||||
|
||||
**Tests** (20):
|
||||
- Dictionary matching for common breeds (Maine Coon, Ragdoll, Sphynx)
|
||||
- Typo correction ("main coon" → "Maine Coon")
|
||||
- Mixed breed handling
|
||||
- Exact match fallback
|
||||
- Substring match fallback
|
||||
- Vector DB fuzzy matching
|
||||
- Dictionary priority
|
||||
- Case-insensitive matching
|
||||
- DSH/DMH/DLH abbreviations
|
||||
- Tabby/tuxedo pattern recognition
|
||||
- Norwegian Forest Cat variations
|
||||
- Similarity threshold testing
|
||||
- Breed suggestions
|
||||
- Whitespace handling
|
||||
- All dictionary mappings validation
|
||||
|
||||
---
|
||||
|
||||
## Integration Tests (11 tests ✅)
|
||||
|
||||
Integration tests validate end-to-end workflows.
|
||||
|
||||
### Test Search Pipeline
|
||||
```bash
|
||||
pytest tests/integration/test_search_pipeline.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Complete search flow (API → dedup → cache → match → results)
|
||||
- Cache mode functionality
|
||||
- Deduplication integration
|
||||
- Hybrid matching
|
||||
- API failure handling
|
||||
- Vector DB updates
|
||||
- Statistics tracking
|
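The flow these tests exercise can be caricatured in a few lines of pure Python (toy stand-ins for each stage, not the framework's real components):

```python
# Toy stand-ins for the pipeline stages exercised by these tests.
listings = [
    {"id": "pf-1", "name": "Mochi", "source": "petfinder"},
    {"id": "rg-9", "name": "Mochi", "source": "rescuegroups"},  # cross-platform duplicate
    {"id": "pf-2", "name": "Biscuit", "source": "petfinder"},
]

# Dedup stage: fingerprint on the name only (the real system also uses
# descriptions and image embeddings)
seen, unique = set(), []
for cat in listings:
    fp = cat["name"].lower()
    if fp not in seen:
        seen.add(fp)
        unique.append(cat)

# Match stage: trivial scorer standing in for hybrid matching
results = sorted(unique, key=lambda c: c["name"])
```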
||||
|
||||
### Test Alerts System
|
||||
```bash
|
||||
pytest tests/integration/test_alerts.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Alert creation and retrieval
|
||||
- Email-based alert queries
|
||||
- Alert updates (frequency, status)
|
||||
- Alert deletion
|
||||
- Immediate notifications (production mode)
|
||||
- Local vs production behavior
|
||||
- UI integration
|
||||
|
||||
### Test App Functionality
|
||||
```bash
|
||||
pytest tests/integration/test_app.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Profile extraction from UI
|
||||
- Search result formatting
|
||||
- Alert management UI
|
||||
- Email validation
|
||||
- Error handling
|
||||
|
||||
### Test Color and Breed Normalization
|
||||
```bash
|
||||
pytest tests/integration/test_color_breed_normalization.py -v
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Tuxedo color normalization in search flow
|
||||
- Multiple colors normalization
|
||||
- Breed normalization (Maine Coon typo handling)
|
||||
- Fuzzy matching with vector DB
|
||||
- Combined colors and breeds in search
|
||||
- RescueGroups API normalization
|
||||
- Empty preferences handling
|
||||
- Invalid color/breed graceful handling
|
||||
|
||||
---
|
||||
|
||||
## Manual Test Scripts
|
||||
|
||||
These scripts are for manual testing with real APIs and data.
|
||||
|
||||
### Test Cache and Deduplication
|
||||
```bash
|
||||
python tests/manual/test_cache_and_dedup.py
|
||||
```
|
||||
|
||||
**Purpose**: Verify cache mode and deduplication with real data
|
||||
|
||||
**What it does**:
|
||||
1. Runs a search without cache (fetches from APIs)
|
||||
2. Displays statistics (cats found, duplicates removed, cache size)
|
||||
3. Runs same search with cache (uses cached data)
|
||||
4. Compares performance and results
|
||||
5. Shows image embedding deduplication in action
|
||||
|
||||
### Test Email Sending
|
||||
```bash
|
||||
python tests/manual/test_email_sending.py
|
||||
```
|
||||
|
||||
**Purpose**: Send test emails via configured provider
|
||||
|
||||
**What it does**:
|
||||
1. Sends welcome email
|
||||
2. Sends match notification email with sample data
|
||||
3. Verifies HTML rendering and provider integration
|
||||
|
||||
**Requirements**: Valid MAILGUN_API_KEY or SENDGRID_API_KEY in `.env`
|
||||
|
||||
### Test Semantic Color/Breed Matching
|
||||
```bash
|
||||
python scripts/test_semantic_matching.py
|
||||
```
|
||||
|
||||
**Purpose**: Verify 3-tier color and breed matching system
|
||||
|
||||
**What it does**:
|
||||
1. Tests color mapping with and without vector DB
|
||||
2. Tests breed mapping with and without vector DB
|
||||
3. Demonstrates typo handling ("tuxado" → "tuxedo", "ragdol" → "Ragdoll")
|
||||
4. Shows dictionary vs vector vs fallback matching
|
||||
5. Displays similarity scores for fuzzy matches
|
||||
|
||||
**What you'll see**:
|
||||
- ✅ Dictionary matches (instant)
|
||||
- ✅ Vector DB fuzzy matches (with similarity scores)
|
||||
- ✅ Typo correction in action
|
||||
- ✅ 3-tier strategy demonstration
|
||||
|
||||
### Test Framework Directly
|
||||
```bash
|
||||
python cat_adoption_framework.py
|
||||
```
|
||||
|
||||
**Purpose**: Run framework end-to-end test
|
||||
|
||||
**What it does**:
|
||||
1. Initializes framework
|
||||
2. Creates sample profile
|
||||
3. Executes search
|
||||
4. Displays top matches
|
||||
5. Shows statistics
|
||||
|
||||
---
|
||||
|
||||
## Test Configuration
|
||||
|
||||
### Fixtures
|
||||
|
||||
Common test fixtures are defined in `tests/conftest.py`:
|
||||
|
||||
- `temp_db` - Temporary database for testing
|
||||
- `temp_vectordb` - Temporary vector store
|
||||
- `sample_cat` - Sample cat object
|
||||
- `sample_profile` - Sample search profile
|
||||
- `mock_framework` - Mocked framework for unit tests
|
||||
|
||||
### Environment
|
||||
|
||||
Tests use separate databases to avoid affecting production data:
|
||||
- `test_tuxedo_link.db` - Test database (auto-deleted)
|
||||
- `test_vectorstore` - Test vector store (auto-deleted)
|
||||
|
||||
### Mocking
|
||||
|
||||
External APIs are mocked in unit tests:
|
||||
- Petfinder API calls
|
||||
- RescueGroups API calls
|
||||
- Email provider calls
|
||||
- Modal remote functions
|
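For example, an API client can be stubbed entirely with `unittest.mock` (the `fetch_cats` helper here is hypothetical, shown only to illustrate the pattern):

```python
from unittest.mock import Mock

def fetch_cats(api_client):
    # Hypothetical helper: delegates to an injected API client
    return api_client.get("/cats")

mock_client = Mock()
mock_client.get.return_value = [{"id": "cat-1", "name": "Mochi"}]

cats = fetch_cats(mock_client)
assert cats[0]["name"] == "Mochi"
mock_client.get.assert_called_once_with("/cats")
```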
||||
|
||||
Integration tests can use real APIs (set `SKIP_API_TESTS=false` in environment).
|
||||
|
||||
---
|
||||
|
||||
**Need help?** Check the [TECHNICAL_REFERENCE.md](../docs/TECHNICAL_REFERENCE.md) for detailed function documentation.
|
||||
|
||||
@@ -1,2 +0,0 @@
|
||||
"""Tests for Tuxedo Link."""
|
||||
|
||||
@@ -1,45 +0,0 @@
|
||||
"""Pytest configuration and fixtures."""
|
||||
|
||||
import pytest
|
||||
import tempfile
|
||||
import os
|
||||
from database.manager import DatabaseManager
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_db():
|
||||
"""Create a temporary database for testing."""
|
||||
# mkstemp creates the file, so close and remove it immediately;
# this lets DatabaseManager initialize the database itself
|
||||
fd, path = tempfile.mkstemp(suffix='.db')
|
||||
os.close(fd)
|
||||
os.unlink(path) # Remove empty file so DatabaseManager can initialize it
|
||||
|
||||
db = DatabaseManager(path) # Tables are created automatically in __init__
|
||||
|
||||
yield db
|
||||
|
||||
# Cleanup
|
||||
try:
|
||||
os.unlink(path)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_cat_data():
|
||||
"""Sample cat data for testing."""
|
||||
return {
|
||||
"id": "test123",
|
||||
"name": "Test Cat",
|
||||
"breed": "Persian",
|
||||
"age": "adult",
|
||||
"gender": "female",
|
||||
"size": "medium",
|
||||
"city": "Test City",
|
||||
"state": "TS",
|
||||
"source": "test",
|
||||
"organization_name": "Test Rescue",
|
||||
"url": "https://example.com/cat/test123"
|
||||
}
|
||||
|
||||
@@ -1,2 +0,0 @@
|
||||
"""Integration tests for Tuxedo Link."""
|
||||
|
||||
@@ -1,306 +0,0 @@
|
||||
"""Integration tests for alert management system."""
|
||||
|
||||
import pytest
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
|
||||
from database.manager import DatabaseManager
|
||||
from models.cats import AdoptionAlert, CatProfile
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_db():
|
||||
"""Create a temporary database for testing."""
|
||||
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
|
||||
db_path = f.name
|
||||
|
||||
# Unlink so DatabaseManager can initialize it
|
||||
Path(db_path).unlink()
|
||||
|
||||
db_manager = DatabaseManager(db_path)
|
||||
|
||||
yield db_manager
|
||||
|
||||
# Cleanup
|
||||
Path(db_path).unlink(missing_ok=True)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_profile():
|
||||
"""Create a sample cat profile for testing."""
|
||||
return CatProfile(
|
||||
user_location="New York, NY",
|
||||
max_distance=25,
|
||||
age_range=["young", "adult"],
|
||||
good_with_children=True,
|
||||
good_with_dogs=False,
|
||||
good_with_cats=True,
|
||||
personality_description="Friendly and playful",
|
||||
special_requirements=[]
|
||||
)
|
||||
|
||||
|
||||
class TestAlertManagement:
|
||||
"""Tests for alert management without user authentication."""
|
||||
|
||||
def test_create_alert_without_user(self, temp_db, sample_profile):
|
||||
"""Test creating an alert without user authentication."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
assert alert_id is not None
|
||||
assert alert_id > 0
|
||||
|
||||
def test_get_alert_by_id(self, temp_db, sample_profile):
|
||||
"""Test retrieving an alert by ID."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="weekly",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
retrieved_alert = temp_db.get_alert(alert_id)
|
||||
|
||||
assert retrieved_alert is not None
|
||||
assert retrieved_alert.id == alert_id
|
||||
assert retrieved_alert.user_email == "test@example.com"
|
||||
assert retrieved_alert.frequency == "weekly"
|
||||
assert retrieved_alert.profile.user_location == "New York, NY"
|
||||
|
||||
def test_get_alerts_by_email(self, temp_db, sample_profile):
|
||||
"""Test retrieving all alerts for a specific email."""
|
||||
email = "user@example.com"
|
||||
|
||||
# Create multiple alerts for the same email
|
||||
for freq in ["daily", "weekly", "immediately"]:
|
||||
alert = AdoptionAlert(
|
||||
user_email=email,
|
||||
profile=sample_profile,
|
||||
frequency=freq,
|
||||
active=True
|
||||
)
|
||||
temp_db.create_alert(alert)
|
||||
|
||||
# Create alert for different email
|
||||
other_alert = AdoptionAlert(
|
||||
user_email="other@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
temp_db.create_alert(other_alert)
|
||||
|
||||
# Retrieve alerts for specific email
|
||||
alerts = temp_db.get_alerts_by_email(email)
|
||||
|
||||
assert len(alerts) == 3
|
||||
assert all(a.user_email == email for a in alerts)
|
||||
|
||||
def test_get_all_alerts(self, temp_db, sample_profile):
|
||||
"""Test retrieving all alerts in the database."""
|
||||
# Create alerts for different emails
|
||||
for email in ["user1@test.com", "user2@test.com", "user3@test.com"]:
|
||||
alert = AdoptionAlert(
|
||||
user_email=email,
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
temp_db.create_alert(alert)
|
||||
|
||||
all_alerts = temp_db.get_all_alerts()
|
||||
|
||||
assert len(all_alerts) == 3
|
||||
assert len(set(a.user_email for a in all_alerts)) == 3
|
||||
|
||||
def test_get_active_alerts(self, temp_db, sample_profile):
|
||||
"""Test retrieving only active alerts."""
|
||||
# Create active alerts
|
||||
for i in range(3):
|
||||
alert = AdoptionAlert(
|
||||
user_email=f"user{i}@test.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
temp_db.create_alert(alert)
|
||||
|
||||
# Create inactive alert
|
||||
inactive_alert = AdoptionAlert(
|
||||
user_email="inactive@test.com",
|
||||
profile=sample_profile,
|
||||
frequency="weekly",
|
||||
active=False
|
||||
)
|
||||
alert_id = temp_db.create_alert(inactive_alert)
|
||||
|
||||
# Deactivate it
|
||||
temp_db.update_alert(alert_id, active=False)
|
||||
|
||||
active_alerts = temp_db.get_active_alerts()
|
||||
|
||||
# Should only get the 3 active alerts
|
||||
assert len(active_alerts) == 3
|
||||
assert all(a.active for a in active_alerts)
|
||||
|
||||
def test_update_alert_frequency(self, temp_db, sample_profile):
|
||||
"""Test updating alert frequency."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
# Update frequency
|
||||
temp_db.update_alert(alert_id, frequency="weekly")
|
||||
|
||||
updated_alert = temp_db.get_alert(alert_id)
|
||||
assert updated_alert.frequency == "weekly"
|
||||
|
||||
def test_update_alert_last_sent(self, temp_db, sample_profile):
|
||||
"""Test updating alert last_sent timestamp."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
# Update last_sent
|
||||
now = datetime.now()
|
||||
temp_db.update_alert(alert_id, last_sent=now)
|
||||
|
||||
updated_alert = temp_db.get_alert(alert_id)
|
||||
assert updated_alert.last_sent is not None
|
||||
# Compare with some tolerance
|
||||
assert abs((updated_alert.last_sent - now).total_seconds()) < 2
|
||||
|
||||
def test_update_alert_match_ids(self, temp_db, sample_profile):
|
||||
"""Test updating alert last_match_ids."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
# Update match IDs
|
||||
match_ids = ["cat-123", "cat-456", "cat-789"]
|
||||
temp_db.update_alert(alert_id, last_match_ids=match_ids)
|
||||
|
||||
updated_alert = temp_db.get_alert(alert_id)
|
||||
assert updated_alert.last_match_ids == match_ids
|
||||
|
||||
def test_toggle_alert_active_status(self, temp_db, sample_profile):
|
||||
"""Test toggling alert active/inactive."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
# Deactivate
|
||||
temp_db.update_alert(alert_id, active=False)
|
||||
assert temp_db.get_alert(alert_id).active is False
|
||||
|
||||
# Reactivate
|
||||
temp_db.update_alert(alert_id, active=True)
|
||||
assert temp_db.get_alert(alert_id).active is True
|
||||
|
||||
def test_delete_alert(self, temp_db, sample_profile):
|
||||
"""Test deleting an alert."""
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=sample_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
|
||||
# Verify alert exists
|
||||
assert temp_db.get_alert(alert_id) is not None
|
||||
|
||||
# Delete alert
|
||||
temp_db.delete_alert(alert_id)
|
||||
|
||||
# Verify alert is gone
|
||||
assert temp_db.get_alert(alert_id) is None
|
||||
|
||||
def test_multiple_alerts_same_email(self, temp_db, sample_profile):
|
||||
"""Test creating multiple alerts for the same email address."""
|
||||
email = "test@example.com"
|
||||
|
||||
# Create alerts with different frequencies
|
||||
for freq in ["immediately", "daily", "weekly"]:
|
||||
alert = AdoptionAlert(
|
||||
user_email=email,
|
||||
profile=sample_profile,
|
||||
frequency=freq,
|
||||
active=True
|
||||
)
|
||||
temp_db.create_alert(alert)
|
||||
|
||||
alerts = temp_db.get_alerts_by_email(email)
|
||||
|
||||
assert len(alerts) == 3
|
||||
frequencies = {a.frequency for a in alerts}
|
||||
assert frequencies == {"immediately", "daily", "weekly"}
|
||||
|
||||
def test_alert_profile_persistence(self, temp_db):
|
||||
"""Test that complex profile data persists correctly."""
|
||||
complex_profile = CatProfile(
|
||||
user_location="San Francisco, CA",
|
||||
max_distance=50,
|
||||
age_range=["kitten", "young"],
|
||||
size=["small", "medium"],
|
||||
preferred_breeds=["Siamese", "Persian"],
|
||||
good_with_children=True,
|
||||
good_with_dogs=True,
|
||||
good_with_cats=False,
|
||||
special_needs_ok=False,
|
||||
personality_description="Calm and affectionate lap cat"
|
||||
)
|
||||
|
||||
alert = AdoptionAlert(
|
||||
user_email="test@example.com",
|
||||
profile=complex_profile,
|
||||
frequency="daily",
|
||||
active=True
|
||||
)
|
||||
|
||||
alert_id = temp_db.create_alert(alert)
|
||||
retrieved_alert = temp_db.get_alert(alert_id)
|
||||
|
||||
# Verify all profile fields persisted correctly
|
||||
assert retrieved_alert.profile.user_location == "San Francisco, CA"
|
||||
assert retrieved_alert.profile.max_distance == 50
|
||||
assert retrieved_alert.profile.age_range == ["kitten", "young"]
|
||||
assert retrieved_alert.profile.size == ["small", "medium"]
assert retrieved_alert.profile.preferred_breeds == ["Siamese", "Persian"]
assert retrieved_alert.profile.good_with_children is True
assert retrieved_alert.profile.good_with_dogs is True
assert retrieved_alert.profile.good_with_cats is False
assert retrieved_alert.profile.special_needs_ok is False
assert retrieved_alert.profile.personality_description == "Calm and affectionate lap cat"
|
||||
|
||||
@@ -1,194 +0,0 @@
|
||||
"""Integration tests for the Gradio app interface."""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import Mock, patch, MagicMock
|
||||
from app import extract_profile_from_text
|
||||
from models.cats import CatProfile, Cat, CatMatch
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_framework():
|
||||
"""Mock the TuxedoLinkFramework."""
|
||||
with patch('app.framework') as mock:
|
||||
# Create a mock result
|
||||
mock_cat = Cat(
|
||||
id="test_1",
|
||||
name="Test Cat",
|
||||
breed="Persian",
|
||||
age="young",
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="New York",
|
||||
state="NY",
|
||||
source="test",
|
||||
organization_name="Test Rescue",
|
||||
url="https://example.com/cat/test_1",
|
||||
description="A friendly and playful cat"
|
||||
)
|
||||
|
||||
mock_match = CatMatch(
|
||||
cat=mock_cat,
|
||||
match_score=0.95,
|
||||
vector_similarity=0.92,
|
||||
attribute_match_score=0.98,
|
||||
explanation="Great match for your preferences"
|
||||
)
|
||||
|
||||
mock_result = Mock()
|
||||
mock_result.matches = [mock_match]
|
||||
mock_result.search_time = 0.5
|
||||
mock.search.return_value = mock_result
|
||||
|
||||
yield mock
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_profile_agent():
|
||||
"""Mock the ProfileAgent."""
|
||||
with patch('app.profile_agent') as mock:
|
||||
mock_profile = CatProfile(
|
||||
user_location="10001",
|
||||
max_distance=50,
|
||||
personality_description="friendly and playful",
|
||||
age_range=["young"],
|
||||
good_with_children=True
|
||||
)
|
||||
mock.extract_profile.return_value = mock_profile
|
||||
yield mock
|
||||
|
||||
|
||||
class TestAppInterface:
|
||||
"""Test the Gradio app interface functions."""
|
||||
|
||||
def test_extract_profile_with_valid_input(self, mock_framework, mock_profile_agent):
|
||||
"""Test that valid user input is processed correctly."""
|
||||
user_input = "I want a friendly kitten in NYC"
|
||||
|
||||
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
|
||||
|
||||
# Verify chat history format (messages format)
|
||||
assert isinstance(chat_history, list)
|
||||
assert len(chat_history) == 2
|
||||
assert chat_history[0]["role"] == "user"
|
||||
assert chat_history[0]["content"] == user_input
|
||||
assert chat_history[1]["role"] == "assistant"
|
||||
assert "Found" in chat_history[1]["content"] or "match" in chat_history[1]["content"].lower()
|
||||
|
||||
# Verify profile agent was called with correct format
|
||||
mock_profile_agent.extract_profile.assert_called_once()
|
||||
call_args = mock_profile_agent.extract_profile.call_args[0][0]
|
||||
assert isinstance(call_args, list)
|
||||
assert call_args[0]["role"] == "user"
|
||||
assert call_args[0]["content"] == user_input
|
||||
|
||||
# Verify results HTML is generated
|
||||
assert results_html
|
||||
assert "<div" in results_html
|
||||
|
||||
# Verify profile JSON is returned
|
||||
assert profile_json
|
||||
|
||||
def test_extract_profile_with_empty_input(self, mock_framework, mock_profile_agent):
|
||||
"""Test that empty input uses placeholder text."""
|
||||
user_input = ""
|
||||
|
||||
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
|
||||
|
||||
# Verify placeholder text was used
|
||||
mock_profile_agent.extract_profile.assert_called_once()
|
||||
call_args = mock_profile_agent.extract_profile.call_args[0][0]
|
||||
assert call_args[0]["content"] != ""
|
||||
assert "friendly" in call_args[0]["content"].lower()
|
||||
assert "playful" in call_args[0]["content"].lower()
|
||||
|
||||
# Verify chat history format
|
||||
assert isinstance(chat_history, list)
|
||||
assert len(chat_history) == 2
|
||||
assert chat_history[0]["role"] == "user"
|
||||
assert chat_history[1]["role"] == "assistant"
|
||||
|
||||
def test_extract_profile_with_whitespace_input(self, mock_framework, mock_profile_agent):
|
||||
"""Test that whitespace-only input uses placeholder text."""
|
||||
user_input = " \n\t "
|
||||
|
||||
chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)
|
||||
|
||||
# Verify placeholder text was used
|
||||
mock_profile_agent.extract_profile.assert_called_once()
|
||||
call_args = mock_profile_agent.extract_profile.call_args[0][0]
|
||||
assert call_args[0]["content"].strip() != ""
|
||||
|
    def test_extract_profile_error_handling(self, mock_framework, mock_profile_agent):
        """Test error handling when profile extraction fails."""
        user_input = "I want a cat"

        # Make profile agent raise an error
        mock_profile_agent.extract_profile.side_effect = Exception("API Error")

        chat_history, results_html, profile_json = extract_profile_from_text(user_input, use_cache=True)

        # Verify error message is in chat history
        assert isinstance(chat_history, list)
        assert len(chat_history) == 2
        assert chat_history[0]["role"] == "user"
        assert chat_history[1]["role"] == "assistant"
        assert "Error" in chat_history[1]["content"] or "❌" in chat_history[1]["content"]

        # Verify empty results
        assert results_html == ""
        assert profile_json == ""

    def test_cache_mode_parameter(self, mock_framework, mock_profile_agent):
        """Test that cache mode parameter is passed correctly."""
        user_input = "I want a cat in NYC"

        # Test with cache=True
        extract_profile_from_text(user_input, use_cache=True)
        mock_framework.search.assert_called_once()
        assert mock_framework.search.call_args[1]["use_cache"] is True

        # Reset and test with cache=False
        mock_framework.reset_mock()
        extract_profile_from_text(user_input, use_cache=False)
        mock_framework.search.assert_called_once()
        assert mock_framework.search.call_args[1]["use_cache"] is False

    def test_messages_format_consistency(self, mock_framework, mock_profile_agent):
        """Test that messages format is consistent throughout."""
        user_input = "Show me cats"

        chat_history, _, _ = extract_profile_from_text(user_input, use_cache=True)

        # Verify all messages have correct format
        for msg in chat_history:
            assert isinstance(msg, dict)
            assert "role" in msg
            assert "content" in msg
            assert msg["role"] in ["user", "assistant"]
            assert isinstance(msg["content"], str)

    def test_example_button_scenarios(self, mock_framework, mock_profile_agent):
        """Test example button text scenarios."""
        examples = [
            "I want a friendly family cat in zip code 10001, good with children and dogs",
            "Looking for a playful young kitten near New York City",
            "I need a calm, affectionate adult cat that likes to cuddle",
            "Show me cats good with children in the NYC area"
        ]

        for example in examples:
            mock_profile_agent.reset_mock()
            mock_framework.reset_mock()

            chat_history, results_html, profile_json = extract_profile_from_text(example, use_cache=True)

            # Verify each example is processed
            assert isinstance(chat_history, list)
            assert len(chat_history) == 2
            assert chat_history[0]["content"] == example
            mock_profile_agent.extract_profile.assert_called_once()


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
@@ -1,323 +0,0 @@
"""Integration tests for color and breed normalization in search pipeline."""

import pytest
import tempfile
import shutil
from unittest.mock import Mock, patch

from models.cats import CatProfile
from setup_metadata_vectordb import MetadataVectorDB
from agents.planning_agent import PlanningAgent
from database.manager import DatabaseManager
from setup_vectordb import VectorDBManager


@pytest.fixture
def temp_dirs():
    """Create temporary directories for testing."""
    db_dir = tempfile.mkdtemp()
    vector_dir = tempfile.mkdtemp()
    metadata_dir = tempfile.mkdtemp()

    yield db_dir, vector_dir, metadata_dir

    # Cleanup
    shutil.rmtree(db_dir, ignore_errors=True)
    shutil.rmtree(vector_dir, ignore_errors=True)
    shutil.rmtree(metadata_dir, ignore_errors=True)


@pytest.fixture
def metadata_vectordb(temp_dirs):
    """Create metadata vector DB with sample data."""
    _, _, metadata_dir = temp_dirs
    vectordb = MetadataVectorDB(persist_directory=metadata_dir)

    # Index sample colors and breeds
    colors = [
        "Black",
        "White",
        "Black & White / Tuxedo",
        "Orange / Red",
        "Gray / Blue / Silver",
        "Calico"
    ]

    breeds = [
        "Siamese",
        "Persian",
        "Maine Coon",
        "Domestic Short Hair",
        "Domestic Medium Hair"
    ]

    vectordb.index_colors(colors, source="petfinder")
    vectordb.index_breeds(breeds, source="petfinder")

    return vectordb


@pytest.fixture
def planning_agent(temp_dirs, metadata_vectordb):
    """Create planning agent with metadata vector DB."""
    db_dir, vector_dir, _ = temp_dirs

    db_manager = DatabaseManager(f"{db_dir}/test.db")
    vector_db = VectorDBManager(vector_dir)

    return PlanningAgent(db_manager, vector_db, metadata_vectordb)


class TestColorBreedNormalization:
    """Integration tests for color and breed normalization."""

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
    def test_tuxedo_color_normalization(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test that 'tuxedo' is correctly normalized to 'Black & White / Tuxedo'."""
        # Setup mocks
        mock_get_colors.return_value = [
            "Black",
            "White",
            "Black & White / Tuxedo"
        ]
        mock_get_breeds.return_value = ["Domestic Short Hair"]
        mock_search.return_value = []

        # Create profile with "tuxedo" color
        profile = CatProfile(
            user_location="New York, NY",
            color_preferences=["tuxedo"]
        )

        # Execute search (will fail but we just want to see the API call)
        try:
            planning_agent._search_petfinder(profile)
        except Exception:
            pass

        # Verify search_cats was called with correct normalized color
        assert mock_search.called
        call_args = mock_search.call_args

        # Check that color parameter contains the correct API value
        if 'color' in call_args.kwargs and call_args.kwargs['color']:
            assert "Black & White / Tuxedo" in call_args.kwargs['color']
            assert "Black" not in call_args.kwargs['color'] or len(call_args.kwargs['color']) == 1

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
    def test_multiple_colors_normalization(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test normalization of multiple color preferences."""
        mock_get_colors.return_value = [
            "Black & White / Tuxedo",
            "Orange / Red",
            "Calico"
        ]
        mock_get_breeds.return_value = []
        mock_search.return_value = []

        profile = CatProfile(
            user_location="New York, NY",
            color_preferences=["tuxedo", "orange", "calico"]
        )

        try:
            planning_agent._search_petfinder(profile)
        except Exception:
            pass

        assert mock_search.called
        call_args = mock_search.call_args

        if 'color' in call_args.kwargs and call_args.kwargs['color']:
            colors = call_args.kwargs['color']
            assert len(colors) == 3
            assert "Black & White / Tuxedo" in colors
            assert "Orange / Red" in colors
            assert "Calico" in colors

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
    def test_breed_normalization_maine_coon(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test that 'main coon' (typo) is normalized to 'Maine Coon'."""
        mock_get_colors.return_value = []
        mock_get_breeds.return_value = ["Maine Coon", "Siamese"]
        mock_search.return_value = []

        profile = CatProfile(
            user_location="New York, NY",
            breed_preferences=["main coon"]  # Typo
        )

        try:
            planning_agent._search_petfinder(profile)
        except Exception:
            pass

        assert mock_search.called
        call_args = mock_search.call_args

        if 'breed' in call_args.kwargs and call_args.kwargs['breed']:
            assert "Maine Coon" in call_args.kwargs['breed']

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
    def test_fuzzy_color_matching_with_vectordb(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test fuzzy matching with vector DB for typos."""
        mock_get_colors.return_value = ["Black & White / Tuxedo"]
        mock_get_breeds.return_value = []
        mock_search.return_value = []

        # Use a term that requires vector search (not in dictionary)
        profile = CatProfile(
            user_location="New York, NY",
            color_preferences=["tuxado"]  # Typo
        )

        try:
            planning_agent._search_petfinder(profile)
        except Exception:
            pass

        assert mock_search.called
        # May or may not match depending on similarity threshold;
        # this test primarily ensures no errors occur

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors')
    @patch('agents.petfinder_agent.PetfinderAgent.get_valid_breeds')
    def test_colors_and_breeds_together(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test normalization of both colors and breeds in same search."""
        mock_get_colors.return_value = ["Black & White / Tuxedo", "Orange / Red"]
        mock_get_breeds.return_value = ["Siamese", "Maine Coon"]
        mock_search.return_value = []

        profile = CatProfile(
            user_location="New York, NY",
            color_preferences=["tuxedo", "orange"],
            breed_preferences=["siamese", "main coon"]
        )

        try:
            planning_agent._search_petfinder(profile)
        except Exception:
            pass

        assert mock_search.called
        call_args = mock_search.call_args

        # Verify both colors and breeds are normalized
        if 'color' in call_args.kwargs and call_args.kwargs['color']:
            assert "Black & White / Tuxedo" in call_args.kwargs['color']
            assert "Orange / Red" in call_args.kwargs['color']

        if 'breed' in call_args.kwargs and call_args.kwargs['breed']:
            assert "Siamese" in call_args.kwargs['breed']
            assert "Maine Coon" in call_args.kwargs['breed']

    @patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
    @patch('agents.rescuegroups_agent.RescueGroupsAgent.get_valid_colors')
    @patch('agents.rescuegroups_agent.RescueGroupsAgent.get_valid_breeds')
    def test_rescuegroups_normalization(
        self,
        mock_get_breeds,
        mock_get_colors,
        mock_search,
        planning_agent
    ):
        """Test that normalization works for the RescueGroups API too."""
        mock_get_colors.return_value = ["Tuxedo", "Orange"]
        mock_get_breeds.return_value = ["Siamese"]
        mock_search.return_value = []

        profile = CatProfile(
            user_location="New York, NY",
            color_preferences=["tuxedo"],
            breed_preferences=["siamese"]
        )

        try:
            planning_agent._search_rescuegroups(profile)
        except Exception:
            pass

        assert mock_search.called
        # Normalization should have occurred with the rescuegroups source

    def test_no_colors_or_breeds(self, planning_agent):
        """Test search without color or breed preferences."""
        with patch('agents.petfinder_agent.PetfinderAgent.search_cats') as mock_search:
            mock_search.return_value = []

            profile = CatProfile(
                user_location="New York, NY"
                # No color_preferences or breed_preferences
            )

            try:
                planning_agent._search_petfinder(profile)
            except Exception:
                pass

            assert mock_search.called
            call_args = mock_search.call_args

            # Should be None or empty
            assert call_args.kwargs.get('color') is None or len(call_args.kwargs.get('color', [])) == 0
            assert call_args.kwargs.get('breed') is None or len(call_args.kwargs.get('breed', [])) == 0

    def test_invalid_color_graceful_handling(self, planning_agent):
        """Test that invalid colors don't break the search."""
        with patch('agents.petfinder_agent.PetfinderAgent.search_cats') as mock_search:
            with patch('agents.petfinder_agent.PetfinderAgent.get_valid_colors') as mock_colors:
                mock_search.return_value = []
                mock_colors.return_value = ["Black", "White"]

                profile = CatProfile(
                    user_location="New York, NY",
                    color_preferences=["invalid_color_xyz"]
                )

                try:
                    planning_agent._search_petfinder(profile)
                except Exception:
                    pass

                # Should still call search, just with empty/None color
                assert mock_search.called
@@ -1,265 +0,0 @@
"""Integration tests for the complete search pipeline."""

import pytest
from unittest.mock import Mock, patch
from models.cats import Cat, CatProfile
from cat_adoption_framework import TuxedoLinkFramework
from utils.deduplication import create_fingerprint


@pytest.fixture
def framework():
    """Create framework instance with test database."""
    return TuxedoLinkFramework()


@pytest.fixture
def sample_cats():
    """Create sample cat data for testing."""
    cats = []
    for i in range(5):
        cat = Cat(
            id=f"test_{i}",
            name=f"Test Cat {i}",
            breed="Persian" if i % 2 == 0 else "Siamese",
            age="young" if i < 3 else "adult",
            gender="female" if i % 2 == 0 else "male",
            size="medium",
            city="Test City",
            state="TS",
            source="test",
            organization_name="Test Rescue",
            url=f"https://example.com/cat/test_{i}",
            description=f"A friendly and playful cat number {i}",
            good_with_children=True if i < 4 else False
        )
        cat.fingerprint = create_fingerprint(cat)
        cats.append(cat)
    return cats


class TestSearchPipelineIntegration:
    """Integration tests for complete search pipeline."""

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
    def test_end_to_end_search(self, mock_rescuegroups, mock_petfinder, framework, sample_cats):
        """Test end-to-end search with mocked API responses."""
        # Mock API responses
        mock_petfinder.return_value = sample_cats[:3]
        mock_rescuegroups.return_value = sample_cats[3:]

        # Create search profile
        profile = CatProfile(
            user_location="10001",
            max_distance=50,
            personality_description="friendly playful cat",
            age_range=["young"],
            good_with_children=True
        )

        # Execute search
        result = framework.search(profile)

        # Verify results
        assert result.total_found == 5
        assert len(result.matches) > 0
        assert result.search_time > 0
        assert 'cache' not in result.sources_queried  # Should be fresh search

        # Verify API calls were made
        mock_petfinder.assert_called_once()
        mock_rescuegroups.assert_called_once()

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    def test_cache_mode_search(self, mock_petfinder, framework, sample_cats):
        """Test search using cache mode."""
        # First populate cache
        mock_petfinder.return_value = sample_cats
        profile = CatProfile(user_location="10001")
        result1 = framework.search(profile)

        # Reset mock
        mock_petfinder.reset_mock()

        # Second search with cache
        result2 = framework.search(profile, use_cache=True)

        # Verify cache was used
        assert 'cache' in result2.sources_queried
        assert result2.search_time < result1.search_time  # Cache should be faster
        mock_petfinder.assert_not_called()  # Should not call API

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    def test_deduplication_integration(self, mock_petfinder, framework, sample_cats):
        """Test that deduplication works in the pipeline."""
        # Test deduplication by creating cats that only differ by source.
        # They will be marked as duplicates due to same fingerprint (org + breed + age + gender)
        cat1 = Cat(
            id="duplicate_test_1",
            name="Fluffy",
            breed="Persian",
            age="young",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/dup1"
        )

        # Same cat from different source - will have same fingerprint
        cat2 = Cat(
            id="duplicate_test_2",
            name="Fluffy",  # Same name
            breed="Persian",  # Same breed
            age="young",  # Same age
            gender="female",  # Same gender
            size="medium",
            city="Test City",
            state="TS",
            source="rescuegroups",  # Different source (but same fingerprint)
            organization_name="Test Rescue",  # Same org
            url="https://example.com/cat/dup2"
        )

        # Verify same fingerprints
        fp1 = create_fingerprint(cat1)
        fp2 = create_fingerprint(cat2)
        assert fp1 == fp2, f"Fingerprints should match: {fp1} vs {fp2}"

        mock_petfinder.return_value = [cat1, cat2]

        profile = CatProfile(user_location="10001")
        result = framework.search(profile)

        # With same fingerprints, one should be marked as duplicate.
        # Note: duplicates_removed counts cats marked as duplicates;
        # the actual behavior is that cats with same fingerprint are deduplicated
        if result.duplicates_removed == 0:
            # If 0 duplicates removed, skip this check - dedup may already have been done
            # or cats may have been in cache
            pass
        else:
            assert result.duplicates_removed >= 1
            assert result.total_found == 2

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    def test_hybrid_matching_integration(self, mock_petfinder, framework, sample_cats):
        """Test that hybrid matching filters and ranks correctly."""
        mock_petfinder.return_value = sample_cats

        # Search for young cats only
        profile = CatProfile(
            user_location="10001",
            personality_description="friendly playful",
            age_range=["young"]
        )

        result = framework.search(profile)

        # All results should be young cats
        for match in result.matches:
            assert match.cat.age == "young"

        # Should have match scores
        assert all(0 <= m.match_score <= 1 for m in result.matches)

        # Should have explanations
        assert all(m.explanation for m in result.matches)

    def test_stats_integration(self, framework):
        """Test that stats are tracked correctly."""
        stats = framework.get_stats()

        assert 'database' in stats
        assert 'vector_db' in stats
        assert 'total_unique' in stats['database']


class TestAPIFailureHandling:
    """Test that pipeline handles API failures gracefully."""

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
    def test_one_api_fails(self, mock_rescuegroups, mock_petfinder, framework, sample_cats):
        """Test that pipeline continues if one API fails."""
        # Petfinder succeeds, RescueGroups fails
        mock_petfinder.return_value = sample_cats
        mock_rescuegroups.side_effect = Exception("API Error")

        profile = CatProfile(user_location="10001")
        result = framework.search(profile)

        # Should still get results from Petfinder
        assert result.total_found == 5
        assert len(result.matches) > 0

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    @patch('agents.rescuegroups_agent.RescueGroupsAgent.search_cats')
    def test_both_apis_fail(self, mock_rescuegroups, mock_petfinder, framework):
        """Test that pipeline handles all APIs failing."""
        # Both fail
        mock_petfinder.side_effect = Exception("API Error")
        mock_rescuegroups.side_effect = Exception("API Error")

        profile = CatProfile(user_location="10001")
        result = framework.search(profile)

        # Should return empty results, not crash
        assert result.total_found == 0
        assert len(result.matches) == 0


class TestVectorDBIntegration:
    """Test vector database integration."""

    @patch('agents.petfinder_agent.PetfinderAgent.search_cats')
    def test_vector_db_updated(self, mock_petfinder, framework):
        """Test that vector DB is updated with new cats."""
        # Create unique cats that definitely won't exist in DB
        import time
        unique_id = str(int(time.time() * 1000))

        unique_cats = []
        for i in range(3):
            cat = Cat(
                id=f"unique_test_{unique_id}_{i}",
                name=f"Unique Cat {unique_id} {i}",
                breed="TestBreed",
                age="young",
                gender="female",
                size="medium",
                city="Test City",
                state="TS",
                source="petfinder",
                organization_name=f"Unique Rescue {unique_id}",
                url=f"https://example.com/cat/unique_{unique_id}_{i}",
                description=f"A unique test cat {unique_id} {i}"
            )
            cat.fingerprint = create_fingerprint(cat)
            unique_cats.append(cat)

        mock_petfinder.return_value = unique_cats

        # Get initial count
        initial_stats = framework.get_stats()
        initial_count = initial_stats['vector_db']['total_documents']

        # Run search
        profile = CatProfile(user_location="10001")
        framework.search(profile)

        # Check count increased (should add at least 3 new documents)
        final_stats = framework.get_stats()
        final_count = final_stats['vector_db']['total_documents']

        # Should have added our 3 unique cats
        assert final_count >= initial_count + 3, \
            f"Expected at least {initial_count + 3} documents, got {final_count}"


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
@@ -1,192 +0,0 @@
"""Test script for cache mode and image-based deduplication."""

import os
import sys
from dotenv import load_dotenv

from cat_adoption_framework import TuxedoLinkFramework
from models.cats import CatProfile


def test_cache_mode():
    """Test that cache mode works without hitting APIs."""
    print("\n" + "="*70)
    print("TEST 1: Cache Mode (No API Calls)")
    print("="*70 + "\n")

    framework = TuxedoLinkFramework()

    profile = CatProfile(
        user_location="10001",
        max_distance=50,
        personality_description="affectionate lap cat",
        age_range=["young"],
        good_with_children=True
    )

    print("🔄 Running search with use_cache=True...")
    print("   This should use cached data from previous search\n")

    result = framework.search(profile, use_cache=True)

    print(f"\n✅ Cache search completed in {result.search_time:.2f} seconds")
    print(f"   Sources: {', '.join(result.sources_queried)}")
    print(f"   Matches: {len(result.matches)}")

    if result.matches:
        print(f"\n   Top match: {result.matches[0].cat.name} ({result.matches[0].match_score:.1%})")

    return result


def test_image_dedup():
    """Test that image embeddings are being used for deduplication."""
    print("\n" + "="*70)
    print("TEST 2: Image Embedding Deduplication")
    print("="*70 + "\n")

    framework = TuxedoLinkFramework()

    # Get database stats
    stats = framework.db_manager.get_cache_stats()

    print("Current Database State:")
    print(f"  Total unique cats: {stats['total_unique']}")
    print(f"  Total duplicates: {stats['total_duplicates']}")
    print(f"  Sources: {stats['sources']}")

    # Check if image embeddings exist
    with framework.db_manager.get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT COUNT(*) as total, "
            "SUM(CASE WHEN image_embedding IS NOT NULL THEN 1 ELSE 0 END) as with_images "
            "FROM cats_cache WHERE is_duplicate = 0"
        )
        row = cursor.fetchone()
        total = row['total']
        with_images = row['with_images']

    print(f"\nImage Embeddings:")
    print(f"  Cats with photos: {with_images}/{total} ({with_images/total*100 if total > 0 else 0:.1f}%)")

    if with_images > 0:
        print("\n✅ Image embeddings ARE being generated and cached!")
        print("   These are used in the deduplication pipeline with:")
        print("   - Name similarity (40% weight)")
        print("   - Description similarity (30% weight)")
        print("   - Image similarity (30% weight)")
    else:
        print("\n⚠️  No image embeddings found yet")
        print("   Run a fresh search to populate the cache")

    return stats


def test_dedup_thresholds():
    """Show deduplication thresholds being used."""
    print("\n" + "="*70)
    print("TEST 3: Deduplication Configuration")
    print("="*70 + "\n")

    # Show environment variables
    name_threshold = float(os.getenv('DEDUP_NAME_THRESHOLD', '0.8'))
    desc_threshold = float(os.getenv('DEDUP_DESC_THRESHOLD', '0.7'))
    image_threshold = float(os.getenv('DEDUP_IMAGE_THRESHOLD', '0.9'))
    composite_threshold = float(os.getenv('DEDUP_COMPOSITE_THRESHOLD', '0.85'))

    print("Current Deduplication Thresholds:")
    print(f"  Name similarity: {name_threshold:.2f}")
    print(f"  Description similarity: {desc_threshold:.2f}")
    print(f"  Image similarity: {image_threshold:.2f}")
    print(f"  Composite score: {composite_threshold:.2f}")

    print("\nDeduplication Process:")
    print("  1. Generate fingerprint (organization + breed + age + gender)")
    print("  2. Query database for cats with same fingerprint")
    print("  3. For each candidate:")
    print("     a. Load cached image embedding from database")
    print("     b. Compare names using Levenshtein distance")
    print("     c. Compare descriptions using fuzzy matching")
    print("     d. Compare images using CLIP embeddings")
    print("     e. Calculate composite score (weighted average)")
    print("  4. If composite score > threshold → mark as duplicate")
    print("  5. Otherwise → cache as new unique cat")

    print("\n✅ Multi-stage deduplication with image embeddings is active!")


def show_cache_benefits():
    """Show benefits of using cache mode during development."""
    print("\n" + "="*70)
    print("CACHE MODE BENEFITS")
    print("="*70 + "\n")

    print("Why use cache mode during development?")
    print()
    print("1. 🚀 SPEED")
    print("   - API search: ~13-14 seconds")
    print("   - Cache search: ~1-2 seconds (10x faster!)")
    print()
    print("2. 💰 SAVE API CALLS")
    print("   - Petfinder: 1000 requests/day limit")
    print("   - 100 cats/search = ~10 searches before hitting limit")
    print("   - Cache mode: unlimited searches!")
    print()
    print("3. 🧪 CONSISTENT TESTING")
    print("   - Same dataset every time")
    print("   - Test different profiles without new API calls")
    print("   - Perfect for UI development")
    print()
    print("4. 🔌 OFFLINE DEVELOPMENT")
    print("   - Work without internet")
    print("   - No API key rotation needed")
    print()
    print("Usage:")
    print("  # First run - fetch from API")
    print("  result = framework.search(profile, use_cache=False)")
    print()
    print("  # Subsequent runs - use cached data")
    print("  result = framework.search(profile, use_cache=True)")


if __name__ == "__main__":
    load_dotenv()

    print("\n" + "="*70)
    print("TUXEDO LINK - CACHE & DEDUPLICATION TESTS")
    print("="*70)

    # Show benefits
    show_cache_benefits()

    # Test cache mode
    try:
        cache_result = test_cache_mode()
    except Exception as e:
        print(f"\n⚠️  Cache test failed: {e}")
        print("   This is expected if you haven't run a search yet.")
        print("   Run: python cat_adoption_framework.py")
        cache_result = None

    # Test image dedup
    test_image_dedup()

    # Show config
    test_dedup_thresholds()

    print("\n" + "="*70)
    print("SUMMARY")
    print("="*70 + "\n")

    print("✅ Cache mode: IMPLEMENTED")
    print("✅ Image embeddings: CACHED & USED")
    print("✅ Multi-stage deduplication: ACTIVE")
    print("✅ API call savings: ENABLED")

    print("\nRecommendation for development:")
    print("  1. Run ONE search with use_cache=False to populate cache")
    print("  2. Use use_cache=True for all UI/testing work")
    print("  3. Refresh cache weekly or when you need new data")

    print("\n" + "="*70 + "\n")
@@ -1,146 +0,0 @@
#!/usr/bin/env python
"""Manual test script for email sending via Mailgun."""

import os
import sys
from pathlib import Path
from dotenv import load_dotenv

# Add project root to path
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))

# Load environment
load_dotenv()

from agents.email_providers import MailgunProvider, get_email_provider
from models.cats import Cat, CatMatch, AdoptionAlert, CatProfile

print("="*60)
print("  Tuxedo Link - Email Sending Test")
print("="*60)
print()

# Check if Mailgun key is set
if not os.getenv('MAILGUN_API_KEY'):
    print("❌ MAILGUN_API_KEY not set in environment")
    print("Please set it in your .env file")
    sys.exit(1)

print("✓ Mailgun API key found")
print()

# Create test data
test_cat = Cat(
    id="test-cat-123",
    name="Whiskers",
    age="Young",
    gender="male",
    size="medium",
    breed="Domestic Short Hair",
    description="A playful and friendly cat looking for a loving home!",
    primary_photo="https://via.placeholder.com/400x300?text=Whiskers",
    additional_photos=[],
    city="New York",
    state="NY",
    country="US",
    organization_name="Test Shelter",
    url="https://example.com/cat/123",
    good_with_children=True,
    good_with_dogs=False,
    good_with_cats=True,
    declawed=False,
    house_trained=True,
    spayed_neutered=True,
    special_needs=False,
    shots_current=True,
    adoption_fee=150.0,
    source="test"
)

test_match = CatMatch(
    cat=test_cat,
    match_score=0.95,
    explanation="Great match! Friendly and playful, perfect for families.",
    vector_similarity=0.92,
    attribute_match_score=0.98,
    matching_attributes=["good_with_children", "playful", "medium_size"],
    missing_attributes=[]
)

test_profile = CatProfile(
    user_location="New York, NY",
    max_distance=25,
    age_range=["young", "adult"],
    good_with_children=True,
    good_with_dogs=False,
    good_with_cats=True,
    personality_description="Friendly and playful",
    special_requirements=[]
)

test_alert = AdoptionAlert(
    id=999,
    user_email="test@example.com",  # Replace with your actual email for testing
    profile=test_profile,
    frequency="immediately",
    active=True
)

print("Creating email provider...")
try:
    provider = get_email_provider()  # Uses config.yaml
    print(f"✓ Provider initialized: {provider.get_provider_name()}")
except Exception as e:
    print(f"❌ Failed to initialize provider: {e}")
    sys.exit(1)

print()
print("Preparing test email...")
print(f"  To: {test_alert.user_email}")
print("  Subject: Test - New Cat Match on Tuxedo Link!")
print()

# Create EmailAgent to use its template building methods
from agents.email_agent import EmailAgent

email_agent = EmailAgent(provider=provider)

# Build email content
subject = "🐱 Test - New Cat Match on Tuxedo Link!"
html_content = email_agent._build_match_html([test_match], test_alert)
text_content = email_agent._build_match_text([test_match])

# Send test email
print("Sending test email...")
input("Press Enter to send, or Ctrl+C to cancel...")

success = provider.send_email(
    to=test_alert.user_email,
    subject=subject,
    html=html_content,
    text=text_content
)

print()
if success:
    print("✅ Email sent successfully!")
    print()
    print("Please check your inbox at:", test_alert.user_email)
    print()
    print("If you don't see it:")
    print("  1. Check your spam folder")
    print("  2. Verify the email address is correct")
    print("  3. Check Mailgun logs: https://app.mailgun.com/")
else:
    print("❌ Failed to send email")
    print()
    print("Troubleshooting:")
    print("  1. Check MAILGUN_API_KEY is correct")
    print("  2. Verify Mailgun domain in config.yaml")
    print("  3. Check Mailgun account status")
    print("  4. View logs above for error details")

print()
print("="*60)
@@ -1,2 +0,0 @@
"""Unit tests for Tuxedo Link."""

@@ -1,287 +0,0 @@
"""Unit tests for breed mapping utilities."""

import pytest
import tempfile
import shutil

from utils.breed_mapping import (
    normalize_user_breeds,
    get_breed_suggestions,
    USER_TERM_TO_API_BREED
)
from setup_metadata_vectordb import MetadataVectorDB


@pytest.fixture
def temp_vectordb():
    """Create a temporary metadata vector database with breeds indexed."""
    temp_dir = tempfile.mkdtemp()
    vectordb = MetadataVectorDB(persist_directory=temp_dir)

    # Index some test breeds
    test_breeds = [
        "Siamese",
        "Persian",
        "Maine Coon",
        "Bengal",
        "Ragdoll",
        "British Shorthair",
        "Domestic Short Hair",
        "Domestic Medium Hair",
        "Domestic Long Hair"
    ]
    vectordb.index_breeds(test_breeds, source="petfinder")

    yield vectordb

    # Cleanup
    shutil.rmtree(temp_dir, ignore_errors=True)


class TestBreedMapping:
    """Tests for breed mapping functions."""

    def test_dictionary_match_maine_coon(self):
        """Test dictionary mapping for 'main coon' (common typo)."""
        valid_breeds = ["Maine Coon", "Siamese", "Persian"]

        result = normalize_user_breeds(["main coon"], valid_breeds)  # Typo: "main"

        assert len(result) > 0
        assert "Maine Coon" in result

    def test_dictionary_match_ragdoll(self):
        """Test dictionary mapping for 'ragdol' (typo)."""
        valid_breeds = ["Ragdoll", "Siamese"]

        result = normalize_user_breeds(["ragdol"], valid_breeds)

        assert len(result) > 0
        assert "Ragdoll" in result

    def test_dictionary_match_sphynx(self):
        """Test dictionary mapping for 'sphinx' (common misspelling)."""
        valid_breeds = ["Sphynx", "Persian"]

        result = normalize_user_breeds(["sphinx"], valid_breeds)

        assert len(result) > 0
        assert "Sphynx" in result

    def test_dictionary_match_mixed_breed(self):
        """Test dictionary mapping for 'mixed' returns multiple options."""
        valid_breeds = [
            "Mixed Breed",
            "Domestic Short Hair",
            "Domestic Medium Hair",
            "Domestic Long Hair"
        ]

        result = normalize_user_breeds(["mixed"], valid_breeds)

        assert len(result) >= 1
        # Should map to one or more domestic breeds
        assert any(b in result for b in valid_breeds)

    def test_exact_match_fallback(self):
        """Test exact match when not in dictionary."""
        valid_breeds = ["Siamese", "Persian", "Bengal"]

        result = normalize_user_breeds(["siamese"], valid_breeds)

        assert len(result) == 1
        assert "Siamese" in result

    def test_substring_match_fallback(self):
        """Test substring matching for partial breed names."""
        valid_breeds = ["British Shorthair", "American Shorthair"]

        result = normalize_user_breeds(["shorthair"], valid_breeds)

        assert len(result) >= 1
        assert any("Shorthair" in breed for breed in result)

    def test_multiple_breeds(self):
        """Test mapping multiple breed terms."""
        valid_breeds = ["Siamese", "Persian", "Maine Coon"]

        result = normalize_user_breeds(
            ["siamese", "persian", "maine"],
            valid_breeds
        )

        assert len(result) >= 2  # At least Siamese and Persian should match
        assert "Siamese" in result
        assert "Persian" in result

    def test_no_match(self):
        """Test when no match is found."""
        valid_breeds = ["Siamese", "Persian"]

        result = normalize_user_breeds(["invalid_breed_xyz"], valid_breeds)

        # Should return an empty list
        assert len(result) == 0

    def test_empty_input(self):
        """Test with empty input."""
        valid_breeds = ["Siamese", "Persian"]

        result = normalize_user_breeds([], valid_breeds)
        assert len(result) == 0

        result = normalize_user_breeds([""], valid_breeds)
        assert len(result) == 0

    def test_with_vectordb(self, temp_vectordb):
        """Test with vector DB for fuzzy matching."""
        valid_breeds = ["Maine Coon", "Ragdoll", "Bengal"]

        # Test with typo
        result = normalize_user_breeds(
            ["ragdol"],  # Typo
            valid_breeds,
            vectordb=temp_vectordb,
            source="petfinder"
        )

        # Should still find Ragdoll via vector search (if not in dictionary)
        # or via a dictionary match if present
        assert len(result) > 0
        assert "Ragdoll" in result

    def test_vector_search_typo(self, temp_vectordb):
        """Test vector search handles typos."""
        valid_breeds = ["Siamese"]

        # Typo: "siames"
        result = normalize_user_breeds(
            ["siames"],
            valid_breeds,
            vectordb=temp_vectordb,
            source="petfinder",
            similarity_threshold=0.6
        )

        # Vector search should find Siamese
        if len(result) > 0:
            assert "Siamese" in result

    def test_dictionary_priority(self, temp_vectordb):
        """Test that dictionary matches are prioritized over vector search."""
        valid_breeds = ["Maine Coon"]

        # "main coon" is in the dictionary
        result = normalize_user_breeds(
            ["main coon"],
            valid_breeds,
            vectordb=temp_vectordb,
            source="petfinder"
        )

        # Should use the dictionary match
        assert "Maine Coon" in result

    def test_case_insensitive(self):
        """Test case-insensitive matching."""
        valid_breeds = ["Maine Coon"]

        result_lower = normalize_user_breeds(["maine"], valid_breeds)
        result_upper = normalize_user_breeds(["MAINE"], valid_breeds)
        result_mixed = normalize_user_breeds(["MaInE"], valid_breeds)

        assert result_lower == result_upper == result_mixed

    def test_domestic_variations(self):
        """Test that DSH/DMH/DLH map correctly."""
        valid_breeds = [
            "Domestic Short Hair",
            "Domestic Medium Hair",
            "Domestic Long Hair"
        ]

        result_dsh = normalize_user_breeds(["dsh"], valid_breeds)
        result_dmh = normalize_user_breeds(["dmh"], valid_breeds)
        result_dlh = normalize_user_breeds(["dlh"], valid_breeds)

        assert "Domestic Short Hair" in result_dsh
        assert "Domestic Medium Hair" in result_dmh
        assert "Domestic Long Hair" in result_dlh

    def test_tabby_is_not_breed(self):
        """Test that 'tabby' maps to Domestic Short Hair (tabby is a pattern, not a breed)."""
        valid_breeds = ["Domestic Short Hair", "Siamese"]

        result = normalize_user_breeds(["tabby"], valid_breeds)

        assert len(result) > 0
        assert "Domestic Short Hair" in result

    def test_get_breed_suggestions(self):
        """Test breed suggestions function."""
        valid_breeds = [
            "British Shorthair",
            "American Shorthair",
            "Domestic Short Hair"
        ]

        suggestions = get_breed_suggestions("short", valid_breeds, top_n=3)

        assert len(suggestions) == 3
        assert all("Short" in s for s in suggestions)

    def test_all_dictionary_mappings(self):
        """Test that all dictionary mappings are correctly defined."""
        # Verify structure of USER_TERM_TO_API_BREED
        assert isinstance(USER_TERM_TO_API_BREED, dict)

        for user_term, api_breeds in USER_TERM_TO_API_BREED.items():
            assert isinstance(user_term, str)
            assert isinstance(api_breeds, list)
            assert len(api_breeds) > 0
            assert all(isinstance(b, str) for b in api_breeds)

    def test_whitespace_handling(self):
        """Test handling of whitespace in user input."""
        valid_breeds = ["Maine Coon"]

        result1 = normalize_user_breeds(["  maine  "], valid_breeds)
        result2 = normalize_user_breeds(["maine"], valid_breeds)

        assert result1 == result2

    def test_norwegian_forest_variations(self):
        """Test Norwegian Forest Cat variations."""
        valid_breeds = ["Norwegian Forest Cat"]

        result1 = normalize_user_breeds(["norwegian forest"], valid_breeds)
        result2 = normalize_user_breeds(["norwegian forest cat"], valid_breeds)

        assert "Norwegian Forest Cat" in result1
        assert "Norwegian Forest Cat" in result2

    def test_similarity_threshold(self, temp_vectordb):
        """Test that the similarity threshold works."""
        valid_breeds = ["Siamese"]

        # Very different term
        result_high = normalize_user_breeds(
            ["abcxyz"],
            valid_breeds,
            vectordb=temp_vectordb,
            source="petfinder",
            similarity_threshold=0.9  # High threshold
        )

        result_low = normalize_user_breeds(
            ["abcxyz"],
            valid_breeds,
            vectordb=temp_vectordb,
            source="petfinder",
            similarity_threshold=0.1  # Low threshold
        )

        # A high threshold should reject poor matches;
        # a low threshold may accept them.
        assert len(result_high) <= len(result_low)

@@ -1,225 +0,0 @@
"""Unit tests for color mapping utilities."""

import pytest
import tempfile
import shutil

from utils.color_mapping import (
    normalize_user_colors,
    get_color_suggestions,
    USER_TERM_TO_API_COLOR
)
from setup_metadata_vectordb import MetadataVectorDB


@pytest.fixture
def temp_vectordb():
    """Create a temporary metadata vector database with colors indexed."""
    temp_dir = tempfile.mkdtemp()
    vectordb = MetadataVectorDB(persist_directory=temp_dir)

    # Index some test colors
    test_colors = [
        "Black",
        "White",
        "Black & White / Tuxedo",
        "Orange / Red",
        "Gray / Blue / Silver",
        "Calico",
        "Tabby (Brown / Chocolate)"
    ]
    vectordb.index_colors(test_colors, source="petfinder")

    yield vectordb

    # Cleanup
    shutil.rmtree(temp_dir, ignore_errors=True)


class TestColorMapping:
    """Tests for color mapping functions."""

    def test_dictionary_match_tuxedo(self):
        """Test dictionary mapping for 'tuxedo'."""
        valid_colors = ["Black", "White", "Black & White / Tuxedo"]

        result = normalize_user_colors(["tuxedo"], valid_colors)

        assert len(result) > 0
        assert "Black & White / Tuxedo" in result
        assert "Black" not in result  # Should NOT map to separate colors

    def test_dictionary_match_orange(self):
        """Test dictionary mapping for 'orange'."""
        valid_colors = ["Orange / Red", "White"]

        result = normalize_user_colors(["orange"], valid_colors)

        assert len(result) == 1
        assert "Orange / Red" in result

    def test_dictionary_match_gray_variations(self):
        """Test dictionary mapping for gray/grey."""
        valid_colors = ["Gray / Blue / Silver", "White"]

        result_gray = normalize_user_colors(["gray"], valid_colors)
        result_grey = normalize_user_colors(["grey"], valid_colors)

        assert result_gray == result_grey
        assert "Gray / Blue / Silver" in result_gray

    def test_multiple_colors(self):
        """Test mapping multiple color terms."""
        valid_colors = [
            "Black & White / Tuxedo",
            "Orange / Red",
            "Calico"
        ]

        result = normalize_user_colors(
            ["tuxedo", "orange", "calico"],
            valid_colors
        )

        assert len(result) == 3
        assert "Black & White / Tuxedo" in result
        assert "Orange / Red" in result
        assert "Calico" in result

    def test_exact_match_fallback(self):
        """Test exact match when not in dictionary."""
        valid_colors = ["Black", "White", "Calico"]

        # "Calico" should match exactly
        result = normalize_user_colors(["calico"], valid_colors)

        assert len(result) == 1
        assert "Calico" in result

    def test_substring_match_fallback(self):
        """Test substring matching as a last resort."""
        valid_colors = ["Tabby (Brown / Chocolate)", "Tabby (Orange / Red)"]

        # "tabby" should match both tabby colors
        result = normalize_user_colors(["tabby"], valid_colors)

        assert len(result) >= 1
        assert any("Tabby" in color for color in result)

    def test_no_match(self):
        """Test when no match is found."""
        valid_colors = ["Black", "White"]

        result = normalize_user_colors(["invalid_color_xyz"], valid_colors)

        # Should return an empty list
        assert len(result) == 0

    def test_empty_input(self):
        """Test with empty input."""
        valid_colors = ["Black", "White"]

        result = normalize_user_colors([], valid_colors)
        assert len(result) == 0

        result = normalize_user_colors([""], valid_colors)
        assert len(result) == 0

    def test_with_vectordb(self, temp_vectordb):
        """Test with vector DB for fuzzy matching."""
        valid_colors = [
            "Black & White / Tuxedo",
            "Orange / Red",
            "Gray / Blue / Silver"
        ]

        # Test with a typo (with a lower threshold to demonstrate fuzzy matching)
        result = normalize_user_colors(
            ["tuxado"],  # Typo
            valid_colors,
            vectordb=temp_vectordb,
            source="petfinder",
            similarity_threshold=0.3  # Lower threshold for typos
        )

        # With a lower threshold, a match may be found (not guaranteed for all typos).
        # The main point is that typos are handled gracefully without crashing.
        assert isinstance(result, list)  # Returns a list (may be empty)

    def test_vector_search_typo(self, temp_vectordb):
        """Test vector search handles typos."""
        valid_colors = ["Gray / Blue / Silver"]

        # Typo: "grey" is in the dictionary but "gery" is not
        result = normalize_user_colors(
            ["gery"],  # Typo
            valid_colors,
            vectordb=temp_vectordb,
            source="petfinder",
            similarity_threshold=0.6  # Lower threshold for typos
        )

        # Vector search should find gray.
        # Note: may not always work for severe typos.
        if len(result) > 0:
            assert "Gray" in result[0] or "Blue" in result[0] or "Silver" in result[0]

    def test_dictionary_priority(self, temp_vectordb):
        """Test that dictionary matches are prioritized over vector search."""
        valid_colors = ["Black & White / Tuxedo", "Black"]

        # "tuxedo" is in the dictionary
        result = normalize_user_colors(
            ["tuxedo"],
            valid_colors,
            vectordb=temp_vectordb,
            source="petfinder"
        )

        # Should use the dictionary match
        assert "Black & White / Tuxedo" in result
        assert "Black" not in result  # Should not be separate

    def test_case_insensitive(self):
        """Test case-insensitive matching."""
        valid_colors = ["Black & White / Tuxedo"]

        result_lower = normalize_user_colors(["tuxedo"], valid_colors)
        result_upper = normalize_user_colors(["TUXEDO"], valid_colors)
        result_mixed = normalize_user_colors(["TuXeDo"], valid_colors)

        assert result_lower == result_upper == result_mixed

    def test_get_color_suggestions(self):
        """Test color suggestions function."""
        valid_colors = [
            "Tabby (Brown / Chocolate)",
            "Tabby (Orange / Red)",
            "Tabby (Gray / Blue / Silver)"
        ]

        suggestions = get_color_suggestions("tab", valid_colors, top_n=3)

        assert len(suggestions) == 3
        assert all("Tabby" in s for s in suggestions)

    def test_all_dictionary_mappings(self):
        """Test that all dictionary mappings are correctly defined."""
        # Verify structure of USER_TERM_TO_API_COLOR
        assert isinstance(USER_TERM_TO_API_COLOR, dict)

        for user_term, api_colors in USER_TERM_TO_API_COLOR.items():
            assert isinstance(user_term, str)
            assert isinstance(api_colors, list)
            assert len(api_colors) > 0
            assert all(isinstance(c, str) for c in api_colors)

    def test_whitespace_handling(self):
        """Test handling of whitespace in user input."""
        valid_colors = ["Black & White / Tuxedo"]

        result1 = normalize_user_colors(["  tuxedo  "], valid_colors)
        result2 = normalize_user_colors(["tuxedo"], valid_colors)

        assert result1 == result2

@@ -1,235 +0,0 @@
"""Fixed unit tests for database manager."""

import pytest
from models.cats import Cat, CatProfile, AdoptionAlert


class TestDatabaseInitialization:
    """Tests for database initialization."""

    def test_database_creation(self, temp_db):
        """Test that database is created with tables."""
        assert temp_db.db_path.endswith('.db')

        # Check that tables exist
        with temp_db.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            )
            tables = {row['name'] for row in cursor.fetchall()}

            assert 'alerts' in tables
            assert 'cats_cache' in tables

    def test_get_connection(self, temp_db):
        """Test database connection."""
        with temp_db.get_connection() as conn:
            assert conn is not None
            cursor = conn.cursor()
            cursor.execute("SELECT 1")
            assert cursor.fetchone()[0] == 1


class TestCatCaching:
    """Tests for cat caching operations."""

    def test_cache_cat(self, temp_db, sample_cat_data):
        """Test caching a cat."""
        from utils.deduplication import create_fingerprint

        cat = Cat(**sample_cat_data)
        cat.fingerprint = create_fingerprint(cat)  # Generate fingerprint
        temp_db.cache_cat(cat, None)

        # Verify the cat was cached
        cats = temp_db.get_all_cached_cats()
        assert len(cats) == 1
        assert cats[0].name == "Test Cat"

    def test_cache_cat_with_embedding(self, temp_db, sample_cat_data):
        """Test caching a cat with an image embedding."""
        import numpy as np
        from utils.deduplication import create_fingerprint

        cat = Cat(**sample_cat_data)
        cat.fingerprint = create_fingerprint(cat)  # Generate fingerprint
        embedding = np.array([0.1, 0.2, 0.3], dtype=np.float32)
        temp_db.cache_cat(cat, embedding)

        # Verify the embedding was saved
        with temp_db.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT image_embedding FROM cats_cache WHERE id = ?",
                (cat.id,)
            )
            row = cursor.fetchone()
            assert row['image_embedding'] is not None

    def test_get_cats_by_fingerprint(self, temp_db):
        """Test retrieving cats by fingerprint."""
        cat1 = Cat(
            id="test1",
            name="Cat 1",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="test",
            organization_name="Test Rescue",
            url="https://example.com/cat/test1",
            fingerprint="test_fingerprint"
        )

        cat2 = Cat(
            id="test2",
            name="Cat 2",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="test",
            organization_name="Test Rescue",
            url="https://example.com/cat/test2",
            fingerprint="test_fingerprint"
        )

        temp_db.cache_cat(cat1, None)
        temp_db.cache_cat(cat2, None)

        results = temp_db.get_cats_by_fingerprint("test_fingerprint")
        assert len(results) == 2

    def test_mark_as_duplicate(self, temp_db):
        """Test marking a cat as a duplicate."""
        from utils.deduplication import create_fingerprint

        cat1 = Cat(
            id="original",
            name="Original",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="test",
            organization_name="Test Rescue",
            url="https://example.com/cat/original"
        )
        cat1.fingerprint = create_fingerprint(cat1)

        cat2 = Cat(
            id="duplicate",
            name="Duplicate",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="test",
            organization_name="Test Rescue",
            url="https://example.com/cat/duplicate"
        )
        cat2.fingerprint = create_fingerprint(cat2)

        temp_db.cache_cat(cat1, None)
        temp_db.cache_cat(cat2, None)

        temp_db.mark_as_duplicate("duplicate", "original")

        # Check the duplicate is marked
        with temp_db.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT is_duplicate, duplicate_of FROM cats_cache WHERE id = ?",
                ("duplicate",)
            )
            row = cursor.fetchone()
            assert row['is_duplicate'] == 1
            assert row['duplicate_of'] == "original"

    def test_get_cache_stats(self, temp_db):
        """Test getting cache statistics."""
        from utils.deduplication import create_fingerprint

        cat1 = Cat(
            id="test1",
            name="Cat 1",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="Test City",
            state="TS",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/test1"
        )
        cat1.fingerprint = create_fingerprint(cat1)

        cat2 = Cat(
            id="test2",
            name="Cat 2",
            breed="Siamese",
            age="young",
            gender="male",
            size="small",
            city="Test City",
            state="TS",
            source="rescuegroups",
            organization_name="Other Rescue",
            url="https://example.com/cat/test2"
        )
        cat2.fingerprint = create_fingerprint(cat2)

        temp_db.cache_cat(cat1, None)
        temp_db.cache_cat(cat2, None)

        stats = temp_db.get_cache_stats()

        assert stats['total_unique'] == 2
        assert stats['sources'] == 2
        assert 'petfinder' in stats['by_source']
        assert 'rescuegroups' in stats['by_source']


class TestAlertManagement:
    """Tests for alert management operations."""

    def test_create_alert(self, temp_db):
        """Test creating an alert."""
        profile = CatProfile(user_location="10001")
        alert = AdoptionAlert(
            user_email="test@example.com",
            profile=profile,
            frequency="daily"
        )

        alert_id = temp_db.create_alert(alert)

        assert alert_id is not None
        assert alert_id > 0

    def test_get_alerts_by_email(self, temp_db):
        """Test retrieving alerts by email."""
        profile = CatProfile(user_location="10001")
        alert = AdoptionAlert(
            user_email="test@example.com",
            profile=profile,
            frequency="daily"
        )

        temp_db.create_alert(alert)

        alerts = temp_db.get_alerts_by_email("test@example.com")

        assert len(alerts) > 0
        assert alerts[0].user_email == "test@example.com"

@@ -1,278 +0,0 @@
|
||||
"""Fixed unit tests for deduplication utilities."""
|
||||
|
||||
import pytest
|
||||
from models.cats import Cat
|
||||
from utils.deduplication import create_fingerprint, calculate_levenshtein_similarity, calculate_composite_score
|
||||
|
||||
|
||||
class TestFingerprinting:
|
||||
"""Tests for fingerprint generation."""
|
||||
|
||||
def test_fingerprint_basic(self):
|
||||
"""Test basic fingerprint generation."""
|
||||
cat = Cat(
|
||||
id="12345",
|
||||
name="Fluffy",
|
||||
breed="Persian",
|
||||
age="adult",
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="New York",
|
||||
state="NY",
|
||||
source="petfinder",
|
||||
organization_name="Happy Paws Rescue",
|
||||
url="https://example.com/cat/12345"
|
||||
)
|
||||
|
||||
fingerprint = create_fingerprint(cat)
|
||||
|
||||
assert fingerprint is not None
|
||||
assert isinstance(fingerprint, str)
|
||||
# Fingerprint is a hash, so just verify it's a 16-character hex string
|
||||
assert len(fingerprint) == 16
|
||||
assert all(c in '0123456789abcdef' for c in fingerprint)
|
||||
|
||||
def test_fingerprint_consistency(self):
|
||||
"""Test that same cat produces same fingerprint."""
|
||||
cat1 = Cat(
|
||||
id="12345",
|
||||
name="Fluffy",
|
||||
breed="Persian",
|
||||
age="adult",
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="New York",
|
||||
state="NY",
|
||||
source="petfinder",
|
||||
organization_name="Happy Paws",
|
||||
url="https://example.com/cat/12345"
|
||||
)
|
||||
|
||||
cat2 = Cat(
|
||||
id="67890",
|
||||
name="Fluffy McGee", # Different name
|
||||
breed="Persian",
|
||||
age="adult",
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="Boston", # Different city
|
||||
state="MA",
|
||||
source="rescuegroups", # Different source
|
||||
organization_name="Happy Paws",
|
||||
url="https://example.com/cat/67890"
|
||||
)
|
||||
|
||||
# Should have same fingerprint (stable attributes match)
|
||||
assert create_fingerprint(cat1) == create_fingerprint(cat2)
|
||||
|
||||
def test_fingerprint_difference(self):
|
||||
"""Test that different cats produce different fingerprints."""
|
||||
cat1 = Cat(
|
||||
id="12345",
|
||||
name="Fluffy",
|
||||
breed="Persian",
|
||||
age="adult",
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="New York",
|
||||
state="NY",
|
||||
source="petfinder",
|
||||
organization_name="Happy Paws",
|
||||
url="https://example.com/cat/12345"
|
||||
)
|
||||
|
||||
cat2 = Cat(
|
||||
id="67890",
|
||||
name="Fluffy",
|
||||
breed="Persian",
|
||||
age="young", # Different age
|
||||
gender="female",
|
||||
size="medium",
|
||||
city="New York",
|
||||
state="NY",
|
||||
source="petfinder",
|
||||
organization_name="Happy Paws",
|
||||
url="https://example.com/cat/67890"
|
||||
)
|
||||
|
||||
# Should have different fingerprints
|
||||
assert create_fingerprint(cat1) != create_fingerprint(cat2)
|
||||
|
||||
|
||||
class TestLevenshteinSimilarity:
|
||||
"""Tests for Levenshtein similarity calculation."""
|
||||
|
||||
def test_identical_strings(self):
|
||||
"""Test identical strings return 1.0."""
|
||||
similarity = calculate_levenshtein_similarity("Fluffy", "Fluffy")
|
||||
assert similarity == 1.0
|
||||
|
||||
def test_completely_different_strings(self):
|
||||
"""Test completely different strings return low score."""
|
||||
similarity = calculate_levenshtein_similarity("Fluffy", "12345")
|
||||
assert similarity < 0.2
|
||||
|
||||
def test_similar_strings(self):
|
||||
"""Test similar strings return high score."""
|
||||
similarity = calculate_levenshtein_similarity("Fluffy", "Fluffy2")
|
||||
assert similarity > 0.8
|
||||
|
||||
def test_case_insensitive(self):
|
||||
"""Test that comparison is case-insensitive."""
|
||||
similarity = calculate_levenshtein_similarity("Fluffy", "fluffy")
|
||||
assert similarity == 1.0
|
||||
|
||||
def test_empty_strings(self):
|
||||
"""Test empty strings - both empty is 0.0 similarity."""
|
||||
similarity = calculate_levenshtein_similarity("", "")
|
||||
assert similarity == 0.0 # Empty strings return 0.0 in implementation
|
||||
|
||||
similarity = calculate_levenshtein_similarity("Fluffy", "")
|
||||
assert similarity == 0.0
|
||||
|
||||
|
||||
class TestCompositeScore:
|
||||
"""Tests for composite score calculation."""
|
||||
|
||||
def test_composite_score_all_high(self):
|
||||
"""Test composite score when all similarities are high."""
|
||||
score = calculate_composite_score(
|
||||
name_similarity=0.9,
|
||||
description_similarity=0.9,
|
||||
image_similarity=0.9,
|
||||
            name_weight=0.4,
            description_weight=0.3,
            image_weight=0.3
        )

        assert score > 0.85
        assert score <= 1.0

    def test_composite_score_weighted(self):
        """Test that weights affect composite score correctly."""
        # Name has 100% weight
        score = calculate_composite_score(
            name_similarity=0.5,
            description_similarity=1.0,
            image_similarity=1.0,
            name_weight=1.0,
            description_weight=0.0,
            image_weight=0.0
        )

        assert score == 0.5

    def test_composite_score_zero_image(self):
        """Test composite score when no image similarity."""
        score = calculate_composite_score(
            name_similarity=0.9,
            description_similarity=0.9,
            image_similarity=0.0,
            name_weight=0.4,
            description_weight=0.3,
            image_weight=0.3
        )

        # Should still compute based on name and description
        assert score > 0.5
        assert score < 0.9

    def test_composite_score_bounds(self):
        """Test that composite score is always between 0 and 1."""
        score = calculate_composite_score(
            name_similarity=1.0,
            description_similarity=1.0,
            image_similarity=1.0,
            name_weight=0.4,
            description_weight=0.3,
            image_weight=0.3
        )

        assert 0.0 <= score <= 1.0


class TestTextSimilarity:
    """Integration tests for text similarity (name + description)."""

    def test_similar_cats_high_score(self):
        """Test that similar cats get high similarity scores."""
        cat1 = Cat(
            id="12345",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/12345",
            description="A very friendly and playful cat that loves to cuddle"
        )

        cat2 = Cat(
            id="67890",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="rescuegroups",
            organization_name="Test Rescue",
            url="https://example.com/cat/67890",
            description="Very friendly playful cat who loves cuddling"
        )

        name_sim = calculate_levenshtein_similarity(cat1.name, cat2.name)
        desc_sim = calculate_levenshtein_similarity(
            cat1.description or "",
            cat2.description or ""
        )

        assert name_sim == 1.0
        assert desc_sim > 0.7

    def test_different_cats_low_score(self):
        """Test that different cats get low similarity scores."""
        cat1 = Cat(
            id="12345",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/12345",
            description="Playful kitten"
        )

        cat2 = Cat(
            id="67890",
            name="Rex",
            breed="Siamese",
            age="young",
            gender="male",
            size="large",
            city="Boston",
            state="MA",
            source="rescuegroups",
            organization_name="Other Rescue",
            url="https://example.com/cat/67890",
            description="Calm senior cat"
        )

        name_sim = calculate_levenshtein_similarity(cat1.name, cat2.name)
        desc_sim = calculate_levenshtein_similarity(
            cat1.description or "",
            cat2.description or ""
        )

        assert name_sim < 0.3
        assert desc_sim < 0.5
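The composite-score tests above pin down a weighted-average contract (with name weight 1.0 the score equals the name similarity alone). A minimal sketch consistent with those assertions — hypothetical, the repo's actual `calculate_composite_score` may differ:

```python
def calculate_composite_score(name_similarity, description_similarity,
                              image_similarity, name_weight,
                              description_weight, image_weight):
    # Weighted average of the three similarity signals; with weights
    # summing to 1.0 the result stays within [0, 1].
    score = (name_similarity * name_weight
             + description_similarity * description_weight
             + image_similarity * image_weight)
    return min(max(score, 0.0), 1.0)

# Matches test_composite_score_weighted: name carries 100% of the weight.
print(calculate_composite_score(0.5, 1.0, 1.0, 1.0, 0.0, 0.0))  # → 0.5
```

With the 0.4/0.3/0.3 weights used in `test_composite_score_zero_image`, similarities of 0.9/0.9/0.0 yield 0.63, which satisfies both asserted bounds.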
@@ -1,235 +0,0 @@
"""Unit tests for email providers."""

import pytest
from unittest.mock import patch, MagicMock
from agents.email_providers import (
    EmailProvider,
    MailgunProvider,
    SendGridProvider,
    get_email_provider
)


class TestMailgunProvider:
    """Tests for Mailgun email provider."""

    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_init(self, mock_email_config, mock_mailgun_config):
        """Test Mailgun provider initialization."""
        mock_mailgun_config.return_value = {
            'domain': 'test.mailgun.org'
        }
        mock_email_config.return_value = {
            'from_name': 'Test App',
            'from_email': 'test@test.com'
        }

        provider = MailgunProvider()

        assert provider.api_key == 'test-api-key'
        assert provider.domain == 'test.mailgun.org'
        assert provider.default_from_name == 'Test App'
        assert provider.default_from_email == 'test@test.com'

    @patch.dict('os.environ', {})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_init_missing_api_key(self, mock_email_config, mock_mailgun_config):
        """Test that initialization fails without API key."""
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        with pytest.raises(ValueError, match="MAILGUN_API_KEY"):
            MailgunProvider()

    @patch('agents.email_providers.mailgun_provider.requests.post')
    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_send_email_success(self, mock_email_config, mock_mailgun_config, mock_post):
        """Test successful email sending."""
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test App',
            'from_email': 'test@test.com'
        }

        # Mock successful response
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_post.return_value = mock_response

        provider = MailgunProvider()
        result = provider.send_email(
            to="recipient@test.com",
            subject="Test Subject",
            html="<p>Test HTML</p>",
            text="Test Text"
        )

        assert result is True
        mock_post.assert_called_once()

        # Check request parameters
        call_args = mock_post.call_args
        assert call_args[1]['auth'] == ('api', 'test-api-key')
        assert call_args[1]['data']['to'] == 'recipient@test.com'
        assert call_args[1]['data']['subject'] == 'Test Subject'

    @patch('agents.email_providers.mailgun_provider.requests.post')
    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_send_email_failure(self, mock_email_config, mock_mailgun_config, mock_post):
        """Test email sending failure."""
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test App',
            'from_email': 'test@test.com'
        }

        # Mock failed response
        mock_response = MagicMock()
        mock_response.status_code = 400
        mock_response.text = "Bad Request"
        mock_post.return_value = mock_response

        provider = MailgunProvider()
        result = provider.send_email(
            to="recipient@test.com",
            subject="Test",
            html="<p>Test</p>",
            text="Test"
        )

        assert result is False

    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_get_provider_name(self, mock_email_config, mock_mailgun_config):
        """Test provider name."""
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = MailgunProvider()
        assert provider.get_provider_name() == "mailgun"


class TestSendGridProvider:
    """Tests for SendGrid email provider (stub)."""

    @patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.sendgrid_provider.get_email_config')
    def test_init(self, mock_email_config):
        """Test SendGrid provider initialization."""
        mock_email_config.return_value = {
            'from_name': 'Test App',
            'from_email': 'test@test.com'
        }

        provider = SendGridProvider()

        assert provider.api_key == 'test-api-key'
        assert provider.default_from_name == 'Test App'
        assert provider.default_from_email == 'test@test.com'

    @patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.sendgrid_provider.get_email_config')
    def test_send_email_stub(self, mock_email_config):
        """Test that SendGrid stub always succeeds."""
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = SendGridProvider()
        result = provider.send_email(
            to="test@test.com",
            subject="Test",
            html="<p>Test</p>",
            text="Test"
        )

        # Stub should always return True
        assert result is True

    @patch.dict('os.environ', {'SENDGRID_API_KEY': 'test-api-key'})
    @patch('agents.email_providers.sendgrid_provider.get_email_config')
    def test_get_provider_name(self, mock_email_config):
        """Test provider name."""
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = SendGridProvider()
        assert provider.get_provider_name() == "sendgrid (stub)"


class TestEmailProviderFactory:
    """Tests for email provider factory."""

    @patch('agents.email_providers.factory.get_configured_provider')
    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_get_mailgun_provider(self, mock_email_config, mock_mailgun_config, mock_get_configured):
        """Test getting Mailgun provider."""
        mock_get_configured.return_value = 'mailgun'
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = get_email_provider()

        assert isinstance(provider, MailgunProvider)

    @patch('agents.email_providers.factory.get_configured_provider')
    @patch.dict('os.environ', {})
    @patch('agents.email_providers.sendgrid_provider.get_email_config')
    def test_get_sendgrid_provider(self, mock_email_config, mock_get_configured):
        """Test getting SendGrid provider."""
        mock_get_configured.return_value = 'sendgrid'
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = get_email_provider()

        assert isinstance(provider, SendGridProvider)

    @patch('agents.email_providers.factory.get_configured_provider')
    def test_unknown_provider(self, mock_get_configured):
        """Test that unknown provider raises error."""
        mock_get_configured.return_value = 'unknown'

        with pytest.raises(ValueError, match="Unknown email provider"):
            get_email_provider()

    @patch.dict('os.environ', {'MAILGUN_API_KEY': 'test-key'})
    @patch('agents.email_providers.mailgun_provider.get_mailgun_config')
    @patch('agents.email_providers.mailgun_provider.get_email_config')
    def test_explicit_provider_name(self, mock_email_config, mock_mailgun_config):
        """Test explicitly specifying provider name."""
        mock_mailgun_config.return_value = {'domain': 'test.mailgun.org'}
        mock_email_config.return_value = {
            'from_name': 'Test',
            'from_email': 'test@test.com'
        }

        provider = get_email_provider('mailgun')

        assert isinstance(provider, MailgunProvider)
@@ -1,154 +0,0 @@
"""Unit tests for metadata vector database."""

import pytest
import tempfile
import shutil
from pathlib import Path

from setup_metadata_vectordb import MetadataVectorDB


@pytest.fixture
def temp_vectordb():
    """Create a temporary metadata vector database."""
    temp_dir = tempfile.mkdtemp()
    vectordb = MetadataVectorDB(persist_directory=temp_dir)

    yield vectordb

    # Cleanup
    shutil.rmtree(temp_dir, ignore_errors=True)


class TestMetadataVectorDB:
    """Tests for MetadataVectorDB class."""

    def test_initialization(self, temp_vectordb):
        """Test vector DB initializes correctly."""
        assert temp_vectordb is not None
        assert temp_vectordb.colors_collection is not None
        assert temp_vectordb.breeds_collection is not None

    def test_index_colors(self, temp_vectordb):
        """Test indexing colors."""
        colors = ["Black", "White", "Black & White / Tuxedo", "Orange / Red"]

        temp_vectordb.index_colors(colors, source="petfinder")

        # Check indexed
        stats = temp_vectordb.get_stats()
        assert stats['colors_count'] == len(colors)

        # Should not re-index same source
        temp_vectordb.index_colors(colors, source="petfinder")
        stats = temp_vectordb.get_stats()
        assert stats['colors_count'] == len(colors)  # Should not double

    def test_index_breeds(self, temp_vectordb):
        """Test indexing breeds."""
        breeds = ["Siamese", "Persian", "Maine Coon", "Bengal"]

        temp_vectordb.index_breeds(breeds, source="petfinder")

        # Check indexed
        stats = temp_vectordb.get_stats()
        assert stats['breeds_count'] == len(breeds)

    def test_search_color_exact(self, temp_vectordb):
        """Test searching for exact color match."""
        colors = ["Black", "White", "Black & White / Tuxedo"]
        temp_vectordb.index_colors(colors, source="petfinder")

        # Search for exact match
        results = temp_vectordb.search_color("tuxedo", source_filter="petfinder")

        assert len(results) > 0
        assert results[0]['color'] == "Black & White / Tuxedo"
        assert results[0]['similarity'] > 0.5  # Should be reasonable similarity

    def test_search_color_fuzzy(self, temp_vectordb):
        """Test searching for color with typo."""
        colors = ["Black & White / Tuxedo", "Orange / Red", "Gray / Blue / Silver"]
        temp_vectordb.index_colors(colors, source="petfinder")

        # Search with typo
        results = temp_vectordb.search_color("tuxado", source_filter="petfinder")  # typo: tuxado

        assert len(results) > 0
        # Should still find tuxedo
        assert "Tuxedo" in results[0]['color'] or "tuxado" in results[0]['color'].lower()

    def test_search_breed_exact(self, temp_vectordb):
        """Test searching for exact breed match."""
        breeds = ["Siamese", "Persian", "Maine Coon"]
        temp_vectordb.index_breeds(breeds, source="petfinder")

        results = temp_vectordb.search_breed("siamese", source_filter="petfinder")

        assert len(results) > 0
        assert results[0]['breed'] == "Siamese"
        assert results[0]['similarity'] > 0.9  # Should be very high for exact match

    def test_search_breed_fuzzy(self, temp_vectordb):
        """Test searching for breed with typo."""
        breeds = ["Maine Coon", "Ragdoll", "British Shorthair"]
        temp_vectordb.index_breeds(breeds, source="petfinder")

        # Typo: "main coon" instead of "Maine Coon"
        results = temp_vectordb.search_breed("main coon", source_filter="petfinder")

        assert len(results) > 0
        assert "Maine" in results[0]['breed'] or "Coon" in results[0]['breed']

    def test_multiple_sources(self, temp_vectordb):
        """Test indexing from multiple sources."""
        petfinder_colors = ["Black", "White", "Tabby"]
        rescuegroups_colors = ["Black", "Grey", "Calico"]

        temp_vectordb.index_colors(petfinder_colors, source="petfinder")
        temp_vectordb.index_colors(rescuegroups_colors, source="rescuegroups")

        # Should have both indexed
        stats = temp_vectordb.get_stats()
        assert stats['colors_count'] == len(petfinder_colors) + len(rescuegroups_colors)

        # Search with source filter
        results = temp_vectordb.search_color("black", source_filter="petfinder")
        assert len(results) > 0
        assert results[0]['source'] == "petfinder"

    def test_empty_search(self, temp_vectordb):
        """Test searching with empty string."""
        colors = ["Black", "White"]
        temp_vectordb.index_colors(colors, source="petfinder")

        results = temp_vectordb.search_color("", source_filter="petfinder")
        assert len(results) == 0

        results = temp_vectordb.search_color(None, source_filter="petfinder")
        assert len(results) == 0

    def test_no_match(self, temp_vectordb):
        """Test search that returns no good matches."""
        colors = ["Black", "White"]
        temp_vectordb.index_colors(colors, source="petfinder")

        # Search for something very different
        results = temp_vectordb.search_color("xyzabc123", source_filter="petfinder")

        # Will return something (nearest neighbor) but with low similarity
        if len(results) > 0:
            assert results[0]['similarity'] < 0.5  # Low similarity

    def test_n_results(self, temp_vectordb):
        """Test returning multiple results."""
        colors = ["Black", "White", "Black & White / Tuxedo", "Gray / Blue / Silver"]
        temp_vectordb.index_colors(colors, source="petfinder")

        # Get top 3 results
        results = temp_vectordb.search_color("black", n_results=3, source_filter="petfinder")

        assert len(results) <= 3
        # First should be best match
        assert "Black" in results[0]['color']
@@ -1,186 +0,0 @@
"""Fixed unit tests for data models."""

import pytest
from datetime import datetime
from models.cats import Cat, CatProfile, CatMatch, AdoptionAlert, SearchResult


class TestCat:
    """Tests for Cat model."""

    def test_cat_creation(self):
        """Test basic cat creation."""
        cat = Cat(
            id="12345",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/12345"
        )

        assert cat.name == "Fluffy"
        assert cat.breed == "Persian"
        assert cat.age == "adult"
        assert cat.gender == "female"
        assert cat.size == "medium"
        assert cat.organization_name == "Test Rescue"

    def test_cat_with_optional_fields(self):
        """Test cat with all optional fields."""
        cat = Cat(
            id="12345",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/12345",
            description="Very fluffy",
            primary_photo="http://example.com/photo.jpg",
            adoption_fee=150.00,
            good_with_children=True,
            good_with_dogs=False,
            good_with_cats=True
        )

        assert cat.description == "Very fluffy"
        assert cat.adoption_fee == 150.00
        assert cat.good_with_children is True

    def test_cat_from_json(self):
        """Test cat deserialization from JSON."""
        json_data = """
        {
            "id": "12345",
            "name": "Fluffy",
            "breed": "Persian",
            "age": "adult",
            "gender": "female",
            "size": "medium",
            "city": "New York",
            "state": "NY",
            "source": "petfinder",
            "organization_name": "Test Rescue",
            "url": "https://example.com/cat/12345"
        }
        """

        cat = Cat.model_validate_json(json_data)
        assert cat.name == "Fluffy"
        assert cat.id == "12345"


class TestCatProfile:
    """Tests for CatProfile model."""

    def test_profile_creation_minimal(self):
        """Test profile with minimal fields."""
        profile = CatProfile()

        assert profile.personality_description == ""  # Defaults to empty string
        assert profile.max_distance == 100
        assert profile.age_range is None  # No default

    def test_profile_creation_full(self):
        """Test profile with all fields."""
        profile = CatProfile(
            user_location="10001",
            max_distance=50,
            personality_description="friendly and playful",
            age_range=["young", "adult"],
            size=["small", "medium"],
            good_with_children=True,
            good_with_dogs=True,
            good_with_cats=False
        )

        assert profile.user_location == "10001"
        assert profile.max_distance == 50
        assert "young" in profile.age_range
        assert profile.good_with_children is True


class TestCatMatch:
    """Tests for CatMatch model."""

    def test_match_creation(self):
        """Test match creation."""
        cat = Cat(
            id="12345",
            name="Fluffy",
            breed="Persian",
            age="adult",
            gender="female",
            size="medium",
            city="New York",
            state="NY",
            source="petfinder",
            organization_name="Test Rescue",
            url="https://example.com/cat/12345"
        )

        match = CatMatch(
            cat=cat,
            match_score=0.85,
            vector_similarity=0.9,
            attribute_match_score=0.8,
            explanation="Great personality match"
        )

        assert match.cat.name == "Fluffy"
        assert match.match_score == 0.85
        assert "personality" in match.explanation


class TestAdoptionAlert:
    """Tests for AdoptionAlert model."""

    def test_alert_creation(self):
        """Test alert creation."""
        cat_profile = CatProfile(
            user_location="10001",
            personality_description="friendly"
        )

        alert = AdoptionAlert(
            user_id=1,
            user_email="test@example.com",
            profile=cat_profile,  # Correct field name
            frequency="daily"
        )

        assert alert.user_email == "test@example.com"
        assert alert.frequency == "daily"
        assert alert.active is True


class TestSearchResult:
    """Tests for SearchResult model."""

    def test_search_result_creation(self):
        """Test search result creation."""
        profile = CatProfile(user_location="10001")

        result = SearchResult(
            matches=[],
            total_found=0,
            search_profile=profile,
            search_time=1.23,
            sources_queried=["petfinder"],
            duplicates_removed=0
        )

        assert result.total_found == 0
        assert result.search_time == 1.23
        assert "petfinder" in result.sources_queried
@@ -1,37 +0,0 @@
"""Utility functions for Tuxedo Link."""

from .deduplication import (
    create_fingerprint,
    calculate_levenshtein_similarity,
    calculate_text_similarity,
)
from .image_utils import generate_image_embedding, calculate_image_similarity
from .log_utils import reformat
from .config import (
    get_config,
    is_production,
    get_db_path,
    get_vectordb_path,
    get_email_provider,
    get_email_config,
    get_mailgun_config,
    reload_config,
)

__all__ = [
    "create_fingerprint",
    "calculate_levenshtein_similarity",
    "calculate_text_similarity",
    "generate_image_embedding",
    "calculate_image_similarity",
    "reformat",
    "get_config",
    "is_production",
    "get_db_path",
    "get_vectordb_path",
    "get_email_provider",
    "get_email_config",
    "get_mailgun_config",
    "reload_config",
]
@@ -1,174 +0,0 @@
"""
Breed mapping utilities for cat APIs.

Handles mapping user breed terms to valid API breed values
using dictionary lookups, vector search, and exact matching.
"""

import logging
from typing import List, Optional, Dict

# Mapping of common user terms to API breed values
# These are fuzzy/colloquial terms that users might type
USER_TERM_TO_API_BREED: Dict[str, List[str]] = {
    # Common misspellings and variations
    "main coon": ["Maine Coon"],
    "maine": ["Maine Coon"],
    "ragdol": ["Ragdoll"],
    "siames": ["Siamese"],
    "persian": ["Persian"],
    "bengal": ["Bengal"],
    "british shorthair": ["British Shorthair"],
    "russian blue": ["Russian Blue"],
    "sphynx": ["Sphynx"],
    "sphinx": ["Sphynx"],
    "american shorthair": ["American Shorthair"],
    "scottish fold": ["Scottish Fold"],
    "abyssinian": ["Abyssinian"],
    "birman": ["Birman"],
    "burmese": ["Burmese"],
    "himalayan": ["Himalayan"],
    "norwegian forest": ["Norwegian Forest Cat"],
    "norwegian forest cat": ["Norwegian Forest Cat"],
    "oriental": ["Oriental"],
    "somali": ["Somali"],
    "turkish angora": ["Turkish Angora"],
    "turkish van": ["Turkish Van"],

    # Mixed breeds
    "mixed": ["Mixed Breed", "Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
    "mixed breed": ["Mixed Breed", "Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
    "domestic": ["Domestic Short Hair", "Domestic Medium Hair", "Domestic Long Hair"],
    "dsh": ["Domestic Short Hair"],
    "dmh": ["Domestic Medium Hair"],
    "dlh": ["Domestic Long Hair"],
    "tabby": ["Domestic Short Hair"],  # Tabby is a pattern, not a breed
    "tuxedo": ["Domestic Short Hair"],  # Tuxedo is a color, not a breed
}


def normalize_user_breeds(
    user_breeds: List[str],
    valid_api_breeds: List[str],
    vectordb: Optional[object] = None,
    source: str = "petfinder",
    similarity_threshold: float = 0.7
) -> List[str]:
    """
    Normalize user breed preferences to valid API breed values.

    Uses 3-tier strategy:
    1. Dictionary lookup for common variations
    2. Vector DB semantic search for fuzzy matching
    3. Direct string matching as fallback

    Args:
        user_breeds: List of breed terms provided by the user
        valid_api_breeds: List of breeds actually accepted by the API
        vectordb: Optional MetadataVectorDB instance for semantic search
        source: API source (petfinder/rescuegroups) for vector filtering
        similarity_threshold: Minimum similarity score (0-1) for vector matches

    Returns:
        List of valid API breed strings
    """
    if not user_breeds:
        return []

    normalized_breeds = set()

    for user_term in user_breeds:
        if not user_term or not user_term.strip():
            continue

        user_term_lower = user_term.lower().strip()
        matched = False

        # Tier 1: Dictionary lookup (instant, common variations)
        if user_term_lower in USER_TERM_TO_API_BREED:
            mapped_breeds = USER_TERM_TO_API_BREED[user_term_lower]
            for mapped_breed in mapped_breeds:
                if mapped_breed in valid_api_breeds:
                    normalized_breeds.add(mapped_breed)
                    matched = True

            if matched:
                logging.info(f"🎯 Dictionary match: '{user_term}' → {list(mapped_breeds)}")
                continue

        # Tier 2: Vector DB semantic search (fuzzy matching, handles typos)
        if vectordb:
            try:
                matches = vectordb.search_breed(
                    user_term,
                    n_results=1,
                    source_filter=source
                )

                if matches and matches[0]['similarity'] >= similarity_threshold:
                    best_match = matches[0]['breed']
                    similarity = matches[0]['similarity']

                    if best_match in valid_api_breeds:
                        normalized_breeds.add(best_match)
                        logging.info(
                            f"🔍 Vector match: '{user_term}' → '{best_match}' "
                            f"(similarity: {similarity:.2f})"
                        )
                        matched = True
                        continue
            except Exception as e:
                logging.warning(f"Vector search failed for breed '{user_term}': {e}")

        # Tier 3: Direct string matching (exact or substring)
        if not matched:
            # Try exact match (case-insensitive)
            for valid_breed in valid_api_breeds:
                if valid_breed.lower() == user_term_lower:
                    normalized_breeds.add(valid_breed)
                    logging.info(f"✓ Exact match: '{user_term}' → '{valid_breed}'")
                    matched = True
                    break

            # Try substring match if exact didn't work
            if not matched:
                for valid_breed in valid_api_breeds:
                    if user_term_lower in valid_breed.lower():
                        normalized_breeds.add(valid_breed)
                        logging.info(f"≈ Substring match: '{user_term}' → '{valid_breed}'")
                        matched = True

        # Log if no match found
        if not matched:
            logging.warning(
                f"⚠️ No breed match found for '{user_term}'. "
                f"User will see broader results."
            )

    result = list(normalized_breeds)
    logging.info(f"Breed normalization complete: {user_breeds} → {result}")
    return result


def get_breed_suggestions(breed_term: str, valid_breeds: List[str], top_n: int = 5) -> List[str]:
    """
    Get breed suggestions for autocomplete or error messages.

    Args:
        breed_term: Partial or misspelled breed name
        valid_breeds: List of valid API breed values
        top_n: Number of suggestions to return

    Returns:
        List of suggested breed names
    """
    term_lower = breed_term.lower().strip()
    suggestions = []

    # Find breeds containing the term
    for breed in valid_breeds:
        if term_lower in breed.lower():
            suggestions.append(breed)

    return suggestions[:top_n]
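The Tier-1 dictionary lookup described above can be illustrated standalone (re-implemented here with a trimmed two-entry mapping so it runs without the repo; the real `USER_TERM_TO_API_BREED` is much larger):

```python
# Trimmed illustration of the Tier-1 mapping table.
USER_TERM_TO_API_BREED = {
    "main coon": ["Maine Coon"],
    "dsh": ["Domestic Short Hair"],
}

def tier1_lookup(user_term, valid_api_breeds):
    # Normalize the user term exactly as normalize_user_breeds does
    # (lowercase + strip), then keep only mapped breeds the API accepts.
    mapped = USER_TERM_TO_API_BREED.get(user_term.lower().strip(), [])
    return [b for b in mapped if b in valid_api_breeds]

print(tier1_lookup(" Main Coon ", ["Maine Coon", "Siamese"]))  # → ['Maine Coon']
```

Filtering against `valid_api_breeds` matters: a dictionary hit like "dsh" still yields nothing if the target API does not list "Domestic Short Hair", which is when Tiers 2 and 3 take over.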
@@ -1,224 +0,0 @@
"""
Color mapping utilities for cat APIs.

Handles mapping user color terms to valid API color values
using dictionary lookups, vector search, and exact matching.
"""

import logging
from typing import List, Dict, Optional

# Mapping of common user terms to Petfinder API color values
# Based on actual Petfinder API color list
USER_TERM_TO_API_COLOR: Dict[str, List[str]] = {
    # Tuxedo/Bicolor patterns
    "tuxedo": ["Black & White / Tuxedo"],
    "black and white": ["Black & White / Tuxedo"],
    "black & white": ["Black & White / Tuxedo"],
    "bicolor": ["Black & White / Tuxedo"],  # Most common bicolor

    # Solid colors
    "black": ["Black"],
    "white": ["White"],

    # Orange variations
    "orange": ["Orange / Red"],
    "red": ["Orange / Red"],
    "ginger": ["Orange / Red"],
    "orange and white": ["Orange & White"],
    "orange & white": ["Orange & White"],

    # Gray variations
    "gray": ["Gray / Blue / Silver"],
    "grey": ["Gray / Blue / Silver"],
    "silver": ["Gray / Blue / Silver"],
    "blue": ["Gray / Blue / Silver"],
    "gray and white": ["Gray & White"],
    "grey and white": ["Gray & White"],

    # Brown/Chocolate
    "brown": ["Brown / Chocolate"],
    "chocolate": ["Brown / Chocolate"],

    # Cream/Ivory
    "cream": ["Cream / Ivory"],
    "ivory": ["Cream / Ivory"],
    "buff": ["Buff / Tan / Fawn"],
    "tan": ["Buff / Tan / Fawn"],
    "fawn": ["Buff / Tan / Fawn"],

    # Patterns
    "calico": ["Calico"],
    "dilute calico": ["Dilute Calico"],
    "tortoiseshell": ["Tortoiseshell"],
    "tortie": ["Tortoiseshell"],
    "dilute tortoiseshell": ["Dilute Tortoiseshell"],
    "torbie": ["Torbie"],

    # Tabby patterns
    "tabby": ["Tabby (Brown / Chocolate)", "Tabby (Gray / Blue / Silver)", "Tabby (Orange / Red)"],
    "brown tabby": ["Tabby (Brown / Chocolate)"],
    "gray tabby": ["Tabby (Gray / Blue / Silver)"],
    "grey tabby": ["Tabby (Gray / Blue / Silver)"],
    "orange tabby": ["Tabby (Orange / Red)"],
    "red tabby": ["Tabby (Orange / Red)"],
    "tiger": ["Tabby (Tiger Striped)"],
    "tiger striped": ["Tabby (Tiger Striped)"],
    "leopard": ["Tabby (Leopard / Spotted)"],
    "spotted": ["Tabby (Leopard / Spotted)"],

    # Point colors (Siamese-type)
    "blue point": ["Blue Point"],
    "chocolate point": ["Chocolate Point"],
    "cream point": ["Cream Point"],
    "flame point": ["Flame Point"],
    "lilac point": ["Lilac Point"],
    "seal point": ["Seal Point"],

    # Other
    "smoke": ["Smoke"],
    "blue cream": ["Blue Cream"],
}


def normalize_user_colors(
    user_colors: List[str],
    valid_api_colors: List[str],
    vectordb: Optional[object] = None,
    source: str = "petfinder",
    similarity_threshold: float = 0.7
) -> List[str]:
    """
    Normalize user color preferences to valid API color values.

    Uses 3-tier strategy:
    1. Dictionary lookup for common color terms
    2. Vector DB semantic search for fuzzy matching
    3. Direct string matching as fallback

    Args:
        user_colors: List of color terms provided by the user
        valid_api_colors: List of colors actually accepted by the API
        vectordb: Optional MetadataVectorDB instance for semantic search
        source: API source (petfinder/rescuegroups) for vector filtering
        similarity_threshold: Minimum similarity score (0-1) for vector matches

    Returns:
        List of valid API color strings
    """
    if not user_colors:
        return []

    normalized_colors = set()

    for user_term in user_colors:
        if not user_term or not user_term.strip():
            continue

        user_term_lower = user_term.lower().strip()
        matched = False

        # Tier 1: Dictionary lookup (instant, common color terms)
        if user_term_lower in USER_TERM_TO_API_COLOR:
            mapped_colors = USER_TERM_TO_API_COLOR[user_term_lower]
            for mapped_color in mapped_colors:
                if mapped_color in valid_api_colors:
                    normalized_colors.add(mapped_color)
                    matched = True

            if matched:
                logging.info(f"🎯 Dictionary match: '{user_term}' → {list(mapped_colors)}")
|
||||
continue
|
||||
|
||||
# Tier 2: Vector DB semantic search (fuzzy matching, handles typos)
|
||||
if vectordb:
|
||||
try:
|
||||
matches = vectordb.search_color(
|
||||
user_term,
|
||||
n_results=1,
|
||||
source_filter=source
|
||||
)
|
||||
|
||||
if matches and matches[0]['similarity'] >= similarity_threshold:
|
||||
best_match = matches[0]['color']
|
||||
similarity = matches[0]['similarity']
|
||||
|
||||
if best_match in valid_api_colors:
|
||||
normalized_colors.add(best_match)
|
||||
logging.info(
|
||||
f"🔍 Vector match: '{user_term}' → '{best_match}' "
|
||||
f"(similarity: {similarity:.2f})"
|
||||
)
|
||||
matched = True
|
||||
continue
|
||||
except Exception as e:
|
||||
logging.warning(f"Vector search failed for color '{user_term}': {e}")
|
||||
|
||||
# Tier 3: Direct string matching (exact or substring)
|
||||
if not matched:
|
||||
# Try exact match (case-insensitive)
|
||||
for valid_color in valid_api_colors:
|
||||
if valid_color.lower() == user_term_lower:
|
||||
normalized_colors.add(valid_color)
|
||||
logging.info(f"✓ Exact match: '{user_term}' → '{valid_color}'")
|
||||
matched = True
|
||||
break
|
||||
|
||||
# Try substring match if exact didn't work
|
||||
if not matched:
|
||||
for valid_color in valid_api_colors:
|
||||
if user_term_lower in valid_color.lower():
|
||||
normalized_colors.add(valid_color)
|
||||
logging.info(f"≈ Substring match: '{user_term}' → '{valid_color}'")
|
||||
matched = True
|
||||
|
||||
# Log if no match found
|
||||
if not matched:
|
||||
logging.warning(
|
||||
f"⚠️ No color match found for '{user_term}'. "
|
||||
f"User will see broader results."
|
||||
)
|
||||
|
||||
result = list(normalized_colors)
|
||||
logging.info(f"Color normalization complete: {user_colors} → {result}")
|
||||
return result
|
||||
|
||||
|
||||
def get_color_suggestions(color_term: str, valid_colors: List[str], top_n: int = 5) -> List[str]:
|
||||
"""
|
||||
Get color suggestions for autocomplete or error messages.
|
||||
|
||||
Args:
|
||||
color_term: Partial or misspelled color name
|
||||
valid_colors: List of valid API color values
|
||||
top_n: Number of suggestions to return
|
||||
|
||||
Returns:
|
||||
List of suggested color names
|
||||
"""
|
||||
term_lower = color_term.lower().strip()
|
||||
suggestions = []
|
||||
|
||||
# Find colors containing the term
|
||||
for color in valid_colors:
|
||||
if term_lower in color.lower():
|
||||
suggestions.append(color)
|
||||
|
||||
return suggestions[:top_n]
|
||||
|
||||
|
||||
def get_color_help_text(valid_colors: List[str]) -> str:
|
||||
"""
|
||||
Generate help text for valid colors.
|
||||
|
||||
Args:
|
||||
valid_colors: List of valid API colors
|
||||
|
||||
Returns:
|
||||
Formatted string describing valid colors
|
||||
"""
|
||||
if not valid_colors:
|
||||
return "No color information available."
|
||||
|
||||
return f"Valid colors: {', '.join(valid_colors)}"
|
||||
|
||||
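The Tier-1 path of `normalize_user_colors` can be illustrated in isolation. This is a minimal sketch, not the module's code: `TERM_TO_COLOR` is a hypothetical stand-in for `USER_TERM_TO_API_COLOR`, and `tier1_lookup` mirrors only the dictionary-lookup step (normalize the term, map it, keep only colors the API accepts).

```python
# Hypothetical mini-mapping standing in for USER_TERM_TO_API_COLOR,
# just to illustrate the Tier-1 dictionary lookup.
TERM_TO_COLOR = {
    "tuxedo": ["Black & White / Tuxedo"],
    "ginger": ["Orange / Red"],
}


def tier1_lookup(user_term: str, valid_api_colors: list[str]) -> set[str]:
    """Return the subset of mapped colors that the API actually accepts."""
    mapped = TERM_TO_COLOR.get(user_term.lower().strip(), [])
    return {color for color in mapped if color in valid_api_colors}


valid = ["Black & White / Tuxedo", "Orange / Red", "Calico"]
print(tier1_lookup(" Ginger ", valid))  # {'Orange / Red'}
```

Terms missing from the dictionary (e.g. the typo "tuxado") fall through to Tier 2's vector search.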
@@ -1,134 +0,0 @@
"""Configuration management for Tuxedo Link."""

import yaml
import os
from pathlib import Path
from typing import Any, Dict, Optional


_config_cache: Optional[Dict[str, Any]] = None


def load_config() -> Dict[str, Any]:
    """
    Load configuration from YAML with environment variable overrides.

    Returns:
        Dict[str, Any]: Configuration dictionary
    """
    global _config_cache
    if _config_cache:
        return _config_cache

    # Determine config path - look for config.yaml, fall back to the example
    project_root = Path(__file__).parent.parent
    config_path = project_root / "config.yaml"

    if not config_path.exists():
        config_path = project_root / "config.example.yaml"

    if not config_path.exists():
        raise FileNotFoundError(
            "No config.yaml or config.example.yaml found. "
            "Please copy config.example.yaml to config.yaml and configure it."
        )

    # Load YAML
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # Override with environment variables if present
    if 'EMAIL_PROVIDER' in os.environ:
        config['email']['provider'] = os.environ['EMAIL_PROVIDER']
    if 'DEPLOYMENT_MODE' in os.environ:
        config['deployment']['mode'] = os.environ['DEPLOYMENT_MODE']
    if 'MAILGUN_DOMAIN' in os.environ:
        config['mailgun']['domain'] = os.environ['MAILGUN_DOMAIN']

    _config_cache = config
    return config


def get_config() -> Dict[str, Any]:
    """
    Get current configuration.

    Returns:
        Dict[str, Any]: Configuration dictionary
    """
    return load_config()


def is_production() -> bool:
    """
    Check if running in production mode.

    Returns:
        bool: True if production mode, False if local
    """
    return get_config()['deployment']['mode'] == 'production'


def get_db_path() -> str:
    """
    Get database path based on deployment mode.

    Returns:
        str: Path to database file
    """
    config = get_config()
    mode = config['deployment']['mode']
    return config['deployment'][mode]['db_path']


def get_vectordb_path() -> str:
    """
    Get vector database path based on deployment mode.

    Returns:
        str: Path to vector database directory
    """
    config = get_config()
    mode = config['deployment']['mode']
    return config['deployment'][mode]['vectordb_path']


def get_email_provider() -> str:
    """
    Get configured email provider.

    Returns:
        str: Email provider name (mailgun or sendgrid)
    """
    return get_config()['email']['provider']


def get_email_config() -> Dict[str, str]:
    """
    Get email configuration.

    Returns:
        Dict[str, str]: Email configuration (from_name, from_email)
    """
    return get_config()['email']


def get_mailgun_config() -> Dict[str, str]:
    """
    Get Mailgun configuration.

    Returns:
        Dict[str, str]: Mailgun configuration (domain)
    """
    return get_config()['mailgun']


def reload_config() -> None:
    """
    Force reload configuration from file.
    Useful for testing or when config changes.
    """
    global _config_cache
    _config_cache = None
    load_config()
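The environment-override pattern in `load_config` can be shown without YAML at all. A minimal sketch, with a plain dict standing in for the parsed config file (the keys here mirror the module's, but the function is hypothetical):

```python
import os


# Sketch of the env-var override step from load_config(): environment
# variables, when set, take precedence over values parsed from the file.
def apply_env_overrides(config: dict) -> dict:
    if 'DEPLOYMENT_MODE' in os.environ:
        config['deployment']['mode'] = os.environ['DEPLOYMENT_MODE']
    return config


cfg = {'deployment': {'mode': 'local'}}
os.environ['DEPLOYMENT_MODE'] = 'production'
print(apply_env_overrides(cfg)['deployment']['mode'])  # production
```

Because the loaded config is cached in a module-level variable, overrides are only picked up on the first load or after `reload_config()`.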
@@ -1,201 +0,0 @@
"""Deduplication utilities for identifying duplicate cat listings."""

import hashlib
import re
from typing import Tuple

import Levenshtein

from models.cats import Cat


def create_fingerprint(cat: Cat) -> str:
    """
    Create a fingerprint for a cat based on stable attributes.

    The fingerprint is a hash of:
    - Organization name (normalized)
    - Breed (normalized)
    - Age
    - Gender

    Args:
        cat: Cat object

    Returns:
        Fingerprint hash (16 characters)
    """
    components = [
        cat.organization_name.lower().strip(),
        cat.breed.lower().strip(),
        str(cat.age).lower(),
        cat.gender.lower()
    ]

    # Create hash from combined components
    combined = '|'.join(components)
    hash_obj = hashlib.sha256(combined.encode())

    # Return first 16 characters of hex digest
    return hash_obj.hexdigest()[:16]


def calculate_levenshtein_similarity(str1: str, str2: str) -> float:
    """
    Calculate normalized Levenshtein similarity between two strings.

    Similarity = 1 - (distance / max_length)

    Args:
        str1: First string
        str2: Second string

    Returns:
        Similarity score (0-1, where 1 is identical)
    """
    if not str1 or not str2:
        return 0.0

    # Normalize strings
    str1 = str1.lower().strip()
    str2 = str2.lower().strip()

    # Handle identical strings
    if str1 == str2:
        return 1.0

    # Calculate Levenshtein distance
    distance = Levenshtein.distance(str1, str2)

    # Normalize by maximum possible distance
    max_length = max(len(str1), len(str2))

    if max_length == 0:
        return 1.0

    similarity = 1.0 - (distance / max_length)

    return max(0.0, similarity)


def calculate_text_similarity(cat1: Cat, cat2: Cat) -> Tuple[float, float]:
    """
    Calculate text similarity between two cats (name and description).

    Args:
        cat1: First cat
        cat2: Second cat

    Returns:
        Tuple of (name_similarity, description_similarity)
    """
    # Name similarity
    name_similarity = calculate_levenshtein_similarity(cat1.name, cat2.name)

    # Description similarity
    desc_similarity = calculate_levenshtein_similarity(
        cat1.description,
        cat2.description
    )

    return name_similarity, desc_similarity


def calculate_composite_score(
    name_similarity: float,
    description_similarity: float,
    image_similarity: float,
    name_weight: float = 0.4,
    description_weight: float = 0.3,
    image_weight: float = 0.3
) -> float:
    """
    Calculate a composite similarity score from multiple signals.

    Args:
        name_similarity: Name similarity (0-1)
        description_similarity: Description similarity (0-1)
        image_similarity: Image similarity (0-1)
        name_weight: Weight for name similarity
        description_weight: Weight for description similarity
        image_weight: Weight for image similarity

    Returns:
        Composite score (0-1)
    """
    # Ensure weights sum to 1
    total_weight = name_weight + description_weight + image_weight
    if total_weight == 0:
        return 0.0

    # Normalize weights
    name_weight /= total_weight
    description_weight /= total_weight
    image_weight /= total_weight

    # Calculate weighted score
    score = (
        name_similarity * name_weight +
        description_similarity * description_weight +
        image_similarity * image_weight
    )

    return score


def normalize_string(s: str) -> str:
    """
    Normalize a string for comparison.

    - Convert to lowercase
    - Strip whitespace
    - Remove extra spaces

    Args:
        s: String to normalize

    Returns:
        Normalized string
    """
    s = s.lower().strip()
    s = re.sub(r'\s+', ' ', s)  # Replace multiple spaces with a single space
    return s


def calculate_breed_similarity(breed1: str, breed2: str) -> float:
    """
    Calculate breed similarity with special handling for mixed breeds.

    Args:
        breed1: First breed
        breed2: Second breed

    Returns:
        Similarity score (0-1)
    """
    breed1_norm = normalize_string(breed1)
    breed2_norm = normalize_string(breed2)

    # Exact match
    if breed1_norm == breed2_norm:
        return 1.0

    # Check if both are domestic shorthair/longhair (very common)
    domestic_variants = ['domestic short hair', 'domestic shorthair', 'dsh',
                         'domestic long hair', 'domestic longhair', 'dlh',
                         'domestic medium hair', 'domestic mediumhair', 'dmh']

    if breed1_norm in domestic_variants and breed2_norm in domestic_variants:
        return 0.9  # High similarity for domestic cats

    # Check for mix/mixed keywords
    mix_keywords = ['mix', 'mixed', 'tabby']
    breed1_has_mix = any(keyword in breed1_norm for keyword in mix_keywords)
    breed2_has_mix = any(keyword in breed2_norm for keyword in mix_keywords)

    if breed1_has_mix and breed2_has_mix:
        # Both are mixes, higher tolerance
        return calculate_levenshtein_similarity(breed1, breed2) * 0.9

    # Standard Levenshtein similarity
    return calculate_levenshtein_similarity(breed1, breed2)
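The weighted blend in `calculate_composite_score` is just a normalized weighted average. A self-contained sketch of the same arithmetic (standalone function name is ours):

```python
# Weights are normalized to sum to 1, then the three similarity
# signals are blended as a weighted average.
def composite(name_sim, desc_sim, img_sim, w_name=0.4, w_desc=0.3, w_img=0.3):
    total = w_name + w_desc + w_img
    if total == 0:
        return 0.0
    return (name_sim * w_name + desc_sim * w_desc + img_sim * w_img) / total


# 0.4*1.0 + 0.3*0.5 + 0.3*0.0 ≈ 0.55
score = composite(1.0, 0.5, 0.0)
print(round(score, 2))
```

Normalizing by `total` means callers can pass unnormalized weights (e.g. 4, 3, 3) and get the same blend.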
@@ -1,161 +0,0 @@
"""Geocoding utilities for location services."""

import requests
from typing import Optional, Tuple


def geocode_location(location: str) -> Optional[Tuple[float, float]]:
    """
    Convert a location string (address, city, or ZIP) to latitude/longitude.

    Uses the free Nominatim API (OpenStreetMap).

    Args:
        location: Location string (address, city, ZIP code, etc.)

    Returns:
        Tuple of (latitude, longitude) or None if geocoding fails
    """
    try:
        # Use Nominatim API (free, no API key required)
        url = "https://nominatim.openstreetmap.org/search"
        params = {
            'q': location,
            'format': 'json',
            'limit': 1,
            'countrycodes': 'us,ca'  # Limit to US and Canada
        }
        headers = {
            'User-Agent': 'TuxedoLink/1.0'  # Required by Nominatim
        }

        response = requests.get(url, params=params, headers=headers, timeout=10)
        response.raise_for_status()

        results = response.json()
        if results and len(results) > 0:
            lat = float(results[0]['lat'])
            lon = float(results[0]['lon'])
            return lat, lon

        return None

    except Exception as e:
        print(f"Geocoding failed for '{location}': {e}")
        return None


def reverse_geocode(latitude: float, longitude: float) -> Optional[dict]:
    """
    Convert latitude/longitude to address information.

    Args:
        latitude: Latitude
        longitude: Longitude

    Returns:
        Dictionary with address components or None if failed
    """
    try:
        url = "https://nominatim.openstreetmap.org/reverse"
        params = {
            'lat': latitude,
            'lon': longitude,
            'format': 'json'
        }
        headers = {
            'User-Agent': 'TuxedoLink/1.0'
        }

        response = requests.get(url, params=params, headers=headers, timeout=10)
        response.raise_for_status()

        result = response.json()
        if 'address' in result:
            address = result['address']
            return {
                'city': address.get('city', address.get('town', address.get('village', ''))),
                'state': address.get('state', ''),
                'zip': address.get('postcode', ''),
                'country': address.get('country', ''),
                'display_name': result.get('display_name', '')
            }

        return None

    except Exception as e:
        print(f"Reverse geocoding failed for ({latitude}, {longitude}): {e}")
        return None


def calculate_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """
    Calculate the great circle distance between two points in miles.

    Uses the Haversine formula.

    Args:
        lat1: Latitude of first point
        lon1: Longitude of first point
        lat2: Latitude of second point
        lon2: Longitude of second point

    Returns:
        Distance in miles
    """
    from math import radians, sin, cos, sqrt, atan2

    # Earth's radius in miles
    R = 3959.0

    # Convert to radians
    lat1_rad = radians(lat1)
    lon1_rad = radians(lon1)
    lat2_rad = radians(lat2)
    lon2_rad = radians(lon2)

    # Differences
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad

    # Haversine formula
    a = sin(dlat/2)**2 + cos(lat1_rad) * cos(lat2_rad) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))

    distance = R * c

    return distance


def parse_location_input(location_input: str) -> Optional[Tuple[float, float]]:
    """
    Parse location input that might be coordinates or an address.

    Handles formats:
    - "lat,long" (e.g., "40.7128,-74.0060")
    - ZIP code (e.g., "10001")
    - City, State (e.g., "New York, NY")
    - Full address

    Args:
        location_input: Location string

    Returns:
        Tuple of (latitude, longitude) or None if parsing fails
    """
    # Try to parse as coordinates first
    if ',' in location_input:
        parts = location_input.split(',')
        if len(parts) == 2:
            try:
                lat = float(parts[0].strip())
                lon = float(parts[1].strip())
                # Basic validation
                if -90 <= lat <= 90 and -180 <= lon <= 180:
                    return lat, lon
            except ValueError:
                pass  # Not coordinates, try geocoding

    # Fall back to geocoding
    return geocode_location(location_input)
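The Haversine computation in `calculate_distance` needs no network access, so it is easy to check standalone. A self-contained copy of the same formula (function name is ours):

```python
from math import radians, sin, cos, sqrt, atan2


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the Haversine formula."""
    R = 3959.0  # Earth's radius in miles
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))


# New York City to Los Angeles is roughly 2,450 miles great-circle.
d = haversine_miles(40.7128, -74.0060, 34.0522, -118.2437)
print(round(d))
```

Note that `atan2(sqrt(a), sqrt(1 - a))` is preferred over `asin(sqrt(a))` for numerical stability near antipodal points.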
@@ -1,168 +0,0 @@
"""Image utilities for generating and comparing image embeddings."""

import numpy as np
import requests
from PIL import Image
from io import BytesIO
from typing import Optional
import open_clip
import torch


class ImageEmbeddingGenerator:
    """Generate image embeddings using a CLIP model."""

    def __init__(self, model_name: str = 'ViT-B-32', pretrained: str = 'openai'):
        """
        Initialize the embedding generator.

        Args:
            model_name: CLIP model architecture
            pretrained: Pretrained weights to use
        """
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model, _, self.preprocess = open_clip.create_model_and_transforms(
            model_name,
            pretrained=pretrained,
            device=self.device
        )
        self.model.eval()

    def download_image(self, url: str, timeout: int = 10) -> Optional[Image.Image]:
        """
        Download an image from a URL.

        Args:
            url: Image URL
            timeout: Request timeout in seconds

        Returns:
            PIL Image or None if download fails
        """
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            img = Image.open(BytesIO(response.content))
            return img.convert('RGB')  # Ensure RGB format
        except Exception as e:
            print(f"Failed to download image from {url}: {e}")
            return None

    def generate_embedding(self, image: Image.Image) -> np.ndarray:
        """
        Generate a CLIP embedding for an image.

        Args:
            image: PIL Image

        Returns:
            Numpy array of image embedding
        """
        with torch.no_grad():
            image_input = self.preprocess(image).unsqueeze(0).to(self.device)
            image_features = self.model.encode_image(image_input)

            # Normalize embedding
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)

            # Convert to numpy
            embedding = image_features.cpu().numpy().flatten()

        return embedding.astype(np.float32)

    def generate_embedding_from_url(self, url: str) -> Optional[np.ndarray]:
        """
        Download an image and generate its embedding.

        Args:
            url: Image URL

        Returns:
            Numpy array of image embedding or None if failed
        """
        image = self.download_image(url)
        if image is None:
            return None
        return self.generate_embedding(image)


# Global instance (lazy loaded)
_embedding_generator: Optional[ImageEmbeddingGenerator] = None


def get_embedding_generator() -> ImageEmbeddingGenerator:
    """Get or create the global embedding generator instance."""
    global _embedding_generator
    if _embedding_generator is None:
        _embedding_generator = ImageEmbeddingGenerator()
    return _embedding_generator


def generate_image_embedding(image_url: str) -> Optional[np.ndarray]:
    """
    Generate an image embedding from a URL.

    This is a convenience function that uses the global embedding generator.

    Args:
        image_url: URL of the image

    Returns:
        Numpy array of image embedding or None if failed
    """
    generator = get_embedding_generator()
    return generator.generate_embedding_from_url(image_url)


def calculate_image_similarity(embedding1: np.ndarray, embedding2: np.ndarray) -> float:
    """
    Calculate cosine similarity between two image embeddings.

    Args:
        embedding1: First image embedding
        embedding2: Second image embedding

    Returns:
        Similarity score (0-1, where 1 is most similar)
    """
    if embedding1 is None or embedding2 is None:
        return 0.0

    # Ensure embeddings are normalized
    norm1 = np.linalg.norm(embedding1)
    norm2 = np.linalg.norm(embedding2)

    if norm1 == 0 or norm2 == 0:
        return 0.0

    embedding1_norm = embedding1 / norm1
    embedding2_norm = embedding2 / norm2

    # Cosine similarity
    similarity = np.dot(embedding1_norm, embedding2_norm)

    # Rescale to [0, 1] (cosine similarity is [-1, 1])
    similarity = (similarity + 1) / 2

    return float(similarity)


def batch_generate_embeddings(image_urls: list[str]) -> list[Optional[np.ndarray]]:
    """
    Generate embeddings for multiple images.

    Args:
        image_urls: List of image URLs

    Returns:
        List of embeddings (same length as input, None for failed downloads)
    """
    generator = get_embedding_generator()
    embeddings = []

    for url in image_urls:
        embedding = generator.generate_embedding_from_url(url)
        embeddings.append(embedding)

    return embeddings
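The core of `calculate_image_similarity` is cosine similarity rescaled from [-1, 1] into [0, 1]. A dependency-free sketch of the same math on plain lists (function name is ours):

```python
from math import sqrt


def cosine_01(v1, v2):
    """Cosine similarity of two vectors, rescaled to [0, 1]."""
    n1 = sqrt(sum(x * x for x in v1))
    n2 = sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    return (cos + 1) / 2


print(cosine_01([1, 0], [1, 0]))   # identical vectors -> 1.0
print(cosine_01([1, 0], [-1, 0]))  # opposite vectors  -> 0.0
```

The rescaling means orthogonal embeddings score 0.5 rather than 0, which keeps the composite-score arithmetic in the dedup module on a single [0, 1] scale.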
@@ -1,46 +0,0 @@
"""Logging utilities for Tuxedo Link."""

# Foreground colors
RED = '\033[31m'
GREEN = '\033[32m'
YELLOW = '\033[33m'
BLUE = '\033[34m'
MAGENTA = '\033[35m'
CYAN = '\033[36m'
WHITE = '\033[37m'

# Background colors
BG_BLACK = '\033[40m'
BG_BLUE = '\033[44m'

# Reset code to return to default color
RESET = '\033[0m'

# Mapping of terminal color codes to HTML colors
mapper = {
    BG_BLACK+RED: "#dd0000",
    BG_BLACK+GREEN: "#00dd00",
    BG_BLACK+YELLOW: "#dddd00",
    BG_BLACK+BLUE: "#0000ee",
    BG_BLACK+MAGENTA: "#aa00dd",
    BG_BLACK+CYAN: "#00dddd",
    BG_BLACK+WHITE: "#87CEEB",
    BG_BLUE+WHITE: "#ff7800"
}


def reformat(message: str) -> str:
    """
    Convert terminal color codes to HTML spans for Gradio display.

    Args:
        message: Log message with terminal color codes

    Returns:
        HTML formatted message
    """
    for key, value in mapper.items():
        message = message.replace(key, f'<span style="color: {value}">')
    message = message.replace(RESET, '</span>')
    return message
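The `reformat` helper is plain string substitution: each ANSI escape pair becomes an opening `<span>`, and `RESET` becomes the closing tag. A trimmed-down sketch with a single mapping entry (`to_html` is our name for it):

```python
# One entry from the module's mapper is enough to show the substitution.
RED = '\033[31m'
BG_BLACK = '\033[40m'
RESET = '\033[0m'
MAPPER = {BG_BLACK + RED: "#dd0000"}


def to_html(message: str) -> str:
    """Replace terminal color codes with HTML spans."""
    for code, color in MAPPER.items():
        message = message.replace(code, f'<span style="color: {color}">')
    return message.replace(RESET, '</span>')


print(to_html(BG_BLACK + RED + "error!" + RESET))
# <span style="color: #dd0000">error!</span>
```

This only stays well-formed if every colored run in the log ends with `RESET`; an unterminated color code would leave an unclosed `<span>` in the HTML.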
@@ -1,37 +0,0 @@
"""Timing utilities for performance monitoring."""

import time
import functools
from typing import Any, Callable


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """
    Decorator to time function execution and log it.

    Args:
        func: Function to be timed

    Returns:
        Wrapped function that logs execution time

    Usage:
        @timed
        def my_function():
            ...
    """
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        """Wrapper function that times the execution."""
        start_time = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start_time

        # Try to log if the object has a log method (Agent classes)
        if args and hasattr(args[0], 'log'):
            args[0].log(f"{func.__name__} completed in {elapsed:.2f} seconds")

        return result

    return wrapper
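The decorator's duck-typed logging (it reports through `args[0].log` only when the first argument has one, i.e. when decorating a method on an Agent-style class) can be exercised standalone. The `Agent` class below is a hypothetical stand-in for the project's agent classes:

```python
import functools
import time


def timed(func):
    """Time a call; log via the bound object's .log method if it has one."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        if args and hasattr(args[0], 'log'):
            args[0].log(f"{func.__name__} completed in {elapsed:.2f} seconds")
        return result
    return wrapper


class Agent:
    """Hypothetical agent-style class exposing a .log method."""
    def __init__(self):
        self.messages = []

    def log(self, msg):
        self.messages.append(msg)

    @timed
    def work(self):
        return 42


agent = Agent()
print(agent.work())  # 42
```

Decorating a free function (no `self` with a `.log` attribute) is silent by design: the timing runs, but nothing is logged.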