LLM_Engineering_OLD/week8/community_contributions/tochi/autonomous_deal_agent.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d5e9f9d1",
   "metadata": {},
   "source": [
    "# Autonomous Deal Agent\n",
    "\n",
    "An intelligent system that automatically discovers, evaluates, and surfaces the best deals from across the web using AI-powered price estimation and web scraping.\n",
    "\n",
    "## Overview\n",
    "\n",
    "This project builds an autonomous agent that monitors RSS feeds and online sources for deals, evaluates them using ensemble ML models, and delivers personalized deal notifications to users. The system combines multiple AI technologies including fine-tuned pricing models, GPT-4 for parsing, RAG for context, and intelligent web scraping.\n",
    "\n",
    "## Key Features\n",
    "\n",
    "### 🤖 AI-Powered Price Estimation\n",
    "- **Fine-tuned Specialist Pricer Model**: Custom-trained model deployed on Modal for accurate price predictions\n",
    "- **Ensemble Architecture**: Multiple specialist models work together to determine fair market value\n",
    "- **GPT-4 Frontend with RAG**: Advanced context-aware pricing using Retrieval-Augmented Generation\n",
    "- **Price Comparison**: Automatically compares deal prices against estimated market value to identify true bargains\n",
    "\n",
    "### 🕷️ Intelligent Web Scraping\n",
    "- **Multi-Source Aggregation**: Scrapes deals from RSS feeds and Reddit\n",
    "- **AI-Powered Parser**: Uses frontier AI models (GPT-4) to intelligently parse and extract deal information from various website structures\n",
    "- **Adaptive Scraping**: Handles different site formats and layouts automatically\n",
    "- **Data Enrichment**: Extracts product details, pricing, features, and purchase links\n",
    "\n",
    "### 💎 Deal Discovery & Analysis\n",
    "- **Automated Deal Scanning**: Continuously monitors configured sources for new deals\n",
    "- **Opportunity Detection**: Identifies deals where market price significantly exceeds offer price\n",
    "- **Deal Scoring**: Ranks deals by discount percentage and estimated value\n",
    "- **Memory System**: Tracks previously surfaced deals to avoid duplicates\n",
    "\n",
    "### 🖥️ User Interface\n",
    "- **Gradio Web UI**: Clean, intuitive interface for browsing discovered deals\n",
    "- **Deal Details**: Displays product description, pricing, estimated value, and discount percentage\n",
    "- **Direct Purchase Links**: One-click access to deal pages\n",
    "\n",
    "### 📲 Push Notifications\n",
    "- **Real-Time Alerts**: Instant notifications when high-value deals are discovered\n",
    "- **Push Notification Integration**: Uses Pushover/similar service for mobile alerts\n",
    "- **Customizable Thresholds**: Configure minimum discount percentage for notifications\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac199135",
   "metadata": {},
   "source": [
    "### Project Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93f3b7c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Standard library imports\n",
    "import os\n",
    "import re\n",
    "import math\n",
    "import json\n",
    "import random\n",
    "import pickle\n",
    "\n",
    "# Third-party imports\n",
    "from dotenv import load_dotenv\n",
    "from huggingface_hub import login\n",
    "from tqdm import tqdm\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from openai import OpenAI\n",
    "from sentence_transformers import SentenceTransformer\n",
    "from datasets import load_dataset\n",
    "import chromadb\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import mean_squared_error, r2_score\n",
    "from sklearn.manifold import TSNE\n",
    "import plotly.graph_objects as go\n",
    "import modal\n",
    "import gradio as gr\n",
    "\n",
    "# Local imports\n",
    "from pricer_ephemeral import app, price\n",
    "from items import Item\n",
    "from testing import Tester\n",
    "from agents.deals import ScrapedDeal, DealSelection, Opportunity, Deal\n",
    "from agents.messaging_agent import MessagingAgent\n",
    "from deal_agent_framework import DealAgentFramework\n",
    "from agents.planning_agent import PlanningAgent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "069dd1ca",
   "metadata": {},
   "source": [
    "### Loading environment variables for Open AI and Hugging Face"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "446b3028",
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv(override=True)\n",
    "os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')\n",
    "os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')\n",
    "DB = \"products_vectorstore\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17db18f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_token = os.environ['HF_TOKEN']\n",
    "login(hf_token, add_to_git_credential=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a29d6a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from items import Item"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d1f75b1",
   "metadata": {},
   "source": [
    "### Setting up modal to deploy the pricer Service"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a47784fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "with modal.enable_output():\n",
    "    with app.run():\n",
    "        result=price.remote(\"Quadcast HyperX condenser mic, connects via usb-c to your computer for crystal clear audio\")\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ac2263f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!modal deploy -m pricer_service"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7daae5f9",
   "metadata": {},
   "source": [
    "### Setting up RAG to provide relevant Price Context to GPT - to Improve Accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90cd03a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('train.pkl', 'rb') as file:\n",
    "    train = pickle.load(file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58a3e5a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = chromadb.PersistentClient(path=DB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b5910ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection_name = \"products\"\n",
    "\n",
    "existing_collection_names = client.list_collections()\n",
    "\n",
    "if collection_name in existing_collection_names:\n",
    "    client.delete_collection(collection_name)\n",
    "    print(f\"Deleted existing collection: {collection_name}\")\n",
    "client.delete_collection(collection_name)\n",
    "collection = client.create_collection(collection_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8b34e0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9fb07b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pass in a list of texts, get back a numpy array of vectors\n",
    "\n",
    "vector = model.encode([\"Well hi there\"])[0]\n",
    "vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fdab9af",
   "metadata": {},
   "outputs": [],
   "source": [
    "def description(item):\n",
    "    text = item.prompt.replace(\"How much does this cost to the nearest dollar?\\n\\n\", \"\")\n",
    "    return text.split(\"\\n\\nPrice is $\")[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9e00e0f",
   "metadata": {},
   "outputs": [],
   "source": [
    "description(train[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2485a02d",
   "metadata": {},
   "outputs": [],
   "source": [
    "NUMBER_OF_DOCUMENTS = len(train)\n",
    "\n",
    "for i in tqdm(range(0, NUMBER_OF_DOCUMENTS, 1000)):\n",
    "    documents = [description(item) for item in train[i: i+1000]]\n",
    "    vectors = model.encode(documents).astype(float).tolist()\n",
    "    metadatas = [{\"category\": item.category, \"price\": item.price} for item in train[i: i+1000]]\n",
    "    ids = [f\"doc_{j}\" for j in range(i, i+len(documents))]\n",
    "    collection.add(\n",
    "        ids=ids,\n",
    "        documents=documents,\n",
    "        embeddings=vectors,\n",
    "        metadatas=metadatas\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b16d8ebc",
   "metadata": {},
   "outputs": [],
   "source": [
    "MAXIMUM_DATAPOINTS = 30_000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "119197e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "DB = \"products_vectorstore\"\n",
    "client = chromadb.PersistentClient(path=DB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9de6312",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection = client.get_or_create_collection('products')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f467e164",
   "metadata": {},
   "outputs": [],
   "source": [
    "CATEGORIES = ['Appliances', 'Automotive', 'Cell_Phones_and_Accessories', 'Electronics','Musical_Instruments', 'Office_Products', 'Tools_and_Home_Improvement', 'Toys_and_Games']\n",
    "COLORS = ['red', 'blue', 'brown', 'orange', 'yellow', 'green' , 'purple', 'cyan']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7196f11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prework\n",
    "result = collection.get(include=['embeddings', 'documents', 'metadatas'], limit=MAXIMUM_DATAPOINTS)\n",
    "vectors = np.array(result['embeddings'])\n",
    "documents = result['documents']\n",
    "categories = [metadata['category'] for metadata in result['metadatas']]\n",
    "colors = [COLORS[CATEGORIES.index(c)] for c in categories]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "559f8a0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "MAXIMUM_DATAPOINTS = 20_000\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5074217",
   "metadata": {},
   "outputs": [],
   "source": [
    "DB = \"products_vectorstore\"\n",
    "client = chromadb.PersistentClient(path=DB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45142753",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection = client.get_or_create_collection('products')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f74b7e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "CATEGORIES = ['Appliances', 'Automotive', 'Cell_Phones_and_Accessories', 'Electronics','Musical_Instruments', 'Office_Products', 'Tools_and_Home_Improvement', 'Toys_and_Games']\n",
    "COLORS = ['red', 'blue', 'brown', 'orange', 'yellow', 'green' , 'purple', 'cyan']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24ec13d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prework\n",
    "result = collection.get(include=['embeddings', 'documents', 'metadatas'], limit=MAXIMUM_DATAPOINTS)\n",
    "vectors = np.array(result['embeddings'])\n",
    "documents = result['documents']\n",
    "categories = [metadata['category'] for metadata in result['metadatas']]\n",
    "colors = [COLORS[CATEGORIES.index(c)] for c in categories]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b7776fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's try 3D!\n",
    "\n",
    "tsne = TSNE(n_components=3, random_state=42, n_jobs=-1)\n",
    "reduced_vectors = tsne.fit_transform(vectors)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d60238b",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Create the 3D scatter plot\n",
    "fig = go.Figure(data=[go.Scatter3d(\n",
    "    x=reduced_vectors[:, 0],\n",
    "    y=reduced_vectors[:, 1],\n",
    "    z=reduced_vectors[:, 2],\n",
    "    mode='markers',\n",
    "    marker=dict(size=3, color=colors, opacity=0.7),\n",
    ")])\n",
    "\n",
    "fig.update_layout(\n",
    "    title='3D Chroma Vector Store Visualization',\n",
    "    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),\n",
    "    width=1200,\n",
    "    height=800,\n",
    "    margin=dict(r=20, b=10, l=10, t=40)\n",
    ")\n",
    "\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8919e2ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "610b78ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('test.pkl', 'rb') as file:\n",
    "    test = pickle.load(file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2828141b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_context(similars, prices):\n",
    "    message = \"To provide some context, here are some other items that might be similar to the item you need to estimate.\\n\\n\"\n",
    "    for similar, price in zip(similars, prices):\n",
    "        message += f\"Potentially related product:\\n{similar}\\nPrice is ${price:.2f}\\n\\n\"\n",
    "    return message"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "495e100c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(item, similars, prices):\n",
    "    system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n",
    "    user_prompt = make_context(similars, prices)\n",
    "    user_prompt += \"And now the question for you:\\n\\n\"\n",
    "    user_prompt += item.test_prompt().replace(\" to the nearest dollar\",\"\").replace(\"\\n\\nPrice is $\",\"\")\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_message},\n",
    "        {\"role\": \"user\", \"content\": user_prompt},\n",
    "        {\"role\": \"assistant\", \"content\": \"Price is $\"}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24425574",
   "metadata": {},
   "outputs": [],
   "source": [
    "DB = \"products_vectorstore\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47de76ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = chromadb.PersistentClient(path=DB)\n",
    "collection = client.get_or_create_collection('products')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "820fc5b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def description(item):\n",
    "    text = item.prompt.replace(\"How much does this cost to the nearest dollar?\\n\\n\", \"\")\n",
    "    return text.split(\"\\n\\nPrice is $\")[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4b75b45",
   "metadata": {},
   "outputs": [],
   "source": [
    "description(test[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32ec6e0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5ea1b74",
   "metadata": {},
   "outputs": [],
   "source": [
    "def vector(item):\n",
    "    return model.encode([description(item)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1bf5e72c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def find_similars(item):\n",
    "    results = collection.query(query_embeddings=vector(item).astype(float).tolist(), n_results=5)\n",
    "    documents = results['documents'][0][:]\n",
    "    prices = [m['price'] for m in results['metadatas'][0][:]]\n",
    "    return documents, prices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68618b73",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(make_context(documents, prices))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91539724",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(messages_for(test[1], documents, prices))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82b56f8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_price(s):\n",
    "    s = s.replace('$','').replace(',','')\n",
    "    match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n",
    "    return float(match.group()) if match else 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78ae2c5d",
   "metadata": {},
   "source": [
    "### GPT 4o Mini - + RAG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82f0043a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The function for gpt-4o-mini\n",
    "\n",
    "def gpt_4o_mini_rag(item):\n",
    "    documents, prices = find_similars(item)\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4o-mini\", \n",
    "        messages=messages_for(item, documents, prices),\n",
    "        seed=42,\n",
    "        max_tokens=5\n",
    "    )\n",
    "    reply = response.choices[0].message.content\n",
    "    return get_price(reply)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "650356ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "gpt_4o_mini_rag(test[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d65cf118",
   "metadata": {},
   "outputs": [],
   "source": [
    "test[1].price"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d31909a",
   "metadata": {},
   "outputs": [],
   "source": [
    "Tester.test(gpt_4o_mini_rag, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c932850",
   "metadata": {},
   "outputs": [],
   "source": [
    "from agents.frontier_agent import FrontierAgent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09217080",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = FrontierAgent(collection)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f52e5c1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = chromadb.PersistentClient(path=DB)\n",
    "collection = client.get_or_create_collection('products')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cff455b",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = collection.get(include=['embeddings', 'documents', 'metadatas'])\n",
    "vectors = np.array(result['embeddings'])\n",
    "documents = result['documents']\n",
    "prices = [metadata['price'] for metadata in result['metadatas']]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18159c68",
   "metadata": {},
   "source": [
    "### Finetuning Random Forest Model for product Price Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae7ad094",
   "metadata": {},
   "outputs": [],
   "source": [
    "rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)\n",
    "rf_model.fit(vectors, prices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b452b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "joblib.dump(rf_model, 'random_forest_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e433a2cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "rf_model = joblib.load('random_forest_model.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed4ad6e5",
   "metadata": {},
   "source": [
    "### Agents for prediction of the price - RF, Specialist Agent Frontier Agent + RAG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f89f1bff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from agents.specialist_agent import SpecialistAgent\n",
    "from agents.frontier_agent import FrontierAgent\n",
    "from agents.random_forest_agent import RandomForestAgent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3e9b35f",
   "metadata": {},
   "outputs": [],
   "source": [
    "specialist = SpecialistAgent()\n",
    "frontier = FrontierAgent(collection)\n",
    "random_forest = RandomForestAgent()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c31663e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def description(item):\n",
    "    return item.prompt.split(\"to the nearest dollar?\\n\\n\")[1].split(\"\\n\\nPrice is $\")[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52eadc3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "def rf(item):\n",
    "    return random_forest.price(description(item))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74e75cd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "Tester.test(rf, test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "738810cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "product = \"Quadcast HyperX condenser mic for high quality audio for podcasting\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf8d8c5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(specialist.price(product))\n",
    "print(frontier.price(product))\n",
    "print(random_forest.price(product))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5790f240",
   "metadata": {},
   "outputs": [],
   "source": [
    "specialists = []\n",
    "frontiers = []\n",
    "random_forests = []\n",
    "prices = []\n",
    "for item in tqdm(test[1000:1250]):\n",
    "    text = description(item)\n",
    "    specialists.append(specialist.price(text))\n",
    "    frontiers.append(frontier.price(text))\n",
    "    random_forests.append(random_forest.price(text))\n",
    "    prices.append(item.price)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e9676de",
   "metadata": {},
   "outputs": [],
   "source": [
    "mins = [min(s,f,r) for s,f,r in zip(specialists, frontiers, random_forests)]\n",
    "maxes = [max(s,f,r) for s,f,r in zip(specialists, frontiers, random_forests)]\n",
    "\n",
    "X = pd.DataFrame({\n",
    "    'Specialist': specialists,\n",
    "    'Frontier': frontiers,\n",
    "    'RandomForest': random_forests,\n",
    "    'Min': mins,\n",
    "    'Max': maxes,\n",
    "})\n",
    "\n",
    "# Convert y to a Series\n",
    "y = pd.Series(prices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fac5cde1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train a Linear Regression\n",
    "np.random.seed(42)\n",
    "\n",
    "lr = LinearRegression()\n",
    "lr.fit(X, y)\n",
    "\n",
    "feature_columns = X.columns.tolist()\n",
    "\n",
    "for feature, coef in zip(feature_columns, lr.coef_):\n",
    "    print(f\"{feature}: {coef:.2f}\")\n",
    "print(f\"Intercept={lr.intercept_:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5a36b10",
   "metadata": {},
   "outputs": [],
   "source": [
    "joblib.dump(lr, 'ensemble_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3dd2f5d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from agents.ensemble_agent import EnsembleAgent\n",
    "ensemble = EnsembleAgent(collection)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e12fe631",
   "metadata": {},
   "outputs": [],
   "source": [
    "ensemble.price(product)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf48356b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def ensemble_pricer(item):\n",
    "    return max(0,ensemble.price(description(item)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c26bfc9",
   "metadata": {},
   "outputs": [],
   "source": [
    "Tester.test(ensemble_pricer, test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b51c5ab",
   "metadata": {},
   "source": [
    "### The Scraper Agent - Fetches deals from websites and Formats them specially"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "359d6b41",
   "metadata": {},
   "outputs": [],
   "source": [
    "deals = ScrapedDeal.fetch(show_progress=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fce72d69",
   "metadata": {},
   "outputs": [],
   "source": [
    "len(deals)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f1a724d",
   "metadata": {},
   "outputs": [],
   "source": [
    "deals[0].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4fda93c",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = \"\"\"You identify and summarize the 5 most detailed deals from a list, by selecting deals that have the most detailed, high quality description and the most clear price.\n",
    "Respond strictly in JSON with no explanation, using this format. You should provide the price as a number derived from the description. If the price of a deal isn't clear, do not include that deal in your response.\n",
    "Most important is that you respond with the 5 deals that have the most detailed product description with price. It's not important to mention the terms of the deal; most important is a thorough description of the product.\n",
    "Be careful with products that are described as \"$XXX off\" or \"reduced by $XXX\" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price. \n",
    "\n",
    "{\"deals\": [\n",
    "    {\n",
    "        \"product_description\": \"Your clearly expressed summary of the product in 4-5 sentences. Details of the item are much more important than why it's a good deal. Avoid mentioning discounts and coupons; focus on the item itself. There should be a paragpraph of text for each item you choose.\",\n",
    "        \"price\": 99.99,\n",
    "        \"url\": \"the url as provided\"\n",
    "    },\n",
    "    ...\n",
    "]}\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e4223a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_prompt = \"\"\"Respond with the most promising 5 deals from this list, selecting those which have the most detailed, high quality product description and a clear price.\n",
    "Respond strictly in JSON, and only JSON. You should rephrase the description to be a summary of the product itself, not the terms of the deal.\n",
    "Remember to respond with a paragraph of text in the product_description field for each of the 5 items that you select.\n",
    "Be careful with products that are described as \"$XXX off\" or \"reduced by $XXX\" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price. \n",
    "\n",
    "Deals:\n",
    "\n",
    "\"\"\"\n",
    "user_prompt += '\\n\\n'.join([deal.describe() for deal in deals])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1395a64",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(user_prompt[:2000])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e2287b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_recommendations():\n",
    "    completion = openai.beta.chat.completions.parse(\n",
    "        model=\"gpt-4o-mini\",\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_prompt},\n",
    "            {\"role\": \"user\", \"content\": user_prompt}\n",
    "      ],\n",
    "        response_format=DealSelection\n",
    "    )\n",
    "    result = completion.choices[0].message.parsed\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4341005",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = get_recommendations()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04a16f22",
   "metadata": {},
   "outputs": [],
   "source": [
    "len(result.deals)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "664e37f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "result.deals[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89a7a438",
   "metadata": {},
   "outputs": [],
   "source": [
    "from agents.scanner_agent import ScannerAgent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93195299",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = ScannerAgent()\n",
    "result = agent.scan()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66a360ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77487599",
   "metadata": {},
   "outputs": [],
   "source": [
    "DB = \"products_vectorstore\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94aef186",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = MessagingAgent()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e43ea7d",
   "metadata": {},
   "source": [
    "### Planning Agent "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c25239d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "DB = \"products_vectorstore\"\n",
    "client = chromadb.PersistentClient(path=DB)\n",
    "collection = client.get_or_create_collection('products')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fb32dde",
   "metadata": {},
   "outputs": [],
   "source": [
    "planner = PlanningAgent(collection)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8dcb93b",
   "metadata": {},
   "outputs": [],
   "source": [
    "planner.plan()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a72326b",
   "metadata": {},
   "source": [
    "### Gradio UI For Data Visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "385e69cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent_framework = DealAgentFramework()\n",
    "agent_framework.init_agents_as_needed()\n",
    "\n",
    "with gr.Blocks(title=\"The Price is Right\", fill_width=True) as ui:\n",
    "\n",
    "    initial_deal = Deal(product_description=\"Example description\", price=100.0, url=\"https://cnn.com\")\n",
    "    initial_opportunity = Opportunity(deal=initial_deal, estimate=200.0, discount=100.0)\n",
    "    opportunities = gr.State([initial_opportunity])\n",
    "\n",
    "    def get_table(opps):\n",
    "        return [[opp.deal.product_description, opp.deal.price, opp.estimate, opp.discount, opp.deal.url] for opp in opps]\n",
    "\n",
    "    def do_select(opportunities, selected_index: gr.SelectData):\n",
    "        row = selected_index.index[0]\n",
    "        opportunity = opportunities[row]\n",
    "        agent_framework.planner.messenger.alert(opportunity)\n",
    "\n",
    "    with gr.Row():\n",
    "        gr.Markdown('<div style=\"text-align: center;font-size:24px\">\"The Price is Right\" - Deal Hunting Agentic AI</div>')\n",
    "    with gr.Row():\n",
    "        gr.Markdown('<div style=\"text-align: center;font-size:14px\">Deals surfaced so far:</div>')\n",
    "    with gr.Row():\n",
    "        opportunities_dataframe = gr.Dataframe(\n",
    "            headers=[\"Description\", \"Price\", \"Estimate\", \"Discount\", \"URL\"],\n",
    "            wrap=True,\n",
    "            column_widths=[4, 1, 1, 1, 2],\n",
    "            row_count=10,\n",
    "            col_count=5,\n",
    "            max_height=400,\n",
    "        )\n",
    "\n",
    "    ui.load(get_table, inputs=[opportunities], outputs=[opportunities_dataframe])\n",
    "    opportunities_dataframe.select(do_select, inputs=[opportunities], outputs=[])\n",
    "\n",
    "ui.launch(inbrowser=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67a8bb67",
   "metadata": {},
   "source": [
    "### Run the application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8b047d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python price_is_right_final.py"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}