{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4e66a6eb-e44a-4dc3-bad7-82e27d45155d",
   "metadata": {},
   "source": [
    "# Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98bf393c-358e-4ee1-b15b-96dfec323734",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f92034ed-a2e6-444a-8008-291ba3f80561",
   "metadata": {},
   "source": [
    "# OpenAI API Key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a084b35d-19e9-4b48-bb06-d2c9e4474b20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32b35ea0-e4ca-492a-94af-822ec61468a0",
   "metadata": {},
   "source": [
    "# About..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c660b786-af88-4134-b958-ffbf7a7b2904",
   "metadata": {},
   "source": [
    "In this project I use the code from day 1 for something I do at work. I'm a real estate appraiser and when I prepare a valuation for some real estate, I analyze the local market, and in particular the city where the property is located. I then gather economy-related information and create a report from it. I'm based in Poland, so the report is in Polish. Here, I want to ask the model to make such a report for me, using the official website of the city and its related Wikipedia article."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09f32b5a-4d0a-4fec-a2f8-5d323ca2745d",
   "metadata": {},
   "source": [
    "# The Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0fb8fe1-f052-4426-8531-5520d5295807",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a2cca4b-8cd0-4c1a-a01c-1da10199236c",
   "metadata": {},
   "outputs": [],
   "source": [
    "headers = {\n",
    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this Website object from the given url using the BeautifulSoup library\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c73e91c8-5805-4c9f-9bbb-b4e9c1e7bf12",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = \"\"\"You are an analyst and real estate appraiser who checks out the official websites \n",
    "of cities as well as articles related to these cities on Wikipedia, searching the particular pages \n",
    "of the official website and the Wikipedia article for economic data, in particular the \n",
    "demographic structure of the city, its area, and how it's subdivided into built-up area, \n",
    "rural area, forests, and so on, provided this kind of information is available. \n",
    "The most important information you want to find is that related to the real estate market in the city, \n",
    "but also the general economy of the city, so what kind of factories or companies there are, commerce, \n",
    "business conditions, transportation, economic growth in recent years, and recent investments. \n",
    "wealth of the inhabitants, and so on, depending on what kind of information is available on the website. \n",
    "Combine the information found on the official website with the information found on Wikipedia, and in case\n",
    "of discrepancies, the official website should take precedence. If any of the information is missing,\n",
    "just omit it entirely and don't mention that it is missing, just don't write about it at all.\n",
    "When you gather all the required information, create a comprehensive report presenting \n",
    "the data in a clear way, using markdown, in tabular form where it makes sense. \n",
    "The length of the report should be about 5000 characters. And one more thing, the report should be entirely \n",
    "in Polish. \"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8015e8d-1655-4477-a111-aa8dd584f5eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_prompt_for(city, city_website, wiki_website):\n",
    "    user_prompt = f\"You are looking at the official website of the city {city}, and its wiki article.\"\n",
    "    user_prompt += f\"\\nThe contents of this website is as follows: \\\n",
    "please provide a comprehensive report of economy-related data for the city of {city}, available on the \\\n",
    "particular pages and subpages of its official website and Wikipedia in markdown. \\\n",
    "Add tables if it makes sense for the data. The length of the report should be about 5000 characters. \\\n",
    "The report should be in Polish.\\n\\n\"\n",
    "    user_prompt += city_website.text\n",
    "    user_prompt += wiki_website.text\n",
    "    return user_prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b55bd66b-e997-4d64-b5d5-679098013b9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(city, city_website, wiki_website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(city, city_website, wiki_website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5f1f218-d6a9-4a9e-be7e-b4f41e7647e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def report(url_official, url_wiki, city):\n",
    "    city_website = Website(url_official)\n",
    "    wiki_website = Website(url_wiki)\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages_for(city, city_website, wiki_website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08b47ec7-d00f-44e4-bbe2-580c8efd88e5",
   "metadata": {},
   "source": [
    "# Raw Result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "830f0746-08a7-43ae-bd40-78d4a4c5d3e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3630ac4-c103-4b84-a1a2-c246a702346e",
   "metadata": {},
   "source": [
    "# Polished Result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b89dd543-998d-4466-abd8-cc785118d3e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_report(url_official, url_wiki, city):\n",
    "    rep = report(url_official, url_wiki, city)\n",
    "    display(Markdown(rep))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "157926f3-ba67-4d4b-abbb-24a2dcd85a8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_report(\"https://www.rudaslaska.pl/\", \"https://pl.wikipedia.org/wiki/Ruda_%C5%9Al%C4%85ska\", \"Ruda Śląska\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "727d2283-e74c-4e74-86f2-759b08f1427a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}