{ "cells": [ { "cell_type": "markdown", "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9", "metadata": {}, "source": [ "# Welcome to your first assignment!\n", "\n", "Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)" ] }, { "cell_type": "code", "execution_count": null, "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import requests\n", "from bs4 import BeautifulSoup\n", "from IPython.display import Markdown, display" ] }, { "cell_type": "code", "execution_count": null, "id": "29ddd15d-a3c5-4f4e-a678-873f56162724", "metadata": {}, "outputs": [], "source": [ "# Constants\n", "\n", "OLLAMA_API = \"http://localhost:11434/api/chat\"\n", "HEADERS = {\"Content-Type\": \"application/json\"}\n", "MODEL = \"llama3.2\"" ] }, { "cell_type": "code", "execution_count": null, "id": "dac0a679-599c-441f-9bf2-ddc73d35b940", "metadata": {}, "outputs": [], "source": [ "# Create a messages list using the same format that we used for OpenAI\n", "\n", "messages = [\n", " {\"role\": \"user\", \"content\": \"Describe some of the business applications of Generative AI\"}\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "7bb9c624-14f0-4945-a719-8ddb64f66f47", "metadata": {}, "outputs": [], "source": [ "payload = {\n", " \"model\": MODEL,\n", " \"messages\": messages,\n", " \"stream\": False\n", " }" ] }, { "cell_type": "code", "execution_count": null, "id": "7745b9c4-57dc-4867-9180-61fa5db55eb8", "metadata": {}, "outputs": [], "source": [ "import ollama\n", "\n", "response = ollama.chat(model=MODEL, messages=messages)\n", "print(response['message']['content'])" ] }, { "cell_type": "markdown", "id": "a4704e10-f5fb-4c15-a935-f046c06fb13d", "metadata": {}, "source": [ "## Alternative approach - using OpenAI python library to connect to Ollama" ] }, { "cell_type": "code", "execution_count": null, "id": "23057e00-b6fc-4678-93a9-6b31cb704bff", 
"metadata": {}, "outputs": [], "source": [ "# There's actually an alternative approach that some people might prefer\n", "# You can use the OpenAI client python library to call Ollama:\n", "\n", "from openai import OpenAI\n", "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", "\n", "response = ollama_via_openai.chat.completions.create(\n", "    model=MODEL,\n", "    messages=messages\n", ")\n", "\n", "print(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898", "metadata": {}, "source": [ "# NOW the exercise for you\n", "\n", "Take the code from day1 and incorporate it here, to build a website summarizer that uses Llama 3.2 running locally instead of OpenAI; use either of the above approaches." ] }, { "cell_type": "code", "execution_count": 12, "id": "0c1f84c4-4cc0-4085-8ea5-871a8ca46a47", "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import ollama\n", "import requests  # used by the Website class below\n", "from bs4 import BeautifulSoup  # used by the Website class below" ] }, { "cell_type": "code", "execution_count": 13, "id": "890852ab-2cd4-41dc-b168-6bd1360b967a", "metadata": {}, "outputs": [], "source": [ "MODEL = \"llama3.2\"" ] }, { "cell_type": "code", "execution_count": 14, "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8", "metadata": {}, "outputs": [], "source": [ "# A class to represent a Webpage\n", "\n", "# Some websites need you to use proper headers when fetching them:\n", "headers = {\n", "    \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", "}\n", "\n", "class Website:\n", "\n", "    def __init__(self, url):\n", "        \"\"\"\n", "        Create this Website object from the given url using the BeautifulSoup library\n", "        \"\"\"\n", "        self.url = url\n", "        response = requests.get(url, headers=headers)\n", "        soup = BeautifulSoup(response.content, 'html.parser')\n", "        self.title = soup.title.string if soup.title else \"No title found\"\n", "        for irrelevant in soup.body([\"script\", \"style\", \"img\", 
\"input\"]):\n", "            irrelevant.decompose()\n", "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)" ] }, { "cell_type": "code", "execution_count": 16, "id": "9d398f9a-c66e-42b5-91b4-5417944b8408", "metadata": {}, "outputs": [], "source": [ "def user_prompt_generator(website) -> str:\n", "    user_prompt = f\"You will act as a website summarizer with knowledge of the Web Content Accessibility Guidelines. You will look at the website titled {website.title} and \"\n", "    user_prompt += \"break down the relevant information about it into these categories: what the website is about, \"\n", "    user_prompt += \"to whom the website belongs, and which practices should be improved to provide a better user experience.\\n\\n\"\n", "    user_prompt += website.text\n", "\n", "    return user_prompt" ] }, { "cell_type": "code", "execution_count": 23, "id": "156d7c67-b714-4156-9f69-faf0c50aaf13", "metadata": {}, "outputs": [], "source": [ "def messages_generator(user_prompt: str) -> list[dict[str, str]]:\n", "    messages = [{\"role\": \"user\", \"content\": user_prompt}]\n", "\n", "    return messages" ] }, { "cell_type": "code", "execution_count": 21, "id": "f07c4143-6cc5-4d28-846c-a373564e9264", "metadata": {}, "outputs": [], "source": [ "def user_request_reader() -> str:\n", "    while True:\n", "        website_url = input(\"Define what website you want to summarize by giving the url: \")\n", "        # Accept only URLs that include an explicit http:// or https:// scheme\n", "        if website_url.lower().startswith((\"http://\", \"https://\")):\n", "            return website_url\n", "        print(\"URL not valid. 
Please provide a full url.\\n\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "94933255-2ca8-40b5-8f74-865d3e781058", "metadata": {}, "outputs": [], "source": [ "def summarizer_bot():\n", "    website_url = user_request_reader()\n", "    website = Website(website_url)\n", "\n", "    user_prompt = user_prompt_generator(website)\n", "    messages = messages_generator(user_prompt)\n", "\n", "    response = ollama.chat(model=MODEL, messages=messages)\n", "    print(response['message']['content'])" ] }, { "cell_type": "code", "execution_count": 24, "id": "2d81faa4-25b3-4d5d-8f36-93772e449b5c", "metadata": {}, "outputs": [ { "name": "stdin", "output_type": "stream", "text": [ "Define what website you want to summarize by giving the url: test.com\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "URL not valid. Please provide a full url.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ "Define what website you want to summarize by giving the url: https://edwarddonner.com\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "**Summary:**\n", "\n", "The website \"Home - Edward Donner\" belongs to Edward Donner, a co-founder and CTO of Nebula.io, an AI startup. The website is about Edward's interests in writing code, experimenting with Large Language Models (LLMs), and DJing, as well as his work in applying AI to help people discover their potential.\n", "\n", "**Categories:**\n", "\n", "### What is the website about?\n", "\n", "The website is primarily about Edward Donner's personal brand, showcasing his expertise in AI and LLMs. It includes information about his work at Nebula.io, which applies AI to talent management. 
The website also features a \"Connect Four\" arena where LLMs compete against each other, as well as sections for learning more about LLMs and staying up-to-date with Edward's courses and publications.\n", "\n", "### To whom does the website belong?\n", "\n", "The website belongs to Edward Donner, a co-founder and CTO of Nebula.io. It appears to be a personal website or blog, showcasing his expertise and interests in AI and LLMs.\n", "\n", "### Practices to improve for better user experience:\n", "\n", "1. **Clearer navigation**: The website's menu is simple but not intuitive. Adding clear categories or sections would help users quickly find the information they're looking for.\n", "2. **More detailed about section**: The \"About\" section provides a brief overview of Edward's work and interests, but it could be more detailed and comprehensive.\n", "3. **Improved accessibility**: While the website is likely following general web accessibility guidelines, there are no clear indications of this on the page. Adding alt text to images, providing a clear font size and color scheme, and ensuring sufficient contrast between background and foreground would improve the user experience for people with disabilities.\n", "4. **Better calls-to-action (CTAs)**: The website could benefit from more prominent CTAs, guiding users towards specific actions such as signing up for courses or following Edward on social media.\n", "5. 
**SEO optimization**: The website's content and meta tags appear to be optimized for search engines, but a more thorough SEO analysis would help identify areas for improvement.\n", "\n", "Overall, the website provides a clear overview of Edward Donner's interests and expertise in AI and LLMs, but could benefit from some tweaks to improve accessibility, navigation, and CTAs.\n" ] } ], "source": [ "# The call\n", "summarizer_bot()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 5 }