diff --git a/week1/community-contributions/week1 EXERCISE - TechHelpAgent.ipynb b/week1/community-contributions/week1 EXERCISE - TechHelpAgent.ipynb new file mode 100644 index 0000000..a750b2e --- /dev/null +++ b/week1/community-contributions/week1 EXERCISE - TechHelpAgent.ipynb @@ -0,0 +1,206 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fe12c203-e6a6-452c-a655-afb8a03a4ff5", + "metadata": {}, + "source": [ + "# End of week 1 exercise\n", + "\n", + "To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question, \n", + "and responds with an explanation. This is a tool that you will be able to use yourself during the course!" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c1070317-3ed9-4659-abe3-828943230e03", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display, update_display\n", + "from openai import OpenAI\n", + "import json\n", + "from IPython.display import Markdown, display, update_display\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4a456906-915a-4bfd-bb9d-57e505c5093f", + "metadata": {}, + "outputs": [], + "source": [ + "# constants\n", + "\n", + "MODEL_GPT = 'gpt-4o-mini'\n", + "MODEL_LLAMA = 'llama3.2'\n", + "openai = OpenAI()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "a8d7923c-5f28-4c30-8556-342d7c8497c1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "API key looks good so far\n" + ] + } + ], + "source": [ + "# set up environment\n", + "load_dotenv(override=True)\n", + "api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:\n", + " print(\"API key looks good so far\")\n", + "else:\n", + " print(\"There might be a problem with your API key? 
Please visit the troubleshooting notebook!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3f0d0137-52b0-47a8-81a8-11a90a010798", + "metadata": {}, + "outputs": [], + "source": [ + "# here is the question; type over this to ask something new\n", + "system_prompt = \"You are an expert in software engineering and data science with broad knowledge of current technologies and trends. Guide users by giving clear technical solutions to their software engineering and data science questions.\"\n", + "user_prompt = \"\"\"\n", + "Please explain what this code does and why:\n", + "yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60ce7000-a4a5-4cce-a261-e75ef45063b4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "This code snippet is utilizing a Python generator expression combined with the `yield from` statement to yield values from a set comprehension. Let's break it down:\n", + "\n", + "1. **Set Comprehension**:\n", + " ```python\n", + " {book.get(\"author\") for book in books if book.get(\"author\")}\n", + " ```\n", + " - This is a set comprehension that iterates over a collection called `books`.\n", + " - For each `book`, it retrieves the value associated with the key `\"author\"` using the `get()` method.\n", + " - The `if book.get(\"author\")` condition ensures that only books that have a valid (non-None or non-empty) author are included. This effectively filters out any books where the author is not present.\n", + "\n", + " As a result, this part creates a set of unique authors from the list of books. Since sets automatically discard duplicates, if multiple books have the same author, that author will only appear once in the resulting set.\n", + "\n", + "2. 
**Yielding Values**:\n", + " ```python\n", + " yield from\n", + " ```\n", + " - The `yield from` statement is used when you want to yield all values from an iterable. It allows a generator to yield all values from another generator or iterable.\n", + " - In this context, it will yield each author from the set created by the comprehension.\n", + "\n", + "3. **Putting It All Together**:\n", + " What this overall code does is:\n", + " - It generates and yields unique authors from a collection of books, ensuring that each author is listed only once and only for books that actually specify an author.\n", + "\n", + "### Purpose:\n", + "This code is useful in scenarios where you need to obtain a seemingly infinite generator of authors from a collection of books, processing each author one by one without creating a permanent list or set in memory, which can be beneficial for memory efficiency especially if you have a very large collection of books.\n", + "\n", + "### Example Usage:\n", + "Here’s a basic example of how you might use this in a generator function:\n", + "\n", + "```python\n", + "def get_unique_authors(books):\n", + " yield from {book.get(\"author\") for book in books if book.get(\"author\")}\n", + "\n", + "# Example books list\n", + "books = [\n", + " {\"title\": \"Book 1\", \"author\": \"Author A\"},\n", + " {\"title\": \"Book 2\", \"author\": \"Author B\"},\n", + " {\"title\": \"Book 3\", \"author\": \"Author A\"},\n", + " {\"title\": \"Book 4\", \"author\": None},\n", + "]\n", + "\n", + "for author in get_unique_authors(books):\n", + " print(author)\n", + "```\n", + "\n", + "This would output:\n", + "```\n", + "Author A\n", + "Author B\n", + "```\n", + "\n", + "In this example, `Author A` only appears once, demonstrating the uniqueness provided by the set comprehension." 
+ ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "None\n" + ] + } + ], + "source": [ + "# Get gpt-4o-mini to answer: stream the reply and join the chunks\n", + "stream = openai.chat.completions.create(\n", + " model=MODEL_GPT,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt}\n", + " ],\n", + " stream=True\n", + " )\n", + "result = \"\".join(chunk.choices[0].delta.content or \"\" for chunk in stream if chunk.choices)\n", + "display(Markdown(result))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f7c8ea8-4082-4ad0-8751-3301adcf6538", + "metadata": {}, + "outputs": [], + "source": [ + "# Get Llama 3.2 to answer" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llms", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/week1_assignments/scrape_website.py b/week1/community-contributions/week1_assignments/scrape_website.py new file mode 100644 index 0000000..d040e22 --- /dev/null +++ b/week1/community-contributions/week1_assignments/scrape_website.py @@ -0,0 +1,15 @@ +from bs4 import BeautifulSoup +import requests + + +class ScrapeWebsite: + + def __init__(self, url, headers): + """Scrape a website and expose its title and text content.""" + self.url = url + response = requests.get(self.url, headers=headers) + soup = BeautifulSoup(response.content, 'html.parser') + self.title = soup.title.string if soup.title else "No title found" + for irrelevant in soup.body(["script", "style", "img", "input"]): + irrelevant.decompose() + self.text = soup.body.get_text(separator="\n", 
strip=True) \ No newline at end of file diff --git a/week1/community-contributions/week1_assignments/text_summary_ollama.ipynb b/week1/community-contributions/week1_assignments/text_summary_ollama.ipynb new file mode 100644 index 0000000..d7a5b3b --- /dev/null +++ b/week1/community-contributions/week1_assignments/text_summary_ollama.ipynb @@ -0,0 +1,186 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd", + "metadata": {}, + "outputs": [], + "source": [ + "# imports\n", + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n", + "from scrape_website import ScrapeWebsite" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29ddd15d-a3c5-4f4e-a678-873f56162724", + "metadata": {}, + "outputs": [], + "source": [ + "# Constants\n", + "MODEL = \"llama3.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42c8a8c2", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are an analyst who analyses the content of a website, \\\n", + " provides a summary, and ignores text related to navigation. Respond in Markdown.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51e86dd1", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website are as follows. Please provide a short summary in Markdown. 
Please include news and \\\n", + " announcements.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b69d7238", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a56e99ea", + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b4061d0", + "metadata": {}, + "outputs": [], + "source": [ + "def summarise(url):\n", + " website = ScrapeWebsite(url, headers)\n", + " ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n", + " response = ollama_via_openai.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages_for(website)\n", + " )\n", + "\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65f96545", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summary(url):\n", + " summary = summarise(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23057e00-b6fc-4678-93a9-6b31cb704bff", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Generative AI has numerous business applications across various industries. Here are some examples:\n", + "\n", + "1. **Marketing and Advertising**: Generative AI can create personalized product recommendations, generate targeted advertisements, and develop new marketing campaigns.\n", + "2. 
**Content Creation**: AI-powered tools can assist in content creation, such as writing articles, generating social media posts, and creating videos, podcasts, and music.\n", + "3. **Product Design and Development**: Generative AI can aid in designing products, such as 3D modeling, prototyping, and testing product feasibility.\n", + "4. **Customer Service Chatbots**: AI-powered chatbots can provide personalized customer service, answering common queries, and helping resolve issues faster.\n", + "5. **Language Translation**: Generative AI can translate languages in real-time, enabling businesses to communicate with global customers more effectively.\n", + "6. **Data Analysis and Visualization**: AI can analyze large datasets, identify patterns, and create insights, making it easier for businesses to make informed decisions.\n", + "7. **Cybersecurity Threat Detection**: Generative AI-powered systems can detect and respond to cyber threats more efficiently, reducing the risk of data breaches and attacks.\n", + "8. **Supply Chain Optimization**: AI can optimize supply chain operations, predict demand, and identify opportunities for improvement, leading to increased efficiency and reduced costs.\n", + "9. **Network Security**: Generative AI can analyze network traffic patterns, detect anomalies, and prevent cyber-attacks.\n", + "10. **Finance and Banking**: AI-powered systems can detect financial fraud, predict customer creditworthiness, and generate credit reports.\n", + "\n", + "**Industry-specific applications:**\n", + "\n", + "1. **Healthcare**: AI can help with medical diagnosis, patient data analysis, and personalized medicine.\n", + "2. **Manufacturing**: Generative AI can create optimized production schedules, predict equipment failures, and improve product quality.\n", + "3. **Education**: AI-powered tools can develop personalized learning plans, automate grading, and provide educational resources.\n", + "4. 
**Real Estate**: AI can help with property valuations, identify market trends, and analyze potential clients' needs.\n", + "\n", + "**Business benefits:**\n", + "\n", + "1. **Increased efficiency**: Automating mundane tasks frees up human resources for more strategic work.\n", + "2. **Improved accuracy**: Generative AI reduces the likelihood of human error in decision-making and task execution.\n", + "3. **Enhanced customer experience**: Personalized experiences are created through data-driven insights.\n", + "4. **Competitive advantage**: Companies using AI can differentiate themselves from competitors by offering innovative services and products.\n", + "\n", + "As Generative AI continues to evolve, we can expect even more exciting applications across various industries, leading to increased efficiency, accuracy, and improved competitiveness for businesses worldwide.\n" + ] + } + ], + "source": [ + "display_summary(\"https://www.firstpost.com/world/united-states/\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6de38216-6d1c-48c4-877b-86d403f4e0f8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llms", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/week1/community-contributions/week1_assignments/text_summary_openai_gpt_5mini.ipynb b/week1/community-contributions/week1_assignments/text_summary_openai_gpt_5mini.ipynb new file mode 100644 index 0000000..ab6c1a4 --- /dev/null +++ b/week1/community-contributions/week1_assignments/text_summary_openai_gpt_5mini.ipynb @@ -0,0 +1,265 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1e45263e", + "metadata": {}, 
+ "source": [ + "# Web Data Extraction and Summarization Using OpenAI's Latest Model, gpt-5-mini" + ] + }, + { + "cell_type": "markdown", + "id": "df155151", + "metadata": {}, + "source": [ + "#### Import Libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "588f8e43", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "from IPython.display import Markdown, display\n", + "from openai import OpenAI\n", + "from scrape_website import ScrapeWebsite" + ] + }, + { + "cell_type": "markdown", + "id": "b5925769", + "metadata": {}, + "source": [ + "#### Load API Key" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "6cca85ec", + "metadata": {}, + "outputs": [], + "source": [ + "load_dotenv(override=True)\n", + "api_key = os.getenv('OPENAI_API_KEY')" + ] + }, + { + "cell_type": "markdown", + "id": "56703f80", + "metadata": {}, + "source": [ + "#### Request Headers for Scraping with BeautifulSoup" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3d60c909", + "metadata": {}, + "outputs": [], + "source": [ + "headers = {\n", + " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "a8b73c27", + "metadata": {}, + "source": [ + "#### System Prompt" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "4a0c3bda", + "metadata": {}, + "outputs": [], + "source": [ + "system_prompt = \"You are an analyst who analyses the content of a website, \\\n", + " provides a summary, and ignores text related to navigation. 
Respond in Markdown.\"" + ] + }, + { + "cell_type": "markdown", + "id": "9117963b", + "metadata": {}, + "source": [ + "#### User Prompt" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ab164d55", + "metadata": {}, + "outputs": [], + "source": [ + "def user_prompt_for(website):\n", + " user_prompt = f\"You are looking at a website titled {website.title}\"\n", + " user_prompt += \"\\nThe contents of this website are as follows. Please provide a short summary in Markdown, including news and \\\n", + " announcements.\\n\\n\"\n", + " user_prompt += website.text\n", + " return user_prompt" + ] + }, + { + "cell_type": "markdown", + "id": "de7423fb", + "metadata": {}, + "source": [ + "#### Format Messages in the OpenAI Chat Format" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "47c82247", + "metadata": {}, + "outputs": [], + "source": [ + "def messages_for(website):\n", + " return [\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt_for(website)}\n", + " ]" + ] + }, + { + "cell_type": "markdown", + "id": "6e9bb6e1", + "metadata": {}, + "source": [ + "#### Summarise the Website Content Using OpenAI's Latest Model, gpt-5-mini" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "068d6bb2", + "metadata": {}, + "outputs": [], + "source": [ + "def summarise(url):\n", + " website = ScrapeWebsite(url, headers)\n", + " openai = OpenAI()\n", + " response = openai.chat.completions.create(model=\"gpt-5-mini\", messages=messages_for(website))\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "markdown", + "id": "7e6e9da6", + "metadata": {}, + "source": [ + "#### Show Summary as Markdown" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "cd86c2ca", + "metadata": {}, + "outputs": [], + "source": [ + "def display_summary(url):\n", + " summary = summarise(url)\n", + " display(Markdown(summary))" + ] + }, + { + "cell_type": "markdown", + 
"id": "ed5e50d2", + "metadata": {}, + "source": [ + "#### Output" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "74a056b1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "# Summary — United States Of America | Firstpost (Live/Latest)\n", + "\n", + "Site focus: Live updates and rundowns of US and world news with emphasis on politics, justice, economy, national security, and breaking incidents. Coverage mixes headlines, investigations, opinion and special features/web stories.\n", + "\n", + "## Major news (headlines)\n", + "- Police shooting near CDC/Emory in Atlanta: a suspected shooter and a police officer were killed after reports of an active shooter near the CDC and Emory University campuses. \n", + "- Death of astronaut Jim Lovell (97): Apollo 13 commander and former Navy pilot died in a Chicago suburb. \n", + "- Stephen Miran named to Fed Board (short-term): Trump appointed economist Stephen Miran to the Federal Reserve Board through Jan 2026; noted for support of tariffs and rate cuts. \n", + "- Trump fires labour statistics chief: President Trump sacked the official overseeing labor data hours after a weak jobs report. \n", + "- House panel subpoenas Clintons over Epstein: congressional subpoenas seek documents in relation to Jeffrey Epstein amid pressure on the administration over Epstein files. \n", + "- Ghislaine Maxwell moved to lower-security prison in Texas amid scrutiny of Epstein files and government handling. \n", + "- FBI/administration tension on Epstein Files: Trump said he would “release everything” after reports the FBI redacted names from the Epstein Files. \n", + "- Probe launched into attorney who investigated Trump cases: US officials began a probe targeting Special Counsel Jack Smith. \n", + "- NTSB finds technical issues in Army helicopter crash: investigation into crash that killed 67 people identified technical problems. 
\n", + "- Trump unveils modified reciprocal tariffs: new executive order introduced modified tariffs on multiple countries; effective date possibly as late as Oct 5. \n", + "- Trump-EU trade deal announced: reported pact imposing a 15% tariff on most EU goods, with large energy and investment components but unresolved issues remain. \n", + "- Federal Reserve holds rates steady: Fed kept rates unchanged for a fifth meeting, despite political pressure from Trump. \n", + "- White House remodel plan: Trump pushing to build a reported $200 million ballroom at the presidential residence, funded by Trump/donors per WH. \n", + "- US citizenship test format under review: Trump administration considers reverting to the 2020 naturalisation test format, citing concerns the current test is too easy. \n", + "- American Airlines incident in Denver: passengers evacuated after a Boeing plane caught fire (tire/maintenance issue) before takeoff. \n", + "- John Bolton criticizes Tulsi Gabbard: former NSA lambastes Gabbard’s report on Obama as exaggerated and lacking substance. \n", + "- Ohio solicitor general Mathura Sridharan trolled: Indian-origin jurist faced racist online backlash after appointment; Ohio AG responded strongly.\n", + "\n", + "## Announcements, features & recurring elements\n", + "- Web stories and quick-read lists: travel/animals/safety themed pieces (e.g., “10 airport codes”, “10 animals that are naturally blue”, World Tiger Day lists). \n", + "- Regular sections and shows highlighted in coverage: Firstpost America, Firstpost Africa, First Sports, Vantage, Fast and Factual, Between The Lines, Flashback, Live TV. \n", + "- Events and special coverage teased: Raisina Dialogue, Champions Trophy, Delhi Elections 2025, Budget 2025, US Elections 2024, Firstpost Defence Summit. \n", + "- Trending topics emphasized: Donald Trump, Narendra Modi, Elon Musk, United States, Joe Biden. 
\n", + "- Quick-links / network: cross-promotion of other Network18 properties (News18, Moneycontrol, CNBC TV18, Forbes India).\n", + "\n", + "## Tone and emphasis\n", + "- Heavy focus on US politics, Trump administration actions and controversies (Epstein Files, tariffs, personnel changes), justice probes, national security incidents, and major breaking events.\n", + "- Mix of investigative/legal reporting, immediate breaking news, and light/web-story listicles.\n", + "\n", + "If you want, I can produce a one-page brief of just the Trump-related items, a timeline of the Epstein/Clinton/Subpoena coverage, or extract all headlines with publication order." + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_summary(\"https://www.firstpost.com/world/united-states/\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llms", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}