LLM_Engineering_OLD/week1/day5.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a98030af-fcd1-4d63-a36e-38ba053498fa",
   "metadata": {},
   "source": [
    "# A full business solution\n",
    "\n",
    "## Now we will take our project from Day 1 to the next level\n",
    "\n",
    "### BUSINESS CHALLENGE:\n",
    "\n",
    "Create a product that builds a brochure for a company, to be used for prospective clients, investors and potential recruits.\n",
    "\n",
    "We will be provided with a company name and its primary website.\n",
    "\n",
    "See the end of this notebook for examples of real-world business applications.\n",
    "\n",
    "And remember: I'm always available if you have problems or ideas! Please do reach out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d5b08506-dc8b-4443-9201-5f1848161363",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt\n",
    "\n",
    "import os\n",
    "import json\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Markdown, display, update_display\n",
    "from scraper import fetch_website_links, fetch_website_contents\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fc5d8880-f2ee-4c06-af16-ecbc0262af61",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API key looks good so far\n"
     ]
    }
   ],
   "source": [
    "# Initialization and constants\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:\n",
    "    print(\"API key looks good so far\")\n",
    "else:\n",
    "    print(\"There might be a problem with your API key. Please visit the troubleshooting notebook!\")\n",
    "\n",
    "MODEL = 'gpt-5-nano'\n",
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e30d8128-933b-44cc-81c8-ab4c9d86589a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['https://edwarddonner.com/',\n",
       " 'https://edwarddonner.com/connect-four/',\n",
       " 'https://edwarddonner.com/outsmart/',\n",
       " 'https://edwarddonner.com/about-me-and-about-nebula/',\n",
       " 'https://edwarddonner.com/posts/',\n",
       " 'https://edwarddonner.com/',\n",
       " 'https://news.ycombinator.com',\n",
       " 'https://nebula.io/?utm_source=ed&utm_medium=referral',\n",
       " 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',\n",
       " 'https://patents.google.com/patent/US20210049536A1/',\n",
       " 'https://www.linkedin.com/in/eddonner/',\n",
       " 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',\n",
       " 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',\n",
       " 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',\n",
       " 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',\n",
       " 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',\n",
       " 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',\n",
       " 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',\n",
       " 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',\n",
       " 'https://edwarddonner.com/',\n",
       " 'https://edwarddonner.com/connect-four/',\n",
       " 'https://edwarddonner.com/outsmart/',\n",
       " 'https://edwarddonner.com/about-me-and-about-nebula/',\n",
       " 'https://edwarddonner.com/posts/',\n",
       " 'mailto:hello@mygroovydomain.com',\n",
       " 'https://www.linkedin.com/in/eddonner/',\n",
       " 'https://twitter.com/edwarddonner',\n",
       " 'https://www.facebook.com/edward.donner.52']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "links = fetch_website_links(\"https://edwarddonner.com\")\n",
    "links"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1771af9c-717a-4fca-bbbe-8a95893312c3",
   "metadata": {},
   "source": [
    "## First step: Have GPT-5-nano figure out which links are relevant\n",
    "\n",
    "### Use a call to gpt-5-nano to read the links on a webpage, and respond in structured JSON.  \n",
    "It should decide which links are relevant, and replace relative links such as \"/about\" with \"https://company.com/about\".  \n",
    "We will use \"one-shot prompting\", in which we provide an example of how it should respond in the prompt.\n",
    "\n",
    "This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!\n",
    "\n",
    "Sidenote: there is a more advanced technique called \"Structured Outputs\" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project."
   ]
  },
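  {
   "cell_type": "markdown",
   "id": "c3f1a7e2-5b4d-4e8a-9d21-7a6b0e4f1c90",
   "metadata": {},
   "source": [
    "The list returned by `fetch_website_links` earlier contains duplicates and a `mailto:` entry, and other sites may return relative paths. As a minimal sketch, assuming we wanted to tidy the list in plain Python before sending it to the model, something like the following could work. The helper name `normalize_links` is hypothetical and is not part of `scraper.py`; it relies only on `urllib.parse` from the standard library. Deciding which links are *relevant* is still left to the LLM.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urljoin, urlparse\n",
    "\n",
    "def normalize_links(base_url, links):\n",
    "    # Hypothetical helper: resolve relative links, drop non-web links, remove duplicates\n",
    "    seen = set()\n",
    "    cleaned = []\n",
    "    for link in links:\n",
    "        absolute = urljoin(base_url, link)  # '/about' -> 'https://company.com/about'\n",
    "        if urlparse(absolute).scheme not in ('http', 'https'):\n",
    "            continue  # skips mailto:, tel:, javascript: and similar\n",
    "        if absolute not in seen:\n",
    "            seen.add(absolute)\n",
    "            cleaned.append(absolute)\n",
    "    return cleaned\n",
    "\n",
    "normalize_links('https://company.com', ['/about', '/about', 'mailto:hi@company.com'])\n",
    "# ['https://company.com/about']\n",
    "```\n",
    "\n",
    "Doing this first would shorten the prompt, but it is optional; the prompts in this notebook pass the raw list to the model."
   ]
  },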
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6957b079-0d96-45f7-a26a-3487510e9b35",
   "metadata": {},
   "outputs": [],
   "source": [
    "link_system_prompt = \"\"\"\n",
    "You are provided with a list of links found on a webpage.\n",
    "You are able to decide which of the links would be most relevant to include in a brochure about the company,\n",
    "such as links to an About page, or a Company page, or Careers/Jobs pages.\n",
    "You should respond in JSON as in this example:\n",
    "\n",
    "{\n",
    "    \"links\": [\n",
    "        {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n",
    "        {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n",
    "    ]\n",
    "}\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8e1f601b-2eaf-499d-b6b8-c99050c9d6b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_links_user_prompt(url):\n",
    "    user_prompt = f\"\"\"\n",
    "Here is the list of links on the website {url} -\n",
    "Please decide which of these are relevant web links for a brochure about the company, \n",
    "respond with the full https URL in JSON format.\n",
    "Do not include Terms of Service, Privacy, email links.\n",
    "\n",
    "Links (some might be relative links):\n",
    "\n",
    "\"\"\"\n",
    "    links = fetch_website_links(url)\n",
    "    user_prompt += \"\\n\".join(links)\n",
    "    return user_prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "6bcbfa78-6395-4685-b92c-22d592050fd7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Here is the list of links on the website https://edwarddonner.com -\n",
      "Please decide which of these are relevant web links for a brochure about the company, \n",
      "respond with the full https URL in JSON format.\n",
      "Do not include Terms of Service, Privacy, email links.\n",
      "\n",
      "Links (some might be relative links):\n",
      "\n",
      "https://edwarddonner.com/\n",
      "https://edwarddonner.com/connect-four/\n",
      "https://edwarddonner.com/outsmart/\n",
      "https://edwarddonner.com/about-me-and-about-nebula/\n",
      "https://edwarddonner.com/posts/\n",
      "https://edwarddonner.com/\n",
      "https://news.ycombinator.com\n",
      "https://nebula.io/?utm_source=ed&utm_medium=referral\n",
      "https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html\n",
      "https://patents.google.com/patent/US20210049536A1/\n",
      "https://www.linkedin.com/in/eddonner/\n",
      "https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/\n",
      "https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/\n",
      "https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/\n",
      "https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/\n",
      "https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/\n",
      "https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/\n",
      "https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/\n",
      "https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/\n",
      "https://edwarddonner.com/\n",
      "https://edwarddonner.com/connect-four/\n",
      "https://edwarddonner.com/outsmart/\n",
      "https://edwarddonner.com/about-me-and-about-nebula/\n",
      "https://edwarddonner.com/posts/\n",
      "mailto:hello@mygroovydomain.com\n",
      "https://www.linkedin.com/in/eddonner/\n",
      "https://twitter.com/edwarddonner\n",
      "https://www.facebook.com/edward.donner.52\n"
     ]
    }
   ],
   "source": [
    "print(get_links_user_prompt(\"https://edwarddonner.com\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "effeb95f",
   "metadata": {},
   "outputs": [],
   "source": [
    "def select_relevant_links(url):\n",
    "    response = openai.chat.completions.create(\n",
    "        model=MODEL,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": link_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": get_links_user_prompt(url)}\n",
    "        ],\n",
    "        # JSON mode: ask the API to return a single JSON object\n",
    "        response_format={\"type\": \"json_object\"}\n",
    "    )\n",
    "    result = response.choices[0].message.content\n",
    "    links = json.loads(result)\n",
    "    return links"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2d5b1ded",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'links': [{'type': 'homepage', 'url': 'https://edwarddonner.com/'},\n",
       "  {'type': 'about page',\n",
       "   'url': 'https://edwarddonner.com/about-me-and-about-nebula/'},\n",
       "  {'type': 'blog page', 'url': 'https://edwarddonner.com/posts/'},\n",
       "  {'type': 'blog post',\n",
       "   'url': 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/'},\n",
       "  {'type': 'blog post',\n",
       "   'url': 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/'},\n",
       "  {'type': 'blog post',\n",
       "   'url': 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/'},\n",
       "  {'type': 'blog post',\n",
       "   'url': 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/'},\n",
       "  {'type': 'linkedin page', 'url': 'https://www.linkedin.com/in/eddonner/'},\n",
       "  {'type': 'twitter', 'url': 'https://twitter.com/edwarddonner'},\n",
       "  {'type': 'facebook page',\n",
       "   'url': 'https://www.facebook.com/edward.donner.52'}]}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "select_relevant_links(\"https://edwarddonner.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a29aca19-ca13-471c-a4b4-5abbfa813f69",
   "metadata": {},
   "outputs": [],
   "source": [
    "def select_relevant_links(url):\n",
    "    print(f\"Selecting relevant links for {url} by calling {MODEL}\")\n",
    "    response = openai.chat.completions.create(\n",
    "        model=MODEL,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": link_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": get_links_user_prompt(url)}\n",
    "        ],\n",
    "        response_format={\"type\": \"json_object\"}\n",
    "    )\n",
    "    result = response.choices[0].message.content\n",
    "    links = json.loads(result)\n",
    "    print(f\"Found {len(links['links'])} relevant links\")\n",
    "    return links"
   ]
  },
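  {
   "cell_type": "markdown",
   "id": "e8d2b6a4-1f3c-4a7b-8c55-2d9f0a6b3e17",
   "metadata": {},
   "source": [
    "In the function above, `json.loads` raises if the reply is not valid JSON (for example, if it were cut off), and `links['links']` raises a `KeyError` if the model omits that key. JSON mode should make both unlikely, but a more defensive parse is easy to add. The following is a minimal sketch, assuming the response shape requested in `link_system_prompt`; the helper name `parse_links` is hypothetical.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "def parse_links(result):\n",
    "    # Hypothetical helper: return a list of link dicts, or an empty list on bad input\n",
    "    try:\n",
    "        data = json.loads(result)\n",
    "    except json.JSONDecodeError:\n",
    "        return []\n",
    "    if not isinstance(data, dict):\n",
    "        return []\n",
    "    links = data.get('links', [])\n",
    "    if not isinstance(links, list):\n",
    "        return []\n",
    "    # Keep only entries that carry a full http(s) URL\n",
    "    return [\n",
    "        link for link in links\n",
    "        if isinstance(link, dict)\n",
    "        and str(link.get('url', '')).startswith(('http://', 'https://'))\n",
    "    ]\n",
    "```\n",
    "\n",
    "Returning an empty list means the later steps would simply build the brochure from the landing page alone; raising an error instead would be an equally reasonable choice."
   ]
  },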
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "74a827a0-2782-4ae5-b210-4a242a8b4cc2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://edwarddonner.com by calling gpt-5-nano\n",
      "Found 9 relevant links\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'links': [{'type': 'home page', 'url': 'https://edwarddonner.com/'},\n",
       "  {'type': 'about page',\n",
       "   'url': 'https://edwarddonner.com/about-me-and-about-nebula/'},\n",
       "  {'type': 'blog page', 'url': 'https://edwarddonner.com/posts/'},\n",
       "  {'type': 'projects page', 'url': 'https://edwarddonner.com/connect-four/'},\n",
       "  {'type': 'projects page', 'url': 'https://edwarddonner.com/outsmart/'},\n",
       "  {'type': 'company page',\n",
       "   'url': 'https://nebula.io/?utm_source=ed&utm_medium=referral'},\n",
       "  {'type': 'LinkedIn profile', 'url': 'https://www.linkedin.com/in/eddonner/'},\n",
       "  {'type': 'Twitter profile', 'url': 'https://twitter.com/edwarddonner'},\n",
       "  {'type': 'Facebook page',\n",
       "   'url': 'https://www.facebook.com/edward.donner.52'}]}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "select_relevant_links(\"https://edwarddonner.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d3d583e2-dcc4-40cc-9b28-1e8dbf402924",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 13 relevant links\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'},\n",
       "  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},\n",
       "  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},\n",
       "  {'type': 'documentation page', 'url': 'https://huggingface.co/docs'},\n",
       "  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},\n",
       "  {'type': 'join page (careers)', 'url': 'https://huggingface.co/join'},\n",
       "  {'type': 'blog', 'url': 'https://huggingface.co/blog'},\n",
       "  {'type': 'learn page', 'url': 'https://huggingface.co/learn'},\n",
       "  {'type': 'GitHub', 'url': 'https://github.com/huggingface'},\n",
       "  {'type': 'Twitter', 'url': 'https://twitter.com/huggingface'},\n",
       "  {'type': 'LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'},\n",
       "  {'type': 'Community forum', 'url': 'https://discuss.huggingface.co'},\n",
       "  {'type': 'product endpoints', 'url': 'https://endpoints.huggingface.co'}]}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "select_relevant_links(\"https://huggingface.co\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d74128e-dfb6-47ec-9549-288b621c838c",
   "metadata": {},
   "source": [
    "## Second step: make the brochure!\n",
    "\n",
    "Assemble the landing page and the contents of each relevant link into another prompt to GPT-5-nano."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "85a5b6e2-e7ef-44a9-bc7f-59ede71037b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fetch_page_and_all_relevant_links(url):\n",
    "    # Fetch the landing page text, then the text of every link the model judged relevant\n",
    "    contents = fetch_website_contents(url)\n",
    "    relevant_links = select_relevant_links(url)\n",
    "    result = f\"## Landing Page:\\n\\n{contents}\\n## Relevant Links:\\n\"\n",
    "    for link in relevant_links['links']:\n",
    "        # One section per link, labelled with the type the model assigned\n",
    "        result += f\"\\n\\n### Link: {link['type']}\\n\"\n",
    "        result += fetch_website_contents(link[\"url\"])\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5099bd14-076d-4745-baf3-dac08d8e5ab2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 9 relevant links\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## Landing Page:\n",
      "\n",
      "Hugging Face – The AI community building the future.\n",
      "\n",
      "Hugging Face\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Community\n",
      "Docs\n",
      "Enterprise\n",
      "Pricing\n",
      "Log In\n",
      "Sign Up\n",
      "NEW\n",
      "Try HuggingChat Omni – Chat with AI 💬\n",
      "Get started with Inference in seconds 🚀\n",
      "Reachy Mini: The Open Robot for AI Builders\n",
      "The AI community building the future.\n",
      "The platform where the machine learning community collaborates on models, datasets, and applications.\n",
      "Explore AI Apps\n",
      "or\n",
      "Browse 1M+ models\n",
      "Trending on\n",
      "this week\n",
      "Models\n",
      "deepseek-ai/DeepSeek-OCR\n",
      "Updated\n",
      "about 4 hours ago\n",
      "•\n",
      "32.9k\n",
      "•\n",
      "1.21k\n",
      "PaddlePaddle/PaddleOCR-VL\n",
      "Updated\n",
      "about 22 hours ago\n",
      "•\n",
      "6.62k\n",
      "•\n",
      "919\n",
      "Qwen/Qwen3-VL-8B-Instruct\n",
      "Updated\n",
      "7 days ago\n",
      "•\n",
      "117k\n",
      "•\n",
      "272\n",
      "nanonets/Nanonets-OCR2-3B\n",
      "Updated\n",
      "6 days ago\n",
      "•\n",
      "16.2k\n",
      "•\n",
      "369\n",
      "Phr00t/Qwen-Image-Edit-Rapid-AIO\n",
      "Updated\n",
      "about 7 hours ago\n",
      "•\n",
      "388\n",
      "Browse 1M+ models\n",
      "Spaces\n",
      "Running\n",
      "464\n",
      "464\n",
      "veo3.1-fast\n",
      "🐨\n",
      "Generate videos from text or images\n",
      "Running\n",
      "15.3k\n",
      "15.3k\n",
      "DeepSite v3\n",
      "🐳\n",
      "Generate any application by Vibe Coding\n",
      "Running\n",
      "406\n",
      "406\n",
      "Sora 2\n",
      "📉\n",
      "Generate videos from text or images\n",
      "Running\n",
      "1.98k\n",
      "1.98k\n",
      "Wan2.2 Animate\n",
      "👁\n",
      "Wan2.2 Animate\n",
      "Running\n",
      "on\n",
      "Zero\n",
      "MCP\n",
      "1.87k\n",
      "1.87k\n",
      "Wan2.2 14B Fast\n",
      "🎥\n",
      "generate a video from an image with a text prompt\n",
      "Browse 400k+ applications\n",
      "Datasets\n",
      "karpathy/fineweb-edu-100b-shuffle\n",
      "Updated\n",
      "27 days ago\n",
      "•\n",
      "25.3k\n",
      "•\n",
      "74\n",
      "fka/awesome-chatgpt-prompts\n",
      "Updated\n",
      "Jan 6\n",
      "•\n",
      "37.2k\n",
      "•\n",
      "9.29k\n",
      "nick007x/github-code-2025\n",
      "Updated\n",
      "7 days ago\n",
      "•\n",
      "7.23k\n",
      "•\n",
      "46\n",
      "HuggingFaceFW/finewiki\n",
      "Updated\n",
      "1 day ago\n",
      "•\n",
      "32\n",
      "•\n",
      "32\n",
      "Open-Bee/Honey-Data-15M\n",
      "Updated\n",
      "6 days ago\n",
      "•\n",
      "25\n",
      "•\n",
      "31\n",
      "Browse 250k+ datasets\n",
      "The Home of Machine Learning\n",
      "Create, discover and collaborate on ML better.\n",
      "The collaboration platform\n",
      "Host and collaborate on unlimited public models, datasets and applications.\n",
      "Move faster\n",
      "With the HF Open source stack.\n",
      "Explore all modalities\n",
      "Text, image, video, audio or even 3D.\n",
      "Build your portfolio\n",
      "Share your work with the world and build your ML profile.\n",
      "Sign Up\n",
      "Accelerate your ML\n",
      "We provide paid Compute and Enterprise solutions.\n",
      "Team & Enterprise\n",
      "Give your team the most advanced p\n",
      "## Relevant Links:\n",
      "\n",
      "\n",
      "### Link: company page\n",
      "Enterprise Hub - Hugging Face\n",
      "\n",
      "Hugging Face\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Community\n",
      "Docs\n",
      "Enterprise\n",
      "Pricing\n",
      "Log In\n",
      "Sign Up\n",
      "Team & Enterprise Hub\n",
      "Scale your organization with the world’s leading AI platform\n",
      "Subscribe to\n",
      "Team\n",
      "starting at $20/user/month\n",
      "or\n",
      "Contact sales for\n",
      "Enterprise\n",
      "to explore flexible contract options\n",
      "Give your organization the most advanced platform to build AI with enterprise-grade security, access controls,\n",
      "\t\t\tdedicated support and more.\n",
      "Single Sign-On\n",
      "Connect securely to your identity provider with SSO integration.\n",
      "Regions\n",
      "Select, manage, and audit the location of your repository data.\n",
      "Audit Logs\n",
      "Stay in control with comprehensive logs that report on actions taken.\n",
      "Resource Groups\n",
      "Accurately manage access to repositories with granular access control.\n",
      "Token Management\n",
      "Centralized token control and custom approval policies for organization access.\n",
      "Analytics\n",
      "Track and analyze repository usage data in a single dashboard.\n",
      "Advanced Compute Options\n",
      "Increase scalability and performance with more compute options like ZeroGPU.\n",
      "ZeroGPU Quota Boost\n",
      "All organization members get 5x more ZeroGPU quota to get the most of Spaces.\n",
      "Private Datasets Viewer\n",
      "Enable the Dataset Viewer on your private datasets for easier collaboration.\n",
      "Private Storage\n",
      "Get an additional 1 TB of private storage for each member of your organization (then $25/month per extra TB).\n",
      "Inference Providers\n",
      "Enable organization billing for Inference Providers, monitor usage with analytics, and manage spending limits.\n",
      "Advanced security\n",
      "Configure organization-wide security policies and default repository visibility.\n",
      "Billing\n",
      "Control your budget effectively with managed billing and yearly commit options.\n",
      "Priority Support\n",
      "Maximize your platform usage with priority support from the Hugging Face team.\n",
      "Join the most forward-thinking AI organizations\n",
      "Everything you already know and love about Hugging Face in Enterprise mode.\n",
      "Subscribe to\n",
      "Team\n",
      "starting at $20/user/month\n",
      "or\n",
      "Contact sales for\n",
      "Enterprise\n",
      "to explore fl\n",
      "\n",
      "### Link: about page\n",
      "Brand assets - Hugging Face\n",
      "\n",
      "Hugging Face\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Community\n",
      "Docs\n",
      "Enterprise\n",
      "Pricing\n",
      "Log In\n",
      "Sign Up\n",
      "Hugging Face · Brand assets\n",
      "HF Logos\n",
      ".svg\n",
      ".png\n",
      ".ai\n",
      ".svg\n",
      ".png\n",
      ".ai\n",
      ".svg\n",
      ".png\n",
      ".ai\n",
      "HF Colors\n",
      "#FFD21E\n",
      "#FF9D00\n",
      "#6B7280\n",
      "HF Bio\n",
      "Hugging Face is the collaboration platform for the machine learning community.\n",
      "\n",
      "The Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together.\n",
      "\n",
      "With the fast-growing community, some of the most used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution.\n",
      "Copy to clipboard\n",
      "HF Universe\n",
      "Find other assets available for use from the Hugging Face brand universe\n",
      "here\n",
      ".\n",
      "System theme\n",
      "Website\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Changelog\n",
      "Inference Endpoints\n",
      "HuggingChat\n",
      "Company\n",
      "About\n",
      "Brand assets\n",
      "Terms of service\n",
      "Privacy\n",
      "Jobs\n",
      "Press\n",
      "Resources\n",
      "Learn\n",
      "Documentation\n",
      "Blog\n",
      "Forum\n",
      "Service Status\n",
      "Social\n",
      "GitHub\n",
      "Twitter\n",
      "LinkedIn\n",
      "Discord\n",
      "\n",
      "### Link: careers page\n",
      "No title found\n",
      "\n",
      "\n",
      "\n",
      "### Link: careers page\n",
      "Hugging Face - Current Openings\n",
      "\n",
      "\n",
      "\n",
      "### Link: blog\n",
      "Hugging Face – Blog\n",
      "\n",
      "Hugging Face\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Community\n",
      "Docs\n",
      "Enterprise\n",
      "Pricing\n",
      "Log In\n",
      "Sign Up\n",
      "Blog, Articles, and discussions\n",
      "New Article\n",
      "Everything\n",
      "community\n",
      "guide\n",
      "open source collab\n",
      "partnerships\n",
      "research\n",
      "NLP\n",
      "Audio\n",
      "CV\n",
      "RL\n",
      "ethics\n",
      "Diffusion\n",
      "Game Development\n",
      "RLHF\n",
      "Leaderboard\n",
      "Case Studies\n",
      "LeRobot\n",
      "Inference Providers\n",
      "Community Articles\n",
      "view all\n",
      "AI for Food Allergies\n",
      "By\n",
      "hugging-science\n",
      "and 3 others\n",
      "•\n",
      "5 days ago\n",
      "•\n",
      "26\n",
      "Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text\n",
      "By\n",
      "isaacchung\n",
      "and 2 others\n",
      "•\n",
      "1 day ago\n",
      "•\n",
      "26\n",
      "Announcing Hugging Face Fundamentals: A New Learning Track on DataCamp\n",
      "By\n",
      "huggingface\n",
      "•\n",
      "6 days ago\n",
      "•\n",
      "21\n",
      "Introducing the Massive Legal Embedding Benchmark (MLEB)\n",
      "By\n",
      "isaacus\n",
      "and 1 other\n",
      "•\n",
      "5 days ago\n",
      "•\n",
      "16\n",
      "How I Built Lightning-Fast Vector Search for Legal Documents\n",
      "By\n",
      "adlumal\n",
      "•\n",
      "2 days ago\n",
      "•\n",
      "14\n",
      "Art of Focus: Page-Aware Sparse Attention and Ling 2.0’s Quest for Efficient Context Length Scaling\n",
      "By\n",
      "RichardBian\n",
      "and 19 others\n",
      "•\n",
      "1 day ago\n",
      "•\n",
      "14\n",
      "Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models\n",
      "By\n",
      "nvidia\n",
      "and 3 others\n",
      "•\n",
      "1 day ago\n",
      "•\n",
      "11\n",
      "GSMA Open-Telco LLM Benchmarks 2.0: The first dedicated LLM Evaluation for Telecoms\n",
      "By\n",
      "otellm\n",
      "and 15 others\n",
      "•\n",
      "2 days ago\n",
      "•\n",
      "8\n",
      "Code a simple RAG from scratch\n",
      "By\n",
      "ngxson\n",
      "•\n",
      "Oct 29, 2024\n",
      "•\n",
      "223\n",
      "Model statistics of the 50 most downloaded entities on Hugging Face\n",
      "By\n",
      "lbourdois\n",
      "•\n",
      "9 days ago\n",
      "•\n",
      "25\n",
      "There is no such thing as a tokenizer-free lunch\n",
      "By\n",
      "catherinearnett\n",
      "•\n",
      "27 days ago\n",
      "•\n",
      "83\n",
      "How Financial News Can Be Used to Train Good Financial Models\n",
      "By\n",
      "SelmaNajih001\n",
      "•\n",
      "14 days ago\n",
      "•\n",
      "8\n",
      "Introduction to State Space Models (SSM)\n",
      "By\n",
      "lbourdois\n",
      "•\n",
      "Jul 19, 2024\n",
      "•\n",
      "181\n",
      "Visualizing How VLMs Work\n",
      "By\n",
      "not-lain\n",
      "and 1 other\n",
      "•\n",
      "15 days ago\n",
      "•\n",
      "39\n",
      "Understanding Vector Quantization in VQ-VAE\n",
      "By\n",
      "ariG23498\n",
      "•\n",
      "Aug 28, 2024\n",
      "•\n",
      "48\n",
      "Small Language Models (SLM): A Comprehensive Overview\n",
      "By\n",
      "jjokah\n",
      "•\n",
      "Feb 22\n",
      "•\n",
      "91\n",
      "mem-agent: Equipping LLM Agents with Memory Using RL\n",
      "By\n",
      "driaforall\n",
      "and 1 other\n",
      "•\n",
      "13 days ago\n",
      "•\n",
      "30\n",
      "Granite Em\n",
      "\n",
      "### Link: learn page\n",
      "Hugging Face - Learn\n",
      "\n",
      "Hugging Face\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Community\n",
      "Docs\n",
      "Enterprise\n",
      "Pricing\n",
      "Log In\n",
      "Sign Up\n",
      "Learn\n",
      "LLM Course\n",
      "This course will teach you about large language models using libraries from the HF ecosystem\n",
      "Robotics Course\n",
      "This course will teach you to build robots with using LeRobot\n",
      "MCP Course\n",
      "This course will teach you about Model Context Protocol\n",
      "a smol course\n",
      "This smollest course on post-training AI models\n",
      "Agents Course\n",
      "Learn to build and deploy your own AI agents\n",
      "Deep RL Course\n",
      "This course will teach you about deep reinforcement learning using libraries from the HF ecosystem\n",
      "Community Computer Vision Course\n",
      "This course will teach you about computer vision ML using libraries and models from the HF ecosystem\n",
      "Audio Course\n",
      "Learn to apply transformers to audio data using libraries from the HF ecosystem\n",
      "Open-Source AI Cookbook\n",
      "A collection of open-source-powered notebooks by AI builders, for AI builders\n",
      "ML for Games Course\n",
      "This course will teach you about integrating AI models your game and using AI tools in your game development workflow\n",
      "Diffusion Course\n",
      "Learn about diffusion models & how to use them with diffusers\n",
      "ML for 3D Course\n",
      "Learn about 3D ML with libraries from the HF ecosystem\n",
      "System theme\n",
      "Company\n",
      "TOS\n",
      "Privacy\n",
      "About\n",
      "Jobs\n",
      "Website\n",
      "Models\n",
      "Datasets\n",
      "Spaces\n",
      "Pricing\n",
      "Docs\n",
      "\n",
      "### Link: open-source page\n",
      "Hugging Face · GitHub\n",
      "\n",
      "Skip to content\n",
      "Navigation Menu\n",
      "Toggle navigation\n",
      "Sign in\n",
      "Appearance settings\n",
      "huggingface\n",
      "Platform\n",
      "GitHub Copilot\n",
      "Write better code with AI\n",
      "GitHub Spark\n",
      "New\n",
      "Build and deploy intelligent apps\n",
      "GitHub Models\n",
      "New\n",
      "Manage and compare prompts\n",
      "GitHub Advanced Security\n",
      "Find and fix vulnerabilities\n",
      "Actions\n",
      "Automate any workflow\n",
      "Codespaces\n",
      "Instant dev environments\n",
      "Issues\n",
      "Plan and track work\n",
      "Code Review\n",
      "Manage code changes\n",
      "Discussions\n",
      "Collaborate outside of code\n",
      "Code Search\n",
      "Find more, search less\n",
      "Explore\n",
      "Why GitHub\n",
      "Documentation\n",
      "GitHub Skills\n",
      "Blog\n",
      "Integrations\n",
      "GitHub Marketplace\n",
      "MCP Registry\n",
      "View all features\n",
      "Solutions\n",
      "By company size\n",
      "Enterprises\n",
      "Small and medium teams\n",
      "Startups\n",
      "Nonprofits\n",
      "By use case\n",
      "App Modernization\n",
      "DevSecOps\n",
      "DevOps\n",
      "CI/CD\n",
      "View all use cases\n",
      "By industry\n",
      "Healthcare\n",
      "Financial services\n",
      "Manufacturing\n",
      "Government\n",
      "View all industries\n",
      "View all solutions\n",
      "Resources\n",
      "Topics\n",
      "AI\n",
      "DevOps\n",
      "Security\n",
      "Software Development\n",
      "View all\n",
      "Explore\n",
      "Learning Pathways\n",
      "Events & Webinars\n",
      "Ebooks & Whitepapers\n",
      "Customer Stories\n",
      "Partners\n",
      "Executive Insights\n",
      "Open Source\n",
      "GitHub Sponsors\n",
      "Fund open source developers\n",
      "The ReadME Project\n",
      "GitHub community articles\n",
      "Repositories\n",
      "Topics\n",
      "Trending\n",
      "Collections\n",
      "Enterprise\n",
      "Enterprise platform\n",
      "AI-powered developer platform\n",
      "Available add-ons\n",
      "GitHub Advanced Security\n",
      "Enterprise-grade security features\n",
      "Copilot for business\n",
      "Enterprise-grade AI features\n",
      "Premium Support\n",
      "Enterprise-grade 24/7 support\n",
      "Pricing\n",
      "Search or jump to...\n",
      "Search code, repositories, users, issues, pull requests...\n",
      "Search\n",
      "Clear\n",
      "Search syntax tips\n",
      "Provide feedback\n",
      "We read every piece of feedback, and take your input very seriously.\n",
      "Include my email address so I can be contacted\n",
      "Cancel\n",
      "Submit feedback\n",
      "Saved searches\n",
      "Use saved searches to filter your results more quickly\n",
      "Cancel\n",
      "Create saved search\n",
      "Sign in\n",
      "Sign up\n",
      "Appearance settings\n",
      "Resetting focus\n",
      "You signed in with another tab or window.\n",
      "Reload\n",
      "to refresh your session.\n",
      "You signed out in another tab or window.\n",
      "Reload\n",
      "to refresh yo\n",
      "\n",
      "### Link: social page\n",
      "No title found\n",
      "\n",
      "JavaScript is not available.\n",
      "We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using x.com. You can see a list of supported browsers in our Help Center.\n",
      "Help Center\n",
      "Terms of Service\n",
      "Privacy Policy\n",
      "Cookie Policy\n",
      "Imprint\n",
      "Ads info\n",
      "© 2025 X Corp.\n",
      "Something went wrong, but don’t fret — let’s give it another shot.\n",
      "Try again\n",
      "Some privacy related extensions may cause issues on x.com. Please disable them and try again.\n",
      "\n",
      "### Link: social page\n",
      "Hugging Face | LinkedIn\n",
      "\n",
      "Skip to main content\n",
      "LinkedIn\n",
      "Top Content\n",
      "People\n",
      "Learning\n",
      "Jobs\n",
      "Games\n",
      "Get the app\n",
      "Sign in\n",
      "Create an account\n",
      "Hugging Face\n",
      "Software Development\n",
      "The AI community building the future.\n",
      "See jobs\n",
      "Follow\n",
      "View all 632 employees\n",
      "Report this company\n",
      "About us\n",
      "The AI community building the future.\n",
      "Website\n",
      "https://huggingface.co\n",
      "External link for Hugging Face\n",
      "Industry\n",
      "Software Development\n",
      "Company size\n",
      "51-200 employees\n",
      "Type\n",
      "Privately Held\n",
      "Founded\n",
      "2016\n",
      "Specialties\n",
      "machine learning, natural language processing, and deep learning\n",
      "Products\n",
      "Hugging Face\n",
      "Hugging Face\n",
      "Natural Language Processing (NLP) Software\n",
      "We’re on a journey to solve and democratize artificial intelligence through natural language.\n",
      "Locations\n",
      "Primary\n",
      "Get directions\n",
      "Paris, FR\n",
      "Get directions\n",
      "Employees at Hugging Face\n",
      "Ludovic Huraux\n",
      "Rajat Arya\n",
      "Tech Lead & Software Engineer @ HF | prev: co-founder XetHub, Apple, Turi, AWS, Microsoft\n",
      "Jeff Boudier\n",
      "Product + Growth at Hugging Face\n",
      "Terrence Rohan\n",
      "Seed Investor\n",
      "See all employees\n",
      "Updates\n",
      "Hugging Face\n",
      "reposted this\n",
      "Andrés Marafioti\n",
      "Multimodal Research Lead @ Hugging Face | 10+ YOE in AI R&D\n",
      "16h\n",
      "Report this post\n",
      "Finally, our new paper is out! \"𝗙𝗶𝗻𝗲𝗩𝗶𝘀𝗶𝗼𝗻: 𝗢𝗽𝗲𝗻 𝗗𝗮𝘁𝗮 𝗜𝘀 𝗔𝗹𝗹 𝗬𝗼𝘂 𝗡𝗲𝗲𝗱\"! 🥳\n",
      "\n",
      "If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.\n",
      "We wanted to change that.\n",
      "\n",
      "FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.\n",
      "\n",
      "In the paper, we share how we built it:\n",
      " 🔍 finding and cleaning data at scale\n",
      " 🧹 removing excessive duplicates across sources\n",
      " 🤗 decontaminating against 66 public benchmarks\n",
      "\n",
      "My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets. \n",
      "NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and \n"
     ]
    }
   ],
   "source": [
    "print(fetch_page_and_all_relevant_links(\"https://huggingface.co\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "9b863a55-f86c-4e3f-8a79-94e24c1a8cf2",
   "metadata": {},
   "outputs": [],
   "source": [
    "brochure_system_prompt = \"\"\"\n",
    "You are an assistant that analyzes the contents of several relevant pages from a company website\n",
    "and creates a short brochure about the company for prospective customers, investors and recruits.\n",
    "Respond in markdown without code blocks.\n",
    "Include details of company culture, customers and careers/jobs if you have the information.\n",
    "\"\"\"\n",
    "\n",
    "# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':\n",
    "\n",
    "# brochure_system_prompt = \"\"\"\n",
    "# You are an assistant that analyzes the contents of several relevant pages from a company website\n",
    "# and creates a short, humorous, entertaining, witty brochure about the company for prospective customers, investors and recruits.\n",
    "# Respond in markdown without code blocks.\n",
    "# Include details of company culture, customers and careers/jobs if you have the information.\n",
    "# \"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "6ab83d92-d36b-4ce0-8bcc-5bb4c2f8ff23",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_brochure_user_prompt(company_name, url):\n",
    "    user_prompt = f\"\"\"\n",
    "You are looking at a company called: {company_name}\n",
    "Here are the contents of its landing page and other relevant pages;\n",
    "use this information to build a short brochure of the company in markdown without code blocks.\\n\\n\n",
    "\"\"\"\n",
    "    user_prompt += fetch_page_and_all_relevant_links(url)\n",
    "    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters\n",
    "    return user_prompt"
   ]
  },
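  {
   "cell_type": "markdown",
   "id": "sketch-truncate-note",
   "metadata": {},
   "source": [
    "The slice `user_prompt[:5_000]` above cuts at exactly 5,000 characters, so the prompt can end mid-word or mid-line. The cell below is a minimal sketch of one alternative, not part of the original solution: `truncate_at_line` is a hypothetical helper that backs up to the last complete line within the limit. It assumes the scraped text is newline-separated, as it appears in the outputs in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-truncate-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (an illustration, not used elsewhere in this notebook)\n",
    "def truncate_at_line(text, limit=5_000):\n",
    "    # Short enough already: return unchanged\n",
    "    if len(text) <= limit:\n",
    "        return text\n",
    "    # Index of the last newline that falls inside the limit\n",
    "    cut = text.rfind(\"\\n\", 0, limit)\n",
    "    # Fall back to a hard cut when there is no newline to break on\n",
    "    return text[:cut] if cut > 0 else text[:limit]\n",
    "\n",
    "sample = \"first line\\nsecond line\\nthird line\"\n",
    "print(truncate_at_line(sample, limit=15))  # prints: first line"
   ]
  },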
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "cd909e0b-1312-4ce2-a553-821e795d7572",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 10 relevant links\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\nYou are looking at a company called: HuggingFace\\nHere are the contents of its landing page and other relevant pages;\\nuse this information to build a short brochure of the company in markdown without code blocks.\\n\\n\\n## Landing Page:\\n\\nHugging Face – The AI community building the future.\\n\\nHugging Face\\nModels\\nDatasets\\nSpaces\\nCommunity\\nDocs\\nEnterprise\\nPricing\\nLog In\\nSign Up\\nNEW\\nTry HuggingChat Omni – Chat with AI 💬\\nGet started with Inference in seconds 🚀\\nReachy Mini: The Open Robot for AI Builders\\nThe AI community building the future.\\nThe platform where the machine learning community collaborates on models, datasets, and applications.\\nExplore AI Apps\\nor\\nBrowse 1M+ models\\nTrending on\\nthis week\\nModels\\ndeepseek-ai/DeepSeek-OCR\\nUpdated\\nabout 4 hours ago\\n•\\n32.9k\\n•\\n1.21k\\nPaddlePaddle/PaddleOCR-VL\\nUpdated\\nabout 22 hours ago\\n•\\n6.62k\\n•\\n919\\nQwen/Qwen3-VL-8B-Instruct\\nUpdated\\n7 days ago\\n•\\n117k\\n•\\n272\\nnanonets/Nanonets-OCR2-3B\\nUpdated\\n6 days ago\\n•\\n16.2k\\n•\\n369\\nPhr00t/Qwen-Image-Edit-Rapid-AIO\\nUpdated\\nabout 7 hours ago\\n•\\n388\\nBrowse 1M+ models\\nSpaces\\nRunning\\n464\\n464\\nveo3.1-fast\\n🐨\\nGenerate videos from text or images\\nRunning\\n15.3k\\n15.3k\\nDeepSite v3\\n🐳\\nGenerate any application by Vibe Coding\\nRunning\\n406\\n406\\nSora 2\\n📉\\nGenerate videos from text or images\\nRunning\\n1.98k\\n1.98k\\nWan2.2 Animate\\n👁\\nWan2.2 Animate\\nRunning\\non\\nZero\\nMCP\\n1.87k\\n1.87k\\nWan2.2 14B Fast\\n🎥\\ngenerate a video from an image with a text prompt\\nBrowse 400k+ applications\\nDatasets\\nkarpathy/fineweb-edu-100b-shuffle\\nUpdated\\n27 days ago\\n•\\n25.3k\\n•\\n74\\nfka/awesome-chatgpt-prompts\\nUpdated\\nJan 6\\n•\\n37.2k\\n•\\n9.29k\\nnick007x/github-code-2025\\nUpdated\\n7 days ago\\n•\\n7.23k\\n•\\n46\\nHuggingFaceFW/finewiki\\nUpdated\\n1 day ago\\n•\\n32\\n•\\n32\\nOpen-Bee/Honey-Data-15M\\nUpdated\\n6 days 
ago\\n•\\n25\\n•\\n31\\nBrowse 250k+ datasets\\nThe Home of Machine Learning\\nCreate, discover and collaborate on ML better.\\nThe collaboration platform\\nHost and collaborate on unlimited public models, datasets and applications.\\nMove faster\\nWith the HF Open source stack.\\nExplore all modalities\\nText, image, video, audio or even 3D.\\nBuild your portfolio\\nShare your work with the world and build your ML profile.\\nSign Up\\nAccelerate your ML\\nWe provide paid Compute and Enterprise solutions.\\nTeam & Enterprise\\nGive your team the most advanced p\\n## Relevant Links:\\n\\n\\n### Link: company homepage\\nHugging Face – The AI community building the future.\\n\\nHugging Face\\nModels\\nDatasets\\nSpaces\\nCommunity\\nDocs\\nEnterprise\\nPricing\\nLog In\\nSign Up\\nNEW\\nTry HuggingChat Omni – Chat with AI 💬\\nGet started with Inference in seconds 🚀\\nReachy Mini: The Open Robot for AI Builders\\nThe AI community building the future.\\nThe platform where the machine learning community collaborates on models, datasets, and applications.\\nExplore AI Apps\\nor\\nBrowse 1M+ models\\nTrending on\\nthis week\\nModels\\ndeepseek-ai/DeepSeek-OCR\\nUpdated\\nabout 4 hours ago\\n•\\n32.9k\\n•\\n1.21k\\nPaddlePaddle/PaddleOCR-VL\\nUpdated\\nabout 22 hours ago\\n•\\n6.62k\\n•\\n919\\nQwen/Qwen3-VL-8B-Instruct\\nUpdated\\n7 days ago\\n•\\n117k\\n•\\n272\\nnanonets/Nanonets-OCR2-3B\\nUpdated\\n6 days ago\\n•\\n16.2k\\n•\\n369\\nPhr00t/Qwen-Image-Edit-Rapid-AIO\\nUpdated\\nabout 7 hours ago\\n•\\n388\\nBrowse 1M+ models\\nSpaces\\nRunning\\n464\\n464\\nveo3.1-fast\\n🐨\\nGenerate videos from text or images\\nRunning\\n15.3k\\n15.3k\\nDeepSite v3\\n🐳\\nGenerate any application by Vibe Coding\\nRunning\\n406\\n406\\nSora 2\\n📉\\nGenerate videos from text or images\\nRunning\\n1.98k\\n1.98k\\nWan2.2 Animate\\n👁\\nWan2.2 Animate\\nRunning\\non\\nZero\\nMCP\\n1.87k\\n1.87k\\nWan2.2 14B Fast\\n🎥\\ngenerate a video from an image with a text prompt\\nBrowse 400k+ 
applications\\nDatasets\\nkarpathy/fineweb-edu-100b-shuffle\\nUpdated\\n27 days ago\\n•\\n25.3k\\n•\\n74\\nfka/awesome-chatgpt-prompts\\nUpdated\\nJan 6\\n•\\n37.2k\\n•\\n9.29k\\nnick007x/github-code-2025\\nUpdated\\n7 days ago\\n•\\n7.23k\\n•\\n46\\nHuggingFaceFW/finewiki\\nUpdated\\n1 day ago\\n•\\n32\\n•\\n32\\nOpen-Bee/Honey-Data-15M\\nUpdated\\n6 days ago\\n•\\n25\\n•\\n31\\nBrowse 250k+ datasets\\nThe Home of Machine Learning\\nCreate, discover and collaborate on ML better.\\nThe collaboration platform\\nHost and collaborate on unlimited public models, datasets and applications.\\nMove faster\\nWith the HF Open source stack.\\nExplore all modalities\\nText, image, video, audio or even 3D.\\nBuild your portfolio\\nShare your work with the world and build your ML profile.\\nSign Up\\nAccelerate your ML\\nWe provide paid Compute and Enterprise solutions.\\nTeam & Enterprise\\nGive your team the most advanced p\\n\\n### Link: brand page\\nBrand assets - Hugging Face\\n\\nHugging Face\\nModels\\nDatasets\\nSpaces\\nCommunity\\nDocs\\nEnterprise\\nPricing\\nLog In\\nSign Up\\nHugging Face · Brand assets\\nHF Logos\\n.svg\\n.png\\n.ai\\n.svg\\n.png\\n.ai\\n.svg\\n.png\\n.ai\\nHF Colors\\n#FFD21E\\n#FF9D00\\n#6B7280\\nHF Bio\\nHugging Face is the collaboration platform for the machine learning community.\\n\\nThe Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together.\\n\\nWith the fast-growing community, some of the most used open-source ML libr'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_brochure_user_prompt(\"HuggingFace\", \"https://huggingface.co\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8b45846d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_brochure(company_name, url):\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4.1-mini\",\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": brochure_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n",
    "        ],\n",
    "    )\n",
    "    result = response.choices[0].message.content\n",
    "    display(Markdown(result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b123615a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 13 relevant links\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "# Hugging Face Brochure\n",
       "\n",
       "## About Hugging Face\n",
       "\n",
       "Hugging Face is a vibrant AI community dedicated to building the future of machine learning. It serves as a collaborative platform where researchers, developers, and organizations come together to create, share, and improve machine learning models, datasets, and applications. The company’s mission is to accelerate innovation in AI by providing open-source tools and an ecosystem that supports all modalities including text, image, video, audio, and even 3D.\n",
       "\n",
       "---\n",
       "\n",
       "## Platform Offerings\n",
       "\n",
       "- **Models:** Access and contribute to over 1 million state-of-the-art machine learning models, updated regularly to reflect the latest advances.\n",
       "- **Datasets:** Explore and share from more than 250,000 datasets, catering to a diversity of applications.\n",
       "- **Spaces:** Create and deploy interactive AI applications within the community, with over 400,000 applications available.\n",
       "- **Community:** Collaborate with a global network of AI enthusiasts and experts.\n",
       "- **HuggingChat Omni:** Engage with conversational AI through their cutting-edge chatbot.\n",
       "- **Reachy Mini:** An open robot designed for AI builders to integrate and experiment with AI capabilities.\n",
       "\n",
       "The platform supports seamless ML inference starting in seconds, making advanced AI accessible and easy to use.\n",
       "\n",
       "---\n",
       "\n",
       "## Enterprise Solutions\n",
       "\n",
       "Hugging Face offers dedicated solutions tailored for teams and enterprises to scale their AI initiatives with enterprise-grade security and support:\n",
       "\n",
       "- **Team subscriptions** starting at $20/user/month.\n",
       "- **Enterprise contracts** with flexible options.\n",
       "- Features include Single Sign-On (SSO), region selection and auditing of data repositories, comprehensive audit logs, and dedicated customer support.\n",
       "- The platform ensures control, security, and compliance required by organizations deploying AI at scale.\n",
       "\n",
       "---\n",
       "\n",
       "## Company Culture\n",
       "\n",
       "Hugging Face fosters an open, collaborative, and inclusive culture, emphasizing:\n",
       "\n",
       "- Open-source contributions and knowledge sharing.\n",
       "- Empowering creators to build and showcase their machine learning portfolios.\n",
       "- Supporting innovation across academic, startup, and large enterprise sectors.\n",
       "- Building a friendly, engaged community focused on ethical AI development and collective progress.\n",
       "\n",
       "---\n",
       "\n",
       "## For Customers & Developers\n",
       "\n",
       "- Extensive resources for building and deploying machine learning models.\n",
       "- Access to a vast repository of AI assets for immediate use or customization.\n",
       "- Community support and documentation to accelerate AI projects.\n",
       "- Competitive pricing plans suitable for individuals, startups, and large organizations.\n",
       "\n",
       "---\n",
       "\n",
       "## Careers at Hugging Face\n",
       "\n",
       "Hugging Face is a fast-growing technology company welcoming passionate AI professionals, engineers, researchers, and community builders. Benefits of working here include:\n",
       "\n",
       "- Being part of a global, mission-driven team shaping the future of AI.\n",
       "- Opportunities for personal and professional growth through collaboration on cutting-edge AI projects.\n",
       "- Culture that values openness, innovation, and impact.\n",
       "\n",
       "(Current job openings and detailed career information are available on the company’s website.)\n",
       "\n",
       "---\n",
       "\n",
       "## Join the Future of AI\n",
       "\n",
       "Whether you are an AI researcher, developer, enterprise user, or enthusiast, Hugging Face is your hub for cutting-edge machine learning innovation and collaboration.\n",
       "\n",
       "**Explore:** [huggingface.co](https://huggingface.co)  \n",
       "**Sign Up & Start Building:** Create your ML portfolio, deploy applications, and join a thriving AI community today!\n",
       "\n",
       "---\n",
       "\n",
       "*Hugging Face – The AI community building the future.*"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "create_brochure(\"HuggingFace\", \"https://huggingface.co\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61eaaab7-0b47-4b29-82d4-75d474ad8d18",
   "metadata": {},
   "source": [
    "## Finally - a minor improvement\n",
    "\n",
    "With a small adjustment, we can change this so that the results stream back from OpenAI,\n",
    "with the familiar typewriter animation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "51db0e49-f261-4137-aabe-92dd601f7725",
   "metadata": {},
   "outputs": [],
   "source": [
    "def stream_brochure(company_name, url):\n",
    "    stream = openai.chat.completions.create(\n",
    "        model=\"gpt-4.1-mini\",\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": brochure_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n",
    "          ],\n",
    "        stream=True\n",
    "    )    \n",
    "    response = \"\"\n",
    "    display_handle = display(Markdown(\"\"), display_id=True)\n",
    "    for chunk in stream:\n",
    "        response += chunk.choices[0].delta.content or ''\n",
    "        update_display(Markdown(response), display_id=display_handle.display_id)"
   ]
  },
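  {
   "cell_type": "markdown",
   "id": "sketch-stream-note",
   "metadata": {},
   "source": [
    "To see what the loop in `stream_brochure` does without calling the API, here is a minimal sketch that replays a few simulated chunks. `make_chunk` and `accumulate` are hypothetical helpers, and the fake chunks only mimic the `chunk.choices[0].delta.content` shape used above; they are not real OpenAI objects. It also shows why the `or ''` matters: a chunk's content can be `None`, which cannot be added to a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-stream-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from types import SimpleNamespace\n",
    "\n",
    "def make_chunk(content):\n",
    "    # Hypothetical stand-in with the same attribute path as a streamed chunk\n",
    "    delta = SimpleNamespace(content=content)\n",
    "    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])\n",
    "\n",
    "def accumulate(stream):\n",
    "    # Yield the running text after each chunk, treating None as an empty string\n",
    "    response = \"\"\n",
    "    for chunk in stream:\n",
    "        response += chunk.choices[0].delta.content or \"\"\n",
    "        yield response\n",
    "\n",
    "fake_stream = [make_chunk(\"# Hug\"), make_chunk(\"ging \"), make_chunk(None), make_chunk(\"Face\")]\n",
    "for partial in accumulate(fake_stream):\n",
    "    print(repr(partial))"
   ]
  },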
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "56bf0ae3-ee9d-4a72-9cd6-edcac67ceb6d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 12 relevant links\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "# Hugging Face Brochure\n",
       "\n",
       "---\n",
       "\n",
       "## About Hugging Face\n",
       "\n",
       "**Hugging Face** is the AI community building the future, providing a collaborative platform where machine learning (ML) engineers, researchers, and enthusiasts come together to create, share, and innovate. It serves as the **home of machine learning**, empowering users worldwide to build an open and ethical AI future through a rich ecosystem of models, datasets, and applications.\n",
       "\n",
       "---\n",
       "\n",
       "## What We Offer\n",
       "\n",
       "- **Hugging Face Hub**: A central repository hosting over **1 million models**, **250,000+ datasets**, and **400,000+ applications** spanning various ML modalities including text, image, video, audio, and even 3D.\n",
       "- **Spaces**: User-friendly environments to build and deploy ML applications seamlessly.\n",
       "- **Inference API**: Quickly deploy and get started with inference on top models in seconds.\n",
       "- **Open Source Stack**: Leverage a fast and reliable open-source stack to accelerate your ML projects.\n",
       "- **Enterprise Solutions**: Paid compute and enterprise-grade services for teams and organizations needing advanced capabilities and support.\n",
       "- **Innovative Projects**: Includes unique initiatives like *HuggingChat Omni* — an AI chat experience, and *Reachy Mini* — an open robot platform for AI builders.\n",
       "\n",
       "---\n",
       "\n",
       "## Community and Collaboration\n",
       "\n",
       "Hugging Face is centered on community-driven collaboration with:\n",
       "\n",
       "- A **fast-growing global community** passionate about advancing machine learning.\n",
       "- Open sharing of **models, datasets, and applications** to foster innovation.\n",
       "- A platform where users can **build their portfolios** by sharing their ML work publicly.\n",
       "- Tools to support **learning, experimenting, and ethical AI development**.\n",
       "- Active engagement through various trending projects and applications reflecting cutting-edge research.\n",
       "\n",
       "---\n",
       "\n",
       "## Technology and Innovation Highlights\n",
       "\n",
       "- Supports diverse modalities: **text, image, video, audio, 3D**.\n",
       "- Hosting trendsetting models like:\n",
       "  - Qwen3-VL-8B-Instruct\n",
       "  - DeepSeek-OCR\n",
       "  - Nanonets OCR models\n",
       "- Hundreds of running spaces generating videos, animations, and applications powered by ML.\n",
       "- Access to thousands of up-to-date datasets including educational, code-related, and domain-specific datasets.\n",
       "\n",
       "---\n",
       "\n",
       "## Company Culture\n",
       "\n",
       "- **Open and Ethical AI**: Commitment to building AI technologies transparently and responsibly.\n",
       "- **Collaborative Spirit**: Encourages sharing knowledge and cooperation among ML practitioners.\n",
       "- **Community Empowerment**: Focus on enabling the next generation of ML engineers and scientists.\n",
       "- **Innovation at Core**: Constantly pushing the boundaries of AI capabilities with a rapid development cycle and user-driven improvements.\n",
       "\n",
       "---\n",
       "\n",
       "## Careers at Hugging Face\n",
       "\n",
       "- Be part of an innovative, mission-driven company revolutionizing AI and machine learning.\n",
       "- Opportunities for engineers, researchers, and AI enthusiasts passionate about open source and community growth.\n",
       "- Work in a culture that values collaboration, ethics, continuous learning, and impact.\n",
       "- Help shape the future of AI tools used by millions worldwide.\n",
       "\n",
       "---\n",
       "\n",
       "## Join Us\n",
       "\n",
       "Discover, create, and accelerate your machine learning journey with **Hugging Face** — the community and platform building the future of AI.\n",
       "\n",
       "**Get started today:**\n",
       "\n",
       "- Explore models, datasets, and AI applications on the [Hugging Face Hub](https://huggingface.co)\n",
       "- Try out the latest AI chat and inference tools.\n",
       "- Join our community to collaborate and innovate.\n",
       "\n",
       "---\n",
       "\n",
       "**Hugging Face - The AI community building the future together.**  \n",
       "Colors: #FFD21E, #FF9D00, #6B7280\n",
       "\n",
       "---\n",
       "\n",
       "For more information, visit: [huggingface.co](https://huggingface.co)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "stream_brochure(\"HuggingFace\", \"https://huggingface.co\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "fdb3f8d8-a3eb-41c8-b1aa-9f60686a653b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
      "Found 14 relevant links\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "# Welcome to Hugging Face – The AI Community Building the Future!\n",
       "\n",
       "---\n",
       "\n",
       "## Who Are We?  \n",
       "Imagine a place where machine learning wizards, data sorcerers, and AI alchemists gather to share their spells — uh, models — datasets, and apps. That’s Hugging Face! We’re *the* platform where the AI community collaborates, creates, and sometimes even has a little fun while building the future.\n",
       "\n",
       "Our motto? **\"Keep it open. Keep it ethical. Keep it hugging.\"** 💛\n",
       "\n",
       "---\n",
       "\n",
       "## What’s Cooking in the AI Kitchen?\n",
       "\n",
       "- **1 Million+ Models** — From image generators to language wizards, our treasure trove of open-source ML models grows faster than you can say \"neural network.\"  \n",
       "- **250,000+ Datasets** — Feeding AI brains with everything from chat prompts to persona profiles. Hungry for data? Dig in!  \n",
       "- **400,000+ Applications & Spaces** — Launch apps, share your ML portfolio, or just show off cool demos that make your friends say, “Whoa, AI can do that?”  \n",
       "- **Multimodal Madness** — Text, image, video, audio, even 3D...if AI had a Swiss Army knife, we’d be it.  \n",
       "\n",
       "---\n",
       "\n",
       "## Customers & Community  \n",
       "Whether you’re a student trying to get your AI feet wet, a startup looking to scale your genius, or an enterprise aiming to deploy heavy-duty models in the real world, Hugging Face has your back.\n",
       "\n",
       "With the fastest growing community of *machine learning enthusiasts* and the support of some seriously big names and organizations, here’s a place where:\n",
       "\n",
       "- **Freelancers** can build a portfolio and get noticed.  \n",
       "- **Researchers** can push boundaries openly and ethically.  \n",
       "- **Businesses** can accelerate AI adoption with our paid Compute and Enterprise suites.  \n",
       "\n",
       "Join 1.29k+ Spaces and thousands more running models that power everything from video generation to AI-powered image editing.\n",
       "\n",
       "---\n",
       "\n",
       "## Culture & Career – Geek Out with Us!  \n",
       "We believe collaboration beats isolation every day. Our culture?\n",
       "\n",
       "- Open source at heart ❤️  \n",
       "- Ethical AI advocates  \n",
       "- Casual tea-drinkers and serious problem solvers  \n",
       "- Always learning, always sharing, always growing  \n",
       "\n",
       "Want to build machine learning tools that millions will use? Hugging Face is where your skills meet endless possibilities. From ML engineers to community managers, our doors are wide open (virtual hugs included).\n",
       "\n",
       "---\n",
       "\n",
       "## Speed Up Your AI Journey  \n",
       "No need to code in the dark alone or fight for GPU time — deploy models and apps with a few clicks on optimized inference endpoints, starting at just $0.60/hour for GPU!\n",
       "\n",
       "Whether you want to host that killer new model or just tweak an existing one, we give you the tools and community support to **move faster, build smarter, and hug tighter**.\n",
       "\n",
       "---\n",
       "\n",
       "## Quick Hugging Face Facts  \n",
       "- **Founded:** Around the corner from the future  \n",
       "- **Colors:** Bright yellow (#FFD21E), orange (#FF9D00), and sleek gray (#6B7280) — because AI should be as vibrant as its ideas!  \n",
       "- **Mascot:** Friendly face with a warm smile (because AIs could learn a thing or two about friendliness here)  \n",
       "\n",
       "---\n",
       "\n",
       "## Ready to Join the AI Hug Circle?  \n",
       "\n",
       "Sign up, share your work, explore millions of models and datasets, and get your AI career (or project!) hugging new heights.\n",
       "\n",
       "[Explore AI Apps](#) | [Browse 1M+ Models](#) | [Sign Up & Join The Fun](#)\n",
       "\n",
       "---\n",
       "\n",
       "*Hugging Face — where the future of AI isn’t just created; it’s hugged into existence.* 🤗✨"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:\n",
    "\n",
    "stream_brochure(\"HuggingFace\", \"https://huggingface.co\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a27bf9e0-665f-4645-b66b-9725e2a959b5",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#181;\">Business applications</h2>\n",
    "            <span style=\"color:#181;\">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.\n",
    "\n",
    "This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.\n",
    "\n",
    "Generating content in this way is one of the very most common Use Cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14b2454b-8ef8-4b5c-b928-053a15e0d553",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../assets/important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#900;\">Before you move to Week 2 (which is tons of fun)</h2>\n",
    "            <span style=\"color:#900;\">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17b64f0f-7d33-4493-985a-033d06e8db08",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../assets/resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#f71;\">A reminder on 3 useful resources</h2>\n",
    "            <span style=\"color:#f71;\">1. The resources for the course are available <a href=\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\">here.</a><br/>\n",
    "            2. I'm on LinkedIn <a href=\"https://www.linkedin.com/in/eddonner/\">here</a> and I love connecting with people taking the course!<br/>\n",
    "            3. I'm trying out X/Twitter and I'm at <a href=\"https://x.com/edwarddonner\">@edwarddonner<a> and hoping people will teach me how it's done..  \n",
    "            </span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f48e42e-fa7a-495f-a5d4-26bfc24d60b6",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left;\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../assets/thankyou.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#090;\">Finally! I have a special request for you</h2>\n",
    "            <span style=\"color:#090;\">\n",
    "                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.\n",
    "            </span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}