{
"cells": [
{
"cell_type": "markdown",
"id": "a98030af-fcd1-4d63-a36e-38ba053498fa",
"metadata": {},
"source": [
"# A full business solution\n",
"\n",
"## Now we will take our project from Day 1 to the next level\n",
"\n",
"### BUSINESS CHALLENGE:\n",
"\n",
"Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.\n",
"\n",
"We will be provided with a company name and its primary website.\n",
"\n",
"See the end of this notebook for examples of real-world business applications.\n",
"\n",
"And remember: I'm always available if you have problems or ideas! Please do reach out."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d5b08506-dc8b-4443-9201-5f1848161363",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from IPython.display import Markdown, display, update_display\n",
"from scraper import fetch_website_links, fetch_website_contents\n",
"from openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fc5d8880-f2ee-4c06-af16-ecbc0262af61",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"API key looks good so far\n"
]
}
],
"source": [
"# Initialization and constants\n",
"\n",
"load_dotenv(override=True)\n",
"api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:\n",
"    print(\"API key looks good so far\")\n",
"else:\n",
"    print(\"There might be a problem with your API key. Please visit the troubleshooting notebook!\")\n",
"\n",
"MODEL = 'gpt-5-nano'\n",
"openai = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e30d8128-933b-44cc-81c8-ab4c9d86589a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['https://edwarddonner.com/',\n",
" 'https://edwarddonner.com/connect-four/',\n",
" 'https://edwarddonner.com/outsmart/',\n",
" 'https://edwarddonner.com/about-me-and-about-nebula/',\n",
" 'https://edwarddonner.com/posts/',\n",
" 'https://edwarddonner.com/',\n",
" 'https://news.ycombinator.com',\n",
" 'https://nebula.io/?utm_source=ed&utm_medium=referral',\n",
" 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',\n",
" 'https://patents.google.com/patent/US20210049536A1/',\n",
" 'https://www.linkedin.com/in/eddonner/',\n",
" 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',\n",
" 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',\n",
" 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',\n",
" 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',\n",
" 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',\n",
" 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',\n",
" 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',\n",
" 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',\n",
" 'https://edwarddonner.com/',\n",
" 'https://edwarddonner.com/connect-four/',\n",
" 'https://edwarddonner.com/outsmart/',\n",
" 'https://edwarddonner.com/about-me-and-about-nebula/',\n",
" 'https://edwarddonner.com/posts/',\n",
" 'mailto:hello@mygroovydomain.com',\n",
" 'https://www.linkedin.com/in/eddonner/',\n",
" 'https://twitter.com/edwarddonner',\n",
" 'https://www.facebook.com/edward.donner.52']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"links = fetch_website_links(\"https://edwarddonner.com\")\n",
"links"
]
},
{
"cell_type": "markdown",
"id": "1771af9c-717a-4fca-bbbe-8a95893312c3",
"metadata": {},
"source": [
"## First step: Have GPT-5-nano figure out which links are relevant\n",
"\n",
"### Use a call to gpt-5-nano to read the links on a webpage and respond in structured JSON\n",
"\n",
"It should decide which links are relevant, and replace relative links such as \"/about\" with \"https://company.com/about\".\n",
"\n",
"We will use \"one-shot prompting\", in which we provide an example in the prompt of how it should respond.\n",
"\n",
"This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!\n",
"\n",
"Sidenote: there is a more advanced technique called \"Structured Outputs\" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6957b079-0d96-45f7-a26a-3487510e9b35",
"metadata": {},
"outputs": [],
"source": [
"link_system_prompt = \"\"\"\n",
"You are provided with a list of links found on a webpage.\n",
"You are able to decide which of the links would be most relevant to include in a brochure about the company,\n",
"such as links to an About page, or a Company page, or Careers/Jobs pages.\n",
"You should respond in JSON as in this example:\n",
"\n",
"{\n",
"    \"links\": [\n",
"        {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n",
"        {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n",
"    ]\n",
"}\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8e1f601b-2eaf-499d-b6b8-c99050c9d6b3",
"metadata": {},
"outputs": [],
"source": [
"def get_links_user_prompt(url):\n",
"    user_prompt = f\"\"\"\n",
"Here is the list of links on the website {url} -\n",
"Please decide which of these are relevant web links for a brochure about the company, \n",
"respond with the full https URL in JSON format.\n",
"Do not include Terms of Service, Privacy, email links.\n",
"\n",
"Links (some might be relative links):\n",
"\n",
"\"\"\"\n",
"    links = fetch_website_links(url)\n",
"    user_prompt += \"\\n\".join(links)\n",
"    return user_prompt"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6bcbfa78-6395-4685-b92c-22d592050fd7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Here is the list of links on the website https://edwarddonner.com -\n",
"Please decide which of these are relevant web links for a brochure about the company, \n",
"respond with the full https URL in JSON format.\n",
"Do not include Terms of Service, Privacy, email links.\n",
"\n",
"Links (some might be relative links):\n",
"\n",
"https://edwarddonner.com/\n",
"https://edwarddonner.com/connect-four/\n",
"https://edwarddonner.com/outsmart/\n",
"https://edwarddonner.com/about-me-and-about-nebula/\n",
"https://edwarddonner.com/posts/\n",
"https://edwarddonner.com/\n",
"https://news.ycombinator.com\n",
"https://nebula.io/?utm_source=ed&utm_medium=referral\n",
"https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html\n",
"https://patents.google.com/patent/US20210049536A1/\n",
"https://www.linkedin.com/in/eddonner/\n",
"https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/\n",
"https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/\n",
"https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/\n",
"https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/\n",
"https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/\n",
"https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/\n",
"https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/\n",
"https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/\n",
"https://edwarddonner.com/\n",
"https://edwarddonner.com/connect-four/\n",
"https://edwarddonner.com/outsmart/\n",
"https://edwarddonner.com/about-me-and-about-nebula/\n",
"https://edwarddonner.com/posts/\n",
"mailto:hello@mygroovydomain.com\n",
"https://www.linkedin.com/in/eddonner/\n",
"https://twitter.com/edwarddonner\n",
"https://www.facebook.com/edward.donner.52\n"
]
}
],
"source": [
"print(get_links_user_prompt(\"https://edwarddonner.com\"))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "effeb95f",
"metadata": {},
"outputs": [],
"source": [
"def select_relevant_links(url):\n",
"    response = openai.chat.completions.create(\n",
"        model=MODEL,\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": link_system_prompt},\n",
"            {\"role\": \"user\", \"content\": get_links_user_prompt(url)}\n",
"        ],\n",
"        response_format={\"type\": \"json_object\"}\n",
"    )\n",
"    result = response.choices[0].message.content\n",
"    links = json.loads(result)\n",
"    return links"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2d5b1ded",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'links': [{'type': 'homepage', 'url': 'https://edwarddonner.com/'},\n",
" {'type': 'about page',\n",
" 'url': 'https://edwarddonner.com/about-me-and-about-nebula/'},\n",
" {'type': 'blog page', 'url': 'https://edwarddonner.com/posts/'},\n",
" {'type': 'blog post',\n",
" 'url': 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/'},\n",
" {'type': 'blog post',\n",
" 'url': 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/'},\n",
" {'type': 'blog post',\n",
" 'url': 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/'},\n",
" {'type': 'blog post',\n",
" 'url': 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/'},\n",
" {'type': 'LinkedIn profile', 'url': 'https://www.linkedin.com/in/eddonner/'},\n",
" {'type': 'Twitter profile', 'url': 'https://twitter.com/edwarddonner'},\n",
" {'type': 'Facebook profile',\n",
" 'url': 'https://www.facebook.com/edward.donner.52'},\n",
" {'type': 'external company',\n",
" 'url': 'https://nebula.io/?utm_source=ed&utm_medium=referral'}]}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"select_relevant_links(\"https://edwarddonner.com\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a29aca19-ca13-471c-a4b4-5abbfa813f69",
"metadata": {},
"outputs": [],
"source": [
"def select_relevant_links(url):\n",
"    print(f\"Selecting relevant links for {url} by calling {MODEL}\")\n",
"    response = openai.chat.completions.create(\n",
"        model=MODEL,\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": link_system_prompt},\n",
"            {\"role\": \"user\", \"content\": get_links_user_prompt(url)}\n",
"        ],\n",
"        response_format={\"type\": \"json_object\"}\n",
"    )\n",
"    result = response.choices[0].message.content\n",
"    links = json.loads(result)\n",
"    print(f\"Found {len(links['links'])} relevant links\")\n",
"    return links"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74a827a0-2782-4ae5-b210-4a242a8b4cc2",
"metadata": {},
"outputs": [],
"source": [
"select_relevant_links(\"https://edwarddonner.com\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d3d583e2-dcc4-40cc-9b28-1e8dbf402924",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 3 relevant links\n"
]
},
{
"data": {
"text/plain": [
"{'links': [{'type': 'brand page', 'url': 'https://huggingface.co/brand'},\n",
" {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},\n",
" {'type': 'company page',\n",
" 'url': 'https://www.linkedin.com/company/huggingface/'}]}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"select_relevant_links(\"https://huggingface.co\")"
]
},
{
"cell_type": "markdown",
"id": "0d74128e-dfb6-47ec-9549-288b621c838c",
"metadata": {},
"source": [
"## Second step: make the brochure!\n",
"\n",
"Assemble all the details into another prompt to GPT-5-nano"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "85a5b6e2-e7ef-44a9-bc7f-59ede71037b5",
"metadata": {},
"outputs": [],
"source": [
"def fetch_page_and_all_relevant_links(url):\n",
"    contents = fetch_website_contents(url)\n",
"    relevant_links = select_relevant_links(url)\n",
"    result = f\"## Landing Page:\\n\\n{contents}\\n## Relevant Links:\\n\"\n",
"    for link in relevant_links['links']:\n",
"        result += f\"\\n\\n### Link: {link['type']}\\n\"\n",
"        result += fetch_website_contents(link[\"url\"])\n",
"    return result"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5099bd14-076d-4745-baf3-dac08d8e5ab2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 11 relevant links\n",
"## Landing Page:\n",
"\n",
"Hugging Face The AI community building the future.\n",
"\n",
"Hugging Face\n",
"Models\n",
"Datasets\n",
"Spaces\n",
"Community\n",
"Docs\n",
"Enterprise\n",
"Pricing\n",
"Log In\n",
"Sign Up\n",
"The AI community building the future.\n",
"The platform where the machine learning community collaborates on models, datasets, and applications.\n",
"Explore AI Apps\n",
"or\n",
"Browse 1M+ models\n",
"Trending on\n",
"this week\n",
"Models\n",
"tencent/HunyuanImage-3.0\n",
"Updated\n",
"about 3 hours ago\n",
"•\n",
"825\n",
"•\n",
"714\n",
"deepseek-ai/DeepSeek-V3.2-Exp\n",
"Updated\n",
"3 days ago\n",
"•\n",
"10.7k\n",
"•\n",
"470\n",
"tencent/Hunyuan3D-Part\n",
"Updated\n",
"4 days ago\n",
"•\n",
"2.57k\n",
"•\n",
"461\n",
"zai-org/GLM-4.6\n",
"Updated\n",
"2 days ago\n",
"•\n",
"9.72k\n",
"•\n",
"298\n",
"inclusionAI/Ring-1T-preview\n",
"Updated\n",
"1 day ago\n",
"•\n",
"818\n",
"•\n",
"208\n",
"Browse 1M+ models\n",
"Spaces\n",
"Running\n",
"1.29k\n",
"1.29k\n",
"Wan2.2 Animate\n",
"👁\n",
"Wan2.2 Animate\n",
"Running\n",
"14.5k\n",
"14.5k\n",
"DeepSite v3\n",
"🐳\n",
"Generate any application with DeepSeek\n",
"Running\n",
"on\n",
"Zero\n",
"MCP\n",
"1.46k\n",
"1.46k\n",
"Wan2.2 14B Fast\n",
"🎥\n",
"generate a video from an image with a text prompt\n",
"Running\n",
"on\n",
"Zero\n",
"196\n",
"196\n",
"Qwen Image Edit 2509\n",
"👀\n",
"Generate edited images based on user prompts\n",
"Running\n",
"on\n",
"CPU Upgrade\n",
"103\n",
"103\n",
"Ostris' AI Toolkit\n",
"💻\n",
"Train FLUX, Qwen and Wan LoRAs with Ostris Ai Toolkit\n",
"Browse 400k+ applications\n",
"Datasets\n",
"openai/gdpval\n",
"Updated\n",
"7 days ago\n",
"•\n",
"17k\n",
"•\n",
"157\n",
"fka/awesome-chatgpt-prompts\n",
"Updated\n",
"Jan 6\n",
"•\n",
"49k\n",
"•\n",
"9.18k\n",
"nvidia/Nemotron-Personas-Japan\n",
"Updated\n",
"9 days ago\n",
"•\n",
"9.74k\n",
"•\n",
"75\n",
"t-tech/T-ECD\n",
"Updated\n",
"6 days ago\n",
"•\n",
"993\n",
"•\n",
"22\n",
"facebook/seamless-interaction\n",
"Updated\n",
"Jul 14\n",
"•\n",
"17.5k\n",
"•\n",
"148\n",
"Browse 250k+ datasets\n",
"The Home of Machine Learning\n",
"Create, discover and collaborate on ML better.\n",
"The collaboration platform\n",
"Host and collaborate on unlimited public models, datasets and applications.\n",
"Move faster\n",
"With the HF Open source stack.\n",
"Explore all modalities\n",
"Text, image, video, audio or even 3D.\n",
"Build your portfolio\n",
"Share your work with the world and build your ML profile.\n",
"Sign Up\n",
"Accelerate your ML\n",
"We provide paid Compute and Enterprise solutions.\n",
"Compute\n",
"Deploy on optimized\n",
"Inference Endpoints\n",
"or update your\n",
"Spaces applications\n",
"to a GPU in a few clicks.\n",
"View pricing\n",
"Starting at $0.60/hour for GPU\n",
"Tea\n",
"## Relevant Links:\n",
"\n",
"\n",
"### Link: about page\n",
"Brand assets - Hugging Face\n",
"\n",
"Hugging Face\n",
"Models\n",
"Datasets\n",
"Spaces\n",
"Community\n",
"Docs\n",
"Enterprise\n",
"Pricing\n",
"Log In\n",
"Sign Up\n",
"Hugging Face · Brand assets\n",
"HF Logos\n",
".svg\n",
".png\n",
".ai\n",
".svg\n",
".png\n",
".ai\n",
".svg\n",
".png\n",
".ai\n",
"HF Colors\n",
"#FFD21E\n",
"#FF9D00\n",
"#6B7280\n",
"HF Bio\n",
"Hugging Face is the collaboration platform for the machine learning community.\n",
"\n",
"The Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together.\n",
"\n",
"With the fast-growing community, some of the most used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution.\n",
"Copy to clipboard\n",
"HF Universe\n",
"Find other assets available for use from the Hugging Face brand universe\n",
"here\n",
".\n",
"System theme\n",
"Website\n",
"Models\n",
"Datasets\n",
"Spaces\n",
"Changelog\n",
"Inference Endpoints\n",
"HuggingChat\n",
"Company\n",
"About\n",
"Brand assets\n",
"Terms of service\n",
"Privacy\n",
"Jobs\n",
"Press\n",
"Resources\n",
"Learn\n",
"Documentation\n",
"Blog\n",
"Forum\n",
"Service Status\n",
"Social\n",
"GitHub\n",
"Twitter\n",
"LinkedIn\n",
"Discord\n",
"\n",
"### Link: enterprise page\n",
"Enterprise Hub - Hugging Face\n",
"\n",
"Hugging Face\n",
"Models\n",
"Datasets\n",
"Spaces\n",
"Community\n",
"Docs\n",
"Enterprise\n",
"Pricing\n",
"Log In\n",
"Sign Up\n",
"Team & Enterprise Hub\n",
"Scale your organization with the worlds leading AI platform\n",
"Subscribe to\n",
"Team\n",
"starting at $20/user/month\n",
"or\n",
"Contact sales for\n",
"Enterprise\n",
"to explore flexible contract options\n",
"Give your organization the most advanced platform to build AI with enterprise-grade security, access controls,\n",
"\t\t\tdedicated support and more.\n",
"Single Sign-On\n",
"Connect securely to your identity provider with SSO integration.\n",
"Regions\n",
"Select, manage, and audit the location of your repository data.\n",
"Audit Logs\n",
"Stay in control with comprehensive logs that report on actions taken.\n",
"Resource Groups\n",
"Accurately manage access to repositories with granular access control.\n",
"Token Management\n",
"Centralized token control and custom approval policies for organization access.\n",
"Analytics\n",
"Track and analyze repository usage data in a single dashboard.\n",
"Advanced Compute Options\n",
"Increase scalability and performance with more compute options like ZeroGPU.\n",
"ZeroGPU Quota Boost\n",
"All organization members get 5x more ZeroGPU quota to get the most of Spaces.\n",
"Private Datasets Viewer\n",
"Enable the Dataset Viewer on your private datasets for easier collaboration.\n",
"Private Storage\n",
"Get an additional 1 TB of private storage for each member of your organization (then $25/month per extra TB).\n",
"Inference Providers\n",
"Enable organization billing for Inference Providers, monitor usage with analytics, and manage spending limits.\n",
"Advanced security\n",
"Configure organization-wide security policies and default repository visibility.\n",
"Billing\n",
"Control your budget effectively with managed billing and yearly commit options.\n",
"Priority Support\n",
"Maximize your platform usage with priority support from the Hugging Face team.\n",
"Join the most forward-thinking AI organizations\n",
"Everything you already know and love about Hugging Face in Enterprise mode.\n",
"Subscribe to\n",
"Team\n",
"starting at $20/user/month\n",
"or\n",
"Contact sales for\n",
"Enterprise\n",
"to explore fl\n",
"\n",
"### Link: careers page\n",
"Hugging Face - Current Openings\n",
"\n",
"\n",
"\n",
"### Link: blog\n",
"Hugging Face Blog\n",
"\n",
"Hugging Face\n",
"Models\n",
"Datasets\n",
"Spaces\n",
"Community\n",
"Docs\n",
"Enterprise\n",
"Pricing\n",
"Log In\n",
"Sign Up\n",
"Blog, Articles, and discussions\n",
"New Article\n",
"Everything\n",
"community\n",
"guide\n",
"open source collab\n",
"partnerships\n",
"research\n",
"NLP\n",
"Audio\n",
"CV\n",
"RL\n",
"ethics\n",
"Diffusion\n",
"Game Development\n",
"RLHF\n",
"Leaderboard\n",
"Case Studies\n",
"LeRobot\n",
"Inference Providers\n",
"SOTA OCR on-device with Core ML and dots.ocr\n",
"By\n",
"FL33TW00D-HF\n",
"October 2, 2025\n",
"Community Articles\n",
"view all\n",
"There is no such thing as a tokenizer-free lunch\n",
"By\n",
"catherinearnett\n",
"•\n",
"8 days ago\n",
"•\n",
"68\n",
"Model Quality: Hugging Face Is All You Need\n",
"By\n",
"finegrain\n",
"•\n",
"6 days ago\n",
"•\n",
"19\n",
"Code a simple RAG from scratch\n",
"By\n",
"ngxson\n",
"•\n",
"Oct 29, 2024\n",
"•\n",
"208\n",
"Nemotron-Personas-Japan: Synthesized Data for Sovereign AI\n",
"By\n",
"nvidia\n",
"and 6 others\n",
"•\n",
"9 days ago\n",
"•\n",
"24\n",
"When Does Reasoning Matter? Unpacking the Contribution of Reasoning to LLM Performance\n",
"By\n",
"Nicolas-BZRD\n",
"and 1 other\n",
"•\n",
"2 days ago\n",
"•\n",
"10\n",
"CU-1 for Autonomous UI Agent Systems: An Open Alternative to Proprietary Solutions\n",
"By\n",
"paulml\n",
"•\n",
"1 day ago\n",
"•\n",
"10\n",
"How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons\n",
"By\n",
"sherryxychen\n",
"•\n",
"3 days ago\n",
"•\n",
"9\n",
"Preserving Agency: Why AI Safety Needs Community, Not Corporate Control\n",
"By\n",
"giadap\n",
"•\n",
"3 days ago\n",
"•\n",
"9\n",
"RexBERT: Encoders for a brave new world of E-Commerce\n",
"By\n",
"thebajajra\n",
"and 1 other\n",
"•\n",
"12 days ago\n",
"•\n",
"46\n",
"Nemotron-Personas-Japan: ソブリン AI のための合成データセット\n",
"By\n",
"nvidia\n",
"and 6 others\n",
"•\n",
"6 days ago\n",
"•\n",
"7\n",
"Uncensor any LLM with abliteration\n",
"By\n",
"mlabonne\n",
"•\n",
"Jun 13, 2024\n",
"•\n",
"684\n",
"How to Train an Antibody Developability Model\n",
"By\n",
"ginkgo-datapoints\n",
"and 1 other\n",
"•\n",
"15 days ago\n",
"•\n",
"14\n",
"Introduction to State Space Models (SSM)\n",
"By\n",
"lbourdois\n",
"•\n",
"Jul 19, 2024\n",
"•\n",
"176\n",
"arXiv实用技巧如何让你的paper关注度变高\n",
"By\n",
"JessyTsu1\n",
"•\n",
"Jul 8, 2024\n",
"•\n",
"14\n",
"Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face\n",
"By\n",
"dvgodoy\n",
"•\n",
"Feb 11\n",
"•\n",
"72\n",
"PP-OCRv5 on Hugging Face: A Specialized Approach to OCR\n",
"By\n",
"baidu\n",
"and 5 others\n",
"•\n",
"22 days ago\n",
"•\n",
"102\n",
"Ground-up efforts to build large datasets for effective and accurate translation of Modi-Sc\n",
"\n",
"### Link: community forum\n",
"Hugging Face Forums - Hugging Face Community Discussion\n",
"\n",
"Hugging Face Forums\n",
"Topic\n",
"Replies\n",
"Views\n",
"Activity\n",
"Tokenizer: How to suppress preceding whitespace when giving pre-tokenized list[list[str]] input\n",
"Beginners\n",
"2\n",
"4\n",
"October 2, 2025\n",
"Student Pro Plan for Education Users (Masters student)\n",
"Research\n",
"0\n",
"13\n",
"October 2, 2025\n",
"Error: Repository storage limit reached (Max: 1 GB)\n",
"Beginners\n",
"29\n",
"2720\n",
"October 2, 2025\n",
"Can't claim ownership of paper\n",
"Beginners\n",
"3\n",
"26\n",
"October 2, 2025\n",
"Set usage limit per access token\n",
"Site Feedback\n",
"0\n",
"5\n",
"October 2, 2025\n",
"I want to open source this framework. OAGI\n",
"Research\n",
"2\n",
"15\n",
"October 2, 2025\n",
"Pro Account ZeroGPU Limit Usage at Failed Sessions\n",
"Beginners\n",
"2\n",
"10\n",
"October 2, 2025\n",
"How to use for Real Estate?\n",
"Beginners\n",
"1\n",
"12\n",
"October 2, 2025\n",
"Help : integrating a Agilex PiPER robot arm to lerobot\n",
"Intermediate\n",
"8\n",
"16\n",
"October 2, 2025\n",
"Custom Domain stuck on pending\n",
"Beginners\n",
"4\n",
"66\n",
"October 1, 2025\n",
"How to edit website with AI prompts?\n",
"Beginners\n",
"1\n",
"7\n",
"October 2, 2025\n",
"Custom Domain for HF Spaces\n",
"Spaces\n",
"10\n",
"4329\n",
"October 1, 2025\n",
"What are the best practices for fine-tuning transformer models with limited data?\n",
"Beginners\n",
"1\n",
"9\n",
"October 1, 2025\n",
"Creating and posting non-traditional datasets formats like DICOM, NIfTI, and WFDB records\n",
"🤗Datasets\n",
"1\n",
"8\n",
"September 30, 2025\n",
"I used to have no problem with PEFT fine-tuning after hundreds of trainings, but now I have encountered The error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn\n",
"🤗AutoTrain\n",
"2\n",
"8\n",
"October 1, 2025\n",
"Help: Cant find Multi Image Input node in ComfyUI\n",
"Beginners\n",
"3\n",
"15\n",
"October 2, 2025\n",
"Which parameters to use for fine-tuning Llama-2-7b?\n",
"Beginners\n",
"1\n",
"12\n",
"September 30, 2025\n",
"03 Error When Using Qwen/Qwen2.5-VL-32B-Instruct with Inference Provider\n",
"Models\n",
"4\n",
"23\n",
"October 2, 2025\n",
"Is it possible to remove articles (the, a, an) from a text sample without consequences?\n",
"🤗Datasets\n",
"3\n",
"14\n",
"October 1, 2025\n",
"Have You Ever Tested an AI Model? What Was Your Experience?\n",
"Models\n",
"2\n",
"19\n",
"October 1, 2025\n",
"When fine-tuning a model, what stand\n",
"\n",
"### Link: GitHub\n",
"Hugging Face · GitHub\n",
"\n",
"Skip to content\n",
"Navigation Menu\n",
"Toggle navigation\n",
"Sign in\n",
"Appearance settings\n",
"huggingface\n",
"Platform\n",
"GitHub Copilot\n",
"Write better code with AI\n",
"GitHub Spark\n",
"New\n",
"Build and deploy intelligent apps\n",
"GitHub Models\n",
"New\n",
"Manage and compare prompts\n",
"GitHub Advanced Security\n",
"Find and fix vulnerabilities\n",
"Actions\n",
"Automate any workflow\n",
"Codespaces\n",
"Instant dev environments\n",
"Issues\n",
"Plan and track work\n",
"Code Review\n",
"Manage code changes\n",
"Discussions\n",
"Collaborate outside of code\n",
"Code Search\n",
"Find more, search less\n",
"Explore\n",
"Why GitHub\n",
"Documentation\n",
"GitHub Skills\n",
"Blog\n",
"Integrations\n",
"GitHub Marketplace\n",
"MCP Registry\n",
"View all features\n",
"Solutions\n",
"By company size\n",
"Enterprises\n",
"Small and medium teams\n",
"Startups\n",
"Nonprofits\n",
"By use case\n",
"App Modernization\n",
"DevSecOps\n",
"DevOps\n",
"CI/CD\n",
"View all use cases\n",
"By industry\n",
"Healthcare\n",
"Financial services\n",
"Manufacturing\n",
"Government\n",
"View all industries\n",
"View all solutions\n",
"Resources\n",
"Topics\n",
"AI\n",
"DevOps\n",
"Security\n",
"Software Development\n",
"View all\n",
"Explore\n",
"Learning Pathways\n",
"Events & Webinars\n",
"Ebooks & Whitepapers\n",
"Customer Stories\n",
"Partners\n",
"Executive Insights\n",
"Open Source\n",
"GitHub Sponsors\n",
"Fund open source developers\n",
"The ReadME Project\n",
"GitHub community articles\n",
"Repositories\n",
"Topics\n",
"Trending\n",
"Collections\n",
"Enterprise\n",
"Enterprise platform\n",
"AI-powered developer platform\n",
"Available add-ons\n",
"GitHub Advanced Security\n",
"Enterprise-grade security features\n",
"Copilot for business\n",
"Enterprise-grade AI features\n",
"Premium Support\n",
"Enterprise-grade 24/7 support\n",
"Pricing\n",
"Search or jump to...\n",
"Search code, repositories, users, issues, pull requests...\n",
"Search\n",
"Clear\n",
"Search syntax tips\n",
"Provide feedback\n",
"We read every piece of feedback, and take your input very seriously.\n",
"Include my email address so I can be contacted\n",
"Cancel\n",
"Submit feedback\n",
"Saved searches\n",
"Use saved searches to filter your results more quickly\n",
"Cancel\n",
"Create saved search\n",
"Sign in\n",
"Sign up\n",
"Appearance settings\n",
"Resetting focus\n",
"You signed in with another tab or window.\n",
"Reload\n",
"to refresh your session.\n",
"You signed out in another tab or window.\n",
"Reload\n",
"to refresh yo\n",
"\n",
"### Link: Twitter\n",
"No title found\n",
"\n",
"JavaScript is not available.\n",
"Weve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using x.com. You can see a list of supported browsers in our Help Center.\n",
"Help Center\n",
"Terms of Service\n",
"Privacy Policy\n",
"Cookie Policy\n",
"Imprint\n",
"Ads info\n",
"© 2025 X Corp.\n",
"Something went wrong, but dont fret — lets give it another shot.\n",
"Try again\n",
"Some privacy related extensions may cause issues on x.com. Please disable them and try again.\n",
"\n",
"### Link: LinkedIn\n",
"Hugging Face | LinkedIn\n",
"\n",
"Skip to main content\n",
"LinkedIn\n",
"Top Content\n",
"People\n",
"Learning\n",
"Jobs\n",
"Games\n",
"Get the app\n",
"Sign in\n",
"Join now\n",
"Hugging Face\n",
"Software Development\n",
"The AI community building the future.\n",
"See jobs\n",
"Follow\n",
"Discover all 623 employees\n",
"Report this company\n",
"About us\n",
"The AI community building the future.\n",
"Website\n",
"https://huggingface.co\n",
"External link for Hugging Face\n",
"Industry\n",
"Software Development\n",
"Company size\n",
"51-200 employees\n",
"Type\n",
"Privately Held\n",
"Founded\n",
"2016\n",
"Specialties\n",
"machine learning, natural language processing, and deep learning\n",
"Products\n",
"Hugging Face\n",
"Hugging Face\n",
"Natural Language Processing (NLP) Software\n",
"Were on a journey to solve and democratize artificial intelligence through natural language.\n",
"Locations\n",
"Primary\n",
"Get directions\n",
"Paris, FR\n",
"Get directions\n",
"Employees at Hugging Face\n",
"Ludovic Huraux\n",
"Rajat Arya\n",
"Tech Lead & Software Engineer @ HF | prev: co-founder XetHub, Apple, Turi, AWS, Microsoft\n",
"Jeff Boudier\n",
"Product + Growth at Hugging Face\n",
"Terrence Rohan\n",
"Seed Investor\n",
"See all employees\n",
"Updates\n",
"Hugging Face\n",
"reposted this\n",
"Mercuri\n",
"3,316 followers\n",
"1d\n",
"Report this post\n",
"🚨 404: media not found 🚨\n",
"\n",
"We're teaming up\n",
"The Tech Bros\n",
",\n",
"Canva\n",
",\n",
"ElevenLabs\n",
", and\n",
"Hugging Face\n",
"for an all female media x tech\n",
"#hackathon\n",
"at the Canva offices 👩🏾💻👩🏼💻👩🏻💻\n",
"\n",
"⚡ Build prototypes fast\n",
"⚡ Get feedback from engineers & investors\n",
"⚡ Meet your co-founder!?\n",
"\n",
"Hugging Face, Canva and ElevenLabs are unicorns shaping the future of media and creative tech. At\n",
"Mercuri\n",
", we bring the early-stage VC perspective, meeting thousands of startups each year at the intersection of media, entertainment and technology. This hackathon brings it all together, creating a space to test bold ideas and learn directly from operators and investors immersed in the field.\n",
"\n",
"The media x creative tech world needs fresh voices and bold ideas. When women lead, products get sharper, narratives more inclusive, and impact much greater - so come join us!\n",
"\n",
"📍 Central London\n",
"📅 Wednesday, 29 October\n",
"🔗 Link to sign up in comments\n",
"23\n",
"1 Comment\n",
"\n",
"### Link: Zhihu\n",
"No title found\n",
"\n",
"知乎,让每一次点击都充满意义 —— 欢迎来到知乎,发现问题背后的世界。\n",
"\n",
"### Link: API endpoints\n",
"Inference Endpoints by Hugging Face\n",
"\n",
"Inference\n",
"Endpoints\n",
"Catalog\n",
"Log In\n",
"Machine Learning At Your Service\n",
"by\n",
"Hugging Face\n",
"Easily deploy Transformers, Diffusers or any model on dedicated, fully managed\n",
"\t\t\t\tinfrastructure. Keep your costs low with our secure, compliant and flexible production\n",
"\t\t\t\tsolution.\n",
"Log In\n",
"Learn More\n",
"No Hugging Face account ?\n",
"Sign up\n",
"!\n",
"One-click inference deployment\n",
"Import your favorite model from the Hugging Face hub or browse our catalog of hand-picked, ready-to-deploy models !\n",
"meta-llama /\n",
"Llama-3.1-70B-Instruct\n",
"59\n",
"Deployed 59 times\n",
"Text Generation\n",
"TGI\n",
"Accelerated Text Generation Inference\n",
"GPU\n",
"4x Nvidia L40S\n",
"$\n",
"8.3\n",
"openai /\n",
"whisper-large-v3-turbo\n",
"162\n",
"Deployed 162 times\n",
"Automatic Speech Recognition\n",
"GPU\n",
"1x Nvidia L4\n",
"$\n",
"0.8\n",
"mixedbread-ai /\n",
"mxbai-embed-large-v1\n",
"22\n",
"Deployed 22 times\n",
"Sentence Embeddings\n",
"TEI\n",
"Accelerated Text Embeddings Inference\n",
"GPU\n",
"1x Nvidia L4\n",
"$\n",
"0.8\n",
"Qwen /\n",
"Qwen2.5-Coder-7B-Instruct\n",
"47\n",
"Deployed 47 times\n",
"Text Generation\n",
"TGI\n",
"Accelerated Text Generation Inference\n",
"GPU\n",
"1x Nvidia L40S\n",
"$\n",
"1.8\n",
"black-forest-labs /\n",
"FLUX.1-schnell\n",
"143\n",
"Deployed 143 times\n",
"Text-to-Image\n",
"GPU\n",
"1x Nvidia L40S\n",
"$\n",
"1.8\n",
"Browse Catalog\n",
"Hub Models\n",
"Customer Stories\n",
"Learn how leading AI teams use Inference Endpoints to deploy their models\n",
"Endpoints for\n",
"Music\n",
"Musixmatch is the worlds leading music data company\n",
"Use Case\n",
"Custom text embeddings generation pipeline\n",
"Models Deployed\n",
"Distilbert-base-uncased-finetuned-sst-2-english\n",
"facebook/wav2vec2-base-960h\n",
"Custom model based on sentence transformers\n",
"The coolest thing was how easy it was to define a complete custom interface from the model to the inference process. It just took us a couple of hours to adapt our code, and have a functioning and totally custom endpoint.\n",
"Andrea Boscarino\n",
"Data Scientist at Musixmatch\n",
"Endpoints for\n",
"Health\n",
"Phamily improves patient health with intelligent care management\n",
"Use Case\n",
"HIPAA-compliant secure endpoints for text classification\n",
"Models Deployed\n",
"Custom model based on text-classification (MPNET)\n",
"Custom\n",
"\n",
"### Link: Discord community\n",
"Hugging Face\n",
"\n",
"You need to enable JavaScript to run this app.\n"
]
}
],
"source": [
"print(fetch_page_and_all_relevant_links(\"https://huggingface.co\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b863a55-f86c-4e3f-8a79-94e24c1a8cf2",
"metadata": {},
"outputs": [],
"source": [
"brochure_system_prompt = \"\"\"\n",
"You are an assistant that analyzes the contents of several relevant pages from a company website\n",
"and creates a short brochure about the company for prospective customers, investors and recruits.\n",
"Respond in markdown without code blocks.\n",
"Include details of company culture, customers and careers/jobs if you have the information.\n",
"\"\"\"\n",
"\n",
"# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':\n",
"\n",
"# brochure_system_prompt = \"\"\"\n",
"# You are an assistant that analyzes the contents of several relevant pages from a company website\n",
"# and creates a short, humorous, entertaining, witty brochure about the company for prospective customers, investors and recruits.\n",
"# Respond in markdown without code blocks.\n",
"# Include details of company culture, customers and careers/jobs if you have the information.\n",
"# \"\"\"\n"
]
},
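{
"cell_type": "markdown",
"id": "sketch-tone-parameter",
"metadata": {},
"source": [
"A minimal sketch of how the tone could become a parameter instead of a commented-out alternative. The helper `build_brochure_system_prompt` and its `tone` argument are hypothetical names introduced for illustration; they are not defined elsewhere in this notebook.\n",
"\n",
"```python\n",
"def build_brochure_system_prompt(tone=''):\n",
"    # `tone` is a hypothetical argument, e.g. 'humorous, entertaining, witty'\n",
"    style = f'short, {tone}' if tone else 'short'\n",
"    lines = [\n",
"        'You are an assistant that analyzes the contents of several relevant pages from a company website',\n",
"        f'and creates a {style} brochure about the company for prospective customers, investors and recruits.',\n",
"        'Respond in markdown without code blocks.',\n",
"        'Include details of company culture, customers and careers/jobs if you have the information.',\n",
"    ]\n",
"    return '\\n'.join(lines)\n",
"\n",
"print(build_brochure_system_prompt('humorous, entertaining, witty'))\n",
"```\n",
"\n",
"Assigning the result to `brochure_system_prompt` would let the rest of the notebook run unchanged."
]
},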
{
"cell_type": "code",
"execution_count": 14,
"id": "6ab83d92-d36b-4ce0-8bcc-5bb4c2f8ff23",
"metadata": {},
"outputs": [],
"source": [
"def get_brochure_user_prompt(company_name, url):\n",
" user_prompt = f\"\"\"\n",
"You are looking at a company called: {company_name}\n",
"Here are the contents of its landing page and other relevant pages;\n",
"use this information to build a short brochure of the company in markdown without code blocks.\\n\\n\n",
"\"\"\"\n",
" user_prompt += fetch_page_and_all_relevant_links(url)\n",
" user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters\n",
" return user_prompt"
]
},
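{
"cell_type": "markdown",
"id": "sketch-truncate-line-boundary",
"metadata": {},
"source": [
"The hard cut at 5,000 characters can stop mid-word; the recorded prompt in the next cell's output ends with `early-st`. Below is a minimal sketch of one alternative, cutting at the last line break before the limit. `truncate_at_line_boundary` is a hypothetical helper, not part of this notebook or of `scraper`.\n",
"\n",
"```python\n",
"def truncate_at_line_boundary(text, limit=5_000):\n",
"    # Nothing to do when the text already fits\n",
"    if len(text) <= limit:\n",
"        return text\n",
"    # Position of the last newline that falls before the limit\n",
"    cut = text.rfind('\\n', 0, limit)\n",
"    # Fall back to a hard cut when there is no usable newline\n",
"    return text[:cut] if cut > 0 else text[:limit]\n",
"\n",
"print(truncate_at_line_boundary('Models\\nDatasets\\nSpaces', limit=18))\n",
"```\n",
"\n",
"The trade-off is that the result can be noticeably shorter than the limit when the last line is long."
]
},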
{
"cell_type": "code",
"execution_count": 15,
"id": "cd909e0b-1312-4ce2-a553-821e795d7572",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 7 relevant links\n"
]
},
{
"data": {
"text/plain": [
"\"\\nYou are looking at a company called: HuggingFace\\nHere are the contents of its landing page and other relevant pages;\\nuse this information to build a short brochure of the company in markdown without code blocks.\\n\\n\\n## Landing Page:\\n\\nHugging Face The AI community building the future.\\n\\nHugging Face\\nModels\\nDatasets\\nSpaces\\nCommunity\\nDocs\\nEnterprise\\nPricing\\nLog In\\nSign Up\\nThe AI community building the future.\\nThe platform where the machine learning community collaborates on models, datasets, and applications.\\nExplore AI Apps\\nor\\nBrowse 1M+ models\\nTrending on\\nthis week\\nModels\\ntencent/HunyuanImage-3.0\\nUpdated\\nabout 4 hours ago\\n•\\n825\\n•\\n714\\ndeepseek-ai/DeepSeek-V3.2-Exp\\nUpdated\\n3 days ago\\n•\\n10.7k\\n•\\n470\\ntencent/Hunyuan3D-Part\\nUpdated\\n4 days ago\\n•\\n2.57k\\n•\\n461\\nzai-org/GLM-4.6\\nUpdated\\n2 days ago\\n•\\n9.72k\\n•\\n298\\ninclusionAI/Ring-1T-preview\\nUpdated\\n1 day ago\\n•\\n818\\n•\\n208\\nBrowse 1M+ models\\nSpaces\\nRunning\\n1.29k\\n1.29k\\nWan2.2 Animate\\n👁\\nWan2.2 Animate\\nRunning\\n14.5k\\n14.5k\\nDeepSite v3\\n🐳\\nGenerate any application with DeepSeek\\nRunning\\non\\nZero\\nMCP\\n1.46k\\n1.46k\\nWan2.2 14B Fast\\n🎥\\ngenerate a video from an image with a text prompt\\nRunning\\non\\nZero\\n196\\n196\\nQwen Image Edit 2509\\n👀\\nGenerate edited images based on user prompts\\nRunning\\non\\nCPU Upgrade\\n103\\n103\\nOstris' AI Toolkit\\n💻\\nTrain FLUX, Qwen and Wan LoRAs with Ostris Ai Toolkit\\nBrowse 400k+ applications\\nDatasets\\nopenai/gdpval\\nUpdated\\n7 days ago\\n•\\n17k\\n•\\n157\\nfka/awesome-chatgpt-prompts\\nUpdated\\nJan 6\\n•\\n49k\\n•\\n9.18k\\nnvidia/Nemotron-Personas-Japan\\nUpdated\\n9 days ago\\n•\\n9.74k\\n•\\n75\\nt-tech/T-ECD\\nUpdated\\n6 days ago\\n•\\n993\\n•\\n22\\nfacebook/seamless-interaction\\nUpdated\\nJul 14\\n•\\n17.5k\\n•\\n148\\nBrowse 250k+ datasets\\nThe Home of Machine Learning\\nCreate, discover and collaborate on ML 
better.\\nThe collaboration platform\\nHost and collaborate on unlimited public models, datasets and applications.\\nMove faster\\nWith the HF Open source stack.\\nExplore all modalities\\nText, image, video, audio or even 3D.\\nBuild your portfolio\\nShare your work with the world and build your ML profile.\\nSign Up\\nAccelerate your ML\\nWe provide paid Compute and Enterprise solutions.\\nCompute\\nDeploy on optimized\\nInference Endpoints\\nor update your\\nSpaces applications\\nto a GPU in a few clicks.\\nView pricing\\nStarting at $0.60/hour for GPU\\nTea\\n## Relevant Links:\\n\\n\\n### Link: brand page\\nBrand assets - Hugging Face\\n\\nHugging Face\\nModels\\nDatasets\\nSpaces\\nCommunity\\nDocs\\nEnterprise\\nPricing\\nLog In\\nSign Up\\nHugging Face · Brand assets\\nHF Logos\\n.svg\\n.png\\n.ai\\n.svg\\n.png\\n.ai\\n.svg\\n.png\\n.ai\\nHF Colors\\n#FFD21E\\n#FF9D00\\n#6B7280\\nHF Bio\\nHugging Face is the collaboration platform for the machine learning community.\\n\\nThe Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. 
HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together.\\n\\nWith the fast-growing community, some of the most used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution.\\nCopy to clipboard\\nHF Universe\\nFind other assets available for use from the Hugging Face brand universe\\nhere\\n.\\nSystem theme\\nWebsite\\nModels\\nDatasets\\nSpaces\\nChangelog\\nInference Endpoints\\nHuggingChat\\nCompany\\nAbout\\nBrand assets\\nTerms of service\\nPrivacy\\nJobs\\nPress\\nResources\\nLearn\\nDocumentation\\nBlog\\nForum\\nService Status\\nSocial\\nGitHub\\nTwitter\\nLinkedIn\\nDiscord\\n\\n### Link: careers page\\nHugging Face - Current Openings\\n\\n\\n\\n### Link: company page\\nHugging Face | LinkedIn\\n\\nSkip to main content\\nLinkedIn\\nTop Content\\nPeople\\nLearning\\nJobs\\nGames\\nGet the app\\nJoin now\\nSign in\\nHugging Face\\nSoftware Development\\nThe AI community building the future.\\nSee jobs\\nFollow\\nView all 625 employees\\nReport this company\\nAbout us\\nThe AI community building the future.\\nWebsite\\nhttps://huggingface.co\\nExternal link for Hugging Face\\nIndustry\\nSoftware Development\\nCompany size\\n51-200 employees\\nType\\nPrivately Held\\nFounded\\n2016\\nSpecialties\\nmachine learning, natural language processing, and deep learning\\nProducts\\nHugging Face\\nHugging Face\\nNatural Language Processing (NLP) Software\\nWere on a journey to solve and democratize artificial intelligence through natural language.\\nLocations\\nPrimary\\nGet directions\\nParis, FR\\nGet directions\\nEmployees at Hugging Face\\nLudovic Huraux\\nRajat Arya\\nTech Lead & Software Engineer @ HF | prev: co-founder XetHub, Apple, Turi, AWS, Microsoft\\nJeff Boudier\\nProduct + Growth at Hugging Face\\nTerrence Rohan\\nSeed Investor\\nSee all 
employees\\nUpdates\\nHugging Face\\nreposted this\\nMercuri\\n3,316 followers\\n1d\\nReport this post\\n🚨 404: media not found 🚨\\n\\nWe're teaming up\\nThe Tech Bros\\n,\\nCanva\\n,\\nElevenLabs\\n, and\\nHugging Face\\nfor an all female media x tech\\n#hackathon\\nat the Canva offices 👩🏾💻👩🏼💻👩🏻💻\\n\\n⚡ Build prototypes fast\\n⚡ Get feedback from engineers & investors\\n⚡ Meet your co-founder!?\\n\\nHugging Face, Canva and ElevenLabs are unicorns shaping the future of media and creative tech. At\\nMercuri\\n, we bring the early-st\""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_brochure_user_prompt(\"HuggingFace\", \"https://huggingface.co\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "8b45846d",
"metadata": {},
"outputs": [],
"source": [
"def create_brochure(company_name, url):\n",
" response = openai.chat.completions.create(\n",
" model=\"gpt-4.1-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": brochure_system_prompt},\n",
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n",
" ],\n",
" )\n",
" result = response.choices[0].message.content\n",
" display(Markdown(result))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b123615a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 13 relevant links\n"
]
},
{
"data": {
"text/markdown": [
"# Hugging Face Brochure\n",
"\n",
"---\n",
"\n",
"## About Hugging Face\n",
"\n",
"**Hugging Face** is the premier collaboration platform for the machine learning (ML) community, dedicated to building the future of artificial intelligence through openness and shared knowledge. It serves as a vibrant hub where ML engineers, scientists, and enthusiasts from across the globe can share, discover, and experiment with cutting-edge open-source machine learning models, datasets, and applications.\n",
"\n",
"At its core, Hugging Face empowers the next generation of AI innovators to learn, collaborate, and contribute to an open and ethical AI future.\n",
"\n",
"---\n",
"\n",
"## What We Offer\n",
"\n",
"- **Model Hub**: Access and browse over **1 million pre-trained models** covering all modalities including text, image, video, audio, and 3D.\n",
"- **Datasets**: Explore and use more than **250,000 datasets** curated for various machine learning tasks.\n",
"- **Spaces**: Deploy and share interactive ML applications and demos; over **400,000 apps** available and growing.\n",
"- **Compute Solutions**: Scalable paid compute services including optimized inference endpoints and GPU support starting at $0.60/hour.\n",
"- **Enterprise Solutions**: Customized offerings tailored to accelerate business adoption of machine learning technologies.\n",
"- **Open Source Stack**: Leverage the extensive Hugging Face open-source libraries to move faster in development.\n",
"\n",
"---\n",
"\n",
"## Our Community\n",
"\n",
"Hugging Face is much more than a platform — it is a thriving, fast-growing community of machine learning practitioners who collectively build, maintain, and democratize AI resources. Collaborators range from individual researchers to top organizations, contributing models like:\n",
"\n",
"- Tencents HunyuanImage series\n",
"- deepseek-ais DeepSeek models\n",
"- NVIDIA and Facebooks datasets and models\n",
"\n",
"Community members benefit from the ability to:\n",
"\n",
"- Host and collaborate on unlimited public projects\n",
"- Share their work globally to build a professional ML portfolio\n",
"- Collaborate using state-of-the-art open-source tools \n",
" \n",
"---\n",
"\n",
"## Company Culture\n",
"\n",
"- **Collaborative:** Empowering users to co-create and improve AI together.\n",
"- **Open & Ethical:** Committed to building an open, transparent, and socially responsible AI ecosystem.\n",
"- **Innovative:** Constantly advancing the frontier in machine learning through community-driven innovation.\n",
"- **Inclusive:** Bringing together diverse contributors and users worldwide.\n",
"\n",
"---\n",
"\n",
"## Careers at Hugging Face\n",
"\n",
"Join a team on the front lines of democratizing AI and machine learning! Hugging Face is looking for passionate engineers, researchers, data scientists, and product experts who want to make a global impact by enabling accessible, ethical, and collaborative AI development.\n",
"\n",
"By working here, you'll:\n",
"\n",
"- Collaborate with world-class ML experts\n",
"- Shape the future of open-source AI tools\n",
"- Impact a vibrant global community of hundreds of thousands of users \n",
"- Thrive in an open culture that values creativity, diversity, and ethical AI \n",
"\n",
"Visit [huggingface.co/careers](https://huggingface.co/careers) to explore current job openings.\n",
"\n",
"---\n",
"\n",
"## Why Choose Hugging Face?\n",
"\n",
"- The **largest and most diverse** collection of open-source machine learning resources.\n",
"- A **community-first platform** that fosters collaboration and rapid innovation.\n",
"- Flexible solutions for individuals, communities, startups, and enterprises.\n",
"- Trusted by industry leaders and researchers worldwide.\n",
"\n",
"---\n",
"\n",
"## Get Started Today\n",
"\n",
"- Explore AI apps and models at [huggingface.co](https://huggingface.co)\n",
"- Join the community, build your portfolio, and accelerate your AI journey.\n",
"- Scale your projects with our pay-as-you-go compute resources.\n",
"\n",
"**Hugging Face** — The AI Community Building the Future.\n",
"\n",
"---\n",
"\n",
"# Contact & Links\n",
"\n",
"- Website: https://huggingface.co\n",
"- Community: https://huggingface.co/community\n",
"- Documentation: https://huggingface.co/docs\n",
"- Careers: https://huggingface.co/careers\n",
"\n",
"---\n",
"\n",
"*Hugging Face is your home for collaboration, innovation, and advancing machine learning together.*"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"create_brochure(\"HuggingFace\", \"https://huggingface.co\")"
]
},
{
"cell_type": "markdown",
"id": "61eaaab7-0b47-4b29-82d4-75d474ad8d18",
"metadata": {},
"source": [
"## Finally - a minor improvement\n",
"\n",
"With a small adjustment, we can change this so that the results stream back from OpenAI,\n",
"with the familiar typewriter animation"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "51db0e49-f261-4137-aabe-92dd601f7725",
"metadata": {},
"outputs": [],
"source": [
"def stream_brochure(company_name, url):\n",
" stream = openai.chat.completions.create(\n",
" model=\"gpt-4.1-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": brochure_system_prompt},\n",
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(company_name, url)}\n",
" ],\n",
" stream=True\n",
" ) \n",
" response = \"\"\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
]
},
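{
"cell_type": "markdown",
"id": "sketch-stream-accumulation",
"metadata": {},
"source": [
"A minimal offline sketch of the accumulation loop in `stream_brochure`. `fake_stream` and `accumulate` are hypothetical helpers: the first only imitates the chunk shape used above (`chunk.choices[0].delta.content`), so nothing here calls the OpenAI API.\n",
"\n",
"```python\n",
"from types import SimpleNamespace\n",
"\n",
"def fake_stream(pieces):\n",
"    # Stand-in for the streaming response: yields chunk-shaped objects\n",
"    for piece in pieces:\n",
"        delta = SimpleNamespace(content=piece)\n",
"        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])\n",
"\n",
"def accumulate(stream):\n",
"    response = ''\n",
"    for chunk in stream:\n",
"        # A chunk may carry no text (content is None), hence the `or ''`\n",
"        response += chunk.choices[0].delta.content or ''\n",
"    return response\n",
"\n",
"print(accumulate(fake_stream(['# Hugging', ' Face', None])))\n",
"```\n",
"\n",
"Because each update re-renders the whole accumulated string, partially received markdown is displayed as it grows rather than appended piece by piece."
]
},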
{
"cell_type": "code",
"execution_count": 20,
"id": "56bf0ae3-ee9d-4a72-9cd6-edcac67ceb6d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 9 relevant links\n"
]
},
{
"data": {
"text/markdown": [
"# Hugging Face: The AI Community Building the Future\n",
"\n",
"---\n",
"\n",
"## About Hugging Face\n",
"\n",
"Hugging Face is a vibrant and fast-growing AI and machine learning community dedicated to advancing the future of open and ethical AI. It serves as a collaboration platform where machine learning engineers, scientists, and enthusiasts from around the world create, discover, and share machine learning models, datasets, and applications.\n",
"\n",
"The Hugging Face Hub is the central place for hosting and collaborating on over **1 million models**, **250k+ datasets**, and **400k+ applications** spanning multiple modalities including text, images, video, audio, and 3D.\n",
"\n",
"---\n",
"\n",
"## What Hugging Face Offers\n",
"\n",
"- **Models:** Browse and explore a vast collection of AI models, updated frequently by a diverse community. Popular trending models include image generators, video editors, and language models.\n",
" \n",
"- **Datasets:** Access a rich catalog of datasets ranging from natural language prompts to persona datasets and beyond, enabling rapid model training and experimentation.\n",
" \n",
"- **Spaces:** Deploy, share, and run AI applications on Hugging Face's platform, including innovative tools such as video generation from images, AI toolkits, and interactive applications.\n",
"\n",
"- **Compute:** Easily accelerate your machine learning workloads with optimized inference endpoints and GPU-backed applications starting as low as $0.60/hour.\n",
"\n",
"- **Enterprise Solutions:** Hugging Face provides tailored enterprise offerings to help organizations harness AI technology securely and effectively at scale.\n",
"\n",
"---\n",
"\n",
"## Company Culture\n",
"\n",
"Hugging Face fosters an **open, collaborative, and ethical AI community** — empowering developers and researchers to work together transparently and inclusively. The culture emphasizes:\n",
"\n",
"- **Community-driven innovation:** Encouraging open-source contributions and sharing knowledge.\n",
"- **Ethical AI development:** Committed to building responsible AI systems for the benefit of all.\n",
"- **Diversity and inclusion:** Supporting a broad and global community in building AI tools and resources.\n",
"- **Learning and growth:** Offering a platform for users to build their machine learning portfolios and grow their expertise.\n",
"\n",
"---\n",
"\n",
"## Customers & Community\n",
"\n",
"Hugging Face serves a global community that includes:\n",
"\n",
"- Machine learning researchers and practitioners\n",
"- Developers building innovative AI applications\n",
"- Enterprises seeking scalable AI solutions\n",
"- AI enthusiasts and learners who want to experiment and grow their skills\n",
"\n",
"With over a million models and hundreds of thousands of datasets and applications, the platform connects users with cutting-edge AI technologies from companies like Tencent, NVIDIA, Facebook, and OpenAI — showcasing the power of open collaboration.\n",
"\n",
"---\n",
"\n",
"## Careers at Hugging Face\n",
"\n",
"Interested in joining Hugging Face? The company values talented individuals passionate about AI, open-source technologies, and community building. Opportunities include roles in:\n",
"\n",
"- Machine learning research & engineering\n",
"- Software development\n",
"- Cloud infrastructure and compute optimization\n",
"- Community management and developer relations\n",
"- Enterprise solutions & sales\n",
"\n",
"At Hugging Face, you will work at the forefront of AI innovation while contributing to a global, open, and ethical AI movement.\n",
"\n",
"---\n",
"\n",
"## Get Involved\n",
"\n",
"- **Explore AI models and datasets:** Visit [huggingface.co](https://huggingface.co) to browse millions of resources.\n",
"- **Build and share your own models and apps:** Create your portfolio in an open, collaborative environment.\n",
"- **Accelerate your projects:** Utilize Hugging Faces Compute and Enterprise services.\n",
"- **Join the community:** Engage with thousands of practitioners working toward the future of AI.\n",
"\n",
"---\n",
"\n",
"### Contact & Follow\n",
"\n",
"- Website: [https://huggingface.co](https://huggingface.co)\n",
"- Join the community to learn, collaborate, and innovate in AI!\n",
"\n",
"---\n",
"\n",
"*Hugging Face — Empowering the next generation of machine learning for a better, open future.*"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"stream_brochure(\"HuggingFace\", \"https://huggingface.co\")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "fdb3f8d8-a3eb-41c8-b1aa-9f60686a653b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selecting relevant links for https://huggingface.co by calling gpt-5-nano\n",
"Found 14 relevant links\n"
]
},
{
"data": {
"text/markdown": [
"# Welcome to Hugging Face The AI Community Building the Future!\n",
"\n",
"---\n",
"\n",
"## Who Are We? \n",
"Imagine a place where machine learning wizards, data sorcerers, and AI alchemists gather to share their spells — uh, models — datasets, and apps. Thats Hugging Face! Were *the* platform where the AI community collaborates, creates, and sometimes even has a little fun while building the future.\n",
"\n",
"Our motto? **\"Keep it open. Keep it ethical. Keep it hugging.\"** 💛\n",
"\n",
"---\n",
"\n",
"## Whats Cooking in the AI Kitchen?\n",
"\n",
"- **1 Million+ Models** — From image generators to language wizards, our treasure trove of open-source ML models grows faster than you can say \"neural network.\" \n",
"- **250,000+ Datasets** — Feeding AI brains with everything from chat prompts to persona profiles. Hungry for data? Dig in! \n",
"- **400,000+ Applications & Spaces** — Launch apps, share your ML portfolio, or just show off cool demos that make your friends say, “Whoa, AI can do that?” \n",
"- **Multimodal Madness** — Text, image, video, audio, even 3D...if AI had a Swiss Army knife, wed be it. \n",
"\n",
"---\n",
"\n",
"## Customers & Community \n",
"Whether youre a student trying to get your AI feet wet, a startup looking to scale your genius, or an enterprise aiming to deploy heavy-duty models in the real world, Hugging Face has your back.\n",
"\n",
"With the fastest growing community of *machine learning enthusiasts* and the support of some seriously big names and organizations, heres a place where:\n",
"\n",
"- **Freelancers** can build a portfolio and get noticed. \n",
"- **Researchers** can push boundaries openly and ethically. \n",
"- **Businesses** can accelerate AI adoption with our paid Compute and Enterprise suites. \n",
"\n",
"Join 1.29k+ Spaces and thousands more running models that power everything from video generation to AI-powered image editing.\n",
"\n",
"---\n",
"\n",
"## Culture & Career Geek Out with Us! \n",
"We believe collaboration beats isolation every day. Our culture?\n",
"\n",
"- Open source at heart ❤️ \n",
"- Ethical AI advocates \n",
"- Casual tea-drinkers and serious problem solvers \n",
"- Always learning, always sharing, always growing \n",
"\n",
"Want to build machine learning tools that millions will use? Hugging Face is where your skills meet endless possibilities. From ML engineers to community managers, our doors are wide open (virtual hugs included).\n",
"\n",
"---\n",
"\n",
"## Speed Up Your AI Journey \n",
"No need to code in the dark alone or fight for GPU time — deploy models and apps with a few clicks on optimized inference endpoints, starting at just $0.60/hour for GPU!\n",
"\n",
"Whether you want to host that killer new model or just tweak an existing one, we give you the tools and community support to **move faster, build smarter, and hug tighter**.\n",
"\n",
"---\n",
"\n",
"## Quick Hugging Face Facts \n",
"- **Founded:** Around the corner from the future \n",
"- **Colors:** Bright yellow (#FFD21E), orange (#FF9D00), and sleek gray (#6B7280) — because AI should be as vibrant as its ideas! \n",
"- **Mascot:** Friendly face with a warm smile (because AIs could learn a thing or two about friendliness here) \n",
"\n",
"---\n",
"\n",
"## Ready to Join the AI Hug Circle? \n",
"\n",
"Sign up, share your work, explore millions of models and datasets, and get your AI career (or project!) hugging new heights.\n",
"\n",
"[Explore AI Apps](#) | [Browse 1M+ Models](#) | [Sign Up & Join The Fun](#)\n",
"\n",
"---\n",
"\n",
"*Hugging Face — where the future of AI isnt just created; its hugged into existence.* 🤗✨"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:\n",
"\n",
"stream_brochure(\"HuggingFace\", \"https://huggingface.co\")"
]
},
{
"cell_type": "markdown",
"id": "a27bf9e0-665f-4645-b66b-9725e2a959b5",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../assets/business.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#181;\">Business applications</h2>\n",
" <span style=\"color:#181;\">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.\n",
"\n",
"This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.\n",
"\n",
"Generating content in this way is one of the very most common Use Cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "14b2454b-8ef8-4b5c-b928-053a15e0d553",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../assets/important.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#900;\">Before you move to Week 2 (which is tons of fun)</h2>\n",
" <span style=\"color:#900;\">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "17b64f0f-7d33-4493-985a-033d06e8db08",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../assets/resources.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#f71;\">A reminder on 3 useful resources</h2>\n",
" <span style=\"color:#f71;\">1. The resources for the course are available <a href=\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\">here.</a><br/>\n",
" 2. I'm on LinkedIn <a href=\"https://www.linkedin.com/in/eddonner/\">here</a> and I love connecting with people taking the course!<br/>\n",
" 3. I'm trying out X/Twitter and I'm at <a href=\"https://x.com/edwarddonner\">@edwarddonner<a> and hoping people will teach me how it's done.. \n",
" </span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "6f48e42e-fa7a-495f-a5d4-26bfc24d60b6",
"metadata": {},
"source": [
"<table style=\"margin: 0; text-align: left;\">\n",
" <tr>\n",
" <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
" <img src=\"../assets/thankyou.jpg\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
" </td>\n",
" <td>\n",
" <h2 style=\"color:#090;\">Finally! I have a special request for you</h2>\n",
" <span style=\"color:#090;\">\n",
" My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.\n",
" </span>\n",
" </td>\n",
" </tr>\n",
"</table>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}