{ "cells": [ { "cell_type": "markdown", "id": "12934dbc-ff4f-4dfc-8cc1-d92cc8826cf2", "metadata": {}, "source": [ "# πŸ” Predicting Item Prices from Descriptions (Part 4)\n", "---\n", "- Data Curation & Preprocessing\n", "- Model Benchmarking – Traditional ML vs LLMs\n", "- E5 Embeddings & RAG\n", "- ➑️ Fine-Tuning GPT-4o Mini\n", "- Evaluating LLaMA 3.1 8B Quantized\n", "- Fine-Tuning LLaMA 3.1 with QLoRA\n", "- Evaluating Fine-Tuned LLaMA\n", "- Summary & Leaderboard\n", "\n", "---\n", "\n", "# πŸ”§ Part 4: Fine-Tuning GPT-4o Mini\n", "\n", "- πŸ§‘β€πŸ’» Skill Level: Advanced\n", "- βš™οΈ Hardware: βœ… CPU is sufficient β€” no GPU required\n", "- πŸ› οΈ Requirements: πŸ”‘ HF Token, Open API Key, wandb API Key\n", "- Tasks:\n", " - Convert chat data to .jsonl format for OpenAI\n", " - Fine-tune the model and monitor with Weights & Biases\n", " - Test the fine-tuned GPT-4o Mini \n", "\n", "Can fine-tuning GPT-4o Mini outperform both its zero-shot baseline and RAG-enhanced version? \n", "Time to find out.\n", "\n", "---\n", "πŸ“’ Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)" ] }, { "cell_type": "code", "execution_count": null, "id": "5809630f-d3ea-41df-86ec-9cbf59a46f5c", "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import os\n", "import importlib\n", "import json\n", "import re\n", "from dotenv import load_dotenv\n", "from huggingface_hub import login\n", "from datasets import load_dataset\n", "from openai import OpenAI" ] }, { "cell_type": "code", "execution_count": null, "id": "4120c84d-c310-4d31-9e1f-1549ea4a4186", "metadata": {}, "outputs": [], "source": [ "load_dotenv(override=True)\n", "\n", "openai_api_key = os.getenv('OPENAI_API_KEY')\n", "if not openai_api_key:\n", " print(\"❌ OPENAI_API_KEY is missing\")\n", "\n", "openai = OpenAI(api_key=openai_api_key)\n", "\n", "hf_token = os.getenv('HF_TOKEN')\n", "if not hf_token:\n", " print(\"❌ HF_TOKEN is missing\")\n", "\n", "login(hf_token, add_to_git_credential=True)" ] }, { "cell_type": "markdown", "id": "31d3aa97-68a8-4f71-a43f-107f7c8553c5", "metadata": {}, "source": [ "## πŸ“₯ Load Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "f2bae96a", "metadata": {}, "outputs": [], "source": [ "# #If you face NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported run:\n", "# %pip install -U datasets" ] }, { "cell_type": "code", "execution_count": null, "id": "c45e23d6-1304-4859-81f0-35a9ddf1c755", "metadata": {}, "outputs": [], "source": [ "HF_USER = \"lisekarimi\"\n", "DATASET_NAME = f\"{HF_USER}/pricer-data\"\n", "\n", "dataset = load_dataset(DATASET_NAME)\n", "train = dataset['train']\n", "test = dataset['test']" ] }, { "cell_type": "code", "execution_count": null, "id": "667adda8-add8-41b6-9e60-7870bad20c02", "metadata": {}, "outputs": [], "source": [ "test[0]" ] }, { "cell_type": "markdown", "id": "b85d86d0-b6b1-49cd-9ef0-9214c1267199", "metadata": {}, "source": [ "## πŸ› οΈ Step 1 : Data Preparation" ] }, { "cell_type": "markdown", "id": "d3ba760d-467a-4cd9-8d3f-e6ce84273610", "metadata": {}, "source": [ "To fine-tune GPT-4o-mini, OpenAI requires training data in **.jsonl format**. \n", "\n", "`make_jsonl` converts our chat data :\n", "\n", "from \n", "\n", "[\n", " {\"role\": \"system\", \"content\": \"You estimate prices of items. Reply only with the price, no explanation\"},\n", " {\"role\": \"user\", \"content\": \"How much is this laptop worth?\"},\n", " {\"role\": \"assistant\", \"content\": \"Price is $999.00\"}\n", "]\n", "\n", "into the .jsonl format \n", "\n", "{\"messages\": [{\"role\": \"system\", \"content\": \"You estimate prices of items. Reply only with the price, no explanation\"}, {\"role\": \"user\", \"content\": \"How much is this laptop worth?\"}, {\"role\": \"assistant\", \"content\": \"Price is $999.00\"}]}\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ec254755-67f6-4676-b67f-c1376ea00124", "metadata": {}, "outputs": [], "source": [ "# Mask the price in the test item\n", "def mask_price_value(text):\n", " return re.sub(r\"(\\n\\nPrice is \\$).*\", r\"\\1\", text)" ] }, { "cell_type": "code", "execution_count": null, "id": "e5e51957-b0ec-49f9-ae70-74771a101756", "metadata": {}, "outputs": [], "source": [ "def messages_for(datapoint):\n", " system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n", " user_prompt = mask_price_value(datapoint[\"text\"]).replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\",\"\")\n", " assistant_response = f\"Price is ${datapoint['price']:.2f}\"\n", " return [\n", " {\"role\": \"system\", \"content\": system_message},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " {\"role\": \"assistant\", \"content\": assistant_response}\n", " ]\n", "\n", "messages_for(train[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "03583d32-b0f2-44c0-820e-62c8e7e48247", "metadata": {}, "outputs": [], "source": [ "def make_jsonl(datapoints):\n", " result = \"\"\n", " for datapoint in datapoints:\n", " messages = messages_for(datapoint)\n", " messages_str = json.dumps(messages, ensure_ascii=False)\n", " result += '{\"messages\": ' + messages_str + '}\\n'\n", " return result.strip()\n", "\n", "make_jsonl(train.select([0]))" ] }, { "cell_type": "code", "execution_count": null, "id": "36c9cf60-0bcb-44cb-8df6-ff2ed4110cd2", "metadata": {}, "outputs": [], "source": [ "ft_train = train.select(range(100))\n", "ft_validation = train.select(range(100, 150))" ] }, { "cell_type": "code", "execution_count": null, "id": "494eaecd-ae5d-4396-b694-6faf88fb7fd6", "metadata": {}, "outputs": [], "source": [ "# Convert the items into jsonl and write them to a file\n", "\n", "def write_jsonl(datapoints, filename):\n", " with open(filename, \"w\", encoding=\"utf-8\") as f:\n", " jsonl = make_jsonl(datapoints)\n", " f.write(jsonl)" ] }, { "cell_type": "code", "execution_count": null, "id": "ae42986d-ab02-4a11-aa0c-ede9c63ec7a2", "metadata": {}, "outputs": [], "source": [ "write_jsonl(ft_train, \"data/ft_train.jsonl\")\n", "write_jsonl(ft_validation, \"data/ft_val.jsonl\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b9bed22d-73ad-4820-a983-cbdccd8dbbc8", "metadata": {}, "outputs": [], "source": [ "with open(\"data/ft_train.jsonl\", \"rb\") as f:\n", " train_file = openai.files.create(file=f, purpose=\"fine-tune\")\n", "with open(\"data/ft_val.jsonl\", \"rb\") as f:\n", " validation_file = openai.files.create(file=f, purpose=\"fine-tune\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1e6c6ce8-6600-4068-9ec5-32c6428ce9ea", "metadata": {}, "outputs": [], "source": [ "train_file" ] }, { "cell_type": "code", "execution_count": null, "id": "26943fad-4301-4bb4-97e8-be52a9743322", "metadata": {}, "outputs": [], "source": [ "validation_file" ] }, { "cell_type": "markdown", "id": "edb0a3ec-1607-4c5b-ab06-852f951cae8b", "metadata": {}, "source": [ "## πŸš€ Step 2: Run Fine-Tuning & Monitor with wandb\n", "We will use https://wandb.ai to monitor the training runs\n", "\n", "1- Create an API key in wandb\n", "\n", "2- Add this key in OpenAI dashboard https://platform.openai.com/account/organization" ] }, { "cell_type": "code", "execution_count": null, "id": "59f552fe-5e80-4742-94a8-5492556a6543", "metadata": {}, "outputs": [], "source": [ "wandb_integration = {\"type\": \"wandb\", \"wandb\": {\"project\": \"gpt-pricer\"}}" ] }, { "cell_type": "code", "execution_count": null, "id": "144088d7-7c30-439a-9282-1e6096c181ea", "metadata": {}, "outputs": [], "source": [ "# Run the fine tuning\n", "\n", "openai.fine_tuning.jobs.create(\n", " training_file=train_file.id,\n", " validation_file=validation_file.id,\n", " model=\"gpt-4o-mini-2024-07-18\",\n", " seed=42,\n", " hyperparameters={\"n_epochs\": 1},\n", " integrations = [wandb_integration],\n", " suffix=\"pricer\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "330e75f5-0208-4c74-8dd3-07bc06047b2e", "metadata": {}, "outputs": [], "source": [ "job_id = openai.fine_tuning.jobs.list(limit=1).data[0].id\n", "job_id\n", "\n", "# Then check your wandb dashboard to view the run of this job ID" ] }, { "cell_type": "code", "execution_count": null, "id": "4a92dac5-e6d8-439c-b55e-507becb37a6c", "metadata": {}, "outputs": [], "source": [ "# Use this command to track the fine-tuning progress here\n", "\n", "openai.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=2).data" ] }, { "cell_type": "markdown", "id": "b6b65677-06b2-47d3-b0e6-51210a3d832b", "metadata": {}, "source": [ "# πŸ“§ You’ll get an email once fine-tuning is complete. β˜• You can take a break until then. ▢️ Once you receive it, run the cells below to continue." ] }, { "cell_type": "markdown", "id": "0a7af4be-0b55-4654-af7a-f47485babc52", "metadata": {}, "source": [ "## Step 3 : Test the fine tuned model" ] }, { "cell_type": "code", "execution_count": null, "id": "c8497eb8-49ee-4a05-9e51-fc1b4b2b41d4", "metadata": {}, "outputs": [], "source": [ "ft_model_name = openai.fine_tuning.jobs.retrieve(job_id).fine_tuned_model\n", "ft_model_name" ] }, { "cell_type": "markdown", "id": "12bed33f-be31-4d7c-8651-3f267c529304", "metadata": {}, "source": [ "You can find the entire fine-tuning process in the **Fine-tuning** dashboard on OpenAI.\n", "\n", "![Fine-tuning Process](https://github.com/lisekarimi/lexo/blob/main/assets/09_ft_gpt4omini.png?raw=true)" ] }, { "cell_type": "code", "execution_count": null, "id": "ac6a89ef-f982-457a-bad7-bd84b6132a07", "metadata": {}, "outputs": [], "source": [ "# Build LLM messages\n", "def build_messages(datapoint):\n", " system_message = \"You estimate prices of items. Reply only with the price, no explanation\"\n", " user_prompt = mask_price_value(datapoint[\"text\"]).replace(\" to the nearest dollar\", \"\").replace(\"\\n\\nPrice is $\",\"\")\n", " return [\n", " {\"role\": \"system\", \"content\": system_message},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " {\"role\": \"assistant\", \"content\": \"Price is $\"}\n", " ]\n", "\n", "def get_price(s):\n", " s = s.replace('$','').replace(',','')\n", " match = re.search(r\"[-+]?\\d*\\.\\d+|\\d+\", s)\n", " return float(match.group()) if match else 0\n", "\n", "def gpt_ft(datapoint):\n", " response = openai.chat.completions.create(\n", " model=ft_model_name,\n", " messages=build_messages(datapoint),\n", " seed=42,\n", " max_tokens=7\n", " )\n", " reply = response.choices[0].message.content\n", " return get_price(reply)" ] }, { "cell_type": "code", "execution_count": null, "id": "93a93017-458c-4769-b81c-b2dad2af7552", "metadata": {}, "outputs": [], "source": [ "print(test[0][\"price\"])\n", "print(gpt_ft(test[0]))" ] }, { "cell_type": "markdown", "id": "87a5ad10-ed60-4533-ad61-225ceb847e6c", "metadata": {}, "source": [ "πŸ”” **Reminder:** \n", "- In **Part 2**, GPT-4o Mini (zero-shot) scored: \n", " Avg. Error: ~$99 | RMSLE: 0.75 | Accuracy: 44.8% \n", "\n", "- In **Part 3**, with **RAG**, performance improved to: \n", " Avg. Error: ~$59.54 | RMSLE: 0.42 | Accuracy: 69.2%\n", "\n", "πŸ§ͺ **Now it’s time to see** if fine-tuning can push GPT-4o Mini even further and outperform both baselines." ] }, { "cell_type": "code", "execution_count": null, "id": "0adf1500-9cc7-491a-9ea6-88932af85dca", "metadata": {}, "outputs": [], "source": [ "import helpers.testing\n", "importlib.reload(helpers.testing)\n", "\n", "from helpers.testing import Tester # noqa: E402\n", "\n", "tester = Tester(gpt_ft, test)\n", "tester.run()" ] }, { "cell_type": "markdown", "id": "37439666", "metadata": {}, "source": [ "Gpt Ft Error=$129.16 RMSLE=0.94 Hits=35.2%" ] }, { "cell_type": "markdown", "id": "5487da30-e1a8-4db5-bf17-80bc4f109524", "metadata": {}, "source": [ "**Fine-tuning GPT-4o Mini led to worse performance than both its zero-shot and RAG-enhanced versions.**\n", "\n", "⚠️ When Fine-Tuning Isn’t Needed:\n", "- For tasks like price prediction, GPT-4o performs well with prompting alone β€” thanks to strong pretraining and generalization.\n", "- πŸ’‘ Fine-tuning isn’t always better. Use it when prompting fails β€” not by default.\n", "\n", "βœ… **When Fine-Tuning Is Worth It (based on OpenAI’s own guidelines)**\n", "- Custom tone/style – e.g., mimicking a brand voice or writing like a specific author\n", "- More consistent output – e.g., always following a strict format\n", "- Fix prompt failures – e.g., when multi-step instructions get ignored\n", "- Handle edge cases – e.g., rare product types or weird inputs\n", "- Teach new tasks – e.g., estimating prices in a custom format no model has seen before\n", "\n", "---\n", "\n", "Now that we’ve explored both frontier closed-source models and traditional ML, it’s time to turn to open-source.\n", "\n", "πŸš€ **Next up: Fine-tuned LLaMA 3.1 8B (quantized)** β€” can it beat its base version, outperform GPT-4o Mini, or even challenge the big players?\n", "\n", "πŸ” Let’s find out in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part5_llama31_8b_quant.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }