Added DeepSeek to weeks 1, 2 and 8

2025-01-28 12:23:46 -05:00
parent 8cb97665af
commit 7d6d9959df
9 changed files with 298 additions and 9 deletions
--- a/EXERCISE.ipynb
+++ b/EXERCISE.ipynb
@@ -203,6 +203,46 @@
    "print(response.choices[0].message.content)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "bc7d1de3-e2ac-46ff-a302-3b4ba38c4c90",
+   "metadata": {},
+   "source": [
+    "## Also trying the amazing reasoning model DeepSeek\n",
+    "\n",
+    "Here we use the version of DeepSeek-reasoner that's been distilled to 1.5B.  \n",
+    "This is actually a 1.5B variant of Qwen that has been fine-tuned on synthetic data generated by DeepSeek R1.\n",
+    "\n",
+    "Other sizes of DeepSeek are available [here](https://ollama.com/library/deepseek-r1), all the way up to the full 671B-parameter version, which would take up 404GB of your drive and is far too large for most!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cf9eb44e-fe5b-47aa-b719-0bb63669ab3d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!ollama pull deepseek-r1:1.5b"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1d3d554b-e00d-4c08-9300-45e073950a76",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# This may take a few minutes to run! You should then see a fascinating \"thinking\" trace inside <think> tags, followed by some decent definitions\n",
+    "\n",
+    "response = ollama_via_openai.chat.completions.create(\n",
+    "    model=\"deepseek-r1:1.5b\",\n",
+    "    messages=[{\"role\": \"user\", \"content\": \"Please give definitions of some core concepts behind LLMs: a neural network, attention and the transformer\"}]\n",
+    ")\n",
+    "\n",
+    "print(response.choices[0].message.content)"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "1622d9bb-5c68-4d4e-9ca4-b492c751f898",