@@ -2,47 +2,10 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "id": "a767b6bc-65fe-42b2-988f-efd54125114f",
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/markdown": [
-       "```markdown\n",
-       "# Summary of \"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning\"\n",
-       "\n",
-       "## Overview\n",
-       "The paper introduces **DeepSeek-R1**, a first-generation reasoning model developed by DeepSeek-AI. The model is designed to enhance reasoning capabilities in large language models (LLMs) using reinforcement learning (RL). Two versions are presented:\n",
-       "- **DeepSeek-R1-Zero**: A model trained via large-scale RL without supervised fine-tuning (SFT), showcasing strong reasoning abilities but facing challenges like poor readability and language mixing.\n",
-       "- **DeepSeek-R1**: An improved version incorporating multi-stage training and cold-start data before RL, achieving performance comparable to OpenAI's models on reasoning tasks.\n",
-       "\n",
-       "## Key Contributions\n",
-       "- Open-sourcing of **DeepSeek-R1-Zero**, **DeepSeek-R1**, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama architectures.\n",
-       "- The models are made available to support the research community.\n",
-       "\n",
-       "## Community Engagement\n",
-       "- The paper has been widely discussed and recommended, with 216 upvotes and 45 models citing it.\n",
-       "- Additional resources, including a video review and articles, are available through external links provided by the community.\n",
-       "\n",
-       "## Related Research\n",
-       "The paper is part of a broader trend in enhancing LLMs' reasoning abilities, with related works such as:\n",
-       "- **Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024)**\n",
-       "- **Offline Reinforcement Learning for LLM Multi-Step Reasoning (2024)**\n",
-       "- **Reasoning Language Models: A Blueprint (2025)**\n",
-       "\n",
-       "## Availability\n",
-       "- The paper and models are accessible on [GitHub](https://github.com/deepseek-ai/DeepSeek-R1) and the [arXiv page](https://arxiv.org/abs/2501.12948).\n",
-       "```"
-      ],
-      "text/plain": [
-       "<IPython.core.display.Markdown object>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
+   "outputs": [],
    "source": [
     "import os\n",
     "import requests\n",
@@ -132,11 +95,19 @@
     "\n",
     "display_summary()"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "01c9e5e7-7510-43ef-bb9c-aa44b15d39a7",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "llms",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
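The diff above records a standard pre-commit cleanup of the notebook: the stored Markdown summary output is removed, `execution_count` is reset to `null`, the kernelspec display name is normalized to `Python 3 (ipykernel)`, and a trailing empty code cell is appended. A minimal sketch of doing the same output-stripping programmatically with `nbformat` follows; the file name `summary.ipynb` is a placeholder, not necessarily the notebook from this commit:

```python
# Minimal sketch: clear outputs and execution counts before committing,
# reproducing the kind of change shown in the diff above.
# Assumes nbformat is installed; the notebook path is a placeholder.
import nbformat

PATH = "summary.ipynb"  # hypothetical file name

nb = nbformat.read(PATH, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []            # drop stored outputs (e.g. the Markdown summary)
        cell.execution_count = None  # serializes as null, as in "execution_count": null
nbformat.write(nb, PATH)
```

The same effect is commonly achieved from the command line with `jupyter nbconvert --clear-output --inplace summary.ipynb`, or by installing a tool such as `nbstripout` as a git filter so that cleared outputs never reach the repository in the first place.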