@@ -2,47 +2,10 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "id": "a767b6bc-65fe-42b2-988f-efd54125114f",
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/markdown": [
-       "```markdown\n",
-       "# Summary of \"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning\"\n",
-       "\n",
-       "## Overview\n",
-       "The paper introduces **DeepSeek-R1**, a first-generation reasoning model developed by DeepSeek-AI. The model is designed to enhance reasoning capabilities in large language models (LLMs) using reinforcement learning (RL). Two versions are presented:\n",
-       "- **DeepSeek-R1-Zero**: A model trained via large-scale RL without supervised fine-tuning (SFT), showcasing strong reasoning abilities but facing challenges like poor readability and language mixing.\n",
-       "- **DeepSeek-R1**: An improved version incorporating multi-stage training and cold-start data before RL, achieving performance comparable to OpenAI's models on reasoning tasks.\n",
-       "\n",
-       "## Key Contributions\n",
-       "- Open-sourcing of **DeepSeek-R1-Zero**, **DeepSeek-R1**, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama architectures.\n",
-       "- The models are made available to support the research community.\n",
-       "\n",
-       "## Community Engagement\n",
-       "- The paper has been widely discussed and recommended, with 216 upvotes and 45 models citing it.\n",
-       "- Additional resources, including a video review and articles, are available through external links provided by the community.\n",
-       "\n",
-       "## Related Research\n",
-       "The paper is part of a broader trend in enhancing LLMs' reasoning abilities, with related works such as:\n",
-       "- **Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024)**\n",
-       "- **Offline Reinforcement Learning for LLM Multi-Step Reasoning (2024)**\n",
-       "- **Reasoning Language Models: A Blueprint (2025)**\n",
-       "\n",
-       "## Availability\n",
-       "- The paper and models are accessible on [GitHub](https://github.com/deepseek-ai/DeepSeek-R1) and the [arXiv page](https://arxiv.org/abs/2501.12948).\n",
-       "```"
-      ],
-      "text/plain": [
-       "<IPython.core.display.Markdown object>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
+   "outputs": [],
    "source": [
     "import os\n",
     "import requests\n",
@@ -132,11 +95,19 @@
     "\n",
     "display_summary()"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "01c9e5e7-7510-43ef-bb9c-aa44b15d39a7",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "llms",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
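The diff above records a standard pre-commit cleanup of the notebook: the stored Markdown summary output is removed, `execution_count` is reset to `null`, the kernelspec display name is normalized to `Python 3 (ipykernel)`, and a trailing empty code cell is appended. A minimal sketch of doing the same output-stripping programmatically with `nbformat` follows; the file name `summary.ipynb` is a placeholder, not necessarily the notebook from this commit:

```python
# Minimal sketch: clear outputs and execution counts before committing,
# reproducing the kind of change shown in the diff above.
# Assumes nbformat is installed; the notebook path is a placeholder.
import nbformat

PATH = "summary.ipynb"  # hypothetical file name

nb = nbformat.read(PATH, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []            # drop stored outputs (e.g. the Markdown summary)
        cell.execution_count = None  # serializes as null, as in "execution_count": null
nbformat.write(nb, PATH)
```

The same effect is commonly achieved from the command line with `jupyter nbconvert --clear-output --inplace summary.ipynb`, or by installing a tool such as `nbstripout` as a git filter so that cleared outputs never reach the repository in the first place.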