Merge pull request #546 from 7haz/my-contribution-week1

week1 contribution - Youtube Podcast Summarizer using Transcript API
This commit is contained in:
Ed Donner
2025-08-01 08:07:57 -04:00
committed by GitHub

View File

@@ -0,0 +1,233 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# **Youtube Informative-video Summerizer**\n",
"\n",
"This python app let's you summerize youtube videos that contains information-sharing-through-talking, like someone talking about a subject, someone sharing a life advice, a podcast etc.\n",
"\n",
"We extract the transcipt analyize it with an LLM to summerize and create summerization and analysis.\n",
"\n",
"\n",
"> We use youtube_transcript_api which allows you to get the transcript text of any youtube video.\n",
"\n",
"> Results however are not ideal for our use case since it does not provide who says what in case of more than one speaker. it only provide one giant string of all the words said in the video respectivly with some noise.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "4KULQ4rViju1"
}
},
{
"cell_type": "code",
"source": [
"#!pip install youtube-transcript-api"
],
"metadata": {
"id": "C21ZN5MNZ_1b"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from youtube_transcript_api import YouTubeTranscriptApi\n",
"from youtube_transcript_api.formatters import TextFormatter, SRTFormatter\n",
"import re\n",
"from openai import OpenAI\n",
"from google.colab import userdata # dotenv equevilant for google colab\n",
"from IPython.display import Markdown, display, update_display"
],
"metadata": {
"id": "ttbBAJC7Zrn5"
},
"execution_count": 35,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ytt = YouTubeTranscriptApi()\n",
"formatter = TextFormatter() # --> Plain text\n",
"# formatter = SRTFormatter() # --> With timestamps\n",
"\n",
"openai_api_key = userdata.get('OPENAI_TOKEN')\n",
"openai_client = OpenAI(api_key=openai_api_key)\n",
"MODEL = \"gpt-4o-mini\""
],
"metadata": {
"id": "1oP0uPCylaig"
},
"execution_count": 36,
"outputs": []
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"id": "ILPjwpGkZm1t"
},
"outputs": [],
"source": [
"def get_video_id(url):\n",
" \"\"\"Extracts video ID from a YouTube URL.\"\"\"\n",
" regex = r\"(?:v=|\\/)([0-9A-Za-z_-]{11}).*\"\n",
" match = re.search(regex, url)\n",
" if match:\n",
" return match.group(1)\n",
" raise ValueError(\"Invalid YouTube URL\")\n",
"\n",
"\n",
"def get_transcript(url):\n",
" video_id = get_video_id(url)\n",
" fetched_transcript = ytt.fetch(video_id)\n",
" # ^ defaults to English transcript, for other language use:\n",
" # fetched = ytt.fetch(video_id, languages=['de', 'en'])\n",
" transcript_text = formatter.format_transcript(fetched_transcript)\n",
" return transcript_text"
]
},
{
"cell_type": "code",
"source": [
"system_prompt = \"\"\"You are an expert assistant specialized in analyzing podcast transcripts. You will be given the full transcript of a YouTube podcast episode.\n",
"\n",
"Your task is to extract and summarize the main views or arguments presented in the podcast. For each view or argument, also identify and list any supporting evidence such as:\n",
"\n",
"- Facts or statistics\n",
"- Academic studies or research\n",
"- Theories or philosophical frameworks\n",
"- Anecdotes or personal experiences\n",
"- Expert opinions or quotes\n",
"\n",
"Recognize off topic segments and adds and igrone them.\n",
"\n",
"Structure your output in a clear and concise format.\n",
"\n",
"Output Format:\n",
"\n",
"Podcast Summary:\n",
"\n",
"1. View/Argument:\n",
" - Description: [Summarize the view or claim in 1-2 sentences.]\n",
" - Supporting Evidence:\n",
" • [Fact, study, or reasoning #1]\n",
" • [Fact, study, or reasoning #2]\n",
" • [Optional counterarguments or nuances, if any]\n",
"\n",
"2. View/Argument:\n",
" - Description: [...]\n",
" - Supporting Evidence:\n",
" • [...]\n",
"\n",
"Guidelines:\n",
"- Only include major views or arguments that are discussed in depth.\n",
"- Paraphrase in clear, neutral, and objective language.\n",
"- Do not include filler, small talk, or off-topic segments.\n",
"- If a claim lacks explicit evidence, note it as “No clear supporting evidence provided.”\n",
"\n",
"Always respond and orginize your response using Markdow.\n",
"\"\"\"\n"
],
"metadata": {
"id": "Ye3m_3lEejb_"
},
"execution_count": 38,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def get_user_prompt(title,url):\n",
" prompt = f\"Following is a transcript for a podcast titled '{title}' \\n\"\n",
" prompt += \"Carefully read through this content, analyse and summerize it as told, respond in Markdown.\"\n",
" prompt += \"\\nTranscript: \\n\\n\"\n",
" prompt += get_transcript(url)\n",
" return prompt"
],
"metadata": {
"id": "1jk6YbkpupqI"
},
"execution_count": 39,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# user_prompt = get_user_prompt()\n",
"def summerize_video(title,url):\n",
" user_prompt = get_user_prompt(title,url)\n",
" stream = openai_client.chat.completions.create(\n",
" model=MODEL,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_prompt},\n",
" ],\n",
" stream = True,\n",
" )\n",
"\n",
" response = \"\"\n",
" display_handle = display(Markdown(\"\"), display_id=True)\n",
" for chunk in stream:\n",
" response += chunk.choices[0].delta.content or ''\n",
" response = response.replace(\"```\",\"\").replace(\"markdown\", \"\")\n",
" update_display(Markdown(response), display_id=display_handle.display_id)"
],
"metadata": {
"id": "wJy0Qb8u9uqR"
},
"execution_count": 40,
"outputs": []
},
{
"cell_type": "code",
"source": [
"summerize_video(\"Anti-Aging Expert: Missing This Vitamin Is As Bad As Smoking! The Truth About Creatine!\",\"https://www.youtube.com/watch?v=JCTb3QSrGMQ\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "tbvBiPrv_O3i",
"outputId": "69d24254-e384-4b07-e35f-96c7bb733298"
},
"execution_count": 41,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "# Podcast Summary: \"Anti-Aging Expert: Missing This Vitamin Is As Bad As Smoking! The Truth About Creatine!\"\n\n1. **View/Argument: Vitamin D and Health Risks**\n - **Description:** Vitamin D deficiency significantly increases the risk of dementia and various health issues, yet many individuals are unaware of its critical importance.\n - **Supporting Evidence:**\n - Vitamin D deficiency can raise dementia risk by 80%.\n - Individuals with adequate vitamin D have a 40% reduced risk of dementia and experience better cognitive function.\n\n2. **View/Argument: Role of Lifestyle in Aging**\n - **Description:** Lifestyle choices account for over 70% of aging effects, with exercise and nutrition being key factors in improving longevity and health.\n - **Supporting Evidence:**\n - Studies show participants involved in regular exercise did not experience hippocampal shrinkage, but rather an increase in size.\n - Exercise is equated to a miracle drug for its extensive health benefits, as highlighted by unquantifiable positive effects when compared to medications.\n\n3. **View/Argument: Importance of Magnesium**\n - **Description:** Magnesium is crucial for cellular function, metabolism, and reducing cancer risk, yet nearly half the U.S. population is magnesium deficient.\n - **Supporting Evidence:**\n - Individuals with the highest magnesium levels have a 40% lower all-cause mortality.\n - A 24% increase in pancreatic cancer incidents is associated with every 100 mg decrease in magnesium intake.\n\n4. **View/Argument: Benefits of Creatine in Brain Health**\n - **Description:** Creatine isn't just beneficial for muscle health but also shows promise for enhancing cognitive performance, especially under stress or sleep deprivation.\n - **Supporting Evidence:**\n - A study found that creatine can negate cognitive deficits caused by 21 hours of sleep deprivation.\n - Users often report improved focus and energy levels when supplementing with creatine regularly.\n\n5. **View/Argument: Exercise and Hormonal Benefits**\n - **Description:** Regular exercise, especially high-intensity interval training, can reverse heart aging and improve mental health markers.\n - **Supporting Evidence:**\n - Participants in an intensive exercise program showed heart structures that were more akin to those of individuals two decades younger.\n - High-intensity workouts were shown to improve cognition and neuroplasticity due to the metabolic changes they induce.\n\n6. **View/Argument: Impact of Nutrition on Cognitive Function**\n - **Description:** A healthy diet rich in omega-3 fatty acids, vitamins D and other nutrients is essential for maintaining cognitive function and overall health.\n - **Supporting Evidence:**\n - Adequate omega-3 intake has been linked to a 5-year increase in life expectancy.\n - Regular consumption of nutrient-rich foods, such as blueberries and dark leafy greens, supports cognition and potentially reduces the risk of neurodegenerative diseases.\n\n7. **View/Argument: The Importance of Autophagy**\n - **Description:** Fasting promotes autophagy, a cellular cleaning process that can protect against diseases and improve health.\n - **Supporting Evidence:**\n - Studies suggest that fasting for 16 hours can activate autophagy and contribute to cellular repair processes.\n\n8. **View/Argument: Intermittent Fasting and Health Improvements**\n - **Description:** Intermittent fasting can improve metabolic parameters and cognitive performance while providing health benefits beyond simple calorie restriction.\n - **Supporting Evidence:**\n - Individuals practicing intermittent fasting showed improved glucose regulation compared to those restricting calories alone without fasting.\n\n9. **View/Argument: Microplastics and Health Risks**\n - **Description:** The pervasive presence of microplastics in everyday products poses health risks that are not widely recognized.\n - **Supporting Evidence:**\n - Common items, such as paper coffee cups and plastic water bottles, can release harmful chemicals, leading to increased levels of substances like BPA in beverages.\n\nBy summarizing these key points, the podcast emphasizes the interconnectedness of nutrition, exercise, and mental well-being in managing aging and chronic diseases. Additionally, it highlights emerging research on creatine, fasting, and environmental health risks that affect longevity and quality of life."
},
"metadata": {}
}
]
}
]
}