kwabena_bootcamp

This commit is contained in:
twc-kwabena
2025-10-29 08:57:50 -04:00
parent ba929c7ed4
commit 533c49b6e4

@@ -0,0 +1,511 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "85b93c49",
"metadata": {},
"source": [
"# Expert Resume Creator"
]
},
{
"cell_type": "markdown",
"id": "8f90fe9a",
"metadata": {},
"source": [
" In this exercise, we'll build a RAG-powered resume refinement tool that helps tailor resumes to specific job descriptions.\n",
" \n",
" What We'll Build\n",
" An AI assistant that takes a job description and current resume, then produces an optimized version using resume writing best practices.\n",
" \n",
" The Approach (RAG)\n",
" 1. **Generate Knowledge Base** - Use an LLM to create expert resume writing guides\n",
" 2. **Create Vector Database** - Store the knowledge in Chroma for semantic search\n",
" 3. **Build Interface** - Create a Gradio app where users can refine their resumes\n",
" \n",
" Steps\n",
" - **STEP 1**: Generate synthetic resume writing knowledge using LLM\n",
" - **STEP 2**: Load documents and create RAG with Chroma vector database\n",
" - **STEP 3**: Build Gradio interface for users to input job description and resume\n",
" \n",
" ---\n",
" \n",
" Let's get started! 🚀"
]
},
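{
"cell_type": "markdown",
"id": "a3f91b20",
"metadata": {},
"source": [
"Before we build the real thing, here is a tiny, self-contained sketch of the retrieve-then-generate idea behind RAG. It makes no API calls; the in-memory snippets and naive keyword scoring below are illustrative stand-ins for the Chroma semantic search we set up later."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d24c81",
"metadata": {},
"outputs": [],
"source": [
"# Minimal RAG sketch (illustrative only): retrieve the most relevant snippets,\n",
"# then stuff them into the prompt for the LLM. Later cells replace this naive\n",
"# keyword overlap with Chroma's embedding-based semantic search.\n",
"\n",
"toy_knowledge_base = [\n",
"    \"Quantify achievements with numbers, percentages, and dollar amounts.\",\n",
"    \"Mirror keywords from the job description to pass ATS screening.\",\n",
"    \"Start every bullet point with a strong action verb.\",\n",
"]\n",
"\n",
"def toy_retrieve(question, docs, k=2):\n",
"    \"\"\"Score docs by naive keyword overlap with the question and return the top k.\"\"\"\n",
"    q_words = set(question.lower().split())\n",
"    return sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)[:k]\n",
"\n",
"question = \"How do I tailor my resume to a job description?\"\n",
"context = \"\\n\".join(toy_retrieve(question, toy_knowledge_base))\n",
"print(f\"Use this guidance:\\n{context}\\n\\nQuestion: {question}\")"
]
},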
{
"cell_type": "code",
"execution_count": null,
"id": "1f889c1d",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import glob\n",
"from dotenv import load_dotenv\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3711bc34",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DirectoryLoader, TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.schema import Document\n",
"from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n",
"from langchain_chroma import Chroma\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.embeddings import HuggingFaceEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "840999d8",
"metadata": {},
"outputs": [],
"source": [
"# Configuration\n",
"MODEL = \"gpt-4o-mini\"\n",
"db_name = \"resume_vector_db\"\n",
"KNOWLEDGE_BASE_DIR = \"resume-knowledge-base\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4695238c",
"metadata": {},
"outputs": [],
"source": [
"#load environment variables\n",
"load_dotenv(override=True)\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')"
]
},
{
"cell_type": "markdown",
"id": "37ce61e4",
"metadata": {},
"source": [
"### STEP 1 - Programmatically Generate Synthetic Resume Knowledge Base"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6257788",
"metadata": {},
"outputs": [],
"source": [
"def generate_content_with_llm(topic, category):\n",
" \"\"\"Use LLM to generate content for a specific topic\"\"\"\n",
" \n",
" llm = ChatOpenAI(temperature=0.8, model_name=MODEL)\n",
" \n",
" prompts = {\n",
" \"best-practices\": f\"\"\"You are an expert resume writer and career coach. Write a comprehensive guide about: {topic}\n",
"\n",
" Create a detailed markdown document with:\n",
" - Clear section headers\n",
" - Specific, actionable advice\n",
" - Multiple concrete examples\n",
" - Do's and don'ts\n",
" - Real-world tips that hiring managers look for\n",
"\n",
" Write 500-800 words in markdown format. Be specific and practical.\"\"\",\n",
" \n",
" \"industry-specific\": f\"\"\"You are an expert resume writer specializing in {topic} industry resumes.\n",
"\n",
" Write a comprehensive industry guide covering:\n",
" - Key skills and technologies to highlight for {topic} roles\n",
" - How to structure experience for this industry\n",
" - Important keywords and terminology\n",
" - 5-8 example bullet points showing strong achievements with specific metrics\n",
" - Common mistakes to avoid\n",
" - What hiring managers in {topic} look for\n",
"\n",
" Write 600-900 words in markdown format with specific examples.\"\"\",\n",
" \n",
" \"examples\": f\"\"\"You are an expert resume writer. Create detailed examples for: {topic}\n",
"\n",
" Provide:\n",
" - 3-4 complete, realistic examples showing proper formatting\n",
" - Each example should include company name, dates, and 4-6 bullet points\n",
" - Bullet points must include quantified achievements (numbers, percentages, dollar amounts)\n",
" - Show variety in roles (junior, mid-level, senior)\n",
" - Use strong action verbs\n",
" - Demonstrate clear impact and results\n",
"\n",
" Write in markdown format. Make examples realistic and impressive.\"\"\",\n",
" \n",
" \"specialized\": f\"\"\"You are an expert in resume writing for {topic}.\n",
"\n",
" Create a comprehensive guide covering:\n",
" - Unique considerations for {topic}\n",
" - Best practices and formatting tips\n",
" - 6-10 strong example bullet points with metrics\n",
" - Common questions and how to address them\n",
" - What makes a standout resume in this area\n",
"\n",
" Write 500-700 words in markdown format.\"\"\"\n",
" }\n",
" \n",
" prompt = prompts.get(category, prompts[\"best-practices\"])\n",
" response = llm.invoke(prompt)\n",
" return response.content"
]
},
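{
"cell_type": "markdown",
"id": "c9e51f37",
"metadata": {},
"source": [
"Optional sanity check: generate a single guide before running the full (slower, roughly 22-call) generation below. This makes one OpenAI call and assumes your OPENAI_API_KEY is set; the topic string here is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1a86e42",
"metadata": {},
"outputs": [],
"source": [
"# Optional smoke test: generate one guide and preview the opening of the result.\n",
"sample_guide = generate_content_with_llm(\"ATS (Applicant Tracking System) Optimization\", \"best-practices\")\n",
"print(sample_guide[:500])"
]
},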
{
"cell_type": "code",
"execution_count": null,
"id": "6a3e0c62",
"metadata": {},
"outputs": [],
"source": [
"def create_resume_knowledge_base():\n",
" \"\"\"Programmatically generate comprehensive resume knowledge base using LLM\"\"\"\n",
" \n",
" print(\"🤖 Starting LLM-powered knowledge base generation...\")\n",
" print(\"⏳ This may take 2-3 minutes to generate all content...\\n\")\n",
" \n",
" # Create directory structure\n",
" os.makedirs(f\"{KNOWLEDGE_BASE_DIR}/best-practices\", exist_ok=True)\n",
" os.makedirs(f\"{KNOWLEDGE_BASE_DIR}/examples\", exist_ok=True)\n",
" os.makedirs(f\"{KNOWLEDGE_BASE_DIR}/industry-specific\", exist_ok=True)\n",
" os.makedirs(f\"{KNOWLEDGE_BASE_DIR}/specialized\", exist_ok=True)\n",
" \n",
" # Define topics for each category\n",
" topics = {\n",
" \"best-practices\": [\n",
" \"Resume Formatting and Structure\",\n",
" \"Powerful Action Verbs and Keywords\",\n",
" \"Quantifying Achievements and Impact\",\n",
" \"Tailoring Resume to Job Descriptions\",\n",
" \"ATS (Applicant Tracking System) Optimization\",\n",
" \"Common Resume Mistakes to Avoid\"\n",
" ],\n",
" \"industry-specific\": [\n",
" \"Software Engineering and Technology\",\n",
" \"Data Science and Machine Learning\",\n",
" \"Business and Marketing\",\n",
" \"Finance and Accounting\",\n",
" \"Healthcare and Medical\",\n",
" \"Product Management\"\n",
" ],\n",
" \"examples\": [\n",
" \"Strong Experience Section Examples\",\n",
" \"Skills Section Formatting\",\n",
" \"Project Descriptions for Technical Roles\",\n",
" \"Leadership and Management Achievements\",\n",
" \"Entry-Level Resume Examples\"\n",
" ],\n",
" \"specialized\": [\n",
" \"Career Changers and Transitions\",\n",
" \"Recent Graduates and Internships\",\n",
" \"Executive and C-Level Resumes\",\n",
" \"Freelance and Contract Work\",\n",
" \"Career Gaps and Explanations\"\n",
" ]\n",
" }\n",
" \n",
" total_files = sum(len(topic_list) for topic_list in topics.values())\n",
" current_file = 0\n",
" \n",
" # Generate content for each category and topic\n",
" for category, topic_list in topics.items():\n",
" for topic in topic_list:\n",
" current_file += 1\n",
" print(f\"[{current_file}/{total_files}] Generating: {category}/{topic}...\")\n",
" \n",
" try:\n",
" # Generate content using LLM\n",
" content = generate_content_with_llm(topic, category)\n",
" \n",
" # Create filename from topic\n",
" filename = topic.lower().replace(\" \", \"-\").replace(\"(\", \"\").replace(\")\", \"\") + \".md\"\n",
" filepath = f\"{KNOWLEDGE_BASE_DIR}/{category}/{filename}\"\n",
" \n",
" # Add title to content\n",
" full_content = f\"# {topic}\\n\\n{content}\"\n",
" \n",
" # Write to file\n",
" with open(filepath, \"w\", encoding=\"utf-8\") as f:\n",
" f.write(full_content)\n",
" \n",
" print(f\" ✅ Saved to {category}/{filename}\")\n",
" \n",
" except Exception as e:\n",
" print(f\" ❌ Error generating {topic}: {str(e)}\")\n",
" continue\n",
" \n",
" print(f\"\\n✅ Knowledge base generation complete!\")\n",
" print(f\"📁 Created {total_files} files across 4 categories:\")\n",
" print(f\" - {len(topics['best-practices'])} best practice guides\")\n",
" print(f\" - {len(topics['industry-specific'])} industry-specific guides\")\n",
" print(f\" - {len(topics['examples'])} example collections\")\n",
" print(f\" - {len(topics['specialized'])} specialized guides\")\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7257b2b",
"metadata": {},
"outputs": [],
"source": [
"# Run this to create the knowledge base\n",
"create_resume_knowledge_base()"
]
},
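{
"cell_type": "markdown",
"id": "e2b74d58",
"metadata": {},
"source": [
"A quick local check (no API calls) that the generated files landed where we expect:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4c93a69",
"metadata": {},
"outputs": [],
"source": [
"# Count the generated markdown files in each category folder.\n",
"for folder in sorted(glob.glob(f\"{KNOWLEDGE_BASE_DIR}/*\")):\n",
"    md_files = glob.glob(f\"{folder}/*.md\")\n",
"    print(f\"{os.path.basename(folder)}: {len(md_files)} files\")"
]
},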
{
"cell_type": "markdown",
"id": "292a8d84",
"metadata": {},
"source": [
"### Load and Process Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8c18a52",
"metadata": {},
"outputs": [],
"source": [
" # Read in documents using LangChain's loaders\n",
"folders = glob.glob(f\"{KNOWLEDGE_BASE_DIR}/*\")\n",
"\n",
"def add_metadata(doc, doc_type):\n",
" doc.metadata[\"doc_type\"] = doc_type\n",
" return doc\n",
"\n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
" doc_type = os.path.basename(folder)\n",
" loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n",
" folder_docs = loader.load()\n",
" documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])\n",
"\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"chunks = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "567829d5",
"metadata": {},
"outputs": [],
"source": [
"print(f\"Total number of chunks: {len(chunks)}\")\n",
"print(f\"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}\")"
]
},
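{
"cell_type": "markdown",
"id": "a8d15b7c",
"metadata": {},
"source": [
"It can help to eyeball one chunk to confirm the `doc_type` metadata and chunk size look reasonable (the index 0 here is arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9e26c8d",
"metadata": {},
"outputs": [],
"source": [
"# Peek at a single chunk: its metadata and the start of its content.\n",
"sample_chunk = chunks[0]\n",
"print(sample_chunk.metadata)\n",
"print(sample_chunk.page_content[:300])"
]
},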
{
"cell_type": "markdown",
"id": "12e5dfb1",
"metadata": {},
"source": [
"Create Vector Store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94239c9d",
"metadata": {},
"outputs": [],
"source": [
"# Using OpenAI embeddings (you can switch to HuggingFace for free alternative)\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"# Alternative free option:\n",
"# embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
"\n",
"# Delete if already exists\n",
"if os.path.exists(db_name):\n",
" Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()\n",
"\n",
"# Create vectorstore\n",
"vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)\n",
"print(f\"✅ Vectorstore created with {vectorstore._collection.count()} documents\")\n"
]
},
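{
"cell_type": "markdown",
"id": "c3f48d9e",
"metadata": {},
"source": [
"Before wiring up the chain, a quick retrieval sanity check using the vector store's `similarity_search` method; the query string is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5a69e1f",
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the chunks most similar to a sample query and show where each came from.\n",
"results = vectorstore.similarity_search(\"How do I quantify achievements on a resume?\", k=3)\n",
"for doc in results:\n",
"    print(doc.metadata[\"doc_type\"], \"-\", doc.page_content[:120].replace(\"\\n\", \" \"))"
]
},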
{
"cell_type": "markdown",
"id": "62554189",
"metadata": {},
"source": [
"Set up RAG Chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2b349f5",
"metadata": {},
"outputs": [],
"source": [
"\n",
"llm = ChatOpenAI(temperature=0.7, model_name=MODEL)\n",
"\n",
"# Alternative - use Ollama locally:\n",
"# llm = ChatOpenAI(temperature=0.7, model_name='llama3.2', base_url='http://localhost:11434/v1', api_key='ollama')\n",
"\n",
"memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n",
"retriever = vectorstore.as_retriever(search_kwargs={\"k\": 10})\n",
"conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)\n",
"\n",
"print(\"✅ RAG chain configured and ready\")\n"
]
},
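{
"cell_type": "markdown",
"id": "e6b71f2a",
"metadata": {},
"source": [
"A quick test question against the knowledge base (one LLM call). The memory is reset inside `refine_resume` later, so running this won't affect the app."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7c82a3b",
"metadata": {},
"outputs": [],
"source": [
"# Ask the RAG chain a question grounded in the generated knowledge base.\n",
"test_result = conversation_chain.invoke({\"question\": \"What are the most important tips for ATS optimization?\"})\n",
"print(test_result[\"answer\"])"
]
},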
{
"cell_type": "markdown",
"id": "280ad157",
"metadata": {},
"source": [
"Create Resume Refinement Function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f54e5573",
"metadata": {},
"outputs": [],
"source": [
"def refine_resume(job_description, current_resume, history=None):\n",
" \"\"\"\n",
" Refines a resume based on job description using RAG knowledge base\n",
" \"\"\"\n",
" # Reset memory for each new refinement\n",
" global conversation_chain, memory, llm, retriever\n",
" memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)\n",
" conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)\n",
" \n",
" prompt = f\"\"\"You are an expert resume writer with access to best practices and successful examples.\n",
"\n",
" JOB DESCRIPTION:\n",
" {job_description}\n",
"\n",
" CURRENT RESUME:\n",
" {current_resume}\n",
"\n",
" Please analyze the current resume and provide a refined version that:\n",
" 1. Aligns keywords and skills with the job description\n",
" 2. Uses strong action verbs and quantified achievements\n",
" 3. Follows formatting best practices\n",
" 4. Highlights most relevant experience for this role\n",
" 5. Removes or de-emphasizes less relevant information\n",
"\n",
" Provide the refined resume in a clear, professional format. Also include a brief \"KEY IMPROVEMENTS\" section at the end explaining the main changes you made and why.\n",
" \"\"\"\n",
" \n",
" result = conversation_chain.invoke({\"question\": prompt})\n",
" return result[\"answer\"]\n"
]
},
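{
"cell_type": "markdown",
"id": "a9d93b4c",
"metadata": {},
"source": [
"Optional end-to-end test of `refine_resume` with tiny made-up inputs before launching the UI (one LLM call; the job description and resume below are placeholders, not real data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1e04c5d",
"metadata": {},
"outputs": [],
"source": [
"# Exercise the full pipeline with short placeholder inputs.\n",
"sample_jd = \"Data Analyst role requiring SQL, Python, dashboarding, and stakeholder communication.\"\n",
"sample_resume = \"Jane Doe - Analyst at Acme Corp. Built reports in Excel. Knows SQL and Python.\"\n",
"print(refine_resume(sample_jd, sample_resume)[:800])"
]
},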
{
"cell_type": "markdown",
"id": "4efdfe8b",
"metadata": {},
"source": [
"Create Gradio Interface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dacb51de",
"metadata": {},
"outputs": [],
"source": [
"def create_gradio_interface():\n",
" with gr.Blocks(title=\"Expert Resume Creator\") as interface:\n",
" gr.Markdown(\"# 📄 Expert Resume Creator\")\n",
" gr.Markdown(\"Refine your resume using AI-powered best practices and tailored optimization\")\n",
" \n",
" with gr.Row():\n",
" with gr.Column():\n",
" job_desc_input = gr.Textbox(\n",
" label=\"Job Description\",\n",
" placeholder=\"Paste the job description here...\",\n",
" lines=10\n",
" )\n",
" resume_input = gr.Textbox(\n",
" label=\"Your Current Resume\",\n",
" placeholder=\"Paste your current resume here...\",\n",
" lines=15\n",
" )\n",
" submit_btn = gr.Button(\"✨ Refine My Resume\", variant=\"primary\", size=\"lg\")\n",
" \n",
" with gr.Column():\n",
" output = gr.Textbox(\n",
" label=\"Refined Resume\",\n",
" lines=30,\n",
" show_copy_button=True\n",
" )\n",
" \n",
" gr.Markdown(\"### 💡 Tips\")\n",
" gr.Markdown(\"\"\"\n",
" - Include complete job description with requirements and responsibilities\n",
" - Paste your full resume including experience, education, and skills\n",
" - The AI will optimize your resume to match the job requirements\n",
" - Review the KEY IMPROVEMENTS section to understand the changes\n",
" \"\"\")\n",
" \n",
" submit_btn.click(\n",
" fn=refine_resume,\n",
" inputs=[job_desc_input, resume_input],\n",
" outputs=output\n",
" )\n",
" \n",
" return interface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e01ddd13",
"metadata": {},
"outputs": [],
"source": [
"# Launch the interface\n",
"interface = create_gradio_interface()\n",
"interface.launch(inbrowser=True, share=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}