Files
LLM_Engineering_OLD/week5/day2.ipynb
2024-09-16 10:45:25 -04:00

323 lines
14 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"cell_type": "markdown",
"id": "dfe37963-1af6-44fc-a841-8e462443f5e6",
"metadata": {},
"source": [
"## Expert Knowledge Worker\n",
"\n",
"### A question answering agent that is an expert knowledge worker\n",
"### To be used by employees of Insurellm, an Insurance Tech company\n",
"### The agent needs to be accurate and the solution should be low cost.\n",
"\n",
"This project will use RAG (Retrieval Augmented Generation) to ensure our question/answering assistant has high accuracy."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ba2779af-84ef-4227-9e9e-6eaf0df87e77",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import glob\n",
"from dotenv import load_dotenv\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "802137aa-8a74-45e0-a487-d1974927d7ca",
"metadata": {},
"outputs": [],
"source": [
"# imports for langchain\n",
"\n",
"from langchain.document_loaders import DirectoryLoader, TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "58c85082-e417-4708-9efe-81a5d55d1424",
"metadata": {},
"outputs": [],
"source": [
"# price is a factor for our company, so we're going to use a low cost model\n",
"\n",
"MODEL = \"gpt-4o-mini\"\n",
"db_name = \"vector_db\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ee78efcb-60fe-449e-a944-40bab26261af",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables in a file called .env\n",
"\n",
"load_dotenv()\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "730711a9-6ffe-4eee-8f48-d6cfb7314905",
"metadata": {},
"outputs": [],
"source": [
"# Read in documents using LangChain's loaders\n",
"# Take everything in all the sub-folders of our knowledgebase\n",
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
" doc_type = os.path.basename(folder)\n",
" loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader)\n",
" folder_docs = loader.load()\n",
" for doc in folder_docs:\n",
" doc.metadata[\"doc_type\"] = doc_type\n",
" documents.append(doc)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "252f17e9-3529-4e81-996c-cfa9f08e75a8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"31"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(documents)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7e8decb0-d9b0-4d51-8402-7a6174d22159",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': 'knowledge-base/employees/Maxine Thompson.md', 'doc_type': 'employees'}, page_content=\"# HR Record\\n\\n# Maxine Thompson\\n\\n## Summary\\n- **Date of Birth:** January 15, 1991 \\n- **Job Title:** Data Engineer \\n- **Location:** Austin, Texas \\n\\n## Insurellm Career Progression\\n- **January 2017 - October 2018**: **Junior Data Engineer** \\n * Maxine joined Insurellm as a Junior Data Engineer, focusing primarily on ETL processes and data integration tasks. She quickly learned Insurellm's data architecture, collaborating with other team members to streamline data workflows. \\n- **November 2018 - December 2020**: **Data Engineer** \\n * In her new role, Maxine expanded her responsibilities to include designing comprehensive data models and improving data quality measures. Though she excelled in technical skills, communication issues with non-technical teams led to some project delays. \\n- **January 2021 - Present**: **Senior Data Engineer** \\n * Maxine was promoted to Senior Data Engineer after successfully leading a pivotal project that improved data retrieval times by 30%. She now mentors junior engineers and is involved in strategic data initiatives, solidifying her position as a valued asset at Insurellm. She was recognized as Insurellm Innovator of the year in 2023, receiving the prestiguous IIOTY 2023 award. \\n\\n## Annual Performance History\\n- **2017**: *Meets Expectations* \\n Maxine showed potential in her role but struggled with initial project deadlines. Her adaptability and willingness to learn made positive impacts on her team. \\n\\n- **2018**: *Exceeds Expectations* \\n Maxine improved significantly, becoming a reliable team member with strong problem-solving skills. She took on leadership in a project that automated data entry processes. \\n\\n- **2019**: *Needs Improvement* \\n During this year, difficult personal circumstances affected Maxine's performance. She missed key deadlines and had several communication issues with stakeholders. \\n\\n- **2020**: *Meets Expectations* \\n Maxine focused on regaining her footing and excelling with technical skills. She was stable, though not standout, in her contributions. Feedback indicated a need for more proactivity. \\n\\n- **2021**: *Exceeds Expectations* \\n Maxine spearheaded the transition to a new data warehousing solution, significantly enhancing Insurellms data analytics capabilities. This major achievement bolstered her reputation within the company. \\n\\n- **2022**: *Outstanding* \\n Maxine continued her upward trajectory, successfully implementing machine learning algorithms to predict customer behavior, which was well-received by the leadership team and improved client satisfaction. \\n\\n- **2023**: *Exceeds Expectations* \\n Maxine has taken on mentoring responsibilities and is leading a cross-functional team for data governance initiatives, showcasing her leadership and solidifying her role at Insurellm. \\n\\n## Compensation History\\n- **2017**: $70,000 (Junior Data Engineer) \\n- **2018**: $75,000 (Junior Data Engineer) \\n- **2019**: $80,000 (Data Engineer) \\n- **2020**: $84,000 (Data Engineer) \\n- **2021**: $95,000 (Senior Data Engineer) \\n- **2022**: $110,000 (Senior Data Engineer) \\n- **2023**: $120,000 (Senior Data Engineer) \\n\\n## Other HR Notes\\n- Maxine participated in various company-sponsored trainings related to big data technologies and cloud infrastructure. \\n- She was recognized for her contributions with the “Insurellm Innovator Award” in 2022. \\n- Maxine is currently involved in the women-in-tech initiative and participates in mentorship programs to guide junior employees. \\n- Future development areas include improving her stakeholder communication skills to ensure smoother project transitions and collaboration. \")"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents[24]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7310c9c8-03c1-4efc-a104-5e89aec6db1a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1088, which is longer than the specified 1000\n"
]
}
],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"chunks = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cd06e02f-6d9b-44cc-a43d-e1faa8acc7bb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"123"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(chunks)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d2562754-9052-4aae-92c1-37236435ea06",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': 'knowledge-base/products/Markellm.md', 'doc_type': 'products'}, page_content='- **User-Friendly Interface**: Designed with user experience in mind, Markellm features an intuitive interface that allows consumers to easily browse and compare various insurance offerings from multiple providers.\\n\\n- **Real-Time Quotes**: Consumers can receive real-time quotes from different insurance companies, empowering them to make informed decisions quickly without endless back-and-forth communication.\\n\\n- **Customized Recommendations**: Based on user profiles and preferences, Markellm provides personalized insurance recommendations, ensuring consumers find the right coverage at competitive rates.\\n\\n- **Secure Transactions**: Markellm prioritizes security, employing robust encryption methods to ensure that all transactions and data exchanges are safe and secure.\\n\\n- **Customer Support**: Our dedicated support team is always available to assist both consumers and insurers throughout the process, providing guidance and answering any questions that may arise.')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chunks[6]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "2c54b4b6-06da-463d-bee7-4dd456c2b887",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document types found: employees, contracts, company, products\n"
]
}
],
"source": [
"doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)\n",
"print(f\"Document types found: {', '.join(doc_types)}\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "128c73f7-f149-4904-a554-8140941fce0c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='## Support\n",
"\n",
"1. **Customer Support**: Velocity Auto Solutions will have access to Insurellms customer support team via email or chatbot, available 24/7. \n",
"2. **Technical Maintenance**: Regular maintenance and updates to the Carllm platform will be conducted by Insurellm, with any downtime communicated in advance. \n",
"3. **Training & Resources**: Initial training sessions will be provided for Velocity Auto Solutions staff to ensure effective use of the Carllm suite. Regular resources and documentation will be made available online.\n",
"\n",
"---\n",
"\n",
"**Accepted and Agreed:** \n",
"**For Velocity Auto Solutions** \n",
"Signature: _____________________ \n",
"Name: John Doe \n",
"Title: CEO \n",
"Date: _____________________ \n",
"\n",
"**For Insurellm** \n",
"Signature: _____________________ \n",
"Name: Jane Smith \n",
"Title: VP of Sales \n",
"Date: _____________________' metadata={'source': 'knowledge-base/contracts/Contract with Velocity Auto Solutions for Carllm.md', 'doc_type': 'contracts'}\n",
"_________\n",
"page_content='3. **Regular Updates:** Insurellm will offer ongoing updates and enhancements to the Homellm platform, including new features and security improvements.\n",
"\n",
"4. **Feedback Implementation:** Insurellm will actively solicit feedback from GreenValley Insurance to ensure Homellm continues to meet their evolving needs.\n",
"\n",
"---\n",
"\n",
"**Signatures:**\n",
"\n",
"_________________________________ \n",
"**[Name]** \n",
"**Title**: CEO \n",
"**Insurellm, Inc.**\n",
"\n",
"_________________________________ \n",
"**[Name]** \n",
"**Title**: COO \n",
"**GreenValley Insurance, LLC** \n",
"\n",
"---\n",
"\n",
"This agreement represents the complete understanding of both parties regarding the use of the Homellm product and supersedes any prior agreements or communications.' metadata={'source': 'knowledge-base/contracts/Contract with GreenValley Insurance for Homellm.md', 'doc_type': 'contracts'}\n",
"_________\n",
"page_content='# Avery Lancaster\n",
"\n",
"## Summary\n",
"- **Date of Birth**: March 15, 1985 \n",
"- **Job Title**: Co-Founder & Chief Executive Officer (CEO) \n",
"- **Location**: San Francisco, California \n",
"\n",
"## Insurellm Career Progression\n",
"- **2015 - Present**: Co-Founder & CEO \n",
" Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market. \n",
"\n",
"- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions \n",
" Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.' metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}\n",
"_________\n"
]
}
],
"source": [
"for chunk in chunks:\n",
" if 'CEO' in chunk.page_content:\n",
" print(chunk)\n",
" print(\"_________\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6965971c-fb97-482c-a497-4e81a0ac83df",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}