{
"cells": [
{
"cell_type": "markdown",
"id": "dfe37963-1af6-44fc-a841-8e462443f5e6",
"metadata": {},
"source": [
"## Expert Knowledge Worker\n",
"\n",
"### A question-answering agent that is an expert knowledge worker\n",
"### To be used by employees of Insurellm, an Insurance Tech company\n",
"### The agent needs to be accurate and the solution should be low cost.\n",
"\n",
"This project uses RAG (Retrieval Augmented Generation) to improve the accuracy of our question-answering assistant by grounding its answers in documents retrieved from the company knowledge base."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ba2779af-84ef-4227-9e9e-6eaf0df87e77",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os\n",
"import glob\n",
"from dotenv import load_dotenv\n",
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "802137aa-8a74-45e0-a487-d1974927d7ca",
"metadata": {},
"outputs": [],
"source": [
"# imports for langchain\n",
"\n",
"from langchain.document_loaders import DirectoryLoader, TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "58c85082-e417-4708-9efe-81a5d55d1424",
"metadata": {},
"outputs": [],
"source": [
"# Price is a factor for our company, so we use a low-cost model\n",
"\n",
"MODEL = \"gpt-4o-mini\"\n",
"db_name = \"vector_db\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ee78efcb-60fe-449e-a944-40bab26261af",
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables from a file called .env\n",
"\n",
"load_dotenv()\n",
"os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "730711a9-6ffe-4eee-8f48-d6cfb7314905",
"metadata": {},
"outputs": [],
"source": [
"# Read in documents using LangChain's loaders\n",
"# Take everything in all the sub-folders of our knowledge base\n",
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# Read files as UTF-8 explicitly; the platform default encoding can fail on some systems\n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
"    # Tag each document with its folder name so the document type is available later\n",
"    doc_type = os.path.basename(folder)\n",
"    loader = DirectoryLoader(folder, glob=\"**/*.md\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n",
"    folder_docs = loader.load()\n",
"    for doc in folder_docs:\n",
"        doc.metadata[\"doc_type\"] = doc_type\n",
"        documents.append(doc)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "252f17e9-3529-4e81-996c-cfa9f08e75a8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"31"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(documents)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7e8decb0-d9b0-4d51-8402-7a6174d22159",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': 'knowledge-base/employees/Maxine Thompson.md', 'doc_type': 'employees'}, page_content=\"# HR Record\\n\\n# Maxine Thompson\\n\\n## Summary\\n- **Date of Birth:** January 15, 1991 \\n- **Job Title:** Data Engineer \\n- **Location:** Austin, Texas \\n\\n## Insurellm Career Progression\\n- **January 2017 - October 2018**: **Junior Data Engineer** \\n * Maxine joined Insurellm as a Junior Data Engineer, focusing primarily on ETL processes and data integration tasks. She quickly learned Insurellm's data architecture, collaborating with other team members to streamline data workflows. \\n- **November 2018 - December 2020**: **Data Engineer** \\n * In her new role, Maxine expanded her responsibilities to include designing comprehensive data models and improving data quality measures. Though she excelled in technical skills, communication issues with non-technical teams led to some project delays. \\n- **January 2021 - Present**: **Senior Data Engineer** \\n * Maxine was promoted to Senior Data Engineer after successfully leading a pivotal project that improved data retrieval times by 30%. She now mentors junior engineers and is involved in strategic data initiatives, solidifying her position as a valued asset at Insurellm. She was recognized as Insurellm Innovator of the year in 2023, receiving the prestiguous IIOTY 2023 award. \\n\\n## Annual Performance History\\n- **2017**: *Meets Expectations* \\n Maxine showed potential in her role but struggled with initial project deadlines. Her adaptability and willingness to learn made positive impacts on her team. \\n\\n- **2018**: *Exceeds Expectations* \\n Maxine improved significantly, becoming a reliable team member with strong problem-solving skills. She took on leadership in a project that automated data entry processes. \\n\\n- **2019**: *Needs Improvement* \\n During this year, difficult personal circumstances affected Maxine's performance. She missed key deadlines and had several communication issues with stakeholders. \\n\\n- **2020**: *Meets Expectations* \\n Maxine focused on regaining her footing and excelling with technical skills. She was stable, though not standout, in her contributions. Feedback indicated a need for more proactivity. \\n\\n- **2021**: *Exceeds Expectations* \\n Maxine spearheaded the transition to a new data warehousing solution, significantly enhancing Insurellm’s data analytics capabilities. This major achievement bolstered her reputation within the company. \\n\\n- **2022**: *Outstanding* \\n Maxine continued her upward trajectory, successfully implementing machine learning algorithms to predict customer behavior, which was well-received by the leadership team and improved client satisfaction. \\n\\n- **2023**: *Exceeds Expectations* \\n Maxine has taken on mentoring responsibilities and is leading a cross-functional team for data governance initiatives, showcasing her leadership and solidifying her role at Insurellm. \\n\\n## Compensation History\\n- **2017**: $70,000 (Junior Data Engineer) \\n- **2018**: $75,000 (Junior Data Engineer) \\n- **2019**: $80,000 (Data Engineer) \\n- **2020**: $84,000 (Data Engineer) \\n- **2021**: $95,000 (Senior Data Engineer) \\n- **2022**: $110,000 (Senior Data Engineer) \\n- **2023**: $120,000 (Senior Data Engineer) \\n\\n## Other HR Notes\\n- Maxine participated in various company-sponsored trainings related to big data technologies and cloud infrastructure. \\n- She was recognized for her contributions with the “Insurellm Innovator Award” in 2022. \\n- Maxine is currently involved in the women-in-tech initiative and participates in mentorship programs to guide junior employees. \\n- Future development areas include improving her stakeholder communication skills to ensure smoother project transitions and collaboration. \")"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents[24]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7310c9c8-03c1-4efc-a104-5e89aec6db1a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1088, which is longer than the specified 1000\n"
]
}
],
"source": [
"# CharacterTextSplitter splits on blank lines (\"\\n\\n\") by default, so a single\n",
"# paragraph longer than chunk_size is kept whole and triggers a warning\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"chunks = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cd06e02f-6d9b-44cc-a43d-e1faa8acc7bb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"123"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(chunks)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d2562754-9052-4aae-92c1-37236435ea06",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': 'knowledge-base/products/Markellm.md', 'doc_type': 'products'}, page_content='- **User-Friendly Interface**: Designed with user experience in mind, Markellm features an intuitive interface that allows consumers to easily browse and compare various insurance offerings from multiple providers.\\n\\n- **Real-Time Quotes**: Consumers can receive real-time quotes from different insurance companies, empowering them to make informed decisions quickly without endless back-and-forth communication.\\n\\n- **Customized Recommendations**: Based on user profiles and preferences, Markellm provides personalized insurance recommendations, ensuring consumers find the right coverage at competitive rates.\\n\\n- **Secure Transactions**: Markellm prioritizes security, employing robust encryption methods to ensure that all transactions and data exchanges are safe and secure.\\n\\n- **Customer Support**: Our dedicated support team is always available to assist both consumers and insurers throughout the process, providing guidance and answering any questions that may arise.')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chunks[6]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "2c54b4b6-06da-463d-bee7-4dd456c2b887",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document types found: employees, contracts, company, products\n"
]
}
],
"source": [
"doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)\n",
"print(f\"Document types found: {', '.join(doc_types)}\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "128c73f7-f149-4904-a554-8140941fce0c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='## Support\n",
"\n",
"1. **Customer Support**: Velocity Auto Solutions will have access to Insurellm’s customer support team via email or chatbot, available 24/7. \n",
"2. **Technical Maintenance**: Regular maintenance and updates to the Carllm platform will be conducted by Insurellm, with any downtime communicated in advance. \n",
"3. **Training & Resources**: Initial training sessions will be provided for Velocity Auto Solutions’ staff to ensure effective use of the Carllm suite. Regular resources and documentation will be made available online.\n",
"\n",
"---\n",
"\n",
"**Accepted and Agreed:** \n",
"**For Velocity Auto Solutions** \n",
"Signature: _____________________ \n",
"Name: John Doe \n",
"Title: CEO \n",
"Date: _____________________ \n",
"\n",
"**For Insurellm** \n",
"Signature: _____________________ \n",
"Name: Jane Smith \n",
"Title: VP of Sales \n",
"Date: _____________________' metadata={'source': 'knowledge-base/contracts/Contract with Velocity Auto Solutions for Carllm.md', 'doc_type': 'contracts'}\n",
"_________\n",
"page_content='3. **Regular Updates:** Insurellm will offer ongoing updates and enhancements to the Homellm platform, including new features and security improvements.\n",
"\n",
"4. **Feedback Implementation:** Insurellm will actively solicit feedback from GreenValley Insurance to ensure Homellm continues to meet their evolving needs.\n",
"\n",
"---\n",
"\n",
"**Signatures:**\n",
"\n",
"_________________________________ \n",
"**[Name]** \n",
"**Title**: CEO \n",
"**Insurellm, Inc.**\n",
"\n",
"_________________________________ \n",
"**[Name]** \n",
"**Title**: COO \n",
"**GreenValley Insurance, LLC** \n",
"\n",
"---\n",
"\n",
"This agreement represents the complete understanding of both parties regarding the use of the Homellm product and supersedes any prior agreements or communications.' metadata={'source': 'knowledge-base/contracts/Contract with GreenValley Insurance for Homellm.md', 'doc_type': 'contracts'}\n",
"_________\n",
"page_content='# Avery Lancaster\n",
"\n",
"## Summary\n",
"- **Date of Birth**: March 15, 1985 \n",
"- **Job Title**: Co-Founder & Chief Executive Officer (CEO) \n",
"- **Location**: San Francisco, California \n",
"\n",
"## Insurellm Career Progression\n",
"- **2015 - Present**: Co-Founder & CEO \n",
" Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market. \n",
"\n",
"- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions \n",
" Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.' metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}\n",
"_________\n"
]
}
],
"source": [
"for chunk in chunks:\n",
"    if 'CEO' in chunk.page_content:\n",
"        print(chunk)\n",
"        print(\"_________\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6965971c-fb97-482c-a497-4e81a0ac83df",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}