Merge pull request #911 from sach91/sach91-bootcamp-wk8
sach91 bootcamp week8 exercise
community-contributions/sach91-bootcamp/week8/README.md (new file, 259 lines)
@@ -0,0 +1,259 @@
# 🧠 KnowledgeHub - Personal Knowledge Management & Research Assistant

An elegant, fully local AI-powered knowledge management system that helps you organize, search, and understand your documents using state-of-the-art LLM technology.

## ✨ Features

### 🎯 Core Capabilities
- **📤 Document Ingestion**: Upload PDF, DOCX, TXT, MD, and HTML files
- **❓ Intelligent Q&A**: Ask questions and get answers from your documents using RAG
- **📝 Smart Summarization**: Generate concise summaries with key points
- **🔗 Connection Discovery**: Find relationships between documents
- **💾 Multi-format Export**: Export as Markdown, HTML, or plain text
- **📊 Statistics Dashboard**: Track your knowledge base growth

### 🔒 Privacy-First
- **100% Local Processing**: All data stays on your machine
- **No Cloud Dependencies**: Uses Ollama for local LLM inference
- **Open Source**: Full transparency and control

### ⚡ Technology Stack
- **LLM**: Ollama with Llama 3.2 (3B) or Llama 3.1 (8B)
- **Embeddings**: sentence-transformers (all-MiniLM-L6-v2)
- **Vector Database**: ChromaDB
- **UI**: Gradio
- **Document Processing**: pypdf, python-docx, beautifulsoup4

## 🚀 Quick Start

### Prerequisites

1. **Python 3.8+** installed
2. **Ollama** installed and running

#### Installing Ollama

**macOS/Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows:**
Download from [ollama.com/download](https://ollama.com/download)

### Installation

1. **Clone or download this repository**

2. **Install Python dependencies:**
```bash
pip install -r requirements.txt
```

3. **Pull a Llama model using Ollama:**
```bash
# For faster inference (recommended for most users)
ollama pull llama3.2

# OR for better quality (requires more RAM)
ollama pull llama3.1
```

4. **Start the Ollama server** (if not already running):
```bash
ollama serve
```

5. **Launch KnowledgeHub:**
```bash
python app.py
```

The application will open in your browser at `http://127.0.0.1:7860`.

## 📖 Usage Guide

### 1. Upload Documents
- Go to the "Upload Documents" tab
- Select a file (PDF, DOCX, TXT, MD, or HTML)
- Click "Upload & Process"
- The document will be chunked and stored in your local vector database

### 2. Ask Questions
- Go to the "Ask Questions" tab
- Type your question in natural language
- Adjust the number of sources to retrieve (default: 5)
- Click "Ask" to get an AI-generated answer with sources

### 3. Summarize Documents
- Go to the "Summarize" tab
- Select a document from the dropdown
- Click "Generate Summary"
- Get a concise summary with key points

### 4. Find Connections
- Go to the "Find Connections" tab
- Select a document to analyze
- Adjust how many related documents to find
- See documents that are semantically similar

### 5. Export Knowledge
- Go to the "Export" tab
- Choose your format (Markdown, HTML, or Text)
- Click "Export" to download your knowledge base

### 6. View Statistics
- Go to the "Statistics" tab
- See an overview of your knowledge base
- Track total documents, chunks, and characters
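
The six tabs map onto the agents described in the next sections, and the same workflow can be driven from Python. A minimal sketch, assuming this repo's layout with `KnowledgeHub` defined in `app.py` (the file path is hypothetical):

```python
from app import KnowledgeHub

hub = KnowledgeHub()

# Upload: ingest a file directly through the ingestion agent
doc = hub.ingestion_agent.process("notes/llm_paper.pdf")

# Ask: query the stored chunks via the question agent
result = hub.question_agent.process("What problem does the paper address?", top_k=5)
print(result["answer"])
for src in result["sources"]:
    print(f"- {src['document']} ({src['score']:.0%})")
```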

## 🏗️ Architecture

```
KnowledgeHub/
├── agents/                  # Specialized AI agents
│   ├── base_agent.py        # Base class for all agents
│   ├── ingestion_agent.py   # Document processing
│   ├── question_agent.py    # RAG-based Q&A
│   ├── summary_agent.py     # Summarization
│   ├── connection_agent.py  # Finding relationships
│   └── export_agent.py      # Exporting data
├── models/                  # Data models
│   ├── document.py          # Document structures
│   └── knowledge_graph.py   # Graph structures
├── utils/                   # Utilities
│   ├── ollama_client.py     # Ollama API wrapper
│   ├── embeddings.py        # Embedding generation
│   └── document_parser.py   # File parsing
├── vectorstore/             # ChromaDB storage (auto-created)
├── temp_uploads/            # Temporary file storage (auto-created)
├── app.py                   # Main Gradio application
└── requirements.txt         # Python dependencies
```

## 🎯 Multi-Agent Framework

KnowledgeHub uses a multi-agent architecture:

1. **Ingestion Agent**: Parses documents, creates chunks, generates embeddings
2. **Question Agent**: Retrieves relevant context and answers questions
3. **Summary Agent**: Creates concise summaries and extracts key points
4. **Connection Agent**: Finds semantic relationships between documents
5. **Export Agent**: Formats and exports knowledge in multiple formats

Each agent is independent, reusable, and focused on a single task, following best practices in agentic AI development.
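
A minimal sketch of that composition, mirroring what `app.py` does at startup (a shared Ollama client and a shared ChromaDB collection passed into each agent; paths and names follow this repo's defaults):

```python
import chromadb

from utils.ollama_client import OllamaClient
from utils.embeddings import EmbeddingModel
from agents import IngestionAgent, QuestionAgent

client = chromadb.PersistentClient(path="./vectorstore")
collection = client.get_or_create_collection(name="knowledge_base")

llm = OllamaClient(model="llama3.2")   # one client shared by every agent
embedder = EmbeddingModel()

ingest = IngestionAgent(collection=collection, embedding_model=embedder, llm_client=llm)
qa = QuestionAgent(collection=collection, embedding_model=embedder, llm_client=llm)

ingest.process("temp_uploads/example.txt")   # hypothetical file
print(qa.process("What is this document about?")["answer"])
```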

## ⚙️ Configuration

### Changing Models

Edit `app.py` to use a different model:

```python
# For Llama 3.1 8B (better quality, more RAM)
self.llm_client = OllamaClient(model="llama3.1")

# For Llama 3.2 3B (faster, less RAM)
self.llm_client = OllamaClient(model="llama3.2")
```

### Adjusting Chunk Size

Edit `agents/ingestion_agent.py`:

```python
self.parser = DocumentParser(
    chunk_size=1000,    # Characters per chunk
    chunk_overlap=200   # Overlap between chunks
)
```

### Changing the Embedding Model

Edit `app.py`:

```python
self.embedding_model = EmbeddingModel(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

## 🔧 Troubleshooting

### "Cannot connect to Ollama"
- Ensure Ollama is installed: `ollama --version`
- Start the Ollama service: `ollama serve`
- Verify the model is pulled: `ollama list` (see the Python connectivity check below)
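
If the CLI checks pass but the app still reports a connection error, a quick probe from Python can isolate the client side. A sketch using this repo's `OllamaClient` (its `check_connection()` is the same call `app.py` makes at startup):

```python
from utils.ollama_client import OllamaClient

client = OllamaClient(model="llama3.2")
if client.check_connection():
    print("Ollama is reachable")
else:
    print("No connection - is `ollama serve` running?")
```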

### "Module not found" errors
- Ensure all dependencies are installed: `pip install -r requirements.txt`
- Try upgrading pip: `pip install --upgrade pip`

### "Out of memory" errors
- Use Llama 3.2 (3B) instead of Llama 3.1 (8B)
- Reduce `chunk_size` in the document parser
- Process fewer documents at once

### Slow response times
- Ensure you're using a CUDA-enabled GPU (if available)
- Reduce the number of retrieved chunks (the `top_k` parameter)
- Use a smaller model (llama3.2)

## 🎓 Learning Resources

This project demonstrates key concepts in LLM engineering:

- **RAG (Retrieval Augmented Generation)**: Combining retrieval with generation
- **Vector Databases**: Using ChromaDB for semantic search
- **Multi-Agent Systems**: Specialized agents working together
- **Embeddings**: Semantic representation of text
- **Local LLM Deployment**: Using Ollama for privacy-focused AI

## 📊 Performance

**Hardware Requirements:**
- Minimum: 8GB RAM, CPU only
- Recommended: 16GB RAM, NVIDIA GPU with CUDA
- Optimal: 32GB RAM, GPU (RTX 3060 or better)

**Processing Speed** (Llama 3.2 on an M1 Mac):
- Document ingestion: ~2-5 seconds per page
- Question answering: ~5-15 seconds
- Summarization: ~10-20 seconds

## 🤝 Contributing

This is a learning project showcasing LLM engineering principles. Feel free to:
- Experiment with different models
- Add new agents for specialized tasks
- Improve the UI
- Optimize performance

## 📄 License

This project is open source and available for educational purposes.

## 🙏 Acknowledgments

Built with:
- [Ollama](https://ollama.com/) - Local LLM runtime
- [Gradio](https://gradio.app/) - UI framework
- [ChromaDB](https://www.trychroma.com/) - Vector database
- [Sentence Transformers](https://www.sbert.net/) - Embeddings
- [Llama](https://ai.meta.com/llama/) - Meta's open-source LLMs

## 🎯 Next Steps

Potential enhancements:
1. Add support for images and diagrams
2. Implement multi-document chat history
3. Build a visual knowledge graph
4. Add collaborative features
5. Create a mobile app interface
6. Implement advanced filters and search
7. Add citation tracking
8. Create automated study guides

---

**Made with ❤️ for the LLM Engineering Community**
community-contributions/sach91-bootcamp/week8/agents/__init__.py (new file, 18 lines)
@@ -0,0 +1,18 @@
"""
KnowledgeHub Agents
"""
from .base_agent import BaseAgent
from .ingestion_agent import IngestionAgent
from .question_agent import QuestionAgent
from .summary_agent import SummaryAgent
from .connection_agent import ConnectionAgent
from .export_agent import ExportAgent

__all__ = [
    'BaseAgent',
    'IngestionAgent',
    'QuestionAgent',
    'SummaryAgent',
    'ConnectionAgent',
    'ExportAgent'
]
community-contributions/sach91-bootcamp/week8/agents/base_agent.py (new file, 91 lines)
@@ -0,0 +1,91 @@
"""
Base Agent class - Foundation for all specialized agents
"""
from abc import ABC, abstractmethod
import logging
from typing import Optional, Dict, Any
from utils.ollama_client import OllamaClient

logger = logging.getLogger(__name__)


class BaseAgent(ABC):
    """Abstract base class for all agents"""

    def __init__(self, name: str, llm_client: Optional[OllamaClient] = None,
                 model: str = "llama3.2"):
        """
        Initialize base agent

        Args:
            name: Agent name for logging
            llm_client: Shared Ollama client (creates a new one if None)
            model: Ollama model to use
        """
        self.name = name
        self.model = model

        # Use the shared client or create a new one
        if llm_client is None:
            self.llm = OllamaClient(model=model)
            logger.info(f"{self.name} initialized with new LLM client (model: {model})")
        else:
            self.llm = llm_client
            logger.info(f"{self.name} initialized with shared LLM client (model: {model})")

    def generate(self, prompt: str, system: Optional[str] = None,
                 temperature: float = 0.7, max_tokens: int = 2048) -> str:
        """
        Generate text using the LLM

        Args:
            prompt: User prompt
            system: System message (optional)
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate

        Returns:
            Generated text
        """
        logger.info(f"{self.name} generating response")
        response = self.llm.generate(
            prompt=prompt,
            system=system,
            temperature=temperature,
            max_tokens=max_tokens
        )
        logger.debug(f"{self.name} generated {len(response)} characters")
        return response

    def chat(self, messages: list, temperature: float = 0.7,
             max_tokens: int = 2048) -> str:
        """
        Chat completion with message history

        Args:
            messages: List of message dicts with 'role' and 'content'
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate

        Returns:
            Generated text
        """
        logger.info(f"{self.name} processing chat with {len(messages)} messages")
        response = self.llm.chat(
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens
        )
        logger.debug(f"{self.name} generated {len(response)} characters")
        return response

    @abstractmethod
    def process(self, *args, **kwargs) -> Any:
        """
        Main processing method - must be implemented by subclasses

        Each agent implements its specialized logic here
        """
        pass

    def __str__(self):
        return f"{self.name} (model: {self.model})"
community-contributions/sach91-bootcamp/week8/agents/connection_agent.py (new file, 289 lines)
@@ -0,0 +1,289 @@
"""
Connection Agent - Finds relationships and connections between documents
"""
import logging
from typing import List, Dict, Tuple
from agents.base_agent import BaseAgent
from models.knowledge_graph import KnowledgeNode, KnowledgeEdge, KnowledgeGraph
from utils.embeddings import EmbeddingModel
import chromadb
import numpy as np

logger = logging.getLogger(__name__)


class ConnectionAgent(BaseAgent):
    """Agent that discovers connections between documents and concepts"""

    def __init__(self, collection: chromadb.Collection,
                 embedding_model: EmbeddingModel,
                 llm_client=None, model: str = "llama3.2"):
        """
        Initialize connection agent

        Args:
            collection: ChromaDB collection with documents
            embedding_model: Model for computing similarities
            llm_client: Optional shared LLM client
            model: Ollama model name
        """
        super().__init__(name="ConnectionAgent", llm_client=llm_client, model=model)

        self.collection = collection
        self.embedding_model = embedding_model

        logger.info(f"{self.name} initialized")

    def process(self, document_id: str = None, query: str = None,
                top_k: int = 5) -> Dict:
        """
        Find documents related to a document or query

        Args:
            document_id: ID of the reference document
            query: Search query (used if document_id is not provided)
            top_k: Number of related documents to find

        Returns:
            Dictionary with related documents and connections
        """
        if document_id:
            logger.info(f"{self.name} finding connections for document: {document_id}")
            return self._find_related_to_document(document_id, top_k)
        elif query:
            logger.info(f"{self.name} finding connections for query: {query[:100]}")
            return self._find_related_to_query(query, top_k)
        else:
            return {'related': [], 'error': 'No document_id or query provided'}

    def _find_related_to_document(self, document_id: str, top_k: int) -> Dict:
        """Find documents related to a specific document"""
        try:
            # Get chunks from the document
            results = self.collection.get(
                where={"document_id": document_id},
                include=['embeddings', 'documents', 'metadatas']
            )

            if not results['ids']:
                return {'related': [], 'error': 'Document not found'}

            # Use the first chunk's embedding as representative
            query_embedding = results['embeddings'][0]
            document_name = results['metadatas'][0].get('filename', 'Unknown')

            # Search for similar chunks from OTHER documents
            search_results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k * 3,  # Get more to filter out the same document
                include=['documents', 'metadatas', 'distances']
            )

            # Filter out chunks from the same document
            related = []
            seen_docs = set([document_id])

            if search_results['ids']:
                for i in range(len(search_results['ids'][0])):
                    related_doc_id = search_results['metadatas'][0][i].get('document_id')

                    if related_doc_id not in seen_docs:
                        seen_docs.add(related_doc_id)

                        similarity = 1.0 - search_results['distances'][0][i]

                        related.append({
                            'document_id': related_doc_id,
                            'document_name': search_results['metadatas'][0][i].get('filename', 'Unknown'),
                            'similarity': float(similarity),
                            'preview': search_results['documents'][0][i][:150] + "..."
                        })

                        if len(related) >= top_k:
                            break

            return {
                'source_document': document_name,
                'source_id': document_id,
                'related': related,
                'num_related': len(related)
            }

        except Exception as e:
            logger.error(f"Error finding related documents: {e}")
            return {'related': [], 'error': str(e)}

    def _find_related_to_query(self, query: str, top_k: int) -> Dict:
        """Find documents related to a query"""
        try:
            # Generate the query embedding
            query_embedding = self.embedding_model.embed_query(query)

            # Search
            results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k * 2,  # Get more to deduplicate by document
                include=['documents', 'metadatas', 'distances']
            )

            # Deduplicate by document
            related = []
            seen_docs = set()

            if results['ids']:
                for i in range(len(results['ids'][0])):
                    doc_id = results['metadatas'][0][i].get('document_id')

                    if doc_id not in seen_docs:
                        seen_docs.add(doc_id)

                        similarity = 1.0 - results['distances'][0][i]

                        related.append({
                            'document_id': doc_id,
                            'document_name': results['metadatas'][0][i].get('filename', 'Unknown'),
                            'similarity': float(similarity),
                            'preview': results['documents'][0][i][:150] + "..."
                        })

                        if len(related) >= top_k:
                            break

            return {
                'query': query,
                'related': related,
                'num_related': len(related)
            }

        except Exception as e:
            logger.error(f"Error finding related documents: {e}")
            return {'related': [], 'error': str(e)}

    def build_knowledge_graph(self, similarity_threshold: float = 0.7) -> KnowledgeGraph:
        """
        Build a knowledge graph showing document relationships

        Args:
            similarity_threshold: Minimum similarity to create an edge

        Returns:
            KnowledgeGraph object
        """
        logger.info(f"{self.name} building knowledge graph")

        graph = KnowledgeGraph()

        try:
            # Get all documents
            all_results = self.collection.get(
                include=['embeddings', 'metadatas']
            )

            if not all_results['ids']:
                return graph

            # Group by document, keeping the first chunk's embedding as representative
            documents = {}
            for i, metadata in enumerate(all_results['metadatas']):
                doc_id = metadata.get('document_id')
                if doc_id not in documents:
                    documents[doc_id] = {
                        'name': metadata.get('filename', 'Unknown'),
                        'embedding': all_results['embeddings'][i]
                    }

            # Create nodes
            for doc_id, doc_data in documents.items():
                node = KnowledgeNode(
                    id=doc_id,
                    name=doc_data['name'],
                    node_type='document',
                    description=f"Document: {doc_data['name']}"
                )
                graph.add_node(node)

            # Create edges based on similarity
            doc_ids = list(documents.keys())
            for i, doc_id1 in enumerate(doc_ids):
                emb1 = np.array(documents[doc_id1]['embedding'])

                for doc_id2 in doc_ids[i+1:]:
                    emb2 = np.array(documents[doc_id2]['embedding'])

                    # Cosine similarity between the two document embeddings
                    similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

                    if similarity >= similarity_threshold:
                        edge = KnowledgeEdge(
                            source_id=doc_id1,
                            target_id=doc_id2,
                            relationship='similar_to',
                            weight=float(similarity)
                        )
                        graph.add_edge(edge)

            logger.info(f"{self.name} built graph with {len(graph.nodes)} nodes and {len(graph.edges)} edges")
            return graph

        except Exception as e:
            logger.error(f"Error building knowledge graph: {e}")
            return graph

    def explain_connection(self, doc_id1: str, doc_id2: str) -> str:
        """
        Use the LLM to explain why two documents are related

        Args:
            doc_id1: First document ID
            doc_id2: Second document ID

        Returns:
            Explanation text
        """
        try:
            # Get sample chunks from each document
            results1 = self.collection.get(
                where={"document_id": doc_id1},
                limit=2,
                include=['documents', 'metadatas']
            )

            results2 = self.collection.get(
                where={"document_id": doc_id2},
                limit=2,
                include=['documents', 'metadatas']
            )

            if not results1['ids'] or not results2['ids']:
                return "Could not retrieve documents"

            doc1_name = results1['metadatas'][0].get('filename', 'Document 1')
            doc2_name = results2['metadatas'][0].get('filename', 'Document 2')

            doc1_text = " ".join(results1['documents'][:2])[:1000]
            doc2_text = " ".join(results2['documents'][:2])[:1000]

            system_prompt = """You analyze documents and explain their relationships.
Provide a brief, clear explanation of how two documents are related."""

            user_prompt = f"""Analyze these two documents and explain how they are related:

Document 1 ({doc1_name}):
{doc1_text}

Document 2 ({doc2_name}):
{doc2_text}

How are these documents related? Provide a concise explanation:"""

            explanation = self.generate(
                prompt=user_prompt,
                system=system_prompt,
                temperature=0.3,
                max_tokens=256
            )

            return explanation

        except Exception as e:
            logger.error(f"Error explaining connection: {e}")
            return f"Error: {str(e)}"
community-contributions/sach91-bootcamp/week8/agents/export_agent.py (new file, 233 lines)
@@ -0,0 +1,233 @@
"""
Export Agent - Generates formatted reports and exports
"""
import logging
from typing import List, Dict
from datetime import datetime
from agents.base_agent import BaseAgent
from models.document import Summary

logger = logging.getLogger(__name__)


class ExportAgent(BaseAgent):
    """Agent that exports summaries and reports in various formats"""

    def __init__(self, llm_client=None, model: str = "llama3.2"):
        """
        Initialize export agent

        Args:
            llm_client: Optional shared LLM client
            model: Ollama model name
        """
        super().__init__(name="ExportAgent", llm_client=llm_client, model=model)

        logger.info(f"{self.name} initialized")

    def process(self, content: Dict, format: str = "markdown") -> str:
        """
        Export content in the specified format

        Args:
            content: Content dictionary to export
            format: Export format ('markdown', 'text', 'html')

        Returns:
            Formatted content string
        """
        logger.info(f"{self.name} exporting as {format}")

        if format == "markdown":
            return self._export_markdown(content)
        elif format == "text":
            return self._export_text(content)
        elif format == "html":
            return self._export_html(content)
        else:
            return str(content)

    def _export_markdown(self, content: Dict) -> str:
        """Export as Markdown"""
        md = []
        md.append("# Knowledge Report")
        md.append(f"\n*Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}*\n")

        if 'title' in content:
            md.append(f"## {content['title']}\n")

        if 'summary' in content:
            md.append("### Summary\n")
            md.append(f"{content['summary']}\n")

        if 'key_points' in content and content['key_points']:
            md.append("### Key Points\n")
            for point in content['key_points']:
                md.append(f"- {point}")
            md.append("")

        if 'sections' in content:
            for section in content['sections']:
                md.append(f"### {section['title']}\n")
                md.append(f"{section['content']}\n")

        if 'sources' in content and content['sources']:
            md.append("### Sources\n")
            for i, source in enumerate(content['sources'], 1):
                md.append(f"{i}. {source}")
            md.append("")

        return "\n".join(md)

    def _export_text(self, content: Dict) -> str:
        """Export as plain text"""
        lines = []
        lines.append("=" * 60)
        lines.append("KNOWLEDGE REPORT")
        lines.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
        lines.append("=" * 60)
        lines.append("")

        if 'title' in content:
            lines.append(content['title'])
            lines.append("-" * len(content['title']))
            lines.append("")

        if 'summary' in content:
            lines.append("SUMMARY:")
            lines.append(content['summary'])
            lines.append("")

        if 'key_points' in content and content['key_points']:
            lines.append("KEY POINTS:")
            for i, point in enumerate(content['key_points'], 1):
                lines.append(f"  {i}. {point}")
            lines.append("")

        if 'sections' in content:
            for section in content['sections']:
                lines.append(section['title'].upper())
                lines.append("-" * 40)
                lines.append(section['content'])
                lines.append("")

        if 'sources' in content and content['sources']:
            lines.append("SOURCES:")
            for i, source in enumerate(content['sources'], 1):
                lines.append(f"  {i}. {source}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)

    def _export_html(self, content: Dict) -> str:
        """Export as HTML"""
        html = []
        html.append("<!DOCTYPE html>")
        html.append("<html>")
        html.append("<head>")
        html.append("    <meta charset='utf-8'>")
        html.append("    <title>Knowledge Report</title>")
        html.append("    <style>")
        html.append("        body { font-family: Arial, sans-serif; max-width: 800px; margin: 40px auto; padding: 20px; }")
        html.append("        h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; }")
        html.append("        h2 { color: #555; margin-top: 30px; }")
        html.append("        .meta { color: #888; font-style: italic; }")
        html.append("        .key-points { background: #f8f9fa; padding: 15px; border-left: 4px solid #007bff; }")
        html.append("        .source { color: #666; font-size: 0.9em; }")
        html.append("    </style>")
        html.append("</head>")
        html.append("<body>")

        html.append("    <h1>Knowledge Report</h1>")
        html.append(f"    <p class='meta'>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}</p>")

        if 'title' in content:
            html.append(f"    <h2>{content['title']}</h2>")

        if 'summary' in content:
            html.append("    <h3>Summary</h3>")
            html.append(f"    <p>{content['summary']}</p>")

        if 'key_points' in content and content['key_points']:
            html.append("    <h3>Key Points</h3>")
            html.append("    <div class='key-points'>")
            html.append("        <ul>")
            for point in content['key_points']:
                html.append(f"            <li>{point}</li>")
            html.append("        </ul>")
            html.append("    </div>")

        if 'sections' in content:
            for section in content['sections']:
                html.append(f"    <h3>{section['title']}</h3>")
                html.append(f"    <p>{section['content']}</p>")

        if 'sources' in content and content['sources']:
            html.append("    <h3>Sources</h3>")
            html.append("    <ol class='source'>")
            for source in content['sources']:
                html.append(f"        <li>{source}</li>")
            html.append("    </ol>")

        html.append("</body>")
        html.append("</html>")

        return "\n".join(html)

    def create_study_guide(self, summaries: List[Summary]) -> str:
        """
        Create a study guide from multiple summaries

        Args:
            summaries: List of Summary objects

        Returns:
            Formatted study guide
        """
        logger.info(f"{self.name} creating study guide from {len(summaries)} summaries")

        # Compile all content
        all_summaries = "\n\n".join([
            f"{s.document_name}:\n{s.summary_text}"
            for s in summaries
        ])

        all_key_points = []
        for s in summaries:
            all_key_points.extend(s.key_points)

        # Use the LLM to create a cohesive study guide
        system_prompt = """You create excellent study guides that synthesize information from multiple sources.
Create a well-organized study guide with clear sections, key concepts, and important points."""

        user_prompt = f"""Create a comprehensive study guide based on these document summaries:

{all_summaries}

Create a well-structured study guide with:
1. An overview
2. Key concepts
3. Important details
4. Study tips

Study Guide:"""

        study_guide = self.generate(
            prompt=user_prompt,
            system=system_prompt,
            temperature=0.5,
            max_tokens=2048
        )

        # Format as markdown
        content = {
            'title': 'Study Guide',
            'sections': [
                {'title': 'Overview', 'content': study_guide},
                {'title': 'Key Points from All Documents', 'content': '\n'.join([f"• {p}" for p in all_key_points[:15]])}
            ],
            'sources': [s.document_name for s in summaries]
        }

        return self._export_markdown(content)
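
`ExportAgent.process` is a pure formatter; only `create_study_guide` calls the LLM. A sketch of formatting a report dict (hypothetical content; these are the keys the three formatters look for):

```python
exporter = ExportAgent()  # BaseAgent will create its own OllamaClient by default

report = {
    'title': 'Weekly Reading',
    'summary': 'Two papers on retrieval-augmented generation.',
    'key_points': ['RAG grounds answers in retrieved text', 'Chunk size affects recall'],
    'sources': ['rag_survey.pdf', 'chunking_notes.md'],
}

print(exporter.process(report, format="markdown"))
```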
community-contributions/sach91-bootcamp/week8/agents/ingestion_agent.py (new file, 157 lines)
@@ -0,0 +1,157 @@
"""
Ingestion Agent - Processes and stores documents in the vector database
"""
import logging
from typing import Dict, List
import uuid
from datetime import datetime

from agents.base_agent import BaseAgent
from models.document import Document, DocumentChunk
from utils.document_parser import DocumentParser
from utils.embeddings import EmbeddingModel
import chromadb

logger = logging.getLogger(__name__)


class IngestionAgent(BaseAgent):
    """Agent responsible for ingesting and storing documents"""

    def __init__(self, collection: chromadb.Collection,
                 embedding_model: EmbeddingModel,
                 llm_client=None, model: str = "llama3.2"):
        """
        Initialize ingestion agent

        Args:
            collection: ChromaDB collection for storage
            embedding_model: Model for generating embeddings
            llm_client: Optional shared LLM client
            model: Ollama model name
        """
        super().__init__(name="IngestionAgent", llm_client=llm_client, model=model)

        self.collection = collection
        self.embedding_model = embedding_model
        self.parser = DocumentParser(chunk_size=1000, chunk_overlap=200)

        logger.info(f"{self.name} ready with ChromaDB collection")

    def process(self, file_path: str) -> Document:
        """
        Process and ingest a document

        Args:
            file_path: Path to the document file

        Returns:
            Document object with metadata
        """
        logger.info(f"{self.name} processing: {file_path}")

        # Parse the document
        parsed = self.parser.parse_file(file_path)

        # Generate a document ID
        doc_id = str(uuid.uuid4())

        # Create document chunks
        chunks = []
        chunk_texts = []
        chunk_ids = []
        chunk_metadatas = []

        for i, chunk_text in enumerate(parsed['chunks']):
            chunk_id = f"{doc_id}_chunk_{i}"

            chunk = DocumentChunk(
                id=chunk_id,
                document_id=doc_id,
                content=chunk_text,
                chunk_index=i,
                metadata={
                    'filename': parsed['filename'],
                    'extension': parsed['extension'],
                    'total_chunks': len(parsed['chunks'])
                }
            )

            chunks.append(chunk)
            chunk_texts.append(chunk_text)
            chunk_ids.append(chunk_id)
            chunk_metadatas.append({
                'document_id': doc_id,
                'filename': parsed['filename'],
                'chunk_index': i,
                'extension': parsed['extension']
            })

        # Generate embeddings
        logger.info(f"{self.name} generating embeddings for {len(chunks)} chunks")
        embeddings = self.embedding_model.embed_documents(chunk_texts)

        # Store in ChromaDB
        logger.info(f"{self.name} storing in ChromaDB")
        self.collection.add(
            ids=chunk_ids,
            documents=chunk_texts,
            embeddings=embeddings,
            metadatas=chunk_metadatas
        )

        # Create the document object
        document = Document(
            id=doc_id,
            filename=parsed['filename'],
            filepath=parsed['filepath'],
            content=parsed['text'],
            chunks=chunks,
            metadata={
                'extension': parsed['extension'],
                'num_chunks': len(chunks),
                'total_chars': parsed['total_chars']
            },
            created_at=datetime.now()
        )

        logger.info(f"{self.name} successfully ingested: {document}")
        return document

    def get_statistics(self) -> Dict:
        """Get statistics about stored documents"""
        try:
            count = self.collection.count()
            return {
                'total_chunks': count,
                'collection_name': self.collection.name
            }
        except Exception as e:
            logger.error(f"Error getting statistics: {e}")
            return {'total_chunks': 0, 'error': str(e)}

    def delete_document(self, document_id: str) -> bool:
        """
        Delete all chunks of a document

        Args:
            document_id: ID of the document to delete

        Returns:
            True if successful
        """
        try:
            # Get all chunk IDs for this document
            results = self.collection.get(
                where={"document_id": document_id}
            )

            if results['ids']:
                self.collection.delete(ids=results['ids'])
                logger.info(f"{self.name} deleted document {document_id}")
                return True

            return False

        except Exception as e:
            logger.error(f"Error deleting document: {e}")
            return False
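
Ingestion is the single write path: parse → chunk (1000 characters with 200 overlap) → embed → `collection.add`. A sketch, again reusing `collection` and `embedder` (the file path is hypothetical):

```python
ingest = IngestionAgent(collection=collection, embedding_model=embedder)

doc = ingest.process("temp_uploads/report.pdf")
print(doc.id, doc.filename, len(doc.chunks), "chunks")
print(ingest.get_statistics())   # e.g. {'total_chunks': ..., 'collection_name': ...}

# Chunks can later be removed by document ID
ingest.delete_document(doc.id)
```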
community-contributions/sach91-bootcamp/week8/agents/question_agent.py (new file, 156 lines)
@@ -0,0 +1,156 @@
"""
Question Agent - Answers questions using RAG (Retrieval Augmented Generation)
"""
import logging
from typing import Any, Dict, List
from agents.base_agent import BaseAgent
from models.document import SearchResult, DocumentChunk
from utils.embeddings import EmbeddingModel
import chromadb

logger = logging.getLogger(__name__)


class QuestionAgent(BaseAgent):
    """Agent that answers questions using retrieved context"""

    def __init__(self, collection: chromadb.Collection,
                 embedding_model: EmbeddingModel,
                 llm_client=None, model: str = "llama3.2"):
        """
        Initialize question agent

        Args:
            collection: ChromaDB collection with documents
            embedding_model: Model for query embeddings
            llm_client: Optional shared LLM client
            model: Ollama model name
        """
        super().__init__(name="QuestionAgent", llm_client=llm_client, model=model)

        self.collection = collection
        self.embedding_model = embedding_model
        self.top_k = 5  # Number of chunks to retrieve

        logger.info(f"{self.name} initialized")

    def retrieve(self, query: str, top_k: int = None) -> List[SearchResult]:
        """
        Retrieve relevant document chunks for a query

        Args:
            query: Search query
            top_k: Number of results to return (uses self.top_k if None)

        Returns:
            List of SearchResult objects
        """
        if top_k is None:
            top_k = self.top_k

        logger.info(f"{self.name} retrieving top {top_k} chunks for query")

        # Generate the query embedding
        query_embedding = self.embedding_model.embed_query(query)

        # Search ChromaDB
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k
        )

        # Convert to SearchResult objects
        search_results = []

        if results['ids'] and len(results['ids']) > 0:
            for i in range(len(results['ids'][0])):
                chunk = DocumentChunk(
                    id=results['ids'][0][i],
                    document_id=results['metadatas'][0][i].get('document_id', ''),
                    content=results['documents'][0][i],
                    chunk_index=results['metadatas'][0][i].get('chunk_index', 0),
                    metadata=results['metadatas'][0][i]
                )

                result = SearchResult(
                    chunk=chunk,
                    score=1.0 - results['distances'][0][i],  # Convert distance to similarity
                    document_id=results['metadatas'][0][i].get('document_id', ''),
                    document_name=results['metadatas'][0][i].get('filename', 'Unknown')
                )

                search_results.append(result)

        logger.info(f"{self.name} retrieved {len(search_results)} results")
        return search_results

    def process(self, question: str, top_k: int = None) -> Dict[str, Any]:
        """
        Answer a question using RAG

        Args:
            question: User's question
            top_k: Number of chunks to retrieve

        Returns:
            Dictionary with the answer and sources
        """
        logger.info(f"{self.name} processing question: {question[:100]}...")

        # Retrieve relevant chunks
        search_results = self.retrieve(question, top_k)

        if not search_results:
            return {
                'answer': "I don't have any relevant information in my knowledge base to answer this question.",
                'sources': [],
                'context_used': ""
            }

        # Build context from retrieved chunks
        context_parts = []
        sources = []

        for i, result in enumerate(search_results, 1):
            context_parts.append(f"[Source {i}] {result.chunk.content}")
            sources.append({
                'document': result.document_name,
                'score': result.score,
                'preview': result.chunk.content[:150] + "..."
            })

        context = "\n\n".join(context_parts)

        # Create the prompt for the LLM
        system_prompt = """You are a helpful research assistant. Answer questions based on the provided context.
Be accurate and cite sources when possible. If the context doesn't contain enough information to answer fully, say so.
Keep your answer concise and relevant."""

        user_prompt = f"""Context from my knowledge base:

{context}

Question: {question}

Answer based on the context above. If you reference specific information, mention which source(s) you're using."""

        # Generate the answer
        answer = self.generate(
            prompt=user_prompt,
            system=system_prompt,
            temperature=0.3,  # Lower temperature for more factual responses
            max_tokens=1024
        )

        logger.info(f"{self.name} generated answer ({len(answer)} chars)")

        return {
            'answer': answer,
            'sources': sources,
            'context_used': context,
            'num_sources': len(sources)
        }

    def set_top_k(self, k: int):
        """Set the number of chunks to retrieve"""
        self.top_k = k
        logger.info(f"{self.name} top_k set to {k}")
community-contributions/sach91-bootcamp/week8/agents/summary_agent.py (new file, 181 lines)
@@ -0,0 +1,181 @@
"""
Summary Agent - Creates summaries and extracts key points from documents
"""
import logging
from typing import Dict, List
from agents.base_agent import BaseAgent
from models.document import Summary
import chromadb

logger = logging.getLogger(__name__)


class SummaryAgent(BaseAgent):
    """Agent that creates summaries of documents"""

    def __init__(self, collection: chromadb.Collection,
                 llm_client=None, model: str = "llama3.2"):
        """
        Initialize summary agent

        Args:
            collection: ChromaDB collection with documents
            llm_client: Optional shared LLM client
            model: Ollama model name
        """
        super().__init__(name="SummaryAgent", llm_client=llm_client, model=model)
        self.collection = collection

        logger.info(f"{self.name} initialized")

    def process(self, document_id: str = None, document_text: str = None,
                document_name: str = "Unknown") -> Summary:
        """
        Create a summary of a document

        Args:
            document_id: ID of the document in ChromaDB (retrieves chunks if provided)
            document_text: Full document text (used if document_id is not provided)
            document_name: Name of the document

        Returns:
            Summary object
        """
        logger.info(f"{self.name} creating summary for: {document_name}")

        # Get the document text
        if document_id:
            text = self._get_document_text(document_id)
            if not text:
                return Summary(
                    document_id=document_id,
                    document_name=document_name,
                    summary_text="Error: Could not retrieve document",
                    key_points=[]
                )
        elif document_text:
            text = document_text
        else:
            return Summary(
                document_id="",
                document_name=document_name,
                summary_text="Error: No document provided",
                key_points=[]
            )

        # Truncate if too long (to fit in the context window)
        max_chars = 8000
        if len(text) > max_chars:
            logger.warning(f"{self.name} truncating document from {len(text)} to {max_chars} chars")
            text = text[:max_chars] + "\n\n[Document truncated...]"

        # Generate the summary
        summary_text = self._generate_summary(text)

        # Extract key points
        key_points = self._extract_key_points(text)

        summary = Summary(
            document_id=document_id or "",
            document_name=document_name,
            summary_text=summary_text,
            key_points=key_points
        )

        logger.info(f"{self.name} completed summary with {len(key_points)} key points")
        return summary

    def _get_document_text(self, document_id: str) -> str:
        """Retrieve and reconstruct document text from chunks"""
        try:
            results = self.collection.get(
                where={"document_id": document_id}
            )

            if not results['ids']:
                return ""

            # Sort by chunk index
            chunks_data = list(zip(
                results['documents'],
                results['metadatas']
            ))

            chunks_data.sort(key=lambda x: x[1].get('chunk_index', 0))

            # Combine chunks
            text = "\n\n".join([chunk[0] for chunk in chunks_data])
            return text

        except Exception as e:
            logger.error(f"Error retrieving document: {e}")
            return ""

    def _generate_summary(self, text: str) -> str:
        """Generate a concise summary of the text"""
        system_prompt = """You are an expert at creating concise, informative summaries.
Your summaries capture the main ideas and key information in clear, accessible language.
Keep summaries to 3-5 sentences unless the document is very long."""

        user_prompt = f"""Please create a concise summary of the following document:

{text}

Summary:"""

        summary = self.generate(
            prompt=user_prompt,
            system=system_prompt,
            temperature=0.3,
            max_tokens=512
        )

        return summary.strip()

    def _extract_key_points(self, text: str) -> List[str]:
        """Extract key points from the text"""
        system_prompt = """You extract the most important key points from documents.
List 3-7 key points as concise bullet points. Each point should be a complete, standalone statement."""

        user_prompt = f"""Please extract the key points from the following document:

{text}

List the key points (one per line, without bullets or numbers):"""

        response = self.generate(
            prompt=user_prompt,
            system=system_prompt,
            temperature=0.3,
            max_tokens=512
        )

        # Parse the response into a list
        key_points = []
        for line in response.split('\n'):
            line = line.strip()
            # Remove common list markers
            line = line.lstrip('•-*0123456789.)')
            line = line.strip()

            if line and len(line) > 10:  # Filter out very short lines
                key_points.append(line)

        return key_points[:7]  # Limit to 7 points

    def summarize_multiple(self, document_ids: List[str]) -> List[Summary]:
        """
        Create summaries for multiple documents

        Args:
            document_ids: List of document IDs

        Returns:
            List of Summary objects
        """
        summaries = []

        for doc_id in document_ids:
            summary = self.process(document_id=doc_id)
            summaries.append(summary)

        return summaries
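
Summaries compose with the export agent: `summarize_multiple` produces the `Summary` objects that `ExportAgent.create_study_guide` consumes. A sketch with hypothetical document IDs:

```python
summarizer = SummaryAgent(collection=collection)
exporter = ExportAgent(llm_client=summarizer.llm)  # share the same Ollama client

summaries = summarizer.summarize_multiple(["doc-uuid-1", "doc-uuid-2"])
print(exporter.create_study_guide(summaries))
```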
community-contributions/sach91-bootcamp/week8/app.py (new file, 846 lines)
@@ -0,0 +1,846 @@
"""
KnowledgeHub - Personal Knowledge Management & Research Assistant
Main Gradio Application
"""
import os
import logging
import json
import gradio as gr
from pathlib import Path
import chromadb
from datetime import datetime

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Import utilities and agents
from utils import OllamaClient, EmbeddingModel, DocumentParser
from agents import (
    IngestionAgent, QuestionAgent, SummaryAgent,
    ConnectionAgent, ExportAgent
)
from models import Document

# Constants
VECTORSTORE_PATH = "./vectorstore"
TEMP_UPLOAD_PATH = "./temp_uploads"
DOCUMENTS_METADATA_PATH = "./vectorstore/documents_metadata.json"

# Ensure directories exist
os.makedirs(VECTORSTORE_PATH, exist_ok=True)
os.makedirs(TEMP_UPLOAD_PATH, exist_ok=True)


class KnowledgeHub:
    """Main application class managing all agents"""

    def __init__(self):
        logger.info("Initializing KnowledgeHub...")

        # Initialize ChromaDB
        self.client = chromadb.PersistentClient(path=VECTORSTORE_PATH)
        self.collection = self.client.get_or_create_collection(
            name="knowledge_base",
            metadata={"description": "Personal knowledge management collection"}
        )

        # Initialize the embedding model
        self.embedding_model = EmbeddingModel()

        # Initialize the shared LLM client
        self.llm_client = OllamaClient(model="llama3.2")

        # Check the Ollama connection
        if not self.llm_client.check_connection():
            logger.warning("⚠️ Cannot connect to Ollama. Please ensure Ollama is running.")
            logger.warning("Start Ollama with: ollama serve")
        else:
            logger.info("✓ Connected to Ollama")

        # Initialize agents
        self.ingestion_agent = IngestionAgent(
            collection=self.collection,
            embedding_model=self.embedding_model,
            llm_client=self.llm_client
        )

        self.question_agent = QuestionAgent(
            collection=self.collection,
            embedding_model=self.embedding_model,
            llm_client=self.llm_client
        )

        self.summary_agent = SummaryAgent(
            collection=self.collection,
            llm_client=self.llm_client
        )

        self.connection_agent = ConnectionAgent(
            collection=self.collection,
            embedding_model=self.embedding_model,
            llm_client=self.llm_client
        )

        self.export_agent = ExportAgent(
            llm_client=self.llm_client
        )

        # Track uploaded documents
        self.documents = {}

        # Load existing documents from the metadata file
        self._load_documents_metadata()

        logger.info("✓ KnowledgeHub initialized successfully")

    def _save_documents_metadata(self):
        """Save document metadata to a JSON file"""
        try:
            metadata = {
                doc_id: doc.to_dict()
                for doc_id, doc in self.documents.items()
            }

            with open(DOCUMENTS_METADATA_PATH, 'w') as f:
                json.dump(metadata, f, indent=2)

            logger.debug(f"Saved metadata for {len(metadata)} documents")
        except Exception as e:
            logger.error(f"Error saving document metadata: {e}")

    def _load_documents_metadata(self):
        """Load document metadata from the JSON file"""
        try:
            if os.path.exists(DOCUMENTS_METADATA_PATH):
                with open(DOCUMENTS_METADATA_PATH, 'r') as f:
                    metadata = json.load(f)

                # Reconstruct Document objects (simplified - without chunks)
                for doc_id, doc_data in metadata.items():
                    # Create a minimal Document object for UI purposes
                    # Full chunks are still in ChromaDB
                    doc = Document(
                        id=doc_id,
                        filename=doc_data['filename'],
                        filepath=doc_data.get('filepath', ''),
                        content=doc_data.get('content', ''),
                        chunks=[],  # Chunks are in ChromaDB
                        metadata=doc_data.get('metadata', {}),
                        created_at=datetime.fromisoformat(doc_data['created_at'])
                    )
                    self.documents[doc_id] = doc

                logger.info(f"✓ Loaded {len(self.documents)} existing documents from storage")
            else:
                logger.info("No existing documents found (starting fresh)")

        except Exception as e:
            logger.error(f"Error loading document metadata: {e}")
            logger.info("Starting with empty document list")
|
||||
|
||||
def upload_document(self, files, progress=gr.Progress()):
|
||||
"""Handle document upload - supports single or multiple files with progress tracking"""
|
||||
if files is None or len(files) == 0:
|
||||
return "⚠️ Please select file(s) to upload", "", []
|
||||
|
||||
# Convert single file to list for consistent handling
|
||||
if not isinstance(files, list):
|
||||
files = [files]
|
||||
|
||||
results = []
|
||||
successful = 0
|
||||
failed = 0
|
||||
total_chunks = 0
|
||||
|
||||
# Initialize progress tracking
|
||||
progress(0, desc="Starting upload...")
|
||||
|
||||
for file_idx, file in enumerate(files, 1):
|
||||
# Update progress
|
||||
progress_pct = (file_idx - 1) / len(files)
|
||||
progress(progress_pct, desc=f"Processing {file_idx}/{len(files)}: {Path(file.name).name}")
|
||||
|
||||
try:
|
||||
logger.info(f"Processing file {file_idx}/{len(files)}: {file.name}")
|
||||
|
||||
# Save uploaded file temporarily
|
||||
temp_path = os.path.join(TEMP_UPLOAD_PATH, Path(file.name).name)
|
||||
|
||||
# Copy file content
|
||||
with open(temp_path, 'wb') as f:
|
||||
f.write(file.read() if hasattr(file, 'read') else open(file.name, 'rb').read())
|
||||
|
||||
# Process document
|
||||
document = self.ingestion_agent.process(temp_path)
|
||||
|
||||
# Store document reference
|
||||
self.documents[document.id] = document
|
||||
|
||||
# Track stats
|
||||
successful += 1
|
||||
total_chunks += document.num_chunks
|
||||
|
||||
# Add to results
|
||||
results.append({
|
||||
'status': '✅',
|
||||
'filename': document.filename,
|
||||
'chunks': document.num_chunks,
|
||||
'size': f"{document.total_chars:,} chars"
|
||||
})
|
||||
|
||||
# Clean up temp file
|
||||
os.remove(temp_path)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing {file.name}: {e}")
|
||||
failed += 1
|
||||
results.append({
|
||||
'status': '❌',
|
||||
'filename': Path(file.name).name,
|
||||
'chunks': 0,
|
||||
'size': f"Error: {str(e)[:50]}"
|
||||
})
|
||||
|
||||
# Final progress update
|
||||
progress(1.0, desc="Upload complete!")
|
||||
|
||||
# Save metadata once after all uploads
|
||||
if successful > 0:
|
||||
self._save_documents_metadata()
|
||||
|
||||
# Create summary
|
||||
summary = f"""## Upload Complete! 🎉
|
||||
|
||||
**Total Files:** {len(files)}
|
||||
**✅ Successful:** {successful}
|
||||
**❌ Failed:** {failed}
|
||||
**Total Chunks Created:** {total_chunks:,}
|
||||
|
||||
{f"⚠️ **{failed} file(s) failed** - Check results table below for details" if failed > 0 else "All files processed successfully!"}
|
||||
"""
|
||||
|
||||
# Create detailed results table
|
||||
results_table = [[r['status'], r['filename'], r['chunks'], r['size']] for r in results]
|
||||
|
||||
# Create preview of first successful document
|
||||
preview = ""
|
||||
for doc in self.documents.values():
|
||||
if doc.filename in [r['filename'] for r in results if r['status'] == '✅']:
|
||||
preview = doc.content[:500] + "..." if len(doc.content) > 500 else doc.content
|
||||
break
|
||||
|
||||
return summary, preview, results_table
|
||||
|
||||
def ask_question(self, question, top_k, progress=gr.Progress()):
|
||||
"""Handle question answering with progress tracking"""
|
||||
if not question.strip():
|
||||
return "⚠️ Please enter a question", [], ""
|
||||
|
||||
try:
|
||||
# Initial status
|
||||
progress(0, desc="Processing your question...")
|
||||
status = "🔄 **Searching knowledge base...**\n\nRetrieving relevant documents..."
|
||||
|
||||
logger.info(f"Answering question: {question[:100]}")
|
||||
|
||||
# Update progress
|
||||
progress(0.3, desc="Finding relevant documents...")
|
||||
|
||||
result = self.question_agent.process(question, top_k=top_k)
|
||||
|
||||
# Update progress
|
||||
progress(0.7, desc="Generating answer with LLM...")
|
||||
|
||||
# Format answer
|
||||
answer = f"""### Answer\n\n{result['answer']}\n\n"""
|
||||
|
||||
if result['sources']:
|
||||
answer += f"**Sources:** {result['num_sources']} documents referenced\n\n"
|
||||
|
||||
# Format sources for display
|
||||
sources_data = []
|
||||
for i, source in enumerate(result['sources'], 1):
|
||||
sources_data.append([
|
||||
i,
|
||||
source['document'],
|
||||
f"{source['score']:.2%}",
|
||||
source['preview']
|
||||
])
|
||||
|
||||
progress(1.0, desc="Answer ready!")
|
||||
|
||||
return answer, sources_data, "✅ Answer generated successfully!"
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error answering question: {e}")
|
||||
return f"❌ Error: {str(e)}", [], f"❌ Error: {str(e)}"
|
||||
|
||||
def create_summary(self, doc_selector, progress=gr.Progress()):
|
||||
"""Create document summary with progress tracking"""
|
||||
if not doc_selector:
|
||||
return "⚠️ Please select a document to summarize", ""
|
||||
|
||||
try:
|
||||
# Initial status
|
||||
progress(0, desc="Preparing to summarize...")
|
||||
|
||||
logger.info(f'doc_selector : {doc_selector}')
|
||||
doc_id = doc_selector.split(" -|- ")[1]
|
||||
document = self.documents.get(doc_id)
|
||||
|
||||
if not document:
|
||||
return "", "❌ Document not found"
|
||||
|
||||
# Update status
|
||||
status_msg = f"🔄 **Generating summary for:** {document.filename}\n\nPlease wait, this may take 10-20 seconds..."
|
||||
progress(0.3, desc=f"Analyzing {document.filename}...")
|
||||
|
||||
logger.info(f"Creating summary for: {document.filename}")
|
||||
|
||||
# Generate summary
|
||||
summary = self.summary_agent.process(
|
||||
document_id=doc_id,
|
||||
document_name=document.filename
|
||||
)
|
||||
|
||||
progress(1.0, desc="Summary complete!")
|
||||
|
||||
# Format result
|
||||
result = f"""## Summary of {summary.document_name}\n\n{summary.summary_text}\n\n"""
|
||||
|
||||
if summary.key_points:
|
||||
result += "### Key Points\n\n"
|
||||
for point in summary.key_points:
|
||||
result += f"- {point}\n"
|
||||
|
||||
return result, "✅ Summary generated successfully!"
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error creating summary: {e}")
|
||||
return "", f"❌ Error: {str(e)}"
|
||||
|
||||
    def find_connections(self, doc_selector, top_k, progress=gr.Progress()):
        """Find related documents with progress tracking"""
        if not doc_selector:
            return "⚠️ Please select a document", [], ""

        try:
            progress(0, desc="Preparing to find connections...")

            doc_id = doc_selector.split(" -|- ")[1]
            document = self.documents.get(doc_id)

            if not document:
                return "❌ Document not found", [], "❌ Document not found"

            progress(0.3, desc=f"Analyzing {document.filename}...")

            logger.info(f"Finding connections for: {document.filename}")

            result = self.connection_agent.process(document_id=doc_id, top_k=top_k)

            progress(0.8, desc="Calculating similarity scores...")

            if 'error' in result:
                return f"❌ Error: {result['error']}", [], f"❌ Error: {result['error']}"

            message = f"## Related Documents\n\n**Source:** {result['source_document']}\n\n"
            message += f"**Found {result['num_related']} related documents:**\n\n"

            # Format for table
            table_data = []
            for i, rel in enumerate(result['related'], 1):
                table_data.append([
                    i,
                    rel['document_name'],
                    f"{rel['similarity']:.2%}",
                    rel['preview']
                ])

            progress(1.0, desc="Connections found!")

            return message, table_data, "✅ Related documents found!"

        except Exception as e:
            logger.error(f"Error finding connections: {e}")
            return f"❌ Error: {str(e)}", [], f"❌ Error: {str(e)}"

    def export_knowledge(self, format_choice):
        """Export knowledge base"""
        try:
            logger.info(f"Exporting as {format_choice}")

            # Get statistics
            stats = self.ingestion_agent.get_statistics()

            # Create export content
            content = {
                'title': 'Knowledge Base Export',
                'summary': f"Total documents in knowledge base: {len(self.documents)}",
                'sections': [
                    {
                        'title': 'Documents',
                        'content': '\n'.join([f"- {doc.filename}" for doc in self.documents.values()])
                    },
                    {
                        'title': 'Statistics',
                        'content': f"Total chunks stored: {stats['total_chunks']}"
                    }
                ]
            }

            # Export
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            if format_choice == "Markdown":
                output = self.export_agent.process(content, format="markdown")
                filename = f"knowledge_export_{timestamp}.md"
            elif format_choice == "HTML":
                output = self.export_agent.process(content, format="html")
                filename = f"knowledge_export_{timestamp}.html"
            else:  # Text
                output = self.export_agent.process(content, format="text")
                filename = f"knowledge_export_{timestamp}.txt"

            # Save file
            export_path = os.path.join(TEMP_UPLOAD_PATH, filename)
            with open(export_path, 'w', encoding='utf-8') as f:
                f.write(output)

            return f"✅ Exported as {format_choice}", export_path

        except Exception as e:
            logger.error(f"Error exporting: {e}")
            return f"❌ Error: {str(e)}", None

    def get_statistics(self):
        """Get knowledge base statistics"""
        try:
            stats = self.ingestion_agent.get_statistics()

            total_docs = len(self.documents)
            total_chunks = stats.get('total_chunks', 0)
            total_chars = sum(doc.total_chars for doc in self.documents.values())

            # Check if data is persisted
            persistence_status = "✅ Enabled" if os.path.exists(DOCUMENTS_METADATA_PATH) else "⚠️ Not configured"
            vectorstore_size = self._get_directory_size(VECTORSTORE_PATH)

            stats_text = f"""## Knowledge Base Statistics

**Persistence Status:** {persistence_status}
**Total Documents:** {total_docs}
**Total Chunks:** {total_chunks:,}
**Total Characters:** {total_chars:,}
**Vector Store Size:** {vectorstore_size}

### Storage Locations
- **Vector DB:** `{VECTORSTORE_PATH}/`
- **Metadata:** `{DOCUMENTS_METADATA_PATH}`

**📝 Note:** Your data persists across app restarts!

**Recent Documents:**
"""
            if self.documents:
                stats_text += "\n".join([f"- {doc.filename} ({doc.num_chunks} chunks, added {doc.created_at.strftime('%Y-%m-%d')})"
                                         for doc in list(self.documents.values())[-10:]])
            else:
                stats_text += "\n*No documents yet. Upload some to get started!*"

            return stats_text

        except Exception as e:
            return f"❌ Error: {str(e)}"

    def _get_directory_size(self, path):
        """Calculate directory size in human-readable form"""
        try:
            total_size = 0
            for dirpath, dirnames, filenames in os.walk(path):
                for filename in filenames:
                    filepath = os.path.join(dirpath, filename)
                    if os.path.exists(filepath):
                        total_size += os.path.getsize(filepath)

            # Convert to human-readable units
            for unit in ['B', 'KB', 'MB', 'GB']:
                if total_size < 1024.0:
                    return f"{total_size:.1f} {unit}"
                total_size /= 1024.0
            return f"{total_size:.1f} TB"
        except Exception:
            return "Unknown"

    def get_document_list(self):
        """Get list of documents for dropdown"""
        new_choices = [f"{doc.filename} -|- {doc.id}" for doc in self.documents.values()]
        return gr.update(choices=new_choices, value=None)

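    # Each dropdown entry encodes "<filename> -|- <document_id>"; handlers recover
    # the id with doc_selector.split(" -|- ")[1]. Illustrative round-trip:
    #   "notes.pdf -|- 3f2a9c".split(" -|- ")[1] == "3f2a9c"
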
    def delete_document(self, doc_selector):
        """Delete a document from the knowledge base"""
        if not doc_selector:
            return "⚠️ Please select a document to delete", self.get_document_list()

        try:
            # Dropdown entries are formatted as "<filename> -|- <id>"
            doc_id = doc_selector.split(" -|- ")[1]
            document = self.documents.get(doc_id)

            if not document:
                return "❌ Document not found", self.get_document_list()

            # Delete from ChromaDB
            success = self.ingestion_agent.delete_document(doc_id)

            if success:
                # Remove from documents dict
                filename = document.filename
                del self.documents[doc_id]

                # Save updated metadata
                self._save_documents_metadata()

                return f"✅ Deleted: {filename}", self.get_document_list()
            else:
                return "❌ Error deleting document", self.get_document_list()

        except Exception as e:
            logger.error(f"Error deleting document: {e}")
            return f"❌ Error: {str(e)}", self.get_document_list()

    def clear_all_documents(self):
        """Clear entire knowledge base"""
        try:
            # Delete collection
            self.client.delete_collection("knowledge_base")

            # Recreate empty collection
            self.collection = self.client.create_collection(
                name="knowledge_base",
                metadata={"description": "Personal knowledge management collection"}
            )

            # Update agents with new collection
            self.ingestion_agent.collection = self.collection
            self.question_agent.collection = self.collection
            self.summary_agent.collection = self.collection
            self.connection_agent.collection = self.collection

            # Clear documents
            self.documents = {}
            self._save_documents_metadata()

            return "✅ All documents cleared from knowledge base"

        except Exception as e:
            logger.error(f"Error clearing database: {e}")
            return f"❌ Error: {str(e)}"


def create_ui():
    """Create Gradio interface"""

    # Initialize app
    app = KnowledgeHub()

    # Custom CSS
    custom_css = """
    .main-header {
        text-align: center;
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        color: white;
        padding: 30px;
        border-radius: 10px;
        margin-bottom: 20px;
    }
    .stat-box {
        background: #f8f9fa;
        padding: 15px;
        border-radius: 8px;
        border-left: 4px solid #667eea;
    }
    """

    with gr.Blocks(title="KnowledgeHub", css=custom_css, theme=gr.themes.Soft()) as interface:

        # Header
        gr.HTML("""
        <div class="main-header">
            <h1>🧠 KnowledgeHub</h1>
            <p>Personal Knowledge Management & Research Assistant</p>
            <p style="font-size: 14px; opacity: 0.9;">
                Powered by Ollama (Llama 3.2) • Fully Local & Private
            </p>
        </div>
        """)

        # Main tabs
        with gr.Tabs():

            # Tab 1: Upload Documents
            with gr.Tab("📤 Upload Documents"):
                gr.Markdown("### Upload your documents to build your knowledge base")
                gr.Markdown("*Supported formats: PDF, DOCX, TXT, MD, HTML, PY*")
                gr.Markdown("*💡 Tip: You can select multiple files at once!*")

                with gr.Row():
                    with gr.Column():
                        file_input = gr.File(
                            label="Select Document(s)",
                            file_types=[".pdf", ".docx", ".txt", ".md", ".html", ".py"],
                            file_count="multiple"  # Enable multiple file selection
                        )
                        upload_btn = gr.Button("📤 Upload & Process", variant="primary")

                    with gr.Column():
                        upload_status = gr.Markdown("Ready to upload documents")

                # Results table for batch uploads
                with gr.Row():
                    upload_results = gr.Dataframe(
                        headers=["Status", "Filename", "Chunks", "Size"],
                        label="Upload Results",
                        wrap=True,
                        visible=True
                    )

                with gr.Row():
                    document_preview = gr.Textbox(
                        label="Document Preview (First Uploaded)",
                        lines=10,
                        max_lines=15
                    )

                upload_btn.click(
                    fn=app.upload_document,
                    inputs=[file_input],
                    outputs=[upload_status, document_preview, upload_results]
                )

            # Tab 2: Ask Questions
            with gr.Tab("❓ Ask Questions"):
                gr.Markdown("### Ask questions about your documents")
                gr.Markdown("*Uses RAG (Retrieval-Augmented Generation) to answer based on your knowledge base*")

                with gr.Row():
                    with gr.Column(scale=3):
                        question_input = gr.Textbox(
                            label="Your Question",
                            placeholder="What would you like to know?",
                            lines=3
                        )

                    with gr.Column(scale=1):
                        top_k_slider = gr.Slider(
                            minimum=1,
                            maximum=10,
                            value=5,
                            step=1,
                            label="Number of sources"
                        )
                        ask_btn = gr.Button("🔍 Ask", variant="primary")

                qa_status = gr.Markdown("Ready to answer questions")
                answer_output = gr.Markdown(label="Answer")

                sources_table = gr.Dataframe(
                    headers=["#", "Document", "Relevance", "Preview"],
                    label="Sources",
                    wrap=True
                )

                ask_btn.click(
                    fn=app.ask_question,
                    inputs=[question_input, top_k_slider],
                    outputs=[answer_output, sources_table, qa_status]
                )

            # Tab 3: Summarize
            with gr.Tab("📝 Summarize"):
                gr.Markdown("### Generate summaries and extract key points")

                with gr.Row():
                    with gr.Column():
                        doc_selector = gr.Dropdown(
                            choices=[],
                            label="Select Document",
                            info="Choose a document to summarize",
                            allow_custom_value=True
                        )
                        refresh_btn = gr.Button("🔄 Refresh List")
                        summarize_btn = gr.Button("📝 Generate Summary", variant="primary")
                        summary_status = gr.Markdown("Ready to generate summaries")

                    with gr.Column(scale=2):
                        summary_output = gr.Markdown(label="Summary")

                summarize_btn.click(
                    fn=app.create_summary,
                    inputs=[doc_selector],
                    outputs=[summary_output, summary_status]
                )

                refresh_btn.click(
                    fn=app.get_document_list,
                    outputs=[doc_selector]
                )

            # Tab 4: Find Connections
            with gr.Tab("🔗 Find Connections"):
                gr.Markdown("### Discover relationships between documents")

                with gr.Row():
                    with gr.Column():
                        conn_doc_selector = gr.Dropdown(
                            choices=[],
                            label="Select Document",
                            info="Find documents related to this one",
                            allow_custom_value=True
                        )
                        conn_top_k = gr.Slider(
                            minimum=1,
                            maximum=10,
                            value=5,
                            step=1,
                            label="Number of related documents"
                        )
                        refresh_conn_btn = gr.Button("🔄 Refresh List")
                        find_btn = gr.Button("🔗 Find Connections", variant="primary")
                        connection_status = gr.Markdown("Ready to find connections")

                connection_output = gr.Markdown(label="Connections")

                connections_table = gr.Dataframe(
                    headers=["#", "Document", "Similarity", "Preview"],
                    label="Related Documents",
                    wrap=True
                )

                find_btn.click(
                    fn=app.find_connections,
                    inputs=[conn_doc_selector, conn_top_k],
                    outputs=[connection_output, connections_table, connection_status]
                )

                refresh_conn_btn.click(
                    fn=app.get_document_list,
                    outputs=[conn_doc_selector]
                )

            # Tab 5: Export
            with gr.Tab("💾 Export"):
                gr.Markdown("### Export your knowledge base")

                with gr.Row():
                    with gr.Column():
                        format_choice = gr.Radio(
                            choices=["Markdown", "HTML", "Text"],
                            value="Markdown",
                            label="Export Format"
                        )
                        export_btn = gr.Button("💾 Export", variant="primary")

                    with gr.Column():
                        export_status = gr.Markdown("Ready to export")
                        export_file = gr.File(label="Download Export")

                export_btn.click(
                    fn=app.export_knowledge,
                    inputs=[format_choice],
                    outputs=[export_status, export_file]
                )

            # Tab 6: Manage Documents
            with gr.Tab("🗂️ Manage Documents"):
                gr.Markdown("### Manage your document library")

                with gr.Row():
                    with gr.Column():
                        gr.Markdown("#### Delete Document")
                        delete_doc_selector = gr.Dropdown(
                            choices=[],
                            label="Select Document to Delete",
                            info="Choose a document to remove from knowledge base"
                        )
                        with gr.Row():
                            refresh_delete_btn = gr.Button("🔄 Refresh List")
                            delete_btn = gr.Button("🗑️ Delete Document", variant="stop")
                        delete_status = gr.Markdown("")

                    with gr.Column():
                        gr.Markdown("#### Clear All Documents")
                        gr.Markdown("⚠️ **Warning:** This will delete your entire knowledge base!")
                        clear_confirm = gr.Textbox(
                            label="Type 'DELETE ALL' to confirm",
                            placeholder="DELETE ALL"
                        )
                        clear_all_btn = gr.Button("🗑️ Clear All Documents", variant="stop")
                        clear_status = gr.Markdown("")

                def confirm_and_clear(confirm_text):
                    if confirm_text.strip() == "DELETE ALL":
                        return app.clear_all_documents()
                    else:
                        return "⚠️ Please type 'DELETE ALL' to confirm"

                delete_btn.click(
                    fn=app.delete_document,
                    inputs=[delete_doc_selector],
                    outputs=[delete_status, delete_doc_selector]
                )

                refresh_delete_btn.click(
                    fn=app.get_document_list,
                    outputs=[delete_doc_selector]
                )

                clear_all_btn.click(
                    fn=confirm_and_clear,
                    inputs=[clear_confirm],
                    outputs=[clear_status]
                )

            # Tab 7: Statistics
            with gr.Tab("📊 Statistics"):
                gr.Markdown("### Knowledge Base Overview")

                stats_output = gr.Markdown()
                stats_btn = gr.Button("🔄 Refresh Statistics", variant="primary")

                stats_btn.click(
                    fn=app.get_statistics,
                    outputs=[stats_output]
                )

                # Auto-load stats when the app starts
                interface.load(
                    fn=app.get_statistics,
                    outputs=[stats_output]
                )

        # Footer
        gr.HTML("""
        <div style="text-align: center; margin-top: 30px; padding: 20px; color: #666;">
            <p>🔒 All processing happens locally on your machine • Your data never leaves your computer</p>
            <p style="font-size: 12px;">Powered by Ollama, ChromaDB, and Sentence Transformers</p>
        </div>
        """)

    return interface


if __name__ == "__main__":
    logger.info("Starting KnowledgeHub...")

    # Create and launch interface
    interface = create_ui()
    interface.launch(
        server_name="127.0.0.1",
        server_port=7860,
        share=False,
        inbrowser=True
    )
@@ -0,0 +1,13 @@
"""
models
"""
from .knowledge_graph import KnowledgeGraph
from .document import Document, DocumentChunk, SearchResult, Summary

__all__ = [
    'KnowledgeGraph',
    'Document',
    'DocumentChunk',
    'SearchResult',
    'Summary'
]
@@ -0,0 +1,82 @@
"""
Document data models
"""
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from datetime import datetime


@dataclass
class DocumentChunk:
    """Represents a chunk of a document"""
    id: str
    document_id: str
    content: str
    chunk_index: int
    metadata: Dict = field(default_factory=dict)

    def __str__(self):
        preview = self.content[:100] + "..." if len(self.content) > 100 else self.content
        return f"Chunk {self.chunk_index}: {preview}"


@dataclass
class Document:
    """Represents a complete document"""
    id: str
    filename: str
    filepath: str
    content: str
    chunks: List[DocumentChunk]
    metadata: Dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)

    @property
    def num_chunks(self) -> int:
        return len(self.chunks)

    @property
    def total_chars(self) -> int:
        return len(self.content)

    @property
    def extension(self) -> str:
        return self.metadata.get('extension', '')

    def __str__(self):
        return f"Document: {self.filename} ({self.num_chunks} chunks, {self.total_chars} chars)"

    def to_dict(self) -> Dict:
        """Convert to dictionary for storage (content is truncated to a preview)"""
        return {
            'id': self.id,
            'filename': self.filename,
            'filepath': self.filepath,
            'content': self.content[:500] + '...' if len(self.content) > 500 else self.content,
            'num_chunks': self.num_chunks,
            'total_chars': self.total_chars,
            'extension': self.extension,
            'created_at': self.created_at.isoformat(),
            'metadata': self.metadata
        }


@dataclass
class SearchResult:
    """Represents a search result from the vector database"""
    chunk: DocumentChunk
    score: float
    document_id: str
    document_name: str

    def __str__(self):
        return f"{self.document_name} (score: {self.score:.2f})"


@dataclass
class Summary:
    """Represents a document summary"""
    document_id: str
    document_name: str
    summary_text: str
    key_points: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)

    def __str__(self):
        return f"Summary of {self.document_name}: {self.summary_text[:100]}..."
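
# Construction sketch (illustrative, not part of the module):
#   chunk = DocumentChunk(id="d1-0", document_id="d1", content="hello world", chunk_index=0)
#   doc = Document(id="d1", filename="notes.txt", filepath="/tmp/notes.txt",
#                  content="hello world", chunks=[chunk])
#   doc.num_chunks                 # -> 1
#   doc.to_dict()['created_at']    # -> ISO-8601 timestamp string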
@@ -0,0 +1,110 @@
"""
Knowledge Graph data models
"""
from dataclasses import dataclass, field
from typing import List, Dict, Set
from datetime import datetime


@dataclass
class KnowledgeNode:
    """Represents a concept or entity in the knowledge graph"""
    id: str
    name: str
    node_type: str  # 'document', 'concept', 'entity', 'topic'
    description: str = ""
    metadata: Dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)

    def __str__(self):
        return f"{self.node_type.capitalize()}: {self.name}"


@dataclass
class KnowledgeEdge:
    """Represents a relationship between nodes"""
    source_id: str
    target_id: str
    relationship: str  # 'related_to', 'cites', 'contains', 'similar_to'
    weight: float = 1.0
    metadata: Dict = field(default_factory=dict)

    def __str__(self):
        return f"{self.source_id} --[{self.relationship}]--> {self.target_id}"


@dataclass
class KnowledgeGraph:
    """Represents the complete knowledge graph"""
    nodes: Dict[str, KnowledgeNode] = field(default_factory=dict)
    edges: List[KnowledgeEdge] = field(default_factory=list)

    def add_node(self, node: KnowledgeNode):
        """Add a node to the graph"""
        self.nodes[node.id] = node

    def add_edge(self, edge: KnowledgeEdge):
        """Add an edge to the graph (both endpoints must already exist)"""
        if edge.source_id in self.nodes and edge.target_id in self.nodes:
            self.edges.append(edge)

    def get_neighbors(self, node_id: str) -> List[str]:
        """Get all nodes connected to a given node"""
        neighbors = set()
        for edge in self.edges:
            if edge.source_id == node_id:
                neighbors.add(edge.target_id)
            elif edge.target_id == node_id:
                neighbors.add(edge.source_id)
        return list(neighbors)

    def get_related_documents(self, node_id: str, max_depth: int = 2) -> Set[str]:
        """Get all documents related to a node within max_depth hops (breadth-first search)"""
        related = set()
        visited = set()
        queue = [(node_id, 0)]

        while queue:
            current_id, depth = queue.pop(0)

            if current_id in visited or depth > max_depth:
                continue

            visited.add(current_id)

            # If this is a document node, add it
            if current_id in self.nodes and self.nodes[current_id].node_type == 'document':
                related.add(current_id)

            # Add neighbors to queue
            if depth < max_depth:
                for neighbor_id in self.get_neighbors(current_id):
                    if neighbor_id not in visited:
                        queue.append((neighbor_id, depth + 1))

        return related

    def to_networkx(self):
        """Convert to a NetworkX graph for visualization (returns None if networkx is missing)"""
        try:
            import networkx as nx

            G = nx.Graph()

            # Add nodes
            for node_id, node in self.nodes.items():
                G.add_node(node_id,
                           name=node.name,
                           type=node.node_type,
                           description=node.description)

            # Add edges
            for edge in self.edges:
                G.add_edge(edge.source_id, edge.target_id,
                           relationship=edge.relationship,
                           weight=edge.weight)

            return G

        except ImportError:
            return None

    def __str__(self):
        return f"KnowledgeGraph: {len(self.nodes)} nodes, {len(self.edges)} edges"
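
# Usage sketch (illustrative, not part of the module):
#   g = KnowledgeGraph()
#   g.add_node(KnowledgeNode(id="d1", name="paper.pdf", node_type="document"))
#   g.add_node(KnowledgeNode(id="c1", name="RAG", node_type="concept"))
#   g.add_edge(KnowledgeEdge(source_id="d1", target_id="c1", relationship="contains"))
#   g.get_neighbors("c1")            # -> ["d1"]
#   g.get_related_documents("c1")    # -> {"d1"}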
@@ -0,0 +1,26 @@
# Core Dependencies
gradio>=4.0.0
chromadb>=0.4.0
sentence-transformers>=2.2.0
python-dotenv>=1.0.0

# Document Processing
pypdf>=3.0.0
python-docx>=1.0.0
markdown>=3.4.0
beautifulsoup4>=4.12.0

# Data Processing
numpy>=1.24.0
pandas>=2.0.0
tqdm>=4.65.0

# Visualization
plotly>=5.14.0
networkx>=3.0

# Ollama Client
requests>=2.31.0

# Optional but useful
scikit-learn>=1.3.0
71
community-contributions/sach91-bootcamp/week8/start.bat
Normal file
@@ -0,0 +1,71 @@
@echo off
REM KnowledgeHub Startup Script for Windows

echo 🧠 Starting KnowledgeHub...
echo.

REM Check if Ollama is installed
where ollama >nul 2>nul
if %errorlevel% neq 0 (
    echo ❌ Ollama is not installed or not in PATH
    echo Please install Ollama from https://ollama.com/download
    pause
    exit /b 1
)

REM Check Python
where python >nul 2>nul
if %errorlevel% neq 0 (
    echo ❌ Python is not installed or not in PATH
    echo Please install Python 3.8+ from https://www.python.org/downloads/
    pause
    exit /b 1
)

echo ✅ Prerequisites found
echo.

REM Check if Ollama service is running
tasklist /FI "IMAGENAME eq ollama.exe" 2>NUL | find /I /N "ollama.exe">NUL
if "%ERRORLEVEL%"=="1" (
    echo ⚠️ Ollama is not running. Please start Ollama first.
    echo You can start it from the Start menu or by running: ollama serve
    pause
    exit /b 1
)

echo ✅ Ollama is running
echo.

REM Check if model exists
ollama list | find "llama3.2" >nul
if %errorlevel% neq 0 (
    echo 📥 Llama 3.2 model not found. Pulling model...
    echo This may take a few minutes on first run...
    ollama pull llama3.2
)

echo ✅ Model ready
echo.

REM Install dependencies
echo 🔍 Checking dependencies...
python -c "import gradio" 2>nul
if %errorlevel% neq 0 (
    echo 📦 Installing dependencies...
    pip install -r requirements.txt
)

echo ✅ Dependencies ready
echo.

REM Launch application
echo 🚀 Launching KnowledgeHub...
echo The application will open in your browser at http://127.0.0.1:7860
echo.
echo Press Ctrl+C to stop the application
echo.

python app.py

pause
42
community-contributions/sach91-bootcamp/week8/start.sh
Executable file
@@ -0,0 +1,42 @@
#!/bin/bash

# KnowledgeHub Startup Script

echo "🧠 Starting KnowledgeHub..."
echo ""

# Check if Ollama is running
if ! pgrep -x "ollama" > /dev/null; then
    echo "⚠️ Ollama is not running. Starting Ollama..."
    ollama serve &
    sleep 3
fi

# Check if llama3.2 model exists
if ! ollama list | grep -q "llama3.2"; then
    echo "📥 Llama 3.2 model not found. Pulling model..."
    echo "This may take a few minutes on first run..."
    ollama pull llama3.2
fi

echo "✅ Ollama is ready"
echo ""

# Check Python dependencies
echo "🔍 Checking dependencies..."
if ! python -c "import gradio" 2>/dev/null; then
    echo "📦 Installing dependencies..."
    pip install -r requirements.txt
fi

echo "✅ Dependencies ready"
echo ""

# Launch the application
echo "🚀 Launching KnowledgeHub..."
echo "The application will open in your browser at http://127.0.0.1:7860"
echo ""
echo "Press Ctrl+C to stop the application"
echo ""

python app.py
@@ -0,0 +1,12 @@
"""
utils
"""
from .document_parser import DocumentParser
from .embeddings import EmbeddingModel
from .ollama_client import OllamaClient

__all__ = [
    'DocumentParser',
    'EmbeddingModel',
    'OllamaClient'
]
@@ -0,0 +1,218 @@
"""
Document Parser - Extract text from various document formats
"""
import os
from typing import List, Dict, Optional
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class DocumentParser:
    """Parse various document formats into text chunks"""

    SUPPORTED_FORMATS = ['.pdf', '.docx', '.txt', '.md', '.html', '.py']

    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        """
        Initialize document parser

        Args:
            chunk_size: Maximum characters per chunk
            chunk_overlap: Overlap between chunks for context preservation
        """
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def parse_file(self, file_path: str) -> Dict:
        """
        Parse a file and return structured document data

        Args:
            file_path: Path to the file

        Returns:
            Dictionary with document metadata and chunks
        """
        path = Path(file_path)

        if not path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")

        extension = path.suffix.lower()

        if extension not in self.SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported format: {extension}")

        # Extract text based on file type
        if extension == '.pdf':
            text = self._parse_pdf(file_path)
        elif extension == '.docx':
            text = self._parse_docx(file_path)
        elif extension in ('.txt', '.py'):
            text = self._parse_txt(file_path)
        elif extension == '.md':
            text = self._parse_markdown(file_path)
        elif extension == '.html':
            text = self._parse_html(file_path)
        else:
            text = ""

        # Create chunks
        chunks = self._create_chunks(text)

        return {
            'filename': path.name,
            'filepath': str(path.absolute()),
            'extension': extension,
            'text': text,
            'chunks': chunks,
            'num_chunks': len(chunks),
            'total_chars': len(text)
        }

    def _parse_pdf(self, file_path: str) -> str:
        """Extract text from PDF"""
        try:
            from pypdf import PdfReader

            reader = PdfReader(file_path)
            text = ""

            for page in reader.pages:
                text += page.extract_text() + "\n\n"

            return text.strip()

        except ImportError:
            logger.error("pypdf not installed. Install with: pip install pypdf")
            return ""
        except Exception as e:
            logger.error(f"Error parsing PDF: {e}")
            return ""

    def _parse_docx(self, file_path: str) -> str:
        """Extract text from DOCX"""
        try:
            from docx import Document

            doc = Document(file_path)
            text = "\n\n".join([para.text for para in doc.paragraphs if para.text.strip()])

            return text.strip()

        except ImportError:
            logger.error("python-docx not installed. Install with: pip install python-docx")
            return ""
        except Exception as e:
            logger.error(f"Error parsing DOCX: {e}")
            return ""

    def _parse_txt(self, file_path: str) -> str:
        """Extract text from plain text files"""
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                return f.read().strip()
        except Exception as e:
            logger.error(f"Error parsing TXT: {e}")
            return ""

    def _parse_markdown(self, file_path: str) -> str:
        """Extract text from Markdown"""
        try:
            import markdown
            from bs4 import BeautifulSoup

            with open(file_path, 'r', encoding='utf-8') as f:
                md_text = f.read()

            # Convert markdown to HTML, then extract text
            html = markdown.markdown(md_text)
            soup = BeautifulSoup(html, 'html.parser')
            text = soup.get_text()

            return text.strip()

        except ImportError:
            # Fallback: just read as plain text
            return self._parse_txt(file_path)
        except Exception as e:
            logger.error(f"Error parsing Markdown: {e}")
            return ""

    def _parse_html(self, file_path: str) -> str:
        """Extract text from HTML"""
        try:
            from bs4 import BeautifulSoup

            with open(file_path, 'r', encoding='utf-8') as f:
                html = f.read()

            soup = BeautifulSoup(html, 'html.parser')

            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()

            text = soup.get_text()

            # Clean up whitespace
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = '\n'.join(chunk for chunk in chunks if chunk)

            return text.strip()

        except ImportError:
            logger.error("beautifulsoup4 not installed. Install with: pip install beautifulsoup4")
            return ""
        except Exception as e:
            logger.error(f"Error parsing HTML: {e}")
            return ""

    def _create_chunks(self, text: str) -> List[str]:
        """
        Split text into overlapping chunks

        Args:
            text: Full text to chunk

        Returns:
            List of text chunks
        """
        if not text:
            return []

        chunks = []
        start = 0
        text_length = len(text)

        while start < text_length:
            logger.debug(f"Processing chunk at {start} of {text_length}")

            end = start + self.chunk_size

            # If this isn't the last chunk, try to break at a sentence or paragraph
            if end < text_length:
                # Look for a paragraph break first
                break_pos = text.rfind('\n\n', start, end)
                if break_pos == -1:
                    # Look for a sentence break
                    break_pos = text.rfind('. ', start, end)
                if break_pos == -1:
                    # Look for any space
                    break_pos = text.rfind(' ', start, end)

                if break_pos != -1 and break_pos > start and break_pos > end - self.chunk_overlap:
                    end = break_pos + 1

            chunk = text[start:end].strip()
            if chunk:
                chunks.append(chunk)

            # Move start position forward, keeping the configured overlap
            start = end - self.chunk_overlap
            if start < 0:
                start = 0

        return chunks
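
# Chunking sketch (illustrative, not part of the module): with the defaults
# chunk_size=1000 and chunk_overlap=200, consecutive chunks share up to ~200
# characters, and each split prefers "\n\n", then ". ", then a plain space.
#   parser = DocumentParser(chunk_size=1000, chunk_overlap=200)
#   doc = parser.parse_file("notes.txt")   # assumes notes.txt exists
#   doc['num_chunks'], doc['chunks'][0][:80]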
@@ -0,0 +1,84 @@
"""
Embeddings utility using sentence-transformers
"""
from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Union
import logging

logger = logging.getLogger(__name__)


class EmbeddingModel:
    """Wrapper for sentence-transformer embeddings"""

    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
        """
        Initialize embedding model

        Args:
            model_name: HuggingFace model name for embeddings
        """
        self.model_name = model_name
        logger.info(f"Loading embedding model: {model_name}")
        self.model = SentenceTransformer(model_name)
        self.dimension = self.model.get_sentence_embedding_dimension()
        logger.info(f"Embedding dimension: {self.dimension}")

    def embed(self, texts: Union[str, List[str]]) -> np.ndarray:
        """
        Generate embeddings for text(s)

        Args:
            texts: Single text or list of texts

        Returns:
            Numpy array of embeddings
        """
        if isinstance(texts, str):
            texts = [texts]

        embeddings = self.model.encode(texts, show_progress_bar=False)
        return embeddings

    def embed_query(self, query: str) -> List[float]:
        """
        Embed a single query - returned as a list for ChromaDB compatibility

        Args:
            query: Query text

        Returns:
            List of floats representing the embedding
        """
        embedding = self.model.encode([query], show_progress_bar=False)[0]
        return embedding.tolist()

    def embed_documents(self, documents: List[str]) -> List[List[float]]:
        """
        Embed multiple documents - returned as a list of lists for ChromaDB

        Args:
            documents: List of document texts

        Returns:
            List of embeddings (each as a list of floats)
        """
        embeddings = self.model.encode(documents, show_progress_bar=False)
        return embeddings.tolist()

    def similarity(self, text1: str, text2: str) -> float:
        """
        Calculate cosine similarity between two texts

        Args:
            text1: First text
            text2: Second text

        Returns:
            Similarity score between -1 and 1 (typically close to 0-1 for natural text)
        """
        emb1, emb2 = self.model.encode([text1, text2])

        # Cosine similarity
        similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
        return float(similarity)
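
# Usage sketch (illustrative, not part of the module):
#   emb = EmbeddingModel()                 # all-MiniLM-L6-v2, 384-dim vectors
#   vec = emb.embed_query("what is RAG?")  # list[float], ready for ChromaDB queries
#   emb.similarity("vector database", "ChromaDB stores embeddings")  # float in [-1, 1]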
@@ -0,0 +1,107 @@
"""
Ollama Client - Wrapper for the local Ollama API
"""
import requests
from typing import List, Dict, Optional
import logging

logger = logging.getLogger(__name__)


class OllamaClient:
    """Client for interacting with local Ollama models"""

    def __init__(self, base_url: str = "http://localhost:11434", model: str = "llama3.2"):
        self.base_url = base_url
        self.model = model
        self.api_url = f"{base_url}/api"

    def generate(self, prompt: str, system: Optional[str] = None,
                 temperature: float = 0.7, max_tokens: int = 2048) -> str:
        """Generate text from a prompt"""
        try:
            payload = {
                "model": self.model,
                "prompt": prompt,
                "stream": False,
                "options": {
                    "temperature": temperature,
                    "num_predict": max_tokens
                }
            }

            if system:
                payload["system"] = system

            response = requests.post(
                f"{self.api_url}/generate",
                json=payload,
                timeout=1200
            )
            response.raise_for_status()

            result = response.json()
            return result.get("response", "").strip()

        except requests.exceptions.RequestException as e:
            logger.error(f"Ollama API error: {e}")
            return f"Error: Unable to connect to Ollama. Is it running? ({str(e)})"

    def chat(self, messages: List[Dict[str, str]],
             temperature: float = 0.7, max_tokens: int = 2048) -> str:
        """Chat completion with message history"""
        try:
            payload = {
                "model": self.model,
                "messages": messages,
                "stream": False,
                "options": {
                    "temperature": temperature,
                    "num_predict": max_tokens
                }
            }

            response = requests.post(
                f"{self.api_url}/chat",
                json=payload,
                timeout=1200
            )
            response.raise_for_status()

            result = response.json()
            return result.get("message", {}).get("content", "").strip()

        except requests.exceptions.RequestException as e:
            logger.error(f"Ollama API error: {e}")
            return f"Error: Unable to connect to Ollama. Is it running? ({str(e)})"

    def check_connection(self) -> bool:
        """Check if Ollama is running and the configured model is available"""
        try:
            response = requests.get(f"{self.base_url}/api/tags", timeout=5)
            response.raise_for_status()

            models = response.json().get("models", [])
            model_names = [m["name"] for m in models]

            # Ollama tags may carry a variant suffix (e.g. "llama3.2:latest"),
            # so accept either an exact match or the name plus a ":" suffix
            if not any(name == self.model or name.startswith(f"{self.model}:")
                       for name in model_names):
                logger.warning(f"Model {self.model} not found. Available: {model_names}")
                return False

            return True

        except requests.exceptions.RequestException as e:
            logger.error(f"Cannot connect to Ollama: {e}")
            return False

    def list_models(self) -> List[str]:
        """List available Ollama models"""
        try:
            response = requests.get(f"{self.base_url}/api/tags", timeout=5)
            response.raise_for_status()

            models = response.json().get("models", [])
            return [m["name"] for m in models]

        except requests.exceptions.RequestException:
            return []
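
# Usage sketch (illustrative; assumes a local Ollama server with llama3.2 pulled):
#   client = OllamaClient(model="llama3.2")
#   if client.check_connection():
#       client.generate("Summarize RAG in one sentence.", temperature=0.2)
#       client.chat([{"role": "user", "content": "Hello"}])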
129
community-contributions/sach91-bootcamp/week8/verify_setup.py
Normal file
@@ -0,0 +1,129 @@
"""
Setup Verification Script for KnowledgeHub
Run this to check if everything is configured correctly
"""
import sys
import os

print("🔍 KnowledgeHub Setup Verification\n")
print("=" * 60)

# Check Python version
print(f"✓ Python version: {sys.version}")
print(f"✓ Python executable: {sys.executable}")
print(f"✓ Current directory: {os.getcwd()}")
print()

# Check directory structure
print("📁 Checking directory structure...")
required_dirs = ['agents', 'models', 'utils']
for dir_name in required_dirs:
    if os.path.isdir(dir_name):
        init_file = os.path.join(dir_name, '__init__.py')
        if os.path.exists(init_file):
            print(f"  ✓ {dir_name}/ exists with __init__.py")
        else:
            print(f"  ⚠️ {dir_name}/ exists but missing __init__.py")
    else:
        print(f"  ❌ {dir_name}/ directory not found")
print()

# Check required files
print("📄 Checking required files...")
required_files = ['app.py', 'requirements.txt']
for file_name in required_files:
    if os.path.exists(file_name):
        print(f"  ✓ {file_name} exists")
    else:
        print(f"  ❌ {file_name} not found")
print()

# Try importing modules
print("📦 Testing imports...")
errors = []

try:
    from utils import OllamaClient, EmbeddingModel, DocumentParser
    print("  ✓ utils module imported successfully")
except ImportError as e:
    print(f"  ❌ Cannot import utils: {e}")
    errors.append(str(e))

try:
    from models import Document, DocumentChunk, SearchResult, Summary
    print("  ✓ models module imported successfully")
except ImportError as e:
    print(f"  ❌ Cannot import models: {e}")
    errors.append(str(e))

try:
    from agents import (
        IngestionAgent, QuestionAgent, SummaryAgent,
        ConnectionAgent, ExportAgent
    )
    print("  ✓ agents module imported successfully")
except ImportError as e:
    print(f"  ❌ Cannot import agents: {e}")
    errors.append(str(e))

print()

# Check dependencies
print("📚 Checking Python dependencies...")
required_packages = [
    'gradio', 'chromadb', 'sentence_transformers',
    'requests', 'numpy', 'tqdm'
]

missing_packages = []
for package in required_packages:
    try:
        __import__(package.replace('-', '_'))
        print(f"  ✓ {package} installed")
    except ImportError:
        print(f"  ❌ {package} not installed")
        missing_packages.append(package)

print()

# Check Ollama
print("🤖 Checking Ollama...")
try:
    import requests
    response = requests.get('http://localhost:11434/api/tags', timeout=2)
    if response.status_code == 200:
        print("  ✓ Ollama is running")
        models = response.json().get('models', [])
        if models:
            print(f"  ✓ Available models: {[m['name'] for m in models]}")
            if any('llama3.2' in m['name'] for m in models):
                print("  ✓ llama3.2 model found")
            else:
                print("  ⚠️ llama3.2 model not found. Run: ollama pull llama3.2")
        else:
            print("  ⚠️ No models found. Run: ollama pull llama3.2")
    else:
        print("  ⚠️ Ollama responded but with an error")
except Exception as e:
    print(f"  ❌ Cannot connect to Ollama: {e}")
    print("  Start Ollama with: ollama serve")

print()
print("=" * 60)

# Final summary
if errors or missing_packages:
    print("\n⚠️ ISSUES FOUND:\n")
    if errors:
        print("Import Errors:")
        for error in errors:
            print(f"  - {error}")
    if missing_packages:
        print("\nMissing Packages:")
        print(f"  Run: pip install {' '.join(missing_packages)}")
    print("\n💡 Fix these issues before running app.py")
else:
    print("\n✅ All checks passed! You're ready to run:")
    print("   python app.py")

print()