ai-web-summarizer

33 week3/community-contributions/ai-web-summarizer/.gitignore vendored Normal file
@@ -0,0 +1,33 @@
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd
.Python
env/
venv/
*.env
*.ini
*.log

# VSCode
.vscode/

# IDE files
.idea/

# System files
.DS_Store
Thumbs.db

# Environment variables
.env

# Jupyter notebook checkpoints
.ipynb_checkpoints

# Dependencies
*.egg-info/
dist/
build/
143 week3/community-contributions/ai-web-summarizer/README.md Normal file
@@ -0,0 +1,143 @@
# AI Web Page Summarizer

This project is a simple AI-powered web page summarizer that leverages OpenAI's GPT models and local inference with Ollama to generate concise summaries of given text. The goal is to create a "Reader's Digest of the Internet" by summarizing web content efficiently.

## Features

- Summarize text using OpenAI's GPT models or local Ollama models.
- Flexible summarization engine selection (OpenAI API, Ollama API, or Ollama library).
- Simple and modular code structure.
- Error handling for better reliability.

## Project Structure

```
ai-summarizer/
│-- summarizer/
│   │-- __init__.py
│   │-- fetcher.py       # Web content fetching logic
│   │-- summarizer.py    # Main summarization logic
│-- utils/
│   │-- __init__.py
│   │-- config.py        # Environment variable loading
│   │-- logger.py        # Logging configuration
│-- main.py              # Entry point of the app
│-- .env                 # Environment variables
│-- requirements.txt     # Python dependencies
│-- README.md            # Project documentation
```

## Prerequisites

- Python 3.8 or higher
- OpenAI API key (you can obtain one from [OpenAI](https://platform.openai.com/signup))
- Ollama installed locally ([Installation Guide](https://ollama.ai))
- `conda` for managing environments (optional)

## Installation

1. **Clone the repository:**

   ```bash
   git clone https://github.com/your-username/ai-summarizer.git
   cd ai-summarizer
   ```

2. **Create a virtual environment (optional but recommended):**

   ```bash
   conda create --name summarizer-env python=3.9
   conda activate summarizer-env
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Set up environment variables:**

   Create a `.env` file in the project root and add your OpenAI API key (if using OpenAI):

   ```env
   OPENAI_API_KEY=your-api-key-here
   ```

## Usage

1. **Run the summarizer:**

   ```bash
   python main.py
   ```

2. **Sample Output:**

   ```shell
   Enter a URL to summarize: https://example.com
   Summary of the page:
   AI refers to machines demonstrating intelligence similar to humans and animals.
   ```

3. **Engine Selection:**

   The summarizer supports multiple engines. Modify `main.py` to select your preferred model:

   ```python
   summary = summarize_text(content, 'gpt-4o-mini', engine="openai")
   summary = summarize_text(content, 'deepseek-r1:1.5B', engine="ollama-api")
   summary = summarize_text(content, 'deepseek-r1:1.5B', engine="ollama-lib")
   ```

## Configuration

You can modify the model, max tokens, and temperature in `summarizer/summarizer.py`:

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    max_tokens=300,
    temperature=0.7
)
```

## Error Handling

If any issue occurs, the script prints an error message, for example:

```
Error during summarization: Invalid API key or Ollama not running.
```
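
All three engine helpers follow the same convention: they print the error and return `None`, so callers only ever need a `None` check. A minimal self-contained sketch of that pattern (`risky_summarize` is a hypothetical stand-in, not a function in this repo):

```python
def risky_summarize(text):
    """Hypothetical stand-in for an engine helper: print the error, return None."""
    try:
        if not text:
            raise ValueError("Invalid API key or Ollama not running.")
        return text[:40]  # pretend the model produced a summary
    except Exception as e:
        print(f"Error during summarization: {e}")
        return None

summary = risky_summarize("")
if summary is None:
    print("Failed to generate summary.")
```

Because failures never raise past the helper, `main.py` stays a simple chain of `if content:` / `if summary:` branches.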

## Dependencies

The required dependencies are listed in `requirements.txt`:

```
openai
requests
beautifulsoup4
python-dotenv
ollama
```

Install them using:

```bash
pip install -r requirements.txt
```

## Contributing

Contributions are welcome! Feel free to fork the repository and submit pull requests.

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

## Contact

For any inquiries, please reach out to:

- LinkedIn: https://www.linkedin.com/in/khanarafat/
- GitHub: https://github.com/raoarafat
28 week3/community-contributions/ai-web-summarizer/main.py Normal file
@@ -0,0 +1,28 @@
from summarizer.fetcher import fetch_web_content
from summarizer.summarizer import summarize_text
from utils.logger import logger

def main():
    url = input("Enter a URL to summarize: ")

    logger.info(f"Fetching content from: {url}")
    content = fetch_web_content(url)

    if content:
        logger.info("Content fetched successfully. Sending to the summarization engine...")
        # summary = summarize_text(content, 'gpt-4o-mini', engine="openai")
        # summary = summarize_text(content, 'deepseek-r1:1.5B', engine="ollama-lib")
        summary = summarize_text(content, 'deepseek-r1:1.5B', engine="ollama-api")

        if summary:
            logger.info("Summary generated successfully.")
            print("\nSummary of the page:\n")
            print(summary)
        else:
            logger.error("Failed to generate summary.")
    else:
        logger.error("Failed to fetch web content.")

if __name__ == "__main__":
    main()
@@ -0,0 +1,5 @@
openai
requests
beautifulsoup4
python-dotenv
ollama
@@ -0,0 +1,28 @@
import requests
from bs4 import BeautifulSoup

def fetch_web_content(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove scripts and styles so only readable text remains
        for tag in soup(["script", "style"]):
            tag.decompose()

        # Extract readable text from the web page
        page_text = soup.get_text(separator=' ', strip=True)

        return page_text[:5000]  # Limit to 5000 chars (API limitation)
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the webpage: {e}")
        return None

if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Natural_language_processing"
    content = fetch_web_content(url)
    if content:
        print(content[:500])  # Print a sample of the content
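
The extraction step can be exercised offline. One caveat worth showing: `get_text()` alone keeps the contents of `<script>` and `<style>` tags, so those must be removed first. A small sketch (the sample HTML is made up):

```python
# Offline sketch of the extraction step in fetch_web_content. get_text()
# alone would keep "tracker()", so script/style tags are dropped first.
from bs4 import BeautifulSoup

html = "<html><body><h1>Title</h1><p>First paragraph.</p><script>tracker()</script></body></html>"
soup = BeautifulSoup(html, "html.parser")
for tag in soup(["script", "style"]):
    tag.decompose()
text = soup.get_text(separator=" ", strip=True)
print(text)  # Title First paragraph.
```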
@@ -0,0 +1,85 @@
import openai  # type: ignore
import ollama
import requests
from utils.config import Config

# Local Ollama API endpoint
OLLAMA_API = "http://127.0.0.1:11434/api/chat"

# Initialize OpenAI client with API key
client = openai.Client(api_key=Config.OPENAI_API_KEY)

def summarize_with_openai(text, model):
    """Summarize text using OpenAI's GPT model."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes web pages."},
                {"role": "user", "content": f"Summarize the following text: {text}"}
            ],
            max_tokens=300,
            temperature=0.7
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error during OpenAI summarization: {e}")
        return None

def summarize_with_ollama_lib(text, model):
    """Summarize text using the Ollama Python library."""
    try:
        messages = [
            {"role": "system", "content": "You are a helpful assistant that summarizes web pages."},
            {"role": "user", "content": f"Summarize the following text: {text}"}
        ]
        response = ollama.chat(model=model, messages=messages)
        return response['message']['content']
    except Exception as e:
        print(f"Error during Ollama summarization: {e}")
        return None

def summarize_with_ollama_api(text, model):
    """Summarize text using the local Ollama API."""
    try:
        payload = {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a helpful assistant that summarizes web pages."},
                {"role": "user", "content": f"Summarize the following text: {text}"}
            ],
            "stream": False  # Set to True for streaming responses
        }
        response = requests.post(OLLAMA_API, json=payload)
        response_data = response.json()
        return response_data.get('message', {}).get('content', 'No summary generated')
    except Exception as e:
        print(f"Error during Ollama API summarization: {e}")
        return None

def summarize_text(text, model, engine="openai"):
    """Generic function to summarize text using the specified engine (openai/ollama-lib/ollama-api)."""
    if engine == "openai":
        return summarize_with_openai(text, model)
    elif engine == "ollama-lib":
        return summarize_with_ollama_lib(text, model)
    elif engine == "ollama-api":
        return summarize_with_ollama_api(text, model)
    else:
        print("Invalid engine specified. Use 'openai', 'ollama-lib', or 'ollama-api'.")
        return None

if __name__ == "__main__":
    sample_text = "Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals and humans."

    # Summarize using OpenAI
    openai_summary = summarize_text(sample_text, model="gpt-3.5-turbo", engine="openai")
    print("OpenAI Summary:", openai_summary)

    # Summarize using Ollama Python library
    ollama_lib_summary = summarize_text(sample_text, model="deepseek-r1:1.5B", engine="ollama-lib")
    print("Ollama Library Summary:", ollama_lib_summary)

    # Summarize using local Ollama API
    ollama_api_summary = summarize_text(sample_text, model="deepseek-r1:1.5B", engine="ollama-api")
    print("Ollama API Summary:", ollama_api_summary)
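
The OpenAI and Ollama paths build the same chat payload shape. A small helper (hypothetical, not part of this repo) makes that shared shape explicit and is easy to unit-test without any server running:

```python
# Hypothetical helper (not in the repo): builds the chat payload that
# summarize_with_ollama_api posts to the local Ollama endpoint.
SYSTEM_PROMPT = "You are a helpful assistant that summarizes web pages."

def build_chat_payload(text, model, stream=False):
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Summarize the following text: {text}"},
        ],
        "stream": stream,  # True makes Ollama return incremental JSON chunks
    }

payload = build_chat_payload("AI is intelligence demonstrated by machines.", "deepseek-r1:1.5B")
print(payload["model"])  # deepseek-r1:1.5B
```

Factoring the payload out this way would also let `summarize_with_openai` and `summarize_with_ollama_api` share one message-building code path.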
@@ -0,0 +1,11 @@
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

class Config:
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if __name__ == "__main__":
    print("Your OpenAI Key is:", Config.OPENAI_API_KEY)
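
`Config.OPENAI_API_KEY` silently becomes `None` when the variable is unset, and the failure only surfaces later as an OpenAI error. A hedged sketch of a stricter loader (the name `require_env` is hypothetical) that fails fast instead:

```python
import os

def require_env(name):
    """Return the environment variable's value, or raise immediately if unset."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

os.environ["DEMO_OPENAI_KEY"] = "sk-demo"  # stand-in value for the sketch
print(require_env("DEMO_OPENAI_KEY"))  # sk-demo
```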
@@ -0,0 +1,16 @@
import logging

# Setup logging configuration
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

if __name__ == "__main__":
    logger.info("Logger is working correctly.")