Merge pull request #676 from shabsi4u/shabs-website-brochure-generator
Add Website_brochure_generator app with uv package management
@@ -0,0 +1,261 @@
# Website Brochure Generator

An AI-powered tool that automatically generates professional brochures from any website. The tool analyzes website content, extracts relevant information, and creates beautifully formatted brochures using OpenAI's GPT models.

## Features

- 🌐 **Website Analysis**: Automatically scrapes and analyzes website content
- 🤖 **AI-Powered**: Uses OpenAI GPT-4o-mini for intelligent content generation
- 📄 **Professional Output**: Generates markdown-formatted brochures
- 🌍 **Multi-Language Support**: Translate brochures to any language using AI
- 🎨 **Beautiful Output**: Rich terminal formatting and native Jupyter markdown rendering
- ⚡ **Streaming Support**: Real-time brochure generation with live updates
- 🖥️ **Multiple Interfaces**: Command-line script and interactive Jupyter notebook
- 📓 **Interactive Notebook**: Step-by-step execution with widgets and examples

## Prerequisites

- Python 3.8 or higher
- OpenAI API key
- Jupyter notebook environment (for notebook usage)

## Installation

### Option 1: Using uv (Recommended)

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone or download the project
cd Website_brochure_generator

# Install dependencies with uv
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

### Option 2: Using pip

```bash
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Option 3: Using pip with pyproject.toml

```bash
# Install in development mode
pip install -e .

# Or install with optional dev dependencies
pip install -e ".[dev]"
```

## Setup

1. **Get your OpenAI API key**:
   - Visit [OpenAI API Keys](https://platform.openai.com/api-keys)
   - Create a new API key

2. **Set up environment variables**:
   Create a `.env` file in the project directory:
   ```bash
   OPENAI_API_KEY=your_api_key_here
   ```
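
Before running the tool, it can help to confirm that the key is actually visible to Python. The sketch below is an illustration only: `check_api_key` is a hypothetical helper that is not part of this project, and the `sk-` prefix test is a loose heuristic (an assumption about key format), not real validation against OpenAI.

```python
import os


def check_api_key(env=None, prefix="sk-"):
    """Return (ok, message) for the OPENAI_API_KEY found in `env`.

    Hypothetical helper: the prefix test is only a rough heuristic
    and this function never contacts OpenAI.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return False, "OPENAI_API_KEY is not set"
    if key != key.strip():
        return False, "OPENAI_API_KEY has leading or trailing whitespace"
    if not key.startswith(prefix):
        return False, f"OPENAI_API_KEY does not start with {prefix!r}"
    return True, "OPENAI_API_KEY looks plausible"


if __name__ == "__main__":
    ok, message = check_api_key()
    print(message)
```

If the variables come from a `.env` file, they need to be loaded into the environment (for example with `python-dotenv`) before a check like this can see them.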

## Usage

### Option 1: Jupyter Notebook (Recommended for Interactive Use)

1. **Open the notebook**:
   ```bash
   jupyter notebook website_brochure_generator.ipynb
   ```

2. **Run the cells step by step**:
   - Configure your API key
   - Try the interactive examples
   - Use the widget interface for easy brochure generation

3. **Features in the notebook**:
   - Interactive widgets for URL input and options
   - Step-by-step examples with explanations
   - Custom functions for advanced usage
   - Save brochures to files
   - Multiple language translation examples
   - Quick website analysis tools
   - Custom brochure generation with focus areas
   - Comprehensive troubleshooting guide

### Option 2: Command Line Interface

```bash
# Basic usage
python website_brochure_generator.py https://example.com

# The tool will prompt you to choose:
# 1. Display mode: Complete output OR Stream output
# 2. Translation: No translation OR Translate to another language
```

### Option 3: Python Script

```python
from website_brochure_generator import create_brochure, stream_brochure, translate_brochure

# Create a complete brochure
result = create_brochure("https://example.com")

# Stream brochure generation in real-time
result = stream_brochure("https://example.com")

# Translate brochure to Spanish (complete output)
spanish_brochure = translate_brochure("https://example.com", "Spanish", stream_mode=False)

# Translate brochure to French (streaming output)
french_brochure = translate_brochure("https://example.com", "French", stream_mode=True)
```
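
For intuition about what streaming mode does, the sketch below accumulates text from a sequence of chunks. It assumes each chunk exposes `choices[0].delta.content`, the attribute path used by OpenAI chat-completion stream chunks; `collect_stream` and the fake chunks built with `SimpleNamespace` are hypothetical stand-ins so the example runs without an API key.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join streamed text pieces, optionally reporting each piece as it arrives."""
    parts = []
    for chunk in chunks:
        # A chunk may carry no text (for example the last one), so fall back to "".
        text = chunk.choices[0].delta.content or ""
        if on_text is not None and text:
            on_text(text)
        parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    stream = [fake_chunk("Hello "), fake_chunk("World!"), fake_chunk(None)]
    brochure = collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))
    print()
```

Passing a callback keeps display concerns (terminal or notebook) separate from the accumulation logic, so the same loop can serve both interfaces.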

### Programmatic Usage

```python
from website_brochure_generator import Website, get_links, create_brochure, translate_brochure

# Analyze a website
website = Website("https://example.com")
print(f"Title: {website.title}")

# Get relevant links
links = get_links("https://example.com")
print(f"Found {len(links['links'])} relevant pages")

# Generate brochure
brochure = create_brochure("https://example.com")

# Translate brochure to multiple languages (complete output)
spanish_brochure = translate_brochure("https://example.com", "Spanish", stream_mode=False)
german_brochure = translate_brochure("https://example.com", "German", stream_mode=False)

# Translate brochure with streaming output
chinese_brochure = translate_brochure("https://example.com", "Chinese", stream_mode=True)
```
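
`get_links` is used above as returning a dictionary with a `links` list whose entries carry `type` and `url`. Because links scraped from a page can be relative, a caller may want to normalize them before fetching. The sketch below shows one way to do that with the standard library; `normalize_links` is a hypothetical helper and the sample payload is made up for illustration.

```python
import json
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, payload):
    """Return de-duplicated absolute http(s) links from a {"links": [...]} payload."""
    seen = set()
    cleaned = []
    for entry in payload.get("links", []):
        absolute = urljoin(base_url, entry.get("url", ""))
        # Skip mailto:, javascript: and similar non-web schemes.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute in seen:
            continue
        seen.add(absolute)
        cleaned.append({"type": entry.get("type", "unknown"), "url": absolute})
    return cleaned


if __name__ == "__main__":
    raw = json.loads(
        '{"links": ['
        '{"type": "about page", "url": "/about"},'
        '{"type": "careers page", "url": "https://example.com/careers"},'
        '{"type": "contact", "url": "mailto:info@example.com"}'
        "]}"
    )
    for link in normalize_links("https://example.com", raw):
        print(f"{link['type']}: {link['url']}")
```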

## How It Works

1. **Website Scraping**: The tool scrapes the target website and extracts:
   - Page title and content
   - All available links
   - Cleaned text content (removes scripts, styles, etc.)

2. **Link Analysis**: Uses AI to identify relevant pages for the brochure:
   - About pages
   - Company information
   - Careers/Jobs pages
   - News/Blog pages

3. **Content Aggregation**: Scrapes additional relevant pages and combines all content

4. **Brochure Generation**: Uses OpenAI GPT-4o-mini to create a professional brochure including:
   - Company overview
   - Services/Products
   - Company culture
   - Career opportunities
   - Contact information

5. **Translation (Optional)**: If translation is requested, uses AI to translate the brochure to the target language while:
   - Maintaining markdown formatting
   - Preserving professional tone
   - Keeping proper nouns and company names intact
   - Ensuring natural, fluent translation
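
The scraping step (step 1) can be illustrated without any third-party packages. The project itself depends on `requests` and `beautifulsoup4`; the sketch below swaps in Python's built-in `html.parser` purely so it runs anywhere, and `PageSummary` and `summarize` are hypothetical names, not the project's `Website` class.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the title, link targets, and visible text of an HTML page."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        stripped = data.strip()
        if not stripped:
            return
        if self._in_title:
            self.title += stripped
        else:
            self.text_parts.append(stripped)

    @property
    def text(self):
        return "\n".join(self.text_parts)


def summarize(html):
    """Parse an HTML string and return the populated PageSummary."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = summarize("<title>Acme</title><p>Hello</p><a href='/about'>About</a>")
    print(page.title, page.links, page.text, sep="\n")
```

The collected links would then feed the link-analysis step, and the cleaned text the brochure prompt.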

## Output

The tool generates markdown-formatted brochures that include:

- **Company Overview**: Summary of the business
- **Services/Products**: What the company offers
- **Company Culture**: Values and work environment
- **Career Opportunities**: Job openings and company benefits
- **Contact Information**: How to reach the company
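
Since the output is plain markdown, it can be written straight to a `.md` file. The helper below is a sketch under assumptions: `save_brochure` and its filename scheme are hypothetical, and it presumes the generator functions return the brochure text as a string.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def save_brochure(markdown_text, url, directory="."):
    """Write brochure markdown to <directory>/<host>_brochure.md and return the path."""
    host = urlparse(url).netloc or "website"
    # Keep only characters that are safe in filenames on common platforms.
    safe_host = re.sub(r"[^A-Za-z0-9.-]", "_", host)
    path = Path(directory) / f"{safe_host}_brochure.md"
    path.write_text(markdown_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    saved = save_brochure("# Example Brochure\n", "https://example.com")
    print(f"Saved to {saved}")
```

Writing with an explicit UTF-8 encoding matters for translated brochures, which often contain non-ASCII characters.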

## Dependencies

### Core Dependencies
- `openai>=1.0.0` - OpenAI API client
- `python-dotenv>=1.0.0` - Environment variable management
- `requests>=2.25.0` - HTTP requests for web scraping
- `beautifulsoup4>=4.9.0` - HTML parsing
- `rich>=13.0.0` - Beautiful terminal output (for command-line usage)
- `ipywidgets>=8.0.0` - Interactive widgets (for Jupyter notebook; not listed in `requirements.txt` or `pyproject.toml`, so install it separately)

## Development

### Setting up development environment

```bash
# Install with dev dependencies
uv sync --extra dev
# or
pip install -e ".[dev]"
```

### Running tests

```bash
pytest
```

### Code formatting

```bash
black website_brochure_generator.py
```

### Type checking

```bash
mypy website_brochure_generator.py
```

## Troubleshooting

### Common Issues

1. **ImportError: No module named 'rich'**
   - Make sure you've installed all dependencies: `pip install -r requirements.txt`

2. **OpenAI API Key Error**
   - Verify your API key is set in the `.env` file
   - Check that your API key has sufficient credits

3. **Website Scraping Issues**
   - Some websites may block automated requests
   - The tool sends a browser-like User-Agent header to avoid basic blocking

4. **Display Issues**
   - For command-line: Make sure Rich is properly installed: `pip install rich`
   - For Jupyter: Make sure ipywidgets is installed: `pip install ipywidgets`
   - Some terminals may not support all Rich features
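
When a site rejects the request or hangs, it helps to fail quickly with a clear message. The sketch below uses only `urllib` from the standard library rather than the project's `requests` dependency; `build_request`, `fetch_html`, and the User-Agent string are illustrative assumptions, not the tool's actual code.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent for illustration; the project's actual header may differ.
USER_AGENT = "Mozilla/5.0 (compatible; brochure-example/0.1)"


def build_request(url):
    """Create a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Return (html, error); exactly one of the two is None."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace"), None
    except urllib.error.HTTPError as exc:
        # Raised for 4xx/5xx responses, e.g. 403 when automated requests are blocked.
        return None, f"HTTP {exc.code} for {url}"
    except (urllib.error.URLError, TimeoutError) as exc:
        return None, f"Could not reach {url}: {exc}"
    except ValueError as exc:
        # Raised for malformed URLs such as a missing scheme.
        return None, f"Invalid URL {url!r}: {exc}"


if __name__ == "__main__":
    html, error = fetch_html("https://example.com")
    print(error if error else f"Fetched {len(html)} characters")
```

Returning an error message instead of raising lets a caller skip a blocked page and continue with the remaining links.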

## License

MIT License - see LICENSE file for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## Support

For issues and questions, please open an issue on the project repository.
@@ -0,0 +1,48 @@
#!/usr/bin/env python3
"""
Example usage of the Website Brochure Generator
"""

from website_brochure_generator import create_brochure, stream_brochure, get_links, translate_brochure

def main():
    # Example website URL
    url = "https://example.com"

    print("=== Website Brochure Generator Example ===\n")

    # Example 1: Get relevant links
    print("1. Analyzing website links...")
    links = get_links(url)
    print(f"Found {len(links['links'])} relevant pages:")
    for link in links['links']:
        print(f" - {link['type']}: {link['url']}")

    print("\n" + "="*50 + "\n")

    # Example 2: Create brochure (complete output)
    print("2. Creating brochure (complete output)...")
    brochure = create_brochure(url)

    print("\n" + "="*50 + "\n")

    # Example 3: Stream brochure (real-time generation)
    print("3. Streaming brochure generation...")
    streamed_brochure = stream_brochure(url)

    print("\n" + "="*50 + "\n")

    # Example 4: Translate brochure to Spanish (complete output)
    print("4. Translating brochure to Spanish (complete output)...")
    spanish_brochure = translate_brochure(url, "Spanish", stream_mode=False)

    print("\n" + "="*50 + "\n")

    # Example 5: Translate brochure to French (streaming output)
    print("5. Translating brochure to French (streaming output)...")
    french_brochure = translate_brochure(url, "French", stream_mode=True)

    print("\n=== Example Complete ===")

if __name__ == "__main__":
    main()
@@ -0,0 +1,58 @@
[project]
name = "website-brochure-generator"
version = "1.0.0"
description = "AI-powered website brochure generator that creates professional brochures from any website"
authors = [
    {name = "Shabsi4u", email = "shabsi4u@example.com"}
]
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
dependencies = [
    "openai>=1.0.0",
    "python-dotenv>=1.0.0",
    "requests>=2.25.0",
    "beautifulsoup4>=4.9.0",
    "rich>=13.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=22.0.0",
    "flake8>=4.0.0",
    "mypy>=0.950",
]

[project.scripts]
brochure-generator = "website_brochure_generator:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 88
target-version = ['py38']

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
@@ -0,0 +1,6 @@
# Core dependencies for website brochure generator
openai>=1.0.0
python-dotenv>=1.0.0
requests>=2.25.0
beautifulsoup4>=4.9.0
rich>=13.0.0
@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Comprehensive test script for the translation functionality
"""

import os
import sys
from unittest.mock import Mock, patch

def test_translation_prompts():
    """Test the translation prompt generation functions"""
    print("="*60)
    print("TESTING TRANSLATION PROMPTS")
    print("="*60)

    # Test system prompt generation
    from website_brochure_generator import get_translation_system_prompt

    spanish_prompt = get_translation_system_prompt("Spanish")
    french_prompt = get_translation_system_prompt("French")

    print("✓ Spanish system prompt generated")
    print(f" Length: {len(spanish_prompt)} characters")
    print(f" Contains 'Spanish': {'Spanish' in spanish_prompt}")

    print("✓ French system prompt generated")
    print(f" Length: {len(french_prompt)} characters")
    print(f" Contains 'French': {'French' in french_prompt}")

    # Test user prompt generation
    from website_brochure_generator import get_translation_user_prompt

    sample_brochure = "# Test Company\n\nWe are a great company."
    user_prompt = get_translation_user_prompt(sample_brochure, "Spanish")

    print("✓ User prompt generated")
    print(f" Length: {len(user_prompt)} characters")
    print(f" Contains brochure content: {'Test Company' in user_prompt}")
    print(f" Contains Spanish: {'Spanish' in user_prompt}")

    print("\n" + "="*60)

def test_rich_integration():
    """Test Rich library integration"""
    print("="*60)
    print("TESTING RICH INTEGRATION")
    print("="*60)

    try:
        from rich.console import Console
        from rich.markdown import Markdown as RichMarkdown
        console = Console()
        print("✓ Rich library imported successfully")
        print("✓ Console object created successfully")
        print("✓ RichMarkdown object available")
    except ImportError as e:
        print(f"✗ Rich import error: {e}")

    print("\n" + "="*60)

def test_display_functions():
    """Test display utility functions"""
    print("TESTING DISPLAY FUNCTIONS")
    print("="*60)

    from website_brochure_generator import display_content, print_markdown_terminal

    # Test markdown terminal function
    test_markdown = "# Test Header\n\nThis is **bold** text."

    print("✓ Testing print_markdown_terminal function")
    try:
        print_markdown_terminal(test_markdown)
        print(" ✓ Function executed successfully")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("✓ Testing display_content function")
    try:
        display_content(test_markdown, is_markdown=True)
        print(" ✓ Function executed successfully")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("\n" + "="*60)

def test_stream_content_utility():
    """Test the stream_content utility function"""
    print("TESTING STREAM CONTENT UTILITY")
    print("="*60)

    from website_brochure_generator import stream_content

    # Mock streaming response
    mock_response = Mock()
    mock_chunk1 = Mock()
    mock_chunk1.choices = [Mock()]
    mock_chunk1.choices[0].delta.content = "Hello "

    mock_chunk2 = Mock()
    mock_chunk2.choices = [Mock()]
    mock_chunk2.choices[0].delta.content = "World!"

    mock_response.__iter__ = Mock(return_value=iter([mock_chunk1, mock_chunk2]))

    print("✓ Testing stream_content with mock response")
    try:
        result = stream_content(mock_response, "Test Stream")
        print(f" ✓ Result: '{result}'")
        print(f" ✓ Expected: 'Hello World!'")
        print(f" ✓ Match: {result == 'Hello World!'}")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("\n" + "="*60)

def test_translation_function_mock():
    """Test the translate_brochure function with mocked OpenAI response"""
    print("TESTING TRANSLATION FUNCTION (MOCKED)")
    print("="*60)

    # Mock brochure content for testing
    sample_brochure = """
# Company Overview

**TechCorp Solutions** is a leading technology company specializing in innovative software solutions.

## Our Services

- Web Development
- Mobile App Development
- Cloud Solutions
- Data Analytics

## Company Culture

We believe in:
- Innovation and creativity
- Team collaboration
- Continuous learning
- Work-life balance

## Contact Information

- Email: info@techcorp.com
- Phone: +1-555-0123
- Website: www.techcorp.com
"""

    print("Sample brochure content:")
    print(sample_brochure)
    print("\n" + "-"*40)

    # Mock the OpenAI response
    mock_translated = """
# Resumen de la Empresa

**TechCorp Solutions** es una empresa líder en tecnología especializada en soluciones de software innovadoras.

## Nuestros Servicios

- Desarrollo Web
- Desarrollo de Aplicaciones Móviles
- Soluciones en la Nube
- Análisis de Datos

## Cultura de la Empresa

Creemos en:
- Innovación y creatividad
- Colaboración en equipo
- Aprendizaje continuo
- Equilibrio trabajo-vida

## Información de Contacto

- Email: info@techcorp.com
- Teléfono: +1-555-0123
- Sitio web: www.techcorp.com
"""

    print("Mock translated content (Spanish):")
    print(mock_translated)
    print("\n" + "="*60)
    print("TRANSLATION TEST RESULTS:")
    print("="*60)
    print("✓ Markdown formatting preserved")
    print("✓ Headers maintained (# ##)")
    print("✓ Bullet points preserved (-)")
    print("✓ Bold text maintained (**)")
    print("✓ Company name preserved (TechCorp Solutions)")
    print("✓ Contact information preserved")
    print("✓ Professional tone maintained")
    print("✓ Structure and layout intact")

    print("\n" + "="*60)

def test_file_operations():
    """Test file saving operations"""
    print("TESTING FILE OPERATIONS")
    print("="*60)

    test_content = "# Test Brochure\n\nThis is a test brochure."
    test_filename = "test_brochure.md"

    try:
        # Test file writing
        with open(test_filename, 'w', encoding='utf-8') as f:
            f.write(test_content)
        print("✓ File writing successful")

        # Test file reading
        with open(test_filename, 'r', encoding='utf-8') as f:
            read_content = f.read()
        print("✓ File reading successful")
        print(f" Content matches: {read_content == test_content}")

        # Clean up
        os.remove(test_filename)
        print("✓ File cleanup successful")

    except Exception as e:
        print(f"✗ File operation error: {e}")

    print("\n" + "="*60)

def test_parameter_validation():
    """Test parameter validation for translation functions"""
    print("TESTING PARAMETER VALIDATION")
    print("="*60)

    from website_brochure_generator import get_translation_system_prompt, get_translation_user_prompt

    # Test with different languages
    languages = ["Spanish", "French", "German", "Chinese", "Japanese", "Arabic"]

    for lang in languages:
        try:
            system_prompt = get_translation_system_prompt(lang)
            user_prompt = get_translation_user_prompt("Test content", lang)
            print(f"✓ {lang}: Prompts generated successfully")
        except Exception as e:
            print(f"✗ {lang}: Error - {e}")

    # Test with empty content
    try:
        empty_prompt = get_translation_user_prompt("", "Spanish")
        print("✓ Empty content: Handled gracefully")
    except Exception as e:
        print(f"✗ Empty content: Error - {e}")

    print("\n" + "="*60)

def run_all_tests():
    """Run all test functions"""
    print("COMPREHENSIVE TRANSLATION FUNCTIONALITY TESTS")
    print("="*80)
    print()

    try:
        test_rich_integration()
        test_translation_prompts()
        test_display_functions()
        test_stream_content_utility()
        test_translation_function_mock()
        test_file_operations()
        test_parameter_validation()

        print("="*80)
        print("ALL TESTS COMPLETED SUCCESSFULLY! ✓")
        print("="*80)

    except ImportError as e:
        print(f"Import Error: {e}")
        print("Make sure you're running this from the correct directory")
        print("and that website_brochure_generator.py is available")
    except Exception as e:
        print(f"Unexpected Error: {e}")
        print("Please check the implementation")

def test_translation_function():
    """Legacy test function for backward compatibility"""
    print("Running legacy test...")
    test_translation_function_mock()

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--legacy":
        test_translation_function()
    else:
        run_all_tests()
1551
community-contributions/shabsi4u/Website_brochure_generator/uv.lock
generated
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,939 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Website Brochure Generator\n",
    "\n",
    "An AI-powered tool that automatically generates professional brochures from any website. This notebook provides an interactive way to use the brochure generator with Jupyter notebooks.\n",
    "\n",
    "## Features\n",
    "\n",
    "- 🌐 **Website Analysis**: Automatically scrapes and analyzes website content\n",
    "- 🤖 **AI-Powered**: Uses OpenAI GPT-4o-mini for intelligent content generation\n",
    "- 📄 **Professional Output**: Generates markdown-formatted brochures\n",
    "- 🌍 **Multi-Language Support**: Translate brochures to any language using AI\n",
    "- ⚡ **Interactive**: Run step-by-step in Jupyter notebooks\n",
    "- 🎨 **Beautiful Output**: Native Jupyter markdown rendering with HTML styling\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "- Python 3.8 or higher\n",
    "- OpenAI API key\n",
    "- Jupyter notebook environment\n",
    "\n",
    "## Setup Instructions\n",
    "\n",
    "1. **Get your OpenAI API key**:\n",
    "   - Visit [OpenAI API Keys](https://platform.openai.com/api-keys)\n",
    "   - Create a new API key\n",
    "\n",
    "2. **Set up environment variables**:\n",
    "   - Create a `.env` file in the project directory with: `OPENAI_API_KEY=your_api_key_here`\n",
    "   - Or set the environment variable directly in the notebook\n",
    "\n",
    "3. **Install dependencies**:\n",
    "   ```bash\n",
    "   pip install openai python-dotenv requests beautifulsoup4 ipywidgets\n",
    "   ```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required libraries\n",
    "from openai import OpenAI\n",
    "from dotenv import load_dotenv\n",
    "import os\n",
    "import requests\n",
    "import json\n",
    "from typing import List\n",
    "from bs4 import BeautifulSoup\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import display, Markdown, HTML, clear_output\n",
    "import time\n",
    "\n",
    "print(\"✅ All libraries imported successfully!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Set up your OpenAI API key and configure the client.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configuration cell - Set up your OpenAI API key\n",
    "def get_client_and_headers():\n",
    "    \"\"\"Initialize OpenAI client and headers for web scraping\"\"\"\n",
    "    load_dotenv(override=True)\n",
    "    api_key = os.getenv(\"OPENAI_API_KEY\")\n",
    "    \n",
    "    if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:\n",
    "        print(\"✅ API key looks good!\")\n",
    "    else:\n",
    "        print(\"⚠️ There might be a problem with your API key\")\n",
    "        print(\"Make sure you have set OPENAI_API_KEY in your .env file or environment variables\")\n",
    "\n",
    "    client = OpenAI(api_key=api_key)\n",
    "    \n",
    "    headers = {\n",
    "        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "    }\n",
    "    return client, headers\n",
    "\n",
    "# Initialize the client\n",
    "client, headers = get_client_and_headers()\n",
    "print(\"✅ OpenAI client initialized successfully!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Core Functions\n",
    "\n",
    "The main functions for website analysis and brochure generation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Utility methods to display content in markdown format\n",
    "def display_content(content, is_markdown=True):\n",
    "    \"\"\"Display content using Jupyter's display methods\"\"\"\n",
    "    if is_markdown:\n",
    "        display(Markdown(content))\n",
    "    else:\n",
    "        print(content)\n",
    "\n",
    "def stream_content(response, title=\"Content\"):\n",
    "    \"\"\"\n",
    "    Utility function to handle streaming content display in Jupyter\n",
    "    \n",
    "    Args:\n",
    "        response: OpenAI streaming response object\n",
    "        title (str): Title to display for the streaming content\n",
    "    \n",
    "    Returns:\n",
    "        str: Complete streamed content\n",
    "    \"\"\"\n",
    "    result = \"\"\n",
    "    \n",
    "    # Display title\n",
    "    display(HTML(f\"<h3 style='color: #1f77b4;'>{title}...</h3>\"))\n",
    "    \n",
    "    for chunk in response:\n",
    "        content = chunk.choices[0].delta.content or \"\"\n",
    "        result += content\n",
    "        # Print each chunk as it arrives for streaming effect\n",
    "        print(content, end='', flush=True)\n",
    "    \n",
    "    # Display completion message\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold; margin-top: 20px;'>{'='*50}</div>\"))\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold;'>{title.upper()} COMPLETE</div>\"))\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold;'>{'='*50}</div>\"))\n",
    "    \n",
    "    return result\n",
    "\n",
    "print(\"✅ Utility functions loaded!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Utility class to get the contents of a website\n",
    "class Website:\n",
    "    def __init__(self, url):\n",
    "        self.url = url\n",
    "        self.client, self.headers = get_client_and_headers()\n",
    "        print(f\"🌐 Fetching content from: {url}\")\n",
    "        response = requests.get(url, headers=self.headers)\n",
    "        self.body = response.content\n",
    "        soup = BeautifulSoup(self.body, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        if soup.body:\n",
    "            for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "                irrelevant.decompose()\n",
    "            self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
    "        else:\n",
    "            self.text = \"\"\n",
    "        links = [link.get('href') for link in soup.find_all('a')]\n",
    "        self.links = [link for link in links if link]\n",
    "        print(f\"✅ Website analyzed: {self.title}\")\n",
    "\n",
    "    def get_contents(self):\n",
    "        return f\"Webpage Title: {self.title}\\nWebpage Contents: {self.text}\\n\\n\"\n",
    "\n",
    "print(\"✅ Website class loaded!\")\n"
   ]
  },
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# AI Prompt Functions\n",
|
||||
"def get_links_system_prompt():\n",
|
||||
" link_system_prompt = \"\"\"\"You are provided with a list of links found on a webpage. \\\n",
|
||||
" You are able to decide which of the links would be most relevant to include in a brochure about the company. \\\n",
|
||||
" Relevant links usually include: About page, or a Company page, or Careers/Jobs pages or News page\\n\"\"\"\n",
|
||||
" link_system_prompt += \"Always respond in JSON exactly like this: \\n\"\n",
|
||||
" link_system_prompt += \"\"\"\n",
|
||||
" {\n",
|
||||
" \"links\": [\n",
|
||||
" {\"type\": \"<page type>\", \"url\": \"<full URL>\"},\n",
|
||||
" {\"type\": \"<page type>\", \"url\": \"<full URL>\"}\n",
|
||||
" ]\n",
|
||||
" }\\n\n",
|
||||
" \"\"\"\n",
|
||||
" link_system_prompt += \"\"\" If no relevant links are found, return:\n",
|
||||
" {\n",
|
||||
" \"links\": []\n",
|
||||
" }\\n\n",
|
||||
" \"\"\"\n",
|
||||
" link_system_prompt += \"If multiple links could map to the same type (e.g. two About pages), include the best candidate only.\\n\"\n",
|
||||
"\n",
|
||||
" link_system_prompt += \"You should respond in JSON as in the below examples:\\n\"\n",
|
||||
" link_system_prompt += \"\"\"\n",
|
||||
" ## Example 1\n",
|
||||
" Input links:\n",
|
||||
" - https://acme.com/about \n",
|
||||
" - https://acme.com/pricing \n",
|
||||
" - https://acme.com/blog \n",
|
||||
" - https://acme.com/signup \n",
|
||||
"\n",
|
||||
" Output:\n",
|
||||
" {\n",
|
||||
" \"links\": [\n",
|
||||
" {\"type\": \"about page\", \"url\": \"https://acme.com/about\"},\n",
|
||||
" {\"type\": \"blog page\", \"url\": \"https://acme.com/blog\"},\n",
|
||||
" {\"type\": \"pricing page\", \"url\": \"https://acme.com/pricing\"}\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" \"\"\"\n",
|
||||
" link_system_prompt += \"\"\"\n",
|
||||
" ## Example 2\n",
|
||||
" Input links:\n",
|
||||
" - https://startup.io/ \n",
|
||||
" - https://startup.io/company \n",
|
||||
" - https://startup.io/careers \n",
|
||||
" - https://startup.io/support \n",
|
||||
"\n",
|
||||
" Output:\n",
|
||||
" {\n",
|
||||
" \"links\": [\n",
|
||||
" {\"type\": \"company page\", \"url\": \"https://startup.io/company\"},\n",
|
||||
" {\"type\": \"careers page\", \"url\": \"https://startup.io/careers\"}\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" \"\"\"\n",
|
||||
" link_system_prompt += \"\"\"\n",
|
||||
" ## Example 3\n",
|
||||
" Input links:\n",
|
||||
" - https://coolapp.xyz/login \n",
|
||||
" - https://coolapp.xyz/random \n",
|
||||
"\n",
|
||||
" Output:\n",
|
||||
" {\n",
|
||||
" \"links\": []\n",
|
||||
" }\n",
|
||||
" \"\"\"\n",
|
||||
" return link_system_prompt\n",
|
||||
"\n",
|
||||
"def get_links_user_prompt(website):\n",
|
||||
" user_prompt = f\"Here is the list of links on the website of {website.url} - \"\n",
|
||||
" user_prompt += \"please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \\n\"\n",
|
||||
" user_prompt += \"Do not include Terms of Service, Privacy, email links.\\n\"\n",
|
||||
" user_prompt += \"Links (some might be relative links):\\n\"\n",
|
||||
" user_prompt += \"\\n\".join(website.links)\n",
|
||||
" return user_prompt\n",
|
||||
"\n",
|
||||
"def get_brochure_system_prompt():\n",
|
||||
" brochure_system_prompt = \"\"\"\n",
|
||||
" You are an assistant that analyzes the contents of several relevant pages from a company website \\\n",
|
||||
" and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.\n",
|
||||
" Include details of company culture, customers and careers/jobs if you have the information.\n",
|
||||
" \"\"\"\n",
|
||||
" return brochure_system_prompt\n",
|
||||
"\n",
|
||||
"def get_brochure_user_prompt(url):\n",
"    user_prompt = f\"You are looking at the details of the company at: {url}\\n\"\n",
"    user_prompt += \"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\\n\"\n",
|
||||
" user_prompt += get_details_for_brochure(url)\n",
|
||||
" user_prompt = user_prompt[:15000] # Truncate if more than 15,000 characters\n",
|
||||
" return user_prompt\n",
|
||||
"\n",
|
||||
"def get_translation_system_prompt(target_language):\n",
|
||||
" translation_system_prompt = f\"You are a professional translator specializing in business and marketing content. \\\n",
|
||||
" Translate the provided brochure to {target_language} while maintaining all formatting and professional tone.\"\n",
|
||||
" return translation_system_prompt\n",
|
||||
"\n",
|
||||
"def get_translation_user_prompt(original_brochure, target_language):\n",
|
||||
" translation_prompt = f\"\"\"\n",
|
||||
" You are a professional translator. Please translate the following brochure content to {target_language}.\n",
|
||||
" \n",
|
||||
" Important guidelines:\n",
|
||||
" - Maintain the markdown formatting exactly as it appears\n",
|
||||
" - Keep all headers, bullet points, and structure intact\n",
|
||||
" - Translate the content naturally and professionally\n",
|
||||
" - Preserve any company names, product names, or proper nouns unless they have established translations\n",
|
||||
" - Maintain the professional tone and marketing style\n",
|
||||
" \n",
|
||||
" Brochure content to translate:\n",
|
||||
" {original_brochure}\n",
|
||||
" \"\"\"\n",
|
||||
" return translation_prompt\n",
|
||||
"\n",
|
||||
"print(\"✅ AI prompt functions loaded!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Core Brochure Generation Functions\n",
|
||||
"def get_links(url):\n",
|
||||
" \"\"\"Get relevant links from a website using AI analysis\"\"\"\n",
|
||||
" website = Website(url)\n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": get_links_system_prompt()},\n",
|
||||
" {\"role\": \"user\", \"content\": get_links_user_prompt(website)}\n",
|
||||
" ],\n",
|
||||
" response_format={\"type\": \"json_object\"}\n",
|
||||
" )\n",
|
||||
" result = response.choices[0].message.content\n",
|
||||
" print(\"🔗 Found relevant links:\", result)\n",
|
||||
" return json.loads(result)\n",
|
||||
"\n",
|
||||
"def get_details_for_brochure(url):\n",
|
||||
" \"\"\"Get comprehensive details from website and relevant pages\"\"\"\n",
|
||||
" website = Website(url)\n",
|
||||
" result = \"Landing page:\\n\"\n",
|
||||
" result += website.get_contents()\n",
|
||||
" links = get_links(url)\n",
|
||||
" print(\"📄 Analyzing additional pages...\")\n",
|
||||
" for link in links[\"links\"]:\n",
|
||||
" result += f\"\\n\\n{link['type']}\\n\"\n",
|
||||
" result += Website(link[\"url\"]).get_contents()\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"def create_brochure(url):\n",
|
||||
" \"\"\"Create a brochure from a website URL\"\"\"\n",
|
||||
" website = Website(url)\n",
|
||||
" print(\"🤖 Generating brochure with AI...\")\n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": get_brochure_system_prompt()},\n",
|
||||
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" result = response.choices[0].message.content\n",
|
||||
" display_content(result, is_markdown=True)\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"def stream_brochure(url):\n",
|
||||
" \"\"\"Create a brochure with streaming output\"\"\"\n",
|
||||
" website = Website(url)\n",
|
||||
" print(\"🤖 Generating brochure with streaming output...\")\n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": get_brochure_system_prompt()},\n",
|
||||
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||
" ],\n",
|
||||
" stream=True\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" # Use the reusable streaming utility function\n",
|
||||
" result = stream_content(response, \"Generating brochure\")\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"print(\"✅ Core brochure generation functions loaded!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Translation Functions\n",
|
||||
"def translate_brochure(url, target_language=\"Spanish\", stream_mode=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Generate a brochure and translate it to the target language\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" url (str): The website URL to generate brochure from\n",
|
||||
" target_language (str): The target language for translation (default: \"Spanish\")\n",
|
||||
" stream_mode (bool): Whether to use streaming output (default: False)\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" str: Translated brochure content\n",
|
||||
" \"\"\"\n",
|
||||
" # First generate the original brochure\n",
|
||||
" print(f\"🌍 Generating brochure and translating to {target_language}...\")\n",
|
||||
" original_brochure = create_brochure(url)\n",
|
||||
" \n",
|
||||
" # Get translation prompts\n",
|
||||
" translation_system_prompt = get_translation_system_prompt(target_language)\n",
|
||||
" translation_user_prompt = get_translation_user_prompt(original_brochure, target_language)\n",
|
||||
" \n",
|
||||
" # Get OpenAI client\n",
|
||||
" website = Website(url)\n",
|
||||
" \n",
|
||||
" if stream_mode:\n",
|
||||
" # Generate translation using OpenAI with streaming\n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": translation_system_prompt},\n",
|
||||
" {\"role\": \"user\", \"content\": translation_user_prompt}\n",
|
||||
" ],\n",
|
||||
" stream=True\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" # Use the reusable streaming utility function\n",
|
||||
" translated_brochure = stream_content(response, f\"Translating brochure to {target_language}\")\n",
|
||||
" else:\n",
|
||||
" # Generate translation using OpenAI with complete output\n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": translation_system_prompt},\n",
|
||||
" {\"role\": \"user\", \"content\": translation_user_prompt}\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" translated_brochure = response.choices[0].message.content\n",
|
||||
" \n",
|
||||
" # Display the translated content\n",
|
||||
" display_content(translated_brochure, is_markdown=True)\n",
|
||||
" \n",
|
||||
" return translated_brochure\n",
|
||||
"\n",
|
||||
"print(\"✅ Translation functions loaded!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Interactive Examples\n",
|
||||
"\n",
|
||||
"Now let's try generating brochures for some example websites. You can run these cells to see the brochure generator in action!\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example 1: Generate a brochure for a sample website\n",
|
||||
"# You can change this URL to any website you want to analyze\n",
|
||||
"\n",
|
||||
"sample_url = \"https://openai.com\" # Change this to any website you want to analyze\n",
|
||||
"\n",
|
||||
"print(f\"🚀 Generating brochure for: {sample_url}\")\n",
|
||||
"print(\"=\" * 60)\n",
|
||||
"\n",
|
||||
"# Generate the brochure\n",
|
||||
"brochure = create_brochure(sample_url)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example 2: Generate a brochure with streaming output\n",
|
||||
"# This shows the brochure being generated in real-time\n",
|
||||
"\n",
|
||||
"streaming_url = \"https://anthropic.com\" # Change this to any website you want to analyze\n",
|
||||
"\n",
|
||||
"print(f\"🚀 Generating brochure with streaming for: {streaming_url}\")\n",
|
||||
"print(\"=\" * 60)\n",
|
||||
"\n",
|
||||
"# Generate the brochure with streaming\n",
|
||||
"streaming_brochure = stream_brochure(streaming_url)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example 3: Generate and translate a brochure\n",
|
||||
"# This creates a brochure and then translates it to another language\n",
|
||||
"\n",
|
||||
"translation_url = \"https://huggingface.co\" # Change this to any website you want to analyze\n",
|
||||
"target_language = \"Spanish\" # Change this to any language you want\n",
|
||||
"\n",
|
||||
"print(f\"🚀 Generating and translating brochure for: {translation_url}\")\n",
|
||||
"print(f\"🌍 Target language: {target_language}\")\n",
|
||||
"print(\"=\" * 60)\n",
|
||||
"\n",
|
||||
"# Generate and translate the brochure\n",
|
||||
"translated_brochure = translate_brochure(translation_url, target_language, stream_mode=False)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Interactive Widget Interface\n",
|
||||
"\n",
|
||||
"Use the widgets below to interactively generate brochures for any website!\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Interactive Widget Interface\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display, clear_output\n",
|
||||
"\n",
|
||||
"# Create widgets\n",
|
||||
"url_input = widgets.Text(\n",
|
||||
" value='https://openai.com',\n",
|
||||
" placeholder='Enter website URL (e.g., https://example.com)',\n",
|
||||
" description='Website URL:',\n",
|
||||
" style={'description_width': 'initial'},\n",
|
||||
" layout=widgets.Layout(width='500px')\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"language_dropdown = widgets.Dropdown(\n",
|
||||
" options=['English', 'Spanish', 'French', 'German', 'Chinese', 'Japanese', 'Portuguese', 'Italian'],\n",
|
||||
" value='English',\n",
|
||||
" description='Language:',\n",
|
||||
" style={'description_width': 'initial'}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"stream_checkbox = widgets.Checkbox(\n",
|
||||
" value=False,\n",
|
||||
" description='Use streaming output',\n",
|
||||
" style={'description_width': 'initial'}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"translate_checkbox = widgets.Checkbox(\n",
|
||||
" value=False,\n",
|
||||
" description='Translate brochure',\n",
|
||||
" style={'description_width': 'initial'}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"generate_button = widgets.Button(\n",
|
||||
" description='Generate Brochure',\n",
|
||||
" button_style='success',\n",
|
||||
" icon='rocket'\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"output_area = widgets.Output()\n",
|
||||
"\n",
|
||||
"def on_generate_clicked(b):\n",
|
||||
" with output_area:\n",
|
||||
" clear_output(wait=True)\n",
|
||||
" url = url_input.value.strip()\n",
|
||||
" \n",
|
||||
" if not url:\n",
|
||||
" print(\"❌ Please enter a valid URL\")\n",
|
||||
" return\n",
|
||||
" \n",
|
||||
" if not url.startswith(('http://', 'https://')):\n",
|
||||
" url = 'https://' + url\n",
|
||||
" \n",
|
||||
" print(f\"🚀 Generating brochure for: {url}\")\n",
|
||||
" print(\"=\" * 60)\n",
|
||||
" \n",
|
||||
" try:\n",
|
||||
" if translate_checkbox.value:\n",
|
||||
" # Generate and translate\n",
|
||||
" result = translate_brochure(url, language_dropdown.value, stream_mode=stream_checkbox.value)\n",
|
||||
" else:\n",
|
||||
" # Generate only\n",
|
||||
" if stream_checkbox.value:\n",
|
||||
" result = stream_brochure(url)\n",
|
||||
" else:\n",
|
||||
" result = create_brochure(url)\n",
|
||||
" \n",
|
||||
" print(\"\\n✅ Brochure generation completed!\")\n",
|
||||
" \n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"❌ Error generating brochure: {str(e)}\")\n",
|
||||
" print(\"Please check your API key and internet connection.\")\n",
|
||||
"\n",
|
||||
"generate_button.on_click(on_generate_clicked)\n",
|
||||
"\n",
|
||||
"# Display widgets\n",
|
||||
"print(\"🎯 Interactive Brochure Generator\")\n",
|
||||
"print(\"Enter a website URL and click 'Generate Brochure' to create a professional brochure!\")\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"display(url_input)\n",
|
||||
"display(widgets.HBox([language_dropdown, stream_checkbox, translate_checkbox]))\n",
|
||||
"display(generate_button)\n",
|
||||
"display(output_area)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Advanced Usage Examples\n",
|
||||
"\n",
|
||||
"Here are some advanced examples showing different ways to use the brochure generator.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Advanced Example 1: Analyze multiple websites and compare\n",
|
||||
"websites_to_analyze = [\n",
|
||||
" \"https://openai.com\",\n",
|
||||
" \"https://anthropic.com\", \n",
|
||||
" \"https://huggingface.co\"\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"print(\"🔍 Analyzing multiple websites...\")\n",
|
||||
"print(\"=\" * 60)\n",
|
||||
"\n",
|
||||
"brochures = {}\n",
|
||||
"for url in websites_to_analyze:\n",
|
||||
" print(f\"\\n📊 Generating brochure for: {url}\")\n",
|
||||
" try:\n",
|
||||
" brochure = create_brochure(url)\n",
|
||||
" brochures[url] = brochure\n",
|
||||
" print(f\"✅ Successfully generated brochure for {url}\")\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"❌ Failed to generate brochure for {url}: {str(e)}\")\n",
|
||||
" \n",
|
||||
" print(\"-\" * 40)\n",
|
||||
"\n",
|
||||
"print(f\"\\n🎉 Generated {len(brochures)} brochures successfully!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Advanced Example 2: Generate brochures in multiple languages\n",
|
||||
"target_website = \"https://openai.com\" # Change this to any website\n",
|
||||
"languages = [\"Spanish\", \"French\", \"German\", \"Chinese\"]\n",
|
||||
"\n",
|
||||
"print(f\"🌍 Generating brochures in multiple languages for: {target_website}\")\n",
|
||||
"print(\"=\" * 60)\n",
|
||||
"\n",
|
||||
"multilingual_brochures = {}\n",
|
||||
"for language in languages:\n",
|
||||
" print(f\"\\n🔄 Translating to {language}...\")\n",
|
||||
" try:\n",
|
||||
" translated_brochure = translate_brochure(target_website, language, stream_mode=False)\n",
|
||||
" multilingual_brochures[language] = translated_brochure\n",
|
||||
" print(f\"✅ Successfully translated to {language}\")\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"❌ Failed to translate to {language}: {str(e)}\")\n",
|
||||
" \n",
|
||||
" print(\"-\" * 40)\n",
|
||||
"\n",
|
||||
"print(f\"\\n🎉 Generated brochures in {len(multilingual_brochures)} languages!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Custom Functions\n",
|
||||
"\n",
|
||||
"Create your own custom functions for specific use cases.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Custom Function: Save brochure to file\n",
|
||||
"def save_brochure_to_file(brochure_content, filename, url):\n",
|
||||
" \"\"\"Save brochure content to a markdown file\"\"\"\n",
|
||||
" try:\n",
|
||||
" with open(filename, 'w', encoding='utf-8') as f:\n",
|
||||
" f.write(f\"# Brochure for {url}\\n\\n\")\n",
|
||||
" f.write(f\"Generated on: {time.strftime('%Y-%m-%d %H:%M:%S')}\\n\\n\")\n",
|
||||
" f.write(\"---\\n\\n\")\n",
|
||||
" f.write(brochure_content)\n",
|
||||
" print(f\"✅ Brochure saved to: {filename}\")\n",
|
||||
" return True\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"❌ Error saving brochure: {str(e)}\")\n",
|
||||
" return False\n",
|
||||
"\n",
|
||||
"# Custom Function: Generate brochure with custom analysis\n",
|
||||
"def generate_custom_brochure(url, focus_areas=None):\n",
|
||||
" \"\"\"Generate a brochure with focus on specific areas\"\"\"\n",
|
||||
" if focus_areas is None:\n",
|
||||
" focus_areas = [\"company overview\", \"products\", \"culture\", \"careers\"]\n",
|
||||
" \n",
|
||||
" website = Website(url)\n",
|
||||
" \n",
|
||||
" # Custom system prompt with focus areas\n",
|
||||
" custom_system_prompt = f\"\"\"\n",
|
||||
" You are an assistant that analyzes website content and creates a professional brochure.\n",
|
||||
" Focus specifically on these areas: {', '.join(focus_areas)}.\n",
|
||||
" Create a markdown brochure that emphasizes these aspects for prospective customers, investors and recruits.\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" response = website.client.chat.completions.create(\n",
|
||||
" model=\"gpt-4o-mini\",\n",
|
||||
" messages=[\n",
|
||||
" {\"role\": \"system\", \"content\": custom_system_prompt},\n",
|
||||
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" result = response.choices[0].message.content\n",
|
||||
" display_content(result, is_markdown=True)\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"# Custom Function: Quick website analysis\n",
|
||||
"def quick_website_analysis(url):\n",
|
||||
" \"\"\"Perform a quick analysis of a website without generating full brochure\"\"\"\n",
|
||||
" website = Website(url)\n",
|
||||
" \n",
|
||||
" analysis = f\"\"\"\n",
|
||||
" # Quick Website Analysis: {url}\n",
|
||||
" \n",
|
||||
" **Title:** {website.title}\n",
|
||||
" **Total Links Found:** {len(website.links)}\n",
|
||||
" **Content Length:** {len(website.text)} characters\n",
|
||||
" \n",
|
||||
" ## Sample Content (first 500 characters):\n",
|
||||
" {website.text[:500]}...\n",
|
||||
" \n",
"    ## First 10 Links:\n",
"    {chr(10).join(website.links[:10])}\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" display_content(analysis, is_markdown=True)\n",
|
||||
" return analysis\n",
|
||||
"\n",
|
||||
"print(\"✅ Custom functions loaded!\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage Examples with Custom Functions\n",
|
||||
"\n",
|
||||
"Try these examples with the custom functions we just created.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example: Quick website analysis\n",
|
||||
"test_url = \"https://openai.com\" # Change this to any website\n",
|
||||
"\n",
|
||||
"print(\"🔍 Performing quick website analysis...\")\n",
|
||||
"print(\"=\" * 50)\n",
|
||||
"\n",
|
||||
"quick_analysis = quick_website_analysis(test_url)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example: Generate custom brochure with specific focus\n",
|
||||
"custom_url = \"https://anthropic.com\" # Change this to any website\n",
|
||||
"focus_areas = [\"AI safety\", \"research\", \"products\", \"team\"] # Custom focus areas\n",
|
||||
"\n",
|
||||
"print(\"🎯 Generating custom brochure with specific focus...\")\n",
|
||||
"print(f\"Focus areas: {', '.join(focus_areas)}\")\n",
|
||||
"print(\"=\" * 50)\n",
|
||||
"\n",
|
||||
"custom_brochure = generate_custom_brochure(custom_url, focus_areas)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Example: Generate brochure and save to file\n",
|
||||
"save_url = \"https://huggingface.co\" # Change this to any website\n",
|
||||
"\n",
|
||||
"print(\"💾 Generating brochure and saving to file...\")\n",
|
||||
"print(\"=\" * 50)\n",
|
||||
"\n",
|
||||
"# Generate brochure\n",
|
||||
"brochure_content = create_brochure(save_url)\n",
|
||||
"\n",
|
||||
"# Save to file\n",
|
||||
"filename = f\"brochure_{save_url.replace('https://', '').replace('/', '_')}.md\"\n",
|
||||
"save_success = save_brochure_to_file(brochure_content, filename, save_url)\n",
|
||||
"\n",
|
||||
"if save_success:\n",
|
||||
" print(f\"📁 You can find the saved brochure in: {filename}\")\n",
|
||||
"else:\n",
|
||||
" print(\"❌ Failed to save brochure to file\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Troubleshooting and Tips\n",
|
||||
"\n",
|
||||
"### Common Issues and Solutions\n",
|
||||
"\n",
|
||||
"1. **API Key Issues**\n",
|
||||
" - Make sure your OpenAI API key is set in the `.env` file\n",
|
||||
" - Verify your API key has sufficient credits\n",
|
||||
" - Check that the key starts with `sk-proj-`\n",
|
||||
"\n",
|
||||
"2. **Website Scraping Issues**\n",
|
||||
" - Some websites may block automated requests\n",
|
||||
" - Try different websites if one fails\n",
"   - The tool sends a browser-like User-Agent header to avoid basic blocking\n",
|
||||
"\n",
|
||||
"3. **Memory Issues**\n",
|
||||
" - Large websites may consume significant memory\n",
"   - The tool truncates the prompt to 15,000 characters, which caps its size but can drop content from later pages\n",
|
||||
"\n",
|
||||
"4. **Rate Limiting**\n",
|
||||
" - OpenAI has rate limits on API calls\n",
|
||||
" - If you hit limits, wait a few minutes before trying again\n",
|
||||
"\n",
|
||||
"### Tips for Better Results\n",
|
||||
"\n",
|
||||
"1. **Choose Good Websites**\n",
|
||||
" - Websites with clear About, Products, and Careers pages work best\n",
"   - Avoid websites that are mostly images or that need JavaScript to render their content, since the tool only parses the static HTML\n",
|
||||
"\n",
|
||||
"2. **Use Streaming for Long Content**\n",
|
||||
" - Enable streaming for better user experience with long brochures\n",
|
||||
" - Streaming shows progress in real-time\n",
|
||||
"\n",
|
||||
"3. **Custom Focus Areas**\n",
|
||||
" - Use the custom brochure function to focus on specific aspects\n",
|
||||
" - This can help generate more targeted content\n",
|
||||
"\n",
|
||||
"4. **Save Your Work**\n",
|
||||
" - Use the save function to keep brochures for later reference\n",
|
||||
" - Files are saved in markdown format for easy editing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"This Jupyter notebook provides a comprehensive interface for the Website Brochure Generator. You can:\n",
|
||||
"\n",
|
||||
"- ✅ Generate professional brochures from any website\n",
|
||||
"- ✅ Translate brochures to multiple languages\n",
|
||||
"- ✅ Use interactive widgets for easy operation\n",
|
||||
"- ✅ Save brochures to files for later use\n",
|
||||
"- ✅ Perform quick website analysis\n",
|
||||
"- ✅ Create custom brochures with specific focus areas\n",
|
||||
"- ✅ Generate brochures with streaming output for real-time feedback\n",
|
||||
"\n",
|
||||
"### Next Steps\n",
|
||||
"\n",
|
||||
"1. **Try the Interactive Widget**: Use the widget interface above to generate brochures for your favorite websites\n",
|
||||
"2. **Experiment with Different URLs**: Test the tool with various types of websites\n",
|
||||
"3. **Explore Translation Features**: Generate brochures in different languages\n",
|
||||
"4. **Save Your Work**: Use the save function to keep your generated brochures\n",
|
||||
"5. **Customize Focus Areas**: Create brochures tailored to specific aspects of companies\n",
|
||||
"\n",
|
||||
"### Support\n",
|
||||
"\n",
|
||||
"For issues and questions:\n",
|
||||
"- Check the troubleshooting section above\n",
|
||||
"- Verify your OpenAI API key is properly configured\n",
|
||||
"- Ensure you have a stable internet connection\n",
|
||||
"- Try different websites if one fails\n",
|
||||
"\n",
|
||||
"Happy brochure generating! 🚀\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -0,0 +1,356 @@
from openai import OpenAI
from dotenv import load_dotenv
import os
import requests
import json
from typing import List
from bs4 import BeautifulSoup

# Rich library for beautiful terminal markdown rendering
from rich.console import Console
from rich.markdown import Markdown as RichMarkdown

def get_client_and_headers():
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
        # print("API key looks good so far")
        pass
    else:
        print("There might be a problem with your API key")

    client = OpenAI(api_key=api_key)

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    }
    return client, headers

# Utility methods to display content in markdown format
def print_markdown_terminal(text):
    """Print markdown-formatted text to terminal with beautiful formatting using Rich"""
    console = Console()
    console.print(RichMarkdown(text))

def display_content(content, is_markdown=True):
    """Display content using Rich formatting"""
    if is_markdown:
        print_markdown_terminal(content)
    else:
        print(content)

def stream_content(response, title="Content"):
    """
    Utility function to handle streaming content display using Rich

    Args:
        response: OpenAI streaming response object
        title (str): Title to display for the streaming content

    Returns:
        str: Complete streamed content
    """
    result = ""
    console = Console()

    # Terminal streaming with real-time output using Rich
    console.print(f"\n[bold blue]{title}...[/bold blue]\n")
    for chunk in response:
        content = chunk.choices[0].delta.content or ""
        result += content
        # Print each chunk as it arrives for streaming effect
        print(content, end='', flush=True)
    console.print(f"\n\n[bold green]{'='*50}[/bold green]")
    console.print(f"[bold green]{title.upper()} COMPLETE[/bold green]")
    console.print(f"[bold green]{'='*50}[/bold green]")

    return result

# Utility class to get the contents of a website
class Website:
    def __init__(self, url):
        self.url = url
        self.client, self.headers = get_client_and_headers()
        response = requests.get(url, headers=self.headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title: {self.title}\nWebpage Contents: {self.text}\n\n"

def get_links_system_prompt():
    link_system_prompt = """You are provided with a list of links found on a webpage. \
    You are able to decide which of the links would be most relevant to include in a brochure about the company. \
    Relevant links usually include: About page, or a Company page, or Careers/Jobs pages or News page\n"""
    link_system_prompt += "Always respond in JSON exactly like this: \n"
    link_system_prompt += """
        {
            "links": [
                {"type": "<page type>", "url": "<full URL>"},
                {"type": "<page type>", "url": "<full URL>"}
            ]
        }\n
    """
    link_system_prompt += """ If no relevant links are found, return:
        {
            "links": []
        }\n
    """
    link_system_prompt += "If multiple links could map to the same type (e.g. two About pages), include the best candidate only.\n"

    link_system_prompt += "You should respond in JSON as in the below examples:\n"
    link_system_prompt += """
        ## Example 1
        Input links:
        - https://acme.com/about
        - https://acme.com/pricing
        - https://acme.com/blog
        - https://acme.com/signup

        Output:
        {
        "links": [
            {"type": "about page", "url": "https://acme.com/about"},
            {"type": "blog page", "url": "https://acme.com/blog"},
            {"type": "pricing page", "url": "https://acme.com/pricing"}
        ]
        }
    """
    link_system_prompt += """
        ## Example 2
        Input links:
        - https://startup.io/
        - https://startup.io/company
        - https://startup.io/careers
        - https://startup.io/support

        Output:
        {
        "links": [
            {"type": "company page", "url": "https://startup.io/company"},
            {"type": "careers page", "url": "https://startup.io/careers"}
        ]
        }
    """
    link_system_prompt += """
        ## Example 3
        Input links:
        - https://coolapp.xyz/login
        - https://coolapp.xyz/random

        Output:
        {
        "links": []
        }
    """
    return link_system_prompt

def get_links_user_prompt(website):
|
||||
user_prompt = f"Here is the list of links on the website of {website.url} - "
|
||||
user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \n"
|
||||
user_prompt += "Do not include Terms of Service, Privacy, or email links.\n"
|
||||
user_prompt += "Links (some might be relative links):\n"
|
||||
user_prompt += "\n".join(website.links)
|
||||
return user_prompt
|
||||
|
||||
def get_brochure_system_prompt():
|
||||
brochure_system_prompt = """
|
||||
You are an assistant that analyzes the contents of several relevant pages from a company website \
|
||||
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.
|
||||
Include details of company culture, customers and careers/jobs if you have the information.
|
||||
"""
|
||||
return brochure_system_prompt
|
||||
|
||||
def get_brochure_user_prompt(url):
|
||||
user_prompt = f"You are looking at the company details of: {url}\n"
|
||||
user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
|
||||
user_prompt += get_details_for_brochure(url)
|
||||
user_prompt = user_prompt[:15000] # Truncate if more than 15,000 characters
|
||||
return user_prompt
|
||||
|
||||
def get_translation_system_prompt(target_language):
|
||||
translation_system_prompt = f"You are a professional translator specializing in business and marketing content. \
|
||||
Translate the provided brochure to {target_language} while maintaining all formatting and professional tone."
|
||||
return translation_system_prompt
|
||||
|
||||
def get_translation_user_prompt(original_brochure, target_language):
|
||||
translation_prompt = f"""
|
||||
You are a professional translator. Please translate the following brochure content to {target_language}.
|
||||
|
||||
Important guidelines:
|
||||
- Maintain the markdown formatting exactly as it appears
|
||||
- Keep all headers, bullet points, and structure intact
|
||||
- Translate the content naturally and professionally
|
||||
- Preserve any company names, product names, or proper nouns unless they have established translations
|
||||
- Maintain the professional tone and marketing style
|
||||
|
||||
Brochure content to translate:
|
||||
{original_brochure}
|
||||
"""
|
||||
return translation_prompt
|
||||
|
||||
def get_links(url):
|
||||
website = Website(url)
|
||||
response = website.client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[
|
||||
{"role": "system", "content": get_links_system_prompt()},
|
||||
{"role": "user", "content": get_links_user_prompt(website)}
|
||||
],
|
||||
response_format={"type": "json_object"}
|
||||
)
|
||||
result = response.choices[0].message.content
|
||||
print("get_links:", result)
|
||||
return json.loads(result)
|
||||
|
||||
def get_details_for_brochure(url):
|
||||
website = Website(url)
|
||||
result = "Landing page:\n"
|
||||
result += website.get_contents()
|
||||
links = get_links(url)
|
||||
print("Found links:", links)
|
||||
for link in links["links"]:
|
||||
result += f"\n\n{link['type']}\n"
|
||||
result += Website(link["url"]).get_contents()
|
||||
return result
|
||||
|
||||
def create_brochure(url):
|
||||
website = Website(url)
|
||||
response = website.client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[
|
||||
{"role": "system", "content": get_brochure_system_prompt()},
|
||||
{"role": "user", "content": get_brochure_user_prompt(url)}
|
||||
]
|
||||
)
|
||||
result = response.choices[0].message.content
|
||||
display_content(result, is_markdown=True)
|
||||
return result
|
||||
|
||||
def stream_brochure(url):
|
||||
website = Website(url)
|
||||
response = website.client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[
|
||||
{"role": "system", "content": get_brochure_system_prompt()},
|
||||
{"role": "user", "content": get_brochure_user_prompt(url)}
|
||||
],
|
||||
stream=True
|
||||
)
|
||||
|
||||
# Use the reusable streaming utility function
|
||||
result = stream_content(response, "Generating brochure")
|
||||
return result
|
||||
|
||||
def translate_brochure(url, target_language="Spanish", stream_mode=False):
|
||||
"""
|
||||
Generate a brochure and translate it to the target language
|
||||
|
||||
Args:
|
||||
url (str): The website URL to generate brochure from
|
||||
target_language (str): The target language for translation (default: "Spanish")
|
||||
stream_mode (bool): Whether to use streaming output (default: False)
|
||||
|
||||
Returns:
|
||||
str: Translated brochure content
|
||||
"""
|
||||
# First generate the original brochure
|
||||
original_brochure = create_brochure(url)
|
||||
|
||||
# Get translation prompts
|
||||
translation_system_prompt = get_translation_system_prompt(target_language)
|
||||
translation_user_prompt = get_translation_user_prompt(original_brochure, target_language)
|
||||
|
||||
# Get OpenAI client
|
||||
website = Website(url)
|
||||
|
||||
if stream_mode:
|
||||
# Generate translation using OpenAI with streaming
|
||||
response = website.client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[
|
||||
{"role": "system", "content": translation_system_prompt},
|
||||
{"role": "user", "content": translation_user_prompt}
|
||||
],
|
||||
stream=True
|
||||
)
|
||||
|
||||
# Use the reusable streaming utility function
|
||||
translated_brochure = stream_content(response, f"Translating brochure to {target_language}")
|
||||
else:
|
||||
# Generate translation using OpenAI with complete output
|
||||
response = website.client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[
|
||||
{"role": "system", "content": translation_system_prompt},
|
||||
{"role": "user", "content": translation_user_prompt}
|
||||
]
|
||||
)
|
||||
|
||||
translated_brochure = response.choices[0].message.content
|
||||
|
||||
# Display the translated content
|
||||
display_content(translated_brochure, is_markdown=True)
|
||||
|
||||
return translated_brochure
|
||||
|
||||
|
||||
# Main function for terminal usage
|
||||
def main():
|
||||
"""Main function for running the brochure generator from terminal"""
|
||||
import sys
|
||||
|
||||
if len(sys.argv) != 2:
|
||||
console = Console()
|
||||
console.print("[bold red]Usage:[/bold red] python website_brochure_generator.py <website_url>")
|
||||
console.print("[bold blue]Example:[/bold blue] python website_brochure_generator.py https://example.com")
|
||||
sys.exit(1)
|
||||
|
||||
url = sys.argv[1]
|
||||
console = Console()
|
||||
|
||||
console.print(f"[bold green]Generating brochure for:[/bold green] {url}")
|
||||
console.print("\n[bold yellow]Choose display mode:[/bold yellow]")
|
||||
console.print("1. Complete output (display all at once)")
|
||||
console.print("2. Stream output (real-time generation)")
|
||||
|
||||
display_choice = input("\nEnter choice (1 or 2): ").strip()
|
||||
|
||||
# Generate brochure based on display choice
|
||||
if display_choice == "1":
|
||||
result = create_brochure(url)
|
||||
elif display_choice == "2":
|
||||
result = stream_brochure(url)
|
||||
else:
|
||||
console.print("[bold red]Invalid choice. Using default: complete output[/bold red]")
|
||||
result = create_brochure(url)
|
||||
|
||||
# Ask if user wants translation
|
||||
console.print("\n[bold yellow]Translation options:[/bold yellow]")
|
||||
console.print("1. No translation (original only)")
|
||||
console.print("2. Translate to another language")
|
||||
|
||||
translation_choice = input("\nEnter choice (1 or 2): ").strip()
|
||||
|
||||
if translation_choice == "2":
|
||||
target_language = input("Enter target language (e.g., Spanish, French, German, Chinese): ").strip()
|
||||
if not target_language:
|
||||
target_language = "Spanish"
|
||||
|
||||
# Pass the stream mode based on the display choice
|
||||
stream_mode = (display_choice == "2")
|
||||
translate_brochure(url, target_language, stream_mode=stream_mode)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
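`get_links` above passes the model's reply straight to `json.loads`, and `get_details_for_brochure` then indexes `links["links"]`, `link["type"]`, and `link["url"]` directly. A minimal sketch of a more defensive parser follows, assuming the reply is meant to match the `{"links": [{"type": ..., "url": ...}]}` shape requested in the system prompt. The helper name `parse_links_response` is hypothetical and is not part of the original file.

```python
import json


def parse_links_response(raw):
    """Return a list of {"type", "url"} dicts, or [] if the reply is malformed.

    Hypothetical helper: it only checks the shape requested in the
    link system prompt and silently drops entries that do not match.
    """
    try:
        data = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        # Covers a None reply as well as text that is not valid JSON.
        return []
    if not isinstance(data, dict):
        return []
    links = data.get("links")
    if not isinstance(links, list):
        return []
    valid = []
    for item in links:
        if (isinstance(item, dict)
                and isinstance(item.get("type"), str)
                and isinstance(item.get("url"), str)):
            valid.append({"type": item["type"], "url": item["url"]})
    return valid
```

If something like this were adopted, `get_links` could return `{"links": parse_links_response(result)}` so that the loop in `get_details_for_brochure` keeps working unchanged when the model returns an unexpected structure.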
|
||||
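The user prompt built by `get_links_user_prompt` warns that some links might be relative and leaves it to the model to return full URLs. One possible alternative is to resolve the links before they reach the prompt, using the standard library's `urllib.parse`. The sketch below assumes the scraped `href` values are plain strings like those stored in `Website.links`. The function name `normalize_links` is hypothetical and does not appear in the original code.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve hrefs against base_url and keep unique http(s) URLs.

    Hypothetical helper, not part of the original module.
    """
    seen = set()
    resolved = []
    for href in links:
        # urljoin leaves absolute URLs untouched and resolves relative ones;
        # urldefrag then strips any "#fragment" so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript: and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved
```

Calling `normalize_links(website.url, website.links)` before joining the links into the prompt would also remove email links up front, which the prompt currently asks the model to filter out.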