Add Website_brochure_generator app with uv package management
- Complete AI-powered website brochure generator
- Includes pyproject.toml and uv.lock for dependency management
- Features web scraping, AI content generation, and brochure creation
- Ready for deployment and further development
@@ -0,0 +1,261 @@
# Website Brochure Generator

An AI-powered tool that automatically generates professional brochures from any website. The tool analyzes website content, extracts relevant information, and creates beautifully formatted brochures using OpenAI's GPT models.

## Features

- 🌐 **Website Analysis**: Automatically scrapes and analyzes website content
- 🤖 **AI-Powered**: Uses OpenAI GPT-4o-mini for intelligent content generation
- 📄 **Professional Output**: Generates markdown-formatted brochures
- 🌍 **Multi-Language Support**: Translate brochures to any language using AI
- 🎨 **Beautiful Output**: Rich terminal formatting and native Jupyter markdown rendering
- ⚡ **Streaming Support**: Real-time brochure generation with live updates
- 🖥️ **Multiple Interfaces**: Command-line script and interactive Jupyter notebook
- 📓 **Interactive Notebook**: Step-by-step execution with widgets and examples

## Prerequisites

- Python 3.8 or higher
- OpenAI API key
- Jupyter notebook environment (for notebook usage)

## Installation

### Option 1: Using uv (Recommended)

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone or download the project, then change into its directory
cd Website_brochure_generator

# Install dependencies with uv
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

### Option 2: Using pip

```bash
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Option 3: Using pip with pyproject.toml

```bash
# Install in development mode
pip install -e .

# Or install with optional dev dependencies
pip install -e ".[dev]"
```

## Setup

1. **Get your OpenAI API key**:
   - Visit [OpenAI API Keys](https://platform.openai.com/api-keys)
   - Create a new API key

2. **Set up environment variables**:
   Create a `.env` file in the project directory:
   ```bash
   OPENAI_API_KEY=your_api_key_here
   ```

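Once the key is set, a quick sanity check can catch an empty or malformed value before any API call is made. The sketch below is illustrative only: `check_api_key` is a hypothetical helper, not part of the project, and the `sk-` prefix test is an assumption about the usual key format rather than a rule the project enforces.

```python
import os
from typing import List, Optional


def check_api_key(key: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the key looks plausible."""
    problems = []
    if not key:
        problems.append("OPENAI_API_KEY is not set")
        return problems
    if key != key.strip():
        problems.append("key has leading or trailing whitespace")
    if not key.strip().startswith("sk-"):
        # Assumption: keys normally begin with "sk-"; adjust if yours differ.
        problems.append("key does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    # Reads the same variable that the .env file is meant to provide.
    for problem in check_api_key(os.environ.get("OPENAI_API_KEY")):
        print(f"Warning: {problem}")
```

An empty result means nothing obviously wrong was found; it does not prove the key is valid or has credit.
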
## Usage

### Option 1: Jupyter Notebook (Recommended for Interactive Use)

1. **Open the notebook**:
   ```bash
   jupyter notebook website_brochure_generator.ipynb
   ```

2. **Run the cells step by step**:
   - Configure your API key
   - Try the interactive examples
   - Use the widget interface for easy brochure generation

3. **Features in the notebook**:
   - Interactive widgets for URL input and options
   - Step-by-step examples with explanations
   - Custom functions for advanced usage
   - Save brochures to files
   - Multiple language translation examples
   - Quick website analysis tools
   - Custom brochure generation with focus areas
   - Comprehensive troubleshooting guide

### Option 2: Command Line Interface

```bash
# Basic usage
python website_brochure_generator.py https://example.com

# The tool will prompt you to choose:
# 1. Display mode: Complete output OR Stream output
# 2. Translation: No translation OR Translate to another language
```

### Option 3: Python Script

```python
from website_brochure_generator import create_brochure, stream_brochure, translate_brochure

# Create a complete brochure
result = create_brochure("https://example.com")

# Stream brochure generation in real-time
result = stream_brochure("https://example.com")

# Translate brochure to Spanish (complete output)
spanish_brochure = translate_brochure("https://example.com", "Spanish", stream_mode=False)

# Translate brochure to French (streaming output)
french_brochure = translate_brochure("https://example.com", "French", stream_mode=True)
```

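Streaming mode boils down to concatenating text fragments as they arrive. The sketch below imitates that loop with stand-in objects so it runs without an API key. `fake_chunk` and `collect_stream` are hypothetical names; the attribute path `choices[0].delta.content` follows the one the project's notebook reads from each streamed chunk.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build a stand-in object with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks: Iterable) -> str:
    """Print each fragment as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        # A chunk's content can be None, so fall back to an empty string.
        fragment = chunk.choices[0].delta.content or ""
        print(fragment, end="", flush=True)
        parts.append(fragment)
    print()
    return "".join(parts)


text = collect_stream([fake_chunk("Hello "), fake_chunk("World!"), fake_chunk(None)])
```

Collecting fragments in a list and joining once at the end avoids repeated string concatenation on long brochures.
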
### Programmatic Usage

```python
from website_brochure_generator import Website, get_links, create_brochure, translate_brochure

# Analyze a website
website = Website("https://example.com")
print(f"Title: {website.title}")

# Get relevant links
links = get_links("https://example.com")
print(f"Found {len(links['links'])} relevant pages")

# Generate brochure
brochure = create_brochure("https://example.com")

# Translate brochure to multiple languages (complete output)
spanish_brochure = translate_brochure("https://example.com", "Spanish", stream_mode=False)
german_brochure = translate_brochure("https://example.com", "German", stream_mode=False)

# Translate brochure with streaming output
chinese_brochure = translate_brochure("https://example.com", "Chinese", stream_mode=True)
```

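The `get_links` result is indexed as `links['links']`, and the project's example script reads a `type` and a `url` from each entry. Assuming that shape, the sketch below resolves relative URLs against the site root and drops duplicates. `normalize_links` is a hypothetical helper, not part of the package, and the sample data is made up.

```python
from typing import Dict, List
from urllib.parse import urljoin


def normalize_links(base_url: str, links: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Resolve relative URLs and remove duplicate entries, keeping first-seen order."""
    seen = set()
    result = []
    for entry in links.get("links", []):
        absolute = urljoin(base_url, entry["url"])
        if absolute in seen:
            continue  # the same page was listed twice
        seen.add(absolute)
        result.append({"type": entry["type"], "url": absolute})
    return result


# Made-up sample in the assumed shape of a get_links() result.
sample = {
    "links": [
        {"type": "about page", "url": "/about"},
        {"type": "careers page", "url": "https://example.com/careers"},
        {"type": "about page", "url": "https://example.com/about"},
    ]
}
cleaned = normalize_links("https://example.com", sample)
```

Normalizing before fetching avoids scraping the same page twice when a site links to it both relatively and absolutely.
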
## How It Works

1. **Website Scraping**: The tool scrapes the target website and extracts:
   - Page title and content
   - All available links
   - Cleaned text content (removes scripts, styles, etc.)

2. **Link Analysis**: Uses AI to identify relevant pages for the brochure:
   - About pages
   - Company information
   - Careers/Jobs pages
   - News/Blog pages

3. **Content Aggregation**: Scrapes additional relevant pages and combines all content

4. **Brochure Generation**: Uses OpenAI GPT-4o-mini to create a professional brochure including:
   - Company overview
   - Services/Products
   - Company culture
   - Career opportunities
   - Contact information

5. **Translation (Optional)**: If translation is requested, uses AI to translate the brochure to the target language while:
   - Maintaining markdown formatting
   - Preserving professional tone
   - Keeping proper nouns and company names intact
   - Ensuring natural, fluent translation

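Step 1 can be sketched with the standard library alone. The project itself uses `requests` and BeautifulSoup; the version below swaps in `html.parser` so it runs without third-party packages, and `PageSummary` is a hypothetical name. It collects the title, the link targets, and the visible text while skipping `<script>` and `<style>` content.

```python
from html.parser import HTMLParser
from typing import List


class PageSummary(HTMLParser):
    """Collect the title, link targets, and visible text of an HTML page."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self.text_parts: List[str] = []
        self._stack: List[str] = []  # open <title>, <script>, or <style> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED or tag == "title":
            self._stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in self.SKIPPED:
            return  # ignore script and style bodies
        stripped = data.strip()
        if not stripped:
            return
        if current == "title":
            self.title = stripped
        else:
            self.text_parts.append(stripped)

    @property
    def text(self) -> str:
        return "\n".join(self.text_parts)


html = """<html><head><title>Acme</title><style>p {color: red}</style></head>
<body><p>We build rockets.</p><script>track()</script>
<a href="/about">About us</a><a>No target</a></body></html>"""

page = PageSummary()
page.feed(html)
```

After `feed`, `page.title`, `page.links`, and `page.text` hold the pieces that the later steps would pass on for link analysis and brochure generation.
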
## Output

The tool generates markdown-formatted brochures that include:

- **Company Overview**: Summary of the business
- **Services/Products**: What the company offers
- **Company Culture**: Values and work environment
- **Career Opportunities**: Job openings and company benefits
- **Contact Information**: How to reach the company

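Because the output is plain markdown, saving it is a matter of writing a text file. The sketch below assumes the brochure is available as a Python string (the return type of `create_brochure` is not documented here); `brochure_filename` and `save_brochure` are hypothetical helpers, and the file-naming scheme is only one possible choice.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def brochure_filename(url: str, language: str = "English") -> str:
    """Derive a file name such as 'example_com_english.md' from the site URL."""
    host = urlparse(url).netloc or url
    slug = re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")
    return f"{slug}_{language.lower()}.md"


def save_brochure(markdown: str, url: str, directory: Path, language: str = "English") -> Path:
    """Write the brochure as UTF-8 so translated text survives intact."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / brochure_filename(url, language)
    path.write_text(markdown, encoding="utf-8")
    return path


# Demonstration with a temporary directory and a made-up brochure string.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_brochure("# Acme\n\nSomos una empresa.", "https://example.com", Path(tmp), "Spanish")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```

Writing with an explicit `encoding="utf-8"` matters once brochures are translated into languages with non-ASCII characters.
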
## Dependencies

### Core Dependencies
- `openai>=1.0.0` - OpenAI API client
- `python-dotenv>=1.0.0` - Environment variable management
- `requests>=2.25.0` - HTTP requests for web scraping
- `beautifulsoup4>=4.9.0` - HTML parsing
- `rich>=13.0.0` - Beautiful terminal output (for command-line usage)
- `ipywidgets>=8.0.0` - Interactive widgets (Jupyter notebook only; not listed in `requirements.txt` or `pyproject.toml`, so install it separately)

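To see which of these packages are actually installed, the standard library's `importlib.metadata` (Python 3.8+) can report versions without importing the packages. This is a small diagnostic sketch: `installed_versions` is a hypothetical helper, and the package names are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

CORE_PACKAGES = ["openai", "python-dotenv", "requests", "beautifulsoup4", "rich", "ipywidgets"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(CORE_PACKAGES).items():
    print(f"{name}: {installed or 'NOT INSTALLED'}")
```

The report only shows what is present; comparing against the minimum versions listed above is left to the reader.
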
## Development

### Setting up development environment

```bash
# Install with dev dependencies
uv sync --extra dev
# or
pip install -e ".[dev]"
```

### Running tests

```bash
pytest
```

### Code formatting

```bash
black website_brochure_generator.py
```

### Type checking

```bash
mypy website_brochure_generator.py
```

## Troubleshooting

### Common Issues

1. **ModuleNotFoundError: No module named 'rich'**
   - Make sure you've installed all dependencies: `pip install -r requirements.txt`

2. **OpenAI API Key Error**
   - Verify your API key is set in the `.env` file
   - Check that your API key has sufficient credits

3. **Website Scraping Issues**
   - Some websites may block automated requests
   - The tool sends a browser-like User-Agent header to get past basic blocking

4. **Display Issues**
   - For command-line: Make sure Rich is properly installed: `pip install rich`
   - For Jupyter: Make sure ipywidgets is installed: `pip install ipywidgets`
   - Some terminals may not support all Rich features

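For scraping failures, it helps to inspect the request separately from the AI steps. The sketch below builds a request with a browser-like User-Agent using only the standard library. The project itself uses `requests`; `build_request`, `fetch_status`, the 10-second timeout, and the shortened header string are assumptions for illustration.

```python
import urllib.error
import urllib.request

BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # shortened, illustrative value


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_status(url: str, timeout: float = 10.0) -> str:
    """Return a short description of what happened when fetching the URL."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return f"OK ({response.status})"
    except urllib.error.HTTPError as error:
        # Raised for 4xx/5xx replies, e.g. 403 when a site rejects automated clients.
        return f"HTTP error {error.code}"
    except urllib.error.URLError as error:
        return f"Connection problem: {error.reason}"


request = build_request("https://example.com")
```

Calling `fetch_status` on the target URL separates "the site refused the request" from "the API call failed", which are otherwise easy to confuse.
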
## License

MIT License - see LICENSE file for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## Support

For issues and questions, please open an issue on the project repository.
@@ -0,0 +1,48 @@
#!/usr/bin/env python3
"""
Example usage of the Website Brochure Generator
"""

from website_brochure_generator import create_brochure, stream_brochure, get_links, translate_brochure

def main():
    # Example website URL
    url = "https://example.com"

    print("=== Website Brochure Generator Example ===\n")

    # Example 1: Get relevant links
    print("1. Analyzing website links...")
    links = get_links(url)
    print(f"Found {len(links['links'])} relevant pages:")
    for link in links['links']:
        print(f" - {link['type']}: {link['url']}")

    print("\n" + "="*50 + "\n")

    # Example 2: Create brochure (complete output)
    print("2. Creating brochure (complete output)...")
    brochure = create_brochure(url)

    print("\n" + "="*50 + "\n")

    # Example 3: Stream brochure (real-time generation)
    print("3. Streaming brochure generation...")
    streamed_brochure = stream_brochure(url)

    print("\n" + "="*50 + "\n")

    # Example 4: Translate brochure to Spanish (complete output)
    print("4. Translating brochure to Spanish (complete output)...")
    spanish_brochure = translate_brochure(url, "Spanish", stream_mode=False)

    print("\n" + "="*50 + "\n")

    # Example 5: Translate brochure to French (streaming output)
    print("5. Translating brochure to French (streaming output)...")
    french_brochure = translate_brochure(url, "French", stream_mode=True)

    print("\n=== Example Complete ===")

if __name__ == "__main__":
    main()
@@ -0,0 +1,58 @@
[project]
name = "website-brochure-generator"
version = "1.0.0"
description = "AI-powered website brochure generator that creates professional brochures from any website"
authors = [
    {name = "Shabsi4u", email = "shabsi4u@example.com"}
]
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
dependencies = [
    "openai>=1.0.0",
    "python-dotenv>=1.0.0",
    "requests>=2.25.0",
    "beautifulsoup4>=4.9.0",
    "rich>=13.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=22.0.0",
    "flake8>=4.0.0",
    "mypy>=0.950",
]

[project.scripts]
brochure-generator = "website_brochure_generator:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 88
target-version = ['py38']

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
@@ -0,0 +1,6 @@
# Core dependencies for website brochure generator
openai>=1.0.0
python-dotenv>=1.0.0
requests>=2.25.0
beautifulsoup4>=4.9.0
rich>=13.0.0
@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Comprehensive test script for the translation functionality
"""

import os
import sys
from unittest.mock import Mock, patch

def test_translation_prompts():
    """Test the translation prompt generation functions"""
    print("="*60)
    print("TESTING TRANSLATION PROMPTS")
    print("="*60)

    # Test system prompt generation
    from website_brochure_generator import get_translation_system_prompt

    spanish_prompt = get_translation_system_prompt("Spanish")
    french_prompt = get_translation_system_prompt("French")

    print("✓ Spanish system prompt generated")
    print(f" Length: {len(spanish_prompt)} characters")
    print(f" Contains 'Spanish': {'Spanish' in spanish_prompt}")

    print("✓ French system prompt generated")
    print(f" Length: {len(french_prompt)} characters")
    print(f" Contains 'French': {'French' in french_prompt}")

    # Test user prompt generation
    from website_brochure_generator import get_translation_user_prompt

    sample_brochure = "# Test Company\n\nWe are a great company."
    user_prompt = get_translation_user_prompt(sample_brochure, "Spanish")

    print("✓ User prompt generated")
    print(f" Length: {len(user_prompt)} characters")
    print(f" Contains brochure content: {'Test Company' in user_prompt}")
    print(f" Contains Spanish: {'Spanish' in user_prompt}")

    print("\n" + "="*60)

def test_rich_integration():
    """Test Rich library integration"""
    print("="*60)
    print("TESTING RICH INTEGRATION")
    print("="*60)

    try:
        from rich.console import Console
        from rich.markdown import Markdown as RichMarkdown
        console = Console()
        print("✓ Rich library imported successfully")
        print("✓ Console object created successfully")
        print("✓ RichMarkdown object available")
    except ImportError as e:
        print(f"✗ Rich import error: {e}")

    print("\n" + "="*60)

def test_display_functions():
    """Test display utility functions"""
    print("TESTING DISPLAY FUNCTIONS")
    print("="*60)

    from website_brochure_generator import display_content, print_markdown_terminal

    # Test markdown terminal function
    test_markdown = "# Test Header\n\nThis is **bold** text."

    print("✓ Testing print_markdown_terminal function")
    try:
        print_markdown_terminal(test_markdown)
        print(" ✓ Function executed successfully")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("✓ Testing display_content function")
    try:
        display_content(test_markdown, is_markdown=True)
        print(" ✓ Function executed successfully")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("\n" + "="*60)

def test_stream_content_utility():
    """Test the stream_content utility function"""
    print("TESTING STREAM CONTENT UTILITY")
    print("="*60)

    from website_brochure_generator import stream_content

    # Mock streaming response
    mock_response = Mock()
    mock_chunk1 = Mock()
    mock_chunk1.choices = [Mock()]
    mock_chunk1.choices[0].delta.content = "Hello "

    mock_chunk2 = Mock()
    mock_chunk2.choices = [Mock()]
    mock_chunk2.choices[0].delta.content = "World!"

    mock_response.__iter__ = Mock(return_value=iter([mock_chunk1, mock_chunk2]))

    print("✓ Testing stream_content with mock response")
    try:
        result = stream_content(mock_response, "Test Stream")
        print(f" ✓ Result: '{result}'")
        print(f" ✓ Expected: 'Hello World!'")
        print(f" ✓ Match: {result == 'Hello World!'}")
    except Exception as e:
        print(f" ✗ Error: {e}")

    print("\n" + "="*60)

def test_translation_function_mock():
    """Test the translate_brochure function with mocked OpenAI response"""
    # NOTE: this function only prints fixture text; it never calls
    # translate_brochure, so the checkmarks below are descriptive, not verified.
    print("TESTING TRANSLATION FUNCTION (MOCKED)")
    print("="*60)

    # Mock brochure content for testing
    sample_brochure = """
# Company Overview

**TechCorp Solutions** is a leading technology company specializing in innovative software solutions.

## Our Services

- Web Development
- Mobile App Development
- Cloud Solutions
- Data Analytics

## Company Culture

We believe in:
- Innovation and creativity
- Team collaboration
- Continuous learning
- Work-life balance

## Contact Information

- Email: info@techcorp.com
- Phone: +1-555-0123
- Website: www.techcorp.com
"""

    print("Sample brochure content:")
    print(sample_brochure)
    print("\n" + "-"*40)

    # Mock the OpenAI response
    mock_translated = """
# Resumen de la Empresa

**TechCorp Solutions** es una empresa líder en tecnología especializada en soluciones de software innovadoras.

## Nuestros Servicios

- Desarrollo Web
- Desarrollo de Aplicaciones Móviles
- Soluciones en la Nube
- Análisis de Datos

## Cultura de la Empresa

Creemos en:
- Innovación y creatividad
- Colaboración en equipo
- Aprendizaje continuo
- Equilibrio trabajo-vida

## Información de Contacto

- Email: info@techcorp.com
- Teléfono: +1-555-0123
- Sitio web: www.techcorp.com
"""

    print("Mock translated content (Spanish):")
    print(mock_translated)
    print("\n" + "="*60)
    print("TRANSLATION TEST RESULTS:")
    print("="*60)
    print("✓ Markdown formatting preserved")
    print("✓ Headers maintained (# ##)")
    print("✓ Bullet points preserved (-)")
    print("✓ Bold text maintained (**)")
    print("✓ Company name preserved (TechCorp Solutions)")
    print("✓ Contact information preserved")
    print("✓ Professional tone maintained")
    print("✓ Structure and layout intact")

    print("\n" + "="*60)

def test_file_operations():
    """Test file saving operations"""
    print("TESTING FILE OPERATIONS")
    print("="*60)

    test_content = "# Test Brochure\n\nThis is a test brochure."
    test_filename = "test_brochure.md"

    try:
        # Test file writing
        with open(test_filename, 'w', encoding='utf-8') as f:
            f.write(test_content)
        print("✓ File writing successful")

        # Test file reading
        with open(test_filename, 'r', encoding='utf-8') as f:
            read_content = f.read()
        print("✓ File reading successful")
        print(f" Content matches: {read_content == test_content}")

        # Clean up
        os.remove(test_filename)
        print("✓ File cleanup successful")

    except Exception as e:
        print(f"✗ File operation error: {e}")

    print("\n" + "="*60)

def test_parameter_validation():
    """Test parameter validation for translation functions"""
    print("TESTING PARAMETER VALIDATION")
    print("="*60)

    from website_brochure_generator import get_translation_system_prompt, get_translation_user_prompt

    # Test with different languages
    languages = ["Spanish", "French", "German", "Chinese", "Japanese", "Arabic"]

    for lang in languages:
        try:
            system_prompt = get_translation_system_prompt(lang)
            user_prompt = get_translation_user_prompt("Test content", lang)
            print(f"✓ {lang}: Prompts generated successfully")
        except Exception as e:
            print(f"✗ {lang}: Error - {e}")

    # Test with empty content
    try:
        empty_prompt = get_translation_user_prompt("", "Spanish")
        print("✓ Empty content: Handled gracefully")
    except Exception as e:
        print(f"✗ Empty content: Error - {e}")

    print("\n" + "="*60)

def run_all_tests():
    """Run all test functions"""
    print("COMPREHENSIVE TRANSLATION FUNCTIONALITY TESTS")
    print("="*80)
    print()

    try:
        test_rich_integration()
        test_translation_prompts()
        test_display_functions()
        test_stream_content_utility()
        test_translation_function_mock()
        test_file_operations()
        test_parameter_validation()

        print("="*80)
        print("ALL TESTS COMPLETED SUCCESSFULLY! ✓")
        print("="*80)

    except ImportError as e:
        print(f"Import Error: {e}")
        print("Make sure you're running this from the correct directory")
        print("and that website_brochure_generator.py is available")
    except Exception as e:
        print(f"Unexpected Error: {e}")
        print("Please check the implementation")

def test_translation_function():
    """Legacy test function for backward compatibility"""
    print("Running legacy test...")
    test_translation_function_mock()

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--legacy":
        test_translation_function()
    else:
        run_all_tests()
1551
community-contributions/shabsi4u/Website_brochure_generator/uv.lock
generated
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,939 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Website Brochure Generator\n",
    "\n",
    "An AI-powered tool that automatically generates professional brochures from any website. This notebook provides an interactive way to use the brochure generator with Jupyter notebooks.\n",
    "\n",
    "## Features\n",
    "\n",
    "- 🌐 **Website Analysis**: Automatically scrapes and analyzes website content\n",
    "- 🤖 **AI-Powered**: Uses OpenAI GPT-4o-mini for intelligent content generation\n",
    "- 📄 **Professional Output**: Generates markdown-formatted brochures\n",
    "- 🌍 **Multi-Language Support**: Translate brochures to any language using AI\n",
    "- ⚡ **Interactive**: Run step-by-step in Jupyter notebooks\n",
    "- 🎨 **Beautiful Output**: Native Jupyter markdown rendering with HTML styling\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "- Python 3.8 or higher\n",
    "- OpenAI API key\n",
    "- Jupyter notebook environment\n",
    "\n",
    "## Setup Instructions\n",
    "\n",
    "1. **Get your OpenAI API key**:\n",
    "   - Visit [OpenAI API Keys](https://platform.openai.com/api-keys)\n",
    "   - Create a new API key\n",
    "\n",
    "2. **Set up environment variables**:\n",
    "   - Create a `.env` file in the project directory with: `OPENAI_API_KEY=your_api_key_here`\n",
    "   - Or set the environment variable directly in the notebook\n",
    "\n",
    "3. **Install dependencies**:\n",
    "   ```bash\n",
    "   pip install openai python-dotenv requests beautifulsoup4 ipywidgets\n",
    "   ```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required libraries\n",
    "from openai import OpenAI\n",
    "from dotenv import load_dotenv\n",
    "import os\n",
    "import requests\n",
    "import json\n",
    "from typing import List\n",
    "from bs4 import BeautifulSoup\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import display, Markdown, HTML, clear_output\n",
    "import time\n",
    "\n",
    "print(\"✅ All libraries imported successfully!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Set up your OpenAI API key and configure the client.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configuration cell - Set up your OpenAI API key\n",
    "def get_client_and_headers():\n",
    "    \"\"\"Initialize OpenAI client and headers for web scraping\"\"\"\n",
    "    load_dotenv(override=True)\n",
    "    api_key = os.getenv(\"OPENAI_API_KEY\")\n",
    "    \n",
    "    # Accept any 'sk-' key; project keys ('sk-proj-') are only one of the formats in use\n",
    "    if api_key and api_key.startswith('sk-') and len(api_key) > 10:\n",
    "        print(\"✅ API key looks good!\")\n",
    "    else:\n",
    "        print(\"⚠️ There might be a problem with your API key\")\n",
    "        print(\"Make sure you have set OPENAI_API_KEY in your .env file or environment variables\")\n",
    "\n",
    "    client = OpenAI(api_key=api_key)\n",
    "    \n",
    "    headers = {\n",
    "        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "    }\n",
    "    return client, headers\n",
    "\n",
    "# Initialize the client\n",
    "client, headers = get_client_and_headers()\n",
    "print(\"✅ OpenAI client initialized successfully!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Core Functions\n",
    "\n",
    "The main functions for website analysis and brochure generation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Utility methods to display content in markdown format\n",
    "def display_content(content, is_markdown=True):\n",
    "    \"\"\"Display content using Jupyter's display methods\"\"\"\n",
    "    if is_markdown:\n",
    "        display(Markdown(content))\n",
    "    else:\n",
    "        print(content)\n",
    "\n",
    "def stream_content(response, title=\"Content\"):\n",
    "    \"\"\"\n",
    "    Utility function to handle streaming content display in Jupyter\n",
    "    \n",
    "    Args:\n",
    "        response: OpenAI streaming response object\n",
    "        title (str): Title to display for the streaming content\n",
    "    \n",
    "    Returns:\n",
    "        str: Complete streamed content\n",
    "    \"\"\"\n",
    "    result = \"\"\n",
    "    \n",
    "    # Display title\n",
    "    display(HTML(f\"<h3 style='color: #1f77b4;'>{title}...</h3>\"))\n",
    "    \n",
    "    for chunk in response:\n",
    "        content = chunk.choices[0].delta.content or \"\"\n",
    "        result += content\n",
    "        # Print each chunk as it arrives for streaming effect\n",
    "        print(content, end='', flush=True)\n",
    "    \n",
    "    # Display completion message\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold; margin-top: 20px;'>{'='*50}</div>\"))\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold;'>{title.upper()} COMPLETE</div>\"))\n",
    "    display(HTML(f\"<div style='color: green; font-weight: bold;'>{'='*50}</div>\"))\n",
    "    \n",
    "    return result\n",
    "\n",
    "print(\"✅ Utility functions loaded!\")\n"
   ]
  },
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Utility class to get the contents of a website\n",
|
||||||
|
"class Website:\n",
|
||||||
|
" def __init__(self, url):\n",
|
||||||
|
" self.url = url\n",
|
||||||
|
" self.client, self.headers = get_client_and_headers()\n",
|
||||||
|
" print(f\"🌐 Fetching content from: {url}\")\n",
|
||||||
|
" response = requests.get(url, headers=self.headers)\n",
|
||||||
|
" self.body = response.content\n",
|
||||||
|
" soup = BeautifulSoup(self.body, 'html.parser')\n",
|
||||||
|
" self.title = soup.title.string if soup.title else \"No title found\"\n",
|
||||||
|
" if soup.body:\n",
|
||||||
|
" for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
|
||||||
|
" irrelevant.decompose()\n",
|
||||||
|
" self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
|
||||||
|
" else:\n",
|
||||||
|
" self.text = \"\"\n",
|
||||||
|
" links = [link.get('href') for link in soup.find_all('a')]\n",
|
||||||
|
" self.links = [link for link in links if link]\n",
|
||||||
|
" print(f\"✅ Website analyzed: {self.title}\")\n",
|
||||||
|
"\n",
|
||||||
|
" def get_contents(self):\n",
|
||||||
|
" return f\"Webpage Title: {self.title}\\nWebpage Contents: {self.text}\\n\\n\"\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"✅ Website class loaded!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# AI Prompt Functions\n",
|
||||||
|
"def get_links_system_prompt():\n",
"    link_system_prompt = \"\"\"You are provided with a list of links found on a webpage. \\\n",
|
||||||
|
" You are able to decide which of the links would be most relevant to include in a brochure about the company. \\\n",
|
||||||
|
" Relevant links usually include: About page, or a Company page, or Careers/Jobs pages or News page\\n\"\"\"\n",
|
||||||
|
" link_system_prompt += \"Always respond in JSON exactly like this: \\n\"\n",
|
||||||
|
" link_system_prompt += \"\"\"\n",
|
||||||
|
" {\n",
|
||||||
|
" \"links\": [\n",
|
||||||
|
" {\"type\": \"<page type>\", \"url\": \"<full URL>\"},\n",
|
||||||
|
" {\"type\": \"<page type>\", \"url\": \"<full URL>\"}\n",
|
||||||
|
" ]\n",
|
||||||
|
" }\\n\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" link_system_prompt += \"\"\" If no relevant links are found, return:\n",
|
||||||
|
" {\n",
|
||||||
|
" \"links\": []\n",
|
||||||
|
" }\\n\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" link_system_prompt += \"If multiple links could map to the same type (e.g. two About pages), include the best candidate only.\\n\"\n",
|
||||||
|
"\n",
"    link_system_prompt += \"You should respond in JSON as in the examples below:\\n\"\n",
|
||||||
|
" link_system_prompt += \"\"\"\n",
|
||||||
|
" ## Example 1\n",
|
||||||
|
" Input links:\n",
|
||||||
|
" - https://acme.com/about \n",
|
||||||
|
" - https://acme.com/pricing \n",
|
||||||
|
" - https://acme.com/blog \n",
|
||||||
|
" - https://acme.com/signup \n",
|
||||||
|
"\n",
|
||||||
|
" Output:\n",
|
||||||
|
" {\n",
|
||||||
|
" \"links\": [\n",
|
||||||
|
" {\"type\": \"about page\", \"url\": \"https://acme.com/about\"},\n",
|
||||||
|
" {\"type\": \"blog page\", \"url\": \"https://acme.com/blog\"},\n",
|
||||||
|
" {\"type\": \"pricing page\", \"url\": \"https://acme.com/pricing\"}\n",
|
||||||
|
" ]\n",
|
||||||
|
" }\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" link_system_prompt += \"\"\"\n",
|
||||||
|
" ## Example 2\n",
|
||||||
|
" Input links:\n",
|
||||||
|
" - https://startup.io/ \n",
|
||||||
|
" - https://startup.io/company \n",
|
||||||
|
" - https://startup.io/careers \n",
|
||||||
|
" - https://startup.io/support \n",
|
||||||
|
"\n",
|
||||||
|
" Output:\n",
|
||||||
|
" {\n",
|
||||||
|
" \"links\": [\n",
|
||||||
|
" {\"type\": \"company page\", \"url\": \"https://startup.io/company\"},\n",
|
||||||
|
" {\"type\": \"careers page\", \"url\": \"https://startup.io/careers\"}\n",
|
||||||
|
" ]\n",
|
||||||
|
" }\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" link_system_prompt += \"\"\"\n",
|
||||||
|
" ## Example 3\n",
|
||||||
|
" Input links:\n",
|
||||||
|
" - https://coolapp.xyz/login \n",
|
||||||
|
" - https://coolapp.xyz/random \n",
|
||||||
|
"\n",
|
||||||
|
" Output:\n",
|
||||||
|
" {\n",
|
||||||
|
" \"links\": []\n",
|
||||||
|
" }\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" return link_system_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"def get_links_user_prompt(website):\n",
|
||||||
|
" user_prompt = f\"Here is the list of links on the website of {website.url} - \"\n",
|
||||||
|
" user_prompt += \"please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \\n\"\n",
|
||||||
|
" user_prompt += \"Do not include Terms of Service, Privacy, email links.\\n\"\n",
|
||||||
|
" user_prompt += \"Links (some might be relative links):\\n\"\n",
|
||||||
|
" user_prompt += \"\\n\".join(website.links)\n",
|
||||||
|
" return user_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"def get_brochure_system_prompt():\n",
|
||||||
|
" brochure_system_prompt = \"\"\"\n",
|
||||||
|
" You are an assistant that analyzes the contents of several relevant pages from a company website \\\n",
|
||||||
|
" and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.\n",
|
||||||
|
" Include details of company culture, customers and careers/jobs if you have the information.\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" return brochure_system_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"def get_brochure_user_prompt(url):\n",
"    user_prompt = f\"You are looking at the details of the company at: {url}\\n\"\n",
"    user_prompt += \"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\\n\"\n",
|
||||||
|
" user_prompt += get_details_for_brochure(url)\n",
|
||||||
|
" user_prompt = user_prompt[:15000] # Truncate if more than 15,000 characters\n",
|
||||||
|
" return user_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"def get_translation_system_prompt(target_language):\n",
|
||||||
|
" translation_system_prompt = f\"You are a professional translator specializing in business and marketing content. \\\n",
|
||||||
|
" Translate the provided brochure to {target_language} while maintaining all formatting and professional tone.\"\n",
|
||||||
|
" return translation_system_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"def get_translation_user_prompt(original_brochure, target_language):\n",
|
||||||
|
" translation_prompt = f\"\"\"\n",
|
||||||
|
" You are a professional translator. Please translate the following brochure content to {target_language}.\n",
|
||||||
|
" \n",
|
||||||
|
" Important guidelines:\n",
|
||||||
|
" - Maintain the markdown formatting exactly as it appears\n",
|
||||||
|
" - Keep all headers, bullet points, and structure intact\n",
|
||||||
|
" - Translate the content naturally and professionally\n",
|
||||||
|
" - Preserve any company names, product names, or proper nouns unless they have established translations\n",
|
||||||
|
" - Maintain the professional tone and marketing style\n",
|
||||||
|
" \n",
|
||||||
|
" Brochure content to translate:\n",
|
||||||
|
" {original_brochure}\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" return translation_prompt\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"✅ AI prompt functions loaded!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Core Brochure Generation Functions\n",
|
||||||
|
"def get_links(url):\n",
|
||||||
|
" \"\"\"Get relevant links from a website using AI analysis\"\"\"\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": get_links_system_prompt()},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": get_links_user_prompt(website)}\n",
|
||||||
|
" ],\n",
|
||||||
|
" response_format={\"type\": \"json_object\"}\n",
|
||||||
|
" )\n",
|
||||||
|
" result = response.choices[0].message.content\n",
|
||||||
|
" print(\"🔗 Found relevant links:\", result)\n",
|
||||||
|
" return json.loads(result)\n",
|
||||||
|
"\n",
|
||||||
|
"def get_details_for_brochure(url):\n",
|
||||||
|
" \"\"\"Get comprehensive details from website and relevant pages\"\"\"\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" result = \"Landing page:\\n\"\n",
|
||||||
|
" result += website.get_contents()\n",
|
||||||
|
" links = get_links(url)\n",
|
||||||
|
" print(\"📄 Analyzing additional pages...\")\n",
|
||||||
|
" for link in links[\"links\"]:\n",
|
||||||
|
" result += f\"\\n\\n{link['type']}\\n\"\n",
|
||||||
|
" result += Website(link[\"url\"]).get_contents()\n",
|
||||||
|
" return result\n",
|
||||||
|
"\n",
|
||||||
|
"def create_brochure(url):\n",
|
||||||
|
" \"\"\"Create a brochure from a website URL\"\"\"\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" print(\"🤖 Generating brochure with AI...\")\n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": get_brochure_system_prompt()},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||||
|
" ]\n",
|
||||||
|
" )\n",
|
||||||
|
" result = response.choices[0].message.content\n",
|
||||||
|
" display_content(result, is_markdown=True)\n",
|
||||||
|
" return result\n",
|
||||||
|
"\n",
|
||||||
|
"def stream_brochure(url):\n",
|
||||||
|
" \"\"\"Create a brochure with streaming output\"\"\"\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" print(\"🤖 Generating brochure with streaming output...\")\n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": get_brochure_system_prompt()},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||||
|
" ],\n",
|
||||||
|
" stream=True\n",
|
||||||
|
" )\n",
|
||||||
|
" \n",
|
||||||
|
" # Use the reusable streaming utility function\n",
|
||||||
|
" result = stream_content(response, \"Generating brochure\")\n",
|
||||||
|
" return result\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"✅ Core brochure generation functions loaded!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Translation Functions\n",
|
||||||
|
"def translate_brochure(url, target_language=\"Spanish\", stream_mode=False):\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" Generate a brochure and translate it to the target language\n",
|
||||||
|
" \n",
|
||||||
|
" Args:\n",
|
||||||
|
" url (str): The website URL to generate brochure from\n",
|
||||||
|
" target_language (str): The target language for translation (default: \"Spanish\")\n",
|
||||||
|
" stream_mode (bool): Whether to use streaming output (default: False)\n",
|
||||||
|
" \n",
|
||||||
|
" Returns:\n",
|
||||||
|
" str: Translated brochure content\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" # First generate the original brochure\n",
|
||||||
|
" print(f\"🌍 Generating brochure and translating to {target_language}...\")\n",
|
||||||
|
" original_brochure = create_brochure(url)\n",
|
||||||
|
" \n",
|
||||||
|
" # Get translation prompts\n",
|
||||||
|
" translation_system_prompt = get_translation_system_prompt(target_language)\n",
|
||||||
|
" translation_user_prompt = get_translation_user_prompt(original_brochure, target_language)\n",
|
||||||
|
" \n",
"    # Website(url) is created here only to obtain the OpenAI client; note that this fetches the page again\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" \n",
|
||||||
|
" if stream_mode:\n",
|
||||||
|
" # Generate translation using OpenAI with streaming\n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": translation_system_prompt},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": translation_user_prompt}\n",
|
||||||
|
" ],\n",
|
||||||
|
" stream=True\n",
|
||||||
|
" )\n",
|
||||||
|
" \n",
|
||||||
|
" # Use the reusable streaming utility function\n",
|
||||||
|
" translated_brochure = stream_content(response, f\"Translating brochure to {target_language}\")\n",
|
||||||
|
" else:\n",
|
||||||
|
" # Generate translation using OpenAI with complete output\n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": translation_system_prompt},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": translation_user_prompt}\n",
|
||||||
|
" ]\n",
|
||||||
|
" )\n",
|
||||||
|
" \n",
|
||||||
|
" translated_brochure = response.choices[0].message.content\n",
|
||||||
|
" \n",
|
||||||
|
" # Display the translated content\n",
|
||||||
|
" display_content(translated_brochure, is_markdown=True)\n",
|
||||||
|
" \n",
|
||||||
|
" return translated_brochure\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"✅ Translation functions loaded!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Interactive Examples\n",
|
||||||
|
"\n",
|
||||||
|
"Now let's try generating brochures for some example websites. You can run these cells to see the brochure generator in action!\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example 1: Generate a brochure for a sample website\n",
|
||||||
|
"# You can change this URL to any website you want to analyze\n",
|
||||||
|
"\n",
|
||||||
|
"sample_url = \"https://openai.com\" # Change this to any website you want to analyze\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"🚀 Generating brochure for: {sample_url}\")\n",
|
||||||
|
"print(\"=\" * 60)\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate the brochure\n",
|
||||||
|
"brochure = create_brochure(sample_url)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example 2: Generate a brochure with streaming output\n",
|
||||||
|
"# This shows the brochure being generated in real-time\n",
|
||||||
|
"\n",
|
||||||
|
"streaming_url = \"https://anthropic.com\" # Change this to any website you want to analyze\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"🚀 Generating brochure with streaming for: {streaming_url}\")\n",
|
||||||
|
"print(\"=\" * 60)\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate the brochure with streaming\n",
|
||||||
|
"streaming_brochure = stream_brochure(streaming_url)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example 3: Generate and translate a brochure\n",
|
||||||
|
"# This creates a brochure and then translates it to another language\n",
|
||||||
|
"\n",
|
||||||
|
"translation_url = \"https://huggingface.co\" # Change this to any website you want to analyze\n",
|
||||||
|
"target_language = \"Spanish\" # Change this to any language you want\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"🚀 Generating and translating brochure for: {translation_url}\")\n",
|
||||||
|
"print(f\"🌍 Target language: {target_language}\")\n",
|
||||||
|
"print(\"=\" * 60)\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate and translate the brochure\n",
|
||||||
|
"translated_brochure = translate_brochure(translation_url, target_language, stream_mode=False)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Interactive Widget Interface\n",
|
||||||
|
"\n",
|
||||||
|
"Use the widgets below to interactively generate brochures for any website!\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Interactive Widget Interface\n",
|
||||||
|
"import ipywidgets as widgets\n",
|
||||||
|
"from IPython.display import display, clear_output\n",
|
||||||
|
"\n",
|
||||||
|
"# Create widgets\n",
|
||||||
|
"url_input = widgets.Text(\n",
|
||||||
|
" value='https://openai.com',\n",
|
||||||
|
" placeholder='Enter website URL (e.g., https://example.com)',\n",
|
||||||
|
" description='Website URL:',\n",
|
||||||
|
" style={'description_width': 'initial'},\n",
|
||||||
|
" layout=widgets.Layout(width='500px')\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"language_dropdown = widgets.Dropdown(\n",
|
||||||
|
" options=['English', 'Spanish', 'French', 'German', 'Chinese', 'Japanese', 'Portuguese', 'Italian'],\n",
|
||||||
|
" value='English',\n",
|
||||||
|
" description='Language:',\n",
|
||||||
|
" style={'description_width': 'initial'}\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"stream_checkbox = widgets.Checkbox(\n",
|
||||||
|
" value=False,\n",
|
||||||
|
" description='Use streaming output',\n",
|
||||||
|
" style={'description_width': 'initial'}\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"translate_checkbox = widgets.Checkbox(\n",
|
||||||
|
" value=False,\n",
|
||||||
|
" description='Translate brochure',\n",
|
||||||
|
" style={'description_width': 'initial'}\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"generate_button = widgets.Button(\n",
|
||||||
|
" description='Generate Brochure',\n",
|
||||||
|
" button_style='success',\n",
|
||||||
|
" icon='rocket'\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"output_area = widgets.Output()\n",
|
||||||
|
"\n",
|
||||||
|
"def on_generate_clicked(b):\n",
|
||||||
|
" with output_area:\n",
|
||||||
|
" clear_output(wait=True)\n",
|
||||||
|
" url = url_input.value.strip()\n",
|
||||||
|
" \n",
|
||||||
|
" if not url:\n",
|
||||||
|
" print(\"❌ Please enter a valid URL\")\n",
|
||||||
|
" return\n",
|
||||||
|
" \n",
|
||||||
|
" if not url.startswith(('http://', 'https://')):\n",
|
||||||
|
" url = 'https://' + url\n",
|
||||||
|
" \n",
|
||||||
|
" print(f\"🚀 Generating brochure for: {url}\")\n",
|
||||||
|
" print(\"=\" * 60)\n",
|
||||||
|
" \n",
|
||||||
|
" try:\n",
|
||||||
|
" if translate_checkbox.value:\n",
|
||||||
|
" # Generate and translate\n",
|
||||||
|
" result = translate_brochure(url, language_dropdown.value, stream_mode=stream_checkbox.value)\n",
|
||||||
|
" else:\n",
|
||||||
|
" # Generate only\n",
|
||||||
|
" if stream_checkbox.value:\n",
|
||||||
|
" result = stream_brochure(url)\n",
|
||||||
|
" else:\n",
|
||||||
|
" result = create_brochure(url)\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"\\n✅ Brochure generation completed!\")\n",
|
||||||
|
" \n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" print(f\"❌ Error generating brochure: {str(e)}\")\n",
|
||||||
|
" print(\"Please check your API key and internet connection.\")\n",
|
||||||
|
"\n",
|
||||||
|
"generate_button.on_click(on_generate_clicked)\n",
|
||||||
|
"\n",
|
||||||
|
"# Display widgets\n",
|
||||||
|
"print(\"🎯 Interactive Brochure Generator\")\n",
|
||||||
|
"print(\"Enter a website URL and click 'Generate Brochure' to create a professional brochure!\")\n",
|
||||||
|
"print()\n",
|
||||||
|
"\n",
|
||||||
|
"display(url_input)\n",
|
||||||
|
"display(widgets.HBox([language_dropdown, stream_checkbox, translate_checkbox]))\n",
|
||||||
|
"display(generate_button)\n",
|
||||||
|
"display(output_area)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Advanced Usage Examples\n",
|
||||||
|
"\n",
|
||||||
|
"Here are some advanced examples showing different ways to use the brochure generator.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Advanced Example 1: Analyze multiple websites and compare\n",
|
||||||
|
"websites_to_analyze = [\n",
|
||||||
|
" \"https://openai.com\",\n",
|
||||||
|
" \"https://anthropic.com\", \n",
|
||||||
|
" \"https://huggingface.co\"\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"🔍 Analyzing multiple websites...\")\n",
|
||||||
|
"print(\"=\" * 60)\n",
|
||||||
|
"\n",
|
||||||
|
"brochures = {}\n",
|
||||||
|
"for url in websites_to_analyze:\n",
|
||||||
|
" print(f\"\\n📊 Generating brochure for: {url}\")\n",
|
||||||
|
" try:\n",
|
||||||
|
" brochure = create_brochure(url)\n",
|
||||||
|
" brochures[url] = brochure\n",
|
||||||
|
" print(f\"✅ Successfully generated brochure for {url}\")\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" print(f\"❌ Failed to generate brochure for {url}: {str(e)}\")\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"-\" * 40)\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"\\n🎉 Generated {len(brochures)} brochures successfully!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Advanced Example 2: Generate brochures in multiple languages\n",
|
||||||
|
"target_website = \"https://openai.com\" # Change this to any website\n",
|
||||||
|
"languages = [\"Spanish\", \"French\", \"German\", \"Chinese\"]\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"🌍 Generating brochures in multiple languages for: {target_website}\")\n",
|
||||||
|
"print(\"=\" * 60)\n",
|
||||||
|
"\n",
|
||||||
|
"multilingual_brochures = {}\n",
|
||||||
|
"for language in languages:\n",
|
||||||
|
" print(f\"\\n🔄 Translating to {language}...\")\n",
|
||||||
|
" try:\n",
|
||||||
|
" translated_brochure = translate_brochure(target_website, language, stream_mode=False)\n",
|
||||||
|
" multilingual_brochures[language] = translated_brochure\n",
|
||||||
|
" print(f\"✅ Successfully translated to {language}\")\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" print(f\"❌ Failed to translate to {language}: {str(e)}\")\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"-\" * 40)\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"\\n🎉 Generated brochures in {len(multilingual_brochures)} languages!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Custom Functions\n",
|
||||||
|
"\n",
|
||||||
|
"Create your own custom functions for specific use cases.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
"import time  # time.strftime is used for the timestamp in save_brochure_to_file\n",
"\n",
"# Custom Function: Save brochure to file\n",
|
||||||
|
"def save_brochure_to_file(brochure_content, filename, url):\n",
|
||||||
|
" \"\"\"Save brochure content to a markdown file\"\"\"\n",
|
||||||
|
" try:\n",
|
||||||
|
" with open(filename, 'w', encoding='utf-8') as f:\n",
|
||||||
|
" f.write(f\"# Brochure for {url}\\n\\n\")\n",
|
||||||
|
" f.write(f\"Generated on: {time.strftime('%Y-%m-%d %H:%M:%S')}\\n\\n\")\n",
|
||||||
|
" f.write(\"---\\n\\n\")\n",
|
||||||
|
" f.write(brochure_content)\n",
|
||||||
|
" print(f\"✅ Brochure saved to: {filename}\")\n",
|
||||||
|
" return True\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" print(f\"❌ Error saving brochure: {str(e)}\")\n",
|
||||||
|
" return False\n",
|
||||||
|
"\n",
|
||||||
|
"# Custom Function: Generate brochure with custom analysis\n",
|
||||||
|
"def generate_custom_brochure(url, focus_areas=None):\n",
|
||||||
|
" \"\"\"Generate a brochure with focus on specific areas\"\"\"\n",
|
||||||
|
" if focus_areas is None:\n",
|
||||||
|
" focus_areas = [\"company overview\", \"products\", \"culture\", \"careers\"]\n",
|
||||||
|
" \n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" \n",
|
||||||
|
" # Custom system prompt with focus areas\n",
|
||||||
|
" custom_system_prompt = f\"\"\"\n",
|
||||||
|
" You are an assistant that analyzes website content and creates a professional brochure.\n",
|
||||||
|
" Focus specifically on these areas: {', '.join(focus_areas)}.\n",
|
||||||
|
" Create a markdown brochure that emphasizes these aspects for prospective customers, investors and recruits.\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" \n",
|
||||||
|
" response = website.client.chat.completions.create(\n",
|
||||||
|
" model=\"gpt-4o-mini\",\n",
|
||||||
|
" messages=[\n",
|
||||||
|
" {\"role\": \"system\", \"content\": custom_system_prompt},\n",
|
||||||
|
" {\"role\": \"user\", \"content\": get_brochure_user_prompt(url)}\n",
|
||||||
|
" ]\n",
|
||||||
|
" )\n",
|
||||||
|
" \n",
|
||||||
|
" result = response.choices[0].message.content\n",
|
||||||
|
" display_content(result, is_markdown=True)\n",
|
||||||
|
" return result\n",
|
||||||
|
"\n",
|
||||||
|
"# Custom Function: Quick website analysis\n",
|
||||||
|
"def quick_website_analysis(url):\n",
|
||||||
|
" \"\"\"Perform a quick analysis of a website without generating full brochure\"\"\"\n",
|
||||||
|
" website = Website(url)\n",
|
||||||
|
" \n",
|
||||||
|
" analysis = f\"\"\"\n",
|
||||||
|
" # Quick Website Analysis: {url}\n",
|
||||||
|
" \n",
|
||||||
|
" **Title:** {website.title}\n",
|
||||||
|
" **Total Links Found:** {len(website.links)}\n",
|
||||||
|
" **Content Length:** {len(website.text)} characters\n",
|
||||||
|
" \n",
|
||||||
|
" ## Sample Content (first 500 characters):\n",
|
||||||
|
" {website.text[:500]}...\n",
|
||||||
|
" \n",
"    ## First 10 Links:\n",
"    {chr(10).join(website.links[:10])}\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" \n",
|
||||||
|
" display_content(analysis, is_markdown=True)\n",
|
||||||
|
" return analysis\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"✅ Custom functions loaded!\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Usage Examples with Custom Functions\n",
|
||||||
|
"\n",
|
||||||
|
"Try these examples with the custom functions we just created.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example: Quick website analysis\n",
|
||||||
|
"test_url = \"https://openai.com\" # Change this to any website\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"🔍 Performing quick website analysis...\")\n",
|
||||||
|
"print(\"=\" * 50)\n",
|
||||||
|
"\n",
|
||||||
|
"quick_analysis = quick_website_analysis(test_url)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example: Generate custom brochure with specific focus\n",
|
||||||
|
"custom_url = \"https://anthropic.com\" # Change this to any website\n",
|
||||||
|
"focus_areas = [\"AI safety\", \"research\", \"products\", \"team\"] # Custom focus areas\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"🎯 Generating custom brochure with specific focus...\")\n",
|
||||||
|
"print(f\"Focus areas: {', '.join(focus_areas)}\")\n",
|
||||||
|
"print(\"=\" * 50)\n",
|
||||||
|
"\n",
|
||||||
|
"custom_brochure = generate_custom_brochure(custom_url, focus_areas)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Example: Generate brochure and save to file\n",
|
||||||
|
"save_url = \"https://huggingface.co\" # Change this to any website\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"💾 Generating brochure and saving to file...\")\n",
|
||||||
|
"print(\"=\" * 50)\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate brochure\n",
|
||||||
|
"brochure_content = create_brochure(save_url)\n",
|
||||||
|
"\n",
|
||||||
|
"# Save to file\n",
|
||||||
|
"filename = f\"brochure_{save_url.replace('https://', '').replace('/', '_')}.md\"\n",
|
||||||
|
"save_success = save_brochure_to_file(brochure_content, filename, save_url)\n",
|
||||||
|
"\n",
|
||||||
|
"if save_success:\n",
|
||||||
|
" print(f\"📁 You can find the saved brochure in: {filename}\")\n",
|
||||||
|
"else:\n",
|
||||||
|
" print(\"❌ Failed to save brochure to file\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Troubleshooting and Tips\n",
|
||||||
|
"\n",
|
||||||
|
"### Common Issues and Solutions\n",
|
||||||
|
"\n",
|
||||||
|
"1. **API Key Issues**\n",
|
||||||
|
" - Make sure your OpenAI API key is set in the `.env` file\n",
|
||||||
|
" - Verify your API key has sufficient credits\n",
"   - Check that the key starts with `sk-` (project keys start with `sk-proj-`)\n",
|
||||||
|
"\n",
|
||||||
|
"2. **Website Scraping Issues**\n",
|
||||||
|
" - Some websites may block automated requests\n",
|
||||||
|
" - Try different websites if one fails\n",
|
||||||
|
" - The tool uses a standard User-Agent header to avoid basic blocking\n",
|
||||||
|
"\n",
|
||||||
|
"3. **Memory Issues**\n",
|
||||||
|
" - Large websites may consume significant memory\n",
"   - The tool truncates the combined prompt to 15,000 characters, which limits what is sent to the model (the pages themselves are still downloaded in full)\n",
|
||||||
|
"\n",
|
||||||
|
"4. **Rate Limiting**\n",
|
||||||
|
" - OpenAI has rate limits on API calls\n",
|
||||||
|
" - If you hit limits, wait a few minutes before trying again\n",
|
||||||
|
"\n",
|
||||||
|
"### Tips for Better Results\n",
|
||||||
|
"\n",
|
||||||
|
"1. **Choose Good Websites**\n",
|
||||||
|
" - Websites with clear About, Products, and Careers pages work best\n",
|
||||||
|
" - Avoid websites that are mostly images or require JavaScript\n",
|
||||||
|
"\n",
|
||||||
|
"2. **Use Streaming for Long Content**\n",
|
||||||
|
" - Enable streaming for better user experience with long brochures\n",
|
||||||
|
" - Streaming shows progress in real-time\n",
|
||||||
|
"\n",
|
||||||
|
"3. **Custom Focus Areas**\n",
|
||||||
|
" - Use the custom brochure function to focus on specific aspects\n",
|
||||||
|
" - This can help generate more targeted content\n",
|
||||||
|
"\n",
|
||||||
|
"4. **Save Your Work**\n",
|
||||||
|
" - Use the save function to keep brochures for later reference\n",
|
||||||
|
" - Files are saved in markdown format for easy editing\n"
|
||||||
|
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This Jupyter notebook provides a comprehensive interface for the Website Brochure Generator. You can:\n",
    "\n",
    "- ✅ Generate professional brochures from any website\n",
    "- ✅ Translate brochures to multiple languages\n",
    "- ✅ Use interactive widgets for easy operation\n",
    "- ✅ Save brochures to files for later use\n",
    "- ✅ Perform quick website analysis\n",
    "- ✅ Create custom brochures with specific focus areas\n",
    "- ✅ Generate brochures with streaming output for real-time feedback\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "1. **Try the Interactive Widget**: Use the widget interface above to generate brochures for your favorite websites\n",
    "2. **Experiment with Different URLs**: Test the tool with various types of websites\n",
    "3. **Explore Translation Features**: Generate brochures in different languages\n",
    "4. **Save Your Work**: Use the save function to keep your generated brochures\n",
    "5. **Customize Focus Areas**: Create brochures tailored to specific aspects of companies\n",
    "\n",
    "### Support\n",
    "\n",
    "For issues and questions:\n",
    "- Check the troubleshooting section above\n",
    "- Verify your OpenAI API key is properly configured\n",
    "- Ensure you have a stable internet connection\n",
    "- Try different websites if one fails\n",
    "\n",
    "Happy brochure generating! 🚀\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
@@ -0,0 +1,356 @@
from openai import OpenAI
from dotenv import load_dotenv
import os
import requests
import json
from typing import List
from bs4 import BeautifulSoup

# Rich library for beautiful terminal markdown rendering
from rich.console import Console
from rich.markdown import Markdown as RichMarkdown

def get_client_and_headers():
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
        # print("API key looks good so far")
        pass
    else:
        print("There might be a problem with your API key")

    client = OpenAI(api_key=api_key)

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    }
    return client, headers

# Utility methods to display content in markdown format
def print_markdown_terminal(text):
    """Print markdown-formatted text to terminal with beautiful formatting using Rich"""
    console = Console()
    console.print(RichMarkdown(text))

def display_content(content, is_markdown=True):
    """Display content using Rich formatting"""
    if is_markdown:
        print_markdown_terminal(content)
    else:
        print(content)

def stream_content(response, title="Content"):
    """
    Utility function to handle streaming content display using Rich

    Args:
        response: OpenAI streaming response object
        title (str): Title to display for the streaming content

    Returns:
        str: Complete streamed content
    """
    result = ""
    console = Console()

    # Terminal streaming with real-time output using Rich
    console.print(f"\n[bold blue]{title}...[/bold blue]\n")
    for chunk in response:
        content = chunk.choices[0].delta.content or ""
        result += content
        # Print each chunk as it arrives for streaming effect
        print(content, end='', flush=True)
    console.print(f"\n\n[bold green]{'='*50}[/bold green]")
    console.print(f"[bold green]{title.upper()} COMPLETE[/bold green]")
    console.print(f"[bold green]{'='*50}[/bold green]")

    return result

# Utility class to get the contents of a website
class Website:
    def __init__(self, url):
        self.url = url
        self.client, self.headers = get_client_and_headers()
        # A timeout keeps an unresponsive site from hanging the whole run
        response = requests.get(url, headers=self.headers, timeout=30)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title: {self.title}\nWebpage Contents: {self.text}\n\n"

def get_links_system_prompt():
    # Note: the opening delimiter is exactly three quotes; a fourth would put a stray '"' into the prompt
    link_system_prompt = """You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company. \
Relevant links usually include: About page, or a Company page, or Careers/Jobs pages or News page\n"""
    link_system_prompt += "Always respond in JSON exactly like this: \n"
    link_system_prompt += """
    {
        "links": [
            {"type": "<page type>", "url": "<full URL>"},
            {"type": "<page type>", "url": "<full URL>"}
        ]
    }\n
    """
    link_system_prompt += """ If no relevant links are found, return:
    {
        "links": []
    }\n
    """
    link_system_prompt += "If multiple links could map to the same type (e.g. two About pages), include the best candidate only.\n"

    link_system_prompt += "You should respond in JSON as in the below examples:\n"
    link_system_prompt += """
    ## Example 1
    Input links:
    - https://acme.com/about
    - https://acme.com/pricing
    - https://acme.com/blog
    - https://acme.com/signup

    Output:
    {
        "links": [
            {"type": "about page", "url": "https://acme.com/about"},
            {"type": "blog page", "url": "https://acme.com/blog"},
            {"type": "pricing page", "url": "https://acme.com/pricing"}
        ]
    }
    """
    link_system_prompt += """
    ## Example 2
    Input links:
    - https://startup.io/
    - https://startup.io/company
    - https://startup.io/careers
    - https://startup.io/support

    Output:
    {
        "links": [
            {"type": "company page", "url": "https://startup.io/company"},
            {"type": "careers page", "url": "https://startup.io/careers"}
        ]
    }
    """
    link_system_prompt += """
    ## Example 3
    Input links:
    - https://coolapp.xyz/login
    - https://coolapp.xyz/random

    Output:
    {
        "links": []
    }
    """
    return link_system_prompt

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \n"
    user_prompt += "Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

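# Sketch, not part of the committed module: the prompt above warns that some
# links may be relative, while Website() later needs absolute URLs to fetch.
# The hypothetical helper below (name and behaviour are assumptions) resolves
# each href against the page URL using only the standard library, drops
# non-HTTP schemes such as mailto:, and removes duplicates.
from urllib.parse import urldefrag, urljoin, urlparse


def absolutize_links(base_url, links):
    """Return absolute http(s) URLs for the given hrefs, in first-seen order."""
    resolved = []
    seen = set()
    for href in links:
        # urljoin leaves absolute URLs untouched and resolves relative ones
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved
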
def get_brochure_system_prompt():
    brochure_system_prompt = """
    You are an assistant that analyzes the contents of several relevant pages from a company website \
    and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.
    Include details of company culture, customers and careers/jobs if you have the information.
    """
    return brochure_system_prompt

def get_brochure_user_prompt(url):
    user_prompt = f"You are looking at the company details of: {url}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_details_for_brochure(url)
    user_prompt = user_prompt[:15000]  # Truncate if more than 15,000 characters
    return user_prompt

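# Sketch, not part of the committed module: slicing at a fixed 15,000
# characters can cut a line in half. The hypothetical helper below (an
# assumption, not the author's method) backs up to the last newline before
# the limit so the prompt ends on a complete line.
def truncate_at_boundary(text, limit=15000):
    """Trim text to at most `limit` characters, preferring a line boundary."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    if cut <= 0:
        cut = limit  # no usable newline found, fall back to a hard cut
    return text[:cut]
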
def get_translation_system_prompt(target_language):
    translation_system_prompt = (
        "You are a professional translator specializing in business and marketing content. "
        f"Translate the provided brochure to {target_language} while maintaining all formatting and professional tone."
    )
    return translation_system_prompt

def get_translation_user_prompt(original_brochure, target_language):
    translation_prompt = f"""
    You are a professional translator. Please translate the following brochure content to {target_language}.

    Important guidelines:
    - Maintain the markdown formatting exactly as it appears
    - Keep all headers, bullet points, and structure intact
    - Translate the content naturally and professionally
    - Preserve any company names, product names, or proper nouns unless they have established translations
    - Maintain the professional tone and marketing style

    Brochure content to translate:
    {original_brochure}
    """
    return translation_prompt

def get_links(url):
    website = Website(url)
    response = website.client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_links_system_prompt()},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    print("get_links:", result)
    return json.loads(result)

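# Sketch, not part of the committed module: get_links() trusts the model to
# return {"links": [{"type": ..., "url": ...}]}. The hypothetical helper below
# (an assumption for illustration) validates that shape and returns an empty
# list instead of raising when the reply is malformed.
import json


def parse_links(raw):
    """Return a list of {"type", "url"} dicts parsed from a JSON reply."""
    try:
        data = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return []
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return []
    valid = []
    for item in links:
        # Keep only entries that carry a non-empty string URL
        if isinstance(item, dict) and isinstance(item.get("url"), str) and item["url"]:
            valid.append({"type": str(item.get("type", "page")), "url": item["url"]})
    return valid
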
def get_details_for_brochure(url):
    website = Website(url)
    result = "Landing page:\n"
    result += website.get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

def create_brochure(url):
    website = Website(url)
    response = website.client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_brochure_system_prompt()},
            {"role": "user", "content": get_brochure_user_prompt(url)}
        ]
    )
    result = response.choices[0].message.content
    display_content(result, is_markdown=True)
    return result

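# Sketch, not part of the committed module: the notebook's troubleshooting
# notes suggest waiting and retrying when OpenAI rate limits are hit. The
# hypothetical wrapper below (name and parameters are assumptions) retries a
# callable with exponential backoff. With the OpenAI SDK one would presumably
# pass retry_on=(openai.RateLimitError,); a generic tuple is used here so the
# sketch has no third-party dependency.
import time


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
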
def stream_brochure(url):
    website = Website(url)
    response = website.client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_brochure_system_prompt()},
            {"role": "user", "content": get_brochure_user_prompt(url)}
        ],
        stream=True
    )

    # Use the reusable streaming utility function
    result = stream_content(response, "Generating brochure")
    return result

def translate_brochure(url, target_language="Spanish", stream_mode=False):
    """
    Generate a brochure and translate it to the target language

    Args:
        url (str): The website URL to generate brochure from
        target_language (str): The target language for translation (default: "Spanish")
        stream_mode (bool): Whether to use streaming output (default: False)

    Returns:
        str: Translated brochure content
    """
    # First generate the original brochure
    original_brochure = create_brochure(url)

    # Get translation prompts
    translation_system_prompt = get_translation_system_prompt(target_language)
    translation_user_prompt = get_translation_user_prompt(original_brochure, target_language)

    # Get OpenAI client
    website = Website(url)

    if stream_mode:
        # Generate translation using OpenAI with streaming
        response = website.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": translation_system_prompt},
                {"role": "user", "content": translation_user_prompt}
            ],
            stream=True
        )

        # Use the reusable streaming utility function
        translated_brochure = stream_content(response, f"Translating brochure to {target_language}")
    else:
        # Generate translation using OpenAI with complete output
        response = website.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": translation_system_prompt},
                {"role": "user", "content": translation_user_prompt}
            ]
        )

        translated_brochure = response.choices[0].message.content

        # Display the translated content
        display_content(translated_brochure, is_markdown=True)

    return translated_brochure


# Main function for terminal usage
def main():
    """Main function for running the brochure generator from terminal"""
    import sys

    if len(sys.argv) != 2:
        console = Console()
        console.print("[bold red]Usage:[/bold red] python website_brochure_generator.py <website_url>")
        console.print("[bold blue]Example:[/bold blue] python website_brochure_generator.py https://example.com")
        sys.exit(1)

    url = sys.argv[1]
    console = Console()

    console.print(f"[bold green]Generating brochure for:[/bold green] {url}")
    console.print("\n[bold yellow]Choose display mode:[/bold yellow]")
    console.print("1. Complete output (display all at once)")
    console.print("2. Stream output (real-time generation)")

    display_choice = input("\nEnter choice (1 or 2): ").strip()

    # Generate brochure based on display choice
    if display_choice == "1":
        result = create_brochure(url)
    elif display_choice == "2":
        result = stream_brochure(url)
    else:
        console.print("[bold red]Invalid choice. Using default: complete output[/bold red]")
        result = create_brochure(url)

    # Ask if user wants translation
    console.print("\n[bold yellow]Translation options:[/bold yellow]")
    console.print("1. No translation (original only)")
    console.print("2. Translate to another language")

    translation_choice = input("\nEnter choice (1 or 2): ").strip()

    if translation_choice == "2":
        target_language = input("Enter target language (e.g., Spanish, French, German, Chinese): ").strip()
        if not target_language:
            target_language = "Spanish"

        # Pass the stream mode based on the display choice
        stream_mode = (display_choice == "2")
        translate_brochure(url, target_language, stream_mode=stream_mode)
    else:
        pass

if __name__ == "__main__":
    main()