(Oct 2025 Bootcamp): Add audio transcription assistant with Gradio interface
- Introduced a new audio transcription tool using OpenAI's Whisper model.
- Added README.md detailing features, installation, and usage instructions.
- Created a Jupyter notebook for local and Google Colab execution.
- Included an MP3 file for demonstration purposes.
week3/community-contributions/hopeogbons/README.md
# 🎙️ Audio Transcription Assistant

An AI-powered audio transcription tool that converts speech to text in multiple languages using OpenAI's Whisper model.
## Why I Built This

In today's content-driven world, audio and video are everywhere: podcasts, meetings, lectures, interviews. But what if you need to quickly extract text from an audio file in another language, or create searchable transcripts from recordings?

Manual transcription is time-consuming and expensive. I wanted to build something that could:

- Accept audio files in any format ffmpeg can decode (MP3, WAV, and more)
- Transcribe them accurately using AI
- Support multiple languages
- Work locally on my Mac **and** on cloud GPUs (Google Colab)

That's where **Whisper** comes in: OpenAI's powerful open-source speech recognition model.
## Features

- 📤 **Upload any common audio file** (MP3, WAV, M4A, FLAC, and more)
- 🌍 **11 preset languages** plus auto-detection
- 🤖 **Accurate AI-powered transcription** using Whisper
- ⚡ **Cross-platform**: runs on CPU (Mac) or GPU (Colab)
- 🎨 **Clean web interface** built with Gradio
- 🚀 **Fast processing** with optimized model settings
## Tech Stack

- **OpenAI Whisper** - Speech recognition model
- **Gradio** - Web interface framework
- **PyTorch** - Deep learning backend
- **NumPy** - Numerical computing
- **ffmpeg** - Audio file processing
## Installation

### Prerequisites

- Python 3.12+
- ffmpeg (for audio processing)
- uv package manager (or pip)

### Setup

1. Clone this repository or download the notebook

2. Install dependencies:
```bash
# Install a compatible NumPy version
uv pip install --reinstall "numpy==1.26.4"

# Install PyTorch
uv pip install torch torchvision torchaudio

# Install Gradio and Whisper
uv pip install gradio openai-whisper ffmpeg-python

# (Optional) Install Ollama for LLM features
uv pip install ollama
```
3. **For Mac users**, ensure ffmpeg is installed:

```bash
brew install ffmpeg
```
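Before opening the notebook, a quick sanity check can confirm that everything imports and ffmpeg is on your PATH. This snippet is illustrative and not part of the notebook itself:

```python
# Illustrative environment check (not part of the notebook)
import shutil

import torch
import whisper  # installed via the openai-whisper package
import gradio as gr

print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Gradio", gr.__version__)
```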
## Usage

### Running Locally

1. Open the Jupyter notebook `week3 EXERCISE_hopeogbons.ipynb`

2. Run all cells in order (a condensed sketch of these cells follows after this list):

   - Cell 1: Install dependencies
   - Cell 2: Import libraries
   - Cell 3: Load the Whisper model
   - Cell 4: Define the transcription function
   - Cell 5: Build the Gradio interface
   - Cell 6: Launch the app

3. The app will automatically open in your browser

4. Upload an audio file, select the language, and click Submit!
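For reference, the notebook's cells boil down to something like the following minimal sketch. The names (`transcribe`, `app`) and the dropdown choices here are illustrative; the notebook's exact code may differ:

```python
import gradio as gr
import whisper

# Cell 3: load the model once at startup, not per request
model = whisper.load_model("base")

# Cell 4: the transcription function wired to the UI
def transcribe(audio_path, language):
    if audio_path is None:
        return "Please upload an audio file."
    lang = None if language == "Auto-detect" else language  # None = let Whisper detect
    result = model.transcribe(audio_path, language=lang, fp16=False)  # fp16=False avoids a CPU warning
    return result["text"]

# Cells 5-6: build and launch the Gradio interface
app = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(type="filepath", label="Audio file"),
        gr.Dropdown(["Auto-detect", "en", "es", "fr"], value="Auto-detect", label="Language"),
    ],
    outputs=gr.Textbox(label="Transcription"),
    title="🎙️ Audio Transcription Assistant",
)
app.launch(inbrowser=True, prevent_thread_lock=True)
```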
### Running on Google Colab

For GPU acceleration:

1. Open the notebook in Google Colab
2. Runtime → Change runtime type → **GPU (T4)**
3. Run all cells in order
4. The model will automatically use GPU acceleration

**Note:** The first run downloads the Whisper model (~140MB); this is a one-time download.
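Whisper already defaults to CUDA when a GPU is present; if you prefer to make the device explicit, a sketch:

```python
import torch
import whisper

# Pick the GPU when Colab provides one, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base", device=device)
print(f"Whisper 'base' loaded on {device}")
```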
## Supported Languages

- 🇬🇧 English
- 🇪🇸 Spanish
- 🇫🇷 French
- 🇩🇪 German
- 🇮🇹 Italian
- 🇵🇹 Portuguese
- 🇨🇳 Chinese
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇷🇺 Russian
- 🇸🇦 Arabic
- 🌐 Auto-detect
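Whisper's `transcribe()` expects a language code such as `"en"` (or `None` for auto-detection), so the dropdown labels above map to codes along these lines. The exact mapping in the notebook may differ:

```python
# Illustrative label-to-code mapping; None means auto-detect
LANGUAGE_CODES = {
    "English": "en", "Spanish": "es", "French": "fr", "German": "de",
    "Italian": "it", "Portuguese": "pt", "Chinese": "zh", "Japanese": "ja",
    "Korean": "ko", "Russian": "ru", "Arabic": "ar", "Auto-detect": None,
}
```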
## How It Works

1. **Upload** - the user uploads an audio file through the Gradio interface
2. **Process** - ffmpeg decodes the audio file
3. **Transcribe** - the Whisper model processes the audio and generates text
4. **Display** - the transcription is shown in the output box

The Whisper "base" model is used to balance speed and accuracy:

- Fast enough for interactive use on CPU
- Accurate enough for most transcription needs
- Small enough (~140MB) to download quickly
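That trade-off shows up in the one call that does the work; swap `"base"` for `"tiny"`, `"small"`, or `"medium"` to shift the speed/accuracy balance. A sketch, where `speech_sample.mp3` is a placeholder filename:

```python
import whisper

model = whisper.load_model("base")  # "tiny" is faster, "small"/"medium" are more accurate

# fp16=False silences the half-precision warning when running on CPU
result = model.transcribe("speech_sample.mp3", fp16=False)
print(result["language"])  # language Whisper detected (when auto-detecting)
print(result["text"])      # the transcription itself
```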
## Example Transcriptions

The app successfully transcribed:

- English podcast episodes
- French-language audio (detected and transcribed)
- Multi-speaker conversations
- Audio with background noise
## What I Learned

Building this transcription assistant taught me:

- **Audio processing** with ffmpeg and Whisper
- **Cross-platform compatibility** (Mac CPU vs. Colab GPU)
- **Dependency management** (dealing with NumPy version conflicts!)
- **Async handling** in Jupyter notebooks with Gradio
- **Model selection** (choosing the right Whisper model size)

The biggest challenge? Getting ffmpeg and NumPy to play nicely together across different environments. Solving those issues gave me a much deeper understanding of the stack.
## Troubleshooting

### Common Issues

**1. "No module named 'whisper'" error**

- Make sure you've installed `openai-whisper`, not the unrelated `whisper` package
- Restart your kernel after installation

**2. "ffmpeg not found" error**

- Install ffmpeg: `brew install ffmpeg` (Mac) or `apt-get install ffmpeg` (Linux)

**3. NumPy version conflicts**

- Pin NumPy to 1.26.4: `uv pip install --reinstall "numpy==1.26.4"`
- Restart the kernel after reinstalling

**4. Gradio event loop errors**

- Pass `prevent_thread_lock=True` to `app.launch()` (see the sketch below)
- Restart the kernel if errors persist
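For issue 4, the fix is a single argument at launch time (sketch, assuming the interface object is named `app`):

```python
# Returns control to the notebook instead of blocking on Gradio's event loop
app.launch(prevent_thread_lock=True)
```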
## Future Enhancements

- [ ] Support for real-time audio streaming
- [ ] Speaker diarization (identifying different speakers)
- [ ] Export transcripts to multiple formats (SRT, VTT, TXT)
- [ ] Integration with LLMs for summarization
- [ ] Batch processing for multiple files
## Contributing

Feel free to fork this project and submit pull requests with improvements!

## License

This project is open source and available under the MIT License.

## Acknowledgments

- **OpenAI** for the amazing Whisper model
- **Gradio** team for the intuitive interface framework
- **Andela LLM Engineering Program** for the learning opportunity

---

**Built with ❤️ as part of the Andela LLM Engineering Program**

For questions or feedback, feel free to reach out!