# Synthetic Data Generator

A Python-based tool to generate structured, synthetic job postings using open-source LLMs from Hugging Face. This project supports both **script-based execution** and an **interactive Colab notebook**, making it suitable for rapid prototyping, dataset bootstrapping, or demonstrating prompt engineering techniques.

> Note: The original repo can be found at: https://github.com/moawiah/synthetic_data_generator

This tool helps:
- Researchers create labeled training data for NLP classification or QA
- HR tech startups prototype recommendation models
- AI instructors demonstrate few-shot prompting in class

---

## Features
- Integrates Hugging Face Transformer models
- Generates realistic job postings in structured JSON format
- Supports prompt engineering with control over output length and variability
- Minimal Gradio UI for non-technical users
- Jupyter/Colab support for experimentation and reproducibility

## Project Structure
```
.
├── app/
│   ├── app.py                          # Main script entry point
│   ├── consts.py                       # Configuration and constants
│   └── requirements.txt                # Python dependencies
├── data/
│   └── software_engineer_jobs.json     # Sample input data (JSON format)
├── notebooks/
│   └── synthetic_data_generator.ipynb  # Interactive Colab notebook
├── .env.example                        # Sample environment variable config
├── .gitignore                          # Git ignored files list
└── README.md
```
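As a rough illustration of how sample postings such as those in `data/software_engineer_jobs.json` could feed few-shot prompting, here is a minimal sketch. The helper name `build_few_shot_prompt`, the prompt wording, and the placeholder examples are assumptions made for illustration; the actual prompt used in `app/app.py` may differ.

```python
import json


def build_few_shot_prompt(examples, role, max_examples=2):
    """Build a few-shot prompt from example postings (hypothetical helper)."""
    # Use only the first few examples to keep the prompt short.
    shots = examples[:max_examples]
    parts = [
        f"Generate one synthetic job posting for the role '{role}'.",
        "Respond with a single JSON object using the same keys as the examples.",
        "",
    ]
    for i, example in enumerate(shots, start=1):
        parts.append(f"Example {i}:")
        parts.append(json.dumps(example, indent=2))
        parts.append("")
    parts.append("New posting:")
    return "\n".join(parts)


# Placeholder examples for illustration only, not taken from the repository data.
sample = [
    {"title": "Software Engineer", "location": "New York, NY"},
    {"title": "Backend Developer", "location": "Remote"},
    {"title": "Data Engineer", "location": "Remote"},
]
prompt = build_few_shot_prompt(sample, "Software Engineer")
print(prompt)
```

Limiting the number of examples is one simple way to trade prompt length against how closely the model follows the sample format.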
## Getting Started
### Clone the repository
```bash
git clone https://github.com/moawiah/synthetic_data_generator.git
cd synthetic_data_generator
```
### Install Dependencies
```bash
pip install -r app/requirements.txt
```
### Hugging Face Token
Create a `.env` file containing your Hugging Face token, for example `HF_TOKEN=your-token-here`. The included `.env.example` can serve as a starting point.
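For reference, the sketch below shows one way a script could read `HF_TOKEN` from a `.env` file using only the standard library. The helpers `load_env_file` and `get_hf_token` are hypothetical names, and the project itself may load the token differently (for example through a dotenv package), so treat this as an illustration rather than the app's actual code.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer an already-exported HF_TOKEN, else fall back to the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it.")
    return token
```

Failing early with a clear message when the token is missing tends to be easier to debug than an authentication error raised later during model loading.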
### Run
Run the app from the repository root:
`python app/app.py`
## Example Output - 1 Job
```json
{
  "title": "Software Engineer",
  "description": "We are seeking a highly skilled software engineer to join our team and contribute to the development of innovative software solutions. The ideal candidate will have experience in designing, coding, and testing software systems, and will be able to work collaboratively with cross-functional teams. Responsibilities include writing clean, maintainable, and efficient code, as well as actively participating in code reviews and continuous integration processes. This is an excellent opportunity for a self-starter with a passion for technology and a desire to grow in their career.",
  "requirements": [
    "Bachelor's degree in Computer Science or related field",
    "Minimum of 2 years experience in software development",
    "Strong proficiency in Java or C++",
    "Experience with agile development methodologies",
    "Good understanding of data structures and algorithms",
    "Excellent problem-solving and analytical skills"
  ],
  "location": "New York, NY",
  "company_name": "ABC Technologies"
}
```
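Because LLM output does not always come back as well-formed JSON, it can help to check each generated posting before adding it to a dataset. The following is a minimal sketch that assumes the five fields shown in the example above; `validate_job_posting` and `REQUIRED_FIELDS` are hypothetical names, not part of the project.

```python
import json

# Field names and types taken from the example output above (assumed schema).
REQUIRED_FIELDS = {
    "title": str,
    "description": str,
    "requirements": list,
    "location": str,
    "company_name": str,
}


def validate_job_posting(raw_json):
    """Return a list of problems; an empty list means the posting looks valid."""
    try:
        posting = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(posting, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in posting:
            problems.append(f"missing field: {field}")
        elif not isinstance(posting[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )

    # Each requirement should be a plain string.
    reqs = posting.get("requirements")
    if isinstance(reqs, list) and not all(isinstance(r, str) for r in reqs):
        problems.append("requirements must contain only strings")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log why a generated posting was rejected.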
## Future Improvements
- Add support for more job roles and industries
- Model selector in the UI
- Export dataset as CSV
- Optional integration with LangChain or RAG workflows