This commit introduces the foundational structure for the Deal Intel project, including:
- Environment configuration file (.env.example) for managing secrets and API keys.
- Scripts for building a ChromaDB vector store (build_vector_store.py) and training machine learning models (train_rf.py, train_ensemble.py).
- Health check functionality (health_check.py) to ensure system readiness.
- A launcher script (launcher.py) for executing various commands, including UI launch and health checks.
- Logging utilities (logging_utils.py) for consistent logging across the application.
- A README file providing an overview and setup instructions for the project.
These additions establish a comprehensive framework for an agentic deal-hunting AI system, integrating various components for data processing, model training, and user interaction.
This commit introduces a new Jupyter notebook, 'week6 EXERCISE.ipynb', which outlines the process for fine-tuning a model to classify banking customer queries. The notebook includes steps for data preparation, model training, and evaluation, utilizing the Banking77 dataset and OpenAI's API for fine-tuning. This addition enhances the project's capabilities in handling banking-related queries effectively.
This commit introduces a new Python module, data_cleaner.py, which provides functions for cleaning and preparing datasets for fine-tuning. The module includes a method to clean datasets based on text length and balance class distributions, as well as a function to analyze label distributions. These utilities enhance the data preprocessing capabilities for the application.
This commit introduces a new Python module, classifier_tester.py, which provides a testing framework for evaluating the accuracy of classification models on intent classification tasks. The module includes methods for running tests on individual data points, reporting metrics, and visualizing confusion pairs, enhancing the overall testing capabilities for the Banking77 application.
This commit introduces a new Python module, banking_intents.py, which maps intent labels (0-76) to their corresponding intent names for the Banking77 application. The module includes functions to retrieve intent names by label and vice versa, along with a utility to display all intents. This addition enhances the application's ability to handle various banking-related queries effectively.