# TOAD

This software project accompanies the research paper, [TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles](https://arxiv.org/abs/2402.10137). **This paper has been accepted by ACL 2024.**


TOAD is a synthetic task-oriented dialog (TOD) dataset that simulates realistic app-context interactions and provides multiple system response styles (verbosity and mirroring of user expressions).

## Run Data Synthesis

**Preparation**:

- Install the dependencies from `requirements.txt`.
- We use an OpenAI-compatible API to make requests to LLMs. Set the environment variables `OPENAI_API_KEY`, `BASE_URL` (optional), and `ENGINE` (e.g. "gpt-3.5-turbo") to configure the backend LLM. You can use a dotenv file.

**Synthesis**: The data synthesis pipeline is divided into 3 steps. The generated files are stored in `data/`.

**Step 1: Context generation**

1. Run `python -m context_generation.occupation_generator` to synthesize `occupations.json` (you can skip this step and reuse the existing file).
2. Run `python -m context_generation.persona_generator` to synthesize `personas.jsonl` from the occupations.
3. Run `python -m context_generation.context_generator` to synthesize `contexts.jsonl` from the personas.

**Step 2: Dialog generation**

4. Run the code in `dialog_generation` to synthesize dialogs based on the contexts. Example command:

```bash
python -m dialog_generation.main \
    --phenomena='compound' \
    --output_dir='data/dialogs' \
    --number_of_data=1000 \
    --full_options_mode \
    --thread_num=15
```

- `--phenomena` specifies the phenomena to use in dialog generation: one of `compound`, `compositional`, or `none`.
- `--output_dir` specifies the path where the generated dialogs are saved.
- `--number_of_data` specifies the number of dialogs to generate.
- `--full_options_mode` requests generation of all 6 response style options.
- `--thread_num` specifies the number of threads to run in parallel.

For how to customize dialog generation by modifying `schema.json`, please refer to [the documentation in that directory](dialog_generation/README.md).

**Step 3: Quality control**

5. Run `python -m quality_control.main` to filter out inconsistent dialogs using the LLM.
## Citation

```
@inproceedings{liu2024toad,
    title = "{TOAD}: Task-Oriented Automatic Dialogs with Diverse Response Styles",
    author = "Liu, Yinhong and Fang, Yimai and Vandyke, David and Collier, Nigel",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    url = "https://arxiv.org/abs/2402.10137"
}
```