# reinforced-dialog-system-for-learning
**Repository Path**: mirrors_ibm/reinforced-dialog-system-for-learning
## Basic Information
- **Project Name**: reinforced-dialog-system-for-learning
- **Description**: Code for NAACL 2022 paper "Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition". Uses self-play and reinforcement learning to train a dialogue agent that conveys knowledge to the end user.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-04-20
- **Last Updated**: 2025-08-24
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Reinforced Dialog System For Learning
This is the repo for the NAACL 2022 paper **[Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition]()**.
## Build the environment
We recommend creating a conda environment with
```bash
conda env create -f conda_env.yml
```
Also, following [this GitHub repo](https://github.com/shrimai/Focused-Attention-Improves-Document-Grounded-Generation), you need to apply the patch provided in the patch folder: find the path where the transformers library is installed and replace the original ```generation_utils.py``` file in the transformers library with the ```patch/generation_utils.py``` file.
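The patch step above can be sketched as follows. This is a minimal sketch, not part of the repo: it locates an installed package via the standard library and copies the patch over only if both the package and the patch file are present, keeping a backup of the original.

```python
import importlib.util
import os
import shutil

def package_dir(name):
    """Return the directory of an installed package, or None if absent."""
    spec = importlib.util.find_spec(name)
    return os.path.dirname(spec.origin) if spec and spec.origin else None

target = package_dir("transformers")
patch_file = os.path.join("patch", "generation_utils.py")
if target and os.path.exists(patch_file):
    dst = os.path.join(target, "generation_utils.py")
    shutil.copy(dst, dst + ".bak")   # keep a backup of the original file
    shutil.copy(patch_file, dst)
```

Back up the original file so you can restore the stock transformers behaviour later.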
[comment]: <> ((Pengshan has build a virtual env called **wow** on his softlayer machine *cai.sl.cloud9.ibm.com*, you may use that directly))
You may choose to download the preprocessed datasets, or build it yourself from scratch.
### Download preprocessed datasets
### Process the datasets yourself from scratch
Please prepare the data in a separate directory under the same parent directory (e.g. you may name it *Talk_* ):
```shell script
mkdir -p ../Talk_/data/WoW-raw
cd ../Talk_/data/WoW-raw
wget http://parl.ai/downloads/wizard_of_wikipedia/wizard_of_wikipedia.tgz
tar zxvf wizard_of_wikipedia.tgz
mv valid_random_split.json dev.json
```
These commands download and decompress the ***Wizard of Wikipedia*** dataset, which is used to pre-tune our teacher and student bots.
You may continue to build other folders under the *Talk_* directory, which are used to save hyper-parameters, model dumps, and logs.
```shell script
cd ../Talk_
mkdir -p za/args
mkdir saved_models
mkdir logs
```
The downloaded ***Wizard of Wikipedia*** dataset may be missing some information. Please refer to
```shell script
scripts/prepare_data/load_wikipedia_into_mysql.py
```
to build a MySQL database for Wikipedia (revise the code to match your MySQL settings).
When building the dataset with the scripts in *scripts/prepare_data/prepare_wow_wiz_app*, the script uses the Wikipedia database to fill in the missing information.
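The fill-in step amounts to a keyed lookup of passages by page title. The sketch below uses Python's built-in sqlite3 instead of MySQL purely to stay self-contained, and the table and column names are illustrative, not the repo's actual schema:

```python
import sqlite3

# Illustrative schema: one passage per page title. The real script
# (scripts/prepare_data/load_wikipedia_into_mysql.py) targets MySQL
# and may use a different schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (title TEXT PRIMARY KEY, passage TEXT)")
conn.execute(
    "INSERT INTO wiki VALUES (?, ?)",
    ("Science fiction", "Science fiction is a genre of speculative fiction."),
)

def fill_missing_passage(conn, title):
    """Return the stored passage for a title, or None if it is unknown."""
    row = conn.execute(
        "SELECT passage FROM wiki WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None
```

Incomplete WoW examples can then be completed by looking up their topic title in the database.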
Use the following script to prepare data to pre-tune the wizard model and the apprentice model
```shell script
python scripts/prepare_data/prepare_wow_wiz_app/prepare_wow_1.1.py
```
To build datasets for RL-piloted fine-tuning (Wikipedia, CNN-DailyMail, Paper Abstracts), please refer to the scripts in the following folder:
```shell script
scripts/prepare_data/prepare_finetune_datasets
```
To build coherence evaluation datasets (WoW-coherence), run the following python file:
```shell script
python shell/prepare_data/prepare_coh-1.5.py
```
[comment]: <> ((Difference between 1.1 and 1.6: Entire document as input to wizard - 1.1; Single sentence as input to wizard - 1.6))
### Our pre-trained and fine-tuned model dumps
You may train your own models following the two-phase procedure below:
### Phase 1: Pre-tune
To pre-tune the wizard model, run the following command line:
```shell script
bash shell/train_wiz.sh
```
To pre-tune the apprentice model, run the following command line:
```shell script
bash shell/train_app.sh
```
To train the coherence evaluation model on WoW-coherence dataset, run the following command line:
```shell script
bash shell/train_coh.sh
```
### Phase 2: RL-piloted fine-tuning
To fine-tune the selector model using RL, run the following command line:
```shell script
bash shell/rl_self_play.sh
```
You may revise the ```train_file_rl``` and ```validation_file_rl``` parameters to select the fine-tuning datasets.
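For example, to target one of the prepared datasets, the two parameters can be pointed at its train and dev files. The paths below are illustrative only; they depend on where the preparation scripts wrote their output:

```shell
# Illustrative values for the dataset-selection parameters
train_file_rl=../Talk_/data/cnn_dailymail/train.json
validation_file_rl=../Talk_/data/cnn_dailymail/dev.json
```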
The fine-tuning can take up to two days on a single A100 GPU.
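Conceptually, the RL phase rewards the teacher (wizard) for self-play dialogues after which the student (apprentice) has acquired more information. The toy below is not the repo's implementation; it is a minimal REINFORCE sketch over two stand-in "teaching strategies", with a hand-set reward and baseline, just to show the direction of the policy update:

```python
import math
import random

random.seed(0)

def reward(action):
    # Stand-in reward: strategy 1 "teaches better" than strategy 0.
    return 1.0 if action == 1 else 0.0

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

prefs = [0.0, 0.0]      # unnormalised preferences over the two strategies
baseline, lr = 0.5, 0.5

for _ in range(200):
    probs = softmax(prefs)
    action = 0 if random.random() < probs[0] else 1
    advantage = reward(action) - baseline
    # REINFORCE: grad of log pi(action) w.r.t. preferences is (onehot - probs)
    for i in range(len(prefs)):
        onehot = 1.0 if i == action else 0.0
        prefs[i] += lr * advantage * (onehot - probs[i])
```

After training, the policy concentrates on the higher-reward strategy; in the real system the reward instead comes from evaluating the dialogue (e.g. what the student learned and how coherent the conversation was).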
[comment]: <> (Some self-play demos could be found in the folder *./demos* )