# EasyInstruct
**An Easy-to-use Instruction Processing Framework for Large Language Models.**
---
Project • Paper • Demo • Overview • Installation • Quickstart • How To Use • Docs • Citation • Contributors
[License: MIT](https://opensource.org/licenses/MIT)
- The currently supported instruction generation techniques are as follows:
| **Methods** | **Description** |
| --- | --- |
| [Self-Instruct](https://arxiv.org/abs/2212.10560) | The method that randomly samples a few instructions from a human-annotated seed tasks pool as demonstrations and prompts an LLM to generate more instructions and corresponding input-output pairs. |
| [Evol-Instruct](https://arxiv.org/abs/2304.12244) | The method that incrementally upgrades an initial set of instructions into more complex instructions by prompting an LLM with specific prompts. |
| [Backtranslation](https://arxiv.org/abs/2308.06259) | The method that creates an instruction-following training instance by predicting an instruction that would be correctly answered by a portion of a document from the corpus. |
| [KG2Instruct](https://arxiv.org/abs/2305.11527) | The method that creates instruction-following training instances from existing knowledge graphs, aligning knowledge triples with corresponding text to generate information extraction instructions. |
- The currently supported instruction selection metrics are as follows (a minimal sketch of the perplexity computation follows the provider table below):
| **Metrics** | **Notation** | **Description** |
|----------------------|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Length | $Len$ | The bounded length of every pair of instruction and response. |
| Perplexity | $PPL$ | The exponentiated average negative log-likelihood of the response. |
| MTLD | $MTLD$ | Measure of textual lexical diversity, the mean length of sequential words in a text that maintains a minimum threshold TTR score. |
| ROUGE | $ROUGE$ | Recall-Oriented Understudy for Gisting Evaluation, a set of metrics used for evaluating similarities between sentences. |
| GPT score | $GPT$ | The score of whether the output is a good example of how AI Assistant should respond to the user's instruction, provided by ChatGPT. |
| CIRS | $CIRS$ | The score using the abstract syntax tree to encode structural and logical attributes, to measure the correlation between code and reasoning abilities. |
- The API service providers and corresponding LLM products that are currently available are as follows:
| **Model** | **Description** | **Default Version** |
| --- | --- | --- |
| ***OpenAI*** | | |
| GPT-3.5 | A set of models that improve on GPT-3 and can understand as well as generate natural language or code. | `gpt-3.5-turbo` |
| GPT-4 | A set of models that improve on GPT-3.5 and can understand as well as generate natural language or code. | `gpt-4` |
| ***Anthropic*** | | |
| Claude | A next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. | `claude-2.0` |
| Claude-Instant | A lighter, less expensive, and much faster option than Claude. | `claude-instant-1.2` |
| ***Cohere*** | | |
| Command | A flagship text generation model of Cohere trained to follow user commands and to be instantly useful in practical business applications. | `command` |
| Command-Light | A light version of Command models that are faster but may produce lower-quality generated text. | `command-light` |
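To make the perplexity ($PPL$) selection metric above concrete, here is a minimal, framework-agnostic sketch (not the EasyInstruct selector API): it computes the exponentiated average negative log-likelihood from the per-token log-probabilities a language model assigns to a response.
```python
import math

def response_perplexity(token_logprobs: list[float]) -> float:
    """Perplexity of a response, given per-token log-probabilities
    (natural log) produced by a language model."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)  # average negative log-likelihood
    return math.exp(avg_nll)

# Lower perplexity indicates a more predictable response and is typically
# preferred when filtering instruction-response pairs.
print(response_perplexity([-0.1, -0.3, -0.2]))  # ~1.22
```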
---
## 🔧Installation
**Installation from git repo branch:**
```shell
pip install git+https://github.com/zjunlp/EasyInstruct@main
```
**Installation for local development:**
```shell
git clone https://github.com/zjunlp/EasyInstruct
cd EasyInstruct
pip install -e .
```
**Installation using PyPI (not the latest version):**
```shell
pip install easyinstruct -i https://pypi.org/simple
```
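After installing via any of the methods above, a quick import check confirms the package is available:
```shell
python -c "import easyinstruct"
```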
---
## ⏩Quickstart
We provide two ways for users to quickly get started with EasyInstruct. You can either use the shell script or the Gradio app based on your specific needs.
### Shell Script
#### Step1: Prepare a configuration file
Users can easily configure the parameters of EasyInstruct in a YAML-style file or just quickly use the default parameters in the configuration files we provide. Following is an example of the configuration file for Self-Instruct:
```yaml
generator:
  SelfInstructGenerator:
    target_dir: data/generations/
    data_format: alpaca
    seed_tasks_path: data/seed_tasks.jsonl
    generated_instructions_path: generated_instructions.jsonl
    generated_instances_path: generated_instances.jsonl
    num_instructions_to_generate: 100
    engine: gpt-3.5-turbo
    num_prompt_instructions: 8
```
More example configuration files can be found at [configs](https://github.com/zjunlp/EasyInstruct/tree/main/configs).
#### Step2: Run the shell script
Users should first specify the configuration file and provide their own OpenAI API key. Then, run the following shell script to launch the instruction generation or selection process.
```shell
config_file=""
openai_api_key=""
python demo/run.py \
    --config $config_file \
    --openai_api_key $openai_api_key
```
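For example, a concrete invocation with the Self-Instruct configuration might look as follows (the configuration file name is illustrative; substitute the path to your own file and your own API key, here read from an `OPENAI_API_KEY` environment variable):
```shell
python demo/run.py \
    --config configs/self_instruct.yaml \
    --openai_api_key $OPENAI_API_KEY
```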
### Gradio App
We provide a Gradio app for users to quickly get started with EasyInstruct. You can run the following command to launch the Gradio app locally on port `7860` (if available).
```shell
python demo/app.py
```
We also host a running Gradio app on Hugging Face Spaces. You can try it out [here](https://huggingface.co/spaces/zjunlp/EasyInstruct).
---
## 📌Use EasyInstruct
Please refer to our [documentation](https://zjunlp.gitbook.io/easyinstruct/documentations) for more details.
### Generators
The `Generators` module streamlines instruction data generation, allowing instruction data to be generated from seed data. You can choose the appropriate generator based on your specific needs.
#### BaseGenerator
> `BaseGenerator` is the base class for all generators.
> You can also easily inherit this base class to customize your own generator class. Just override the `__init__` and `generate` methods, as sketched below.
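A minimal sketch of a custom generator (assuming `BaseGenerator` is importable from the top-level package like the other generators, and that its constructor accepts a `target_dir` argument as in the configuration example above; adjust the details to the signature of your installed version):
```python
from easyinstruct import BaseGenerator


class StaticPromptGenerator(BaseGenerator):
    """Hypothetical generator that turns a fixed list of prompts into
    Alpaca-style instruction records."""

    def __init__(self, prompts, target_dir="data/generations/"):
        # Constructor arguments are illustrative; match them to the
        # BaseGenerator signature in your EasyInstruct version.
        super().__init__(target_dir=target_dir)
        self.prompts = prompts

    def generate(self):
        # Return instruction records in the format expected downstream
        # (here: Alpaca-style dictionaries).
        return [{"instruction": p, "input": "", "output": ""} for p in self.prompts]
```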
#### SelfInstructGenerator
> `SelfInstructGenerator` is the class for the instruction generation method of Self-Instruct. See [Self-Instruct: Aligning Language Models with Self-Generated Instructions](http://arxiv.org/abs/2212.10560) for more details.
Example
```python
from easyinstruct import SelfInstructGenerator
from easyinstruct.utils.api import set_openai_key
# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")
# Step2: Declare a generator class
generator = SelfInstructGenerator(num_instructions_to_generate=10)
# Step3: Generate self-instruct data
generator.generate()
```
#### BacktranslationGenerator
> `BacktranslationGenerator` is the class for the instruction generation method of Instruction Backtranslation. See [Self-Alignment with Instruction Backtranslation](http://arxiv.org/abs/2308.06259) for more details.
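Example (a minimal sketch analogous to the Self-Instruct example above; the keyword argument mirrors that example and is illustrative, so check the documentation for the full parameter list of this generator):
```python
from easyinstruct import BacktranslationGenerator
from easyinstruct.utils.api import set_openai_key

# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")

# Step2: Declare a generator class
# (the keyword argument mirrors the Self-Instruct example and is illustrative)
generator = BacktranslationGenerator(num_instructions_to_generate=10)

# Step3: Generate instruction backtranslation data
generator.generate()
```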
Please check out the [documentation](https://zjunlp.gitbook.io/easyinstruct/documentations) for more details.
### Engines
The `Engines` module standardizes the instruction execution process, enabling the execution of instruction prompts on specific locally deployed LLMs. You can choose the appropriate engine based on your specific needs.
Please check out the [documentation](https://zjunlp.gitbook.io/easyinstruct/documentations) for more details.
---
## 🚩Citation
Please cite our repository if you use EasyInstruct in your work.
```bibtex
@misc{easyinstruct,
  author = {Yixin Ou and Ningyu Zhang and Honghao Gui and Ziwen Xu and Shuofei Qiao and Zhen Bi and Huajun Chen},
  title = {EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models},
  year = {2023},
  url = {https://github.com/zjunlp/EasyInstruct},
}

@misc{knowlm,
  author = {Ningyu Zhang and Jintian Zhang and Xiaohan Wang and Honghao Gui and Kangwei Liu and Yinuo Jiang and Xiang Chen and Shengyu Mao and Shuofei Qiao and Yuqi Zhu and Zhen Bi and Jing Chen and Xiaozhuan Liang and Yixin Ou and Runnan Fang and Zekun Xi and Xin Xu and Lei Li and Peng Wang and Mengru Wang and Yunzhi Yao and Bozhong Tian and Yin Fang and Guozhou Zheng and Huajun Chen},
  title = {KnowLM: An Open-sourced Knowledgeable Large Language Model Framework},
  year = {2023},
  url = {http://knowlm.zjukg.cn/},
}

@misc{bi2023programofthoughts,
  author = {Zhen Bi and Ningyu Zhang and Yinuo Jiang and Shumin Deng and Guozhou Zheng and Huajun Chen},
  title = {When Do Program-of-Thoughts Work for Reasoning?},
  year = {2023},
  eprint = {2308.15452},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```
---
## 🎉Contributors