# prompt-tune

**Repository Path**: taliux/prompt-tune

## Basic Information

- **Project Name**: prompt-tune
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 69
- **Forks**: 25
- **Created**: 2024-01-08
- **Last Updated**: 2025-08-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Automatic Prompt Optimisation Based on a Genetic Algorithm (GA)

### 1. Preface

Put simply, prompt tuning involves the following two situations:

(1) The prompt is unclear or incomplete and does not fully describe the details of the task, so the model performs poorly in some cases. In this situation, manual tuning should come first: fill in the missing details so that the model behaves more robustly across cases. In other words, the model cannot guess what you really need unless you tell it completely and accurately.

(2) Even when the prompt is clear and complete, batch testing on a dataset readily shows that the order of the sentences in the prompt, the choice of words, whether Chinese or English is used, and even the punctuation all affect the model's performance. In this situation manual tuning is difficult, because there is no clear direction to optimise in.

**The GA proposed here is designed mainly for situation (2).** It uses a genetic algorithm as a heuristic search for the best prompt variant (a local optimum) in order to improve the model's performance.

The current version only supports OpenAI language models.

### 2. Install dependencies:

```
pip install -r requirements.txt
```

### 3. Configure OpenAI access

Create a `.env` file and write your OPENAI_API_KEY into it.

```
OPENAI_API_KEY="sk-xxxx"
```

### 4. Run:

```
python main.py --data_path annotations.jsonl \
               --prompt_template prompt.txt \
               --input_keys title outline question \
               --label_key label \
               --label_type str
```

Output

```
INFO:-:Label distribution:
{
    "Y": 77,
    "N": 201
}
INFO:-:Before GA
INFO:-:Evaluate on TRAIN:
INFO:-:{
    "acc": 0.85,
    "precision": 0.8888888888888888,
    "recall": 0.8,
    "f1": 0.8421052631578948
}
INFO:-:Evaluate on TEST:
INFO:-:{
    "acc": 0.8151260504201681,
    "precision": 0.5783132530120482,
    "recall": 0.8421052631578947,
    "f1": 0.6857142857142856
}
INFO:-:Start GA
Yielding the first generation: 100%|██████████| 63/63 [00:22<00:00, 2.79it/s]
INFO:-:Generation 0
Evaluating individuals: 64it [03:14, 3.04s/it]
INFO:-:Best individual: [0.925, 0.85108540370891411]
INFO:-:Generation 1
Evaluating individuals: 65it [01:45, 1.62s/it]
INFO:-:Best individual: [0.95, 0.8384191768727716]
INFO:-:Generation 2
Evaluating individuals: 65it [02:38, 2.44s/it]
INFO:-:Best individual: [0.95, 0.86891512032942091]
INFO:-:Generation 3
Evaluating individuals: 65it [01:51, 1.71s/it]
INFO:-:Best individual: [0.975, 0.9098148429792807]
INFO:-:Generation 4
Evaluating individuals: 65it [01:50, 1.70s/it]
INFO:-:Best individual: [0.975, 0.9098148429792807]
INFO:-:Final scores on TRAIN: [0.975, 0.9098148429792807]
INFO:-:Evaluate on TEST:
INFO:-:{
    "acc": 0.8403361344537815,
    "precision": 0.6233766233766234,
    "recall": 0.8421052631578947,
    "f1": 0.7164179104477613
}
```

Note: because OpenAI's output is not 100% deterministic, the results may differ from run to run.

### 5. More parameters:

```
python main.py --help
```

```
usage: main.py [-h] [--population_size POPULATION_SIZE]
               [--max_generations MAX_GENERATIONS]
               [--mutation_rate MUTATION_RATE]
               [--crossover_rate CROSSOVER_RATE]
               [--deletion_prob DELETION_PROB]
               [--paraphrasing_prob PARAPHRASING_PROB]
               [--translation_prob TRANSLATION_PROB]
               [--expansion_prob EXPANSION_PROB]
               [--exchanging_prob EXCHANGING_PROB] [--threads THREADS]
               [--seed SEED] [--model MODEL] [--save SAVE] [--k_shot K_SHOT]
               [--data_path DATA_PATH] [--prompt_template PROMPT_TEMPLATE]
               [--input_keys INPUT_KEYS [INPUT_KEYS ...]]
               [--label_key LABEL_KEY] [--label_type LABEL_TYPE]
               [--paraphraser_prompt PARAPHRASER_PROMPT]
               [--paraphraser_model PARAPHRASER_MODEL]
               [--translator_prompt TRANSLATOR_PROMPT]
               [--translator_model TRANSLATOR_MODEL]

GA Prompt Optimisation

options:
  -h, --help            show this help message and exit

Genetic Algorithm:
  GA configurations

  --population_size POPULATION_SIZE
                        number of individuals in each population
  --max_generations MAX_GENERATIONS
                        number of iterations
  --mutation_rate MUTATION_RATE
                        gene mutation probability
  --crossover_rate CROSSOVER_RATE
                        gene crossover probability
  --deletion_prob DELETION_PROB
                        delete probability in mutation
  --paraphrasing_prob PARAPHRASING_PROB
                        paraphrasing probability in mutation
  --translation_prob TRANSLATION_PROB
                        translation probability in mutation
  --expansion_prob EXPANSION_PROB
                        expansion probability in mutation
  --exchanging_prob EXCHANGING_PROB
                        exchanging probability in mutation

Settings:
  General settings

  --threads THREADS     thread number
  --seed SEED           random seed
  --model MODEL         LLM model for evaluation
  --save SAVE           file to save the optimised prompt template

Data:
  Data configurations

  --k_shot K_SHOT       number of training examples for each label class
  --data_path DATA_PATH
                        path to data

Prompt:
  Prompt configurations

  --prompt_template PROMPT_TEMPLATE
                        path to the original prompt template file
  --input_keys INPUT_KEYS [INPUT_KEYS ...]
                        input keys
  --label_key LABEL_KEY
                        label key
  --label_type LABEL_TYPE
                        label type

Tool:
  Tool configurations

  --paraphraser_prompt PARAPHRASER_PROMPT
                        prompt for paraphraser
  --paraphraser_model PARAPHRASER_MODEL
                        paraphraser model
  --translator_prompt TRANSLATOR_PROMPT
                        prompt for translator
  --translator_model TRANSLATOR_MODEL
                        translator model
```

### Citation

The predecessor of this work was published at IJCAI 2023.

```
@inproceedings{10.24963/ijcai.2023/588,
  author    = {Zhao, Jiangjiang and Wang, Zhuoran and Yang, Fangchun},
  title     = {Genetic Prompt Search via Exploiting Language Model Probabilities},
  year      = {2023},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence},
  pages     = {5296--5305},
  location  = {Macao, P.R.China}
}
```
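
As a rough illustration of the heuristic search described in the preface, the sketch below runs a tiny genetic loop over a prompt stored as a list of sentences. It is not the code in `main.py`. The functions `mutate`, `crossover`, `evolve` and `toy_fitness` are hypothetical, only the deletion and exchanging mutations are shown, and the fitness function is a stand-in for scoring a prompt with an LLM on labelled data. The parameter names mirror the CLI flags only for readability.

```python
import random


def mutate(sentences, rng, deletion_prob=0.5):
    """Return a mutated copy: delete one sentence, or exchange two of them."""
    child = list(sentences)
    if len(child) < 2:
        return child  # nothing sensible to delete or swap
    if rng.random() < deletion_prob:
        del child[rng.randrange(len(child))]
    else:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def crossover(a, b, rng):
    """Single-point crossover: a prefix of `a` followed by a suffix of `b`."""
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child or list(a)  # never return an empty prompt


def evolve(seed_prompt, fitness, population_size=8, max_generations=5,
           mutation_rate=0.5, crossover_rate=0.5, seed=0):
    """Toy GA loop with elitism; returns the fittest sentence list found."""
    rng = random.Random(seed)
    population = [list(seed_prompt)]
    population += [mutate(seed_prompt, rng) for _ in range(population_size - 1)]
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, population_size // 2)]
        children = [ranked[0]]  # elitism: the best individual always survives
        while len(children) < population_size:
            child = list(rng.choice(parents))
            if rng.random() < crossover_rate:
                child = crossover(child, rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = children
    return max(population, key=fitness)


def toy_fitness(sentences):
    """Hypothetical score: reward ending on the answer instruction, penalise length."""
    bonus = 1.0 if sentences and sentences[-1].startswith("Answer") else 0.0
    return bonus - 0.1 * len(sentences)


seed_prompt = [
    "Answer Y or N.",
    "Read the title and the outline.",
    "Decide whether the question is relevant.",
]
best = evolve(seed_prompt, toy_fitness)
print(" ".join(best), toy_fitness(best))
```

Because the best individual of each generation is carried over unchanged, the returned prompt never scores below the seed prompt under the same fitness function. With a real, noisy LLM-based score that guarantee would only hold approximately.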