From e58f5863850148a8eded70463a81550a8b792752 Mon Sep 17 00:00:00 2001
From: zxy <2207084090@qq.com>
Date: Fri, 20 Oct 2023 12:50:33 +0800
Subject: [PATCH] openEulerCorpus
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

openEulerCorpusSolution

Delete file corpus/openEulerCorpus
---
 corpus/corpus.text | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 corpus/corpus.text
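The workflow that corpus/corpus.text describes (regex cleanup, segmentation, filling the topic into the generation prompt, collecting the returned pairs) could be sketched roughly as below. This is a minimal illustration under assumptions, not the author's script: the function names, the `topic` placeholder, the English prompt wording, and the choice of HTML-like tags as the "invalid information" are all hypothetical, and the ChatGPT call itself is left out.

```python
import json
import re
from string import Template
from typing import Dict, List

# Hypothetical English rendering of the generation prompt from corpus.text;
# the placeholder name `topic` is an assumption made for this sketch.
GENERATION_PROMPT = Template(
    "Based on the material, generate 10 question-answer pairs related to ${topic}. "
    "Output them in the following format, with answers as detailed as possible:\n"
    '{\n"prompt": "question",\n"input": "",\n"history": "",\n"answer": "answer"\n},'
)


def clean_text(raw: str) -> str:
    """Strip HTML-like tags and collapse runs of spaces and tabs.

    What counts as "invalid information" is an assumption here.
    """
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"[ \t]+", " ", no_tags).strip()


def split_segments(text: str) -> List[str]:
    """Split cleaned text into segments on blank lines."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def build_generation_prompt(topic: str) -> str:
    """Fill the segment topic into the generation prompt."""
    return GENERATION_PROMPT.substitute(topic=topic)


def parse_pairs(reply: str) -> List[Dict[str, str]]:
    """Parse a reply made of `{...},` objects into a list of dicts."""
    body = reply.strip().rstrip(",")
    return json.loads("[" + body + "]")
```

Wrapping the reply in square brackets after dropping the trailing comma turns the comma-separated objects requested by the prompt into one valid JSON array, so a reply that strays from the format fails loudly in `json.loads` instead of being silently accepted.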

diff --git a/corpus/corpus.text b/corpus/corpus.text
new file mode 100644
index 00000000..226a2843
--- /dev/null
+++ b/corpus/corpus.text
@@ -0,0 +1,26 @@
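+Obtain the document content and use regular expressions to strip invalid information. Split the document into segments, identify the topic of each segment, and then generate and expand the corpus as described below.
+1. Corpus generation approach:
+Generation relies mainly on ChatGPT: supply one prepared corpus segment and use the following prompt to have ChatGPT generate question-answer pairs automatically.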
+'''
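+Next, based on the material, please generate 10 question-answer pairs related to ${material topic}. Output them in the following format, with answers as detailed as possible:
+{
+"prompt": "Put the question here",
+"input": "",
+"history": "",
+"answer": "Put the answer to the question here"
+},
+If you understand my request, reply only with: Understood.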
+'''
+
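+2. Corpus expansion approach:
+Feed the generated questions back into ChatGPT for expansion, and revise each answer at the same time so that it fits the rephrased question better. The input prompt is as follows: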
+'''
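+Next I will give you data in JSON format. Please rephrase the value of the prompt attribute; you may also rephrase the value of the answer attribute so that it fits the question, keeping the meaning the same. Generate 10 variants. Produce the question-answer pairs in the following format:
+{
+"prompt": "Put the question here",
+"input": "",
+"history": "",
+"answer": "Put the answer to the question here, concise and clear"
+},
+If you understand the task, reply only with: Understood.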
+'''
\ No newline at end of file
-- 
Gitee