diff --git a/README.en.md b/README.en.md
new file mode 100644
index 0000000000000000000000000000000000000000..f30cfc5d10adec01e8b5b479cd86c993bc2d4b53
--- /dev/null
+++ b/README.en.md
@@ -0,0 +1,52 @@
+
+
+# Detect-AI
+
+Detect-AI is a project for detecting and classifying AI-generated text. It combines deep learning with traditional statistical methods, using a BERT model and statistical features for text analysis.
+
+## Features
+
+- Uses BERT to extract semantic features from text.
+- Extracts statistical features and fuses them with the deep features through an LSTM.
+- Supports comparative analysis between traditional machine learning models and deep learning models.
+- Provides model saving and loading for convenient training and deployment.
+
+## Project Structure
+
+- `src/dev/ClassifyModel.py`: Model definition and feature fusion logic.
+- `src/dev/FeatExtractor.py`: Statistical feature extraction.
+- `src/dev/comparison.py`: Comparison between traditional machine learning models and deep learning models.
+- `src/dev/data.py`: Dataset definitions and data loading logic.
+- `src/dev/load_save.py`: Utility functions for saving and loading models.
+- `src/dev/kfcValidate.py`: K-fold cross-validation.
+- `src/dev/train.py`: Core code for the training process.
+- `src/dev/evaluation.py`: Model evaluation logic.
+- `src/dev/configs.py`: Global configuration.
+- `training/data/train/alldata.jsonl`: Dataset file.
+
+## Usage
+
+1. Install dependencies:
+   ```bash
+   pip install torch transformers scikit-learn
+   ```
+
+2. Prepare data:
+   - Data must be in JSONL format and stored in the `training/data/train/` directory.
+
+3. Train the model:
+   - Start training with `src/dev/train.py`. Parameters can be adjusted via the configuration files.
+
+4. Evaluate the model:
+   - Use `src/dev/evaluation.py` to evaluate the model.
+
+5. Use K-fold cross-validation:
+   - `src/dev/kfcValidate.py` provides the K-fold cross-validation implementation.
+
+## Contributions
+
+Pull requests and issues are welcome to help improve the project's features and documentation.
+
+## License
+
+This project is licensed under the MIT License. For details, see the LICENSE file.
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..22e1df79bd8c183836d5c726a79905bbd703ed5b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,52 @@
+
+
+# Detect-AI
+
+Detect-AI is a project for detecting and classifying AI-generated text. It combines deep learning with traditional statistical methods, using a BERT model and statistical features for text analysis.
+
+## Features
+
+- Uses BERT to extract semantic features from text.
+- Extracts statistical features and combines them with an LSTM for deep feature fusion.
+- Supports comparative analysis against traditional machine learning models.
+- Provides model saving and loading to simplify training and deployment.
+
+## Project Structure
+
+- `src/dev/ClassifyModel.py`: Model definition and feature fusion logic.
+- `src/dev/FeatExtractor.py`: Statistical feature extraction.
+- `src/dev/comparison.py`: Comparison between traditional machine learning models and deep learning models.
+- `src/dev/data.py`: Dataset definitions and data loading logic.
+- `src/dev/load_save.py`: Utility functions for saving and loading models.
+- `src/dev/kfcValidate.py`: K-fold cross-validation.
+- `src/dev/train.py`: Core code for the training process.
+- `src/dev/evaluation.py`: Model evaluation logic.
+- `src/dev/configs.py`: Global configuration.
+- `training/data/train/alldata.jsonl`: Dataset file.
+
+## Usage
+
+1. Install dependencies:
+   ```bash
+   pip install torch transformers scikit-learn
+   ```
+
+2. Prepare data:
+   - Data is in JSONL format and stored in the `training/data/train/` directory.
+
+3. Train the model:
+   - Start the training process with `train.py`. Parameters can be adjusted via the configuration files.
+
+4. Evaluate the model:
+   - Use `evaluation.py` to evaluate the model.
+
+5. Use K-fold cross-validation:
+   - `kfcValidate.py` provides the K-fold cross-validation implementation.
+
+## Contributing
+
+PRs and issues are welcome to help improve the project's features and documentation.
+
+## License
+
+This project is licensed under the MIT License. For details, see the LICENSE file.
\ No newline at end of file
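The data-preparation and K-fold steps in the Usage section can be illustrated with a minimal, standard-library sketch. It is not the code in `data.py` or `kfcValidate.py`, which this diff does not show. The JSONL field names `text` and `label`, the sample records, and the helpers `parse_jsonl` and `kfold_indices` are all assumptions made for illustration.

```python
import json
import random


def parse_jsonl(lines):
    """Parse JSONL lines into (texts, labels).

    The "text" and "label" field names are hypothetical; the real schema
    of alldata.jsonl is not shown in the README.
    """
    texts, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        texts.append(record["text"])
        labels.append(int(record["label"]))
    return texts, labels


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # every k-th index per fold
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Made-up records standing in for lines read from a JSONL file.
SAMPLE_LINES = [
    '{"text": "A note typed by a person.", "label": 0}',
    '{"text": "A paragraph produced by a model.", "label": 1}',
    '{"text": "Another human-written sentence.", "label": 0}',
    '{"text": "Another machine-generated sentence.", "label": 1}',
    '{"text": "A third human example.", "label": 0}',
    '{"text": "A third generated example.", "label": 1}',
]

texts, labels = parse_jsonl(SAMPLE_LINES)
splits = list(kfold_indices(len(texts), k=3))
```

Each sample lands in exactly one validation fold, so averaging a metric over the `k` splits uses every record for validation once. This plain split ignores class balance; a stratified variant (for example scikit-learn's `StratifiedKFold`) would keep the label ratio similar in every fold.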