# textmatch
**Repository Path**: redauzhang/textmatch
## Basic Information
- **Project Name**: textmatch
- **Description**: textmatch
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-03-31
- **Last Updated**: 2023-04-02
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# sentence-similarity
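Experiments with, and a comparison of, four methods for computing sentence/text similarity.
The four methods are cosine, cosine + IDF, BM25, and Jaccard.
The experiments again use the previously crawled medical corpus.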
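
As a rough illustration of how the four measures differ, the sketch below implements each one with the standard library only. It is a minimal sketch under stated assumptions, not this repository's code: the function names (`jaccard`, `cosine`, `build_idf`, `bm25`), the smoothed IDF variant, the BM25 defaults `k1=1.5` and `b=0.75`, and the toy English corpus are all hypothetical. Inputs are assumed to be already segmented into token lists (for example by jieba).

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two token lists: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_idf(corpus):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1 (an assumed variant)."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def cosine(a, b, idf=None):
    """Cosine similarity of term-frequency vectors.

    If `idf` is given, each term frequency is multiplied by its IDF weight;
    terms missing from `idf` get weight 0 and are ignored.
    """
    ca, cb = Counter(a), Counter(b)

    def weight(t):
        return 1.0 if idf is None else idf.get(t, 0.0)

    dot = sum(ca[t] * cb[t] * weight(t) ** 2 for t in ca.keys() & cb.keys())
    na = math.sqrt(sum((c * weight(t)) ** 2 for t, c in ca.items()))
    nb = math.sqrt(sum((c * weight(t)) ** 2 for t, c in cb.items()))
    return dot / (na * nb) if na and nb else 0.0


def bm25(query, doc, corpus, k1=1.5, b=0.75):
    """BM25 score of `doc` for `query`, with statistics taken from `corpus`."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    df = Counter(t for d in corpus for t in set(d))
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        if tf[t] == 0:
            continue  # query term absent from the document contributes nothing
        idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[t] * (k1 + 1) / norm
    return score


# Toy, pre-tokenized corpus (hypothetical data, not the medical corpus).
corpus = [
    ["headache", "fever", "cough"],
    ["fever", "cough", "sore", "throat"],
    ["stomach", "pain", "nausea"],
]
query = ["fever", "cough"]
idf = build_idf(corpus)

for doc in corpus:
    print(
        doc,
        round(jaccard(query, doc), 3),
        round(cosine(query, doc), 3),
        round(cosine(query, doc, idf), 3),
        round(bm25(query, doc, corpus), 3),
    )
```

Jaccard and plain cosine look only at the two texts being compared, while cosine + IDF and BM25 also need corpus-level statistics, so they have to be refitted when the corpus changes.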
## 1 Environment
- python3
- gensim
- jieba
- scipy
- numpy

**2023/3/26 Update: planning to replace jieba with thulac**
- https://github.com/thunlp/THULAC-Python
## 2 Algorithm Principles
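For reference, common textbook forms of the four methods are given below. These are assumptions about the usual definitions, not taken from this repository's code, whose smoothing and parameter choices may differ. Here $N$ is the number of documents in the corpus, $\mathrm{tf}(t, d)$ is the count of term $t$ in text $d$, $\mathrm{df}(t)$ is the number of documents containing $t$, $|d|$ is the length of $d$ in tokens, and $\mathrm{avgdl}$ is the average document length.

Jaccard, on the token sets $A$ and $B$ of the two texts:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

Cosine, on term vectors $\mathbf{a}$ and $\mathbf{b}$ with components $a_t = \mathrm{tf}(t, a)$:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\sum_t a_t b_t}{\sqrt{\sum_t a_t^2}\,\sqrt{\sum_t b_t^2}}
$$

Cosine + IDF uses the same formula with reweighted components $a_t = \mathrm{tf}(t, a)\,\mathrm{idf}(t)$, where one smoothed choice is:

$$
\mathrm{idf}(t) = \log\frac{N + 1}{\mathrm{df}(t) + 1} + 1
$$

BM25, scoring a document $d$ against a query $q$, with free parameters $k_1$ (often 1.2 to 2.0) and $b$ (often 0.75):

$$
\mathrm{BM25}(q, d) = \sum_{t \in q} \log\!\left(\frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5} + 1\right) \cdot \frac{\mathrm{tf}(t, d)\,(k_1 + 1)}{\mathrm{tf}(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
$$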




## 3 How to Run
- For the document retrieval process, see [./textmatch/train_model/train_model.py](./textmatch/train_model/train_model.py)
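
The general shape of such a retrieval step can be sketched as follows: tokenize the query, score it against every document, and return the best matches. This is only an illustrative sketch and does not reproduce the contents of `train_model.py`. The names `char_bigrams`, `jaccard`, and `retrieve`, the character-bigram tokenizer (a dependency-free stand-in for a real segmenter such as jieba), and the sample documents are all hypothetical.

```python
import heapq


def char_bigrams(text):
    """Stand-in tokenizer: overlapping character bigrams, whitespace removed."""
    text = "".join(text.split())
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def jaccard(a, b):
    """Jaccard similarity of two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(query, documents, top_k=3, tokenize=char_bigrams, score=jaccard):
    """Return up to `top_k` (document, score) pairs, best match first."""
    q = tokenize(query)
    scored = ((score(q, tokenize(doc)), i) for i, doc in enumerate(documents))
    best = heapq.nlargest(top_k, scored)
    return [(documents[i], s) for s, i in best]


# Hypothetical sample documents, not taken from the medical corpus.
documents = [
    "persistent cough and fever",
    "stomach pain after meals",
    "fever with sore throat",
]

for doc, s in retrieve("cough and fever", documents, top_k=2):
    print(f"{s:.3f}  {doc}")
```

Because `tokenize` and `score` are plain parameters, a different segmenter or similarity function can be passed in without changing the ranking loop.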
## Postscript
- Reference: [textmatch](https://github.com/MachineLP/TextMatch)
- Word segmentation reference: [THULAC-Python](https://github.com/thunlp/THULAC-Python)
- [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4/tree/main)
### Error Log
- [protobuf error: duplicate package name](https://stackoverflow.com/questions/50839667/protofile-proto-a-file-with-this-name-is-already-in-the-pool)