Pasca/text_matching_tf

word2vec_static.py 989 Bytes
joe committed on 2019-06-17 10:29 +08:00: rename file
```python
import os

import jieba
import pandas as pd
from gensim.models import Word2Vec

from bimpm import args  # project-local module; provides args.word_embedding_len


def segment_split(csv_path):
    """Read one split and return the jieba-segmented sentence1 list followed by sentence2."""
    df = pd.read_csv(csv_path)
    p_seg = [list(jieba.cut(x)) for x in df['sentence1'].values]
    h_seg = [list(jieba.cut(x)) for x in df['sentence2'].values]
    return p_seg + h_seg


# Build one tokenized corpus from all three splits (train, dev and test).
common_texts = []
for split in ('train', 'dev', 'test'):
    common_texts.extend(segment_split(f'input/{split}.csv'))

# gensim >= 4.0 names the embedding dimension `vector_size`;
# gensim 3.x (current when this script was written) called it `size`.
model = Word2Vec(common_texts, vector_size=args.word_embedding_len,
                 window=5, min_count=0, workers=12)

# Make sure the output directory exists before saving.
os.makedirs('output/word2vec', exist_ok=True)
model.save('output/word2vec/word2vec.model')
```
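The two corpus-related settings in the call above, `min_count` and `window`, can be illustrated without gensim. The block below is a minimal pure-Python sketch and not gensim's implementation. `build_vocab` and `skipgram_pairs` are hypothetical helpers, and the sample sentences are hand-segmented rather than real jieba output.

- With `min_count=0`, every token is kept. gensim's default is 5.
- The script uses gensim's default CBOW mode (`sg=0`), which predicts each center word from its surrounding window. The pairs below simply list that window explicitly.
- The sketch uses a fixed window. It omits the random per-word window shrinking and the frequent-word subsampling that gensim applies during training.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocab(sentences: List[List[str]], min_count: int = 0) -> Dict[str, int]:
    """Hypothetical helper: keep tokens seen at least `min_count` times.

    Indices follow descending frequency; ties keep first-seen order
    (Counter.most_common orders equal counts by first appearance).
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = [tok for tok, c in counts.most_common() if c >= min_count]
    return {tok: i for i, tok in enumerate(kept)}


def skipgram_pairs(sentence: List[str], window: int = 5) -> List[Tuple[str, str]]:
    """Hypothetical helper: (center, context) pairs within a fixed symmetric window."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs


# Hand-segmented example sentences (illustrative only).
sentences = [["花呗", "怎么", "还款"], ["借呗", "怎么", "还款"]]
print(build_vocab(sentences))               # min_count=0 keeps all four tokens
print(build_vocab(sentences, min_count=2))  # only tokens seen twice survive
print(skipgram_pairs(["a", "b", "c"], window=1))
```

Raising `min_count` shrinks the vocabulary, so rare tokens would fall out of the embedding table. Keeping `min_count=0`, as the script does, gives every segmented token a vector, at the cost of noisier vectors for tokens seen only once.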
Clone (HTTPS): https://gitee.com/Samuelcoding/text_matching_tf.git
Clone (SSH): git@gitee.com:Samuelcoding/text_matching_tf.git