# bert_encoder

**Repository Path**: Samuelcoding/bert_encoder

## Basic Information

- **Project Name**: bert_encoder
- **Description**: Obtain word vectors and sentence vectors with BERT
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 1
- **Created**: 2020-08-06
- **Last Updated**: 2023-09-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# bert_encoder

Use the Google BERT model to encode a sentence into a vector.

**Usage**

**[`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)**: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

Download the model above, unzip it, and place it in the current directory.

**How to encode a sentence?**

```
from bert_encoder import BertEncoder

be = BertEncoder()
# Encode one sentence into a fixed-size vector.
embedding = be.encode("新年快乐,恭喜发财,万事如意!")
print(embedding)
print(embedding.shape)
```

**Update**: Taking a sentence vector directly from BERT's [CLS] position and then computing similarity has been shown not to work well. A lot of later work has studied this point, and there are several ways to obtain usable BERT sentence embeddings; see, for example: *Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks*.
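For reference, here is a minimal sketch of what "using the [CLS] position" means in practice. It is written against the Hugging Face `transformers` library and the `bert-base-chinese` checkpoint, which are assumptions for illustration only; this repository uses its own `BertEncoder` on top of the downloaded Google BERT checkpoint instead.

```
import torch
from transformers import BertTokenizer, BertModel

# Assumption: using the Hugging Face checkpoint rather than this repo's local model.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

inputs = tokenizer("新年快乐,恭喜发财,万事如意!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dim row per token: these are the "word vectors".
token_vectors = outputs.last_hidden_state[0]
# The [CLS] position, often (mis)used as a sentence vector, as the update above notes.
cls_vector = outputs.last_hidden_state[0, 0]
print(token_vectors.shape, cls_vector.shape)
```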
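And if the goal is sentence embeddings that are actually usable for similarity, a sketch with the `sentence-transformers` package released alongside the Sentence-BERT paper might look like the following. The package and the specific model name are assumptions, not part of this repository.

```
from sentence_transformers import SentenceTransformer, util

# Assumption: any multilingual SBERT checkpoint works here; this is one example choice.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = ["新年快乐,恭喜发财,万事如意!", "祝你新年快乐!"]
# Returns one fixed-size embedding per sentence.
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings.
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```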