# COS960

**Repository Path**: thunlp/COS960

## Basic Information

- **Project Name**: COS960
- **Description**: COS960: A Chinese Word Similarity Dataset of 960 Word Pairs
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-05-29
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# COS960
COS960 is a Chinese word similarity dataset of 960 word pairs. Each pair is annotated by 15 native speakers with a similarity score that reflects **true similarity**. The 960 pairs are divided into 3 groups by part-of-speech tag: 480 noun pairs, 240 verb pairs, and 240 adjective pairs.

### Usage

To evaluate your word embeddings on COS960, run:

```
python correlation_calcu.py {VECTOR_FILE}
```
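
The README does not say which metric `correlation_calcu.py` reports. As a rough illustration only, the sketch below assumes the usual convention for word similarity benchmarks: score each pair by the cosine similarity of its two vectors, then report the Spearman rank correlation against the human averages. The function names, the in-memory `vectors` dictionary, and the choice to skip out-of-vocabulary pairs are all assumptions, not the script's actual behavior.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return float("nan")
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(
    pairs: Sequence[Tuple[str, str, float]],
    vectors: Dict[str, Sequence[float]],
) -> Tuple[float, int]:
    """Return (Spearman rho, number of pairs covered by the vocabulary).

    Assumption: pairs with a missing word are skipped, not scored as 0.
    """
    gold, predicted = [], []
    for word1, word2, human_score in pairs:
        if word1 in vectors and word2 in vectors:
            gold.append(human_score)
            predicted.append(cosine(vectors[word1], vectors[word2]))
    return spearman(gold, predicted), len(gold)
```

Skipping out-of-vocabulary pairs makes the score depend on coverage, so it is worth reporting the number of covered pairs alongside the correlation.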

### Dataset

Each line in the data files has the following format:

```
[Word1] [Word2] [Average] [Annotator1] ... [Annotator15]

小心谨慎  谨慎小心     4.0         4      ...       4 
```
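
A record in this layout could be read with something like the following sketch. It assumes the fields are separated by whitespace (spaces or tabs) and that the files are UTF-8 encoded; neither is stated above. `PairRecord`, `parse_line`, and `load_pairs` are hypothetical names, not part of the repository.

```python
from typing import List, NamedTuple


class PairRecord(NamedTuple):
    word1: str
    word2: str
    average: float       # mean similarity score over all annotators
    scores: List[float]  # individual annotator scores, in file order


def parse_line(line: str) -> PairRecord:
    """Parse one record, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    word1, word2 = fields[0], fields[1]
    average = float(fields[2])
    scores = [float(value) for value in fields[3:]]
    return PairRecord(word1, word2, average, scores)


def load_pairs(path: str, encoding: str = "utf-8") -> List[PairRecord]:
    """Read every non-blank line of a data file into PairRecord objects."""
    records = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                records.append(parse_line(line))
    return records
```

Comparing `average` against the mean of `scores` is a cheap sanity check that a file was parsed with the right delimiter.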

### Cite

If you use the dataset, please cite:

```
@article{huang2019COS960,
  author  = {Junjie Huang and Fanchao Qi and Chenghao Yang and Zhiyuan Liu and Maosong Sun},
  title   = {{COS960: A Chinese Word Similarity Dataset of 960 Word Pairs}},
  journal = {arXiv preprint arXiv:1906.00247},
  year    = {2019},
}
```