# TLNN **Repository Path**: thunlp/TLNN ## Basic Information - **Project Name**: TLNN - **Description**: Source code for EMNLP-IJCNLP 2019 paper "Event Detection with Trigger-Aware Lattice Neural Network". - **Primary Language**: Python - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-05-29 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # README This is the source code of the EMNLP 2019 paper [**Event Detection with Trigger-Aware Lattice Neural Network**](https://www.aclweb.org/anthology/D19-1033.pdf) . TLNN model aims to address the issues of trigger-word mismatch and trigger polysemy. In this project, the event detection is a sequence labeling task. For more information, please read the paper. ![D962E773-6A2F-492C-A538-3D6EC3F279EB.png](https://i.loli.net/2019/11/13/Wkw3nhGoFrabDL9.jpg) ## Requirements - Python 3.6 - Pytorch 0.3.0 - CUDA 0.9 - Numpy ## Datasets Datasets in our paper is **ACE2005** and **KBP Eval 2017**. According to terms of LDC, we can not share the data to the third party. But if you have LDC license, you can obtain the two datasets with the LDC numbers: - **ACE 2005:** LDC2006T06 ![image-20191112172845029.png](https://i.loli.net/2019/11/13/1XygVfiIPKkWxmH.png) - **KBP Eval 2017:** LDC2017E55 ![image-20191112173905692.png](https://i.loli.net/2019/11/13/YUoTyKx7eSbpFgG.png) ## Data Format ### train/dev/test The task is regarded as a sequence labeling task. The training, dev and test data is expected in standard tab-separated format. One word per line, separate column for token and label, empty line between sentences. The first line of each sentence is the document id corresponding to golden set. for each word, the first column is the token, the second column is the character index, the last column is the tag of event type. For example: ``` sid:CTS20001223.1300.0809 歹 297 O 徒 298 O 抢 300 B-Conflict:Attack 得 301 O 实 302 O 在 303 O ``` ### Pretrained Character embedding One character per line. For each line, the first column is the character, the rest columns is the value of the embedding of the character. ### Pretrained Sense (Chars & Words) embedding Similar to character embedding but for word senses. For example: ``` 苹果#1 0.304095 ... 苹果#2 -0.175496 ... 香蕉 -0.230772 ... ``` where *Word#n* means that it is the n-th sense of word A, The pretrained word senses embedding could be obtained by [SAT](https://github.com/thunlp/SE-WRL-SAT). ### Sense map Records all senses for each polysemous word, corresponding to the word sense embedding. One word per line, for each line, the first column is the word, and the rest columns are all the senses of it ( if exits ). For example: ``` 苹果 苹果#1 苹果#2 香蕉 ``` ### test.golden.dat Recodes the answer of all triggers with location and event types for evaluations. One trigger per line, the columns are doucment id, start index of character, trigger word length, trigger word and event type. For example: ``` CTV20001227.1330.0447 57 2 宣判 Justice:Sentence CTV20001227.1330.0447 131 2 判处 Justice:Sentence CTV20001227.1330.0447 110 2 判处 Justice:Sentence CTV20001227.1330.0447 51 2 上诉 Justice:Appeal CTV20001227.1330.0447 288 2 上诉 Justice:Appeal ``` ## How to Run Arguments of the code are set in **config.py**, which contains ``` status = 'train' Status of the program savemodel = 'data/model/test' Path of the saved model savedset = 'data/model/test.dset' Path of the saved data settings TRAIN = 'trainid_BIO.txt' Path of the training data dev = 'devid_BIO.txt' Path of the dev data test = "testchrid_BIO.txt" Path of the test data loadmodel = 'data/model/test.model' Path of the model to load output = 'data/test.output' Path of the output lr = 0.015 Learning rate maxlen = 300 Max length of each sequence dataset = 'ace' Dataset name pretrain_char_emb = 'char.vec' Pre-trained character embeddings pretrain_sense_emb = 'sense.vec' Pre-trained sense embeddings pretrain_word_emb = 'word.vec' Pre-trained word embeddings ``` With appropriate data settings, you could run the code with: ```shell python train.py ``` ## Citation ``` @inproceedings{ding2019event, title={Event Detection with Trigger-Aware Lattice Neural Network}, author={Ding, Ning and Li, Ziran and Liu, Zhiyuan and Zheng, Haitao and Lin, Zibo}, booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)}, pages={347--356}, year={2019} } ``` ## Contact For any questions, please contact: - dingn18@mails.tsinghua.edu.cn - lizr18@mails.tsinghua.edu.cn