# language-evaluation

(Experimental) Collection of evaluation code for natural language generation.

**Note: the API may change frequently without notice.**

## Metrics

- `CocoEvaluator`: coco-caption (BLEU1-4, METEOR, ROUGE, CIDEr, SPICE)
- `RougeEvaluator`: sentence-level ROUGE (ROUGE-1, ROUGE-2, ROUGE-L with f-measure)
- `Rouge155Evaluator`: summary-level ROUGE (ROUGE-1, ROUGE-2, ROUGE-L with f-measure)

## Requirements

- Java 1.8.0+ (used by the coco-caption evaluator)
- Python 3.6+
- `libxml-parser-perl` (used by `ROUGE-1.5.5.pl`)

## Installation and Usage

Install the external dependencies (e.g. Java 1.8.0+, `libxml-parser-perl`):

```bash
# Oracle Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt-get install oracle-java8-installer

# libxml-parser-perl
sudo apt install libxml-parser-perl
```

Then run:

```bash
pip install git+https://github.com/bckim92/language-evaluation.git
python -c "import language_evaluation; language_evaluation.download('coco')"
```

Python API (or see [language_evaluation_test.py](https://github.com/bckim92/language-evaluation/blob/master/language_evaluation_test.py)):

```python
import language_evaluation
from pprint import PrettyPrinter
pprint = PrettyPrinter().pprint

predicts = ['i am a boy', 'she is a girl']
answers = ['am i a boy ?', 'is she a girl ?']

evaluator = language_evaluation.CocoEvaluator()
results = evaluator.run_evaluation(predicts, answers)
pprint(results)
# {'Bleu_1': 0.9999999997500004,
#  'Bleu_2': 0.5773502690332603,
#  'Bleu_3': 4.3679023223468616e-06,
#  'Bleu_4': 1.4287202142987477e-08,
#  'CIDEr': 3.333333333333333,
#  'METEOR': 0.43354749322305886,
#  'ROUGE_L': 0.75,
#  'SPICE': 0.6666666666666666}

evaluator = language_evaluation.RougeEvaluator(num_parallel_calls=5)
results = evaluator.run_evaluation(predicts, answers)
pprint(results)
# {'rouge1': 1.0,
#  'rouge2': 0.3333333333333333,
#  'rougeL': 0.75}

evaluator = language_evaluation.Rouge155Evaluator(num_parallel_calls=5)
results = evaluator.run_evaluation(predicts, answers)
pprint(results)
# {'rouge1': 1.0,
#  'rouge2': 0.3333333333333333,
#  'rougeL': 0.75}
```

## Notes

- TODOs
  - Support more metrics (e.g. embedding-based)
  - Support a command-line interface
  - Support full functionality and configuration for ROUGE
  - Implement a summary-level ROUGE scorer in pure Python
  - Add tests & CI

## Related Projects

- [tylin/coco-caption](https://github.com/tylin/coco-caption)
- [bckim92/coco-caption-py3](https://github.com/bckim92/coco-caption-py3)
- [Maluuba/nlg-eval](https://github.com/Maluuba/nlg-eval)
- [google-research/google-research/rouge](https://github.com/google-research/google-research/tree/master/rouge)
- [bheinzerling/pyrouge](https://github.com/bheinzerling/pyrouge)

## License

See [LICENSE.md](LICENSE.md).
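
## Batch Evaluation Sketch

As a complement to the Python API above, here is a minimal batch-evaluation sketch built only on the calls documented in this README (`CocoEvaluator` and `run_evaluation`). The input file names (`predicts.txt`, `answers.txt`) and the one-sentence-per-line format are hypothetical choices for illustration, not part of this package.

```python
# A minimal sketch, assuming only the CocoEvaluator API shown above.
# The file names and one-sentence-per-line format are hypothetical.
from pprint import PrettyPrinter

import language_evaluation


def read_lines(path):
    """Read one sentence per line, stripping trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    predicts = read_lines("predicts.txt")  # hypothetical input file
    answers = read_lines("answers.txt")    # hypothetical input file
    assert len(predicts) == len(answers), "expect one answer per prediction"

    evaluator = language_evaluation.CocoEvaluator()
    results = evaluator.run_evaluation(predicts, answers)
    PrettyPrinter().pprint(results)
```

Keeping the file I/O separate from the evaluator call makes it easy to swap in `RougeEvaluator` or `Rouge155Evaluator` from the example above without touching the rest of the script.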