bit212/Web-page-classification
test-ngram.sh 807 Bytes
Kahlil Oppenheimer committed on 2015-12-12 10:46 +08:00 . Added ngram testing script
#!/bin/bash
# Usage: ./test-ngram.sh <gram-size>
# Imports both the descriptions and the full pages as n-grams of the given size,
# then evaluates each import with 10-fold cross-validation.
n="${1:?usage: $0 <gram-size>}"
# Import the page descriptions and the full HTML pages as n-gram features.
(cd mallet && bin/mallet import-dir --input ../raw/dmoz5c.data/desc/* --output "input/ngram/gram-d-${n}.mallet" --remove-stopwords --gram-sizes "${n}")
(cd mallet && bin/mallet import-dir --input ../raw/dmoz5c.data/html/* --output "input/ngram/gram-p-${n}.mallet" --remove-stopwords --gram-sizes "${n}")
# Train and evaluate three classifiers on each data set, saving the output under output/ngram/.
(cd mallet && bin/mallet train-classifier --input "input/ngram/gram-d-${n}.mallet" --cross-validation 10 --trainer NaiveBayes --trainer BalancedWinnow --trainer MaxEnt | tee "output/ngram/gram-d-${n}.out")
(cd mallet && bin/mallet train-classifier --input "input/ngram/gram-p-${n}.mallet" --cross-validation 10 --trainer NaiveBayes --trainer BalancedWinnow --trainer MaxEnt | tee "output/ngram/gram-p-${n}.out")
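Since the script takes a single n-gram size, comparing several sizes means invoking it once per size. The following is a minimal sketch of such a sweep, not part of the repository: the function name `run_ngram_sweep` and the `DRY_RUN` variable are hypothetical, and it assumes `test-ngram.sh` is executable and sits in the current directory. With `DRY_RUN=1` it only prints the commands, so the loop can be checked without MALLET installed.

```bash
#!/bin/bash
# Hypothetical helper: run test-ngram.sh once for each n-gram size given.
# Assumes ./test-ngram.sh exists and is executable (an assumption, not
# something the repository provides a wrapper for).
run_ngram_sweep() {
  local size
  for size in "$@"; do
    # Skip anything that is not a positive integer.
    if ! [[ "$size" =~ ^[1-9][0-9]*$ ]]; then
      echo "skipping invalid gram size: $size" >&2
      continue
    fi
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      # Dry run: print the command instead of executing it.
      echo "./test-ngram.sh $size"
    else
      ./test-ngram.sh "$size"
    fi
  done
}
```

For example, `DRY_RUN=1 run_ngram_sweep 1 2 3` lists the three invocations without running them; dropping `DRY_RUN=1` runs the imports and evaluations for each size in turn.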