
bit212/Web-page-classification

runSample.sh 884 Bytes
Kahlil Oppenheimer committed on 2015-12-12 10:46 +08:00: Added ngram testing script
#!/bin/bash
# Usage: ./runSample.sh <sample-size>
# Takes the sample size as the command-line argument, imports that sample into MALLET
# twice (page descriptions and full web-page text), then evaluates classifiers on each.
: "${1:?usage: runSample.sh <sample-size>}"
# Import the descriptions; MALLET treats each directory passed to --input as one class.
(cd mallet && bin/mallet import-dir --input ../raw/dmoz${1}c.data/desc/* --output input/sample/sample-d-${1}.mallet --remove-stopwords)
# Import the full web-page text the same way.
(cd mallet && bin/mallet import-dir --input ../raw/dmoz${1}c.data/html/* --output input/sample/sample-p-${1}.mallet --remove-stopwords)
# Evaluate NaiveBayes, BalancedWinnow and MaxEnt with 10-fold cross-validation; tee saves a copy of each report.
(cd mallet && bin/mallet train-classifier --input input/sample/sample-d-${1}.mallet --cross-validation 10 --trainer NaiveBayes --trainer BalancedWinnow --trainer MaxEnt | tee output/sample/sample-d-${1}.out)
(cd mallet && bin/mallet train-classifier --input input/sample/sample-p-${1}.mallet --cross-validation 10 --trainer NaiveBayes --trainer BalancedWinnow --trainer MaxEnt | tee output/sample/sample-p-${1}.out)
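
The script writes into `mallet/input/sample` and `mallet/output/sample` and reads from `raw/dmoz<size>c.data/{desc,html}`, so it seems to assume all of these already exist. A minimal pre-flight sketch under that assumption follows. The function names `raw_dir` and `preflight` are hypothetical and not part of the repository, and the directory layout is inferred only from the paths used in the script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the repository: checks the layout that
# runSample.sh appears to rely on before any MALLET command is run.

# Print the raw data directory for a given sample size,
# following the naming pattern used in runSample.sh.
raw_dir() {
    printf 'raw/dmoz%sc.data\n' "$1"
}

# Verify that the desc/ and html/ inputs exist for the sample size, then
# create the directories the script writes into. The optional second
# argument is the repository root (defaults to the current directory).
# Returns non-zero if the raw data for the sample size is missing.
preflight() {
    local size=$1 root=${2:-.} sub
    for sub in desc html; do
        if [ ! -d "$root/$(raw_dir "$size")/$sub" ]; then
            echo "missing: $root/$(raw_dir "$size")/$sub" >&2
            return 1
        fi
    done
    mkdir -p "$root/mallet/input/sample" "$root/mallet/output/sample"
}
```

After sourcing these functions, one possible use is `preflight "$size" && ./runSample.sh "$size"`, where `size` holds whatever sample size exists under `raw/`. Failing early avoids a long import step that ends with `tee` unable to write its report.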