# Web-page-classification

**Repository Path**: bit212/Web-page-classification

## Basic Information

- **Project Name**: Web-page-classification
- **License**: MIT
- **Default Branch**: master
- **Created**: 2025-01-07
- **Last Updated**: 2025-01-07

## README

# Topical Web-page classification of the DMOZ Dataset

### [Read the paper](paper.pdf)

This repository contains all scripts associated with my research on topical web-page classification. The full paper, which describes the task, experiments, and results, is available [here](paper.pdf).

## Abstract

Multi-class topical web-page classification is a difficult task with widespread application. In this paper, I analyze the performance of well-studied techniques on two different representations of web pages: hand-written meta-descriptions and on-page text content. I acquired all of the training labels and website descriptions from the DMOZ dataset, and all of the on-page content by scraping the actual web pages. In a 16-way classification task, I achieved 74.035% accuracy using on-page content and 79.121% accuracy using website descriptions, compared with a baseline accuracy of 42.032% obtained by always predicting the most frequently tagged category.
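
The most-frequent-category baseline mentioned in the abstract can be illustrated with a short sketch. This is not taken from the repository's scripts: the function name `most_frequent_baseline` and the toy labels are assumptions made purely for illustration, and the real experiments use 16 categories drawn from DMOZ.

```python
from collections import Counter


def most_frequent_baseline(train_labels, test_labels):
    """Score a predictor that always outputs the most common training label.

    Hypothetical helper for illustration only; it is not part of this repository.
    Returns a (majority_label, accuracy) tuple.
    """
    if not train_labels or not test_labels:
        raise ValueError("both label lists must be non-empty")
    # Find the label that occurs most often in the training set.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    # Count how many test examples that single label would get right.
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Toy labels (made up), standing in for top-level category tags.
train = ["Arts"] * 5 + ["Business"] * 3 + ["Science"] * 2
test = ["Arts", "Arts", "Business", "Science", "Arts"]

label, acc = most_frequent_baseline(train, test)
print(f"baseline label={label}, accuracy={acc:.3f}")
```

Any classifier worth reporting should beat this number, which is why the abstract quotes the baseline next to the two model accuracies.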