# Boilerpipe

**Repository Path**: mirrors/Boilerpipe

## Basic Information

- **Project Name**: Boilerpipe
- **Description**: Boilerpipe is a Java library that strips ads and other extraneous material from HTML pages and extracts the target information (such as the main body text and the publication time).
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/boilerpipe
- **GVP Project**: No

## Statistics

- **Stars**: 17
- **Forks**: 7
- **Created**: 2018-01-24
- **Last Updated**: 2025-12-02

## Categories & Tags

**Categories**: utils

**Tags**: None

## README

boilerpipe
==========

Boilerplate Removal and Fulltext Extraction from HTML pages

NOTE: This is a work-in-progress transfer from Google Code. The latest stable version of boilerpipe is available at [`https://code.google.com/p/boilerpipe`](https://code.google.com/p/boilerpipe).
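To make "boilerplate removal" concrete, here is a minimal Java sketch of a shallow-text-feature heuristic. It splits a page into text blocks at block-level tags and keeps only blocks that contain enough words and a low share of linked words (navigation bars and footers tend to be short and link-heavy). This is not boilerpipe's API or its exact classifier. The class `BoilerplateSketch`, the set of block tags, and the thresholds `MIN_WORDS = 10` and `MAX_LINK_DENSITY = 0.33` are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of block-based boilerplate removal using two shallow
 * features per block: word count and link density. Not boilerpipe's API.
 */
public class BoilerplateSketch {

    // Simplified, hypothetical set of block-level tags that separate text blocks.
    private static final Pattern BLOCK_SPLIT = Pattern.compile(
            "(?i)</?(p|div|li|ul|ol|h[1-6]|td|tr|table|br|header|footer|nav)\\b[^>]*>");
    // Captures the inner HTML of each <a> element.
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical thresholds: minimum words per block, maximum fraction of linked words.
    private static final int MIN_WORDS = 10;
    private static final double MAX_LINK_DENSITY = 0.33;

    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns the plain text of blocks classified as main content, in document order. */
    public static List<String> extractContentBlocks(String html) {
        // Remove script and style bodies; they never hold readable content.
        String cleaned = html.replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ");
        List<String> kept = new ArrayList<>();
        for (String rawBlock : BLOCK_SPLIT.split(cleaned)) {
            // Count the words that appear inside <a> elements of this block.
            int linkedWords = 0;
            Matcher m = ANCHOR.matcher(rawBlock);
            while (m.find()) {
                linkedWords += countWords(TAG.matcher(m.group(1)).replaceAll(" "));
            }
            // Strip all tags and normalize whitespace (HTML entities are left undecoded).
            String text = TAG.matcher(rawBlock).replaceAll(" ").replaceAll("\\s+", " ").trim();
            int words = countWords(text);
            if (words == 0) {
                continue; // empty block between adjacent tags
            }
            double linkDensity = (double) linkedWords / words;
            if (words >= MIN_WORDS && linkDensity <= MAX_LINK_DENSITY) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<div><a href='/'>Home</a> <a href='/news'>News</a> <a href='/about'>About</a></div>"
                + "<p>Boilerplate removal keeps the main article text and drops navigation, "
                + "ads and footers that surround it on a typical page.</p>"
                + "<div>Copyright 2010</div>"
                + "</body></html>";
        // Prints only the <p> paragraph; the link bar and the copyright line are dropped.
        for (String block : extractContentBlocks(html)) {
            System.out.println(block);
        }
    }
}
```

Word count and link density are only two cheap signals. A real extractor combines more features and context from neighbouring blocks, and the boilerpipe library ships ready-made extractors (for example `ArticleExtractor`) rather than requiring hand-tuned thresholds like the ones assumed here.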