# JsoupXpath
**Repository Path**: xiaoyangge/JsoupXpath
## Basic Information
- **Project Name**: JsoupXpath
- **Description**: An HTML parser written in Java and built on Antlr4 that fully implements the W3C XPath 1.0 standard syntax for HTML parsing and data extraction
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://github.com/zhegexiaohuozi/JsoupXpath
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 8
- **Created**: 2018-04-24
- **Last Updated**: 2020-12-18
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
JsoupXpath
==========
[Build Status](https://travis-ci.org/zhegexiaohuozi/JsoupXpath)
[Releases](https://github.com/zhegexiaohuozi/JsoupXpath/releases)
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
An HTML parser with XPath support, based on Jsoup and Antlr4. Maybe it is the best in Java; just try it.
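## Introduction ##
**JsoupXpath** is a pure-Java parser that uses XPath to parse HTML and extract data from it. It is a complete reimplementation of the W3C XPath 1.0 standard syntax, targeted at HTML parsing. The XPath lexer and parser are built with Antlr4 and the HTML DOM tree is generated by Jsoup, hence the name JsoupXpath.
JsoupXpath was developed to bring the power and convenience of XPath to Java, where a sufficiently usable XPath parser was hard to find. Its implementation logic is clear and easy to extend,
and it supports the complete W3C XPath 1.0 standard syntax. W3C specification: http://www.w3.org/TR/1999/REC-xpath-19991116 . JsoupXpath grammar file: [Xpath.g4](https://github.com/zhegexiaohuozi/JsoupXpath/blob/master/src/main/resources/Xpath.g4)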
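To give a feel for the kind of XPath 1.0 constructs the specification covers (location paths, attribute tests, predicates with numeric comparison), here is a minimal sketch. Note that it does not use JsoupXpath at all: it swaps in the JDK's built-in `javax.xml.xpath` engine so that it runs without extra dependencies, and that engine only accepts well-formed XML. The class name `XPath10Sketch`, the `select` helper, and the sample markup are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPath10Sketch {
    // Well-formed sample markup; element names, ids and values are made up for this sketch.
    static final String SAMPLE =
        "<html><body><div id='post_list'>"
      + "<div><h3>First post</h3><span class='article_view'>1200</span></div>"
      + "<div><h3>Second post</h3><span class='article_view'>300</span></div>"
      + "</div></body></html>";

    // Evaluates an XPath 1.0 expression against SAMPLE using the JDK engine
    // (not JsoupXpath) and joins the text content of the matched nodes with ", ".
    static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) {
                    out.append(", ");
                }
                out.append(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Attribute test plus a predicate with a numeric comparison:
        // titles of posts whose view count is greater than 1000.
        System.out.println(
            select("//div[@id='post_list']/div[span[@class='article_view'] > 1000]/h3"));
    }
}
```

Run as written, this prints `First post`, because only the first entry has a view count above 1000. The sketch sticks to standard XPath 1.0 syntax, so any library-specific extension functions are out of its scope.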
# Change Log #
https://github.com/zhegexiaohuozi/JsoupXpath/releases
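# Community Discussion #
Questions and suggestions can now be discussed on the mailing list below. Before posting for the first time, you need to subscribe and wait for approval (this is mainly to filter out advertising and similar spam).
- Subscribe: send an email to `seimicrawler+subscribe@googlegroups.com`
- Post: send an email to `seimicrawler@googlegroups.com`
- Unsubscribe: send an email to `seimicrawler+unsubscribe@googlegroups.com`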
## Quick Start ##
Maven dependency:
```
<dependency>
   <groupId>cn.wanghaomiao</groupId>
   <artifactId>JsoupXpath</artifactId>
   <version>${latest-release-version}</version>
</dependency>
```
Example:
```
String xpath="//div[@id='post_list']/div[./div/div/span[@class='article_view']/a/num()>1000]/div/h3/allText()";
String doc = "...";
JXDocument jxDocument = new JXDocument(doc);
List