# WebCrawler

**Repository Path**: senyuyin/WebCrawler

## Basic Information

- **Project Name**: WebCrawler
- **Description**: Single-threaded deep web crawler
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2017-09-14
- **Last Updated**: 2021-11-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Spider, image, Image-processing

## README

# WebCrawler

## Single-threaded deep web crawler

* jsoup: HTML page parsing;
* http-request: web page requests;
* Lombok: object getters and setters;
* selenium-java: WebDriver;
* PhantomJS on macOS: `brew install phantomjs`;

## Features

* Single-page crawling
* Whole-site crawling

## Issues

* HTTP pattern: `/(http|https):\/\/([\w.]+\/?)\S*/`
* TODO: general-purpose crawler mode;

## Supported sites

* Huanqiu (环球网): https://ent.huanqiu.com/article/45KB5HWc2ep
* Tencent News (腾讯网): https://new.qq.com/omn/20211029/20211029A03T6J00.html#p=6

## UML model

![img.png](img.png)
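
The HTTP URL pattern listed in this README can be tried out with the standard `java.util.regex` package. The sketch below is illustrative only and is not taken from the project's source: the class name `UrlPatternDemo` and the method `extractUrls` are hypothetical, and the pattern is the README's expression rewritten as a Java string literal (the surrounding `/` delimiters are dropped and backslashes are doubled).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the WebCrawler repository.
public class UrlPatternDemo {

    // The README's pattern (http|https)://([\w.]+/?)\S* as a Java literal.
    static final Pattern HTTP_URL =
            Pattern.compile("(http|https)://([\\w.]+/?)\\S*");

    // Returns every substring of the text that matches the pattern, in order.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = HTTP_URL.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // The two sample URLs are the supported-site examples from the README.
        String sample = "see https://ent.huanqiu.com/article/45KB5HWc2ep and "
                + "https://new.qq.com/omn/20211029/20211029A03T6J00.html#p=6 for details";
        for (String url : extractUrls(sample)) {
            System.out.println(url);
        }
    }
}
```

Because the pattern ends in `\S*`, a match runs to the next whitespace character, so trailing punctuation such as a closing quote or bracket is captured along with the URL. That is likely one reason a more general crawling mode remains on the TODO list.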