# weixin_crawler

## What is weixin_crawler?

weixin_crawler is a crawler for WeChat official account articles, built with Scrapy, Flask, Echarts, Elasticsearch, and other tools. It ships with built-in analysis reports and full-text search: even collections of several million documents can be searched almost instantly. weixin_crawler was designed to crawl the posting history of WeChat official accounts as completely and as quickly as possible.

If you first want to see whether the project is interesting, this introduction video of under three minutes is exactly what you need: https://www.youtube.com/watch?v=CbfLRCV7oeU&t=8s

## Key Features

1. Written in Python 3
2. Built on Scrapy, making real use of many of Scrapy's features; a good open-source project for studying Scrapy in depth
3. A highly usable UI built with Flask, Flask-SocketIO, and Vue; powerful and practical, and a solid data assistant for roles such as new-media operations
4. Thanks to Scrapy, MongoDB, and Elasticsearch, crawling, storage, and indexing are all simple and efficient
5. Can crawl the complete posting history of a WeChat official account
6. Can crawl per-article metrics such as read count, likes, rewards, and comment count
7. Ships with a data analysis report for each individual official account
8. Full-text search backed by Elasticsearch, with multiple search modes and sort modes, plus trend-analysis charts over the search results
9. Official accounts can be organized into groups, and a group can be used to narrow the search scope
10. An original approach to automated phone operation, so the crawler can run unattended
11. The countermeasures against anti-crawling are simple and blunt

## Main Tools Used

| Language |                  | Python 3.6                                     |
| -------- | ---------------- | ---------------------------------------------- |
| Frontend | Web framework    | Flask / Flask-SocketIO / gevent                |
|          | JS/CSS libraries | Vue / jQuery / W3.CSS / Echarts / Font Awesome |
| Backend  | Crawler          | Scrapy                                         |
|          | Storage          | MongoDB / Redis                                |
|          | Indexing         | Elasticsearch                                  |

## How to Run

> #### Install MongoDB / Redis / Elasticsearch and run them in the background
>
> 1. Download MongoDB / Redis / Elasticsearch from their official sites and install them
> 2. Run all of them with their default configuration, so that MongoDB listens on localhost:27017 and Redis on localhost:6379 (otherwise configure the addresses in weixin_crawler/project/configs/auth.py); a connectivity sketch follows this section
>
> #### Install the proxy server and run proxy.js
>
> 1. Install Node.js, then run `npm install anyproxy redis` inside weixin_crawler/proxy
> 2. cd to weixin_crawler/proxy and run `node proxy.js`
> 3. Install the AnyProxy HTTPS CA certificate on both the computer and the phone
> 4. If you are not sure how to use AnyProxy, the documentation is [here](https://github.com/alibaba/anyproxy)
>
> #### Install the needed Python packages
>
> 1. NOTE: a plain `pip install -r requirements.txt` may not install every package cleanly; Twisted, which Scrapy depends on, is one known trouble spot
> 2. Your Python environment may still raise other package-not-found errors; just install whatever is reported missing (an import-check sketch follows this section)
>
> #### Some library source code has to be modified (admittedly, this may not be the most reasonable approach)
>
> 1. Scrapy: replace `Python36\Lib\site-packages\scrapy\http\request\__init__.py` with `weixin_crawler\source_code\request\__init__.py`
> 2. Scrapy: replace `Python36\Lib\site-packages\scrapy\http\response\__init__.py` with `weixin_crawler\source_code\response\__init__.py`
> 3. pyecharts: replace `Python36\Lib\site-packages\pyecharts\base.py` with `weixin_crawler\source_code\base.py`, which adds the function `get_echarts_options` at line 106 (a patch-helper sketch follows this section)
>
> #### If you want weixin_crawler to work automatically, the following steps are necessary; otherwise you will have to operate the phone manually to generate the requests that AnyProxy captures
>
> 1. Install adb and add it to your PATH (on Windows, for example)
> 2. Install an Android emulator (NOX suggested) or plug in your phone, and make sure you can drive it with adb from the command line
> 3. If multiple phones are connected to your computer, you have to find out their adb ports, which are needed when adding a crawler (a device-listing sketch follows this section)
>
> #### Run main.py
>
> Just run `python weixin_crawler\project\main.py`, then open your browser: everything you need is at localhost:5000.

A few optional sketches follow. None of them are part of the project, but they can save debugging time while working through the steps above.
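
First, it is worth confirming that the three backing services are actually reachable before touching the crawler; nothing downstream works without them. This is a minimal sketch (the script and helper names are mine, not the project's) assuming the default ports from step one plus Elasticsearch's usual default of 9200, using the standard pymongo and redis clients:

```python
# check_services.py -- illustrative sanity check, not part of weixin_crawler.
# Assumes the default ports: MongoDB 27017, Redis 6379, Elasticsearch 9200.
import urllib.request

import redis
from pymongo import MongoClient


def check_mongodb():
    # A short server-selection timeout keeps the check from hanging if mongod is down
    client = MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)
    client.admin.command("ping")  # raises if the server is unreachable
    print("MongoDB OK")


def check_redis():
    r = redis.Redis(host="localhost", port=6379, socket_timeout=2)
    r.ping()  # raises if the server is unreachable
    print("Redis OK")


def check_elasticsearch():
    # Elasticsearch answers a plain HTTP GET on its root endpoint
    with urllib.request.urlopen("http://localhost:9200", timeout=2) as resp:
        print("Elasticsearch OK (HTTP %s)" % resp.status)


if __name__ == "__main__":
    check_mongodb()
    check_redis()
    check_elasticsearch()
```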
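
For the Python dependencies, a throwaway import check makes the "install whatever is missing" loop faster. The module list below is my guess based on the tools table above, not an authoritative requirements list:

```python
# check_imports.py -- illustrative, not part of weixin_crawler.
# Tries to import the likely dependencies and reports what is missing.
import importlib

MODULES = [
    "scrapy", "twisted", "flask", "flask_socketio", "gevent",
    "pymongo", "redis", "pyecharts",
]

for name in MODULES:
    try:
        importlib.import_module(name)
        print("OK      %s" % name)
    except ImportError as exc:
        print("MISSING %s (%s)" % (name, exc))
```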
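
The file-replacement step can of course be done by hand, but locating site-packages is error-prone. The sketch below finds the installed files through the modules' own `__file__` attributes and keeps a one-time backup; it is a convenience wrapper of mine around the step above, not something the project ships, and `WEIXIN_CRAWLER_DIR` is a placeholder for your own clone path:

```python
# patch_sources.py -- illustrative helper for the source-modification step.
# Backs up the installed files once, then copies the project's patched
# versions over them. Not part of weixin_crawler.
import shutil
from pathlib import Path

import pyecharts.base
import scrapy.http.request
import scrapy.http.response

WEIXIN_CRAWLER_DIR = Path(r"C:\path\to\weixin_crawler")  # placeholder: your clone
SRC = WEIXIN_CRAWLER_DIR / "source_code"

# (file installed in site-packages, patched replacement shipped by the project)
PATCHES = [
    (Path(scrapy.http.request.__file__), SRC / "request" / "__init__.py"),
    (Path(scrapy.http.response.__file__), SRC / "response" / "__init__.py"),
    (Path(pyecharts.base.__file__), SRC / "base.py"),
]

for installed, patched in PATCHES:
    backup = installed.with_name(installed.name + ".bak")
    if not backup.exists():
        shutil.copy2(installed, backup)  # keep the pristine original once
    shutil.copy2(patched, installed)
    print("patched %s" % installed)
```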
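
Finally, for the phone-automation step, the adb identifiers of connected devices can be listed from Python with just the standard library. This sketch is also mine, not the project's; note that an emulator usually shows up as an address such as 127.0.0.1:62001 (a port NOX commonly uses; check your emulator's documentation, and run `adb connect 127.0.0.1:<port>` first if yours is missing):

```python
# list_adb_devices.py -- illustrative, not part of weixin_crawler.
# Lists the serials/addresses adb can see, which is what you need when
# several phones or emulators are attached.
import subprocess


def adb_serials():
    out = subprocess.run(
        ["adb", "devices"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 spelling of text=True
        check=True,
    ).stdout
    serials = []
    for line in out.splitlines()[1:]:  # skip the "List of devices attached" header
        if line.strip() and "\t" in line:
            serial, state = line.split("\t", 1)
            if state.strip() == "device":  # skip offline/unauthorized entries
                serials.append(serial)
    return serials


if __name__ == "__main__":
    print(adb_serials())
```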
If you get stuck somewhere in this long list of steps, join our community for help and tell us what you have done and what errors you ran into. Let's explore the world at localhost:5000 together.

## Feature Showcase

Main UI

![1](readme_img/爬虫主界面.gif)

Adding an official-account crawl task, and the list of accounts already crawled

![1](readme_img/公众号.png)

Crawler page

![](readme_img/caiji.png)

Settings page

![ ](readme_img/设置.png)

An account's historical article list

![ ](readme_img/历史文章列表.gif)

Report

![ ](readme_img/报告.gif)

Search

![ ](readme_img/搜索.gif)

## Join the Community

You might be:

- A recent graduate and technical newcomer with little project experience
- A veteran who can get weixin_crawler running with no trouble
- A crawler expert who can scrape whatever is pointed at
- Someone in new-media operations or similar work who does not program at all, but wants to put weixin_crawler's features to full use on the job

Whichever group you belong to, as long as you have a strong interest in WeChat data analysis, you can get what you are looking for by joining our community through the author's WeChat.

## Supporting the Author

weixin_crawler has been developed in spare time since June 2018 (a full half year, as it turned out). The author's skill being limited, only now is there a barely usable version to share with fellow crawler enthusiasts; thanks to everyone for the patience. If you like this project, your support is welcome.

You can give back to the author in any of the following ways (feel free to pick more than one):

- Give it a small star, and share this interesting open-source project with other developers, even just one, as long as they too are pushing forward on the road of technical improvement
- Buy the author a coffee; the late-night coding sessions ahead will gain a little efficiency from it :)
- Join the community and contribute code, and together we will build even cooler crawlers
- Join the Knowledge Planet (知识星球) group, where the author walks through every function in weixin_crawler and the thinking behind every problem solved; you will also meet more determined developers there

| Author's WeChat (start your note with "wc") | Join Knowledge Planet     | Tip the Author          |
| ------------------------------------------- | ------------------------- | ----------------------- |
| ![ ](readme_img/wq.jpg)                     | ![ ](readme_img/知识星球.png) | ![ ](readme_img/打赏.png) |