# scrapy_spider

**Repository Path**: quickn/scrapy_spider

## Basic Information

- **Project Name**: scrapy_spider
- **Description**: Scrapy and Selenium web crawler
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-12-27
- **Last Updated**: 2025-04-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### Crawler framework

`pip install scrapy`

The Scrapy framework consists of six main components: the Scheduler, the Downloader, Spiders, the Item Pipeline, the Scrapy Engine, and Middlewares.

### Scheduled tasks

`pip install schedule`

### Packaging tool

`pip install pyinstaller`

### Browser automation

`pip install selenium`

### Check the installed Selenium version

`pip show selenium`

### Install through a PyPI mirror

`pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scrapy`

### Build the exe file

`pyinstaller --hidden-import=pymysql --hidden-import=selenium --hidden-import=selenium.webdriver.common.by --add-data "config;." runSpider.py`

### Get chromedriver

https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json

### Run the exe

`.\dist\runSpider\runSpider.exe`

### Install pymongo to store data in MongoDB

`pip install pymongo -i https://mirrors.aliyun.com/pypi/simple/`
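
The chromedriver link above points to a JSON index rather than a direct download, so a small helper can save some manual searching. The following is a minimal sketch, not part of this repository: it assumes the index has a top-level `versions` list whose entries carry a `version` string and a `downloads.chromedriver` list of `{platform, url}` objects, and the names `find_chromedriver_url`, `fetch_known_good_versions`, and `version_key` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# URL taken from the "Get chromedriver" step of this README.
KNOWN_GOOD_URL = (
    "https://googlechromelabs.github.io/chrome-for-testing/"
    "known-good-versions-with-downloads.json"
)


def version_key(version):
    """Turn '131.0.6778.85' into (131, 0, 6778, 85) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_chromedriver_url(data, major, platform="win64"):
    """Return the chromedriver URL of the newest listed build for a Chrome major version.

    `data` is the parsed JSON index (assumed layout, see the lead-in).
    Returns None when no listed build matches the major version and platform.
    """
    best = None  # (version string, url) of the newest match so far
    for entry in data.get("versions", []):
        if version_key(entry["version"])[0] != major:
            continue
        # Some entries may have no chromedriver download at all.
        for download in entry.get("downloads", {}).get("chromedriver", []):
            if download.get("platform") != platform:
                continue
            if best is None or version_key(entry["version"]) > version_key(best[0]):
                best = (entry["version"], download["url"])
    return best[1] if best else None


def fetch_known_good_versions(url=KNOWN_GOOD_URL):
    """Download and parse the index (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

To use it, check the major version of the installed Chrome, then call `find_chromedriver_url(fetch_known_good_versions(), major=<your major version>, platform="win64")`. The `win64` default is a guess based on the `.exe` build in this README; adjust it if your machine differs.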