# spider--proxies

**Repository Path**: pythonywy/spider--proxies

## Basic Information

- **Project Name**: spider--proxies
- **Description**: Python crawler / proxy-IP scraping / the full code is shown with nothing hidden; just copy it and tweak it slightly
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-08-02
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# I wrote a proxy-IP scraper script for everyone to use

## 1. Code

```python
import requests
from lxml import etree

url = 'http://www.kuaidaili.com/free/'
rp = requests.get(url)
rp_html = etree.HTML(rp.text)

# XPath expressions for the columns of the proxy table
ip_xpath = '//*[@id="list"]/table/tbody/tr/td[1]/text()'
port_xpath = '//*[@id="list"]/table/tbody/tr/td[2]/text()'
http_or_https_xpath = '//*[@id="list"]/table/tbody/tr/td[4]/text()'

# Extract the matching text nodes
ip_list = rp_html.xpath(ip_xpath)
port_list = rp_html.xpath(port_xpath)
http_or_https_list = rp_html.xpath(http_or_https_xpath)

# Combine the columns into one dict per proxy
proxy_list = []
for ip, port, http_or_https in zip(ip_list, port_list, http_or_https_list):
    proxy_list.append({http_or_https: f'{ip}:{port}'})
print(proxy_list)

# That's your list; you can use the random module to pick one at random
# for your subsequent crawling (a usage sketch follows below).

# One page not enough? Then let's scrape ten.
# First look at the URL pattern (a multi-page sketch follows below):
#   page 1: https://www.kuaidaili.com/free/inha/1/
#   page 2: https://www.kuaidaili.com/free/inha/2/
# No need to spell out the rest.
```

`http://www.kuaidaili.com/free/` is a pretty good proxy-IP site. If you like this, give it a star. Thanks!
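## 2. Scraping multiple pages (sketch)

The URL pattern noted in the comments makes a multi-page loop straightforward. Below is a minimal sketch under the assumption that every listing page uses the same table layout as the single-page script; `fetch_page` is a helper name made up here for illustration, and the one-second delay is a guess at polite pacing, not a documented requirement of the site.

```python
import time

import requests
from lxml import etree


def fetch_page(page):
    """Return (ip, port, scheme) tuples from one kuaidaili listing page.

    Hypothetical helper; URL pattern and column indexes follow the
    single-page script above.
    """
    url = f'https://www.kuaidaili.com/free/inha/{page}/'
    html = etree.HTML(requests.get(url).text)
    ips = html.xpath('//*[@id="list"]/table/tbody/tr/td[1]/text()')
    ports = html.xpath('//*[@id="list"]/table/tbody/tr/td[2]/text()')
    schemes = html.xpath('//*[@id="list"]/table/tbody/tr/td[4]/text()')
    return zip(ips, ports, schemes)


proxy_list = []
for page in range(1, 11):  # pages 1 through 10
    for ip, port, scheme in fetch_page(page):
        proxy_list.append({scheme: f'{ip}:{port}'})
    time.sleep(1)  # assumed courtesy delay; free proxy sites often rate-limit
print(len(proxy_list), proxy_list[:3])
```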
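## 3. Picking and using a proxy (sketch)

The script's comments suggest picking a proxy at random with the `random` module. Here is one way that could look, assuming the scheme column yields strings like `'HTTP'`/`'HTTPS'` (the placeholder addresses below are made up); note that `requests` expects lowercase scheme keys and full proxy URLs in its `proxies` argument.

```python
import random

import requests

# proxy_list as produced by the script above; placeholder values for illustration
proxy_list = [{'HTTP': '1.2.3.4:8080'}, {'HTTPS': '5.6.7.8:3128'}]

# Pick one proxy at random, as the comment in the script suggests
proxy = random.choice(proxy_list)
scheme, addr = next(iter(proxy.items()))

# Normalize to the shape requests expects, e.g. {'http': 'http://1.2.3.4:8080'}
proxies = {scheme.lower(): f'{scheme.lower()}://{addr}'}

# httpbin echoes the requesting IP, which is handy for verifying the proxy works
resp = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=5)
print(resp.text)
```

Free proxies die quickly, so in practice you would wrap the request in a try/except and fall back to another entry from the list when one fails.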