# a20201216-092207-540 **Repository Path**: tmluwei/a20201216-092207-540 ## Basic Information - **Project Name**: a20201216-092207-540 - **Description**: 抖音爬虫,数据采集:热搜、话题抓包分析 - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-08-11 - **Last Updated**: 2021-08-11 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 抖音爬虫,数据采集:热搜、话题抓包分析 我们准备实现的是抖音的热搜榜和话题的相关数据抓取。
抓包工具: charles
模拟器: 木木模拟器 --- ## 抖音的热搜榜 **一:可以直接通过抓包工具获取接口**
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607043886626-2abe95a8-de85-4f68-98a6-f89b55c47240.png#align=left&display=inline&height=755&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1510&originWidth=1758&size=1294037&status=done&style=none&width=879)
将获取到的接口地址复制出来(简化后):
[https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/](https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?detail_list=1&mac_address=08:00:27:29:D2:F5&os_api=23&device_type=MI%205s&device_platform=android&ssmix=a&iid=92152480453&manifest_version_code=860&dpi=320&uuid=008796750074613&version_code=860&app_name=aweme&version_name=8.6.0&ts=1577932778&openudid=c055533a0591b2dc&device_id=69918538596&resolution=810*1440&os_version=6.0.1&language=zh&device_brand=Xiaomi&app_type=normal&ac=wifi&update_version_code=8602&aid=1128&channel=tengxun_new&_rticket=1577932779592)
接着就可以直接请求,来获取热搜数据了。

**二:通过热搜的分享页面获取接口**
点击右上角的分享选项,复制链接后,用浏览器打开。
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044327588-6d4056a1-9c50-4d3a-811f-5daeab41f335.png#align=left&display=inline&height=639&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1278&originWidth=794&size=569986&status=done&style=none&width=397)
在浏览器中打开后 [https://www.iesdouyin.com/share/billboard/](https://www.iesdouyin.com/share/billboard/)

同样也可以获取到接口地址。可直接进行get请求
[https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/](https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607043953566-b4c52d2e-887d-4083-b07c-9e70e964b2d9.png#align=left&display=inline&height=735&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1470&originWidth=1030&size=574555&status=done&style=none&width=515) ## 热搜下对应的话题数据 **我们点击一个话题,来找一下热搜下对应的话题数据:**
右上角的播放量数据在
[https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?&source=3&os_api=23&version_code=860](https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?&source=3&os_api=23&version_code=860)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607043977070-9f92016c-b051-4de8-b69c-2c3965149149.png#align=left&display=inline&height=746&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1492&originWidth=2072&size=2681548&status=done&style=none&width=1036)
我们通过寻找其他数据的接口,将链接复制下来(简化后):
[https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/?hotword=吴亦凡 脖子](https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/?hotword=%E5%90%B4%E4%BA%A6%E5%87%A1%20%E8%84%96%E5%AD%90&offset=0&count=12&source=trending_page&is_ad=0&os_api=23&device_type=MI%205s&device_platform=android&ssmix=a&iid=92152480453&manifest_version_code=860&dpi=320&uuid=008796750074613&version_code=860&app_name=aweme&version_name=8.6.0&ts=1577934388&openudid=c055533a0591b2dc&device_id=69918538596&resolution=810*1440&os_version=6.0.1&language=zh&device_brand=Xiaomi&app_type=normal&ac=wifi&update_version_code=8602&aid=1128&channel=tengxun_new&_rticket=1577934389020)
想要的数据就有了,比如当前话题总参与人数,可以直接GET请求接口来解析数据。
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607043992366-d7944184-f01d-4f1e-aa2c-76a0f18580bf.png#align=left&display=inline&height=410&margin=%5Bobject%20Object%5D&name=image.png&originHeight=820&originWidth=1858&size=568282&status=done&style=none&width=929) --- 热搜的数据很简单就可以获取到,
但是目前针对于**指定话题**,一些加密的参数还没有研究明白。欢迎大家留言交流
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044009144-7fe4add1-e212-464a-8d47-5f4446642d23.png#align=left&display=inline&height=737&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1474&originWidth=1824&size=912345&status=done&style=none&width=912)
但是为了实现话题数据的抓取,不得不另寻他路,没想到还真找到了其他的接口。 ## 指定话题的数据获取方法 **以一个话题示例:**
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044020889-217be078-1522-459f-b66e-a1410866ac77.png#align=left&display=inline&height=536&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1072&originWidth=1022&size=324766&status=done&style=none&width=511)
我们需要的是该话题对应的播放量和视频数量。
通过抓包,找到了如下接口:
[https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/?query_type=0&ch_id=1635753360881672](https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/?query_type=0&ch_id=1635753360881672)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044034158-85f8c625-4d2c-460d-a093-651f0c1bd4ba.png#align=left&display=inline&height=668&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1336&originWidth=1338&size=511729&status=done&style=none&width=669)
这里需要 ch_id 才能获取到我们需要的数据。
如何才能简单快捷的获取到这个ch_id呢,经过一段时间的分析。
我发现: 该话题《从地球出发》的ch_id:1635753360881672,
可以在该相关用户的详情中找到。
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044051477-9ccfca3b-3e02-4222-a27a-6f41bc83ae77.png#align=left&display=inline&height=741&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1482&originWidth=1914&size=1890578&status=done&style=none&width=957)
那么还是老方法,获取分享页面的链接,从浏览器打开

查看分享页面中的接口数据。
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044371164-8cac554a-e876-4082-81a4-38af283235be.png#align=left&display=inline&height=641&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1282&originWidth=1218&size=536444&status=done&style=none&width=609)
果不其然,找到了我们需要的id。
--- **话题下的视频详情:**
那么如何获取话题下的视频详情呢,回到模拟器,又发现了右上方的分享选项

将链接复制下来之后,使用浏览器打开,在接口中可以找到我们所需要的数据
[https://www.iesdouyin.com/share/challenge/1635753360881672](https://www.iesdouyin.com/share/challenge/1635753360881672)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044131060-f8d20ca4-e2a4-4cd6-869d-84bddb014748.png#align=left&display=inline&height=718&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1436&originWidth=2674&size=2727153&status=done&style=none&width=1337)
观察一下这个接口的参数
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044142111-1936ba63-184b-4d4b-bfd0-6bc527f7f66c.png#align=left&display=inline&height=237&margin=%5Bobject%20Object%5D&name=image.png&originHeight=474&originWidth=954&size=130608&status=done&style=none&width=477)
ch_id 已经知道了,
_signature 签名,在之前的文章中有讲解过。这里就不再重复了。 --- ## 代码部分 案例代码,相对比较简介,需要大家自行完善。
**热搜榜数据:** ``` import requests import pprint # 抖音热搜榜 hot_search = 'https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?detail_list=1' headers = {"User-Agent":"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"} hot_json = requests.get(hot_search,headers=headers).json() hot_list = [] for data in hot_json['data']['word_list']: item = {} keyword = data['word'] hot_value = data['hot_value'] item[keyword] = hot_value hot_list.append(item) pprint.pprint(hot_list) ``` ![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044163870-5490aeaf-7469-4224-8ea0-e000d4cac462.png#align=left&display=inline&height=369&margin=%5Bobject%20Object%5D&name=image.png&originHeight=738&originWidth=992&size=618724&status=done&style=none&width=496) --- **热搜词对应的阅读人数**
这里取其中一个热搜词。 ``` import requests headers = {"User-Agent":"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"} hot_word = '鹿晗吃播' hot_reading = 'https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/?hotword={}'.format(hot_word) hot_json = requests.get(hot_reading,headers=headers).json() print("持续时间:",hot_json['aweme_list'][2]['duration']) print("热度值:",hot_json['aweme_list'][2]['hot_info']['value']) print("当前排名:",hot_json['aweme_list'][2]['hot_info']['rank']) ``` ![](https://cdn.nlark.com/yuque/0/2020/png/97322/1607043858822-cb4eef5a-9144-4de5-9402-907e95b394a2.png#align=left&display=inline&height=125&margin=%5Bobject%20Object%5D&originHeight=125&originWidth=473&size=0&status=done&style=none&width=473) --- **单个话题阅读量** ``` import requests dy_topic = 'https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/?query_type=0&ch_id=1635753360881672' headers = {"User-Agent":"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"} topic_json = requests.get(dy_topic,headers=headers).json() view_count = topic_json['ch_info']['view_count'] # 阅读量 print(view_count) ``` 如果对大家有帮助或者有疑问,欢迎点赞👍留言! --- ## 更新 发现 分享页面中话题的 sign 和 个人主页的生成方法一样的。
并且话题视频的get请求中,不需要dytk,带上 ch_id 和 _sign即可。
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044202042-f5fcd474-2143-4857-bfee-1c2f4aca64ea.png#align=left&display=inline&height=514&margin=%5Bobject%20Object%5D&name=image.png&originHeight=1028&originWidth=2068&size=689573&status=done&style=none&width=1034)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044214483-0fbab83d-7021-45f9-ada7-9e2cc0124a3f.png#align=left&display=inline&height=380&margin=%5Bobject%20Object%5D&name=image.png&originHeight=760&originWidth=1910&size=368058&status=done&style=none&width=955)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044225484-a9c97613-86f1-4a08-94f4-9d87e86b050b.png#align=left&display=inline&height=374&margin=%5Bobject%20Object%5D&name=image.png&originHeight=748&originWidth=2028&size=476206&status=done&style=none&width=1014)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/97322/1607044238381-0d2f8610-a302-426e-9b39-b47d5f33ca6e.png#align=left&display=inline&height=338&margin=%5Bobject%20Object%5D&name=image.png&originHeight=676&originWidth=1704&size=299074&status=done&style=none&width=852)

—————————————————————————————————————————— #### TiToData:专业的短视频、直播数据接口服务平台。 #### 更多信息请联系: [TiToData](https://www.titodata.com?from=douyinarticle) 覆盖主流平台:抖音,快手,小红书,TikTok,YouTube