1 Star 0 Fork 0

dzc/Python-1

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
movie_details.py 1.52 KB
一键复制 编辑 原始数据 按行查看 历史
Nathan R 提交于 2019-01-20 09:02 +08:00 . Rename movie_details to movie_details.py
import mechanize
import urllib2
from bs4 import BeautifulSoup
# Create a Browser
browser = mechanize.Browser()
# Disable loading robots.txt
browser.set_handle_robots(False)
browser.addheaders = [('User-agent',
'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98;)')]
movie_title = raw_input("Enter movie title: ")
movie_types = ('feature', 'tv_movie', 'tv_series', 'tv_episode', 'tv_special',
'mini_series', 'documentary', 'game', 'short', 'video', 'tvshort')
# Navigate
browser.open('http://www.imdb.com/search/title')
# Choose a form
browser.select_form(nr=1)
browser['title'] = movie_title
# Check all the boxes of movie types
for type in movie_types:
browser.find_control(type='checkbox',nr=0).get(type).selected = True
# Submit
fd = browser.submit()
soup = BeautifulSoup(fd.read(),'html5lib')
# Updated from td tag to h3 tag
for div in soup.findAll('h3', {'class': 'lister-item-header'}, limit=1):
a = div.findAll('a')[0]
hht = 'http://www.imdb.com' + a.attrs['href']
print(hht)
page = urllib2.urlopen(hht)
soup2 = BeautifulSoup(page.read(),'html.parser')
find = soup2.find
print("Title: " + find(itemprop='name').get_text().strip())
print("Duration: " + find(itemprop='duration').get_text().strip())
print("Director: " + find(itemprop='director').get_text().strip())
print("Genre: " + find(itemprop='genre').get_text().strip())
print("IMDB rating: " + find(itemprop='ratingValue').get_text().strip())
print("Summary: " + find(itemprop='description').get_text().strip())
Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/sdredboy/Python-1.git
git@gitee.com:sdredboy/Python-1.git
sdredboy
Python-1
Python-1
master

搜索帮助