I am trying to extract part of a script using Beautiful Soup, but it prints nothing. What is wrong?
import urllib2
from bs4 import BeautifulSoup
URL = "http://www.reuters.com/video/2014/08/30/woman-who-drank-restaurants-tainted-tea?videoId=341712453"
oururl = urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)
for script in soup("script"):
    script.extract()
list_of_scripts = soup.findAll("script")
print list_of_scripts
Best answer
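extract() removes the tag from the DOM. By the time findAll("script") runs, every script tag has already been taken out of the tree, which is why you get an empty list.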
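To see this behaviour without fetching the page, here is a minimal sketch in Python 3. The inline HTML is a made-up stand-in for the downloaded document, and the "html.parser" argument is just one parser choice, not something the original code specified.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, not the real Reuters markup.
html = (
    "<html><head><script>var a = 1;</script></head>"
    "<body><p>Hello</p><script>var b = 2;</script></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# Count the script tags while they are still part of the tree.
before = len(soup.find_all("script"))

# extract() detaches each tag from the tree and returns the detached tag.
removed = [script.extract() for script in soup("script")]

# The tree no longer contains any script tags.
after = len(soup.find_all("script"))

print(before, after)
print(removed[0].string)
```

The tags are not lost: extract() returns each one, so the content is still reachable through the returned objects even though the tree itself no longer contains them.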
Find the script with the type="application/ld+json" attribute and decode its contents with json.loads. You can then access the data as a Python data structure (a dict, for the given data).
import json
import urllib2
from bs4 import BeautifulSoup
URL = ("http://www.reuters.com/video/2014/08/30/"
"woman-who-drank-restaurants-tainted-tea?videoId=341712453")
oururl= urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)
data = json.loads(soup.find('script',type='application/ld+json').text)
print data['video']['transcript']
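The answer's code targets Python 2 and needs network access. As a rough Python 3 sketch of the same lookup that runs offline, the example below parses an inline page fragment. The HTML and the transcript value are invented for illustration, the real page's JSON may be shaped differently, and the None checks are an added precaution rather than part of the original answer.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real Reuters markup may differ.
html = """
<html><head>
<script type="text/javascript">var x = 1;</script>
<script type="application/ld+json">
{"video": {"transcript": "Example transcript text."}}
</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Select only the script tag that carries the JSON-LD payload.
tag = soup.find("script", type="application/ld+json")

# Fall back to an empty dict if the tag is missing or has no text.
if tag is None or tag.string is None:
    data = {}
else:
    data = json.loads(tag.string)

# Chained get() calls avoid a KeyError when a key is absent.
transcript = data.get("video", {}).get("transcript")
print(transcript)
```

Reading the payload through tag.string instead of tag.text is a deliberate choice here: some Beautiful Soup versions do not treat script contents as ordinary text, while .string returns the tag's single child string directly.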