Python BeautifulSoup: get contents[] as a single string
Original question: http://stackoverflow.com/questions/4488836/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by AP257
Does anyone know an elegant way to get the entire contents of a soup object as a single string?
At the moment I'm getting contents, which is of course a list, and then iterating over it:
notices = soup.find("div", {"class": "middlecontent"})
con = ""
# Concatenate the string form of every child node of the div.
for content in notices.contents:
    con += str(content)
print(con)
Thanks!
Accepted answer by Fábio Diniz
What about contents = str(notices)?
Or maybe contents = notices.renderContents(), which leaves out the enclosing div tag and returns only what is inside it.
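To make the difference between the two suggestions concrete, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) on Python 3 rather than the older BeautifulSoup 3 that was current when this was asked, and the sample HTML is made up for illustration. As far as I know, in bs4 renderContents() is kept only as a deprecated name and returns bytes, so the sketch uses decode_contents(), which returns a text string.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package (BeautifulSoup 4)

# Hypothetical sample markup, standing in for the page in the question.
html = '<div class="middlecontent"><p>Hello <b>world</b></p><p>Bye</p></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

# str() serializes the tag itself together with everything inside it.
outer = str(notices)

# decode_contents() serializes only the children, without the div itself.
inner = notices.decode_contents()

# For this sample, joining the contents list by hand gives the same string.
joined = "".join(str(child) for child in notices.contents)

print(outer)
print(inner)
```

So str(notices) is the one to use if you want the wrapping div included, and decode_contents() if you only want what is inside it.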
Answer by Frédéric Hamidi
Answer by zjk
But the list is recursive, so...
I think this will work.
I'm new to Python, so the code may look a little weird.
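def getString(x):
    # A NavigableString is a text node: return it unchanged.
    if type(x).__name__ == 'NavigableString':
        return x
    # Otherwise x is a tag: flatten its children recursively.
    return "".join(getString(t) for t in x)

contents = getString(notices)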
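Note that the recursive function above keeps only the text nodes, so the tags themselves are dropped from the result. If text-only output is what you are after, BeautifulSoup 4 seems to cover it with get_text(). A small sketch, again assuming the bs4 package and made-up sample HTML:

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package (BeautifulSoup 4)

# Hypothetical sample markup with nested tags.
html = '<div class="middlecontent"><p>Hello <b>world</b></p><p>Bye</p></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

# get_text() concatenates every text node under the tag and drops the markup.
text = notices.get_text()

# An optional separator and strip=True keep adjacent pieces from running together.
spaced = notices.get_text(" ", strip=True)

print(text)
print(spaced)
```

The separator variant is handy when neighbouring paragraphs would otherwise be glued together with no space between them.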
Answer by Spouk
#!/usr/bin/env python
# coding: utf-8
__author__ = 'spouk'

import BeautifulSoup
import requests


def parse_contents_href(url, url_args=None, check_content_find=None, tag='a'):
    """
    Fetch url and collect every <tag> element that has an href attribute.
    If check_content_find is given, keep only elements whose href contains it.
    Return the matches joined into a single string, or False if there are none.
    """
    html = requests.get(url, params=url_args)
    page = BeautifulSoup.BeautifulSoup(html.text)
    alllinks = page.findAll(tag, href=True)
    if check_content_find:
        # Keep only the links whose href contains the search text.
        alllinks = [x for x in alllinks if check_content_find in x['href']]
    return "".join(map(str, alllinks)) if alllinks else False


url = 'https://vk.com/postnauka'
print(parse_contents_href(url))

