Python BeautifulSoup:将内容 [] 作为单个字符串

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/4488836/

Date: 2020-08-18 15:59:07  Source: igfitidea

BeautifulSoup: get contents[] as a single string

python beautifulsoup

Asked by AP257

Anyone know an elegant way to get the entire contents of a soup object as a single string?


At the moment I'm getting contents, which is of course a list, and then iterating over it:


notices = soup.find("div", {"class" : "middlecontent"})
con = ""
for content in notices.contents:
    con += str(content)
print con

Thanks!


Accepted answer by Fábio Diniz

What about contents = str(notices)?


Or maybe contents = notices.renderContents(), which will hide the div tag.

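In BeautifulSoup 4 (the `bs4` package), `renderContents()` still works but the documented counterparts are `decode_contents()` (Unicode) and `encode_contents()` (bytes). A minimal sketch of the two options from this answer, assuming a small inline HTML document:

```python
from bs4 import BeautifulSoup

html = '<div class="middlecontent"><p>Hello</p> <b>world</b></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

# str(tag) serializes the tag itself, including the <div> wrapper
with_div = str(notices)

# decode_contents() serializes only the children, hiding the <div> tag
without_div = notices.decode_contents()

print(with_div)     # <div class="middlecontent"><p>Hello</p> <b>world</b></div>
print(without_div)  # <p>Hello</p> <b>world</b>
```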

Answered by Frédéric Hamidi

You can use the join() method:


notices = soup.find("div", {"class": "middlecontent"})
contents = "".join([str(item) for item in notices.contents])

Or, using a generator expression:


contents = "".join(str(item) for item in notices.contents)
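Applied to a concrete document (a hypothetical snippet, using the modern `bs4` import), the generator-expression version looks like this:

```python
from bs4 import BeautifulSoup

html = '<div class="middlecontent">one <b>two</b> three</div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

# Serialize each child (whether NavigableString or Tag) and concatenate
contents = "".join(str(item) for item in notices.contents)
print(contents)  # one <b>two</b> three
```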

Answered by zjk

But the list is recursive, so... I think this will work.
I'm new to Python, so the code may look a little weird.


getString = lambda x: \
    x if type(x).__name__ == 'NavigableString' \
    else "".join(getString(t) for t in x)

contents = getString(notices)
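The recursive lambda collects only the text nodes, so in BeautifulSoup 4 it is effectively equivalent to joining `tag.strings` or calling `get_text()`. A sketch of all three, assuming `bs4` and a small hypothetical document:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = '<div>a<p>b<i>c</i></p>d</div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

def get_string(x):
    """Recursively concatenate every NavigableString under x."""
    if isinstance(x, NavigableString):
        return str(x)
    return "".join(get_string(t) for t in x)

print(get_string(div))       # abcd
print("".join(div.strings))  # abcd  (built-in equivalent)
print(div.get_text())        # abcd
```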

Answered by Spouk

#!/usr/bin/env python
# coding: utf-8
# Note: this answer targets Python 2 and the legacy BeautifulSoup 3 package.
__author__ = 'spouk'

import BeautifulSoup
import requests


def parse_contents_href(url, url_args=None, check_content_find=None, tag='a'):
    """
    Parse the href elements of a URL and optionally keep only the links
    whose href contains check_content_find.
    """
    html = requests.get(url, params=url_args)
    page = BeautifulSoup.BeautifulSoup(html.text)
    alllinks = page.findAll(tag, href=True)
    # Keep only matching links when a filter string was given
    result = check_content_find and filter(
        lambda x: check_content_find in x['href'], alllinks) or alllinks
    # Join the matches into one string, or return False if there were none
    return result and "".join(map(str, result)) or False


url = 'https://vk.com/postnauka'
print parse_contents_href(url)
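The answer above targets Python 2 and BeautifulSoup 3. A rough Python 3 / `bs4` port of the same idea, run against an inline HTML string rather than a live URL (the `requests.get` step is omitted so the sketch stays self-contained; the filtering logic is unchanged):

```python
from bs4 import BeautifulSoup

def parse_contents_href(html, check_content_find=None, tag="a"):
    """Collect <tag href=...> elements, optionally keeping only those whose
    href contains check_content_find; return them joined, or False if none."""
    page = BeautifulSoup(html, "html.parser")
    all_links = page.find_all(tag, href=True)
    if check_content_find is not None:
        all_links = [a for a in all_links if check_content_find in a["href"]]
    return "".join(map(str, all_links)) if all_links else False

html = '<a href="/wall">wall</a><a href="/video">video</a>'
print(parse_contents_href(html))           # both anchors as one string
print(parse_contents_href(html, "video"))  # <a href="/video">video</a>
print(parse_contents_href(html, "photo"))  # False
```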