Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

Question

Asked by Fakhriyanto

I want to build a search engine, and I am following a tutorial I found on the web. I want to test parsing an HTML file:

from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author, Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile, "html.parser", from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.findAll('meta'):
            try:
                if meta['name'] in ('Author', 'Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html(r"C:\pdf\pydf\data\muellner2011.html")

and I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>

I saw some solutions on the web that use encode(), but I don't know how to add encode() to my code. Can anyone help me?

Answer 1

Accepted answer by Martijn Pieters

In Python 3, files are opened as text (decoded to Unicode) for you; you don't need to tell BeautifulSoup what codec to decode from.

If decoding of the data fails, that's because you didn't tell the open() call what codec to use when reading the file; add the correct codec with an encoding argument:

with open(filename, encoding='utf8') as infile:
    html = BeautifulSoup(infile, "html.parser")

otherwise the file will be opened with your system default codec, which is OS dependent.
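
To see why the default codec matters, here is a minimal sketch that reproduces the error using only the standard library. It assumes the system default was cp1252 (common on Windows in Western locales, but not confirmed by the question), and the file content and variable names are made up for illustration:

```python
import locale
import os
import tempfile

# The codec open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding(False))

# Write a small UTF-8 file containing a right double quotation mark (U+201D),
# which UTF-8 stores as the bytes E2 80 9D. The content is hypothetical.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<pre>\u201cquoted\u201d</pre>".encode("utf-8"))
    path = tmp.name

try:
    # Reading with cp1252 (assumed here to be the system default) fails,
    # because byte 0x9d has no mapping in that codec.
    try:
        with open(path, encoding="cp1252") as infile:
            infile.read()
        error = None
    except UnicodeDecodeError as exc:
        error = exc

    # Reading with the codec the file was actually written in succeeds.
    with open(path, encoding="utf-8") as infile:
        text = infile.read()
finally:
    os.remove(path)

print(error)
print(text)
```

The first read raises the same kind of 'charmap' error as in the question, while the second returns the text intact. If your file is not really UTF-8, pass whatever codec it was actually saved in instead.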
