Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/30750843/

Date: 2020-08-19 08:53:53  Source: igfitidea

Tags: python, unicode

Asked by Fakhriyanto

I want to build a search engine, and I'm following a tutorial I found online. I want to test parsing HTML:

from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author, Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile, "html.parser", from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.find_all('meta'):  # find_all is the current name; findAll is a legacy alias
            try:
                if meta['name'] in ('Author', 'Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html(r"C:\pdf\pydf\data\muellner2011.html")  # raw string so backslashes are not treated as escapes

and I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>

I've seen some solutions online that use encode(), but I don't know where to put the encode() call in my code. Can anyone help?

Accepted answer by Martijn Pieters

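In Python 3, open() opens files in text mode by default, so the data is decoded to Unicode (str) for you; you don't need to tell BeautifulSoup what codec to decode from. When BeautifulSoup is handed already-decoded text, it ignores from_encoding anyway.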
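A minimal stdlib-only sketch of what text mode does, assuming UTF-8 input: open() in text mode returns an io.TextIOWrapper around a binary stream, and that wrapper performs the decoding. Here io.BytesIO stands in for the file on disk, and the sample HTML is hypothetical.

```python
import io

# Hypothetical sample: UTF-8 encoded HTML containing curly quotes.
raw = "<title>\u201cHi\u201d</title>".encode("utf-8")

# Binary mode (like open(path, "rb")): reading yields undecoded bytes.
binary = io.BytesIO(raw)
print(type(binary.read()))  # <class 'bytes'>

# Text mode (like open(path, encoding="utf-8")): a TextIOWrapper decodes
# the bytes with the given codec, so callers such as BeautifulSoup get str.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
content = text_stream.read()
print(type(content))                   # <class 'str'>
print(content == raw.decode("utf-8"))  # True
```

So by the time BeautifulSoup reads from a text-mode file object, the decode step has already happened, and any codec error is raised at that point.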

If decoding the data fails, that's because you didn't tell the open() call which codec to use when reading the file; add the correct codec with an encoding argument:

with open(filename, encoding='utf8') as infile:
    html = BeautifulSoup(infile, "html.parser")

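otherwise the file is opened with your system's default codec, which is OS dependent. On Windows that default is often cp1252; Python reports cp1252 decoding failures as coming from the 'charmap' codec, and byte 0x9d has no mapping in cp1252, which matches the error above.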
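A small stdlib-only reproduction of this failure mode. It assumes, hypothetically, that the HTML is UTF-8 and contains a right double quotation mark (U+201D), whose UTF-8 encoding ends in byte 0x9d; try_decode is a hypothetical helper for illustration, not part of any library.

```python
import locale

# The codec open() falls back to when encoding= is omitted. It depends on
# the OS and locale (often 'cp1252' on Windows, usually 'UTF-8' elsewhere).
print(locale.getpreferredencoding(False))


def try_decode(raw: bytes, codec: str) -> str:
    """Decode raw with codec; on failure, describe the offending byte instead of raising."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return f"{exc.encoding} failed on byte {raw[exc.start]:#04x} at position {exc.start}"


# Hypothetical sample: the UTF-8 bytes of U+201D RIGHT DOUBLE QUOTATION MARK.
data = "\u201d".encode("utf-8")  # b'\xe2\x80\x9d'

print(try_decode(data, "cp1252"))  # charmap failed on byte 0x9d at position 2
print(try_decode(data, "utf-8"))   # the decoded quotation mark
```

Whatever your platform default is, passing encoding='utf8' to open() makes the decoding step behave the same everywhere, provided the file really is UTF-8.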