Python UnicodeEncodeError: 'charmap' codec can't encode characters

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/27092833/

Date: 2020-08-19 01:25:43  Source: igfitidea

Tags: python, beautifulsoup, urllib

Asked by SstrykerR

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen("https://www.website.com/")
html = get.read()

soup = BeautifulSoup(html)

print(soup)

And I'm getting the following error:

File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?

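The traceback points at cp1252.py, which suggests that print() is encoding the text for a Windows console whose code page is cp1252. Here is a minimal sketch that reproduces the same kind of error without any network access. The sample string and the helper name try_encode are made up for illustration:

```python
# Hypothetical sample text: U+2192 (rightwards arrow) has no cp1252 mapping.
sample = "caf\u00e9 \u2192 menu"


def try_encode(text, encoding):
    """Return None on success, or the UnicodeEncodeError that was raised."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_encode(sample, "cp1252")
print(error)                        # the message names the offending character and position
print(try_encode(sample, "utf-8"))  # None: UTF-8 can encode every character here
```

The same failure happens implicitly whenever text is printed or written through a stream that uses cp1252.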

Accepted answer by SstrykerR

I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).

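One caveat: in Python 3, .encode("utf-8") returns bytes, so print() shows a b'...' literal with escape sequences rather than readable text. A hedged alternative, not taken from the original answer, is to keep the text as str and escape only the characters the console cannot represent. The helper name printable is an assumption made for this sketch:

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding lacks shown as escapes."""
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Simulate a cp1252 console: the arrow is escaped instead of raising.
print(printable("price \u2192 10", "cp1252"))
# Use whatever encoding the current stdout reports.
print(printable("price \u2192 10"))
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") once at startup is another option, provided the terminal can actually display UTF-8.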

Answer by Pseudo Sudo

For those still getting this error, adding encode("utf-8") to soup will also fix this.

soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
print(soup)

Answer by twasbrillig

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

Using io gives you backward compatibility with Python 2.

If you only need to support Python 3 you can use the builtin open function instead:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)
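
To check that the fix behaves as described, here is a small self-contained sketch that writes text containing non-cp1252 characters and reads it back. The html string and the file name page.html are stand-ins, not part of the original answer:

```python
import os
import tempfile

# Stand-in for scraped content, including characters cp1252 cannot encode.
html = "<p>caf\u00e9 \u2192 \u4e2d\u6587</p>"

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "page.html")
    # Explicit encoding on write, so the platform default is never consulted.
    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)
    # Use the same encoding when reading the file back.
    with open(fname, "r", encoding="utf-8") as f:
        restored = f.read()

print(restored == html)
```

Passing the same encoding on both the write and the read side keeps the round trip lossless.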

Answer by Sabbir Ahmed

This worked on Python 3.7 running on Windows 10 (I am not sure whether it will work on other platforms and/or other versions of Python).

Replacing this line:

with open('filename', 'w') as f:

With this:

with open('filename', 'w', encoding='utf-8') as f:

The reason it works is that the file is opened with UTF-8 as its encoding, so every Unicode character in the text can be written, instead of raising an error when the text contains a character that the platform's default encoding does not support.

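To see which default is in play, the sketch below prints the encoding that open() falls back to when none is given, then simulates a cp1252 default by passing it explicitly. The helper name write_fails and the file name out.txt are assumptions for illustration:

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
print(locale.getpreferredencoding(False))


def write_fails(text, encoding):
    """Return True if writing text with this encoding raises UnicodeEncodeError."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        try:
            with open(path, "w", encoding=encoding) as f:
                f.write(text)
        except UnicodeEncodeError:
            return True
    return False


print(write_fails("\u2192", "cp1252"))  # True: the arrow has no cp1252 mapping
print(write_fails("\u2192", "utf-8"))   # False: UTF-8 covers all of Unicode
```

On many Windows installations the first line prints cp1252, which is why the unqualified open() call fails there but not on systems whose default is UTF-8.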

Answer by Abhishek Jain

While saving the response of a GET request, the same error was thrown on Python 3.7 on Windows 10. The response received from the URL was encoded as UTF-8, so it is always recommended to check the encoding and pass the same one when opening the file. This avoids a trivial issue that can waste a lot of time in production.

import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open('NiftyList.txt', 'w') as f:
    f.write(resp.text)

When I added encoding="utf-8" to the open call, it saved the file with the correct response:

with open('NiftyList.txt', 'w', encoding="utf-8") as f:
    f.write(resp.text)
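
A different technique from the one in this answer is to skip text encoding entirely and save the raw bytes in binary mode, which in requests would be resp.content rather than resp.text. The sketch below uses a made-up payload as a stand-in for resp.content so that it runs without network access:

```python
import os
import tempfile

# Stand-in for resp.content: the raw bytes of a UTF-8 encoded page.
payload = "NIFTY 50 \u2192 \u20b9".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "NiftyList.txt")
    # Binary mode writes the bytes unchanged, so no codec is involved.
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        saved = f.read()

print(saved == payload)
```

The trade-off is that the file keeps whatever encoding the server used, so a later reader still needs to know it.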

Answer by Pardhu Gopalam

I faced the same encoding issue, which can occur when you try to print the data, read or write it, or open it. As others mentioned above, adding .encode("utf-8") will help if you are trying to print it.

soup.encode("utf-8")

If you are trying to open scraped data and maybe write it into a file, then open the file with (......,encoding="utf-8")

with open(filename_csv, 'w', newline='', encoding="utf-8") as csv_file:

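The open call above can be completed into a small runnable sketch with the csv module. The rows and the file name scraped.csv are invented sample data, not something from the original answer:

```python
import csv
import os
import tempfile

# Invented sample rows with characters outside cp1252.
rows = [["name", "price"], ["caf\u00e9", "10 \u20ac"], ["\u4e2d\u6587", "\u2192"]]

with tempfile.TemporaryDirectory() as tmp:
    filename_csv = os.path.join(tmp, "scraped.csv")
    # newline='' lets the csv module control line endings itself.
    with open(filename_csv, "w", newline="", encoding="utf-8") as csv_file:
        csv.writer(csv_file).writerows(rows)
    with open(filename_csv, "r", newline="", encoding="utf-8") as csv_file:
        restored = list(csv.reader(csv_file))

print(restored == rows)
```

If the file is meant to be opened in Excel on Windows, encoding="utf-8-sig" may be worth trying instead, since it adds a byte order mark that helps Excel detect UTF-8.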