Python UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 434852: invalid continuation byte

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me) and cite the original source. Stack Overflow original: http://stackoverflow.com/questions/16148356/

Date: 2020-08-18 21:52:17  Source: igfitidea

Tags: python, xml

Asked by user2181913

I am using hfcca to calculate the cyclomatic complexity of some C++ code. hfcca is a simple Python script (https://code.google.com/p/headerfile-free-cyclomatic-complexity-analyzer/). When I run the script to generate its output as an XML file, I get the following error:

Traceback (most recent call last):
  File "./hfcca.py", line 802, in <module>
    main(sys.argv[1:])
  File "./hfcca.py", line 798, in main
    print(xml_output([f for f in r], options))
  File "./hfcca.py", line 798, in <listcomp>
    print(xml_output([f for f in r], options))
  File "/x/home06/smanchukonda/PREFIX/lib/python3.3/multiprocessing/pool.py", line 652, in next
    raise value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 434852: invalid continuation byte

Please help me with this.

Answered by monk

The problem appears to be that the file contains characters encoded as Latin-1, and their bytes are not valid UTF-8 sequences. The file utility can help determine which encoding a file should be treated as, e.g.:

monk@monk-VirtualBox:~$ file foo.txt 
foo.txt: UTF-8 Unicode text
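
Where the file utility is not available, a rough equivalent check can be sketched in Python: read the raw bytes, attempt a strict UTF-8 decode, and report where it fails. The helper name find_invalid_utf8 and the sample bytes are hypothetical, chosen only for illustration; they do not come from hfcca or from the question.

```python
def find_invalid_utf8(data, context=10):
    """Return None if data is valid UTF-8, else (offset, reason, nearby bytes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset that appears as "position" in the message.
        lo = max(err.start - context, 0)
        return err.start, err.reason, data[lo:err.end + context]
    return None


# Hypothetical sample: 0xe2 is Latin-1 for "â", but in UTF-8 it starts a
# three-byte sequence, so the space that follows is not a valid continuation.
sample = b"caf\xe2 au lait"
result = find_invalid_utf8(sample)
print(result)
```

For a real file, the bytes would come from open(path, "rb").read(). The reported offset tells you which part of the file to inspect before deciding what its encoding actually is.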

Here is what that byte means in Latin-1:

>>> b'\xe2'.decode('latin1')
'â'

The easiest fix is probably to convert the files to UTF-8.
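
One way to do that conversion is sketched below in Python. It assumes the source files really are Latin-1, as the answer suspects. Latin-1 maps every byte to some character, so a wrong guess will not raise an error; it will silently produce the wrong text. The function name convert_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode the file at src as UTF-8 and write it to dst."""
    # Work on raw bytes so that line endings pass through unchanged.
    with open(src, "rb") as fin:
        raw = fin.read()
    text = raw.decode(source_encoding)  # assumption: src is Latin-1
    with open(dst, "wb") as fout:
        fout.write(text.encode("utf-8"))


# Demonstration on a throwaway file containing the byte 0xe2.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "legacy.cpp")
    dst = os.path.join(tmp, "legacy_utf8.cpp")
    with open(src, "wb") as f:
        f.write(b"// caf\xe2\n")
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

Writing to a separate destination keeps the original file intact, which makes it easy to compare the two if the encoding guess turns out to be wrong.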

Answered by Biashara Employers

I had the same problem when rendering Markup("""yyyyyy"""), but I solved it using an online tool which removed the 'bad' characters: https://pteo.paranoiaworks.mobi/diacriticsremover/

It is a nice tool and even works offline.
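
A similar effect can be approximated locally with Python's standard unicodedata module. This is only a sketch of the general idea of stripping diacritics, not a description of how the linked tool works. The function name strip_diacritics is hypothetical, and the input must already be a correctly decoded str.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks so that, for example, "â" becomes "a"."""
    # NFD splits each accented character into a base character plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("cr\u00e8me br\u00fbl\u00e9e"))
```

Note that this is lossy: it changes the text rather than fixing its encoding, and characters without a decomposition (such as "ø" or "ß") are left as they are.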