Python: 'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original question, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/48067514/
'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
Asked by Vital
I am trying to read and print the following file: txt.tsv (from https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2017q3_notes.zip)
According to the SEC, the data set is provided in a single encoding, as follows:
Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.
My current code:
import csv

with open('txt.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)
All attempts ended with the following error message:
'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
I am a bit lost. Can anyone help me? Many thanks in advance.
Answered by koPytok
The file is actually encoded as 'windows-1252', not UTF-8. Pass the encoding explicitly:
open('txt.tsv', encoding='windows-1252')
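A minimal sketch of why the error appears and how this fix slots into the code from the question. The sample bytes and the helper name `read_tsv_rows` are made up for illustration; only the byte 0xa0 and the 'windows-1252' encoding come from the thread. Byte 0xa0 is a non-breaking space in windows-1252, but in UTF-8 it can only appear as a continuation byte, which is why the decoder reports an "invalid start byte".

```python
import csv
import os
import tempfile

# Illustrative data only: a tiny tab-separated file containing byte 0xa0.
SAMPLE = b"adsh\tvalue\n0001\tNet\xa0income\n"


def read_tsv_rows(path, encoding="windows-1252"):
    """Read a tab-delimited file into a list of dicts (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for
    # files handed to csv readers.
    with open(path, encoding=encoding, newline="") as tsvfile:
        return list(csv.DictReader(tsvfile, dialect="excel-tab"))


# Write the sample to a temporary file so the example is self-contained.
fd, sample_path = tempfile.mkstemp(suffix=".tsv")
with os.fdopen(fd, "wb") as f:
    f.write(SAMPLE)

# Decoding as UTF-8 reproduces the error from the question.
try:
    read_tsv_rows(sample_path, encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

# Decoding as windows-1252 succeeds.
rows = read_tsv_rows(sample_path)
print(rows)

os.remove(sample_path)
```

For the real file, the same call would presumably be `read_tsv_rows('txt.tsv')`, assuming the whole file really is windows-1252 as this answer states.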
Answered by Hasim D
If you are working with Turkish data, I suggest this line:
import pandas as pd

df = pd.read_csv("text.txt", encoding='windows-1254')
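To illustrate why the code page matters for Turkish text, here is a small sketch that decodes the same bytes both ways. The byte values are chosen for illustration (they are not from the thread); they are the positions where windows-1254 and windows-1252 map to different characters.

```python
import unicodedata

# Illustrative bytes at which the two code pages disagree.
raw = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

as_turkish = raw.decode("windows-1254")
as_western = raw.decode("windows-1252")

# Print the Unicode character names so the output stays ASCII-only.
for byte, tr_char, we_char in zip(raw, as_turkish, as_western):
    print(f"0x{byte:02X}: {unicodedata.name(tr_char)} "
          f"vs {unicodedata.name(we_char)}")
```

Neither decode raises an error here, so picking the wrong one of these two code pages tends to produce silently wrong letters rather than an exception.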
Answered by Ghulam Dastgeer
I had the same error message with a .csv file, and this worked for me:
import pandas as pd

df = pd.read_csv('Text.csv', encoding='ANSI')
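A caveat worth knowing: in CPython, 'ANSI' is an alias of the `mbcs` codec, which is documented as Windows-only and uses the system's ANSI code page, so the line above may raise a LookupError on other platforms. A hedged sketch of checking a codec name before using it; `pick_encoding` is a hypothetical helper, and the 'windows-1252' fallback is an assumption that suits Western European data.

```python
import codecs


def pick_encoding(preferred, fallback="windows-1252"):
    """Return `preferred` if this Python knows the codec, else `fallback`."""
    try:
        codecs.lookup(preferred)
        return preferred
    except LookupError:
        return fallback


# On Windows this is expected to keep 'ANSI'; elsewhere it falls back.
encoding = pick_encoding("ANSI")
print(encoding)
```

The resulting name can then be passed as the `encoding` argument of `open` or `pd.read_csv`.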
Answered by raj kumar
import pandas as pd

ds = pd.read_csv('/Dataset/test.csv', encoding='windows-1252')
Works fine for me, thanks.
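When the encoding of a file is not known in advance, one possible approach (not taken from any of the answers) is to try strict UTF-8 first and fall back to windows-1252 only if that fails. The helper name `decode_with_fallback` and the sample bytes are hypothetical.

```python
def decode_with_fallback(data, encodings=("utf-8", "windows-1252")):
    """Return (text, encoding) for the first encoding that decodes `data`."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings!r} could decode the data")


# Byte 0xa0 is not valid UTF-8 here, so the second encoding is used.
text, used = decode_with_fallback(b"Net\xa0income")
print(used)
```

The order matters: windows-1252 accepts almost any byte sequence, so trying it first would hide genuine UTF-8 input behind garbled characters.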