Original URL: http://stackoverflow.com/questions/34399172/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Why does my Python code print the extra characters "???" when reading from a text file?
Asked by vrkratheesh
try:
    data=open('info.txt')
    for each_line in data:
        try:
            (role,line_spoken)=each_line.split(':',1)
            print(role,end='')
            print(' said: ',end='')
            print(line_spoken,end='')
        except ValueError:
            print(each_line)
    data.close()
except IOError:
    print("File is missing")
When printing the file line by line, the code adds three unwanted characters, "???", at the very start of the output.
Actual output:
???Man said: Is this the right room for an argument?
Other Man said: I've told you once.
Man said: No you haven't!
Other Man said: Yes I have.
Expected output:
Man said: Is this the right room for an argument?
Other Man said: I've told you once.
Man said: No you haven't!
Other Man said: Yes I have.
Accepted answer by senshin
I can't find a duplicate of this for Python 3, which handles encodings differently from Python 2. So here's the answer: instead of opening the file with the default encoding (which depends on your platform and is often 'utf-8'), use 'utf-8-sig', which expects and strips off the UTF-8 Byte Order Mark (BOM), which is what shows up as ???.
That is, instead of
data = open('info.txt')
Do
data = open('info.txt', encoding='utf-8-sig')
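To see the difference for yourself, here is a minimal sketch. It assumes the file really does start with a UTF-8 BOM (the bytes EF BB BF); the temporary file and its contents are made up for the demonstration and are not the asker's actual info.txt.

```python
import os
import tempfile

# Create a small sample file that starts with a UTF-8 BOM (bytes EF BB BF).
# The location and contents are invented for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "info.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfMan: Is this the right room for an argument?\n")

# Plain 'utf-8' keeps the BOM as the character U+FEFF at the start of the text.
with open(path, encoding="utf-8") as f:
    with_bom = f.readline()

# 'utf-8-sig' strips the BOM if it is present.
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.readline()

print(repr(with_bom[:4]))
print(repr(without_bom[:4]))
```

Using repr() makes the invisible leading character visible, which is handy when the console would otherwise replace it with question marks.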
Note that if you're on Python 2, you should see e.g. "Python, Encoding output to UTF-8" and "Convert UTF-8 with BOM to UTF-8 with no BOM in Python". You'll need to do some shenanigans with codecs or with str.decode for this to work right in Python 2. But in Python 3, all you need to do is set the encoding= parameter when you open the file.
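As a rough idea of what the byte-level approach can look like, here is a sketch that strips the BOM by hand before decoding. The helper name decode_without_bom is hypothetical, and this is only one possible way to do it, not necessarily what the linked questions recommend. It relies only on codecs.BOM_UTF8 and the bytes decode method, so the same idea should carry over to Python 2 byte strings read in 'rb' mode.

```python
import codecs


def decode_without_bom(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper for illustration only.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


# Sample bytes as they might be read from a file opened in binary mode.
sample = codecs.BOM_UTF8 + b"Man: Is this the right room for an argument?\n"
text = decode_without_bom(sample)
print(text, end="")
```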
Answer by gavin
I had a very similar problem when dealing with Excel CSV files. Initially I had saved my file from the drop-down choices as a .csv UTF-8 (comma delimited) file. Then I saved it as just a .csv (comma delimited) file and all was well. Perhaps there is a similar issue with a .txt file.
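If re-saving the file is not an option, the 'utf-8-sig' encoding from the accepted answer should also work with the csv module. The sketch below assumes the CSV was saved with a leading BOM; the file name and the rows are invented for the example.

```python
import csv
import os
import tempfile

# Simulate a CSV that was saved with a leading UTF-8 BOM.
# The file name and data are made up for this example.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfname,role\r\nJohn,Man\r\n")

# newline="" is what the csv module expects for file objects;
# 'utf-8-sig' removes the BOM so the first header is 'name', not '\ufeffname'.
with open(path, newline="", encoding="utf-8-sig") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["name"])
```

Without 'utf-8-sig', the first column header would carry the BOM character, so looking up the 'name' key would fail.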
Answer by Giovanni
When I had this happen, it only affected the very first line of my CSV, both when reading and when writing. For what I was doing, I just made a "sacrificial" entry at the first location so that those characters would get added to my sacrificial entry and not to any of the ones I cared about. Definitely not a robust solution, but it was quick and worked for my purposes.
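A possibly less fragile alternative to a sacrificial entry, assuming the stray characters are a UTF-8 BOM, is to rewrite the file once without it. The helper name rewrite_without_bom and the file names below are hypothetical; this is a sketch, not a tested tool.

```python
import codecs
import os
import tempfile


def rewrite_without_bom(src_path, dst_path):
    """Copy a UTF-8 text file, dropping a leading BOM if there is one."""
    # 'utf-8-sig' strips the BOM on read; newline="" keeps line endings as-is.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Plain 'utf-8' does not write a BOM ('utf-8-sig' would add one).
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration with made-up files in a temporary directory.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "with_bom.csv")
dst_path = os.path.join(work_dir, "no_bom.csv")
with open(src_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"name,role\r\nJohn,Man\r\n")

rewrite_without_bom(src_path, dst_path)
with open(dst_path, "rb") as f:
    cleaned = f.read()
print(cleaned.startswith(codecs.BOM_UTF8))
```

The same distinction applies when writing new files: choosing 'utf-8' rather than 'utf-8-sig' keeps the BOM out of the output in the first place.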