Notice: this page is a translated copy of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/603115/

Date: 2020-11-03 20:27:15  Source: igfitidea

python opens text file with a space between every character

python csv text-files

Asked by wlindner

Whenever I try to open a .csv file with the Python command fread = open('input.csv', 'r'), it always opens the file with spaces between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python?

Thanks.

Update

Ok, I got it with the help of Jarret Hardie's post.

This is the code that I used to convert the file to ASCII:

# Read the raw bytes, decode them as UTF-16, then re-encode as ASCII.
with open('input.csv', 'rb') as fread:
    mytext = fread.read().decode('utf-16')
# 'ignore' silently drops any character that has no ASCII equivalent.
with open('input-ascii.csv', 'wb') as fwrite:
    fwrite.write(mytext.encode('ascii', 'ignore'))

Thanks!

Accepted answer by Jarret Hardie

The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If that is, in fact, the case, you can likely read the file in Python itself without having to convert it first outside of Python.

Try something like:

# Read the raw bytes, then decode them with the file's actual encoding.
with open('input.csv', 'rb') as fread:
    mytext = fread.read().decode('utf-16')

The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.
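To make the warning about ASCII concrete, here is a minimal sketch. The sample string and the helper names to_ascii_lossy and count_dropped are made up for illustration and do not come from the uploaded file. It shows how the 'ignore' error handler silently drops every character that ASCII cannot represent.

```python
# Hypothetical sample text; not taken from the original input.csv.
sample = u'caf\u00e9,na\u00efve,\u65e5\u672c'


def to_ascii_lossy(text):
    """Encode text as ASCII, silently dropping unencodable characters."""
    return text.encode('ascii', 'ignore')


def count_dropped(text):
    """Return how many characters the lossy conversion would discard."""
    return len(text) - len(to_ascii_lossy(text))


print(to_ascii_lossy(sample))
print(count_dropped(sample))
```

If count_dropped returns anything other than zero for your data, converting to ASCII loses information, and keeping the decoded text is probably the safer choice.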

EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'I.D.|.' (etc). The dot is the extra byte for each char.
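The two leading bytes are most likely a UTF-16 byte order mark (BOM). As a rough sketch, assuming the file is little-endian UTF-16 (the uploaded file is not available here, so the bytes below are a stand-in), this is how the BOM and the extra zero byte per character can be checked from Python. The function name detect_utf16_bom is hypothetical.

```python
import codecs


def detect_utf16_bom(raw):
    """Return the UTF-16 codec name implied by a leading BOM, or None.

    Caveat: a UTF-32 little-endian BOM also starts with the same two
    bytes, so treat the result as a hint rather than a proof.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le'
    if raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be'
    return None


# Stand-in for the first bytes of the file: BOM followed by 'ID|'.
raw = codecs.BOM_UTF16_LE + u'ID|'.encode('utf-16-le')

print(detect_utf16_bom(raw))
print(repr(raw))
```

The repr output shows a zero byte after each ASCII character, which is what a hex editor renders as a dot and what a naive text read shows as a gap between characters.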

The code snippet above seems to work on my machine with that file.

Answered by recursive

The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.

Answered by Loïc Wolff

Isn't a CSV just a plain text file with values separated by commas? Try opening it in a text editor to see whether the file is well formed.

Answered by Loïc Wolff

To read an encoded file, you can simply replace open with codecs.open.

import codecs

fread = codecs.open('input.csv', 'r', 'utf-16')
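On Python 3, the built-in open accepts an encoding argument directly, so the same idea can be combined with the csv module. The following is only a sketch under that assumption: the function name read_utf16_csv, the file name sample-utf16.csv, and the sample rows are invented for illustration, and the throwaway file exists just so the example runs on its own.

```python
import csv
import os
import tempfile


def read_utf16_csv(path):
    """Return all rows of a UTF-16 encoded CSV file as lists of strings."""
    # newline='' is the setting the csv module documentation recommends.
    with open(path, 'r', encoding='utf-16', newline='') as f:
        return list(csv.reader(f))


# Build a small throwaway UTF-16 file (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample-utf16.csv')
with open(sample_path, 'w', encoding='utf-16', newline='') as f:
    f.write('ID,Name\r\n1,Alice\r\n')

rows = read_utf16_csv(sample_path)
print(rows)
```

The 'utf-16' codec reads the byte order mark and strips it, so the first field does not pick up stray leading characters.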

Answered by Shizzmo

Here's a quick and easy way, especially if Python won't parse the input correctly:

sed 's/ \(.\)/\1/g'

Answered by Christian Witts

Open the file in binary mode, 'rb'. Check it in a hex editor and look for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.
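If no hex editor is at hand, the null-padding check can be approximated in Python. This is a minimal sketch: the function name null_byte_ratio and the sample bytes are hypothetical, and in practice the bytes would come from open('input.csv', 'rb').read().

```python
def null_byte_ratio(raw):
    """Return the fraction of bytes in raw that are 0x00."""
    if not raw:
        return 0.0
    return raw.count(b'\x00') / float(len(raw))


# Hypothetical samples: wide-character text versus plain ASCII text.
padded = b'I\x00D\x00|\x00'
plain = b'ID|'

print(null_byte_ratio(padded))
print(null_byte_ratio(plain))
```

A ratio near one half for mostly English text suggests a two-byte encoding such as UTF-16, while a ratio of zero is what an ASCII or UTF-8 text file normally gives.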

Answered by Luiz Damim

This has never happened to me, but as truppo said, there must be something wrong with the file.

Try opening the file in Excel/BrOffice Calc and saving it as CSV again.

If the problem persists, try a subset of the data: the first 10, last 10, or middle 10 lines of the file.