Python: read from a file and save to UTF-8

Question

Asked by aarelovich

I'm having problems reading from a file, processing its string and saving it to a UTF-8 file.

Here is the code:

try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit() 

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.

And then:

try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit() 

#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This writes the file perfectly, but according to my editor it is encoded in ISO 8859-15. Since the same editor recognizes the input file (whose name is in the variable filename) as UTF-8, I don't know why this happens. As far as my research shows, the commented-out lines should solve the problem. However, when I use them, the resulting file contains gibberish, mainly in special characters: the text is in Spanish, so words with a tilde are garbled. I would really appreciate any help, as I am stumped.
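One plausible explanation for the gibberish: if the input file really is UTF-8, decoding it as ISO 8859-15 splits each accented character into two wrong characters, and encoding that result as UTF-8 saves the damage permanently. The sketch below assumes Python 3 and uses the sample word "año", which is an illustration and not taken from the question; it mimics what the two commented-out lines do to such input.

```python
# Sample Spanish word (illustrative assumption, not from the question)
original = "año"

# What is stored on disk when the input file is UTF-8: ñ takes two bytes
utf8_bytes = original.encode("utf-8")

# Like text.decode("iso 8859-15"): each byte becomes its own character
misread = utf8_bytes.decode("iso8859_15")

# Like writer.write(data.encode("UTF-8")): the garbled text is saved as UTF-8
double_encoded = misread.encode("utf-8")

print(ascii(misread))       # shows the two wrong characters that replaced ñ
print(double_encoded)
```

Decoding with the encoding the file actually uses (here `utf8_bytes.decode("utf-8")`) gives back the original word unchanged.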

Answer 1

Accepted answer by Mark Tolonen

Process text to and from Unicode at the I/O boundaries of your program using the codecs module:

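import codecs
# Decode the UTF-8 input into Unicode text
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode the Unicode text back to UTF-8 in the output file
with codecs.open(output, 'w', encoding='utf8') as f:
    f.write(text)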

Edit: The io module is now recommended instead of codecs, and it is compatible with Python 3's open syntax. If you are using Python 3 and don't need Python 2 compatibility, you can just use the built-in open.

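import io
# Decode the UTF-8 input into Unicode text
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode the Unicode text back to UTF-8 in the output file
with io.open(output, 'w', encoding='utf8') as f:
    f.write(text)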
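For Python 3 only, here is a minimal self-contained sketch of the same read, process, write cycle with the built-in open. The temporary file names, the sample text "mañana", and the upper() call standing in for the real processing are all illustrative assumptions. The last step reads the output back as raw bytes to confirm it is UTF-8.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "input.txt")   # hypothetical input file
    output = os.path.join(tmp, "output.txt")    # hypothetical output file

    # Create a UTF-8 input file containing a word with a tilde
    with open(filename, "w", encoding="utf-8") as f:
        f.write("mañana")

    # Input boundary: decode UTF-8 bytes into str
    with open(filename, "r", encoding="utf-8") as f:
        text = f.read()

    text = text.upper()  # placeholder for the real processing

    # Output boundary: encode str back to UTF-8
    with open(output, "w", encoding="utf-8") as f:
        f.write(text)

    # Read the raw bytes to check the file really is UTF-8
    with open(output, "rb") as f:
        raw = f.read()

print(raw)  # Ñ is stored as the two bytes C3 91
```

Passing encoding explicitly matters because, without it, Python 3's open falls back to the platform's default encoding, which is not UTF-8 on every system.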

Answer 2

Answer by Fernando Freitas Alves

You can't do that using the built-in open in Python 2; use codecs instead.

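When you open a file in Python 2 using the built-in open function, the file is read and written as raw bytes, with no encoding or decoding. (In Python 3, open uses the platform's default encoding unless you pass encoding=.) To write it in UTF-8, try this: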

import codecs
# codecs.open encodes the Unicode text to UTF-8 as it is written
with codecs.open('data.txt', 'w', 'utf-8') as f:
    f.write(text)
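To check that a file written this way really is UTF-8 (rather than trusting an editor's guess), one option is to read it back in binary mode and look at the bytes. The sketch below assumes Python 3, a temporary path standing in for 'data.txt', and the sample word "señor"; codecs.open still runs in Python 3, though the built-in open is the preferred interface there.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")  # temporary stand-in for 'data.txt'

    # The third positional argument of codecs.open is the encoding
    with codecs.open(path, "w", "utf-8") as f:
        f.write("señor\n")  # str is encoded to UTF-8 as it is written

    # Inspect the raw bytes: ñ should appear as the two-byte sequence C3 B1
    with open(path, "rb") as f:
        raw = f.read()

print(raw)
```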

Answer 3

Answer by Siva Kumar

You can also solve it with the code below (Python 3):

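# Python 3: decode the file as UTF-8; errors="ignore" silently drops undecodable bytes
with open(completefilepath, 'r', encoding='utf8', errors='ignore') as f:
    text = f.read()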
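Note that errors="ignore" does not convert anything; it discards bytes that are not valid UTF-8. The sketch below assumes Python 3 and a hypothetical file saved in Latin-1, where ñ is the single byte 0xF1, and compares errors="ignore" with errors="replace", which keeps the damage visible as U+FFFD.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1.txt")  # hypothetical file not saved as UTF-8

    # Write "año" as Latin-1 bytes: ñ becomes the single byte 0xF1
    with open(path, "wb") as f:
        f.write("año".encode("latin-1"))

    # errors="ignore": the invalid byte vanishes without any warning
    with open(path, "r", encoding="utf8", errors="ignore") as f:
        ignored = f.read()

    # errors="replace": the invalid byte becomes U+FFFD
    with open(path, "r", encoding="utf8", errors="replace") as f:
        replaced = f.read()

print(ascii(ignored), ascii(replaced))
```

If letters go missing this way, the file is probably not UTF-8 and is better opened with the encoding it was actually saved in.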
