Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.
Original source: http://stackoverflow.com/questions/19591458/
Python reading from a file and saving to utf-8
Asked by aarelovich
I'm having problems reading from a file, processing its string, and saving to a UTF-8 file.
Here is the code:
try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit()
text = filehandle.read()
filehandle.close()
I then do some processing on the variable text.
And then:
try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit()
#data = text.decode("iso 8859-15")
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()
This outputs the file perfectly, but according to my editor it does so in ISO 8859-15. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happened. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines, the resulting file has gibberish in the special characters, mainly in words with tildes, as the text is in Spanish. I would really appreciate any help, as I am stumped.
Accepted answer by Mark Tolonen
Process text to and from Unicode at the I/O boundaries of your program using the codecs module:
import codecs
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with codecs.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
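The gibberish described in the question is consistent with double encoding: if the bytes read from the file are already UTF-8, decoding them as ISO 8859-15 and then encoding to UTF-8 again turns each accented character into two wrong ones. A minimal sketch of that effect in Python 3 follows; the sample word and the variable names are illustrative assumptions, not taken from the original post.

```python
# Illustrative only: the sample text is an assumption, not the asker's data.
original = "señor"

# Bytes as they would sit in a UTF-8 file on disk.
utf8_bytes = original.encode("utf-8")

# Wrong: treat the UTF-8 bytes as ISO 8859-15, then encode to UTF-8 again.
misread = utf8_bytes.decode("iso-8859-15")
double_encoded = misread.encode("utf-8")

# Right: decode with the encoding the bytes were actually written in.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))   # the two-byte "ñ" shows up as two separate characters
print(repr(correct))
```

Decoding once with the file's real encoding, and encoding once on output, avoids this.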
Edit: The io module is now recommended instead of codecs and is compatible with Python 3's open syntax. If you are using Python 3 and don't require Python 2 compatibility, you can just use open.
import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
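For the Python 3 case mentioned in the edit, the built-in open accepts an encoding argument directly. The sketch below assumes Python 3; the function name convert_to_utf8, its parameters, and the placeholder processing step are hypothetical, and src_encoding must be whatever encoding the input file really uses.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path as text, process it, and write dst_path as UTF-8."""
    # Decode once, on the way in.
    with open(src_path, "r", encoding=src_encoding) as f:
        text = f.read()

    # Placeholder for the real processing; here the text is only stripped.
    text = text.strip()

    # Encode once, on the way out.
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Passing the encoding explicitly on both calls keeps the result independent of the platform's default encoding.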
Answer by Fernando Freitas Alves
You can't do that using the built-in open in Python 2. Use codecs.
When you open a file in Python 2 with the built-in open function, there is no encoding parameter: you read and write raw byte strings, and implicit conversions to or from Unicode use ASCII. To write the file in UTF-8, try this:
import codecs
file = codecs.open('data.txt','w','utf-8')
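The snippet above only opens the file. A slightly fuller sketch writes a string and closes the file through a with block; the temporary path and the sample text are assumptions added so that the example is self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical output location, used only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# codecs.open returns a wrapper that encodes text to UTF-8 on write.
with codecs.open(path, "w", "utf-8") as f:
    f.write(u"mañana")

# Read the raw bytes back to check what was actually stored.
with open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```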
Answer by Siva Kumar
You can also handle it with the code below:
file=open(completefilepath,'r',encoding='utf8',errors="ignore")
file.read()
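Note that errors="ignore" silently drops any bytes that cannot be decoded, so characters can be lost without warning. The sketch below compares the error handlers on an illustrative byte string (an assumption, not data from the post); the same handler names apply to the errors argument of open.

```python
# "café" encoded as ISO 8859-15; its final byte 0xE9 is not valid UTF-8 here.
latin_bytes = "café".encode("iso-8859-15")

# errors="ignore": the undecodable byte is dropped.
ignored = latin_bytes.decode("utf-8", errors="ignore")

# errors="replace": the undecodable byte becomes U+FFFD, so the loss is visible.
replaced = latin_bytes.decode("utf-8", errors="replace")

# Default errors="strict": decoding raises instead of losing data.
try:
    latin_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(ignored), repr(replaced), strict_failed)
```

If the input is not really UTF-8, opening it with its actual encoding is usually preferable to discarding the bytes that fail to decode.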